- Audience level:
- Novice

Using Python, Pandas, Scikit-learn, and Bokeh, I will explore detailed quarterly balance sheet data for all deposit-taking U.S. banks for the years 1992 through 2015. I will assess the statistical support for Zipf's Law as a model for the upper tail of the bank size distribution and study how the upper tail of this distribution has evolved over that period.

Using Python, Pandas, Scikit-learn, and Bokeh, I will explore a 5+ GB data set of publicly available balance sheet data covering all U.S. banks regulated by the Federal Deposit Insurance Corporation (FDIC) from Q1 1992 through Q4 2015. The objective will be to understand how to measure the size of a bank and how to model the distribution of bank size.

Specifically, I will develop a few different measures of bank size and then assess the statistical support for Zipf's Law (i.e., a power law distribution with a scaling exponent of roughly α = 2) as an appropriate model for the upper tail of the size distribution of U.S. banks. Although I will find statistically significant departures from Zipf's Law for most measures of bank size in most years, a power law distribution with α = 1.9 outperforms other plausible heavy-tailed alternative distributions.
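
As a rough illustration of what fitting a power law to an upper tail can look like, here is a minimal sketch using the standard maximum likelihood estimator for a continuous power law, α = 1 + n / Σ ln(x_i / x_min). It is not the code from the talk: the function name `fit_power_law_alpha` is hypothetical, the tail threshold `x_min` is assumed to be given rather than estimated, and the data are synthetic draws from a power law with α = 2 rather than FDIC balance sheet data.

```python
import numpy as np


def fit_power_law_alpha(sizes, x_min):
    """Estimate the scaling exponent of a continuous power law tail.

    Assumes p(x) is proportional to x**(-alpha) for x >= x_min and uses
    the maximum likelihood estimator alpha = 1 + n / sum(log(x / x_min)).
    Returns the estimate, its asymptotic standard error, and the number
    of observations in the tail.
    """
    x = np.asarray(sizes, dtype=float)
    tail = x[x >= x_min]  # keep only observations in the upper tail
    n = tail.size
    if n < 2:
        raise ValueError("need at least two observations at or above x_min")
    log_sum = np.sum(np.log(tail / x_min))
    if log_sum <= 0:
        raise ValueError("all tail observations equal x_min; cannot estimate alpha")
    alpha = 1.0 + n / log_sum
    std_err = (alpha - 1.0) / np.sqrt(n)  # asymptotic standard error
    return alpha, std_err, n


# Synthetic stand-in for a bank size measure (NOT real FDIC data):
# inverse-transform sampling from a power law with alpha = 2, x_min = 1.
rng = np.random.default_rng(0)
true_alpha, x_min = 2.0, 1.0
u = rng.random(50_000)
synthetic_sizes = x_min * (1.0 - u) ** (-1.0 / (true_alpha - 1.0))

alpha_hat, alpha_se, n_tail = fit_power_law_alpha(synthetic_sizes, x_min)
print(f"alpha estimate: {alpha_hat:.3f} (s.e. {alpha_se:.3f}, n = {n_tail})")
```

With real data, the choice of `x_min` matters a great deal, and comparing the fit against other heavy-tailed candidates requires additional machinery (for example, likelihood ratio tests) that this sketch leaves out.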

If there is time, I may discuss some possible policy implications suggested by the data analysis that I have done so far.