BW #7: Bank failures (solution)
Last week, Silicon Valley Bank collapsed. How often does this happen? Where are collapsed banks located, and how big are they?
This week’s topic: Bank failures
This week, we looked at describing bank failures and assistance, which I (perhaps unfairly!) lumped into a single category of “failures.” This, of course, comes in the wake of the collapse of Silicon Valley Bank.
Our data set comes from the FDIC itself. You can download it in CSV format from:
https://banks.data.fdic.gov/explore/failures?aggReport=detail&displayFields=NAME%2CCERT%2CFIN%2CCITYST%2CFAILDATE%2CSAVR%2CRESTYPE%2CCOST%2CRESTYPE1%2CCHCLASS1%2CQBFDEP%2CQBFASSET&endFailYear=2023&sortField=FAILDATE&sortOrder=desc&startFailYear=1934
Our questions for this week are:
According to our document, how many bank failures have there been since the FDIC was opened?
What was the earliest failure in our data set? What is the most recent failure in our data set?
In which five years did the greatest number of banks fail?
In which three states did the greatest number of banks fail?
What was the average market capitalization of the banks that failed? Given a capitalization of $200b, did that make SVB above, below, or about average?
When was the most recent failure of a bank larger than SVB?
Bank failures can be resolved in several different ways. How often, historically, have we seen each resolution? Were the odds good that SVB's uninsured depositors would get their money?
What about bank failures in the last 25 years -- if we just look at those, do the odds change?
What was the mean estimated loss in bank failures? What proportion of a bank's assets did this generally involve?
The learning goals for this week include working with dates and strings. And some insights into whether people were right to panic about losing their money when SVB went under.
Discussion
First, before anything else, I did my standard setup for working with Pandas:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
Then I had to retrieve the file and turn it into a data frame. Getting the URL accurate took a bit of time; it was only as I was doing the calculations that I discovered the default download starts in 2012, long after the FDIC started!
The URL is thus
https://banks.data.fdic.gov/explore/failures?aggReport=detail&displayFields=NAME%2CCERT%2CFIN%2CCITYST%2CFAILDATE%2CSAVR%2CRESTYPE%2CCOST%2CRESTYPE1%2CCHCLASS1%2CQBFDEP%2CQBFASSET&endFailYear=2023&sortField=FAILDATE&sortOrder=desc&startFailYear=1934
That URL takes us to a page from which we can download the data.
In my code, I handled it as follows, with read_csv, passing it a URL (rather than a downloaded filename):
data_url = 'https://pfabankapi.app.cloud.gov/api/failures?fields=NAME%2CCERT%2CFIN%2CCITYST%2CFAILDATE%2CSAVR%2CRESTYPE%2CCOST%2CRESTYPE1%2CCHCLASS1%2CQBFDEP%2CQBFASSET&filters=FAILYR%3A%5B1934%20TO%202023%5D&limit=10000&react=true&sort_by=FAILDATE&sort_order=desc&subtotal_by=RESTYPE&total_fields=QBFDEP%2CQBFASSET%2CCOST&format=csv&download=true&filename=bank-data'
df = pd.read_csv(data_url)
That loaded the data frame just fine. However, it wasn’t quite good enough for our purposes. That’s because the “FAILDATE” column contains date information, and we’ll need the date for our calculations. We can force Pandas to treat FAILDATE as a datetime column by passing the parse_dates keyword argument:
df = pd.read_csv(data_url,
                 parse_dates=['FAILDATE'])
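To see the effect of parse_dates, we can check the column's dtype after loading. Here's a minimal, self-contained sketch using a tiny inline CSV; the sample rows are invented for illustration, but they mimic the shape of the FDIC data:

```python
from io import StringIO

import pandas as pd

# Made-up sample rows in roughly the same shape as the FDIC file
sample_csv = StringIO("""NAME,CITYST,FAILDATE
First Example Bank,"SPRINGFIELD, IL",3/10/2023
Second Example Bank,"PORTLAND, OR",9/25/2008
""")

df = pd.read_csv(sample_csv, parse_dates=['FAILDATE'])

# FAILDATE is now a datetime64 column, not a plain object (string) column
print(df['FAILDATE'].dtype)
```

Without parse_dates, the same column would come back with dtype object, and the .dt accessor we'll want for date calculations wouldn't be available.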
We won’t be needing all columns of the data frame, but it’s a small enough data set that I’m OK keeping them around, without any more filtering.
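If you did want to trim the data frame down at load time, read_csv accepts a usecols keyword argument that keeps only the named columns. A quick sketch, again with invented sample rows:

```python
from io import StringIO

import pandas as pd

# Made-up sample rows, just to demonstrate usecols
sample_csv = StringIO("""NAME,CITYST,FAILDATE,COST
First Example Bank,"SPRINGFIELD, IL",3/10/2023,1000
Second Example Bank,"PORTLAND, OR",9/25/2008,2000
""")

# Keep only the columns we care about
df = pd.read_csv(sample_csv,
                 usecols=['NAME', 'FAILDATE'],
                 parse_dates=['FAILDATE'])

print(df.columns.tolist())   # ['NAME', 'FAILDATE']
```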
And with that, let’s move on to the first question!
According to our document, how many bank failures have there been since the FDIC was opened?
Assuming that each row in the data frame represents one bank failure, we merely need to grab the “shape” attribute from our data frame:
df.shape
(Remember that shape is an attribute, not a method! Trying to invoke it will result in an error.)
Of course, that returns a 2-element tuple with the number of rows and the number of columns. So you would have to retrieve index 0 from the resulting tuple to just get the number of rows.
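For example, on a toy data frame (the rows here are invented stand-ins for the FDIC data):

```python
import pandas as pd

# A small, invented data frame standing in for the FDIC data
df = pd.DataFrame({'NAME': ['A Bank', 'B Bank', 'C Bank'],
                   'CITYST': ['X, TX', 'Y, NY', 'Z, CA']})

print(df.shape)      # (3, 2) -- (rows, columns)
print(df.shape[0])   # 3 -- just the number of rows
print(len(df))       # 3 -- len() also counts rows
```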