Bamboo Weekly #162: Spotify and car accidents

Get better at: Combining files, dates and times, grouping, joining, correlations, and plotting

A few weeks ago, while listening to Slate Money (https://slate.com/podcasts/slate-money), I heard a statistic that was simultaneously fantastic and awful: A recent paper published by the National Bureau of Economic Research (https://www.nber.org/) found that traffic fatalities increased on days that popular albums were released (https://www.nber.org/papers/w34866).

In other words: When Taylor Swift releases a new album, many people will stop whatever they're doing and start to listen. Today, a huge number of people listen via streaming services, such as Spotify. And of course, many of them will listen to Spotify while driving. The NBER researchers found that on days when major albums were released, the number of traffic fatalities was higher than on other days. The implication, more or less, is that so many people were listening to Spotify while driving that some of them got into fatal accidents.

Now, one of the first things that you learn in a statistics class is that "correlation isn't causation," so we're not going to accuse Taylor Swift of homicide, or even reckless endangerment — at least, not just yet. But this was such an amazing set of findings that I thought it would be interesting (in a semi-morbid kind of way) to investigate this topic, and see if we could find similar results.

The NBER paper used data from two sources:

This week, we'll dig into these data sets, and see what we can find!

Paid subscribers, both to Bamboo Weekly and to my LernerPython+data membership program (https://LernerPython.com), get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.

Learning goals for this week include combining multiple files, dates and times, joins, and grouping.
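Combining multiple files usually means reading each file into its own data frame and then stacking them with `pd.concat`. Here is a minimal sketch of that pattern. The function name `combine_yearly_csvs` and the glob pattern `accident_*.csv` are hypothetical, since the actual file names depend on what you download:

```python
import glob

import pandas as pd


def combine_yearly_csvs(pattern):
    """Read every CSV file matching `pattern` and stack them into one DataFrame.

    `pattern` (e.g. 'data/accident_*.csv') is a hypothetical naming scheme;
    adjust it to match the files you actually have.
    """
    # Sort so the files are combined in a predictable (e.g. year) order
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f'No files match {pattern!r}')

    # One DataFrame per file; ignore_index gives the result a fresh 0..n-1 index
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)
```

Using `ignore_index=True` avoids the duplicate index values you would otherwise get, because each file's rows start again from 0.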

Data and five questions

The data for this week has two sources:

Note that while the Spotify data looks at songs, the NBER paper looked at albums. We thus won't be able to replicate the paper precisely, but we can look at something similar.
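Because the Spotify data is song-level, one rough stand-in for the paper's album-level measure is total streaming volume per day. The sketch below assumes hypothetical `date` and `streams` column names (rename them to match the real file), and that you have already filtered the data down to a single country or chart if it contains more than one:

```python
import pandas as pd


def daily_streams(songs):
    """Collapse song-level chart rows into one total stream count per day.

    Assumes (hypothetically) a 'date' column and a numeric 'streams' column.
    """
    # Parse dates so the result has a proper DatetimeIndex
    songs = songs.assign(date=pd.to_datetime(songs['date']))

    # Sum the streams of every song that charted on each date
    return (
        songs
        .groupby('date')['streams']
        .sum()
        .sort_index()
    )
```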

In addition, here's a Pandas series containing the dates on which the 10 most popular albums were released, as described in the NBER paper:

import pandas as pd

# Release dates of the ten albums studied in the NBER paper
album_releases = pd.Series(pd.to_datetime([
    '2022-10-21',  # Midnights - Taylor Swift
    '2021-09-03',  # Certified Lover Boy - Drake
    '2022-05-06',  # Un Verano Sin Ti - Bad Bunny
    '2018-06-29',  # Scorpion - Drake
    '2022-05-13',  # Mr. Morale - Kendrick Lamar
    '2022-05-20',  # Harry's House - Harry Styles
    '2022-11-04',  # Her Loss - Drake & 21 Savage
    '2021-08-29',  # Donda - Kanye West
    '2021-11-12',  # Red (Taylor's Version) - Taylor Swift
    '2020-07-24',  # Folklore - Taylor Swift
]))
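One way to put `album_releases` to work is to flag each day in a daily fatality table as a release day or not, and then compare the averages with `groupby`. This is a minimal sketch, not the NBER paper's method. It assumes a hypothetical `daily` data frame with one row per day, a datetime `date` column, and a numeric `fatalities` column:

```python
import pandas as pd


def compare_release_days(daily, release_dates):
    """Compare mean daily fatalities on album-release days vs. all other days.

    `daily` is assumed (hypothetically) to have a datetime 'date' column and
    a numeric 'fatalities' column, with one row per day.
    """
    # normalize() drops any time-of-day component so dates compare cleanly
    release_days = pd.to_datetime(pd.Series(release_dates)).dt.normalize()
    is_release = daily['date'].dt.normalize().isin(release_days)

    # Group by the boolean flag: False = ordinary day, True = release day
    return daily.groupby(is_release.rename('release_day'))['fatalities'].mean()
```

Called as `compare_release_days(daily, album_releases)`, it returns a two-row series indexed by `False` and `True`. Treat the result as a naive first pass: nine of the ten dates in `album_releases` are Fridays (Donda, released on a Sunday, is the exception, as `album_releases.dt.day_name()` shows), so a fairer baseline would compare release days with other days of the same weekday.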

Here are my five questions for this week. I'll be back tomorrow with solutions and explanations: