Skip to content
3 min read · Tags: datetime csv pivot-table fireducks grouping joins

BW #118: Flight delays

Get better at: CSV files, working with dates and times, pivot tables, and plotting

BW #118: Flight delays

[Administrative note: Are you at PyCon US? Please come and say "hi"! I'm giving a talk at tomorrow's Python education summit (https://us.pycon.org/2025/events/education-summit/) on the use of AI in teaching and learning Python, and then "The PyArrow revolution in Pandas," on Friday morning (https://us.pycon.org/2025/schedule/presentation/69/). I have a booth in the expo advertising my new-and-improved LernerPython.com learning platform — come and get a free T-shirt, as well as hang out to chat!]

This week marks the start of PyCon US 2025, with lots of talks, tutorials, and activities on the agenda. People come from all over the world to attend PyCon US – and this year, a number of them found their flights into Newark Liberty Airport were delayed, canceled, or rerouted to another airport.

Why? It seems that there were two basic sets of issues: Equipment failures and staffing shortages. The staffing shortages ranged from electricians (needed to fix problematic equipment, as well as do preventative maintenance) to air-traffic controllers (who were already woefully understaffed). The combination has led to a number of terrifying stories about air-traffic controllers unable to know where planes are for 90-second periods. And Newark, as a busy airport, requires more training than most others (
https://www.nytimes.com/2025/05/12/business/newark-airport-delays-staffing-shortage.html?unlocked_article_code=1.HE8.Ke_D.yCujYtg90-cx&smid=url-share).

This week, we'll look at data about airport delays, based on data from the Department of Transportation's Bureau of Transportation Statistics. The most recent data is from January, which means that we won't be able to look at data from the most recent month. But when that data does come out, we'll be able to apply the same techniques to understand the scope of the issues.

Normally, I ask you to solve Bamboo Weekly problems using Pandas. But this week, I'll ask you to perform each query twice, once with regular Pandas, and a second time using a relatively new package, Fireducks (https://fireducks-dev.github.io/), which implements much of the Pandas API while claiming to improve performance dramatically. We'll put that to the test, running a number of queries in both Pandas and Fireducks, and seeing which runs faster.

Data and five questions

This week's data comes from the Bureau of Transportation Statistics. You can fill out a form to indicate which fields you want to download from here:

https://www.transtats.bts.gov/DL_SelectFields.aspx?gnoyr_VQ=FGJ&QO_fu146_anzr=b0-gvzr

You'll want the following fields:

You will also need to download two more files, the "lookup table" (really, another CSV file) for the reporting airline, and the lookup table (yes, a third CSV file) for the airport codes.

Learning goals for this week include Fireducks, grouping, pivot tables, dates and times, and joins.

Paid subscribers can download these files at the end of this message.

I'll be back tomorrow with full explanations and solutions, including a downloadable Jupyter notebook.

Here are my five questions: