3 min read · Tags: web-scraping multiple-files datetime grouping plotting

BW 146: Thanksgiving travel

Get better at: Working with multiple files, scraping, dates and times, grouping, and plotting.

Two administrative notes:

  1. Want to improve your Python, Git, and Pandas skills in a small group that I personally mentor — with a clear syllabus and tons of practice? Cohort 8 of my Python Data Analysis Bootcamp (PythonDAB) will start on December 4th. Watch the recorded info session at https://www.youtube.com/watch?v=pTDP9rSv75Y, or sign up for an interview about the bootcamp at https://savvycal.com/reuven/pythondab . Full info is at https://PythonDAB.com .
  2. I'm holding a Black Friday sale for new subscribers to my LernerPython+data membership program who get an annual membership. Use the BF2025 coupon code at https://LernerPython.com . This sale only lasts through Monday, December 1st, so don't wait!

And now, for this week's challenges:

I'm currently in the United States for Thanksgiving, the first time I've celebrated this holiday in a good number of years. Thanksgiving has long been known as the peak travel season in the United States, with traffic jams and long lines at airports, and I'm expecting to see my fair share of fellow frustrated travelers over the coming days.

But wait: Do more Americans really travel around Thanksgiving than at other times? Or is that just a myth?

Fortunately, we can find out, at least to some degree, by analyzing data. Automobile data isn't readily available, but the Transportation Security Administration (TSA), the government agency responsible for scanning our bags and bodies when we travel, publishes daily counts of the people it screened. These records go back to 2019, allowing us to see, at least over the last few years, how many Americans traveled by air, and when the peak travel days are.

This week, we'll examine this data. Along the way, we'll use a lot of the Pandas functionality that looks at dates and times, and a variety of calculations we can do with them.

Data and five questions

This week's data comes from the TSA's "travel checkpoint numbers" pages, based at https://www.tsa.gov/travel/passenger-volumes . That page shows, in an HTML table, the number of passengers screened by TSA for each day of this year. Other pages, linked from that main page, have similar data for each previous year, going back to 2019.
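
If you want a starting point for retrieving those pages, here is a minimal sketch, not a full solution. It rests on several assumptions that you should check against the site itself: that each earlier year's page lives at the base URL followed by the year, that the first HTML table on each page is the one we want, and that the server prefers a browser-like User-Agent header. The function names (`page_urls`, `fetch_table`, `load_all`) are my own, invented for illustration. Note that `pd.read_html` needs an HTML parser such as lxml installed.

```python
from io import StringIO
from urllib.request import Request, urlopen

import pandas as pd

BASE_URL = "https://www.tsa.gov/travel/passenger-volumes"


def page_urls(first_year, current_year):
    """Return one URL per earlier year, plus the main page for the current year."""
    # Assumption: earlier years live at BASE_URL/<year>.
    # Check the links on the main page before relying on this pattern.
    urls = [f"{BASE_URL}/{year}" for year in range(first_year, current_year)]
    urls.append(BASE_URL)  # the main page holds the current year's data
    return urls


def fetch_table(url):
    """Download one page and return its first HTML table as a data frame."""
    # Assumption: the server may reject requests without a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        html = response.read().decode("utf-8")

    # read_html returns a list of data frames, one per <table> element.
    # Assumption: the table we want is the first one on the page.
    return pd.read_html(StringIO(html))[0]


def load_all(first_year, current_year):
    """Fetch every page and stack the results into a single data frame."""
    frames = [fetch_table(url) for url in page_urls(first_year, current_year)]
    return pd.concat(frames, ignore_index=True)
```

Calling `load_all(2019, 2025)` would then request seven pages and return one combined data frame. The column names and dtypes depend on what the site serves, so inspect the result (for example, with `df.dtypes`) before doing any date-based work.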

Learning goals for this week include: Scraping data, combining multiple files, working with dates and times, grouping, and plotting.

Paid subscribers, including members of my LernerPython.com membership program, get the data files provided to them (except for this week, since retrieving them is part of the task), as well as all of the questions and answers each week, downloadable notebooks, and participation in monthly office hours.

Here are my five questions for this week. I'll be back tomorrow with solutions and full explanations: