2 min read · Tags: web-scraping multiple-files regular-expressions grouping

BW #135: Airline seats

Get better at: Scraping, working with multiple files, regular expressions, grouping, and pivot tables

I have been on quite a travel spree for the last few months, mostly attending conferences but also taking a short vacation. I've thus been on a lot of flights, with a number of different carriers and on many different planes.

Just yesterday I flew on Thai Airways from Taipei to Bangalore, where I'm attending (and keynoting at) PyCon India this weekend. And I have to say, the economy seats on Thai Airways were surprisingly comfortable and roomy.

That got me thinking: How much of a difference is there between airline seats? And is there a data set we can use to figure out those differences?

I was happy to find that SeatGuru, which often has useful information about which seats are better and worse on a given flight, has a set of Web pages answering these exact questions (https://www.seatguru.com/charts/generalcharts.php).

This week, we'll read through that data, and attempt to find out whether carriers differ much in their seats and other amenities.

Data and five questions

This week's data, as I mentioned, comes from SeatGuru, at https://www.seatguru.com/charts/generalcharts.php. But that page doesn't contain the data itself; the data is spread across six different pages. Each page has information about flights of a given duration (short haul or long haul, where six hours is the cap for short-haul flights) and the classes offered on them.

Learning goals for this week include: Scraping data from Web pages, regular expressions, working with multiple files, pivot tables, and grouping.
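
To give a rough sense of how those pieces fit together, here is a minimal sketch that uses only Python's standard library. Everything specific in it is an assumption made for illustration: the HTML snippets, the column names, the carrier names, and the page labels are all invented, and are not SeatGuru's actual markup or data. The helper names (TableRows, rows_from_html, first_number) are hypothetical, too. In practice you would fetch each page yourself, and you would probably hand the HTML to pandas rather than parse it by hand.

```python
import re
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def rows_from_html(html, label):
    """Turn the first row into column names, and tag each record with its page."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row), page=label) for row in body]


def first_number(text):
    """Use a regular expression to grab the first number in a string like '30-32'."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


# Invented stand-ins for two of the scraped pages (not real SeatGuru content)
pages = {
    "short-haul economy": (
        "<table><tr><th>Airline</th><th>Seat pitch</th></tr>"
        "<tr><td>Example Air</td><td>30-32</td></tr></table>"
    ),
    "long-haul economy": (
        "<table><tr><th>Airline</th><th>Seat pitch</th></tr>"
        "<tr><td>Sample Airways</td><td>34</td></tr></table>"
    ),
}

# Combine the rows from every page into one list of records
records = []
for label, html in pages.items():
    records.extend(rows_from_html(html, label))

# Add a numeric column pulled out of the text with the regular expression
for record in records:
    record["pitch_min"] = first_number(record["Seat pitch"])

print(records)
```

Tagging each record with the page it came from is what makes grouping and pivoting possible later, since the flight duration and class live in the page, not in the table itself.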

This week, there is no file to download, because scraping the data is part of the plan. However, paid subscribers will be able to download the data, as well as my Marimo notebook, in tomorrow's solutions.

Here are this week's five tasks and questions: