First and foremost: THANK YOU. This issue marks the third anniversary of Bamboo Weekly. I started it because I was tired of seeing so many data-analytics exercises that were boring and/or based on made-up data. Bamboo Weekly is one of my favorite things to write, and I hope that you enjoy reading it as much as I enjoy writing it.
If you have friends or colleagues who use Pandas, then please tell them about Bamboo Weekly. I want everyone to see how fun, relevant, and interesting analyzing data can be.
And if you can think of ways that I can improve this newsletter, making it more interesting or relevant, then please drop me a line. I'm always happy to hear from you.
To those of you who have a paid subscription, either directly here at BambooWeekly.com or via my LernerPython platform, I give an additional "thank you" for making it possible for me to spend about a day each week researching, solving, and writing these newsletters. Your support makes it all possible.
With that, let's move on to this week's issue:
The 2026 Winter Olympics (https://en.wikipedia.org/wiki/2026_Winter_Olympics) will open later this week in northern Italy. It'll bring lots of excitement and entertainment, as some of the world's most impressive athletes compete on ice and snow.
But for data nerds? The Olympics provides a treasure trove of statistics, allowing us to make all sorts of interesting comparisons.
This week, we'll thus look at data about the Winter Olympics, allowing us to think about the events, the countries, and the athletes from a Pandas-centric perspective. (We can't include data about this year's competition; I'll leave such predictive analytics to newsletters about machine learning.)
But wait: Given the winter theme, I thought it would also be appropriate to compare the style and speed of Pandas with Polars, another data-analysis tool. (You know, because polar bears live in the snow and ice, and ... OK, you probably knew that.) I'll thus ask you to perform each of these analyses twice, once in Pandas and once in Polars.
Paid subscribers, as usual, get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.
Learning goals for this week include working with CSV files, cleaning data, joins, pivot tables, and using Polars.
Data and five questions
This week's data comes from a data set on GitHub from developer Keith Galli, at https://github.com/KeithGalli/Olympics-Dataset. We will use several of the files in the "clean-data" section of this repository, specifically:
- bios.csv, with information about the athletes' biographies
- noc_regions.csv, listing the regions in the Olympic games, what we would normally call "countries," except that there isn't a perfect overlap between the notion of a team and a country
- results.csv, with the results from the Olympic games
Here are this week's five questions:
- Load each of the three files into a Pandas data frame, keeping only the rows for the Winter Olympics when reading results.csv. Make sure that the born_date and died_date columns (in bios.csv) are datetime values.
- How many times has the Winter Olympics taken place? How many different athletes have competed?
- What country has won the greatest number of medals? (Show the country name, not the NOC code.)
- What country has won the greatest number of gold medals? (Again show the country name, not the NOC code.)
- Repeat all of the above, but with Polars. What is the speed difference for each query?
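If you want a starting point for the loading step, here is a minimal sketch in Pandas, not a definitive solution. It rests on a few assumptions that you should check against the actual files: that results.csv marks the season in a column named type whose values include "Winter" (that column name is my guess, which is why it is a parameter), and that born_date and died_date in bios.csv are stored in a format that Pandas can parse. The function names load_winter_results and load_bios are made up for illustration. Because read_csv has no built-in row filter, the sketch reads results.csv in chunks and keeps only the matching rows from each chunk.

```python
import pandas as pd


def load_winter_results(source, season_col="type", chunksize=50_000):
    """Read a results CSV in chunks, keeping only the Winter rows.

    ASSUMPTION: the season is stored in a column named `season_col`
    (defaulting to "type") with the value "Winter". Check the real
    file's header before relying on this.
    """
    # With chunksize set, read_csv returns an iterator of data frames
    chunks = pd.read_csv(source, chunksize=chunksize)

    # Filter each chunk as it is read, so that the non-Winter rows
    # are never all held in memory at once
    kept = [chunk[chunk[season_col] == "Winter"] for chunk in chunks]

    # Stitch the filtered chunks together with a fresh 0-based index
    return pd.concat(kept, ignore_index=True)


def load_bios(source):
    """Read a bios CSV, asking Pandas to parse the two date columns."""
    # born_date and died_date are the columns named in the question;
    # missing values (e.g., athletes who are still alive) become NaT
    return pd.read_csv(source, parse_dates=["born_date", "died_date"])
```

You could then call load_winter_results("results.csv") and load_bios("bios.csv") on your downloaded copies and check the result with bios.dtypes. If a date column comes back as object rather than as a datetime dtype, some values probably weren't parseable; one option is to apply pd.to_datetime with errors="coerce" to that column after loading.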