Bamboo Weekly #151: PyPI in 2025

Happy New Year! I'm so delighted that you've chosen to spend part of your 2025 with me. I hope that you enjoy reading Bamboo Weekly as much as I enjoy writing it. There's lots more to come in 2026, I assure you!

As I wrote in my year-in-review blog post (https://lerner.co.il/2025/12/23/reuvens-2025-in-review/), the Python Software Foundation is crucial to the present and future success of Python. Consider joining the PSF as a paid member, at https://www.python.org/psf/membership/. This supports Python, and also allows you to vote in the elections, thus shaping the future of the language and ecosystem.

One of the PSF's most visible projects is PyPI, the Python Package Index (https://pypi.org), from which we download Python packages. And so, I thought it would be fitting that the final edition of BW for this year look at trends on PyPI in the last year.

You cannot get PyPI download information directly. That's mostly because so many people now use Python that tracking each individual download would be impossible for most people or organizations. Besides, many of those downloads are cached, in order to avoid overloading the PyPI servers.

You can get summaries at https://pypistats.org, which the PSF recently took over. Beyond that, you can get full download information, about every download of every package on PyPI, via Google BigQuery. I hadn't ever used BigQuery before, so I did the most 2025-appropriate thing possible, namely working with Claude to learn about it and craft some queries that would allow me to look through PyPI download records.

Data and five questions

This week's data, as I wrote, comes from Google BigQuery. I've provided programs to execute each query and save the results to a Parquet file. Note that BigQuery costs money beyond the first 1TB that you use each month; running these queries even a few times shouldn't go past that free tier.

bw-151-programs

bw-151-programs.zip

4 KB

To run the programs, you'll need to create a project (I called mine "bw-151") on Google Cloud, setting up application credentials to use BigQuery. You can see that each of my programs starts by loading my application credentials and setting the client. I then create a query in a string, using BigQuery's SQL syntax. I run the query with client.query, get a Pandas data frame back, and then save the data frame to a parquet file. It definitely took me a while to wrestle BigQuery into submission, first with generating the appropriate keys and permissions, and then with installing everything I needed – but again, Claude helped me quite a bit, both with configuring BigQuery and in writing the SQL queries.

Paid subscribers to Bamboo Weekly, or to my LernerPython membership program (at https://LernerPython.com), can download the data files from the end of this message. You also get access to all questions and answers, downloadable notebooks, and an invitation to monthly office hours.

Learning goals for this week include: Loading Parquet files, dates and times, pivot tables, and plotting.

Here are my five questions and tasks for this week:

Read the "daily downloads" data into a Pandas data frame. (The Parquet files was written using a weird type that Arrow and PyArrow can handle, but pure Pandas cannot. So use dtype_backend='pyarrow', in order for it to work.) Was the mean number of downloads higher on weekends or weekdays? PyCon US took place from May 14-18, 2025. Was the mean on those days higher or lower than the rest of the year? (And by how much did it differ?) On which 5 days were there the most downloads? The fewest?
The uv package manager has become extremely popular over the last year. (I've even created a free 15-part e-mail course, which you can take at https://uvCrashCourse.com.) I'm curious to know if uv has improved its market share over the course of the year. Read the per-day, per-installer data into a Pandas data frame. Create a stacked bar plot showing the proportion of downloads used by each installer. Do you see an obvious increase in the use of uv? Is there a percentage increase in uv over time?

Bamboo Weekly #151: PyPI in 2025

Data and five questions

Pandas office hours — Wednesday, December 17th (recording)