Bamboo Weekly #151: PyPI in 2025

Get better at: Working with APIs, grouping, dates and times, and plotting with Plotly.

Happy New Year! I'm so delighted that you've chosen to spend part of your 2026 with me. I hope that you enjoy reading Bamboo Weekly as much as I enjoy writing it. There's lots more to come in 2026, I assure you!

As I wrote in my year-in-review blog post (https://lerner.co.il/2025/12/23/reuvens-2025-in-review/), the Python Software Foundation is crucial to the present and future success of Python. Consider joining the PSF as a paid member, at https://www.python.org/psf/membership/. This supports Python, and also allows you to vote in the elections, thus shaping the future of the language and ecosystem.

One of the PSF's most visible projects is PyPI, the Python Package Index (https://pypi.org), from which we download Python packages. And so, I thought it would be fitting that the first edition of BW for this year look at trends on PyPI over the past year.

You cannot get PyPI download information directly from PyPI itself. That's mostly because Python is now so popular that the sheer volume of downloads would be impossible for most people or organizations to track. Besides, many of those downloads are served from caches, in order to avoid overloading the PyPI servers.

You can get summaries at https://pypistats.org, a site that the PSF recently took over. Beyond that, you can get full information about every download of every package on PyPI via Google BigQuery. I had never used BigQuery before, so I did the most 2025-appropriate thing possible, namely working with Claude to learn about it and craft some queries that would let me look through PyPI download records.
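
pypistats.org also offers a JSON API for per-package summaries. As a small illustration, here's a sketch that asks for recent download counts for a single package; the endpoint pattern comes from the site's API documentation (https://pypistats.org/api/), and pandas is just an example package:

    import requests

    # Ask the pypistats.org JSON API for recent download counts
    # (last day, week, and month) for a single package
    url = 'https://pypistats.org/api/packages/pandas/recent'
    response = requests.get(url)
    response.raise_for_status()

    # The response is a small JSON document with the counts
    print(response.json())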

Data and five questions

This week's data, as I wrote, comes from Google BigQuery. I've provided programs to execute each query and save the results to a Parquet file. Note that BigQuery costs money beyond the first 1 TB of data that you query each month; running these queries even a few times shouldn't go past that free tier.

To run the programs, you'll need to create a project on Google Cloud (I called mine "bw-151") and set up application credentials for using BigQuery. Each of my programs starts by loading my application credentials and creating a client. I then build a query as a string, using BigQuery's SQL syntax, run it with client.query, get a Pandas data frame back, and save that data frame to a Parquet file. It definitely took me a while to wrestle BigQuery into submission, first in generating the appropriate keys and permissions, and then in installing everything I needed. But again, Claude helped me quite a bit, both with configuring BigQuery and with writing the SQL queries.
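
Here's a minimal sketch of that workflow. The key filename, output filename, and query are placeholders of my own, not taken from this week's programs; the query runs against the public PyPI downloads table (bigquery-public-data.pypi.file_downloads) and restricts itself to a single day, to limit how much data BigQuery scans:

    import pandas as pd
    from google.cloud import bigquery
    from google.oauth2 import service_account

    # Load application credentials from a service-account key file;
    # the filename is a placeholder for your own key
    credentials = service_account.Credentials.from_service_account_file(
        'bw-151-key.json')

    # Create a BigQuery client, billing queries to our project
    client = bigquery.Client(credentials=credentials,
                             project=credentials.project_id)

    # Count downloads per project on a single day; filtering on the
    # timestamp keeps the amount of scanned (and billed) data down
    query = """
        SELECT file.project AS project,
               COUNT(*) AS downloads
        FROM `bigquery-public-data.pypi.file_downloads`
        WHERE DATE(timestamp) = '2025-12-01'
        GROUP BY project
        ORDER BY downloads DESC
        LIMIT 10
    """

    # Run the query, get the results back as a Pandas data frame,
    # and save that data frame to a Parquet file
    df = client.query(query).to_dataframe()
    df.to_parquet('top-projects.parquet')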

Paid subscribers to Bamboo Weekly, or to my LernerPython membership program (at https://LernerPython.com), can download the data files at the end of this message. Subscribers also get access to all questions and answers, downloadable notebooks, and an invitation to monthly office hours.

Learning goals for this week include: Loading Parquet files, dates and times, pivot tables, and plotting.
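
As a tiny preview of the first of those goals, this is all it takes to load one of this week's data files into Pandas (the filename is again a placeholder):

    import pandas as pd

    # read_parquet loads a Parquet file into a data frame, using
    # PyArrow (or fastparquet) behind the scenes
    df = pd.read_parquet('top-projects.parquet')
    print(df.head())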

Here are my five questions and tasks for this week: