Before we begin, several announcements:
- We'll hold office hours for paid subscribers on Sunday, January 25th. Come with any questions you have! A message with a Zoom link was sent out yesterday; let me know if you didn't get it.
- On February 1st, I'll start the 4th cohort of HOPPy, my Hands-on Projects in Python class. This time, every participant will create their own data dashboard, on a topic of their choosing, using Marimo. Not sure? Join the info session I'm running on Monday, January 26th: https://us02web.zoom.us/webinar/register/WN_YbmUmMSgT2yuOqfg8KXF5A
- My newest classes, the AI-Powered Python Practice Workshop and the AI-Powered Pandas Practice Workshop, are happening on February 2nd and 9th, respectively. If you want to get better at programming and data analysis done via Claude, then you'll love these! More info is coming on Thursday.
What is a university? When I was in high school, I assumed that it was just a harder, more exclusive place to learn than high school.
But while the best-known universities in the world teach classes, they are mainly research institutions. Most of a professor's time is spent doing research — asking new questions and trying to answer them in the best, most reliable ways. Graduate students are in a form of apprenticeship program, learning how to conduct research by participating in it, helping their advisor to advance knowledge in their field.
As a friend once told me, "A university is a place where people go in and ideas come out."
These ideas can have a huge effect on a country's economy. This is especially true in the United States, where the federal government has sponsored numerous university research labs for many decades. From those labs have come everything from radar to the Internet, from medicines to flash bulbs. And thanks in no small part to that government funding, American research universities have been at the top of world rankings for a very long time.
Until now, that is: According to the Leiden university rankings, published by the Centre for Science and Technology Studies at Leiden University in the Netherlands, most of the top 10 universities in the world are now in China. The reason, according to a New York Times story from earlier this week, is not that US universities are producing less research than before. On the contrary, they're producing more research than ever. But the Chinese universities are advancing even faster (https://www.nytimes.com/2026/01/15/us/harvard-global-ranking-chinese-universities-trump-cuts.html?unlocked_article_code=1.GFA.qfme.XigRP5r4bz6d&smid=url-share).
These rankings assume that research is a university's top priority, and that two good ways to measure a university's research are to count the journal articles it produces and the collaborations it has with other universities.
This week, we'll dig into the Leiden rankings data. However, we won't be writing any analysis code! In the spirit of my recent blog post (https://lerner.co.il/2026/01/21/were-all-vcs-now-the-skills-developers-need-in-the-ai-era/), my upcoming AI-Powered Python/Pandas workshops, and trends I'm seeing in the industry, I thought that it would be fun and interesting to use Claude Code (or the AI system of your choice) to answer the queries for us.
Data and six questions
This week's data comes from Leiden University, https://traditional.leidenranking.com/downloads. The original Excel data can be downloaded from: https://zenodo.org/records/17473109/files/CWTS%20Leiden%20Ranking%20Traditional%20Edition%202025.xlsx?download=1. The most recent information seems to be from the period 2020-2023.
A data dictionary is at https://traditional.leidenranking.com/information/indicators.
Paid subscribers, as usual, get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.
Learning goals for this week include using AI to write queries, working with Excel files, grouping, pivot tables, and plotting.
Remember that I don't want you to write these queries yourself, but rather use Claude Code to do so on your behalf. What configuration will you need to give it? What works (and doesn't) in your prompt? How accurate are the results? And how good are the queries that it produces?
Here are this week's six tasks and questions; I'll be back tomorrow with my solutions and explanations.
- Download the data file and import it into a Pandas data frame. Keep only those rows for which the period is either 2006-2009 or 2020-2023. In each of those two periods, find the countries for which P_top_1 (greatest number of frequently cited publications) and PP_top_1 (greatest percentage of frequently cited publications) had the greatest mean value. Do we see any differences between these two lists?
- From the 2006-2009 period to the 2020-2023 period, which five countries rose the most in the PP_top_1 numbers? Which five countries dropped the most?
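To judge what Claude produces, it helps to have a rough idea of the shape a reasonable answer might take. The following is only a minimal sketch, not a solution. It runs on a tiny made-up data frame rather than the real Leiden file. The column names Country, University, and Period, the hyphenated period labels, and every number in it are assumptions for illustration, so check them against the actual spreadsheet.

```python
import pandas as pd

# Made-up sample data. The real file would be loaded with pd.read_excel().
# Column names other than P_top_1 and PP_top_1 are assumptions.
df = pd.DataFrame({
    "Country":    ["A", "A", "A", "A", "B", "B", "A"],
    "University": ["U1", "U2", "U1", "U2", "U3", "U3", "U1"],
    "Period":     ["2006-2009", "2006-2009", "2020-2023", "2020-2023",
                   "2006-2009", "2020-2023", "2010-2013"],
    "P_top_1":    [10, 20, 12, 18, 4, 30, 99],
    "PP_top_1":   [0.020, 0.030, 0.018, 0.022, 0.010, 0.035, 0.099],
})

# Keep only the two periods of interest
periods = ["2006-2009", "2020-2023"]
df = df.loc[df["Period"].isin(periods)]

# Mean of each measure, per country (rows) and per period (columns)
means = df.pivot_table(index="Country",
                       columns="Period",
                       values=["P_top_1", "PP_top_1"],
                       aggfunc="mean")

# For each (measure, period) pair, the country with the greatest mean
top_countries = means.idxmax()

# Change in mean PP_top_1 between the two periods, per country
pp = means["PP_top_1"]
change = pp["2020-2023"] - pp["2006-2009"]

rises = change.nlargest(5)    # countries that rose the most
drops = change.nsmallest(5)   # countries that dropped the most

print(top_countries)
print(rises)
print(drops)
```

A pivot table is used here because it puts the two periods side by side, which makes the period-to-period subtraction in the second question a one-liner. A groupby on Country and Period followed by unstack would give the same numbers.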