[Administrative note: The 7th cohort of my Python Data Analysis Bootcamp (PythonDAB) will start on June 19th! To learn more about this 4-month intense-but-intimate mentoring program in Python, Git, and Pandas, join me at a Webinar on Monday, June 9th: https://us02web.zoom.us/webinar/register/WN_by8ZHiQkRPGmXjZrkaPeVw .]
In early 2020, when the Covid-19 pandemic started to spread around the world, and people were making comparisons with the Spanish flu pandemic of 1918, I read "The Great Influenza: The Story of the Deadliest Pandemic in History," by John Barry.
Aside from its (fascinating, chilling, and prescient) storytelling, the book described how researchers at Johns Hopkins University applied scientific rigor and analysis to medical issues, leading to huge, demonstrable advances in medical diagnosis and treatment.
Those researchers at Hopkins sparked a revolution, one which has benefitted the United States — and the world — to this day. For nearly a century, the US government has funded not just medical research, but a wide array of other research projects. Many are carried out at universities, but others are done at government laboratories and non-profit institutions.
There have always been debates about how much funding science and medical research should receive, as well as how to prioritize and allocate funding. But over the last few months, the Trump administration has cancelled many billions of dollars in funding for scientific and medical research -- including research that had already started. It has also fired many thousands of scientists and researchers.
This has led to numerous questions and concerns. A recent article in the Economist warns that the US might well experience a brain drain, as researchers are wooed to Europe and China (https://www.economist.com/science-and-technology/2025/05/21/america-is-in-danger-of-experiencing-an-academic-brain-drain).
Meanwhile, in the Atlantic, Adam Serwer asks if this is a self-imposed new Dark Age, similar to the period following the Roman Empire (https://www.theatlantic.com/ideas/archive/2025/05/trump-defund-schools-research-republicans/682742/?gift=oY9TCwcAO4lary6E0C-eKcBeTSLQWT3s3bxdCxpXbps&utm_source=copy-link&utm_medium=social&utm_campaign=share).
These certainly raise questions about how the US expects to remain a powerhouse in science, technology, and medicine -- and about the future of the many companies founded to commercialize the products of this research. The effects are likely to be bad, both in the short term (for all of the now-jobless researchers and students) and in the long term (for the US, the West, and the world).
The Economist's coverage of these funding cuts mentioned Grant Watch (https://grant-watch.us/) which tracks research grants cancelled at the National Institutes of Health (NIH) and National Science Foundation (NSF), traditionally two of the largest backers of science and medical research. This week, we'll look at some of the data collected by Grant Watch, to understand what kinds of research have been affected, who would be receiving these grants, and the reasons why some of these grants were terminated.
Data and six questions
This week's data comes from Grant Watch (https://grant-watch.us/), which you can read about on its home page. Its data comes from a variety of sources, including the government and individual researchers who have reported terminated grants. NIH and NSF data are tabulated separately, so I'll ask a few questions about each data set. You can, of course, dig into either or both of them more deeply if and when you need to.
Paid subscribers can download copies of the data files via the link at the end of this message.
I'll be back tomorrow with my full explanations, as well as downloadable notebooks (Marimo and Jupyter) and a one-click Web version of the notebook.
Learning goals this week include working with CSV files, grouping, working with text, formatting, and date/time data.
- Read the NIH data into a Pandas data frame. Which 15 institutions had the greatest dollar amount of grants (`usa_total_award`) terminated? Display each institution in "title" format (i.e., with each word starting with a capital letter) and with each dollar amount with commas before every three digits.
- Repeat this task with the NSF data, looking at `nsf_total_budget`. You'll want the `nsf_startdate` column to be treated as a `datetime` value. Do you see any overlap between the NIH and NSF results? What issues might we have in trying to combine the two data sets?
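
If you want a starting point before tomorrow's full explanations, here is a minimal sketch of one way to approach these two questions. It is not the official solution. It uses tiny, made-up in-memory CSV data instead of the real files, and it assumes that the institution name lives in a column called `org_name`, which is a hypothetical name; check the actual files for the real one. The helper function `top_institutions` is also my own invention for illustration. Only `usa_total_award`, `nsf_total_budget`, and `nsf_startdate` come from the questions above.

```python
import io

import pandas as pd

# Made-up stand-in for the NIH file; the real file has many more columns.
# "org_name" is an ASSUMED column name for the institution.
nih_csv = io.StringIO(
    "org_name,usa_total_award\n"
    "COLUMBIA UNIVERSITY,1500000\n"
    "columbia university,250000\n"
    "JOHNS HOPKINS UNIVERSITY,900000\n"
)

# Made-up stand-in for the NSF file, again with an assumed "org_name" column.
nsf_csv = io.StringIO(
    "org_name,nsf_total_budget,nsf_startdate\n"
    "Columbia University,400000,2023-09-01\n"
    "HARVARD UNIVERSITY,750000,2022-07-15\n"
)


def top_institutions(df, name_col, amount_col, n=15):
    """Return the n institutions with the largest summed amounts.

    Names are converted to title case *before* grouping, so that
    differently capitalized spellings are counted together.
    """
    return (
        df.assign(**{name_col: df[name_col].str.title()})
        .groupby(name_col)[amount_col]
        .sum()
        .nlargest(n)
    )


nih_df = pd.read_csv(nih_csv)
nih_top = top_institutions(nih_df, "org_name", "usa_total_award")

# Format with commas for display only; nih_top itself stays numeric.
print(nih_top.map("{:,.0f}".format))

# parse_dates asks read_csv to treat nsf_startdate as a datetime column.
nsf_df = pd.read_csv(nsf_csv, parse_dates=["nsf_startdate"])
nsf_top = top_institutions(nsf_df, "org_name", "nsf_total_budget")
print(nsf_top.map("{:,.0f}".format))

# Institutions that appear in both top-n lists.
overlap = nih_top.index.intersection(nsf_top.index)
print(list(overlap))
```

Title-casing before grouping, rather than only for display, is a design choice: it merges rows that differ only in capitalization. It won't catch other differences in how the two agencies record the same institution, which is one of the issues to think about when combining the data sets.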