A cruise can be a wonderful way to see the world. But passengers on the HV Hondius learned last week that their cruise had gone horribly wrong. A number of passengers were infected with a form of hantavirus (https://en.wikipedia.org/wiki/Hantavirus_infection). This has already led to three deaths and dozens of hospitalizations (https://www.nytimes.com/2026/05/16/world/europe/hantavirus-hondius-cruise.html?unlocked_article_code=1.j1A.3KML.uri7nsW30w0Z&smid=url-share).
This outbreak is clearly awful, although from what I've read, hantavirus doesn't spread as easily as the COVID-19 virus did back in 2020. (Can you believe that was six years ago?) That said, infection can be quite dangerous, which is why people are being tested, and then quarantined if they test positive.
This week, we'll look at data about the hantavirus outbreak, including what happened and when.
Paid subscribers, both to Bamboo Weekly and to my LernerPython+data membership program (https://LernerPython.com), get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.
Learning goals for this week include working with CSV files, cleaning data, dates and times, pivot tables, null values, and plotting with Plotly.
Data and five questions
This week's data comes from a GitHub repo that is regularly updated with information about the hantavirus outbreak on the HV Hondius, at https://github.com/kraemer-lab/Hondius_hantavirus_h2026 . We'll specifically be looking at one CSV file from the repository, data/linelist/2026_hantavirus.csv.
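If you haven't worked with a line list before, here is a minimal sketch of loading one and converting its date columns. It uses a tiny, made-up CSV in place of the real file; the column names (case_id, date_embarked, date_disembarked) and the values are my own inventions for illustration, and so is the assumption that date columns start with "date". The real file's columns may well differ, so check them before adapting this.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for data/linelist/2026_hantavirus.csv.
# These column names and values are invented for illustration only.
sample_csv = StringIO(
    "case_id,date_embarked,date_disembarked,outcome\n"
    "1,2026-04-01,2026-05-10,recovered\n"
    "2,2026-04-01,,\n"
)

df = pd.read_csv(sample_csv)

# Assumption: every date column has a name starting with "date".
date_cols = [col for col in df.columns if col.startswith("date")]

# Convert each one; values that can't be parsed (or are missing) become NaT.
for col in date_cols:
    df[col] = pd.to_datetime(df[col], errors="coerce")

print(df.dtypes)
```

With the real data, you would pass the path to the CSV file inside your copy of the repo to pd.read_csv instead of the StringIO object.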
Once you have downloaded the repo, you can work on these five tasks and problems. I'll be back tomorrow with my full solutions and explanations:
- Read the list of people exposed to hantavirus into a Pandas data frame. Make sure that all of the date columns have a datetime dtype. How long were passengers on the ship? Was it the same for all passengers?
- Combine the outcome and treatment columns into a single column. From this data, what were the three most common treatments and outcomes? Does it matter in which order you perform the combination?