Before we get into this week's topics, here are three administrative notes:
- We'll have office hours on Monday, the 17th; full info will go out this Friday.
- If you want to learn about uv, the new Python super-app for packaging, check out my uv crash course at https://uvCrashCourse.com.
- A new cohort of PythonDAB, my Python Data Analysis Bootcamp, will be starting on December 4th. Learn more at https://LernerPython.com/bootcamp, or sign up for an info session at https://us02web.zoom.us/webinar/register/WN_t9f0zd-7SkiflqI1Jz7QkA .
And now, on to our regularly scheduled program:
On October 19th, the world was shocked by the theft, in broad daylight, of more than $100 million in jewels from the Louvre in Paris — in 8 minutes! If you thought that the most famous museum in Paris, if not the world, was well guarded, then you weren't alone. The thieves, dressed as construction workers, entered the gallery via a window using glass cutters, smashed some of the displays, and ran off in just minutes. Some of them have been caught, but others remain at large.
The heist seemed more suited to a Hollywood caper film than the front page of major world newspapers. The New York Times recently described the theft itself in great detail, at https://www.nytimes.com/2025/10/30/world/europe/inside-louvre-jewel-heist.html?unlocked_article_code=1.0k8.3jfi.Fqdb95Os1tWk&smid=url-share .
How common are such heists? When and where do they take place? How many people are involved? And do they often use weapons? These, and other questions, were raised by Sandra Clopés and Marc Balcells, professors at Pompeu Fabra University in Barcelona, Spain, in a recent paper, "The Science of Art Theft: Using Data to Identify Criminal Patterns, 1990–2022" (https://www.cambridge.org/core/journals/international-journal-of-cultural-property/article/science-of-art-theft-using-data-to-identify-criminal-patterns-19902022/838EFE01FB7C7AE9244FB458FF103EB1), published in April in the International Journal of Cultural Property.
The paper only includes heists from 1990 through 2022, but can still provide us with an understanding of museum thefts. And indeed, that's our topic for this week: trying to better understand the database of heists used in the paper.
Data and five questions
There are large, complete databases of stolen artworks. However, these are generally behind paywalls or cannot be downloaded. For example, Interpol (the international police network) has a "stolen works of art" database, and even added the Louvre jewels (https://www.interpol.int/News-and-Events/News/2025/Louvre-Museum-theft-Stolen-jewels-added-to-INTERPOL-s-Stolen-Works-of-Art-database). But the database itself can only be searched, not downloaded.
I was thus happy to see that the Clopés and Balcells paper did include data — but only in PDF, and cut in a way that would make analysis difficult. I contacted Professor Clopés, who kindly and quickly sent me an Excel file with their database. You can download it from here:
This week, I have five tasks and questions for you based on this database.
Normally, I only provide a downloadable version of the data to paid BW subscribers and members of my LernerPython subscription program — but this week is an obvious exception to that rule. Tomorrow, I'll not only provide my solutions and explanations, but also my downloadable Marimo notebook, and a clickable link to view and explore the data using the Marimo "Molab" facility.
Learning goals for this week include working with dates and times, cleaning data, grouping, regular expressions, and handling weird multi-indexes.
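If "weird multi-indexes" sounds unfamiliar, here is a minimal sketch of the general idea on a toy data frame. The column names and values are invented for illustration and are not taken from the heist database, and keeping only the inner level is just one of several reasonable ways to flatten the columns:

```python
import pandas as pd

# Toy data frame with two-level column labels, roughly what you get when
# two spreadsheet rows are used as headers (e.g. read_excel with header=[1, 2]).
# All names and values here are hypothetical.
columns = pd.MultiIndex.from_tuples([
    ("Heist", "Country"),
    ("Heist", "Full Date"),
    ("Loot", "Items stolen"),
])
df = pd.DataFrame(
    [["France", "12/05/2010", 5],
     ["Italy", "-/11/1998", 3]],
    columns=columns,
)

# Flatten to a single-level index by keeping only the inner (second) level
df.columns = df.columns.get_level_values(1)

print(df.columns.tolist())   # ['Country', 'Full Date', 'Items stolen']
```

If the inner labels aren't unique on their own, joining both levels into one string (for example with a list comprehension over `df.columns`) is a common alternative.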
Here are this week's five tasks. I'll be back tomorrow with my solutions and explanations:
- Read the Excel file into a Pandas data frame. Use lines 2 and 3 from the Excel file as the column names, but end up with a single-level index on the columns, not a multi-index. Ensure that the "Full Date" and "Date of heist discovery" columns are datetime values; assume that "-" as the day of the month is really the first day of that month.
- Which country appears in this database the most times? Has any museum been robbed more than once in this period?
- Which museum had the most items stolen? Which had the greatest monetary value stolen? (For this last question, you can treat missing values as 0.)
- In how many cases was the heist discovered on the same day as it took place? For those heists that were discovered later on, how long did it take, on average? What was the longest amount of time it took for the heist to be discovered?
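As a warm-up for the date-related tasks, here is a rough sketch of the cleaning and subtraction involved. It runs on toy data, not the real file. I'm assuming, purely for illustration, that dates are strings in day/month/year form with "-" standing in for an unknown day; the actual spreadsheet may well be laid out differently, so treat the regular expression and the format string as placeholders:

```python
import pandas as pd

# Toy data; the DD/MM/YYYY layout and the values are assumptions.
df = pd.DataFrame({
    "Full Date": ["12/05/2010", "-/11/1998", "03/02/2004"],
    "Date of heist discovery": ["12/05/2010", "04/11/1998", "10/02/2004"],
})

for col in ["Full Date", "Date of heist discovery"]:
    # Replace a leading "-" (unknown day) with "01", then parse as a date
    cleaned = df[col].str.replace(r"^-(?=/)", "01", regex=True)
    df[col] = pd.to_datetime(cleaned, format="%d/%m/%Y")

# Subtracting two datetime columns gives a series of timedelta values
delay = df["Date of heist discovery"] - df["Full Date"]

same_day = (delay == pd.Timedelta(0)).sum()   # heists found the day they happened
later = delay[delay > pd.Timedelta(0)]        # heists found afterwards

print(same_day)
print(later.mean())
print(later.max())
```

On this toy data, one heist is discovered the same day, and the other two take 3 and 7 days, for a mean of 5 days and a maximum of 7.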