BW #53: Airport animals

How many animals entered Heathrow Airport in 2023? What do people bring into the airport? And how has this changed over the years?

I recently read a story in the Economist ("How to transport a rhino," https://www.economist.com/britain/2024/01/25/how-to-transport-a-rhino) about the various animals that are transported into and out of London's Heathrow Airport every year. There is, it turns out, a special place (HARC, the Heathrow Animal Reception Centre), run by the City of London, which employs 55 people and handles the import and transit of a wide variety of animals — from small insects to lions and horses.

The article was amusing, and led me to wonder where they had gotten their data. After all, the article said that more than 30 million butterfly pupae were transported through Heathrow in 2023. That number had to come from somewhere, right?

Friends, I'm delighted to say that I have managed to track down that data! Perhaps it also exists elsewhere, but I found it in a letter submitted by HARC to the Chair of the Environment, Food, and Rural Affairs Committee of the UK's House of Commons. The letter was submitted on November 1st of last year, so it isn't completely up to date. But unless your animal's passport is out of date (and yes, I've learned that there is such a thing as a "pet passport"), the data should still be interesting and fun to review.

Data and seven questions

The data that we'll be examining isn't available in either CSV or Excel format. Rather, it's buried inside of a letter in PDF format. The letter can be downloaded from here:

https://committees.parliament.uk/writtenevidence/126507/default/

And no, there is no filename or extension on that URL. Going to that link should trigger a download of the PDF, at least from a normal browser. Using `wget` doesn't seem to work, however.

I didn’t see a data dictionary for this information, but I think that it’s mostly self-explanatory. I did look up some of the animal-related terms, and will happily bore you with the details, if you like.

This week, I have seven tasks and questions for you to answer based on the data.

The learning goals for this week include working with PDF files, indexes and multi-indexes, and cleaning data.

I’ll be back tomorrow with detailed solutions to all of the questions, along with the Jupyter notebook I created in solving them.

  1. Turn the table on page 3 of the PDF into a data frame. I used `tabula-py` (a wrapper around the `tabula-java` package written in Java), available on PyPI (https://pypi.org/project/tabula-py/). I also used JPype1 (https://pypi.org/project/JPype1/), which improved the Python-to-Java communication.

  2. The final column was mis-parsed, at least on my system, such that it contains information for both consignments and animals from 2023, separated by spaces. Replace this one column with two columns.

  3. Remove the first row. Set the index to be the (first) TAXA column. Drop the (final) "total" row. Drop the columns containing only NaN values (i.e., Unnamed 1, 3, 5, and 7). Turn all values into integers.

  4. Replace the original column names with a two-level multi-index. The outer level will be the years 2019-2023, and the inner level will be "Consignments" and "Animals", repeated for each year, for a total of 10 columns.

  5. What are the five most common types of animals that passed through Heathrow in 2023? Format the numbers with commas every three digits. What are the five most common types for which the total number was less than 1,000?

  6. Which animals had the greatest percentage growth from 2022 to 2023? Which had the greatest percentage drop? What does it mean when we see `inf` or `NaN`? Display the results as percentages, rather than floats.

  7. Produce a line plot showing, over the years, the number of consignments of dogs, cats, fish, and horses that entered Heathrow. The x axis should represent years, and the y axis should show the number of consignments.
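If you haven't done this kind of cleanup before, here is a minimal sketch of the mechanics behind tasks 2 through 4, run on a tiny invented table rather than the real Heathrow data. Every column name and number in it is an assumption made up for illustration, it covers only two years instead of five, and it omits the extra first row that the real table has. Treat it as one possible approach, not necessarily the one in the full solutions.

```python
import pandas as pd

# Toy stand-in for the parsed table. Every name and number here is invented.
raw = pd.DataFrame(
    {
        "TAXA": ["Cats", "Dogs", "Fish", "Total"],
        "2022": ["10", "20", "5", "35"],
        "Unnamed: 1": [float("nan")] * 4,  # an all-NaN filler column
        "2022.1": ["12", "25", "4000", "4037"],
        # Two values fused into one column, separated by a space
        "2023": ["11 15", "18 30", "6 5000", "35 5045"],
    }
)

# Task 2: split the fused column on whitespace, then swap it for two columns.
parts = raw["2023"].str.split(expand=True)
raw = raw.drop(columns="2023")
raw["2023 consignments"] = parts[0]
raw["2023 animals"] = parts[1]

# Task 3: index by taxa, drop the final total row and any all-NaN columns,
# then convert the remaining strings to integers.
df = (
    raw.set_index("TAXA")
    .iloc[:-1]
    .dropna(axis="columns", how="all")
    .astype(int)
)

# Task 4: replace the flat column names with a (year, measure) multi-index.
df.columns = pd.MultiIndex.from_product(
    [[2022, 2023], ["Consignments", "Animals"]]
)

print(df)
```

With the multi-index in place, `df[2023]` returns both 2023 columns, and `df[(2023, "Animals")]` returns just the animal counts for that year.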
