3 min read · Tags: csv pyarrow memory-optimization datetime pivot-table plotting

BW #131: Canadian border crossings

Get better at: CSV files, PyArrow, reducing memory usage, working with dates and times, pivot tables, and plotting.

Administrative note: We'll be holding Pandas office hours on Thursday at 6 p.m. in Israel, which is 11 a.m. Eastern. This session is open to all paid subscribers (thanks for your support!) and also to subscribers to my LernerPython+data program. Sorry for the short notice; I'll send Zoom information a few hours before it's slated to begin. Please do come, and bring your Pandas questions with you!

One of the cornerstones of US President Donald Trump's 2024 campaign was his promise to reduce immigration. He especially promised to deport a large number of people staying in the United States illegally. He seems to be making good on his promise, with large numbers of arrests and deportations taking place on a regular basis.

However, many of the people arrested and deported aren't vicious criminals, or even criminals at all. Many of them, it turns out, are legally staying in the United States, and some are even US citizens. There have been numerous stories of non-Americans legally living in the United States who have been arrested, detained, or even deported.

Needless to say, many foreigners are thinking twice about visiting the US. At PyCon US this May, I heard that more than 300 people canceled their plans to attend because they were nervous about entering the United States.

For years, one group of travelers reliably visited the United States on a regular basis: Canadians. With the world's longest undefended border, a history of friendly relations, and a common language, it seems natural that Canadians would want to visit attractions in the United States.

But I've seen a growing number of articles (including https://www.travelandtourworld.com/news/article/canadian-travelers-are-turning-their-back-on-the-u-s-for-europe-heres-why/) indicating that Canadians have been visiting the US in smaller numbers than before. And so this week, we'll look at data about Canadian border crossings, and how these numbers have changed over time.

Data and five questions

This week's data comes from Statistics Canada (https://www.statcan.gc.ca/en/start), the Canadian government's statistics bureau. The specific data we'll look at is from a page with the oh-so-exciting title, "Leading indicator, International visitors entering or returning to Canada by land, by vehicle type, vehicle licence plate and traveller type," which accurately describes the contents, even if it won't win any awards for exciting text. That page is at https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=2410005701 .

However, that Web page only displays a small part of the data. I clicked on "download options," and then on the second-to-last option, for a CSV file with all of the data. That downloaded a large (881 MB) zipfile to my computer. Unzipping it produced a 16 GB CSV file.

This file can take quite a while to download, so be prepared and patient. (I was neither!)
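Before committing 16 GB of disk space, it can help to peek at the column names and first few rows while the file is still zipped. Here is a minimal sketch of one way to do that, using only Python's standard library. The function name peek_csv_in_zip, the demo file, and its column names are all made up for illustration; they are not the real Statistics Canada names. The "utf-8-sig" encoding is also an assumption, chosen because it tolerates a byte-order mark if one is present.

```python
import csv
import io
import itertools
import zipfile


def peek_csv_in_zip(zip_source, n_rows=5, encoding="utf-8-sig"):
    """Return (header, first n_rows rows) from the first CSV in a zip archive.

    zip_source can be a path or a file-like object. Only the beginning
    of the CSV is decompressed, so this stays fast even for huge files.
    """
    with zipfile.ZipFile(zip_source) as zf:
        # Find the first archive member that looks like a CSV file
        csv_names = [name for name in zf.namelist()
                     if name.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in archive")

        # Stream the member as text instead of extracting it to disk
        with zf.open(csv_names[0]) as raw, \
                io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
            reader = csv.reader(text)
            header = next(reader)
            rows = list(itertools.islice(reader, n_rows))

    return header, rows


# Build a tiny in-memory zipfile with made-up columns, just for the demo
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("demo.csv",
                "date,port,count\n"
                "2025-01-01,Example A,10\n"
                "2025-01-02,Example B,20\n")
demo.seek(0)

header, rows = peek_csv_in_zip(demo, n_rows=1)
print(header)
print(rows)
```

With the real download, you would pass the path of the zipfile instead of the in-memory demo object. Knowing the column names up front makes it easier to decide which columns to load, which matters a lot with a file of this size.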

Learning goals for this week include PyArrow, reducing memory usage, working with dates and times, pivot tables, and plotting.

Paid subscribers, including members of my LernerPython+data membership program, can download a copy of the file from the end of this post.

Here are my five tasks and questions; I'll be back tomorrow with my solutions: