BW #24: Wildfire smoke

Several weeks ago, I started to see a number of odd-looking pictures on Facebook and other social media. The pictures were from New York and other US cities, but they all seemed oddly ... orange. People also took selfies with N95 masks on, making jokes (!) about how they've returned to wearing masks for the first time in a year or more — only this time, they were masking outside, rather than inside. And they weren't worried about viruses, but rather particles of smoke.

The reason, as you might recall, was a set of Canadian wildfires whose smoke ended up covering part of the United States. People were (and please pardon my use of highly technical language) freaking out for a few days, and taking nonstop pictures of what looked like a science fiction story.

(And yes, to my readers on the West Coast of the US, I realize that you've been suffering from such problems for years, and often need masks when wildfires are burning. Pre-covid, the only people I knew with masks on hand were people living in Northern California. I'm certainly not trying to ignore you!)

How bad was the smoke? Was it equally bad in all places, at all times? And can we somehow use Pandas to plot the smoke amounts on a map?

Those are the questions that I want to address this week. Along the way, we'll not only work with some interesting and important data, but we'll also experiment with GeoPandas, an extension to Pandas that (as you might guess) lets us explore and plot geographical data. I've recently started to play with GeoPandas, and my only regret is that I didn't start doing so years ago. I've found it to be both exciting and addictive, and hope that you'll be similarly excited to start working with it.

Data and questions

This week's data comes from the US Environmental Protection Agency. The EPA collects many kinds of data about outdoor air quality, but much of it is historical, and doesn't include recent values. If you want recent data, you'll need to get it from the following page:

https://www.epa.gov/outdoor-air-quality-data/download-daily-data

Unfortunately, this page only lets you download data for one pollutant and one state at a time. Not being an expert in this sort of thing, I decided that we would look at what's known as PM2.5 (particulate matter with a diameter of 2.5 µm or less), and that we would only look at a handful of states.

In order to get this data, you'll need to go to the above page. Then:

Choose PM2.5 for "pollutant."
Choose 2023 for the year.
We'll download data for three states: New York, Pennsylvania, and Ohio. (And yes, that means you'll need to fill out this form three times.)
After choosing a state, you'll be asked to either choose a city (first list) or county (second list). The first option on the second (county) list is "all sites," and it should also be the default.
Click on "get data."
Finally, click on the "Download CSV" link that appears in place of the "submit" button. As they say, the link will only work for 10 minutes.
Rename each of the three files you downloaded, and put them in the same directory.

And now, with the data downloaded into CSV files, we can start to work with it.
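Before loading anything, it can help to confirm that all three files are where you expect them to be. Here's a minimal sketch, assuming you renamed the downloads to something like pm25_new_york.csv, pm25_ohio.csv, and pm25_pennsylvania.csv. Those names (and the find_state_files helper) are my own invention for illustration, so use whatever names you chose. The sketch creates empty placeholder files in a temporary directory so that it runs on its own:

```python
from pathlib import Path
import tempfile


def find_state_files(directory, pattern="pm25_*.csv"):
    """Return the CSV files in `directory` matching `pattern`, sorted by name.

    The "pm25_*.csv" naming scheme is an assumption, not something the EPA
    site produces; adjust the pattern to match your own filenames.
    """
    return sorted(Path(directory).glob(pattern))


# Stand-in for your download directory: empty placeholder files only.
with tempfile.TemporaryDirectory() as tmp:
    for state in ["new_york", "ohio", "pennsylvania"]:
        (Path(tmp) / f"pm25_{state}.csv").touch()

    found = [one_path.name for one_path in find_state_files(tmp)]

print(found)
```

If the list comes back with fewer than three names, one of the downloads probably landed somewhere else, or was renamed differently.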

I have 11 tasks and questions for you this week. The learning goals are working with multiple files, time series, grouping, plotting, creating a GeoDataFrame from an existing data frame, and plotting geographical data against a map.

Create a single Pandas data frame from the three downloaded files. We'll only need a few of the columns: Date, PM2.5 concentration, site name, state, longitude, and latitude. Make the index a combination of the date and state name. Rename the columns to be all lowercase and shorter, to make them easier to work with.
What were the minimum, median, and maximum PM2.5 concentrations measured in these three states?
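To give a sense of the shape of these first two tasks, here is one possible sketch, and certainly not the only way to do it. The original column names in COLUMNS are my assumption about the header row of the EPA files, so check them against your own downloads. The short names, the load_pm25 helper, and the tiny in-memory "files" are all made up for illustration; the numbers in them are not real measurements.

```python
import io

import pandas as pd

# Assumed original column names (verify against your files), mapped to
# shorter, lowercase names of my own choosing.
COLUMNS = {
    "Date": "date",
    "Daily Mean PM2.5 Concentration": "pm25",
    "Site Name": "site",
    "STATE": "state",
    "SITE_LONGITUDE": "longitude",
    "SITE_LATITUDE": "latitude",
}


def load_pm25(sources):
    """Read each CSV source, keep only the columns we need, and combine them
    into one data frame indexed by (date, state)."""
    frames = [
        pd.read_csv(one_source, usecols=list(COLUMNS), parse_dates=["Date"])
        for one_source in sources
    ]
    return (
        pd.concat(frames)
        .rename(columns=COLUMNS)
        .set_index(["date", "state"])
        .sort_index()
    )


# Made-up stand-ins for the downloaded files; pass real filenames instead.
HEADER = ",".join(COLUMNS) + "\n"
fake_ny = io.StringIO(
    HEADER
    + "06/06/2023,50.0,Site A,New York,-73.9,40.7\n"
    + "06/07/2023,150.0,Site A,New York,-73.9,40.7\n"
)
fake_oh = io.StringIO(HEADER + "06/06/2023,10.0,Site B,Ohio,-83.0,40.0\n")

df = load_pm25([fake_ny, fake_oh])

# Minimum, median, and maximum across every site and day in the data.
summary = df["pm25"].agg(["min", "median", "max"])
print(summary)
```

With real data, you would pass the three filenames to load_pm25 rather than the StringIO objects. Passing a list of method names to agg returns all three statistics in a single series.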
