BW #128: Extreme heat

Get better at: Using PyArrow, pivot tables, plotting, optimizing query speed, datetime, multi-indexes, and using xarray

It's summer in the northern hemisphere, and everyone is complaining about the heat. I just got back from Prague, where it was a surprisingly warm 30 degrees Celsius over the weekend. Europe has generally had scorching weather over the last few weeks. Much of North America is about to have extreme heat.

The New York Times wrote about this last week in their "Climate Forward" newsletter (https://www.nytimes.com/2025/07/17/climate/climate-forward-extreme-heat.html?unlocked_article_code=1.Yk8.s6B8.zsHwJuy6CYBf&smid=url-share), where they wrote about this summer's high temperatures. In a separate article, they wrote that traditional vacation locations are changing in the wake of high summer temperatures (https://www.nytimes.com/2025/07/14/world/europe/spain-italy-greece-heat.html?unlocked_article_code=1.Yk8.ODAn.Zk7NaeV2a2OO&smid=url-share).

This week, we'll thus look at global temperatures, exploring their rise over the past several years. That said, the climate questions take a back seat in many ways to a host of other topics this week, from xarray to PyArrow.

Data and six questions

This week's data comes from the Copernicus Climate Change Service (C3S, https://climate.copernicus.eu), which offers scientific data about weather and climate. Our data comes from their "climate data store," for which you need to register (for free) to download data.

The data comes in GRIB format, which we'll read with the xarray library; I had never used either of them before. I hope that you'll enjoy the challenge of learning about them as much as I did.

This week's learning goals include working with xarray, optimizing memory, multi-indexes, PyArrow, datetime data, pivot tables, and plotting.

Paid subscribers, including members of my LernerPython+data platform, can download the data file from a link below.

I'll be back tomorrow with my full solutions and explanations.

Meanwhile, here are my six tasks and questions. To start, request data from the climate data store with the following parameters:

    • product type: reanalysis data
    • variable: 2m temperature and 2m dewpoint temperature
    • year: 2020 through 2025
    • month: all
    • day: 1, 15, 28
    • time: 00:00, 08:00, 16:00
    • geography: whole available region
    • data format: GRIB
    • format: unarchived
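If you go the API route, the parameters above map onto a dictionary that you pass to the `cdsapi` client. Here's a minimal sketch; the dataset name (`reanalysis-era5-single-levels`) and the exact key names (`data_format` vs. the older `format`) are my assumptions about the current CDS API, so compare against the request code that the site generates for you:

```python
# Sketch of a CDS API request matching the parameters above.
# Actually downloading requires the "cdsapi" package plus a free
# account and API key configured in ~/.cdsapirc.

request = {
    "product_type": ["reanalysis"],
    "variable": ["2m_temperature", "2m_dewpoint_temperature"],
    "year": [str(y) for y in range(2020, 2026)],   # 2020 through 2025
    "month": [f"{m:02d}" for m in range(1, 13)],   # all months
    "day": ["01", "15", "28"],
    "time": ["00:00", "08:00", "16:00"],
    "data_format": "grib",           # key name assumed; older clients used "format"
    "download_format": "unarchived",
}

# To run the download (uncomment once cdsapi is installed and configured):
#   import cdsapi
#   cdsapi.Client().retrieve("reanalysis-era5-single-levels",
#                            request, "temperatures.grib")
```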

Download this data using either their "download" button or their API, via the Python code that the site provides. (To use the API, you'll need to get a free key from their site and configure your computer to use it.) Load the downloaded data into an xarray data set, and then turn the "an" part of that data set into a Pandas data frame. How big is this data set? How much memory does it use?
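As a sketch of that last step: reading GRIB with xarray typically needs the `cfgrib` engine, as in `xr.open_dataset("temperatures.grib", engine="cfgrib")`, and `to_dataframe` then flattens the coordinates into a multi-index. Since I can't bundle the real download here, this shows the conversion on a tiny synthetic data set with the same structure; the variable names `t2m` and `d2m` are what cfgrib usually assigns to these ERA5 fields, but treat that as an assumption:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic stand-in for the real GRIB download: two temperature
# variables (in kelvin) over a (time, latitude, longitude) grid.
ds = xr.Dataset(
    {
        "t2m": (("time", "latitude", "longitude"),
                280 + np.random.rand(3, 2, 2)),
        "d2m": (("time", "latitude", "longitude"),
                275 + np.random.rand(3, 2, 2)),
    },
    coords={
        "time": pd.date_range("2020-01-01", periods=3, freq="D"),
        "latitude": [10.0, 20.0],
        "longitude": [30.0, 40.0],
    },
)

# to_dataframe() turns the three coordinates into a three-level
# multi-index, with one column per data variable.
df = ds.to_dataframe()
print(df.shape)                          # 3 times x 2 lats x 2 lons = 12 rows
print(df.memory_usage(deep=True).sum())  # bytes used, including the index
```

With the real file, the same two lines (`to_dataframe` and `memory_usage(deep=True)`) answer the size and memory questions; expect the row count to be the product of the time, latitude, and longitude dimensions.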