BW #128: Extreme heat

Get better at: Using PyArrow, pivot tables, plotting, optimizing query speed, datetime, multi-indexes, and using xarray

It's summer in the northern hemisphere, and everyone is complaining about the heat. I just got back from Prague, where it was a surprisingly warm 30 degrees Celsius over the weekend. Europe has generally had scorching weather over the last few weeks. Much of North America is about to have extreme heat.

The New York Times wrote about this last week in their "Climate Forward" newsletter (https://www.nytimes.com/2025/07/17/climate/climate-forward-extreme-heat.html?unlocked_article_code=1.Yk8.s6B8.zsHwJuy6CYBf&smid=url-share), where they wrote about this summer's high temperatures. In a separate article, they wrote that traditional vacation locations are changing in the wake of high summer temperatures (https://www.nytimes.com/2025/07/14/world/europe/spain-italy-greece-heat.html?unlocked_article_code=1.Yk8.ODAn.Zk7NaeV2a2OO&smid=url-share).

This week, we'll thus look at global temperatures, exploring their rise over the past several years. That said, the climate questions take a back seat in many ways to a host of other topics this week, from xarray to PyArrow.

Data and six questions

This week's data comes from the Copernicus Climate Change Service (C3S, https://climate.copernicus.eu), which offers scientific data about weather and climate. Our data comes from their "climate data store," for which you need to register (for free) to download data.

The data comes in GRIB format, which we'll read with the xarray library; I had never used either of them before. I hope that you'll enjoy the challenge of learning about them as much as I did.

This week's learning goals include working with xarray, optimizing memory, multi-indexes, PyArrow, datetime data, pivot tables, and plotting.

Paid subscribers, including members of my LernerPython+data platform, can download the data file from a link below.

I'll be back tomorrow with my full solutions and explanations.

Meanwhile, here are my six tasks and questions. To start, request data from the climate data store with the following parameters:

    • product type: reanalysis data
    • variable: 2m temperature and 2m dewpoint temperature
    • year: 2020 through 2025
    • month: all
    • day: 1, 15, 28
    • time: 00:00, 08:00, 16:00
    • geography: whole available region
    • data format: GRIB
    • format: unarchived
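If you go the API route, the parameters above map onto a dictionary that you pass to the `cdsapi` client. Here's a minimal sketch; the dataset name (`reanalysis-era5-single-levels`) and the exact key names (`data_format` vs. the older `format`) are my assumptions about the current CDS API, so compare against the request code that the site generates for you:

```python
# Sketch of a CDS API request matching the parameters above.
# Actually downloading requires the "cdsapi" package plus a free
# account and API key configured in ~/.cdsapirc.

request = {
    "product_type": ["reanalysis"],
    "variable": ["2m_temperature", "2m_dewpoint_temperature"],
    "year": [str(y) for y in range(2020, 2026)],   # 2020 through 2025
    "month": [f"{m:02d}" for m in range(1, 13)],   # all months
    "day": ["01", "15", "28"],
    "time": ["00:00", "08:00", "16:00"],
    "data_format": "grib",           # key name assumed; older clients used "format"
    "download_format": "unarchived",
}

# To run the download (uncomment once cdsapi is installed and configured):
#   import cdsapi
#   cdsapi.Client().retrieve("reanalysis-era5-single-levels",
#                            request, "temperatures.grib")
```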

Download this data using either their "download" button or their API, via the Python code that the site provides. (To use the API, you'll need to get a free key from their site and configure your computer to use it.) Load the downloaded data into an xarray data set, and then turn the "an" part of that data set into a Pandas data frame. How big is this data set? How much memory does it use?
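As a sketch of that last step: reading GRIB with xarray typically needs the `cfgrib` engine, as in `xr.open_dataset("temperatures.grib", engine="cfgrib")`, and `to_dataframe` then flattens the coordinates into a multi-index. Since I can't bundle the real download here, this shows the conversion on a tiny synthetic data set with the same structure; the variable names `t2m` and `d2m` are what cfgrib usually assigns to these ERA5 fields, but treat that as an assumption:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic stand-in for the real GRIB download: two temperature
# variables (in kelvin) over a (time, latitude, longitude) grid.
ds = xr.Dataset(
    {
        "t2m": (("time", "latitude", "longitude"),
                280 + np.random.rand(3, 2, 2)),
        "d2m": (("time", "latitude", "longitude"),
                275 + np.random.rand(3, 2, 2)),
    },
    coords={
        "time": pd.date_range("2020-01-01", periods=3, freq="D"),
        "latitude": [10.0, 20.0],
        "longitude": [30.0, 40.0],
    },
)

# to_dataframe() turns the three coordinates into a three-level
# multi-index, with one column per data variable.
df = ds.to_dataframe()
print(df.shape)                          # 3 times x 2 lats x 2 lons = 12 rows
print(df.memory_usage(deep=True).sum())  # bytes used, including the index
```

With the real file, the same two lines (`to_dataframe` and `memory_usage(deep=True)`) answer the size and memory questions; expect the row count to be the product of the time, latitude, and longitude dimensions.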