Bamboo Weekly #163: Daylight saving time (solution)

Get better at: Scraping, regular expressions, grouping, plotting with Plotly, and handling dates and times.

Spring forward! This weekend, many Europeans will be changing their clocks, joining "summer time," known as "daylight saving time" in North America, where the clocks changed a few weeks ago. Changing the clocks, which happens twice each year in many countries, is highly controversial, and many argue that we should stop doing it. Indeed, a number of countries that used to change their clocks no longer do so.

This week, we looked at daylight saving time from a variety of perspectives, including what countries do it, how long it lasts, and how many countries have observed it in each year.

Paid subscribers, both to Bamboo Weekly and to my LernerPython+data membership program (https://LernerPython.com), get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.

Learning goals for this week include scraping Web sites, working with dates and times, grouping, and plotting with Plotly.

Data and five questions

This week's data comes from the "Time and date" web site, which has a dedicated overview page about daylight saving time at https://www.timeanddate.com/time/dst/statistics.html and detailed, per-country information for 2026 at https://www.timeanddate.com/time/dst/2026.html. We'll use both of these.

Here are my five questions for this week, along with my solutions and explanations:

From the 2026 DST info page, create a data frame with one row per country, in which the start and end columns are datetime values indicating when DST starts and ends. If a country doesn't observe DST, then you can use NaT ("not a time").

As usual, I started by loading both Pandas and Plotly:

import pandas as pd
from plotly import express as px

I then defined a variable with the URL of the 2026 DST page. But how can we retrieve information from a Web page into a data frame? Pandas offers an elegant solution with read_html, a function that returns a list of data frames, one for each HTML table on the page. A bit of experimentation showed me that the table in question, the one showing when each country starts and ends DST, is the first on the page (i.e., index 0), which gave me the following code:

dst_2026_url = 'https://www.timeanddate.com/time/dst/2026.html'

dst_2026_df = (pd
               .read_html(dst_2026_url)[0]
              )

But that's not nearly enough. For starters, because of the way the HTML table was designed, we ended up with a multi-index on the columns, and with weird names. I used set_axis to set the column names to something more easily understood:

dst_2026_url = 'https://www.timeanddate.com/time/dst/2026.html'

dst_2026_df = (pd
               .read_html(dst_2026_url)[0]
               .set_axis(['country', 'divisions', 'start', 'end'], axis='columns')
              )
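To see what set_axis does here without touching the network, here's a minimal sketch using a hand-built frame. The two-level column labels are hypothetical stand-ins for whatever read_html actually produced from the page's header rows; set_axis simply replaces them, positionally, with flat names.

```python
import pandas as pd

# Hypothetical two-level column labels, standing in for the MultiIndex
# that read_html built from the table's header rows
columns = pd.MultiIndex.from_tuples([
    ('Country', 'Country'),
    ('Divisions', 'Divisions'),
    ('DST 2026', 'Start'),
    ('DST 2026', 'End'),
])

raw_df = pd.DataFrame([['Exampleland', 'All', 'Sunday, March 29', 'Sunday, October 25']],
                      columns=columns)

# set_axis replaces the labels positionally, so the list must have
# exactly one name per column (four here); the result is a flat Index
clean_df = raw_df.set_axis(['country', 'divisions', 'start', 'end'], axis='columns')
print(clean_df.columns.tolist())
```

Because the replacement is positional, it's worth checking the column order on the real page before relying on this.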

I wanted the start and end columns to contain datetime values. To do that, I needed to run pd.to_datetime on them, something I've done many times before. But in this case, the text we got wasn't in a standard date format. That's fine; we can always pass the format keyword argument to pd.to_datetime to specify the format. True, but because the text doesn't include a year, doing so gave every date the default year of 1900 (the same default that Python's strptime uses), rather than 2026.
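Here's a small sketch of that pitfall, using two made-up strings in the same "weekday, month day" shape as the table's cells (the exact text on the page may differ). With no year in either the text or the format, the parser falls back on its default year; appending the year to the text, and %Y to the format, pins it down:

```python
import pandas as pd

# Made-up strings shaped like the table's start/end cells
dates = pd.Series(['Sunday, March 29', 'Sunday, October 25'])

# No year anywhere, so the parser fills in its default year
no_year = pd.to_datetime(dates, format='%A, %B %d')
print(no_year.dt.year.tolist())

# Appending the year to the text (and %Y to the format) fixes it
with_year = pd.to_datetime(dates + ', 2026', format='%A, %B %d, %Y')
print(with_year.dt.year.tolist())
```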

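I decided to fix this in a somewhat odd way: appending ', 2026' to each string before parsing, and adding %Y to the format. I also passed errors='coerce', so that any value that can't be parsed (as for a country that doesn't observe DST) becomes NaT: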

dst_2026_df = (pd
               .read_html(dst_2026_url)[0]
               .set_axis(['country', 'divisions', 'start', 'end'], axis='columns')
               .assign(start = lambda df_: pd.to_datetime(df_['start'] + ', 2026', 
                                                          format='%A, %B %d, %Y', 
                                                          errors='coerce'),
                       end = lambda df_: pd.to_datetime(df_['end'] + ', 2026', 
                                                        format='%A, %B %d, %Y', 
                                                        errors='coerce'))                       
              )
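Why errors='coerce'? Countries without DST don't have a parseable date in those cells. Here's a minimal sketch on a tiny hand-made frame; the 'Not observed' text is a hypothetical placeholder for whatever the page actually shows, and the parse_2026 helper is my own illustrative name, not part of the original code (it just avoids repeating the format string):

```python
import pandas as pd

# Hand-made stand-in for the scraped table; 'Not observed' is a
# hypothetical placeholder for countries without DST
df = pd.DataFrame({'country': ['A', 'B'],
                   'start': ['Sunday, March 29', 'Not observed'],
                   'end': ['Sunday, October 25', 'Not observed']})

def parse_2026(s):
    # Hypothetical helper: append the missing year, then parse;
    # anything that doesn't match the format becomes NaT
    return pd.to_datetime(s + ', 2026', format='%A, %B %d, %Y', errors='coerce')

parsed = df.assign(start=lambda df_: parse_2026(df_['start']),
                   end=lambda df_: parse_2026(df_['end']))
print(parsed)
```

Without errors='coerce', the 'Not observed' row would raise an exception instead of quietly becoming NaT.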

The result? A data frame with 263 rows and 4 columns. We won't really use the divisions column, but will use the others.

How many countries won't change their clocks at all? Of those that do, what is the most common date for starting? As of today, how many have started it already?

Countries that will change their clocks have useful datetime values in the start and end columns. Those that don't have NaT, the datetime equivalent of NaN.

We can thus use [] to retrieve just the start column, as a series. We can then invoke isna on it, getting a boolean (True or False) series back:

(
    dst_2026_df['start']
    .isna()
)
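As a quick check that NaT really does count as missing, here's a toy datetime series (the dates are illustrative, not taken from the scraped data):

```python
import pandas as pd

# None becomes NaT when the list is converted to datetimes
s = pd.Series(pd.to_datetime(['2026-03-29', None, '2026-03-08']))

# NaT is treated as missing, just like NaN in a numeric series
print(s.isna().tolist())   # [False, True, False]
```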

Then we can invoke value_counts on the resulting series, passing normalize=True to get proportions of the total rather than absolute counts:

(
    dst_2026_df['start']
    .isna()
    .value_counts(normalize=True)
)
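With normalize=True, value_counts divides each count by the total, so the values are fractions that sum to 1. A toy comparison on four made-up booleans:

```python
import pandas as pd

missing = pd.Series([True, True, True, False])

counts = missing.value_counts()                 # absolute counts
shares = missing.value_counts(normalize=True)   # fractions of the total

print(counts.to_dict())   # {True: 3, False: 1}
print(shares.to_dict())   # {True: 0.75, False: 0.25}
```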

I then went one step further, using apply and str.format to display percentages with two digits after the decimal point:


(
    dst_2026_df['start']
    .isna()
    .value_counts(normalize=True)
    .apply('{:.02%}'.format)
)
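The '{:.02%}'.format trick works because the % presentation type in Python's format mini-language multiplies by 100 and appends a percent sign; apply then calls that bound method on each value. A sketch on two made-up proportions:

```python
import pandas as pd

shares = pd.Series([0.75, 0.25], index=[True, False])

# % presentation type: multiply by 100, show two decimals, add '%'
formatted = shares.apply('{:.02%}'.format)
print(formatted.tolist())   # ['75.00%', '25.00%']
```

Keep in mind that the result is a series of strings, so any arithmetic should happen before this step.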

The result:

start	proportion
True	71.86%
False	28.14%

Since True here means that start is NaT, more than 70 percent of the countries in the table do not change their clocks for DST at any point during the year.

Among those that do change their clocks, what is the most common start date? For that, I used set_index to make the country column the data frame's index, and then grabbed only the start column (as a series).

I removed the NaT values with dropna, then used .dt.date to retrieve just the date value from each of the start dates. I then counted how often each date appeared with value_counts, which automatically sorts from the most to least common:

(
    dst_2026_df
    .set_index('country')
    ['start']
    .dropna()
    .dt.date
    .value_counts()
)
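Here's the same chain on a tiny made-up series (the index labels and times are invented). In the real data every parsed start is at midnight, so .dt.date mainly makes the output tidier; if the values did carry times of day, it would also ensure that two timestamps on the same day are counted as one date:

```python
import pandas as pd

start = pd.Series(pd.to_datetime(['2026-03-29 01:00', '2026-03-29 02:00',
                                  None, '2026-03-08 02:00']),
                  index=['A', 'B', 'C', 'D'], name='start')

by_date = (start
           .dropna()          # remove NaT (no DST)
           .dt.date           # keep only the calendar date
           .value_counts()    # sorted from most to least common
          )
print(by_date)
```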

The result:

start	count
2026-03-29	50
2026-03-08	10
2026-10-04	3
2026-09-27	2
2026-03-28	2
2026-03-22	2
2026-03-15	1
2026-09-06	1
2026-09-05	1
2026-04-24	1
2026-03-27	1

By far the most common start date this year is March 29th, the Sunday of this coming weekend, with 50 countries. Another 10 countries started on March 8th, when the United States and Canada did so. A handful of others start on other days in March, one starts in late April, and a few stragglers don't start until September or October. Those starting in September and October are in the southern hemisphere. (We'll get back to that in a moment.) Israel is starting tomorrow, March 27th, since our weekend is Friday-Saturday, rather than Saturday-Sunday.
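One way to check the weekday pattern is .dt.day_name(), which returns the name of each timestamp's day of the week. A sketch on three of the start dates from the table above:

```python
import pandas as pd

starts = pd.Series(pd.to_datetime(['2026-03-29', '2026-03-08', '2026-03-27']))

# Most countries switch on a Sunday; March 27th falls on a Friday
print(starts.dt.day_name().tolist())   # ['Sunday', 'Sunday', 'Friday']
```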

How many countries have already started DST? We can once again set the index to be country and grab start, then use dropna, and then use loc to retrieve only those values that come before now, using pd.Timestamp.now, a very convenient Pandas method:

(
    dst_2026_df
    .set_index('country')
    ['start']
    .dropna()
    .loc[lambda s_: s_ < pd.Timestamp.now()]
)
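Because pd.Timestamp.now() changes every time the code runs, the result depends on when you run it. For a reproducible check, you can compare against a fixed cutoff instead. In this sketch, the cutoff (a hypothetical "today") and the small series with invented index labels are for illustration only:

```python
import pandas as pd

start = pd.Series(pd.to_datetime(['2026-03-08', '2026-03-29', None, '2026-03-22']),
                  index=['A', 'B', 'C', 'D'], name='start')

cutoff = pd.Timestamp('2026-03-26')   # hypothetical fixed "today"

already_started = (start
                   .dropna()                        # drop countries without DST
                   .loc[lambda s_: s_ < cutoff]     # the callable receives the series
                  )
print(already_started.index.tolist())   # ['A', 'D']
```

Both the cutoff and the parsed dates are timezone-naive, so the comparison works directly; the same is true of pd.Timestamp.now() with no tz argument.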

The results?

country	start
Antarctica	2026-03-15
Bermuda	2026-03-08
Canada	2026-03-08
Cuba	2026-03-08
Greenland	2026-03-08
Haiti	2026-03-08
Mexico	2026-03-08
Morocco	2026-03-22
Saint Pierre and Miquelon	2026-03-08
The Bahamas	2026-03-08
Turks and Caicos Islands	2026-03-08
United States	2026-03-08
Western Sahara	2026-03-22

I must admit that until now, I didn't realize that Antarctica observed DST. I'll have to remember that the next time I schedule a meeting with a penguin.

By the way, the story of when we change the clocks in Israel was a major political saga for years. If you're interested in calendars, clocks, religion, and coalition politics, you might find this interesting: https://en.wikipedia.org/wiki/Israel_Summer_Time

And here's something I didn't know before: Morocco observes daylight saving time only during the month of Ramadan, when observant Muslims fast while the sun is up: https://en.wikipedia.org/wiki/Daylight_saving_time_in_Morocco