Spring forward! This weekend, many Europeans will be changing their clocks, joining "summer time," known as "daylight saving time" in North America — which switched a few weeks ago. Changing the clocks, which happens twice each year in many countries, is highly controversial. Many argue that we should stop it. And indeed, a number of countries that used to change their clocks no longer do so.
This week, we looked at daylight saving time from a variety of perspectives, including which countries observe it, how long it lasts, and how many countries have observed it in each year.
Paid subscribers, both to Bamboo Weekly and to my LernerPython+data membership program (https://LernerPython.com), get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.
Learning goals for this week include scraping Web sites, working with dates and times, grouping, and plotting with Plotly.
Data and five questions
This week's data comes from the "Time and date" web site, which has a dedicated overview about daylight saving time at https://www.timeanddate.com/time/dst/statistics.html and detailed, per-country info for 2026 at https://www.timeanddate.com/time/dst/2026.html . We'll use both of these.
Here are my five questions for this week, along with my solutions and explanations:
From the 2026 DST info page, create a data frame for each country, in which the start and end columns are datetime values indicating when DST starts and ends. If the country doesn't observe DST, then you can use NaT ("not a time").
As usual, I started by loading both Pandas and Plotly:
import pandas as pd
from plotly import express as px
I then defined a variable with the URL of the 2026 DST information. But how can we retrieve information from a Web page into a data frame? Pandas offers us an elegant solution with read_html, a function that returns a list of data frames, one for each HTML table on a page. A bit of experimentation showed me that the table in question, the one showing when each country starts and ends DST, is the first on the page (i.e., index 0), which gave me the following code:
dst_2026_url = 'https://www.timeanddate.com/time/dst/2026.html'
dst_2026_df = (pd
.read_html(dst_2026_url)[0]
)
But that's not nearly enough. For starters, because of the way the HTML table was designed, we ended up with a multi-index on the columns, and with weird names. I used set_axis to set the column names to something more easily understood:
dst_2026_url = 'https://www.timeanddate.com/time/dst/2026.html'
dst_2026_df = (pd
.read_html(dst_2026_url)[0]
.set_axis(['country', 'divisions', 'start', 'end'], axis='columns')
)
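To see what set_axis is doing here without hitting the network, here is a minimal sketch. The two-level header text and the country names are invented for illustration; they are not copied from the timeanddate.com page.

```python
import pandas as pd

# Invented two-level header, similar in spirit to what read_html can
# produce when an HTML table has stacked header rows.
columns = pd.MultiIndex.from_tuples([
    ('Country', 'Country'),
    ('Country', 'Divisions'),
    ('DST 2026', 'Start'),
    ('DST 2026', 'End'),
])

# Made-up rows, not real data.
demo_df = pd.DataFrame(
    [['Atlantis', 'All locations', 'Sunday, March 29', 'Sunday, October 25'],
     ['Erewhon', 'All locations', 'No DST in 2026', 'No DST in 2026']],
    columns=columns,
)

# set_axis replaces the whole column index with a flat list of names.
flat_df = demo_df.set_axis(['country', 'divisions', 'start', 'end'],
                           axis='columns')

print(demo_df.columns.nlevels)
print(list(flat_df.columns))
```

Because set_axis returns a new data frame, it slots into a method chain right after read_html.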
I wanted the start and end columns to contain datetime values. In order to do that, I needed to run pd.to_datetime on them, something I've done many times before. But in this case, the text we got wasn't in a standard date format. That's fine; we can always pass the format keyword argument to pd.to_datetime, and force a format. The catch is that the text doesn't include a year, so parsing it this way fills in a default year (1900) rather than 2026.
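The year problem is easy to reproduce on a couple of strings. This is a small sketch; the two sample strings are written by hand in the same style as the page's dates, not scraped from it.

```python
import pandas as pd

# Hand-written sample strings: weekday, month name, day, no year.
raw = pd.Series(['Sunday, March 29', 'Sunday, October 25'])

# Parsing without a year: the format matches, but the year is a default.
no_year = pd.to_datetime(raw, format='%A, %B %d')

# Appending the year to the text, and to the format, removes the ambiguity.
with_year = pd.to_datetime(raw + ', 2026', format='%A, %B %d, %Y')

print(no_year.dt.year.tolist())
print(with_year.dt.year.tolist())
```

The first list shows whatever default year the parser filled in; the second contains 2026 for both values.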
I decided to fix this in a somewhat odd way:
- I used assign to assign a new value to a column name. In this case, I did it for two columns, start and end.
- When you use assign to set a column that already exists, you replace the existing one. So this was a way to replace the current value in start and end with a new value.
- The new value I wanted needed to be calculated. The value passed to assign in each case was thus a lambda, an anonymous function that took the data frame as an argument.
- I first used + to add ', 2026' to each of the values in start and end, concatenating that string to whatever was already there. In the case of an actual date, this added the year, removing the ambiguity over years that we had before.
- In the case of a non-date, the string was 'No DST in 2026'. So adding ', 2026' to that string didn't change the fact that it wasn't a date.
- I then called pd.to_datetime on the combination of the original value and our addition of a comma and the year.
- I specified the format as '%A, %B %d, %Y', which means:
  - Day of week + a comma
  - Month name
  - Day of month + a comma
  - Year
- Finally, I passed errors='coerce', which tells pd.to_datetime that if the string can be translated with the provided format, then great; if not, then we should use the special NaT value, which is short for "not a time." It's the datetime equivalent of NaN. And indeed, you can remove NaT values with dropna:
dst_2026_df = (pd
.read_html(dst_2026_url)[0]
.set_axis(['country', 'divisions', 'start', 'end'], axis='columns')
.assign(start = lambda df_: pd.to_datetime(df_['start'] + ', 2026',
format='%A, %B %d, %Y',
errors='coerce'),
end = lambda df_: pd.to_datetime(df_['end'] + ', 2026',
format='%A, %B %d, %Y',
errors='coerce'))
)
The result? A data frame with 263 rows and 4 columns. We won't really use the divisions column, but will use the others.
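The same assign-plus-to_datetime pattern can be tried offline on a tiny hand-made frame. In this sketch the rows are invented, and parse_2026 is a hypothetical helper I introduce only to avoid repeating the arguments; it is not part of the original solution.

```python
import pandas as pd

# Invented rows in the same shape as the scraped table.
demo_df = pd.DataFrame({
    'country': ['Atlantis', 'Erewhon'],
    'start': ['Sunday, March 29', 'No DST in 2026'],
    'end': ['Sunday, October 25', 'No DST in 2026'],
})

def parse_2026(s):
    """Hypothetical helper: append the year, then parse; non-dates become NaT."""
    return pd.to_datetime(s + ', 2026',
                          format='%A, %B %d, %Y',
                          errors='coerce')

parsed_df = demo_df.assign(
    start=lambda df_: parse_2026(df_['start']),
    end=lambda df_: parse_2026(df_['end']),
)

print(parsed_df.dtypes)
print(parsed_df)
```

The row whose text is 'No DST in 2026' still fails to match the format after the year is appended, so errors='coerce' turns it into NaT instead of raising an exception.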
How many countries won't change their clocks at all? Of those that do, what is the most common date for starting? As of today, how many have started it already?
Countries that will change their clocks have useful datetime values in the start and end columns. Those that don't have NaT, the datetime equivalent of NaN.
We can thus use [] to retrieve just the start column as a series. We can then invoke isna on it, getting a boolean (True or False) series back:
(
dst_2026_df['start']
.isna()
)
Then we can invoke value_counts on the resulting series, passing normalize=True to get proportions rather than absolute numbers:
(
dst_2026_df['start']
.isna()
.value_counts(normalize=True)
)
I then went one step further, using apply and str.format to display the proportions as percentages with two digits after the decimal point:
(
dst_2026_df['start']
.isna()
.value_counts(normalize=True)
.apply('{:.02%}'.format)
)
The result:
start proportion
True 71.86%
False 28.14%
In other words, more than 70 percent of countries do not change their clocks for DST at any point during the year.
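Here is the same chain on a four-element series, as a rough sketch. The values are made up (one real date, three NaT), so the percentages are easy to check by hand.

```python
import pandas as pd

# Made-up data: one start date and three missing values.
starts = pd.Series(pd.to_datetime(['2026-03-29', None, None, None]),
                   name='start')

shares = (
    starts
    .isna()                          # True where the value is NaT
    .value_counts(normalize=True)    # fractions that sum to 1
    .apply('{:.02%}'.format)         # fractions -> percentage strings
)

print(shares)
```

Note that after apply, the values are strings, which is fine for display but not for further arithmetic.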
Among those that do change their clocks, what is the most common date? For that, I used set_index to set the data frame's index to the country column, and then grabbed only the start column (as a series).
I removed the NaT values with dropna, then used .dt.date to retrieve just the date value from each of the start dates. I then counted how often each date appeared with value_counts, which automatically sorts from the most to least common:
(
dst_2026_df
.set_index('country')
['start']
.dropna()
.dt.date
.value_counts()
)
The result:
start count
2026-03-29 50
2026-03-08 10
2026-10-04 3
2026-09-27 2
2026-03-28 2
2026-03-22 2
2026-03-15 1
2026-09-06 1
2026-09-05 1
2026-04-24 1
2026-03-27 1
By far, most countries are starting DST this year on March 29th – Sunday of this coming weekend. Another 10 countries started on March 8th, when the United States and Canada did so. A few will do it a bit later, but then you have some stragglers, as well. Those starting in September and October are in the southern hemisphere. (We'll get back to that in a moment.) Israel is starting tomorrow, March 27th, since our weekend is Friday-Saturday, rather than Saturday-Sunday.
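The counting step can be checked on a small example. This is only a sketch with invented countries and dates, chosen so that one date clearly wins.

```python
import pandas as pd

# Invented rows: two countries share a start date, one has no DST.
demo_df = pd.DataFrame({
    'country': ['Atlantis', 'Erewhon', 'Ruritania', 'Narnia'],
    'start': pd.to_datetime(['2026-03-29', '2026-03-29', '2026-03-08', None]),
})

counts = (
    demo_df
    .set_index('country')
    ['start']
    .dropna()          # drop the NaT row
    .dt.date           # keep the date, drop the time of day
    .value_counts()    # sorted from most to least common
)

print(counts)
```

Because value_counts sorts in descending order, the first index value is the most common start date.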
How many started DST already? We can once again set the index to be country and grab start, then dropna, and then use loc to retrieve only those values that come before now – using pd.Timestamp.now, a very convenient Pandas method:
(
dst_2026_df
.set_index('country')
['start']
.dropna()
.loc[lambda s_: s_ < pd.Timestamp.now() ]
)
The results?
country start
Antarctica 2026-03-15
Bermuda 2026-03-08
Canada 2026-03-08
Cuba 2026-03-08
Greenland 2026-03-08
Haiti 2026-03-08
Mexico 2026-03-08
Morocco 2026-03-22
Saint Pierre and Miquelon 2026-03-08
The Bahamas 2026-03-08
Turks and Caicos Islands 2026-03-08
United States 2026-03-08
Western Sahara 2026-03-22
I must admit that until now, I didn't realize that Antarctica observed DST. I'll have to remember that the next time I schedule a meeting with a penguin.
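The "already started" filter can be sketched on a small series. The country names and dates below are invented, and as_of is an assumed stand-in for pd.Timestamp.now(), pinned to a fixed date so the output doesn't change from day to day.

```python
import pandas as pd

# Invented start dates, indexed by country; one country has no DST.
starts = pd.Series(
    pd.to_datetime(['2026-03-08', '2026-03-29', '2026-10-04', None]),
    index=pd.Index(['Atlantis', 'Erewhon', 'Ruritania', 'Narnia'],
                   name='country'),
    name='start',
)

# Assumption: a fixed "today" instead of pd.Timestamp.now().
as_of = pd.Timestamp('2026-03-26')

already_started = (
    starts
    .dropna()
    .loc[lambda s_: s_ < as_of]   # keep only dates before as_of
)

print(already_started)
```

Comparisons against NaT evaluate to False, so the dropna step mainly documents intent here; the filter would exclude the missing values either way.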
By the way, the story of when we change the clocks in Israel was a major political saga for years. If you're interested in calendars, clocks, religion, and coalition politics, you might find this interesting: https://en.wikipedia.org/wiki/Israel_Summer_Time
And here's something I didn't know before: Morocco keeps its clocks an hour ahead for most of the year, and moves them back only during the month of Ramadan, when observant Muslims fast while the sun is up. That's why its "start" date falls right after Ramadan ends: https://en.wikipedia.org/wiki/Daylight_saving_time_in_Morocco