As of this writing, the standoff between Iran and the United States in the Strait of Hormuz continues. Energy prices continue to rise (https://www.nytimes.com/2026/04/22/business/iran-war-oil-hoarding.html?unlocked_article_code=1.c1A.3QDh.fYueF0YAuuyE&smid=url-share). Just yesterday, Germany's Lufthansa airline announced that it's cancelling 20,000 flights to save on jet fuel (https://www.msn.com/en-us/travel/news/deutsche-lufthansa-to-cancel-20000-short-haul-flights-to-save-jet-fuel/ar-AA21pvF2).
But what exactly is the price of oil? There are two main prices, and we compared them this week:
- We first looked at the "spot" price, meaning how much you would have to pay to get a barrel of oil (i.e., 42 gallons or 159 liters) right now.
- We then looked at oil futures, where you pay to get oil at a specific point down the road, say in July. If you're worried that the price of oil will rise, you might buy such futures, locking in the current price even if they go up.
For more about oil prices and purchasing, check out NPR's Planet Money podcast from about 10 years ago, when they bought some oil: https://www.npr.org/sections/money/2016/08/26/491342091/planet-money-buys-oil
Paid subscribers, both to Bamboo Weekly and to my LernerPython+data membership program (https://LernerPython.com) get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.
Learning goals for this week include joins, dates and times, APIs, grouping, and plotting with Plotly.
Data and six questions
This week's data comes from two different sources:
- Spot oil prices will come from FRED (https://fred.stlouisfed.org/), from the St. Louis Federal Reserve.
- Prices for oil futures based on fixed dates (in this case, July 2026) will come from Yahoo Finance.
Here are my solutions and explanations for this week's six questions:
Use the fedfred package from PyPI to download the daily spot oil prices from FRED for both West Texas Intermediate (WTI) and Brent. These are known as the DCOILWTICO and DCOILBRENTEU data sets. Only keep rows for which there's data on both prices. Put them into a single Pandas data frame. Are prices currently at an all-time high? (If not, then when were they?)
I started my notebook by importing a bunch of Python packages that I ended up using in my queries:
import pandas as pd
from plotly import express as px
import fedfred as fd
import yfinance as yf
import dotenv
import osI imported Pandas and Plotly, as usual. Because I'll be using fedfred to download data from FRED, and yfinance to download data from Yahoo Finance, I imported those packages, too. (Marimo has a nice feature here, that if you try to import a package that it hasn't installed, and notices that you're in a uv project, it'll run uv add on your behalf.)
However, I also imported dotenv , available on PyPI as https://pypi.org/project/python-dotenv/. This is a great little package that allows you to store secrets and sensitive data in a file in your home directory, typically called .env. You load up dotenv in your program, and then invoke dotenv.load_dotenv() -- and all of the key-value pairs in that file are available as environment variables. To load them, you can then use the os module from Python's standard library.
I set things up with:
dotenv.load_dotenv()
fred = fd.FredAPI(api_key=os.getenv('FRED_API_KEY'))The fedfred package, which I only discovered recently, is a nice wrapper around the FRED API. You can use the API directly, but why do that, when you can get back a Pandas data frame, using the get_series_observations method.
However, get_series_observations only works on a single data set. You pass the unique FRED identifier, and get back a data frame with a datetime index.
How can we get two different data sets in the same data frame? I decided to invoke get_series_observations twice, once for each data set. And then, since they both had a datetime index, I invoked join on them, getting back one data frame with two columns:
df = (
fred
.get_series_observations('DCOILWTICO')
.join(fred.get_series_observations('DCOILBRENTEU'))
)However, this failed — because both of the input data frames had the same name column names, and column names cannot normally repeat in a data frame. To avoid this, I passed the lsuffix and rsuffix keyword arguments, indicating what suffix should be added to column names from the left (WTI) and right (Brent) data sets:
df = (
fred
.get_series_observations('DCOILWTICO')
.join(fred.get_series_observations('DCOILBRENTEU'),
lsuffix='_wti',
rsuffix='_brent')
)I was only interested in the columns that started with value_, so I used filter to keep only those columns, passing a regular expression indicating that value_ should be anchored to the front of the column name:
df = (
fred
.get_series_observations('DCOILWTICO')
.join(fred.get_series_observations('DCOILBRENTEU'),
lsuffix='_wti',
rsuffix='_brent')
.filter(regex='^value_')
)Finally, I was only interested in rows with data for both WTI and Brent, and ran dropna on the data frame:
df = (
fred
.get_series_observations('DCOILWTICO')
.join(fred.get_series_observations('DCOILBRENTEU'),
lsuffix='_wti',
rsuffix='_brent')
.filter(regex='^value_')
.dropna()
)The result? A data frame with 9,699 rows and 2 columns. The index contains datetime values, starting on May 20th 1987. The most recent values, when I retrieved them, were from April 20th.
Are prices currently at an all-time high? I could invoke max on the data frame to get the highest value in each column. But that would give me the value, not the date when oil hit that price. I could get the date of the high price, thanks to the fact that the dates are in the index, using the idxmax method. But what if I want both of these? I can use agg, which lets me run multiple aggregation methods on the same data frame:
(
df
.agg(['max', 'idxmax'])
)The result?
value_wti value_brent
max 145.31 143.95
idxmax 2008-07-03T00:00:00.000 2008-07-03T00:00:00.000In other words:
- WTI had a high price of $145.31 on July 3, 2008.
- Brent had a high price of $143.95, also on July 3, 2008.
So yes, oil prices are high. But they aren't the highest they've ever been.
Calculate, for the entire data set, the mean price for each two-month period. Where do the most recent two-month period's prices rank across historical oil prices? (Are they the highest, second highest, third highest, etc.)
It's not a problem to calculate the mean price of oil in our data set; we just invoke mean on the data frame. But I wanted to know the mean for every two-month period in the data set.
To do that, we most easily use resample, a from of groupby that allows us to perform aggregations on arbitrary segments of time. In this case, since we want to look at two-month chunks, we pass '2ME' to resample. To calculate based on the month's end, we say 'ME', and to calculate based on the end of 2-month periods, we say '2ME':
(
df
.resample('2ME').mean()
)Note that when you use 'ME', the date labels are the final day of each month. When you use '2ME', then the date labels are the final day of every two-month period. So the current (most recent) values will say 26-05-31, even though that's more than one month in the future.
Where does the most recent two-month period fit in, historically? We can use the rank method, which replaces the data frame's values with integers indicating their place in the sort order. Notice that we used ascending=False, meaning that the lowest will be numbered 0, the next-lowest 1, etc.
To get the ranks of the final values, we can say iloc[-1], meaning that we want the final row:
(
df
.resample('2ME').mean()
.rank(ascending=False)
.iloc[-1]
)The result:
2026-05-31 00:00:00
value_wti 16
value_brent 4In other words, the last two-month period was the 4th-highest two-month mean price for Brent, and the 16th-highest two-month mean for WTI.