
BW #125: Shrinking dollars (solution)

Get better at: Working with CSV files, dates and times, filtering, correlations, grouping, and plotting

An American abroad pays for groceries with more dollars than they expected

This week, we looked at data having to do with the value of the US dollar against other currencies. A number of news reports, including one earlier this week in the New York Times (https://www.nytimes.com/2025/06/30/business/dollar-decline-trump.html), indicated that the dollar had lost more value in the first six months of 2025 than in the first half of any year in more than 50 years.

This has a number of effects, including making imports and trips abroad more expensive for Americans, while making US exports cheaper abroad. The weaker dollar seems to reflect nervousness among investors regarding the Trump administration's policies, especially those having to do with tariffs and the high levels of debt the proposed budget would create.

Data and five questions

This week's data comes from the Federal Reserve's tracker of foreign exchange rates (https://www.federalreserve.gov/releases/h10/20250630/), known as H.10 in the Fed's system. They have been tracking a number of different currencies against the US dollar for some time.

I used FRED (the online data tool sponsored by the Federal Reserve Bank of St. Louis) to download all of the H.10 values that I could at https://fred.stlouisfed.org/release?rid=17 . I then asked FRED to plot all of them, and downloaded a CSV file from that page of plots.

However, I seem to have maxed out FRED's downloading system; I only got data for DTWEXBGS (broad index), DEXUSEU (dollar-euro), DEXJPUS (dollar-yen), DEXCHUS (dollar-yuan), DEXCAUS (dollar-Canadian dollar), DEXUSUK (dollar-pound), DEXKOUS (dollar-Korean won), DEXMXUS (dollar-Mexican peso), DEXVZUS (dollar-Venezuelan peso), DTWEXAFEGS (advanced foreign economies index), DEXINUS (dollar-Indian rupee), and DEXBZUS (dollar-Brazilian real). So those are the ones we'll look at.

Paid subscribers (including members of my LernerPython+data membership) can download my data file from the bottom of this message, as well as a copy of my Jupyter notebook.

This week's learning goals include working with CSV files, dates and times, filtering, correlations, grouping, and plotting.

Read the data into a data frame. Make sure that the observation_date column is a datetime, which should then be set to the index. Which currencies were already being tracked at the start of the Fed's data set? Which was the most recent currency to be added?

Let's start by loading Pandas:

import pandas as pd

I downloaded the data as a CSV file, so to read it into Pandas I used read_csv. By default, read_csv doesn't recognize datetime columns, but leaves them as strings. I thus used the parse_dates keyword argument to convert the values. I also used the index_col keyword argument to ask that the observation_date column be turned into the data frame's index:

filename = 'data/bw-125.csv'

df = pd.read_csv(filename,
                 parse_dates=['observation_date'],
                 index_col='observation_date')
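If you want to confirm that the conversion worked, a quick look at the index and column dtypes does the trick (this isn't part of the solution, just a sanity check):

print(df.index.dtype)      # expect datetime64[ns]
print(df.dtypes.head())    # the exchange-rate columns should all be float64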

With that in place, I asked you to find which currencies were tracked at the start of the Fed's data set.

The data frame contains one column per currency or index. I only wanted currencies, which meant keeping only those columns that started with "DEX". That's most easily done with the filter method:

(
    df
    .filter(like='DEX')
)

Notice that I used the like keyword argument for filter, which basically runs Python's in operator on each of the column names, keeping only those for which it's True. I'm still a bit more partial to regular expressions, and thus passed the regex keyword argument with ^DEX as a value (i.e., 'DEX' anchored to the start of the string):

(
    df
    .filter(regex='^DEX')
)
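If it helps to see what like is doing, here's a rough hand-rolled equivalent that uses the in operator on the column names directly (a sketch of the idea, not what filter literally runs under the hood):

# keep only those columns whose name contains 'DEX'
df[[colname for colname in df.columns if 'DEX' in colname]]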

I then grabbed the first row in the data frame using iloc:

(
    df
    .filter(regex='^DEX')
    .iloc[0]
)

This gave me a series whose index contained the column names (the exchange-rate series) and whose values were the rates on that first day of trading – or NaN if there was no value.

That was the key to finding the currencies that we had from the beginning; I invoked dropna to remove the NaN values. The index entries for the remaining values thus represented those that were there on the first day:

(
    df
    .filter(regex='^DEX')
    .iloc[0]
    .dropna()
    .index
)

The result:

Index(['DEXJPUS', 'DEXCAUS', 'DEXUSUK'], dtype='object')

In other words, the Fed has been tracking the exchange rate with Japan, Canada, and the UK from the start.

But I also asked you to find the most recently added currency. Once again, I started by keeping only currencies with filter and regex.

But then I wanted to find the date on which the first non-NaN value appeared in each column. To do this, I used the first_valid_index method, which returns the first index with a non-NaN value.

Running it on the data frame as a whole doesn't really help here, since it returns a single answer for the entire frame – the first index at which any column has a non-NaN value – rather than one answer per column. But if we run it on each column, we'll get that particular column's first non-NaN index. And while I try to avoid it, sometimes there's no choice but to use apply to run a function on each column in a data frame.

The function, in this case, will be a lambda expression. Its argument will be a column, and we'll invoke first_valid_index on that column. The result will be the first index in each column for which we have a non-NaN.

I passed axis='rows', which worked fine. This might seem counterintuitive, given that I'm running it on each column. I like to think of it as saying, "I want to get a new row, sharing the column names of the data frame."
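Before running this on the real data, here's the distinction on a tiny made-up frame (the values and dates are hypothetical, purely for illustration):

import numpy as np

toy = pd.DataFrame({'a': [np.nan, 1.0, 2.0],
                    'b': [np.nan, np.nan, 3.0]},
                   index=pd.to_datetime(['2020-01-01',
                                         '2020-01-02',
                                         '2020-01-03']))

toy.first_valid_index()
# Timestamp('2020-01-02 00:00:00') -- the first row with *any* non-NaN value

toy.apply(lambda c_: c_.first_valid_index(), axis='rows')
# a   2020-01-02
# b   2020-01-03
# dtype: datetime64[ns]

With that in mind, back to the real data: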

(
    df
    .filter(regex='^DEX')
    .apply(lambda c_: c_.first_valid_index(), axis='rows')
)

Finally, I invoked sort_values to get them from earliest to latest:

(
    df
    .filter(regex='^DEX')
    .apply(lambda c_: c_.first_valid_index(), axis='rows')
    .sort_values()
)

The result:

DEXJPUS   1971-01-04
DEXCAUS   1971-01-04
DEXUSUK   1971-01-04
DEXINUS   1973-01-02
DEXCHUS   1981-01-02
DEXKOUS   1981-04-13
DEXMXUS   1993-11-08
DEXVZUS   1995-01-02
DEXBZUS   1995-01-02
DEXUSEU   1999-01-04
dtype: datetime64[ns]

Not surprisingly, the euro (which only came into existence in 1999) is the most recent addition to the Fed's currency comparisons. Tracking started on January 4th, 1999.

The New York Times article said, "The dollar is off to its worst start to a year in more than half a century." Calculate the percentage drop in the "broad" index (DTWEXBGS) for the first six months of each year starting in 2020.

The "broad" index attempts to capture the general strength of the dollar. I asked you to find the percentage drop in this index in the first half of each year starting in 2020.

For starters, I used loc to retrieve only data from 2020 and onward, taking advantage of the fact that when the index contains datetime values, specifying only a year treats all of the other portions of the date and time as wildcards. We can also use a slice – so in this case, loc['2020':] does the trick, giving us all data from 2020 onward.
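For reference, here's how partial-string indexing behaves on a DatetimeIndex (using the df we loaded above; which rows come back obviously depends on the data):

df.loc['2020']       # every row whose date falls in 2020
df.loc['2020-03']    # every row in March 2020
df.loc['2020':]      # everything from the start of 2020 onward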

I used the two-argument version of loc, specifying the column we want ("DTWEXBGS"). The result is a series:

(
    df
    .loc['2020':, 'DTWEXBGS']
)

Next, I got rid of the NaN values in this series with dropna. 2020 started with a bunch of them, and while they won't affect our calculations, I just didn't want them around. I also invoked reset_index to turn our series into a two-column data frame:

(
    df
    .loc['2020':, 'DTWEXBGS']
    .dropna()
    .reset_index()
)

Next, I used assign to add two new columns: year, pulled from the datetime values with dt.year, and month, pulled with dt.month. I didn't have to do this, but it made the subsequent queries easier to write and read.

(
    df
    .loc['2020':, 'DTWEXBGS']
    .dropna()
    .reset_index()
    .assign(year = lambda df_: df_['observation_date'].dt.year,
           month = lambda df_: df_['observation_date'].dt.month)
)

I only wanted data from the first six months of each year. Now that I have a month column defined, I can use isin inside of loc, with a lambda, to find those rows. (range(7) produces the integers 0 through 6; since month numbers run from 1 to 12, this keeps January through June.)

(
    df
    .loc['2020':, 'DTWEXBGS']
    .dropna()
    .reset_index()
    .assign(year = lambda df_: df_['observation_date'].dt.year,
           month = lambda df_: df_['observation_date'].dt.month)
    .loc[lambda df_: df_['month'].isin(range(7))]
)

Now that I have the data I want, I can start to manipulate it. I used groupby on the year column, then selected the DTWEXBGS column and ran the first and last aggregation methods via agg.

I'll note that only after using them did I discover that first and last are deprecated... but they're pretty convenient here, and they still do work, so I'm going to ignore the documentation's warnings and still use them:

(
    df
    .loc['2020':, 'DTWEXBGS']
    .dropna()
    .reset_index()
    .assign(year = lambda df_: df_['observation_date'].dt.year,
           month = lambda df_: df_['observation_date'].dt.month)
    .loc[lambda df_: df_['month'].isin(range(7))]
    .groupby('year')['DTWEXBGS'].agg(['first', 'last'])
)

The above gives us a two-column data frame. The index contains the years (2020 through 2025), and the columns are "first" and "last", showing the first and last values in the first half of each year:

         first      last
year                    
2020  120.6757  120.4746
2021  111.2143  112.5613
2022  115.4243  120.9718
2023  122.0844  119.7118
2024  119.6165  124.5172
2025  129.6666  120.0786

How can we use this to determine the degree to which the dollar has declined? We can run pct_change on our data frame, passing axis='columns' so that it calculates from left to right. This will give us the percentage change from the start of the year to the midpoint (i.e., the end of June). We can then grab the "last" column, which contains that percentage change (the "first" column, having nothing to its left, comes back as all NaN):

(
    df
    .loc['2020':, 'DTWEXBGS']
    .dropna()
    .reset_index()
    .assign(year = lambda df_: df_['observation_date'].dt.year,
           month = lambda df_: df_['observation_date'].dt.month)
    .loc[lambda df_: df_['month'].isin(range(7))]
    .groupby('year')['DTWEXBGS'].agg(['first', 'last'])
    .pct_change(axis='columns')
    ['last']
)
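As a quick sanity check against the first/last table above, the 2025 row works out to (120.0786 - 129.6666) / 129.6666 ≈ -0.0739, a drop of roughly 7.4 percent – so that's the value we should expect to see for 2025 in the output.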

Finally, we can invoke sort_values to find out which year had the biggest drop:

(
    df
    .loc['2020':, 'DTWEXBGS']
    .dropna()
    .reset_index()
    .assign(year = lambda df_: df_['observation_date'].dt.year,
           month = lambda df_: df_['observation_date'].dt.month)
    .loc[lambda df_: df_['month'].isin(range(7))]
    .groupby('year')['DTWEXBGS'].agg(['first', 'last'])
    .pct_change(axis='columns')
    ['last']
    .sort_values()
)

Here is what I got:

year
2025   -0.073943
2023   -0.019434
2020   -0.001666
2021    0.012112
2024    0.040970
2022    0.048062
Name: last, dtype: float64

Sure enough, 2025 had the biggest first-half drop of any of these years, by quite a bit, with the broad dollar index falling about 7.4 percent between the start of January and the end of June.