It's October 1st, which means that it's also the start of the fiscal year for the United States government. Except that President Donald Trump and the two houses of Congress weren't able to pass a new budget. Senate Democrats want to restore billions of dollars in health-care assistance that were cut earlier this year, and Republicans don't want to. Until they can reach a deal, the federal government is thus shut down (https://www.nytimes.com/live/2025/10/01/us/government-shutdown-trump-news?unlocked_article_code=1.qE8.fjVr.9_ldFlaxkmsQ&smid=url-share).
This isn't the first time that the government has shut down; you can see a full list at Wikipedia (https://en.wikipedia.org/wiki/Government_shutdowns_in_the_United_States). During a shutdown, employees and contractors -- other than those deemed essential -- are furloughed, essentially forced to take an unpaid vacation. Trump has said that he'll use the government shutdown to fire employees from a number of departments, rather than just furloughing them until the budget is passed, as was the case during previous shutdowns.
Just who will be furloughed, who will be fired, and who will be classified as essential remains to be seen. But if you had any appointments with the federal government over the coming days, you'll almost certainly need to reschedule them.
This week, I thought it might be interesting to look at data about US federal employees -- who they are, and what sort of work they do. We already looked at some data about federal employees back in BW #105, but we'll be working with different data sets this time around, and will analyze them in some different ways.
Data and five questions
This week's data comes from the OPM, the Office of Personnel Management (https://www.opm.gov/data/datasets/). I found some other government sites with information about federal employees, but they weren't working; it wasn't clear whether they were simply having problems, had been taken down earlier this year, or were unavailable because of the shutdown.
Regardless, I was happy to find that https://www.opm.gov/data/datasets/ is still working (as of this writing), and that I could download federal employee data from March of this year, which was posted on July 1st. We'll be using the full data, which comes in three separate downloadable zipfiles from the OPM site, each of which contains a CSV file with information about federal employees. These files are listed as "data file 1," "data file 2," and "data file 3" on the download page.
Learning goals for this week include working with multiple files, memory optimization, date/time handling, and grouping.
Paid members, including subscribers to my LernerPython.com membership program (https://LernerPython.com), can download the data file from the bottom of this message. Paid subscribers also get all of the questions and answers, downloadable notebooks, one-click access to the data and notebook in Google Colab or Marimo Molab, and invitations to monthly office hours.
Here are this week's tasks and questions. I'll be back tomorrow (late, after Yom Kippur ends in Israel) with my solutions and explanations:
- Create a data frame from the CSV files inside of the three zipfiles. Make sure that the DATECODE column is a datetime value. How much memory do you save by turning the string columns into categories? Make sure the SALARY column contains floats.
- Assuming that each row in the data frame describes one federal employee, what's the fastest and best way to find out how many employees there are? The New York Times, as of this writing, says that the Environmental Protection Agency, Department of Education, Department of Commerce, Department of Labor, and Department of Housing and Urban Development have had the greatest percentage of furloughs across government agencies. How many people work, total, at those agencies? Does the number match what the New York Times reports? Why or why not?
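If you'd like a starting point for the first task, here is a minimal sketch of one way to approach it. It isn't a full solution, and it rests on several assumptions that you should check against the real files: that the CSVs are comma-separated, that DATECODE is stored as a year and month such as 202503, and that there is a text column along the lines of AGENCY (a made-up name here). The function names read_zipped_csvs, tidy, and fake_zip are my own inventions for illustration, and the data is synthetic, standing in for the actual OPM downloads.

```python
import io
import zipfile

import pandas as pd


def read_zipped_csvs(zip_sources):
    """Load the first CSV found in each zipfile and stack the results."""
    frames = []
    for source in zip_sources:  # paths or file-like objects
        with zipfile.ZipFile(source) as zf:
            csv_name = next(n for n in zf.namelist() if n.lower().endswith(".csv"))
            with zf.open(csv_name) as f:
                # Read everything as text first, so we can measure the savings later
                frames.append(pd.read_csv(f, dtype=str))
    return pd.concat(frames, ignore_index=True)


def tidy(df, date_format="%Y%m"):
    """Return a copy with DATECODE as datetime, SALARY as float, text as category.

    The date_format default is an assumption about how DATECODE is stored.
    """
    out = df.copy()
    out["DATECODE"] = pd.to_datetime(out["DATECODE"], format=date_format)
    out["SALARY"] = pd.to_numeric(out["SALARY"], errors="coerce").astype("float64")
    for col in out.columns:
        dtype = out[col].dtype
        # Whatever is still text (object or pandas string dtype) becomes a category
        if dtype == object or isinstance(dtype, pd.StringDtype):
            out[col] = out[col].astype("category")
    return out


def fake_zip(n_rows):
    """Build an in-memory zipfile of made-up rows (hypothetical stand-in data)."""
    agencies = ["AGENCY A", "AGENCY B", "AGENCY C"]
    lines = ["AGENCY,DATECODE,SALARY"]
    lines += [f"{agencies[i % 3]},202503,{50000 + i}" for i in range(n_rows)]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("data.csv", "\n".join(lines))
    buffer.seek(0)
    return buffer


raw = read_zipped_csvs([fake_zip(1000) for _ in range(3)])
clean = tidy(raw)

# deep=True counts the actual string contents, not just the pointers to them
before = raw.memory_usage(deep=True).sum()
after = clean.memory_usage(deep=True).sum()
print(f"{before:,} bytes before, {after:,} bytes after")
print(f"{len(clean):,} rows")
```

With the real data, you would pass the three downloaded zipfile paths to read_zipped_csvs instead of the fake ones. Comparing memory_usage(deep=True) before and after the conversion is one way to measure the savings from categories; how big the savings are will depend on how many distinct values each column contains.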