Bamboo Weekly #165: Artemis II (solution)

Get better at: APIs, dates and times, joins, and plotting with Plotly.

This week, we looked at data from NASA's Artemis II mission (https://en.wikipedia.org/wiki/Artemis_II), currently on its way home to Earth. Orion, the spacecraft that took four astronauts around the moon, looped around the other ("dark") side of the moon, allowing them to see features that have almost never been seen before.

Paid subscribers, both to Bamboo Weekly and to my LernerPython+data membership program (https://LernerPython.com), get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.

Learning goals for this week include working with APIs, working with dates and times, joins, and plotting.

Data and six questions

This week's data comes from NASA's Horizons API (https://ssd-api.jpl.nasa.gov/doc/horizons.html). This system has been around for many years, and the API reflects that. You'll need to make a GET request to https://ssd.jpl.nasa.gov/api/horizons.api. (No API key or registration is needed.) You'll need to pass a number of parameters to the API, most easily done via the following Python dict:


```python
horizons_params = {
    'format': 'text',
    'COMMAND': f"'{command}'",
    'EPHEM_TYPE': "'VECTORS'",
    'START_TIME': f"'{start_time}'",
    'STOP_TIME': f"'{stop_time}'",
    'STEP_SIZE': "'1h'",
    'VEC_TABLE': "'2'",       # position + velocity
    'CSV_FORMAT': "'YES'",
}
```

The dict reflects the parameters that you need to pass. Note that the values must be wrapped in single quotes; it's not enough for them to be Python strings.
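To see why the embedded quotes matter, here's how such parameters end up encoded in the query string; a quick sketch using the standard library's urlencode (requests does the equivalent internally):

```python
from urllib.parse import urlencode

# The single quotes are part of the value itself, so they get
# percent-encoded as %27 and reach Horizons intact.
params = {'format': 'text', 'COMMAND': "'301'"}
print(urlencode(params))   # format=text&COMMAND=%27301%27
```

Without the embedded quotes, the value `301` would arrive bare, which isn't what Horizons expects.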

Here are my six questions, along with solutions and explanations:

Write a function, get_horizons_data, that takes three arguments: a command number, a starting time, and a stopping time. The function should return a Pandas data frame with the column names described in VEC_TABLE, above. The datetime_str column should contain datetime values.

I started by importing a number of Python modules:

```python
import pandas as pd
from plotly import express as px
import requests
from io import StringIO
```

I import pandas and Plotly every week. Sometimes, as in this case, I also need requests, which lets us consume APIs.

But what's with StringIO? Simply put, it's a data structure (in Python's standard library) that acts like a file, but is actually all in memory. You can create a StringIO object with a string, and then use it where you would otherwise use a file. This is an example of what Python calls a "file-like object," in that it implements the same API as a file. We'll need to use this for some of our manipulations when reading from the API.
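Here's a quick illustration of StringIO acting as a file-like object, using only the standard library:

```python
from io import StringIO
import csv

fake_file = StringIO('a,b\n1,2\n')

# It supports the usual file methods...
print(fake_file.readline())        # prints 'a,b' (plus its newline)

# ...so anything expecting a file happily consumes it, picking
# up right where the previous read left off.
rows = list(csv.reader(fake_file))
print(rows)                        # [['1', '2']]
```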

Next, I implemented the function. As requested, it takes three arguments – a command (really an integer indicating what body we want to track), along with a start time and a stop time. The step size is assumed to be '1h', for one-hour granularity. I started the function by defining some variables.

I used requests to make a GET request to horizons_url, passing the horizons_params dictionary as query parameters.

```python
def get_horizons_data(command, start_time, stop_time):
    horizons_url = 'https://ssd.jpl.nasa.gov/api/horizons.api'

    horizons_params = {
        'format': 'text',
        'COMMAND': f"'{command}'",
        'EPHEM_TYPE': "'VECTORS'",
        'START_TIME': f"'{start_time}'",
        'STOP_TIME': f"'{stop_time}'",
        'STEP_SIZE': "'1h'",
        'VEC_TABLE': "'2'",       # position + velocity
        'CSV_FORMAT': "'YES'",
    }

    response = requests.get(horizons_url, params=horizons_params)
```

The response variable contains a "response object" from the HTTP request. This contains all of the information we might want, from the numeric response code to the headers. But really, all we care about is grabbing the text from the response, which we'll assume (somewhat naively) worked just fine.

The thing is, the CSV file is between the '$$SOE' and '$$EOE' markers. So we'll need to get the text content from the response, then grab only the text between those markers as a slice. Here's how I did it:

```python
def get_horizons_data(command, start_time, stop_time):
    horizons_url = 'https://ssd.jpl.nasa.gov/api/horizons.api'

    horizons_params = {
        'format': 'text',
        'COMMAND': f"'{command}'",
        'EPHEM_TYPE': "'VECTORS'",
        'START_TIME': f"'{start_time}'",
        'STOP_TIME': f"'{stop_time}'",
        'STEP_SIZE': "'1h'",
        'VEC_TABLE': "'2'",       # position + velocity
        'CSV_FORMAT': "'YES'",
    }

    response = requests.get(horizons_url, params=horizons_params)

    start_token = '$$SOE'
    end_token = '$$EOE'

    response_text = response.text
    # print(response_text)   # uncomment for debugging

    start_token_index = response_text.index(start_token)
    end_token_index = response_text.index(end_token)

    data = response_text[start_token_index + len(start_token) : end_token_index]
```

In other words, I used str.index to find start_token and end_token, and then got the text between the two with a slice. (Notice that I had to move past start_token to avoid including it.)
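For extra safety, we could call raise_for_status on the response before slicing, so a failed request fails loudly instead of producing a confusing ValueError from str.index. A minimal sketch of the idea; the FakeResponse stub is invented here to stand in for a live requests response:

```python
class FakeResponse:
    """Stub with the same attributes we use on requests.Response."""
    status_code = 200
    text = 'preamble $$SOE\n1,2\n$$EOE trailer'

    def raise_for_status(self):
        pass   # the real method raises requests.HTTPError on 4xx/5xx

def extract_csv_section(response):
    response.raise_for_status()    # bail out early on a failed request
    text = response.text
    start = text.index('$$SOE') + len('$$SOE')
    end = text.index('$$EOE')
    return text[start:end].strip()

print(extract_csv_section(FakeResponse()))   # 1,2
```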

data now contains a string with CSV-formatted data. We'll somehow want to turn that into a file-like object that we can pass to read_csv. As I mentioned above, we can use StringIO for this purpose, and it does the job great.

I also passed two other keyword arguments to read_csv: header, setting it to None, since the data doesn't have any column headers, and then names, with a list of strings we can use. Notice that I passed '_extra' at the end of the names list, because Horizons' data format always includes a trailing comma:

```python
def get_horizons_data(command, start_time, stop_time):
    horizons_url = 'https://ssd.jpl.nasa.gov/api/horizons.api'

    horizons_params = {
        'format': 'text',
        'COMMAND': f"'{command}'",
        'EPHEM_TYPE': "'VECTORS'",
        'START_TIME': f"'{start_time}'",
        'STOP_TIME': f"'{stop_time}'",
        'STEP_SIZE': "'1h'",
        'VEC_TABLE': "'2'",       # position + velocity
        'CSV_FORMAT': "'YES'",
    }

    response = requests.get(horizons_url, params=horizons_params)

    start_token = '$$SOE'
    end_token = '$$EOE'

    response_text = response.text
    # print(response_text)   # uncomment for debugging

    start_token_index = response_text.index(start_token)
    end_token_index = response_text.index(end_token)

    data = response_text[start_token_index + len(start_token) : end_token_index]
    return (pd
            .read_csv(StringIO(data),
                      header=None,
                      names=['JDTDB', 'datetime_str',
                             'X', 'Y', 'Z',
                             'VX', 'VY', 'VZ', '_extra'])
           )
```

This is good, except that we want the datetime_str column in an actual datetime dtype. Normally, we could just pass date_format to read_csv and have the column parsed. But here, the format is a bit weird, starting with a space, then 'A.D.', then the date and time in an odd format. I handled this first with str.strip (to remove any leading/trailing whitespace), then str.removeprefix (to remove the extraneous 'A.D. ' prefix).

I then took the resulting string and passed it to pd.to_datetime, giving it a format of '%Y-%b-%d %H:%M:%S.%f', which means: year, month (abbreviated name), day, hour, minutes, seconds, and fractions of a second. (This was much easier, thanks to the https://www.strfti.me/ site.)
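To see the cleanup and the format string in action on a single value (the sample string here is made up, but matches the shape of Horizons output), we can use the standard library's strptime, which shares its format codes with pd.to_datetime:

```python
from datetime import datetime

raw = ' A.D. 2026-Apr-02 02:00:00.0000'

# strip the leading whitespace, then peel off the 'A.D. ' prefix
cleaned = raw.strip().removeprefix('A.D. ')
print(cleaned)                     # 2026-Apr-02 02:00:00.0000

dt = datetime.strptime(cleaned, '%Y-%b-%d %H:%M:%S.%f')
print(dt)                          # 2026-04-02 02:00:00
```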

I wrapped all of that inside of a lambda, and then used assign to create a new column, datetime, with the resulting value.

I then used drop to remove the _extra and also the datetime_str column, which I didn't need any more:

```python
def get_horizons_data(command, start_time, stop_time):
    horizons_url = 'https://ssd.jpl.nasa.gov/api/horizons.api'

    horizons_params = {
        'format': 'text',
        'COMMAND': f"'{command}'",
        'EPHEM_TYPE': "'VECTORS'",
        'START_TIME': f"'{start_time}'",
        'STOP_TIME': f"'{stop_time}'",
        'STEP_SIZE': "'1h'",
        'VEC_TABLE': "'2'",       # position + velocity
        'CSV_FORMAT': "'YES'",
    }

    response = requests.get(horizons_url, params=horizons_params)

    start_token = '$$SOE'
    end_token = '$$EOE'

    response_text = response.text
    # print(response_text)   # uncomment for debugging

    start_token_index = response_text.index(start_token)
    end_token_index = response_text.index(end_token)

    data = response_text[start_token_index + len(start_token) : end_token_index]
    return (pd
            .read_csv(StringIO(data),
                      header=None,
                      names=['JDTDB', 'datetime_str',
                             'X', 'Y', 'Z',
                             'VX', 'VY', 'VZ', '_extra'])
            .assign(datetime = lambda df_: pd.to_datetime(df_['datetime_str'].str.strip().str.removeprefix('A.D. '),
                                                          format='%Y-%b-%d %H:%M:%S.%f'))
            .drop(columns=['_extra', 'datetime_str'])
           )
```

By invoking this function with a command, a starting time, and an ending time, we get back a data frame for the body we're tracking, in one-hour increments.

Get data frames for Orion (command '-1024') and the moon (command '301'), starting at April 2, 2026 at 02:00 and ending on April 10, 2026 at 23:00. Join them together into a single data frame, with the datetime values as the index.

Now that get_horizons_data is defined, we can invoke it twice, once for Orion and once for the moon. Let's start with Orion:

```python
orion_command = -1024
orion_df = get_horizons_data(orion_command,
                             '2026-04-02 02:00',
                             '2026-04-10 23:00')
```

This gives us a data frame with 214 rows and 8 columns. We can then repeat this for the moon:

```python
moon_command = 301
moon_df = get_horizons_data(moon_command,
                            '2026-04-02 02:00',
                            '2026-04-10 23:00')
```

Once again, we get a data frame with 214 rows and 8 columns.

To combine these, we can use either join or merge. The former combines two data frames via their indexes, whereas merge does so via any two columns we name (or a single column we specify with on).
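Here's a tiny contrast between the two, on toy data frames (the column names are made up for illustration):

```python
import pandas as pd

left = pd.DataFrame({'k': [1, 2], 'a': [10, 20]})
right = pd.DataFrame({'k': [1, 2], 'b': [30, 40]})

# merge aligns rows on a named column...
merged = left.merge(right, on='k')
print(list(merged.columns))        # ['k', 'a', 'b']

# ...while join aligns on the indexes, so we set them first
joined = left.set_index('k').join(right.set_index('k'))
print(list(joined.columns))        # ['a', 'b']
```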

I decided to use merge, combining orion_df with moon_df. I specified that we'll perform the merge on the datetime column in each.

However, there's a wrinkle in merging the two data frames: they have identically named columns, thanks to the fact that they were both created by invoking get_horizons_data. By default, merge resolves the clash by tacking _x and _y onto the overlapping column names, which isn't very descriptive.

The solution is to pass the suffixes keyword argument, giving it a two-element list of strings. The first string is added to column names from the calling data frame (orion_df, in this case), and the second string is added to column names from the data frame passed as the first argument (moon_df, in this case).

Note that because datetime is the column on which the merge is performed, it remains with its original name. It doesn't get a suffix, and isn't doubled. I then decided to make the datetime column into the index, by invoking set_index:

```python
df = (
    orion_df
    .merge(moon_df, on='datetime', suffixes=['_orion', '_moon'])
    .set_index('datetime')
)
```

This gave us a new data frame with 214 rows (same as before) and 14 columns. There were 8 columns in each of the original data frames, so you might think we would get 16 back. But datetime is shared between them, giving us 15 columns – and then we moved it to be the index, so it isn't counted, giving us a total of 14.