Bamboo Weekly #165: Artemis II (solution)

Get better at: APIs, dates and times, joins, and plotting with Plotly.

This week, we looked at data from NASA's Artemis II mission (https://en.wikipedia.org/wiki/Artemis_II), currently on its way home to Earth. Orion, the spacecraft that took four astronauts around the moon, looped around the other ("dark") side of the moon, allowing them to see features that have almost never been seen before.

Paid subscribers, both to Bamboo Weekly and to my LernerPython+data membership program (https://LernerPython.com), get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.

Learning goals for this week include working with APIs, working with dates and times, joins, and plotting.

Data and six questions

This week's data comes from NASA's Horizons API (https://ssd-api.jpl.nasa.gov/doc/horizons.html). This system has been around for many years, and the API reflects that. You'll need to make a GET request to https://ssd.jpl.nasa.gov/api/horizons.api. (No API key or registration is needed.) You'll need to pass a number of parameters to the API, most easily done via the following Python dict:


```python
horizons_params = {
    'format': 'text',
    'COMMAND': f"'{command}'",
    'EPHEM_TYPE': "'VECTORS'",
    'START_TIME': f"'{start_time}'",
    'STOP_TIME': f"'{stop_time}'",
    'STEP_SIZE': "'1h'",
    'VEC_TABLE': "'2'",       # position + velocity
    'CSV_FORMAT': "'YES'",
}
```

The dict reflects the parameters that you need to pass. Note that the values must be wrapped in single quotes; it's not enough for them to be Python strings.
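To see why the embedded quotes matter, here's how such parameters end up encoded in the query string; a quick sketch using the standard library's urlencode (requests does the equivalent internally):

```python
from urllib.parse import urlencode

# The single quotes are part of the value itself, so they get
# percent-encoded as %27 and reach Horizons intact.
params = {'format': 'text', 'COMMAND': "'301'"}
print(urlencode(params))   # format=text&COMMAND=%27301%27
```

Without the embedded quotes, the value `301` would arrive bare, which isn't what Horizons expects.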

Here are my six questions, along with solutions and explanations:

Write a function, get_horizons_data, that takes three arguments: a command number, a starting time, and a stopping time. The function should return a Pandas data frame with the column names described in VEC_TABLE, above. The datetime_str column should contain datetime values.

I started by importing a number of Python modules:

```python
import pandas as pd
from plotly import express as px
import requests
from io import StringIO
```

I import pandas and Plotly every week. Sometimes, as in this case, I also need requests, which lets us consume APIs.

But what's with StringIO? Simply put, it's a data structure (in Python's standard library) that acts like a file, but is actually all in memory. You can create a StringIO object with a string, and then use it where you would otherwise use a file. This is an example of what Python calls a "file-like object," in that it implements the same API as a file. We'll need to use this for some of our manipulations when reading from the API.
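Here's a quick illustration of StringIO acting as a file-like object, using only the standard library:

```python
from io import StringIO
import csv

fake_file = StringIO('a,b\n1,2\n')

# It supports the usual file methods...
print(fake_file.readline())        # prints 'a,b' (plus its newline)

# ...so anything expecting a file happily consumes it, picking
# up right where the previous read left off.
rows = list(csv.reader(fake_file))
print(rows)                        # [['1', '2']]
```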

Next, I implemented the function. As requested, it takes three arguments – a command (really an integer indicating what body we want to track), along with a start time and a stop time. The step size is assumed to be '1h', for one-hour granularity. I started the function by defining some variables.

I used requests to make a GET request to horizons_url, passing the horizons_params dictionary as query parameters.

```python
def get_horizons_data(command, start_time, stop_time):
    horizons_url = 'https://ssd.jpl.nasa.gov/api/horizons.api'

    horizons_params = {
        'format': 'text',
        'COMMAND': f"'{command}'",
        'EPHEM_TYPE': "'VECTORS'",
        'START_TIME': f"'{start_time}'",
        'STOP_TIME': f"'{stop_time}'",
        'STEP_SIZE': "'1h'",
        'VEC_TABLE': "'2'",       # position + velocity
        'CSV_FORMAT': "'YES'",
    }

    response = requests.get(horizons_url, params=horizons_params)
```

The response variable contains a "response object" from the HTTP request. This contains all of the information we might want, from the numeric response code to the headers. But really, all we care about is grabbing the text from the response, which we'll assume (somewhat naively) worked just fine.

The thing is, the CSV file is between the '$$SOE' and '$$EOE' markers. So we'll need to get the text content from the response, then grab only the text between those markers as a slice. Here's how I did it:

```python
def get_horizons_data(command, start_time, stop_time):
    horizons_url = 'https://ssd.jpl.nasa.gov/api/horizons.api'

    horizons_params = {
        'format': 'text',
        'COMMAND': f"'{command}'",
        'EPHEM_TYPE': "'VECTORS'",
        'START_TIME': f"'{start_time}'",
        'STOP_TIME': f"'{stop_time}'",
        'STEP_SIZE': "'1h'",
        'VEC_TABLE': "'2'",       # position + velocity
        'CSV_FORMAT': "'YES'",
    }

    response = requests.get(horizons_url, params=horizons_params)

    start_token = '$$SOE'
    end_token = '$$EOE'

    response_text = response.text
    # print(response_text)   # uncomment for debugging

    start_token_index = response_text.index(start_token)
    end_token_index = response_text.index(end_token)

    data = response_text[start_token_index + len(start_token) : end_token_index]
```

In other words, I used str.index to find start_token and end_token, and then got the text between the two with a slice. (Notice that I had to move past start_token to avoid including it.)
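For extra safety, we could call raise_for_status on the response before slicing, so a failed request fails loudly instead of producing a confusing ValueError from str.index. A minimal sketch of the idea; the FakeResponse stub is invented here to stand in for a live requests response:

```python
class FakeResponse:
    """Stub with the same attributes we use on requests.Response."""
    status_code = 200
    text = 'preamble $$SOE\n1,2\n$$EOE trailer'

    def raise_for_status(self):
        pass   # the real method raises requests.HTTPError on 4xx/5xx

def extract_csv_section(response):
    response.raise_for_status()    # bail out early on a failed request
    text = response.text
    start = text.index('$$SOE') + len('$$SOE')
    end = text.index('$$EOE')
    return text[start:end].strip()

print(extract_csv_section(FakeResponse()))   # 1,2
```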

data now contains a string with CSV-formatted data. We'll somehow want to turn that into a file-like object that we can pass to read_csv. As I mentioned above, we can use StringIO for this purpose, and it does the job great.

I also passed two other keyword arguments to read_csv: header, setting it to None, since the data doesn't have any column headers, and then names, with a list of strings we can use. Notice that I passed '_extra' at the end of the names list, because Horizons' data format always includes a trailing comma:

```python
def get_horizons_data(command, start_time, stop_time):
    horizons_url = 'https://ssd.jpl.nasa.gov/api/horizons.api'

    horizons_params = {
        'format': 'text',
        'COMMAND': f"'{command}'",
        'EPHEM_TYPE': "'VECTORS'",
        'START_TIME': f"'{start_time}'",
        'STOP_TIME': f"'{stop_time}'",
        'STEP_SIZE': "'1h'",
        'VEC_TABLE': "'2'",       # position + velocity
        'CSV_FORMAT': "'YES'",
    }

    response = requests.get(horizons_url, params=horizons_params)

    start_token = '$$SOE'
    end_token = '$$EOE'

    response_text = response.text
    # print(response_text)   # uncomment for debugging

    start_token_index = response_text.index(start_token)
    end_token_index = response_text.index(end_token)

    data = response_text[start_token_index + len(start_token) : end_token_index]
    return (pd
            .read_csv(StringIO(data),
                      header=None,
                      names=['JDTDB', 'datetime_str',
                             'X', 'Y', 'Z',
                             'VX', 'VY', 'VZ', '_extra'])
           )
```

This is good, except that we want the datetime_str column in an actual datetime dtype. Normally, we could just pass date_format to read_csv and have the column parsed. But here, the format is a bit weird, starting with a space, then 'A.D.', then the date and time in an odd format. I handled this first with str.strip (to remove any leading/trailing whitespace), then str.removeprefix (to remove the extraneous 'A.D. ' prefix).

I then took the resulting string and passed it to pd.to_datetime, giving it a format of '%Y-%b-%d %H:%M:%S.%f', which means: year, month (abbreviated name), day, hour, minutes, seconds, and fractions of a second. (This was much easier, thanks to the https://www.strfti.me/ site.)
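To see the cleanup and the format string in action on a single value (the sample string here is made up, but matches the shape of Horizons output), we can use the standard library's strptime, which shares its format codes with pd.to_datetime:

```python
from datetime import datetime

raw = ' A.D. 2026-Apr-02 02:00:00.0000'

# strip the leading whitespace, then peel off the 'A.D. ' prefix
cleaned = raw.strip().removeprefix('A.D. ')
print(cleaned)                     # 2026-Apr-02 02:00:00.0000

dt = datetime.strptime(cleaned, '%Y-%b-%d %H:%M:%S.%f')
print(dt)                          # 2026-04-02 02:00:00
```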

I wrapped all of that inside of a lambda, and then used assign to create a new column, datetime, with the resulting value.

I then used drop to remove the _extra and also the datetime_str column, which I didn't need any more:

```python
def get_horizons_data(command, start_time, stop_time):
    horizons_url = 'https://ssd.jpl.nasa.gov/api/horizons.api'

    horizons_params = {
        'format': 'text',
        'COMMAND': f"'{command}'",
        'EPHEM_TYPE': "'VECTORS'",
        'START_TIME': f"'{start_time}'",
        'STOP_TIME': f"'{stop_time}'",
        'STEP_SIZE': "'1h'",
        'VEC_TABLE': "'2'",       # position + velocity
        'CSV_FORMAT': "'YES'",
    }

    response = requests.get(horizons_url, params=horizons_params)

    start_token = '$$SOE'
    end_token = '$$EOE'

    response_text = response.text
    # print(response_text)   # uncomment for debugging

    start_token_index = response_text.index(start_token)
    end_token_index = response_text.index(end_token)

    data = response_text[start_token_index + len(start_token) : end_token_index]
    return (pd
            .read_csv(StringIO(data),
                      header=None,
                      names=['JDTDB', 'datetime_str',
                             'X', 'Y', 'Z',
                             'VX', 'VY', 'VZ', '_extra'])
            .assign(datetime = lambda df_: pd.to_datetime(df_['datetime_str'].str.strip().str.removeprefix('A.D. '),
                                                          format='%Y-%b-%d %H:%M:%S.%f'))
            .drop(columns=['_extra', 'datetime_str'])
           )
```

By invoking this function with a command, a starting time, and an ending time, we get back a data frame for the body we're tracking, in one-hour increments.

Get data frames for Orion (command '-1024') and the moon (command '301'), starting at April 2, 2026 at 02:00 and ending on April 10, 2026 at 23:00. Join them together into a single data frame, with the datetime values as the index.

Now that get_horizons_data is defined, we can invoke it twice, once for Orion and once for the moon. Let's start with Orion:

```python
orion_command = -1024
orion_df = get_horizons_data(orion_command,
                             '2026-04-02 02:00',
                             '2026-04-10 23:00')
```

This gives us a data frame with 214 rows and 8 columns. We can then repeat this for the moon:

```python
moon_command = 301
moon_df = get_horizons_data(moon_command,
                            '2026-04-02 02:00',
                            '2026-04-10 23:00')
```

Once again, we get a data frame with 214 rows and 8 columns.

To combine these, we can use either join or merge. The former combines two data frames via their indexes, whereas merge does so via any two columns we name (or a single column we specify with on).
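Here's a tiny contrast between the two, on toy data frames (the column names are made up for illustration):

```python
import pandas as pd

left = pd.DataFrame({'k': [1, 2], 'a': [10, 20]})
right = pd.DataFrame({'k': [1, 2], 'b': [30, 40]})

# merge aligns rows on a named column...
merged = left.merge(right, on='k')
print(list(merged.columns))        # ['k', 'a', 'b']

# ...while join aligns on the indexes, so we set them first
joined = left.set_index('k').join(right.set_index('k'))
print(list(joined.columns))        # ['a', 'b']
```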

I decided to use merge, combining orion_df with moon_df. I specified that we'll perform the merge on the datetime column in each.

However, there's a wrinkle in merging the two data frames: they have identically named columns, thanks to the fact that they were both created by invoking get_horizons_data. By default, merge resolves the clash by tacking _x and _y onto the overlapping column names, which isn't very descriptive.

The solution is to pass the suffixes keyword argument, giving it a two-element list of strings. The first string is added to column names from the calling data frame (orion_df, in this case), and the second string is added to column names from the data frame passed as the first argument (moon_df, in this case).

Note that because datetime is the column on which the merge is performed, it remains with its original name. It doesn't get a suffix, and isn't doubled. I then decided to make the datetime column into the index, by invoking set_index:

```python
df = (
    orion_df
    .merge(moon_df, on='datetime', suffixes=['_orion', '_moon'])
    .set_index('datetime')
)
```

This gave us a new data frame with 214 rows (same as before) and 14 columns. There were 8 columns in each of the original data frames, so you might think we would get 16 back. But datetime is shared between them, giving us 15 columns – and then we moved it to be the index, so it isn't counted, giving us a total of 14.