BW #61: Solar eclipse

Much of North America enjoyed a total eclipse of the sun on Monday. This week, we'll look at NASA data about the eclipse, calculating who got to enjoy totality, and for how long.

Some of the biggest news to come out of North America this week was the total eclipse of the sun, which took place on Monday. Although I didn't see it myself, I saw lots of photos from friends and family in the United States. Plus, one of the students in the corporate training I'm doing this week went outside to see the eclipse, and shared his view with us via his computer's camera.

I discovered that there will be a partial solar eclipse in Israel in 2027. And the next total solar eclipse here? It won't take place until after the year 2200, which means I'm unlikely to get to enjoy it. Oh, well.

This week, we'll look at data that NASA has collected about the eclipse that took place on Monday. We'll figure out which places saw a total eclipse, how long totality lasted, and even create plots that'll show us where the eclipse was most clearly visible.

Data and seven questions

This week's data comes from NASA's 2024 Eclipse page:

https://svs.gsfc.nasa.gov/5073

We'll first look at the JSON file with information about US cities, and when they can expect to see the eclipse in various forms:

https://svs.gsfc.nasa.gov/vis/a000000/a005000/a005073/cities-eclipse-2024.json

The NASA page has a data dictionary of sorts -- but as we'll see, it's not as useful as we might have hoped.
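Before trusting the data dictionary, it can help to take a quick first look at the file's shape. Here's a minimal sketch, using a tiny made-up sample rather than the real NASA file. Only the ECLIPSE column name comes from the description above; the other field names (STATE, NAME, LAT, LON), the time format, and all of the values are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the NASA file: two invented records.
# Apart from ECLIPSE, the field names and every value here are assumptions.
sample_json = """
[
  {"STATE": "TX", "NAME": "Exampleville", "LAT": 31.0, "LON": -97.0,
   "ECLIPSE": ["17:20:00", "18:00:00", "18:38:00",
               "18:40:00", "19:20:00", "20:00:00"]},
  {"STATE": "NY", "NAME": "Sampletown", "LAT": 42.0, "LON": -75.0,
   "ECLIPSE": ["18:10:00", "18:50:00", "19:25:00",
               "20:00:00", "20:35:00"]}
]
"""

# read_json accepts a file-like object; for the real data, you could
# pass the URL of the cities JSON file instead of the StringIO object.
df = pd.read_json(io.StringIO(sample_json))

# Each ECLIPSE cell holds a Python list, so the column has dtype object.
print(df.dtypes)

# Count how many values each city's ECLIPSE list contains.
eclipse_lengths = df["ECLIPSE"].apply(len)
print(eclipse_lengths.value_counts())
```

Counting the list lengths like this is a cheap way to check whether the data actually matches its documentation before you start reshaping it.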

Learning goals for this week include working with JSON, manipulating data frames with arrays (lists) of data, working with dates and times, using the new case_when syntax, and plotting.

Here are my seven questions and tasks for this week:

  • Load the cities data into a data frame. Replace the ECLIPSE column, which contains a list of times, with six datetime columns. The date for all of them should be April 8th, 2024.
  • The NASA documentation says that the ECLIPSE column contains an array of five values, representing the start of the eclipse, 50% coverage, 100% coverage, 50% coverage, and the end of the eclipse. (Note that "100% coverage" doesn't mean totality; rather, it means the maximum coverage for a given location.) However, this column sometimes contains five values and sometimes contains six. Without peeking at the next questions, what is your best guess about what these values mean? What do you think about the structure of this data?