I'm writing from Taiwan, where I'll be participating in PyCon Taiwan this coming weekend, including giving two talks. This is my third trip to Taiwan, and I have come to really enjoy it – not only because of the food and beautiful hikes, but also because I get a chance to practice my Chinese.
Living in Israel, I'm used to hot weather. But Taiwan is rather different – it's hot, but it's also quite humid, and it also rains a lot at this time of year. (Israel doesn't have any rain from about April to October.) So far, I've managed to avoid getting soaked in a rainstorm, but the humidity level makes it feel like we're hiking through a steam bath, albeit one with gorgeous views.
This week, we'll thus look at a data set from the Taiwanese government describing the climate of Taipei, which is where PyCon Taiwan will be held. It'll give you a chance not only to understand something about Taipei's weather, but also to learn how to wrestle data into submission when it arrives formatted in ways you didn't expect.
Data and five questions
This week's data set is a CSV file provided by the government of Taipei City. The page from which you can download the data is at https://data.gov.tw/en/datasets/145785, and the CSV file itself can be downloaded from https://tsis.dbas.gov.taipei/statis/webMain.aspx?sys=220&ymf=8701&kind=21&type=0&funid=a04000101&cycle=1&outmode=12&compmode=00&outkind=1&deflst=2&nzo=1 .
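Before you can rename anything, you have to read the file, and with Chinese text that means getting the encoding right. I'm not certain which encoding this particular Taipei file uses: Taiwanese government CSVs are sometimes UTF-8 (often with a byte-order mark) and sometimes Big5/cp950. Here's a minimal sketch, under that assumption, that tries each candidate encoding in turn; the helper name read_taipei_csv is my own, hypothetical, and not part of pandas.

```python
import pandas as pd

def read_taipei_csv(path, encodings=("utf-8-sig", "cp950")):
    """Try each encoding in turn and return the first successful parse.

    "utf-8-sig" reads UTF-8 files with or without a byte-order mark;
    "cp950" is Microsoft's variant of Big5, common in older Taiwanese data.
    Which one the Taipei file actually uses is an assumption.
    """
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding)
        except UnicodeDecodeError as err:
            # Wrong guess: remember the error and try the next encoding
            last_error = err
    raise last_error
```

You would call it as df = read_taipei_csv("taipei_climate.csv"), where the filename is hypothetical; use whatever name your download was saved under. Trying UTF-8 first makes sense because Big5 bytes are almost never valid UTF-8, so a wrong first guess fails loudly rather than producing garbage.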
Learning goals for this week include working with non-Latin characters, renaming columns, handling date-time information, pivot tables, styling, and plotting.
Paid subscribers, including members of my LernerPython+data membership program at https://LernerPython.com, can download a copy of the data from a link at the bottom of this message. Paid subscribers can also download my Jupyter/Marimo notebooks (provided with tomorrow's solutions) and participate in monthly office hours.
I'll be back tomorrow with my solutions and explanations. Meanwhile, here are my questions and tasks:
- Load the CSV file into a data frame. Convert the column names (which are originally in Chinese) into English, using the "data fields" list on the data page. Also convert the "statistical period" column into datetime values, keeping in mind that the dates in the original data are written as "87年 1月", where the first number is the Republic of China year (i.e., where year 1 is 1912, so year 87 is 1998) and the second number is the month, so "1月" means January and "2月" means February.
- Create a new data frame based on this one in which the years are the rows, the months are the columns, and the values are the total rainfall for that month. Style the data frame such that the heavier the rain, the darker the background color of the cell, limiting the floats to two digits after the decimal point. Which months (and years) stand out?