BW #75: Refugees
[Sorry for the delayed publication; I had another packed, late day of corporate training and office hours. Solutions should come on time Thursday!]
I was in Prague last week for the EuroPython conference, which was delightful in every possible way. I was chatting with my driver on the way to the airport, and he reminded me that while Prague (and the Czech Republic in general) is quite peaceful, the Russian invasion of Ukraine is happening relatively close by. I knew that the war had created a huge number of refugees, especially women and children, and that a number of countries, including the Czech Republic, had absorbed many of them.
This got me thinking: How many Ukrainian refugees are there? Indeed, how many refugees are there in general around the world? Has the number increased over the years? Where do refugees come from, and where do they go?
This week, we'll thus look at data about international refugees, and try to understand the countries and numbers involved. This is obviously a large and complex topic, one filled with lots of heartbreak and pain, as well as political turmoil both in the countries that refugees leave and in those where they arrive. Immigration and refugees are hotly debated topics in many countries.
Data and seven questions
This week's data comes from three files, all produced by the World Bank:
- The population of each country and area, per year: https://api.worldbank.org/v2/en/indicator/SP.POP.TOTL?downloadformat=csv
- The number of refugees who left each country and area, per year: https://api.worldbank.org/v2/en/indicator/SM.POP.REFG.OR?downloadformat=csv
- The number of refugees who arrived in each country and area, per year: https://api.worldbank.org/v2/en/indicator/SM.POP.REFG?downloadformat=csv
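If you want a starting point for reading files like these, here is a minimal sketch. It assumes that each CSV has a few metadata lines before the real header row, and that the data is in wide format, with one column per year. The sample text, the `load_indicator` helper, and the `skiprows=4` default are all illustrative assumptions, with made-up numbers, so check them against the files you actually download.

```python
import io

import pandas as pd

# Synthetic stand-in for one of the CSV files (made-up countries and numbers).
# Assumption: four lines of metadata come before the real header row,
# and every line ends with a trailing comma.
SAMPLE_CSV = '''"Data Source","World Development Indicators",

"Last Updated Date","2024-01-01",

"Country Name","Country Code","Indicator Name","Indicator Code","2000","2023",
"Country A","AAA","Population, total","SP.POP.TOTL","100","150",
"Country B","BBB","Population, total","SP.POP.TOTL","200","",
'''


def load_indicator(source, skiprows=4):
    """Hypothetical helper: return a data frame indexed by country name,
    keeping only the per-year columns."""
    df = pd.read_csv(source, skiprows=skiprows)

    # The trailing commas produce an empty, unnamed column; drop it.
    df = df.dropna(axis="columns", how="all")

    df = df.set_index("Country Name")

    # Keep only columns whose names look like years, such as "2000".
    year_columns = [name for name in df.columns if name.isdigit()]
    return df[year_columns]


population = load_indicator(io.StringIO(SAMPLE_CSV))
print(population)
```

With a real file, you would pass the path of the CSV instead of the `io.StringIO` object. Note that the year column names remain strings here, not integers.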
Learning goals include combining data frames, working with multi-indexes, grouping, filtering, working with time data, and resampling.
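As a small illustration of combining data frames under a column multi-index, here is a sketch that uses `pd.concat` with the `keys` argument. The three tiny data frames contain made-up numbers for two hypothetical countries; this is one possible approach, not necessarily the one you'll want for the real data.

```python
import pandas as pd

years = ["2000", "2023"]
countries = ["Country A", "Country B"]

# Made-up numbers, only to demonstrate the mechanics.
origin = pd.DataFrame([[5, 7], [0, 2]], index=countries, columns=years)
destination = pd.DataFrame([[1, 3], [4, 9]], index=countries, columns=years)
population = pd.DataFrame([[100, 150], [200, 260]], index=countries, columns=years)

# Concatenating along the columns with keys= adds an outer column level,
# so each original data frame ends up under its own top-level name.
combined = pd.concat(
    [origin, destination, population],
    axis="columns",
    keys=["origin", "destination", "population"],
)

# Selecting a top-level name returns all of the years beneath it.
print(combined["destination"])

# A tuple selects one specific (top level, year) column.
print(combined.loc["Country B", ("destination", "2023")])
```

Because the three inputs share the same index, `pd.concat` lines up the rows by country name; rows missing from one input would be filled with `NaN`.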
Here are this week's seven challenges and questions. I'll be back on Thursday with my solutions, including my Jupyter notebook:
- Create a single data frame whose index is made up of country names. The columns will be a multi-index with top-level names "origin", "destination", and "population". The lower level of the multi-index should contain the years; you can remove other columns.
- Which 10 countries accepted the most refugees in 2000, and which 10 in 2023?