Bamboo Weekly #176: Religious restrictions (solutions)

Get better at: Working with CSV files, reshaping data, pivot tables, grouping, and plotting.

This week, we looked at data from the Pew Research Center's recently released report on religious restrictions and hostilities in countries around the world (https://www.pewresearch.org/religion/2026/06/15/more-countries-had-elevated-levels-of-social-hostilities-involving-religion-in-2023/ and https://www.pewresearch.org/religion/feature/religious-restrictions-around-the-world/). Pew has released such reports for more than 10 years, giving us insights into how governments and societies think about and enforce religious rules — among members of their own religion, and with (or against) people from other religions.

This week, we looked at Pew's historical data, which goes through 2022. We also looked at a small part of the latest data, including 2023, which was used in their most recent report.

Data and five questions

The Pew data can be downloaded from https://www.pewresearch.org/dataset/dataset-global-restrictions-on-religion-2007-2022/ (click on "Download dataset"), or you can use this link to get the data in CSV format, along with a data dictionary explaining the (very large) number of measures:

https://www.pewresearch.org/wp-content/uploads/sites/20/2025/12/Global-Restrictions-on-Religion-2007-2022-Dataset.zip

Last week's Pew report included data from 2023; we'll use only one of those downloadable files, the one listing social hostilities around the world. Go to the appropriate chart on https://www.pewresearch.org/religion/2026/06/15/more-countries-had-elevated-levels-of-social-hostilities-involving-religion-in-2023/, and click on "data," then "download CSV."

Paid subscribers, both to Bamboo Weekly and to my LernerPython+data membership program (https://LernerPython.com), get all of the questions and answers, as well as downloadable data files, downloadable versions of my notebooks, one-click access to my notebooks, and invitations to monthly office hours.

Learning goals for this week include grouping, reshaping data, pivot tables, and plotting with Plotly.

Here are my five questions for this week, along with my solutions and explanations:

Read the main CSV file into a Pandas data frame. From the 2022 data, were any countries among the 10 highest GRI (government restrictions index) and also the 10 highest SHI (social hostilities index) based on religion?

As usual, I started by loading Pandas and Plotly:

import pandas as pd
from plotly import express as px

With that out of the way, I loaded the CSV file containing the data using the read_csv function.

This is one of those rare times when we could just read the data in, as is, without any filtering, restrictions, or type changes. It's tempting to select a subset of the rows or columns, but the entire thing takes up 2 MB of memory, a tiny data set for a modern computer:

filename = 'data/bw-176-pew-religion.csv'
df = pd.read_csv(filename)
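If you want to check the memory claim on your own machine, here is a minimal sketch of how that might look. It reads a tiny, made-up CSV from an in-memory string rather than the real file, so the country names and values are assumptions for illustration only; the same memory_usage call should work on the real df.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file: the column names match
# the ones used in this post, but the countries and values are made up.
csv_text = """Ctry_EditorialName,Question_Year,GRI,SHI
Atlantis,2007,3.1,2.0
Atlantis,2022,2.5,1.5
Lilliput,2007,5.0,4.2
Lilliput,2022,5.4,4.0
"""

toy_df = pd.read_csv(io.StringIO(csv_text))

# deep=True asks pandas to count the memory used by the string values
# themselves, not just the references to them
total_bytes = toy_df.memory_usage(deep=True).sum()
print(f'{total_bytes / 1_000_000:.4f} MB')
```

Calling toy_df.info(memory_usage='deep') gives a similar total, along with per-column dtypes.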

I asked you to find the 10 countries with the highest GRI (government restrictions index), indicating that government policies take religion into account. Because I want to use this later, I performed the query and then assigned the result to a variable, highest_gri.

The query itself started off by filtering df to keep only data from the most recent year in this file, namely 2022. I did that with a combination of loc and pd.col, specifying that I only wanted rows from 2022.

But loc can take an optional second argument, which can be a string (for a single column) or a list (to specify a number of columns). I passed the two columns we'll use in this analysis — the country name and the GRI. The result was a much smaller data frame than we had before.

I then invoked set_index to move the country name into the index, then used nlargest to get the 10 countries seen as most restrictive on religion. Notice that nlargest on a data frame needs to be told which column to sort by, even though there's only one column in this particular data frame.

Here's the full query:


highest_gri = (
    df
    .loc[pd.col('Question_Year') == 2022,
        ['Ctry_EditorialName', 'GRI']]
    .set_index('Ctry_EditorialName')
    .nlargest(columns='GRI', n=10)
)

Here are the results from that query:

Ctry_EditorialName	GRI
China	9.085
Egypt	8.37
Afghanistan	8.22
Iran	8.2
Syria	7.895
Indonesia	7.855
Russia	7.66
Azerbaijan	7.485
Algeria	7.455
Malaysia	7.305

This shows that the most restrictive government regarding religion is China, followed by Egypt, Afghanistan, Iran, and Syria. (Note that Syria reflects 2022, before the Assad regime was toppled.)
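To see the same pattern on something small enough to check by hand, here is a sketch using made-up data (the countries and scores are invented; only the column names come from the real file). It uses a lambda as the row selector instead of pd.col, on the assumption that you might be running a version of Pandas that doesn't yet have pd.col; the two play the same role here.

```python
import pandas as pd

# Made-up toy data; only the column names come from the real file
toy_df = pd.DataFrame({
    'Ctry_EditorialName': ['Atlantis', 'Lilliput', 'Narnia',
                           'Atlantis', 'Lilliput', 'Narnia'],
    'Question_Year': [2007, 2007, 2007, 2022, 2022, 2022],
    'GRI': [3.1, 5.0, 1.2, 2.5, 5.4, 7.0],
    'SHI': [2.0, 4.2, 0.5, 1.5, 4.0, 0.8],
})

top_two = (
    toy_df
    # First argument to loc: a callable that returns a boolean series
    # Second argument: the list of columns to keep
    .loc[lambda df_: df_['Question_Year'] == 2022,
         ['Ctry_EditorialName', 'GRI']]
    .set_index('Ctry_EditorialName')
    # Even with one column left, nlargest needs the column name
    .nlargest(columns='GRI', n=2)
)
print(top_two)
```

With this toy data, only the three 2022 rows survive the filter, and nlargest keeps the two with the highest GRI.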

What if we look not at government policy, but at hostility among the population toward religion (or more precisely, toward members of one or more particular religions)? I repeated the query, substituting SHI for GRI:

highest_shi = (
    df
    .loc[pd.col('Question_Year') == 2022,
        ['Ctry_EditorialName', 'SHI']]
    .set_index('Ctry_EditorialName')
    .nlargest(columns='SHI', n=10)
)

The results:

Ctry_EditorialName	SHI
India	9.2923
Nigeria	8.7154
Syria	8.0769
Pakistan	7.8846
Iraq	7.8231
Egypt	7.3692
Afghanistan	7.3077
Israel	7.1154
Libya	6.4769
Palestinian territories	6.2846

A different picture emerges here: China is nowhere to be found, but at the top of the list are India and Nigeria.

My question was whether any countries are on both of these lists, with high government restrictions on religion and also high levels of social tension and hostility among the population.

To solve this, I used the fact that indexes on Pandas data frames are special objects with their own methods – including intersection, which allows us to find which elements are common between two indexes:

highest_shi.index.intersection(highest_gri.index)

This returns a new index object containing the elements that are common to the two inputs:

Index(['Syria', 'Egypt', 'Afghanistan'], dtype='str', name='Ctry_EditorialName')

We can see that Syria, Egypt, and Afghanistan are all common to both lists. So it's not just that the government will give you trouble over religion in these countries – people on the street, and in day-to-day life, might do so, as well.
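Here is a minimal sketch of intersection on two small indexes, using made-up names rather than the real results. It also shows difference, a related index method, in case you want the entries that appear on one list but not the other.

```python
import pandas as pd

# Two made-up "top" lists, standing in for highest_gri.index
# and highest_shi.index
top_gri = pd.Index(['Atlantis', 'Lilliput', 'Narnia'],
                   name='Ctry_EditorialName')
top_shi = pd.Index(['Narnia', 'Oz', 'Atlantis'],
                   name='Ctry_EditorialName')

# Elements that appear in both indexes
both = top_shi.intersection(top_gri)
on_both = set(both)

# Elements in top_shi that do not appear in top_gri
only_shi = set(top_shi.difference(top_gri))

print(both)
print(only_shi)
```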

Which 10 countries had the greatest improvement (i.e., lower score) in GRI from 2007 to 2022? Which 10 countries showed such improvements in the area of SHI?

Next, I was curious to know which countries had improved on each of these general scores, moving from a higher GRI or SHI in 2007 to a lower one in 2022.

I started by filtering the data frame with loc. My row selector was pd.col, specifying the Question_Year column. I'm a big fan of the isin method, which allows us to avoid the use of | for "or" logical queries in Pandas. This allowed me to keep just data from the years 2007 and 2022.

For the second argument, I passed a list of three strings (i.e., column names):

(
    df
    .loc[pd.col('Question_Year').isin([2007, 2022]),
        ['Ctry_EditorialName', 'Question_Year', 'GRI']]
)
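To illustrate why isin is convenient, here is a small sketch comparing it with the equivalent | expression on a made-up series of years. Note that the | version needs parentheses around each comparison, because | binds more tightly than == in Python.

```python
import pandas as pd

# A made-up series of years, standing in for df['Question_Year']
years = pd.Series([2007, 2010, 2015, 2022], name='Question_Year')

# The "or" version: each comparison must be wrapped in parentheses
with_or = (years == 2007) | (years == 2022)

# The isin version: one call, and easy to extend with more years
with_isin = years.isin([2007, 2022])

print(with_isin.tolist())
```

Both approaches produce the same boolean series, which can then be used as a row selector in loc.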

In order to calculate the difference between each country's score in 2007 and in 2022, I needed to reformat the data frame. I did this with pivot_table, indicating that the distinct country names should be in the index, the distinct years (really, 2007 and 2022) should be the column names, and that the intersection should use the mean GRI. (Since there's only one GRI value for each country in each year, the use of mean doesn't do much. You could even say that it's meaningless.)

Here's how that part of the query looked:


(
    df
    .loc[pd.col('Question_Year').isin([2007, 2022]),
        ['Ctry_EditorialName', 'Question_Year', 'GRI']]
    .pivot_table(index='Ctry_EditorialName',
                 columns='Question_Year',
                 values='GRI',
                 aggfunc='mean')
)
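Here is a sketch of that reshaping on a tiny, made-up data frame, so that you can see the long-to-wide change directly. The countries and scores are invented for illustration.

```python
import pandas as pd

# Made-up "long" data: one row per country per year
toy_long = pd.DataFrame({
    'Ctry_EditorialName': ['Atlantis', 'Atlantis', 'Lilliput', 'Lilliput'],
    'Question_Year': [2007, 2022, 2007, 2022],
    'GRI': [3.1, 2.5, 5.0, 5.4],
})

# "Wide" data: one row per country, one column per year
toy_wide = toy_long.pivot_table(index='Ctry_EditorialName',
                                columns='Question_Year',
                                values='GRI',
                                aggfunc='mean')
print(toy_wide)
```

Because each country has exactly one value per year, the mean of each cell is just that value. If you would rather get an error when a country-year pair unexpectedly appears twice, the pivot method (without an aggregation function) is one alternative to consider.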

I then used diff to calculate the change in GRI over the years, specifying axis='columns' so that it would calculate from left to right, rather than top to bottom. Because there is nothing to the left of the 2007 column, diff fills it with NaN, and the 2022 column contains each country's 2022 score minus its 2007 score. I then used [] to retrieve just the column for 2022, ran nsmallest(10) to get the 10 smallest values (i.e., those that had declined the most over the years), and then invoked round to cut out some of the unnecessary noise from the data.

The full query was thus:


(
    df
    .loc[pd.col('Question_Year').isin([2007, 2022]),
        ['Ctry_EditorialName', 'Question_Year', 'GRI']]
    .pivot_table(index='Ctry_EditorialName',
                 columns='Question_Year',
                 values='GRI',
                 aggfunc='mean')
    .diff(axis='columns')
    [2022]
    .nsmallest(10)
    .round(2)
)

The results:

Ctry_EditorialName	2022
Monaco	-1.4
Comoros	-1.1
Saudi Arabia	-1.07
Belgium	-1.07
Sudan	-0.9
Gabon	-0.86
Turkmenistan	-0.81
Brunei	-0.74
Colombia	-0.74
Myanmar	-0.72
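If the behavior of diff across columns isn't obvious, this sketch on a made-up wide data frame may help. It shows that the first column comes back as NaN, and that the 2022 column matches a plain subtraction of the 2007 column from the 2022 column.

```python
import pandas as pd

# Made-up wide data, in the same shape that pivot_table returned
toy_wide = pd.DataFrame(
    {2007: [3.0, 5.0, 1.5], 2022: [2.5, 5.5, 1.0]},
    index=pd.Index(['Atlantis', 'Lilliput', 'Narnia'],
                   name='Ctry_EditorialName'),
)

# Each column minus the column to its left; the first column
# has nothing to its left, so it is filled with NaN
changes = toy_wide.diff(axis='columns')

# Keep only the 2022 column: the 2022 score minus the 2007 score
drop_2022 = changes[2022]

# The same numbers, via explicit subtraction
direct = toy_wide[2022] - toy_wide[2007]

print(changes)
```

With only two columns, the explicit subtraction is arguably just as readable; diff becomes more useful when there are many years and you want every year-over-year change at once.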

I then repeated this query for SHI:

(
    df
    .loc[pd.col('Question_Year').isin([2007, 2022]),
        ['Ctry_EditorialName', 'Question_Year', 'SHI']]
    .pivot_table(index='Ctry_EditorialName',
                 columns='Question_Year',
                 values='SHI',
                 aggfunc='mean')
    .diff(axis='columns')
    [2022]
    .nsmallest(10)
    .round(2)
)

The result:

Ctry_EditorialName	2022
Comoros	-5.64
Saudi Arabia	-5.07
Romania	-4.24
Ghana	-3.79
Timor-Leste	-3.78
Indonesia	-3.52
Greece	-3.34
Western Sahara	-3.27
Sudan	-3.02
Brunei	-3.02

So Saudi Arabia's government has become less restrictive over the years (its GRI fell by 1.07), and hostility among the general population has dropped far more sharply at the same time (its SHI fell by 5.07). Comoros, Sudan, and Brunei also appear on both lists.

By the way, if you're wondering where Comoros is, it's a set of islands off the coast of Africa, with fewer than 1m inhabitants, almost all of whom are Muslim (https://en.wikipedia.org/wiki/Comoros).
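Since the GRI and SHI queries differ only in the column name, one option is to wrap the query in a function. The sketch below does that; biggest_drops is a hypothetical helper name (it isn't part of Pandas or of my original notebook), the toy data is made up, and I've used a plain boolean mask rather than pd.col so that it also runs on older versions of Pandas.

```python
import pandas as pd

def biggest_drops(df, column, first_year=2007, last_year=2022, n=10):
    """Hypothetical helper: the n countries whose score in `column`
    fell the most between first_year and last_year.
    Assumes first_year < last_year."""
    return (
        df
        .loc[df['Question_Year'].isin([first_year, last_year]),
             ['Ctry_EditorialName', 'Question_Year', column]]
        .pivot_table(index='Ctry_EditorialName',
                     columns='Question_Year',
                     values=column,
                     aggfunc='mean')
        .diff(axis='columns')
        [last_year]
        .nsmallest(n)
        .round(2)
    )

# Made-up toy data; only the column names come from the real file
toy_df = pd.DataFrame({
    'Ctry_EditorialName': ['Atlantis', 'Atlantis', 'Lilliput',
                           'Lilliput', 'Narnia', 'Narnia'],
    'Question_Year': [2007, 2022, 2007, 2022, 2007, 2022],
    'GRI': [3.0, 2.5, 5.0, 5.5, 2.0, 1.0],
    'SHI': [2.0, 1.5, 4.0, 3.0, 0.5, 1.0],
})

gri_drops = biggest_drops(toy_df, 'GRI', n=2)
shi_drops = biggest_drops(toy_df, 'SHI', n=2)

# Countries that appear on both "most improved" lists
improved_on_both = gri_drops.index.intersection(shi_drops.index)
print(improved_on_both)
```

Combining the helper with the index intersection technique from the first question gives a quick way to find countries that improved on both measures.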