Women In Movies: Bechdel Test Results Over Time

Jennifer Smith

June 7, 2022

For this project, I explored a dataset of movies and whether they pass or fail the "Bechdel Test", a well-known test of how women are portrayed in movies. To pass, a movie must meet three criteria: it has at least two named female characters, they talk to each other, and they talk about something other than men. The data was downloaded from: https://www.kaggle.com/datasets/mathurinache/women-in-movies?resource=download
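The three criteria can be read as an ordered checklist, where a movie is labeled by the first criterion it fails. Here is a minimal sketch of that logic. The labels "ok" and "notalk" appear in the dataset's `clean_test` column, but "nowomen" and "men" are labels I am assuming for illustration, and the function names are hypothetical; this is not necessarily how the dataset itself was coded.

```python
def bechdel_category(named_women: int, women_talk: bool, talk_not_about_men: bool) -> str:
    """Return a label for the first criterion a movie fails, or 'ok' if it meets all three."""
    if named_women < 2:
        return "nowomen"   # assumed label: fewer than two named female characters
    if not women_talk:
        return "notalk"    # label seen in the dataset: the women never talk to each other
    if not talk_not_about_men:
        return "men"       # assumed label: they only talk about men
    return "ok"            # label seen in the dataset: all three criteria are met


def bechdel_binary(category: str) -> str:
    """Collapse a category label into PASS/FAIL; only 'ok' counts as a pass in this sketch."""
    return "PASS" if category == "ok" else "FAIL"


print(bechdel_binary(bechdel_category(2, True, True)))
print(bechdel_binary(bechdel_category(3, False, False)))
```

Checking the criteria in order means each failing movie gets exactly one label, which is what makes a single pass/fail column possible.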

In [2]:
import pandas as pd
movies = pd.read_csv('movies.csv')
movies.head()
Out[2]:
year imdb title test clean_test binary budget domgross intgross code budget_2013$ domgross_2013$ intgross_2013$ period code decade code
0 1970 tt0065466 Beyond the Valley of the Dolls ok ok PASS 1000000 9000000.0 9000000.0 1970PASS 5997631 53978683.0 53978683.0 NaN NaN
1 1971 tt0067065 Escape from the Planet of the Apes notalk notalk FAIL 2500000 12300000.0 12300000.0 1971FAIL 14386286 70780525.0 70780525.0 NaN NaN
2 1971 tt0067741 Shaft notalk notalk FAIL 53012938 70327868.0 107190108.0 1971FAIL 305063707 404702718.0 616827003.0 NaN NaN
3 1971 tt0067800 Straw Dogs notalk notalk FAIL 25000000 10324441.0 11253821.0 1971FAIL 143862856 59412143.0 64760273.0 NaN NaN
4 1971 tt0067116 The French Connection notalk notalk FAIL 2200000 41158757.0 41158757.0 1971FAIL 12659931 236848653.0 236848653.0 NaN NaN
In [39]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="white")

sns.relplot(x="year", y="budget_2013$", hue="binary", size="domgross_2013$",
            sizes=(10, 600), alpha=.5, palette="muted",
            height=6, data=movies, legend='brief')

plt.gcf().set_size_inches(15, 8)

plt.xlabel('')
plt.ylabel('Budget (2013 dollars)', fontsize=14)
plt.title('Movies, Bechdel Test result, Budget, Domestic Revenue', fontsize=18)
Out[39]:
Text(0.5, 1.0, 'Movies, Bechdel Test result, Budget, Domestic Revenue')

6/3 Update: I was able to adjust the size of the plot since my first post! It's definitely much easier to view now, but I still think my bar and line charts tell the story more clearly.

This was the first visualization I tried. It captures the raw number of movies that pass or fail the test over time, as well as the budget of those movies and the domestic revenue they brought in. I do think this is an interesting visual, and when studying it you can draw some conclusions, but for many purposes I think it is likely too much information.

In [41]:
count_per_year = movies.pivot_table('test', index = 'year', columns = 'binary', aggfunc = 'count')
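# Series.add with fill_value=0 treats a missing count as 0, so years with only passing or only failing movies still get a total
count_per_year['Total'] = count_per_year['FAIL'].add(count_per_year['PASS'], fill_value=0)
count_per_year.head()
Out[41]:
binary FAIL PASS Total
year
1970 NaN 1.0 1.0
1971 5.0 NaN 5.0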
1972 2.0 1.0 3.0
1973 4.0 1.0 5.0
1974 5.0 2.0 7.0
In [44]:
import numpy as np
import matplotlib.pyplot as plt

N = len(count_per_year)
ind = np.arange(N)

plt.figure(figsize=(16,8))
plt.grid(visible=False, which='both', axis='x')
plt.grid(visible=True, which='both', axis='y', color='lightgrey', linestyle='-', linewidth=0.5)

p1 = plt.bar(ind, count_per_year['Total'], color = 'mediumaquamarine')
p2 = plt.bar(ind, count_per_year['FAIL'], color = 'grey')

plt.ylabel('Count of movies', fontsize=14)
plt.title('Movies that passed and failed Bechdel Test by year', fontsize=20)
plt.xticks(ind, count_per_year.index.values, rotation=45, fontsize=12)
plt.yticks(np.arange(0, 140, 10), fontsize=13)
plt.legend((p1[0], p2[0]), ('Passed', 'Failed'), fontsize=16, frameon=True, 
           facecolor='white', edgecolor="white", borderpad=1, ncol=2)

plt.figtext(0.18, 0.69, 'Requirements to Pass:', fontsize=18, fontweight='bold', color='grey', backgroundcolor='white')
plt.figtext(0.18, 0.58, ' 1. At least 2 named female characters\n 2. They talk to each other\n 3. About something other than men',
            fontsize=17, color='grey', linespacing=2, wrap=True, backgroundcolor='white')

plt.show()

After the first visualization that tried to do too much, I wanted something straightforward that would quickly convey how many movies still don't pass the test (by count). I think this stacked bar chart shows the difference quite clearly. To get this to work I first created a pivot table from the original dataset, then used the new DataFrame for the visualization. I also played around a lot with formatting - colors, gridlines, fonts - and included text to explain the test for viewers who might not be familiar.
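The pivot step can be sketched on a few toy rows. Years with only passing or only failing movies produce NaN cells in the pivot table (1970 and 1971 in the output above), so this sketch passes `fill_value=0` to get zero counts instead. The toy rows only mirror the 1970 to 1972 counts, the `test` labels are placeholders, and the `pass_share` column is a hypothetical addition of mine rather than part of the original analysis.

```python
import pandas as pd

# Toy rows mirroring the 1970-1972 counts shown above; the 'test' labels are placeholders
toy = pd.DataFrame({
    "year":   [1970, 1971, 1971, 1971, 1971, 1971, 1972, 1972, 1972],
    "binary": ["PASS", "FAIL", "FAIL", "FAIL", "FAIL", "FAIL", "FAIL", "FAIL", "PASS"],
    "test":   ["ok", "notalk", "notalk", "notalk", "notalk", "notalk", "notalk", "notalk", "ok"],
})

# Count movies per year and result; fill_value=0 replaces missing combinations with 0
counts = toy.pivot_table(values="test", index="year", columns="binary",
                         aggfunc="count", fill_value=0)

# With zeros instead of NaN, the total is defined for every year
counts["Total"] = counts["FAIL"] + counts["PASS"]

# Hypothetical extra column: fraction of that year's movies that pass
counts["pass_share"] = counts["PASS"] / counts["Total"]

print(counts)
```

With zero-filled counts, the bars could also be stacked directly by passing the failed counts as the `bottom` argument of `plt.bar` for the passed bars, instead of drawing the failed bars over the totals.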

In [5]:
budget = movies.pivot_table('budget_2013$', index = 'year', columns = 'binary', aggfunc='sum')
budget.head()
Out[5]:
binary FAIL PASS
year
1970 NaN 5997631.0
1971 493236323.0 NaN
1972 61293532.0 66866.0
1973 125732851.0 62926730.0
1974 34388727.0 103921974.0
In [6]:
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 8))

plt.grid(visible=False, which='both', axis='x')
plt.grid(visible=True, which='major', axis='y', color='lightgrey', linestyle='-', linewidth=1)
plt.grid(visible=True, which='minor', axis='y', color='lightgrey', linestyle='-', linewidth=0.5)

line1, = ax.plot(budget.index.values, budget['FAIL'], label='Failed', color = 'grey', linewidth=4)
line2, = ax.plot(budget.index.values, budget['PASS'], label='Passed', color= 'mediumaquamarine', linewidth=2)

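def currency(x, pos):
    # Format a tick value in dollars as billions, millions, or thousands
    if x >= 1e9:
        s = '${:1.1f}B'.format(x*1e-9)
    elif x >= 1e6:
        s = '${:1.0f}M'.format(x*1e-6)
    else:
        s = '${:1.0f}K'.format(x*1e-3)
    return s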

ax.yaxis.set_major_formatter(currency)
plt.title('Budgets of Movies that passed and failed Bechdel Test by year', fontsize=20)
plt.ylabel('Budget of movies (in 2013 dollars)', fontsize=14)
plt.xticks(budget.index.values, rotation=45, fontsize=12)
plt.yticks(fontsize=13)

plt.figtext(0.15, 0.65, 'Takeaway: the movie industry continues to spend \nfar more on movies that fail the Bechdel Test', 
            fontsize=18, color='grey', backgroundcolor='white', linespacing=2)

ax.legend(fontsize=16, frameon=True, loc='upper center',
           facecolor='white', edgecolor='white', borderpad=1, ncol=2)
plt.show()

To go along with the preceding bar chart, I decided to try a line chart that shows the budgets of movies that pass and fail over time. I wanted to look at the budgets to make the point that the movie industry is perpetuating the problem by continuing to underfund movies with solid female representation. Yes, budgeting for movies that pass has grown over time, but budgeting for movies that fail has grown faster in the same amount of time.
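One caveat with yearly budget totals is that they mix two things: how many movies fall in each group and how much each movie received. A rough way to separate them is to look at the count and the per-movie mean alongside the sum. The sketch below uses only the five rows shown in the `movies.head()` output above, so it illustrates the mechanics and is not a result for the full dataset.

```python
import pandas as pd

# The five rows shown in the movies.head() output above
sample = pd.DataFrame({
    "year":         [1970, 1971, 1971, 1971, 1971],
    "binary":       ["PASS", "FAIL", "FAIL", "FAIL", "FAIL"],
    "budget_2013$": [5997631, 14386286, 305063707, 143862856, 12659931],
})

# For each year and result: total budget, average budget per movie, number of movies
summary = (
    sample.groupby(["year", "binary"])["budget_2013$"]
          .agg(["sum", "mean", "count"])
)

print(summary)
```

Running the same aggregation on the full `movies` DataFrame would show whether the gap in totals comes mostly from the number of failing movies or from larger budgets per movie.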
