Developer breaks 'The Simpsons' down by the numbers

In the last 27 years, the writers and animators behind The Simpsons have built a vast fictional world that presents plenty of opportunities for analysis. 

Todd W. Schneider, a developer at Genius, took a deep dive into Simpsons data using scripts pulled from Simpsons World. From there, he wrote code that organizes dialogue by character and then ranks each character by number of words spoken.
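The counting step described above can be sketched in a few lines of Python. This is only an illustration, not Schneider's actual code: the function name `rank_by_words`, the `(character, text)` input format, and the sample lines are all invented here, and splitting on whitespace is an assumed definition of a "word."

```python
from collections import Counter


def rank_by_words(lines):
    """Rank characters by total words spoken.

    `lines` is an iterable of (character, spoken_text) pairs.
    This input format is an assumption made for illustration.
    """
    totals = Counter()
    for character, text in lines:
        # Whitespace splitting is a simplifying assumption about what a word is.
        totals[character] += len(text.split())
    # most_common() returns (character, word_count) pairs, largest count first.
    return totals.most_common()


# Invented sample dialogue, not taken from any real script.
sample_lines = [
    ("Homer", "Mmm donuts"),
    ("Marge", "Homer please be careful"),
    ("Homer", "Why you little"),
    ("Lisa", "Dad"),
]

ranking = rank_by_words(sample_lines)
for character, word_count in ranking:
    print(character, word_count)
```

Dividing each character's total by the sum of all totals would give the kind of percentage share of dialogue the analysis reports.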

Schneider's findings, published in a post on his personal website, cover both main and supporting character dialogue and the locations within the show where these conversations take place. His piece also takes a larger look at the show's declining ratings, and how those numbers stack up against wider trends in TV viewership.

In terms of words spoken by main and supporting characters, the Simpson family itself unsurprisingly accounts for 47% of the show's dialogue. But one of Schneider's more interesting findings has to do with the number of words spoken by female characters. 

According to his analysis, female characters overall account for 25% of the show's dialogue. But take Marge and Lisa out of the equation, and Schneider writes that the percentage drops to below 10%. His analysis highlights a disparity between the number of words spoken by male characters and the number spoken by female characters.

"A look at the show’s list of writers reveals that 9 of the top 10 writers are male," he writes. "I did not collect data on which writers wrote which episodes, but it would make for an interesting follow-up to see if the episodes written by women have a more equal distribution of dialogue between male and female characters."

Schneider also uses a statistical measure called term frequency-inverse document frequency (tf-idf) to pull specific keywords that relate to each episode. For each script, tf-idf identifies the words that appear more often in that specific episode than they do across the rest of the scripts. For example, the keyword assigned to Season 5 episode "Cape Feare" is "Sideshow Bob," which makes sense considering the character's heavy role in that episode's plot.
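To make the idea concrete, here is a minimal tf-idf sketch in Python. It uses one common variant of the formula (term frequency multiplied by the natural log of the number of documents divided by the number of documents containing the word); the article does not say which variant or tokenization Schneider used, so treat this as an assumption. The function name `tfidf_keywords` and the toy scripts are made up for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_n=1):
    """Return the top_n highest-scoring tf-idf words for each document.

    `docs` maps a title to a non-empty list of tokens (hypothetical format).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))

    keywords = {}
    for title, tokens in docs.items():
        counts = Counter(tokens)
        scores = {
            # tf = share of this document's tokens; idf = log(N / df).
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Sort by score descending, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[title] = ranked[:top_n]
    return keywords


# Toy "scripts" invented for this example.
toy_scripts = {
    "Episode A": "bob bob bob homer bart".split(),
    "Episode B": "donut donut homer bart marge".split(),
    "Episode C": "homer bart lisa saxophone saxophone".split(),
}

top = tfidf_keywords(toy_scripts)
for title, words in top.items():
    print(title, words)
```

Words such as "homer" and "bart" occur in every toy script, so their inverse document frequency is log(1) = 0 and they never surface as keywords, while a word concentrated in a single script rises to the top.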

Check out the keywords for each episode (through Season 26, at least), and read more about Schneider's fascinating analysis on his website.