Woo-hoo! A Simpsons word-search tool

While the FXX network was in the midst of its epic marathon of The Simpsons—which featured all 552 episodes from all 25 seasons aired chronologically—Ben Schmidt was driving home from summer vacation and thinking about all the classic jokes the series has produced over the years.

Then he had another thought, which can be summed up in one word: “Excellent.”

Schmidt is an assistant professor of history and a core faculty member in the NU Lab for Texts, Maps, and Networks, Northeastern’s center for digital humanities and computational social science. A couple of years ago, he and a colleague at Harvard developed a tool called Bookworm that can visualize trends in repositories of digitized texts. He says it’s a powerful tool that can provide fascinating insight into the use of language and how the use of certain words has changed over time; he’s used it to examine the language used in everything from newspaper articles to past U.S. presidents’ State of the Union addresses.

He has also put his Big Data computational tools to work as a consultant for two TV series, CBS’ Vegas and Showtime’s Masters of Sex, checking whether scripts use terminology appropriate to the eras in which the shows are set. He also blogs about historical inaccuracies of this kind in shows like Mad Men, Downton Abbey, and Foyle’s War.

And thanks to a recently announced National Endowment for the Humanities grant, Schmidt will be enhancing and integrating his Bookworm tool with the HathiTrust Digital Library, which holds 3.9 billion pages of digitized materials.

“We can find interesting trends when you parse out the words,” he says.

So, he thought, why not take a closer look at The Simpsons? He was particularly interested in learning more about the structure of episodes. To create the Bookworm, Schmidt collected the closed-caption text for every episode and built a tool that allows users to search references to specific words throughout the show’s 25-season run by season, episode, and even time within each episode.

He said he created this Bookworm because, although The Simpsons is a culturally important show in its own way, digital tools of this kind are particularly valuable for examining material that, unlike a TV show, you can’t just sit down and watch.

When users first check out the site, they’ll see a Bookworm chart of how often “Maude,” “Troy McClure,” and “Duffman” appear. A line graph shows the frequency per season, and the user can click on individual data points to bring up each instance of that word, listed by episode, with the time at which it appears and the line of dialogue itself. In addition, users can input their own word searches.
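For readers curious how a per-season frequency like this might be computed, here is a minimal Python sketch. It is not Schmidt’s Bookworm code: the `captions` records, their layout, and the sample lines are hypothetical stand-ins for the closed-caption text, and the per-1,000-words scaling is just one reasonable choice.

```python
import re
from collections import defaultdict

# Hypothetical caption records: (season, episode, minute, line of dialogue).
# These sample lines are invented for illustration only.
captions = [
    (1, 1, 2, "Kent Brockman here with a special report."),
    (1, 1, 5, "Time for school, kids."),
    (1, 2, 1, "The school bus is late again."),
    (2, 1, 1, "This is Kent Brockman."),
    (2, 1, 15, "Excellent."),
]


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def frequency_by_season(records, word):
    """Return {season: occurrences of `word` per 1,000 words of dialogue}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for season, _episode, _minute, line in records:
        tokens = tokenize(line)
        totals[season] += len(tokens)
        hits[season] += tokens.count(word.lower())
    return {s: 1000 * hits[s] / totals[s] for s in sorted(totals)}


def find_lines(records, word):
    """Return every (season, episode, minute, line) that contains `word`."""
    return [r for r in records if word.lower() in tokenize(r[3])]


school_by_season = frequency_by_season(captions, "school")
school_lines = find_lines(captions, "school")
```

Dividing by the total word count of each season keeps a longer season from looking artificially “wordier,” and `find_lines` mirrors the click-through from a point on the graph to the individual lines behind it.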

Schmidt says the project has revealed some interesting trends. For instance, “Kent Brockman” mostly appears in the opening scenes, which indicates the Springfield newscaster character is typically used to set up the episode. Also, the word “school” appears much more frequently in the early minutes of episodes. “That’s pretty interesting, actually: pretty much every minute, the plots seem to shift away from school,” Schmidt wrote on his blog.
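The same idea extends to position within an episode. The sketch below groups mentions of a word into five-minute segments; the sample lines, the `mentions_by_segment` helper, and the five-minute segment size are all assumptions made for illustration, not details of the actual tool.

```python
import re
from collections import Counter

# Hypothetical (minute, line) pairs pooled across episodes; invented for illustration.
timed_lines = [
    (1, "Kent Brockman here."),
    (2, "Hurry up or you'll miss the school bus."),
    (3, "School is out early today."),
    (12, "Meanwhile, at the power plant..."),
    (20, "I never want to see that school again."),
]


def mentions_by_segment(records, word, segment_minutes=5):
    """Count occurrences of `word`, keyed by the starting minute of each segment."""
    counts = Counter()
    target = word.lower()
    for minute, text in records:
        tokens = re.findall(r"[a-z]+", text.lower())
        n = tokens.count(target)
        if n:
            # Minutes 0-4 map to segment 0, minutes 5-9 to segment 5, and so on.
            segment_start = (minute // segment_minutes) * segment_minutes
            counts[segment_start] += n
    return dict(counts)


school_segments = mentions_by_segment(timed_lines, "school")
```

With real caption data, a pattern like the one Schmidt describes would show up as larger counts in the earliest segments and smaller ones later in the episode.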

In most cases, Schmidt’s Simpsons Bookworm is a perfectly cromulent tool, but he acknowledges its limitations, which stem in part from relying on the closed-captioned text. Ironically, for instance, users can’t search for Homer’s iconic “D’oh” because of the apostrophe. And it’s probably hard to search for many of Ned Flanders’ sayings because, well, who the heck can spell them?

Next, Schmidt plans to create a similar search tool for thousands of feature films.

Mmmmm… Big Data.