
You can't shut down government text-mining

Photo via Thinkstock.

Northeastern University professors David Smith and Ryan Cordell are interested in hidden social networks. In particular, they want to understand how ties among editors, writers, politicians, business magnates, and others made the 19th-century news go ’round.

To do so, the duo of digital humanities experts decided to sift through thousands of bits of text from historical newspapers to find snippets that repeat themselves in various places. Since this would be too much work for human eyeballs, Smith, a computer scientist, developed a program that is doing it for them. The program picks up on things like two editors, one in Missouri and another in Vermont, printing the same content, verbatim, a few days apart. This sort of data, Smith said, can tell you with pretty high certainty that the two guys had at least some political views in common, if not a personal connection.
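To make the idea of spotting repeated text concrete, here is a minimal sketch. It is not Smith's program, whose internals the article does not describe. It assumes a simple approach: break each document into overlapping five-word sequences and count how many sequences two documents share. The function names, the five-word window, and the sample newspaper text are all made up for illustration.

```python
from itertools import combinations


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in a text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(docs, n=5, min_shared=1):
    """Return (name_a, name_b, shared_count) for document pairs that reuse text.

    docs maps a document name to its text. A pair is reported when the two
    documents have at least min_shared n-word sequences in common.
    """
    sets = {name: shingles(text, n) for name, text in docs.items()}
    results = []
    for a, b in combinations(sorted(sets), 2):
        shared = len(sets[a] & sets[b])
        if shared >= min_shared:
            results.append((a, b, shared))
    return results


# Invented sample text standing in for scanned newspaper columns.
papers = {
    "missouri": "the senator spoke at length on the tariff question before the assembly",
    "vermont": "we hear the senator spoke at length on the tariff question yesterday",
    "ohio": "crops were good this year and the river stayed within its banks",
}

print(shared_passages(papers))
```

Here the Missouri and Vermont papers are flagged because they share a nine-word run, while the Ohio paper matches neither. Real historical newspapers would need something more forgiving, since scanning errors and small edits break exact word-for-word matches.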
In a collaboration with researchers at the University of Washington, Smith has now applied that same code to policy bills from the 111th Congress (the one that spanned the first two years of Obama’s presidency). By doing so, he found that 11 percent of the Democratic Congress’s work had Republican origins. That doesn’t sound like a lot, until you think about how the two sides of the aisle recently gave each other the silent treatment long enough to shut down the government for 17 days.
As we all know by now, one of the big things that initiated these shenanigans was a disagreement about the Patient Protection and Affordable Care Act. “This law is a trainwreck,” said House Speaker John Boehner about a week before the shutdown. “It’s time to protect American families from this unworkable law.” One of his least favorite parts of the bill? The individual mandate. But as many have said before, that particular (and important) part of “Obamacare” happens to have its roots in Republican policy ideas.
Smith’s text-mining experiment gives more striking results when “markup bills,” which are like second, third, and fourth editions in the book world, are excluded. In that case, the amount of Republican influence jumps to 28 percent. The work, Smith said, “is decomposing the monolithic idea of a legislator, providing a more finely articulated view of how policy and politics work.” Just like with the newspaper data, this work uncovers a hidden social network of politicians working together (gasp) to get their ideas into practice. If you see a bill spearheaded by a Republican get scrapped, only to have its content repurposed almost exactly in a new bill by a Democrat, you might suspect they know each other. Perhaps they’re roommates when they visit the Hill from their home states, or maybe they were in the same entering class of congressmen, doing trust falls together their first days in office.
When it comes to social science, you like to see causality, said Smith. You like to be able to pull a string here and see a lever moving over there. When you look at your friend network on Facebook, that may very well be the case. But with politicians, it can be harder to see.
What I’m having a hard time understanding is why exactly politicians don’t want to look like they’re cooperating. I know I’d be much more satisfied with and confident in my government if I knew they’d managed to pass kindergarten, that important time in our education when we’re taught to cooperate.
