Unprecedented data collection project, ‘a huge missing piece of the study of the internet,’ now underway

variations of blue and white squares are illuminated against a black background
Photo by Matthew Modoono/Northeastern University

An interdisciplinary group of researchers at Northeastern has embarked on an ambitious, multimillion-dollar project to study the way people behave online and, in turn, how the internet behaves back.

Thanks to a $15.7 million grant from the National Science Foundation, the team has begun recruiting volunteers for the online data collection project. Researchers will monitor the online experiences of tens of thousands of volunteer users through a web browser extension, then document and analyze the results. Under the foundation’s mandate for the project, the collected data will be available to scientists around the world and across disciplines for research purposes.

“This is a huge missing piece in the study of the internet,” says David Lazer, university distinguished professor of political science and computer sciences, and co-director of the NULab for Texts, Maps, and Networks, who is leading the project.

The money is now being used to build a “National Internet Observatory.”

Portraits of David Lazer, distinguished professor of political science and computer and information science, and Associate Professor David Choffnes. Photos by Adam Glanzman/Northeastern University

“The observatory enables a wide range of research concerning the Internet, including examination of the state of the information ecosystem, analysis of damaging online behavior of a variety of types, and generally studies of manifold aspects of the online world,” the researchers wrote in their proposal. 

Numerous research questions tied to these goals are driving the project, Lazer says. To what extent does Twitter, for example, amplify some voices and accounts over others? How often does Google point people to high- versus low-reliability resources?

Researchers also hope to learn more about how information systems and their algorithms enable users to find information (reporting, commentary and other sources) that fits their own ideologies. This is known as the “filter bubble” effect, which experts have pointed to as a factor contributing to political polarization and broader social divisions.

An underlying motivation for researchers is to explore and, where possible, disentangle “human and algorithmic choice” on the internet. To that end, the monitoring project will help researchers gain “insight into what people choose to do” when using social media platforms, “but also what the platforms are doing in return,” Lazer says.

All of this will take place without compromising the personal privacy of the volunteers involved.

Other project collaborators include Christo Wilson, associate professor of computer sciences; David Choffnes, associate professor of computer sciences and executive director of the Cybersecurity and Privacy Institute; John Basl, associate professor of philosophy at Northeastern; and Michelle Meyer, a bioethicist at Geisinger Health System. Lazer, Wilson and Choffnes are principal investigators on the project. 

Researchers have built and deployed the web browser extension that will feed them information about the URLs the volunteers visit and what they search for on their devices. A critical component of this monitoring infrastructure is making sure it captures the activities not just of desktop users but of mobile users as well.

That’s the focus of Choffnes, who is spearheading the mobile data collection side of the project, deploying apps for both Android devices and iPhones that will enable some collection of network traffic from those devices.

“I think that many of us spend most of our time online via apps on our mobile devices, not web browsers,” Choffnes says. “I’m working on deploying a measurement system that allows us to capture this view—specifically, the services people use on their phones and tablets.”

Choffnes says “this will give us insight into how people interact with the incredibly rich and wide range of online services that exist in the mobile space, from social media to navigation, health, and more—and how those services tailor content for users and share information with others.”

Researchers began recruiting volunteers last week; they’re hoping to get the word out to anyone who might be interested, and aim to recruit a diverse sample population.  

“You have to go through a pretty thorough consent process,” Lazer says. “We’ll explain exactly what we’re collecting, then quiz people to make sure they understand it.”

Those interested can sign up on the National Internet Observatory website. Once information is collected, it will be stored on a secure server maintained by Northeastern. Those allowed to access the databases include system administrators and researchers who’ve undergone “a rigorous ethical and technical vetting process.”
