In the 21st century, our lives are online: Around the world, we can shop, socialize, bank, attend events, visit doctors, watch TV, listen to music, order takeout, work, learn, and much more, all with an internet connection. However, the vast majority of these activities are facilitated by a handful of digital gatekeepers, leaving everyone else with no meaningful way to understand how people are using those platforms or how those platforms are using their customers.
Researchers at Northeastern University have been awarded a $15.7 million grant from the National Science Foundation to build a research infrastructure that will provide scientists around the world and across disciplines with open, ethically collected data for analyzing how people behave online.
“This would be a platform for research on basic human behavior,” says David Lazer, university distinguished professor of political science and computer sciences, and co-director of the NULab for Texts, Maps, and Networks, who is leading the project.
Christo Wilson, associate professor of computer sciences, and David Choffnes, associate professor of computer sciences and executive director of the Cybersecurity and Privacy Institute, are also working on the project, along with John Basl, associate professor of philosophy at Northeastern, and Michelle Meyer, a bioethicist at Geisinger Health System.
Until now, researchers who study any aspect of online behavior, or the platforms that enable it, have relied on a few techniques: They've used bots to scrape data from platforms, recruited small samples of people to provide data about their online behavior, or sometimes requested or purchased specific sets of data directly from the platforms.
These strategies have drawbacks, however: Data is collected on a case-by-case basis, creating bespoke silos of information rather than a broad picture of life online.
“Much of the research on digital traces is akin to searching by the lamppost in the parking lot for your keys, rather than in the dark where you dropped them,” the researchers write in their proposal.
Recruiting even small samples of people to provide their own online data can also be expensive, limiting the extent to which scientists can afford to study life online at all. Finally, online companies can be opaque about how they collect and disseminate their data.
“People’s online lives are their lives; they’re inseparable,” Wilson says, “and these experiences are being shaped by platforms that aren’t transparent about how they make decisions. The fact that we can’t introspect what have really become pillars of modern life is a problem.”
The research infrastructure is still in its early stages of development, but the researchers have proposed that it would work roughly as follows:
- Northeastern scientists will recruit a small sample of people (roughly 2,000) for a rigorous examination of their online behavior, including how and how often they use major platforms. Then they'll recruit a larger sample (tens of thousands of people) to assess broader population trends, using the more granular information from the smaller sample as a gold standard to calibrate the data from the larger one.
- The researchers will build a web browser extension that collects data from volunteers' laptops and desktops about the URLs they visit and what they search for. To capture information from mobile devices, the researchers will build apps for both Android phones and iPhones that will enable limited collection of network traffic from the devices.
- In both cases—on a desktop or a mobile device—volunteers will also be prompted to fill out short surveys about their choices online, the researchers say.
The team will include two ethicists, Basl and Meyer, among its core researchers, and the research itself will comply with strict regulatory requirements. The researchers, experts in cybersecurity and privacy, will also take every measure to ensure that the data they collect is stored securely on private servers and made available for scholarship in a format, and through a process, that protects the volunteers' privacy.
Indeed, Basl says his interest in the project lies in studying the data privacy norms that online platforms currently follow, as a means of bolstering them.
“One of the challenges we face in the project is that existing privacy norms, standards, and tools are often inadequate for promoting and protecting privacy in the face of big data analytics,” he says. “So, I’ll be watching and working to help ensure that we are cognizant of this issue, and work to develop new ways to evaluate, understand, and protect privacy.”
This infrastructure would provide myriad other opportunities for research, too.
Economists could study how money flows online; political scientists could study how people gather information ahead of elections; social scientists could study how social media platforms shape the information landscape in ways that have real-world consequences.
The possible lines of computer science inquiry are vast: Researchers could study how algorithms shape people's individual experiences online, or the extent to which individual data is being collected and used by corporations.
“Nothing like this has ever been done before,” Choffnes says, adding that the possibilities for research based on the data he and his colleagues collect are nearly limitless. “It would shine a light on dark areas of the world, not just for computer scientists, but for anyone who studies anything on the internet.”
The researchers compare this infrastructure to the Hubble Space Telescope: another powerful tool for scientific inquiry and exploration.
“Science is in part based on the tools of the scientists, and we don’t have adequate tools to answer our questions—yet,” Lazer says.