Over the weekend, The New York Times reported that voter-profiling company Cambridge Analytica had obtained data on 50 million Facebook users—information that underpinned its work on President Donald J. Trump’s 2016 campaign.
The tactics used by researchers to gather the Facebook data weren’t illegal, didn’t violate Facebook’s terms at the time, and don’t constitute a data breach or hack, according to professors in Northeastern’s College of Computer and Information Science.
Rather, it was the researcher’s sale of the data to a third party, Cambridge Analytica, and the failure to destroy it afterward that shine a light on two of the murkier areas of data collection: weak enforcement of security measures, and the question of who owns our personal information.
Assistant professor Christo Wilson, whose research involves large-scale measurements of Facebook, said this was “absolutely a violation of people’s expectations and a breach of trust.”
Here’s how it went down.
In 2014, when an outside researcher harvested the data, he did so through an app he developed. At the time, Facebook allowed users to add apps to their accounts, explained associate professor Alan Mislove. The virtual farming game FarmVille, for example, was among the most popular of these apps.
The apps asked users for permissions that gave them access to parts of their account information. The app used by the researcher who later sold the data to Cambridge Analytica was called “thisisyourdigitallife,” according to a Facebook memo on the issue. Roughly 270,000 people downloaded the app and granted it access not only to their own information but also to that of their Facebook friends.
“That’s how they were able to get the data of 50 million people from only 270,000 installations,” Mislove said.
In a statement, Facebook denied the data harvesting constituted a data breach, and announced it would be suspending several of the accounts associated with Cambridge Analytica. “We are committed to vigorously enforcing our policies to protect people’s information,” the statement reads.
In its own statement, Cambridge Analytica denied that it had violated Facebook’s terms and said it was in communication with Facebook following the news that it had been suspended from the platform.
‘This is how it was built’
“What strikes me is that this is entirely how the app was intended to work at the time,” Mislove said.
Indeed, it’s exactly how Facebook itself was designed to operate, Wilson said. “This is how it was built,” he said. “If you got someone to install a Facebook app before 2015, you got that person’s information and all their friends’ information.”
That meant all the information those users had uploaded, often including where they work and live, what they like, and more, Wilson said.
In 2015, the platform’s privacy settings changed dramatically. Among the changes, Facebook removed the ability of an app to gather detailed information on users’ friends.
Even before Facebook overhauled the rules, though, developers were not allowed to store most information longer than a day or so, Wilson said.
The catch, then and now, is that “there’s no consistent enforcement of that, as far as I can tell,” Mislove said. “The problem really is that there isn’t any good way to validate that the developer isn’t sharing this data or that the developer isn’t keeping a copy of it.”
‘There is no Mulligan with data’
That is reportedly what happened with Cambridge Analytica. According to The New York Times, “Cambridge paid to acquire the personal information through an outside researcher who, Facebook says, claimed to be collecting it for academic purposes.”
And although Facebook officials “demanded” that the acquired data be destroyed, it emerged only a few days ago that not all of it had been deleted.
Because Cambridge Analytica used the data during the 2016 election, and because of the recent reports that not all of it had been destroyed, several state and federal officials are launching investigations into the matter, including the office of Massachusetts Attorney General Maura Healey.
A Healey spokeswoman said Monday that the attorney general’s office had opened a civil investigation into the matter and had been in touch with Facebook to better understand the scope of the data affected, as well as whether it included any Massachusetts residents.
‘In general, it’s the Wild West’
Assistant professor David Choffnes, whose research includes designing solutions to internet security and privacy problems, said that in the U.S. there is largely no regulatory framework governing the vast majority of data collection.
Data related to health and credit, along with data about children under 13, is federally protected, but “in general, it’s the Wild West,” Choffnes said.
Data is also incredibly difficult to track: there’s virtually no way to know whether a piece of data has been copied, or how many times.
“Once your data is out there, it’s out there,” he said. “You can’t ask for a Mulligan; the internet never forgets.”
And the real threat here “is exactly what happened,” Wilson said. Facebook allows advertisers to create custom audiences by uploading the phone numbers of individual people. By identifying which of the 50 million users were most susceptible to inflammatory advertising, Cambridge Analytica could pummel those users with ads. Advertisers can also ask Facebook to find other users with similar attributes and send ads to them as well, Wilson explained.
“When you start considering the scale, it grows very fast,” Wilson said.
Furthermore, there’s no telling where else users’ data might be, the researchers said.
“This is the case that’s making the news right now, but it says nothing about what else might be out there,” Choffnes said.