The three biggest players in voice assistants – Google, Apple and Amazon – have radically different approaches to profiling users, Northeastern University researchers say.
Voice assistants like Alexa and Siri are almost like part of the family. Some people talk to them every day, asking questions or making requests that will help make their lives easier. But do they really know you? Should you be worried if they do?
These devices are almost always listening, and the companies behind them are already collecting our data. However, a team of researchers recently set out to determine just how much companies like Amazon, Apple and Google are using the data gathered through their voice assistants to profile us – track and monitor our behavior – across the internet.
“The concern for me has always been, what are they doing with this data? When you ask them questions, are they just answering you?” says David Choffnes, an associate professor of computer science and executive director of the Cybersecurity and Privacy Institute at Northeastern University.
“Are they developing some understanding of who you are and what you’re interested in and then using that for other purposes beyond just answering your questions or responding to your command?” Choffnes asks.
The study focused on the behaviors of the three biggest voice assistant platforms: Amazon’s Alexa, Apple’s Siri and Google Assistant. What researchers found was that how concerned you should be about your smart assistant profiling you varies greatly depending on which device you use.
But in order to figure this out, they had to essentially trick voice assistants into profiling them.
They downloaded the publicly available information that Google compiles on every user based on their searches, like gender, age range, relationship status and income bracket. Using those labels, they designed questions that could easily convince the platforms that they were, for example, married, had children or were a homeowner rather than a renter.
The researchers then recorded themselves asking these questions and replayed the audio to voice assistants over and over again. Over the course of 20 months, they conducted 1,171 experiments involving nearly 25,000 queries.
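To give a sense of what an automated replay loop like this might look like, here is a minimal sketch. The audio player, file layout, repetition count and pauses are illustrative assumptions, not the researchers’ actual tooling.

```python
# Hypothetical sketch: replay pre-recorded voice queries toward a smart speaker
# and log each trial. The aplay player, directory layout, and timing values
# are assumptions for illustration only.
import csv
import subprocess
import time
from datetime import datetime
from pathlib import Path

QUERY_DIR = Path("recorded_queries/homeowner_persona")  # pre-recorded questions as .wav files
PAUSE_BETWEEN_QUERIES = 30  # seconds, to let the assistant finish responding
REPETITIONS = 5             # replay each question several times

def run_trials(log_path: str = "trial_log.csv") -> None:
    with open(log_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "query_file", "repetition"])
        for wav in sorted(QUERY_DIR.glob("*.wav")):
            for rep in range(REPETITIONS):
                # Play the recorded question out loud toward the device.
                subprocess.run(["aplay", str(wav)], check=True)
                writer.writerow([datetime.now().isoformat(), wav.name, rep])
                time.sleep(PAUSE_BETWEEN_QUERIES)

if __name__ == "__main__":
    run_trials()
```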
In addition, Choffnes and his team had to create personas with different identifying information for the voice assistants to profile. Instead of using information from real people, they created new email addresses and phone numbers that were set up specifically to “present to Google or Amazon or Apple in certain ways,” Choffnes says.
For example, they would log in to a Google account they had designed to be the persona of a homeowner with children. They would ask Google Assistant questions related to homeownership and children and then ask Google for the data corresponding to that account to see if it classified the account correctly.
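Conceptually, that last step comes down to comparing the labels a persona was designed to elicit against the labels the platform actually reports back. Below is a minimal sketch of that comparison; the label names and the idea of reading them from an exported JSON file are assumptions for illustration, not the study’s actual schema.

```python
# Hypothetical sketch: compare the labels a persona was built to trigger
# with the labels the platform actually assigned to the account.
import json

# Labels the "homeowner with children" persona was designed to elicit (illustrative).
EXPECTED = {"homeowner": True, "parent": True}

def score_profile(export_path: str) -> None:
    # e.g. demographic labels saved from the account's ad-settings page
    with open(export_path) as f:
        observed = json.load(f)
    for label, want in EXPECTED.items():
        got = observed.get(label)
        status = "match" if got == want else "mismatch"
        print(f"{label}: expected={want} observed={got} -> {status}")

if __name__ == "__main__":
    score_profile("google_persona_labels.json")
```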
What they ended up finding was that Alexa exhibits the most straightforward kind of profiling behavior: It’s all based on your interest in products.
“If you’re asking about products over Alexa, you’re going to see that behavior also reflected on the web,” Choffnes says. “If you go to Amazon.com, for example, you’ll see recommendations based on what you asked.”
However, with Siri and Google Assistant, things are more complicated.
When the researchers reached out to Apple to get their data, the company insisted “they had no data on us,” Choffnes says, “which means we couldn’t even test anything or prove any hypothesis about whether there was any profiling happening.”
“We couldn’t find any particular evidence of Apple doing this kind of profiling, which is potentially good but also [comes] with the huge caveat that we don’t know,” Choffnes adds. “We can’t say that they are not doing profiling, but we can’t confirm how the profiling works in the way we did with Amazon and Google.”
Meanwhile, Google Assistant was the strangest of the bunch. The researchers found that it was clearly profiling its users but often incorrectly.
It turns out that when someone creates a new Google account, the platform essentially conjures labels for that person seemingly at random. Instead of assuming nothing about a new user, it assumes everything with no information to back up that assumption.
“Even if you’ve never interacted with Google, we’ll see tags like ‘in a relationship,’ we’ll see ‘moderate or high income,’ we’ll see your employer is a large employer, that you are a homeowner and that you are not a parent,” Choffnes says.
Even when Google was able to draw on a user’s behavior, it still frequently profiled people incorrectly. According to the study, Google was able to correctly profile a married person with 70% accuracy but could only do so with 10% accuracy for people who are “in a relationship.”
“In many cases, Google was not providing the tags that we expected, even though we were asking questions where we expected it would be glaringly obvious that a certain label should be assigned to our profiles,” Choffnes says.
Google Assistant’s wild behavior left Choffnes and his team with more questions, chief among them: Is it more or less concerning for a user’s privacy that Google’s profiling is often flat-out wrong?
It’s unclear whether this data is even being used for things like targeted ads, but Choffnes says the faults in the system could be a net positive for consumers because “then they’re not necessarily targeting you in ways that you may not want to be targeted.”
For advertisers, it’s a different story, and a potential liability for Google.
“If you’re an advertiser and you’re being charged money to target an audience and that audience that you end up targeting is the wrong one because Google used prepopulated labels, that would look very bad for Google,” Choffnes says.