Researchers find racial, gender bias in online freelance marketplaces

In the so-called “gig economy,” where algorithms, not people, drive the matches between workers and customers, hiring someone—from a plumber to a logo designer—should be bias free, right?

Wrong, according to new research led by Northeastern’s Christo Wilson. The researchers explored two prominent online marketplaces for freelance services: TaskRabbit, which caters to physical household tasks, and Fiverr, which offers virtual creative services. They found that both exhibit racial and gender disparities.

For example, women on both sites received fewer reviews than men. On Fiverr, African American workers received lower ratings than white workers. On TaskRabbit, African American workers received lower ratings than white and Asian workers with essentially the same attributes, and the site’s algorithm, which automatically ranks recommended workers in search results, consistently placed both white and African American women lower than others. No bias was found in Fiverr’s ranking algorithm.

“In our research, we strive to reverse-engineer what each company is doing with its algorithm,” says Wilson, assistant professor in the College of Computer and Information Science. “It’s hard to do, and we can never say for sure how the algorithms work. But what I suspect is going on with TaskRabbit’s algorithm is that social feedback, such as reviewer comments, are considered in determining the ranking, and we know that social feedback can be biased.”

Troubling correlations

In conducting the study, which was co-led by Northeastern doctoral student Anikó Hannák, the researchers collected 3,707 worker profiles from TaskRabbit and 9,788 from Fiverr. The information included each worker’s name, image, gender, race, rating, number of tasks completed, customer reviews, and position in search rankings. Given the scope of the undertaking, they used an online service to examine the images to determine gender and race. Then they started crunching numbers.

The team, which also included associate professor Alan Mislove, explored two elements. The first was social feedback, that is, reviews and comments, to see whether it correlated with demographics: for example, whether one racial or ethnic group consistently got worse social feedback than others. The second was rankings in search results, to see whether they too correlated with demographics.

“Social feedback is where you would expect to find biases in the real world, and unfortunately we also saw them coming up in these online contexts,” says Wilson. “That was disturbing but not completely unexpected.”

Christo Wilson, assistant professor, inside West Village H on Wednesday, October 9, 2013. Photo by Mariah Tauger/Northeastern University

On TaskRabbit, but not Fiverr, however, they also saw correlations between rankings and demographics. “There’s a negative correlation for white women and African American women with respect to search rank,” says Wilson. “Both groups consistently appeared lower than everybody else regardless of the task, whether the search was for an electrician or a personal shopper.”

Unknown confounding factors could have affected the results, says Wilson. “For starters, we don’t know anything about the customer mix in these group effects, so we can’t control for them.” Consider: If 95 percent of the people looking for a personal shopper were men, the correlation with higher ranking for male workers could represent choice, not bias. Another confounding factor could be the objective quality of the work. Could it be that male personal shoppers on this particular site in fact did make better purchasing choices for customers? “I absolutely don’t believe that’s true, but I can’t control for it,” says Wilson.

The demographic effects regarding rankings are subtle, notes Wilson. “That means that if social feedback is used in the algorithm, it’s a minor consideration, perhaps added if there’s a tie between a number of longtime active workers with similar ratings. The correlations are concerning, but if social feedback were a major feature, we’d see hugely biased rankings, and they’re not that way at all.”