Millions of research papers are published in a year. How do scientists keep up?
If you want to be a scientist, you're going to have to do a lot of reading.
Science is an endeavor focused on building and sharing knowledge. Researchers publish papers detailing their discoveries, breakthroughs, and innovations in order to share those revelations with colleagues. And there are millions of scientific papers each year.
Keeping up with the latest developments in their field is a challenge for researchers at all points of their careers, but it especially affects early-career scientists, as they also have to read the many papers that represent the foundation of their field.

"It's impossible to read everything. Absolutely impossible," says Ajay Satpute, director of the Affective and Brain Science Lab and an assistant professor of psychology at Northeastern. "And if you don't know everything that has happened in the field, there's a real chance of reinventing the wheel over and over and over again." The challenge, he says, is to figure out how to train the next generation of scientists economically, balancing the need to read all the seminal papers with training them as researchers in their own right.
That task is only getting more difficult, says Alessia Iancarelli, a Ph.D. student studying affective and social psychology in Satpute's lab. "The volume of published literature just keeps increasing," she says. "How are scientists able to develop their scholarship in a field given this huge amount of literature?" They have to pick and choose what to read.
But common approaches to that prioritization, Iancarelli says, can incorporate biases and leave out crucial corners of the field. So Iancarelli, Satpute and colleagues developed a machine learning approach to find a better, and less biased, way to make a reading list. Their approach, described in results published last week in the journal PLOS One, also helps reduce gender bias.
"There really is a problem about how we develop scholarship," Satpute says. Right now, scientists will often use a search tool like Google Scholar on a topic and start from there, he says. "Or, if you're lucky, you'll get a wonderful instructor and have a great syllabus. But that's going to be basically the field through that person's eyes. And so I think that this really fills a niche that might help create balance and cross-disciplinary scholarship without necessarily having access to a wonderful instructor, because not everyone gets that."

The problem with something like Google Scholar, Iancarelli explains, is that it will give you the most popular papers in a field, measured by how many other papers have cited them. If there are subsets of that field that aren't as popular but are still relevant, the important papers on those topics might get missed with such a search.
Take, for example, the topic of aggression (which is the subject the researchers focused on to develop their algorithm). Media and video games are a particularly hot topic in aggression research, Iancarelli says, and therefore there are a lot more papers on that subset of the field than on other topics, such as the role of testosterone or social aggression.
So Iancarelli decided to group papers on the topic of aggression into communities. Using citation network analysis, she identified 15 research communities on aggression. Rather than looking at the raw number of times a paper has been cited in another research paper, the algorithm determines a community of papers that tend to cite each other or the same core set of papers. The largest communities it revealed were media and video games, stress, traits and aggression, rumination and displaced aggression, the role of testosterone, and social aggression. But there were also some surprises, such as a smaller community of research papers focused on aggression and horses.
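To make the idea concrete, here is a minimal sketch of grouping papers by who cites whom, then taking the most-cited paper from each group instead of the most-cited papers overall. It is an illustration only, not the team's released code: the article does not say which community detection algorithm was used, so a simple label propagation method stands in for it, and the paper names and citation links are invented toy data.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each paper maps to the papers it cites.
CITATIONS = {
    "game1": [],
    "game2": ["game1"],
    "game3": ["game1", "game2"],
    "game4": ["game1", "game2", "game3"],
    "game5": ["game1", "game2"],
    "horse1": [],
    "horse2": ["horse1"],
    "horse3": ["horse1", "horse2", "game1"],  # one cross-topic citation
}


def build_neighbors(citations):
    """Treat every citation as an undirected link between two papers."""
    neighbors = defaultdict(set)
    for paper, cited in citations.items():
        neighbors[paper]  # ensure papers without links still appear
        for other in cited:
            neighbors[paper].add(other)
            neighbors[other].add(paper)
    return neighbors


def detect_communities(citations, max_rounds=100):
    """Label propagation (a stand-in method, assumed for illustration).

    Each paper starts with its own label and repeatedly adopts the label
    most common among its neighbors. Ties go to the alphabetically
    smallest label so the result is deterministic.
    """
    neighbors = build_neighbors(citations)
    labels = {paper: paper for paper in neighbors}
    for _ in range(max_rounds):
        changed = False
        for paper in sorted(neighbors):
            if not neighbors[paper]:
                continue
            counts = Counter(labels[n] for n in neighbors[paper])
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[paper]:
                labels[paper] = new_label
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for paper, label in labels.items():
        groups[label].append(paper)
    return sorted(sorted(members) for members in groups.values())


def citation_counts(citations):
    """Count how many times each paper is cited within the toy data."""
    counts = Counter({paper: 0 for paper in citations})
    for cited in citations.values():
        counts.update(cited)
    return counts


counts = citation_counts(CITATIONS)
communities = detect_communities(CITATIONS)

# Reading list 1: the two most-cited papers overall.
global_top = sorted(counts, key=lambda p: (-counts[p], p))[:2]

# Reading list 2: the most-cited paper within each community.
per_community_top = [max(m, key=lambda p: counts[p]) for m in communities]

print("communities:", communities)
print("top by raw citations:", global_top)
print("top per community:", per_community_top)
```

In this toy example the raw citation ranking returns two video game papers, while the per-community list includes one paper from the smaller horse cluster, which is the kind of coverage the community approach is meant to provide.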
"If you use community detection, then you get this really rich, granular look at the aggression field," Satpute says. "You have sort of a bird's-eye view of the entire field rather than [it appearing that] the field of aggression is basically media, video games, and violence."
In addition to diversifying the topics featured by using this community approach, the researchers also found that the percentage of articles with women first authors dubbed influential by the algorithm doubled in comparison to when they focused only on total citation counts. (Iancarelli adds there might be some biases baked into that result, as the team couldn't ask the authors directly about their gender identity and instead had to rely on assumptions based on each author's name, picture, and any pronouns used to refer to them.)
The team has released the code behind this algorithm so that others can use it and replicate their citation network analysis approach in other fields of research.
For Iancarelli, there's another motivation: "I would love to use this work to create a syllabus and teach my own course on human aggression. I would really love to base the syllabus on the most relevant papers from each different community to give a true general view of the human aggression field."