Chemists are training machine learning algorithms used by Facebook and Google to find new molecules

Steven A. Lopez is an assistant professor of chemistry and chemical biology in the College of Science at Northeastern. Photo by Matthew Modoono/Northeastern University

For more than a decade, Facebook and Google algorithms have been learning as much as they can about you. It’s how they refine their systems to deliver the news you read, those puppy videos you love, and the political ads you engage with. 

These same kinds of algorithms can be used to search through billions of molecules for ones that catalyze important chemical reactions currently induced with expensive and toxic metals, says Steven A. Lopez, an assistant professor of chemistry and chemical biology at Northeastern.

Lopez is working with a team of researchers to train machine learning algorithms to spot the molecular patterns that could help find new molecules in bulk, and fast. It’s a much smarter approach than scanning through billions—and billions—of molecules without a streamlined process.  

“We’re teaching the machines to learn the chemistry knowledge that we have,” Lopez says. “Why should I just have the chemical intuition for myself?”

The alternative to using expensive metals is organic molecules, and particularly plastics, which are everywhere, Lopez says. Depending on their molecular structure and ability to absorb light, these plastics can be converted with chemistry to produce better materials for today’s most important problems. 

Lopez says the goal is to find molecules with the right properties and structures similar to those of metal catalysts. But to attain that goal, Lopez will need to explore an enormous number of molecules.

Thus far, scientists have been able to synthesize only about a million molecules. But conservative estimates put the number of possible molecules that could be analyzed at a quintillion, which is 10 raised to the power of 18, or the number one followed by 18 zeros.

Lopez thinks of this enormous number of possibilities as a vast ocean made up of billions of unexplored molecules. Such an immense molecular space is practically impossible to navigate, even if scientists were to combine experiments with supercomputer analysis.

Lopez says all of the calculations that have ever been done by computers add up to about a billion, or 10 to the ninth power. That’s about a billion times fewer than the number of possible molecules.

“Forget it, there’s no chance,” he says. “We just have to use a smarter search technique.”

That’s why Lopez is leading a team, supported by a grant from the National Science Foundation, that includes researchers from Tufts University, Washington University in St. Louis, Drexel University, and Colorado School of Mines. The team is using an open-access database of organic molecules called VERDE materials DB, which Lopez and colleagues recently published, to improve their algorithms and find more useful molecules.

The database will also register newly found molecules, and can serve as a data hub for researchers across several different domains, Lopez says. That’s because it can launch researchers toward finding different molecules with many new properties and applications.

In tandem with the database, the algorithms will allow scientists to use computational resources more efficiently. After molecules of interest are found, researchers will recalibrate the algorithm to find more groups of similar molecules.

The active-search algorithm, developed by Roman Garnett at Washington University in St. Louis, uses a process similar to the classic board game Battleship, in which two players guess hidden locations on a grid to target and destroy vessels within a naval fleet.

In that grid, players place vessels as far apart as possible to make opponents miss targets. Once a ship is hit, players can readjust their strategy and redirect their attacks to the coordinates surrounding that hit. 

That’s exactly how Lopez thinks of the concept of exploring a vast ocean of molecules.

“We are looking for regions within this ocean,” he says. “We are starting to set up the coordinates of all the possible molecules.”
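For readers who want to see the Battleship idea in code, here is a minimal toy sketch in Python. It is not the team's algorithm or Garnett's active-search method; it only illustrates the general strategy of probing at random until something is hit, then concentrating later guesses around known hits. Every name in it (make_ocean, active_search, the grid size, the ship lengths) is a hypothetical choice made for this illustration.

```python
import random


def make_ocean(size, ships, seed=0):
    """Hide horizontal 'ships' (clusters of good cells) on a size x size grid.

    Toy stand-in for clusters of useful molecules; all values are hypothetical.
    Ships may overlap, so the number of target cells can be below sum(ships).
    """
    rng = random.Random(seed)
    targets = set()
    for length in ships:
        row = rng.randrange(size)
        col = rng.randrange(size - length + 1)
        targets.update((row, col + k) for k in range(length))
    return targets


def neighbors(cell, size):
    """Yield the up/down/left/right neighbors of a cell that lie on the grid."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield (nr, nc)


def active_search(size, targets, budget, seed=0):
    """Greedy toy search: prefer unqueried cells that touch the most known hits.

    With no hits yet, every cell scores 0 and the shuffled order decides,
    which amounts to random exploration.
    """
    rng = random.Random(seed)
    unqueried = [(r, c) for r in range(size) for c in range(size)]
    rng.shuffle(unqueried)
    hits = set()
    for _ in range(min(budget, len(unqueried))):
        best = max(
            unqueried,
            key=lambda cell: sum(n in hits for n in neighbors(cell, size)),
        )
        unqueried.remove(best)
        if best in targets:  # one "experiment": reveal whether this cell is a hit
            hits.add(best)
    return hits


if __name__ == "__main__":
    ocean = make_ocean(10, [5, 4, 3], seed=1)
    found = active_search(10, ocean, budget=30, seed=1)
    print(f"found {len(found)} of {len(ocean)} target cells in 30 guesses")
```

In this sketch a "guess" plays the role of one costly calculation or experiment, and the budget caps how many the search may spend. A real active-search method would replace the simple neighbor count with a statistical model of which unexplored molecules are most likely to be hits.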

Hitting the right candidate molecules might also expand the understanding that chemists have of this unexplored chemical space.

“Maybe we’ll find out through this analysis that we have something really at the edge of what we call the ocean, and that we can expand this ocean out a bit more in that region,” Lopez says. “Those are things that we wouldn’t [be able to find by searching] with a brute force, trial-and-error kind of approach.” 

For media inquiries, please contact Jessica Hair at 617-373-5718.