Northeastern researchers are using machine learning algorithms to predict the function of enzymes from their amino acids

Penny Beuning and Mary Jo Ondrechen, Northeastern professors of chemistry and chemical biology, are developing new ways to predict the function of an enzyme by looking at the roles of the individual amino acids that make up its structure. Photo by Ruby Wallau/Northeastern University

If you drink beer, eat cheese, or enjoy a good sourdough, you can thank enzymes for speeding up the chemical reactions that make these foods possible. But these complicated molecules have more to offer than just a good meal, if researchers can figure out exactly what the enzymes are doing at the chemical level.

“By understanding how enzymes work, we can better understand a whole range of things,” says Penny Beuning, a professor of chemistry and chemical biology at Northeastern. “This very basic research has a ton of applications in things like human health or biofuels.” 

Enzymes facilitate chemical reactions that are critical to the survival of living things. Learning more about how they perform this task could reveal new ways to target harmful bacteria and improve our understanding of our own bodies, as well as help researchers engineer new, more efficient enzymes for producing food, cleaning up environmental contamination, or manufacturing fuels from renewable sources. 

Using a combination of computational predictions and experimental testing, Beuning and her colleagues are trying to make it easier to determine what a particular enzyme does. They recently received a grant from the National Science Foundation to develop new ways to predict the function of an enzyme by looking at the roles of the individual amino acids that make up its structure.

Enzymes are made of tangled chains of amino acids. The researchers are using the positions and chemical properties of these amino acids to predict the roles of different enzymes, and then testing those predictions in the lab. Photos by Ruby Wallau/Northeastern University

Amino acids are small compounds that get strung together like beads on a necklace. These long chains of amino acids then spiral and fold into tangled-looking proteins. (Most enzymes are proteins.) The chemical properties of the various amino acids, and their positions within the three-dimensional structure, determine whether an enzyme will make copies of your DNA for new cells or help you digest milk.

“You could have proteins that look similar, but do different things; and you can have proteins that look very different but do the same thing,” Beuning says. “That’s why looking at which amino acids are contributing to the chemical reaction, and where they are in space, is really valuable.”

The researchers don’t need to look at every single amino acid. They need to focus only on the amino acids that are relevant to an enzyme’s role as a catalyst, that is, to speeding up chemical reactions. Mary Jo Ondrechen, the principal investigator for the project, has already developed a method for predicting which amino acids are involved in those reactions.

“We can calculate chemical and electrostatic properties of the amino acids and the protein structure,” says Ondrechen, who is a professor of chemistry and chemical biology at Northeastern. “And that tells us where the chemistry happens.”

The logical next step is to try to predict the individual roles of the amino acids. If Beuning and Ondrechen can identify arrangements of amino acids that indicate a particular function, they can use that chemical fingerprint to figure out what other enzymes do.

“Even if they don’t look like anything that’s been seen before, we can at least say something about what they could do, based on the rules that we identify for the individual amino acids,” Ondrechen says. “This is totally uncharted territory.”

With the help of Deniz Erdogmus, a professor of electrical and computer engineering at Northeastern, the researchers will use machine learning algorithms to predict the roles of specific amino acids in a family of enzymes called glycoside hydrolases, which interact with sugars. Then they’ll run experiments in Beuning’s lab to check those predictions. 

They expect that the resulting algorithms will be able to identify the functions of a broad swath of enzymes, opening the door to new research and innovations in food, health, energy, and other industries.

“The genomic data set for humans, and now thousands of other species, is a treasure trove,” Ondrechen says. “We have a lot to learn, and a lot to gain from understanding it.” 
