Northeastern professor leads an international effort to map the human proteome

Last year marked the 10th anniversary of the Human Genome Project, which identified each of the 22,000 genes in human DNA. But as chemistry professor William Hancock pointed out, this was only a beginning.

He is co-organizing an international effort to map, one chromosome at a time, the more than 500,000 proteins encoded by our DNA, collectively called the proteome.

“The proteome tells us what is happening in an individual, whereas the genome measures the potential for disease,” said Hancock, the Bradstreet Chair in Bioanalytical Chemistry at Northeastern’s Barnett Institute of Chemical and Biological Analysis in the College of Science. “C-HPP [the Chromosome-Centric Human Proteome Project] will give a more complete parts list of what proteins are expressed in health and disease.”

The project began in September, and Hancock and colleagues presented the C-HPP concept to the community in an article in the journal Nature Biotechnology in March. In April, the Journal of Proteome Research published the Standard Guidelines for the project. Next week, Hancock will join more than 1,000 researchers from around the world in Beijing for the 6th biennial congress of the Asia Oceania Human Proteome Organization to plan the next steps in the initiative.

“The Human Genome Project was largely concentrated in the West,” Hancock said. “The proteome is a much more international effort.” So far, 16 nations on three continents have signed on, with Asian nations taking a significantly more prominent role than in the past. “Such a huge project requires the resources of many countries. Also, there are genetic and disease differences between different ethnic groups and geographical locations,” he said.

Nearly every cell in the human body contains 23 pairs of chromosomes, the structural units that organize our DNA in a linear string of genes. Each C-HPP team will devote its efforts to defining the full protein parts list of a single chromosome, which contains, on average, about 1,000 protein-coding genes. Depending on genetic and environmental factors, Hancock said, each of these thousand genes can code for anywhere from one to approximately 40 protein isoforms, which can subtly alter an individual’s health or disease status. Hancock’s team at Northeastern will work on Chromosome 17, a particularly unstable chromosome that includes many genes associated with cancer, among them the gene that carries the highest inherited risk for breast cancer.

The rationale for using a chromosome-centric approach, he explained, comes down to the need to integrate massive amounts of genomic and proteomic data. This is much easier to do in the context of gene location, which is determined by the chromosomes.

In addition to taking a unique approach, the C-HPP will use novel data presentations to manage the massive quantities of information that the teams will generate. For example, an interactive color map will make several levels of data accessible for each protein-coding gene, such as gene activity, protein mass spectrometry, antibody reagents, tissue localization and disease information.

“This is a major scientific undertaking and will greatly aid medical research and development of innovative, personalized drugs,” Hancock said.