3Qs: Big Data project reaches for the ‘cloud’

Northeastern University is a key partner in the Massachusetts Open Cloud Project, a university-industry collaboration designed to create a new public cloud computing infrastructure to spur Big Data innovation. Last month, at the Massachusetts Green High Performance Computing Center in Holyoke, Massachusetts—which counts Northeastern as a partner institution—Gov. Deval Patrick announced a $3 million investment to get the project up and running. Peter Desnoyers, an assistant professor in the College of Computer and Information Science at Northeastern, helped bring the initiative to fruition. Here, he explains what the project means for both the future of Big Data and the university.

How did Northeastern get involved with the Massachusetts Open Cloud Project, and how will it benefit the university?

The project is a great example of the synergies between industry and academia in an area such as Boston. It’s the brainchild of Orran Krieger, a computer science professor at Boston University, whom I worked with before we each came to academia. The proposal to the state has been a collaborative effort between the two universities, and the preliminary research and development has been done by a joint group of Northeastern and BU students.

A primary goal of the Massachusetts Open Cloud (MOC) is to enable fundamental research in cloud computing at Northeastern and the other partner universities. This is a rapidly growing, fast-innovating field to which universities could contribute a great deal, but it’s dominated by a few providers such as Amazon, Microsoft, and Google, which are the only organizations with the access needed to perform research in this area. By supporting a variety of cloud implementations, ranging from experimental and pilot services implemented by computer science researchers to production systems for researchers in other fields, the MOC will enable academic research in this field, helping establish us as a center of excellence in cloud computing research.

What is Big Data, and why is it important?

Big Data refers to the processing of data sets that are far larger than individual computers can handle, requiring the coordination of hundreds or thousands of systems. The term is most commonly applied to applications that handle unstructured text data, as opposed to high-performance computing, which typically works with large but highly structured numeric data sets.

Big Data techniques are the basis of much of the Internet economy, from recommendation systems on sites such as Amazon to indexing Web data for search engines such as Google. In addition, the recent availability of large data sets, covering everything from historical documents to cell phone usage, has allowed computational techniques based on Big Data analysis to be applied in transformative ways to research fields ranging from political science to medicine.

What is Northeastern’s next step in regard to the Massachusetts Open Cloud project?

In the next year, you will see research proposals and early research from several Northeastern faculty members, including myself; Gene Cooperman, professor in the College of Computer and Information Science; and David Kaeli, professor in the College of Engineering. I will also be leading the effort to develop novel infrastructure that allows both experimental and reliable systems to coexist and interoperate within the open cloud, much as they do in today’s Internet.