Responsible AI model for programmers being advanced by Northeastern computer scientist

by Alena Kuzub

September 12, 2023

Photo by Matthew Modoono/Northeastern University

Believing in open scientific collaboration on AI technology, a Northeastern professor joined others in creating a state-of-the-art open generative model for programmers that can be licensed and adapted for different uses such as gaming and industrial automation.

Generative artificial intelligence and large language models have taken the world by storm in the last few years, says Arjun Guha, associate professor of computer science at Khoury College of Computer Sciences at Northeastern University. They are having a particularly significant impact on programming.

Computer scientists, programmers and smaller-market players, however, have very limited insight into the development process of these models, and that prevents them from developing a deeper understanding of the technology. It also excludes them from meaningful participation in its further expansion.

Arjun Guha, associate professor of computer science. Photo by Matthew Modoono/Northeastern University

That is why Guha and his research group got heavily involved in the BigCode project, launched by two private companies, Hugging Face and ServiceNow.

Hugging Face, a company that hosts a large open-source machine learning community, and ServiceNow, which helps businesses optimize technology solutions, teamed up to support individuals with a professional AI research background in the responsible development and use of open large language models for coding. They committed significant staff and hardware resources to the project. As a result, StarCoder, a state-of-the-art open generative model for programmers, can now be licensed and adapted by others for different uses.

“You can spend an enormous amount of money building one of these things and not actually know if it’s any good,” Guha says.

The few multibillion-dollar companies that have the resources to build such models and “drop” them every now and then to stun the world, Guha says, are completely closed to the idea of sharing with the community what this technology is capable of.

“If you ask the people who make them, ‘What can I do with it?’, I think the answer they will always give you disingenuously is ‘anything,’ which is misleading,” he says.

Guha believes that academic research has a role to play in shaping generative AI technology.

“An academic can come in and rigorously evaluate these things and say that here are its strengths and weaknesses. Yes, use it to do this, but please don’t use it to do these other things without some serious guardrails,” Guha says.

A much more pressing issue is people using this technology to make decisions that impact other people, for example, about a loan application or a job opening.

“We should talk about when it is not appropriate to use these models, when they are doing more harm than good,” he says.

Guha says he dedicated a lot of energy to BigCode, which launched in September 2022, leading a working group that focused on evaluating StarCoder and SantaCoder, the open models created by the project.

Building an LLM first requires identifying the data that will be fed into the model to train it. When the model has been trained, Guha says, it should be evaluated on what it can and cannot actually do.

The models created by the BigCode project were trained on the Hugging Face cluster. Guha’s group evaluated the majority of them on Northeastern’s Discovery cluster at the Massachusetts Green High Performance Computing Center, a high-powered parallel computing system that incorporates cutting-edge computing technologies and robust storage solutions.

They conducted an extensive evaluation in 19 different programming languages to understand the capabilities of the models.

“When this project launched, one of the goals was to have it work on lots and lots of languages to make several communities happy,” Guha says.

The models were tested on tasks such as producing code from natural-language descriptions, documenting code and predicting type annotations.
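To give a sense of what testing a model on producing code from a description can look like, here is a minimal sketch in Python. It is not the BigCode evaluation harness, which the article does not describe. The task, the candidate completions and the helper names (`passes_tests`, `pass_rate`) are all hypothetical, and the approach shown, running each generated candidate against a few unit tests and counting how many pass, is only one common way such evaluations are set up.

```python
# Illustrative only: the task, candidates and helper names are assumptions,
# not part of the BigCode project's actual evaluation code.

def passes_tests(candidate_source, entry_point, tests):
    """Return True if the generated code defines `entry_point` and
    produces the expected result for every (args, expected) pair."""
    namespace = {}
    try:
        # Real harnesses run untrusted model output in a sandbox;
        # plain exec() is used here only to keep the sketch short.
        exec(candidate_source, namespace)
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing functions and runtime errors count as failures.
        return False


def pass_rate(candidates, entry_point, tests):
    """Fraction of candidates that pass all tests."""
    if not candidates:
        return 0.0
    passed = sum(passes_tests(c, entry_point, tests) for c in candidates)
    return passed / len(candidates)


# Hypothetical model outputs for the prompt
# "Write a function total(xs) that returns the sum of a list of integers."
candidates = [
    "def total(xs):\n    return sum(xs)\n",   # correct
    "def total(xs):\n    return len(xs)\n",   # wrong logic
    "def total(xs)\n    return sum(xs)\n",    # does not parse
]
tests = [(([1, 2, 3],), 6), (([],), 0), (([5],), 5)]

rate = pass_rate(candidates, "total", tests)
print(f"{rate:.2f} of candidates passed")
```

Scoring by execution rather than by text similarity means a candidate is judged on whether it works, not on whether it resembles a reference answer.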

Other researchers carried out further analyses, including a bias and toxicity analysis. It showed that because the coding model was not trained on vast internet data, it consumed less toxic content and was unlikely to produce toxic output.

Guha says the StarCoder model underwent the most extensive evaluation ever conducted for a focused LLM, thanks to the massively collaborative nature of the BigCode project.

“It’s been a great project that brought together a lot of researchers at various stages in their careers,” he says.

The paper that came out of this part of the BigCode project in May had almost 70 co-authors. Several doctoral students and undergraduates, Guha says, were able to contribute to the model.

Anyone can now request to download and use StarCoder or SantaCoder for free for research, commercial or non-commercial purposes, as long as they sign the BigCode Open Responsible AI Licenses agreement and follow the restrictions that apply, including those covering modified material.

For example, Guha is collaborating with MathWorks, a corporation that specializes in mathematical computing software for engineers and scientists, and Roblox, an online global game platform, to explore how they could use StarCoder, bring it in-house and customize it to their needs.

A number of researchers are using the model as well, Guha says.

The BigCode project is very transparent and explicit, Guha says, about what data its models are using. People can file a request if they want the project to stop using their data. So far, only a couple dozen people have done so.

BigCode is ramping up for the next round of the project and expects to make announcements on further developments soon.

Alena Kuzub is a Northeastern Global News reporter. Email her at a.kuzub@northeastern.edu. Follow her on Twitter @AlenaKuzub.
