Northeastern students found a promising solution to AI data storage woes — and the industry took notice

Students on Northeastern’s Vancouver campus found a way to make storing tensor data, used in machine learning models, more cost- and energy-efficient. It landed them a spot at the 2024 Data + AI Summit.

by Cody Mello-Klein

July 16, 2024

Liam Bao, Liao-Liao Liu and Zhiyu Wu, computer science graduate students at Northeastern’s Vancouver campus, presented their research at the 2024 Data + AI Summit. Courtesy photo Liao-Liao Liu

In 2024, data is the name of the game. What data a company collects and how that data gets analyzed is key to the business model of many modern companies, from Spotify to Meta. But there’s an even more pressing question for many of these companies: Where do you store all this data?

At a time when there has never been more data, storage is a make-or-break issue. If mishandled, it can drive up financial and energy costs and hurt users.

Enter a trio of computer science graduate students from Northeastern University’s Vancouver campus. For their capstone project, Liam Bao, Liao-Liao Liu and Zhiyu Wu found a way to make storing a specific yet precious kind of data, the tensor data used in machine learning models like ChatGPT, cost less for companies, consumers and the planet.

“Traditionally, people are not using cloud technology,” Bao says. “They are using a premade database, so all of their data is stored on a physical level server in their own company. As the technology grows into the cloud native era, we can use the cloud service. We have a very elastic power of storage service and computer service. You can think that in the cloud, it’s unlimited.”

“We bring the tensor into a database instead of just purely on disc as a binary format,” Bao continues. “After we bring it into the database, we can leverage many different kinds of storage optimization techniques to save space. That’s the one big difference. After that, we can also increase the efficiency of reading and writing the machine learning model.”
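The idea Bao describes can be illustrated with a small sketch. This is not the students' system; it is a minimal example that assumes SQLite as the database, zlib as the "storage optimization," and a hypothetical table named `chunks`. It splits a tensor into row blocks, stores each block compressed, and compares the stored size with the raw binary size.

```python
import sqlite3
import zlib

import numpy as np


def store_tensor(conn, name, tensor, rows_per_chunk=256):
    """Split a 2-D tensor into row blocks and store each block compressed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "name TEXT, chunk_id INTEGER, n_rows INTEGER, n_cols INTEGER, "
        "dtype TEXT, data BLOB, PRIMARY KEY (name, chunk_id))"
    )
    for chunk_id, start in enumerate(range(0, tensor.shape[0], rows_per_chunk)):
        block = np.ascontiguousarray(tensor[start:start + rows_per_chunk])
        conn.execute(
            "INSERT INTO chunks VALUES (?, ?, ?, ?, ?, ?)",
            (name, chunk_id, block.shape[0], block.shape[1],
             str(block.dtype), zlib.compress(block.tobytes())),
        )
    conn.commit()


def load_tensor(conn, name):
    """Read every block back, decompress it, and reassemble the tensor."""
    rows = conn.execute(
        "SELECT n_rows, n_cols, dtype, data FROM chunks "
        "WHERE name = ? ORDER BY chunk_id",
        (name,),
    ).fetchall()
    blocks = [
        np.frombuffer(zlib.decompress(data), dtype=dtype).reshape(n_rows, n_cols)
        for n_rows, n_cols, dtype, data in rows
    ]
    return np.concatenate(blocks, axis=0)


def stored_bytes(conn, name):
    """Total size in bytes of the compressed blocks held in the database."""
    return conn.execute(
        "SELECT SUM(LENGTH(data)) FROM chunks WHERE name = ?", (name,)
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
weights = np.zeros((1000, 64), dtype=np.float32)
weights[::10] = 1.0  # mostly zeros, so the blocks compress well
store_tensor(conn, "weights", weights)
restored = load_tensor(conn, "weights")
print(weights.nbytes, stored_bytes(conn, "weights"))
```

How much space is saved depends entirely on the data and on the optimization chosen; the mostly-zero tensor here is a deliberately easy case.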

To train effectively, a machine learning model or large language model requires a massive amount of data. As a result, the kinds of databases that organizations use to store and distribute data today can quickly become resource drains, Liu says.

“That data can scale to a very scary level, like a petabyte, even exabyte,” Liu says. “Even one single point of storage efficiency will bring a very big difference in the energy cost.”

According to the students’ paper, this storage solution reduces the amount of data that needs to be transferred over a network by 90% and improves energy efficiency by 10%. Together, those gains could also reduce the dollars spent.

“When you store all the data in the cloud, storing more data size means you have to pay more, so it’s also cost effective,” Wu says.

Outside of the potential improvements for companies, Liu says using their storage solution could be a boon for developers, too.

For example, someone trying to train an object detection model relies on massive image datasets. To make those images usable for their model, they have to use platforms that download entire collections of image files and then transform them into tensors. The students’ method cuts out that step entirely, speeding things up in the process.

“Any time the user wants to use the whole tensor or a part of a tensor, using our project they can say, ‘Hey, this portion of the data, can you help me grab it?’ and it will give it to you,” Liu says. “There’s no actual computation doing the transformation from the JPEG files into the tensors.”
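The "grab this portion" request Liu describes can be sketched as follows. This is an illustration under stated assumptions, not the students' implementation: it assumes the tensor is already stored as fixed-size row blocks in a SQLite table (the table name `chunks`, the constant `ROWS_PER_CHUNK` and the helper `read_rows` are all hypothetical), so a request for a row range only touches the blocks that overlap it.

```python
import sqlite3

import numpy as np

ROWS_PER_CHUNK = 100  # hypothetical block size chosen for this sketch


def store_rows(conn, tensor):
    """Store a 2-D tensor as fixed-size row blocks, one database row each."""
    conn.execute("CREATE TABLE chunks (chunk_id INTEGER PRIMARY KEY, data BLOB)")
    for chunk_id, start in enumerate(range(0, tensor.shape[0], ROWS_PER_CHUNK)):
        block = tensor[start:start + ROWS_PER_CHUNK]
        conn.execute("INSERT INTO chunks VALUES (?, ?)", (chunk_id, block.tobytes()))
    conn.commit()


def read_rows(conn, start, stop, n_cols, dtype):
    """Return rows [start, stop) and the number of blocks that were fetched."""
    first = start // ROWS_PER_CHUNK
    last = (stop - 1) // ROWS_PER_CHUNK
    rows = conn.execute(
        "SELECT chunk_id, data FROM chunks "
        "WHERE chunk_id BETWEEN ? AND ? ORDER BY chunk_id",
        (first, last),
    ).fetchall()
    blocks = [np.frombuffer(data, dtype=dtype).reshape(-1, n_cols) for _, data in rows]
    merged = np.concatenate(blocks, axis=0)
    offset = first * ROWS_PER_CHUNK  # index of the first row in the merged blocks
    return merged[start - offset:stop - offset], len(rows)


conn = sqlite3.connect(":memory:")
images = np.arange(1000 * 8, dtype=np.float32).reshape(1000, 8)
store_rows(conn, images)

# Ask for rows 250..319: only 2 of the 10 stored blocks are read.
part, chunks_read = read_rows(conn, 250, 320, n_cols=8, dtype=np.float32)
print(part.shape, chunks_read)
```

Because the data is already in tensor form inside the database, the caller never downloads or decodes image files in this sketch; it simply selects the blocks it needs.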

With data storage on the minds of so many in the tech world, it took little time for Bao, Liu and Wu’s work to catch the attention of major industry players. After posting their work online, they were invited by Databricks, a global data and AI company, to present at its annual Data + AI Summit alongside the likes of Google, Apple and Nvidia.

For a team of students who had poured everything they had into their research, not expecting to be recognized by the larger tech community, it was a welcome surprise.

“Nothing feels better than your research getting recognized,” Liu says. “That moment is huge.”
