Northeastern researchers develop AI app to help speech-impaired users communicate more naturally

Aanchan Mohan and Mirjana Prpa are developing ways to give users a range of tools on their phones: speech recognition, text, whole-word selection and emojis.

Northeastern University researchers are developing an AI-integrated app that will give speech-impaired users access to speech recognition and personalized text-to-speech synthesis. Getty Images

More than 250 million people worldwide have verbal communication disorders that make it difficult to use automatic speech recognition (ASR) programs. Even a simple task, such as sharing what they'd like to eat for dinner, becomes cumbersome with ASR.

The result comes out in a generic audio voice that doesn’t reflect the mood of the speaker. And since the human voice is so closely linked to identity, when a communication tool sounds like a machine, or doesn’t work at all, the user may worry that their personality will be misinterpreted.

Northeastern University researchers are working to change that. Computer science professors Aanchan Mohan and Mirjana Prpa are developing an AI-integrated app that will give speech-impaired users access to a range of communication tools on their phones: speech recognition, text, whole-word selection, emojis and personalized text-to-speech synthesis. 

“People either use speech recognition in isolation, or they use text-to-speech in isolation, or they type in isolation,” Mohan said. “Nobody had put all three together.”

They are calling the app Speak Ease. Using large language models to predict a user’s next phrases, the app will make it easier for people with communication disorders to converse in real time. But what makes it different from other automatic speech recognition software is that it will allow users to communicate in their own voices with the specific mood expression they choose.

“Expressivity is always on a back burner because everyone is trying to solve the speed issue,” Prpa said. “Very little research actually focused on solving the problem of whether the speech provided sounds the way the user would like to sound.”

The software Mohan and Prpa are building goes beyond automatic speech recognition and falls into the category of augmentative and alternative communication software, which emphasizes context awareness and authenticity as users speak and type. Transcriptions can be edited to correct errors, and the app suggests contextually relevant phrases with an emotional tone suggested by AI.

Mohan and Prpa presented a paper and video about the app in August at Interspeech, a conference about the science and technology of spoken language processing.

Prpa, whose research focuses on human-computer interactions, and Mohan, who works on natural language processing, are based on Northeastern’s Vancouver campus. 

“We realized there might be a lot of potential in leveraging large language models to help people who have communication challenges,” Prpa said. 

They are developing the app with help from speech language pathologists, who emphasized that users want digital tools that stress expressivity and not just speed. Through focus group evaluations, they have identified ways that Speak Ease can enhance expressivity by giving users more ways to personalize communication.

Mohan and Prpa worked with a partner agency in British Columbia, Communication Assistance for Youth and Adults, whose speech and language pathologists provided input in the app’s development.

Using samples of a user’s voice, the app will eventually be able to convert atypical speech to a more intelligible version. A user who wants to compose a message to their father in a happy tone, for example, can use the app’s “speak mode” to create a transcription, which they can edit and play back in their own voice using text-to-speech software. 

The app’s large language model features will use past conversations between the user and their dad to suggest relevant words and phrases. And users can select from choices on the interface to pick a mood for the message.
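The article does not publish Speak Ease's implementation, but the idea described above can be sketched in outline: combine recent conversation turns with a user-selected mood into a prompt that an LLM would then complete with suggested phrases. The function name and prompt format below are illustrative assumptions, not the actual app's code.

```python
# Hypothetical sketch: assembling an LLM prompt for next-phrase suggestion
# from conversation history and a user-selected mood. The prompt wording and
# function signature are assumptions for illustration only.

def build_suggestion_prompt(history, mood, num_suggestions=3):
    """Combine the last few conversation turns and a mood label into one prompt."""
    recent = "\n".join(f"{speaker}: {text}" for speaker, text in history[-5:])
    return (
        f"The user is composing a reply in a {mood} tone.\n"
        f"Recent conversation:\n{recent}\n"
        f"Suggest {num_suggestions} short phrases the user might say next."
    )

prompt = build_suggestion_prompt(
    [("Dad", "How was your day?"), ("User", "Pretty good!")],
    mood="happy",
)
print(prompt)
```

In a real system the returned string would be sent to a language model; here it only shows how context (past turns) and expressivity (the mood label) could be folded into a single request.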

“What we are looking for in our app is that when I talk to mom, or someone in my family, I might want to sound very different than when I speak in school,” Prpa said. 

Preserved speech samples would make the app useful for someone with a degenerative condition that impairs their ability to communicate, Prpa said. As their capacity deteriorates, they can use the app to continue "speaking" as they intend to. The same feature could be used in the opposite context, for someone recovering from a stroke. Speak Ease could support a person as they gain the capacity to speak again.

In addition to adding expressivity, the app is intended to provide clarity. A visit to the doctor's office is one example of where this could be useful: people with speech impairments often struggle to be understood by medical professionals.

“Say an individual with Down syndrome is describing a condition,” said Mohan. “People tend to be polite, let the person finish and say, ‘Can you say that again, right?’ Meaning they didn’t understand.”

Speak Ease will help in these situations by providing a real-time transcript that can be corrected and read aloud, both clarifying questions in the moment and doing so in the speaker’s own voice.

Mohan acknowledges that this is a technical challenge.

"The intention is to be able to capture what was transcribed versus what is eventually composed, take the difference between the two and use that as a signal to train the system," he said.
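One way to picture the signal Mohan describes is to diff the raw transcript against the user's edited version and keep the word-level substitutions as (recognized, intended) pairs. This is a minimal sketch of that idea using Python's standard difflib; the pairing logic is an assumption for illustration, not Speak Ease's actual training pipeline.

```python
# A minimal sketch, assuming the training signal comes from word-level
# differences between the ASR transcript and the user's corrected text.
import difflib

def correction_pairs(transcribed, composed):
    """Return (recognized, corrected) word-span pairs where the user edited the transcript."""
    t_words, c_words = transcribed.split(), composed.split()
    matcher = difflib.SequenceMatcher(None, t_words, c_words)
    pairs = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            # what the recognizer heard vs. what the user actually meant
            pairs.append((" ".join(t_words[i1:i2]), " ".join(c_words[j1:j2])))
    return pairs

print(correction_pairs("I wan to eat super", "I want to eat supper"))
```

Each pair records a recognition error alongside its human correction, exactly the kind of supervised example that could be fed back to adapt the recognizer to a particular user's speech.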