Personalized text-to-speech voices help people with speech disabilities maintain identity and social connection

by Emily Arntsen

November 18, 2019

VocaliD, a company founded by Northeastern professor Rupal Patel, creates custom voices for people who either can’t speak or are at risk of losing their voices in the future. Photo by Ruby Wallau/Northeastern University

When Stephen Hawking lost his ability to speak, he began using a text-to-speech technology called Perfect Paul. For the most part, Perfect Paul served its purpose: Hawking was able to communicate despite his paralyzing neurodegenerative disease.

But Perfect Paul wasn’t actually perfect. It not only made Hawking sound like a robot but also lacked his British accent. In adopting a synthetic voice, Hawking relinquished the characteristics that made his voice distinctly his.

Maybe that seems like a small price to pay for the ability to speak again. But Rupal Patel, a professor of communication sciences and disorders at Northeastern, thinks having a unique voice isn’t superfluous; it’s a basic right.

“If your voice sounds like an ATM or an Alexa device, how can you take ownership over that prosthesis? It doesn’t feel like an extension of you anymore,” she says. “Yet, if you could continue speaking the way you used to, a social connection is maintained.”

This was the impetus for VocaliD, a company founded by Patel that creates custom voices for people who either can’t speak or are at risk of losing their voices in the future.

This is how it looks when a patient donates her voice at the VocaliD clinic. Photo by Ruby Wallau/Northeastern University

For people who might lose their voices because of conditions such as throat or tongue cancer, VocaliD can create vocal legacies: recordings that preserve the voice of someone who might need text-to-speech assistance later in life. That way, the person’s own voice can be used to generate speech in the future.

In September, Northeastern’s Voice Preservation Clinic opened on the Boston campus for people from the community to bank or donate their voices. The process takes three to four hours, during which participants record themselves reading about two thousand sentences.

Even people who don’t expect to lose their voices permanently can benefit from banking them. Patel says that as voice analysis technology becomes more advanced, doctors could use our voices to track changes in our health.

Northeastern professor Rupal Patel has a joint appointment in the Khoury College of Computer Sciences and the Bouvé College of Health Sciences. Photo by Ruby Wallau/Northeastern University

“Voice changes are often the first signs of vocal or neurological disorders,” she says. “Speech is such a fine-grained movement that even small changes can show up as symptoms early on before someone loses full motor control.”

For example, early symptoms of Parkinson’s disease include a softening of the voice and trouble starting speech, Patel says.

“This is such a rich area of research right now because there are so many devices listening to us speak,” she says. “We should capitalize on that, and maybe we could catch changes to our health early on.”

As for people who have never been able to speak because of conditions such as cerebral palsy or severe autism, VocaliD can use the nonverbal sounds they make to create voices that reflect the vowel sounds, melody, and pitch of what they would sound like if they could speak.

“They can’t speak, but they can still do other things with their voice,” Patel says. Those nonverbal sounds are then blended with recordings made by a voice donor whose age and gender match those of the person who needs the voice.

Restoring a person’s voice is an emotional process, Patel says. She recalls a time when she presented three potential voices to a man who lost the ability to speak after developing amyotrophic lateral sclerosis.

“His wife used to call his phone to listen to his answering machine,” she says. “But eventually he traded in his phone, and they accidentally lost that recording.”

Patel created three potential voices from scratch. When the couple heard the first two, “they nodded politely,” she says. “But when I played the last one for them, the woman whispered, ‘It sounds like you,’ and the man started shaking and crying.”

“There’s a moment when you reconnect with a piece of you that you thought you lost forever,” she says. “That’s magical.”
