“I am your father” are four of the most famous words ever spoken on screen. When Darth Vader shattered Luke Skywalker’s world in “The Empire Strikes Back,” he sent shivers down the spines of audiences everywhere—in large part because of actor James Earl Jones’ famous baritone.
Now, Jones, 91, has announced he is hanging up the mask and retiring as the voice of one of the most infamous cinematic villains. But don’t despair: Although Jones will no longer record new lines for Star Wars projects, the character—and Jones’ voice—will live on thanks to artificial intelligence.
As first reported by Vanity Fair, Respeecher, a Ukrainian voice synthesis company, will use a combination of archival recordings, voice acting and AI technology to continue bringing Darth Vader to the screen.
This is just the latest example of how vocal AI is making its way into Hollywood and reshaping the industry in the process. Respeecher has already used this technology in the Disney+ miniseries “Obi-Wan Kenobi” to create a Darth Vader that was closer to the version of the character in the original Star Wars trilogy. And Sonantic, another voice synthesis company, recently worked to recreate Val Kilmer’s voice for an emotional moment in “Top Gun: Maverick.”
As use of the technology has expanded, it has also raised questions about how AI will impact actors and their work, the entertainment industry and its reliance on well-known intellectual property—and our understanding of the human voice in general.
When it comes to Respeecher, concerns about humans vs. machines are a little more complicated. The company uses what is called a speech-to-speech (STS) approach, as opposed to text-to-speech (TTS). In this technique, a human actor’s vocal performance drives an AI voice engine that has been trained on archival audio of a specific voice. In this case, the result is a voice that sounds like Jones’ Darth Vader but has the inflection and melody of a human voice actor.
“STS models don’t require a famous talent to generate the final audio, but they do require someone whose delivery can be used to ‘breathe’ life into the voice model,” says Rupal Patel, professor of communication sciences and disorders at Northeastern.
Like Darth Vader himself, this approach is a melding of human and machine. Contrary to concerns about automating the acting industry, Patel is optimistic the technology can be used as a new creative tool, not unlike autotune.
“Just like autotune and touch ups didn’t remove the need for singers or makeup artists, I think speech synthesis—TTS and STS—will be powerful tools for creatives to entertain and bring us into new imaginary worlds,” Patel says.
Outside of labor concerns, the technology also raises deeper ethical questions about what viewers can “trust” on screen, says Rébecca Kleinberger, assistant professor with a joint appointment in Northeastern’s Khoury College of Computer Sciences and the College of Art, Media and Design.
Can an AI trick viewers who have been listening to Jones’ voice for decades?
That day is probably not far away, but despite rapid advancements in technology, there is still some element of the uncanny valley in these performances, Kleinberger says.
“What you often are going to have when you use a synthesized voice or try to fool the brain or, in this specific case, try to make people think it’s the same person, you’re going to first create some slight uncanny feeling that something is not completely right,” Kleinberger says.
The strange mental disconnect does not always come from the voice, she says. Kleinberger is quick to point out that Darth Vader was, from his very inception, a creative collaboration between Jones, sound designer Ben Burtt and David Prowse, the actor in the suit. The idea of disconnecting the voice and body behind Vader is not new, but filtering another actor’s vocal performance through an AI model could add further layers of disconnection among the voice, the physical performance and the audience.
It doesn’t help that the human brain is very good at detecting details from voices, especially familiar ones.
“From a fraction of a second, you can detect roughly the age, the gender, or at least the hormonal identity, of the person you talk to and their health level,” Kleinberger says. “Your brain actually detects some elements of their facial structure, the size of their nose. You detect a lot of elements of their emotion, their intent.”
For a voice researcher like Kleinberger, an AI-voiced Darth Vader is a helpful way to broach broader philosophical questions about aspects of the voice we take for granted, such as the way a voice changes with age, with sickness, even with time of day. By taking ownership of this version of Jones’ voice, Disney is fixing it in time to create some level of consistency for the character.
“This litheness of voice, which also comes with the fact that it’s embodied, is really what makes a voice a voice,” Kleinberger says. “What does that mean to fix a voice in time? Well, it allows us to do a lot of things for entertainment businesses, but what does that mean for the actor who will continue to age or continue to have a voice?”
Looming over all of these questions is the specter of deepfakes and other non-consensual uses of someone’s voice or likeness. The misuses of this technology are just as important as its potential uses, which is why groups like Northeastern’s AI Ethics Advisory Board are looking for ways to chart a more responsible future for AI.
“I believe that all new technology faces these issues and that rather than cast a shadow over it, we need to spark the dialogue about the ethics and work with industry leaders, consortiums, and all stakeholders to educate, understand and formulate a path forward that leverages the technology without compromising our shared values and humanity,” Patel says.
With more Hollywood studios using vocal AI, Kleinberger is also hopeful that cases like this can inspire the general public to talk about the value of the human voice, something that is often taken for granted.
“I’m glad that this brings some discussion around the voice that makes people aware of what they actually possess when they have a voice and what we’re actually giving up or letting go of when we talk to Alexa or when we talk to all those systems that record our voice 24/7,” Kleinberger says. “Maybe if that makes us aware of what IP, and the value, we have by each having a voice, that might be a silver lining.”