Brain signals translated into speech using AI

Technology could one day be used to help people who can’t talk to communicate.

In an effort to provide a voice for people who can’t speak, neuroscientists have designed a device that can transform brain signals into speech.

This technology isn’t yet accurate enough for use outside the lab, although it can synthesize whole sentences that are mostly intelligible. Its creators described their speech-decoding device in a study¹ published on 24 April in Nature.

Scientists have previously used artificial intelligence to translate single words²,³, mostly consisting of one syllable, from brain activity, says Chethan Pandarinath, a neuroengineer at Emory University in Atlanta, Georgia, who co-wrote a commentary accompanying the study. “Making the leap from single syllables to sentences is technically quite challenging and is one of the things that makes the current work so impressive,” he says.

Mapping movements

Many people who have lost the ability to speak communicate using technology that requires them to make tiny movements to control a cursor that selects letters or words on a screen. UK physicist Stephen Hawking, who had motor-neuron disease, was one famous example. He used a speech-generating device activated by a muscle in his cheek, says study leader Edward Chang, a neurosurgeon at the University of California, San Francisco.

Because people who use such devices must type out words letter by letter, these devices can be very slow, producing up to ten words per minute, Chang says. Natural speech averages 150 words per minute. “It’s the efficiency of the vocal tract that allows us to do that,” he says. And so Chang and his team decided to model the vocal system when constructing their decoder.

[Image: Electrodes similar to those that researchers implanted in participants’ skulls to record their brain signals.]

The researchers worked with five people who had electrodes implanted on the surface of their brains as part of epilepsy treatment. First, the team recorded brain activity as the participants read hundreds of sentences aloud. Then, Chang and his colleagues combined these recordings with data from previous experiments that determined how movements of the tongue, lips, jaw and larynx created sound.

The team trained a deep-learning algorithm on these data, and then incorporated the program into their decoder. The device transforms brain signals into estimated movements of the vocal tract, and turns these movements into synthetic speech. People who listened to 101 synthesized sentences could understand 70% of the words on average, Chang says.
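The two-stage design described above can be illustrated with a toy sketch. This is not the study's deep-learning model: the weight matrices, the channel counts and the function names (apply_linear, decode_frame) are hypothetical stand-ins, and a plain linear map takes the place of each trained network. It only shows the order of the steps, with brain signals first converted into estimated vocal-tract movements and those movements then converted into acoustic features.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def apply_linear(weights: Matrix, features: Sequence[float]) -> Vector:
    """Multiply a weight matrix by a feature vector (stand-in for a trained network)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def decode_frame(neural: Sequence[float],
                 to_articulation: Matrix,
                 to_acoustics: Matrix) -> Vector:
    """Decode one frame of neural activity in two stages."""
    # Stage 1: brain signals -> estimated movements of the vocal tract
    movements = apply_linear(to_articulation, neural)
    # Stage 2: estimated movements -> acoustic features for a synthesizer
    return apply_linear(to_acoustics, movements)


# Hypothetical toy weights: 3 neural channels -> 2 articulator traces -> 2 acoustic features
TO_ARTICULATION = [[1.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0]]
TO_ACOUSTICS = [[2.0, 0.0],
                [0.0, 0.5]]

frame = [0.5, 1.0, 3.0]
acoustic = decode_frame(frame, TO_ARTICULATION, TO_ACOUSTICS)
```

In this sketch the intermediate movements vector is the only link between the two stages, which mirrors the article's description of a decoder that routes through the vocal tract rather than mapping brain activity directly to sound.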

In another experiment, the researchers asked one participant to read sentences aloud and then to mime the same sentences by moving their mouth without producing sound. The sentences synthesized in this test were of lower quality than those created from audible speech, Chang says, but the results are still encouraging.

Intelligible future

Speech created by mapping brain activity to movements of the vocal tract and translating them to sound is more easily understood than that produced by mapping brain activity directly to sound, says Stephanie Riès, a neuroscientist at San Diego State University in California.

But it’s unclear whether the new speech decoder would work with words that people only think, says Amy Orsborn, a neural engineer at the University of Washington in Seattle. “The paper does a really good job of showing that this works for mimed speech,” she says. “But how would this work when someone’s not moving their mouth?”

Marc Slutzky, a neurologist at Northwestern University in Chicago, Illinois, agrees and says that the decoder’s performance leaves room for improvement. He notes that listeners identified the synthesized speech by selecting words from a set of choices; as the number of choices increased, people had more trouble understanding the words.

The study “is a really important step, but there’s still a long way to go before synthesized speech is easily intelligible”, Slutzky says.