A team of researchers at New York University has made advances in neural speech decoding, bringing us closer to a future in which individuals who have lost the ability to speak can regain their voice.

The study, published in Nature Machine Intelligence, introduces a novel deep learning framework that accurately translates brain signals into intelligible speech.

People with brain injuries resulting from strokes, degenerative diseases, or physical trauma could use such devices to speak through speech synthesizers driven solely by their thoughts.

The first stage is a deep learning model that maps electrocorticography (ECoG) signals to a set of interpretable speech features, such as pitch, loudness, and the spectral content of speech sounds.

ECoG data captures the essential elements of speech production and allows the system to generate a compact representation of the intended speech.

The second stage features a neural speech synthesizer that converts the extracted speech features into a spectrogram, which can then be converted into a speech waveform.

The resulting waveform can then be played back as natural-sounding synthetic speech.
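To make the two-stage pipeline concrete, here is a minimal sketch in PyTorch. The module names, layer choices, electrode count, and feature dimensions are illustrative assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class ECoGDecoder(nn.Module):
    """Stage 1: map ECoG channels over time to interpretable speech features."""
    def __init__(self, n_channels=64, n_features=18, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)  # e.g. pitch, loudness, spectral parameters

    def forward(self, ecog):                  # ecog: (batch, time, channels)
        h, _ = self.rnn(ecog)
        return self.head(h)                   # (batch, time, features)

class FeatureSynthesizer(nn.Module):
    """Stage 2: render the feature stream into a mel-style spectrogram."""
    def __init__(self, n_features=18, n_mels=80, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_mels),
        )

    def forward(self, feats):                 # feats: (batch, time, features)
        return self.net(feats)                # (batch, time, mel bins)

decoder, synthesizer = ECoGDecoder(), FeatureSynthesizer()
ecog = torch.randn(1, 500, 64)                # 500 time steps of 64-channel ECoG
spectrogram = synthesizer(decoder(ecog))      # a vocoder turns this into a waveform downstream
```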

How the study works

The study involves training an AI model that can power a speech synthesizer, allowing people with speech loss to communicate using their thoughts alone.

Here is how it works in detail:

1. Collect brain data

The first step is to gather the raw data required to train the speech decoding model. The researchers worked with 48 participants who were undergoing neurosurgery for epilepsy.

During the study, these participants were asked to read hundreds of sentences aloud while their brain activity was recorded using ECoG grids.

These grids are placed directly on the surface of the brain and capture electrical signals from the brain regions involved in speech production.

2. Mapping brain signals to speech features

Using the speech data, the researchers developed a sophisticated AI model that maps the recorded brain signals to specific speech characteristics, such as pitch, loudness, and the distinct frequencies that make up different speech sounds.
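As an illustration of what such per-frame targets can look like, the sketch below extracts pitch, loudness, and spectral content from a recorded sentence with librosa. The file name, sampling rate, and feature choices are assumptions made for the example, not the paper's exact parameterization.

```python
import librosa
import numpy as np

# Load one recorded sentence (hypothetical file); 16 kHz is an assumed rate.
y, sr = librosa.load("sentence.wav", sr=16000)

f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)    # per-frame pitch track
loudness = librosa.feature.rms(y=y)[0]                       # per-frame energy as a loudness proxy
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80)  # spectral content

targets = {
    "pitch": np.nan_to_num(f0),            # unvoiced frames have no pitch; zero them out
    "loudness": loudness,
    "spectral": librosa.power_to_db(mel),  # log-scaled mel spectrogram
}
```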

3. Synthesizing speech from features

The third step involves converting the speech features extracted from brain signals back into audible speech.

The researchers used a specially designed speech synthesizer that takes the extracted features and generates a spectrogram – a visual representation of the speech sounds.
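The paper's synthesizer is a learned neural model, but the final "spectrogram to sound" step can be illustrated with a classic signal-processing stand-in: Griffin-Lim phase reconstruction. The placeholder tone below simply provides a magnitude spectrogram to invert.

```python
import numpy as np
import librosa
import soundfile as sf

sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s placeholder signal
magnitude = np.abs(librosa.stft(tone))                     # stands in for a decoded spectrogram
waveform = librosa.griffinlim(magnitude, n_iter=60)        # iterative phase recovery
sf.write("reconstructed.wav", waveform, sr)                # audible synthetic output
```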

4. Evaluating the results

The researchers compared the speech produced by their model with the participants’ original speech.

They used objective metrics to measure the similarity between the two and found that the generated speech closely matched the content and rhythm of the original.
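One common objective check for this kind of comparison, shown here purely as an illustration, is the frame-wise Pearson correlation between the decoded and reference spectrograms; the toy arrays below stand in for real data.

```python
import numpy as np

def spectrogram_correlation(decoded: np.ndarray, reference: np.ndarray) -> float:
    """Mean Pearson correlation across mel bins; both arrays have shape (mels, frames)."""
    d = decoded - decoded.mean(axis=1, keepdims=True)
    r = reference - reference.mean(axis=1, keepdims=True)
    num = (d * r).sum(axis=1)
    den = np.sqrt((d ** 2).sum(axis=1) * (r ** 2).sum(axis=1)) + 1e-8
    return float((num / den).mean())

decoded = np.random.rand(80, 200)                    # toy decoded spectrogram
reference = decoded + 0.1 * np.random.rand(80, 200)  # toy "ground truth"
print(f"spectrogram correlation: {spectrogram_correlation(decoded, reference):.2f}")
```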

5. Testing new words

To ensure that the model can handle new words it has not seen before, certain words were intentionally omitted during the model’s training phase, and its performance on these unseen words was then tested.

The model’s ability to accurately decode even new words demonstrates its potential to generalize and process varied language patterns.
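A toy illustration of this held-out-word idea: sentences containing certain words are kept out of training entirely, so those words are only ever encountered at test time. The word lists and sentences below are made up.

```python
held_out = {"violin", "garden"}  # words the model must never see during training

sentences = [
    "the violin played softly",
    "she walked to the garden",
    "he read the book aloud",
    "they sang a quiet song",
]

train = [s for s in sentences if not held_out & set(s.split())]
test = [s for s in sentences if held_out & set(s.split())]
print("train:", train)
print("test:", test)
```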

NYU’s speech synthesis system. Source: Nature (open access)

The top section of the diagram above describes a process for converting brain signals into speech. First, a decoder converts these signals into speech parameters over time. A synthesizer then creates sound images (spectrograms) from these parameters. Another tool converts these images back into sound waves.

The lower section shows a system that helps train the brain signal decoder by imitating speech. It takes a sound image, converts it into speech parameters, and uses those to create a new sound image. This part of the system learns from actual speech sounds to improve.

After training, only the top pipeline is needed to convert brain signals into speech.
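A hedged sketch of the training idea the figure describes: a speech-to-speech path (spectrogram to speech parameters to spectrogram) learns from actual recordings and supplies reference parameters, and the ECoG decoder is then trained to reproduce those parameters. The shapes, modules, and losses here are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

speech_encoder = nn.GRU(80, 18, batch_first=True)  # spectrogram -> speech parameters
synthesizer = nn.Linear(18, 80)                    # speech parameters -> spectrogram
ecog_decoder = nn.GRU(64, 18, batch_first=True)    # ECoG -> speech parameters

spect = torch.randn(8, 200, 80)                    # batch of reference spectrograms
ecog = torch.randn(8, 200, 64)                     # simultaneously recorded ECoG

# Step 1: the speech-to-speech path learns to auto-encode real spectrograms.
opt_speech = torch.optim.Adam(
    list(speech_encoder.parameters()) + list(synthesizer.parameters())
)
params, _ = speech_encoder(spect)
loss_speech = nn.functional.mse_loss(synthesizer(params), spect)
opt_speech.zero_grad()
loss_speech.backward()
opt_speech.step()

# Step 2: the ECoG decoder learns to reproduce the reference speech parameters.
opt_ecog = torch.optim.Adam(ecog_decoder.parameters())
pred, _ = ecog_decoder(ecog)
loss_ecog = nn.functional.mse_loss(pred, params.detach())
opt_ecog.zero_grad()
loss_ecog.backward()
opt_ecog.step()

# At inference time, only ecog_decoder and synthesizer are needed.
```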

A key advantage of the NYU approach is its ability to achieve high-quality speech decoding without the need for ultra-high-density electrode arrays, which are impractical for long-term implantation.

Essentially, it is a lighter, portable solution.

Another notable achievement is the successful decoding of speech from both the left and right hemispheres of the brain, which is important for patients with damage to one side of the brain.

Using AI to convert thoughts into speech

The NYU study builds on previous research on neural speech decoding and brain-computer interfaces (BCIs).

In 2023, a team at the University of California, San Francisco enabled a paralyzed stroke survivor to construct sentences at a rate of 78 words per minute using a BCI that synthesized both vocalizations and facial expressions from brain signals.

Other recent studies have examined the use of AI to interpret various aspects of human thought from brain activity. Researchers have demonstrated the ability to generate images, text, and even music from fMRI and EEG data.

For example, one study at the University of Helsinki used EEG signals to guide a generative adversarial network (GAN) in creating facial images that matched participants’ thoughts.

Meta AI has also developed a method for decoding what someone is hearing from non-invasively collected brainwave data.

However, that approach could not yet predict speech from thoughts alone.

Opportunities and challenges

The NYU method uses more widely available and clinically useful electrodes than previous methods, making it more accessible.

While these advances are exciting, major obstacles must be overcome before mind-reading AI can be deployed on a large scale.

For one thing, collecting the high-quality brain data needed to train machine learning models is demanding, and individual differences in brain activity can make generalization difficult.

Nevertheless, the NYU study represents a step in this direction by demonstrating highly accurate speech decoding with lower-density ECoG arrays.

Looking forward, the NYU team aims to refine its models for real-time speech decoding, bringing us closer to the ultimate goal of enabling natural, fluid conversations for people with speech disabilities.

They also intend to adapt the system to work with fully implantable wireless devices that can be used in everyday life.


This article was originally published at dailyai.com