Microsoft’s VALL-E AI can mimic any voice from a short audio sample

Microsoft has shown off its latest research in text-to-speech AI with a model called VALL-E that can simulate someone’s voice from just a three-second audio sample, Ars Technica has reported. The synthesized speech can match not only the speaker’s timbre but also their emotional tone, and even the acoustics of the room. It could one day be used for customized or high-end text-to-speech applications, though, like deepfakes, it carries risks of misuse.

VALL-E is what Microsoft calls a “neural codec language model.” It’s derived from Meta’s AI-powered compression neural net Encodec, generating audio from text input and short samples from the target speaker.

In a paper, researchers describe how they trained VALL-E on 60,000 hours of English-language speech from more than 7,000 speakers in Meta’s LibriLight audio library. The voice it attempts to mimic must be a close match to a voice in the training data. If that’s the case, it uses the training data to infer what the target speaker would sound like when speaking the desired text input.

The team shows exactly how well this works on the VALL-E GitHub page. For each phrase they want the AI to “speak,” they provide a three-second prompt from the speaker to imitate, a “ground truth” of the same speaker saying another phrase for comparison, a “baseline” conventional text-to-speech synthesis and, at the end, the VALL-E sample.

The results are mixed, with some sounding machine-like and others being surprisingly realistic. The fact that it retains the emotional tone of the original samples is what sells the ones that work. It also faithfully matches the acoustic environment, so if the speaker recorded their voice in an echo-y hall, the VALL-E output also sounds like it came from the same place. 

Microsoft plans to scale up its training data “to improve the model performance across prosody, speaking style, and speaker similarity perspectives.” It’s also exploring ways to reduce the number of words that are unclear or missed.

Microsoft elected not to make the code open source, possibly due to the risks inherent in AI that can put words in someone’s mouth. It added that it would follow its “Microsoft AI Principles” in any further development. “Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating,” the company wrote in the “Broader impacts” section of its conclusion.

Steve Dent
