Cool or creepy? Microsoft's VASA-1 is a new AI model that turns photos into 'talking faces'

Ryan Morrison

April 18, 2024 at 11:50 AM

A new AI research paper from Microsoft promises a future where you can upload a photo and a sample of your voice to create a live, animated talking head of your own face.

VASA-1 takes in a single portrait photo and an audio file and converts them into a hyper-realistic talking-face video, complete with lip sync, realistic facial features and head movement.

The model is currently only a research preview and not available for anyone outside of the Microsoft Research team to try, but the demo videos look impressive.

Similar lip sync and head movement technology is already available from Runway and Nvidia, but VASA-1 appears to offer much higher quality and realism, with fewer mouth artifacts. This approach to audio-driven animation is also similar to the recent VLOGGER AI model from Google Research.

How does VASA-1 work?

Microsoft says this is a new framework for creating lifelike talking faces, aimed specifically at animating virtual characters. All of the people in the examples were synthetic, made using DALL-E, but if it can animate a realistic AI image, it can animate a real photo.

In the demo we see people talking as if they were being filmed, with slightly jerky but otherwise natural-looking movement. The lip sync is very impressive, with natural movement and none of the artifacts around the top and bottom of the mouth seen in other tools.

One of the most impressive things about VASA-1 is that it doesn't seem to require a face-forward, portrait-style image to work.

There are examples with shots facing a range of directions. The model also seems to offer a high degree of control, taking eye gaze direction, head distance and even emotion as inputs to steer the generation.

What is the point of VASA-1?

One of the most obvious use cases for this is advanced lip syncing for games. Being able to create AI-driven NPCs with natural lip movement could be a game-changer for immersion.

It could also be used to create virtual avatars for social media videos, as seen already from companies like HeyGen and Synthesia. One other area is in AI-based movie making. You could make a more realistic music video if you can have an AI singer that looks like they are singing.

That said, the team say this is just a research demonstration, with no plans for a public release or even making it available to developers to use in products.

How well does VASA-1 work?

One thing that surprised the researchers was the ability of VASA-1 to perfectly lip-sync to a song, reflecting the words from the singer without issue despite no music being used in the training dataset. It also handled different image styles including the Mona Lisa.

They've got it generating 512x512 pixel video at 45 frames per second, and it can do so in about 2 minutes using a desktop-grade Nvidia RTX 4090 GPU.
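To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python of how many frames and pixels a clip at that resolution and frame rate involves. The function name `frame_stats` and the 60-second clip length are illustrative assumptions, not anything from Microsoft's paper, and the numbers say nothing about how VASA-1 itself works internally.

```python
def frame_stats(width: int, height: int, fps: int, seconds: float) -> dict:
    """Count frames and raw pixels for a clip (hypothetical helper, for illustration only)."""
    frames = round(fps * seconds)          # total frames in the clip
    pixels_per_frame = width * height      # pixels in one frame
    return {
        "frames": frames,
        "pixels_per_frame": pixels_per_frame,
        "pixels_per_second": pixels_per_frame * fps,
    }


# 512x512 at 45 fps comes from the article; the 60-second length is an assumed example.
stats = frame_stats(512, 512, 45, 60)
print(stats)
```

For a one-minute clip that works out to 2,700 frames, each with 262,144 pixels, or close to 11.8 million pixels generated for every second of video.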

While they say this is only for research, it will be a shame if this isn't released publicly, even if only for developers, as I'd love to see it in Runway or Pika Labs. Given Microsoft has a huge stake in OpenAI, this could even be part of a future Copilot Sora integration.
