OpenAI Unveils GPT-4 Omni’s Voice Capabilities and They’re Literally Unbelievable

Screenshot: OpenAI

OpenAI unveiled GPT-4 Omni (GPT-4o) during its Spring Update on Monday morning in San Francisco. Chief Technology Officer Mira Murati and OpenAI staff showcased their newest flagship model, capable of real-time verbal conversations with a friendly AI chatbot that convincingly speaks like a human.

“GPT-4o provides GPT-4 level intelligence but is much faster,” Murati said on stage. “We think GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural and far easier.”

GPT-4o responds instantaneously to verbal prompts in a friendly voice that sounds uncannily like Scarlett Johansson, who voiced the AI assistant in the feature film Her. Based on the demos, this technology essentially makes that movie a reality. GPT-4o's speech has emotional intonation, expressing excitement at times and laughing at others, and it can identify emotion and tone in users' speech as well. OpenAI staff showcased conversations with the AI chatbot with almost no lag, and the chatbot was even able to pivot quickly when interrupted.

While GPT-4o's audio abilities are impressive, Omni works across several mediums. Whereas ChatGPT previously processed text, vision, and audio through a network of separate AI models, GPT-4o is a single model capable of processing all three, which makes everything work much faster. You can show GPT-4o an image of a math problem with your phone camera while talking to the model verbally. OpenAI says its new flagship model operates at GPT-4 levels of intelligence while setting new high-water marks in multilingual, audio, and vision capabilities.

Beyond the jaw-dropping demo, OpenAI is releasing GPT-4o as a desktop application for macOS. Paid users get the macOS app today, and GPT-4o will be available to free users in the future. The desktop application will let you start voice conversations with ChatGPT directly from your computer and share your screen with minimal friction. The ChatGPT website is also getting a simplified refresh.

OpenAI staff members Mark Chen and Barret Zoph demoed how the real-time, multimodal AI model works on stage Monday. The real-time conversation mostly worked great, as Chen and Zoph interrupted the model to ask it to pivot its answers. GPT-4o told bedtime stories, helped with math problems, and more. At times, GPT-4 Omni struggled to understand the users' intentions, but the model was fairly graceful in navigating the slip-ups.

The voice model was capable of doing different voices when telling a story, laughing, and even saying “That’s so sweet of you” at one point. It’s clear the OpenAI team ensured that GPT-4o had more emotion and was more conversational than previous voice models. In demos, ChatGPT sounded more human than ever.

An OpenAI staff member confirmed in a tweet that the company has been testing GPT-4o on the LMSYS Org chatbot arena as “im-also-a-good-gpt2-chatbot.” As many suspected and Sam Altman teased, these were OpenAI models in the works. According to the staffer, the latest chatbot starkly outperformed the competition, including industry leaders GPT-4 Turbo and Claude 3 Opus, on several metrics.

The release of GPT-4o feels like a seminal moment for the future of AI chatbots. This technology pushes past much of the awkward latency that plagued early chatbots. It's easy to imagine a version of Siri that is genuinely useful with GPT-4o underneath. These real-time capabilities are likely thanks to Nvidia's latest inference chips, which Murati was sure to call out before ending the presentation. Regardless, OpenAI reaffirmed its position as the leader in AI innovation with Monday's demo. Now we wait to see whether the presentation gave an accurate depiction of what this thing can do, or whether it was carefully stage-managed to avoid obvious flaws.