OpenAI Unveils GPT-4 Omni’s Voice Capabilities and They’re Literally Unbelievable

Screenshot: OpenAI

OpenAI unveiled GPT-4 Omni (GPT-4o) during its Spring Update on Monday morning in San Francisco. Chief Technology Officer Mira Murati and OpenAI staff showcased their newest flagship model, capable of real-time verbal conversations with a friendly AI chatbot that convincingly speaks like a human.

“GPT-4o provides GPT-4 level intelligence but is much faster,” Murati said on stage. “We think GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural and far easier.”

GPT-4o responds instantaneously to verbal prompts in a friendly voice that sounds uncannily like Scarlett Johansson, who voiced the AI assistant in the feature film Her. Based on the demos, this technology essentially makes that movie a reality. GPT-4o's speech has emotional intonation, expressing excitement at times and laughing at others, and it can identify emotion and tone in users' speech as well. OpenAI staff showcased conversations with the AI chatbot with almost no lag, and the chatbot was even able to pivot quickly when interrupted.

While GPT-4o's audio abilities are impressive, Omni works across several mediums. Whereas ChatGPT previously processed text, vision, and audio through a network of separate AI models, GPT-4o is a single model capable of processing all three, which makes everything work much faster. You can show GPT-4o an image of a math problem with your phone camera while talking to the model verbally. OpenAI says its new flagship model operates at GPT-4 levels of intelligence while setting new high-water marks in multilingual, audio, and vision capabilities.

Beyond the jaw-dropping demo, OpenAI is releasing GPT-4o as a desktop application for macOS. Paid users get the macOS app today, and GPT-4o will be available to free users in the future. The desktop application will let you start voice conversations with ChatGPT directly from your computer and share your screen with minimal friction. The ChatGPT website is also getting a simplified refresh.

OpenAI staff members Mark Chen and Barret Zoph demoed how the real-time, multimodal AI model works on stage Monday. The real-time conversation mostly worked great, as Chen and Zoph interrupted the model to ask it to pivot its answers. GPT-4o told bedtime stories, helped with math problems, and more. At times, GPT-4 Omni struggled to understand the users' intentions, but the model was fairly graceful in navigating the slip-ups.

The voice model was capable of doing different voices when telling a story, laughing, and even saying “That’s so sweet of you” at one point. It’s clear the OpenAI team ensured that GPT-4o had more emotion and was more conversational than previous voice models. In demos, ChatGPT sounded more human than ever.

An OpenAI staff member confirmed in a tweet that the company has been testing GPT-4o on the LMSYS Org chatbot arena as “im-also-a-good-gpt2-chatbot.” As many suspected and Sam Altman teased, these were OpenAI models in the works. According to the staffer, the latest chatbot starkly outperformed the competition, including industry leaders GPT-4 Turbo and Claude 3 Opus, on several metrics.

The release of GPT-4o feels like a seminal moment for the future of AI chatbots. This technology pushes past much of the awkward latency that plagued early chatbots. It's easy to imagine a version of Siri that is genuinely useful with GPT-4o underneath. These real-time capabilities are likely thanks to Nvidia's latest inference chips, which Murati was sure to call out before ending the presentation. Regardless, OpenAI reaffirmed its position as the leader in AI innovation with Monday's demo. Now we wait to see whether the presentation gave an accurate depiction of what this thing can do, or whether it was carefully stage-managed to avoid obvious flaws.