Google takes on GPT-4o with impressive Gemini voice and video prototype demo

Google Gemini AI app.

Yesterday, OpenAI launched a new multimodal GPT-4o model, conveniently timed right before Google's I/O 2024 event today.

GPT-4o looked incredibly impressive in demonstrations, with a more human-like ChatGPT voice and vision capabilities — but so does the new Gemini chatbot prototype Google just showed off in a tweet.

In Google's tweet, someone's pointing a phone at the I/O stage and chatting with Gemini. When asked what's happening in the camera view, Gemini responds, "It looks like people are setting up for a large event, perhaps a conference or presentation. Is there something particular that caught your eye?"

When prompted to analyze what the big I/O letters on the stage mean, Gemini says, "Those letters represent Google I/O." Should Gemini have recognized the I/O event after only one question? Perhaps, but more on that later.

When asked what it would be "really excited to hear" about at Google I/O, Gemini said it was "always excited to learn about new advancements in artificial intelligence and how they can help people in their daily lives."

Is Google's new Gemini prototype better than its predecessor? Absolutely. Gemini has a fluid, human-like speaking tone, with inflections where you'd expect for emphasis, and it was able to analyze the surroundings to recognize the I/O presentation setup.

But is it better than OpenAI's new GPT-4o chatbot? That's the burning question.

How does GPT-4o compare to Google's new Gemini prototype?

Because Google hasn't officially launched the new Gemini (the chat shown in the tweet was with a prototype model), it's too early to say definitively which model is better, but there are a few key differences between the two demos.

Both OpenAI's GPT-4o chatbot and Google's Gemini chatbot boast human-like speaking qualities, and they're both able to analyze visual input to answer a related question.

That said, I think Gemini's human-like voice sounds more natural. When listening to GPT-4o speak in the OpenAI live demo, there were a few times the chatbot sounded overly enthusiastic or stuttered slightly. Gemini, on the other hand, sounded fluid, though it clearly still has room for improvement, like saying "I'm" instead of "I am" to sound more conversational.

Another big difference is the ability to interrupt the chatbot while it's speaking. OpenAI designed the GPT-4o model to handle interruptions, so you don't have to wait through a long response before asking another question.

In Google's tweet, the speaker waited until the prototype Gemini model finished talking before responding with another question. To be fair, a listening-for-voice-input icon didn't pop up on the screen, so I don't know if the speaker could have interrupted Gemini and simply chose not to. But that seems like a feature you'd want to show off, no?

I'm also curious to know if GPT-4o could have described the Google I/O event after only one question. Gemini first recognized that the phone's camera was pointed at some type of event setup, and only mentioned the I/O event when asked about the I/O letters on stage.

After watching the GPT-4o live demo, I'm inclined to say OpenAI's chatbot would have said it was looking at the setup for the Google I/O 2024 event with only one question. Of course, that's pure speculation, but GPT-4o seems more well-rounded than Google's new Gemini.

My opinion may change after seeing more demos of Gemini over the course of Google I/O, but as of right now, GPT-4o seems like the chatbot to beat.

Check out our I/O live coverage throughout the day if you don't have time to catch the event.