OpenAI's new GPT-4o went viral with a video demoing a 'seeing' AI assisting its 'blind' counterpart, a spectacle that truly needs to be seen to be believed

ChatGPT and Microsoft logo.

What you need to know

  • OpenAI just released its new flagship GPT-4o model.

  • It can reason across audio, vision, and text in real time, making interactions with ChatGPT more seamless.

  • OpenAI also unveiled a native ChatGPT app for Mac, snubbing Windows.

  • A viral demo showcases GPT-4o using its audio and visual capabilities to hold a conversation with another AI model.


OpenAI just unveiled its new flagship GPT-4o model (I know I'm not the only one getting confused by these models as they continue to ship). Essentially, GPT-4o is an improved version of OpenAI's GPT-4 and is just as smart. The model is more intuitive and can reason across audio, vision, and text in real time, making interactions with ChatGPT more seamless.
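For developers, the same multimodal reasoning is exposed through OpenAI's API under the model name "gpt-4o". Below is a minimal sketch of sending text plus an image to the model with the OpenAI Python SDK; the image URL is a placeholder for illustration, and the real-time voice features shown in the demos weren't part of the public API at launch.

```python
# Minimal sketch: sending text plus an image to GPT-4o via the OpenAI Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                # Placeholder image URL for illustration.
                {"type": "image_url", "image_url": {"url": "https://example.com/room.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```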

The "magic" behind OpenAI's just-concluded Spring Update event remains debatable, but the demos emerging on social media are pretty impressive, bordering on mind-blowing. Translating the Italian language to English and relaying the information in real-time is quite something, potentially keeping impediments to communication, like language barriers, at bay.

But what left me perplexed was a video demo shared by OpenAI President and Co-founder Greg Brockman on X (formerly Twitter). I never thought we'd one day reach a point where a virtual assistant could hold a full conversation with another AI assistant with minimal complications.

The demo starts with the user explaining to two AI chatbots that they'll essentially be talking to each other. The user walks the chatbots through his expectations, explaining that one of them can see the world via a camera, while the other can ask it questions or even direct it to perform specific tasks with the user's assistance.

"Well, Well, Well, just when I thought things couldn't get any more interesting," jokingly replied the first chatbot. Talking to another AI that can see the world, this sounds like a plot twist in the AI universe." Just before the AI assistant could agree to the terms, the user asked it to pause for a bit while he gave instructions to the second AI.

Right off the bat, the user starts talking to the second AI assistant, telling it that it will be able to see the world. I assume this is a subtle prompt asking the assistant to access the phone's camera, which it will use as its eyes. Instantly, the interface switches to the camera (in selfie mode), and the assistant paints a crystal-clear picture of what the user is wearing and his surroundings.

From this point, the user explains that the first AI model will talk to it and ask questions, including requests to move the camera and describe what it sees. The second assistant is expected to be helpful and answer the questions accurately.

OpenAI and ChatGPT

The process begins with the AI that can "see the world" describing what's in its view, including the user, his outfit, and the design of the room. Interestingly, it almost feels like two humans conversing on FaceTime, as the first AI gives feedback based on the information shared. Additionally, the AI seems to have a firm grasp of what the user is doing, his expression, and even his style based on what he's wearing.

What blew my mind was when the user signaled another person in the room to come closer and appear in the AI's view. The AI instantly picked up on this and even indicated that the user "might be getting ready for a presentation or conversation" based on his direct engagement with the camera.

Interestingly, introducing a third party didn't derail the conversation between the two AIs. At first glance, you might assume the AI didn't catch a glimpse of the person entering the room and standing behind the user holding the phone.

However, this isn't the case. The user briefly paused the conversation between the two AIs to ask if anything unusual had occurred. The AI with visual capabilities pointed out that a second person had come into view, playfully made bunny ears behind the first person, and then quickly left the frame. The AI described the moment as light-hearted and unexpected.

The demo continues to showcase GPT-4o's vast capabilities. The user even requests that both models create a song based on the events that just transpired and sing it by alternating lines. At times, the user comes across like a choirmaster prepping his choir for a big church performance.

I should also point out that most of the demos I've seen run on Apple devices like the iPhone and MacBook. Perhaps this is why OpenAI released a native ChatGPT app for Mac users before shipping one to Windows. After all, OpenAI CEO Sam Altman has admitted that "the iPhone is the greatest piece of technology humanity has ever made."