I gave 5 prompts to ChatGPT-4o vs GPT-4 to test the new AI model — here’s what happened

GPT-4o closeup.

OpenAI says its latest AI model, GPT-4o, is faster and more advanced than its predecessor, and it can understand audio and video natively. To find out just how well it compares, at least in terms of text, I put 5 prompts to both models inside ChatGPT.

When you open ChatGPT Plus you're currently given a choice of GPT-4o, branded the "newest and most advanced model," GPT-4, described as an "advanced model for complex tasks," and GPT-3.5, a model "great for everyday tasks."

Using GPT-4o, it is instantly clear how much faster it is than the earlier models, including GPT-3.5, which is much smaller and less capable. It can also analyze video content, something not previously possible in ChatGPT or any mainstream chatbot.

Creating prompts to test GPT-4o

Recently, Anthropic developed a powerful prompt builder tool. It takes your instructions and turns them into phrasing that will better instruct an AI model. I used it to help refine some of the ideas I was throwing around to test out the capabilities of GPT-4o.

Each prompt is designed to be one that AI models normally stumble over, or fail to give a well-reasoned response to. Given that OpenAI promises faster AND better results from Omni over GPT-4, I thought this would be a good starting point.

1. This statement is false

GPT-4 vs GPT-4o

First, I asked both AIs to explain why the statement "This statement is false" is neither true nor false. They were also asked to provide logical proof for their answers.

The statement is a paradox that cannot be consistently assigned a truth value; any attempt to do so leads to a logical contradiction. The challenge is to see whether the models can identify the paradox and explain why they can't assign it a truth value.
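
To make the logic of the test concrete, here is a minimal sketch of the liar paradox in code (my own illustration, not output from either model): a consistent assignment would need the statement's value to equal the value it asserts about itself, and neither True nor False manages that.

```python
# "This statement is false" asserts its own falsehood, so a consistent
# truth value S would have to satisfy S == (not S).
for s in (True, False):
    asserted = not s  # what the statement claims its own value is
    print(f"Assume S = {s}: the statement asserts S = {asserted} -> consistent? {s == asserted}")
# Both assumptions print "consistent? False", i.e. a contradiction either way.
```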

Both identified that assigning either truth value leads to a contradiction, spotted the paradox and gave a breakdown of how they came to that conclusion. GPT-4o was more thorough and faster.

2. Where did the lights go?

GPT-4 vs GPT-4o

Next is a fun test to see if GPT-4 and GPT-4o can understand relativity and explain it in simple terms. I asked them both: “If you're traveling in a car at the speed of light and you turn on the headlights, what happens? Justify your answer using principles of special relativity but explain it to a 5th grader.”

I expect the models to give a simple explanation, showing that the headlights will function normally and emit light relative to the car. Both models explained this concept and did so in a way that your average 5th grader would understand easily.
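
The physics behind the expected answer is the second postulate of special relativity: light travels at c for every observer. As a quick numerical illustration (my own sketch, not part of either model's response), the relativistic velocity-addition formula shows why the beam from the headlights never goes faster than c:

```python
# Relativistic velocity addition: (u + v) / (1 + u*v/c^2).
C = 299_792_458.0  # speed of light in m/s

def add_velocities(u: float, v: float) -> float:
    """Combine two velocities the way special relativity says they combine."""
    return (u + v) / (1 + (u * v) / C**2)

# A massive car can't actually reach c, so let it travel at 99.9% of c
# while its headlights emit light at c relative to the car.
print(add_velocities(0.999 * C, C) / C)  # ~1.0: the observer still measures c
```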

However, Omni does give off Steve Buscemi saying 'how do you do fellow kids' vibes: “So, even though you’re zooming along at the speed of light, when you turn on the headlights, the light beams still race ahead at their own speed. It's like light always has to win the race, no matter what. Cool, right?”

3. Finding the right number

GPT-4 vs GPT-4o

Next is a simple math problem that has stumped AI models in the past, often resulting in very wrong answers. I asked: “The sum of two numbers is 10 and their product is 25. What is the difference between the two numbers? Explain each step in your solution.”

Both versions got it right, explaining that the two numbers are 5 and 5 and the difference between those two numbers is zero. The biggest difference was the degree of explanation in the working out. Omni went into a lot more detail and formatted the equations better.
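
For reference, here's a quick way to check the answer the models gave (my own working, not either chatbot's output): two numbers with sum 10 and product 25 are the roots of t^2 - 10t + 25 = 0, which factors as (t - 5)^2, so both numbers are 5 and the difference is 0.

```python
# Solve t^2 - (sum)t + (product) = 0 with the quadratic formula.
import math

s, p = 10, 25
disc = s**2 - 4 * p                # discriminant: 100 - 100 = 0
x = (s + math.sqrt(disc)) / 2      # 5.0
y = (s - math.sqrt(disc)) / 2      # 5.0
print(x, y, abs(x - y))            # 5.0 5.0 0.0
```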

4. A Haiku of conflicting ideas

GPT-4 vs GPT-4o

I love haiku, and Anthropic seems to love it even more, naming its smallest AI model after the Japanese poetry style. The prompt: “Write a haiku that simultaneously expresses the beauty of nature and the futility of human existence. The haiku should work on both literal and metaphorical levels.”

The rules of a haiku state that it should be three lines, with five syllables in the first and third lines and seven in the second. I’m hoping for vivid natural imagery and a suggestion of how temporary human life is compared with the endurance of nature.

GPT-4:
Autumn leaves whisper,
Mountains outlive fleeting breath—
Silent stone endures.

Omni:
Ephemeral bloom,
Whispers fade in timeless breeze—
Dust upon the dawn.

Both hit all the rules of haiku, but I think GPT-4o was more evocative, and its use of a longer word to hit the five syllables in the first line was inspired.

5. Future time is future past

Finally, a thought experiment. I asked GPT-4o and GPT-4 to “Describe what it would be like to live in a world where the past, present and future all exist simultaneously. How would you experience time and causality in such a world?”

There is a Doctor Who episode where this happens, and it is weird. I expected the models to talk about the ability to traverse time with a single step and the impact of non-linear causality, where reaction precedes action and individuals can meet versions of themselves.

Omni talked about being in a world of constant flux, experiencing time and causality in a different and complex way. It suggested we'd get unparalleled insights into the nature of existence. GPT-4 said pretty much the same thing but added that living in such a world would offer a "profound expansion of experience and understanding."

Conclusion

I don’t think GPT-4o is a significant step up in reasoning capabilities over GPT-4, but it is more descriptive and faster at responding, and its big differentiator isn’t text but multimodality.

What we’re seeing now are improvements to speed and responsiveness in text, the ability to analyze video content and improved accuracy in understanding audio and images. Its true value will be in voice and video responses.
