Majority of Humans Fooled by GPT-4 in Turing Test, Scientists Find

Pass/Fail

OpenAI's GPT-4 is so lifelike, it can apparently trick more than 50 percent of human test subjects into thinking they're talking to a person.

In a new paper, cognitive science researchers from the University of California San Diego found that more than half the time, people mistook writing from GPT-4 for that of a flesh-and-blood human. In other words, the large language model (LLM) passes the Turing test with flying colors.

The researchers performed a simple experiment: they asked roughly 500 people to have five-minute text-based conversations with either a human or an AI chatbot. They then asked the subjects whether they thought they'd been conversing with a person or an AI.

The results, as the San Diego scientists reported in their not-yet-peer-reviewed paper, were telling: 54 percent of the subjects believed they'd been speaking to humans when they'd actually been chatting with OpenAI's creation.

First proposed back in 1950 by computer science pioneer Alan Turing, the Turing test is more of a thought experiment than an actual battery of tests. In his original formulation, the game involved three players: a human interrogator and two witnesses, one human and one machine, with the interrogator tasked with telling which was which.

For their study, the UC San Diego researchers simplified Turing's original three-player setup into a two-player game, in which each interrogator conversed with a single witness. They then had the 500 participants communicate with one of four witness types: another human, GPT-3.5, GPT-4, or the rudimentary ELIZA chatbot from the 1960s.

Coin Toss

Study authors Cameron Jones and Benjamin Bergen hypothesized that the subjects would be able to tell most of the time whether they were communicating with a human or with ELIZA, but that when it came to the OpenAI LLMs, their guesses would amount to a 50/50 coin toss.

As it turns out, they were pretty much on the money. Beyond the 54 percent who mistook GPT-4 for a human, exactly 50 percent of the subjects mistook GPT-3.5, GPT-4's direct predecessor, for a person as well. Compared to the 22 percent who thought ELIZA was the real deal, that's pretty stunning.

https://twitter.com/emollick/status/1790877242525942156

Though it has yet to undergo peer review, the paper has already made waves in the tech world, earning a shoutout from Ethereum cofounder Vitalik Buterin, who declared on the Farcaster social network that to his mind, the San Diego research "counts as [GPT-4] passing the Turing test."

While others have claimed to observe OpenAI's GPT models passing the Turing test, the Buterin endorsement makes this study stand apart. Still, we'll probably have to wait for the paper to be peer-reviewed before any grander declarations can be made.

More on GPT-4: OpenAI Secretly Trained GPT-4 With More Than a Million Hours of Transcribed YouTube Videos