Researchers Chart Alarming Decline in ChatGPT Response Quality


In recent months there has been a groundswell of anecdotal evidence and general murmurings about a decline in the quality of ChatGPT responses. A team of researchers from Stanford and UC Berkeley set out to determine whether the degradation was real and to devise metrics to quantify its scale. To cut a long story short, the dive in ChatGPT quality certainly wasn't imagined.

Three distinguished academics, Matei Zaharia, Lingjiao Chen, and James Zou, are behind the recently published research paper How Is ChatGPT's Behavior Changing Over Time? (PDF). Earlier today, Zaharia, a Computer Science Professor at UC Berkeley, took to Twitter to share the findings. He startlingly highlighted that "GPT-4's success rate on 'is this number prime? think step by step' fell from 97.6% to 2.4% from March to June."
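For those curious how such a probe works in practice, below is a minimal sketch of sending a prime-check prompt to the two GPT-4 snapshots OpenAI exposed through its API in 2023, gpt-4-0314 (March) and gpt-4-0613 (June). It uses the pre-v1.0 openai Python package; the query_model helper, the exact prompt wording, and the test number are illustrative choices, not lifted from the paper.

```python
# Minimal sketch (not the paper's harness): probe the March and June 2023
# GPT-4 snapshots with a chain-of-thought primality question.
import openai  # pre-v1.0 openai package

openai.api_key = "YOUR_API_KEY"  # placeholder

def query_model(model: str, prompt: str) -> str:
    """Send a single-turn chat prompt to a model snapshot and return the reply text."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep output as deterministic as possible for comparison
    )
    return response["choices"][0]["message"]["content"]

# 10007 is prime; the wording below is illustrative, not the paper's exact prompt.
PROMPT = "Is 10007 a prime number? Think step by step and then answer [Yes] or [No]."

for model in ("gpt-4-0314", "gpt-4-0613"):  # March vs June snapshots
    print(f"--- {model} ---")
    print(query_model(model, PROMPT))
```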

GPT-4 became generally available about two weeks ago and was championed by OpenAI as its most advanced and capable model. It was quickly released to paying API developers, with OpenAI claiming it could power a range of innovative new AI products. It is therefore both sad and surprising that the new study finds it wanting when faced with some pretty straightforward queries.

We have already given an example of GPT-4's remarkable failure rate on the prime number queries above. The research team designed tasks to probe ChatGPT's underlying large language models (LLMs), GPT-4 and GPT-3.5. The tasks fall into four categories, covering a diverse range of AI skills while remaining relatively simple to evaluate:

  • Solving math problems

  • Answering sensitive questions

  • Code generation

  • Visual reasoning

An overview of the performance of the OpenAI LLMs is provided in the chart below. The researchers compared the March 2023 and June 2023 releases of both GPT-4 and GPT-3.5.

ChatGPT quality research
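As a simplified illustration of how such a snapshot-to-snapshot comparison can be quantified, the sketch below scores a small batch of primality questions for each GPT-4 snapshot, using sympy as the ground truth. It reuses the hypothetical query_model helper from the earlier snippet and is only a toy stand-in for the researchers' actual evaluation harness.

```python
# Toy accuracy comparison across snapshots (not the paper's methodology).
# Assumes the query_model helper defined in the earlier sketch.
from sympy import isprime

NUMBERS = [10007, 10008, 19997, 20000, 104729]  # arbitrary test values

def grade(model: str) -> float:
    """Return the fraction of primality questions the model answers correctly."""
    correct = 0
    for n in NUMBERS:
        reply = query_model(
            model,
            f"Is {n} a prime number? Think step by step and then answer [Yes] or [No].",
        )
        said_yes = "[yes]" in reply.lower()
        correct += said_yes == isprime(n)
    return correct / len(NUMBERS)

for model in ("gpt-4-0314", "gpt-4-0613"):
    print(f"{model}: {grade(model):.0%} correct on this tiny sample")
```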

The chart clearly illustrates that the "same" LLM service answers queries quite differently over time, with significant differences emerging over this relatively short period. It remains unclear how these LLMs are updated, and whether changes intended to improve some aspects of their performance can negatively impact others. Note how much 'worse' the newest version of GPT-4 is than the March version in three of the four testing categories; it ekes out only a narrow win in visual reasoning.

ChatGPT quality research

Some may be unbothered by the variable quality observed in the 'same' versions of these LLMs. However, the researchers note, "Due to the popularity of ChatGPT, both GPT-4 and GPT-3.5 have been widely adopted by individual users and a number of businesses." Therefore, it isn't beyond the bounds of possibility that some GPT-generated information could affect your life.

The researchers have voiced their intent to continue assessing GPT versions in a longer-term study. Perhaps OpenAI should monitor and publish its own regular quality checks for its paying customers. If it won't be more transparent about this, it may fall to business or governmental organizations to keep a check on some basic quality metrics for these LLMs, which can have significant commercial and research impacts.


AI and LLM tech is no stranger to surprising issues, and with the data-pilfering claims and other PR quagmires facing the industry, it currently looks like the latest 'wild west' frontier of connected life and commerce.