Google DeepMind launches new framework to assess the dangers of AI models

The Scoop

Preparing for a time when artificial intelligence is so powerful that it can pose a serious, immediate threat to people, Google DeepMind on Friday released a framework for peering inside AI models to determine if they’re approaching dangerous capabilities.

The paper released Friday describes a process in which DeepMind’s models will be reevaluated every time the compute used to train them increases six-fold, or after every three months of fine-tuning. Early warning evaluations are meant to cover the gaps between those checkpoints.
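To make that cadence concrete, here is a minimal sketch of the trigger logic in Python. The six-fold compute threshold and three-month window are the ones described in the paper; the function and variable names, and the example figures, are hypothetical and not part of DeepMind’s framework.

```python
# Illustrative sketch only: checks whether a model is due for reevaluation under
# the cadence described in the paper (a six-fold increase in training compute,
# or roughly three months of fine-tuning). All names here are hypothetical.
from datetime import datetime, timedelta

COMPUTE_GROWTH_TRIGGER = 6              # six-fold increase in training compute
FINE_TUNE_WINDOW = timedelta(days=90)   # roughly three months of fine-tuning


def needs_reevaluation(current_train_compute: float,
                       compute_at_last_eval: float,
                       last_eval_date: datetime,
                       now: datetime) -> bool:
    """Return True if either reevaluation trigger has fired."""
    compute_trigger = current_train_compute >= COMPUTE_GROWTH_TRIGGER * compute_at_last_eval
    time_trigger = (now - last_eval_date) >= FINE_TUNE_WINDOW
    return compute_trigger or time_trigger


# Example: training compute has grown eight-fold since the last evaluation.
print(needs_reevaluation(8e24, 1e24, datetime(2024, 3, 1), datetime(2024, 5, 17)))  # True
```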

DeepMind will work with other companies, academia and lawmakers to improve the framework, according to a statement shared exclusively with Semafor. It plans to start implementing its auditing tools by 2025.

Today, evaluating powerful frontier AI models is a largely ad hoc process that is constantly evolving as researchers develop new techniques. “Red teams” spend weeks or months testing them by trying out different prompts that might bypass safeguards. Then companies apply various techniques, from reinforcement learning to special prompts, to corral the models into compliance.

That approach works for today’s models because they aren’t powerful enough to pose much of a threat, but researchers believe a more robust process will be needed as models gain capabilities. Critics worry that, by the time people realize the technology has gone too far, it will be too late.

The Frontier Safety Framework released by DeepMind looks to address that issue. It’s one of several methods announced by major tech companies, including Meta, OpenAI, and Microsoft, to mitigate concerns about AI.

“Even though these risks are beyond the reach of present-day models, we hope that implementing and improving the framework will help us prepare to address them,” the company said.

Know More

DeepMind has been working on “early warning” systems for AI models for over a year. And it’s published papers on new methods to evaluate models that go far beyond the ones used by most companies today.

The Frontier Safety Framework incorporates those advances into a succinct set of protocols, including ongoing evaluation of models and mitigation steps researchers should take if they discover what it calls “critical capability levels.” That could be a model able to complete long-term, complex tasks, a capability DeepMind terms “exceptional agency,” or one able to write sophisticated malware.

DeepMind has set specific critical capability levels for four domains: autonomy, biosecurity, cybersecurity, and machine learning research and development.
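As a loose illustration of how those domains and thresholds might be organized, the sketch below maps each domain to example capability levels. The four domain names, “exceptional agency,” and the malware example come from the framework as reported above; the data structure, the mapping of examples to domains, and the placeholder entries are assumptions for illustration only.

```python
# Hypothetical sketch: only the four domain names and the two example capabilities
# mentioned above come from the framework; the mapping and placeholders are assumed.
CRITICAL_CAPABILITY_LEVELS = {
    "autonomy": ["exceptional agency: completes long-term, complex tasks"],
    "biosecurity": ["<threshold defined by the framework>"],
    "cybersecurity": ["writes sophisticated malware"],
    "machine_learning_r_and_d": ["<threshold defined by the framework>"],
}


def reached_levels(observed: set[str]) -> dict[str, list[str]]:
    """Return, per domain, any critical capability levels an evaluation has flagged."""
    return {
        domain: [level for level in levels if level in observed]
        for domain, levels in CRITICAL_CAPABILITY_LEVELS.items()
    }
```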

“Striking the optimal balance between mitigating risks and fostering access and innovation is paramount to the responsible development of AI,” the company said.

DeepMind will discuss the framework at an AI safety summit in Seoul next week, where other industry leaders will be in attendance.

Reed’s view

It’s encouraging that AI researchers at Google DeepMind are making progress on more scientific methods to determine what’s happening inside AI models, though they still have a way to go.

It’s also helpful for AI safety to see that, as researchers make breakthroughs on capabilities, they are also improving their ability to understand and ultimately control this software.

However, the paper released today is light on technical details about how these evaluations will be conducted. I’ve read several research papers DeepMind has put out on this subject and it appears that AI models will be used to evaluate emerging models.

I hope to learn more about how, exactly, this will be done and will write more about that in the future. For now, I think it’s fair to say we don’t really know if the technology is currently there to make this framework a success.

There’s an interesting regulatory component to this, as well. A new comprehensive AI bill in California, sponsored by state Sen. Scott Wiener, would require AI companies to evaluate the dangers of models before they are trained. This framework is the first I’ve seen that might make compliance with that law possible. But again, it’s not clear whether it’s technically possible today.

Building these techniques could be useful in another way: helping companies forecast how AI model capabilities will change in the coming months and years. That knowledge could help product teams design new offerings more quickly, giving an advantage to Google and other companies with the ability to do these evaluations.

Room for Disagreement

Critics of AI, such as Eliezer Yudkowsky, have expressed skepticism that humans will be able to determine whether an AI model has gained “superintelligence” fast enough to make a difference.

Their argument is that the very nature of the technology means that it will be able to outsmart anything humans can come up with.
