The risks of expanding the definition of ‘AI safety’

The Scene

The concept of “AI safety” has been expanding to include everything from the threat of human extinction to algorithmic bias to concerns that AI image generators aren’t diverse.

Researcher Eliezer Yudkowsky, who’s been warning about the risks of AI for more than a decade on podcasts and elsewhere, believes that lumping all of those concerns into one bucket is a bad idea. “You want different names for the project of ‘having AIs not kill everyone’ and ‘have AIs used by banks make fair loans,’” he said. “Broadening definitions is usually foolish, because it is usually wiser to think about different problems differently, and AI extinction risk and AI bias risk are different risks.”

Yudkowsky, an influential but controversial figure for his alarmist views on AI (he wrote in Time about the potential necessity of ordering airstrikes on AI data centers), isn’t alone in his views. Others in the AI industry worry that “safety” in AI, which has come to underpin guardrails that companies are implementing, may become politicized as it grows to include hot-button social issues like bias and diversity. That could erode its meaning and power, even as it receives huge public and private investment and unprecedented attention.

“It’s better to just be more precise about the concerns that you have,” said Anthony Aguirre, executive director of the Future of Life Institute, which has long focused on existential risks posed by AI. “If you’re talking about deepfakes, talk about deepfakes. And if you’re talking about the risk of open-source foundation models, talk about that. My guess is we’re maybe at maximum safety coverage before we start using other adjectives.”

What, exactly, should be included under the AI safety umbrella was the subject of a panel discussion Tuesday at a meeting of the National Artificial Intelligence Advisory Committee, which is made up of business executives, academics and others who advise the White House.

One attendee told Semafor that the meeting reinforced the growing emphasis in the industry on making sure the term encompasses a broader array of harms than just physical threats.

But the attendee said one worry with the expanded definition is that it lumps inherently political concepts like content moderation in with non-political issues like mitigating the risk of bioweapons. The risk is that AI safety becomes synonymous with what conservatives view as “woke AI.”

For most of the past decade, the term “AI safety” was used colloquially by a small group of people concerned about the largely theoretical, catastrophic risks of artificial intelligence. And until recently, people working in other fields focused on issues like algorithmic bias and surveillance viewed themselves as working in entirely separate fields.

Now, as generative AI products like ChatGPT have put the technology at the forefront of every industry, AI safety is becoming an umbrella term that lumps nearly every potential downside of software automation into a single linguistic bucket. And decisions like which ethnicities AI image generators should depict are considered part of AI safety.

The newly created government agency, the AI Safety Institute, for example, includes in its mandate everything from nuclear weapons to privacy to workforce skills. And Google recently folded 90% of an AI ethics group, called the Responsible Innovation Team, into its Trust and Safety team, a company spokesman told Wired.

Mike Solana, a vice president at Founders Fund and author of the newsletter Pirate Wires, is a frequent critic of content moderation policies at big tech companies.

“The purposeful confusion of terms here is totally absurd,” he said. “Obviously there is a difference between mitigating existential risk for human civilization and the question of whether or not there are enough Muslim women in headscarves turning up when you try to generate a picture of an auto mechanic. But that confusion of terms benefits people obsessed with the DEI stuff, and they’re very good at navigating bureaucracies, so here we are.”

Know More

Some leading figures in the AI world have advocated for a bigger AI safety tent. Alondra Nelson, who spearheaded the White House’s Blueprint for an AI Bill of Rights, argued in a July article: “Years of sociotechnical research show that advanced digital technologies, left unchecked, are used to pursue power and profit at the expense of human rights, social justice, and democracy. Making advanced AI safe means understanding and mitigating risks to those values, too.”

Advocates worry that incidents like Google’s Gemini model depicting America’s founding fathers as Black hurt the overall AI safety movement.

“Most people in the field of AI bias have been very scornful of extinction risk from AI,” Yudkowsky said. “My guess is that this faction is trying to broaden the definition of any word used to describe AI anti-extinction efforts, in an effort to kill anti-extinction efforts.”

Reed’s view

AI has always had a vocabulary problem, from the term itself to words like “neural networks,” borrowed somewhat dubiously from the language used to describe the human brain.

The mystique around AI has clouded our judgment about how to deal with what the technology, at its core, really is: automation.

One day, we might create software that is truly intelligent, but until that day comes, what we’ve got are automation tools. They have major implications for humanity, but that’s nothing new. Every time we automate any part of human labor, the ripple effects are felt in every corner of society.

In other parts of industry, we deal with safety very differently. The Occupational Safety and Health Administration, for instance, is tasked with making workplaces safe from physical harm.

Imagine if OSHA were also responsible for preventing workplace discrimination, retraining workers who are laid off, ensuring that all the products made at businesses are safe, and reducing carbon emissions at workplaces. That’s similar to what some people are suggesting we do with AI safety.

No one agency, organization, or profession has the bandwidth to focus on every potential impact of AI. So we should probably not lump all those impacts into one blanket term.

The people tasked with making sure we don’t accidentally trigger a nuclear holocaust or let criminals easily create bioweapons probably shouldn’t be the same ones making sure chatbot image generators reflect ethnic diversity.

Lumping those things together exposes them to toxic political discourse, reducing their chances for success.

Room for Disagreement

Professor Alondra Nelson of the Institute for Advanced Study said the term AI safety has always included a hodgepodge of issues and that, in reality, some people want to make it narrower.

“The crucial issue of safe algorithmic systems was first scoped by academic researchers and, building from this, by trust and safety teams at tech companies,” she said. “This foundational AI safety work always included a broad definition of safety. The U.K. AI Safety Institute, announced last fall, attempted to constrict these existing efforts by narrowing them to technical issues concerning future advanced AI systems. But the Vice President’s speech on that same weekend made clear that the U.S. approach to AI safety included this broader definition—from mental health issues involving our youth, to consumer fraud issues impacting seniors, to algorithmic surveillance of workers. These issues all have some bipartisan support. Given this history and stated U.S. policy, broadening out from the narrow U.K. conception shows consistency.”

Notable

  • The trouble with trying to content-moderate LLM chatbots is that prompt engineers continuously figure out ingenious ways to break those protections, so chatbots end up saying things they’re not supposed to say. Research conducted by Scale AI and The Center for AI Safety showed that a better method might be “unlearning,” Time reports.