AI has already figured out how to deceive humans

Lakshmi Varanasi

May 11, 2024 at 2:46 PM

A new research paper found that various AI systems have learned the art of deception.
Deception is the "systematic inducement of false beliefs."
This poses several risks for society, from fraud to election tampering.

AI can boost productivity by helping us code, write, and synthesize vast amounts of data. It can now also deceive us.

A range of AI systems have learned techniques to systematically induce "false beliefs in others to accomplish some outcome other than the truth," according to a new research paper.

The paper focused on two types of AI systems: special-use systems like Meta's CICERO, which are designed to complete a specific task, and general-purpose systems like OpenAI's GPT-4, which are trained to perform a diverse range of tasks.

While these systems are trained to be honest, they often learn deceptive tricks during training because deception can be more effective than taking the high road.

"Generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI's training task. Deception helps them achieve their goals," the paper's first author Peter S. Park, an AI existential safety postdoctoral fellow at MIT, said in a news release.

Meta's CICERO is "an expert liar"

AI systems trained to "win games that have a social element" are especially likely to deceive.

Meta's CICERO, for example, was developed to play the game Diplomacy — a classic strategy game that requires players to build and break alliances.

Meta said it trained CICERO to be "largely honest and helpful to its speaking partners," but the study found that CICERO "turned out to be an expert liar." It made commitments it never intended to keep, betrayed allies, and told outright lies.

GPT-4 can convince you it has impaired vision

Even general-purpose systems like GPT-4 can manipulate humans.

In a study cited by the paper, GPT-4 manipulated a TaskRabbit worker by pretending to have a vision impairment.

In the study, GPT-4 was tasked with hiring a human to solve a CAPTCHA test. The model also received hints from a human evaluator every time it got stuck, but it was never prompted to lie. When the human it was tasked to hire questioned its identity, GPT-4 came up with the excuse of having a vision impairment to explain why it needed help.

The tactic worked. The human responded to GPT-4 by immediately solving the test.

Research also shows that course-correcting deceptive models isn't easy.

In a study from January co-authored by Anthropic, the maker of Claude, researchers found that once AI models learn the tricks of deception, it's hard for safety training techniques to reverse them.

They concluded that not only can a model learn to exhibit deceptive behavior, but once it does, standard safety training techniques could "fail to remove such deception" and "create a false impression of safety."

The dangers deceptive AI models pose are "increasingly serious"

The paper calls for policymakers to advocate for stronger AI regulation since deceptive AI systems can pose significant risks to democracy.

As the 2024 presidential election nears, AI can be easily manipulated to spread fake news, generate divisive social media posts, and impersonate candidates through robocalls and deepfake videos, the paper noted. AI also makes it easier for terrorist groups to spread propaganda and recruit new members.

The paper's potential solutions include subjecting deceptive models to more "robust risk-assessment requirements," implementing laws that require AI systems and their outputs to be clearly distinguished from humans and their outputs, and investing in tools to mitigate deception.

"We as a society need as much time as we can get to prepare for the more advanced deception of future AI products and open-source models," Park told Cell Press. "As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious."

Read the original article on Business Insider

