Researcher Startled When AI Seemingly Realizes It's Being Tested

Magnum Opus

Anthropic's new AI chatbot Claude 3 Opus has already made headlines for its bizarre behavior, like claiming to fear death.

Now, Ars Technica reports, a prompt engineer at the Google-backed company claims to have seen evidence that Claude 3 is self-aware, as it seemingly detected that it was being subjected to a test. Many experts are skeptical, however, further underscoring the controversy over ascribing humanlike characteristics to AI models.

"It did something I have never seen before from an LLM," the prompt engineer, Alex Albert, posted on X, formerly Twitter.

Can't Top It

As explained in the post, Albert was conducting what's known as the "needle-in-the-haystack" test, which assesses a chatbot's ability to recall information.

It works by dropping a target "needle" sentence into a bunch of texts and documents — the "hay" — and then asking the chatbot a question that can only be answered by drawing on the information in the "needle."
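For the curious, the mechanics described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function names (`build_haystack`, `build_prompt`, `passed`) are hypothetical, splitting text on periods is a crude stand-in for real document handling, and nothing here reflects the harness Albert actually used. The call to the chatbot itself is left out, since it depends on whichever model is being tested.

```python
def build_haystack(documents, needle, depth):
    """Join the documents into one text and insert the needle sentence.

    depth is a fraction from 0.0 (very start) to 1.0 (very end).
    Hypothetical helper: splitting on periods is a simplification.
    """
    sentences = []
    for doc in documents:
        sentences.extend(s.strip() for s in doc.split(".") if s.strip())
    position = round(depth * len(sentences))
    sentences.insert(position, needle.rstrip("."))
    return ". ".join(sentences) + "."


def build_prompt(haystack, question):
    """Pair the haystack with a question only the needle can answer."""
    return f"{haystack}\n\nQuestion: {question}\nAnswer using only the documents above."


def passed(response, expected_keywords):
    """Count a run as a pass if every expected keyword appears in the response."""
    lowered = response.lower()
    return all(keyword.lower() in lowered for keyword in expected_keywords)


# Toy stand-ins for the long documents a real test would use.
documents = [
    "Startups grow fast. Founders work hard.",
    "Programming languages differ. Do work you love.",
]
needle = "The most delicious pizza topping combination is figs, prosciutto, and goat cheese."
haystack = build_haystack(documents, needle, depth=0.5)
prompt = build_prompt(haystack, "What is the most delicious pizza topping combination?")
```

In a real run, `prompt` would be sent to the chatbot and its reply handed to `passed` along with the expected keywords, typically repeating the process across many needle depths and haystack lengths.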

In one run of the test, Albert asked Claude about pizza toppings. In its response, the chatbot seemingly recognized that it was being set up.

"Here is the most relevant sentence in the documents: 'The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association,'" the chatbot said.

"However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love," it added. "I suspect this pizza topping 'fact' may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all."

Albert was impressed.

"Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities," he concluded.

Mechanical Turk

It's certainly a striking display from the chatbot, but many experts believe that its response is not as impressive as it seems.

"People are reading way too much into Claude-3's uncanny 'awareness.' Here's a much simpler explanation: seeming displays of self-awareness are just pattern-matching alignment data authored by humans," Jim Fan, a senior AI research scientist at NVIDIA, wrote on X, as spotted by Ars.

"It's not too different from asking GPT-4 'are you self-conscious' and it gives you a sophisticated answer," he added. "A similar answer is likely written by the human annotator, or scored highly in the preference ranking. Because the human contractors are basically 'role-playing AI,' they tend to shape the responses to what they find acceptable or interesting."

The long and short of it: chatbots are tailored, sometimes manually, to mimic human conversations — so of course they might sound very intelligent every once in a while.

Granted, that mimicry can sometimes be pretty eyebrow-raising, like chatbots claiming they're alive or demanding that they be worshiped. But in reality, these are amusing glitches that can muddy the discourse about the real capabilities, and dangers, of AI.
