New Anthropic Research Sheds Light on AI's 'Black Box'

Photo: Andrej Sokolow/picture alliance (Getty Images)

Despite being created by humans, large language models are still quite mysterious. The high-octane algorithms powering the current artificial intelligence boom have a way of doing things that aren't outwardly explicable to the people observing them. This is why AI has often been dubbed a "black box": a system whose inner workings aren't easily understood from the outside.

Newly published research from Anthropic, one of the top companies in the AI industry, attempts to shed some light on the more confounding aspects of AI’s algorithmic behavior. On Tuesday, Anthropic published a research paper designed to explain why its AI chatbot, Claude, chooses to generate content about certain subjects over others.

AI systems are set up as rough approximations of the human brain: layered neural networks that take in and process information, then make "decisions" or predictions based on it. Such systems are "trained" on large sets of data, which allows them to form algorithmic connections. When an AI system produces an output based on that training, however, human observers don't always know how the algorithm arrived at it.
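For readers who want a concrete picture, here is a minimal sketch of that layered idea, not Claude's actual architecture. Every name and number below is invented for illustration; a real model learns its weights from training data rather than drawing them at random.

```python
# Toy illustration of a layered neural network: each layer takes in numbers,
# transforms them, and passes them on until a final layer produces a prediction.
# The random weights here are stand-ins for values a real model learns in training.
import numpy as np

rng = np.random.default_rng(0)

def layer(x, weights, bias):
    """One layer: a weighted combination of inputs followed by a nonlinearity."""
    return np.maximum(0.0, x @ weights + bias)  # ReLU activation

x = rng.normal(size=4)                               # a toy "input" (e.g., encoded text)
h = layer(x, rng.normal(size=(4, 8)), np.zeros(8))   # hidden-layer activations ("neurons")
logits = h @ rng.normal(size=(8, 3))                 # final layer: scores over 3 possible outputs
prediction = np.argmax(logits)                       # the model's "decision"
print(prediction)
```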

This mystery has given rise to the field of AI "interpretability," in which researchers attempt to trace the path of the machine's decision-making so they can understand its output. In interpretability research, a "feature" refers to a pattern of activated "neurons" within a neural net, effectively a concept that the algorithm may refer back to. The more features within a neural net that researchers can understand, the better they can understand how certain inputs lead the net to produce certain outputs.
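A toy example of the "feature" idea: if a feature corresponds to a particular pattern of neuron activations, you can score how strongly any input expresses it by comparing that input's activations against the pattern. The numbers below are made up for illustration; they are not taken from Anthropic's paper.

```python
# Score how strongly a set of neuron activations matches a hypothetical feature's
# pattern of co-firing neurons. High score: the feature is "active" for that input.
import numpy as np

feature_pattern = np.array([0.9, 0.0, 0.8, 0.1])   # hypothetical co-firing pattern
activations_a = np.array([1.0, 0.1, 0.7, 0.0])     # activations for one input
activations_b = np.array([0.0, 1.2, 0.1, 0.9])     # activations for another input

def feature_score(activations, pattern):
    """How closely this input's neuron activations match the feature's pattern."""
    return float(activations @ pattern)

print(feature_score(activations_a, feature_pattern))  # high: feature is active
print(feature_score(activations_b, feature_pattern))  # low: feature is not
```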

In a memo on its findings, Anthropic's researchers explain how they used a process known as "dictionary learning" to decipher which parts of Claude's neural network map to specific concepts. Using this method, researchers say they were able to "begin to understand model behavior by seeing which features respond to a particular input, thus giving us insight into the model's 'reasoning' for how it arrived at a given response."
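To give a rough sense of what dictionary learning does, here is a highly simplified sketch: decompose a collection of activation vectors into a small set of sparsely used components ("features"), then check which components light up for a given input. This toy uses scikit-learn on random data as a stand-in; Anthropic's actual work operates on Claude's internal activations at a vastly larger scale and with different tooling.

```python
# Toy dictionary-learning sketch: learn sparse "feature" components from a matrix
# of activation vectors, then list which features respond most strongly to one input.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
activations = rng.normal(size=(200, 16))   # stand-in for neuron activations across many inputs

dict_learner = DictionaryLearning(n_components=8,
                                  transform_algorithm="lasso_lars",
                                  transform_alpha=0.5,
                                  random_state=0)
codes = dict_learner.fit_transform(activations)   # sparse feature coefficients per input

# For one input, rank the learned features by how strongly they respond:
one_input = codes[0]
top_features = np.argsort(-np.abs(one_input))[:3]
print("Most active features for this input:", top_features)
```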

In an interview with Anthropic's research team conducted by Wired's Steven Levy, staffers explained what it was like to decipher how Claude's "brain" works. Once they had figured out how to decode one feature, it led them to others:

One feature that stuck out to them was associated with the Golden Gate Bridge. They mapped out the set of neurons that, when fired together, indicated that Claude was “thinking” about the massive structure that links San Francisco to Marin County. What’s more, when similar sets of neurons fired, they evoked subjects that were Golden Gate Bridge-adjacent: Alcatraz, California Governor Gavin Newsom, and the Hitchcock movie Vertigo, which was set in San Francisco. All told the team identified millions of features—a sort of Rosetta Stone to decode Claude’s neural net.

It should be noted that Anthropic, like other for-profit companies, may have business-related motivations for writing and publishing its research the way it has. That said, the team's paper is public, which means you can read it for yourself and draw your own conclusions about its findings and methodology.
