An AI worm has been developed to burrow its way into generative AI ecosystems, revealing sensitive data as it spreads

 Dune Awakening MMO.
Dune Awakening MMO.

There's always been something evocative and mildly terrifying about the term "computer worm". The image it conjures of a tunnelling, burrowing creature, spreading its way through your machine and feasting on its insides. Well, just to add a slightly sharper dose of existential dread to proceedings, researchers have developed an AI worm, bringing the term "artificial intelligence" along to the party just for good measure.

One particular worm has been developed by researchers Ben Nassi, Stav Cohen and Rob Bitton, and named Morris II as a reference to the notorious Morris computer worm that rampaged its way around the internet back in the heady computing days of 1988 (via Ars Technica). The AI worm was built with the express purpose of targeting generative AI powered applications, and has been demonstrated attacking an AI email assistant to steal data from messages and send out spam. Lovely.

The worm makes use of what's referred to as an "adversarial self-replicating prompt". A regular prompt triggers an AI model to output data, whereas an adversarial prompt triggers the model under attack to output a prompt of its own. These prompts can be in the form of images or text, that, when entered into a generative AI model, triggers it to output the input prompt.

These prompts can then be used to trigger vulnerable AI models to demonstrate malicious activity, like revealing confidential data, generating toxic content, distributing spam or otherwise, and also create outputs that allow the worm to exploit the generative AI ecosystem behind it to infect new "hosts".

The researchers were able to write an email including an adversarial text prompt, using it to poison the database of an AI email assistant. When the email was later retrieved by a connected retrieval augmented generation service—commonly used by LLMs to gather extra data—to be sent to an LLM, it then effectively "jailbreaks" the Gen-AI service, forcing it to replicate inputs to outputs and allowing the exfiltration of sensitive user data, before going on to infect new hosts.

A secondary method used an image with an embedded malicious prompt to force an AI email assistant to forward further images on to others, creating a self-replicating ouroboros-like nightmare of infected AI ecosystems as it went.

Well, I don't know about you, but I have a headache. Still, the researchers were keen to point out that their work is all about identifying vulnerabilities and "bad architecture design" in generative AI systems that allow these attacks to gain access and self-replicate so effectively.

Peak Storage

SATA, NVMe M.2, and PCIe SSDs on blue background
SATA, NVMe M.2, and PCIe SSDs on blue background

Best SSD for gaming: The best speedy storage today.
Best NVMe SSD: Compact M.2 drives.
Best external hard drives: Huge capacities for less.
Best external SSDs: Plug-in storage upgrades.

For now, this AI worm serves as a model of a potential attack executed within a controlled environment on test systems, and has yet to be seen "in the wild". However, the potential for bad actors to take advantage of these vulnerabilities is clear, so here's hoping that companies building and maintaining generative AI ecosystems like OpenAI and Google take heed of the warnings given by the researchers here.

A large part of the vulnerability exploited is the relative ease with which they could make an AI model perform actions on its own without proper checks and balances, and there are multiple ways this could be mitigated, be they better designed monitoring systems or human beings being kept in the loop to prevent something like this running roughshod over an entire AI ecosystem. For what it's worth, OpenAI did respond to the researchers work by saying that it's working on making its own systems "more resilient" to potential attack.

Bring on Kevin Bacon and a particularly well-placed cliff, that's what I say. You did see Tremors didn't you? Forget it. I give up.