Viruses are doing mysterious things everywhere – AI can help researchers understand what they’re up to in the oceans and in your gut

Libusha Kelly, Albert Einstein College of Medicine

May 15, 2024 at 8:16 AM

Many viral genetic sequences code for proteins that researchers haven't seen before. KTSDesign/Science Photo Library via Getty Images

Viruses are a mysterious and poorly understood force in microbial ecosystems. Researchers know they can infect, kill and manipulate human and bacterial cells in nearly every environment, from the oceans to your gut. But scientists don’t yet have a full picture of how viruses affect their surrounding environments, in large part because of their extraordinary diversity and ability to evolve rapidly.

Communities of microbes are difficult to study in a laboratory setting. Many microbes are challenging to cultivate, and their natural environment has many more features influencing their success or failure than scientists can replicate in a lab.

So systems biologists like me often sequence all the DNA present in a sample – for example, a fecal sample from a patient – separate out the viral DNA sequences, then annotate the sections of the viral genome that code for proteins. These notes on the location, structure and other features of genes help researchers understand the functions viruses might carry out in the environment and help identify different kinds of viruses. Researchers annotate viruses by matching viral sequences in a sample to previously annotated sequences available in public databases of viral genetic sequences.
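The matching step described above can be illustrated with a toy sketch. It is not the pipeline used in the study: the reference entries are made up, and the names `REFERENCE_DB`, `kmers`, `jaccard`, `annotate` and the `min_score` cutoff are all hypothetical. Real annotation workflows rely on dedicated sequence-search tools and much larger databases; this only shows the idea of transferring a label from the most similar annotated sequence, and of leaving a sequence unlabeled when nothing similar is found.

```python
# Hypothetical, made-up reference entries: (annotation, protein sequence).
REFERENCE_DB = [
    ("capsid", "MKTAYIAKQRQISFVKSHFSRQ"),
    ("integrase", "MSLLTEVETPIRNEWGCRCNDSSD"),
]


def kmers(seq, k=3):
    """Return the set of overlapping k-letter substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def annotate(query, db=REFERENCE_DB, min_score=0.3):
    """Copy the label of the most similar reference sequence.

    Returns (label, score). The label is "unknown" when no reference
    reaches min_score, which is an arbitrary cutoff chosen for this sketch.
    """
    query_kmers = kmers(query)
    best_label, best_score = "unknown", 0.0
    for label, ref in db:
        score = jaccard(query_kmers, kmers(ref))
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_score:
        return "unknown", best_score
    return best_label, best_score
```

A query that differs from the made-up capsid entry by one letter is labeled "capsid", while a sequence that shares nothing with the database comes back as "unknown". That second case is the situation the article describes for many environmental viral sequences.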

However, scientists are identifying viral sequences in DNA collected from the environment at a rate that far outpaces our ability to annotate those genes. This means researchers are publishing findings about viruses in microbial ecosystems using unacceptably small fractions of available data.

To improve researchers’ ability to study viruses around the globe, my team and I have developed a novel approach to annotate viral sequences using artificial intelligence. Using protein language models, which are akin to large language models like ChatGPT but specific to proteins, we were able to classify previously unseen viral sequences. This opens the door for researchers to not only learn more about viruses, but also to address biological questions that are difficult to answer with current techniques.

Annotating viruses with AI

Large language models use relationships between words in large datasets of text to provide potential answers to questions they are not explicitly “taught” the answer to. When you ask a chatbot “What is the capital of France?” for example, the model is not looking up the answer in a table of capital cities. Rather, it is using its training on huge datasets of documents and information to infer the answer: “The capital of France is Paris.”

Similarly, protein language models are AI algorithms that are trained to recognize relationships between billions of protein sequences from environments around the world. Through this training, they may be able to infer something about the essence of viral proteins and their functions.

We wondered whether protein language models could answer this question: “Given all annotated viral genetic sequences, what is this new sequence’s function?”

In our proof of concept, we trained neural networks on previously annotated viral protein sequences, as represented by pre-trained protein language models, and then used them to predict the annotation of new viral protein sequences. Our approach allows us to probe what the model is “seeing” in a particular viral sequence that leads to a particular annotation. This helps identify candidate proteins of interest based either on their specific functions or on how their genome is arranged, winnowing down the search space of vast datasets.
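The general shape of this workflow (turn each sequence into a numeric vector, fit a classifier on labeled vectors, predict labels for new sequences) can be sketched in a few lines. Everything here is an assumption made for illustration. The `embed` function is a crude stand-in that counts amino acid frequencies, where a real pre-trained protein language model would supply a learned vector. A nearest-centroid classifier stands in for the neural networks. The training sequences and labels are invented.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq):
    """Stand-in embedding (hypothetical): amino acid frequency vector.

    A pre-trained protein language model would return a learned
    representation here instead of simple letter frequencies.
    """
    n = len(seq) or 1
    return [seq.count(a) / n for a in AMINO_ACIDS]


def train_centroids(labeled):
    """Average the embeddings of each label's training sequences."""
    sums, counts = {}, {}
    for label, seq in labeled:
        vec = embed(seq)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, seq):
    """Return the label whose centroid is closest to the sequence embedding."""
    vec = embed(seq)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Invented training data: (annotation, protein sequence).
TRAINING = [
    ("capsid", "AAAGAAVAAG"),
    ("capsid", "AAGAVAAAGA"),
    ("integrase", "KKRKKHKRKK"),
    ("integrase", "RKKHRKKKRK"),
]

centroids = train_centroids(TRAINING)
```

With these toy inputs, `predict(centroids, "AAVAAGAAAA")` falls closest to the "capsid" centroid. The sketch shows why the quality of the embedding matters: the classifier can only separate functions that the representation already places apart.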

Microscopy image of spherical bacteria colored bright green — *Prochlorococcus* is one of the many species of marine bacteria with proteins that researchers haven’t seen before. Anne Thompson/Chisholm Lab, MIT via Flickr

By identifying more distantly related viral gene functions, protein language models can complement current methods to provide new insights into microbiology. For example, my team and I were able to use our model to discover a previously unrecognized integrase – a type of protein that can move genetic information in and out of cells – in the globally abundant marine picocyanobacteria Prochlorococcus and Synechococcus. Notably, this integrase may be able to move genes in and out of these populations of bacteria in the oceans and enable these microbes to better adapt to changing environments.

Our language model also identified a novel viral capsid protein that is widespread in the global oceans. We produced the first picture of how its genes are arranged, showing that it can contain different sets of genes, which we believe indicates that this virus serves different functions in its environment.

These preliminary findings represent only two of thousands of annotations our approach has provided.

Analyzing the unknown

Most of the hundreds of thousands of newly discovered viruses remain unclassified. Many viral genetic sequences match protein families with no known function or have never been seen before. Our work shows that similar protein language models could help study the threat and promise of our planet’s many uncharacterized viruses.

While our study focused on viruses in the global oceans, improved annotation of viral proteins is critical for better understanding the role viruses play in health and disease in the human body. We and other researchers have hypothesized that viral activity in the human gut microbiome might be altered when you’re sick. This means that viruses may help identify stress in microbial communities.

However, our approach is also limited because it requires high-quality annotations. Researchers are developing newer protein language models that incorporate other “tasks” as part of their training, particularly predicting protein structures to detect similar proteins, to make them more powerful.

Making all AI tools available via FAIR Data Principles – data that is findable, accessible, interoperable and reusable – can help researchers at large realize the potential of these new ways of annotating protein sequences, leading to discoveries that benefit human health.

This article is republished from The Conversation, a nonprofit, independent news organization bringing you facts and trustworthy analysis to help you make sense of our complex world. It was written by: Libusha Kelly, Albert Einstein College of Medicine

Libusha Kelly receives funding from the National Institutes of Health.
