Deleted gene sequences confirm coronavirus circulated before Wuhan seafood market

The virus that causes COVID-19 did not originate at the Wuhan seafood market, confirms a new study of deleted gene sequences from the virus' earliest days.

The sequences had been posted to a website run by the National Institutes of Health but were removed for unknown reasons.

Finding earlier sequences like these might reveal new insights into the SARS-CoV-2 virus' earliest days, said Jesse Bloom, the report's author, who studies viral evolution at the Fred Hutchinson Cancer Research Center in Seattle.

His new report, which has not yet been peer-reviewed, does not suggest an answer to the question of whether the SARS-CoV-2 virus jumped directly from animals to people or whether it was accidentally leaked from a research lab in Wuhan, China.

"I hope scientifically we can get beyond just arguing about that," said Bloom, who in May joined a global call for more information about the earliest days of the outbreak.

But by studying how the viral genes mutate over time, researchers such as Bloom can reconstruct their history, figuring out which cases came first and how the virus changed as it moved through the population.

"These sequences are informative for understanding early SARS-CoV-2 spread in Wuhan," Bloom said. "They're not transformative, but they fill in some really important gaps."

Bloom started out trying to find early genetic sequences of the virus published by other researchers so he could analyze them himself.

In looking at one paper from March 2020, he saw evidence of a group of 45 sequences that he hadn't seen elsewhere. When he looked for those sequences, he realized they had been deleted from a National Institutes of Health repository.

Bloom knows the deletion will raise public suspicions, but he says there are many reasons a researcher might ask for material to be taken offline. One possibility: the week the study was posted, the Chinese government instituted a requirement that it review all scientific information related to SARS-CoV-2 before publication.

Bloom reached out to two Chinese researchers to ask why they wanted the information withdrawn but has not heard back.

The NIH said it is aware of the decision and has reviewed the request. The data was submitted in March 2020, and the submitter asked in June of the same year that it be withdrawn.

"The requestor indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA (the Sequence Read Archive) to avoid version control issues," the NIH said in a statement. "Submitting investigators hold the rights to their data and can request withdrawal of the data."

But Bloom also knew that although the sequences might have been deleted from the NIH database, they had likely been uploaded to the cloud and might still be findable there. And they were.

He was able to analyze 13 of the 45 sequences and noted their differences from those that had already been examined. Looking at the changes, he was able to confirm that sequences taken from people infected at the Wuhan seafood market, where coronavirus infections were once thought to have begun, could not have been the earliest cases.

The missing sequences also help explain why it has been so hard to figure out when and where the virus first began, Bloom said.

"This suggests we need to be really critical in going back and analyzing from primary data as much as possible," he said.

It's not clear why early sequences are missing, Bloom said, though it's possible that the Chinese government simply doesn't like to release information publicly. He's hopeful other early sequences can be found elsewhere without relying on the Chinese government.

Other researchers agreed it's important to track down other early sequences.

"This line of inquiry may help us determine the origin of the virus and reconstruct how it spread in the earliest days of the pandemic," said Dr. W. Ian Lipkin, an epidemiologist and infectious disease expert at the Mailman School of Public Health at Columbia University.

"We also need to know how and why the sequences were removed. Was this a local, regional or national decision?" Lipkin said.

Sergei Pond, an expert in analyzing sequences at Temple University in Philadelphia, praised Bloom's research, saying it confirmed some of his own work and offered ideas for finding more early sequences.

Genetic sequences taken from people who caught the virus at the Wuhan seafood market likely had three mutations from the original viral sequence, Pond and Bloom both estimate. "For SARS-CoV-2, 3 mutations translate to about 4-6 weeks of evolution (roughly). Hence the Wuhan market is not a plausible single origin source," Pond said.
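Pond's rough conversion from mutations to elapsed time can be reproduced with simple arithmetic. The sketch below is illustrative only: the substitution rate and genome length are assumptions commonly used as ballpark figures for SARS-CoV-2, not values taken from Pond's or Bloom's analyses, and the function name is hypothetical.

```python
# Back-of-the-envelope molecular clock: how long would it take, on average,
# for a given number of mutations to accumulate across the whole genome?

GENOME_LENGTH = 29_903        # assumed genome length in nucleotides
RATE_PER_SITE_PER_YEAR = 1e-3  # assumed substitution rate (ballpark figure)
WEEKS_PER_YEAR = 52


def weeks_for_mutations(n_mutations,
                        rate_per_site_per_year=RATE_PER_SITE_PER_YEAR,
                        genome_length=GENOME_LENGTH):
    """Return the expected number of weeks needed to accumulate n_mutations."""
    # Expected mutations per week across the entire genome.
    mutations_per_week = rate_per_site_per_year * genome_length / WEEKS_PER_YEAR
    return n_mutations / mutations_per_week


if __name__ == "__main__":
    weeks = weeks_for_mutations(3)
    print(f"3 mutations take roughly {weeks:.1f} weeks at the assumed rate")
```

Under these assumptions, three mutations correspond to a little over five weeks, which falls inside the four-to-six-week range Pond cites. A faster or slower assumed rate would shift the estimate accordingly.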

Pond agreed there was nothing nefarious about the decision to delete data warehoused on a site run by the U.S. government.

"They often receive requests to delete records for all kinds of reasons (e.g. wrong data submitted, sample contamination, technical errors, duplicate submissions), and honor them. This is routine," Pond said via email. "In fact, they must delete data if requested by the submitter."

But Dr. Eric Topol, founder and director of the Scripps Research Translational Institute in La Jolla, California, said he's concerned about the way people will react to Bloom's findings.

"He has no evidence to prove that this was done in a deliberate way, but it raises suspicions in the way it’s communicated," Topol said.

He's also dubious that the mystery will be solved. "It’s unlikely we’re ever going to get to the bottom of this," Topol said. "Finding the true zoonotic source is never easy. It can take a while, or we may never get there."

If the virus did transfer directly from bats or another animal to humans, as many virologists believe, it should be a simple matter to prove: the Chinese government could make available genetic sequences of the SARS-CoV-2 virus from the people infected earliest in the pandemic.

Those sequences almost certainly exist within China, or they could be easily created using stored nasal swab samples from people who fell ill early in the pandemic, experts say.

"It would be easy for (the Chinese) to refute this," Topol said. "Bloom gave them an opening, but whether they’ll ever come forth we can’t know."

"Until they become transparent," Topol added, "this is going to stay unsettled."

Contact Karen Weintraub and Elizabeth Weise.

Health and patient safety coverage at USA TODAY is made possible in part by a grant from the Masimo Foundation for Ethics, Innovation and Competition in Healthcare. The Masimo Foundation does not provide editorial input.

This article originally appeared on USA TODAY: Coronavirus origins: Deleted gene sequences offer clues to early days