Sequences – COOPER GENERAL NEWS

A pile of early coronavirus data that was missing for a year has emerged from its hiding place.

In June, an American scientist discovered that more than 200 genetic sequences from Covid-19 patient samples isolated in China at the beginning of the pandemic had been mysteriously removed from an online database. Jesse Bloom, a virologist at the Fred Hutchinson Cancer Center in Seattle, used digital research to find 13 of the sequences in Google Cloud.

When Dr. Bloom shared his experience in a report published online, he wrote that “it seems likely that the sequences were deleted to obscure their existence.”

But now a strange explanation has emerged, stemming from an editorial oversight at a scientific journal. And the sequences have been uploaded to another database, one overseen by the Chinese government.

The story began in early 2020, when researchers from Wuhan University explored a new way to test for the deadly coronavirus that was then sweeping the country. They sequenced a short section of genetic material from virus samples taken from 34 patients at a Wuhan hospital.

The researchers published their results online in March 2020. That month they also uploaded the sequences to an online database called the Sequence Read Archive, maintained by the National Institutes of Health, and submitted a paper describing their results to a scientific journal called Small. The paper was published in June 2020.

Dr. Bloom became aware of the Wuhan sequences this spring while researching the origin of Covid-19. While reading a May 2020 report on early genetic sequences of the coronavirus, he came across a table that noted their presence in the Sequence Read Archive.

But Dr. Bloom couldn’t find them in the database. On June 6, he emailed the Chinese scientists to ask where the data had gone, but received no response. On June 22, he published his report, which was covered by The New York Times and other media outlets.

At the time, a spokeswoman for the NIH said the study’s authors requested in June 2020 that the sequences be removed from the database. The authors informed the agency that the sequences would be updated and included in a different database. (The authors did not respond to inquiries from The Times.)

But a year later, Dr. Bloom couldn’t find the sequences in any database.

On July 5, more than a year after the researchers removed the sequences from the Sequence Read Archive and two weeks after Dr. Bloom’s report was published online, the sequences were quietly uploaded to a database of the China National Center for Bioinformation by Ben Hu, a researcher at Wuhan University and a co-author of the Small paper.

On July 21, the disappearance of the sequences was raised during a press conference in Beijing at which Chinese officials denied claims that the pandemic began as a laboratory leak.

According to a translation of the press conference by a journalist from the state-controlled Xinhua News Agency, Dr. Zeng Yixin, vice minister of China’s National Health Commission, said that the problems arose when the editors of Small deleted a paragraph in which the scientists described the sequences in the Sequence Read Archive.

“Therefore, the researchers thought that it was no longer necessary to save the data in the NCBI database,” said Dr. Zeng, referring to the Sequence Read Archive, which is operated by the NIH.

An editor at Small, which specializes in science at the micro and nano scale and is based in Germany, confirmed his account. “The data availability declaration was mistakenly deleted,” wrote editor Plamena Dogandzhiyski in an email. “We will shortly issue a fix that will clear up the error and contain a link to the depot where the data is now hosted.”

The journal published a formal correction to this effect on Thursday.

It is not clear why the authors did not mention the journal’s error when they asked to have the sequences removed from the Sequence Read Archive, or why they told the NIH that the sequences would be updated. It’s also not clear why they waited a year to upload them to another database. Dr. Hu did not respond to an email asking for comment.

Dr. Bloom was also unable to provide an explanation for the conflicting accounts. “I am unable to judge between them,” he said in an interview.

These sequences alone cannot solve the open questions about the origin of the pandemic, be it through contact with a wild animal, a leak from a laboratory or otherwise.

In their first reports, the Wuhan researchers wrote that they extracted genetic material from “samples from outpatients suspected of having Covid-19” at the beginning of the epidemic. But the entries in the Chinese database now suggest they were taken from the Renmin Hospital at Wuhan University on Jan. 30 – almost two months after the earliest reports of Covid-19 in China.

While the disappearance of the sequences appears to be the result of an editorial error, Dr. Bloom thought it was still worth looking for other coronavirus sequences that might be lurking online. “That definitely means we should keep looking,” he said.

“These additional data will play a big role in that effort,” said Michael Worobey, an evolutionary biologist at the University of Arizona.

It’s not clear why this valuable information went missing in the first place. Scientists can request that files be deleted by sending an email to the managers of the Sequence Read Archive. The National Library of Medicine, which manages the archive, said that the 13 sequences were removed last summer.

“These SARS-CoV-2 sequences were submitted for posting in SRA in March 2020 and subsequently requested to be withdrawn by the submitting investigator in June 2020,” said Renate Myles, a spokeswoman for the National Institutes of Health.

She said that the investigator, whom she did not name, told the archive managers that the sequences were being updated and would be added to a different database. But Dr. Bloom has searched every database he knows of, and has yet to find them. “Obviously I can’t rule out that the sequences are on some other database or web page somewhere, but I have not been able to find them in any of the obvious places I’ve looked,” he said.

Three of the co-authors of the 2020 testing study that produced the 13 sequences did not immediately respond to emails inquiring about Dr. Bloom’s finding. That study did not give contact information for another co-author, Dr. Fu, who was also named on the spreadsheet from the other study.

Some scientists are skeptical that there is anything sinister behind the removal of the sequences. “I don’t really understand how this points to a cover-up,” said Stephen Goldstein, a virologist at the University of Utah.

Dr. Goldstein noted that the testing paper listed the individual mutations the Wuhan researchers found in their tests. Although the full sequences are no longer in the archive, the key information has been public for over a year, he said. It was just tucked away in a format that is hard for researchers to find.

“We all missed this relatively obscure paper,” Dr. Goldstein said.

“You can’t really say why they were removed,” Dr. Bloom acknowledged in an interview. “You can say that the practical consequence of removing them was that people didn’t notice they existed.” He also noted that the Chinese government ordered the destruction of a number of early samples of the virus and barred the publication of papers on the coronavirus without its approval.