Descriptive Analysis of SARS-CoV-2 Genomics Data from Ambulatory Patients

Ambrose, N.; Amin, A.; Anderson, B.; Bertagnolli, M.; Campion, F.; Chow, D.; Drews, A.; Farris, H.; Gaspar, F. W.; Jones, S.; Korves, T.; Lopansri, B.; Musser, J.; Neumann, E.; O'Horo, J.; Piantadosi, S.; Pritt, B.; Razonable, R. R.; Roberts, S.; Sandmeyer, S.; Stein, D.; Vahidy, F.; Webb, B.; Yttri, J.

2023-05-05 genetic and genomic medicine
10.1101/2023.05.03.23289106 medRxiv

Background: The COVID-19 pandemic has been characterized by ongoing evolution of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), with concomitant variation in viral transmissibility and morbidity. Within specific timeframes and geographic areas, multiple SARS-CoV-2 variants have coexisted in the human population, each characterized by distinct biologic and clinical features, such as varying susceptibility to neutralizing monoclonal antibodies (nMAbs), a major frontline treatment. As part of an observational real-world data study of the effectiveness of nMAbs for treatment of COVID-19, SARS-CoV-2 viral samples were obtained from patients under treatment, generating paired clinical and genomics data. This paper describes the processing pipeline and findings from the genomics portion of this combined data set.

Methods: SARS-CoV-2 sequences were generated from 14,796 diagnostic samples from four large U.S. health systems between July 2020 and March 2022. Among nMAbs-treated patients, samples were collected on the same day as, or prior to, treatment with nMAbs. Thus, these samples represent a snapshot of SARS-CoV-2 variants circulating in the respective patient groups, as opposed to variants that arose in response to specific treatments. Health systems collected viral samples and performed library creation and sequencing according to local protocols, using tiled ARTIC amplicon primers. FASTQ files were submitted to a study data platform and processed through a common pipeline. This pipeline enabled a unified approach to quality control, assembly, and production of genomics features for downstream analysis.

Results: Alpha and pre-Alpha SARS-CoV-2 lineages were predominant in the data set prior to June 2021. From June 2021 through November 2021, Delta was the dominant variant. Beginning in December 2021, Omicron was dominant. A variety of mutations associated with decreased nMAbs binding to the spike protein in vitro were detected, including lineage-defining mutations and non-lineage-defining mutations such as E340A, G446V, and S494P. Distinct patterns of sequence gaps and ambiguous base calls were associated with distinct variants.

Conclusions: The distribution of SARS-CoV-2 variants, per WHO nomenclature, across epochs in this data set matched concurrent CDC genomic surveillance results across the U.S. Detection of putative nMAbs escape mutations within clinical samples was consistent with FDA decisions to amend EUAs as variants emerged. This genomics data set provides an opportunity to examine associations between SARS-CoV-2 genomic variation and clinical outcomes in the associated EHR data set. The expansion of real-world data sets such as this to study the relationship between viral sequence and treatment outcomes could provide the foundation for future efforts to achieve near-real-time understanding of clinical outcomes related to genomic variation over time, and evidence to update treatment decisions more rapidly and to greater effect during ongoing and future pandemics.
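
To make the reported epochs and mutation labels concrete, the following is a minimal Python sketch, not the study's pipeline. It parses amino-acid substitution labels of the form used in the abstract (for example E340A: reference residue, spike position, substituted residue) and maps a sample collection date to the variant reported as dominant in that period. The names `Substitution`, `parse_substitution`, and `dominant_variant` are hypothetical, and placing each boundary on the first day of the month is an assumption, since the abstract gives the epochs only at month resolution.

```python
import re
from datetime import date
from typing import NamedTuple


class Substitution(NamedTuple):
    """One amino-acid substitution, e.g. E340A -> ('E', 340, 'A')."""
    ref: str
    position: int
    alt: str


# Single-letter residue, 1-based position, single-letter residue.
_SUBSTITUTION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'G446V'; raise ValueError on anything else."""
    match = _SUBSTITUTION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    ref, position, alt = match.groups()
    return Substitution(ref, int(position), alt)


def dominant_variant(collected: date) -> str:
    """Return the variant reported as dominant in the data set on that date.

    Assumption: epochs start on the first day of the stated month.
    Dates outside the study window (July 2020 to March 2022) are not checked.
    """
    if collected < date(2021, 6, 1):
        return "Alpha or pre-Alpha"
    if collected < date(2021, 12, 1):
        return "Delta"
    return "Omicron"


if __name__ == "__main__":
    for label in ("E340A", "G446V", "S494P"):
        print(label, parse_substitution(label))
    print(dominant_variant(date(2021, 8, 15)))
```

The epoch lookup describes which variant was dominant in the data set at a given time; it does not assign a lineage to any individual sample, which would require the sequence itself.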

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Genome Medicine | 154 papers in training set | Top 0.1% | 22.3%
2 | Genetics in Medicine | 69 papers in training set | Top 0.2% | 8.3%
3 | Diagnostic Microbiology and Infectious Disease | 21 papers in training set | Top 0.1% | 8.1%
4 | JAMA Network Open | 127 papers in training set | Top 0.3% | 7.1%
5 | Med | 38 papers in training set | Top 0.1% | 4.8%
50% of probability mass above
6 | PLOS ONE | 4510 papers in training set | Top 35% | 4.1%
7 | Open Forum Infectious Diseases | 134 papers in training set | Top 0.6% | 3.0%
8 | Scientific Reports | 3102 papers in training set | Top 46% | 2.6%
9 | Clinical Chemistry | 22 papers in training set | Top 0.3% | 2.1%
10 | Journal of Clinical Microbiology | 120 papers in training set | Top 0.8% | 2.1%
11 | The Journal of Molecular Diagnostics | 36 papers in training set | Top 0.2% | 1.7%
12 | Clinical Infectious Diseases | 231 papers in training set | Top 3% | 1.7%
13 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 1% | 1.7%
14 | The American Journal of Pathology | 31 papers in training set | Top 0.2% | 1.6%
15 | Nature Communications | 4913 papers in training set | Top 55% | 1.3%
16 | JAMA | 17 papers in training set | Top 0.1% | 1.3%
17 | Cell Reports Medicine | 140 papers in training set | Top 5% | 1.2%
18 | JCI Insight | 241 papers in training set | Top 6% | 0.9%
19 | The Journal of Infectious Diseases | 182 papers in training set | Top 4% | 0.9%
20 | Journal of Clinical Pathology | 12 papers in training set | Top 0.5% | 0.7%
21 | New England Journal of Medicine | 50 papers in training set | Top 0.8% | 0.7%
22 | BMC Medical Genomics | 36 papers in training set | Top 1% | 0.7%
23 | eBioMedicine | 130 papers in training set | Top 4% | 0.7%
24 | BMC Genomics | 328 papers in training set | Top 6% | 0.7%
25 | Eurosurveillance | 80 papers in training set | Top 2% | 0.7%
26 | BMC Public Health | 147 papers in training set | Top 6% | 0.6%
27 | PLOS Genetics | 756 papers in training set | Top 17% | 0.6%
28 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 4% | 0.6%
29 | Science Translational Medicine | 111 papers in training set | Top 7% | 0.6%
30 | The Lancet Infectious Diseases | 71 papers in training set | Top 4% | 0.6%
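
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The Python sketch below is only an illustration: `journals_to_reach` is a hypothetical helper, and the inputs are the rounded percentages for the six highest-ranked journals listed above, so the total is approximate.

```python
from typing import Optional, Sequence, Tuple


def journals_to_reach(
    probabilities: Sequence[float], threshold: float
) -> Optional[Tuple[int, float]]:
    """Return (number of top-ranked journals, cumulative mass) at the point
    where the running total first reaches the threshold, or None if it never does.

    Probabilities are expected in ranked (descending) order, in percent.
    """
    total = 0.0
    for count, probability in enumerate(probabilities, start=1):
        total += probability
        if total >= threshold:
            return count, total
    return None


if __name__ == "__main__":
    # Rounded percentages for ranks 1 to 6, copied from the listing.
    top_six = [22.3, 8.3, 8.1, 7.1, 4.8, 4.1]
    result = journals_to_reach(top_six, 50.0)
    if result is not None:
        count, mass = result
        print(f"{count} journals reach {mass:.1f}% of the probability mass")
```

With these values the running total is 45.8% after four journals and 50.6% after five, which agrees with the divider placed after rank 5.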