Descriptive Analysis of SARS-CoV-2 Genomics Data from Ambulatory Patients

Ambrose, N.; Amin, A.; Anderson, B.; Bertagnolli, M.; Campion, F.; Chow, D.; Drews, A.; Farris, H.; Gaspar, F. W.; Jones, S.; Korves, T.; Lopansri, B.; Musser, J.; Neumann, E.; O'Horo, J.; Piantadosi, S.; Pritt, B.; Razonable, R. R.; Roberts, S.; Sandmeyer, S.; Stein, D.; Vahidy, F.; Webb, B.; Yttri, J.

2023-05-05 genetic and genomic medicine

10.1101/2023.05.03.23289106 medRxiv

Background
The COVID-19 pandemic has been characterized by ongoing evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), with concomitant variation in viral transmissibility and morbidity. Within specific timeframes and geographic areas, multiple SARS-CoV-2 variants have coexisted in the human population. Each has distinct biologic and clinical features, such as varying susceptibility to neutralizing monoclonal antibodies (nMAbs), a major frontline treatment. As part of an observational real-world data study of the effectiveness of nMAbs for treatment of COVID-19, SARS-CoV-2 samples were obtained from patients under treatment, generating paired clinical and genomics data. This paper describes the processing pipeline and findings from the genomics portion of this combined data set.

Methods
SARS-CoV-2 sequences were generated from 14,796 diagnostic samples collected at four large U.S. health systems between July 2020 and March 2022. For nMAb-treated patients, samples were collected on or before the day of nMAb treatment. These samples therefore represent a snapshot of the SARS-CoV-2 variants circulating in the respective patient groups, rather than variants that arose in response to specific treatments. Health systems collected viral samples and performed library preparation and sequencing according to local protocols, using tiled ARTIC amplicon primers. FASTQ files were submitted to a study data platform and processed through a common pipeline, which provided a unified approach to quality control, assembly, and production of genomic features for downstream analysis.

Results
Alpha and pre-Alpha SARS-CoV-2 lineages predominated in the data set before June 2021. From June 2021 through November 2021, Delta was the dominant variant; from December 2021 onward, Omicron was dominant. A variety of mutations associated with decreased nMAb binding to the spike protein in vitro were detected. These included both lineage-defining mutations and non-lineage-defining mutations such as E340A, G446V, and S494P. Distinct variants were associated with distinct patterns of sequence gaps and ambiguous base calls.

Conclusions
The distribution of SARS-CoV-2 variants (per WHO nomenclature) across epochs in this data set matched concurrent CDC genomic surveillance results across the U.S. Detection of putative nMAb escape mutations within clinical samples was consistent with FDA decisions to amend emergency use authorizations (EUAs) as variants emerged. This genomics data set provides an opportunity to examine associations between SARS-CoV-2 genomic variation and clinical outcomes in the associated EHR data set. Expanding real-world data sets such as this one to study the relationship between viral sequence and treatment outcomes could lay the foundation for near-real-time understanding of clinical outcomes related to genomic variation over time. It could also provide evidence to update treatment decisions more rapidly and effectively during ongoing and future pandemics.
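The abstract reports that distinct variants showed distinct patterns of sequence gaps and ambiguous base calls, but it does not specify how the common pipeline quantified them. The following is a minimal, hypothetical sketch, not the study's pipeline, of one way to summarize a consensus sequence. It assumes that low-coverage positions (for example, regions lost to ARTIC amplicon dropout) are masked as N and that mixed calls are written as IUPAC ambiguity codes. The names `consensus_qc`, `ConsensusQC`, and the `min_gap` threshold are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# IUPAC nucleotide ambiguity codes other than N (e.g. R = A/G, Y = C/T).
# Assumption: mixed base calls appear in the consensus as these codes.
IUPAC_AMBIGUOUS = set("RYSWKMBDHV")


@dataclass
class ConsensusQC:
    length: int                    # total consensus length
    n_count: int                   # positions masked as N (assumed low coverage)
    ambiguous_count: int           # positions carrying an IUPAC ambiguity code
    gaps: List[Tuple[int, int]]    # 0-based, half-open (start, end) runs of N

    @property
    def n_fraction(self) -> float:
        # Fraction of the consensus that is masked; 0.0 for an empty sequence.
        return self.n_count / self.length if self.length else 0.0


def consensus_qc(seq: str, min_gap: int = 1) -> ConsensusQC:
    """Summarize masked (N) runs and ambiguity calls in one consensus sequence.

    Hypothetical helper: only N runs at least `min_gap` bases long are
    reported as gaps, but every N counts toward `n_count`.
    """
    seq = seq.upper()
    gaps = [
        (m.start(), m.end())
        for m in re.finditer(r"N+", seq)
        if m.end() - m.start() >= min_gap
    ]
    return ConsensusQC(
        length=len(seq),
        n_count=seq.count("N"),
        ambiguous_count=sum(1 for base in seq if base in IUPAC_AMBIGUOUS),
        gaps=gaps,
    )


if __name__ == "__main__":
    qc = consensus_qc("ACGTNNNNACGRYACNNT", min_gap=2)
    print(qc.n_count, qc.ambiguous_count, qc.gaps)
```

For the toy input above, `consensus_qc` reports 6 masked bases, 2 ambiguity calls (R and Y), and gaps at [(4, 8), (15, 17)]. Per-sample summaries like these could then be compared across variant groups. Real thresholds and masking conventions would depend on the pipeline actually used.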
