
A Pan-Coronavirus Vaccine Candidate: Nine Amino Acid Substitutions in the ORF1ab Gene Attenuate 99% of 365 Unique Coronaviruses: A Comparative Effectiveness Research Study

Luellen, E.

2022-04-28 synthetic biology
10.1101/2022.04.28.489618 bioRxiv

Background: The COVID-19 pandemic has been a watershed event. Industry and governments have reacted by investing over US$105 billion in vaccine research.[1] The Holy Grail is a universal, pan-coronavirus vaccine to protect humankind from future SARS-CoV-2 variants and the thousands of similar coronaviruses with pandemic potential.[2] This paper proposes a new vaccine candidate that appears to attenuate SARS-CoV-2 variants, rendering them safe to use as a vaccine. Moreover, these results indicate it may be efficacious against 99% of 365 coronaviruses. This research model is wet-dry-wet: it originated in genomic sequencing laboratories, moved to computational modeling, and the resulting candidate now requires validation back in a wet lab.

Objectives: This study's purpose was to test the hypothesis that machine learning applied to sequenced coronavirus genomes could identify which amino acid substitutions are likely to attenuate the viruses and so produce a safe and effective pan-coronavirus vaccine candidate. The candidate is now eligible for preclinical and then clinical testing and validation. If validated, it would constitute a traditional attenuated-virus vaccine protecting against hundreds of coronaviruses, including the many future SARS-CoV-2 variants expected to arise through continuous recombination in unvaccinated populations and to spread through modern mass travel.

Methods: This was an in silico comparative effectiveness research study that used machine learning to examine trinucleotide (codon) functions in the nonstructural proteins of 365 novel coronavirus genomes. Sequences of 7,097 codons in the ORF1ab gene were collected from viruses sampled at 65 global locations and infecting 68 species, as reported to the US National Institutes of Health. The data were transformed twice, using proprietary methods, to enable machine-learning ingestion, mapping, and interpretation. The set of 2,590,405 data points (365 genomes × 7,097 codons) was randomly divided into three cohorts: 255 observations (70%) for training and two cohorts of 55 observations (15%) each for testing.
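The random three-way split described above can be illustrated with a short sketch. This is a minimal Python example, not the study's R code: it assumes each of the 365 genomes is one observation (as the cohort sizes 255 + 55 + 55 = 365 imply), uses hypothetical placeholder genome IDs, and picks an arbitrary seed. It also shows where the 2,590,405 figure comes from.

```python
import random

N_GENOMES = 365   # coronavirus genomes in the study
N_CODONS = 7_097  # ORF1ab codon positions per genome


def split_cohorts(ids, n_train=255, n_test=55, seed=2022):
    """Shuffle IDs and cut them into one training cohort and two test cohorts.

    The seed is an arbitrary, hypothetical choice; the study's own seed is not reported.
    """
    if n_train + 2 * n_test > len(ids):
        raise ValueError("not enough observations for the requested cohort sizes")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # reproducible random order
    train = shuffled[:n_train]
    test_a = shuffled[n_train:n_train + n_test]
    test_b = shuffled[n_train + n_test:n_train + 2 * n_test]
    return train, test_a, test_b


# Hypothetical placeholder IDs standing in for the 365 sequenced genomes.
genome_ids = [f"genome_{i:03d}" for i in range(N_GENOMES)]
train, test_a, test_b = split_cohorts(genome_ids)

print(len(train), len(test_a), len(test_b))  # 255 55 55
print(f"{len(train) / N_GENOMES:.1%} / {len(test_a) / N_GENOMES:.1%}")  # 69.9% / 15.1%
print(N_GENOMES * N_CODONS)  # 2590405 data points
```

The printed shares show that the 70% and 15% labels are rounded values of 255/365 and 55/365. Splitting by genome rather than by individual data point keeps all codons of one genome in the same cohort.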
Machine-learning models were trained in the statistical programming language R and compared to identify which of the 7.097 × 10^23 possible amino-acid-location combinations would attenuate SARS-CoV-2 and other coronaviruses that have infected humans.

Results: Competing machine-learning algorithms identified nine amino-acid point substitutions in the ORF1ab gene that likely attenuate 98.9% (361 of 365) of the novel coronaviruses studied. Notably, seven of the substitutions replace alanine. Most of the locations (5 of 9) are in nonstructural proteins (NSPs) 2 and 3. The substitutions are alanine to (1) valine at codon 4273; (2) leucine at codon 5077; (3) phenylalanine at codon 2001; (4) leucine at codon 372; (5) proline at codon 354; (6) phenylalanine at codon 2811; and (7) phenylalanine at codon 4703; plus (8) leucine to serine at codon 2333 and (9) threonine to alanine at codon 5131.

Conclusions: The primary outcome is a new, highly promising pan-coronavirus vaccine candidate based on nine amino-acid substitutions in the ORF1ab gene. The secondary outcome was evidence that sequential wet-dry lab collaborations (here, machine-learning analysis of viral genomes informing codon functions) may discover new, broader, and more stable vaccine candidates more quickly and inexpensively than traditional methods.
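The nine substitutions listed in the Results can be written down as structured data. The minimal Python sketch below encodes them with one-letter amino acid codes and applies them to a protein sequence, checking the expected wild-type residue at each position first. It assumes the codon numbers are 1-based positions in the study's 7,097-codon ORF1ab frame; the `Substitution` class and the helper names are hypothetical, and this is not the study's method for selecting the substitutions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Substitution:
    position: int   # 1-based codon position (assumed: the study's 7,097-codon ORF1ab frame)
    wild_type: str  # one-letter amino acid code before the substitution
    mutant: str     # one-letter amino acid code after the substitution


# The nine substitutions reported in the Results.
CANDIDATE = [
    Substitution(4273, "A", "V"),
    Substitution(5077, "A", "L"),
    Substitution(2001, "A", "F"),
    Substitution(372, "A", "L"),
    Substitution(354, "A", "P"),
    Substitution(2811, "A", "F"),
    Substitution(4703, "A", "F"),
    Substitution(2333, "L", "S"),
    Substitution(5131, "T", "A"),
]


def apply_substitutions(protein: str, subs: Sequence[Substitution]) -> str:
    """Return a copy of `protein` with every point substitution applied.

    Raises ValueError if a position is out of range or the residue found
    there does not match the expected wild type.
    """
    residues = list(protein)
    for s in subs:
        i = s.position - 1  # convert 1-based position to 0-based index
        if not 0 <= i < len(residues):
            raise ValueError(f"position {s.position} outside sequence of length {len(residues)}")
        if residues[i] != s.wild_type:
            raise ValueError(f"expected {s.wild_type} at {s.position}, found {residues[i]}")
        residues[i] = s.mutant
    return "".join(residues)


def label(s: Substitution) -> str:
    """Short conventional notation, e.g. A4273V."""
    return f"{s.wild_type}{s.position}{s.mutant}"


print(", ".join(label(s) for s in CANDIDATE))
print(sum(s.wild_type == "A" for s in CANDIDATE))  # 7
```

Checking the wild-type residue before substituting guards against applying the positions to a sequence that is not aligned to the same coordinate frame; a real ORF1ab sequence would need that alignment first.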

Matching journals

The top 15 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Top (%) | Predicted probability (%) |
|---|---|---|---|---|
| 1 | PLOS ONE | 4510 | 21 | 8.7 |
| 2 | PeerJ | 261 | 0.9 | 5.0 |
| 3 | Frontiers in Pharmacology | 100 | 0.5 | 5.0 |
| 4 | Scientific Reports | 3102 | 21 | 5.0 |
| 5 | Life | 27 | 0.1 | 4.1 |
| 6 | Clinical Infectious Diseases | 231 | 1 | 3.7 |
| 7 | Briefings in Bioinformatics | 326 | 2 | 3.2 |
| 8 | Science | 429 | 11 | 2.7 |
| 9 | Frontiers in Public Health | 140 | 3 | 2.7 |
| 10 | Nature Communications | 4913 | 46 | 2.2 |
| 11 | Vaccine | 189 | 1 | 2.0 |
| 12 | JAMA Network Open | 127 | 2 | 1.8 |
| 13 | Biology Methods and Protocols | 53 | 0.8 | 1.8 |
| 14 | JMIR Public Health and Surveillance | 45 | 2 | 1.8 |
| 15 | PLOS Computational Biology | 1633 | 16 | 1.7 |
| 16 | Vaccines | 196 | 1 | 1.5 |
| 17 | Journal of Medical Virology | 137 | 2 | 1.5 |
| 18 | New England Journal of Medicine | 50 | 0.5 | 1.5 |
| 19 | Frontiers in Medicine | 113 | 4 | 1.5 |
| 20 | Cell Discovery | 54 | 3 | 1.4 |
| 21 | eLife | 5422 | 48 | 1.3 |
| 22 | Influenza and Other Respiratory Viruses | 44 | 0.3 | 1.0 |
| 23 | Communications Biology | 886 | 17 | 0.9 |
| 24 | PLOS Medicine | 98 | 4 | 0.9 |
| 25 | The Lancet Infectious Diseases | 71 | 2 | 0.9 |
| 26 | Cell Reports Medicine | 140 | 7 | 0.8 |
| 27 | BMC Medicine | 163 | 6 | 0.8 |
| 28 | iScience | 1063 | 28 | 0.8 |
| 29 | American Journal of Preventive Medicine | 11 | 0.4 | 0.8 |
| 30 | EClinicalMedicine | 21 | 0.7 | 0.8 |
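The statement that the top 15 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The Python sketch below uses a hypothetical helper (not the matching site's code) that accumulates the rounded values and reports the first rank at which the running total reaches 50%. Probabilities are stored in tenths of a percent to avoid floating-point noise; because the displayed values are rounded, the totals are approximate.

```python
from itertools import accumulate
from typing import Sequence

# Predicted probabilities for ranks 1-30, in tenths of a percent (87 = 8.7%).
PROB_TENTHS = [
    87, 50, 50, 50, 41, 37, 32, 27, 27, 22, 20, 18, 18, 18, 17,  # ranks 1-15
    15, 15, 15, 15, 14, 13, 10, 9, 9, 9, 8, 8, 8, 8, 8,          # ranks 16-30
]


def rank_reaching(mass_tenths: int, probs: Sequence[int]) -> int:
    """Return the smallest 1-based rank whose cumulative probability reaches `mass_tenths`."""
    for rank, cumulative in enumerate(accumulate(probs), start=1):
        if cumulative >= mass_tenths:
            return rank
    raise ValueError("the listed journals do not reach the requested probability mass")


print(rank_reaching(500, PROB_TENTHS))  # 15
print(sum(PROB_TENTHS[:14]) / 10)       # 49.7
print(sum(PROB_TENTHS[:15]) / 10)       # 51.4
print(sum(PROB_TENTHS) / 10)            # 67.8
```

Ranks 1 to 14 sum to 49.7% and ranks 1 to 15 to 51.4%, so rank 15 is where the 50% threshold is crossed; all 30 listed journals together cover about 67.8%.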