Genomic Evolution of SARS-CoV-2 Delta Variants Pre- and Post-Omicron Emergence using Alignment-free Machine Learning models

Sankar, S.; Anandharaman, K.; Selvam, P.; Jayaraman, A.; Jayakumar, D.; Sivadoss, R.; Esaki Muthu, S.; Velu, V.; Larsson, M.; Balakrishnan, P.

2026-02-23 genomics
10.64898/2026.02.20.706927 bioRxiv

The SARS-CoV-2 Delta variant (B.1.617.2), initially classified as a variant of concern due to its enhanced transmissibility and vaccine-escape mutations, underwent further genomic changes following the emergence of the Omicron variant (B.1.1.529). This study investigates the genomic differences in Delta variant spike gene sequences collected before and after the emergence of Omicron. A total of 190 sequences were analyzed using an alignment-free approach incorporating k-mer-based feature extraction and machine learning models, including convolutional neural networks (CNN), K-means clustering, and random forest classification. The random forest model achieved 93% accuracy, with significant F1 scores, effectively distinguishing the two Delta variant groups. Comparative analysis revealed 157 persistent mutations and four vanished mutations in the post-Omicron group. Cluster analysis showed notable shifts, indicating stable yet evolving genomic patterns over time. The study demonstrates the advantage of alignment-free methods in detecting subtle sequence variations that alignment-based approaches may overlook. These findings enhance our understanding of SARS-CoV-2 evolution and provide a framework for identifying key genomic signatures relevant to public health. The methodology and insights gained offer potential applications in variant surveillance, vaccine design, and viral evolutionary studies, supporting preparedness for future SARS-CoV-2 variant emergence.
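The abstract names k-mer-based feature extraction as the alignment-free step but does not give its details. The following is a minimal sketch of that general idea, not the authors' pipeline: the function name `kmer_vector`, the choice of k, the frequency normalization, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_vector(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper: windows containing characters outside `alphabet`
    (for example N) are ignored rather than imputed.
    """
    seq = seq.upper()
    # Fixed feature order so that vectors from different sequences are comparable.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Toy sequences, made up for illustration (not taken from the study).
a = kmer_vector("ATGTTTGTTTTT", k=2)
b = kmer_vector("ATGTTTGTCTTT", k=2)
```

Because every sequence maps to a vector of the same length (4^k entries for a four-letter alphabet), no alignment is needed before comparing sequences, and such vectors could in principle be passed to a classifier or clustering method of the kind the abstract lists.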

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Computational and Structural Biotechnology Journal (216 papers in training set; Top 0.1%; 12.1%)
2. Viruses (318 papers in training set; Top 0.3%; 12.1%)
3. Frontiers in Microbiology (375 papers in training set; Top 0.6%; 9.8%)
4. Briefings in Bioinformatics (326 papers in training set; Top 0.8%; 6.6%)
5. Scientific Reports (3102 papers in training set; Top 20%; 6.1%)
6. Frontiers in Genetics (197 papers in training set; Top 1%; 4.2%)

50% of probability mass above

7. BMC Genomics (328 papers in training set; Top 1.0%; 3.5%)
8. PLOS ONE (4510 papers in training set; Top 41%; 3.5%)
9. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 3%; 2.3%)
10. International Journal of Biological Macromolecules (65 papers in training set; Top 2%; 1.7%)
11. PeerJ (261 papers in training set; Top 7%; 1.7%)
12. Microbial Genomics (204 papers in training set; Top 1%; 1.7%)
13. Virus Research (36 papers in training set; Top 0.6%; 1.6%)
14. mSystems (361 papers in training set; Top 5%; 1.6%)
15. Journal of Medical Virology (137 papers in training set; Top 2%; 1.6%)
16. Frontiers in Public Health (140 papers in training set; Top 5%; 1.6%)
17. Virus Evolution (140 papers in training set; Top 0.8%; 1.6%)
18. Pathogens (53 papers in training set; Top 0.7%; 1.4%)
19. Infection, Genetics and Evolution (43 papers in training set; Top 0.5%; 1.4%)
20. Heliyon (146 papers in training set; Top 4%; 1.2%)
21. Genes (126 papers in training set; Top 2%; 0.9%)
22. Emerging Microbes & Infections (74 papers in training set; Top 1%; 0.9%)
23. Genomics (60 papers in training set; Top 2%; 0.9%)
24. eLife (5422 papers in training set; Top 57%; 0.8%)
25. Nucleic Acids Research (1128 papers in training set; Top 17%; 0.8%)
26. Genome Medicine (154 papers in training set; Top 8%; 0.8%)
27. Journal of Virology (456 papers in training set; Top 4%; 0.7%)
28. PNAS Nexus (147 papers in training set; Top 2%; 0.7%)
29. Gene (41 papers in training set; Top 2%; 0.7%)
30. PLOS Computational Biology (1633 papers in training set; Top 28%; 0.6%)
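
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The sketch below does this for the ten highest-ranked entries; the function name `journals_to_reach` is hypothetical, and the values are the rounded percentages shown in the list, so the sum is approximate.

```python
# Predicted probabilities (percent) for ranks 1-10, copied from the list above.
probs = [12.1, 12.1, 9.8, 6.6, 6.1, 4.2, 3.5, 3.5, 2.3, 1.7]


def journals_to_reach(probs, threshold):
    """Return (count, running_sum) for the first rank at which the
    cumulative probability reaches `threshold`, or (None, total) if never."""
    running = 0.0
    for n, p in enumerate(probs, start=1):
        running += p
        if running >= threshold:
            return n, running
    return None, running


n, mass = journals_to_reach(probs, 50.0)
print(n, round(mass, 1))
```

With the rounded values shown, the first five entries sum to about 46.7% and the sixth brings the total to about 50.9%, which agrees with the stated cutoff.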