Cluster Analysis of SARS-CoV-2 Gene using Deep Learning Autoencoder: Gene Profiling for Mutations and Transitions

Miyake, J.; Sato, T.; Baba, S.; Nakamura, H.; Niioka, H.; Nakazawa, Y.

2021-03-16 microbiology
10.1101/2021.03.16.435601 bioRxiv

We report a method for analyzing variants of coronavirus genes using an autoencoder. Because coronaviruses have mutated rapidly and generated a large number of genotypes, an appropriate method for understanding the entire population is required. The autoencoder-based method meets this requirement and is suitable for understanding how and when variants emerge and disappear. For more than 30,000 SARS-CoV-2 ORF1ab gene sequences sampled globally from December 2019 to February 2021, we were able to summarize their characteristics in a 3D plot and show the expansion, decline, and transformation of the virus types over time and by region. Based on ORF1ab genes, the SARS-CoV-2 viruses were classified into five major types (A, B, C, D, and E, in order of appearance). The virus type that originated in China at the end of 2019 (type A) had practically disappeared by June 2020. Two virus types (types B and C) emerged in the United States and Europe from February 2020, and type B has since spread globally. Type C was prevalent only in the U.S. and is suspected to be associated with high mortality, but this type also disappeared at the end of June. Type D is found only in Australia. Currently, the epidemic is dominated by types B and E.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | PLOS ONE | 4510 papers in training set | Top 8% | 19.6%
2 | Scientific Reports | 3102 papers in training set | Top 5% | 10.6%
3 | PLOS Computational Biology | 1633 papers in training set | Top 5% | 7.1%
4 | Briefings in Bioinformatics | 326 papers in training set | Top 1% | 5.1%
5 | Mathematics | 11 papers in training set | Top 0.1% | 3.8%
6 | Mathematical Biosciences and Engineering | 23 papers in training set | Top 0.1% | 3.2%
7 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 2% | 2.6%
50% of probability mass above
8 | Communications Biology | 886 papers in training set | Top 5% | 2.2%
9 | iScience | 1063 papers in training set | Top 13% | 1.8%
10 | BioMed Research International | 25 papers in training set | Top 1% | 1.8%
11 | Viruses | 318 papers in training set | Top 3% | 1.3%
12 | BMC Medical Genomics | 36 papers in training set | Top 0.7% | 1.3%
13 | Infectious Disease Modelling | 50 papers in training set | Top 0.9% | 1.3%
14 | Nature Communications | 4913 papers in training set | Top 57% | 1.2%
15 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 7% | 0.9%
16 | Epidemiology and Infection | 84 papers in training set | Top 2% | 0.9%
17 | Heliyon | 146 papers in training set | Top 5% | 0.8%
18 | eLife | 5422 papers in training set | Top 54% | 0.8%
19 | Biomedicines | 66 papers in training set | Top 2% | 0.8%
20 | Journal of Medical Virology | 137 papers in training set | Top 4% | 0.8%
21 | Methods | 29 papers in training set | Top 0.5% | 0.8%
22 | Bioinformatics | 1061 papers in training set | Top 9% | 0.8%
23 | PeerJ | 261 papers in training set | Top 15% | 0.8%
24 | GigaScience | 172 papers in training set | Top 3% | 0.8%
25 | Frontiers in Medicine | 113 papers in training set | Top 8% | 0.7%
26 | Science China Life Sciences | 26 papers in training set | Top 2% | 0.7%
27 | Journal of Infection | 71 papers in training set | Top 4% | 0.5%
28 | Genes | 126 papers in training set | Top 4% | 0.5%
29 | Chaos, Solitons & Fractals | 32 papers in training set | Top 2% | 0.5%
30 | Journal of Clinical Medicine | 91 papers in training set | Top 8% | 0.5%
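
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by hand from the first seven entries above. The short sketch below does that arithmetic. The helper name `journals_to_reach` and the variable `top_probabilities` are illustrative assumptions, not part of the listing, and the percentages are the rounded values shown, so the running total is approximate.

```python
# Predicted probabilities (in percent) for the seven top-ranked journals,
# copied from the ranking above. These are rounded display values.
top_probabilities = [19.6, 10.6, 7.1, 5.1, 3.8, 3.2, 2.6]


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached.

    Hypothetical helper for illustration only.
    """
    running_total = 0.0
    for count, probability in enumerate(probabilities, start=1):
        running_total += probability
        if running_total >= threshold:
            return count
    return None


# Six journals sum to roughly 49.4%, just short of the cutoff;
# adding the seventh brings the total to roughly 52.0%.
print(journals_to_reach(top_probabilities, 50.0))  # 7
```

With only the first six entries the running total stays below 50%, which is why the cutoff marker sits after rank 7.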