Automatic Identification of SARS Coronavirus using Compression-Complexity Measures

Balasubramanian, K.; Nagaraj, N.

2020-03-27 bioinformatics
10.1101/2020.03.24.006007 bioRxiv

Abstract: Finding a vaccine or specific antiviral treatment for a global pandemic of viral disease (such as the ongoing COVID-19) requires rapid analysis, annotation and evaluation of metagenomic libraries to enable quick and efficient screening of nucleotide sequences. Traditional sequence alignment methods are not suitable for this, and there is a need for fast alignment-free techniques for sequence analysis. Information theory and data compression algorithms provide a rich set of mathematical and computational tools to capture essential patterns in biological sequences. In 2013, our research group (Nagaraj et al., Eur. Phys. J. Special Topics 222(3-4), 2013) proposed a novel measure known as Effort-To-Compress (ETC), based on the notion of compression-complexity, to capture the information content of sequences. In this study, we propose a compression-complexity based distance measure for automatic identification of SARS coronavirus strains from a set of viruses using only short fragments of nucleotide sequences. We also demonstrate that our proposed method can correctly distinguish SARS-CoV-2 from SARS-CoV-1 viruses by analyzing very short segments of nucleotide sequences. This work could be extended further to enable medical practitioners to automatically identify and characterize SARS coronavirus strains in a fast and efficient fashion using short and/or incomplete segments of nucleotide sequences. Potentially, the need for sequence assembly can be circumvented.

Note: The main ideas and results of this research were first presented at the International Conference on Nonlinear Systems and Dynamics (CNSD-2013) held at Indian Institute of Technology, Indore, December 12, 2013. In this manuscript, we have extended our preliminary analysis to include the SARS-CoV-2 virus as well.
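For readers unfamiliar with ETC, the following is a minimal sketch of how the measure is commonly described in the cited 2013 work: repeatedly replace the most frequent pair of adjacent symbols with a new symbol (non-sequential recursive pair substitution) and count the passes needed until the sequence is constant or has length 1. The function names `etc` and `normalized_etc`, the tie-breaking rule (first pair encountered), and the simple overlapping pair count are assumptions made for illustration, not the authors' implementation. The distance measure proposed in this study is not defined in the abstract and is not reproduced here.

```python
from collections import Counter


def etc(seq):
    """Count pair-substitution passes until seq is constant or of length 1.

    Illustrative sketch only: ties between equally frequent pairs are broken
    by first occurrence, and pairs are counted with overlaps.
    """
    s = list(seq)
    steps = 0
    while len(s) > 1 and len(set(s)) > 1:
        # Count every adjacent pair in the current sequence.
        pairs = Counter(zip(s, s[1:]))
        # Counter keeps insertion order, so max() returns the first
        # most-frequent pair encountered.
        target = max(pairs, key=pairs.get)
        # A tuple tagged with the step number cannot collide with input symbols.
        new_symbol = ("sub", steps)
        out = []
        i = 0
        while i < len(s):
            if i + 1 < len(s) and (s[i], s[i + 1]) == target:
                out.append(new_symbol)  # replace the pair, left to right
                i += 2
            else:
                out.append(s[i])
                i += 1
        s = out
        steps += 1
    return steps


def normalized_etc(seq):
    """ETC divided by (length - 1), so values lie in [0, 1]."""
    n = len(seq)
    return etc(seq) / (n - 1) if n > 1 else 0.0


if __name__ == "__main__":
    print(etc("11010010"))
    print(normalized_etc("ACGTACGTTTGA"))
```

A constant sequence such as `"AAAA"` needs no substitution passes, while a sequence with little repeated structure needs close to one pass per symbol, which is why the normalized value can serve as a length-independent complexity score for short fragments.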

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1 | Physical Review E | 95 papers in training set | Top 0.1% | 22.7%
2 | Chaos, Solitons & Fractals | 32 papers in training set | Top 0.2% | 10.2%
3 | Mathematics | 11 papers in training set | Top 0.1% | 3.6%
4 | PLOS Computational Biology | 1633 papers in training set | Top 11% | 3.1%
5 | Frontiers in Physics | 20 papers in training set | Top 0.1% | 2.4%
6 | Chaos: An Interdisciplinary Journal of Nonlinear Science | 16 papers in training set | Top 0.1% | 2.1%
7 | Scientific Reports | 3102 papers in training set | Top 50% | 2.1%
8 | BioSystems | 11 papers in training set | Top 0.1% | 2.1%
9 | Bioinformatics | 1061 papers in training set | Top 7% | 1.9%
50% of probability mass above
10 | Entropy | 20 papers in training set | Top 0.1% | 1.7%
11 | Physical Biology | 43 papers in training set | Top 1% | 1.7%
12 | Physica A: Statistical Mechanics and its Applications | 10 papers in training set | Top 0.1% | 1.7%
13 | Bulletin of Mathematical Biology | 84 papers in training set | Top 1% | 1.5%
14 | The Journal of Physical Chemistry B | 158 papers in training set | Top 1% | 1.5%
15 | PLOS ONE | 4510 papers in training set | Top 58% | 1.3%
16 | Mathematical Biosciences | 42 papers in training set | Top 0.7% | 1.3%
17 | Journal of Bioinformatics and Systems Biology | 14 papers in training set | Top 0.3% | 1.3%
18 | Biophysical Journal | 545 papers in training set | Top 4% | 1.2%
19 | Frontiers in Neuroscience | 223 papers in training set | Top 5% | 1.2%
20 | Journal of Theoretical Biology | 144 papers in training set | Top 1% | 1.2%
21 | Royal Society Open Science | 193 papers in training set | Top 3% | 1.2%
22 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 6% | 1.2%
23 | Frontiers in Computational Neuroscience | 53 papers in training set | Top 2% | 1.0%
24 | Physical Review Research | 46 papers in training set | Top 0.6% | 1.0%
25 | BMC Bioinformatics | 383 papers in training set | Top 6% | 0.9%
26 | Mathematical Biosciences and Engineering | 23 papers in training set | Top 0.5% | 0.9%
27 | Frontiers in Artificial Intelligence | 18 papers in training set | Top 0.6% | 0.9%
28 | Journal of Computational Biology | 37 papers in training set | Top 0.5% | 0.8%
29 | PeerJ | 261 papers in training set | Top 15% | 0.8%
30 | Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences | 15 papers in training set | Top 0.8% | 0.8%