
NCodR: A multi-class SVM classification to distinguish between non-coding RNAs in Viridiplantae

Nithin, C.; Mukherjee, S.; Basak, J.; Bahadur, R. P.

2021-01-25 plant biology
10.1101/2021.01.23.427923 bioRxiv

Non-coding RNAs (ncRNAs) are major players in the regulation of gene expression. This study analyses seven classes of ncRNAs in plants using sequence-based and secondary-structure-based RNA folding measures. We observe both distinct and overlapping regions in the distribution of AU content for different ncRNA classes. The average minimum folding energy index (MFEI) is similar across the ncRNA classes except for pre-miRNAs and lncRNAs, and the other RNA folding measures follow the same trend. We also observe different repeat signatures of k-mers of length three among the ncRNA classes, whereas pre-miRNAs and lncRNAs show a diffuse k-mer pattern. Using these attributes, we train eight different classifiers to discriminate among the ncRNA classes in plants. Support vector machines with a radial basis function kernel perform best (average F1 of ~91%), and this classifier is implemented as a web server, NCodR.
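The attributes and evaluation summarised in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not NCodR's implementation: all function names are hypothetical, the MFEI formula uses the commonly cited definition MFEI = [(|MFE| / length) × 100] / (G+C)% as an assumption, the MFE value itself must come from an external RNA folding tool, and "average F1" is assumed here to mean the unweighted (macro) mean of per-class F1 scores.

```python
from collections import Counter
from itertools import product
import math


def _normalise(seq: str) -> str:
    # Upper-case and treat DNA-style T as U so either alphabet is accepted.
    return seq.upper().replace("T", "U")


def au_content(seq: str) -> float:
    """AU content as a percentage of sequence length."""
    s = _normalise(seq)
    return 100.0 * (s.count("A") + s.count("U")) / len(s)


def gc_content(seq: str) -> float:
    """GC content as a percentage of sequence length."""
    s = _normalise(seq)
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def kmer_profile(seq: str, k: int = 3) -> dict:
    """Relative frequency of every possible k-mer over A, C, G, U (64 features for k = 3)."""
    s = _normalise(seq)
    windows = max(len(s) - k + 1, 1)
    counts = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    return {"".join(p): counts["".join(p)] / windows for p in product("ACGU", repeat=k)}


def mfei(mfe: float, seq: str) -> float:
    """Assumed definition: MFEI = (|MFE| / length * 100) / GC%.

    `mfe` (kcal/mol) must be supplied by an external folding tool; it is not computed here.
    """
    amfe = abs(mfe) / len(seq) * 100.0  # adjusted MFE per 100 nt
    return amfe / gc_content(seq)


def rbf_kernel(x, y, gamma: float = 0.1) -> float:
    """Radial basis function kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 scores (one way to report an 'average F1')."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

For example, `mfei(-30.0, seq)` for an 80-nt sequence with 45% GC gives 37.5 / 45 ≈ 0.83. In a full pipeline, the 64 trinucleotide frequencies, AU content and folding measures would be concatenated into one feature vector per sequence before training an RBF-kernel SVM; `rbf_kernel` here only shows the similarity measure such an SVM relies on.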

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Scientific Reports: 3102 papers in training set; Top 1.0%; 18.4% predicted probability
2. PLOS ONE: 4510 papers in training set; Top 13%; 14.6% predicted probability
3. Frontiers in Genetics: 197 papers in training set; Top 0.5%; 8.3% predicted probability
4. Computational Biology and Chemistry: 23 papers in training set; Top 0.1%; 4.3% predicted probability
5. Genomics: 60 papers in training set; Top 0.4%; 3.6% predicted probability
6. International Journal of Molecular Sciences: 453 papers in training set; Top 4%; 2.9% predicted probability
50% of probability mass above
7. PeerJ: 261 papers in training set; Top 4%; 2.3% predicted probability
8. BMC Bioinformatics: 383 papers in training set; Top 5%; 1.7% predicted probability
9. Genes: 126 papers in training set; Top 1%; 1.7% predicted probability
10. Plants: 39 papers in training set; Top 1%; 1.7% predicted probability
11. Frontiers in Plant Science: 240 papers in training set; Top 4%; 1.7% predicted probability
12. RNA Biology: 70 papers in training set; Top 0.3%; 1.7% predicted probability
13. Biosystems: 18 papers in training set; Top 0.2%; 1.7% predicted probability
14. ACS Omega: 90 papers in training set; Top 2%; 1.5% predicted probability
15. Journal of Virology: 456 papers in training set; Top 2%; 1.5% predicted probability
16. BMC Genomics: 328 papers in training set; Top 3%; 1.5% predicted probability
17. Viruses: 318 papers in training set; Top 3%; 1.3% predicted probability
18. Frontiers in Molecular Biosciences: 100 papers in training set; Top 2%; 1.3% predicted probability
19. Plant Direct: 81 papers in training set; Top 2%; 1.2% predicted probability
20. F1000Research: 79 papers in training set; Top 3%; 0.9% predicted probability
21. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 8%; 0.9% predicted probability
22. The Plant Journal: 197 papers in training set; Top 3%; 0.9% predicted probability
23. Journal of General Virology: 46 papers in training set; Top 0.7%; 0.8% predicted probability
24. Heliyon: 146 papers in training set; Top 6%; 0.8% predicted probability
25. Planta: 15 papers in training set; Top 0.4%; 0.8% predicted probability
26. Journal of Chemical Information and Modeling: 207 papers in training set; Top 3%; 0.7% predicted probability
27. Nucleic Acids Research: 1128 papers in training set; Top 18%; 0.7% predicted probability
28. Journal of Molecular Evolution: 21 papers in training set; Top 0.4%; 0.7% predicted probability
29. Briefings in Bioinformatics: 326 papers in training set; Top 7%; 0.7% predicted probability
30. G3 Genes|Genomes|Genetics: 351 papers in training set; Top 3%; 0.7% predicted probability