VPF-Class 2.0: a taxonomy-centered framework for automatic viral classification

Vidal, L. J.; Pons, J. C.; Fiamenghi, M. B.; Kyrpides, N.; Llabres, M.

2026-03-23 bioinformatics
10.64898/2026.03.20.713201 bioRxiv

Rapid expansion of viral sequence data demands classifiers that scale, track ICTV updates, and provide interpretable evidence. We present VPF-Class 2.0, the successor to VPF-Class, centred on taxonomic classification. It retains marker-driven protein domain detection but replaces rule-based voting with a lightweight supervised model trained on per-genome marker-composition features. In controlled benchmarks, VPF-Class 2.0 achieves near-perfect family-level performance and strong genus-level accuracy while increasing confident annotation coverage. Under a practical confidence threshold (0.3), performance improves and matches or exceeds that of representative tools within shared taxonomic scopes. We further introduce an interpretability study that relates errors to the genus specificity of activated markers. Finally, we demonstrate applicability on large real-world viromes, obtaining consistent labels and substantial agreement with graph-based classifications. The implementation of VPF-Class 2.0 is available at https://github.com/luisvidalj/VPFClass2.git.
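The confidence threshold mentioned in the abstract can be illustrated with a minimal sketch. This is not the VPF-Class 2.0 implementation: the function name `assign_label`, the example genomes, the family names, and the probabilities are all hypothetical, and the sketch assumes the classifier emits one probability per candidate taxon and that a call is kept when the top probability is at least 0.3.

```python
from typing import Dict, Optional, Tuple

# Threshold value quoted in the abstract; how it is applied here is an assumption.
CONFIDENCE_THRESHOLD = 0.3


def assign_label(
    probs: Dict[str, float], threshold: float = CONFIDENCE_THRESHOLD
) -> Tuple[Optional[str], float]:
    """Return (label, confidence); label is None when the top score is below threshold."""
    if not probs:
        return None, 0.0
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        return None, confidence
    return label, confidence


# Hypothetical per-genome class probabilities (illustrative only).
predictions = {
    "genome_A": {"FamilyX": 0.82, "FamilyY": 0.18},
    "genome_B": {"FamilyX": 0.26, "FamilyY": 0.25, "FamilyZ": 0.49},
    "genome_C": {"FamilyX": 0.28, "FamilyY": 0.27, "FamilyZ": 0.25, "FamilyW": 0.20},
}

calls = {genome: assign_label(p) for genome, p in predictions.items()}

# Fraction of genomes that receive a confident label.
coverage = sum(label is not None for label, _ in calls.values()) / len(calls)
```

In this toy input, `genome_C` is left unlabelled because its best score falls below the threshold, which shows the trade-off between annotation coverage and the reliability of the calls that are kept.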

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

Rank  Journal                                          Papers in training set  Percentile  Predicted probability
1     Nature Biotechnology                             147                     Top 0.1%    22.0%
2     Nature Communications                            4913                    Top 14%     12.2%
3     Nature Methods                                   336                     Top 1%      9.9%
4     Nucleic Acids Research                           1128                    Top 3%      6.2%
---- 50% of probability mass above ----
5     Cell Systems                                     167                     Top 2%      6.2%
6     Genome Biology                                   555                     Top 2%      4.7%
7     Nature                                           575                     Top 7%      3.6%
8     PLOS Computational Biology                       1633                    Top 12%     2.7%
9     Bioinformatics                                   1061                    Top 6%      2.5%
10    Genome Medicine                                  154                     Top 4%      2.0%
11    Virus Evolution                                  140                     Top 0.6%    2.0%
12    Proceedings of the National Academy of Sciences  2130                    Top 30%     1.8%
13    Nature Microbiology                              133                     Top 2%      1.8%
14    PLOS ONE                                         4510                    Top 55%     1.7%
15    Nature Genetics                                  240                     Top 4%      1.7%
16    Science                                          429                     Top 15%     1.6%
17    Scientific Reports                               3102                    Top 67%     1.2%
18    Briefings in Bioinformatics                      326                     Top 6%      0.9%
19    Genome Research                                  409                     Top 4%      0.8%
20    Bioinformatics Advances                          184                     Top 5%      0.7%
21    Advanced Science                                 249                     Top 20%     0.7%
22    Cell Reports Methods                             141                     Top 6%      0.7%
23    Cell Host & Microbe                              113                     Top 5%      0.7%
24    Cell                                             370                     Top 19%     0.6%
25    BMC Bioinformatics                               383                     Top 8%      0.6%
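
The 50% cutoff stated above can be checked from the listed probabilities with a short cumulative sum. The sketch below uses only the first five rows as displayed; the helper name `journals_to_reach` is hypothetical, and because the listed percentages are rounded, the running total is approximate.

```python
from itertools import accumulate
from typing import List, Optional, Tuple

# Top five rows of the ranking, copied from the list above (rounded percentages).
ranked: List[Tuple[str, float]] = [
    ("Nature Biotechnology", 22.0),
    ("Nature Communications", 12.2),
    ("Nature Methods", 9.9),
    ("Nucleic Acids Research", 6.2),
    ("Cell Systems", 6.2),
]


def journals_to_reach(
    rows: List[Tuple[str, float]], target: float = 50.0
) -> Optional[Tuple[int, float]]:
    """Return (number of journals, cumulative %) at the first point the target is met."""
    running = accumulate(pct for _, pct in rows)
    for count, total in enumerate(running, start=1):
        if total >= target:
            return count, total
    return None  # target not reached with the rows supplied


result = journals_to_reach(ranked)
```

With these rows the running total first reaches the target at the fourth journal, consistent with the statement that the top 4 journals account for 50% of the predicted probability mass.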