
Enhancing Vaxign-DL for Vaccine Candidate Prediction with added ESM-Generated Features

Chen, Y.; Zhang, Y.; He, Y.

2024-09-08 bioinformatics
10.1101/2024.09.04.611295 bioRxiv

Many vaccine design programs have been developed, including our own machine learning approaches Vaxign-ML and Vaxign-DL. Using deep learning techniques, Vaxign-DL predicts bacterial protective antigens by calculating 509 biological and biomedical features from protein sequences. In this study, we first used the protein folding ESM program to calculate a set of 1,280 features from individual protein sequences, and then used the new set of features separately or in combination with the traditional set of 509 features to predict protective antigens. Our results showed that ESM-derived features alone were able to accurately predict vaccine antigens with a performance similar to the original Vaxign-DL prediction method, and that combining the ESM-derived and original Vaxign-DL features significantly improved prediction performance according to a set of seven scores including specificity, sensitivity, and AUROC. To further evaluate the updated methods, we conducted a Leave-One-Pathogen-Out Validation (LOPOV) study and found that using ESM-derived features significantly improved the prediction of vaccine antigens from 10 bacterial pathogens. This research is the first reported study demonstrating the added value of protein folding features for vaccine antigen prediction.
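The abstract describes two steps that can be sketched briefly: combining the 1,280 ESM-derived features with the 509 original Vaxign-DL features for each protein, and splitting the data so that one pathogen is held out per fold (LOPOV). The Python below is only an illustration under stated assumptions: the function names, pathogen labels, and random placeholder vectors are hypothetical, and neither the ESM feature extraction nor the authors' model is reproduced here.

```python
import random

N_ESM = 1280     # ESM-derived features per protein (count from the abstract)
N_VAXIGN = 509   # original Vaxign-DL features per protein (count from the abstract)


def combine_features(esm_vec, vaxign_vec):
    """Concatenate the two per-protein feature vectors (hypothetical helper)."""
    if len(esm_vec) != N_ESM or len(vaxign_vec) != N_VAXIGN:
        raise ValueError("unexpected feature vector length")
    return list(esm_vec) + list(vaxign_vec)


def leave_one_pathogen_out(pathogens):
    """Yield (held_out, train_idx, test_idx), holding out one pathogen per fold."""
    for held_out in sorted(set(pathogens)):
        train_idx = [i for i, p in enumerate(pathogens) if p != held_out]
        test_idx = [i for i, p in enumerate(pathogens) if p == held_out]
        yield held_out, train_idx, test_idx


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Placeholder data: four proteins from three made-up pathogens.
rng = random.Random(0)
pathogens = ["pathogen_A", "pathogen_A", "pathogen_B", "pathogen_C"]
features = [
    combine_features(
        [rng.random() for _ in range(N_ESM)],
        [rng.random() for _ in range(N_VAXIGN)],
    )
    for _ in pathogens
]
folds = list(leave_one_pathogen_out(pathogens))
```

Each combined vector has 1,280 + 509 = 1,789 entries. In a real evaluation, a classifier would be trained on `train_idx` and scored on `test_idx` in every fold, so that no protein from the held-out pathogen is seen during training.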

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Briefings in Bioinformatics | 326 papers in training set | Top 0.1% | 29.2%
2 | Computers in Biology and Medicine | 120 papers in training set | Top 0.2% | 7.6%
3 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 2% | 3.8%
4 | Scientific Reports | 3102 papers in training set | Top 33% | 3.8%
5 | PLOS Computational Biology | 1633 papers in training set | Top 11% | 3.4%
6 | Nature Machine Intelligence | 61 papers in training set | Top 1% | 3.2%
50% of probability mass above
7 | Frontiers in Immunology | 586 papers in training set | Top 3% | 2.9%
8 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 3% | 2.6%
9 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 2.5%
10 | PLOS ONE | 4510 papers in training set | Top 47% | 2.2%
11 | Quantitative Biology | 11 papers in training set | Top 0.2% | 1.8%
12 | Frontiers in Genetics | 197 papers in training set | Top 5% | 1.6%
13 | BMC Bioinformatics | 383 papers in training set | Top 5% | 1.3%
14 | IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 1% | 1.3%
15 | Journal of Genetics and Genomics | 36 papers in training set | Top 1% | 1.2%
16 | Bioinformatics | 1061 papers in training set | Top 8% | 1.0%
17 | BioMed Research International | 25 papers in training set | Top 2% | 0.9%
18 | Bioengineering | 24 papers in training set | Top 1% | 0.8%
19 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.5% | 0.8%
20 | Frontiers in Pharmacology | 100 papers in training set | Top 4% | 0.8%
21 | Informatics in Medicine Unlocked | 21 papers in training set | Top 1% | 0.8%
22 | Nature Communications | 4913 papers in training set | Top 61% | 0.8%
23 | International Journal of Molecular Sciences | 453 papers in training set | Top 15% | 0.8%
24 | International Immunopharmacology | 15 papers in training set | Top 0.5% | 0.8%
25 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.9% | 0.8%
26 | Frontiers in Cell and Developmental Biology | 218 papers in training set | Top 9% | 0.8%
27 | Pharmaceuticals | 33 papers in training set | Top 2% | 0.7%
28 | Frontiers in Microbiology | 375 papers in training set | Top 10% | 0.7%
29 | Journal of Structural Biology | 58 papers in training set | Top 2% | 0.7%
30 | Computational Biology and Chemistry | 23 papers in training set | Top 0.7% | 0.5%
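
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below assumes the displayed percentages are rounded values copied from the ranking above; the function name `journals_to_reach` is hypothetical.

```python
from itertools import accumulate

# Predicted probabilities (%) of the ten highest-ranked journals, as displayed above.
top_probs = [29.2, 7.6, 3.8, 3.8, 3.4, 3.2, 2.9, 2.6, 2.5, 2.2]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running sum to reach threshold."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None  # threshold is never reached by the given entries


n_needed = journals_to_reach(top_probs, 50.0)
print(n_needed)
```

With these values the running sum is about 47.8% after five journals and about 51.0% after six, which is consistent with the cutoff shown after rank 6.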