Deep learning of antibody epitopes using molecular permutation vectors

Vardaxis, I.; Simovski, B.; Anzar, I.; Stratford, R.; Clancy, T.

2024-03-21 bioinformatics
10.1101/2024.03.20.585661 bioRxiv
Background: The accurate computational prediction of B cell epitopes can vastly reduce the cost and time required for identifying potential epitope candidates for the design of vaccines and immunodiagnostics. However, current computational tools for B cell epitope prediction perform poorly and are not fit for purpose, leaving enormous room for improvement and a need for superior prediction strategies.

Results: Here we propose a novel approach that improves B cell epitope prediction by encoding epitopes as binary molecular permutation vectors that represent the position and structural properties of the amino acids within a protein antigen sequence that interact with an antibody, rather than the traditional approach of defining epitopes as scores per amino acid on a protein sequence that pertain to their probability of partaking in a B cell epitope antibody interaction. In addition to defining epitopes as binary molecular permutation vectors, the approach also uses the 3D macrostructure features of the unbound 3D protein structures, and in turn uses these features to train another deep learning model on the corresponding antibody-bound protein 3D structures. We demonstrate that the strategy predicts B cell epitopes with improved accuracy compared to the existing tools. Additionally, we demonstrate that this approach reliably identifies the majority of experimentally verified epitopes on the spike protein of SARS-CoV-2 not seen by the model in training and generalizes in a very robust manner to dissimilar data not seen by the model in training.

Conclusions: With the approach described herein, a primary protein sequence with the query molecular permutation vector alone is required to predict B cell epitopes in a reliable manner, potentially advancing the use of computational prediction of B cell epitopes in biomedical research applications.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Bioinformatics, 1061 papers in training set, Top 1%, 22.7%
2. ImmunoInformatics, 11 papers in training set, Top 0.1%, 18.8%
3. BMC Bioinformatics, 383 papers in training set, Top 2%, 6.4%
4. Scientific Reports, 3102 papers in training set, Top 23%, 4.9%
50% of probability mass above
5. Computational and Structural Biotechnology Journal, 216 papers in training set, Top 0.8%, 4.9%
6. Biology Methods and Protocols, 53 papers in training set, Top 0.2%, 4.0%
7. Briefings in Bioinformatics, 326 papers in training set, Top 2%, 3.9%
8. Frontiers in Bioinformatics, 45 papers in training set, Top 0.1%, 2.8%
9. PLOS Computational Biology, 1633 papers in training set, Top 12%, 2.8%
10. Bioinformatics Advances, 184 papers in training set, Top 2%, 2.1%
11. PLOS ONE, 4510 papers in training set, Top 48%, 2.1%
12. Computers in Biology and Medicine, 120 papers in training set, Top 2%, 1.9%
13. Antibody Therapeutics, 16 papers in training set, Top 0.2%, 1.9%
14. Frontiers in Immunology, 586 papers in training set, Top 4%, 1.7%
15. mAbs, 28 papers in training set, Top 0.2%, 1.7%
16. GigaScience, 172 papers in training set, Top 2%, 0.9%
17. Protein Science, 221 papers in training set, Top 2%, 0.8%
18. Genome Medicine, 154 papers in training set, Top 8%, 0.8%
19. Frontiers in Virology, 15 papers in training set, Top 0.1%, 0.7%
20. Frontiers in Physiology, 93 papers in training set, Top 6%, 0.7%
21. PeerJ, 261 papers in training set, Top 16%, 0.7%
22. Gigabyte, 60 papers in training set, Top 2%, 0.7%
23. BioSystems, 11 papers in training set, Top 0.4%, 0.7%
24. Journal of Structural Biology, 58 papers in training set, Top 2%, 0.7%
25. BioData Mining, 15 papers in training set, Top 1%, 0.5%