
Benchmarking Sequence-Based and AlphaFold-Based Methods for pMHC-II Binding Core Prediction: Distinct Strengths and Consensus Approaches

Ko, S.; Li, H.; Kim, H.; Shin, W.-H.; Ko, J.; Choi, Y.

2024-10-11 bioinformatics
10.1101/2024.10.06.616783 bioRxiv

Background: Interactions between peptides and MHC class II molecules (pMHC-II) are crucial for T-cell recognition and immune responses, because MHC-II molecules present peptide fragments to T cells and so enable the distinction between self and non-self antigens. Accurately predicting the pMHC-II binding core is particularly important because it provides insight into pMHC-II interactions and T-cell receptor engagement. Given the high polymorphism and peptide-binding promiscuity of MHC-II molecules, computational prediction methods are essential for understanding these interactions. While sequence-based methods are widely used, recent advances in AlphaFold-based structure prediction have opened new possibilities for improving pMHC-II binding core predictions.

Results: We benchmarked four recent pMHC-II prediction methods with a focus on binding core prediction: two sequence-based methods, NetMHCIIpan and DeepMHCII, and two AlphaFold-based structure prediction methods, AlphaFold2 fine-tuned for peptide interactions (AF2-FT) and AlphaFold3 (AF3). The AlphaFold-based methods showed strong performance in predicting positive binders, with AF3 achieving the highest positive recall (0.86) and AF2-FT performing similarly (0.81). However, both methods frequently misclassified unbound peptides as binders. NetMHCIIpan excelled at identifying non-binders, achieving the highest negative recall (0.93), but had lower positive recall (0.44). In contrast, DeepMHCII demonstrated moderate performance without any notable strength. Consensus approaches that combine AlphaFold-based methods for binder identification with filtering by NetMHCIIpan improved overall prediction precision (0.94 and 0.87 for known and unknown binding status, respectively).

Conclusions: This study highlights the complementary strengths of AlphaFold-based and sequence-based methods for predicting pMHC-II binding core regions. AlphaFold-based methods excel at predicting positive binders, while NetMHCIIpan is highly effective at identifying non-binders. Future research should focus on improving the prediction of unbound peptides for AlphaFold-based models. Since NetMHCIIpan's binding core prediction ability is already high, future efforts should concentrate on enhancing its binding prediction to further improve overall accuracy.
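The metrics and the consensus idea in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions: the abstract does not define the exact evaluation protocol or the consensus rule, so the sketch uses the standard definitions of recall and precision and a simple "both methods must agree" filter. The function names (`recall`, `precision`, `consensus`) and the toy labels are hypothetical and are not data from the study.

```python
from typing import List, Sequence


def recall(y_true: Sequence[bool], y_pred: Sequence[bool], positive: bool = True) -> float:
    """Fraction of examples of the chosen class that were predicted as that class.

    positive=True gives positive recall (binders found);
    positive=False gives negative recall (non-binders found).
    """
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not relevant:
        raise ValueError("no examples of the requested class")
    return sum(p == positive for p in relevant) / len(relevant)


def precision(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Fraction of predicted binders that are true binders."""
    called = [t for t, p in zip(y_true, y_pred) if p]
    if not called:
        raise ValueError("no positive predictions")
    return sum(called) / len(called)


def consensus(structure_calls: Sequence[bool], sequence_calls: Sequence[bool]) -> List[bool]:
    """Assumed rule: keep a structure-based binder call only if the
    sequence-based method also calls the peptide a binder."""
    return [s and q for s, q in zip(structure_calls, sequence_calls)]


# Toy labels (hypothetical): four binders followed by four non-binders.
y_true = [True, True, True, True, False, False, False, False]
# A structure-based caller that finds most binders but over-calls non-binders.
structure_calls = [True, True, True, False, True, True, False, False]
# A sequence-based caller that is conservative about calling binders.
sequence_calls = [True, True, False, False, False, False, False, True]

combined = consensus(structure_calls, sequence_calls)
print("structure positive recall:", recall(y_true, structure_calls))
print("structure negative recall:", recall(y_true, structure_calls, positive=False))
print("structure precision:", precision(y_true, structure_calls))
print("consensus precision:", precision(y_true, combined))
```

In this toy case the filter removes the structure-based false positives and raises precision, at the cost of dropping one true binder. That trade-off is the general behavior of an intersection rule, not a result reported by the authors.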

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. ImmunoInformatics | 11 papers in training set | Top 0.1% | 33.1%
2. Bioinformatics | 1061 papers in training set | Top 2% | 14.4%
3. BMC Bioinformatics | 383 papers in training set | Top 1% | 7.2%

50% of probability mass above

4. Frontiers in Immunology | 586 papers in training set | Top 1.0% | 6.8%
5. PLOS ONE | 4510 papers in training set | Top 28% | 6.3%
6. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 1% | 4.0%
7. Scientific Reports | 3102 papers in training set | Top 53% | 1.9%
8. PLOS Computational Biology | 1633 papers in training set | Top 14% | 1.9%
9. Bioinformatics Advances | 184 papers in training set | Top 3% | 1.8%
10. Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.3%
11. GigaScience | 172 papers in training set | Top 2% | 1.2%
12. Frontiers in Bioinformatics | 45 papers in training set | Top 0.4% | 1.2%
13. Journal of Proteome Research | 215 papers in training set | Top 2% | 1.1%
14. Computers in Biology and Medicine | 120 papers in training set | Top 3% | 1.1%
15. Journal of Immunological Methods | 24 papers in training set | Top 0.2% | 0.9%
16. Frontiers in Physiology | 93 papers in training set | Top 5% | 0.9%
17. Communications Biology | 886 papers in training set | Top 23% | 0.7%
18. iScience | 1063 papers in training set | Top 32% | 0.7%
19. Gigabyte | 60 papers in training set | Top 2% | 0.6%
20. Immunology | 29 papers in training set | Top 1% | 0.6%
21. Cell Reports Methods | 141 papers in training set | Top 6% | 0.6%
22. BioMed Research International | 25 papers in training set | Top 4% | 0.6%
23. Molecular & Cellular Proteomics | 158 papers in training set | Top 2% | 0.6%
24. Journal of Biological Chemistry | 641 papers in training set | Top 6% | 0.5%
25. Methods | 29 papers in training set | Top 0.9% | 0.5%
26. PeerJ | 261 papers in training set | Top 19% | 0.5%
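
The statement that the top 3 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below is only an illustration: the helper name `ranks_to_reach` is hypothetical, and the percentages are copied from the first five entries of the list above.

```python
from itertools import accumulate
from typing import Optional, Sequence


def ranks_to_reach(probabilities: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based rank at which the running total first reaches
    the threshold, or None if it never does."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank
    return None


# Predicted probabilities (in percent) for ranks 1 to 5, taken from the list above.
top_percent = [33.1, 14.4, 7.2, 6.8, 6.3]
print(ranks_to_reach(top_percent, 50.0))
```

The running totals are 33.1, 47.5 and 54.7, so the 50% mark is first crossed at the third journal.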