IDBSpred: An intrinsically disordered binding site predictor using machine learning and protein language model

Jones, D.; Wu, Y.

2026-03-30 bioinformatics
10.64898/2026.03.27.714773 bioRxiv

Intrinsically disordered proteins (IDPs) mediate many cellular functions through interactions with structured protein partners, but predicting the corresponding binding sites on the structured partner remains challenging. Here, we present IDBSpred, a sequence-based method for residue-level prediction of IDP-binding sites on structured proteins. Training and test data were collected from the DIBS database, which contains more than 700 non-redundant IDP-protein complexes. Residue-level embeddings of the structured partner sequences were generated with the ESM-2 protein language model and used as input to a multilayer perceptron classifier that labels each residue as binding or non-binding. Analysis of amino acid composition showed that IDP-binding sites are enriched in aromatic residues, especially Trp, Tyr, and Phe, as well as several charged and polar residues, whereas Ala and several small or conformationally restrictive residues are depleted. The classifier achieved an ROC AUC of 0.87 and an average precision of 0.61. Structural case studies further showed that the predicted sites largely recapitulate the major experimentally defined binding interfaces. These results demonstrate that protein language model embeddings combined with machine learning algorithms can effectively capture sequence features associated with IDP recognition on structured proteins. IDBSpred provides a practical framework for studying IDP-mediated interfaces and identifying potential therapeutic hotspots.
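
The two metrics quoted in the abstract, ROC AUC and average precision, can be illustrated with a minimal pure-Python sketch. This is not the authors' evaluation code: the function names `roc_auc` and `average_precision` are hypothetical, the labels and scores are invented toy values rather than IDBSpred output, and the average-precision function assumes no tied scores.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive, ranking by descending score.

    Assumes no tied scores; with ties the result depends on sort order.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one positive label")
    return sum(precisions) / len(precisions)


# Toy per-residue data (invented): 1 = binding residue, 0 = non-binding residue.
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.2, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
print(f"ROC AUC = {toy_auc:.3f}, average precision = {toy_ap:.3f}")
```

Because binding residues are usually a minority class, average precision tends to be the more demanding of the two numbers, which is one plausible reason the abstract reports both.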

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1 | Bioinformatics | 1061 papers in training set | Top 3% | 8.2%
2 | Journal of Molecular Biology | 217 papers in training set | Top 0.2% | 6.7%
3 | Briefings in Bioinformatics | 326 papers in training set | Top 0.8% | 6.7%
4 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 1% | 6.2%
5 | PLOS Computational Biology | 1633 papers in training set | Top 6% | 6.2%
6 | Scientific Reports | 3102 papers in training set | Top 20% | 6.2%
7 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 0.9% | 6.2%
8 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 0.9% | 4.7%
--- 50% of probability mass above ---
9 | Advanced Science | 249 papers in training set | Top 4% | 4.2%
10 | Nature Communications | 4913 papers in training set | Top 37% | 3.9%
11 | Protein Science | 221 papers in training set | Top 0.4% | 3.5%
12 | Communications Biology | 886 papers in training set | Top 3% | 3.0%
13 | Proteins: Structure, Function, and Bioinformatics | 82 papers in training set | Top 0.4% | 1.8%
14 | Bioinformatics Advances | 184 papers in training set | Top 3% | 1.7%
15 | PLOS ONE | 4510 papers in training set | Top 55% | 1.7%
16 | Nucleic Acids Research | 1128 papers in training set | Top 11% | 1.7%
17 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 2% | 1.7%
18 | Communications Chemistry | 39 papers in training set | Top 0.4% | 1.5%
19 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.3%
20 | BMC Bioinformatics | 383 papers in training set | Top 6% | 0.9%
21 | ACS Omega | 90 papers in training set | Top 3% | 0.9%
22 | Nature Machine Intelligence | 61 papers in training set | Top 3% | 0.8%
23 | eLife | 5422 papers in training set | Top 56% | 0.8%
24 | Biomolecules | 95 papers in training set | Top 2% | 0.7%
25 | The Journal of Physical Chemistry B | 158 papers in training set | Top 2% | 0.7%
26 | National Science Review | 22 papers in training set | Top 2% | 0.7%
27 | International Journal of Molecular Sciences | 453 papers in training set | Top 16% | 0.7%
28 | Computational Biology and Chemistry | 23 papers in training set | Top 0.6% | 0.7%
29 | International Journal of Biological Macromolecules | 65 papers in training set | Top 4% | 0.7%
30 | Journal of Genetics and Genomics | 36 papers in training set | Top 3% | 0.6%
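
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum over the listed percentages. This is only an illustrative sketch: the helper name `journals_to_reach` is hypothetical, and the displayed percentages are rounded, so the running total is approximate.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative mass) once the running total first reaches threshold.

    `probabilities` must already be sorted from highest to lowest rank.
    Returns (None, total) if the threshold is never reached.
    """
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count, total
    return None, total


# Predicted probabilities (in percent) for the ten top-ranked journals listed above.
top_probs = [8.2, 6.7, 6.7, 6.2, 6.2, 6.2, 6.2, 4.7, 4.2, 3.9]

count, mass = journals_to_reach(top_probs, 50.0)
print(f"{count} journals cover {mass:.1f}% of the predicted probability mass")
```

With these rounded values the first seven journals sum to about 46.4%, and adding the eighth brings the total to about 51.1%, which matches the position of the 50% marker in the list.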