
Improving the Accuracy of Distance-Based Protein-Ligand Binding Affinity Prediction Using Linear Regression and Artificial Neural Network

Yang, Y. X.; Zhu, B. T.

2025-10-08 biophysics
10.1101/2025.10.07.680851 bioRxiv

In traditional scoring functions for protein-ligand binding affinity prediction, the energies of the electrostatic and van der Waals interactions are evaluated (or restricted) by the mathematical expressions [Formula] and [Formula], respectively. In contrast, the power exponents of the distance-based variables adopted in the present study are not restricted to those used in the traditional energy terms for atomic interactions. The distance-based variables were integrated using linear regression and an artificial neural network to predict the protein-ligand binding affinity or binding energy. The linear, neural network and mixed models were trained on the newest data in PDBbind, i.e., PDBbind (v.2024). As measured by Pearson's correlation coefficient (R), the best linear models reach 0.700 < R ≤ 0.800 on the high-quality affinity data, and the best neural network-based mixed models reach 0.800 ≤ R < 0.900 on the same data. The predictive power of the best models developed in this study is superior to that of the sophisticated linear and machine learning-based scoring functions developed before. The results suggest that distance-based variables with appropriate power exponents may improve the accuracy of protein-ligand binding affinity prediction.

Graphical abstract: [Figure 1]

Highlights
By using the newest data in PDBbind (v.2024) to train the linear, neural network and mixed models, the quantitative distance-energy relationships are further explored and improved to predict the binding affinity of protein-ligand complexes.
The power exponents of distance in the traditional energy terms are expanded to characterize the distance-energy relationships accurately at the atom level for protein-ligand interactions.
The best models are superior to the sophisticated machine learning-based scoring functions developed before.
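
The abstract describes summing distance-based variables with freely chosen power exponents, combining them by linear regression, and scoring the fit with Pearson's R. The following is a minimal sketch of that general idea only, not the authors' implementation. The exponent set, the function names, the weights and the synthetic distances are all assumptions made for illustration; no PDBbind data are used and the neural network part is not covered.

```python
import numpy as np

def distance_features(distances, exponents):
    """Return one feature per exponent n: the sum of r**(-n) over all
    atom-pair distances r of a complex (hypothetical feature definition)."""
    r = np.asarray(distances, dtype=float)
    return np.array([np.sum(r ** (-n)) for n in exponents])

def fit_linear(X, y):
    """Ordinary least-squares fit of y ~ X @ w + b; returns (w, b)."""
    A = np.column_stack([X, np.ones(len(X))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:-1], coef[-1]

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Synthetic stand-in data: 200 "complexes", each a bag of pair distances (in Å).
rng = np.random.default_rng(0)
exponents = [1, 2, 4, 6]  # assumed exponent set, not taken from the paper
complexes = [rng.uniform(2.0, 8.0, size=rng.integers(20, 60)) for _ in range(200)]
X = np.array([distance_features(d, exponents) for d in complexes])

# Assumed "true" weights plus noise, used only to generate a target to fit.
true_w = np.array([-0.3, -0.5, -2.0, -5.0])
y = X @ true_w + rng.normal(0.0, 0.1, size=len(X))

w, b = fit_linear(X, y)
r = pearson_r(y, X @ w + b)
print(f"Pearson R on synthetic data: {r:.3f}")
```

Because the target here is generated from the same features that are fitted, the resulting R says nothing about real binding data; the sketch only shows how exponent-specific distance sums enter a linear model and how R is computed from its predictions.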

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling: 207 papers in training set, Top 0.3%, 18.8%
2. Biophysics and Physicobiology: 10 papers in training set, Top 0.1%, 4.3%
3. Briefings in Bioinformatics: 326 papers in training set, Top 1%, 4.2%
4. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 1%, 4.0%
5. The Journal of Physical Chemistry Letters: 58 papers in training set, Top 0.4%, 3.6%
6. Journal of Chemical Theory and Computation: 126 papers in training set, Top 0.3%, 3.6%
7. The Journal of Physical Chemistry B: 158 papers in training set, Top 0.6%, 3.1%
8. Molecules: 37 papers in training set, Top 0.3%, 3.1%
9. Computers in Biology and Medicine: 120 papers in training set, Top 1%, 3.1%
10. eLife: 5422 papers in training set, Top 31%, 2.8%
50% of probability mass above
11. PLOS ONE: 4510 papers in training set, Top 45%, 2.6%
12. Physical Chemistry Chemical Physics: 34 papers in training set, Top 0.2%, 2.1%
13. ACS Omega: 90 papers in training set, Top 1%, 2.1%
14. Journal of Biomolecular Structure and Dynamics: 43 papers in training set, Top 0.6%, 1.8%
15. Computational Biology and Chemistry: 23 papers in training set, Top 0.1%, 1.8%
16. International Journal of Molecular Sciences: 453 papers in training set, Top 7%, 1.7%
17. Scientific Reports: 3102 papers in training set, Top 58%, 1.7%
18. Biochemistry and Biophysics Reports: 28 papers in training set, Top 0.5%, 1.7%
19. Journal of Molecular Graphics and Modelling: 16 papers in training set, Top 0.1%, 1.7%
20. PLOS Computational Biology: 1633 papers in training set, Top 18%, 1.5%
21. Protein Science: 221 papers in training set, Top 1%, 1.3%
22. International Journal of Biological Macromolecules: 65 papers in training set, Top 2%, 1.1%
23. Frontiers in Molecular Biosciences: 100 papers in training set, Top 3%, 1.0%
24. Proteins: Structure, Function, and Bioinformatics: 82 papers in training set, Top 0.7%, 1.0%
25. European Biophysics Journal: 11 papers in training set, Top 0.2%, 0.9%
26. Frontiers in Chemistry: 14 papers in training set, Top 0.3%, 0.8%
27. PeerJ: 261 papers in training set, Top 14%, 0.8%
28. Entropy: 20 papers in training set, Top 0.3%, 0.8%
29. Bioinformatics: 1061 papers in training set, Top 9%, 0.8%
30. Quantitative Biology: 11 papers in training set, Top 0.7%, 0.8%
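
The statement that the top 10 journals account for 50% of the predicted probability mass can be checked from the percentages listed above. The short sketch below accumulates them in rank order and reports the first rank at which the running total reaches the threshold. The function name is hypothetical, and the inputs are the rounded values shown on the page, so the total is approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1 to 30, as listed above (rounded values).
probs = [18.8, 4.3, 4.2, 4.0, 3.6, 3.6, 3.1, 3.1, 3.1, 2.8,
         2.6, 2.1, 2.1, 1.8, 1.8, 1.7, 1.7, 1.7, 1.7, 1.5,
         1.3, 1.1, 1.0, 1.0, 0.9, 0.8, 0.8, 0.8, 0.8, 0.8]

def ranks_to_reach(probabilities, threshold):
    """Return (rank, cumulative total) for the first rank whose running
    total is at least `threshold`, or None if it is never reached."""
    for rank, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return rank, total
    return None

rank, total = ranks_to_reach(probs, 50.0)
print(f"Cumulative mass first reaches 50% at rank {rank} ({total:.1f}%)")
```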