Accurate nucleic acid-binding residue identification based on domain-adaptive protein language model and explainable geometric deep learning

Zeng, W.; Pan, L.; Ji, B.; Xu, L.; Peng, S.

2024-12-16 bioinformatics
10.1101/2024.12.11.628078 bioRxiv
Protein-nucleic acid interactions play a fundamental and critical role in a wide range of life activities. Accurate identification of nucleic acid-binding residues helps to understand the intrinsic mechanisms of the interactions. However, the accuracy and interpretability of existing computational methods for recognizing nucleic acid-binding residues need to be further improved. Here, we propose a novel method called GeSite, based on a domain-adaptive protein language model and an explainable E(3)-equivariant graph convolutional neural network. Prediction results across multiple benchmark test sets demonstrate that GeSite is superior or comparable to state-of-the-art prediction methods. The performance comparison on low-structure-similarity and newly released test proteins demonstrates the robustness and generalization of the method. Detailed experimental results suggest that the advanced performance of GeSite lies in the well-designed nucleic acid-binding protein adaptive language model. Meanwhile, interpretability analysis exposes the perception of the prediction model on various remote and close functional domains, which is the source of its discernment. The data and source code of GeSite are freely accessible at https://github.com/pengsl-lab/GeSite.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank. Journal: papers in training set, percentile, predicted probability

1. Briefings in Bioinformatics: 326 papers in training set, Top 0.1%, 28.9%
2. Journal of Chemical Information and Modeling: 207 papers in training set, Top 0.8%, 6.7%
3. Bioinformatics: 1061 papers in training set, Top 4%, 6.7%
4. PLOS Computational Biology: 1633 papers in training set, Top 9%, 3.7%
5. Quantitative Biology: 11 papers in training set, Top 0.1%, 3.7%
6. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 2%, 3.2%

50% of probability mass above

7. Advanced Science: 249 papers in training set, Top 8%, 2.5%
8. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 3%, 2.5%
9. Science Bulletin: 22 papers in training set, Top 0.2%, 2.2%
10. National Science Review: 22 papers in training set, Top 0.7%, 2.0%
11. Nature Machine Intelligence: 61 papers in training set, Top 2%, 1.7%
12. Communications Biology: 886 papers in training set, Top 10%, 1.6%
13. BMC Bioinformatics: 383 papers in training set, Top 5%, 1.6%
14. Scientific Reports: 3102 papers in training set, Top 63%, 1.4%
15. Journal of Structural Biology: 58 papers in training set, Top 0.9%, 1.4%
16. PLOS ONE: 4510 papers in training set, Top 57%, 1.4%
17. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set, Top 0.3%, 1.3%
18. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 32 papers in training set, Top 0.3%, 1.3%
19. Nucleic Acids Research: 1128 papers in training set, Top 13%, 1.3%
20. Nature Communications: 4913 papers in training set, Top 57%, 1.2%
21. Computers in Biology and Medicine: 120 papers in training set, Top 4%, 0.9%
22. The Innovation: 12 papers in training set, Top 0.7%, 0.9%
23. Journal of Genetics and Genomics: 36 papers in training set, Top 2%, 0.8%
24. Frontiers in Molecular Biosciences: 100 papers in training set, Top 4%, 0.8%
25. Journal of Chemical Theory and Computation: 126 papers in training set, Top 0.8%, 0.8%
26. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 2%, 0.8%
27. Frontiers in Genetics: 197 papers in training set, Top 11%, 0.7%
28. eLife: 5422 papers in training set, Top 60%, 0.7%
29. The Journal of Physical Chemistry B: 158 papers in training set, Top 2%, 0.7%
30. Science China Life Sciences: 26 papers in training set, Top 3%, 0.5%
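
The "top 6 journals account for 50%" statement can be checked from the listed probabilities. The sketch below is only an illustration: it assumes the displayed percentages (which are rounded) are close enough to the underlying values, and the helper name `ranks_to_reach` is hypothetical, not part of the page or of GeSite.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the six top-ranked journals,
# copied from the list above. These are rounded display values.
top_probs = [28.9, 6.7, 6.7, 3.7, 3.7, 3.2]


def ranks_to_reach(probs, threshold=50.0):
    """Return the smallest number of leading entries whose running sum
    reaches `threshold`, or None if the threshold is never reached."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank
    return None


print(ranks_to_reach(top_probs))  # prints 6
```

With these values the first five journals sum to about 49.7% and the first six to about 52.9%, so six is the smallest rank at which the cumulative mass reaches 50%.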