Accurate nucleic acid-binding residue identification based on domain-adaptive protein language model and explainable geometric deep learning

Zeng, W.; Pan, L.; Ji, B.; Xu, L.; Peng, S.

2024-12-16 bioinformatics

10.1101/2024.12.11.628078 bioRxiv

Protein-nucleic acid interactions play a fundamental role in a wide range of life activities. Accurate identification of nucleic acid-binding residues helps to understand the intrinsic mechanisms of these interactions. However, the accuracy and interpretability of existing computational methods for recognizing nucleic acid-binding residues need further improvement. Here, we propose a novel method called GeSite, based on a domain-adaptive protein language model and an explainable E(3)-equivariant graph convolutional neural network. Prediction results across multiple benchmark test sets demonstrate that GeSite is superior or comparable to state-of-the-art prediction methods. The performance comparison on test proteins with low structural similarity and on newly released test proteins demonstrates the robustness and generalization of the method. Detailed experimental results suggest that the strong performance of GeSite stems from the well-designed language model adapted to nucleic acid-binding proteins. Meanwhile, interpretability analysis reveals that the prediction model perceives various remote and close functional domains, which is the source of its discriminative ability. The data and source code of GeSite are freely accessible at https://github.com/pengsl-lab/GeSite.

Matching journals

● Non-profit ◐ University press ○ Commercial

The top 6 journals account for 50% of the predicted probability mass.

Briefings in Bioinformatics

◐ 326 papers in training set

Journal of Chemical Information and Modeling

● 207 papers in training set

◐ 1061 papers in training set

PLOS Computational Biology

● 1633 papers in training set

Quantitative Biology

○ 11 papers in training set

Genomics, Proteomics & Bioinformatics

◐ 171 papers in training set

50% of probability mass above

Advanced Science

○ 249 papers in training set

Computational and Structural Biotechnology Journal

● 216 papers in training set

Science Bulletin

○ 22 papers in training set

National Science Review

◐ 22 papers in training set

Nature Machine Intelligence

○ 61 papers in training set

Communications Biology

○ 886 papers in training set

BMC Bioinformatics

○ 383 papers in training set

Scientific Reports

○ 3102 papers in training set

Journal of Structural Biology

○ 58 papers in training set

● 4510 papers in training set

IEEE Transactions on Computational Biology and Bioinformatics

● 17 papers in training set

IEEE/ACM Transactions on Computational Biology and Bioinformatics

● 32 papers in training set

Nucleic Acids Research

◐ 1128 papers in training set

Nature Communications

○ 4913 papers in training set

Computers in Biology and Medicine

○ 120 papers in training set

○ 12 papers in training set

Journal of Genetics and Genomics

○ 36 papers in training set

Frontiers in Molecular Biosciences

○ 100 papers in training set

Journal of Chemical Theory and Computation

● 126 papers in training set

IEEE Journal of Biomedical and Health Informatics

● 34 papers in training set

Frontiers in Genetics

○ 197 papers in training set

● 5422 papers in training set

The Journal of Physical Chemistry B

● 158 papers in training set

Science China Life Sciences

○ 26 papers in training set