HERCULES: an integrative deep-learning framework for predicting RNA-binding propensity and mutation effects at single-residue resolution

Fiorentino, J.; Monti, M.; Armaos, A.; Vrachnos, D. M.; Di Rienzo, L.; Tartaglia, G. G.

2026-03-18 biochemistry
10.64898/2026.03.17.712455 bioRxiv
RNA-binding proteins (RBPs) regulate essential aspects of RNA metabolism, yet accurately identifying RNA-binding domains (RBDs) and quantifying the impact of sequence variation on RNA-binding ability remain challenging. Here, we present HERCULES (Hybrid framEwoRk for RNA-binding domain loCalization and mUtation anaLysis using physicochemical and languagE modelS), a unified sequence-based framework for simultaneous RBD localization, global RNA-binding propensity prediction and mutation effect assessment. HERCULES integrates a fine-tuned protein language model with an explicit residue-level physicochemical module, combining global contextual representations with local mutation-sensitive descriptors. On an independent test set, the HERCULES global score discriminates RBPs from non-RBPs with an AUROC of 0.86. At residue resolution, HERCULES outperforms state-of-the-art sequence-based predictors in identifying canonical, non-canonical and putative RBDs across Pfam-annotated proteins. Using a curated dataset of experimentally validated RNA-binding-disrupting mutations, HERCULES correctly classifies 87% of deleterious variants, including single-amino-acid substitutions. Evaluation on experimentally resolved protein-RNA complexes further demonstrates robust residue-level performance and improved generalization when contact annotations are augmented with AlphaFold3-predicted complexes. By unifying domain localization and mutation sensitivity within a single sequence-only framework, HERCULES provides a mechanistically interpretable approach for studying RNA-protein interactions. HERCULES is freely available at https://tools.tartaglialab.com/hercules and as an open-source Python package at https://github.com/tartaglialabIIT/hercules.git.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Nucleic Acids Research: 1128 papers in training set; Top 0.1%; 40.6%
2. Nature Communications: 4913 papers in training set; Top 9%; 15.2%
50% of probability mass above
3. Nature Methods: 336 papers in training set; Top 2%; 4.1%
4. Bioinformatics: 1061 papers in training set; Top 5%; 4.1%
5. Cell Systems: 167 papers in training set; Top 4%; 3.0%
6. PLOS Computational Biology: 1633 papers in training set; Top 12%; 2.8%
7. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 24%; 2.7%
8. Nature Biotechnology: 147 papers in training set; Top 3%; 2.7%
9. Briefings in Bioinformatics: 326 papers in training set; Top 3%; 2.1%
10. Genome Biology: 555 papers in training set; Top 4%; 1.9%
11. Cell Reports Methods: 141 papers in training set; Top 2%; 1.7%
12. Communications Biology: 886 papers in training set; Top 12%; 1.4%
13. Nature Machine Intelligence: 61 papers in training set; Top 2%; 1.3%
14. Advanced Science: 249 papers in training set; Top 14%; 1.3%
15. PLOS ONE: 4510 papers in training set; Top 61%; 1.1%
16. Nature: 575 papers in training set; Top 13%; 1.0%
17. Journal of Molecular Biology: 217 papers in training set; Top 3%; 0.9%
18. Bioinformatics Advances: 184 papers in training set; Top 4%; 0.8%
19. Scientific Reports: 3102 papers in training set; Top 74%; 0.8%
20. Molecular Cell: 308 papers in training set; Top 10%; 0.8%
21. Journal of Chemical Information and Modeling: 207 papers in training set; Top 3%; 0.8%
22. Science: 429 papers in training set; Top 20%; 0.7%
23. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 12%; 0.5%
24. Cell Reports: 1338 papers in training set; Top 36%; 0.5%
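
The "top 2 journals account for 50% of the predicted probability mass" statement can be checked with a running total over the ranked probabilities. The following is a minimal sketch, not the site's own code: the function name journals_to_reach and the variable names are hypothetical, and only the first five probabilities from the list above are copied in, which is enough to cross the 50% mark.

```python
from itertools import accumulate

# Predicted probabilities (%) for the five top-ranked journals, copied from the list above.
probabilities = [
    ("Nucleic Acids Research", 40.6),
    ("Nature Communications", 15.2),
    ("Nature Methods", 4.1),
    ("Bioinformatics", 4.1),
    ("Cell Systems", 3.0),
]


def journals_to_reach(ranked, threshold):
    """Return how many top-ranked journals are needed for the running
    total of probabilities to reach `threshold`, or None if it never does."""
    running_totals = accumulate(p for _, p in ranked)
    for count, total in enumerate(running_totals, start=1):
        if total >= threshold:
            return count
    return None


journals_needed = journals_to_reach(probabilities, 50.0)
print(journals_needed)  # 2
```

The first journal alone contributes 40.6%, below the threshold; adding the second brings the running total past 50%, which is why the "50% of probability mass above" marker sits after rank 2.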