Joint enzyme-reaction retrieval and catalytic optima prediction via multimodal fusion

Cai, Y.; Yang, F.; Liu, J.

2026-05-21 bioinformatics
10.64898/2026.05.19.726405 bioRxiv
Motivation: Enzyme-reaction retrieval is increasingly used to prioritize candidate biocatalysts for experimental follow-up, where useful recommendations should indicate not only whether an enzyme can catalyze a target reaction but also under which pH and temperature conditions it should be tested. Existing retrieval models optimize catalytic matching scores, whereas catalytic optima predictors are typically developed as enzyme-level regressors because public pH and temperature annotations are sparse and often available only at the enzyme or EC-associated record level. This separation leaves a practical gap: high-ranking enzyme-reaction pairs are not evaluated for condition suitability, and enzyme-level optima predictions do not use the reaction context being retrieved.

Results: We present GERO, a multimodal fusion framework that uses feature-gated cross-modal fusion to integrate global enzyme sequence semantics, sequence-derived pocket geometry, and molecular reaction representations for condition-aware enzyme-reaction retrieval and catalytic optima estimation with reaction context. To evaluate this setting, we define the tolerance-restricted hit rate (Hit@k-TR), which requires both top-k retrieval of the correct candidate and condition prediction within predefined tolerances. Across enzyme- and reaction-similarity splits, GERO improves Hit@k-TR over two-stage retrieval-then-prediction baselines. Representative benchmark examples and an iodinin biosynthesis case study further illustrate GERO's ability to provide candidate rankings together with plausible assay-condition estimates for downstream experimental prioritization.

Availability and implementation: Source code is available at https://github.com/ykxhs/GERO.

Contact: liujuan@whu.edu.cn

Supplementary information: Supplementary data are available at XXXX online.
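The Hit@k-TR definition in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Query` record, the function name `hit_at_k_tr`, and the default tolerances (1.0 pH unit, 10 degrees for temperature) are hypothetical choices made only for illustration, and the sketch assumes the condition prediction is compared against the annotated optima of the correct candidate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Query:
    """One retrieval query (hypothetical record layout, for illustration only)."""
    ranked_ids: Sequence[str]  # candidate IDs, best-scoring first
    true_id: str               # ID of the correct candidate
    pred_ph: float             # predicted optimal pH
    true_ph: float             # annotated optimal pH
    pred_temp: float           # predicted optimal temperature
    true_temp: float           # annotated optimal temperature


def hit_at_k_tr(queries: Sequence[Query], k: int,
                ph_tol: float = 1.0, temp_tol: float = 10.0) -> float:
    """Fraction of queries that are a top-k hit AND within both tolerances.

    The tolerance defaults are assumptions, not values from the paper.
    """
    if not queries:
        return 0.0
    hits = 0
    for q in queries:
        # Condition 1: the correct candidate appears among the top k.
        retrieved = q.true_id in list(q.ranked_ids)[:k]
        # Condition 2: both predicted conditions fall within tolerance.
        within = (abs(q.pred_ph - q.true_ph) <= ph_tol
                  and abs(q.pred_temp - q.true_temp) <= temp_tol)
        if retrieved and within:
            hits += 1
    return hits / len(queries)
```

Under this reading, a query counts only when both conditions hold, so Hit@k-TR can never exceed the plain Hit@k computed on the same rankings.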

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Bioinformatics | 1061 papers in training set | Top 0.7% | 28.7%
2. Nature Communications | 4913 papers in training set | Top 5% | 19.3%
3. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 0.9% | 4.5%

50% of probability mass above

4. Briefings in Bioinformatics | 326 papers in training set | Top 1% | 4.5%
5. Journal of Chemical Information and Modeling | 207 papers in training set | Top 1% | 3.8%
6. Advanced Science | 249 papers in training set | Top 6% | 3.2%
7. ACS Synthetic Biology | 256 papers in training set | Top 1% | 3.2%
8. PLOS Computational Biology | 1633 papers in training set | Top 11% | 3.2%
9. Nucleic Acids Research | 1128 papers in training set | Top 7% | 2.8%
10. Journal of Cheminformatics | 25 papers in training set | Top 0.3% | 1.7%
11. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 35% | 1.5%
12. Cell Reports Methods | 141 papers in training set | Top 3% | 1.4%
13. Nature Machine Intelligence | 61 papers in training set | Top 2% | 1.3%
14. Bioinformatics Advances | 184 papers in training set | Top 4% | 1.0%
15. Patterns | 70 papers in training set | Top 2% | 0.8%
16. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 5% | 0.8%
17. Plant Communications | 35 papers in training set | Top 1% | 0.8%
18. Communications Biology | 886 papers in training set | Top 22% | 0.8%
19. Cell Systems | 167 papers in training set | Top 12% | 0.8%
20. BMC Bioinformatics | 383 papers in training set | Top 7% | 0.7%
21. Scientific Reports | 3102 papers in training set | Top 77% | 0.7%
22. Chemical Science | 71 papers in training set | Top 2% | 0.7%
23. Communications Chemistry | 39 papers in training set | Top 1% | 0.7%