
Methods for Molecular Recognition Computing

Reddy, S. T.

2026-04-03 synthetic biology
10.64898/2026.04.03.716328 bioRxiv

The softmax attention mechanism in transformer architectures (Vaswani et al., 2017) is mathematically identical to the Boltzmann distribution governing molecular binding at thermal equilibrium (Boltzmann, 1877). Luce's choice axiom (Luce, 1959) establishes this function - which we term the convergence equation - as the unique function satisfying five axioms of competitive selection: positivity, normalization, unrestricted domain, rank preservation, and independence of irrelevant alternatives. We show that five additional architecture conditions - discrete intermolecular contacts, bilinear energy decomposition, finite competitor pools, thermal equilibrium, and stochastic selection - are satisfied by at least ten biological molecular recognition systems and together prescribe a complete neural architecture: dual encoders, cross-attention, InfoNCE contrastive training, symmetric loss, learned temperature, and cross-attentive decoder. We term this architecture a Specificity Foundation Model (SFM) and specify it for antibody-antigen, TCR-peptide-MHC, transcription factor-DNA, microRNA-mRNA, enzyme-substrate, CRISPR guide RNA-DNA, drug-target, peptide-MHC, receptor-ligand, and RNA-binding protein-RNA recognition. The first implementation (CALM; Lee et al., 2026) achieves antibody-antigen retrieval from approximately 4,000 training pairs with ~100,000-fold greater data efficiency than comparable contrastive architectures trained without the physics derivation. We classify this as Level 3 architecture-physics alignment and derive three further theoretical results: an exponential scaling law for retrieval accuracy as a function of training data diversity (the MRC scaling law), a two-parameter affinity calibration framework connecting contrastive scores to binding free energies, and a hybrid recursive learning framework for cross-modal reinforcement learning with orthogonal verification. The failure conditions of the framework are analyzed in terms of the validity of equilibrium thermodynamics for molecular binding and the convergence properties of gradient-based parameter estimation.
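The softmax-Boltzmann identity the abstract asserts can be written out directly. The notation below is standard (not taken from the paper itself): attention scores play the role of negative binding energies measured in units of thermal energy.

```latex
% Softmax attention weight over scores s_j (e.g. s_j = q \cdot k_j / \sqrt{d}):
\alpha_i = \frac{\exp(s_i)}{\sum_j \exp(s_j)}

% Boltzmann occupancy of binding state i with energy E_i at temperature T:
p_i = \frac{\exp(-E_i / k_B T)}{\sum_j \exp(-E_j / k_B T)}

% The two distributions coincide under the identification
s_i = -\frac{E_i}{k_B T},
% i.e. an attention score is a negative binding energy in units of k_B T;
% the learned softmax temperature absorbs the physical temperature.
```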
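The training objective the abstract prescribes (dual encoders, symmetric InfoNCE, learned temperature) can be sketched as follows. This is an illustrative reconstruction under the standard contrastive-learning formulation, not code from the CALM implementation; the function name, array shapes, and the NumPy framing are assumptions for the sketch.

```python
import numpy as np

def symmetric_infonce(za, zb, log_tau):
    """Symmetric InfoNCE loss with a learned temperature.

    Row i of `za` and `zb` is a matched binder pair (e.g. antibody and
    antigen embeddings from the two encoders). The softmax over similarity
    scores at temperature tau mirrors the Boltzmann distribution
    p_i = exp(-E_i / kT) / Z under the identification E_ij = -za_i . zb_j.
    """
    tau = np.exp(log_tau)              # parameterize so tau stays positive
    scores = za @ zb.T / tau           # bilinear energy decomposition

    def xent(s):
        # cross-entropy of the row-wise softmax against the diagonal targets
        s = s - s.max(axis=1, keepdims=True)   # numerical stability
        logp = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))         # matched pairs on the diagonal

    # symmetric loss: retrieve B given A and A given B
    return 0.5 * (xent(scores) + xent(scores.T))
```

A matched batch (paired rows aligned) should score a lower loss than the same batch with one side permuted, which is what the contrastive objective optimizes for.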

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Nature Communications | 4913 | Top 3% | 22.5%
2 | Cell Systems | 167 | Top 1% | 10.1%
3 | Science | 429 | Top 4% | 8.2%
4 | Proceedings of the National Academy of Sciences | 2130 | Top 11% | 6.3%
5 | Nature Machine Intelligence | 61 | Top 0.6% | 4.9%
6 | eLife | 5422 | Top 20% | 4.3%
7 | Nature | 575 | Top 7% | 3.6%
8 | PLOS Computational Biology | 1633 | Top 10% | 3.6%
9 | Nature Computational Science | 50 | Top 0.1% | 3.6%
10 | Science Advances | 1098 | Top 8% | 3.1%
11 | Communications Physics | 12 | Top 0.1% | 2.7%
12 | ACS Synthetic Biology | 256 | Top 1% | 2.7%
13 | Bioinformatics | 1061 | Top 7% | 1.8%
14 | Communications Biology | 886 | Top 7% | 1.8%
15 | Journal of The Royal Society Interface | 189 | Top 2% | 1.8%
16 | Nature Methods | 336 | Top 5% | 1.3%
17 | Nature Neuroscience | 216 | Top 5% | 1.2%
18 | Neuron | 282 | Top 7% | 1.2%
19 | Scientific Reports | 3102 | Top 71% | 0.9%
20 | Cell | 370 | Top 16% | 0.8%
21 | Philosophical Transactions of the Royal Society B | 51 | Top 6% | 0.6%
22 | Advanced Science | 249 | Top 21% | 0.6%