
Methods for Molecular Recognition Computing

Reddy, S. T.

2026-04-03 synthetic biology
10.64898/2026.04.03.716328 bioRxiv

The softmax attention mechanism in transformer architectures (Vaswani et al., 2017) is mathematically identical to the Boltzmann distribution governing molecular binding at thermal equilibrium (Boltzmann, 1877). Luce's choice axiom (Luce, 1959) establishes this function - which we term the convergence equation - as the unique function satisfying five axioms of competitive selection: positivity, normalization, unrestricted domain, rank preservation, and independence of irrelevant alternatives. We show that five additional architecture conditions - discrete intermolecular contacts, bilinear energy decomposition, finite competitor pools, thermal equilibrium, and stochastic selection - are satisfied by at least ten biological molecular recognition systems and together prescribe a complete neural architecture: dual encoders, cross-attention, InfoNCE contrastive training, symmetric loss, learned temperature, and cross-attentive decoder. We term this architecture a Specificity Foundation Model (SFM) and specify it for antibody-antigen, TCR-peptide-MHC, transcription factor-DNA, microRNA-mRNA, enzyme-substrate, CRISPR guide RNA-DNA, drug-target, peptide-MHC, receptor-ligand, and RNA-binding protein-RNA recognition. The first implementation (CALM; Lee et al., 2026) achieves antibody-antigen retrieval from approximately 4,000 training pairs with ~100,000-fold greater data efficiency than comparable contrastive architectures trained without the physics derivation. We classify this as Level 3 architecture-physics alignment and derive three further theoretical results: an exponential scaling law for retrieval accuracy as a function of training data diversity (the MRC scaling law), a two-parameter affinity calibration framework connecting contrastive scores to binding free energies, and a hybrid recursive learning framework for cross-modal reinforcement learning with orthogonal verification. The failure conditions of the framework are analyzed in terms of the validity of equilibrium thermodynamics for molecular binding and the convergence properties of gradient-based parameter estimation.
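The softmax-Boltzmann identity the abstract asserts can be written out directly. The notation below is standard (not taken from the paper itself): attention scores play the role of negative binding energies measured in units of thermal energy.

```latex
% Softmax attention weight over scores s_j (e.g. s_j = q \cdot k_j / \sqrt{d}):
\alpha_i = \frac{\exp(s_i)}{\sum_j \exp(s_j)}

% Boltzmann occupancy of binding state i with energy E_i at temperature T:
p_i = \frac{\exp(-E_i / k_B T)}{\sum_j \exp(-E_j / k_B T)}

% The two distributions coincide under the identification
s_i = -\frac{E_i}{k_B T},
% i.e. an attention score is a negative binding energy in units of k_B T;
% the learned softmax temperature absorbs the physical temperature.
```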
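The training objective the abstract prescribes (dual encoders, symmetric InfoNCE, learned temperature) can be sketched as follows. This is an illustrative reconstruction under the standard contrastive-learning formulation, not code from the CALM implementation; the function name, array shapes, and the NumPy framing are assumptions for the sketch.

```python
import numpy as np

def symmetric_infonce(za, zb, log_tau):
    """Symmetric InfoNCE loss with a learned temperature.

    Row i of `za` and `zb` is a matched binder pair (e.g. antibody and
    antigen embeddings from the two encoders). The softmax over similarity
    scores at temperature tau mirrors the Boltzmann distribution
    p_i = exp(-E_i / kT) / Z under the identification E_ij = -za_i . zb_j.
    """
    tau = np.exp(log_tau)              # parameterize so tau stays positive
    scores = za @ zb.T / tau           # bilinear energy decomposition

    def xent(s):
        # cross-entropy of the row-wise softmax against the diagonal targets
        s = s - s.max(axis=1, keepdims=True)   # numerical stability
        logp = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))         # matched pairs on the diagonal

    # symmetric loss: retrieve B given A and A given B
    return 0.5 * (xent(scores) + xent(scores.T))
```

A matched batch (paired rows aligned) should score a lower loss than the same batch with one side permuted, which is what the contrastive objective optimizes for.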

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Nature Communications | 4913 | Top 3% | 22.5%
2 | Cell Systems | 167 | Top 1% | 10.1%
3 | Science | 429 | Top 4% | 8.2%
4 | Proceedings of the National Academy of Sciences | 2130 | Top 11% | 6.3%
5 | Nature Machine Intelligence | 61 | Top 0.6% | 4.9%
6 | eLife | 5422 | Top 20% | 4.3%
7 | Nature | 575 | Top 7% | 3.6%
8 | PLOS Computational Biology | 1633 | Top 10% | 3.6%
9 | Nature Computational Science | 50 | Top 0.1% | 3.6%
10 | Science Advances | 1098 | Top 8% | 3.1%
11 | Communications Physics | 12 | Top 0.1% | 2.7%
12 | ACS Synthetic Biology | 256 | Top 1% | 2.7%
13 | Bioinformatics | 1061 | Top 7% | 1.8%
14 | Communications Biology | 886 | Top 7% | 1.8%
15 | Journal of The Royal Society Interface | 189 | Top 2% | 1.8%
16 | Nature Methods | 336 | Top 5% | 1.3%
17 | Nature Neuroscience | 216 | Top 5% | 1.2%
18 | Neuron | 282 | Top 7% | 1.2%
19 | Scientific Reports | 3102 | Top 71% | 0.9%
20 | Cell | 370 | Top 16% | 0.8%
21 | Philosophical Transactions of the Royal Society B | 51 | Top 6% | 0.6%
22 | Advanced Science | 249 | Top 21% | 0.6%