Entropy

MDPI AG

Preprints posted in the last 30 days, ranked by how well they match Entropy's content profile, based on 20 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Spectral requirements for cooperation

Pachter, L.

2026-04-09 evolutionary biology 10.64898/2026.04.07.716994 medRxiv
Top 0.1%
3.6%

We introduce a spectral existence criterion for the evolution of cooperation in the form of the inequality $\lambda_{\max} b > c$, where $\lambda_{\max}$ is the leading eigenvalue of an interaction operator encoding population structure, and $b$ and $c$ represent benefit and cost tradeoffs, respectively. Nowak's five rules for the evolution of cooperation correspond to cases in which the cooperation condition reduces to a scalar assortment coefficient. These results follow from the Price equation, which sheds light on a long-standing debate on the role of inclusive fitness and evolutionary dynamics in explaining the evolution of cooperation.
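As a rough illustration of how the inequality above could be checked numerically, the sketch below estimates the leading eigenvalue of a small non-negative matrix by power iteration and tests whether it times b exceeds c. The matrix, the values of b and c, and the function names are illustrative assumptions; the abstract does not specify how the interaction operator is constructed.

```python
def leading_eigenvalue(matrix, iterations=1000, tol=1e-12):
    """Estimate the leading eigenvalue of a non-negative square matrix.

    Uses power iteration with L1 normalisation, which suits matrices with a
    positive Perron eigenvector (an assumption made for this sketch).
    """
    n = len(matrix)
    vector = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in product)
        if norm == 0.0:
            return 0.0
        converged = abs(norm - eigenvalue) < tol
        eigenvalue = norm
        vector = [x / norm for x in product]
        if converged:
            break
    return eigenvalue


def cooperation_favored(matrix, benefit, cost):
    """Return True when lambda_max * benefit > cost for the given operator."""
    return leading_eigenvalue(matrix) * benefit > cost


# Hypothetical 2x2 interaction operator, chosen only for illustration.
toy_operator = [[0.6, 0.2],
                [0.2, 0.2]]
print(leading_eigenvalue(toy_operator))
print(cooperation_favored(toy_operator, benefit=3.0, cost=2.0))
```

Power iteration is used here only because it needs no external libraries; any eigenvalue solver would serve equally well.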

2
Comparing Random and Natural RNA Boltzmann Ensembles

Khan, H.; Garcia-Galindo, P.; Ahnert, S. E.; Dingle, K.

2026-04-01 biophysics 10.64898/2026.03.31.715513 medRxiv
Top 0.2%
1.3%

A morphospace is an abstract space of theoretically possible biological traits, shapes, or property values. It is interesting to explore which parts of a morphospace life occupies, as compared to those parts which could be occupied, but are not. Comparing random and natural non-coding (nc) RNA secondary structures is an established approach to studying morphospace occupation for RNA structures. Most earlier studies have focused on the minimum free energy (MFE) structure, while relatively few have looked at the Boltzmann distribution, describing the ensemble of energetically suboptimal RNA folds. These suboptimal structures may have important roles and functions, and hence should be examined carefully. Here we compare random and natural ncRNA in terms of their Boltzmann distributions, finding that natural RNA tend to have very similar profiles to random RNA, with the main difference being that natural RNA are slightly more energetically stable, except for very short sequences (20 to 30 nucleotides) which tend to be slightly less stable. We infer that natural ncRNA occupy similar parts of the morphospace that random RNA do, indicating that the biophysics of the genotype-phenotype map largely determines the ensemble properties of ncRNA.

3
Residue burial encodes a protein's fold

Grigas, A. T.; Sumner, J.; O'Hern, C. S.

2026-03-31 biophysics 10.64898/2026.03.28.714986 medRxiv
Top 0.3%
1.0%

Protein structure is controlled by a high-dimensional energy landscape, which is a function of all of the atomic coordinates of the protein. Can this landscape be accurately described by a low-dimensional representation? We find that residue core identity, a binary N-dimensional encoding indicating whether each of the N amino acids in a protein is buried in the core or not, can predict the protein's backbone conformation more efficiently than all other representations that we tested. Core identity is 4 times more efficient than previous estimates of the bits per residue needed to encode a protein's native fold, 2 times more efficient than the Cα contact map, and 1.5 times more efficient than the machine-learned embeddings from FoldSeek's 3Di. Even when the folded structure is unavailable, predicting each residue's burial from sequence yields a more accurate estimate of fold quality than predicting pairwise contacts from the same sequence information. Thus, this work emphasizes that the problem of determining a protein's native fold can be re-framed as predicting each residue's core identity.

4
An abstract model of nonrandom, non-Lamarckian mutation in evolution using a multivariate estimation-of-distribution algorithm

Vasylenko, L.; Livnat, A.

2026-04-01 evolutionary biology 10.64898/2026.03.30.715341 medRxiv
Top 0.3%
0.9%

At the fundamental conceptual level, two alternatives have traditionally been considered for how mutations arise and how evolution happens: 1) random mutation and natural selection, and 2) Lamarckism. Recently, the theory of Interaction-based Evolution (IBE) has been proposed, according to which mutations are neither random nor Lamarckian, but are influenced by information accumulating internally in the genome over generations. Based on the estimation-of-distribution algorithms framework, we present a simulation model that demonstrates nonrandom, non-Lamarckian mutation concretely while capturing indirectly several aspects of IBE: selection, recombination, and nonrandom, non-Lamarckian mutation interact in a complementary fashion; evolution is driven by the interaction of parsimony and fit; and random bits do not directly encode improvement but enable generalization by the manner in which they connect with the rest of the evolutionary process. Connections are drawn to Darwin's observations that changed conditions increase the rate of production of heritable variation; to the causes of bell-shaped distributions of traits and how these distributions respond to selection; and to computational learning theory, where analogizing evolution to learning in accord with IBE casts individuals as examples and places the learned hypothesis at the population level. The model highlights the importance of incorporating internal integration of information through heritable change in both evolutionary theory and evolutionary computation.

5
Analysis of biological networks using Krylov subspace trajectories

Frost, H. R.

2026-03-31 bioinformatics 10.64898/2026.03.29.715092 medRxiv
Top 0.3%
0.8%

We describe an approach for analyzing biological networks using rows of the Krylov subspace of the adjacency matrix. Specifically, we explore the scenario where the Krylov subspace matrix is computed via power iteration using a non-random and potentially non-uniform initial vector that captures a specific biological state or perturbation. In this case, the rows of the Krylov subspace matrix (i.e., Krylov trajectories) carry important functional information about the network nodes in the biological context represented by the initial vector. We demonstrate the utility of this approach for community detection and perturbation analysis using the C. elegans neural network.
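To make the idea of Krylov trajectories concrete, here is a minimal sketch that builds the matrix whose columns are the successive power-iteration vectors and returns its rows, one per node. The per-step Euclidean normalisation, the toy path graph, and the function name are assumptions for illustration; the abstract does not state these details.

```python
def krylov_trajectories(adjacency, start, steps):
    """Return one trajectory per node: row i holds node i's value at each step.

    Columns are v, Av, A^2 v, ... with each vector rescaled to unit Euclidean
    length before it is stored (an assumption made for this sketch).
    """
    n = len(adjacency)
    columns = []
    vector = list(start)
    for _ in range(steps):
        norm = sum(x * x for x in vector) ** 0.5
        if norm == 0.0:
            raise ValueError("iteration reached the zero vector")
        vector = [x / norm for x in vector]
        columns.append(vector)
        vector = [sum(adjacency[i][j] * vector[j] for j in range(n)) for i in range(n)]
    # Transpose so that each row is the trajectory of a single node.
    return [[columns[k][i] for k in range(steps)] for i in range(n)]


# Hypothetical 3-node path graph 0 - 1 - 2, perturbed at node 0.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
for node, trajectory in enumerate(krylov_trajectories(path_graph, [1.0, 0.0, 0.0], 3)):
    print(node, trajectory)
```

Rows with similar trajectories respond similarly to the chosen initial vector, which is the kind of signal a community detection or perturbation analysis could build on.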

6
Postsynaptic integration of excitatory and inhibitory signals based on an adaptive firing threshold

Gambrell, O.; Singh, A.

2026-03-26 neuroscience 10.64898/2026.03.26.714497 medRxiv
Top 0.3%
0.8%

A key component of intraneuronal communication is the modulation of postsynaptic firing frequencies by stochastic transmitter release from presynaptic neurons. The time interval between successive postsynaptic firings is called the inter-spike interval (ISI), and understanding its statistics is integral to neural information processing. We start with a model of an excitatory chemical synapse with postsynaptic neuron firing governed by a classical integrate-and-fire model. Using a first-passage time framework, we derive exact analytical results for the ISI statistical moments, revealing parameter regimes driving precision in postsynaptic action potential timing. Next, we extend this analysis to include both an excitatory and an inhibitory presynaptic connection onto the same postsynaptic neuron. We consider both a fixed postsynaptic-firing threshold and a threshold that adapts based on the postsynaptic membrane potential history. Our analysis shows that the latter adaptive threshold can result in scenarios where increasing the inhibitory input frequency increases the postsynaptic firing frequency. Moreover, we characterize parameter regimes where ISI noise is hypoexponential or hyperexponential based on its coefficient of variation being less than or greater than one, respectively.
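The hypoexponential versus hyperexponential distinction above can be sketched for empirical data as follows. This estimates the coefficient of variation from a sample of inter-spike intervals, whereas the abstract derives the moments analytically; the function names and the example intervals are assumptions.

```python
from statistics import mean, pstdev


def coefficient_of_variation(intervals):
    """Return standard deviation divided by mean for a sample of ISIs."""
    if len(intervals) < 2:
        raise ValueError("need at least two intervals")
    average = mean(intervals)
    if average <= 0:
        raise ValueError("mean interval must be positive")
    return pstdev(intervals) / average


def classify_isi_noise(intervals):
    """Label ISI noise by comparing the coefficient of variation with one."""
    cv = coefficient_of_variation(intervals)
    if cv < 1.0:
        return "hypoexponential"
    if cv > 1.0:
        return "hyperexponential"
    return "exponential-like"


# Made-up interval samples: one very regular, one bursty.
print(classify_isi_noise([1.0, 1.0, 1.0, 1.0]))
print(classify_isi_noise([0.1, 0.1, 0.1, 3.7]))
```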

7
Combinatorial constraints predict that mitochondrial networks contain a large component

Mostov, R.; Lewis, G. R.; Das, M.; Marshall, W. F.

2026-03-27 systems biology 10.64898/2026.03.25.714309 medRxiv
Top 0.4%
0.7%

Mitochondria often form branching membrane networks distributed throughout the cell interior. In many, though not all, cell types, these networks are observed to consist of one large connected component together with many smaller fragments. Why does this pattern arise? Does it reflect a specific biological function, an external biophysical constraint, or something simpler? Using results from extremal graph theory, we prove a new theorem which suggests that, under a sufficiently broad sampling of the space of mitochondria-like graphs, the predominance of three-way junctions makes the appearance of a large component likely. This suggests that, in some settings, a large component may serve as a useful null model for mitochondrial network structure rather than requiring a dedicated explanation. More broadly, our result points towards testable predictions, since systematic deviations from this baseline may help reveal additional constraints or mechanisms shaping mitochondrial morphology.

8
The low-field effect in radical pairs: a zero-field singlet-triplet basis picture

Woodward, J. R.

2026-04-08 biophysics 10.64898/2026.04.05.716627 medRxiv
Top 0.5%
0.7%

We present a new formulation of the low-field effect (LFE) in spin-correlated radical pairs based on a zero-field singlet-triplet basis for the isotropic spin Hamiltonian. The aim is to provide a description that is both formally rigorous and mechanistically transparent, especially in the regime of weak magnetic fields such as the geomagnetic field. For the standard model radical pair containing a single spin [Formula] nucleus, we show that the conventional singlet-triplet basis obscures the distinct dynamical roles of the hyperfine and Zeeman interactions. In the zero-field S-T basis, by contrast, the mechanism separates cleanly: isotropic hyperfine coupling mixes singlet-doublet and triplet-doublet states, whereas the weak-field Zeeman interaction mixes triplet-quartet and triplet-doublet states without directly introducing an additional singlet-triplet coupling. The LFE is therefore revealed as a sequential process in which a weak field unlocks access from a triplet-only manifold to a singlet-accessible triplet manifold, from which hyperfine-driven singlet-triplet interconversion can occur. We then generalize this picture to radical pairs with arbitrary isotropic hyperfine structures by identifying maximal, interior, and, when present, minimal triplet-only manifolds in the zero-field spectrum. Finally, we introduce a practical blockwise dark-state recruitment measure for the triplet-only zero-field state space made singlet-accessible by a weak field, and show how this quantity depends on hyperfine symmetry, including the effects of equivalent nuclei. The resulting framework provides both a simple physical picture of the LFE and a general route to estimating its structural upper bound for arbitrary radical pairs.

9
Triangular Invariant Sets for Containment of Drug Resistance Under Evolutionary Therapy

Hernandez Vargas, E. A.

2026-03-27 evolutionary biology 10.64898/2026.03.26.714636 medRxiv
Top 0.5%
0.6%

Evolutionary therapies regulate heterogeneous populations by altering selective pressures through treatment sequences in cancer and infections. This letter develops an invariant-set framework for treatment-induced containment based on positive triangular invariant sets. For periodically switched systems, sufficient conditions are derived for the existence of such invariant regions. Robustness with respect to mutation is established by showing that the invariant simplex persists under small perturbations of the subsystem matrices. In the two-phenotype case, the analysis yields an explicit mutation threshold that separates regimes in which therapy cycling maintains containment from regimes in which mutation can enable evolutionary escape. Simulations illustrate the geometry of the invariant sets and the role of mutation and dwell time in containment robustness.

10
Sparse Stimulus Generation Improves Reverse Correlation Efficiency and Interpretability

Gargano, J. A.; Rice, A.; Chari, D. A.; Parrell, B.; Lammert, A. C.

2026-03-26 neuroscience 10.64898/2026.03.24.714012 medRxiv
Top 0.6%
0.5%

Reverse correlation is a widely used and well-established method for probing latent perceptual representations in which subjects render subjective preference responses to ambiguous stimuli. Stimuli are purposefully designed to have no direct relationship with the target representation (e.g., they are randomly generated), a property which makes each individual stimulus minimally informative toward reconstructing the target, and often difficult to interpret for subjects. As a result, a large number of stimulus-response pairs must be gathered from a given subject in order for reconstructions to be of sufficient quality, making the task fatiguing. Recent work has demonstrated that the number of trials needed can be substantially reduced using a compressive sensing framework that incorporates the assumption that the target representation can be sparsely represented in some basis into the reconstruction process. Here, we introduce an alternative method that incorporates the sparsity assumption directly into stimulus generation, which holds promise not only for improving efficiency, but also for improving the interpretability of stimuli from the subjects' perspective. We develop this new method as a mathematical variation of the compressive sensing approach, before conducting one simulation study and two human subjects experiments to assess the benefits of this method to reconstruction quality, sample size efficiency, and subjective interpretability. Results show that sparse stimulus generation improves all three of these areas relative to conventional reverse correlation approaches, and also relative to compressive sensing in most conditions.

11
Methods for Molecular Recognition Computing

Reddy, S. T.

2026-04-03 synthetic biology 10.64898/2026.04.03.716328 medRxiv
Top 0.6%
0.5%

The softmax attention mechanism in transformer architectures (Vaswani et al., 2017) is mathematically identical to the Boltzmann distribution governing molecular binding at thermal equilibrium (Boltzmann, 1877). Luce's Choice Axiom (1959) establishes this function - which we term the convergence equation - as the unique function satisfying five axioms of competitive selection: positivity, normalization, unrestricted domain, rank preservation, and independence of irrelevant alternatives. We show that five additional architecture conditions - discrete intermolecular contacts, bilinear energy decomposition, finite competitor pools, thermal equilibrium, and stochastic selection - are satisfied by at least ten biological molecular recognition systems and together prescribe a complete neural architecture: dual encoders, cross-attention, InfoNCE contrastive training, symmetric loss, learned temperature, and cross-attentive decoder. We term this architecture a Specificity Foundation Model (SFM) and specify it for antibody-antigen, TCR-peptide-MHC, transcription factor-DNA, microRNA-mRNA, enzyme-substrate, CRISPR guide RNA-DNA, drug-target, peptide-MHC, receptor-ligand, and RNA-binding protein-RNA recognition. The first implementation (CALM; Lee et al., 2026) achieves antibody-antigen retrieval from approximately 4,000 training pairs with ~100,000-fold greater data efficiency than comparable contrastive architectures trained without the physics derivation. We classify this as Level 3 architecture-physics alignment and derive three further theoretical results: an exponential scaling law for retrieval accuracy as a function of training data diversity (the MRC scaling law), a two-parameter affinity calibration framework connecting contrastive scores to binding free energies, and a hybrid recursive learning framework for cross-modal reinforcement learning with orthogonal verification. The failure conditions of the framework are analyzed in terms of the validity of equilibrium thermodynamics for molecular binding and the convergence properties of gradient-based parameter estimation.
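The stated equivalence between softmax and the Boltzmann distribution can be checked numerically with a short sketch. It assumes the standard identification of attention scores with negative binding energies and of the softmax temperature with kT; the energy values and function names are made up for illustration.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax of scores divided by a temperature."""
    scaled = [s / temperature for s in scores]
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(energies, kT=1.0):
    """Equilibrium occupancy p_i = exp(-E_i / kT) / Z for competing states."""
    weights = [math.exp(-e / kT) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


# Hypothetical binding energies for three competitors, in units of kT = 0.6.
energies = [-2.0, -1.0, 0.5]
attention = softmax([-e for e in energies], temperature=0.6)
occupancy = boltzmann(energies, kT=0.6)
print(attention)
print(occupancy)
```

The two lists agree up to floating-point rounding, because subtracting the maximum score inside the softmax cancels between numerator and denominator.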

12
Learning gene interactions from tabular gene expression data using Graph Neural Networks

Boulougouri, M.; Nallapareddy, M. V.; Vandergheynst, P.

2026-03-23 bioinformatics 10.64898/2026.03.19.712949 medRxiv
Top 0.7%
0.5%

Gene interactions form complex networks underlying disease susceptibility and therapeutic response. While bulk transcriptomic datasets offer rich resources for studying these interactions, applying Graph Neural Networks (GNNs) to such data remains limited by a lack of methodological guidance, especially for constructing gene interaction graphs. We present REGEN (REconstruction of GEne Networks), a GNN-based framework that simultaneously learns latent gene interaction networks from bulk transcriptomic profiles and predicts patient vital status. Evaluated across seven cancer types in the TCGA cohort, REGEN outperforms baseline models in five datasets and provides robust network inference. By systematically comparing strategies for initializing gene-gene adjacency matrices, we derive practical guidelines for GNN application to bulk transcriptomics. Analysis of the learned kidney cancer gene-network reveals cancer-related pathways and biomarkers, validating the model's biological relevance. Together, we establish a principled approach for applying GNNs to bulk transcriptomics, enabling improved phenotype prediction and meaningful gene network discovery.

13
Fiber optical parametric amplification of low-photon-flux microscopy signals

Demas, J.; Tan, L.; Ramachandran, S.

2026-03-30 biophysics 10.64898/2026.03.25.714345 medRxiv
Top 0.7%
0.4%

The performance of a laser scanning microscope inevitably depends on the performance of the point detector. As laser scanning approaches aim to penetrate deeper in tissue, there is a commensurate need for detectors that can operate with high sensitivity, bandwidth, and dynamic range at near-infrared wavelengths where scattering is reduced. Here, we demonstrate that fiber optical parametric amplification can be used to boost low-power microscopy signals to levels that can be detected by near-infrared photodiodes without introducing prohibitive noise. We construct amplifiers that achieve >50 dB of parametric gain at wavelengths within the third near-infrared transparency window and have similar sensitivity to near-infrared photomultiplier tubes. Furthermore, these amplifiers outperform detection with a photodiode and subsequent electrical amplification, providing a factor of 10-100-fold improvement in sensitivity. We demonstrate amplifier bandwidths up to ~1.6 GHz, a factor of 10 faster than conventional detectors, including near-infrared photo-multiplier tubes, with sensitivity of ~8 nW (corresponding to ~20 photons/pixel). Finally, the increased performance of the optical amplifier is confirmed in diagnostic imaging experiments where >10x less power is required to achieve the same signal-to-noise ratio and contrast as images using electrical amplification. Accordingly, fiber optical parametric amplification is a new path forward for extending the performance of laser scanning microscopes in the near infrared.

14
A neurocomputational model of observation-based decision making with a focus on trust

Hassanejad Nazir, A.; Hellgren Kotaleski, J.; Liljenström, H.

2026-03-26 neuroscience 10.64898/2026.03.24.713845 medRxiv
Top 0.7%
0.4%

As social beings, humans make decisions partly based on social interaction. Observing the behavior of others can lead to learning from and about them, potentially increasing trust and prompting trust-based behavioral changes. Observation-based decision making involves different neural structures. The orbitofrontal cortex (OFC) and lateral prefrontal cortex (LPFC) are known as neural structures mainly involved in processing emotional and cognitive decision values, respectively, while the anterior cingulate cortex (ACC) plays a pivotal role as a social hub, integrating the afferent expectancy signals from OFC and LPFC. This paper presents a neurocomputational model of the interplay between observational learning and trust, as well as their role in individual decision-making. Our model elucidates and predicts the emotional and rational behavioral changes of an individual influenced by observing the action-outcome association of an alleged expert. We have modeled the neurodynamics of three cortical structures (OFC, LPFC, and ACC) and their interactions, where the neural oscillatory properties, modeled with Dynamic Bayesian Probability, represent the observer's attitude towards the expert and the decision options. As an example of an everyday behavioral situation related to climate change, we use the choice of transportation between home and work. The EEG-like simulation outputs from our model represent the presumed brain activity of an individual making such a choice, assuming the decision-maker is exposed to social information.

15
Pattern dynamics on mass-conserved reaction-diffusion compartment model

Sukekawa, T.; Ei, S.-I.

2026-03-29 biophysics 10.64898/2026.03.26.714357 medRxiv
Top 0.8%
0.4%

Mass-conserved reaction-diffusion systems are used as mathematical models for various phenomena such as cell polarity. Numerical simulations of this system present transient dynamics in which multiple stripe patterns converge to spatially monotonic patterns. Previous studies indicated that the transient dynamics are driven by a mass conservation law and by variations in the amount of substance contained in each pattern, which we refer to as "pattern flux". However, it is challenging to mathematically investigate these pattern dynamics. In this study, we introduce a reaction-diffusion compartment model to investigate the pattern dynamics in view of the conservation law and the pattern flux. This model is defined on multiple intervals (compartments), and diffusive couplings are imposed on each boundary of the compartments. Corresponding to the transient dynamics in the original system, we consider the dynamics around stripe patterns in the compartment model. We derive ordinary differential equations describing the pattern dynamics of the compartment model and analyze the existence and stability of equilibria for the reduced ODE with respect to the boundary parameters. For a specific parameter setting, we obtained results consistent with previous studies. Moreover, we show that the stripe patterns in the compartment model are potentially stabilized by changing the parameter, which is not observed in the original system. We expect that the methodology developed in this paper can be extended in various directions, such as membrane-induced pattern control.

16
Visualizing and sonifying neurodata (ViSoND) for enhanced observation

Blankenship, L.; Sterrett, S. C.; Martins, D. M.; Findley, T. M.; Abe, E. T. T.; Parker, P. R. L.; Niell, C.; Smear, M. C.

2026-03-24 neuroscience 10.64898/2026.03.21.713430 medRxiv
Top 0.8%
0.4%

Neuroscience needs observation. Observation lets us evaluate data quality, judge whether models are biologically realistic, and generate new hypotheses. However, high-dimensional behavioral and neural data are too complex to be easily displayed and eye-tested. Computational methods can reduce the dimensionality of data and reveal statistically robust dynamical structure but often yield results that are difficult to relate back to the underlying biology. In addition, the choice of what parameters to quantify may not capture unexpectedly relevant aspects of the data. To supplement quantification with enhanced qualitative observation, we developed Visualization and Sonification of NeuroData (ViSoND), an open-source approach for displaying multiple data streams using video and sonification. Sonification is nothing new to neuroscience. Scientists have sonified their physiological preparations since Lord Adrian's earliest recordings. We extend this tradition by mapping multiple physiological data streams to musical notes using MIDI. Synchronizing MIDI to video provides an opportunity to watch an animal's movement while listening to physiological signals such as action potentials. Here we provide two demonstrations of this approach. First, we used ViSoND to interpret behavioral structure revealed by a computational model trained on the breathing rhythms of freely behaving mice. Second, ViSoND revealed patterns of neural activity in mouse visual cortex corresponding to eye blinks, events that were previously filtered out of analysis. These use cases show that ViSoND can supplement quantitative rigor with observational interpretability. Additionally, ViSoND provides an accessible way to display data which may broaden the audience for communication of neuroscientific findings.

17
Effects of protein interface mutations on protein quality and affinity

de Kanter, J. K.; Smorodina, E.; Minnegalieva, A.; Arts, M.; Blaabjerg, L. M.; Frolenkova, M.; Rawat, P.; Wolfram, L.; Britze, H.; Wilke, Y.; Weissenborn, L.; Lindenburg, L.; Engelhart, E.; McGowan, K. L.; Emerson, R.; Lopez, R.; van Bemmel, J. G.; Demharter, S.; Spreafico, R.; Greiff, V.

2026-03-26 molecular biology 10.64898/2026.03.24.713863 medRxiv
Top 0.9%
0.3%

Accurately modeling antibody-antigen interactions requires distinguishing intrinsic binding affinity ("protein-interaction") from protein biophysical properties ("protein-quality"), including folding, stability, and expression. However, high-throughput mutational measurements commonly used to train and benchmark computational models often conflate these effects, obscuring the true determinants of molecular recognition. Here, we present an experimental and analytical framework to disentangle protein-interaction effects from protein-quality effects in single-domain antibody (VHH)-antigen binding. Using a large-scale deep mutational scanning (DMS) dataset spanning four VHH-antigen complexes, with single and double mutations in both partners, we introduce control binders to quantify protein-quality changes independently of protein-interaction. This enables decomposition of experimentally measured affinity into protein-interaction and protein-quality components at scale. Leveraging the disentangled dataset, we evaluated state-of-the-art structure- and sequence-based models for protein-quality and protein-interaction prediction and show that their performance largely reflects protein-quality rather than protein-interaction effects. Our results highlight a major confounder in current datasets and suggest that accounting for protein-quality will be essential for training next-generation affinity-prediction models.

Nomenclature

Antibody-related terms
- Primary VHH: The VHH of a VHH-antigen complex for which the paratope and the epitope were mutated.
- Control VHH: A second VHH that binds to the same antigen as the primary VHH but has non-overlapping epitope positions and therefore does not bind to any of the mutated antigen positions.

Affinity-related terms
- Real affinity: "The strength of the interaction between two [...] molecules that bind reversibly (interact)" [1]. In the context of antibody-antigen binding, it quantifies interactions between active proteins (which are expressed and correctly folded [2] and are therefore functionally and biologically active; see below). It is commonly quantified by the equilibrium dissociation constant, KD.
- Observed affinity (°KD): The interaction strength experimentally measured between two molecules. Unlike real affinity, this value is confounded by the biophysical properties of the individual binding partners, specifically their folding, stability, and expression levels. Consequently, the observed affinity often differs from the real/intrinsic affinity if a significant fraction of the protein population is inactive [3]. NOTE: Unless otherwise specified, °KD is reported in - log10 space. For example, a °KD of -9 corresponds to 10^-9 M or 1 nM.
- Change in observed affinity (Δ°KD): The shift in the observed affinity between two proteins upon mutation, reported as the log10-transformed fold change. A value of 1 reflects a 10-fold difference, a value of 2 a 100-fold difference, etc. This aggregate change resolves into two distinct biophysical components [2, 4]:
  - Protein-interaction change: The change in the intrinsic thermodynamic affinity between the two binding partners, each in its active state (i.e., the specific change in interface Gibbs free energy because both enthalpy and entropy are considered).
  - Protein-quality change: The change in the fraction of the mutated protein population that is biologically active - meaning it is expressed, correctly folded, and stable [2, 5].
    - Folding: The process that guides the polypeptide chain toward its native conformation, which is a prerequisite for forming a functional binding site.
    - Stability: The thermodynamic capacity to maintain the folded structure over time and under physiological conditions. Stability (decrease in Gibbs free energy from the unfolded to the folded state) ensures the binding interface remains intact and prevents competing processes such as aggregation [6].
    - Expression: The steady-state abundance of the protein. This is largely dependent on proper folding and stability, as cellular quality control mechanisms degrade proteins that fail to fold or remain stable at functional concentrations.
- Change in relative affinity (ΔΔ°KD): The difference between the Δ°KD of the primary VHH compared to the control VHH for a given epitope mutation.

Model-related terms
- ESM-IF1 sc: Single-chain (sc) structure-conditioned inverse folding model (ESM-IF1), using the isolated monomer structure of the mutated protein: either the VHH or the antigen [7].
- ESM-IF1 mc: Multi-chain (mc) structure-conditioned model (ESM-IF1), using the full complex structure (both antibody and antigen) [7].
- Stability prediction score: Score that represents the predicted change in stability based on a single mutation, normally represented as ΔΔG.

18
Spacing effect improves generalization in biological and artificial systems

Sun, G.; Huang, N.; Yan, H.; Zhou, J.; Li, Q.; Lei, B.; Zhong, Y.; Wang, L.

2026-03-23 neuroscience 10.64898/2025.12.18.695340 medRxiv
Top 0.9%
0.3%

Generalization is a fundamental criterion for evaluating learning effectiveness, a domain where biological intelligence excels yet artificial intelligence continues to face challenges. In biological learning and memory, the well-documented spacing effect shows that appropriately spaced intervals between learning trials can significantly improve behavioral performance. While multiple theories have been proposed to explain its underlying mechanisms, one compelling hypothesis is that spaced training promotes integration of input and innate variations, thereby enhancing generalization to novel but related scenarios. Here we examine this hypothesis by introducing a bio-inspired spacing effect into artificial neural networks, integrating input and innate variations across spaced intervals at the neuronal, synaptic, and network levels. These spaced ensemble strategies yield significant performance gains across various benchmark datasets and network architectures. Biological experiments on Drosophila further validate the complementary effect of appropriate variations and spaced intervals in improving generalization, which together reveal a convergent computational principle shared by biological learning and machine learning.

19
Benchmark of biomarker identification and prognostic modeling methods on diverse censored data

Fletcher, W. L.; Sinha, S.

2026-04-01 bioinformatics 10.64898/2026.03.29.715113 medRxiv
Top 1.0%
0.3%

The practices of identifying biomarkers and developing prognostic models using genomic data have become increasingly prevalent. Such data often feature characteristics that make these practices difficult, namely high dimensionality, correlations between predictors, and sparsity. Many modern methods have been developed to address these problematic characteristics while performing feature selection and prognostic modeling, but a large-scale comparison of their performances in these tasks on diverse right-censored time-to-event data (aka survival time data) is much needed. We have compiled many existing methods, including some machine learning methods, several of which have performed well in previous benchmarks, primarily for comparison with regard to variable selection capability, and secondarily for survival time prediction on many synthetic datasets with varying levels of sparsity, correlation between predictors, and signal strength of informative predictors. For illustration, we have also performed multiple analyses on a publicly available and widely used cancer cohort from The Cancer Genome Atlas using these methods. We evaluated the methods through extensive simulation studies in terms of the false discovery rate, F1-score, concordance index, Brier score, root mean square error, and computation time. Of the methods compared, CoxBoost and the Adaptive LASSO performed well in all metrics, and the LASSO and elastic net excelled when evaluating concordance index and F1-score. The Benjamini-Hochberg and q-value procedures showed volatile performances in controlling the false discovery rate. Some methods' performances were greatly affected by differences in the data characteristics. With our extensive numerical study, we have identified the best performing methods for a plethora of data characteristics using informative metrics. This will help cancer researchers in choosing the best approach for their needs when working with genomic data.

20
Estimating Bayesian phylogenetic information content using geodesic distances

Milkey, A.; Lewis, P. O.

2026-04-01 evolutionary biology 10.64898/2026.03.31.715656 medRxiv
Top 1%
0.3%

A new Bayesian measure of phylogenetic information content is introduced based on geodesic distances in treespace. The measure is based on the relative variance of phylogenetic trees sampled from the posterior distribution compared to the prior distribution. This ratio is expected to equal 1 if there is no information in the data about phylogeny and 0 if there is complete information. Trees can be scaled to have the same mean tree length to avoid dominance by edge length information and focus on topological information. The method scales well, requiring only that a valid sample can be obtained from both prior and posterior distributions. We show how dissonance (information conflict) among data sets can also be estimated. Both simulated and empirical examples are provided to illustrate that the new approach produces sensible and intuitive results.
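The posterior-to-prior variance ratio described above can be sketched generically for any distance function. This is only an illustration under stated assumptions: the dispersion is taken as the sum of squared pairwise distances divided by n squared (which equals the ordinary population variance for Euclidean data), and a scalar absolute difference stands in for the geodesic distance between trees, which is not implemented here. The abstract does not give the exact estimator, and all names are hypothetical.

```python
def dispersion(samples, distance):
    """Sum of squared pairwise distances over unordered pairs, divided by n^2."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += distance(samples[i], samples[j]) ** 2
    return total / (n * n)


def variance_ratio(prior_samples, posterior_samples, distance):
    """Posterior dispersion relative to prior dispersion."""
    prior_spread = dispersion(prior_samples, distance)
    if prior_spread == 0.0:
        raise ValueError("prior sample has no spread")
    return dispersion(posterior_samples, distance) / prior_spread


def toy_distance(a, b):
    """Stand-in for a treespace geodesic distance (assumption: scalars only)."""
    return abs(a - b)


# Made-up scalar "trees": a diffuse prior and a concentrated posterior.
prior = [0.0, 2.0, 4.0, 6.0]
posterior = [2.9, 3.0, 3.1]
print(variance_ratio(prior, posterior, toy_distance))
```

A ratio near 1 means the posterior sample is as dispersed as the prior sample, while a ratio near 0 means the posterior sample has collapsed onto essentially one point.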