Entropy
MDPI AG
Preprints posted in the last 90 days, ranked by how well they match Entropy's content profile, based on 20 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Mukherjee, S.; Srivastava, D.; Patra, N.
Protein-DNA complexes are involved in vital cellular functions such as gene regulation, replication, transcription, packaging, rearrangement, and damage repair. In this work, a streamlined geometric formalism for computing the absolute binding free energy was used to obtain chemically accurate in silico estimates of the binding free energy of three protein-DNA complexes. The molecular interactions between protein and DNA involved hydrogen bonds and electrostatic, van der Waals, and hydrophobic interactions. Using this formalism, researchers can obtain the absolute binding free energy for a protein-DNA complex with remarkable accuracy and modest computational cost.
Pachter, L.
We introduce a spectral existence criterion for the evolution of cooperation in the form of the inequality λ_max b > c, where λ_max is the leading eigenvalue of an interaction operator encoding population structure, and b and c represent benefit and cost tradeoffs, respectively. Nowak's five rules for the evolution of cooperation correspond to cases in which the cooperation condition reduces to a scalar assortment coefficient. These results follow from the Price equation, which sheds light on a long-standing debate on the role of inclusive fitness and evolutionary dynamics in explaining the evolution of cooperation.
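The criterion above is easy to check numerically. A minimal sketch, assuming a made-up interaction matrix `W` and illustrative benefit/cost values (none of these numbers come from the paper):

```python
# Toy check of the criterion lambda_max * b > c. The interaction matrix W and
# the benefit/cost values are made-up illustrations, not from the paper.

def leading_eigenvalue(W, iters=500):
    """Estimate the leading eigenvalue of a non-negative symmetric matrix
    by power iteration with infinity-norm normalisation."""
    n = len(W)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        v = [x / lam for x in w]
    return lam

# Interaction operator encoding an assumed population structure.
W = [[0.0, 0.6, 0.1],
     [0.6, 0.0, 0.4],
     [0.1, 0.4, 0.0]]

b, c = 3.0, 1.0                 # benefit and cost (assumed)
lam = leading_eigenvalue(W)
print(lam * b > c)              # cooperation favoured when lambda_max * b > c
```

For a non-negative matrix the Perron root lies between the smallest and largest row sums (here 0.5 and 1.0), which gives a quick sanity check on the power-iteration estimate.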
Lyu, Z.; Kolomeisky, A.
One of the most critical steps in human reproduction is the selection of the dominant follicle when a single follicle is chosen from a large group of follicles to ovulate. Although this process involves complex hormonal regulation, the complete microscopic picture of unique selectivity remains unclear. We propose a novel stochastic mechanism for dominant follicle selection that incorporates the actions of the most relevant hormones, follicle-stimulating hormone (FSH) and estradiol. Our theoretical picture suggests the following sequence of events. As soon as the FSH concentration reaches the critical threshold, one of the available follicles is randomly selected, which immediately stimulates the production of estradiol, which, via a negative feedback mechanism, suppresses further FSH production, lowering its concentration below the critical threshold. This suppression limits the time window for the possible second follicle selection event, allowing only a single follicle to be selected. Based on this picture, a minimal quantitative theoretical model of dominant follicle selection is developed and analyzed using analytical calculations and computer simulations. Theoretical analysis shows how the interplay between different parameters that govern follicle selection leads to high selectivity. Our theoretical approach can explain some key known observations, providing a quantitative tool for analyzing biological reproduction phenomena.
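The sequence of events described can be captured in a few lines of simulation. This is a minimal sketch; all parameter values, the linear FSH dynamics, and the per-step selection probability are illustrative assumptions, not the paper's actual model:

```python
# Minimal sketch of the proposed stochastic selection mechanism. All parameter
# values and the linear FSH dynamics are assumptions for illustration only.
import random

def simulate(n_follicles=20, threshold=1.0, p_select=0.01,
             fsh_rise=0.05, fsh_drop=1.0, steps=200, seed=0):
    rng = random.Random(seed)
    fsh, selected = 0.0, 0
    for _ in range(steps):
        if selected == 0:
            fsh += fsh_rise                 # FSH accumulates before any selection
        else:
            fsh = max(0.0, fsh - fsh_drop)  # estradiol feedback suppresses FSH
        if fsh >= threshold:                # selection window is open
            for _ in range(n_follicles):
                if rng.random() < p_select:
                    selected += 1           # a follicle is recruited
    return selected

counts = [simulate(seed=s) for s in range(100)]
frac_single = sum(c == 1 for c in counts) / len(counts)
print(frac_single)
```

With a small per-follicle selection probability, the feedback closes the window almost immediately after the first recruitment, so the large majority of runs end with exactly one dominant follicle, illustrating how the parameter interplay produces high selectivity.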
Pena Fernandez, M.; Lloret Iglesias, L.; Marco de Lucas, J.
One of the most compelling ideas for bridging neuroscience and artificial neural networks is the establishment of a framework based on three main components: network architecture, optimization mechanism, and loss (or objective) function to be minimized. While the first two components have been extensively explored, the definition of a loss or objective function in neuroscience has been addressed less thoroughly, often from perspectives such as predictive coding. In this work, we propose an elementary loss function grounded in the comparison of neuronal responses to two signals: an external one, used for learning, and an internal one, reflecting the acquired knowledge. The loss function is thus simply the basic difference between the two, which, in terms of logical signals, corresponds to a well-known non-linearly separable function: the XOR function. We illustrate with a computational example how a binarized image recognition algorithm can be straightforwardly implemented in an autoencoder, and we show how a neuronal motif organized around an inhibitory neuron could implement such an XOR operation and provide a feedback signal that makes optimization possible.
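For binary signals, the proposed loss is literally an element-wise XOR between the external (teaching) signal and the internal (reconstructed) one, summed over positions. A minimal sketch with made-up toy data:

```python
# Sketch of an XOR-style loss: count positions where an external (teaching)
# binary signal and an internal (reconstructed) binary signal disagree.
# The toy vectors below are illustrative, not from the paper.

def xor_loss(external, internal):
    """Element-wise XOR, then sum: the number of mismatched positions."""
    return sum(e ^ i for e, i in zip(external, internal))

external = [1, 0, 1, 1, 0, 0, 1, 0]   # signal used for learning
internal = [1, 0, 0, 1, 0, 1, 1, 0]   # network's current reconstruction
print(xor_loss(external, internal))    # 2 mismatches -> loss of 2
```

The loss is zero exactly when the internal signal reproduces the external one, which is what makes it usable as a feedback signal for optimization.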
Powell, A.
A methodology for computationally unstructuring proteins is described and the results of its application to a variety of proteins analyzed and discussed. Some proteins prove more susceptible than others, and fold topology plays a part in this. Alpha helical structure is found to be generally somewhat robust, and, perhaps unsurprisingly, unstructuring often begins at exposed chain termini. Phosphofructokinase-1 and phosphofructokinase-2, which have similar sizes but different fold topologies, are found to differ markedly in their unstructuring behaviour.
Imtiyaz, S.
Biological organisation is inherently multi-level: molecular processes, membrane dynamics, cellular geometry and tissue context reciprocally constrain one another, often through boundary-mediated feedback. A recurring theme in theoretical biology is that such organisation is not well captured by models that assume a fixed repertoire of variables and a pre-given state space: what counts as a relevant state description can depend on organisational context and history. The principle of biological relativity further sharpens the same challenge from a different angle, emphasising that no level is causally privileged and that cross-level feedback can close into circular causality. These lines of work motivate a structural multi-level semantics for modelling biological pathways. We introduce a constraint-based semantic framework that distinguishes an evolving organisational scaffold--the admissible multi-level patterns and interfaces--from the pathways that traverse and coordinate them. This separation yields mathematical, loop-level diagnostics for boundary-driven circular causality: it identifies when organisational trajectories induce persistent reparameterisations of local state descriptions, and it classifies cyclic regimes into reversible loops, stable history-dependent loops, and unique (rare) organisational reconfigurations. The framework is accompanied by a systematic crosswalk to mainstream causal, dynamical and computational approaches, clarifying what is gained when interfaces and local-global consistency are treated as semantic, rather than purely parametric, structure. We demonstrate the approach on a canonical excitable-cell exemplar by modelling a single Hodgkin spike as a cross-level interface loop coupling membrane, molecular and cellular constraints.
Without re-deriving Hodgkin-Huxley kinetics, the resulting diagnostics provide an explicit semantics for boundary-mediated feedback and spike-induced history dependence, including when cyclic activity imprints persistent changes in effective excitability. Together, the case study and comparisons position constraint semantics as a practical mathematical layer for multi-level biological organisation: compatible with existing mechanistic models, yet designed to expose circular causal closure and organisation-dependent state descriptions that standard formalisms typically leave implicit. AMS subject classifications: 92C30, 92C46, 92B05, 55U10, 55R10.
Polo, C.; Thandi, A.; Chandler, O.; Lugert, P.; Hammond, A.; Madhi, T.; Ayala, M.; Berrigan, A. J.; Chen, A.; Gillett, K.; Sareen, M.; Yu, S.; Xiong, S.; Zuo, Y.-y.; Sanjeev, S.
Deoxyribonucleic acid (DNA) stands as one of the most foundational concepts in the life sciences, essential for students to master. However, when surveyed about the forces that stabilize the double-stranded DNA structure, many students exhibited a conceptual bias--favoring base pairing as the primary stabilizing force while overlooking the equally critical role of base stacking interactions. To investigate the origins of this misconception, students conducted a comprehensive analysis of 35 widely used textbooks. Their findings revealed that one-third of these texts explicitly emphasized base pairing as the sole stabilizing force in their written content. Furthermore, two-thirds of the textbooks contained illustrations that reinforced this bias, visually highlighting base pairing while neglecting base stacking. Recognizing this bias, students embarked on a literature review to gain a more accurate and nuanced understanding of DNA stabilization. Through this research, we identified three concept areas--DNA structure and function, environmental effects on DNA, and DNA-protein interactions--to illustrate how base pairing and base stacking work in concert to stabilize the antiparallel double helical structure of DNA. This interplay between base pairing and base stacking is crucial not only for the structural integrity of DNA, but also for its biological functionality. By addressing this conceptual bias, we aim to promote a more balanced and scientifically accurate representation of DNA stabilization in educational materials.
Scheres, S.
Several proteins from the human proteome have been observed to adopt multiple distinct amyloid filaments, and specific protofilament folds are associated with different diseases. It has therefore become necessary to compare pairs of amyloid structures of a given protein. This paper describes the amyloid packing difference (APD), which quantifies the difference between such a pair as the percentage of residues that are involved in unique cross-β packing interactions or that have different side chain orientations relative to the β-strands. Clustering of α-synuclein protofilament folds on pairwise APD values recapitulates previously reported clustering based on structural superpositions. Any pair of known protofilament folds of the prion protein, tau, α-synuclein, TDP-43 or TAF15 from different diseases has APD values above 20%, whereas all pairs of structures that have been associated with the same disease have APD values below 40%. These observations provide context for the interpretation of APD values of new comparisons.
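An APD-style score reduces to counting per-residue differences. The sketch below is an illustration under an assumed encoding (each residue as a pair of packing-partner index and side-chain orientation); the paper's actual definition operates on atomic structures:

```python
# Illustrative APD-style score: percentage of residues that differ between two
# protofilament folds, here encoded as (packing-partner index, orientation)
# pairs per residue. The encoding and data are assumptions for illustration.

def apd(fold_a, fold_b):
    """Percent of residues with different packing contacts or orientation."""
    assert len(fold_a) == len(fold_b)
    diff = sum(a != b for a, b in zip(fold_a, fold_b))
    return 100.0 * diff / len(fold_a)

# Each residue: (packing-partner index or None, 'in'/'out' orientation).
fold_a = [(3, 'in'), (4, 'out'), (None, 'in'), (0, 'in'), (1, 'out')]
fold_b = [(3, 'in'), (4, 'in'),  (None, 'in'), (0, 'in'), (2, 'out')]
print(apd(fold_a, fold_b))  # 2 of 5 residues differ -> 40.0
```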
Perez, G. J. G.; Perez-Rodriguez, R.; Gonzalez, A.
Common knowledge states that the spontaneous somatic evolution of a normal tissue may lead to a tumor. Once the tumor is formed, it naturally evolves towards a state of higher malignancy. On the other hand, perfect gene expression markers for normal tissue and tumor--the so-called N-genes and T-genes--were recently introduced. We join these two pieces of knowledge in order to argue that: 1) Only N-markers participate in the spontaneous dynamics of a normal tissue. The number of active markers decreases as the tissue approaches the transition point where it becomes a tumor. 2) Only T-markers participate in the spontaneous dynamics of tumors. The number of markers increases as the tumor becomes more malignant. 3) Both sets of genes are connected by the so-called NT-genes, i.e., genes that are simultaneously N- and T-markers. They should play a crucial role at the transition point and, possibly, when the tumor is exposed to a drug or therapy. 4) The pathways or mechanisms protecting the normal tissue from becoming a tumor may be described by a small perfect panel of N-genes. 5) The pathways or mechanisms guiding the evolution of tumors in a tissue may be described by a small perfect panel of T-genes. We illustrate the above statements with the analysis of expression data for prostate adenocarcinoma, one of the most heterogeneous tumors. In this case, there are about 1000 N-genes and 6000 T-genes, and the perfect N- and T-panels contain 11 and 8 genes, respectively. Additionally, we provide examples from lung adenocarcinoma and liver hepatocarcinoma.
Vardanyan, V. H.; Haldane, A.; Hwang, H.; Coskun, D.; Lihan, M.; Miller, E. B.; Friesner, R. A.; Levy, R. M.
Kinase family proteins constitute the second largest protein class targeted in drug development efforts, most prominently to treat cancer, but also several other diseases associated with kinase dysfunction. In this work we focus on type II kinase inhibitors which bind to the "classical" inactive conformation of the protein kinase catalytic domain where the DFG motif has a "DFG-out" orientation and the activation loop is folded. Many Tyrosine kinases (TKs) exhibit strong binding affinity with a wide spectrum of type II inhibitors while serine/threonine kinases (STKs) often bind more weakly. Recent work suggests this difference is largely due to differences in the folded to extended conformational equilibrium of the activation loop between TKs vs. STKs. The binding affinity of a type II inhibitor to its kinase target can be decomposed into a sum of two contributions: (1) the free energy cost to reorganize the protein from the active to inactive state, and (2) the binding affinity of the type II inhibitor to the inactive kinase conformation. In previous work we used a Potts statistical energy potential based on sequence co-variation to thread sequences over ensembles of active and inactive kinase structures. The threading function was used to estimate the free energy cost to reorganize kinases from the active to classical inactive conformation, and we showed that this estimator is consistent with the results of molecular dynamics free energy simulations for a small set of STKs and TKs. In the current study, we analyze the results of a large-scale study of the binding affinities of 50 type II inhibitors to 348 kinases, of which the results for 16 of the 50 type II inhibitors were reported in an earlier study (the "Davis dataset"); the binding data for the remaining 34 type II inhibitors to the panel of 348 kinases were recently obtained (the "Schrodinger dataset"). 
We use the Potts statistical energy model to investigate the contribution of protein reorganization to the selectivity of the large kinase panel against the set of 50 type II inhibitors, and find that protein reorganization makes a significant contribution to the selectivity. The AUC of the receiver operating characteristic curve is ~0.8. We report the results of an internal "blind test" that shows how Potts threading energies can provide more accurate estimates of kinase selectivity than corresponding predictions using experimental results of small sample size. We discuss why two STK phylogenetic kinase families, STE and CMGC, appear to contain many outliers, and how to improve the ability to predict kinase selectivity with a more complete analysis of the kinase conformational landscape. We compare the performance of Potts threading for predicting binding properties of the large set of 50 type II inhibitors to 348 kinases with that of DeepDTAGen, a publicly available sequence-based machine learning model that was trained on the complete Davis dataset, including both type I and type II kinase inhibitors. We observe that DeepDTAGen performs well on binding predictions for the 16 type II inhibitors in the Davis dataset, but performs poorly on binding predictions for the 34 type II inhibitors against 348 kinases in the Schrodinger dataset.
Emami, B.; Dyk, W.; Haycraft, D.; Robinson, J.; Nguyen, L.; Miri, M.-A.; Huggins, D. J.
Computational protein design is a foundational challenge in biotechnology, advantageous for engineering novel enzymes and therapeutics, yet its combinatorial complexity remains a bottleneck for classical optimization. We formulate fixed-backbone computational protein design as a quadratic Hamiltonian over rotamer variables to naturally map onto a hybrid photonic entropy computing platform, Dirac-3. To assess solution quality and runtime performance, we benchmark the photonic solver against an exact classical cost function network (CFN) solver, which provides provably optimal baselines. For protein instances ranging from 493 to 943 variables, Dirac-3 attains solutions within 0.16-2.47% of optimal energies. Empirical scaling analysis reveals a comparatively gentle effective runtime growth for the photonic solver over the measured regime, consistent with near-linear polynomial scaling, in contrast to the sharp super-polynomial growth observed for the classical baseline beyond approximately 1000 variables. These results suggest a near-term crossover regime in which hardware-aligned continuous-variable optimization may offer practical promise for large computational protein design instances where exact classical methods become time-prohibitive.
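The underlying optimization problem can be sketched at toy scale: pick one rotamer per position to minimize self-energies plus pairwise energies. The energies below are made up, and brute-force enumeration stands in for the photonic or CFN solver:

```python
# Toy fixed-backbone design instance: a quadratic energy over rotamer choices
# (one rotamer per position). Energies are invented for illustration; a real
# instance would come from a force field, and a dedicated solver would replace
# the brute-force enumeration.
from itertools import product

self_E = [[0.5, 1.2], [0.9, 0.3], [0.1, 0.8]]     # E_i(r): 3 positions, 2 rotamers
pair_E = {(0, 1): [[0.0, 0.4], [0.7, 0.1]],       # E_ij(r_i, r_j)
          (1, 2): [[0.2, 0.0], [0.5, 0.3]]}

def energy(assign):
    e = sum(self_E[i][r] for i, r in enumerate(assign))
    for (i, j), table in pair_E.items():
        e += table[assign[i]][assign[j]]
    return e

best = min(product(range(2), repeat=3), key=energy)
print(best, round(energy(best), 2))
```

Exhaustive search is exponential in the number of positions, which is exactly the bottleneck the abstract's hardware-aligned solver targets.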
Khan, H.; Garcia-Galindo, P.; Ahnert, S. E.; Dingle, K.
A morphospace is an abstract space of theoretically possible biological traits, shapes, or property values. It is interesting to explore which parts of a morphospace life occupies, as compared to those parts which could be occupied, but are not. Comparing random and natural non-coding (nc) RNA secondary structures is an established approach to studying morphospace occupation for RNA structures. Most earlier studies have focused on the minimum free energy (MFE) structure, while relatively few have looked at the Boltzmann distribution, describing the ensemble of energetically suboptimal RNA folds. These suboptimal structures may have important roles and functions, and hence should be examined carefully. Here we compare random and natural ncRNA in terms of their Boltzmann distributions, finding that natural RNA tend to have very similar profiles to random RNA, with the main difference being that natural RNA are slightly more energetically stable, except for very short sequences (20 to 30 nucleotides) which tend to be slightly less stable. We infer that natural ncRNA occupy similar parts of the morphospace that random RNA do, indicating that the biophysics of the genotype-phenotype map largely determines the ensemble properties of ncRNA.
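The ensemble comparison rests on Boltzmann-weighting the suboptimal folds. A minimal sketch with assumed fold energies (a real analysis would take energies from an RNA folding package such as ViennaRNA):

```python
# Boltzmann weights over an ensemble of fold energies. The energies
# (kcal/mol) are made-up values for illustration.
import math

def boltzmann(energies, kT=0.616):        # kT ~ 0.616 kcal/mol near 37 C
    weights = [math.exp(-e / kT) for e in energies]
    Z = sum(weights)                      # partition function
    return [w / Z for w in weights]

mfe_and_suboptimal = [-12.3, -11.9, -11.1, -10.4]   # assumed fold energies
probs = boltzmann(mfe_and_suboptimal)
print([round(p, 3) for p in probs])
```

The MFE structure gets the largest weight, but the suboptimal folds retain non-negligible probability, which is why comparing only MFE structures can miss differences in the ensemble profile.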
Togashi, Y.; Yotsumoto, Y.; Hiramatsu, C.; Tsuchiya, N.; Oizumi, M.
Whether qualitative aspects of consciousness, or qualia in short, are equivalent across individuals is a foundational scientific question. Testing this is challenging because one cannot assume a shared mapping between stimuli and private experience (my "red" may be your "green") [1-3]. Previously, we proposed a structural characterization of qualia [4, 5] and the quantitative assessment of structural correspondences through an unsupervised alignment method [4, 6], which does not presuppose such correspondence. Using this approach, our previous work focused on identifying optimal mappings between relational structures of color qualia at the group level [4]. Given known perceptual diversities [7], however, it remained unknown whether any two individuals' structures could be empirically aligned. Here, we resolve this by collecting 4,371 pairwise similarity ratings for 93 colors from 11 individuals, enabling direct individual-to-individual alignment. We reveal two fundamental, coexisting features. First, we identified two clusters of individuals showing robust within-cluster alignment, corresponding to color-neurotypicals and atypicals. Second, we uncovered a continuous spectrum of diversity: some participants who showed normal color discrimination ability in terms of the Total Error Score (TES) on the Farnsworth-Munsell 100 hue test nevertheless failed to align with either cluster, revealing idiosyncratic structures that defy simple categorization. Together, these findings suggest a novel structure-based taxonomy of divergent color qualia that complements conventional performance-based classification. Our method is generalizable to other sensory modalities, and opens a path to the scientific investigation of both shared and idiosyncratic qualitative aspects of consciousness.
Ergon, R.
A moving average smoothing method for extraction of cycles in time series data is described, with focus on obliquity cycles and fossil data. The proposed method is intended for cases where the environmental driver of phenotypic evolution can be shown to include obliquity cycles, either by power spectrum analysis or simply by inspection of raw or smoothed time series. The method gives improved mean trait predictions and better understanding when applied on stickleback fish fossil data from around 10 million years ago. The possibility to extract obliquity cycles will depend on the dynamics of the time series, and the method is thus not universally applicable. It may, however, be possible to adapt the size of the moving window to problems under study, or possibly to obtain improved predictions by inclusion of a sinusoidal component in the mean trait prediction modeling.
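The core of the method is a centred moving average whose window matches the cycle period: the average removes the cycle, and subtracting it from the series recovers the cycle. A minimal sketch on synthetic data (window size and signal are assumptions, not the paper's stickleback series):

```python
# Centred moving-average extraction of a cycle from a trending series.
# Synthetic data; the window width is set to one cycle period so the
# average cancels the sinusoid and leaves the trend.
import math

def moving_average(x, window):
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

period = 41                                    # obliquity-like cycle, in samples
series = [0.01 * t + math.sin(2 * math.pi * t / period) for t in range(300)]
trend = moving_average(series, period)         # window = one period cancels the cycle
cycle = [s - m for s, m in zip(series, trend)]
```

Because the window spans exactly one period, the sinusoid sums to zero inside each interior window, so the smoothed series tracks the linear trend and the residual is the extracted cycle; as the abstract notes, this works only when the window can be matched to the cycle in the data.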
Grigas, A. T.; Sumner, J.; O'Hern, C. S.
Protein structure is controlled by a high-dimensional energy landscape, which is a function of all of the atomic coordinates of the protein. Can this landscape be accurately described by a low-dimensional representation? We find that residue core identity, a binary N-dimensional encoding indicating whether each of the N amino acids in a protein is buried in the core or not, can predict the protein's backbone conformation more efficiently than all other representations that we tested. Core identity is 4 times more efficient than previous estimates of the bits per residue needed to encode a protein's native fold, 2 times more efficient than the Cα contact map, and 1.5 times more efficient than the machine-learned embeddings from FoldSeek's 3Di. Even when the folded structure is unavailable, predicting each residue's burial from sequence yields a more accurate estimate of fold quality than predicting pairwise contacts from the same sequence information. Thus, this work emphasizes that the problem of determining a protein's native fold can be re-framed as predicting each residue's core identity.
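A back-of-envelope contrast shows why core identity is such a compact description: it needs one in/out bit per residue, while a full binary contact map needs one bit per residue pair. Note this is only a raw size comparison; the paper's efficiency measure is about predictive power per bit, not storage:

```python
# Raw encoding sizes: core identity is N bits; a full binary contact map is
# N*(N-1)/2 bits. Illustrative arithmetic only; the paper's efficiency
# comparison is predictive, not a storage count.

def bits_core_identity(n):
    return n                      # one buried/exposed bit per residue

def bits_contact_map(n):
    return n * (n - 1) // 2       # one bit per unordered residue pair

n = 100                           # residues in a hypothetical protein
print(bits_core_identity(n), bits_contact_map(n))  # 100 vs 4950
```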
Abdallah, H. H.; Kopchick, J.; Hadous, J.; Easter, P.; Rosenberg, D. R.; Stanley, J. A.; Salch, A.; Diwadkar, V. A.
Functional brain imaging data can provide a window into the task-driven network states that shape brain function (or dysfunction). Conventionally, these network states can be represented as bivariate correlation matrices (which are formed from fMRI time series from multiple brain regions/nodes within any task window). Here, we treat these conventional connectivity matrices as connectivity terrains in order to recover local structure. In principle, any such terrain can be traversed node by node, where from any node, one can move towards its nearest functional neighbor (i.e., its maximally correlated node). In terrains with meaningful structure, such traversals across multiple nodes should converge to attractor nodes; here, the nodes that flow into a shared attractor form an attractor basin, which effectively is a sub-network within the system. Extant methods (e.g., degree distribution and characteristic path length) can summarize global network properties but cannot identify attractor nodes and basins. Here, we construct a new relation, called transitive maximal correlation (TMC) that can recover attractors and attractor basins in connectivity terrains. Node A is said to be transitively maximally correlated to node B if and only if B is an attractor into which A flows. We first develop the mathematical basis for deriving a TMC matrix TMC(M) from a bivariate correlation matrix M (before explaining this with hypothetical data). We next apply the TMC relation to connectivity terrains derived from real fMRI time series data, where these data were acquired in two distinct task-domains (that varied in their extent of cross-cerebral demand): i) associative learning and ii) visually guided motor control. 
We show that TMC is remarkably sensitive to inter-hemispheric structure in the connectivity terrain; here, attractor pairs that were inter-hemispheric homologues were more likely to be observed for the cross-cerebral learning task, than the more circumscribed motor-control data. We confirm the condition-specific sensitivity of TMC showing that observed attractor basins differed significantly across conditions of the learning task. Finally, we demonstrate that TMC complements graph theoretic constructions like path length and betweenness centrality. We suggest that TMC is a mathematically sound and novel method for capturing functional properties of brain networks.
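The traversal rule described above can be sketched directly: from each node, repeatedly step to its maximally correlated neighbour until the walk cycles; the terminal cycle is the attractor, and nodes reaching the same attractor form a basin. The correlation matrix below is a made-up example, not real fMRI connectivity:

```python
# Sketch of TMC-style attractor recovery from a correlation matrix.
# The 5x5 matrix is invented for illustration.

def attractor(M, start):
    """Follow max-correlation pointers from `start`; return the terminal cycle."""
    seen, node = [], start
    while node not in seen:
        seen.append(node)
        row = [(v if j != node else float('-inf')) for j, v in enumerate(M[node])]
        node = row.index(max(row))          # nearest functional neighbour
    return frozenset(seen[seen.index(node):])

M = [[1.0, 0.8, 0.1, 0.2, 0.7],
     [0.8, 1.0, 0.3, 0.1, 0.6],
     [0.1, 0.3, 1.0, 0.9, 0.2],
     [0.2, 0.1, 0.9, 1.0, 0.1],
     [0.7, 0.6, 0.2, 0.1, 1.0]]

basins = {}
for n in range(len(M)):
    basins.setdefault(attractor(M, n), []).append(n)
print(basins)
```

Here nodes 0 and 1 are mutually maximally correlated (an attractor pair), node 4 flows into their basin, and nodes 2 and 3 form a second basin, i.e., two sub-networks recovered from local structure that a global summary like mean degree would not expose.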
Vasylenko, L.; Livnat, A.
At the fundamental conceptual level, two alternatives have traditionally been considered for how mutations arise and how evolution happens: 1) random mutation and natural selection, and 2) Lamarckism. Recently, the theory of Interaction-based Evolution (IBE) has been proposed, according to which mutations are neither random nor Lamarckian, but are influenced by information accumulating internally in the genome over generations. Based on the estimation-of-distribution algorithms framework, we present a simulation model that demonstrates nonrandom, non-Lamarckian mutation concretely while capturing indirectly several aspects of IBE: selection, recombination, and nonrandom, non-Lamarckian mutation interact in a complementary fashion; evolution is driven by the interaction of parsimony and fit; and random bits do not directly encode improvement but enable generalization by the manner in which they connect with the rest of the evolutionary process. Connections are drawn to Darwin's observations that changed conditions increase the rate of production of heritable variation; to the causes of bell-shaped distributions of traits and how these distributions respond to selection; and to computational learning theory, where analogizing evolution to learning in accord with IBE casts individuals as examples and places the learned hypothesis at the population level. The model highlights the importance of incorporating internal integration of information through heritable change in both evolutionary theory and evolutionary computation.
Ringer McDonald, A.; Vazquez, A. V.
Developing scientific reading skills is critical for undergraduate STEM students due to scientific literature's unique formatting and use of specialized jargon. Generative AI tools such as ChatGPT offer students the ability to ask questions about what they are reading interactively. Previously, we reported the development of a ChatGPT-assisted reading guide that combined structured, active reading strategies with using ChatGPT to clarify unfamiliar words and concepts in real time. In the initial study, undergraduates found the use of the ChatGPT-assisted reading guide helpful in their understanding of the abstract and introduction of a journal article. Here, the ChatGPT-assisted reading guide was used in a journal club assignment for an undergraduate chemistry course. ChatGPT transcripts were analyzed for common types of interactions, and students were surveyed about their experience. Overall, students reported that using the ChatGPT-assisted reading guide was helpful in understanding the article and helped them have more productive class discussions. However, some students also expressed skepticism about using AI tools, citing concerns about the accuracy of AI-generated information and the effect of using AI on their own learning.
Wu, A.
We propose a computational instantiation of three cognitive stages from the Dot-Linear-Network (DLN) framework, grounded in a compression-efficiency thesis. DLN stages are characterized as graph-structured belief-dependency representations used to evaluate options: Dot as no persistent belief graph (reactive policies with negligible internal state), Linear as a null graph over option beliefs (K independent option estimates with no information sharing), and Network as shared latent structure (a bipartite factor graph in which F latent factors connect to K options), augmented by a temporal exposure state and an explicit structural learning cycle (hypothesis → test → update/expand). We distinguish two compression targets--option-factor structure (shared components in expected outcomes) and stakes-factor structure (shared drivers of consequence-bearing exposures)--whose intersection yields jointly efficient actions that simultaneously improve expected outcomes and marginal exposure impact. In a bandit-like simulation (100 seeds, K ∈ {20, 50, 100, 200}, F = 5), Network policies dominate Linear policies in cost-adjusted utility at large K, with the empirical crossover occurring much earlier than an analytic cost-only prediction (K* = F + c_meta/c_param), revealing that the advantage is primarily statistical (shrinkage-like estimation gains from factor pooling) rather than purely computational. Under stakes, all non-DLN agents--including Linear-Plus agents with identical factor structure and Network-standard agents with hierarchical Bayesian learning--collapse due to unmodeled cumulative exposure, while Network-DLN maintains positive utility. Within-stage consistency tests (two algorithmically distinct agents per stage) confirm that the collapse pattern is determined by representational topology, not algorithmic choice.
These results evaluate internal consistency of a DLN-to-computation mapping under explicit assumptions; they do not validate a developmental theory in humans.
Frost, H. R.
We describe an approach for analyzing biological networks using rows of the Krylov subspace of the adjacency matrix. Specifically, we explore the scenario where the Krylov subspace matrix is computed via power iteration using a non-random and potentially non-uniform initial vector that captures a specific biological state or perturbation. In this case, the rows of the Krylov subspace matrix (i.e., Krylov trajectories) carry important functional information about the network nodes in the biological context represented by the initial vector. We demonstrate the utility of this approach for community detection and perturbation analysis using the C. elegans neural network.
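The construction is straightforward to sketch: starting from a state vector v encoding a perturbation, power iteration builds the columns [v, Av, A²v, ...], and node i's trajectory is row i across those columns. The small directed graph below is illustrative, not the C. elegans network:

```python
# Krylov trajectories from a non-uniform initial vector: build [v, Av, A^2 v, ...]
# by repeated matrix-vector products, then read one trajectory (row) per node.
# The 4-node adjacency matrix and initial perturbation are toy examples.

def krylov_trajectories(A, v, depth):
    cols = [v[:]]
    for _ in range(depth):
        v = [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        cols.append(v[:])
    # transpose: trajectory of node i across the Krylov columns
    return [[col[i] for col in cols] for i in range(len(A))]

A = [[0, 1, 0, 0],      # toy directed adjacency matrix
     [0, 0, 1, 1],
     [1, 0, 0, 0],
     [0, 0, 1, 0]]
v0 = [1, 0, 0, 0]        # perturbation localised at node 0
traj = krylov_trajectories(A, v0, 3)
print(traj)
```

Nodes with similar trajectories respond similarly to the encoded perturbation, which is what makes the rows usable for community detection and perturbation analysis.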