Entropy — Latest Matching Preprints

1

Exact p-values for global network alignments via combinatorial analysis of shared GO terms

Hayes, W. B.

2020-10-09 molecular biology 10.1101/2020.10.08.332254 medRxiv

Top 0.1%

19.1%

Show abstract

Network alignment aims to uncover topologically similar regions in the protein-protein interaction (PPI) networks of two or more species under the assumption that topologically similar regions perform similar functions. Although there exist a plethora of both network alignment algorithms and measures of topological similarity, currently no "gold standard" exists for evaluating how well either is able to uncover functionally similar regions. Here we propose a formal, mathematically and statistically rigorous method for evaluating the statistical significance of shared GO terms in a global, 1-to-1 alignment between two PPI networks. We use combinatorics to precisely count the number of possible network alignments in which k proteins share a particular GO term. When divided by the number of all possible network alignments, this provides an explicit, exact p-value for a network alignment with respect to a particular GO term.

2

Outcome of Crash Course Training on Protein Structure Prediction with Artificial Intelligence

Balamurugan, D.; Dougherty, M.; Lubin, J.; Arias, P.; Chang, J.; Dalenberg, K.; Kholodovych, V.; Zelzion, E.; Khare, S.; von Oehsen, J. B.; Zwick, M. E.; Burley, S. K.

2022-09-03 scientific communication and education 10.1101/2022.09.01.506222 medRxiv

Top 0.1%

18.6%

Show abstract

Protein structure predictions have broad impact on several science disciplines such as biology, bioengineering, and medical science. AlphaFold2[1] and RoseTTAFold[2] are the current state-of-the-art AI methods to predict the structures of proteins with an accuracy comparable to lower-resolution experimental methods. In its 2021 year review, both these methods were recognized as "breakthrough of the year" by Science magazine[3] and "method of the year" by Nature magazine [4]. It is timely and important to provide training and support on these emerging methods. Our crash course "Enabling Protein Structure Prediction with Artificial Intelligence "was conducted in collaboration with domain experts and research computing professionals. The crash course was well received by the community as there were 750 registrants from all over the world. Here we provide the summary of the crash course, describe our findings in organizing the crash course, and explain what preparation steps helped us with the hands-on training. CCS CONCEPTSComputing methodologies a Machine learning a Machine learning approaches a Bio-inspired approaches

3

Experimental evaluation of thermodynamic cost and speed limit in living cells via information geometry

Ashida, K.; Aoki, K.; Ito, S.

2020-11-30 biophysics 10.1101/2020.11.29.403097 medRxiv

Top 0.1%

17.4%

Show abstract

Chemical reactions are responsible for information processing in living cells, and thermodynamic trade-off relations can explain their accuracy and speed. Its experimental test in living cells had not existed despite its importance because it is hard to justify sample size sufficiency. This paper reports the first experimental test of the thermodynamic trade-off relation, namely the thermodynamic speed limit, in living systems at the single-cell level where the sample size is relatively small. Due to the information-geometric approach, we can demonstrate the thermodynamic speed limit for the extracellular signal-regulated kinase phosphorylation using time-series fluorescence imaging data. Our approach quantifies the intrinsic speed of cell proliferation and can potentially apply other signal transduction pathways to detect their information processing speed. One-Sentence SummaryExperimental measurement of information thermodynamic speed by fluorescence imaging in living cells

4

Coherent Gene Assemblies: Example, Yeast Cell Division Cycle, CDC

Sirovich, L.

2021-09-05 molecular biology 10.1101/2021.09.05.459023 medRxiv

Top 0.1%

15.0%

Show abstract

A fresh approach to the dynamics of gene assemblies is presented. Central to the exposition are the concepts of: high value genes; correlated activity; and the orderly unfolding of gene dynamics; and especially dynamic mode decomposition, DMD, a remarkable new tool for dissecting dynamics. This program is carried out, in detail, for the Orlando et al yeast database (Orlando et al. 2008). It is shown that the yeast cell division cycle, CDC, requires no more than a six dimensional space, formed by three complex temporal modal pairs, each associated with characteristic aspects of the cell cycle: (1) A mother cell cohort that follows a fast clock; (2) A daughter cell cohort that follows a slower clock; (3) inherent gene expression, unrelated to the CDC. A derived set of sixty high-value genes serves as a model for the correlated unfolding of gene activity. Confirmation of our results comes from an independent database, and other considerations. The present analysis, leads naturally, to a Fourier description, for the sparsely sampled data. From this, resolved peak times of gene expression are obtained. This in turn leads to prediction of precise times of expression in the unfolding of the CDC genes. The activation of each gene appears as uncoupled dynamics from the mother and daughter cohorts, of different durations. These deliberations lead to detailed estimates of the fraction of mother and daughter cells, specific estimates of their maturation periods, and specific estimates of the number of genes in these cells. An algorithmic framework for yeast modeling is proposed, and based on the new analyses, a range of theoretical ideas and new experiments are suggested. A Supplement contains additional material and other perspectives.

5

Spectral theory of stochastic gene expression: a Hilbert space framework

Wu, B.; Grima, R.; Jia, C.

2025-05-23 molecular biology 10.1101/2025.05.19.654994 medRxiv

Top 0.1%

14.9%

Show abstract

A survey of the literature reveals notable discrepancies among the purported exact results for the spectra of stochastic gene expression models. For self-repressing gene circuits, previous studies ([Phys. Rev. Lett. 99, 108103 (2007)], [Phys. Rev. E 83,062902 (2011)], [J. Chem. Phys. 160, 074105 (2024)], and [bioRxiv 2025.02.05.635946 (2025)]) have provided different exact solutions for the eigenvalues of the generator matrix. In this work, we propose a unified Hilbert space framework for the spectral theory of stochastic gene expression. Based on this framework, we analytically derive the spectra for models of constitutive, bursty, and autoregulated gene expression. The eigenvalues and eigenvectors obtained are then used to construct an exact spectral representation of the time-dependent distribution of gene product numbers. The spectral gap between the zero eigenvalue and the first nonzero eigenvalue, which reflects the relaxation rate of the system towards its steady state, is then compared with the prediction of the deterministic model, and we find that deterministic modeling fails to capture the relaxation rate when autoregulation is strong. In particular, our results demonstrate that for infinite-dimensional operators such as in stochastic gene expression models, many conclusions in linear algebra do not apply, and one must rely on the modern theory of functional analysis.

6

The Area Law of Molecular Entropy: Moving Beyond Harmonic Approximation.

Roy, A.; Venkatraman, V.; Ali, T.

2024-03-17 biophysics 10.1101/2024.03.16.585357 medRxiv

Top 0.1%

14.7%

Show abstract

Inspired by black hole thermodynamics, the area law that entropy is proportional to horizon area has been proposed in quantum entanglement entropy and has largely maintained its validity. This article shows that the area law is also valid for the thermodynamic entropy of molecules. We showed that the gas-phase entropy of molecules obeys the area law with our proposed correction for the different curvatures of the molecular surface. The coefficient for the ultraviolet cutoff for the molecular entropy, calculated from our curated experimental data, is tantalizingly close to the value [Formula] proposed by Hawking [Hawking, 1976]. The ability to estimate gas-phase entropy by the area law also allows us to calculate molecular entropy faster and more accurately than currently popular methods of estimating molecular entropy with harmonic oscillator approximation. The speed and accuracy of our method will open up new possibilities for the explicit inclusion of entropy in computational biology methods, such as virtual screening applications.

7

A digitization theory of the Weber-Fechner law

Choe, H.

2021-03-20 neuroscience 10.1101/2021.03.15.435555 medRxiv

Top 0.1%

14.3%

Show abstract

Ever since the publication of Shannons article about information theory, there have been many attempts to apply information theory to the neuroscience field. Meanwhile, the Weber- Fechner law of psychophysics states that the magnitude of a subjective sensation of a person increases in proportion to the logarithm of the intensity of the external physical-stimulus. It is hardly surprising that we assign the amount of information to the response in the Weber- Fechner law. But, to date no one has succeeded in applying information theory directly to that law: the direct links between information theory and that response in the Weber-Fechner law have not yet been found. The proposed theory unveils a link between information theory and that response, and differs subtly from the field such as neural coding that involves complicated calculations and models. Because my theory targets the Weber-Fechner law which is a macroscopic phenomenon, this theory does not involve complicated calculations. My theory is expected to mark a new era in the fields of sensory perception research. My theory must be studied in parallel with the fields of microscopic scale such as neural coding. This article ultimately aims to provide the fundamental concepts and their applications so that a new field of research on stimuli and responses can be created.

8

Data Reuse and the Social Capital of Open Science

Alicea, B.

2019-10-01 scientific communication and education 10.1101/093518 medRxiv

Top 0.1%

10.7%

Show abstract

Participation in open data initiatives require two semi-independent actions: the sharing of data produced by a researcher or group, and a consumer of shared data. Consumers of shared data range from people interested in validating the results of a given study to people who actively transform the available data. These data transformers are of particular interest because they add value to the shared data set through the discovery of new relationships and information which can in turn be shared with the same community. The complex and often reciprocal relationship between producers and consumers can be better understood using game theory, namely by using three variations of the Prisoners Dilemma (PD): a classical PD payoff matrix, a simulation of the PD n-person iterative model that tests three hypotheses, and an Ideological Game Theory (IGT) model used to formulate how sharing strategies might be implemented in a specific institutional culture. To motivate these analyses, data sharing is presented as a trade-off between economic and social payoffs. This is demonstrated as a series of payoff matrices describing situations ranging from ubiquitous acceptance of Open Science principles to a community standard of complete non-cooperation. Further context is provided through the IGT model, which allows from the modeling of cultural biases and beliefs that influence open science decision-making. A vision for building a CC-BY economy are then discussed using an approach called econosemantics, which complements the treatment of data sharing as a complex system of transactions enabled by social capital.

9

Emerging Concern of Scientific Fraud: Deep Learning and Image Manipulation

Qi, C.; Zhang, J.; Luo, P.

2021-01-17 scientific communication and education 10.1101/2020.11.24.395319 medRxiv

Top 0.1%

10.4%

Show abstract

Scientific fraud by image duplications and manipulations within western blot images is a rising problem. Currently, problematic western blot images are mainly detected by checking repeated bands or through visual observation. However, the completeness of the above methods in detecting problematic images has not been demonstrated. Here we show that Generative Adversarial Nets (GANs) can generate realistic western blot images that indistinguishable from real western blots. The overall accuracy of researchers for identifying synthetic western blot images is 0.52, which almost equal to blind guess (0.5). We found that GANs can generate western blot images with bands of the expected lengths, widths, and angles in desired positions that can fool researchers. For the case study, we find that the accuracy of detecting the synthetic western blot images is related to years of researchers performed studies relevant to western blots, but there was no apparent difference in accuracy among researchers with different academic degrees. Our results demonstrate that GANs can generate fake western blot images to fool existing problematic image detection methods. Therefore, more information is needed to ensure that the western blots appearing in scientific articles are real. We argue to require every western blot image to be uploaded along with a unique identifier generated by the laboratory machine and to peer review these images along with the corresponding submitted articles, which may reduce the incidence of scientific fraud.

10

Heterogeneity versus the COVID-19 Pandemic

Shanmugam, R.; Ledlow, G.; Singh, K. P.

2021-01-06 scientific communication and education 10.1101/2021.01.06.425543 medRxiv

Top 0.1%

10.4%

Show abstract

In this paper, heterogeneity is formally defined, and its properties are explored. We define and distinguish observable versus non-observable heterogeneity. It is proposed that heterogeneity among the vulnerable is a significant factor in the contagion impact of COVID-19, as demonstrated with incidence rates on a Diamond Princess Cruise ship in February 2020. Given the nature of the disease, its heterogeneity and human social norms, pre-voyage and post-voyage quick testing procedures may become the new standard for cruise ship passengers and crew. The technological advances in testing available today would facilitate more humanistic treatment as compared to more archaic quarantine and isolation practices for all onboard ship. With quick testing, identification of those infected and thus not allowed to embark on a cruise or quarantining those disembarking and other mitigation strategies, the popular cruise adventure could be available safely again. Whatever the procedures implemented, the methodological purpose of this study should add valuable insight in the modeling of disease and specifically, the COVID-19 virus.

11

A Biological Signature of Quantum Gravity?

Lone, I.

2024-09-26 biophysics 10.1101/2024.09.25.614787 medRxiv

Top 0.1%

10.0%

Show abstract

In a recent proposal on the experimental tests of quantum gravity creation of non-Gaussianity in a Bose-Einstein condensate (BEC) has been suggested as a decisive confirmation of quantum gravity. In a related proposal, a gas of ultracold Rb or Cs atoms has previously been suggested as a possible platform for tests of quantum gravity. Since a practical demonstration of above proposals is a very challenging and costly affair, exploring cost-effective alternatives to these technologically demanding experimental protocols becomes very important. We here show that the phenomenon of Bicoid (Bcd) gradient formation in the early fruit fly embryo, considered basically here as a multipartite quantum system with an ensemble of initial states and a unitary evolution U that implements a quantum Newtonian Hamiltonian over this gravitationally interacting system, naturally combines the essential features of above proposals in a single system giving a viable signature of quantum gravity through the creation of non-Gaussianity. We conclude that although the phenomenon of Bcd gradient formation in the early Drosophila embryo is accompanied by quantum gravitational effects, it might need further experiments to verify such a noval claim.

12

Impact and centrality of scientific disciplines

Aguilar-Velazquez, D.

2023-09-05 scientific communication and education 10.1101/2023.09.01.555991 medRxiv

Top 0.1%

8.9%

Show abstract

The Scimago Journal Rank (SJR) is a metric that captures the centrality of a journal across an all-discipline article network, while the impact factor (IF) is the average incoming citations of a journal. We analyzed SJRs and IFs of the journals belonging to the SJR first quartile from 2013 to 2020 in 7 disciplines: mathematics, biology, physics, medicine, social sciences, chemistry, and engineering. We show that biology is the most central discipline, followed by physics and chemistry. These three disciplines also present the highest IFs. Mathematics journals display a low IF (the second-lowest among disciplines), but possesses an intermediate centrality. While the average IF has increased over the last years, the SJR average has decreased. Gini coefficients show that SJR is a slightly more egalitarian metric than IF. We discuss some possible origins of these findings.

13

Ubiquitous Forbidden Order in R-group classified protein sequence of SARS-CoV-2 and other viruses

Pratibha, P.; Shaju, C.; Kamal, K.

2020-08-21 molecular biology 10.1101/2020.08.21.261289 medRxiv

Top 0.1%

8.7%

Show abstract

Each amino acid in a polypeptide chain has a distinctive R-group associated with it. We report here a novel method of species characterization based upon the order of these R-group classified amino acids in the linear sequence of the side chains associated with the codon triplets. In an otherwise pseudo-random sequence, we search for forbidden combinations of kth order. We applied this method to analyze the available protein sequences of various viruses including SARS-CoV-2. We found that these ubiquitous forbidden orders (UFO) are unique to each of the viruses we analyzed. This unique structure of the viruses may provide an insight into viruses chemical behavior and the folding patterns of the proteins. This finding may have a broad significance for the analysis of coding sequences of species in general.

14

A formal relation between two disparate mathematical algorithms is ascertained from biological circuit analyses

Liu, C.; Bowen, E. F.; Granger, R.

2025-04-03 neuroscience 10.1101/2025.03.28.645962 medRxiv

Top 0.1%

8.6%

Show abstract

We simulate and formally analyze the emergent operations from the specific anatomical layout and physiological activation patterns of a particular local excitatory-inhibitory circuit architecture that occurs throughout superficial layers of cortex. The circuit carries out two effective procedures on its inputs, depending on the strength of its local feedback inhibitory cells. Both procedures can be formally characterized in terms of well-studied statistical operations: clustering, and component analyses, under high-feedback-inhibition and low-feedback-inhibition conditions, respectively. The detailed nature of these clustering and component procedures is studied in the context of extensive related literatures in statistics, machine learning, and computational neuroscience. The two operations (clustering and component analysis) have not previously been shown to contain deep connections, let alone to each be derivable from a single overarching algorithmic precursor. The identification of this deep formal mathematical connection, which arose from analysis of a detailed biological circuit, represents a rare instance of novel mathematical relations arising from biological analyses. 1

15

Selective attention in hypothesis-driven data analysis

Yanai, I.; Lercher, M.

2020-07-31 scientific communication and education 10.1101/2020.07.30.228916 medRxiv

Top 0.1%

8.5%

Show abstract

When analyzing the results of an experiment, the mental focus on a specific hypothesis might prevent the exploration of other aspects of the data, effectively blinding one to new ideas. To test this notion, we performed an experiment in which we asked undergraduate students to analyze a fictitious dataset. In addition to being asked what they could conclude from the dataset, half of the students were asked to also test specific hypotheses. In line with our notion, students in the hypothesis-free group were almost 5 times more likely to observe an image of a gorilla when simply plotting the data, a proxy for an initial step towards data analysis. If these findings are representative also of scientific research as a whole, they warrant concern about the current emphasis on hypothesis-driven research, especially in the context of information-rich datasets such as those now routinely created in the biological sciences. Our work provides evidence for a link between the psychological effect of selective attention and hypothesis-driven data analysis, and suggests a hidden cost to having a hypothesis when analyzing a dataset.

16

Analyzing the link between RNA secondary structures and R-loop formation with tree polynomials

Liu, P.; Lusk, J.; Jonoska, N.; Vazquez, M.

2024-02-01 molecular biology 10.1101/2023.09.24.559224 medRxiv

Top 0.1%

8.4%

Show abstract

R-loops are a class of non-canonical nucleic acid structures that typically form during transcription when the nascent RNA hybridizes the DNA template strand, leaving the DNA coding strand unpaired. Co-transcriptional R-loops are abundant in nature and biologically relevant. Recent research shows that DNA sequence and topology affect R-loops, yet it remains unclear how these and other factors drive R-loop formation. In this work, we investigate a link between the secondary structure of the nascent RNA and the probability of R-loop formation. We introduce tree-polynomial representations, a class of mathematical objects that enable accurate and efficient data analysis of RNA secondary structures. With tree-polynomials, we establish a strong correlation between the secondary structure of the RNA transcript and the probability of R-loop formation. We identify that branches with short stems separated by multiple bubbles in the RNA secondary structure are associated with the strong correlation and are predictive of R-loop formation.

17

CHANging Consciousness Epistemically (CHANCE): A method to fully know the content of consciousness of other individuals in scientific experiments

Tanaka, D. H.; Tanabe, T.

2019-07-16 neuroscience 10.1101/495523 medRxiv

Top 0.1%

8.4%

Show abstract

The content of consciousness (cC) constitutes an essential part of human life and is at the very heart of the hard problem of consciousness. The cC of a person (e.g., study participant) has been examined indirectly by evaluating the persons behavioral reports, bodily signs, or neural signals. However, the measures do not reflect the full spectrum of the persons cC. In this paper, we define a method, called "CHANging Consciousness Epistemically" (CHANCE), to consciously experience a cC that would be identical to that experienced by another person, and thus directly know the entire spectrum of the others cC. In addition, the ontologically subjective knowledge about a persons cC may be considered epistemically objective and scientific data. The CHANCE method comprises two empirical steps: (1) identifying the minimally sufficient, content-specific neural correlates of consciousness (mscNCC) and (2) reproducing a specific mscNCC in different brains.

18

Evolutionary Distance of Gene-Gene Interactions: Estimation under Statistical Uncertainty

Gu, X.

2020-03-09 evolutionary biology 10.1101/2020.03.08.982710 medRxiv

Top 0.1%

7.4%

Show abstract

Consider the functional interaction of gene A to an interaction subject X; for instance, it is the gene-gene interaction if X represents for a gene, or gene-tissue interaction (expression status) if X for a tissue. In the simplest case, the status of this A-X interaction is r=1 if they are interacted, or r=0 otherwise. A fundamental problem in molecular evolution is, given two homologous (orthologous or paralogous) genes A and B, to what extent their functional overlapping could be by the means of interaction networks. Given a set of interaction subjects (X1, ... XN), it is straightforward to calculate the interaction distance (IAB) between genes A and B, by a Markov-chain model. However, since the high throughput interaction data always involve a high level of noises, reliable inference of r=1 or r=0 for each gene remains a big challenge. Consequently, the estimated interaction distance (IAB) is highly sensitive to the cutoff of interaction inference which is subject to some arbitrary. In this paper we will address this issue by developing a statistical method for estimating IAB based on the p-values (significant levels). Computer simulations are carried out to evaluate the performance of different p-value transformations against the uncertainty of interaction networks.

19

Multiversal SpaceTime (MSpaceTime) Not Neural Network as Source of Intelligence in Generalized Quantum Mechanics, Extended General Relativity, Darwin Dynamics for Artificial Super Intelligence Synthesis

Zhang, Y.

2019-11-29 neuroscience 10.1101/858423 medRxiv

Top 0.1%

7.3%

Show abstract

From Synthesis perspective, whether Logic Synthesis, Physical Synthesis, Chemical Synthesis, or Biological Synthesis, Physical Geometry such as Universal Geometry and Quantum Geometry, and Biological Geometry like Conformal Geometry supported by Tensors and Manifolds, are the outcome of physical laws and biological laws in modeling non-linear physical and biological dynamics as opposed to traditional partial differential/difference equation way. We discover that Multiversal SpaceTime instead of Neural Network, governing physical and biological world at macroscopic and microscopic level, is the ultimate source of intelligence. With that we propose Multiversal Synthesis-based Artificial Design Automation (ADA), a bio-physical inspired model based on Multiverse in Darwin Dynamics, Generalized Quantum Mechanics, and Extended General Relativity, for Artificial Super Intelligence (ASI) implementation. Based on Schrodinger Equation of Quantum Mechanics, we generalize the 4-Dimensional Hilbert Space based Discrete Quantum SpaceTime to N-Dimensional (1 << N < M, with M is limited by Planck Length) Hilbert Space based Discrete MSpaceTime as part of MSpaceTime, in modeling both Micro-Environment Intelligence and Micro-Agent Intelligence of ASI; likewise based on Einstein Equations of General Relativity, we make a T-Symmetry extension first, and then extend the 4-Dimensional Pseudo-Riemannian Manifold based Continuous Curved SpaceTime as part of MSpaceTime to N-Dimensional (1 << N < {infty}) Pseudo-Riemannian Manifold based Continuous MSpaceTime extension, in modeling both Macro-Environment Intelligence and Macro-Agent Intelligence of ASI. Our discovery only solves the black box puzzle of AI, but also paves the way in achieving ASI through ADA. Of course, our Multiverse Endeavor will never stop from there.

20

Topological Data Analysis of Protein Structure Manifolds from Molecular Dynamics Computer Simulation

Sino, M.; Kamberaj, H.

2025-07-14 biophysics 10.1101/2025.07.12.664527 medRxiv

Top 0.1%

7.2%

Show abstract

The analysis of computer simulation data requires efficient statistical and computational approaches, based on well-established theoretical frameworks. This study aims to introduce such approaches for topological data analysis within the persistent homology framework and to describe the manifold of the protein structure dynamics within the differential geometry of the directed graphs framework. Furthermore, the asymmetric kernel-directed graphs determined by the transfer entropy will describe the information flow in this manifold. The primary goal is to characterise changes in the topology of the protein structure due to the mutations. Moreover, this study aims to define the embedded manifold of dimension m of the amino acid sequence interaction network using the graphs Laplacian matrix for determining the local embedded vector fields and coordinate vectors in this manifold for each amino acid as the vertices of either a directed or undirected graph. Furthermore, this study strives to show that encoding the amino acid sequence information in an m-dimensional manifold is statistically efficient by decoding that information in a much lower-dimensional space. Then, using the topological data analysis, we can observe protein structure dynamics changes in a multidimensional manifold, for example, due to amino acid mutations. The analysis showed that short equilibrium structure fluctuations at a few nanoseconds enable the construction of such a manifold. As a case study, the influence of the mutation of the two disulphide bridges on the three-dimensional structure of the Bovine Pancreatic Trypsin Inhibitor protein is investigated.