
SoftwareX

Elsevier BV

Preprints posted in the last 30 days, ranked by how well they match SoftwareX's content profile, based on 15 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
ARACRA: Automated RNA-seq Analysis for Chemical Risk Assessment

Sharma, S.; Kumar, S.; Brull, J. B.; Deepika, D.; Kumar, V.

2026-04-09 bioinformatics 10.64898/2026.04.07.716912 medRxiv
Top 0.1%
2.4%

Transcriptomic analysis is a powerful approach for biomarker discovery; however, extracting meaningful biological insights from large-scale omics datasets remains a challenge for biologists. To address this gap, we present ARACRA, a fully automated RNA-seq analysis pipeline covering the entire transcriptomics workflow, from raw FASTQ files to the transcriptomic Point of Departure (tPoD), with a human-in-the-loop review process. The analysis is performed in two phases: Phase 1 carries out acquisition of raw reads, pre-alignment quality control, alignment to a reference genome, and quantification of gene expression, while Phase 2 performs statistical analysis, including differential gene expression analysis and dose-response modelling. The two phases are separated by an extensive quality-control step that lets the user visually inspect the quality of the processed data and helps filter out noise and outlier samples. ARACRA facilitates end-to-end analysis of RNA-seq data through an interactive web-based application built on Nextflow and Streamlit, minimizing computational complexity while ensuring correct downstream processing. Availability and implementation: ARACRA is freely available on GitHub under the MIT License, together with a Streamlit-based web application. Researchers can use the demo data or upload their own data for analysis.

Fig 1: Overall Architecture of ARACRA

2
fishROI: A specialized workflow for semi-automated muscle morphometry analysis in teleosts

Lu, Y.; Pan, M.; Jamwal, V.; Locop, J.; Ruparelia, A. A.; Currie, P. D.

2026-03-30 cell biology 10.64898/2026.03.27.714781 medRxiv
Top 0.1%
2.4%

Quantitative histological analysis of skeletal muscle morphometry provides critical insights into muscle physiology but remains labor-intensive and technically demanding. While recent developments in machine-learning-based image segmentation techniques have facilitated large-scale tissue analysis, existing tools that automate muscle morphometry analysis are largely tailored to mammalian models, with limited applicability to teleosts. Moreover, there is a lack of effective tools for visualizing spatial organization and morphometric variability of teleost muscle fibers, a feature that is important for understanding hyperplastic muscle growth dynamics in teleosts. In this study, we show that cytoplasmic staining combined with deep learning-based cell segmentation offers a robust and accurate approach for automated muscle morphometry analysis in developing zebrafish. We also introduce a FIJI plugin, implemented in Jython, that streamlines both morphometric analysis and visualization. This tool accommodates shallow and deep learning-based segmentation techniques and incorporates novel quantification and visualization methods suited to teleost-specific muscle features, including mosaic hyperplasia dynamics. The plugin features an intuitive graphical user interface and is designed for flexibility, with minimal constraints regarding species, image quality, or staining protocol. Its modular architecture allows it to be used as a baseline for automated muscle morphometry analysis, while permitting integration with other tools and workflows.
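
The core morphometric readout described above, per-fiber cross-sectional area and its variability, can be sketched in a few lines. This is a generic illustration working from a labeled segmentation mask, not the plugin's Jython implementation; the mask layout and pixel size are hypothetical.

```python
from collections import Counter
import statistics

def fiber_areas(label_mask, pixel_size_um=1.0):
    """Compute per-fiber cross-sectional areas from a labeled mask.

    label_mask: 2D list of ints; 0 = background, k > 0 = fiber id.
    Returns {fiber_id: area in um^2}.
    """
    counts = Counter(px for row in label_mask for px in row if px > 0)
    scale = pixel_size_um ** 2
    return {fid: n * scale for fid, n in counts.items()}

def area_cv(areas):
    """Coefficient of variation of fiber areas, a simple proxy for the
    size heterogeneity produced by mosaic hyperplasia."""
    vals = list(areas.values())
    return statistics.pstdev(vals) / statistics.mean(vals)

mask = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
areas = fiber_areas(mask, pixel_size_um=0.5)
# fiber 1 covers 4 px, fiber 2 covers 3 px, at 0.25 um^2 per pixel
print(areas)  # {1: 1.0, 2: 0.75}
```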

3
BrightEyes-FFS: an open-source platform for comprehensive analysis of fluorescence fluctuation spectroscopy experiments with small detector arrays

Slenders, E.; Perego, E.; Zappone, S.; Vicidomini, G.

2026-04-10 bioinformatics 10.64898/2026.04.08.717207 medRxiv
Top 0.2%
1.7%

Fluorescence fluctuation spectroscopy (FFS) is an ensemble of techniques for quantitative measurement of molecular dynamics and interactions. Recently, the introduction of small-format array detectors has opened up a new range of spatiotemporal information, allowing for more detailed analysis of system kinetics. However, there is currently no open-source software available for analyzing the high-dimensional FFS data sets. We present BrightEyes-FFS, an open-source Python-based environment for FFS analysis with array detectors. The environment includes a Python package for reading raw FFS data, computing auto- and cross-correlations using various algorithms, and fitting the correlations to several models. A graphical user interface (GUI), available as a standalone executable, makes the analysis fast and user-friendly. An automated Jupyter Notebook writing tool enables transition from the GUI to Jupyter Notebook for custom analysis. We believe that BrightEyes-FFS will enable a wider community to study diffusion, flow, and interaction dynamics.
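
The correlation analysis at the heart of FFS can be illustrated directly. The sketch below computes the normalized autocorrelation G(tau) = <dF(t) dF(t+tau)> / <F>^2 from a raw intensity trace by brute force; BrightEyes-FFS itself provides several (and far faster) correlation algorithms, so this is illustrative only.

```python
def autocorrelation(trace, max_lag):
    """Normalized fluorescence autocorrelation G(tau) for integer lags.

    G(tau) = <dF(t) * dF(t + tau)> / <F>^2, with dF = F - <F>.
    """
    n = len(trace)
    mean = sum(trace) / n
    dF = [f - mean for f in trace]
    out = []
    for lag in range(max_lag + 1):
        pairs = [dF[t] * dF[t + lag] for t in range(n - lag)]
        out.append(sum(pairs) / len(pairs) / mean ** 2)
    return out

# A perfectly alternating trace anti-correlates at lag 1:
g = autocorrelation([2, 4, 2, 4], max_lag=1)
print(g)  # [0.1111..., -0.1111...]
```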

4
BioDCASE: Using data challenges to make community advances in computational bioacoustics

Stowell, D.; Nolasco, I.; McEwen, B.; Vidana Vila, E.; Jean-Labadye, L.; Benhamadi, Y.; Lostanlen, V.; Dubus, G.; Hoffman, B.; Linhart, P.; Morandi, I.; Cazau, D.; White, E.; White, P.; Miller, B.; Nguyen Hong Duc, P.; Schall, E.; Parcerisas, C.; Gros-Martial, A.; Moummad, I.

2026-04-06 animal behavior and cognition 10.64898/2026.04.02.716062 medRxiv
Top 0.2%
1.7%

Computational bioacoustics has seen significant advances in recent decades. However, the rate of insights from automated analysis of bioacoustic audio lags behind the rate at which data are collected, due to key capacity constraints in data annotation and bioacoustic algorithm development. Gaps in analysis methodology persist not because they are intractable, but because of resource limitations in the bioacoustics community. To bridge these gaps, we advocate the open science method of data challenges, structured as public contests. We conducted a bioacoustics data challenge named BioDCASE within the format of an existing event (DCASE). In this work we report on the procedures needed to select and then conduct useful bioacoustics data challenges. We consider aspects of task design such as dataset curation, annotation, and evaluation metrics. We report the three tasks included in BioDCASE 2025 and the resulting progress made. Based on this, we make recommendations for open community initiatives in computational bioacoustics.

5
A Web Application for Exploring the Distribution of Academic Publications Across Geography and Institutions in India

Hou, Y.; Cohen, E.; Higginbottom, J.; Rountree, L.; Ren, Y.; Wahl, B.; Nyhan, K.; Mukherjee, B.

2026-03-20 health informatics 10.64898/2026.03.18.26348755 medRxiv
Top 0.2%
1.6%

India's national research capacity and infrastructure are unevenly distributed across states and union territories (UTs), contributing to geographic variation in academic publication output. We developed Indiapub, an open-access web application that quantitatively enumerates and visually displays geographic and temporal publication patterns for research products with at least one author affiliated with an Indian institution, using OpenAlex data. The app is designed for ease of use, with automated data retrieval, cleaning, and aggregation. Indiapub allows users to filter publications by topic, publication year range, author position, publication type, minimum citation count, state/UT, and population size of the state/UT where the author institution is located. The app also provides downloadable tables and ranked institution lists by publication count. Its interactive dashboard includes five modules: (i) a map of publication distribution, (ii) time trend plots for nation and state/UT, (iii) publication-share versus population-share plots highlighting over- and underrepresentation, (iv) stacked bar charts of state/UT contributions over time with population benchmarks, and (v) bubble plots relating the Human Development Index to publication volume over time. This tool may support resource prioritization and identification of institutional strengths for trainees, researchers, higher education administrators, and policymakers. To illustrate its utility, we present sample findings derived from the app. For publications across all topics from 2014 to 2025, the largest research participation footprints were observed in Tamil Nadu, Maharashtra, Delhi, Uttar Pradesh, and Karnataka. Tamil Nadu and Delhi were home to three of the highest-publishing institutions nationally: Vellore Institute of Technology, All India Institute of Medical Sciences, and Indian Institute of Technology Delhi. 
We also examined six curated case studies of broad scientific interest: electronic health records (EHR), genome-wide association studies (GWAS), artificial intelligence (AI), development economics, environmental science, and COVID-19. Findings from these case studies revealed over- and underrepresentation in publication output across states and UTs. For example, in EHR publications among high-population states, Tamil Nadu's publication share exceeded its population share by 31.3 percentage points (pp), whereas Bihar's was 12.8 pp lower. Our tool offers insights into India's research landscape across states and UTs with easy-to-digest visuals. Such interactive tools have the potential to serve as a starting point for fostering a more inclusive research ecosystem supporting targeted research policy and planning.
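
The over/underrepresentation statistic quoted in the case studies (e.g. Tamil Nadu's +31.3 pp in EHR publications) is simply the difference between a state's publication share and its population share, in percentage points. A minimal sketch with made-up numbers:

```python
def share_gap_pp(pubs, pops, state):
    """Publication share minus population share, in percentage points (pp).

    pubs, pops: dicts mapping state/UT -> publication count / population.
    Positive = overrepresented relative to population; negative = under.
    """
    pub_share = 100 * pubs[state] / sum(pubs.values())
    pop_share = 100 * pops[state] / sum(pops.values())
    return pub_share - pop_share

# Illustrative counts for three hypothetical states:
pubs = {"A": 600, "B": 300, "C": 100}
pops = {"A": 2_000_000, "B": 5_000_000, "C": 3_000_000}
print(round(share_gap_pp(pubs, pops, "A"), 1))  # 40.0  (60% of pubs, 20% of people)
```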

6
MOAflow: how re-designing a pipeline with Nextflow streamlines data analysis

Tartaglia, J.; Giorgioni, M.; Cattivelli, L.; Faccioli, P.

2026-03-30 bioinformatics 10.64898/2026.03.26.713914 medRxiv
Top 0.2%
1.3%

Background: Advances in high-throughput DNA sequencing technologies have dramatically reduced the time and cost required to generate genomic data. As sequencing is no longer a limiting factor, increasing attention must be paid to optimizing the analysis of the large-scale datasets produced. Efficient processing of such data is essential to reduce computational time and operational costs. In this context, workflow management systems (WMSs) have become key instruments for orchestrating complex bioinformatic pipelines. Among these systems, Nextflow has emerged as one of the most widely adopted solutions in bioinformatics. Methods: To improve scalability and computational efficiency, we employed Nextflow to re-design an existing pipeline dedicated to the analysis of MNase-defined cistrome-Occupancy (MOA-seq) data. The re-engineering process focused on modularizing the workflow and integrating containerization technologies to ensure reproducibility and easier deployment across heterogeneous computing environments. Results: The resulting workflow, named MOAflow, is a modernized and fully containerized pipeline for MOA-seq data analysis. With only Docker and Nextflow required, the pipeline guarantees high portability and reproducibility. The data from the original article were used to benchmark the new pipeline; its outputs closely match those of the original study, with minor variations. Conclusions: MOAflow demonstrates how the adoption of a robust WMS can substantially enhance the performance and usability of pre-existing bioinformatic pipelines. By leveraging containerization and Nextflow, it ensures consistent results across platforms while minimizing setup complexity. This work highlights the value of modern WMS-driven approaches in meeting today's computational demands.

7
Statistical Principles Define an Open-Source Differential Analysis Workflow for Mass Spectrometry Imaging Experiments with Complex Designs

Rogers, E. B. T.; Lakkimsetty, S. S.; Bemis, K. A.; Schurman, C. A.; Angel, P. A.; Schilling, B.; Vitek, O.

2026-04-10 bioinformatics 10.64898/2026.04.08.717212 medRxiv
Top 0.2%
1.2%

Mass spectrometry imaging (MSI) characterizes the spatial heterogeneity of molecular abundances in biological samples. Experiments with complex designs, involving multiple conditions and multiple samples, provide particularly useful insight into differential abundance of analytes. However, analyses of these experiments require attention to details such as signal processing, selection of regions of interest, and statistical methodology. This manuscript contributes a statistical analysis workflow for detecting differentially abundant analytes in MSI experiments with complex designs. Using a case study of histologic samples of human tibial plateaus from knees of osteoarthritis patients and cadaveric controls, as well as simulated datasets, we illustrate the impact of the analysis decisions. We illustrate the importance of signal processing and feature aggregation for preserving biological relevance and alleviating the stringency of multiple testing. We further demonstrate the importance of selecting regions of interest in ways that are compatible with differential analysis. Finally, we contrast several common statistical models for differential analysis, showcase the appropriate use of replication, and demonstrate model-based calculation of sample size for follow-up investigations. The discussion is accompanied by detailed recommendations and an open-source R-based implementation that can be followed by other investigations.
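
One concrete piece of the statistical methodology discussed, multiple-testing correction, can be illustrated with the standard Benjamini-Hochberg procedure. This is a generic sketch, not the paper's R workflow (which pairs correction with feature aggregation):

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values).

    For p-values sorted ascending, q at rank r is p_(r) * n / r,
    made monotone by taking a running minimum from the largest rank down.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):          # walk from largest p downward
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.002]))
```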

8
Introducing circStudio, a Python package for preprocessing, analyzing and modeling actigraphy data

Marques, D.; Barbosa-Morais, N. L.; Reis, C. C. P.

2026-04-01 bioinformatics 10.64898/2026.03.30.711342 medRxiv
Top 0.3%
1.1%

Actigraphy is a non-invasive and cost-effective method for monitoring behavioral rhythms under real-world conditions by collecting time-resolved measurements of locomotor activity, light exposure, and temperature. Although several open-source packages support specific aspects of actigraphy analysis, functionality such as preprocessing, metric calculation, and mathematical modeling is often distributed across separate software packages, limiting interoperability and increasing programming overhead. Here we introduce circStudio, a Python package that unifies actigraphy data processing and mathematical modeling of circadian rhythms within a single framework. Built from the pyActigraphy codebase and integrating circadian models from the Arcascope circadian package, circStudio provides flexible preprocessing tools, support for multiple actigraphy file formats through adaptor classes, standalone functions for computing commonly used actigraphy metrics, and implementations of several mathematical models of circadian rhythms. The package enables users to move efficiently from raw wearable data to physiologically interpretable circadian outputs. Ultimately, circStudio aims to facilitate reproducible workflows and to provide a flexible foundation for research applications across circadian biology, sleep science, and digital health.
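
As a flavor of the "commonly used actigraphy metrics" mentioned, here is a stdlib-only sketch of the standard nonparametric M10/L5 statistics (mean activity over the most active 10 and least active 5 consecutive hours). The function name and input layout are hypothetical, not circStudio's API:

```python
def most_least_active(hourly, window_active=10, window_rest=5):
    """M10 / L5: mean activity over the most active 10 and least active 5
    consecutive hours of a 24-h activity profile (wrapping past midnight)."""
    n = len(hourly)

    def window_mean(start, width):
        # circular window so a rest period spanning midnight is handled
        return sum(hourly[(start + k) % n] for k in range(width)) / width

    m10 = max(window_mean(s, window_active) for s in range(n))
    l5 = min(window_mean(s, window_rest) for s in range(n))
    return m10, l5

# Synthetic day: active 08:00-18:00, quiet otherwise (arbitrary counts)
hourly = [5] * 8 + [100] * 10 + [5] * 6
m10, l5 = most_least_active(hourly)
print(m10, l5)  # 100.0 5.0
```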

9
TRaP: An Open-source, Reproducible Framework for Raman Spectral Preprocessing across Heterogeneous Systems

Zhu, Y.; Lionts, M. M.; Haugen, E.; Walter, A. B.; Voss, T. R.; Grow, G. R.; Liao, R.; McKee, M. E.; Locke, A.; Hiremath, G.; Mahadevan-Jansen, A.; Huo, Y.

2026-03-27 bioengineering 10.64898/2026.03.26.714582 medRxiv
Top 0.3%
0.9%

Raman spectroscopy offers a uniquely rich window into molecular structure and composition, making it a powerful tool across fields ranging from materials science to biology. However, the reproducibility of Raman data analysis remains a fundamental bottleneck. In practice, transforming raw spectra into meaningful results is far from standardized: workflows are often complex, fragmented, and implemented through highly customized, case-specific code. This challenge is compounded by the lack of unified open-source pipelines and the diversity of acquisition systems, each introducing its own file formats, calibration schemes, and correction requirements. Consequently, researchers must frequently rely on manual, ad hoc reconciliation of processing steps. To address this gap, we introduce TRaP (Toolbox for Reproducible Raman Processing), an open-source, GUI-based Python toolkit designed to bring reproducibility, transparency, and portability to Raman spectral analysis. TRaP unifies the entire preprocessing-to-analysis pipeline within a single, coherent framework that operates consistently across heterogeneous instrument platforms (e.g., Cart, Portable, Renishaw, and MANTIS). Central to its design is the concept of fully shareable, declarative workflows: users can encode complete processing pipelines into a single configuration file (e.g., JSON), enabling others to reproduce results instantly without reimplementing code or reverse-engineering undocumented steps. Beyond convenience, TRaP integrates configuration management, x-axis calibration, spectral response correction, interactive processing, and batch execution into a workflow-driven architecture that enforces deterministic, repeatable operations. Every transformation is explicitly recorded, making the full processing history transparent, inspectable, and reproducible. This eliminates ambiguity in how results are generated and ensures that identical protocols can be applied consistently across datasets and experimental contexts. Through representative use cases, we show that TRaP enables seamless, reproducible preprocessing of Raman spectra acquired from diverse platforms within a unified environment. We hope TRaP can establish Raman data processing as a reproducible, shareable, and systematized scientific practice, aligning it with modern standards for computational research. TRaP is released as open-source software at https://github.com/hrlblab/TRaP.
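
The declarative-workflow idea is easy to sketch: a JSON config names processing steps, and a small runner applies them in order while recording the history. The step names and config schema below are hypothetical, not TRaP's actual format:

```python
import json

# Hypothetical step registry; names and parameters are illustrative only.
STEPS = {
    "subtract_baseline": lambda spec, p: [y - p["offset"] for y in spec],
    "normalize_max":     lambda spec, p: [y / max(spec) for y in spec],
}

def run_pipeline(spectrum, config_json):
    """Apply the steps listed in a declarative JSON config, recording each
    applied step so the processing history stays inspectable."""
    config = json.loads(config_json)
    history = []
    for step in config["pipeline"]:
        spectrum = STEPS[step["name"]](spectrum, step.get("params", {}))
        history.append(step["name"])
    return spectrum, history

config = """{
  "pipeline": [
    {"name": "subtract_baseline", "params": {"offset": 10}},
    {"name": "normalize_max"}
  ]
}"""
out, history = run_pipeline([10, 30, 50], config)
print(out, history)  # [0.0, 0.5, 1.0] ['subtract_baseline', 'normalize_max']
```

Because the config file fully determines the transformation sequence, sharing it is enough for someone else to reproduce the result on the raw data.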

10
NeoDBS: Open-Source Platform for Visualization and Analysis of Electrophysiological Recordings from Deep Brain Stimulation Systems

Rodrigues, L.; Ferreira, A.; Pereira, I.; Moreira, R.; Jacinto, L.

2026-03-30 bioengineering 10.64898/2026.03.27.714691 medRxiv
Top 0.3%
0.8%

Optimization of deep brain stimulation (DBS) therapy for neurological and neuropsychiatric disorders depends on objective quantitative biomarkers that can guide stimulation parameter adjustments. With the recent introduction of new-generation DBS systems capable of simultaneously stimulating brain activity and recording local field potentials (LFP), there is increasing demand for platforms that enable efficient visualization and analysis of these signals for electrophysiological biomarker identification. To address the limitations of currently available toolboxes, which require advanced signal processing skills and rely on proprietary software, we present NeoDBS, an open-source Python platform designed for ingestion, advanced visualization, and processing of LFP signals from DBS systems through an easy-to-use graphical interface. NeoDBS is a user-centered platform that offers predefined analysis pipelines with the aim of facilitating electrophysiological biomarker investigation for DBS across different brain disorders. Custom analysis pipelines are also available for users to tailor the signal analysis tools to their research needs. Critical functionalities for longitudinal biomarker research are featured in NeoDBS, such as batch file processing and event-locked analysis for in-clinic and at-home recordings. This combination of accessibility, user experience, and advanced signal processing tools makes NeoDBS an environment that enables easy and fast electrophysiological biomarker research for DBS across patients, sessions, and stimulation parameters.
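
As a flavor of the kind of LFP analysis such a platform automates, the sketch below estimates band-limited power (beta band, 13-30 Hz, a commonly studied DBS biomarker) with a naive stdlib-only DFT. This is purely illustrative; NeoDBS's actual pipelines are not shown here, and real analyses would use faster spectral estimators on much longer recordings:

```python
import math

def band_power(signal, fs, f_lo, f_hi):
    """Power in the [f_lo, f_hi] Hz band via a naive DFT (stdlib only).

    Fine for short illustrative traces; production pipelines would use
    Welch-style estimates instead of an O(n^2) DFT.
    """
    n = len(signal)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += (re * re + im * im) / n ** 2
    return power

fs = 250                                   # Hz, synthetic sampling rate
t = [i / fs for i in range(fs)]            # one second of samples
lfp = [math.sin(2 * math.pi * 20 * ti) for ti in t]  # pure 20 Hz "beta" tone
beta = band_power(lfp, fs, 13, 30)
alpha = band_power(lfp, fs, 8, 12)
print(beta > 10 * alpha)  # True
```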

11
Validation and optimisation of wearable accelerometer data pre-processing for digital measure implementation and development

Langford, J.; Chua, J. Y.; Long, I.; Williams, A. C.; Hillsdon, M.

2026-03-24 animal behavior and cognition 10.64898/2026.03.21.713324 medRxiv
Top 0.4%
0.8%

The increasing use of accelerometers as digital health technologies in clinical trials and clinical care is driving the need for data processing to meet medical standards. The aim of this study was to create and test a modular pipeline for the pre-processing of high-resolution accelerometry that assures the quality, transparency and traceability of digital measures from sensor-level data. The objective is for the pipeline to be a foundational layer in the development, implementation and comparison of measures. The study developed the open-source GENEAcore package to meet the requirements of regulators, verifying the engineering implementation and analytically validating outputs against reference datasets. Early stages included the optimisation of calibration and non-wear detection. Data-driven detection of behavioural transitions was then validated to give direct bout outputs without the need to identify rules for epoch aggregation and interruptions. The utility for measure development was shown by comparing two algorithms for the characterisation of activity intensity in both the epoch and bout paradigms. Non-wear was detected with a balanced accuracy of 92.3%, and the commonly used 13 mg acceleration standard deviation threshold was empirically validated for the first time. The detection of transitions proved reliable, with 99% detected, on average, within 2 seconds of their occurrence, giving a mean expected event duration of 68.6 s from a log-normal distribution. The different activity intensity algorithms were more than 99% concordant during movement, but their outputs diverged in low movement conditions. Importantly, variable-duration bouts produced 31% higher daily activity durations compared to 1-second epochs. This evaluation of pre-processing steps has confirmed the attention to detail required to create robust and reproducible results for later clinical validation, where small changes in an algorithm or its implementation may have clinically meaningful consequences.
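
The 13 mg standard-deviation criterion for non-wear mentioned above can be sketched simply: compute the acceleration SD per window and flag windows below threshold. This is an illustrative toy, not GENEAcore's implementation (which also considers temperature and multi-axis context):

```python
import statistics

def nonwear_windows(accel_g, fs, window_s=60, sd_threshold_g=0.013):
    """Flag windows whose acceleration standard deviation falls below a
    13 mg threshold, a common proxy for a stationary (unworn) device.

    accel_g: acceleration magnitude samples in g; fs: sampling rate in Hz.
    Returns one boolean per non-overlapping window (True = non-wear).
    """
    window = int(window_s * fs)
    flags = []
    for start in range(0, len(accel_g) - window + 1, window):
        sd = statistics.pstdev(accel_g[start:start + window])
        flags.append(sd < sd_threshold_g)
    return flags

fs = 10
worn = [1.0 + 0.05 * (-1) ** i for i in range(60 * fs)]  # +/-50 mg wiggle
unworn = [1.0] * (60 * fs)                               # flat at 1 g
print(nonwear_windows(worn + unworn, fs))  # [False, True]
```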

12
StrucTTY: An Interactive, Terminal-Native Protein Structure Viewer

Jang, L. S.-e.; Cha, S.; Steinegger, M.

2026-03-19 bioinformatics 10.64898/2026.03.17.712308 medRxiv
Top 0.4%
0.8%

Terminal-based workflows are central to large-scale structural biology, particularly in high-performance computing (HPC) environments and SSH sessions. Yet no existing tool enables real-time, interactive visualization of protein backbone structures directly within a text-only terminal. To address this gap, we present StrucTTY, a fully interactive, terminal-native protein structure viewer. StrucTTY is a single self-contained executable that loads multiple PDB and mmCIF files, normalizes three-dimensional coordinates, and renders protein structures as ASCII graphics. Users can rotate, translate, and zoom in on structures, adjust visualization modes, inspect chain-level features, and view secondary structure assignments. The tool supports simultaneous visualization of up to nine protein structures and can directly display structural alignments using Foldseek's output, enabling rapid comparative analysis in headless environments. The source code is available at https://github.com/steineggerlab/StrucTTY.

Key Messages:
- Real-time, interactive protein structure visualization directly within text-only terminals
- ASCII-based, depth-aware rendering of PDB and mmCIF backbone structures
- Multi-structure comparison with direct application of Foldseek alignment transformations
- Designed for headless workflows on remote servers and HPC systems
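
The depth-aware ASCII rendering idea can be sketched in plain Python: orthographically project 3D coordinates onto a character grid, keep the nearest point per cell (a one-deep z-buffer), and pick a denser glyph for nearer points. This is a toy illustration of the general technique, not StrucTTY's renderer:

```python
def render_ascii(points, width=24, height=8, palette=".:-=+*#%@"):
    """Orthographic projection of 3D points onto a character grid,
    shading each occupied cell by depth (nearer = denser glyph)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    zs = [p[2] for p in points]

    def norm(v, lo, hi, span):
        # map v in [lo, hi] onto an integer in [0, span - 1]
        return 0 if hi == lo else min(span - 1, int((v - lo) / (hi - lo) * span))

    grid = [[" "] * width for _ in range(height)]
    depth = [[None] * width for _ in range(height)]
    for x, y, z in points:
        col = norm(x, min(xs), max(xs), width)
        row = norm(y, min(ys), max(ys), height)
        if depth[row][col] is None or z < depth[row][col]:  # keep nearest point
            depth[row][col] = z
            shade = norm(max(zs) - z, 0, max(zs) - min(zs), len(palette))
            grid[row][col] = palette[shade]
    return "\n".join("".join(r) for r in grid)

# A synthetic coiled trace stands in for backbone coordinates:
helix = [(i, i % 7, i % 5) for i in range(30)]
print(render_ascii(helix))
```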

13
HybridNet-XR: Efficient Teacher-Free Self-Supervised Learning for Autonomous Medical Diagnostic Systems in Resource-Constrained Environments

Mayala, S.; Mzurikwao, D.; Suluba, E.

2026-03-19 health informatics 10.64898/2026.03.16.26348570 medRxiv
Top 0.4%
0.8%

Deep learning model classification on large datasets is often limited in countries with restricted computational resources. While transfer learning can offset these limitations, standard architectures often maintain a high memory footprint. This study introduces HybridNet-XR, a memory-efficient and computationally lightweight hybrid convolutional neural network (CNN) designed to bridge the domain gap in medical radiography using autonomous self-supervised learning protocols. The HybridNet-XR architecture integrates depthwise separable convolutions for parameter reduction, residual connections for gradient stability, and aggressive early downsampling to minimize the video RAM (VRAM) footprint. We evaluated several training paradigms, including teacher-free self-supervised learning (SSL-SimCLR), teacher-led knowledge distillation (KD), and domain-gap (DG) adaptation. Each variant was pre-trained on ImageNet-1k subsets and fine-tuned on the ChestX6 multi-class dataset. Model interpretability was validated through gradient-weighted class activation mapping (Grad-CAM). The performance frontier analysis identified HybridNet-XR-150-PW (pre-warmed) as the optimal configuration, achieving 93.38% average accuracy and 99% AUC while utilizing only 814.80 MB of VRAM. Regarding class-wise accuracy, this variant significantly outperformed standard MobileNetV2 and teacher-led models in critical diagnostic categories, notably COVID-19 (97.98%) and emphysema (96.80%). Grad-CAM visualizations confirmed that the teacher-free pre-warming phase allows the model to develop a sharper, anatomically grounded focus on pathological landmarks compared to distilled models. Specialized pre-warming schedules offer a viable, computationally autonomous alternative to knowledge distillation for medical imaging. By eliminating the requirement for high-performance teacher models, HybridNet-XR provides a robust and trustworthy diagnostic foundation suitable for clinical deployment in resource-constrained environments.

Author summary: Traditional deep learning models for medical imaging are often too large for the low-power computers available in many global health settings. We developed a new model to bridge this computational gap. We designed HybridNet-XR, a highly efficient AI architecture, and trained it using a "teacher-free" method that doesn't require a massive supercomputer. We found a specific version (H-XR150-PW) that provides high accuracy while using very little memory. Our results show that high-performance diagnostic AI can be deployed on standard, low-cost hardware. Furthermore, using visual heatmaps (Grad-CAM), we showed that the AI correctly identifies medical landmarks like lung opacities, supporting its safe and reliable use in real-world clinical settings.
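
The parameter saving from depthwise separable convolutions, one of the architecture's stated efficiency levers, is simple arithmetic: a standard k x k convolution costs k*k*C_in*C_out weights, while the depthwise-plus-pointwise factorization costs k*k*C_in + C_in*C_out. A quick check with illustrative channel counts (not HybridNet-XR's actual layer configuration):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel, then a 1x1 pointwise mix."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 64, 128)                 # 73,728 weights
dws = depthwise_separable_params(3, 64, 128)  # 8,768 weights
print(std, dws, round(std / dws, 1))          # roughly an 8x reduction
```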

14
EthoClaw: An Integrated AI Workflow Platform for Automated Analysis in Neuroethology

Chen, K.; Chen, Z.; Zheng, D.; Fang, X.; Liang, J.; Li, Z.; Chen, Y.; Zou, J.; Cai, B.; Chen, S.; Huang, K.

2026-03-27 animal behavior and cognition 10.64898/2026.03.25.714141 medRxiv
Top 0.4%
0.8%

Computational methods have advanced the analysis of animal behavior, yet significant challenges remain in data standardization, analytical reproducibility, and workflow integration. Existing computational solutions often demand extensive programming proficiency or compel users to navigate a highly fragmented ecosystem of disconnected tools for tracking, statistical analysis, and visualization. Here, we present EthoClaw, an open-source, artificial intelligence-driven workflow platform built upon the OpenClaw agentic framework, functioning as a locally deployable AI assistant for behavioral research. EthoClaw provides an integrated computational infrastructure that bridges the gap between raw behavioral video acquisitions and publishable scientific results. In this study, we demonstrate the platform's capacity to natively ingest video data via a dual-mode tracking architecture: ultra-fast image processing for rapid object detection, and the SuperAnimal methods for precise, markerless postural tracking. To ensure maximal interoperability, EthoClaw automatically converts various tracking data formats into DeepLabCut-compatible formats, enabling high-throughput phenotyping by generating publication-quality visualizations alongside rigorous multidimensional statistical profiling. Furthermore, the platform incorporates a large language model (LLM)-driven reporting module that dynamically synthesizes analytical documents, ensuring methodological transparency. Through an open field test, we validate the practical usability of EthoClaw while accelerating computational throughput by localizing heavy video processing to circumvent cloud bandwidth bottlenecks. Operating via an omnichannel natural language interface that integrates with ubiquitous instant messaging software, EthoClaw democratizes advanced computational behavioral analysis, offering a holistic, highly efficient ecosystem that supports experimental reproducibility and open science principles.

15
In-source fragmentation in mass spectrometry-based proteomics: prevalence, impact, and strategies for mitigation

Schramm, T.; Gillet, L.; Reber, V.; de Souza, N.; Gstaiger, M.; Picotti, P.

2026-03-30 biochemistry 10.64898/2026.03.27.714398 medRxiv
Top 0.4%
0.7%
Show abstract

Peptide-level analyses are becoming increasingly popular in mass spectrometry-based proteomics and are being applied, for example, in immunopeptidomics, structural proteomics, and analyses of post-translational modifications. In such analyses, peptides that are not biologically meaningful but instead arise as artifacts prior to mass spectrometry analysis pose the risk of data misinterpretation. Here, we describe an approach based on retention time analysis and precise chromatographic peak matching to identify peptides generated by in-source fragmentation (ISF), which occurs between chromatographic separation of peptide mixtures and the first mass filter of a tandem mass spectrometer (MS). To understand the prevalence and properties of ISF, we generated 13 proteomics datasets and analyzed them along with 25 previously published datasets spanning a broad range of sample types, instruments, and proteomics approaches, including classical bottom-up proteomics, immunopeptidomics, structural proteomics, and phosphoproteomics. We found that, in typical trypsin-digested samples, on average 1% of fully tryptic peptides and 22% of semi-tryptic peptides originated from ISF. However, we observed large variations between datasets, and in-source fragments exceeded, in some cases, a third of the total peptide identifications. The extent of ISF depended on the peptide sequence, the instrument, method parameters, and sample complexity. Although ISF did not impair relative quantification across samples, it generated peptides that could be misinterpreted qualitatively, inflated peptide identifications, and comprised up to 37% of peptides shorter than 9 amino acids in immunopeptidomics datasets. We propose that, for peptide-centric applications, our open-source ISF detection approach be used to re-annotate peptides generated by ISF and remove them to avoid misinterpretation of data. ISF is a growing concern as mass spectrometers improve, since they enable detection of an ever-increasing number of m/z features, including low-abundance features such as ISF products. Our work thus addresses a growing issue in proteomics and presents solutions to mitigate the impact of in-source fragment peptides. In the future, improved feature detection algorithms may enable elucidation of new ISF patterns affecting side chains that have been missed so far, which could contribute to explaining the vast space of as-yet unannotated proteomics data.
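
The detection idea described, that an ISF product co-elutes with its parent because fragmentation happens after chromatographic separation, can be reduced to a toy heuristic. The tolerance, data layout, and peptide sequences below are hypothetical and far simpler than the paper's precise peak-matching approach:

```python
def flag_in_source_fragments(peptides, rt_tol_min=0.05):
    """Flag peptides likely produced by in-source fragmentation (ISF).

    Heuristic sketch: a peptide is an ISF candidate if its sequence is a
    proper substring of a longer co-identified peptide AND the two share
    (nearly) the same retention time, since ISF occurs after separation
    and the fragment co-elutes with its parent.
    peptides: list of (sequence, retention_time_min) tuples.
    """
    flagged = set()
    for seq, rt in peptides:
        for parent_seq, parent_rt in peptides:
            if (seq != parent_seq
                    and seq in parent_seq
                    and abs(rt - parent_rt) <= rt_tol_min):
                flagged.add(seq)
    return flagged

ids = [
    ("AILSSQPGTPK", 42.31),  # hypothetical fully tryptic parent
    ("QPGTPK", 42.30),       # co-eluting internal fragment -> ISF candidate
    ("LDEGNPK", 18.77),      # unrelated peptide at a different RT
]
print(flag_in_source_fragments(ids))  # {'QPGTPK'}
```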

16
Correlate: A Web Application for Analyzing Gene Sets and Exploring Gene Dependencies Using CRISPR Screen Data

Deolankar, S.; Wermeling, F.

2026-04-04 bioinformatics 10.64898/2026.04.02.716070 medRxiv
Top 0.5%
0.7%
Show abstract

CRISPR screen data provides a valuable resource for understanding gene function and identifying potential drug targets. Here, we present Correlate, a freely accessible web application (https://correlate.cmm.se) that enables exploration of the Cancer Dependency Map (DepMap) CRISPR screen gene effects, hotspot mutations, and translocation/fusion data across more than 1,000 human cancer cell lines. The application supports two main use cases: (i) analysis of user-defined gene sets (e.g. CRISPR screen hits) to identify functionally linked genes based on correlations while providing an overview based on essentiality or user-provided screen statistics; and (ii) exploration of genes of interest in defined biological contexts, such as specific cancer types or mutational backgrounds, to generate hypotheses about gene function and dependencies. Additionally, Correlate supports experimental design by providing rapid overviews of gene essentiality and enabling the identification of cell lines with relevant mutational profiles. In contrast to knowledge-based approaches such as STRING and GSEA, which rely on prior biological annotations and curated interaction networks, Correlate identifies gene connections directly from functional CRISPR screen readouts, offering a complementary and data-driven perspective on gene network analysis. The application runs entirely in the browser, requires no installation or login, and integrates with the Green Listed v2.0 tool family for custom CRISPR screen design.

HIGHLIGHTS
- Interactive web-based platform for bulk correlation analysis of user-defined gene sets using DepMap CRISPR screen data, requiring no installation or programming expertise.
- Identifies functional gene relationships from CRISPR screen readouts rather than curated annotations, offering a data-driven complement to tools such as GSEA and STRING.
- Enables contextual exploration of gene dependencies across cancer types and mutational backgrounds, supporting hypothesis generation about gene function and therapeutic targets.
- Supports experimental design through gene essentiality overviews, mutation and fusion analysis, and cell line identification, with optional integration of user-provided statistics from CRISPR screens, proteomics, or transcriptomics analyses.
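The core correlation step, ranking genes by how similar their dependency profiles are across cell lines, can be sketched as follows. The gene-by-cell-line matrix layout and function name are hypothetical assumptions for illustration, not Correlate's actual code:

```python
import numpy as np

def top_correlated_genes(effects, query_gene, genes, n=3):
    """Rank genes by Pearson correlation of their dependency profiles.

    effects: (n_genes, n_cell_lines) array of CRISPR gene-effect scores;
    genes: list of gene names, one per row of the matrix.
    """
    i = genes.index(query_gene)
    # Correlate the query gene's effect vector against every gene at once.
    corr = np.corrcoef(effects)[i]
    order = np.argsort(-corr)
    # Skip the query gene itself (self-correlation of 1.0).
    return [(genes[j], round(float(corr[j]), 3)) for j in order if j != i][:n]
```

Genes with highly correlated dependency profiles across many cell lines are candidates for shared pathways or complexes, which is the data-driven alternative to curated annotations described above.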

17
MartiniSurf: Automated Simulations of Surface-Immobilized Biomolecular Systems with Martini

Jimenez Garcia, J. C.; Lopez-Gallego, F.; Lopez, X.; De Sancho, D.

2026-03-30 biophysics 10.64898/2026.03.27.714767 medRxiv
Top 0.5%
0.7%
Show abstract

The rational design of biomolecule immobilization strategies requires molecular-level understanding of how surface properties, tethering geometry, and structural dynamics jointly influence stability and function. Recently, coarse-grained molecular dynamics simulations based on the Martini force field have emerged as an efficient framework for studying enzyme-surface interactions. However, the reproducible construction of immobilized systems with controlled orientations remains technically challenging, usually involving multiple computational tools. Here we present MartiniSurf, an open-source command-line framework for the preparation of protein and DNA systems immobilized on solid supports within the Martini paradigm. MartiniSurf integrates automated structure retrieval and cleaning, coarse graining via tools from the Martini force field software ecosystem, customizable surface generation, and biomolecule orientation based on user-defined anchoring residues, producing complete GROMACS-ready simulation systems. The framework supports both implicit restraint-based anchoring and explicit linker-mediated immobilization, including surfaces functionalized with user-defined ligands or linker-like moieties, enabling representation of mono- and multivalent attachment geometries at different modeling resolutions. Structure-based Gō-Martini potentials can be incorporated for proteins, while DNA systems are modeled using Martini 2. Optional substrate insertion, pre-coarse-grained complex handling, and automated solvation and ionization further extend system flexibility. By integrating these components into a unified workflow, MartiniSurf enables systematic and high-throughput in silico exploration of surface-tethered biomolecules and provides a robust computational platform for rational immobilization studies.
TOC Graphic (Figure 1)
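The orientation step described above, placing a biomolecule so that user-defined anchoring residues face the support, can be sketched geometrically: rotate the structure so the vector from the anchor centroid to the molecule centroid points away from the surface, then rest the lowest bead on the surface plane. This toy function and its names are assumptions, not MartiniSurf's implementation, which also handles coarse graining, linkers, and GROMACS file generation:

```python
import numpy as np

def orient_to_surface(coords, anchor_idx):
    """Rotate coordinates so anchor beads face the xy-plane surface (z=0).

    coords: (N, 3) bead positions; anchor_idx: indices of anchoring residues.
    Aligns the anchor-centroid -> molecule-centroid vector with +z, then
    shifts the lowest bead to z = 0.
    """
    coords = np.asarray(coords, dtype=float)
    v = coords.mean(axis=0) - coords[anchor_idx].mean(axis=0)
    v /= np.linalg.norm(v)
    z = np.array([0.0, 0.0, 1.0])
    # Rodrigues rotation taking v onto +z.
    axis = np.cross(v, z)
    s = np.linalg.norm(axis)
    c = float(v @ z)
    if s < 1e-12:
        R = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    else:
        k = axis / s
        K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + s * K + (1 - c) * (K @ K)
    out = coords @ R.T
    out[:, 2] -= out[:, 2].min()
    return out
```

After this step, the anchoring residues sit closest to the surface, ready for restraint-based or explicit linker-mediated attachment.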

18
emb2dis: a novel protein disorder prediction tool based on ResNets, dilated convolutions & protein language models

Duarte, S. A.; Mehdiabadi, M.; Bugnon, L. A.; Aspromonte, M. C.; Piovesan, D.; Milone, D. H.; Tosatto, S.; Stegmayer, G.

2026-04-01 bioinformatics 10.64898/2026.03.30.715414 medRxiv
Top 0.6%
0.6%
Show abstract

Intrinsically disordered proteins (IDPs) play an important role in a wide range of biological functions and are linked to several diseases. Due to technical difficulties and the high cost of experimental determination of disorder in proteins, combined with the exponential increase of unannotated protein sequences, the development of computational methods for disorder prediction has become an active area of research over the last few decades. In this work, we present emb2dis, a deep learning model that uses protein language models (pLMs) to predict disorder from sequence. The emb2dis tool is a pre-trained model that receives a protein sequence as input, calculates its pLM embedding, and passes it to a deep learning model. In contrast to existing approaches, emb2dis integrates informative sequence representations with a novel architecture that combines residual networks (ResNets) and dilated convolutions. This design effectively enlarges the receptive field of the convolution operation, enabling the model to capture an extended context around each amino acid. At the output, emb2dis assigns a disorder propensity score to each residue in the sequence. The model was evaluated on datasets from the latest CAID3 blind benchmark for disorder prediction, where it achieved first place in the Disorder-PDB category, exhibiting strong performance with high AUC and Fmax scores. Additionally, it ranked among the top ten methods on the Disorder-NOX dataset. We provide a freely available web demo for emb2dis and a source code repository for local installation. Weblink for the tool: https://sinc.unl.edu.ar/web-demo/emb2dis/. emb2dis thus provides a new deep learning approach and a significant improvement in protein disorder prediction, with a simple web interface and graphical output detailing per-residue disorder.
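The receptive-field enlargement from dilated convolutions is easy to quantify: with stride 1, each layer adds (kernel_size - 1) * dilation residues of context, so the receptive field grows geometrically with exponentially increasing dilations while depth grows only linearly. A minimal sketch of that arithmetic (not the emb2dis code):

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (in residues) of stacked stride-1 dilated convolutions.

    Each layer widens the receptive field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf
```

For instance, three kernel-3 layers with dilations 1, 2, 4 already see 15 residues of context, versus 7 residues for the same stack without dilation.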

19
Using activity data to estimate brown bear den exit and entry dates

Brault, B.; Clermont, J.; Zedrosser, A.; Friebe, A.; Kindberg, J.; Pelletier, F.

2026-04-01 animal behavior and cognition 10.64898/2026.03.30.715338 medRxiv
Top 0.6%
0.5%
Show abstract

Background: In hibernating mammals, the timing of den entry and exit reflects complex interactions among environment, physiology, and energetic constraints, with consequences for fitness. Consequently, shifts in denning phenology can affect population dynamics, particularly under climate change. Reliable estimation of denning timing is therefore critical, yet current methods often rely on GPS-derived movement data, which are limited by coarse sampling intervals, detection issues, and an inability to distinguish true inactivity from active presence at the den site. In this study, we developed and applied a method to estimate denning phenology in a brown bear population in south-central Sweden using accelerometer-derived activity data. Our approach employs adaptive, individual-specific thresholds to account for variation in baseline activity across bears, focusing on day-to-day changes to identify the start and end of inactivity periods. This method allows flexible and reproducible detection of den entry and exit dates, overcoming limitations associated with fixed thresholds and small sample sizes. Results: We compared activity-based estimates with GPS-derived den occupancy and examined variation in denning behavior across demographic groups. Analyzing 388 bear-winters, the method successfully identified inactivity periods in 360 cases. It failed to identify clear start and end dates of hibernation for the remaining 28 (7%) bear-winters, which were characterized by unusually high or low daily activity levels at the boundaries of the inactivity period. Den site occupancy ranged from September 5 to June 2, with durations of 112-260 days, whereas inactivity periods detected from activity data extended from September 6 to May 13, lasting 83-217 days. Our comparison of activity-based and GPS-based methods indicates that bears may arrive at the den site several weeks before the onset of inactivity, with timing varying among demographic groups.
Conclusion: We show that activity-based analysis provides a robust framework for estimating denning phenology, distinguishing actual inactivity from mere presence at the den site, and improving understanding of the timing and variability of bear denning behavior. Applying an individual-level activity-based method improves accuracy in assessing the ecological mechanisms underlying hibernation in bears and other hibernators, while also enhancing interpretation of environmental drivers and providing a reliable tool to monitor phenological shifts in response to climate change.
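The adaptive-threshold idea can be sketched as: derive an individual-specific threshold from each bear's own activity distribution, then take the longest sufficiently long run of days below it as the inactivity period. The threshold rule (a fraction of the individual's median daily activity) and the parameter values here are illustrative assumptions, not the authors' exact method:

```python
def detect_inactivity(daily_activity, frac=0.1, min_days=30):
    """Find the longest run of days below an individual-specific threshold.

    The threshold is a fraction of the individual's median daily activity,
    so baseline differences between animals are absorbed automatically.
    Returns (start_index, end_index) inclusive, or None if no run lasts
    at least min_days.
    """
    srt = sorted(daily_activity)
    threshold = frac * srt[len(srt) // 2]  # individual-specific threshold
    best, start = None, None
    for i, a in enumerate(list(daily_activity) + [threshold]):  # sentinel closes final run
        if a < threshold:
            if start is None:
                start = i
        elif start is not None:
            length = i - start
            if length >= min_days and (best is None or length > best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best
```

Bear-winters where activity near the boundaries is unusually high or low would yield fragmented or missing runs, mirroring the 7% of cases the method could not resolve.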

20
REBEL, Reproducible Environment Builder for Explicit Library Resolution

Martelli, E.; Ratto, M. L.; Nuvolari, B.; Arigoni, M.; Tao, J.; Micocci, F. M. A.; Alessandri, L.

2026-04-07 bioinformatics 10.64898/2026.04.04.716498 medRxiv
Top 0.6%
0.5%
Show abstract

Background: Achieving FAIR-compliant computational research in bioinformatics is systematically undermined by two compounding challenges that existing tools leave unresolved: long-term reproducibility and accessibility. Standard package managers re-download dependencies from live repositories at every build, making environments vulnerable to library disappearance and version drift, and pinning a package version does not pin the versions of its transitive dependencies, causing divergences between builds performed at different points in time. Compounding this, packages from repositories such as CRAN, Bioconductor, and PyPI frequently omit critical system-level dependencies from their installation metadata, leaving users to manually discover which underlying library is missing or which version is required. Beyond these technical failures, constructing a truly reproducible environment demands expertise in containerization, making reproducibility in practice a privilege rather than a standard. Findings: We present REBEL (Reproducible Environment Builder for Explicit Library Resolution), a framework that addresses both challenges through three dependency inference heuristics: (i) Deep Inspection of source code, (ii) Fuzzy Matching against a manually curated knowledge base, and (iii) Conservative Dependency Locking. The resolved dependency stack is then archived into a self-contained local store, enabling offline and deterministic rebuilds at any future time. We compared the installation of 1,000 randomly sampled CRAN packages in isolated Docker containers against the standard package manager; REBEL resolved 149 of 328 standard installation failures (45.4%). Moreover, through its DockerBuilder component, REBEL generates fully reproducible Docker images from a plain-text requirements file, making deterministic environment construction accessible without expertise in containerization.
Conclusions: REBEL provides a practical foundation for FAIR-compliant, long-term reproducible bioinformatics analyses, making deterministic environment construction accessible to researchers regardless of technical background. REBEL is freely available at https://github.com/Rebel-Project-Core
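Conservative dependency locking, the third heuristic, amounts to freezing the fully resolved stack (including transitive dependencies) in a deterministic artifact so a later rebuild cannot drift. A minimal sketch, assuming a simple JSON lock format; REBEL's actual file format and APIs may differ:

```python
import hashlib
import json

def lock_payload(resolved):
    """Serialize a fully resolved dependency stack deterministically.

    resolved: dict mapping package name -> exact pinned version, including
    transitive dependencies. Sorted keys make the serialization independent
    of resolution order, so the digest identifies the environment itself.
    """
    payload = json.dumps(resolved, sort_keys=True, indent=2)
    digest = hashlib.sha256(payload.encode()).hexdigest()
    return payload, digest

def write_lockfile(resolved, path):
    """Write the lock file and return its content digest for integrity checks."""
    payload, digest = lock_payload(resolved)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(payload)
    return digest
```

Because the digest depends only on the resolved name-version pairs, two builds performed at different times can verify they target byte-identical environments before installing anything.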