Patterns
Elsevier BV
Preprints posted in the last 90 days, ranked by how well they match Patterns's content profile, based on 70 papers previously published here. The average preprint has a 0.07% match score for this journal, so anything above that is already an above-average fit.
Arabi, S.; Hutchins, B. I.
Early identification of promising drug research topics is challenging yet crucial for the scientific community to accelerate the development of novel therapeutics. In this work, we leverage large-scale public data from the biomedical literature to extract predictive features that identify promising therapeutic research topics at an early stage. We divide the global citation graph of biomedical literature into a time series of research topics and extract topic features based on citation activity, publication content, and measurable flocking of scientists into novel research topics. Based on these features, our machine learning model identifies research topics that later yield Food and Drug Administration (FDA)-approved drugs years before approval (F1-score of 0.84). Of the target drugs, 80% are predicted in advance, with 65% predicted 8 or more years before approval. For the vast majority of positive predictions, this predates the start of phase 2 clinical trials. These results show that this approach can efficiently flag research topics that generate approved drugs several years before approval, using public data that would have been available at the time of prediction. Thus, reliable forecasting can be accomplished with a high-level view of the publication and citation behavior of scientists, without depending on clinical trial data that may be deposited only after a significant lag. This demonstrates that early signals of future FDA-approved therapies can be detected even without any specialized information about these applied research efforts. Teaser: Large-scale data analysis can use the full set of scientific citations to predict which areas of research will yield new FDA-approved drugs, years in advance.
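The F1-score and the lead-time percentages above are simple summaries of topic-level predictions. A minimal sketch, assuming binary labels per research topic (1 = the topic later yields an FDA-approved drug) and a per-drug lead time in years; the function names are hypothetical and are not taken from the authors' code:

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive topic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def share_flagged_early(lead_times_years, min_years=8):
    """Fraction of target drugs flagged at least `min_years` before approval.

    Each entry is the lead time in years, or None if the drug was never flagged.
    """
    early = [t for t in lead_times_years if t is not None and t >= min_years]
    return len(early) / len(lead_times_years)


# Toy usage with made-up labels and lead times, not data from the study.
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
print(share_flagged_early([10, 3, None, 8, 12]))
```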
Jovanovic, M.; Weidener, L. S.; Brkic, M.; Ulgac, E.; Meduri, A.
Drug-induced inhibition of the hERG potassium channel is the leading cause of cardiac safety-related drug attrition, but the Comprehensive in Vitro Proarrhythmia Assay (CiPA) framework requires activity data on multiple cardiac ion channels to assess proarrhythmic risk. We present CardioSafe, a three-branch multi-task neural network with cross-attention fusion that integrates chemical fingerprints, ChemBERTa embeddings, and predicted L1000 transcriptomic features to predict blocker status and potency for hERG, Nav1.5, and Cav1.2, with an exploratory IKs head. CardioSafe was trained on the largest publicly reported multi-channel cardiac ion channel dataset, combining ChEMBL 36 with the hERGCentral database (331,127 hERG, 3,160 Nav1.5, 1,138 Cav1.2, and 115 IKs compounds), curated under a pharmacology-aware policy that retains censored measurements and inhibition-percentage votes. Under Tanimoto-similarity-controlled splits, CardioSafe outperforms the leading published comparators (CToxPred2 and CardioGenAI) on the data-rich hERG head; on the smaller Nav1.5 and Cav1.2 heads, the standard evaluation is statistically inconclusive. A reverse-leak audit revealed that 22% of Nav1.5 and 21% of Cav1.2 test compounds were present in the published comparators' training data (92% as exact compound matches); after removing these contaminated compounds, CardioSafe's lead on Nav1.5 and Cav1.2 also reaches statistical significance, demonstrating that prior cross-publication benchmarks for these channels were inflated by training-data overlap. Scientific contribution: We present the first multi-task neural network jointly predicting blocker activity for the three primary CiPA cardiac ion channels (hERG, Nav1.5, Cav1.2) within a single architecture. We introduce a reverse-leak audit methodology that reveals systematic test-set contamination in cross-publication cardiac safety benchmarks, establishing a stricter evaluation protocol.
We provide an empirical test of predicted L1000 transcriptomic features as an auxiliary input for cardiac ion channel prediction and document a well-characterized negative result. Graphical abstract: CardioSafe encodes each query SMILES with three branches (chemical fingerprints + descriptors, pretrained ChemBERTa, and predicted L1000 transcriptomic signatures), fuses them via a cross-attention block with four learnable per-channel query tokens, and emits binary blocker calls plus pChEMBL regression for hERG, Nav1.5, Cav1.2, and (exploratory) IKs.
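Both the Tanimoto-similarity-controlled splits and the reverse-leak audit rest on comparing compounds across datasets. The sketch below is illustrative only: it represents each fingerprint as a set of on-bit indices, assumes compound identifiers are already canonicalized (e.g. canonical SMILES), and uses a hypothetical near-duplicate threshold. It is not CardioSafe's actual audit code.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


def reverse_leak_audit(test: dict, comparator_train: dict, threshold: float = 0.9):
    """Find test compounds that also appear in a comparator's training set.

    A compound counts as leaked if its canonical ID matches exactly, or if any
    comparator training fingerprint reaches `threshold` Tanimoto similarity
    (threshold value is a hypothetical choice). Returns (leaked_ids, exact_ids).
    """
    leaked, exact = [], []
    for cid, fp in test.items():
        if cid in comparator_train:
            leaked.append(cid)
            exact.append(cid)
        elif any(tanimoto(fp, other) >= threshold for other in comparator_train.values()):
            leaked.append(cid)
    return leaked, exact


# Toy fingerprints (made-up bit indices), keyed by canonical SMILES.
test = {"CCO": {1, 2, 3}, "c1ccccc1": {10, 11, 12}, "CCN": {1, 2, 3, 5}}
train = {"CCO": {1, 2, 3}, "CCC": {1, 2, 3, 5, 6}}
leaked, exact = reverse_leak_audit(test, train, threshold=0.8)
print(f"{len(leaked) / len(test):.0%} of test compounds leaked; "
      f"{len(exact) / len(leaked):.0%} of those are exact matches")
```

Removing the leaked IDs from the test set before re-scoring every model gives the kind of contamination-free comparison the abstract describes.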
Tan, S.; Tian, Z.
The rapid advancement of AI research automation systems, including AI Scientist, data-to-paper, and Agent Laboratory, has demonstrated the potential for autonomous scientific discovery. However, existing benchmarks for evaluating these systems focus predominantly on fundamental sciences (machine learning, physics, chemistry), overlooking the unique challenges of medical clinical research: complex survey designs, inferential statistics with confounding control, adherence to reporting standards (STROBE, CONSORT), and the requirement for clinically actionable interpretation. We present MedResearchBench, the first benchmark specifically designed to evaluate AI systems on medical clinical research tasks. MedResearchBench comprises 16 tasks spanning 7 clinical domains (cardiovascular, oncology, mental health, metabolic, respiratory, neurology, infectious disease), built on publicly available datasets (the National Health and Nutrition Examination Survey [NHANES] and the Surveillance, Epidemiology, and End Results [SEER] program) with ground truth from 16 high-quality published papers (journal impact factor range: 2.3-51.0). Each task is evaluated along 6 medical-specific dimensions: statistical methodology, results accuracy, visualization quality, clinical interpretation, confounding sensitivity, and reporting compliance. We describe the benchmark design rationale, task construction methodology, paper selection criteria with anti-paper-mill filtering, and a detailed analysis of task characteristics, including methodological diversity, evaluation dimension coverage, and difficulty stratification. To demonstrate that the benchmark is executable, we evaluate an agentic data2paper pipeline on 3 pilot tasks spanning all three difficulty tiers, achieving scores of 72/100 (Tier 1, Cardio_000), 69/100 (Tier 2, Mental_000), and 75/100 (Tier 3, Metabolic_002), for a mean score of 72/100 (B-level).
Survey-weighted methodology was correctly implemented across all tasks; primary limitations included covariate incompleteness and reference group misspecification. MedResearchBench addresses a critical gap in AI research evaluation and provides a standardized, community-extensible platform for assessing whether AI systems can conduct clinically sound, publication-quality medical research. All task materials are publicly available at https://github.com/TerryFYL/MedResearchBench.
Lu, H.-E.; Koivisto, D.; Lou, Y.; Zeng, Z.; Yu, T.; Wang, J.; Meng, X.; Nowikow, C.; Wilson, R.; Kumbhare, D.; Pu, J.
Deep learning has transformed medical image and video analysis, but it usually requires large, well-annotated datasets. In many clinical domains, especially when testing novel mechanistic hypotheses, such retrospective datasets are hard to obtain because acquiring adequate cohorts is time-intensive, costly, and operationally difficult. This creates a critical translational gap: scientifically compelling early-stage ideas may remain untested because the available sample size is insufficient to support conventional deep learning pipelines. Developing data-efficient strategies for evaluating new hypotheses within small prospective cohorts is therefore essential to de-risk innovation before large-scale validation. Myofascial Pain Syndrome (MPS) exemplifies this challenge, as quantitative ultrasound imaging biomarkers for MPS remain underexplored. We investigated whether MPS in the upper trapezius can be detected from full B-mode ultrasound videos in a small prospective cohort (11 controls, 13 patients). Videos were automatically preprocessed and resampled using a sliding-window strategy to expand the training set (404 clips). We developed a self-supervised Video Diffusion Encoder (VDE) to learn spatiotemporal representations without relying on extensive labeled data and compared it with transfer-learning-based ResNet, VideoMAE, and SimCLR. Using subject-level stratified four-fold cross-validation, the VDE outperformed the transfer learning baselines and performed comparably to SimCLR, with a subject-level AUC of 0.79 and accuracy of 0.86; there were no significant differences between latent-only and combined trigger point analyses. These results demonstrate that self-supervised diffusion learning can support robust, data-efficient deep learning in small prospective studies, enabling early feasibility testing of innovative ultrasound biomarkers before large-scale clinical trials.
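The resampling and evaluation steps above can be sketched as follows. The window length, stride, round-robin fold assignment, and mean aggregation of clip probabilities into a subject-level score are all assumptions for illustration; the abstract does not specify them, and the subject IDs are hypothetical.

```python
from collections import Counter, defaultdict


def sliding_windows(n_frames, window, stride):
    """(start, end) frame ranges for fixed-length clips cut from one video."""
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]


def subject_stratified_folds(subject_labels, k=4):
    """Assign whole subjects (not clips) to k folds, class by class, round-robin.

    Every clip inherits its subject's fold, so no subject contributes clips to
    both the training and test data of a split.
    """
    by_label = defaultdict(list)
    for subject, label in sorted(subject_labels.items()):
        by_label[label].append(subject)
    fold_of, i = {}, 0
    for label in sorted(by_label):
        for subject in by_label[label]:
            fold_of[subject] = i % k
            i += 1
    return fold_of


def subject_score(clip_probs):
    """Aggregate clip-level probabilities into one subject-level score (mean)."""
    return sum(clip_probs) / len(clip_probs)


# Cohort sizes from the abstract; subject IDs are hypothetical.
labels = {f"control_{i:02d}": 0 for i in range(11)}
labels.update({f"patient_{i:02d}": 1 for i in range(13)})
folds = subject_stratified_folds(labels, k=4)
print(Counter(folds.values()))
print(sliding_windows(n_frames=10, window=4, stride=3))
```

Splitting by subject rather than by clip matters here because overlapping windows from one video are highly correlated; a clip-level split would let near-duplicate frames appear in both training and test folds.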
Ramirez, A.; Thomas, N.; Calabrese, D. R.; Greenland, J. R.; Meyer, A. S.
Cell-cell communication (CCC) mediates coordinated cellular activities that vary dynamically across time, location, and biological context. While various tools exist to infer CCC, they typically aggregate data according to pre-defined cell types, obscuring critical single-cell heterogeneity. Furthermore, because signaling pathways and cell populations operate in a coordinated manner, an integrative analytical approach is essential. To address these challenges, we developed CCC-RISE, an extension of the tensor-based method Reduction and Insight in Single-cell Exploration (RISE). CCC-RISE identifies integrative patterns of single-cell variation by deconvolving communication into interpretable modules defined by unique sender cells, receiver cells, ligands, and condition associations. We applied this framework to a COVID-19 cohort with varying disease severity and a lung transplant cohort with acute allograft dysfunction. In both contexts, CCC-RISE successfully identified disease-relevant communication programs and traced them to specific cellular subpopulations, often crossing conventional cell-type boundaries. This approach offers a robust pipeline enabling the identification of disease-relevant signaling subpopulations that are invisible to aggregate methods.
Highlights:
- CCC-RISE enables integrative analysis of cell-cell communication across multiple conditions at single-cell resolution
- CCC-RISE deconvolves signaling patterns into modules defined by their sender cells, receiver cells, LR pairs, and experimental conditions/samples
- Analysis at single-cell resolution uncovers signaling activity within and across conventional cell types
Blankenship, L.; Sterrett, S. C.; Martins, D. M.; Findley, T. M.; Abe, E. T. T.; Parker, P. R. L.; Niell, C.; Smear, M. C.
Neuroscience needs observation. Observation lets us evaluate data quality, judge whether models are biologically realistic, and generate new hypotheses. However, high-dimensional behavioral and neural data are too complex to be easily displayed and eye-tested. Computational methods can reduce the dimensionality of data and reveal statistically robust dynamical structure, but they often yield results that are difficult to relate back to the underlying biology. In addition, the choice of which parameters to quantify may miss unexpectedly relevant aspects of the data. To supplement quantification with enhanced qualitative observation, we developed Visualization and Sonification of NeuroData (ViSoND), an open-source approach for displaying multiple data streams using video and sonification. Sonification is nothing new to neuroscience. Scientists have sonified their physiological preparations since Lord Adrian's earliest recordings. We extend this tradition by mapping multiple physiological datastreams to musical notes using MIDI. Synchronizing MIDI to video makes it possible to watch an animal's movement while listening to physiological signals such as action potentials. Here we provide two demonstrations of this approach. First, we used ViSoND to interpret behavioral structure revealed by a computational model trained on the breathing rhythms of freely behaving mice. Second, ViSoND revealed patterns of neural activity in mouse visual cortex corresponding to eye blinks, events that were previously filtered out of analysis. These use cases show that ViSoND can supplement quantitative rigor with observational interpretability. Additionally, ViSoND provides an accessible way to display data, which may broaden the audience for neuroscientific findings.
Qin, Y.; Peng, Y.; Chen, Q.; Chen, J.; Ren, P.; Deng, H.; Wang, D.; Liu, X.; Ou, Z.; Deng, Z.; Shi, X.
Spatial transcriptomic studies of infectious diseases still rely on fragmented data analysis processes. Here, we developed STID, a standardized framework for spatial transcriptomic analysis of infectious diseases that leverages the Seurat ecosystem and incorporates Python-based modules. STID provides an extensible infection-specific data structure and supports a full suite of analyses, such as pathogen background correction, infection-associated spot and niche identification, single-sample niche characterization, and multi-sample comparative and temporal analyses. Moreover, STID is broadly applicable to spatial transcriptomic data from infectious diseases caused by bacteria, viruses, and parasites, and enables systematic characterization of the structural features, cellular composition, molecular functions, and host-pathogen interactions within pathogen-infected and/or host-responsive niches. Overall, STID provides an accessible, reproducible, and extensible framework for analyzing infection-associated spatial transcriptomic data and for dissecting host-pathogen interactions in their native spatial microenvironments. Motivation: Spatial transcriptomics technologies have emerged as powerful approaches for dissecting the structural and functional features of spatial microenvironments. However, the current general-purpose tools remain fundamentally inadequate for resolving the spatial heterogeneity of infectious disease samples, where the intricacies of host-pathogen interactions render spatial microenvironments both challenging to dissect and largely inaccessible. Tools tailored to infectious diseases are critically lacking, including those for reducing pathogen-derived background noise, identifying and isolating infection-associated spots or niches, dissecting host-pathogen interactions, and supporting systematic multi-sample analyses.
We therefore developed STID, a unified framework that integrates standardized workflows and addresses the analytical bottlenecks in spatial transcriptomic analysis of infectious diseases.
Highlights:
- STID standardizes spatial transcriptomic analysis in infectious diseases
- STID improves pathogen-infected spot detection by correcting pathogen background
- STID distinguishes pathogen-infected and host-responsive niches
- STID supports multi-sample comparative and temporal analyses of niches
Jackson, N. J.; Yan, C.; Caro-Vega, Y.; Paredes, F.; Ismerio Moreira, R.; Cadet, S.; Varela, D.; Cesar, C.; Duda, S. N.; Shepherd, B. E.; Malin, B. A.
Digital health technologies, including machine learning (ML), are transforming infectious disease management; however, ML models for HIV care have been limited by data-sharing restrictions that prevent multi-site collaboration. Federated learning (FL) offers a privacy-preserving solution, enabling cross-site model training without sharing patient-level data. We evaluated FL for developing clinical prediction models using data from 22,234 people living with HIV (PLWH) across six sites in five countries within the Caribbean, Central, and South America network for HIV epidemiology (CCASAnet). Across four prediction tasks (1-year mortality, 3-year mortality, tuberculosis incidence, and AIDS-defining cancer incidence), FL algorithms achieved near-centralized performance while substantially outperforming site-specific models. Performance gains varied across sites, driven by both site size and between-site heterogeneity. Local fine-tuning often improved FL performance, though the benefits were task-dependent. These findings support FL as a scalable, privacy-preserving infrastructure for multi-site ML in international HIV research.
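The abstract does not name the FL algorithms evaluated. As an illustration only, the sketch below implements rounds of federated averaging (FedAvg), a widely used FL algorithm: each site fits a logistic-regression model locally, starting from the current global weights, and the server averages the returned weights in proportion to site size, so patient-level rows never leave a site. The model, learning rate, epoch count, and toy data are assumptions.

```python
import math


def local_update(weights, X, y, lr=0.5, epochs=5):
    """Full-batch gradient descent on the logistic loss, run inside one site."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def weighted_average(updates):
    """Server step: average weight vectors, each weighted by its site's sample size."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[j] * n for w, n in updates) / total for j in range(dim)]


def fedavg_round(global_w, sites):
    """sites: list of (X, y). Only weights, never rows, are sent to the server."""
    updates = [(local_update(global_w, X, y), len(X)) for X, y in sites]
    return weighted_average(updates)


# Toy data: feature vector = [bias, risk score]; label 1 when the score is positive.
site_a = ([[1.0, 2.0], [1.0, -1.0], [1.0, 1.5]], [1, 0, 1])
site_b = ([[1.0, -2.0], [1.0, 0.5]], [0, 1])
w = [0.0, 0.0]
for _ in range(20):
    w = fedavg_round(w, [site_a, site_b])
print(w)
```

The "local fine-tuning" mentioned above would correspond, in this sketch, to running a few more `local_update` steps at one site starting from the final federated weights.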
Chen, D. Z.; Xie, A.; Ma, C.
Precision medicine has given rise to a spectrum of biomarker-guided trial designs, from simple enrichment and strategy designs to more complex adaptive frameworks. To address the need for user-friendly tools that span this spectrum, we developed a unified R Shiny platform that first implements three standard designs: the randomize-all design, the enrichment design, and the biomarker-strategy design, allowing researchers to perform power and sample size calculations under each framework with intuitive inputs and visual outputs. Building on this foundation, the platform further supports two-stage general randomized basket trial designs with interim analysis, which can be viewed as a generalization of the standard designs to multiple biomarker-defined subgroups. The tool was rigorously validated by comparison with established R pipelines and published formulas, and user testing confirmed its intuitive interface. By providing seamless integration from standard to advanced designs under a common input-output framework, our platform enables researchers to directly compare power and sample size requirements across different design choices using the same underlying assumptions. The result is a freely accessible tool offering effective visualizations for the full spectrum of biomarker-guided trial designs, available at https://ampt.obicloud.ca/. Future improvements may further expand the tool's capabilities to accommodate the increasingly complex trial designs needed by the research community.
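Comparing designs "using the same underlying assumptions" can be illustrated with the textbook normal-approximation sample size for a continuous endpoint. This is a sketch under stated assumptions (1:1 allocation, common standard deviation, two-sided z-test, and a randomize-all effect equal to the prevalence-weighted mix of subgroup effects); it is not the platform's own code and does not cover its basket-trial extension.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided z-test of a mean difference, 1:1 allocation."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)


def compare_designs(delta_pos, delta_neg, prevalence, sigma, alpha=0.05, power=0.80):
    """Contrast enrichment vs randomize-all under the same assumptions.

    delta_pos / delta_neg: treatment effect in biomarker-positive / -negative patients.
    prevalence: proportion of biomarker-positive patients among those screened.
    """
    enrich = n_per_arm(delta_pos, sigma, alpha, power)
    # Randomize-all dilutes the effect by the biomarker mix of the enrolled population.
    overall = prevalence * delta_pos + (1 - prevalence) * delta_neg
    return {
        "enrichment_per_arm": enrich,
        "enrichment_screened": math.ceil(2 * enrich / prevalence),
        "randomize_all_per_arm": n_per_arm(overall, sigma, alpha, power),
    }


print(compare_designs(delta_pos=0.5, delta_neg=0.0, prevalence=0.5, sigma=1.0))
```

With these made-up inputs the diluted effect halves, so the randomize-all design needs four times as many patients per arm, while the enrichment design trades randomized patients for screened ones.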
Yin, Q.; Chen, L.
Programmed cell death (PCD) encompasses multiple regulated processes whose dysregulation shapes cancer fitness, yet current computational studies largely use known PCD genes for prognosis rather than to discover regulators. We developed xNNPCD, an interpretable neural-network framework that links CRISPR-Cas9 perturbation signatures from CMap to gene dependency profiles from DepMap. The model constrains hidden neurons to five PCD pathways and iteratively refines a prior gene-pathway mask matrix derived from GO, KEGG, and Reactome using pathway-neuron ablation. This converts binary gene-pathway relationships into continuous-valued associations and improves dependency prediction over random forests, a standard fully connected multilayer perceptron, and its own non-iterative variant. The learned matrix recovers annotated death regulators and nominates candidate regulators, including RPL23A, HSPA5, SNRPA1, SLC6A2, and ASAH1; combined with dependency scores, it further separates pathway coupling from regulatory direction. Transferring the refined relationship matrix and learned weights to compound-induced perturbation data enables in silico drug screening, identifying BRD-K19103580 and decitabine as candidate agents targeting apoptosis and ferroptosis, respectively. The pathway-resolved drug profiles can facilitate the rational design of combination therapies targeting complementary PCD pathways to overcome single-pathway resistance. Overall, xNNPCD offers a generalizable, interpretable approach for mapping the regulatory landscape and elucidating the molecular processes of PCD in cancer.
Piorkowska, N. J.; Olejnik, A.; Ostromecki, A.; Kuliczkowski, W.; Mysiak, A.; Bil-Lula, I.
Interpreting machine learning models typically relies on feature attribution methods that quantify the contribution of individual variables to model predictions. However, it remains unclear whether attribution magnitude reflects the true functional importance of features for model performance. Here, we present a unified interpretability framework integrating permutation-based attribution, feature ablation, and stability under perturbation across multiple feature spaces. Using nested cross-validation and permutation-based null diagnostics, we systematically evaluate the relationship between attribution magnitude and functional dependence in clinical and biomarker-based prediction models. Attribution magnitude is frequently misaligned with functional importance, with weak to strong negative correlations observed across feature spaces (Spearman ρ ranging from -0.374 to -0.917). Features with high attribution often have limited impact on model performance when removed, whereas features with low attribution can be essential for maintaining predictive accuracy. These discrepancies define distinct classes of interpretability failure, including attribution excess and latent dependence. Interpretability further depends on feature space composition, and stable, functionally relevant features are not necessarily those with the highest attribution scores. By integrating attribution, functional impact, and stability into a composite Feature Reliability Score, we identify features that remain informative across perturbations and analytical contexts. These findings indicate that interpretability does not arise from attribution magnitude alone but is better characterized by stability under perturbation. This framework provides a basis for more robust model interpretation and highlights the limitations of attribution-centric approaches in high-dimensional and correlated data settings.
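The misalignment described above can be quantified by rank-correlating each feature's attribution score with the performance drop observed when that feature is ablated. A minimal sketch, assuming both quantities are already computed per feature; the median-split rule used to label "attribution excess" and "latent dependence" is a hypothetical simplification, not the paper's definition or its Feature Reliability Score.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def failure_classes(attribution, ablation_drop):
    """Label features by a median split (hypothetical rule, for illustration)."""
    med_a = statistics.median(list(attribution.values()))
    med_d = statistics.median(list(ablation_drop.values()))
    labels = {}
    for f in attribution:
        high_attr, high_drop = attribution[f] > med_a, ablation_drop[f] > med_d
        if high_attr and not high_drop:
            labels[f] = "attribution excess"
        elif high_drop and not high_attr:
            labels[f] = "latent dependence"
        else:
            labels[f] = "aligned"
    return labels


# Made-up scores for four features.
attr = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}
drop = {"a": 0.0, "b": 0.2, "c": 0.1, "d": 0.05}
print(spearman(list(attr.values()), [drop[f] for f in attr]))
print(failure_classes(attr, drop))
```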
Ye, W.; Jiang, X.; Shen, F.
Objective: To address core problems prevalent in biomedical research, including the "translational distance", the difficulty of aligning cross-scale studies, and the lack of direct validation of single-cell systems biology models in human samples, this study aims to verify whether the results of transcriptome-wide Mendelian randomization (TWMR) based on large-scale populations are consistent with the causal inference results of deep learning combined with double machine learning (DML) using single-cell transcriptome data from human samples. The goals are to clarify whether statistical biology and systems biology can converge to the same biological truth and to provide methodological support for mechanism dissection and precision medicine research on complex diseases such as rheumatoid arthritis (RA). Methods: This study integrated multi-omics data to conduct a two-stage causal inference and cross-scale validation analysis. In the first stage, a two-sample Mendelian randomization approach was applied to summary statistics from an RA genome-wide association study (GWAS) of 456,348 individuals of European ancestry in the UK Biobank (UKB) and to cis-expression quantitative trait locus (cis-eQTL) data from 31,684 individuals in the eQTLGen Consortium. Transcriptome-wide causal effects were estimated using the inverse-variance weighted (IVW) method, MR-Egger regression, and the weighted median method, and gene-level causal effect values were obtained after strict quality control and multiple testing correction.
In the second stage, single-cell RNA sequencing (scRNA-seq) data from RA patients and healthy controls (RA group: 11 samples, 211,867 cells; healthy control group: 38 samples, 456,631 cells) were preprocessed via the Seurat pipeline, batch-corrected, and annotated by cell type; a hierarchical deep neural network was then constructed to compress the high-dimensional expression data into features, and the DML framework was used to estimate the causal effects of genes on RA disease status. Finally, Pearson correlation analysis was used for cell type-specific cross-scale validation of the gene-level causal effect values obtained by the two methods, and the validated model was used to quantify the causal effects of 16 RA-related pathways from the Reactome database. Results: The gene causal effect values obtained from large-scale population TWMR analysis were significantly correlated with those calculated by the deep learning plus DML model based on single-cell transcriptome data. The correlation was strongest and highly significant (p<0.001) in core naive B cells (r=0.202, p=3.2e-05, n=414) and was also significant in core naive CD4 T cells (r=0.102, p=0.037, n=412). The validated DML model successfully quantified the cell type-specific causal effect values of 16 RA-related signaling pathways. Conclusion: Statistical biology and systems biology can converge to the same biological truth. The cross-scale consistency between the two can significantly shorten the "translational distance" in biomedical research and enables direct validation of single-cell systems biology causal models of human samples using large-scale population genetic data, reducing the excessive dependence on animal and cell experimental models in traditional research.
This research paradigm not only provides a new path for mechanism dissection and therapeutic target screening of complex diseases such as RA, but also provides a feasible solution for rare disease research to break through the limitation of GWAS sample size, and lays an important theoretical and methodological foundation for constructing standardized systems biology models of human complex diseases and promoting the development of precision medicine.
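The first-stage IVW estimator can be made concrete. The sketch below implements the textbook fixed-effect IVW estimate from per-SNP summary statistics with first-order weights; it assumes independent instruments, ignores uncertainty in the SNP-exposure effects, and does not reproduce the study's quality control, MR-Egger, or weighted-median analyses. The input numbers are made up.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate across independent SNPs.

    Equivalent to a regression of SNP-outcome on SNP-exposure effects through the
    origin with weights 1 / se_outcome**2. Returns (estimate, standard_error).
    """
    num = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1 / den)


def wald_ratios(beta_exposure, beta_outcome):
    """Per-SNP causal estimates; heterogeneity among them motivates MR-Egger and weighted median."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


# Made-up summary statistics for three cis-eQTL instruments of one gene.
bx = [0.10, 0.20, 0.30]
by = [0.05, 0.11, 0.14]
se = [0.01, 0.02, 0.02]
beta, se_beta = ivw_estimate(bx, by, se)
print(f"IVW = {beta:.3f} (SE {se_beta:.3f})")
```

Repeating this per gene yields the gene-level causal effect values that the second stage correlates, cell type by cell type, against the DML estimates.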
van Geest, G.; Thomas-Lopez, D.; Feitzinger, A. A.; Weissgold, L. A.; Halabi, S.; Cuesta, I.; Hjerde, E.; Gurwitz, K. T.; Arora, N.; Neves, A.; Palagi, P. M.; Williams, J. J.
Background: Datasets related to infectious diseases are essential for public health decision-making, yet their reuse remains limited by persistent barriers to data sharing and integration. Achieving data that are Findable, Accessible, Interoperable, and Reusable (FAIR) is widely recognized as essential for accelerating scientific discovery and enabling coordinated responses to emerging threats, but the needs of the global pathogen data community have not been systematically characterized. Aim: This study, conducted by the Pathogen Data Network (PDN), aims to identify infrastructural and educational priorities among stakeholders working with infectious disease-related data in order to guide community-responsive support for data sharing and interoperability. Methods: A cross-sectional stakeholder survey was disseminated to a well-defined expert population within PDN networks and via open professional channels. A total of 136 responses from researchers, healthcare professionals, bioinformaticians, and educators were analyzed descriptively to identify prioritized barriers, training needs, and preferred support mechanisms. Results: Respondents consistently identified structural constraints as the primary impediments to effective data use, including limited funding (74%), data-aggregation challenges (68%), and a shortage of skilled personnel (52%). Respondents identified bioinformatics for infectious disease research (68%) as the highest priority for training, followed by guidance on using the integrated pathogen data and tools portal provided by the PDN, the Pathogens Portal (51%). The Pathogens Portal was also ranked as the most essential PDN resource (72%). Preferred training formats included virtual short courses (68%) and webinars (66%). Notably, while researchers emphasized technical subjects like machine learning, educators prioritized foundational case studies.
Conclusion: These findings provide an evidence-based diagnostic of community needs and suggest that barriers to FAIR pathogen data are predominantly systemic rather than purely technological. The survey framework and openly available dataset offer a reusable template for assessing needs in other communities and regions. By aligning training, infrastructure development, and outreach with empirically identified priorities, organizations supporting infectious disease research can strengthen the interoperability and reuse of data and establish a benchmark for future community-driven improvements.
Cajas, S.; Marzullo, A.; Kapadia, S.; Santos, F.; Ocampo Osorio, F.; Kong, Q.; Quarta, A.; Kuo, P.-C.; Patel, M.; Rojas Sillery, R. I.; Celi, L. A.
Shortcut learning poses a significant challenge in clinical artificial intelligence, as models may rely on spurious signals rather than clinically relevant features, leading to biased predictions and poor generalization. Existing detection methods are fragmented and lack systematic evaluation across datasets and model architectures. To address this issue, we propose ShortKit-ML, an open-source Python framework for unified shortcut analysis in embedding spaces. The framework integrates over 20 detection methods and six mitigation strategies within a modular pipeline, encompassing embedding analysis, fairness metrics, training dynamics, causal methods, explainability, and representation analysis. We evaluate the framework on chest X-ray datasets (CheXpert and MIMIC-CXR), synthetic benchmarks, and an out-of-domain dataset (CelebA). Experimental results demonstrate that multi-method auditing provides more stable and interpretable evidence than individual methods, while detector disagreement reveals meaningful representational differences. The framework provides automated reporting and interactive visualization and is available as a pip-installable package. The source code and documentation are publicly available at https://github.com/criticaldata/ShortKit-ML and https://criticaldata.github.io/ShortKit-ML/.
Ren, H.-C.; Gu, Y.-X.
Pharmacokinetic analysis has spent half a century compressing drug concentration-time curves into scalar summaries (AUC, Cmax, clearance), discarding the shape information that encodes mechanistic fingerprints of the underlying physiology. We introduce Topological Pharmacokinetics (TPK), a framework that reads the shape of pharmacokinetic trajectories directly from data without prior commitment to a compartmental model. TPK uses delay embedding to reconstruct the pharmacokinetic attractor from the concentration-time curve and persistent homology to extract its topological invariants (connected components and loops) as a Pharmacokinetic Topological Invariant (PTI) vector. We validate TPK at three levels: linear systems (negative control), nonlinear saturable elimination (detection of the N_PTP + 1 rule and a nonlinear diagnostic triad), and endogenous circadian rhythms (contrastive detection of rhythmic interference via Dev specificity and Decouple Collapse). In simulation, the PTI vector provides a model-agnostic shape fingerprint that demonstrates the diagnostic potential of shape-based analysis. All findings are proof-of-concept demonstrations on simulated data; validation on experimentally measured concentration-time curves is the essential next step to assess whether this potential generalizes to real pharmacokinetic data.
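Delay embedding and persistent homology, the two steps named above, can be illustrated in a few lines. This sketch is not the TPK implementation: the embedding dimension, delay, and simulated curve are arbitrary, and it computes only 0-dimensional persistence (connected components) from scratch. Loop (1-dimensional) invariants would normally come from a dedicated library such as GUDHI or Ripser, and how barcodes become the PTI vector is not attempted here.

```python
import math
from itertools import combinations


def delay_embed(series, dim=2, tau=1):
    """Delay embedding: x_t -> (x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau})."""
    n = len(series) - (dim - 1) * tau
    return [tuple(series[t + k * tau] for k in range(dim)) for t in range(n)]


def h0_barcode(points):
    """0-dimensional persistence of a Vietoris-Rips filtration.

    Every point is born at scale 0; a component dies at the edge length where it
    merges with another (Kruskal's algorithm over all pairwise distances).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]


# Simulated one-compartment IV bolus curve C(t) = C0 * exp(-k t); parameters are arbitrary.
conc = [10.0 * math.exp(-0.3 * t) for t in range(12)]
bars = h0_barcode(delay_embed(conc, dim=2, tau=1))
print(len(bars), max(d for _, d in bars if d != math.inf))
```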
Roberts, K. F.; Abrams, Z. B.; Cappelletti, L.; Moqri, M.; Heugel, N.; Caufield, J. H.; Bourdenx, M.; Li, Y.; Banerjee, J.; Foschini, L.; Galeano, D.; Harris, N. L.; Li, M.; Ying, K.; Melendez, J. A.; Barthelemy, N. R.; Bollinger, J. G.; He, Y.; Ovod, V.; Benzinger, T. L. S.; Flores, S.; Gordon, B.; Ojewole, A. A.; Phatak, M.; Elbert, D. L.; Biber, S.; Landsness, E. C.; Mungall, C. J.; Bateman, R. J.; Reese, J.
Background: Advances in medicine depend on analyzing large and complex data sources, but discovery is partly constrained by the limited time and domain expertise of human researchers. Agentic artificial intelligence (agentic AI) can accelerate discovery by automating components of the scientific workflow, including information retrieval, data analysis, and knowledge synthesis. Aim: OpenScientist, an open-source agentic AI co-scientist, aims to accelerate biomedical discovery by semi-autonomously investigating scientist-defined queries and generating clinically relevant, verifiable scientific insights. Methods: Domain experts evaluated OpenScientist for novel discoveries in four clinical case studies: (1) a prespecified analysis in a community-based Alzheimer's disease biomarker cohort, (2) unsupervised modeling for plasma proteomic survival prediction, (3) hypothesis investigation in single-cell transcriptomic data from neurons with neurofibrillary tangles, and (4) hypothesis generation with validation in a multiple myeloma dataset with a randomized negative control. Results: OpenScientist completed in minutes analyses that would otherwise take weeks to months of human time and expertise. It identified %ptau217 as the best predictor of amyloid PET status, generated a plasma proteomic survival model with performance comparable to published models, proposed a mechanism linking tau pathology to altered lysosomal acidification, and generated multiple myeloma hypotheses that were validated in an external cohort while distinguishing true signal from randomized controls. Conclusion: OpenScientist demonstrates that open, auditable, agentic AI can support real-world clinical research by generating hypotheses, executing analyses, and discovering insights from complex datasets.
Zeng, T.; Li, H.; Zhang, S.; Tan, Y. Q.; Tian, F.; Orban, C.; An, L.; Che, W.; Cheng, J.; Chong, J. S. X.; Dehestani, N.; Dong, Z.; Li, X.; Li, Z.; Lim, M. J. R.; Lin, Y.; Ling, Q.; Ling, Z.; Low, X. Z.; Mansour L., S.; Ng, K. K.; Nguyen, T. T.; Ooi, L. Q. R.; Pande, S.; Qian, X.; Ruan, J.; Wang, Z.; Xie, Y.; Zhang, C.; Zhang, Y.; Patil, K.; Parkes, L.; Dhamala, E.; Chopra, S.; Zalesky, A.; Holmes, A.; Eickhoff, S.; Zhou, J. H.; Renaud, O.; Dosenbach, N.; Kording, K. P.; Bzdok, D.; Nichols, T.; Yeo, B. T. T.
Machine learning is accelerating biomedical research. Cross-validation is widely used to compare predictive performance, not only to benchmark algorithms but also to inform scientific applications such as ranking biomarkers. However, prediction performance estimates across cross-validation folds are not independent. Standard tests for comparing prediction performance (e.g., the paired t-test) assume independence and can therefore inflate false positive rates. In a PRISMA-guided meta-analysis of 210 studies (impact factor ≥ 15, 1 June 2020 to 1 June 2025), we find that 97% ignored fold dependence when comparing prediction performance. This problem is ubiquitous across scientific fields and unaffected by impact factor, rigor-promoting policies, or open science practices. Simulations across 420 scenarios spanning four diverse datasets show that ignoring fold dependence leads to invalid false positive control in most settings. Repeated cross-validation further compounds this problem, with false positive rates rising toward 100% as the number of repetitions grows. Existing fold-dependence-aware tests rely on strong assumptions because the variance of fold-level statistics and the between-fold correlation cannot be disentangled under standard cross-validation. We therefore propose the SHARP (Split-HAlf RePeated) test, a simple modification to standard cross-validation that enables direct estimation of variance and correlation. Benchmarked against 12 tests, SHARP provides the best overall balance of false-positive control, statistical power, and confidence-interval calibration across simulation schemes. We conclude by providing best practices and reporting guidelines for valid model comparison inference in biomedical machine learning and beyond.
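To make the fold-dependence problem concrete, the sketch below contrasts the naive paired t statistic with the Nadeau-Bengio corrected resampled t statistic, a well-known example of a fold-dependence-aware test that relies on an assumed between-fold correlation. It is not the SHARP test, whose procedure the abstract does not detail. The per-fold differences and sample sizes are hypothetical.

```python
import math
import statistics

def paired_t(diffs):
    """Naive paired t statistic: treats the k fold-level differences as independent."""
    k = len(diffs)
    return statistics.mean(diffs) / math.sqrt(statistics.variance(diffs) / k)

def corrected_resampled_t(diffs, n_train, n_test):
    """Nadeau-Bengio corrected resampled t statistic: replaces the 1/k variance
    factor with 1/k + n_test/n_train to account for overlapping training sets."""
    k = len(diffs)
    return statistics.mean(diffs) / math.sqrt(
        (1 / k + n_test / n_train) * statistics.variance(diffs))

# Hypothetical per-fold accuracy differences (model A minus model B),
# e.g. from 2 repeats of 5-fold CV on 500 samples (n_train=400, n_test=100).
diffs = [0.010, 0.020, -0.005, 0.015, 0.012, 0.008, 0.000, 0.018, 0.010, 0.006]
print(paired_t(diffs), corrected_resampled_t(diffs, n_train=400, n_test=100))
```

The design point mirrors the abstract: as repetitions increase k, the naive denominator shrinks like 1/sqrt(k), so the statistic grows without bound even when nothing changes. The corrected variance factor is floored at n_test/n_train, but only by assuming a fixed correlation between folds, which is the kind of strong assumption the authors aim to avoid.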
Sethi, T.; Anand, A.; Pratiti, M.; Ali, S. Y.; Kamra, S.; Verma, S.; Singh, S.; Bajaj, T.
Identifying robust gene expression signatures from transcriptomic studies with small sample sizes remains one of the most persistent challenges in computational biology. Gene expression datasets have thousands of features but only a handful of biological samples. This classic p ≫ n imbalance limits statistical power and makes it difficult to discover reliable biomarkers. In imaging, generative models such as GANs, VAEs, and diffusion models have shown promise for data augmentation, but their usefulness for omics data has not been systematically tested. More importantly, no existing framework integrates synthetic data generation, stability-aware signature discovery, and multi-source biological validation into a single pipeline. In this work, we present GeneLift, built on the hypothesis that a computational pipeline combining generative data augmentation, stability testing, and evaluation of biological evidence will aid novel gene-signature discovery in small-cohort transcriptomic studies. We tested this hypothesis across 36 microarray datasets covering five diseases: sepsis, breast cancer, ovarian cancer, tuberculosis, and diabetes. Component-wise testing of GeneLift revealed that Gaussian Mixture Models (GMMs) outperformed deep generative approaches and faithfully reproduced gene-level distributions. Using a novel approach that titrates the level of augmentation, we identified biologically meaningful gene candidates that did not appear in the original, underpowered analyses. We also developed BayesScore, a Bayesian posterior probability of gene-disease association computed from PubMed co-occurrence, which both recovers well-characterised disease genes missed by standard differential expression and surfaces candidates whose disease relevance was independently confirmed in subsequent publications, with lead times of up to 18 years between the source dataset and the first disease-specific citation.
GeneLift is freely available at tavlab-iiitd/GeneLift.
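The GMM-based augmentation and "titration" of augmentation levels described above can be sketched as follows. This is a minimal NumPy illustration under assumptions, not GeneLift's code: the diagonal covariances, the number of components `k`, the EM iteration count, and all function names are hypothetical, and in practice one would likely fit a separate mixture per phenotype group so that synthetic samples inherit labels.

```python
import numpy as np

def fit_diag_gmm(X, k=3, n_iter=100, seed=0, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture to X (samples x genes) with EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    var = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log-likelihood of each sample under each component, plus log weight.
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2
        log_p = -0.5 * (np.log(2 * np.pi * var)[None] + diff2 / var[None]).sum(axis=2)
        log_p += np.log(weights)[None, :]
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, and per-gene variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, eps)
    return weights, means, var

def sample_gmm(weights, means, var, m, seed=1):
    """Draw m synthetic samples from a fitted diagonal GMM."""
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(weights), size=m, p=weights)
    return means[comp] + rng.standard_normal((m, means.shape[1])) * np.sqrt(var[comp])

def augment(X, level, **gmm_kwargs):
    """Append int(level * n) synthetic samples to the original n samples."""
    synth = sample_gmm(*fit_diag_gmm(X, **gmm_kwargs), m=int(level * len(X)))
    return np.vstack([X, synth])

# Toy data: 12 samples x 50 genes; titrate the augmentation level.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 50))
for level in (0.5, 1.0, 2.0):
    print(level, augment(X, level, k=2).shape)
```

Titration here simply means repeating downstream signature discovery at several `level` values and checking which candidate genes persist; how GeneLift scores that stability is not specified in the abstract.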
Pavlovic, M.; Wurtzen, C.; Kanduri, C.; Mamica, M.; Scheffer, L.; Lund-Andersen, C.; Gubatan, J. M.; Ullmann, T.; Greiff, V.; Sandve, G. K.
Machine learning (ML) enables analyses of adaptive immune receptor repertoires (AIRRs) for biomarker identification and therapeutic development. With the majority of AIRR data partially or imperfectly labeled, unsupervised ML is essential for motif discovery, biologically meaningful clustering, and generation of novel receptor sequences. However, no unified framework for unsupervised ML exists in the AIRR field, hindering the assessment of model robustness and generalizability. Here, we present an immuneML release advancing unsupervised ML in the AIRR field through unified clustering workflows, interpretable generative modeling, integration with protein language model embeddings, dimensionality reduction, and visualization. We demonstrate immuneML's utility in three use cases: (i) benchmarking generative models for epitope-specific sequence generation, assessing specificity and novelty, (ii) systematic evaluation of clustering approaches on experimental receptor sequences against biological properties, such as epitope specificity and MHC, and (iii) unsupervised analysis of an experimental AIRR dataset to examine potential confounding, a practice widespread in related fields but unexplored in AIRR analyses.
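Use case (ii), evaluating clusters of receptor sequences against a biological property, can be illustrated with a standalone toy example. This is not immuneML's API or one of its clustering workflows: the Hamming-distance single-linkage rule, the `max_dist=1` threshold, the purity score, and the CDR3-like sequences with made-up epitope labels are all hypothetical.

```python
from collections import Counter

def cluster_by_hamming(seqs, max_dist=1):
    """Single-linkage clustering: equal-length sequences differing at no more than
    max_dist positions are linked; clusters are the connected components."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            a, b = seqs[i], seqs[j]
            if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_dist:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]

def purity(cluster_ids, labels):
    """Fraction of sequences whose label equals the majority label of their cluster."""
    groups = {}
    for c, y in zip(cluster_ids, labels):
        groups.setdefault(c, []).append(y)
    majority = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return majority / len(labels)

# Toy CDR3-like sequences with made-up epitope labels (illustrative only).
seqs = ["CASSLGQAYEQYF", "CASSLGQGYEQYF", "CASSLGEAYEQYF",
        "CASRPDRGNTEAFF", "CASRPDRGSTEAFF", "CASSIRSSYEQYF"]
labels = ["epitope_A", "epitope_A", "epitope_B",
          "epitope_B", "epitope_B", "epitope_A"]
clusters = cluster_by_hamming(seqs)
print(len(set(clusters)), purity(clusters, labels))
```

Purity alone rewards many tiny clusters (singletons are always pure), so a real benchmark would pair it with a complementary measure such as the number of clusters or a completeness-style score.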
Vindas Yassine, Y. E.; Bornet, A.; Abbas, M.; Geissbuehler, D.; Rodrigues-Jr, J. F.; Teodoro, D.
Transmissible hospital-acquired infections (HAIs) arise from complex, time-varying interactions among patients, healthcare workers, and clinical environments. Although data-driven approaches such as graph neural networks (GNNs) effectively model these contacts, they often function as black boxes that overlook established epidemiological principles, limiting interpretability and clinical trust. Inspired by physics-informed neural networks, we propose an epidemiology-informed GNN (EIGNN) framework for patient-level state-transition prediction in dynamic hospital settings, integrating mechanistic epidemiological models into GNNs in a principled manner. Patient-level risk factors learned from dynamic contact networks are jointly leveraged to infer latent epidemiological states, predict state transitions across multiple horizons, and estimate key epidemiological parameters, including transmission and recovery rates. We evaluate the approach on a real-world hospital-onset COVID-19 cohort and two public datasets simulating viral and bacterial HAIs. Across multiple architectures and horizons, EIGNNs achieve AUC-ROC of up to 98.46% while providing interpretable, mechanistically consistent insights, offering a transparent tool for infection prevention and control.
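The kind of mechanistic model that EIGNN integrates can be illustrated with a generic discrete-time SIR update on a weighted contact network. This is a hedged sketch, not the paper's formulation: the escape-probability infection rule, the fixed `beta` (transmission) and `gamma` (recovery) inputs, and the toy contact weights are assumptions, whereas EIGNN reportedly learns patient-level risk from dynamic contacts and estimates such parameters.

```python
import numpy as np

def infection_prob(contacts, infectious, beta):
    """Per-step infection probability for each patient i:
    1 - prod_j (1 - beta * w_ij * I_j), with contact weights w_ij in [0, 1]
    and I_j = 1 if patient j is infectious, else 0."""
    escape = np.prod(1.0 - beta * contacts * infectious[None, :], axis=1)
    return 1.0 - escape

def step_sir(state, contacts, beta, gamma, rng):
    """One discrete-time SIR update; state holds 0 (S), 1 (I), or 2 (R)."""
    p_inf = infection_prob(contacts, (state == 1).astype(float), beta)
    u = rng.random(len(state))
    new = state.copy()
    new[(state == 0) & (u < p_inf)] = 1  # S -> I with network-driven probability
    new[(state == 1) & (u < gamma)] = 2  # I -> R with recovery probability gamma
    return new

# Toy ward of 4 patients with symmetric, hypothetical contact weights.
W = np.array([[0.0, 1.0, 0.4, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.4, 0.0, 0.0, 0.2],
              [0.0, 0.0, 0.2, 0.0]])
state = np.array([1, 0, 0, 0])
print(infection_prob(W, (state == 1).astype(float), beta=0.5))
print(step_sir(state, W, beta=0.5, gamma=0.1, rng=np.random.default_rng(0)))
```

In a physics-informed setup, transition probabilities like these would constrain or regularize the network's predicted state transitions, which is what keeps the learned risks interpretable in epidemiological terms.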