Bioinformatics

24 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
JointMR: A joint likelihood-based approach for causal effect estimation in overlapping Mendelian Randomization studies
2025-12-19 genetic and genomic medicine 10.64898/2025.12.18.25342634
#1 (6.0%)

The integration of causal effect estimates from multiple Mendelian Randomization studies has become increasingly popular. However, the presence of overlapping databases compromises traditional meta-analysis, leading to inflated variance and reduced statistical power. Here, we propose JointMR, a joint likelihood-based approach designed to integrate multiple GWAS summary databases while explicitly accounting for the covariance matrix of the Wald ratio estimates. Specifically, to accommodate potent...

2
Enhancing Polygenic Risk Prediction by Modeling Quantile-Specific Genetic Effects
2025-12-29 epidemiology 10.64898/2025.12.25.25342935
#1 (6.0%)

Polygenic risk scores (PRSs) quantify an individual's genetic susceptibility to complex traits and diseases. Conventional PRSs, which are based on linear models, perform poorly for phenotypes with skewed distributions or with genetic effects that vary across the distribution. We propose a quantile regression-based PRS (QPRS) that can capture quantile-specific genetic effects. While existing PRSs provide only a single score, QPRS models genetic influences at multiple quantiles of the phenotype, th...

3
FA-NIVA: A Nextflow framework for automated analysis of Nanopore based long-read sequencing data for genetic analysis in Fanconi anemia
2026-03-04 genetic and genomic medicine 10.64898/2026.02.27.26346867
Top 0.1% (5.7%)

Motivation: Fanconi anemia (FA) is a rare disease mainly caused by biallelic pathogenic variants, including structural variants such as large deletions and insertions in FA genes. Currently, variant detection is based on short-read sequencing and probe-based approaches. However, determining the exact genomic breakpoint or achieving allelic discrimination remains challenging. Nanopore-based long-read sequencing enables a comprehensive detection of FA variants, but a unified bioinformatic analysis p...

4
A variational sparse Gaussian-process method for detecting spatially variable genes and cellular interactions from spatial transcriptomics
2025-12-11 genetic and genomic medicine 10.64898/2025.12.10.25341956
Top 0.2% (4.1%)

Advanced spatially resolved transcriptomic (SRT) technologies preserve the spatial context of gene expression within tissues, enabling the study of context-dependent transcriptional regulation. Here, we propose VISGP, a variational sparse Gaussian-process method for analyzing spatially variable genes (SVGs) and cellular interactions in such data. VISGP utilizes variational inference and a sparse Gaussian process approximation, which efficiently models the posterior distribution with a set of indu...

5
Integrating multi-omics and multi-context QTL data with GWAS reveals the genetic architecture of complex traits and improves the discovery of risk genes
2025-12-27 genetic and genomic medicine 10.64898/2025.12.19.25342620
Top 0.2% (4.1%)

Recent studies showed that expression QTLs, even from trait-related tissues, explained a small fraction of complex trait heritability. A natural strategy to close this gap is to incorporate molecular QTLs (molQTLs) beyond gene expression, across diverse tissue/cellular contexts. Yet, integrating such QTL data presents analytical challenges. Molecular traits often share QTLs or have QTLs in high LD, complicating the attribution of GWAS signals to specific molecular traits. Our simulations showed ...

6
Learning lifetime disease liability reveals and removes genetic confounding in electronic health records
2026-02-22 genetic and genomic medicine 10.64898/2026.02.15.26346336
Top 0.2% (4.0%)

Electronic health records (EHRs) have become the cornerstone of population-scale genetic studies [1], but factors including patterns of healthcare use shape which and how diagnoses are recorded, leading to confounding effects in genetic associations with EHR codes [2]. In this study we propose EDGAR, a deep learning framework that recovers lifetime disease liability from EHR by aligning diagnostic codes with clinically validated measures and disease labels in a set of individuals prioritized through a...

7
Statistical uncertainty explains the poor agreement in polygenic scoring for type 2 diabetes
2026-02-27 genetic and genomic medicine 10.64898/2026.02.25.26347015
Top 0.2% (3.8%)

Polygenic scores (PGS) have emerged as an important tool for genetic risk prediction in medicine to identify individuals at high risk for disease. A major limitation in their implementation is the apparent disagreement among scores for the same individual, which decreases their interpretability and utility in clinical settings. Here we show that the poor agreement across PGSes for type 2 diabetes (T2D) is fully explained by statistical uncertainty in PGS-based prediction; individual-level uncertainty ...

8
PHARMWATCH: A Multilayer Pharmacogenomics Safety System for Accurate Star Allele Interpretation
2026-02-28 genetic and genomic medicine 10.64898/2026.02.26.26347200
Top 0.2% (3.7%)

The Clinical Pharmacogenetics Implementation Consortium (CPIC) bases its drug-gene recommendations on the assignment of star alleles, which map known genotypes to defined functional categories and corresponding drug dosage guidelines. The star allele framework, first proposed in 1996 for the CYP gene family and later formalized with CPIC's establishment in 2010 [1, 2], remains foundational to pharmacogenomics. However, this system has notable limitations. Its dependence on a restricted set of ben...

9
Constructing a Literature-Derived Database for Benchmarking Polygenic Risk Score Construction Methods with Spectral Ranking Inferences
2026-03-03 genetic and genomic medicine 10.64898/2026.03.01.26347258
Top 0.3% (3.7%)

Polygenic risk scores (PRSs) have emerged as a valuable tool for genetic risk prediction and stratification in human diseases. Over the past decade, extensive methodological efforts have focused on improving the predictive power of PRSs, leading to the development of numerous methods for PRS construction. Benchmarking these methods is therefore essential for guiding future PRS applications. While studies have benchmarked subsets of these methods on specific phenoty...

10
Deep Agentic Variant Prioritisation for Expert Level Genetic Diagnosis Fast at Scale
2026-02-18 genetic and genomic medicine 10.64898/2026.02.17.26346421
Top 0.3% (3.1%)

Genetic diagnosis remains a formidable challenge, characterized by a diagnostic odyssey that spans years. Rare diseases affect more than 300 million people worldwide, and over half of these patients remain undiagnosed. Clinicians must navigate through thousands of candidate variants against a noisy and fragmented literature landscape, a task that overwhelms human cognitive capacity and conventional decision-making approaches. Recent advances in agentic artificial intelligence systems have demonstra...

11
Human Phenotype Ontology (HPO) Mapper: Semantic Mapping of Clinical Findings to the Human Phenotype Ontology Using AI-Powered Embeddings and LLM-Based Quality Control
2025-12-22 health informatics 10.64898/2025.12.20.25342726
Top 0.3% (3.0%)

[Figure 1: visual abstract] Structured phenotypic annotations linked to genetic data can drive diagnostic insight and therapeutic discovery in complex diseases. However, poor research access to the rich ...

12
PAGAN predicts digenic interactions by generalizing single-gene representations in biological networks
2026-01-29 genetic and genomic medicine 10.64898/2026.01.27.26344931
Top 0.4% (2.0%)

Digenic alterations can produce phenotypes such as synthetic lethality or digenic disease that are not observed upon individual gene perturbation, often by disrupting compensatory or redundant biological mechanisms. We hypothesized that gene pairs underlying such phenotypes share, when considered jointly, biological network properties analogous to those of essential genes or monogenic Mendelian disease genes. To test this hypothesis, we developed PAGAN, a graph representation learning framework ...

13
A time-to-event heritability framework for inferring the genetic architecture of longitudinal traits
2026-02-22 genetic and genomic medicine 10.64898/2026.02.16.26346285
Top 0.4% (2.0%)

Biobanks with longitudinal measurements have advanced our understanding of time-to-event (TTE) traits including age-of-onset and disease progression. However, limited work has characterized the heritability of TTE traits, a key parameter for comparisons of total association and predictive power. Here, we present COXMM, a Cox proportional hazard mixed model for estimating TTE heritability. Simulations show our model achieves nearly unbiased results, whereas non-TTE approaches severely underestima...

14
Performance Characteristics of Reasoning Large Language Models for Evidence Extraction from Clinical Genomics Literature
2026-02-19 genetic and genomic medicine 10.64898/2026.02.18.26346543
Top 0.4% (2.0%)

BACKGROUND: Genetic variant curation, an important step in the implementation of Genomic Medicine, requires literature-guided comparison of variant prevalence in affected individuals versus healthy controls. This evidence is categorized as the PS4 evidence code by the AMP/ACMG variant interpretation guidelines and its manual extraction is a major bottleneck in clinical variant curation. This study aimed to evaluate whether reasoning-capable large language models (LLMs) can support guideline-constr...

15
PhenoSS: Phenotype semantic similarity-based approach for rare disease prediction and patient clustering
2026-03-02 health informatics 10.64898/2026.02.26.26347219
Top 0.4% (2.0%)

Objective: Systematic clinical phenotyping using Human Phenotype Ontology (HPO) is central to rare disease diagnosis. However, current disease prioritization (ranking candidate diseases from HPO for a patient) methods face key challenges: they often fail to account for the hierarchical structure of HPO terms, ignore dependencies among correlated terms, and do not adjust for batch effects arising from systematic differences in phenotype documentation across cohorts, institutions, or clinicians. We ...

16
An Integrated Deep Learning Framework for Small-Sample Biomedical Data Classification: Explainable Graph Neural Networks with Data Augmentation for RNA sequencing Dataset
2026-02-24 genetic and genomic medicine 10.64898/2026.02.22.26346827
Top 0.5% (1.9%)

Applying deep learning models to RNA-Seq data poses substantial challenges, primarily due to the high dimensionality of the data and the limited sample sizes. To address these issues, this study introduces an advanced deep learning pipeline that integrates feature engineering with data augmentation. The engineering application focuses on biomedical engineering, specifically the classification of RNA-Seq datasets for disease diagnosis. The proposed framework was initially validated on synthetic d...

17
Interpretable Fine-tuned Large Language Models Facilitate Making Genetic Test Decisions for Rare Diseases
2026-03-02 health informatics 10.64898/2026.02.26.26347223
Top 0.5% (1.9%)

Clinical decision making often relies on expert judgment guided by established guidelines, which can be challenging to standardize and too abstract to implement directly. For example, selecting between gene panels and whole exome/genome sequencing (WES/WGS) for rare disease diagnosis frequently requires interpretation of evidence-based recommendations from the American College of Medical Genetics and Genomics (ACMG) guideline. Traditional machine learning (ML) models predicting suitable genetic tests often f...

18
Swarm-GestaltMatcher: distributed Gestalt learning to enhance facial phenotyping for rare genetic syndromes
2026-01-05 health informatics 10.64898/2026.01.02.26343337
Top 0.5% (1.9%)

Deep learning-based facial phenotyping represents a major paradigm shift in the diagnosis of rare and ultra-rare genetic disorders. By capturing disease-specific craniofacial "gestalts" that are often subtle and overlapping but overlooked in routine clinical practice, these technologies surpass the traditional limits of dysmorphology assessment. Despite this, data scarcity and stringent privacy policies constrain centralized model training and its clinical translation. Swarm learning, a decentral...

19
A novel age-informative polygenic score improves predictive ability for kidney function and kidney function decline
2026-01-23 genetic and genomic medicine 10.64898/2026.01.22.26344420
Top 0.6% (1.9%)

Polygenic scores (PGSs) are widely used to summarize the joint genetic effects for disease-related traits. However, while age-dependent genetic effects are increasingly recognized, their integration into PGSs remains underexplored. Kidney function, assessed by estimated glomerular filtration rate (eGFR), has strong age-related genetic effects, and prediction of kidney function decline is an unmet need. We develop an age-informative PGS for quantitative traits by generating age-specific weights ...

20
Step change in glaucoma polygenic risk score performance enables clinical utility and disease prediction across all major ancestries
2026-01-24 genetic and genomic medicine 10.64898/2026.01.23.26344675
Top 0.6% (1.9%)

Glaucoma is the leading cause of irreversible blindness; vision loss is preventable with timely treatment, but early detection is challenging, leaving ~50% undiagnosed, highlighting the need for improved risk assessment tools. We developed a polygenic risk score (PRS) using data from >6 million individuals. PRS performance was exceptional in European ancestries; top 10% PRS individuals had 10-fold increased risk (OR=10.0) relative to the remainder. Performance remained good across all major an...