Bioinformatics
Oxford University Press (OUP)
Preprints posted in the last 30 days, ranked by how well they match Bioinformatics's content profile, based on 1061 papers previously published here. The average preprint has a 0.95% match score for this journal, so anything above that is already an above-average fit.
Harviainen, J.; Sena, F.; Moumard, C.; Politov, A.; Schmidt, S.; Tomescu, A. I.
Motivation: Pangenome graphs are increasingly used in bioinformatics, with applications ranging from environmental surveillance and crop improvement to the construction of population-scale human pangenomes. As these graphs grow in size, methods that scale efficiently become essential. A central task in pangenome analysis is the discovery of variation structures. In directed graphs, the most widely studied such structures, superbubbles, can be identified in linear time. Their canonical generalization to bidirected graphs, ultrabubbles, more accurately models DNA reverse complementarity. However, existing ultrabubble algorithms are quadratic in the worst case. Results: We show that all ultrabubbles in a bidirected graph containing at least one tip or one cutvertex (a common property of pangenome graphs) can be computed in linear time. Our key contribution is a new linear-time orientation algorithm that transforms such a bidirected graph into a directed graph that is, in practice, of the same size. Orientation conflicts are resolved by introducing auxiliary source or sink vertices. We prove that ultrabubbles in the original bidirected graph correspond to weak superbubbles in the resulting directed graph, enabling the use of existing linear-time algorithms. Our approach achieves speedups of up to 25x over the ultrabubble implementation in vg, and of more than 200x over BubbleGun, enabling scalable pangenome analyses. For example, on the v2.0 pangenome graph constructed by the Human Pangenome Reference Consortium from 232 individuals, after reading the input, our method completes in under 3 minutes, while vg requires more than one hour and four times more RAM. Availability: Our method is implemented in the BubbleFinder tool (github.com/algbio/BubbleFinder), via the new ultrabubbles subcommand. Contact: alexandru.tomescu@helsinki.fi
Jia, X.; Phan, A.; Dorman, K.; Kadelka, C.
Motivation: High-throughput experiments generate genome-wide measurements for thousands of genes, which are often tested marginally. Biological processes are driven by coordinated groups of genes rather than individual genes, making gene set enrichment analysis an essential post hoc interpretation tool. Traditional approaches such as Over-Representation Analysis and Gene Set Enrichment Analysis test gene sets independently, which ignores the hierarchical and overlapping structure of gene set collections such as the Gene Ontology, and often leads to redundant enrichment results. Set-based approaches such as MGSA address this issue by modeling multiple gene sets simultaneously, but they rely on binary gene activation states derived from arbitrary thresholds on gene-level statistics. Results: We introduce Concise List Enrichment Analysis Reducing Redundancy (CLEAR), a Bayesian gene set enrichment framework that jointly models gene sets while incorporating continuous gene-level statistics such as test statistics or p-values. CLEAR extends model-based gene set analysis by replacing threshold-based gene activation with a probabilistic model for continuous gene-level statistics. This approach preserves the redundancy-reduction advantages of set-based enrichment methods while avoiding the information loss introduced by binarization. Using both simulated datasets and human gene expression data, we show that CLEAR improves sensitivity compared with existing enrichment approaches while producing a more concise and interpretable set of enriched gene sets. Availability and implementation: The source code, data, and a brief tutorial are freely available at https://github.com/jiatuya/CLEAR
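For readers unfamiliar with the baseline this abstract contrasts against, over-representation analysis usually reduces to a hypergeometric tail probability computed separately for each gene set. The following is a minimal sketch of that baseline only, not of CLEAR's Bayesian model; the function name `ora_p_value` and the toy counts are assumptions made for illustration.

```python
from math import comb


def ora_p_value(n_universe, n_set, n_hits, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes tested
    n_set:      size of the gene set
    n_hits:     number of genes called significant (after thresholding)
    n_overlap:  significant genes that fall inside the gene set
    """
    upper = min(n_set, n_hits)
    total = comb(n_universe, n_hits)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 tested genes, a 5-gene set, 5 significant genes, all 5 in the set.
    print(ora_p_value(10, 5, 5, 5))
```

Because `n_hits` depends on a significance cutoff and every set is scored on its own, this sketch shows both limitations the abstract describes: thresholding discards the continuous statistics, and overlapping sets are tested independently.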
Joshi, S.; Sowdhamini, R.
Motivation: Characterizing atomic-level stability and cooperative interaction networks is essential for understanding protein function and evolution. However, existing tools often lack the precision to integrate detailed physicochemical energies with higher-order graph-theoretic analyses. Results: We present HORI-EN, an updated implementation of the HORI framework, featuring hybrid energetic scoring (Physicochemical + Knowledge-Based) and a Normalized Interaction Score (NIS) based on cumulative distribution functions. HORI-EN identifies higher-order cliques of interacting residues, revealing cooperative stabilization networks. Validation on the SKEMPI v2 dataset demonstrates that HORI-EN shows discriminative performance in identifying mutational hotspots, achieving an ROC-AUC of 0.780 on the full dataset and 0.844 on a clean benchmark. Enrichment analysis indicates a 3.1-fold increase in precision for the top 1% of predictions. Furthermore, analysis of the residue interaction network recovers 77.4% of non-contacting hotspots by identifying one-hop bridging interactions to the partner chain. Beyond hotspot prediction, HORI-EN distinguishes native structures from decoys and captures conserved energetic signatures in evolutionary case studies of serine proteases and lipases. Availability and Implementation: The web server is freely available at https://caps.ncbs.res.in/HORI-EN and source code is available at https://github.com/thesixeyedknight/HoriPy. Contact: mini@ncbs.res.in
Liu, Z.; Cordero, A.; Kinney, J. B.
Motivation: Computationally designed DNA sequence libraries are essential components of massively parallel reporter assays (MPRAs), deep mutational scanning (DMS) experiments, and other multiplex assays of variant effect (MAVEs). They are also increasingly used in silico to analyze genomic AI models. Designing these libraries, however, remains tedious and error-prone due to the lack of purpose-built software. Results: Here we describe PoolParty, a Python package that streamlines the design of complex oligo pools using a simple but flexible API. In PoolParty, each library is represented by a computational graph that can be specified in just a few lines of code. Over 50 built-in operations cover nucleotide- and codon-level mutagenesis, motif insertion, barcode generation, and more. PoolParty automatically generates informative names for each sequence and provides "design cards" detailing how each sequence was generated. Visualization methods let users quickly audit library content and inspect the underlying graph. PoolParty thus transforms oligo pool design from a tedious task requiring custom functions and scripts into a structured, transparent, and reproducible process. Availability and implementation: PoolParty is freely available and can be installed using pip. It is compatible with Python ≥ 3.10. Documentation is provided at https://poolparty.readthedocs.io; source code is available at https://github.com/jbkinney/poolparty-statetracker. A static release is archived at DOI 10.5281/zenodo.19445098.
Sapci, A. O. B.; Arasti, S.; Braun, E.; Mirarab, S.
Motivation: Phylogenetic analyses of entire genomes (phylogenomics) have revealed abundant heterogeneity of evolutionary histories. While much has been done to model this heterogeneity and to infer species trees despite it, the current toolkit has a limitation. Most methods assume that gene trees across the genome differ but are all sampled from the same distribution, defined by models such as the multi-species coalescent (MSC), and parametrized consistently across the genome. Empirical data strongly suggest this assumption is often violated because the species tree, its parameters, or the process generating the gene trees can all change across the genome. Errors in the data can further compound this heterogeneity. Results: To address this challenge, we define the problem of detecting which segments of the genome are inconsistent with a putative species tree, even after allowing discordance according to the MSC. We model gene trees not as a set, but rather as a series (a realization of a stochastic process) along genomic positions. We propose a Hidden Markov Model (HMM) approach applied to quartet statistics measured from gene trees and tie the model to the MSC using simulations. The combined use of these three ideas leads to a scalable method called Phlag. On simulated and real data, we show that Phlag can detect many cases of change in underlying evolutionary processes, including reduced recombination rates, population size changes, and admixture, all using the same algorithm. Availability and Implementation: Phlag is available at github.com/bo1929/phlag. All results and scripts can be found at github.com/bo1929/shared.phlag.
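To make the idea of treating gene trees as a series along the genome concrete, the sketch below runs Viterbi decoding on a two-state HMM over per-locus observations. It is only an illustration under strong assumptions: the state names, the discretized "agree"/"conflict" observations, and every probability are hypothetical, and Phlag's actual quartet statistics and MSC-calibrated parameters are not reproduced here.

```python
from math import log


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state path for a sequence of observations."""
    # Log-probabilities avoid underflow on long genomic series.
    score = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + log(trans_p[p][s]))
            cur[s] = prev[best] + log(trans_p[best][s]) + log(emit_p[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical toy model: loci either agree or conflict with the species tree.
STATES = ("consistent", "inconsistent")
START = {"consistent": 0.9, "inconsistent": 0.1}
TRANS = {
    "consistent": {"consistent": 0.9, "inconsistent": 0.1},
    "inconsistent": {"consistent": 0.1, "inconsistent": 0.9},
}
EMIT = {
    "consistent": {"agree": 0.9, "conflict": 0.1},
    "inconsistent": {"agree": 0.2, "conflict": 0.8},
}

if __name__ == "__main__":
    loci = ["agree"] * 3 + ["conflict"] * 4 + ["agree"] * 3
    print(viterbi(loci, STATES, START, TRANS, EMIT))
```

The sticky transition probabilities are what let a run of conflicting loci be flagged as one segment rather than as isolated outliers, which is the practical benefit of modeling position along the genome.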
Wu, H.; Medvedev, P.
Estimating mutation rates between evolutionarily related sequences is a central problem in molecular evolution. Due to the rapid expansion of datasets, modern methods avoid costly alignment and instead compare sketches of the sequences' constituent k-mer sets. While these methods perform well on many sequences, they are not robust to highly repetitive sequences such as centromeres. In this paper, we present three new estimators that are robust to the presence of repeats. The estimators are applicable in different settings, based on whether they need count information from zero, one, or both of the sequences. We evaluate our estimators empirically using highly repetitive alpha satellite sequences. Our estimators each perform best in their class, and our strongest estimator outperforms all other tested estimators. Our software is open-source and freely available at https://github.com/medvedevgroup/Accurate_repeat-aware_kmer_based_estimator.
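As background for the problem stated above, a common alignment-free baseline assumes that each base mutates independently with rate r, so a k-mer survives unchanged with probability (1 - r)^k, and r can be estimated from the fraction of k-mers that are still shared. The sketch below implements that simple baseline without sketching; it is not one of the three repeat-aware estimators from the abstract, and the function names are assumptions.

```python
def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def naive_mutation_rate(a, b, k):
    """Estimate the per-base mutation rate from the fraction of a's k-mers found in b.

    Assumes independent substitutions and no repeats, so that
    shared_fraction is approximately (1 - r) ** k.
    """
    kmers_a = kmers(a, k)
    if not kmers_a:
        raise ValueError("sequence a is shorter than k")
    set_b = set(kmers(b, k))
    shared = sum(1 for km in kmers_a if km in set_b)
    frac = shared / len(kmers_a)
    return 1.0 - frac ** (1.0 / k)


if __name__ == "__main__":
    original = "ACGTTGCAACGGTCATTAGC"
    mutated = "ACGTTGCAACGCTCATTAGC"  # one substitution
    print(naive_mutation_rate(original, mutated, 4))
```

In a repetitive sequence a mutated k-mer can still match another copy elsewhere, which inflates the shared fraction and biases this estimate downward; this is one way to see why set-based estimators of this kind struggle on repeats.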
Kanchwala, M. S.; Xing, C.; Xuan, Z.
Genome-wide association studies (GWAS) have significantly advanced our understanding of complex traits and diseases, but their interpretive power remains limited due to challenges in identifying causal genes and pathways. Integrating GWAS with multi-omics data, such as gene expression, protein-protein interactions, and gene-pathway networks, has the potential to enhance biological insights and improve gene prioritization. To realize this potential, we developed the GWAS & Multi-omics Integration Pipeline (GMIP), a flexible and scalable framework that incorporates widely used tools such as PoPS, MAGMA, and benchmarker to enrich GWAS findings. However, PoPS suffers from multicollinearity in its features, which can impact performance. To overcome this, we introduce GMIP-PLSR, an extension of GMIP that uses Partial Least Squares Regression (PLSR) to manage multicollinearity effectively. We applied GMIP-PLSR across multiple GWAS datasets, demonstrating superior performance over PoPS in most cases. In a case study on NAFLD, GMIP-PLSR, using features derived from both disease-specific scRNA-seq and general PoPS features, identified gene sets with higher heritability and stronger enrichment in known NAFLD pathways, confirming its ability to enhance GWAS findings. Built on Nextflow, GMIP is computationally efficient, adaptable to diverse research environments, and provides a robust solution for gene reprioritization in post-GWAS analyses. GMIP-PLSR is available at https://github.com/mohammedmsk/GMIP.
Chu, G.; Schmidt, H.; Raphael, B.
Motivation: Recent dynamic lineage tracing technologies use genome editing to induce heritable mutations, or edits, that accumulate across successive cell divisions. These edits are measured using single-cell sequencing or imaging, providing data to reconstruct cell lineages at single-cell resolution. Current computational approaches to infer cell lineage trees, or phylogenies, from these data perform two separate steps: (1) identify each cell's edits (genotype) from the raw sequencing or imaging data; (2) infer a cell lineage tree from the cell genotypes. However, genotyping cells is an inexact process, and genotype errors can yield an inaccurate lineage tree. For example, using fluorescence-based imaging to measure edits results in a high fraction (approximately 25-50%) of uncertain or erroneous genotypes. Results: We introduce Lineage Analysis via Maximum Likelihood with PRobabilistic Observations (LAML-Pro), an algorithm that jointly infers cell genotypes and a cell lineage tree. LAML-Pro is based on the Probabilistic Mixed-type Missing Observation (PMMO) model, which we derive to describe both the genome editing and genotype observation processes. LAML-Pro constructs lineage trees from thousands of cells in under an hour by leveraging the sparsity of transitions under the PMMO model. On simulated data, we demonstrate that LAML-Pro corrects genotype errors and infers substantially more accurate trees than existing methods, which are vulnerable to genotype errors. Applied to data from two recent imaging-based lineage tracing systems, LAML-Pro reduces genotype errors by 5-fold and produces more spatially coherent lineage trees compared to existing methods. Availability and Implementation: LAML-Pro is freely available at: github.com/raphael-group/LAML-Pro.
Freese, N. H.; Raveendran, K.; Sirigineedi, J. S.; Chinta, U. L.; Badzuh, P.; Marne, O.; Shetty, C.; Naylor, I.; Jagarapu, S.; Loraine, A.
Summary: Track Hub Quickload Translator is a web application that interconverts University of California Santa Cruz (UCSC) Genome Browser track hub and Integrated Genome Browser (IGB) data repository formats by translating the track hub or Quickload configuration files to the other genome browser's required format. This new work enables researchers to work with tens of thousands of published genome assemblies for the first time using either browser. Availability and Implementation: Track Hub Quickload Translator is implemented using Python 3 and freely available to use at translate.bioviz.org. Integrated Genome Browser is available from BioViz.org. Source code for Track Hub Quickload Translator, GenArk Genomes, and the Integrated Genome Browser is available from github.org/lorainelab. Contact: aloraine@charlotte.edu
Heydarabadipour, A.; Smith, L. P.; Hellerstein, J. L.; Sauro, H. M.
Antimony is a human-readable language for defining and sharing models developed by the systems biology community. It enables scientists to describe biochemical networks with a simple syntax, while supporting seamless conversion to and from the Systems Biology Markup Language (SBML) community standard. Since Antimony's original release, both SBML and modeling practices have evolved significantly, creating a need to update Antimony to maintain its standards compliance and practical relevance. In this paper, we introduce Antimony 3, a comprehensive update that formalizes its cumulative improvements and extends its support for SBML Level 3 Core and the Flux Balance Constraints (FBC), Distributions, Layout, and Render packages. Antimony 3 enables model specifications that combine kinetic reactions with flux balance analysis, represent uncertainty using probability distributions, add biological context through annotations, and define publication-ready visualizations, all within a unified plain-text format. Antimony 3 is delivered as a lightweight C/C++ library with a stable C API. It is available through official bindings for Python, Julia, and JavaScript/WebAssembly, as well as a cross-platform desktop GUI, which enables straightforward use across scripting environments, desktop applications, and browser-based tools. Antimony 3 is released as open-source software under the BSD 3-Clause License and is available at https://github.com/sys-bio/antimony.
Author Summary: Biological models are typically stored in standardized formats that ensure compatibility across different software tools, but these formats rely on verbose, machine-readable syntax that is difficult for humans to write or inspect directly. Antimony addresses this challenge by providing an intuitive, text-based language for defining biological models that can be automatically converted to and from the Systems Biology Markup Language (SBML).
Since Antimony's original release in 2009, the SBML standard and common modeling workflows have expanded significantly. We developed Antimony 3 to support these advances, enabling researchers to write a single human-readable text file that defines reaction networks, constraint-based objectives, uncertainty in parameters and initial conditions, semantic annotations linking to biological databases, and model diagrams. Antimony 3 is provided as open-source software with broad support across computational environments, making it accessible to researchers in a wide range of workflows.
Andrews, B.; Ranganathan, R.
Motivation: DNA barcodes are commonly used as a tool to distinguish genuine mutations from sequencing errors in sequencing-based assays. In the presence of indel errors, utilizing barcodes requires accurate alignment of the raw reads to distinguish genuine indels from indel errors. Existing strategies to do this generally rely on aligners built for homology comparison and do not fully utilize quality scores. We reasoned that developing an aligner purpose-built for error correction could yield higher quality barcode-sequence maps. Results: Here, we present BCAR, a fast barcode-sequence mapper for correcting sequencing errors. BCAR considers all of the evidence for each base call at each position both during alignment and during final consensus generation. BCAR creates high-accuracy barcode-sequence maps from simulated reads across a broad range of error rates and read lengths, outperforming existing methods. We apply BCAR to two experimental datasets, where it generates high-quality barcode-sequence maps. Availability and implementation: BCAR source code, documentation and test data are available from: https://github.com/dry-brews/BCAR
O'Brien, A.; Lagos, C.; Fernandez, K.; Ojeda, B.; Parada, P.
As long-read amplicon sequencing becomes routine for fungal metabarcoding, species-level abundance estimation from ITS amplicons remains limited by naive best-hit classification, which misattributes reads among closely related species sharing similar ITS sequences and fragments abundance across redundant database entries. Here we present EMITS, a Rust-based tool that applies expectation-maximization (EM) to iteratively resolve ambiguous read-to-reference mappings from minimap2 alignments against the UNITE database, producing probabilistic species-level abundance estimates. EMITS includes platform-specific presets for Oxford Nanopore and PacBio chemistries and performs taxonomic aggregation across UNITE accessions. We validated EMITS using three complementary approaches: controlled simulations with tunable alignment noise, an Oxford Nanopore mock community of 10 fungal species with known composition, and a synthetic community of 21 species derived from UNITE reference sequences. In simulations, EM reduced L1 error by 80-92% compared to naive counting under realistic noise conditions. On the ONT mock community, EM correctly resolved within-genus species assignments where naive counting misattributed reads (e.g., Trichophyton mentagrophytes vs. T. simii; Penicillium species) and consolidated abundance across redundant database accessions. On the synthetic community, EM reduced false positive abundance by 54% and improved overall accuracy by 13.4%. Together with ITSxRust [O'Brien et al., 2026] for upstream ITS extraction, EMITS provides a complete high-performance pipeline for long-read fungal amplicon profiling.
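The EM step described above can be illustrated with a generic abundance-estimation loop: reads that map to several references are split in proportion to the current abundance estimates, and the abundances are then re-estimated from those fractional assignments. This is a minimal sketch of the general technique, written in Python rather than Rust; the function name, the input layout, and the toy likelihoods are assumptions, and EMITS's presets and taxonomic aggregation are not modeled.

```python
def em_abundance(read_hits, n_iter=100):
    """Estimate relative abundances from ambiguous read-to-reference hits.

    read_hits: one dict per read, mapping reference name -> alignment likelihood.
    Returns a dict mapping reference name -> estimated relative abundance.
    """
    refs = sorted({ref for hits in read_hits for ref in hits})
    theta = {ref: 1.0 / len(refs) for ref in refs}  # uniform start
    for _ in range(n_iter):
        counts = {ref: 0.0 for ref in refs}
        # E-step: split each read across its candidate references.
        for hits in read_hits:
            weights = {ref: theta[ref] * lik for ref, lik in hits.items()}
            norm = sum(weights.values())
            if norm == 0.0:
                continue  # read carries no usable evidence
            for ref, w in weights.items():
                counts[ref] += w / norm
        # M-step: abundances are the normalized expected counts.
        total = sum(counts.values())
        if total == 0.0:
            break
        theta = {ref: c / total for ref, c in counts.items()}
    return theta


if __name__ == "__main__":
    # Eight reads hit only species A; two reads hit A and B equally well.
    reads = [{"A": 1.0}] * 8 + [{"A": 1.0, "B": 1.0}] * 2
    print(em_abundance(reads))
```

In the toy input, naive best-hit counting with arbitrary tie-breaking could credit species B with the two ambiguous reads, whereas the EM loop shifts them toward A because only A has uniquely mapped support.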
Broster, J. H.; Popovic, B.; Kondinskaia, D.; Deane, C. M.; Imrie, F.
Molecular docking aims to predict the binding conformation of a small molecule to its protein target. Recent work has proposed diffusion models for this task, from rigid-body docking that diffuses over ligand degrees of freedom to co-folding approaches that jointly generate protein structure and ligand pose. However, diffusion-based docking models have been shown to frequently produce physically implausible poses and fail to consistently recover key protein-ligand interactions. To address this, we introduce a reinforcement learning framework for training diffusion-based docking models directly on non-differentiable objectives. Fine-tuning DiffDock-Pocket for physical validity with our approach substantially increases the number of generated poses that are physically valid and interaction-preserving, with no increase in inference-time compute. Importantly, this comes without sacrificing structural accuracy; in fact, our approach increases the proportion of structures with near-native poses. These effects are most pronounced for protein targets that are dissimilar to the training data. Our fine-tuned DiffDock-Pocket model outperforms both classical docking algorithms and machine learning-based approaches on the PoseBusters set. Our results demonstrate that reinforcement learning can teach diffusion-based docking models to better respect physical constraints and recover key interactions, without the requirement to rely on inference-time corrections.
Iotchkova, V.; Weale, M. E.
Multi-trait colocalisation is a vital tool to make sense of the large amounts of GWAS data available on platforms like Mystra. It identifies genetic association signals that cluster together, allowing us to infer which gene might be causal for a trait and also which constellation of biological effects might be affected by modulating that gene. Multi-trait colocalisation is a challenging computational problem. Here, we introduce MystraColoc, a Bayesian algorithm for multi-trait colocalisation that works across hundreds or even thousands of GWAS datasets. We illustrate its power both via a worked example at the HDAC9-TWIST1 locus, and via a simulation study that demonstrates its superior clustering performance compared to alternative methods.
Kawato, S.
Motivation: Generating graphical diagrams of microbial and organellar genomes is a common and essential task in bioinformatics. Existing tools often present a trade-off: powerful programming libraries require coding skills, while graphical applications require server processing or local installation with complex dependencies. This highlights the need for a tool that offers both programmatic control for batch processing and graphical accessibility for ease of use. Results: To fill this gap, I developed gbdraw, a web application that generates circular and linear genome diagrams from self-contained GenBank or DDBJ files or combinations of GFF3 annotation and FASTA sequence files. Its core functions include visualizing annotated features, plotting GC content/skew tracks, and optionally generating pairwise sequence comparisons for comparative genomics. It is available as both a GUI web application and a command-line utility. Unlike existing web-based tools that require data upload to a remote server, gbdraw operates entirely within the user's web browser. This serverless architecture ensures that sensitive sequence data never leaves the local machine, providing a secure environment for visualizing unpublished genomic data. Availability and Implementation: gbdraw is implemented in Python 3 (version 3.10+) and is freely available under the MIT license. The web app is available at https://gbdraw.app/. Source code and documentation are available at https://github.com/satoshikawato/gbdraw. The local version can be installed from the Bioconda channel using a conda-compatible package manager.
Sanjaya, P.; Pitkänen, E.
Motivation: Deep neural networks have proven effective in classifying tumour types using next-generation sequencing data. However, developing transferable models that work across heterogeneous operating environments remains challenging due to differences in cohort compositions and data generation protocols, privacy concerns, and limited computational capabilities. Results: We introduce muat, a transformer-based software for tumour classification using somatic variant data from whole-genome (WGS) and whole-exome sequencing (WES). Building on previously developed MuAt and MuAt2 models, we distribute the software via Docker containers and Bioconda for deployment in high-performance computing (HPC) systems and Secure Processing Environments (SPEs). Using a downloadable MuAt checkpoint, we reproduce the performance reported in the original study on whole genome (PCAWG; 89% accuracy in histological tumour typing) and exome sequencing data (TCGA; 64% accuracy). Cross-cohort evaluation in the Genomics England SPE achieved 81% accuracy without retraining and 89% following fine-tuning. As a demonstration of the software's adaptability, we also deployed muat within the iCAN Digital Precision Cancer Medicine Flagship's SPE and integrated it into a Nextflow-managed workflow. Availability and implementation: muat is available through conda (www.anaconda.org/bioconda/muat) and GitHub (https://github.com/primasanjaya/muat), under the Apache 2.0 License. Contact: prima.sanjaya@helsinki.fi, esa.pitkanen@helsinki.fi; website: mlbiomed.net
Strassburg, C.; Pitlor, D.; Singhi, A. D.; Gottschalk, R.; Uttam, S.
Summary: Mitochondrial transcript abundance is a standard quality control metric in single-cell RNA sequencing, but fixed percentage thresholds fail to account for the substantial variation in mitochondrial content across cell types and tissues, risking both retention of compromised cells and exclusion of transcriptionally active viable cell populations. We present MitoChontrol, a cell-type-aware probabilistic framework for mitochondrial quality control that models the mitochondrial transcript fraction within transcriptionally coherent clusters as a Gaussian mixture distribution. Compromised-cell components are identified from the upper tail of each cluster-specific distribution, and filtering thresholds are defined as the point at which the posterior probability of cellular compromise exceeds a user-defined confidence value. Applied to controlled perturbation experiments and a pancreatic ductal adenocarcinoma single-cell dataset, MitoChontrol selectively removes transcriptionally compromised cells while preserving biologically elevated but viable populations, outperforming fixed-threshold and outlier-based approaches. Availability and Implementation: MitoChontrol is implemented in Python and integrates directly with AnnData-based workflows. It is freely available under the GNU General Public License v3 (GPL-3.0) at: https://github.com/uttamLab/MitoChontrol (DOI: https://doi.org/10.5281/zenodo.19423054)
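The thresholding rule described above can be sketched for a single cluster with a two-component Gaussian mixture whose parameters are assumed to be already fitted. The code scans upward from the viable component's mean and returns the first mitochondrial fraction at which the posterior probability of the compromised component reaches the chosen confidence. All function names and parameter values are hypothetical; this is not MitoChontrol's API, and the fitting step is omitted.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2.0 * pi))


def posterior_compromised(x, viable, compromised):
    """Posterior probability of the compromised component at x.

    Each component is a (weight, mean, sd) tuple.
    """
    v = viable[0] * normal_pdf(x, viable[1], viable[2])
    c = compromised[0] * normal_pdf(x, compromised[1], compromised[2])
    total = v + c
    return c / total if total > 0.0 else 0.0


def mito_threshold(viable, compromised, confidence=0.9, step=1e-4):
    """Smallest fraction above the viable mean where the posterior >= confidence."""
    x = viable[1]
    while x <= 1.0:  # mitochondrial fraction cannot exceed 1
        if posterior_compromised(x, viable, compromised) >= confidence:
            return x
        x += step
    return None  # no cutoff reaches the requested confidence


if __name__ == "__main__":
    # Hypothetical cluster: 90% viable cells near 5% mito, 10% compromised near 25%.
    viable = (0.9, 0.05, 0.02)
    compromised = (0.1, 0.25, 0.08)
    print(mito_threshold(viable, compromised, confidence=0.9))
```

Because the mixture is fitted per cluster, a cell type with naturally high mitochondrial content gets a viable component with a higher mean and therefore a higher cutoff, which a single fixed percentage cannot provide.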
Zhou, Z.; Buchan, D. W.
Protein function annotation requires integrating diverse biological signals, yet existing multimodal methods often struggle with missing inputs and redundant information. We present Hybrid Gated Fusion, a multimodal architecture that combines intrinsic protein features, including sequence and structure, with extrinsic functional context from text and interaction networks. Rather than weighting all modalities equally, the model uses bilinear gating to assess both the informativeness of each modality and its agreement with the others, while auxiliary supervision reduces modality dominance and preserves useful signal in weaker modalities. On the CAFA3 benchmark, a single Hybrid Gated Fusion model achieves state-of-the-art performance in Biological Process (Fmax = 0.601) and Cellular Component (Fmax = 0.706), while remaining competitive in Molecular Function (Fmax = 0.702). Analysis of the learned gates shows that interaction networks and text often provide complementary functional signals, whereas structural features are down-weighted when redundant but remain valuable under sparse-input settings. These results establish Hybrid Gated Fusion as a robust and scalable framework for genome-scale protein function annotation. Availability and implementation: Source code and reproduction scripts are freely available at https://github.com/psipred/PFP. Pre-computed embeddings, data splits, and model checkpoints are deposited at https://doi.org/10.5281/zenodo.19498341.
Iversen, P.; Renard, B. Y.; Baum, K.
Motivation: Machine learning models that predict drug response from cancer cell line omics profiles could advance precision oncology, yet their utility is limited by heterogeneous prediction quality and silent failures under distribution shifts. Uncertainty quantification can address these challenges, but systematic evaluation of methods for this domain is lacking. Results: We benchmark seven uncertainty-aware models for drug response prediction, comparing epistemic uncertainty via ensemble disagreement, aleatoric uncertainty via distributional modeling, and their combination. Gaussian neural network ensembles reliably flag out-of-distribution inputs and achieve a 64% reduction in mean squared error when filtering to the 10% most confident predictions. We discuss how probabilistic predictions can enable drug candidate analyses that account for therapeutically relevant response ranges. Through uncertainty attribution, we identify transcriptomic signatures of unpredictability, i.e., genes associated with prediction uncertainty. We also demonstrate that uncertainty-guided active learning can prioritize informative experiments. Availability and Implementation: The code and data are available at https://github.com/PascalIversen/LUDRP and https://zenodo.org/records/19219091. Contact: katharina.baum@fu-berlin.de
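The two uncertainty types compared above are often combined in a Gaussian ensemble as follows: disagreement between the members' predicted means gives the epistemic part, and the average of their predicted variances gives the aleatoric part. The sketch below shows that standard decomposition and the confidence-filtering evaluation mentioned in the results. The function names and numbers are illustrative assumptions, not the benchmark's code.

```python
from statistics import fmean, pvariance


def decompose(member_means, member_vars):
    """Combine one sample's predictions from all ensemble members.

    Returns (prediction, epistemic, aleatoric):
    epistemic = variance of the members' means (ensemble disagreement),
    aleatoric = mean of the members' predicted variances.
    """
    prediction = fmean(member_means)
    epistemic = pvariance(member_means)
    aleatoric = fmean(member_vars)
    return prediction, epistemic, aleatoric


def mse_most_confident(preds, truths, uncertainties, keep_fraction):
    """Mean squared error on the keep_fraction of samples with lowest uncertainty."""
    order = sorted(range(len(preds)), key=lambda i: uncertainties[i])
    n_keep = max(1, int(len(preds) * keep_fraction))
    kept = order[:n_keep]
    return fmean((preds[i] - truths[i]) ** 2 for i in kept)


if __name__ == "__main__":
    # Two ensemble members predicting one drug response value.
    print(decompose([1.0, 3.0], [0.5, 1.5]))
    # Four predictions; keep the more confident half.
    preds = [1.0, 2.0, 10.0, 4.0]
    truths = [1.0, 2.5, 0.0, 4.0]
    total_unc = [0.1, 0.2, 5.0, 0.3]
    print(mse_most_confident(preds, truths, total_unc, 0.5))
```

Comparing the filtered error with the error on all samples (`keep_fraction=1.0`) gives the kind of relative reduction the abstract reports for the 10% most confident predictions.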
Cimesa, M.; Sokic, A.
Summary: The rapid accumulation of cancer genomic data across repositories such as ClinVar, cBioPortal, and the TCGA Pan-Cancer Atlas has created an urgent need for integrated tools that allow researchers to explore mutations in their structural and clinical context without requiring specialized bioinformatics expertise. Here we present OncoMORPHIA, a free, browser-based web platform that unifies 3D protein structure visualization, clinical variant annotation, drug-target interaction mapping, survival analysis, mutational signature decomposition, and AI-powered interpretation within a single interface. OncoMORPHIA automatically retrieves and integrates data from ten public databases, maps missense mutations onto experimentally determined or AlphaFold-predicted protein structures, computes mutation density heatmaps with Gaussian smoothing, and renders interactive visualizations including lollipop plots, Kaplan-Meier survival curves, protein-protein interaction networks, and pan-cancer tissue distribution charts. The platform supports 45 major cancer driver genes with extensibility to any human gene with available structural data. Availability: OncoMORPHIA is freely available at https://oncomorphia.com. Source code is available upon request. The platform requires no installation, no account registration, and no API keys for core functionality.