Automated GenePy Gene-Burden Computation via a Reproducible Nextflow Workflow Integrated with the Genomics England (GEL) Lifebit Platform

Nazari, I.; Ennis, S.; Ashton, J.; Cheng, G.

2026-05-24 genetic and genomic medicine
10.64898/2026.05.22.26353863 medRxiv

Interpretation of rare-disease genomes remains constrained by variant-centric analytical frameworks that insufficiently capture the cumulative impact of multiple variants within a gene. GenePy provides an individual-level, gene-based burden metric that integrates variant consequence, allele frequency, and zygosity into a unified quantitative score, enabling a transition from discrete variant annotation to aggregated gene-level interpretation. In the context of Genomics England, this formulation supports a panel-agnostic, genotype-to-phenotype diagnostic strategy for unresolved monogenic disorders by prioritising genes with elevated mutational burden per individual. Here, we present a fully automated, containerised GenePy workflow deployed through Nextflow and integrated within the Genomics England (GEL) Research Environment via the Lifebit CloudOS platform. This implementation provides scalable, secure, and governance-compliant computation of gene-level burden scores across population-scale cohorts. The workflow harmonises variant annotation, quality control, and chunked data aggregation within modular, reproducible processes designed for high-throughput execution on cloud-native infrastructure. By enabling robust, portable, and auditable gene-level scoring across large rare-disease sequencing datasets, this framework enhances analytical resolution and supports downstream statistical prioritisation, integrative phenotype matching, and hypothesis generation within genotype-to-phenotype diagnostic workflows.
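
To make the idea of an individual-level, gene-based burden score concrete, the following is a minimal sketch of how variant consequence, allele frequency, and zygosity might be combined into one number per gene per individual. The abstract does not state the scoring formula, so everything here is an assumption for illustration: the function name `gene_burden`, the tuple layout, the negative-log-frequency weighting, and the `min_freq` floor are hypothetical and should not be read as the workflow's actual implementation.

```python
import math


def gene_burden(variants, min_freq=1e-5):
    """Toy per-individual, per-gene burden score (illustrative assumption only).

    variants: iterable of (weight, alt_freq, alt_count) tuples, where
      weight    is a deleteriousness weight in [0, 1] (stands in for consequence),
      alt_freq  is the population frequency of the alternate allele,
      alt_count is 1 for heterozygous or 2 for homozygous alternate.
    """
    score = 0.0
    for weight, alt_freq, alt_count in variants:
        # Clamp the frequency so the logarithm is always defined.
        alt_freq = min(max(alt_freq, min_freq), 1.0 - min_freq)
        ref_freq = 1.0 - alt_freq
        # Zygosity decides which two allele frequencies are multiplied:
        # alt * alt for homozygous, alt * ref for heterozygous.
        second = alt_freq if alt_count == 2 else ref_freq
        # Rarer and more deleterious variants contribute more to the score.
        score += -weight * math.log10(alt_freq * second)
    return score


if __name__ == "__main__":
    het = [(1.0, 0.01, 1)]
    hom = [(1.0, 0.01, 2)]
    print(gene_burden(het), gene_burden(hom))
```

Under these assumptions a homozygous rare variant scores higher than the same variant in heterozygous form, and a gene carrying several rare variants accumulates a larger total than a gene carrying one, which is the aggregation behaviour the abstract describes.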

Matching journals

The top 3 journals account for 50% of the predicted probability mass.