CBIcall: a configuration-driven framework for variant calling in large sequencing cohorts
Rueda, M.; Fernandez Orth, D.; Gut, I. G.
Motivation: Variant calling for next-generation sequencing (NGS) data relies on a diverse ecosystem of tools and workflows. Large-scale collaborative studies increasingly adopt federated analysis, where each institution processes sensitive data locally using standardized pipelines. Deploying identical pipelines across multiple centers remains challenging because heterogeneous software environments and computing policies can cause workflow divergence and inconsistent results.
Results: We developed CBIcall, a workflow-agnostic, configuration-driven framework that runs standardized variant-calling pipelines from raw FASTQ files to analysis-ready VCFs using a single YAML file. An execution driver validates user parameters, enforces compatibility across pipelines, analysis modes, workflow backends, genome builds, and tool versions, and records structured provenance for each run, ensuring consistent and reproducible pipeline execution across computing environments. CBIcall dispatches validated workflows through Bash or Snakemake backends and provides production-ready pipelines for germline WES, WGS (single-sample or cohort joint genotyping following GATK Best Practices), and mitochondrial DNA analysis. We validated CBIcall on public datasets and deployed it in the EU HEREDITARY project, processing 1,111 samples with both WES and mtDNA pipelines on an institutional HPC system, demonstrating its suitability for reproducible cohort-scale genomic analyses.
Availability and implementation: CBIcall is open source (GPLv3) and distributed with ready-to-run pipelines; full dependency and installation documentation is available at https://github.com/CNAG-Biomedical-Informatics/cbicall.
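The validate-then-dispatch pattern described in the Results can be illustrated with a minimal Python sketch. It is not CBIcall's code: the key names (`pipeline`, `mode`, `backend`, `genome`), the allowed values, the single incompatibility rule, and the functions `validate`, `provenance_record`, and `dispatch` are all hypothetical, chosen only to show how a driver might reject an invalid configuration before any workflow starts and record provenance for a valid one. The configuration is given as an already-parsed dictionary, standing in for the contents of a YAML file.

```python
from datetime import datetime, timezone

# Hypothetical allowed values per key; NOT CBIcall's actual schema.
ALLOWED = {
    "pipeline": {"wes", "wgs", "mit"},
    "mode": {"single", "cohort"},
    "backend": {"bash", "snakemake"},
    "genome": {"b37", "hg38"},
}

# Hypothetical cross-parameter rule, assumed purely for illustration.
INCOMPATIBLE = [
    ({"pipeline": "mit", "mode": "cohort"},
     "mtDNA pipeline is assumed single-sample only in this sketch"),
]


def validate(config):
    """Return a list of error messages; an empty list means the config is valid."""
    errors = []
    for key, allowed in ALLOWED.items():
        value = config.get(key)
        if value is None:
            errors.append(f"missing required key: {key}")
        elif value not in allowed:
            errors.append(f"unsupported {key}: {value!r}")
    for combo, reason in INCOMPATIBLE:
        if all(config.get(k) == v for k, v in combo.items()):
            errors.append(f"incompatible combination {combo}: {reason}")
    return errors


def provenance_record(config):
    """Build a structured record of what was validated and when (UTC)."""
    return {
        "config": dict(config),
        "validated_at": datetime.now(timezone.utc).isoformat(),
        "status": "validated",
    }


def dispatch(config):
    """Validate first; only a valid config yields a provenance record."""
    errors = validate(config)
    if errors:
        raise ValueError("; ".join(errors))
    # A real driver would launch the chosen backend here; this sketch
    # only returns the record that would accompany the run.
    return provenance_record(config)
```

Calling `dispatch({"pipeline": "wes", "mode": "single", "backend": "bash", "genome": "hg38"})` returns a provenance dictionary, while a configuration with a missing key or a disallowed combination raises `ValueError` before anything runs. Failing early in this way is one plausible means of keeping runs consistent across centers.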
Matching journals
The top 4 journals account for 50% of the predicted probability mass.