ALPINE: A Scalable Pipeline for Comprehensive Classification of Gene-Editing Outcomes from Long-Read Amplicon Sequencing

Chen, Y.; Gao, X.-H.; Vichas, A.; Wang, J.; Golhar, R.; Neuhaus, I.

2026-03-30 bioinformatics
10.64898/2026.03.27.714831 bioRxiv
CRISPR genome editing has enabled precise genetic modification for gene and cell therapies, but edits often produce heterogeneous on-target outcomes, including homology-directed repair (HDR) knock-ins, DNA repair template integrations, and structural variants. Existing tools are frequently limited to short reads or lack viral vector-specific integration categories needed for therapeutic development. Here, we present ALPINE (Amplicon Long-read Pipeline for INtegration Evaluation), a scalable and reproducible pipeline for classifying and quantifying gene-editing outcomes from long-read amplicon sequencing. ALPINE classifies reads into 10+ categories, including DNA repair vector integration subtypes, and performs variant calling near the gene-edited site with batch, multi-sample reporting. Uniquely, ALPINE can distinguish between cells treated with multiple DNA repair vectors and identify distinct molecular features, such as inverted terminal repeats (ITRs), enabling comprehensive characterization of complex gene-editing outcomes. Benchmarking on simulated datasets showed high accuracy, and application to edited T cell samples demonstrated comprehensive gene-editing outcome profiling. Supplementary data are available online.

Availability: ALPINE is implemented in Python and distributed as Docker containers with Common Workflow Language (CWL) support for cloud deployment. The pipeline is available under the MIT license at https://github.com/Maggi-Chen/ALPINE.

Matching journals

The top 5 journals are the smallest ranked set that accounts for at least 50% of the predicted probability mass (57.7% combined).

1. Nature Biotechnology: 14.6% (147 papers in training set, Top 0.3%)
2. Bioinformatics: 12.6% (1061 papers in training set, Top 2%)
3. Nature Methods: 12.4% (336 papers in training set, Top 0.7%)
4. Nucleic Acids Research: 10.0% (1128 papers in training set, Top 2%)
5. Nature Communications: 8.1% (4913 papers in training set, Top 24%)

50% of probability mass above

6. Genome Biology: 6.7% (555 papers in training set, Top 0.9%)
7. Genome Medicine: 6.3% (154 papers in training set, Top 1%)
8. Cell Reports Methods: 3.9% (141 papers in training set, Top 0.7%)
9. Nature: 3.0% (575 papers in training set, Top 8%)
10. Genome Research: 2.1% (409 papers in training set, Top 2%)
11. Briefings in Bioinformatics: 1.8% (326 papers in training set, Top 4%)
12. Cell Systems: 1.7% (167 papers in training set, Top 7%)
13. Nature Genetics: 1.6% (240 papers in training set, Top 5%)
14. Genomics, Proteomics & Bioinformatics: 1.1% (171 papers in training set, Top 5%)
15. Nature Biomedical Engineering: 1.1% (42 papers in training set, Top 1%)
16. NAR Genomics and Bioinformatics: 0.9% (214 papers in training set, Top 3%)
17. Bioinformatics Advances: 0.9% (184 papers in training set, Top 4%)
18. Cell Genomics: 0.8% (162 papers in training set, Top 6%)
19. Advanced Science: 0.7% (249 papers in training set, Top 19%)
20. PLOS Computational Biology: 0.7% (1633 papers in training set, Top 25%)
21. Nature Computational Science: 0.7% (50 papers in training set, Top 2%)
22. Molecular Systems Biology: 0.6% (142 papers in training set, Top 2%)
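
The 50% cutoff in the ranking can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch, not the site's own method: the function name `journals_to_reach` is hypothetical, and the values are simply the percentages listed above.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1-22, copied from the list above.
probs = [14.6, 12.6, 12.4, 10.0, 8.1, 6.7, 6.3, 3.9, 3.0, 2.1, 1.8,
         1.7, 1.6, 1.1, 1.1, 0.9, 0.9, 0.8, 0.7, 0.7, 0.7, 0.6]


def journals_to_reach(ranked_probs, threshold):
    """Return (count, cumulative sum) for the shortest ranked prefix
    whose sum reaches the threshold, or None if it is never reached.

    Hypothetical helper for illustration only.
    """
    for count, total in enumerate(accumulate(ranked_probs), start=1):
        if total >= threshold:
            return count, total
    return None


count, total = journals_to_reach(probs, 50.0)
print(count, round(total, 1))
```

The first four journals sum to 49.6%, just short of the threshold, so the fifth is needed to cross 50%.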