KaryoScope: rapid, alignment-free sequence annotation for the pangenome era

Ranallo-Benavidez, T. R.; Chen, Y.-A.; Potapova, T. A.; Alanko, J. N.; Loucks, H.; Lucas, J.; Human Pangenome Reference Consortium; Guarracino, A.; Puglisi, S. J.; Marchet, C.; Miga, K. H.; Gerton, J. L.; Barthel, F. P.

2026-05-17 bioinformatics
10.64898/2026.05.15.725544 bioRxiv
The pangenome era is producing long-read sequencing data and complete genome assemblies (1-3) at a pace that current annotation methods cannot match. Existing tools were each built for a single feature class (repeats, centromeric satellites, or genes) and falter precisely where the genome is most variable and harbours clinically important variation: the centromeres, subtelomeres, and acrocentric short arms. Here we present KaryoScope, an alignment-free method to annotate an assembly at base-pair resolution across any desired feature classes in a single pass, completing in minutes on a standard workstation. Applied to the Human Pangenome Reference Consortium Release 2 assemblies (3), KaryoScope identifies the SST1 macrosatellite as the recurrent sequence at Robertsonian translocation fusion points (4, 5), delivers the first pangenome-wide census of D4Z4 macrosatellite structural diversity at the 4q and 10q subtelomeres relevant to facioscapulohumeral muscular dystrophy (6), and reveals previously uncharacterised centromere structural polymorphism, including chromosome-specific satellite loss and megabase-scale rearrangement validated by fluorescence in situ hybridization. A pre-built KaryoScope database for the human genome is distributed alongside the tool, and additional databases can be built for any reference genome or annotation source. Together, these capabilities bring the most variable regions of the genome within reach for comparative, clinical, and pangenome-scale analysis. KaryoScope is available at https://github.com/barthel-lab/KaryoScope.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Nucleic Acids Research | 1128 | Top 0.4% | 19.5%
2 | Nature Methods | 336 | Top 1% | 9.2%
3 | Bioinformatics | 1061 | Top 4% | 6.3%
4 | Science | 429 | Top 7% | 4.3%
5 | Nature | 575 | Top 6% | 4.3%
6 | Genome Medicine | 154 | Top 2% | 4.3%
7 | Nature Biotechnology | 147 | Top 2% | 4.0%
50% of probability mass above
8 | Bioinformatics Advances | 184 | Top 1% | 3.7%
9 | Science Advances | 1098 | Top 5% | 3.6%
10 | Nature Communications | 4913 | Top 40% | 3.6%
11 | Genome Research | 409 | Top 1.0% | 3.6%
12 | Nature Genetics | 240 | Top 3% | 2.6%
13 | Proceedings of the National Academy of Sciences | 2130 | Top 28% | 2.1%
14 | Genome Biology | 555 | Top 3% | 2.1%
15 | NAR Genomics and Bioinformatics | 214 | Top 2% | 1.8%
16 | Cell Genomics | 162 | Top 3% | 1.7%
17 | Microbiology Resource Announcements | 22 | Top 0.5% | 1.2%
18 | Cancer Discovery | 61 | Top 1% | 1.2%
19 | Leukemia | 39 | Top 0.6% | 1.1%
20 | Database | 51 | Top 0.6% | 1.0%
21 | The American Journal of Human Genetics | 206 | Top 3% | 0.9%
22 | Briefings in Bioinformatics | 326 | Top 6% | 0.9%
23 | American Journal of Respiratory and Critical Care Medicine | 39 | Top 0.9% | 0.7%
24 | Science Translational Medicine | 111 | Top 6% | 0.7%
25 | Molecular Therapy - Nucleic Acids | 24 | Top 0.5% | 0.6%
26 | Clinical Infectious Diseases | 231 | Top 5% | 0.6%
27 | Science Immunology | 81 | Top 2% | 0.6%
28 | Cell Reports Methods | 141 | Top 6% | 0.6%