SCIA: A fast and widely applicable pipeline for measuring expanded repeat instability

Smith, C.; Peter Durairaj, R. R.; Randall, E. L.; Aston, A. N.; Heraty, L.; Elsayed, W.; Murillo, A.; Dion, V.

2026-03-15 neuroscience
10.64898/2026.03.12.707943 bioRxiv
The expansion of short tandem repeats is a feature of over 60 different human diseases. Ongoing somatic instability throughout a patient's lifetime can influence disease progression and has emerged as a therapeutic target. Understanding its mechanism is essential for identifying both drug targets and therapeutic interventions. A major obstacle to this translational goal has been measuring changes in repeat size distribution in a timely manner. To address this, we present the Single Clone-based Instability Assay (SCIA), a streamlined experimental design that saves weeks in assessing the effect of a gene knockout on repeat instability. The approach avoids bulk cultures, does not require a reporter cell line, and uses targeted long-read sequencing as a readout for repeat instability. We validated the approach using FAN1, PMS1, and MLH1 knockouts in HEK293-derived cells. We provide visualization software that generates delta plots and extracts the instability frequency, the bias towards expansion or contraction, and the average size of the changes. Using SCIA, we find that although FAN1 knockout clones showed an increased frequency of expansions, the expansions were smaller. This highlights the wealth of information that can be extracted and the potential for novel insights into the mechanism of repeat instability.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Scientific Reports | 3102 papers in training set | Top 0.8% | 18.8%
2 | BMC Medical Genomics | 36 papers in training set | Top 0.1% | 14.9%
3 | PLOS ONE | 4510 papers in training set | Top 22% | 8.5%
4 | eNeuro | 389 papers in training set | Top 1% | 6.4%
5 | Nucleic Acids Research | 1128 papers in training set | Top 4% | 4.9%
50% of probability mass above
6 | BMC Bioinformatics | 383 papers in training set | Top 2% | 4.0%
7 | Briefings in Bioinformatics | 326 papers in training set | Top 2% | 3.6%
8 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 1% | 2.4%
9 | Cell Reports Methods | 141 papers in training set | Top 2% | 2.1%
10 | BMC Genomics | 328 papers in training set | Top 2% | 1.8%
11 | PLOS Computational Biology | 1633 papers in training set | Top 16% | 1.7%
12 | Bioinformatics | 1061 papers in training set | Top 7% | 1.7%
13 | Open Biology | 95 papers in training set | Top 0.6% | 1.7%
14 | SLAS Discovery | 25 papers in training set | Top 0.1% | 1.7%
15 | BioTechniques | 24 papers in training set | Top 0.1% | 1.5%
16 | International Journal of Molecular Sciences | 453 papers in training set | Top 9% | 1.5%
17 | Nature Communications | 4913 papers in training set | Top 54% | 1.3%
18 | iScience | 1063 papers in training set | Top 19% | 1.3%
19 | Methods | 29 papers in training set | Top 0.3% | 1.2%
20 | Communications Biology | 886 papers in training set | Top 14% | 1.2%
21 | PLOS Genetics | 756 papers in training set | Top 13% | 0.9%
22 | eLife | 5422 papers in training set | Top 55% | 0.8%
23 | Genome Medicine | 154 papers in training set | Top 8% | 0.7%
24 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 6% | 0.7%
25 | GigaScience | 172 papers in training set | Top 4% | 0.7%
26 | Genome Research | 409 papers in training set | Top 5% | 0.7%
27 | Nature Methods | 336 papers in training set | Top 7% | 0.5%
28 | Wellcome Open Research | 57 papers in training set | Top 3% | 0.5%
29 | Forensic Science International: Genetics | 24 papers in training set | Top 0.2% | 0.5%
30 | Journal of Medical Genetics | 28 papers in training set | Top 0.7% | 0.5%