
STRmie-HD enables interruption-aware HTT repeat genotyping and somatic mosaicism profiling across sequencing platforms

Napoli, A.; Liorni, N.; Biagini, T.; Giovannetti, A.; Squitieri, A.; Miele, L.; Urbani, A.; Caputo, V.; Gasbarrini, A.; Squitieri, F.; Mazza, T.

2026-03-25 bioinformatics
10.64898/2026.03.21.713334 bioRxiv

Short tandem repeat expansions in exon 1 of the HTT gene drive Huntington's disease (HD) pathogenesis, with disease onset and progression heavily influenced by somatic mosaicism and sequence interruptions. While sequencing technologies enable repeat sizing, many computational tools lack the resolution to capture subtle interruption motifs and allele-specific somatic variation. We present STRmie-HD, an alignment-free, de novo framework for interruption-aware genotyping and quantitative profiling of somatic mosaicism at single-read resolution. The tool parses individual reads to quantify uninterrupted CAG tract length, CCG repeat content, and critical interruption variants, including Loss of Interruption (LOI) and Duplication of Interruption (DOI). Validated across Illumina, PacBio SMRT, and Oxford Nanopore platforms, STRmie-HD demonstrates high concordance with reference genotypes and superior sensitivity in identifying rare interruption patterns that conventional tools often overlook. Furthermore, it implements somatic mosaicism metrics to characterize repeat dynamics, successfully distinguishing the higher somatic expansion burden in brain tissues compared to peripheral blood. STRmie-HD offers a comprehensive and extensible solution for high-resolution molecular characterization of HTT variation, providing a robust framework for patient stratification and genetic research in HD.

Graphical Abstract: STRmie-HD flowchart. STRmie-HD is a comprehensive analytical framework that processes sequencing reads to analyze CAG/CCG trinucleotide repeats, interruption variants, and somatic mosaicism in the HTT gene. The workflow begins with sequencing reads (FASTA/FASTQ) that can undergo optional custom processing based on the sequencing design. These reads are then fed into a regular expression-based engine (STRmie-HD) to identify CAG and CCG motifs. The identified motifs lead to the estimation of CAG/CCG alleles, visualized as distinct peaks representing different allele sizes, interruption variant assessment, and somatic mosaicism quantification. STRmie-HD produces an HTML output that wraps this information into a report.
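The regular expression-based, per-read parsing described above can be illustrated with a minimal Python sketch. It is not STRmie-HD's code: the pattern, the function names `profile_read` and `cag_histogram`, and the rule that maps the number of CAACAG units to LOI, canonical, or DOI are all assumptions made for illustration. The sketch assumes the commonly described HTT layout of a pure CAG tract, a CAACAG interruption, a CCGCCA unit, and a CCG tract, and it ignores reverse-complement reads and sequencing errors.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not STRmie-HD's actual expression):
# uninterrupted CAG tract, zero or more CAACAG units, an optional CCGCCA unit,
# then a CCG tract.
HTT_PATTERN = re.compile(
    r"(?P<cag>(?:CAG){2,})"
    r"(?P<intr>(?:CAACAG)*)"
    r"(?P<cca>(?:CCGCCA)?)"
    r"(?P<ccg>(?:CCG){2,})"
)


def profile_read(read):
    """Return repeat counts for the longest pattern match in a read, or None."""
    best = None
    for match in HTT_PATTERN.finditer(read.upper()):
        if best is None or len(match.group(0)) > len(best.group(0)):
            best = match
    if best is None:
        return None

    # Simplifying assumption: 0 CAACAG units = LOI, 1 = canonical, 2+ = DOI.
    n_units = len(best.group("intr")) // 6
    if n_units == 0:
        status = "LOI"
    elif n_units == 1:
        status = "canonical"
    else:
        status = "DOI"

    return {
        "cag": len(best.group("cag")) // 3,  # uninterrupted CAG units
        "ccg": len(best.group("ccg")) // 3,  # CCG units after the optional CCGCCA
        "interruption": status,
    }


def cag_histogram(reads):
    """Count how many reads support each uninterrupted CAG length."""
    counts = Counter()
    for read in reads:
        profile = profile_read(read)
        if profile is not None:
            counts[profile["cag"]] += 1
    return counts
```

In a histogram like the one returned by `cag_histogram`, the most frequent CAG lengths would correspond to the allele peaks mentioned in the caption, and reads spread to either side of a peak would be the raw material for a mosaicism metric. How STRmie-HD actually calls peaks and computes its metrics is not stated here.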

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Bioinformatics (1061 papers in training set; Top 1%; 21.9%)
2. Bioinformatics Advances (184 papers in training set; Top 0.1%; 18.1%)
3. BMC Bioinformatics (383 papers in training set; Top 2%; 6.2%)
4. NAR Genomics and Bioinformatics (214 papers in training set; Top 0.3%; 6.1%)

50% of probability mass above

5. Nucleic Acids Research (1128 papers in training set; Top 4%; 4.7%)
6. Genome Medicine (154 papers in training set; Top 2%; 3.5%)
7. Computational and Structural Biotechnology Journal (216 papers in training set; Top 2%; 3.5%)
8. PLOS ONE (4510 papers in training set; Top 51%; 1.8%)
9. Scientific Reports (3102 papers in training set; Top 55%; 1.8%)
10. Briefings in Bioinformatics (326 papers in training set; Top 4%; 1.7%)
11. Nature Communications (4913 papers in training set; Top 52%; 1.6%)
12. Alzheimer's & Dementia (143 papers in training set; Top 2%; 1.6%)
13. Genome Biology (555 papers in training set; Top 5%; 1.4%)
14. PLOS Computational Biology (1633 papers in training set; Top 20%; 1.2%)
15. BMC Genomics (328 papers in training set; Top 4%; 1.2%)
16. npj Genomic Medicine (33 papers in training set; Top 0.6%; 1.2%)
17. Cell Reports Methods (141 papers in training set; Top 4%; 1.1%)
18. Advanced Science (249 papers in training set; Top 15%; 1.1%)
19. Methods (29 papers in training set; Top 0.4%; 0.9%)
20. Genes (126 papers in training set; Top 3%; 0.8%)
21. Database (51 papers in training set; Top 0.9%; 0.8%)
22. IEEE Journal of Biomedical and Health Informatics (34 papers in training set; Top 2%; 0.8%)
23. Disease Models & Mechanisms (119 papers in training set; Top 3%; 0.7%)
24. Human Mutation (29 papers in training set; Top 0.8%; 0.7%)
25. Biology Methods and Protocols (53 papers in training set; Top 3%; 0.7%)
26. Frontiers in Genetics (197 papers in training set; Top 11%; 0.7%)
27. Neurology Genetics (14 papers in training set; Top 0.4%; 0.6%)
28. Journal of Molecular Biology (217 papers in training set; Top 4%; 0.6%)
29. International Journal of Molecular Sciences (453 papers in training set; Top 18%; 0.6%)
30. Alzheimer's & Dementia: Translational Research & Clinical Interventions (16 papers in training set; Top 0.8%; 0.6%)