Population-scale detection of methylation outliers from long-read genome sequencing

Jensen, T. D.; Kaur, R.; Bonner, D. E.; Nguyen, J.; Reuter, C. M.; Undiagnosed Diseases Network; Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium; Ashley, E. A.; Bernstein, J. A.; Wheeler, M. T.; Montgomery, S. B.

2026-06-11 genetic and genomic medicine
10.64898/2026.06.09.26355279 medRxiv
Background: Aberrant DNA methylation can mediate the functional effects of rare genetic variation and contribute to imprinting disorders, repeat expansion diseases, and other pathogenic regulatory mechanisms. Long-read sequencing technologies now enable genome-wide detection of CpG methylation alongside genetic variation from a single assay. However, methods for systematic identification and interpretation of methylation outliers from long-read sequencing data remain limited.

Methods: We developed METAFORA, a computational workflow for detecting methylation outlier regions from PacBio and Oxford Nanopore long-read sequencing data. METAFORA constructs population-level methylation references, segments the genome into correlated CpG blocks, infers technical and biological sources of variation through hidden factor estimation, models uncertainty due to variable sequencing depth, and computes covariate-adjusted methylation outlier scores for individual samples. We applied METAFORA across large long-read sequencing cohorts and integrated methylation outliers with multi-omic data. METAFORA is implemented as a Snakemake workflow available at https://github.com/tjense25/METAFORA.

Results: METAFORA identified methylation outlier regions associated with rare structural variants, tandem repeat expansions, and imprinting abnormalities. Outlier regions were enriched for molecular outliers across transcriptomic and chromatin accessibility datasets, supporting their functional relevance in gene regulation. In a representative case, METAFORA identified an imprinting defect affecting the GNAS locus associated with an STX16 deletion.

Conclusions: METAFORA enables scalable detection and interpretation of methylation outliers from long-read sequencing data and provides a framework for integrating epigenetic outliers with genomic and multi-omic analyses. These approaches may improve interpretation of rare regulatory variation and support discovery of clinically relevant epigenetic abnormalities in genomic medicine.
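
To make the idea of a covariate-adjusted outlier score concrete, the following is a minimal sketch and not METAFORA's implementation. It assumes one methylation value per sample for a single region and one known numeric covariate; the hidden factor estimation and sequencing-depth modelling described in the abstract are omitted. The function names (`residualize`, `robust_z`, `outlier_samples`) and the threshold of 3 are hypothetical choices made for illustration.

```python
from statistics import mean, median


def residualize(values, covariate):
    """Remove the linear effect of one covariate by ordinary least squares.

    Assumption: a single known covariate stands in for the technical and
    biological factors that a real workflow would estimate from the data.
    """
    mx, my = mean(covariate), mean(values)
    sxx = sum((x - mx) ** 2 for x in covariate)
    sxy = sum((x - mx) * (y - my) for x, y in zip(covariate, values))
    slope = sxy / sxx if sxx else 0.0
    return [y - (my + slope * (x - mx)) for x, y in zip(covariate, values)]


def robust_z(residuals):
    """Score residuals with a median/MAD z-score, which is less sensitive
    to the outliers being searched for than a mean/SD z-score."""
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    scale = 1.4826 * mad  # makes MAD comparable to an SD under normality
    if scale == 0:
        return [0.0 for _ in residuals]
    return [(r - med) / scale for r in residuals]


def outlier_samples(values, covariate, threshold=3.0):
    """Return indices of samples whose adjusted score exceeds the threshold."""
    scores = robust_z(residualize(values, covariate))
    return [i for i, s in enumerate(scores) if abs(s) >= threshold]


# Toy data: methylation fraction in one region across eight samples,
# drifting with a hypothetical covariate; sample 5 is hypermethylated.
covariate = [0, 1, 2, 3, 4, 5, 6, 7]
values = [0.50, 0.51, 0.52, 0.53, 0.54, 0.95, 0.56, 0.57]
flagged = outlier_samples(values, covariate)
```

The median/MAD scale is used here because a single extreme sample would inflate an ordinary standard deviation and mask itself; whether METAFORA makes the same choice is not stated in the abstract.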

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Genome Medicine | 154 papers in training set | Top 0.1% | 28.1%
2. Bioinformatics | 1061 papers in training set | Top 1% | 19.0%
3. Genome Biology | 555 papers in training set | Top 0.6% | 8.6%
(50% of probability mass above)
4. The American Journal of Human Genetics | 206 papers in training set | Top 0.7% | 6.4%
5. BMC Genomics | 328 papers in training set | Top 0.7% | 3.7%
6. Nature Communications | 4913 papers in training set | Top 39% | 3.7%
7. Cell Genomics | 162 papers in training set | Top 2% | 3.1%
8. Bioinformatics Advances | 184 papers in training set | Top 2% | 2.9%
9. Scientific Reports | 3102 papers in training set | Top 61% | 1.5%
10. Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.4%
11. Med | 38 papers in training set | Top 0.4% | 1.4%
12. Genetics in Medicine | 69 papers in training set | Top 0.8% | 1.2%
13. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.9%
14. Nucleic Acids Research | 1128 papers in training set | Top 16% | 0.8%
15. BMC Bioinformatics | 383 papers in training set | Top 6% | 0.8%
16. npj Genomic Medicine | 33 papers in training set | Top 0.8% | 0.8%
17. Genetic Epidemiology | 46 papers in training set | Top 0.8% | 0.8%
18. The Journal of Molecular Diagnostics | 36 papers in training set | Top 0.5% | 0.8%
19. npj Precision Oncology | 48 papers in training set | Top 1% | 0.7%
20. Human Mutation | 29 papers in training set | Top 0.7% | 0.7%
21. GENETICS | 189 papers in training set | Top 2% | 0.7%
22. BMC Medical Genomics | 36 papers in training set | Top 2% | 0.7%
23. Frontiers in Genetics | 197 papers in training set | Top 11% | 0.7%
24. European Journal of Human Genetics | 49 papers in training set | Top 2% | 0.5%
25. Genetics in Medicine Open | 10 papers in training set | Top 0.2% | 0.5%
26. Nature Biotechnology | 147 papers in training set | Top 9% | 0.5%