Practical utility of sequence-to-omics models for improving the reproducibility of genetic fine-mapping

Sweeney, M. D.; Kang, H. M.

2026-02-06 genetics
10.64898/2026.02.04.703796 bioRxiv

Recent advances in deep learning have led to the development of sequence-to-omics (S2O) models that predict molecular phenotypes directly from DNA sequences. Here, we systematically evaluate the utility of these models, including AlphaGenome, Borzoi, Enformer, and Sei, for improving the reproducibility of genetic fine-mapping across expression quantitative trait loci (eQTL) datasets from the Genotype-Tissue Expression (GTEx), Trans-Omics for Precision Medicine (TOPMed), and Multi-Ancestry Analysis of Gene Expression (MAGE) projects. We show that purely statistical fine-mapping often yields high replication failure rates (RFRs), whereas integrating S2O model predictions substantially reduces RFRs and improves the accuracy of prioritizing SNPs that replicate in other consortia. We describe a generalized framework for functionally informed fine-mapping that combines traditional posterior inclusion probabilities (PIPs) from statistical fine-mapping methods with scores from S2O models to generate functionally informed PIPs (fiPIPs) that improve reproducibility. Our findings demonstrate that S2O models, particularly newer ones such as AlphaGenome and Borzoi, enable robust identification of replicated variants across consortia, highlighting their promise for scalable, functionally aware genetic mapping.
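
The abstract does not give the formula used to combine PIPs with S2O scores, nor the exact definition of the RFR. The Python sketch below is one plausible reading, not the authors' method. It assumes that each variant's PIP is multiplied by a weight that grows with the absolute S2O score and that the result is rescaled to the original total PIP mass of the locus. It also assumes that a replication failure is a variant with a high PIP in the discovery dataset and a low PIP in the replication dataset. The function names, the exponential weighting, the `temperature` parameter, and the 0.9 and 0.1 thresholds are all hypothetical.

```python
import math


def functionally_informed_pips(pips, s2o_scores, temperature=1.0):
    """Reweight the PIPs of one locus by S2O scores (illustrative assumption).

    Each PIP is multiplied by exp(|score| / temperature), then the products
    are rescaled so that their sum equals the sum of the input PIPs.
    """
    if len(pips) != len(s2o_scores):
        raise ValueError("pips and s2o_scores must have the same length")
    weights = [math.exp(abs(s) / temperature) for s in s2o_scores]
    unnormalized = [p * w for p, w in zip(pips, weights)]
    total = sum(unnormalized)
    if total == 0:
        return list(pips)  # nothing to reweight
    mass = sum(pips)
    return [mass * u / total for u in unnormalized]


def replication_failure_rate(discovery, replication, high=0.9, low=0.1):
    """Fraction of confident discovery variants that fail to replicate.

    `discovery` and `replication` map variant IDs to PIPs. A variant counts
    as a failure when its discovery PIP is >= `high` and its replication PIP
    is < `low` (variants absent from `replication` are treated as PIP 0).
    """
    confident = [v for v, p in discovery.items() if p >= high]
    if not confident:
        return float("nan")
    failed = [v for v in confident if replication.get(v, 0.0) < low]
    return len(failed) / len(confident)
```

Under these assumptions, two variants with equal PIPs are separated only by their S2O scores, so a variant in strong linkage disequilibrium with a predicted functional variant loses posterior mass to it. Comparing `replication_failure_rate` on the original PIPs and on the reweighted values would then show whether the functional information helps.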

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | The American Journal of Human Genetics | 206 papers in training set | Top 0.1% | 28.2%
2 | Cell Genomics | 162 papers in training set | Top 0.1% | 10.6%
3 | Nature Communications | 4913 papers in training set | Top 21% | 9.3%
4 | Nature Genetics | 240 papers in training set | Top 0.8% | 8.6%
50% of probability mass above
5 | Genome Medicine | 154 papers in training set | Top 0.8% | 6.9%
6 | Genome Biology | 555 papers in training set | Top 1% | 6.4%
7 | Nucleic Acids Research | 1128 papers in training set | Top 7% | 3.3%
8 | Bioinformatics | 1061 papers in training set | Top 6% | 3.1%
9 | Genome Research | 409 papers in training set | Top 2% | 2.4%
10 | Nature Biotechnology | 147 papers in training set | Top 4% | 1.7%
11 | PLOS Genetics | 756 papers in training set | Top 10% | 1.4%
12 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 37% | 1.2%
13 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.0%
14 | Human Genetics and Genomics Advances | 70 papers in training set | Top 0.5% | 1.0%
15 | Nature Methods | 336 papers in training set | Top 5% | 0.9%
16 | Nature | 575 papers in training set | Top 15% | 0.8%
17 | PLOS Computational Biology | 1633 papers in training set | Top 24% | 0.8%
18 | Scientific Reports | 3102 papers in training set | Top 76% | 0.7%
19 | Frontiers in Genetics | 197 papers in training set | Top 10% | 0.7%
20 | Bioinformatics Advances | 184 papers in training set | Top 5% | 0.7%
21 | Science Translational Medicine | 111 papers in training set | Top 8% | 0.5%
22 | Cell | 370 papers in training set | Top 19% | 0.5%
23 | PLOS ONE | 4510 papers in training set | Top 73% | 0.5%
24 | Nature Machine Intelligence | 61 papers in training set | Top 4% | 0.5%