
gRely: Relyability for genome trained sequence-to-expression models

Rafi, A. M.; Eraslan, G.; Fletez-Brant, K.

2026-05-27 genomics
10.64898/2026.05.23.727431 bioRxiv

Sequence-to-function (S2F) models predict molecular phenotypes from DNA sequence and are increasingly applied to variant effect prediction (VEP), where the goal is to quantify how genetic variants alter gene expression. However, S2F model predictions are not uniformly reliable: accuracy varies substantially across variants, genes, and tissues, and current practice relies on crude magnitude thresholding to enrich for trustworthy predictions, which discards the majority of variants where S2F models could still provide signal. We developed gRely, a meta-modeling framework that estimates the probability that a given Borzoi VEP correctly predicts eQTL direction, using 1,121 features derived from the target variant, gene, and model outputs. On held-out tissues, gRely achieves a mean average precision of 0.885 (random baseline 0.744). Critically, within the low-magnitude regime where thresholding fails entirely, gRely identifies a high-confidence subset with 76% accuracy compared to a 58% baseline, recovering reliable predictions that magnitude filtering would discard. Interpretation via SHAP reveals that in this low-magnitude regime, gene expression level and cross-replicate signal concentration replace VEP magnitude as the primary discriminators of reliability. gRely is the first framework to provide per-prediction confidence scores for S2F model VEPs, and generalizes across architectures, producing consistent improvements on AlphaGenome predictions. By making reliability quantifiable, gRely enables principled filtering rather than blanket thresholding, and marks a step toward trustworthy deployment of S2F models in genomic research and clinical applications.
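The abstract treats each Borzoi VEP as either correct or incorrect in eQTL direction and evaluates gRely's confidence scores by average precision (AP) against a random baseline. Below is a minimal sketch of that kind of evaluation. It is not the authors' code. It assumes that "correct" means the sign of the predicted effect matches the sign of the eQTL effect, and the helper names (`direction_correct`, `average_precision`, `accuracy_of_top_fraction`) are hypothetical. The sketch also shows why an uninformative scorer's AP lands near the fraction of correct predictions, which is presumably how the 0.744 baseline arises.

```python
import numpy as np


def direction_correct(vep, eqtl_effect):
    """Hypothetical label: 1 where the predicted effect sign matches the eQTL effect sign."""
    return (np.sign(vep) == np.sign(eqtl_effect)).astype(int)


def average_precision(labels, scores):
    """Step-wise AP: mean precision at each rank where a correct prediction is retrieved."""
    order = np.argsort(-np.asarray(scores), kind="stable")  # highest confidence first
    y = np.asarray(labels)[order]
    if y.sum() == 0:
        return 0.0  # no positives, so AP is undefined; report 0 by convention
    tp = np.cumsum(y)
    precision = tp / np.arange(1, len(y) + 1)
    return float(precision[y == 1].mean())


def accuracy_of_top_fraction(labels, scores, frac):
    """Accuracy among the `frac` highest-scoring predictions (a 'high-confidence subset')."""
    k = max(1, int(round(frac * len(scores))))
    top = np.argsort(-np.asarray(scores), kind="stable")[:k]
    return float(np.asarray(labels)[top].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic labels with roughly 74% correct-direction predictions (illustrative only).
    y = (rng.random(20000) < 0.744).astype(int)
    random_scores = rng.random(20000)  # an uninformative reliability score
    print("fraction correct:", y.mean())
    print("AP of random scores:", average_precision(y, random_scores))
```

Ranking predictions by a reliability score and keeping only the top fraction, as `accuracy_of_top_fraction` does, mirrors the idea of selecting a high-confidence subset within the low-magnitude regime; how gRely itself chooses that subset and its cutoff is not stated in the abstract.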

Matching journals

The top 6 journals together account for just over 50% of the predicted probability mass (55.0% cumulatively).

1. Nature Genetics: 12.0% (240 papers in training set; top 0.6%)
2. Science: 9.8% (429 papers in training set; top 3%)
3. The American Journal of Human Genetics: 9.8% (206 papers in training set; top 0.5%)
4. Genome Biology: 8.2% (555 papers in training set; top 0.7%)
5. Cell Genomics: 8.2% (162 papers in training set; top 0.3%)
6. Nature: 7.0% (575 papers in training set; top 4%)

50% of probability mass above

7. Nature Biotechnology: 4.7% (147 papers in training set; top 2%)
8. Nature Methods: 4.2% (336 papers in training set; top 2%)
9. Nature Communications: 3.9% (4913 papers in training set; top 38%)
10. Genome Medicine: 3.9% (154 papers in training set; top 2%)
11. Nature Machine Intelligence: 3.5% (61 papers in training set; top 1%)
12. Nature Neuroscience: 1.8% (216 papers in training set; top 4%)
13. Cell Systems: 1.6% (167 papers in training set; top 8%)
14. Bioinformatics: 1.6% (1061 papers in training set; top 7%)
15. Nature Medicine: 1.6% (117 papers in training set; top 3%)
16. Cell: 1.4% (370 papers in training set; top 13%)
17. Proceedings of the National Academy of Sciences: 1.2% (2130 papers in training set; top 39%)
18. Nucleic Acids Research: 0.9% (1128 papers in training set; top 15%)
19. Nature Computational Science: 0.9% (50 papers in training set; top 1%)
20. Nature Human Behaviour: 0.9% (85 papers in training set; top 4%)
21. Briefings in Bioinformatics: 0.8% (326 papers in training set; top 6%)
22. Science Translational Medicine: 0.8% (111 papers in training set; top 6%)
23. Genome Research: 0.7% (409 papers in training set; top 4%)