A modality gap in personal-genome prediction by sequence-to-function models
Mostafavi, S.; Tu, X.; Spiro, A.; Chikina, M.
Sequence-to-function (S2F) models trained on reference genomes have achieved strong performance on regulatory prediction and variant-effect benchmarks, yet they still struggle to predict inter-individual variation in gene expression from personal genomes. We evaluated AlphaGenome on personal genome prediction in two molecular modalities, gene expression and chromatin accessibility, and observed a striking dichotomy: AlphaGenome approaches the heritability ceiling for chromatin accessibility variation, but remains far below baseline for gene-expression variation, despite improving over Borzoi. Context-truncation and fine-mapped QTL analyses indicate that accessibility is governed by local regulatory grammar captured by current architectures, whereas gene-expression variation requires long-range regulatory integration that remains challenging.
Matching journals
The top 5 journals account for 50% of the predicted probability mass.