In vivo validation of predicted fitness effects at single-base resolution in a Brachypodium distachyon mutant population

Moslemi, C.; Folgoas, M.; Yu, X.; Jensen, J. D.; Hentrup, S.; Li, T.; Wang, H.; Boelt, B.; Asp, T.; Sibout, R.; Ramstein, G. P.

2026-04-02 genomics
10.64898/2026.03.31.715642 bioRxiv
Computational tools, including biological language models (LMs), show substantial promise in predicting the impact of genetic variants on plant fitness. However, validating variant effect predictions (VEP) requires experimental populations where genetic variation consists of discrete point mutations rather than segregating recombination blocks. In this study, we generated a novel population of Brachypodium distachyon mutant lines to evaluate the accuracy of VEP at single-base resolution. These lines were advanced through single-seed descent for five generations (M1 to M5), with whole-genome sequencing performed at M2 and M5 and phenotypic measurements recorded at M3 and M4. Using state-of-the-art VEP models, we predicted the functional impact of missense protein-coding variants and gene-proximal non-coding variants. We validated these predictions by estimating the effect of mutations on whole-plant measurements (burden tests) and their probability of fixation from M2 to M5 (purging tests). Among missense variants, the protein LM ESM showed superior predictive accuracy compared to the bioinformatic standard SIFT and the genomic LM PlantCAD. Notably, the relationship between VEP scores and allele fixation suggested a log-linear relationship between VEP scores and variant fitness. Among gene-proximal variants, PlantCAD appeared more accurate than supervised models of regulatory activity, such as chromatin accessibility (a2z) and RNA abundance (PhytoExpr). Collectively, our findings highlight the utility of state-of-the-art VEP tools as predictors of fitness and demonstrate the potential of mutant populations to evaluate computational tools for precision breeding applications.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. New Phytologist: 309 papers in training set, Top 0.2%, 18.6%
2. The Plant Cell: 141 papers in training set, Top 0.3%, 10.0%
3. Plant Biotechnology Journal: 56 papers in training set, Top 0.1%, 7.1%
4. Plant Physiology: 217 papers in training set, Top 0.9%, 4.8%
5. The Plant Journal: 197 papers in training set, Top 1%, 4.3%
6. The Plant Genome: 53 papers in training set, Top 0.2%, 4.1%
7. Plant Communications: 35 papers in training set, Top 0.4%, 3.6%
50% of probability mass above
8. Frontiers in Plant Science: 240 papers in training set, Top 2%, 3.6%
9. Nature Plants: 84 papers in training set, Top 0.6%, 3.1%
10. Genome Biology: 555 papers in training set, Top 3%, 3.1%
11. Nature Communications: 4913 papers in training set, Top 45%, 2.4%
12. in silico Plants: 24 papers in training set, Top 0.1%, 2.4%
13. Horticulture Research: 43 papers in training set, Top 0.7%, 2.3%
14. eLife: 5422 papers in training set, Top 36%, 2.1%
15. Genetics: 225 papers in training set, Top 2%, 1.8%
16. Scientific Reports: 3102 papers in training set, Top 59%, 1.7%
17. Nature Genetics: 240 papers in training set, Top 5%, 1.5%
18. Molecular Plant: 36 papers in training set, Top 0.9%, 1.5%
19. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 6%, 1.2%
20. NAR Genomics and Bioinformatics: 214 papers in training set, Top 3%, 1.2%
21. Cell Genomics: 162 papers in training set, Top 5%, 1.2%
22. Plant Direct: 81 papers in training set, Top 2%, 1.2%
23. PLOS Genetics: 756 papers in training set, Top 12%, 1.1%
24. Applications in Plant Sciences: 21 papers in training set, Top 0.2%, 0.9%
25. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 40%, 0.9%
26. Frontiers in Genetics: 197 papers in training set, Top 8%, 0.9%
27. Molecular Biology and Evolution: 488 papers in training set, Top 4%, 0.8%
28. Communications Biology: 886 papers in training set, Top 21%, 0.8%
29. GENETICS: 189 papers in training set, Top 2%, 0.7%
30. PLOS Computational Biology: 1633 papers in training set, Top 27%, 0.6%
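
The stated 50% cutoff can be checked against the listed probabilities. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, and it assumes the final percentage shown for each journal is its predicted probability. It accumulates the values for the eight top-ranked journals and reports how many are needed to reach 50%.

```python
from itertools import accumulate

# Predicted probabilities (%) for the eight top-ranked journals,
# copied from the listing (assumed to be the last value in each entry).
probs = [18.6, 10.0, 7.1, 4.8, 4.3, 4.1, 3.6, 3.6]


def journals_to_reach(values, threshold=50.0):
    """Return (count, cumulative total) for the smallest number of
    top-ranked entries whose running sum reaches the threshold."""
    for count, total in enumerate(accumulate(values), start=1):
        if total >= threshold:
            return count, total
    raise ValueError("threshold not reached by the supplied values")


n, total = journals_to_reach(probs)
print(n, round(total, 1))
```

With these values, the running total stays below 50% after six journals and crosses it at the seventh, consistent with the statement that the top 7 journals account for 50% of the predicted probability mass.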