Signal, Bounds, and Baselines: Principles for Rigorous Evaluation of High-Dimensional Biological Perturbation Prediction

Vollenweider, M. S.; Buehlmann, P.

2026-04-22 bioinformatics
10.64898/2026.04.20.719650 bioRxiv

Understanding cellular responses to perturbations is a central objective in biology and biomedicine, yet rigorously evaluating predictions from high-dimensional transcriptomic models remains an open challenge. Here, we propose the SBB principles (Signal, Bounds, and Baselines) for evaluating biological perturbation prediction. The Signal pillar introduces diagnostic meta-metrics and promotes differentially expressed gene (DEG)-based weighting or filtering to verify and recover metric sensitivity to biological signal masked by high-dimensional noise. The Bounds pillar provides perturbation-wise metric calibration against empirical reference points, transforming raw metric values into interpretable quantities that clarify the actual scale of model improvements. The Baselines pillar establishes a hierarchy of interpretable linear models that serve as rigorous performance floors. Applying these principles across seven transcriptomic perturbation datasets, including single and double perturbations, we demonstrate that complex deep learning methods, including foundation models, often still fail to meaningfully surpass simple linear baselines, and that substantial room for improvement remains even where they do. These principles provide a critical standard for distinguishing genuine biological signal from statistical artifacts and for guiding more robust model development.
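To make the Signal and Bounds ideas in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes that "DEG-based filtering" can be approximated by keeping the genes with the largest absolute true expression change (a stand-in for a proper differential expression test), and that "calibration against empirical reference points" can be written as a linear rescaling between a floor error (for example, predicting no change) and a ceiling error (for example, replicate-level noise). All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np


def deg_mask(delta_true, top_k):
    """Hypothetical DEG filter: mark the top_k genes by absolute true change."""
    order = np.argsort(-np.abs(delta_true))
    mask = np.zeros(delta_true.shape[0], dtype=bool)
    mask[order[:top_k]] = True
    return mask


def mse(pred, true, mask=None):
    """Mean squared error, optionally restricted to the masked genes."""
    if mask is not None:
        pred, true = pred[mask], true[mask]
    return float(np.mean((pred - true) ** 2))


def calibrated_score(err_model, err_floor, err_ceiling):
    """Rescale an error so that 0 = floor reference and 1 = ceiling reference.

    This linear form is an assumption made for illustration.
    """
    denom = err_floor - err_ceiling
    if denom == 0:
        return float("nan")
    return (err_floor - err_model) / denom


# Toy single-perturbation example: six genes, two of which truly respond.
delta_true = np.array([0.0, 0.0, 2.0, 0.0, -4.0, 0.0])
delta_model = np.array([0.0, 0.0, 1.0, 0.0, -3.0, 0.0])
delta_trivial = np.zeros_like(delta_true)  # "no change" reference prediction

mask = deg_mask(delta_true, top_k=2)

# Unfiltered errors are diluted by the four non-responding genes.
err_model_all = mse(delta_model, delta_true)
err_trivial_all = mse(delta_trivial, delta_true)

# Filtered errors focus on the genes that carry the signal.
err_model_deg = mse(delta_model, delta_true, mask)
err_trivial_deg = mse(delta_trivial, delta_true, mask)

# Calibrate against the trivial reference (floor) and a perfect prediction
# (ceiling error of 0.0, chosen here only to keep the toy example simple).
score = calibrated_score(err_model_deg, err_trivial_deg, err_ceiling=0.0)
print(err_model_all, err_model_deg, score)
```

In this toy case the raw error looks small largely because most genes do not change; the filtered and calibrated values put the model's error on a scale defined by the two reference points, which is the kind of interpretability the Bounds pillar describes.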

Matching journals

The top 4 journals together account for more than 50% of the predicted probability mass (57.6%).

1. Cell Systems: 167 papers in training set; Top 0.2%; 22.6%
2. Nature Machine Intelligence: 61 papers in training set; Top 0.1%; 12.5%
3. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 4%; 12.4%
4. Nature Communications: 4913 papers in training set; Top 17%; 10.1%

50% of probability mass above

5. Nature Methods: 336 papers in training set; Top 2%; 4.9%
6. Nature Biotechnology: 147 papers in training set; Top 2%; 3.7%
7. PLOS Computational Biology: 1633 papers in training set; Top 12%; 2.7%
8. Bioinformatics: 1061 papers in training set; Top 6%; 2.6%
9. Genome Biology: 555 papers in training set; Top 4%; 1.8%
10. Genome Research: 409 papers in training set; Top 2%; 1.7%
11. Briefings in Bioinformatics: 326 papers in training set; Top 4%; 1.7%
12. PLOS ONE: 4510 papers in training set; Top 58%; 1.3%
13. Nature Computational Science: 50 papers in training set; Top 1.0%; 1.2%
14. Science: 429 papers in training set; Top 16%; 1.2%
15. Nature Biomedical Engineering: 42 papers in training set; Top 1%; 1.2%
16. Nucleic Acids Research: 1128 papers in training set; Top 15%; 1.0%
17. Advanced Science: 249 papers in training set; Top 16%; 0.9%
18. Nature: 575 papers in training set; Top 15%; 0.8%
19. NAR Genomics and Bioinformatics: 214 papers in training set; Top 3%; 0.8%
20. Cell Genomics: 162 papers in training set; Top 6%; 0.8%
21. Science Advances: 1098 papers in training set; Top 30%; 0.7%
22. Development: 440 papers in training set; Top 4%; 0.7%
23. Communications Biology: 886 papers in training set; Top 26%; 0.7%
24. Scientific Reports: 3102 papers in training set; Top 76%; 0.7%
25. Patterns: 70 papers in training set; Top 3%; 0.6%
26. Biology: 43 papers in training set; Top 4%; 0.5%
27. Cell Reports: 1338 papers in training set; Top 37%; 0.5%
28. Cell Reports Methods: 141 papers in training set; Top 7%; 0.5%
29. Neuron: 282 papers in training set; Top 10%; 0.5%
30. The American Journal of Human Genetics: 206 papers in training set; Top 5%; 0.5%