Signal, Bounds, and Baselines: Principles for Rigorous Evaluation of High-Dimensional Biological Perturbation Prediction
Vollenweider, M. S.; Buehlmann, P.
Understanding cellular responses to perturbations is a central objective in biology and biomedicine, yet rigorously evaluating predictions from high-dimensional transcriptomic models remains an open challenge. Here, we propose the SBB principles (Signal, Bounds, and Baselines) for evaluating biological perturbation prediction. The Signal pillar introduces diagnostic meta-metrics and promotes differentially expressed gene (DEG)-based weighting or filtering to verify and recover metric sensitivity to biological signal masked by high-dimensional noise. The Bounds pillar provides perturbation-wise metric calibration against empirical reference points, transforming raw metric values into interpretable quantities that clarify the actual scale of model improvements. The Baselines pillar establishes a hierarchy of interpretable linear models that serve as rigorous performance floors. Applying these principles across seven transcriptomic perturbation datasets, including single and double perturbations, we demonstrate that complex deep learning methods, including foundation models, often still fail to meaningfully surpass simple linear baselines, and that substantial room for improvement remains even where they do. These principles provide a critical standard for distinguishing genuine biological signal from statistical artifacts and for guiding more robust model development.
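To make the Signal and Bounds ideas concrete, the following is a minimal sketch, not the authors' implementation. It assumes a mean squared error restricted to DEGs through a binary weight vector, and a linear rescaling of that error between two hypothetical reference points: a "no change" prediction as the lower reference and a held-out replicate as the upper reference. The function names, the choice of references, and the toy numbers are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def deg_weighted_mse(pred, truth, weights):
    """Weighted mean squared error over genes.

    `weights` is an assumed per-gene weight vector (here a 0/1 DEG mask),
    so that non-DEG genes do not dilute the error.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if weights.sum() <= 0:
        raise ValueError("weights must have a positive sum")
    return float(np.sum(weights * (pred - truth) ** 2) / np.sum(weights))


def calibrate(raw_error, lower_ref_error, upper_ref_error):
    """Rescale an error so 0 matches the lower reference and 1 the upper one."""
    span = lower_ref_error - upper_ref_error
    if span == 0:
        raise ValueError("reference errors must differ")
    return (lower_ref_error - raw_error) / span


# Toy example for one perturbation with four genes (values are made up).
truth = [2.0, 0.0, 0.0, -1.0]        # observed expression change
deg_mask = [1, 0, 0, 1]              # assumed DEG indicator
model_pred = [1.0, 0.1, -0.1, -0.5]  # hypothetical model output
no_change = [0.0, 0.0, 0.0, 0.0]     # assumed lower reference point
replicate = [1.5, 0.0, 0.0, -1.0]    # assumed upper reference point

model_err = deg_weighted_mse(model_pred, truth, deg_mask)
lower_err = deg_weighted_mse(no_change, truth, deg_mask)
upper_err = deg_weighted_mse(replicate, truth, deg_mask)
score = calibrate(model_err, lower_err, upper_err)
print(f"raw error {model_err:.3f}, calibrated score {score:.3f}")
```

Under these assumptions, the calibrated score reads as the fraction of the gap between the two references that the model closes for this perturbation, which is easier to compare across perturbations than the raw error.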