Benchmarking Static Gene Regulatory Network Reconstruction and Dynamic Transition Probing in Single-Cell Foundation Models.

Ye, Z.; Yang, N.; Yang, X.; Mao, X.; Tang, C.

2026-05-20 systems biology
10.64898/2026.05.17.725083 bioRxiv
Single-cell foundation models may encode gene regulatory information, but it remains unclear which model components capture this signal and how it compares with conventional inference methods. Here, we introduce a unified benchmark that evaluates gene regulatory network (GRN) reconstruction from six single-cell foundation models and three classical baselines across six datasets and four reference network types. We disentangle three sources of regulatory signal within each model: pretrained token embeddings, final-layer hidden states, and attention-derived scores. Under a strict zero-shot setting, scGPT token-embedding similarity outperforms classical baselines on STRING and ChIP-seq references, recovers core transcription factors, and best preserves reference network topology. Moreover, static GRNs cannot test whether learned gene-gene relationships are predictive of expression dynamics. We therefore introduce dynamic transition probing, which iteratively applies a model's reconstruction head to drive early-cell profiles toward late-cell states without temporal supervision. We find that pretrained models capture meaningful developmental transitions, with scFoundation showing the strongest overall performance. Together, our results show that single-cell foundation models encode transferable regulatory and dynamical priors, but how well these priors can be recovered depends on model architecture, pretraining design, and extraction strategy.
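
To make "token-embedding similarity" concrete, the following is a minimal sketch of how candidate GRN edges could be ranked from per-gene embedding vectors. It assumes cosine similarity as the score, which the abstract does not specify, and it uses hypothetical gene names and toy three-dimensional vectors rather than real scGPT embeddings. The function names `cosine` and `rank_edges` are illustrative and are not part of any model's API.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_edges(embeddings, top_k):
    """Score every unordered gene pair and return the top_k as (gene1, gene2, score)."""
    scored = [
        (cosine(embeddings[g1], embeddings[g2]), g1, g2)
        for g1, g2 in combinations(sorted(embeddings), 2)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(g1, g2, score) for score, g1, g2 in scored[:top_k]]


# Hypothetical toy embeddings; real token embeddings would have hundreds of dimensions.
toy_embeddings = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [0.9, 0.1, 0.0],
    "GENE_C": [0.0, 1.0, 0.0],
}

top_edges = rank_edges(toy_embeddings, top_k=2)
for gene1, gene2, score in top_edges:
    print(f"{gene1} - {gene2}: {score:.3f}")
```

In a benchmark of this kind, a ranked edge list like `top_edges` would then be compared against a reference network such as STRING. That evaluation step is omitted here.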

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Cell Systems (167 papers in training set): Top 0.7%, 14.2%
2. Genome Biology (555 papers in training set): Top 0.4%, 10.1%
3. Nature Communications (4913 papers in training set): Top 31%, 6.1%
4. Nature Methods (336 papers in training set): Top 2%, 6.1%
5. Genome Research (409 papers in training set): Top 0.5%, 6.1%
6. Nature Machine Intelligence (61 papers in training set): Top 0.5%, 6.1%
7. Bioinformatics (1061 papers in training set): Top 5%, 4.7%
50% of probability mass above
8. Briefings in Bioinformatics (326 papers in training set): Top 2%, 3.5%
9. Nucleic Acids Research (1128 papers in training set): Top 6%, 3.5%
10. Cell Reports (1338 papers in training set): Top 16%, 3.5%
11. PLOS Computational Biology (1633 papers in training set): Top 11%, 3.5%
12. Proceedings of the National Academy of Sciences (2130 papers in training set): Top 25%, 2.6%
13. Development (440 papers in training set): Top 1%, 2.0%
14. Bioinformatics Advances (184 papers in training set): Top 3%, 1.8%
15. Nature (575 papers in training set): Top 10%, 1.8%
16. Nature Computational Science (50 papers in training set): Top 0.8%, 1.4%
17. npj Systems Biology and Applications (99 papers in training set): Top 1%, 1.3%
18. Cell Reports Methods (141 papers in training set): Top 3%, 1.3%
19. Scientific Reports (3102 papers in training set): Top 67%, 1.2%
20. PLOS ONE (4510 papers in training set): Top 61%, 1.2%
21. Nature Genetics (240 papers in training set): Top 6%, 1.2%
22. iScience (1063 papers in training set): Top 23%, 1.1%
23. Computational and Structural Biotechnology Journal (216 papers in training set): Top 7%, 1.1%
24. Nature Neuroscience (216 papers in training set): Top 5%, 0.9%
25. Molecular Systems Biology (142 papers in training set): Top 1%, 0.9%
26. eLife (5422 papers in training set): Top 60%, 0.7%
27. Science (429 papers in training set): Top 21%, 0.7%
28. Frontiers in Genetics (197 papers in training set): Top 10%, 0.7%
29. Genome Medicine (154 papers in training set): Top 9%, 0.7%
30. IEEE Transactions on Computational Biology and Bioinformatics (17 papers in training set): Top 0.7%, 0.7%
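
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked with a running sum. The sketch below assumes the final percentage on each entry is that journal's predicted probability and that the percentages are shares of a total of 100%. The helper name `journals_to_reach` is hypothetical.

```python
def journals_to_reach(probabilities, threshold):
    """Return (count, cumulative) for the first prefix whose sum reaches threshold,
    or None if the listed probabilities never reach it."""
    cumulative = 0.0
    for count, probability in enumerate(probabilities, start=1):
        cumulative += probability
        if cumulative >= threshold:
            return count, cumulative
    return None


# Predicted probabilities (in percent) of the eight highest-ranked journals listed above.
top_probs = [14.2, 10.1, 6.1, 6.1, 6.1, 6.1, 4.7, 3.5]

count, mass = journals_to_reach(top_probs, 50.0)
print(f"{count} journals reach {mass:.1f}% of the probability mass")
```

The first six journals sum to 48.7%, so the seventh (Bioinformatics, 4.7%) is the one that carries the running total past 50%, to 53.4%.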