
Beyond alignment: synergistic integration is required for multimodal cell foundation models

Richter, T.; Zimmermann, E.; Hall, J.; Theis, F. J.; Raghavan, S.; Winter, P. S.; Amini, A. P.; Crawford, L.

2026-02-24 bioinformatics
10.64898/2026.02.23.707420 bioRxiv

The vision of a "virtual cell", a computational model that simulates biological function across modalities and scales, has become a defining goal in computational biology. While powerful unimodal foundation models exist, the lack of large-scale paired data precludes joint training of multimodal models. This scarcity favors compositional foundation models (CFMs): architectures that fuse frozen unimodal experts via a learned interface. However, it remains unclear when such multimodal fusion adds task-relevant information beyond the strongest unimodal representation and when it merely aggregates redundant signal. Here, we introduce the Synergistic Information Score (SIS), a metric grounded in partial information decomposition (PID) that quantifies the information gain achievable only through cross-modal interactions. Extending theoretical results from self-supervised learning, we show that standard alignment-based fusion objectives on frozen encoders inherently collapse to detecting linear redundancies, which limits their ability to capture nonlinear synergistic states. This distinction is directly relevant to tasks that aim to link tissue morphology and gene expression. Benchmarking ten fusion methods on spatial transcriptomics datasets, we use SIS to demonstrate that tasks dominated by linear redundancies are sufficiently served by unimodal baselines, whereas complex niche definitions benefit from synergy-aware integration objectives that enable cross-modal interactions beyond linear alignment. Finally, our scaling analysis shows that fine-tuning a dominant unimodal expert is the most sample-efficient path for standard tasks, suggesting that the benefits of multimodal frameworks emerge only when tasks depend on information distributed across modalities.
Together, these results establish that building towards a virtual cell will require a fundamental shift from alignment objectives that emphasize shared structure to synergy-maximizing integration that preserves and exploits complementary cross-modal signal.
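As a rough illustration of the synergy idea behind SIS, here is a minimal sketch on discrete toy data. It computes the information two modalities carry jointly about a label minus the information carried by the stronger single modality, which is the synergy term of a PID that uses minimum-mutual-information (MMI) redundancy. It is an assumption that SIS behaves like this quantity; the paper's actual estimator, which operates on learned embeddings, may differ. The function names `synergy_score`, `mutual_info`, and `entropy` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Shannon entropy in bits of the empirical distribution of `samples`."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_info(xs, ys):
    """Empirical I(X;Y) = H(X) + H(Y) - H(X,Y) for paired discrete samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def synergy_score(x1, x2, y):
    """Hypothetical SIS-style score (assumption, not the paper's estimator):
    joint information of both modalities about y, minus the information the
    stronger single modality carries alone. This equals PID synergy under
    minimum-mutual-information (MMI) redundancy."""
    joint = mutual_info(list(zip(x1, x2)), y)
    best_unimodal = max(mutual_info(x1, y), mutual_info(x2, y))
    return joint - best_unimodal


# Toy "modalities": all four equiprobable states of two binary features.
states = list(product([0, 1], repeat=2))
x1 = [a for a, _ in states]
x2 = [b for _, b in states]

# Redundant task: the label is fully present in x1 (passed as both modalities).
y_redundant = list(x1)
# Synergistic task: the label is the XOR of the two modalities.
y_xor = [a ^ b for a, b in states]

print("redundant SIS:", synergy_score(x1, x1, y_redundant))  # → 0.0
print("XOR SIS:", synergy_score(x1, x2, y_xor))              # → 1.0
print("I(x1; x2):", mutual_info(x1, x2))                     # → 0.0
```

In the XOR case neither modality alone carries any information about the label, and the two modalities share no information with each other, so an objective that only rewards agreement between them has nothing to align on; the label is recoverable only from their interaction.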

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Cell Systems: 167 papers in training set; Top 0.1%; probability 33.2%
2. Nature Communications: 4913 papers in training set; Top 17%; probability 10.2%
3. PLOS Computational Biology: 1633 papers in training set; Top 4%; probability 8.5%
(50% of probability mass above)
4. Neuron: 282 papers in training set; Top 3%; probability 4.9%
5. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 19%; probability 3.6%
6. Nucleic Acids Research: 1128 papers in training set; Top 6%; probability 3.6%
7. Nature Biotechnology: 147 papers in training set; Top 3%; probability 2.8%
8. Nature Methods: 336 papers in training set; Top 3%; probability 2.5%
9. Ecology Letters: 121 papers in training set; Top 0.6%; probability 2.1%
10. Science: 429 papers in training set; Top 13%; probability 1.8%
11. Genome Biology: 555 papers in training set; Top 4%; probability 1.8%
12. Nature Genetics: 240 papers in training set; Top 4%; probability 1.7%
13. eLife: 5422 papers in training set; Top 45%; probability 1.5%
14. Development: 440 papers in training set; Top 2%; probability 1.3%
15. Cell Reports: 1338 papers in training set; Top 27%; probability 1.3%
16. Nature: 575 papers in training set; Top 12%; probability 1.3%
17. Molecular Systems Biology: 142 papers in training set; Top 1%; probability 1.1%
18. Nature Machine Intelligence: 61 papers in training set; Top 3%; probability 0.9%
19. Cell: 370 papers in training set; Top 17%; probability 0.8%
20. Cell Genomics: 162 papers in training set; Top 7%; probability 0.6%
21. Nature Microbiology: 133 papers in training set; Top 5%; probability 0.6%
22. Bioinformatics: 1061 papers in training set; Top 10%; probability 0.6%
23. Nature Computational Science: 50 papers in training set; Top 2%; probability 0.5%
24. Science Advances: 1098 papers in training set; Top 34%; probability 0.5%
25. Current Biology: 596 papers in training set; Top 16%; probability 0.5%
26. Nature Cell Biology: 99 papers in training set; Top 5%; probability 0.5%