Supervised restricted data fusion with common, local & distinct components

White, F.; van der Ploeg, G. R.; Heintz-Buschart, A.; Dong, L.; Bouwmeester, H.; Smilde, A.; Westerhuis, J.

2026-05-04 systems biology
10.64898/2026.04.30.721639 bioRxiv

In multiblock data, the dominant sources of variation are not always the most relevant to a response of interest, so purely exploratory decompositions may fail to recover subtle but important response-associated structure. We introduce PESCAR, a supervised extension of Penalised Exponential Simultaneous Component Analysis (PESCA) that incorporates response information directly into the estimation of common, local, and distinct (CLD) structure across multiple data blocks. This allows the multiblock decomposition and the response-guided recovery of latent structure to be carried out simultaneously. Through simulation studies, we show that PESCAR can detect weak response-related components across a range of settings, including different noise levels and model-rank mis-specification. Applied to a real multi-omics dataset, PESCAR recovers biologically meaningful response-associated patterns and retains interpretable block structure. We further demonstrate that sparsity in the fitted loading matrices admits a hypergraph-based interpretability layer, summarising overlapping support patterns across components and blocks. These results show that direct incorporation of response information into multiblock decomposition can improve detection of subtle relevant signal and facilitate interpretation in complex systems.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Communications | 4913 papers in training set | Top 3% | 22.3%
2. Cell Systems | 167 papers in training set | Top 1% | 10.0%
3. PLOS Computational Biology | 1633 papers in training set | Top 6% | 6.3%
4. Genome Biology | 555 papers in training set | Top 2% | 4.8%
5. Nucleic Acids Research | 1128 papers in training set | Top 4% | 4.8%
6. Nature Methods | 336 papers in training set | Top 2% | 4.3%

50% of probability mass above

7. Communications Biology | 886 papers in training set | Top 0.9% | 4.3%
8. Bioinformatics | 1061 papers in training set | Top 5% | 4.3%
9. Molecular Systems Biology | 142 papers in training set | Top 0.2% | 3.6%
10. PLOS ONE | 4510 papers in training set | Top 43% | 2.9%
11. Nature Biotechnology | 147 papers in training set | Top 4% | 2.1%
12. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 30% | 1.9%
13. Frontiers in Molecular Biosciences | 100 papers in training set | Top 1% | 1.9%
14. Genome Medicine | 154 papers in training set | Top 4% | 1.8%
15. BMC Bioinformatics | 383 papers in training set | Top 5% | 1.7%
16. eLife | 5422 papers in training set | Top 43% | 1.7%
17. Nature Microbiology | 133 papers in training set | Top 2% | 1.7%
18. Cell Reports Methods | 141 papers in training set | Top 3% | 1.2%
19. Nature Computational Science | 50 papers in training set | Top 1% | 0.9%
20. Advanced Science | 249 papers in training set | Top 17% | 0.9%
21. Patterns | 70 papers in training set | Top 2% | 0.9%
22. Nature Genetics | 240 papers in training set | Top 7% | 0.9%
23. Scientific Reports | 3102 papers in training set | Top 73% | 0.8%
24. npj Systems Biology and Applications | 99 papers in training set | Top 2% | 0.8%
25. Briefings in Bioinformatics | 326 papers in training set | Top 7% | 0.7%
26. Bioinformatics Advances | 184 papers in training set | Top 5% | 0.7%
27. Nature Neuroscience | 216 papers in training set | Top 7% | 0.6%