
When Multimodal Fusion Fails: Contrastive Alignment as a Necessary Stabilizer for TCR–Peptide Binding Prediction

Qi, C.; Wang, W.; Fang, H.; Wei, Z.

bioRxiv preprint (bioinformatics), 2026-04-02 · doi: 10.64898/2026.03.31.715453

Multimodal learning is commonly assumed to improve predictive performance, yet in biological applications auxiliary modalities are often imperfect and can degrade learning if fused naively. We investigate this problem in TCR-peptide binding prediction, where sequence embeddings from pretrained protein language models are strong and transferable, but structure-derived residue graphs are built from predicted folds and heuristic discretization. In this setting, structural views can be noisy, inconsistent, and difficult to optimize jointly with sequence features. We introduce TRACE, a lightweight multimodal framework that encodes each entity (TCR and peptide) with parallel sequence and graph towers, then applies CLIP-style intra-entity contrastive alignment before interaction modeling. The alignment objective regularizes representation geometry by encouraging modality consistency for the same biological entity, thereby preventing unstable graph signals from dominating fusion. Across protocol-aware TCHard RN evaluations, naive sequence+graph fusion frequently underperforms a sequence-only baseline and can collapse toward near-random behavior. In contrast, TRACE consistently restores and improves performance. Controlled noise and supervision sweeps show that these gains persist under increasing graph corruption and positive-label scarcity, indicating that alignment is especially important when training conditions are hard. Our results challenge the assumption that adding modalities is inherently beneficial. Instead, they highlight a central principle for robust multimodal bioinformatics: performance depends not only on what modalities are used, but on how their interaction is constrained during optimization. TRACE provides a simple and general recipe for leveraging imperfect structural information without sacrificing stability.
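The "CLIP-style intra-entity contrastive alignment" described in the abstract can be sketched as a symmetric InfoNCE loss between the sequence-tower and graph-tower embeddings of the same entity. The following is a minimal NumPy sketch only: the abstract does not specify TRACE's actual loss form, temperature, or architecture, so the function name, hyperparameters, and implementation details here are illustrative assumptions.

```python
import numpy as np

def clip_style_alignment_loss(seq_emb, graph_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning two views (sequence, graph) of the
    same entities. Hypothetical re-implementation; not the authors' code.

    seq_emb, graph_emb: (N, D) arrays, row i of each is the same entity.
    """
    # L2-normalize both views so dot products are cosine similarities.
    seq = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    gra = graph_emb / np.linalg.norm(graph_emb, axis=1, keepdims=True)

    logits = seq @ gra.T / temperature   # (N, N); matched pairs on diagonal
    labels = np.arange(len(seq))         # positive for row i is column i

    def cross_entropy(lg):
        # Numerically stable log-softmax over each row.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Symmetric: sequence-to-graph and graph-to-sequence directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pulls the two modality views of each entity together (and pushes apart views of different entities) before interaction modeling, which is the stabilizing role the abstract attributes to the alignment objective.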

Matching journals

The top 4 journals account for just over 50% of the predicted probability mass.

Rank, journal, and predicted probability (papers in training set, percentile):

1. Cell Systems: 22.7% (167 papers in training set, top 0.2%)
2. Nature Methods: 14.4% (336 papers in training set, top 0.5%)
3. Nature Communications: 9.2% (4913 papers in training set, top 21%)
4. Nature Biotechnology: 8.5% (147 papers in training set, top 0.9%)
   (50% of probability mass above this line)
5. Nature Machine Intelligence: 6.4% (61 papers in training set, top 0.4%)
6. Bioinformatics: 4.3% (1061 papers in training set, top 5%)
7. Proceedings of the National Academy of Sciences: 4.0% (2130 papers in training set, top 17%)
8. Briefings in Bioinformatics: 2.1% (326 papers in training set, top 3%)
9. PLOS Computational Biology: 1.7% (1633 papers in training set, top 16%)
10. Science: 1.7% (429 papers in training set, top 14%)
11. Bioinformatics Advances: 1.7% (184 papers in training set, top 3%)
12. Genome Biology: 1.7% (555 papers in training set, top 4%)
13. Nature: 1.3% (575 papers in training set, top 12%)
14. Nature Computational Science: 1.2% (50 papers in training set, top 1.0%)
15. Genome Research: 1.1% (409 papers in training set, top 3%)
16. Nature Genetics: 1.0% (240 papers in training set, top 6%)
17. Nucleic Acids Research: 0.9% (1128 papers in training set, top 15%)
18. Protein Science: 0.8% (221 papers in training set, top 2%)
19. eLife: 0.8% (5422 papers in training set, top 57%)
20. Nature Medicine: 0.7% (117 papers in training set, top 5%)
21. Scientific Reports: 0.7% (3102 papers in training set, top 76%)
22. The American Journal of Human Genetics: 0.7% (206 papers in training set, top 4%)
23. GigaScience: 0.7% (172 papers in training set, top 3%)
24. Biophysical Journal: 0.6% (545 papers in training set, top 6%)
25. Advanced Science: 0.6% (249 papers in training set, top 21%)
26. Structure: 0.6% (175 papers in training set, top 4%)
27. Protein Engineering, Design and Selection: 0.6% (14 papers in training set, top 0.1%)
28. NAR Genomics and Bioinformatics: 0.5% (214 papers in training set, top 5%)
29. Patterns: 0.5% (70 papers in training set, top 3%)
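As a small sanity check on the claim that the top 4 journals cover 50% of the probability mass, the cumulative sum of the listed percentages can be walked until the threshold is crossed (a sketch; only the top ten percentages from the table are used here):

```python
# Percentages from the journal-match table above, in rank order (top 10).
probs = [22.7, 14.4, 9.2, 8.5, 6.4, 4.3, 4.0, 2.1, 1.7, 1.7]

cumulative = 0.0
for rank, p in enumerate(probs, start=1):
    cumulative += p
    if cumulative >= 50.0:
        break  # threshold first crossed at this rank

print(rank, round(cumulative, 1))  # → 4 54.8
```

The 50% threshold is first crossed at rank 4, with a cumulative mass of about 54.8%, consistent with the divider drawn after Nature Biotechnology.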