
CMS4-focused multi-omic integration enhances antigen target identification in colorectal cancer

Fox, E.; Meunier, L.; Weill, S.; Appe, G.; Behdenna, A.; Hensen, L.; Lafond, C.; Nordor, A. V.; Marijon, C.

2026-05-07 cancer biology
10.64898/2026.05.04.722755 bioRxiv

Colorectal cancer (CRC) remains a major cause of cancer mortality, with limited options for poor-prognosis subtypes such as CMS4. Antigen-targeted therapies show promise but tend to fail due to inadequate target selection and insufficient patient stratification. Effective prioritization requires large, harmonized datasets capturing CRC heterogeneity, a resource that is currently lacking. To address this need, we built a harmonized multi-omic CRC knowledge base and applied a scalable discovery pipeline to identify antigen targets specifically associated with CMS4 biology and with strong translational potential. We constructed a harmonized CRC atlas by integrating 79 transcriptomics datasets (5,033 tumors, 161 normal samples) using proprietary AI-powered data scouting, integration, and curation technologies. Consensus Molecular Subtypes (CMS) were inferred to capture CMS4-specific expression patterns, and this atlas was then combined with 3 bulk RNA-seq reference datasets, 2 single-cell atlases, and 8 protein annotation databases to form a unified multi-omic CRC knowledge base of unmatched scale. From this integrated system, we identified genes differentially expressed in CMS4 patients encoding druggable cell-surface proteins, which we then prioritized using a weighted efficacy- and safety-based scoring model. We identified 236 CMS4-enriched candidates, including 124 not detectable at the CRC-wide level, demonstrating the added resolution gained through subtype stratification. Recovery of known investigational CRC targets (LGR5, MET, TACSTD2) and CMS4-associated targets of emerging clinical interest (PDGFRB, ALK5/TGFBR1, FAP) supports the biological and methodological validity of our approach. Benchmarking against thresholds from FDA-approved pan-cancer targets and terminated trials identified 32 candidates with comparable or superior therapeutic profiles. Among these, 11 were enriched for CMS4-defining pathways, including epithelial-mesenchymal transition, angiogenesis, and stromal invasion, and 5 showed strong profile similarity to established CRC and CMS4 benchmarks. After extensive data exploration, particularly promising candidates were shortlisted for further validation. This work shows that CMS4-focused molecular stratification, when combined with an unprecedentedly large harmonized multi-omic knowledge base, yields a refined set of antigen candidates with enhanced specificity, safety, and biological relevance. The prioritized targets illustrate the power of subtype-resolved discovery to uncover clinically actionable insights. Our pipeline's modular design can extend to other tumor contexts, offering a robust foundation for accelerating targeted therapy development.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1 | Genome Medicine | 154 papers in training set | Top 0.3% | 14.1%
2 | Cancer Cell | 38 papers in training set | Top 0.1% | 14.1%
3 | Nature Communications | 4913 papers in training set | Top 19% | 9.9%
4 | Cell Reports Medicine | 140 papers in training set | Top 0.3% | 6.7%
5 | Cell Genomics | 162 papers in training set | Top 0.6% | 6.2%
50% of probability mass above
6 | Cancer Discovery | 61 papers in training set | Top 0.6% | 3.5%
7 | Nature Cancer | 35 papers in training set | Top 0.3% | 3.5%
8 | Cell | 370 papers in training set | Top 8% | 3.0%
9 | Advanced Science | 249 papers in training set | Top 8% | 2.6%
10 | Nature Medicine | 117 papers in training set | Top 1% | 2.4%
11 | Cell Reports | 1338 papers in training set | Top 21% | 2.0%
12 | Cancer Research | 116 papers in training set | Top 2% | 2.0%
13 | Nature Genetics | 240 papers in training set | Top 4% | 1.9%
14 | npj Precision Oncology | 48 papers in training set | Top 0.8% | 1.3%
15 | Communications Biology | 886 papers in training set | Top 15% | 1.2%
16 | npj Digital Medicine | 97 papers in training set | Top 3% | 0.9%
17 | Cell Systems | 167 papers in training set | Top 10% | 0.9%
18 | Nature | 575 papers in training set | Top 15% | 0.9%
19 | Cell Stem Cell | 57 papers in training set | Top 2% | 0.9%
20 | Nature Cell Biology | 99 papers in training set | Top 4% | 0.9%
21 | Nature Biotechnology | 147 papers in training set | Top 7% | 0.8%
22 | npj Systems Biology and Applications | 99 papers in training set | Top 3% | 0.7%
23 | Molecular Cancer | 14 papers in training set | Top 1% | 0.7%
24 | Nucleic Acids Research | 1128 papers in training set | Top 20% | 0.6%
25 | Nature Biomedical Engineering | 42 papers in training set | Top 3% | 0.6%
26 | Genome Biology | 555 papers in training set | Top 9% | 0.6%
27 | Science Advances | 1098 papers in training set | Top 34% | 0.6%
28 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 7% | 0.6%
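
The "50% of probability mass" cutoff in the list above can be checked with a running sum of the predicted probabilities. The sketch below is only an illustration: it assumes the displayed (rounded) percentages are the values to sum, and the function name `journals_to_reach` is hypothetical, not part of the prediction tool.

```python
from itertools import accumulate

# Predicted probabilities (%) for the ten highest-ranked journals,
# copied from the list above (rounded values as displayed).
probs = [14.1, 14.1, 9.9, 6.7, 6.2, 3.5, 3.5, 3.0, 2.6, 2.4]


def journals_to_reach(probabilities, threshold=50.0):
    """Return (count, cumulative) for the shortest ranked prefix whose
    cumulative probability reaches the threshold, or None if it never does."""
    for count, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return count, total
    return None


count, total = journals_to_reach(probs)
print(count, round(total, 1))
```

With the listed values, the threshold is crossed at the fifth journal, where the cumulative probability is about 51.0%, which agrees with the statement that the top 5 journals account for 50% of the predicted probability mass.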