
A Pharmacogenomic-Informed Representation Improves Multimodal EHR Survival Prediction

Lee, M. H.; Xiao, Y.; Li, X.; Klee, E.; Yang, P.; Sio, T.; Wang, L.; Cerhan, J. R.; Zong, N.

2026-01-30 health informatics
10.64898/2026.01.27.26344981 medRxiv

Background: Electronic health record (EHR)-based prognostic modeling is increasingly used in oncology, yet incorporating pharmacogenomic (PGx) knowledge derived from experimental systems into clinical prediction frameworks remains challenging. This gap is driven by fundamental mismatches between controlled drug-mutation assays and heterogeneous, incomplete real-world clinical data.

Methods: We propose a representation transfer framework that integrates PGx embeddings learned from large-scale in vitro pharmacogenomic screens into patient-level EHR models. A frozen pharmacogenomic encoder is used to generate interaction-aware embeddings from patient mutation profiles and administered therapies, which are aggregated into a fixed-length PGx Complementarity Representation. These representations are incorporated into multimodal survival prediction models alongside standard clinical features. Performance was evaluated using systematic modality ablation analyses, attribution analyses, and exploratory unsupervised representation analyses.

Results: Integrating PGx embeddings yielded consistent performance improvements across all evaluated modality combinations. Relative gains were largest in modality-sparse settings, where baseline EHR features encode limited biological context, and were attenuated, but remained significant, in biologically enriched configurations. Attribution analyses indicated that PGx embeddings contributed non-redundant predictive signal beyond standard clinical features. Exploratory unsupervised analyses further demonstrated that the learned representations exhibit interpretable association patterns aligned with known therapeutic exposures and pathway-level associations.

Conclusion: These findings suggest that externally learned pharmacogenomic representations can be transferred into real-world EHR models as a context-dependent, non-redundant augmentation. By framing PGx knowledge as an interaction-aware representation rather than a mechanistic model, this work provides an informatics framework for integrating experimental pharmacogenomic data into clinical prediction tasks in a reproducible and interpretable manner.
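
The aggregation step described in the Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: `frozen_pgx_encoder` is a hypothetical stand-in that hashes a mutation-drug pair into a deterministic vector (the real encoder is a pretrained model), `EMBED_DIM` is an arbitrary width, mean pooling is one plausible way to obtain a fixed-length vector (the abstract only says the embeddings are "aggregated"), and the mutation and drug names are placeholders.

```python
import hashlib
from itertools import product
from typing import List, Sequence

EMBED_DIM = 8  # hypothetical embedding width, chosen only for this sketch


def frozen_pgx_encoder(mutation: str, drug: str) -> List[float]:
    """Hypothetical stand-in for a frozen, pretrained PGx encoder.

    Returns a deterministic pseudo-embedding derived from a hash of the
    mutation-drug pair; it has no trainable parameters, mirroring "frozen".
    """
    digest = hashlib.sha256(f"{mutation}|{drug}".encode("utf-8")).digest()
    # Map the first EMBED_DIM bytes (0..255) to floats in [-1, 1].
    return [b / 127.5 - 1.0 for b in digest[:EMBED_DIM]]


def pgx_representation(mutations: Sequence[str], drugs: Sequence[str]) -> List[float]:
    """Pool all mutation-drug pair embeddings into one fixed-length vector.

    Mean pooling is an assumption; the output length is EMBED_DIM regardless
    of how many mutations or therapies a patient has.
    """
    pairs = list(product(mutations, drugs))
    if not pairs:
        # No mutation or therapy data: fall back to a zero vector.
        return [0.0] * EMBED_DIM
    embeddings = [frozen_pgx_encoder(m, d) for m, d in pairs]
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]


def build_features(clinical: Sequence[float],
                   mutations: Sequence[str],
                   drugs: Sequence[str]) -> List[float]:
    """Concatenate standard clinical features with the pooled PGx vector."""
    return list(clinical) + pgx_representation(mutations, drugs)


# Illustrative patient: two clinical features, one mutation, two therapies.
features = build_features([63.0, 1.0], ["KRAS G12C"], ["sotorasib", "cisplatin"])
print(len(features))
```

The resulting feature vector could then be passed to any survival model. The sketch is meant to show only that a variable number of mutation-drug pairs collapses to a vector of constant length.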

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | JCO Clinical Cancer Informatics | 18 | Top 0.1% | 44.9%
2 | Bioinformatics | 1061 | Top 4% | 5.2%
(50% of probability mass above)
3 | BMC Medical Informatics and Decision Making | 39 | Top 0.7% | 3.9%
4 | Scientific Reports | 3102 | Top 32% | 3.9%
5 | npj Digital Medicine | 97 | Top 1% | 3.9%
6 | Nature Communications | 4913 | Top 47% | 2.0%
7 | Journal of the American Medical Informatics Association | 61 | Top 1% | 1.8%
8 | PLOS Computational Biology | 1633 | Top 15% | 1.8%
9 | eBioMedicine | 130 | Top 2% | 1.4%
10 | iScience | 1063 | Top 18% | 1.4%
11 | Cancer Medicine | 24 | Top 0.9% | 1.4%
12 | Journal of Medical Internet Research | 85 | Top 3% | 1.4%
13 | npj Precision Oncology | 48 | Top 0.8% | 1.3%
14 | Artificial Intelligence in Medicine | 15 | Top 0.5% | 1.0%
15 | JAMIA Open | 37 | Top 1% | 1.0%
16 | Patterns | 70 | Top 2% | 1.0%
17 | BMC Bioinformatics | 383 | Top 6% | 0.9%
18 | JMIR Medical Informatics | 17 | Top 1% | 0.9%
19 | Journal of Biomedical Informatics | 45 | Top 1% | 0.9%
20 | Biomedicines | 66 | Top 2% | 0.9%
21 | European Journal of Cancer | 10 | Top 0.4% | 0.8%
22 | Nature Cancer | 35 | Top 1% | 0.8%
23 | Frontiers in Immunology | 586 | Top 7% | 0.8%
24 | Informatics in Medicine Unlocked | 21 | Top 1.0% | 0.8%
25 | Nature Machine Intelligence | 61 | Top 3% | 0.8%
26 | The Lancet Digital Health | 25 | Top 0.9% | 0.8%
27 | BMC Infectious Diseases | 118 | Top 5% | 0.8%
28 | Modern Pathology | 21 | Top 0.4% | 0.8%
29 | Database | 51 | Top 0.9% | 0.8%
30 | Cell Reports Medicine | 140 | Top 9% | 0.7%