A Pharmacogenomic-Informed Representation Improves Multimodal EHR Survival Prediction

Lee, M. H.; Xiao, Y.; Li, X.; Klee, E.; Yang, P.; Sio, T.; Wang, L.; Cerhan, J. R.; Zong, N.

2026-01-30 health informatics

10.64898/2026.01.27.26344981 medRxiv

Background: Electronic health record (EHR)-based prognostic modeling is increasingly used in oncology, yet incorporating pharmacogenomic (PGx) knowledge derived from experimental systems into clinical prediction frameworks remains challenging. This gap is driven by fundamental mismatches between controlled drug-mutation assays and heterogeneous, incomplete real-world clinical data.

Methods: We propose a representation transfer framework that integrates PGx embeddings learned from large-scale in vitro pharmacogenomic screens into patient-level EHR models. A frozen pharmacogenomic encoder is used to generate interaction-aware embeddings from patient mutation profiles and administered therapies, which are aggregated into a fixed-length PGx Complementarity Representation. These representations are incorporated into multimodal survival prediction models alongside standard clinical features. Performance was evaluated using systematic modality ablation analyses, attribution analyses, and exploratory unsupervised representation analyses.

Results: Integrating PGx embeddings yielded consistent performance improvements across all evaluated modality combinations. Relative gains were largest in modality-sparse settings, where baseline EHR features encode limited biological context, and were attenuated, but remained significant, in biologically enriched configurations. Attribution analyses indicated that PGx embeddings contributed non-redundant predictive signal beyond standard clinical features. Exploratory unsupervised analyses further demonstrated that the learned representations exhibit interpretable association patterns aligned with known therapeutic exposures and pathway-level associations.

Conclusion: These findings suggest that externally learned pharmacogenomic representations can be transferred into real-world EHR models as a context-dependent, non-redundant augmentation. By framing PGx knowledge as an interaction-aware representation rather than a mechanistic model, this work provides an informatics framework for integrating experimental pharmacogenomic data into clinical prediction tasks in a reproducible and interpretable manner.
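
The aggregation step described in the Methods can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the encoder architecture, the embedding width, or the pooling operator, so `frozen_pair_encoder` is a hypothetical stand-in that returns a deterministic pseudo-embedding per (mutation, drug) pair, and the fixed-length vector is formed here by concatenating mean and max pooling. The mutation and drug names are placeholders, not data from the study.

```python
import zlib

import numpy as np

EMBED_DIM = 8  # hypothetical embedding width; not stated in the abstract


def frozen_pair_encoder(mutation: str, drug: str, dim: int = EMBED_DIM) -> np.ndarray:
    """Hypothetical stand-in for a frozen PGx encoder.

    Returns a deterministic pseudo-embedding for one (mutation, drug) pair.
    A real encoder would be a pretrained network with its weights held fixed.
    """
    # Stable seed derived from the pair, so the same pair always maps to the same vector.
    seed = zlib.crc32(f"{mutation}|{drug}".encode("utf-8"))
    return np.random.default_rng(seed).standard_normal(dim)


def pgx_representation(mutations, drugs, dim: int = EMBED_DIM) -> np.ndarray:
    """Aggregate a variable number of pair embeddings into one fixed-length vector.

    Assumed pooling: concatenation of the element-wise mean and max over all
    (mutation, drug) pairs. Patients with no pairs get an all-zero vector.
    """
    pairs = [(m, d) for m in mutations for d in drugs]
    if not pairs:
        return np.zeros(2 * dim)
    emb = np.stack([frozen_pair_encoder(m, d, dim) for m, d in pairs])
    return np.concatenate([emb.mean(axis=0), emb.max(axis=0)])


def fuse(clinical: np.ndarray, pgx: np.ndarray) -> np.ndarray:
    """Concatenate clinical features with the PGx vector for a downstream survival model."""
    return np.concatenate([clinical, pgx])


if __name__ == "__main__":
    pgx = pgx_representation(["KRAS_G12C", "TP53_R175H"], ["cisplatin", "pemetrexed"])
    features = fuse(np.array([63.0, 1.0, 2.0]), pgx)  # placeholder clinical features
    print(pgx.shape, features.shape)
```

Because mean and max pooling are permutation-invariant, the resulting vector has the same length and value regardless of how many mutations or therapies a patient has or the order in which they are listed, which is the property a fixed-length representation needs before it can be joined with tabular clinical features.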
