Validation of a Composite Mortality Endpoint in a Large, Clinico-Genomic Real-World Database of Patients with Advanced Cancer

Kapilivsky, J.; Islam, F.; Roth, E. K.; Dow, J.; Moran, S.; Scherrer, E.; Hyun, S. W.; Sangli, C.

2025-08-24 health informatics
10.1101/2025.08.20.25334011 medRxiv

Purpose: Real-world data (RWD) from electronic health records (EHRs) and next-generation sequencing are increasingly used to study treatment effectiveness in molecularly refined patient populations. Incomplete mortality data in EHRs can lead to overestimated survival in RWD studies. While the National Death Index (NDI) is the gold standard for mortality data in the United States, its limited accessibility and reporting delays hinder timely research. Instead, EHR datasets are often supplemented with external mortality data sources to improve mortality capture. This study evaluated a composite mortality variable against NDI records using a large cohort of patients with advanced cancer from a real-world oncology database.

Methods: De-identified clinical and molecular data from patients with advanced solid tumors were linked with third-party mortality and claims datasets using deterministic tokenization. Vital status and death dates were harmonized across sources. Patient identifiers were submitted to NDI, and true matches were de-identified and joined for analysis. Performance metrics (sensitivity, specificity, positive predictive value [PPV], negative predictive value [NPV]) were calculated using NDI as ground truth. Date agreement was assessed at 0-, ±15-, and ±30-day tolerances. Subgroup analyses and a cumulative cases/dynamic controls (CC/DC) approach were also performed.

Results: Among 17,597 patients, the composite mortality variable demonstrated 82% sensitivity and 95% specificity against NDI. PPV was 96%, and NPV was 77%. Exact date agreement was 86%, increasing to 94% within a ±15-day tolerance and 96% within a ±30-day tolerance. Incorporating third-party mortality and claims data substantially improved sensitivity, from 17% (EHR alone) to 82%. Sensitivity remained stable across subgroups but showed variation by age, cancer type, geographic region, and race. With the CC/DC approach, sensitivity was 96% at 6 months, 97% at 12 months, and 98% at 24 months, with specificity above 98% across these timeframes.

Conclusions: The composite mortality variable is a robust, reliable endpoint for real-world evidence analyses. Its high accuracy for identified deaths and appropriate censoring of lost-to-follow-up patients support its use in overall survival analyses. This validation is a foundational step towards high-quality research to improve patient outcomes and advance cancer drug development using this multimodal dataset.

Clinical trial number: not applicable
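
The performance metrics and date-agreement tolerances named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`confusion_counts`, `performance_metrics`, `date_agreement`) and the six-patient toy data are hypothetical and chosen only for illustration. It also assumes that date agreement is computed among patients with a death date in both sources, which the abstract does not state.

```python
from datetime import date
from typing import Optional, Sequence


def confusion_counts(composite_dead: Sequence[bool], ndi_dead: Sequence[bool]):
    """Count TP, FP, FN, TN, treating NDI vital status as ground truth."""
    tp = fp = fn = tn = 0
    for comp, ndi in zip(composite_dead, ndi_dead):
        if comp and ndi:
            tp += 1
        elif comp and not ndi:
            fp += 1
        elif not comp and ndi:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic-accuracy definitions."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # deaths in NDI that the composite captured
        "specificity": _ratio(tn, tn + fp),  # alive per NDI and alive per composite
        "ppv": _ratio(tp, tp + fp),          # composite deaths confirmed by NDI
        "npv": _ratio(tn, tn + fn),          # composite "alive" confirmed by NDI
    }


def date_agreement(
    composite_dates: Sequence[Optional[date]],
    ndi_dates: Sequence[Optional[date]],
    tolerance_days: int,
) -> float:
    """Share of paired death dates that differ by at most tolerance_days.

    Assumption: only patients with a death date in BOTH sources are compared.
    """
    pairs = [
        (c, n)
        for c, n in zip(composite_dates, ndi_dates)
        if c is not None and n is not None
    ]
    within = sum(abs((c - n).days) <= tolerance_days for c, n in pairs)
    return _ratio(within, len(pairs))


# Hypothetical toy cohort of six patients (NOT data from the study).
composite_dead = [True, True, True, False, False, True]
ndi_dead = [True, True, True, True, False, False]
composite_dates = [date(2023, 1, 10), date(2023, 3, 1), date(2023, 6, 30), None, None, date(2023, 9, 1)]
ndi_dates = [date(2023, 1, 10), date(2023, 3, 11), date(2023, 5, 20), date(2023, 8, 2), None, None]

counts = confusion_counts(composite_dead, ndi_dead)
metrics = performance_metrics(*counts)
agreement = {tol: date_agreement(composite_dates, ndi_dates, tol) for tol in (0, 15, 30)}

print(counts, metrics, agreement)
```

Widening the tolerance can only keep or raise the agreement rate, which matches the monotone pattern the abstract reports (86%, 94%, 96%).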

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.1% | 26.3%
2. Cancer Medicine | 24 papers in training set | Top 0.1% | 10.6%
3. PLOS ONE | 4510 papers in training set | Top 31% | 4.9%
4. The Lancet Digital Health | 25 papers in training set | Top 0.1% | 4.9%
5. npj Digital Medicine | 97 papers in training set | Top 1.0% | 4.4%

50% of probability mass above

6. Nature Communications | 4913 papers in training set | Top 39% | 3.6%
7. BMJ Open | 554 papers in training set | Top 6% | 3.1%
8. Scientific Reports | 3102 papers in training set | Top 41% | 3.1%
9. Annals of Internal Medicine | 27 papers in training set | Top 0.2% | 2.6%
10. JMIR Medical Informatics | 17 papers in training set | Top 0.6% | 1.9%
11. JAMA Network Open | 127 papers in training set | Top 2% | 1.7%
12. Journal of Clinical Epidemiology | 28 papers in training set | Top 0.3% | 1.7%
13. BMC Medical Research Methodology | 43 papers in training set | Top 0.8% | 1.4%
14. eBioMedicine | 130 papers in training set | Top 2% | 1.4%
15. Journal of Medical Internet Research | 85 papers in training set | Top 3% | 1.2%
16. npj Precision Oncology | 48 papers in training set | Top 0.9% | 1.0%
17. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.9%
18. Bioinformatics | 1061 papers in training set | Top 9% | 0.9%
19. Scientific Data | 174 papers in training set | Top 2% | 0.9%
20. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 2% | 0.8%
21. Clinical and Translational Science | 21 papers in training set | Top 0.9% | 0.8%
22. JAMIA Open | 37 papers in training set | Top 1% | 0.8%
23. Informatics in Medicine Unlocked | 21 papers in training set | Top 1% | 0.8%
24. BMC Infectious Diseases | 118 papers in training set | Top 5% | 0.8%
25. Frontiers in Oncology | 95 papers in training set | Top 3% | 0.8%
26. BMJ Health & Care Informatics | 13 papers in training set | Top 0.9% | 0.8%
27. European Journal of Cancer | 10 papers in training set | Top 0.5% | 0.8%
28. JCO Precision Oncology | 14 papers in training set | Top 0.4% | 0.8%
29. PLOS Computational Biology | 1633 papers in training set | Top 24% | 0.8%
30. Nature Cancer | 35 papers in training set | Top 1% | 0.8%