CT-Based Deep Foundation Model for Predicting Immune Checkpoint Inhibitor-Induced Pneumonitis Risk in Lung Cancer
Muneer, A.; Showkatian, E.; Kitsel, Y.; Saad, M. B.; Sujit, S. J.; Soto, F.; Shroff, G. S.; Faiz, S. A.; Ghanbar, M. I.; Ismail, S. M.; Vokes, N. I.; Cascone, T.; Le, X.; Zhang, J.; Byers, L. A.; Jaffray, D.; Chang, J. Y.; Liao, Z.; Naing, A.; Gibbons, D. L.; Vaporciyan, A. A.; Heymach, J. V.; Suresh, K. S.; Altan, M.; Sheshadri, A.; Wu, J.
Show abstract
Background: Immune checkpoint inhibitors (ICIs) have revolutionized cancer therapy but can cause serious immune-related adverse events (irAEs), with pneumonitis (ICI-P) being among the most severe. Early identification of high-risk patients before ICI initiation is critical for closer monitoring, timely intervention, and improved outcomes. Purpose: To develop and validate a deep learning foundation model to predict ICI-P from baseline CT scans in patients with lung cancer. Methods: We designed the Checkpoint-Inhibitor Pneumonitis Hazard EstimatoR (CIPHER), a deep learning foundation model that combines contrastive learning with a transformer-based masked autoencoder to predict ICI-P from baseline CT scans in patients with lung cancer. Using self-supervised learning, CIPHER was pre-trained on 590,284 CT slices from 2,500 non-small cell lung cancer (NSCLC) patients to capture heterogeneous lung parenchymal patterns. After pre-training, the model was fine-tuned on an internal NSCLC cohort for ICI-P risk prediction, using images from 254 patients for model development and 93 patients for internal validation. We compared CIPHER with classical radiomic models and further evaluated it on an external NSCLC cohort of 116 patients. Results: In the internal immunotherapy cohort, CIPHER consistently distinguished patients at elevated risk of ICI-P from those without the event, with AUCs ranging from 0.77 to 0.85. In head-to-head benchmarking, CIPHER achieved an AUC of 0.83, outperforming the radiomic models. In the external validation cohort, CIPHER maintained strong performance (AUC = 0.83; balanced accuracy = 81.7%), exceeding the radiomic models (DeLong p = 0.0318) and demonstrating higher specificity without sacrificing sensitivity. By contrast, the radiomic model showed high sensitivity (85.0%) but markedly lower specificity (45.8%). Confusion matrix analysis confirmed the robust classification performance of CIPHER, correctly identifying 80 of 96 non-ICI-P cases and 16 of 20 ICI-P cases. Conclusions: We developed and externally validated CIPHER for predicting future risk of ICI-P from pre-treatment CT scans. With prospective validation, CIPHER may be incorporated into routine patient management to improve outcomes.
Matching journals
The top 11 journals account for 50% of the predicted probability mass.