UPhAIR: A Hybrid Pipeline for Generating Understandable Post-hoc AI Reports in Glioma IDH Mutation Status Prediction

Gorji, A.; Shahverdi, H.; Saberi, A.; Gheiji, B.; Farahani, S.; Azemi, G.; Di Ieva, A.

2026-05-08 health informatics
10.64898/2026.05.01.26349658 medRxiv
Clinical adoption of machine learning (ML) in medical imaging is limited by a lack of interpretability. To address this, we present understandable post-hoc artificial intelligence reports (UPhAIR), a pipeline designed to generate transparent, evidence-based explanations by combining Shapley additive explanations (SHAP) analysis with retrieval-augmented generation (RAG) and large language models (LLMs). We trained 12 classifiers to predict isocitrate dehydrogenase (IDH) mutation status in glioma using radiomics and clinical features. SHAP values were used to identify the key contributors to each prediction. Domain literature was collected from three sources and indexed within a RAG framework. Relevant papers were retrieved using Facebook AI Similarity Search (FAISS) vector search and provided to Google Gemini 2.5 Pro to generate concise, reference-supported explanations for each feature. The best-performing model, an extreme gradient boosting (XGBoost) classifier, achieved an AUC of 0.90 ± 0.02 in 5-fold cross-validation and a hold-out test AUC of 0.86. In a case study of a single patient excluded from training, the model correctly predicted IDH-wildtype glioma, and SHAP identified MGMT status, age, and three radiomic features as the most influential features. UPhAIR produced a structured report combining SHAP visualizations with LLM-generated summaries grounded in scientific evidence. UPhAIR provides a practical, model-agnostic framework that enhances ML interpretability in clinical settings, helping bridge the gap between black-box AI and real-world medical decision-making.
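The flow described in the abstract (rank features by SHAP contribution, retrieve supporting literature by vector similarity, then assemble a prompt for the LLM) can be illustrated with a minimal sketch. This is not the authors' implementation. The feature names, SHAP values, embeddings, and prompt wording are hypothetical, and a brute-force NumPy cosine search stands in for the FAISS index. The Gemini call is omitted.

```python
import numpy as np

def top_k_features(shap_values, feature_names, k=5):
    """Return the k features with the largest absolute SHAP contribution."""
    order = np.argsort(-np.abs(shap_values))[:k]
    return [(feature_names[i], float(shap_values[i])) for i in order]

def retrieve(query_vec, doc_vecs, k=2):
    """Brute-force cosine-similarity search (a stand-in for a FAISS index)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:k].tolist()

def build_prompt(feature, shap_value, passages):
    """Assemble a hypothetical prompt asking for a referenced explanation."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"Feature: {feature} (SHAP value {shap_value:+.2f})\n"
        f"Evidence:\n{context}\n"
        "Explain this feature's role in IDH status prediction, citing the evidence."
    )

# Hypothetical per-patient SHAP values (illustrative numbers only).
feature_names = ["MGMT_status", "age", "radiomic_a", "radiomic_b", "sex"]
shap_values = np.array([-0.42, -0.31, 0.12, -0.05, 0.01])
top = top_k_features(shap_values, feature_names, k=3)

# Hypothetical 3-dimensional embeddings for three indexed passages.
passages = ["passage about MGMT", "passage about age", "passage about both"]
doc_vecs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]])
query_vec = np.array([0.9, 0.1, 0.0])
hits = retrieve(query_vec, doc_vecs, k=2)

prompt = build_prompt(top[0][0], top[0][1], [passages[i] for i in hits])
print(prompt)
```

In a real pipeline the SHAP values would come from an explainer fitted to the trained classifier, and the embeddings from a text-embedding model. The sketch only shows how the three stages connect.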

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1 | npj Digital Medicine | 97 papers in training set | Top 0.1% | 33.5%
2 | JCO Clinical Cancer Informatics | 18 papers in training set | Top 0.1% | 8.5%
3 | Scientific Reports | 3102 papers in training set | Top 26% | 4.4%
4 | Computers in Biology and Medicine | 120 papers in training set | Top 0.8% | 3.6%
50% of probability mass above
5 | Nature Communications | 4913 papers in training set | Top 46% | 2.4%
6 | Nature Machine Intelligence | 61 papers in training set | Top 1% | 2.1%
7 | BMC Medical Informatics and Decision Making | 39 papers in training set | Top 1% | 2.1%
8 | Artificial Intelligence in Medicine | 15 papers in training set | Top 0.2% | 2.1%
9 | Patterns | 70 papers in training set | Top 0.6% | 1.9%
10 | Journal of Biomedical Informatics | 45 papers in training set | Top 0.8% | 1.7%
11 | PLOS ONE | 4510 papers in training set | Top 52% | 1.7%
12 | PLOS Computational Biology | 1633 papers in training set | Top 16% | 1.7%
13 | The Lancet Digital Health | 25 papers in training set | Top 0.6% | 1.4%
14 | Bioinformatics | 1061 papers in training set | Top 8% | 1.4%
15 | Medical Image Analysis | 33 papers in training set | Top 0.7% | 1.4%
16 | iScience | 1063 papers in training set | Top 21% | 1.2%
17 | eBioMedicine | 130 papers in training set | Top 2% | 1.1%
18 | Advanced Science | 249 papers in training set | Top 15% | 1.0%
19 | NeuroImage: Clinical | 132 papers in training set | Top 3% | 1.0%
20 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.9%
21 | Nature Biomedical Engineering | 42 papers in training set | Top 2% | 0.8%
22 | BMJ Health & Care Informatics | 13 papers in training set | Top 0.8% | 0.8%
23 | Med | 38 papers in training set | Top 0.8% | 0.8%
24 | Biology Methods and Protocols | 53 papers in training set | Top 3% | 0.7%
25 | PLOS Digital Health | 91 papers in training set | Top 3% | 0.7%
26 | IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 2% | 0.7%
27 | International Journal of Medical Informatics | 25 papers in training set | Top 2% | 0.7%
28 | Communications Medicine | 85 papers in training set | Top 1% | 0.7%
29 | Nature Medicine | 117 papers in training set | Top 6% | 0.7%
30 | Scientific Data | 174 papers in training set | Top 3% | 0.7%