
AI-based radiomics for pancreatic cysts: high diagnostic performance amid a persistent translational gap

Lettner, J. D.; Evrenoglou, T.; Binder, H.; Fichtner-Feigl, S.; Neubauer, C.; Ruess, D. A.

2026-02-12 · Radiology and Imaging
medRxiv · doi: 10.64898/2026.02.10.26345995

Background
AI-based radiomics has demonstrated promising diagnostic performance for pancreatic cystic neoplasms, yet clinical translation remains limited. Whether this reflects insufficient model performance or structural limitations of the evidence base is unclear.

Methods
We performed a systematic review and diagnostic test accuracy meta-analysis of AI-based radiomics in pancreatic cysts (2015-2025), addressing two clinically relevant tasks (Q1: cyst-type differentiation; Q2: prediction of malignancy or high-grade dysplasia). Training and validation datasets were synthesized independently using hierarchical models. Study evaluation extended beyond diagnostic performance to a four-dimensional framework integrating RQS 2.0, METRICS, TRIPOD+AI, and PROBAST+AI, explicitly contrasting pooled diagnostic performance with reporting quality, methodological rigor, and risk of bias. The review was pre-registered (PROSPERO) and conducted according to PRISMA 2020.

Results
Twenty-nine studies were included (Q1: n = 15; Q2: n = 14), predominantly retrospective and single-center. For Q1, training-based analyses showed high apparent diagnostic performance (pooled sensitivity/specificity: 0.89 [95% CI, 0.85-0.92] / 0.90 [0.85-0.93]), but with substantial heterogeneity (τ² = 0.56/0.78; ρ = 0.38). Validation-based performance remained high (0.86 [0.82-0.89] / 0.88 [0.81-0.93]), while heterogeneity persisted and prediction regions exceeded confidence regions. For Q2, training-based analyses demonstrated similarly high apparent performance (0.88 [0.79-0.95] / 0.89 [0.81-0.94]), with pronounced heterogeneity (τ² = 1.98/1.61; ρ = 0.63). Validation-based performance was slightly lower yet still clinically comparable (0.82 [0.75-0.89] / 0.86 [0.80-0.91]), and heterogeneity persisted (τ² = 0.71/0.43; ρ = 0.15). Across both tasks, high diagnostic accuracy occurred alongside incomplete reporting, limited validation, and an elevated risk of bias.

Conclusion
AI-based radiomics for pancreatic cysts has reached a structural performance plateau. Further improvements in diagnostic accuracy alone are insufficient for clinical translation and must be accompanied by a paradigm shift from performance-driven model development toward decision-anchored study designs, robust validation strategies, transparent reporting standards, and clinically integrated evaluation frameworks.

Summary
Although pancreatic cystic lesions are increasingly being detected, imaging-based decision-making remains limited, particularly for differentiating between cyst types and stratifying malignancy risk. In this PRISMA-compliant, PROSPERO-registered systematic review and diagnostic test accuracy meta-analysis, we evaluated AI-based radiomics for these two tasks and contextualized its performance using a four-dimensional framework incorporating diagnostic accuracy, reporting quality, risk of bias, and radiomics maturity. Across studies published between 2015 and 2025, pooled diagnostic performance was consistently high, with only modest declines from the training to the validation stage. Nevertheless, considerable between-study heterogeneity and limited transportability remained evident. Multidimensional evaluation indicated a systematic dissociation between reported performance and methodological robustness, characterized by incomplete reporting, restricted validation, and an elevated risk of bias. These limitations were consistent across both clinical questions and were not resolved by increasing model complexity. The findings of this meta-analysis suggest that the performance of AI-based radiomics for pancreatic cysts has plateaued. Progress toward clinical translation requires study designs anchored in decision-making processes, robust multi-center validation, and transparent, reproducible evaluation frameworks, rather than further optimization of model architecture alone.
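The observation that prediction regions exceed confidence regions can be illustrated with a back-of-the-envelope calculation on the logit scale. This is a sketch, not the authors' hierarchical model: the within-study standard error is back-derived from the reported Q1 training CI, and a simple normal random-effects model for sensitivity alone is assumed.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

# Reported Q1 training sensitivity: 0.89 (95% CI 0.85-0.92), tau^2 = 0.56.
mu = logit(0.89)
se = (logit(0.92) - logit(0.85)) / (2 * 1.96)  # SE implied by the CI width
tau2 = 0.56                                     # reported between-study variance

# 95% confidence interval: uncertainty about the pooled mean only.
ci = (expit(mu - 1.96 * se), expit(mu + 1.96 * se))

# Approximate 95% prediction interval: where a *new* study's sensitivity
# may land, so between-study variance tau^2 is added to the SE.
pi_sd = math.sqrt(se ** 2 + tau2)
pi = (expit(mu - 1.96 * pi_sd), expit(mu + 1.96 * pi_sd))

print(f"CI ~ [{ci[0]:.2f}, {ci[1]:.2f}]")  # recovers roughly [0.85, 0.92]
print(f"PI ~ [{pi[0]:.2f}, {pi[1]:.2f}]")  # far wider, roughly [0.64, 0.97]
```

Because τ² (0.56) dwarfs the squared standard error (~0.03), the prediction interval is dominated by between-study heterogeneity: a narrow pooled CI says little about how a model will perform at a new site.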

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|------|---------|------------------------|------------|-------------|
| 1 | European Radiology | 14 | Top 0.1% | 10.6% |
| 2 | The Lancet Digital Health | 25 | Top 0.1% | 10.2% |
| 3 | eBioMedicine | 130 | Top 0.1% | 8.5% |
| 4 | BMC Medicine | 163 | Top 0.3% | 8.5% |
| 5 | Scientific Reports | 3102 | Top 16% | 6.5% |
| 6 | Diagnostics | 48 | Top 0.2% | 6.4% |
| 7 | Journal for ImmunoTherapy of Cancer | 64 | Top 0.3% | 3.6% |
| 8 | PLOS ONE | 4510 | Top 38% | 3.6% |
| 9 | Nature Communications | 4913 | Top 39% | 3.6% |
| 10 | Frontiers in Oncology | 95 | Top 1% | 3.1% |
| 11 | npj Precision Oncology | 48 | Top 0.3% | 2.6% |
| 12 | JAMA Network Open | 127 | Top 2% | 1.5% |
| 13 | eLife | 5422 | Top 45% | 1.5% |
| 14 | Stroke: Vascular and Interventional Neurology | 13 | Top 0.3% | 1.2% |
| 15 | npj Digital Medicine | 97 | Top 3% | 1.2% |
| 16 | PLOS Medicine | 98 | Top 3% | 1.1% |
| 17 | Journal of Medical Imaging | 11 | Top 0.2% | 1.0% |
| 18 | Med | 38 | Top 0.6% | 0.9% |
| 19 | The American Journal of Pathology | 31 | Top 0.4% | 0.9% |
| 20 | BMC Medical Research Methodology | 43 | Top 1% | 0.9% |
| 21 | Annals of Translational Medicine | 17 | Top 1% | 0.9% |
| 22 | Heliyon | 146 | Top 5% | 0.9% |
| 23 | Cell Reports Methods | 141 | Top 4% | 0.8% |
| 24 | Computers in Biology and Medicine | 120 | Top 4% | 0.8% |
| 25 | Gut | 36 | Top 0.8% | 0.8% |
| 26 | BMJ Open | 554 | Top 13% | 0.7% |
| 27 | Frontiers in Medicine | 113 | Top 7% | 0.7% |
| 28 | PLOS Biology | 408 | Top 23% | 0.7% |
| 29 | Journal of Clinical Medicine | 91 | Top 7% | 0.7% |
| 30 | PLOS Digital Health | 91 | Top 3% | 0.7% |
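The "top 6 journals account for 50%" claim can be reproduced from the listed percentages. A minimal sketch (probabilities taken from the table above; the helper name is illustrative):

```python
# Probabilities (%) of the top-ranked journals, in descending order.
probs = [10.6, 10.2, 8.5, 8.5, 6.5, 6.4, 3.6, 3.6, 3.6, 3.1]

def smallest_k_covering(probs, threshold):
    """Smallest prefix of a descending-sorted list whose sum reaches threshold."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return k, total
    return len(probs), total

k, mass = smallest_k_covering(probs, 50.0)
print(k, round(mass, 1))  # -> 6 50.7: six journals cover 50.7% of the mass
```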