
End-to-End PET/CT Interpretation and Quantification with an LLM-Orchestrated AI Agent: A Real-World Pilot Study

Choi, H.; Bae, S.; Na, K. J.

medRxiv preprint · 2026-02-25 · Radiology and Imaging
DOI: 10.64898/2026.02.21.26346798

Background: Although deep learning models have improved individual PET analysis, image-processing, and quantification tasks, end-to-end automation from raw DICOM to quantitative clinical reporting remains limited, particularly in heterogeneous real-world settings.

Methods: As a proof of concept, an autonomous large language model (LLM)-orchestrated multi-tool agent for end-to-end PET/CT interpretation was developed. A reasoning-based text LLM selected appropriate series from raw DICOM, coordinated registration and SUV conversion, invoked segmentation and detection tools, generated maximum-intensity projections, called a vision-enabled LLM for interpretation, and synthesized structured draft reports. The system was retrospectively evaluated in 170 patients undergoing baseline FDG PET/CT for lung cancer staging, using expert reports as reference.

Results: The agent completed the full end-to-end workflow, from raw DICOM selection to structured draft report generation, without human intervention in all 170 examinations. Primary tumor detection achieved 100% sensitivity. For nodal involvement, sensitivity was 84.8% and specificity was 39.4%, whereas distant metastasis detection showed 70.2% sensitivity and 65.0% specificity. Discrepancy analysis of 58 nodal and 57 metastatic mismatch cases revealed systematic false-positive findings related to reactive or physiologic uptake, and false-negative findings involving small-volume or anatomically atypical metastases.

Conclusion: LLM-orchestrated PET/CT agents can enable workflow-level automation from raw DICOM to quantification and structured draft reporting under real-world conditions. Although primary tumor detection was highly reliable, nodal and metastatic assessment revealed systematic limitations, supporting a collaborative role with continued expert oversight in complex clinical scenarios.
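The SUV conversion step mentioned in the Methods is not detailed in the abstract. For context, the standard body-weight SUV calculation that such a pipeline would typically apply can be sketched as follows; the function name and the example values are illustrative, not taken from the paper:

```python
def suv_bw(voxel_bq_per_ml: float,
           injected_dose_bq: float,
           body_weight_kg: float,
           uptake_delay_s: float,
           half_life_s: float = 6586.2) -> float:
    """Body-weight SUV: tissue activity concentration normalized by the
    decay-corrected injected dose per gram of body weight.
    Default half-life is F-18 (~109.77 min = 6586.2 s)."""
    # Decay-correct the injected dose to the scan time
    decayed_dose_bq = injected_dose_bq * 0.5 ** (uptake_delay_s / half_life_s)
    # 1 kg of body weight ~ 1000 mL of tissue (unit-density assumption),
    # so SUV is dimensionless: (Bq/mL) * g / Bq
    return voxel_bq_per_ml * body_weight_kg * 1000.0 / decayed_dose_bq

# Illustrative values: 5 kBq/mL voxel, 370 MBq injected dose,
# 70 kg patient, 60 min uptake period
suv = suv_bw(5000.0, 370e6, 70.0, 3600.0)  # roughly 1.4
```

In a real DICOM pipeline the inputs (injected dose, weight, acquisition and injection times) would be read from the series headers, which is presumably part of what the agent's SUV-conversion tool automates.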

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank  Journal                                                     Papers in training set  Percentile  Probability
  1   European Journal of Nuclear Medicine and Molecular Imaging                    19    Top 0.1%       12.5%
  2   npj Digital Medicine                                                          97    Top 0.4%       12.5%
  3   Nature Communications                                                       4913    Top 21%         9.2%
  4   The Lancet Digital Health                                                     25    Top 0.1%        7.2%
  5   Scientific Reports                                                          3102    Top 18%         6.4%
  6   JCO Clinical Cancer Informatics                                               18    Top 0.1%        6.3%
      ── 50% of probability mass above ──
  7   European Radiology                                                            14    Top 0.1%        4.9%
  8   Medical Physics                                                               14    Top 0.2%        3.6%
  9   eBioMedicine                                                                 130    Top 0.3%        3.6%
 10   PLOS ONE                                                                    4510    Top 45%         2.6%
 11   Nature Machine Intelligence                                                   61    Top 1%          2.1%
 12   Diagnostics                                                                   48    Top 0.8%        1.9%
 13   Computers in Biology and Medicine                                            120    Top 2%          1.7%
 14   JAMA Network Open                                                            127    Top 2%          1.7%
 15   npj Precision Oncology                                                        48    Top 0.7%        1.5%
 16   Annals of Translational Medicine                                              17    Top 0.7%        1.5%
 17   Frontiers in Oncology                                                         95    Top 3%          1.1%
 18   PLOS Computational Biology                                                  1633    Top 21%         1.0%
 19   GigaScience                                                                  172    Top 2%          1.0%
 20   Artificial Intelligence in Medicine                                           15    Top 0.6%        0.9%
 21   IEEE Access                                                                   31    Top 0.8%        0.9%
 22   Patterns                                                                      70    Top 2%          0.9%
 23   Frontiers in Neuroinformatics                                                 38    Top 0.6%        0.9%
 24   Modern Pathology                                                              21    Top 0.4%        0.8%
 25   iScience                                                                    1063    Top 32%         0.7%
 26   International Journal of Radiation Oncology*Biology*Physics                   21    Top 0.4%        0.7%
 27   Nature Medicine                                                              117    Top 5%          0.7%