
Distinguishing GPT-4-generated Radiology Abstracts from Original Abstracts: Performance of Blinded Human Observers and AI Content Detector

Ufuk, F.; Peker, H.; Sagtas, E.; Yagci, A. B.

2023-05-03 radiology and imaging
10.1101/2023.04.28.23289283

Objective: To determine GPT-4's effectiveness in writing scientific radiology article abstracts and to investigate the success of human reviewers and AI content detectors in distinguishing these abstracts from originals. Additionally, to determine the similarity scores of abstracts generated by GPT-4 to better understand its ability to create unique text.

Methods: The study collected 250 original articles published between 2021 and 2023 in five radiology journals. The articles were randomly selected, and their abstracts were generated by GPT-4 using a specific prompt. Three experienced academic radiologists independently evaluated the GPT-4-generated and original abstracts, classifying each as original or GPT-4-generated. All abstracts were also uploaded to an AI content detector and a plagiarism detector to calculate similarity scores. Statistical analysis was performed to determine discrimination performance and similarity scores.

Results: Of 134 GPT-4-generated abstracts, an average of 75 (56%) were detected by reviewers, and an average of 50 (43%) original abstracts were falsely categorized as GPT-4-generated by reviewers. The sensitivity, specificity, accuracy, PPV, and NPV of observers in distinguishing GPT-4-written abstracts ranged from 51.5% to 55.6%, 56.1% to 70%, 54.8% to 60.8%, 41.2% to 76.7%, and 47% to 62.7%, respectively. No significant difference was observed between observers in discrimination performance.

Conclusion: GPT-4 can generate convincing scientific radiology article abstracts. However, human reviewers and AI content detectors have difficulty distinguishing GPT-4-generated abstracts from original ones.
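For readers less familiar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, accuracy, PPV, and NPV follow from a 2x2 confusion matrix. It uses the averaged reviewer counts quoted in the abstract. The function name `diagnostic_metrics` is hypothetical, and the figure of 116 original abstracts is an assumption (250 minus 134), not a number stated in the abstract. Because the inputs are averages across reviewers, the pooled values it prints are illustrative and need not fall inside the per-observer ranges reported above.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return standard binary classification metrics as fractions.

    "Positive" here means "GPT-4-generated abstract".
    """
    return {
        "sensitivity": tp / (tp + fn),   # generated abstracts correctly flagged
        "specificity": tn / (tn + fp),   # originals correctly recognized
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "ppv": tp / (tp + fp),           # flagged abstracts that were generated
        "npv": tn / (tn + fn),           # unflagged abstracts that were original
    }


# Averaged reviewer counts from the abstract:
# 75 of 134 GPT-4 abstracts detected, 50 originals falsely flagged.
# Assumption: 250 - 134 = 116 original abstracts (not stated explicitly).
GENERATED, ORIGINAL = 134, 116
metrics = diagnostic_metrics(tp=75, fn=GENERATED - 75, fp=50, tn=ORIGINAL - 50)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```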

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. European Radiology: based on 11 papers, Top 0.1%, 21.4%
2. Scientific Reports: based on 701 papers, Top 19%, 8.2%
3. PLOS ONE: based on 1737 papers, Top 65%, 5.8%
4. Medicine: based on 29 papers, Top 0.9%, 5.5%
5. Annals of Translational Medicine: based on 14 papers, Top 0.3%, 5.1%
6. Diagnostics: based on 36 papers, Top 0.5%, 4.9%
50% of probability mass above
7. Cureus: based on 64 papers, Top 6%, 2.6%
8. Stroke: Vascular and Interventional Neurology: based on 12 papers, Top 0.9%, 2.5%
9. BMC Cancer: based on 21 papers, Top 2%, 2.4%
10. Journal of Clinical Medicine: based on 77 papers, Top 6%, 2.4%
11. Frontiers in Oncology: based on 34 papers, Top 4%, 2.0%
12. Heliyon: based on 57 papers, Top 4%, 2.0%
13. Journal of Magnetic Resonance Imaging: based on 10 papers, Top 1%, 1.9%
14. Informatics in Medicine Unlocked: based on 11 papers, Top 0.9%, 1.9%
15. Cancers: based on 57 papers, Top 5%, 1.7%
16. Neuro-Oncology Advances: based on 14 papers, Top 1%, 1.4%
17. Journal of the American Medical Informatics Association: based on 53 papers, Top 5%, 1.4%
18. Brain and Behavior: based on 19 papers, Top 3%, 1.3%
19. Radiotherapy and Oncology: based on 11 papers, Top 1%, 1.3%
20. Computers in Biology and Medicine: based on 39 papers, Top 5%, 1.3%
21. Archives of Clinical and Biomedical Research: based on 18 papers, Top 2%, 0.9%
22. PLOS Digital Health: based on 88 papers, Top 12%, 0.7%
23. BMJ Open: based on 553 papers, Top 51%, 0.7%
24. npj Digital Medicine: based on 85 papers, Top 13%, 0.7%
25. Frontiers in Neurology: based on 74 papers, Top 12%, 0.7%
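
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration: the helper name `journals_to_reach` is hypothetical, and the values are the first ten probabilities copied from the list above, not output from the site's model.

```python
from itertools import accumulate

# Predicted probabilities (%) for the ten top-ranked journals, copied from the list.
probabilities = [21.4, 8.2, 5.8, 5.5, 5.1, 4.9, 2.6, 2.5, 2.4, 2.4]


def journals_to_reach(mass, probs):
    """Return how many top-ranked entries are needed for the running sum to reach `mass`."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= mass:
            return count
    return None  # threshold not reached with the entries given


print(journals_to_reach(50.0, probabilities))
```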