Delving into PubMed Records: Some Terms in Medical Writing Have Drastically Changed after the Arrival of ChatGPT

Matsui, K.

2024-05-16 health informatics
10.1101/2024.05.14.24307373 medRxiv

Introduction: It is estimated that large language models (LLMs), including ChatGPT, are already widely used in academic paper writing. This study aims to investigate whether the usage of specific terminology has increased, focusing on words and phrases frequently reported as overused by ChatGPT.

Methods: A list of 142 potentially AI-influenced terms was curated from online discussions and recent literature documenting LLM vocabulary patterns, while 84 common academic terms in the medical field were used as controls. PubMed records from 2000 to 2024 were analyzed to track the frequency of these terms. Usage trends were normalized using a modified Z-score transformation.

Results: Among the potentially AI-influenced terms, 100 displayed a meaningful increase (modified Z-score ≥ 3.5) in usage in 2024. The linear mixed-effects model showed a significant effect of potentially AI-influenced terms on usage frequency compared to common academic phrases (p < 0.001); the usage of potentially AI-influenced terms showed a noticeable increase starting in 2020.

Discussion: This study revealed that certain words, such as "delve," "underscore," "meticulous," "boast," and "commendable," have been used more frequently in medical and biological fields since the introduction of ChatGPT. The usage of these terms had already been increasing prior to ChatGPT's release, suggesting that ChatGPT accelerated the popularity of expressions already gaining traction. The identified terms can inform medical educators aiming to enhance awareness of language trends and promote best practices among trainees using LLMs.
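
The abstract mentions a modified Z-score with a cutoff of 3.5 but does not spell out the transformation. The sketch below assumes the commonly used median/MAD form, M_i = 0.6745 (x_i - median) / MAD; the study's actual modification may differ. The function name `modified_z_scores` and the yearly frequencies are hypothetical and purely illustrative, not data from the paper.

```python
from statistics import median


def modified_z_scores(values):
    """Median/MAD-based modified Z-scores (assumed form, not confirmed by the source)."""
    med = median(values)
    # Median absolute deviation from the median.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical per-year usage frequencies for one term (made-up numbers).
freq = [10.0, 11.0, 10.5, 12.0, 11.5, 12.5, 30.0]
scores = modified_z_scores(freq)

# Flag years at or above the 3.5 cutoff quoted in the abstract.
flagged = [s >= 3.5 for s in scores]
print(flagged)
```

Because the median and MAD are robust to a single extreme year, the spike in the last position stands out clearly instead of inflating the scale, as it would with a mean and standard deviation.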

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS ONE | 4510 papers in training set | Top 15% | 12.7%
2. Journal of Medical Internet Research | 85 papers in training set | Top 0.4% | 10.3%
3. Healthcare | 16 papers in training set | Top 0.1% | 9.3%
4. International Journal of Medical Informatics | 25 papers in training set | Top 0.2% | 6.5%
5. Scientific Reports | 3102 papers in training set | Top 17% | 6.4%
6. BMC Bioinformatics | 383 papers in training set | Top 3% | 3.1%
7. PLOS Digital Health | 91 papers in training set | Top 0.9% | 2.8%
50% of probability mass above
8. Biology Methods and Protocols | 53 papers in training set | Top 0.7% | 1.9%
9. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 1% | 1.9%
10. Journal of Personalized Medicine | 28 papers in training set | Top 0.3% | 1.8%
11. Computers in Biology and Medicine | 120 papers in training set | Top 2% | 1.8%
12. Bioinformatics | 1061 papers in training set | Top 7% | 1.7%
13. JAMIA Open | 37 papers in training set | Top 0.8% | 1.7%
14. Acta Neuropsychiatrica | 12 papers in training set | Top 0.4% | 1.7%
15. BMJ Health & Care Informatics | 13 papers in training set | Top 0.4% | 1.7%
16. Artificial Intelligence in Medicine | 15 papers in training set | Top 0.4% | 1.5%
17. Frontiers in Digital Health | 20 papers in training set | Top 0.9% | 1.2%
18. PeerJ | 261 papers in training set | Top 11% | 1.1%
19. Medicine | 30 papers in training set | Top 2% | 1.0%
20. European Respiratory Journal | 54 papers in training set | Top 2% | 0.8%
21. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 8% | 0.8%
22. JMIR Medical Informatics | 17 papers in training set | Top 1% | 0.8%
23. JAMA | 17 papers in training set | Top 0.3% | 0.8%
24. DIGITAL HEALTH | 12 papers in training set | Top 0.7% | 0.8%
25. Orphanet Journal of Rare Diseases | 18 papers in training set | Top 0.7% | 0.8%
26. Heliyon | 146 papers in training set | Top 6% | 0.8%
27. The Lancet Digital Health | 25 papers in training set | Top 1% | 0.7%
28. Cancer Medicine | 24 papers in training set | Top 1% | 0.7%
29. Journal of Clinical Medicine | 91 papers in training set | Top 7% | 0.7%
30. BMC Research Notes | 29 papers in training set | Top 0.6% | 0.7%
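
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked against the listed percentages with a running sum. This is only a sketch: the helper name `journals_to_reach` is hypothetical, the listed values are rounded, and how the page itself computes the cutoff is an assumption.

```python
def journals_to_reach(probabilities, target=50.0):
    """Return (rank, cumulative %) at which the running sum first reaches target."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= target:
            return rank, total
    return None  # target never reached with the values given


# Predicted probabilities (%) for ranks 1 to 8, copied from the list above.
top_probs = [12.7, 10.3, 9.3, 6.5, 6.4, 3.1, 2.8, 1.9]

rank, total = journals_to_reach(top_probs)
print(rank, round(total, 1))
```

With the rounded values shown, the running sum is 48.3% after six journals and crosses 50% at the seventh, which agrees with the marker placed after rank 7.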