
Recovering Clinical Detail in AI-Generated Responses for Low Back Pain Through Prompt Design

Basharat, A.; Hamza, O.; Rana, P.; Odonkor, C. A.; Chow, R.

2026-04-23 · pain medicine
medRxiv · DOI: 10.64898/2026.04.21.26351437
Abstract

Introduction: Large language models are increasingly used in healthcare. In interventional pain medicine, clinical reasoning is essential for procedural planning. Prior studies show that simplified prompts reduce clinical detail in AI-generated responses, but it remains unclear whether this reflects knowledge loss or simply prompt-driven suppression of information.

Methods: We performed a controlled comparative study using 15 standardized low back pain questions representing common queries in interventional pain. Each question was submitted to ChatGPT under three conditions: a professional-level prompt (DP), a fourth-grade reading-level prompt (D4), and clinician-directed rewriting of the D4 response to a medical level (U4→MD). No follow-up prompting was allowed. Three physicians independently rated responses for accuracy on a 0-2 ordinal scale; clinical completeness was determined by consensus. Word count and Flesch-Kincaid Grade Level (FKGL) were also measured, and paired t-tests compared conditions.

Results: Accuracy was highest with professional prompting (1.76) and declined with the fourth-grade prompt (1.33; p = 0.00086). When simplified responses were rewritten for clinicians, accuracy returned to baseline (1.76; p ≈ 1.00 vs DP). Clinical completeness followed the same pattern: DP 80.0%, D4 6.7%, U4→MD 73.3%. Fourth-grade responses were shorter and less complex, while upscaled responses were more complex and similar in length to professional responses. Inter-rater reliability was low (Fleiss κ = 0.17), but trends were consistent across conditions.

Conclusions: Reduced clinical detail under simplified prompts appears to reflect constrained output rather than loss of knowledge; clinician-directed reframing restores the omitted content. LLM performance in interventional pain depends strongly on prompt design and intended audience.
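The abstract describes the statistical workflow only at a high level (paired t-tests between prompt conditions, Fleiss κ for the three raters, FKGL and word count for readability). Purely as an illustration of that kind of analysis, a minimal Python sketch using SciPy, statsmodels, and textstat is shown below; all rating values and the sample response text are hypothetical placeholders, not the study's data.

```python
# Illustrative sketch only: hypothetical ratings, not the study's data.
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
import textstat

# Per-question accuracy ratings (0-2 scale) under two prompt conditions,
# one placeholder value per question (15 questions in the study design).
dp = np.array([2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2], dtype=float)
d4 = np.array([1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2], dtype=float)

# Paired t-test comparing professional-level (DP) vs fourth-grade (D4) prompts.
t_stat, p_value = ttest_rel(dp, d4)
print(f"DP vs D4: t = {t_stat:.2f}, p = {p_value:.4g}")

# Fleiss kappa across three raters: rows are items, columns are raters,
# entries are the ordinal accuracy category (0, 1, or 2) each rater assigned.
ratings = np.array([
    [2, 2, 1],
    [1, 1, 1],
    [2, 1, 2],
    [0, 1, 1],
])
counts, _ = aggregate_raters(ratings)  # item-by-category count table
print(f"Fleiss kappa = {fleiss_kappa(counts):.2f}")

# Readability and length of a generated response (placeholder text).
response = "Epidural steroid injections may reduce radicular leg pain."
print("FKGL:", textstat.flesch_kincaid_grade(response))
print("Word count:", len(response.split()))
```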

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. PLOS Digital Health: 22.9% (91 papers in training set, Top 0.1%)
2. Frontiers in Digital Health: 14.6% (20 papers in training set, Top 0.1%)
3. Scientific Reports: 6.5% (3102 papers in training set, Top 16%)
4. npj Digital Medicine: 4.9% (97 papers in training set, Top 0.9%)
5. BMJ Open Quality: 4.4% (15 papers in training set, Top 0.2%)

(50% of probability mass above this point)

6. PLOS ONE: 4.2% (4510 papers in training set, Top 34%)
7. Journal of Medical Internet Research: 3.7% (85 papers in training set, Top 1%)
8. JMIR Formative Research: 3.7% (32 papers in training set, Top 0.3%)
9. BMJ Open: 3.1% (554 papers in training set, Top 6%)
10. Clinical Pharmacology & Therapeutics: 2.8% (25 papers in training set, Top 0.2%)
11. British Journal of Anaesthesia: 2.1% (14 papers in training set, Top 0.3%)
12. Journal of the American Medical Informatics Association: 1.7% (61 papers in training set, Top 1%)
13. BMC Neurology: 1.7% (12 papers in training set, Top 0.4%)
14. BMC Medical Informatics and Decision Making: 1.5% (39 papers in training set, Top 2%)
15. Healthcare: 1.4% (16 papers in training set, Top 0.9%)
16. Journal of Neuroscience Methods: 1.4% (106 papers in training set, Top 1%)
17. Clinical and Translational Science: 1.2% (21 papers in training set, Top 0.6%)
18. Cancer Medicine: 0.8% (24 papers in training set, Top 1%)
19. Heliyon: 0.8% (146 papers in training set, Top 5%)
20. International Journal of Medical Informatics: 0.8% (25 papers in training set, Top 2%)
21. Journal of General Internal Medicine: 0.8% (20 papers in training set, Top 1%)
22. iScience: 0.7% (1063 papers in training set, Top 33%)
23. Frontiers in Human Neuroscience: 0.7% (67 papers in training set, Top 3%)
24. Cureus: 0.7% (67 papers in training set, Top 6%)
25. PLOS Biology: 0.5% (408 papers in training set, Top 24%)
26. Pain: 0.5% (70 papers in training set, Top 0.9%)
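The "50% of probability mass" divider in the list above marks where the cumulative sum of the predicted journal probabilities first reaches 0.5. A small sketch of that cutoff, using the probabilities listed above (the underlying prediction model is not shown, and the exact threshold rule used by the site is an assumption), could look like this:

```python
# Sketch of the cumulative-probability cutoff marked in the list above.
# Probabilities are the predicted values shown (top entries only).
predictions = [
    ("PLOS Digital Health", 0.229),
    ("Frontiers in Digital Health", 0.146),
    ("Scientific Reports", 0.065),
    ("npj Digital Medicine", 0.049),
    ("BMJ Open Quality", 0.044),
    ("PLOS ONE", 0.042),
]

cumulative = 0.0
for rank, (journal, prob) in enumerate(predictions, start=1):
    cumulative += prob
    print(f"{rank}. {journal}: {prob:.1%} (cumulative {cumulative:.1%})")
    if cumulative >= 0.5:
        print(f"Top {rank} journals cover 50% of the predicted probability mass.")
        break
```

With the listed values, the cumulative sum crosses 0.5 at rank 5 (0.229 + 0.146 + 0.065 + 0.049 + 0.044 ≈ 0.533), consistent with the "top 5 journals" statement above.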