Recovering Clinical Detail in AI-Generated Responses for Low Back Pain Through Prompt Design
Basharat, A.; Hamza, O.; Rana, P.; Odonkor, C. A.; Chow, R.
Introduction: Large language models are increasingly being used in healthcare. In interventional pain medicine, clinical reasoning is essential for procedural planning. Prior studies show that simplified prompts reduce clinical detail in AI-generated responses. It remains unclear whether this reflects knowledge loss or simply prompt-driven suppression of information.

Methods: We performed a controlled comparative study using 15 standardized low back pain questions representing common interventional pain topics. Each question was submitted to ChatGPT under three conditions: a professional-level prompt (DP), a fourth-grade reading-level prompt (D4), and clinician-directed rewriting of the D4 response to a medical level (U4→MD). No follow-up prompting was allowed. Three physicians independently rated responses for accuracy on a 0-2 ordinal scale. Clinical completeness was determined by consensus. Word count and Flesch-Kincaid Grade Level (FKGL) were also measured. Paired t-tests compared conditions.

Results: Accuracy was highest with professional prompting (1.76) and declined with the fourth-grade prompt (1.33; p = 0.00086). When simplified responses were rewritten for clinicians, accuracy returned to baseline (1.76; p ≈ 1.00 vs DP). Clinical completeness followed the same pattern: DP 80.0%, D4 6.7%, U4→MD 73.3%. Fourth-grade responses were shorter and less complex. Upscaled responses were more complex and similar in length to professional responses. Inter-rater reliability was low (Fleiss κ = 0.17), but trends were consistent across conditions.

Conclusions: Reduced clinical detail under simplified prompts appears to reflect constrained output rather than loss of knowledge. Clinician-directed reframing restores omitted content. LLM performance in interventional pain depends strongly on prompt design and intended audience.
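The paired comparison described in the Methods can be sketched as follows. This is a minimal illustration of a paired t statistic computed over matched per-question accuracy scores; the scores below are hypothetical placeholders, not the study's data, and the function name `paired_t` is our own.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic for two matched lists of per-question scores.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the pairwise differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical accuracy ratings on the 0-2 ordinal scale for 15 questions
# under two prompt conditions (illustrative values only).
dp = [2, 2, 1, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2]  # professional prompt
d4 = [1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1]  # fourth-grade prompt

t = paired_t(dp, d4)  # positive t indicates higher scores under DP
```

With 14 degrees of freedom, the t statistic would then be compared against a t distribution to obtain the p-value; in practice a library routine such as SciPy's paired t-test would handle this in one call.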