Evaluating Sycophancy in Frontier Models Using Persona-Driven Challenge
Hazare, N. S.; Goel, N.; Yu, C.; Agaron, S.; Sharma, A.; Parchure, P.; Patel, D.; Timsina, P.; Kaplan, B.; Lampert, J.; Vakil, A.; Kovatch, P.; Darrow, B.; Glicksberg, B. S.; Charney, A.; Nadkarni, G. N.; Sakhuja, A.
Large language models (LLMs) are increasingly used for lay health queries, yet they may abandon correct recommendations under pressure, a vulnerability termed sycophancy. We evaluated sycophancy across five frontier LLMs (Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.4, Grok 4.1, Gemini 3 Flash) using 200 synthetic clinical vignettes, each anchored to a unanimous correct treatment baseline and challenged by nine personas representing both vulnerable and authority roles. Overall, 7.1% of responses were sycophantic, varying more than tenfold across personas (1.7% to 19.3%) and roughly sixfold across LLMs (2.4% to 15.3%). Vulnerable personas elicited more sycophantic responses than authority personas, with the medical student persona showing the highest rate (19.3%). In adjusted generalized estimating equation (GEE) models, persona and LLM were both independent predictors of sycophantic responses, and vulnerable personas remained independent predictors, a reversal of the expected authority gradient. Persona-driven sycophancy evaluation should be integrated into pre-deployment safety assessment of clinical LLMs.
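The per-persona and per-LLM rates reported in the abstract are proportions of responses labeled sycophantic within each group. A minimal sketch of that tabulation follows, assuming one record per challenged response with a boolean label. The record layout, the function name `sycophancy_rates`, and the toy values are all hypothetical and are not taken from the study's data or code.

```python
from collections import defaultdict


def sycophancy_rates(records, key):
    """Return {group: fraction of sycophantic responses}, grouped by `key`."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for rec in records:
        group = rec[key]
        totals[group] += 1
        # True counts as 1, False as 0
        flagged[group] += int(rec["sycophantic"])
    return {group: flagged[group] / totals[group] for group in totals}


# Toy, made-up records for illustration only (not study data).
records = [
    {"vignette": 1, "persona": "persona_a", "model": "model_x", "sycophantic": True},
    {"vignette": 1, "persona": "persona_b", "model": "model_x", "sycophantic": False},
    {"vignette": 2, "persona": "persona_a", "model": "model_y", "sycophantic": False},
    {"vignette": 2, "persona": "persona_b", "model": "model_y", "sycophantic": False},
]

by_persona = sycophancy_rates(records, "persona")
by_model = sycophancy_rates(records, "model")
print(by_persona)
print(by_model)
```

This only reproduces unadjusted rates. The adjusted estimates in the abstract come from GEE models, which also account for repeated measurements on the same vignette and are not sketched here.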
Matching journals
The top 1 journal accounts for 50% of the predicted probability mass.