Race, Ethnicity and Their Implication on Bias in Large Language Models

Hu, S.; Li, R.; Gao, Y.

2026-01-05 health informatics

10.64898/2026.01.04.26343415 medRxiv

Large language models (LLMs) increasingly operate in high-stakes settings including healthcare and medicine, where demographic attributes such as race and ethnicity may be explicitly stated or implicitly inferred from text. However, existing studies primarily document outcome-level disparities, offering limited insight into internal mechanisms underlying these effects. We present a mechanistic study of how race and ethnicity are represented and operationalized within LLMs. Using two publicly available datasets spanning toxicity-related generation and clinical narrative understanding tasks, we analyze three open-source models with a reproducible interpretability pipeline combining probing, neuron-level attribution, and targeted intervention. We find that demographic information is distributed across internal units with substantial cross-model variation. Although some units encode sensitive or stereotype-related associations from pretraining, identical demographic cues can induce qualitatively different behaviors. Interventions suppressing such neurons reduce bias but leave substantial residual effects, suggesting behavioral rather than representational change and motivating more systematic mitigation.
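The three stages named in the abstract (probing, neuron-level attribution, targeted intervention) can be illustrated with a toy sketch. This is not the authors' pipeline: the activations are synthetic, the probe is a plain least-squares linear classifier, attribution is approximated by probe-weight magnitude, and the intervention is zero-ablation. All function names, the unit counts, and the signal strengths are assumptions chosen for illustration only.

```python
import numpy as np


def make_activations(n=2000, d=32, seed=0):
    """Synthetic stand-in for hidden activations (assumption: not real model data)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)      # hypothetical binary attribute label
    signs = 2.0 * labels - 1.0
    acts = rng.normal(size=(n, d))
    acts[:, :3] += 2.0 * signs[:, None]      # three strongly informative units
    acts[:, 3:11] += 0.5 * signs[:, None]    # eight weakly informative units
    return acts, labels


def fit_probe(acts, labels):
    """Probing: least-squares linear probe with a bias term."""
    X = np.hstack([acts, np.ones((len(acts), 1))])
    w, *_ = np.linalg.lstsq(X, 2.0 * labels - 1.0, rcond=None)
    return w


def probe_accuracy(w, acts, labels):
    """Fraction of examples whose label the probe recovers."""
    X = np.hstack([acts, np.ones((len(acts), 1))])
    return float(((X @ w > 0).astype(int) == labels).mean())


def ablate(acts, units):
    """Intervention: zero out the selected units on a copy of the activations."""
    out = acts.copy()
    out[:, units] = 0.0
    return out


def run_sketch(k=3):
    acts, labels = make_activations()
    half = len(acts) // 2
    tr_x, te_x = acts[:half], acts[half:]
    tr_y, te_y = labels[:half], labels[half:]

    w = fit_probe(tr_x, tr_y)
    baseline = probe_accuracy(w, te_x, te_y)

    # Attribution (simplified): rank units by absolute probe weight, bias excluded.
    units = np.argsort(-np.abs(w[:-1]))[:k]

    # Refit a probe on ablated activations to measure what information remains.
    w_abl = fit_probe(ablate(tr_x, units), tr_y)
    residual = probe_accuracy(w_abl, ablate(te_x, units), te_y)
    return baseline, residual, sorted(int(u) for u in units)


if __name__ == "__main__":
    baseline, residual, units = run_sketch()
    print(f"ablated units: {units}")
    print(f"probe accuracy before: {baseline:.3f}, after: {residual:.3f}")
```

In this toy setting the attribute stays decodable above chance after the top-ranked units are removed, because the signal is spread over additional weaker units. That is only an analogy for the distributed-encoding and residual-effect findings described above, not a reproduction of them.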

Matching journals

● Non-profit ◐ University press ○ Commercial

The top 11 journals account for 50% of the predicted probability mass.

Nature Communications

○ 4913 papers in training set

Scientific Reports

○ 3102 papers in training set

○ 575 papers in training set

Nature Machine Intelligence

○ 61 papers in training set

◐ 147 papers in training set

Philosophical Transactions of the Royal Society B

● 51 papers in training set

Science Advances

● 1098 papers in training set

Nature Medicine

○ 117 papers in training set

● 5422 papers in training set

● 429 papers in training set

Proceedings of the National Academy of Sciences

● 2130 papers in training set

50% of probability mass above

Nature Biomedical Engineering

○ 42 papers in training set

npj Digital Medicine

○ 97 papers in training set

○ 167 papers in training set

Nature Human Behaviour

○ 85 papers in training set

PLOS Computational Biology

● 1633 papers in training set

◐ 189 papers in training set

Science Translational Medicine

● 111 papers in training set

Cell Reports Medicine

○ 140 papers in training set

○ 38 papers in training set

○ 282 papers in training set

○ 1063 papers in training set

● 4510 papers in training set

Communications Psychology

○ 20 papers in training set

Journal of the American Medical Informatics Association

◐ 61 papers in training set

Annals of Internal Medicine

● 27 papers in training set

Communications Biology

○ 886 papers in training set

Nature Neuroscience

○ 216 papers in training set

Advanced Science

○ 249 papers in training set

Communications Medicine

○ 85 papers in training set