A case report on gendered biases in a Finnish healthcare AI assistant

Luisto, R.; Snell, K.; Vartiainen, V.; Sanmark, E.; Äyrämö, S.

2026-04-14 health informatics
10.64898/2026.04.09.26350383 medRxiv

In this study, we investigate gender bias in a Retrieval-Augmented Generation (RAG) based AI assistant developed for Finnish wellbeing services counties. We tested the system using 36 clinically relevant queries, each rendered in three gendered variants (male, female, gender-neutral), and evaluated responses using both an LLM-as-a-judge approach and a human expert panel consisting of a physician and a sociologist specializing in ethics. We observed substantial and clinically significant differences across gendered variants, including differential treatment urgency, inappropriate symptom associations, and misidentification of clinical context. Female variants disproportionately framed responses around childcare and reproductive health regardless of clinical relevance, reflecting societal stereotypes rather than medical reasoning. Bias manifested both at the LLM generation stage and the RAG retrieval stage, in several cases causing the model to hallucinate responses entirely. Some bias patterns were persistent across repeated runs, while others appeared inconsistently, highlighting the challenge of distinguishing systematic bias from stochastic variation.
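The query design described in the abstract (each query rendered in male, female, and gender-neutral variants) can be sketched roughly as follows. This is only an illustration: the templates, the subject phrases, and the `build_variants` helper are hypothetical and are not taken from the study, whose actual queries are not reproduced here.

```python
from itertools import product

# Hypothetical templates; the study's 36 queries are not reproduced here.
TEMPLATES = [
    "{subject} has had chest pain for two hours. What should be done?",
    "{subject} reports persistent fatigue. What could explain it?",
]

# Hypothetical subject phrases for the three gendered variants.
SUBJECTS = {
    "male": "A 45-year-old man",
    "female": "A 45-year-old woman",
    "neutral": "A 45-year-old patient",
}


def build_variants(templates, subjects):
    """Return one record per (template, variant) pair."""
    return [
        {"query_id": i, "variant": variant, "text": template.format(subject=phrase)}
        for (i, template), (variant, phrase) in product(
            enumerate(templates), subjects.items()
        )
    ]


variants = build_variants(TEMPLATES, SUBJECTS)
print(len(variants))  # 2 templates x 3 variants = 6
```

With 36 templates instead of the two shown, the same construction would yield 36 x 3 = 108 prompts. Because the variants of a query differ only in the subject phrase, their responses can be compared side by side.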

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. npj Digital Medicine: 97 papers in training set, Top 0.1%, 25.8%
2. Scientific Reports: 3102 papers in training set, Top 14%, 6.8%
3. Frontiers in Digital Health: 20 papers in training set, Top 0.1%, 6.3%
4. Philosophical Transactions of the Royal Society B: 51 papers in training set, Top 0.9%, 4.3%
5. Journal of the American Medical Informatics Association: 61 papers in training set, Top 0.7%, 3.6%
6. BMJ Health & Care Informatics: 13 papers in training set, Top 0.2%, 3.6%

50% of probability mass above

7. PLOS Digital Health: 91 papers in training set, Top 0.7%, 3.6%
8. PLOS ONE: 4510 papers in training set, Top 40%, 3.6%
9. JMIR Medical Informatics: 17 papers in training set, Top 0.4%, 3.2%
10. International Journal of Medical Informatics: 25 papers in training set, Top 0.5%, 2.7%
11. Journal of Personalized Medicine: 28 papers in training set, Top 0.1%, 2.6%
12. Computers in Biology and Medicine: 120 papers in training set, Top 1%, 2.3%
13. Journal of Biomedical Informatics: 45 papers in training set, Top 0.7%, 2.1%
14. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 0.9%, 1.9%
15. Frontiers in Public Health: 140 papers in training set, Top 5%, 1.7%
16. Cureus: 67 papers in training set, Top 3%, 1.7%
17. Journal of Medical Internet Research: 85 papers in training set, Top 3%, 1.7%
18. iScience: 1063 papers in training set, Top 16%, 1.7%
19. Nature Medicine: 117 papers in training set, Top 3%, 1.5%
20. Frontiers in Psychiatry: 83 papers in training set, Top 2%, 1.5%
21. Healthcare: 16 papers in training set, Top 0.8%, 1.5%
22. JCO Clinical Cancer Informatics: 18 papers in training set, Top 0.7%, 0.9%
23. Artificial Intelligence in Medicine: 15 papers in training set, Top 0.6%, 0.9%
24. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 2%, 0.9%
25. JAMIA Open: 37 papers in training set, Top 1%, 0.9%
26. Computer Methods and Programs in Biomedicine: 27 papers in training set, Top 1%, 0.7%
27. The Lancet Digital Health: 25 papers in training set, Top 1%, 0.7%
28. JMIR Formative Research: 32 papers in training set, Top 2%, 0.6%
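
The 50% cutoff stated above can be checked by accumulating the listed probabilities in rank order. The following is a minimal Python sketch. The function name `journals_to_reach` is illustrative, and the page does not say how the site itself computes the cutoff.

```python
from itertools import accumulate

# Predicted probabilities (percent) in rank order, copied from the list above.
probabilities = [
    25.8, 6.8, 6.3, 4.3, 3.6, 3.6, 3.6, 3.6, 3.2, 2.7,
    2.6, 2.3, 2.1, 1.9, 1.7, 1.7, 1.7, 1.7, 1.5, 1.5,
    1.5, 0.9, 0.9, 0.9, 0.9, 0.7, 0.7, 0.6,
]


def journals_to_reach(mass_threshold, probs):
    """Return the smallest number of top-ranked entries whose cumulative
    probability reaches mass_threshold, or None if it is never reached."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= mass_threshold:
            return count
    return None


print(journals_to_reach(50.0, probabilities))
```

The first five entries sum to 46.8% and the sixth brings the total to 50.4%, which matches the statement that the top 6 journals account for 50% of the probability mass.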