
Large Language Models Readability Classification: A Variability Analysis of Sources and Metrics

Corrale de Matos, H. G.; Wasmann, J.-W. A.; Catalani Morata, T.; de Freitas Alvarenga, K.; Bornia Jacob, L. C.

2026-03-02 public and global health
10.64898/2026.02.20.26346638 medRxiv

Abstract

Accurate health information is ineffective if patients cannot understand it. Research on Large Language Models (LLMs) in health prioritizes factual accuracy, but linguistic accessibility remains an under-examined component of output quality and usability. This study investigated two sources of variability in readability classification: differences across LLM systems and across readability metrics. The analysis tested 1,120 data points from seven systems in English and Portuguese, comparing baseline responses with a Wikipedia-grounded condition. Content was assessed using five standard readability metrics that measure distinct aspects of text complexity. Systems were statistically homogeneous at baseline but became significantly heterogeneous under Wikipedia grounding, indicating that the same source-grounding (Retrieval-Augmented Generation) instruction produced different readability effects across systems. Significant metric variability was observed in all conditions, showing that readability metrics are not interchangeable. Although retrieval grounding is commonly used to improve accuracy, our findings reveal a trade-off: verified-source grounding can yield inconsistent readability. Evaluation protocols should therefore use transparent, vendor-agnostic criteria with metric-specific, language-aware thresholds, and should be reapplied whenever models or grounding configurations change, to support accessible cross-language health communication.
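The abstract does not name the five readability metrics used. As an illustration only, here is a minimal sketch of one widely used metric, the Flesch Reading Ease score, with a simple vowel-group heuristic for syllable counting (real implementations use dictionary-based syllabification and language-specific variants; this sketch is not the study's instrument):

```python
import re

def count_syllables(word: str) -> int:
    """Heuristic syllable count: number of consecutive-vowel groups, minimum 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text.

    FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (sum(count_syllables(w) for w in words) / len(words)))

# Short, plain sentences score high (easy); dense jargon scores low or negative.
print(flesch_reading_ease("The cat sat. The dog ran."))
print(flesch_reading_ease(
    "Linguistic accessibility remains an underexamined component of output quality."))
```

Metrics differ in which surface features they weight (sentence length, syllables, word familiarity), which is why, as the study finds, they are not interchangeable.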

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | npj Digital Medicine | 97 | Top 0.2% | 23.3%
2 | Journal of Medical Internet Research | 85 | Top 0.3% | 10.5%
3 | JMIRx Med | 31 | Top 0.1% | 6.6%
4 | PLOS Digital Health | 91 | Top 0.5% | 4.5%
5 | Scientific Reports | 3102 | Top 29% | 4.1%
6 | PLOS ONE | 4510 | Top 35% | 4.1%
(50% of probability mass above this line)
7 | Frontiers in Digital Health | 20 | Top 0.2% | 3.7%
8 | Journal of the American Medical Informatics Association | 61 | Top 0.7% | 3.7%
9 | BMJ Open | 554 | Top 9% | 1.7%
10 | JMIR Formative Research | 32 | Top 0.9% | 1.5%
11 | BMC Medical Research Methodology | 43 | Top 0.7% | 1.4%
12 | JAMA Network Open | 127 | Top 3% | 1.3%
13 | Journal of General Internal Medicine | 20 | Top 0.6% | 1.3%
14 | Research Synthesis Methods | 20 | Top 0.2% | 1.3%
15 | JMIR Medical Informatics | 17 | Top 1.0% | 1.3%
16 | Pharmacoepidemiology and Drug Safety | 13 | Top 0.3% | 1.0%
17 | BMC Medical Informatics and Decision Making | 39 | Top 2% | 1.0%
18 | Frontiers in Public Health | 140 | Top 6% | 1.0%
19 | IEEE Access | 31 | Top 0.7% | 0.9%
20 | DIGITAL HEALTH | 12 | Top 0.6% | 0.8%
21 | npj Genomic Medicine | 33 | Top 0.7% | 0.8%
22 | F1000Research | 79 | Top 4% | 0.8%
23 | JMIR Public Health and Surveillance | 45 | Top 3% | 0.8%
24 | Journal of Biomedical Informatics | 45 | Top 1% | 0.8%
25 | Database | 51 | Top 0.9% | 0.8%
26 | Cancer Medicine | 24 | Top 1% | 0.8%
27 | Healthcare | 16 | Top 2% | 0.7%
28 | Computer Methods and Programs in Biomedicine | 27 | Top 1% | 0.7%
29 | Nature Human Behaviour | 85 | Top 5% | 0.5%
30 | Frontiers in Medicine | 113 | Top 8% | 0.5%