Generation of Synthetic Data in Health Surveys Using Large Language Models

2026-01-30 · health informatics · Title + abstract only
View on medRxiv

Background: Generating synthetic data using artificial intelligence, such as large language models (LLMs), is a useful strategy in public health because it can reduce time and costs, expand access to data, and facilitate information sharing without compromising confidentiality. Objective: To evaluate the consistency and psychometric plausibility of synthetic data generated by an LLM to simulate the responses of survey participants (user personas) in a national health survey in Peru. Methods: We cond...

Predicted journal destinations

| Rank | Journal | Training papers | Percentile | Probability |
|---:|---|---:|---|---:|
| 1 | PLOS ONE | 1737 | Top 48% | 14.0% |
| 2 | Scientific Reports | 701 | Top 14% | 11.3% |
| 3 | Journal of the American Medical Informatics Association | 53 | Top 3% | 7.5% |
| 4 | BMJ Open | 553 | Top 27% | 6.2% |
| 5 | JAMIA Open | 35 | Top 2% | 6.2% |
| 6 | Journal of Biomedical Informatics | 37 | Top 2% | 5.7% |
| 7 | Journal of Medical Internet Research | 81 | Top 3% | 5.7% |
| 8 | BMC Medical Research Methodology | 41 | Top 0.3% | 4.8% |
| 9 | PLOS Digital Health | 88 | Top 5% | 4.8% |
| 10 | BMC Medical Informatics and Decision Making | 36 | Top 3% | 4.8% |
| 11 | npj Digital Medicine | 85 | Top 7% | 3.7% |
| 12 | JMIR Medical Informatics | 16 | Top 3% | 1.9% |
| 13 | Nature Communications | 483 | Top 39% | 1.9% |
| 14 | International Journal of Medical Informatics | 25 | Top 4% | 1.9% |
| 15 | Proceedings of the National Academy of Sciences | 100 | Top 9% | 1.5% |
| 16 | JMIR Public Health and Surveillance | 45 | Top 3% | 1.3% |
| 17 | Frontiers in Public Health | 135 | Top 24% | 1.3% |
| 18 | JMIR Formative Research | 31 | Top 4% | 1.1% |
| 19 | Frontiers in Digital Health | 18 | Top 3% | 0.9% |
| 20 | BMC Medicine | 155 | Top 37% | 0.7% |
| 21 | International Journal of Environmental Research and Public Health | 116 | Top 34% | 0.7% |
| 22 | JAMA Network Open | 125 | Top 32% | 0.7% |
| 23 | PLOS Computational Biology | 141 | Top 15% | 0.7% |
| 24 | Patterns | 15 | Top 4% | 0.7% |
| 25 | Communications Medicine | 63 | Top 9% | 0.7% |