
Generative AI for Qualitative Analysis in a Maternal Health Study: Coding In-depth Interviews using Large Language Models (LLMs)

Qiao, S.; Fang, X.; Garrett, C.; Zhang, R.; Li, X.; Kang, Y.

2024-09-16 public and global health
10.1101/2024.09.16.24313707 medRxiv

Study Objectives: The coding of semi-structured interview transcripts is a critical step for thematic analysis of qualitative data. However, the coding process is often labor-intensive and time-consuming. The emergence of generative artificial intelligence (GenAI) presents new opportunities to enhance the efficiency of qualitative coding. This study proposed a computational pipeline using GenAI to automatically extract themes from interview transcripts.

Methods: Using transcripts from interviews conducted with maternity care providers in South Carolina, we leveraged ChatGPT for inductive coding to generate codes from interview transcripts without a predetermined coding scheme. Structured prompts were designed to instruct ChatGPT to generate and summarize codes. The performance of GenAI was evaluated by comparing the AI-generated codes with those generated manually.

Results: GenAI demonstrated promise in detecting and summarizing codes from interview transcripts. ChatGPT exhibited an overall accuracy exceeding 80% in inductive coding. More impressively, GenAI reduced the time required for coding by 81%.

Discussion: GenAI models are capable of efficiently processing language datasets and performing multi-level semantic identification. However, challenges such as inaccuracy, systematic biases, and privacy concerns must be acknowledged and addressed. Future research should focus on refining these models to enhance reliability and address inherent limitations associated with their application in qualitative research.

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | PLOS ONE | 4510 | Top 12% | 15.0%
2 | BMC Medical Informatics and Decision Making | 39 | Top 0.4% | 6.4%
3 | BMC Research Notes | 29 | Top 0.1% | 4.4%
4 | Journal of the American Medical Informatics Association | 61 | Top 0.6% | 4.4%
5 | PLOS Digital Health | 91 | Top 0.5% | 4.4%
6 | Scientific Reports | 3102 | Top 35% | 3.7%
7 | BMJ Open | 554 | Top 6% | 3.1%
8 | Frontiers in Digital Health | 20 | Top 0.3% | 3.1%
9 | Journal of Biomedical Informatics | 45 | Top 0.5% | 3.1%
10 | BMC Medical Research Methodology | 43 | Top 0.3% | 2.8%
(50% of probability mass above)
11 | Journal of Medical Internet Research | 85 | Top 2% | 2.8%
12 | International Journal of Medical Informatics | 25 | Top 0.5% | 2.8%
13 | Frontiers in Public Health | 140 | Top 3% | 2.7%
14 | Wellcome Open Research | 57 | Top 0.6% | 2.1%
15 | npj Digital Medicine | 97 | Top 2% | 1.9%
16 | JMIRx Med | 31 | Top 0.5% | 1.8%
17 | Epidemics | 104 | Top 0.9% | 1.7%
18 | Frontiers in Psychiatry | 83 | Top 2% | 1.7%
19 | Journal of Public Health | 23 | Top 0.5% | 1.4%
20 | F1000Research | 79 | Top 2% | 1.4%
21 | PLOS Computational Biology | 1633 | Top 19% | 1.2%
22 | Heliyon | 146 | Top 4% | 1.1%
23 | JMIR Public Health and Surveillance | 45 | Top 3% | 1.0%
24 | PLOS Global Public Health | 293 | Top 5% | 0.9%
25 | JMIR Formative Research | 32 | Top 1% | 0.9%
26 | Biology Methods and Protocols | 53 | Top 2% | 0.9%
27 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 2% | 0.8%
28 | American Journal of Infection Control | 12 | Top 0.3% | 0.8%
29 | Bioengineering | 24 | Top 1% | 0.8%
30 | IEEE Access | 31 | Top 0.8% | 0.8%
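
The statement that the top 10 journals account for 50% of the predicted probability mass can be checked with a running total over the listed probabilities. The following is a minimal sketch, not the site's own method: the function name `journals_needed` is hypothetical, and the values are the rounded percentages shown in the list, so the running total is only approximate.

```python
from itertools import accumulate

# Predicted probabilities (%) for the ten top-ranked journals,
# copied from the list above (rounded values as displayed).
top_probs = [15.0, 6.4, 4.4, 4.4, 4.4, 3.7, 3.1, 3.1, 3.1, 2.8]


def journals_needed(probs, threshold):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached.

    Hypothetical helper for illustration only.
    """
    for rank, total in enumerate(accumulate(probs), start=1):
        # Small tolerance guards against floating-point round-off.
        if total >= threshold - 1e-9:
            return rank
    return None


if __name__ == "__main__":
    print(journals_needed(top_probs, 50.0))
```

With the displayed values, the running total first reaches 50% at the tenth journal (about 47.6% after nine entries and about 50.4% after ten), which agrees with the cutoff marked in the list.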