Comparing human vs. machine-assisted analysis to develop a new approach for Big Qualitative Data Analysis

Martin, S.; Beecham, E.; Kursumovic, E.; Armstrong, R.; Cook, T.; Deom, N.; Kane, A.; Moniz, S.; Soar, J.; Vindrola-Padros, C.

2024-07-17 public and global health
10.1101/2024.07.16.24310275 medRxiv

Background
Analysing large qualitative datasets can present significant challenges, including the time and resources required for manual analysis and the risk of missing nuanced insights. This paper addresses these challenges by exploring how Big Qualitative (Big Qual) and artificial intelligence (AI) methods can be applied to analyse Big Qual data efficiently while retaining the depth and complexity of human understanding. Free-text responses to the Royal College of Anaesthetists 7th National Audit Project (NAP7) baseline survey on peri-operative cardiac arrest experiences serve as a case study to test and validate this approach.

Methodology/Principal Findings
Quantitative analysis segmented the data and identified keywords using AI methods. In-depth sentiment and thematic analysis combined natural language processing (NLP) and machine learning (ML) with human input: researchers assigned topic/theme labels and sentiments to responses, while discourse analysis explored sub-topics and thematic diversity. Human annotation refined the machine-generated sentiments, leading to an additional "ambiguous" category that captures nuanced, mixed responses. Comparative analysis was used to evaluate the concordance between human and machine-assisted sentiment labelling. While ML substantially reduced analysis time, human input was crucial for refining sentiment categories and capturing nuance.

Conclusions/Significance
Combining AI-assisted data analysis tools with human expertise offers a powerful approach to analysing large-scale qualitative datasets efficiently while preserving the nuance and complexity of the data. This study demonstrates the potential of this novel methodology to streamline the analysis process, reduce resource requirements, and generate meaningful insights from Big Qual data.
The integration of NLP, ML, and human input allows for a more comprehensive understanding of the themes, sentiments, and experiences captured in free-text responses. This study underscores the importance of continued interdisciplinary collaboration among domain experts, data scientists, and AI specialists to optimise these methods and ensure their reliability, validity, and ethical application in real-world contexts.

Author Summary
The use of artificial intelligence (AI) in health research has grown in recent years. However, using AI to analyse large qualitative datasets in public health, known as Big Qualitative Data, is a relatively new area of research. Here, we use novel machine learning and natural language processing techniques, in which computers learn to handle and interpret human language, to analyse a large national survey. The Royal College of Anaesthetists 7th National Audit Project is a large UK-wide initiative examining peri-operative cardiac arrest. We use the free-text data from this survey to test and validate our novel methods, comparing analysis of the data by hand (human) with human-machine learning, also known as machine-assisted analysis. Using two AI tools to conduct the analysis, we found that machine-assisted analysis substantially reduced the time needed to analyse the dataset. Extra human input, however, was required to provide topic expertise and nuance. The AI tools classified sentiment as positive, negative or neutral, but human input introduced a fourth, ambiguous category. The insights gained from this approach show how AI can help inform targeted interventions and quality improvement initiatives to enhance patient safety, in this case in peri-operative cardiac arrest management.
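The abstract reports a comparative analysis of concordance between human and machine-assisted sentiment labels but does not name the statistic used. One common choice for two raters assigning categorical labels to the same items is Cohen's kappa, which corrects raw percentage agreement for the agreement expected by chance. The sketch below is a minimal illustration under that assumption, not the authors' method: the label lists are invented, and the machine side uses only positive/negative/neutral while the human side also uses "ambiguous", mirroring the categories described above. The function names are hypothetical.

```python
from collections import Counter


def percent_agreement(human, machine):
    """Share of responses where both raters assigned the same label."""
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def cohens_kappa(human, machine):
    """Cohen's kappa for two raters labelling the same items."""
    if len(human) != len(machine) or not human:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(human)
    p_observed = percent_agreement(human, machine)
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    # Chance agreement: for each category, multiply the two raters' marginal counts.
    categories = set(human) | set(machine)
    p_expected = sum(human_counts[c] * machine_counts[c] for c in categories) / (n * n)
    if p_expected == 1:
        # Both raters used one identical category throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical labels for six free-text responses (invented for illustration).
human = ["positive", "negative", "ambiguous", "neutral", "negative", "ambiguous"]
machine = ["positive", "negative", "negative", "neutral", "negative", "positive"]

print(f"agreement = {percent_agreement(human, machine):.3f}")  # 0.667
print(f"kappa     = {cohens_kappa(human, machine):.3f}")        # 0.556
```

Because the machine labels in this toy example never use "ambiguous", every response a human marks as ambiguous counts as a disagreement, which is one way an extra human-only category shows up in such a comparison.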

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Relative rank | Predicted probability
1 | PLOS ONE | 4510 | Top 19% | 10.1%
2 | BMJ Open | 554 | Top 2% | 10.1%
3 | Journal of Public Health | 23 | Top 0.1% | 8.4%
4 | BMC Medical Informatics and Decision Making | 39 | Top 0.3% | 7.2%
5 | Journal of Medical Internet Research | 85 | Top 0.7% | 6.4%
6 | PLOS Digital Health | 91 | Top 0.5% | 4.9%
7 | International Journal of Medical Informatics | 25 | Top 0.4% | 3.7%
(50% of probability mass above this line)
8 | Wellcome Open Research | 57 | Top 0.3% | 3.6%
9 | Scientific Reports | 3102 | Top 41% | 3.1%
10 | BMC Medicine | 163 | Top 2% | 2.4%
11 | Journal of the American Medical Informatics Association | 61 | Top 1% | 2.1%
12 | JMIRx Med | 31 | Top 0.4% | 2.1%
13 | Public Health in Practice | 11 | Top 0.1% | 1.9%
14 | Frontiers in Public Health | 140 | Top 4% | 1.9%
15 | BMJ Health & Care Informatics | 13 | Top 0.4% | 1.7%
16 | BMC Medical Research Methodology | 43 | Top 0.7% | 1.5%
17 | Frontiers in Digital Health | 20 | Top 0.7% | 1.5%
18 | BMC Research Notes | 29 | Top 0.2% | 1.3%
19 | npj Digital Medicine | 97 | Top 3% | 1.1%
20 | Health Expectations | 12 | Top 0.6% | 0.9%
21 | F1000Research | 79 | Top 3% | 0.9%
22 | Biology Methods and Protocols | 53 | Top 2% | 0.8%
23 | JMIR Formative Research | 32 | Top 2% | 0.8%
24 | BMJ Open Quality | 15 | Top 0.8% | 0.7%
25 | JMIR Public Health and Surveillance | 45 | Top 4% | 0.7%
26 | PLOS Global Public Health | 293 | Top 6% | 0.7%
27 | Bioengineering | 24 | Top 1% | 0.7%
28 | EClinicalMedicine | 21 | Top 1% | 0.7%
29 | BMJ Global Health | 98 | Top 3% | 0.7%
30 | JMIR Medical Informatics | 17 | Top 2% | 0.7%
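The statement that the top 7 journals account for 50% of the predicted probability mass is a cumulative sum over the ranked probabilities. A minimal sketch of that check, using the first eight predicted probabilities listed above; the function name `journals_to_reach` is hypothetical and not part of any matching tool's API:

```python
def journals_to_reach(probabilities, target):
    """Return (k, cumulative) for the smallest k whose cumulative sum reaches target."""
    cumulative = 0.0
    for k, p in enumerate(probabilities, start=1):
        cumulative += p
        if cumulative >= target:
            return k, cumulative
    return None  # target not reached by the listed values


# Predicted probabilities (%) for ranks 1-8, in ranked order, as listed above.
top_probabilities = [10.1, 10.1, 8.4, 7.2, 6.4, 4.9, 3.7, 3.6]

k, cumulative = journals_to_reach(top_probabilities, 50.0)
print(k, round(cumulative, 1))  # 7 50.8
```

The first six journals reach only 47.1%; adding the seventh brings the total to about 50.8%, consistent with the 50% figure quoted above.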