
Professionalism Pulse: Development and Validation of a Natural Language Processing Pipeline and Dashboard for Safety Culture Surveillance in NYC Health + Hospitals

Mangut, E.; Wallace, R.

2026-05-22 health informatics
10.64898/2026.05.19.26353620 medRxiv

Background: Professionalism and effective communication are foundational determinants of patient safety and quality of care. Unprofessional behaviors frequently serve as active precursors to adverse clinical events. However, proactive organizational surveillance is often hindered because incident feedback exists primarily as unstructured, free-text data. This study aimed to develop and validate a Natural Language Processing (NLP) pipeline and interactive dashboard to proactively monitor the "professionalism climate" within NYC Health + Hospitals, the largest municipal healthcare delivery system in the United States.

Methods: A high-fidelity synthetic dataset (N=400) was computationally generated to safely mirror historical incident logs across 11 acute facilities without using Protected Health Information (PHI). A rule-based NLP pipeline was developed in R using the tidytext package. Unstructured narrative feedback was tokenized and classified into three core domains: Respect, Safety, and Communication. To validate the pipeline's accuracy, a 25% stratified random sample (n=100) was evaluated against independent, blinded manual coding performed by two reviewers, with inter-rater reliability measured via Cohen's kappa. Finally, an interactive Tableau dashboard was developed to operationalize and visualize these metrics for ongoing surveillance.

Results: The NLP algorithm achieved an overall accuracy of 85.8% (95% CI: 79.0-92.6), with 81.2% sensitivity and 88.9% specificity. The highest domain-specific performance was observed in Communication (88.0% accuracy). Manual validation demonstrated strong inter-rater reliability (κ = 0.84). Operational analysis via the dashboard revealed that 61.8% of reports occurred during the Tour 2 shift (15:00 to 23:00), aligning with peak operational volume. Furthermore, Respect-related feedback was reported at a disproportionately high frequency during the Tour 3 shift (23:00 to 07:00), accounting for 50.7% of overnight feedback submissions.

Conclusion: Rule-based NLP successfully transforms qualitative healthcare feedback into structured, actionable intelligence with high specificity. Integrating this pipeline into operational dashboards transitions safety culture surveillance from a reactive, manual exercise to a proactive, scalable system, enabling targeted, data-driven interventions by hospital leadership.
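
The two mechanisms named in the Methods, rule-based domain classification of tokenized text and Cohen's kappa for two reviewers, can be illustrated with a minimal Python sketch. This is not the authors' R/tidytext pipeline: the keyword lexicon, the tie-breaking rule, and the function names `tokenize`, `classify`, and `cohens_kappa` are all assumptions made for illustration, since the abstract does not give the actual rules.

```python
import re
from collections import Counter

# Hypothetical keyword lexicon (assumption): the study's real rules are not
# given in the abstract. Only the three domain names come from the source.
DOMAIN_KEYWORDS = {
    "Respect": {"rude", "dismissive", "disrespectful", "yelled"},
    "Safety": {"unsafe", "hazard", "fall", "error"},
    "Communication": {"handoff", "unclear", "miscommunication", "unanswered"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text):
    """Return the domain with the most keyword hits, or None if nothing matches.

    Ties go to the domain listed first in DOMAIN_KEYWORDS (an arbitrary
    choice made for this sketch).
    """
    tokens = tokenize(text)
    hits = {
        domain: sum(token in keywords for token in tokens)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

With this lexicon, `classify("The nurse was rude and dismissive to the family")` returns `"Respect"`, and a narrative containing none of the keywords returns `None`. A real lexicon would need stemming, negation handling, and multi-label support, none of which is attempted here.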

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association | 61 papers in training set | Top 0.2% | 14.0%
2. Journal of Medical Internet Research | 85 papers in training set | Top 0.3% | 12.0%
3. JAMIA Open | 37 papers in training set | Top 0.1% | 9.8%
4. npj Digital Medicine | 97 papers in training set | Top 0.7% | 7.0%
5. JMIR Medical Informatics | 17 papers in training set | Top 0.1% | 6.6%
6. BMC Medical Informatics and Decision Making | 39 papers in training set | Top 0.6% | 4.7%

50% of probability mass above

7. BMJ Health & Care Informatics | 13 papers in training set | Top 0.1% | 4.7%
8. International Journal of Medical Informatics | 25 papers in training set | Top 0.3% | 4.1%
9. Frontiers in Digital Health | 20 papers in training set | Top 0.3% | 3.5%
10. Journal of Biomedical Informatics | 45 papers in training set | Top 0.5% | 3.5%
11. Scientific Reports | 3102 papers in training set | Top 44% | 2.7%
12. JMIR Public Health and Surveillance | 45 papers in training set | Top 1% | 2.3%
13. Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 3% | 1.7%
14. Journal of General Internal Medicine | 20 papers in training set | Top 0.5% | 1.7%
15. Healthcare | 16 papers in training set | Top 0.7% | 1.7%
16. The Lancet Digital Health | 25 papers in training set | Top 0.5% | 1.6%
17. PLOS ONE | 4510 papers in training set | Top 57% | 1.4%
18. BMJ Open | 554 papers in training set | Top 11% | 1.2%
19. JMIR Formative Research | 32 papers in training set | Top 1% | 0.9%
20. BMJ Open Quality | 15 papers in training set | Top 0.8% | 0.8%
21. CMAJ Open | 12 papers in training set | Top 0.2% | 0.8%
22. Heliyon | 146 papers in training set | Top 7% | 0.7%
23. IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 2% | 0.7%
24. DIGITAL HEALTH | 12 papers in training set | Top 0.8% | 0.6%
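
The 50% cutoff stated for this list can be checked by accumulating the predicted probabilities in rank order. The sketch below uses the percentages listed above; the function name `rank_reaching` is hypothetical, and it is assumed that the cutoff is the first rank at which the running total reaches the threshold.

```python
# Predicted probabilities (percent) for ranks 1-24, copied from the list above.
PROBABILITIES = [
    14.0, 12.0, 9.8, 7.0, 6.6, 4.7, 4.7, 4.1, 3.5, 3.5, 2.7, 2.3,
    1.7, 1.7, 1.7, 1.6, 1.4, 1.2, 0.9, 0.8, 0.8, 0.7, 0.7, 0.6,
]


def rank_reaching(probabilities, threshold):
    """Return the 1-based rank at which the running total first reaches threshold.

    Returns None if the total never reaches the threshold.
    """
    running_total = 0.0
    for rank, probability in enumerate(probabilities, start=1):
        running_total += probability
        if running_total >= threshold:
            return rank
    return None


cutoff_rank = rank_reaching(PROBABILITIES, 50.0)
```

The running total is about 49.4% after five journals and about 54.1% after six, so the threshold is first crossed at rank 6.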