Development of a natural language processing application to extract and categorize mentions of violence from mental healthcare records text

Li, L.; Sondh, S.; Sondh, H. K.; Stewart, R.; Roberts, A.

2026-03-26 health informatics
10.64898/2026.03.22.26348435 medRxiv

Background: Experiences of violence are reported frequently by mental health service users, victims of violence are at greater risk of mental health disorders, and violence may sometimes occur as a consequence of a mental disorder. Electronic health records (EHRs) are an important source of information about healthcare and its social context. Occurrences of violence are not routinely recorded as structured data in EHRs, but they are recorded in the free-text narrative.

Objective: Our objective was to address this gap by creating a natural language processing (NLP) application that extracts information on various forms of violence (physical (non-sexual), sexual, emotional, and financial) from the EHR of a large south London mental health service. We also aimed to extract features describing the patient's role (victimization vs. perpetration), timing (recent vs. historic), domestic context, presence (actual, threat, or unclear), and polarity (affirmed, abstract, or negated) of the violent behaviors.

Methods: After rigorous training using a predefined, approved codebook provided by senior professionals, two raters independently annotated 6,500 randomly selected segments of clinical notes from a large mental healthcare provider in south London. Each segment contained a violence-related keyword and was 400 characters long (approximately 200 characters before and after the keyword). We used 90% of the annotated data to fine-tune a multi-label BERT model (with 5-fold cross-validation) and reserved the remaining 10% for a blind test.

Results: On the blind test set, the model performed well for emotional violence (F1 = 0.89), financial violence (0.88), physical (non-sexual) violence (0.84), and unspecified violence (0.81), as well as for the patient's role (0.89 as perpetrator; 0.84 as victim), polarity (0.89 for affirmed behavior), presence (0.95 for actual violence), and domestic settings (0.88). We were unable to achieve satisfactory results in capturing temporal aspects (0.65 for past violence).

Conclusions: We were able to improve substantially on previously developed NLP for ascertaining violence in routine mental health records, providing novel opportunities for both surveillance and research.
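
The Methods describe 400-character segments centred on a violence-related keyword, split 90/10 into a development set (used with 5-fold cross-validation) and a blind test set. The Python sketch below illustrates both steps under stated assumptions: the keyword list, the prefix-matching regular expression, the function names `extract_segments` and `split_for_cv`, and the plain random split are all hypothetical and are not the authors' pipeline (the abstract does not say whether the split was stratified).

```python
import random
import re
from typing import List, Tuple

# Hypothetical keyword list for illustration only; the study's actual
# violence-related keyword set is not given in the abstract.
KEYWORDS = ["assault", "punch", "threaten"]


def extract_segments(text: str, keywords: List[str], half_width: int = 200) -> List[str]:
    """Return one window per keyword match, spanning up to `half_width`
    characters before and after the start of the match (about 400 in total).
    Windows are shorter when the match is near the start or end of the note."""
    # Assumed prefix match, so "punch" also catches "punched", "punching", etc.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\w*", re.IGNORECASE)
    segments = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - half_width)
        end = min(len(text), match.start() + half_width)
        segments.append(text[start:end])
    return segments


def split_for_cv(n_items: int, test_fraction: float = 0.1, k: int = 5,
                 seed: int = 0) -> Tuple[List[int], List[List[int]]]:
    """Hold out a blind test set, then partition the remaining indices into k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # assumed simple random split
    n_test = round(n_items * test_fraction)
    test, dev = indices[:n_test], indices[n_test:]
    # Fold i takes every k-th development index starting at offset i.
    folds = [dev[i::k] for i in range(k)]
    return test, folds


if __name__ == "__main__":
    print(extract_segments("He threatened staff on the ward.", KEYWORDS))
    # ['He threatened staff on the ward.']
    test, folds = split_for_cv(6500)
    print(len(test), [len(f) for f in folds])
    # 650 [1170, 1170, 1170, 1170, 1170]
```

With 6,500 annotated segments, this split gives 650 blind-test segments and 5,850 development segments, that is, five folds of 1,170 each.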

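The Results report a separate F1 score for each output of the multi-label model. In a multi-label setup, each label typically receives an independent probability that is thresholded, and F1 = 2TP / (2TP + FP + FN) is then computed per label. The sketch below shows this calculation. The 0.5 cut-off, the label names (shorthand for categories in the abstract), and the function names are illustrative assumptions, not the study's settings.

```python
from typing import Dict, List, Set


def threshold(probs: Dict[str, float], cutoff: float = 0.5) -> Set[str]:
    """Turn independent per-label probabilities (e.g. sigmoid outputs of a
    multi-label head) into a predicted label set. The 0.5 cut-off is assumed."""
    return {label for label, p in probs.items() if p >= cutoff}


def per_label_f1(gold: List[Set[str]], pred: List[Set[str]],
                 labels: List[str]) -> Dict[str, float]:
    """Compute F1 = 2TP / (2TP + FP + FN) independently for each label."""
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 is 0.0 when the label never appears in gold or predictions.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical toy annotations for four segments.
    gold = [{"physical", "victim"}, {"emotional"}, {"physical", "perpetrator"}, set()]
    probs = [{"physical": 0.9, "victim": 0.4}, {"emotional": 0.8, "victim": 0.6},
             {"physical": 0.7, "perpetrator": 0.95}, {"physical": 0.55}]
    pred = [threshold(p) for p in probs]
    print(per_label_f1(gold, pred, ["physical", "emotional", "victim", "perpetrator"]))
    # {'physical': 0.8, 'emotional': 1.0, 'victim': 0.0, 'perpetrator': 1.0}
```

Scoring each label independently is what allows the abstract to report, for example, 0.89 for the perpetrator role alongside 0.65 for past violence from the same model.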
Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank. Journal: predicted probability (papers in training set; Top %)

1. International Journal of Medical Informatics: 14.1% (25 papers in training set; Top 0.1%)
2. BMC Medical Informatics and Decision Making: 10.3% (39 papers in training set; Top 0.2%)
3. Journal of Biomedical Informatics: 9.0% (45 papers in training set; Top 0.2%)
4. Journal of Medical Internet Research: 6.2% (85 papers in training set; Top 0.8%)
5. Journal of the American Medical Informatics Association: 6.2% (61 papers in training set; Top 0.5%)
6. BMJ Open: 4.8% (554 papers in training set; Top 4%)

50% of probability mass above

7. PLOS ONE: 4.1% (4510 papers in training set; Top 35%)
8. JAMIA Open: 3.5% (37 papers in training set; Top 0.4%)
9. Frontiers in Digital Health: 3.5% (20 papers in training set; Top 0.3%)
10. BJPsych Open: 2.7% (25 papers in training set; Top 0.2%)
11. JMIR Medical Informatics: 2.7% (17 papers in training set; Top 0.4%)
12. Scientific Reports: 2.6% (3102 papers in training set; Top 46%)
13. Acta Neuropsychiatrica: 2.0% (12 papers in training set; Top 0.3%)
14. BMJ Health & Care Informatics: 1.7% (13 papers in training set; Top 0.4%)
15. npj Digital Medicine: 1.7% (97 papers in training set; Top 2%)
16. Frontiers in Artificial Intelligence: 1.2% (18 papers in training set; Top 0.5%)
17. BMC Medical Research Methodology: 1.2% (43 papers in training set; Top 0.9%)
18. eClinicalMedicine: 0.9% (55 papers in training set; Top 1%)
19. JMIR Public Health and Surveillance: 0.9% (45 papers in training set; Top 3%)
20. Frontiers in Psychiatry: 0.9% (83 papers in training set; Top 3%)
21. Psychiatry Research: 0.9% (35 papers in training set; Top 1%)
22. JAMA Pediatrics: 0.9% (10 papers in training set; Top 0.1%)
23. Wellcome Open Research: 0.8% (57 papers in training set; Top 2%)
24. BMC Health Services Research: 0.8% (42 papers in training set; Top 2%)
25. JMIR Formative Research: 0.8% (32 papers in training set; Top 2%)
26. IEEE Journal of Biomedical and Health Informatics: 0.8% (34 papers in training set; Top 2%)
27. International Journal of Drug Policy: 0.7% (11 papers in training set; Top 0.3%)
28. Nature Medicine: 0.7% (117 papers in training set; Top 5%)
29. Acta Psychiatrica Scandinavica: 0.7% (10 papers in training set; Top 0.4%)
30. Artificial Intelligence in Medicine: 0.7% (15 papers in training set; Top 0.8%)
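
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked from the listed percentages. Ranks 1 to 6 sum to 14.1 + 10.3 + 9.0 + 6.2 + 6.2 + 4.8 = 50.6%, while ranks 1 to 5 reach only 45.8%. The short Python sketch below performs this cumulative check. The function name `journals_to_reach` is illustrative, and the values are copied from the ranked list.

```python
from itertools import accumulate
from typing import List

# Predicted probabilities (%) for ranks 1-10, copied from the ranked list.
TOP_PROBS = [14.1, 10.3, 9.0, 6.2, 6.2, 4.8, 4.1, 3.5, 3.5, 2.7]


def journals_to_reach(probs: List[float], target: float = 50.0) -> int:
    """Smallest number of top-ranked journals whose cumulative probability
    reaches `target` percent; raises if the listed values never reach it."""
    for k, total in enumerate(accumulate(probs), start=1):
        if total >= target:
            return k
    raise ValueError("target not reached by the listed journals")


if __name__ == "__main__":
    print(journals_to_reach(TOP_PROBS))  # 6
```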