
Assessing feasibility and risk to translate, de-identify and summarize medical letters using deep learning

Gauthier, L. W.; Willems, M.; Chatron, N.; Cenni, C.; Meyer, P.; Ruault, V.; Wells, C.; Sabbagh, Q.; Genevieve, D.; Yauy, K.

2023-07-31 genetic and genomic medicine
10.1101/2023.07.27.23293234 medRxiv

Background: Precision medicine requires accurate phenotyping and data sharing, particularly for rare diseases. However, sharing medical reports across language barriers is challenging. The alternative, clinical summaries provided by physicians using the Human Phenotype Ontology (HPO), can be inconsistent and incomplete, leading to a loss of clinical information.

Methods: To assess the feasibility and risk of using deep learning methods to translate, de-identify and summarize medical reports, we developed open-source, multi-language deep learning software in line with health data privacy requirements. We conducted a non-inferiority clinical trial of deep learning methods for two tasks: de-identifying protected health information (PHI), targeting a minimum sensitivity of 90% and a minimum specificity of 75%, and summarizing non-English medical reports in HPO format, targeting a sensitivity of 75% and a specificity of 90%.

Results: From March to April 2023, we evaluated 50 non-English medical reports from 8 physicians, covering 12 disease groups, including neurodevelopmental disorders, congenital disorders, fetal pathology and oncology. Reports contained a median of 15 PHI items and 7 HPO terms. The deep learning method achieved a sensitivity of 99% and a specificity of 87% for de-identification, and a sensitivity of 78% and a specificity of 92% for summarization. It reported an average of 6.6 HPO terms per report, comparable to the number of HPO terms physicians usually provide in databases (6.8 in PhenoDB).

Conclusions: De-identification and summarization of non-English medical reports using deep learning methods showed non-inferior performance, providing insights into how AI can be used to facilitate precision medicine.
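The abstract evaluates each task against pre-specified minimum sensitivity and specificity targets (90% and 75% for de-identification; 75% and 90% for HPO summarization). The Python sketch below shows one way such a check could be computed from per-item confusion counts. It is an assumption, not the authors' analysis: the abstract does not say how items were counted or which non-inferiority test was used, so the Wilson score lower bound, the `Confusion` class and the `wilson_lower` and `evaluate` helpers are all hypothetical, and the counts in the usage example are illustrative rather than taken from the study.

```python
"""Sketch: compare sensitivity/specificity with pre-specified minimum targets.

Hypothetical helper code. The abstract does not state which statistical
test the study used; the Wilson score lower bound here is an assumption.
"""
from dataclasses import dataclass
from math import sqrt


@dataclass
class Confusion:
    """Per-item counts against a reference annotation (hypothetical layout)."""
    tp: int  # items that should be flagged and were flagged
    fn: int  # items that should be flagged but were missed
    tn: int  # items that should not be flagged and were not
    fp: int  # items flagged in error


def wilson_lower(successes: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval (z=1.96: two-sided 95%)."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def evaluate(c: Confusion, min_sens: float, min_spec: float) -> dict:
    """Point estimates plus a pass/fail check on the lower confidence bounds."""
    if c.tp + c.fn == 0 or c.tn + c.fp == 0:
        raise ValueError("need at least one positive and one negative item")
    sens_lo = wilson_lower(c.tp, c.tp + c.fn)
    spec_lo = wilson_lower(c.tn, c.tn + c.fp)
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "sensitivity_lower": sens_lo,
        "specificity_lower": spec_lo,
        # Pass only if both lower bounds clear their targets.
        "meets_targets": sens_lo >= min_sens and spec_lo >= min_spec,
    }


# Minimum targets as stated in the abstract: (sensitivity, specificity).
DEID_TARGETS = (0.90, 0.75)
HPO_TARGETS = (0.75, 0.90)

if __name__ == "__main__":
    # Illustrative counts only; NOT data from the study.
    result = evaluate(Confusion(tp=45, fn=5, tn=40, fp=10), *DEID_TARGETS)
    for key, value in result.items():
        print(f"{key}: {value}")
```

Checking the lower confidence bound rather than the point estimate is a common way to test a single-arm performance target: with these illustrative counts, a point sensitivity of 90% from only 50 items has a Wilson lower bound of about 0.79, so the 90% de-identification target would not be met.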
Graphical abstract: Illustration of the non-inferiority trial for de-identification and summarization of non-English medical reports and main statistical performances.

Matching journals

The top journal alone accounts for more than 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association: 61 papers in training set; Top 0.1%; predicted probability 52.5%
50% of probability mass above
2. Journal of Biomedical Informatics: 45 papers in training set; Top 0.2%; predicted probability 6.4%
3. Genetics in Medicine: 69 papers in training set; Top 0.3%; predicted probability 6.4%
4. BMC Medical Research Methodology: 43 papers in training set; Top 0.3%; predicted probability 3.6%
5. npj Digital Medicine: 97 papers in training set; Top 1%; predicted probability 3.6%
6. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 0.8%; predicted probability 3.6%
7. JAMIA Open: 37 papers in training set; Top 1.0%; predicted probability 1.3%
8. JAMA Network Open: 127 papers in training set; Top 3%; predicted probability 1.3%
9. PLOS ONE: 4510 papers in training set; Top 60%; predicted probability 1.2%
10. Bioinformatics: 1061 papers in training set; Top 8%; predicted probability 1.2%
11. Scientific Reports: 3102 papers in training set; Top 68%; predicted probability 1.1%
12. International Journal of Medical Informatics: 25 papers in training set; Top 1%; predicted probability 0.9%
13. BMC Medical Genomics: 36 papers in training set; Top 1.0%; predicted probability 0.9%
14. The Lancet Digital Health: 25 papers in training set; Top 0.9%; predicted probability 0.8%
15. iScience: 1063 papers in training set; Top 31%; predicted probability 0.8%
16. BioData Mining: 15 papers in training set; Top 0.9%; predicted probability 0.8%
17. Nature Human Behaviour: 85 papers in training set; Top 5%; predicted probability 0.7%
18. European Journal of Human Genetics: 49 papers in training set; Top 1%; predicted probability 0.7%
19. eBioMedicine: 130 papers in training set; Top 5%; predicted probability 0.6%
20. Journal of Translational Medicine: 46 papers in training set; Top 4%; predicted probability 0.5%
21. Frontiers in Oncology: 95 papers in training set; Top 4%; predicted probability 0.5%
22. PLOS Digital Health: 91 papers in training set; Top 3%; predicted probability 0.5%
23. GigaScience: 172 papers in training set; Top 4%; predicted probability 0.5%