Multi-Source Graph Synthesis (MUGS) for Pediatric Knowledge Graphs from Electronic Health Records

Li, M.; Li, X.; Pan, K.; Geva, A.; Yang, D.; Sweet, S. M.; Bonzel, C.-L.; Panickan, V. A.; Xiong, X.; Mandl, K. D.; Cai, T.

2024-01-16 pediatrics
10.1101/2024.01.14.24301302 medRxiv

The wealth of valuable real-world medical data found within Electronic Health Record (EHR) systems is particularly significant in the field of pediatrics, where conventional clinical studies face notably high barriers. However, constructing accurate knowledge graphs from pediatric EHR data is challenging due to its limited content density compared to EHR data for the general population. Additionally, knowledge graphs built from EHR data primarily covering adult patients may not suit the unique biomedical characteristics of pediatric patients. In this research, we introduce a graph transfer learning approach aimed at constructing precise pediatric knowledge graphs. We present MUlti-source Graph Synthesis (MUGS), an algorithm designed to create embeddings for pediatric EHR codes by leveraging information from three distinct sources: (1) pediatric EHR data, (2) EHR data from the general population, and (3) existing hierarchical medical ontology knowledge shared across different patient populations. We break down these code embeddings into shared and unshared components, facilitating the adaptive and robust capture of varying levels of heterogeneity across different medical sites through meticulous hyperparameter tuning. We assessed the quality of these code embeddings in recognizing established relationships among pediatric codes, as curated from credible online sources, pediatric physicians, or GPT. Furthermore, we developed a web API for visualizing pediatric knowledge graphs generated using MUGS embeddings and devised a phenotyping algorithm to identify patients with characteristics similar to a given profile, with a specific focus on pediatric pulmonary hypertension (PH). The MUGS-generated embeddings demonstrated resilience against negative transfer and exhibited superior performance across all three tasks when compared to pediatric-only approaches, multi-site pooling, and semantic-based methods. 
MUGS embeddings open up new avenues for evidence-based pediatric research utilizing EHR data.
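
The abstract describes splitting each code embedding into shared and unshared components. The following is a minimal illustrative sketch of that idea only, not the MUGS algorithm itself: it assumes the two components combine additively, and the code names, vectors, and helper functions (`combine`, `cosine`) are hypothetical toy values chosen for illustration.

```python
import math

def combine(shared, site_specific):
    """Site-level embedding, assuming shared + site-specific components add."""
    return [s + d for s, d in zip(shared, site_specific)]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical shared components (common to all populations).
shared = {"code_A": [1.0, 0.0], "code_B": [0.8, 0.6]}
# Hypothetical pediatric-specific components (the "unshared" part).
pediatric_shift = {"code_A": [0.0, 0.0], "code_B": [0.2, -0.6]}

# Pediatric embedding of each code under the additive assumption.
pediatric = {c: combine(shared[c], pediatric_shift[c]) for c in shared}

# Relatedness of the two codes, with and without the pediatric component.
shared_similarity = cosine(shared["code_A"], shared["code_B"])
pediatric_similarity = cosine(pediatric["code_A"], pediatric["code_B"])
print(shared_similarity, pediatric_similarity)
```

In this toy setup the pediatric-specific component changes how closely the two codes are related, which is the kind of population-level heterogeneity the decomposition is meant to capture.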

Matching journals

The top 2 journals together account for at least 50% of the predicted probability mass (49.0% + 6.2% = 55.2%).

| Rank | Journal | Papers in training set | Percentile | Probability |
|---|---|---|---|---|
| 1 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 0.1% | 49.0% |
| 2 | PLOS Digital Health | 91 | Top 0.4% | 6.2% |
| | 50% of probability mass above | | | |
| 3 | Journal of Biomedical Informatics | 45 | Top 0.3% | 6.2% |
| 4 | Scientific Reports | 3102 | Top 39% | 3.5% |
| 5 | npj Digital Medicine | 97 | Top 1% | 3.5% |
| 6 | BioData Mining | 15 | Top 0.1% | 3.2% |
| 7 | Nature Medicine | 117 | Top 1% | 3.0% |
| 8 | Proceedings of the National Academy of Sciences | 2130 | Top 25% | 2.7% |
| 9 | Journal of the American Medical Informatics Association | 61 | Top 0.9% | 2.5% |
| 10 | Genome Medicine | 154 | Top 4% | 2.0% |
| 11 | Science Bulletin | 22 | Top 0.2% | 2.0% |
| 12 | Bioinformatics | 1061 | Top 7% | 1.7% |
| 13 | PLOS ONE | 4510 | Top 62% | 1.1% |
| 14 | Nature Communications | 4913 | Top 60% | 0.9% |
| 15 | Database | 51 | Top 1% | 0.7% |
| 16 | Bioinformatics Advances | 184 | Top 5% | 0.7% |
| 17 | GigaScience | 172 | Top 3% | 0.7% |
| 18 | BMC Medical Informatics and Decision Making | 39 | Top 3% | 0.6% |
| 19 | Artificial Intelligence in Medicine | 15 | Top 0.9% | 0.6% |
| 20 | NAR Genomics and Bioinformatics | 214 | Top 4% | 0.6% |
| 21 | JAMIA Open | 37 | Top 2% | 0.6% |
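
The 50% cutoff in the ranking can be checked with a short cumulative sum. This is a sketch only: the function name `journals_to_reach` is hypothetical, and the probabilities are copied from the listing above rather than taken from the underlying prediction model.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Count the top-ranked journals needed for the cumulative
    probability (in percent) to reach the threshold; None if never reached."""
    total = 0.0
    for count, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        if total >= threshold:
            return count
    return None

# Predicted probabilities (percent) for the 21 listed journals, in rank order.
listed = [49.0, 6.2, 6.2, 3.5, 3.5, 3.2, 3.0, 2.7, 2.5, 2.0, 2.0,
          1.7, 1.1, 0.9, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6]

top_n = journals_to_reach(listed)
print(top_n)
```

With the listed values, the first journal alone (49.0%) falls just short of the threshold, so the second journal is needed to cross it.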