
A novel tool for standardizing clinical data in a realism-based common data model

Freedman, H. G.; Williams, H.; Miller, M.; Birtwell, D.; Stoeckert, C. J.

2020-05-14 bioinformatics
10.1101/2020.05.12.091223 bioRxiv

Standardizing clinical information in a common data model is important for promoting interoperability and facilitating high-quality research. Semantic Web technologies such as the Resource Description Framework (RDF) can be used to their full potential when a clinical data model accurately reflects the reality of the clinical situation it describes. To this end, the Open Biomedical Ontologies (OBO) Foundry provides a set of ontologies that conform to the principles of realism and can be used to create a realism-based clinical data model. However, existing software does not address the challenge of programmatically defining such a model and loading data from disparate sources into it. The PennTURBO Semantic Engine is a tool developed at the University of Pennsylvania that works in conjunction with data aggregation software to transform source-specific RDF data into a source-independent, realism-based data model. The system sources classes from an application ontology and specifies how instances of those classes may relate to each other. It also defines and executes RDF data transformations by launching dynamically generated SPARQL update statements. The Semantic Engine was designed as a generalizable RDF data standardization tool and can work with various data models and incoming data sources. Its human-readable configuration files can easily be shared between institutions, providing a basis for collaboration on a standard realism-based clinical data model.
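
The abstract says transformations are executed as dynamically generated SPARQL update statements driven by configuration. The following is a minimal sketch of that general idea only, assuming a hypothetical rule format: `UpdateRule`, `build_update`, the `ex:` namespace and the graph IRIs are illustrative inventions, not the Semantic Engine's actual API or configuration syntax. It turns a declarative description of "match these triples in a source graph, write these triples to a target graph" into a SPARQL 1.1 `INSERT ... WHERE` string, minting fresh IRIs with the standard `UUID()` function.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# A triple pattern: (subject, predicate, object). Terms starting with "?" are variables.
Triple = Tuple[str, str, str]


@dataclass
class UpdateRule:
    """Hypothetical rule format (not the Semantic Engine's real configuration):
    match `inputs` in the source graph and write `outputs` to the target graph."""
    source_graph: str
    target_graph: str
    inputs: List[Triple]
    outputs: List[Triple]
    new_nodes: List[str] = field(default_factory=list)  # variables minted as fresh IRIs
    prefixes: Dict[str, str] = field(default_factory=dict)


def _variables(triples: List[Triple]) -> Set[str]:
    """Collect every SPARQL variable used in a list of triple patterns."""
    return {term for triple in triples for term in triple if term.startswith("?")}


def _block(triples: List[Triple]) -> str:
    """Render triple patterns one per line, each terminated by ' .'."""
    return "\n".join(f"    {s} {p} {o} ." for s, p, o in triples)


def build_update(rule: UpdateRule) -> str:
    """Generate a SPARQL 1.1 INSERT ... WHERE update from a rule."""
    # Every output variable must be matched in the source graph or minted here;
    # otherwise SPARQL would silently skip template triples with unbound variables.
    unbound = _variables(rule.outputs) - _variables(rule.inputs) - set(rule.new_nodes)
    if unbound:
        raise ValueError(f"unbound output variables: {sorted(unbound)}")
    lines = [f"PREFIX {p}: <{iri}>" for p, iri in rule.prefixes.items()]
    lines += [
        "INSERT {",
        f"  GRAPH <{rule.target_graph}> {{",
        _block(rule.outputs),
        "  }",
        "}",
        "WHERE {",
        f"  GRAPH <{rule.source_graph}> {{",
        _block(rule.inputs),
        "  }",
    ]
    # UUID() is a SPARQL 1.1 function returning a fresh IRI for each solution.
    lines += [f"  BIND(UUID() AS {var})" for var in rule.new_nodes]
    lines.append("}")
    return "\n".join(lines)


# Example: each source-specific patient record yields a new person node
# (NCBITaxon_9606, Homo sapiens) that the record is about (IAO_0000136).
rule = UpdateRule(
    source_graph="http://example.org/graph/source",
    target_graph="http://example.org/graph/expanded",
    inputs=[("?record", "a", "ex:PatientRecord")],
    outputs=[
        ("?person", "a", "obo:NCBITaxon_9606"),
        ("?record", "obo:IAO_0000136", "?person"),
    ],
    new_nodes=["?person"],
    prefixes={"ex": "http://example.org/source#", "obo": "http://purl.obolibrary.org/obo/"},
)
print(build_update(rule))
```

Separating generation from execution means the rule text could be reviewed or shared before the resulting update is sent to any SPARQL 1.1 endpoint; the check on unbound variables catches rules whose output triples would otherwise never be written.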

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Database: 51 papers in training set; Top 0.1%; 18.8%
2. PLOS ONE: 4510 papers in training set; Top 9%; 18.8%
3. Bioinformatics: 1061 papers in training set; Top 3%; 8.5%
4. Journal of the American Medical Informatics Association: 61 papers in training set; Top 0.3%; 8.5%
(50% of probability mass above)
5. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 0.5%; 4.9%
6. Frontiers in Physiology: 93 papers in training set; Top 0.9%; 4.2%
7. Journal of Biomedical Informatics: 45 papers in training set; Top 0.4%; 3.6%
8. GigaScience: 172 papers in training set; Top 0.7%; 2.8%
9. Bioinformatics Advances: 184 papers in training set; Top 2%; 2.8%
10. BMC Bioinformatics: 383 papers in training set; Top 4%; 2.1%
11. PLOS Computational Biology: 1633 papers in training set; Top 14%; 1.9%
12. Nucleic Acids Research: 1128 papers in training set; Top 10%; 1.7%
13. JAMIA Open: 37 papers in training set; Top 1%; 1.2%
14. Scientific Data: 174 papers in training set; Top 2%; 1.0%
15. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 7%; 1.0%
16. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 5%; 0.9%
17. Scientific Reports: 3102 papers in training set; Top 70%; 0.9%
18. Computer Methods and Programs in Biomedicine: 27 papers in training set; Top 1%; 0.7%
19. JCO Clinical Cancer Informatics: 18 papers in training set; Top 0.9%; 0.7%
20. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 2%; 0.7%
21. Journal of Molecular Biology: 217 papers in training set; Top 5%; 0.5%
22. IEEE Access: 31 papers in training set; Top 1%; 0.5%
23. JMIR Medical Informatics: 17 papers in training set; Top 2%; 0.5%
24. BioData Mining: 15 papers in training set; Top 1%; 0.5%
25. Cell Systems: 167 papers in training set; Top 15%; 0.5%