
NeuroDiscovery AI database: Comprehensive EHR dataset for Neurology

Selveshwari, S.; Suri, S.; Moodalagiri, S.; Krishna, A.; Kasivajjala, N. C.

2025-06-01 neurology
10.1101/2025.06.01.25328302 medRxiv

Purpose: The NeuroDiscovery AI database is a comprehensive real-world data (RWD) repository of de-identified electronic health record (EHR) data from U.S.-based neurology outpatient clinics. The structured data cover sociodemographic details; clinical examinations; social, medical, and lifestyle histories; International Classification of Diseases (ICD-9/ICD-10) diagnoses; and prescribed medications. The database also integrates neuroimaging data and laboratory results, making it a robust resource for clinical research. This paper describes a subset of the NeuroDiscovery AI dataset and outlines the processes involved in its development.

Participants: As of October 15, 2024, the dataset includes EHR data from 355,791 patients, of whom 40.72% are male. 40.06% of patients are aged 60 or older, and the records span 14,797 distinct diagnosis codes. The data cover more than 15 years of longitudinal patient information. 26.87% of patients are classified as active (defined as having had at least one clinical encounter within the last 18 months), and the median follow-up duration for active patients is 19.54 months.

Limitation: The large sample size, rigorous data processing, and robust data security of the NeuroDiscovery AI dataset are key strengths, enabling comprehensive studies of disease progression, treatment response, and long-term outcomes in neurology. The dataset aligns closely with published demographic trends for various neurological conditions, including a female predominance in migraine, multiple sclerosis, and vertigo, with slight variations in the age and gender distribution for conditions such as ALS. However, challenges remain, including missing data and data heterogeneity. Ongoing efforts to expand and diversify the dataset aim to improve its applicability and representativeness.
Future plan: The NeuroDiscovery AI dataset will expand by incorporating data from more providers and improving diversity, with the aim of becoming one of the largest neurology-focused datasets. The platform will continue to evolve into a comprehensive analytical tool, integrating cohort-building and data-interrogation functionality to streamline clinical workflows. These enhancements are intended to enable faster, more accurate decision-making, and future efforts will focus on identifying key trends in neurological conditions and patient outcomes.
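The abstract defines an active patient as one with a clinical encounter within the last 18 months and reports a median follow-up duration for that group. The Python sketch below shows one way such figures could be derived. It assumes a hypothetical mapping of patient IDs to encounter dates, uses the October 15, 2024 cut-off as the reference date, approximates months as 30.4375 days, and measures follow-up from first to last encounter. The abstract does not state the exact follow-up definition or month convention, so these are illustrative assumptions rather than the authors' pipeline.

```python
from datetime import date
from statistics import median

# Hypothetical encounter records: patient ID -> list of encounter dates.
# Illustrative values only, not data from the NeuroDiscovery AI database.
ENCOUNTERS = {
    "p1": [date(2020, 1, 10), date(2023, 11, 2), date(2024, 9, 30)],
    "p2": [date(2012, 5, 4), date(2019, 3, 15)],
    "p3": [date(2023, 6, 1), date(2024, 2, 20)],
}

SNAPSHOT = date(2024, 10, 15)   # data cut-off date reported in the abstract
ACTIVE_WINDOW_MONTHS = 18       # "active" window from the abstract
DAYS_PER_MONTH = 30.4375        # assumption: average month length (365.25 / 12)


def months_between(start: date, end: date) -> float:
    """Approximate elapsed months between two dates."""
    return (end - start).days / DAYS_PER_MONTH


def is_active(dates, snapshot=SNAPSHOT, window=ACTIVE_WINDOW_MONTHS) -> bool:
    """Active = most recent encounter falls within `window` months of the snapshot."""
    return months_between(max(dates), snapshot) <= window


def follow_up_months(dates) -> float:
    """Assumed definition: time from first to last recorded encounter."""
    return months_between(min(dates), max(dates))


active = {pid: d for pid, d in ENCOUNTERS.items() if is_active(d)}
share_active = len(active) / len(ENCOUNTERS)
median_follow_up = median(follow_up_months(d) for d in active.values())

print(f"active share: {share_active:.2%}")
print(f"median follow-up (active): {median_follow_up:.2f} months")
```

Computing follow-up only over the active subset mirrors the abstract's wording ("median follow-up duration for active patients"); a real EHR pipeline would also need rules for patients with a single encounter and for how the 18-month boundary is counted.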

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Frontiers in Neurology: 91 papers in training set; Top 0.1%; 22.1%
2. Journal of the Neurological Sciences: 17 papers in training set; Top 0.1%; 6.7%
3. BMC Neurology: 12 papers in training set; Top 0.1%; 6.2%
4. BMJ Open: 554 papers in training set; Top 4%; 4.8%
5. Journal of the American Medical Informatics Association: 61 papers in training set; Top 0.6%; 4.2%
6. Scientific Data: 174 papers in training set; Top 0.4%; 4.2%
7. Annals of Clinical and Translational Neurology: 29 papers in training set; Top 0.2%; 4.1%
50% of probability mass above
8. Neuroinformatics: 40 papers in training set; Top 0.3%; 3.5%
9. Orphanet Journal of Rare Diseases: 18 papers in training set; Top 0.1%; 3.5%
10. Scientific Reports: 3102 papers in training set; Top 42%; 3.0%
11. PLOS ONE: 4510 papers in training set; Top 43%; 3.0%
12. BMC Medical Informatics and Decision Making: 39 papers in training set; Top 2%; 1.7%
13. Frontiers in Aging Neuroscience: 67 papers in training set; Top 2%; 1.5%
14. npj Digital Medicine: 97 papers in training set; Top 2%; 1.3%
15. Frontiers in Neuroinformatics: 38 papers in training set; Top 0.5%; 1.3%
16. Frontiers in Digital Health: 20 papers in training set; Top 0.9%; 1.2%
17. Journal of Medical Internet Research: 85 papers in training set; Top 3%; 1.1%
18. Journal of Neurology: 26 papers in training set; Top 0.9%; 1.1%
19. PeerJ: 261 papers in training set; Top 13%; 0.9%
20. Journal of NeuroEngineering and Rehabilitation: 28 papers in training set; Top 0.8%; 0.9%
21. Gigabyte: 60 papers in training set; Top 1%; 0.8%
22. Journal of Neurology, Neurosurgery & Psychiatry: 29 papers in training set; Top 1%; 0.8%
23. BMC Bioinformatics: 383 papers in training set; Top 7%; 0.7%
24. Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring: 38 papers in training set; Top 1%; 0.7%
25. Frontiers in Neuroscience: 223 papers in training set; Top 8%; 0.7%
26. Biology Methods and Protocols: 53 papers in training set; Top 3%; 0.7%
27. Artificial Intelligence in the Life Sciences: 11 papers in training set; Top 0.3%; 0.7%
28. BMC Medical Research Methodology: 43 papers in training set; Top 1%; 0.7%
29. JMIR Formative Research: 32 papers in training set; Top 2%; 0.7%
30. Journal of the American Heart Association: 119 papers in training set; Top 4%; 0.7%
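The statement that the top 7 journals account for 50% of the predicted probability mass can be checked against the listed percentages. The sketch below (the function name `top_k_for_mass` is hypothetical) sorts the probabilities in descending order, accumulates them, and returns the smallest k whose running total reaches the threshold. With the rounded values shown, the first six journals sum to 48.2% and the first seven to 52.3%, so the cut falls after rank 7.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-30, copied from the list above
# (rounded to one decimal place as displayed).
PROBS = [
    22.1, 6.7, 6.2, 4.8, 4.2, 4.2, 4.1, 3.5, 3.5, 3.0,
    3.0, 1.7, 1.5, 1.3, 1.3, 1.2, 1.1, 1.1, 0.9, 0.9,
    0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7,
]


def top_k_for_mass(probs, threshold=50.0):
    """Return the smallest k such that the k largest probabilities sum to >= threshold.

    Returns None if the listed probabilities never reach the threshold.
    """
    ranked = sorted(probs, reverse=True)
    for k, running_total in enumerate(accumulate(ranked), start=1):
        if running_total >= threshold:
            return k
    return None


print(top_k_for_mass(PROBS))  # → 7
```

Because the displayed percentages are rounded and the list stops at rank 30, this only approximates the underlying model's full probability distribution.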