A machine-learning evaluation of biomarkers designed for the future of precision medicine

Climer, S.

2023-07-12 health informatics
10.1101/2023.07.09.23292430 medRxiv
Precision medicine is cognizant of the impact of genetics and environments on subtypes of heterogeneous diseases and aims to identify, diagnose, and treat each subtype appropriately. Real-valued biomarkers, such as protein levels in plasma, are key for practical subtype diagnoses and hold potential to elucidate subtypes and illuminate promising drug targets. Biomarkers that are common across all subtypes have been discovered using fold change (FC) and the area under the receiver operating characteristic curve (AUC). However, FC and AUC fail to identify biomarkers for a subtype when that subtype comprises less than half of the disease group. We present here a machine-learning biomarker evaluation method based on clustering of the data points, referred to as Difference in Bicluster Distances (DBD). We contribute efficient, yet optimal, software coupled with rigorous validation techniques, and demonstrate our approach on a late-onset Alzheimer disease (AD) gene expression dataset. Our trials produced four significant genes and appropriate thresholds for biomarker diagnostics. While none of these genes was identified as significant by either FC or AUC for the given dataset, the genes have been associated with AD or neurological disorders by other groups using independent means. In summary, DBD provides a unique and effective method for screening real-valued data to identify biomarkers associated with subtypes of heterogeneous diseases.
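The limitation of FC and AUC described in the abstract can be illustrated with a small synthetic example. The sketch below is not the DBD method and uses no data from the paper: the values, the 20% subtype fraction, and the helper names `fold_change` and `auc` are all assumptions made for illustration. FC is taken here as the ratio of group means, which is one common convention, and AUC is computed as the fraction of case-control pairs in which the case value is higher.

```python
from statistics import fmean

def fold_change(cases, controls):
    """Ratio of group means (one common FC convention; an assumption here)."""
    return fmean(cases) / fmean(controls)

def auc(cases, controls):
    """AUC as the fraction of (case, control) pairs where the case is higher.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Synthetic, deterministic marker levels (hypothetical values).
# Controls: 100 values spread evenly over 9..11.
controls = [9.0 + 2.0 * i / 99 for i in range(100)]
# 80% of cases look like controls for this marker.
unaffected_cases = [9.0 + 2.0 * i / 79 for i in range(80)]
# 20% of cases form a subtype with a clearly elevated marker (15..16).
subtype_cases = [15.0 + i / 19 for i in range(20)]
cases = unaffected_cases + subtype_cases

fc = fold_change(cases, controls)
auc_all = auc(cases, controls)
auc_subtype = auc(subtype_cases, controls)

print(f"FC, all cases vs controls:     {fc:.2f}")
print(f"AUC, all cases vs controls:    {auc_all:.2f}")
print(f"AUC, subtype only vs controls: {auc_subtype:.2f}")
```

In this toy setting the marker separates the subtype from the controls perfectly, yet the whole-group scores stay modest, because the 80% of cases that resemble controls dilute both statistics. Under these assumptions, a subtype making up a fraction p of the disease group can raise the AUC to at most about 0.5 + p/2.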

Matching journals

The top 12 journals account for 50% of the predicted probability mass.

1 | Scientific Reports | 3102 papers in training set | Top 5% | 10.6%
2 | Bioinformatics | 1061 papers in training set | Top 4% | 6.9%
3 | Patterns | 70 papers in training set | Top 0.1% | 4.9%
4 | Artificial Intelligence in the Life Sciences | 11 papers in training set | Top 0.1% | 4.9%
5 | PLOS ONE | 4510 papers in training set | Top 33% | 4.4%
6 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 0.5% | 4.0%
7 | npj Digital Medicine | 97 papers in training set | Top 1% | 2.8%
8 | Science Translational Medicine | 111 papers in training set | Top 1% | 2.8%
9 | eBioMedicine | 130 papers in training set | Top 0.5% | 2.7%
10 | Frontiers in Aging Neuroscience | 67 papers in training set | Top 1% | 2.4%
11 | Neurobiology of Aging | 95 papers in training set | Top 1% | 2.1%
12 | Nature Communications | 4913 papers in training set | Top 46% | 2.1%
50% of probability mass above
13 | NeuroImage: Clinical | 132 papers in training set | Top 2% | 2.1%
14 | Alzheimer's Research & Therapy | 52 papers in training set | Top 1% | 1.8%
15 | Briefings in Bioinformatics | 326 papers in training set | Top 4% | 1.7%
16 | Computers in Biology and Medicine | 120 papers in training set | Top 2% | 1.7%
17 | Brain Communications | 147 papers in training set | Top 2% | 1.7%
18 | BMC Bioinformatics | 383 papers in training set | Top 5% | 1.5%
19 | Nature Machine Intelligence | 61 papers in training set | Top 2% | 1.5%
20 | Communications Biology | 886 papers in training set | Top 12% | 1.4%
21 | Advanced Science | 249 papers in training set | Top 13% | 1.4%
22 | Nature Biomedical Engineering | 42 papers in training set | Top 1% | 1.4%
23 | Annals of Neurology | 57 papers in training set | Top 1% | 1.2%
24 | PLOS Computational Biology | 1633 papers in training set | Top 19% | 1.2%
25 | Journal of the American Medical Informatics Association | 61 papers in training set | Top 1% | 1.2%
26 | Communications Medicine | 85 papers in training set | Top 0.5% | 1.2%
27 | Frontiers in Artificial Intelligence | 18 papers in training set | Top 0.4% | 1.2%
28 | GeroScience | 97 papers in training set | Top 1% | 1.0%
29 | Science Advances | 1098 papers in training set | Top 25% | 1.0%
30 | Cell Reports Medicine | 140 papers in training set | Top 7% | 0.8%