
Deep phenotyping of blood cell data reveals novel clinical biomarkers

Chen, Y.-L.; Zhang, C.; Lucas, F.; Hadlock, J.; Foy, B. H.

2026-03-26 hematology
10.64898/2026.03.24.26349221 medRxiv

Introduction The complete blood count with differential (CBD) is one of the most commonly performed blood tests worldwide, used in nearly all areas of medicine. Although modern CBD analyzers generate flow-cytometry-based single-cell measurements, the resultant CBD markers are limited to coarse summary features, such as total cell counts and average cell sizes. As a result, these markers cannot detect subtle cell population shifts that may signal early-stage pathogenesis. To test this, we evaluate whether AI-based analysis of the raw single-cell data underlying the CBD can be used to develop novel, clinically prognostic biomarkers across patient settings.
Method We developed two complementary methods for biomarker discovery using CBD tests and evaluated them with longitudinal data from an academic medical center. To create interpretable biomarkers, we clustered cells into physiologically meaningful subpopulations and performed robust statistical summarization. In tandem, self-supervised autoencoders were developed to extract novel nonlinear markers. We evaluated the utility of these clustering (CLS) and autoencoder (AE) markers for patient prognostication across a range of outcomes (mortality, inpatient admission, and future disease development).
Results Our study included 242,623 CBD samples from 127,545 patients. Both the clustering and autoencoder approaches generated hundreds of new clinical biomarkers. Many biomarkers showed strong prognostic associations with all-cause mortality, inpatient admission, and development of anemia, cancer, or cardiovascular disease, and these associations remained significant after adjustment for demographics and clinical CBD markers. A large subset of these prognostic markers also showed high novelty: they had low correlations with existing CBD markers while exhibiting significant correlations with broader physiologic signals, such as inflammatory, hormonal, infectious, and coagulopathic markers.
Conclusion Collectively, these results demonstrate how modern AI techniques can allow for deeper phenotyping of routine clinical blood counts, generating novel biomarkers that capture more subtle physiologic signals than the markers currently used in clinical practice.
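
The clustering-plus-summarization idea described under Method can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not name a clustering algorithm or the summary statistics used, so plain k-means with caller-supplied starting centroids and median/MAD summaries are assumptions made here for illustration. The two-channel toy cells and all function names are hypothetical.

```python
from statistics import median

def assign(points, centroids):
    """Return the index of the nearest centroid (squared Euclidean) for each point."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering step)."""
    centroids = [tuple(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(n_iter):
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid unchanged
        new_labels = assign(points, new_centroids)
        centroids = new_centroids
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids

def mad(values):
    """Median absolute deviation, a robust measure of spread."""
    m = median(values)
    return median([abs(v - m) for v in values])

def summarize(points, labels, n_clusters):
    """Per-cluster cell fraction plus median and MAD of every measurement channel."""
    summary = {}
    for k in range(n_clusters):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            continue
        channels = [{"median": median(ch), "mad": mad(ch)} for ch in zip(*members)]
        summary[k] = {"fraction": len(members) / len(points), "channels": channels}
    return summary

# Hypothetical toy sample: each tuple is one cell with two measurement channels.
cells = [(1.0, 1.2), (1.1, 0.9), (0.9, 1.0),
         (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3)]
labels, centroids = kmeans(cells, centroids=[(0.0, 0.0), (6.0, 6.0)])
summary = summarize(cells, labels, n_clusters=2)
```

Each entry of `summary` is a candidate per-sample feature vector (subpopulation fraction, channel medians, channel spreads). Medians and MADs are used instead of means and standard deviations because they are less sensitive to a few outlying cells.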

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Cytometry Part A | 30 | Top 0.1% | 43.0% |
| 2 | PLOS Computational Biology | 1633 | Top 4% | 7.8% |
| 3 | PLOS Digital Health | 91 | Top 0.5% | 4.3% |
| 4 | PLOS ONE | 4510 | Top 34% | 4.3% |
| 5 | Scientific Reports | 3102 | Top 31% | 4.0% |
| 6 | npj Digital Medicine | 97 | Top 1% | 3.3% |
| 7 | Heliyon | 146 | Top 0.6% | 2.8% |
| 8 | npj Precision Oncology | 48 | Top 0.3% | 2.3% |
| 9 | BMC Medical Informatics and Decision Making | 39 | Top 1% | 2.3% |
| 10 | Clinical Chemistry | 22 | Top 0.3% | 2.0% |
| 11 | Physiological Reports | 35 | Top 0.4% | 2.0% |
| 12 | Journal of Medical Internet Research | 85 | Top 2% | 1.8% |
| 13 | Clinical Chemistry and Laboratory Medicine (CCLM) | 12 | Top 0.1% | 1.4% |
| 14 | Journal of Proteome Research | 215 | Top 2% | 1.0% |
| 15 | British Journal of Haematology | 15 | Top 0.3% | 1.0% |
| 16 | Frontiers in Medicine | 113 | Top 6% | 0.9% |
| 17 | Bioinformatics | 1061 | Top 9% | 0.9% |
| 18 | Science Advances | 1098 | Top 27% | 0.8% |
| 19 | Computers in Biology and Medicine | 120 | Top 5% | 0.7% |
| 20 | Communications Biology | 886 | Top 27% | 0.7% |
| 21 | iScience | 1063 | Top 36% | 0.7% |
| 22 | Bioinformatics Advances | 184 | Top 5% | 0.5% |
| 23 | Molecular Omics | 21 | Top 0.6% | 0.5% |
| 24 | Frontiers in Cell and Developmental Biology | 218 | Top 11% | 0.5% |
| 25 | Biology Methods and Protocols | 53 | Top 3% | 0.5% |
| 26 | Journal of Pathology Informatics | 13 | Top 0.5% | 0.5% |
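
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below uses only the five highest probabilities from the listing. The function name is hypothetical, and how the page itself computes the cutoff is not stated, so this only illustrates the arithmetic.

```python
def entries_to_reach(probabilities, threshold):
    """Smallest number of top-ranked entries whose cumulative probability reaches threshold.

    Returns (count, cumulative_mass); count is None if the threshold is never reached.
    """
    cumulative = 0.0
    for count, p in enumerate(sorted(probabilities, reverse=True), start=1):
        cumulative += p
        if cumulative >= threshold:
            return count, cumulative
    return None, cumulative

# Predicted probabilities (in percent) of the five top-ranked journals in the listing.
top_probabilities = [43.0, 7.8, 4.3, 4.3, 4.0]
count, mass = entries_to_reach(top_probabilities, threshold=50.0)
```

With these values, `count` is 2 and `mass` is about 50.8, since 43.0% + 7.8% already exceeds the 50% threshold.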