Interpretable morphology mapping of peripheral blood leukocytes using annotation-efficient artificial intelligence

Liu, Z.; Castillo, S. P.; Han, X.; Sun, X.; Hu, Z.; Yuan, Y.

2026-05-26 pathology
10.64898/2026.05.22.725537 bioRxiv

Background: Peripheral blood smear (PBS) review is labor-intensive, subjective, and challenging for rare or morphologically heterogeneous cell types in hematologic malignancies. Artificial intelligence (AI) offers a scalable alternative, but broader clinical translation is constrained by annotation burden and limited interpretability.

Methods: We developed an interpretable, annotation-efficient AI framework that learns leukocyte morphology through a two-stage process: label-free representation learning to construct a morphological embedding space, followed by supervised fine-tuning for cell type and morphological attribute classification. The model was trained and evaluated on 5,952 PBS images from cancer patients at MD Anderson Cancer Center, including blast cells, and 17,092 images from public sources. Active learning strategies were assessed to improve label efficiency, and interpretability was examined using saliency and embedding visualization. An interactive web application, HemoSight, was developed to support clinical review.

Findings: The framework achieved a macro-F1 score of 0·96 for 9-way leukocyte classification on the internal test split and 0·83 on the held-out patient cohort. Active learning substantially reduced annotation requirements, reaching peak performance with only 13·3% of available labels and significantly improving learning efficiency across 8 of 9 cell types. The model generalized to classifying 11 leukocyte morphological attributes with a mean F1 score of 85·8% and revealed structured morphological landscapes. Saliency maps, embedding visualizations, and the HemoSight application enabled transparent morphological inspection of model predictions, supporting confidence in model behavior and feasibility for clinical integration.

Interpretation: Our framework enables scalable, annotation-efficient, and interpretable modeling of leukocyte morphology, supporting the integration of AI-assisted PBS review for hematopathology workflows.

Funding: Seed funding from The University of Texas MD Anderson Cancer Center.

Research in Context

Evidence before this study: Peripheral blood smear review is essential for diagnosing and monitoring hematologic malignancies, but manual case review is time-consuming and variable, particularly for rare or abnormal leukocyte types. Automated hematology analyzers are widely used to flag abnormal cells; however, they provide limited morphological insight and often require frequent manual correction, especially in cancer settings where disease and treatment alter cell appearance. Previous artificial intelligence approaches for leukocyte classification have shown promise, but most rely on fully supervised learning, require extensive expert annotation, focus on a limited set of cell types, and frequently exclude diagnostically important rare cells such as blasts. Interpretability is inconsistently addressed, and few studies provide tools that allow clinicians to inspect and interpret model outputs within routine workflows.

Added value of this study: This study introduces an annotation-efficient framework trained on a large collection of peripheral blood smear images, including cancer patient samples with hematopathologist-verified rare cell types such as blasts. The framework learns leukocyte morphology from unlabeled images and adapts to multiple classification tasks with minimal expert labeling. Performance is evaluated on both internal test splits and a held-out patient cohort to provide a realistic estimate of generalization. Iterative, uncertainty-guided annotation substantially reduces labeling requirements while improving learning efficiency across most leukocyte classes. Beyond cell-type classification, the framework is extended to 11 clinically relevant morphological attributes and reveals a structured morphological landscape. These capabilities are integrated into a web application, HemoSight, enabling real-time inference and transparent morphological inspection of predictions within hematopathology workflows.

Implications of all the available evidence: Advancing artificial intelligence for hematology requires methods that reduce expert labeling demands, provide interpretable outputs, and perform reliably across clinically diverse patient samples. This study shows that learning from largely unlabeled data combined with iterative expert annotation can support scalable and flexible modeling of leukocyte morphology for classification tasks. Integrating quantitative predictions and interactive visualization supports the use of artificial intelligence as an assistive tool for diagnostic peripheral blood smear review, with potential to improve efficiency, consistency, and reviewer confidence.
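
The abstract mentions uncertainty-guided annotation but does not say which acquisition rule was used. As a rough illustration only, the sketch below ranks unlabeled cells by the entropy of their predicted class probabilities and queries the most uncertain ones. Entropy sampling, the function names, and the toy probabilities are all assumptions made for this example; they are not taken from the study.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_for_annotation(pool_probs, budget):
    """Return indices of the `budget` most uncertain unlabeled items.

    Hypothetical helper: entropy is an assumed acquisition score,
    not necessarily the strategy evaluated in the paper.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: predictive_entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]

# Toy 3-class predictions for four unlabeled cell images (made-up numbers).
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # near-uniform prediction, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

queried = select_for_annotation(pool, budget=2)
print(queried)  # indices of the two cells to send for expert labeling
```

In an iterative loop, the queried cells would be labeled by an expert, added to the training set, and the model retrained before the next round of selection.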

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

1. Modern Pathology (21 papers in training set): Top 0.1%, 52.6%
--- 50% of probability mass above ---
2. Journal of Pathology Informatics (13 papers in training set): Top 0.1%, 8.5%
3. Cytometry Part A (30 papers in training set): Top 0.1%, 4.9%
4. Clinical Chemistry (22 papers in training set): Top 0.1%, 4.4%
5. PLOS Computational Biology (1633 papers in training set): Top 11%, 3.3%
6. Nature Communications (4913 papers in training set): Top 47%, 2.1%
7. Computers in Biology and Medicine (120 papers in training set): Top 2%, 1.7%
8. PLOS ONE (4510 papers in training set): Top 57%, 1.4%
9. Scientific Reports (3102 papers in training set): Top 65%, 1.2%
10. npj Precision Oncology (48 papers in training set): Top 0.8%, 1.2%
11. Blood Cancer Journal (11 papers in training set): Top 0.2%, 0.8%
12. Communications Biology (886 papers in training set): Top 21%, 0.8%
13. Journal of Medical Imaging (11 papers in training set): Top 0.3%, 0.8%
14. Blood (67 papers in training set): Top 1%, 0.7%
15. Journal of Medical Internet Research (85 papers in training set): Top 5%, 0.7%
16. PLOS Digital Health (91 papers in training set): Top 3%, 0.7%
17. Journal of Clinical Pathology (12 papers in training set): Top 0.6%, 0.7%
18. Biology Methods and Protocols (53 papers in training set): Top 3%, 0.7%
19. Bioinformatics (1061 papers in training set): Top 10%, 0.7%
20. Biological Imaging (15 papers in training set): Top 0.3%, 0.7%
21. Proceedings of the National Academy of Sciences (2130 papers in training set): Top 49%, 0.5%
22. Laboratory Investigation (13 papers in training set): Top 0.4%, 0.5%
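
The statement that the top 1 journal accounts for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below does this for the first ten entries; the function name is hypothetical, and the percentages are copied from the list above.

```python
# Predicted probabilities (%) for ranks 1-10, copied from the list above.
probs = [52.6, 8.5, 4.9, 4.4, 3.3, 2.1, 1.7, 1.4, 1.2, 1.2]

def journals_to_reach(probabilities, threshold):
    """Smallest number of top-ranked entries whose summed probability
    reaches `threshold` (both in percent); None if never reached."""
    total = 0.0
    for count, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return count
    return None

n_half = journals_to_reach(probs, 50.0)
print(n_half)  # number of journals needed to cover half the probability mass
```

Because the first entry alone is 52.6%, a single journal already crosses the 50% threshold.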