
Development of an AI Algorithm for Automatic Classification of Gram Stain Images in Microbiology

Musaji, S.; Kibsey, P.; Musaji, A.

2025-12-03 microbiology
10.64898/2025.12.03.692028 bioRxiv

This paper describes the development and performance of an artificial intelligence (AI) algorithm for the automated processing and classification of Gram stain images obtained from real clinical microbiology samples. The aim of the project was to categorize non-standardized Gram stain images into the six most common categories: Gram-negative rods, Gram-positive cocci in chains, Gram-positive cocci in clusters, Gram-positive rods, Gram-negative cocci, and yeasts. Development and testing relied on 1,077 Gram stain images of varying sizes, originating from different laboratories and captured using diverse microscopes at different points in time, resulting in differences in image quality, scaling, color balance, and the presence of artifacts. The dataset was split into 80% training and 20% testing subsets, with the split stratified so that each object group was proportionally represented in both sets. Preprocessing used computer vision techniques to improve contrast and color balance, detect contours and object borders, and filter out unwanted artifacts. Morphological analysis of shapes was then performed to extract parameters characterizing each contour. Next, human-like classification criteria were established, based on gradient, morphological features (e.g., shape, size), and spatial arrangement, to mimic a microbiologist's visual assessment; these criteria achieved around 92% accuracy in image classification without using machine learning (ML) methods. However, further improvement proved practically impossible, prompting the use of ML methods. Building on the previously extracted features, a random forest ML algorithm was employed to further refine the criteria, with three models trained and tested successively. The first model determined the Gram stain reaction (positive or negative) of each object. The second model classified objects into one of six predefined categories. The third model aggregated individual object classifications to generate an overall classification for each slide, based on the number of objects observed in each category and their occupied area. Overall, the ML solution was significantly more accurate, reaching 99.9% accuracy in classifying the images into one of the aforementioned groups. The algorithm's limitations include an inability to classify mixed cultures, as it primarily focuses on the dominant category. In cases where positive and negative objects coexist, the algorithm tends to prioritize Gram-positive objects. Additionally, the current morphological assessment is insufficient for yeast classification. Addressing these limitations is a crucial avenue for future research to enhance the algorithm's versatility and accuracy.
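
The stratified 80/20 split mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name stratified_split, the seed, the per-class rounding rule, and the example class counts are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), splitting each class proportionally.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Assumed rounding rule: round per class, and keep at least one
        # test sample whenever a class has two or more samples.
        n_test = round(len(indices) * test_fraction)
        if len(indices) > 1:
            n_test = max(1, n_test)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Made-up class counts, chosen only to show the proportions.
labels = ["gram_negative_rods"] * 50 + ["yeasts"] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With these made-up counts, each class contributes one fifth of its samples to the test set, so a rare class such as yeasts is still represented in both subsets.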

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Journal of Microbiological Methods | 11 papers in training set | Top 0.1% | 19.5%
2 | PLOS ONE | 4510 papers in training set | Top 12% | 15.0%
3 | Scientific Reports | 3102 papers in training set | Top 12% | 7.1%
4 | Frontiers in Microbiology | 375 papers in training set | Top 2% | 3.7%
5 | PLOS Computational Biology | 1633 papers in training set | Top 9% | 3.7%
6 | Access Microbiology | 22 papers in training set | Top 0.1% | 2.2%
50% of probability mass above
7 | Journal of Pathology Informatics | 13 papers in training set | Top 0.1% | 2.2%
8 | Frontiers in Cellular and Infection Microbiology | 98 papers in training set | Top 2% | 2.2%
9 | Biology Methods and Protocols | 53 papers in training set | Top 0.6% | 2.0%
10 | Applied Sciences | 24 papers in training set | Top 0.2% | 1.8%
11 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 4% | 1.8%
12 | Journal of Microscopy | 18 papers in training set | Top 0.3% | 1.4%
13 | Frontiers in Medicine | 113 papers in training set | Top 4% | 1.3%
14 | Cureus | 67 papers in training set | Top 3% | 1.3%
15 | Diagnostics | 48 papers in training set | Top 2% | 1.0%
16 | PeerJ | 261 papers in training set | Top 11% | 1.0%
17 | Biomedical Optics Express | 84 papers in training set | Top 0.9% | 0.9%
18 | JMIRx Med | 31 papers in training set | Top 1% | 0.9%
19 | Animals | 20 papers in training set | Top 0.8% | 0.8%
20 | BMC Microbiology | 35 papers in training set | Top 1% | 0.8%
21 | Microbiology | 57 papers in training set | Top 1% | 0.8%
22 | GigaScience | 172 papers in training set | Top 3% | 0.7%
23 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.9% | 0.7%
24 | Journal of General Virology | 46 papers in training set | Top 0.9% | 0.7%
25 | Sensors | 39 papers in training set | Top 2% | 0.7%
26 | Microbiology Resource Announcements | 22 papers in training set | Top 1% | 0.7%
27 | Mathematics | 11 papers in training set | Top 0.5% | 0.7%
28 | Viruses | 318 papers in training set | Top 6% | 0.7%
29 | Heliyon | 146 papers in training set | Top 9% | 0.5%
30 | Journal of The Royal Society Interface | 189 papers in training set | Top 6% | 0.5%