
PhenoGMM: Gaussian mixture modelling of microbial cytometry data enables efficient predictions of biodiversity

Rubbens, P.; Props, R.; Kerckhof, F.-M.; Boon, N.; Waegeman, W.

2020-06-19 microbiology
10.1101/641464 bioRxiv

Microbial flow cytometry allows rapid characterization of microbial communities. Recent research has demonstrated a moderate to strong connection between cytometric diversity and taxonomic diversity based on 16S rRNA gene amplicon sequencing data. This creates the opportunity to integrate both types of data to study and predict microbial community diversity in an automated and efficient way. However, microbial flow cytometry data pose a number of unique challenges that need to be addressed. The results of our work are threefold: i) We expand current microbial cytometry fingerprinting approaches by proposing and validating a model-based fingerprinting approach based on Gaussian Mixture Models, which we call PhenoGMM. ii) We show that microbial diversity can be rapidly estimated by PhenoGMM. In combination with a supervised machine learning model, diversity estimates based on 16S rRNA gene amplicon sequencing data can be predicted. iii) We evaluate our method extensively using multiple datasets from different ecosystems and compare its predictive power with a generic binning fingerprinting approach that is commonly used in microbial flow cytometry. These results demonstrate the strong connection between the genetic make-up of a microbial community and its phenotypic properties as measured by flow cytometry. Our workflow facilitates the study of microbial diversity and community dynamics using flow cytometry in a fast and quantitative way.

Importance

Microorganisms are vital components of various ecosystems on Earth. To investigate microbial diversity, researchers have largely relied on the analysis of 16S rRNA gene sequences from DNA. Flow cytometry has been proposed as an alternative technique to characterize microbial community diversity and dynamics. It is an optical technique that can rapidly characterize a number of phenotypic properties of individual cells. So-called fingerprinting techniques are needed to describe microbial community diversity and dynamics based on flow cytometry data. In this work, we propose a more advanced fingerprinting strategy based on Gaussian Mixture Models. When samples have been analyzed by both flow cytometry and 16S rRNA gene amplicon sequencing, we show that supervised machine learning models can be used to find the relationship between the two types of data. We evaluate our workflow on datasets from different ecosystems, illustrating its general applicability for the analysis of microbial flow cytometry data. PhenoGMM facilitates the rapid characterization and predictive modelling of microbial diversity using flow cytometry.
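
The workflow described in the abstract (a Gaussian mixture fingerprint followed by a supervised model) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, and the function names (`fit_fingerprint_model`, `fingerprint`, `hill_numbers`), the use of scikit-learn, the choice of 8 mixture components, Hill numbers as the diversity measure, and a random forest as the supervised model are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.mixture import GaussianMixture


def fit_fingerprint_model(samples, n_components=8, seed=0):
    """Fit one GMM on the cells pooled from all samples (hypothetical helper)."""
    pooled = np.vstack(samples)
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed)
    return gmm.fit(pooled)


def fingerprint(gmm, samples):
    """Return a (samples x components) matrix of cell counts per component."""
    k = gmm.n_components
    return np.array([np.bincount(gmm.predict(s), minlength=k) for s in samples])


def hill_numbers(counts):
    """Hill diversity of orders 0, 1 and 2 for one vector of counts."""
    p = counts[counts > 0] / counts.sum()
    d0 = float(len(p))                          # richness
    d1 = float(np.exp(-np.sum(p * np.log(p))))  # exp(Shannon entropy)
    d2 = float(1.0 / np.sum(p ** 2))            # inverse Simpson
    return d0, d1, d2


# Synthetic stand-in for cytometry data: 20 samples, 500 cells, 4 channels.
rng = np.random.default_rng(0)
centers = rng.normal(scale=4.0, size=(5, 4))
samples, target = [], []
for _ in range(20):
    weights = rng.dirichlet(np.ones(5))
    labels = rng.choice(5, size=500, p=weights)
    samples.append(centers[labels] + rng.normal(size=(500, 4)))
    # Synthetic "reference" diversity, standing in for a 16S-based estimate.
    target.append(np.exp(-np.sum(weights * np.log(weights))))
target = np.array(target)

gmm = fit_fingerprint_model(samples)
fp = fingerprint(gmm, samples)

# Unsupervised route: diversity computed directly from the fingerprint.
cytometric_diversity = [hill_numbers(row) for row in fp]

# Supervised route: learn the mapping from fingerprint to reference diversity.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(fp[:15], target[:15])
predicted = model.predict(fp[15:])
```

Fitting a single mixture on the pooled cells, rather than one per sample, keeps the components comparable across samples, so each row of `fp` can be read as a count vector over the same set of clusters.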

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Journal of Microbiological Methods | 11 | Top 0.1% | 10.1%
2 | Bioinformatics | 1061 | Top 4% | 6.8%
3 | PLOS ONE | 4510 | Top 28% | 6.4%
4 | mSystems | 361 | Top 2% | 6.3%
5 | PLOS Computational Biology | 1633 | Top 6% | 6.3%
6 | Microbiome | 139 | Top 0.7% | 4.8%
7 | Methods in Ecology and Evolution | 160 | Top 0.6% | 4.8%
8 | ISME Communications | 103 | Top 0.4% | 4.3%
9 | Scientific Reports | 3102 | Top 28% | 4.3%
50% of probability mass above
10 | Frontiers in Microbiology | 375 | Top 3% | 3.6%
11 | NAR Genomics and Bioinformatics | 214 | Top 2% | 1.7%
12 | mSphere | 281 | Top 3% | 1.7%
13 | iScience | 1063 | Top 15% | 1.7%
14 | GigaScience | 172 | Top 1% | 1.7%
15 | Molecular Ecology Resources | 161 | Top 0.6% | 1.7%
16 | Briefings in Bioinformatics | 326 | Top 5% | 1.3%
17 | PeerJ | 261 | Top 10% | 1.2%
18 | Cell Reports Methods | 141 | Top 3% | 1.2%
19 | Bioinformatics Advances | 184 | Top 4% | 1.2%
20 | Computational and Structural Biotechnology Journal | 216 | Top 7% | 0.9%
21 | Environmental Pollution | 35 | Top 2% | 0.9%
22 | Yeast | 15 | Top 0.1% | 0.9%
23 | Frontiers in Plant Science | 240 | Top 5% | 0.8%
24 | BMC Bioinformatics | 383 | Top 7% | 0.8%
25 | Ecological Informatics | 29 | Top 0.7% | 0.8%
26 | Microbiology Spectrum | 435 | Top 5% | 0.8%
27 | Journal of Visualized Experiments | 30 | Top 0.7% | 0.8%
28 | Metabolites | 50 | Top 1% | 0.7%
29 | Viruses | 318 | Top 5% | 0.7%
30 | Journal of Microscopy | 18 | Top 0.4% | 0.7%
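
The 50% cutoff reported for the list can be checked with a short cumulative sum over the listed probabilities. The sketch below assumes the cutoff is the first rank at which the running total reaches 50%, and it uses the rounded percentages shown on the page, so the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1 to 10, as listed above.
probs = [10.1, 6.8, 6.4, 6.3, 6.3, 4.8, 4.8, 4.3, 4.3, 3.6]

# Running total of probability mass by rank.
cumulative = list(accumulate(probs))

# First rank whose running total reaches at least 50%.
cutoff_rank = next(rank for rank, total in enumerate(cumulative, start=1)
                   if total >= 50.0)
```

With these rounded values the running total is about 49.8% after rank 8 and about 54.1% after rank 9, which matches the statement that the top 9 journals cover 50% of the probability mass.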