Calibrating for absolute microbiome abundances without spike-ins

de Wit, N. T.; Baral, A.; Fuschi, A.; Jacobs, G.; de Rijk, S.; van der Plaats, R. Q.; Becsei, A.; Kerkvliet, J.; Freitag, R.; Vojtkova, M.; Brinch, C.; Schmitt, H.; Munk, P.

2026-02-26 microbiology
10.64898/2026.02.26.708180 bioRxiv

Metagenomics is a widely used approach in microbiome research. However, a major limitation of metagenomic datasets is their compositional nature, which prevents direct quantification of absolute abundances and complicates cross-sample comparisons. Existing strategies for absolute quantification typically require additional experiments or spike-in controls. Here, we introduce the MetaGenome Calibrator (MGCalibrator), a new tool that enables spike-in-free absolute abundance estimation based on routine DNA concentration measurements. We validated the accuracy of absolute abundances obtained with MGCalibrator against qPCR for five targets. Our results show a strong correlation with qPCR data, indicating that MGCalibrator enables qPCR-like trend analyses. For Bacteroides dorei, the estimated abundances were highly similar between the two methods (r² = 0.98, y = 1.00x). For other targets like crAssphage or the bacterial 16S rRNA gene, qPCR values were underrepresented by a factor of 7 or overrepresented by a factor of 4. Benchmarking with synthetic microbiome data demonstrated that our method accurately determines copy numbers in sequencing datasets, and application to whole-cell mock community samples produced expected values based on known extraction biases. In an extraction-bias-free experiment, MGCalibrator accurately quantified genome copy numbers within a twofold range in 98% of cases and determined 16S rRNA gene copies within 1.6-fold or less. Finally, we applied MGCalibrator to track temporal trends in antibiotic resistance genes (ARGs) in wastewater treatment plants in two Dutch provincial capitals. We observed an overall increase in ARGs, such as sul2 in Utrecht and qnrS5 in Houtrust, likely driven by rising bacterial loads.
Our findings demonstrate that MGCalibrator provides robust calibration of metagenomic data, paving the way for metagenomics to play a central role in future surveillance by enabling trend analysis across thousands of genetic targets, similar to the capabilities of qPCR for individual genes. The source code and documentation for MGCalibrator are available at github.com/NimroddeWit/MGCalibrator.
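
The "within a twofold range" criterion in the abstract can be made concrete with a short sketch. This is not MGCalibrator code: the function names and the example values are hypothetical, and the sketch assumes "twofold range" means the estimate lies between half and double the reference value.

```python
def fold_difference(estimate, reference):
    """Symmetric fold difference: 2.0 means the estimate is half or double the reference."""
    if estimate <= 0 or reference <= 0:
        raise ValueError("copy numbers must be positive")
    ratio = estimate / reference
    return max(ratio, 1.0 / ratio)


def fraction_within(pairs, max_fold=2.0):
    """Fraction of (estimate, reference) pairs whose fold difference is at most max_fold."""
    hits = sum(1 for est, ref in pairs if fold_difference(est, ref) <= max_fold)
    return hits / len(pairs)


# Hypothetical (estimated, reference) copy numbers, for illustration only.
example_pairs = [(1.0e6, 1.2e6), (5.0e5, 2.0e5), (3.0e4, 3.3e4), (8.0e3, 1.5e4)]
print(fraction_within(example_pairs))
```

Taking the larger of the ratio and its inverse treats over- and underestimation symmetrically, so a single threshold covers both directions.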

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Nature Communications; 4913 papers in training set; Top 8%; 17.3%
2. Microbiome; 139 papers in training set; Top 0.1%; 17.3%
3. mSystems; 361 papers in training set; Top 0.6%; 10.3%
4. Water Research; 74 papers in training set; Top 0.3%; 9.0%
50% of probability mass above
5. ISME Communications; 103 papers in training set; Top 0.1%; 8.3%
6. PLOS ONE; 4510 papers in training set; Top 43%; 2.8%
7. Environmental Science & Technology; 64 papers in training set; Top 1%; 1.9%
8. GigaScience; 172 papers in training set; Top 1%; 1.7%
9. Microbiology Spectrum; 435 papers in training set; Top 3%; 1.7%
10. Frontiers in Microbiology; 375 papers in training set; Top 6%; 1.5%
11. PLOS Computational Biology; 1633 papers in training set; Top 19%; 1.3%
12. mSphere; 281 papers in training set; Top 4%; 1.3%
13. Microbial Genomics; 204 papers in training set; Top 2%; 1.2%
14. Methods in Ecology and Evolution; 160 papers in training set; Top 2%; 1.2%
15. Scientific Reports; 3102 papers in training set; Top 67%; 1.2%
16. mBio; 750 papers in training set; Top 9%; 1.1%
17. Nucleic Acids Research; 1128 papers in training set; Top 14%; 1.1%
18. Molecular Ecology Resources; 161 papers in training set; Top 0.8%; 1.1%
19. Cell Reports Methods; 141 papers in training set; Top 4%; 0.9%
20. Science of The Total Environment; 179 papers in training set; Top 4%; 0.9%
21. Genome Biology; 555 papers in training set; Top 7%; 0.8%
22. eLife; 5422 papers in training set; Top 58%; 0.7%
23. npj Biofilms and Microbiomes; 56 papers in training set; Top 2%; 0.7%
24. FEMS Microbiology Ecology; 47 papers in training set; Top 0.6%; 0.7%
25. NAR Genomics and Bioinformatics; 214 papers in training set; Top 4%; 0.6%
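
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The sketch below is only an illustration: the function name is hypothetical, and the values are the probabilities of the five top-ranked journals copied from the list.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the five top-ranked journals in the list.
probabilities = [17.3, 17.3, 10.3, 9.0, 8.3]


def journals_needed(probs, threshold=50.0):
    """Smallest number of top-ranked entries whose summed probability reaches the threshold."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None  # the threshold is never reached


print(journals_needed(probabilities))  # prints 4
```

The first three entries sum to 44.9%, which is below the threshold, and adding the fourth brings the total to 53.9%.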