
Tolerance Regions For Compositional Data With Application To Reference Regions For Healthy Microbiome Profiles

Wickramasinghe, N.; Choudhary, P.

2026-05-07 microbiology
10.64898/2026.05.06.723285 bioRxiv

Abstract

Imbalances in the human microbiome are associated with numerous diseases, highlighting the need for benchmarks that define healthy microbiome composition and identify abnormal deviations. Although the microbiome is increasingly studied as a potential clinical marker, statistical approaches for constructing reference regions of healthy microbiome composition remain relatively underexplored. This work develops statistical methods to construct reference regions for healthy microbiome data, addressing three main challenges. First, since microbiome data contain relative rather than absolute information, standard statistical methods are not directly appropriate. Therefore, microbiome profiles are treated as compositional data satisfying a sum constraint, and log-ratio transformations are used to analyze them in real space while preserving their relative structure. Second, reference regions are constructed as tolerance regions rather than confidence regions, so that they cover a pre-specified proportion of the healthy population with a given confidence level. The proposed framework incorporates both parametric and nonparametric approaches for constructing these tolerance regions. Parametric methods are considered when the isometric log-ratio (ilr) transformed data approximately follow an elliptical distribution, where they can yield smaller regions while maintaining the desired coverage. Nonparametric approaches provide a flexible alternative by avoiding distributional assumptions. Third, because microbiome data are multidimensional and difficult to interpret, quantitative and graphical tools are introduced to assess atypicality and identify which microbial taxa contribute most to deviations from healthy profiles. Simulation studies are conducted to evaluate the performance of the proposed methods. The methodology is then demonstrated by constructing reference regions for healthy microbiome profiles using real-world data.
Finally, the approach is applied to microbiome datasets comparing healthy and patient profiles to assess whether patient samples are identified as atypical and to examine which taxa contribute to these deviations. Overall, the proposed framework provides a clear and statistically robust approach for defining healthy microbiome reference regions and detecting atypical microbiome profiles.
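The abstract chains three steps: an ilr transform, a tolerance region in ilr space, and attribution of atypicality to individual taxa. The Python sketch below strings them together under loudly stated assumptions. It requires strictly positive compositions (zeros would first need replacing, for example by a pseudocount), uses one Helmert-type ilr basis, and for the region swaps in a generic nonparametric construction (Mahalanobis distances from a split-half fit, thresholded at an order statistic chosen so that coverage of at least `coverage` holds with probability `confidence`). It is not the authors' parametric or nonparametric method, which the abstract does not spell out, and all function names (`fit_region`, `assess`, and so on) are hypothetical.

```python
import numpy as np
from math import exp, lgamma, log, log1p


def helmert_basis(D):
    """Orthonormal D x (D-1) contrast matrix (Helmert-type); each column sums to zero."""
    V = np.zeros((D, D - 1))
    for j in range(1, D):
        V[:j, j - 1] = 1.0 / j                 # average of the first j parts ...
        V[j, j - 1] = -1.0                     # ... contrasted with part j
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))  # scale the column to unit length
    return V


def ilr(x):
    """ilr coordinates of strictly positive compositions, one composition per row."""
    x = np.asarray(x, dtype=float)
    logx = np.log(x / x.sum(axis=1, keepdims=True))  # closure, then log
    clr = logx - logx.mean(axis=1, keepdims=True)     # centred log-ratio (clr)
    return clr @ helmert_basis(x.shape[1])


def mahalanobis2(z, center, prec):
    """Squared Mahalanobis distance of each row of z."""
    d = z - center
    return np.einsum("ij,jk,ik->i", d, prec, d)


def order_stat_rank(m, coverage, confidence):
    """Smallest r with P(Binomial(m, coverage) <= r - 1) >= confidence, else None."""
    cdf = 0.0
    for k in range(m):
        # Binomial pmf in log space to avoid overflow/underflow for large m.
        cdf += exp(lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
                   + k * log(coverage) + (m - k) * log1p(-coverage))
        if cdf >= confidence:
            return k + 1
    return None


def fit_region(healthy, coverage=0.90, confidence=0.95, seed=0):
    """Hypothetical split-sample nonparametric tolerance region in ilr space."""
    z = ilr(healthy)
    idx = np.random.default_rng(seed).permutation(len(z))
    half = len(z) // 2
    fit, calib = z[idx[:half]], z[idx[half:]]
    # Center and scatter come only from the first half, so the distances of the
    # calibration half are i.i.d. draws from a fixed continuous distribution.
    center = fit.mean(axis=0)
    prec = np.linalg.inv(np.atleast_2d(np.cov(fit, rowvar=False)))
    r = order_stat_rank(len(calib), coverage, confidence)
    if r is None:
        raise ValueError("too few healthy samples for this coverage/confidence")
    cutoff = np.sort(mahalanobis2(calib, center, prec))[r - 1]
    return center, prec, cutoff


def assess(sample, center, prec, cutoff):
    """Flag a sample and split its squared distance into per-taxon terms."""
    dev = ilr(np.atleast_2d(sample))[0] - center
    d2 = float(dev @ prec @ dev)
    V = helmert_basis(len(sample))
    w = V @ dev                         # deviation mapped back to clr (taxon) axes
    contrib = w * (V @ prec @ V.T @ w)  # terms sum to d2; single terms may be < 0
    return d2 > cutoff, d2, contrib


# Toy usage on simulated compositions (not real microbiome data).
rng = np.random.default_rng(1)
healthy = rng.dirichlet([8.0, 5.0, 3.0, 2.0], size=400)
center, prec, cutoff = fit_region(healthy)
flag, d2, contrib = assess([0.05, 0.05, 0.10, 0.80], center, prec, cutoff)
print(flag, round(d2, 2), np.round(contrib, 2))
```

Splitting the sample keeps the distance function independent of the calibration half, which is what makes the order-statistic coverage guarantee hold exactly; the price is a noisier center and scatter than a full-sample fit. The per-taxon terms add up to the squared distance, but individual terms can be negative, so they are only a rough guide to which taxa drive a deviation.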

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology: 1633 papers in training set, Top 3%, 10.3%
2. Briefings in Bioinformatics: 326 papers in training set, Top 0.6%, 7.3%
3. PLOS ONE: 4510 papers in training set, Top 26%, 6.5%
4. Frontiers in Microbiology: 375 papers in training set, Top 1%, 6.4%
5. Scientific Reports: 3102 papers in training set, Top 17%, 6.4%
6. Bioinformatics: 1061 papers in training set, Top 4%, 4.9%
7. Journal of The Royal Society Interface: 189 papers in training set, Top 0.7%, 4.9%
8. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 1%, 4.0%
50% of probability mass above
9. mSystems: 361 papers in training set, Top 2%, 4.0%
10. Microbiome: 139 papers in training set, Top 0.9%, 3.7%
11. npj Systems Biology and Applications: 99 papers in training set, Top 0.8%, 2.1%
12. Bioinformatics Advances: 184 papers in training set, Top 2%, 1.9%
13. Statistics in Medicine: 34 papers in training set, Top 0.1%, 1.9%
14. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 3%, 1.8%
15. Microbial Genomics: 204 papers in training set, Top 1%, 1.7%
16. BMC Genomics: 328 papers in training set, Top 3%, 1.5%
17. mSphere: 281 papers in training set, Top 4%, 1.4%
18. GigaScience: 172 papers in training set, Top 2%, 0.9%
19. Methods in Ecology and Evolution: 160 papers in training set, Top 2%, 0.8%
20. npj Biofilms and Microbiomes: 56 papers in training set, Top 2%, 0.8%
21. iScience: 1063 papers in training set, Top 31%, 0.8%
22. Gut Microbes: 70 papers in training set, Top 1%, 0.7%
23. Nature Communications: 4913 papers in training set, Top 64%, 0.7%
24. BMC Medical Genomics: 36 papers in training set, Top 2%, 0.7%
25. Mathematics: 11 papers in training set, Top 0.5%, 0.7%
26. ISME Communications: 103 papers in training set, Top 2%, 0.7%
27. Microbiology Spectrum: 435 papers in training set, Top 7%, 0.5%
28. Applied Sciences: 24 papers in training set, Top 1%, 0.5%
29. Journal of Microbiological Methods: 11 papers in training set, Top 0.7%, 0.5%
30. Journal of Theoretical Biology: 144 papers in training set, Top 2%, 0.5%