A Benchmarking Framework for Comparative Evaluation of Low-Complexity Region Detection Tools in the Human Proteome

Chatterjee, A.; Vijay, N.

2026-01-26 bioinformatics
10.64898/2026.01.24.701293 bioRxiv
Low-complexity regions (LCRs) are compositionally biased segments of proteins that play critical roles in molecular recognition, structural flexibility, and phase separation. Yet their accurate detection remains challenging due to methodological variability among computational tools. In this study, we conducted a comprehensive benchmarking of eight widely used LCR detection methods (with different parameter settings) across the Homo sapiens proteome. A modular computational framework was developed to systematically compare LCR characteristics. Residue-centric analyses covered length distributions and coverage percentages; protein-centric analyses covered compositional bias, amino acid composition, and Shannon entropy. Consensus analyses revealed that regions detected by multiple tools were typically longer, more repetitive, and compositionally purer, suggesting stronger structural or functional relevance. Jaccard similarity matrices demonstrated distinct clustering patterns among algorithms based on shared detection principles. Additionally, entropy and purity analyses highlighted fundamental differences in sequence complexity captured by each tool. Together, these results provide a unified, reproducible framework for evaluating LCR detection performance and offer practical guidelines for reliably annotating low-complexity regions in proteome-scale studies.
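
The abstract does not give the exact definitions behind its metrics, so the following is only a minimal sketch of how Shannon entropy, coverage, and Jaccard similarity are commonly computed for LCR calls. The function names, the 1-based inclusive interval convention, and the example intervals are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the amino acid frequencies in a segment."""
    total = len(segment)
    counts = Counter(segment)
    # sum of p * log2(1/p); a single-residue repeat gives 0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def residue_set(regions):
    """Expand (start, end) intervals (assumed 1-based, inclusive) into residue positions."""
    return {pos for start, end in regions for pos in range(start, end + 1)}


def coverage_percent(regions, protein_length: int) -> float:
    """Percentage of a protein's residues that fall inside any called region."""
    return 100.0 * len(residue_set(regions)) / protein_length


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; taken as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical calls from two tools on the same 100-residue protein
tool_a = residue_set([(10, 29)])  # 20 residues
tool_b = residue_set([(20, 39)])  # 20 residues, 10 shared with tool_a

print(jaccard(tool_a, tool_b))             # 10 shared / 30 in the union
print(coverage_percent([(10, 29)], 100))   # 20 of 100 residues
print(shannon_entropy("AAAAGGGG"))         # two equally frequent residues
```

Computing Jaccard similarity over residue positions rather than whole regions is one design choice among several; a region-level or protein-level comparison would give different values for the same calls.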

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Journal of Proteome Research | 215 papers in training set | Top 0.1% | 22.5%
2 | Molecular & Cellular Proteomics | 158 papers in training set | Top 0.4% | 6.8%
3 | PROTEOMICS | 35 papers in training set | Top 0.1% | 6.4%
4 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 1% | 6.4%
5 | Journal of Molecular Biology | 217 papers in training set | Top 0.3% | 4.8%
6 | Protein Science | 221 papers in training set | Top 0.3% | 4.3%
50% of probability mass above
7 | Nature Communications | 4913 papers in training set | Top 38% | 3.9%
8 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 3.6%
9 | PLOS Computational Biology | 1633 papers in training set | Top 10% | 3.6%
10 | Briefings in Bioinformatics | 326 papers in training set | Top 2% | 3.6%
11 | Advanced Science | 249 papers in training set | Top 6% | 3.1%
12 | Analytical Chemistry | 205 papers in training set | Top 1% | 2.7%
13 | Bioinformatics | 1061 papers in training set | Top 7% | 1.9%
14 | Scientific Reports | 3102 papers in training set | Top 56% | 1.8%
15 | PLOS ONE | 4510 papers in training set | Top 57% | 1.5%
16 | Journal of the American Society for Mass Spectrometry | 33 papers in training set | Top 0.3% | 1.5%
17 | BMC Bioinformatics | 383 papers in training set | Top 5% | 1.5%
18 | Communications Biology | 886 papers in training set | Top 14% | 1.2%
19 | Nucleic Acids Research | 1128 papers in training set | Top 15% | 0.9%
20 | ACS Omega | 90 papers in training set | Top 4% | 0.8%
21 | International Journal of Molecular Sciences | 453 papers in training set | Top 14% | 0.8%
22 | Journal of Proteomics | 27 papers in training set | Top 0.4% | 0.7%
23 | Frontiers in Microbiology | 375 papers in training set | Top 9% | 0.7%
24 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 3% | 0.7%
25 | Communications Chemistry | 39 papers in training set | Top 1% | 0.7%
26 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 4% | 0.6%
27 | International Journal of Biological Macromolecules | 65 papers in training set | Top 4% | 0.6%
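
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum. The sketch below assumes the final percentage listed for each journal is its predicted probability; the function name is hypothetical.

```python
# Predicted probabilities (%) of the first seven listed journals, in rank order
probs = [22.5, 6.8, 6.4, 6.4, 4.8, 4.3, 3.9]


def journals_to_reach(probs, threshold):
    """Return (rank, cumulative %) at which the running sum first reaches threshold."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None  # threshold never reached with the values given


rank, total = journals_to_reach(probs, 50.0)
print(rank, round(total, 1))
```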