A Benchmarking Framework for Comparative Evaluation of Low-Complexity Region Detection Tools in the Human Proteome
Chatterjee, A.; Vijay, N.
Low-complexity regions (LCRs) are compositionally biased protein segments that play critical roles in molecular recognition, structural flexibility, and phase separation. Yet their accurate detection remains challenging because computational tools differ in methodology. In this study, we benchmarked eight widely used LCR detection methods, run under several parameter settings, across the Homo sapiens proteome. We developed a modular computational framework to systematically compare LCR characteristics: residue-centric analyses covered length distributions and coverage percentages, while protein-centric analyses covered compositional bias, amino acid composition, and Shannon entropy. Consensus analyses revealed that regions detected by multiple tools were typically longer, more repetitive, and compositionally purer, suggesting stronger structural or functional relevance. Jaccard similarity matrices showed that the algorithms cluster according to shared detection principles. Entropy and purity analyses further highlighted fundamental differences in the sequence complexity captured by each tool. Together, these results provide a unified, reproducible framework for evaluating LCR detection performance and offer practical guidelines for reliably annotating low-complexity regions in proteome-scale studies.
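The comparison metrics named in the abstract (Shannon entropy, compositional purity, coverage percentage, Jaccard similarity, and multi-tool consensus) can be illustrated with a minimal sketch. It assumes each tool's output for one protein is a list of 0-based, half-open (start, end) residue intervals; the function names, the purity definition (share of the most frequent residue), and the consensus threshold are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import Counter

# Assumption: one tool's predictions for one protein are a list of
# 0-based, half-open (start, end) residue intervals.
Intervals = list[tuple[int, int]]


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits, of the residue composition of seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())


def purity(seq: str) -> float:
    """Hypothetical purity score: fraction of seq taken by its most common residue."""
    if not seq:
        return 0.0
    return Counter(seq).most_common(1)[0][1] / len(seq)


def residues(intervals: Intervals) -> set[int]:
    """Expand intervals into the set of residue indices they cover."""
    covered: set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def coverage_percent(intervals: Intervals, protein_length: int) -> float:
    """Percentage of the protein's residues covered by at least one interval."""
    return 100.0 * len(residues(intervals)) / protein_length


def jaccard(a: Intervals, b: Intervals) -> float:
    """Residue-level Jaccard similarity between two tools' predictions."""
    sa, sb = residues(a), residues(b)
    union = sa | sb
    if not union:
        return 1.0  # convention: two empty predictions count as identical
    return len(sa & sb) / len(union)


def consensus(predictions: list[Intervals], min_tools: int) -> set[int]:
    """Residues flagged by at least min_tools tools (threshold is hypothetical)."""
    votes: Counter[int] = Counter()
    for intervals in predictions:
        votes.update(residues(intervals))
    return {i for i, k in votes.items() if k >= min_tools}


if __name__ == "__main__":
    region = "QQQQQQQQAQ"  # toy polyglutamine-like segment
    print(f"entropy={shannon_entropy(region):.3f} bits, purity={purity(region):.2f}")
    tool_a, tool_b = [(0, 10)], [(5, 15)]
    print(f"jaccard={jaccard(tool_a, tool_b):.3f}")
    print(f"coverage={coverage_percent(tool_a + tool_b, 30):.1f}%")
```

Measuring overlap on residue sets rather than on whole intervals keeps the Jaccard score insensitive to how each tool splits or merges adjacent regions; a pairwise matrix over all tools would simply apply `jaccard` to every pair.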