Distinct Repeat Architecture Landscapes in the Proteomes of Protozoan Parasites

Matsumoto, H.; Hong, J.

2026-01-21 bioinformatics
10.64898/2026.01.20.700692 bioRxiv

Protozoan parasites cause major infectious diseases and pose persistent challenges to global health, particularly through the emergence of drug-resistant strains. Tandem repeats (TRs) and other repetitive architectures are widespread in proteomes, especially in protozoan proteins, where they have been implicated in host-parasite interactions, immune evasion, and antigenicity. However, repeat-containing proteins (RPs) exhibit highly diverse architectures that often extend beyond the simple reiteration of a single motif, making comprehensive and quantitative characterization challenging. In this study, we performed a bioinformatics analysis of repeat architectures in protozoan proteins. In addition to established repeat-detection approaches, we developed a new algorithm, Drepper, which quantifies repeat-architecture complexity. By integrating diverse repeat-related features, we clustered RPs across species and identified distinct groups associated with parasite lineages. Notably, we detected a Plasmodium-specific RP cluster and a Trypanosoma/Leishmania-specific RP cluster; both were characterized by large repeat regions but exhibited contrasting repeat-structure complexity. The Plasmodium-specific RPs showed high complexity, whereas the Trypanosoma/Leishmania-specific RPs displayed significantly lower complexity. Functional enrichment analyses indicated that these lineage-associated clusters were enriched in parasite-specific factors. Furthermore, evolutionary analyses suggested that low-complexity repeat architectures may be actively maintained through concerted evolution. Taken together, our results reveal lineage-specific strategies in protozoan repeat architectures and provide a quantitative framework for studying their biological and evolutionary roles.

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

1 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 0.1% | 59.3%
50% of probability mass above
2 | Advanced Science | 249 papers in training set | Top 5% | 3.6%
3 | PLOS Computational Biology | 1633 papers in training set | Top 10% | 3.6%
4 | Nucleic Acids Research | 1128 papers in training set | Top 9% | 2.1%
5 | Journal of Genetics and Genomics | 36 papers in training set | Top 0.7% | 2.1%
6 | Briefings in Bioinformatics | 326 papers in training set | Top 3% | 2.1%
7 | Molecular & Cellular Proteomics | 158 papers in training set | Top 0.9% | 1.9%
8 | Gut Microbes | 70 papers in training set | Top 0.6% | 1.7%
9 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 4% | 1.7%
10 | mSystems | 361 papers in training set | Top 5% | 1.3%
11 | Frontiers in Microbiology | 375 papers in training set | Top 6% | 1.3%
12 | Scientific Reports | 3102 papers in training set | Top 66% | 1.2%
13 | Nature Communications | 4913 papers in training set | Top 60% | 0.9%
14 | mBio | 750 papers in training set | Top 11% | 0.8%
15 | BMC Biology | 248 papers in training set | Top 4% | 0.7%
16 | Genome Biology | 555 papers in training set | Top 7% | 0.7%
17 | Cell Reports | 1338 papers in training set | Top 33% | 0.7%
18 | The FEBS Journal | 78 papers in training set | Top 0.9% | 0.7%
19 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 5% | 0.7%
20 | Science Bulletin | 22 papers in training set | Top 1.0% | 0.7%
21 | PLOS Neglected Tropical Diseases | 378 papers in training set | Top 5% | 0.7%
22 | Science Advances | 1098 papers in training set | Top 31% | 0.7%
23 | eLife | 5422 papers in training set | Top 61% | 0.6%
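
The statement that the top journal alone covers 50% of the predicted probability mass can be checked with a short cumulative sum. The following is a minimal sketch, not the site's own method: the function name `journals_to_reach` is hypothetical, and the percentages are copied in rank order from the matching-journals list above.

```python
# Predicted probabilities (percent) in rank order, copied from the list above.
probs = [59.3, 3.6, 3.6, 2.1, 2.1, 2.1, 1.9, 1.7, 1.7, 1.3, 1.3, 1.2,
         0.9, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6]


def journals_to_reach(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the cumulative
    probability to reach `threshold` (hypothetical helper for illustration).

    Returns None if the listed entries never reach the threshold.
    """
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count
    return None


print(journals_to_reach(probs))
```

Because the first entry (59.3%) already exceeds the 50% threshold, the loop stops after a single journal, which matches the "top 1 journal" statement.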