Eukaryotic secreted proteins are encoded in repeat-rich genomic regions

Farrer, R. A.

2026-03-18 genomics
10.64898/2026.03.17.712334 bioRxiv
Secretion signals are ancient and functionally conserved sequence motifs that orchestrate the function and intended destination of cleaved encoded proteins (1-3). To investigate the genomic landscape of secreted proteins, 4,694 annotated eukaryotic genome assemblies were analysed. Genes encoding secretion signals (n = 5.2 million) were consistently enriched in genomic regions with longer flanking intergenic regions (FIRs). Consecutive genes with characteristic FIR lengths were enriched for genes with secretion signals. Intriguingly, many eukaryotic pathogens and parasites showed the most significant association between genes encoding secretion signals and their intergenic distance. Almost every category of repeat was found in greater numbers flanking genes encoding secretion signals, with especially strong enrichment of simple, unknown, and low-complexity repeats in fungal genomes. Despite higher repeat counts, the total repeat length was consistently shorter around genes with secretion signals, suggesting a prevalence of truncated or fragmented repeats in these regions. Several GO terms assigned to genes with secretion signals were consistently enriched across genome assemblies in each kingdom. Common GO enrichment patterns were also identified in genes categorised by their FIR. These results hint at an anciently conserved genomic architecture and mode of evolution in eukaryotes, characterised by long FIRs and fragmented repeat landscapes, likely driven by mechanisms such as repeat-driven gene copy number variation (4), differential mutation rates (5), and chromatin remodelling (6). This conserved association highlights the potential of genome structure to drive innovation in secreted protein function.
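
The abstract does not say how FIR lengths were computed. As a rough illustration of the concept only, the sketch below measures the 5' and 3' flanking intergenic distance of each gene from its nearest neighbours on the same contig. The function name `flanking_intergenic_lengths`, the tuple layout, and the 1-based inclusive coordinates are assumptions made for this example, not the paper's pipeline, and nested or overlapping genes are handled only by clamping negative distances to zero.

```python
from collections import defaultdict


def flanking_intergenic_lengths(genes):
    """Return {gene_id: (five_prime_fir, three_prime_fir)}.

    genes: iterable of (contig, start, end, strand, gene_id) tuples with
    1-based inclusive coordinates (hypothetical layout for this sketch).
    A distance is None when the gene has no neighbour on that side.
    """
    by_contig = defaultdict(list)
    for contig, start, end, strand, gene_id in genes:
        by_contig[contig].append((start, end, strand, gene_id))

    firs = {}
    for contig_genes in by_contig.values():
        contig_genes.sort()  # order genes by start coordinate
        for i, (start, end, strand, gene_id) in enumerate(contig_genes):
            left = right = None
            if i > 0:
                # Bases between the previous gene's end and this gene's start
                left = max(start - contig_genes[i - 1][1] - 1, 0)
            if i + 1 < len(contig_genes):
                # Bases between this gene's end and the next gene's start
                right = max(contig_genes[i + 1][0] - end - 1, 0)
            # On the minus strand the 5' end is the right-hand side
            firs[gene_id] = (left, right) if strand == "+" else (right, left)
    return firs


example_genes = [
    ("contig1", 100, 200, "+", "g1"),
    ("contig1", 301, 400, "-", "g2"),
    ("contig1", 1000, 1100, "+", "g3"),
]
example_firs = flanking_intergenic_lengths(example_genes)
```

In this toy input, `g2` sits on the minus strand, so its 5' FIR is the 599-base gap to `g3` and its 3' FIR is the 100-base gap to `g1`.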

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. Nucleic Acids Research | 1128 papers in training set | Top 2% | 10.0%
2. Microbial Genomics | 204 papers in training set | Top 0.2% | 9.1%
3. mBio | 750 papers in training set | Top 2% | 6.8%
4. Genome Biology and Evolution | 280 papers in training set | Top 0.3% | 4.8%
5. Frontiers in Microbiology | 375 papers in training set | Top 2% | 4.1%
6. mSystems | 361 papers in training set | Top 2% | 3.9%
7. BMC Biology | 248 papers in training set | Top 0.2% | 3.9%
8. The ISME Journal | 194 papers in training set | Top 0.7% | 3.6%
9. Frontiers in Cell and Developmental Biology | 218 papers in training set | Top 2% | 3.6%
10. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 3.2%
50% of probability mass above
11. Nature Communications | 4913 papers in training set | Top 42% | 3.0%
12. BMC Genomics | 328 papers in training set | Top 1% | 2.7%
13. Open Biology | 95 papers in training set | Top 0.5% | 1.9%
14. Molecular Biology and Evolution | 488 papers in training set | Top 3% | 1.7%
15. ISME Communications | 103 papers in training set | Top 1% | 1.7%
16. mSphere | 281 papers in training set | Top 3% | 1.7%
17. NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.7%
18. Communications Biology | 886 papers in training set | Top 9% | 1.7%
19. eLife | 5422 papers in training set | Top 45% | 1.5%
20. Scientific Reports | 3102 papers in training set | Top 67% | 1.2%
21. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 38% | 1.2%
22. Microbiology Resource Announcements | 22 papers in training set | Top 0.6% | 1.1%
23. iScience | 1063 papers in training set | Top 23% | 1.1%
24. Genome Biology | 555 papers in training set | Top 6% | 0.9%
25. Bioinformatics | 1061 papers in training set | Top 9% | 0.9%
26. Cell Host & Microbe | 113 papers in training set | Top 5% | 0.8%
27. Microbiology Spectrum | 435 papers in training set | Top 6% | 0.7%
28. Science Advances | 1098 papers in training set | Top 30% | 0.7%
29. Current Biology | 596 papers in training set | Top 14% | 0.7%
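
The 50% cut-off stated above the list can be checked from the listed percentages. The sketch below assumes the cut-off is the smallest number of top-ranked journals whose cumulative predicted probability reaches 50%; that definition, and the helper name `journals_to_reach_mass`, are assumptions for illustration. The inputs are the rounded percentages displayed for the first twelve journals, so the total is approximate.

```python
def journals_to_reach_mass(probabilities, threshold=50.0):
    """Return (count, cumulative) for the first rank at which the running
    sum of ranked probabilities (in percent) reaches the threshold,
    or None if the threshold is never reached."""
    cumulative = 0.0
    for count, probability in enumerate(probabilities, start=1):
        cumulative += probability
        if cumulative >= threshold:
            return count, cumulative
    return None


# Rounded percentages for ranks 1-12, copied from the list above
listed_probabilities = [10.0, 9.1, 6.8, 4.8, 4.1, 3.9, 3.9, 3.6, 3.6, 3.2, 3.0, 2.7]
cutoff = journals_to_reach_mass(listed_probabilities)
```

With these values the running total is about 49.8% after nine journals and about 53.0% after ten, which agrees with the cut-off falling at rank 10.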