
Early evolution of the prokaryotic transcription factor repertoire

Singh, I. R.; Dubey, A.; Seshasayee, A. S. N.

2026-04-11 evolutionary biology
10.64898/2026.04.08.717362 bioRxiv

Transcription initiation is regulated by proteins called transcription factors (TFs). Although TFs help determine phenotype across the tree of life, they are nonessential for minimal cellular life and are often absent in endosymbiotic and parasitic organisms. Given this, and the idea that specific transcriptional regulation is called for only at a certain level of organismal complexity, we traced the evolutionary history of the TF repertoire on a bacterio-archaeal tree of life using a dataset of ~500,000 TFs, grouped into ~1,700 orthologous groups (OGs) across ~3,000 species. The most ancestral prokaryotes encoded multiple TFs. Going by the known extant functions of these TFs, they possibly regulated sugar-fermentation metabolism, sensed overall metabolic state and redox, responded to DNA damage, or bound metals; many of these functions are consistent with some reconstructions of ancestral gene pools and physiologies. Throughout evolutionary history, the number of TFs, as well as their superfamily-level diversity, matches the expectations against genome size derived from extant bacteria, suggesting pre-LUCA diversification of TF sequence families. The emergence of new TFs along the phylogeny largely followed a smooth cumulative distribution curve, suggesting steady innovation early in prokaryote evolution. This contrasts with eukaryotes, in which a majority of TF families emerged in bursts at the ancestors of multicellular lineages. Gains of TFs late in prokaryotic evolution predominantly featured recycling of protein families discovered elsewhere in the prokaryotic tree, consistent with the dominance of horizontal gene transfer in these organisms. We compare the evolutionary trajectory of the prokaryotic TF repertoire with that of the eukaryotic TF repertoire and speculate on the difference between them, which helps in understanding how TF repertoires have evolved in the two super-kingdoms. The difference in the evolutionary dynamics of TF repertoires might be due to how complexity is envisioned in these two super-kingdoms.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Genome Biology and Evolution: 280 papers in training set, Top 0.1%, 18.5%
2. eLife: 5422 papers in training set, Top 7%, 9.1%
3. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 10%, 6.8%
4. Molecular Biology and Evolution: 488 papers in training set, Top 0.6%, 6.8%
5. BMC Ecology and Evolution: 49 papers in training set, Top 0.3%, 3.9%
6. Journal of Molecular Evolution: 21 papers in training set, Top 0.1%, 3.9%
7. PLOS Biology: 408 papers in training set, Top 3%, 3.6%

50% of probability mass above

8. PLOS Computational Biology: 1633 papers in training set, Top 10%, 3.6%
9. Scientific Reports: 3102 papers in training set, Top 44%, 2.7%
10. Open Biology: 95 papers in training set, Top 0.3%, 2.6%
11. Current Biology: 596 papers in training set, Top 8%, 2.1%
12. iScience: 1063 papers in training set, Top 13%, 1.8%
13. Genes: 126 papers in training set, Top 1%, 1.7%
14. Journal of Molecular Biology: 217 papers in training set, Top 2%, 1.7%
15. Frontiers in Ecology and Evolution: 60 papers in training set, Top 3%, 1.2%
16. Philosophical Transactions of the Royal Society B: 51 papers in training set, Top 4%, 1.2%
17. PLOS Genetics: 756 papers in training set, Top 11%, 1.2%
18. Proceedings of the Royal Society B: Biological Sciences: 341 papers in training set, Top 5%, 1.2%
19. Nature Communications: 4913 papers in training set, Top 57%, 1.2%
20. Nature Ecology & Evolution: 113 papers in training set, Top 4%, 0.9%
21. PLOS ONE: 4510 papers in training set, Top 64%, 0.9%
22. PeerJ: 261 papers in training set, Top 13%, 0.8%
23. Communications Biology: 886 papers in training set, Top 21%, 0.8%
24. Medical Research Archives: 11 papers in training set, Top 0.6%, 0.7%
25. PLOS Pathogens: 721 papers in training set, Top 9%, 0.7%
26. Frontiers in Genetics: 197 papers in training set, Top 10%, 0.7%
27. Science Advances: 1098 papers in training set, Top 30%, 0.7%
28. BMC Biology: 248 papers in training set, Top 5%, 0.7%
29. Protein Science: 221 papers in training set, Top 2%, 0.7%
30. Royal Society Open Science: 193 papers in training set, Top 5%, 0.7%
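
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked from the listed probabilities. The following is a minimal sketch, assuming the cutoff is the smallest rank at which the running total of predicted probabilities reaches 50%; the page does not say how the cutoff is computed, and the function name `cutoff_rank` is hypothetical.

```python
# Predicted probabilities (%) for ranks 1-8, copied from the journal list.
probabilities = [18.5, 9.1, 6.8, 6.8, 3.9, 3.9, 3.6, 3.6]


def cutoff_rank(probs, threshold=50.0):
    """Return the 1-based rank at which the running total first reaches
    `threshold`, or None if it is never reached (assumed cutoff rule)."""
    total = 0.0
    for rank, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return rank
    return None


print(cutoff_rank(probabilities))  # prints 7
```

Under this assumption the first six journals sum to about 49.0% and the first seven to about 52.6%, so the marker falls after rank 7, in agreement with the list.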