The genome of the toxic invasive species Heracleum sosnowskyi carries an increased number of genes despite the absence of recent whole-genome duplications

Schelkunov, M. I.; Shtratnikova, V. Y.; Klepikova, A. V.; Makarenko, M. S.; Omelchenko, D. O.; Novikova, L. A.; Obukhova, E. N.; Bogdanov, V. P.; Penin, A. A. A.; Logacheva, M. D.

2023-02-15 genomics
10.1101/2023.02.14.528432 bioRxiv

Heracleum sosnowskyi, a member of the giant hogweed group, is a plant with large effects on ecosystems and human health. It is an invasive species that contributes to the deterioration of grassland ecosystems. Its ability to produce linear furanocoumarins (FCs), which are photosensitizing compounds, makes it very dangerous. At the same time, linear FCs have high pharmaceutical value and are used in skin disease therapies. Despite this importance, the species has not been the focus of genetic and genomic studies. Here, we report a chromosome-scale assembly of the Sosnowsky's hogweed genome. Genomic analysis revealed an unusually high number of genes (55,206) in the hogweed genome, in contrast to the 25,000-35,000 found in most plants. However, we did not find any traces of recent whole-genome duplications that are not shared with its confamiliar, Daucus carota (carrot), which has approximately 30,000 genes. Analysis of the genomic proximity of duplicated genes indicates that tandem duplications are the main reason for this increase. We performed a genome-wide search for the genes of the FC biosynthesis pathway and examined their expression in aboveground plant parts. Using a combination of expression data and phylogenetic analysis, we found candidate genes for psoralen synthase and experimentally demonstrated the activity of one of them in a heterologous yeast expression system. These findings expand our knowledge of the evolution of gene space in plants and lay a foundation for further analysis of hogweed as an invasive plant and as a source of FCs.

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. The Plant Journal | 197 papers in training set | Top 0.1% | 28.2%
2. Frontiers in Plant Science | 240 papers in training set | Top 1% | 6.5%
3. Scientific Reports | 3102 papers in training set | Top 17% | 6.4%
4. New Phytologist | 309 papers in training set | Top 1% | 4.9%
5. The Plant Cell | 141 papers in training set | Top 0.8% | 3.7%
6. Horticulture Research | 43 papers in training set | Top 0.7% | 2.9%
50% of probability mass above
7. Genes | 126 papers in training set | Top 0.5% | 2.7%
8. Nature Communications | 4913 papers in training set | Top 46% | 2.1%
9. Plant Biotechnology Journal | 56 papers in training set | Top 0.5% | 2.1%
10. Plant Direct | 81 papers in training set | Top 0.9% | 2.1%
11. PLOS ONE | 4510 papers in training set | Top 50% | 1.9%
12. BMC Genomics | 328 papers in training set | Top 2% | 1.7%
13. Current Biology | 596 papers in training set | Top 9% | 1.7%
14. Frontiers in Genetics | 197 papers in training set | Top 5% | 1.7%
15. Journal of Experimental Botany | 195 papers in training set | Top 2% | 1.7%
16. Genome Biology and Evolution | 280 papers in training set | Top 1.0% | 1.7%
17. Open Biology | 95 papers in training set | Top 0.8% | 1.5%
18. PLOS Genetics | 756 papers in training set | Top 10% | 1.4%
19. Plant Communications | 35 papers in training set | Top 0.9% | 1.4%
20. Communications Biology | 886 papers in training set | Top 12% | 1.4%
21. International Journal of Biological Macromolecules | 65 papers in training set | Top 2% | 1.4%
22. Genomics | 60 papers in training set | Top 2% | 1.2%
23. Molecular Plant | 36 papers in training set | Top 1% | 1.2%
24. Peer Community Journal | 254 papers in training set | Top 3% | 0.9%
25. BMC Biology | 248 papers in training set | Top 3% | 0.8%
26. Journal of Genetics and Genomics | 36 papers in training set | Top 2% | 0.8%
27. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 7% | 0.7%
28. Plant Molecular Biology | 18 papers in training set | Top 0.4% | 0.7%
29. Plants | 39 papers in training set | Top 2% | 0.7%
30. Plant Physiology | 217 papers in training set | Top 3% | 0.7%
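
The position of the "50% of probability mass" marker in the list above can be checked by accumulating the listed probabilities in rank order. The snippet below is a minimal sketch of that arithmetic, not the site's own method: the function name `journals_to_reach` is hypothetical, and the values are the percentages of the eight top-ranked journals copied from the list.

```python
# Predicted probabilities (in percent) for the eight top-ranked journals,
# copied from the list above in rank order.
top_probabilities = [28.2, 6.5, 6.4, 4.9, 3.7, 2.9, 2.7, 2.1]


def journals_to_reach(probabilities, threshold):
    """Return how many top-ranked entries are needed for the running
    sum to reach `threshold`, or None if it is never reached.

    Hypothetical helper for illustration only.
    """
    running = 0.0
    for count, probability in enumerate(probabilities, start=1):
        running += probability
        if running >= threshold:
            return count
    return None


print(journals_to_reach(top_probabilities, 50.0))
```

The running sum stays below 50% after five journals and passes it with the sixth, which agrees with the statement that the top 6 journals account for 50% of the predicted probability mass.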