Historical plant embryos as alternative sources of ancient DNA for whole genome sequencing

Le, H. P.; Porrelli, S.; Lee, Y. K.; Juraver, S.; Pennec, F.; Nesbitt, M.; Numaguchi, K.; Gutaker, R. M.

2026-02-26 genomics
10.64898/2026.02.25.707975 bioRxiv

Natural history and agricultural collections, which contain hundreds of millions of specimens classified by time, space, and taxonomy, are valuable resources for diverse fields of research. Since the first successful isolation of ancient DNA (aDNA) in the 1980s, these repositories, including herbaria for plants, have been used intensively to support studies of taxonomy, macroevolution, and genetic responses to anthropogenic activities over the past centuries. Two major challenges of aDNA research are environmental contamination and DNA degradation. For herbarium specimens, aDNA is usually extracted from leaf samples. It is highly fragmented (typically 50 to 100 bp in length), with a higher breakdown rate than that in most bone remains. To optimise the amount of data retrieved and minimise destructive sampling, we isolated DNA from an unconventional plant tissue type: seed embryos. We carried out whole-genome sequencing and compared the quality of sequenced DNA between embryo and leaf tissue. We evaluated endogenous DNA proportion, median fragment length, damage fraction per site (λ), decay rates, nucleotide misincorporations, and library complexity for three species: cultivated rice Oryza sativa, wild rice O. rufipogon, and wild barley Hordeum spontaneum. In O. sativa, embryos exhibited significantly higher endogenous content and median fragment length than leaves, while in O. rufipogon only median fragment length was higher. The superior DNA preservation was likely due to the protective role of the seed husk, which might play an important role in DNA preservation in plants collected in the tropics. By contrast, in temperate H. spontaneum, tissue type had minimal impact on DNA quality. Despite the minuscule size of the embryos, all derived genomic libraries were highly complex, sufficient for deep whole-genome sequencing. These results highlight seed embryos as a promising alternative aDNA source for millions of herbarium specimens, and enable effective genomic analyses of other historical plant collections, such as economic botany and anthropological museum collections.
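The abstract names median fragment length, damage fraction per site (λ), and decay rate as quality metrics but does not say how they were computed. The sketch below shows one commonly used approach, offered as an assumption rather than as the authors' pipeline: treat the declining tail of the fragment length distribution as exponential, take λ as the negative slope of log(count) against length, and divide λ by specimen age to get a per-year decay rate. The function name `summarize_fragments` and its parameters are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def summarize_fragments(lengths, age_years=None):
    """Summarise a list of fragment lengths (bp) from mapped reads.

    Assumption (not taken from the abstract): the declining part of the
    length distribution is exponential, so lambda is the negative slope
    of a least-squares line through (length, log(count)).
    """
    counts = Counter(lengths)
    # The most frequent length marks the start of the declining tail.
    mode = max(counts, key=counts.get)
    xs = [length for length in sorted(counts) if length >= mode]
    ys = [math.log(counts[length]) for length in xs]
    if len(xs) < 2:
        raise ValueError("need at least two distinct lengths at or above the mode")

    # Ordinary least-squares slope of log(count) on length.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    lam = -sxy / sxx

    result = {"median_length": median(lengths), "lambda": lam}
    if age_years:
        # Per-site, per-year decay rate under the same exponential assumption.
        result["decay_rate_per_year"] = lam / age_years
    return result
```

In practice the input lengths would come from the aligned reads of one library (for example, the inferred insert sizes of merged read pairs), and embryo and leaf libraries from the same specimen would be summarised separately before comparison.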

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1 | Scientific Reports | 3102 papers in training set | Top 2% | 14.4%
2 | PLOS ONE | 4510 papers in training set | Top 22% | 8.4%
3 | Scientific Data | 174 papers in training set | Top 0.2% | 6.8%
4 | DNA Research | 23 papers in training set | Top 0.1% | 4.9%
5 | Molecular Ecology Resources | 161 papers in training set | Top 0.3% | 4.0%
6 | Genome Biology | 555 papers in training set | Top 2% | 4.0%
7 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 2% | 2.7%
8 | Frontiers in Genetics | 197 papers in training set | Top 3% | 2.7%
9 | The Plant Journal | 197 papers in training set | Top 2% | 2.6%
50% of probability mass above
10 | GigaScience | 172 papers in training set | Top 0.8% | 2.5%
11 | Science | 429 papers in training set | Top 11% | 2.4%
12 | Genes | 126 papers in training set | Top 0.5% | 2.4%
13 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 35% | 1.5%
14 | International Journal of Biological Macromolecules | 65 papers in training set | Top 2% | 1.3%
15 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.2%
16 | Emerging Infectious Diseases | 103 papers in training set | Top 2% | 1.2%
17 | Cell | 370 papers in training set | Top 14% | 1.2%
18 | eLife | 5422 papers in training set | Top 49% | 1.2%
19 | Journal of Genetics and Genomics | 36 papers in training set | Top 1% | 1.2%
20 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 7% | 1.1%
21 | iScience | 1063 papers in training set | Top 26% | 0.9%
22 | International Journal of Molecular Sciences | 453 papers in training set | Top 13% | 0.9%
23 | Frontiers in Plant Science | 240 papers in training set | Top 5% | 0.9%
24 | Gigabyte | 60 papers in training set | Top 1% | 0.8%
25 | Cell Genomics | 162 papers in training set | Top 6% | 0.7%
26 | Frontiers in Ecology and Evolution | 60 papers in training set | Top 4% | 0.7%
27 | BMC Genomics | 328 papers in training set | Top 6% | 0.7%
28 | Communications Biology | 886 papers in training set | Top 24% | 0.7%
29 | Biology | 43 papers in training set | Top 3% | 0.7%
30 | American Journal of Biological Anthropology | 11 papers in training set | Top 0.3% | 0.7%