metaJAM: a Nextflow integrated metagenomic workflow for sedimentary ancient DNA

Johnson, E.; Jin, C.; Guinet, B.; Alumbaugh, J.; Martin, N. L.

2026-05-07 bioinformatics
10.64898/2026.05.05.722689 bioRxiv

The application of metagenomics in ancient DNA (aDNA) research is rapidly expanding, driven in particular by advances in sedimentary aDNA research and sequencing technologies. Although many ancient DNA studies rely on broadly similar bioinformatic strategies, there is still no single standardized, widely adopted workflow. The resulting differences between workflows can directly affect how efficiently past biodiversity is reconstructed and authenticated from the various archives analyzed with ancient metagenomic approaches. Although a few pipelines process ancient DNA data from shotgun sequencing, those designed for metagenomic datasets are scarce and often resource-intensive or challenging to install, update, or extend with new tools and parameters. metaJAM, a scalable and user-friendly pipeline, is presented here to specifically address the challenges of metagenomic aDNA analyses of eukaryotes. The pipeline is implemented in Nextflow to facilitate continuous development and can be used on different high-performance computing (HPC) clusters. metaJAM integrates all key steps required for ancient DNA metagenomic analyses: pre-processing of raw sequencing data, microbial filtering, taxonomic assignment via competitive iterative mapping against Bowtie 2 reference indexes, and reassignment using lowest common ancestor (LCA) inference. Validation and authentication are performed using the post-LCA toolkit bamdam together with alignment to an exhaustive reference database using MMseqs2. The pipeline lets users choose among alternative tools and generates a series of plots to support data visualization and taxon authentication.
metaJAM differs from existing pipelines in several respects: rigorous filtering of microbial-like reads through Kraken 2 classification and masking of microbial-like regions; iterative or parallel Bowtie 2 mapping; validation of the detected taxa; integration of up-to-date tools for ancient metagenomic analysis; and diagnostic plots that help users assess the reliability of taxonomic assignments and visualize their data. It accommodates limited computational resources and customised databases for taxonomic groups, and provides an accessible workflow to support the investigation of metagenomic ancient DNA datasets. Its applications span a range of contexts, from ecosystem reconstructions in environmental aDNA archives such as sediments, to metagenomic studies on archaeological artefacts and even taxonomic identification of undiagnosed biological materials.
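
The idea behind lowest common ancestor (LCA) reassignment mentioned in the abstract can be illustrated with a minimal Python sketch. It is not metaJAM's implementation: the toy taxonomy, the `PARENT` table, and the function names are all hypothetical, and real tools operate on a full reference taxonomy and typically apply alignment-similarity thresholds that are omitted here. The sketch only shows how a read that aligns to several taxa is placed at the deepest node shared by all of them.

```python
# Toy taxonomy as a child -> parent map (hypothetical, for illustration only).
PARENT = {
    "Mammuthus primigenius": "Mammuthus",
    "Mammuthus": "Elephantidae",
    "Elephas maximus": "Elephas",
    "Elephas": "Elephantidae",
    "Elephantidae": "Mammalia",
    "Bison bison": "Bison",
    "Bison": "Bovidae",
    "Bovidae": "Mammalia",
    "Mammalia": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, starting with the taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    # Intersect the ancestor sets of every taxon the read aligned to.
    shared = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        shared &= set(lineage(taxon))
    # Walking the first lineage from the leaf upward, the first shared
    # node encountered is the lowest common ancestor.
    for node in lineage(taxa[0]):
        if node in shared:
            return node
    return None  # the taxa share no ancestor in this toy tree


# A read hitting two elephantid species is reassigned to the family level.
print(lca(["Mammuthus primigenius", "Elephas maximus"]))
```

A read that aligns to a single taxon keeps that assignment, while a read that aligns equally well to distantly related taxa is pushed up to a higher rank, which is why LCA assignments are conservative.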

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Bioinformatics: 1061 papers in training set, Top 0.9%, 26.1%
2. Genome Biology: 555 papers in training set, Top 1%, 6.4%
3. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 0.6%, 6.4%
4. BMC Bioinformatics: 383 papers in training set, Top 2%, 6.4%
5. PLOS ONE: 4510 papers in training set, Top 31%, 4.9%
50% of probability mass above
6. Nucleic Acids Research: 1128 papers in training set, Top 4%, 4.9%
7. Briefings in Bioinformatics: 326 papers in training set, Top 1%, 4.3%
8. Microbiome: 139 papers in training set, Top 1.0%, 3.6%
9. Bioinformatics Advances: 184 papers in training set, Top 2%, 3.1%
10. Methods in Ecology and Evolution: 160 papers in training set, Top 1%, 2.6%
11. NAR Genomics and Bioinformatics: 214 papers in training set, Top 1%, 2.6%
12. Molecular Ecology Resources: 161 papers in training set, Top 0.5%, 2.1%
13. GigaScience: 172 papers in training set, Top 0.9%, 2.1%
14. Cell Reports Methods: 141 papers in training set, Top 2%, 1.9%
15. Scientific Reports: 3102 papers in training set, Top 57%, 1.7%
16. Nature Communications: 4913 papers in training set, Top 54%, 1.5%
17. iScience: 1063 papers in training set, Top 26%, 0.9%
18. Frontiers in Bioinformatics: 45 papers in training set, Top 0.7%, 0.8%
19. Scientific Data: 174 papers in training set, Top 2%, 0.8%
20. Viruses: 318 papers in training set, Top 5%, 0.8%
21. Journal of Proteome Research: 215 papers in training set, Top 2%, 0.8%
22. Nature Biotechnology: 147 papers in training set, Top 8%, 0.8%
23. Gigabyte: 60 papers in training set, Top 2%, 0.7%
24. PLOS Computational Biology: 1633 papers in training set, Top 27%, 0.6%
25. SoftwareX: 15 papers in training set, Top 0.5%, 0.6%
26. PeerJ: 261 papers in training set, Top 17%, 0.6%
27. Journal of Open Source Software: 22 papers in training set, Top 0.3%, 0.6%
28. International Journal of Molecular Sciences: 453 papers in training set, Top 17%, 0.6%
29. Peer Community Journal: 254 papers in training set, Top 4%, 0.6%
30. Open Research Europe: 14 papers in training set, Top 0.3%, 0.5%
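
The 50% cutoff stated at the top of the ranking can be checked with a short Python sketch that accumulates the listed probabilities in rank order. The percentages are copied from the first ten entries of the list; the function name `journals_to_reach` is a hypothetical helper, and because the displayed values are rounded, the running total is only approximate.

```python
# Predicted probabilities (percent) for ranks 1-10, copied from the list.
probs = [26.1, 6.4, 6.4, 6.4, 4.9, 4.9, 4.3, 3.6, 3.1, 2.6]


def journals_to_reach(probabilities, threshold):
    """Return (rank, cumulative total) at which the running sum first
    reaches the threshold, or None if it never does."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None


rank, total = journals_to_reach(probs, 50.0)
print(f"{rank} journals cover about {total:.1f}% of the probability mass")
```

The first four journals sum to about 45.3%, so the fifth entry is the one that pushes the cumulative total past the 50% mark.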