
A study of k-mer patterns within DNA molecules sequenced from a chondrite

Farage, C.; Bachelet, I.

2026-01-27 bioinformatics
10.64898/2026.01.26.701670 bioRxiv

The known tree of life occupies an infinitesimal region of the space of all mathematically possible evolutionary histories, yet our sequence analysis frameworks are implicitly calibrated to it and to its compositional and grammatical regularities. Here we analyze nucleic acid molecules sequenced from the Zag meteorite, as part of a broader effort to understand how nucleic acid sequence composition and higher-order structure are shaped in chemically divergent environments. We characterize these sequences across multiple analytical layers and show that they lack signatures of protein-coding organization, translational periodicity, or known biological grammar. At the same time, they deviate significantly from both random and composition-only null models, displaying constrained complexity and low-dimensional structure in k-mer frequency space. Multiple tests indicate that amplification- and sequencing-driven artifacts and metagenomic contamination are unlikely explanations. Taken together, these findings indicate that the Zag sequences occupy an unusual region of sequence space that is not readily accounted for by known biological or technical models. This narrows, but does not resolve, the range of plausible explanations and motivates independent replication and further investigation.
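The comparison against a composition-only null model mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the default k, the L1 distance, and the mononucleotide-shuffle null are all illustrative assumptions. A mononucleotide shuffle preserves base composition but destroys all ordering, so a sequence whose k-mer profile lies far outside the shuffled distribution carries structure beyond its composition.

```python
import random
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_profile(seq: str, k: int) -> list:
    """Relative frequencies over all 4**k k-mers, in lexicographic ACGT order."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values()) or 1  # avoid division by zero for short input
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def l1_distance(p: list, q: list) -> float:
    """L1 distance between two frequency vectors (an illustrative choice)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def composition_null_test(seq: str, k: int = 3, n_shuffles: int = 200, seed: int = 0):
    """Hypothetical test: compare the observed k-mer profile with profiles of
    composition-preserving shuffles. Returns (observed distance from the mean
    null profile, empirical p-value)."""
    rng = random.Random(seed)
    observed = kmer_profile(seq, k)
    bases = list(seq)
    null_profiles = []
    for _ in range(n_shuffles):
        rng.shuffle(bases)  # keeps base counts, randomizes order
        null_profiles.append(kmer_profile("".join(bases), k))
    mean_null = [sum(p[j] for p in null_profiles) / n_shuffles
                 for j in range(len(observed))]
    obs_dist = l1_distance(observed, mean_null)
    null_dists = [l1_distance(p, mean_null) for p in null_profiles]
    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (1 + sum(d >= obs_dist for d in null_dists)) / (n_shuffles + 1)
    return obs_dist, p_value
```

For example, a strictly periodic sequence such as `"ACGT" * 250` has only four distinct trinucleotides, so its profile sits far from every shuffle and the empirical p-value reaches its floor of 1/201. Stricter nulls, such as dinucleotide-preserving shuffles, would test for structure beyond pairwise composition; this sketch uses only the simplest one.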

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

1. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 4%): 12.5%
2. PLOS ONE (4510 papers in training set; Top 27%): 6.4%
3. Scientific Reports (3102 papers in training set; Top 17%): 6.4%
4. Genome Biology (555 papers in training set; Top 1%): 6.4%
5. Molecular Biology and Evolution (488 papers in training set; Top 0.9%): 4.9%
6. Journal of Structural Biology (58 papers in training set; Top 0.3%): 4.3%
7. mSystems (361 papers in training set; Top 3%): 3.6%
8. PLOS Computational Biology (1633 papers in training set; Top 12%): 2.5%
9. Science (429 papers in training set; Top 13%): 1.8%
10. PeerJ (261 papers in training set; Top 6%): 1.8%
11. Nature Communications (4913 papers in training set; Top 51%): 1.7%
12. npj Microgravity (11 papers in training set; Top 0.1%): 1.7%
13. iScience (1063 papers in training set; Top 17%): 1.5%
14. Nano Letters (63 papers in training set; Top 2%): 1.5%
15. Cell Systems (167 papers in training set; Top 8%): 1.5%
16. Microbiome (139 papers in training set; Top 2%): 1.3%
17. eLife (5422 papers in training set; Top 47%): 1.3%
18. Genome Biology and Evolution (280 papers in training set; Top 1%): 1.3%
19. RNA (169 papers in training set; Top 0.3%): 1.2%
20. Frontiers in Microbiology (375 papers in training set; Top 7%): 1.2%
21. Mitochondrion (11 papers in training set; Top 0.1%): 1.2%
22. PNAS Nexus (147 papers in training set; Top 0.7%): 1.1%
23. Journal of Molecular Evolution (21 papers in training set; Top 0.3%): 1.0%
24. Journal of The Royal Society Interface (189 papers in training set; Top 4%): 0.9%
25. Genetics (225 papers in training set; Top 3%): 0.9%
26. Frontiers in Genetics (197 papers in training set; Top 9%): 0.8%
27. Cell Reports (1338 papers in training set; Top 31%): 0.8%
28. Computational and Structural Biotechnology Journal (216 papers in training set; Top 8%): 0.8%
29. PLOS Genetics (756 papers in training set; Top 14%): 0.8%
30. Communications Biology (886 papers in training set; Top 21%): 0.8%