MirMachine 2: a scalable, evolutionarily informed pipeline for microRNA annotation and comparative genomics across thousands of animal genomes

Paynter, V. M.; Umu, S. U.; Tierney, J. A. S.; Tricomi, F. F.; Haggerty, L.; Fromm, B.

2026-05-21 bioinformatics
10.64898/2026.05.19.726197 bioRxiv

Genome sequencing is rapidly outpacing the annotation of conserved regulatory elements, limiting the evolutionary and comparative insights that can be extracted from expanding genome collections. MicroRNAs are among the most conserved and phylogenetically informative genes, yet automated annotation has remained difficult to scale while preserving evolutionary interpretability. Here we present MirMachine 2, an evolutionarily informed framework that combines curated reference models, lineage-aware scoring, and adaptive filtering to enable robust genome-wide microRNA annotation at scale. Applying this to thousands of animal genomes reveals that many apparent absences of conserved microRNAs reflect methodological bias rather than biological loss, particularly in underrepresented lineages. By enabling consistent and interpretable comparison of microRNA complements across large datasets, MirMachine 2 establishes scalable microRNA annotation as a practical foundation for genome-scale evolutionary and comparative genomics.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Nucleic Acids Research: 1128 papers in training set; Top 0.9%; 14.2%
2. Nature Biotechnology: 147 papers in training set; Top 0.5%; 12.4%
3. Genome Biology: 555 papers in training set; Top 0.3%; 10.3%
4. Nature Communications: 4913 papers in training set; Top 23%; 8.3%
5. Nature: 575 papers in training set; Top 4%; 6.8%

50% of probability mass above

6. Nature Methods: 336 papers in training set; Top 2%; 4.8%
7. Science: 429 papers in training set; Top 6%; 4.8%
8. Cell Systems: 167 papers in training set; Top 3%; 4.3%
9. PLOS ONE: 4510 papers in training set; Top 39%; 3.6%
10. Nature Genetics: 240 papers in training set; Top 2%; 3.6%
11. Cell Genomics: 162 papers in training set; Top 2%; 3.0%
12. Genome Medicine: 154 papers in training set; Top 3%; 3.0%
13. PLOS Computational Biology: 1633 papers in training set; Top 21%; 0.9%
14. Cell: 370 papers in training set; Top 16%; 0.9%
15. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 42%; 0.9%
16. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 5%; 0.9%
17. Scientific Reports: 3102 papers in training set; Top 73%; 0.8%
18. Molecular Biology and Evolution: 488 papers in training set; Top 4%; 0.8%
19. Genome Research: 409 papers in training set; Top 4%; 0.8%
20. Molecular Cell: 308 papers in training set; Top 10%; 0.7%
21. Bioinformatics: 1061 papers in training set; Top 10%; 0.7%
22. Nature Machine Intelligence: 61 papers in training set; Top 4%; 0.7%
23. Bioinformatics Advances: 184 papers in training set; Top 5%; 0.7%
24. NAR Genomics and Bioinformatics: 214 papers in training set; Top 4%; 0.6%