
In silico restriction site analysis of whole genome sequences shows patterns caused by selection and sequence duplications

Vedder, L.; Schoof, H.

2026-05-16 genomics
10.64898/2026.05.15.725336 bioRxiv

Biological sequences are known to be non-random. Comparing the in silico restriction fragment distributions of random and biological sequences may therefore serve as an indicator of this non-randomness. Our analyses show that, for most of the tested combinations of restriction enzyme and genome sequence, the number of fragments per megabase of the biological sequence deviates by more than 10% from that of the corresponding random sequence. This deviation occurs in both directions, i.e. clearly increased values are as common as clearly decreased values. Although there is no species- or restriction-enzyme-specific effect, the GC content of both the restriction site and the genome sequence has a clear impact. In contrast to the random sequences, the genome sequences show distinct peaks in their fragment length distributions, hinting at repetitive elements such as transposons.
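The abstract does not describe the authors' pipeline, so the following is only a minimal sketch of the kind of comparison it summarizes: digest a sequence in silico, count fragments per megabase, and compare against a random sequence. All function names are hypothetical. Two assumptions are made here that the source does not confirm: the random reference is drawn base by base with the same GC content as the genome, and the cut is placed at the first base of each recognition site. Only one strand is scanned, which is sufficient for palindromic sites such as GAATTC (EcoRI); a non-palindromic site would also need its reverse complement.

```python
import random
import re


def fragment_lengths(seq, site):
    """Fragment lengths from a complete digest of a linear sequence.

    Simplification (assumption): each cut is placed at the first base of a
    site occurrence; overlapping occurrences are found via a lookahead.
    """
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]
    # A site at position 0 would yield an empty fragment, so it is skipped.
    bounds = [0] + [c for c in cuts if c > 0] + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def fragments_per_mb(seq, site):
    """Number of fragments normalized to one megabase of sequence."""
    return len(fragment_lengths(seq, site)) / len(seq) * 1e6


def random_sequence(length, gc, rng):
    """Independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1-gc)/2.

    This null model is an assumption, not necessarily the authors' choice.
    """
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def expected_sites_per_mb(site, gc):
    """Expected site occurrences per megabase under the null model above."""
    p = 1.0
    for base in site:
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return p * 1e6


def relative_deviation(observed, expected):
    """Signed deviation of observed from expected, as a fraction."""
    return (observed - expected) / expected
```

Under this null model, a 6-base site in a sequence with 50% GC content is expected about 244 times per megabase (10^6 / 4^6), and the expectation shifts with the GC content of both the site and the sequence. A genome could then be flagged as deviating when `relative_deviation(fragments_per_mb(genome, site), fragments_per_mb(random_ref, site))` exceeds 0.10 in absolute value, mirroring the 10% threshold mentioned in the abstract.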

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Scientific Reports, 3102 papers in training set, Top 0.8%, 18.7%
2. Peer Community Journal, 254 papers in training set, Top 0.1%, 14.4%
3. PLOS ONE, 4510 papers in training set, Top 18%, 10.2%
4. BMC Genomics, 328 papers in training set, Top 0.4%, 4.9%
5. Frontiers in Microbiology, 375 papers in training set, Top 3%, 3.6%
50% of probability mass above
6. PeerJ, 261 papers in training set, Top 2%, 3.6%
7. Frontiers in Genetics, 197 papers in training set, Top 2%, 3.1%
8. Genes, 126 papers in training set, Top 0.6%, 2.1%
9. Frontiers in Plant Science, 240 papers in training set, Top 3%, 1.9%
10. BMC Bioinformatics, 383 papers in training set, Top 4%, 1.8%
11. Biology, 43 papers in training set, Top 0.7%, 1.7%
12. Genomics, 60 papers in training set, Top 1.0%, 1.7%
13. Applied Microbiology and Biotechnology, 26 papers in training set, Top 0.1%, 1.7%
14. Gene, 41 papers in training set, Top 1%, 1.5%
15. Gene Reports, 13 papers in training set, Top 0.3%, 1.5%
16. International Journal of Molecular Sciences, 453 papers in training set, Top 8%, 1.5%
17. Journal of Molecular Evolution, 21 papers in training set, Top 0.2%, 1.5%
18. Heliyon, 146 papers in training set, Top 3%, 1.2%
19. G3 Genes|Genomes|Genetics, 351 papers in training set, Top 2%, 1.1%
20. Viruses, 318 papers in training set, Top 4%, 0.9%
21. Microorganisms, 101 papers in training set, Top 2%, 0.8%
22. Molecular Biology Reports, 19 papers in training set, Top 0.5%, 0.8%
23. Biology Methods and Protocols, 53 papers in training set, Top 2%, 0.8%
24. Molecular Genetics and Genomics, 11 papers in training set, Top 0.4%, 0.8%
25. Genome Biology and Evolution, 280 papers in training set, Top 2%, 0.8%
26. Frontiers in Bioinformatics, 45 papers in training set, Top 1%, 0.7%
27. BMC Microbiology, 35 papers in training set, Top 2%, 0.6%
28. Journal of Bioinformatics and Systems Biology, 14 papers in training set, Top 0.9%, 0.6%
29. Computational and Structural Biotechnology Journal, 216 papers in training set, Top 11%, 0.6%
30. Archives of Clinical and Biomedical Research, 28 papers in training set, Top 3%, 0.6%