
Genome-mining algorithm to identify identical repetitive sequences for sensitive and specific diagnostic assays for infectious diseases

Rajeswari, K.; Poojary, R.; Padiwal, S.; Krishna, R. M.; Satyamoorthy, K.; Paul, B.

2024-12-22 bioinformatics
10.1101/2024.12.20.629856 bioRxiv

Nucleic acid amplification-based approaches are extensively used as the first-line diagnostic choice for infectious diseases. However, the success of DNA amplification or hybridization techniques depends heavily on short primer or probe sequences. A primer pair that binds at multiple loci across the genome and amplifies multiple copies increases the analytical sensitivity of currently used diagnostic assays. Herein, we developed a novel genome-mining algorithm to identify short identical repeat sequences (IRSs) dispersed across the genome, which can amplify multiple nonhomologous regions of variable sizes via three potential priming combinations. Using this algorithm, we analysed the genomes of five pathogens, namely gammaherpesvirus, vaccinia virus, Mycobacterium tuberculosis, Plasmodium falciparum, and Phytophthora palmivora, and identified short identical sequences repeated at multiple loci. In silico PCR revealed that these IRSs can amplify multiple copies with different amplicon sizes in these five species. We further performed a polymerase chain reaction assay with short identical repeat pairs identified from M. tuberculosis. Notably, the amplification yielded multiple amplicons with individual IRSs and even more with a pair of IRSs. These results indicate that the IRS-based approach can detect pathogens at low DNA concentrations during disease progression. The genome-mining algorithm can serve as a translational technology platform for developing highly sensitive PCR, microarray, loop-mediated isothermal amplification, fluorescence in situ hybridization, and DNA-DNA hybridization-based diagnostic assays.
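The abstract describes two computational steps: mining a genome for short identical sequences that recur at several loci, and in silico PCR to predict the amplicons such sequences could produce. The Python sketch below illustrates both under simplifying assumptions of our own, not the authors' algorithm: exact matching only (no mismatches, melting temperature, or primer-dimer checks), repeats counted on both strands via canonical k-mers, and any primer in the reaction free to act as a forward or reverse primer. The function names, the toy sequence, and the size window (`min_len`, `max_len`) are hypothetical.

```python
from collections import defaultdict

ACGT = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_identical_repeats(genome: str, k: int, min_copies: int = 2) -> dict[str, list[tuple[int, str]]]:
    """Map each canonical k-mer to its (position, strand) hits, keeping those
    seen at >= min_copies loci. Hypothetical helper; exact matches only."""
    genome = genome.upper()
    hits = defaultdict(list)
    for i in range(len(genome) - k + 1):
        kmer = genome[i:i + k]
        if not set(kmer) <= ACGT:
            continue  # skip windows containing N or other ambiguity codes
        canonical = min(kmer, revcomp(kmer))  # same key for both orientations
        strand = "+" if kmer == canonical else "-"
        hits[canonical].append((i, strand))
    return {kmer: loci for kmer, loci in hits.items() if len(loci) >= min_copies}


def find_all(genome: str, pattern: str) -> list[int]:
    """All start positions of pattern in genome, overlapping matches included."""
    hits, start = [], genome.find(pattern)
    while start != -1:
        hits.append(start)
        start = genome.find(pattern, start + 1)
    return hits


def predict_products(genome: str, primers: list[str], min_len: int = 50, max_len: int = 3000) -> list[int]:
    """Sketch of exact-match in silico PCR: every primer may prime from either
    strand, so products arise from one primer alone or from any primer pair."""
    genome = genome.upper()
    fwd_starts, rev_ends = [], []
    for p in primers:
        p = p.upper()
        # Primer sequence found on the + strand: it extends rightwards from its start.
        fwd_starts += find_all(genome, p)
        # Reverse complement found on the + strand: primer extends leftwards; product ends here.
        rev_ends += [j + len(p) for j in find_all(genome, revcomp(p))]
    sizes = [e - s for s in fwd_starts for e in rev_ends if min_len <= e - s <= max_len]
    return sorted(sizes)


if __name__ == "__main__":
    irs = "GATTACAGG"  # hypothetical toy primer, not from the paper
    toy = "C" * 10 + irs + "C" * 100 + revcomp(irs) + "C" * 10
    print(find_identical_repeats(toy, k=9)[min(irs, revcomp(irs))])
    print(predict_products(toy, [irs]))  # [118]
```

Because a single IRS can anneal to opposite strands at different loci, `predict_products(genome, [irs])` can already return products; adding a second IRS contributes cross-primed products as well, which is one way to picture why a pair might yield more bands than either sequence alone.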

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Scientific Reports (3102 papers in training set; top 6%; predicted probability 10.2%)
2. BMC Bioinformatics (383 papers in training set; top 1%; predicted probability 6.9%)
3. Genomics, Proteomics & Bioinformatics (171 papers in training set; top 1.0%; predicted probability 6.4%)
4. PLOS ONE (4510 papers in training set; top 27%; predicted probability 6.4%)
5. Briefings in Bioinformatics (326 papers in training set; top 0.8%; predicted probability 6.4%)
6. Journal of Bioinformatics and Systems Biology (14 papers in training set; top 0.1%; predicted probability 4.9%)
7. Frontiers in Microbiology (375 papers in training set; top 2%; predicted probability 4.9%)
8. Microbiology Spectrum (435 papers in training set; top 1%; predicted probability 2.8%)
9. NAR Genomics and Bioinformatics (214 papers in training set; top 1%; predicted probability 2.1%)
50% of probability mass above
10. BMC Genomics (328 papers in training set; top 2%; predicted probability 2.1%)
11. Analytical Chemistry (205 papers in training set; top 1%; predicted probability 2.1%)
12. Talanta (12 papers in training set; top 0.3%; predicted probability 1.8%)
13. BioTechniques (24 papers in training set; top 0.1%; predicted probability 1.8%)
14. Bioinformatics (1061 papers in training set; top 7%; predicted probability 1.7%)
15. Nucleic Acids Research (1128 papers in training set; top 11%; predicted probability 1.7%)
16. Viruses (318 papers in training set; top 3%; predicted probability 1.3%)
17. PeerJ (261 papers in training set; top 9%; predicted probability 1.3%)
18. Journal of Genetics and Genomics (36 papers in training set; top 1%; predicted probability 1.1%)
19. Clinical Chemistry (22 papers in training set; top 0.6%; predicted probability 1.0%)
20. Frontiers in Plant Science (240 papers in training set; top 5%; predicted probability 0.9%)
21. ACS Synthetic Biology (256 papers in training set; top 2%; predicted probability 0.9%)
22. Frontiers in Genetics (197 papers in training set; top 8%; predicted probability 0.9%)
23. mSystems (361 papers in training set; top 6%; predicted probability 0.9%)
24. PLOS Computational Biology (1633 papers in training set; top 23%; predicted probability 0.8%)
25. ACS Omega (90 papers in training set; top 4%; predicted probability 0.8%)
26. SLAS Technology (11 papers in training set; top 0.3%; predicted probability 0.8%)
27. Heliyon (146 papers in training set; top 6%; predicted probability 0.8%)
28. Synthetic and Systems Biotechnology (10 papers in training set; top 0.5%; predicted probability 0.8%)
29. Computational and Structural Biotechnology Journal (216 papers in training set; top 9%; predicted probability 0.8%)
30. Analytica Chimica Acta (17 papers in training set; top 0.6%; predicted probability 0.8%)