Genome-mining algorithm to identify identical repetitive sequences for sensitive and specific diagnostic assays for infectious diseases

Rajeswari, K.; Poojary, R.; Padiwal, S.; Krishna, R. M.; Satyamoorthy, K.; Paul, B.

2024-12-22 bioinformatics

10.1101/2024.12.20.629856 bioRxiv


Nucleic acid amplification-based approaches are widely used as the first-line choice for diagnosing infectious diseases. However, the success rates of DNA amplification and hybridization techniques depend heavily on short primer or probe sequences. A pair of primers that can bind at multiple loci across the genome and randomly amplify multiple copies increases the analytical sensitivity of currently used diagnostic assays. Herein, we developed a novel genome mining algorithm to identify short identical repeat sequences (IRSs) dispersed across the genome, which can amplify multiple nonhomologous regions of variable sizes via three potential priming combinations. Using this algorithm, we analysed the genomes of five pathogens, namely, gammaherpesvirus, vaccinia virus, Mycobacterium tuberculosis, Plasmodium falciparum, and Phytophthora palmivora, and identified short identical sequences that were repeated at multiple loci. In silico PCR revealed that these identical repeat sequences can amplify multiple copies with different amplicon sizes in all five species. We further performed a polymerase chain reaction assay with short identical repeat pairs identified from M. tuberculosis. Notably, amplification yielded multiple copies for individual IRSs and even more copies when the IRSs were used as a pair. These results indicate that the IRS-based approach can detect pathogens during disease progression even when the DNA concentration is low. The genome mining algorithm can be used as a translational technology platform for developing highly sensitive varieties of PCR, microarray, loop-mediated isothermal amplification, fluorescence in situ hybridization, and DNA-DNA hybridization-based diagnostic assays.
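The abstract does not describe how the algorithm works internally, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes exact-match k-mer counting on one strand to find identical repeats, and a simplified in silico PCR that requires perfect primer matches. The function names, the `max_len` cutoff, and the reading of the "three priming combinations" as IRS A alone, IRS B alone, and A paired with B are all assumptions made for this example.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_identical_repeats(genome, k, min_copies=2):
    """Map every k-mer seen at least min_copies times to its start positions.

    Simplification (assumption): only the given strand is scanned.
    """
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: locs for kmer, locs in positions.items()
            if len(locs) >= min_copies}


def find_all(text, pattern):
    """Start positions of every (possibly overlapping) exact match."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def amplicon_sizes(genome, fwd, rev, max_len=3000):
    """Sizes of products with fwd on the given strand and rev on the opposite one.

    A product needs fwd upstream of the reverse complement of rev,
    without the two primer sites overlapping. max_len is a hypothetical cutoff.
    """
    starts = find_all(genome, fwd)
    ends = [j + len(rev) for j in find_all(genome, reverse_complement(rev))]
    return sorted(e - s for s in starts for e in ends
                  if len(fwd) + len(rev) <= e - s <= max_len)


def priming_combinations(genome, irs_a, irs_b, max_len=3000):
    """Product sizes for A alone, B alone, and the A/B pair (assumed reading)."""
    return {
        "A": amplicon_sizes(genome, irs_a, irs_a, max_len),
        "B": amplicon_sizes(genome, irs_b, irs_b, max_len),
        "A+B": sorted(amplicon_sizes(genome, irs_a, irs_b, max_len)
                      + amplicon_sizes(genome, irs_b, irs_a, max_len)),
    }
```

On a real genome, a practical tool would also need to tolerate primer mismatches, scan both strands for repeats, and filter candidates for specificity against non-target organisms; none of that is attempted here.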

