Design of DNA Aptamers for Lyme disease Diagnosis Combining experimental and numerical approaches

GAYRAUD, G.; Davila Felipe, M.; Padiolleau-Lefevre, S.; Maffucci, I.; Issouani, E. M.; Guerin, M.; Da Ponte, H.

2026-05-15 bioinformatics
10.64898/2026.05.13.724892 bioRxiv

Aptamers are single-stranded DNA or RNA molecules selected, like antibodies, for their high affinity and specificity toward target molecules. They are commonly obtained through the SELEX process, which iteratively exposes a random sequence library to a target and retains the sequences that bind well. To improve Lyme disease detection, we propose designing aptamers that specifically bind the CspZ protein on the surface of Borrelia burgdorferi, the bacterium responsible for the disease. Starting from a thirteen-round SELEX experiment that yielded in vitro sequence candidates, we propose a holistic process that selects new sequence candidates in silico and then validates them experimentally. Our approach relies on 1) using Machine Learning (ML) techniques, specifically a Restricted Boltzmann Machine (RBM), to digitally replicate the last round of the SELEX process; 2) integrating insights from text analysis methods, such as word2vec and n-grams, into the RBM model trained on the final-round SELEX dataset, in order to represent newly generated sequences and compare them with the in vitro candidates; 3) selecting in silico sequences with strong potential to bind the CspZ protein; and 4) experimentally validating the sequences selected in step 3. The approach combines biological insights with statistical models to improve the efficiency and outcome of the SELEX process. We enhance the RBM model, designed to replicate the distribution of the final SELEX round, by integrating geometric representations of sequences, which is especially advantageous when the dataset is small relative to the vast sequence space. The approach also provides in silico sequence candidates with strong binding properties.
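
The abstract mentions n-grams as one of the text analysis methods used to represent generated sequences and compare them with in vitro candidates. The Python sketch below illustrates that general idea only and is not the authors' implementation: it counts overlapping nucleotide n-grams (k-mers) and ranks candidates by cosine similarity to a set of reference sequences. The function names, the choice of k = 3, and the use of cosine similarity are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping k-mers (nucleotide n-grams) in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return dot / (norm_a * norm_b)


def rank_by_similarity(candidates, references, k=3):
    """Rank candidates by their best similarity to any reference sequence.

    `references` must be non-empty. Both lists are hypothetical inputs,
    e.g. generated sequences and in vitro SELEX candidates.
    """
    ref_vectors = [kmer_counts(r, k) for r in references]
    scored = []
    for cand in candidates:
        vec = kmer_counts(cand, k)
        best = max(cosine_similarity(vec, r) for r in ref_vectors)
        scored.append((cand, best))
    # Highest similarity first
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_by_similarity(generated, in_vitro)` with two lists of sequence strings returns the generated sequences ordered from most to least similar to the reference set. A learned embedding such as word2vec could replace the raw counts without changing the ranking step.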

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Scientific Reports: 3102 papers in training set, Top 10%, 8.3%
2. PLOS ONE: 4510 papers in training set, Top 24%, 7.1%
3. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 0.6%, 6.3%
4. Bioinformatics: 1061 papers in training set, Top 4%, 6.3%
5. Nucleic Acids Research: 1128 papers in training set, Top 3%, 6.3%
6. Briefings in Bioinformatics: 326 papers in training set, Top 1%, 4.8%
7. Bioinformatics Advances: 184 papers in training set, Top 0.9%, 4.3%
8. PLOS Computational Biology: 1633 papers in training set, Top 8%, 4.3%
9. BMC Bioinformatics: 383 papers in training set, Top 2%, 3.9%

50% of probability mass above

10. Journal of Chemical Information and Modeling: 207 papers in training set, Top 1%, 3.8%
11. Frontiers in Genetics: 197 papers in training set, Top 2%, 3.2%
12. iScience: 1063 papers in training set, Top 8%, 2.6%
13. Frontiers in Molecular Biosciences: 100 papers in training set, Top 0.8%, 2.6%
14. Biology Methods and Protocols: 53 papers in training set, Top 0.5%, 2.3%
15. Physical Biology: 43 papers in training set, Top 0.8%, 2.1%
16. ImmunoInformatics: 11 papers in training set, Top 0.1%, 1.9%
17. ACS Synthetic Biology: 256 papers in training set, Top 1%, 1.8%
18. Computers in Biology and Medicine: 120 papers in training set, Top 2%, 1.6%
19. Journal of Bioinformatics and Systems Biology: 14 papers in training set, Top 0.2%, 1.5%
20. International Journal of Molecular Sciences: 453 papers in training set, Top 9%, 1.5%
21. NAR Genomics and Bioinformatics: 214 papers in training set, Top 2%, 1.3%
22. Cell Reports Methods: 141 papers in training set, Top 3%, 1.2%
23. Genes: 126 papers in training set, Top 2%, 0.9%
24. Frontiers in Bioengineering and Biotechnology: 88 papers in training set, Top 3%, 0.8%
25. PeerJ: 261 papers in training set, Top 14%, 0.8%
26. Viruses: 318 papers in training set, Top 5%, 0.7%
27. Journal of Computational Chemistry: 11 papers in training set, Top 0.2%, 0.7%
28. GigaScience: 172 papers in training set, Top 3%, 0.7%
29. Frontiers in Bioinformatics: 45 papers in training set, Top 1.0%, 0.7%
30. Archives of Clinical and Biomedical Research: 28 papers in training set, Top 2%, 0.7%
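
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked from the listed percentages. The short Python sketch below accumulates the first ten values copied from the list and reports how many entries are needed to reach a threshold. The function name is hypothetical, and the displayed percentages are rounded, so the running total is approximate.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the ten top-ranked journals,
# copied from the list above; the displayed values are rounded.
probabilities = [8.3, 7.1, 6.3, 6.3, 6.3, 4.8, 4.3, 4.3, 3.9, 3.8]


def journals_to_reach(threshold, probs):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


print(journals_to_reach(50.0, probabilities))  # prints 9
```

The first eight entries sum to about 47.7% and the ninth brings the total to about 51.6%, which agrees with the position of the 50% marker in the list.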