Enhanced Thompson Sampling by Roulette Wheel Selection for Screening Ultra-Large Combinatorial Libraries

Zhao, H.; Nittinger, E.; Tyrchan, C.

2024-05-21 bioinformatics
10.1101/2024.05.16.594622 bioRxiv
Chemical space exploration has gained significant interest as the number of available building blocks has grown, enabling the creation of ultra-large virtual libraries containing billions or even trillions of compounds. This raises the challenge of selecting the most suitable compounds for synthesis, for example during hit expansion. Recently, Thompson sampling, a probabilistic search approach, was proposed by Walters et al. to achieve efficiency gains by operating in the reagent space rather than the product space. Here, we aim to address some of its shortcomings and propose optimizations. We introduce a warmup routine that ensures initial probabilities are set for all reagents while evaluating a minimum number of molecules. Additionally, we propose roulette wheel selection with adapted stopping criteria to improve sampling efficiency, and we update the belief distributions of reagents only when they appear in new molecules. We demonstrate that a 100% recovery rate can be achieved by sampling 0.1% of the fully enumerated library, showcasing the effectiveness of the proposed optimizations.
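
To make the idea of roulette wheel selection in reagent space more concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `(mean, std, count)` belief layout, the softmax-style weighting with a `temperature` parameter, the fixed standard deviation, and the convention that higher scores are better are all assumptions made for illustration. It draws one value per reagent from its belief, picks a reagent per reaction component with probability proportional to its weight, and updates beliefs only when the resulting product has not been seen before.

```python
import math
import random


def roulette_wheel_pick(scores, temperature=1.0, rng=random):
    """Pick an index with probability proportional to exp(score / temperature).

    Assumption: higher scores are better. Subtracting the maximum keeps
    the exponentials numerically stable.
    """
    top = max(scores)
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def sample_product(beliefs, rng=random):
    """Choose one reagent index per reaction component.

    beliefs: one list per component, each entry a (mean, std, count) tuple
    (a hypothetical layout chosen for this sketch).
    """
    choice = []
    for component in beliefs:
        # Thompson-style step: draw one value from each reagent's belief.
        draws = [rng.gauss(mean, std) for mean, std, _ in component]
        # Roulette wheel step: select stochastically instead of taking the best draw.
        choice.append(roulette_wheel_pick(draws, rng=rng))
    return tuple(choice)


def update_beliefs(beliefs, seen, product, score):
    """Update the running mean of each chosen reagent, but only for new products.

    Returns True if the product was new and the beliefs were updated.
    The standard deviation is left unchanged to keep the sketch short.
    """
    if product in seen:
        return False
    seen.add(product)
    for component, idx in zip(beliefs, product):
        mean, std, n = component[idx]
        component[idx] = ((mean * n + score) / (n + 1), std, n + 1)
    return True
```

A search loop under these assumptions would repeatedly call `sample_product`, score the resulting molecule with whatever scoring function is in use, and pass the score to `update_beliefs`; duplicates returned by the sampler leave the beliefs untouched.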

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling: 207 papers in training set, Top 0.1%, 63.4%
(50% of probability mass above)
2. Bioinformatics: 1061 papers in training set, Top 6%, 2.8%
3. Journal of Cheminformatics: 25 papers in training set, Top 0.2%, 2.5%
4. Briefings in Bioinformatics: 326 papers in training set, Top 3%, 2.2%
5. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 3%, 2.0%
6. PLOS Computational Biology: 1633 papers in training set, Top 16%, 1.7%
7. Molecules: 37 papers in training set, Top 0.8%, 1.7%
8. Scientific Reports: 3102 papers in training set, Top 63%, 1.4%
9. Nature Communications: 4913 papers in training set, Top 54%, 1.4%
10. PLOS ONE: 4510 papers in training set, Top 58%, 1.4%
11. Chemical Science: 71 papers in training set, Top 1%, 1.3%
12. iScience: 1063 papers in training set, Top 25%, 0.9%
13. Frontiers in Molecular Biosciences: 100 papers in training set, Top 3%, 0.9%
14. npj Systems Biology and Applications: 99 papers in training set, Top 2%, 0.9%
15. Communications Chemistry: 39 papers in training set, Top 0.9%, 0.8%
16. Journal of Medicinal Chemistry: 68 papers in training set, Top 1%, 0.8%
17. BMC Bioinformatics: 383 papers in training set, Top 7%, 0.8%
18. ACS Synthetic Biology: 256 papers in training set, Top 3%, 0.7%
19. RSC Advances: 18 papers in training set, Top 2%, 0.7%
20. ACS Omega: 90 papers in training set, Top 5%, 0.5%
21. Journal of Chemical Theory and Computation: 126 papers in training set, Top 1.0%, 0.5%