Cell Painting-based bioactivity prediction boosts high-throughput screening hit-rates and compound diversity

Fredin Haslum, J.; Lardeau, C.-H.; Karlsson, J.; Turkki, R.; Leuchowius, K.-J.; Smith, K.; Mullers, E.

2023-04-05 bioinformatics
10.1101/2023.04.03.535328 bioRxiv
Efficiently identifying bioactive compounds towards a target of interest remains a time- and resource-intensive task in early drug discovery. The ability to accurately predict bioactivity using morphological profiles has the potential to rationalize the process, enabling smaller screens of focused compound sets. Towards this goal, we explored the application of deep learning with Cell Painting, a high-content image-based assay, for compound bioactivity prediction in early drug screening. Combining Cell Painting data and unrefined single-concentration activity readouts from high-throughput screening (HTS) assays, we investigated to what degree morphological profiles could predict compound activity across a set of 140 unique assays. We evaluated the performance of our models across different target classes, assay technologies, and disease areas. The predictive performance of the models was high, with a tendency for better predictions on cell-based assays and kinase targets. The average ROC-AUC was 0.744 with 62% of assays reaching ≥0.7, 30% reaching ≥0.8 and 7% reaching ≥0.9 average ROC-AUC, outperforming commonly used structure-based predictions in terms of predictive performance and compound structure diversity. In many cases, bioactivity prediction from Cell Painting data could be matched using brightfield images rather than multichannel fluorescence images. Experimental validation of our predictions in follow-up assays confirmed enrichment of active compounds. Our results suggest that models trained on Cell Painting data can predict compound activity in a range of high-throughput screening assays robustly, even with relatively noisy HTS assay data. With our approach, enriched screening sets with higher hit rates and higher hit diversity can be selected, which could reduce the size of HTS campaigns and enable primary screening with more complex assays.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling; 207 papers in training set; Top 0.2%; 22.5%
2. Artificial Intelligence in the Life Sciences; 11 papers in training set; Top 0.1%; 6.4%
3. Journal of Cheminformatics; 25 papers in training set; Top 0.1%; 6.3%
4. Scientific Reports; 3102 papers in training set; Top 19%; 6.3%
5. Briefings in Bioinformatics; 326 papers in training set; Top 2%; 4.0%
6. Advanced Science; 249 papers in training set; Top 5%; 3.6%
7. SLAS Discovery; 25 papers in training set; Top 0.1%; 3.6%
50% of probability mass above
8. Nature Communications; 4913 papers in training set; Top 40%; 3.6%
9. PLOS Computational Biology; 1633 papers in training set; Top 10%; 3.6%
10. Patterns; 70 papers in training set; Top 0.3%; 3.2%
11. Communications Chemistry; 39 papers in training set; Top 0.1%; 2.1%
12. iScience; 1063 papers in training set; Top 12%; 1.9%
13. Communications Biology; 886 papers in training set; Top 6%; 1.9%
14. Nature Machine Intelligence; 61 papers in training set; Top 2%; 1.8%
15. Bioinformatics; 1061 papers in training set; Top 7%; 1.8%
16. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 5%; 1.7%
17. eLife; 5422 papers in training set; Top 45%; 1.5%
18. npj Systems Biology and Applications; 99 papers in training set; Top 2%; 1.2%
19. Chemical Science; 71 papers in training set; Top 1%; 1.2%
20. PLOS ONE; 4510 papers in training set; Top 60%; 1.2%
21. International Journal of Molecular Sciences; 453 papers in training set; Top 11%; 1.2%
22. Journal of Medicinal Chemistry; 68 papers in training set; Top 0.9%; 1.1%
23. Cell Systems; 167 papers in training set; Top 9%; 1.1%
24. BMC Bioinformatics; 383 papers in training set; Top 6%; 0.9%
25. Cell Reports Medicine; 140 papers in training set; Top 7%; 0.8%
26. ACS Omega; 90 papers in training set; Top 4%; 0.7%
27. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 44%; 0.7%
28. Science Advances; 1098 papers in training set; Top 31%; 0.7%
29. Pharmaceuticals; 33 papers in training set; Top 2%; 0.6%
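
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked against the probabilities listed above. The following is a minimal sketch, not the site's own method: the function name `journals_for_mass` is hypothetical, and the inputs are the rounded percentages as displayed, so the result is only as precise as that rounding.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for the top-ranked journals,
# copied from the list above in rank order (ranks 1 through 10).
probabilities = [22.5, 6.4, 6.3, 6.3, 4.0, 3.6, 3.6, 3.6, 3.6, 3.2]


def journals_for_mass(probs, threshold):
    """Return the smallest number of top-ranked entries whose cumulative
    probability reaches `threshold`, or None if it is never reached."""
    for count, running_total in enumerate(accumulate(probs), start=1):
        if running_total >= threshold:
            return count
    return None


cutoff = journals_for_mass(probabilities, 50.0)
print(cutoff)
```

With these values the cumulative total is about 49.1% after six journals and about 52.7% after seven, which agrees with the cutoff marked in the list.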