
Bio-Mol: Pretraining Multimodality Bioactivity Profile for Enhancing Small Molecule Property Prediction

Yip, H. F.; Wei, X.; Li, Z.; Ren, Q.; Cao, D.; Zhang, L.; Lu, A.

2023-11-05 bioinformatics
10.1101/2023.11.02.565401 bioRxiv

Non-optimized pharmacokinetic parameters are the primary cause of drug failure in clinical trials. Successfully predicting pharmacokinetic parameters at the pre-clinical stage is therefore crucial for the success of drug candidates. Conventional methods rely primarily on 2D structural information, while more advanced models either extend the features to other structure-related information or use more sophisticated computational models to improve prediction accuracy. However, a comprehensive understanding of small molecules requires integrating bioactivity profiles with chemical structural information. One significant challenge in this integration is the high proportion of missing values in the experimentally validated bioactivity profiles of most small molecules. To address this challenge, we introduce Bio-Mol, an artificial intelligence model that uses a pretrain-and-finetune strategy so that bioactivity profiles with a large proportion of missing values can be incorporated into small molecule representation learning. Comprehensive evaluations of Bio-Mol demonstrate a notable improvement in molecular property prediction: integrating bioactivity profiles with missing values improves AUROC by an average of 5.2% over previous state-of-the-art models. Furthermore, we explore the potential of Bio-Mol for predicting synergistic drug combinations, highlighting its versatility and broader applications in drug discovery. The successful implementation of Bio-Mol demonstrates its efficacy in overcoming the challenges posed by missing bioactivity profile data. This model paves the way for optimizing small molecule pharmacokinetics prediction and provides valuable insights for drug development and discovery.
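
The abstract does not say how Bio-Mol treats missing bioactivity values during pretraining. As an illustration only, one common way to train on sparsely observed profiles is to exclude missing entries from the loss. The sketch below assumes missing assay values are encoded as NaN; the function name `masked_mse` and the toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np

def masked_mse(pred, target):
    """Mean squared error over observed entries only.

    Assumption for this sketch: NaN in `target` marks a missing assay value.
    """
    observed = ~np.isnan(target)
    if not observed.any():
        # No observed values at all: nothing to learn from this batch.
        return 0.0
    diff = pred[observed] - target[observed]
    return float(np.mean(diff ** 2))

# Toy bioactivity matrix: 2 molecules x 3 assays, half of the values missing.
target = np.array([[1.0, np.nan, 0.0],
                   [np.nan, np.nan, 2.0]])
pred = np.array([[0.5, 9.0, 0.0],
                 [9.0, 9.0, 1.0]])

# Predictions at missing positions (the 9.0 entries) do not affect the loss.
loss = masked_mse(pred, target)
```

Only the three observed entries contribute here, so a molecule with few measured assays still provides a training signal without its gaps being treated as zeros.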

Matching journals

The top 3 journals together account for just over 50% (52.3%) of the predicted probability mass.

1. Journal of Chemical Information and Modeling: 207 papers in training set; Top 0.1%; 27.8%
2. Journal of Cheminformatics: 25 papers in training set; Top 0.1%; 14.4%
3. Briefings in Bioinformatics: 326 papers in training set; Top 0.4%; 10.1%
50% of probability mass above
4. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 0.5%; 6.4%
5. Bioinformatics: 1061 papers in training set; Top 4%; 4.9%
6. Scientific Reports: 3102 papers in training set; Top 45%; 2.6%
7. PLOS ONE: 4510 papers in training set; Top 45%; 2.6%
8. PLOS Computational Biology: 1633 papers in training set; Top 12%; 2.6%
9. Bioinformatics Advances: 184 papers in training set; Top 2%; 2.1%
10. Computers in Biology and Medicine: 120 papers in training set; Top 2%; 2.1%
11. npj Systems Biology and Applications: 99 papers in training set; Top 1%; 1.5%
12. BMC Bioinformatics: 383 papers in training set; Top 5%; 1.5%
13. International Journal of Molecular Sciences: 453 papers in training set; Top 10%; 1.3%
14. Artificial Intelligence in the Life Sciences: 11 papers in training set; Top 0.1%; 1.2%
15. Nature Machine Intelligence: 61 papers in training set; Top 2%; 1.2%
16. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set; Top 0.4%; 1.1%
17. Frontiers in Molecular Biosciences: 100 papers in training set; Top 4%; 0.9%
18. Molecules: 37 papers in training set; Top 2%; 0.9%
19. Advanced Science: 249 papers in training set; Top 19%; 0.7%
20. Expert Systems with Applications: 11 papers in training set; Top 0.6%; 0.6%
21. Communications Chemistry: 39 papers in training set; Top 2%; 0.6%
22. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set; Top 3%; 0.5%
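
The 50% marker in the ranking can be checked by accumulating the listed probabilities in rank order. The sketch below uses the 22 percentages shown above; the helper name `journals_to_reach` is hypothetical and is not part of the prediction tool.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1..22, copied from the list above.
probabilities = [27.8, 14.4, 10.1, 6.4, 4.9, 2.6, 2.6, 2.6, 2.1, 2.1, 1.5,
                 1.5, 1.3, 1.2, 1.2, 1.1, 0.9, 0.9, 0.7, 0.6, 0.6, 0.5]

def journals_to_reach(probs, threshold):
    """Return (rank, cumulative %) at which the running total first reaches threshold."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank, total
    return None  # threshold is never reached by the listed journals

rank, cumulative = journals_to_reach(probabilities, 50.0)
```

The running total is 27.8%, then 42.2%, then 52.3%, so the threshold is first crossed at rank 3, which matches the position of the marker.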