Benchmarking Artificial Intelligence Models for Predicting Nuclear Receptor Activity from Tox21 Assays

Chivukula, N.; Karthikeyan, J.; Thangavel, H.; Madgaonkar, S. R.; Samal, A.

2026-03-24 pharmacology and toxicology
10.64898/2026.03.20.713297 bioRxiv
Tox21 assays compile extensive chemical bioactivity data across diverse biological targets, making them widely utilized resources for in silico model development. Nuclear receptor-specific assays within this dataset are particularly valuable for screening potential endocrine disrupting chemicals. This study presents a comprehensive benchmarking of diverse machine learning (ML), deep learning (DL), and transformer-based architectures with varied chemical feature representations across nuclear receptor assays. First, 43 datasets associated with 18 nuclear receptors within Tox21 assays were systematically curated from ToxCast invitrodb v4.3. Upon testing across these datasets, model performance was found to depend on the degree of class imbalance. Tree-based ML models such as random forest (RF) and extreme gradient boosting (XGBoost), trained on descriptors or on a combination of descriptors and fingerprints, consistently outperformed other models on datasets with higher proportions of active chemicals (>10%), while DL models showed greater robustness on datasets with moderate proportions (5-10%). Further analysis revealed that approximately 40% of misclassified active chemicals occupied structurally isolated regions of the chemical space, suggesting that the absence of close structural analogues in the training set may have contributed to their misclassification. External validation using in vitro and in vivo androgen and estrogen receptor bioactivity data showed generally good concordance. Finally, a systematic literature review revealed that the models in this study span a wider range of architectures, feature representations, and assay endpoints, and are broadly comparable to or better than existing work. Overall, insights from this study can inform the development of more reliable in silico tools supporting new approach methodologies for nuclear receptor bioactivity predictions.
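
The idea of a "structurally isolated" chemical can be illustrated with a minimal Python sketch. The abstract does not state how isolation was measured, so everything here is an assumption made for illustration: fingerprints are represented as sets of on-bit indices, similarity is the Tanimoto coefficient, and a chemical is called isolated when its nearest training-set neighbour falls below a hypothetical threshold of 0.4. The function names `tanimoto`, `max_similarity`, and `is_isolated` are likewise hypothetical.

```python
from typing import FrozenSet, Iterable


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: FrozenSet[int], training: Iterable[FrozenSet[int]]) -> float:
    """Similarity of the query to its nearest neighbour in the training set."""
    return max((tanimoto(query, t) for t in training), default=0.0)


def is_isolated(query: FrozenSet[int], training: Iterable[FrozenSet[int]],
                threshold: float = 0.4) -> bool:
    """Flag the query as isolated if no training chemical reaches the threshold.

    The 0.4 default is an illustrative assumption, not a value from the study.
    """
    return max_similarity(query, training) < threshold


# Toy fingerprints (made-up bit indices, not real chemicals)
training_set = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 4, 5})]
near_analogue = frozenset({1, 2, 3, 5})
distant_chemical = frozenset({7, 8, 9})

near_flag = is_isolated(near_analogue, training_set)
distant_flag = is_isolated(distant_chemical, training_set)
```

In this toy case the near analogue shares three of five bits with each training fingerprint and is not flagged, while the distant chemical shares none and is flagged. In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.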

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1 | Environmental Science & Technology | 64 papers in training set | Top 0.1% | 17.6%
2 | Scientific Reports | 3102 papers in training set | Top 10% | 8.4%
3 | Science of The Total Environment | 179 papers in training set | Top 0.9% | 8.4%
4 | Environment International | 42 papers in training set | Top 0.2% | 7.2%
5 | PLOS ONE | 4510 papers in training set | Top 24% | 7.2%
6 | Chemosphere | 15 papers in training set | Top 0.1% | 4.0%
50% of probability mass above
7 | Journal of Agricultural and Food Chemistry | 14 papers in training set | Top 0.3% | 3.6%
8 | Archives of Toxicology | 14 papers in training set | Top 0.1% | 3.6%
9 | Toxicological Sciences | 38 papers in training set | Top 0.2% | 2.6%
10 | Frontiers in Pharmacology | 100 papers in training set | Top 1% | 2.4%
11 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 3% | 2.1%
12 | Molecules | 37 papers in training set | Top 0.6% | 1.9%
13 | International Journal of Environmental Research and Public Health | 124 papers in training set | Top 4% | 1.7%
14 | Journal of Hazardous Materials | 19 papers in training set | Top 0.5% | 1.7%
15 | Journal of Chemical Information and Modeling | 207 papers in training set | Top 2% | 1.5%
16 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.3%
17 | MethodsX | 14 papers in training set | Top 0.2% | 1.1%
18 | Nature Communications | 4913 papers in training set | Top 57% | 1.1%
19 | Environmental Research | 46 papers in training set | Top 1% | 1.0%
20 | Scientific Data | 174 papers in training set | Top 2% | 0.9%
21 | NeuroToxicology | 11 papers in training set | Top 0.3% | 0.9%
22 | Peer Community Journal | 254 papers in training set | Top 3% | 0.8%
23 | Journal of Controlled Release | 39 papers in training set | Top 0.9% | 0.8%
24 | Science Advances | 1098 papers in training set | Top 28% | 0.8%
25 | Pest Management Science | 32 papers in training set | Top 1% | 0.7%
26 | PLOS Computational Biology | 1633 papers in training set | Top 25% | 0.7%
27 | RSC Advances | 18 papers in training set | Top 1% | 0.7%
28 | ACS Pharmacology & Translational Science | 40 papers in training set | Top 1% | 0.7%
29 | Frontiers in Chemistry | 14 papers in training set | Top 0.5% | 0.6%
30 | ACS Omega | 90 papers in training set | Top 5% | 0.6%
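
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a short Python sketch that accumulates the listed percentages in rank order. The values are copied from the first seven entries of the list above; the function name `journals_to_reach` and the 50% default target are hypothetical choices made for this illustration.

```python
from itertools import accumulate

# (journal, predicted probability in percent), in rank order, from the list above
ranked = [
    ("Environmental Science & Technology", 17.6),
    ("Scientific Reports", 8.4),
    ("Science of The Total Environment", 8.4),
    ("Environment International", 7.2),
    ("PLOS ONE", 7.2),
    ("Chemosphere", 4.0),
    ("Journal of Agricultural and Food Chemistry", 3.6),
]


def journals_to_reach(entries, target=50.0):
    """Return (count, cumulative_percent) for the smallest top-k reaching the target.

    Returns None if the listed entries never reach the target.
    """
    running_totals = accumulate(percent for _, percent in entries)
    for count, total in enumerate(running_totals, start=1):
        if total >= target:
            return count, total
    return None


result = journals_to_reach(ranked)
```

With these values the cumulative mass after five journals is 48.8% and after six is about 52.8%, so the threshold is first crossed at the sixth journal, consistent with the summary line.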