
DrugPTM-Bench: A Large-Scale Dataset for Predictive Modeling of Drug-Induced Cell Type-Specific Protein Post-Translational Modifications

Badkul, A.; Mottaqi, M.; Xie, L.; Xie, L.

2026-04-30 systems biology
10.64898/2026.04.27.721113 bioRxiv

Protein post-translational modifications (PTMs), particularly phosphorylation, serve as the primary "molecular switches" that orchestrate cellular signaling and drug response. While PTM dysregulation is a hallmark of cancer and neurodegeneration, the lack of standardized, drug-perturbed datasets has hindered the development of predictive models capable of capturing context-dependent PTM responses. Effective predictive modeling must therefore integrate multidimensional data, including the specific drug, dosage, treatment duration, cellular background, and the modified site. However, existing PTM resources remain largely static and fail to capture drug-induced regulation across these critical dimensions. To address this gap, we present DrugPTM-Bench, a curated, large-scale benchmark derived from decryptM dose-dependent PTM measurements, standardizing site-level drug response across 7 cancer cell lines, 27 drugs, and 11,167 proteins. Comprising 99.5% phosphorylation events, the dataset includes six time points, 16 dosage levels, and pEC50 potency values (the negative logarithm of the half-maximal effective concentration). We formulate a classification task to identify upregulated, downregulated, or unchanged PTM sites following drug treatment, a critical step in deciphering drug Mechanism of Action (MoA) and target engagement. Our evaluation reveals that in a protein-disjoint out-of-distribution (OOD) setting, baseline machine learning and deep learning models struggle to recover minority regulation classes, while standard rebalancing strategies improve recall only at the cost of precision and overall F1-score. These results indicate that current methods do not learn robust decision boundaries between regulated and unchanged PTM events. DrugPTM-Bench provides a phosphoproteomics benchmark for modeling drug-induced PTM regulation in imbalanced biological settings.
Beyond classification, DrugPTM-Bench's retention of pEC50 values, drug perturbation profiles, and site-level sequence context enables additional predictive tasks including drug potency regression, mechanism-of-action prediction from PTM fingerprints, and drug-specific PTM site sensitivity ranking, establishing a multi-task benchmark for PTM-centric drug discovery. Ultimately, DrugPTM-Bench establishes a rigorous framework for developing robust, context-aware models to elucidate drug MoA and signaling dynamics.
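
As a rough illustration of the evaluation setting the abstract describes, the sketch below shows one way a protein-disjoint split, the pEC50 conversion, and per-class precision and recall could be written. The record fields (`protein`, `label`), the function names, and the toy records are hypothetical and are not taken from DrugPTM-Bench. The only fixed convention used is the standard definition of pEC50 as the negative base-10 logarithm of EC50 in molar units.

```python
import math
import random


def pec50_from_ec50(ec50_molar):
    """Standard definition: pEC50 = -log10(EC50), with EC50 in mol/L."""
    return -math.log10(ec50_molar)


def protein_disjoint_split(records, test_fraction=0.2, seed=0):
    """Split records so that no protein appears in both train and test.

    `records` is a list of dicts with a hypothetical "protein" key.
    """
    proteins = sorted({r["protein"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, round(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])
    train = [r for r in records if r["protein"] not in test_proteins]
    test = [r for r in records if r["protein"] in test_proteins]
    return train, test


def per_class_precision_recall(y_true, y_pred, labels):
    """Per-class (precision, recall); exposes failures on minority classes."""
    out = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        out[c] = (precision, recall)
    return out


# Toy records (hypothetical): two sites on each of five made-up proteins.
toy_records = [
    {"protein": f"P{i}", "site": s, "label": "unchanged"}
    for i in range(5)
    for s in ("S10", "T42")
]
train, test = protein_disjoint_split(toy_records, test_fraction=0.2, seed=0)
print(len(train), "training sites;", len(test), "held-out sites")
```

Splitting by protein rather than by site keeps every site of a held-out protein out of training, which is what makes the test set out-of-distribution with respect to proteins. Reporting precision and recall per class, rather than overall accuracy alone, makes visible the trade-off the abstract reports for rebalancing strategies.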

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. Cell Systems (167 papers in training set, Top 0.1%): 42.3%
2. Nature Communications (4913 papers in training set, Top 25%): 7.3%
3. npj Systems Biology and Applications (99 papers in training set, Top 0.4%): 4.2%

50% of probability mass above

4. Nature Machine Intelligence (61 papers in training set, Top 0.8%): 3.8%
5. Nature Methods (336 papers in training set, Top 3%): 3.3%
6. Cancer Cell (38 papers in training set, Top 0.6%): 2.9%
7. npj Digital Medicine (97 papers in training set, Top 2%): 2.2%
8. Proceedings of the National Academy of Sciences (2130 papers in training set, Top 29%): 2.0%
9. Cell Reports Methods (141 papers in training set, Top 2%): 1.9%
10. Cell Reports (1338 papers in training set, Top 23%): 1.8%
11. Molecular Systems Biology (142 papers in training set, Top 0.5%): 1.8%
12. Genome Medicine (154 papers in training set, Top 5%): 1.6%
13. Briefings in Bioinformatics (326 papers in training set, Top 4%): 1.4%
14. Nature (575 papers in training set, Top 12%): 1.4%
15. Patterns (70 papers in training set, Top 1%): 1.4%
16. Bioinformatics (1061 papers in training set, Top 8%): 1.4%
17. iScience (1063 papers in training set, Top 20%): 1.3%
18. Nature Biomedical Engineering (42 papers in training set, Top 1%): 1.3%
19. PLOS Computational Biology (1633 papers in training set, Top 21%): 1.0%
20. Nature Medicine (117 papers in training set, Top 4%): 0.8%
21. Cell Reports Medicine (140 papers in training set, Top 7%): 0.8%
22. Cancer Research (116 papers in training set, Top 3%): 0.8%
23. Bioinformatics Advances (184 papers in training set, Top 5%): 0.8%
24. Communications Biology (886 papers in training set, Top 28%): 0.7%
25. Advanced Science (249 papers in training set, Top 23%): 0.5%
26. Science Advances (1098 papers in training set, Top 34%): 0.5%
27. npj Precision Oncology (48 papers in training set, Top 2%): 0.5%
28. Scientific Reports (3102 papers in training set, Top 79%): 0.5%