
ARACoFusion: Uncertainty-aware calibrated deep learning for protein-protein interaction network prediction in Arabidopsis thaliana

Sarkar, D.; Sarkar, C.

2026-05-26 bioinformatics
10.64898/2026.05.22.727120 bioRxiv

Accurate mapping of the Arabidopsis thaliana protein-protein interaction (PPI) network is essential for deciphering the complexity of plant systems biology. Here, we present ARACoFusion, a specialized deep learning architecture designed to predict inter-protein connectivity directly from primary sequences. To capture the asymmetric dependencies between plant proteins, the framework uses a reciprocal cross-attention encoder combined with latent interaction projections and multi-source feature fusion. To address the severe class imbalance inherent in plant interactomes, the model integrates uncertainty-aware variance regularization and focal loss with label smoothing, and it further improves reliability through post-hoc probability calibration via temperature scaling. Extensive benchmarking on gold-standard Arabidopsis datasets demonstrates that ARACoFusion significantly outperforms existing plant-specific predictors, achieving superior scores in Area Under the Precision-Recall Curve (AUPRC), Balanced Accuracy, and Matthews Correlation Coefficient (MCC). Additionally, the model exhibits robust cross-species generalization and clear class separability in t-SNE latent space visualizations. To facilitate community-wide usage, we provide a dedicated web server for scalable network-level inference at https://ARAcofusion.compbiosysnbu.in/.
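As a rough illustration of two ingredients named in the abstract, the following sketch shows one common textbook form of binary focal loss applied to label-smoothed targets, and post-hoc temperature scaling fitted by minimizing held-out negative log-likelihood. It is not the authors' implementation: the function names, the hyperparameters (gamma=2.0, eps=0.1), the grid search over temperatures, and the toy validation data are all assumptions made for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def focal_loss_smoothed(logit, label, gamma=2.0, eps=0.1):
    """Binary focal loss against a label-smoothed target (illustrative form)."""
    p = min(max(sigmoid(logit), 1e-12), 1.0 - 1e-12)
    y = label * (1.0 - eps) + 0.5 * eps  # move hard 0/1 labels toward 0.5
    pos = -y * (1.0 - p) ** gamma * math.log(p)          # down-weights easy positives
    neg = -(1.0 - y) * p ** gamma * math.log(1.0 - p)    # down-weights easy negatives
    return pos + neg

def nll(logits, labels, temperature):
    """Mean binary negative log-likelihood of temperature-scaled logits."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature on a grid that minimizes held-out NLL."""
    grid = grid or [0.25 + 0.05 * i for i in range(196)]  # 0.25 .. 10.0
    return min(grid, key=lambda t: nll(logits, labels, t))

# Hypothetical held-out set: overconfident logits, three of eight wrong.
val_logits = [6.0, 5.0, -6.0, 4.0, -5.0, 6.0, -4.0, 5.0]
val_labels = [1, 1, 0, 0, 0, 1, 1, 0]

T = fit_temperature(val_logits, val_labels)
calibrated = [sigmoid(z / T) for z in val_logits]
print(f"fitted temperature: {T:.2f}")
```

Because temperature scaling divides every logit by the same positive constant, it changes how confident the predictions are without changing their ranking, so ranking-based metrics such as AUPRC are unaffected by this step.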

Matching journals

The top 5 journals account for 50% of the predicted probability mass.
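The cutoff in that statement can be checked with a short cumulative sum. The sketch below is only an illustration: the helper name `journals_covering` is hypothetical, and the input is the first ten predicted probabilities (in percent) from the ranked list on this page.

```python
def journals_covering(probabilities, mass=50.0):
    """Return (count, covered): how many top-ranked entries reach `mass` percent."""
    total = 0.0
    for count, p in enumerate(sorted(probabilities, reverse=True), start=1):
        total += p
        if total >= mass:
            return count, total
    # The threshold was never reached: report everything that was summed.
    return len(probabilities), total

# Predicted probabilities (percent) for the ten highest-ranked journals.
listed = [18.6, 10.4, 10.1, 6.4, 6.4, 4.8, 3.7, 3.2, 2.1, 2.1]
count, covered = journals_covering(listed)
print(count, round(covered, 1))
```

With these values the running total first reaches 50% at the fifth journal, which matches the statement above.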

1. Nature Communications: 4913 papers in training set; Top 6%; 18.6%
2. Nature Machine Intelligence: 61 papers in training set; Top 0.1%; 10.4%
3. Bioinformatics: 1061 papers in training set; Top 3%; 10.1%
4. Briefings in Bioinformatics: 326 papers in training set; Top 0.8%; 6.4%
5. Advanced Science: 249 papers in training set; Top 3%; 6.4%

50% of probability mass above

6. Nucleic Acids Research: 1128 papers in training set; Top 4%; 4.8%
7. Nature Methods: 336 papers in training set; Top 3%; 3.7%
8. Bioinformatics Advances: 184 papers in training set; Top 2%; 3.2%
9. NAR Genomics and Bioinformatics: 214 papers in training set; Top 1%; 2.1%
10. Cell Systems: 167 papers in training set; Top 6%; 2.1%
11. Patterns: 70 papers in training set; Top 0.6%; 2.1%
12. Communications Biology: 886 papers in training set; Top 7%; 1.8%
13. Genome Biology: 555 papers in training set; Top 4%; 1.7%
14. New Phytologist: 309 papers in training set; Top 3%; 1.5%
15. Nature Biotechnology: 147 papers in training set; Top 6%; 1.2%
16. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 38%; 1.2%
17. PLOS Computational Biology: 1633 papers in training set; Top 20%; 1.2%
18. BMC Bioinformatics: 383 papers in training set; Top 5%; 1.2%
19. Genome Medicine: 154 papers in training set; Top 7%; 0.9%
20. Molecular Plant: 36 papers in training set; Top 1%; 0.9%
21. Scientific Reports: 3102 papers in training set; Top 70%; 0.9%
22. npj Systems Biology and Applications: 99 papers in training set; Top 2%; 0.8%
23. iScience: 1063 papers in training set; Top 29%; 0.8%
24. Cell Genomics: 162 papers in training set; Top 7%; 0.7%
25. Genome Research: 409 papers in training set; Top 4%; 0.7%
26. Nature Plants: 84 papers in training set; Top 2%; 0.7%
27. Frontiers in Genetics: 197 papers in training set; Top 10%; 0.7%
28. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 9%; 0.7%
29. Plant Communications: 35 papers in training set; Top 1%; 0.7%
30. Cell Reports: 1338 papers in training set; Top 36%; 0.6%