DualLoc: Full-parameter fine-tuning of cascaded dual transformers for protein subcellular localization prediction
Chen, Y. G.; Chung, W.-Y.; Chang, K. Y.
Accurate protein subcellular localization is essential for biological function, and mislocalization is linked to numerous diseases. While current methods such as DeepLoc 2.0 employ lightweight fine-tuning of protein language models (PLMs), their ability to predict multi-compartment localization remains limited. To address this, we introduce DualLoc, a multi-label localization predictor for ten compartments. DualLoc leverages full-parameter fine-tuning of a cascaded dual-transformer architecture, built upon foundational PLMs and augmented with attention and dropout layers. We evaluated this framework using three foundational PLMs (ProtBERT, ESM-2, and ProtT5) as backbones. Cross-validation on Swiss-Prot and independent validation on the Human Protein Atlas demonstrate consistent superiority over state-of-the-art baselines. The best-performing variant, DualLoc-ProtT5, achieves 0.5872 accuracy, 0.8271 micro-F1, and 0.7811 macro-F1, with substantial gains in the Matthews correlation coefficient for the nucleus (+0.13), cell membrane (+0.13), and extracellular space (+0.07). Pointwise mutual information (PMI) analysis of model outputs reveals biologically relevant compartment couplings, notably between the Golgi apparatus and endoplasmic reticulum (PMI = 0.25, P < 10^-6), reflecting secretory pathway coordination. DualLoc provides both a highly accurate predictive tool and a robust framework for investigating protein multi-localization mechanisms.

Author summary
Where a protein resides within a cell determines what it does. When proteins end up in the wrong location, normal cellular function breaks down, a misplacement linked to diseases such as cancer and Alzheimer's disease. While computational tools exist to predict these locations, accurately tracking proteins that occupy multiple cellular compartments simultaneously remains a major challenge.
We developed DualLoc, a new approach that predicts protein locations across ten different cellular compartments, from the nucleus to the cell membrane. By training an advanced artificial intelligence model on large protein sequence databases, our method more accurately identifies where proteins go, especially in complex, multi-location scenarios. Importantly, our analysis revealed meaningful biological patterns. We found strong predictive links between compartments that work closely together, such as the Golgi apparatus and the endoplasmic reticulum, two organelles that coordinate protein processing and transport. This suggests that our model captures genuine cellular logic rather than simply memorizing data. By improving how we predict protein localization, DualLoc helps researchers better understand normal cellular function and disease mechanisms. Our method is freely available to the biomedical community.
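
The compartment-coupling analysis mentioned above rests on pointwise mutual information between pairs of predicted labels. The following is a minimal sketch of that calculation, not the authors' implementation. It assumes binary (already thresholded) multi-label predictions and a base-2 logarithm; the source states neither the log base nor how the P-value was obtained, so no significance test is included. The function name `pairwise_pmi`, the three-label subset, and the toy prediction matrix are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

# Hypothetical subset of compartments; the source predicts ten.
LABELS = ["Golgi apparatus", "Endoplasmic reticulum", "Nucleus"]

# Toy binary predictions (rows = proteins, columns follow LABELS).
# These values are made up for illustration and are not results from the paper.
PREDICTIONS = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
    [0, 0, 1],
]


def pairwise_pmi(predictions, labels):
    """Return {(label_a, label_b): PMI in bits} for every label pair.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), with probabilities
    estimated as frequencies over the prediction rows.
    """
    n = len(predictions)
    if n == 0:
        raise ValueError("need at least one prediction row")

    # Marginal frequency with which each label is predicted.
    marginal = [
        sum(row[j] for row in predictions) / n for j in range(len(labels))
    ]

    result = {}
    for a, b in combinations(range(len(labels)), 2):
        # Joint frequency: both labels predicted for the same protein.
        joint = sum(1 for row in predictions if row[a] and row[b]) / n
        if joint == 0:
            # Labels never co-predicted: report -inf (assumed convention).
            result[(labels[a], labels[b])] = float("-inf")
        else:
            result[(labels[a], labels[b])] = math.log2(
                joint / (marginal[a] * marginal[b])
            )
    return result


pmi = pairwise_pmi(PREDICTIONS, LABELS)
for pair, value in pmi.items():
    print(pair, value)
```

A positive PMI means two compartments are predicted together more often than their individual frequencies would suggest under independence; a negative value means they co-occur less often than expected.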