Reliable prediction of short linear motifs in the human proteome
Pancsa, R.; Ficho, E.; Kalman, Z. E.; Gerdan, C.; Remenyi, I.; Zeke, A.; Tusnady, G. E.; Dobson, L.
Short linear motifs (SLiMs) are small, often transient interaction modules within intrinsically disordered regions (IDRs) of proteins that interact with particular domains and thereby regulate numerous biological processes. The limited sequence information within these short peptides leads to frequent false-positive hits in both computational and experimental SLiM identification methods. This makes the description of novel SLiMs challenging and has limited the number of known cases to a few thousand, even though SLiMs play widespread roles in cellular functions. We present SLiMMine, a deep learning-based method to identify SLiMs in the human proteome. By refining the annotations of known motif classes, we created a high-quality dataset for model training. Using protein embeddings and neural networks, SLiMMine reliably predicts novel SLiM candidates in known classes and eliminates ~80% of the pattern matching-based motif hits as false positives. It can also be used as a discovery tool to find uncharacterized SLiMs based on optimal sequence environment. In addition, we narrowed the highly general interactor-domain definitions of known SLiM classes to specific human proteins, enabling more precise prediction of a wide range of potential protein-protein interactions (PPIs) in the human interactome. SLiMMine is available as a user-friendly, multi-purpose web server at https://slimmine.pbrg.hu/.
Matching journals
The top 7 journals account for 50% of the predicted probability mass.