Predicting the Pathway Involvement of Compounds Annotated in the Reactome Knowledgebase
Huckvale, E. D.; Moseley, H. N. B.
Motivation: Pathway annotations of non-macromolecular (relatively small) biomolecules facilitate the biological and biomedical interpretation of metabolomics datasets. However, the low level of pathway annotation among detected biomolecules hinders this type of interpretation. Thus, predicting the pathway involvement of detected but unannotated biomolecules has high potential to improve metabolomics data analysis and omics integration. Past publications have used only datasets derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) to develop machine learning models that predict pathway involvement. To our knowledge, the Reactome knowledgebase has not yet been used to develop these types of predictive models.

Results: We created a machine-learning-ready dataset using chemical representations of all pathway-annotated compounds available from the Reactome knowledgebase. Next, we trained and evaluated a single multilayer perceptron binary classifier using combined metabolite-pathway paired feature vectors engineered from this new dataset. Models trained on a prior corresponding KEGG dataset with 502 pathways scored a mean Matthews correlation coefficient (MCC) of 0.847 with a standard deviation of 0.0098, whereas models trained on the Reactome dataset with 3,985 pathways demonstrated improved performance, with a mean MCC of 0.916, albeit with a larger standard deviation of 0.0149. These results indicate that Reactome pathways can also be predicted effectively, greatly increasing the number of human-defined pathways available for prediction.

Availability: Code and data for fully reproducing the results in this work are available at https://doi.org/10.6084/m9.figshare.27478065.

Contact: hunter.moseley@uky.edu
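The two technical ingredients named in the abstract, a single binary classifier over metabolite-pathway pairs and evaluation by MCC, can be illustrated with a small sketch. This is not the authors' code (that is available at the figshare DOI above). The abstract does not say how compound and pathway features are combined, so the hypothetical `pair_features` helper simply assumes concatenation; each paired vector would be labeled 1 if the compound is annotated to that pathway and 0 otherwise. The `mcc` function computes the Matthews correlation coefficient, (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), from binary labels and predictions.

```python
import math
from typing import Sequence


def pair_features(compound: Sequence[float], pathway: Sequence[float]) -> list[float]:
    """Hypothetical pairing step: concatenate a compound feature vector with a
    pathway feature vector so one binary classifier can score any
    (compound, pathway) pair. Concatenation is an assumption, not the paper's method."""
    return list(compound) + list(pathway)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = in pathway, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal count is zero (undefined MCC)
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    compound = [1.0, 0.0, 3.0]  # hypothetical chemical-representation features
    pathway = [0.5, 0.5]        # hypothetical pathway-level features
    print(pair_features(compound, pathway))  # one classifier input vector

    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1]
    print(round(mcc(y_true, y_pred), 3))
```

With these toy labels there are 2 true positives, 2 true negatives, 1 false positive and 1 false negative, giving an MCC of 3/9 ≈ 0.333. MCC is a natural choice for this setting because most compound-pathway pairs are negatives, and unlike accuracy it stays near 0 for a classifier that simply predicts the majority class.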