Back

Single-cell identity definition using random forests and recursive feature elimination (scRFE)

Park, M.; Vorperian, S.; Wang, S.; Pisco, A. O.

2020-08-04 bioinformatics
10.1101/2020.08.03.233650 bioRxiv
Show abstract

Single-cell RNA sequencing (scRNA-seq) enables the detailed examination of a cells underlying regulatory networks and the molecular factors contributing to its identity. We developed scRFE with the goal of generating interpretable gene lists that can accurately distinguish observations (single-cells) by their features (genes) given a metadata category of the dataset. scRFE is an algorithm that combines the classical random forest classifier with recursive feature elimination and cross validation to find the features necessary and sufficient to classify cells in a single-cell RNA-seq dataset by ranking feature importance. It is implemented as a Python package compatible with Scanpy, enabling its seamless integration into any single-cell data analysis workflow that aims at identifying minimal transcriptional programs relevant to describing metadata features of the dataset. We applied scRFE to the Tabula Muris Senis and reproduced established aging patterns and transcription factor reprogramming protocols, highlighting the biological value of scRFEs learned features. Author summaryscRFE is a Python package that combines a random forest classifier with recursive feature elimination and cross validation to find the features necessary and sufficient to classify cells in a single-cell RNA-seq dataset by ranking feature importance. scRFE was designed to enable straightforward integration as part of any single-cell data analysis workflow that aims at identifying minimal transcriptional programs relevant to describing metadata features of the dataset.

Matching journals

The top 4 journals account for 50% of the predicted probability mass.