Integration of Diverse Transcriptomics Datasets using Random Forest to Predict Universal Functional Pathways in Tfr Cells

Diallo, A. B.; Cavazzoni, C. B.; Sun, J. E.; Sage, P. T.

2021-12-01 bioinformatics
10.1101/2021.11.29.470410 bioRxiv

Motivation: T follicular regulatory (Tfr) cells are a specialized cell subset that controls humoral immunity. Despite a number of individual transcriptomic studies on these cells, core functional pathways have been difficult to uncover because of the substantial transcriptional overlap between Tfr cells and other effector cell types, and because of transcriptional changes that occur in disease settings. Developing a core transcriptional module for Tfr cells that integrates multiple cell type comparisons as well as diverse disease settings will allow a more accurate prediction of functional pathways. Researchers studying allergic reactions, immune responses to vaccines, autoimmunity and cancer could use this gene set to better understand the roles of Tfr cells in controlling disease progression. Other cell types with similar transcriptomic complexity across diverse disease settings may also be studied using similar approaches. High-throughput sequencing technologies generate large datasets that require specific tools to interpret. A core transcriptional module for Tfr cells will allow investigators with little knowledge of Tfr biology to determine whether Tfr cells may have functional roles within their biological systems. With this work, we address the need for core gene modules that define specific subsets of immune cells.

Results: We introduce an integrated "core Tfr cell gene module" that can be incorporated into GSEA using various input sizes. The integrated core Tfr gene module was built from transcriptomic studies of Tfr cells across several different tissues, disease settings, and cell type comparisons. Random forest was used to integrate the transcriptomic studies and generate the core gene module. A GSEA gene set was formulated from the integrated core Tfr gene module for incorporation into end-user-friendly GSEA. The gene sets are presented as GMT files, along with random genes taken from the GTEx data set. The user can upload the gene set to the GSEA website or to any gene set tool that takes GMT files. We also present the full results of the model, including p-values calculated by random forest. This gives users more flexibility in choosing the p-value cutoff that is most appropriate for their experimental setting.

Availability: The core Tfr gene sets are freely available at: https://github.com/alosdiallo/TFR_Model. We have also included all of the code and data used in developing these gene sets. The code and results are released under an MIT license.

Supplementary information: Supplementary data are available at Bioinformatics online.
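To make the p-value cutoff idea concrete, the following is a minimal sketch of how per-gene results might be filtered at a user-chosen cutoff and written out as a GMT line. It is not the authors' code. The column names (`gene`, `p_value`), the gene identifiers, the p-values, and the set names are hypothetical placeholders rather than values from the repository. The only convention assumed is the usual GMT layout: one gene set per line, with tab-separated set name, description, and member genes.

```python
import csv
import io


def genes_below_cutoff(rows, cutoff):
    """Return gene names whose p-value is at or below the cutoff, in input order."""
    return [row["gene"] for row in rows if float(row["p_value"]) <= cutoff]


def to_gmt_line(set_name, description, genes):
    """Format one gene set as a GMT line: name, description, then genes, tab-separated."""
    return "\t".join([set_name, description, *genes])


def parse_gmt(text):
    """Parse GMT text into a dict mapping set name to its list of genes."""
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        gene_sets[fields[0]] = fields[2:]  # fields[1] is the description
    return gene_sets


# Hypothetical results table; placeholder genes and p-values, not real data.
EXAMPLE_RESULTS = "gene,p_value\nGENE_A,0.001\nGENE_B,0.04\nGENE_C,0.20\n"
rows = list(csv.DictReader(io.StringIO(EXAMPLE_RESULTS)))

# Two cutoffs yield two gene sets of different sizes from the same results.
strict = to_gmt_line("EXAMPLE_SET_P01", "hypothetical cutoff 0.01",
                     genes_below_cutoff(rows, 0.01))
lenient = to_gmt_line("EXAMPLE_SET_P05", "hypothetical cutoff 0.05",
                      genes_below_cutoff(rows, 0.05))

gmt_text = strict + "\n" + lenient + "\n"
print(gmt_text, end="")
```

A stricter cutoff gives a smaller, more conservative gene set, while a more lenient one trades specificity for coverage; which is appropriate depends on the experimental setting.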

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. ImmunoInformatics | 11 papers in training set | Top 0.1% | 19.1%
2. Frontiers in Immunology | 586 papers in training set | Top 0.5% | 10.3%
3. BMC Bioinformatics | 383 papers in training set | Top 1% | 6.5%
4. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 0.5% | 6.5%
5. Bioinformatics | 1061 papers in training set | Top 4% | 5.0%
6. Journal of Immunological Methods | 24 papers in training set | Top 0.1% | 5.0%
50% of probability mass above
7. Immunology | 29 papers in training set | Top 0.1% | 4.4%
8. Frontiers in Genetics | 197 papers in training set | Top 2% | 3.7%
9. The Journal of Immunology | 146 papers in training set | Top 0.5% | 2.9%
10. PLOS ONE | 4510 papers in training set | Top 46% | 2.4%
11. Scientific Reports | 3102 papers in training set | Top 47% | 2.4%
12. Nucleic Acids Research | 1128 papers in training set | Top 8% | 2.1%
13. NAR Genomics and Bioinformatics | 214 papers in training set | Top 1% | 1.9%
14. PLOS Computational Biology | 1633 papers in training set | Top 16% | 1.7%
15. Frontiers in Physiology | 93 papers in training set | Top 3% | 1.7%
16. BMC Medical Genomics | 36 papers in training set | Top 0.6% | 1.4%
17. Bioinformatics Advances | 184 papers in training set | Top 4% | 1.3%
18. GigaScience | 172 papers in training set | Top 2% | 1.3%
19. Cytometry Part A | 30 papers in training set | Top 0.2% | 1.1%
20. Nature Communications | 4913 papers in training set | Top 58% | 1.0%
21. BMC Genomics | 328 papers in training set | Top 5% | 0.8%
22. npj Systems Biology and Applications | 99 papers in training set | Top 2% | 0.8%
23. Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.8%
24. International Journal of Molecular Sciences | 453 papers in training set | Top 16% | 0.7%
25. iScience | 1063 papers in training set | Top 33% | 0.7%
26. Journal of Translational Medicine | 46 papers in training set | Top 4% | 0.5%
27. PROTEOMICS | 35 papers in training set | Top 1% | 0.5%
28. Cancer Research Communications | 46 papers in training set | Top 2% | 0.5%