
Systematic evaluation of integration methods and parameters on single-cell RNA-sequencing biological insights: a case study on cattle embryos.

Biase, F. H.; Morozyuk, M.; Ezepha, C.

2026-02-03 genomics
10.64898/2026.02.01.703145 bioRxiv

Background: Single-cell RNA sequencing (scRNA-seq) integration methods remove technical variation while preserving biological signal, yet systematic frameworks for evaluating how parameter choices influence biological interpretation remain limited. Traditional benchmarking approaches evaluate a single parameter configuration per method, potentially missing systematic patterns in functional outcomes and method convergence. A framework for systematic evaluation of integration parameters was developed and applied to bovine embryo development.

Results: Six integration methods (FastMNN, CCA, RPCA, scVI, Harmony, STACAS) combined with multiple parameters, including those for neighbor identification and clustering, yielded 8,232 combinations. The main outputs evaluated were cell counts per cell type and marker identification. After removing combinations with extremely poor cell and marker identification, 4,287 integration parameter combinations were retained for analysis. These formed three major patterns (clusters), with integration methods distributed non-randomly across clusters and distinct biological outcomes in each. One cluster, composed of scVI and STACAS integrations, was dominated by the failure to identify epiblast cells. Cluster 2 (n=29), also composed of scVI and STACAS integrations, identified the most epiblast markers (n=7, 8, or 9) but had a limited number of epiblast cells (median=10). Cluster 1 (n=4,120 combinations) had the highest method diversity. Across clusters, trophoblast and mesoderm showed high functional distinctness, while epiblast and hypoblast showed moderate overlap in gene ontology classes.

Conclusions: The approach reveals that parameter choices influence cell type classification, functional interpretation, and the degree of method convergence, with implications for identifying specific biological inferences for further orthogonal validation.
A systematic approach to evaluating integration methods, along with other parameters, is advisable for accurate biological inference.
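
The combinatorial design described in the abstract can be pictured as a Cartesian product of methods and parameter values. The sketch below is illustrative only: the six method names come from the abstract, but the parameter names (`n_dims`, `k_neighbors`, `resolution`) and their values are hypothetical placeholders, since the abstract does not list them, and the resulting count is not meant to reproduce the 8,232 combinations reported.

```python
from itertools import product

# Integration methods named in the abstract.
methods = ["FastMNN", "CCA", "RPCA", "scVI", "Harmony", "STACAS"]

# Hypothetical parameter values, for illustration only;
# the abstract does not state the actual parameters or ranges.
n_dims = [10, 20, 30]
k_neighbors = [10, 20]
resolutions = [0.4, 0.8]

# One dictionary per method/parameter combination.
grid = [
    {"method": m, "n_dims": d, "k_neighbors": k, "resolution": r}
    for m, d, k, r in product(methods, n_dims, k_neighbors, resolutions)
]

print(len(grid))  # 6 * 3 * 2 * 2 = 72
```

Each entry in `grid` would then be run through the integration, neighbor identification, and clustering steps, and its cell counts and markers recorded for comparison across combinations.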

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. BMC Genomics: 328 papers in training set, Top 0.1%, 22.4%
2. GigaScience: 172 papers in training set, Top 0.1%, 10.4%
3. Methods in Ecology and Evolution: 160 papers in training set, Top 0.4%, 10.0%
4. BMC Bioinformatics: 383 papers in training set, Top 1%, 6.8%
5. Scientific Reports: 3102 papers in training set, Top 19%, 6.3%

50% of probability mass above

6. PLOS Computational Biology: 1633 papers in training set, Top 7%, 4.8%
7. PLOS ONE: 4510 papers in training set, Top 35%, 4.1%
8. Molecular Ecology Resources: 161 papers in training set, Top 0.4%, 2.7%
9. Computational and Structural Biotechnology Journal: 216 papers in training set, Top 3%, 2.4%
10. NAR Genomics and Bioinformatics: 214 papers in training set, Top 1%, 2.3%
11. Bioinformatics: 1061 papers in training set, Top 7%, 2.1%
12. Briefings in Bioinformatics: 326 papers in training set, Top 3%, 1.9%
13. Frontiers in Genetics: 197 papers in training set, Top 5%, 1.7%
14. PeerJ: 261 papers in training set, Top 7%, 1.7%
15. Nature Communications: 4913 papers in training set, Top 52%, 1.7%
16. Nucleic Acids Research: 1128 papers in training set, Top 14%, 1.2%
17. BMC Biology: 248 papers in training set, Top 2%, 1.1%
18. Genome Biology: 555 papers in training set, Top 6%, 0.9%
19. Frontiers in Bioinformatics: 45 papers in training set, Top 0.6%, 0.9%
20. Bioinformatics Advances: 184 papers in training set, Top 4%, 0.9%
21. Communications Biology: 886 papers in training set, Top 21%, 0.8%
22. iScience: 1063 papers in training set, Top 33%, 0.7%
23. Microbial Genomics: 204 papers in training set, Top 2%, 0.7%
24. Biology Methods and Protocols: 53 papers in training set, Top 3%, 0.7%
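
The "50% of probability mass" cutoff can be checked with a running total over the listed probabilities. The sketch below is a minimal check, assuming the percentages in the list are the full set of inputs; the helper name `rank_reaching` is hypothetical and not part of the page. The first four journals sum to 49.6%, so the running total first reaches 50% at rank 5.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-24, copied from the list above.
probs = [22.4, 10.4, 10.0, 6.8, 6.3, 4.8, 4.1, 2.7, 2.4, 2.3, 2.1, 1.9,
         1.7, 1.7, 1.7, 1.2, 1.1, 0.9, 0.9, 0.9, 0.8, 0.7, 0.7, 0.7]

def rank_reaching(values, threshold):
    """Return (rank, running_total) where the running total first reaches threshold.

    Ranks are 1-based. Returns None if the threshold is never reached.
    """
    for rank, total in enumerate(accumulate(values), start=1):
        if total >= threshold:
            return rank, total
    return None

rank, total = rank_reaching(probs, 50.0)
print(rank, round(total, 1))
```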