
A Divide-and-Conquer Approach to Large-Scale Evolutionary Analysis of Single-Cell DNA Data

Liu, Y.; Nakhleh, L.

2025-03-17 cancer biology
10.1101/2024.04.28.591536 bioRxiv

Single-cell sequencing technology is producing large datasets, often containing thousands or even tens of thousands of single-cell genomic data points from an individual patient. Evolutionary analyses of these datasets help uncover and order the genetic variants they contain and, in the case of cancer datasets, elucidate mutation trees and intra-tumor heterogeneity (ITH). To make such large-scale analyses computationally feasible, we propose a divide-and-conquer approach that can be used to scale up computationally intensive inference methods. The approach consists of four steps: 1) partitioning the dataset into subsets, 2) constructing a rooted tree for each subset, 3) computing a representative genotype for each subset from its inferred tree, and 4) assembling the individual trees using a tree built on the representative genotypes. Besides being flexible and enabling scalability, this approach lends itself naturally to ITH analysis: the individual subsets would correspond to clones, and the "assembly tree" could serve as the mutation tree that defines those clones. To demonstrate the effectiveness of the proposed approach, we conducted experiments employing a range of methods at each step. In particular, because clustering and dimensionality reduction methods are commonly used to tame the complexity of large datasets in this area, we analyzed the performance of a variety of such methods within our approach.
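The four steps listed in the abstract can be illustrated with a small, hypothetical sketch. It is not the authors' implementation, and every method in it is a simple stand-in chosen for brevity: cells are assumed to be binary mutation-presence vectors, step 1 uses greedy Hamming-distance leader clustering, steps 2 and 4 use UPGMA (average-linkage) topologies, and step 3 uses a tree-weighted consensus thresholded at 0.5. The function names (`partition`, `upgma`, `representative`, `graft`, `divide_and_conquer`) and the toy data are invented for illustration.

```python
from itertools import combinations

# Hypothetical sketch only: genotypes are tuples of 0/1 mutation calls.

def hamming(a, b):
    """Number of sites at which two binary genotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def partition(cells, threshold):
    """Step 1 stand-in: greedy leader clustering by Hamming distance.

    A cell joins the first cluster whose leader is within `threshold`
    mutations; otherwise it founds a new cluster. Returns lists of cell ids.
    """
    leaders, clusters = [], []
    for cid, g in cells.items():
        for i, lead in enumerate(leaders):
            if hamming(g, lead) <= threshold:
                clusters[i].append(cid)
                break
        else:
            leaders.append(g)
            clusters.append([cid])
    return clusters

def upgma(labels, dist):
    """Steps 2 and 4 stand-in: average-linkage (UPGMA) tree topology.

    `dist(a, b)` is the distance between two leaf labels. Returns a rooted
    binary tree as nested 2-tuples (no branch lengths); a lone label is
    returned as-is. Labels must not themselves be tuples.
    """
    active = [(lab, [lab]) for lab in labels]  # (subtree, member leaves)

    def avg(p, q):
        return sum(dist(a, b) for a in p[1] for b in q[1]) / (len(p[1]) * len(q[1]))

    while len(active) > 1:
        # Merge the pair of clusters with the smallest average distance.
        i, j = min(combinations(range(len(active)), 2),
                   key=lambda ij: avg(active[ij[0]], active[ij[1]]))
        merged = ((active[i][0], active[j][0]), active[i][1] + active[j][1])
        active = [c for k, c in enumerate(active) if k not in (i, j)] + [merged]
    return active[0][0]

def site_frequencies(tree, cells):
    """Per-site mutation frequency where the two children of every internal
    node are weighted equally, so the result depends on the tree shape."""
    if not isinstance(tree, tuple):  # leaf: a cell id
        return [float(x) for x in cells[tree]]
    left, right = (site_frequencies(t, cells) for t in tree)
    return [(a + b) / 2 for a, b in zip(left, right)]

def representative(tree, cells):
    """Step 3 stand-in: call a mutation present if its frequency exceeds 0.5."""
    return tuple(int(f > 0.5) for f in site_frequencies(tree, cells))

def graft(tree, subtrees):
    """Replace each subset-index leaf of the assembly tree by that subset's tree."""
    if isinstance(tree, tuple):
        return tuple(graft(t, subtrees) for t in tree)
    return subtrees[tree]

def divide_and_conquer(cells, threshold):
    subsets = partition(cells, threshold)                          # step 1
    trees = [upgma(s, lambda a, b: hamming(cells[a], cells[b]))    # step 2
             for s in subsets]
    reps = [representative(t, cells) for t in trees]               # step 3
    assembly = upgma(list(range(len(reps))),                       # step 4
                     lambda a, b: hamming(reps[a], reps[b]))
    return subsets, reps, assembly, graft(assembly, trees)

# Toy input (invented for illustration): six cells, six mutation sites.
cells = {
    "c1": (1, 1, 0, 0, 0, 0), "c2": (1, 1, 0, 0, 0, 1),
    "c3": (1, 1, 1, 0, 0, 0), "c4": (0, 0, 0, 1, 1, 0),
    "c5": (0, 0, 0, 1, 1, 1), "c6": (0, 1, 0, 1, 1, 0),
}
subsets, reps, assembly, full_tree = divide_and_conquer(cells, threshold=1)
print(subsets)    # [['c1', 'c2', 'c3'], ['c4', 'c5', 'c6']]
print(reps)       # [(1, 1, 0, 0, 0, 0), (0, 0, 0, 1, 1, 0)]
print(full_tree)  # (('c3', ('c1', 'c2')), ('c6', ('c4', 'c5')))
```

Because the tree-building routine only ever sees one subset, or the handful of representative genotypes, at a time, a slower and more accurate inference method could be swapped in for `upgma` without changing the rest of the pipeline. In this toy run the two subsets play the role of clones, and the assembly tree `(0, 1)` relates them.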

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Journal of Computational Biology (37 papers in training set): Top 0.1%, predicted probability 14.3%
2. Genome Research (409 papers in training set): Top 0.1%, predicted probability 10.0%
3. Bioinformatics Advances (184 papers in training set): Top 0.2%, predicted probability 9.1%
4. Bioinformatics (1061 papers in training set): Top 3%, predicted probability 7.1%
5. PLOS Computational Biology (1633 papers in training set): Top 5%, predicted probability 7.1%
6. PLOS ONE (4510 papers in training set): Top 28%, predicted probability 6.3%

50% of probability mass above

7. iScience (1063 papers in training set): Top 5%, predicted probability 3.6%
8. Genome Medicine (154 papers in training set): Top 2%, predicted probability 3.6%
9. Scientific Reports (3102 papers in training set): Top 50%, predicted probability 2.1%
10. Biostatistics (21 papers in training set): Top 0.1%, predicted probability 1.9%
11. BMC Bioinformatics (383 papers in training set): Top 4%, predicted probability 1.9%
12. Genome Biology (555 papers in training set): Top 4%, predicted probability 1.8%
13. PLOS Genetics (756 papers in training set): Top 9%, predicted probability 1.7%
14. Frontiers in Molecular Biosciences (100 papers in training set): Top 2%, predicted probability 1.7%
15. Frontiers in Genetics (197 papers in training set): Top 5%, predicted probability 1.7%
16. Communications Biology (886 papers in training set): Top 13%, predicted probability 1.3%
17. Nature Communications (4913 papers in training set): Top 55%, predicted probability 1.3%
18. Cell Systems (167 papers in training set): Top 8%, predicted probability 1.3%
19. Genome Biology and Evolution (280 papers in training set): Top 1%, predicted probability 1.3%
20. IEEE Transactions on Computational Biology and Bioinformatics (17 papers in training set): Top 0.4%, predicted probability 1.2%
21. Nucleic Acids Research (1128 papers in training set): Top 15%, predicted probability 0.9%
22. Genetics (225 papers in training set): Top 4%, predicted probability 0.8%
23. PeerJ (261 papers in training set): Top 14%, predicted probability 0.8%
24. Frontiers in Bioinformatics (45 papers in training set): Top 0.8%, predicted probability 0.8%
25. Nature Genetics (240 papers in training set): Top 7%, predicted probability 0.7%
26. Patterns (70 papers in training set): Top 2%, predicted probability 0.7%
27. eLife (5422 papers in training set): Top 61%, predicted probability 0.6%
28. European Journal of Human Genetics (49 papers in training set): Top 2%, predicted probability 0.6%