
Improving Hi-C contact matrices using genome graphs

Shen, Y.; Yu, L.; Qiu, Y.; Zhang, T.; Kingsford, C.

2023-11-12 genomics
10.1101/2023.11.08.566275 bioRxiv

Three-dimensional chromosome structure plays an important role in fundamental genomic functions. Hi-C, a high-throughput, sequencing-based technique, has drastically expanded our comprehension of 3D chromosome structures. The first step of the Hi-C analysis pipeline involves mapping sequencing reads from Hi-C to linear reference genomes. However, the linear reference genome does not incorporate genetic variation information, which can lead to incorrect read alignments, especially when analyzing samples with substantial genomic differences from the reference, such as cancer samples. Using genome graphs as the reference facilitates more accurate mapping of reads. However, new algorithms are required for inferring linear genomes from Hi-C reads mapped on genome graphs and for constructing the corresponding Hi-C contact matrices, which are a prerequisite for subsequent steps of Hi-C analysis such as identifying topologically associated domains and calling chromatin loops. We introduce the problem of genome sequence inference from Hi-C data mediated by genome graphs. We formalize this problem, show that it is hard to solve, and introduce a novel heuristic algorithm specifically tailored to it. We provide a theoretical analysis to evaluate the efficacy of our algorithm. Finally, our empirical experiments indicate that the linear genomes inferred by our method lead to improved Hi-C contact matrices. These matrices show a reduction in erroneous patterns caused by structural variations and capture the structures of topologically associated domains more accurately.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

Rank  Probability  Percentile  Papers in training set  Journal
1     18.2%        Top 2%      1061                    Bioinformatics
2     12.2%        Top 0.1%    409                     Genome Research
3     8.2%         Top 0.1%    17                      IEEE Transactions on Computational Biology and Bioinformatics
4     6.7%         Top 5%      1633                    PLOS Computational Biology
5     6.2%         Top 1%      555                     Genome Biology
(50% of probability mass above)
6     4.7%         Top 2%      383                     BMC Bioinformatics
7     3.5%         Top 2%      197                     Frontiers in Genetics
8     3.5%         Top 2%      326                     Briefings in Bioinformatics
9     3.5%         Top 2%      171                     Genomics, Proteomics & Bioinformatics
10    2.8%         Top 7%      1063                    iScience
11    2.0%         Top 1%      214                     NAR Genomics and Bioinformatics
12    1.8%         Top 51%     4510                    PLOS ONE
13    1.8%         Top 0.1%    37                      Journal of Computational Biology
14    1.7%         Top 60%     3102                    Scientific Reports
15    1.7%         Top 3%      184                     Bioinformatics Advances
16    1.5%         Top 0.8%    50                      Nature Computational Science
17    1.5%         Top 12%     1128                    Nucleic Acids Research
18    1.3%         Top 1%      36                      Journal of Genetics and Genomics
19    1.3%         Top 6%      216                     Computational and Structural Biotechnology Journal
20    0.9%         Top 4%      328                     BMC Genomics
21    0.7%         Top 63%     4913                    Nature Communications
22    0.7%         Top 25%     886                     Communications Biology
23    0.7%         Top 13%     167                     Cell Systems
24    0.6%         Top 4%      172                     GigaScience
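
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked with a running total over the ranked probabilities. The sketch below is only an illustration: the function name `journals_to_reach` is hypothetical, and the values are the six highest probabilities copied from the list above.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the six highest-ranked journals,
# copied from the list above. Lower-ranked entries are omitted for brevity.
probabilities = [18.2, 12.2, 8.2, 6.7, 6.2, 4.7]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running
    total to reach `threshold`, or None if it is never reached.

    `journals_to_reach` is a hypothetical helper, not part of the page.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None


# Number of journals needed to cover at least 50% of the probability mass.
print(journals_to_reach(probabilities, 50.0))
```

With these values the running total first passes 50% at the fifth journal, which matches the position of the "50% of probability mass above" marker.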