
Cohort-stratified prioritization of CRISPR-Cas9 sgRNAs for HDR-mediated correction of TP53 hotspot codons in cancer

Loke, S.; Movva, N. S. V.; Hota, M.

2026-05-22 bioinformatics
10.64898/2026.05.20.726726 bioRxiv

TP53 is mutated in roughly half of all human cancers. Eight recurrent missense substitutions at seven codons in the DNA-binding domain (R175H, Y220C, G245S, R248Q, R248W, R249S, R273H, R282W) account for most of the mutational burden. Homology-directed repair (HDR) with a wild-type donor template is one of the few feasible routes to revert these alleles, but existing CRISPR sgRNA design tools rank candidates without reference to the cancer cohort being treated. We built a reproducible pipeline that prioritizes SpCas9 sgRNAs for HDR-mediated correction of TP53 hotspot codons. The pipeline uses the RefSeq transcript NM_000546.6 from NCBI, GRCh38 off-target search via Cas-OFFinder with the published Doench-2016 CFD matrices, on-target Doench-2016 (Rule Set 2) scores from CRISPOR, and per-cohort hotspot prevalence from three TCGA Pan-Cancer Atlas studies (HGSOC, n = 523; PDAC, n = 179; CRC, n = 534) accessed through cBioPortal. We enumerate guides whose cut sites fall within ±10 nt of each hotspot codon, exclude any candidate that fails to map to GRCh38, and score the remainder. The final set contains 21 SpCas9 NGG sgRNAs across the seven hotspot codons, with no PAM-desert residues. A single candidate at R248 (TP53-248-P-ad878223; spacer GCATGGGCGGCATGAACCGG, AGG PAM; off-target specificity 0.913 over 806 reference-genome hits) ranks first in all three cohorts and holds rank 1 in 97% of 147 weight settings tested. Four additional residues (R175, Y220, R273, R282) yield within-residue tier-1 picks that are robust in 100% of weight settings. Cohort-specific differences appear only in cross-residue ordering: R175 and R282 climb in CRC, consistent with the higher prevalence of R175H and R282W in colorectal tumors.
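The enumeration step in the abstract (SpCas9 NGG guides whose cut site falls within ±10 nt of a hotspot codon) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the `Guide` record, and the distance convention (nucleotides from the cut boundary to the nearest edge of the codon) are assumptions made for illustration. The test sequence places the R248 spacer and AGG PAM quoted in the abstract between arbitrary `ATAT…` flanks rather than real TP53 context.

```python
from typing import List, NamedTuple

class Guide(NamedTuple):
    spacer: str    # 20-nt spacer, 5'->3' on the guide's own strand
    pam: str       # 3-nt PAM, 5'->3' on the guide's own strand
    strand: str    # "+" or "-" relative to the input sequence
    cut: int       # cut boundary: index of the first base after the cut (input coordinates)
    distance: int  # nt from the cut boundary to the nearest codon edge (assumed convention)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def cut_to_codon_distance(cut: int, codon_start: int) -> int:
    """0 if the cut boundary touches or lies inside the codon, else the gap in nt."""
    codon_end = codon_start + 3
    if cut < codon_start:
        return codon_start - cut
    if cut > codon_end:
        return cut - codon_end
    return 0

def enumerate_guides(seq: str, codon_start: int, window: int = 10) -> List[Guide]:
    """List NGG guides on both strands whose cut site is within `window` nt of the codon."""
    seq = seq.upper()
    guides = []
    for i in range(len(seq) - 22):
        site = seq[i:i + 23]  # 23-nt window: spacer plus PAM on one of the strands
        if site[21:] == "GG":  # plus strand: spacer at [i, i+20), PAM at [i+20, i+23)
            cut = i + 17       # SpCas9 cuts 3 nt upstream of the PAM
            d = cut_to_codon_distance(cut, codon_start)
            if d <= window:
                guides.append(Guide(site[:20], site[20:], "+", cut, d))
        if site[:2] == "CC":   # minus strand: PAM appears as CCN at [i, i+3)
            cut = i + 6
            d = cut_to_codon_distance(cut, codon_start)
            if d <= window:
                rc = revcomp(site)
                guides.append(Guide(rc[:20], rc[20:], "-", cut, d))
    return guides

# Spacer and PAM from the abstract; flanks are arbitrary placeholders, not TP53 sequence.
FLANK = "ATATATATAT"
SPACER = "GCATGGGCGGCATGAACCGG"
PAM = "AGG"
seq = FLANK + SPACER + PAM + FLANK
codon_start = len(FLANK) + 17  # the last three spacer bases (CGG) stand in for codon 248
guides = enumerate_guides(seq, codon_start)
for g in guides:
    print(g)
```

In a real run the sequence would be genomic context around each hotspot codon, and the surviving guides would then go on to the GRCh38 mapping, off-target, and on-target scoring steps that the abstract describes.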

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Nature Communications (4913 papers in training set): Top 3%, 22.7%
2. Genome Medicine (154 papers in training set): Top 0.4%, 10.2%
3. Nature Biotechnology (147 papers in training set): Top 1%, 7.2%
4. Cell Systems (167 papers in training set): Top 2%, 7.2%
5. Nature Genetics (240 papers in training set): Top 1%, 6.4%
50% of probability mass above
6. Genome Biology (555 papers in training set): Top 1%, 6.4%
7. The American Journal of Human Genetics (206 papers in training set): Top 1%, 3.6%
8. Nucleic Acids Research (1128 papers in training set): Top 6%, 3.6%
9. Cell Genomics (162 papers in training set): Top 2%, 3.3%
10. Bioinformatics (1061 papers in training set): Top 7%, 1.8%
11. Scientific Reports (3102 papers in training set): Top 57%, 1.7%
12. PLOS Computational Biology (1633 papers in training set): Top 18%, 1.5%
13. Nature (575 papers in training set): Top 12%, 1.5%
14. PLOS ONE (4510 papers in training set): Top 58%, 1.3%
15. Science (429 papers in training set): Top 18%, 1.0%
16. Cell Reports Medicine (140 papers in training set): Top 6%, 0.9%
17. The CRISPR Journal (33 papers in training set): Top 0.2%, 0.9%
18. Nature Methods (336 papers in training set): Top 6%, 0.9%
19. eLife (5422 papers in training set): Top 53%, 0.9%
20. Genome Research (409 papers in training set): Top 4%, 0.8%
21. Communications Biology (886 papers in training set): Top 26%, 0.7%
22. Cancer Research (116 papers in training set): Top 4%, 0.7%
23. Proceedings of the National Academy of Sciences (2130 papers in training set): Top 46%, 0.7%
24. Bioinformatics Advances (184 papers in training set): Top 5%, 0.6%
25. Science Advances (1098 papers in training set): Top 33%, 0.6%
26. iScience (1063 papers in training set): Top 37%, 0.6%
27. PLOS Genetics (756 papers in training set): Top 17%, 0.6%
28. Nature Medicine (117 papers in training set): Top 6%, 0.6%
29. Advanced Science (249 papers in training set): Top 23%, 0.5%
30. Nature Machine Intelligence (61 papers in training set): Top 4%, 0.5%