
Definition of alleles and altered regulatory motifs across Cas9-edited cell populations

Ehmsen, K. T.; Knuesel, M. T.; Martinez, D.; Asahina, M.; Aridomi, H.; Yamamoto, K. R.

2019-09-19 molecular biology
10.1101/775361 bioRxiv

Background: Genetic alteration of candidate response elements at their native chromosomal loci is the only valid determinant of their potential transcriptional regulatory activities. Targeted DNA cleavage by Cas9 coupled with cellular repair processes can produce arrays of alleles that can be defined by massively parallel sequencing by synthesis (SBS), presenting an opportunity to generate and survey edited cell populations that include informative alterations. Such editing efforts commonly rely on subclonal enrichment to isolate cells with preferred genotypic properties at target loci; short nucleotide adducts (indices/barcodes) allow PCR-amplified molecules from diverse sample sources to be pooled, sequenced, and demultiplexed to resolve source-specific content. Not widely available, however, are capabilities for barcoding thousands of clones, or for automated analysis of individual candidate regulatory loci PCR-amplified and sequenced from a genetically heterogeneous population, specifically, imputation of discrete genotype(s) by allele definition and abundance, and identification of altered regulatory factor binding motifs.

Results: We describe a panel of 192 8-nucleotide barcode primers compatible with Illumina® sequencing platforms, and the application of these barcodes to genotypic analysis of Cas9-edited clones. Permutations of the ninety-six i7 (read 1) and ninety-six i5 (read 2) barcodes allow unique labeling of up to 9,216 distinct samples. We created three independent Python scripts: SampleSheet.py automates construction of Illumina® Sample Sheets encoding up to 9,216 barcode:sample relationships; ImputedGenotypes.py defines alleles and imputes genotypes from demultiplexed fastq files; CollatedMotifs.py flags transcription factor recognition motif matches altered in alleles relative to a reference sequence.

Conclusions: Code-enabled definition of alleles and regulatory motifs in sequenced, demultiplexed amplicons facilitates evaluation of genetic diversity in up to 9,216 distinct samples. Here, we demonstrate the utility of three scripts in analysis of cell populations targeted by Cas9 for disruption of glucocorticoid receptor (GR) binding sites near FKBP5, a GR-regulated gene in the human adenocarcinoma cell line A549. SampleSheet.py, ImputedGenotypes.py, and CollatedMotifs.py operate independently and are broadly applicable beyond the case described here.
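The sample capacity quoted in the abstract follows from pairing each i7 barcode with each i5 barcode. A minimal sketch of that arithmetic is given below; it is not taken from SampleSheet.py, and the barcode labels (`i7_01`, `i5_01`, and so on) are hypothetical placeholders, since the actual 8-nucleotide sequences are not listed here.

```python
from itertools import product

# Hypothetical placeholder labels; the real 8-nucleotide sequences are not given in the abstract.
i7_barcodes = [f"i7_{n:02d}" for n in range(1, 97)]
i5_barcodes = [f"i5_{n:02d}" for n in range(1, 97)]

# Each sample is labeled with one (i7, i5) pair, so capacity is the number of pairs.
barcode_pairs = list(product(i7_barcodes, i5_barcodes))

print(len(i7_barcodes) + len(i5_barcodes))  # total primers in the panel: 192
print(len(barcode_pairs))                   # distinct dual-index labels: 9216
```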

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Genome Medicine | 154 papers in training set | Top 0.2% | 14.4%
2. Bioinformatics | 1061 papers in training set | Top 3% | 10.5%
3. STAR Protocols | 15 papers in training set | Top 0.1% | 4.9%
4. BMC Bioinformatics | 383 papers in training set | Top 2% | 4.4%
5. Nucleic Acids Research | 1128 papers in training set | Top 5% | 4.0%
6. The CRISPR Journal | 33 papers in training set | Top 0.1% | 3.6%
7. Genome Research | 409 papers in training set | Top 0.9% | 3.6%
8. Wellcome Open Research | 57 papers in training set | Top 0.3% | 3.3%
9. PLOS ONE | 4510 papers in training set | Top 42% | 3.1%

50% of probability mass above

10. Nature Communications | 4913 papers in training set | Top 42% | 3.1%
11. Cell Genomics | 162 papers in training set | Top 2% | 2.7%
12. Genome Biology | 555 papers in training set | Top 3% | 2.4%
13. Scientific Reports | 3102 papers in training set | Top 48% | 2.4%
14. The Journal of Molecular Diagnostics | 36 papers in training set | Top 0.2% | 1.7%
15. npj Genomic Medicine | 33 papers in training set | Top 0.4% | 1.7%
16. NAR Genomics and Bioinformatics | 214 papers in training set | Top 2% | 1.5%
17. Cell Reports Methods | 141 papers in training set | Top 3% | 1.3%
18. BMC Genomics | 328 papers in training set | Top 3% | 1.3%
19. Nature Biotechnology | 147 papers in training set | Top 6% | 1.2%
20. Genes & Development | 90 papers in training set | Top 0.7% | 1.2%
21. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 7% | 1.0%
22. NAR Cancer | 36 papers in training set | Top 0.1% | 0.9%
23. American Journal of Respiratory Cell and Molecular Biology | 38 papers in training set | Top 0.7% | 0.7%
24. Human Mutation | 29 papers in training set | Top 0.7% | 0.7%
25. Biology Methods and Protocols | 53 papers in training set | Top 3% | 0.7%
26. Cancer Research Communications | 46 papers in training set | Top 1% | 0.7%
27. PLOS Computational Biology | 1633 papers in training set | Top 26% | 0.7%
28. Nature Methods | 336 papers in training set | Top 7% | 0.6%
29. Epigenetics & Chromatin | 42 papers in training set | Top 0.4% | 0.6%
30. Laboratory Investigation | 13 papers in training set | Top 0.3% | 0.6%
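
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below copies the values for ranks 1 to 10 from the list; the variable names are illustrative only, and the rounding of the published percentages is assumed not to affect the result.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1-10, copied from the list above.
probabilities = [14.4, 10.5, 4.9, 4.4, 4.0, 3.6, 3.6, 3.3, 3.1, 3.1]

# Running total of probability mass in rank order.
cumulative = list(accumulate(probabilities))

# First rank at which the running total reaches 50%.
rank_at_half = next(
    rank for rank, total in enumerate(cumulative, start=1) if total >= 50.0
)
print(rank_at_half)
```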