
PAMPHLET: A Robust Toolkit for Precise PAM Prediction and Unveiling PAM Consistency in Highly Co-occurrence CRISPR-Cas Systems

Qi, C.; Shen, X.; Li, B.; Liu, C.; Huang, L.; Lan, H.; Chen, D.; Jiang, Y.; Wang, D.

2024-04-09 bioinformatics
10.1101/2024.04.09.587696 bioRxiv

The CRISPR-Cas technology has sparked a new technological revolution, significantly enhancing our ability to understand and engineer organisms. The nucleases underpinning this technology are shifting from a "One Cas9 for all" model to a diverse CRISPR toolbox. Identifying PAM sequences is a critical bottleneck in developing novel Cas proteins, and given the limitations of experimental methods, bioinformatics approaches are essential for predicting the PAM sequences of Cas proteins in advance. To date, only a few PAM prediction programs exist, and their accuracy is relatively low because CRISPR-Cas systems contain a limited number of spacers. To overcome this challenge, we developed a pipeline named PAMPHLET, which innovatively uses homology searches of Cas proteins to identify additional spacers. Tested on 20 CRISPR-Cas systems with known PAMs, PAMPHLET increased the number of spacers by up to 18-fold compared with the original datasets and successfully predicted 18 PAM sequences. For rigorous, high-quality wet-lab validation of PAMPHLET's predictions, we employed the published DocMF platform, which leverages next-generation sequencing chips to profile protein-DNA interactions and can screen both 5′ and 3′ PAMs simultaneously and with high throughput. PAMPHLET's predictions were highly consistent with the DocMF results for four novel Cas proteins. We expect that PAMPHLET will help overcome current limitations in PAM prediction, expedite the discovery of PAM sequences, and shorten the development cycle of CRISPR tools. Notably, PAMPHLET revealed an intriguing genomic phenomenon: the C2c9 and C2c10 systems, which lack the canonical adaptation module, possess PAM sequences identical to those of co-occurring type I systems, suggesting a potentially shared spacer acquisition mechanism.
This finding highlights the complex evolutionary relationships of CRISPR-Cas systems and propels us toward a deeper understanding of their mechanistic diversity and adaptability.
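The abstract does not describe how PAMPHLET derives a PAM once extra spacers are found. The sketch below only illustrates the general idea behind spacer-based PAM inference and is not the authors' implementation: locate each spacer's protospacer in a target sequence, collect the flanking bases on both the 5′ and 3′ sides, and report a per-position consensus. It assumes exact matches on the forward strand only, a fixed flank length `k`, and a simple majority threshold `min_freq`. The function names (`flanks`, `consensus`, `predict_pam`) and the toy sequences are hypothetical.

```python
from collections import Counter


def flanks(genome: str, protospacer: str, k: int):
    """Return (5' flank, 3' flank) pairs of length k for every exact,
    forward-strand match of the protospacer that has k bases on each side."""
    out = []
    start = genome.find(protospacer)
    while start != -1:
        end = start + len(protospacer)
        if start >= k and end + k <= len(genome):
            out.append((genome[start - k:start], genome[end:end + k]))
        start = genome.find(protospacer, start + 1)
    return out


def consensus(seqs, min_freq=0.75):
    """Per-position majority base; 'N' where no base reaches min_freq."""
    result = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(column) >= min_freq else "N")
    return "".join(result)


def predict_pam(targets, k=4, min_freq=0.75):
    """Pool flanks over (target sequence, protospacer) pairs and return
    the (5' consensus, 3' consensus) as a hypothetical PAM call."""
    five, three = [], []
    for genome, protospacer in targets:
        for up, down in flanks(genome, protospacer, k):
            five.append(up)
            three.append(down)
    return consensus(five, min_freq), consensus(three, min_freq)


# Toy (target, protospacer) pairs; the sequences are invented for illustration.
targets = [
    ("TTTA" + "ACGTACGT" + "CAGG", "ACGTACGT"),
    ("TTTC" + "GGATCCAA" + "TGCA", "GGATCCAA"),
    ("TTTG" + "CATGCATG" + "AACT", "CATGCATG"),
    ("ATTA" + "GTCAGTCA" + "CCGT", "GTCAGTCA"),
]
print(predict_pam(targets))  # → ('TTTN', 'NNNN')
```

In this toy run, a conserved motif appears only on the 5′ side. Examining both flanks reflects the abstract's point that PAMs may lie 5′ or 3′ of the protospacer. The abstract's own argument also shows up here: with only a handful of spacers, the consensus is fragile, so pooling more spacers (as PAMPHLET does through homology searches) makes each column's frequency estimate more reliable.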

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Nature Communications (4913 papers in training set; Top 6%; 18.4%)
2. The CRISPR Journal (33 papers in training set; Top 0.1%; 14.2%)
3. Cell Genomics (162 papers in training set; Top 0.2%; 10.0%)
4. Nucleic Acids Research (1128 papers in training set; Top 2%; 8.3%)
(50% of probability mass above)
5. Cell Systems (167 papers in training set; Top 3%; 4.1%)
6. Genome Biology (555 papers in training set; Top 2%; 3.9%)
7. Computational and Structural Biotechnology Journal (216 papers in training set; Top 4%; 1.8%)
8. ACS Synthetic Biology (256 papers in training set; Top 2%; 1.7%)
9. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 4%; 1.7%)
10. Cell Reports Methods (141 papers in training set; Top 2%; 1.7%)
11. Advanced Science (249 papers in training set; Top 11%; 1.7%)
12. Nature Machine Intelligence (61 papers in training set; Top 2%; 1.7%)
13. PLOS Computational Biology (1633 papers in training set; Top 17%; 1.6%)
14. eLife (5422 papers in training set; Top 45%; 1.5%)
15. Nature Biotechnology (147 papers in training set; Top 5%; 1.5%)
16. NAR Genomics and Bioinformatics (214 papers in training set; Top 2%; 1.5%)
17. Bioinformatics (1061 papers in training set; Top 8%; 1.2%)
18. Genome Medicine (154 papers in training set; Top 6%; 1.1%)
19. Journal of Molecular Biology (217 papers in training set; Top 3%; 1.1%)
20. Frontiers in Genetics (197 papers in training set; Top 8%; 0.9%)
21. Communications Biology (886 papers in training set; Top 17%; 0.9%)
22. Horticulture Research (43 papers in training set; Top 1%; 0.9%)
23. PLOS ONE (4510 papers in training set; Top 65%; 0.9%)
24. Scientific Reports (3102 papers in training set; Top 73%; 0.8%)
25. iScience (1063 papers in training set; Top 30%; 0.8%)
26. PLOS Genetics (756 papers in training set; Top 15%; 0.7%)
27. Genome Research (409 papers in training set; Top 5%; 0.7%)
28. Briefings in Bioinformatics (326 papers in training set; Top 8%; 0.6%)