picking the best crispr-cas9 targets for functional gene ...€¦ · shawn mcclelland, emily m....
TRANSCRIPT
Introduction Machine learning
Rigorous analysis of potential off-targets
Machine learning
Functionality
Rapid functional readout
Trained functionality algorithm
A perfect alignment is not required for off-targeting
The effect of incomplete alignment
Complete and fast alignment
Comparison of number of possible off-target alignments
Trained functionality algorithm Functionality & Specificity are used to select the optimal picks
Good measure of algorithm fit
Validation of algorithm in other phenotypic assays
Improved experimentation
Conclusion
dharmacon.comDharmacon and Edit-R are trademarks of Horizon Discovery Group. ©2017 Horizon Discovery—All rights reserved.
Version published September 2017. UK Registered Head Office: Building 8100, Cambridge Research Park, Cambridge, CB25 9TL, United Kingdom
UK Registration No. 05363294 | UK VAT No. GB 189 1993 44
Shawn McClelland, Emily M. Anderson, Žaklina Strezoska, Elena Maksimova, Annaleen Vermeulen, Steve Lenger, Tyler Reed, and Anja van Brabant SmithDharmacon, A Horizon Discovery Group Company, 2650 Crescent Drive, Lafayette, CO 80026, USA
Picking the best CRISPR-Cas9 targets for functional gene knockout: a machine learning algorithm based on both specificity and functionality
A Horizon Discovery Group Company
The CRISPR-Cas9 system has the potential to significantly advance basic and applied research.
Functional gene knockout is an important tool for understanding a gene’s role in a system or for specifically manipulating a known system to achieve a desired outcome.
Not all gene cleavage events result in functional knockout of the target protein.
Here we share important advancements that have helped to achieve the goal of picking the best crRNA targets for functional gene knockout, and not just formation of indels (insertions or the deletions of bases in the DNA).
Lentiviraltransduction
Selection with blasticidin
Cell population expansion
Synthetic crRNA:tracrRNAtransfection
Cas9 cell linemixed population
Cas9/crRNA:tracrRNA mixed populations
Phenotypic analysis
Cas9 cell line mixed population
Edit-RCas9
Establishment of a Cas9-expressing cell line greatly improved phenotypic consistency and ease-of-use of Edit-R synthetic crRNA:tracrRNA to assess over one thousand target regions across multiple genes in a high-throughput manner.
Recombinant U2OS cell line stably expressing a mutant human ubiquitin fused to EGFP
• Uncleavable ubiquitin moiety (Gly76Val)
• Constitutive degradation of the protein means very low EGFP fluorescence when the proteasome is functioning normally
• Disruption in proteasome-related components increases fluorescence
VCP siGENOME poolPSMD7 VCPPPIB negative control
Cas9 stable cells transfected with crRNA:tracrRNA siRNA -treated
Simple + RapidLarge amounts of quality data=
It is essential to verify that algorithm designs are tested in an assay with genes that are unrelated to the ones used to generate the training data. We used the ApoONE assay with algorithm-derived crRNA to new target genes known to further validate the functionality algorithm.
High scoring crRNAs show better function than low scoring crRNAs in an assay unrelated to the assay used to train the algorithm.
ApoONE assay box plots: crRNAs with the bottom half algorithm scores (H1) versus crRNAs with the top half of algoirthm scores (H2).
H1
3
2.5
2
1.5
1
0.5
0H2
PLK1BCL2L1 WEE1
Norm
alize
d Ap
oONE
ass
ay
crRNAs
H1
3
2.5
2
1.5
1
0.5
0H2 H1
3
2.5
2
1.5
1
0.5
0H2
Features examined include:
• nucleotide composition
• nearest neighbor effects
• PAM sequence
An algorithm is important because crRNAs vary widely in their ability to cause functional gene disruption.
An algorithm is important because crRNAs vary widely in their ability to cause functional gene disruption.
cut site distance from CDS start in transcript (nts)
cut site distance from CDS start in transcript (nts)
GFP
sig
nal,
fold
ove
r neg
ativ
e co
ntro
l
GFP
sig
nal,
fold
ove
r neg
ativ
e co
ntro
l
0
1
2
3
4
5
6
0 100 200 300 400 500 600 700 800
ControlsExon 1Exon 2Exon 3Exon 4Exon 5Exon 6Exon 7
0
1
2
3
4
5
6
800 900 1000 1100 1200 1300 1400 1500
Exon 8Exon 9Exon 10Exon 11Exon 12
0
1
2
3
4
5
6
1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500
Exon 13Exon 14Exon 15Exon 16Exon 17
0
1
2
3
4
5
6
0 100 200 300 400 500 600 700 800
ControlsExon 1Exon 2Exon 3Exon 4Exon 5Exon 6Exon 7
0
1
2
3
4
5
6
800 900 1000 1100 1200 1300 1400 1500
Exon 8Exon 9Exon 10Exon 11Exon 12
0
1
2
3
4
5
6
1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500
Exon 13Exon 14Exon 15Exon 16Exon 17
0
1
2
3
4
5
6
0 100 200 300 400 500 600 700 800
ControlsExon 1Exon 2Exon 3Exon 4Exon 5Exon 6Exon 7
0
1
2
3
4
5
6
800 900 1000 1100 1200 1300 1400 1500
Exon 8Exon 9Exon 10Exon 11Exon 12
0
1
2
3
4
5
6
1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500
Exon 13Exon 14Exon 15Exon 16Exon 17
Specificity
GeneExon 1 Exon 2 Exon 3 Exon 4 Exon 5 CDS region of exon
Exon 1 Exon 2 Exon 3
Transcript A
Transcript B
CRISPR targets
CRISPR ‘B’CRISPR ‘A’
POOR
AVERAGE
GOOD
Specificity Rating
Functionality Rating
CRISPR target ‘A’ – May function, but poor specificity�CRISPR target ‘B’ – OPTIMAL PICK, scores well in both functionality & specificity
The Dharmacon CRISPR Specificity Analysis Tool provides comprehensive alignment. Try it at: dharmacon.gelifesciences.com/tools-and-calculators/crispr-specificity-tool
P A M
P A M
P A M
P A M
P A M
mismatch in alignmentgap in alignment
CRISPR target ‘A’
Incomplete alignment to the genome
Intended targetP A M
P A M
P A M
P A M
P A M
P A M
P A M
P A M
ALL alignments to the genome
Intended target
CRISPR ‘A’ incorrectly rated
as GOOD Specificity
CRISPR ‘A’correctly rated
as POOR Specificity
Likelihood of cleavage
+++
+
+
---
-
+
++
++
NNNNmNNNNNNN-NNNNNNN|||| ||| ||| |||||||NNNNNNNN-NNNNNNNNNNN RNA
DNANN N NNNNNNNNN NNNNNNNN
Gap in RNA strand
Flawsm = mismatch- = gap
Gap in DNA strand
Tool% Alignments found by Flaw Count
Alignment Time (s)0 1 2 3
BLAST 100 100 80 23 24.8
Bowtie 2 100 100 70 29 6.27
Dharmacon Alignment 100 100 100 100 2.30
# Alignments found
0 flaws 1 flaws 2 flaws 3 flaws 4 flaws
Tool Z 1 0 0 16 227Tool C 1 0 0 NR NRTool W 1 0 0 7 142Tool O 1 0 0 7 139
Dharmacon Alignment 1 0 4 169 3732
# Alignments found
0 flaws 1 flaws 2 flaws 3 flaws 4 flaws
Tool Z 1 0 1 7 81Tool C 2 0 0 NR NRTool W NR NR NR NR NRTool O 2 0 1 4 58
Dharmacon Alignment 2 0 1 72 1915
Chromosomal sites with flaws in their alignment can create indels and are potential off-targets. Flaws include not just mismatches, but gaps as well. Most tools do not take into account gaps and most tools do not find all imperfect alignments.
Target 1 - GGTCATCTGGGAGAAAAGCG CGGhg38; chr14:54748857-54748879
Target 2 - GCTCCACGTTTATAAATAGC TGGhg38; chr14:54569552-54569574
Missing alignments
NR No Results reported for tool (error or inability)
Using two targets, we used multiple publicly available alignment tools to look at the predicted off-targets and sorted them by the total number of flaws. Results below show that the Dharmacon alignment strategy can identify potential off-targets with much more rigor than other tools.
Dharmacon’s new alignment strategy finds all possible alignments (including alignments containing gaps) and allows us to design targets that are less likely to cause off-targets.
cut site distance from CDS start in transcript (nts)
cut site distance from CDS start in transcript (nts)
GFP
sig
nal,
fold
ove
r neg
ativ
e co
ntro
l
Dharmacon training set: 10 genes, 1115 crRNA target sites
Data was used to select features and multi-dimensional features that were highest predictors
• position in exons
• distance from the start codon
Good fit of data while avoiding overfitting of data. Goal: To have good prediction of the training data set AND unrelated gene editing data
ROC (Receiver Operating Characteristic) shows good fit of training set data. The ROC measures the area under the curve of True Positive Rate vs False Positive Rate.
The ROC for our test set data is 0.78.