picking the best crispr-cas9 targets for functional gene ...€¦ · shawn mcclelland, emily m....

1
Introduction Machine learning Rigorous analysis of potential off-targets Machine learning Functionality Rapid functional readout Trained functionality algorithm A perfect alignment is not required for off-targeting The effect of incomplete alignment Complete and fast alignment Comparison of number of possible off-target alignments Trained functionality algorithm Functionality & Specificity are used to select the optimal picks Good measure of algorithm fit Validation of algorithm in other phenotypic assays Improved experimentation Conclusion dharmacon.com Dharmacon and Edit-R are trademarks of Horizon Discovery Group. ©2017 Horizon Discovery—All rights reserved. Version published September 2017. UK Registered Head Office: Building 8100, Cambridge Research Park, Cambridge, CB25 9TL, United Kingdom UK Registration No. 05363294 | UK VAT No. GB 189 1993 44 Shawn McClelland, Emily M. Anderson, Žaklina Strezoska, Elena Maksimova, Annaleen Vermeulen, Steve Lenger, Tyler Reed, and Anja van Brabant Smith Dharmacon, A Horizon Discovery Group Company, 2650 Crescent Drive, Lafayette, CO 80026, USA Picking the best CRISPR-Cas9 targets for functional gene knockout: a machine learning algorithm based on both specificity and functionality A Horizon Discovery Group Company The CRISPR-Cas9 system has the potential to significantly advance basic and applied research. Functional gene knockout is an important tool for understanding a gene’s role in a system or for specifically manipulating a known system to achieve a desired outcome. Not all gene cleavage events result in functional knockout of the target protein. Here we share important advancements that have helped to achieve the goal of picking the best crRNA targets for functional gene knockout, and not just formation of indels (insertions or the deletions of bases in the DNA). Lenviral transducon Selecon with blascidin Cell populaon expansion Synthec crRNA:tracrRNA transfecon Cas9 cell line mixed populaon Cas9/crRNA: tracrRNA mixed populaons Phenotypic analysis Cas9 cell line mixed populaon Edit-R Cas9 Establishment of a Cas9-expressing cell line greatly improved phenotypic consistency and ease-of-use of Edit-R synthetic crRNA:tracrRNA to assess over one thousand target regions across multiple genes in a high-throughput manner. Recombinant U2OS cell line stably expressing a mutant human ubiquitin fused to EGFP • Uncleavable ubiquitin moiety (Gly76Val) • Constitutive degradation of the protein means very low EGFP fluorescence when the proteasome is functioning normally • Disruption in proteasome-related components increases fluorescence V C P s i G E N O M E p o o l P S M D 7 V C P P P I B n e g a v e c o n t r o l Cas9 stable cells transfected with crRNA:tracrRNA siRNA - treated Simple + Rapid Large amounts of quality data = It is essential to verify that algorithm designs are tested in an assay with genes that are unrelated to the ones used to generate the training data. We used the ApoONE assay with algorithm-derived crRNA to new target genes known to further validate the functionality algorithm. High scoring crRNAs show better function than low scoring crRNAs in an assay unrelated to the assay used to train the algorithm. ApoONE assay box plots: crRNAs with the bottom half algorithm scores (H1) versus crRNAs with the top half of algoirthm scores (H2). H1 3 2.5 2 1.5 1 0.5 0 H2 PLK1 BCL2L1 WEE1 Normalized ApoONE assay crRNAs H1 3 2.5 2 1.5 1 0.5 0 H2 H1 3 2.5 2 1.5 1 0.5 0 H2 Features examined include: • nucleotide composition • nearest neighbor effects • PAM sequence An algorithm is important because crRNAs vary widely in their ability to cause functional gene disruption. An algorithm is important because crRNAs vary widely in their ability to cause functional gene disruption. cut site distance from CDS start in transcript (nts) GFP signal, fold over negative control GFP signal, fold over negative control 0 1 2 3 4 5 6 0 100 200 300 400 500 600 700 800 Controls Exon 1 Exon 2 Exon 3 Exon 4 Exon 5 Exon 6 Exon 7 0 1 2 3 4 5 6 800 900 1000 1100 1200 1300 1400 1500 Exon 8 Exon 9 Exon 10 Exon 11 Exon 12 6 0 1 2 3 4 5 6 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500 Exon 13 Exon 14 Exon 15 Exon 16 Exon 17 Specificity Gene Exon 1 Exon 2 Exon 3 Exon 4 Exon 5 CDS region of exon Exon 1 Exon 2 Exon 3 Transcript A Transcript B CRISPR targets CRISPR ‘B’ CRISPR ‘A’ POOR AVERAGE GOOD Speci city Rang Funconality Rang CRISPR target ‘A’ – May funcon, but poor specificity� CRISPR target ‘B’ – OPTIMAL PICK, scores well in both funconality & specificity The Dharmacon CRISPR Specificity Analysis Tool provides comprehensive alignment. Try it at: dharmacon.gelifesciences.com/tools-and-calculators/crispr-specificity-tool P A M P A M P A M P A M P A M mismatch in alignment gap in alignment CRISPR target ‘A’ Incomplete alignment to the genome Intended target P A M P A M P A M P A M P A M P A M P A M P A M ALL alignments to the genome Intended target CRISPR ‘A’ incorrectly rated as GOOD Specificity CRISPR ‘A’ correctly rated as POOR Specificity Likelihood of cleavage +++ + + --- - + ++ ++ NNNNmNNNNNNN-NNNNNNN |||| ||| ||| ||||||| NNNNNNNN-NNNNNNNNNNN RNA DNA N N N N N N NNNNNN NNNNNNNN Gap in RNA strand Flaws m = mismatch - = gap Gap in DNA strand Tool % Alignments found by Flaw Count Alignment Time (s) 0 1 2 3 BLAST 100 100 80 23 24.8 Bowtie 2 100 100 70 29 6.27 Dharmacon Alignment 100 100 100 100 2.30 # Alignments found 0 flaws 1 flaws 2 flaws 3 flaws 4 flaws Tool Z 1 0 0 16 227 Tool C 1 0 0 NR NR Tool W 1 0 0 7 142 Tool O 1 0 0 7 139 Dharmacon Alignment 1 0 4 169 3732 # Alignments found 0 flaws 1 flaws 2 flaws 3 flaws 4 flaws Tool Z 1 0 1 7 81 Tool C 2 0 0 NR NR Tool W NR NR NR NR NR Tool O 2 0 1 4 58 Dharmacon Alignment 2 0 1 72 1915 Chromosomal sites with flaws in their alignment can create indels and are potential off- targets. Flaws include not just mismatches, but gaps as well. Most tools do not take into account gaps and most tools do not find all imperfect alignments. Target 1 - GGTCATCTGGGAGAAAAGCG CGG hg38; chr14:54748857-54748879 Target 2 - GCTCCACGTTTATAAATAGC TGG hg38; chr14:54569552-54569574 Missing alignments NR No Results reported for tool (error or inability) Using two targets, we used multiple publicly available alignment tools to look at the predicted off-targets and sorted them by the total number of flaws. Results below show that the Dharmacon alignment strategy can identify potential off-targets with much more rigor than other tools. Dharmacon’s new alignment strategy finds all possible alignments (including alignments containing gaps) and allows us to design targets that are less likely to cause off-targets. cut site distance from CDS start in transcript (nts) cut site distance from CDS start in transcript (nts) GFP signal, fold over negative control Dharmacon training set: 10 genes, 1115 crRNA target sites Data was used to select features and multi-dimensional features that were highest predictors • position in exons • distance from the start codon Good fit of data while avoiding overfitting of data. Goal: To have good prediction of the training data set AND unrelated gene editing data ROC (Receiver Operating Characteristic) shows good fit of training set data. The ROC measures the area under the curve of True Positive Rate vs False Positive Rate. The ROC for our test set data is 0.78 .

Upload: others

Post on 02-Oct-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Picking the best CRISPR-Cas9 targets for functional gene ...€¦ · Shawn McClelland, Emily M. Anderson, Žaklina Strezoska, Elena Maksimova, Annaleen Vermeulen, Steve Lenger, Tyler

Introduction Machine learning

Rigorous analysis of potential off-targets

Machine learning

Functionality

Rapid functional readout

Trained functionality algorithm

A perfect alignment is not required for off-targeting

The effect of incomplete alignment

Complete and fast alignment

Comparison of number of possible off-target alignments

Trained functionality algorithm Functionality & Specificity are used to select the optimal picks

Good measure of algorithm fit

Validation of algorithm in other phenotypic assays

Improved experimentation

Conclusion

dharmacon.comDharmacon and Edit-R are trademarks of Horizon Discovery Group. ©2017 Horizon Discovery—All rights reserved.

Version published September 2017. UK Registered Head Office: Building 8100, Cambridge Research Park, Cambridge, CB25 9TL, United Kingdom

UK Registration No. 05363294 | UK VAT No. GB 189 1993 44

Shawn McClelland, Emily M. Anderson, Žaklina Strezoska, Elena Maksimova, Annaleen Vermeulen, Steve Lenger, Tyler Reed, and Anja van Brabant SmithDharmacon, A Horizon Discovery Group Company, 2650 Crescent Drive, Lafayette, CO 80026, USA

Picking the best CRISPR-Cas9 targets for functional gene knockout: a machine learning algorithm based on both specificity and functionality

A Horizon Discovery Group Company

The CRISPR-Cas9 system has the potential to significantly advance basic and applied research.

Functional gene knockout is an important tool for understanding a gene’s role in a system or for specifically manipulating a known system to achieve a desired outcome.

Not all gene cleavage events result in functional knockout of the target protein.

Here we share important advancements that have helped to achieve the goal of picking the best crRNA targets for functional gene knockout, and not just formation of indels (insertions or the deletions of bases in the DNA).

Lentiviraltransduction

Selection with blasticidin

Cell population expansion

Synthetic crRNA:tracrRNAtransfection

Cas9 cell linemixed population

Cas9/crRNA:tracrRNA mixed populations

Phenotypic analysis

Cas9 cell line mixed population

Edit-RCas9

Establishment of a Cas9-expressing cell line greatly improved phenotypic consistency and ease-of-use of Edit-R synthetic crRNA:tracrRNA to assess over one thousand target regions across multiple genes in a high-throughput manner.

Recombinant U2OS cell line stably expressing a mutant human ubiquitin fused to EGFP

• Uncleavable ubiquitin moiety (Gly76Val)

• Constitutive degradation of the protein means very low EGFP fluorescence when the proteasome is functioning normally

• Disruption in proteasome-related components increases fluorescence

VCP siGENOME poolPSMD7 VCPPPIB negative control

Cas9 stable cells transfected with crRNA:tracrRNA siRNA -treated

Simple + RapidLarge amounts of quality data=

It is essential to verify that algorithm designs are tested in an assay with genes that are unrelated to the ones used to generate the training data. We used the ApoONE assay with algorithm-derived crRNA to new target genes known to further validate the functionality algorithm.

High scoring crRNAs show better function than low scoring crRNAs in an assay unrelated to the assay used to train the algorithm.

ApoONE assay box plots: crRNAs with the bottom half algorithm scores (H1) versus crRNAs with the top half of algoirthm scores (H2).

H1

3

2.5

2

1.5

1

0.5

0H2

PLK1BCL2L1 WEE1

Norm

alize

d Ap

oONE

ass

ay

crRNAs

H1

3

2.5

2

1.5

1

0.5

0H2 H1

3

2.5

2

1.5

1

0.5

0H2

Features examined include:

• nucleotide composition

• nearest neighbor effects

• PAM sequence

An algorithm is important because crRNAs vary widely in their ability to cause functional gene disruption.

An algorithm is important because crRNAs vary widely in their ability to cause functional gene disruption.

cut site distance from CDS start in transcript (nts)

cut site distance from CDS start in transcript (nts)

GFP

sig

nal,

fold

ove

r neg

ativ

e co

ntro

l

GFP

sig

nal,

fold

ove

r neg

ativ

e co

ntro

l

0

1

2

3

4

5

6

0 100 200 300 400 500 600 700 800

ControlsExon 1Exon 2Exon 3Exon 4Exon 5Exon 6Exon 7

0

1

2

3

4

5

6

800 900 1000 1100 1200 1300 1400 1500

Exon 8Exon 9Exon 10Exon 11Exon 12

0

1

2

3

4

5

6

1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500

Exon 13Exon 14Exon 15Exon 16Exon 17

0

1

2

3

4

5

6

0 100 200 300 400 500 600 700 800

ControlsExon 1Exon 2Exon 3Exon 4Exon 5Exon 6Exon 7

0

1

2

3

4

5

6

800 900 1000 1100 1200 1300 1400 1500

Exon 8Exon 9Exon 10Exon 11Exon 12

0

1

2

3

4

5

6

1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500

Exon 13Exon 14Exon 15Exon 16Exon 17

0

1

2

3

4

5

6

0 100 200 300 400 500 600 700 800

ControlsExon 1Exon 2Exon 3Exon 4Exon 5Exon 6Exon 7

0

1

2

3

4

5

6

800 900 1000 1100 1200 1300 1400 1500

Exon 8Exon 9Exon 10Exon 11Exon 12

0

1

2

3

4

5

6

1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500

Exon 13Exon 14Exon 15Exon 16Exon 17

Specificity

GeneExon 1 Exon 2 Exon 3 Exon 4 Exon 5 CDS region of exon

Exon 1 Exon 2 Exon 3

Transcript A

Transcript B

CRISPR targets

CRISPR ‘B’CRISPR ‘A’

POOR

AVERAGE

GOOD

Specificity Rating

Functionality Rating

CRISPR target ‘A’ – May function, but poor specificity�CRISPR target ‘B’ – OPTIMAL PICK, scores well in both functionality & specificity

The Dharmacon CRISPR Specificity Analysis Tool provides comprehensive alignment. Try it at: dharmacon.gelifesciences.com/tools-and-calculators/crispr-specificity-tool

P A M

P A M

P A M

P A M

P A M

mismatch in alignmentgap in alignment

CRISPR target ‘A’

Incomplete alignment to the genome

Intended targetP A M

P A M

P A M

P A M

P A M

P A M

P A M

P A M

ALL alignments to the genome

Intended target

CRISPR ‘A’ incorrectly rated

as GOOD Specificity

CRISPR ‘A’correctly rated

as POOR Specificity

Likelihood of cleavage

+++

+

+

---

-

+

++

++

NNNNmNNNNNNN-NNNNNNN|||| ||| ||| |||||||NNNNNNNN-NNNNNNNNNNN RNA

DNANN N NNNNNNNNN NNNNNNNN

Gap in RNA strand

Flawsm = mismatch- = gap

Gap in DNA strand

Tool% Alignments found by Flaw Count

Alignment Time (s)0 1 2 3

BLAST 100 100 80 23 24.8

Bowtie 2 100 100 70 29 6.27

Dharmacon Alignment 100 100 100 100 2.30

# Alignments found

0 flaws 1 flaws 2 flaws 3 flaws 4 flaws

Tool Z 1 0 0 16 227Tool C 1 0 0 NR NRTool W 1 0 0 7 142Tool O 1 0 0 7 139

Dharmacon Alignment 1 0 4 169 3732

# Alignments found

0 flaws 1 flaws 2 flaws 3 flaws 4 flaws

Tool Z 1 0 1 7 81Tool C 2 0 0 NR NRTool W NR NR NR NR NRTool O 2 0 1 4 58

Dharmacon Alignment 2 0 1 72 1915

Chromosomal sites with flaws in their alignment can create indels and are potential off-targets. Flaws include not just mismatches, but gaps as well. Most tools do not take into account gaps and most tools do not find all imperfect alignments.

Target 1 - GGTCATCTGGGAGAAAAGCG CGGhg38; chr14:54748857-54748879

Target 2 - GCTCCACGTTTATAAATAGC TGGhg38; chr14:54569552-54569574

Missing alignments

NR No Results reported for tool (error or inability)

Using two targets, we used multiple publicly available alignment tools to look at the predicted off-targets and sorted them by the total number of flaws. Results below show that the Dharmacon alignment strategy can identify potential off-targets with much more rigor than other tools.

Dharmacon’s new alignment strategy finds all possible alignments (including alignments containing gaps) and allows us to design targets that are less likely to cause off-targets.

cut site distance from CDS start in transcript (nts)

cut site distance from CDS start in transcript (nts)

GFP

sig

nal,

fold

ove

r neg

ativ

e co

ntro

l

Dharmacon training set: 10 genes, 1115 crRNA target sites

Data was used to select features and multi-dimensional features that were highest predictors

• position in exons

• distance from the start codon

Good fit of data while avoiding overfitting of data. Goal: To have good prediction of the training data set AND unrelated gene editing data

ROC (Receiver Operating Characteristic) shows good fit of training set data. The ROC measures the area under the curve of True Positive Rate vs False Positive Rate.

The ROC for our test set data is 0.78.