Matthew McNeill, PhD
Tips for effective use of BLAST and other NCBI tools
8/23/2016
Introduction: What is NCBI?
National Center for Biotechnology Information
http://www.ncbi.nlm.nih.gov/
Introduction: NCBI’s available tools
http://www.ncbi.nlm.nih.gov/home/analyze.shtml
User story: Previously published paper
• A lncRNA regulates a network of genes involved in cancer processes
Sanchez et al., Nature Communications 5, Article number: 5812
User story: We want to follow up on this work
Question: You have a collection of cancer cell lines. Does this lncRNA regulate the same network?
Selected tools:
• CRISPR – knock out the lncRNA
• qPCR – analyze RNA expression of the network
Common theme when using genetic/genomic tools: Was my assay specific?
User story: Getting your gene sequences
• Identify your genes
• Downloading sequences
User story: Gene list
Sanchez et al., Nature Communications 5, Article number: 5812
lncRNA:
PR-lncRNA-1
Downstream genes:
TP53I3, TGFB2, SERPINB6, POLA1, PDK1, LPP, DPP4, TNFRSF10D, NCAPD3, BCKDHB, TRIO
User story: Identify your gene list
Translating IDs
• Many options to consider
– Genome build
– Gene symbol/gene name
– RefSeq accession number.version
• Note:
– NCBI is phasing out GI numbers
– Read more here: https://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/
User story: Identify your gene list
Translating IDs - genome build
• Many options to consider
– Genome build
  • GRCh37/hg19
  • GRCh38
  • GRCh38.p2
User story: Identify your gene list
Translating IDs - annotations
http://www.ncbi.nlm.nih.gov/
User story: Identify your gene list
Translating IDs - annotations - gene symbol
http://www.ncbi.nlm.nih.gov/gene/?term=TP53I3
User story: Identify your gene list
Translating IDs - annotations - gene name
http://www.ncbi.nlm.nih.gov/gene/?term=TP53I3
User story: Identify your gene list
Translating IDs - annotations - gene alias
http://www.ncbi.nlm.nih.gov/gene/?term=TP53I3
User story: Identify your gene list
Translating IDs - annotations - RefSeq mRNA accession
http://www.ncbi.nlm.nih.gov/gene/9540
NM_001206802
User story: Identify your gene list
Translating IDs - annotations - RefSeq mRNA accession.version
http://www.ncbi.nlm.nih.gov/gene/9540
NM_001206802.2
User story: Identify your gene list
Translating IDs - annotations
Gene symbol to RefSeq mRNA accession
TP53I3, TGFB2, SERPINB6, POLA1, PDK1, LPP, DPP4, TNFRSF10D, NCAPD3, BCKDHB, TRIO
https://biodbnet-abcc.ncifcrf.gov/db/db2db.php
RefSeq accession prefixes in the output: NM, XM, NR
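The accession.version format and the NM/XM/NR prefixes can be handled programmatically once the IDs have been translated. The following is a minimal sketch, not part of the original slides: the function name `parse_refseq` and the short prefix descriptions are assumptions chosen for illustration, and only the three prefixes shown on the slide are covered.

```python
import re

# Short descriptions of the three RefSeq transcript prefixes on the slide.
# The wording here is a paraphrase, not NCBI's official text.
REFSEQ_PREFIXES = {
    "NM": "protein-coding transcript (mRNA)",
    "XM": "predicted model protein-coding transcript",
    "NR": "non-protein-coding transcript",
}

# Two-letter prefix, underscore, digits, optional ".version".
_ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def parse_refseq(accession):
    """Split an ID such as 'NM_001206802.2' into its parts."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, digits, version = match.groups()
    return {
        "prefix": prefix,
        "accession": f"{prefix}_{digits}",
        # None means the ID was given without a version suffix.
        "version": int(version) if version else None,
        "molecule": REFSEQ_PREFIXES.get(prefix, "other/unknown"),
    }


if __name__ == "__main__":
    print(parse_refseq("NM_001206802.2"))
```

Keeping the version separate makes it easy to notice when a record has been updated since a primer or guide was designed against it.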
User story: Getting your gene sequences
Important background
• Identify your genes
• Downloading sequences
User story: Identify your gene list
Downloading FASTA sequences
http://www.ncbi.nlm.nih.gov/gene/9540
User story: Identify your gene list
Batch Entrez
http://www.ncbi.nlm.nih.gov/sites/batchentrez
User story: Identify your gene list
Log file page
User story: Identify your gene list
Downloading output
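The slides use the Batch Entrez web form. As a hedged sketch of two ways to prepare for the same download from a script, the code below writes the list of accessions in a one-ID-per-line layout (assumed here to be what the Batch Entrez upload expects) and builds a request URL for NCBI's E-utilities `efetch` endpoint, which is a different route from the one shown in the talk. The function names are hypothetical, and nothing is fetched over the network.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint (an alternative to the Batch Entrez form).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batch_entrez_file_text(accessions):
    """Return file contents with one accession per line."""
    return "\n".join(accessions) + "\n"


def efetch_fasta_url(accessions):
    """Build an efetch URL requesting FASTA text for nucleotide accessions."""
    params = {
        "db": "nuccore",             # nucleotide database
        "id": ",".join(accessions),  # comma-separated accession.version IDs
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    ids = ["NM_001206802.2"]
    print(batch_entrez_file_text(ids), end="")
    print(efetch_fasta_url(ids))
```

For more than a handful of requests, NCBI's usage guidelines for E-utilities (rate limits, API keys) apply and should be checked before scripting downloads.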
User story: Identify your gene list
FASTA file format
>gi|332205880|ref|NM_001206802.2| Homo sapiens tumor protein p53 inducible protein 3 (TP53I3), transcript variant 3, mRNA
ACAATATGTTAGCCGTGCACTTTGACAAGCCGGGAGGACCGGAAAACCTCTACGTGAAGGAGGTGGCCAA
GCCGAGCCCGGGGGAGGGTGAAGTCCTCCTGAAGGTGGCGGCCAGCGCCCTGAACCGGGCGGACTTAATG
CAGAGACAAGGCCAGTATGACCCACCTCCAGGAGCCAGCAACATTTTGGGACTTGAGGCATCTGGACATG
TGGCAGAGCTGGGGCCTGGCTGCCAGGGACACTGGAAGATCGGGGACACAGCCATGGCTCTGCTCCCCGG
TGGGGGCCAGGCTCAGTACGTCACTGTCCCCGAAGGGCTCCTCATGCCTATCCCAGAGGGATTGACCCTG
ACCCAGGCTGCAGCCATCCCAGAGGCCTGGCTCACCGCCTTCCAGCTGTTACATCTTGTGGGAAATGTTC
AGGCTGGAGACTATGTGCTAATCCATGCAGGACTGAGTGGTGTGGGCACAGCTGCTATCCAACTCACCCG
GATGGCTGGAGCTATTCCTCTGGTCACAGCTGGCTCCCAGAAGAAGCTTCAAATGGCAGAAAAGCTTGGA
GCAGCTGCTGGATTCAATTACAAAAAAGAGGATTTCTCTGAAGCAACGCTGAAATTCACCAAAGTACAAG
CAAATGCTGGTGAATGCTTTCACGGAGCAAATTCTGCCTCACTTCTCCACGGAGGGCCCCCAACGTCTGC
TGCCGGTTCTGGACAGAATCTACCCAGTGACCGAAATCCAGGAGGCCCATAAGTACATGGAGGCCAACAA
GAACATAGGCAAGATCGTCCTGGAACTGCCCCAGTGAAGGAGGATGGGGCAGGACAGGACGCGGCCACCC
CAGGCCTTTCCAGAGCAAACCTGGAGAAGATTCACAATAGACAGGCCAAGAAACCCGGTGCTTCCTCCAG
AGCCGTTTAAAGCTGATATGAGGAAATAAAGAGTGAACTGGAAAAAAAAAA
http://www.ncbi.nlm.nih.gov/nuccore/332205880?report=fasta
The sequence reads 5′ to 3′.
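A FASTA record is a ">" header line followed by one or more sequence lines. The sketch below is a minimal reader for that layout plus a parser for the legacy `gi|...|ref|...|` header style shown on the slide. The function names are hypothetical, and the fallback branch assumes the newer header style that begins directly with accession.version.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_header(header):
    """Parse 'gi|<gi>|ref|<accession.version>| description' headers."""
    fields = header.split("|", 4)
    if len(fields) == 5 and fields[0] == "gi":
        return {
            "gi": fields[1],
            "accession": fields[3],
            "description": fields[4].strip(),
        }
    # Assumed newer style: '<accession.version> description', no GI number.
    accession, _, description = header.partition(" ")
    return {"gi": None, "accession": accession, "description": description}


EXAMPLE = """>gi|332205880|ref|NM_001206802.2| Homo sapiens tumor protein p53 inducible protein 3 (TP53I3), transcript variant 3, mRNA
ACAATATGTTAGCCGTGCACTTTGACAAGCCGGGAGGACCGGAAAACCTCTACGTGAAGGAGGTGGCCAA
GCCGAGCCCGGGGGAGGGTGAAGTCCTCCTGAAGGTGGCGGCCAGCGCCCTGAACCGGGCGGACTTAATG
"""

if __name__ == "__main__":
    for header, sequence in read_fasta(EXAMPLE):
        print(parse_header(header)["accession"], len(sequence))
```

Since GI numbers are being phased out, keying downstream work on the accession.version field rather than the GI is the safer choice.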
User story: Getting your sequences
Learned so far
• There are many identifiers that can be used for a gene, and those identifiers are often updated. NCBI tracks update information.
• NCBI provides the sequences of genetic/genomic elements for easy download, individually or in batches.
User story: Checking for off-target CRISPR events
CRISPR - general overview
https://www.idtdna.com/pages/products/genome-editing/crispr-cas9
User story: Checking for off-target CRISPR events
Using BLAST
• BLAST = Basic Local Alignment Search Tool
https://BLAST.ncbi.nlm.nih.gov/Blast.cgi
User story: Checking for off-target CRISPR events
Using BLASTN - optional parameters
• Example guide RNA (crRNA) targeting PR-lncRNA-1: TTCCAAGTGGCTAAAACTAC(AGG)
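To make concrete what "off-target" means for a guide like the one above, here is a small sketch that scans a sequence for sites resembling the 20-nt guide followed by an NGG PAM. This is not how BLASTN works and is no substitute for it: it is a naive scan with assumptions labeled in the comments (the parenthesized AGG is read as the PAM, the PAM rule is NGG, and up to 3 mismatches are tolerated). The function names and the toy sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_guide_sites(target, guide, max_mismatches=3):
    """Return (strand, position, mismatches, pam) for guide-like sites.

    Assumptions: NGG PAM directly 3' of the guide; mismatches counted by
    simple position-by-position comparison (no gaps). Positions on the
    '-' strand are offsets into the reverse-complemented sequence.
    """
    target, guide = target.upper(), guide.upper()
    n = len(guide)
    hits = []
    for strand, seq in (("+", target), ("-", revcomp(target))):
        for i in range(len(seq) - n - 2):
            pam = seq[i + n:i + n + 3]
            if pam[1:] != "GG":
                continue
            mismatches = sum(a != b for a, b in zip(seq[i:i + n], guide))
            if mismatches <= max_mismatches:
                hits.append((strand, i, mismatches, pam))
    return hits


if __name__ == "__main__":
    guide = "TTCCAAGTGGCTAAAACTAC"  # guide sequence from the slide
    toy = "GATTACA" + guide + "AGG" + "CCCTTT"  # made-up toy sequence
    for hit in find_guide_sites(toy, guide):
        print(hit)
```

A genome-scale search needs an indexed tool such as BLASTN; the point here is only the shape of a hit: a near-identical 20-mer with a PAM next to it.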
User story: Checking for off-target CRISPR events
Using BLASTN - output
Perfect match
Off-target match
User story: Checking for off-target CRISPR events
Learned so far
• BLAST is a powerful tool to look for likely off-target CRISPR activity
• Correctly parsing your BLAST output improves off-target characterization
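The slides read the BLAST results in the web interface. If the same hits are exported in BLAST's 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), the perfect-match versus off-target call can be sketched as below. The export format is an assumption about workflow, the function names are hypothetical, and the two example rows are made up for illustration rather than taken from a real search.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    identity: float
    length: int
    mismatches: int
    gap_opens: int
    qstart: int
    qend: int


def parse_tabular(text):
    """Parse BLAST 12-column tabular output into Hit records."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hits.append(Hit(f[1], float(f[2]), int(f[3]), int(f[4]),
                        int(f[5]), int(f[6]), int(f[7])))
    return hits


def classify(hit, guide_length=20):
    """Label a hit by whether it covers the whole guide without differences."""
    full_length = hit.qstart == 1 and hit.qend == guide_length
    if full_length and hit.mismatches == 0 and hit.gap_opens == 0:
        return "perfect match"
    return "possible off-target"


# Made-up rows; subject names chrA and chrB are placeholders.
EXAMPLE = (
    "guide\tchrA\t100.000\t20\t0\t0\t1\t20\t1000\t1019\t0.004\t40.1\n"
    "guide\tchrB\t94.444\t18\t1\t0\t2\t19\t500\t517\t3.1\t28.2\n"
)

if __name__ == "__main__":
    for hit in parse_tabular(EXAMPLE):
        print(hit.subject, classify(hit))
```

One perfect match at the intended locus is expected. A second perfect match elsewhere, or a partial hit that lies next to a PAM in the subject sequence, is what deserves a closer look.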
User story: Checking for off-target qPCR primers
PCR - general overview
Typical diagram
First cycle
Second cycle
User story: Checking for off-target qPCR primers
Primer BLAST - overview
https://BLAST.ncbi.nlm.nih.gov/Blast.cgi
https://www.ncbi.nlm.nih.gov/tools/primer-BLAST/index.cgi?LINK_LOC=BlastHome
User story: Checking for off-target qPCR primers
Primer BLAST - optional parameters
User story: Checking for off-target qPCR primers
Primer BLAST - output
User story: Analyze expression of your gene network
• Design qPCR primers
• Check primers for specificity, as was done for the lncRNA guide RNA
• Order primers!
User story: Checking for off-target qPCR primers
Learned so far
• PCR primers are consumed when they amplify a target.
• Off-target amplification will decrease the efficiency of on-target characterization for both SYBR and probe-based assays.
• Primer BLAST is a powerful tool to identify off-target regions that may be amplified.
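The idea of a primer pair amplifying a region, on-target or off-target, can be illustrated with a toy in-silico PCR. This sketch is not Primer BLAST's algorithm: it only finds exact matches on one template, whereas Primer BLAST also considers mismatched binding across a whole database. The function names, primer sequences, and template are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_amplicons(template, forward, reverse, max_size=1000):
    """Return (start, end, size) for products from exact primer matches.

    The forward primer must match the template as written; the reverse
    primer binds the opposite strand, so its reverse complement is
    searched for downstream of the forward site.
    """
    template = template.upper()
    fwd = forward.upper()
    rev_site = revcomp(reverse.upper())
    products = []
    start = template.find(fwd)
    while start != -1:
        site = template.find(rev_site, start + len(fwd))
        while site != -1:
            end = site + len(rev_site)
            if end - start <= max_size:
                products.append((start, end, end - start))
            site = template.find(rev_site, site + 1)
        start = template.find(fwd, start + 1)
    return products


if __name__ == "__main__":
    fwd = "GATTACAGATTACA"   # made-up forward primer
    rev = "CTTGAGGCATCTGG"   # made-up reverse primer
    template = "AAAA" + fwd + "T" * 10 + revcomp(rev) + "TTTT"
    print(find_amplicons(template, fwd, rev))
```

More than one product from a single primer pair is the situation the bullets above warn about: every extra product consumes primers and, in SYBR assays, also contributes signal.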
Summary: Covered tools
• Gene lookup: Gene database
• Gene symbol translation: bioDBnet
• FASTA sequence download: Gene database, Batch Entrez
• Single-sequence uniqueness: BLASTN
• Primer uniqueness: Primer BLAST
Conclusions
• NCBI provides a powerful suite of tools
• Checking for off-target hybridization, annealing, and amplification is important for genetic and genomic studies
• Proper use of the settings for each informatics tool improves results
• For questions about anything we discussed, email: [email protected]
Thanks
Todd Adamson, Nicola Brookman-Amissah, Sean McCall, Hans Packer, Maureen Young, Nick Downey, Elisabeth Wagner, Aurita Menezes, Yu Wang
Available products
Alt-R™ CRISPR-Cas9 System
• Cas9 protein, custom guide RNAs, and controls for genome editing
• https://www.idtdna.com/pages/products/genome-editing/crispr-cas9
PrimeTime® qPCR Assays
• Predesigned primers, probes, multiple formats
• https://www.idtdna.com/pages/products/gene-expression/primetime-qpcr-assays-and-primers