WSSP Chapter 8 BLASTX: Translated DNA vs. Protein Searches
atttaccgtg ttggattgaa attatcttgc atgagccagc tgatgagtat gatacagttt tccgtattaa taacgaacgg ccggaaatag gatcccgatc atgattgctt caatattttc acttcaatga ttggttctaa gcattcgaat gcgtacccgt ttgattaata tttccatttc tgtcccagtt tttaattttc atttcttttg gttaaaaaat tcccagtctc ttgaatgctt ttctaaaatc tttaattcaa ttatttatta gaatcttctg ttttgagaac tttgtaatgt aattaaataa tttgatgaaa tgattatgaa tgcgaataaa ttattaattt accgtgctga ttggattgaa attatcttgc atgagccagc tgatgagtat gatacagttt tccgtattaa taacgaacgg ccggaaatag gatcccgatc atgattgctt caatattttc acttcaatga ttggttctaa gcattcgaat gcgtacccgt ttgattaata tttccatttc tgtcccagtt tttaattttc atttcttttg gttaaaaaat tcccagtctc ttgaatgctt ttctaaaatc tttaattcaa ttatttatta gaatcttctg ttttgagaac tttgtaatgt aattaaataa tttgatgaaa tgattatgaa tgcgaataaa ttattaattt accgtgttgg attgaaggta attatcttgc atgagccagc tgatgagtat gatacagttt
8-3© 2014 WSSP
Why do a BLASTX if we have done a BLASTN?
BLASTN Match (16% Identity)
Query  AGG TCG TTA CTA TCG AGG AGT AGA
Sbjct  CGT AGC CTT TTG AGT CGA TCG CGG
Translated Match (100% Identity)
Query  R   S   L   L   S   R   S   R
       |   |   |   |   |   |   |   |
Sbjct  R   S   L   L   S   R   S   R
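The point of this slide can be checked with a short Python sketch. It assumes the standard genetic code (NCBI translation table 1); the names CODON_TABLE and translate are illustrative and are not part of BLAST. The two DNA sequences share few bases, yet both translate to the same peptide.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order
# (assumption: NCBI translation table 1, the nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame +1; spaces are ignored, a trailing partial codon is dropped."""
    dna = dna.replace(" ", "").upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


query = "AGG TCG TTA CTA TCG AGG AGT AGA"
sbjct = "CGT AGC CTT TTG AGT CGA TCG CGG"

print(translate(query))
print(translate(sbjct))
```

Both lines print the same peptide, which is why a protein-level search can find a match that a nucleotide-level search misses.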
Clicker Question #1: How many different DNA sequences can code for the peptide sequence Met-Leu-Cys-Ala?
Amino acid     3-letter  1-letter  Codons (RNA; U = T in DNA)
Alanine        Ala       A         GCA, GCC, GCG, GCU
Cysteine       Cys       C         UGC, UGU
Aspartic Acid  Asp       D         GAC, GAU
Glutamic Acid  Glu       E         GAA, GAG
Phenylalanine  Phe       F         UUC, UUU
Glycine        Gly       G         GGA, GGC, GGG, GGU
Histidine      His       H         CAC, CAU
Isoleucine     Ile       I         AUA, AUC, AUU
Lysine         Lys       K         AAA, AAG
Leucine        Leu       L         UUA, UUG, CUA, CUC, CUG, CUU
Methionine     Met       M         AUG
Asparagine     Asn       N         AAC, AAU
Proline        Pro       P         CCA, CCC, CCG, CCU
Glutamine      Gln       Q         CAA, CAG
Arginine       Arg       R         CGA, CGC, CGG, CGU, AGA, AGG
Serine         Ser       S         UCA, UCC, UCG, UCU, AGC, AGU
Threonine      Thr       T         ACA, ACC, ACG, ACU
Valine         Val       V         GUA, GUC, GUG, GUU
Tryptophan     Trp       W         UGG
Tyrosine       Tyr       Y         UAC, UAU
Stop Codons              .         UAA, UAG, UGA
A) 1
B) 12
C) 36
D) 48
E) 54
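One way to work this kind of question is to multiply the number of codons available for each amino acid in the peptide. The Python sketch below does that, assuming the standard genetic code and ignoring any stop codon; the name count_encodings is illustrative.

```python
from collections import Counter
from math import prod

# One-letter amino acids for all 64 codons of the standard genetic code
# (assumption: NCBI translation table 1; '*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons that encode each amino acid, e.g. L -> 6, W -> 1.
CODONS_PER_AA = Counter(AMINO)


def count_encodings(peptide):
    """Number of distinct coding sequences for a peptide given in one-letter code."""
    return prod(CODONS_PER_AA[aa] for aa in peptide.upper())


# Met-Leu-Cys-Ala in one-letter code.
print(count_encodings("MLCA"))
```

The count grows multiplicatively with peptide length, so even a short conserved region can be encoded by an enormous number of different DNA sequences.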
Number of codons for each amino acid in the conserved region of the protein:
3422242262422412263622666262232446622244246216
ITKNYPYYRTADKGWQNSIRHNLSLNRYFIKVPRSQEEPGKGSFWR
Amino acid     3-letter  1-letter  Codons (RNA; U = T in DNA)
Alanine        Ala       A         GCA, GCC, GCG, GCU
Cysteine       Cys       C         UGC, UGU
Aspartic Acid  Asp       D         GAC, GAU
Glutamic Acid  Glu       E         GAA, GAG
Phenylalanine  Phe       F         UUC, UUU
Glycine        Gly       G         GGA, GGC, GGG, GGU
Histidine      His       H         CAC, CAU
Isoleucine     Ile       I         AUA, AUC, AUU
Lysine         Lys       K         AAA, AAG
Leucine        Leu       L         UUA, UUG, CUA, CUC, CUG, CUU
Methionine     Met       M         AUG
Asparagine     Asn       N         AAC, AAU
Proline        Pro       P         CCA, CCC, CCG, CCU
Glutamine      Gln       Q         CAA, CAG
Arginine       Arg       R         CGA, CGC, CGG, CGU, AGA, AGG
Serine         Ser       S         UCA, UCC, UCG, UCU, AGC, AGU
Threonine      Thr       T         ACA, ACC, ACG, ACU
Valine         Val       V         GUA, GUC, GUG, GUU
Tryptophan     Trp       W         UGG
Tyrosine       Tyr       Y         UAC, UAU
Stop Codons              .         UAA, UAG, UGA
7.5 x 10^19
DSAP BLASTx Page
Cropped DNA sequence
NCBI BLASTx page
BLASTX Dialog Box
BLASTX of EX1.14
BLASTn and BLASTx of another Landoltia sequence
BLASTn
BLASTx
List of EX1.14 BLASTx matches
Best BLASTx alignment for EX1.14
Low Sequence Complexity Filter

With Filter
>gi|223542822|gb|EEF44358.1| conserved hypothetical protein [Ricinus communis]
Score = 69.7 bits (169), Expect = 3e-10
Identities = 54/174 (31%), Positives = 85/174 (48%), Gaps = 4/174 (2%)

Query  40   LTCLLILQAPSSHAFYLWppfffpspvpDVITVLNQANQFTTLVQLLTETGVATAVNAIS  219
            LT L++L +  + A     P   PS   +V  +L++  QFTT ++LLT T VAT +
Sbjct  9    LTALILLLSLQAQAQNPAAPAPAPSGPLNVTGILDKNGQFTTFIRLLTSTQVATQLEN-Q  67

Query  220  TNGAGPGITLFAPTDAAFAKIPAANLSALNVTQRTSILTLHALTRFYTFAELFVANAALP  399
             N    G T+FAPTD AF  + A  L+ L+  Q+  ++  H   +FYT + L +    +
Sbjct  68   LNSTTEGFTVFAPTDNAFNNLKAGTLNDLSTQQQVQLVLAHITPKFYTLSNLLLVPNPVR  127

Query  400  TLNT---GrsltfstsvtrvttitsPGGRVTTLNFLLYRRFPLTIFPIADVLLP  552
            T  T   G     + +        S G   T +N  + ++FPL ++ +  VLLP
Sbjct  128  TQATGQDGGVFGLNFTGQANQVNVSTGIVETQINNAIRQQFPLALYQVDKVLLP  181

Without Filter
>gi|223542822|gb|EEF44358.1| conserved hypothetical protein [Ricinus communis]
Score = 82.4 bits (202), Expect = 4e-14
Identities = 57/176 (32%), Positives = 89/176 (50%), Gaps = 8/176 (4%)
Frame = +1

Query  40   LTCLLILQAPSSHAFYLWPPFFFPSPVPDVITVLNQANQFTTLVQLLTETGVATAVNAIS  219
            LT L++L +  + A     P   PS   +V  +L++  QFTT ++LLT T VAT +
Sbjct  9    LTALILLLSLQAQAQNPAAPAPAPSGPLNVTGILDKNGQFTTFIRLLTSTQVATQLEN-Q  67

Query  220  TNGAGPGITLFAPTDAAFAKIPAANLSALNVTQRTSILTLHALTRFYTFAELFVANAALP  399
             N    G T+FAPTD AF  + A  L+ L+  Q+  ++  H   +FYT + L +    +
Sbjct  68   LNSTTEGFTVFAPTDNAFNNLKAGTLNDLSTQQQVQLVLAHITPKFYTLSNLLLVPNPVR  127

Query  400  TLNTGR-----SLTFSTSVTRVTTITSPGGRVTTLNFLLYRRFPLTIFPIADVLLP  552
            T  TG+      L F+    +V    S G   T +N  + ++FPL ++ +  VLLP
Sbjct  128  TQATGQDGGVFGLNFTGQANQVN--VSTGIVETQINNAIRQQFPLALYQVDKVLLP  181
Answer questions in DSAP
Question: Which of these alignments has a greater biological significance?
A)
B)
What can you conclude about this BLASTX result?
A) It is too short to be significant
B) It does not match anything
C) There is a frame shift in the DNA sequence
D) Your DNA has an exact match
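To see what a frame shift does to a translated search, the Python sketch below inserts one hypothetical extra base into a short coding sequence and translates the three forward frames. It assumes the standard genetic code; the sequence, the insertion point, and the helper names are illustrative only, and BLASTX itself also searches the three reverse frames.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order
# (assumption: NCBI translation table 1; '*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate a forward reading frame (+1, +2 or +3); partial codons are dropped."""
    start = frame - 1
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(start, len(dna) - 2, 3))


original = "AGGTCGTTACTATCGAGGAGTAGA"          # encodes RSLLSRSR in frame +1
shifted = original[:12] + "G" + original[12:]  # hypothetical extra base after bp 12

print("original +1", translate(original, 1))
for frame in (1, 2, 3):
    print(f"shifted  +{frame}", translate(shifted, frame))
```

In the shifted sequence the first half of the peptide (RSLL) is still read in frame +1, while the second half (SRSR) appears only in frame +2. A BLASTX report of a real sequence with this kind of error would show matches to the same protein in two different frames.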
Where is the frame shift most likely to be found?
A) bp 181
B) bp 75
C) bp 227
D) bp 381
E) Cannot tell from the data
Points at which an error can be introduced into the DNA sequence of the clone
Steps: DNA, RNA, cDNA, DS-cDNA, Cloning, Replication & Purification, Sequencing
Is the frame shift at bp 227 caused by a DNA sequencing error?
A) Yes
B) No
C) Cannot tell from the data
Does this have a frame shift?
Where?
What does this BLASTX report indicate?
A) There are matches to different proteins at the end of the sequence
B) There are matches in one frame to the entire sequence
C) There is a frame shift in the DNA sequence
D) The protein has two different domains
E) Cannot conclude anything
Where is the frame shift?
A) bp 149
B) bp 160
C) bp 458
D) bp 469
E) bp 493
Does this indicate that there is a frame shift in the sequence?
A) Yes
B) No
C) Cannot tell from the data
Figure labels: reading frames +1 and +3; intron
What is the most likely explanation for this result?
A) There is nothing wrong with the alignment.
B) There is an extra or missing base causing a frame shift.
C) There is an unspliced intron in the cDNA.
D) The query has an extra protein region.
E) Answers C or D