BLAST Basic Local Alignment Search Tool

Post on 22-Dec-2015


BLAST: the fishing rod

BLAST (Basic Local Alignment Search Tool) allows rapid comparison of a query sequence (nucleotides or amino acids; the bait on the rod) against a database (the great sea).

For a successful catch, choose the rod, the bait, and the body of water to suit the biological question.

Comparing the query sequence to known sequences in databases is fundamental to understanding the relatedness of any query sequence to other known proteins or DNA sequences.

Applications include:

• Identifying shared similarities with sequences already deposited in the databanks (orthologs and paralogs)
• Discovering new genes or proteins (ascertaining the existence of a putative ORF)
• Discovering variants of genes or proteins
• Identifying functional motifs shared with other proteins
• Investigating expressed sequence tags (ESTs)
• Exploring protein structure and function

Why use local alignment for database searches?

Local alignment is a useful approach to database searching because many query sequences have domains, active sites, or other motifs that have local, but not global, regions of similarity to other sequences.

BLAST

(1) For the query, find the list of high-scoring words of length w.

[Figure: query sequence of length L, divided into overlapping words of length w]

For each word from the query sequence, find the list of words that will score at least T when scored using a pair-score matrix (e.g. PAM 250, BLOSUM).
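Step (1) can be sketched in a few lines. This is only an illustration under stated assumptions: the four-letter alphabet and the `pair_score` function (+4 for identity, -1 otherwise) are made up to keep the output small and are not PAM or BLOSUM values; the names `query_words` and `neighborhood` are likewise hypothetical.

```python
from itertools import product

ALPHABET = "ACDE"  # toy alphabet (assumption), not the 20 amino acids


def pair_score(a: str, b: str) -> int:
    """Toy pair score: +4 for identity, -1 otherwise (NOT PAM or BLOSUM)."""
    return 4 if a == b else -1


def query_words(query: str, w: int) -> list[str]:
    """All overlapping words of length w in the query."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def neighborhood(word: str, threshold: int) -> list[str]:
    """Every word over ALPHABET that scores at least `threshold` against `word`."""
    hits = []
    for candidate in product(ALPHABET, repeat=len(word)):
        score = sum(pair_score(a, b) for a, b in zip(word, candidate))
        if score >= threshold:
            hits.append("".join(candidate))
    return hits


words = query_words("ACDEA", w=3)
# With T = 7, a word and its single-substitution variants pass (12 and 7 points).
word_list = {word: neighborhood(word, threshold=7) for word in words}
```

Raising T shrinks each neighborhood toward the exact word only, which speeds up the search at the cost of sensitivity.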

BLAST (cont.)

(2) Compare the word list to the database and identify exact matches.

[Figure: word list compared against a database sequence, marking exact matches of words from the word list]

(3) For each word match, extend the alignment in both directions to find alignments that score greater than a threshold value S: the maximal segment pairs (MSPs).
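Steps (2) and (3) might look roughly like the sketch below. It makes several simplifying assumptions: the seeds are exact matches to the query's own words (no neighborhood list), scoring is a toy +1/-1 for match/mismatch, the extension is ungapped, and it runs to the end of the sequences while remembering the best-scoring point instead of using a drop-off cutoff. The function names are hypothetical.

```python
MATCH, MISMATCH = 1, -1  # toy scores, assumed for illustration


def find_seeds(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a query word matches exactly."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend(query: str, subject: str, i: int, j: int, w: int) -> tuple[int, int, int, int]:
    """Ungapped extension of one seed; returns (score, q_start, s_start, length)."""

    def best_gain(qi: int, sj: int, step: int) -> tuple[int, int]:
        # Walk outward and remember the best cumulative score and its length.
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += MATCH if query[qi] == subject[sj] else MISMATCH
            n += 1
            if score > best:
                best, best_len = score, n
            qi += step
            sj += step
        return best, best_len

    right, right_len = best_gain(i + w, j + w, +1)
    left, left_len = best_gain(i - 1, j - 1, -1)
    total = w * MATCH + left + right
    return total, i - left_len, j - left_len, left_len + w + right_len


query, subject, w, S = "GATTACA", "TTGATTTCAG", 3, 4
seeds = find_seeds(query, subject, w)
# Keep only segment pairs scoring above the threshold S; a set merges
# seeds that extend to the same segment pair.
msps = {hit for hit in (extend(query, subject, i, j, w) for i, j in seeds) if hit[0] > S}
```

Here both seeds extend to the same segment pair (GATTACA against GATTTCA, six matches and one mismatch), so `msps` holds a single entry.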

BLAST is a heuristic algorithm

The full query sequence is not compared against the full length of every sequence in the database; instead, a partial, approximate search of the search space is performed.

Speed vs. sensitivity: does not find ALL best matches

False negatives

How do we evaluate the results we get?

Raw score S of the alignment is usually calculated by summing the scores for matches, mismatches, and gaps in the alignment.
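As a minimal sketch of that summation, assuming a nucleotide-style scheme with a flat per-column gap penalty (the values +2, -3, and -5 are illustrative assumptions, and real searches typically use separate gap-opening and gap-extension costs):

```python
def raw_score(aligned_query: str, aligned_subject: str,
              match: int = 2, mismatch: int = -3, gap: int = -5) -> int:
    """Sum per-column scores of a pairwise alignment; '-' marks a gap."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == "-" or s == "-":
            score += gap          # gap column
        elif q == s:
            score += match        # identical residues
        else:
            score += mismatch     # substitution
    return score


ungapped = raw_score("GATTACA", "GATTTCA")    # 6 matches, 1 mismatch
gapped = raw_score("GATT-ACA", "GATTTACA")    # 7 matches, 1 gap column
```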

Normalized score (bits): bit scores from different alignments, even those employing different scoring matrices, can be compared. The higher the score, the better the alignment, but the significance of an alignment cannot be deduced from the score alone.

E-value (Expectation value)

• An Expect value of 10 for a match means that, in a database of the current size, one might expect to see 10 matches with a similar or better score simply by chance.
• The E-value is the most commonly used threshold in database searches. Only hits with E-values smaller than the set threshold are reported in the output.
• Increasing the E-value threshold lets you see hits that are statistically insignificant but may still be biologically related.
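The relationship between raw score, bit score, and E-value can be sketched with the standard Karlin-Altschul formulas, S' = (λS − ln K) / ln 2 and E = m·n·2^(−S'). The slides do not give these formulas; they are added here as background. The λ and K values below are illustrative assumptions (they depend on the scoring matrix and gap costs), and the sketch uses plain query and database lengths where BLAST uses adjusted "effective" lengths.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalize a raw score S to bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def e_value(bits: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


LAMBDA, K = 0.267, 0.041   # illustrative statistical parameters (assumption)

bits = bit_score(raw=100, lam=LAMBDA, k=K)
expect = e_value(bits, query_len=250, db_len=1_000_000)

# Reporting rule from the slide: keep hits whose E-value is below the threshold.
hits = [("hit_a", 1e-30), ("hit_b", 0.5), ("hit_c", 12.0)]   # made-up example hits
reported = [name for name, e in hits if e < 10.0]
```

Because E scales with the size of the search space, the same alignment gets a larger E-value in a larger database, which is why the slide ties the interpretation to "a database of the current size".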

To evaluate the alignment

• Examine statistical parameters
    - Normalized score
    - E-value
    - % identity
    - % similarity
    - % gaps
• Examine the alignment itself
• Use biological common sense

Don't rely only on statistical significance.

Can't see the forest for the trees

With too many repeats of the same highly significant sequences, we fail to see sequences with lower similarity that may also be interesting.

What can we do if there are too many matches?

• Limit the DB
• Limit the organism
• Filter reported entries by keyword
• (Limit to a specific domain)
• Change the matrix and/or gap penalties
• Change the E-value
• Add a filter for low complexity

Enumerating the different options

What can we do if there are hardly any matches?

• Check the choice of DB
• Check the choice of organism
• Remove the filter for low complexity
• Change the matrix or gap penalties
• Increase the E-value
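For a local installation, several of these adjustments correspond to command-line options. The sketch below only builds an argument list and does not run anything. It assumes the NCBI BLAST+ `blastp` program and its `-query`, `-db`, `-evalue`, `-seg`, `-matrix`, and `-taxids` options as I understand them (check `blastp -help` for your version); the helper name, file name, and database name are hypothetical.

```python
from typing import Optional, Sequence


def blastp_command(query_path: str, db: str, evalue: float = 10.0,
                   low_complexity_filter: bool = False,
                   matrix: str = "BLOSUM62",
                   taxids: Optional[Sequence[int]] = None) -> list[str]:
    """Build a blastp argument list reflecting the tuning options above."""
    cmd = [
        "blastp",
        "-query", query_path,
        "-db", db,                                        # choice of DB
        "-evalue", str(evalue),                           # E-value threshold
        "-seg", "yes" if low_complexity_filter else "no", # low-complexity filter
        "-matrix", matrix,                                # scoring matrix
    ]
    if taxids:
        # Limit the organism (assumes a database built with taxonomy information).
        cmd += ["-taxids", ",".join(str(t) for t in taxids)]
    return cmd


# Too many matches: stricter threshold, filter on, restrict to one organism.
strict = blastp_command("query.fasta", "my_proteins", evalue=1e-5,
                        low_complexity_filter=True, taxids=[9606])
# Hardly any matches: looser threshold, filter off, matrix for more distant relatives.
loose = blastp_command("query.fasta", "my_proteins", evalue=100.0, matrix="BLOSUM45")
```

The resulting list could be passed to `subprocess.run` once the program and database are actually available.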

DNA vs. protein searches

If we have a nucleotide sequence, should we search the DNA databases only? Or should we translate it to protein and search protein databases?

Translating causes loss of information, but protein sequence is more conserved than DNA sequence.

It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology.
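Translating a nucleotide query means considering all six reading frames, three on each strand. A minimal sketch, assuming the standard genetic code and an unambiguous A/C/G/T sequence (any codon containing another character is rendered as "X"); the function names are hypothetical:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna: str) -> str:
    """Reverse-complement an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict[str, str]:
    """Translate frames +1..+3 (forward strand) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frames("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
```

Each of the six translations can then be compared against a protein database, which is what the translated search programs do internally.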

Why use a nucleotide sequence after all?

[Diagram: query (DNA or protein) versus database (DNA or protein)]

• No ORF was found
• No similar protein sequences were found
• Specific DNA databases are available (EST)
• To find duplicated genes in a genome
• To find pseudogenes
• To find the location of non-protein-coding genes in the genome (siRNA etc.)

BLAST flavors

• BLASTn: nt versus nt database
• BLASTp: protein versus protein database
• BLASTx: translated nt (6 frames) versus protein database
• tBLASTn: protein versus translated nt database (6 frames)
• tBLASTx: translated nt versus translated nt database (both 6 frames)

[Diagram: query (DNA or protein) versus DB (DNA or protein)]
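The five flavors can be summarized as a lookup from query type and database type to the programs that accept that combination. This is just a restatement of the list above as a small table; the names `FLAVORS` and `programs_for` are hypothetical.

```python
# program -> (query type, database type, what gets translated)
FLAVORS = {
    "blastn":  ("nucleotide", "nucleotide", "nothing"),
    "blastp":  ("protein",    "protein",    "nothing"),
    "blastx":  ("nucleotide", "protein",    "query, 6 frames"),
    "tblastn": ("protein",    "nucleotide", "database, 6 frames"),
    "tblastx": ("nucleotide", "nucleotide", "query and database, 6 frames each"),
}


def programs_for(query_type: str, db_type: str) -> list[str]:
    """List the BLAST programs that take this query/database combination."""
    return [name for name, (q, d, _) in FLAVORS.items()
            if q == query_type and d == db_type]


dna_vs_protein = programs_for("nucleotide", "protein")
dna_vs_dna = programs_for("nucleotide", "nucleotide")
```

Only the DNA-versus-DNA combination offers a choice: a direct nucleotide comparison or a comparison at the protein level through translation of both sides.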

Uses of BLAST programs

BLASTx: compares a nucleotide query sequence, translated in all reading frames, against a protein sequence database.

DNA → protein

If you have a DNA sequence and want to know what protein (if any) it encodes, you can perform a BLASTx search.

tBLASTn

tBLASTn: compares a protein query sequence against a nucleotide sequence database that is translated in all reading frames.

Protein → DNA

You can use this program to ask whether a DNA or EST database contains a nucleotide sequence encoding a protein that matches your protein of interest.

tBLASTx

tBLASTx: translates the DNA query and compares it to a database of DNA sequences, with both translated in all reading frames.

DNA → DNA

(The nr db cannot be used because it's too large.)

Used to determine whether an entire DNA database contains genes that encode proteins similar to your query (if BLASTx or tBLASTn fail).

E-value


BLAST החכה

BLAST (Basic Local Alignment Search Tool)allows rapid sequence comparison of a query sequence פיתיון בחכהה (nucleotides or amino acids)רצף שאילתא]] against a database להים הגדו

לצורך דיג מוצלח

יש לבחור חכה פיתיון ומקווה מים בהתאם לשאלה הביולוגית

Comparing the query sequence to known sequences in databases is fundamental to understanding the relatedness of any query sequence to other known proteins or DNA sequences

Applications include

bull Identifying shared similarities with sequences already

deposited in the databanks (orthologs and paralogs)

bull Discovering new genes or proteins (ascertaining

existence of a putative ORF)

bull Discovering variants of genes or proteins

bullIdentifying functional motifs shared with other proteins

bull Investigating expressed sequence tags (ESTs)

bull Exploring protein structure and function

Why use local alignment for database searches

Local alignment is a useful approach to

DB searching because many query

sequences have domains active sites or

other motifs that have local but not

global regions of similarity to other sequences

BLAST(1) for the query find the list of high scoring words of length w

Query Sequence of length L

For each word from the query sequence find the list of words that will score at least T when scored using a pair-score matrix(eg PAM 250 BLOSUM)

BLAST (cont)(2) Compare the word list to the database and identify exact matches

WordList

Exact matches of words from word lists

databasesequence

(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S

maximal segment pairs (MSPs)

Blast is a heuristic algorythm

לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר

אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב

Speed vs sensitivityDoes not find ALL best matches

False negativesכיצד נעריך את הממצאים המתקבלים

Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment

Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone

E-value (Expectation value)bull Expect value of 10 for a match means in a

database of current size one might expect to see 10 matches with a similar or better score simply by chance alone

bull E-value is the most commonly used threshold in

database searches Only those hits with E-values

smaller than the set threshold will be reported in

the output

bull Increasing the E-value enables you to see

biologically related sequences but statistically

insignificant

To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance

מרוב עצים לא רואים את היער

יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר

שעשויים אף הם להיות מעניינים

What can we do if there are too many matches

bullLimit DB

bullLimit organism

bullFilter reported entries by keyword

bull(Limit to a specific domain)

bullChange matrix andor gap penalties

bullChange E-value

bullAdd filter for low complexity

ספירת האפשרויות השונות

What can we do if there are hardly

any matches

bullCheck choice of DB

bullCheck choice of organism

bullRemove filter for low complexity

bullChange matrix or gap penalties

bullIncrease E-value

DNA vs Protein searchesIf we have a nucleotide sequence should we search the

DNA databases only Or should we translate it to protein and search protein databases

Translating causes loss of information but protein sequence is more conserved than DNA sequence

It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology

Query DNA Protein

Database DNA Protein

No ORF found No similar protein sequences were found Specific DNA databases are available (EST)

To find duplicated genes in a genome

To find pseudogenes

To find the location of non-protein coding genes

in the genome (siRNA etc)

Why use a nucleotide sequence after all

Blast flavors

BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)

Query DNA Protein

DB DNA Protein

Uses of BLAST programs

BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db

DNA protein

If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search

tBLASTn

tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames

Protein DNA

You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest

tBLASTx

tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames

DNA DNA

(nr db cannot be used because itrsquos too large)

Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)

E-value

  • Slide 23
  • Slide 24
  • Slide 25

Comparing the query sequence to known sequences in databases is fundamental to understanding the relatedness of any query sequence to other known proteins or DNA sequences

Applications include

bull Identifying shared similarities with sequences already

deposited in the databanks (orthologs and paralogs)

bull Discovering new genes or proteins (ascertaining

existence of a putative ORF)

bull Discovering variants of genes or proteins

bullIdentifying functional motifs shared with other proteins

bull Investigating expressed sequence tags (ESTs)

bull Exploring protein structure and function

Why use local alignment for database searches

Local alignment is a useful approach to

DB searching because many query

sequences have domains active sites or

other motifs that have local but not

global regions of similarity to other sequences

BLAST(1) for the query find the list of high scoring words of length w

Query Sequence of length L

For each word from the query sequence find the list of words that will score at least T when scored using a pair-score matrix(eg PAM 250 BLOSUM)

BLAST (cont)(2) Compare the word list to the database and identify exact matches

WordList

Exact matches of words from word lists

databasesequence

(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S

maximal segment pairs (MSPs)

Blast is a heuristic algorythm

לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר

אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב

Speed vs sensitivityDoes not find ALL best matches

False negativesכיצד נעריך את הממצאים המתקבלים

Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment

Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone

E-value (Expectation value)bull Expect value of 10 for a match means in a

database of current size one might expect to see 10 matches with a similar or better score simply by chance alone

bull E-value is the most commonly used threshold in

database searches Only those hits with E-values

smaller than the set threshold will be reported in

the output

bull Increasing the E-value enables you to see

biologically related sequences but statistically

insignificant

To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance

מרוב עצים לא רואים את היער

יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר

שעשויים אף הם להיות מעניינים

What can we do if there are too many matches

bullLimit DB

bullLimit organism

bullFilter reported entries by keyword

bull(Limit to a specific domain)

bullChange matrix andor gap penalties

bullChange E-value

bullAdd filter for low complexity

ספירת האפשרויות השונות

What can we do if there are hardly

any matches

bullCheck choice of DB

bullCheck choice of organism

bullRemove filter for low complexity

bullChange matrix or gap penalties

bullIncrease E-value

DNA vs Protein searchesIf we have a nucleotide sequence should we search the

DNA databases only Or should we translate it to protein and search protein databases

Translating causes loss of information but protein sequence is more conserved than DNA sequence

It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology

Query DNA Protein

Database DNA Protein

No ORF found No similar protein sequences were found Specific DNA databases are available (EST)

To find duplicated genes in a genome

To find pseudogenes

To find the location of non-protein coding genes

in the genome (siRNA etc)

Why use a nucleotide sequence after all

Blast flavors

BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)

Query DNA Protein

DB DNA Protein

Uses of BLAST programs

BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db

DNA protein

If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search

tBLASTn

tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames

Protein DNA

You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest

tBLASTx

tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames

DNA DNA

(nr db cannot be used because itrsquos too large)

Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)

E-value

  • Slide 23
  • Slide 24
  • Slide 25

Why use local alignment for database searches

Local alignment is a useful approach to

DB searching because many query

sequences have domains active sites or

other motifs that have local but not

global regions of similarity to other sequences

BLAST(1) for the query find the list of high scoring words of length w

Query Sequence of length L

For each word from the query sequence find the list of words that will score at least T when scored using a pair-score matrix(eg PAM 250 BLOSUM)

BLAST (cont)(2) Compare the word list to the database and identify exact matches

WordList

Exact matches of words from word lists

databasesequence

(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S

maximal segment pairs (MSPs)

Blast is a heuristic algorythm

לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר

אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב

Speed vs sensitivityDoes not find ALL best matches

False negativesכיצד נעריך את הממצאים המתקבלים

Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment

Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone

E-value (Expectation value)bull Expect value of 10 for a match means in a

database of current size one might expect to see 10 matches with a similar or better score simply by chance alone

bull E-value is the most commonly used threshold in

database searches Only those hits with E-values

smaller than the set threshold will be reported in

the output

bull Increasing the E-value enables you to see

biologically related sequences but statistically

insignificant

To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance

מרוב עצים לא רואים את היער

יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר

שעשויים אף הם להיות מעניינים

What can we do if there are too many matches

bullLimit DB

bullLimit organism

bullFilter reported entries by keyword

bull(Limit to a specific domain)

bullChange matrix andor gap penalties

bullChange E-value

bullAdd filter for low complexity

ספירת האפשרויות השונות

What can we do if there are hardly

any matches

bullCheck choice of DB

bullCheck choice of organism

bullRemove filter for low complexity

bullChange matrix or gap penalties

bullIncrease E-value

DNA vs Protein searchesIf we have a nucleotide sequence should we search the

DNA databases only Or should we translate it to protein and search protein databases

Translating causes loss of information but protein sequence is more conserved than DNA sequence

It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology

Query DNA Protein

Database DNA Protein

No ORF found No similar protein sequences were found Specific DNA databases are available (EST)

To find duplicated genes in a genome

To find pseudogenes

To find the location of non-protein coding genes

in the genome (siRNA etc)

Why use a nucleotide sequence after all

Blast flavors

BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)

Query DNA Protein

DB DNA Protein

Uses of BLAST programs

BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db

DNA protein

If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search

tBLASTn

tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames

Protein DNA

You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest

tBLASTx

tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames

DNA DNA

(nr db cannot be used because itrsquos too large)

Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)

E-value

  • Slide 23
  • Slide 24
  • Slide 25

BLAST(1) for the query find the list of high scoring words of length w

Query Sequence of length L

For each word from the query sequence find the list of words that will score at least T when scored using a pair-score matrix(eg PAM 250 BLOSUM)

BLAST (cont)(2) Compare the word list to the database and identify exact matches

WordList

Exact matches of words from word lists

databasesequence

(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S

maximal segment pairs (MSPs)

Blast is a heuristic algorythm

לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר

אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב

Speed vs sensitivityDoes not find ALL best matches

False negativesכיצד נעריך את הממצאים המתקבלים

Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment

Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone

E-value (Expectation value)bull Expect value of 10 for a match means in a

database of current size one might expect to see 10 matches with a similar or better score simply by chance alone

bull E-value is the most commonly used threshold in

database searches Only those hits with E-values

smaller than the set threshold will be reported in

the output

bull Increasing the E-value enables you to see

biologically related sequences but statistically

insignificant

To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance

מרוב עצים לא רואים את היער

יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר

שעשויים אף הם להיות מעניינים

What can we do if there are too many matches

bullLimit DB

bullLimit organism

bullFilter reported entries by keyword

bull(Limit to a specific domain)

bullChange matrix andor gap penalties

bullChange E-value

bullAdd filter for low complexity

ספירת האפשרויות השונות

What can we do if there are hardly

any matches

bullCheck choice of DB

bullCheck choice of organism

bullRemove filter for low complexity

bullChange matrix or gap penalties

bullIncrease E-value

DNA vs Protein searchesIf we have a nucleotide sequence should we search the

DNA databases only Or should we translate it to protein and search protein databases

Translating causes loss of information but protein sequence is more conserved than DNA sequence

It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology

Query DNA Protein

Database DNA Protein

No ORF found No similar protein sequences were found Specific DNA databases are available (EST)

To find duplicated genes in a genome

To find pseudogenes

To find the location of non-protein coding genes

in the genome (siRNA etc)

Why use a nucleotide sequence after all

Blast flavors

BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)

Query DNA Protein

DB DNA Protein

Uses of BLAST programs

BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db

DNA protein

If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search

tBLASTn

tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames

Protein DNA

You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest

tBLASTx

tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames

DNA DNA

(nr db cannot be used because itrsquos too large)

Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)

E-value

  • Slide 23
  • Slide 24
  • Slide 25

BLAST (cont)(2) Compare the word list to the database and identify exact matches

WordList

Exact matches of words from word lists

databasesequence

(3) For each word match extend the alignment in both directions to find alignments that score greater than a threshold of value S

maximal segment pairs (MSPs)

Blast is a heuristic algorythm

לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר

אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב

Speed vs sensitivityDoes not find ALL best matches

False negativesכיצד נעריך את הממצאים המתקבלים

Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment

Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone

E-value (Expectation value)bull Expect value of 10 for a match means in a

database of current size one might expect to see 10 matches with a similar or better score simply by chance alone

bull E-value is the most commonly used threshold in

database searches Only those hits with E-values

smaller than the set threshold will be reported in

the output

bull Increasing the E-value enables you to see

biologically related sequences but statistically

insignificant

To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance

מרוב עצים לא רואים את היער

יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר

שעשויים אף הם להיות מעניינים

What can we do if there are too many matches

bullLimit DB

bullLimit organism

bullFilter reported entries by keyword

bull(Limit to a specific domain)

bullChange matrix andor gap penalties

bullChange E-value

bullAdd filter for low complexity

ספירת האפשרויות השונות

What can we do if there are hardly

any matches

bullCheck choice of DB

bullCheck choice of organism

bullRemove filter for low complexity

bullChange matrix or gap penalties

bullIncrease E-value

DNA vs Protein searchesIf we have a nucleotide sequence should we search the

DNA databases only Or should we translate it to protein and search protein databases

Translating causes loss of information but protein sequence is more conserved than DNA sequence

It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology

Query DNA Protein

Database DNA Protein

No ORF found No similar protein sequences were found Specific DNA databases are available (EST)

To find duplicated genes in a genome

To find pseudogenes

To find the location of non-protein coding genes

in the genome (siRNA etc)

Why use a nucleotide sequence after all

Blast flavors

BlastN - nt versus nt database BlastP - protein versus protein database BlastX - translated nt (6 frames) versus protein database tBlastN - protein versus translated nt database (6 frames) tBlastX - translated nt versus translated nt database (both 6 frames)

Query DNA Protein

DB DNA Protein

Uses of BLAST programs

BLASTx ndash compares a nucleotide query seq translated in all reading frames against a prot seq db

DNA protein

If you have a DNA seq and you want to now what protein (if any) it encodes you can perform BLASTx search

tBLASTn

tBLASTn ndash compares a protein query seq against a nucleotide seq db which is translated in all reading frames

Protein DNA

You can use this program to ask whether a DNA or ESTs db contains a nuc seq encoding a protein that matches your protein of interest

tBLASTx

tBLASTx ndash translates DNA from query and compares it to db of DNA seqs all translated to all reading frames

DNA DNA

(nr db cannot be used because itrsquos too large)

Used to determine whether an entire DNA db contains genes that encodes proteins similar to your query (If blastx or tblastn fail)

E-value

  • Slide 23
  • Slide 24
  • Slide 25

Blast is a heuristic algorythm

לא משווים את מלוא רצף השאילתאלמלוא האורך של כא מן הרצפים במאגר

אלא מבצעים חיפוש (מרחב החיפוש)חלקי עס קירוב

Speed vs sensitivityDoes not find ALL best matches

False negativesכיצד נעריך את הממצאים המתקבלים

Raw score S of the alignment is usuallycalculated by summing the scores formatches mismatches and gaps in thealignment

Normalized score (bits) - bit scores fromdifferent alignments even those employingdifferent scoring matrices can be comparedThe higher the score the better the alignmentbut the significance of an alignment can notbe deduced from the score alone

E-value (Expectation value)bull Expect value of 10 for a match means in a

database of current size one might expect to see 10 matches with a similar or better score simply by chance alone

bull E-value is the most commonly used threshold in

database searches Only those hits with E-values

smaller than the set threshold will be reported in

the output

bull Increasing the E-value enables you to see

biologically related sequences but statistically

insignificant

To evaluate the alignmentbull Examine statistical parameters1048707Normalized score1048707E value1048707 identity1048707 similarity1048707 gapsbull Examine the alignment itselfbull Use biological common senseDonrsquot rely only on statistical significance

מרוב עצים לא רואים את היער

יותר מידי חזרות על אותם רצפים בעלי מובהקות גבוהה לא רואים רצפים בעלי דמיון נמוך יותר

שעשויים אף הם להיות מעניינים

What can we do if there are too many matches

bullLimit DB

bullLimit organism

bullFilter reported entries by keyword

bull(Limit to a specific domain)

bullChange matrix andor gap penalties

bullChange E-value

bullAdd filter for low complexity

ספירת האפשרויות השונות

What can we do if there are hardly

any matches

bullCheck choice of DB

bullCheck choice of organism

bullRemove filter for low complexity

bullChange matrix or gap penalties

bullIncrease E-value

DNA vs Protein searchesIf we have a nucleotide sequence should we search the

DNA databases only Or should we translate it to protein and search protein databases

Translating causes loss of information but protein sequence is more conserved than DNA sequence

It is therefore advisable to translate a nucleotide sequence to protein and search protein databases for homology

Query DNA Protein

Database DNA Protein

Why use a nucleotide sequence after all?

• No ORF found

• No similar protein sequences were found

• Specific DNA databases are available (EST)

• To find duplicated genes in a genome

• To find pseudogenes

• To find the location of non-protein-coding genes in the genome (siRNA etc.)

Blast flavors

• BlastN - nt versus nt database

• BlastP - protein versus protein database

• BlastX - translated nt (6 frames) versus protein database

• tBlastN - protein versus translated nt database (6 frames)

• tBlastX - translated nt versus translated nt database (both 6 frames)
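The "6 frames" used by the translated searches are the three reading frames of the given strand plus the three of its reverse complement. A minimal sketch of six-frame translation follows, assuming the standard genetic code and an unambiguous A/C/G/T sequence. The function names are illustrative, and codons containing other characters are shown as "X".

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for codon positions 1, 2, 3.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Translate frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for name, protein in six_frame_translation("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six translations is then compared at the protein level, which is why translated searches cost more than a plain BlastN or BlastP run.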

[Diagram: query (DNA or protein) compared against DB (DNA or protein)]

Uses of BLAST programs

BLASTx – compares a nucleotide query seq, translated in all reading frames, against a protein seq db

DNA → protein

If you have a DNA seq and you want to know what protein (if any) it encodes, you can perform a BLASTx search

tBLASTn

tBLASTn – compares a protein query seq against a nucleotide seq db that is translated in all reading frames

Protein → DNA

You can use this program to ask whether a DNA or EST db contains a nucleotide seq encoding a protein that matches your protein of interest

tBLASTx

tBLASTx – translates the DNA query and compares it to a db of DNA seqs, all translated in all reading frames

DNA → DNA

(The nr db cannot be used because it's too large)

Used to determine whether an entire DNA db contains genes that encode proteins similar to your query (if blastx or tblastn fail)
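The choice among the five programs described above depends only on the type of the query, the type of the database, and whether the comparison is made at the protein level through translation. As a small illustrative sketch (the table and function names are hypothetical, not part of any BLAST package):

```python
# Keys: (query type, database type, compared via translation?)
PROGRAMS = {
    ("dna", "dna", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("dna", "protein", True): "blastx",
    ("protein", "dna", True): "tblastn",
    ("dna", "dna", True): "tblastx",
}


def choose_blast_program(query_type, db_type, translated=None):
    """Pick a BLAST program; by default translate only when the types differ."""
    if translated is None:
        translated = query_type != db_type
    try:
        return PROGRAMS[(query_type, db_type, translated)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for query={query_type!r}, db={db_type!r}, "
            f"translated={translated!r}"
        ) from None


print(choose_blast_program("dna", "protein"))                # blastx
print(choose_blast_program("protein", "dna"))                # tblastn
print(choose_blast_program("dna", "dna", translated=True))   # tblastx
```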


Raw score S of the alignment is usually calculated by summing the scores for matches, mismatches and gaps in the alignment.
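The summation can be sketched for a gapped nucleotide alignment. The reward, penalty and gap costs below are assumed illustrative values, the function name is hypothetical, and a gap of length k is charged gap_open + k × gap_extend. Protein alignments would take the match and mismatch scores from a substitution matrix instead.

```python
def raw_score(aligned_query, aligned_subject, match=2, mismatch=-3,
              gap_open=5, gap_extend=2):
    """Sum match, mismatch and affine gap scores over aligned columns."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_side = None  # which sequence carried a gap in the previous column
    for a, b in zip(aligned_query, aligned_subject):
        side = "query" if a == "-" else "subject" if b == "-" else None
        if side is None:
            score += match if a == b else mismatch
        else:
            if side != gap_side:
                score -= gap_open   # charged once when a new gap starts
            score -= gap_extend     # charged for every gap column
        gap_side = side
    return score


# 7 matches (+14), 1 mismatch (-3), one gap of length 2 (-(5 + 2*2) = -9)
print(raw_score("ACGTAC--GT", "ACCTACTTGT"))  # 2
```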
