Searching Molecular Databases with BLAST

Upload: daniel-sanders

Post on 02-Jan-2016

Page 1: Searching Molecular Databases with BLAST. Basic Local Alignment Search Tool How BLAST works Interpreting search results The NCBI Web BLAST interface Demonstration

Searching Molecular Databases with BLAST

• Basic Local Alignment Search Tool

• How BLAST works

• Interpreting search results

• The NCBI Web BLAST interface

• Demonstration and exercises

Why learn sequence database searching?

• What have I cloned?

• Is this really “my gene”?

• Has someone else already found it?

• What is this protein’s function?

• What is it related to?

• Can I get more sequence easily?

Search programs are sequence alignment programs

• They try to find the best alignment between your probe sequence and every target sequence in the database

• Finding optimal alignments is computationally very resource-intensive

• It is usually not necessary to find optimal alignments, particularly for large databases

• Alignments are ranked and only top scores are reported
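To make the cost concrete, here is a minimal Python sketch of exhaustive searching with Smith-Waterman local alignment. It illustrates the idea only and is not BLAST: the scores (+2 match, -1 mismatch, -2 per gap position) and the function names `smith_waterman_score` and `search_database` are hypothetical, and a linear gap penalty stands in for the affine gap costs real tools use. Every query/target cell is filled, so the work grows with the product of the two lengths, and again with the number of database entries.

```python
def smith_waterman_score(query: str, target: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score by Smith-Waterman dynamic programming.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    # Only the previous row is stored, but every (i, j) cell is still computed,
    # so run time is proportional to len(query) * len(target).
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            up = prev[j] + gap          # query letter aligned to a gap
            left = curr[j - 1] + gap    # target letter aligned to a gap
            # Clamping at zero is what makes the alignment local.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query: str, database: dict[str, str], top: int = 3) -> list[tuple[str, int]]:
    """Align the query against every entry, rank by score, report only the top hits."""
    scores = [(name, smith_waterman_score(query, seq)) for name, seq in database.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]
```

For example, `search_database("ACGT", {"s1": "ACGT", "s2": "TTTT", "s3": "ACGA"}, top=2)` ranks `s1` (score 8) ahead of `s3` (score 6), mirroring how only the top scores are reported.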

Practical database search methods incorporate shortcuts

• The fastest sequence database searching programs use heuristic algorithms

• The basic concept is to break the search and alignment process down into several steps

• At each step, only a best scoring subset is retained for further analysis
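BLAST starts by looking for short exact "words" shared by the query and a database sequence, and only sequences with such seeds are examined further. The sketch below shows that staged filtering in its simplest form. The word length `k`, the `min_seeds` cutoff, and the function names are illustrative assumptions; BLAST's real seeding (for example, scored neighborhood words in protein searches) is more elaborate.

```python
from collections import defaultdict


def build_word_index(query: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word of the query to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return dict(index)


def find_seeds(query: str, target: str, k: int) -> list[tuple[int, int]]:
    """Step 1: (query_pos, target_pos) pairs that share an exact k-letter word."""
    index = build_word_index(query, k)
    seeds = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            seeds.append((i, j))
    return seeds


def candidate_targets(query: str, database: dict[str, str], k: int,
                      min_seeds: int = 1) -> list[str]:
    """Step 2: keep only entries with enough seeds to justify costlier alignment."""
    return [name for name, seq in database.items()
            if len(find_seeds(query, seq, k)) >= min_seeds]
```

With `k=4`, `candidate_targets("ACGTACGT", {"hit": "TTACGTAA", "miss": "GGGGGGGG"}, 4)` keeps only `"hit"`, so the expensive alignment step never runs on `"miss"`.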

What does ‘HEURISTIC’ mean?

• “a commonsense rule (or set of rules) intended to increase the probability of solving some problem”

• Why consider every possible alignment once a reasonably good alignment is found?

Heuristic programs find approximate alignments

• They are less sensitive than “dynamic programming” algorithms such as Smith-Waterman for detecting weak similarity

• In practice, they run much faster and are usually adequate

• The BLAST program developed by Stephen Altschul and coworkers at the NCBI is the most widely used heuristic program
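Once a seed is found, BLAST extends it and abandons the extension when the score drops too far below the best value seen so far (an "X-drop" rule). The following is a simplified, ungapped version of that idea. The +1/-3 scores, the `x_drop` value of 5, and the name `extend_seed` are hypothetical, and real BLAST also performs gapped extension. This is where the speed/sensitivity trade-off shows: a weak similarity that dips below the cutoff is never explored.

```python
def extend_seed(query: str, target: str, qpos: int, tpos: int, k: int,
                match: int = 1, mismatch: int = -3, x_drop: int = 5) -> tuple[int, int, int]:
    """Extend an exact k-letter seed left and right without gaps.

    Each direction stops once the running score falls more than x_drop below
    the best score reached in that direction. Returns (score, query_start, query_end).
    """
    def pair(a: str, b: str) -> int:
        return match if a == b else mismatch

    seed_score = sum(pair(query[qpos + d], target[tpos + d]) for d in range(k))

    # Extend to the right of the seed, remembering the best stopping point.
    best_right, running, right_len, d = 0, 0, 0, 0
    while qpos + k + d < len(query) and tpos + k + d < len(target):
        running += pair(query[qpos + k + d], target[tpos + k + d])
        d += 1
        if running > best_right:
            best_right, right_len = running, d
        elif best_right - running > x_drop:
            break

    # Extend to the left of the seed in the same way.
    best_left, running, left_len, d = 0, 0, 0, 0
    while qpos - 1 - d >= 0 and tpos - 1 - d >= 0:
        running += pair(query[qpos - 1 - d], target[tpos - 1 - d])
        d += 1
        if running > best_left:
            best_left, left_len = running, d
        elif best_left - running > x_drop:
            break

    score = seed_score + best_left + best_right
    return score, qpos - left_len, qpos + k + right_len
```

`extend_seed("GGGACGTACGTTTT", "CCCACGTACGTAAA", 3, 3, 4)` returns `(8, 3, 11)`: the 4-letter seed grows into the 8-letter exact match `ACGTACGT` and stops where the mismatches begin.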

BLAST is a collection of five programs for different combinations of query and database sequences

Program    Query            Database
BLASTN     DNA              DNA
BLASTP     protein          protein
BLASTX     translated DNA   protein
TBLASTN    protein          translated DNA
TBLASTX    translated DNA   translated DNA
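The program table amounts to a simple lookup. A small Python helper (hypothetical name `choose_program`) that maps molecule types to a program name might look like this; only nucleotide-versus-nucleotide searches need the extra choice between comparing DNA directly (BLASTN) and comparing translations (TBLASTX).

```python
def choose_program(query_type: str, database_type: str, translate_both: bool = False) -> str:
    """Pick a BLAST program from the query and database molecule types.

    translate_both only matters for nucleotide-vs-nucleotide searches, where it
    selects tblastx (both sides translated) instead of blastn.
    """
    allowed = {"nucleotide", "protein"}
    if query_type not in allowed or database_type not in allowed:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if query_type == "nucleotide" and database_type == "nucleotide":
        return "tblastx" if translate_both else "blastn"
    if query_type == "protein" and database_type == "protein":
        return "blastp"
    if query_type == "nucleotide":
        return "blastx"   # translated DNA query vs protein database
    return "tblastn"      # protein query vs translated DNA database
```

For example, `choose_program("nucleotide", "protein")` returns `"blastx"`.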

Why BLAST is great

• Very fast and can be used to search extremely large databases

• Sufficiently sensitive and selective for most purposes

• Robust - the default parameters can usually be used

BLAST scores are reported in two columns

• Raw values based on the specific scoring matrix employed

• As bit scores, which are normalized, matrix-independent values (bigger = better)

• Significance is represented by E values (smaller = better)
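Bit scores and E values are tied to the raw score by Karlin-Altschul statistics. In one common form, S' = (λS − ln K) / ln 2 and E = m·n·2^(−S'), where m and n are the query and database lengths and λ and K depend on the scoring matrix and gap costs. The Python sketch below only evaluates those two formulas; the λ and K in the check are placeholders rather than real BLAST parameters, the function names are hypothetical, and BLAST itself uses length-corrected (effective) values of m and n.

```python
import math


def bit_score(raw_score: float, lam: float, k_param: float) -> float:
    """Normalize a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k_param)) / math.log(2)


def e_value(bits: float, query_length: int, database_length: int) -> float:
    """Expected number of chance hits scoring at least this well: E = m * n * 2^(-S')."""
    return query_length * database_length * 2.0 ** (-bits)
```

Because E scales with m·n, the same bit score gives a larger E value in a bigger database, so a hit can look less significant as the database grows.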

Typical BLAST Output Sorted by E value

The EXPECT (E) threshold is used to control score reporting

• A match will only be reported if its E value falls below the threshold set

• The default threshold is E = 10: a match is reported as long as no more than 10 matches scoring at least as well would be expected by chance in a search of this size

• Lower EXPECT thresholds are more stringent, and report fewer matches
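Applying the EXPECT threshold is a filter followed by a sort. The sketch below uses hypothetical function names and made-up example hits. It keeps hits at or below the threshold (how a hit exactly at the threshold is treated is an assumption here) and converts E to a P value with P = 1 − e^(−E), a standard relation that shows why E and P are nearly equal when both are small.

```python
import math


def report_hits(hits: list[tuple[str, float]], expect: float = 10.0) -> list[tuple[str, float]]:
    """Keep (name, E value) pairs at or below the EXPECT threshold, best (smallest E) first."""
    kept = [(name, e) for name, e in hits if e <= expect]
    return sorted(kept, key=lambda hit: hit[1])


def p_value(e: float) -> float:
    """Probability of at least one chance hit this good: P = 1 - exp(-E)."""
    return 1.0 - math.exp(-e)
```

With `hits = [("a", 1e-50), ("b", 0.5), ("c", 12.0), ("d", 3.0)]`, the default threshold drops only `c`, while `expect=1e-3` keeps only `a`, showing how a lower threshold reports fewer matches.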

Interpreting BLAST scores

• Score interpretation is based on context:
  – What is the question?
  – What else do you know about the sequences?
  – Scoring is highly dependent on probe length

• Exact matches will usually have the highest scores (and lowest E values)
  – Short exact matches may score lower than longer partial matches

Interpreting BLAST scores

• Short exact matches are expected to occur at random.

• Partial matches over the entire length of a query are stronger evidence for homology than are short exact matches.
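A quick calculation shows why short exact matches carry little weight. Assuming a random database with equal base frequencies and independent positions (a simplification), a specific word of length k is expected to appear about (N − k + 1) / 4^k times on one strand of a sequence of length N. The helper name below is hypothetical.

```python
def expected_random_matches(word_length: int, database_length: int, alphabet_size: int = 4) -> float:
    """Expected chance occurrences of one specific word on one strand of a random sequence."""
    positions = max(database_length - word_length + 1, 0)
    return positions / alphabet_size ** word_length
```

For a 1-gigabase database this gives roughly 954 chance occurrences of any particular 10-nt word, but fewer than 0.001 for a 20-nt word.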

Homology vs Identity

• Homologous sequences are descended from a common ancestral sequence.

• Homology is either true or false. It can never be partial! Saying two sequences are 45% homologous is a misuse of the term.

• Sequence identity and similarity can be described as a percentage and are used as evidence of homology.
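Percent identity is a simple count over an alignment. In this sketch (hypothetical function name), identical columns are divided by the full alignment length including gap columns, which is the convention assumed here for BLAST's "Identities" line; other tools may divide by a different length, so the same alignment can be reported with slightly different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    # Gap-versus-gap columns are not counted as identities.
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * identical / len(aligned_a)
```

`percent_identity("ACGT-A", "ACGTTA")` gives 5/6, about 83%. A figure like this is evidence for homology, not a percentage of homology.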

BLAST Example

Is this sequence known? What does it encode?

Search Strategy

• Choose the BLAST program:
  – nucleotide query vs. nucleotide database
  – megablast: optimized to find identical sequences
  – blastn: will find identical and similar sequences

• Choose the database:
  – nr (non-redundant): everything
  – genome-specific databases
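The same choice can be made with the stand-alone BLAST+ `blastn` program, where `-task megablast` and `-task blastn` select the two search modes. The helper below only builds the argument list. The file name `query.fa`, the database name `nt`, and the function name are placeholders, and actually running the search requires BLAST+ and a local copy of the database.

```python
def blastn_command(query_fasta: str, database: str = "nt",
                   find_similar: bool = False, evalue: float = 10.0) -> list[str]:
    """Build a BLAST+ blastn argument list (hypothetical helper).

    find_similar=False -> megablast (identical or nearly identical sequences)
    find_similar=True  -> blastn (also finds more distantly similar sequences)
    """
    task = "blastn" if find_similar else "megablast"
    return ["blastn", "-task", task,
            "-query", query_fasta, "-db", database,
            "-evalue", str(evalue),
            "-outfmt", "6"]  # tabular output, one line per hit
```

If those assumptions hold, `subprocess.run(blastn_command("query.fa"), check=True)` would launch a megablast search.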

blastn Options

Paste query sequence here

Choose database here

Choose search program here

Each line is a hit in the database; hits are sorted vertically by E value.

Colored rectangles along the x-axis show where in the query sequence a similarity to a database sequence has been found. Color indicates the degree of similarity.

Output sorted by E value

Link to GenBank file

Link to alignment

Link to Entrez Gene

blastn Alignment

BLASTP Example

blastp input

blastp Databases

• nr - All non-redundant GenBank CDS translations + PDB + SwissProt + PIR

• swissprot - the last major release of the SWISS-PROT protein sequence database

• pat - patented sequences

• pdb - sequences derived from the 3-dimensional structure Protein Data Bank

• env_nr - non-redundant environmental samples

BLASTP Output

Conserved Domain Search

Conserved domains are shown graphically. Link to explanation of the domain.

blastp Output

blastp Alignment

Protein Scoring Matrices

BLOSUM62 is the default BLASTP scoring matrix
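To see how a substitution matrix turns an alignment into a score, here is an ungapped scoring sketch that uses a few BLOSUM62 entries typed in by hand (for example W-W 11, I-L 2, K-R 2, A-W -3). Only these pairs are included, so any other pair raises a `KeyError`; a full matrix is available, for example, through Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`. The function names are hypothetical.

```python
# A small excerpt of BLOSUM62 for illustration; the full matrix is symmetric
# and covers all 20 amino acids plus ambiguity codes.
BLOSUM62_EXCERPT = {
    ("W", "W"): 11, ("C", "C"): 9, ("A", "A"): 4, ("L", "L"): 4,
    ("I", "I"): 4, ("K", "K"): 5, ("R", "R"): 5,
    ("I", "L"): 2, ("K", "R"): 2, ("D", "E"): 2, ("S", "T"): 1,
    ("A", "S"): 1, ("A", "W"): -3,
}


def lookup(a: str, b: str) -> int:
    """Score one residue pair, using symmetry to find either ordering."""
    key = (a, b) if (a, b) in BLOSUM62_EXCERPT else (b, a)
    return BLOSUM62_EXCERPT[key]  # KeyError for pairs outside this excerpt


def ungapped_score(seq_a: str, seq_b: str) -> int:
    """Sum of pair scores for two equal-length, gap-free aligned segments."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have the same length")
    return sum(lookup(a, b) for a, b in zip(seq_a, seq_b))
```

`ungapped_score("WLK", "WIR")` is 11 + 2 + 2 = 15: the conservative L/I and K/R substitutions still add to the score, which is why a different matrix can change the resulting alignment.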

Different matrices produce slightly different alignments

Other BLAST Programs: PSI-BLAST

PSI-BLAST is designed for more sensitive protein-protein similarity searches. Position-Specific Iterated (PSI)-BLAST is the most sensitive BLAST program, making it useful for finding very distantly related proteins or new members of a protein family. Use PSI-BLAST when your standard protein-protein BLAST search either failed to find significant hits or returned hits with descriptions such as "hypothetical protein" or "similar to...".
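PSI-BLAST gains sensitivity by building a position-specific scoring matrix (PSSM) from the significant hits of one round and searching again with that profile. The sketch below shows only the core of a PSSM: per-column log-odds scores from a gap-free block of aligned sequences. It assumes a uniform background of 1/20 and a flat pseudocount of 1; PSI-BLAST itself uses sequence weighting, real background frequencies, and matrix-based pseudocounts, and the function names here are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds score (in bits) for each amino acid at each column of a gap-free alignment."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background: an assumption
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for col in range(length):
        column = [s[col] for s in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile


def score_with_pssm(profile: list[dict[str, float]], segment: str) -> float:
    """Score a segment against the profile; each position uses its own column."""
    return sum(col[aa] for col, aa in zip(profile, segment))
```

With the profile built from `["AC", "AC", "AD"]`, the segment `AC` scores well above `WW`, because the profile rewards the residues actually seen at each position instead of applying one matrix everywhere.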

Other BLAST Programs: PHI-BLAST

PHI-BLAST can do a restricted protein pattern search. Pattern-Hit Initiated (PHI)-BLAST is designed to search for proteins that contain a pattern specified by the user AND are similar to the query sequence in the vicinity of the pattern. This dual requirement is intended to reduce the number of database hits that contain the pattern but are likely to have no true homology to the query.
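PHI-BLAST patterns are written in PROSITE syntax. As a rough illustration of the pattern half of the search, the sketch below converts a simple PROSITE-style pattern into a Python regular expression. It handles only `x`, single residues, `[...]`, `{...}`, repeat counts such as `(2,4)`, and the `<`/`>` terminal anchors. The similarity requirement near the pattern, which is what distinguishes PHI-BLAST from a plain pattern scan, is not modeled, and the function name is hypothetical.

```python
import re

# One PROSITE element: optional '<', a residue spec, optional (n) or (n,m), optional '>'.
ELEMENT = re.compile(r"(<)?(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>)?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern such as 'C-x(2,4)-C' to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        start, residue, low, high, end = m.groups()
        if residue == "x":
            token = "."                            # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"     # any residue except these
        else:
            token = residue                        # 'C' or '[LIVM]' is already regex syntax
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + token + ("$" if end else ""))
    return "".join(parts)
```

For example, `re.search(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]"), "AACAACKKKLA")` finds a match starting at position 2.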

Sequence filters

• Since only a limited number of matches are reported, hits to simple repeats and other low complexity sequences can obscure other more biologically meaningful similarities

• Filters are used to mask low complexity sequences in the probe so that they are ignored during the search

• Available filters: low complexity, human repeats (blastn)
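Low-complexity filtering can be illustrated with Shannon entropy: a window made of one or two repeated letters has low entropy. The sketch below masks every position covered by a low-entropy window with `N`. It is a stand-in, not the real algorithm: BLAST uses DUST for nucleotide queries and SEG for protein queries, and the window size of 8, the 1.0-bit threshold, and the function names are arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits per letter of a non-empty string."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 8, threshold: float = 1.0,
                        mask_char: str = "N") -> str:
    """Replace every position covered by a window with entropy below threshold."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)
```

Applied to `"ACGTGCAT" + "A" * 12 + "GCTAGCTA"`, the poly-A run and a couple of bases on each side become `N`, while the mixed sequence at both ends is left unmasked and can still contribute to hits.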

Low Complexity Sequences are Filtered Out

BLASTN vs BLASTP

• Protein sequences have much higher information content than nucleotide sequences

• To find evidence for sequence homology, use BLASTP and search protein sequences

• Is my sequence already in the database?

• To find identical sequences, search nucleotide databases

Translated BLAST Searches

• translations use all 6 frames

• computationally intensive

• tblastx searches can be very slow with some large databases

• must specify genetic code
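A six-frame translation is easy to sketch. The code below uses the NCBI standard genetic code (translation table 1) written as a 64-letter string in TCAG codon order, translates three forward frames and three frames of the reverse complement, and writes `X` for codons containing ambiguous bases. Passing a different 64-letter table is how an alternate genetic code would be used. The frame labels follow one common convention (frame −1 starts at the first base of the reverse complement), and the function names are hypothetical.

```python
# NCBI standard genetic code (table 1); codon index = 16*first + 4*second + third, bases ordered T, C, A, G.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
BASES = "TCAG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str, table: str = STANDARD_CODE) -> str:
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if any(b not in BASES for b in codon):
            protein.append("X")  # ambiguous base such as N
            continue
        index = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
        protein.append(table[index])
    return "".join(protein)


def six_frame_translation(dna: str, table: str = STANDARD_CODE) -> dict[str, str]:
    """Translate frames +1..+3 of the sequence and -1..-3 of its reverse complement."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:], table)
        frames[f"-{offset + 1}"] = translate(reverse[offset:], table)
    return frames
```

For `ATGGCC`, frame +1 reads `MA` and frame −1 (reverse complement `GGCCAT`) reads `GH`; a translated search compares all six of these against the database, which is why it costs more.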

Alternate Genetic Codes

Translated BLAST Searches

Taxonomy Reports

Taxonomy Reports

BLAST Genomes

Align 2 Sequences with BLAST

BLAST from ORF Finder

Primer BLAST

BLAST Tutorial

• BLAST tutorial on Biocomp Web page

• Goal: demonstrate utility and difference between BLASTN and BLASTP searches

• BLASTN: is my DNA sequence in the database?

• BLASTP: are there related proteins (homologs) in the database?