![Page 1: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/1.jpg)
Basic Local Alignment and Search Tool (BLAST)
Database searching
Barbera van Schaik
![Page 2: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/2.jpg)
Why use BLAST?
• Dynamic programming is not suitable for comparing a query sequence against a database
– Takes too much time!
• BLAST is a heuristic method to find the highest-scoring locally optimal alignments
– BLAST improved overall speed of searches
– BLAST maintains good sensitivity
![Page 3: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/3.jpg)
BLAST terminology
[Diagram: a query sequence is searched with blast against a target database (GenBank / SwissProt); the output is a sequence list (hits / subject), which gives information about the input query sequence, e.g., function]
The aim of a database (BLAST) search is to discover sequence homology on the basis of sequence similarity
BLAST returns similar sequences, not necessarily biologically related sequences
![Page 4: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/4.jpg)
BLAST variants
| Sequence type | nucleotide database | protein database |
|---|---|---|
| nucleotide query | blastn / tblastx | blastx |
| amino acid query | tblastn | blastp |

blastn: finds NT sequences similar to your NT sequence
blastp: finds AA sequences similar to your AA sequence
blastx: finds AA sequences similar to the translation of your NT sequence (if you cannot recognize an ORF)
tblastn: searches your AA sequence against a translated NT database (for finding pseudogenes)
tblastx: compares the translation of your NT sequence against a translated NT database (keeps computers busy)
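The query/database combinations on this slide can be written down as a small lookup table. The sketch below is only an illustration of that table; the names `BLAST_PROGRAMS` and `candidate_programs` are made up for this example and are not part of any BLAST tool.

```python
# Lookup table mirroring the slide: (query type, database type) -> programs.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
    ("protein", "protein"): ["blastp"],
}


def candidate_programs(query_type, database_type):
    """Return the BLAST programs that fit a query/database combination."""
    key = (query_type, database_type)
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"unknown combination: {key}")
    return BLAST_PROGRAMS[key]


print(candidate_programs("nucleotide", "protein"))  # ['blastx']
```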
![Page 5: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/5.jpg)
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Web interface changes now and then
![Page 6: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/6.jpg)
http://blast.ncbi.nlm.nih.gov/Blast.cgi
![Page 7: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/7.jpg)
http://blast.ncbi.nlm.nih.gov/Blast.cgi
![Page 8: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/8.jpg)
BLASTing a sequence at NCBI – programs
![Page 9: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/9.jpg)
BLASTing a sequence at NCBI – enter accession
![Page 10: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/10.jpg)
BLASTing a sequence at NCBI – enter sequence
![Page 11: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/11.jpg)
Database choice
Protein databases
Good for protein coding nucleotide queries
Choose a non-redundant database
Nucleotide databases
Non-redundant database
Filter on organism / other Entrez query
![Page 12: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/12.jpg)
BLASTing a sequence at NCBI – parameters
![Page 13: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/13.jpg)
Blast algorithm: step 1
protein query sequence
protein database
compile list of ‘words’ of length W
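Compiling the word list amounts to sliding a window of length W over the query. A minimal sketch, assuming W = 3 and a made-up query string; the function name `compile_words` is hypothetical:

```python
def compile_words(query, w=3):
    """Return every overlapping word of length w with its start position."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


# Hypothetical six-residue query, chosen only for illustration.
words = compile_words("PQGEFG", w=3)
print(words)
# [(0, 'PQG'), (1, 'QGE'), (2, 'GEF'), (3, 'EFG')]
```

A query of length L yields L - W + 1 words, so the list grows linearly with the query.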
![Page 14: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/14.jpg)
Blast algorithm: step 2
Initial search
Use PAM/BLOSUM matrix
Find words of length ‘W’ that score at least ‘T’ (T=11)
Exact matches only
The parameter T dictates the speed and sensitivity
– increasing T increases speed, decreases sensitivity
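The effect of T can be seen by listing, for one query word, all words that score at least T against it. The sketch below is an illustration only: `toy_score` is an invented stand-in for a PAM/BLOSUM matrix (its values are assumptions, not real matrix entries), and the word "KDE" and both thresholds are arbitrary choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Toy stand-in for a PAM/BLOSUM matrix (assumed values, not real ones):
# identical residues score 5, residues in the same rough chemical group
# score 2, everything else scores -2.
GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH", "C", "G", "P"]
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def toy_score(a, b):
    if a == b:
        return 5
    return 2 if GROUP_OF[a] == GROUP_OF[b] else -2


def neighborhood(word, threshold):
    """All words of the same length scoring at least `threshold` against `word`."""
    hits = []
    for candidate in product(AMINO_ACIDS, repeat=len(word)):
        score = sum(toy_score(a, b) for a, b in zip(word, candidate))
        if score >= threshold:
            hits.append("".join(candidate))
    return hits


strict = neighborhood("KDE", threshold=15)  # only the identical word qualifies
loose = neighborhood("KDE", threshold=11)   # one conservative change is allowed
print(len(strict), len(loose))  # 1 5
```

With the higher threshold there are fewer words to look up in the database (faster), but related words such as "RDE" are no longer found (less sensitive).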
![Page 15: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/15.jpg)
Join words on same diagonal (ungapped)
[Figure: two word hits (Score > T each) on the same diagonal of the database sequence plot, within distance A of each other, joined into a high-scoring segment pair (score = ‘S’)]
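Joining word hits on the same diagonal can be sketched as follows. This is a simplified illustration, not the BLAST implementation: it uses exact word matches instead of words scoring above T, and the function names, the sequences and the value used for the distance A are all assumptions.

```python
from collections import defaultdict


def find_word_hits(query, subject, w=3):
    """Exact word matches as (query_pos, subject_pos) pairs."""
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    hits = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


def pair_hits_on_diagonal(hits, max_distance, w=3):
    """Pair non-overlapping hits that share a diagonal and lie within max_distance."""
    by_diagonal = defaultdict(list)
    for i, j in hits:
        by_diagonal[j - i].append(i)  # the diagonal is subject_pos - query_pos
    pairs = []
    for diagonal, positions in by_diagonal.items():
        positions.sort()
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                gap = positions[b] - positions[a]
                if gap > max_distance:
                    break
                if gap >= w:  # skip overlapping words
                    pairs.append((diagonal, positions[a], positions[b]))
    return pairs


# Hypothetical sequences: two matching words separated by a mismatching stretch.
hits = find_word_hits("ACDWWWGHI", "TTACDYYYGHI")
print(pair_hits_on_diagonal(hits, max_distance=40))  # [(2, 0, 6)]
```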
![Page 16: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/16.jpg)
Join words on same diagonal
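Extend the HSP until the score drops a small amount below the highest score of a shorter alignment
– extend to the left (similar to the right)
– stop the extension when the score drops below this threshold
[Figure: score plotted along the database sequence while the high-scoring segment pair (score = ‘S’) is extended; the point where the extension stops and the score S for this alignment are marked]
If S > threshold (based on random sequences) then keep the HSP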
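The extension rule on this slide can be sketched for one direction. The code below is a minimal illustration under stated assumptions: `match_score` is a toy scoring function (+5 / -4), the parameter name `x_drop` for the allowed drop is chosen here, and the sequences are invented. Extension to the left would work the same way with decreasing positions.

```python
def match_score(a, b):
    """Toy scoring (assumed values): +5 for a match, -4 for a mismatch."""
    return 5 if a == b else -4


def extend_right(query, subject, q_start, s_start, x_drop, score_fn):
    """Extend to the right from (q_start, s_start) without gaps.

    Stops once the running score has fallen more than x_drop below the
    best score seen so far; returns (best_score, length_at_best).
    """
    best = running = 0
    best_len = 0
    i = 0
    while q_start + i < len(query) and s_start + i < len(subject):
        running += score_fn(query[q_start + i], subject[s_start + i])
        i += 1
        if running > best:
            best, best_len = running, i
        elif best - running > x_drop:
            break  # score dropped too far below the best shorter alignment
    return best, best_len


# Hypothetical sequences: six matching residues followed by mismatches.
score, length = extend_right("ACDEFGHIKL", "ACDEFGWWWW", 0, 0, 10, match_score)
print(score, length)  # 30 6
```

The returned score plays the role of S: the segment is kept only if it exceeds a cutoff derived from what random sequences would produce.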
![Page 17: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/17.jpg)
Finding HSPs
[Figure: word hits found with T>11 and with T>13; hits are joined and extended into an HSP]
![Page 18: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/18.jpg)
Trigger gapped extension
HSP
Gapped extension
![Page 19: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/19.jpg)
BLASTing a sequence at NCBI – parameters
![Page 20: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/20.jpg)
Masking of sequences – low complexity
Low complexity repeats in genome
Many amino-acid “stretches” in proteins
BLAST recognizes these regions as similar
but, they are NOT biologically related
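One way to picture low-complexity masking is a sliding window that replaces stretches with little residue variety by X. The sketch below uses Shannon entropy as a simplified stand-in; it is not the SEG or DUST filter that BLAST itself applies, and the window size, entropy cutoff and example sequence are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Entropy in bits of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq, window=12, min_entropy=2.2):
    """Replace every window whose entropy is below min_entropy with X."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for k in range(start, start + window):
                masked[k] = "X"
    return "".join(masked)


# Hypothetical protein with a serine stretch in the middle.
protein = "MKTAYIAKQR" + "S" * 14 + "DVLGHWECNP"
print(mask_low_complexity(protein))
```

Because whole windows are masked, a few residues flanking the repeat are masked as well; real filters are more careful about the boundaries.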
![Page 21: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/21.jpg)
Masking of sequences – highly abundant sequences
First, query the sequence against a database that contains domains representative of large sequence families:
– Alu repeats
– Protein kinase catalytic domains
– Vector sequences
Then mask these domains in the query sequence and continue search
Masking option replaces these regions with XXXXXXX
![Page 22: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/22.jpg)
When do you change the parameters?

| Reason | Parameters to change |
|---|---|
| The sequence you’re interested in contains many identical residues; it has a biased composition | Sequence filter (automatic masking) |
| BLAST doesn’t report any results | The substitution matrix or the gap penalties |
| Your match has a borderline E-value | The substitution matrix or the gap penalties to check the match’s robustness |
| BLAST reports too many matches | The database you’re searching OR filter the reported entries by keyword OR increase the nr of reported matches OR increase Expect (the E-value threshold) OR reject sequences too similar to the query (those with very low E-values) |

Parameters are already optimized
![Page 23: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/23.jpg)
BLASTing a sequence at NCBI – parameters
![Page 24: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/24.jpg)
BLASTing a sequence at NCBI – job status
![Page 25: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/25.jpg)
If it takes too long: try another BLAST server

| Country / continent | Program | URL |
|---|---|---|
| USA | BLAST / PSI-BLAST | http://blast.ncbi.nlm.nih.gov/Blast.cgi |
| Europe | BLAST | http://www.expasy.org/tools/blast/bBLAST.html |
| Europe | BLAST (WU-BLAST) | http://www.ebi.ac.uk/services |
| Japan | BLAST / PSI-BLAST | http://blast.ddbj.nig.ac.jp/top-e.html |

Warning: different database (versions)!
![Page 26: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/26.jpg)
BLASTing a sequence at NCBI – blast summary
![Page 27: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/27.jpg)
BLASTing a sequence at NCBI – used parameters
![Page 28: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/28.jpg)
BLASTing a sequence at NCBI – graphical display
![Page 29: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/29.jpg)
BLASTing a sequence at NCBI – hit list
How often would this hit have occurred by chance?
Rule of thumb: E-value < 0.0001
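The question "how often by chance?" is what the E-value answers. The slides give no formula; the sketch below uses the standard relation between a bit score S' and the E-value, E = m · n · 2^(-S'), where m and n are the query and database lengths. The lengths and bit scores in the example are invented, and the function names are hypothetical.

```python
def expected_hits(bit_score, query_length, database_length):
    """E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


def is_significant(e_value, cutoff=1e-4):
    """Apply the rule of thumb from the slide: E-value < 0.0001."""
    return e_value < cutoff


# Hypothetical search: 250-residue query against a 10**9-residue database.
for bits in (50, 60):
    e = expected_hits(bits, 250, 10**9)
    print(bits, f"{e:.2e}", is_significant(e))
```

Each extra bit halves the E-value, and the same score becomes less significant as the database grows.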
![Page 30: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/30.jpg)
BLASTing a sequence at NCBI
![Page 31: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/31.jpg)
Alternatives for homology searches
| Country / continent | Program | Address |
|---|---|---|
| USA | FASTA | http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml |
| Europe | FASTA | http://www.ebi.ac.uk/Tools/sss/fasta/ |
| Europe | SSEARCH | http://www.ebi.ac.uk/Tools/services/web/toolform.ebi?tool=fasta&program=ssearch&context=protein |
| USA | BLAT | http://genome.ucsc.edu/ |
![Page 32: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/32.jpg)
Alternative use of alignment algorithm
![Page 33: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/33.jpg)
Pairwise comparison of Medline abstracts
![Page 34: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/34.jpg)
Pairwise comparison of Medline abstracts
eTBLAST implementation
A sample of 62 213 Medline citations
1.35% with shared authors were sufficiently similar
0.04% of the citations with no shared authors were highly similar (potential plagiarism)
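To give a feel for how an alignment algorithm can compare texts, the sketch below runs a Smith-Waterman style local alignment over words instead of residues. This is not the eTBLAST method, only an illustration of the general idea; the scoring values and the two sentences are assumptions.

```python
def tokens(text):
    """Split a text into lowercase words."""
    return text.lower().split()


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman score of the best local alignment of two token lists."""
    prev = [0] * (len(b) + 1)
    best = 0
    for token_a in a:
        curr = [0]
        for j, token_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if token_a == token_b else mismatch)
            curr.append(max(0, diag, prev[j] + gap, curr[j - 1] + gap))
            best = max(best, curr[j])
        prev = curr
    return best


# Two invented sentences sharing a phrase, with one inserted word.
first = tokens("BLAST is a heuristic method for database searching")
second = tokens("We show that BLAST is a fast heuristic method")
print(local_alignment_score(first, second))  # 9
```

Five shared words (2 points each) minus one gap for the inserted word "fast" give the score of 9; unrelated texts score close to 0.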
![Page 35: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/35.jpg)
Pairwise comparison of Medline abstracts
![Page 36: Basic Local Alignment and Search Tool (BLAST)bioinformatics.amc.nl/.../20170314_BLAST_AJ_BvS.pdf · Basic Local Alignment and Search Tool (BLAST) Database searching Barbera van Schaik](https://reader035.vdocuments.net/reader035/viewer/2022062605/5fc78e38fa851b041d296271/html5/thumbnails/36.jpg)
Other applications
Sequence alignment
Literature comparison
Spelling checker
Sound comparison
More?