Bioinformatics Tutorial I: BLAST and Sequence Alignment

What is BLAST?

• Online tool from the National Center for Biotechnology Information (NCBI)

• A “Google” for protein and nucleotide sequences

What can you use BLAST for?

• Identify an unknown sequence

• Characterize the gene/protein of interest
  – Function/activity (gene and protein)
  – Structure or shape (new protein)
  – Location or preferred location (protein)
  – Stability (gene/transcript or protein)

• Origin of a gene or protein

Sequence alignment approaches

1. Global alignment (Needleman and Wunsch, 1970)

2. Local alignment, used in BLAST (Smith and Waterman, 1981)
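To make the difference between the two approaches concrete, here is a minimal Python sketch that computes both a global and a local alignment score with the same dynamic-programming recurrence. The scoring values (match +1, mismatch -1, linear gap -2) and the function name `alignment_scores` are assumptions chosen for illustration; they are not taken from the slides or from BLAST itself.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for sequences a and b.

    Illustrative scoring only: a linear gap penalty is assumed.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # Global (Needleman-Wunsch style): end gaps are penalized, so the
    # first row and column accumulate gap costs.
    glob = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    # Local (Smith-Waterman style): a cell never drops below zero, so an
    # alignment can start and end anywhere in either sequence.
    loc = [[0] * cols for _ in range(rows)]
    best_local = 0

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    return glob[-1][-1], best_local


# A short motif embedded in a longer sequence: the global score is dragged
# down by the unaligned flanks, while the local score is not.
print(alignment_scores("TTTTACGTTTTT", "ACGT"))  # (-12, 4)
```

The local score stays high whenever a well-matching region exists, which is why a local approach suits database searches where only part of the query may match.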

Global alignment

• One approach for searching a query sequence is to align the entire sequence against all sequences in a database

• This approach is very slow and hence impractical

• A much faster approach
  – Divides your search query into short sequences (“words”) and initially looks for exact matches. Once found, these matches are then extended

• i.e. the Basic Local Alignment Search Tool (BLAST)

• Altschul, S.F. et al. Basic local alignment search tool. J Mol Biol. 215(3):403-10 (1990).
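The word-splitting and exact-match step can be sketched in a few lines of Python. This is a toy illustration, not BLAST's actual implementation: the word length `w=4` and the helper names `split_into_words`, `build_index` and `find_seeds` are hypothetical, and only exact word matches are considered.

```python
from collections import defaultdict


def split_into_words(query, w):
    """Return every overlapping word of length w with its offset in the query."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


def build_index(subject, w):
    """Map each word of length w in the subject to the offsets where it occurs."""
    index = defaultdict(list)
    for i in range(len(subject) - w + 1):
        index[subject[i:i + w]].append(i)
    return index


def find_seeds(query, subject, w=4):
    """Return (query_offset, subject_offset) pairs where a word matches exactly."""
    index = build_index(subject, w)
    return [(q_pos, s_pos)
            for q_pos, word in split_into_words(query, w)
            for s_pos in index.get(word, [])]


print(split_into_words("ACGTAC", 4))        # [(0, 'ACGT'), (1, 'CGTA'), (2, 'GTAC')]
print(find_seeds("ACGTAC", "TTACGTTT", 4))  # [(0, 2)]
```

Indexing the database side once means each query word is looked up directly, which is where the speed-up over aligning the whole query against every sequence comes from.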

BLAST algorithm

• Query sequences are usually split into words

• Each word is then searched in the database

• Word hits are extended in either direction to generate an alignment with a score greater than the threshold score

“The central idea of the BLAST algorithm is to confine attention to segment pairs that contain a word pair of length w with a score of at least T”

- Altschul et al., 1990
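The extension step can be sketched as follows. This is a simplified, ungapped illustration under several assumptions: the scores (+1/-2), the drop-off cutoff `x_drop`, the parameter `min_score` and both function names are hypothetical, and real BLAST adds further refinements that are not shown here.

```python
def extend_one_way(query, subject, q, s, step, match, mismatch, x_drop):
    """Extend from (q, s) in direction step (+1 or -1).

    Returns (best_gain, best_len): the highest score gained and how many
    positions had been added when that score was reached.
    """
    score = best = 0
    length = best_len = 0
    while 0 <= q < len(query) and 0 <= s < len(subject):
        score += match if query[q] == subject[s] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score fell too far below the best seen so far
        q += step
        s += step
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, w, min_score,
                match=1, mismatch=-2, x_drop=5):
    """Extend an exact word hit of length w in both directions.

    Returns (query_start, subject_start, length, score), or None if the
    extended segment scores below min_score.
    """
    right, r_len = extend_one_way(query, subject, q_pos + w, s_pos + w, 1,
                                  match, mismatch, x_drop)
    left, l_len = extend_one_way(query, subject, q_pos - 1, s_pos - 1, -1,
                                 match, mismatch, x_drop)
    total = w * match + left + right
    if total < min_score:
        return None
    return (q_pos - l_len, s_pos - l_len, w + l_len + r_len, total)


# The word ACGT matches at offset 2 in both sequences; extension adds the
# two matching bases to its right and nothing to its left.
print(extend_seed("GGACGTACGG", "TTACGTACTT", 2, 2, 4, min_score=5))  # (2, 2, 6, 6)
```

Trimming back to the best-scoring length means mismatches at the ends of an extension are not included in the reported segment.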

How does BLAST work?

Step 1: Get your sequence

• NCBI, UCSC, etc.

• Sequencing facility (unknown gene)

Step 2: Choose BLAST program

The different BLAST programs

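• blastn (nucleotide BLAST): nucleotide query against a nucleotide database

• blastp (protein BLAST): protein query against a protein database

• blastx (translated BLAST): translated nucleotide query against a protein database

• tblastn (translated BLAST): protein query against a translated nucleotide database

• tblastx (translated BLAST): translated nucleotide query against a translated nucleotide database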
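As a quick aid for choosing among the five programs, the following sketch maps the type of query and database to a program name. The function `choose_blast_program` and its `compare_as_protein` flag are hypothetical helpers written for this tutorial; they are not part of any NCBI tool.

```python
def choose_blast_program(query_type, database_type, compare_as_protein=False):
    """Pick a BLAST program from the query and database sequence types.

    query_type and database_type are "nucleotide" or "protein".
    compare_as_protein only matters when both are nucleotide: it selects
    tblastx, which compares translations on both sides.
    """
    table = {
        ("nucleotide", "nucleotide"): "tblastx" if compare_as_protein else "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",
        ("protein", "nucleotide"): "tblastn",
    }
    return table[(query_type, database_type)]


print(choose_blast_program("nucleotide", "protein"))  # blastx
print(choose_blast_program("protein", "nucleotide"))  # tblastn
```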

Simplified visualization

Why translate in 6 reading frames?

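Double-stranded sequence:

5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’

Reading frames on the top strand (first two codons of each):

5’ CAT CAA
5’ ATC AAC
5’ TCA ACT

Reading frames on the bottom strand, read 5’ to 3’ (first two codons of each):

5’ GTG GGT
5’ TGG GTA
5’ GGG TAG

• A DNA sequence can be read in six reading frames (three on each strand), so it can potentially code for six different proteins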
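The six reading frames can be generated programmatically. The sketch below only splits each frame into codons and does not translate them into amino acids; the function names and the frame labels (`+1` to `+3` for the given strand, `-1` to `-3` for its reverse complement) are conventions assumed for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def codons(dna, offset):
    """Split dna into complete codons, starting at the given offset (0, 1 or 2)."""
    return [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]


def six_frames(dna):
    """Return a dict mapping frame labels to their codon lists."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = codons(seq, offset)
    return frames


dna = "CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
for label, frame in six_frames(dna).items():
    print(label, " ".join(frame[:2]))
# +1 CAT CAA
# +2 ATC AAC
# +3 TCA ACT
# -1 GTG GGT
# -2 TGG GTA
# -3 GGG TAG
```

Translated searches need all six frames because the correct strand and frame of an unknown nucleotide sequence are not known in advance.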

Step 3: Search parameters

Step 4: Search results

Important: Tabular output

• Sequence similarity score is calculated based on the pair-wise alignment quality

• Alignment score is the sum of scores for each position

• Nucleotides
  – +1 score for each match
  – -2 score for each mismatch

• Peptides
  – Each amino acid substitution is given a score

Example

AACGTTTCCAGTCCAAATAGCTAGGC
===--===   =-===-==-======
AACCGTTC   TACAATTACCTAGGC

Hits (+1): 18
Misses (-2): 5
Gaps (existence -2, extension -1): 1, length 3
Score = 18 * 1 + 5 * (-2) - 2 - 2 = 4

David Fristrom, Introduction to BLAST
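The scoring of the example alignment can be reproduced with a short function. One assumption is made explicit here: a gap of length 3 is charged one existence penalty plus two extension penalties, which is how the score expression above reads. Other tools may charge an extension penalty for every gap position. The function name and the use of `-` as the gap character are also choices made for this sketch.

```python
def score_alignment(query, subject, match=1, mismatch=-2,
                    gap_open=-2, gap_extend=-1):
    """Score two aligned strings of equal length; '-' marks a gap position.

    Assumption: the first position of a gap costs gap_open and every
    further position of the same gap costs gap_extend.
    Returns (hits, misses, gaps, score).
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")

    hits = misses = gaps = score = 0
    in_gap = False
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            if in_gap:
                score += gap_extend
            else:
                gaps += 1
                score += gap_open
            in_gap = True
            continue
        in_gap = False
        if q == s:
            hits += 1
            score += match
        else:
            misses += 1
            score += mismatch
    return hits, misses, gaps, score


query = "AACGTTTCCAGTCCAAATAGCTAGGC"
subject = "AACCGTTC---TACAATTACCTAGGC"
print(score_alignment(query, subject))  # (18, 5, 1, 4)
```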

E-value

• E-value (expectation value): the number of different alignments with a score equal to or better than the observed score that are expected to occur in a database search by chance alone

• Low E-value: the sequences may be homologous

• Statistical significance depends on
  – Length of the query sequence
  – Size of the sequence database
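The dependence on query length and database size can be illustrated numerically. The slides do not give a formula; the sketch below assumes the standard Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length, n is the database length and S is the raw alignment score. The values of `K` and `lam` used here are illustrative placeholders, not parameters of any real scoring system.

```python
import math


def expected_alignments(score, query_len, db_len, K=0.1, lam=0.3):
    """E-value under the assumed form E = K * m * n * exp(-lam * S).

    K and lam are placeholder values chosen for illustration only.
    """
    return K * query_len * db_len * math.exp(-lam * score)


base = expected_alignments(score=50, query_len=300, db_len=1_000_000)
bigger_db = expected_alignments(score=50, query_len=300, db_len=2_000_000)
better_score = expected_alignments(score=60, query_len=300, db_len=1_000_000)

# Doubling the database size doubles the E-value for the same score,
# while a higher score lowers it exponentially.
print(bigger_db / base)
print(better_score < base)
```

This is why the same alignment score can be significant in a small database and unremarkable in a very large one.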

Graphical output

Taxonomy Results

References

• Figures and text adapted from the following sources:
  – David Fristrom, Introduction to BLAST
  – Jonathan Pevsner, BLAST: Basic local alignment search tool
  – Joanne Fox, BLAST: Finding function by sequence similarity
