Sequence Alignment Tutorial


Posted on 23-Feb-2016


DESCRIPTION

Sequence Alignment Tutorial. Presented by Kirill Bessonov, Oct 2012. Talk structure: introduction to sequence alignments; methods / logistics; global alignment: Needleman-Wunsch; local alignment: Smith-Waterman; illustrations of two types of alignments; step-by-step local alignment (PowerPoint presentation).

TRANSCRIPT


Sequence Alignment Tutorial
Presented by Kirill Bessonov
Oct 2012

Bioinformatics Chapter 4: Sequence comparison
Kirill Bessonov

Talk Structure
- Introduction to sequence alignments
- Methods / Logistics
- Global alignment: Needleman-Wunsch
- Local alignment: Smith-Waterman
- Illustrations of two types of alignments
- Step-by-step local alignment
- Computational implementation of alignment
- Retrieval of sequences using R
- Alignment of sequences using R
- Homework

Sequence Alignments
Comparing two objects is intuitive. Likewise, pairwise sequence alignments provide information on:
- evolutionary distance between species (e.g. homology)
- new functional motifs / regions
- genetic manipulation (e.g. alternative splicing)
- new functional roles of unknown sequences
- binding sites of primers / transcription factors (TFs)
- de novo genome assembly
- alignment of short reads from high-throughput sequencers (e.g. Illumina or Roche platforms)

Comparing two sequences
There are two ways of performing a pairwise comparison:
- Global, using the Needleman-Wunsch algorithm (NW)
- Local, using the Smith-Waterman algorithm (SW)
Both approaches use a similar methodology (dynamic programming over a scoring matrix) but have completely different objectives.

Global alignment (NW)
- tries to align the whole sequence
- is more restrictive than local alignment

Local alignment (SW)
- tries to align portions (e.g. motifs) of the given sequences
- is more flexible, as it considers parts of the sequences
- works well on highly divergent sequences
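To make "similar methodology, different objectives" concrete, here is a minimal Python sketch (the talk itself uses R) in which one matrix-filling function serves both methods. It assumes a scoring scheme of match +2, mismatch -1, gap -2. That scheme is inferred from the WHAT/WHY matrices later in the talk and is not stated on these slides. The name `best_score` is hypothetical. Only three things differ between the methods: how the first row and column are initialized, whether negative cells are reset to zero, and which cell holds the answer.

```python
# Assumed scoring scheme: match +2, mismatch -1, gap -2
# (inferred from the WHAT/WHY matrices on the slides).
MATCH, MISMATCH, GAP = 2, -1, -2


def best_score(a, b, local):
    """Fill one DP matrix (rows follow b, columns follow a) and return the best score.

    local=False -> Needleman-Wunsch (global); local=True -> Smith-Waterman (local).
    """
    rows, cols = len(b) + 1, len(a) + 1
    f = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: the first row and column accumulate gap penalties.
        for j in range(cols):
            f[0][j] = j * GAP
        for i in range(rows):
            f[i][0] = i * GAP
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = MATCH if b[i - 1] == a[j - 1] else MISMATCH
            v = max(f[i - 1][j - 1] + s, f[i - 1][j] + GAP, f[i][j - 1] + GAP)
            if local:
                v = max(v, 0)        # local: negative scores become zero
                best = max(best, v)  # local: best cell anywhere in the matrix
            f[i][j] = v
    # Global: the answer is the bottom-right cell.
    return best if local else f[rows - 1][cols - 1]


print(best_score("WHAT", "WHY", local=False))  # 1
print(best_score("WHAT", "WHY", local=True))   # 4
```

The global score (1) and local score (4) match the WHAT/WHY results worked out by hand on the slides.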

[Figure: global vs. local alignment; labels: entire sequence, perfect match, aligned portion, unaligned rest of the sequence]

Global alignment (NW)
- Sequences are aligned end-to-end along their entire length
- Many possible alignments are produced; the alignment with the highest score is chosen
- The naive algorithm, which enumerates every alignment, is very inefficient (exponential time)
  - With three choices per position (match/mismatch, insertion, deletion), aligning sequences of length 15 requires considering about $3^{15} \approx 1.4 \times 10^{7}$ possibilities
  - This is impractical for sequences longer than about 20 nt
- Used to analyze the homology/similarity of entire genes and proteins, e.g. to assess overall gene/protein homology between species

Methodology of global alignment (1 of 4)

Methodology of global alignment (2 of 4)
- The matrix has an extra column and an extra row:
  - M+1 columns, where M is the length of the sequence along the columns (here WHAT, M = 4)
  - N+1 rows, where N is the length of the sequence along the rows (here WHY, N = 3)
- Initialize the matrix by placing cumulative gap penalties along the first row and the first column
- Scores in each cell are cumulative

Initialized matrix (gap penalty -2 per step):

         -    W    H    A    T
    -    0   -2   -4   -6   -8
    W   -2
    H   -4
    Y   -6

Methodology of global alignment (3 of 4)
For each cell, consider all three possibilities:
1) Gap (horizontal/vertical)
2) Match (diagonal)
3) Mismatch (diagonal)
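The initialization and the three possibilities can be sketched in a few lines of Python. This sketch assumes match +2, mismatch -1, gap -2, inferred from the filled WHAT/WHY matrix on the next slide. The helper names `init_global` and `cell_score` are hypothetical. Rows follow the second sequence and columns the first, as on the slides.

```python
# Assumed scoring scheme (inferred from the slide values): match +2, mismatch -1, gap -2.
MATCH, MISMATCH, GAP = 2, -1, -2


def init_global(a, b):
    """(len(b)+1) x (len(a)+1) matrix with cumulative gap penalties in row 0 and column 0."""
    f = [[0] * (len(a) + 1) for _ in range(len(b) + 1)]
    for j in range(len(a) + 1):
        f[0][j] = j * GAP
    for i in range(len(b) + 1):
        f[i][0] = i * GAP
    return f


def cell_score(f, i, j, a, b):
    """Best of the three possibilities for cell (i, j); rows follow b, columns follow a."""
    s = MATCH if b[i - 1] == a[j - 1] else MISMATCH
    diagonal = f[i - 1][j - 1] + s  # match or mismatch
    vertical = f[i - 1][j] + GAP    # gap (vertical move)
    horizontal = f[i][j - 1] + GAP  # gap (horizontal move)
    return max(diagonal, vertical, horizontal)


f = init_global("WHAT", "WHY")
print(f[0])                                # [0, -2, -4, -6, -8]
print([row[0] for row in f])               # [0, -2, -4, -6]
print(cell_score(f, 1, 1, "WHAT", "WHY"))  # 2 (W/W match: 0 + 2)
```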

Select the maximum score for each cell and fill the matrix. The values are consistent with a score of +2 for a match, -1 for a mismatch and -2 for a gap. For example, for the first cell (W vs. W), the diagonal move gives 0 + 2 = 2 (match), while the horizontal and vertical moves each give -2 - 2 = -4, so the cell gets 2:

         -    W    H    A    T
    -    0   -2   -4   -6   -8
    W   -2    2    0   -2   -4
    H   -4    0    4    2    0
    Y   -6   -2    2    3    1

Methodology of global alignment (4 of 4)
- Select the bottom-right cell
- Consider the different path(s) going back to the top-left cell
- A path is constructed by finding the source cell of the current cell: how was the current cell value generated, and from which neighbor?
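The fill and traceback steps can be sketched as follows. This is a minimal Needleman-Wunsch sketch in Python, assuming the +2/-1/-2 scheme inferred from the matrix above. The names `nw_matrix`, `tracebacks` and `sub` are hypothetical. The traceback checks every neighbor that could have produced the current value, so it yields all optimal paths rather than just one.

```python
# Assumed scoring scheme (inferred from the slide values): match +2, mismatch -1, gap -2.
MATCH, MISMATCH, GAP = 2, -1, -2


def sub(x, y):
    """Diagonal score: match or mismatch."""
    return MATCH if x == y else MISMATCH


def nw_matrix(a, b):
    """Needleman-Wunsch score matrix; rows follow b, columns follow a."""
    f = [[0] * (len(a) + 1) for _ in range(len(b) + 1)]
    for j in range(len(a) + 1):
        f[0][j] = j * GAP
    for i in range(len(b) + 1):
        f[i][0] = i * GAP
    for i in range(1, len(b) + 1):
        for j in range(1, len(a) + 1):
            f[i][j] = max(f[i - 1][j - 1] + sub(b[i - 1], a[j - 1]),
                          f[i - 1][j] + GAP,
                          f[i][j - 1] + GAP)
    return f


def tracebacks(f, a, b, i, j):
    """Yield every optimal alignment (top row from a, bottom row from b)
    by walking from cell (i, j) back to the top-left cell."""
    if i == 0 and j == 0:
        yield "", ""
        return
    # For each neighbor, ask: could it have produced the current value?
    if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + sub(b[i - 1], a[j - 1]):
        for top, bottom in tracebacks(f, a, b, i - 1, j - 1):
            yield top + a[j - 1], bottom + b[i - 1]
    if i > 0 and f[i][j] == f[i - 1][j] + GAP:  # vertical: gap in a
        for top, bottom in tracebacks(f, a, b, i - 1, j):
            yield top + "-", bottom + b[i - 1]
    if j > 0 and f[i][j] == f[i][j - 1] + GAP:  # horizontal: gap in b
        for top, bottom in tracebacks(f, a, b, i, j - 1):
            yield top + a[j - 1], bottom + "-"


a, b = "WHAT", "WHY"
f = nw_matrix(a, b)
for row in f:
    print(row)
# [0, -2, -4, -6, -8]
# [-2, 2, 0, -2, -4]
# [-4, 0, 4, 2, 0]
# [-6, -2, 2, 3, 1]
for top, bottom in tracebacks(f, a, b, len(b), len(a)):  # start at bottom-right
    print(top, bottom)
# WHAT WH-Y
# WHAT WHY-
```

Recursion is fine for slide-sized examples. For long sequences, an iterative traceback would avoid Python's recursion limit.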

    WHAT        WHAT
    WHY-        WH-Y
    score = 1   score = 1
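The overall score of 1 for each alignment can be checked column by column. The following is a small sketch assuming match +2, mismatch -1, gap -2 (inferred from the slides). `alignment_score` is a hypothetical helper, and "-" marks a gap.

```python
# Assumed scoring scheme (inferred from the slides): match +2, mismatch -1, gap -2.
MATCH, MISMATCH, GAP = 2, -1, -2


def alignment_score(top, bottom):
    """Sum the column scores of a gapped pairwise alignment ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += GAP
        elif x == y:
            total += MATCH
        else:
            total += MISMATCH
    return total


print(alignment_score("WHAT", "WHY-"))  # 2 + 2 - 1 - 2 = 1
print(alignment_score("WHAT", "WH-Y"))  # 2 + 2 - 2 - 1 = 1
```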

Select the best alignment(s).

Local alignment (SW)
- Sequences are aligned to find regions (parts of the original sequences) where the best alignment occurs
- Local similarity context (aligning parts)
- More detailed / micro approach
- Ideal for finding short motifs and DNA binding sites
- Works well on highly divergent sequences, e.g. sequences with highly variable introns but highly conserved, sparse exons

Methodology of local alignment (1 of 4)

Methodology of local alignment (2 of 4)
- Construct the alignment matrix with M+1 columns and N+1 rows
- Initialize the 1st row and the 1st column with zeros (unlike global alignment, no gap penalties accumulate there)

         -    W    H    A    T
    -    0    0    0    0    0
    W    0
    H    0
    Y    0

Methodology of local alignment (3 of 4)
For each subsequent cell, consider all possible moves:
1) Vertical
2) Horizontal
3) Diagonal
For each cell, select the highest score; if the score is negative, assign zero.

         -    W    H    A    T
    -    0    0    0    0    0
    W    0    2    0    0    0
    H    0    0    4    2    0
    Y    0    0    2    3    1

Methodology of local alignment (4 of 4)
- Select the starting cell(s) with the highest score
- Follow the path(s) back until a cell with a score of zero is reached
- Trace back the cell values: look at how each value originated (i.e. the path)
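The local steps above (zero initialization, zero floor, start at the highest cell, stop at zero) can be sketched as follows. This is a minimal Smith-Waterman sketch in Python, assuming the same +2/-1/-2 scheme inferred from the slides. `smith_waterman` and `sub` are hypothetical names.

```python
# Assumed scoring scheme (inferred from the slide values): match +2, mismatch -1, gap -2.
MATCH, MISMATCH, GAP = 2, -1, -2


def sub(x, y):
    """Diagonal score: match or mismatch."""
    return MATCH if x == y else MISMATCH


def smith_waterman(a, b):
    """Return (matrix, best score, best local alignments); rows follow b, columns follow a."""
    f = [[0] * (len(a) + 1) for _ in range(len(b) + 1)]  # row 0 and column 0 stay zero
    for i in range(1, len(b) + 1):
        for j in range(1, len(a) + 1):
            f[i][j] = max(0,  # a negative score is replaced by zero
                          f[i - 1][j - 1] + sub(b[i - 1], a[j - 1]),
                          f[i - 1][j] + GAP,
                          f[i][j - 1] + GAP)
    best = max(max(row) for row in f)

    def trace(i, j):
        # Walk back until a cell holding zero: the local alignment starts there.
        if f[i][j] == 0:
            yield "", ""
            return
        if f[i][j] == f[i - 1][j - 1] + sub(b[i - 1], a[j - 1]):
            for top, bottom in trace(i - 1, j - 1):
                yield top + a[j - 1], bottom + b[i - 1]
        if f[i][j] == f[i - 1][j] + GAP:  # vertical: gap in a
            for top, bottom in trace(i - 1, j):
                yield top + "-", bottom + b[i - 1]
        if f[i][j] == f[i][j - 1] + GAP:  # horizontal: gap in b
            for top, bottom in trace(i, j - 1):
                yield top + a[j - 1], bottom + "-"

    # Start from every cell that holds the highest score.
    starts = [(i, j) for i in range(1, len(b) + 1) for j in range(1, len(a) + 1)
              if best > 0 and f[i][j] == best]
    alignments = [aln for i, j in starts for aln in trace(i, j)]
    return f, best, alignments


f, best, alignments = smith_waterman("WHAT", "WHY")
for row in f:
    print(row)
print(best, alignments)  # 4 [('WH', 'WH')]
```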

Best local alignment:

    WH
    WH

with a total score of 4.

Mathematically, local alignment finds

$$\max_{I \subseteq A,\; J \subseteq B} S(I, J)$$

where $I$ and $J$ are (contiguous) subsequences of sequences $A$ and $B$, and $S(I, J)$ is the score for subsequences $I$ and $J$.

Local alignment illustration (1 of 2)

Local alignment illustration (2 of 2)
[Figure: step-by-step filling of the local alignment matrix for GGCTCAATCA (columns) vs. ACCTAAGG (rows), one row at a time]
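As a rough check of the "Mathematically" definition, a brute-force sketch can compare the Smith-Waterman score with the maximum global score $S(I, J)$ over all pairs of substrings. Empty substrings are included, which contribute a score of 0. The sketch assumes the same +2/-1/-2 scheme inferred from the earlier slides. It is applied to WHAT/WHY and to the illustration's sequences GGCTCAATCA and ACCTAAGG, and does not attempt to reproduce the matrix drawn on the slides. All function names are hypothetical.

```python
# Assumed scoring scheme (inferred from the slide values): match +2, mismatch -1, gap -2.
MATCH, MISMATCH, GAP = 2, -1, -2


def sub(x, y):
    """Diagonal score: match or mismatch."""
    return MATCH if x == y else MISMATCH


def global_score(a, b):
    """S(I, J): optimal Needleman-Wunsch score, keeping only one previous row."""
    prev = [j * GAP for j in range(len(a) + 1)]
    for i in range(1, len(b) + 1):
        cur = [i * GAP] + [0] * len(a)
        for j in range(1, len(a) + 1):
            cur[j] = max(prev[j - 1] + sub(b[i - 1], a[j - 1]),
                         prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev[-1]


def local_score(a, b):
    """Smith-Waterman best score: same recurrence, floored at zero."""
    best = 0
    prev = [0] * (len(a) + 1)
    for i in range(1, len(b) + 1):
        cur = [0] * (len(a) + 1)
        for j in range(1, len(a) + 1):
            cur[j] = max(0, prev[j - 1] + sub(b[i - 1], a[j - 1]),
                         prev[j] + GAP, cur[j - 1] + GAP)
            best = max(best, cur[j])
        prev = cur
    return best


def brute_force_local(a, b):
    """max S(I, J) over all substrings I of a and J of b (empty ones included)."""
    subs_a = {a[i:k] for i in range(len(a) + 1) for k in range(i, len(a) + 1)}
    subs_b = {b[i:k] for i in range(len(b) + 1) for k in range(i, len(b) + 1)}
    return max(global_score(x, y) for x in subs_a for y in subs_b)


for a, b in [("WHAT", "WHY"), ("GGCTCAATCA", "ACCTAAGG")]:
    print(a, b, local_score(a, b), brute_force_local(a, b))
```

The brute force only serves to illustrate the definition. Its cost grows roughly with the fourth power of the sequence lengths times the alignment cost, which is why the dynamic-programming fill is used in practice.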