pairwise sequence alignment


Post on 30-Jan-2016



TRANSCRIPT

  • Pairwise Sequence Alignment

    Presented By Liu Qi

  • Why align sequences?

    Functional predictions are based on identifying homologues. This assumes that conservation of sequence implies conservation of function. BUT: function is carried out at the level of proteins, i.e. 3-D structure, whereas sequence conservation is carried out at the level of DNA, i.e. 1-D sequence.

  • Relation of sequences

    Homologous sequences. Orthologs and paralogs are two types of homologous sequences. Orthology describes genes in different species that derive from a common ancestor. Orthologous genes may or may not have the same function. Paralogy describes homologous genes within a single species that diverged by gene duplication.

  • Some Definitions

    An alignment is a mutual arrangement of two sequences that shows where the two sequences are similar and where they differ. An optimal alignment is one that exhibits the most correspondences and the fewest differences. It is the alignment with the highest score. It may or may not be biologically meaningful.

  • Methods

    Dot matrix
    Dynamic programming
    Word, k-tuple (heuristic based)

  • Brief intro of methods

    dot matrix - all possible matches between sequence residues are found; used to compare two sequences to look for regions where they may align; very useful for finding indels and repeats in sequences; can be used as a first pass to see if there is any similarity between sequences

    dynamic programming - mathematically guaranteed to find the optimal alignment (global or local) between pairs of sequences under a given scoring scheme; computationally expensive for long sequences and database searches - the number of steps grows with the product of the two sequence lengths

    k-tuple (word) methods - used by FASTA and BLAST (previously described); much faster than dynamic programming and ideal for database searches; use heuristics that do not guarantee an optimal alignment but are nevertheless very reliable

  • Dot matrix

    1 - one sequence is listed along the top of the page and the second sequence along the side

    2 - move across each row and put a dot in any column where the character is the same

    3 - continue for each row until all possible character matches between the sequences are represented by dots

    4 - diagonal rows of dots reveal sequence similarity (repeats and inverted repeats can also be found off the main diagonal)

    5 - isolated dots represent random similarity unrelated to the alignment
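
    The five steps above can be sketched in a few lines of Python. This is only an illustration: the function names `dot_matrix` and `render` are made up for this sketch, and the text rendering ('*' for a dot) is an arbitrary choice.

```python
def dot_matrix(top, side):
    """Return one row per character of `side`; each cell is True where
    that character equals the character of `top` in that column."""
    return [[a == b for a in top] for b in side]


def render(top, side):
    """Draw the matrix as text: '*' marks a match, '.' a mismatch."""
    lines = ["  " + top]
    for b, row in zip(side, dot_matrix(top, side)):
        lines.append(b + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Comparing a sequence against itself always fills the main diagonal.
print(render("ATOYOTA", "ATOYOTA"))
```

    Because ATOYOTA reads the same in both directions, its self-comparison also fills the anti-diagonal, which is how a palindrome shows up in a dot plot.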

  • Dot matrix with noise reduction

  • Dot matrix

    To improve visualisation of identical regions among sequences, we use sliding windows. Instead of writing down a dot for every character that is common to both sequences, we compare a number of positions (the window size) and write down a dot whenever there is a minimum number (the stringency) of identical characters.

  • Dot matrix

    Caution is necessary regarding the window size and the stringency value. Generally, they take different values for different problems. The optimal values accentuate the regions of similarity between the two sequences.

    For DNA sequences, usually: sliding window = 15, stringency = 10
    For protein sequences: sliding window = 2 or 3, stringency = 2
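
    A minimal sketch of the window/stringency filter, assuming the window is anchored at its first position (the slides do not say whether the dot is placed at the start or the centre of the window). The function name `windowed_dot_matrix` is hypothetical; the defaults are the DNA values quoted above.

```python
def windowed_dot_matrix(top, side, window=15, stringency=10):
    """Place a dot at (i, j) when the windows starting at side[i] and top[j]
    agree in at least `stringency` of their `window` positions."""
    rows = max(len(side) - window + 1, 0)
    cols = max(len(top) - window + 1, 0)
    dots = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Count identical characters inside the two aligned windows.
            matches = sum(side[i + k] == top[j + k] for k in range(window))
            row.append(matches >= stringency)
        dots.append(row)
    return dots


# A short repeat compared with itself, using a small window for illustration.
dots = windowed_dot_matrix("ACGTACGT", "ACGTACGT", window=4, stringency=4)
for row in dots:
    print("".join("*" if hit else "." for hit in row))
```

    Lowering the stringency relative to the window lets diagonals survive a few mismatches, while raising it suppresses more of the isolated random dots.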

  • Things to be considered

    Scoring matrix for distance correction
    Window size
    Threshold

  • Uses of the dot plot

    Regions of similarity: diagonals
    Insertions/deletions: gaps
    Can determine intron/exon structure
    Repeats: parallel diagonals
    Inverted repeats: perpendicular diagonals
    Inverted repeats can be used to determine regions of base pairing of RNA molecules

  • Intra-sequence comparison

    Repeats
    Inverted repeats
    Low complexity

  • Examples

    ABRACADABRACAD

  • Palindrome

    Sequence: ATOYOTA

  • Repeats

    Drosophila melanogaster SLIT protein against itself

  • Low complexity

  • Inter-sequence comparison

    Conserved domains
    Insertion and deletion

  • Insertion and deletion

    Seq1: DOROTHYCROWFOOTHODGKIN
    Seq2: DOROTHYHODGKIN
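
    One way to see the insertion without drawing the plot is to list the long diagonal runs and compare their offsets. The sketch below is an illustration only; `diagonal_runs` and the `min_len` cutoff of 5 are assumptions made for this example.

```python
def diagonal_runs(seq1, seq2, min_len=5):
    """Return (start1, start2, length) for every maximal run of identical
    characters along a diagonal of the seq1-by-seq2 dot matrix."""
    runs = []
    for i in range(len(seq1)):
        for j in range(len(seq2)):
            if seq1[i] != seq2[j]:
                continue
            # Skip cells that only continue a run begun further up the diagonal.
            if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1]:
                continue
            length = 0
            while (i + length < len(seq1) and j + length < len(seq2)
                   and seq1[i + length] == seq2[j + length]):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


for start1, start2, length in diagonal_runs("DOROTHYCROWFOOTHODGKIN", "DOROTHYHODGKIN"):
    print(start1, start2, length, "offset", start1 - start2)
```

    The two runs (DOROTHY and HODGKIN) sit on diagonals whose offsets differ by 8, the length of the CROWFOOT segment present only in Seq1.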

  • Conserved domains

  • Translated DNA and protein comparison: exons and introns

  • Even more can be done with RNA

    RNA comparisons of the reverse complement of a sequence to itself can often be very informative. Consider the following set of examples from the phenylalanine transfer RNA (tRNA-Phe) molecule from baker's yeast. The sequence and structure of this molecule are also known; the illustration will show how simple dot-matrix procedures can quickly lead to functional and structural insights (even without complex folding algorithms).

  • Structures of tRNA-Phe

  • RNA comparisons of the reverse complement of a sequence to itself
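
    A rough sketch of the idea, not the procedure used for the tRNA-Phe figures: a diagonal run in the dot matrix of a sequence against its own reverse complement marks two stretches that could base-pair. The names `reverse_complement` and `candidate_stems`, the toy hairpin, and the restriction to Watson-Crick pairs (no G-U wobble) are all assumptions of this example.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse the sequence and swap each base for its Watson-Crick partner."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def candidate_stems(rna, min_len=3):
    """Return (i, j, length): bases i, i+1, ... can pair with bases j, j-1, ...
    for `length` consecutive positions (an antiparallel helix)."""
    rc = reverse_complement(rna)
    n = len(rna)
    stems = []
    for i in range(n):
        for k in range(n):
            if rna[i] != rc[k]:
                continue
            # Only start at the top of a diagonal run.
            if i > 0 and k > 0 and rna[i - 1] == rc[k - 1]:
                continue
            length = 0
            while i + length < n and k + length < n and rna[i + length] == rc[k + length]:
                length += 1
            j = n - 1 - k  # rc[k] is the complement of rna[n - 1 - k]
            if i >= j:
                continue  # mirror image of a stem already reported
            # Do not let the two strands run into each other.
            length = min(length, (j - i + 1) // 2)
            if length >= min_len:
                stems.append((i, j, length))
    return stems


# Toy hairpin (hypothetical sequence): GGGA pairs with UCCC around a short loop.
print(candidate_stems("GGGAAAUCCC"))
```

    Real RNA folding also needs a minimum loop size and energy terms; this sketch only reports where complementary stretches exist.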

  • Programs for Dot Matrix

    Dotlet: http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html
    SIGNAL: http://innovation.swmed.edu/research/informatics/res_inf_sig.html
    Dotter: http://www.cgb.ki.se/cgb/groups/sonnhammer/Dotter.html
    COMPARE, DOTPLOT in GCG

  • Conclusion

    Advantages: Readily reveals the presence of insertions/deletions and of direct and inverted repeats, which are more difficult to find with the other, more automated methods. It lets your eyes and brain do the work, which is very efficient.

    Disadvantages: Most dot matrix computer programs do not show an actual alignment, and they do not return a score to indicate how optimal a given alignment is.

  • References

    Gibbs, A. J. & McIntyre, G. A. (1970). The diagram, a method for comparing sequences. Its use with amino acid and nucleotide sequences. Eur. J. Biochem. 16, 1-11.
    Maizel, J. V., Jr. & Lenk, R. P. (1981). Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc. Natl. Acad. Sci. 78, 7665-7669.
    Staden, R. (1982). An interactive graphics program for comparing and aligning nucleic acid and amino acid sequences. Nucl. Acids Res. 10 (9), 2951-2961.

  • Dynamic Programming

    Questions to answer: What is the optimal alignment of two sequences (the one with the best score)? How many different alignments are there?

  • Alignment methods with DP

    Global alignment - Needleman-Wunsch (1970) maximizes the number of matches between the sequences along their entire length.

    Local alignment - Smith-Waterman (1981) is a modification of the dynamic programming algorithm that gives the highest-scoring local match between two sequences.
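
    To make the local variant concrete, here is a minimal score-only sketch in the spirit of Smith-Waterman. The scoring values (match +2, mismatch -1, gap -2) and the name `local_alignment_score` are assumptions for illustration, not taken from the slides. The essential change from global alignment is that a cell is never allowed to drop below zero, and the answer is the best cell anywhere in the table.

```python
def local_alignment_score(x, y, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between x and y (score only, no traceback)."""
    best = 0
    prev = [0] * (len(y) + 1)  # row i-1 of the table; first row is all zeros
    for i in range(1, len(x) + 1):
        curr = [0]  # first column is zero: a local alignment may start anywhere
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cell = max(0,                   # start a fresh local alignment here
                       prev[j - 1] + s,     # extend along the diagonal
                       prev[j] + gap,       # gap in y
                       curr[j - 1] + gap)   # gap in x
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(local_alignment_score("TTACGTT", "GGACGGG"))
```

    In this example the shared core ACG is found even though the flanking characters do not match at all, which a global alignment would have to pay for.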

  • Dynamic Programming

    A simple example

  • Exercise

  • Dynamic Programming

  • DP Algorithm for Global Alignment

    Given two sequences $X = x_1 \dots x_n$ and $Y = y_1 \dots y_m$, let $F(i, j)$ be the optimal alignment score of $X_{1 \dots i}$ and $Y_{1 \dots j}$ ($0 \le i \le n$, $0 \le j \le m$).

  • DP in equation form
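
    The equation on this slide did not survive in the transcript. What follows is the standard Needleman-Wunsch recurrence with a linear gap penalty, written under two assumptions: $s(a, b)$ denotes the substitution score for aligning characters $a$ and $b$, and the gap penalty $d$ is a negative number that is added, matching the example's $d = -5$.

```latex
\begin{aligned}
F(0, 0) &= 0, \qquad F(i, 0) = i\,d, \qquad F(0, j) = j\,d, \\
F(i, j) &= \max
\begin{cases}
F(i-1,\, j-1) + s(x_i, y_j) & \text{(align } x_i \text{ with } y_j\text{)} \\
F(i-1,\, j) + d & \text{(align } x_i \text{ with a gap)} \\
F(i,\, j-1) + d & \text{(align } y_j \text{ with a gap)}
\end{cases}
\end{aligned}
```

    The optimal global alignment score is then $F(n, m)$.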

  • A simple example

    Find the optimal alignment of AAG and AGC. Use a gap penalty of d = -5.

    Scoring matrix:

            A    C    G    T
      A     2   -7   -5   -7
      C    -7    2   -7   -5
      G    -5   -7    2   -7
      T    -7   -5   -7    2

    The table has AAG along the top and AGC along the side. The upper left cell is set to 0, and the first row and first column are filled with multiples of the gap penalty:

                 A    A    G
            0   -5  -10  -15
      A    -5
      G   -10
      C   -15

    The completed table:

                 A    A    G
            0   -5  -10  -15
      A    -5    2   -3   -8
      G   -10   -3   -3   -1
      C   -15   -8   -8   -6
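
    The table for this example can be reproduced with a short script. This is a sketch only: the names `SCORES` and `fill_table` are made up here, while the substitution scores and the gap penalty are the ones given in the example.

```python
BASES = "ACGT"
ROWS = [(2, -7, -5, -7),
        (-7, 2, -7, -5),
        (-5, -7, 2, -7),
        (-7, -5, -7, 2)]
# SCORES["A"]["G"] is the score for aligning A with G, and so on.
SCORES = {a: dict(zip(BASES, row)) for a, row in zip(BASES, ROWS)}


def fill_table(top, side, gap=-5):
    """Fill the global-alignment table; rows follow `side`, columns follow `top`."""
    F = [[0] * (len(top) + 1) for _ in range(len(side) + 1)]
    for j in range(1, len(top) + 1):
        F[0][j] = j * gap  # first row: leading gaps only
    for i in range(1, len(side) + 1):
        F[i][0] = i * gap  # first column: leading gaps only
    for i in range(1, len(side) + 1):
        for j in range(1, len(top) + 1):
            F[i][j] = max(F[i - 1][j - 1] + SCORES[side[i - 1]][top[j - 1]],  # diagonal
                          F[i - 1][j] + gap,                                  # from above
                          F[i][j - 1] + gap)                                  # from the left
    return F


for row in fill_table("AAG", "AGC"):
    print(" ".join(f"{value:4d}" for value in row))
```

    The lower right cell, -6, is the score of the optimal global alignment.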

  • Traceback

    Start from the lower right corner and trace back to the upper left.
    Each arrow introduces one character at the end of each aligned sequence.
    A horizontal move puts a gap in the left sequence.
    A vertical move puts a gap in the top sequence.
    A diagonal move uses one character from each sequence.
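
    The traceback rules can be sketched as follows for the AAG / AGC example. Instead of storing arrows, this version rechecks which neighbouring cell could have produced each score. The tie-breaking order (diagonal, then vertical, then horizontal) is an assumption of this sketch; when several moves tie, other equally good alignments exist. The names `SCORES` and `align` are hypothetical.

```python
BASES = "ACGT"
ROWS = [(2, -7, -5, -7), (-7, 2, -7, -5), (-5, -7, 2, -7), (-7, -5, -7, 2)]
SCORES = {a: dict(zip(BASES, row)) for a, row in zip(BASES, ROWS)}


def align(top, side, gap=-5):
    """Return (aligned top, aligned side, score) for one optimal global alignment."""
    n, m = len(side), len(top)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        F[i][0] = i * gap
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + SCORES[side[i - 1]][top[j - 1]],
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    top_aln, side_aln = [], []
    i, j = n, m  # start in the lower right corner
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + SCORES[side[i - 1]][top[j - 1]]:
            # Diagonal move: one character from each sequence.
            top_aln.append(top[j - 1]); side_aln.append(side[i - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            # Vertical move: gap in the top sequence.
            top_aln.append("-"); side_aln.append(side[i - 1])
            i -= 1
        else:
            # Horizontal move: gap in the left (side) sequence.
            top_aln.append(top[j - 1]); side_aln.append("-")
            j -= 1
    # Characters were collected from the end, so reverse them.
    return "".join(reversed(top_aln)), "".join(reversed(side_aln)), F[n][m]


top_aln, side_aln, score = align("AAG", "AGC")
print(top_aln)
print(side_aln)
print("score:", score)
```

    With this tie-breaking the sketch reports AAG- over -AGC; the alignment AAG- over A-GC has the same score of -6.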

