tema 13. sequence comparison. concept of homology. sequence alignment. comparison strategies. blast,...


Post on 18-Dec-2015






  • Slide 1
  • Tema 13. Sequence comparison. Concept of homology. Sequence alignment. Comparison strategies. BLAST, PSI-BLAST. Multiple alignment, profiles. Families of proteins. Functional prediction based on sequence. Gabriel Pons, Departament de Ciències Fisiològiques II, Campus de Ciències de la Salut, Bellvitge, Universitat de Barcelona
  • Slide 2
  • Sequence comparison
  • Slide 3
  • Goals: to take advantage of functional or structural information by identifying homologies between sequences. Differences between homology and identity. Two sequences are homologous when they have the same evolutionary origin; as a result, they usually have similar function and structure.
  • Slide 4
  • Homologous sequences: sequences that share a common evolutionary ancestry. Similar sequences: sequences that have a high percentage of aligned residues with similar physicochemical properties (e.g., size, hydrophobicity, charge). IMPORTANT:
      • Sequence homology: an inference about a common ancestral relationship, drawn when two sequences share a high enough degree of sequence similarity. Homology is qualitative.
      • Sequence similarity: the direct result of observation from a sequence alignment. Similarity is quantitative and can be described using percentages.
  • Slide 5
  • More definitions. Orthologs: sequences that correspond exactly to the same function/structure in different species. Paralogs: sequences produced by gene duplication within the same organism. Duplication usually involves a change in function, although a functional relationship is often retained.
  • Slide 6
  • Homology
  • Slide 7
  • Homology and prediction. Very divergent protein sequences may support similar structures. Similar protein structures will probably have related or similar functions.
  • Slide 8
  • 3D structure versus sequence. Sequence alignment between human myoglobin and the α- and β-globins of hemoglobin
  • Slide 9
  • Comparison of the 3D structures of human myoglobin and the α- and β-globins of hemoglobin (panels: myoglobin, α-globin, β-globin)
  • Slide 10
  • Superposition of 3D structures of human myoglobin and globin from hemoglobin
  • Slide 11
  • Homology and prediction. Sequence comparison is the simplest method for identifying homology between sequences.
      • Identity > 30% in proteins implies homology (> 65% for nucleic acids).
      • Identity > 80-90% is usual in orthologs from closely related species.
      • Identity of 10-30%: homology, if present, may not be detectable (twilight zone).
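The rule-of-thumb thresholds on this slide can be written as a small lookup. This is only a sketch: the function name `interpret_identity` and the returned labels are illustrative assumptions, and the cut-offs are the slide's approximate values, not sharp boundaries.

```python
def interpret_identity(percent, molecule="protein"):
    """Map a percent identity to the rough interpretation given on the slide."""
    # Slide thresholds: > 30% for proteins, > 65% for nucleic acids.
    threshold = 30.0 if molecule == "protein" else 65.0
    if percent > threshold:
        return "homology likely"
    # The slide defines the 10-30% twilight zone for proteins only.
    if molecule == "protein" and percent >= 10.0:
        return "twilight zone: homology may exist but is hard to detect"
    return "identity alone gives no support for homology"


print(interpret_identity(85))         # well above the protein threshold
print(interpret_identity(22))         # inside the protein twilight zone
print(interpret_identity(45, "dna"))  # below the nucleic acid threshold
```

A percentage above the threshold is evidence, not proof: the interpretation still depends on the length and quality of the alignment that produced it.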
  • Slide 12
  • "No me gusta la bioinformatica" ("I don't like bioinformatics") versus "Teme usted la ionosfera optica" ("Do you fear the optical ionosphere?"):
      Nomegusta-labioin-forma--tica
      Teme-ustedla-ionosfer-aoptica
    64% identity? But the two sentences have unrelated meanings.
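The 64% figure can be reproduced by counting matching columns in the gapped rows. The sketch below assumes that identity is normalized by the length of the shorter ungapped sequence, which is the convention that yields the slide's number; dividing by the alignment length would give a lower value. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (matches, percent identity) for two gapped rows of equal length.

    Assumption: the percentage is relative to the shorter ungapped sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned rows must have the same length")
    # Count columns where both rows carry the same letter (case-insensitive).
    matches = sum(
        1
        for x, y in zip(aligned_a.lower(), aligned_b.lower())
        if x == y and x != gap
    )
    shorter = min(
        len(aligned_a.replace(gap, "")),
        len(aligned_b.replace(gap, "")),
    )
    return matches, 100.0 * matches / shorter


row_a = "Nomegusta-labioin-forma--tica"
row_b = "Teme-ustedla-ionosfer-aoptica"
matches, identity = percent_identity(row_a, row_b)
print(matches, identity)  # 16 matching columns over 25 letters -> 64.0
```

With enough gaps, two unrelated strings can reach a high identity, which is why a percentage on its own does not establish homology.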
  • Slide 13
  • DNA or protein? Both give information about homology. Protein: there is functional equivalence between amino acids.
  • Slide 14
  • DNA: only identity is relevant. Mismatches do not have a variable cost: usually no substitution is better than another. Canonical base pairing (Watson-Crick).
  • Slide 15
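  • Genetic code (rows: position 1; within each row, position 2 runs U, C, A, G; N = any base in position 3):
      U: UUU/UUC Phe, UUA/UUG Leu; UCN Ser; UAU/UAC Tyr, UAA/UAG Stop; UGU/UGC Cys, UGA Stop, UGG Trp
      C: CUN Leu; CCN Pro; CAU/CAC His, CAA/CAG Gln; CGN Arg
      A: AUU/AUC/AUA Ile, AUG Met; ACN Thr; AAU/AAC Asn, AAA/AAG Lys; AGU/AGC Ser, AGA/AGG Arg
      G: GUN Val; GCN Ala; GAU/GAC Asp, GAA/GAG Glu; GGN Gly
    Codons per amino acid: Trp, Met (1); Leu, Ser, Arg (6); Ile (3); Val, Pro, Thr, Ala, Gly (4); others (2). Initiation: AUG. Stop codons (3). Third-base degeneracy: XYC = XYU; XYA ≈ XYG.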
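The third-base degeneracy rule can be checked directly against the standard genetic code. The sketch below builds the codon table from a compact 64-letter string in U, C, A, G order (one-letter amino acid codes, `*` for stop); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

prefixes = ["".join(p) for p in product(BASES, repeat=2)]

# XYC = XYU: prefixes where U and C in position 3 encode different residues.
pyrimidine_exceptions = [
    xy for xy in prefixes if CODON_TABLE[xy + "U"] != CODON_TABLE[xy + "C"]
]
# XYA ~ XYG: prefixes where A and G in position 3 encode different residues.
purine_exceptions = [
    xy for xy in prefixes if CODON_TABLE[xy + "A"] != CODON_TABLE[xy + "G"]
]
codons_per_residue = Counter(CODON_TABLE.values())

print(pyrimidine_exceptions)  # [] : the rule holds for every prefix
print(purine_exceptions)      # ['UG', 'AU'] : UGA/UGG and AUA/AUG differ
print(codons_per_residue["L"], codons_per_residue["W"], codons_per_residue["*"])
```

The two purine exceptions are exactly the cases that give Trp and Met their single codons.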
  • Slide 16
  • Equivalent amino acids
      • Hydrophobic: Ala (A), Val (V), Met (M), Leu (L), Ile (I), Phe (F), Trp (W), Tyr (Y)
      • Small: Gly (G), Ala (A), Ser (S)
      • Polar: Ser (S), Thr (T), Asn (N), Gln (Q), Tyr (Y). On the protein surface, polar and charged residues are equivalent.
      • Charged: Asp (D), Glu (E) / Lys (K), Arg (R)
      • Difficult to substitute: Gly (G), Pro (P), Cys (C), His (H)
    BE CAREFUL: amino acids do not always perform the same function in proteins.
  • Slide 17
  • 3D visualization of some conserved residues in the globin family (myoglobin structure): histidine for the heme coordination bonds; proline in a turn; 2 conserved glycines where 2 separate helices cross each other.
  • Slide 18
  • DNA sequence diverges more quickly than protein sequence. Mutation or recombination may alter the DNA, but function/structure must be maintained. Protein sequence comparison makes it possible to find and locate very distant homologous proteins.
  • Slide 19
  • Sequence alignment. Measuring the degree of similarity/identity, and thus inferring the existence of homology, requires an alignment.
      Strong identity/similarity:
        AWTRRATVHDGLMEDEFAA
        AWTRRATVHDGLCEDEFAA
      Weak identity/similarity:
        AWTKLATAVVVFEGLCEDEWGG
        AWTRRAT---VHDGLMEDEFAA
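The difference between identity and similarity in these two alignments can be counted column by column. The sketch below treats two residues as similar when they share one of the groups from the "Equivalent amino acids" slide (hydrophobic, small, polar, negatively charged, positively charged). Using those groups as a yes/no similarity test is a simplifying assumption, and the function names are hypothetical.

```python
# Residue groups taken from the "Equivalent amino acids" slide.
GROUPS = [
    set("AVMLIFWY"),  # hydrophobic
    set("GAS"),       # small
    set("STNQY"),     # polar
    set("DE"),        # negatively charged
    set("KR"),        # positively charged
]


def similar(x, y):
    """True if the residues are identical or share at least one group."""
    return x == y or any(x in group and y in group for group in GROUPS)


def identity_and_similarity(row_a, row_b, gap="-"):
    """Return (identical, similar_or_identical, compared_columns)."""
    # Columns containing a gap are skipped.
    columns = [(x, y) for x, y in zip(row_a, row_b) if gap not in (x, y)]
    identical = sum(x == y for x, y in columns)
    alike = sum(similar(x, y) for x, y in columns)
    return identical, alike, len(columns)


strong = identity_and_similarity("AWTRRATVHDGLMEDEFAA", "AWTRRATVHDGLCEDEFAA")
weak = identity_and_similarity("AWTKLATAVVVFEGLCEDEWGG", "AWTRRAT---VHDGLMEDEFAA")
print(strong)  # (18, 18, 19)
print(weak)    # (11, 16, 19)
```

In the weak example, five of the non-identical columns (K/R, E/D, W/F and two G/A pairs) still count as similar, which is the signal that identity alone would miss.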
  • Slide 20
  • Alignments
      • Pairwise: 2 sequences
      • Multiple: more than 2 sequences
      • Global: the whole sequence is considered
      • Local: only similar regions are aligned
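The practical difference between global and local alignment shows up in the score. The sketch below uses the classic dynamic-programming formulations, Needleman-Wunsch for global and Smith-Waterman for local, which the slides do not name; the match, mismatch and linear gap values are arbitrary assumptions chosen for illustration.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best global (Needleman-Wunsch) or local (Smith-Waterman) score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                # Local alignment: a region may restart at zero anywhere.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


motif = "GLMEDEF"
embedded = "KKKKGLMEDEFKKKK"
print(alignment_score(motif, embedded, local=True))   # 7
print(alignment_score(motif, embedded, local=False))  # -9
```

The local score rewards the shared region only, while the global score also pays for the eight unmatched flanking residues. This is why database searches favour local alignment.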
  • Slide 21
  • Strategies: the choice depends on the goal.
      • Sequence comparison. Goal: establish homology and identify equivalent amino acids. Global, pairwise/multiple.
      • Database search. Goal: identify homologous proteins in a large set of sequences. Local, pairwise.
  • Slide 22
  • Automatic alignment requires:
      • An objective method to compare amino acids or bases in order to score the alignment (comparison matrix)
      • An algorithm to find the alignment with the maximal score
    It is quick and easy to reproduce, but in general it does not allow additional information to be introduced.
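The two requirements, a comparison matrix and an algorithm that maximizes the score, can be kept separate in code. The following is a minimal sketch of a global alignment with traceback (Needleman-Wunsch, chosen here as a standard example and not named on the slide) that accepts any scoring function. The names `global_align` and `identity_score` and the linear gap penalty are assumptions.

```python
def global_align(a, b, score, gap=-4):
    """Global alignment with traceback; `score(x, y)` plays the comparison matrix."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                table[i - 1][j] + gap,
                table[i][j - 1] + gap,
            )
    # Traceback from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and table[i][j] == table[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return table[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


def identity_score(x, y):
    """Simplest comparison matrix: +1 for identical symbols, -1 otherwise."""
    return 1 if x == y else -1


best, top, bottom = global_align("GATTACA", "GATACA", identity_score, gap=-2)
print(best)
print(top)
print(bottom)
```

Because the scoring function is a parameter, swapping the identity matrix for a physico-chemical or evolutionary matrix changes the result without touching the algorithm.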
  • Slide 23
  • Matrix types: identity; physico-chemical properties; genetics (codon substitution); evolution.
  • Slide 24
  • Slide 25
  • BLOSUM62
      • Small positive score for changes between similar amino acids
      • Small positive score for common amino acids
      • Infrequent amino acids have a high score
      • High penalty for very different amino acids
      • Same score regardless of position!
  • Slide 26
  • Choice of a matrix! From closely related sequences (rat versus mouse protein) to distant ones (rat versus bacterial protein): BLOSUM90 / PAM30; BLOSUM80 / PAM120; BLOSUM62 / PAM180; BLOSUM45 / PAM240.
  • Slide 27
  • Query length | Substitution matrix | Gap costs

