![Page 1: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/1.jpg)
Pairwise Alignment
![Page 2: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/2.jpg)
Sequences are related.
Phylogenetic tree of globin-type proteins found in humans
![Page 3: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/3.jpg)
Definition of pairwise alignment
The process of lining up two sequences to achieve maximal levels of identity (or similarity, in the case of amino acid sequences).
![Page 4: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/4.jpg)
What for? A few examples:
• Determining whether two sequences from two entries found by a keyword search are similar or identical
• Focusing on differences (genes sequenced in different labs, alternative splicing, SNPs, mutations)
• Finding similar (conserved) regions in two sequences
• More…
![Page 5: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/5.jpg)
How do we align two sequences?
ATTGCAGTGATCG
ATTGCGTCGATCG
Solution 1
ATTGCAGTGATCG
|||||   |||||
ATTGCGTCGATCG
10 matches |, 3 mismatches

Solution 2
ATTGCAGT-GATCG
||||| || |||||
ATTGC-GTCGATCG
12 matches |, 2 gaps -
![Page 6: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/6.jpg)
Which alignment is better?
Solution 1
ATTGCAGTGATCG
|||||   |||||
ATTGCGTCGATCG
10 matches, 3 mismatches

Solution 2
ATTGCAGT-GATCG
||||| || |||||
ATTGC-GTCGATCG
12 matches, 2 gaps

We will use a scoring scheme:
              Scheme A   Scheme B
Match            +1         +1
Mismatch         -1          0
Indel (gap)      -2         -2

Scheme A: Solution 1 = 10×1 + 3×(-1) = 7; Solution 2 = 12×1 + 2×(-2) = 8
Scheme B: Solution 1 = 10×1 + 3×0 = 10; Solution 2 = 12×1 + 2×(-2) = 8
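The arithmetic on this slide can be checked with a few lines of code. The sketch below is only an illustration: the function name `score_alignment` and its parameter names are assumptions, not part of the lecture. It scores an already-aligned pair column by column under a match/mismatch/gap scheme.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two already-aligned sequences of equal length, column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # indel column
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # substitution
    return total


# The two candidate alignments from the slide.
solution_1 = ("ATTGCAGTGATCG", "ATTGCGTCGATCG")
solution_2 = ("ATTGCAGT-GATCG", "ATTGC-GTCGATCG")

# Mismatch = -1: the gapped alignment wins (7 vs. 8).
print(score_alignment(*solution_1), score_alignment(*solution_2))
# Mismatch = 0: the ungapped alignment wins (10 vs. 8).
print(score_alignment(*solution_1, mismatch=0), score_alignment(*solution_2, mismatch=0))
```

Changing a single parameter reverses which alignment is preferred, which is the point the slide makes.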
![Page 7: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/7.jpg)
Changing the values in the scoring scheme can change the final score of a given aligned segment, and therefore which alignment is ranked best.
So how do we determine our scoring schemes?
![Page 8: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/8.jpg)
The mechanistic rationale
What happens during DNA synthesis?
![Page 9: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/9.jpg)
Biological causes of mismatches
Accumulation of mutations in a segment of the sequence that is less crucial for function can create a stretch of mismatches. (Any residue can be subject to back mutations.)
Very common.

Original sequence  ATTGCAGTGATCG
                   |||||   |||||
Emerging sequence  ATTGCGTCGATCG

Original sequence  ATTGCAGTGATCG
                   ||||| | |||||
Emerging sequence  ATTGCGGCGATCG
May reflect 2 or 4 independent mutations
![Page 10: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/10.jpg)
Biological causes of gaps (indel: insertion / deletion)
• A single mutation can create a gap.
• Unequal crossover in meiosis can lead to insertion or deletion of strings of bases.
• DNA slippage during replication can result in the repetition of a string.
•Retrovirus insertions.
•Translocations of DNA between chromosomes.
Less common than events leading to single mutations
Are all gaps equal?
![Page 11: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/11.jpg)
Consider the following pair of sequences:
A sequence with a short gap:
ATCTTCAGTGTTTCCCCTGTTTTGCCC.ATTTAGTTCGCTC
||||||||||||||||||||||||||| |||||||||||||
ATCTTCAGTGTTTCCCCTGTTTTGCCCGATTTAGTTCGCTC
A sequence with a long gap:
ATCTTCAGTGTTTCCCCTGTTTTGCCC....................ATTTAGTTCGCTC
|||||||||||||||||||||||||||                    |||||||||||||
ATCTTCAGTGTTTCCCCTGTTTTGCCCGXXXXXXXXXXXXXXXXXXXATTTAGTTCGCTC
Two options for gap scoring
• Keep the penalty the same regardless of gap length: use a zero gap extension penalty and penalize only when a gap is opened.
• Make the penalty grow as a linear function of gap length: add a gap extension penalty. With a purely linear penalty (no separate opening cost), several small gaps are penalized to the same extent as one large gap of the same total length.
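A small sketch may help compare the two options. It assumes one common convention, in which each gap pays an opening cost once plus an extension cost per gapped position. The function name `gap_cost` and the numeric penalties are illustrative assumptions, and real tools differ on whether the first position is counted in the opening cost.

```python
def gap_cost(gap_lengths, gap_open, gap_extend):
    """Total penalty for a list of gaps: each gap pays gap_open once,
    plus gap_extend for every position it spans (one common convention)."""
    return sum(gap_open + gap_extend * length for length in gap_lengths)


# Option 1: opening penalty only, so a 1-base gap and a 20-base gap cost the same.
print(gap_cost([1], gap_open=2, gap_extend=0), gap_cost([20], gap_open=2, gap_extend=0))

# Option 2: purely linear, so twenty 1-base gaps cost the same as one 20-base gap.
print(gap_cost([1] * 20, gap_open=0, gap_extend=2), gap_cost([20], gap_open=0, gap_extend=2))

# Combining both (affine): one long gap is cheaper than two shorter ones.
print(gap_cost([20], gap_open=10, gap_extend=1), gap_cost([10, 10], gap_open=10, gap_extend=1))
```

The combined form is shown only to illustrate how opening and extension penalties interact. The slide itself presents them as two separate options.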
![Page 12: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/12.jpg)
Gap penalties can penalize for:
•Gap opening
•Gap extension
•Gap ending (ClustalW – multiple alignment)
•Gap separation (minimum distance between 2 gaps) [ClustalW]
![Page 13: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/13.jpg)
What happens to the alignment if we change the gap penalties?
Gap opening
Gap extension
![Page 14: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/14.jpg)
How will global alignment be affected by:
• High penalties for gap opening
• High penalties for gap extension
Will local alignment be affected in the same way / to the same extent?
![Page 15: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/15.jpg)
When comparing nucleotide or amino acid sequences, the comparison score is given by the carrot-and-stick method.

ATTGCAGTGATCG
|||||   |||||
ATTGCGTCGATCG

ATTGCAGT-GATCG
||||| || |||||
ATTGC-GTCGATCG

Reward: matches (|)
Penalties: mismatches; gaps (-), with gap opening and gap extension penalties
Permissions: minimal space between two gaps
So far, when nucleotide sequences were considered, all mismatches received the same (negative) score.
![Page 16: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/16.jpg)
Ex: Pairwise alignments
43.2% identity; Global alignment score: 374

               10        20        30        40              50
alpha V-LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSA
      : :.: .:. : : ::::  .. : :.::: :... .: :. .:  : :::      :.
beta  VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNP
              10          20        30        40        50

           60        70        80        90       100       110
alpha QVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHL
      .::.:::::  :.....::.:.. .....::.::  ::.::: ::.::.. :. .:: :.
beta  KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF
      60        70        80        90       100       110

          120       130       140
alpha PAEFTPAVHASLDKFLASVSTVLTSKYR
        :::: :.:. .: .:.:...:. ::.
beta  GKEFTPPVQAAYQKVVAGVANALAHKYH
     120       130       140
![Page 17: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/17.jpg)
Pairwise alignment
Percent identity is not a good measure of alignment quality
100.000% identity in 3 aa overlap
SPA
:::
SPA
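To make the point concrete, here is a minimal sketch that computes percent identity for the alpha/beta globin alignment shown earlier and for the 3-residue overlap. The function name `percent_identity` is an assumption, and the denominator used here (the full alignment length, gap columns included) is only one of several conventions in use. It is the one that reproduces the 43.2% figure.

```python
def percent_identity(top, bottom):
    """Identical columns as a percentage of the alignment length (gaps included)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * identical / len(top)


# Aligned alpha and beta globin chains, copied from the example alignment.
alpha = ("V-LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSA"
         "QVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHL"
         "PAEFTPAVHASLDKFLASVSTVLTSKYR")
beta = ("VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNP"
        "KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF"
        "GKEFTPPVQAAYQKVVAGVANALAHKYH")

print(round(percent_identity(alpha, beta), 1))   # 64 identities over 148 columns
print(percent_identity("SPA", "SPA"))            # 3 identities over 3 columns
```

The short overlap reaches 100% while saying far less about relatedness than the 148-column alignment, so the length of the aligned region has to be considered together with the percentage.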
![Page 18: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/18.jpg)
Pairwise alignments: alignment score
43.2% identity; Global alignment score: 374
![Page 19: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/19.jpg)
Global alignment
An alignment that assumes that the two proteins are basically similar over the entire length of one another. The alignment attempts to match them to each other from end to end, even though parts of the alignment are not very convincing.
A short example
NLGPSTKDFGKISESREFDNQ
 |      ||||    |
QLNQLERSFGKINMRLEDALV
![Page 20: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/20.jpg)
Local alignment
An alignment that searches for segments of the two sequences that match well. There is no attempt to force entire sequences into an alignment, just those parts that appear to have good similarity, according to some criterion.
Using sequences similar to those above, one could get:
NLGPSTKDDFGKILGPSTKDDQ
         ||||
QNQLERSSNFGKINQLERSSNN
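The contrast between the two modes can be sketched with the standard dynamic-programming recurrences. The slides do not name the algorithms; the code below assumes the usual choices (Needleman-Wunsch for global, Smith-Waterman for local) with a simple linear gap penalty. The function names, the default scores, and the two toy test sequences are illustrative assumptions.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best end-to-end alignment score (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]   # first row: b aligned to gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]                          # first column: a aligned to gaps
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,    # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best-scoring matching segment pair (Smith-Waterman, linear gap penalty)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(0,                         # start a fresh local alignment
                       prev[j - 1] + sub,
                       prev[j] + gap,
                       curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Hypothetical toy pair sharing only the core FGKI.
x, y = "AAAFGKITTT", "CCCFGKIGGG"
print(global_score(x, y))  # forced end-to-end: the unrelated flanks pull the score down
print(local_score(x, y))   # only the shared core FGKI is scored
```

The only structural differences are the zero floor and taking the maximum over all cells, which is what lets local alignment ignore poorly matching flanks.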
![Page 21: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/21.jpg)
[Figure: applying global vs. local alignment. Labels: "Global a.", "Local a.", "Few mismatches", "Several mismatches"]
![Page 22: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/22.jpg)
If two proteins share more than one common region, for example one has a single copy of a
particular domain while the other has two copies, it may be possible to "miss" one of the
two copies if using local alignment, which presents only the best scoring alignment.
Emboss [best solution] vs. Lalign (Embnet) [several solutions]
![Page 23: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/23.jpg)
Pairwise alignments: conservative substitutions
43.2% identity; Global alignment score: 374
In the alpha/beta globin alignment shown earlier, ':' marks identical residues and '.' marks conservative substitutions.
![Page 24: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/24.jpg)
However, in the case of amino acids, not all matches are equal, and not all mismatches are equal!
![Page 25: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/25.jpg)
Amino acid properties
Serine (S) and threonine (T) have similar physicochemical properties
Aspartic acid (D) and glutamic acid (E) have similar properties
=> Substitution of S/T or E/D occurs relatively often during evolution
=> Substitution of S/T or E/D should result in scores that are only moderately lower than identities
![Page 26: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/26.jpg)
All Amino Acids Are Equal…
Non-polar, hydrophobic
All other amino acids are polar, hydrophilic:
Acidic
Basic
![Page 27: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/27.jpg)
http://teachline.ls.huji.ac.il/~72332/mouse/aa-properties.html
Each amino acid is characterized by a combination of features (size, charge, etc.).
The relative importance of each feature may vary according to the amino acid's role in the 3-D structure and function of the protein.
So how can we score matches and mismatches?
![Page 28: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/28.jpg)
To that end, amino acid substitution matrices were developed (BLOSUM, PAM).
![Page 29: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/29.jpg)
Amino Acid Substitution Matrices
The PAM and BLOSUM substitution matrices describe the likelihood that two residue types would be substituted for one another.
These matrices are based on biological sequence information: the substitutions observed in structural (BLOSUM) or evolutionary (PAM) alignments of well-studied protein families.
These scoring systems have a probabilistic foundation.
![Page 30: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/30.jpg)
PAM series - Percent Accepted Mutation (accepted by natural selection)
• All the PAM data come from alignments of closely related proteins (>85% amino acid identity) from 71 protein families (total of 1572 protein sequences).
• PAM matrices are based on global sequence alignments; these include both highly conserved and highly mutable regions.
Some of the protein families are:
Ig kappa chain
Kappa casein
Lactalbumin
Hemoglobin
Myoglobin
Insulin
Histone H4
Ubiquitin
![Page 31: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/31.jpg)
PAM series - Percent Accepted Mutation (accepted by natural selection)
*Varying degrees of conservation
![Page 32: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/32.jpg)
• The PAM250 matrix is appropriate for searching for alignments of sequences that have diverged by 250 PAMs, i.e. 250 mutations per 100 amino acids of sequence.
• Because of back mutations and silent mutations, this corresponds to sequences that are about 20 percent identical.
Smaller PAM number: less divergence between the compared sequences
Better suited for more conserved sequences
PAM1: 99% identity in sequences
Various degrees of conservation
![Page 33: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/33.jpg)
Various degrees of conservation
PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence. At an evolutionary interval of PAM1, one change has occurred over a length of 100 amino acids.
Other PAM matrices are extrapolated from PAM1. For PAM250, 250 changes have occurred for two proteins over a length of 100 amino acids.
All the PAM data come from closely related proteins (>85% amino acid identity).
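The extrapolation step can be illustrated by repeated matrix multiplication of a one-step mutation probability matrix. The sketch below uses a hypothetical 4-letter toy alphabet with uniform mutation probabilities, not the real 20×20 Dayhoff data. All names and numbers in it are assumptions chosen to show the mechanics only, and the actual scoring matrices are then derived from such probabilities as log-odds scores.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a positive integer power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical toy "PAM1": each letter is unchanged with probability 0.99 and
# mutates to each of the other three letters with probability 0.01 / 3.
STAY = 0.99
toy_pam1 = [[STAY if i == j else (1 - STAY) / 3 for j in range(4)] for i in range(4)]

toy_pam250 = mat_pow(toy_pam1, 250)

# Probability that a position shows the same letter after 1 step and after 250 steps.
print(toy_pam1[0][0], round(toy_pam250[0][0], 3))
```

Even after 250 steps a noticeable fraction of positions still shows the original letter, partly because of back mutations. This is the effect behind the slide's remark that 250 PAMs does not mean 0% identity.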
![Page 34: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/34.jpg)
BLOSUM series - Blocks Substitution Matrix. (Henikoff S. & Henikoff JG., PNAS, 1992)
A substitution matrix based on alignments in the BLOCKS database: conserved regions (blocks) of
• Families of proteins
• Family members have identical biochemical functions, and show common motifs
• Common blocks of local alignment not containing gaps
The BLOCKS database contains thousands of groups of multiple sequence alignments. Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.
![Page 35: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/35.jpg)
Extracting probabilities from Blocks- example
A A C D A A
A A A D C R
D R C G N A
A N C N A R
C R K D A N
A A K N C R

Substitutions counted in column 1:
AA, AD, AA, AC, AA, AD, AA, AC, AA, DA, DC, DA, AC, AA, CA

6 AA (P(AA) = 6/15)
4 AD (P(AD) = 4/15)
4 AC
1 DC
…
Statistics of substitutions and log-odds computation as described for PAM.
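The pair counting for column 1 can be reproduced in a few lines. This sketch covers the counting step only (not the log-odds computation), and the function name `column_pair_counts` is an assumption. Pairs are stored with their letters sorted, so "DA" and "AD" are counted together and "DC" appears as "CD".

```python
from collections import Counter
from itertools import combinations

# The six-sequence block from the slide, one row per sequence.
block = ["AACDAA", "AAADCR", "DRCGNA", "ANCNAR", "CRKDAN", "AAKNCR"]


def column_pair_counts(rows, col):
    """Count all unordered residue pairs observed in one column of a block."""
    residues = [row[col] for row in rows]
    return Counter("".join(sorted(pair)) for pair in combinations(residues, 2))


counts = column_pair_counts(block, 0)
total = sum(counts.values())
for pair, n in sorted(counts.items()):
    print(pair, n, f"{n}/{total}")
```

With six sequences there are 6·5/2 = 15 pairs per column, which matches the 15 substitutions listed on the slide.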
![Page 36: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/36.jpg)
Each matrix is tailored to a particular evolutionary distance. In the BLOSUM62 matrix, for example, the alignment from which scores were derived was created using sequences sharing no more than 62% identity. Sequences that are more than 62% identical are represented by a single sequence in the alignment, so as to avoid over-weighting closely related family members.
![Page 37: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/37.jpg)
Blosum62 scoring matrix
A  4
R -1  5
N -2  0  6
D -2 -2  1  6
C  0 -3 -3 -3  9
Q -1  1  0  0 -3  5
E -1  0  0  2 -4  2  5
G  0 -2  0 -1 -3 -2 -2  6
H -2  0  1 -1 -3  0  0 -2  8
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4
K -1  2  0 -1 -1  1  1 -2 -1 -3 -2  5
M -1 -2 -2 -3 -1  0 -2 -3 -2  1  2 -1  5
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
![Page 38: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/38.jpg)
Using an amino acid substitution matrix
[Figure: an example alignment scored with the matrix, with columns labeled "match" and "mismatch"]
Notice that matches and mismatches don't all have the same values.
Gap penalties (not included in this example) are treated as previously described.
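As a rough illustration of scoring with a substitution matrix, the sketch below sums per-column scores for an ungapped alignment. It uses only a handful of entries copied from the BLOSUM62 matrix on the previous slide. The dictionary name, the function names, and the short example peptides are assumptions made for the illustration.

```python
# A few BLOSUM62 entries copied from the matrix on the previous slide.
# The matrix is symmetric, so each pair is stored once.
BLOSUM62_SUBSET = {
    ("S", "S"): 4, ("T", "T"): 5, ("S", "T"): 1,
    ("D", "D"): 6, ("E", "E"): 5, ("D", "E"): 2,
    ("W", "W"): 11, ("D", "W"): -4,
}


def pair_score(x, y):
    """Look up a residue pair in either order; raises KeyError if not in the subset."""
    key = (x, y) if (x, y) in BLOSUM62_SUBSET else (y, x)
    return BLOSUM62_SUBSET[key]


def ungapped_score(top, bottom):
    """Sum the substitution scores of an ungapped alignment, column by column."""
    return sum(pair_score(x, y) for x, y in zip(top, bottom))


print(ungapped_score("STDW", "STDW"))  # four identities with different values: 4 + 5 + 6 + 11
print(ungapped_score("STDW", "TSEW"))  # conservative swaps S/T, T/S, D/E still score positively
print(ungapped_score("SD", "SW"))      # a D/W mismatch is penalized: 4 + (-4)
```

Identities are not worth the same amount (W/W scores 11, S/S only 4), and a conservative mismatch such as D/E scores higher than a non-conservative one such as D/W.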
![Page 39: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/39.jpg)
Different matrices give somewhat different scores,
but same general trends are observed.
What trends?
![Page 40: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/40.jpg)
A substitution is more likely to occur between amino acids with similar biochemical properties.
![Page 41: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/41.jpg)
The likelihood of a substitution is also affected by the degeneracy of the genetic code for the different amino acids (the number of codons that encode each amino acid).
![Page 42: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/42.jpg)
How do we choose the most appropriate scoring matrix?
• Blosum matrices are more commonly used than PAM matrices.
•The Blosum matrices are best for detecting local alignments.
•The Blosum62 matrix is the best for detecting the majority of weak protein similarities.
•The Blosum45 matrix is the best for detecting long and weak alignments.
![Page 43: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/43.jpg)
[Figure panels: "Rat versus mouse RBP" and "Rat versus bacterial lipocalin"]
![Page 44: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/44.jpg)
The following matrices are roughly equivalent:
PAM100 ≈ BLOSUM90
PAM120 ≈ BLOSUM80
PAM160 ≈ BLOSUM60
PAM200 ≈ BLOSUM52
PAM250 ≈ BLOSUM45
![Page 45: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/45.jpg)
Limitations
• Substitution matrices do not take into account long range interactions between residues.
• They assume that identical residues are equal (whereas in real life a residue at the active site has other evolutionary constraints than the same residue outside of the active site)
• They assume evolution rate to be constant.
![Page 46: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/46.jpg)
DNA Substitution Matrices
Transitions: purine – purine (A↔G), pyrimidine – pyrimidine (C↔T)
Transversions: purine – pyrimidine, pyrimidine – purine
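A DNA substitution matrix built on these classes can be sketched as a small function. The classification itself is standard: changes within the purines (A, G) or within the pyrimidines (C, T) are transitions, and changes between the two classes are transversions. The numeric scores below are hypothetical placeholders chosen for illustration, not values from the slides.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical scores: transversions penalized more heavily than transitions.
ILLUSTRATIVE_SCORES = {"match": 1, "transition": -1, "transversion": -2}


def substitution_type(x, y):
    """Classify a pair of DNA bases as match, transition, or transversion."""
    for base in (x, y):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if x == y:
        return "match"
    same_class = (x in PURINES) == (y in PURINES)
    return "transition" if same_class else "transversion"


def dna_score(x, y):
    """Score a pair of bases using the illustrative values above."""
    return ILLUSTRATIVE_SCORES[substitution_type(x, y)]


for pair in ("AA", "AG", "CT", "AC", "GT"):
    print(pair, substitution_type(*pair), dna_score(*pair))
```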
![Page 47: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/47.jpg)
Definitions
Conservation: The extent to which nucleotide or protein sequences are related. It can be evaluated by identity and similarity.
Identity ( | ): The extent to which two sequences are invariant.
Similarity ( . : ): Changes at a specific position of an amino acid that preserve the physico-chemical properties of the original residue.
![Page 48: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/48.jpg)
There are many ways to align two sequences.
Several ways to present the pairwise alignment
Do not blindly trust your alignment to be the only truth. In particular, gapped regions may be quite variable.
Sequences sharing less than 20% identity are difficult to align.
![Page 49: Pairwise Alignment. Sequences are related.. Phylogenetic tree of globin-type proteins found in humans](https://reader031.vdocuments.net/reader031/viewer/2022032201/56649d625503460f94a43c9d/html5/thumbnails/49.jpg)
Dotplots: visual sequence comparison
1. Place two sequences along axes of plot
2. Place dot at grid points where two sequences have identical residues
3. Diagonals correspond to conserved regions
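The three steps above can be sketched as a tiny text-mode dotplot. The function name `dotplot` and the choice of "*" for a dot and "." for an empty grid point are assumptions made for readability. Real dotplot tools usually add a sliding window and a threshold to suppress noise.

```python
def dotplot(seq_x, seq_y):
    """Return one text row per residue of seq_y: '*' where the residue equals
    the corresponding residue of seq_x (columns), '.' elsewhere."""
    return ["".join("*" if x == y else "." for x in seq_x) for y in seq_y]


# The two DNA sequences from the beginning of the lecture.
seq_x = "ATTGCAGTGATCG"
seq_y = "ATTGCGTCGATCG"

print("  " + seq_x)
for residue, row in zip(seq_y, dotplot(seq_x, seq_y)):
    print(residue, row)
```

In the printed grid, the two conserved ends of the sequences show up as runs of "*" along the main diagonal, with scattered off-diagonal dots from chance matches.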