Page 1

“Homology-enhanced probabilistic consistency” multiple sequence alignment :

a case study on transmembrane protein

Jia-Ming Chang

2013-July-09

Chang, J-M, P Di Tommaso, J-F Taly, C Notredame. 2012. Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13.

Page 2

Transmembrane protein

Membrane proteins are likely to constitute 20-30% of all ORFs contained in genomes.

Odorant receptors

Richard Benton, “Eppendorf winner. Evolution and revolution in odor detection,” Science (New York, N.Y.) 326, no. 5951 (October 16, 2009): 382-383.

Page 3

Transmembrane protein multiple sequence alignment

• 1994: first alignment strategy designed for transmembrane proteins

– Cserzo M, Bernassau JM, Simon I, Maigret B: New alignment strategy for transmembrane proteins. J Mol Biol 1994, 243(3):388-396.

• Few multiple sequence alignment methods address transmembrane proteins so far: only 3

– Shafrir Y, Guy HR: STAM: simple transmembrane alignment method. Bioinformatics 2004, 20(5):758-769.

– Forrest LR, Tang CL, Honig B: On the accuracy of homology modeling and sequence alignment methods applied to membrane proteins. Biophys J 2006, 91(2):508-517.

– Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.

Page 4

BAliBASE 2.0 reference 7

Pirovano W, Feenstra KA, Heringa J: PRALINETM: a strategy for improved multiple alignment of transmembrane proteins. Bioinformatics 2008, 24(4):492-497.

Page 5

We need an accurate Transmembrane MSA!

Page 6

Homology-extended

Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.

Page 7

Homology-extended

Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.

Page 8

Pair-hidden Markov Model

Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 2005, 15(2):330-340.

Emission probabilities, which correspond to traditional substitution scores, are based on the BLOSUM62 matrix.

Page 9

Probabilistic consistency transformation
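
The slide's figure did not survive in this transcript. As a reminder, the transformation described in the ProbCons paper re-estimates each pairwise posterior match probability through every sequence z in the set S, with the posterior of a sequence against itself taken as the identity:

$$P'(x_i \sim y_j) = \frac{1}{|S|} \sum_{z \in S} \sum_{k} P(x_i \sim z_k)\, P(z_k \sim y_j)$$

A minimal pure-Python sketch of one round of this update follows. The function names and the dictionary layout (`posteriors[(a, b)]` for a < b, stored as nested lists) are assumptions made for illustration, and the sparse-matrix handling used in practice is left out.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(posteriors, lengths):
    """One round of the update: P'_xy = (1/|S|) * sum_z P_xz . P_zy.

    posteriors: dict mapping (a, b) with a < b to a lengths[a] x lengths[b]
                matrix of posterior match probabilities (hypothetical layout).
    lengths:    list of sequence lengths, indexed by sequence id.
    """
    n = len(lengths)

    def get(a, b):
        if a == b:                      # a sequence matches itself exactly
            return identity(lengths[a])
        return posteriors[(a, b)] if a < b else transpose(posteriors[(b, a)])

    updated = {}
    for (x, y) in posteriors:
        acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in range(n):
            prod = matmul(get(x, z), get(z, y))
            for i in range(lengths[x]):
                for j in range(lengths[y]):
                    acc[i][j] += prod[i][j] / n
        updated[(x, y)] = acc
    return updated
```

With three one-residue sequences where x and y each match z strongly but match each other weakly, the update raises the x-y posterior, which is the intended effect of consistency.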

Page 10

Homology-extended probabilistic consistency

The new emission probabilities are defined as follows:

$$p'(x_i, y_j) = \sum_{m=1}^{20} \sum_{n=1}^{20} \alpha_m \, \beta_n \, p(\mathrm{A.A.}_m, \mathrm{A.A.}_n)$$

where α_m is the frequency with which residue m appears at position i, β_n is the frequency with which residue n appears at position j, and p(A.A._m, A.A._n) is the original emission probability in ProbCons.
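
To make the double sum concrete, here is a minimal sketch that evaluates the profile-based emission probability for one pair of positions. The function name and the toy uniform matrix are assumptions for illustration; a real run would use the BLOSUM62-derived emission table from ProbCons and profile columns built from BLAST hits.

```python
def profile_emission(alpha, beta, pair_prob):
    """Profile-profile emission: sum over m, n of alpha[m] * beta[n] * pair_prob[m][n].

    alpha, beta: residue frequencies (length 20) at position i of x and
                 position j of y, taken from each sequence's profile.
    pair_prob:   20 x 20 table of the original pairwise emission probabilities.
    """
    return sum(
        alpha[m] * beta[n] * pair_prob[m][n]
        for m in range(len(alpha))
        for n in range(len(beta))
    )


# Toy data (assumption): a uniform joint distribution over residue pairs.
uniform = [[1.0 / 400] * 20 for _ in range(20)]

# A one-hot profile column reduces to the single-sequence case:
# the result is simply pair_prob[m][n] for the observed residues.
one_hot_3 = [1.0 if k == 3 else 0.0 for k in range(20)]
one_hot_5 = [1.0 if k == 5 else 0.0 for k in range(20)]
single = profile_emission(one_hot_3, one_hot_5, uniform)
```

When both profile columns are one-hot, the formula falls back to the original ProbCons emission, so the homology-extended score is a strict generalization of the sequence-only one.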

Page 11

Homology-extended probabilistic consistency

where αi, βj, and rk are the profile frequencies.

Page 12

Homology-extended

Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824.

Que1: how to build a profile?

Que2: how to score profiles?

Page 13

Que1: how to build a profile?

• Database Size

• Searching parameters

– E-value: most used; anything else?

1. Matrix file: -M

2. Filter the query sequence for low-complexity subsequences: -F

3. Neighborhood word threshold: -f

4. Truncate the report to a given number of alignments: -b

Page 14

Word hit & Neighborhood

Page 15

Searching parameters

• Fast, Insensitive search

– High percent identity

– blastp -F "m S" -f 999 -M BLOSUM80 -G 9 -E 2 -e 1e-5

• Slow, Sensitive search

– Increase sensitivity, decrease specificity

– blastp -F "m S" -f 9 -M BLOSUM45 -e 100 -b 10000 -v 10000

• Source: the book “BLAST”, pages 146-147
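
The two parameter sets can be kept side by side in a small helper. This is a sketch that only assembles an argument list and does not run anything. It assumes the slide's "blastp" is shorthand for the legacy `blastall -p blastp` executable, which is where these single-letter options come from; the query and database names are placeholders.

```python
import shlex

# Option sets copied from the slide; the legacy blastall front end is an assumption.
FAST_INSENSITIVE = ["-f", "999", "-M", "BLOSUM80", "-G", "9", "-E", "2", "-e", "1e-5"]
SLOW_SENSITIVE = ["-f", "9", "-M", "BLOSUM45", "-e", "100", "-b", "10000", "-v", "10000"]


def blast_args(query, database, sensitive):
    """Build the argument list for one search without executing it."""
    args = ["blastall", "-p", "blastp", "-i", query, "-d", database, "-F", "m S"]
    args += SLOW_SENSITIVE if sensitive else FAST_INSENSITIVE
    return args


# Placeholder file names, for illustration only.
command_line = shlex.join(blast_args("query.fasta", "uniref50_tm", sensitive=True))
```

Keeping the list form makes it easy to hand the arguments to `subprocess.run` later without shell quoting problems around the `"m S"` filter string.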

Page 16

Different database

• Full databases: UniProt (release 15.15, 2010), NCBI non-redundant (NR), UniRef100, UniRef90, UniRef50

• Transmembrane subsets, selected with keyword:"Transmembrane [KW-0812]": UniProt-TM, UniRef100-TM, UniRef90-TM, UniRef50-TM

Page 17

Database Size

Data Set        No. of sequences
UniRef50-TM     87,989
UniRef90-TM     263,306
UniRef100-TM    613,015
UniProt-TM      818,635
UniRef50        3,077,464
UniRef90        6,544,144
UniRef100       9,865,668
UniProt         11,009,767
NCBI NR         10,565,004

Source databases: UniProt (release 15.15, 2010) and NCBI non-redundant (NR); TM subsets selected with keyword:"Transmembrane [KW-0812]".

Page 18

Performance comparison of different database sizes for BAliBASE2-ref7.

UniRef50-TM contains about 100 times fewer sequences than the full UniProt.

The level of accuracy is comparable, and even superior, to that achieved with the default PSI-Coffee, while CPU time requirements decrease by a factor of 10.
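
The size reduction can be checked directly from the sequence counts on the Database Size slide. The short sketch below only does that arithmetic; the dictionary and function names are illustrative.

```python
# Sequence counts as listed on the "Database Size" slide.
SIZES = {
    "UniRef50-TM": 87_989,
    "UniRef90-TM": 263_306,
    "UniRef100-TM": 613_015,
    "UniProt-TM": 818_635,
    "UniRef50": 3_077_464,
    "UniRef90": 6_544_144,
    "UniRef100": 9_865_668,
    "UniProt": 11_009_767,
    "NCBI NR": 10_565_004,
}


def fold_reduction(full, subset):
    """How many times smaller `subset` is than `full`."""
    return SIZES[full] / SIZES[subset]


ratio = fold_reduction("UniProt", "UniRef50-TM")
```

The ratio comes out at roughly 125, consistent with the "about 100 times fewer" statement.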

Page 19

Page 20

10% more columns are correctly aligned than with PRALINETM.

The rows Pairs and Cols give the total numbers of correctly aligned pairs and columns, respectively. The reference alignments contain 3,294,102 pairs and 1,781 columns in total.

Page 21

BAliBASE 3.0

The performance figures for the other methods are from Rausch et al. The SP and TC scores of full-length sequences are evaluated on the core blocks (as annotated in the XML files).
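
For readers unfamiliar with the two measures, the sketch below computes a sum-of-pairs (SP) score and a total-column (TC) score for a test alignment against a reference. It is a simplified illustration, not the BAliBASE scoring program: it scores every reference column rather than only the core blocks, ignores reference columns with a single residue, counts a column as correct when all its residues share one test column, and assumes both alignments contain the same sequences in the same order with `-` as the gap character.

```python
from itertools import combinations


def residue_columns(alignment):
    """Map (sequence index, residue index) -> column index, skipping gaps."""
    position = {}
    counters = [0] * len(alignment)
    for col, chars in enumerate(zip(*alignment)):
        for seq, ch in enumerate(chars):
            if ch != "-":
                position[(seq, counters[seq])] = col
                counters[seq] += 1
    return position


def sp_tc_scores(reference, test):
    """Return (SP, TC): fractions of reference pairs and columns reproduced."""
    ref_pos = residue_columns(reference)
    test_pos = residue_columns(test)

    # Group reference residues by column, keeping columns with 2+ residues.
    by_column = {}
    for residue, col in ref_pos.items():
        by_column.setdefault(col, []).append(residue)
    ref_cols = [sorted(c) for c in by_column.values() if len(c) > 1]

    total_pairs = correct_pairs = correct_cols = 0
    for residues in ref_cols:
        all_together = True
        for a, b in combinations(residues, 2):
            total_pairs += 1
            if test_pos[a] == test_pos[b]:
                correct_pairs += 1
            else:
                all_together = False
        correct_cols += all_together
    return correct_pairs / total_pairs, correct_cols / len(ref_cols)


reference = ["ACG-", "A-GT"]
test = ["ACG-", "AG-T"]
sp, tc = sp_tc_scores(reference, test)
```

In this toy case the reference has two aligned pairs in two columns and the test alignment reproduces one of each, so both scores are 0.5.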

Page 22

Que2: how to score profiles?

Edgar RC, Sjolander K: A comparison of scoring functions for protein sequence profile alignment. Bioinformatics 2004, 20(8):1301-1308.

Page 23

• Prediction mode: -template_file PSITM

• Output: -output tm_html

This output was obtained for Or94b of D. melanogaster and its orthologs in other Drosophila species. Notably, the predicted topology of the Or94b set is consistent with the conclusion of Benton et al.

Page 24

Paolo Di Tommaso

http://tcoffee.crg.cat/tmcoffee