T-Coffee tutorial ACGT Retreat 2012 Jean-François Taly, Ionas Erb and Cedrik Magis


What is T-Coffee?

Tree-based, Consistency-based Objective Function For AlignmEnt Evaluation

Progressive Alignment

Consistency

Dynamic Programming Using A Substitution Matrix

Progressive Alignment

- Depends on the CHOICE of the sequences.
- Depends on the ORDER of the sequences (Tree).
- Depends on the PARAMETERS:
  - Substitution Matrix.
  - Penalties (Gop, Gep).
  - Sequence Weight.
  - Tree making Algorithm.

T-Coffee and Consistency

J. Mol. Biol. (2000) 302, 205-217
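
The consistency idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the primary library is stored as a dictionary mapping a pair of residues, each written as (sequence name, position), to a weight, and a pair is re-weighted by adding, for every residue of a third sequence linked to both members, the smaller of the two weights along that path. The names `extend_library`, `primary` and `extended`, and the toy weights, are hypothetical and are not T-Coffee's own data structures.

```python
from collections import defaultdict


def extend_library(primary):
    """Return a consistency-extended copy of a primary library.

    primary maps frozenset({(seq_a, pos_a), (seq_b, pos_b)}) -> weight.
    """
    # For every residue, record the residues it is paired with and the weight.
    neighbours = defaultdict(dict)
    for pair, weight in primary.items():
        a, b = tuple(pair)
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    # Start from the primary weights.
    extended = defaultdict(float)
    for pair, weight in primary.items():
        extended[pair] += weight

    # Each residue c that is linked to both a and b supports the pair (a, b)
    # with the weaker of its two links.
    for c, linked in neighbours.items():
        residues = sorted(linked)
        for i, a in enumerate(residues):
            for b in residues[i + 1:]:
                if a[0] == b[0]:
                    continue  # two residues of the same sequence are never paired
                extended[frozenset((a, b))] += min(linked[a], linked[b])
    return dict(extended)


# Toy primary library over three sequences A, B and C (weights are made up).
primary = {
    frozenset({("A", 1), ("B", 1)}): 88,
    frozenset({("A", 1), ("C", 1)}): 77,
    frozenset({("C", 1), ("B", 1)}): 100,
    frozenset({("A", 1), ("B", 2)}): 20,
}
extended = extend_library(primary)
```

In this toy library the pair A1/B1 is supported through C1 and gains min(77, 100), while A1/B2 has no third-sequence support and keeps its primary weight.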

M-Coffee: T-Coffee and other aligners

Primary libraries can be computed from any third-party aligner (pairwise or MSA):
- clustalw2
- mafft
- muscle
- probcons
- pcma
- and many more (type t_coffee for a full list)

Template Based Alignment

Very useful in case of weak sequence similarity: wrong libraries will lead to wrong MSAs.

Replace the sequence with something more informative:
- Profile: PSI-Coffee
- PDB Structure: Expresso
- RNA Structure: R-Coffee

Simple scoring schemes result in alignment ambiguities

[Figure: a single L residue that could be aligned to any of several L positions]

PSI-Coffee: Homology extension

[Figure: Profile 1 and Profile 2, columns of L residues with a few I and V substitutions]

PSI-Coffee: Use conservation across the protein family

EXPRESSO: Finding automatically the right template structure

[Figure labels: Sources; BLAST against PDB; Template; Structural Alignment (SAP); Structural Template Alignment; Source & Template Alignment; Remove Templates; Library]

R-Coffee: Embedding RNA Structures Within The T-Coffee Libraries

[Figure labels: CCGG; TC Library; G G Score X; C C Score Y]


The R-extension can be added on top of any existing method:
- Mafft / Muscle / ProbCons
- Consan: aligns the RNA sequences and predicts the secondary structure at the same time. Better libraries but very slow.

RNA secondary structures:
- Predicted: RNAplfold
- Real ones

[Figure labels: RNA Sequences; RNAplfold; Secondary Structures; Consan or Mafft / Muscle / ProbCons; Primary Library; R-Coffee Extension; R-Coffee Extended Primary Library; R-Score; Progressive Alignment Using The R-Score]

Soon! SARA-Coffee: like Expresso but with RNA structures extracted from the PDB

Carsten Kemena, Giovanni Bussotti

Pro-Coffee

- gives you a global alignment of homologous regulatory sequences (promoters, enhancers).
- uses a dinucleotide substitution matrix derived from TRANSFAC binding site alignments.
- was optimized on an ortholog finding task with promoter sequences and validated with multi-species ChIP-seq data.

Validation Pro-Coffee

Which alignment is better?

The 2nd one? But can we trust these binding site predictions?

The 2nd one! The green sites are confirmed by ChIP-seq.

Magis et al., JMB 2010

The MSA defines equivalences

T-RMSD computes intramolecular distances

One column = one matrix

One matrix = one tree

Number of columns = support
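
The "one column = one matrix" step can be illustrated with a rough Python sketch. It assumes gap-free aligned columns and one 3D coordinate per residue, and it measures the difference between two structures at a column as the root mean square difference of their intramolecular distances from that column to every other column. The function name, the input layout and the toy coordinates are hypothetical, and the actual T-RMSD computation may differ in its details. Tree building and support counting are left out.

```python
import math


def column_distance_matrix(coords, column):
    """Structure-by-structure distance matrix for one aligned column.

    coords[s][c] is the (x, y, z) position of the residue that structure s
    places in aligned column c (gap-free columns only).
    """
    n_struct = len(coords)
    others = [c for c in range(len(coords[0])) if c != column]

    # Intramolecular distances from the chosen column to all other columns.
    profiles = [[math.dist(s[column], s[c]) for c in others] for s in coords]

    matrix = [[0.0] * n_struct for _ in range(n_struct)]
    for i in range(n_struct):
        for j in range(i + 1, n_struct):
            squared = [(a - b) ** 2 for a, b in zip(profiles[i], profiles[j])]
            matrix[i][j] = matrix[j][i] = math.sqrt(sum(squared) / len(squared))
    return matrix


# Three toy structures, three aligned columns (coordinates are made up).
coords = [
    [(0, 0, 0), (1, 0, 0), (2, 0, 0)],   # reference
    [(5, 5, 5), (6, 5, 5), (7, 5, 5)],   # same shape, translated
    [(0, 0, 0), (2, 0, 0), (4, 0, 0)],   # stretched
]
# One matrix per column; each matrix could then be turned into one tree.
matrices = [column_distance_matrix(coords, c) for c in range(3)]
```

Because only intramolecular distances are compared, the translated copy is at distance zero from the reference without any superposition step.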

Using 3D structure for structural clustering

Structural Tree / PFAM / 3D-Coffee

From structural clustering to phylogenetic inference

Glenney & Wiens, Journal of Immunology 2007

Magis et al., TIBS (2012, submitted)

Which Flavor?

- Fast Alignments: M-Coffee with Fast Aligners (mafft, muscle, kalign)
- Difficult Protein Alignments: PSI-Coffee, Expresso
- Structural clustering: T-RMSD
- RNA Alignments: R-Coffee
- Promoter Alignments: Pro-Coffee

Server: tcoffee.crg.cat

Paolo Di Tommaso

Command line structure

    t_coffee -in input_file_name -method kalign_msa,muscle_msa,mafft_msa

Give the list of methods you want for the computation of the primary libraries.

On line documentation: http://www.tcoffee.org/Documentation/t_coffee/t_coffee_tutorial.htm

Command line structure

    t_coffee -in input_file_name -mode fmcoffee

T-Coffee special modes:
- mcoffee
- psicoffee
- expresso
- rcoffee
- procoffee

Input/output format

    t_coffee -in input_file_name -mode expresso -output output_format

Possible values for output_format:
- clustal_aln (default)
- fasta_aln
- phylip_aln
- saga_aln
- msf_aln
- pir_aln
- compressed_aln

T-Coffee other programs

    t_coffee -other_pg seq_reformat


seq_reformat: T-Coffee alignment editing tool

    t_coffee -other_pg seq_reformat -in input_file_name -output output_format -action +trim _seq_%%90_

    t_coffee -other_pg seq_reformat -help

T-Coffee & the cache

T-Coffee keeps data in: ~/.t_coffee/cache/

Warning! The cache will accumulate your data and may become very big.

Several options:
- -cache update
- -cache ignore
- -cache path
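
When many alignments have to be launched, the command lines shown in this section can be assembled from a script. The Python sketch below only builds the argument list from the flags that appear in the slides (-in, -method, -mode, -output); the helper name `t_coffee_command` is hypothetical, and nothing is executed, so it works even where T-Coffee is not installed.

```python
import shlex


def t_coffee_command(input_file, methods=None, mode=None, output=None):
    """Build a t_coffee argument list from the flags shown in the tutorial."""
    cmd = ["t_coffee", "-in", input_file]
    if methods:
        # Methods used to compute the primary libraries, comma separated.
        cmd += ["-method", ",".join(methods)]
    if mode:
        # Special mode such as mcoffee, psicoffee, expresso, rcoffee, procoffee.
        cmd += ["-mode", mode]
    if output:
        # Output format such as clustal_aln, fasta_aln, phylip_aln.
        cmd += ["-output", output]
    return cmd


fast = t_coffee_command(
    "input_file_name", methods=["kalign_msa", "muscle_msa", "mafft_msa"]
)
expresso = t_coffee_command("input_file_name", mode="expresso", output="fasta_aln")

# Print the commands as they would be typed in a shell.
print(shlex.join(fast))
print(shlex.join(expresso))
```

If T-Coffee is available on the PATH, one of these lists can be handed to `subprocess.run(cmd, check=True)` to start the run.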

Tutorial web site: https://sites.google.com/site/tcoffeetutorials


Where to Trust Your Alignments

Most Methods Agree

Most Methods Disagree
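
The idea of trusting the regions where most methods agree can be illustrated with a small Python sketch. It assumes several alignments of the same sequences, for instance produced by different aligners, each stored as a dictionary from sequence name to gapped string. For every column of a reference alignment it reports the fraction of aligned residue pairs that the other alignments also contain. The function names and the toy alignments are hypothetical, and this simple fraction is not the score T-Coffee itself reports.

```python
from itertools import combinations


def aligned_pairs_by_column(alignment):
    """Return, for each column, the set of residue pairs aligned in it.

    A residue is identified as (sequence name, index in the ungapped sequence).
    """
    names = sorted(alignment)
    counters = {name: 0 for name in names}
    columns = []
    for col in range(len(alignment[names[0]])):
        residues = []
        for name in names:
            if alignment[name][col] != "-":
                residues.append((name, counters[name]))
                counters[name] += 1
        columns.append({frozenset(p) for p in combinations(residues, 2)})
    return columns


def column_agreement(reference, others):
    """Fraction of each reference column's pairs found in the other alignments."""
    other_pairs = [set().union(*aligned_pairs_by_column(a)) for a in others]
    scores = []
    for pairs in aligned_pairs_by_column(reference):
        if not pairs or not other_pairs:
            scores.append(None)  # nothing to compare in this column
            continue
        supported = sum(len(pairs & found) for found in other_pairs)
        scores.append(supported / (len(pairs) * len(other_pairs)))
    return scores


# Toy alignments of the same two sequences (made up for illustration).
reference = {"s1": "AC-G", "s2": "ACTG"}
other_1 = {"s1": "A-CG", "s2": "ACTG"}
other_2 = {"s1": "AC-G", "s2": "ACTG"}
scores = column_agreement(reference, [other_1, other_2])
```

Columns with a score close to 1 are those where the methods agree; low scores flag the regions where the placement of residues depends on the aligner.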

