Posted on 18-Dec-2015
Clustal W and Clustal X version 2.0
김영호, 박준호, 최현희
The 9th Protein Folding Winter School
The Paper
Abstract
The Clustal W and Clustal X multiple sequence alignment programs have been completely rewritten in C++
This will facilitate further development of the alignment algorithms
This has allowed proper porting of the programs to the latest versions of Linux, Macintosh and Windows operating systems
Contents
1. Introduction
2. Clustal W 2.0 and Clustal X 2.0
3. New Features
4. Related Sources
Introduction
One of the oldest and most widely used multiple alignment programs. First distributed by post on floppy disks
(late 1980s, written in Microsoft Fortran for MS-DOS)
Clustal 1 to Clustal 4 (1988, 1989; IBM-compatible PCs)
Clustal V (1992; VAX/VMS, Unix, Apple Macintosh, IBM-compatible PCs)
Clustal W and Clustal X (late 1990s)
Other powerful tools: T-Coffee, MAFFT, MUSCLE (compared using the BAliBASE benchmark)
Yet Clustal W and Clustal X continue to be very widely used (the EBI Clustal site receives millions of multiple alignment jobs per year).
Clustal W and Clustal X
Clustal W: command-line (terminal) interface
Clustal X: graphical interface
Procedure
1. Sequence input (choose a chain or domain from each FASTA sequence)
2. Concatenate all the query sequences into one file
3. Run
4. Output (scores, alignment)
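Steps 1 and 2 of this procedure can be scripted. The Python sketch below assumes each input is FASTA text and that the wanted chain or domain can be found by a substring of its header line (for example a PDB-style `1ABC_A` tag); `read_fasta`, `pick_record` and `concatenate` are hypothetical helper names, not part of Clustal.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pick_record(records, key):
    """Return the first record whose header contains `key` (assumed chain/domain tag)."""
    for header, seq in records:
        if key in header:
            return header, seq
    raise KeyError(f"no record matching {key!r}")


def concatenate(inputs, width=60):
    """Build one multi-FASTA string from (fasta_text, key) pairs, one record per input."""
    lines = []
    for text, key in inputs:
        header, seq = pick_record(read_fasta(text), key)
        lines.append(">" + header)
        # Re-wrap the sequence at `width` residues per line
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    a = ">1ABC_A chain A\nMKV\nLLA\n>1ABC_B chain B\nGGG\n"
    b = ">2XYZ_A chain A\nMKVL\n"
    print(concatenate([(a, "1ABC_A"), (b, "2XYZ_A")]), end="")
```

The combined string can then be written to a single input file (for example with `pathlib.Path("query.fasta").write_text(...)`) and passed to Clustal in step 3.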
Clustal W 2.0 and Clustal X 2.0
What’s new?
Rewritten in C++: easier to maintain the code, and easier to modify or replace some of the alignment algorithms
UPGMA guide trees: an alternative to the NJ (neighbor-joining) guide trees that speeds up the alignment of large data sets
Iterative alignment facility: increases alignment accuracy
Clustal X: originally developed using NCBI’s Vibrant toolbox, which is no longer supported
Clustal X 2.0: rewritten using the Qt GUI toolkit, which provides a native look and feel on Windows, Linux and Mac platforms
New Features
UPGMA: faster than NJ (clusters 10,000 sequences in under a minute, while NJ takes over an hour)
Slightly less accurate than NJ on the BAliBASE benchmark, but on large alignments this is offset by the savings in processing time (2 h vs. 12 h)
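To show how a UPGMA guide tree is built, here is a textbook UPGMA on a precomputed distance matrix. It is a minimal sketch, not Clustal's code: Clustal first derives the distances from pairwise alignments, and this naive version rescans all pairs at every merge (O(n³) overall). The function name `upgma` and the nested-tuple tree format (no branch lengths) are illustrative choices.

```python
def upgma(dist, labels):
    """Cluster by UPGMA: repeatedly merge the closest pair of clusters.

    dist: symmetric matrix (list of lists) of pairwise distances
    labels: leaf names, in the same order as dist
    Returns the guide tree as nested tuples (branch lengths omitted).
    """
    n = len(labels)
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (labels[i], 1) for i in range(n)}
    # distances between live clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        tree_i, size_i = clusters.pop(i)
        tree_j, size_j = clusters.pop(j)
        # Size-weighted average distance from the merged cluster to each other cluster
        merged = {}
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            merged[k] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        # Forget every distance that involves the two merged clusters
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        for k, v in merged.items():
            d[(k, next_id)] = v  # k < next_id, so the key stays ordered
        clusters[next_id] = ((tree_i, tree_j), size_i + size_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    dist = [[0, 2, 6, 10],
            [2, 0, 6, 10],
            [6, 6, 0, 10],
            [10, 10, 10, 0]]
    print(upgma(dist, labels))  # ('D', ('C', ('A', 'B')))
```

Each merge only averages existing distances, which is why UPGMA stays cheap; NJ additionally recomputes a corrected distance matrix at every step.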
Iteration: a quick and effective method of refining alignments
‘Remove first’ iteration scheme, scored with WSP (weighted sum of pairs)
During each iteration step, each sequence is removed from the alignment in turn and realigned. If the WSP score improves, the resulting alignment is retained.
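The ‘remove first’ loop can be sketched in Python. This is a toy model under stated assumptions, not Clustal W's implementation: match/mismatch scores and a linear gap penalty stand in for Clustal's substitution matrices and gap opening/extension penalties, the sequence weights are simply passed in (Clustal derives its weights from the guide tree), and each removed sequence is realigned to the fixed remaining rows by Needleman-Wunsch. All helper names are hypothetical, and `numiters=3` only mirrors the default quoted on these slides.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Toy column score: gap-gap 0, residue-gap `gap`, else match/mismatch."""
    if a == GAP and b == GAP:
        return 0
    if a == GAP or b == GAP:
        return gap
    return match if a == b else mismatch


def wsp_score(rows, weights):
    """Weighted sum of pairs: sum over i < j of w_i * w_j * score(row_i, row_j)."""
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        s = sum(pair_score(x, y) for x, y in zip(rows[i], rows[j]))
        total += weights[i] * weights[j] * s
    return total


def strip_gap_columns(rows):
    """Drop columns that contain only gaps."""
    keep = [c for c in range(len(rows[0])) if any(r[c] != GAP for r in rows)]
    return ["".join(r[c] for c in keep) for r in rows]


def realign(seq, profile, weights):
    """Needleman-Wunsch of an ungapped `seq` against a fixed profile of aligned rows."""
    n, m = len(seq), len(profile[0])
    wsum = sum(weights)

    def match(x, col):  # seq[x] placed in profile column col
        return sum(w * pair_score(seq[x], row[col]) for w, row in zip(weights, profile))

    def seq_gap(col):  # gap in seq opposite profile column col
        return sum(w * pair_score(GAP, row[col]) for w, row in zip(weights, profile))

    def new_col(x):  # seq[x] in a new column that is all gaps in the profile
        return wsum * pair_score(seq[x], GAP)

    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    P = [[None] * (m + 1) for _ in range(n + 1)]
    for x in range(1, n + 1):
        F[x][0], P[x][0] = F[x - 1][0] + new_col(x - 1), "up"
    for col in range(1, m + 1):
        F[0][col], P[0][col] = F[0][col - 1] + seq_gap(col - 1), "left"
    for x in range(1, n + 1):
        for col in range(1, m + 1):
            F[x][col], P[x][col] = max(
                [(F[x - 1][col - 1] + match(x - 1, col - 1), "diag"),
                 (F[x - 1][col] + new_col(x - 1), "up"),
                 (F[x][col - 1] + seq_gap(col - 1), "left")],
                key=lambda t: t[0],  # on ties, the first option wins
            )
    # Trace back, recording which profile column (or None for a new one) each output column uses
    x, col, out, used = n, m, [], []
    while x > 0 or col > 0:
        move = P[x][col]
        out.append(GAP if move == "left" else seq[x - 1])
        used.append(None if move == "up" else col - 1)
        if move != "left":
            x -= 1
        if move != "up":
            col -= 1
    out.reverse()
    used.reverse()
    new_profile = ["".join(GAP if u is None else row[u] for u in used) for row in profile]
    return "".join(out), new_profile


def remove_first(rows, weights, numiters=3):
    """Realign each sequence in turn; keep a candidate only if its WSP score improves."""
    best, best_score = list(rows), wsp_score(rows, weights)
    for _ in range(numiters):
        improved = False
        for k in range(len(best)):
            rest = strip_gap_columns(best[:k] + best[k + 1:])
            new_seq, new_rest = realign(best[k].replace(GAP, ""), rest,
                                        weights[:k] + weights[k + 1:])
            candidate = new_rest[:k] + [new_seq] + new_rest[k:]
            score = wsp_score(candidate, weights)
            if score > best_score:
                best, best_score, improved = candidate, score, True
        if not improved:
            break
    return best, best_score
```

For example, `remove_first(["ACGT--", "ACGT--", "--ACGT"], [1.0, 1.0, 1.0])` pulls the shifted third sequence back into register, giving `(["ACGT", "ACGT", "ACGT"], 12.0)`. Because this sketch only polishes a finished alignment, it corresponds to refining the final alignment rather than refining at every progressive step.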
Command-line options
‘-clustering=UPGMA’: uses the UPGMA algorithm to build the guide tree
‘-iteration=alignment’: refines the final alignment only (less accurate but faster)
‘-iteration=tree’: refines the alignment at each step of the progressive alignment (more accurate but slower)
‘-numiters’: sets the number of iteration cycles (default: 3)
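A small Python wrapper can assemble these options before a run. It is a sketch under assumptions: the Clustal W 2.0 executable is taken to be installed as `clustalw2` on the PATH and to accept the input file as `-infile=`; only the ‘-clustering’ and ‘-iteration’ options listed above are passed (the iteration-count flag is left out, so check `clustalw2 -help` for its exact spelling on your build). `build_clustalw_cmd` and `run_clustalw` are hypothetical names.

```python
import shutil
import subprocess

CLUSTERING = {None, "NJ", "UPGMA"}         # None = keep the program's default guide tree
ITERATION = {None, "tree", "alignment"}    # None = do not pass -iteration


def build_clustalw_cmd(infile, clustering=None, iteration=None, exe="clustalw2"):
    """Return the argument list for one Clustal W run (assumed executable name)."""
    if clustering not in CLUSTERING:
        raise ValueError(f"unknown clustering: {clustering!r}")
    if iteration not in ITERATION:
        raise ValueError(f"unknown iteration mode: {iteration!r}")
    cmd = [exe, f"-infile={infile}"]
    if clustering is not None:
        cmd.append(f"-clustering={clustering}")
    if iteration is not None:
        cmd.append(f"-iteration={iteration}")
    return cmd


def run_clustalw(cmd):
    """Run the command if the executable is on PATH; raise otherwise."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Fast UPGMA guide tree plus refinement of the final alignment only
    print(build_clustalw_cmd("query.fasta", clustering="UPGMA", iteration="alignment"))
```

Keeping command construction separate from execution makes it easy to inspect the exact options (for example, UPGMA with ‘-iteration=alignment’ for a large data set) before launching a long run.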
Related Sources
EBI website: the European Bioinformatics Institute website hosts several alignment programs, so we can try various programs
(e.g. ClustalW, MAFFT, T-Coffee, MUSCLE)
Clustal (web)
Clustal (DOS)
MUSCLE
T-Coffee
MAFFT
Kalign