Algorithms for Biological Sequence Analysis
Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering
National Taiwan University, Taiwan
Date: September 14, 2009
WWW: http://www.csie.ntu.edu.tw/~kmchao
About this course
• Course: Algorithms for biological sequence analysis
• Some basic knowledge of algorithm development and program design is required.
• We will focus on sequence-related algorithmic problems. Genomic sequences are our main target.
– The oldest language
– The largest program
• Fall semester, 2009
• 13:20 – 16:20 Monday, 107 CSIE Building
• 3 credits
• Web site: http://www.csie.ntu.edu.tw/~kmchao/seq09fall
Coursework:
• Homework assignments and class participation (15%)
• Two midterm exams (60%; 30% each):
– October 26, 2009 (tentatively)
– December 7, 2009 (tentatively)
• Oral presentation of selected papers (25%)
Outline
Part I: Sequence Homology
– Introduction to basic algorithmic strategies
– Pairwise sequence alignment
– Multiple sequence alignment
– Chaining algorithms for genomic sequence analysis
– Suboptimal alignment
– Comparative genomics
– Compressed / constrained sequence comparison
– Hidden Markov models (the Viterbi algorithm and others)
Part II: Sequence Composition
– Maximum-sum and maximum-density segments
– SNP and haplotype data analysis
– Approximate gapped palindromes
– Genome annotation
– Other advanced topics
A Brief History of Genetics
• 1859 Charles Darwin published “On the Origin of Species.”
• 1865 Genes are particulate factors. [Gregor Mendel]
• 1869 Discovery of nucleic acid [Friedrich Miescher]
• 1903 Chromosomes are hereditary units. [Walter Sutton]
• 1910 Genes lie on chromosomes. [Thomas Hunt Morgan]
• 1913 Chromosomes are linear arrays of genes. [Alfred Sturtevant]
A Brief History of Genetics (cont’d)
• 1931 Recombination occurs by crossing over. [Harriet Creighton and Barbara McClintock]
• 1944 DNA is the genetic material. [Oswald Avery, Colin MacLeod, and Maclyn McCarty]
• 1953 DNA is a double helix. [James Watson and Francis Crick]
• 1961-1967 Genetic code is triplet. [Marshall Nirenberg, Har Gobind Khorana, Sydney Brenner & Francis Crick]
• 1977 DNA was sequenced for the first time. [Fred Sanger, Walter Gilbert, and Allan Maxam]
• 21st century: Many genomes completely sequenced
Milestones of Bioinformatics
• 1962 Pauling's theory of molecular evolution
• 1965 Margaret Dayhoff's Atlas of Protein Sequences
• 1970 Needleman-Wunsch algorithm
• 1977 DNA sequencing and software to analyze it (Staden)
• 1981 Smith-Waterman algorithm developed
• 1981 The concept of a sequence motif (Doolittle)
• 1982 GenBank Release 3 made public
• 1982 Phage lambda genome sequenced
Milestones of Bioinformatics (cont’d)
• 1983 Sequence database searching algorithm (Wilbur-Lipman)
• 1985 FASTP/FASTN: fast sequence similarity searching
• 1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM
• 1988 EMBnet network for database distribution
• 1990 BLAST: fast sequence similarity searching
• 1991 EST: expressed sequence tag sequencing
• 1993 Sanger Centre, Hinxton, UK
• 1994 EMBL European Bioinformatics Institute, Hinxton, UK
Milestones of Bioinformatics (cont’d)
• 1995 First bacterial genomes completely sequenced
• 1996 Yeast genome completely sequenced
• 1997 PSI-BLAST
• 1998 Worm (multicellular) genome completely sequenced
• 1999 Fly genome completely sequenced
Milestones of Bioinformatics (cont’d)
• Human Genome Project (1990-2003)
• Mouse 2002
• Rat 2004
• Chimpanzee 2005
• Completed Genomes
Chimpanzee Genome
The Primate Family Tree
Source: Nature
A New Book
Published by Springer in 2009
(ISBN 978-1848003194)
Sequence Comparison: Theory and Methods
by Kun-Mao Chao and Louxin Zhang