
Page 1: Algorithms  for  Biological Sequence Analysis

Algorithms for Biological Sequence Analysis

Kun-Mao Chao (趙坤茂)

Department of Computer Science and Information Engineering

National Taiwan University, Taiwan

Date: September 14, 2009

WWW: http://www.csie.ntu.edu.tw/~kmchao

About this course

• Course: Algorithms for biological sequence analysis

• Some basic knowledge of algorithm development and program design is required.

• We will focus on sequence-related algorithmic problems. Genomic sequences are our main target:
  – The oldest language
  – The largest program

• Fall semester, 2009

• 13:20–16:20 Monday, 107 CSIE Building

• 3 credits

• Web site: http://www.csie.ntu.edu.tw/~kmchao/seq09fall

Coursework:

• Homework assignments and class participation (15%)

• Two midterm exams (60%; 30% each):
  – October 26, 2009 (tentatively)
  – December 7, 2009 (tentatively)

• Oral presentation of selected papers (25%)

Outline

Part I: Sequence Homology
  – Introduction to basic algorithmic strategies
  – Pairwise sequence alignment
  – Multiple sequence alignment
  – Chaining algorithms for genomic sequence analysis
  – Suboptimal alignment
  – Comparative genomics
  – Compressed / constrained sequence comparison
  – Hidden Markov models (the Viterbi algorithm et al.)

Part II: Sequence Composition
  – Maximum-sum and maximum-density segments
  – SNP and haplotype data analysis
  – Approximate gapped palindromes
  – Genome annotation
  – Other advanced topics

A Brief History of Genetics

• 1859 Charles Darwin published “On the Origin of Species.”

• 1865 Genes are particulate factors. [Gregor Mendel]

• 1869 Discovery of nucleic acid [Friedrich Miescher]

• 1903 Chromosomes are hereditary units. [Walter Sutton]

• 1910 Genes lie on chromosomes. [Thomas Hunt Morgan]

• 1913 Chromosomes are linear arrays of genes. [Alfred Sturtevant]

A Brief History of Genetics (cont’d)

• 1931 Recombination occurs by crossing over. [Harriet Creighton and Barbara McClintock]

• 1944 DNA is the genetic material. [Oswald Avery, Colin MacLeod, and Maclyn McCarty]

• 1953 DNA is a double helix. [James Watson and Francis Crick]

• 1961-1967 The genetic code is a triplet code. [Marshall Nirenberg, Har Gobind Khorana, Sydney Brenner, and Francis Crick]

• 1977 DNA was sequenced for the first time. [Fred Sanger, Walter Gilbert, and Allan Maxam]

• 21st century: Many genomes completely sequenced

Milestones of Bioinformatics

• 1962 Pauling's theory of molecular evolution

• 1965 Margaret Dayhoff's Atlas of Protein Sequence and Structure

• 1970 Needleman-Wunsch algorithm

• 1977 DNA sequencing and software to analyze it (Staden)

• 1981 Smith-Waterman algorithm developed

• 1981 The concept of a sequence motif (Doolittle)

• 1982 GenBank Release 3 made public

• 1982 Phage lambda genome sequenced

Milestones of Bioinformatics (cont’d)

• 1983 Sequence database searching algorithm (Wilbur-Lipman)

• 1985 FASTP/FASTN: fast sequence similarity searching

• 1988 National Center for Biotechnology Information (NCBI) created at NIH/NLM

• 1988 EMBnet network for database distribution

• 1990 BLAST: fast sequence similarity searching

• 1991 EST: expressed sequence tag sequencing

• 1993 Sanger Centre, Hinxton, UK

• 1994 EMBL European Bioinformatics Institute, Hinxton, UK

Milestones of Bioinformatics (cont’d)

• 1995 First bacterial genomes completely sequenced

• 1996 Yeast genome completely sequenced

• 1997 PSI-BLAST

• 1998 Worm (multicellular) genome completely sequenced

• 1999 Fly genome completely sequenced

Milestones of Bioinformatics (cont’d)

• Human Genome Project (1990-2003)

• Mouse 2002

• Rat 2004

• Chimpanzee 2005

• Completed Genomes

Chimpanzee Genome

The Primate Family Tree

Source: Nature

A New Book

Published by Springer in 2009

(ISBN 978-1848003194)

Sequence Comparison: Theory and Methods
by Kun-Mao Chao and Louxin Zhang