cisc 436/636 computational biology &bioinformatics (fall...

42
CISC 436/636 Computational Biology &Bioinformatics (Fall 2016) Lecture 1 Course Overview Li Liao Computer and Information Sciences University of Delaware

Upload: others

Post on 17-Jun-2020

23 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 436/636 Computational Biology &Bioinformatics

(Fall 2016) Lecture 1

Course Overview

Li Liao

Computer and Information Sciences

University of Delaware

Page 2: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Administrative stuff

Webpage: http://www.cis.udel.edu/~lliao/cis636f16

Syllabus and tentative schedule (check frequently for update)

Office hours: 12:45PM-1:45PM Tuesdays and Thursdays. Appointments

Collect info (name, email, major, programming language)

Introduce textbook and other resources

URLs, PDF/PS files, or hardcopy handout

A reading list

Workload

4 homework assignments (hands-on to learn the nuts and bolts)

• Language issue: Perl is strongly recommended (A tutorial is provided)

Mid-term and final exams

Late policy: 10% off per class and no later than one week.

Page 3: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Bioinformatics Books

Markketa Zvelebil and Jeremy Baum, Understanding Bioinformatics, Garland Science, 2008.

Dan E. Krane & Michael L. Raymer, Fundamental Concepts of Bioinformatics, Benjamin Cummings 2002

R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.

João Meidanis & João Carlos Setubal. Introduction to Computational Molecular Biology. PWS Publishing Company, Boston, 1996.

Peter Clote and Rolf Backofen, Computational Molecular Biology: An Introduction, Willey 2000.

Dan Gusfield. Algorithms on String, Trees, and Sequences. Cambridge University Press, 1997.

P. Baldi and S. Brunak, Bioinformatics, The Machine Learning Approach, The MIT press, 1998.

D.W. Mount, Bioinformaics: Sequence and Genome Analysis, CSHLP 2004.

Page 4: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Molecular Biology Books

Free materials:

Kimball's biology

Lawrence Hunter: Molecular biology for

computer scientists

DOE’s Molecular Genetics Primer

Books:

Instant Notes series: Biochemistry,

Molecular Biology, and Genetics

Molecular Biology of The Cell, by Alberts

et al

Page 5: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Bioinformatics

- use and develop computing methods to solve biological

problems

The field is characterized by

an explosion of data

difficulty in interpreting the data

a large number of open problems

until recently, relative lack of sophistication of

computational techniques (compared with, say, signal

processing, graphics, etc.)

Page 6: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Why is this course good for you?

Expand your knowledge base

Bioinformatics is a computational wing of

biotechnology.

Computational techniques in bioinformatics

are widely useful elsewhere

- Cybergenomics: detect, dissect malware

Job market is strong

Page 7: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Industry is moving in

IBM:

BlueGene, the fastest computer with 1 million CPU

Blueprint worldwide collects all the protein information

Bioinformatics segment will be $40 billion in 2004 up from

$22 billion in 2000

GlaxoSmithKline

Celera

Merck

AstraZeneca

Page 8: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Computing and IT skills

Algorithm design and model building

Working with unix system/Web server

Programming (in PERL, Java, etc.)

RDBMS: SQL, Oracle PL/SQL

Page 9: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

People

International Society for Computational

Biology (www.iscb.org) ~ 2500 members

Severe shortage for qualified

bioinformatians

Page 10: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Conferences

ISMB (Intelligent Systems for Molecular

Biology) started in 1992

RECOMB (International Conference

on Computational Molecular Biology) started

in 1997

PSB (Pacific Symposium on Biocomputing)

started 1996

TIGR Computational genomic, started in

1997

...

Page 11: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Journals

Bioinformatics

BMC Bioinformatics

Journal of Computational Biology

Genome Biology

Genomics

Genome Research

Nucleic Acids Research

...

Page 12: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

A short history: 2000 -- 2010

Page 13: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Human Genome at Ten:

Page 14: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 15: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Human Genome at Ten:

Page 16: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Human Genome at Ten:

Page 17: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Human Genome at Ten:

Page 18: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 19: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 20: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 21: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 22: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

It is “much easier” to teach people with those skills

about biology than to teach biologists how to code

well.

Page 23: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:
Page 24: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 25: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 26: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 27: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 28: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Organisms: three kingdoms -- eukaryotes, eubacteria, and archea

Cell: the basic unit of life

Chromosome (DNA)

> circular, also called plasmid when small (for bacteria)

> linear (for eukaryotes)

Genes: segments on DNA that contain the instructions for organism's structure and function

Proteins: the workhorse for the cell.

> establishment and maintenance of structure

> transport. e.g., hemoglobin, and integral transmembrane proteins

> protection and defense. e.g., immunoglobin G

> Control and regulation. e.g., receptors, and DNA binding proteins

> Catalysis. e.g., enzymes

Page 29: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

Bioinformatics in a … cell

CISC 636, F16, Lec1, Liao

Page 30: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

Protein-Protein Interaction plays essential

roles in cellular processes

CISC 636, F16, Lec1, Liao

Page 31: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Small molecules:

> sugar: carbohydrate

> fatty acids

> nucleotides: A, C, G, T --> DNA (double helix, hydrogen bond, complementary bases A-T, G-C)

four bases: adenine, cytosine, guanine, and thymidine (uracil)

5' end phosphate group

3' end is free

1' position is attached with the base

double strand DNA sequences form a helix via hydrogen bonds between complementary bases

hydrogen bond:

- weak: about 3~5 kJ/mol (A covalent C-C bond has 380 kJ/mol), will break when heated

- saturation:

- specific:

Page 32: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

atcgggctatcgatagctatagcgcgatatatcgcgcgtatatgcgcgcatattag tagctagtgctgattcatctggactgtcgtaatatatacgcgcccggctatcgcgct atgcgcgatatcgcgcggcgctatataaatattaaaaaataaaatatatatatatgc tgcgcgatagcgctataggcgcgctatccatatataggcgctcgcccgggcgcga tgcatcggctacggctagctgtagctagtcggcgattagcggcttatatgcggcga gcgatgagagtcgcggctataggcttaggctatagcgctagtatatagcggctagc cgcgtagacgcgatagcgtagctagcggcgcgcgtatatagcgcttaagagcca aaatgcgtctagcgctataatatgcgctatagctatatgcggctattatatagcgca gcgctagctagcgtatcaggcgaggagatcgatgctactgatcgatgctagagca gcgtcatgctagtagtgccatatatatgctgagcgcgcgtagctcgatattacgcta cctagatgctagcgagctatgatcgtagca…………………………………….

Hack the life…

Page 33: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Genetic Code: codons

Page 34: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Information Expression

1-D information array 3-D biochemical structure

Page 35: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Challenges in Life Sciences

Understanding correlation between genotype

and phenotype

Predicting genotype <=> phenotype

Phenotypes:

drug/therapy response

drug-drug interactions for expression

drug mechanism

interacting pathways of metabolism

Page 36: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

Credit:Kellis & Indyk CISC 636, F16, Lec1, Liao

Profiling

Structure Prediction 14

Page 37: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 38: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Page 39: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Topics

Mapping and assembly

Sequence analysis (Similarity -> Homology): Pairwise alignment (database searching)

Multiple sequence alignment, profiling

Gene prediction

Pattern (Motif) discovery and recognition

Phylogenetics analysis Character based

Distance based

Probabilistic

Structure prediction RNA Secondary

Protein Secondary & tertiary

Network analysis: Metabolic pathways reconstruction

PPI network

Regulatory networks (Gene expression)

Page 40: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

How should I learn this course?

Come to the class, do homework assignments, reading assignments, and ask questions!

Nuts and Bolts: A lot of facts, new terminologies, models and algorithms

A typical approach to study almost any subject

> what is already known? (what is the state of the art, so you won't reinvent the wheel)

> what is unknown?

o Known unknowns

o unknown unknowns

Page 41: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

How much should I know about biology?

- Apparently, the more the better

- The least, Pavzner's 3-page "All you need to know about Molecular biology".

- Chapters 1 & 2 of the text.

- We adopt an "object-oriented" scheme, namely, we will transform biological problems into abstract computing problems and hide unnecessary details.

So another big goal of this course is learn how to do abstraction.

Page 42: CISC 436/636 Computational Biology &Bioinformatics (Fall ...lliao/cis636f16/cis636_lec1_overview.pdf · R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis:

CISC 636, F16, Lec1, Liao

Goals?

At the end of this course, you should be able to

- Describe the main computational challenges in

molecular biology.

- Implement and use basic algorithms.

- Describe several advanced algorithms.

Sequence alignment using dynamics programming

Hidden Markov models

Hierarchical clustering

K-means

Gradient descent optimization

Monte Carlo simulation

- Know the existing resources: Databases, Software, …