CS 466: Introduction to Bioinformatics
Saurabh Sinha
What is the course about?
• Algorithmic concepts, applied to sample (toy) problems in “bioinformatics”
  – Follows the textbook
• “Real” bioinformatics research
  – Draws on papers from the best journals
• Not about practical training in the use of popular bioinformatics software
Grading
• Assignments: 40%
  – About one every two weeks
• Midterm: 30%
• Final: 30%
Expectations
• Some programming skills
  – Any programming language is fine
Administrative Details
• Instructor:
  – Saurabh Sinha
  – Room 2122, Siebel Center
  – Email: [email protected]
• Class hours: Tue & Thu, 3:30-4:45 pm, Room 1131, Siebel Center
• Office hours: Tue, before class (2:30-3:30 pm), Room 2122, Siebel Center
• Website: http://veda.cs.uiuc.edu/courses/fa08/cs466/
• You are welcome to sit in even if you are not taking the course for credit
Textbooks
• Jones and Pevzner: required
• Durbin et al.: recommended
Other course
• CS 591 BIO: weekly seminar on biomedical informatics
• Fridays at 10:30 am.
Motivating bioinformatics
Special issue of the journal Science, July 1, 2005.
• What Is the Universe Made Of?
• What Is the Biological Basis of Consciousness?
• Why Do Humans Have So Few Genes?
• To What Extent Are Genetic Variation and Personal Health Linked?
• Can the Laws of Physics Be Unified?
• How Much Can Human Life Span Be Extended?
• What Controls Organ Regeneration?
• How Can a Skin Cell Become a Nerve Cell?
• How Does a Single Somatic Cell Become a Whole Plant?
• How Does Earth's Interior Work?
• Are We Alone in the Universe?
• How and Where Did Life on Earth Arise?
• What Determines Species Diversity?
• What Genetic Changes Made Us Uniquely Human?
• How Are Memories Stored and Retrieved?
• How Did Cooperative Behavior Evolve?
• How Will Big Pictures Emerge from a Sea of Biological Data?
• How Far Can We Push Chemical Self-Assembly?
• What Are the Limits of Conventional Computing?
• Can We Selectively Shut Off Immune Responses?
• Do Deeper Principles Underlie Quantum Uncertainty and Nonlocality?
• Is an Effective HIV Vaccine Feasible?
• How Hot Will the Greenhouse World Be?
• What Can Replace Cheap Oil -- and When?
• Will Malthus Continue to Be Wrong?
Many of the most profound scientific questions of today fall within the realm of bioinformatics research
“Why do humans have so few genes?”
A simple organism
[Figure: a single GENE, with labels “raw materials”, “environmental signal”, and “response (protein)”]
A simple organism
[Figure: three genes, GENE1 to GENE3]
A simple organism
[Figure: ten genes, GENE1 to GENE10]
A complex organism
[Figure: ten genes, GENE1 to GENE10, connected by a complex circuit of interactions]
Regulatory networks
• This may be the reason why humans have so few genes: the circuit, not the number of switches, carries the complexity
• Bioinformatics can unravel such networks, given the genome (DNA sequence) and gene activity information
Decoding the regulatory network
• Find patterns (“motifs”) in DNA sequence that occur more often than expected by chance
• Statistics on DNA sequences and words
• Knowing these can tell us about edges in the regulatory network
Decoding the regulatory network
• An example computational problem:
  – Given a string of length 10,000 over the alphabet {A, C, G, T}
  – Count the number of occurrences N_w of every 6-letter word w
• Are there specific words that occur more frequently than expected by chance?
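The counting step above can be sketched as a sliding window: a string of length L has L - k + 1 overlapping positions for a k-letter word (9,995 positions for L = 10,000 and k = 6). This is a minimal sketch and does not come from the course. The function name `count_words` and the random test string are hypothetical.

```python
import random
from collections import Counter

def count_words(seq: str, k: int = 6) -> Counter:
    """Count N_w for every overlapping k-letter word w in seq (sliding window of width k)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

if __name__ == "__main__":
    random.seed(0)
    # Hypothetical input: a random 10,000-letter DNA string, not real genomic data.
    dna = "".join(random.choice("ACGT") for _ in range(10_000))
    counts = count_words(dna, k=6)
    print(sum(counts.values()), "word positions,", len(counts), "distinct words observed")
    print(counts.most_common(5))
```

A `Counter` stores only the words that actually occur. For k = 6 there are at most 4^6 = 4,096 distinct words, so a dictionary of this size is small.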
Decoding the regulatory network
• What is expected by chance?
• What is “more frequently”?
• These are interesting mathematical questions
• The Moby Dick example
• This is called “motif finding”, and it helps decode the regulatory network!
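One common way to make “expected by chance” concrete is to assume a background model in which each position is drawn independently with base probabilities p(A), p(C), p(G), p(T). Under that model, E[N_w] = (L - k + 1) · p(w_1) ··· p(w_k). With uniform probabilities, L = 10,000 and k = 6, this gives 9,995 / 4,096 ≈ 2.44. To judge “more frequently”, the sketch below treats N_w as roughly Poisson with that mean.

Both the i.i.d. model and the Poisson approximation are assumptions. The approximation is poor for self-overlapping words such as AAAAAA. The word TATAAA and the observed count of 10 are hypothetical.

```python
import math

def expected_count(word: str, seq_len: int, base_probs: dict) -> float:
    """E[N_w] under an i.i.d. background model: (L - k + 1) times the product of base probabilities."""
    return (seq_len - len(word) + 1) * math.prod(base_probs[b] for b in word)

def poisson_tail(observed: int, lam: float) -> float:
    """P(N >= observed) if N ~ Poisson(lam); a rough model that ignores overlapping occurrences."""
    cdf = sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(observed))
    return max(0.0, 1.0 - cdf)

uniform = {b: 0.25 for b in "ACGT"}                # assumed background: all bases equally likely
lam = expected_count("TATAAA", 10_000, uniform)    # 9995 / 4096, about 2.44
print(f"expected {lam:.2f}; P(N >= 10) = {poisson_tail(10, lam):.1e}")
```

A small tail probability means that a word seen 10 times would be surprising under this background model. Real motif finders use richer backgrounds, such as Markov models of DNA, and correct for testing thousands of words at once.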
Comparing DNA
• Humans are about 99.9% identical to each other, DNA-wise.
• How do we know that?
• Compare the genomes of two individuals.
• The computational problem: are two sequences similar?
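A naive answer to “are two sequences similar?” is percent identity: line up two equal-length sequences position by position and count the agreements. The hypothetical sketch below shows why this is not enough. One substitution barely changes the score, but deleting a single base shifts everything after it, and the identity collapses.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree (no gaps allowed)."""
    if len(a) != len(b):
        raise ValueError("position-by-position comparison needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

ref = "ACGTTGCAACGTTGCA"
one_substitution = "ACGTTGCAACGATGCA"   # T -> A at position 12
one_deletion = ref[1:] + "A"            # first base deleted, an A appended to keep the length
print(percent_identity(ref, one_substitution))  # 0.9375 (15 of 16 positions agree)
print(percent_identity(ref, one_deletion))      # 0.25 (only 4 of 16 agree after the shift)
```

The two sequences in the second comparison differ by a single evolutionary event, yet the naive score says they are mostly different. This fragility is what motivates alignment.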
Sequence alignment
• Why is this a problem?
• The two sequences will differ by “substitutions”, “insertions”, and “deletions” accumulated during evolution
• The comparison algorithm has to be robust to such differences
  – A technique called “dynamic programming” handles all of this and is “efficient”: its running time grows with the product of the two sequence lengths
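The dynamic programming idea above can be sketched as Needleman-Wunsch global alignment. S[i][j] holds the best score for aligning the first i letters of one sequence with the first j letters of the other. Each cell is computed from three neighbors (match or substitution, insertion, deletion), so the running time is proportional to len(a) × len(b).

The scores used here (match +1, mismatch -1, gap -1) are arbitrary assumptions. Many practical tools use substitution matrices and affine gap penalties instead.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        """Score for pairing a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # S[i][j] = best score for aligning prefix a[:i] with prefix b[:j]
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        S[0][j] = j * gap          # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i, j),   # match / substitution
                          S[i - 1][j] + gap,             # gap in b (deletion from a)
                          S[i][j - 1] + gap)             # gap in a (insertion into a)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return S[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

score, top, bottom = global_align("GATTACA", "GATACA")
print(score)  # 5: six matches and one gap
print(top)
print(bottom)
```

Applied to a sequence and a copy of it with its first base deleted, the alignment places one gap at each end and recovers 15 matching positions. A position-by-position comparison of the same pair finds only 4.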
Sequence alignment
• Why should we care?
• Compare the human genome with that of a fish. You’ll see some portions that are highly similar.
• These “conserved” portions are often genes…
• … or regulatory sequences! The regulatory network again.
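Once two genomic regions are aligned, one simple way to flag “conserved” portions is to slide a fixed-width window along the alignment and report windows whose identity exceeds a threshold. This approach is hypothetical here. The window width of 10 and the threshold of 0.9 are illustrative assumptions, not values from the lecture, and published conservation methods are considerably more sophisticated.

```python
def conserved_windows(aln_a: str, aln_b: str, width: int = 10, min_identity: float = 0.9) -> list:
    """Start positions of alignment windows whose identity is at least min_identity."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length (gaps written as '-')")
    # True where both rows have the same base; gap columns never count as matches
    same = [x == y and x != "-" for x, y in zip(aln_a, aln_b)]
    hits = []
    for start in range(len(same) - width + 1):
        if sum(same[start:start + width]) / width >= min_identity:
            hits.append(start)
    return hits

# Hypothetical pre-aligned sequences: identical for 10 columns, then completely different.
aln_a = "ACGTACGTAC" + "TTTTTTTTTT"
aln_b = "ACGTACGTAC" + "GGGGGGGGGG"
print(conserved_windows(aln_a, aln_b))  # [0, 1]
```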
On counting genes
• The original question was “Why do humans have so few genes?”
• How do we know how many genes there are in the human genome? (And where they are in the genome?)
• Experiments can be designed, but bioinformatics plays a major role
Gene prediction
• The task of predicting the locations of genes in a new genome (“annotation”)
• Gene prediction software
• The more sophisticated programs use “hidden Markov models” (HMMs) and multiple-species comparison
HMM for Gene Prediction
http://researchweb.watson.ibm.com/journal/rd/453/birney.html
What is this graph?
It captures the “architecture” of a gene.
It translates into a “probabilistic model”.
It leads directly to a gene-finding algorithm.
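To make the chain “graph → probabilistic model → algorithm” concrete, here is a deliberately tiny HMM with two hidden states, non-coding (N) and coding (C). It is paired with the Viterbi algorithm, which finds the most probable state path for a DNA sequence by dynamic programming.

This is not the gene architecture shown in the figure. A real gene model has many more states, such as exons, introns, splice sites, and start and stop codons. Every probability below is a made-up assumption, including the assumption that the coding state is GC-rich.

```python
import math

# Toy two-state HMM; all numbers are illustrative assumptions, not a real gene model.
STATES = ("N", "C")  # N = non-coding, C = coding
START = {"N": 0.9, "C": 0.1}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2},  # assumed AT-rich background
    "C": {"A": 0.2, "T": 0.2, "C": 0.3, "G": 0.3},  # assumed GC-rich "coding" state
}

def viterbi(seq: str) -> str:
    """Most probable hidden-state path for seq, computed with log probabilities."""
    if not seq:
        return ""
    # V[s] = best log-probability of any state path ending in state s at the current position
    V = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for ch in seq[1:]:
        ptr, new_v = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: V[r] + math.log(TRANS[r][s]))
            ptr[s] = prev
            new_v[s] = V[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][ch])
        back.append(ptr)
        V = new_v
    # Start from the best final state and follow the back-pointers to the beginning
    state = max(STATES, key=lambda s: V[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

dna = "AT" * 5 + "GC" * 10 + "AT" * 5
print(viterbi(dna))  # N x10, then C x20, then N x10
```

Working in log space avoids numerical underflow, because multiplying thousands of probabilities smaller than 1 would quickly round to zero.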
“What controls organ regeneration?”
“How does a single somatic cell become a whole plant?”
Development and Regeneration
• Developmental biology
• The timeline from a single cell (with genetic material from mother and father) to a multicellular embryo, and then to an adult
• A paradox: all cells in the adult body have the same DNA, so how can different cells be different?
Regulatory networks again
• Bioinformatics is used to scan entire genomes for regions that participate in “segmenting” the embryo
• Hidden Markov models are used to detect such regions
• Multiple-species comparison aids discovery
“How did cooperative behavior evolve?”
Social behavior and bioinformatics?
• Social behavior in honey bees
• Young worker bees are nurses in the hive; older ones go out to forage
• This behavioral maturation is determined by the needs of the colony. What is its genetic basis?
Social behavior and bioinformatics
• An Illinois team scanned the genome to understand this (2006)
• Regulatory network of social behavior
• Statistical tools, such as the hypergeometric test
• Machine learning tools, such as “support vector machine classification”
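The hypergeometric test asks whether the overlap between two gene sets is larger than chance would produce. Suppose n genes are picked at random from N genes, of which K carry some property (say, a particular motif). How likely is an overlap of at least k? The source does not say how the 2006 study set up its test, so the sketch and the numbers in the usage lines are hypothetical.

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k), where X counts 'marked' genes among n drawn without replacement from N (K marked)."""
    # Sum the probability of every overlap size x from k up to the largest possible overlap
    favorable = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return favorable / comb(N, n)

# Hypothetical numbers: 10,000 genes, 500 carry a motif, 200 are behavior-related, 25 overlap.
p = hypergeom_sf(25, N=10_000, K=500, n=200)
print(f"P(overlap >= 25) = {p:.2e}")  # about 200 * 500 / 10,000 = 10 overlaps expected by chance
```

Python integers have arbitrary precision, so the large binomial coefficients are computed exactly before the final division.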
Other challenges
Protein structure prediction
http://www.denizyuret.com/students/vkurt/thesis-main_dosyalar/image006.gif
Protein structure prediction
• Can we predict the 3-D structure of a protein from its amino acid sequence?
• Why?
  – One good reason: structure gives clues about function. If we can tell the structure, we can perhaps tell the function
  – We could design amino acid sequences that fold into proteins that do what we want them to do: drug design!
• Neural networks, a popular technique in computer science, have been applied to this problem
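One classic formulation applies neural networks to predicting secondary structure (helix, strand, or coil) for each residue, as a step toward 3-D structure. The network sees a window of residues centered on the one being classified, with each residue encoded as a one-hot vector over the 20 amino acids. The sketch below shows that input encoding and a forward pass through a single softmax layer.

The 13-residue window, the random weights, and the protein string are all assumptions. The weights would have to be trained on proteins of known structure before the output meant anything.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
CLASSES = ("H", "E", "C")              # helix, strand, coil

def encode_window(seq: str, center: int, half_width: int = 6) -> list:
    """One-hot encode a (2 * half_width + 1)-residue window; positions past either end stay all-zero."""
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):
            one_hot[AMINO_ACIDS.index(seq[pos])] = 1.0  # raises ValueError on non-standard letters
        features.extend(one_hot)
    return features

def predict(features: list, weights: list, biases: list) -> dict:
    """Single-layer network: one linear score per class, turned into probabilities by softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, biases)]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(CLASSES, exps)}

# Untrained, randomly initialised weights: the output is meaningless until the network is trained.
random.seed(0)
n_inputs = 13 * len(AMINO_ACIDS)
weights = [[random.gauss(0.0, 0.1) for _ in range(n_inputs)] for _ in CLASSES]
biases = [0.0] * len(CLASSES)
protein = "MKTAYIAKQRQISFVKSHFSRQ"   # hypothetical sequence
print(predict(encode_window(protein, center=5), weights, biases))
```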
Metagenomics
• Most studies to date are on the genome of a single species
• A sample of soil contains hundreds of bacterial species and thousands of viruses. Can we study all of these?
• The Sorcerer II expedition
  – http://www.sorcerer2expedition.org/version1/HTML/main.htm
Many more challenges
• New types of data keep arriving thanks to technological breakthroughs in biology
• High-throughput data carry an unprecedented amount of information
• But they also carry a great deal of noise
• Bioinformatics separates the signal from the noise and reveals the underlying biology
Bioinformatics
• Is not about one problem (as with designing better computer chips, better compilers, better graphics, better networks, better operating systems, etc.)
• Is about a family of very different problems, all related to biology and to each other
• How can computers help solve problems in this family?
Bioinformatics and You
• You can learn the tools of bioinformatics
• These tools owe their origin to computer science, information theory, probability theory, statistics, etc.
• You can learn the language of biology, enough to understand what the problems are
• You can apply the tools to these problems and contribute to science