20160208 introduction to bioinformatics - utrecht...
2/8/16
1
Introduction to Bioinformatics
Bas E. Dutilh
Systems Biology: Bioinformatic Data Analysis
Utrecht University, February 8th 2016
Info and documentation
• http://tbb.bio.uu.nl/BDA/
• http://www.google.com/ http://www.wikipedia.org/
– … but only for guidance and hints: never take the internet for granted
• Campbell Biology, 9th or 10th edition, Pearson
• Reader
– Printed in black and white
– Download full color PDF at: http://tbb.bio.uu.nl/BDA/BioInf2016.pdf
– Errata: http://tbb.bio.uu.nl/BDA/errata.html
Course evaluation
• Final course mark
– 40% mark of Bioinformatic Data Analysis
• Bas Dutilh
– 10% mark of Basic Maths
• Kirsten ten Tusscher
– 50% mark of Mathematics/Theoretical Biology
• Kirsten ten Tusscher and Rob de Boer
• Bioinformatic Data Analysis exam
– Written exam
– “Cheat sheet” allowed: one hand-written A4, double-sided is OK
– Date: March 14th 2016 at 13:30-16:30 in Educatorium Gamma
• Bioinformatic Data Analysis bonus point
– Complete all exercises and have them signed by your assistant
• This has to be done in the same week as the practical
• In case of emergency: last chance to sign off is on Monday before lecture
– The maximum mark is a 10
– Mini-article was cancelled
How would you figure out the function of a protein?
Knock-out mouse
X-ray structure
Activity assay
BLAST search
How about for all proteins in a genome?
Genome sizes
Tb: Tera base pairs (10^12)
Gb: Giga base pairs (10^9)
Mb: Mega base pairs (10^6)
Kb: Kilo base pairs (10^3)
Chaos chaos (1.4 Tb, Friz 1968)
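The unit prefixes above can be applied mechanically. The following is a minimal sketch, not part of the original slides; the function name `format_genome_size` and the choice to print at most six significant digits are assumptions made for illustration.

```python
# Unit table taken from the slide: largest unit first so the first match wins.
UNITS = [("Tb", 10**12), ("Gb", 10**9), ("Mb", 10**6), ("Kb", 10**3)]


def format_genome_size(bp: int) -> str:
    """Express a genome size in base pairs using the largest fitting unit.

    Hypothetical helper for illustration only.
    """
    for suffix, factor in UNITS:
        if bp >= factor:
            # ":g" drops trailing zeros, so 3.0 is printed as "3"
            return f"{bp / factor:g} {suffix}"
    return f"{bp} bp"


if __name__ == "__main__":
    # Sizes quoted elsewhere in these slides
    for size in (1_400_000_000_000, 3_000_000_000, 580_000, 1_800):
        print(size, "->", format_genome_size(size))
```

Ordering the table from largest to smallest unit keeps the lookup a simple first-match loop.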
Gene density and non-coding DNA
• Mammals (including humans) have the lowest gene density
– Number of genes in a given length of DNA
• Introns within genes
• Noncoding DNA between genes
Components of the human genome
• 20,000 – 25,000 protein-coding genes (1.5%)
• Introns (25.9%)
• Transposable elements (44.7%)
– DNA transposons
– Long terminal repeat (LTR) retrotransposons
– Short interspersed nuclear elements (SINEs)
– Long interspersed nuclear elements (LINEs)
– Endogenous retroviruses
– Miniature inverted repeat transposable elements (MITEs)
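To get a feel for what these percentages mean in base pairs, they can be applied to a genome size. This is a rough sketch, assuming the 3 Gb human genome size quoted elsewhere in these slides; the names `COMPONENTS_PER_MILLE` and `component_sizes` are hypothetical, and the "other" category is simply whatever the three listed shares leave over.

```python
# Assumption: 3 Gb human genome, as quoted in the "Human genome" slide.
GENOME_SIZE_BP = 3_000_000_000

# Shares from the slide, stored in tenths of a percent so the
# arithmetic stays in exact integers.
COMPONENTS_PER_MILLE = {
    "protein-coding genes": 15,    # 1.5%
    "introns": 259,                # 25.9%
    "transposable elements": 447,  # 44.7%
}


def component_sizes(genome_size_bp: int = GENOME_SIZE_BP) -> dict:
    """Approximate size in bp of each listed component (illustrative only)."""
    sizes = {
        name: genome_size_bp * share // 1000
        for name, share in COMPONENTS_PER_MILLE.items()
    }
    # Everything the slide does not list explicitly
    sizes["other (not listed)"] = genome_size_bp - sum(sizes.values())
    return sizes


if __name__ == "__main__":
    for name, bp in component_sizes().items():
        print(f"{name}: {bp:,} bp")
```

Under these assumptions the protein-coding share comes to about 45 Mb, while transposable elements account for well over 1 Gb.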
Largest genomes
Largest sequenced genome: Loblolly pine (Pinus taeda) 20,000,000,000 bp (20 Gb)
Kinugasasō (Paris japonica) 149,000,000,000 bp (149 Gb)
Smallest genomes
• Eukaryota
– Free-living: Ostreococcus tauri (12.6 Mb)
– Endosymbiont: Encephalitozoon intestinalis (2.3 Mb)
• Bacteria and Archaea
– Free-living: Mycoplasma genitalium (580 kb)
– Endosymbiont: Cand. Carsonella ruddii (160 kb)
• Viruses
– Circoviridae (1.8 kb, only two proteins!)
Human genome
• 3,000,000,000 bp (3 Gb)
• Human Genome Project (HGP)
– 1990-2003
– Draft genome sequence complete in 2000
• Reference genome
– Source: blood (female) and sperm (male)
– Samples taken from many donors, but only a few were used to protect donor identities
– Sequence is not from one individual
• >70% from one male donor
• Cost HGP: $3,000,000,000
– Target: $1,000 genome
Genetic diversity
• Phylogenetic Tree of Life
[Figure labels: Prokaryotes, Bacteria, Archaea, Eukaryotes]
Genome sequencing
Cloned genomes
Segments in known order
Fragment and sequence
Assemble sequences
Consensus genome
Whole Genome Shotgun (WGS) approach
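The "fragment, sequence, assemble" idea can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest suffix-prefix overlap. This is only a sketch under strong assumptions (error-free reads, one strand, no repeats); real assemblers use overlap or de Bruijn graphs and deal with sequencing errors and reverse complements. The function names and the example reads are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge reads greedily by longest overlap; returns the remaining contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: several contigs remain
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    # Hypothetical reads sampled from the made-up sequence ATGGCGTACGTTAGC
    print(greedy_assemble(["ACGTTAGC", "ATGGCGTA", "GCGTACGTT"]))
```

The greedy strategy is easy to follow but can misassemble repeated regions, which is one reason real genomes need more careful methods.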
Personal genome sequences
Craig Venter
James Watson
Reference Genome
~5,000,000 differences
~2,000,000 differences
~5,000,000 differences
Your personal genome sequence
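To put such difference counts in perspective, they can be expressed as a fraction of the genome. The sketch below assumes the 3 Gb genome size quoted in these slides and counts only substitutions between two already aligned, equal-length sequences; real genome comparisons also involve insertions, deletions, and structural variants. Both function names are hypothetical.

```python
# Assumption: 3 Gb human genome, as quoted in the "Human genome" slide.
GENOME_SIZE_BP = 3_000_000_000


def count_differences(a: str, b: str) -> int:
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def fraction_different(n_differences: int,
                       genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Fraction of genome positions that differ."""
    return n_differences / genome_size_bp


if __name__ == "__main__":
    print(count_differences("ACGTACGT", "ACGAACGT"))   # toy example
    print(f"{fraction_different(5_000_000):.2%}")       # ~5,000,000 differences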
So we have a $200 personal genome…
• … now the million dollar question is:
What can I learn from my 3,000,000,000 A’s, C’s, G’s, and T’s?
Personalized medicine
• From reactive to proactive medicine
– Identify high risk alleles
– Adapt lifestyle (e.g. risk of high blood pressure)
– Preventive screening or treatment (e.g. risk of cancer)
• Pharmacogenomics:
– Impact of genetic variation on response to medication
Sergey Brin
Co-founder
LRRK2 polymorphism on chromosome 12
- 28% risk of Parkinson’s at age 59
- 51% at age 69
- 74% at age 79
Co-investor
Biology is Big Data science
[Plot axis: # sequenced genomes]
Moore's Law: computer power doubles every ~2 years.
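A doubling time translates directly into a growth factor over a given period. The sketch below takes the ~2 year doubling time from the slide as its default; the function name `growth_factor` is made up for illustration.

```python
def growth_factor(years: float, doubling_time_years: float = 2.0) -> float:
    """Factor by which a quantity grows if it doubles every doubling_time_years."""
    return 2.0 ** (years / doubling_time_years)


if __name__ == "__main__":
    # One decade at a 2-year doubling time: 2^5
    print(growth_factor(10))
    # The 13 years of the Human Genome Project (1990-2003): 2^6.5
    print(round(growth_factor(13), 1))
```

Under this assumption, computer power grew roughly 90-fold over the 13 years of the Human Genome Project.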
Omics sciences
• The suffix -ome refers to a totality of some sort
[Figure labels: DNA, RNA, Protein]
• Gene (genetics), Genome, Genomics
• Transcript (RNA), Transcriptome, Transcriptomics
• Protein, Proteome, Proteomics
• Metabolite, Metabolome, Metabolomics
• Lipid, Lipidome, Lipidomics
• Microbe, Microbiome, Microbiomics (?!)
Genomics
• Identify differences in gene content between genomes
• Discover new species: “Biological Dark Matter”
• Analyze genome evolution
• Predict gene functions
Chordata ↔ Echinodermata
1,000,000,000,000 species on earth?
10,000 species cultured
30,000 genomes sequenced
Sample
Filter
Microbes or viruses
Metagenomics
Spang et al., Nature 2015
Metagenomic discovery of Lokiarchaeota
Genetic diversity
• Phylogenetic Tree of Life
[Figure labels: Prokaryotes, Bacteria, Archaea, Eukaryotes]
Image: Lisa Brown for
Human microbiome and virome
• In your body:
~10^13 human cells
~10^14 bacteria
~10^15 viruses
Bioinformatics
• Bioinformatics: study of informatic processes in biotic systems
Paulien Hogeweg and Ben Hesper (Utrecht University, 1970)
• Bioinformatic Data Analysis: using computational methods to analyze biological data
Bioinformatics in Utrecht today
Bring your laptop