Introduction to Bioinformatics
Roderic Guigó i Serra
Bioinformatics, UPF, academic year 2010-
Training the next generation of Biologists
Roderic Guigó, Centre for Genomic Regulation, Barcelona
US-EC Workshop on Marine Genomics, Washington DC fall 2010
Why should “next generation biologists” be trained differently from biologists of previous generations?
• The impact of technology on the way we do Biology
Technology is not new to Biology
In 1676 [Van Leeuwenhoek's] credibility was questioned when he sent the Royal Society a copy of his first observations of microscopic single celled organisms. Heretofore, the existence of single celled organisms was entirely unknown … The Royal Society arranged to send an English vicar, as well as a team of respected jurists and doctors to Delft, Holland to determine whether it was in fact Van Leeuwenhoek's ability to observe and reason clearly … (Wikipedia)
Two moments in the second half of the past century
1. Sequencing (Sanger et al.)
[Slide illustration: a long stretch of raw DNA sequence]
2. Multiplexing, automating, …
– Surveying many things at once
– Surveying whole systems
Biology is transitioning (at least partially)
from an “analytic” science, in which the real world
is dissected into its elementary components in order to be understood,
to a “synthetic” science, in which the challenge is
the integration of global information on the living cell/individual/population/(eco)system.
From analytic to synthetic
Biology, a science in which the effort has traditionally been directed towards data acquisition, has in a very short time become a discipline in which data are obtained with almost no human intervention, and the effort is shifting towards data analysis.
From data acquisition to data analysis
DNA microarrays
Sequencing Evolution/Revolution
1990: thousand bases/day
2000: million bases/day
2010: billion bases/day
• 2008: Major genome centers can sequence the same number of base pairs as the 2000 human genome working drafts every 4 days
• 1000 Genomes Project launched
• World-wide capacity dramatically increasing
Further Evolution of Large-scale Genome Sequencing
• 2000: Human genome working drafts
• Data unit: approximately 10x coverage of the human genome
– 10 years, at a cost of about $3 billion
• 2009: Every 4 hours ($25,000)
• 2010: Every 14 minutes ($5,000)
• The Illumina HiSeq 2000 produces 200 gigabases per 8-day run (BGI has ordered 128 of these machines)
Slide from Paul Flicek. EBI Bioinformatics Advisory Council
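To make the throughput figures above concrete, here is a back-of-the-envelope calculation. It is a sketch under stated assumptions: the haploid human genome size of about 3.0 gigabases is an assumed round figure that does not appear on the slide, and the constant and function names are hypothetical; the run yield, run length, 10x coverage and machine count are taken from the slide.

```python
# Back-of-the-envelope sequencing throughput, using the figures on the slide.
# ASSUMPTION: haploid human genome of ~3.0 gigabases (not stated on the slide).
GENOME_GB = 3.0          # gigabases, assumed round figure
COVERAGE = 10            # "approximately 10x coverage" data unit (slide)
RUN_YIELD_GB = 200.0     # HiSeq 2000 yield per run, in gigabases (slide)
RUN_DAYS = 8             # length of one run, in days (slide)
MACHINES = 128           # machines ordered by BGI (slide)


def throughput_summary():
    """Return derived throughput figures as a dict (all values in gigabases or counts)."""
    unit_gb = GENOME_GB * COVERAGE               # gigabases in one 10x "data unit"
    per_machine_day = RUN_YIELD_GB / RUN_DAYS    # gigabases per machine per day
    units_per_run = RUN_YIELD_GB / unit_gb       # 10x human data units per run
    fleet_per_day = per_machine_day * MACHINES   # gigabases per day, whole fleet
    return {
        "unit_gb": unit_gb,
        "per_machine_day_gb": per_machine_day,
        "units_per_run": units_per_run,
        "fleet_per_day_gb": fleet_per_day,
    }


if __name__ == "__main__":
    for name, value in throughput_summary().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions a single machine yields 25 gigabases per day, one run holds roughly six to seven 10x human data units, and 128 machines running continuously would produce about 3,200 gigabases (3.2 terabases) per day.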
Moore's law
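The "Sequencing Evolution/Revolution" figures (a thousand, a million and a billion bases per day in 1990, 2000 and 2010) imply a doubling time that can be set against Moore's law, commonly quoted as transistor counts doubling roughly every two years. The sketch below assumes steady exponential growth between the endpoints, which is a simplification, and treats Moore's law as an exact two-year doubling; the function names are illustrative.

```python
import math

# Bases sequenced per day, from the "Sequencing Evolution/Revolution" slide.
THROUGHPUT = {1990: 1e3, 2000: 1e6, 2010: 1e9}


def doubling_time_years(year0, value0, year1, value1):
    """Doubling time, assuming steady exponential growth between two data points."""
    doublings = math.log2(value1 / value0)
    return (year1 - year0) / doublings


def growth_factor(years, doubling_years):
    """Total fold-increase after `years` at a fixed doubling time."""
    return 2 ** (years / doubling_years)


if __name__ == "__main__":
    seq = doubling_time_years(1990, THROUGHPUT[1990], 2010, THROUGHPUT[2010])
    print(f"Sequencing doubling time: {seq:.2f} years")
    # ASSUMPTION: Moore's law modelled as a doubling every 2 years.
    print(f"Moore's law over 20 years: {growth_factor(20, 2.0):,.0f}-fold")
    print(f"Sequencing over 20 years: {THROUGHPUT[2010] / THROUGHPUT[1990]:,.0f}-fold")
```

With these numbers sequencing throughput doubles about once a year, roughly twice as fast as the two-year Moore's law rate, which over the same 20 years would give about a thousand-fold increase against the million-fold one on the slide.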
Sequencing challenges
• Sequencing to survey the dynamics of ecosystems
• Metagenomes
– Ecosystems (environmental, individual)
• Genomes of other species
• Reference human genome
• Individual genomes
• Individual metagenomes
• Within-individual genomic diversity
• Sequencing as the read-out of experiments
– ChIP-seq, nucleosome positioning, …
• RNA sequencing as a proxy for the cell's phenotype
[Slide illustration: a long stretch of raw DNA sequence]
In summary
• Intrinsically symbolic/computational nature of biological (genomic) data
• Emphasis on synthesis (instead of, or in addition to, analysis)
• Exponential data production
– Decoupled from human intervention
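The first point of the summary, the symbolic nature of genomic data, can be made concrete: a DNA sequence like the one displayed on the slides is simply a string over the alphabet A, C, G, T, and basic biological questions become string computations. This sketch, with illustrative function names, computes GC content, the reverse complement and k-mer counts on the first 20 bases of that sequence; it assumes upper-case input without ambiguity codes such as N.

```python
from collections import Counter

# Complement of each nucleotide (upper-case A/C/G/T only; no ambiguity codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Fraction of positions that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Sequence of the opposite strand, read in the 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


def kmer_counts(seq, k):
    """Count every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    snippet = "ACTCAGCCCCAGCGGAGGTG"  # first 20 bases of the sequence on the slides
    print(gc_content(snippet))
    print(reverse_complement(snippet))
    print(kmer_counts(snippet, 2).most_common(3))
```

The same three functions apply unchanged to a sequence of any length, which is the point of the slide: once data are symbols, analysis is computation.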
Bioinformatics
Medline articles with the keyword “Bioinformatics”
Period        Articles
Until 1990    0
1990-1994     15
1995-1999     823
2000-2004     7,827
2005-2008     18,822
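The periods in this table do not all have the same length (the last one covers only four years), so raw counts understate the most recent growth. A small sketch with hypothetical names normalizes the counts to articles per year and computes the running total; the "until 1990" row (0 articles) is left out because its start year is not given.

```python
# Medline articles with the keyword "Bioinformatics", per period (from the table).
# Each entry: (label, first year, last year, number of articles).
PERIODS = [
    ("1990-1994", 1990, 1994, 15),
    ("1995-1999", 1995, 1999, 823),
    ("2000-2004", 2000, 2004, 7827),
    ("2005-2008", 2005, 2008, 18822),
]


def articles_per_year(periods):
    """Average number of articles per year within each period."""
    return {label: count / (last - first + 1) for label, first, last, count in periods}


def running_total(periods):
    """Cumulative number of articles at the end of each period."""
    total, out = 0, []
    for label, _, _, count in periods:
        total += count
        out.append((label, total))
    return out


if __name__ == "__main__":
    for label, rate in articles_per_year(PERIODS).items():
        print(f"{label}: {rate:,.1f} articles/year")
    print(running_total(PERIODS)[-1])
```

Normalized this way, the rates are 3.0, 164.6, 1,565.4 and 4,705.5 articles per year, for a total of 27,487 articles up to 2008.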
Bioinformatics, Genomics, Systems Biology in Medline
Bioinformatics
Google search: X-informatics (11 June 2007), number of hits
bioinformatics 14,100,000
chemoinformatics 226,000
astroinformatics 195
neuroinformatics 364,000
socioinformatics 610
geoinformatics 506,000
meteoinformatics 48
econoinformatics 441
ecoinformatics 160,000
Engineering and biology: increasingly interconnected
• Improved technologies to survey biological systems
– NGS and the like [technological fluency]
• Engineering of biological systems
– Medicine
– New and modified biological systems
• Using biology to build non-biological systems
– DNA computing
Biology has changed, and it keeps changing
• Quantitative thinking
• Ability to attack unanticipated problems
Biology requires quantitative thinking
• Statistics
• Mathematics
• Computer Science
• …
and programming skills (Unix)
• The ability to interrogate data and to model systems
Two ideas
• Biology, a discipline in which effort has traditionally been devoted to obtaining data, has in a short time become a discipline in which data are obtained almost automatically, and the effort has shifted towards data analysis.
• Bioinformatics, rather than being yet another (sub)discipline of Biology (such as biochemistry, genetics, botany, …), is a discipline that permeates all of Biology; it is a way of doing Biology and, in many cases, the only way of doing Biology.
• Many biological processes can be understood as computations almost sensu stricto.
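The last idea, that some biological processes are computations almost in the strict sense, can be illustrated with translation: the ribosome reads an mRNA three bases at a time and maps each codon to an amino acid through a fixed table. This sketch encodes the standard genetic code (in DNA letters, T instead of U) and stops at the first stop codon. It is a toy under stated assumptions: a single reading frame starting at the first base, no start-codon search, no alternative genetic codes or ambiguity handling; the names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: one-letter amino acids for all 64 codons, in the order
# TTT, TTC, TTA, TTG, TCT, ..., GGG (TCAG ordering at each position); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence in frame 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGAAAGGGTAACCC"))  # ATG AAA GGG, then stop codon TAA: prints MKG
```

A dictionary lookup fits because the genetic code is a finite mapping; generating it from the compact 64-letter string avoids typing out every codon by hand.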