
Introduction to Bioinformatics

Roderic Guigó i Serra, [email protected]

Bioinformatics, UPF, Course 2010-


Training the next generation of Biologists

Roderic Guigó, [email protected]

Center for Genomic Regulation, Barcelona

US-EC Workshop on Marine Genomics, Washington DC fall 2010


Why should “next generation biologists” be trained differently from biologists of previous generations?

• The impact of technology on the way we do Biology


Technology is not new to Biology

“In 1676 [Van Leeuwenhoek's] credibility was questioned when he sent the Royal Society a copy of his first observations of microscopic single-celled organisms. Heretofore, the existence of single-celled organisms was entirely unknown … The Royal Society arranged to send an English vicar, as well as a team of respected jurists and doctors, to Delft, Holland, to determine whether it was in fact Van Leeuwenhoek's ability to observe and reason clearly …” (Wikipedia)


Two moments in the second half of the past century

1. Sequencing (Sanger et al.)

ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTG


Two moments in the second half of the past century

2. Multiplexing, automating, …
– Surveying many things at once
– Surveying whole systems


From analytic to synthetic

Biology is transitioning (at least partially)

from an “analytic” science, in which the real world is dissected into its elemental components in order to be comprehended,

to a “synthetic” science, in which the challenge is the integration of global information on the living cell/individual/population/(eco)system.


From data acquisition to data analysis

Biology, a science in which the effort has traditionally been directed towards data acquisition, has become in a very short time a discipline in which data are obtained with almost no human intervention, and the effort is turning towards data analysis.


DNA microarrays


Sequencing Evolution/Revolution

1990: thousand bases/day

2000: million bases/day

2010: billion bases/day
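
As a rough worked example, the three throughput figures above can be turned into a doubling time. This is only a sketch: it assumes steady exponential growth between the quoted years, and the function name `doubling_time_years` is made up for illustration. For comparison, Moore's law is usually quoted as a doubling roughly every two years (a commonly cited figure, not taken from the slides).

```python
import math

# Throughput figures quoted on the slide (bases per day, order of magnitude).
throughput = {1990: 1e3, 2000: 1e6, 2010: 1e9}

def doubling_time_years(y0, v0, y1, v1):
    """Years needed for throughput to double, assuming steady exponential growth."""
    return (y1 - y0) * math.log(2) / math.log(v1 / v0)

years = sorted(throughput)
for start, end in zip(years, years[1:]):
    t = doubling_time_years(start, throughput[start], end, throughput[end])
    print(f"{start}-{end}: throughput doubles about every {t:.2f} years")
```

A thousand-fold increase per decade corresponds to about ten doublings, that is, roughly one doubling per year.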


Further Evolution of Large-scale Genome Sequencing

• 2000: Human genome working drafts

• Data unit of approximately 10x coverage of human
– 10 years and cost about $3 billion

• 2008: Major genome centers can sequence the same number of base pairs every 4 days

• 1000 Genomes Project launched

• World-wide capacity dramatically increasing

• 2009: Every 4 hours ($25,000)

• 2010: Every 14 minutes ($5,000)

• Illumina HiSeq2000 machine produces 200 gigabases per 8 day run (BGI have ordered 128)

Slide from Paul Flicek. EBI Bioinformatics Advisory Council
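
The "every 14 minutes" figure can be sanity-checked against the HiSeq2000 numbers on the same slide. The sketch below is one plausible reading, not a calculation given in the slides: it assumes a haploid human genome of about 3 gigabases (so a 10x data unit is about 30 gigabases) and that all 128 machines run in parallel at the quoted output. The function name `minutes_per_data_unit` is hypothetical.

```python
GENOME_SIZE_GB = 3.0    # assumption: haploid human genome, about 3 gigabases
COVERAGE = 10           # "10x coverage" data unit from the slide
RUN_OUTPUT_GB = 200     # HiSeq2000 output per run (slide)
RUN_LENGTH_DAYS = 8     # length of one run (slide)
MACHINES = 128          # machines ordered by BGI (slide)

def minutes_per_data_unit(machines):
    """Minutes needed to produce one 10x human data unit on `machines` sequencers."""
    data_unit_gb = GENOME_SIZE_GB * COVERAGE
    gb_per_day = machines * RUN_OUTPUT_GB / RUN_LENGTH_DAYS
    return data_unit_gb / gb_per_day * 24 * 60

print(f"1 machine:    {minutes_per_data_unit(1) / 60:.1f} hours")
print(f"{MACHINES} machines: {minutes_per_data_unit(MACHINES):.1f} minutes")
```

Under these assumptions, 128 machines deliver one data unit in roughly the quarter of an hour quoted for 2010.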


Moore's law


Sequencing challenges

• Sequencing to survey dynamics of ecosystems
• Metagenomes
– Ecosystems (environmental, individual)
• Other species genomes
• Reference Human Genome
• Individual genomes
• Individual meta-genomes
• Within-individual genomic diversity
• Sequencing as the read-out of experiments
– ChIP-Seq, nucleosome positioning, …
• RNA sequencing as a proxy to the cell’s phenotype


ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCTGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAA
AACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC


In summary

• Intrinsic symbolic/computational nature of biological (genomic) data

• Emphasis on synthesis (rather than, or in addition to, analysis)

• Exponential data production
– Separated from human intervention
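
To illustrate the first point, a DNA sequence can be handled directly as a string of symbols. The sketch below uses the first 30 bases of the sequence shown on the Sanger sequencing slide; the helper names `gc_content` and `reverse_complement` are chosen here for illustration and do not come from the slides.

```python
# First 30 bases of the sequence shown on the Sanger sequencing slide.
SEQ = "ACTCAGCCCCAGCGGAGGTGAAGGACGTCC"

# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def reverse_complement(seq):
    """Sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

print(f"GC content: {gc_content(SEQ):.2f}")
print("Reverse complement:", reverse_complement(SEQ))
```

Both operations are plain string manipulations, which is what makes genomic data so amenable to computation.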


Bioinformatics

Articles in Medline with the keyword "Bioinformatics".

year          # articles
up to 1990    0
1990-1994     15
1995-1999     823
2000-2004     7827
2005-2008     18822
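
Because the last period listed above spans four years rather than five, the raw counts understate the most recent growth. A small sketch that normalises the counts to articles per year (the helper name `articles_per_year` is hypothetical, and the open-ended "up to 1990" row is left out because its length is undefined):

```python
# (first year, last year, article count) as listed in the table.
periods = [
    (1990, 1994, 15),
    (1995, 1999, 823),
    (2000, 2004, 7827),
    (2005, 2008, 18822),
]

def articles_per_year(first, last, count):
    """Average number of articles per year over an inclusive range of years."""
    return count / (last - first + 1)

rates = [articles_per_year(*p) for p in periods]
for (first, last, _), rate in zip(periods, rates):
    print(f"{first}-{last}: {rate:.1f} articles/year")
```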


Bioinformatics, Genomics, Systems Biology in Medline


Bioinformatics

Google search: X-informatics (11 June 2007)

bioinformatics      14,100,000
chemoinformatics    226,000
astroinformatics    195
neuroinformatics    364,000
socioinformatics    610
geoinformatics      506,000
meteoinformatics    48
econoinformatics    441
ecoinformatics      160,000


Engineering and biology: increasingly interconnected

• Improved technologies to survey Biological Systems
– NGS and the like [technological fluency]

• Engineering of Biological Systems
– Medicine
– New and modified biological systems

• Using Biology to build non-biological systems
– DNA computing


Biology has changed and it is changing

• Quantitative thinking
• Ability to attack unanticipated problems


Biology requires quantitative thinking

• Statistics
• Mathematics
• Computer Science
• …

and programming skills (unix)

• The ability to interrogate data, and to model systems


Two ideas

• Biology, a discipline in which the effort has traditionally been devoted to obtaining data, has in a short time become a discipline in which data are obtained almost automatically, and the effort has shifted towards data analysis.

• Bioinformatics, rather than being another (sub)discipline of Biology (such as biochemistry, genetics, botany, …), is a discipline that permeates all of Biology; it is a way of doing Biology; in many cases, the only way of doing Biology.

• Many biological processes can be understood as computations, almost sensu stricto.