Post on 24-Jan-2016


Introduction to Bioinformatics

Roderic Guigó i Serra

roderic.guigo@crg.cat

Bioinformatics, UPF, academic year 2010-

Training the next generation of Biologists

Roderic Guigó, roderic.guigo@crg.cat

Center for Genomic Regulation, Barcelona

US-EC Workshop on Marine Genomics, Washington DC fall 2010

Why should “next-generation biologists” be trained differently from biologists of previous generations?

• The impact of technology on the way we do Biology

Technology is not new to Biology

In 1676 his [Van Leeuwenhoek's] credibility was questioned when he sent the Royal Society a copy of his first observations of microscopic single-celled organisms. Heretofore, the existence of single-celled organisms was entirely unknown … The Royal Society arranged to send an English vicar, as well as a team of respected jurists and doctors, to Delft, Holland, to determine whether it was in fact Van Leeuwenhoek's ability to observe and reason clearly … (Wikipedia)

Two moments in the second half of the past century

1. Sequencing (Sanger et al.)

ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTG

Two moments in the second half of the past century

2. Multiplexing, automating, …
– Surveying many things at once
– Surveying whole systems

Biology is transitioning (at least partially)

from an “analytic” science, in which the real world is dissected into its elemental components in order to be comprehended,

to a “synthetic” science, in which the challenge is the integration of global information on the living cell/individual/population/(eco)system.

From analytic to synthetic

Biology, a science in which effort has traditionally been directed towards data acquisition, has in a very short time become a discipline in which data are obtained with almost no human intervention, and the effort is turning towards data analysis.

From data acquisition to data analysis

DNA microarrays

Sequencing Evolution/Revolution

1990: thousand bases/day

2000: million bases/day

2010: billion bases/day

Further Evolution of Large-scale Genome Sequencing

• 2000: Human genome working drafts
• Data unit of approximately 10x coverage of the human genome
– Took 10 years and cost about $3 billion
• 2008: Major genome centers can sequence the same number of base pairs every 4 days
• 2009: Every 4 hours ($25,000)
• 2010: Every 14 minutes ($5,000)
• 1000 Genomes Project launched
• World-wide capacity dramatically increasing
• An Illumina HiSeq2000 machine produces 200 gigabases per 8-day run (BGI has ordered 128)

Slide from Paul Flicek. EBI Bioinformatics Advisory Council
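The throughput figures on this slide can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a haploid human genome of roughly 3 gigabases, so that the "data unit" of about 10x coverage is about 30 gigabases, and it assumes all 128 machines run in parallel at the quoted rate. The function name `minutes_per_data_unit` is made up for this example. Under these assumptions the result comes out close to the "every 14 minutes" figure, although the slide does not say how that figure was derived.

```python
# Assumption (not from the slide): a haploid human genome is ~3 gigabases,
# so ~10x coverage is taken to be ~30 gigabases.
HUMAN_GENOME_GB = 3.0
COVERAGE = 10
DATA_UNIT_GB = HUMAN_GENOME_GB * COVERAGE

# Figures quoted on the slide for the Illumina HiSeq2000.
GB_PER_RUN = 200.0
DAYS_PER_RUN = 8.0


def minutes_per_data_unit(machines: int) -> float:
    """Minutes needed to produce one data unit with `machines` running in parallel."""
    gb_per_day = machines * GB_PER_RUN / DAYS_PER_RUN
    days = DATA_UNIT_GB / gb_per_day
    return days * 24 * 60


if __name__ == "__main__":
    for n in (1, 128):
        print(f"{n:>3} machine(s): {minutes_per_data_unit(n):.1f} minutes per data unit")
```

With a single machine the same data unit takes a little over a day, which is why the size of the installed fleet matters as much as the per-machine rate.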

Moore's law
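To put the comparison with Moore's law in numbers, the sketch below computes the doubling time implied by the sequencing rates quoted earlier (a thousand bases/day in 1990, a million in 2000, a billion in 2010). The two-year doubling period used for Moore's law is an assumption taken from its common formulation, not from the slides, and the function name is illustrative.

```python
import math


def doubling_time_years(rate_start: float, rate_end: float, years: float) -> float:
    """Doubling time implied by exponential growth from rate_start to rate_end."""
    return years * math.log(2) / math.log(rate_end / rate_start)


# Sequencing throughput from the slides: a 1000-fold increase per decade.
sequencing = doubling_time_years(1e3, 1e6, 10)

# Assumption: Moore's law taken as one doubling every 2 years.
MOORE_DOUBLING_YEARS = 2.0
moore_factor_per_decade = 2 ** (10 / MOORE_DOUBLING_YEARS)

if __name__ == "__main__":
    print(f"Sequencing doubling time: {sequencing:.2f} years")
    print(f"Growth per decade, sequencing: {1e6 / 1e3:.0f}x")
    print(f"Growth per decade, Moore's law (assumed): {moore_factor_per_decade:.0f}x")
```

Under these assumptions sequencing throughput doubles roughly once a year, about twice as fast as the assumed Moore's-law pace.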

Sequencing challenges

• Sequencing to survey dynamics of ecosystems
• Metagenomes
– Ecosystems (environmental, individual)
• Other species genomes
• Reference Human Genome
• Individual genomes
• Individual meta-genomes
• Within-individual genomic diversity
• Sequencing as the read-out of experiments
– ChIP-Seq, nucleosome positioning, …
• RNA sequencing as a proxy to the cell’s phenotype

ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCTGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAA
AACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC

In summary

• Intrinsic symbolic/computational nature of biological (genomic) data
• Emphasis on synthesis (rather than, or in addition to, analysis)
• Exponential data production
– Separated from human intervention
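The symbolic nature of genomic data means that a DNA sequence can be handled directly as a string over the alphabet A, C, G, T. As a minimal sketch of that idea (the function names are illustrative, and the fragment is simply the first 30 bases of the sequence shown on the slides), the code below computes GC content and the reverse complement with plain string operations.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # First 30 bases of the sequence shown on the slides.
    fragment = "ACTCAGCCCCAGCGGAGGTGAAGGACGTCC"
    print(len(fragment), "bases, GC fraction", round(gc_content(fragment), 2))
    print("reverse complement:", reverse_complement(fragment))
```

The sketch only handles the four unambiguous bases; real sequence files also contain symbols such as N, which would need their own entries in the table.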

Bioinformatics

Articles in Medline with the keyword "Bioinformatics".

year          # articles
up to 1990    0
1990-1994     15
1995-1999     823
2000-2004     7827
2005-2008     18822
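Because the last period in the Medline counts is shorter than the others, the raw totals understate how fast the field was still growing. The sketch below normalises the counts to articles per year; the period lengths (five inclusive years, four for 2005-2008) are an assumption read off the period labels, and the names are illustrative.

```python
# (period label, assumed number of years in the period, article count from the slide)
MEDLINE_COUNTS = [
    ("1990-1994", 5, 15),
    ("1995-1999", 5, 823),
    ("2000-2004", 5, 7827),
    ("2005-2008", 4, 18822),
]


def articles_per_year(counts):
    """Return (label, average articles per year) for each period."""
    return [(label, total / years) for label, years, total in counts]


if __name__ == "__main__":
    previous = None
    for label, rate in articles_per_year(MEDLINE_COUNTS):
        growth = "" if previous is None else f"  ({rate / previous:.1f}x the previous period)"
        print(f"{label}: {rate:.1f} articles/year{growth}")
        previous = rate
```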

Bioinformatics, Genomics, Systems Biology in Medline

Bioinformatics

Google search: X-informatics (11 June 2007)

bioinformatics      14,100,000
chemoinformatics    226,000
astroinformatics    195
neuroinformatics    364,000
socioinformatics    610
geoinformatics      506,000
meteoinformatics    48
econoinformatics    441
ecoinformatics      160,000

Engineering and biology: increasingly interconnected

• Improved technologies to survey Biological Systems
– NGS and the like [technological fluency]
• Engineering of Biological Systems
– Medicine
– New and modified biological systems
• Using Biology to build non-biological systems
– DNA computing

Biology has changed and it is changing

• Quantitative thinking
• Ability to attack unanticipated problems

Biology requires quantitative thinking

• Statistics
• Mathematics
• Computer Science
• …

and programming skills (Unix)

• The ability to interrogate data and to model systems

Two ideas

• Biology, a discipline in which effort has traditionally been devoted to obtaining data, has in a short time become a discipline in which data are obtained almost automatically, and the effort has shifted towards data analysis.

• Bioinformatics, more than another (sub)discipline of Biology (such as biochemistry, genetics, botany, …), is a discipline that permeates all of Biology; it is a way of doing Biology; in many cases, the only way of doing Biology.

• Many biological processes can be understood as computations, almost sensu stricto.
