![Page 3: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/3.jpg)
“Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still be done to collect data and create the tools to analyse it. Bioinformatics, which provides the tools to extract and combine knowledge from isolated data, gives us ways to think about the vast amounts of information now available. It is changing the way biologists do science.”
A report to Harold Varmus, June 3 1999.
![Page 4: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/4.jpg)
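10^3 bytes = 1 kilobyte
10^6 bytes = 1 megabyte
10^9 bytes = 1 gigabyte
10^12 bytes = 1 terabyte
10^15 bytes = 1 petabyte
10^18 bytes = 1 exabyte
10^21 bytes = 1 zettabyte
10^24 bytes = 1 yottabyte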
![Page 5: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/5.jpg)
GAATTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGGGCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCACTGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTTAGCATCGAAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGAGATAGGAGAAGAAAACAGAACAACAGCAAATACTGAGCCCAAATGAGCGATAGATAGATAGATCGTGCGGCGATCTCGTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGGTTCTGGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGAGATAGGAGAAGAAAACAGAACAACAGCAAATACTGAGCCCAAATGAGCGATAGATAGATAGATCGTGCGGCGATCTCGTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGGTTCTGGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGGGCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCACTGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTTAGCATCGAAAAGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCC
TTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGGGCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCACTGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTTAGCATCGAAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGAGATAGGAGAAGAAAACAGAACAACAGCAAATACTGTGCGGCGATCTCGTACTGGACGGAAATGTCAGGAGATAGGAGAAGAAAA
![Page 6: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/6.jpg)
Nucleotide sequence database.
[Figure: database size in megabases (axis 0 to 600) by year, 1982 to 1996.]
![Page 7: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/7.jpg)
The Human Proteome
• ~ 30,000 protein coding genes
• Expansion of the number of different protein molecules due to:
– (a) alternative splicing (30 to 50% increase);
– (b) post-translational modifications (5 to 10 fold increase)
• There could well be about 1 million different protein molecules in the human body
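The figures on this slide can be combined in a rough worked estimate. The sketch below assumes the two expansion factors simply multiply, which is an assumption of the example and not something the slide states. Under that assumption the product falls somewhat below the slide's round figure of about 1 million, which is best read as an order-of-magnitude estimate.

```python
# Rough multiplicative estimate using only the figures quoted on the slide.
# Assumption (not from the slide): the two expansion factors multiply.
GENES = 30_000                 # ~30,000 protein-coding genes
SPLICING_PERCENT = (30, 50)    # alternative splicing: 30 to 50% increase
PTM_FOLD = (5, 10)             # post-translational modifications: 5 to 10 fold


def estimate_protein_forms(genes, splicing_percent, ptm_fold):
    """Multiply the gene count by both expansion factors (integer arithmetic)."""
    return genes * (100 + splicing_percent) * ptm_fold // 100


low = estimate_protein_forms(GENES, SPLICING_PERCENT[0], PTM_FOLD[0])
high = estimate_protein_forms(GENES, SPLICING_PERCENT[1], PTM_FOLD[1])
print(f"{low:,} to {high:,} distinct protein forms")
```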
![Page 10: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/10.jpg)
Annotated genome
[Figure: annotation plotted as depth of knowledge against breadth of knowledge.]
• Depth: detailed analysis (typically biological) of single genes
• Breadth: large-scale analysis (typically computational) of entire genome
![Page 11: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/11.jpg)
The two major methods of gene prediction
• sequence comparison
• ab initio
![Page 12: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/12.jpg)
Approaches to gene finding: Generalized hidden Markov models
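As a hedged illustration of the idea behind this slide, the sketch below runs Viterbi decoding on an ordinary two-state hidden Markov model. It is not a generalized HMM: a generalized model lets each state emit a whole variable-length segment, whereas here each state emits one base. The state names, the start, transition and emission probabilities, and the test sequence are all invented for the example.

```python
import math

STATES = ("noncoding", "coding")
# All probabilities below are invented for illustration only.
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},  # AT-rich
    "coding": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},     # GC-rich
}


def viterbi(seq):
    """Return the most probable state path for seq (log-space Viterbi)."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers[s] = best
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][base]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


path = viterbi("ATATATGCGCGCGCATATAT")
print("".join("C" if s == "coding" else "-" for s in path))
```

Real gene finders use many more states (exons, introns, splice sites, intergenic regions) and parameters trained on known genes; the two-state model only shows how a labelling of the sequence falls out of the decoding step.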
![Page 13: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/13.jpg)
Limitations of Gene Prediction Programs
• Good at predicting ORF-containing sequence
• Prediction of exact exon-intron boundaries difficult
• Fuse & split genes
• Cannot predict UTRs
• Cannot predict nested genes
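To make the first point concrete, here is a minimal sketch of an open reading frame (ORF) scan: find an ATG and read in frame until a stop codon. It is a simplification written for this example, not the method of any program named in the talk. It looks at the forward strand only, and the `min_codons` threshold and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Yield (start, end) of forward-strand ORFs: ATG ... in-frame stop.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    min_codons counts codons from the ATG up to but excluding the stop.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield (start, i + 3)
                start = None


example = "CCATGGCTGAAAAATAGCC"
print(list(find_orfs(example)))
```

A scan like this sees only uninterrupted coding sequence, which is why the remaining points on the slide (exon-intron boundaries, UTRs, nested genes) are hard for such approaches.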
![Page 14: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/14.jpg)
Computational Analysis
Fly Alignments
• Known genes/cDNAs
• ESTs
• Transposons
Cross-species Sequence Similarities
Proteins & ESTs
• Fly
• Primate
• Rodent
• Worm
• Yeast
• Plant
• Other Insects
• Other Vertebrates
• Other Invertebrates
Gene Predictions
• Genie
• Genscan
• tRNAscan-SE
![Page 16: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/16.jpg)
Drosophila Gene Collection 1 Pavel Tomancak
![Page 17: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/17.jpg)
• Embryonic expression of wild-type eve (rust) and a transgene containing the stripe 3 + 7 tertiary element (blue)
• Alignment of eve 5’ regulatory region
• D. melanogaster vs (A) D. erecta, (B) D. pseudoobscura, (C) D. willistoni and (D) D. littoralis
[Figure labels: stripe 3 + 7; eve]
![Page 21: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/21.jpg)
Gene_Ontology
FlyBase - Drosophila - Cambridge & EBI, Harvard, Berkeley & Bloomington.
Saccharomyces Genome Database - Stanford.
Mouse Genome Informatics - Jackson Labs.
The Arabidopsis Information Resource - Stanford
WormBase - Caltech & CSHL
DictyBase - Chicago
SwissProt - Hinxton & Geneva
The Institute for Genomic Research - MD
With support from NIH (NHGRI) & AstraZeneca.
![Page 22: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/22.jpg)
The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
![Page 23: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/23.jpg)
What is an Ontology?
An ontology is a specification of a conceptualization that is designed for reuse across multiple applications and implementations. …a specification of a conceptualization is a written, formal description of a set of concepts and relationships in a domain of interest.
Peter Karp (2000) Bioinformatics 16:269
![Page 24: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/24.jpg)
• The Gene Ontology Consortium subscribes to the Manifesto of Liberation Bioinformatics:
• Open source
• Open standards
• Open annotation
• Open data
• Thanks to Tim Hubbard, liberationist extraordinaire of Hinxton
![Page 25: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/25.jpg)
Introduction to GO
GO: A Gene Ontology
GO Objectives:
Provide a controlled vocabulary for the description of the molecular function and cellular location of gene products, as well as the role of the gene products in basic biological processes
Use these terms as attributes of gene products in the collaborating databases
Allow queries across databases using GO terms, providing the linking of biological information across species
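The third objective can be sketched as a lookup keyed on GO terms. The example below is a toy under stated assumptions: the record layout, the placeholder gene names (`gene_a` and so on) and the term assignments are all hypothetical and are not taken from any of the collaborating databases.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, gene product, GO term).
# Gene names and term assignments are illustrative, not curated data.
ANNOTATIONS = [
    ("FlyBase", "gene_a", "DNA binding"),
    ("FlyBase", "gene_b", "nucleus"),
    ("SGD", "gene_c", "DNA binding"),
    ("MGI", "gene_d", "DNA binding"),
    ("MGI", "gene_e", "vacuole"),
]


def build_index(annotations):
    """Group (database, gene product) pairs under each GO term."""
    index = defaultdict(list)
    for database, product, term in annotations:
        index[term].append((database, product))
    return index


def query(index, term):
    """Return every (database, gene product) annotated with the term."""
    return sorted(index.get(term, []))


index = build_index(ANNOTATIONS)
for database, product in query(index, "DNA binding"):
    print(database, product)
```

Because every database uses the same term for the same concept, one query returns gene products from several species without any per-database translation step.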
![Page 26: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/26.jpg)
GO = Three Ontologies
• Biological Process = goal or objective within cell
• Molecular Function = elemental activity or task
• Cellular Component = location or complex
![Page 27: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/27.jpg)
Parent-Child Relationships
• Hierarchy: one-to-many parental relationship. Each child has only one parent.
• Directed acyclic graph (DAG): many-to-many parental relationship. Each child may have one or more parents.
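The distinction between the two structures can be checked mechanically. The sketch below assumes terms are given as (parent, child) pairs; the function names and the small example graphs are made up for illustration.

```python
def parents_of(edges):
    """Map each term to the set of its parents, given (parent, child) pairs."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
        parents.setdefault(parent, set())
    return parents


def is_hierarchy(edges):
    """True if every child has at most one parent."""
    return all(len(p) <= 1 for p in parents_of(edges).values())


def is_acyclic(edges):
    """True if no term is its own ancestor (depth-first search)."""
    parents = parents_of(edges)
    done, in_progress = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in in_progress:
            return False  # reached a term already on the current path
        in_progress.add(node)
        ok = all(visit(p) for p in parents[node])
        in_progress.discard(node)
        done.add(node)
        return ok

    return all(visit(n) for n in parents)


tree = [("cell", "nucleus"), ("cell", "cytoplasm"),
        ("nucleus", "nuclear membrane")]
dag = tree + [("membrane", "nuclear membrane")]   # second parent added
cyclic = dag + [("nuclear membrane", "cell")]     # deliberately broken

print(is_hierarchy(tree), is_hierarchy(dag), is_acyclic(dag), is_acyclic(cyclic))
```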
![Page 28: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/28.jpg)
Classes of parent-child relationship:
• ISA (hyponymy) - as in: an elephant is a mammal.
• PARTOF (meronymy) - as in: a trunk is part of an elephant.
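The two relationship classes can be stored as typed edges and followed upward. In this sketch the first two triples come from the slide's examples; the third triple, the `is_a` and `part_of` labels, and the `ancestors` helper are additions made for the example.

```python
# (child, relationship, parent) triples. The first two restate the slide's
# examples; the third is an added, hypothetical one.
RELATIONS = [
    ("elephant", "is_a", "mammal"),
    ("trunk", "part_of", "elephant"),
    ("mammal", "is_a", "animal"),
]


def ancestors(term, relations, allowed=("is_a", "part_of")):
    """Return all terms reachable upward from term via the allowed types."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child, rel, parent in relations:
            if child == current and rel in allowed and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("trunk", RELATIONS)))              # both types
print(sorted(ancestors("trunk", RELATIONS, ("is_a",))))   # is_a only
print(sorted(ancestors("elephant", RELATIONS, ("is_a",))))
```

Mixing the two types needs care when reading the result: reaching "mammal" from "trunk" means the trunk is part of something that is a mammal, not that a trunk is a mammal.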
![Page 29: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/29.jpg)
Structure of the Ontologies
cellular_component
 %membrane
  %vacuolar membrane
  %nuclear membrane
 %intracellular
 %cell
  <cytoplasm
  <vacuole
   <vacuolar membrane
   <vacuolar lumen
  <nucleus
   <nuclear membrane
[Figure: the same terms drawn as a graph: cellular_component, membrane, intracellular, cell, cytoplasm, vacuole, vacuolar membrane, vacuolar lumen, nucleus, nuclear membrane.]
Key: instance of (%), part of (<).
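A listing in this notation can be turned into relationship triples with a short parser. The sketch assumes one term per line with the number of leading spaces giving the depth; that layout, the shortened sample listing and the function name are assumptions of the example, since the slide text does not preserve the original indentation.

```python
# Assumed layout: one term per line, leading spaces = depth,
# "%" = instance of, "<" = part of. The sample is a shortened, hypothetical
# listing built from terms that appear on the slide.
FLAT_FILE = """\
cellular_component
 %membrane
  %vacuolar membrane
 %cell
  <vacuole
   <vacuolar membrane
   <vacuolar lumen
"""

SYMBOLS = {"%": "instance of", "<": "part of"}


def parse(text):
    """Return (child, relationship, parent) triples from indented lines."""
    triples = []
    stack = []  # stack[d] holds the most recent term seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = len(line) - len(line.lstrip(" "))
        entry = line.strip()
        if entry[0] in SYMBOLS:
            rel, name = SYMBOLS[entry[0]], entry[1:]
        else:
            rel, name = None, entry  # root line carries no symbol
        del stack[depth:]            # drop terms at this depth or deeper
        if rel is not None and stack:
            triples.append((name, rel, stack[-1]))
        stack.append(name)
    return triples


triples = parse(FLAT_FILE)
for child, rel, parent in triples:
    print(f"{child} --{rel}--> {parent}")
```

In the parsed output "vacuolar membrane" ends up with two parents, one by each relationship type, which is the many-to-many case that requires a directed acyclic graph.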
![Page 30: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/30.jpg)
Content of GO (September 13 2002)
• molecular function: 5232 terms
• biological process: 6416 terms
• cellular component: 1111 terms
• all: 12,759 terms
• definitions: 7735 (61%)
![Page 35: “Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still](https://reader036.vdocuments.net/reader036/viewer/2022070411/56649ccf5503460f9499b759/html5/thumbnails/35.jpg)
Thank yous
• Genome annotation: Colleagues in the European and Berkeley Drosophila Genome Projects.
• FlyBase: Colleagues in Harvard, Berkeley, Bloomington & Cambridge.
• Gene Ontology: Colleagues in Berkeley, Jackson Labs, Stanford and EBI.