(near term) develop database requirements to yield schema and interfaces
DESCRIPTION
(near term) Develop Database Requirements to Yield Schema and Interfaces. MoBIoS: Database Management for Data in Metric Spaces. Daniel P. Miranker, Univ. of Texas. What we know for sure: Exploit Commodity Architecture. External Data/DB Sources. Web App Server. Curating New Content.
![Page 1: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/1.jpg)
1. (near term) Develop Database Requirements to Yield Schema and Interfaces
2. MoBIoS: Database Management for Data in Metric Spaces
Daniel P. Miranker
Univ. of Texas
![Page 2: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/2.jpg)
What we know for sure: Exploit Commodity Architecture
DB
Curating New Content
Computing Grid
Web App Server
External Data/DB Sources
Users
![Page 3: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/3.jpg)
Repository Schema and Interface Definitions
Issue:
• Database organization and data interchange should be addressed simultaneously
• Once established, difficult to change
Best to get this right the first time.
![Page 4: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/4.jpg)
What we know for sure:
DB Schema
Curating New Content
Computing Grid
Web App Server
1. Data transfer: XML & Nexus files
2. Curate: (manage quality)
Users
Both 1 & 2 impact the schema (data provenance)
![Page 5: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/5.jpg)
XML and Bioinformatics
• Taxonomic Markup Language (TML)
• PhyloML
• BEAST: Bayesian Evolutionary Analysis Sampling Trees
• AGAVE: Architecture for Genomic Annotation Visualization and Exchange
![Page 6: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/6.jpg)
Answers Start with a Requirements Analysis
• Who
• What
• Why
• How
“Use cases”: specific examples of what is to be accomplished
![Page 7: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/7.jpg)
A Head Start
Requirements of Phylogenetic Databases (with Nakhleh, Barbancon, Piel & Donoghue) [BIBE ’03]
• Did a requirements analysis
• Proof of concept for a correctly normalized database schema
1 evolutionary (tree)-edge = 1 row in the database
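A minimal sketch of the "one edge = one row" idea, using Python's built-in `sqlite3`. The table and column names (`edge`, `tree_id`, `parent`, `child`, `branch_length`) and the sample tree are illustrative assumptions, not the schema from the BIBE ’03 paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: every edge of every tree is exactly one row.
conn.execute("""
    CREATE TABLE edge (
        tree_id       INTEGER NOT NULL,
        parent        TEXT    NOT NULL,
        child         TEXT    NOT NULL,
        branch_length REAL,
        PRIMARY KEY (tree_id, child)  -- each node has at most one parent
    )
""")

# The tree ((A,B)n1,C)root has four edges, hence four rows.
conn.executemany(
    "INSERT INTO edge VALUES (?, ?, ?, ?)",
    [
        (1, "root", "n1", 0.5),
        (1, "root", "C", 1.2),
        (1, "n1", "A", 0.3),
        (1, "n1", "B", 0.4),
    ],
)


def descendants(conn, tree_id, node):
    """Return all nodes below `node`, found with a recursive query."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(n) AS (
            SELECT child FROM edge WHERE tree_id = ? AND parent = ?
            UNION ALL
            SELECT e.child FROM edge e JOIN sub ON e.parent = sub.n
            WHERE e.tree_id = ?
        )
        SELECT n FROM sub ORDER BY n
        """,
        (tree_id, node, tree_id),
    ).fetchall()
    return [r[0] for r in rows]


edge_count = conn.execute("SELECT COUNT(*) FROM edge").fetchone()[0]
clade_n1 = descendants(conn, 1, "n1")
```

With edges as rows, a clade lookup becomes an ordinary (recursive) query rather than a parse of a tree file.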
![Page 8: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/8.jpg)
Who is interested in using Phylogenies?
• Casual Users
• Visualization
• Study Development
• Super-tree algorithms
• Simulation Studies
• Parameter Derivation
• Comparative Genomics
![Page 9: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/9.jpg)
Super-Tree Algorithms Use-Cases
Construct phylogenies by assembling existing studies
Collect those studies by:
• Determine minimum spanning clade for a set of taxa
• Find all phylogenies sufficiently similar to a given phylogeny
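The first query above can be sketched as follows, assuming a tree held as a child-to-parent map (one entry per edge). The function names and the sample taxa are hypothetical; this is one straightforward way to find the smallest clade containing a set of taxa, not necessarily how the repository would do it.

```python
def ancestors(parent, node):
    """Path from `node` up to the root, including `node` itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def minimum_spanning_clade(parent, taxa):
    """Return the deepest node whose subtree contains every taxon in `taxa`."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("need at least one taxon")
    common = set(ancestors(parent, taxa[0]))
    for taxon in taxa[1:]:
        common &= set(ancestors(parent, taxon))
    # Walking up from any taxon, the first shared ancestor is the deepest one.
    for node in ancestors(parent, taxa[0]):
        if node in common:
            return node
    raise ValueError("taxa do not share a tree")


def clade_leaves(parent, root):
    """List the leaves (taxa) under `root`, sorted by name."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    stack, leaves = [root], []
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if kids:
            stack.extend(kids)
        else:
            leaves.append(node)
    return sorted(leaves)


# Illustrative tree: (((human,chimp)n2,mouse)n1,chicken)root
PARENT = {
    "n1": "root", "chicken": "root",
    "n2": "n1", "mouse": "n1",
    "human": "n2", "chimp": "n2",
}

clade = minimum_spanning_clade(PARENT, ["human", "mouse"])
members = clade_leaves(PARENT, clade)
```

Here the clade spanning human and mouse is rooted at `n1`, and it also pulls in chimp, which is the kind of result a super-tree builder would use to collect overlapping studies.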
![Page 10: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/10.jpg)
Requirements of Phylogenetic Databases
![Page 11: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/11.jpg)
The MoBIoS Project
Molecular Biological Information System
Daniel P. Miranker
University of Texas
![Page 12: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/12.jpg)
MoBIoS – A Simple Idea
Organize the Storage Manager Around Metric Space Indexing
• Relational Databases: B+ trees, 1 dimensional
• Spatial Databases: R & K-D trees, 2 & 3 dimensions
• Metric Databases: VP, M & GNAT trees, no dimensions or very high dimensions
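To make the metric-index idea concrete, here is a small vantage-point (VP) tree with a range query. It is a textbook-style sketch, not MoBIoS code: the class and function names, the Hamming distance, and the sample sequences are all assumptions chosen for illustration.

```python
import random


def hamming(a, b):
    """Hamming distance between equal-length strings (a true metric)."""
    return sum(x != y for x, y in zip(a, b))


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside


def build(points, dist, rng):
    """Pick a vantage point, split the rest at the median distance, recurse."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0, None, None)
    dists = [dist(vp, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def range_query(node, query, r, dist, out=None):
    """Collect every indexed point within distance r of `query`."""
    out = [] if out is None else out
    if node is None:
        return out
    d = dist(query, node.point)
    if d <= r:
        out.append(node.point)
    # The triangle inequality lets us skip a branch that cannot hold a match.
    if d - r <= node.radius:
        range_query(node.inside, query, r, dist, out)
    if d + r > node.radius:
        range_query(node.outside, query, r, dist, out)
    return out


SEQS = ["AACCGGTT", "AACCGGTA", "TTGGCCAA", "AACCGGAA", "TTGGCCAT", "ACGTACGT"]
tree = build(SEQS, hamming, random.Random(0))
hits = sorted(range_query(tree, "AACCGGTT", 1, hamming))
```

The index needs only a distance function, never coordinates, which is why it applies to data with "no dimensions" such as sequences or trees.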
![Page 13: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/13.jpg)
Biological queries conducted with sequential scans.
• Sequence (BLAST)
• Phylogenies (Tree of Life)
• Mass Spectra (Proteomics)
• Ligand Docking (Rational Drug Design)
![Page 14: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/14.jpg)
Metric Space is
• a pair, M = (D, d), where
• D is a set of points
• d is a [metric] distance function with the following properties:
– d(x, y) = d(y, x) (symmetry)
– d(x, y) > 0 for x ≠ y, d(x, x) = 0 (non-negativity)
– d(x, y) <= d(x, z) + d(z, y) (triangle inequality)
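The three properties can be checked by brute force on a small sample, which is a quick way to test whether a candidate distance is a metric. The sketch below uses edit distance as an example of a function that passes and squared difference as one that fails; both choices, and the helper name `violations`, are illustrative assumptions.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def violations(points, d):
    """Return every breach of the metric properties found on `points`."""
    bad = []
    for x, y, z in product(points, repeat=3):
        if d(x, y) != d(y, x):
            bad.append(("symmetry", x, y))
        if d(x, y) < 0 or (x == y) != (d(x, y) == 0):
            bad.append(("non-negativity", x, y))
        if d(x, y) > d(x, z) + d(z, y):
            bad.append(("triangle", x, y, z))
    return bad


seq_check = violations(["AACC", "AACG", "TACG", "ACG"], edit_distance)

# Squared difference is symmetric and non-negative but breaks the
# triangle inequality: (0 - 2)^2 = 4 > (0 - 1)^2 + (1 - 2)^2 = 2.
sq_check = violations([0, 1, 2], lambda x, y: (x - y) ** 2)
```

A passing sample does not prove a function is a metric, but a single violation is enough to rule it out for metric-space indexing.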
![Page 15: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/15.jpg)
Can Biology Be Modeled by Metrics?
• Already metrics, re:
– Phylogenetic trees
– Ligand docking
• First Biologically Effective Metric Model of Amino Acid Substitution [Xu & Miranker 03]. In effect, precisely the phylogenetic relationships among sequences are exploited to form a database index.
• Metrics for proteomic mass-spectra underway
![Page 16: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/16.jpg)
MoBIoS Architecture (Molecular Biological Information System)
phylogenies
![Page 17: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/17.jpg)
First Application (with Randy Linder)
Compared:
{entire Arabidopsis genome} x {“entire” rice genome}
to determine conserved pairs of primer pairs,
in O(m log n). Will repeat the study again soon, faster.
![Page 18: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/18.jpg)
When biological data is put into an RDBMS
• Primary data is stored in text or blob fields
– Annotations may be relational
• Data retrieval
– Filter DB, sequential dump, O(n), to utilities
• E.g. BLAST, TreeBASE, Sequest
| Organism | Function | Sequence (BLOB) |
|----------|----------|-----------------|
| Yeast    | membrane | AACCGGTTT       |
| Yeast    | mitosis  | TATCGAAA        |
| E. coli  | membrane | AGGCCTA         |
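The retrieval pattern described on this slide can be sketched with `sqlite3`: the database filters on the relational annotations, then hands every surviving sequence to an external utility that scans them one by one. The table name `gene` is hypothetical, the sequence is stored as text for simplicity, and `difflib.SequenceMatcher` is only a stand-in for a real tool such as BLAST.

```python
import sqlite3
from difflib import SequenceMatcher

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (organism TEXT, function TEXT, sequence TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [
        ("Yeast", "membrane", "AACCGGTTT"),
        ("Yeast", "mitosis", "TATCGAAA"),
        ("E. coli", "membrane", "AGGCCTA"),
    ],
)


def scan(conn, organism, query):
    """Filter by annotation in SQL, then score every remaining row: O(n)."""
    rows = conn.execute(
        "SELECT function, sequence FROM gene WHERE organism = ?", (organism,)
    )
    # The database cannot index the sequence content itself, so each
    # candidate is compared against the query outside the database.
    scored = [(SequenceMatcher(None, query, seq).ratio(), fn) for fn, seq in rows]
    return sorted(scored, reverse=True)


yeast_hits = scan(conn, "Yeast", "AACCGGTTT")
```

The relational filter narrows the rows, but the similarity search still touches every one of them, which is the cost a metric-space index aims to avoid.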
![Page 19: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/19.jpg)
Homework: Due tomorrow morning
1. Who are you, (generically)?
2. Use case involving the database
![Page 20: (near term) Develop Database Requirements to Yield Schema and Interfaces](https://reader035.vdocuments.net/reader035/viewer/2022062409/56815186550346895dbfbef9/html5/thumbnails/20.jpg)
Don’t know: A General Web Service
DB Schema
Curating New Content
Computing Grid
Web App Server
ToL Infrastructure @ SDSC
Computing Grid