Post on 20-Dec-2015
The Data Ring: Community Content Sharing
Serge Abiteboul (INRIA)
Alkis Polyzotis (UC Santa Cruz)
Data Sharing Communities
• Examples: UCSC genome browser, SwissProt, Flickr
• Interesting data management problem
  – Shared information is heterogeneous
  – Data is distributed and dynamic
  – Lack of central administration
  – Users are not database savvy
Data sharing community: a group of users that share and query information within some domain
The Data Ring
• P2P middleware system that provides:
  – Monitoring
  – Querying
  – …and other database-like services over the distributed information
• Main goal: simplicity of use
Data abstraction in the data ring
• Topological layer
• Physical layer
• External layer
Data abstraction in the data ring
• Declarative query services
• Data and query model based on XML
Topological Layer
Data abstraction in the data ring
• Basic service is distributed query evaluation
• Comprises the overlay network (DHT), physical access structures (indices, replicas, views), and the catalog
Physical Layer
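The slides do not say which DHT the overlay uses. As a minimal sketch of how a catalog entry could be mapped to the peer responsible for it, the following assumes plain consistent hashing; the class CatalogRing, the peer names, and the resource key are all hypothetical.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the hash ring (first 8 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class CatalogRing:
    """Hypothetical catalog lookup: each resource is owned by the next peer
    clockwise from the resource's position on the ring."""

    def __init__(self, peers):
        self._points = sorted((ring_position(p), p) for p in peers)
        self._keys = [pos for pos, _ in self._points]

    def owner(self, resource: str) -> str:
        # Find the first peer position after the resource, wrapping around.
        i = bisect.bisect_right(self._keys, ring_position(resource))
        return self._points[i % len(self._points)][1]


peers = ["peer-a", "peer-b", "peer-c"]  # assumed peer identifiers
ring = CatalogRing(peers)
owner = ring.owner("genes/human")       # assumed catalog key
```

Any peer that knows the membership list computes the same owner, so no central administrator is needed to locate a catalog entry.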
Data abstraction in the data ring
• Provides semantically richer data models
External Layer
Data abstraction in the data ring
• Our focus is on the topological and physical layer
• External layer is equally important and an active research area
Physical Layer
Topological Layer
Thesis #1: formalism for distributed XML data and queries
Distributed XML data and queries
• What made the relational model successful:
  – A logic for describing tables
  – An algebra for query optimization
• We need the equivalent for trees in a distributed context:
  – A logic for describing distributed XML data
  – An algebra for optimizing distributed XML queries
Desiderata for description logic
• Seamless transition between data and services
  – Important for loose data integration
• Support for XML streams
  – Streams are essential for subscription services
  – They are also necessary to support recursion
Starting point: AXML
• AXML: XML tree with embedded web service calls
  – Seamless transition between intensional and extensional data
  – Provides a simple mechanism for loose data integration
• Core concept: XML streams
  – A web service call returns a stream of elements
  – Support for both push and pull semantics
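To make the idea of an XML tree with embedded service calls concrete, here is a minimal sketch. It is not the AXML syntax or runtime: the sc element, its service and param names, the local SERVICES registry, and the sample gene records are assumptions for illustration. A Python generator stands in for a pull-style stream of elements returned by a web service.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator


def recent_genes(organism: str) -> Iterator[ET.Element]:
    """Hypothetical service: a pull-style stream yielding one <gene> at a time."""
    for name in ("BRCA1", "TP53"):  # made-up sample results
        gene = ET.Element("gene", {"organism": organism})
        gene.text = name
        yield gene


# Local registry standing in for remote web services (an assumption).
SERVICES: Dict[str, Callable[..., Iterator[ET.Element]]] = {
    "recent_genes": recent_genes,
}


def materialize(node: ET.Element) -> None:
    """Replace each embedded <sc> call below node with the elements it returns."""
    new_children = []
    for child in list(node):
        if child.tag == "sc":
            service = SERVICES[child.get("service")]
            args = {p.get("name"): p.text for p in child.findall("param")}
            new_children.extend(service(**args))  # intensional -> extensional
        else:
            materialize(child)
            new_children.append(child)
    node[:] = new_children


doc = ET.fromstring(
    "<catalog>"
    '<gene organism="human">MYC</gene>'
    '<sc service="recent_genes"><param name="organism">human</param></sc>'
    "</catalog>"
)
materialize(doc)
names = [g.text for g in doc.findall("gene")]
```

Before materialize runs, the document mixes stored data (the first gene) with a call that only describes data. Afterwards the call has been replaced by its results, so a query sees a single tree either way.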
Desiderata for algebra
• Be amenable to rewrites
• Capture the topology of distributed computation
• Allow seamless transition between logical and physical state
  – Plans may need to be re-optimized in mid-flight
  – It may be necessary to perform partial optimization
  – Error recovery
A proposal based on AXML
• A distributed plan is a workflow of web services … which is exactly an AXML tree
• Components:
  – An encoding of distributed plans in AXML
  – Rewrite rules
• A nice bonus: plans can be readily exchanged between nodes
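The slides do not give the plan encoding or the rewrite rules, so the following is only a sketch of what one rule might look like. It assumes a hypothetical encoding in which select, scan, and sc (a call evaluated at the peer named by its peer attribute) are plan operators, and it shows a single rule that pushes a selection to the peer holding the data.

```python
import xml.etree.ElementTree as ET


def push_selection(plan: ET.Element) -> ET.Element:
    """Rewrite select(sc@peer(body)) into sc@peer(select(body)).

    The selection then runs where the data lives instead of at the caller.
    Plans that do not match the pattern are returned unchanged.
    """
    if plan.tag == "select" and len(plan) == 1 and plan[0].tag == "sc":
        call = plan[0]
        remote_select = ET.Element("select", dict(plan.attrib))
        remote_select.extend(list(call))  # the call's body moves under the select
        new_call = ET.Element("sc", dict(call.attrib))
        new_call.append(remote_select)
        return new_call
    return plan


# Hypothetical plan: filter locally over a scan executed at peer p2.
plan = ET.fromstring(
    '<select pred="organism=human">'
    '<sc peer="p2"><scan collection="genes"/></sc>'
    "</select>"
)
rewritten = push_selection(plan)

# Because the plan is itself XML, it can be serialized and shipped to a peer.
plan_text = ET.tostring(rewritten, encoding="unicode")
```

Since the rewritten plan is an ordinary tree, the same rule can be applied again to a partially executed plan, which is the kind of mid-flight re-optimization the desiderata ask for.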
Disclaimer
• AXML is a starting point, not a panacea
• Bottom line: we need formalisms for distributed XML queries
Thesis #2: autonomic administration
Autonomic administration
• Users are not database experts
  – Typically, scientists with computer experience
• Users are averse to too many “knobs”
• No central authority that is responsible for administration
• Autonomic administration is a necessity, not a gadget
Facets of autonomy
• Self-monitoring
• Self-tuning
• Self-healing
Some issues
• System integration
• Distribution
• On-line tuning
• Pro-active tuning
Distributed vs. local tuning
• Distributed tuning
  – Based on the global workload
  – Catalog organization, replication
• Local tuning
  – Based on local workload
  – Physical design tuning
Data activation for files
• A large portion of the data is expected to be in files
• We need to develop query processors for data residing in files
• File activation: optimize access to the file based on the local workload
  – E.g., instantiate an index on file contents or materialize a relational view
• Local tuning is essential in this context
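File activation can be sketched as a wrapper that watches the local workload and builds an index on its own once lookups become frequent. The class ActivatedFile, the tab-separated layout, the sample rows, and the fixed threshold are assumptions for illustration; an autonomic system would pick the trigger itself rather than expose it as a knob.

```python
import os
import tempfile
from collections import defaultdict


class ActivatedFile:
    """Hypothetical line-oriented file that indexes itself under load."""

    def __init__(self, path, threshold=3):
        self.path = path
        self.threshold = threshold  # assumed trigger: lookups before indexing
        self.lookups = 0
        self.index = None           # key -> list of byte offsets, once built

    @staticmethod
    def _key(line: bytes) -> str:
        return line.split(b"\t", 1)[0].decode()

    def _build_index(self):
        index = defaultdict(list)
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                index[self._key(line)].append(offset)
        self.index = dict(index)

    def lookup(self, key):
        """Return all rows whose first field equals key."""
        self.lookups += 1
        if self.index is None and self.lookups >= self.threshold:
            self._build_index()  # self-tuning step driven by the workload
        with open(self.path, "rb") as f:
            if self.index is None:
                # Cold path: scan the whole file.
                return [l.decode().rstrip("\n") for l in f if self._key(l) == key]
            rows = []
            for offset in self.index.get(key, []):
                f.seek(offset)
                rows.append(f.readline().decode().rstrip("\n"))
            return rows


# Made-up sample data written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, newline="") as tmp:
    tmp.write("chr1\tgeneA\nchr2\tgeneB\nchr1\tgeneC\n")

data = ActivatedFile(tmp.name, threshold=2)
first = data.lookup("chr1")   # answered by a full scan
second = data.lookup("chr1")  # triggers the index build, then seeks directly
os.remove(tmp.name)
```

Both calls return the same rows; only the access path changes, and the user never has to ask for the index.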