The Data Ring: Community Content Sharing Serge Abiteboul (INRIA) Alkis Polyzotis (UC Santa Cruz)

Post on 20-Dec-2015


Page 1

The Data Ring: Community Content Sharing

Serge Abiteboul (INRIA)

Alkis Polyzotis (UC Santa Cruz)

Page 2

Data Sharing Communities

• Examples: UCSC genome browser, SwissProt, Flickr
• Interesting data management problem
  – Shared information is heterogeneous
  – Data is distributed and dynamic
  – Lack of central administration
  – Users are not database savvy

Data sharing community: a group of users that share and query information within some domain

Page 3

The Data Ring

• P2P middleware system that provides:
  – Monitoring
  – Querying
  – …and other database-like services over the distributed information

• Main goal: simplicity of use

Page 4

Data abstraction in the data ring

• Topological layer
• Physical layer
• External layer

Page 5

Data abstraction in the data ring

• Declarative query services
• Data and query model based on XML

Topological Layer

Page 6

Data abstraction in the data ring

• Basic service is distributed query evaluation
• Comprises the overlay network (DHT), physical access structures (indices, replicas, views), and the catalog

Physical Layer
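
As a rough illustration of how a catalog could sit on top of a DHT overlay, the sketch below places peers on a hash ring and stores each catalog entry at the peer that owns its key. The slides do not describe the actual overlay or catalog design; the class `CatalogRing`, the peer names, and the use of SHA-1 consistent hashing are all assumptions made for this example.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a position on the hash ring (first 8 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class CatalogRing:
    """Hypothetical catalog partitioned over peers by consistent hashing."""

    def __init__(self, peers):
        # Sort peers by their position on the ring.
        self._points = sorted((ring_hash(p), p) for p in peers)
        self._positions = [h for h, _ in self._points]
        # Each peer holds its own slice of the catalog.
        self._entries = {p: {} for p in peers}

    def owner(self, key: str) -> str:
        """Return the first peer clockwise from the key's ring position."""
        i = bisect.bisect_right(self._positions, ring_hash(key))
        return self._points[i % len(self._points)][1]

    def publish(self, resource: str, location: str) -> None:
        """Record where a resource lives, at the peer owning its key."""
        self._entries[self.owner(resource)].setdefault(resource, []).append(location)

    def lookup(self, resource: str):
        """Ask the owning peer for the known locations of a resource."""
        return self._entries[self.owner(resource)].get(resource, [])


ring = CatalogRing(["peerA", "peerB", "peerC"])
ring.publish("genes.xml", "peerB:/data/genes.xml")
print(ring.owner("genes.xml"), ring.lookup("genes.xml"))
```

In this sketch every peer can compute the owner of a key locally, so no central administrator is needed to route catalog requests.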

Page 7

Data abstraction in the data ring

• Provides semantically richer data models

External Layer

Page 8

Data abstraction in the data ring

• Our focus is on the topological and physical layer

• External layer is equally important and an active research area

Physical Layer

Topological Layer

Page 9

Thesis #1: formalism for distributed XML data and queries

Page 10

Distributed XML data and queries

• What made the relational model successful:
  – A logic for describing tables
  – An algebra for query optimization
• We need the equivalent for trees in a distributed context:
  – A logic for describing distributed XML data
  – An algebra for optimizing distributed XML queries

Page 11

Desiderata for description logic

• Seamless transition between data and services
  – Important for loose data integration
• Support for XML streams
  – Streams are essential for subscription services
  – They are also necessary to support recursion

Page 12

Starting point: AXML

• AXML: XML tree with embedded web service calls
  – Seamless transition between intensional and extensional data
  – Provides a simple mechanism for loose data integration
• Core concept: XML streams
  – A web service call returns a stream of elements
  – Support for both push and pull semantics
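
The idea of a tree that mixes stored data with embedded service calls can be sketched as follows. This is a toy illustration, not AXML syntax: the `sc` element name, the `service` attribute, and the local `SERVICES` registry (standing in for remote web services) are assumptions made for the example. Only pull semantics is shown, with a Python generator playing the role of the element stream.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator


def get_photos(tag: str) -> Iterator[ET.Element]:
    """Pretend web service: streams <photo> elements one at a time."""
    for i in range(3):
        el = ET.Element("photo", {"tag": tag})
        el.text = f"{tag}-{i}.jpg"
        yield el


# Hypothetical registry; a real system would invoke remote services instead.
SERVICES: Dict[str, Callable[..., Iterator[ET.Element]]] = {"getPhotos": get_photos}

# One extensional <photo> plus one intensional part (the embedded call).
DOC = """<album owner="alice">
  <photo tag="beach">beach-local.jpg</photo>
  <sc service="getPhotos" tag="ring"/>
</album>"""


def materialize(node: ET.Element) -> ET.Element:
    """Replace every embedded call by the elements its service streams back."""
    expanded = []
    for child in list(node):
        if child.tag == "sc":
            args = dict(child.attrib)
            service = SERVICES[args.pop("service")]
            expanded.extend(service(**args))  # pull the whole stream
        else:
            expanded.append(materialize(child))
    for child in list(node):
        node.remove(child)
    node.extend(expanded)
    return node


album = materialize(ET.fromstring(DOC))
print([p.text for p in album.findall("photo")])
```

Before `materialize` runs, a reader of the document sees a call; afterwards it sees ordinary data, which is the transition between the two kinds of data that the slide refers to.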

Page 13

Desiderata for algebra

• Be amenable to rewrites
• Capture the topology of distributed computation
• Allow seamless transition between logical and physical state
  – Plans may need to be re-optimized in mid-flight
  – It may be necessary to perform partial optimization
  – Error recovery

Page 14

A proposal based on AXML

• A distributed plan is a workflow of web services … which is exactly an AXML tree

• Components:
  – An encoding of distributed plans in AXML
  – Rewrite rules

• A nice bonus: plans can be readily exchanged between nodes
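
A minimal sketch of what "a plan is a tree plus rewrite rules" could look like. The operator names (`send`, `select`, `scan`), the `at` attribute naming the peer that runs an operator, and the rule itself are illustrative assumptions; the slides do not give the actual encoding or rule set.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan: peer p1 selects over a document scanned at peer p2.
PLAN = (
    '<send to="p1">'
    '<select pred="year=2006" at="p1">'
    '<scan doc="papers.xml" at="p2"/>'
    "</select>"
    "</send>"
)


def push_selection(node: ET.Element) -> bool:
    """Assumed rewrite rule: run a <select> at the peer hosting its <scan> input.

    Returns True if the plan was changed, so a rule can be applied to a fixpoint.
    """
    changed = False
    for child in node:
        changed = push_selection(child) or changed
    if node.tag == "select" and len(node) == 1 and node[0].tag == "scan":
        target = node[0].get("at")
        if node.get("at") != target:
            node.set("at", target)  # filter at the data, ship only the matches
            changed = True
    return changed


plan = ET.fromstring(PLAN)
while push_selection(plan):
    pass
# The rewritten plan is still plain XML, so it can be shipped to another node.
print(ET.tostring(plan, encoding="unicode"))
```

Because the plan stays a serializable tree after each rewrite, a peer can optimize part of it and hand the rest to another peer, which matches the "plans can be readily exchanged" point above.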

Page 15

Disclaimer

• AXML is a starting point, not a panacea
• Bottom line: we need formalisms for distributed XML queries

Page 16

Thesis #2: autonomic administration

Page 17

Autonomic administration

• Users are not database experts
  – Typically, scientists with computer experience
• Users are averse to too many “knobs”
• No central authority that is responsible for administration
• Autonomic administration is a necessity, not a gadget

Page 18

Facets of autonomy

• Self-monitoring
• Self-tuning
• Self-healing

Page 19

Some issues

• System integration
• Distribution
• On-line tuning
• Pro-active tuning

Page 20

Distributed vs. local tuning

• Distributed tuning
  – Based on the global workload
  – Catalog organization, replication
• Local tuning
  – Based on local workload
  – Physical design tuning

Page 21

Data activation for files

• A large portion of the data is expected to be in files
• We need to develop query processors for data residing in files
• File activation: optimize access to the file based on the local workload
  – E.g., instantiate an index on file contents or materialize a relational view

• Local tuning is essential in this context
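
One way to picture file activation is a wrapper that watches the local workload on a file and builds an index once a column is queried often enough. The sketch below is an assumption-laden toy: the class `ActivatedFile`, the CSV layout, the threshold policy, and loading the rows into memory are all choices made for the example, not the mechanism proposed in the talk.

```python
import csv
import io
from collections import Counter, defaultdict


class ActivatedFile:
    """Hypothetical wrapper that indexes a column after repeated lookups."""

    def __init__(self, text: str, threshold: int = 3):
        # Simplification: rows are held in memory instead of read from disk.
        self._rows = list(csv.DictReader(io.StringIO(text)))
        self._threshold = threshold
        self._workload = Counter()  # lookups seen per column
        self._indexes = {}          # column -> {value: [row numbers]}

    def lookup(self, column: str, value: str):
        """Return rows where column == value, indexing the column if it is hot."""
        self._workload[column] += 1
        if column not in self._indexes and self._workload[column] >= self._threshold:
            self._build_index(column)
        if column in self._indexes:
            return [self._rows[i] for i in self._indexes[column].get(value, [])]
        return [row for row in self._rows if row[column] == value]  # full scan

    def _build_index(self, column: str) -> None:
        index = defaultdict(list)
        for i, row in enumerate(self._rows):
            index[row[column]].append(i)
        self._indexes[column] = dict(index)

    def indexed_columns(self):
        return sorted(self._indexes)


data = "gene,organism\nBRCA1,human\nTP53,human\nlacZ,ecoli\n"
genes = ActivatedFile(data, threshold=2)
genes.lookup("organism", "human")   # first lookup: answered by a scan
genes.lookup("organism", "ecoli")   # second lookup: triggers index creation
print(genes.indexed_columns())
```

The decision to index is driven only by what this peer observes, which is why the sketch needs no global coordination.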