

Cost Framework for a Heterogeneous Distributed Semi-structured Environment

Tianxiao Liu (1)(2), Tuyet-Tram Dang-Ngoc (1), Dominique Laurent (1)

DBMAN 2007

(1) ETIS Laboratory, University of Cergy-Pontoise

Cergy-Pontoise, France

(2) Xcalia S.A., Paris, France

June 18th, 2007


Outline

Motivation

Cost models for heterogeneous data sources

Contributions: generic language for cost communication; dynamic cost estimation framework

Conclusion


Motivation

Cost-based query optimization: the same query admits various execution plans, each with a different cost (execution time, price, communication, etc.). A cost model estimates the cost of the candidate plans, using cost formulas (source-oriented or operation-oriented) and statistics of the data sources.

Problems in a mediation context:

Data source autonomy: the sources' cost models are not available to the mediator.

Integration of various cost models at the mediator level.

Cost communication between the components of the system.
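The cost-based plan selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The plan names, statistics and cost formulas are hypothetical and are not XLive's formulas. They only show how a cost model ranks candidate plans for the same query and picks the cheapest.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Stats = Dict[str, float]


@dataclass
class Plan:
    name: str
    # Cost formula: maps data source statistics to an estimated cost.
    cost_formula: Callable[[Stats], float]


def choose_plan(plans: List[Plan], stats: Stats) -> Plan:
    """Return the candidate plan with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan.cost_formula(stats))


# Illustrative statistics and unit costs (hypothetical, not measured values).
stats = {"card_a": 10_000, "card_b": 200, "transfer_cost": 0.01, "local_cost": 0.001}

plans = [
    # Ship both inputs to the mediator and combine them there.
    Plan("join_at_mediator",
         lambda s: (s["card_a"] + s["card_b"]) * s["transfer_cost"]),
    # Ship only the small input to the source and combine it there.
    Plan("join_at_source",
         lambda s: s["card_b"] * s["transfer_cost"] + s["card_a"] * s["local_cost"]),
]

best = choose_plan(plans, stats)
print(best.name)  # join_at_source
```

In a mediator, the hard part is obtaining trustworthy formulas and statistics for autonomous sources, which is the problem listed above.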


Cost models for heterogeneous data sources

[Figure: classification of cost models for heterogeneous data sources. The figure spans cost models based on operation implementation, generic cost models and specific methods, for known sources through heterogeneous autonomous sources, and for relational, object-oriented and semi-structured data sources. Approaches shown: Operation [GP89] [ML86] [SA82]; Sampling [ZL98]; Calibration [DKS92]; Adaptive [Zhu95]; Operation [CD92] [BMG93] [DOA+94]; Calibration [GST96]; Access path [GGT96]; Flora [Flo96] [Gru96]; Hybrid cost model [NGT98]; Cost model by history [ACP96]; Wrapper [HKWY97] [ROH99]; Operation [AAN01] [MW99]; XQuery self-learning [ZHJGML05]. Arrows between approaches are labeled "Adapted", "Refined", "Extended" and "Applied".]


Background: the XLive mediation system and its XQuery evaluation process

[Figure: XLive architecture. On the mediator, an XQuery goes through Canonization (Canonized XQuery), Modeling (Tree Graph View, TGV), Annotation (Annotated TGV), Transformation (XAlgebra) and Evaluation, and the query result is returned as XML. Wrappers receive queries from the mediator and return responses from relational data sources, XML data sources and Web services. Cost-based optimization uses mediator and wrapper operators, equivalence rules and a search strategy, with cost information from the Mediator Information Repository and the Wrapper Information Repository.]
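In the architecture above, the mediator receives cost information from two places: its own repository and the wrappers' repository. The sketch below assumes a simple precedence rule, similar in spirit to Disco-style generic cost models. Under that rule, a formula exported by a wrapper for a given (source, operator) pair overrides the mediator's default formula for that operator. The class, method names and formulas are hypothetical, and the slides do not state this rule.

```python
from typing import Callable, Dict, Tuple

Stats = Dict[str, float]
CostFn = Callable[[Stats], float]


class CostRegistry:
    """Hypothetical mediator-side registry of cost formulas.

    Wrapper-supplied formulas, keyed by (source, operator), take precedence;
    otherwise the mediator's default formula for the operator is used.
    """

    def __init__(self, mediator_defaults: Dict[str, CostFn]) -> None:
        self.mediator_defaults = mediator_defaults
        self.wrapper_costs: Dict[Tuple[str, str], CostFn] = {}

    def register_wrapper_cost(self, source: str, operator: str, fn: CostFn) -> None:
        self.wrapper_costs[(source, operator)] = fn

    def estimate(self, source: str, operator: str, stats: Stats) -> float:
        fn = self.wrapper_costs.get((source, operator))
        if fn is None:
            # No cost information from the wrapper: fall back to the mediator default.
            # Raises KeyError if the mediator has no default for this operator.
            fn = self.mediator_defaults[operator]
        return fn(stats)


# Illustrative formulas only.
registry = CostRegistry({"select": lambda s: s["card"] * 0.01})
registry.register_wrapper_cost("relational_db", "select",
                               lambda s: 0.5 + s["card"] * 0.001)

print(registry.estimate("relational_db", "select", {"card": 1000}))
print(registry.estimate("xml_db", "select", {"card": 1000}))
```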


Background: Tree Graph View (TGV)

[Figure: an example of an XQuery represented as a TGV]


Generic cost model in a mediation context

Design a generic cost model covering:

source types: relational, semi-structured, Web services…

specific methods (calibration, history…) through APIs implemented by the system

principle: as accurate as possible

…using cost formulas: equation systems, statistics also expressed in the form of equations, and constant values.

Existing generic cost model (Disco): object-oriented environment, with variables predefined in the language.
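To make "equation systems" concrete: a cost variable can be a constant, or it can be a formula over statistics and other cost variables. The Python sketch below resolves such a system by memoized recursive lookup and rejects cyclic definitions. The variable names and numbers are hypothetical and are not taken from XLive or Disco.

```python
from typing import Callable, Dict, Set, Union

Lookup = Callable[[str], float]
# A definition is either a constant or a formula over other variables.
Definition = Union[float, Callable[[Lookup], float]]


def evaluate(system: Dict[str, Definition], target: str) -> float:
    """Resolve `target` in an equation system of constants and formulas.

    Each variable is computed at most once (memoized); cyclic definitions
    are rejected with ValueError.
    """
    values: Dict[str, float] = {}
    in_progress: Set[str] = set()

    def lookup(name: str) -> float:
        if name in values:
            return values[name]
        if name in in_progress:
            raise ValueError(f"cyclic definition involving {name!r}")
        in_progress.add(name)
        definition = system[name]
        value = definition(lookup) if callable(definition) else float(definition)
        in_progress.discard(name)
        values[name] = value
        return value

    return lookup(target)


# Hypothetical cost equations for a selection on an XML source.
system: Dict[str, Definition] = {
    "card_books": 5000,           # statistic given as a constant
    "selectivity": 0.1,           # statistic given as a constant
    "io_cost_per_doc": 0.002,     # constant supplied for the source
    "card_result": lambda v: v("card_books") * v("selectivity"),  # statistic as an equation
    "scan_cost": lambda v: v("card_books") * v("io_cost_per_doc"),
    "total_cost": lambda v: v("scan_cost") + v("card_result") * 0.001,
}

print(evaluate(system, "total_cost"))
```

Memoization matters because the same statistic (here `card_books`) is typically referenced by several formulas.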


Our proposal: Generic Language for Cost Communication (GLCC)

A language based on XML: cost formulas and equation systems are expressed in MathML.

A generic language: no predefined variables; it can express different costs for various optimization objectives (time, price…).
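These slides describe GLCC only as XML with MathML formulas. The illustration below assumes a hypothetical wrapper format, `<costs>` and `<cost operator=... objective=...>`, around standard MathML content markup. It evaluates the formula with Python's `xml.etree.ElementTree`. Keying formulas by optimization objective reflects the point that different costs (time, price…) can be expressed. The element and attribute names are not GLCC's actual vocabulary.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

MATHML = "{http://www.w3.org/1998/Math/MathML}"

# Hypothetical GLCC-like document: <costs>/<cost> and their attributes are
# assumptions; only the formula inside <math> is standard MathML content markup.
GLCC_DOC = """
<costs>
  <cost operator="select" objective="time">
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply><plus/>
        <ci>startup</ci>
        <apply><times/><ci>card</ci><cn>0.002</cn></apply>
      </apply>
    </math>
  </cost>
</costs>
"""


def eval_mathml(node: ET.Element, env: Dict[str, float]) -> float:
    """Evaluate a small subset of MathML content markup (plus, minus, times, divide)."""
    tag = node.tag.replace(MATHML, "")
    if tag == "math":
        return eval_mathml(node[0], env)
    if tag == "cn":
        return float(node.text.strip())
    if tag == "ci":
        return env[node.text.strip()]
    if tag == "apply":
        operator, *operands = list(node)
        args = [eval_mathml(child, env) for child in operands]
        name = operator.tag.replace(MATHML, "")
        if name == "plus":
            return sum(args)
        if name == "times":
            result = 1.0
            for arg in args:
                result *= arg
            return result
        if name == "minus":
            return args[0] - args[1] if len(args) == 2 else -args[0]
        if name == "divide":
            return args[0] / args[1]
        raise ValueError(f"unsupported MathML operator: {name}")
    raise ValueError(f"unsupported MathML element: {tag}")


def load_costs(xml_text: str) -> Dict[Tuple[str, str], ET.Element]:
    """Index each <math> formula by (operator, objective)."""
    root = ET.fromstring(xml_text.strip())
    return {
        (cost.get("operator"), cost.get("objective")): cost.find(MATHML + "math")
        for cost in root.findall("cost")
    }


formulas = load_costs(GLCC_DOC)
print(eval_mathml(formulas[("select", "time")], {"startup": 0.5, "card": 1000}))  # 2.5
```

Because the variable names (`startup`, `card`) are free identifiers rather than a fixed vocabulary, a source can describe whatever parameters its own cost model uses.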


Dynamic cost estimation framework

Cooperation and communication between the different components of XLive

Use execution results (response times) to improve the accuracy of cost models

Cost communication is performed in GLCC
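Execution results (response times) are fed back to improve the cost models, but the update rule is not given here. One simple possibility is shown below as an assumption, not as XLive's method. The model keeps a per-operator correction factor and smooths it exponentially toward the ratio of observed time to estimated time. The smoothing weight `alpha` and the operator keys are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict


class FeedbackCostModel:
    """Hypothetical feedback loop for dynamic cost estimation.

    Formula-based estimates are scaled by a per-operator correction factor
    learned from observed response times by exponential smoothing.
    """

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        # Correction factors start at 1.0, i.e. trust the formula until feedback arrives.
        self.correction: DefaultDict[str, float] = defaultdict(lambda: 1.0)

    def estimate(self, operator: str, formula_estimate: float) -> float:
        return formula_estimate * self.correction[operator]

    def record(self, operator: str, formula_estimate: float, observed_time: float) -> None:
        """Update the correction factor after executing `operator`."""
        if formula_estimate <= 0:
            return  # a non-positive estimate gives no usable ratio
        ratio = observed_time / formula_estimate
        previous = self.correction[operator]
        self.correction[operator] = (1 - self.alpha) * previous + self.alpha * ratio


model = FeedbackCostModel(alpha=0.5)
model.record("select@relational_db", formula_estimate=2.0, observed_time=4.0)
print(model.estimate("select@relational_db", 2.0))  # 3.0
```

A larger `alpha` reacts faster to changes in source load; a smaller one filters out noisy single measurements.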


Overall cost estimation on the mediator: TGV cost annotation

Each operation, or group of operations, in a TGV is annotated with cost information.


Overall cost estimation on the mediator: Cost Annotation Tree (CAT)

The CAT is traversed breadth-first to associate an execution cost with each node.
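The breadth-first association of costs on the CAT can be sketched as follows. The slide gives no details, so the sketch makes several assumptions. Each node carries a local cost already obtained from its annotation, and node names are unique. A node's cumulative cost is taken to be its local cost plus the sum of its children's costs; a real mediator might combine parallel branches differently, for example with a maximum for response time. Processing the BFS order in reverse guarantees that children are costed before their parents.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class CatNode:
    name: str          # assumed unique within the tree
    local_cost: float  # cost annotated for this operation alone
    children: List["CatNode"] = field(default_factory=list)


def annotate_costs(root: CatNode) -> Dict[str, float]:
    """Associate a cumulative execution cost with every node of a CAT."""
    # 1. Breadth-first traversal: a parent is always visited before its children.
    order: List[CatNode] = []
    queue: Deque[CatNode] = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(node.children)

    # 2. Reverse BFS order visits children before parents, so each child's
    #    total is available when its parent is costed (sum of subtree costs).
    total: Dict[str, float] = {}
    for node in reversed(order):
        total[node.name] = node.local_cost + sum(total[c.name] for c in node.children)
    return total


cat = CatNode("join", 1.0, [CatNode("scan_relational", 2.0), CatNode("scan_xml", 3.0)])
print(annotate_costs(cat))  # {'scan_xml': 3.0, 'scan_relational': 2.0, 'join': 6.0}
```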


Conclusion and future work

Contributions: the first cost-based query optimization framework for an XML-based mediation system; a generic language for cost communication; suitable for various search strategies.

Future work: cost model validation (accuracy and performance); calibrating the cost of native XML data sources; search strategy.


Thanks for your attention!

Questions?
