University of Maryland Scaling Heterogeneous Information Access for Wide area Environments Michael Franklin and Louiqa Raschid

Post on 18-Jan-2018

Page 1

University of Maryland

Scaling Heterogeneous Information Access for Wide area Environments

Michael Franklin and Louiqa Raschid

Page 2

Wide-Area Data Access

Problems
  Scalability of Wrapper-Mediator Systems
  Publishing and Discovery of Sources
  Dissemination of Relevant Information

Relevant Technologies
  Flexible Architectures
  Adaptive Systems
  Metadata Management

Page 3

The Big Picture

Page 4

The Little Picture

[Architecture diagram. Component labels: Predator O-R DBMS, Remote wrapper interface, Planner, Scrambler, MDT, Wrapper interface, Web sources]

Page 5

Querying Web Sources
  Generating wrappers for Web-accessible sources to provide an API for queries and structured answers.
  Obtaining and representing source capability and content descriptions to use in query planning.
  Estimating the response time for cost-based optimization.

Page 6

Web application wrapper toolkit
  Define the capabilities of Web sources
  A wrapper interface to publish source capability
  A wrapper toolkit
    Translation from query + bindings -> URL
    Declarative language to specify extractors
      Simple extractors: HTML or XML data -> structured object
      Complex extractors: customizable crawler utility for extraction of meta-information
    Generator for JDBC-compliant wrappers
    Metadata and query and answer interface
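
The translation from query + bindings to a URL, and a simple extractor that turns HTML into structured objects, can be sketched as follows. This is a minimal illustration and not the toolkit's actual interface: the endpoint `http://weather.example.com/forecast`, the parameter names, and the functions `build_url` and `extract_rows` are all hypothetical, and a regular expression stands in for the declarative extractor language.

```python
import re
from urllib.parse import urlencode

# Hypothetical source description: a base URL plus the attributes that
# must be bound before the source can be called.
WEATHER_SOURCE = {
    "base_url": "http://weather.example.com/forecast",  # assumed endpoint
    "required_bindings": ("city", "date"),
}


def build_url(source, bindings):
    """Translate query bindings into a request URL for the source."""
    missing = [a for a in source["required_bindings"] if a not in bindings]
    if missing:
        raise ValueError(f"unbound attributes: {missing}")
    # Sort parameters so the same bindings always give the same URL.
    return source["base_url"] + "?" + urlencode(sorted(bindings.items()))


# A "simple extractor": one pattern per table row; named groups become fields.
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<hour>\d+)</td><td>(?P<temp>-?\d+)</td></tr>"
)


def extract_rows(html):
    """Turn matching HTML table rows into a list of structured objects."""
    return [
        {"hour": int(m["hour"]), "temp": int(m["temp"])}
        for m in ROW_PATTERN.finditer(html)
    ]


url = build_url(WEATHER_SOURCE, {"city": "College Park", "date": "1998-08-14"})
page = "<table><tr><td>9</td><td>24</td></tr><tr><td>10</td><td>26</td></tr></table>"
rows = extract_rows(page)
```

Refusing to build a URL when a required attribute is unbound mirrors the idea that a wrapper publishes its capability: the caller learns up front which bindings the source needs.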

Page 7

Weather source

Page 8

Results from the Weather source

Page 10

Query Planning for Web sources
Objective: Generate safe optimal plans with possibly replicated sources
  Multiple heterogeneous sources
    Limited capability (bindings)
    Possible replication of contents
    Complete / Incomplete sources
  Use meta-information to construct lattices
  Generate safe plans with alternatives
  Mediator algebra and rules for optimization
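
One common reading of "safe" for sources with limited capability is that a source is called only after every attribute it requires as input has been bound, either by the query or by an earlier source. The sketch below orders sources greedily under that assumption. The source names, attribute sets, and the function `safe_order` are hypothetical, and the lattice construction and mediator algebra named on the slide are not modeled.

```python
def safe_order(sources, bound):
    """Return a call order in which every source's inputs are bound.

    sources: dict name -> (required_inputs, outputs), both sets of attributes
    bound:   attributes bound by the query itself
    Returns None when no safe order exists.
    """
    bound = set(bound)
    remaining = dict(sources)
    order = []
    while remaining:
        # Sources whose required inputs are already available.
        ready = sorted(n for n, (ins, _) in remaining.items() if ins <= bound)
        if not ready:
            return None  # the remaining sources can never be called safely
        name = ready[0]
        _, outs = remaining.pop(name)
        bound |= outs  # outputs of this source can feed later sources
        order.append(name)
    return order


# Hypothetical sources: the listing source needs Time and Date,
# the review source needs a Title.
sources = {
    "listings": ({"Time", "Date"}, {"Channel", "Title", "Category"}),
    "reviews": ({"Title"}, {"Rating"}),
}
plan = safe_order(sources, {"Time", "Date"})
no_plan = safe_order(sources, {"Date"})
```

With Time and Date bound, the listing source can run first and supply the Title that the review source needs; with only Date bound, no source is callable and the function reports that no safe plan exists.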

Page 12

Content and Capability Descriptions
  Domain information
  Capability descriptions:
    I/O relationships: Time, Date -> Channel, Title, Category
    Content: Date: CurrentYear; Time: {0, …, 23}; Channel: CNW
  Completeness information
    Complete: Source S3 provides a complete answer when Time and Date are bound and Channel=ppv and Category=Movies.
    Explicitly provided by the source DBA.
    Augmented by inference.
    Augmented by learning based on query feedback.
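
A description like the one given for S3 can be held as plain data and checked against a query. The sketch below is an assumption about how such a record might look, not the authors' representation: the class `SourceDescription`, its field names, and its methods are hypothetical, the query values are illustrative, and the slide's I/O relationship is assumed to apply to S3.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDescription:
    name: str
    inputs: frozenset      # attributes that must be bound to call the source
    outputs: frozenset     # attributes the source returns
    # Completeness rule: attributes that must be bound, plus required values.
    complete_bound: frozenset = frozenset()
    complete_equals: dict = field(default_factory=dict)

    def can_answer(self, bindings):
        """True when every input attribute has a binding."""
        return self.inputs.issubset(bindings)

    def is_complete_for(self, bindings):
        """True when the completeness rule covers these bindings."""
        if not self.complete_bound.issubset(bindings):
            return False
        return all(bindings.get(a) == v for a, v in self.complete_equals.items())


# S3 as stated on the slide: complete when Time and Date are bound
# and Channel=ppv and Category=Movies.
s3 = SourceDescription(
    name="S3",
    inputs=frozenset({"Time", "Date"}),
    outputs=frozenset({"Channel", "Title", "Category"}),
    complete_bound=frozenset({"Time", "Date"}),
    complete_equals={"Channel": "ppv", "Category": "Movies"},
)

ppv_query = {"Time": 9, "Date": "1998-08-14", "Channel": "ppv", "Category": "Movies"}
news_query = {"Time": 9, "Date": "1998-08-14", "Category": "News"}
```

Keeping capability (`can_answer`) separate from completeness (`is_complete_for`) lets a planner distinguish a source that may be called from a source whose answer needs no supplement from other sources.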

Page 13

Sources in Lattices

Page 14

Display pay-per-view movies shown on August 14th, 1998 at 9:30 am.

Using Buckets (S1|S3) in AlternatePartition and (S5 S1) and (S5 S3) in SimilarPartition

Page 15

Web Source Response Time Estimation Tool - MDT
Problem: Difficulty in determining evaluation costs
  Physical implementation details unknown
  Load on network and source unknown
Objective: Tool to estimate response time based on query feedback and estimate confidence.
  To be used in a combined cost-model and to choose between alternate sources.

MDT is a tool that estimates response time based on Day, Time, Quantity, etc.

Page 16

Configuring and learning in the MDT
  MDT is configured for some hierarchy of dimensions
  Calibration of each dimension
    min / max / scale
    Allowed deviation
    Confidence window
  Learning algorithm
    Cell splitting algorithm
    Value correction algorithm
  Estimate response time and confidence
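
The configuration and learning steps can be illustrated with a deliberately simplified estimator. It assumes the MDT behaves like a grid of cells indexed by the configured dimensions, keeps the most recent observed response times per cell, and reports confidence as the fraction of those observations that fall within the allowed deviation of their mean. The class `ResponseTimeTable`, its parameters, and the update rule are assumptions for illustration; the cell splitting and value correction algorithms named on the slide are not reproduced.

```python
from collections import defaultdict, deque


class ResponseTimeTable:
    def __init__(self, dims, allowed_deviation=0.2, window=10):
        # dims: name -> (min, max, scale); scale is used as the cell width.
        self.dims = dims
        self.allowed_deviation = allowed_deviation  # relative to the cell mean
        self.window = window  # number of recent observations kept per cell
        self.cells = defaultdict(lambda: deque(maxlen=window))

    def _cell(self, point):
        """Map a point such as {"day": 2, "hour": 9} to a cell key."""
        key = []
        for name, (lo, hi, scale) in self.dims.items():
            value = min(max(point[name], lo), hi)  # clamp to calibrated range
            key.append(int((value - lo) // scale))
        return tuple(key)

    def observe(self, point, response_time):
        """Record query feedback for the cell containing the point."""
        self.cells[self._cell(point)].append(response_time)

    def estimate(self, point):
        """Return (estimated_time, confidence), or (None, 0.0) with no data."""
        history = self.cells.get(self._cell(point))
        if not history:
            return None, 0.0
        mean = sum(history) / len(history)
        close = sum(abs(t - mean) <= self.allowed_deviation * mean for t in history)
        return mean, close / len(history)


# Hypothetical calibration: day of week in cells of 1, hour of day in cells of 6.
table = ResponseTimeTable({"day": (0, 6, 1), "hour": (0, 23, 6)})
for seconds in (4.0, 5.0, 6.0, 5.0):
    table.observe({"day": 2, "hour": 9}, seconds)

estimate, confidence = table.estimate({"day": 2, "hour": 10})
unknown = table.estimate({"day": 5, "hour": 20})
```

A cost-based optimizer could compare such estimates across alternate sources and prefer the source with the lower estimated time, falling back to a default when the confidence is low; that policy is likewise an assumption and not taken from the slides.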

Page 17

Correcting the confidence of estimated value

Page 23

Conclusions
  Extend the Predator O-R DBMS with scalable mediator functionality
  Current implementation status
    Scrambling-enabled optimizer
    Mediator algebra and logical optimizer
    Cost-based optimizer based on MDT estimation
    Toolkit for generating wrappers for Web sources

Page 24

Still to come …
  Publishing source metadata
  Discovering sources
  Source selection using metadata
  User profiles
  Dissemination of relevant data