University of Maryland
Scaling Heterogeneous Information Access for Wide area Environments
Michael Franklin and Louiqa Raschid
Wide-Area Data Access
Problems
  Scalability of Wrapper-Mediator Systems
  Publishing and Discovery of Sources
  Dissemination of Relevant Information
Relevant Technologies
  Flexible Architectures
  Adaptive Systems
  Metadata Management
The Big Picture
the little picture
[Architecture diagram. Components: Predator O-R DBMS, Planner, Scrambler, MDT, remote wrapper interface, wrapper interface, Web sources]
Querying Web Sources
  Generating wrappers for Web-accessible sources to provide an API for queries and structured answers.
  Obtaining and representing source capability and content descriptions to use in query planning.
  Estimating the response time for cost-based optimization.
Web application wrapper toolkit
  Define the capabilities of Web sources
  A wrapper interface to publish source capability
  A wrapper toolkit
    Translation from query + bindings -> URL
    Declarative language to specify extractors
      Simple extractors: HTML or XML data -> structured object
      Complex extractors: customizable crawler utility for extraction of meta-information
    Generator for JDBC-compliant wrappers
      Metadata and query and answer interface
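The two wrapper steps named on this slide (query + bindings to URL, then HTML to a structured object) can be sketched as follows. This is a minimal illustration in Python, not the toolkit's actual interface: the source description, the placeholder URL, the attribute names, and the row pattern are all assumptions made for the example.

```python
import re
from urllib.parse import urlencode

# Hypothetical capability description for a weather source: which
# attributes must be bound before the source can be called.
WEATHER_SOURCE = {
    "base_url": "http://weather.example.com/forecast",  # placeholder URL
    "required_bindings": ["city", "date"],
}


def build_url(source, bindings):
    """Translate a query's bindings into a request URL.

    Raises ValueError if a required attribute is unbound, since the
    source could not answer the request.
    """
    missing = [a for a in source["required_bindings"] if a not in bindings]
    if missing:
        raise ValueError(f"unbound attributes: {missing}")
    params = {a: bindings[a] for a in source["required_bindings"]}
    return source["base_url"] + "?" + urlencode(params)


# A "simple extractor" written as a declarative pattern: each table row
# of the (assumed) result page becomes one structured record.
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<city>[^<]+)</td><td>(?P<temp_f>-?\d+)</td></tr>"
)


def extract(html):
    """Turn matching HTML rows into a list of dictionaries."""
    return [
        {"city": m["city"], "temp_f": int(m["temp_f"])}
        for m in ROW_PATTERN.finditer(html)
    ]
```

A regular expression is enough for this toy page layout; a real extractor for arbitrary HTML would more likely use a proper parser.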
Weather source
Results from the Weather source
Query Planning for Web sources
Objective: Generate safe optimal plans with possibly replicated sources
  Multiple heterogeneous sources
    Limited capability (bindings)
    Possible replication of contents
    Complete / incomplete sources
  Use meta-information to construct lattices
  Generate safe plans with alternatives
  Mediator algebra and rules for optimization
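One common reading of a "safe" plan for sources with limited capability is that every source is called only after all of its required input attributes have been bound, either by the query or by an earlier source in the plan. The sketch below checks that property by brute force. It assumes this reading and is not the planner's lattice-based algorithm; the input and output attributes given for S5 and S1 are invented for the example.

```python
from itertools import permutations


def is_safe(order, sources, initially_bound):
    """Return True if each source's required inputs are bound by the
    time it is called, given the attributes bound by the query."""
    bound = set(initially_bound)
    for name in order:
        needs, gives = sources[name]
        if not needs <= bound:
            return False
        bound |= gives  # outputs become available to later sources
    return True


def safe_orders(sources, initially_bound):
    """Enumerate every ordering of the sources that is safe.

    Brute force over permutations, so only suitable for a handful of
    sources; it is meant to show the safety condition, not to scale.
    """
    return [
        order
        for order in permutations(sources)
        if is_safe(order, sources, initially_bound)
    ]


# Hypothetical descriptions: name -> (required inputs, outputs).
SOURCES = {
    "S5": ({"Date", "Time"}, {"Title"}),
    "S1": ({"Title"}, {"Channel", "Category"}),
}
```

With Date and Time bound by the query, only the order S5 then S1 is safe, because S1 needs a Title that S5 supplies.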
Content and Capability Descriptions
  Domain information
  Capability descriptions:
    I/O relationships: Time,Date Channel,Title,Category
    Content: Date:CurrentYear Time:{0, …,23} Channel:CNW
    Completeness information, Complete. Source S3 provides a complete answer when Time and Date are bound and Channel=ppv and Category=Movies.
  Explicitly provided by the source DBA.
  Augmented by inference.
  Augmented by learning based on query feedback.
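The S3 example above can be written down as a small data structure. The sketch below is one possible encoding, not the system's actual description language: the class and field names are hypothetical, and it assumes Time and Date are the required inputs and that the query is represented as a dictionary of bound attributes.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDescription:
    """Hypothetical container for a source's capability and
    completeness information."""
    name: str
    inputs: frozenset    # attributes that must be bound to call the source
    outputs: frozenset   # attributes the source returns
    complete_when: dict = field(default_factory=dict)

    def can_answer(self, bound):
        """True if the query binds every required input attribute."""
        return self.inputs <= set(bound)

    def is_complete_for(self, bound):
        """True if the source is callable and the query matches the
        conditions under which its answer is declared complete."""
        return self.can_answer(bound) and all(
            bound.get(attr) == value
            for attr, value in self.complete_when.items()
        )


# Source S3 as described on the slide (input/output split is assumed).
S3 = SourceDescription(
    name="S3",
    inputs=frozenset({"Time", "Date"}),
    outputs=frozenset({"Channel", "Title", "Category"}),
    complete_when={"Channel": "ppv", "Category": "Movies"},
)
```

A planner could use `is_complete_for` to decide whether one source suffices or whether replicated sources must also be consulted.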
Sources in Lattices
Display pay-per-view movies shown on August 14th, 1998 at 9:30am.
Using Buckets (S1|S3) in AlternatePartition and (S5 S1) and (S5 S3) in SimilarPartition
Web Source Response Time Estimation Tool - MDT
Problem: Difficulty in determining evaluation costs
  Physical implementation details unknown
  Load on network and source unknown
Objective: Tool to estimate response time based on query feedback and estimate confidence.
  To be used in a combined cost model and to choose between alternate sources.
MDT is a tool that estimates response time based on Day, Time, Quantity, etc.
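Choosing between alternate sources from such estimates might look like the sketch below. It assumes each source comes with an (estimated seconds, confidence) pair and that a confidence threshold decides which estimates are trusted; the function name, the threshold, and the fallback rule are illustrative assumptions, not part of the MDT.

```python
def choose_source(estimates, min_confidence=0.5):
    """Pick the source with the lowest estimated response time.

    `estimates` maps a source name to (estimated_seconds, confidence).
    Estimates below `min_confidence` are ignored unless no estimate is
    trusted, in which case the fastest estimate overall is used.
    """
    trusted = {
        name: seconds
        for name, (seconds, confidence) in estimates.items()
        if confidence >= min_confidence
    }
    pool = trusted or {name: seconds for name, (seconds, _) in estimates.items()}
    return min(pool, key=pool.get)
```

For example, with `{"S1": (4.0, 0.9), "S3": (1.5, 0.2)}` the function returns `"S1"`: S3 looks faster, but its estimate is not trusted.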
Configuring and learning in the MDT
  MDT is configured for some hierarchy of dimensions
  Calibration of each dimension
    min / max / scale
    Allowed deviation
    Confidence window
  Learning algorithm
    Cell splitting algorithm
    Value correction algorithm
  Estimate response time and confidence
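To make the idea of learning from query feedback concrete, here is a deliberately simplified sketch of a table keyed by two dimensions (day and a time-of-day bucket). It is not the MDT's cell splitting or value correction algorithm, which the slides do not spell out: the fixed bucket size, the running mean, and the way confidence is computed from the allowed deviation and confidence window are all assumptions.

```python
from collections import defaultdict, deque


class ResponseTimeTable:
    """Toy response-time estimator over (day, hour-bucket) cells."""

    def __init__(self, hour_bucket=6, allowed_deviation=0.5, window=5):
        self.hour_bucket = hour_bucket              # hours per cell ("scale")
        self.allowed_deviation = allowed_deviation  # fraction of the estimate
        # Each cell keeps only the last `window` observations.
        self.cells = defaultdict(lambda: deque(maxlen=window))

    def _cell(self, day, hour):
        return (day, hour // self.hour_bucket)

    def feedback(self, day, hour, seconds):
        """Record the observed response time of an executed query."""
        self.cells[self._cell(day, hour)].append(seconds)

    def estimate(self, day, hour):
        """Return (estimated_seconds, confidence) for a cell.

        The estimate is the mean of the retained observations; the
        confidence is the share of them within the allowed deviation
        of that mean. An empty cell yields (None, 0.0).
        """
        observations = self.cells.get(self._cell(day, hour))
        if not observations:
            return None, 0.0
        mean = sum(observations) / len(observations)
        limit = self.allowed_deviation * mean
        close = sum(1 for s in observations if abs(s - mean) <= limit)
        return mean, close / len(observations)
```

A fixed grid keeps the example short; an adaptive structure would instead split a cell when its observations disagree too much.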
Correcting the confidence of estimated value
Conclusions
  Extend the Predator O-R DBMS with scalable mediator functionality
  Current implementation status
    Scrambling-enabled optimizer
    Mediator algebra and logical optimizer
    Cost-based optimizer based on MDT estimation
    Toolkit for generating wrappers for Web sources
Still to come …
  Publishing source metadata
  Discovering sources
  Source selection using metadata
  User profiles
  Dissemination of relevant data