on-the-fly data integration
TRANSCRIPT
09.05.2008
Mapping Data to Queries
Martin Hentschel
Systems Group, ETH Zurich
09.05.2008 Martin Hentschel/Systems Group, ETH Zurich/[email protected]
“…, but the real advantage of XML is precisely
that it allows you to go from Point A to
destinations unknown.”
-- Larry O’Brien,
Microsoft
Goals
Integrate data from various data feeds
Light-weight
Easy to use
Fast
Goals
Integrate data from various data feeds
Light-weight: mapping rules
Easy to use: based on a common language (XQuery)
Fast: implements research ideas (YFilter)
Targets
Health care: electronic health records (Health Level 7)
Finance: exchange of financial data (XBRL)
Web services: news feeds, weather
Every domain that uses several data sources
Example
Find the most powerful car
<db>
  <car>
    <name>Ford</name>
    <hp>130</hp>
  </car>
</db>

<daten>
  <auto>
    <name>VW Golf</name>
    <ps>150</ps>
  </auto>
</daten>
Example
Find the most powerful car
<db>
  <car>
    <name>Ford</name>
    <hp>130</hp>
  </car>
</db>

<daten>
  <auto>
    <name>VW Golf</name>
    <ps>150</ps>
  </auto>
</daten>

daten is-a db;
auto is-a car;
ps is-a hp;
Example
Find the most powerful car
Apply standard XQuery
<db>
  <car>
    <name>Ford</name>
    <hp>130</hp>
  </car>
</db>

<daten>
  <auto>
    <name>VW Golf</name>
    <ps>150</ps>
  </auto>
</daten>

daten is-a db;
auto is-a car;
ps is-a hp;

let $max := max(//hp)
for $car in //car
where $car/hp = $max
return $car
Example
Find the most powerful car
Apply standard XQuery
<db>
  <car>
    <name>Ford</name>
    <hp>130</hp>
  </car>
</db>

<daten>
  <auto>
    <name>VW Golf</name>
    <ps>150</ps>
  </auto>
</daten>

daten is-a db;
auto is-a car;
ps is-a hp;

let $max := max(//hp)
for $car in //car
where $car/hp = $max
return $car

Result:
<auto>
  <name>VW Golf</name>
  <ps>150</ps>
</auto>
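The example can be emulated outside an XQuery engine to see what the rules buy us. The following is a minimal Python sketch, not the MDQ implementation: it treats each rule as a plain name alias and evaluates the "most powerful car" query over both feeds. The names RULES, FEEDS, names_for, select and most_powerful_cars are made up for this illustration, and only single-step rules without chaining are handled.

```python
import xml.etree.ElementTree as ET

# Mapping rules from the slide, stored as {source name: target name}.
RULES = {"daten": "db", "auto": "car", "ps": "hp"}

# The two data feeds from the slide.
FEEDS = [
    "<db><car><name>Ford</name><hp>130</hp></car></db>",
    "<daten><auto><name>VW Golf</name><ps>150</ps></auto></daten>",
]


def names_for(target):
    """Return the target name plus every name mapped onto it by a rule."""
    return {target} | {src for src, dst in RULES.items() if dst == target}


def select(root, target):
    """Emulate //target: all elements whose own name counts as target."""
    wanted = names_for(target)
    return [el for el in root.iter() if el.tag in wanted]


def most_powerful_cars(feeds):
    """Emulate the slide's query: cars whose hp equals the maximum hp."""
    roots = [ET.fromstring(feed) for feed in feeds]
    cars = [car for root in roots for car in select(root, "car")]

    def hp(car):
        # The hp child may be called ps in the source feed.
        return float(select(car, "hp")[0].text)

    best = max(hp(car) for car in cars)
    return [car for car in cars if hp(car) == best]


winners = most_powerful_cars(FEEDS)
# Elements come back under their original names, not renamed to car/hp.
print([ET.tostring(w, encoding="unicode") for w in winners])
```

As on the slide, the winning element is returned as auto with a ps child; the rules only widen what //car and hp match.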
Usage Scenarios
Continuous query processing
[Diagram: a DSMS takes queries and rules, consumes streaming input events, and produces streaming output events]
Usage Scenarios
Publish/subscribe systems
[Diagram: publishers send data to an enhanced broker that holds the rules; subscribers register subscriptions with the broker and receive data]
Usage Scenarios
Data integration
[Diagram: Source 1, Source 2 and Source x send data to a data handler that holds the rules; the data handler delivers homogeneous data to the company's data store]
The Is-A Rule
Map XML elements
Expresses a substitutability relationship
Like in object-oriented design
Use the car wherever vehicles are expected
It follows that //vehicle also returns car elements
Returned as car
Not transformed into vehicle
Consistent with the OO approach

car is-a vehicle;
The Is-A Rule
Map path expressions
XPath path expressions
Left-hand side may include predicates

german/car is-a auto;
auto is-a german/car;

car[@ps < 100] is-a slow/vehicle;
The Is-A Rule
Specify contexts
Element names could be used differently in different contexts
Scope applicability of rules
Further refinement

car in cars[@country='Germany'] is-a auto;
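To make the scoping concrete, here is a small Python sketch of a context-restricted rule. It is only an illustration under assumptions: the Rule class, the select helper and the sample document are invented here, and "in" is interpreted as "the direct parent matches", which the slide does not spell out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    source: str                    # element name on the left-hand side
    target: str                    # name the element may substitute for
    # Optional test on the parent element, standing in for "in <context>".
    context: Optional[Callable[[ET.Element], bool]] = None


# car in cars[@country='Germany'] is-a auto;
RULE = Rule(
    source="car",
    target="auto",
    context=lambda parent: parent.tag == "cars"
    and parent.get("country") == "Germany",
)

# Hypothetical sample data, not taken from the slides.
DOC = ET.fromstring(
    "<root>"
    "<cars country='Germany'><car><name>VW Golf</name></car></cars>"
    "<cars country='USA'><car><name>Ford</name></car></cars>"
    "</root>"
)


def select(root, name, rules):
    """Emulate //name, honouring rules only where their context holds."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    hits = []
    for el in root.iter():
        if el.tag == name:
            hits.append(el)
            continue
        for rule in rules:
            parent = parent_of.get(el)
            in_context = rule.context is None or (
                parent is not None and rule.context(parent)
            )
            if rule.source == el.tag and rule.target == name and in_context:
                hits.append(el)
                break
    return hits


autos = select(DOC, "auto", [RULE])
print([a.findtext("name") for a in autos])
```

Only the car inside the German cars element is returned for //auto; a plain //car still matches both.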
The Is-A Rule
Element construction
Map elements
Transform data, e.g. for integration of very diverse data

auto as $a is-a
  <car>
    <kw>{$a/ps * 0.74}</kw>
  </car>;

<car>
  <name>Ford</name>
  <kw>100</kw>
</car>

<auto>
  <name>VW Golf</name>
  <ps>150</ps>
</auto>
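The element constructor above amounts to a small transformation function per rule. A rough Python equivalent, assuming a hypothetical helper named auto_to_car, could look like this. It mirrors the rule literally, so only the kw child is built and the name is not copied.

```python
import xml.etree.ElementTree as ET


def auto_to_car(auto):
    """Build the <car> element that the slide's rule constructs for an auto.

    Mirrors: auto as $a is-a <car><kw>{$a/ps * 0.74}</kw></car>;
    """
    car = ET.Element("car")
    kw = ET.SubElement(car, "kw")
    # Convert PS to kW with the factor used on the slide.
    kw.text = str(float(auto.findtext("ps")) * 0.74)
    return car


auto = ET.fromstring("<auto><name>VW Golf</name><ps>150</ps></auto>")
car = auto_to_car(auto)
print(ET.tostring(car, encoding="unicode"))
```

For the VW Golf with 150 PS this yields a car of about 111 kW, which a query on //car/kw can then compare directly with the Ford's 100 kW.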
Implementation
Several possibilities
MDQ approach
- Native approach, novel MDQ data model
- Allows lazy execution
Query rewrite
- E.g. //(car | auto | vehicle | ...)
- Does not scale
Data translation
- Translate input data
- Big overhead
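To see why query rewrite does not scale, it helps to sketch what the rewriter has to do. The Python below is an illustrative assumption, not the system's code: the helper names are invented, rules from different slides are combined into one list, is-a is assumed to chain transitively, and only simple child-step paths without predicates are handled.

```python
# Rules gathered from earlier slides as (source, target) pairs.
RULES = [("auto", "car"), ("car", "vehicle"), ("daten", "db"), ("ps", "hp")]


def substitutes(name, rules):
    """All names usable wherever `name` is expected (transitive closure)."""
    found, frontier = {name}, [name]
    while frontier:
        current = frontier.pop()
        for source, target in rules:
            if target == current and source not in found:
                found.add(source)
                frontier.append(source)
    return sorted(found)


def rewrite_step(name, rules):
    """Rewrite one name test into a union, e.g. hp -> (hp | ps)."""
    names = substitutes(name, rules)
    return names[0] if len(names) == 1 else "(" + " | ".join(names) + ")"


def rewrite_path(path, rules):
    """Rewrite every step of a simple //a/b style path (no predicates)."""
    steps = path.lstrip("/").split("/")
    return "//" + "/".join(rewrite_step(step, rules) for step in steps)


print(rewrite_path("//vehicle/hp", RULES))
```

Every step of every query grows by one union branch per applicable rule, so the rewritten query gets larger with each new data source.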
MDQ Data Model
Classical XML tree model
<daten>
  <auto>
    <name>Golf</name>
    <ps>150</ps>
  </auto>
</daten>

[Diagram: tree with the node daten at the root, its child node auto, and the children name ("Golf") and ps ("150") below auto]
MDQ Data Model
MDQ data model
Move names from nodes to edges

<daten>
  <auto>
    <name>Golf</name>
    <ps>150</ps>
  </auto>
</daten>

[Diagram: the same tree, with the names daten, auto, name and ps labeling the edges instead of the nodes]
MDQ Data Model
Application of mapping rules
<daten>
  <auto>
    <name>Golf</name>
    <ps>150</ps>
  </auto>
</daten>

daten is-a db;
auto is-a car;
ps is-a hp;

[Diagram: the edge-labeled tree, in which the edges daten, auto and ps additionally carry the names db, car and hp]
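The idea of names on edges can be tried out in a few lines. The following Python sketch is an approximation under assumptions: Node, build, apply_rules and children are names invented here, and the rules are applied eagerly in one pass, whereas the MDQ approach is described as allowing lazy execution.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str = ""
    # Each edge is (set of names, child node); names live on the edge.
    edges: list = field(default_factory=list)


def build(element):
    """Convert an ElementTree element into an unnamed node with named edges."""
    node = Node(text=(element.text or "").strip())
    for child in element:
        node.edges.append(({child.tag}, build(child)))
    return node


def apply_rules(node, rules):
    """Add the target name to every edge that carries the source name."""
    # Single pass: chained rules would depend on the order of `rules`.
    for names, child in node.edges:
        for source, target in rules.items():
            if source in names:
                names.add(target)
        apply_rules(child, rules)


def children(node, name):
    """Follow all edges labelled `name`, whichever rule put the label there."""
    return [child for names, child in node.edges if name in names]


doc = ET.fromstring("<daten><auto><name>Golf</name><ps>150</ps></auto></daten>")
# A synthetic document node whose single edge carries the root name.
root = Node(edges=[({doc.tag}, build(doc))])
apply_rules(root, {"daten": "db", "auto": "car", "ps": "hp"})

car = children(children(root, "db")[0], "car")[0]
print(children(car, "hp")[0].text, children(car, "ps")[0].text)
```

A rule only adds a label to an existing edge, so the same node is reachable as ps and as hp without copying or translating any data.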
Lazy Evaluation, YFilter
Built from left-hand side of rules
Non-deterministic finite state machine
Main idea:
Evaluate XQuery program
Iterate through data model
Report to YFilter
Apply rules only when reaching an accepting state

R1: daten is-a db;
R2: auto is-a car;
R3: ps is-a hp;

[Diagram: state machine with a * transition and transitions on daten, auto and ps leading to the accepting states for R1, R2 and R3]
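The mechanism can be sketched with a toy automaton. This is a strongly simplified stand-in, not YFilter itself: RuleNFA and walk are hypothetical names, left-hand sides are plain name paths without predicates, and a depth-first walk over the document plays the role of the XQuery program that iterates through the data model.

```python
import xml.etree.ElementTree as ET


class RuleNFA:
    """Tiny NFA over element names, built from rule left-hand sides."""

    def __init__(self, rules):
        # State 0 is the start state; it stays active on every step (* loop).
        self.transitions = {}   # (state, name) -> set of next states
        self.accepting = {}     # state -> (rule id, target name)
        self.next_state = 1
        for rule_id, (lhs, target) in rules.items():
            state = 0
            for step in lhs.split("/"):
                new_state = self.next_state
                self.next_state += 1
                self.transitions.setdefault((state, step), set()).add(new_state)
                state = new_state
            self.accepting[state] = (rule_id, target)

    def step(self, active, name):
        """Advance on one element name; return (new active set, fired rules)."""
        reached = set()
        for state in active:
            reached |= self.transitions.get((state, name), set())
        fired = [self.accepting[s] for s in sorted(reached) if s in self.accepting]
        return reached | {0}, fired


def walk(element, nfa, active, log):
    """Depth-first walk that reports each visited element name to the NFA."""
    active, fired = nfa.step(active, element.tag)
    for rule_id, target in fired:
        # Only now, on reaching an accepting state, is the rule applied.
        log.append((rule_id, element.tag, target))
    for child in element:
        walk(child, nfa, active, log)


RULES = {"R1": ("daten", "db"), "R2": ("auto", "car"), "R3": ("ps", "hp")}
doc = ET.fromstring("<daten><auto><name>Golf</name><ps>150</ps></auto></daten>")
log = []
walk(doc, RuleNFA(RULES), {0}, log)
print(log)
```

Elements that the traversal never visits are never reported to the automaton, so no rule is applied to them; the name element is visited here but reaches no accepting state.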
Experiment: Throughput
Complex query (multiple scans, joins)
QR (query rewrite): too many unions; DT (data translation): overhead of translation
Experiment: Throughput
Simple query
QR: fewer unions; DT: still overhead of translation
Experiment: Throughput
One input message, bundle of queries evaluated at once
QR: even more unions; DT: less overhead, transforms the input message only once
Again: Advantages
Performance: novel data model, lazy execution
Light-weight: mapping rules are small units
Extensibility: add more rules as new sources are adopted
Flexibility: complex mappings through element constructors
The End
Visit our website, LIVE DEMO! http://fifthelement.inf.ethz.ch:8080/rules
Write us, please! [email protected]