tiresias - people.cs.umass.eduameli/projects/tiresias/slides/sigmod2012.pdf · 0 10 20 30 40 50 60...

26
University of Washington Database Group Tiresias The Database Oracle for HowTo Queries Alexandra Meliou §Dan Suciu § University of Massachuse9s Amherst University of Washington

Upload: others

Post on 11-Sep-2019

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

University of Washington Database Group

Tiresias  The  Database  Oracle  for  How-­‐To  Queries  

Alexandra  Meliou§✜  

Dan  Suciu✜  

§University  of  Massachuse9s  Amherst  ✜University  of  Washington  

Page 2: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Hypothetical  (What-­‐if)  Queries  

Brokerage  company  DB  

Key  Performance  Indicators  (KPI)  

Example  from  [Balmin  et  al.  VLDB’00]:  “An  analyst  of  a  brokerage  company  wants  to  know  what  would  be  the  effect  on  the  return  of  customers’  porPolios  if  during  the  last  3  years  they  had  suggested  Intel  stocks  instead  of  Motorola.”  

change  something  in  the  source  (hypothesis)  

observe  the  effect  in  the  target  

forward  

Page 3: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

How-­‐To  Queries  

Brokerage  company  DB  

Key  Performance  Indicators  (KPI)  

Modified  example:  “An  analyst  wants  to  ask  how  to  achieve  a  10%  return  in  customer  porPolios,  with  the  least  number  of  trades.”    

find  changes  to  the  source  that  achieve  the  desired  effect  

declare  a  desired  effect  in  the  target  

reverse  

Page 4: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

TPC-­‐H  example   A  manufacturing  company  keeps  records  of  inventory  orders  in  a  LineItem  table.     KPI:  Cannot  order  more  than  7%  of  the  inventory  from  any  single  country  

 Can  reassign  orders  to  new  suppliers  as  long  as  the  supplier  can  supply  the  part  

 Minimize  the  number  of  changes  

(constraints)  

(variables)  

(op\miza\on  objec\ve)   constraint  op\miza\on  

Page 5: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

extract  data  

Constraint  Optimization  on  Big  Data  

DB  

construct  op\miza\on  model  

this  is  for  a  set  of  10  lineitems  and  40  

suppliers  

Mixed  Integer  Programming  (MIP)  

solver  

transform  into  data  updates  

MathProg  

Imprac\cal!  

Page 6: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Demo:  Tiresias  

a  tool  that  makes    how-­‐to  queries  prac\cal  

Page 7: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Tiresias:  How-­‐To  Query  Engine  

DBMS

MIP solver

Tiresias  

TiQL  (Tiresias  Query  Language)  

Declara\ve  interface,  extension  to  Datalog  

RULES:

HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)

& SuppNation(sk2,country)

HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)

HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)

[c? <= 10] <- HOrderSum(country,c?)

MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)

Page 8: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Overview  

MathProg  or  AMPL  

RULES:

HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)

& SuppNation(sk2,country)

HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)

HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)

[c? <= 10] <- HOrderSum(country,c?)

MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)

TiQL  

Visualiza\ons  

Page 9: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

RULES:

HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)

& SuppNation(sk2,country)

HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)

HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)

[c? <= 10] <- HOrderSum(country,c?)

MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)

MathProg    or  AMPL  

TiQL  

Visualiza\ons  

Overview  

Demo  

Page 10: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Demo  

MathProg  or  AMPL  

RULES:

HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)

& SuppNation(sk2,country)

HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)

HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)

[c? <= 10] <- HOrderSum(country,c?)

MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)

TiQL  

Visualiza\ons  

Overview  

Language  seman\cs  

Evalua\on  of  a  TiQL  program:  Transla\on  from  TiQL  to  linear  constraints  

Performance  op\miza\ons  

Page 11: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Tiresias  Query  Language   Datalog-­‐like  nota\on:  

 TiQL  seman\cs:  Mapping  from  EDBs  (Extensional  Database)  to  possible  worlds  over  HDBs  (Hypothe\cal  Database)  

head   body:  conjunc\on  of  predicates  

HDB  

P (x̄) :-B1(x̄1) ∧B2(x̄2) ∧ · · · ∧Bn(x̄n)

HP (x̄) :- body

Page 12: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

TiQL  Rules  

Reduc\on  Rule   HP(x̄) :< body

Constraint  Rule   [arithm-pred] <- body

Deduc\on  Rule   HP(x̄) :- body

Seman+cs:    Similar  to  repair-­‐key  seman\cs  [Antonova  et  al.  SIGMOD’07],  [Koch  ICDT’09]  

Seman+cs:    Takes  a  subset  of  tuples  

Seman+cs:    The  head  predicate  needs  to  hold  for  all  tuples  

Page 13: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Demo  

MathProg  or  AMPL  

RULES:

HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)

& SuppNation(sk2,country)

HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)

HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)

[c? <= 10] <- HOrderSum(country,c?)

MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)

TiQL  

Visualiza\ons  

Overview  

Language  seman\cs  

Evalua\on  of  a  TiQL  program:  Transla\on  from  TiQL  to  linear  constraints  

Performance  op\miza\ons  

Page 14: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Evaluating  a  TiQL  Program  

Mixed  Integer  Program  (MIP)  

TiQL  RULES:

HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)

& SuppNation(sk2,country)

HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)

HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)

[c? <= 10] <- HOrderSum(country,c?)

MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)

DB  

Page 15: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

CORE HChooseSok pk sk sk’1 P15 S10 S101 P15 S10 S212 P32 S43 S102 P32 S43 S43

Evaluating  a  TiQL  Program  HChooseS(ok,pk,sk,sk’) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)

LineItemok pk sk quant1 P15 S10 222 P32 S43 45

PartSupppk skP15 S10P15 S21P32 S10P32 S43

ok pk sk sk’1 P15 S10 S102 P32 S43 S10

ok pk sk sk’1 P15 S10 S102 P32 S43 S43

ok pk sk sk’1 P15 S10 S212 P32 S43 S43

ok pk sk sk’1 P15 S10 S212 P32 S43 S10

possible  worlds  

Page 16: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Key  Constraints  CORE HChooseSok pk sk sk’1 P15 S10 S101 P15 S10 S212 P32 S43 S102 P32 S43 S43

x1

x2

x3

x4

0 ≤ xi ≤ 1∀kj :

i,key(xi)=kj

xi ≤ 1x1 + x2 ≤ 1x3 + x4 ≤ 1

Key(ok, pk, sk)

ok pk sk sk’1 P15 S10 S101 P15 S10 S21

NOT  a  possible  world  

Page 17: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Provenance  Constraints   A  TiQL  rule  specifies  transforma\ons   Transforma\ons  define  provenance  

 Boolean  seman\cs  for  queries  without  aggregates   Semi-­‐module  provenance  for  queries  with  aggregates  [Amsterdamer  et  al.  PODS’11]  

Y = X1 ∨X2 ∨ . . . ∨Xn

∀i, y ≥ xi

y ≤�

i

xi

Disjunc+on:  

Y = X1 ∧X2 ∧ . . . ∧Xn

∀i, y ≤ xi

y ≥�

i

xi − (n− 1)

Conjunc+on:  

Page 18: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Demo  

MathProg  or  AMPL  

RULES:

HChooseS(ok,pk,sk,qnt,sk’,country) :- PartSupp(pk,sk’) & LineItem(ok,pk,sk,qnt)

& SuppNation(sk2,country)

HLineItem(ok,pk,sk’,qnt) :- HChooseS(ok,pk,sk,qnt,sk’,country)

HOrderSum(country,count(*)) :- HChooseS(ok,pk,sk,qnt,sk’,country)

[c? <= 10] <- HOrderSum(country,c?)

MAXIMIZE(count(*)) :- HChooseS(ok,pk,sk,qnt,sk,country)

TiQL  

Visualiza\ons  

Overview  

Language  seman\cs  

Evalua\on  of  a  TiQL  program:  Transla\on  from  TiQL  to  linear  constraints  

Performance  op\miza\ons  

Page 19: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Optimizing  Performance   Model  op\mizer  

 eliminates  variables,  constraints,  and  parameters   uses  key  constraints,  func\onal  dependencies,  and  provenance  

 Par\\oning  op\mizer  Significantly  faster  than  lefng  the  MIP  solver  do  it  

max(x1 + x2 + x3 + x4)s.t : x1 + x2 ≤ 50

x3 + x4 ≤ 50xi ≥ 0

max(x1 + x2)s.t : x1 + x2 ≤ 50

xi ≥ 0

max(x3 + x4)s.t : x3 + x4 ≤ 50

xi ≥ 0

Page 20: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

1

10

100

1000

10000

100 1000 10000

Runt

ime

(sec

)

Data size (# of tuples)

without model optimizationwith model optimization

Evaluation  of  the  Model  Optimizer  

baseline  

with  op\miza\on  

Page 21: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

0

10

20

30

40

50

60

70

80

0 100 200 300 400 500 600

Runt

ime

(sec

)

Group size (# of partitions per group)

TiQL to MIP translationMIP solver

total Tiresias query time

0

5000

10000

15000

20000

25000

0 100 200 300 400 500 600

Runt

ime

(sec

)

Group size (# of partitions per group)

TiQL to MIP translationMIP solver

total Tiresias query time

Evaluation  of  Tiresias  Partitioning  

10k  tuples   1M  tuples  

granularity  of  par\\oning  

complex  dependency  on  the  granularity  of  par\\oning  

Page 22: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

0

1

2

3

4

5

6

7

8

5k 10k 50k 100k 500k 1M

Runt

ime

(sec

)

Data size (# of tuples)

MIP solver / per partition (avg)MIP constructor / per partition (avg)

Scalability  

The  MIP  solver  run\me  (per  par\\on)  does  not  increase  

with  data  size  

Constructor  \me  depends  on  DB  query  execu\on  \me  

Page 23: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Related  Work    Provenance  

[Amsterdamer  et  al.  PODS’11],  [Cui  et  al.  TODS’00],  [Green  et  al.  PODS’07]  

  Incomplete  databases  [Antonova  et  al.  SIGMOD’07],  [Imielinski  et  al.  JACM’84],  [Koch  ICDT’09]  

 Other  RDM  problems  [Arasu  et  al.  SIGMOD’11],  [Binnig  et  al.  ICDE’07],  [Bohannon  et  al.  PODS’06],  [Fagin  et  al.  JACM’10]  

Page 24: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Next  Steps  with  Tiresias  

Tiresias  Handling  non-­‐par\\onable  problems  

Approxima\ons  

Paralleliza\on  and  handling  of  skew  

Result  analysis  and  feedback-­‐based  problem  genera\on  

Page 25: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

SIGMOD  Demo  Group  C  Loca+on:  Vaquero  A      Time:  13:30-­‐15:00  

Page 26: Tiresias - people.cs.umass.eduameli/projects/tiresias/slides/SIGMOD2012.pdf · 0 10 20 30 40 50 60 70 80 0 100 200 300 400 500 600 Runtime (sec) Group size (# of partitions per group)

Contributions   How-­‐To  queries  

 Using  MIP  solvers  to  answer  How-­‐To  queries  

 Tiresias  prototype  implementa\on  

hlp://db.cs.washington.edu/\resias