interactive query processing vijayshankar raman computer science division university of california...

62
Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

Post on 22-Dec-2015

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

Interactive Query Processing

Vijayshankar RamanComputer Science Division

University of California at Berkeley

Page 2: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 2

Motivation: Nature of querying

querying = extracting information from data sets different techniques in different settings

intrinsically slow significant user-system interaction

info. seekers work iteratively, gradually refining requests based on feedback

Page 3: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 3

Problems with traditional solutions

mismatch between system functionality and mode of HCI

black boxes• do batch processing• frustrating delays in iterative process

process

query

exactanswer

Page 4: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 4

Interactive processing

HCI requirements• users must get continual feedback on results of processing• allow users to control processing based on prior feedback

performance goals• not to minimize time to give complete results• give continually improving partial results• adapt to dynamically specified performance goals

process

query

exactanswer

Page 5: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 5

Context and Talk Outline query processing over structured data processing = dataflow through

pipelining operators

outline of rest of talksupport for dynamic user control in traditional query proc.

architectures (static plans)• architecture for giving more aggressive partial results• policy for generating partial results

client user interface impact of user actions on query execution

other related issues• interactive data cleaning (A-B-C)• adaptive query processing

R S P Q

Page 6: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 6

Design goals in supporting user control

make minimal change to system architecture

must be independent of particular query processing algorithms

no delay in processing

Page 7: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 7

Online Reordering (Raman et al. ’99,’00)

users perceive data being processed over time• prioritize processing for “interesting” tuples• interest based on user-specified preferences

reorder dataflow so that interesting tuples go first

encapsulate reordering as pipelined dataflow operator

scan

reorder in

dex sca

n

Page 8: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 8

Framework for Online Reordering

want no delay in processing in general, reordering can only be best-effort

typically process/consume slower than produce• exploit throughput difference to reorder

two aspects• mechanism for best-effort reordering• reordering policy

acddbadb...

abcdabc..

reorder consumeproduceprocess

f(t)

user interest

Page 9: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 9

Juggle mechanism for reordering

continually prefetch from input, spooling onto auxiliary side disk if needed

juggle data between buffer and side disk, to keep buffer full of “interesting” items• getNext chooses best item currently on buffer

reordering policy determines what getNext returns, and enrich/spool decisions

buffer

spoolfetch enrich

getNextjuggle

side diskproduce

process/consume

Page 10: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 10

Reordering policies

quality of feedback for a prefix t1t2

…tk

QOF(UP(t1), UP(t2

), … UP(tk )), UP = user preference

• determined by application

goodness of reordering: dQOF/dt implication for juggle mechanism

• process gets item from buffer that increases QOF the most

• juggle tries to maintain buffer with such items

time

QOFGOAL: “good” permutation of

items t1…tn to t1…tn

Page 11: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 11

One application of reordering online aggregation [Hellerstein et. al ‘97,

Hellerstein and Haas’99 ]• for SQL aggregate queries, give gradually improving

estimates • with confidence intervals

• allow users to speed up estimate refinement for groups of interest

• prioritize for processing at a per-group granularity

SELECT AVG(gpa) FROM studentsGROUP BY college

Page 12: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 12

Online Aggregation Screenshot

SELECT AVG(gpa) FROM studentsGROUP BY college

Page 13: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 13

QOF in Online Aggregation

avg weighted confidence interval preference acts as weight on confidence interval

• QOF = UPi / ni , ni = number of tuples processed from group i

process pulls items from group with max UPi / nini

desired ratio of group i tuples on buffer = UPi2/3/ UPj

2/3

• juggle tries to maintain this by enrich/spool

Page 14: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 14

Other Quality of Feedback functions

rate of processing (for a group) preference • QOF = (ni - nUPi)

2 (variance from ideal proportions)

process pulls items from group with max (nUPi - ni )

desired ratio of group i tuples in buffer = UPi

will see more complex QOF later

Page 15: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 15

Results: Reordering in Online Aggregation

implemented in Informix UDO experiments with modified TPC-D queries questions:

• how much throughput difference is needed for reordering• can we reorder handle skewed data

one stress test: skew, minimal proc. cost

• index-only join• 5 orderpriorities, zipf distribution

A B C D E 1 1/2 1/3 1/4 1/5

consume

process

scan

juggle

index

SELECT AVG(o_totalprice), o_orderpriorityFROM order WHERE exists ( SELECT * FROM lineitem WHERE l_orderkey = o_orderkey)GROUP BY o_orderpriority

Page 16: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 16

Performance results A B C D E

initial preferences 1 1 1 5 3

after T1 1 1 3.5 0.5 1

time

# t

up

les

pro

cess

ed

3 times faster for interesting groups 2% completion time overhead

E

C A

con

fid

en

ce

inte

rval

time

Page 17: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 17

Findings

higher processing costs (index/hash join, subquery, …) make reordering easy

at very low processing costs, juggle constrained bydensity of interesting tuples in source• outlier groups hard to speed up

• better to use index on reordering column [HHW ‘97] reordering becomes easier over time

questions to answer:• where to place juggle• are groups fixed statically

Page 18: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 18

Outline

motivation and context support for dynamic user control in traditional

query proc. architectures architecture for giving more aggressive partial

results policy for generating partial results

• user interface for displaying partial results • impact on routing

related work, work-in-progress, future work

Page 19: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 19

Generating partial results traditional arch. also generate continual result tuples

• arises from continual dataflow thru pipelining operators• much work on pipelining joins [WA91, HH99, IFF+99, UF00]

this is too rigid • especially in distributed envirorments

Resu

l t S

pace

Page 20: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 20

Context: Query processing in Telegraph

Telegraph: adaptive dataflow system to query diverse, distributed sources

much data available as services over Internet• currently only accessible by browse/search/forms

want to semi-automatically combine this data

examples:• campaign finance information (fff.cs.berkeley.edu)

multiple donor lists, home prices, census info., crime ratings, maps, celebrity lists, legislative records …

• restaurant information lists, reviews, addresses, maps, health inspection reports,

nutrition information …

Page 21: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 21

More aggressive partial results complete results too rigid

• source latencies high, diverse• dynamic source variations, delays• query does not capture user desires

want to process queries flexibly, giving results asap• adapt dynamically to user preferences and source variations• allow user feedback to refine partial results

Page 22: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 22

Correctness of partial results must contain some essential columns

• group-by columns / sort-by columns

full disjunction/outer-join semantics• appropriate for many Internet data sources

join semantics• cannot give partial results without ensuring that match

exists

aggregates: over-estimate, even with outer-join semantics• update early, and compensate later

Page 23: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 23

Dynamic query plans

Eddy [Avnur and Hellerstein 2000]• router for directing data thru modules• dynamically chooses join and selection order• all partial tuples generatable

original routing policy: optimize completion time our focus: continual partial results, adaptation to

user preferences

Eddy

R S

T

Page 24: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 24

Eddy Routing

Eddy must decidea) what tuple to route nextb) where to route it

based on user preferences, and source properties

need routing policy and a reordering mechanism• eddy memory and module queues bounded

. . . modules

R PS . . . inputs

Eddy

Page 25: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 25

Prioritizing tuples

perform online reordering within Eddy• all incoming tuples placed in reorderer• when space available on queues to modules, Eddy takes tuple from

reorderer and routes it to modules

best-effort automatically reorder before slow modules

. . . modules

R PS . . . inputs

copy

Page 26: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 26

Routing and reordering policy

GOAL: at any time, route to max. dQOF/dt = benefit of sending tuple to module / cost

cost: estimate data rates to/from module benefit: dependent on application and UI

• how partial results impact the UI• user preferences

QOFGOAL:

tim

e

Page 27: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 27

Outline

motivation and context online reordering for user prioritization of partial

results architecture for generating more aggressive

partial resultspolicy for generating partial results

• Telegraph UI: displaying results and inferring preferences• impact on routing • experimental results

related work/work-in-progress/future work

Page 28: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 28

Telegraph UI

screenshot

Page 29: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 29

Telegraph UI

partial results displayed on screen as they arrive values clustered into groups

• can roll-up or drill-down to see different granularities

client has hash table mapping groups to values navigation

• vertical/horizontal scrolling• column (un)hiding• rollup/drilldown

different columns visible at different drilldown levels

Page 30: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 30

Getting user preferences (1)

infer from navigation• row/column scrolling, group drill down and rollup

prioritize visible rows/columns

• “query evolution” subset “one-size-fits-all” queries future work: query expansion

explicit• at the cell level -- need for some expensive sources• can also have up/down buttons on columns

Page 31: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 31

Getting user preferences (2)

preferences:• col priorities• {<predicate, row priority, row-col priorities>… }

e.g. {<1, 1.0, none>, <2, 1.7, none>,

<3, 1.3, map=high>, <8/33014, 1.0, none>, <8/60610, 1.0, none>}

Page 32: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 32

Benefit of a partial result

depends on user preferences benefit of updating a cell in output

• row priority x column weight x cell worthiness

• column weight = max(col priority, row-col priority)• incremental cell worthiness: how much does one extra

update add to the cell’s value quality of feedback enumerations -- 1 aggregations -- change in confidence interval

informing user about execution progress• map cell worthiness to foreground color value

Page 33: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 33

Benefit of Routing a Tuple route tuple according to expected benefit and cost

benefit of sending a tuple t to a module M and forming set T = cells c T benefit of updating c

estimate fanout by sampling previous values -- selectivity estimation

t M T

Page 34: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 34

Granularity of Reordering/Routing

reorder and route tuples at granularity of a group group = <base relations, predicate>

groups created and deleted dynamically, as user navigates in UI

group predicate may be a subset of application-specified row predicate• final row depends on values this tuple joins with

Page 35: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 35

Partial results, no reordering

Bush Contributors Income Crime Ratings

800200 400 6000

10000

20000

40000

30000

50000

time (s)

tota

l w

ort

hin

ess

with p

artia

l res

ults

without partial results

Page 36: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 36

Partial results, no reordering

0 100 200 300

10000

20000

Bush Contributors Income Crime Ratings

time(s)

tota

l w

ort

hin

ess

100 200 3000

1000delay

with partial results

without

partial results

Page 37: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 37

Results with row-wise reordering

scrolling: {AZ, AR, CA CO, CT}, LA,KY,MA,MD,MI}

0

100

40 80

Ag

gr.

Up

date

s

time

0

0.24

groups

rel. e

rror

Page 38: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 38

Prioritizing particular columns

previous graph: Income updates much faster than Crime Ratings• can we prioritize Crime Ratings?

increase number of threads to probe it

if eddy sends na tuples to A, nb to B, per secondna <= threadsa/latency(A), nb <= threadsb/latency(B)

threadsa + threadsb <= Number of Threads Per Query

max. na worthiness(A) + nb worthiness(B)

more generally, optimize for multiple resources• good citizenship • costs

Page 39: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 39

Outline

motivation and context online reordering for user prioritization of partial

results architecture for generating more aggressive

partial results policy for generating partial results conclusions and future work

Page 40: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 40

Related Work

IR work• ranked retrieval, relevance feedback• search strategies, Berry Picking

making operators pipelining (Ripple Join, XJoin, Tuqwila) precomputed summaries (OLAP, materialized views, AQUA) top N / fast-first query processing

parachute queries, union queries

adaptivity• dynamic query plans• mid-query reoptimization (KD’98, IFF+99)• competition (DEC Rdb)

Page 41: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 41

Summary query processor should care about how

user/application uses results online reordering effective way of supporting

dynamic user control give partial results as user wishes by embedding

reordering within dynamically controlled dataflow

system flexibility helpful hard to map these user-interaction needs into

concrete algorithm performance goals• wanted: benchmarks based on user/application traces

For more information:http://telegraph.cs.berkeley.edu/, http://control.cs.berkeley.edu/

Page 42: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 42

Future Work

session-granularity query processing• query evolution• lazy evaluation: user/client navigates through partial results• closer dbms-application interaction

reduced operator set querying• execute query by appropriate routing through data sources

and state modules

Page 43: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 43

Lessons Learned query processor should care about how

user/application uses results

system flexibility helpful

hard to map these user-interaction needs intoconcrete algorithm performance goals• wanted: benchmarks based on user/application traces

For more information:• http://telegraph.cs.berkeley.edu/

• http://control.cs.berkeley.edu/

Page 44: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 44

relational operators: logical astractions• encapsulate multiple physical effects

inflexible in handling unexpected changes• information hiding

operator speeds resource consumption

• work sharing competitive access paths

• inflexible tuple routing suppose user asks for SQ tuples suppose there is a delay in P

want to encapsulate at level of physical operators: data sources, data structures

Granularity of Query Operators

R SP

Q

R

Eddy

S

Q

RS RS-PQ PQ PQij

Page 45: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 45

State Modules

store state in separate State Modules (SteMs)• SteM like an index: supports builds and probes• unifies caches, join state, rendezvous buffers

Eddy routes tuples through SteMs and data sources• gets results from sources, or probe/build into SteMs

joins and data access performed in the process of routing

R S P Q

Q R

Eddy

SteMP SteMQSteMSSteMR

S P

QR

Eddy

S

Q

RS RS-PQ PQ PQij

Page 46: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 46

HCI Motivation: Why Interactivity?

Berry picking (Bates ‘90, ‘93)

user studies• decision support (O’day and Jeffries ‘93)• relevance feedback (Koenemann and Belkin ‘96)

Page 47: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 47

BACKUP: Disk data layout during juggling

performance goal:• favor early results: optimize Phase 1, at expense of Phase 2• spool : sequential I/O, enrich: random I/O + reordering in Phase 2 much easier than in Phase 1

enrich needs approx. index on side-disk• have hash-index on tuples, according to user interest• done at “group” granularity

Page 48: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 48

BACKUP: Other applications of reordering

can add reorder to any dataflow

will later discuss application for query systems that give aggressive partial results

also useful in batch query processing • sorting often used in query plans for performance gains• can replace by best-effort reordering• little performance hit, but plan is now pipelining• will not discuss further in this talk

spreadsheets

Page 49: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 49

BACKUP: Estimators for rowPriority, incremental

worthiness easy, unless output row depends on new values

• currently use average of all possible rows• could estimate a distribution instead

Page 50: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 50

BACKUP: Interactive query processing with non-pipelining operators

basic techniques independent of pipelined operators. • reordering --- can be used in general query plans,

although effectiveness may be hindered by blocking operators

• State Modules applicable in general -- helps with adaptivity

much work on making plan operators pipelining -- ripple join, xjoin, tuqwila• reordering itself can be used to avoid blocking sorts in

some cases

Page 51: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 51

Query processing using SteMs (1) SteM operations -- works like index on a table

• build(tuple): add tuple to SteM• probe(tuple): find all matches among build values

bounce back tuple if all matches not found and probe tuple not cached elsewhere

performing joins by routing through SteMs• index joins

regular synchronous index joins easy -- have no state over distributed sources, lookups cached by building into SteM

• separating cache allows Eddy to distinguish access cost

• asynchronous index joins [GW’00] helps for high-throughput Internet sources SteM acts as rendezvous buffer for

pending probe resultsR

Eddy

SteMSSteMR

S

R bl

d.

S bl

d.R pr

obe

S pr

obe

R

Page 52: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 52

Query processing using SteMs (2) performing joins by routing through SteMs (contd)

• doubly pipelined hash join• use two SteMs, one on each source

can also do extensions (e.g. Tuqwila [IFF+99]), and even non-pipelined joins, using appropriate SteMs

thus can simulate any static query plan by appropriate routing

R

Eddy

SteMSSteMR

R bl

d.

S bl

d.R pr

obe

S pr

obe

S

Page 53: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 53

Query processing using SteMs (3) how to do this in general?

• don’t want Eddy to be a super-join• want it to route as per user desires and source properties,

independent of any join algorithm

routing constraint:• each tuple must have a non-zero probability of being sent to

every appropriate module• can avoid repeated looping by marking tuples

Theorem: any routing that satisfies above constraint will produce all query results and terminate

could get duplicates• prune at application, or• enforce atomicity in tuple routing

Page 54: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 54

Benefits of SteMs adaptivity

• join algorithm and access path selection are determined by eddy routing

• hence Eddy can dynamically choose these based on source properties

• in fact, Eddy can do hybrid join algorithms adapt to dynamic delays, etc.

work sharing -- for query evolution, competition better information to the Eddy

• speeds, memory consumption

Page 55: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 55

BACKUP: Lazy Evaluation

User navigates thru partial results• can see only a few at a time

exploit to delay processing• do on demand on the region user currently navigating

over

things to delay• expensive source lookups (e.g. map)• joins (e.g. …)• any other formatting etc.

Page 56: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 56

Backup: giving probabilistic results

view partial result as somethng that throws light on outer-join space• positive result: probabilistic • negative result: deterministic

benefit of positive result:• weighted sum of cells it is likely to show; penalty for false

result• repudiation of earlier false results

benefit of negative result:• direct benefit unlikely• repudiation of earlier false results

complications• many tuples may contribute to single output tuple --

aggregation

Page 57: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 57

Probabilistic partial results

partial result => may not apply all filters

still want to show result probabilistically• filters ill-specified• data on Internet often inconsistent • show all partial results: “full disjunction” (Galindo-Legaria

‘94)

positive and negative results

Outp

ut

Space

Oute

r J o

i n S

pace

Page 58: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 58

What does user see?

user navigates thru partial results

with complete row partial results:• can either explore these results in detail (sort/scroll along

columns as in spreadsheet)• compute approx. aggregates on them in online fashion

errors shrinking as more results come in

with probabilistic, incomplete results?• easy if primary key in partial result• what to show otherwise?

Page 59: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 59

Quantifying benefit of partial results

partial result sheds light on outer-join space• positive result: probabilistic • negative result: deterministic

benefit of positive result:• weighted sum of cells it is likely to show• penalty for false result ?• repudiation of earlier false results ?

benefit of negative result:• direct benefit unlikely• repudiation of earlier false results ?

Page 60: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 60

Why probabilistic results are good

data on Internet is often inconsistent• hence even exact processing cannot give perfect answers

people are used to sloppy answers can allow negations in expert interfaces only

Page 61: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 61

Data growth vs. Computer Speedup Moore’s Law -- # of transistors/chip doubles every 18 months

(1965) data growth

0

500

1000

1500

2000

2500

3000

3500

1988

1989

1990

1991

1992

1993

1994

1995

1996

1997

1998

1999

2000

Year

Petabytes

Sales

Moore'sLaw

Source: J. Porter, Disk/Trend, Inc. (http://www.disktrend.com/pdf/portrpkg.pdf)

Page 62: Interactive Query Processing Vijayshankar Raman Computer Science Division University of California at Berkeley

April 19, 2023 Interactive Query Processing 62

Disk Appetite, contd.

Greg Papadopoulos, CTO Sun:• Disk sales doubling every 9 months

similar results from Winter VLDB survey

time to process all your data doubles every 18 months!