Post on 09-Jul-2020


Crowdsourced Databases: Query Processing with People

Adam Marcus, Eugene Wu, David Karger, Sam Madden, Rob Miller

MIT CSAIL


How to Crowdsource the Introduction of Your Talk to Other Research Groups


CIDR Deadline

Good ideas in Databases

Jane, John

Abstract

Traditional databases [fill in here]. This leads to [fill in here]. We propose [find a good name], which [what does it do?].


OR

$


Turker Interface

Human Intelligence Task (HIT)


Uses of Human Computation

● Data cleaning/integration (ProPublica)
● Finding missing people (Haiti, Fossett, Gray)
● Translation/Transcription (SpeakerText)
● Word Processing (Soylent)
● Outsourced insurance claims processing
● Data journalism (Guardian)


Challenges in Human Computation

● Workers are not silicon drones
● Worker latency is minutes, hours
● Optimization parameters (price, accuracy) and model are different


These problems are currently addressed in an ad-hoc way


Qurk is a declarative workflow management system that allows human computation over relational data


(a system which throws humans in the direct path of query execution)


Data Model/Query Model

● UDFs that compile to HTML forms
● List types
  ● Make relational with Pig-style FLATTEN operator
  ● Collapse to single values with UDAs


TASK pcInfo(name, affiliation)
  Text: “What is the email address and phone number of %s, who works at %s”, name, affiliation
  Response: Form((“Email Address”, String), (“Phone Number”, String))
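As a rough illustration of how a TASK definition like pcInfo might compile to an HTML form, here is a minimal Python sketch. The function name `compile_task_to_form`, the `field0`/`field1` input names, and the form layout are assumptions made for this example; the slides do not show Qurk's actual HIT compiler output.

```python
from html import escape


def compile_task_to_form(text_template, args, response_fields):
    """Render a task prompt and its response fields as an HTML form string.

    Hypothetical helper: the real HIT compiler is not shown in the slides.
    """
    # Fill the %s placeholders in the prompt, then escape it for HTML.
    prompt = escape(text_template % tuple(args))
    inputs = []
    for index, (label, _field_type) in enumerate(response_fields):
        # Input names (field0, field1, ...) are an assumed naming scheme.
        inputs.append(
            '<label>%s <input type="text" name="field%d"></label>'
            % (escape(label), index)
        )
    return '<form><p>%s</p>%s<input type="submit"></form>' % (
        prompt,
        "".join(inputs),
    )


form = compile_task_to_form(
    "What is the email address and phone number of %s, who works at %s",
    ["Joe", "Berkeley"],
    [("Email Address", str), ("Phone Number", str)],
)
print(form)
```

Each (label, type) pair in the Response clause becomes one text input, so a Turker's submission maps back to one tuple of the UDF's result list.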


SELECT name, FLATTEN(pcInfo(name, affiliation))
FROM pc
INTO temporary Q1


Joe, ((h@berkeley, 123-456-7890), (h@berkeley, (999)4445555), (bad @ response, 1234567890))

flatten

Joe, h@berkeley, 123-456-7890
Joe, h@berkeley, (999)4445555
Joe, bad @ response, 1234567890
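The flatten step illustrated above can be sketched in a few lines of Python. This is only an approximation of the Pig-style FLATTEN semantics named on the slide; the `flatten` function and the in-memory row layout are assumptions for illustration.

```python
def flatten(rows):
    """Expand (key, [tuple, ...]) rows into one flat row per list element."""
    flat = []
    for key, responses in rows:
        for response in responses:
            # Prepend the key to each response tuple to make it relational.
            flat.append((key,) + tuple(response))
    return flat


# One nested row: Joe, with three Turker responses (email, phone).
nested = [
    ("Joe", [
        ("h@berkeley", "123-456-7890"),
        ("h@berkeley", "(999)4445555"),
        ("bad @ response", "1234567890"),
    ]),
]

flat_rows = flatten(nested)
for row in flat_rows:
    print(row)
```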


SELECT name, majorityVote(normalize(phone)), majorityVote(normalize(email))
FROM Q1
GROUP BY name


Joe, h@berkeley, 123-456-7890
Joe, h@berkeley, (999)4445555
Joe, bad @ response, 1234567890

normalize

Joe, h@berkeley, (123)456-7890
Joe, h@berkeley, (999)444-5555
Joe, bad@response, (123)456-7890


majorityVote

Joe, h@berkeley, (123)456-7890
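The normalize and majorityVote steps can be approximated in Python as follows. The normalization rules (strip non-digits from phone numbers, strip whitespace from emails) are assumptions inferred from the example rows on the slides, not Qurk's actual UDF definitions.

```python
import re
from collections import Counter, defaultdict


def normalize_phone(raw):
    """Reformat a 10-digit phone number as (AAA)BBB-CCCC."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return raw  # leave answers we cannot parse untouched
    return "(%s)%s-%s" % (digits[:3], digits[3:6], digits[6:])


def normalize_email(raw):
    """Remove whitespace that Turkers may have typed into the address."""
    return re.sub(r"\s+", "", raw)


def majority_vote(values):
    """Return the most frequent value among the responses."""
    return Counter(values).most_common(1)[0][0]


# Flattened Turker responses, as in temporary table Q1.
q1 = [
    ("Joe", "h@berkeley", "123-456-7890"),
    ("Joe", "h@berkeley", "(999)4445555"),
    ("Joe", "bad @ response", "1234567890"),
]

# GROUP BY name, then aggregate each column with majority_vote.
groups = defaultdict(list)
for name, email, phone in q1:
    groups[name].append((normalize_email(email), normalize_phone(phone)))

result = {
    name: (
        majority_vote([email for email, _ in rows]),
        majority_vote([phone for _, phone in rows]),
    )
    for name, rows in groups.items()
}
print(result)
```

Normalizing before voting matters here: without it, the three phone strings are all distinct and no answer has a majority.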


Haiti Join

SELECT survivor.name
FROM survivor, missing
WHERE majorityVote(samePerson(survivor.img, missing.img))

Qurk System Design

[Figure: Qurk architecture diagram. Components: MTurk, HIT Compiler, Task Manager (Task4, Task5, Task6), Async Executor (plan nodes σ, A, B; tuples a1, a2, b1), Storage Engine, Statistics Manager, Online Optimizer. Edge labels: Tasks, Results, Internal HIT, Compiled HITs, HIT results.]

Optimization Opportunities

$$$, accuracy, latency


Combine Operators

[Figure: Photos passing through two selection (σ) operators, male? and adult?]


Batch Tuples

[Figure: Photos, selection (σ) operators male? and adult?, and workers Turker 1 and Turker 2]


Early Experimental Result: Batching Tuples

+ Maintains accuracy
- Increases latency
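To make tuple batching concrete, here is a minimal Python sketch that groups several tuples into each HIT so one Turker answers for a whole batch. The `batch_tuples` helper and the photo identifiers are hypothetical; how Qurk actually assembles batches is not shown in the slides.

```python
def batch_tuples(tuples, batch_size):
    """Split a list of tuples into consecutive batches, one batch per HIT."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        tuples[start:start + batch_size]
        for start in range(0, len(tuples), batch_size)
    ]


# Seven photos awaiting the male? / adult? filters (placeholder ids).
photos = ["photo%d" % i for i in range(7)]

hits = batch_tuples(photos, 3)
for hit_number, batch in enumerate(hits):
    print("HIT %d asks about: %s" % (hit_number, ", ".join(batch)))
```

With a batch size of 3, seven photos need three HITs instead of seven, at the price of more work per HIT.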


Join Optimizations

● Avoid cross product
● Technique 1: batching
● Technique 2: join heuristics reduce search space
  ● e.g., image equi-join: gender, ethnicity match
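The join heuristic above can be sketched as a bucketing step: only pairs that agree on cheap features are sent to Turkers for the expensive samePerson comparison. In this Python sketch the feature names, the placeholder feature values, and the `candidate_pairs` helper are all assumptions for illustration; it also assumes the features have already been collected for every row.

```python
from collections import defaultdict


def candidate_pairs(left, right, features):
    """Return id pairs whose cheap features all match.

    One pass over each input replaces the full cross product.
    """
    buckets = defaultdict(list)
    for row in right:
        buckets[tuple(row[f] for f in features)].append(row)

    pairs = []
    for row in left:
        key = tuple(row[f] for f in features)
        for match in buckets.get(key, []):
            pairs.append((row["id"], match["id"]))
    return pairs


# Placeholder feature values; real values would come from earlier HITs.
survivors = [
    {"id": "s1", "gender": "f", "group": "g1"},
    {"id": "s2", "gender": "m", "group": "g2"},
]
missing = [
    {"id": "m1", "gender": "f", "group": "g1"},
    {"id": "m2", "gender": "m", "group": "g1"},
    {"id": "m3", "gender": "m", "group": "g2"},
]

pairs = candidate_pairs(survivors, missing, ["gender", "group"])
print("cross product size:", len(survivors) * len(missing))
print("pairs sent to Turkers:", pairs)
```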


Turker Fatigue

Skew Correction

Early Experimental Result: Skew Correction

+ Improves Turker accuracy
- Increases number of HITs

[Figure: Qurk architecture diagram (MTurk, Compiler, Storage Engine, Statistics Manager, Online Optimizer, Query Executor, Task Manager) with a Cache and ML Models added.]


Online optimizations

● Batch size per HIT
● Price per HIT
  ● Small effect on accuracy
  ● Large effect on latency
● Turkers per HIT
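The three knobs above interact in a simple way when estimating spend. The following Python sketch assumes a cost model of (number of HITs) x (Turkers per HIT) x (price per HIT); this model and the `estimate_cost` helper are illustrative assumptions, not the optimizer described in the talk, and they ignore latency and accuracy effects.

```python
import math


def estimate_cost(num_tuples, batch_size, price_cents, turkers_per_hit):
    """Return (number of HITs, total cost in cents) under a simple model."""
    num_hits = math.ceil(num_tuples / batch_size)
    total_cents = num_hits * turkers_per_hit * price_cents
    return num_hits, total_cents


# 100 tuples, 5 tuples per HIT, 5 cents per HIT, 3 Turkers per HIT.
estimate = estimate_cost(100, 5, 5, 3)
print("HITs: %d, cost: %d cents" % estimate)

# Doubling the batch size halves the number of HITs at the same price.
larger_batches = estimate_cost(100, 10, 5, 3)
print("HITs: %d, cost: %d cents" % larger_batches)
```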


Human Computation Platforms

● Paid Crowd (e.g., MTurk)
● Experts (e.g., Aardvark)
● Games (e.g., Image Labeling)


Qurk

● Human computation inside relational databases
● Data model: SQL + Lists
● Asynchronous system design
● Lots of optimization opportunities
● Probably still can't con your way into CIDR

ask us for a [email protected]/[email protected]


Image credits

● Clock: http://www.cs.wichita.edu/~vnambood/research.htm

● CAPTCHA: http://en.wikipedia.org/wiki/CAPTCHA


Statistics

● ~80k HITs available at any time
● ~$8K worth of work at any time
● ~1.2K projects at any time
● Source: http://www.mturk-tracker.com/general/


Other Human Computation Platforms

● Outsourced insurance claims processing
● Data journalism (Guardian)
● CAPTCHA
● Games with a purpose (image labeling)


Operator Implementations

● Non-blocking to allow pipelining (e.g., symmetric hash join)
● Join batching
● Rank by comparison, rating, or partial order
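A symmetric hash join is non-blocking because every arriving tuple is both stored and probed immediately, so results stream out as slow Turker answers trickle in. The Python sketch below shows the textbook technique on an equality key; the class and method names are hypothetical and this is not Qurk's implementation.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Textbook symmetric hash join over two input streams."""

    def __init__(self):
        # One hash table per input, keyed by the join key.
        self.tables = {"left": defaultdict(list), "right": defaultdict(list)}

    def insert(self, side, key, row):
        """Store the row, then return any new (left, right) matches."""
        if side not in self.tables:
            raise ValueError("side must be 'left' or 'right'")
        other = "right" if side == "left" else "left"
        self.tables[side][key].append(row)
        matches = self.tables[other].get(key, [])
        if side == "left":
            return [(row, match) for match in matches]
        return [(match, row) for match in matches]


join = SymmetricHashJoin()
output = []
# Tuples arrive in arbitrary order from either input.
output += join.insert("left", "k1", "a1")   # nothing to match yet
output += join.insert("right", "k1", "b1")  # matches a1
output += join.insert("left", "k1", "a2")   # matches b1
print(output)
```

Neither input has to be fully consumed before the first result appears, which is what allows pipelining into downstream operators.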


SELECT name, affiliation
FROM pc

Aggregate

3 turkers, 5 cents each

UPDATE pc
SET phone = ...


Data processed outside of DB
Ad-hoc parameter tuning
Primitive uncertainty reasoning
Logical plan = Physical plan


Ditch images, make it a list


Soylent's Shortn


Wrap this into mechanical turk slide, do as text


(want more about information integration, data cleaning)


Operator Optimizations

● Non-blocking for pipelining (e.g., symmetric hash join)
● Avoid |A|*|B| join cost
● Necessary conditions for equality
  ● image equi-join: gender, ethnicity match
  ● Perform |A| + |B| scans to find predicates, remove impossible join pairs


● Alternative: cluster images


[Figure: Qurk architecture annotated with optimization areas. Components: MTurk, Executor (plan nodes σ, A, B), Storage Engine, Task Manager (Task4, Task5, Task6), HIT Compiler, Task Model, HIT Cache, Statistics Manager, Online Optimizer. Annotations: Batch Tuples and Operators; Operator Optimizations; Selectivity Injection; HIT Avoidance; Online Optimizations; Human Computation Platforms (Paid Crowd, Experts, Games).]