rabj freebase all

38
The Anatomy of a Large-Scale Human-Computation Engine Shailesh Kochhar, Stefano Mazzocchi, Praveen Paritosh Freebase August Meetup

Upload: brixofglory

Post on 07-May-2015

1.009 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Rabj freebase all

The Anatomy of a Large-Scale Human-Computation Engine

Shailesh Kochhar, Stefano Mazzocchi, Praveen Paritosh

Freebase August Meetup

Page 2: Rabj freebase all

Aug 18, 2010 Freebase Meetup

1: Freebase & Human Computation

2: Example – Stanford Library

3: RABJ

4: Consensus

Page 3: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Freebase

Structured database

12 MM entites, 300 MM triples/facts

Page 4: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Where does the data come from?

Page 5: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Community contributions

Mass Data Loads

Page 6: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Human Judgments Improve Both

Page 7: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Community

Simplify contribution through games

Page 8: Rabj freebase all

Aug 18, 2010 Freebase Meetup

http://typewriter.freebaseapps.com/

Page 9: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Community

Simplify contribution through games

Enable QA for Gridworks loads

Page 10: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Page 11: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Mass Data Loads

Precision: QA for >99% accuracy

Page 12: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Book Edition QA

Page 13: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Mass Data Loads

Precision: QA for >99% accuracy

Coverage: Manual reconciliation

Page 14: Rabj freebase all

Aug 18, 2010 Freebase Meetup

matchmaker

http://matchmaker2.freebaseapps.com/

Page 15: Rabj freebase all

Aug 18, 2010 Freebase Meetup

1: Freebase & Human Computation

2: Example – Stanford Library

3: RABJ

4: Consensus

Page 16: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Reconcile Stanford Library Catalog

with freebase.com

Page 17: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Stanford Library Catalog

4.4MM book editions

1.3MM English book editions

1.2MM English books

600K authors

Page 18: Rabj freebase all

Aug 18, 2010 Freebase Meetup

For freebase, identity is key

match books, match authors

Page 19: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Automatic matching insufficient

Trained judges needed to decide hard

cases

Page 20: Rabj freebase all

Aug 18, 2010 Freebase Meetup

How to get this?

Page 21: Rabj freebase all

Aug 18, 2010 Freebase Meetup

RABJRedundant Array of Brains in a Jar

Page 22: Rabj freebase all

Aug 18, 2010 Freebase Meetup

What?

Abstraction

Powers human judgment (HJ)

applications

3.1MM judgments in 16 months

Page 23: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Provides primitive elements for more

sophisticated applications

Page 24: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Questions

Judgments

Queues

Agents

Page 25: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Design Constraints

Page 26: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Content-agnostic

Dynamic data

Low latency

Page 27: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Architecture

Page 28: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Questions contain metadata, pointers

to dynamic content

Questions added to queues

Metadata allows slicing and dicing

Page 29: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Acre applications pull questions from

RABJ

RABJ matches judge to available tasks

Acre renders question, sends

judgment back

Page 30: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Declarative consensusYes: 3, No: 3, Skip: 4, Invalid: 3, Max: 6

RABJ notifies agents when consensus

is reached

Page 31: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Scale

Page 32: Rabj freebase all

Aug 18, 2010 Freebase Meetup

2.3 MM questions

3.1 MM judgments

500+ queues

20+ applications

Page 33: Rabj freebase all

Aug 18, 2010 Freebase Meetup

1: Freebase & Human Computation

2: Example – Stanford Library

3: RABJ

4: Consensus

Page 34: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Always have leftovers

Page 35: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Perfect Consensus? Not!

Page 36: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Evaluating QAers

Page 37: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Explore

http://rabj.freebaseapps.com/explorer

Create

http://wiki.freebase.com/wiki/RABJ_Tutorial

Reference

http://wiki.freebase.com/wiki/RABJ_API/

Page 38: Rabj freebase all

Aug 18, 2010 Freebase Meetup

Questions?