DEREK HANSEN, JAKE GEHRING, PATRICK SCHONE, AND MATTHEW REID FAMILY HISTORY TECHNOLOGY WORKSHOP FEBRUARY 3, 2012 IMPROVING INDEXING EFFICIENCY & QUALITY: COMPARING A-B-ARBITRATE AND PEER REVIEW

Page 1:

DEREK HANSEN, JAKE GEHRING, PATRICK SCHONE, AND MATTHEW REID

FAMILY HISTORY TECHNOLOGY WORKSHOP, FEBRUARY 3, 2012

IMPROVING INDEXING EFFICIENCY & QUALITY: COMPARING A-B-ARBITRATE AND PEER REVIEW

Page 2:

FAMILYSEARCH

Page 3:

FAMILYSEARCH INDEXING

Page 4:

A-B-ARBITRATE PROCESS (A-B-ARB)

[Diagram: indexer A and indexer B, followed by arbitrator ARB]
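
The A-B-ARB flow named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FamilySearch's implementation: the `Record` type, the `normalize` rule (case and whitespace are ignored), and the `arbitrate` callback are all hypothetical names introduced here.

```python
from typing import Callable, Dict

Record = Dict[str, str]  # hypothetical shape: field name -> transcribed value


def normalize(value: str) -> str:
    """Collapse whitespace and ignore case (an assumed agreement rule)."""
    return " ".join(value.split()).casefold()


def a_b_arbitrate(a: Record, b: Record,
                  arbitrate: Callable[[str, str, str], str]) -> Record:
    """Keep values where A and B agree; send each disagreement to the arbitrator."""
    final: Record = {}
    for field, a_value in a.items():
        b_value = b.get(field, "")
        if normalize(a_value) == normalize(b_value):
            final[field] = a_value
        else:
            final[field] = arbitrate(field, a_value, b_value)
    return final


# Toy usage with made-up values: this stand-in arbitrator always sides with B.
a = {"given_name": "John", "surname": "Smith"}
b = {"given_name": "john ", "surname": "Smyth"}
result = a_b_arbitrate(a, b, lambda field, a_value, b_value: b_value)
```

In the sketch, only the surname reaches the arbitrator, because the two given-name entries differ only in case and trailing whitespace.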

Page 5:

THE PROBLEM

Page 6:

OUR APPROACH
• Historical Data Analysis
• Field Experiment comparing quality control models

Page 7:

HISTORICAL DATA ANALYSIS
• Quality (estimated based on A-B agreement)
  • Measures difficulty more than actual quality
  • Underestimates quality, since an experienced Arbitrator reviews all A-B disagreements
  • Good at capturing differences across people, fields, and projects
• Time (calculated using keystroke-logging data)
  • Idle time is tracked separately, making actual time measurements more accurate
  • Outliers removed
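
The two measures above can be illustrated with a small Python sketch. It rests on assumptions the slides do not state: agreement is treated as a case-insensitive match per field, and idle time is approximated by dropping any gap between keystrokes longer than a hypothetical `idle_threshold`. The function names and the input shapes are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def agreement_by_field(pairs: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Percent of (field, a_value, b_value) triples where A and B match, per field."""
    agree: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for field, a_value, b_value in pairs:
        total[field] += 1
        if a_value.strip().casefold() == b_value.strip().casefold():
            agree[field] += 1
    return {field: 100.0 * agree[field] / total[field] for field in total}


def active_seconds(timestamps: List[float], idle_threshold: float = 60.0) -> float:
    """Sum gaps between consecutive keystrokes, skipping gaps above the threshold."""
    ordered = sorted(timestamps)
    gaps = (later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return sum(gap for gap in gaps if gap <= idle_threshold)


# Made-up data for illustration only.
rates = agreement_by_field([
    ("surname", "Smith", "smith"),
    ("surname", "Smith", "Smyth"),
    ("gender", "M", "M"),
])
worked = active_seconds([0.0, 2.0, 5.0, 200.0, 203.0])
```

Here the 195-second pause between the third and fourth keystrokes is treated as idle time and excluded from the total.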

Page 8:

A-B AGREEMENT BY FIELD

Page 9:

A-B AGREEMENT BY LANGUAGE

1871 Canadian Census

English Language
• Given Name: 79.8%
• Surname: 66.4%

French Language
• Given Name: 62.7%
• Surname: 48.8%

Page 10:

A-B AGREEMENT BY EXPERIENCE

Birth Place: All U.S. Censuses

[Figure: A-B agreement by experience of indexer A (novice ↔ expert) and indexer B (novice ↔ expert)]

Page 11:

A-B AGREEMENT BY EXPERIENCE

Given Name: All U.S. Censuses

[Figure: A-B agreement by experience of indexer A (novice ↔ expert) and indexer B (novice ↔ expert)]

Page 12:

A-B AGREEMENT BY EXPERIENCE

Surname: All U.S. Censuses

[Figure: A-B agreement by experience of indexer A (novice ↔ expert) and indexer B (novice ↔ expert)]

Page 13:

A-B AGREEMENT BY EXPERIENCE

Gender: All U.S. Censuses

[Figure: A-B agreement by experience of indexer A (novice ↔ expert) and indexer B (novice ↔ expert)]

Page 14:

A-B AGREEMENT BY EXPERIENCE

[Figure panels: U.S. - English; Canada - English; Canada - French; Mexico - Spanish]

Page 15:

TIME & KEYSTROKE BY EXPERIENCE

Page 16:

TIME & KEYSTROKE OF ARB

Page 17:

A NEW APPROACH? (A-R-ARB)

• Peer review model
• Efficiency ++
• Quality ?

Page 18:

PEER REVIEW PROCESS (A-R-ARB)

[Diagram: A, then R, then ARB, with the annotations "Already Filled In" and "Optional?"]
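
One possible reading of the A-R-ARB flow is sketched below in Python. The interpretation is an assumption, not something the slide confirms: the reviewer R is taken to see A's values already filled in and to change only what looks wrong, and arbitration is taken to be the optional step. The `review` and `arbitrate` callbacks and the `Record` type are hypothetical.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]  # hypothetical shape: field name -> transcribed value


def peer_review(a: Record,
                review: Callable[[str, str], str],
                arbitrate: Optional[Callable[[str, str, str], str]] = None) -> Record:
    """R starts from A's values; changed fields go to ARB only if one is supplied."""
    final: Record = {}
    for field, a_value in a.items():
        r_value = review(field, a_value)  # R sees A's value pre-filled
        if r_value == a_value or arbitrate is None:
            final[field] = r_value
        else:
            final[field] = arbitrate(field, a_value, r_value)
    return final


# Toy usage with made-up values: R corrects the surname and leaves the age alone.
a = {"surname": "Smith", "age": "34"}


def reviewer(field: str, value: str) -> str:
    return "Smyth" if field == "surname" else value


without_arb = peer_review(a, reviewer)
with_arb = peer_review(a, reviewer, lambda field, a_value, r_value: a_value)
```

Under this reading R only touches fields that look wrong, which would be one source of the efficiency gain the previous slide anticipates; whether quality holds up is the open question.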

Page 19:

FIELD EXPERIMENT

• Develop Truth Set of 2,000 1930 Census images
• Use historical A-B-ARB data
• Create new A-R-ARB dataset by having new indexers review and arbitrate
• Compare quality & efficiency
• Qualitatively identify types of errors
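
The quality comparison in this plan could be scored as field-level accuracy against the truth set. The Python sketch below assumes a hypothetical data layout (image ID mapped to a dict of field values) and a case-insensitive match rule; the sample records are made up and are not results from the experiment.

```python
from typing import Dict

Records = Dict[str, Dict[str, str]]  # hypothetical: image id -> {field: value}


def accuracy(final_records: Records, truth_records: Records) -> float:
    """Fraction of truth-set fields that the final records reproduce."""
    correct = 0
    total = 0
    for image_id, truth in truth_records.items():
        final = final_records.get(image_id, {})
        for field, true_value in truth.items():
            total += 1
            if final.get(field, "").strip().casefold() == true_value.strip().casefold():
                correct += 1
    return correct / total if total else 0.0


# Made-up records for illustration only.
truth = {"img1": {"surname": "Smith", "gender": "M"}}
from_a_b_arb = {"img1": {"surname": "Smith", "gender": "M"}}
from_a_r_arb = {"img1": {"surname": "Smyth", "gender": "M"}}

score_a_b_arb = accuracy(from_a_b_arb, truth)
score_a_r_arb = accuracy(from_a_r_arb, truth)
```

Running the same function over both datasets keeps the comparison symmetric: each model is judged on the same images and fields.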

Page 20:

DISCUSSION

IMPLICATIONS
• Transition users from novice to expert
• Recruit foreign language indexers
• Intelligent matching based on expertise (in A-B-ARB &/or A-R-ARB)

FUTURE POSSIBILITIES
• Peer review by algorithms?
• Initial indexing by algorithms?