Improving Family Search Indexing Efficiency and Quality
DESCRIPTION
RootsTech workshop presentation
TRANSCRIPT
![Page 1: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/1.jpg)
DEREK HANSEN, JAKE GEHRING, PATRICK SCHONE, AND MATTHEW REID
FAMILY HISTORY TECHNOLOGY WORKSHOP
FEBRUARY 3, 2012
IMPROVING INDEXING EFFICIENCY & QUALITY: COMPARING A-B-ARBITRATE AND PEER REVIEW
![Page 2: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/2.jpg)
FAMILYSEARCH
![Page 3: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/3.jpg)
FAMILYSEARCH INDEXING
![Page 4: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/4.jpg)
A-B-ARBITRATE PROCESS (A-B-ARB)
[Diagram labels: A, B, ARB]
![Page 5: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/5.jpg)
THE PROBLEM
[Chart: axes labeled "Time" and "Amount"; label: "Scanned Documents"]
![Page 6: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/6.jpg)
OUR APPROACH
• Historical Data Analysis
• Field Experiment comparing quality control models
![Page 7: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/7.jpg)
HISTORICAL DATA ANALYSIS
• Quality (estimated based on A-B agreement)
  • Measures difficulty more than actual quality
  • Underestimates quality, since an experienced Arbitrator reviews all A-B disagreements
  • Good at capturing differences across people, fields, and projects
• Time (calculated using keystroke-logging data)
  • Idle time is tracked separately, making actual time measurements more accurate
  • Outliers removed
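The two measures on this slide could be computed roughly as follows. This is only a sketch: the record layout, the case-insensitive comparison, and the interquartile-range rule for dropping outliers are all assumptions made for illustration, since the slides do not say how values were normalized or which outlier rule was used.

```python
import statistics
from collections import defaultdict


def ab_agreement_by_field(records):
    """Return {field: percent of records where A and B keyed the same value}.

    `records` is an iterable of (field, a_value, b_value) tuples. This layout
    is hypothetical and chosen only for illustration.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for field, a_value, b_value in records:
        totals[field] += 1
        # Assumed normalization: ignore case and surrounding whitespace.
        if a_value.strip().lower() == b_value.strip().lower():
            matches[field] += 1
    return {field: 100.0 * matches[field] / totals[field] for field in totals}


def active_seconds(total_seconds, idle_seconds):
    """Time actually spent keying, with separately tracked idle time removed."""
    return total_seconds - idle_seconds


def drop_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


# Toy data, invented for illustration only.
sample = [
    ("surname", "Hansen", "Hanson"),
    ("surname", "Reid", "reid "),
    ("gender", "M", "M"),
    ("gender", "F", "F"),
]
agreement = ab_agreement_by_field(sample)
cleaned_times = drop_outliers([10, 12, 11, 13, 12, 300])
```

Because an Arbitrator resolves every A-B disagreement, a field's agreement percentage computed this way is a lower bound on final quality rather than a direct measurement of it.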
![Page 8: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/8.jpg)
A-B AGREEMENT BY FIELD
![Page 9: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/9.jpg)
A-B AGREEMENT BY LANGUAGE
English Language
• Given Name: 79.8%
• Surname: 66.4%
French Language
• Given Name: 62.7%
• Surname: 48.8%
1871 Canadian Census
![Page 10: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/10.jpg)
A-B AGREEMENT BY EXPERIENCE
Birth Place: All U.S. Censuses
B (novice ↔ expert)
A (novice ↔ expert)
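A grid of agreement rates by the experience of indexer A and indexer B, like the one charted here, might be assembled as in the sketch below. The experience measure (number of records previously indexed) and the bucket edges are hypothetical; the slides do not state how experience was defined or binned.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bucket edges, in records previously indexed (an assumption).
EDGES = [100, 1000, 10000]


def bucket(experience):
    """Map an experience count to a bucket index: 0 (novice) .. 3 (expert)."""
    return bisect_right(EDGES, experience)


def agreement_grid(pairs):
    """Return {(a_bucket, b_bucket): percent agreement} for each observed cell.

    `pairs` is an iterable of (a_experience, b_experience, agreed) tuples,
    where `agreed` is True when A and B keyed the same value.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for a_exp, b_exp, agreed in pairs:
        cell = (bucket(a_exp), bucket(b_exp))
        totals[cell] += 1
        hits[cell] += int(agreed)
    return {cell: 100.0 * hits[cell] / totals[cell] for cell in totals}


# Toy data, invented for illustration only.
pairs = [(50, 20000, True), (50, 20000, False), (5000, 5000, True)]
grid = agreement_grid(pairs)
```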
![Page 11: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/11.jpg)
A-B AGREEMENT BY EXPERIENCE
Given Name: All U.S. Censuses
B (novice ↔ expert)
A (novice ↔ expert)
![Page 12: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/12.jpg)
A-B AGREEMENT BY EXPERIENCE
Surname: All U.S. Censuses
B (novice ↔ expert)
A (novice ↔ expert)
![Page 13: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/13.jpg)
A-B AGREEMENT BY EXPERIENCE
Gender: All U.S. Censuses
B (novice ↔ expert)
A (novice ↔ expert)
![Page 14: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/14.jpg)
A-B AGREEMENT BY EXPERIENCE
[Chart panels: U.S. - English; Canada - English; Canada - French; Mexico - Spanish]
![Page 15: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/15.jpg)
TIME & KEYSTROKE BY EXPERIENCE
![Page 16: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/16.jpg)
TIME & KEYSTROKE OF ARB
![Page 17: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/17.jpg)
A NEW APPROACH? (A-R-ARB)
• Peer review model
• Efficiency ++
• Quality ?
![Page 18: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/18.jpg)
PEER REVIEW PROCESS (A-R-ARB)
[Diagram labels: A, R, ARB; annotations: "Already Filled In", "Optional?"]
![Page 19: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/19.jpg)
FIELD EXPERIMENT
• Develop Truth Set of 2,000 1930 Census images
• Use historical A-B-ARB data
• Create new A-R-ARB dataset by having new indexers review and arbitrate
• Compare quality & efficiency
• Qualitatively identify types of errors
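The quality comparison in this experiment could be scored against the truth set along the following lines. This is a sketch under stated assumptions: the dictionaries keyed by (image id, field) are a hypothetical layout, and the toy values are invented for illustration and are not results from the study.

```python
def accuracy(truth, output):
    """Percent of truth-set fields whose final value matches the truth."""
    correct = sum(1 for key, value in truth.items() if output.get(key) == value)
    return 100.0 * correct / len(truth)


def errors(truth, output):
    """List (key, expected, actual) for each mismatch, for qualitative review."""
    return [
        (key, value, output.get(key))
        for key, value in truth.items()
        if output.get(key) != value
    ]


# Toy data keyed by (image_id, field); hypothetical layout and values.
truth = {
    ("img1", "surname"): "Hansen",
    ("img1", "gender"): "M",
    ("img2", "surname"): "Reid",
    ("img2", "gender"): "F",
}
a_b_arb = dict(truth)
a_b_arb[("img1", "surname")] = "Hanson"  # one invented error
a_r_arb = dict(truth)
a_r_arb[("img2", "surname")] = "Reed"  # one invented error

scores = {
    "A-B-ARB": accuracy(truth, a_b_arb),
    "A-R-ARB": accuracy(truth, a_r_arb),
}
a_r_arb_errors = errors(truth, a_r_arb)
```

Keeping the mismatches as a list, rather than only a percentage, supports the last bullet: the errors from each process can be read and grouped by type.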
![Page 20: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/20.jpg)
DISCUSSION
IMPLICATIONS
• Transition users from novice to expert
• Recruit foreign language indexers
• Intelligent matching based on expertise (in A-B-ARB &/or A-R-ARB)
FUTURE POSSIBILITIES
• Peer review by algorithms?
• Initial indexing by algorithms?
![Page 21: Improving Family Search Indexing Efficiency and Quality](https://reader034.vdocuments.net/reader034/viewer/2022051610/5487387db4af9f640d8b5321/html5/thumbnails/21.jpg)
QUESTIONS
• Derek Hansen ([email protected])
• Jake Gehring ([email protected])
• Patrick Schone ([email protected])
• Matthew Reid ([email protected])