Can Bilateral Digitization Tear Down the Wall Between
Institutions and the Public?
Ben Brumfield
Digital Frontiers 2012
“You know Ben that it really stinks that I can't get access to the original. My grandfather Jeremiah wrote the diary so that I could read about his daily life happenings. My grandfather Edward used to own it and if he had known that I would be so interested in it I'm sure he would have kept it and given it to me instead of the university.”
Alan Williams, 2009 email
Walls
• Professionally conserved
• Publicly accessible
• Catalogued
• 1000 miles away
• Reading room restrictions
• “Permission-to-publish” agreements
• Costly scanning fees
Penetrating the Walls
• Digitization
• Collaboration
Shallow Digitization (Institutional Version)
• “Scan-and-dump” facsimiles
  – Limited metadata
  – No transcripts
  – Not crawlable
Shallow Digitization (Amateur Version)
• Full transcripts
  – No facsimiles
  – No provenance
  – No metadata on sources
  – Invisible editorial decisions
• Cut-and-paste replication
  – No attribution
Deep Digitization
• Institutional Challenges
  – Funding
  – Manpower
• Non-institutional Challenges
  – Standards
  – Access to sources
Crowdsourcing
• Who are the volunteers?
• What can they do?
• OldWeather.org
• Zenas Matthews
• Harry Ransom Center Fragments
Accuracy
• Individual transcriptions are about 97% accurate
• Of 1000 transcribed logbook entries:
  – 3 will be lost because of transcription errors
  – 10 will be illegible
  – At least 3 will be errors in the logs
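The slide does not spell out how a 97% individual accuracy rate becomes 3 lost entries per 1000. One plausible reading, stated here as an assumption and not as the project's documented method, is that each entry is transcribed independently by three volunteers and survives when at least two of them get it right. A minimal sketch of that arithmetic (the function name and the three-transcriber default are illustrative):

```python
from math import comb

def p_majority_correct(p_single: float, n_transcribers: int = 3) -> float:
    """Probability that a strict majority of independent transcribers is correct.

    Assumption (not from the slides): transcribers err independently and an
    entry is kept only when more than half of them agree on the right value.
    """
    needed = n_transcribers // 2 + 1
    return sum(
        comb(n_transcribers, k)
        * p_single ** k
        * (1 - p_single) ** (n_transcribers - k)
        for k in range(needed, n_transcribers + 1)
    )

# Expected number of entries lost per 1000 at 97% individual accuracy.
lost_per_1000 = 1000 * (1 - p_majority_correct(0.97))
print(f"expected entries lost per 1000: {lost_per_1000:.1f}")
```

Under those assumptions the expected loss is about 2.6 entries per 1000, which is consistent with the slide's figure of 3.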
OldWeather Participation
• More than 1.6 million weather observations.
• 16,000 volunteers.
• 1 million log pages transcribed.
• Mean contribution of 100 transcriptions per user – but this statistic is worthless!
Power-law Distribution
• Most contributions are made by a core of well-informed enthusiasts.
• True regardless of project size.
• What are the implications?
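To see why the mean misleads under a power-law distribution, here is a small sketch. It assumes a Zipf-like split (the volunteer at rank r contributes in proportion to 1/r), which is an illustrative choice and not a fit to OldWeather's actual data; only the totals of 16,000 volunteers and 1.6 million observations come from the slides.

```python
import statistics

def zipf_contributions(n_users: int, total: int, exponent: float = 1.0) -> list[int]:
    """Split `total` contributions across users with weights 1 / rank**exponent.

    Hypothetical model: the exponent and the Zipf shape are assumptions
    chosen for illustration only.
    """
    weights = [1 / rank ** exponent for rank in range(1, n_users + 1)]
    scale = total / sum(weights)
    return [round(w * scale) for w in weights]

def top_share(contributions: list[int], fraction: float = 0.1) -> float:
    """Fraction of all contributions made by the most active `fraction` of users."""
    ordered = sorted(contributions, reverse=True)
    top_n = max(1, int(len(ordered) * fraction))
    return sum(ordered[:top_n]) / sum(ordered)

counts = zipf_contributions(16_000, 1_600_000)
print("mean per volunteer:  ", round(statistics.mean(counts)))
print("median per volunteer:", statistics.median(counts))
print("share from top 10%:  ", round(top_share(counts), 2))
```

In this toy model the mean stays near 100 while the median volunteer contributes far less and the top tenth of volunteers account for most of the work, which is the sense in which the average says little about a typical participant.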
One “Well-Informed Enthusiast”
• In 14 days,
  – Entire diary transcribed
  – 250 revisions to 43 pages
  – Two dozen footnotes
Crowdsourcing’s Virtuous Circle
• Volunteers
• Deep digitization
• Findability
• More Volunteers!
One Volunteer’s Story
• Nat Wooding
  – Retired data analyst
  – 100 pages of Julia Brumfield’s diaries transcribed and indexed in six months
  – No relation to diarist
  – Great-uncle was diarist’s letter carrier, also named Nat Wooding
Non-institutional Digitization
The Invisible Archive
• Private collections
• Family archivists (filing cabinets)
  – or their heirs (boxes in the attic)
• Non-notable subjects
• Flickr
The Standards Problem
• “We can't overemphasize the potential futility of citing websites, any websites, but especially non-institutional websites.”
  – Diggitt McLaughlin (H-SHEAR, 2011-04-27)
The Standards Problem
• “Needless to say, amateurs will continue to put out poorly edited versions of documents in print which we, as professionals, will continue to eschew using.”
  – Christopher L. Miller (H-OIEAHC list, 1996-05-07)
Solutions
• Collaboration
• Participation by professionals in amateur projects
• FreeREG/FreeCEN
Solutions
• Community
• Flickr
• RootsTech
Solutions
• Software Platforms
• Suggested rigor
• Graceful degradation
Thanks!
Ben Brumfield
[email protected]
http://fromthepage.com/
Slides and transcript to be posted at http://manuscripttranscription.blogspot.com/