yehuda’s dictionary crowdsourcing the digitization of ben- · crowdsourcing the digitization of...
TRANSCRIPT
![Page 1: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/1.jpg)
The Old New ThingCrowdsourcing the Digitization of Ben-
Yehuda’s Dictionary
![Page 2: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/2.jpg)
Don’t Panic
● Hebrew is written right-to-left● ~100 years ago, Eliezer Ben-Yehuda
researched and wrote the largest and (still!) most comprehensive dictionary of the Hebrew language
● It is now public domain
![Page 3: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/3.jpg)
So...
● Scan → Commons● Optical Character Recogition (OCR) →
Wikisource● Proofread on Wikisource● enrich Wiktionary● ...right?
![Page 4: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/4.jpg)
Well...
As they say on Facebook relationship statuses:
It’s complicated...
![Page 5: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/5.jpg)
![Page 6: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/6.jpg)
Hmm...
● Appealing?● Digestible?● Manageable?● Doable?
![Page 7: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/7.jpg)
(My) Solution: go granular
● Major design principle: each headword (definition) is a separate unit of work, with its own status and lifecycle
● To do that: manually (but easily) partition scans into individual definition parts.
● Definitions live their lives: get assigned, transcribed, proofread (3 times!), published, and linked to.
![Page 8: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/8.jpg)
click!
System automagically stitches together entries beginning in page X, continuing through page X+1, and ending in page X+2.
![Page 9: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/9.jpg)
Example of a single (short) entry● digestible, manageable, clearly delineated.● from the moment entries are separated from
each other, everything else happens at the level of the individual entry, not “page”.
![Page 10: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/10.jpg)
The Bulk of the Work
● Web-based transcription system to type, fix-up (type Greek or Arabic), and triple-proofread each individual entry
● Entries move through these stages, and are published as soon as done.
● The dictionary is thus already partially available, with gradually-filled “holes”.
![Page 11: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/11.jpg)
![Page 12: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/12.jpg)
A liiittle bit of markup... Looks familiar? ;)
![Page 13: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/13.jpg)
![Page 14: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/14.jpg)
Permalink to specific entry
1..n entries still to be completed
![Page 15: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/15.jpg)
Click!
![Page 16: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/16.jpg)
Magic!
![Page 17: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/17.jpg)
Biblical citations are automagically linked to the specific verse on Hebrew Wikisource! :)
![Page 18: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/18.jpg)
Numbers
● ~12000 pages in the dictionary● ~300 entries completed during limited beta,
by about 8 active volunteers● (top contributor typed ~450 entries and
proofread ~1150)● After public launch, expect to complete
dictionary within 2 years.
![Page 19: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/19.jpg)
What shall we do tomorrow?
● More links from citations to extant sources online (Wikisource, Project Ben-Yehuda, HebrewBooks.org)
● Export entries as [draft] Hebrew Wiktionary articles
● Digitize other right-to-left dictionaries (Hebrew, Arabic, Farsi, ...)
![Page 20: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/20.jpg)
Technical details
● Tools used: Ruby, Rails, jQuery, HTML5 Canvas, ImageMagick, TinyMCE
● Running on Debian Linux :)● Code available on GitHub: https://github.
com/abartov/ebydict● If you can read Hebrew (and Rashi script),
join us! http://ebydict.benyehuda.org
![Page 21: Yehuda’s Dictionary Crowdsourcing the Digitization of Ben- · Crowdsourcing the Digitization of Ben-Yehuda’s Dictionary. Don’t Panic Hebrew is written right-to-left ~100 years](https://reader034.vdocuments.net/reader034/viewer/2022052010/60201bcd06c2e95c8227eb98/html5/thumbnails/21.jpg)
Project Ben-Yehuda?
● A volunteer-created digital library of public domain Hebrew textual works. cf. [[Project Gutenberg]], [[Project Runeberg]] Hi, Lars! :)
● Founded 1999. (Yes, before Wikipedia...)● over 9000 works digitized, ~200 active
volunteers● Special retro-1980s Web design! :-p