Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages
Sawood Alam
Computer Science Department, Old Dominion University, Norfolk, Virginia - 23529
Fateh ud din B Mehmood
National University of Sciences and Technology, Islamabad, Pakistan
Michael L. Nelson
Computer Science Department, Old Dominion University, Norfolk, Virginia - 23529
OK Google, Define Dictionary
  a book or electronic resource that lists the words of a language (typically in alphabetical order) and gives their meaning, or gives the equivalent words in a different language, often also providing information about pronunciation, origin, and usage.
Dictionaries Are Different
  Read: random access
  Write: maintain sort order
  The most compact mode to preserve a language
Problem: English Dictionary
Johnson's English dictionary
Unicode Collation
  Ordered assembly of written information
  Unicode values != natural collation
  Arabic script: U+0600 to U+06FF
  Out of order alphabets in derived languages
  Common Locale Data Repository (CLDR)
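The mismatch between Unicode values and natural collation can be seen with the first few Urdu letters. The sketch below is illustrative only: the names URDU_LETTERS, RANK, codepoint_sort, and alphabet_sort are assumptions made for this example, and the hand-written rank table stands in for the tailored rules that a CLDR-backed collator would normally supply.

```python
# Illustrative only: the first six letters of the Urdu alphabet in their
# conventional order (alef, be, pe, te, tte, se), written as escapes.
URDU_LETTERS = ["\u0627", "\u0628", "\u067E", "\u062A", "\u0679", "\u062B"]

# Hypothetical rank table: letter -> position in the alphabet.
RANK = {letter: position for position, letter in enumerate(URDU_LETTERS)}


def codepoint_sort(words):
    """Sort by raw Unicode code point values (Python's default for str)."""
    return sorted(words)


def alphabet_sort(words):
    """Sort by each letter's position in the rank table above.

    Characters missing from the table are placed after all known letters.
    """
    def key(word):
        return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word]

    return sorted(words, key=key)


if __name__ == "__main__":
    # Print the code points in each order to compare them side by side.
    print([f"U+{ord(ch):04X}" for ch in codepoint_sort(URDU_LETTERS)])
    print([f"U+{ord(ch):04X}" for ch in alphabet_sort(URDU_LETTERS)])
```

Because pe (U+067E) and tte (U+0679) were added to the Arabic block after the base Arabic letters, code point sorting pushes them behind te (U+062A) and se (U+062B), which is the "out of order alphabets in derived languages" problem.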
Nested Ordering
  Root word sorting (Arabic)
    Morphological derivation
    Derived word simplification
  Radicals and strokes (Chinese)
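Root word sorting can be sketched as a two-level sort key: entries are grouped under their root first and ordered within that group second. The example below is a minimal sketch using transliterated Arabic words in Latin letters. The ENTRIES list and the function names are assumptions made for illustration, and the root of each word is given by hand, since deriving it automatically is the hard morphological part.

```python
# Hypothetical sample data: (root, derived word), transliterated to Latin.
ENTRIES = [
    ("ktb", "maktab"),    # office
    ("drs", "mudarris"),  # teacher
    ("ktb", "kitab"),     # book
    ("drs", "dars"),      # lesson
    ("ktb", "katib"),     # writer
    ("drs", "madrasa"),   # school
]


def flat_order(entries):
    """Order headwords as a plain alphabetical list, ignoring roots."""
    return sorted(word for _, word in entries)


def nested_order(entries):
    """Order headwords by root first, then by the word within each root."""
    return [word for _, word in sorted(entries)]


if __name__ == "__main__":
    print(flat_order(ENTRIES))
    print(nested_order(ENTRIES))
```

In the flat order, words from the two roots interleave (dars, katib, kitab, madrasa, maktab, mudarris). In the nested order, all derivations of d-r-s come before all derivations of k-t-b, so a reader has to know a word's root before knowing where to look.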
Dictionary Explorer
  Multilingual Multi-dictionary Lookup
  Searching and Exploring
  Annotation and digitization
  User Contribution and Feedback
  Open Source => GitHub:/urduweb/DictionaryExplorer
Dictionary Explorer: English
Dictionary Explorer: Urdu
Indexing Time
Dictionary                 Pages   Index   Mode               Time
English to Urdu            180     Sparse  Manual and Script  10 minutes
Monolingual Urdu           2,500   Sparse  Manual             2 hours
Monolingual Classic Urdu   3,200   Full*   Crowdsource**      60 days
* 75,000 words, phrases, proverbs, and idioms
** 13 contributors
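One way a sparse index could be used for lookup is sketched below. This is an assumption about how such an index might work, not a description of the Dictionary Explorer implementation. Only some headwords are mapped to page numbers, and a query is sent to the page of the nearest indexed headword at or before it, after which the reader browses forward. SPARSE_INDEX and nearest_page are hypothetical names, and the entries are invented English sample data.

```python
from bisect import bisect_right

# Hypothetical sparse index: (indexed headword, page number), kept sorted
# by headword. Only a handful of the dictionary's pages are indexed.
SPARSE_INDEX = [
    ("aardvark", 1),
    ("cat", 40),
    ("house", 95),
    ("moon", 130),
    ("table", 170),
]


def nearest_page(word, index=SPARSE_INDEX):
    """Return the page of the last indexed headword that is <= word.

    Words that sort before every indexed headword map to the first page.
    """
    keys = [headword for headword, _ in index]
    pos = bisect_right(keys, word)
    if pos == 0:
        return index[0][1]
    return index[pos - 1][1]


if __name__ == "__main__":
    for query in ["dog", "cat", "zebra"]:
        print(query, "-> start browsing at page", nearest_page(query))
```

For a complex script dictionary the plain string comparison used here would need to be replaced by a comparison on collation keys, since the raster pages follow the language's natural order rather than code point order.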
Conclusions and Future Work
  Identified issues
    Too many matches
    Lack of fielded searching
    Lack of OCR support
    No input method assistance
  Collation challenges
  Accessibility levels: Ordered Pages, Sparse, Full, and Location indexes, annotation, and digitization
  Implemented a multi-lingual multi-dictionary explorer
  Effort and prefix evaluation
  In future: elastic index and automatic region estimation
  GitHub:/urduweb/DictionaryExplorer