August 6th ISAAC 2008
Word Prediction in Hebrew
Preliminary and Surprising Results
Yael Netzer, Meni Adler, Michael Elhadad
Department of Computer Science
Ben Gurion University, Israel
Outline
• Objectives and example
• Methods of Word Prediction
• Hebrew Morphology
• Experiments and Results
• Conclusions?
Word Prediction - Objectives
• Ease word insertion in textual software
  – by guessing the next word
  – by giving a list of possible options for the next word
  – by completing a word given a prefix
• General idea: guess the next word given the previous ones
  [Input w1 w2] [guess w3]
Objectives
(Example)
• I s_____ : verb, adverb?
  – verb: sang? maybe. singularized? hopefully
• I saw a _____ : noun / adjective
• I saw a b____ : brown? big? bear? barometer?
• I saw a bird in the _____ : [semantics would help]
• I saw a bird in the z____ : obvious (?)
Word Prediction Example
Statistical Methods
• Statistical information
  – Unigrams: probability of isolated words
    • Independent of context, offer the most likely words as candidates
  – More complex language models (Markov Models)
    • Given w_1..w_n, determine the most likely candidate for w_{n+1}
  – Most common method in applications is the unigram (see references in [Garay-Vitoria and Abascal, 2004])
Word Prediction Methods
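To make the statistical approach concrete, here is a minimal sketch of an n-gram predictor with a selection menu and prefix completion. The toy corpus, the function names `train` and `predict`, and the simple "longest context first" backoff are illustrative assumptions, not the system described in these slides.

```python
from collections import Counter, defaultdict

def train(tokens, max_n=3):
    """Count which words follow every context of 0 .. max_n-1 words."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for k in range(min(max_n - 1, i) + 1):
            counts[tuple(tokens[i - k:i])][word] += 1
    return counts

def predict(counts, history, prefix="", menu_size=5, max_n=3):
    """Fill the menu from the longest matching context, then back off."""
    menu = []
    for k in range(min(max_n - 1, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        for word, _ in counts.get(context, Counter()).most_common():
            if word.startswith(prefix) and word not in menu:
                menu.append(word)
                if len(menu) == menu_size:
                    return menu
    return menu

# Toy corpus (an assumption for illustration only).
corpus = ("i saw a bird in the zoo . i saw a big bear in the zoo . "
          "i saw a bird in the sky .")
counts = train(corpus.split())

menu_b = predict(counts, ["i", "saw", "a"], prefix="b", menu_size=3)
menu_the = predict(counts, ["in", "the"], menu_size=1)
```

With an empty context (k = 0) the sketch falls back to plain unigram frequencies, which is the context-independent method the slide calls the most common in applications.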
Syntactic Methods
• Syntactic knowledge
  – Consider sequences of part of speech tags
    [Article] [Noun] predict [Verb]
  – Phrase structure
    [Noun Phrase] predict [Verb]
  – Syntactic knowledge can be statistical or based on hand-coded rules
Word Prediction Methods
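The tag-sequence idea ([Article] [Noun] predict [Verb]) can be sketched statistically as follows. The tiny tagged sample, the tag names, and the helper names are assumptions made for illustration; a real system would learn from a large tagged corpus.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged sample: (word, part-of-speech) pairs.
tagged = [("the", "Article"), ("bird", "Noun"), ("sang", "Verb"),
          ("a", "Article"), ("bear", "Noun"), ("slept", "Verb"),
          ("the", "Article"), ("big", "Adjective"), ("bear", "Noun"),
          ("sang", "Verb")]

def train_tag_model(tagged):
    """Learn which tag follows each pair of tags, and which words carry each tag."""
    next_tag = defaultdict(Counter)
    words_by_tag = defaultdict(Counter)
    tags = [tag for _, tag in tagged]
    for word, tag in tagged:
        words_by_tag[tag][word] += 1
    for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
        next_tag[(t1, t2)][t3] += 1
    return next_tag, words_by_tag

def predict_by_tags(next_tag, words_by_tag, prev_tags, menu_size=3):
    """Pick the most likely next tag, then offer frequent words with that tag."""
    following = next_tag.get(tuple(prev_tags[-2:]))
    if not following:
        return None, []
    best_tag = following.most_common(1)[0][0]
    words = [w for w, _ in words_by_tag[best_tag].most_common(menu_size)]
    return best_tag, words

next_tag, words_by_tag = train_tag_model(tagged)
result = predict_by_tags(next_tag, words_by_tag, ["Article", "Noun"])
```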
Semantic Methods
• Semantic knowledge
  – Assign semantic categories to words
  – Find a set of rules which constrain the possible candidates for the next word
    • [eat verb] predict [word of category food]
  – Not widely used in word prediction, mostly because it requires complex hand coding and is too inefficient for real-time operation
Word Prediction Methods
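A rule such as [eat verb] predict [word of category food] might look like the sketch below. The category table and the rule table are invented for illustration, and they show why this approach needs hand coding: every word and every rule has to be listed explicitly.

```python
# Hypothetical hand-coded semantic lexicon: word -> category.
CATEGORY = {"apple": "food", "bread": "food", "soup": "food",
            "book": "object", "hammer": "tool"}

# Hypothetical rules: trigger word -> category required of the next word.
RULES = {"eat": "food"}

def filter_by_semantics(previous_word, candidates):
    """Keep only candidates whose category satisfies the rule, if one applies."""
    required = RULES.get(previous_word)
    if required is None:
        return list(candidates)
    return [w for w in candidates if CATEGORY.get(w) == required]

after_eat = filter_by_semantics("eat", ["book", "bread", "hammer", "apple"])
after_see = filter_by_semantics("see", ["book", "bread"])
```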
Word Prediction Knowledge Sources
• Corpora: texts and frequencies
• Vocabularies (can be domain specific)
• Lexicons with syntactic and/or semantic knowledge
• User’s history
• Morphological analyzers
• Unknown words models
Word Prediction Methods
Evaluation of Word Prediction
• Keystroke savings
• Time savings
• Overall satisfaction
  – Cognitive overload (length of choice list vs. accuracy).
• A predictor is considered adequate if its hit ratio is high as the required number of selections decreases.
  Keystroke savings = 1 - (# of actual keystrokes / # of expected keystrokes)
Word Prediction Evaluation
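The keystroke-savings formula can be checked with a small simulation. The cost model here is an assumption: typing a word without prediction costs its letters plus one space, and picking a word from the menu costs one keystroke. The toy predictor and its vocabulary are also hypothetical.

```python
def keystroke_savings(actual, expected):
    """1 - (# of actual keystrokes / # of expected keystrokes)."""
    return 1 - actual / expected

def simulate(message_words, predict, menu_size=5):
    """Count keystrokes with prediction (actual) and without it (expected)."""
    actual = expected = 0
    history = []
    for word in message_words:
        expected += len(word) + 1          # every letter plus a trailing space
        for typed in range(len(word) + 1):
            if typed == len(word):
                actual += 1                # word fully typed: press space
                break
            if word in predict(history, word[:typed], menu_size):
                actual += 1                # one keystroke to select from the menu
                break
            actual += 1                    # type the next letter
        history.append(word)
    return actual, expected

# Hypothetical predictor: a fixed list ordered by frequency, filtered by prefix.
VOCAB = ["the", "bird", "zoo", "saw"]

def toy_predict(history, prefix, menu_size):
    return [w for w in VOCAB if w.startswith(prefix)][:menu_size]

actual, expected = simulate(["saw", "the", "bird"], toy_predict, menu_size=1)
savings = keystroke_savings(actual, expected)
```

The percentage of keystrokes needed, `100 * actual / expected`, is the complement of the savings figure.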
Work in non-English Languages
• Languages with rich morphology:
  – n-gram-based methods offer quite reasonable prediction [Trost et al. 2005] but can be improved with more sophisticated syntactic/semantic tools
• Suggestions for inflected languages (e.g. Basque)
  – Use two lexicons: stems and suffixes
  – Add syntactic information to dictionaries and grammatical rules to the system, offer stems and suffixes
  – Combine these two approaches: offer inflected nouns.
Hebrew Word Prediction
Motivation for Hebrew
• We need word prediction for Hebrew
  – No previously published research is known for Hebrew.
• We wanted to test our morphological analyzer in a useful application.
Hebrew
Initial Hypothesis
Word prediction in Hebrew will be complicated; morphological and syntactic knowledge will be needed.
Hebrew Ambiguity
• Unvocalized writing: most vowels are “dropped”
  inherent → inhrnt
• Affixation: prepositions and possessives are attached to nouns
  in her note → inhrnt
  in her net → inhrnt
• Rich Morphology
  – ‘inhrnt’ could be inflected into different forms according to sing/pl, masc/fem properties: inhrnti, inhrntit, inhrntiot
  – Other morphological properties may leave ‘inherent’ unmodified (construct/absolute forms for noun compounding).
Hebrew
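The English analogy can be reproduced with a few lines of code. The rule used here (drop every vowel except a word-initial one, then attach the words to each other) is an assumption chosen to match the slide's examples; it is not a model of actual Hebrew orthography.

```python
from collections import defaultdict

VOWELS = set("aeiou")

def unvocalize(word):
    """Toy rule: keep the first letter, drop every later vowel."""
    return word[:1] + "".join(c for c in word[1:] if c not in VOWELS)

def written_form(phrase):
    """Unvocalize each word and attach the words to each other."""
    return "".join(unvocalize(w) for w in phrase.split())

# Group readings by the single written form they collapse into.
readings = defaultdict(list)
for phrase in ["inherent", "in her note", "in her net"]:
    readings[written_form(phrase)].append(phrase)
```

All three phrases end up under the same key, so a reader (or a predictor) seeing only the written form has to choose among them from context.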
Ambiguity Level
• These variations create a high level of ambiguity:
  – English lexicon: inherent → inherent.adj
  – With Hebrew word formation rules: inhrnt →
      in.prep her.pro.fem.poss note.noun
      in.prep her.pro.fem net.noun
      inherent.adj.masc.absolute
      inherent.adj.masc.construct
• Parts of speech tagset:
  – Hebrew: Theoretically: ~300K, In practice: ~3.6K distinct forms
  – English: 45-195 tags
• Number of possible morphological analyses per word:
  – English: 1.4 (Average # words / sentence: 12)
  – Hebrew: 2.7 (Average # words / sentence: 18)
Hebrew
(Real Hebrew) Morphological Ambiguity
• bzlm בצלם
  – bzelem (name of an association)
  – b-zalem (while taking a picture)
  – bzalam (their onion)
  – b-zila-m (under their shades)
  – b-zalam (in a photographer)
  – ba-zalam (in the photographer)
  – b-zelem (in an idol)
  – ba-zelem (in the idol)
Hebrew Morphology
Morphological Analysis
Given a written form, recover the following information:
• Lexical category (part-of-speech)
  – noun, verb, adjective, adverb, preposition…
• Inflectional properties
  – gender, number, person, tense, status…
• Affixes
  – Prefixes: מ ש ה ו כ ל ב (prepositions, conjunctions, definiteness)
  – Pronoun suffix: accusative, possessive, nominative
Hebrew Morphology
Morphological Analysis
Example: given the form בצלם, propose the following analyses:
• proper-noun: בצלם
• verb, infinitive: בצלם
• noun, singular, masculine: בצל-ם
• noun, singular, masculine: ב-צל-ם
• noun, singular, masculine, absolute: ב-צלם
• noun, singular, masculine, construct: ב-צלם
• noun, definite, singular, masculine: ב-צלם
Hebrew Morphology
Morphological Disambiguation
A difficult task in Hebrew:
Given a written form, select in context the correct morphological analysis out of all possible analyses.
We have developed a successful* system to perform morphological disambiguation in Hebrew [Adler et al, ACL06, ACL07, ACL08].
* 93% for POS tagging and 90% for full morphological analysis (which was used in this test)
Hebrew Morphology
Word Prediction in Hebrew
• We looked at Word Prediction as a sample task to show off the quality of our Morphological Disambiguator
• But first… we checked a simple baseline
Hebrew Word Prediction
Baseline: n-gram methods
• Check n-gram methods (unigram, bigram, trigram)
• Four sizes of selection menus: 1, 5, 7 and 9
• Various training sets of 1M, 10M and 27M words to learn the probabilities of n-grams.
• Various genres.
Hebrew Word Prediction
Prediction results using n-grams only
Hebrew Word Prediction
Keystrokes needed to enter a message, in % (smaller is better)
For the trigram model trained on the 27M-word corpus: very good results!
Adding Syntactic Information
P(w_n | w_1,…,w_{n-1}) = λ1·P(w_{n-i},…,w_n | LM) + λ2·P(w_1,…,w_n | μ)
  – μ is the morpho-syntactic HMM (morphological disambiguator)
  – Combine P(w_1,…,w_n | μ) with the probabilistic language model LM in order to rank each word candidate given the previously typed words.
  – If the user typed "I saw", and the next word candidates are {him, hammer}, we use the HMM model to calculate p(I saw him | μ) and p(I saw hammer | μ), in order to tune the probability given by the n-gram.
* Trained on a 1M-word corpus.
Hebrew Word Prediction
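A minimal sketch of this linear combination, using the slide's {him, hammer} example. All probabilities and the weights λ1 = λ2 = 0.5 are made-up values for illustration; the slides do not report the weights actually used, and the two scoring functions are stand-ins for a real n-gram model and a real HMM.

```python
def combined_score(p_ngram, p_hmm, lambda1=0.5, lambda2=0.5):
    """lambda1 * P(n-gram | LM) + lambda2 * P(w_1..w_n | mu)."""
    return lambda1 * p_ngram + lambda2 * p_hmm

def rerank(candidates, ngram_prob, hmm_prob, history, lambda1=0.5, lambda2=0.5):
    """Order candidates by the interpolated score, best first."""
    def score(word):
        return combined_score(ngram_prob(history, word),
                              hmm_prob(history + [word]),
                              lambda1, lambda2)
    return sorted(candidates, key=score, reverse=True)

# Hypothetical scores standing in for the two models.
NGRAM = {"him": 0.30, "hammer": 0.35}
HMM = {("I", "saw", "him"): 0.20, ("I", "saw", "hammer"): 0.01}

def toy_ngram_prob(history, word):
    return NGRAM[word]

def toy_hmm_prob(sentence):
    return HMM[tuple(sentence)]

with_hmm = rerank(["hammer", "him"], toy_ngram_prob, toy_hmm_prob, ["I", "saw"])
ngram_only = rerank(["hammer", "him"], toy_ngram_prob, toy_hmm_prob, ["I", "saw"],
                    lambda1=1.0, lambda2=0.0)
```

With these invented numbers the HMM term moves "him" ahead of "hammer"; setting λ2 to 0 recovers the pure n-gram ranking.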
Results with morpho-syntactic knowledge
Hebrew Word Prediction
Model sequences of parts of speech with morphological features
Results w/o syntactic knowledge
Some Notes on Results
• n-grams perform very well (high level of keystroke saving)
• High rate for all genres
• And the expected:
  – Better prediction when trained on more data
  – Better prediction with tri-grams
  – Better prediction with larger window
• Morpho-syntactic information did not improve results (in fact, it hurt!)
Results
Conclusion
• Statistical data on a language with rich morphology yields good results (keystrokes needed, smaller is better)
  – down to 29% with nine word proposals
  – 34% for seven proposals
  – 54% for a single proposal
• Syntactic information did not improve the prediction.
• Explanation: morphology did not improve results because p(w_1,…,w_n | μ) is computed on an unfinished sentence
Hebrew Word Prediction - Conclusions
תודה
Thank you
Technical Information
• CMU – N-grams
• Storage – Berkeley DB to store knowledge for WP: Mapping n-grams
• More questions on technology – [email protected]
Hebrew Word Prediction
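As a rough illustration of keeping an n-gram mapping in an on-disk key-value store, the sketch below uses Python's standard `dbm` module as a stand-in for Berkeley DB. The key layout (context words joined by spaces) and the JSON-encoded candidate lists are assumptions, not the schema used by the authors.

```python
import dbm
import json
import os
import tempfile

def store_ngrams(path, table):
    """Write context -> ranked candidate list into a key-value database."""
    with dbm.open(path, "c") as db:
        for context, candidates in table.items():
            key = " ".join(context).encode("utf-8")
            db[key] = json.dumps(candidates).encode("utf-8")

def lookup(path, context):
    """Return the stored candidates for a context, or an empty list."""
    with dbm.open(path, "r") as db:
        raw = db.get(" ".join(context).encode("utf-8"))
    return json.loads(raw) if raw is not None else []

# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ngrams")
    store_ngrams(path, {("i", "saw"): ["a", "him"],
                        ("in", "the"): ["zoo", "sky"]})
    found = lookup(path, ("in", "the"))
    missing = lookup(path, ("no", "match"))
```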