Interview presentation
![Page 1: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/1.jpg)
Dependency Language Models
Joseph Gubbins
![Page 2: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/2.jpg)
Language Models
Assign a probability to a sentence or phrase
$P(w_1 w_2 \ldots w_n)$
![Page 3: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/3.jpg)
Language Models
Are used in:
Machine translation
![Page 4: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/4.jpg)
Language Models
Are used in:
Speech recognition
![Page 5: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/5.jpg)
Language Models
Are used in:
- Information Retrieval
- Predictive text entry
- Handwriting recognition
![Page 6: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/6.jpg)
N-gram Language Models
Chain rule decomposition:

$$P(w_1 w_2 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1})$$

Assumption (Markov property):

$$P(w_i \mid w_1 \ldots w_{i-1}) \approx P(w_i \mid w_{i-N+1} \ldots w_{i-1})$$
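For illustration (an added example, not from the slides): with N = 2 the model is a bigram model, and each word is conditioned only on its immediate predecessor:

$$P(w_1 w_2 w_3) \approx P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_2)$$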
![Page 7: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/7.jpg)
N-gram Language Models
Estimate probabilities from a corpus with the maximum likelihood estimator:

$$\tilde{P}(w_i \mid w_{i-N+1} \ldots w_{i-1}) = \frac{\mathrm{Count}(w_{i-N+1} \ldots w_{i-1}\, w_i)}{\mathrm{Count}(w_{i-N+1} \ldots w_{i-1})}$$

Problem: unobserved N-grams receive a probability estimate of zero.
Solution: use smoothing techniques.
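A minimal sketch of this estimator in Python, assuming the corpus is an iterable of token lists (the function names and padding symbols are illustrative, not from the talk):

```python
from collections import Counter

def train_ngram_counts(sentences, N):
    """Count N-grams and their (N-1)-word contexts in a corpus."""
    ngrams, contexts = Counter(), Counter()
    for tokens in sentences:
        # Pad so the first real word also has a full-length context.
        padded = ["<s>"] * (N - 1) + tokens + ["</s>"]
        for i in range(N - 1, len(padded)):
            context = tuple(padded[i - N + 1:i])
            ngrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts

def mle_prob(ngrams, contexts, context, word):
    """MLE probability; zero for unseen N-grams, hence the need for smoothing."""
    seen = contexts[tuple(context)]
    return ngrams[tuple(context) + (word,)] / seen if seen else 0.0
```

For example, `train_ngram_counts([["the", "cat", "sat"]], 2)` counts the bigrams needed to score that sentence.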
![Page 8: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/8.jpg)
N-gram Language Models
Weak point of N-gram language models:
Long-range syntactic dependencies are ignored
![Page 9: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/9.jpg)
Sentence Completion Problems
Choose the most probable sentence from a list of candidates
Used in standardised tests such as SAT and GRE
![Page 10: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/10.jpg)
Sentence Completion Problems
When his body had been carried from the cellar, we found ourselves with a problem which was almost as ____ as that with which we had started.
- tall
- loud
- invisible
- quick
- formidable
Source: Microsoft Sentence Completion Challenge
![Page 11: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/11.jpg)
Sentence Completion Problems
When his body had been carried from the cellar, we found ourselves with a problem which was almost as ____ as that with which we had started.
- tall
- loud
- invisible
- quick
- formidable (correct answer)
Source: Microsoft Sentence Completion Challenge
![Page 12: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/12.jpg)
Sentence Completion with 5-grams
5-gram probability: a context of at most 4 words before and after the blank.
When his body had been carried from the cellar, we found ourselves with a problem which was almost as ____ as that with which we had started.
-> The relationship between problem and formidable is missed.
![Page 13: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/13.jpg)
Dependency Grammar
Syntactic analysis of sentence
Each word “depends” on another word
For example:
- Subject and object depend on the verb
- Adjectives depend on the word they describe
![Page 14: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/14.jpg)
Dependency Grammar
Dependency relations form a tree structure
For example, for the sentence:
When his body had been carried from the cellar, we found ourselves with a problem which was almost as formidable as that with which we had started.
![Page 15: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/15.jpg)
Dependency Grammar
![Page 16: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/16.jpg)
Dependency Grammar
On the dependency tree, problem and formidable are adjacent.
-> Idea: create a dependency language model.
![Page 17: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/17.jpg)
Dependency Language Model
Model for the “lexicalisation” of a given dependency tree.
Takes inspiration from N-gram language models.
![Page 18: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/18.jpg)
Dependency Language Model
We denote the ancestor sequence of a word $w$ by $A(w)$: the words on the path from $w$'s parent up to the root of the tree. For example, in the tree above, $A(\text{formidable})$ contains problem.
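A minimal Python sketch of how such a tree and its ancestor sequences might be represented (the Node class and names are illustrative assumptions, not code from the talk):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One word in a dependency tree (illustrative representation)."""
    word: str
    label: str = ""                  # grammatical relation to the parent
    parent: "Node" = None
    children: list = field(default_factory=list)

def ancestor_sequence(node):
    """Return A(w): the words on the path from the node's parent to the root."""
    seq, p = [], node.parent
    while p is not None:
        seq.append(p.word)
        p = p.parent
    return seq
```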
![Page 19: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/19.jpg)
Dependency Language Model
We assume:
1. each word is conditionally independent of the words outside its ancestor sequence, given the ancestor sequence;
2. the words are independent of the grammatical labels.
![Page 20: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/20.jpg)
Dependency Language Model
Let $w_1, \ldots, w_n$ be a breadth-first enumeration of the words in the dependency tree.
Under our assumptions, using the chain rule, we have

$$P(w_1 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid A(w_i))$$
![Page 21: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/21.jpg)
Dependency Language Model
Markov assumption:

$$P(w \mid A(w)) \approx P\left(w \mid A^{(N-1)}(w)\right)$$

where $A^{(N-1)}(w)$ is the sequence of the $(N-1)$ closest ancestors of $w$.

This leads to:

$$P(w_1 \ldots w_n) \approx \prod_{i=1}^{n} P\left(w_i \mid A^{(N-1)}(w_i)\right)$$
![Page 22: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/22.jpg)
Training a Dependency LM
Dependency parse a large corpus.
Count ancestor-sequence observations in the dependency trees.
Estimate probabilities by the maximum likelihood estimator:

$$\tilde{P}\left(w_i \mid A^{(N-1)}(w_i)\right) = \frac{\#\,\mathrm{observations}\left(A^{(N-1)}(w_i),\, w_i\right)}{\#\,\mathrm{observations}\left(A^{(N-1)}(w_i)\right)}$$
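A sketch of this training step in Python, reusing the hypothetical Node and ancestor_sequence helpers from above (`trees` is assumed to be an iterable of parsed tree roots, e.g. from MaltParser output):

```python
from collections import Counter, deque

def train_dependency_lm(trees, N):
    """Count (A^(N-1)(w), w) observations over a corpus of dependency trees."""
    joint, contexts = Counter(), Counter()
    for root in trees:
        queue = deque([root])                 # breadth-first enumeration
        while queue:
            node = queue.popleft()
            ctx = tuple(ancestor_sequence(node)[:N - 1])
            joint[ctx + (node.word,)] += 1
            contexts[ctx] += 1
            queue.extend(node.children)
    return joint, contexts

def dep_prob(joint, contexts, node, N):
    """MLE probability of a word given its (N-1) closest ancestors."""
    ctx = tuple(ancestor_sequence(node)[:N - 1])
    seen = contexts[ctx]
    return joint[ctx + (node.word,)] / seen if seen else 0.0
```

To score a candidate sentence completion, one would parse the candidate sentence and multiply `dep_prob` over its words.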
![Page 23: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/23.jpg)
Using Labels
Our model assigns the same probability to an apple ate you and to you ate an apple: both have the same unlabelled dependency tree, so they yield exactly the same ancestor-sequence observations.
![Page 24: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/24.jpg)
Using Labels
Solution: incorporate labels.
Assume that each word/label pair is conditionally independent of the rest of the tree given the words/labels in its ancestor sequence.
Use maximum likelihood estimators.
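As a sketch of one plausible formulation (an assumption about the details, not necessarily the talk's exact conditioning), the earlier context can be extended to (word, label) pairs along the ancestor path:

```python
def labelled_context(node, N):
    """(word, label) pairs for the (N-1) closest ancestors (illustrative)."""
    seq, p, child = [], node.parent, node
    while p is not None and len(seq) < N - 1:
        seq.append((p.word, child.label))   # label = relation of child to parent
        p, child = p.parent, p
    return tuple(seq)
```

Counting and MLE estimation then proceed exactly as in the unlabelled sketch, with `labelled_context` in place of the word-only context.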
![Page 25: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/25.jpg)
Microsoft Research Sentence Completion Challenge
1040 sentence completion problems, each with 5 possible answers
Training data set of 520 19th century novels
Parsed the corpus with MaltParser and trained unlabelled and labelled order-N dependency language models for N = 2, 3, 4, 5
![Page 26: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/26.jpg)
Microsoft Research Sentence Completion Challenge
Best result of any method apart from Neural Networks
![Page 27: Interview presentation](https://reader033.vdocuments.net/reader033/viewer/2022061202/547c9508b4af9f1b108b4ca1/html5/thumbnails/27.jpg)
Conclusion
Developed two new language models based on dependency grammar
Competitive results on MSR Sentence Completion Challenge
Questions?