![Page 1: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/1.jpg)
Multiword Expressions
Presented by:
Bhuban Seth (09305005)
Somya Gupta (10305011)
Advait Mohan Raut (09305923)
Victor Chakraborty (09305903)
Under the guidance of: Prof. Pushpak Bhattacharyya
![Page 2: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/2.jpg)
Contents
• Introduction
• Motivation
• Linguistic Levels
• Types of MWEs
• Approaches to identify MWEs
• Limitations
• Conclusion
• References
![Page 4: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/4.jpg)
Introduction
• Put the sweater on
• Put the sweater on the table
• Put the light on
Roughly defined as: idiosyncratic interpretations that cross word boundaries (or spaces)
![Page 5: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/5.jpg)
Examples
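• His grandfather kicked the bucket.
• This job is a piece of cake.
• Put the sweater on.
• He is the dark horse of the match.
Google translations of the above sentences (each renders the expression word for word):
• अपने दादा बाल्टी लात मारी (literally: "one's grandfather kicked a pail")
• इस काम के केक का एक टुकड़ा है (literally: "is a piece of the cake of this job")
• स्वेटर पर रखो (literally: "put [it] on top of the sweater")
• वह मैच के अंधेरे घोड़ा है (literally: "he is the darkness horse of the match")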
![Page 6: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/6.jpg)
Motivation
Multiword expressions
• “Of the same order of magnitude as the number of single words” (Jackendoff 1997)
• 41% of the entries in WordNet 1.7 are multiword (Fellbaum 1999)
Resolution needed in:
• Machine Translation (see the poor Google Translate output in the examples)
• Information Retrieval
• Tagging, Parsing, Question Answering Systems, WSD
![Page 7: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/7.jpg)
Linguistic Levels
Lexicology
• In short, Ad hoc
Morphology and Syntax
• Put on weight, Put the sweater on
Semantics
• Spill the beans
Pragmatics
• Kick the bucket, Kick the bucket filled with water
![Page 8: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/8.jpg)
How to Handle These?
Variation in Flexibility
Syntactic Idiomaticity
![Page 9: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/9.jpg)
Types (Sag et al 2002)
![Page 10: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/10.jpg)
Types - Examples
| Type | Example |
| --- | --- |
| Fixed | In short, Ad hoc, Palo Alto, Alta Vista |
| Compound Nominals | Congressman, Car park, Part of Speech |
| Proper Names | Deccan Chargers, Delhi Daredevils |
| Non-Decomposable Idioms | Kick the Bucket |
| Decomposable Idioms | Spill the Beans, Let the Cat out |
| Verb Particle Constructions | Take off, Put on |
| Light Verb Constructions | Give a Demo, Take a Shower |
| Institutionalized Phrases | Black and White, Traffic Light, Telephone booth |
![Page 11: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/11.jpg)
Approaches
![Page 12: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/12.jpg)
Knowledge Based Approach
1) Words with spaces: fixed expressions
• A stemmer may be used to detect MWEs.
• But it fails. Why? Stemming treats every inflected form as a match:
• Kicks the bucket → MWE
• Kick the buckets → not an MWE
• Princeton WordNet – Flaw
2) Circumscribed constructions:
• Consecutive nouns → most probably an MWE
3) Inflection of the head: semi-fixed expressions
• Ex: part of speech → parts of speech
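The stemmer failure in item 1 can be made concrete with a minimal sketch. It assumes a one-entry toy lexicon and a deliberately naive suffix-stripping `stem` function; both are hypothetical and not taken from the slides, and a real system would use a proper stemmer and lexicon.

```python
# Toy illustration: matching MWEs by stemming every token over-generates.
# MWE_LEXICON and stem() are illustrative assumptions, not a real resource.

MWE_LEXICON = {("kick", "the", "bucket")}


def stem(token: str) -> str:
    """Very naive stemmer: lowercase and strip one trailing 's' from longer words."""
    token = token.lower()
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def matches_by_stemming(phrase: str) -> bool:
    """Return True if the stemmed phrase is found in the toy lexicon."""
    return tuple(stem(t) for t in phrase.split()) in MWE_LEXICON


print(matches_by_stemming("kicks the bucket"))  # idiomatic form, matched
print(matches_by_stemming("kick the buckets"))  # literal form, also matched (the flaw)
```

Because the stemmer reduces "buckets" to "bucket", the literal phrase is accepted alongside the idiom. This is why semi-fixed expressions need inflection restricted to the head word rather than applied to every token.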
![Page 13: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/13.jpg)
Statistical Approaches
Co-occurrence properties
Substitutability
Distributional Similarity
Semantic Similarity
![Page 14: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/14.jpg)
Co-occurrence properties
Example: Black and White
Scan a corpus and estimate the probabilities of bigrams and trigrams.
P(X|Y) = P(YX)/P(Y), where P(YX) is the probability of the word sequence ‘YX’
If P(X|Y) is high, the word sequence ‘YX’ may be an MWE.
Demerit:
• “I am” also scores high, but it is not an MWE.
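A minimal sketch of this estimate, assuming a tiny hand-made token list (an illustration only, since real estimates need a large corpus). The function name `conditional_bigram_probs` is hypothetical. It computes the conditional probability of X following Y as count(Y X) / count(Y).

```python
from collections import Counter

# Toy corpus (an assumption for illustration).
TOKENS = "i am here and i am sure the film is black and white".split()


def conditional_bigram_probs(tokens):
    """Estimate P(X | Y) = count(Y X) / count(Y) for each adjacent pair (Y, X)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(y, x): c / unigrams[y] for (y, x), c in bigrams.items()}


probs = conditional_bigram_probs(TOKENS)
print(probs[("black", "and")])  # P(and | black)
print(probs[("i", "am")])       # P(am | i): equally high, yet "I am" is not an MWE
```

On this toy data both pairs receive the same score, which illustrates the demerit: a high conditional probability alone cannot separate an MWE from a frequent grammatical sequence.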
![Page 15: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/15.jpg)
Point-wise Mutual Information (PMI)
PMI(X,Y) = log { P(X,Y) / (P(X) · P(Y)) }
PMI(X,Y) of a word pair (X,Y) is a measure of the strength of their collocation.
Other methods, such as Student's t-test and Pearson's chi-square test, can also be used.
Demerit:
• Need to differentiate between systematic and chance co-occurrence
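The PMI formula can be sketched as follows. This assumes maximum-likelihood probability estimates from a toy token list, adjacent word pairs only, and a base-2 logarithm. The slide does not specify the base, so that choice and the function name `pmi` are assumptions.

```python
import math
from collections import Counter


def pmi(tokens, x, y):
    """PMI(X, Y) = log2( P(X, Y) / (P(X) * P(Y)) ) for the adjacent pair 'x y'.

    Probabilities are maximum-likelihood estimates from the token list.
    Returns None if the pair never occurs (the logarithm would be undefined).
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    if bigrams[(x, y)] == 0:
        return None
    p_xy = bigrams[(x, y)] / (n - 1)  # there are n - 1 adjacent pairs
    p_x = unigrams[x] / n
    p_y = unigrams[y] / n
    return math.log2(p_xy / (p_x * p_y))


# Toy corpus (an assumption for illustration).
TOKENS = "the traffic light turned red and the light rain fell on the road".split()

print(pmi(TOKENS, "traffic", "light"))  # rarer first word, higher PMI
print(pmi(TOKENS, "the", "light"))      # frequent first word, lower PMI
```

With counts this small the scores are dominated by chance, which is the demerit noted above: PMI is known to overrate rare pairs, so a frequency cut-off or a significance test is usually applied alongside it.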
![Page 16: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/16.jpg)
Pearson’s chi-square test
Based on the assumption that word frequencies are normally distributed, which could be a limitation
Null hypothesis: the words are independent of each other.
The higher the value of the chi-square statistic, the stronger the association between the words
Demerit:
• For small data collections, the assumptions of normality and of a chi-square distribution do not hold. Hence, a large corpus is required.
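As a sketch of how the statistic is computed for a word pair, the standard closed form for a 2x2 contingency table can be used. The cell layout and the counts below are hypothetical, chosen only to show the two extremes.

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Pearson's chi-square statistic for a 2x2 contingency table.

    o11: count of bigrams 'X Y'
    o12: count of bigrams 'X (not Y)'
    o21: count of bigrams '(not X) Y'
    o22: count of bigrams '(not X) (not Y)'
    """
    n = o11 + o12 + o21 + o22
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o11 + o21) * (o12 + o22) * (o21 + o22)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return numerator / denominator


# Hypothetical counts, chosen only to show the two extremes.
print(chi_square_2x2(5, 5, 5, 5))    # independent words: 0.0
print(chi_square_2x2(10, 0, 0, 10))  # perfectly associated words: 20.0
```

The statistic is compared against the chi-square distribution with one degree of freedom to decide whether to reject the null hypothesis of independence.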
![Page 17: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/17.jpg)
Substitutability
The ability to replace parts of a lexical item with alternatives.
Alternatives can be similar or opposite words, depending on the task and approach.
In most cases, the phrase is no longer an MWE after substitution.
Can be used to filter out likely non-MWEs
Src: Kim, 2008
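One way to turn the substitutability idea into a filter is sketched below. It is not the method from Kim (2008). The bigram counts, the synonym table, the function names, and the 0.25 threshold are all hypothetical values chosen for illustration.

```python
from collections import Counter

# Hypothetical bigram counts and synonym sets, for illustration only.
BIGRAM_COUNTS = Counter({
    ("traffic", "light"): 120,
    ("traffic", "lamp"): 0,
    ("red", "car"): 80,
    ("red", "automobile"): 35,
})
SYNONYMS = {"light": ["lamp"], "car": ["automobile"]}


def substitution_ratio(first, second):
    """Ratio of the best substituted variant's count to the original count.

    A ratio near 0 means the substitutes are unattested (the candidate stays);
    a high ratio suggests a freely composed phrase (likely not an MWE).
    """
    original = BIGRAM_COUNTS[(first, second)]
    if original == 0:
        return None
    variants = [BIGRAM_COUNTS[(first, alt)] for alt in SYNONYMS.get(second, [])]
    return max(variants, default=0) / original


def likely_non_mwe(first, second, threshold=0.25):
    """Flag a candidate as a probable non-MWE; the threshold is an arbitrary choice."""
    ratio = substitution_ratio(first, second)
    return ratio is not None and ratio >= threshold


print(likely_non_mwe("traffic", "light"))  # substitute unattested, candidate kept
print(likely_non_mwe("red", "car"))        # substitute attested, candidate filtered out
```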
![Page 18: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/18.jpg)
Distributional Similarity
A method for estimating semantic similarity from context
When two words are similar, their context words are also similar
Src: Kim, 2008
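A minimal sketch of the idea, assuming a toy corpus, a symmetric context window of two tokens, and cosine similarity over raw co-occurrence counts. All three choices and both function names are assumptions for illustration, not details from Kim (2008).

```python
import math
from collections import Counter


def context_vector(tokens, target, window=2):
    """Count the words occurring within `window` positions of each `target`."""
    vector = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            vector.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (an assumption for illustration).
TOKENS = ("she drank hot tea today . he drank hot coffee today . "
          "they parked the car outside .").split()

tea, coffee, car = (context_vector(TOKENS, w) for w in ("tea", "coffee", "car"))
print(cosine(tea, coffee))  # identical contexts in this toy corpus
print(cosine(tea, car))     # only the sentence-final "." is shared
```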
![Page 19: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/19.jpg)
Semantic Similarity
Similar noun compounds (NCs) are likely to share the same semantic relation
Src: Kim, 2008
![Page 20: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/20.jpg)
Method
Src: Kim, 2008
![Page 21: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/21.jpg)
MWE Resources
Corpus
• British National Corpus (BNC)
• Brown Corpus
Lexical Resources
• WordNet
• Moby’s Thesaurus: contains 30K root words & 2.5M synonyms and related words
Tools
• WordNet::Similarity: gives a measure of semantic similarity between two given words
![Page 22: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/22.jpg)
Limitations of current Approaches
Many NLP approaches treat MWEs according to the words-with-spaces method
Many approaches get commonly-attested MWE usages right, sometimes using “ad hoc” methods, e.g. preprocessing
However, most approaches handle variation badly, fail to generalize, and result in NLP systems that are difficult to maintain and extend
![Page 23: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/23.jpg)
Conclusion
MWEs have been classified into lexicalized phrases (fixed, semi-fixed and syntactically flexible) and institutionalized phrases.
MWE analysis is as important in NLP as other tasks such as MT or WSD.
A hybrid approach is most probably the best method so far for extracting MWEs from a corpus.
![Page 24: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/24.jpg)
References
Kim, S. N. (2008). Statistical modeling of multiword expressions.
Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. In Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics.
Calzolari, N., et al. (2002). Towards best practice for multiword expressions in computational lexicons. In Proceedings of the 3rd International Conference on Language Resources and Evaluation (pp. 1934-1940).
![Page 25: Introduction Motivation Linguistic Levels Types of MWEs Approaches to identify MWEs Limitations Conclusion References](https://reader035.vdocuments.net/reader035/viewer/2022062817/5681693f550346895de0beef/html5/thumbnails/25.jpg)
Thank You
Questions???