From “Meaning”s to Words: İlknur Durgar El-Kahlout
![Page 1: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/1.jpg)
FROM “Meaning”s TO Words
İlknur DURGAR EL-KAHLOUT
![Page 2: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/2.jpg)
Problem
For a given definition, find the appropriate word (or words) that has a similar definition
– a traditional dictionary is of no use here
![Page 3: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/3.jpg)
Examples
Akımı ölçmek için kullanılan alet → akımölçer
(A device that is used to measure the current → ammeter)
akımölçer: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre
(ammeter: a device that measures the intensity of electrical current, amperemeter)
![Page 4: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/4.jpg)
Examples
Çalıştığı işten kendi isteği ile ayrılmak → istifa
(Leaving one’s job voluntarily → resignation)
istifa: kendi isteği ile görevden ayrılma
(resignation: leaving voluntarily, of a position)
![Page 5: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/5.jpg)
Applications
Computer-assisted language learning
Solving crossword puzzles
Reverse dictionary
![Page 6: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/6.jpg)
Outline
Problem Statement
Challenges
Our Approach
Methods
Results
Result Summary
Conclusion
![Page 7: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/7.jpg)
Problem Statement
For example, one knows the meaning of the word akımölçer (ammeter):
Akımı ölçmek için kullanılan alet (A device that is used to measure the current)
However, the actual definition of the word in the dictionary is:
elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre
(a device that measures the intensity of electrical current, amperemeter)
![Page 8: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/8.jpg)
Problem Statement
Find the similarity between two definitions
Akımı ölçmek için kullanılan alet (A device that is used to measure the current)
elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre
(a device that measures the intensity of electrical current, amperemeter)
![Page 9: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/9.jpg)
Meaning-to-Word (MTW)
Meaning-to-Word System (MTW)
– attacks the problem of finding the appropriate word (or words) whose meaning “matches” the given definition
![Page 10: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/10.jpg)
Challenges
Two challenging problems
– finding words whose definitions are "similar" to the query in some sense.
– ranking the candidate words using a variety of ways.
![Page 11: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/11.jpg)
Information flow in MTW
Diagram: User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words
![Page 12: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/12.jpg)
Meanings To Words (MTW)
The problem of retrieving words from their "meaning"s at first sight seems to be an information retrieval problem
![Page 13: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/13.jpg)
Information Retrieval (IR)
responds to the user's query by selecting documents from a database and ranking them in terms of relevance.
uses (mostly) statistical and symbolic techniques to retrieve documents for a given query, employing shallow natural language analysis.
![Page 14: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/14.jpg)
Similarities between MTW and IR
Goals:
– Select relevant items from a collection based on a query
Collections:
– Collection → Dictionary
Documents:
– Documents → Definitions
![Page 15: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/15.jpg)
Similarities between MTW and IR
Approaches:
– compare the user request with each item of information in the collection
Ranking:
– most important task
– But ranking strategies are different
![Page 16: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/16.jpg)
Differences between IR and MTW
Expected results:
– Many relevant documents vs. only one correct word
Query Expression:
– Keywords vs. sentence (or phrases)
Space size:
– Long documents (avg. 300-400 words) vs. one-sentence-long definitions (avg. 10-20 words)
– Huge collection (10^6-10^9 docs) vs. medium dictionary (10^5 word definitions)
![Page 17: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/17.jpg)
Available Resources
Turkish Dictionary
Turkish Wordnet
![Page 18: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/18.jpg)
Normalization
Diagram: User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words, with a Normalization stage added
![Page 19: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/19.jpg)
Normalization
Tokenization:
– All inter-word (non-word, non-digit) symbols are eliminated (e.g., punctuation).
– Each word is a term
Stemming:
– same stem but different affixes
– enables matching different morphological variants of the original definition's words
Stop Word Elimination:
– have little or no meaning
– Frequency (very frequent words)
– Linguistic (determiners, prepositions, pronouns, ...)
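The three normalization steps can be sketched roughly as follows. This is only an illustration: `STEM_LEXICON` and `STOP_WORDS` are hypothetical stand-ins, and the prefix lookup is a toy substitute for a real Turkish morphological analyzer, which the slides do not describe.

```python
import re

# Hypothetical resources for illustration; a real system would use a
# Turkish morphological analyzer and a curated stop-word list.
STEM_LEXICON = {"ak", "akı", "akım", "ölç", "iç", "için", "kul", "kullan", "alet"}
STOP_WORDS = {"için"}


def tokenize(text):
    """Drop inter-word symbols; each remaining word is a term.

    Note: str.lower() is not Turkish-aware for dotted/dotless I.
    """
    return re.findall(r"\w+", text.lower())


def remove_stop_words(tokens):
    """Discard tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def candidate_stems(token):
    """Return every prefix of the token found in the lexicon, shortest first."""
    stems = [token[:i] for i in range(1, len(token) + 1)
             if token[:i] in STEM_LEXICON]
    return stems or [token]


def normalize(text, longest_only=False):
    """Tokenize, drop stop words, and list the stem candidates per term."""
    result = []
    for token in remove_stop_words(tokenize(text)):
        stems = candidate_stems(token)
        result.append(stems[-1:] if longest_only else stems)
    return result


all_stems = normalize("Akımı ölçmek için kullanılan alet")
longest = normalize("Akımı ölçmek için kullanılan alet", longest_only=True)
```

The `longest_only` flag mirrors the "all stems included" versus "longest stem included" variants reported in the results tables.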
![Page 20: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/20.jpg)
Query Processing
Diagram: User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words, with a Query Processing stage added
![Page 21: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/21.jpg)
Query Processing
Subset Generation:
– Search with different sets of words
– Select informative words from the user’s query
Query: hiç evlenmemiş kişi (a person who has never been married)
* {önce, evlenmemiş, kişi} (before, unmarried, person)
* {evlenmemiş, kişi} {önce, kişi} {önce, evlenmemiş} (unmarried, person) (before, person) (before, unmarried)
* {evlenmemiş} {önce} {kişi} (unmarried) (before) (person)
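Subset generation as listed above amounts to enumerating every non-empty subset of the normalized query terms. A minimal sketch, assuming the terms are already normalized (the function name `generate_subsets` is illustrative, not from the slides):

```python
from itertools import combinations


def generate_subsets(terms):
    """Yield every non-empty subset of the query terms, largest first."""
    for size in range(len(terms), 0, -1):
        for combo in combinations(terms, size):
            yield set(combo)


subsets = list(generate_subsets(["önce", "evlenmemiş", "kişi"]))
```

For three terms this produces the seven subsets shown on the slide: one of size three, three of size two, and three singletons.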
![Page 22: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/22.jpg)
Query Processing
Subset Sorting:
– An unordered list of subsets is insufficient
• Top-down sorting
– Rank the generated subsets
1) By the number of words
Ex: {önce, evlenmemiş, kişi} (before, unmarried, person) vs. {evlenmemiş, kişi} (unmarried, person)
2) By the sum of the frequency logarithms
Ex: {evlenmemiş, kişi} (unmarried, person) vs. {önce, kişi} (before, person)
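The two sorting criteria can be combined into a single sort key. The sketch below assumes that a lower summed log-frequency means a more informative subset (rarer words first); the slides do not state the direction, and the counts in `FREQ` are invented for illustration.

```python
import math
from itertools import combinations

# Hypothetical corpus frequencies, for illustration only.
FREQ = {"önce": 900, "evlenmemiş": 3, "kişi": 400}


def log_freq_sum(subset):
    """Sum of the logarithms of the word frequencies in the subset."""
    return sum(math.log(FREQ[w]) for w in subset)


def sort_subsets(subsets):
    """Larger subsets first; ties broken by lower summed log-frequency
    (assumption: rarer words are more informative)."""
    return sorted(subsets, key=lambda s: (-len(s), log_freq_sum(s)))


terms = ["önce", "evlenmemiş", "kişi"]
all_subsets = [frozenset(c) for n in range(1, 4)
               for c in combinations(terms, n)]
ranked = sort_subsets(all_subsets)
```

With these made-up counts, {evlenmemiş, kişi} is ranked ahead of {önce, kişi}, which is the ordering the slide's second example suggests.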
![Page 23: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/23.jpg)
Searching for “Meaning”s
Diagram: User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words
![Page 24: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/24.jpg)
Searching for “Meaning”s
Two methods
– Stem Match
– Query Expansion
![Page 25: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/25.jpg)
Stem Match
Morphological normalization of words
– Find meanings that contain morphological variants of the original definition
![Page 26: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/26.jpg)
Stem Match (Ex.)
{A device that is used to measure the current}
{akımı ölçmek için kullanılan alet}
akımı → ak (white), akım (current), akı (flux)
ölçmek → ölç (measure)
için → için (to), iç (drink)
kullanılan → kullan (use), kul (slave)
alet → alet (device)
![Page 27: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/27.jpg)
Stem Match
akımı ölçmek için kullanılan alet - A device that is used to measure the current
elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre - a device that measures the intensity of electrical current, amperemeter
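A rough sketch of stem matching on this example: each query term contributes a set of stem variants, and a dictionary entry becomes a candidate when its definition shares at least one stem with some query term. The pre-stemmed `DICTIONARY` and the match-count score are assumptions made for illustration, not the system's actual data or scoring.

```python
# Hypothetical pre-stemmed dictionary: headword -> stems in its definition.
DICTIONARY = {
    "akımölçer": {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"},
    "istifa": {"kendi", "istek", "görev", "ayrıl"},
}


def stem_match(query_stems, dictionary):
    """Count, per headword, how many query terms have a stem in the definition."""
    scores = {}
    for headword, def_stems in dictionary.items():
        matched = sum(1 for variants in query_stems
                      if def_stems & set(variants))
        if matched:
            scores[headword] = matched
    # Best-matching headwords first.
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Stem variants for "akımı ölçmek kullanılan alet".
query = [["ak", "akı", "akım"], ["ölç"], ["kul", "kullan"], ["alet"]]
candidates = stem_match(query, DICTIONARY)
```

Here only akım and ölç are shared with the dictionary definition, so akımölçer is retrieved with two matched terms while alet and kullanılan stay unmatched.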
![Page 28: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/28.jpg)
Stem Match
Drawback:
– Conflates words with very different meanings to the same stem
(ex: yüksek (high) → yüksek (high), yük (load)
ilim (science, my city), ilde (in the city) → il (city))
– Cannot find relations between similar words
(ex: kimse (someone) ↔ kişi (person),
bölüm (part) ↔ kısım (portion))
![Page 29: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/29.jpg)
Query Expansion
The users of retrieval systems often use different words to describe the concepts in their queries than the authors use to describe the same concept in their documents.
In experiments, two people use the same term to describe an object less than 20% of the time (Furnas 1987).
![Page 30: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/30.jpg)
Using Query Expansion
Two different approaches:
• Expand query with relations (synonyms, specializations, generalizations)
• Expand query with the unexpanded query’s relevant answers
Synonym relation used in MTW
Ex: {besin, gıda} (food, nourishment)
{iyileş, düzel} (to get better) / {iyileş, geliş} (to improve)
![Page 31: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/31.jpg)
Query Expansion (Ex.)
{A device that is used to measure the current}
{akımı ölçmek için kullanılan alet}
akımı → ak (white): beyaz; akım (current); akı (flux): debi, akış
ölçmek → ölç (measure): ölçüm
için → için (to); iç (drink)
kullanılan → kullan (use): faydalan, yararlan; kul (slave): köle
alet → alet (device): araç, gereç
![Page 32: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/32.jpg)
Query Expansion (Ex.)
akımı ölçmek için kullanılan alet - A device that is used to measure the current
elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre - a device that measures the intensity of electrical current, amperemeter
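Synonym expansion can be sketched as appending each stem's synonyms to the query term it came from. The `SYNONYMS` table below is read off the example slide and should be treated as illustrative; the real system presumably draws these relations from the Turkish Wordnet, and the pre-stemmed definition is an assumption.

```python
# Synonym sets read off the example slide; treat them as illustrative.
SYNONYMS = {
    "ak": ["beyaz"],
    "akı": ["debi", "akış"],
    "ölç": ["ölçüm"],
    "kullan": ["faydalan", "yararlan"],
    "kul": ["köle"],
    "alet": ["araç", "gereç"],
}


def expand(query_stems, synonyms):
    """Append the synonyms of every stem variant to that query term."""
    expanded = []
    for variants in query_stems:
        extra = [s for v in variants for s in synonyms.get(v, [])]
        expanded.append(list(variants) + extra)
    return expanded


def matched_terms(query_stems, definition_stems):
    """Count query terms with at least one variant in the definition."""
    return sum(1 for variants in query_stems
               if definition_stems & set(variants))


query = [["ak", "akı", "akım"], ["ölç"], ["kul", "kullan"], ["alet"]]
# Hypothetical stems of the dictionary definition of akımölçer.
definition = {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"}
before = matched_terms(query, definition)
after = matched_terms(expand(query, SYNONYMS), definition)
```

After expansion, alet also matches through its synonym araç, so three of the four query terms are found in the definition instead of two.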
![Page 33: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/33.jpg)
Ranking
Diagram: User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words
![Page 34: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/34.jpg)
Ranking
The main goal of a retrieval system is to find the documents that are relevant to a query.
Documents that are likely to be more relevant should be ranked at the top and documents that are likely to be less relevant should be ranked at the bottom of the ranked list. (Hiemstra 1999)
![Page 35: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/35.jpg)
Ranking
Most important part of MTW
– Having the right answer in the retrieved set is not enough
– The aim is to have the right answer at the top of the retrieved set (e.g., in the top 50 answers)
![Page 36: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/36.jpg)
Ranking
Simple but effective methods
– Subset informativeness (subset sorting)
– Number of matched words (subset sorting)
– Length of the candidate definition
– Longest Common Subsequence
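The longest common subsequence measure rewards candidates whose definition keeps the query's words in the same order. The slides do not say whether it is computed over words or characters; the sketch below assumes sequences of stems and uses the standard dynamic-programming recurrence.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the best length seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical stem sequences for the query and the dictionary definition.
query = ["akım", "ölç", "kullan", "alet"]
definition = ["elektrik", "akım", "şiddet", "ölç", "yara", "araç"]
score = lcs_length(query, definition)
```

For the running example the common subsequence is (akım, ölç), so the score is 2; how this score is combined with the other ranking criteria is not specified in the slides.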
![Page 37: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/37.jpg)
Some statistics
Train sets:
– 50 queries from real users
– 50 queries from a dictionary
Test sets:
– 50 queries from real users
– 50 queries from a dictionary

| | Test set 1 | Train set 1 | Test set 2 | Train set 2 |
|---|---|---|---|---|
| # of queries | 50 | 50 | 50 | 50 |
| Avg. # of query words | 5.66 | 4.64 | 9.24 | 13.98 |
| Max. # of query words | 17 | 12 | 23 | 45 |
| Min. # of query words | 2 | 1 | 1 | 6 |
![Page 38: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/38.jpg)
Stem Match (all stems included)
| Rank | Test set 1 | Train set 1 | Test set 2 | Train set 2 |
|---|---|---|---|---|
| 1-10 | 13 (26%) | 18 (36%) | 45 (90%) | 41 (82%) |
| 11-50 | 7 (14%) | 12 (24%) | 2 (4%) | 5 (10%) |
| 51-100 | 4 (8%) | 1 (2%) | 1 (2%) | 2 (4%) |
| 101-300 | 3 (6%) | 3 (6%) | 2 (4%) | 1 (2%) |
| 301-500 | 2 (4%) | 2 (4%) | 0 (0%) | 1 (2%) |
| 501-1000 | 6 (12%) | 2 (4%) | 0 (0%) | 0 (0%) |
| Over 1000 | 4 (8%) | 2 (4%) | 0 (0%) | 0 (0%) |
| Not found | 11 (22%) | 10 (20%) | 0 (0%) | 0 (0%) |
![Page 39: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/39.jpg)
Stem Match (longest stem included)
| Rank | Test set 1 | Train set 1 | Test set 2 | Train set 2 |
|---|---|---|---|---|
| 1-10 | 14 (28%) | 21 (42%) | 46 (92%) | 43 (86%) |
| 11-50 | 5 (10%) | 9 (18%) | 1 (2%) | 5 (10%) |
| 51-100 | 4 (8%) | 1 (2%) | 1 (2%) | 1 (2%) |
| 101-300 | 3 (6%) | 1 (2%) | 2 (4%) | 1 (2%) |
| 301-500 | 2 (4%) | 3 (6%) | 0 (0%) | 0 (0%) |
| 501-1000 | 5 (10%) | 2 (4%) | 0 (0%) | 0 (0%) |
| Over 1000 | 4 (8%) | 2 (4%) | 0 (0%) | 0 (0%) |
| Not found | 13 (26%) | 11 (22%) | 0 (0%) | 0 (0%) |
![Page 40: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/40.jpg)
Query Expansion Match (all stems included)
| Rank | Test set 1 | Train set 1 | Test set 2 | Train set 2 |
|---|---|---|---|---|
| 1-10 | 14 (28%) | 24 (48%) | 45 (90%) | 41 (82%) |
| 11-50 | 9 (18%) | 9 (18%) | 2 (4%) | 5 (10%) |
| 51-100 | 3 (6%) | 3 (6%) | 1 (2%) | 2 (4%) |
| 101-300 | 7 (14%) | 2 (4%) | 2 (4%) | 1 (2%) |
| 301-500 | 0 (0%) | 1 (2%) | 0 (0%) | 1 (2%) |
| 501-1000 | 4 (8%) | 5 (10%) | 0 (0%) | 0 (0%) |
| Over 1000 | 4 (8%) | 1 (2%) | 0 (0%) | 0 (0%) |
| Not found | 9 (18%) | 5 (10%) | 0 (0%) | 0 (0%) |
![Page 41: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/41.jpg)
Query Expansion Match (longest stem included)
| Rank | Test set 1 | Train set 1 | Test set 2 | Train set 2 |
|---|---|---|---|---|
| 1-10 | 14 (28%) | 24 (48%) | 41 (82%) | 39 (78%) |
| 11-50 | 6 (12%) | 8 (16%) | 5 (10%) | 6 (12%) |
| 51-100 | 5 (10%) | 5 (10%) | 0 (0%) | 2 (4%) |
| 101-300 | 7 (14%) | 2 (4%) | 0 (0%) | 2 (4%) |
| 301-500 | 1 (2%) | 1 (2%) | 0 (0%) | 0 (0%) |
| 501-1000 | 5 (10%) | 3 (6%) | 0 (0%) | 0 (0%) |
| Over 1000 | 3 (6%) | 2 (4%) | 1 (2%) | 1 (2%) |
| Not found | 9 (18%) | 5 (10%) | 0 (0%) | 0 (0%) |
![Page 42: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/42.jpg)
Data fusion
No single method is better than all others in all cases
Merging results from different methods seems to be a promising approach for achieving improved performance
Many data fusion methods exist, including min, max, average, sum, weighted average and other linear combination functions
![Page 43: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/43.jpg)
Data Fusion
Weighted Sum
$$\mathrm{Score}(w) = \mathrm{SM\_score}(w) \cdot w_1 + \mathrm{QE\_score}(w) \cdot w_2$$
![Page 44: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/44.jpg)
Data Fusion
c1 = 0.7 (stem match const.)
c2 = 0.3 (query expansion const.)

| Rank | Test set 1 | Train set 1 |
|---|---|---|
| 1-10 | 15 (30%) | 22 (44%) |
| 11-50 | 10 (20%) | 14 (28%) |
| 51-100 | 4 (8%) | 1 (2%) |
| 101-300 | 3 (6%) | 2 (4%) |
| 301-500 | 3 (6%) | 0 (0%) |
| 501-1000 | 5 (10%) | 3 (6%) |
| Over 1000 | -- | -- |
| Not found | 11 (22%) | 8 (16%) |
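The weighted-sum fusion with the constants above (0.7 for stem match, 0.3 for query expansion) can be sketched as below. The per-word scores are invented for illustration, and treating a word missing from one method as scoring 0 there is an assumption; the slides do not say how the individual scores are computed or normalized.

```python
def fuse(sm_scores, qe_scores, w1=0.7, w2=0.3):
    """Weighted sum of stem-match and query-expansion scores per word.

    Words missing from one method are assumed to score 0 there.
    """
    words = set(sm_scores) | set(qe_scores)
    fused = {w: w1 * sm_scores.get(w, 0.0) + w2 * qe_scores.get(w, 0.0)
             for w in words}
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores from the two search methods.
sm = {"akımölçer": 0.5, "voltmetre": 0.6}
qe = {"akımölçer": 0.9, "voltmetre": 0.2}
ranking = fuse(sm, qe)
```

In this made-up case stem match alone would prefer voltmetre, but the fused score (0.62 vs. 0.48) puts akımölçer first.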
![Page 45: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/45.jpg)
Result Summary
Stem Match (longest stem included)
• 60% real user queries
• 96% dictionary queries
Query Expansion (all stems included)
• 68% real user queries
• 92% dictionary queries
Data Fusion (longest stem included)
• 72% real user queries
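These summary figures appear to be the share of queries whose correct word lands in the top 50, i.e. the sum of the 1-10 and 11-50 rows of the results tables. A small check under that assumption, using the Train set columns of the Stem Match (longest stem) and Data Fusion tables:

```python
def top_50_share(bucket_counts, total=50):
    """Percentage of queries answered within the first 50 results."""
    return 100 * (bucket_counts["1-10"] + bucket_counts["11-50"]) / total


# Counts copied from the results tables.
stem_match_train1 = {"1-10": 21, "11-50": 9}    # real user queries
stem_match_train2 = {"1-10": 43, "11-50": 5}    # dictionary queries
data_fusion_train1 = {"1-10": 22, "11-50": 14}  # real user queries

shares = [
    top_50_share(stem_match_train1),
    top_50_share(stem_match_train2),
    top_50_share(data_fusion_train1),
]
```

The three values come out to 60, 96 and 72 percent, matching the Stem Match and Data Fusion lines of the summary.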
![Page 46: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/46.jpg)
Conclusion
A Meaning-to-Word system is implemented for the Turkish language
Results on unseen data are rather satisfactory
Query expansion is better
• Although it cannot find the words for all queries
• 68% of real user queries and 90% of dictionary queries are found in the first 50 results
Data fusion has a better performance
• 72% of real user queries are found in the first 50 results
![Page 47: FROM “Meaning”s TO Words İlknur DURGAR EL-KAHLOUT](https://reader035.vdocuments.net/reader035/viewer/2022081505/551bef52550346b9588b6510/html5/thumbnails/47.jpg)
THANK YOU!!