measuring the similarity between implicit semantic relations using web search engines
MEASURING THE SIMILARITY BETWEEN IMPLICIT SEMANTIC RELATIONS USING WEB SEARCH ENGINES
Danushka Bollegala, Yutaka Matsuo, Mitsuru Ishizuka (WSDM’09)
Speaker: Yi-Ling Tai
Date: 2009/11/23
OUTLINE
Introduction
Method
  Retrieving Contexts
  Extracting Lexical Patterns
  Identifying Semantic Relations
  Measuring Relational Similarity
Experiments
Conclusions
INTRODUCTION
Implicit semantic relations between two words
  Google, YouTube (acquisition)
  Ostrich, bird (is a large)
Similar semantic relations between two word-pairs
  Google, YouTube → Yahoo, Inktomi
  Ostrich, bird → lion, cat
This paper proposes a method to compute the similarity between the implicit semantic relations in two word-pairs.
OUTLINE OF THE SIMILARITY METHOD
Web search component
  queries a Web search engine to find the contexts
Pattern extraction component
  extracts lexical patterns that express semantic relations
Pattern clustering component
  clusters the patterns to identify particular relations
Similarity computation component
  computes the relational similarity between two word-pairs
RETRIEVING CONTEXTS
Snippets: brief summaries provided by Web search engines along with the search results
  a snippet that contains both words captures their local context
  example query: “Google * * YouTube”
RETRIEVING CONTEXTS
“*” : wildcard operator, matches one word or none
To retrieve snippets for a word-pair (A, B), issue the queries
  “A * B”, “B * A”, “A * * B”, “B * * A”, “A * * * B”, “B * * * A”, and A B
  the query words co-occur within a maximum of three words
  the quotation marks “ ” ensure that the two words appear in the given order
Remove duplicate snippets if they contain the exact same sequence of all words
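The query set above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `build_queries` and `deduplicate` are hypothetical, straight quotes stand in for the search engine's phrase operator, and no real search API is called.

```python
def build_queries(a, b, max_gap=3):
    """Return the wildcard queries for the word-pair (a, b).

    For each gap size 1..max_gap, one quoted query is built per word order,
    followed by the plain, unquoted query "a b".
    """
    queries = []
    for n in range(1, max_gap + 1):
        wildcards = " ".join(["*"] * n)
        queries.append(f'"{a} {wildcards} {b}"')
        queries.append(f'"{b} {wildcards} {a}"')
    queries.append(f"{a} {b}")
    return queries


def deduplicate(snippets):
    """Drop snippets whose exact word sequence has already been seen."""
    seen = set()
    unique = []
    for snippet in snippets:
        key = tuple(snippet.split())
        if key not in seen:
            seen.add(key)
            unique.append(snippet)
    return unique


queries = build_queries("Google", "YouTube")
```

For a word-pair this yields seven queries, matching the list on the slide. Splitting on whitespace before comparing is one possible reading of "exact sequence of all words".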
EXTRACTING LEXICAL PATTERNS
Shallow lexical pattern extraction algorithm
  extracts the semantic relations between two words from Web snippets
  does not require language preprocessing
Consists of the following three steps
Step 1:
  replace the two words with the variables X and Y
  replace all numeric values with D
  do not remove punctuation marks
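Step 1 can be sketched as follows. The tokenizer and the regular expressions are assumptions made for this illustration (the slides do not specify them), and `normalize_snippet` is a hypothetical name.

```python
import re

# Assumed tokenizer: numbers (with decimal or thousands separators), words,
# and single punctuation marks each become one token.
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def normalize_snippet(snippet, a, b):
    """Replace the two words with X and Y and every numeric value with D."""
    text = re.sub(r"\b" + re.escape(a) + r"\b", " X ", snippet, flags=re.IGNORECASE)
    text = re.sub(r"\b" + re.escape(b) + r"\b", " Y ", text, flags=re.IGNORECASE)
    tokens = TOKEN_RE.findall(text)
    # Punctuation marks are kept as tokens, as Step 1 requires.
    return ["D" if NUMBER_RE.fullmatch(t) else t for t in tokens]


tokens = normalize_snippet(
    "Google acquired YouTube for $1.65 billion in 2006.", "Google", "YouTube"
)
```

The example sentence is invented for illustration; it is not taken from the paper's dataset.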
EXTRACTING LEXICAL PATTERNS
Step 2: generate subsequences under the following conditions
  exactly one X and one Y must exist in a subsequence
  the maximum length of a subsequence is L words
  a gap should not exceed g words
  the total length of all gaps should not exceed G words
  expand all negation contractions, e.g. didn’t → did not
Step 3:
  select subsequences with frequency greater than N
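Steps 2 and 3 can be illustrated with a brute-force enumeration. This is a sketch, not the authors' implementation: it does not use PrefixSpan, it does not expand contractions, it does not mark gaps in the pattern, and it counts a pattern at most once per snippet. All of these are assumptions, as are the function names. The defaults L = 5, g = 2, G = 4 are the values reported in the experiments.

```python
from collections import Counter


def subsequences(tokens, max_len=5, max_gap=2, max_total_gap=4):
    """Yield token subsequences that satisfy the Step 2 constraints."""
    n = len(tokens)

    def extend(indices, total_gap):
        chosen = [tokens[i] for i in indices]
        x_count, y_count = chosen.count("X"), chosen.count("Y")
        if x_count > 1 or y_count > 1:
            return  # more than one X or Y can never become valid again
        if x_count == 1 and y_count == 1:
            yield tuple(chosen)
        if len(indices) == max_len:
            return  # length limit L reached
        last = indices[-1]
        # The next token may skip at most max_gap tokens (limit g).
        for nxt in range(last + 1, min(n, last + max_gap + 2)):
            gap = nxt - last - 1
            if total_gap + gap <= max_total_gap:  # limit G
                yield from extend(indices + [nxt], total_gap + gap)

    for start in range(n):
        yield from extend([start], 0)


def frequent_patterns(token_lists, n_threshold=1, **limits):
    """Step 3: keep subsequences whose frequency is greater than n_threshold."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(set(subsequences(tokens, **limits)))
    return {p: c for p, c in counts.items() if c > n_threshold}


patterns = frequent_patterns([
    ["X", "to", "acquire", "Y", "for"],
    ["X", "to", "acquire", "Y", "."],
])
```

With `n_threshold=1` the filter keeps patterns that occur at least twice. Exhaustive enumeration is exponential in the worst case, so it is only suitable for short snippets.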
EXTRACTING LEXICAL PATTERNS
A modified PrefixSpan algorithm
  considers all the words in a snippet
  is not limited to extracting patterns from only the mid-fix
Example patterns: X to acquire Y, X acquire Y, X to acquire Y for
IDENTIFYING SEMANTIC RELATIONS
A semantic relation can be expressed using more than one pattern.
If there are many related patterns between two word-pairs, we can expect a high relational similarity.
Cluster lexical patterns using their distributions over word-pairs to identify semantically related patterns.
IDENTIFYING SEMANTIC RELATIONS
p : word-pair frequency vector of pattern p
f(a, b, p) : frequency with which pattern p occurs with the word-pair (a, b)
SORT : sorts the patterns in descending order of their total occurrence in all word-pairs
c : cluster vector, the vector sum of all word-pair frequency vectors corresponding to the patterns that belong to that cluster
⊕ : denotes vector addition
θ : similarity threshold
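The notation above suggests a greedy, single-pass clustering. The sketch below is one possible reading under assumptions: cosine similarity is used to compare a pattern with a cluster vector, a pattern joins the most similar cluster only when that similarity exceeds the threshold, and otherwise starts a new cluster. The names `cosine` and `cluster_patterns` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def cluster_patterns(vectors, theta):
    """Cluster patterns by their word-pair frequency vectors.

    vectors maps each pattern to its list of frequencies over word-pairs.
    """
    # SORT: descending order of total occurrence over all word-pairs.
    order = sorted(vectors, key=lambda p: sum(vectors[p]), reverse=True)
    clusters = []
    for pattern in order:
        vec = vectors[pattern]
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["vector"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim > theta:
            best["patterns"].append(pattern)
            # Vector addition: the cluster vector is the sum of its members.
            best["vector"] = [x + y for x, y in zip(best["vector"], vec)]
        else:
            clusters.append({"patterns": [pattern], "vector": list(vec)})
    return clusters


clusters = cluster_patterns(
    {
        "X acquired Y": [5, 4, 0, 0],
        "X bought Y": [3, 3, 0, 0],
        "X was born in Y": [0, 0, 6, 2],
    },
    theta=0.9,
)
```

The frequencies in the example are made up. Processing frequent patterns first means the earliest clusters are seeded by the most common patterns.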
MEASURING RELATIONAL SIMILARITY
x : feature vector of a word-pair
  the elements of the feature vector are the total frequencies of the word-pair in each cluster
The relational similarity between two word-pairs (Formula 2) is computed from their feature vectors and the matrix Λ
  Λ is a correlation matrix
MEASURING RELATIONAL SIMILARITY
The correlation between clusters c_i and c_j is given by the (i, j) element of Λ
c_i ∪ c_j is the union of the two clusters
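The exact equation did not survive in this transcript, so the sketch below only illustrates the general shape suggested by the slides: a bilinear form over two cluster-frequency feature vectors, weighted by a matrix. The normalization by vector lengths and the function names are assumptions, not the paper's formula.

```python
import math


def feature_vector(pattern_freq, clusters):
    """Total frequency of a word-pair's patterns in each cluster.

    pattern_freq maps pattern -> frequency for one word-pair;
    clusters is a list of pattern lists.
    """
    return [sum(pattern_freq.get(p, 0) for p in cluster) for cluster in clusters]


def relational_similarity(x, y, corr):
    """Bilinear form x^T * corr * y, divided by the vector lengths (assumed)."""
    bilinear = sum(
        x[i] * corr[i][j] * y[j] for i in range(len(x)) for j in range(len(y))
    )
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    return bilinear / norm if norm else 0.0


clusters = [["X acquired Y", "X bought Y"], ["X was born in Y"]]
x = feature_vector({"X acquired Y": 3, "X bought Y": 1}, clusters)
y = feature_vector({"X bought Y": 2}, clusters)
identity = [[1.0, 0.0], [0.0, 1.0]]
sim = relational_similarity(x, y, identity)
```

With the identity matrix only matching clusters contribute. Off-diagonal entries let two word-pairs score above zero even when their patterns fall into different but correlated clusters.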
EXPERIMENTS
Dataset
  100 instances (word or named-entity pairs)
  five relation types
    ACQUIRER-ACQUIREE
    PERSON-BIRTHPLACE
    CEO-COMPANY
    COMPANY-HEADQUARTERS
    PERSON-FIELD
EXPERIMENTS
Manually select 20 instances for each relation type from
  Wikipedia
  online newspapers
  company reviews
For each instance, download snippets using the Yahoo BOSS API
EXPERIMENTS - LEXICAL PATTERNS
Run the pattern extraction algorithm with L = 5, g = 2, and G = 4
  the total number of unique patterns is 473910
  only the 148655 patterns that occur at least twice are selected
EXPERIMENTS - PATTERN CLUSTERS
Ratio: the ratio of singleton clusters to the total number of clusters
EXPERIMENTS - RELATION CLASSIFICATION
We evaluate the proposed relational similarity measure in a relation classification task.
  k-nearest neighbor classification
Evaluation measures
  classification accuracy
  average precision
Rel(r) : a binary-valued function that returns 1 if the word-pair at rank r has the same relation as the word-pair being classified, and 0 otherwise
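Average precision over a ranked list can be computed from the Rel(r) values alone. The sketch below uses the standard information-retrieval definition. Which denominator the paper uses is not visible in this transcript, so the default (the number of relevant items found in the ranking) is an assumption that `total_relevant` can override.

```python
def average_precision(rel, total_relevant=None):
    """Average precision of a ranking given its Rel(r) values (1 or 0).

    rel[r - 1] is Rel(r) for ranks r = 1..k.
    """
    hits = 0
    score = 0.0
    for rank, is_relevant in enumerate(rel, start=1):
        if is_relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    denominator = total_relevant if total_relevant is not None else hits
    return score / denominator if denominator else 0.0


ap = average_precision([1, 0, 1, 1, 0])
```

Relevant word-pairs ranked near the top raise the score, because precision is only accumulated at ranks where Rel(r) is 1.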
EXPERIMENTS - RELATION CLASSIFICATION
Similarity threshold θ = 0.955
  2629 non-singleton clusters
  6930 singletons
EXPERIMENTS - RELATION CLASSIFICATION
The top 10 clusters with the largest number of lexical patterns
  for each cluster, the top four patterns that occur with the largest number of word-pairs
RELATIONAL SIMILARITY MEASURES
Compare the following relational similarity measures
VSM: vector space model
  each word-pair is represented by a vector of pattern frequencies
  the relational similarity between two word-pairs is computed as the cosine similarity between their vectors
LRA: Latent Relational Analysis
  create a matrix in which the rows represent word-pairs and the columns represent lexical patterns
  apply singular value decomposition (SVD) to the matrix
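The VSM baseline described above reduces to a cosine over sparse pattern-frequency vectors. A minimal sketch follows, with the hypothetical name `vsm_similarity` and made-up counts. LRA is not shown because it additionally needs a matrix decomposition.

```python
import math
from collections import Counter


def vsm_similarity(patterns_ab, patterns_cd):
    """Cosine similarity between two pattern-frequency Counters."""
    shared = set(patterns_ab) & set(patterns_cd)
    dot = sum(patterns_ab[p] * patterns_cd[p] for p in shared)
    norm_ab = math.sqrt(sum(v * v for v in patterns_ab.values()))
    norm_cd = math.sqrt(sum(v * v for v in patterns_cd.values()))
    return dot / (norm_ab * norm_cd) if norm_ab and norm_cd else 0.0


google_youtube = Counter({"X acquired Y": 3, "X bought Y": 4})
yahoo_inktomi = Counter({"X acquired Y": 3, "X bought Y": 4})
ostrich_bird = Counter({"X is a large Y": 5})

same = vsm_similarity(google_youtube, yahoo_inktomi)
different = vsm_similarity(google_youtube, ostrich_bird)
```

Because each pattern is its own dimension, two word-pairs that express the same relation through different patterns get a similarity of zero. The pattern clusters of the proposed method address this limitation.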
RELATIONAL SIMILARITY MEASURES
IP:
  set Λ in Formula 2 to the identity matrix
  compute relational similarity using pattern clusters
CORR:
  the proposed relational similarity measure
CONCLUSIONS
We proposed a method to compute the similarity between implicit semantic relations in two word-pairs.
  requires only a few queries, so the similarity can be computed quickly
  can compute relational similarity for unseen word-pairs
  a general framework: designing relational similarity measures can be modeled as searching for a matrix