Post on 19-Dec-2015
Grouping Search-Engine Returned Citations for Person Name Queries
Reema Al-Kamha
Research Supported by NSF
The Problem
Search engines return too many citations.
Example: “Christopher Young”. Google returns around 26,500 citations.
Many people named “Christopher Young”.
It would help to group the citations by person.
How do we group them?
Our Solution
Three facets: Attributes, Links, Page Similarity
Confidence matrix for each facet
Final confidence matrix
Confidence Matrix for Attributes Facet
      D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0     1     0     0     0     0     0     0     0     0     0
D1           1     0     0     0  0.49     0     0     0  0.49
D2                 1     0     0     0     0     0     0     0
D3                       1     0     0     0     0     0     0
D4                             1     0     0     0     0  0.86
D5                                   1     0     0     0     0
D6                                         1     0     0     0
D7                                               1     0     0
D8                                                     1     0
D9                                                           1
D1 and D5 have the same State.
D1 and D9 have the same State.
D4 and D9 have the same City.
Links
Returned citations that share the same host, for example:
www.cs.byu.edu/info/dwembley.html
www.cs.byu.edu/info/directory.php
One returned citation links to another returned citation.
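The same-host test above can be sketched as follows. This is a minimal illustration, not the method used in the talk: the helper names `host_of` and `same_host` are hypothetical, and how a host match is turned into a confidence value is not stated on the slides, so it is left out.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the host part of a URL, tolerating a missing scheme."""
    if "://" not in url:
        # urlsplit only recognizes the host when the string starts with "//"
        url = "//" + url
    return urlsplit(url).hostname or ""


def same_host(url_a, url_b):
    """True when both URLs have a non-empty, identical host."""
    host = host_of(url_a)
    return host != "" and host == host_of(url_b)


print(same_host("www.cs.byu.edu/info/dwembley.html",
                "www.cs.byu.edu/info/directory.php"))
```

The two URLs from the slide share the host www.cs.byu.edu, so the check succeeds for them.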
Confidence Matrix for Links Facet
      D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0     1  0.99     0     0     0  0.99     0     0     0     0
D1           1     0     0     0     0     0     0     0     0
D2                 1     0     0     0     0     0     0     0
D3                       1     0     0     0     0     0     0
D4                             1     0     0     0     0     0
D5                                   1     0     0     0     0
D6                                         1     0     0     0
D7                                               1     0     0
D8                                                     1     0
D9                                                           1
[Figure: link diagram relating D0, D1, and D5]
Page Similarity
Similarity between two documents to which the two returned citations link
The number of shared pairs of adjacent capitalized words
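Counting shared pairs of adjacent capitalized words can be sketched as below. The tokenization rule and the function names are assumptions made for illustration; the slides do not say how words are extracted or how the count is mapped to a confidence value, so only the raw count is computed.

```python
import re

# Assumed tokenization: runs of letters, apostrophes, and hyphens.
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")


def capitalized_pairs(text):
    """Set of (word, next_word) pairs in which both words start uppercase.

    Punctuation between two words is ignored in this sketch, so a pair may
    span a sentence boundary.
    """
    words = WORD.findall(text)
    return {
        (a, b)
        for a, b in zip(words, words[1:])
        if a[0].isupper() and b[0].isupper()
    }


def shared_pair_count(text_a, text_b):
    """Number of adjacent capitalized word pairs found in both texts."""
    return len(capitalized_pairs(text_a) & capitalized_pairs(text_b))


page_a = "Christopher Young teaches at Brigham Young University in Provo."
page_b = "Brigham Young University hired Christopher Young last year."
print(shared_pair_count(page_a, page_b))
```

In this made-up pair of pages the shared pairs are (Christopher, Young), (Brigham, Young), and (Young, University).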
Confidence Matrix for Page Similarity Facet
      D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0     1     0     0     0     0     0     0     0     0     0
D1           1     0     0  0.92  0.95     0     0  0.95  0.95
D2                 1     0     0     0     0     0     0     0
D3                       1     0     0     0     0     0     0
D4                             1  0.95     0     0  0.92  0.95
D5                                   1     0     0  0.92  0.95
D6                                         1     0     0     0
D7                                               1     0     0
D8                                                     1  0.95
D9                                                           1
Final Matrix
Combine the confidence matrices using the Stanford Certainty Measure.
For example, D1 and D5:
Confidence value for the attribute facet is 0.49
Confidence value for the link facet is 0
Confidence value for the page similarity facet is 0.95
Confidence value between D1 and D5 is 0.49 + 0.95 - 0.49*0.95 = 0.97
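The combination rule used in the example, a + b - a*b, can be folded over any number of facets. The sketch below assumes all confidences are in the range 0 to 1; the function names are illustrative.

```python
from functools import reduce


def combine(a, b):
    """Combine two confidences with the rule a + b - a*b."""
    return a + b - a * b


def combine_all(values):
    """Fold the rule over all facet confidences; 0 leaves a value unchanged."""
    return reduce(combine, values, 0.0)


# D1 and D5: attribute 0.49, link 0, page similarity 0.95
print(round(combine_all([0.49, 0.0, 0.95]), 2))
```

A facet with confidence 0 does not change the result, which is why the link facet drops out of the D1, D5 example.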
Final Matrix and Grouping Method
      D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0     1  0.99     0     0     0  0.99     0     0     0     0
D1           1     0     0  0.92  0.97     0     0  0.95  0.97
D2                 1     0     0     0     0     0     0     0
D3                       1     0     0     0     0     0     0
D4                             1  0.95     0     0  0.92  0.99
D5                                   1     0     0  0.92  0.95
D6                                         1     0     0     0
D7                                               1     0     0
D8                                                     1  0.95
D9                                                           1
Pairs with nonzero confidence in the final matrix:
{D0,D1}, {D0,D5}, {D1,D4}, {D1,D5}, {D1,D8}, {D1,D9}, {D4,D5}, {D4,D8}, {D4,D9}, {D5,D8}, {D5,D9}, {D8,D9}
Resulting groups:
{D0,D1,D4,D5,D8,D9}, {D2}, {D3}, {D6}, {D7}
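One way to get from the pairs to the groups is to treat every pair whose confidence exceeds a threshold as a link and take connected components. The slides do not spell out the grouping rule or a threshold, so both the transitive merging and the default `threshold=0.0` below are assumptions; they do reproduce the groups shown for this example.

```python
def group_documents(n, confidences, threshold=0.0):
    """Group documents 0..n-1 by merging every pair above the threshold.

    confidences maps (i, j) pairs to a combined confidence value.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), confidence in confidences.items():
        if confidence > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for doc in range(n):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())


# Nonzero off-diagonal entries of the final matrix.
final = {
    (0, 1): 0.99, (0, 5): 0.99, (1, 4): 0.92, (1, 5): 0.97,
    (1, 8): 0.95, (1, 9): 0.97, (4, 5): 0.95, (4, 8): 0.92,
    (4, 9): 0.99, (5, 8): 0.92, (5, 9): 0.95, (8, 9): 0.95,
}
print(group_documents(10, final))
```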
Recall and Precision
Assume we get: {0,1,3} {2,4} {5}
The correct grouping is: {0,1,2,3} {4,5}
Our grouping gives the pairs: (0,1) (0,3) (1,3) (2,4)
The correct grouping gives the pairs: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3) (4,5)
Three pairs appear in both lists: (0,1) (0,3) (1,3)
R = 3/7, P = 3/(3+1) = 3/4
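The pair-based recall and precision in this example can be computed as in the sketch below. The function names are illustrative, and returning 1.0 when a grouping produces no pairs is a convention assumed here, not something stated on the slides.

```python
from itertools import combinations


def pairs(groups):
    """All unordered document pairs that fall in the same group."""
    return {pair for g in groups for pair in combinations(sorted(g), 2)}


def precision_recall(found, correct):
    """Pairwise precision and recall of a found grouping."""
    found_pairs, correct_pairs = pairs(found), pairs(correct)
    shared = len(found_pairs & correct_pairs)
    # Assumed convention: an empty pair set scores 1.0.
    precision = shared / len(found_pairs) if found_pairs else 1.0
    recall = shared / len(correct_pairs) if correct_pairs else 1.0
    return precision, recall


found = [{0, 1, 3}, {2, 4}, {5}]
correct = [{0, 1, 2, 3}, {4, 5}]
precision, recall = precision_recall(found, correct)
print(precision, recall)
```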
Split and Merge
Assume we get: {0,1,3} {2,7,4} {5} {6}
The correct grouping is: {0,1,3,5,6} {2,7} {4}
Merge: 1/8 + 1/8 = 2/8
Split: 1/8
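The slides give the split and merge numbers without defining them. One reading that reproduces 1/8 and 2/8 is sketched below, and it is only an assumption: within each found group, documents outside the largest correctly grouped part count as splits; afterwards, for each correct group, every fragment except the largest counts as merges; both totals are divided by the number of documents. The function name is hypothetical.

```python
def split_merge_cost(found, correct):
    """Return (split, merge) as fractions of all documents (assumed metric)."""
    total = sum(len(g) for g in found)
    owner = {doc: idx for idx, g in enumerate(correct) for doc in g}

    split_docs = 0
    fragments = {}  # correct group index -> sizes of its fragments
    for group in found:
        parts = {}
        for doc in group:
            parts[owner[doc]] = parts.get(owner[doc], 0) + 1
        # Keep the largest part in place; every other part is split off.
        split_docs += sum(parts.values()) - max(parts.values())
        for idx, size in parts.items():
            fragments.setdefault(idx, []).append(size)

    # The largest fragment of each correct group stays; the rest are merged in.
    merge_docs = sum(sum(sizes) - max(sizes) for sizes in fragments.values())
    return split_docs / total, merge_docs / total


found = [{0, 1, 3}, {2, 7, 4}, {5}, {6}]
correct = [{0, 1, 3, 5, 6}, {2, 7}, {4}]
split, merge = split_merge_cost(found, correct)
print(split, merge)
```

Under this reading, document 4 is split from {2,7,4}, and {5} and {6} are each merged into {0,1,3}.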