Grouping Search-Engine Returned Citations for Person Name Queries Reema Al-Kamha Research Supported by NSF

Post on 19-Dec-2015


Grouping Search-Engine Returned Citations for Person Name Queries

Reema Al-Kamha

Research Supported by NSF

The Problem

Search engines return too many citations.

Example: a Google query for “Christopher Young” returns around 26,500 citations.

Many people are named “Christopher Young”.

It would help to group the citations by person.

How do we group them?

“Christopher Young” Query to Google

“Christopher Young” Query Results for Our System

Our Solution

Three facets: Attributes, Links, Page Similarity

Confidence matrix for each facet

Final confidence matrix

Attributes

Email Address, Phone, City, State, Zip Code.

     D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0   1     0     0     0     0     0     0     0     0     0
D1         1     0     0     0     0.49  0     0     0     0.49
D2               1     0     0     0     0     0     0     0
D3                     1     0     0     0     0     0     0
D4                           1     0     0     0     0     0.86
D5                                 1     0     0     0     0
D6                                       1     0     0     0
D7                                             1     0     0
D8                                                   1     0
D9                                                         1

Confidence Matrix for Attributes Facet

D1 and D5 have the same state (0.49).

D1 and D9 have the same state (0.49).

D4 and D9 have the same city (0.86).

Links

Returned citations that have the same host, for example:

www.cs.byu.edu/info/dwembley.html

www.cs.byu.edu/info/directory.php

One returned citation links to another returned citation.

Confidence Matrix for Links Facet

     D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0   1     0.99  0     0     0     0.99  0     0     0     0
D1         1     0     0     0     0     0     0     0     0
D2               1     0     0     0     0     0     0     0
D3                     1     0     0     0     0     0     0
D4                           1     0     0     0     0     0
D5                                 1     0     0     0     0
D6                                       1     0     0     0
D7                                             1     0     0
D8                                                   1     0
D9                                                         1

[Diagram: link relationships among D0, D1, and D5]
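The two link signals above (same host, and one citation linking to another) can be sketched as below. This is only an illustration: the helper names `host_of`, `same_host`, and `links_to` are hypothetical, and the slides do not say how a detected link is turned into the 0.99 confidence value.

```python
from urllib.parse import urlsplit


def _split(url):
    """Parse a URL that may lack a scheme (e.g. 'www.cs.byu.edu/info/...')."""
    if "://" not in url:
        url = "//" + url  # lets urlsplit recognize the host part
    return urlsplit(url)


def host_of(url):
    """Return the lowercase host name, or None if there is none."""
    return _split(url).hostname


def same_host(url_a, url_b):
    """True when both citations are served from the same host."""
    host_a, host_b = host_of(url_a), host_of(url_b)
    return host_a is not None and host_a == host_b


def links_to(outgoing_links, target_url):
    """True when one page's outgoing links include the other citation's URL.

    Assumption: URLs are compared by host and path only.
    """
    def key(url):
        parts = _split(url)
        return (parts.hostname, parts.path.rstrip("/"))

    return key(target_url) in {key(link) for link in outgoing_links}
```

With the two example URLs from the slide, `same_host` returns `True` because both resolve to the host `www.cs.byu.edu`.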

Page Similarity

Similarity between two documents to which the two returned citations link

The number of shared pairs of adjacent capitalized words

Confidence Matrix for Page Similarity Facet

     D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0   1     0     0     0     0     0     0     0     0     0
D1         1     0     0     0.92  0.95  0     0     0.95  0.95
D2               1     0     0     0     0     0     0     0
D3                     1     0     0     0     0     0     0
D4                           1     0.95  0     0     0.92  0.95
D5                                 1     0     0     0.92  0.95
D6                                       1     0     0     0
D7                                             1     0     0
D8                                                   1     0.95
D9                                                         1
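Counting shared pairs of adjacent capitalized words, as the page similarity facet describes, might look like the sketch below. The tokenization rule is an assumption (words are runs of letters, and punctuation between words is ignored), the function names are hypothetical, and the slides do not state how a count is mapped to a confidence such as 0.92 or 0.95.

```python
import re


def capitalized_pairs(text):
    """Return the set of adjacent word pairs in which both words are capitalized.

    Assumption: a word is a run of letters (apostrophes and hyphens allowed
    inside), and punctuation between two words does not break adjacency.
    """
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    return {
        (first, second)
        for first, second in zip(words, words[1:])
        if first[0].isupper() and second[0].isupper()
    }


def shared_pair_count(page_a, page_b):
    """Number of capitalized adjacent word pairs that occur on both pages."""
    return len(capitalized_pairs(page_a) & capitalized_pairs(page_b))
```

Using sets means a pair repeated many times on one page counts once; whether the original system counted repeats is not stated.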

Final Matrix

Combine the confidence matrices using the Stanford certainty measure.

For example, for D1 and D5:

Confidence value for the attribute facet is 0.49

Confidence value for the link facet is 0

Confidence value for the page similarity facet is 0.95

Confidence value between D1 and D5 is 0.49 + 0.95 - 0.49*0.95 ≈ 0.97
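The combination rule used in the example, a + b - a*b, can be applied across any number of facets by folding it over the values. The sketch below assumes all confidence values lie in [0, 1]; the names `combine` and `combine_all` are illustrative.

```python
from functools import reduce


def combine(a, b):
    """Combine two confidence values in [0, 1] as a + b - a*b."""
    return a + b - a * b


def combine_all(values):
    """Fold the pairwise rule over every facet's confidence value.

    A facet with confidence 0 leaves the running result unchanged.
    """
    return reduce(combine, values, 0.0)


# D1 and D5: attribute facet 0.49, link facet 0, page similarity facet 0.95
d1_d5 = combine_all([0.49, 0.0, 0.95])
print(round(d1_d5, 2))  # 0.97
```

The same rule gives the D4, D9 entry of the final matrix: 0.86 + 0.95 - 0.86*0.95 ≈ 0.99.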

Final Matrix and Grouping Method

     D0    D1    D2    D3    D4    D5    D6    D7    D8    D9
D0   1     0.99  0     0     0     0.99  0     0     0     0
D1         1     0     0     0.92  0.97  0     0     0.95  0.97
D2               1     0     0     0     0     0     0     0
D3                     1     0     0     0     0     0     0
D4                           1     0.95  0     0     0.92  0.99
D5                                 1     0     0     0.92  0.95
D6                                       1     0     0     0
D7                                             1     0     0
D8                                                   1     0.95
D9                                                         1

Pairs: {D0,D1}, {D0,D5}, {D1,D4}, {D1,D5}, {D1,D8}, {D1,D9}, {D4,D5}, {D4,D8}, {D4,D9}, {D5,D8}, {D5,D9}, {D8,D9}

Groups: {D0,D1,D4,D5,D8,D9}, {D2}, {D3}, {D6}, {D7}
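One way to get from the paired citations to the final groups is to treat each pair as an edge and take connected components. The sketch below does this with a small union-find. The threshold is an assumption: the slides do not state a cutoff, so any pair with confidence above 0 is treated as linked, which reproduces the pairs listed above.

```python
# Nonzero off-diagonal entries of the final matrix, keyed by (row, column).
FINAL = {
    (0, 1): 0.99, (0, 5): 0.99,
    (1, 4): 0.92, (1, 5): 0.97, (1, 8): 0.95, (1, 9): 0.97,
    (4, 5): 0.95, (4, 8): 0.92, (4, 9): 0.99,
    (5, 8): 0.92, (5, 9): 0.95,
    (8, 9): 0.95,
}


def group_citations(n, pair_confidence, threshold=0.0):
    """Group citations 0..n-1 by merging every pair whose confidence
    exceeds `threshold` (assumed value; not given in the slides)."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), confidence in pair_confidence.items():
        if confidence > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(group_citations(10, FINAL))
# [[0, 1, 4, 5, 8, 9], [2], [3], [6], [7]]
```

Because grouping is transitive, D0 ends up with D4, D8, and D9 even though it is paired directly only with D1 and D5.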

Recall and Precision

Assume we get: {0,1,3} {2,4} {5}

The correct grouping is: {0,1,2,3} {4,5}

Our grouping gives the pairs: (0,1) (0,3) (1,3) (2,4)

The correct grouping gives the pairs: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3) (4,5)

Three of our pairs are correct and one, (2,4), is not: R = 3/7, P = 3/(3+1)
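The pairwise recall and precision in this example can be computed mechanically by expanding each grouping into its same-group pairs. The sketch below follows the worked example; the function names, and the choice to return 1.0 when a grouping has no pairs at all, are assumptions.

```python
from itertools import combinations


def same_group_pairs(groups):
    """Expand a grouping into the set of unordered pairs that share a group."""
    return {pair for group in groups for pair in combinations(sorted(group), 2)}


def pairwise_recall_precision(predicted, correct):
    """Return (recall, precision) over same-group pairs."""
    predicted_pairs = same_group_pairs(predicted)
    correct_pairs = same_group_pairs(correct)
    hits = len(predicted_pairs & correct_pairs)
    # Assumption: an empty pair set counts as a perfect score.
    recall = hits / len(correct_pairs) if correct_pairs else 1.0
    precision = hits / len(predicted_pairs) if predicted_pairs else 1.0
    return recall, precision


recall, precision = pairwise_recall_precision(
    predicted=[{0, 1, 3}, {2, 4}, {5}],
    correct=[{0, 1, 2, 3}, {4, 5}],
)
print(recall, precision)  # 3/7 and 3/4
```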

Split and Merge

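Assume we get: {0,1,3} {2,7,4} {5} {6}

The correct grouping is: {0,1,3,5,6} {2,7} {4}

Merge: 1/8 + 1/8 = 2/8 ({5} and {6} must each be merged into {0,1,3}; there are 8 citations in total)

Split: 1/8 (4 must be split from {2,7,4})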

Measurements

Precision and Recall: R = 89%, P = 96.6%

Weighted Merge and Split: M = 0.036, S = 0.008

Contributions

Grouped person-name queries by person

Provided an additional tool for search engine queries