EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data

Guoliang Li et al.

Post on 22-Dec-2015



The Problem

Keyword search introduces false positives

e.g., the query “Conference 2008 Canada Data Integration”

The Problem

Websites are organized through content

“Dr Pain, Math 343, Linear Algebra”

The Solution

Combine linked pages for search, ordered by ranking

The Solution

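r-Radius Steiner Graph Problem
  r-Radius Graph: a graph whose radius is r
    Centric distance of a node: the largest shortest-path distance from it to any other node
    Radius: the minimal centric distance over all nodes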

[Figure: example graph with labels u, v, t, r, s]
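The two definitions above can be checked on a small graph. The sketch below assumes an unweighted, undirected graph stored as an adjacency dictionary; the function names and the five-node example are illustrative and do not come from the slides.

```python
from collections import deque


def shortest_distances(graph, source):
    """BFS hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def centric_distance(graph, node):
    """Largest shortest-path distance from node to any other node."""
    dist = shortest_distances(graph, node)
    if len(dist) < len(graph):
        return float("inf")  # some node cannot be reached at all
    return max(dist.values())


def radius(graph):
    """Minimal centric distance over all nodes."""
    return min(centric_distance(graph, node) for node in graph)


# Hypothetical graph: path a - b - c - d plus an extra edge c - e.
example = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}
```

Here `radius(example)` is 2, reached at the central nodes `b` and `c`, so under these definitions the example is a 2-radius graph.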

The Solution

r-Radius Steiner Graph Problem
  Content node: contains a query keyword
  Steiner node: lies on a path between two content nodes

[Figure: example graph with labels u, v, t, r, s and keyword labels “Dr Pain” and “Math 343”]
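A minimal sketch of the content-node and Steiner-node idea follows. It makes a simplifying assumption that is not in the slides: a node counts as a Steiner node only when it lies on a shortest path between two content nodes, which is narrower than "any path". The page names and page texts are hypothetical; only the keywords come from the example above.

```python
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """BFS hop counts from source in an unweighted, undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def content_nodes(node_text, keywords):
    """Nodes whose text contains at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return {n for n, text in node_text.items()
            if any(k in text.lower() for k in lowered)}


def steiner_nodes(graph, content):
    """Nodes on a shortest path between some pair of content nodes."""
    dist = {c: bfs_distances(graph, c) for c in content}
    result = set()
    for u, v in combinations(sorted(content), 2):
        if v not in dist[u]:
            continue  # the two content nodes are not connected
        for node in graph:
            if (node in dist[u] and node in dist[v]
                    and dist[u][node] + dist[v][node] == dist[u][v]):
                result.add(node)
    return result


# Hypothetical site: prof - courses - math343, plus courses - news.
graph = {
    "prof": ["courses"],
    "courses": ["prof", "math343", "news"],
    "math343": ["courses"],
    "news": ["courses"],
}
node_text = {
    "prof": "Dr Pain home page",
    "courses": "Courses taught this term",
    "math343": "Math 343 Linear Algebra",
    "news": "Department news",
}
content = content_nodes(node_text, ["Dr Pain", "Math 343"])
steiner = steiner_nodes(graph, content)
```

In this example the `courses` page holds no keyword but links the two content pages, so it is kept; the `news` page is dropped.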

r-Radius Steiner Graph on search

Example:


The graph model for the publication database

Adjacency Matrix

Finding r-Radius Graphs

Query: “Shanmugasundaram, Guo, XRANK”
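The slides name the adjacency matrix without showing the computation. One plausible reading, sketched here as an assumption, is to add self-loops and raise the matrix to the r-th power in Boolean arithmetic: entry (i, j) is then true exactly when j is within r hops of i, so each row gives a candidate subgraph in which node i has centric distance at most r. The names `reach_within` and `candidate_subgraphs` and the four-node path graph are hypothetical.

```python
def bool_matmul(a, b):
    """Boolean product of two square matrices."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def reach_within(adjacency, r):
    """Matrix whose (i, j) entry is True when j is within r hops of i."""
    n = len(adjacency)
    # Self-loops make the r-th power cover paths of length <= r, not exactly r.
    step = [[bool(adjacency[i][j]) or i == j for j in range(n)]
            for i in range(n)]
    result = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(r):
        result = bool_matmul(result, step)
    return result


def candidate_subgraphs(adjacency, r):
    """One candidate node set per center node."""
    return [{j for j, hit in enumerate(row) if hit}
            for row in reach_within(adjacency, r)]


# Hypothetical path graph 0 - 1 - 2 - 3.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
candidates = candidate_subgraphs(path, 1)
```

The candidates overlap and some are contained in others, which is the issue the next slide addresses.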

Avoiding Overlapping

Maximal r-Radius Graph
  It is not contained in another r-Radius subgraph
But wait! There is still overlap
No problem:
  Graph Clustering
  Graph Partitioning
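The maximality test can be sketched as a containment filter over node sets. This illustrates only the "not contained in another" rule; the clustering and partitioning steps are not shown, and the function name and input sets are hypothetical.

```python
def maximal_sets(node_sets):
    """Keep only node sets that are not a proper subset of another set."""
    unique = {frozenset(s) for s in node_sets}
    kept = [s for s in unique if not any(s < other for other in unique)]
    # Sort for a deterministic order: larger sets first, then by members.
    kept.sort(key=lambda s: (-len(s), sorted(s)))
    return [set(s) for s in kept]


# Hypothetical candidate subgraphs, given as node sets.
candidates = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3}]
maximal = maximal_sets(candidates)
```

The two surviving sets still share nodes 1 and 2, which is the remaining overlap the slide points out.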

Graph Clustering

Ranking

TF-IDF-based IR ranking (tf, idf, ndl) is OK
Better yet: structural compactness-based DB ranking (SIM)
  The more compact the graph, the more relevant it is
  The ranking score is inversely related to path length
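The two ranking ingredients can be illustrated with a small sketch. The IR part uses the widely known pivoted-normalization form of TF-IDF, with tf, idf and ndl as named on the slide; the exact formula, the slope `s = 0.2`, and the `1 / (d + 1)` compactness term are assumptions chosen for illustration, not formulas taken from the slides.

```python
import math


def ir_score(tf, df, num_docs, doc_len, avg_doc_len, s=0.2):
    """Pivoted-normalization TF-IDF for one keyword (assumed form).

    Expects df >= 1 whenever tf >= 1.
    """
    if tf == 0:
        return 0.0
    ntf = 1.0 + math.log(1.0 + math.log(tf))        # dampened term frequency
    idf = math.log((num_docs + 1) / df)             # rarer keywords score higher
    ndl = (1.0 - s) + s * doc_len / avg_doc_len     # normalized document length
    return ntf * idf / ndl


def compactness(path_lengths):
    """Illustrative structural score: shorter keyword-to-keyword paths score higher."""
    return sum(1.0 / (d + 1) for d in path_lengths)


tight = compactness([1, 1])   # keywords one hop apart
loose = compactness([3, 3])   # keywords three hops apart
```

With these choices a graph whose keywords sit close together outranks one where they are far apart, matching the "more compact, more relevant" rule.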

Indexing

IR score and Sim score are combined
An inverted index (EI-Index) is created
The inverted index stores keyword pairs and scores
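A minimal sketch of an inverted index keyed by keyword pairs is shown below. It is not the EI-Index layout itself: the structure, the function names, the graph ids, and the scores are all hypothetical, and the combined score is assumed to be precomputed. Only the keywords come from the example query.

```python
from collections import defaultdict
from itertools import combinations


def build_pair_index(pair_scores):
    """Map each unordered keyword pair to {graph_id: score}.

    pair_scores: iterable of (graph_id, keyword_a, keyword_b, score).
    """
    index = defaultdict(dict)
    for graph_id, a, b, score in pair_scores:
        index[tuple(sorted((a, b)))][graph_id] = score
    return index


def rank(index, keywords):
    """Sum the stored scores over all query keyword pairs, best graph first."""
    totals = defaultdict(float)
    for pair in combinations(sorted(set(keywords)), 2):
        for graph_id, score in index.get(pair, {}).items():
            totals[graph_id] += score
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical precomputed scores for two candidate graphs.
index = build_pair_index([
    ("g1", "guo", "xrank", 0.9),
    ("g1", "guo", "shanmugasundaram", 0.8),
    ("g2", "guo", "xrank", 0.4),
])
results = rank(index, ["xrank", "guo"])
```

Storing scores per keyword pair means a query only needs index lookups and additions at search time, rather than recomputing paths in the graph.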

Experiments

Results


Strengths of the Paper

Very well written paper
Deep research on the topic
Mathematically grounded, with proofs
Compared against current methods as baselines
Good results

Weaknesses and Future Work

It might be too complex
Could work on ways to find Steiner graphs faster
It doesn’t consider cases of farming sites or bogus sites

Questions?