HyKSS: Hybrid Keyword and Semantic Search

Andrew Zitzelberger

1

Keyword Search

2

Form Based Search

3

4

over 8,000 meters in elevation

less than 100K miles

faster than 100 mph

What about?

HyKSS

• Hybrid Keyword and Semantic Search
• Semantics – extracted annotations
  – Multiple ontologies
• Keywords – text

6

Thesis Statement

• HyKSS (hybrid search)
  – Outperforms keyword and semantic search
  – Dynamic query weighting outperforms various other hybrid search approaches
  – Allows queries over multiple ontologies
  – Allows pay-as-you-go improvement

7

Extraction Ontologies

8

Data Frames

9

Indexing Architecture

10

[Diagram: the Document Collection feeds a Keyword Indexer and a Semantic Indexer, which produce the Keyword Index and the Semantic Index.]

Indexing Architecture Implementation

11

[Diagram: the indexing architecture annotated with implementation components. Labels: Document Collection, Keyword Indexer, Keyword Index, Semantic Indexer, Semantic Index, OntoES, Ontology Library, Sesame, Lucene.]

Query Processing

12

[Diagram: a Free Form Query passes through two pipelines, Keyword Processing and Semantic Processing. Each pipeline has three steps: Pre-Process Query, Execute Query, Post-Process Query. A Combine Results step merges their outputs.]

Keyword Query Pre-Processing

13

• Remove Lucene special characters (except quotes)
• Remove (inequality) comparison constraints
• Remove non-phrase stopwords

hondas in "excellent condition" in orem for under 12 grand

hondas "excellent condition" orem
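The three pre-processing steps can be sketched roughly as follows. This is a minimal illustration, not the HyKSS implementation: the stopword list and the `COMPARISON` regular expression are hypothetical stand-ins (the slides do not say how comparison constraints are recognized), and `preprocess_keyword_query` is an invented name.

```python
import re

# Lucene query-syntax special characters, minus the double quote,
# which is kept so that quoted phrases survive.
LUCENE_SPECIALS = set('+-&|!(){}[]^~*?:\\/')

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

# Hypothetical pattern for inequality constraints such as "under 12 grand".
COMPARISON = re.compile(
    r"\b(?:under|over|below|above|less than|more than|at least|at most)\s+"
    r"\d[\d,.]*(?:\s*(?:grand|k|miles|mph|dollars)\b)?",
    re.IGNORECASE,
)


def preprocess_keyword_query(query):
    # Step 1: remove Lucene special characters (quotes are kept).
    cleaned = "".join(ch for ch in query if ch not in LUCENE_SPECIALS)
    # Step 2: remove (inequality) comparison constraints.
    cleaned = COMPARISON.sub(" ", cleaned)
    # Step 3: remove stopwords that are not inside a quoted phrase.
    tokens = re.findall(r'"[^"]*"|\S+', cleaned)
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOPWORDS]
    return " ".join(kept)


if __name__ == "__main__":
    print(preprocess_keyword_query(
        'hondas in "excellent condition" in orem for under 12 grand'
    ))
```

On the example query from this slide, the sketch yields the same keyword query shown above.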

Keyword Query Execution and Post-Processing

• Executed by Lucene
• Empty Post-Processing step

14

Semantic Query Pre-Processing – Individual Ontology Scoring

hondas in "excellent condition" in orem for under 12 grand

15

Semantic Query Pre-Processing – Ontology Set Creation

• For each ontology sorted by score:
  – For each remaining ontology:
    • Add point for each new or subsuming match
    • If added points > 0 add ontology

• Completely subsumed ontologies are removed during query generation
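One possible reading of the set-creation loop is a greedy pass over the ontologies in score order. The sketch below makes several assumptions that the slides do not confirm: each match is modeled as a `(start, end)` character span in the query, a "new" match is one that overlaps nothing already covered, a "subsuming" match is one that strictly contains an already covered span, and the nested loop is simplified to a single pass. All names and the sample data are hypothetical.

```python
def overlaps(a, b):
    # Two half-open spans overlap if each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def subsumes(a, b):
    # Span a contains span b and is not identical to it.
    return a[0] <= b[0] and b[1] <= a[1] and a != b


def build_ontology_set(candidates):
    """candidates maps ontology name -> (score, set of (start, end) spans)."""
    ranked = sorted(candidates, key=lambda name: candidates[name][0], reverse=True)
    chosen, covered = [], set()
    for name in ranked:
        _, spans = candidates[name]
        points = 0
        for span in spans:
            is_new = not any(overlaps(span, c) for c in covered)
            is_subsuming = any(subsumes(span, c) for c in covered)
            if is_new or is_subsuming:
                points += 1
        # Keep the ontology only if it contributed at least one point.
        if points > 0:
            chosen.append(name)
            covered |= set(spans)
    return chosen


if __name__ == "__main__":
    # Hypothetical spans; not taken from the slides.
    sample = {
        "Vehicle": (3, {(0, 6), (35, 39), (44, 58)}),
        "ContractualServices": (1, {(44, 58)}),
        "Location": (1, {(35, 39)}),
    }
    print(build_ontology_set(sample))
```

With this made-up data only the highest-scoring ontology is kept, because the other two contribute no new or subsuming matches.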

16

Semantic Query Pre-Processing – Ontology Set Creation

17

[Diagram: ontology set creation for the example query. Ontologies shown: Vehicle, ContractualServices, Location. Matched constraints shown: Price < 12000 and US_City="orem". Score annotations shown: Vehicle_Score + 1, ContractualServices_Score + 1.]

Semantic Query Pre-Processing – Structured Query Generation

• Open world assumption
• SPARQL query
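One common way to respect an open world assumption in SPARQL is to wrap each constraint in an `OPTIONAL` block, so a document with no extracted value for a concept is not excluded outright. Whether HyKSS generates queries of exactly this shape is an assumption. The `ex:` namespace, the `ex:Document` class, the predicate names, and the `build_sparql` helper are all hypothetical.

```python
def build_sparql(projections, filters, prefix="http://example.org/vehicle#"):
    """Build a SPARQL SELECT string.

    projections: list of concept names to project, e.g. ["Make", "Price"].
    filters: dict mapping a concept name to a SPARQL filter expression.
    The prefix IRI and the ex:Document class are illustrative placeholders.
    """
    select_vars = " ".join(f"?{name}" for name in projections)
    lines = [
        f"PREFIX ex: <{prefix}>",
        f"SELECT ?doc {select_vars} WHERE {{",
        "  ?doc a ex:Document .",
    ]
    for name in projections:
        clause = f"?doc ex:{name} ?{name} ."
        if name in filters:
            clause += f" FILTER({filters[name]})"
        # OPTIONAL keeps documents that lack a value for this concept.
        lines.append(f"  OPTIONAL {{ {clause} }}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_sparql(
        ["Make", "Price", "US_City"],
        {"Price": "?Price < 12000"},
    ))
```

Because every pattern is optional, a result row may leave some projected variables unbound; a later ranking step can then reward rows that bind more of them.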

18

Semantic Query Execution and Post-Processing

• Sesame query execution
• Semantic ranking:
  – 1 point for each requested projection satisfied
  – Normalized by # of projections requested

hondas in "excellent condition" in orem for under 12 grand
  – Projections on Make, Price and US_City
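The semantic ranking rule amounts to the fraction of requested projections that a result satisfies. A minimal sketch, assuming a result is represented as a dictionary of bound values (a hypothetical representation; the function name and sample values are invented):

```python
def semantic_score(requested, bound):
    """Fraction of requested projections that a result satisfies.

    requested: list of projection names, e.g. ["Make", "Price", "US_City"].
    bound: dict of projection name -> value found for this result
           (a missing key or None means the projection is not satisfied).
    """
    if not requested:
        return 0.0
    satisfied = sum(1 for name in requested if bound.get(name) is not None)
    return satisfied / len(requested)


if __name__ == "__main__":
    # Hypothetical result that has a make and a price but no city.
    print(semantic_score(
        ["Make", "Price", "US_City"],
        {"Make": "Honda", "Price": 9500},
    ))
```

A result that satisfies two of the three projections for the example query would score 2/3 under this rule.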

19

Hybrid Query Processing

• Linear interpolation:
  – (kw_weight * kw_score) + (sm_weight * sm_score)
• Dynamic solution:
  – # keywords remaining (#kw)
  – concept match score (cms) = ½ * (selections + projections)
  – kw_weight = #kw/(#kw + cms)
  – sm_weight = cms/(#kw + cms)
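The dynamic weighting and the linear interpolation can be written out directly from the formulas on this slide. In the sketch below, the function names are invented, the even split used when both `#kw` and `cms` are zero is an assumption (the slides do not cover that case), and the sample numbers are made up.

```python
def dynamic_weights(num_keywords, selections, projections):
    """Return (kw_weight, sm_weight) from the dynamic weighting formulas."""
    cms = 0.5 * (selections + projections)  # concept match score
    total = num_keywords + cms
    if total == 0:
        # Assumption: split evenly when the query yields no signal at all.
        return 0.5, 0.5
    return num_keywords / total, cms / total


def hybrid_score(kw_score, sm_score, kw_weight, sm_weight):
    """Linear interpolation of the keyword and semantic scores."""
    return (kw_weight * kw_score) + (sm_weight * sm_score)


if __name__ == "__main__":
    # Hypothetical counts: 2 keywords remaining, 3 selections, 3 projections.
    kw_w, sm_w = dynamic_weights(2, 3, 3)
    print(kw_w, sm_w, hybrid_score(0.5, 1.0, kw_w, sm_w))
```

With these made-up counts, `cms` is 3, so the keyword side receives 2/5 of the weight and the semantic side 3/5. The two weights always sum to 1 when the denominator is nonzero.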

20

Basic Search

21

Results Display

22

23

Form Based Search

Results Display

Experimental Setup – Ontology Libraries

• 5 Ontology Levels
  – Number
  – Generic Units
  – Vehicle Units
  – Vehicle
  – Vehicle+

25

Experimental Setup – Query Sets

• 113 syntactically unique queries from database students

• 60 syntactically unique queries from linguistics students

26

Experimental Setup – Document Collection

• 250 vehicle advertisements (Craigslist)
  – 100 training, 50 validation, 100 test
• 318 mountain pages (Wikipedia)
• 66 roller coaster pages (Wikipedia)
• 88 video game advertisements (Craigslist)

27

Experiments

1) Training queries over test vehicle documents
2) Test queries over test vehicle documents
3) Training queries over test vehicle documents + additional noise
4) Test queries over test vehicle documents + additional noise
5) 5 queries over noisy data (Generic Units only)

28

Experiments – Metric

• Mean Average Precision
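For reference, a minimal sketch of the standard definition of mean average precision over ranked result lists. The slides do not say which variant was used in the experiments; dividing by the number of relevant documents actually retrieved when no total is supplied is an assumption, and the sample judgments are made up.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Average precision for one query.

    ranked_relevance: list of booleans, one per returned document in rank
    order, True if that document is relevant.
    total_relevant: number of relevant documents in the collection; if
    omitted, the number of relevant documents retrieved is used (assumption).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(runs):
    """Mean of the per-query average precision values."""
    if not runs:
        return 0.0
    return sum(average_precision(run) for run in runs) / len(runs)


if __name__ == "__main__":
    # Two hypothetical queries with made-up relevance judgments.
    print(mean_average_precision([[True, False, True], [False, True]]))
```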

29

Experimental Results

30


31


32

Conclusions

• Hybrid search outperforms keyword and semantic search

• HyKSS’s dynamic query weighting approach outperforms various other weighting techniques

• Using multiple ontologies does not outperform selecting and using a single ontology

33

External Image Citations

• Slide 2 Google search screenshot: http://www.google.com (07/30/11)
• Slide 3 partial car search form screenshots: http://autotrader.com/fyc (07/30/11)
• Slide 4 mountain image: http://en.wikipedia.org/wiki/Lhotse (04/26/11)
• Slide 4 car image: http://en.wikipedia.org/wiki/Honda (04/26/11)
• Slide 4 roller coaster image: http://en.wikipedia.org/wiki/Kingda_Ka (04/26/11)
• Slide 4 Wikipedia logo: http://en.wikipedia.org/wiki/Main_Page (04/26/11)
• Slide 4 craigslist logo: http://provo.craigslist.org/ (04/26/11)

34
