HyKSS: Hybrid Keyword and Semantic Search

Andrew Zitzelberger

Author: raheem

Posted on 05-Jan-2016








  • HyKSS: Hybrid Keyword and Semantic Search. Andrew Zitzelberger

  • Keyword Search

  • Form-Based Search

  • What about? over 8,000 meters in elevation; less than 100K miles; faster than 100 mph


  • HyKSS: Hybrid Keyword and Semantic Search. Semantics: extracted annotations, multiple ontologies. Keywords: text

  • Thesis Statement: HyKSS (hybrid search) outperforms keyword and semantic search; dynamic query weighting outperforms various other hybrid search approaches; allows queries over multiple ontologies; allows pay-as-you-go improvement

  • Extraction Ontologies

  • Data Frames

  • Indexing Architecture: Document Collection; Keyword Indexer; Semantic Indexer; Keyword Index; Semantic Index

  • Indexing Architecture Implementation: OntoES; Ontology Library; Sesame; Lucene

  • Query Processing: Free Form Query; Keyword Processing (Pre-Process Query, Execute Query, Post-Process Query); Semantic Processing (Pre-Process Query, Execute Query, Post-Process Query); Combine Results

  • Keyword Query Pre-Processing: remove Lucene special characters (except quotes); remove (inequality) comparison constraints; remove non-phrase stopwords

    hondas in "excellent condition" in orem for under 12 grand

    hondas excellent condition orem
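The three pre-processing steps above can be sketched in Python. This is a minimal illustration, not the HyKSS implementation: the stopword list, the comparison-phrase pattern, and the function name `preprocess_keyword_query` are all hypothetical. The sketch keeps the quotation marks around a phrase, which the output line in the transcript does not show.

```python
import re

# Characters with special meaning in Lucene query syntax. The double quote is
# deliberately left out so that quoted phrases survive this step.
LUCENE_SPECIAL = re.compile("[" + re.escape('+-&|!(){}[]^~*?:\\/') + "]")

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"in", "for", "the", "a", "an", "of", "and", "or", "to"}

# Hypothetical pattern for inequality comparisons such as "under 12 grand".
COMPARISON = re.compile(
    r"\b(?:under|over|less than|more than|at least|at most)\s+\S+"
    r"(?:\s+(?:grand|dollars|miles|mph|meters))?",
    re.IGNORECASE,
)


def preprocess_keyword_query(query: str) -> str:
    """Apply the three slide steps in order and return the cleaned query."""
    # 1. Remove Lucene special characters (except quotes).
    query = LUCENE_SPECIAL.sub(" ", query)
    # 2. Remove comparison constraints.
    query = COMPARISON.sub(" ", query)
    # 3. Remove stopwords that are not inside a quoted phrase.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOPWORDS]
    return " ".join(kept)


print(preprocess_keyword_query(
    'hondas in "excellent condition" in orem for under 12 grand'
))
# prints: hondas "excellent condition" orem
```

Stopwords inside a quoted phrase are left alone because the whole phrase is treated as a single token.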

  • Keyword Query Execution and Post-Processing: executed by Lucene; empty post-processing step

  • Semantic Query Pre-Processing: Individual Ontology Scoring

    hondas in "excellent condition" in orem for under 12 grand


  • Semantic Query Pre-Processing: Ontology Set Creation

    For each ontology, sorted by score:
    For each remaining ontology:
    Add a point for each new or subsuming match
    If added points > 0, add the ontology
    Completely subsumed ontologies are removed during query generation

  • Semantic Query Pre-Processing: Ontology Set Creation (diagram). Ontologies: Location, Vehicle, ContractualServices. Matches shown: Price < 12000, US_City=orem. Score labels: Vehicle_Score, Vehicle_Score + 1, ContractualServices_Score + 1
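One simplified reading of the ontology set creation step is a single greedy pass, sketched below in Python. Everything here is an assumption made for illustration: matches are modeled as character spans of the query, "subsuming" is modeled as one span strictly containing another, and the names `build_ontology_set` and `covers`, the span offsets, and the scores are hypothetical. The slides do not say how HyKSS represents matches internally.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) offsets of matched query text (hypothetical model)


def covers(a: Span, b: Span) -> bool:
    """True if span a contains span b (identical spans count as containing)."""
    return a[0] <= b[0] and b[1] <= a[1]


def build_ontology_set(
    matches: Dict[str, Set[Span]], scores: Dict[str, float]
) -> List[str]:
    """Greedily pick ontologies, best score first, that add something new."""
    ranked = sorted(matches, key=lambda name: scores[name], reverse=True)
    selected: List[str] = []
    covered: Set[Span] = set()
    for name in ranked:
        points = 0
        for span in matches[name]:
            is_new = not any(covers(c, span) for c in covered)
            subsumes = any(covers(span, c) and span != c for c in covered)
            if is_new or subsumes:
                points += 1  # one point per new or subsuming match
        if points > 0:
            selected.append(name)
            covered |= matches[name]
    return selected


# Hypothetical spans for: hondas ... orem ... under 12 grand
matches = {
    "Vehicle": {(0, 6), (43, 57)},        # make + price constraint
    "ContractualServices": {(43, 57)},    # price constraint only
    "Location": {(35, 39)},               # city
}
scores = {"Vehicle": 2, "ContractualServices": 1, "Location": 1}
print(build_ontology_set(matches, scores))
# prints: ['Vehicle', 'Location']
```

In this toy run ContractualServices contributes no new or subsuming match once Vehicle is selected, so it is left out, which mirrors the slide's remark about completely subsumed ontologies.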

  • Semantic Query Pre-Processing: Structured Query Generation. Open-world assumption; SPARQL query
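The slides do not show the generated query. As a rough sketch only, one way to respect an open-world assumption in SPARQL is to wrap each constraint in an `OPTIONAL` clause, so that a document with no annotation for a concept is still returned. The namespace, the `ann:` prefix, the `ann:Document` class, the use of concept names as predicate names, and the function `build_sparql` are all hypothetical and not taken from HyKSS.

```python
from typing import List, Tuple, Union

# (concept, operator, value), e.g. ("Price", "<", 12000)
Constraint = Tuple[str, str, Union[int, float, str]]


def literal(value: Union[int, float, str]) -> str:
    """Render a Python value as a SPARQL literal (strings are quoted)."""
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(value)


def build_sparql(
    constraints: List[Constraint],
    namespace: str = "http://example.org/annotations#",  # hypothetical
) -> str:
    """Build a SELECT query with one OPTIONAL block per constraint."""
    variables = " ".join(f"?{concept}" for concept, _, _ in constraints)
    lines = [
        f"PREFIX ann: <{namespace}>",
        f"SELECT ?doc {variables}",
        "WHERE {",
        "  ?doc a ann:Document .",
    ]
    for concept, op, value in constraints:
        lines.append(
            f"  OPTIONAL {{ ?doc ann:{concept} ?{concept} . "
            f"FILTER (?{concept} {op} {literal(value)}) }}"
        )
    lines.append("}")
    return "\n".join(lines)


query = build_sparql(
    [("Make", "=", "honda"), ("Price", "<", 12000), ("US_City", "=", "orem")]
)
print(query)
```

With this shape, a variable is bound in a result row only when the document actually satisfies that constraint, which is what a projection-based ranking step could then count.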

  • Semantic Query Execution and Post-Processing: Sesame query execution; semantic ranking: 1 point for each requested projection satisfied, normalized by the number of projections requested

    hondas in "excellent condition" in orem for under 12 grand

    Projections on Make, Price and US_City
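The semantic ranking rule above (one point per satisfied projection, divided by the number requested) can be sketched as follows. The dictionary shape of a result row and the name `semantic_score` are assumptions for illustration; a missing key or a `None` value stands for a projection the document did not satisfy.

```python
from typing import Dict, List, Optional


def semantic_score(result: Dict[str, Optional[object]], projections: List[str]) -> float:
    """Fraction of requested projections that the result satisfies."""
    if not projections:
        return 0.0  # assumption: no projections requested means no semantic evidence
    satisfied = sum(1 for p in projections if result.get(p) is not None)
    return satisfied / len(projections)


# The example query requests projections on Make, Price and US_City.
requested = ["Make", "Price", "US_City"]
# Hypothetical document: make and price match, no city annotation.
doc = {"Make": "honda", "Price": 11500, "US_City": None}
print(semantic_score(doc, requested))  # 2 of 3 projections satisfied
```

A document satisfying all three projections scores 1.0, so semantic scores always fall between 0 and 1.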


  • Hybrid Query Processing

    Linear interpolation: (kw_weight * kw_score) + (sm_weight * sm_score)

    Dynamic solution:
    # keywords remaining (#kw)
    concept match score (cms) = * (selections + projections)
    kw_weight = #kw / (#kw + cms)
    sm_weight = cms / (#kw + cms)
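The dynamic weighting can be sketched in Python as below. The multiplier in the concept match score is missing from the transcript, so this sketch takes `cms` as an already computed input instead of guessing it. The function names and the equal-weight fallback when both counts are zero are assumptions, not part of the slides.

```python
from typing import Tuple


def dynamic_weights(num_keywords: int, cms: float) -> Tuple[float, float]:
    """Return (kw_weight, sm_weight) from #kw and the concept match score."""
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumption: fall back to equal weights
    return num_keywords / total, cms / total


def hybrid_score(kw_score: float, sm_score: float, num_keywords: int, cms: float) -> float:
    """Linear interpolation of keyword and semantic scores."""
    kw_weight, sm_weight = dynamic_weights(num_keywords, cms)
    return (kw_weight * kw_score) + (sm_weight * sm_score)


# One remaining keyword and a concept match score of 3 give weights 0.25 / 0.75.
print(dynamic_weights(1, 3))         # (0.25, 0.75)
print(hybrid_score(0.8, 0.4, 1, 3))  # 0.25 * 0.8 + 0.75 * 0.4
```

The two weights always sum to 1, so a query with many recognized concepts leans on the semantic score, while a query with mostly unrecognized keywords leans on the keyword score.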


  • Basic Search

  • Results Display

  • Form-Based Search

  • Results Display

  • Experimental Setup: Ontology Libraries. 5 ontology levels: Number; Generic Units; Vehicle Units; Vehicle; Vehicle+

  • Experimental Setup: Query Sets. 113 syntactically unique queries from database students; 60 syntactically unique queries from linguistics students

  • Experimental Setup: Document Collection. 250 vehicle advertisements (Craigslist): 100 training, 50 validation, 100 test; 318 mountain pages (Wikipedia); 66 roller coaster pages (Wikipedia); 88 video game advertisements (Craigslist)


  • Experiments: training queries over test vehicle documents; test queries over test vehicle documents; training queries over test vehicle documents + additional noise; test queries over test vehicle documents + additional noise; 5 queries over noisy data (Generic Units only)

  • Experiments: Metric. Mean Average Precision
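For reference, mean average precision can be computed as in the sketch below. This is the standard textbook definition, not code from the HyKSS evaluation; the function names and the toy document ids are made up for illustration.

```python
from typing import Iterable, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[List[str], Set[str]]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two toy queries with hypothetical document ids.
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d2", "d1"], {"d1"}),              # AP = 1/2
]
print(mean_average_precision(runs))
```

Relevant documents that never appear in the ranking still count in the denominator, so a system is penalized for missing them.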


  • Experimental Results

  • Experimental Results

  • Experimental Results

  • Conclusions: hybrid search outperforms keyword and semantic search; HyKSS's dynamic query weighting approach outperforms various other weighting techniques; using multiple ontologies does not outperform selecting and using a single ontology

  • External Image Citations

    Slide 2 Google search screenshot: http://www.google.com (07/30/11)
    Slide 3 partial car search form screenshots: http://autotrader.com/fyc (07/30/11)
    Slide 4 mountain image: http://en.wikipedia.org/wiki/Lhotse (04/26/11)
    Slide 4 car image: http://en.wikipedia.org/wiki/Honda (04/26/11)
    Slide 4 roller coaster image: http://en.wikipedia.org/wiki/Kingda_Ka (04/26/11)
    Slide 4 Wikipedia logo: http://en.wikipedia.org/wiki/Main_Page (04/26/11)
    Slide 4 craigslist logo: http://provo.craigslist.org/ (04/26/11)