  HyKSS: Hybrid Keyword and Semantic Search
  Andrew Zitzelberger

  Keyword Search

  Form Based Search

  over 8,000 meters in elevation
  less than 100K miles
  faster than 100 mph
  What about?

  HyKSS
  Hybrid Keyword and Semantic Search
    • Semantics: extracted annotations
    • Multiple ontologies
    • Keywords: text

  Thesis Statement
    HyKSS (hybrid search):
      • Outperforms keyword and semantic search
      • Dynamic query weighting outperforms various other hybrid search approaches
      • Allows queries over multiple ontologies
      • Allows pay-as-you-go improvement

  Extraction Ontologies

  Data Frames

  Indexing Architecture
    Document Collection → Keyword Indexer → Keyword Index
    Document Collection → Semantic Indexer → Semantic Index
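  As a rough illustration of the two indexing paths, the sketch below builds a toy inverted index for keywords and a set of (document, concept, value) triples for semantics from the same document collection. It is not the HyKSS implementation: build_indexes and the extract callable are hypothetical, and extract merely stands in for the ontology-based semantic indexer.

```python
import re
from collections import defaultdict


def build_indexes(documents, extract):
    """documents: doc_id -> text.
    extract: hypothetical stand-in for the semantic indexer; returns
    (concept, value) pairs recognized in a text."""
    keyword_index = defaultdict(set)  # term -> doc ids (inverted index)
    semantic_index = set()            # (doc_id, concept, value) triples
    for doc_id, text in documents.items():
        for term in re.findall(r"\w+", text.lower()):
            keyword_index[term].add(doc_id)
        for concept, value in extract(text):
            semantic_index.add((doc_id, concept, value))
    return keyword_index, semantic_index


docs = {"ad1": "2004 Honda Civic, $9,500, Orem"}
toy_extract = lambda text: [("Make", "Honda")] if "Honda" in text else []
kw_index, sm_index = build_indexes(docs, toy_extract)
print(sorted(kw_index), sorted(sm_index))
```

  Both indexes are keyed back to the same document ids, which is what later lets keyword and semantic results be combined per document.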

  Indexing Architecture Implementation
    • OntoES
    • Ontology Library
    • Sesame
    • Lucene

  Query Processing
    Free-form query
      Keyword processing: Pre-Process Query → Execute Query → Post-Process Query
      Semantic processing: Pre-Process Query → Execute Query → Post-Process Query
    Combine Results
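  The flow can be sketched as two independent branches, each a pre-process, execute, post-process chain, whose per-document scores are then merged. The function names, the tuple-of-steps convention and the rule that a document missing from one branch scores 0 there are assumptions made for illustration only.

```python
from typing import Dict

Scores = Dict[str, float]  # doc id -> score


def run_branch(query, pre, execute, post) -> Scores:
    """One processing branch: pre-process, execute, post-process."""
    return post(execute(pre(query)))


def process_query(query, keyword_steps, semantic_steps, combine) -> Scores:
    kw = run_branch(query, *keyword_steps)
    sm = run_branch(query, *semantic_steps)
    # Assumption: a document found by only one branch scores 0 in the other.
    return {d: combine(kw.get(d, 0.0), sm.get(d, 0.0)) for d in kw.keys() | sm.keys()}


identity = lambda x: x  # e.g. the empty keyword post-processing step
keyword_steps = (str.lower, lambda q: {"ad1": 1.0}, identity)
semantic_steps = (identity, lambda q: {"ad2": 0.5}, identity)
print(process_query("Hondas", keyword_steps, semantic_steps, lambda k, s: k + s))
```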

  Keyword Query Pre-Processing
    • Remove Lucene special characters (except quotes)
    • Remove (inequality) comparison constraints
    • Remove non-phrase stopwords

    hondas in "excellent condition" in orem for under 12 grand

    hondas excellent condition orem
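  A minimal sketch of the three pre-processing rules, assuming a hand-written comparison regex and a small stopword list (both hypothetical; HyKSS recognizes comparison constraints through its ontologies). Following the first rule, this sketch keeps the quotation marks so the phrase can still be searched as a phrase.

```python
import re

# Lucene query-syntax special characters, except the double quote (kept for phrases).
LUCENE_SPECIAL = re.compile(r'[+\-&|!(){}\[\]^~*?:\\/]')

# Hypothetical comparison pattern: a comparison word, a number, an optional unit.
COMPARISON = re.compile(
    r'\b(?:under|over|below|above|less than|more than|at least|at most)\s+'
    r'\$?\d[\d,.]*k?(?:\s+(?:grand|dollars|miles|mph|meters))?\b',
    re.IGNORECASE,
)

# Hypothetical stopword list; a real system would reuse its analyzer's list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to", "with"}


def preprocess_keyword_query(query: str) -> str:
    """Apply the three keyword pre-processing rules to a free-form query."""
    query = LUCENE_SPECIAL.sub(" ", query)
    query = COMPARISON.sub(" ", query)
    # Quoted phrases stay whole; only bare words are checked against STOPWORDS.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOPWORDS]
    return " ".join(kept)


print(preprocess_keyword_query('hondas in "excellent condition" in orem for under 12 grand'))
# prints: hondas "excellent condition" orem
```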

  Keyword Query Execution and Post-Processing
    • Executed by Lucene
    • Empty post-processing step

  Semantic Query Pre-Processing: Individual Ontology Scoring

    hondas in "excellent condition" in orem for under 12 grand


  Semantic Query Pre-Processing: Ontology Set Creation
    For each ontology, sorted by score:
      For each remaining ontology:
        • Add a point for each new or subsuming match
        • If the added points > 0, add the ontology
    Completely subsumed ontologies are removed during query generation
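  One possible reading of this procedure, simplified to a single greedy pass, is sketched below. The data model (an individual score plus the character spans each ontology matched in the query), the overlap test used for "new" and "subsuming", and the example scores are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ScoredOntology:
    """Hypothetical model: an ontology's individual score and the query spans
    (start, end character offsets) that its data frames matched."""
    name: str
    score: int
    spans: list = field(default_factory=list)


def is_new_or_subsuming(span, covered):
    """A match earns a point if no chosen ontology matched overlapping text (new),
    or if it strictly contains every overlapping match already chosen (subsuming)."""
    start, end = span
    overlapping = [(s, e) for s, e in covered if s < end and start < e]
    if not overlapping:
        return True
    return all(start <= s and e <= end and (s, e) != span for s, e in overlapping)


def create_ontology_set(ontologies):
    """Walk ontologies by descending score; keep each one that contributes
    at least one new or subsuming match."""
    chosen, covered = [], []
    for onto in sorted(ontologies, key=lambda o: o.score, reverse=True):
        points = sum(1 for span in onto.spans if is_new_or_subsuming(span, covered))
        if points > 0:
            chosen.append(onto.name)
            covered.extend(onto.spans)
    return chosen


# Offsets into: hondas in "excellent condition" in orem for under 12 grand
library = [
    ScoredOntology("Location", 1, [(35, 39)]),              # orem
    ScoredOntology("ContractualServices", 1, [(44, 58)]),   # under 12 grand
    ScoredOntology("Vehicle", 2, [(0, 6), (44, 58)]),       # hondas, under 12 grand
]
print(create_ontology_set(library))  # ['Vehicle', 'Location']
```

  Here ContractualServices adds no points because its only match duplicates one already covered by Vehicle.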

  Semantic Query Pre-Processing: Ontology Set Creation (example)
    [Figure: ontology set creation for the example query. Ontologies: Location, Vehicle, ContractualServices. Matches: US_City=orem, Price < 12000. Score updates: Vehicle_Score + 1, ContractualServices_Score + 1.]

  Semantic Query Pre-Processing: Structured Query Generation
    • Open world assumption
    • SPARQL query
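  A sketch of how a generator might honor the open world assumption: each projected concept is wrapped in OPTIONAL, and each selection is written so that a record with no value for that concept is kept rather than rejected. The namespace, the ex:has<Concept> property names and the function itself are hypothetical, and selections are assumed to be on projected concepts.

```python
def build_sparql(record_class, projections, selections):
    """Build a SPARQL SELECT for one ontology.

    projections: concept names to return.
    selections: concept name -> boolean SPARQL expression over ?<concept>.
    """
    lines = [
        "PREFIX ex: <http://example.org/hykss#>",  # hypothetical namespace
        "SELECT ?record " + " ".join(f"?{c}" for c in projections),
        "WHERE {",
        f"  ?record a ex:{record_class} .",
    ]
    for concept in projections:
        # OPTIONAL: a record lacking this value is still returned (open world).
        lines.append(f"  OPTIONAL {{ ?record ex:has{concept} ?{concept} . }}")
        if concept in selections:
            # An unknown (unbound) value is not treated as a violated constraint.
            lines.append(f"  FILTER (!BOUND(?{concept}) || {selections[concept]})")
    lines.append("}")
    return "\n".join(lines)


print(build_sparql(
    "Vehicle",
    ["Make", "Price", "US_City"],
    {"Make": '?Make = "Honda"', "Price": "?Price < 12000", "US_City": '?US_City = "Orem"'},
))
```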

  Semantic Query Execution and Post-Processing
    • Sesame query execution
    • Semantic ranking:
      • 1 point for each requested projection satisfied
      • Normalized by the number of projections requested

    hondas in "excellent condition" in orem for under 12 grand
    Projections on Make, Price and US_City
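  The semantic ranking rule amounts to a simple ratio. In this sketch a projection counts as satisfied when the result binds a value for that concept; that interpretation, and the function name, are assumptions.

```python
def semantic_score(result, projections):
    """1 point per requested projection satisfied, normalized by the number of
    projections requested. Unbound values are represented as None."""
    if not projections:
        return 0.0
    satisfied = sum(1 for concept in projections if result.get(concept) is not None)
    return satisfied / len(projections)


# An ad that states a make and a price but no city: 2 of 3 projections satisfied.
ad = {"Make": "Honda", "Price": 9500, "US_City": None}
print(semantic_score(ad, ["Make", "Price", "US_City"]))
```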


  Hybrid Query Processing
    • Linear interpolation: (kw_weight * kw_score) + (sm_weight * sm_score)
    • Dynamic solution:
      • #kw = number of keywords remaining
      • concept match score (cms) = … * (selections + projections)
      • kw_weight = #kw / (#kw + cms)
      • sm_weight = cms / (#kw + cms)
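  The dynamic weighting can be sketched directly from the two weight formulas. This sketch takes cms as an input computed elsewhere rather than deriving it from selections and projections; the function names and the equal-weights fallback when #kw and cms are both zero are assumptions.

```python
def dynamic_weights(num_keywords, cms):
    """kw_weight = #kw / (#kw + cms); sm_weight = cms / (#kw + cms)."""
    total = num_keywords + cms
    if total == 0:
        return 0.5, 0.5  # assumed fallback: nothing to favor either side
    return num_keywords / total, cms / total


def hybrid_score(kw_score, sm_score, num_keywords, cms):
    """Linear interpolation of keyword and semantic scores with dynamic weights."""
    kw_weight, sm_weight = dynamic_weights(num_keywords, cms)
    return kw_weight * kw_score + sm_weight * sm_score


print(dynamic_weights(1, 3))  # (0.25, 0.75)
```

  The weights always sum to 1, so a query with many leftover keywords leans on Lucene's score, while a query dominated by recognized concepts leans on the semantic score.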


  Basic Search

  Results Display

  Form Based Search

  Results Display

  Experimental Setup: Ontology Libraries
    5 ontology levels:
      • Number
      • Generic Units
      • Vehicle Units
      • Vehicle
      • Vehicle+

  Experimental Setup: Query Sets
    • 113 syntactically unique queries from database students
    • 60 syntactically unique queries from linguistics students

  Experimental Setup: Document Collection
    • 250 vehicle advertisements (Craigslist): 100 training, 50 validation, 100 test
    • 318 mountain pages (Wikipedia)
    • 66 roller coaster pages (Wikipedia)
    • 88 video game advertisements (Craigslist)


  Experiments
    • Training queries over test vehicle documents
    • Test queries over test vehicle documents
    • Training queries over test vehicle documents + additional noise
    • Test queries over test vehicle documents + additional noise
    • 5 queries over noisy data (Generic Units only)

  Experiments: Metric
    • Mean Average Precision (MAP)
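  For reference, the standard definition of mean average precision is sketched below: average precision sums the precision at each rank where a relevant document appears and divides by the total number of relevant documents; MAP averages that over queries. The function names and the toy rankings are illustrative only.

```python
def average_precision(ranked, relevant):
    """ranked: doc ids in result order; relevant: set of relevant doc ids."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [(["d1", "d2", "d3"], {"d1", "d3"}), (["a", "b"], {"b"})]
print(mean_average_precision(runs))
```

  Relevant documents that never appear in the ranking still count in the denominator, so a system is penalized for missing them entirely.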


  Experimental Results



  Conclusions
    • Hybrid search outperforms keyword and semantic search
    • HyKSS's dynamic query weighting approach outperforms various other weighting techniques
    • Using multiple ontologies does not outperform selecting and using a single ontology

