Keyword Search over RDF Graphs

Keyword Search over RDF Graphs. Shady Elbassuoni* and Roi Blanco** (*Max-Planck Institute for Informatics; **Yahoo! Research, Barcelona)

Author: roi-blanco

Posted on 16-Jul-2015

TRANSCRIPT

  • Keyword Search over RDF Graphs

    Shady Elbassuoni* and Roi Blanco**

    * Max-Planck Institute for Informatics
    ** Yahoo! Research, Barcelona

    Shady Elbassuoni, Query Relaxation for Entity-Relationship Search, ESWC 2011

  • RDF Datasets

    subject             predicate      object
    Traffic             hasWonPrize    Academy_Award
    Innerspace          hasWonPrize    Academy_Award
    Innerspace          hasGenre       Comedy
    Joe_Dante           directed       Innerspace
    Toy_Story           hasWonPrize    Academy_Award
    Road_Trip           hasGenre       Comedy
    Toy_Story           hasGenre       Comedy
    Tom_Hanks           actedIn        Toy_Story
    Diner               hasWonPrize    Academy_Award
    Diner               type           Comedy_films
    Steve_Guttenberg    actedIn        Diner
    The_Pink_Panther    type           Criminal_comedy_films
    The_Pink_Panther    hasWonPrize    Academy_Award
    Police_Academy      type           Comedy_films
    Steve_Guttenberg    actedIn        Police_Academy
    The_Darwin_Awards   type           Comedy_films

    Shady Elbassuoni, Keyword Search over RDF Graphs, CIKM 2011

  • Searching RDF Data

    Structured triple-pattern queries (SPARQL)
    Example: comedies that have won an Academy Award

    SELECT ?m
    WHERE { ?m hasGenre Comedy .
            ?m hasWonPrize Academy_Award }
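    The conjunctive meaning of this query can be illustrated with a minimal Python sketch. It matches both triple patterns against the example triples and keeps only the bindings of ?m that satisfy both. This is a naive nested-loop matcher written for illustration. It is neither a SPARQL engine nor the authors' code. The helper names `match_pattern` and `evaluate` are hypothetical, and URIs are abbreviated to bare names as on the slides.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Example triples from the "RDF Datasets" slide (URIs abbreviated to bare names).
TRIPLES: List[Triple] = [
    ("Traffic", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasGenre", "Comedy"),
    ("Joe_Dante", "directed", "Innerspace"),
    ("Toy_Story", "hasWonPrize", "Academy_Award"),
    ("Road_Trip", "hasGenre", "Comedy"),
    ("Toy_Story", "hasGenre", "Comedy"),
    ("Tom_Hanks", "actedIn", "Toy_Story"),
    ("Diner", "hasWonPrize", "Academy_Award"),
    ("Diner", "type", "Comedy_films"),
    ("Steve_Guttenberg", "actedIn", "Diner"),
    ("The_Pink_Panther", "type", "Criminal_comedy_films"),
    ("The_Pink_Panther", "hasWonPrize", "Academy_Award"),
    ("Police_Academy", "type", "Comedy_films"),
    ("Steve_Guttenberg", "actedIn", "Police_Academy"),
    ("The_Darwin_Awards", "type", "Comedy_films"),
]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple, extending an existing binding."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):          # variable: bind it, or check it is consistent
            if new.get(p, t) != t:
                return None
            new[p] = t
        elif p != t:                   # constant: must match exactly
            return None
    return new


def evaluate(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Naive conjunctive evaluation: extend the bindings one pattern at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            b2
            for b in bindings
            for t in triples
            if (b2 := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


query = [("?m", "hasGenre", "Comedy"), ("?m", "hasWonPrize", "Academy_Award")]
answers = sorted(b["?m"] for b in evaluate(query, TRIPLES))
print(answers)  # ['Innerspace', 'Toy_Story']
```

    Only Innerspace and Toy_Story are returned. Diner is typed as Comedy_films and has no hasGenre Comedy triple, so the query misses it. Exact triple-pattern matching therefore depends on knowing how the data is modeled.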

  • Searching RDF Data

    Triple-pattern queries are very expressive but not very usable
    Most users and search APIs prefer keyword queries
    Support keyword search over RDF graphs

  • Keyword Search over RDF Data

    How to process keyword queries?
      - Translate keyword queries into SPARQL
      - Directly process the queries over the RDF graph
    What are the results of a keyword query?
      - Resources
      - Triples
      - Tuples of triples (subgraphs)

  • Processing Keyword Queries

    Construct a document D(t) for each triple t
    D(t) contains all literals in t and any text associated with the URIs in t
    Example:
      t: Innerspace hasGenre Comedy
      D(t): innerspace USA 1987 science fiction comedy film Joe Dante Michael Finnell Dennis Quaid Martin Short Meg Ryan academy award best visual effects
    We can now create triple-term indexes
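    Here is a minimal Python sketch of document construction and the triple-term index. It rests on stated assumptions. The text associated with a URI is taken to be the URI's own name, split on underscores and camelCase, plus any extra text in a hypothetical `URI_TEXT` table. That table is filled only for Innerspace, using the words from the example above. A crude plural-stripping rule stands in for a real stemmer. The index maps each term to the IDs of the triples whose D(t) contains it.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("Innerspace", "hasGenre", "Comedy"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Toy_Story", "hasGenre", "Comedy"),
    ("Toy_Story", "hasWonPrize", "Academy_Award"),
    ("Police_Academy", "type", "Comedy_films"),
    ("The_Darwin_Awards", "type", "Comedy_films"),
]

# Hypothetical extra text associated with URIs; only Innerspace has some here.
URI_TEXT: Dict[str, str] = {
    "Innerspace": "USA 1987 science fiction comedy film Joe Dante Michael Finnell "
                  "Dennis Quaid Martin Short Meg Ryan academy award best visual effects",
}


def tokenize(text: str) -> List[str]:
    """Split camelCase and non-alphanumerics, lowercase, strip a plural 's' (crude)."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
            for t in tokens]


def document(t: Triple) -> List[str]:
    """D(t): terms from every URI name in t plus any text associated with it."""
    terms: List[str] = []
    for uri in t:
        terms += tokenize(uri) + tokenize(URI_TEXT.get(uri, ""))
    return terms


def build_index(triples: List[Triple]) -> Dict[str, Set[int]]:
    """Triple-term index: term -> IDs of the triples whose D(t) contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for i, t in enumerate(triples):
        for term in document(t):
            index[term].add(i)
    return index


index = build_index(TRIPLES)
print(sorted(index["award"]))   # [0, 1, 3, 5]
print(sorted(index["comedy"]))  # [0, 1, 2, 4, 5]
```

    Two effects are visible here. The associated text lets the triple Innerspace hasGenre Comedy match "award". Plural stripping lets The_Darwin_Awards match it as well.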

  • Retrieving Query Results

    Retrieve a list of triples matching each query keyword
    Join the triples from different lists based on their URIs

    Query: comedy award

    comedy:
      Innerspace hasGenre Comedy
      Road_Trip hasGenre Comedy
      Toy_Story hasGenre Comedy
      Diner type Comedy_films
      Police_Academy type Comedy_films
      The_Darwin_Awards type Comedy_films
      ...
    award:
      Traffic hasWonPrize Academy_Award
      Innerspace hasWonPrize Academy_Award
      Toy_Story hasWonPrize Academy_Award
      Diner hasWonPrize Academy_Award
      The_Darwin_Awards type Comedy_films
      ...

    Joined results:
      T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
      T: Toy_Story hasGenre Comedy . Toy_Story hasWonPrize Academy_Award
      T: Police_Academy type Comedy_films . The_Darwin_Awards type Comedy_films

    The last tuple matches "award" only through the name The_Darwin_Awards, so result ranking is crucial!
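    The join step can be sketched in Python, taking the per-keyword lists from the slide as input. The sketch rests on three assumptions. Triples join when they share a subject or object URI; predicates are not used as join keys. A result picks one distinct triple from each keyword list. The picked triples must form a connected subgraph. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]

# Per-keyword triple lists for the query "comedy award", as shown on the slide.
LISTS: Dict[str, List[Triple]] = {
    "comedy": [
        ("Innerspace", "hasGenre", "Comedy"),
        ("Road_Trip", "hasGenre", "Comedy"),
        ("Toy_Story", "hasGenre", "Comedy"),
        ("Diner", "type", "Comedy_films"),
        ("Police_Academy", "type", "Comedy_films"),
        ("The_Darwin_Awards", "type", "Comedy_films"),
    ],
    "award": [
        ("Traffic", "hasWonPrize", "Academy_Award"),
        ("Innerspace", "hasWonPrize", "Academy_Award"),
        ("Toy_Story", "hasWonPrize", "Academy_Award"),
        ("Diner", "hasWonPrize", "Academy_Award"),
        ("The_Darwin_Awards", "type", "Comedy_films"),
    ],
}


def nodes(t: Triple) -> Set[str]:
    """Join keys of a triple: its subject and object (predicates are ignored)."""
    return {t[0], t[2]}


def connected(triples: Tuple[Triple, ...]) -> bool:
    """True if the triples form a single subgraph linked by shared subjects/objects."""
    seen = {0}
    stack = [0]
    while stack:
        i = stack.pop()
        for j, t in enumerate(triples):
            if j not in seen and nodes(triples[i]) & nodes(t):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(triples)


def join(lists: List[List[Triple]]) -> List[FrozenSet[Triple]]:
    """Pick one triple per keyword list; keep distinct, connected, unseen combinations."""
    results: List[FrozenSet[Triple]] = []
    for combo in product(*lists):
        group = frozenset(combo)
        if len(group) == len(combo) and connected(combo) and group not in results:
            results.append(group)
    return results


results = join(list(LISTS.values()))
for tup in results:
    print(" . ".join(" ".join(t) for t in sorted(tup)))
```

    The sketch returns the two meaningful Innerspace and Toy_Story tuples. It also returns two Diner-based joins and the spurious Police_Academy/The_Darwin_Awards pair, whose only link is the shared Comedy_films class. Those spurious joins are why the retrieved tuples need to be ranked.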

  • Language Models for Triples

    t: Innerspace hasGenre Comedy
    Estimate P(w|D(t)) from D(t)

      w             P(w|D(t))
      innerspace    0.234
      1987          0.123
      science       0.012
      fiction       0.020
      comedy        0.111
      film          0.179
      classic       0.111
      meg           0.019
      ryan          0.019
      oscar         0.148
      ...           ...
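    The slide says only that P(w|D(t)) is estimated from D(t). A common choice, assumed here and not taken from the paper, is the maximum-likelihood estimate interpolated with a collection-wide model (Jelinek-Mercer smoothing). Interpolation keeps a non-zero probability for words missing from D(t). The toy documents and the weight `lam` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def mle(doc: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model: count(w, D) / |D|."""
    counts = Counter(doc)
    return {w: c / len(doc) for w, c in counts.items()}


def jm_prob(w: str, doc: List[str], collection: List[List[str]], lam: float = 0.5) -> float:
    """Jelinek-Mercer smoothing: (1 - lam) * P_ml(w|D) + lam * P(w|collection)."""
    coll_counts = Counter(tok for d in collection for tok in d)
    coll_size = sum(coll_counts.values())
    p_doc = mle(doc).get(w, 0.0)
    p_coll = coll_counts[w] / coll_size
    return (1 - lam) * p_doc + lam * p_coll


# Hypothetical toy documents D(t) for two triples.
d1 = ["innerspace", "comedy", "film", "comedy"]
d2 = ["toy", "story", "comedy", "award"]
collection = [d1, d2]

print(mle(d1)["comedy"])                  # 0.5
print(jm_prob("comedy", d1, collection))  # 0.4375
print(jm_prob("award", d1, collection))   # 0.0625
```

    "award" never occurs in d1 but still receives probability 0.0625 through the collection model. Without smoothing, any product of keyword probabilities over that triple would collapse to zero.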

  • Ranking Model

    Query: comedy award
    T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
    But we treat triples as bags of words!
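    The slide's exact scoring formula is not given in this text. For illustration only, the sketch below assumes that a tuple T is scored by query likelihood, averaging each keyword's probability over the triples in T: P(q|T) = prod over w in q of (1/|T|) * sum over t in T of P(w|D(t)). The D(t) token lists are hypothetical and built from URI names only. P(w|D(t)) is left unsmoothed to keep the arithmetic visible.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical D(t) token lists, built only from URI names (plural "s" stripped).
DOCS: Dict[Triple, List[str]] = {
    ("Innerspace", "hasGenre", "Comedy"): ["innerspace", "has", "genre", "comedy"],
    ("Innerspace", "hasWonPrize", "Academy_Award"):
        ["innerspace", "has", "won", "prize", "academy", "award"],
    ("Police_Academy", "type", "Comedy_films"): ["police", "academy", "type", "comedy", "film"],
    ("The_Darwin_Awards", "type", "Comedy_films"):
        ["the", "darwin", "award", "type", "comedy", "film"],
}


def p_word(w: str, t: Triple) -> float:
    """Unsmoothed maximum-likelihood P(w|D(t))."""
    doc = DOCS[t]
    return Counter(doc)[w] / len(doc)


def score_bow(query: List[str], T: List[Triple]) -> float:
    """Assumed form: P(q|T) = prod_w (1/|T|) * sum_{t in T} P(w|D(t))."""
    score = 1.0
    for w in query:
        score *= sum(p_word(w, t) for t in T) / len(T)
    return score


q = ["comedy", "award"]
T1 = [("Innerspace", "hasGenre", "Comedy"), ("Innerspace", "hasWonPrize", "Academy_Award")]
T2 = [("Police_Academy", "type", "Comedy_films"), ("The_Darwin_Awards", "type", "Comedy_films")]
print(score_bow(q, T1), score_bow(q, T2))  # T2 scores higher than T1
```

    With these toy documents, the Police_Academy/The_Darwin_Awards tuple scores 11/720 and the Innerspace tuple scores 1/96, so the spurious tuple ranks higher. This shows the weakness of a pure bag-of-words model: "award" in a movie title counts as much as an award-winning relation.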

  • Ranking Model

    Query: comedy award
    T: Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
    Probability of the structure of triple t being relevant to keyword w

  • Estimating Structural Relevance

    For each keyword, construct a probability distribution over predicates
    Example: award
      P(Innerspace hasWonPrize Academy_Award | award) = P(hasWonPrize | award), estimated from the whole dataset

      r                 P(r|w)
      hasWonPrize       0.459
      wasNominatedFor   0.387
      type              0.112
      directed          0.020
      actedIn           0.021
      producedIn        0.025
      bornIn            0.008
      ...               ...
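    The slide does not spell out the estimator for P(r|w), so the sketch below assumes one. It counts how often keyword w occurs in the documents D(t) of triples with predicate r, then normalizes over predicates. It runs on the sixteen example triples only, so its numbers differ from the slide's full-dataset values. As a further assumption, each triple's P(w|D(t)) is weighted by P(r(t)|w) inside the per-keyword average. All helper names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# The sixteen example triples from the "RDF Datasets" slide.
DATASET: List[Triple] = [
    ("Traffic", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasGenre", "Comedy"),
    ("Joe_Dante", "directed", "Innerspace"),
    ("Toy_Story", "hasWonPrize", "Academy_Award"),
    ("Road_Trip", "hasGenre", "Comedy"),
    ("Toy_Story", "hasGenre", "Comedy"),
    ("Tom_Hanks", "actedIn", "Toy_Story"),
    ("Diner", "hasWonPrize", "Academy_Award"),
    ("Diner", "type", "Comedy_films"),
    ("Steve_Guttenberg", "actedIn", "Diner"),
    ("The_Pink_Panther", "type", "Criminal_comedy_films"),
    ("The_Pink_Panther", "hasWonPrize", "Academy_Award"),
    ("Police_Academy", "type", "Comedy_films"),
    ("Steve_Guttenberg", "actedIn", "Police_Academy"),
    ("The_Darwin_Awards", "type", "Comedy_films"),
]


def tokenize(name: str) -> List[str]:
    """Split a URI name on '_' and camelCase, lowercase, strip a plural 's' (crude)."""
    name = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
    toks = re.findall(r"[a-z0-9]+", name.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") and not t.endswith("ss") else t
            for t in toks]


def doc(t: Triple) -> List[str]:
    """Hypothetical D(t): just the tokens of the three URI names."""
    return [tok for part in t for tok in tokenize(part)]


def p_word(w: str, t: Triple) -> float:
    """Unsmoothed maximum-likelihood P(w|D(t))."""
    d = doc(t)
    return d.count(w) / len(d)


def predicate_dist(w: str, dataset: List[Triple]) -> Dict[str, float]:
    """P(r|w): fraction of w's occurrences that fall in triples with predicate r."""
    counts: Counter = Counter()
    for t in dataset:
        counts[t[1]] += doc(t).count(w)
    total = sum(counts.values())
    return {r: c / total for r, c in counts.items() if c > 0}


def score_structured(query: List[str], T: List[Triple], dataset: List[Triple]) -> float:
    """Assumed form: prod_w (1/|T|) * sum_t P(pred(t)|w) * P(w|D(t))."""
    score = 1.0
    for w in query:
        dist = predicate_dist(w, dataset)
        score *= sum(dist.get(t[1], 0.0) * p_word(w, t) for t in T) / len(T)
    return score


q = ["comedy", "award"]
T1 = [("Innerspace", "hasGenre", "Comedy"), ("Innerspace", "hasWonPrize", "Academy_Award")]
T2 = [("Police_Academy", "type", "Comedy_films"), ("The_Darwin_Awards", "type", "Comedy_films")]
print(predicate_dist("award", DATASET))
print(score_structured(q, T1, DATASET) > score_structured(q, T2, DATASET))  # True
```

    On this toy data, P(hasWonPrize|award) is 5/6 and P(type|award) is only 1/6. The structure-aware score therefore ranks the Innerspace tuple above the Police_Academy/The_Darwin_Awards tuple.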

  • Example Ranked Query Results

    Query: comedy award

    Bag of Words:
      1. Combat_Academy type Comedy_films . The_Darwin_Awards type Comedy_films
      2. Police_Academy type Comedy_films . The_Darwin_Awards type Comedy_films
      3. Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award

    Structure Aware:
      1. Innerspace hasGenre Comedy . Innerspace hasWonPrize Academy_Award
      2. Toy_Story hasGenre Comedy . Toy_Story hasWonPrize Academy_Award
      3. Shrek hasWonPrize Academy_Award_Best_Animated_Feature . Shrek hasGenre Comedy

  • Experimental Setup

    User study over two RDF datasets:
      - movies from IMDB
      - books from LibraryThing
    Models compared:
      - Structure Aware Approach
      - Bag of Words Approach
      - Language-model-based Object Retrieval
      - BANKS (keyword search over databases)

  • Experimental Setup

    30 evaluation queries
    Gathered relevance assessments for the top-50 results retrieved by each model

  • Experimental Results

    P-value < 0.05

  • Conclusion

    Keyword search over RDF data is crucial
    To support keyword search over RDF data:
      - Combine structured triples with text
          Construct a document for each triple
      - Retrieve meaningful query results
          Tuples of joined triples
          Can be extended to larger subgraphs of the RDF graph
      - Rank the retrieved results
          A language-model approach that uses both text and structure

  • Ranking Model

  • RDF Graphs

    If we zoom in on one of these datasets, it is basically just a set of triples with three fields: subject, predicate and object. RDF is a very flexible means of encoding structured information in a machine-readable format. For instance, the first triple here states that the movie Traffic has won an Academy Award. Note that subjects are URIs, predicates are URIs, and objects can be either URIs or literals.

    So how do we search RDF data? We use structured query languages like SPARQL, where a query is a set of triple patterns. A triple pattern is just a triple in which one or more of the subject, predicate and object are replaced by variables. Let's look at an example. Suppose we are looking for comedies that have won an Academy Award. This can be expressed using the triple-pattern query shown on the slide: ?m is a variable and the triples in the curly braces are triple patterns. In particular, the first one has predicate hasGenre and object Comedy, and the second has predicate hasWonPrize and object Academy_Award. So triple patterns are really powerful and can be used to find very interesting information, but do we really expect regular users to use them? Unfortunately not! And since we computer scientists are really nice people and always try to make the lives of the poor casual users easy, we need to enable them to search RDF data using keywords.

    An RDF dataset can also be viewed as a graph where subjects and objects are nodes and predicates are labeled edges. For example, the triple about Traffic winning an Academy Award is represented as an edge labeled hasWonPrize from the Traffic node to the Academy_Award node.
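    The graph view can be sketched in Python as an adjacency list. Each subject and object becomes a node, and each triple becomes a directed edge labeled with its predicate. The function names `to_graph` and `incoming` are hypothetical. In this view, joining two triples on a shared URI amounts to two edges meeting at the same node, as the Traffic and Innerspace edges do at Academy_Award.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("Traffic", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasWonPrize", "Academy_Award"),
    ("Innerspace", "hasGenre", "Comedy"),
    ("Joe_Dante", "directed", "Innerspace"),
]


def to_graph(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each node to its outgoing (predicate, target) edges."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))  # edge s --p--> o
        graph.setdefault(o, [])  # make sure object-only nodes also appear
    return dict(graph)


def incoming(graph: Dict[str, List[Tuple[str, str]]], node: str) -> List[Tuple[str, str]]:
    """All (source, predicate) pairs whose edge points at `node`."""
    return [(s, p) for s, edges in graph.items() for p, o in edges if o == node]


g = to_graph(TRIPLES)
print(g["Traffic"])                  # [('hasWonPrize', 'Academy_Award')]
print(incoming(g, "Academy_Award"))  # [('Traffic', 'hasWonPrize'), ('Innerspace', 'hasWonPrize')]
```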