A Probabilistic Framework for Information Integration and Retrieval on the Semantic Web

by Livia Predoiu, Heiner Stuckenschmidt
Institute of Computer Science, University of Mannheim, Germany
presented by Thomas Packer


Post on 14-Jan-2016



Page 1: A Probabilistic Framework for Information Integration and Retrieval on the Semantic Web

by Livia Predoiu, Heiner Stuckenschmidt
Institute of Computer Science, University of Mannheim, Germany

presented by Thomas Packer

Page 2: Sources of Uncertainty in Automated Processes in the Semantic Web

• Uncertain Document Classification
• Uncertain Ontology Learning from Text
• Uncertain Ontology Matching
• Leads to uncertain, unreliable or contradictory information.
• Traditional logic cannot handle inconsistency.

Page 3: Motivational Example

• Domain: Bibliography
• Use Case: Find publications with keyword “AI”.
• Complication: The second ontology does not include the concept of “topic” or “keywords”.
• Solution: Use machine learning to categorize documents from the second collection.

Page 4: Motivational Example (Continued)

• Domain: Bibliography
• Use Case: Find publications with keyword “AI”.
• Complication: The “Report” concept in one ontology roughly corresponds to “Publication” in the other.
• Solution: Map concepts between ontologies.

Page 5: Approach

• Start with a more standard approach, Description Logic Programs.
• Extend them with probabilistic information.
• Call the result Bayesian Description Logic Programs (BDLPs).
• BDLPs are a subset of Bayesian Logic Programs.
• They also integrate logic programming and description logic knowledge bases.

Page 6: BDLP Pedigree

• Description Logic (DL) and Logic Programs (LPs) → Description Logic Programs (DLPs)
• Logic Programs (LPs) and Bayesian Networks (BNs) → Bayesian Logic Programs (BLPs)
• Description Logic Programs (DLPs) and Bayesian Logic Programs (BLPs) → Bayesian Description Logic Programs (BDLPs)

Page 7: Uses of Bayesian Description Logic Programs

• Framework for information retrieval and information integration across heterogeneous ontologies.

Page 8: Description Logic Programs (Background)

• Intersection of:
  – Description Logics (knowledge representation)
  – Logic Programming (automated theorem proving)
• DLP program contains:
  – Set of rules
  – Set of facts
• Rules have the form:
  – Conjunction of predicates implies some other predicate.
  – H (the head) and the Bs (the body) are atomic formulae.
  – Predicate arguments are called terms.
  – Terms are constants or variables.
  – A ground atom’s terms are all constants.
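
The rule vocabulary on this slide can be sketched as a small data model. This is only an illustrative sketch, not the authors' implementation: the class names `Atom` and `Rule`, the Prolog-style convention that variables start with an uppercase letter, and the bibliography predicates are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atom:
    """An atomic formula: a predicate applied to terms."""
    predicate: str
    terms: Tuple[str, ...]

    def is_ground(self) -> bool:
        # Assumed convention: a term starting with an uppercase letter is a
        # variable; anything else is a constant.
        return not any(term[:1].isupper() for term in self.terms)


@dataclass(frozen=True)
class Rule:
    """The head is implied by the conjunction of the body atoms."""
    head: Atom
    body: Tuple[Atom, ...]


# Hypothetical rule: every report is a publication.
rule = Rule(head=Atom("publication", ("X",)), body=(Atom("report", ("X",)),))

# A fact is a ground atom: all of its terms are constants.
fact = Atom("report", ("doc1",))
```

Here `fact.is_ground()` is true, while `rule.head.is_ground()` is false because `X` is a variable.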

Page 9: Description Logic Programs (Background)

Page 10: Description Logic Programs (Background)

• Restricted expressivity
• Many existing DL ontologies fit DLP restrictions.
• Reasoning in DLP is decidable.
• Reasoning has much lower complexity than DL reasoning in general (in theory and in practice).

Page 11: Bayesian Description Logic Programs

• BDLP program contains:
  – Set of rules
  – Set of facts
• Rules have the form:
  – Conjunction of predicates implies some other predicate.
  – “|” replaces the implication arrow to denote conditional probability.
  – Each rule has a probability distribution specifying the probability of each state of the head atom given the states of the body atoms.
  – Each ground atom corresponds to a BN node.
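
A rule with an attached conditional distribution, and the way its ground atoms become BN nodes, might be sketched as follows. This is an assumption-laden illustration rather than the framework's actual encoding: atoms are restricted to Boolean states, and the names `BayesianRule`, `ground_rule`, `bn_edges`, the mapping rule, and its probabilities are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate, terms)


@dataclass
class BayesianRule:
    head: Atom
    body: Tuple[Atom, ...]
    # Maps each combination of body truth values to P(head is True | body).
    cpt: Dict[Tuple[bool, ...], float]

    def p_head(self, head_state: bool, body_states: Tuple[bool, ...]) -> float:
        p_true = self.cpt[body_states]
        return p_true if head_state else 1.0 - p_true


def ground_atom(atom: Atom, binding: Dict[str, str]) -> Atom:
    # Replace each variable that has a binding with its constant.
    predicate, terms = atom
    return (predicate, tuple(binding.get(t, t) for t in terms))


def ground_rule(rule: BayesianRule, binding: Dict[str, str]) -> BayesianRule:
    return BayesianRule(
        head=ground_atom(rule.head, binding),
        body=tuple(ground_atom(b, binding) for b in rule.body),
        cpt=rule.cpt,
    )


def bn_edges(rule: BayesianRule) -> List[Tuple[Atom, Atom]]:
    # Each ground body atom becomes a parent node of the ground head atom.
    return [(b, rule.head) for b in rule.body]


# Hypothetical mapping rule "publication(X) | report(X)" with made-up numbers.
mapping = BayesianRule(
    head=("publication", ("X",)),
    body=(("report", ("X",)),),
    cpt={(True,): 0.9, (False,): 0.0},
)
grounded = ground_rule(mapping, {"X": "doc1"})
edges = bn_edges(grounded)
```

Grounding `X` to `doc1` yields one edge, from the node `report(doc1)` to the node `publication(doc1)`.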

Page 12: Example BDLP

Page 13: Example Bayesian Network

• Blue: Ontology 2
• Cyan: Learned from Ontology 2
• Black & White: Ontology 1
• Red arcs: Mappings

Page 14: Where do Probabilities Come From?

• Deterministic ontologies
  – true = 1.0
  – false = 0.0
• Probabilistic tools
  – Naïve Bayes document categorization
  – Probabilistic ontology mapping
• Subjective estimates
  – Some argue that people are inconsistent in their judgments of probability.
  – Using subjective probabilities is still more accurate than forcing people to use Boolean judgments.
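
The deterministic case (true = 1.0, false = 0.0) can be illustrated with a small helper that builds the table for a rule whose head holds exactly when every body atom holds. This is one possible encoding under the assumption of Boolean atoms; the function name `deterministic_cpt` is hypothetical and not taken from the presentation.

```python
from itertools import product
from typing import Dict, Tuple


def deterministic_cpt(n_body: int) -> Dict[Tuple[bool, ...], float]:
    """P(head is True | body states) for a purely logical rule.

    The head is certainly true (1.0) when all body atoms are true and
    certainly false (0.0) otherwise.
    """
    return {
        states: 1.0 if all(states) else 0.0
        for states in product((True, False), repeat=n_body)
    }


cpt = deterministic_cpt(2)
```

A probabilistic tool such as a document classifier would instead supply values strictly between 0.0 and 1.0 for the relevant entries.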

Page 15: Example Query

• Query for publications about AI.
• Non-ground query.
• Two valid groundings.
• Query BN for probabilities (IR with ranking).
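
Ranking the groundings of a non-ground query by probability might look like the sketch below. Everything concrete in it is assumed for illustration: the node names, the network structure, and all probabilities are made up, and inference is done by brute-force enumeration, which is only feasible for very small networks.

```python
from itertools import product

# node -> (parents, table mapping parent truth values to P(node is True)).
# All nodes and numbers are hypothetical.
network = {
    "keyword_ai(doc1)": ((), {(): 1.0}),     # stated explicitly in one ontology
    "about_ai(doc1)": (("keyword_ai(doc1)",), {(True,): 1.0, (False,): 0.0}),
    "classified_ai(doc2)": ((), {(): 0.7}),  # output of a text classifier
    "about_ai(doc2)": (("classified_ai(doc2)",), {(True,): 0.9, (False,): 0.0}),
}


def marginal(network, query):
    """P(query is True), summed over every joint assignment of the nodes."""
    nodes = list(network)
    total = 0.0
    for values in product((True, False), repeat=len(nodes)):
        world = dict(zip(nodes, values))
        if not world[query]:
            continue
        p = 1.0
        for node, (parents, cpt) in network.items():
            p_true = cpt[tuple(world[parent] for parent in parents)]
            p *= p_true if world[node] else 1.0 - p_true
        total += p
    return total


# The non-ground query about_ai(X) has two groundings in this toy network.
groundings = ["about_ai(doc1)", "about_ai(doc2)"]
ranking = sorted(groundings, key=lambda g: marginal(network, g), reverse=True)
```

With these made-up numbers, `doc1` ranks first (probability 1.0) and `doc2` second (0.7 × 0.9 = 0.63).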

Page 16: Conclusion

• Strengths:
  – Actually explains how Bayesian Networks relate to predicates.
  – Handles integration (which others do not).
  – Handles IR.
• Weaknesses:
  – DLPs don’t allow for negation or equivalence.
  – No measured evaluation.
  – Size of the model, and therefore of the BN, can be exponential in the size of the KB.
  – Exact inference is intractable in BNs with cycles.
• Future work:
  – Learn BLP programs from data.
  – Prune BN to portion relevant to query.
  – Approximate probabilistic inference.
  – Parallel/distributed programming.
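
The pruning idea listed under future work could be approached as follows. This sketch rests on a standard property of Bayesian networks: when no evidence is observed, the marginal of a query node depends only on that node and its ancestors. With evidence, the ancestors of the evidence nodes would have to be kept as well. The function name `relevant_nodes` and the toy graph are hypothetical.

```python
from typing import Dict, Set, Tuple


def relevant_nodes(parents: Dict[str, Tuple[str, ...]], query: str) -> Set[str]:
    """Return the query node together with all of its ancestors."""
    seen: Set[str] = set()
    stack = [query]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Walk upward along the arcs that point into this node.
        stack.extend(parents.get(node, ()))
    return seen


# Hypothetical network structure: node -> its parent nodes.
parents = {
    "keyword_ai(doc1)": (),
    "about_ai(doc1)": ("keyword_ai(doc1)",),
    "classified_ai(doc2)": (),
    "about_ai(doc2)": ("classified_ai(doc2)",),
}
kept = relevant_nodes(parents, "about_ai(doc2)")
```

For the query `about_ai(doc2)`, only two of the four nodes are kept, so inference can ignore everything concerning `doc1`.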

Page 17: Questions