
J Intell Inf Syst (2012) 38:507–532. DOI 10.1007/s10844-011-0165-4

Mining Bayesian networks out of ontologies

Bellandi Andrea · Turini Franco

Received: 16 August 2009 / Revised: 10 May 2011 / Accepted: 12 May 2011 / Published online: 14 June 2011
© Springer Science+Business Media, LLC 2011

Abstract Probabilistic reasoning is an essential feature when dealing with many application domains. Starting with the idea that ontologies are the right way to formalize domain knowledge and that Bayesian networks are the right tool for probabilistic reasoning, we propose an approach for extracting a Bayesian network from a populated ontology and for reasoning over it. The paper presents the theory behind the approach, its design and examples of its use.

Keywords Probabilistic reasoning · Ontology queries

1 Introduction

Ontologies have been proposed (Guarino and Poli 1995) as the means for adding semantics to the web. They provide a formal representation of the knowledge on given domains, which can then be processed by machines. Knowledge can be extracted from an ontology using logical reasoning that exploits both the relationships among classes (concepts) and the facts stored in it (the instances of the classes).

In this paper we show that the fact that the ontology contains both the data and their semantic description offers the opportunity of yet another kind of reasoning, i.e. probabilistic reasoning. The idea is simple. The semantic organization of concepts can provide the conditional probability dependencies among them, and the frequencies of the data instances can provide the necessary probability distributions.

A. Bellandi (B)
Livorno Port Authority, Livorno, Italy
e-mail: [email protected]

F. Turini
Department of Computer Science, University of Pisa, Pisa, Italy
e-mail: [email protected]


With our approach probabilistic reasoning can be performed over an ontology without any need to modify the ontology itself. The treasure, i.e. all that is needed for the reasoning, is already there. Some processing is required, but there is no need to add explicit knowledge.

The need to extend the ontological approach to include some form of probabilistic reasoning is well understood (Pearl 1988). Uncertainty exists in almost every aspect of ontology engineering, for example in domain modeling and ontology reasoning. Assuming, for example, that our knowledge is about business, we might want to be able to answer queries like “what is the likelihood of a company defaulting given that it is a limited company and has branches outside Europe?”.

In the next section we review existing approaches for extending ontology formalisms with the capabilities of probabilistic reasoning. All require modifications to the structure of the ontology and, as far as we know, ours is the first proposal that manages to avoid such modifications.

Our method has three steps. The first, called the ontology compiling process, compiles the ontology into a Bayesian network (Niedermayer 2008). Bayesian networks are a powerful language for representing probabilistic relationships among large numbers of uncertain hypotheses. They have been applied to a wide variety of problems including medical diagnoses, classification systems, multi-sensor fusion, and legal analyses for trials. However, standard Bayesian networks are insufficiently expressive to cope with the expressive power of ontologies. We thus use a specific form of Bayesian network, i.e. layered Bayesian networks. The ontology compiling process extracts the layered structure of the Bayesian network directly from the schema of the knowledge base (TBox). The second step is about learning the initial probability distributions. Distributions, both prior and conditional, are computed directly from the ontology instances (ABox), on the basis of Bayes' theorem. The third step consists in performing probabilistic reasoning, using inference schemas based on the structure of the extracted Bayesian network. The resulting layered Bayesian network offers reasoning capabilities that can satisfy a non-trivial set of probabilistic queries.

Preliminary results and ideas behind this research have been presented in Bellandi and Turini (2009). The paper is organized as follows. Section 2 briefly discusses probabilistic reasoning in ontologies, and presents the main research lines related to our work. Section 3 provides some background on ontologies and layered Bayesian networks. The main inference schemas over Bayesian networks are discussed in Section 4. Section 5 introduces our approach and presents the fundamental aspects of the probabilistic reasoning schema, i.e. layered Bayesian networks. A query language for reasoning over the extracted networks using queries involving probabilities is introduced in Section 6. The language is presented in terms of its grammar and operational semantics. Section 7 shows detailed examples of how the implemented system can be used. Future research issues are outlined in the conclusions.

2 Related work

There is increasing interest in extending traditional ontology formalisms to include sound mechanisms for representing uncertain knowledge and reasoning on it. An important question, therefore, is how probabilistic formalisms, such as Bayesian networks, can be integrated within formal ontologies.


Probabilistic theories produce qualified conclusions, which are graded by numerical measures of plausibility. In contrast, formal ontologies focus on purely logical reasoning, which leads to definite conclusions. While there are several upper ontologies, each with its own lattice type, generally there is no uncertainty associated with the relations in an individual ontology. However, the situation changes when we consider the problem of categorizing instances. Probability theory is an essential tool for performing this kind of inference in a systematic and sound manner. The information on probabilities can be obtained from an analysis of instance frequencies, from the judgement of experienced experts, from the physical characteristics of sensing systems, or from some combination of the above.

In general, there are two different approaches to handling uncertainty in this context. The first is to try to extend current ontology representation formalisms with uncertainty reasoning. The second suggests the representation of probabilistic information using an OWL or RDF(S) ontology.¹

Our proposal lies between the two research lines, because it neither modifies the ontology formalism, nor represents the probability concepts within the ontology. Among the most related works are Ding et al. (2004, 2005) and Ding and Peng (2004, 2005). The authors represent the probability concept within OWL, i.e. they develop a framework which augments and supplements OWL to represent and reason with uncertainty based on Bayesian networks. They augment OWL semantics in order to allow probabilistic information to be represented via additional markups. The result is an ontology that is annotated with probabilities, which can then be translated into a Bayesian network. Their approach only deals with is-a relationships, whereas we can also deal with other kinds of relationships (object properties), thus offering more general forms of reasoning.

Another relevant contribution in this area is Costa and Laskey (2005). In their approach, OWL ontologies can be used to represent complex Bayesian probabilistic models, in a way that is flexible enough to be used by diverse Bayesian probabilistic tools based on different probabilistic technologies. This approach entails modifying and re-arranging the original knowledge base to deal with the uncertainty, by introducing new relationships, or by using non-classical Bayesian networks. Our approach, instead, extracts the Bayesian network from the ontology itself by exploiting both the TBox and the ABox (Baader et al. 2003).

The approach of Costa and Laskey (2005) can be more flexible than our current method, but it requires explicit input from the expert. We consider our approach to be advantageous in that, in the general context of the semantic web, it is highly desirable that the uncertainty model and the related reasoner require minimal modifications to the semantic web formalisms and the knowledge base. This may turn out to be the determining factor for the community in terms of accepting or rejecting such systems.

A very interesting approach is presented in McGarry et al. (2007a, b). The authors propose the integration of ontologies within the probabilistic framework of Bayesian networks, which enables the reasoning and prediction of protein functions. They automatically generate a viable ontology based on the extraction of keywords from the research literature and their use for the definition of entities and relationships.

¹ Please refer to http://www.w3.org/TR/rdf-mt/ for OWL definitions and notions related to RDF.


The ontology is then cast within a probabilistic framework using Bayesian networks for the inferencing and the prediction. Our research, on the other hand, defines a method for extracting the Bayesian network out of ontologies rather than an application of Bayesian techniques to ontologies. The authors then use prior beliefs about a specific situation to pre-structure their Bayesian network. For example, if a particular gene were known to regulate several target genes, they would introduce this relationship into the Bayesian network by adding the appropriate edges.

The Bayesian network extracted according to our method does not need to be refined by adding arcs or nodes. The translation process provides the complete output by exploiting the knowledge coded by an ontology both at the concept level and at the instance level. As stated before, our method does not require extra inputs to build a Bayesian network as, for example, in Danev et al. (2006). Here, the authors present an approach to harness the knowledge and inference capabilities inherent in an ontology model to automate the construction of Bayesian networks in order to accurately represent a domain of interest. The tasks involved in this process require expert inputs. In order to create arcs between Bayesian nodes, the algorithm relies on rules that are specific to the application domain and define which ontology properties or relations between concepts correspond to arcs in the Bayesian network. It is a very interesting approach because dependencies can be created between Bayesian nodes which correspond to ontology classes that are not explicitly bound by any ontology relation. Our research, on the other hand, exploits only the explicit ontological knowledge, and each ontology relation (i.e. object property) expresses a causal relation in the Bayesian network that is going to be extracted. Next, for each Bayesian node we also compute the initial conditional probability distribution by exploiting the ontology ABox. The task of estimating the conditional probabilities is not treated in Danev et al. (2006).

3 Ontologies and layered Bayesian networks

In this section we review the basic concepts regarding ontologies and we introduce the definition of layered Bayesian networks that we will use later.

3.1 Ontology

An ontology explicitly represents the classes of entities of an application domain, their properties, their relationships, the roles they can play, how they are decomposed into parts, and the events and processes that they are involved in. Knowledge can be extracted from an ontology using logical reasoning that exploits both the relationships between classes (concepts) and the facts stored in it (the instances of the classes). An ontology is represented by an RDF graph, which is in turn a set of RDF triples. An ontology builds on RDF and RDF Schema and adds more vocabulary for describing properties and classes (Broekstra et al. 2002), among others, relations between classes (e.g. disjointness), cardinality (e.g. exactly one), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes. However, the meaning of an ontology is solely determined by the RDF graph. Formally, ontologies consist of two parts: intensional and extensional.


The intensional part, referred to as TBox, contains knowledge about concepts (i.e. classes) and the complex relations between them (i.e. roles). The extensional part, referred to as ABox, contains knowledge about entities (i.e. individuals) and how they relate to the classes and roles.

Figure 1 shows an example of ontology. It contains information on two domains: the Company domain, which describes the nature of the companies, and the Personnel domain, which describes the nature of the personnel in the organization chart of a company. The graph also contains the hasPersonnel relationship connecting the two domains. This relationship is called an object property.

Ontologies also enable us to describe knowledge by using data properties. They represent attributes of each concept of a domain. A data property can refer to both quantitative and qualitative attributes. For example, each company can have an initial capital, which is a quantitative attribute, and a name, which is a qualitative attribute. In our study we deal neither with data properties nor with logical relations. This task will form part of our future research, as discussed further in the conclusions.

We are specifically interested in looking at ontologies that organize knowledge using object properties connecting concepts which, in turn, are hierarchically structured using the is-a relationship. In the example in Fig. 1 the two hierarchies are Company on one side and Personnel on the other. The object relationship hasPersonnel, which binds the two roots, is naturally inherited along the is-a hierarchy. The following definitions set the terms for introducing our approach.

Definition 1 (Domain concept) A domain concept D is an is-a hierarchy of classes.

Definition 2 (TBox) A TBox component is a triple ⟨D, C, R⟩ where:

– D is the set of the domain concepts.
– C is the set of all the ontology classes.
– R is the set of object properties binding domain concepts to each other.

Fig. 1 Example of ontology


Definition 3 (ABox) An ABox component is a set of ontology instances I. Each instance belongs to a class and to all its superclasses. Given a subset {i} ⊆ I, the instances {i} can be related by means of a subset of object properties {r} ⊆ R, according to the TBox component.

Definition 4 (Ontology) An ontology O is a pair ⟨T, I⟩ where:

– T is a triple ⟨D, C, R⟩.
– I is the set of ontology instances satisfying T.
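To make the preceding definitions concrete, the following is a minimal sketch of how the two components could be represented in code; the structures and field names are ours, not part of the paper's system.

```python
from dataclasses import dataclass

@dataclass
class TBox:
    """Intensional part <D, C, R> (Definition 2)."""
    domains: set[str]                      # D: domain concepts (is-a hierarchy roots)
    classes: set[str]                      # C: all the ontology classes
    properties: set[tuple[str, str, str]]  # R: (name, domain concept, range concept)

@dataclass
class ABox:
    """Extensional part (Definition 3)."""
    instances: dict[str, set[str]]         # instance -> classes (superclasses included)
    triples: set[tuple[str, str, str]]     # object-property assertions <s, r, o>

@dataclass
class Ontology:
    """An ontology O = <T, I> (Definition 4)."""
    tbox: TBox
    abox: ABox
```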

3.2 Layered Bayesian Networks

The main definitions of a layered Bayesian Network (lBN in the following) are based on the features of Hierarchical Bayesian Networks (Flach and Lachiche 2000; Flach and Gyftodimos 2004). lBNs are very similar to Bayesian networks in that they represent probabilistic dependencies among variables as a directed acyclic graph. Each node in the graph represents a random variable and is quantified by the conditional probability of the variable given the values of its parents in the graph. In lBNs each node can, in turn, contain a Bayesian network. Thus, a node can represent a complex hierarchical domain, rather than a simple event, and each arc represents the relationship between the domain values represented by each node. Intuitively, lBNs are a generalisation of standard Bayesian networks where, at the higher level, a node in the network can be a Bayesian network coding hierarchical domains.

Figure 2a shows a simple example of lBN, where B is a node representing a hierarchical domain. This allows the random variables of the network to represent the values of the domain they code at the lower level. This means that within a single node there may also be links between components, representing probabilistic dependencies among parts of the structure's lower level. Thus, in our context, we will use a two-level Bayesian network (hereafter 2lBN).

Fig. 2 Example of 2lBN

Our idea is to use this representation to capture the probabilistic nature of the domain knowledge. The structure of the two-level network is flexible. A network can be easily extended by adding more components inside a composite node, or refined by transforming a leaf node into a composite one. This can be done on the basis, for example, of the taxonomical knowledge provided by ontologies.

Each node representing a specific domain using a Bayesian network is called a High Level Node (HLN). Each arc binding HLNs to each other is called a High Level Relation (HLR). Each node within an HLN, which stands for a specific subclass of that domain, is called a Low Level Node (LLN). Each arc binding LLNs is called a Low Level Relation (LLR).

Each HLR is labelled by the name of the specific relation that it represents. LLRs are not labelled, because they all represent is-a relationships among domain values, within specific HLNs.

The following definitions formalize the above notions.

Definition 5 (2lBN structural part) A 2lBN structural part is a triple ⟨N, V, A⟩ where:

– N is the set of HLNs, each of which has a domain type τ and corresponds to a random variable.
– V is the set of LLNs belonging to each HLN. Each LLN corresponds to a random variable, and all the LLNs belonging to each HLN make up the whole Bayesian network associated with that HLN.
– A ⊆ N² is the set of directed labelled arcs between elements of N such that the resulting graph contains no cycles.

Figure 2a shows an example of 2lBN structural part, where N = {A, B, C}, V = {B1, B2, B3}, and A = {l1, l2}.

Definition 6 (2lBN probabilistic part) A 2lBN probabilistic part related to a 2lBN structure consists of:

– an LLRT, i.e. a Low Level Relation probability table for each node ∈ V, with respect to the is-a relationships (LLR);
– an HLRT, i.e. a High Level Relation probability table for each node ∈ N, with respect to the labelled relationships (HLR).

Figure 2b shows examples of conditional probability tables related to the A, B1, B2, B3 nodes.

Definition 7 (2lBN) A two-level Bayesian network is a pair ⟨S, Pr⟩ where:

– S = ⟨N, V, A⟩ is a 2lBN structure.
– Pr is the 2lBN probabilistic part related to S.
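As a companion to the ontology sketch above, a 2lBN could be represented along the following lines; again, the field names and the flat encoding of the probability tables are our own illustrative choices.

```python
from dataclasses import dataclass

@dataclass
class HLN:
    """High Level Node: one domain concept, itself a small Bayesian network."""
    name: str                                # domain type tau
    llns: set[str]                           # LLNs: one Boolean variable per class
    is_a_arcs: set[tuple[str, str]]          # LLRs inside this HLN
    llrt: dict[tuple[str, str], float]       # LLRT entries, e.g. P(t_i | t_j)

@dataclass
class TwoLevelBN:
    """A 2lBN <S, Pr>: S = <N, V, A> plus the LLRT/HLRT tables (Definitions 5-7)."""
    hlns: dict[str, HLN]                     # N, with V and the LLRTs nested inside
    hlrs: set[tuple[str, str, str]]          # A: labelled arcs (source, label, target)
    hlrt: dict[tuple[str, str, str], float]  # HLRT keyed by (label, domain LLN, range LLN)
```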


4 Bayesian inference

The DAG of a Bayesian network model is a compact representation of the dependence and independence properties of the joint probability distribution represented by the model.

If there is an arc from node A to another node B, A is called a parent of B, and B is a child of A. The set of parents of a node Xi is denoted, in general, by parent(Xi). In a Bayesian network, the joint distribution of the node values can be written as the product of the local distributions of each node given its parents:

$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid parent(X_i))$$

If node Xi has no parents, its local probability distribution is said to be unconditional, otherwise conditional. If the value of a node is known, then the node is said to be an evidence node.
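As a quick illustration of this factorization, the following sketch evaluates the joint probability of a full assignment on a three-node network; the network and all the numbers are made up for the example.

```python
from math import prod

# Illustrative DAG A -> D <- B with its CPTs
parents = {"A": [], "B": [], "D": ["A", "B"]}
p_true = {  # P(node = true | parents = pv), one entry per parent configuration
    ("A", ()): 0.3,
    ("B", ()): 0.6,
    ("D", (True, True)): 0.9, ("D", (True, False)): 0.5,
    ("D", (False, True)): 0.4, ("D", (False, False)): 0.1,
}

def joint(assign: dict[str, bool]) -> float:
    """P(X1 = x1, ..., Xn = xn) as the product of the local distributions."""
    def local(x: str) -> float:
        pv = tuple(assign[p] for p in parents[x])
        return p_true[(x, pv)] if assign[x] else 1.0 - p_true[(x, pv)]
    return prod(local(x) for x in parents)

print(joint({"A": True, "B": True, "D": True}))  # 0.3 * 0.6 * 0.9 = 0.162
```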

The graph encodes independencies between variables (Geiger et al. 1990). In Bayesian networks, d-separation is a property of two nodes X and Y with respect to a set of nodes Z. X and Y are said to be d-separated by Z if no information can flow between them when Z is observed (the d in d-separation stands for directional). Informally, two variables X and Y are independent conditional on Z if knowledge about X gives you no extra information about Y once you have knowledge of Z. In other words, once you know Z, X adds nothing to what you know about Y.

The basic computation on Bayesian networks is the computation of the conditional probability of every node, given the observed evidence. Although evaluating Bayesian networks is, in general, NP-hard (Dagum and Luby 1993; Cooper 1990), there is a class of networks that can be solved efficiently, in time linear in the number of nodes: polytrees. A polytree is a directed graph with at most one undirected path between any two vertices. In other words, a polytree is a directed acyclic graph that has no undirected cycles either. We give the following definition.

Definition 8 (Path uniqueness property) A sequence of n arcs, with n ≥ 1, belonging to a polytree R is called a path. In R each node can be reached from any other node via a unique path. We call this feature the uniqueness property.
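A sketch of a check for the polytree property follows: the underlying undirected graph must be acyclic, which is exactly what guarantees the uniqueness property above. The union-find formulation is our own implementation choice.

```python
def is_polytree(nodes: set[str], arcs: set[tuple[str, str]]) -> bool:
    """True iff the undirected skeleton of the DAG contains no cycles."""
    parent = {n: n for n in nodes}              # union-find over the skeleton
    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]       # path halving
            n = parent[n]
        return n
    for u, v in arcs:
        ru, rv = find(u), find(v)
        if ru == rv:                            # u and v already connected: a second path
            return False
        parent[ru] = rv
    return True

# A -> D <- B is a polytree; adding A -> B creates the undirected cycle A-B-D
print(is_polytree({"A", "B", "D"}, {("A", "D"), ("B", "D")}))              # True
print(is_polytree({"A", "B", "D"}, {("A", "D"), ("B", "D"), ("A", "B")}))  # False
```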

The most critical problem related to the efficiency of the inference process is to find the optimal order in which the computations are performed. The inference task is, in principle, solved by performing a sequence of multiplications and additions. Below we present the main inference schemas (Pearl 1988; Kim and Pearl 1983; Jensen and Nielsen 2007; Cowell et al. 2001) using the network in Fig. 5a as an example:

– Causal inference. In general, it permits the computation of the probability of an effect E given one of its causes C1. The main step is re-writing the conditional probability of E given the evidence C1 in terms of the probabilities of E and all of its parents (which are not part of the evidence C1), given the evidence. For example, referring to Fig. 5a, in order to compute P(D|B) we have:

$$P(D \mid B) = P(D \mid A, B) \cdot P(A) + P(D \mid \bar{A}, B) \cdot P(\bar{A})$$


– Diagnostic inference. It permits the computation of the probability of a cause C1 given its effect E. By applying Bayes' rule, the diagnostic reasoning is transformed into a causal reasoning with respect to a normalisation factor. For example, in order to compute P(B|D) we have:

$$P(B \mid D) = \frac{P(D \mid B) \cdot P(B)}{P(D)}$$

P(D|B) is computable by applying a causal inference schema. P(B)/P(D) is called a normalisation factor.
– Explaining away inference. It is a combination of the previous schemas. It uses a causal inference step within a diagnostic process. For example, to compute P(B|A, D), we have to apply Bayes' rule:

$$P(B \mid A, D) = \frac{P(A, D \mid B) \cdot P(B)}{P(A, D)}$$

Because of the definition of conditional probability, the above is equivalent to:

$$\frac{P(A \mid D, B) \cdot P(D \mid B) \cdot P(B)}{P(A, D)}$$
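The three schemas can be made concrete on the same toy network A -> D <- B used earlier; this is a sketch with made-up numbers, and the explaining-away line uses the marginal independence of the two root causes to simplify the general formula above.

```python
# CPTs for A -> D <- B (illustrative values)
pA, pB = 0.3, 0.6
pD = {(True, True): 0.9, (True, False): 0.5, (False, True): 0.4, (False, False): 0.1}

# Causal inference: P(D|B) = P(D|A,B)P(A) + P(D|~A,B)P(~A)
def p_d_given_b(b: bool) -> float:
    return pD[(True, b)] * pA + pD[(False, b)] * (1 - pA)

# Diagnostic inference: P(B|D) = P(D|B)P(B) / P(D)
def p_b_given_d() -> float:
    p_d = p_d_given_b(True) * pB + p_d_given_b(False) * (1 - pB)
    return p_d_given_b(True) * pB / p_d

# Explaining away: P(B|A,D) = P(D|A,B)P(B) / P(D|A), since A and B are independent a priori
def p_b_given_a_d() -> float:
    p_d_given_a = pD[(True, True)] * pB + pD[(True, False)] * (1 - pB)
    return pD[(True, True)] * pB / p_d_given_a

print(p_d_given_b(True))  # 0.9*0.3 + 0.4*0.7 = 0.55
print(p_b_given_d())      # 0.33 / (0.33 + 0.088) ≈ 0.789
print(p_b_given_a_d())    # 0.54 / (0.54 + 0.20) ≈ 0.730
```

Note how the last value is lower than the purely diagnostic one: observing the alternative cause A partially "explains away" B.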

5 Ontology compiling process

In this section we define the ontology compiling process for deriving the 2lBN directly from the ontology. It has two phases: compiling the TBox into a 2lBN structural part, and compiling the ABox into a 2lBN probabilistic part.

In phase one, each concept is mapped into a random variable of the 2lBN, and the whole structure of the Bayesian network is built up.

In phase two, uncertainty values are associated with the relations modelled in the first phase. The uncertainty of classes and relations in an ontology is represented here by probability distributions. The probability distributions can generally be provided by domain experts, but in our proposal they are directly derived from the explicit data stored in the ontology. We deal with two kinds of distributions. The first represents the probability that an arbitrary instance belongs to a specific class, and the second represents the probability that an arbitrary instance is involved in a given object property. A class A is handled as a random Boolean variable with two states a and ā. We interpret P(a) as the prior probability that an arbitrary instance belongs to A, and P(a|b) as the conditional probability that an instance of class B also belongs to class A.² Similarly, we can interpret P(ā), P(a|b̄), P(ā|b), P(ā|b̄), with the negation interpreted as “not belonging to”.

Concerning relations among domain concepts, we treat the domains and ranges of object properties as random multi-valued variables, which associate instances belonging to a class with instances belonging to another class with a certain likelihood, with respect to the object properties they encode.

² Note that the concept of “belonging” referred to each instance can be thought of as “involved in an is-a relation”. P(a) means that a is involved in an is-a relation, because each ontology instance always belongs to some class. P(a|b) means that an is-a relation exists between a and b.


5.1 Compiling the TBox

The idea is to map each ontology class into a random Boolean variable, then to find out all the domain concepts, and to map each of them into a random multi-valued variable. Hence, at the upper level of the network each HLN represents a domain concept of the ontology, and each labelled arc represents a specific object property. At the lower level, each HLN is a Bayesian network consisting of LLNs, each of which represents an ontology class. The arcs of these Bayesian networks encode instances of the is-a ontology relation among classes.

Definition 9 (Γs mapping) The compiling process of a TBox maps the triple ⟨D, C, R⟩ into the triple ⟨N, V, A⟩ in such a way that each domain concept in D is mapped into a multi-valued random variable of N, each class of C is mapped into a Boolean variable in V, and each relationship in R is mapped into a Bayesian arc of A. is-a relationships are mapped into low level relations and each object property is mapped into a high level relation. Abusing notation, we will also denote by Γs the mappings restricted to each component of the triple, and by Γs⁻¹ the reverse mapping.

Referring again to the example in Fig. 1, if we compute

$$\Gamma_s(\langle \{COMPANY, PERSONNEL\}, C, \{hasPersonnel\} \rangle)$$

where C = {Personnel, Management, HumanResources, RiskManager, AccountingManager, OfficerWorkers, MiddleManager, Company, Partnership, Customer, Jointventure, Vendor, Competitor, Supplier, Reseller, LimitedLiabilityPS, LimitedPS, PCSupplier, OtherSupplier}, we obtain the Bayesian network depicted in Fig. 3, with two HLNs, i.e. PERSONNEL and COMPANY, and one HLR, i.e. hasPersonnel. All the LLNs represent the subclasses of each domain concept. Note that the probability distributions have not been computed yet.
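A sketch of this structural phase on a fragment of Fig. 1 follows; the function and the input encoding are our own, and the is-a arcs inside each HLN (which mirror the class hierarchy) are left implicit for brevity.

```python
def compile_tbox(domains, classes_by_domain, object_properties):
    """Gamma_s restricted to the structure: map <D, C, R> to <N, V, A>."""
    # N with V nested: one multi-valued variable per domain, one Boolean LLN per class
    hlns = {d: sorted(classes_by_domain[d]) for d in domains}
    # A: each object property becomes a labelled high level relation
    hlrs = [(dom, name, rng) for (name, dom, rng) in object_properties]
    return hlns, hlrs

hlns, hlrs = compile_tbox(
    {"COMPANY", "PERSONNEL"},
    {"COMPANY": {"Company", "Partnership", "Customer", "Vendor"},
     "PERSONNEL": {"Personnel", "Management", "HumanResources"}},
    [("hasPersonnel", "COMPANY", "PERSONNEL")],
)
print(hlrs)  # [('COMPANY', 'hasPersonnel', 'PERSONNEL')]
```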

5.2 Compiling the ABox

In this section we introduce the method for computing the initial probability distributions associated with a 2lBN. Note that this method assumes the existence of a very large set of instances (ABox), which can be an acceptable sample space for probabilities. In the worst case, when instances are not available, we need some external knowledge of the data, such as synthetic instances or training data sets from which we can compute the initial probability distributions, or probability tables provided by experts.

Fig. 3 Example of 2lBN structural part

There are two kinds of distributions: low level relation probability distributions related to the is-a relationships, and high level relation probability distributions related to the labelled relationships (which are ontology object properties).

5.3 Low level relation probability table (LLRT)

These kinds of distributions refer to the random Boolean variables that represent ontology classes. Each domain concept D has its own distribution, which describes how instances are distributed over its taxonomical description. Γs(ci) = vi returns the Boolean random variable vi associated with the class ci. vi equals true when the ontology instance belongs to the class ci, otherwise it is false. In practice, each domain concept Di³ contains all the instances of the specific domain it represents. In other words, Di can be thought of as the root class of the taxonomy representing that domain. From this point of view, all the instances not belonging to a specific class belong to the class defined by the set difference between Di and that class. Thus, for each vi we can define Γs⁻¹(v̄i) in terms of Γs⁻¹(vi) as follows:

$$\Gamma_s^{-1}(\bar{v}_i) = \Gamma_s^{-1}(T_i) \,/\, \Gamma_s^{-1}(v_i)$$

where / is the set difference operator, and where Γs⁻¹(Ti) = Di is the domain concept to which Γs⁻¹(vi) = ci belongs. The number of instances belonging to the set difference between the ontology classes Γs⁻¹(Ti) and Γs⁻¹(vi) is equal to the number of instances not belonging to the class Γs⁻¹(vi), i.e. it is equivalent to setting the value of vi to false (i.e. v̄i).

The probabilistic part LLRT is then a table in which the first two columns report all the classes and the domain concepts to which they belong. The other columns report all the possible combinations of the truth values of the random Boolean variables. Each combination corresponds to a state si in which an instance can be. For each of these combinations, the instances belonging to the state si are counted. We use #INST to denote the function counting the number of instances of each ontology class. The set operations over classes have the usual set-theoretical meaning.

Definition 10 Let ci, cj be two ontology classes. We have that:

– ci ∩ cj is the set containing the instances belonging both to ci and to cj;
– ci ∪ cj is the set containing the instances belonging to ci and those belonging to cj;
– c̄i is the set containing the instances not belonging to ci.

³ We do not consider constraints on logical relations among classes such as intersection, disjointness, union, and so on.


On the basis of this table, we can compute the probability distributions by applying the Bayes formula in the following form:

$$P(t_i \mid t_j) = \frac{P(t_i \cap t_j)}{P(t_j)} = \frac{\#INST(c_i \cap c_j) \,/\, \#INST(T)}{\#INST(c_j) \,/\, \#INST(T)} = \frac{\#INST(c_i \cap c_j)}{\#INST(c_j)}$$

where Γs(ci) = ti, Γs(cj) = tj, Γs(C) = T, and T represents the HLN to which ti and tj belong.
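The estimate reduces to two instance counts, as in the following sketch; the toy ABox uses the instance-to-classes shape introduced earlier, and every instance is listed under its superclasses as Definition 3 requires.

```python
instances = {  # hypothetical ABox fragment for the Company domain
    "acme":    {"Company", "Partnership"},
    "globex":  {"Company", "Partnership", "LimitedPS"},
    "initech": {"Company", "Customer"},
}

def n_inst(*classes: str) -> int:
    """#INST of the intersection of the given classes."""
    return sum(1 for cls in instances.values() if set(classes) <= cls)

def llrt_entry(ci: str, cj: str) -> float:
    """P(t_i | t_j) = #INST(c_i ∩ c_j) / #INST(c_j), estimated from frequencies."""
    return n_inst(ci, cj) / n_inst(cj)

print(llrt_entry("LimitedPS", "Partnership"))  # 1/2: one of the two Partnership instances
```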

5.4 High level relation probability table (HLRT)

The second kind of distribution is related to the object properties of the ontology. Each domain concept is mapped to an HLN, and HLNs are connected via labelled arcs corresponding to object properties. In terms of Bayesian networks, the arcs represent Bayesian conditionings among HLNs. We then have to compute the distributions associated with each HLN w.r.t. the object properties it is involved in. Each HLN corresponds to a multi-valued random variable, which assumes all the possible values referring to its own domain concept, and the values of the domain concept over which it ranges via specific labelled Bayesian conditionings.

For example, the conditional probability distribution of the hasPersonnel object property is specified by the following notation:

$$P(Personnel \mid_{hasPersonnel} Company)$$

This particular notation represents the probability that the hasPersonnel relation exists between a specific kind of company and a specific class of personnel, depending on how both the Personnel and Company concept domains are modelled in the ontology. The previous conditional probability is computed by applying the Bayes formula in the following form:

$$P(Personnel = a \mid_{hasPersonnel} Company = c) = \frac{P(\langle Company = c,\, hasPersonnel,\, Personnel = a \rangle)}{P(\langle Company = c,\, hasPersonnel,\, Personnel = All \rangle)}$$

where a and c are classes belonging to the Personnel and Company concepts respectively, and Personnel = All stands for all the instances belonging to Personnel. Note that all the instances are counted in the HLN entered by the Bayesian conditioning arc. The following defines a function for counting the ontology triples.

Definition 11 (#TRIPLE function) Let T and {Ti} be a node and a set of nodes belonging to a polytree R, such that {Ti} are the parents of T. Let p1, p2, ..., pn be arcs belonging to R, such that pi = arc(T, {Ti}, R) for i = 1, ..., n. The function #TRIPLE is defined as:

$$\#TRIPLE : (\{\Gamma_s^{-1}(T_1), \ldots, \Gamma_s^{-1}(T_n)\}, \{\Gamma_s^{-1}(p_1), \ldots, \Gamma_s^{-1}(p_n)\}, \Gamma_s^{-1}(T)) \longrightarrow \text{number of instances}$$

Figure 4 shows a general example of 2lBN counting. Within each HLN, the instances are distributed by LLN, on the basis of the ontological knowledge.

Fig. 4 Instances counting

In order to correctly count the frequencies, we need to consider the cardinality of each HLR. For example, p2 associates instances of T3 with instances of T1 in the following way:

– t9 instances are in relation to r distinct t13 instances,
– t8 instances are in relation to k distinct t11 instances.

Analogously, p1 associates T1 instances with T2 instances. For example, in order to compute

$$P(T_1 = t_{12} \mid_{p_1, p_2} T_2 = t_3, T_3 = t_9)$$

we have to count the number of triples ⟨t3, p1, t12⟩ divided by the number of triples ⟨t3, p1, All⟩, and the number of triples ⟨t9, p2, t12⟩ divided by the number of triples ⟨t9, p2, All⟩. This amounts to computing the percentage of the triples satisfying the condition specified by the above probability. By applying the function #TRIPLE and the function #INST we obtain:

$$\frac{h + j}{\#INST(T_1)} \cdot \frac{\#INST(T_1)}{(h + j) + r} = \frac{h + j}{(h + j) + r}$$

Our ontology entails counting the h and j objects of the triple ⟨t3, p1, t12⟩, because t4 and t5 are subclasses of t3.
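A sketch of the triple counting behind the HLRT estimate follows; the data shapes match the earlier snippets, subclass membership is handled by listing every instance under all its classes, and all names are illustrative.

```python
def n_triple(triples, instances, subj_cls, prop, obj_cls=None):
    """#TRIPLE: count <s, prop, o> with s in subj_cls and, optionally, o in obj_cls."""
    return sum(
        1 for s, r, o in triples
        if r == prop and subj_cls in instances[s]
        and (obj_cls is None or obj_cls in instances[o])
    )

def hlrt_entry(triples, instances, c, prop, a):
    """P(range = a |_prop domain = c), cf. the hasPersonnel example above."""
    return n_triple(triples, instances, c, prop, a) / n_triple(triples, instances, c, prop)

instances = {"acme": {"Company"}, "bob": {"Personnel", "Management"}, "eve": {"Personnel"}}
triples = [("acme", "hasPersonnel", "bob"), ("acme", "hasPersonnel", "eve")]
print(hlrt_entry(triples, instances, "Company", "hasPersonnel", "Management"))  # 1/2
```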


5.5 Reasoning over 2lBN

By reasoning over LLNs belonging to the same HLN we can make inferences for answering queries about concept subsumption, class overlaps, and class inclusions.

The inference engine we are proposing supports more general reasoning tasks, since it allows the formulation of queries involving object properties. At this level, each arc of the polytree is associated with the specific semantics given by the object property it refers to, and with a label given by the name of that object property. Thus, the conditional probability of an HLN depends both on the evidence node and on which arcs bind it to the evidence node. Each arc induces its conditioning on the other HLNs through the query path (we call it the induced conditional probability or conditioning space). In fact, the traversal of an arc pi on a path requires the restriction of the next conditioning space to the one in which the object property referred to by pi holds.

In terms of space complexity, computing in advance all the conditional probabilities for all the possible conditionings among the spaces singled out by each arc cannot be efficient. It is more convenient to dynamically compute all the induced conditional probabilities, every time we need to solve a specific Bayesian query. The Bayes formula is also used for computing these probabilities, but the induced space in which its arguments are evaluated depends on the set of arcs through which the current arc was reached. In our system, each Bayesian query is first decomposed into a product of directly computable probability factors, and then the rules computing each conditional probability factor are applied, taking into account the induced conditional probability.

6 Bayesian query language

This section introduces a simple query language (BQ language in the following) for expressing queries involving probabilities.

6.1 BQ language: syntax

The reasoning process over a 2lBN obtained by compiling an ontology is driven by user queries aimed at computing conditional probabilities.

The syntax of the query language is shown in Table 1. The intuitive meaning of a query is that we ask for the probability of a node in a given position in a hierarchy (IdeH = IdeL) given a path of relationships connecting it to the EVIDENCE. The EVIDENCE can be a single node, in the case of causal inference or diagnostic inference, or a pair of nodes, in the case of the explaining away inference.

Table 1 Grammar of the query language

BAYESIAN_QUERY ::= P(NODE |PATH EVIDENCE)
NODE           ::= IdeH = IdeL | IdeH
PATH           ::= ε | Rel | Rel.PATH
EVIDENCE       ::= NODE | (NODE, NODE)
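For illustration, a minimal parser for the single-path form of the grammar could look as follows; the concrete string syntax (spacing, the "."-separated Rel list) is our reading of Table 1, and the multi-path evidence form used later in Section 7 is not handled.

```python
import re

QUERY = re.compile(r"P\(\s*(?P<node>[^|]+?)\s*\|\s*(?P<path>[\w.]*)\s+(?P<ev>.+)\)")

def parse_bq(q: str):
    """Split a BAYESIAN_QUERY string into (NODE, PATH, EVIDENCE)."""
    m = QUERY.fullmatch(q.strip())
    if m is None:
        raise ValueError("not a BAYESIAN_QUERY")
    node = m.group("node")                                        # IdeH = IdeL | IdeH
    path = m.group("path").split(".") if m.group("path") else []  # Rel.Rel...
    evidence = [e.strip() for e in m.group("ev").split(",")]      # one or two NODEs
    return node, path, evidence

print(parse_bq("P(PERSON = Woman |leads PROJECT = Innovation)"))
# ('PERSON = Woman', ['leads'], ['PROJECT = Innovation'])
```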


Fig. 5 Example of polytree

It is possible to represent both a prior probability and a conditional probability by specifying an evidence. Evidence can refer either to is-a ontology relations among classes or to ontology object properties.

According to layered Bayesian networks, we can perform reasoning tasks at each level. When we refer to the lower level, we involve only is-a relationships by performing reasoning among t nodes belonging to the IdeL syntactic category. At the upper level, we deal with arcs connecting nodes that belong to the IdeH syntactic category. All the inference schemas we are going to present make inferences at both levels of the network. However, at the upper level, we need to remember the semantics of each arc defined in the ontology, i.e. which object properties are involved in that particular reasoning task. Consequently, the Bayesian conditioning is annotated by the path that is made up of the arcs connecting the query node to the evidence node (e.g. the set of object properties connecting the query class to the evidence class). Since the network is a polytree, it is possible to specify the path in a unique way. Each path is formed by Rel elements. Nodes and paths refer to ontology classes and object properties respectively, according to the mapping Γs.

From the point of view of the polytree structure, each evidence node is under or over a query node. An example is given in Fig. 5b. The set of nodes over D is {A, B, C, E} (formally, over(D) = {A, B, C, E}), because all these nodes are connected to D only through its parents, i.e. A and B. The set of nodes under D is {F, G, H, I} (formally, under(D) = {F, G, H, I}), because all these nodes are connected to D only through its direct descendants, i.e. F and G.
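A sketch of how this partition can be computed on the polytree skeleton follows; the arc set is our reading of Fig. 5b, chosen to be consistent with the two sets just given.

```python
from collections import defaultdict

def over_under(node, arcs):
    """Partition the other nodes into over(node) / under(node); arcs are (parent, child)."""
    kids, pars = defaultdict(set), defaultdict(set)
    for u, v in arcs:
        kids[u].add(v); pars[v].add(u)

    def reach(start, banned):
        seen, todo = set(), [start]
        while todo:
            n = todo.pop()
            if n in seen or n == banned:
                continue
            seen.add(n)
            todo += kids[n] | pars[n]   # walk the undirected skeleton, avoiding `banned`
        return seen

    over = set().union(*(reach(p, node) for p in pars[node])) if pars[node] else set()
    under = set().union(*(reach(c, node) for c in kids[node])) if kids[node] else set()
    return over, under

arcs = [("A", "D"), ("B", "D"), ("C", "B"), ("B", "E"),
        ("D", "F"), ("D", "G"), ("F", "H"), ("G", "I")]
print(over_under("D", arcs))  # ({'A', 'B', 'C', 'E'}, {'F', 'G', 'H', 'I'})
```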

6.2 BQ language: operational semantics

Now we will describe the main inference rules on polytrees, at the upper level of the network.⁴

⁴ The rules for reasoning at the lower level are structurally the same, although arcs at the lower level represent is-a relationships.


The conditional probability of a node depends both on the evidence node and on which path binds it to the evidence node. Each arc has its probability distribution related to the domain and the range of the object property it refers to, but it also induces its conditioning on the other HLNs along a query path.

As stated before, it is more convenient to dynamically compute all the induced conditional probabilities. In order to compute these probabilities, the Bayes formula is used as described in Section 5.4, but the induced sub-space in which its arguments are evaluated depends on the set of arcs through which we reached the current arc. This sub-space is denoted by σ in the following. For this purpose, we introduce the following functions and definitions.

Definition 12 (Conditioning space function) Let T be a node of a polytree R. Let p be an arc entering T. The function

sub_p(T)

returns the sub-space of instances of T.

If we need to refer to the whole subspace of instances of T, independently of p, we write sub(T).

In the operational semantics, first of all each Bayesian query is decomposed into a product of directly computable probabilities, and then the rules for computing each conditional probability factor are applied, taking into account the induced conditional probabilities.

Tables 2 and 3 show the main semantic rules for the BQ language, related to the most general case, in which each node is specified in the form HLN = LLN. Hereafter, we use t and T to refer to generic LLNs and HLNs respectively. Since the network is a polytree at both levels, each of the following definitions is valid for both t and T nodes. The following definitions are used in the formal semantics.

Definition 13 (Root) Let t be a node in R. The function root is defined as follows:

$$root(t, R) = \begin{cases} true & \text{if } t \text{ is a root of the polytree} \\ false & \text{otherwise} \end{cases}$$

Definition 14 (Parent) Let t be a node in R, and {ti} be a set of nodes in R. The function parent is defined as follows:

$$parent(t, R) = \begin{cases} \emptyset & \text{if } root(t, R) \\ \{t_i\} & \text{otherwise} \end{cases}$$

where, for i = 1, ..., n, each ti is a direct ancestor of t in R.

Definition 15 (Path concatenation) Let p1, p2 be paths in R connecting T to T1 and T to T2 respectively, with T1 ∈ under(T) and T2 ∈ over(T). Then p = p1.p2 is the path connecting T1 to T2.

Notice that, since both p1 and p2 are unique paths, p is also a unique path.


Table 2 Operational semantics: recursive rules

RULE 1 :: UNDER_OVER

T2 ∈ under(T1)   T3 ∈ over(T1)   T2 →p1 T1   T1 →p2 T3   p = p1.p2
σ1 = sub(T1)   σ2 = sub(T2)
P(T2 = t2 |p1 T1 = t1, R, σ2) → n1   P(T1 = t1 |p2 T3 = t3, R, σ1) → n2
P(T2 = t2 |p1 T1 = t̄1, R, σ2) → n3   P(T1 = t̄1 |p2 T3 = t3, R, σ1) → n4
———————————————————————————
P(T1 = t1 |p (T2 = t2, T3 = t3), R, σ) → (n1 · n2) / (n1 · n2 + n3 · n4)

RULE 2 :: OVER

T2 ∈ over(T1)   T1 →p T2   σ1 = sub(T1)
{T0} = parent(T1, R)   {p0} = arc(T1, {T0}, R)
Tj ∈ {T0}   Tj →tail(p) T2   pj = arc(T1, Tj, R)   σj = sub_pj(Tj)
P(T1 = t1 |{p0} {T0}, R, σ1) → n1   P(Tj |tail(p) T2 = t2, R, σj) → n2
Ti ∈ ({T0} / Tj) ∧ (pi = arc(T1, Ti, R)) ∧ (σi = sub(Ti))
P(Ti |pi ε, R, σi) → mi
———————————————————————————
P(T1 = t1 |p T2 = t2, R, σ) → n1 · n2 · ∏(i = 1 .. #{T0} − 1) mi

RULE 3 :: UNDER

T2 ∈ under(T1)   T2 →p T1   {T0} = parent(T2, R)
Tk ∈ {T0}   T1 →prefix(p) Tk   pk = arc(T2, Tk, R)
σ1 = sub(T1)   σ2 = sub(T2)   σk = sub_pk(Tk)
P(T2 = t2 |pk Tk, R, σ2) → n2
P(Tk |prefix(p) T1 = t1, R, σk) → nk
P(Tk |prefix(p) T1 = t̄1, R, σk) → n4
p = pm.tail(p)
P(T1 = t1 |pm ε, R, σ1) → n1   P(T1 = t̄1 |pm ε, R, σ1) → n5
———————————————————————————
P(T1 = t1 |p T2 = t2, R, σ) → (n2 · n1 · nk) / ((n2 · n1 · nk) + (n2 · n4 · n5))

Definition 16 (Prefix and tail) Let {p1, p2, ..., pn} be a set of paths belonging to the polytree R, such that p = p1.p2.p3. ··· .pn−1.pn is a path and p ∈ R. We define the following functions:

prefix(p) = p1.p2. ··· .pn−1

and

tail(p) = p2.p3. ··· .pn

Definition 17 (Arc) Let T be a node belonging to the polytree R. The function

arc(T, parent(T, R), R)

returns the set of the names of the object properties corresponding to the arcs connecting T to its parents.


Table 3 Operational semantics: non-recursive rules

RULE 4 :: INITCOND

{T0} = parent(T1, R)   {p0} = {pi | Ti ∈ {T0} ∧ pi = arc(T1, Ti, R)}
n2 = #TRIPLE({Γs⁻¹({T0})}, {Γs⁻¹({p0})}, Γs⁻¹(T1))
n1 = #TRIPLE({Γs⁻¹({T0})}, {Γs⁻¹({p0})}, σ)
———————————————————————————
P(T1 = t1 |{p0} {T0}, R, σ) → n1 / n2

RULE 5 :: INITPRIOR

root(T, R)   T1 →p T
n1 = #TRIPLE({Γs⁻¹(T)}, {Γs⁻¹(p)}, σ)
n2 = #INST(Γs⁻¹(T))
———————————————————————————
P(T |p ε, R, σ) → n1 / n2

Finally, we use the notation T1 →p T2 to mean that the path p connects the node T1 to T2.

7 Example of query execution by means of semantic rules

Here we show an example of query computation using the semantic rules. It refers to the ontology depicted in Fig. 6. It involves the concepts of Project, representing various kinds of projects, Event, representing changes in the status or profile of each company, Sector, representing the area in which each company operates, Company, and Person. All these classes are bound to each other by the hasCeo object property, connecting Person to Company, hasSector, connecting Company to Sector, hasEvent, connecting Company to Event, and leads, connecting Person to Project.

The compiling process builds the two-level Bayesian network shown in Fig. 8. Figure 7a provides a snapshot of the interface of the system supporting the compilation process. As can be seen in Fig. 8, each HLN is a Bayesian network coding all the is-a ontology relations, and the arcs among HLNs identify the upper level of the Bayesian network, coding all the ontology object properties. With this structure we can solve all Bayesian queries that can be written using the BQ language.

Suppose now that we want to know the likelihood that the chief executive officer of a company is female, given that she manages technology innovation projects, and that the company has been closed down. The evidence part is composed of a conjunction of two pieces of evidence that are conditions regarding Event and Project. Put another way, we want to know the probability that a path exists between Innovation Project and Woman, and between Woman and CompanyClosure.

Fig. 6 Example of ontology


Fig. 7 Usage of system GUI. a Ontology compiling process. b Query submission

By using our query language, we specify the object properties that we want to involve in our Bayesian query, as Fig. 7b shows:

$$P(PERSON = Woman \mid_{(leads),(hasCeo.hasEvent)} PROJECT = Innovation,\, EVENT = CompanyClosure,\, \sigma)$$

Since two evidence nodes are specified, the only rule we can apply is UNDER_OVER. The premise of this rule requires the verification of the following statements: PROJECT ∈ under(PERSON) and EVENT ∈ over(PERSON), where PROJECT represents the HLN to which Innovation belongs, PERSON represents the HLN to which Woman belongs, and EVENT represents the HLN to which CompanyClosure belongs; punder = leads connects the query HLN (hereafter Q) to the evidence PROJECT (E− from now on), pover = hasCeo.hasEvent connects the query HLN to the evidence EVENT (hereafter E+), and p = punder.pover.

With regard to the paths involved in each Bayesian query, we ignore the direction of each arc, and we consider only the direction of the whole path. Conventionally, we assume that the direction of each path goes from the node on the left of the symbol “|” to the node on the right. This means that when inference processes invert query and evidence, the direction of the Bayesian path is also inverted. We can separate this kind of evidence in the Bayesian query computation by using the Bayes formula:

$$P(Q \mid_{p_{over}, p_{under}} E^+, E^-) = \frac{P(E^- \mid_{p_{under}} Q, E^+) \cdot P(Q \mid_{p_{over}} E^+)}{P(E^- \mid_{p} E^+)}$$

We can re-write this formula as follows:

$$P(Q \mid_{p_{over}, p_{under}} E^+, E^-) = P(E^- \mid_{p_{under}} Q, E^+) \cdot P(Q \mid_{p_{over}} E^+) \cdot K$$

where K = 1 / P(E− |p E+) is a normalisation factor and is computed as shown in the following. Since E+ and E− are d-separated by Q, we obtain:

$$P(Q \mid_{p_{over}, p_{under}} E^+, E^-) = P(E^- \mid_{p_{under}} Q) \cdot P(Q \mid_{p_{over}} E^+) \cdot K$$


Fig. 8 2lBN derived by the ontology compiling process


In order to compute the factor K, we exploit the property that the sum of the probability of an event given one piece of evidence and the probability of its negation given the same evidence is equal to one. In our case we have:

$$P(Q \mid_{p_{over}, p_{under}} E^+, E^-) + P(\bar{Q} \mid_{p_{over}, p_{under}} E^+, E^-) = 1$$

By applying the Bayes formula to both members of the previous equation we obtain:

$$P(E^- \mid_{p_{under}} Q) \cdot P(Q \mid_{p_{over}} E^+) \cdot K + P(E^- \mid_{p_{under}} \bar{Q}) \cdot P(\bar{Q} \mid_{p_{over}} E^+) \cdot K = 1$$

from which we derive that 1/K = P(E− |punder Q) · P(Q |pover E+) + P(E− |punder Q̄) · P(Q̄ |pover E+). Replacing K in the first equation, we obtain:

$$P(Q \mid_{p_{over}, p_{under}} E^+, E^-, \sigma) = \frac{P(E^- \mid_{p_{under}} Q, \sigma') \cdot P(Q \mid_{p_{over}} E^+, \sigma'')}{P(E^- \mid_{p_{under}} Q, \sigma') \cdot P(Q \mid_{p_{over}} E^+, \sigma'') + P(E^- \mid_{p_{under}} \bar{Q}, \sigma') \cdot P(\bar{Q} \mid_{p_{over}} E^+, \sigma'')}$$

Since we are at the first step of a Bayesian query computation, the σ spaces of each factor are the whole spaces of their query nodes: σ′ = Innovation, and σ′′ = Woman. This is the conclusion of the UNDER_OVER rule. Since Q is an evidence over E− in P(E− |punder Q, σ′), and E+ is an evidence over Q in P(Q |pover E+, σ′′), we can now compute them separately.⁵ Concerning P(E− |punder Q, σ′), where E− = Innovation and Q = Woman, the system applies rule INITCOND, because the rule premise parent(PROJECT) = PERSON holds. This rule computes the probability of a node, given all its parent nodes, in the space σ′. Table 3 highlights that each Ti belonging to the set of nodes {T0} is connected to T1 via the arc pi. n2 counts all the ontology triples T1 = ⟨subject, predicate, object⟩, where object is any instance of T1 satisfying the intersection of the ranges of each predicate with domain subject. n1 counts the same triples, here called T2, where object contains instances belonging to the space σ′. Note that σ′ is a sub-space of T1, and this guarantees that n2 > n1. Finally, the rule computes the ratio between the number of instances belonging to object of T1 and the number of instances belonging to object of T2. In our case, the set of nodes {T0} consists of the single node PERSON, which is connected to PROJECT via leads (leads = arc(PROJECT, PERSON) holds). n1 counts all the ontology triples T1 = ⟨Woman, leads, Innovation⟩, and n2 counts the triples T2 = ⟨Woman, leads, All⟩, where All represents all the instances belonging to PROJECT.
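Once the four directly computable factors are available, the UNDER_OVER conclusion is a simple normalised product; the following is a sketch with placeholder numbers, not values from the paper's ontology.

```python
def under_over(p_em_q, p_q_ep, p_em_notq, p_notq_ep):
    """P(Q | E+, E-) = n1*n2 / (n1*n2 + n3*n4), cf. RULE 1 in Table 2."""
    num = p_em_q * p_q_ep                  # P(E-|Q) * P(Q|E+)
    den = num + p_em_notq * p_notq_ep      # ... + P(E-|~Q) * P(~Q|E+)
    return num / den

# e.g. P(E-|Q) = 0.20, P(Q|E+) = 0.30, P(E-|~Q) = 0.05, P(~Q|E+) = 0.70
print(under_over(0.20, 0.30, 0.05, 0.70))  # 0.06 / (0.06 + 0.035) ≈ 0.632
```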

Concerning P(Q |pover E+, σ′′), where E+ = CompanyClosure and Q = Woman, the system applies the rule OVER because EVENT ∈ over(PERSON). This rule recursively computes the probability of each ancestor of the query node, given the evidence, until either an ancestor reaches the evidence node, or the evidence becomes under the query of that ancestor. At each step a Bayesian query P(Q |pover E+) is decomposed into three factors:

(a) The probability of the query node Q given all its parents.
(b) The probability of the parent of Q which is connected to the evidence via p, given the evidence.

(c) The product of the a priori probabilities of the remaining parents of Q.

⁵ The computation process of the same factors with Q̄ instead of Q is analogous and is omitted.

In our case, the path binding PERSON to EVENT consists of two specific arcs, hasCeo and hasEvent. The first step involves the parents of PERSON. The rule OVER verifies that arc(PERSON, COMPANY) = hasCeo and parent(PERSON) = COMPANY.

Thus, concerning (a) we obtain P(PERSON = Woman |hasCeo COMPANY, σ′′), where σ′′ = sub(Woman) = Woman. Since parent(PERSON) = COMPANY, the rule INITCOND is applied and the (a) factor is immediately computed, as explained above. In this case PERSON has just one parent and the (c) factor is equal to one. Generally speaking, however, it is possible that a node has many parents. Among all the parents of Q, we need to select the parent from which we can reach the evidence E+. Since we are dealing with polytrees, the other parents are independent of the evidence. We need to compute all the a priori probabilities of the remaining parents of PERSON w.r.t. the arcs connecting them to PERSON. In this case there are no remaining parents of PERSON but, in general, we may need to compute probabilities of the following form:

$$P(T_i \mid_{p_i} \varepsilon)$$

which is computed by the INITPRIOR rule that computes the prior probability of a node.⁶ Since each arc has specific semantics given by the object property it refers to, the prior probability also has to take into account the arcs exiting from the node.

The remaining factor, (b), is:

$$P(COMPANY \mid_{tail(hasCeo.hasEvent)} EVENT = CompanyClosure, \sigma''')$$

where tail(hasCeo.hasEvent) = hasEvent is the path binding COMPANY, the selected parent of PERSON, to the evidence. Note that the new σ′′′ space on which we have to compute the new probability is sub_hasCeo(COMPANY), since we have just crossed hasCeo starting from PERSON.

Next, the system does not recursively re-apply the same rule to this factor, because the evidence EVENT is under the query COMPANY. This implies a change in the reasoning schema, from bottom-up inference to top-down inference. The system verifies that EVENT ∈ under(COMPANY), and thus the UNDER rule is applied. By applying the Bayes formula, the system inverts the evidence with the query, and the inference process terminates in one step. This is because parent(EVENT) = COMPANY, and no factors on which we can apply recursion are returned. In fact nk (and n4, i.e. the same factor with different truth values) are equal to one, because Tk and T1 are the same HLN (COMPANY). Thus, the main resulting factors are P(EVENT = CompanyClosure |hasEvent COMPANY, σ′′′) and P(COMPANY |hasEvent ε, σ′′′′),⁷ where σ′′′′ = COMPANY. Concerning the first factor, the system applies the INITCOND rule, and the probability is computed. The second factor is a prior probability, and INITPRIOR is applied since root(COMPANY) is verified.

⁶ When Ti is not a root node, the prior probability is computed by summing the prior probabilities of the parent nodes recursively, until the root nodes are reached.

⁷ The computation process of the same factors with the negation of COMPANY instead of COMPANY is analogous and is omitted.


The ratio in the rule conclusion is composed of n1, counting the triples ⟨COMPANY, hasEvent, CompanyClosure⟩, and n2, counting all the instances of COMPANY. The Bayesian query is solved, and the final probability value is obtained by multiplying all the above resulting factors.

8 Conclusions and future work

Dealing with uncertainty in knowledge representation and reasoning is a crucial problem in many applications of artificial intelligence. To this purpose, as discussed in Section 2, many proposals have focused on adding features to represent uncertainty in ontologies and to reason on the knowledge enriched in that way. Uncertainty is added at the TBox level by means of new expert knowledge.

The main contribution of our approach is a method for extracting the measures of uncertainty from the ontology itself, that is, from its ABox. On the one hand the method works if the ontology is populated, but on the other hand it is not incompatible with the acquisition of the probabilities from external experts. The general framework we are moving in is to look at the life cycle of an ontology as a very dynamic one, in which the process of populating the ontology can have a feedback on the TBox itself. So far, we have considered only the computation of the probability distributions out of the ABox, but we could imagine extracting further knowledge out of the instances by means of data mining methods.

When limiting ourselves to the problem of computing the probability distributions, there are two main issues still to be addressed: how to handle data property relationships, and how to deal with cycles in a 2lBN. Data properties represent attributes of each concept of a domain. A very interesting problem is how to take data properties into account in the 2lBNs. It would then be possible to make inferences over a richer Bayesian network, and to ask queries involving both object and data properties. For instance, with reference to the example in the previous section, we could formulate a query such as: “What is the likelihood of a company defaulting, given that it increases its capital by an amount greater than 300,000 Euros, that such an amount is greater than 25% of the initial capital, and that its CEO is a 70-year-old male?”.

With regard to dealing with cycles, when the structure of the Bayesian network is not a polytree, all the recursive reasoning procedures may not terminate, because there may be more than one path connecting two nodes. Reflexive object properties can cause loops because, in this case, a node is both the parent and the child of itself. For example, if we add a reflexive relation hasParent to the Company class of the previous example (meaning that a Company has a parent Company), we may want to ask the system the following question: “What is the probability that the CEO of a company is also the Director of its parent company?”. The facts involved in the query are that a company C1 has a CEO that is a person P1, the company C1 has a parent company that is C2, and C2 has a CEO that is P1. This query could be expressed by both a conjunction of queries and a conjunction of various sources of evidence. In some cases recursive procedures may not terminate because hasParent creates a cycle in the network. On the other hand, our main constraint is that the resulting graph, produced by the ontology compiling process, contains no cycles.

In any case, there are some solutions (Henrion 1988; Lauritzen and Spiegelhalter 1988; Shafer 1996) for dealing with networks that are not polytrees, and we could investigate how to integrate them into our method. In Lauritzen and Spiegelhalter (1988), for example, the authors group the nodes of the network into super-nodes, in such a way that the resulting network is a polytree. This process can be repeated, by grouping super-nodes into other super-nodes, until the network is a polytree. All the reasoning schemas for polytrees can then be used for making inferences, but for each super-node there are many conditional probability tables.

Future work will be devoted to extending the current experiments, which so far have been carried out on the Financial Risk Management Ontology of the EU Project MUSING (2006).

References

Baader, F., Calvanese, D., McGuinness, D. L., Nardi, D., & Patel-Schneider, P. F. (2003). The description logic handbook: Theory, implementation, applications. Cambridge: Cambridge University Press.

Bellandi, A., & Turini, F. (2009). Extending ontology queries with Bayesian network reasoning. In Proceedings of the IEEE 13th international conference on intelligent engineering systems.

Broekstra, J., Klein, M., Decker, S., Fensel, D., van Harmelen, F., & Horrocks, I. (2002). Enabling knowledge representation on the web by extending RDF Schema. Computer Networks, 39(5), 609–634.

Cooper, G. F. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42, 393–405.

Costa, P. C. G., & Laskey, K. B. (2005). Bayesian logic for the 23rd century. In Proceedings of uncertainty in artificial intelligence.

Cowell, R. G., Dawid, A. P., Lauritzen, S. L., & Spiegelhalter, D. J. (2001). Probabilistic networks and expert systems. New York: Springer.

Dagum, P., & Luby, M. (1993). Approximating probabilistic inference in Bayesian belief networks is NP-hard. Technical Report ID: KSL-91-53.

Danev, B., Devitt, A., & Matusikova, K. (2006). Constructing Bayesian networks automatically using ontologies. Second workshop on formal ontologies meets industry (FOMI 2006), Trento, Italy, 14 December 2006, Applied Ontology.

Ding, Z., & Peng, Y. (2004). A probabilistic extension to the web ontology language OWL. In Thirty-seventh Hawaii international conference on system sciences.

Ding, Z., & Peng, Y. (2005). Modifying Bayesian networks by probabilistic constraints. In Proceedings of the conference on uncertainty in artificial intelligence.

Ding, Z., Peng, Y., & Pan, R. (2004). A Bayesian approach to uncertainty modelling in OWL ontology. In Proceedings of the international conference on advances in intelligent systems.

Ding, Z., Peng, Y., & Pan, R. (2005). BayesOWL: Uncertainty modeling in semantic web ontologies. In Soft computing in ontologies and semantic web. New York: Springer.

Flach, P. A., & Lachiche, N. (2000). Decomposing probability distributions on structured individuals. In Reports of the 10th international conference on inductive logic programming.

Flach, P. A., & Gyftodimos, E. (2004). Hierarchical Bayesian networks: An approach to classification and learning from structured data. In Knowledge representation and search.

Geiger, D., Verma, T., & Pearl, J. (1990). Identifying independence in Bayesian networks. Networks, 20, 507–533.

Guarino, N., & Poli, R. (1995). Formal ontology in conceptual analysis and knowledge representation. International Journal of Human and Computer Studies, 43, 625–640.

Henrion, M. (1988). Propagation of uncertainty in Bayesian networks by probabilistic logic sampling. Uncertainty in Artificial Intelligence, 2, 149–163.

Jensen, F. V., & Nielsen, T. D. (2007). Bayesian networks and decision graphs (2nd ed.). New York: Springer.

Kim, J. H., & Pearl, J. (1983). A computational model for causal and diagnostic reasoning in inference systems. In Proceedings of the eighth international joint conference on artificial intelligence.

Lauritzen, S. L., & Spiegelhalter, D. J. (1988). Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society B, 50(2), 157–224.

McGarry, K., Garfield, S., Morris, N., & Wermter, S. (2007a). Integration of hybrid bio-ontologies using Bayesian networks for knowledge discovery. In Proceedings of the third international workshop on neural-symbolic learning and reasoning.

McGarry, K., Garfield, S., & Wermter, S. (2007b). Auto-extraction, representation and integration of a diabetes ontology using Bayesian networks. In Proceedings of the 20th IEEE international symposium on computer-based medical systems (pp. 612–617).

Niedermayer, D. (2008). An introduction to Bayesian networks and their contemporary applications (Vol. 156, pp. 117–130). New York: Springer.

Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. San Mateo: Morgan Kaufmann.

Shafer, G. (1996). Probabilistic expert systems. Philadelphia: SIAM.

The Integrated European MUSING project (2006). http://www.musing.eu/. Accessed 1 July 2010.