
Accepted Manuscript

Ontology Based Approach to Bayesian Student Model Design

Ani Grubišić, Slavomir Stankov, Ivan Peraić

PII: S0957-4174(13)00228-5

DOI: http://dx.doi.org/10.1016/j.eswa.2013.03.041

Reference: ESWA 8493

To appear in: Expert Systems with Applications

Please cite this article as: Grubišić, A., Stankov, S., Peraić, I., Ontology Based Approach to Bayesian Student Model

Design, Expert Systems with Applications (2013), doi: http://dx.doi.org/10.1016/j.eswa.2013.03.041

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers

we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and

review of the resulting proof before it is published in its final form. Please note that during the production process

errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Ontology Based Approach to Bayesian Student Model Design

Ani Grubišić (a), Slavomir Stankov (a), Ivan Peraić (b)

(a) Faculty of Science, Teslina 12, Split, Croatia
(b) High School “Biograd na moru”, Augusta Šenoe 29, 23210 Biograd na moru, Croatia

[email protected], [email protected], [email protected]

Tel: ++385 21 385 133, Fax: ++385 21 384 086

Corresponding author: Ani Grubišić, [email protected], Faculty of Science, Teslina 12, Split, Croatia. Tel: ++385 21 385 133, Fax: ++385 21 384 086


Abstract

A probabilistic student model based on a Bayesian network enables drawing conclusions about the state of a student's knowledge, and the further learning and teaching process depends on these conclusions. To implement a Bayesian network in a student model, it is necessary to determine the "a priori" probabilities of the root nodes, as well as the conditional probabilities of all other nodes. In our approach, we enable non-empirical, mathematical determination of the conditional probabilities, while the "a priori" probabilities are determined empirically from knowledge test results. The concepts that are believed to have been learned or not learned represent the evidence. Based on the evidence, it is concluded which concepts need to be re-learned and which do not. The study described in this paper examines 15 ontology-based Bayesian student models. In each model, special attention is devoted to defining the "a priori" probabilities, the conditional probabilities and the way the evidence is set, in order to test the success of student knowledge prediction. Finally, the obtained results are analyzed and guidelines for ontology-based Bayesian student model design are presented.

Keywords

Intelligent tutoring systems, e-learning, knowledge modeling, probabilistic algorithms, Bayesian networks, conditional probabilities

1. Introduction

Today, there is a ubiquitous need and desire to improve the quality and availability of various educational systems. Education has become a lifelong process, and its quality has become a driving concern. It has become clear that this quality cannot be achieved without the appropriate and effective use of information and communication technology (ICT) in the learning and teaching process. The use of ICT in learning and teaching enabled a concept called e-learning. High-quality implementation of e-learning, in the form of e-learning systems, brings many advantages to the learning and teaching process and enables the desired new, modern and high-quality education. The introduction of these technologies and innovations in the field of education not only reduces the cost of applying pedagogical theory, but also opens opportunities to explore models from different fields (Millán & Pérez, 2002).

One special class of e-learning systems is Intelligent Tutoring Systems (ITSs), which, in contrast to traditional systems that support the learning and teaching process, have the ability to adapt to each student. It is this ability that allows the improvement of the learning and teaching process, because it has been shown that the best approach is one-on-one tutoring (Bloom, 1984).

Intelligent tutoring systems are a generation of computer systems intended to support and enhance the learning and teaching process in a selected domain of knowledge, while respecting the individuality of both those who teach and those who learn ((Wenger, 1987), (Ohlsson, 1986), (Sleeman & Brown, 1982)). An intelligent tutoring system becomes a student's personal "computer teacher". A computer teacher, on the one hand, is always cheerful and shows no negative emotions, while a student, on the other hand, has no need to hide ignorance and can communicate freely.

Intelligent tutoring systems can adapt the content and the manner of presentation of certain topics to different student abilities. In this sense, knowledge is the key to intelligent behavior, and therefore intelligent tutoring systems have the following basic kinds of knowledge: (i) knowledge that the system has about the domain knowledge (expert module), (ii) teaching principles and methods for applying these principles (teacher module), and (iii) methods and techniques for modeling the student's acquisition of knowledge and skills (student module).

Nowadays, ontology is commonly used to formalize knowledge in ITSs (Lee, Hendler & Lassila, 2001). An ontology describes a conceptual model of a domain, that is, it represents objects, concepts and other entities that are believed to exist, and the relations among them (Genesereth and Nilsson, 1987, according to (Gruber, 1993)). The main structural elements of the conceptual model are the concepts and relations. Consequently, every area of human endeavor can be represented by a set of properly related concepts that correspond to the appropriate domain knowledge. An ontological description of the domain knowledge provides a simple formalization of declarative knowledge using various tools that support working with concepts and relations.

The component of an ITS that represents the student's current state of knowledge and skills is called the student model. The student model is a data structure, and diagnosis is the process that manipulates it. The student model as such represents a key component of an ITS. The design of these two components is called the student modeling problem (VanLehn, 1988). If a student model is "bad" to the extent that it does not even closely describe the student's characteristics, then all the decisions of other ITS components that are based on this model are of poor quality. Therefore, considerable research is carried out in the field of student modeling.

The design and implementation of intelligent tutoring systems has systematically contributed, and still contributes, to the development of methods and techniques of artificial intelligence (AI). Artificial intelligence, as the area that connects computers and intelligent behavior, emerged in the late 1950s and early 1960s with pioneers such as Alan Turing, Marvin Minsky, John McCarthy and Allen Newell (Urban-Lurain, 1996). AI is essentially oriented towards knowledge representation, natural language understanding and problem solving, all of which are equally important for the development of the intelligent tutoring concept (Beck, Stern & Haugsjaa, 1996).

One of the techniques widely used in different areas of artificial intelligence is Bayesian networks. The idea of Bayesian networks is not new; they were first employed in the 1980s in the field of expert systems. The area truly expanded in the 1990s, probably due to the increase in computer speed and renewed interest in distributed systems. Large computational complexity remains one of the biggest barriers to a wider use of Bayesian networks.

Unlike traditional expert systems, whose main purpose is modeling experts' knowledge and replacing them in the process of planning, analyzing, learning and decision making, the purpose of a Bayesian network is modeling a particular problem domain. Thus Bayesian networks help experts while studying the causes and consequences of the problems they model (Charniak, 1991).

It is extremely important to emphasize domain modeling as the most important feature of Bayesian networks. Domain modeling refers to collecting and determining all the values necessary for Bayesian network initialization. Specifically, it refers to modeling dependencies between variables. Dependencies are modeled using a network structure and a set of conditional probabilities (Charniak, 1991).

Integrating student models with Bayesian networks in ITSs is one way to facilitate student learning. Specifically, such a model allows making conclusions about the actual student knowledge. It also enables a computer tutor to guide the learning and teaching process towards learning only those concepts that the student has not already learned.

The aim of this paper is to design a student model based on Bayesian networks and ontologies, and to compare the results of its predictions with actual student knowledge. In the majority of Bayesian networks, all probabilities are determined empirically, and this presents the biggest problem in their design. Therefore, novel methods for parameter estimation in Bayesian networks are an important research endeavor, given the utility of the Bayesian approach for student modeling. In our approach, we enable non-empirical, mathematical determination of the conditional probabilities, while the "a priori" probabilities are determined empirically from knowledge test results.

The second chapter presents the theoretical background underlying Bayesian networks. The third chapter describes fifteen probabilistic student models that differ in the way the conditional probabilities are defined and in the way the evidence is set. Finally, the obtained results are analyzed and guidelines for ontology-based Bayesian student model design are presented.

2. Application of Bayesian theory in student modeling

One major difficulty that arises in student modeling is uncertainty. An ITS needs to build a student model based on small amounts of very uncertain information, because certain information can be obtained only from students' activities in the system. If these activities do not occur, the diagnosis must be carried out on the basis of uncertain information. Moreover, because the ITS bases its decisions on the student model, the uncertainty in the student model contributes to a poorly adaptive learning and teaching process. The student model is built from observations that the ITS makes about the student. The student model can be viewed as a compression of these observations: raw data are combined, some of them are ignored, and the result is a summary of beliefs about the student.

Powerful general theories of decision making have been developed specifically for managing uncertainty. One of them is Bayesian probability theory ((Bayes, 1763), (Cheng & Greiner, 2001), (Mayo, 2001)), which deals with reasoning under uncertainty. Bayesian networks are one of the current approaches to modeling under uncertainty ((Mayo, 2001), (Conati et al., 1997), (VanLehn et al., 1998), (Conati, Gertner & Vanlehn, 2002), (Gamboa & Fred, 2002)). This technique combines the strict formalism of probability with a graphical representation and efficient inference mechanisms.


A Bayesian network is a probabilistic graphical model that displays dependencies between nodes (Pearl, 1988). It is a directed acyclic graph in which nodes represent variables and edges represent their interdependence. A node is a parent of a child if there is an edge from the former to the latter. Nodes that have no parents are called roots, and these variables are placed in the Bayesian network first. The roots are not influenced by any node, while they affect their children. When all the remaining nodes, down to those that have no children, have been added, the structure of the Bayesian network is defined (Korb & Nicholson, 2011).

After the structure of the Bayesian network is defined, it is necessary to define the possible values that each node can take and the conditional probabilities of the nodes (Korb & Nicholson, 2011). For nodes without parents, only "a priori" probabilities have to be defined. "A priori" probabilities for all other nodes can be derived from the corresponding conditional probability tables together with the "a priori" probabilities of their parents. Therefore, it is superfluous to explicitly specify "a priori" probabilities for nodes that have parents (Korb & Nicholson, 2011).

The dimension of the conditional probability table is determined by the parents of a node. In the case of discrete binary variables, for a node with n parents the conditional probability table has 2^n rows. Each row contains one of the 2^n combinations of values T and F for the parents; we denote by t the number of values T in a row. We will use this number t in the conditional probability table rows to enable non-empirical, mathematical determination of probabilities.

The conditional probability function has two well-marked values: (i) the probability that a student knows the concept itself even though he/she does not know the parents of the node (a lucky guess), and (ii) the probability that a student does not know the node itself even though he/she knows all of its parents (an unlucky slip). In the literature, these well-marked values are set to 0.1 (Mayo, 2001). Consequently, the probability of truthful knowing is 1 - 0.1 = 0.9. That means that if all the parents are known, we predict that the concept itself is known with probability 0.9.

If the conditional probability tables are known, the Bayesian network can be used for probabilistic inference about the probability of any node in the network. Based on the Bayesian network, the ITS can calculate the expectation (probability) of all unknown variables from the known variables (the evidence) (Charniak, 1991).
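To make this inference step concrete, the following minimal sketch performs exact inference by enumeration over a tiny three-node binary network. It is our illustration only: the node names and probabilities are invented for the example, the enumeration algorithm is a generic textbook method, and the paper itself builds its 73-node network in GeNIe rather than in code.

import itertools

# A tiny binary Bayesian network: two roots and one child.
# parents[] gives the structure; cpt[] gives P(node=True | parent values),
# with roots keyed by the empty tuple ().
parents = {
    "Central_unit": [],
    "Memory": [],
    "Mass_memory": ["Central_unit", "Memory"],
}
cpt = {
    "Central_unit": {(): 0.5},
    "Memory": {(): 0.33},
    "Mass_memory": {  # keyed by (Central_unit, Memory)
        (True, True): 0.9, (True, False): 0.45,
        (False, True): 0.45, (False, False): 0.1,
    },
}
nodes = list(parents)

def joint(assign):
    # P(full assignment) = product of the local conditional probabilities.
    p = 1.0
    for n in nodes:
        p_true = cpt[n][tuple(assign[q] for q in parents[n])]
        p *= p_true if assign[n] else 1.0 - p_true
    return p

def posterior(query, evidence):
    # P(query=True | evidence), summing the joint over all hidden nodes.
    num = den = 0.0
    hidden = [n for n in nodes if n not in evidence]
    for values in itertools.product([True, False], repeat=len(hidden)):
        assign = dict(evidence, **dict(zip(hidden, values)))
        p = joint(assign)
        den += p
        if assign[query]:
            num += p
    return num / den

print(posterior("Memory", {}))                     # prior marginal of Memory
print(posterior("Memory", {"Mass_memory": True}))  # posterior given evidence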

3. Ontological Approach to the Bayesian Student Model Design

In this section we describe an approach to Bayesian student model design in an intelligent tutoring system whose expert knowledge is presented in the form of an ontology. We observe only the model, not the diagnosis process itself.

For a student who has never learned the domain knowledge, we believe that he/she knows concepts from the domain knowledge graph with very small probability; that is, we draw conclusions about his/her knowledge without testing it. In this case, we consider that the student knows a concept from the domain knowledge with probability 0. Likewise, if we have tested a student's knowledge of some domain knowledge and determined with certainty that the student knows all the concepts from that domain knowledge, then we can argue that the student knows all those concepts with probability 1.


The problem is how to determine probabilities between 0 (not knowing) and 1 (knowing). That is why we define an expert Bayesian student model (Mayo, 2001) over the domain knowledge concepts, combined with the overlay model (Carr & Goldstein, 1977). For each student and each concept in the domain knowledge, we define the probability of the student knowing that concept.

When a student model is created, all probabilities are 0. After each test question that examines the knowledge of one or more concepts and relations, the probabilities of the concepts involved in those relations change. Correct answers increase, while incorrect answers reduce, the probability of knowing the concepts involved.

The teacher uses the probabilities from the student model to determine which concepts the student knows with high probability (e.g., more than 0.8 (Bloom, 1976)), so that the system does not bother the student with learning and teaching concepts he/she already knows. In this way, the Bayesian student model serves as a "sieve" that passes to the learning and teaching process only those concepts that the student does not know with high probability.

We have developed a methodology for determining the most suitable way of calculating conditional probabilities, as well as the most suitable way of setting evidence in such an environment. We explain the structure of the domain knowledge ontology and the design of the Bayesian network, and present the results of the applied methodology for selecting the most suitable Bayesian student model.

For the purpose of this research, we have used the adaptive e-learning system Adaptive Courseware Tutor (AC-ware Tutor) (Grubišić, 2012). This system has ontological domain knowledge, as well as knowledge tests that enabled us to obtain instances of actual student knowledge before and after a knowledge test.

3.1. Domain Knowledge Ontology

Domain knowledge is presented with concepts and the relations between them. As we have to indicate the direction of a relation between concepts, we use the terms child and parent. In order to clearly indicate, for each relation in the ontology, which concepts it connects and what the nature of that relation is, we introduce the following definition (Definition 1):

Definition 1: Let E_CON = {K_1, …, K_n}, n ≥ 0, be a set of concepts, let E_REL = {r_1, …, r_m} ∪ {has_subtype, has_instance, has_part, slot, filler}, m ≥ 0, be a set of relations, and let ∅_E be an empty element. Domain knowledge DK is a set of triplets (K_1, r, K_2) defining that the concepts K_1 and K_2 are associated by the relation r. In this way we define that the concept K_1 is the parent of the concept K_2 and that the concept K_2 is the child of the concept K_1.

Since the basic elements of the domain knowledge triplets are concepts and the relations between them, we use graph theory as the mathematical foundation for managing subsets and elements of domain knowledge, as well as for domain knowledge visualization (Gross & Yellen, 1998). Therefore, we define a directed domain knowledge graph to which all the rules of graph theory apply (Definition 2):

Definition 2: For domain knowledge DK we define the directed domain knowledge graph DKG = (V, A), where the set of vertices is V = E_CON and the set of edges A = {(K_1, K_2) | (K_1, r, K_2) ∈ DK, r ≠ ∅_E, K_1 ≠ K_2} is the set of ordered pairs of those concepts from the domain knowledge that are related.

The set of concept K_x's parents is the set Parents_Kx = {K ∈ E_CON | (K, r, K_x) ∈ DK, K ≠ K_x, r ∉ {slot, filler, ∅_E}} = {K ∈ V | (K, K_x) ∈ A, K ≠ K_x}. The number pK_x is equal to the number of elements in the set Parents_Kx and denotes the number of concept K_x's parents.

The set of concept K_x's children is the set Children_Kx = {K ∈ E_CON | (K_x, r, K) ∈ DK, K ≠ K_x, r ∉ {slot, filler, ∅_E}} = {K ∈ V | (K_x, K) ∈ A, K ≠ K_x}. The number cK_x is equal to the number of elements in the set Children_Kx and denotes the number of concept K_x's children.

A vertex of DKG is called a root if it has no parents and has children. A vertex of DKG is called a leaf if it has parents and no children.
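As a small illustration of Definitions 1 and 2, the following sketch (ours, not part of the paper) builds the edge set A from the "Mass memory" triplets quoted later in Section 3.2.2 and derives each concept's parents, children and the counts pK_x and cK_x:

# Building DKG = (V, A) from domain knowledge triplets (Definitions 1 and 2).
# The triplets are only a fragment of the real ontology; slot/filler
# relations (excluded from parent/child counts) do not appear here.
DK = [
    ("Memory", "has_subtype", "Mass memory"),
    ("Mass memory", "has_subtype", "Floppy disk"),
    ("Mass memory", "has_instance", "Hard disk"),
    ("Mass memory", "has_instance", "Compact disc"),
]

A = {(k1, k2) for (k1, r, k2) in DK if r is not None and k1 != k2}
V = {k for edge in A for k in edge}

def parents(kx):
    # Parents_Kx: concepts K with an edge (K, Kx) in A.
    return {k for (k, c) in A if c == kx and k != kx}

def children(kx):
    # Children_Kx: concepts K with an edge (Kx, K) in A.
    return {k for (p, k) in A if p == kx and k != kx}

for kx in sorted(V):
    p, c = len(parents(kx)), len(children(kx))
    kind = "root" if p == 0 and c > 0 else ("leaf" if c == 0 and p > 0 else "inner")
    print(f"{kx}: pKx={p}, cKx={c} ({kind})")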

These different types of relationships among the concepts of the ontology describe the semantics of the related nodes, but they are treated completely equally when it comes to domain knowledge graph design. The only thing that matters is whether a relation between two nodes exists and what the direction of that relation is.

We define a weight function X_V : V_DKG → [0, 1] on the domain knowledge graph, where X_V(K_x) corresponds to the probability of a student knowing concept K_x. The values of the function X_V are determined after each knowledge test, and the calculation of its values depends on the question scores and on a concept's parents and children.

In our approach, the values of the function X_V depend on another weight function, defined in Definition 3:

Definition 3: The function X_A : A_DKG → {-1, 0, 1, …, max}, defined by X_A(K_xK_y) = the score obtained by answering a question that applies to the edge K_xK_y, for every edge K_xK_y ∈ A_DKG, is a weight function on the set of edges of the domain knowledge graph.

When the student model initializes, all edges in the domain knowledge graph have the weight -1, that is, X_A(K_xK_y) = -1 for every K_xK_y ∈ A_DKG, which means that the knowledge about the relationship between those two concepts has not been tested yet. The function X_A assigns weights to the edges that connect the concepts mentioned in a given question. Each tested edge has a weight between 0 and max, where max is an integer that corresponds to the maximum score that can be assigned to a question. Thus, the domain knowledge graph with the weight function X_A becomes an edge-weighted graph whose weighting function values change after each knowledge test.

Now, a mathematical definition of the function X_V is given (Definition 4):

Definition 4: The function X_V : V_DKG → [0, 1] defined, for every K_x ∈ V_DKG, by

X_V(K_x) = ( Σ X_A(K_xK_yi) + Σ X_A(K_yiK_x) ) / ( max · (pK_x + cK_x) ),

where the sums run over the tested edges (X_A ≠ -1) towards the children (K_xK_yi ∈ A_DKG) and towards the parents (K_yiK_x ∈ A_DKG) of K_x, is a weight function on the set of vertices of the domain knowledge graph.

The value X_V(K_x) represents the weighted sum of the values of the function X_A on the edges towards the parents of concept K_x and towards the children of concept K_x. We believe that the student knows the concept K_x completely if and only if X_V(K_x) = 1, which holds only if all the values of the function X_A on the edges towards parents and children equal max. Then the probability of knowing the concept is the highest, that is, equal to 1.
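The following sketch (ours) transcribes this reading of Definition 4 directly: the scores of the already-tested edges (X_A ≠ -1) incident to K_x are summed and divided by max · (pK_x + cK_x), so that X_V(K_x) = 1 exactly when every incident edge has been tested and scored max. The function name, data layout and the value of max are our own assumptions for the example.

MAX_SCORE = 4  # illustrative value of "max", the highest question score

def x_v(kx, A, x_a, max_score=MAX_SCORE):
    # A: set of (parent, child) edges; x_a: dict mapping edge -> score or -1.
    incident = [e for e in A if kx in e]  # edges towards parents and children
    if not incident:
        return 0.0
    tested_sum = sum(x_a[e] for e in incident if x_a[e] != -1)
    return tested_sum / (max_score * len(incident))

# E.g. a concept with two incident edges, one of them tested with the top score:
edges = {("Memory", "Mass memory"), ("Mass memory", "Floppy disk")}
scores = {("Memory", "Mass memory"): 4, ("Mass memory", "Floppy disk"): -1}
print(x_v("Mass memory", edges, scores))  # 4 / (4 * 2) = 0.5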

3.2. Bayesian Network Design

We define a Bayesian network BN over the domain knowledge concepts as a directed acyclic graph whose vertices are variables K_X (corresponding to the nodes of DKG) that can take the values T (true, learned) and F (false, not learned), and whose directed edges between the random variables show how they are related (corresponding to the edges of DKG).

To implement and test a new approach to probabilistic student model design, we defined a Bayesian network with 73 nodes (see Figure 1). These 73 nodes represent 73 concepts from the domain knowledge "Computer as a system" (Grubišić, 2012). It is important to emphasize that there are four root nodes: Computer system, Computer system model, Programming language and Logical gate. In this paper we used the Bayesian network software package GeNIe (Graphical Network Interface), which provides a graphical user interface for simple construction of Bayesian networks (http://genie.sis.pitt.edu).

Figure 1. Bayesian network structure


The adaptive e-learning system AC-ware Tutor has ontological domain knowledge, as well as knowledge tests that enable the realization of the functions X_A and X_V over the domain knowledge graph, as described in the previous section. The usage of AC-ware Tutor has enabled us to obtain instances of actual student knowledge before and after a knowledge test.

In the previous section we indicated that the function X_V is of paramount importance for determining the "a priori" probabilities of root nodes, but also for evidence setting.

The Bayesian student model contains all the concepts from the domain knowledge, as well as the value of the function X_V for each concept. When the student model initializes, the values of the function X_V are 0 for every concept. These values can change only after a knowledge test is conducted. Since the learning and teaching process consists of multiple learning-testing cycles, the student model has to be updated after each cycle, that is, after each knowledge test.

We observe two instances of a particular student model. Student_model_1 is an instance taken at the end of one learning-testing cycle. Student_model_2 is an instance taken at the end of the following learning-testing cycle. These two instances have the same structure, but the values of the function X_V differ for the concepts that were involved in the knowledge test (after a knowledge test, the values of the function X_V change). These two instances are the basis for the analysis presented below.

Based on the domain knowledge graph and the values stored in Student_model_1, three different Bayesian networks are designed (BN1, BN2, BN3) that have identical nodes and edges but different calculations of the conditional probabilities. These three networks are tested with evidence set in five different ways (Test1, …, Test5). Finally, there are a total of fifteen different Bayesian student models (Model1, …, Model15). After applying the methodology for selecting the most suitable Bayesian student model according to its prediction effectiveness, it becomes clear which of these models most accurately predicts the student's knowledge, on the basis of comparison with the actual values stored in Student_model_2.

3.2.1. Calculating the "a priori" probabilities

A Bayesian network is fully defined when the "a priori" probabilities of its root nodes and the conditional probability tables of its non-root nodes are defined. In our approach, the "a priori" probabilities of the root concepts are defined based on the values of the weight function X_V. Namely, the "a priori" probability P(K_X) of a root node K_X (corresponding to a root of DKG) is defined as follows: P(K_X) = P(K_X=T) = X_V(K_X), the probability of knowing the concept K_X, and P(K_X=F) = 1 - X_V(K_X), the probability of not knowing the concept K_X. If X_V(K_X) = 0, then P(K_X=T) = 0.1, because of the possibility of a lucky guess.

Table 1 shows a part of the values stored in Student_model_1. From the above formulas, the root nodes have the following "a priori" probabilities: Computer system (T=0.33, F=0.67), Computer system model (T=0.083, F=0.917), Programming language (T=0.1, F=0.9, lucky guess), Logical gate (T=0.0416, F=0.9584).


Table 1. Part of the student model instance

K_X                       X_V(K_X)
1.44MB                    0.125
Application software      0.375
Arithmetic operation      0
Arithmetic-logic unit     0.375
Assembler                 0
Basic                     0
C                         0.25
Central unit              0.5
Central processing unit   0.5
Disjunction               0.083
Diskette                  0.25
DOS                       0
Fortran                   0.125
I gate                    0
OR gate                   0.5
Information               0.125
Instruction               0.4375
Interpreter               0
Output unit               0.2917
Language translators      0
Capacity                  0.125
Compact disc              0.25
Compiler                  0
Conjunction               0
Logical operation         0.25
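As a quick check of this rule, the following sketch (ours) maps a root node's X_V value to its "a priori" probability, including the lucky-guess floor, and reproduces the four root-node probabilities quoted above:

LUCKY_GUESS = 0.1  # P(K=T) assigned when X_V(K) = 0 (Section 3.2.1)

def prior(xv):
    # "A priori" probability of a root node: (P(K=T), P(K=F)).
    p_true = xv if xv > 0 else LUCKY_GUESS
    return p_true, 1.0 - p_true

roots = [("Computer system", 0.33), ("Computer system model", 0.083),
         ("Programming language", 0.0), ("Logical gate", 0.0416)]
for name, xv in roots:
    t, f = prior(xv)
    print(f"{name}: T={t:.4f}, F={f:.4f}")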

3.2.2 Conditional probabilities calculation methods

The most important feature of our approach is the non-empirical, mathematical determination of the conditional probabilities. This is very important, as determining these probabilities is the bottleneck for a wider use of this complex technique for predicting student knowledge. Automating this segment simplifies Bayesian student model design. In our approach, the conditional probabilities depend only on the domain knowledge ontology, that is, only on the structure of the domain knowledge graph DKG.

To specify the Bayesian student model that provides the best and most accurate results, the conditional probabilities are calculated in three ways using the structure of the domain knowledge graph DKG. In the first calculation method, the conditional probabilities depend only on the number of parents; in the second method, on the number of parents and children; and in the third method, only on the number of children.

Common to all three calculation methods are equal "a priori" probabilities of the root nodes. What differs in the three approaches is the determination of the "weight" of truth (knowing). The probability of truthful knowing, 0.9, is in each approach divided by a different quantity (the number of parents, the number of parents and children, or the number of children), and this value is the "weight" of truth in the conditional probability tables.

The first method for calculating conditional probabilities is a variation of the leaky AND (Conati, Gertner & Vanlehn, 2002); it rests on the fact that the fewer parent concepts are known, the lower the probability of the target node (and thus the belief that the student knows the corresponding concept).

The Bayesian network we use for student modeling is derived from the domain knowledge ontology. The ontology includes semantically defined relationships among concepts that can be bidirectional. Since standard Bayesian networks consider only parent nodes in conditional probability calculations, in order to accommodate those bidirectional relations we have to fragment the original Bayesian network into a forest of nodes, ignoring the non-directed dependencies encoded in the original network. In this way we transform serial connections (P → X → C), from a node's parents (P) through the node (X) to the node's children (C), into converging connections (P → X ← C), from the node's parents (P) to the node (X) and from the node's children (C) to the node (X) (Korb & Nicholson, 2011).

Therefore, for the second and third methods the Bayesian network first has to be fragmented, in order to enable conditional probability calculations using child nodes as well. For example, if the domain knowledge ontology includes the triplets (Memory, has_subtype, Mass memory), (Mass memory, has_subtype, Floppy Disk), (Mass memory, has_instance, Hard Disk) and (Mass memory, has_instance, Compact Disc), we would like to see how the fact that the student knows the concepts Floppy Disk, Hard Disk and Compact Disc (child nodes) influences the prediction of knowing the concept Mass memory. Furthermore, we would like to see how the fact that the student knows the previously mentioned concepts combined with the concept Memory (child and parent nodes together) influences the prediction of knowing the concept Mass memory.

3.2.2.1 Conditional probabilities based on the number of node’s parents

In the first approach (BN1), the conditional probability table of a non-root node K_X is defined based on the number of its parents, pK_x. The number and percentage of nodes with a certain number of parents can be seen in Table 2.

Table 2. The structure of Bayesian network 1

Number of parents   Total number of nodes   Percentage of nodes
roots               4                       5.48%
1                   53                      72.60%
2                   12                      16.44%
3                   3                       4.11%
4                   1                       1.37%

In this approach, each value T in a conditional probability table row has the "weight" 0.9/pK_x. Therefore, the row "weight" is t·0.9/pK_x, where t is the number of values T in the row. This row "weight" defines the conditional probability of the non-root concept K_x: P(K_X=T | K_y, K_y ∈ Parents_KX, K_y=T ∨ K_y=F) = t·0.9/pK_x. In the same way, we define P(K_X=F | K_y, K_y ∈ Parents_KX, K_y=T ∨ K_y=F) = 1 - t·0.9/pK_x.

For example, let us analyze the determination of the conditional probability table of the node "Mass Memory", which has two parents ("Central Unit" and "Memory"). In this case, each T value in the conditional probability table has the "weight" 0.9/2 = 0.45. The conditional probability table of the concept "Mass Memory" is given in Figure 2.

Central Unit                               T     T     F     F
Memory                                     T     F     T     F
P(Mass memory=T | Central Unit, Memory)    0.9   0.45  0.45  0.1
P(Mass memory=F | Central Unit, Memory)    0.1   0.55  0.55  0.9

Figure 2. Conditional probabilities based on the number of node's parents
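Since the same t·0.9/d rule underlies all three networks, the table construction can be sketched generically. The code below is our illustration, not the paper's implementation: d is pK_x for BN1, pK_x + cK_x for BN2 and cK_x for BN3, and the all-F row is clamped to the lucky-guess value 0.1, matching Figures 2 to 4.

import itertools

def cpt(cond_nodes):
    # One row per combination of T/F values of the conditioning nodes;
    # P(K=T | row) = t * 0.9 / d, where t is the number of T values in the
    # row and d the number of conditioning nodes (parents for BN1, parents
    # plus children for BN2, children for BN3). The all-F row is clamped
    # to the lucky-guess probability 0.1.
    d = len(cond_nodes)
    table = {}
    for row in itertools.product([True, False], repeat=d):
        t = sum(row)
        table[row] = t * 0.9 / d if t > 0 else 0.1
    return table

# BN1 table for Mass memory (parents Central Unit, Memory); matches Figure 2.
for row, p in cpt(["Central Unit", "Memory"]).items():
    print(row, round(p, 2))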


3.2.2.2 Conditional probabilities based on the number of node’s parents and children

In the second approach (BN2), the conditional probability table of a non-root node K_X is defined based on the number of its parents and children, pK_x + cK_x. The number and percentage of nodes with a certain number of parents and children can be seen in Table 3.

Table 3. The structure of Bayesian network 2

Number of parents and children   Total number of nodes   Percentage of nodes
roots                            4                       5.48%
1                                26                      35.62%
2                                17                      23.29%
3                                10                      13.69%
4                                11                      15.07%
5                                3                       4.11%
6                                2                       2.74%

In this approach, each value T in a conditional probability table row has the "weight" 0.9/(pK_x+cK_x). Therefore, the row "weight" is t·0.9/(pK_x+cK_x). This row "weight" defines the conditional probability of the non-root concept K_x: P(K_X=T | K_y, K_y ∈ Parents_KX ∪ Children_KX, K_y=T ∨ K_y=F) = t·0.9/(pK_x+cK_x). In the same way, we define P(K_X=F | K_y, K_y ∈ Parents_KX ∪ Children_KX, K_y=T ∨ K_y=F) = 1 - t·0.9/(pK_x+cK_x).

For example, let us analyze the determination of the conditional probability table of the node "Mass Memory", which has two parents ("Central Unit" and "Memory") and three children ("Floppy Disk", "Hard Disk" and "Compact Disc"). In this case, each T value in the conditional probability table has the "weight" 0.9/(2+3) = 0.18. The conditional probability table of the concept "Mass Memory" is given in Figure 3.

Central Unit = T:

Memory         T    T    T    T    T    T    T    T    F    F    F    F    F    F    F    F
Floppy disk    T    T    T    T    F    F    F    F    T    T    T    T    F    F    F    F
Hard disk      T    T    F    F    T    T    F    F    T    T    F    F    T    T    F    F
Compact disc   T    F    T    F    T    F    T    F    T    F    T    F    T    F    T    F
P(Mass memory=T | ...)  0.9  0.72 0.72 0.54 0.72 0.54 0.54 0.36 0.72 0.54 0.54 0.36 0.54 0.36 0.36 0.18
P(Mass memory=F | ...)  0.1  0.28 0.28 0.46 0.28 0.46 0.46 0.64 0.28 0.46 0.46 0.64 0.46 0.64 0.64 0.82

Central Unit = F:

Memory         T    T    T    T    T    T    T    T    F    F    F    F    F    F    F    F
Floppy disk    T    T    T    T    F    F    F    F    T    T    T    T    F    F    F    F
Hard disk      T    T    F    F    T    T    F    F    T    T    F    F    T    T    F    F
Compact disc   T    F    T    F    T    F    T    F    T    F    T    F    T    F    T    F
P(Mass memory=T | ...)  0.72 0.54 0.54 0.36 0.54 0.36 0.36 0.18 0.54 0.36 0.36 0.18 0.36 0.18 0.18 0.1
P(Mass memory=F | ...)  0.28 0.46 0.46 0.64 0.46 0.64 0.64 0.82 0.46 0.64 0.64 0.82 0.64 0.82 0.82 0.9

Figure 3. Conditional probabilities based on the number of node's parents and children (conditioning on Central Unit, Memory, Floppy Disk, Hard Disk and Compact Disc)

3.2.2.3 Conditional probabilities based on the number of node’s children

In the third approach (BN3), the conditional probability table of a non-root node K_X is defined based on the number of its children, cK_x. The number and percentage of nodes with a certain number of children can be seen in Table 4.


Table 4. The structure of Bayesian network 3

Number of children   Total number of nodes   Percentage of nodes
roots                4                       5.48%
leaves               31                      42.46%
1                    12                      16.44%
2                    14                      19.18%
3                    9                       12.33%
4                    3                       4.11%

In this approach, each value T in a conditional probability table row has the "weight" 0.9/cK_x. Therefore, the row "weight" is t·0.9/cK_x. This row "weight" defines the conditional probability of the non-root concept K_x: P(K_X=T | K_y, K_y ∈ Children_KX, K_y=T ∨ K_y=F) = t·0.9/cK_x. In the same way, we define P(K_X=F | K_y, K_y ∈ Children_KX, K_y=T ∨ K_y=F) = 1 - t·0.9/cK_x.

The only problem in this approach is posed by the nodes that have no children. For those nodes cK_x is 0, and we cannot calculate the "weight" of truth according to the above formula. Therefore, for such nodes we determine that each value T in the conditional probability table has the "weight" 0.5.

For example, let us analyze the determination of the conditional probability table of the node "Mass Memory", which has three children ("Floppy Disk", "Hard Disk" and "Compact Disc"). In this case, each T value in the conditional probability table has the "weight" 0.9/3 = 0.3. The conditional probability table of the concept "Mass Memory" is given in Figure 4.

Floppy Disk    T    T    T    T    F    F    F    F
Hard Disk      T    T    F    F    T    T    F    F
Compact Disc   T    F    T    F    T    F    T    F
P(Mass memory=T | Floppy Disk, Hard Disk, Compact Disc)  0.9  0.6  0.6  0.3  0.6  0.3  0.3  0.1
P(Mass memory=F | Floppy Disk, Hard Disk, Compact Disc)  0.1  0.4  0.4  0.7  0.4  0.7  0.7  0.9

Figure 4. Conditional probabilities based on the number of node's children

3.2.3 Setting the pieces of evidence

The importance of the function X_V lies not only in determining the "a priori" probabilities of root nodes; it is also used for setting the pieces of evidence. We observe five different ways of setting the pieces of evidence (five different values of the function X_V used as thresholds) in order to examine their efficiency and reliability. These threshold values were chosen entirely heuristically, and the following analysis is done to determine which of these heuristic values is best for setting the pieces of evidence. It is important to compare the obtained predictions with the actual values from the instance of the real student model, Student_model_2, which serves as the gold standard.

3.2.3.1 Test 1

Let K_x be any node. If X_V(K_x) ≥ 0.9, then we set the evidence on the node K_x to true. Similarly, if 1 - X_V(K_x) ≥ 0.9, then we set the evidence on the node K_x to false.

Example 1: The instance Student_model_1 contains the value X_V(Computer system model) = 0.083. Clearly, 1 - X_V(Computer system model) = 0.917, which is greater than 0.9. Therefore, we set the evidence on the node Computer system model to false.

In this way, we set false evidence on four nodes (5% of all nodes are pieces of evidence).


3.2.3.2 Test 2

Let K_x be any node. If X_V(K_x) ≥ 0.8, then we set the evidence on the node K_x to true. Similarly, if 1 - X_V(K_x) ≥ 0.8, then we set the evidence on the node K_x to false.

Example 2: The instance Student_model_1 contains the value X_V(Fortran) = 0.125. Clearly, 1 - X_V(Fortran) = 0.875, which is greater than 0.8. Therefore, we set the evidence on the node Fortran to false.

In this way, we set false evidence on twelve nodes (16% of all nodes are pieces of evidence).

3.2.3.3 Test 3

Let K_x be any node. If X_V(K_x) ≥ 0.75, then we set the evidence on the node K_x to true. Similarly, if 1 - X_V(K_x) ≥ 0.75, then we set the evidence on the node K_x to false.

Example 3: The instance Student_model_1 contains the value X_V(Central Unit) = 0.25. Clearly, 1 - X_V(Central Unit) = 0.75, which is equal to 0.75. Therefore, we set the evidence on the node Central Unit to false.

In this way, we set true evidence on four nodes and false evidence on seventeen nodes (29% of all nodes are pieces of evidence).

3.2.3.4 Test 4

Let K_x be any node. If X_V(K_x) ≥ 0.65, then we set the evidence on the node K_x to true. Similarly, if 1 - X_V(K_x) ≥ 0.65, then we set the evidence on the node K_x to false.

Example 4: The instance Student_model_1 contains the value X_V(Input Unit) = 0.312. Clearly, 1 - X_V(Input Unit) = 0.688, which is greater than 0.65. Therefore, we set the evidence on the node Input Unit to false.

In this way, we set true evidence on four nodes and false evidence on nineteen nodes. The results would have been the same had we used the threshold 0.7 (32% of all nodes are pieces of evidence).

3.2.3.5 Test 5

Let K_x be any node. If X_V(K_x) ≥ 0.6, then we set the evidence on the node K_x to true. Similarly, if 1 - X_V(K_x) ≥ 0.6, then we set the evidence on the node K_x to false.

Example 5: The instance Student_model_1 contains the value X_V(Application Software) = 0.375. Clearly, 1 - X_V(Application Software) = 0.625, which is greater than 0.6. Therefore, we set the evidence on the node Application Software to false.

In this way, we set true evidence on four nodes and false evidence on twenty-six nodes (41% of all nodes are pieces of evidence).


Comparing the ways the pieces of evidence are set, the differences are obvious if we observe only the total number of set pieces of evidence. In Test1, only four pieces of evidence are set. The same four pieces of evidence occur in all the other ways of evidence setting. It is logical to assume that there are differences in the predictions between setting only four pieces of evidence in the Bayesian network (Test1) and setting thirty pieces of evidence (Test5). It is also logical to assume that the more pieces of evidence are set, the more accurate the prediction model. These assumptions will be refuted below.

It will be shown that it is essential to set evidence in a quality manner, and that the quantity of evidence is not the most important factor for the accuracy of prediction. The five ways of setting evidence are applied to all three Bayesian network models. In this way we test the prediction effectiveness of, in total, 3×5 = 15 Bayesian student models (Model1, …, Model15).
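The evidence-setting rule shared by Tests 1 to 5 can be stated compactly. The sketch below is our illustration, with the five heuristic thresholds from Sections 3.2.3.1 to 3.2.3.5; the example reproduces Example 2 (Fortran marked as false evidence under Test2).

THRESHOLDS = {"Test1": 0.9, "Test2": 0.8, "Test3": 0.75, "Test4": 0.65, "Test5": 0.6}

def set_evidence(x_v, threshold):
    # x_v: dict concept -> X_V value. A node becomes true evidence when X_V
    # reaches the threshold, false evidence when 1 - X_V does; all other
    # nodes remain unobserved.
    evidence = {}
    for kx, value in x_v.items():
        if value >= threshold:
            evidence[kx] = True
        elif 1.0 - value >= threshold:
            evidence[kx] = False
    return evidence

print(set_evidence({"Fortran": 0.125, "Central unit": 0.5}, THRESHOLDS["Test2"]))
# {'Fortran': False}: 1 - 0.125 = 0.875 >= 0.8, while Central unit stays hidden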

3.2.4 Testing the Bayesian student model prediction effectiveness

The student's knowledge after the knowledge test is contained in the instance Student_model_2, which holds the actual student knowledge. Based on this actual knowledge, it is known which concepts the student has mastered, and the 15 models are analyzed to show which of them best predicts this actual student knowledge.

The comparative analysis included only those nodes whose values of the function X_V differ between the student model instances Student_model_1 and Student_model_2. The nodes that are pieces of evidence were excluded from the comparative analysis.

For each model, the percentage of overlap with the instance Student_model_2 is given. If the predicted value for a given node and its value of the function X_V differ by at most 0.1, we have a prediction match. If they differ by more than 0.1 but at most 0.2, we have a prediction indication. If they differ by more than 0.2, we have a prediction miss. These bands were determined heuristically and have no support in the literature, and therefore have to be verified in future experiments. The results of the analysis are presented in Tables 5, 6 and 7.
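This comparison step can be sketched as follows (our illustration; the function and variable names are invented). Each non-evidence node's prediction error against Student_model_2 is binned into the three heuristic bands:

def classify(predicted, actual):
    # Bands from Section 3.2.4: match <= 0.1, indication <= 0.2, miss > 0.2.
    diff = abs(predicted - actual)
    if diff <= 0.1:
        return "match"
    if diff <= 0.2:
        return "indication"
    return "miss"

def score_model(predictions, actuals, evidence_nodes):
    # Compare only the non-evidence nodes, as in the paper's analysis.
    counts = {"match": 0, "indication": 0, "miss": 0}
    for kx, p in predictions.items():
        if kx not in evidence_nodes:
            counts[classify(p, actuals[kx])] += 1
    return counts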

Table 5. Results of Bayesian student model prediction testing

Model    Bayesian network  Evidence setting  Compared nodes  Match (diff ≤ 0.1)  Indication (0.1 < diff ≤ 0.2)  Miss (diff > 0.2)
Model1   BN1               Test1             41              36%                 32%                            32%
Model2   BN2               Test1             41              32%                 17%                            51%
Model3   BN3               Test1             41              15%                 12%                            73%
Model4   BN1               Test2             33              12%                 27%                            61%
Model5   BN2               Test2             33              18%                 18%                            64%
Model6   BN3               Test2             33              9%                  12%                            71%
Model7   BN1               Test3             23              22%                 17%                            61%
Model8   BN2               Test3             23              22%                 17%                            61%
Model9   BN3               Test3             23              26%                 17%                            57%
Model10  BN1               Test4             21              9%                  19%                            72%
Model11  BN2               Test4             21              28%                 24%                            48%
Model12  BN3               Test4             21              14%                 33%                            52%
Model13  BN1               Test5             14              14%                 28%                            58%
Model14  BN2               Test5             14              28%                 14%                            58%
Model15  BN3               Test5             14              7%                  21%                            72%


Table 6. Average results regardless of evidence setting

Bayesian network  Match (diff ≤ 0.1)  Indication (0.1 < diff ≤ 0.2)  Miss (diff > 0.2)
BN1               19%                 25%                            56%
BN2               26%                 18%                            56%
BN3               14%                 17%                            64%

Table 7. Average results regardless of Bayesian network

Evidence setting  Pieces of evidence  Match (diff ≤ 0.1)  Indication (0.1 < diff ≤ 0.2)  Miss (diff > 0.2)
Test1             4                   28%                 20%                            52%
Test2             12                  13%                 19%                            68%
Test3             21                  23%                 17%                            60%
Test4             23                  17%                 25%                            58%
Test5             30                  16%                 21%                            63%

Observing the results in these tables, it is not difficult to conclude that the network BN3 has the "worst" results (the highest percentage in the last column of Table 6: 64%). This result can be attributed to setting the conditional probability value to 0.5 for all nodes without children. Comparing the networks BN1 and BN2, BN1 has better results in Test1 and Test5, the two have identical results in Test3, and BN2 has better results in Test2 and Test4. Overall, BN2 has the most matches (the highest percentage in the second column of Table 6: 26%) and can therefore be considered the best.

Looking at Table 7 and asking which evidence setting is best for knowledge prediction, it is not hard to see that it is Test1 (the highest Match percentage in Table 7: 28%). We conclude that it is essential to set evidence in a quality manner and that the quantity of evidence is not the most important factor for the accuracy of prediction.

If we observe the individual results in Table 5, we conclude that the model that overlaps most with the actual student knowledge is Model1, where the conditional probabilities were determined based on the number of parents (BN1) and evidence was set for nodes whose values of the function X_V were greater than or equal to 0.9 (Test1). This model has the fewest prediction misses and, relative to the other models, a very high number of prediction matches. Therefore, this model stands out as an appropriate Bayesian student model for predicting student knowledge in ontology-based environments.

4. Conclusion

Intelligent tutoring systems need to build a model based on uncertain information received from students. This information can be interpreted in various ways; therefore, the role of probabilistic models is especially important. Building a model that, given a small amount of high-quality information, makes conclusions about the student's knowledge and adapts to it requires a lot of effort. Bayesian network theory provides the above, but it is particularly important to find the best way to implement Bayesian networks in the student model design process.


The desire to provide a new, modern and high-quality education requires a lot of research. This paper describes a Bayesian student model as a new way of modeling students in ontology-based intelligent tutoring systems. The development of this model is illustrated through empirical research that included a comparative analysis of fifteen potential models, among which we looked for the one that best predicts the student's knowledge.

The most important feature of this model is its non-empirical, mathematical determination of the conditional probabilities, while the "a priori" probabilities are determined empirically from knowledge test results. This is very important, as determining these probabilities is the bottleneck of using Bayesian networks. Automating this segment will eventually lead to a wider use of this complex technique for predicting student knowledge, as the conditional probabilities depend only on the structure of the domain knowledge ontology.

The basis of this study was to find the best way to design a Bayesian student model. Numerous configurations were examined, with special emphasis placed on determining the conditional probabilities and on evidence setting. While we believed that the most important aspect was the determination of conditional probabilities, it turned out that the setting of evidence is no less important. The model that, among all tested models, best represents the actual student knowledge is the one where the conditional probabilities were determined based on the number of parents (BN1) and evidence was set for nodes whose values of the function X_V were greater than or equal to 0.9 (Test1). In the future, we should find out why this model was wrong in 32% of cases and eliminate these prediction misses.

It turned out that a small but well-selected number of pieces of evidence enables better prediction of the student's knowledge than many unfounded ones. In further studies related to Bayesian student model design, we will conduct broader research on a larger sample of instances of actual student models and see in what percentage of cases the selected Bayesian student model accurately predicts student knowledge. Furthermore, we will test this model on different domain knowledge in order to draw conclusions about the model's generality and independence of the domain knowledge. Several aspects should be involved in the extension of the presented work: an in-depth sensitivity analysis, and real-time usage of the network, updated as a result of student actions, in order to assess its accuracy.

Acknowledgements

This paper describes the results of research being carried out within project 177-0361994-

1996 Design and evaluation of intelligent e-learning systems within the program 036-1994

Intelligent Support to Omnipresence of e-Learning Systems, funded by the Ministry of

Science, Education and Sports of the Republic of Croatia.

References

[1] Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53, pp. 370-418.

[2] Beck, J., Stern, M. & Haugsjaa, E. (1996). Applications of AI in Education. Crossroads,

3(1), pp. 11-15.

[3] Bloom, B. S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction

as Effective as One-to-One Tutoring. Educational Researcher, 13(6), pp. 4-16.


[4] Bloom, B.S. (1976). Human Characteristics and School Learning. New York: McGraw-Hill

Book Company

[5] Carr, B., Goldstein, I.P. (1977). Overlays. A theory of modeling for computer-aided

instruction, AI Lab Memo 406, Massachusetts Institute of Technology, Cambridge,

Massachusetts

[6] Charniak, E. (1991). Bayesian Networks without tears, AI magazine, 12(4), pp. 50–63.

[7] Cheng, J. & Greiner, R. (2001). Learning bayesian belief network classifiers: Algorithms

and system. Advances in Artificial Intelligence, pp. 141-151.

[8] Conati, C., Gertner, A. & Vanlehn, K. (2002). Using Bayesian networks to manage

uncertainty in student modeling. User Modeling and User-Adapted Interaction, 12(4), pp.

371-417.

[9] Conati, C., Gertner, A. S., Vanlehn, K. & Druzdzel, M. J. (1997). On-line student modeling

for coached problem solving using Bayesian networks. User Modeling: Proceedings of

the Sixth International Conference, UM97, pp. 231-242.

[10] Gamboa, H. & Fred, A. (2002). Designing intelligent tutoring systems: a bayesian

approach. Enterprise information systems III, 1, pp. 452-458.

[11] Gross, J. L. & Yellen, J. (1998). Graph Theory and Its Applications (1st ed.). CRC Press.

[12] Gruber, T. R. (1993). A translation approach to portable ontology specifications.

Knowledge acquisition, 5(2), pp. 199-220.

[13] Grubišić, A. (2012). Adaptive student's knowledge acquisition model in e-learning

systems. PhD Thesis, Faculty of Electrical Engineering and Computing, University of

Zagreb, Croatia (in Croatian).

[14] Korb, K.B., Nicholson, A.E. (2011). Bayesian Artificial Intelligence. Chapman & Hall/CRC

Press, 2nd edition

[15] Lee, T. B., Hendler, J. & Lassila, O. (2001). The semantic web. Scientific American,

284(5), pp. 34-43.

[16] Mayo, M. J. (2001). Bayesian Student Modelling and Decision-theoretic Selection of

Tutorial Actions in Intelligent Tutoring Systems, PhD Thesis, University of Canterbury,

Christchurch, New Zealand.

[17] Millán, E., Pérez-De-La-Cruz, J.L. (2002). A Bayesian Diagnostic Algorithm for Student

Modeling and its Evaluation, User Modeling and User-Adapted Interaction 12(2-3), pp.

281-330.

[18] Ohlsson, S. (1986). Some principles of intelligent tutoring, Instructional Science, 14, pp.

293–326.

[19] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems, San Mateo: Morgan

Kaufmann

[20] Sleeman, D. & Brown, J. S. (1982). Introduction: Intelligent Tutoring Systems: An

Overview. Intelligent Tutoring Systems (Sleeman, D.H., Brown, J.S.), pp. 1-11. Academic

Press, Burlington, MA.

[21] Urban-Lurain, M. (1996). Intelligent tutoring systems: An historic review in the context of the development of artificial intelligence and educational psychology. Technical Report, Department of Computer Science and Engineering, Michigan State University.

[22] VanLehn, K. (1988). Student Modeling. In Foundations of Intelligent Tutoring Systems, M.

C. Polson, J. J. Richardson, Eds., Lawrence Erlbaum Associates Publishers, pp. 55 – 79

[23] VanLehn, K., Niu, Z., Siler, S. & Gertner, A. (1998). Student modeling from conventional

test data: A Bayesian approach without priors, In Goettle B., Halff H., Redfield C., and

Shute V. (Eds.) Proc. of the 5th International Conference on Intelligent Tutoring Systems,

Springer-Verlag, pp. 434-443.


[24] Wenger, E. (1987). Artificial Intelligence and Tutoring Systems. Morgan Kaufmann

Publishers, Inc., California, USA


Highlights

1. Probabilistic student model based on a Bayesian network
2. Non-empirical mathematical determination of conditional probabilities
3. Guidelines for ontology-based Bayesian student model design
4. Novel methods for parameter estimation in Bayesian networks
5. Expert Bayesian student model over domain knowledge concepts combined with the overlay model