
JÜRGEN HELLER, MARK LEVENE, KEVIN KEENOY, DIETRICH ALBERT & CORD HOCKEMEYER

5. COGNITIVE ASPECTS OF TRAILS

A stochastic model linking navigation behaviour to the learner’s cognitive state

ABSTRACT

Learners’ observable behaviour in following trails of learning objects (LOs) can be interpreted at a cognitive level, and, vice versa, cognitive assumptions have an impact on the observable trails. This implies a mutual dependence between observable and non-observable, or latent, trails. The observable trails consist of sequences of LOs and describe the way in which users navigate the learning environment. The (latent) cognitive trails consist of sequences of cognitive states that a learner can be in, which may represent the learner’s preferences, expectations, skills, or prior knowledge. The chapter suggests a stochastic model for describing the relationship between the observable and the (latent) cognitive trails. In this model, Knowledge Space Theory (Doignon, & Falmagne, 1985, 1999; Albert, & Lukas, 1999) is used as the basic framework for representing the learners’ knowledge. The discussion of the model covers the specification of its parameters as well as its empirical validation, and emphasizes its potential for optimizing learning environments by assessing the effectiveness of learning paths. Application of these methods is exemplified in two case studies: an English grammar course and a course teaching the basics of the SQL query language. We demonstrate how the methods are put to work by means of web data mining techniques combined with pre- and post-assessments of the users’ cognitive states before and after interacting with the learning environment. The chapter concludes with a discussion of other potential applications, for instance in web usage, and of possible extensions of the model that further improve its predictive power.

OBSERVABLE AND COGNITIVE TRAILS IN LEARNING ENVIRONMENTS

The subsequent sections briefly summarize the state of the art in analysing and modelling user navigation behaviour in a learning environment on the one hand, and in representing the user’s knowledge and the learning behaviour induced by interacting with the environment on the other hand. This sets the stage for integrating these two approaches into a general stochastic model, which allows for predicting the probability of observable as well as cognitive trails.

J. Schoonenboom, M. Levene, J. Heller, K. Keenoy, M. Turcsányi-Szabó (eds.), Trails in education: technologies that support navigational learning, 00–00. © 2006 Sense Publishers. All rights reserved.

Heller, J., Levene, M., Keenoy, K., Hockemeyer, C., & Albert, D. (2007). Cognitive aspects of trails: A Stochastic Model Linking Navigation Behaviour to the Learner's Cognitive State. In J. Schoonenboom, J. Heller, K. Keenoy, M. Levene & M. Turcsanyi-Szabo (Eds.), Trails in Education: Technologies that Support Navigational Learning (pp. 119-146). Rotterdam: Sense Publishers.

LO Trails and Their Stochastic Modelling

A web learning environment consists of learning objects (LOs) that are connected to each other by hyperlinks. Such an environment is created by instructors, and learners navigate through it in order to achieve their learning goals. Being able to model such environments, and the navigation through them, helps instructors (in this case, the designers of the learning environment) to design the link structure of the learning material. In general, a web learning environment can be modelled as a graph whose nodes are the LOs of the hypertext and whose edges are the hyperlinks between LOs. Figure 2 below provides an example of such a graph. Each of the LOs E, O1, O2, O3, O4, and O5 constitutes a node, and the directed edges (i.e. arrows) indicate how the learner can move between them. Navigation through the environment is then represented by paths through the graph. These paths, which are the observable LO trails, can be modelled stochastically as Markov chains (Kemeny, & Snell, 1960) on the graph, where the probability of moving from one node (i.e. one LO of the environment) to another is determined by the LO the user is currently visiting. To derive a Markov model of learners' observable behaviour, web data mining techniques can be used to reconstruct navigation sessions from the server log files (Borges, & Levene, 2000), together with statistics on how often learners choose particular links when confronted with a choice (i.e. the frequencies of page transitions as recorded in the log file). The trails taken by users through a learning environment can be observed either in real time, by watching where the user clicks (manually or with a tracking tool), or after the trail has been completed, by using web log mining techniques to extract the trails from the log files recorded at the server.
As the Markov model gives probabilities for transitions from one LO to another within the environment, the probability of observing any particular trail through the environment can be calculated as the product of the LO transition probabilities along the trail. For example, if the transition O1 → O2 has probability 0.9 and O2 → O4 has probability 0.5, then the probability of observing the trail O1 → O2 → O4 (once the user has arrived at O1) is 0.9 ∙ 0.5 = 0.45.
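The trail calculation above can be sketched in Python. This is a minimal illustration under stated assumptions: sessions are lists of LO names that have already been reconstructed from the log (the log-mining step of Borges, & Levene, 2000, is not reproduced), and transition probabilities are estimated as relative transition frequencies. The function names `estimate_transitions` and `trail_probability` are hypothetical.

```python
from collections import Counter, defaultdict
from math import prod


def estimate_transitions(sessions):
    """Estimate first-order probabilities P(next LO | current LO) as relative
    frequencies of the observed transitions (an assumed estimator)."""
    counts = defaultdict(Counter)
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


def trail_probability(trail, P):
    """Probability of a trail given its first LO: the product of the transition
    probabilities along it (0.0 for transitions that never occur in P)."""
    return prod(P.get(src, {}).get(dst, 0.0) for src, dst in zip(trail, trail[1:]))


# The worked example from the text, with the two transition probabilities given directly.
P = {"O1": {"O2": 0.9}, "O2": {"O4": 0.5}}
print(trail_probability(["O1", "O2", "O4"], P))  # 0.45
```

In practice `P` would come from `estimate_transitions` applied to the mined sessions rather than being written by hand.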

When environments are modelled in this way, the Markov property means that the only factor determining the next link to be followed is the current LO. This may be too simplistic for some situations. In such cases, higher-order Markov models could be used to take the influence of earlier LO visits into account, or, as in the more complex model developed below, the sequence of cognitive states of the learner can also be considered. In this case, semantic metadata about the LOs can be used in conjunction with the basic Markov model to give a deeper analysis of LO trails: metadata about the concepts that can be learned from an LO, and about the types of problem that the learner can answer after the LO has been mastered, can help to infer the possible learning paths (cognitive trails) to be expected from learners following a particular trail.

Trails in Knowledge Spaces

As learners navigate through a learning environment and interact with the LOs in it, their knowledge may change: they learn. In this section we briefly outline Knowledge Space Theory, which provides a framework for representing the learner’s knowledge state and its changes. Knowledge Space Theory has been developed continuously over the last twenty years (Doignon, & Falmagne, 1985, 1999; Albert, & Lukas, 1999). It differs fundamentally from the traditional psychological approach, which is based on a numerical evaluation of some ‘aptitude’. Knowledge Space Theory provides a completely qualitative, i.e. non-numerical, but nevertheless precise representation of the knowledge of individual learners in a certain domain. Over the last two decades it has accumulated an impressive body of theoretical results (see Doignon, & Falmagne, 1999).

Knowledge Space Theory is based on the notion of a knowledge domain, which is identified with a set Q of problems. Usually, these problems are assumed to be dichotomous, i.e. the answer to each of these problems is judged to be either correct or incorrect. A typical example in the field of arithmetic is the problem

Gwendolyn is 3/4 as old as Rebecca. Rebecca is 2/5 as old as Edwin. Edwin is 20 years old. How old is Gwendolyn?

The knowledge state of a learner is identified with the subset K ⊆ Q of problems in the knowledge domain Q that the learner is capable of solving. This means that for a knowledge domain of n problems there are as many as 2^n potential knowledge states. The observed solution behaviour, however, will exhibit some dependencies: from observing that a student is capable of mastering a given problem, the mastery of other problems can sometimes be surmised. Due to these mutual dependencies, not all of the subsets of Q are plausible knowledge states, but only a collection of them. This motivates the definition of a knowledge structure as a collection K of knowledge states of a given knowledge domain Q. A knowledge structure is assumed to contain at least the empty set ∅ and the set Q, as it may always be the case that none or all of the problems are solved.

Set inclusion induces a natural ordering on a knowledge structure. Figure 1 illustrates such an ordering, in which the upward-directed line segments represent set inclusion. Any sequence of upward-directed line segments from the naive knowledge state ∅ to the set Q of full mastery may be interpreted as a cognitive trail representing one of the possible learning paths in the given knowledge structure. The learning path ∅, {a}, {a, b}, {a, b, d}, {a, b, c, d} is an example of a cognitive trail that is possible in the knowledge structure K.
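The learning paths of a knowledge structure can be enumerated mechanically. The sketch below assumes that states are encoded as Python frozensets of problem labels, and it treats the upward line segments of the diagram as the covering relation (S a proper subset of T with no state strictly in between). The states used are those of Figure 1, which are listed explicitly later in this chapter; the helper names are hypothetical.

```python
def covering_pairs(K):
    """Pairs (S, T) with S a proper subset of T and no state strictly between them."""
    return {(S, T) for S in K for T in K
            if S < T and not any(S < U < T for U in K)}


def learning_paths(K, Q):
    """All sequences of upward steps from the naive state to full mastery Q."""
    cover = covering_pairs(K)
    paths = []

    def extend(path):
        if path[-1] == Q:
            paths.append(path)
            return
        for S, T in cover:
            if S == path[-1]:
                extend(path + [T])

    extend([frozenset()])
    return paths


Q = frozenset("abcd")
K = {frozenset(s) for s in ["", "a", "b", "ab", "ac", "abc", "abd", "abcd"]}
for path in learning_paths(K, Q):
    print(["".join(sorted(S)) or "∅" for S in path])
```

For this structure the sketch finds five learning paths (printed in no particular order), among them the path ∅, {a}, {a, b}, {a, b, d}, Q mentioned in the text.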



Figure 1. Example of a knowledge structure on the knowledge domain Q = {a, b, c, d}.

Knowledge structures that satisfy certain properties have received particular attention; among them are knowledge spaces, which are closed under union. This means that for any two knowledge states, their union is also a state of the knowledge space. Quasi-ordinal knowledge spaces are closed under both union and intersection (see Doignon, & Falmagne, 1985, 1999). The notion of a knowledge space is equivalent to the concept of an and/or-graph that is well known in the context of artificial intelligence.
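Whether a given knowledge structure is a knowledge space, or even quasi-ordinal, can be checked directly from these definitions. A minimal Python sketch, again assuming frozenset-encoded states and using the eight states of Figure 1 (listed explicitly later in this chapter); the function names are hypothetical.

```python
from itertools import combinations


def is_knowledge_structure(K, Q):
    """A knowledge structure must contain the empty set and the full domain Q."""
    return frozenset() in K and Q in K


def is_union_closed(K):
    """Knowledge space: the union of any two states is again a state."""
    return all((S | T) in K for S, T in combinations(K, 2))


def is_intersection_closed(K):
    """Together with union closure, this makes the space quasi-ordinal."""
    return all((S & T) in K for S, T in combinations(K, 2))


Q = frozenset("abcd")
K = {frozenset(s) for s in ["", "a", "b", "ab", "ac", "abc", "abd", "abcd"]}
print(is_knowledge_structure(K, Q), is_union_closed(K), is_intersection_closed(K))
# True True True
```

Removing {a, b, c} from this structure breaks union closure, since {b} ∪ {a, c} = {a, b, c} would no longer be a state.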

The consideration of learning processes in Knowledge Space Theory is based on interpreting the upward-directed line segments in the diagram of a knowledge structure (see Figure 1) as the possible learning paths. We can formulate Markov models of a learning process that proceeds along these paths, which represent cognitive trails. The simplest such model identifies the states of the Markov chain with the knowledge states of the knowledge structure. As an initial condition, we may assume that the learner is in the naive state with probability 1. Moreover, a transition from state K to state K' is only possible if K' is a neighbouring state that covers K. Consider the knowledge structure of Figure 1, and assume, for example, that an individual is in knowledge state {a}. Then the only possible transitions are into the knowledge states {a, b} and {a, c}, as these are the only knowledge states that are linked to {a} by upward-directed line segments. On this basis, various learning models have been suggested, which are reviewed in Doignon, & Falmagne (1999).
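The simplest Markov learning model on a knowledge structure can be simulated by propagating a probability distribution over the states. In the sketch below the transition rule is a hypothetical choice made purely for illustration: at each step the learner stays in the current state with probability `stay` and otherwise moves, with equal probability, to one of the states covering the current one; Q is absorbing. The learning models reviewed by Doignon, & Falmagne (1999) parameterise these transitions differently.

```python
def successors(S, K):
    """States of K that cover S (reachable by one upward line segment)."""
    return [T for T in K if S < T and not any(S < U < T for U in K)]


def step(dist, K, stay=0.5):
    """One time step of the hypothetical learning rule described above."""
    new = {S: 0.0 for S in K}
    for S, p in dist.items():
        succ = successors(S, K)
        if not succ:  # full mastery: nothing left to learn
            new[S] += p
            continue
        new[S] += p * stay
        for T in succ:
            new[T] += p * (1 - stay) / len(succ)
    return new


K = {frozenset(s) for s in ["", "a", "b", "ab", "ac", "abc", "abd", "abcd"]}
dist = {frozenset(): 1.0}  # initial condition: naive state with probability 1
for _ in range(3):
    dist = step(dist, K)
print(["".join(sorted(T)) for T in sorted(successors(frozenset("a"), K), key=sorted)])
# ['ab', 'ac']
```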

Extensions of Knowledge Space Theory have been suggested that enrich the originally behaviouristic perspective of the approach, which confines consideration to problem-solving behaviour, by embedding cognitive assumptions. A recent account of the results obtained within this line of research can be found in Albert, & Lukas (1999). Moreover, the approach has proved to be highly effective and beneficial in adaptive computer-assisted knowledge assessment and technology-enhanced learning. There are various systems that implement this approach, e.g. the commercial ALEKS system (http://www.aleks.com), which is a fully automated mathematics tutor, and the APeLS system (Conlan, Hockemeyer, Wade, & Albert, 2002), which rests upon a competence-based extension of the theory (Albert, Hockemeyer, Conlan, & Wade, 2001; Conlan, Hockemeyer, Lefrere, Wade, & Albert, 2001).

A STOCHASTIC MODEL

In this section we suggest a formal framework that captures the mutual dependence between observable trails of LOs and cognitive trails on knowledge domains by means of a stochastic model. Two sets form the basis of the stochastic modelling in the theory developed below. On the one hand, we consider the set L of LOs that constitute the learning environment. On the other hand, we refer to a knowledge structure K on a knowledge domain Q. The knowledge domain Q consists of problems that test the skills and competencies taught by the LOs in L. The two sets L and K are thus intimately related. This relationship is actually mediated by assigning skills and competencies to each element of both sets. For the present purpose it can be characterised by a prerequisite map g that associates to each problem in Q a collection of subsets of LOs. Each of these subsets assigned to a problem is interpreted as a minimal subset of LOs that provides content sufficient for solving it. More than one such subset may be assigned to a problem, to reflect the fact that there may be different ways to solve it, which are taught in different subsets of LOs. More formally, a prerequisite map on the knowledge domain Q is defined as a map g that associates to each problem in Q a nonempty collection of nonempty subsets of L. Assuming each collection of LOs to be nonempty amounts to confining consideration to problems for which the content taught by the LOs is relevant.

Consider the following simple example. The set of LOs for the learning environment illustrated in Figure 2 is given by L = {E, O1, O2, O3, O4, O5}.

Figure 2. Example of a learning environment.



The symbol E denotes an entry point to the learning environment. In contrast to E, the LOs O1 to O5 teach some content, and may be accessed in accordance with the depicted link structure.

Suppose that the problems in the knowledge domain Q = {a, b, c, d} test the contents taught by the LOs in L. In particular, let the prerequisite map g on Q be given by

g(a) = {{O1, O2}},
g(b) = {{O1, O3}},
g(c) = {{O1, O2, O3, O4}, {O1, O2, O5}},
g(d) = {{O1, O2, O3, O5}}.

According to this assignment, problem a can be solved using the content taught by the two LOs O1 and O2. Problem c can be solved in two different ways: one solution requires the content of the LOs O1, O2, O3, O4, and the other the content of O1, O2, O5. Details on how a prerequisite map may be established from taught and tested competencies associated with LOs and problems (e.g. in the form of metadata) can be found in Heller et al. (2004).

The prerequisite map g induces a knowledge structure on Q. To derive this structure we consider all subsets of L, which are taken to represent the portion of the content that has been learned after visiting a sequence of LOs. The subset {O1, O3, O4}, for example, induces the subset {b} of Q, i.e. having learned the content of O1, O3, O4 only allows problem b to be solved. The subset {b} is thus a possible knowledge state of the knowledge structure

K = {∅, {a}, {b}, {a, b}, {a, c}, {a, b, c}, {a, b, d}, Q},

which is the knowledge structure illustrated in Figure 1. The diagram of the knowledge structure shows the possible learning paths that a user of the learning environment displayed above can take.
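The derivation of a knowledge structure from a prerequisite map can be sketched as follows: every subset of LOs is mapped to the set of problems that have at least one prerequisite set fully contained in it, and the distinct results are collected. LO and problem names are encoded as strings, and the helper names are hypothetical. Run on g exactly as listed above, the sketch returns seven states, all of K except {a, b, d}: any set of LOs containing {O1, O2, O3, O5} ∈ g(d) also contains {O1, O2, O5} ∈ g(c), so d never becomes solvable without c. A map in which the second solution for c is, for instance, {O1, O2, O4, O5} (our assumption, not the authors') yields all eight states of K.

```python
from itertools import chain, combinations

LOS = ["E", "O1", "O2", "O3", "O4", "O5"]
g = {  # prerequisite map g as listed in the text
    "a": [{"O1", "O2"}],
    "b": [{"O1", "O3"}],
    "c": [{"O1", "O2", "O3", "O4"}, {"O1", "O2", "O5"}],
    "d": [{"O1", "O2", "O3", "O5"}],
}


def induced_state(learned, g):
    """Problems with at least one prerequisite set N contained in the learned LOs."""
    return frozenset(q for q, options in g.items() if any(N <= learned for N in options))


def induced_structure(los, g):
    """Collect the states induced by all subsets of the given LOs."""
    subsets = chain.from_iterable(combinations(los, r) for r in range(len(los) + 1))
    return {induced_state(set(S), g) for S in subsets}


print(sorted(induced_state({"O1", "O3", "O4"}, g)))  # ['b']
print(len(induced_structure(LOS, g)))  # 7
```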

Basics

We now outline a stochastic model capable of predicting both the overt navigation behaviour and the underlying learning process, as well as their mutual dependencies. As only knowledge of both the current LO L and the current knowledge state K leads to a proper characterisation of the process at any point in time, we consider the set of all pairs (L, K), which is the Cartesian product L × K. Within a Markov chain model we identify the set of potential states the learner can be in (the Markov states, or M-states for short) with exactly this Cartesian product L × K. The Markov property that we assume implies that the future of the process is completely determined by the current M-state, i.e. the pair (L, K) characterising the learner’s position in the learning process at this point in time. In particular, it is not important how the process got to this state; in other words, all information about the past is embodied in the current state.



We consider discrete time, which will be indicated by a subscript t = 0, 1, 2, … . This index is incremented whenever the learner selects a link to another learning object from those available at that time. For all points t in time, let Lt and Kt denote random variables that take their values in L and K, respectively. The Markov chain then consists of a sequence of pairs

(L0, K0), (L1, K1), (L2, K2), (L3, K3), … .

To identify this sequence for an individual learner in a specific learning environment, information from different sources has to be integrated. The trail of visited LOs can be obtained by log data analysis. Assessing the knowledge states in K requires testing the learner on the problems in the knowledge domain Q. We will provide details on the required knowledge assessment later on.

The Markov model is defined by specifying an initial probability distribution on the set of M-states L × K, i.e. by specifying P(L0, K0), and by giving the conditional probabilities P(Lt, Kt | Lt-1, Kt-1) of a transition from state (Lt-1, Kt-1) at time t-1 to state (Lt, Kt) at time t, for all t = 1, 2, … . Drawing upon the Markov property, the probability of any trail (L0, K0), …, (Ln, Kn) can be computed by

P((L0, K0), …, (Ln, Kn)) = P(Ln, Kn | Ln-1, Kn-1) ∙ … ∙ P(L1, K1 | L0, K0) ∙ P(L0, K0).

Specifying an initial probability distribution on L × K poses no problems in most cases. The LO L0 is taken to be the entry point to the learning environment (denoted by E in Figure 2), the common starting point from which all learners depart. Consequently, L1 represents the first LO providing content that is actually inspected by the learner, which is in line with considering K0 as the knowledge state before the learner is exposed to the content. This knowledge state is either the naive state for all learners (e.g. whenever the material is completely new to them), or it may be any other state in K and may differ between learners. In the first case, we have the initial condition P(L0, K0) = 1 whenever L0 = E and K0 = ∅, and P(L0, K0) = 0 otherwise. In the second case, only the probabilities P(L0 = E, K0) can be non-zero, and their actual values may be estimated from the data of an assessment that precedes access to the learning environment (pre-assessment).
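For the second case, a minimal sketch of estimating P(L0 = E, K0) from pre-assessment results, assuming that the relative frequency of each assessed knowledge state is used as the estimate. The function name `initial_distribution` and the data are hypothetical.

```python
from collections import Counter


def initial_distribution(pre_assessed, entry="E"):
    """Relative frequencies of the pre-assessed knowledge states, all attached
    to the entry point, so that only P(L0 = E, K0) can be non-zero."""
    counts = Counter(frozenset(K) for K in pre_assessed)
    total = sum(counts.values())
    return {(entry, K): n / total for K, n in counts.items()}


# Hypothetical pre-assessment of four learners on Q = {a, b, c, d}.
p0 = initial_distribution([set(), set(), {"a"}, {"a", "b"}])
print(p0[("E", frozenset())])  # 0.5
```

If every learner is assessed in the naive state, the estimate reduces to the first case, P(L0 = E, K0 = ∅) = 1.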

Defining the conditional probabilities P(Lt, Kt | Lt-1, Kt-1) for all t = 1, 2, … requires taking into account their interpretation in the present context. In fact, the transition (Lt-1, Kt-1) → (Lt, Kt) may be interpreted as follows:

A person visiting LO Lt-1 and having knowledge state Kt-1 at time t-1 selects LO Lt and, as a consequence of this, moves into knowledge state Kt at time t.

Notice that the knowledge state Kt at time t may equal the knowledge state Kt-1 at time t-1, namely whenever no learning occurs. In any case, the above interpretation suggests that the transition from state (Lt-1, Kt-1) at time t-1 to state (Lt, Kt) at time t can be decomposed into two sub-processes, or stages:

1. the selection of the next LO,
2. the learning process induced by the selected LO.

These two stages are reflected in the formula

P(Lt, Kt | Lt-1, Kt-1) = P(Lt | Lt-1, Kt-1) ∙ P(Kt | Lt, Kt-1),

which additionally incorporates the (straightforward and quite plausible) assumption that only the current LO will affect the transition between knowledge states.

This equation supposes that the effect of the history of the visited LOs up to Lt-1 is completely subsumed in the knowledge state Kt-1. The conditional probability P(Lt | Lt-1, Kt-1) refers to the first of the above listed stages and describes the impact of the knowledge state Kt-1 on choosing the link from LO Lt-1 to Lt. The conditional probability P(Kt | Lt, Kt-1) models the second stage and captures the impact of the LO Lt on the transition of knowledge states from Kt-1 to Kt (i.e. the learning process). The presented formal framework thus describes the influence that knowledge states exert on the transitions between LOs, as well as the effect that the observable trails have on the knowledge states. Figure 3 provides a graphical representation of the model showing the conditional dependencies, from which the above equation can be inferred. It forms a so-called Bayesian network (Jensen, 2001).
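The trail probability and its two-stage decomposition translate directly into code. In this sketch the parameter tables `p_init`, `p_link` (for P(Lt | Lt-1, Kt-1)) and `p_learn` (for P(Kt | Lt, Kt-1)) are hypothetical dictionaries with made-up values, and missing entries are treated as structural zeros.

```python
from math import prod


def transition_prob(prev, curr, p_link, p_learn):
    """P(Lt, Kt | Lt-1, Kt-1) = P(Lt | Lt-1, Kt-1) * P(Kt | Lt, Kt-1)."""
    (L_prev, K_prev), (L_curr, K_curr) = prev, curr
    choose = p_link.get((L_prev, K_prev), {}).get(L_curr, 0.0)  # stage 1: link selection
    learn = p_learn.get((L_curr, K_prev), {}).get(K_curr, 0.0)  # stage 2: learning
    return choose * learn


def m_trail_probability(trail, p_init, p_link, p_learn):
    """Probability of an M-state trail (L0, K0), ..., (Ln, Kn) under the Markov property."""
    steps = (transition_prob(a, b, p_link, p_learn) for a, b in zip(trail, trail[1:]))
    return p_init.get(trail[0], 0.0) * prod(steps)


EMPTY, A = frozenset(), frozenset({"a"})
p_init = {("E", EMPTY): 1.0}                   # naive state at the entry point
p_link = {("E", EMPTY): {"O1": 1.0},           # hypothetical link choices
          ("O1", EMPTY): {"O2": 0.8, "O3": 0.2}}
p_learn = {("O1", EMPTY): {EMPTY: 1.0},        # hypothetical learning outcomes
           ("O2", EMPTY): {A: 0.5, EMPTY: 0.5}}
trail = [("E", EMPTY), ("O1", EMPTY), ("O2", A)]
print(m_trail_probability(trail, p_init, p_link, p_learn))  # 0.4
```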

Figure 3. Bayesian network representing the assumed conditional dependencies.

Considering the conditional probability P(Lt | Lt-1, Kt-1) instead of P(Lt | Lt-1) allows for modelling dependencies between LOs that are mediated by knowledge states. A learner who already knows the content of an LO, for example, may select it with lower probability than a learner without this prior knowledge. In the same way, P(Kt | Lt, Kt-1) explicitly refers to the LO that possibly causes a transition between knowledge states. In this way the model generalises approaches in which the trails of LOs and the knowledge trails, respectively, are assumed to form Markov chains.



Parameter Constraints

As outlined above, the stochastic model contains a number of parameters (the initial and conditional probabilities) that need to be specified in each application of the model. Some of the values, however, are already determined by characteristics of the situation under consideration. Various aspects of the learning environment constrain the conditional probabilities P(Lt | Lt-1, Kt-1) and P(Kt | Lt, Kt-1), among them the topology induced by the link structure on the set of LOs and the relationships between the possible knowledge states. Due to the link structure on L on the one hand, and the knowledge structure K on the other hand, some of the conditional probabilities will be zero (‘structural zeros’). These parameters need not be estimated in applications. We illustrate the resulting reduction in the number of free parameters for the learning environment depicted in Figure 2, with |L| = 6, and the knowledge structure K of Figure 1, with |K| = 8. The probability P(Lt | Lt-1, Kt-1) can be non-zero only if there is a direct link from LO Lt-1 to Lt. There are 13 direct links between the 6 LOs. This means that instead of the |L|² ∙ |K| = 6² ∙ 8 = 288 potential parameters, only 13 ∙ 8 = 104 conditional probabilities have to be estimated from the data. Similarly, for P(Kt | Lt, Kt-1) to be non-zero we must have Kt-1 ⊆ Kt (see condition C3 below), and the difference between the knowledge states Kt-1 and Kt has to be related to the content taught in LO Lt (see condition C2 below). If we take set inclusion into account, then instead of |K|² ∙ |L| = 8² ∙ 6 = 384 potential parameters only 31 ∙ 6 = 186 non-zero conditional probabilities remain (31 is the number of pairs K, K' ∈ K for which K ⊆ K' holds).
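The counts in this paragraph can be reproduced mechanically. The number of links (13) is taken from the text, because the edges of Figure 2 are not listed here; the inclusion pairs are counted from the eight states of Figure 1. Variable names are illustrative.

```python
from itertools import product

K = {frozenset(s) for s in ["", "a", "b", "ab", "ac", "abc", "abd", "abcd"]}
n_L, n_K = 6, len(K)
n_links = 13  # number of direct links in Figure 2, as stated in the text

# Ordered pairs (K, K') with K ⊆ K' (equal states included: no learning).
n_inclusion = sum(1 for S, T in product(K, repeat=2) if S <= T)

link_params = (n_L**2 * n_K, n_links * n_K)       # P(Lt | Lt-1, Kt-1): all vs. linked
learn_params = (n_K**2 * n_L, n_inclusion * n_L)  # P(Kt | Lt, Kt-1): all vs. K ⊆ K'
print(n_inclusion, link_params, learn_params)  # 31 (288, 104) (384, 186)
```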

Even under the constraints discussed above, the general model may still contain a large number of parameters, and not only in large-scale applications. In order to further reduce the number of free parameters, more specific sub-models may be formulated. The additionally introduced assumptions, however, have to be checked for theoretical soundness as well as for empirical validity. For instance, the two conditional probabilities P(Lt | Lt-1, Kt-1) and P(Kt | Lt, Kt-1) may be assumed to satisfy

P(Lt | Lt-1, Kt-1) = P(Lt | Lt-1) ∙ P(Lt | Kt-1),

and

P(Kt | Lt, Kt-1) = P(Kt | Lt) ∙ P(Kt | Kt-1),

respectively. Each of these equations can be plugged into the general stochastic model, which decreases the number of parameters from 104 to 13 + 6 ∙ 8 = 61 and from 186 to 31 + 8 ∙ 6 = 79, respectively. These additional assumptions relate the stochastic model to the Markov chain models discussed above, both of the navigation behaviour (captured by the conditional probabilities P(Lt | Lt-1)) and of the learning process on the knowledge structure K (captured by the conditional probabilities P(Kt | Kt-1)). Apart from the theoretical soundness of the interpretation of the parameters, the empirical adequacy of the additional assumptions has to be checked by statistical tests. Standard methods, such as likelihood ratio tests (e.g. Lindgren, 1993), or methods based on information criteria like AIC or BIC (Akaike, 1973; Schwarz, 1978), may be used to test them against the general model outlined above.
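As an illustration of such a comparison, the sketch below computes the parameter counts of the sub-models in the way the text counts them, together with AIC, BIC and the likelihood-ratio statistic. The log-likelihood values are hypothetical placeholders; in an application they would come from fitting both models to the same data, and the likelihood-ratio statistic would be compared with a χ² distribution whose degrees of freedom equal the difference in parameter counts.

```python
from math import log

n_L, n_K, n_links, n_inclusion = 6, 8, 13, 31  # values from the text

k_link_general, k_link_sub = n_links * n_K, n_links + n_L * n_K            # 104, 61
k_learn_general, k_learn_sub = n_inclusion * n_L, n_inclusion + n_K * n_L  # 186, 79


def aic(loglik, k):
    return 2 * k - 2 * loglik


def bic(loglik, k, n_obs):
    return k * log(n_obs) - 2 * loglik


def lr_statistic(loglik_sub, loglik_general):
    """2 (log L_general - log L_sub); asymptotically chi-squared if the sub-model holds."""
    return 2 * (loglik_general - loglik_sub)


# Hypothetical fitted log-likelihoods for the navigation part of the model.
ll_general, ll_sub = -1000.0, -1010.0
print(aic(ll_general, k_link_general), aic(ll_sub, k_link_sub))  # 2208.0 2142.0
print(lr_statistic(ll_sub, ll_general), k_link_general - k_link_sub)  # 20.0 43
```

With these placeholder values AIC would favour the sub-model; whether the likelihood-ratio test rejects it depends on the χ² quantile for 43 degrees of freedom at the chosen significance level.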

APPLICATION OF THE MODEL

Observability, Parameter Estimation, and Empirical Validation

Whereas the LO Lt visited at time t can be observed directly, this is not true for the knowledge state Kt. Determining it would require a partial (or even full) assessment of the learner’s knowledge state at each point t in time, which is obviously not a viable option in practical applications: besides consuming a lot of time, it would continually disrupt the learning process. So we are left with only partial observability of the M-states. To deal with this situation we consider a scenario in which there is a pre-assessment (K0) before the LOs are inspected, and a post-assessment (Kn) after the interaction with the learning environment has finished (resulting in the sequence L0, …, Ln of visited LOs). Due to the partial observability of the M-states, the stochastic model may be conceived as a special case of a hidden Markov model, as illustrated in Figure 4. The squares represent entities that are observable within the considered scenario, while non-observable M-states are represented by circles. As the labelling of the downward arrows in the diagram indicates, the relation between the M-states (Lt, Kt) and the respective LO Lt is deterministic, i.e. we have P(Lt | Lt, Kt) = 1.

Figure 4. Bayesian network representation of the stochastic model as a hidden Markov model.

Standard procedures (e.g. Viterbi, Baum-Welch or EM algorithms) are available for identifying the most likely sequence K1, …, Kn−1 and for parameter estimation, given the observation of L0, …, Ln and K0, Kn (cf. Rabiner, & Juang, 1986; Rabiner, 1989). Notice that the above described parameter constraints induced by the link topology and the knowledge structure need to be implemented properly in these procedures (e.g. Niculescu, 2005; Niculescu, Mitchell, & Rao, 2005).
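A minimal sketch of a Viterbi-style recursion for this setting: because the LOs are observed and the relation between M-states and LOs is deterministic, only K1, …, Kn−1 are hidden, while K0 and Kn are clamped to the assessed states. The parameter functions `p_link(L_prev, K_prev, L)` and `p_learn(K, L, K_prev)` are hypothetical placeholders with made-up values; structural zeros enter simply as zero probabilities, and parameter estimation (Baum-Welch/EM) is not shown.

```python
def constrained_viterbi(los, K0, Kn, states, p_link, p_learn):
    """Most probable knowledge-state trail K0, ..., Kn given L0, ..., Ln,
    with K0 and Kn fixed. Returns (probability, trail) or (0.0, None)."""
    n = len(los) - 1
    best = {K0: (1.0, [K0])}  # state at time t -> (best probability, best path)
    for t in range(1, n + 1):
        candidates = [Kn] if t == n else states  # clamp the post-assessed state
        nxt = {}
        for K in candidates:
            options = (
                (p * p_link(los[t - 1], Kp, los[t]) * p_learn(K, los[t], Kp), path + [K])
                for Kp, (p, path) in best.items()
            )
            p, path = max(options, key=lambda o: o[0], default=(0.0, None))
            if p > 0:  # drop structurally impossible states
                nxt[K] = (p, path)
        best = nxt
    return best.get(Kn, (0.0, None))


EMPTY, A = frozenset(), frozenset({"a"})


def p_learn(K, L, K_prev):
    """Hypothetical learning probabilities: no forgetting; O2 teaches a better than O1."""
    if K_prev == A:
        return 1.0 if K == A else 0.0
    gain = {"O1": 0.1, "O2": 0.7}.get(L, 0.0)
    return gain if K == A else 1.0 - gain


prob, path = constrained_viterbi(["E", "O1", "O2", "O1"], EMPTY, A, [EMPTY, A],
                                 lambda Lp, Kp, L: 1.0, p_learn)
# path == [EMPTY, EMPTY, A, A], prob ≈ 0.63
```

In this toy run the most probable trail attributes learning to the visit to O2 rather than to either visit to O1.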



The empirical validation of the Markov chain model can be based on deriving a prediction of the distribution over the knowledge states Kn in the post-assessment, given the sequence L0, …, Ln of visited LOs and the pre-assessment K0. This prediction may be contrasted with the distribution that is estimated directly from the log data by the observed relative frequencies within a cross-validation design (i.e. the two estimates of the marginal distribution are based on different sub-samples). Methods based on the Kullback-Leibler divergence (sometimes also called relative entropy) may be used to evaluate the resulting discrepancy.
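The discrepancy measure mentioned above is straightforward to compute once both distributions over the states Kn are available. A minimal sketch, assuming both are given as dictionaries from states to probabilities and that the observed distribution is passed as the first argument (the direction of the divergence is a modelling choice); the state labels and numbers are hypothetical.

```python
from math import log


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats; infinite if q assigns
    zero probability to a state that has positive probability under p."""
    total = 0.0
    for state, pk in p.items():
        if pk > 0:
            qk = q.get(state, 0.0)
            if qk == 0.0:
                return float("inf")
            total += pk * log(pk / qk)
    return total


# Hypothetical observed vs. predicted distributions over two post-assessment states.
observed = {"ab": 0.5, "abc": 0.5}
predicted = {"ab": 0.25, "abc": 0.75}
print(kl_divergence(observed, predicted))
```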

Identifying Cognitive Trails

In the sequel we consider a situation in which all the content taught by the LOs is completely new to the learner. In this case knowledge is acquired exclusively by navigating the learning environment, and we have K0 = ∅. Again, the sequence of visited LOs is observed and Kn is assessed in a post-test. The fundamental question is whether we can uniquely identify the sequence of M-states (Lt, Kt) with t = 1, …, n. The major advantage in this situation is that parameter estimation techniques based on log data analysis may be employed (cf. Borges, & Levene, 2000; Levene, & Loizou, 1999). Given the partial information, however, it is not possible to uniquely identify the trail that the learner took. For this we need to introduce assumptions on the learning process. The following plausible assumptions narrow down the set of possible sequences of knowledge states K1, …, Kn−1 by making explicit their compatibility with the observations.

– Condition C1. The solution to a problem q cannot be learned before visiting a set of relevant LOs that are sufficient for its solution: if q ∈ Kt \ K0, then there is a subset N ∈ g(q) such that N ⊆ {L0, …, Lt};

– Condition C2. Learning to solve a problem q can only occur when visiting a relevant LO: if q ∈ Kt \ Kt-1, then there is a subset N ∈ g(q) such that Lt ∈ N;

– Condition C3. There is no forgetting, i.e. the trail of knowledge states K0, …, Kn is non-decreasing with respect to set inclusion: K0 ⊆ … ⊆ Kn.

Condition C1 assumes that a correct response to a problem that was not solved in the pre-assessment cannot occur before visiting a subset of LOs that provides content sufficient for solving it. This reasonable assumption is related to the scope of the given trail of LOs, i.e. to what can in principle be learned from it. A problem q ∈ Q lies within this scope (which means that we may have q ∈ Kn \ K0) if and only if there exists N ∈ g(q) such that N ⊆ {L0, …, Ln}. A learner following this trail of LOs, however, does not necessarily learn to solve all the problems within its scope.

Condition C2 means that learning to solve a problem can only occur if the currently visited LO is relevant for its solution. Together with C1, this means that the solution can be learned at the earliest when the last portion of the relevant and sufficient information is considered, and it excludes effects based on unrelated material mediating learning. Condition C3 implies K0 ⊆ Kn, which, in principle (i.e. in case of K0 being non-empty), may be contradicted by data. In empirical applications, however, we may avoid this problem by simply identifying K0 with K0 ∩ Kn. Proceeding in this way implicitly interprets the correct responses to the problems in K0 \ Kn as lucky guesses.

A trail (L0, K0), …, (Ln, Kn) on L × K is called consistent whenever the trail of knowledge states K0, …, Kn is compatible with the trail L0, …, Ln of LOs, i.e. whenever the compatibility conditions C1-C3 are satisfied.
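The compatibility conditions can be checked mechanically for a given trail. A minimal Python sketch, assuming that LOs are strings, knowledge states are frozensets of problem labels, and the prerequisite map g is a dictionary of lists of sets (the encoding and the function name `is_consistent` are hypothetical). Applied to the four trails of Table 1, it confirms that each of them is consistent.

```python
def is_consistent(los, states, g):
    """Check conditions C1-C3 for LOs L0..Ln and knowledge states K0..Kn."""
    K0 = states[0]
    for t in range(1, len(los)):
        visited = set(los[: t + 1])
        K_prev, K_curr = states[t - 1], states[t]
        if not K_prev <= K_curr:  # C3: no forgetting
            return False
        for q in K_curr - K0:  # C1: a sufficient set of LOs has been visited
            if not any(N <= visited for N in g[q]):
                return False
        for q in K_curr - K_prev:  # C2: the current LO is relevant for q
            if not any(los[t] in N for N in g[q]):
                return False
    return True


g = {"a": [{"O1", "O2"}], "b": [{"O1", "O3"}],
     "c": [{"O1", "O2", "O3", "O4"}, {"O1", "O2", "O5"}],
     "d": [{"O1", "O2", "O3", "O5"}]}
T4_los = ["E", "O1", "O2", "O3", "O2", "O4", "O5"]
T4_states = [frozenset(s) for s in ["", "", "", "b", "ab", "abc", "abc"]]
print(is_consistent(T4_los, T4_states, g))  # True
```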

Notice that the compatibility conditions C1-C3 are independent, and, in general, there will be more than one trail of knowledge states satisfying these conditions. The non-uniqueness is illustrated by the trails specified in Table 1. Both pairs of consistent trails T1, T2 and T3, T4 are based on identical trails of LOs as well as coinciding pre- and post-assessment. In the trails T1 and T2 the solution to problem c is learned at different points in time. Trails T3 and T4 differ with respect to learning to solve problem a.

T1: (E, ∅), (O1, ∅), (O2, {a}), (O3, {a, b}), (O4, {a, b, c}), (O5, {a, b, c, d})
T2: (E, ∅), (O1, ∅), (O2, {a}), (O3, {a, b}), (O4, {a, b}), (O5, {a, b, c, d})
T3: (E, ∅), (O1, ∅), (O2, {a}), (O3, {a, b}), (O2, {a, b}), (O4, {a, b, c}), (O5, {a, b, c})
T4: (E, ∅), (O1, ∅), (O2, ∅), (O3, {b}), (O2, {a, b}), (O4, {a, b, c}), (O5, {a, b, c})

Table 1. Examples of consistent trails on the learning environment and knowledge structure as illustrated in Figure 2 and Figure 1, respectively.

The examples demonstrate that, in general, the compatibility conditions C1-C3 will not suffice to reconstruct a single trail of knowledge states from the given data. There are, however, sufficiently strict assumptions that warrant uniqueness of the inferred cognitive trail.

Strict Learning Assumption

Here we assume that learning occurs as soon as the relevant content has been exposed. Let the observable trail L0, …, Ln of LOs from L, a prerequisite map g on Q, and K0, Kn ⊆ Q be given.

Consider a trail K0, …, Kn of knowledge states in the domain Q, which is defined in the following way. For all q ∈ Kn \ K0 and t ∈ {1, …, n-1} we have

q ∈ Kt \ K0 if and only if there is a subset N ∈ g(q) such that N ⊆ {L0, …, Lt}.

This condition is called the Strict Learning Assumption (SLA).

Notice that the Strict Learning Assumption differs from C1 because it is stated in the form of a logical equivalence, not as an implication. This seemingly slight difference has important implications. First of all, the Strict Learning Assumption defines a uniquely determined trail. If we assume that there are two trails K0, K1, …, Kn-1, Kn and K0, K'1, …, K'n-1, Kn, then for all t ∈ {1, …, n-1} we have q ∈ Kt \ K0 if and only if the above condition holds, which in turn is equivalent to q ∈ K't \ K0. Second, it can be shown that any trail K0, …, Kn defined by SLA satisfies the compatibility conditions C1-C3, and thus constitutes a consistent trail.

Of the trails listed in Table 1, only T1 and T3 are in accordance with SLA. In T2, for example, the learner is not able to solve problem c after visiting O1, O2, O3 and O4, although according to the prerequisite map g these LOs provide all the required information. The essential assumption is that learning occurs as soon as the relevant material is available, which may be too optimistic. In particular, the Strict Learning Assumption lacks plausibility if LOs are visited more than once. Consider the trail T3, where the LO O2 is revisited although its content has already been learned. In fact, revisiting an LO may be interpreted as an indication that the material has not been learned during previous visits. This is taken into account in the Weak Learning Assumption outlined next.

Weak Learning Assumption

Under the same conditions as above, consider a trail $K_0, \dots, K_n$ of knowledge states in the domain $Q$, which is defined in the following way. For all $q \in K_n \setminus K_0$ and $t \in \{1, \dots, n-1\}$ we have

$$q \in K_t \setminus K_0 \quad \text{if and only if there is a subset } N \in g(q) \text{ such that } N \subseteq \{L_0, \dots, L_t\} \text{ and } N \cap \{L_{t+1}, \dots, L_n\} = \emptyset.$$

This condition is called the Weak Learning Assumption (WLA). As in the case of SLA, the Weak Learning Assumption defines a unique consistent trail (i.e. a trail that satisfies the compatibility conditions C1-C3). Learning still occurs once the relevant content has been exposed, but it is deferred to the last visit of the relevant LOs when these are visited more than once. Notice that for trails in which no LO is visited more than once, SLA and WLA coincide. This is the case for the trails T1 and T2 in Table 1. Thus, T1 satisfies both SLA and WLA, while T2 satisfies neither. The trails T3 and T4 of Table 1 contain two visits to the LO O2. We have already seen that SLA holds for T3. This means that T3 cannot satisfy WLA, since in T3 the solution to problem a is learned at the first visit to O2. In contrast, in T4 the solution to problem a is not learned before the second visit to O2, so T4 satisfies WLA.
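A minimal Python sketch of WLA, written by us as an illustration rather than taken from the authors, adds to the condition "N fully visited by time t" the requirement that none of the LOs in N is visited again later. It assumes the prerequisite map is a dictionary from each problem to its alternative LO sets, and it uses a hypothetical map `G_EXAMPLE` (a taught by O2, b by O3, c by O4, d by O5) that is not taken from Figure 1 but is consistent with the trails of Table 1.

```python
from typing import Dict, FrozenSet, List, Sequence, Set

# g(q): list of alternative LO sets N, any one of which suffices to learn q
PrereqMap = Dict[str, List[FrozenSet[str]]]


def infer_wla_trail(
    los: Sequence[str], g: PrereqMap, k0: Set[str], kn: Set[str]
) -> List[Set[str]]:
    """Return K_0, ..., K_n for the LO trail L_0, ..., L_n under WLA."""
    n = len(los) - 1
    if n < 1:
        raise ValueError("the trail must contain at least two LOs")
    newly_learned = kn - k0
    trail: List[Set[str]] = [set(k0)]
    for t in range(1, n):
        before = set(los[: t + 1])  # {L_0, ..., L_t}
        after = set(los[t + 1 :])   # {L_{t+1}, ..., L_n}
        kt = set(k0)
        for q in newly_learned:
            # N must be fully visited and none of its LOs may be visited again
            if any(n_set <= before and not (n_set & after) for n_set in g[q]):
                kt.add(q)
        trail.append(kt)
    trail.append(set(kn))  # K_n is fixed by the post-assessment
    return trail


# Hypothetical prerequisite map (an assumption, not the map of Figure 1)
G_EXAMPLE: PrereqMap = {
    "a": [frozenset({"O2"})],
    "b": [frozenset({"O3"})],
    "c": [frozenset({"O4"})],
    "d": [frozenset({"O5"})],
}
```

Applied to the LO sequence E, O1, O2, O3, O2, O4, O5 shared by T3 and T4, with K_0 = ∅ and K_n = {a, b, c}, the sketch returns the knowledge states of T4: a enters the state only at the second visit to O2. For the LO sequence of T1, which has no revisits, it returns the same states as SLA.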

In principle, we can formulate an even more relaxed assumption where learning occurs at the latest point in time for which a consistent trail results. Stating this hypothesis as a general rule, however, would mean adopting an overly pessimistic point of view.

Effectiveness of Trails

The proposed model allows for judging the effectiveness of certain trails of LOs. This information may provide guidelines for an optimisation of the learning environment. It can form the basis for reshaping the link structure in the learning environment by eliminating links that belong to ineffective trails. The information could also be used to guide the adaptation of the hypertext using techniques such as link hiding and adaptive ordering (Brusilovsky, Kobsa, & Vassileva, 1998).

An index of the effectiveness of a trail of LOs has to relate the actual performance of the learners to what, in principle, can be learned from that trail, which has been called its scope. Recall that a problem $q \in Q$ lies in the scope $s(L_0, \dots, L_n)$ of the trail $L_0, \dots, L_n$ if and only if there exists $N \in g(q)$ such that $N \subseteq \{L_0, \dots, L_n\}$. As learners following the trail of LOs do not necessarily learn to solve all the problems within its scope, we may contrast the scope with the actual solution behaviour, represented by $K_n \setminus K_0$.

A whole variety of numerical indices may be devised to capture the resulting discrepancy. As a first index, we propose to consider the probability $P(s(L_0, \dots, L_n) \mid L_0, \dots, L_n)$ that, given a certain trail, the learner learns to solve all problems within its scope, i.e. that $K_n \setminus K_0 = s(L_0, \dots, L_n)$. A trail of LOs is clearly effective if this probability is close to 1, and ineffective if it is close to 0. More differentiated information may be gained by additionally conditioning on the initial knowledge state $K_0$, which can be employed to guide procedures for adapting the hypertext, such as link hiding.

In general, given the trail $L_0, \dots, L_n$, we can consider a probability distribution on the collection of subsets of $Q$ containing the problems whose solution has actually been learned. This collection is given by the intersections $K \cap s(L_0, \dots, L_n)$ for all $K \in \mathcal{K}$. The information in this distribution may be condensed into a single index by forming the mean number of problems whose solution has been learned, i.e. the mean of $|K \cap s(L_0, \dots, L_n)|$ over all $K \in \mathcal{K}$. To be comparable across different trails, this index has to be normalised by the number of problems in the scope, $|s(L_0, \dots, L_n)|$. Other aspects of the observed trail may also be taken into account, for example its length $n$. Given the same effectiveness as measured by the indices introduced above, a shorter trail may be considered superior to a longer one.
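Both indices can be estimated from the learners who followed the same trail. The Python sketch below is a minimal illustration under assumptions of our own: the prerequisite map is a dictionary from each problem to its alternative LO sets, each learner is summarised by a (K_0, K_n) pair, and the learned set is taken to be (K_n \ K_0) ∩ s, where intersecting with the scope only guards against inconsistent data. The function names are hypothetical, and `Fraction` is used to keep the estimates exact.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# g(q): list of alternative LO sets N, any one of which suffices to learn q
PrereqMap = Dict[str, List[FrozenSet[str]]]


def scope(los: Sequence[str], g: PrereqMap) -> Set[str]:
    """s(L_0, ..., L_n): problems q with some N in g(q) contained in the trail."""
    visited = set(los)
    return {q for q, alternatives in g.items()
            if any(n_set <= visited for n_set in alternatives)}


def effectiveness(
    los: Sequence[str],
    g: PrereqMap,
    outcomes: List[Tuple[Set[str], Set[str]]],
) -> Tuple[Fraction, Fraction]:
    """Return (share of learners who learned the whole scope,
    mean number of scope problems learned divided by |scope|).

    `outcomes` holds one (K_0, K_n) pair per learner who followed `los`.
    """
    s = scope(los, g)
    if not s or not outcomes:
        raise ValueError("need a non-empty scope and at least one learner")
    learned = [(kn - k0) & s for k0, kn in outcomes]
    p_whole_scope = Fraction(sum(1 for d in learned if d == s), len(learned))
    mean_normalised = Fraction(sum(len(d) for d in learned), len(learned) * len(s))
    return p_whole_scope, mean_normalised
```

For instance, with the hypothetical map a: {O2}, b: {O3}, c: {O4}, the trail E, O1, O2, O3 has scope {a, b}; if three learners starting from ∅ end in {a, b}, {a} and ∅, the sketch returns (1/3, 1/2), i.e. a rather ineffective trail by both indices.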

CASE STUDIES

In order to make use of the theoretical model presented in the previous sections to predict user behaviour, suggest useful trails or adapt the structure of a learning environment, the transition probabilities for the model must first be estimated. Transition probabilities are the probabilities of a user moving from one (L,K) state to another, and their values for a particular environment must be estimated from usage data showing the behaviour of many users in the environment.

As the (L, K) states are not directly observable at each transition, the state at each transition must be inferred from whatever data are available about the L and K states of users. In an on-line environment, usage data can be obtained from log files recording the history of access to the on-line resources. Data mining techniques can be used to ‘clean’ the log file, and trails of L states for individual users (and individual sessions) can then be extracted straightforwardly as the sequences of resources accessed by a user. The corresponding K states can be inferred from the responses given to on-line assessments that test the current knowledge state of a user. In most cases users will take an assessment test only at the beginning and end of a session, or at the beginning and end of a course, perhaps with some intermediate testing. The minimum needed to use the model is a pre-assessment giving the user's knowledge state before interacting with the environment and a post-assessment giving the knowledge state after learning has taken place. In this case either SLA or WLA can be used to derive a unique trail of knowledge states.

In addition to the test scores, we must also know the mapping that associates each problem q in Q with the subset of LOs in the environment that teach the skills necessary to solve q. This enables us to infer at which point along the observed trail of L states a particular change in knowledge state occurred, thus generating the (L, K) trails of users that provide the usage statistics needed for estimating the transition probabilities.

Ideally, in order to make full use of our theoretical model, the log file for a system should allow the identification of the following:

– The resources visited by individual users. Ideally, users should have to ‘log in’ at the start of each session, and the log file records the user for each request. Where this is not possible, individual users can be identified using cookies or IP addresses, but these options are not as reliable as explicit log-in information.
– The date and time of each access, to allow the identification of sessions.
– The knowledge state of a user at the beginning and end of a session. Results from a pre-assessment and a post-assessment for each session are ideal, but in systems where users must log in each time they use the system, a pre-assessment may only be necessary for the first session, with the results of the post-assessment at the end of one session serving as the pre-assessment for the following session (this methodology assumes no forgetting between sessions).

In addition, the following information about the environment must be known:

– The mapping of test questions to LOs. Each question in the pre- and post-assessments should ideally relate to the content taught by a single LO within the system, in order to infer the cognitive trail of K states. This means that if question q tests content taught by LO L, and the post-assessment shows that the user has learned to answer q during the preceding session, then the content was learned while they were interacting with L, and not with any other LO in the environment. If the mapping from questions to LOs is one-to-many, we need to introduce assumptions on the learning process (such as SLA or WLA), or we can only infer probabilities for where the concept tested by q was learned.

We have obtained log data from two different web-based learning environments. In the following case studies we illustrate how our model could be applied to each of them. The first is an English grammar course offered to students at Eötvös Loránd University during October 2004. The second is a course developed by the computer science department at Trinity College Dublin that teaches some basics of database management systems and the SQL query language.

English Grammar Course

The Course

The English grammar course uses the Coraler mapping tool to display links to free on-line educational portals teaching various aspects of English grammar, categorised according to topic and difficulty level. The map allows learners to navigate the material and to take both a general test and small topic tests. The nodes of the graph are coloured differently to show which topics have been mastered (or partially mastered) by the learner, based on their test results. An environment such as this is ideal for the application of our model, as learners are tested regularly to assess their knowledge state, and the test questions are related to the content of known LOs. The log files record all of the information necessary to apply the model (both resources visited and test results), and every action is associated with the corresponding user's ID, so the identification of sessions is straightforward and reliable.

The course was run at Eötvös Loránd University during October 2004, but unfortunately it was not used enough to provide the quantities of usage data necessary to be able to estimate probabilities with any degree of confidence. We will here use some of the data collected to show the main principles of how the model can be applied, without giving a full analysis of the environment.

Analysing the Log File

The system had almost fifty registered users, but only seventeen of these accessed the system. We extracted the trails of individual users; of the seven trails of any reasonable length, only one has the structure necessary for analysis using our model (i.e. beginning with a test, followed by some visits to LOs, and then further assessment). The other six users' trails were disregarded for one of the following reasons: one user was the system administrator, so the trail was meaningless in terms of learning; one did no pre-assessment; one did no post-assessment; and three did a pre-assessment but then clicked on only one content page.

The trail taken by the remaining user over three sessions is shown in Figure 5. The log data that this visualisation is generated from can be found in Heller et al. (2004).

Figure 5. The trail of one user of the English grammar course, over three sessions.

The ‘Home’ page is the entry point to the system (like E in Figure 2), and so is the starting point for each of the session trails. All three sessions together can be considered as a single trail, beginning with the general test on 1 October and ending with mini-test #20 on 5 October. Even this trail is not ideal for the application of our model, because the user did not retake the general test after following their learning trail. However, the results of the mini-tests (each consisting of three questions on a single topic) could be used to assess whether individual concepts within the knowledge space have been learned. If we had statistics from many users of the system (and could thus estimate the transition probabilities), our model would allow us to predict from the user trail which questions the user might be able to answer if they took the general test again.

In order to see how the model can be applied, we will consider one small part of the English grammar course. Three figures show the situation: Figure 6 shows a section of the knowledge space for the grammar course (knowledge of the simple present and past tenses), Figure 7 shows the mapping from some of the questions in the general test to the LOs that teach the concepts necessary to answer the questions, and Figure 8 shows the trail taken by one hypothetical learner through the environment.

Figure 6. A small part of the knowledge space for the English grammar course

Figure 7. The mapping from questions in the general test to pages of learning content, and the mapping from content pages to topics.

Figure 8. A learner trail through the English grammar course, and the accompanying partially known Markov chain of (Lt, Kt) states.

As can be seen in Figure 8, the Markov chain of (Lt, Kt) states is partially known: the results of the pre- and post-assessments give the knowledge state at the beginning and end of the trail ({} and {a}, respectively), and the trail of learning objects can be read directly from the log file. The knowledge state has changed at some point along the trail from {} to {a}, but the point at which the change occurred is not directly observed. Our model allows us to infer the intermediate knowledge states Kt of the trail. Referring back to the compatibility conditions C1-C3 introduced above:

– C1 tells us that the change in knowledge state from {} to {a} cannot have occurred before the user visited one of the pages that teach a. It can be seen in Figure 7 that these are pages P1, P8 and P12. This means that the earliest point along the trail at which the change could occur is the visit to P8, the second page along the trail, i.e. K1 must be {}.
– C2 tells us that a can only enter the knowledge state while the learner is visiting one of the pages that teach a, i.e. while visiting P1, P8 or P12.
– C3 tells us that once the knowledge state has become {a}, it remains so for the rest of the trail.

This still leaves us with three possible consistent trails:

T1: (L0,{}), (P4,{}), (P8,{a}), (P12,{a}), (P30,{a}), (P1,{a}), (Lt,{a})
T2: (L0,{}), (P4,{}), (P8,{}), (P12,{a}), (P30,{a}), (P1,{a}), (Lt,{a})
T3: (L0,{}), (P4,{}), (P8,{}), (P12,{}), (P30,{}), (P1,{a}), (Lt,{a})
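The enumeration of consistent trails can be sketched in Python. This is our own simplification, not the authors' procedure: each problem is assumed to be taught by single pages (as in the mapping of Figure 7), so a consistent trail amounts to choosing, for each newly learned problem, one visit to a page that teaches it (C1, C2) and keeping the problem from then on (C3). Whether the intermediate states belong to the knowledge structure is not checked.

```python
from itertools import product
from typing import Dict, List, Sequence, Set


def consistent_trails(
    los: Sequence[str], teaches: Dict[str, Set[str]], k0: Set[str], kn: Set[str]
) -> List[List[Set[str]]]:
    """Enumerate trails K_0, ..., K_n allowed by C1-C3 (single-page simplification).

    teaches[q] is the set of pages that teach problem q. Each problem in
    K_n minus K_0 is assigned one visit t >= 1 to a page teaching it; from
    that visit onwards it stays in the knowledge state.
    """
    newly_learned = sorted(kn - k0)
    candidates = [
        [t for t, page in enumerate(los) if t >= 1 and page in teaches.get(q, set())]
        for q in newly_learned
    ]
    trails = []
    # An empty candidate list makes product() yield nothing: no consistent trail.
    for choice in product(*candidates):
        learned_at = dict(zip(newly_learned, choice))
        trails.append(
            [set(k0) | {q for q, t_q in learned_at.items() if t_q <= t}
             for t in range(len(los))]
        )
    return trails
```

For the trail of Figure 8, `consistent_trails(["L0", "P4", "P8", "P12", "P30", "P1", "Lt"], {"a": {"P1", "P8", "P12"}}, set(), {"a"})` returns three trails in the order T1, T2, T3. With a single learned problem, the first one (learning at the earliest teaching visit) is the trail singled out by SLA.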

Under either SLA or WLA this set of three possible trails reduces to one. If the Strict Learning Assumption holds, the learner trail must be T1, as a is learned the first time relevant material is viewed, which is when the learner visits P8. As there are no revisits to the same LO in the trail, the same holds under the Weak Learning Assumption: T1 is again the inferred trail.

In order to estimate the parameters for the model we would need log data for a collection of users. For our example part of the English grammar course, imagine a small log file that allows us (using the WLA) to infer the trails shown in Table 2.

T1: (L0,{}), (P4,{}), (P8,{}), (P12,{}), (P30,{}), (P1,{a}), (Lt,{a})
T2: (L0,{}), (P4,{}), (P1,{}), (P12,{a}), (P7,{a}), (P4,{a}), (Lt,{a})
T3: (L0,{}), (P12,{}), (P4,{}), (P1,{a}), (P30,{a}), (P4,{a,b}), (Lt,{a,b})

Table 2. An example set of trails inferred from a log file for the English grammar course.

The estimation of the initial probability distribution P(L0, K0) is simple, as all the trails begin at the entry point, which in this case is the set of links presented after taking the general test, and all users so far have begun in the naive knowledge state:

$$P(L_0, K_0) = \begin{cases} 1 & \text{if } L_0 = E \text{ and } K_0 = \emptyset,\\ 0 & \text{otherwise.} \end{cases}$$

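Further probabilities can be estimated as the relative frequencies observable in the logged trails. Estimating the probabilities of choosing a certain page conditional on the previous page and the previous knowledge state yields

$$
\begin{aligned}
P(L_t = \mathrm{P4} \mid L_{t-1} = L_0, K_{t-1} = \emptyset) &= 2/3,\\
P(L_t = \mathrm{P12} \mid L_{t-1} = L_0, K_{t-1} = \emptyset) &= 1/3,\\
P(L_t = \mathrm{P1} \mid L_{t-1} = \mathrm{P4}, K_{t-1} = \emptyset) &= 2/3,\\
P(L_t = \mathrm{P8} \mid L_{t-1} = \mathrm{P4}, K_{t-1} = \emptyset) &= 1/3,
\end{aligned}
$$

and estimating the probabilities for the current knowledge state conditional on the current page and the previous knowledge state provides

$$
\begin{aligned}
P(K_t = \emptyset \mid L_t = \mathrm{P4}, K_{t-1} = \emptyset) &= 1,\\
P(K_t = \emptyset \mid L_t = \mathrm{P1}, K_{t-1} = \emptyset) &= 1/3,\\
P(K_t = \{a\} \mid L_t = \mathrm{P1}, K_{t-1} = \emptyset) &= 2/3,\\
P(K_t = \{a, b\} \mid L_t = \mathrm{P4}, K_{t-1} = \{a\}) &= 1/2,\\
P(K_t = \{a\} \mid L_t = \mathrm{P30}, K_{t-1} = \{a\}) &= 1.
\end{aligned}
$$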
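These relative frequencies can be reproduced mechanically. The following Python sketch is our illustration, not the authors' code: it encodes the three trails of Table 2 as sequences of (page, knowledge state) pairs, counts each transition once under the navigation condition (previous page, previous state) and once under the learning condition (current page, previous state), and normalises the counts. `Fraction` keeps the estimates exact (2/3 rather than a rounded decimal); the names `estimate`, `p_nav` and `p_learn` are our own.

```python
from collections import Counter, defaultdict
from fractions import Fraction

EMPTY, A, AB = frozenset(), frozenset({"a"}), frozenset({"a", "b"})

# The three (L_t, K_t) trails of Table 2
TRAILS = [
    [("L0", EMPTY), ("P4", EMPTY), ("P8", EMPTY), ("P12", EMPTY), ("P30", EMPTY), ("P1", A), ("Lt", A)],
    [("L0", EMPTY), ("P4", EMPTY), ("P1", EMPTY), ("P12", A), ("P7", A), ("P4", A), ("Lt", A)],
    [("L0", EMPTY), ("P12", EMPTY), ("P4", EMPTY), ("P1", A), ("P30", A), ("P4", AB), ("Lt", AB)],
]


def relative_frequencies(counts):
    """Turn {condition: Counter(outcome -> n)} into conditional probabilities."""
    return {
        cond: {x: Fraction(n, sum(c.values())) for x, n in c.items()}
        for cond, c in counts.items()
    }


def estimate(trails):
    nav = defaultdict(Counter)    # (L_{t-1}, K_{t-1}) -> counts of L_t
    learn = defaultdict(Counter)  # (L_t, K_{t-1})     -> counts of K_t
    for trail in trails:
        for (l_prev, k_prev), (l_cur, k_cur) in zip(trail, trail[1:]):
            nav[(l_prev, k_prev)][l_cur] += 1
            learn[(l_cur, k_prev)][k_cur] += 1
    return relative_frequencies(nav), relative_frequencies(learn)


p_nav, p_learn = estimate(TRAILS)
```

For instance, `p_nav[("L0", EMPTY)]` evaluates to `{'P4': Fraction(2, 3), 'P12': Fraction(1, 3)}`, and `p_learn[("P4", A)][AB]` to `Fraction(1, 2)`. The sketch also produces estimates for conditions not listed in the text, such as transitions into the final node Lt.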

Obviously, this example of three trails is too small for the estimates to be reliable, but it shows the principle behind applying the model. The estimated values can then be validated by predicting the results of post-assessments from the trails followed.

SQL Course

The Course

This on-line SQL course is divided into five main sections: database concepts, creating a database, populating a database, database retrieval and database applications. The course is adaptive in that it only displays the sections teaching material that the learner does not already know: a personalised home page for the course (analogous to the entry point E in Figure 2) is generated based on the user's responses to an initial questionnaire consisting of five questions. Depending on the answer to each question, a link to one of the available sections is or is not included in the personalised home page. Figure 9 shows the structure of the course: the five sections that are shown or hidden depending on the questionnaire responses are the five sub-trees below the home page, and the leaf pages represent lessons consisting of a number of learning objects (‘pagelets’) that are accessed in sequence.

Figure 9. Structure of the SQL course.

The adaptation is at a very high level, and the initial five questions are not precise enough to count as a pre-assessment, so they do not give a fine-grained picture of the user's knowledge state before beginning the course. No dependencies between the sections are taken into account when building the course (i.e. the prerequisite map is not considered in structuring the environment), so the knowledge space corresponding to the course is effectively modelled as five independent sections. This is probably too simplistic a model: if a user can embed SQL in a C application (covered in the fifth section of the course), they would reasonably be expected to know some basic database concepts (the subject of the first section); however, this constraint is not enforced within the system.

Assessment in the course is through a project, external to the on-line environment, in which learners create some database tables and run queries on them. Consequently, no ‘post-assessment’ data are stored in the log file, and without separate assessment results for each user there is no way to determine the learners' knowledge states after interacting with the environment.

Analysing the Log File

The data are obviously far from ideal for the application of our model: there is no pre- or post-assessment, which means we cannot analyse the trails of K states for learners. Despite these drawbacks, we can analyse the trails taken by users through the system, which begins to show how some parts of the model can be acquired from basic server log data. The log data processed consist of 12472 lines covering the period 18-30 November 2003. From these we have generated 532 user sessions, within which we identify 339 unique patterns of behaviour (i.e. 339 different trails). The format of the log file and the scope of the data recorded in it impose some further limitations on our analysis:

– Cached pages: the analysis does not look into referral (i.e. which page was visited previously) to establish whether any pages were served from a cache between actual hits to the web server; hence the extracted sessions may not give the full picture of the trails of LOs actually seen by the users.
– Some sessions may be incomplete, as a session may extend beyond the ‘timeout’ period and/or users may run concurrent sessions in different windows. Some sessions begin at ‘middle’ pages (i.e. do not follow the login → course index → section sequence), which may be evidence of this.
– It is difficult to identify individual users from the log file: the user is ‘tagged’ only at the login page, and subsequent pages are not tagged. This makes it difficult to assign log entries accurately to the right users. In our analysis we have assumed that all entries belonging to the same IP address represent one distinct user.
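The session extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the analysis: log lines are assumed to be already parsed into (IP address, timestamp, URL) records, the IP address stands in for the user as assumed above, and the 30-minute inactivity timeout is a common web-usage-mining convention that we assume here, not a value taken from the SQL course. The names `Hit` and `sessionise` are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import List, NamedTuple


class Hit(NamedTuple):
    ip: str          # stands in for the user (one IP address = one user)
    time: datetime
    url: str


def sessionise(hits: List[Hit], timeout: timedelta = timedelta(minutes=30)) -> List[List[str]]:
    """Group hits by IP address and split wherever the gap exceeds `timeout`.

    Returns one URL trail per session.
    """
    sessions: List[List[str]] = []
    ordered = sorted(hits, key=lambda h: (h.ip, h.time))
    for _, user_hits in groupby(ordered, key=lambda h: h.ip):
        current: List[Hit] = []
        for hit in user_hits:
            # A long pause closes the current session and starts a new one
            if current and hit.time - current[-1].time > timeout:
                sessions.append([h.url for h in current])
                current = []
            current.append(hit)
        if current:
            sessions.append([h.url for h in current])
    return sessions
```

Because this heuristic cannot see cached pages or concurrent windows, the resulting sessions inherit exactly the limitations listed above; a different timeout would change the number of sessions obtained.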

Number of links selected    Percentage of sessions
0          4.7%
1          28.9%
2          7.7%
3          1.5%
4          1.7%
5          1.9%
6          1.9%
7          1.3%
10         1.3%
11-15      8.8%
16-20      4.7%
21-40      15.7%
41-60      9.0%
61-80      4.0%
81-100     3.7%
>100       3.2%

Table 3. The number of links selected per session for the SQL course log data

In order to see general patterns of behaviour more easily (rather than exact trails), pages (and therefore the visits to them) can be categorised as falling into one of four categories or ‘levels’ within the learning environment: the login level, the rebuild/personalise level, the section level and the pagelet level. Looking at Figure 9, the Home Page (the course index) can be considered to be the login level, the first row of five pages reached directly from the Home Page forms the section level, and the pages below the first row in the diagram represent clusters of pagelets. The rebuild/personalise level is not shown, as it is in some sense ‘outside’ the course structure (if it were shown, it could be placed ‘parallel’ to the Home Page). Table 4 shows a sample of the URLs that appear in the log file, and their respective categorisations into levels.

URL in log file    Level
/sql/login.jsp    Login
/sql/test.jsp?learner=user1    Rebuild/Personalise
/sql/test.jsp?Q1=a&Q2=a&Q3=a&Q4=a&Q5=a&learner=user1&build=true    Rebuild/Personalise
/sql/section.jsp?course=SQL%20Course&section=Database%20Concepts    Section
/sql/page.jsp?course=SQL%20Course&section=Database%20Concepts&subsection=Introduction&pagelet=1    Pagelet

Table 4. Categorisation of page hits into the four levels.
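The categorisation in Table 4 depends only on the script path of each URL, so it is easy to automate. The sketch below assumes that every hit can be classified from its path alone (unknown paths become ‘Other’), and then collapses a session into runs of levels, which is the kind of abstraction behind the patterns discussed next. The names `LEVEL_BY_PATH`, `level` and `level_pattern` are our own.

```python
from itertools import groupby
from typing import List, Tuple
from urllib.parse import urlparse

# Path -> level, following the examples in Table 4
LEVEL_BY_PATH = {
    "/sql/login.jsp": "Login",
    "/sql/test.jsp": "Rebuild/Personalise",
    "/sql/section.jsp": "Section",
    "/sql/page.jsp": "Pagelet",
}


def level(url: str) -> str:
    """Categorise a logged URL by its path; unknown paths become 'Other'."""
    return LEVEL_BY_PATH.get(urlparse(url).path, "Other")


def level_pattern(session: List[str]) -> List[Tuple[str, int]]:
    """Collapse a session's URLs into runs of (level, number of hits)."""
    return [(lvl, len(list(run))) for lvl, run in groupby(level(u) for u in session)]
```

A session that logs in, rebuilds the course, opens one section and then reads n pagelets collapses to `[("Login", 1), ("Rebuild/Personalise", 2), ("Section", 1), ("Pagelet", n)]`, regardless of which section or pagelets were chosen.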

The user behaviour varies tremendously. The most frequent behaviour is a single click on the login page, as shown in Figure 10. Another common sequence of activity is shown in Figure 11 – the user logs in, rebuilds/personalises the content, visits the course index, selects a section from the course table of contents, and then selects a number (n) of pagelets from the subsection. This group of users only visits one section out of the five possible sections during a session.

Figure 10. The most common trail for the SQL course – a single hit on the login page.

Figure 11. Another common trail for the SQL course – visiting pagelets in only one section.

We have seen how individual users' trails can be extracted from the log file and how these can be used to identify general patterns of behaviour. However, with 339 different trails across 532 sessions, there are not enough data here to make reliable estimates of transition probabilities from the observed frequencies. As there is no pre- or post-assessment, we do not have the data necessary to say anything about the knowledge states of actual users. We will now consider hypothetically how the knowledge structure for the ‘Populating a Database’ section of the course could be (partially) derived from analysis using our model, were these data available. The first thing to notice is that the knowledge structure of the environment imposes restrictions on the trails that can possibly occur, so by observing patterns in the user log data it is possible to form hypotheses about what the knowledge structure might be. As more log data are collected, the number of possible knowledge structures consistent with the observations will decrease.

For example, Figure 12 shows the ‘Populating a Database’ part of the course and one possible corresponding knowledge structure. Assuming that users begin this section of the course in the empty knowledge state ∅, we can deduce many propositions about the trails that could be observed if this is the correct knowledge structure, such as:

– If a trail does not contain L1, then the final knowledge state should always be ∅, as learners must learn a from L1 before any further progress can be made;
– If the final knowledge state contains c, then the trail must contain both L2 and L3, with L3 being visited after L2, as learners must learn b from L2 before being able to learn c from L3;
– The final knowledge state for a learner following the trail L1, L4, L3, L2 must be one of ∅, {a}, {a, b}, {a, d}, {a, b, d};
– and so on.

If, when analysing log data files, any of these propositions were found to be contradicted, this would be evidence that the knowledge structure is not the one hypothesised, and a new hypothesis consistent with the observations would need to be made.
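Such propositions can be generated and checked automatically. The Python sketch below is our illustration and rests on a hypothetical reconstruction of Figure 12(b) inferred from the propositions themselves: a comes first, b and d each require a, and c requires b. It also assumes that L1 to L4 teach a to d respectively, and that at each visit a learner may, but need not, learn the concept taught there, provided the result is again a state of the structure.

```python
from typing import FrozenSet, Iterable, Mapping, Set

# Hypothetical structure for 'Populating a Database' (assumption, see lead-in)
STRUCTURE: Set[FrozenSet[str]] = {
    frozenset(s) for s in ["", "a", "ab", "ad", "abc", "abd", "abcd"]
}
TEACHES: Mapping[str, str] = {"L1": "a", "L2": "b", "L3": "c", "L4": "d"}


def possible_final_states(
    trail: Iterable[str],
    structure: Set[FrozenSet[str]] = STRUCTURE,
    teaches: Mapping[str, str] = TEACHES,
) -> Set[FrozenSet[str]]:
    """Knowledge states in which a learner starting from the empty state can end."""
    states = {frozenset()}
    for lo in trail:
        q = teaches.get(lo)
        if q is None:
            continue  # LO outside this section: no change of state
        # Each reachable state may stay as it is or grow by q, if allowed
        states |= {k | {q} for k in states if (k | {q}) in structure}
    return states


def contradicts(trail: Iterable[str], observed_final: Iterable[str]) -> bool:
    """True if an observed final state is impossible under the hypothesised structure."""
    return frozenset(observed_final) not in possible_final_states(list(trail))
```

For the trail L1, L4, L3, L2 the sketch yields exactly ∅, {a}, {a, b}, {a, d} and {a, b, d}, as listed above. Any logged (trail, final state) pair for which `contradicts` returns `True` would count as evidence against this hypothesised structure.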

Figure 12. (a) The ‘Populating a Database’ section of the SQL course, showing which parts teach which concept; (b) the possible knowledge structure for this section of the course, which shows that the ‘Insert Statement’ material must be learned before the ‘Update Statement’ can be learned.

Once the probable knowledge structure has been deduced, it can be used to inform future restructuring of the learning material for a static environment, or to add useful adaptive functionality to an adaptive environment. In the case shown in Figure 12, the environment could be restructured so that L3 could only be accessed after L2, or a more adaptive system could be engineered so that the links to L3 are hidden either until L2 has been visited or until an assessment shows that the learner's knowledge state contains b. If the system were designed to implement our model automatically by recording user behaviour, these changes could be made automatically once the system had gathered sufficient evidence for the knowledge structure.

DISCUSSION

The case studies begin to show how our model can be applied in different situations. The case study of the English grammar course illustrates the first step in any application of the model: obtaining an estimate of the parameters (i.e. the (L, K) transition probabilities) from usage data and the results of learner assessments. This can be done for any environment where log data recording the usage of the system are available together with pre- and post-assessments of the users' knowledge states.

We then saw in the SQL course case study how frequent navigational patterns (common trails) can be discovered from server log files. The common navigational patterns that emerge, considered together with the stochastic model of (L, K) states, can be used for several purposes. Firstly, the model can reveal whether the most common trails followed (as observed in the log file) are also the most effective for learning. If this is the case, then any redesign of the linkage structure of the course (either manual or adaptive) should support these common types of usage. If the most common trails are not the most effective, then the structure of the environment may encourage or even force suboptimal trails to be followed, and in this case the course can be redesigned to prevent users from following the less effective paths. The SQL course showed a huge range of different user behaviours; while it may be a good thing to allow learners to explore however they like, it may be worth adjusting the structure to prevent some of the paths along which learning does not happen, making browsing more efficient for learning.

The SQL course case study also showed how the observed trails can be used to (partially) derive the knowledge structure for a course. This information can also be used to inform the future structuring of the course, again so that the material along allowed paths through the environment corresponds to possible paths of progression through the knowledge structure.

The model can further be used to predict user behaviour within an environment. Given the model (with estimated parameters), an estimate of the user's current knowledge state and the user's current location, the most probable next step in the user trail can easily be calculated. Such predictions of the item most likely to be requested next could be used to optimise pre-fetching and caching of resources. Perhaps more interestingly, the model can also be used to suggest the best next step to learners as they navigate, namely an LO whose content the learner is ready to learn. This would provide a personalised environment, as the suggestions would be based on an assessment of the individual user's current knowledge state, current location, and possibly their trail so far.
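Predicting the next step amounts to taking the most probable outcome of the estimated navigation distribution P(L_t | L_{t-1}, K_{t-1}). The sketch below is a minimal illustration: it hard-codes the estimates obtained from Table 2 for the English grammar example and assumes that the user's current knowledge state is known; the names `P_NAV` and `predict_next` are hypothetical, and conditions never observed in the log return `None`.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Condition = Tuple[str, FrozenSet[str]]  # (current page, current knowledge state)

# P(L_t | L_{t-1}, K_{t-1}) as estimated from Table 2 (English grammar example)
P_NAV: Dict[Condition, Dict[str, float]] = {
    ("L0", frozenset()): {"P4": 2 / 3, "P12": 1 / 3},
    ("P4", frozenset()): {"P1": 2 / 3, "P8": 1 / 3},
}


def predict_next(
    location: str,
    knowledge: Iterable[str],
    p_nav: Dict[Condition, Dict[str, float]] = P_NAV,
) -> Optional[str]:
    """Return the most probable next LO, or None if the condition was never observed."""
    dist = p_nav.get((location, frozenset(knowledge)))
    if not dist:
        return None
    return max(dist, key=dist.get)
```

A learner at P4 in the empty knowledge state would be predicted to move to P1, a page that could then be pre-fetched. With only three logged trails, such predictions are of course as unreliable as the estimates they rest on.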

One method of automatically adapting a course is through the use of Adaptive Hypertext (AH) techniques such as link hiding and adaptive ordering (Brusilovsky, Kobsa, & Vassileva, 1998). Awareness of the learner's current knowledge state means that an AH system could use link hiding to hide links to LOs covering material that the learner is not yet ready to tackle (i.e. when there is no route in the knowledge structure from the learner's current knowledge state to a knowledge state containing the concepts taught at the ‘hidden’ LOs). Adaptive ordering can be used to order material into the most effective trails, according to the model built from initial (or perhaps ongoing, on-line) log file analysis. An adaptive system such as this would mean that the course designer could set the course up to start by giving a very open choice of LOs to learners, and the system would adapt over time to give the most effective structuring of the material.
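One way to operationalise "not yet ready to tackle" is the outer fringe of Knowledge Space Theory: the problems q not in the current state K for which K ∪ {q} is again a knowledge state. The Python sketch below adopts this reading, which is our assumption rather than the rule stated above. It reuses the hypothetical structure reconstructed for Figure 12 (a first; b and d require a; c requires b), and keeps visible only links to LOs whose concept is already known or lies in the outer fringe.

```python
from typing import FrozenSet, Iterable, List, Mapping, Set

# Hypothetical structure and LO mapping, as reconstructed for Figure 12
STRUCTURE: Set[FrozenSet[str]] = {
    frozenset(s) for s in ["", "a", "ab", "ad", "abc", "abd", "abcd"]
}
TEACHES: Mapping[str, str] = {"L1": "a", "L2": "b", "L3": "c", "L4": "d"}


def outer_fringe(k: Iterable[str], structure: Set[FrozenSet[str]]) -> Set[str]:
    """Problems q not in K such that K plus q is again a knowledge state."""
    k = frozenset(k)
    return {q for s in structure if k < s and len(s - k) == 1 for q in s - k}


def visible_links(
    k: Iterable[str],
    structure: Set[FrozenSet[str]] = STRUCTURE,
    teaches: Mapping[str, str] = TEACHES,
) -> List[str]:
    """LOs whose concept is known or in the outer fringe; all others are hidden."""
    known = frozenset(k)
    ready = outer_fringe(known, structure) | known
    return sorted(lo for lo, q in teaches.items() if q in ready)
```

Under these assumptions a learner in state {a} sees L1, L2 and L4, while the link to L3 stays hidden until b has been learned, which matches the restructuring suggested for Figure 12.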

The suggested model is not only applicable to learning environments. In the more general web environment current user models do not attempt to take any account of users’ knowledge states. Usage mining models could be improved to do so, possibly giving an insight into modelling, for example, user behaviour in e-commerce environments.

In this situation the definition of cognitive states is much looser; the relevant set of cognitive categories may even be as simple as {Browsing, Buying}. As we did with the SQL course, it may also be useful to characterise pages into a small set of page types, perhaps {Catalogue page, Product page, Search results, Shopping basket, Checkout}. Armed with these sets of categories, the log file can be analysed by extracting the individual user sessions (using standard web data mining techniques), and then categorising the cognitive state of the user based on the outcome and characteristics of the observed trail. If the trail ends with a sale, we can say that the user was definitely ‘buying’, and mark the trail as a ‘buying’ trail. For other trails it is not so clear-cut, but there will be useful ‘fuzzy’ rules for allocating probabilities to the cognitive states. For example, if the user only looked at product pages, it is quite probable that they were only ‘browsing’; but if they got as far as the checkout, there is a higher probability that they were ‘buying’ and something stopped them from completing the purchase.

Once all trails have been classified according to the user’s cognitive state the set of ‘buying’ trails can be analysed to find any common features, and likewise for ‘browsing’ trails. This will enable an adaptive e-commerce system to recognise as early as possible if a visitor to the site looks to be following a ‘buying’ or ‘browsing’ trail. In the former case the system can then adapt to try to make sure the visitor has every opportunity to buy when they are ready. In the latter case the environment could adapt to try to shift the visitor onto a ‘buying’ path.

CONCLUSIONS

The present chapter suggests a Markov chain model that interlinks observable trails of learning objects (LOs) with associated latent trails in an underlying cognitive space. It elaborates on the theory and develops methods for its application, which is exemplified in two case studies. The main contribution of this work is the combination of Knowledge Space Theory with web data mining techniques to produce a new model of the relationship between the cognitive processes of users of on-line environments and their observable navigation behaviour. As we have begun to see in the case studies, the model opens up new possibilities for including analysis of cognitive information in web data mining, with applications for both learning environments and other more general web-based environments.

We believe the model is a good start, and a definite improvement on the current state of both data mining (which does not consider cognitive states) and learner assessment (which does not usually consider learner navigation). However, there is still potential to further improve the predictive power of the model, e.g. by taking into account the time spent on a particular LO (visiting time). Within an extended framework, the dependence of the probability of a transition between knowledge states on the current LO's visiting time may be modelled by a learning curve, whose possible forms are well understood in psychology.
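As an illustration of this extension, the sketch below assumes an exponential learning curve, p(t) = p_max * (1 - exp(-rate * t)), for the probability of moving to the next knowledge state after visiting an LO for time t; with the complementary probability the learner stays in the current state. The exponential form, the parameter names `p_max` and `rate`, the use of seconds as the time unit, and the two-outcome row are assumptions for illustration; the text does not commit to a particular curve.

```python
import math
from typing import Dict


def transition_probability(t: float, p_max: float, rate: float) -> float:
    """P(move to the next knowledge state) after visiting an LO for time t.

    Assumes an exponential learning curve p(t) = p_max * (1 - exp(-rate * t)):
    zero for a zero-length visit, rising towards p_max for long visits.
    """
    if t < 0:
        raise ValueError("visiting time must be non-negative")
    if not 0.0 <= p_max <= 1.0 or rate <= 0:
        raise ValueError("need 0 <= p_max <= 1 and rate > 0")
    return p_max * (1.0 - math.exp(-rate * t))


def transition_row(t: float, p_max: float, rate: float) -> Dict[str, float]:
    """Two-outcome row: learn (move to the next state) or stay in the current one."""
    p = transition_probability(t, p_max, rate)
    return {"learn": p, "stay": 1.0 - p}


# Hypothetical parameters: at most 80% chance of learning, rate per second.
for seconds in (0, 30, 120, 600):
    print(seconds, transition_row(seconds, p_max=0.8, rate=0.01))
```

A power-law curve, another form commonly used for practice effects, could be substituted by changing only `transition_probability`; the rest of the transition model would be unaffected.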

One issue that arose in conducting the case studies is that it is difficult to obtain sufficient good-quality log data, as current e-learning practice appears to be deficient in collecting (and storing) thorough logs. This would seem to be an issue for the community in general: on-line learning environments are widely used, and they are often criticised for being quite ineffective at enabling learning. Log files coupled with assessment records should be a major tool in assessing and improving such environments, but many systems appear not to keep sensible logs of all the relevant information, making useful log file analysis impossible or, at best, patchy.

ACKNOWLEDGEMENTS

We wish to thank Owen Conlan and Vincent Wade at Trinity College Dublin for kindly providing us with access to the log files from their SQL course. Thanks also to Marta Turcsányi-Szabó and Péter Kaszás at Eötvös Loránd University for re-running their English grammar course in October and providing us with the log data.

REFERENCES

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B.N. Petrov, & F. Csaki (Eds.), Second International Symposium on Information Theory. Budapest: Akadémiai Kiadó.

Albert, D., & Lukas, J. (Eds., 1999). Knowledge Spaces: Theories, Empirical Research, and Applications. Mahwah, NJ: Lawrence Erlbaum.

Albert, D., Hockemeyer, C., Conlan, O., & Wade, V. (2001). Reusing adaptive learning resources. In C.-H. Lee et al. (Eds.), Proceedings of the International Conference on Computers in Education ICCE/SchoolNet2001 (vol. 1, pp. 205–210).

Borges, J., & Levene, M. (2000). Data mining of user navigation patterns. In B. Masand, & M. Spiliopoulou (Eds.), Web Usage Analysis and User Profiling, Lecture Notes in Artificial Intelligence (vol. 1836, pp. 92–111). Berlin: Springer.

Brusilovsky, P., Kobsa, A., & Vassileva, J. (Eds., 1998). Adaptive Hypertext and Hypermedia. Dordrecht: Kluwer.

Conlan, O., Hockemeyer, C., Lefrere, P., Wade, V., & Albert, D. (2001). Extending educational metadata schemas to describe adaptive learning resources. In H. Davis, Y. Douglas, & D.G. Durand (Eds.), Hypertext '01: Proceedings of the Twelfth ACM Conference on Hypertext and Hypermedia (pp. 161–162). New York: Association for Computing Machinery.

Conlan, O., Hockemeyer, C., Lefrere, P., Wade, V., & Albert, D. (2002). Metadata driven approaches to facilitate adaptivity in personalized eLearning systems. The Journal of Information and Systems in Education, 1, 38–44.

Doignon, J.-P., & Falmagne, J.-C. (1985). Spaces for the assessment of knowledge. International Journal of Man-Machine Studies, 23, 175–196.

Doignon, J.-P., & Falmagne, J.-C. (1999). Knowledge Spaces. Berlin: Springer.

Heller, J., Keenoy, K., Levene, M., Hassan, M.M., Hockemeyer, C., & Albert, D. (2004). Cognitive and Pedagogical Aspects of Trails: A Case Study. Available: http://www.dcs.bbk.ac.uk/trails/docs/D22-01-02-F.pdf

Jensen, F.V. (2001). Bayesian Networks and Decision Graphs. Statistics for Engineering & Information Science. New York: Springer.

Kemeny, J.G., & Snell, J.L. (1960). Finite Markov Chains. Princeton: van Nostrand.

Levene, M., & Loizou, G. (1999). A probabilistic approach to navigation in hypertext. Information Sciences, 114, 165–186.

Lindgren, B.W. (1993). Statistical Theory (4th ed.). London: Chapman & Hall.

Niculescu, R.S. (2005). Exploiting parameter domain knowledge for learning in Bayesian networks. Technical report CMU-TR-05-147, Carnegie Mellon University.

Niculescu, R.S., Mitchell, T.M., & Rao, R.B. (2005). Parameter related domain knowledge for learning in graphical models. In Proceedings of the SIAM Data Mining Conference 2005 (pp. 310–321).

Rabiner, L.R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–286.

Rabiner, L.R., & Juang, B.H. (1986). An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1), 4–16.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464.

AFFILIATIONS

Jürgen Heller
Department of Psychology
University of Graz

Mark Levene
School of Computer Science and Information Systems
Birkbeck, University of London

Kevin Keenoy
School of Computer Science and Information Systems
Birkbeck, University of London

Dietrich Albert
Department of Psychology
University of Graz

Cord Hockemeyer
Department of Psychology
University of Graz

