
Bayesian Knowledge Tracing and Other Predictive Models in Educational Data Mining

Zachary A. Pardos

PSLC Summer School 2011


Outline of Talk

• Introduction to Knowledge Tracing
 – History
 – Intuition
 – Model
 – Demo
 – Variations (and other models)
 – Evaluations (Baker work / KDD)

• Random Forests
 – Description
 – Evaluations (KDD)

• Time left? Vote on next topic

Intro to Knowledge Tracing


History
• Introduced in 1995 (Corbett & Anderson, UMUAI)
• Based on the ACT-R theory of skill knowledge (Anderson, 1993)
• Computations based on a variation of Bayesian calculations proposed in 1972 (Atkinson)


Intuition
• Based on the idea that practice on a skill leads to mastery of that skill
• Has four parameters used to describe student performance
• Relies on a KC (knowledge component) model
• Tracks student knowledge over time

Given a student's response sequence 1 to n, predict n+1

For some skill K, the chronological response sequence for student Y
(0 = incorrect response, 1 = correct response):

0 0 0 1 1 1 ?
1 … n n+1


0 0 0 1 1 1 1

Track knowledge over time (model of learning)


Knowledge Tracing (KT) can be represented as a simple HMM:

Node representation
K = knowledge node (latent)
Q = question node (observed)

Node states
K = two state (0 or 1)
Q = two state (0 or 1)

Model parameters
P(L0) = probability of initial knowledge
P(T) = probability of learning
P(G) = probability of guess
P(S) = probability of slip

[Diagram: a chain of latent K nodes, initialized by P(L0) and linked over time by P(T); each K emits an observed Q via P(G) and P(S).]


Four parameters of the KT model:

P(L0) = probability of initial knowledge
P(T) = probability of learning
P(G) = probability of guess
P(S) = probability of slip

The probability of forgetting is assumed to be zero (fixed).


Formulas for inference and prediction

• Derivation (Reye, JAIED 2004)
• The formulas use Bayes' theorem to make inferences about the latent variable

If the response at opportunity n is correct (1):
P(L_n | correct) = P(L_n)·(1 − P(S)) / [ P(L_n)·(1 − P(S)) + (1 − P(L_n))·P(G) ]

If the response is incorrect (2):
P(L_n | incorrect) = P(L_n)·P(S) / [ P(L_n)·P(S) + (1 − P(L_n))·(1 − P(G)) ]

Knowledge update with the learning transition (3):
P(L_{n+1}) = P(L_n | evidence) + (1 − P(L_n | evidence))·P(T)
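A minimal Python sketch of these update rules (our own illustration; the function and variable names are not from the talk):

```python
def bkt_update(p_L, correct, p_T, p_G, p_S):
    """One Knowledge Tracing step: condition P(L) on the observed
    response with Bayes' theorem, then apply the learning rate P(T)."""
    if correct:
        posterior = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
    else:
        posterior = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
    return posterior + (1 - posterior) * p_T


def bkt_predict_correct(p_L, p_G, p_S):
    """Probability the next response is correct: known-and-no-slip
    plus unknown-but-guessed."""
    return p_L * (1 - p_S) + (1 - p_L) * p_G
```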


Model Training Step

Values of the parameters P(T), P(G), P(S) & P(L0) are used to predict student responses
• Ad-hoc values could be used, but will likely not be the best fitting
• Goal: find a set of values for the parameters that minimizes prediction error

[Diagram: example chronological response sequences for Students A, B, and C]


Model Tracing Step – Skill: Subtraction

Student's last three responses to Subtraction questions (in the unit): 0 1 1

Inferred latent knowledge P(K): 10% → 45% → 75% → 79% → 83%
Predicted P(Q) on the test set questions: 71%, 74%

[Diagram: chain of latent knowledge nodes over the observed responses, extended forward to the test set questions]


Model Prediction: influence of parameter values

P(L0): 0.50, P(T): 0.20, P(G): 0.14, P(S): 0.09

Estimate of knowledge for a student with response sequence: 0 1 1 1 1 1 1 1 1 1
The student reached 95% probability of knowledge after the 4th opportunity.
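Curves like this one can be reproduced by iterating the update rule over the response sequence (a sketch; the exact opportunity at which 95% is crossed depends on whether the pre- or post-update estimate is plotted):

```python
def knowledge_trajectory(responses, p_L0, p_T, p_G, p_S):
    """Iterate the BKT update over a response sequence, returning the
    estimate of P(L) after each opportunity."""
    p_L, out = p_L0, []
    for r in responses:
        if r == 1:  # correct response
            p_L = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
        else:       # incorrect response
            p_L = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
        p_L = p_L + (1 - p_L) * p_T  # learning transition
        out.append(p_L)
    return out

# Parameters from the slide: P(L0)=0.50, P(T)=0.20, P(G)=0.14, P(S)=0.09
traj = knowledge_trajectory([0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                            0.50, 0.20, 0.14, 0.09)
```

The estimate dips after the initial incorrect response and then climbs past 0.95 within the first few correct answers.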


Estimate of knowledge for a student with the same response sequence: 0 1 1 1 1 1 1 1 1 1

P(L0): 0.50, P(T): 0.20, P(G): 0.64, P(S): 0.03

With the higher guess parameter, the student reached 95% probability of knowledge only after the 8th opportunity.


( Demo )


Variations on Knowledge Tracing (and other models)

Prior Individualization Approach

Do all students enter a lesson with the same background knowledge?

Knowledge Tracing with Individualized P(L0):

Node representations
K = knowledge node
Q = question node
S = student node (observed)

Node states
K = two state (0 or 1)
Q = two state (0 or 1)
S = multi state (1 to N)

The observed student node S sets an individualized prior P(L0|S) for each student.

Conditional Probability Table of the Student node:

S value   P(S=value)
1         1/N
2         1/N
3         1/N
…         …
N         1/N

• The CPT of the observed student node is fixed
• It is possible to have an S value for every student ID
• This raises an initialization issue (where do these prior values come from?)
• The S value can represent a cluster or type of student instead of an ID

CPT of the Individualized Prior node, P(L0|S):

S value   P(L0|S)
1         0.05
2         0.30
3         0.95
…         …
N         0.92

• Individualized L0 values need to be seeded
• This CPT can be fixed, or the values can be learned
• Fixing this CPT and seeding it with values based on a student's first response can be an effective strategy

The model that individualizes only L0 is called the Prior Per Student (PPS) model.

Bootstrapping the prior from the first response, P(L0|S):

S value   P(L0|S)
0         0.05
1         0.30

• If a student answers incorrectly on the first question, she gets a low prior
• If a student answers correctly on the first question, she gets a higher prior

What values should be used for the two priors?

Option 1: use ad-hoc values

S value   P(L0|S)
0         0.10
1         0.85

Option 2: learn the values (with EM)

S value   P(L0|S)
0         learned via EM
1         learned via EM

Option 3: link the prior CPT with the guess/slip CPT

S value   P(L0|S)
0         P(S)  (slip)
1         1 − P(G)  (1 − guess)

Prior Individualization Approach: results

With ASSISTments data, the PPS model (with ad-hoc priors) achieved an R² of 0.301, versus 0.176 for standard KT (Pardos & Heffernan, UMAP 2010).
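The prior-seeding options can be sketched together in one function; the ad-hoc numbers (0.10 / 0.85) come from the slides, while the function shape and names are our own illustration:

```python
def pps_prior(first_response, guess=0.14, slip=0.09, mode="linked"):
    """Seed an individualized P(L0) from the student's first response.

    mode="adhoc":  fixed hand-picked priors (option 1)
    mode="linked": tie the prior to the guess/slip parameters (option 3)
    (Option 2, learning the two values with EM, is not shown here.)"""
    if mode == "adhoc":
        return 0.85 if first_response == 1 else 0.10
    # linked: correct first answer -> 1 - P(G); incorrect -> P(S)
    return 1 - guess if first_response == 1 else slip
```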

Variations on Knowledge Tracing (and other models)

1. BKT-BF (Baker et al., 2010)
Learns values for the four parameters by performing a grid search (0.01 granularity) and chooses the set of parameters with the best squared error.

2. BKT-EM (Chang et al., 2006)
Learns values for the four parameters with Expectation Maximization (EM), maximizing the log likelihood fit to the data.

3. BKT-CGS (Baker, Corbett, & Aleven, 2008)
Guess and slip parameters are assessed contextually using a regression on features generated from student performance in the tutor.

4. BKT-CSlip (Baker, Corbett, & Aleven, 2008)
Uses the student's averaged contextual slip parameter learned across all incorrect actions.

5. BKT-LessData (Nooraiei et al., 2011)
Limits each student's response sequence to the most recent 15 responses during EM training.

6. BKT-PPS (Pardos & Heffernan, 2010)
Prior per student (PPS) model, which individualizes the prior parameter. Students are assigned a prior based on their response to the first question.

7. CFAR (Yu et al., 2010)
Correct on First Attempt Rate (CFAR) calculates the student's percent correct on the current skill up until the question being predicted.

Student responses for Skill X: 0 1 0 1 0 1 _
Predicted next response: 0.50
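CFAR is simple enough to compute directly (the 0.5 fallback for an empty history is our assumption, not from the talk):

```python
def cfar_predict(responses):
    """Correct on First Attempt Rate: percent correct on the current
    skill so far, used as the prediction for the next response."""
    return sum(responses) / len(responses) if responses else 0.5
```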

8. Tabling (Wang et al., 2011)
Uses the student's response sequence (max length 3) to predict the next response, by looking up the average next response among training-set students with the same sequence.

Training set —
Student A: 0 1 1 0
Student B: 0 1 1 1
Student C: 0 1 1 1

Test set student: 0 1 1 _
Predicted next response: 0.66

With the max table length set to 3, the table size was 2^0 + 2^1 + 2^2 + 2^3 = 15.
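A sketch of the tabling lookup (treating each length-k window of a training student's responses as a key; the study's exact windowing may differ):

```python
from collections import defaultdict

def build_table(train_seqs, k=3):
    """Map each length-k response pattern to the average next response
    observed in the training set."""
    table = defaultdict(list)
    for seq in train_seqs:
        for i in range(len(seq) - k):
            table[tuple(seq[i:i + k])].append(seq[i + k])
    return {key: sum(v) / len(v) for key, v in table.items()}

# Training set from the slide
table = build_table([[0, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 1]])
```

Looking up the pattern 0 1 1 yields 2/3 ≈ 0.66, the predicted next response.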

9. PFA (Pavlik et al., 2009)
Performance Factors Analysis (PFA): a logistic regression model which elaborates on the Rasch IRT model. It predicts performance based on the counts of the student's prior failures and successes on the current skill. An overall difficulty parameter β is also fit for each skill or each item; this study uses the variant of PFA that fits β for each skill. The PFA equation is:

m(i, j ∈ KCs, s, f) = Σ_{j ∈ KCs} (β_j + γ_j·s_{i,j} + ρ_j·f_{i,j}),   P(m) = 1 / (1 + e^(−m))
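For a single skill, the PFA prediction reduces to a logistic over the two counts (a sketch; the parameter values below are placeholders, not fitted values):

```python
import math

def pfa_predict(successes, failures, beta, gamma, rho):
    """Performance Factors Analysis, single-skill case:
    m = beta + gamma * successes + rho * failures,
    passed through the logistic function."""
    m = beta + gamma * successes + rho * failures
    return 1 / (1 + math.exp(-m))
```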

Study: Dataset

• Cognitive Tutor for Genetics
 – 76 CMU undergraduate students
 – 9 skills (no multi-skill steps)
 – 23,706 problem solving attempts
 – 11,582 problem steps in the tutor
 – 152 average problem steps completed per student (SD=50)
 – Pre- and post-tests were administered with this assignment

Study: Methodology (evaluation of in-tutor model prediction)

• Predictions were made by the 9 models using 5-fold cross-validation by student

[Table sketch: one row per response (Student 1, Skill A, Resp 1 … Resp N; Skill B, …), with each model's prediction (BKT-BF, BKT-EM, …) alongside the actual response]

• Accuracy was calculated with A' for each student; those values were then averaged across students to report the model's A' (higher is better)
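A' is equivalent to the area under the ROC curve and can be computed directly from prediction/label pairs (our sketch):

```python
def a_prime(predictions, labels):
    """A': probability that a randomly chosen correct response receives
    a higher prediction than a randomly chosen incorrect one
    (ties count as 0.5)."""
    pos = [p for p, y in zip(predictions, labels) if y == 1]
    neg = [p for p, y in zip(predictions, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```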

Study: Results (in-tutor model prediction; A' averaged across students)

Model          A'
BKT-PPS        0.7029
BKT-BF         0.6969
BKT-EM         0.6957
BKT-LessData   0.6839
PFA            0.6629
Tabling        0.6476
BKT-CSlip      0.6149
CFAR           0.5705
BKT-CGS        0.4857

There were no significant differences among the top BKT variants (BKT-PPS, BKT-BF, BKT-EM, BKT-LessData); there were significant differences between these BKT variants and PFA.

Study: Methodology (in-tutor ensemble prediction)

• 5 ensemble methods were used, trained with the same 5-fold cross-validation folds
• Ensemble methods were trained using the 9 model predictions as the features and the actual response as the label

Ensemble methods used:

1. Linear regression with no feature selection (predictions bounded between 0 and 1)
2. Linear regression with feature selection (stepwise regression)
3. Linear regression with only BKT-PPS & BKT-EM
4. Linear regression with only BKT-PPS, BKT-EM & BKT-CSlip
5. Logistic regression

Study: Results (in-tutor ensemble prediction; A' averaged across students)

Model                                                 A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip     0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM                0.6973
Ensemble: LinReg without feature selection            0.6945
Ensemble: LinReg with feature selection (stepwise)    0.6954
Ensemble: Logistic without feature selection          0.6854

No significant difference between the ensembles.

Study: Results (in-tutor ensemble & model prediction; A' averaged across students)

Model                                                 A'
BKT-PPS                                               0.7029
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip     0.7028
Ensemble: LinReg with BKT-PPS & BKT-EM                0.6973
BKT-BF                                                0.6969
BKT-EM                                                0.6957
Ensemble: LinReg without feature selection            0.6945
Ensemble: LinReg with feature selection (stepwise)    0.6954
Ensemble: Logistic without feature selection          0.6854
BKT-LessData                                          0.6839
PFA                                                   0.6629
Tabling                                               0.6476
BKT-CSlip                                             0.6149
CFAR                                                  0.5705
BKT-CGS                                               0.4857

Study: Results (in-tutor ensemble & model prediction; A' calculated across all actions)

Model                                                  A'
Ensemble: LinReg with BKT-PPS, BKT-EM & BKT-CSlip      0.7451
Ensemble: LinReg without feature selection             0.7428
Ensemble: LinReg with feature selection (stepwise)     0.7423
Ensemble: Logistic regression without feature selection 0.7359
Ensemble: LinReg with BKT-PPS & BKT-EM                 0.7348
BKT-EM                                                 0.7348
BKT-BF                                                 0.7330
BKT-PPS                                                0.7310
PFA                                                    0.7277
BKT-LessData                                           0.7220
CFAR                                                   0.6723
Tabling                                                0.6712
Contextual Slip (BKT-CSlip)                            0.6396
BKT-CGS                                                0.4917

In the KDD Cup

• Motivation for trying a non-KT approach: the Bayesian method only uses KC, opportunity count, and student as features. Much information is left unutilized, so another machine learning method is required.
• Strategy: engineer additional features from the dataset and use Random Forests to train a model

Random Forests


• Strategy: create feature-rich datasets that include features derived from fields not included in the test set

[Diagram: raw training rows split into non-validation training rows (nvtrain) and two validation sets (val1, val2); feature-rich versions are built for validation set 1 (frval1), validation set 2 (frval2), and the test set (frtest)]

• Created by Leo Breiman
• The method trains T separate decision tree classifiers (50-800 here)
• Each decision tree selects a random 1/P portion of the available features (1/3 here)
• Each tree is grown until there are at least M observations in the leaf (1-100 here)
• When classifying unseen data, each tree votes on the class; the popular vote wins (or the votes are averaged, for regression)

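A toy illustration of the recipe above, with one-feature decision stumps standing in for full trees (real random forests grow much deeper trees; the data and names here are ours, not from the talk):

```python
import random

def train_stump(rows, labels, feature):
    """A depth-1 'tree': split one feature at the midpoint of its range
    in this sample, predicting the majority label on each side."""
    vals = [r[feature] for r in rows]
    thr = (min(vals) + max(vals)) / 2
    left = [y for r, y in zip(rows, labels) if r[feature] <= thr]
    right = [y for r, y in zip(rows, labels) if r[feature] > thr]
    majority = lambda ys: max(set(ys), key=ys.count) if ys else 0
    return feature, thr, majority(left), majority(right)

def train_forest(rows, labels, n_trees=51, seed=0):
    """Each 'tree' sees a bootstrap sample and a random feature choice."""
    rng = random.Random(seed)
    forest = []
    n, p = len(rows), len(rows[0])
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feature = rng.randrange(p)                   # random feature choice
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest

def forest_predict(forest, row):
    """Each tree votes on the class; the popular vote wins."""
    votes = [lv if row[f] <= thr else rv for f, thr, lv, rv in forest]
    return max(set(votes), key=votes.count)
```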

Feature Importance — features extracted from the training set:

• Student progress features (avg. importance: 1.67)
 – Number of data points [today, since the start of unit]
 – Number of correct responses out of the last [3, 5, 10]
 – Z-score sums for step duration, hint requests, incorrects
 – Skill-specific versions of all these features

• Percent correct features (avg. importance: 1.60)
 – % correct of unit, section, problem, and step, and total for each skill and also for each student (10 features)

• Student Modeling Approach features (avg. importance: 1.32)
 – The predicted probability of correct for the test row
 – The number of data points used in training the parameters
 – The final EM log likelihood fit of the parameters / data points


• Features of the user were more important in Bridge to Algebra than in Algebra

• Student progress / gaming-the-system features (Baker et al., UMUAI 2008) were important in both datasets


Algebra
Rank  Feature set          RMSE    Coverage
1     All features         0.2762  87%
2     Percent correct+     0.2824  96%
3     All features (fill)  0.2847  97%

Bridge to Algebra
Rank  Feature set          RMSE    Coverage
1     All features         0.2712  92%
2     All features (fill)  0.2791  99%
3     Percent correct+     0.2800  98%


• The best Bridge to Algebra RMSE on the Leaderboard was 0.2777; the Random Forest RMSE of 0.2712 here is exceptional

• Skill data for a student was not always available for each test row; because of this, many skill-related feature sets only had 92% coverage

Conclusions from KDD

• Combining user features with skill features was very powerful in both the modeling and classification approaches

• Model-tracing-based predictions performed formidably against pure machine learning techniques

• Random Forests also performed very well on this educational data set compared to other approaches such as Neural Networks and SVMs; this method could significantly boost accuracy in other EDM datasets


Hardware / Software

• Software
 – MATLAB used for all analysis
  • Bayes Net Toolbox for the Bayesian network models
  • Statistics Toolbox for the Random Forests classifier
 – Perl used for pre-processing

• Hardware
 – Two Rocks clusters used for skill model training (178 CPUs in total; training the KT models took ~48 hours when utilizing all CPUs)
 – Two 32 GB RAM systems for Random Forests (RF models took ~16 hours to train with 800 trees)


Time left? Choose the next topic:
• KT
• Prediction
• Evaluation
• Significance tests
• Regression / significance tests


Individualize Everything? The Student-Skill Interaction (SSI) Model

Fully individualized model parameters:
P(L0) = probability of initial knowledge
P(L0|Q1) = individual cold-start P(L0)
P(T) = probability of learning
P(T|S) = students' individual P(T)
P(G) = probability of guess
P(G|S) = students' individual P(G)
P(S) = probability of slip
P(S|S) = students' individual P(S)

Node representations
K = knowledge node, Q = question node, S = student node, Q1 = first response node, T = learning node, G = guessing node, S = slipping node

Node states
K, Q, Q1, T, G, S(slip) = two state (0 or 1)
S(student) = multi state (1 to N, where N is the number of students in the training data)

The conditional, per-student parameters are learned from data while the others are fixed.

(Pardos & Heffernan, JMLR 2011)

• S identifies the student
• T contains the CPT lookup table of individual student learn rates
• P(T) is trained for each skill, which gives a learn rate for P(T|T=1) [high learner] and P(T|T=0) [low learner]
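In code, the SSI idea amounts to looking up per-student parameters from CPT-style tables before the usual BKT update (a sketch; the table contents below are illustrative, not learned values):

```python
def ssi_update(p_L, correct, student, learn, guess, slip):
    """BKT update where P(T), P(G), P(S) are looked up per student
    (dictionaries standing in for the CPT lookup tables)."""
    p_T, p_G, p_S = learn[student], guess[student], slip[student]
    if correct:
        post = p_L * (1 - p_S) / (p_L * (1 - p_S) + (1 - p_L) * p_G)
    else:
        post = p_L * p_S / (p_L * p_S + (1 - p_L) * (1 - p_G))
    return post + (1 - post) * p_T

# Illustrative tables: student "a" is a fast learner, "b" a slow one
learn = {"a": 0.30, "b": 0.05}
guess = {"a": 0.14, "b": 0.14}
slip = {"a": 0.09, "b": 0.09}
```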

SSI model results

Dataset            New RMSE   Prev RMSE   Improvement
Algebra            0.2813     0.2835      0.0022
Bridge to Algebra  0.2824     0.2860      0.0036

The average improvement is the difference between 1st and 3rd place in the competition; it is also the difference between 3rd and 4th place.

The differences between PPS and SSI are significant in each dataset at the p < 0.01 level (t-test of squared errors).

[Chart: RMSE of PPS vs. SSI with standard-error bars, for the Algebra and Bridge to Algebra datasets]


(Pardos & Heffernan, JMLR 2011)
