bayesian ordinal peer grading - cornell universitykarthik/publications/ppt/nips2014-hpml.pdf ·...

Bayesian Ordinal Peer Grading

Karthik Raman, Thorsten Joachims

Cornell University

Evaluation at Scale: Peer Grading

• MCQs & Auto-graded questions: Not a good test of understanding. • Limits kinds of courses offered.

• Conventional Evaluation:• Medium-scale classes (20-200 students) : TAs perform grading.

• Scaling to MOOCs (10000+ students)??

• PEER GRADING: Students grade each other.

• Overcomes scaling limitation of TA grading:• Number of “graders” scales with number of

students!

Ordinal Peer Grading [KDD 2014]

• Students are not trained graders: Need to make feedback process simple!

• Ordinal feedback easier to provide and more reliable than cardinal feedback:• Project X is better than Project Y vs. Project X is a B+.

• Ordinal Peer Grading: Graders provide ordering of assignments • Need to infer overall ordering and grader reliabilities.

>

>

..

..> >

…

…

… …

A

A

B

B

B

C

C

…

…

..

..

Mallows Ordinal Peer Grading

• GENERATIVE MODEL:

• is the Kendall-Tau distance between orderings:

• # of pairs ordered differently between the two rankings.

• Greedy algorithm to find MLE (Maximum-Likelihood estimator).

• Easy to extend: Grader Reliabilities, other aggregation models (Bradley-Terry ..)

• Viable alternate to conventional TA evaluation [KDD ’14]• As good (if not better) than cardinal peer grading.

Instructors Want More Details!

• Need finer-grained details to fully trust algorithm.

• Uncertainty information• Helps identify most

confusing assignments.

• Allows instructors to understand grading output.

Solution: Bayesian Mallows Peer Grading

• Sample orderings using MCMC (Metropolis-Hastings)

σt-

1σ’Sample new

orderingAccept/Reject?

σt…..

O(|D| log|D|) time

…..

O(|D|2) time

Experiments

• Meaningful posteriors at cost of minimal grading error

20.8

31.9

21.8 21.4

20.3

30.8

15

20

25

30

35

40

Poster Report

NCS (Cardinal)

MAL-MLE

Bayes-MAL

44.2

51.7

42.6 45.1

74.5

86.073.9 74.0

30

40

50

60

70

80

90

Poster Report

Bayes(50%)

Boot(50%)

Bayes(80%)

Boot(80%)

bayesian ordinal peer grading - cornell universitykarthik/publications/ppt/nips2014-hpml.pdf ·...

Documents