
Empirical Study on the Effect of Achievement Badges in TRAKLA2 Online Learning Environment

Lasse Hakulinen, Tapio Auvinen, and Ari Korhonen
Department of Computer Science and Engineering

Aalto University

Espoo, Finland

Email: {lasse.hakulinen, tapio.auvinen, ari.korhonen}@aalto.fi

Abstract—Achievement badges are a form of gamification that can be used to motivate users and to encourage desired actions. In this study, we describe and evaluate the use of achievement badges in the TRAKLA2 online learning environment where students complete interactive, automatically assessed exercises about data structures and algorithms. The students' activity in TRAKLA2 was logged in order to find out whether the achievement badges had an effect on their behavior. We used a between-subject experimental design where the students (N=281) were randomly divided into a treatment and a control group, with and without achievement badges. Students in the treatment group were awarded achievement badges, for example, for solving exercises with only one attempt, returning exercises early, or completing an exercise round with full points. Course grading was similar for both groups, i.e. collecting badges did not affect the final grade. Our results show that achievement badges can be used to affect the behavior of students even when the badges have no impact on the grading. Statistically significant differences in students' behavior were observed with some badge types, while some badges did not seem to have such an effect. We also found that students in the two studied courses responded differently to the badges. Based on our findings, achievement badges seem like a promising method to motivate students and to encourage desired study practices.

I. INTRODUCTION

Online learning environments are often used in computer science (CS) education because they help ease the teachers' burden of marking exercises and can provide instant feedback for the students, which would otherwise be impossible in large courses. However, despite these benefits, they can still be improved. In this paper, we report on an experiment in which achievement badges have been utilized in order to study if they have an effect on students' time management, carefulness, and learning results in general.

Gamification is the use of game design elements in non-game contexts such as learning environments [1]. In gamification, engaging features of games are applied to other systems in order to make them more engaging and fun. Gamification has been popular in recent years, for example, in marketing and social media platforms. A typical gamification method is the use of achievement badges. A badge is a graphical icon that appears to the user after reaching an achievement. Typically badges have no practical value to users, i.e. they are not worth money and do not open new possibilities in a game or a learning environment. Instead, the motivation to pursue badges comes from the emotional reward of achieving a challenging goal. Badges can also be used to imply what is desired behavior, or to showcase one's performance to peers.

TRAKLA2 [2] is an online learning environment where students do algorithm simulation exercises and get instant feedback. In this research paper, we describe and evaluate the use of achievement badges in TRAKLA2. The goal for using achievement badges was to motivate students to willingly follow better study practices instead of enforcing them.

In our experiment, students were randomly divided into a treatment group with badges and a control group without badges. Badges were awarded, for example, for solving exercises without mistakes, returning exercises early, or completing an exercise round with full points. The badges had no effect on grading, however. After the course, we studied students' behavior from the system logs. Our research question was:

• Do achievement badges have an effect on students' behavior in terms of time management, carefulness, and learning?

This paper is structured as follows: related work and earlier studies are described in Section II. Our experiment is described in Section III, and the results of the experiment are reported in Section IV. In Section V, we present our interpretation of the results, and finally, Section VI concludes the paper.

II. RELATED WORK

In recent years, different forms of gamification have been applied to a variety of systems [3]. The main goal of gamification is not to turn the systems into fully fledged games, but rather to apply some game elements in order to make the systems more motivating and engaging, or to alter users' behavior in some meaningful way. Gamification is a broad concept and uses methods such as achievement badges, leader boards, points, levels, and power-ups. Gamification is not a new concept, however. For example, the merit badges given by the scouts and military marks of rank are well-known forms of reward systems that can be related to gamification.

Achievement badges are one of the most commonly used gamification methods [4]. There are many definitions for achievement badges, but commonly they are seen as an additional system which provides optional goals and challenges. Montola et al. [5] describe achievement systems as secondary reward systems with optional sub-goals that are visible to others. Hamari and Eranti [4] define them as an optional challenge provided by a meta-game that is independent of a single game session and yields possible reward(s).


The motivation for pursuing badges can vary between the users. Montola et al. [5] state that different motivations to complete achievements include e.g. social status, completionism and extended play time.

The use of achievement badges has been shown to have an impact on users' behaviour even though the badges have no concrete (e.g. monetary) value. Montola et al. [5] studied achievement badges in a photo sharing service. They found that there is potential in badges as some users were motivated by them. However, some other users were indifferent towards them or did not like them. They conclude that overall, badges did not play a key role in the system.

There are examples of using gamification to increase students' learning motivation. For example, Flatla et al. [6] tested gamified calibration tasks among university students and found that they were strongly preferred over the traditional calibration tasks. In addition to increasing motivation, badges can be used to explicitly tell students what kind of behavior is desired and considered beneficial. Badges can also be used to offer additional challenges without raising the requirements for the highest course grade.

A. Risks of gamification

Even though it seems that gamification can increase motivation and make tedious tasks more fun, there are also possible pitfalls. Gamification has been criticized for focusing too much on external rewards when the actual engagement should come from students' intrinsic motivation [7]. Nicholson [8] states that one problem with gamification is that it can reduce internal motivation for the activity by replacing internal motivation with external motivation. He, however, suggests that gamification can be used to improve internal motivation if the game elements can be made meaningful to users through information. Moreover, Lee and Hammer [9] point out that extensive gamification might teach students that they should only study when provided with external rewards.

Weaver et al. have analyzed the presenter's paradox in a number of studies [10]. Their findings show that perceivers make less favorable evaluations of the value of a product when mildly favorable information is added to highly favorable information. In other words, adding a low-value extra item to a valuable product may make the package appear less valuable in the eyes of the buyer. For example, offering one month of free music service with a very expensive mobile phone might appear less valuable to a customer than the phone without this extra service. We should consider the risk that adding achievement badges to learning environments may have side effects similar to the presenter's paradox. By adding badges, we might accidentally imply that the exercises are not intrinsically motivating and hence completing the tasks is compensated with external rewards.

B. Side effects of automatic assessment

While online learning environments with automatic assessment have many benefits, they may allow some students to adopt undesired study practices. Edwards [11] points out that trial and error behavior is an undesired approach among computer science students using automatic assessment. He claims that students will be more successful at learning if they move from trial and error to reflection in action.

In our previous study [12], we have observed that if the number of resubmissions is not limited, the students are willing to resubmit and keep trying an exercise until they solve it correctly. This results in very good average points in the course. In addition, although students do not need full points to get the highest possible course grade, quite a high percentage of students do get full points. Thus, there seems to be an intrinsic motivation to solve the exercises correctly. However, we have observed two downsides of the unlimited resubmissions: trial and error behavior [13] and the fact that the students get fewer points from their very first submission [12]. This indicates that the students are more careful if there is some penalty involved with incorrect submissions. We believe that achievement badges could have a role to play here by providing a reward instead of limiting resubmissions. Thus, one of the aims of this study was to find out how to use the "carrot-and-stick" in such a way that gives students the best learning experience while still maintaining good learning results. In some cases, it might be a good idea to penalize bad behavior with the "stick", but perhaps achievement badges can provide novel "carrots" that lead to similar results.

Effective time management helps students to choose what to work on and when. Some students procrastinate and complete the exercises very near the deadlines. One way to study achievement badges is to look at whether they change students' time management practices. In particular, we look at whether students complete their exercises earlier, leaving them more time to overcome problems with less hurry.

In our earlier studies, we have reported on small cohorts of students who seem to solve exercises by iterating instead of thinking carefully before submitting. One of the aims of this study was to find out if badges motivate students to solve the exercises more carefully. By carefulness, we refer to doing the exercises meticulously and thinking through the exercises before submitting, in contrast to relying on trial and error. We have previously tested several resubmission policies [12]. One way to prevent trial and error is to limit the number of resubmissions. However, achievement badges might have the same effect without enforcing strict policies.

Finally, we are interested in students' learning as well. By that we mean how well the overall learning goals are achieved. We have shown in our previous study [14] that the TRAKLA2 learning environment satisfies our needs quite well in this respect. Most of the problems are not in the learning itself, but in the side effects that this kind of distance learning environment might cause. Although some of the badges are related to how learning-oriented the students are, our main focus in this paper is on the time management and carefulness issues.

III. METHODS

In our experiment, achievement badges were added to the TRAKLA2 online learning environment. We used a between-subject design where students were randomly assigned to a treatment group with the badges visible, and to a control group with the badges hidden. The experiment was conducted in the Data Structures and Algorithms course at Aalto University in Spring 2012. The course is mandatory for computer science major students and for minor students from multiple departments. CS majors typically take the course in their first year, whereas minors take it in the second year. The course had 56 mandatory homework exercises that were done online in TRAKLA2. The exercises were divided into 8 rounds with deadlines roughly one week apart.

TRAKLA2 exercises were graded on a scale from 0 (fail) to 5, so that 50% of the exercise points yielded the passing grade 1 and 90% of the points yielded grade 5. Major and minor students had separate course instances. The aforementioned grade formed 20% of the final course grade for major students and 30% for minor students. The rest of the grade was determined by the final examination (40% weight in both courses), closed labs for majors (40% weight), and a group project for minors (30% weight). Both courses had the same lectures as well as the same final examination.
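To make the weighting concrete, the following is a minimal sketch of how such a composite grade could be computed. It is not the course's actual grading code: the paper states only the 50% (grade 1) and 90% (grade 5) thresholds, so the intermediate TRAKLA2 thresholds below are assumptions introduced for illustration.

# A minimal sketch of the grading scheme described above, not the course's
# actual grading code. Only the 50% (grade 1) and 90% (grade 5) thresholds are
# given in the paper; the intermediate thresholds below are assumptions.

TRAKLA2_THRESHOLDS = [(0.90, 5), (0.80, 4), (0.70, 3), (0.60, 2), (0.50, 1)]  # assumed

def trakla2_grade(points_fraction):
    """Map the fraction of TRAKLA2 points earned to a grade on the 0-5 scale."""
    for threshold, grade in TRAKLA2_THRESHOLDS:
        if points_fraction >= threshold:
            return grade
    return 0  # below 50% of the points fails this component

def final_grade(trakla2, exam, other, major):
    """Weighted course total on the 0-5 scale.

    Majors: 20% TRAKLA2, 40% exam, 40% closed labs.
    Minors: 30% TRAKLA2, 40% exam, 30% group project.
    `other` is the closed-lab grade (majors) or the project grade (minors).
    """
    if major:
        return 0.20 * trakla2 + 0.40 * exam + 0.40 * other
    return 0.30 * trakla2 + 0.40 * exam + 0.30 * other

# Example: a minor student with 85% of TRAKLA2 points, exam grade 3, project grade 4
print(final_grade(trakla2_grade(0.85), exam=3, other=4, major=False))  # approximately 3.6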

We did not advertise the badges in the course beyond what the treatment group saw in the TRAKLA2 environment. This was to reduce possible contamination where the control group becomes aware of the badges. However, if a student from either group wanted to discuss the badges during the course, this was done openly and we responded by telling that there was ongoing research on the effects of badges in this course. Moreover, we did not provide any external motivation to pursue badges, such as points toward the final examination or anything else that could have influenced the final grade.

In a TRAKLA2 exercise, a student is shown a piece of code (an algorithm) that he or she is supposed to simulate with given input data. Typically this simulation involves showing a number of data structures and their content during the execution of the algorithm. This process is aided by a graphical user interface that takes care of drawing the intermediate states of the data structures while the student manipulates them with the mouse. Each exercise is worth a certain amount of points. Exercises can be submitted after the deadlines as well, but points per exercise are reduced by 50% in that case.

The input data is randomly chosen for each student and each attempt. Thus, students can try to solve the exercises as many times as they want, and the best attempt counts. Even if an exercise is already solved correctly, it can be repeated, for example, while preparing for the final examination. The system also provides model solutions that are visualized as an algorithm animation. The model solution can be compared with the student's solution after the exercise is submitted. Furthermore, students are allowed to see the model solutions at any time without even submitting. As the input data structures are initialized with random data each time, it is impossible to simply copy the model answer and resubmit it.
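The submission rules above can be summarized in a few lines. The following is a minimal sketch under the stated rules (unlimited resubmissions, best attempt counts, 50% late penalty); the attempt list is a hypothetical data model, not the actual TRAKLA2 implementation.

# A minimal sketch of the submission scoring described above; the attempt list
# is a hypothetical data model, not the actual TRAKLA2 implementation.
from datetime import datetime

LATE_PENALTY = 0.5  # points per exercise are reduced by 50% after the deadline

def attempt_score(points, submitted_at, deadline):
    """Score of a single attempt, with the late penalty applied if needed."""
    return points if submitted_at <= deadline else points * LATE_PENALTY

def exercise_score(attempts, deadline):
    """Students may resubmit as often as they like; the best attempt counts."""
    return max((attempt_score(p, t, deadline) for p, t in attempts), default=0.0)

# Example: two attempts, the second one (full score) arriving after the deadline
deadline = datetime(2012, 2, 1, 12, 0)
attempts = [(6.0, datetime(2012, 1, 30, 10, 0)), (10.0, datetime(2012, 2, 2, 9, 0))]
print(exercise_score(attempts, deadline))  # 6.0: the late full-score attempt is worth only 5.0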

A. Achievement Badges

There were eight different achievement badges that students were able to earn on each exercise round. The badges and their criteria are shown in Table I. In the treatment group, students were able to see the badge descriptions in order to know how to earn them. They also saw the available badges as gray and blurry images, making it clear which badges were still waiting to be unlocked. After fulfilling the criteria for unlocking a badge, the badge icon was made visible on the student's personal TRAKLA2 main page. There were also simple statistics that showed how many badges had been unlocked overall in the course. However, the unlocked badges of an individual student were not visible to others.

TABLE I. BADGE DESCRIPTIONS

Id  Name                   Description
A1  Early Bird             Complete a round with full points at least a week before the deadline.
A2  Fast & Furious         Be in the fastest 30 (majors) / 60 (minors) who complete the round with full points.
A3  Speed Machine          Be in the fastest 10 (majors) / 20 (minors) who complete the round with full points.
B1  Got it!                Get an exercise correct with the first submission (also after the deadline).
B2  Brainiac               Get full points from the round and use at most 2 tries for each exercise on average.
B3  Y U No Make Mistakes?  Get full points from all the exercises with the first try.
C1  Mission Accomplished   Get full points from the round.
C2  Recap paceR            Get full points from the round and do all the exercises correctly twice so that there is at least a week between the first and the last correct submission of each exercise.

(The badge icons shown in the original table are omitted here.)

The badges can be categorized into three categories based on their criteria: time management, carefulness, and learning. Badges A1, A2 and A3 belong to the time management category, and they encourage students to complete the exercises well before the deadline. Badges A2 and A3 are competitive badges, meaning that only a fixed number of students in each round can get the badge. The carefulness category includes badges B1, B2 and B3. They encourage students to think carefully before submitting a solution and to avoid submitting incorrect answers. Finally, badges C1 and C2 form the learning category. They encourage students to complete the exercises with full points and to recap them, regardless of the number of submissions or the time the submissions are done.

There are also some connections between the badges in the same category. We wanted it to be possible to collect all the badges, and therefore earning a badge with strict criteria also results in earning the similar but easier badge. The implications between the badges are as follows:

• Time management: A3 ⇒ A2 (⇒) A1

• Carefulness: B3 ⇒ B2, B3 ⇒ B1, B2 (⇒) B1

• Learning: C2 ⇒ C1

X (⇒) Y means that it is possible to get X without Y, but in most cases earning badge X also results in earning badge Y. A code sketch of these criteria and implications is given below.
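The following sketch illustrates how these criteria and implications could be checked for one exercise round. The RoundResult data model and its fields are hypothetical, introduced only for illustration; this is not the actual TRAKLA2 implementation, and the C2 (Recap paceR) check is omitted because it would also need the recap submission timestamps.

# A minimal sketch of per-round badge checks, assuming a hypothetical
# RoundResult data model; not the actual TRAKLA2 implementation.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RoundResult:
    total_points: float           # points earned on this round
    max_points: float             # maximum points available on this round
    attempts_per_exercise: list   # submissions used per exercise until it was correct
    completed_at: datetime        # when the round first reached full points
    deadline: datetime
    speed_rank: int               # 1 = fastest student to reach full points on the round

def award_badges(r, major):
    badges = set()
    full = r.total_points >= r.max_points
    # Time management (A1-A3); competitive limits are 30/10 for majors, 60/20 for minors
    if full and r.completed_at <= r.deadline - timedelta(weeks=1):
        badges.add("A1")  # Early Bird
    if full and r.speed_rank <= (30 if major else 60):
        badges.add("A2")  # Fast & Furious
    if full and r.speed_rank <= (10 if major else 20):
        badges.add("A3")  # Speed Machine
    # Carefulness (B1-B3)
    if any(a == 1 for a in r.attempts_per_exercise):
        badges.add("B1")  # Got it!: at least one exercise correct on the first submission
    if full and sum(r.attempts_per_exercise) / len(r.attempts_per_exercise) <= 2:
        badges.add("B2")  # Brainiac
    if full and all(a == 1 for a in r.attempts_per_exercise):
        badges.add("B3")  # Y U No Make Mistakes?
    # Learning (C1); C2 (Recap paceR) would additionally need the recap timestamps
    if full:
        badges.add("C1")  # Mission Accomplished
    return badges

With these checks, B3 automatically implies B2 and B1, and A3 implies A2, matching the implications listed above, while B2 and A2 do not strictly imply B1 and A1.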

IV. RESULTS

All major students (Nmajor = 94) and minor students (Nminor = 187) who completed at least one TRAKLA2 exercise are included in the study (Nall = 281). Students who registered in the course but did not submit anything, as well as the lecturer, teaching assistants, and other personnel, are excluded from the results.

The mean numbers of different badges awarded to students in the treatment and control groups are shown in Table II. We have collected the same statistics for both groups even though the control group did not see the badges. Our null hypothesis is that there are no significant differences between the numbers of badges earned by the treatment and control groups. If the treatment group consciously aims to earn badges, there should be a significant increase in the number of earned badges compared to the control group. The number of awarded badges was not normally distributed, so significance was tested with a non-parametric test. We cannot assume that the effect is always positive. It is possible that there are some undesired effects which cause the treatment group to earn fewer badges than the control group. Therefore, the two-tailed Wilcoxon rank-sum test was used to test significance. The results are shown in Table II. Columns "C treatm" and "C control" show the proportion of students who unlocked at least one such badge during the course, to give an idea of the prevalence of each badge.

TABLE II. MEAN NUMBER OF AWARDED BADGES (ALL STUDENTS), Ntreatment = 142, Ncontrol = 139

id   mean treatm   mean control   W      p-value   C treatm   C control
A1   1.54          1.25           8968   0.15      0.51       0.42
A2   2.73          2.21           8837   0.12      0.68       0.65
A3   0.93          0.69           8901   0.07      0.33       0.23
B1   6.20          6.07           9506   0.58      0.99       1.00
B2   2.44          1.86           8447   0.03      0.73       0.65
B3   0.27          0.09           8599   < 0.01    0.22       0.09
C1   4.24          3.75           8886   0.15      0.88       0.83
C2   0.25          0.01           9312   0.01      0.06       0.01

Significant differences (p < 0.05) between the control and treatment groups were observed in badges B2 (Brainiac), B3 (Y U No Make Mistakes?), and C2 (Recap paceR). In other badges, the differences were not statistically significant but were in favor of the treatment group.
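As an illustration of this analysis, the comparison for a single badge could be run as follows. This is a minimal sketch assuming the per-student badge counts of the two groups are available as lists; the numbers below are made up, and scipy's ranksums reports a normal-approximation statistic rather than the W values listed in Tables II-IV.

# A minimal sketch of the two-tailed Wilcoxon rank-sum test used above,
# on made-up per-student badge counts (not the study's data).
from scipy.stats import ranksums

treatment_b3 = [0, 1, 0, 2, 0, 0, 1, 0]  # B3 badges earned per student, treatment group
control_b3   = [0, 0, 0, 1, 0, 0, 0, 0]  # B3 badges earned per student, control group

stat, p_value = ranksums(treatment_b3, control_b3)  # two-sided by default
print(f"statistic = {stat:.2f}, p = {p_value:.3f}")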

The mean number of different badges earned by major students is shown in Table III, and by minor students in Table IV. With major students, the difference between treatment and control groups is significant for badge A3 (Speed Machine). On the other hand, with minor students, badges B2 (Brainiac) and B3 (Y U No Make Mistakes?) have significant differences. There are no statistically significant differences in other badges, but again the mean number of badges earned by the treatment group is greater in almost every case.

TABLE III. MEAN NUMBER OF AWARDED BADGES (MAJOR STUDENTS), Ntreatment = 48, Ncontrol = 46

id   mean treatm   mean control   W      p-value   C treatm   C control
A1   1.31          0.78           904    0.09      0.48       0.30
A2   2.67          1.94           922    0.16      0.69       0.61
A3   0.96          0.48           854    0.02      0.40       0.15
B1   5.75          5.94           1223   0.36      1.00       1.00
B2   1.88          1.72           1039   0.61      0.67       0.59
B3   0.25          0.11           989    0.17      0.21       0.11
C1   3.79          3.02           916    0.15      0.85       0.72
C2   0.31          0.00           1035   0.09      0.06       0.00

Figure 1 offers a more detailed view showing the number of badges awarded on each round for the treatment and control groups.

TABLE IV. MEAN NUMBER OF AWARDED BADGES (MINOR STUDENTS), Ntreatment = 94, Ncontrol = 93

id   mean treatm   mean control   W      p-value   C treatm   C control
A1   1.65          1.47           4156   0.53      0.53       0.48
A2   2.76          2.34           4034   0.35      0.68       0.67
A3   0.91          0.80           4226   0.62      0.30       0.27
B1   6.44          6.13           3859   0.15      0.99       1.00
B2   2.73          1.94           3535   0.02      0.77       0.68
B3   0.29          0.09           3755   0.01      0.22       0.09
C1   4.47          4.11           4065   0.40      0.89       0.89
C2   0.22          0.01           4138   0.06      0.06       0.01

The topic of each exercise round and the mean points earned overall by all students are also shown in the figure. The treatment group earned more badges than the control group in almost every case.

A. Time management

Figure 2 shows the proportions of submissions that arrived more than 24 hours before the deadline, less than 24 hours before the deadline, and after the deadline. Only the final submission to each exercise is included, and recap submissions are excluded. Pearson's χ2 test was used to test if the distributions differ significantly. The results are reported in Table V. Our null hypothesis was that there is no difference between the distributions of the treatment and control groups. The distributions are significantly different for major students (p < 0.01) and the whole population (p < 0.01). For minor students, the difference is not significant (p = 0.063).

[Figure 2: stacked bar chart for the Minor, Major, and All populations, treatment vs. control, showing the shares of submissions made more than 24 h before the deadline, less than 24 h before the deadline, and late.]

Fig. 2. Proportion of exercises submitted early, last day and late.

TABLE V. DISTRIBUTION OF SUBMISSIONS, EARLY, LAST DAY AND LATE, df = 2

Group   Condition   N      early (%)   last day (%)   late (%)   χ2      p-value
Minor   treatment   4950   56.2        36.3           7.5        5.5     0.063
Minor   control     4790   57.4        34.3           8.4
Major   treatment   2258   39.0        45.3           15.7       282.1   < 0.01
Major   control     2170   63.1        31.0           5.9
All     treatment   7208   50.9        39.1           10.1       101.9   < 0.01
All     control     6960   59.2        33.2           7.6
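For the major course, the test can be reproduced approximately from Table V alone. The sketch below uses counts reconstructed from the rounded percentages (not the raw data), which yields a χ2 value close to the reported 282.1 with df = 2.

# A minimal sketch of the Pearson chi-square test behind Table V, using counts
# reconstructed from the rounded percentages for the major course; not the raw data.
from scipy.stats import chi2_contingency

#             early  last day  late
observed = [[  881,     1023,  354],   # treatment: 39.0% / 45.3% / 15.7% of 2258
            [ 1369,      673,  128]]   # control:   63.1% / 31.0% /  5.9% of 2170

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.1f}, df = {dof}, p = {p:.3g}")  # roughly chi2 = 282, df = 2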

B. Carefulness

One of the characteristics of doing the exercises carefully is to avoid unnecessary submissions. Figure 3 shows the histogram of students' total points divided by the submission count.


[Figure 1: for each badge (A1-C2), the percentage of students who earned the badge on rounds 1-8, treatment vs. control. The round topics and overall mean points were: 1 basics 88.5%, 2 sorting 74.8%, 3 tree traversal 74.9%, 4 heaps 83.5%, 5 dictionaries 69.6%, 6 search trees 63.9%, 7 hashing 65.9%, 8 graphs 58.3%.]

Fig. 1. Percentage of students who earned each badge, round by round. (All students)

The mean points earned per one submission was 15.8 for the treatment group and 14.9 for the control group. A two-tailed t-test was used to test the difference, and it was not significant (t(274) = -1.36, p = 0.17).

[Figure 3: histogram of points per submission for the treatment and control groups.]

Fig. 3. Points per submission. (All students)
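A minimal sketch of this comparison, assuming each student's points-per-submission ratio is available as a list; the values below are made up, not the study's data.

# A minimal sketch of the two-tailed t-test above, on made-up
# points-per-submission ratios; the paper reports t(274) = -1.36, p = 0.17.
from scipy.stats import ttest_ind

control_ratio   = [14.0, 15.5, 13.2, 16.1, 14.4]  # points per submission, control students
treatment_ratio = [15.9, 16.2, 14.8, 17.0, 15.1]  # points per submission, treatment students

t_stat, p_value = ttest_ind(control_ratio, treatment_ratio)  # two-sided by default
print(f"t = {t_stat:.2f}, p = {p_value:.2f}")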

C. Learning

Total mean points for TRAKLA2 exercises in the treatment and control groups were 73% and 70% of maximum points, respectively. The distributions are shown in Figure 4. The points are not normally distributed and therefore the difference was tested with a non-parametric test. Students in the treatment group got more points on average, but the difference was not statistically significant (Wilcoxon rank-sum test, W = 9180, Ntreatment = 142, Ncontrol = 139, p = 0.31, two-tailed).

[Figure 4: distributions of final TRAKLA2 points for the treatment and control groups.]

Fig. 4. Distributions of final TRAKLA2 points.

V. DISCUSSION

Our results show that some achievement badges had an impact on the behavior of the students. However, the group of students who changed their behavior because of the badges was not large. Feedback collected from students after the course was generally positive, but it is not discussed further in this paper.

The majority of students behaved similarly in the treatment and control groups, but a small group showed a significant change in their behavior, indicating that they were motivated to pursue the badges. The biggest absolute difference between the treatment and control groups was with badge B3 (Y U No Make Mistakes?). Table II shows that 22% of students in the treatment group and 9% in the control group received at least one badge B3 during the course.

The fact that only a small number of students actively collected badges is in line with earlier studies on achievement badges (e.g. [5]). In our case, we might be able to motivate more students to collect badges by providing some external motivation. For example, if the badges had contributed to the final grade, we believe that more students would have been interested in pursuing them. As Hamari and Eranti [4] point out, the reward for unlocking a badge can also be something that has value outside the achievement system meta-game. However, in this study, the cause-and-effect relationships would have been affected, as the students most probably would have pursued the better grade instead of the badges as such. In addition, for ethical reasons, we wanted both the treatment and control groups to have equal grading criteria. Thus, we wanted the experiment to be as simple as possible and to find evidence of whether students have intrinsic motivation to earn badges. In the future, however, we can use other motivators as well, in which case the badges only make it more visible that we encourage a certain kind of behavior (e.g., dividing time-on-task more evenly during the course).

Figure 1 shows the percentage of students who earned different badges on each round. It seems that the biggest differences between the treatment and control groups are in the badges that were hard to earn. Badge B1 (Got it!) was earned by the majority of students in the control group, and there is very little difference between the groups. In contrast, badges B3 (Y U No Make Mistakes?) and C2 (Recap paceR) were difficult to earn, and a much larger difference is observed. The reason might be that the additional challenge provided by harder badges is found motivating and therefore worth pursuing. Moreover, there might be a ceiling effect regarding the easier badges.

The carefulness badges, especially badge B3, seem to correlate with the difficulty of the rounds. Round 2 contained some tedious sorting exercises, and no one was able to solve the round without mistakes. On the other hand, Round 4 contained a heap tutorial that allowed students to practice heap concepts before attempting the TRAKLA2 exercises, making it easier to get the exercises correct on the first try. The fact that the treatment group earned badges much more frequently than the control group implies that students were aiming for the badge.

A. Time management

There were three "time management" badges that rewarded students for completing tasks early (badges A1, A2 and A3). Badges A2 and A3 were competitive, so that a fixed number of the fastest students were given the badge, while badge A1 was awarded to everyone who completed a round at least a week before the deadline. In practice, pursuing the competitive badges also resulted in getting badge A1.

The effect of the time management badges was stronger with CS major than with minor students, meaning that major students were more motivated to pursue them. This might be caused by differences in the course arrangements. CS majors meet each other in weekly lab sessions and get to know each other, which may lead to competition over the badges. The fact that students know each other may also increase the social value of the badges.

Over the years, we have noticed that a large share of the submissions to TRAKLA2 come very close to the deadlines. One of the reasons to apply badges to the system was to study if this behavior could be changed so that students would not leave completing the exercises to the last minute. Our results show that with major students, the proportion of students who submit early is significantly increased because of the badges (Figure 2). Interestingly, major students with badges submitted exercises about as early as minor students with or without badges. This could mean that for minor students there was no room for improvement, while for major students we successfully discouraged unwanted behaviour by introducing badges.

B. Carefulness

Three of the badges (B1, B2 and B3) rewarded students for not submitting incorrect answers and avoiding trial and error behavior. In the minor course, the difference in the number of earned B2 and B3 badges was statistically significant, whereas in the major course, the carefulness badges did not seem to have such an effect. It seems that especially in the minor course, badges encouraged students to think the exercises through more carefully before submitting them. In our previous studies, we have noticed that having no resubmission limits in the exercises leads to trial and error behavior with some students. It seems that badges can be used to some extent to reduce this phenomenon. However, more research is needed to understand why the badges did not have a similar effect in both courses.

Students in the treatment group used slightly fewer submissions per earned point than students in the control group (Figure 3), but the difference was not statistically significant. A badge awarded for submitting exercises without any mistakes may not be the best motivator against trial and error behavior. Once a student makes the first mistake, the badge is lost and there is no longer an incentive to avoid resubmissions in the rest of the exercises in that round. Furthermore, the gifted students who are capable of completing a whole round without any mistakes are unlikely to be the ones prone to iterating in the first place. Thus, it is important to balance the achievement criteria so that they are realistically reachable by the group whose behavior we are trying to change.

C. Time management vs. Carefulness

The CS major students, for whom the time management badges had a stronger effect, did not show such an effect with the carefulness badges. With minor students, this phenomenon was reversed: the carefulness badges had a strong effect, but the time management badges had no significant effect. The nature of these badges might cause them to be mutually exclusive. Reaching for a position among the top fastest solvers might cause those students to choose speed over carefulness. On the other hand, being cautious about making mistakes will probably lead to missing the competitive time management badges. For some reason, different badges were pursued by the students of the two courses. This phenomenon should be studied further in the future in order to better understand students' motivation to pursue badges.


Our preliminary hypothesis is that the different grading policies of these courses might play a role here. In the CS major course, the TRAKLA2 exercises contributed less (only 20% weight) to the final grade than in the minor course (30% weight). Thus, the majors might have been more careless and wanted to complete the exercises quickly, as the final grade is mostly determined by other activities in this course. In contrast, minors might have felt that this is a more important part of the course and invested more time and effort to get the exercises right. Although the populations are different (different major, minors are typically older, etc.), it would be interesting to study whether this phenomenon disappears when the grading policies are the same. Another possible explanation is that the major students respond differently to the competitive badges because they know each other better from the lab sessions. Minor students, in contrast, come from many different departments and complete a project work in groups of 2–4 people instead of having weekly lab sessions.

D. Learning

There were two badges (C1 and C2) in the learning category that encouraged students to complete exercise rounds with full points and to recap the exercises afterwards. Badge C2 (Recap paceR) had significant differences between the treatment and control groups. The badge itself required considerable effort to achieve, because it required students to redo a complete round with full points. In the control group, only one student completed one round as recap. In the treatment group, 9 students completed at least one round as recap, and some of them completed several rounds. Even though the number of students who got at least one recap badge was not high in the treatment group either, it is clear that some students were pursuing the badge rather than earning it as a side effect of something they would have done anyway. Two students went as far as to complete every TRAKLA2 exercise twice with full points in order to earn all available recap badges.

In our opinion, it is beneficial to redo TRAKLA2 exercises as recap. However, the requirements for getting badge C2 were not necessarily optimal for efficient learning. It would be better to focus on the exercises that caused difficulties the first time around, rather than trying to complete whole exercise rounds. Relaxing the achievement criteria could also motivate more students to pursue the badge.

Overall, students in the treatment group earned more points in TRAKLA2 than students in the control group even though the difference was not statistically significant (Figure 4). The minor students performed better and had a larger difference between the treatment and control groups. As discussed earlier, one reason for the differences between the courses might be that TRAKLA2 had a greater impact on the course grade in the minor course.

E. Validity threats

Experimental research is conducted in order to determine cause-and-effect relationships. Our assumption has been that in the treatment group, the changes in the independent variable (badges visible) caused the observed changes in the dependent variable (behavior in the TRAKLA2 exercises). In the control group, in contrast, any badges achieved should have been caused by the (perhaps motivating) exercises in TRAKLA2 themselves. However, this latter assumption might not hold, as the groups may have been aware of each other. Thus, students in the control group may have tried to achieve goals similar to those set for earning the badges.

There might also be confounding variables that affect the internal validity of this study. For some reason, CS major students seem to be more interested in the time management badges and CS minors in the carefulness badges. This might have resulted from the fact that the courses differ slightly. The TRAKLA2 exercises contribute less to the final grade for CS majors than for minors. In addition, CS major students have more social interaction (closed labs with some 20–25 participants), so they might prefer competitive badges over other badges. CS minors interact only in small teams (2–4 members), so they might be less competitive. Moreover, four TRAKLA2 exercises were implemented with a new, different framework in the major course even though the content of the exercises remained the same. Finally, there can be other extraneous variables related to the populations and courses that we are not aware of, which can influence the dependent variable and the internal validity of this study. Low internal validity means weaker evidence of causality.

In a controlled experiment, we should minimize the number of independent variables. To reliably measure the effect of a single badge, we should run the experiment for each badge separately. However, it would not have been meaningful for students to have only one badge available in TRAKLA2. It is likely that each badge had an effect on multiple aspects of students' behaviour. For example, aiming for a competitive speed badge is likely to reduce carefulness. Thus, our conclusions regarding the effects of the individual badges should be considered only indicative, while the main result is whether badges, in general, had any effect on students' behavior.

VI. CONCLUSIONS

In this experiment, achievement badges were added to the TRAKLA2 online learning environment, and changes to students' behavior were studied from the system logs. The results show that achievement badges had a significant impact on some aspects of students' behavior, and a small group of students was especially motivated to pursue them.

Our motivation to include badges in TRAKLA2 was based on our previous findings regarding undesired side effects such as trial and error problem solving and poor time management. Badges may encourage students towards self-reflection, or make them aware of their own studying habits such as completing the exercises early and checking the answers before submitting. The presence of the badges may make students more aware of beneficial study practices even if they do not choose to pursue the badges.

We were able to change the behavior of some students for the better by rewarding them with badges. However, it is possible that some badges encouraged unwanted behavior as well. For example, targeting the competitive time management badges might have reduced carefulness. Therefore, more research is needed on balancing the achievement criteria so that they maximize beneficial learning practices while minimizing harmful side effects. More research is also needed to understand why the same set of badges had different effects on different populations.

Overall, the achievement badges had an impact on the students even though they did not affect the course grading in any way. Based on our results, badges seem like a promising way to motivate students to study and to use desired learning practices even if these are not enforced by strict policies such as limiting the number of resubmissions. However, the applied methods should be carefully chosen in order to fully benefit from the engaging elements and to prevent gamification from being just unnecessary eye candy.

A. Future work

In our implementation, students were able to see their own badges and simple statistics that showed the overall number of badges unlocked in the whole course. However, the social aspect of badges was missing. Allowing students to show their badges to others could make them more desirable and might motivate more students to pursue the badges.

In this study, it was impossible to reliably measure the effects of individual badges because each badge is likely to affect multiple aspects of students' behavior. Studying achievement badges in massive open online courses (MOOCs) with thousands of students would make it possible to divide students into multiple treatment groups with different sets of badges, in order to study which badges have the strongest effects.

It might also be fruitful to perform a within-subjects experiment where the same students complete some rounds with badges and some without. The challenge is that the effects of badges may carry over to later rounds without badges. On the other hand, having the control rounds without badges first and the treatment rounds with badges at the end makes it difficult to eliminate the varying difficulty of the rounds.

REFERENCES

[1] S. Deterding, D. Dixon, R. Khaled, and L. Nacke, "From game design elements to gamefulness: defining "gamification"," in Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, ser. MindTrek '11. New York, NY, USA: ACM, 2011, pp. 9–15. [Online]. Available: http://doi.acm.org/10.1145/2181037.2181040

[2] L. Malmi, V. Karavirta, A. Korhonen, J. Nikander, O. Seppala, and P. Silvasti, "Visual algorithm simulation exercise system with automatic assessment: TRAKLA2," Informatics in Education, vol. 3, no. 2, pp. 267–288, 2004.

[3] C. Muntean, "Raising engagement in e-learning through gamification," in Proc. 6th International Conference on Virtual Learning ICVL, 2011, pp. 323–329.

[4] J. Hamari and V. Eranti, "Framework for designing and evaluating game achievements," in Proceedings of DiGRA 2011 Conference: Think Design Play, Hilversum, Netherlands, 2011.

[5] M. Montola, T. Nummenmaa, A. Lucero, M. Boberg, and H. Korhonen, "Applying game achievement systems to enhance user experience in a photo sharing service," in Proceedings of the 13th International MindTrek Conference: Everyday Life in the Ubiquitous Era, ser. MindTrek '09. New York, NY, USA: ACM, 2009, pp. 94–97. [Online]. Available: http://doi.acm.org/10.1145/1621841.1621859

[6] D. Flatla, C. Gutwin, L. Nacke, S. Bateman, and R. Mandryk, "Calibration games: making calibration tasks enjoyable by adding motivating game elements," in Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology. ACM, 2011, pp. 403–412.

[7] A. Watters, "Codecademy and the future of (not) learning to code," http://www.hackeducation.com/2011/10/28/codecademy-and-the-future-of-not-learning-to-code/, [Online; accessed 1-October-2012].

[8] S. Nicholson, "A user-centered theoretical framework for meaningful gamification," http://scottnicholson.com/pubs/meaningfulframework.pdf, 2012, [Online; accessed 1-October-2012].

[9] J. Lee and J. Hammer, "Gamification in education: What, how, why bother?" Academic Exchange Quarterly, vol. 15, no. 2, p. 146, 2011.

[10] K. Weaver, S. Garcia, and N. Schwarz, "The presenter's paradox," Journal of Consumer Research, vol. 39, no. 3, pp. 445–460, 2012.

[11] S. Edwards, "Using software testing to move students from trial-and-error to reflection-in-action," ACM SIGCSE Bulletin, vol. 36, no. 1, pp. 26–30, 2004.

[12] L. Malmi, V. Karavirta, A. Korhonen, and J. Nikander, "Experiences on automatically assessed algorithm simulation exercises with different resubmission policies," Journal of Educational Resources in Computing, vol. 5, no. 3, September 2005. [Online]. Available: http://dx.doi.org/10.1145/1163405.1163412

[13] V. Karavirta, A. Korhonen, and L. Malmi, "On the use of resubmissions in automatic assessment systems," Computer Science Education, vol. 16, no. 3, pp. 229–240, September 2006. [Online]. Available: http://journalsonline.tandf.co.uk/link.asp?id=R77P7107U31V846J

[14] A. Korhonen, L. Malmi, P. Myllyselka, and P. Scheinin, "Does it make a difference if students exercise on the web or in the classroom?" in Proceedings of the 7th Annual SIGCSE/SIGCUE Conference on Innovation and Technology in Computer Science Education, ITiCSE'02. Aarhus, Denmark: ACM Press, New York, 2002, pp. 121–124.
