
Douglas, K. A., Bermel, P., Alam, M. M., & Madhavan, K. (2016). Big data characterization of learner behaviour in a highly technical MOOC engineering course. Journal of Learning Analytics, 3(3), 170–192. http://dx.doi.org/10.18608/jla.2016.33.9

ISSN 1929-7750 (online). The Journal of Learning Analytics works under a Creative Commons License, Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)

Big Data Characterization of Learner Behaviour in a Highly Technical MOOC Engineering Course

Kerrie A. Douglas

School of Engineering Education, Purdue University, USA

Peter Bermel

School of Electrical and Computer Engineering, Purdue University, USA

Md Monzurul Alam

School of Electrical and Computer Engineering, Purdue University, USA

Krishna Madhavan

School of Engineering Education, Purdue University, USA

ABSTRACT: MOOCs attract a large number of learners with largely unknown diversity in terms of motivation, ability, and goals. To understand more about learners in highly technical engineering MOOCs, this study investigates patterns of learners’ (n=337) behaviour and performance in the Nanophotonic Modelling MOOC, offered through nanoHUB-U. The authors explored clusters of learner clickstream patterns using the k-means++ algorithm and found five clusters of learner behaviour, labelled according to learners’ use of materials: Fully Engaged, Consistent Viewers, One-Week Engaged, Two-Week Engaged, and Sporadic learners. The Kruskal-Wallis nonparametric statistical test yielded a significant difference (p<0.01) between learners’ access of course materials in each cluster. The researchers then examined the participation and mean scores on course quizzes and exams for each learner group. One-Week Engaged learners, on average, scored significantly lower on the first week’s assessment. Two-Week Engaged learners, on average, scored significantly lower on the second week’s assessments. Other differences found in learners’ participation and performance on quizzes and tests based on the five clusters are discussed. These findings suggest that some of the high dropout numbers in advanced MOOCs may be related to learners’ performance on course assessments. In addition, integration of learner access to course material with course assessment scores provides a much richer understanding of learners in a MOOC.

Keywords: MOOCs, learning analytics, assessment



1 INTRODUCTION

Many education researchers strive to conduct research that will one day translate into widespread improved practice, informed by evidence. The implementation of massive open online courses, MOOCs, has occurred very differently, arguably in reverse fashion. The potential that MOOCs have to disseminate valuable knowledge and skills, without financial, location, or social-status barriers, is a very exciting promise, as noted in the media (e.g., Leckart, 2012). At no previous time in history has knowledge been so freely available. Yet, there remains limited information for institutions regarding the value of offering MOOCs (Hollands & Tirthali, 2014) and there is limited empirical information to inform pedagogy (Reich, 2015; Perna et al., 2014). While research related to online learning more broadly is quite vast, research on how learning can occur within the MOOC model is still relatively unexplored and many questions remain (Reich, 2015). Perna et al. (2014) note three areas of consensus in the emerging field: 1) MOOCs have very low completion rates, typically 5–12%; 2) how learners progress through MOOCs is largely not understood; and 3) the effects of individual course characteristics on learner outcomes are unknown.

With open access, thousands of learners may register with little or no commitment, entering and leaving the MOOC at will. This non-traditional behaviour in open access courses brings significant challenges to assessing what learning has indeed occurred. In addition, the application of basic instructional design principles is unclear, especially considering the diversity of learners in terms of educational background, usage, motivation, and intention for the courses (Douglas, Mihalec-Adkins, Hicks, Diefes-Dux, Bermel, & Madhavan, 2016). Given that good instructional design is based on understanding the learning environment (i.e., the context wherein the learning occurs), the learner, and the tasks to demonstrate learning (Ragan & Smith, 1999), it is imperative for researchers to understand learner behaviour more clearly in the MOOC environment, the types of learner needs, and how learners can demonstrate their learning.

To better understand behaviour and types of learners in MOOCs, we must create new methods of research that integrate analytics with traditional forms of assessment data. While clickstreams are not a measure of learning (Thille et al., 2014), learner access data can serve to identify groups of learners who utilize the materials differently. In addition, identification of usage patterns can serve as a grouping variable to allow deeper investigation into the underlying differences between learners. We consider this integration of learner analytics with forms of traditional educational research and assessment as a big data characterization of learner behaviour (Morabito, 2015) because the vast amount of data is used to reveal something of depth. The purpose of this research is to characterize learner patterns in a highly technical engineering MOOC.

The specific MOOC under study, Nanophotonic Modelling, was developed to provide graduate-level content related to the field of nanotechnology, which is constantly changing through research advancements (Roco, 2011), causing many traditional course textbooks to fall quickly out of date. In fact, many of the recent course topics are so new that the information is only available through conference proceedings and peer-reviewed journal publications. By developing courses such as the one on nanophotonic modelling, nanoHUB-U significantly decreases the time between research discovery and access to the information for engineering students and practitioners. As a first step to understanding the types of learners present within an advanced MOOC, we identify learner groups based on their course material usage. Next, using the emergent groups as a sorting variable, we explore each group’s quiz and test scores to understand more about the similarity of learners within each group and aid in our classification of learners.

2 LITERATURE REVIEW

MOOCs tend to attract levels of enrollment that are quite different from traditional higher education learning environments. Anyone with Internet access can register for a course, whether it is out of passing curiosity or purely by accident (Liyanagunawardena, Parslow, & Williams, 2014). In a survey study of 34,779 MOOC learners, Christensen et al. (2013) found that the majority of learners were from developed countries, young, employed, and well educated. In addition, survey respondents were asked their reason for enrollment with 10 answer choices, including “other” and “none of the above.” Overall, approximately 50% of respondents indicated that they enrolled for “curiosity, just for fun.” Christensen and colleagues (2013) note that this result varied when considered by course. For example, 75% of respondents reported their reason for enrollment in humanities courses as “curiosity, just for fun.” The lowest percentage of learners reporting enrollment out of curiosity was approximately 49% for the group of science, health care, and math courses. Thirty-nine percent of respondents from those courses reported enrolling to gain specific skills related to their job. However, the authors do not report findings specifically for science or engineering courses.

Kizilcec & Schneider (2015) studied learners’ reported reasons for enrolling in MOOCs and found thirteen different reasons, varying from general interest to improving English. The researchers then turned the open-ended responses into a survey question, asking respondents in a different sample to select “all that apply.” In other words, respondents could pick as many reasons for enrollment as desired, without ranking them. The mean number of chosen categories was 6.3 out of the 13 reasons for enrollment. While it is clear that there are multiple reasons for signing up for a course, it is unknown which reasons are the most important or of value to the learners. In addition, reasons for enrollment were not examined in the context of learner behaviour in the courses. It is unknown whether those few who enrol to increase their skills may seek deeper levels of learning or participate with course materials more fully.

2.1 Learner Completion

The openness of MOOCs presents additional challenges as learners may enter and leave a course without consequence. Learners are able to view materials based on topics of interest. Unsurprisingly, the actual percentage of learners who complete the course as per the traditional definition of “completion” (i.e., successfully finish the majority of course activities and assessments), or even those who regularly engage in the course at all, can be very low. For example, Perna et al. (2014) studied learner usage in 16 MOOCs offered through Coursera, and found that fewer than half of registrants (46%) accessed one or more video lectures. Liyanagunawardena, Adams, and Williams (2013) examined 45 MOOC-related research papers published between 2008 and 2012, and noted that the highest rate of successful completion for any course was 19.2%, but the vast majority reported less than 10%. In addition, Jordan (2014) considered completion as the percentage of course registrants who met criteria to earn a course certificate in 39 MOOCs offered on different platforms (e.g., edX, Coursera, Udacity), and found a typical completion rate to be 5%. Little is known about the differences between learners who leave the course after only a couple of weeks versus those who are more sporadic or those who complete all the aspects.

While consensus has emerged that completion rates are not fully appropriate in the MOOC environment (e.g., Kizilcec, Piech, & Schneider, 2013; DeBoer, Ho, Stump, & Breslow, 2014; Liyanagunawardena et al., 2014), few have suggested what metrics should be used to make evaluative conclusions. Indeed, given the lack of personal investment needed to sign up for a course, researchers and evaluators are not able to infer the quality of a MOOC based on completion rate alone. The openness and accessibility of MOOCs have even led some to argue that completion rates are “misleading and counterproductive indicators” of the quality and potential of MOOCs (Ho et al., 2015). DeBoer et al. (2014) elaborate on this point, asserting that completion rates must be re-conceptualized for a MOOC environment where learners may consider MOOCs less of a traditional course than a collection of learning resources. In that sense, access to MOOC materials becomes the desired goal rather than the course itself. Others point out that MOOCs did not create a need to redefine successful completion; rather they raise a longstanding issue in higher education: learners have different goals (Liyanagunawardena et al., 2014). Tinto (1975) points to the argument that it is inappropriate to categorize all learners who withdraw from a course as one group without distinguishing between academic failures and voluntary withdrawals. When Tinto’s conceptualization is applied to MOOCs, it is important to understand the reasons behind disengagement and other non-traditional use patterns.

The level of importance to place on completion rates is partly dependent upon the intended design of the MOOC (Yuan & Powell, 2013). Indeed, MOOC completion rates become meaningful only when interpreted in the context of learner and stakeholder goals (Koller, Ng, Do, & Chen, 2013). Retention and successful completion, as evaluative metrics, also become important in contexts where MOOC stakeholders (e.g., faculty who teach, university administrators) desire learners to use the course materials more traditionally. Indeed, the MOOC phenomenon poses new challenges for educational researchers to determine what metrics of evaluation are appropriate. In order to begin to answer meaningful questions about the value of MOOCs, these challenges must be addressed.

2.2 Learner Usage

Reich (2015) points to four recent studies where learner activity in a course was positively correlated with higher final outcomes, such as course completion and exam scores (Collins, 2013; Murphy, Gallagher, Krumm, Mislevy, & Hafter, 2014; Reich et al., 2014; Wilkowski, Deutsch, & Russell, 2014). Given the discrepancy between enrollment and course activity, behaviour in MOOCs seems to indicate that many learners are taking advantage of MOOCs in the same way they use other online resources.



MOOC learners generally enter the course with a wide variety of participation objectives (Kizilcec & Schneider, 2015), making them far too heterogeneous to fully characterize simply by the percentage using course resources. These objectives act as latent, or hidden, variables that contribute to learner behaviour in the course and must be explored (Thille et al., 2014). Simply put, descriptive statistics present an incomplete picture of the true variance of learner behaviour.

Some researchers have considered the types of learners based on usage patterns as a method for understanding how learner needs can be met better (Kizilcec, Piech, & Schneider, 2013). Studying learner patterns in three computer science courses of varying difficulty, Kizilcec et al. (2013) propose four profiles: 1) completing, 2) auditing, 3) disengaging, and 4) sampling. Completing is described as learners who complete most aspects of the course, including the assessments. This group was most similar to traditional classroom learners and varied in their performance on assignments. Auditing is described as the group of learners who followed along with the videos throughout the course duration, but who did assessments infrequently. Disengaging is described as the group of learners who participated in most aspects of the course initially, but then disengaged at some point in the first third of the course offering. Sampling is described as the group of learners who watched videos or other materials for only one or two assessment periods, many of whom only watched one video.

Similarly, Hill (2013a) identified four patterns of learners within Coursera MOOCs: 1) Lurkers who only sample items, or do nothing at all beyond registration, 2) Drop-Ins who become involved for a selected topic, 3) Passive Participants who “view a course as content and expect to be taught,” tend to watch videos and maybe take quizzes, but do not usually participate in activities/discussion, and 4) Active Participants who participate in most aspects of the MOOC, including social media. Hill (2013b) then revised his finding to delineate between those who only register (No-Shows) and those who read content, but do not participate in discussions (Observers). A limitation of both Kizilcec et al. (2013) and Hill (2013b) is that neither fully describes the methods used to determine the ideal number of learner groups nor tests the significance of differences between groups. The role of theory or rationale in developing a generalizable typology of learner usage is also not fully explicit, and it is possible to generate several different visualizations of access that may or may not be helpful for MOOC instructional design.

To further research MOOC education and evaluation, we must explore whether the typology of learner patterns found in Kizilcec et al. (2013) or Hill (2013b) generalizes to MOOCs regardless of content, or whether types of learner patterns vary based on course context. Furthermore, to understand how similar the learners within each group are, we must contextualize groups in terms of their performance on course assessments. This study identifies groups of learner behaviour patterns and delves deeper into their quiz and test assessment performance. Specifically, we ask the following research questions: 1) What are the clusters of learner usage patterns in a highly advanced MOOC engineering course? 2) How are participation and performance patterns similar within each cluster? 3) What are the differences in quiz and exam participation and performance between clusters? The specific course studied in this paper was entitled Nanophotonic Modelling, a highly technical, advanced engineering course provided through nanoHUB-U.



3 BACKGROUND

Unlike MOOCs offered on large platforms that do extensive marketing to drive high enrollment, a highly technical (graduate level) engineering course offered through an NSF-sponsored computational nanotechnology site provides the opportunity to explore meaningful results available from combining learner usage and assessment performance data. As Nanophotonic Modelling is highly specialized and not offered through a major MOOC platform, enrollment was substantially lower than typical MOOCs. Yet, compared to similar on-campus graduate courses that typically have 10 students, the enrollment (n=337) was well beyond what would be plausible at one geographic location. The course was designed for a relatively narrow intended learner type, i.e., someone with a graduate-level understanding of physics and computational simulation. Ideally, the instructor would target engineering and science professionals who need nanotechnology skills and knowledge. By developing courses such as Nanophotonic Modelling, one of the major intended outcomes of nanoHUB-U is to significantly decrease the time from research discovery, to access to new information, to project applications for engineering students and practitioners. The course sequence (Table 1) begins with the most basic concepts in order to scaffold more difficult methods later.

Table 1. Structure of Nanophotonic Modelling course

Week(s)   Major topic(s)
1         Photonic bandstructure solvers
2         Transfer matrix analysis; rigorous coupled wave analysis (RCWA)
3–4       Finite-difference time domain
5         Finite-element methods

Much like other nanoHUB-U offerings, the Nanophotonic Modelling course contains five lectures per week of approximately 20 minutes each, released weekly over the course of five weeks. This instructor-led section of the course is intended for learners to approach sequentially, as in a traditional course. During this time, a graduate teaching assistant and the professor regularly interact with their students through a message board. The instructor-led parts of the course conclude after the fifth week. Subsequently, the course material is archived and made freely available through a self-paced section of the course with no fixed end date. In addition to the lectures, five types of course materials are given for each lecture: 1) a set of lecture slides in PDF format, 2) a quiz, 3) an assignment, 4) an assignment solution in PDF format, and 5) a tutorial to provide an overview of the solution technique. Finally, a weekly exam takes place at the end of each of the five weeks. Students who achieve an average of over 60% in all the graded material by the end of the last week, including material not taken, receive a certificate of completion, known as a badge. In some cases, learners can also receive University or continuing education credit(s).
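The badge rule above is a simple average over every graded item, with untaken items still counting. A minimal Python sketch follows; the function name `earns_badge`, the equal weighting of graded items, and the scoring of untaken items as zero are illustrative assumptions rather than details confirmed by the course description.

```python
from typing import Optional, Sequence

BADGE_THRESHOLD = 60.0  # percent; the "over 60%" rule stated in the text


def earns_badge(scores: Sequence[Optional[float]]) -> bool:
    """Return True if the average over ALL graded items exceeds 60%.

    `scores` holds one percentage per graded item. None marks an item the
    learner never attempted; it is counted as 0 (assumption), and all items
    are weighted equally (assumption).
    """
    if not scores:
        return False
    total = sum(0.0 if s is None else s for s in scores)
    return total / len(scores) > BADGE_THRESHOLD
```

Under these assumptions, a learner with scores of 100, 100, and 50 on three items and one untaken item averages 62.5% and would earn the badge, whereas two perfect scores and two untaken items average 50% and would not.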

4 METHODS

4.1 Participants

In this section, we present the learner demographics in terms of their geographic locations, gender, and organizational affiliation. Often, MOOC researchers report the number of learners in the categories of registrants and starters (e.g., Perna et al., 2014). Registrants are the total unique individuals enrolled, while Starters are the total unique individuals who actually access materials in the first week of the course. Complicating matters, however, learners may choose to join after the course has started. To account for this issue, we also create a third category of Active Learners, defined as the total unique individuals who participated in one or more activities at any point during the course, i.e., downloading or watching one or more course material items at least once during the time the course was originally offered. Nanophotonic Modelling had 337 registrants from 48 countries, with 68% of participants from 45 countries. There were 226 Active Learners who utilized materials as the course progressed. Although the learners were from across the globe, some geographical clustering was observed in Anglophone countries: notably, the largest group of registrants (31%) were from the United States, with the second highest (17%) from India. Nearly the same percentage of Active Learners in the course came from the United States (31%) and India (16%). No other single country accounted for more than 5% of the learners. As shown in Table 2, 29% of the learners are from 34 countries, each country with only a few participants. Thirty-one learners earned a badge for the instructor-led portion of Nanophotonic Modelling.

Table 2. Geographic Distribution of Active Learners

Country               % Active
United States         31.4
India                 15.9
Egypt                 4.4
Canada                4.0
Mexico                2.7
Russian Federation    2.7
Brazil                2.2
Hong Kong             2.2
Bangladesh            1.8
China                 1.8
Spain                 1.8
Other (All <2%)       29.2



4.2 Sources of Data

We collected learner behaviour and assessment data to develop a characterization of the learners from the Nanophotonic Modelling course. The learner behaviour is derived from typical clickstream data: number of lecture views, access to PowerPoint slides, tutorials, assignments, and assignment solutions. Clickstream data were recorded every time a learner clicked on any course resource. In this study, we identified 31 course materials made available every week, totalling 155 course materials during the five weeks of the course. Two types of course materials were made available to the learners every week: learning materials like lectures, homework, and tutorials, as well as assessment materials like quizzes and exams. Clickstream data capture the date and time when the learner accesses any material, but come with some practical limitations. For example, learners may download any course material in PDF or video format or follow a YouTube link, but the clickstream data do not capture whether the learner saves the material or takes notes and then refers back to it.

All data were de-identified using an algorithm that assigned a random new unique code for each learner. The newly created code was used to create a unique 1:1 map between multiple datasets while greatly lowering risk of re-identification. Also, the actual calendar dates and access duration values were converted to ordinal numbers to protect the privacy of the learners. Still, this approach can capture the multiple access events to particular course materials by some learners, which may be useful in learner characterization.
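The de-identification step can be illustrated with a short Python sketch. It is not the authors' algorithm, which the text does not specify: the helper names, the 16-character hexadecimal codes, the (learner, material, timestamp) event layout, and the use of ISO-formatted timestamps (which sort correctly as strings) are all assumptions made for illustration.

```python
import secrets
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, str]  # (learner_id, material_id, ISO timestamp); hypothetical layout


def build_code_map(learner_ids: Iterable[str]) -> Dict[str, str]:
    """Assign each distinct learner ID a random, unique replacement code."""
    mapping: Dict[str, str] = {}
    used = set()
    for learner_id in learner_ids:
        if learner_id in mapping:
            continue
        code = secrets.token_hex(8)
        while code in used:  # regenerate on the (unlikely) collision
            code = secrets.token_hex(8)
        used.add(code)
        mapping[learner_id] = code
    return mapping


def to_ordinal(timestamps: Iterable[str]) -> Dict[str, int]:
    """Map each distinct timestamp to its 1-based rank in sorted order."""
    return {t: i for i, t in enumerate(sorted(set(timestamps)), start=1)}


def deidentify(events: List[Event],
               code_map: Dict[str, str],
               ordinal_map: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Replace learner IDs with codes and timestamps with ordinal ranks."""
    return [(code_map[l], m, ordinal_map[t]) for l, m, t in events]
```

Reusing the same `code_map` for the clickstream and the assessment tables preserves the 1:1 link between datasets, and repeated accesses by one learner remain visible as separate rows with the same code.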

Exam and quiz scores were collected for analysis. There was one quiz per lecture and it consisted of two questions, directly related to material covered in the lecture (multiple-choice, with four to five choices per item). In total, there were 25 quizzes and 50 questions to assess learners directly after presentation of the material. In addition, there were weekly exams, each with 10 questions aligned to the concepts covered in that week. In total, the five exams comprised 50 questions.

4.3 Identifying Learner Groups by Clustering

The research team analyzed the clickstream data by looking for patterns of course material access, which is the basis for identifying learner groups. The rationale for this approach is simple: by exploring the usage patterns and understanding more about learners within and between each pattern of behaviour, we can understand more about the diversity of learners’ participation and achievement. It is important to understand that all the learners may not have the same learning objective, which may give rise to different patterns of access. Investigating the relationship between learners and course materials can give better insight into the learners (Hecking, Ziebarth, & Hoppe, 2014). Relationships between learners based on identifying similarities in course material usage patterns can help reveal commonalities of each group. Some researchers analyzed the learner groups using social network analysis methods to find groups with common interests (Harrer, Malzahn, Zeini, & Hoppe, 2007). Other researchers presented data mining clustering methods for web-based course material usage (Romero, Gutiérrez, Freire, & Ventura, 2008). While many clustering algorithms can be used, we used a k-means clustering technique. The popularity of k-means clustering techniques rests on the ease of implementation, simplicity, efficiency, and empirical success (Jain, 2010). This approach helped to identify key patterns of course material usage among the learners, and then divide them into a corresponding set of learner groups.

4.3.1 Data Pre-Processing

Prior to applying the k-means++ algorithm (explained in the next section) to our data, we identified the proper datasets suitable for the k-means++ algorithm. For instance, among the 155 course materials, lectures and tutorials had different video formats. More precisely, identical videos were provided in three formats: online streaming, downloadable MP4, and YouTube. The learner can get the same study material from any of those three video modes; therefore, accessing all three formats or just one does not reflect different learning intentions. We therefore combined all three formats of the same video into a single item. This approach results in 95 distinct course materials. Next, each course item was labelled with a unique ID number, according to the sequence of release. We generated a binary table with 226 rows, each representing the material usage pattern of one active learner. The binary sequence of an individual had a numeric digit “1” if the person accessed the corresponding material and a “0” if s/he did not access it.
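The pre-processing described above amounts to collapsing format variants and then filling a learner-by-material binary table. The sketch below shows one way to do this in Python. The `"<item>:<format>"` naming scheme, the format labels, and the function names are hypothetical; the source does not describe how the raw clickstream identifies formats.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical naming scheme: "<item>:<format>", e.g. "lecture1.1:mp4".
VIDEO_FORMATS = {"stream", "mp4", "youtube"}


def canonical_item(raw_id: str) -> str:
    """Collapse the three video formats of one item into a single item ID."""
    item, sep, fmt = raw_id.rpartition(":")
    if sep and fmt in VIDEO_FORMATS:
        return item
    return raw_id


def access_matrix(events: Iterable[Tuple[str, str]],
                  learners: Sequence[str],
                  items: Sequence[str]) -> List[List[int]]:
    """Build a binary table: rows are learners, columns are items in release order.

    A cell is 1 if the learner accessed the item at least once, else 0.
    """
    col = {item: j for j, item in enumerate(items)}
    row = {learner: i for i, learner in enumerate(learners)}
    matrix = [[0] * len(items) for _ in learners]
    for learner, raw_id in events:
        item = canonical_item(raw_id)
        if learner in row and item in col:
            matrix[row[learner]][col[item]] = 1
    return matrix
```

With 226 active learners and 95 merged items, the result would be the 226 × 95 table described in the text; repeated accesses and alternative video formats both map to the same single "1".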

Next, we generated a dot plot from the binary table, as shown in Figure 1. Every dot represents a learner accessing the corresponding course material. The horizontal axis represents the material ID and the vertical axis represents the learner ID. Upon casual inspection, the material usage pattern appears random. Nevertheless, clustering algorithms can reveal distinct patterns of content usage.

Figure 1: Learners’ Course Material Access Data

4.3.2 K-Means Clustering Algorithm

The k-means clustering algorithm is a well-established data mining research method for classification of data into distinct groups. As discussed in Section 4.3.1, our initial data is derived from the access patterns of 226 active learners, each captured as a 95-dimensional binary vector; together these form the data set $X$. We select $k$ cluster centres $\mu$ from $X$ to minimize the sum of distances in the 95-dimensional space using the k-means algorithm (Jain, 2010). Intuitively, this corresponds to choosing group membership to maximize uniformity within each group, even if the overall data set is very heterogeneous. However, there are many examples of generating low-quality, non-reproducible clusters when using arbitrarily chosen initial cluster centres in k-means (Arthur & Vassilvitskii, 2007). Therefore, in this work, we follow the k-means++ heuristic method to reproducibly classify our learners (Arthur & Vassilvitskii, 2007). The probabilistic approach of choosing the initial cluster centres by the k-means++ algorithm improves the quality and reproducibility of the found clusters (Arthur & Vassilvitskii, 2007). The key steps are 1) to randomly select an initial cluster centre from $X$; 2) to weight the selection of subsequent cluster centres by their relative distances to other points in $X$; 3) to assign all points to the closest cluster centre; 4) to re-centre each cluster so formed; and 5) to repeat the last two steps until no further changes are seen.

We implemented the method outlined above using the function “kmeans,” available in MATLAB. The reproducibility of the clusters by the k-means++ algorithm depends on the initialization (Lisboa, Etchells, Jarman, & Chambers, 2013). Therefore, k-means++ clustering can be treated as a heuristic approach to divide the dataset into different groups. Nonetheless, the choice of groups was fairly uniform in the datasets examined in the next section.
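For readers who want to trace the five steps of Section 4.3.2, a compact pure-Python sketch is given below. It is not the MATLAB code used in the study. It assumes squared Euclidean distance and the seeding rule of Arthur and Vassilvitskii (2007), in which each candidate centre is drawn with probability proportional to its squared distance from the nearest centre already chosen; the function names and the fixed `seed` argument are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seed_centres(points: List[Point], k: int, rng: random.Random) -> List[List[float]]:
    """Steps 1-2: k-means++ seeding."""
    centres = [list(rng.choice(points))]
    while len(centres) < k:
        # Weight each point by its squared distance to the nearest chosen centre.
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen centre
            centres.append(list(rng.choice(points)))
        else:
            centres.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centres


def kmeans_pp(points: List[Point], k: int, seed: int = 0,
              max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Steps 3-5: assign, re-centre, and repeat until labels stop changing."""
    rng = random.Random(seed)
    centres = seed_centres(points, k, rng)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centre if a cluster empties
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

Applied to the binary access table, `points` would be the 226 rows of 95 zeros and ones. Because the seeding is random, different seeds can yield different partitions, which is the sensitivity to initialization noted above.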

5 RESULTS

In this section, we demonstrate how the clustering method shows different learner groups among the students. We also discuss the similarities and differences between the groups in terms of material usage pattern, as well as their performances in quizzes and exams. We explain what the clusters tell us about the learners and what the consequences are.

5.1 Identified Learner Groups

The k-means++ clustering technique found some interesting learner groups within the Nanophotonic Modelling course. Course material usage patterns by different clustered groups are presented in Figure 3. Every dot in Figure 3 indicates that the learner accessed that particular material; conversely, a blank represents that the learner did not access that material. Note that only Active Learner data are analyzed; registered but inactive learners are excluded from our analysis.

A common concern of clustering is the optimal number of clusters. We identified the optimal number both empirically and through rationale. The two empirical methods used were within-cluster error dispersion and gap statistics, as in Tibshirani, Walther, and Hastie (2001). Within-cluster error dispersion ($W_k$) is presented against the number of clusters in Figure 2a and the gap statistic is shown in Figure 2b. It is obvious in Figure 2a that the $W_k$ value started to become flat at $k = 5$ for the first time; therefore, the optimal number of clusters is 5. Gap statistic evaluation was completed using the function “evalclusters,” available in MATLAB. A plot of gap values against the number of clusters is shown in Figure 2b. The requirement for the optimal number of clusters is $\mathrm{Gap}(k) \geq \mathrm{Gap}(k+1) - s_{k+1}$, where $k$ is the optimal number of clusters and $s_{k+1}$ is the standard error of the clustering solution used in the Gap statistic method proposed in Tibshirani et al. (2001). Finally, the method identified the optimal number of clusters as 6.

Figure 2: (a) Within-Cluster Error Dispersion; (b) Gap Statistic
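The gap-statistic selection rule of Tibshirani et al. (2001), choosing the smallest $k$ for which $\mathrm{Gap}(k) \geq \mathrm{Gap}(k+1) - s_{k+1}$, can be written in a few lines of Python. The sketch assumes the gap values and standard errors have already been computed (for example by a routine such as MATLAB's `evalclusters`); the function name `choose_k` and the list layout are illustrative.

```python
from typing import Optional, Sequence


def choose_k(gap: Sequence[float], s: Sequence[float],
             k_min: int = 1) -> Optional[int]:
    """Return the smallest k with Gap(k) >= Gap(k+1) - s_{k+1}.

    gap[i] and s[i] hold the gap value and its standard error for
    k = k_min + i. Returns None if no tested k satisfies the rule.
    """
    for i in range(len(gap) - 1):
        if gap[i] >= gap[i + 1] - s[i + 1]:
            return k_min + i
    return None
```

With purely illustrative numbers (not taken from Figure 2b), gap values of 0.10, 0.30, 0.50, 0.52, 0.51 for k = 1 to 5 and a constant standard error of 0.03 would select k = 3, since 0.50 ≥ 0.52 − 0.03 is the first comparison that holds.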

The learners were clustered by the k-means++ method with the number of clusters set to 6 and then to 5. Although the Gap statistic suggested that the number of clusters be six, few learners were actually placed in the sixth cluster, and they were very similar to learners in other clusters; i.e., the learners of cluster 6 were engaged for the first two weeks with some irregular use of course materials. Based on this rationale, we decided to cluster the learners into five groups in order to describe them according to their material usage pattern, rather than maintaining the sixth cluster group of two-week usage but not fully engaged.

Figure 3: Learner Clustering by Course Materials Usage Pattern



The simulation for k-means++ clustering was performed for various numbers of clusters, settling finally on five distinct groups (clusters) of learners, as presented in Figure 3, with the x-axis representing unique course materials in chronological order, and the y-axis representing the new cluster-based ordering of each unique learner. Clusters 1 through 5 are coloured blue, magenta, black, red, and green, respectively. This clustering graph shows the patterns of activities of the learners much more clearly than in Figure 1. According to our reduced course material list, 19 items were released every week. The first 13 are study materials, while the last six are assessments (five quizzes and an exam), as indicated on the horizontal axis in Figure 3. The clustering graph highlights key patterns of the material usage present in each group of learners.

Based on the material usage pattern, we identified the five clusters and assigned labels descriptive of their usage to aid in interpretation: 1) Fully Engaged Learners, 2) Consistent Viewers, 3) Two-Week Engaged Learners, 4) One-Week Engaged Learners, and 5) Sporadic Learners, respectively. Fully Engaged Learners access the study materials regularly, and attempt most quizzes and exams. Consistent Viewers actively access content over the entire course, but do not regularly access the course assessment materials. Two-Week Engaged Learners actively access materials in the first two weeks of the course, with a sharp reduction in their activities in subsequent weeks. One-Week Engaged Learners are quite interesting; they fully access the first week’s materials, but do very little in the course afterwards. Finally, Sporadic Learners randomly access materials. We identified two learners from this group who attempted all five exams without accessing any of the study materials. Many of the Sporadic Learners accessed only assignments and none of the videos. The cluster-wise distribution of learners is presented in Table 3. Sporadic Learners form the biggest group among the five, accounting for 42%.

Table 3. Cluster Groups

Cluster                       n     % of Learners
Fully Engaged Learners        39    17%
Consistent Viewers            13    6%
Two-Week Engaged Learners     23    10%
One-Week Engaged Learners     57    25%
Sporadic Learners             94    42%

5.2 Exam and Quiz Grades for Each Group

To better capture the similarities and differences within and between groups, we also examined the exam and quiz scores for each group. We calculated mean and standard deviation of the scores within each group to more thoroughly understand whether learners who behaved similarly within the course also performed similarly.
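The per-group summary can be sketched in Python as follows, restricting each statistic to learners who attempted the assessment. The function name and data layout are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption; the text does not state whether a sample or population formula was used.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

Summary = Tuple[int, float, float, float]  # (attempted, % of cluster, mean, SD)


def summarize(scores: Dict[str, List[Optional[float]]]) -> Dict[str, Summary]:
    """Summarize one assessment per group.

    `scores[group]` holds one entry per learner in the cluster; None marks a
    learner who did not attempt the assessment and is excluded from the
    mean and SD but still counts toward the cluster size.
    """
    out: Dict[str, Summary] = {}
    for group, values in scores.items():
        attempted = [v for v in values if v is not None]
        if not attempted:
            continue  # no attempts: nothing to report for this group
        out[group] = (len(attempted),
                      100 * len(attempted) / len(values),
                      mean(attempted),
                      pstdev(attempted))
    return out
```

As a check of the arithmetic, two attempted scores of 90 and 100 give a mean of 95 and a population standard deviation of 5, whereas the sample formula would give about 7.1; which convention applies to the reported values is left open here.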



Figure 4: Scores Obtained by Different Groups in Weekly Exams. †Average among students who attempted the exam

Figure 5: Scores Obtained by Different Groups in Quizzes

Figures 4 and 5 present the group-wise performance in the weekly exams and quizzes, respectively. The mean scores were calculated by computing the average of scores for all learners who attempted the given assessment in each group. As shown, the first three learner clusters (Fully Engaged, Consistent Viewers, and Two-Week Engaged) scored 90 or above in the first exam, while the other two less-consistent groups performed poorly. Fully Engaged and Consistent Viewers scored above 90 consistently in the first week’s quizzes, while the other groups were not consistent at all (Figure 5). All scores decreased in each learner group on the second exam (with the exception of One-Week Engaged, as they did not attempt any more exams after the first week).

[Figure 4 plot elements: vertical axis Score† (0.0–100.0); categories Fully Engaged Learners, Consistent Viewers, Two-Week Engaged Learners, One-Week Engaged Learners, Sporadic Learners; series Week 1, Week 2, Week 3, Week 4, Week 5, Average; data labels 85.0, 67.5, 74.5, 72.0, 60.7.]

[Figure 5 plot elements: vertical axis Score† (0–120); horizontal axis quizzes Q1.1–Q5.5; series Fully Engaged Learners, Consistent Viewers, Two-Week Engaged Learners, One-Week Engaged Learners, Sporadic Learners.]



We also considered learners’ assessment-related behaviour over the duration of the course. We found that 90% of those Fully Engaged attempted all the exams and quizzes. By contrast, only about 15% of Consistent Viewers attempted the first week’s exam and quiz, and none attempted any thereafter. In the Two-Week Engaged group, only 35% attempted the first week’s exam; participation gradually declined each week thereafter. In the case of One-Week Engaged, 44% attempted a first-week quiz, and fewer than 9% took the first exam. They did not attempt any other quizzes or exams after the first week. Less than 10% of the Sporadic group attempted all the exams and quizzes.

We now consider the variance of performance by compiling detailed statistics on the first-week assessments for each group of learners, provided in Table 4. This work shows that One-Week Engaged and Sporadic Learners had a high standard deviation of scores in the first exam compared to other groups, while Two-Week Engaged, Consistent Viewers, and Fully Engaged Learners scored in the first week’s quizzes with high standard deviations, indicating that the variance of performance of the learners in these groups was high. Overall, the learners within these three groups performed similarly to each other in the first week’s exam compared to other groups. Two-Week Engaged, One-Week Engaged, and Sporadic learners performed similarly in the first-week quizzes, in the B-range; Fully Engaged and Consistent Viewers earned very high percentages.

Table 4: Group-wise Mean and Standard Deviation (SD) of Assessment Scores

Assessment           Group Name           Number of Learners Attempted   % of Learners within Cluster   Mean (SD)
First Week Exam      Fully Engaged        37                             95                             90.0 (11.0)
                     Consistent Viewers   2                              15                             95.0 (5.0)
                     Two-Week Engaged     8                              35                             92.5 (9.7)
                     One-Week Engaged     5                              9                              72.0 (20.4)
                     Sporadic             7                              7                              61.4 (31.8)
First Week Quizzes   Fully Engaged        37                             95                             96.2 (13.3)
                     Consistent Viewers   4                              31                             100.0 (0.0)
                     Two-Week Engaged     11                             48                             82.9 (23.7)
                     One-Week Engaged     25                             44                             86.0 (22.4)
                     Sporadic             9                              10                             82.4 (23.0)
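The participation percentages in the table above follow directly from the cluster sizes and the attempt counts. The short sketch below reproduces that arithmetic; the dictionary and function names are illustrative only, and the counts are copied from the two tables in this section.

```python
# Cluster sizes (from the cluster-group table) and first-week attempt counts
# (from the assessment table). Names are illustrative, not from the paper.
cluster_sizes = {
    "Fully Engaged": 39,
    "Consistent Viewers": 13,
    "Two-Week Engaged": 23,
    "One-Week Engaged": 57,
    "Sporadic": 94,
}
exam_attempts = {
    "Fully Engaged": 37,
    "Consistent Viewers": 2,
    "Two-Week Engaged": 8,
    "One-Week Engaged": 5,
    "Sporadic": 7,
}
quiz_attempts = {
    "Fully Engaged": 37,
    "Consistent Viewers": 4,
    "Two-Week Engaged": 11,
    "One-Week Engaged": 25,
    "Sporadic": 9,
}


def percent_within_cluster(attempts, sizes):
    """Percentage of each cluster that attempted, rounded to a whole number."""
    return {group: round(100 * attempts[group] / sizes[group]) for group in sizes}


exam_pct = percent_within_cluster(exam_attempts, cluster_sizes)
quiz_pct = percent_within_cluster(quiz_attempts, cluster_sizes)
for group in cluster_sizes:
    print(f"{group}: exam {exam_pct[group]}%, quizzes {quiz_pct[group]}%")
```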

5.3 Statistical Significance of Clustering

Given that learner groups identified by the clustering technique performed differently in the course, there is some evidence of underlying learner characteristics (i.e., latent traits) that contribute to learner behaviour in the course. To check the significance of our clustering solution, we performed non-parametric Kruskal-Wallis (1952) and Mann-Whitney U (Kirk, 2008) tests on the course material access pattern of each cluster. The usual technique for such comparisons is the analysis of variance; however, with uneven usage patterns of groups, the data did not meet assumptions of normality or similar group size. Therefore, we chose the non-parametric Kruskal-Wallis test (1952), which tests the hypothesis that the clusters have a significantly different distribution in the population (Stark, Woods, Thilaka, & Kumar, 2012). To allow for comparison between the clusters in a 95-dimensional space, the Kruskal-Wallis test was performed by summing each participant’s access to all course materials. It was performed on all the clusters in a single calculation (i.e., rank ordering all points, calculating the sum of ranks for each cluster, and performing a χ² test with df = 5 – 1 = 4, since we have five clusters). The results were significant, H(4) = 191.88, p < 0.01, indicating that differences this large would be expected less than 1% of the time if all clusters were drawn from the same underlying distribution.
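The calculation described above (rank all points, sum the ranks per cluster, compare H against a χ² distribution) can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function names and the toy access totals are hypothetical, the standard tie correction is included because summed access counts will usually contain ties, and the closed-form χ² tail used here is valid only for even degrees of freedom (which covers df = 4).

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def kruskal_wallis(groups):
    """Return (H, df, p) for a list of samples, with tie correction."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    df = len(groups) - 1
    if correction == 0:  # every observation identical: no evidence of difference
        return 0.0, df, 1.0
    h /= correction
    return h, df, chi2_sf_even_df(h, df)


# Toy example: total items accessed per learner in three hypothetical clusters.
h, df, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"H({df}) = {h:.2f}, p = {p:.4f}")
```

With five clusters the same function yields df = 4; plugging the reported statistic into the tail function, `chi2_sf_even_df(191.88, 4)`, gives a probability far below 0.01.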

The Mann-Whitney U test is useful for testing whether two groups are drawn from identical distributions (Kirk, 2008). We performed the Mann-Whitney U test as a post hoc test for the Kruskal-Wallis test, as a follow-up to the finding that each cluster represents a different learner type. Many researchers have applied this technique for statistical hypothesis testing. For example, researchers applied the Mann-Whitney U test to establish the impact of the neurotransmitter oxytocin on trust between humans (Kosfeld, Heinrichs, Zak, Fischbacher, & Fehr, 2005). In our study, with five clusters, there were 10 pairwise comparisons. Finally, we calculated the effect sizes of all the comparison cases. The effect size was defined as $r = Z/\sqrt{N}$, where Z was the z-score found from the test and N was the sample size. It is important to note that we set the significance level for the test to 0.05, but the critical level of significance should be 0.05/10 = 0.005 because we had 10 cases. The significance levels (p < 0.001) of all the comparisons listed in Table 5 are well below the critical significance level, again implying that the clusters are very unlikely to be drawn from identical distributions. In addition, all the test cases had large effect sizes (r < −0.5), except for a medium effect size in comparing Clusters 1 and 2 (Fully Engaged and Consistent Viewers), which is consistent with our earlier findings. Taken together, these statistical results constitute significant evidence that clustering can provide new insight into the behaviour of distinct learner populations.

Table 5: Effect Size and Significance Level from the Mann-Whitney Test

Comparison                               Effect Size (r)
Fully Engaged / Consistent Viewers       −0.49
Fully Engaged / Two-Week Engaged         −0.83
Fully Engaged / One-Week Engaged         −0.85
Fully Engaged / Sporadic                 −0.80
Consistent Viewers / Two-Week Engaged    −0.77
Consistent Viewers / One-Week Engaged    −0.67
Consistent Viewers / Sporadic            −0.58
Two-Week Engaged / One-Week Engaged      −0.75
Two-Week Engaged / Sporadic              −0.69
One-Week Engaged / Sporadic Learners     −0.74

Note: p < 0.001 for all comparisons
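The post hoc procedure can be illustrated with a small sketch: a Mann-Whitney U statistic, its normal-approximation z-score, the effect size r = Z/√N, and the Bonferroni-adjusted threshold for the 10 pairwise comparisons. The function name, the toy data, and the omission of tie and continuity corrections are simplifying assumptions; the sign of r depends on which group's U is used, so only its magnitude should be compared with the tabulated values.

```python
import math
from itertools import combinations


def mann_whitney_effect_size(a, b):
    """Return (U_a, z, r) using the normal approximation.

    U_a counts pairs where the value from `a` exceeds the value from `b`
    (ties count one half). No tie or continuity correction is applied.
    """
    n_a, n_b = len(a), len(b)
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    r = z / math.sqrt(n_a + n_b)  # effect size r = Z / sqrt(N)
    return u_a, z, r


# Five hypothetical clusters of summed material access (toy values).
clusters = {
    "A": [1, 2, 3],
    "B": [4, 5, 6],
    "C": [7, 8, 9],
    "D": [10, 11, 12],
    "E": [13, 14, 15],
}

pairs = list(combinations(clusters, 2))
alpha_adjusted = 0.05 / len(pairs)  # Bonferroni: 0.05 / 10 = 0.005
print(f"{len(pairs)} comparisons, adjusted alpha = {alpha_adjusted:.3f}")

u, z, r = mann_whitney_effect_size(clusters["A"], clusters["B"])
print(f"A vs B: U = {u}, z = {z:.3f}, r = {r:.3f}")
```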



6 DISCUSSION

From the results of the k-means++ clustering technique, we identified five learner groups according to their course usage patterns: Fully Engaged, Consistent Viewers, One-Week Engaged, Two-Week Engaged, and Sporadic. The results from our analysis are similar to, but somewhat different from, the patterns found by Kizilcec et al. (2013) and Hill (2013b). The Fully Engaged group is similar in usage to Kizilcec et al.’s Completing group. We chose to label the group of learners who participated in most course activities as Fully Engaged rather than Completing for two reasons: 1) the term “completion” has become somewhat controversial in the MOOC literature, and 2) learners in this group not only finished the course, they faithfully engaged in most or all aspects. The Completing group varied by course level, accounting for 27% of the learners enrolled in the high-school-level course, 8% in the undergraduate-level course, and 5% in the graduate-level course. Our Fully Engaged group made up 17% of learners in Nanophotonic Modelling. Of note, we identified groups of learners who fully engaged with the course early on and withdrew activity after one or two weeks. This is an important finding, especially in light of their performance on assessments prior to disengaging from the course.

Next, we examined how learners in differing behaviour groups participated in and performed on the course assessments. Together, these data allowed us a closer look at the varying learner groups within Nanophotonic Modelling and let us develop a characterization of the types of learners present. The major distinction between the groups was their usage of study and assessment materials, with remarkable uniformity within each group. Next, we summarize the characteristics of each learner group.

Fully Engaged learners represented approximately 17% of learners within Nanophotonic Modelling, somewhat higher than the typical completion rate (less than 10%) reported in the literature (Liyanagunawardena et al., 2013). It may be that because Nanophotonic Modelling is a highly technical course offered on nanoHUB.org, the learners attracted to the course are somewhat different from those who register in a MOOC offered through a large platform, such as Coursera or edX. Fully Engaged learners earned completion certificates and consequently are the most similar to traditional classroom learners. They tended to participate in most or all aspects of the course, including the weekly quizzes and exams. Ultimately, as a group, they performed very well, with an average final grade of 90. Unfortunately, many of the techniques commonly used to analyze course behaviour simply do not tell the story of those who wanted to complete all aspects of the course. More about this group of MOOC learners must be understood. These Fully Engaged learners get lost in the means and averages of the majority of learners in MOOCs, yet there is value in understanding their experiences. Within these smaller percentages of Fully Engaged learners, having access to the content may have the biggest impact, such as for learners who intend to apply the information in their work projects or change dissertation topics based on this new understanding. While the majority of learners certainly did not utilize the course in a strict course format, some did and indeed performed quite well.

Consistent Viewers represented approximately 6% of learners within the course. Their usage patterns indicate that they had different goals than the Fully Engaged learners. Consistent Viewers accessed most or all of the study materials, but did not attempt the majority of the quizzes and exams. Interestingly, in the first week, the 15% of the group who took the assessments performed very well on the exam (M = 95.0, SD = 5.0) and all the quizzes (M = 100, SD = 0). Nevertheless, it is unknown why they chose to discontinue taking the assessments. One potential explanation is that they were interested in gaining the knowledge, but were not concerned about earning a certificate or achieving a high grade. Without assessment scores, there is no way to make inferences about the level of knowledge this group achieved. At the surface level, it is apparent that there was something about the learning opportunity that brought them back into the course each week. One potential explanation may be that this group included faculty, students, or working professional engineers who did not have time to devote to the assessments, but wanted to become familiar with the content. Along the same lines, it is unknown how or if the Consistent Viewers assessed their own learning or the depth at which they intended to learn the material. Interestingly, they faithfully followed along, which seems to indicate a high level of motivation and intention despite not using the assessment opportunities for feedback. Their behaviour brings up the undefined role of assessment in MOOCs for those not seeking certificates of completion. Tests and quizzes are often thought of as means to passing a formal class, yet in an open-online environment, assessments can be taken multiple times as a feedback mechanism. Rather than being viewed as a burden, assessments can help identify which areas may need extra attention. The value these learners placed on the learning experience and how they differ from those in the Fully Engaged group are considerations for further research. Additionally, it is unknown whether this group of learners would return to reference their course materials for application or how they benefitted from the learning experience.

Two-Week Engaged learners represented approximately 10% of the learners. They were engaged in the course very actively during the first two weeks. As shown in Figures 4 and 5, their performance in the second exam and the second week’s quizzes was very poor (M = 46.0 and 71.0, SD = 19.5 and 24.7, respectively). Only 8.7% of these learners subsequently attempted the third week’s exam and quizzes. Considering their low scores, one possible explanation for their discontinued engagement is that the course was too difficult for them. It is unknown whether their challenges were related to course characteristics (such as pedagogy) or whether they were not prepared for the content covered (i.e., did not have the necessary prerequisite knowledge to be successful).

One-Week Engaged learners made up approximately 25% of the learners. This is the second largest group among the learners. They attempted the first exam and the first week’s quizzes but performed considerably lower on the exam (M = 72.0, SD = 20.4) than the first three clusters. These learners dropped the course immediately after the first week. They did not even access the study materials, even though they were all available online and could be downloaded for later access. One possible explanation may be that they recognized early on that they did not have the background needed for the course or the course was different from their expectations, so they chose to disengage.

The largest cluster was Sporadic Learners, comprising 42% of the total course learners. Their behaviours would appear random to an outside observer. They did not access any type of course materials regularly at all. A few learners (7% of the cluster who took the first exam) attempted the first exam and the first week’s quizzes without accessing the study materials and thus performed poorly (M = 61.4 and 82.4, SD = 31.8 and 23.9, respectively). The majority of Sporadic Learners were accessing the materials with no defined pattern. Some accessed several resources around one topic, others only accessed assignments without videos, while others took a few exams. One possible reason for this behaviour is that they were interested in specific topics rather than in learning all the material and earning a certificate. Considering that Sporadic Learners make up the largest group of learners, there is a need to better understand what appears as sporadic. It is likely that learners within this group have different interests, motivations, and intentions. It is possible that what appears as sporadic is actually planned or deliberate. There is a clear need to contextualize the clickstream and assessment data with a deeper understanding of the latent groups of learners. In order to make meaningful inferences about differences in behaviour, it is essential that future offerings of the course include pre-course surveys to capture self-reported learner interests and planned behaviour.

7 LIMITATIONS

This study was conducted post hoc on event log and assessment data collected through the Nanophotonic Modelling instructor-led course offering. Learners’ expressed motivations and intentions in the course were not included in this study. As a first step to a bigger research agenda of learner-centred approaches to MOOC research, the current study finds groups of learners based on their patterns of usage and makes meaning of their learning outcomes in the course assessments. Furthermore, it is unknown how similar the pattern of Nanophotonic Modelling learner behaviour is to more general topic MOOCs, or other highly advanced courses that resemble MOOCs, retaining most but not all of their characteristics. In particular, we chose to represent the learner behaviour in five clusters, where a group of learners who engaged with most materials, but not all, in the first two weeks was combined with a group that accessed all materials in the first two weeks. In larger datasets, this group may become more pronounced. The k-means++ algorithm is not necessarily the best algorithm in all circumstances and should be compared with other approaches in future work (such as DBSCAN). Nonetheless, we find that within-cluster error dispersion drops by 63% when using five clusters, as shown in Figure 2a, which suggests that it can be a useful analytical approach. Another limitation of this study is treating the access of videos and assignments as a binary variable rather than capturing additional information associated with multiple views. Although it is possible to capture whether students click on any course element more than once while online, it is beyond the scope of this dataset to capture whether and how often any downloaded material is used by the student while offline. Because of this constraint, multiple-time access is not modelled. Furthermore, while we used k-means++ on the access patterns of videos and assignments, additional data collected may be worth adding to the investigation to see if the existing patterns persist or are modified in any significant fashion.
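To make the binary access coding and the dispersion comparison concrete, the following sketch runs k-means with k-means++ seeding on a toy binary access matrix and reports how much the within-cluster sum of squared distances drops relative to a single cluster. It is a minimal pure-Python illustration under stated assumptions: the data are invented (four items rather than 95), the function names are hypothetical, and within-cluster sum of squares is assumed to be the dispersion measure, which the text does not specify.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest chosen centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        weights = [min(sq_dist(p, c) for c in centres) for p in points]
        if sum(weights) == 0:  # every point already coincides with a centre
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=weights, k=1)[0])
    return centres


def kmeans(points, k, seed=0, max_iter=50):
    """Return (centres, within-cluster sum of squared distances)."""
    rng = random.Random(seed)
    centres = [list(c) for c in kmeanspp_seeds(points, k, rng)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centres[i]))
            clusters[nearest].append(p)
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else centres[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centres:
            break
        centres = updated
    inertia = sum(min(sq_dist(p, c) for c in centres) for p in points)
    return centres, inertia


def relative_drop(baseline, value):
    """Fractional reduction in dispersion relative to the baseline."""
    return 1 - value / baseline


# Toy binary access matrix: 1 = learner accessed the item at least once.
points = [(1, 1, 1, 1)] * 3 + [(1, 1, 0, 0)] * 3 + [(0, 0, 0, 0)] * 3

inertia_by_k = {k: kmeans(points, k)[1] for k in (1, 2, 3)}
for k, inertia in inertia_by_k.items():
    drop = relative_drop(inertia_by_k[1], inertia)
    print(f"k={k}: within-cluster SS = {inertia:.2f}, drop vs k=1 = {drop:.0%}")
```

Because access is coded as 0 or 1, repeated views of the same item leave a learner's vector unchanged, which is the limitation noted above.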



8 CONCLUSIONS

Nanophotonic Modelling is a very technical MOOC designed for learners of a specific background. Even within this highly targeted learner population, our results indicate five distinct groups of learners who utilized the course in different ways to meet their own learning goals and needs. By analyzing the patterns of access and performance for each group to the extent data was available, we were able to obtain key insights into their behaviour. For example, by traditional course standards, achievement of less than 60% of possible points results in a failing grade, and is considered unsuccessful. Our findings provide evidence that there are groups of MOOC learners clearly unconcerned with grade outcomes, but who find enough value in the course materials to return intermittently. While certainly some learners performed poorly on an exam and then withdrew, not all of those who discontinued activity performed poorly. Therefore, it would be a mistake to infer that all who disengage from the course do so because they are unsuccessful in learning the content.

The openness of MOOCs provides some challenge to how assessment can be used. While one group of learners, Fully Engaged, completed all assessments, the majority of learners did not. This may be related to those who sought certificates versus those taking the course more informally. Some learners may not understand the formative role that assessment can play in learning. While Fully Engaged learners could have used the assessments as a way to self-monitor their own learning, learners who did not attempt the quizzes or exams did not seek the opportunity for external feedback on their learning. Instructors may point out the learning value of the assessments. A further step would be to clearly identify assessments as formative (used to enhance or support learning) or summative (used to assess competency). As MOOC developers consider the role of assessment for non-certificate and certificate earners, two paths may be developed: one where assessment is a feedback mechanism and learners can take the same quiz or exam multiple times, and another where additional summative assessments are used to verify the issuance of a certificate. In this way, the integrity of the certificate can be maintained, while still allowing learners to have access to the correct answers to the quizzes and tests.

The similarity of learners within each cluster, both in terms of their course material usage and performance on assessments, supports the view that there are hidden or latent learner traits and characteristics. These distinctive group characteristics and performance clearly demonstrate that simply reporting the overall course means in terms of completion and pass rate fails to account for the diversity of learners within the course. Furthermore, more research is needed to understand the latent variables that account for what is seen in groups of learner behaviour.

In an open-access environment, learners have a wide variety of intentions, and it is inaccurate to assume that all learners are committed to the course or that none of them are. There is simply too much heterogeneity to classify all MOOC learners in terms of their mean usage or performance. Furthermore, learners may not complete all aspects of the course and yet still gain valuable knowledge. Some learners may choose to utilize a MOOC more like a textbook, focusing on certain sections and not on others. For example, educational researchers may have several research methodology textbooks, containing a few chapters that are extremely relevant for a researcher who frequently utilizes the concepts. In the same way, it is possible that some learners come in and out of a MOOC as needed. Before evaluative inferences can be made about the quality of any MOOC, more research is required to understand types of learners and their learning needs better.

While large studies with descriptive statistics based on learner behaviour are abundant in the MOOC literature, it is imperative that researchers first focus on identifying the groups of learners in the course. Lumping all learners together and calculating the overall course assessment scores does not adequately describe all that occurs within a course. With the variety of use patterns and the high proportion of learners who are not fully engaged, there is much noise in any statistical model that attempts to estimate based on the overall course mean.

Additionally, categorizing learners as “Registrants” or “Starters” does not capture the variety of learners within a course. Additional categories based on learner patterns are needed. Others have pointed out that descriptive statistics in MOOC environments are not always helpful (e.g., Perna et al., 2014; Reich, 2014). We have demonstrated the use of clickstream data alongside course assessment to understand more about learners. Furthermore, overall course descriptive statistics are confounded by latent learner variables. However, by identifying types of learner behaviour and examining descriptive statistics within each course, we have a richer understanding of what occurred and how to make future curricular improvements based on that information.

Future research should focus on the value of the learning experience to each group of learners and how courses can be developed to incorporate this information. While descriptive reports about completion rates and access of course materials are one approach to making sense of the MOOC phenomenon, learner-centred, theory-driven approaches are needed to contextualize the meaning of the findings (Wiebe, Thompson, & Behrend, 2015). MOOC learners do not follow a normal distribution curve in terms of their behaviour in the course. However, there is strong evidence from Nanophotonic Modelling that there are groups of learners who are very similar. Once we begin to understand more about these groups of learners, we will have a more contextualized way to make evaluative and research inferences from the large datasets of clickstream behaviour. It may be appropriate to value MOOCs at least in part based on how learners benefit from access to the information, not only whether they follow the course in a traditional sense. There is also potential to break up courses into two or more modules with self-contained assessments so that students could select those most relevant to their interests. From this perspective, the next step in our research is to understand qualitatively how learners in each cluster felt about having this opportunity, and whether they benefited from the information and assessments in a quantifiable fashion.

9 ACKNOWLEDGEMENTS

This work was made possible by NSF Award EEC 1227110 (Network for Computational Nanotechnology Cyberplatform) and EHR 1544259. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

Arthur, D., & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding. Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA ’07), 7–9 January 2007, New Orleans, LA, USA (pp. 1027–1035). Philadelphia, PA: Society for Industrial and Applied Mathematics.

Collins, E. D. (2013). SJSU Plus augmented online learning environment pilot project report. Sacramento, CA: The Research and Planning Group for California Community Colleges.

Christensen, G., Steinmetz, A., Alcorn, B., Bennett, A., Woods, D., & Emanuel, E. J. (2013). The MOOC phenomenon: Who takes massive open online courses and why? http://dx.doi.org/10.2139/ssrn.2350964

DeBoer, J., Ho, A. D., Stump, G. S., & Breslow, L. (2014). Changing “course”: Reconceptualizing educational variables for massive open online courses. Educational Researcher, 43(2), 74–84. http://dx.doi.org/10.3102/0013189X14523038

Douglas, K. A., Mihalec-Adkins, B. P., Hick, N., Diefes-Dux, N., Bermel, P., & Madhavan, K. (2016, June). Learners in advanced nanotechnology MOOCs: Understanding their intention and motivation. Paper presented at the 123rd Annual Conference of the American Society of Engineering Education, 26–29 June 2016, New Orleans, LA, USA.

Harrer, A., Malzahn, N., Zeini, S., & Hoppe, H. (2007). Combining social network analysis with semantic relations. In C. Chinn, G. Erkens, & S. Puntambekar (Eds.), Mice, Minds, and Society: Proceedings of the 8th International Conference on Computer-Supported Collaborative Learning (CSCL 2007) (pp. 267–276). New Brunswick: International Society of the Learning Sciences.

Hecking, T., Ziebarth, S., & Hoppe, H. (2014). Analysis of dynamic resource access patterns in online courses. Journal of Learning Analytics, 1(3), 34–60.

Hill, P. (2013, March 6). Emerging student patterns in MOOCs: A graphical view. e-Literate [Weblog post]. Retrieved from http://mfeldstein.com/emerging_student_patterns_in_moocs_graphical_view

Hill, P. (2013, March 10). Emerging student patterns in MOOCs: A (revised) graphical view. e-Literate [Weblog post]. Retrieved from http://mfeldstein.com/emerging-student-patterns-in-moocs-a-revised-graphical-view/

Ho, A. D., Chuang, I., Reich, J., Coleman, C. A., Whitehill, J., Northcutt, C. G., Williams, J. J., Hansen, J. D., Lopez, G., & Petersen, R. (2015). HarvardX and MITx: Two years of open online courses fall 2012–summer 2014. Social Science Research Network (SSRN). http://dx.doi.org/10.2139/ssrn.2586847

Hollands, F. M., & Tirthali, D. (2014). MOOCs: Expectations and reality. New York: Center for Benefit-Cost Studies of Education, Teachers College, Columbia University.



Jain, A. K. (2010). Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31(8), 651–666. http://dx.doi.org/10.1016/j.patrec.2009.09.011

Jordan, K. (2014). Initial trends in enrolment and completion of massive open online courses. International Review of Research in Open and Distance Learning, 15(1), 133–160. http://dx.doi.org/10.19173/irrodl.v15i1.1651

Kirk, R. E. (2008). Statistics: An Introduction, 5th ed. Belmont, CA: Thompson/Wadsworth.

Kizilcec, R. F., Piech, C., & Schneider, E. (2013, April). Deconstructing disengagement: Analyzing learner subpopulations in massive open online courses. Proceedings of the 3rd International Conference on Learning Analytics and Knowledge (LAK ’13), 170–179. https://doi.org/10.1145/2460296.2460330

Kizilcec, R. F., & Schneider, E. (2015). Motivation as a lens to understand online learners. ACM Transactions on Computer–Human Interaction (TOCHI), 22(2), Article 6, 24 pages. https://doi.org/10.1145/2699735

Koller, D., Ng, A., Do, C., & Chen, Z. (2013). Retention and intention in massive open online courses: In depth. Educause Review, 48(3), 62–63.

Kosfeld, M., Heinrichs, M., Zak, P. J., Fischbacher, U., & Fehr, E. (2005). Oxytocin increases trust in humans. Nature, 435(7042), 673–676. http://dx.doi.org/10.1038/nature03701

Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583–621.

Leckart, S. (2012, March 20). The Stanford education experiment could change higher learning forever. Wired.

Lisboa, P. J., Etchells, T. A., Jarman, I. H., & Chambers, S. J. (2013). Finding reproducible cluster partitions for the k-means algorithm. BMC Bioinformatics, 14(1). http://dx.doi.org/10.1186/1471-2105-14-S1-S8

Liyanagunawardena, T. R., Adams, A. A., & Williams, S. A. (2013). MOOCs: A systematic study of the published literature 2008–2012. International Review of Research in Open and Distance Learning, 14(3), 202–227. http://dx.doi.org/10.19173/irrodl.v14i3.1455

Liyanagunawardena, T. R., Parslow, P., & Williams, S. A. (2014). Dropout: MOOC participants’ perspective. In U. Cress, C. Delgado-Kloos (Eds.), Proceedings of the Second MOOC European Stakeholders Summit (pp. 95–100). Lausanne, Switzerland: P.A.U. Education.

Morabito, V. (2015). Big data and analytics. Strategic and Organisational Impacts. New York: Springer. http://dx.doi.org/10.1007/978-3-319-10665-6

Murphy, R., Gallagher, L., Krumm, A. E., Mislevy, J., & Hafter, A. (2014). Research on the use of Khan Academy in schools: Research brief. Menlo Park, CA: SRI Education. Retrieved from http://www.sri.com/sites/default/files/publications/2014-03-07_implementation_briefing.pdf

Perna, L. W., Ruby, A., Boruch, R. F., Wang, N., Scull, J., Ahmad, S., et al. (2014). Moving through MOOCs: Understanding the progression of learners in massive open online courses. Educational Researcher, 43(9), 421–432. http://dx.doi.org/10.3102/0013189X14562423

Ragan, T. J., & Smith, P. L. (1999). Instructional design. New York: Macmillan Publishing Company.



Reich, J. (2014). MOOC completion and retention in the context of student intent. EDUCAUSE Review Online. Retrieved from http://er.educause.edu/articles/2014/12/mooc-completion-and-retention-in-the-context-of-student-intent

Reich, J. (2015). Rebooting MOOC research. Science, 347(6217), 34–35. http://dx.doi.org/10.1126/science.1261627

Reich, J., Emanuel, J., Nesterko, S. O., Seaton, D. T., Mullaney, T., Waldo, J., et al. (2014). HeroesX: The Ancient Greek Hero: Spring 2013 Course Report. HarvardX–MITx Working Paper Series. http://dx.doi.org/10.2139/ssrn.2382246

Roco, M. C. (2011). The long view of nanotechnology development: The National Nanotechnology Initiative at 10 years. Journal of Nanoparticle Research, 13(2), 427–445. http://dx.doi.org/10.1007/s11051-010-0192-z

Romero, C., Gutiérrez, S., Freire, M., & Ventura, S. (2008). Mining and visualizing visited trails in web-based educational systems. In R. S. J. d. Baker, T. Barnes, & J. E. Beck (Eds.), Educational Data Mining 2008: Proceedings of the 1st International Conference on Educational Data Mining (EDM 2008), 20–21 June 2008, Montreal, QC, Canada (pp. 182–186). International Educational Data Mining Society.

Stark, H., Woods, J. W., Thilaka, B., & Kumar, A. (2012). Probability, statistics, and random processes for engineers (Vol. 76). Upper Saddle River, NJ: Pearson Education.

Tinto, V. (1975). Dropout from higher education: A theoretical synthesis of recent research. Review of Educational Research, 45(1), 89–125.

Thille, C., Schneider, E., Kizilcec, R. F., Piech, C., Halawa, S. A., & Greene, D. K. (2014). The future of data-enriched assessment. Research & Practice in Assessment, 9, 5–16.

Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society B, 63(2), 411–423. http://dx.doi.org/10.1111/1467-9868.00293

Wiebe, E., Thompson, I., & Behrend, T. (2015). MOOCs from the viewpoint of the learner: A response to Perna et al. (2014). Educational Researcher, 44(4), 252–254. http://dx.doi.org/10.3102/0013189X15584774

Wilkowski, J., Deutsch, A., & Russell, D. M. (2014, March). Student skill and goal achievement in the mapping with Google MOOC. Proceedings of the 1st ACM conference on Learning @ Scale (L@S 2014), 3–10. https://dx.doi.org/10.1145/2556325.2566240

Yuan, L., & Powell, S. (2013). MOOCs and open education: Implications for higher education. Glasgow, UK: JISC CETIS. Retrieved from http://publications.cetis.ac.uk/2013/667
