MongoDB and the McGraw-Hill Education Learning Analytics Platform: Towards an Open, Scalable, Streaming Solution for Education MongoDB World 2015

Posted by mongodb on 11-Aug-2015



Outline

• McGraw-Hill Education
• My background
• The MHE Learning Analytics Platform (LAP)
• Standardized educational input events
• MongoDB schema design
• Server infrastructure
• Performance
• Conclusions

McGraw-Hill Education

• Global company with over 5,000 employees
• Now a learning science company
• All content available digitally by Fall 2015
• Higher Ed system is Connect
• K-12 LMS is Engrade
• Adaptive systems: LearnSmart, SmartBook and ALEKS

My Background

• Global and marine seismologist
• Small-college physics professor
• Oracle database administrator
• Head of IT Operations at MIT Sloan School of Management
• Head of the Data Science group in the MHE Digital Platform Group's Analytics team
• Systems engineer on this project

Introduction to LAP

Motivation
• MHE has several digital educational platforms, including Connect for Higher Ed and Engrade for K-12
• Instrument the platforms to send student/educator events in real time to a central system (LAP)
• Ingest and store education events in a data store (MongoDB)
• Analytics provide “insights” to students/educators

Introduction to the LAP

Demo: Connect Insight for Students (CIS)

IMS Caliper Format for Education

• Standardized education events (Caliper)
• Utilizes the JSON-LD (linked data) format
• Caliper uses an Actor - Verb - Object tuple to form learning events (e.g., student - submit - test)
• Events are triggered by student/educator activity and sent to the LAP input API
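To make the Actor - Verb - Object structure concrete, here is a minimal Python sketch that pulls the triple out of a Caliper event. It reads only keys that appear in the MHE example events (`actor`, `verb`, `object` and their `@id`/`@type`); the `LearningEvent` type and `to_triple` helper are hypothetical names, and the JSON-LD `@context` is not expanded.

```python
import json
from typing import NamedTuple


class LearningEvent(NamedTuple):
    """Actor - Verb - Object view of one Caliper event (hypothetical type)."""
    actor_id: str
    actor_type: str
    verb: str
    object_id: str
    object_type: str


def to_triple(raw: str) -> LearningEvent:
    """Extract the actor, verb and object from a Caliper JSON-LD event string."""
    event = json.loads(raw)
    actor, obj = event["actor"], event["object"]
    return LearningEvent(actor["@id"], actor["@type"], event["verb"],
                         obj["@id"], obj["@type"])


sample = """{
  "@type": "AssessmentEvent",
  "verb": "submitted",
  "actor": {"@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==", "@type": "student"},
  "object": {"@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==", "@type": "assessment"}
}"""

triple = to_triple(sample)
print(triple.actor_type, triple.verb, triple.object_type)  # student submitted assessment
```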

LAP Architectural Design

[Diagram, built up over six slides. Components: Connect, Engrade and other Caliper events; LAP Ingestion API; Collection Receiver; SQS; Long-term Storage; Collection Worker; MongoDB Data Store; Results/Analysis; Output API; Insight Output API; Connect Insight for Students; Engrade Insight for Teachers; Future Insights.]
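One way to read the diagram is as an ingestion API that archives each raw event and hands it to SQS, with a collection worker draining the queue into the data store. The sketch below assumes that flow and uses in-memory stand-ins: `queue.Queue` for SQS, a list for long-term storage, and a dict keyed by the Caliper `@id` for MongoDB. The function names are hypothetical, not the LAP's actual API.

```python
import json
import queue

# In-memory stand-ins (assumptions, not the production components):
sqs = queue.Queue()      # plays the role of the SQS queue
long_term_storage = []   # raw payloads kept as received
data_store = {}          # plays MongoDB; keyed by the Caliper "@id"


def ingestion_api(raw: str) -> bool:
    """Accept one Caliper event: archive the raw payload and enqueue it."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return False  # reject malformed input
    if not isinstance(event, dict) or "@id" not in event:
        return False
    long_term_storage.append(raw)
    sqs.put(event)
    return True


def collection_worker() -> int:
    """Drain whatever is queued into the data store; return how many were processed."""
    processed = 0
    while True:
        try:
            event = sqs.get_nowait()
        except queue.Empty:
            return processed
        data_store[event["@id"]] = event  # same @id overwrites, so re-delivery is harmless
        processed += 1


accepted = [ingestion_api(r) for r in ('{"@id": "e1", "verb": "started"}',
                                       '{"@id": "e1", "verb": "started"}',
                                       'not json')]
print(accepted, collection_worker(), len(data_store))  # [True, True, False] 2 1
```

Putting a queue between the API and the worker lets the API keep accepting events when writes fall behind, which matches the stated reason for adding SQS (handling periods of very high load).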

Why MongoDB?

• JSON-LD input suggested a document store
• MongoDB is accessible and well documented
• Provided the needed performance and capacity
• Support from MongoDB Inc. (formerly 10gen)
  • Six-month development support contract
  • Dedicated consultants
  • Ongoing support contract

Data Flow Through the LAP

• Standardized education events (Caliper)
• Caliper (JSON-LD) events are produced by triggers in the Connect Oracle database, fired by student/educator activity, and sent to the LAP input API
• The LAP then verifies the input, transforms it into the MongoDB schema, calculates aggregates, and sends the results to data visualizations
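The slides do not describe the verification step in detail. As an illustrative sketch, a validator might check that the payload parses as JSON and carries the keys seen in the MHE example events, plus extra keys per verb. The key sets below are assumptions drawn from those examples, not from the Caliper specification.

```python
import json

# Assumed minimum keys, taken from the MHE example events rather than the Caliper spec.
COMMON_KEYS = {"@context", "@type", "@id", "verb", "actor", "object", "startedAtTime"}
VERB_KEYS = {
    "created": set(),
    "started": {"attemptCount"},
    "submitted": {"attemptCount", "timeTaken"},
    "graded": {"attemptCount", "outcome"},
}


def verify(raw: str) -> list:
    """Return a list of problems with a raw event; an empty list means it passed."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(event, dict):
        return ["event is not a JSON object"]
    problems = [f"missing {k}" for k in sorted(COMMON_KEYS.difference(event))]
    verb = event.get("verb")
    if not isinstance(verb, str) or verb not in VERB_KEYS:
        problems.append(f"unknown verb: {verb!r}")
    else:
        problems += [f"missing {k}" for k in sorted(VERB_KEYS[verb].difference(event))]
    return problems


# A bare ".85" is not a valid JSON number, so this payload is rejected at parse time.
print(verify('{"verb": "graded", "outcome": .85}'))
```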

Data Flow Through the LAP

Standardized education events (Caliper examples):
1. Assessment Created
2. Assessment Attempt Started
3. Assessment Attempt Submitted
4. Assessment Attempt Graded

An assessment is an online homework assignment, quiz or test associated with a McGraw-Hill digital textbook.

Assessment Created Caliper Event

{ "@context" : "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent", "@type" : "AssessmentEvent", "@id" : "mhe-caliper:connect-000/eventId/ea7db7cf-2ed9-43a3-b9d4-1472265157c5", "generatedAtTime" : "2012-11-01T08:00:01", "verb" : "created", "actor" : { "@id" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc3Nw==", "@type" : "instructor" }, "startedAtTime" : "2012-11-01T08:00:00", "object" : { "@id" : "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==", "@type": "assessment", "category" : "homework", "origin" : "ASSESSMENT", "topics" : ["addition", "subtraction"], "maxSubmissionsAllowed" : 3, "maxTimeAllowed" : 10800, "maxOutcomePossible" : 100.0, "startDate" : "2012-11-01T04:00:00", "dueDate" : "2012-11-05T08:00:00", "assessmentName" : "Sample Assignment Zero", "noFeedback" : false, "ALEDisplayName" : "Critical Missions", "attemptDeductions" : false, "lateSubmissionDeduction" : true, "studyAttempts" : true, "forceSubmission" : true }, "learningContext" : { "enrolledIn" : "section", "course" : "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==", "section" : "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==", "courseName" : "Intro to Algebra", "sectionName" : "MWF 10-11", "timeZone" : "America/New_York" }}

Assessment Attempt Started Caliper Event

{ "@context" : "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent", "@type" : "AssessmentEvent", "@id" : "mhe-caliper:connect-000/eventId/87231361-6c9c-4ef6-8ea3-49e39e78eb4d", "generatedAtTime" : "2012-11-01T08:00:01", "verb" : "started", "actor" : { "@id" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==", "@type" : "student" }, "startedAtTime" : "2012-11-04T11:00:00", "object" : { "@id" : "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==", "@type": "assessment" }, "learningContext" : { "enrolledIn" : "section", "course" : "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==", "section" : "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==", "courseName" : "Intro to Algebra", "sectionName" : "MWF 10-11", "timeZone" : "America/New_York" }, "attemptCount" : 1}

Assessment Attempt Submitted Caliper Event

{ "@context" : "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent", "@type" : "AssessmentEvent", "@id" : "mhe-caliper:connect-000/eventId/b1db77ea-44a7-4f99-a819-e8b7e142f457", "generatedAtTime" : "2012-11-01T08:00:01", "verb" : "submitted", "actor" : { "@id" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==", "@type" : "student" }, "startedAtTime" : "2012-11-04T11:45:01", "object" : { "@id" : "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==", "@type" : "assessment", "maxSubmissionsAllowed" : 3, "maxTimeAllowed" : 10800, "dueDate" : "2012-11-08T12:00:00" }, "learningContext" : { "enrolledIn" : "section", "course" : "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==", "section" : "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==", "courseName" : "Intro to Algebra", "sectionName" : "MWF 10-11", "timeZone" : "America/New_York" }, "attemptCount" : 1, "timeTaken" : 1800}

Assessment Attempt Graded Caliper Event

{ "@context" : "http://mheducation.com/mhe-caliper/v1/OutcomeEvent", "@type" : "OutcomeEvent", "@id" : "mhe-caliper:connect-000/eventId/d7c28248-8e14-495e-9a5c-bb1cc1e0882d", "generatedAtTime" : "2012-11-01T08:00:01", "verb" : "graded", "actor" : { "@id" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc3Nw==", "@type" : "instructor" }, "startedAtTime" : "2012-11-04T19:00:00", "object" : { "@id" : "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==", "@type": "assessment", "student" : "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==" }, "learningContext" : { "enrolledIn" : "section", "course" : "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==", "section" : "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==", "courseName" : "Intro to Algebra", "sectionName" : "MWF 10-11", "timeZone" : "America/New_York" }, "attemptCount" : 1, "outcome" : 0.85, "percentDeducted" : 5}
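The slides say the LAP "calculates aggregates" but do not list them. As a hypothetical example built on the graded event's fields (`object.@id`, `object.student`, `attemptCount`, `outcome`), the sketch below averages outcomes per assessment, counting only the highest-numbered attempt per student so a re-delivered grade is not double-counted. The choice of aggregate and the helper names are assumptions.

```python
from collections import defaultdict
from statistics import mean


def average_outcome_by_assessment(events):
    """Average graded outcomes per assessment @id (hypothetical aggregate).

    Only the highest-numbered attempt per (assessment, student) counts, so a
    re-delivered "graded" event for the same attempt is not double-counted.
    """
    latest = {}
    for ev in events:
        if ev.get("verb") != "graded":
            continue
        key = (ev["object"]["@id"], ev["object"]["student"])
        if key not in latest or ev["attemptCount"] >= latest[key]["attemptCount"]:
            latest[key] = ev
    scores = defaultdict(list)
    for (assessment_id, _student), ev in latest.items():
        scores[assessment_id].append(ev["outcome"])
    return {assessment_id: mean(vals) for assessment_id, vals in scores.items()}


def graded(student, attempt, outcome):
    # Minimal graded event carrying only the fields the aggregate reads.
    return {"verb": "graded", "attemptCount": attempt, "outcome": outcome,
            "object": {"@id": "a1", "student": student}}


averages = average_outcome_by_assessment(
    [graded("s1", 1, 0.85), graded("s1", 1, 0.85), graded("s2", 1, 0.65)])
print(averages)
```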

Data Flow Through the LAP

Constraints on developing the schema:
• Several learning activities require multiple Caliper events
  • Example: a student starts, submits, and is graded to complete a quiz
• No guarantee that external applications will send events in chronological order
• May receive duplicate events
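The two constraints can be illustrated with a small batch example: drop duplicate deliveries by their Caliper `@id`, then sort by `startedAtTime`. The timestamps come from the example events; the `normalize` helper and the shortened event dicts are hypothetical.

```python
from datetime import datetime


def normalize(events):
    """Drop duplicate deliveries (same @id) and restore chronological order."""
    unique = {}
    for ev in events:
        unique.setdefault(ev["@id"], ev)  # keep the first copy of each event
    return sorted(unique.values(),
                  key=lambda ev: datetime.fromisoformat(ev["startedAtTime"]))


# Arrival order as it might happen: graded first, submitted twice, started last.
arrived = [
    {"@id": "e3", "verb": "graded", "startedAtTime": "2012-11-04T19:00:00"},
    {"@id": "e2", "verb": "submitted", "startedAtTime": "2012-11-04T11:45:01"},
    {"@id": "e2", "verb": "submitted", "startedAtTime": "2012-11-04T11:45:01"},
    {"@id": "e1", "verb": "started", "startedAtTime": "2012-11-04T11:00:00"},
]
print([ev["verb"] for ev in normalize(arrived)])  # ['started', 'submitted', 'graded']
```

This only works on a complete batch. A streaming consumer cannot know whether an earlier event is still in flight, which is why the schema itself has to tolerate arbitrary arrival order.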

MongoDB Schema Version 0.1

• V0.1: two-schema model (student and class)
• Class collection describes the class, section and assignments
• Student collection
  • Assessment array updated when an attempt is complete
  • All events for an activity
  • Attempts for each activity in a sub-array

MongoDB Schema

V0.1 Problems
• Too embedded
• Difficult to update a student doc
• Query-logic-update
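To see why a V0.1-style shape leads to a query-logic-update cycle, here is a hypothetical student document with an assessments array, a nested attempts sub-array, and the application code needed to add one event. The field names (`assessments`, `assessment_id`, `attempts`, `events`) are assumptions; the slides do not show the actual V0.1 documents.

```python
# Hypothetical V0.1-style student document: an assessments array whose
# elements hold an attempts sub-array, whose elements hold the raw events.
student_doc = {
    "_id": "class42:student7",
    "assessments": [
        {"assessment_id": "a1",
         "attempts": [{"attempt": 1, "events": [{"verb": "started"}]}]},
    ],
}


def add_event_v01(doc, assessment_id, attempt_no, event):
    """Application-side update: search two levels of arrays, then append."""
    for assessment in doc["assessments"]:
        if assessment["assessment_id"] == assessment_id:
            break
    else:  # no matching assessment yet
        assessment = {"assessment_id": assessment_id, "attempts": []}
        doc["assessments"].append(assessment)
    for attempt in assessment["attempts"]:
        if attempt["attempt"] == attempt_no:
            break
    else:  # no matching attempt yet
        attempt = {"attempt": attempt_no, "events": []}
        assessment["attempts"].append(attempt)
    attempt["events"].append(event)


# Query (read the doc), logic (this function), update (write the doc back).
add_event_v01(student_doc, "a1", 1, {"verb": "submitted"})
add_event_v01(student_doc, "a2", 1, {"verb": "started"})
print(len(student_doc["assessments"]),
      len(student_doc["assessments"][0]["attempts"][0]["events"]))  # 2 2
```

Against a real collection this means reading the document, running the logic in the application, and writing the result back. If two workers do that concurrently for the same student, one write can overwrite the other unless extra concurrency control is added.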

MongoDB Schema Version 0.2

• Remove nested arrays
• Move attempts doc up to the top level

MongoDB Schema Version 0.2

V0.2 Problems
• Still have query-logic-update
• Difficult to do atomically and maintain deterministic state

MongoDB Schema Version 0.3

• Remove arrays altogether
• Replace arrays with assessment and attempt docs, each of which contains several sub-docs
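A minimal sketch of what replacing arrays with keyed sub-documents allows: one event maps to a single `$set` on a dotted path, computed without reading the document first. The path layout (`assessments.<id>.attempts.<n>.<verb>`), storing the event time as the value, and the ids `class42`/`student7` are assumptions for illustration. `update_one(filter, update, upsert=True)` is the standard pymongo call, shown only in a comment so the sketch runs without a server.

```python
def v03_update(class_id, student_id, event):
    """Build (filter, update) for one Caliper event in a V0.3-style layout.

    Assessments and attempts are sub-documents keyed by id rather than array
    elements, so a single $set on a dotted path reaches the right slot.
    """
    assessment_key = event["object"]["@id"].split("/assessment/", 1)[1]
    attempt_key = str(event.get("attemptCount", 0))  # "created" events carry no attempt
    path = f"assessments.{assessment_key}.attempts.{attempt_key}.{event['verb']}"
    return ({"_id": f"{class_id}:{student_id}"},
            {"$set": {path: event["startedAtTime"]}})


event = {"verb": "submitted", "attemptCount": 1, "startedAtTime": "2012-11-04T11:45:01",
         "object": {"@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg=="}}
filt, update = v03_update("class42", "student7", event)
print(filt)
print(update)
# With pymongo, this pair could be applied as:
#     students.update_one(filt, update, upsert=True)
```

Because a re-delivered event writes the same value to the same field, duplicates are harmless; because each verb has its own field, the arrival order of started/submitted/graded does not matter. MongoDB applies a single-document update atomically, so no read-modify-write cycle is needed.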

MongoDB Schema Version 0.3

V0.3
• Atomic updates now much easier
• Save raw Caliper event in event collection
• Only update student collection if all required events are in event collection
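The "save raw event, update only when complete" rule can be sketched with two dicts standing in for the event and student collections. Here the scan over `event_collection` stands in for what would be an indexed query in MongoDB; the required verbs (started, submitted, graded) follow the quiz example, and the helper names and `assessment#attempt` key format are hypothetical.

```python
REQUIRED_VERBS = {"started", "submitted", "graded"}

event_collection = {}    # stand-in for the raw event collection, keyed by "@id"
student_collection = {}  # stand-in for the student collection


def attempt_key(event):
    """(assessment, student, attempt); graded events name the student in the object."""
    obj = event["object"]
    student = obj.get("student", event["actor"]["@id"])
    return obj["@id"], student, event.get("attemptCount")


def handle(event):
    """Save the raw event; update the student doc only once the attempt is complete."""
    event_collection[event["@id"]] = event  # a duplicate just rewrites the same entry
    key = attempt_key(event)
    by_verb = {e["verb"]: e for e in event_collection.values() if attempt_key(e) == key}
    if not REQUIRED_VERBS.issubset(by_verb):
        return False  # wait for the remaining events
    assessment, student, attempt = key
    doc = student_collection.setdefault(student, {})
    doc[f"{assessment}#{attempt}"] = by_verb["graded"]["outcome"]
    return True


started = {"@id": "e1", "verb": "started", "attemptCount": 1,
           "actor": {"@id": "s1"}, "object": {"@id": "a1"}}
submitted = {"@id": "e2", "verb": "submitted", "attemptCount": 1,
             "actor": {"@id": "s1"}, "object": {"@id": "a1"}}
graded = {"@id": "e3", "verb": "graded", "attemptCount": 1, "outcome": 0.85,
          "actor": {"@id": "i1"}, "object": {"@id": "a1", "student": "s1"}}

results = [handle(e) for e in (graded, submitted, submitted, started)]
print(results, student_collection)  # [False, False, False, True] {'s1': {'a1#1': 0.85}}
```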

Query Utilization

• 3 basic queries to build the visualization for CIS:
  • All student docs for the current class
  • All student docs for the current student
  • Class doc for the current class
• All queries are on indexed parameters:
  • Student doc _id = class_id:student_id
  • Class doc _id = class_id
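Because the student `_id` starts with the class id, "all student docs for the current class" can be expressed as an anchored, case-sensitive prefix match on `_id`, which MongoDB can serve from the `_id` index. The slides do not show how the queries were written, so these helpers are a hypothetical sketch; the "current student" query is left out because its indexing is not described.

```python
import re


def student_doc_id(class_id: str, student_id: str) -> str:
    """Student doc _id as described on the slide: class_id:student_id."""
    return f"{class_id}:{student_id}"


def class_doc_filter(class_id: str) -> dict:
    """Class doc for the current class: exact match on _id."""
    return {"_id": class_id}


def class_students_filter(class_id: str) -> dict:
    """All student docs for a class: anchored prefix match on _id."""
    return {"_id": {"$regex": "^" + re.escape(class_id + ":")}}


pattern = class_students_filter("class4")["_id"]["$regex"]
for doc_id in (student_doc_id("class4", "s1"), student_doc_id("class42", "s1")):
    # The ":" delimiter keeps "class4" from matching documents of "class42".
    print(doc_id, bool(re.match(pattern, doc_id)))
```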

Infrastructure

• All servers and storage are in AWS
• Backups done using EBS snapshots
• DB size estimated to grow about 500 GB/year
• Data size estimate small enough for an unsharded cluster
• 3-member replica sets
• Write to primary, read from primary and secondaries
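For the read/write split on a 3-member replica set, a client typically lists all members and sets a read preference. The host names, the replica set name `lap-rs0`, and the choice of `secondaryPreferred` (the slides say only "read from primary and secondaries") are assumptions; `replicaSet` and `readPreference` are standard connection-string options, and the resulting URI can be passed to a driver such as pymongo's `MongoClient`.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string for a replica set.

    Writes always go to the primary; readPreference only controls where reads go.
    """
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


uri = replica_set_uri(["db1.example.internal:27017", "db2.example.internal:27017",
                       "db3.example.internal:27017"], "lap-rs0")
print(uri)
```

Reads from secondaries can return slightly stale data because replication is asynchronous, which is usually acceptable for dashboards but worth keeping in mind.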

Performance

• Estimated peak load 100 events/sec = 100 kB/sec
• Average load of 1,500,000 events/day
• Max of 2,500,000 events/day
• Initially planned on a sharded, replicated cluster, but for now do not need this
• Added SQS queue to handle periods of very high load
• Upgraded from MongoDB 2.6 to 3.0 (~10x faster)
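The load figures can be checked with a little arithmetic. The sketch below uses only numbers from the slides; the roughly 1 kB per event follows from 100 events/sec = 100 kB/sec and is an assumption when applied to every event.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

peak_events_per_s = 100
peak_kb_per_s = 100
kb_per_event = peak_kb_per_s / peak_events_per_s  # ~1 kB per event, implied by the slide

avg_events_per_s = 1_500_000 / SECONDS_PER_DAY      # average day
max_day_events_per_s = 2_500_000 / SECONDS_PER_DAY  # busiest day, spread evenly
peak_to_average = peak_events_per_s / avg_events_per_s

# Raw event volume per year at the average rate, in decimal GB (assumes ~1 kB/event).
gb_per_year = 1_500_000 * kb_per_event * 365 / 1_000_000

print(f"average {avg_events_per_s:.1f} events/s, busiest day {max_day_events_per_s:.1f} events/s")
print(f"peak is {peak_to_average:.1f}x the average; ~{gb_per_year:.0f} GB of raw events per year")
```

Sustained rates of roughly 17 to 29 events/s, with short peaks near 100/s, fit the choice of a single unsharded replica set plus a queue to absorb bursts. The ~548 GB/year of raw events is roughly in line with the ~500 GB/year estimate, though actual storage also depends on indexes, derived documents, and compression.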

Conclusions

• We have a learning analytics platform in production utilizing a MongoDB data store
• After several iterations we developed a MongoDB schema which:
  • Handles data coming in arbitrary order with duplicates
  • Performs one-step, atomic inserts
  • Has high performance during peak loads