MongoDB & the McGraw-Hill Education Learning Analytics Platform

Post on 11-Aug-2015


MongoDB and the McGraw-Hill Education Learning Analytics Platform: Towards an Open, Scalable, Streaming Solution for Education

MongoDB World 2015

Outline

• McGraw-Hill Education
• My background
• The MHE Learning Analytics Platform (LAP)
• Standardized educational input events
• MongoDB schema design
• Server infrastructure
• Performance
• Conclusions

McGraw-Hill Education

• Global company with over 5,000 employees
• Now a Learning Science Company
• All content available digitally by Fall 2015
• Higher Ed system is Connect
• K-12 LMS is Engrade
• Adaptive systems: LearnSmart, SmartBook and ALEKS

My Background

• Global and Marine Seismologist
• Small College Physics Professor
• Oracle Database Administrator
• Head of IT Operations at MIT Sloan School of Management
• Head of MHE Digital Platform Group’s Analytics team’s Data Science group
• Systems Engineer on this project

Introduction to the LAP

Motivation
• MHE has several digital educational platforms, including Connect for Higher Ed and Engrade for K-12
• Instrument platforms to send student/educator events in real time to a central system (LAP)
• Ingest and store education events in a data store (MongoDB)
• Analytics provides “insights” to students/educators

Introduction to the LAP

Demo: Connect Insight for Students (CIS)

IMS Caliper Format for Education

Standardized education events (Caliper)
• Utilizes JSON-LD (linked data) format
• Caliper uses an Actor - Verb - Object tuple to form learning events (ex: student – submit – test)
• Triggered from student/educator activity and sent to LAP input API
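As a rough illustration of the Actor - Verb - Object idea, the Python sketch below pulls that tuple out of an event shaped like the Caliper examples in this deck. The function name `to_triple` and the trimmed sample event are hypothetical and are not part of Caliper or the LAP.

```python
import json

# Trimmed, hypothetical event that reuses field names from the Caliper examples in this deck.
SAMPLE_EVENT = """
{
  "@type": "AssessmentEvent",
  "verb": "submitted",
  "actor": {"@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==", "@type": "student"},
  "object": {"@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==", "@type": "assessment"}
}
"""


def to_triple(event: dict) -> tuple:
    """Return (actor type, verb, object type) for one event."""
    return (event["actor"]["@type"], event["verb"], event["object"]["@type"])


event = json.loads(SAMPLE_EVENT)
triple = to_triple(event)
print(" - ".join(triple))
```

For the sample above the tuple reads student, submitted, assessment, which mirrors the "student – submit – test" example on the slide.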

LAP Architectural Design

[Architecture diagram, built up over six slides. Components shown, in order of appearance:
• Inputs: Connect Caliper Event, Engrade Caliper Event, Other Caliper Events
• LAP: Ingestion API, Collection Receiver
• Long-term Storage, SQS
• Collection Worker, MongoDB Data Store
• Results/Analysis, Output API
• Insight: Connect Insight for Students, Engrade Insight for Teachers, Future Insights]
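The diagram places a queue (SQS) between the ingestion side and the worker that writes to the data store. A minimal Python sketch of that buffering pattern follows. It assumes an in-memory `queue.Queue` as a stand-in for SQS and a plain list as a stand-in for MongoDB; the function names `receive` and `drain` are hypothetical.

```python
import queue

# In-memory stand-ins: the real platform uses AWS SQS and MongoDB, not these objects.
event_queue = queue.Queue()
data_store = []


def receive(event: dict) -> None:
    """Ingestion side: accept the event quickly and leave the work for later."""
    event_queue.put(event)


def drain() -> int:
    """Worker side: process everything currently queued and return how many were handled."""
    handled = 0
    while True:
        try:
            event = event_queue.get_nowait()
        except queue.Empty:
            return handled
        data_store.append(event)
        handled += 1


for i in range(3):
    receive({"@id": f"event-{i}"})
handled = drain()
```

The point of the pattern is that a burst of incoming events only grows the queue; the worker can catch up at its own pace.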

Why MongoDB?

• JSON-LD input suggested a document store
• MongoDB accessible and well documented
• Provided needed performance and capacity
• Support from MongoDB Inc. (10Gen)
  • Six Month Development Support contract
  • Dedicated consultants
  • Ongoing support contract

Data Flow Through the LAP

Standardized education events (Caliper)
• Caliper (JSON-LD) produced by triggers in the Connect Oracle database
• Triggered from student/educator activity and sent to LAP input API
• LAP then verifies input, transforms into MongoDB schema, calculates aggregates, and sends to data visualizations
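The verify and transform steps might look roughly like the Python sketch below. The list of required fields and the shape of the transformed record are assumptions made for illustration; the slides do not give the LAP's actual validation rules or target schema.

```python
# Assumed set of required fields, not the LAP's actual rule set.
REQUIRED_FIELDS = ("@id", "@type", "verb", "actor", "object")


def verify(event: dict) -> list:
    """Return the names of missing required fields (an empty list means the event passes)."""
    return [name for name in REQUIRED_FIELDS if name not in event]


def transform(event: dict) -> dict:
    """Flatten the parts of the event a downstream document might need (hypothetical shape)."""
    return {
        "event_id": event["@id"],
        "verb": event["verb"],
        "actor_id": event["actor"]["@id"],
        "object_id": event["object"]["@id"],
    }


good = {
    "@id": "mhe-caliper:connect-000/eventId/87231361-6c9c-4ef6-8ea3-49e39e78eb4d",
    "@type": "AssessmentEvent",
    "verb": "started",
    "actor": {"@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ=="},
    "object": {"@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg=="},
}
record = transform(good) if not verify(good) else None
```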

Data Flow Through the LAP

Standardized education events (Caliper examples)

1. Assessment Created
2. Assessment Attempt Started
3. Assessment Attempt Submitted
4. Assessment Attempt Graded

An assessment is an on-line homework assignment, quiz or test associated with a McGraw-Hill digital textbook.

Assessment Created Caliper Event

{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
  "@type": "AssessmentEvent",
  "@id": "mhe-caliper:connect-000/eventId/ea7db7cf-2ed9-43a3-b9d4-1472265157c5",
  "generatedAtTime": "2012-11-01T08:00:01",
  "verb": "created",
  "actor": {
    "@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc3Nw==",
    "@type": "instructor"
  },
  "startedAtTime": "2012-11-01T08:00:00",
  "object": {
    "@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type": "assessment",
    "category": "homework",
    "origin": "ASSESSMENT",
    "topics": ["addition", "subtraction"],
    "maxSubmissionsAllowed": 3,
    "maxTimeAllowed": 10800,
    "maxOutcomePossible": 100.0,
    "startDate": "2012-11-01T04:00:00",
    "dueDate": "2012-11-05T08:00:00",
    "assessmentName": "Sample Assignment Zero",
    "noFeedback": false,
    "ALEDisplayName": "Critical Missions",
    "attemptDeductions": false,
    "lateSubmissionDeduction": true,
    "studyAttempts": true,
    "forceSubmission": true
  },
  "learningContext": {
    "enrolledIn": "section",
    "course": "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section": "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName": "Intro to Algebra",
    "sectionName": "MWF 10-11",
    "timeZone": "America/New_York"
  }
}

Assessment Attempt Started Caliper Event

{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
  "@type": "AssessmentEvent",
  "@id": "mhe-caliper:connect-000/eventId/87231361-6c9c-4ef6-8ea3-49e39e78eb4d",
  "generatedAtTime": "2012-11-01T08:00:01",
  "verb": "started",
  "actor": {
    "@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==",
    "@type": "student"
  },
  "startedAtTime": "2012-11-04T11:00:00",
  "object": {
    "@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type": "assessment"
  },
  "learningContext": {
    "enrolledIn": "section",
    "course": "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section": "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName": "Intro to Algebra",
    "sectionName": "MWF 10-11",
    "timeZone": "America/New_York"
  },
  "attemptCount": 1
}

Assessment Attempt Submitted Caliper Event

{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
  "@type": "AssessmentEvent",
  "@id": "mhe-caliper:connect-000/eventId/b1db77ea-44a7-4f99-a819-e8b7e142f457",
  "generatedAtTime": "2012-11-01T08:00:01",
  "verb": "submitted",
  "actor": {
    "@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==",
    "@type": "student"
  },
  "startedAtTime": "2012-11-04T11:45:01",
  "object": {
    "@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type": "assessment",
    "maxSubmissionsAllowed": 3,
    "maxTimeAllowed": 10800,
    "dueDate": "2012-11-08T12:00:00"
  },
  "learningContext": {
    "enrolledIn": "section",
    "course": "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section": "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName": "Intro to Algebra",
    "sectionName": "MWF 10-11",
    "timeZone": "America/New_York"
  },
  "attemptCount": 1,
  "timeTaken": 1800
}

Assessment Attempt Graded Caliper Event

{
  "@context": "http://mheducation.com/mhe-caliper/v1/OutcomeEvent",
  "@type": "OutcomeEvent",
  "@id": "mhe-caliper:connect-000/eventId/d7c28248-8e14-495e-9a5c-bb1cc1e0882d",
  "generatedAtTime": "2012-11-01T08:00:01",
  "verb": "graded",
  "actor": {
    "@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc3Nw==",
    "@type": "instructor"
  },
  "startedAtTime": "2012-11-04T19:00:00",
  "object": {
    "@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type": "assessment",
    "student": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ=="
  },
  "learningContext": {
    "enrolledIn": "section",
    "course": "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section": "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName": "Intro to Algebra",
    "sectionName": "MWF 10-11",
    "timeZone": "America/New_York"
  },
  "attemptCount": 1,
  "outcome": 0.85,
  "percentDeducted": 5
}

Data Flow Through the LAP

Constraints on developing schema
• Several learning activities require multiple Caliper events
  • Example: student starts, submits, and is graded to complete a quiz
• No guarantee that external applications will send events in chronological order
• May receive duplicate events
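One way to cope with duplicates and out-of-order arrival is sketched below in Python: keep one copy per event `@id`, then order by a timestamp carried in the event. Which timestamp the LAP actually relies on is not stated in the slides; `startedAtTime` is used here as an assumption, and the helper name `dedupe_and_order` and the short event ids are hypothetical.

```python
from datetime import datetime


def dedupe_and_order(events):
    """Drop events whose @id was already seen, then sort by startedAtTime."""
    seen = {}
    for event in events:
        seen.setdefault(event["@id"], event)  # the first copy of each @id wins
    return sorted(seen.values(), key=lambda e: datetime.fromisoformat(e["startedAtTime"]))


# Events arrive out of order, and "e1" arrives twice.
incoming = [
    {"@id": "e3", "verb": "graded", "startedAtTime": "2012-11-04T19:00:00"},
    {"@id": "e1", "verb": "started", "startedAtTime": "2012-11-04T11:00:00"},
    {"@id": "e2", "verb": "submitted", "startedAtTime": "2012-11-04T11:45:01"},
    {"@id": "e1", "verb": "started", "startedAtTime": "2012-11-04T11:00:00"},
]
ordered = dedupe_and_order(incoming)
verbs = [e["verb"] for e in ordered]
```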

MongoDB Schema – Version 0.1

V0.1
• Two-collection schema model (student and class)
• Class Collection describes the class, section and assignments
• Student Collection
  • Assessment array updated when attempt is complete
  • All events for an activity
  • Attempts for each activity in a sub-array

MongoDB Schema

V0.1 Problems
• Too embedded
• Difficult to update a student doc
• Query-logic-update

MongoDB Schema – Version 0.2

• Remove nested arrays
• Move attempts doc up to the top level

MongoDB Schema – Version 0.2

V0.2 Problems
• Still have query-logic-update
• Difficult to do atomically and maintain deterministic state

MongoDB Schema – Version 0.3

• Remove arrays altogether
• Replace arrays with assessment and attempt docs, each of which contains several sub-docs

MongoDB Schema – Version 0.3

V0.3
• Atomic updates now much easier
• Save raw Caliper event in event collection
• Only update student collection if all required events are in event collection
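The V0.3 rule (store every raw event, touch the student collection only when all required events are present) can be sketched in Python with plain dictionaries standing in for the two collections. The flat event shape, the `REQUIRED_VERBS` set (taken from the quiz example of starts, submits, graded) and the function name `ingest` are assumptions for illustration, not the LAP's actual code or schema.

```python
# Taken from the quiz example in the slides; assumed to be the complete set.
REQUIRED_VERBS = {"started", "submitted", "graded"}

events = {}    # stand-in for the event collection, keyed by event @id
students = {}  # stand-in for the student collection


def ingest(event: dict) -> bool:
    """Store the raw event; write the student doc only once every required verb is present.

    Returns True when the student doc was written.
    """
    events[event["@id"]] = event  # a duplicate overwrites the same key, so nothing is added
    key = (event["student"], event["assessment"], event["attempt"])
    seen = {
        e["verb"]
        for e in events.values()
        if (e["student"], e["assessment"], e["attempt"]) == key
    }
    if not REQUIRED_VERBS <= seen:
        return False
    students[key] = {"verbs": sorted(seen)}  # same result whatever the arrival order
    return True


base = {"student": "s1", "assessment": "a1", "attempt": 1}
arrivals = [
    {"@id": "g1", "verb": "graded", **base},
    {"@id": "s1", "verb": "started", **base},
    {"@id": "s1", "verb": "started", **base},  # duplicate
    {"@id": "u1", "verb": "submitted", **base},
]
results = [ingest(e) for e in arrivals]
```

Because the student document is derived from the stored events instead of being patched step by step, arrival order and duplicates do not change the final state.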

Query Utilization

• 3 basic queries to build visualization for CIS
  • All student docs for current class
  • All student docs for current student
  • Class doc for current class
• All queries are on indexed parameters
  • Student doc _id = class_id:student_id
  • Class doc _id = class_id
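A small Python sketch of the compound `_id` convention follows. The slides do not say how the "all student docs for current class" query is written; the anchored-prefix `$regex` filter below is one assumed possibility, and both function names are hypothetical. It also assumes class ids never contain a colon.

```python
import re


def student_doc_id(class_id: str, student_id: str) -> str:
    """Build the student doc _id in the class_id:student_id form from the slide."""
    return f"{class_id}:{student_id}"


def class_students_filter(class_id: str) -> dict:
    """Filter document matching every student doc whose _id starts with 'class_id:'."""
    return {"_id": {"$regex": "^" + re.escape(class_id + ":")}}


doc_id = student_doc_id("c1", "s9")
flt = class_students_filter("c1")
pattern = flt["_id"]["$regex"]
```

Including the trailing colon in the prefix keeps class `c1` from also matching class `c10`.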

Infrastructure

• All servers and storage are in AWS
• Backups done using EBS snapshots
• DB size estimated to grow about 500 GB/year
• Data size estimate small enough for an unsharded cluster
• 3-member replica sets
• Write to primary, read from primary and secondaries

Performance

• Estimated peak load 100 events/sec = 100 kB/sec
• Average load of 1,500,000 events/day
• Max of 2,500,000 events per day
• Initially planned on sharded, replicated cluster but for now do not need this
• Added SQS queue to handle periods of very high load
• Upgraded from MongoDB 2.6 to 3.0 (~10x faster)
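The figures on this slide can be cross-checked with a little arithmetic, sketched in Python below. It assumes decimal units (1 GB = 1,000,000 kB) and counts raw event volume only, ignoring indexes and derived documents.

```python
SECONDS_PER_DAY = 24 * 60 * 60

peak_events_per_sec = 100
peak_kb_per_sec = 100
avg_events_per_day = 1_500_000

# Event size implied by the peak-load figures.
kb_per_event = peak_kb_per_sec / peak_events_per_sec

# Average rate implied by the daily figure (about 17.4 events/sec).
avg_events_per_sec = avg_events_per_day / SECONDS_PER_DAY

# Raw event volume per year at the average daily load, in decimal GB.
gb_per_year = avg_events_per_day * kb_per_event * 365 / 1e6

print(kb_per_event, round(avg_events_per_sec, 1), round(gb_per_year, 1))
```

At about 1 kB per event, the average daily load works out to roughly 550 GB of raw events per year, the same order as the 500 GB/year growth estimate, and the average rate is well below the 100 events/sec peak.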

Conclusions

• We have a learning analytics platform in production utilizing a MongoDB data store
• After several iterations we developed a MongoDB schema which:
  • Handles data coming in arbitrary order with duplicates
  • Performs one-step, atomic inserts
  • Has high performance during peak loads
