MongoDB and the McGraw-Hill Education Learning Analytics Platform: Towards an Open, Scalable, Streaming Solution for Education
MongoDB World 2015

Outline
• McGraw-Hill Education
• My background
• The MHE Learning Analytics Platform (LAP)
• Standardized educational input events
• MongoDB schema design
• Server infrastructure
• Performance
• Conclusions

McGraw-Hill Education
• Global company with over 5,000 employees
• Now a learning science company
• All content available digitally by Fall 2015
• Higher Ed system is Connect
• K-12 LMS is Engrade
• Adaptive systems: LearnSmart, SmartBook, and ALEKS

My Background
• Global and marine seismologist
• Small-college physics professor
• Oracle database administrator
• Head of IT Operations at MIT Sloan School of Management
• Head of the Data Science group in the MHE Digital Platform Group’s Analytics team
• Systems engineer on this project

Introduction to the LAP
Motivation
• MHE has several digital educational platforms, including Connect for Higher Ed and Engrade for K-12
• Instrument platforms to send student/educator events in real time to a central system (LAP)
• Ingest and store education events in a data store (MongoDB)
• Analytics provides “insights” to students/educators

Introduction to the LAP
• Demo: Connect Insight for Students (CIS)

IMS Caliper Format for Education
Standardized education events (Caliper)
• Utilizes JSON-LD (linked data) format
• Caliper uses an Actor - Verb - Object tuple to form learning events (ex: student - submit - test)
• Triggered from student/educator activity and sent to the LAP input API
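As a rough illustration of the Actor - Verb - Object idea, the sketch below builds a minimal event as a Python dict. The field names follow the sample events in this deck; the helper name `make_caliper_event`, the `example-caliper:app-000` namespace, and the IDs are hypothetical, and this is not the official Caliper sensor API.

```python
import json
import uuid
from datetime import datetime, timezone


def make_caliper_event(actor_id, actor_type, verb, object_id, object_type,
                       namespace="example-caliper:app-000"):
    """Build a minimal actor-verb-object event as a JSON-LD style dict.

    The namespace and ID layout are illustrative assumptions.
    """
    return {
        "@context": "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
        "@type": "AssessmentEvent",
        "@id": f"{namespace}/eventId/{uuid.uuid4()}",
        "generatedAtTime": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S"),
        "verb": verb,
        "actor": {"@id": f"{namespace}/actor/{actor_id}", "@type": actor_type},
        "object": {"@id": f"{namespace}/{object_type}/{object_id}", "@type": object_type},
    }


# student - submitted - assessment
event = make_caliper_event("student-1", "student", "submitted", "quiz-1", "assessment")
print(json.dumps(event, indent=2))
```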

LAP Architectural Design
[Diagram, built up over six slides. Components in the order they are introduced: Connect, Engrade, and other Caliper events as input; the LAP Ingestion API; the Collection Receiver; SQS and long-term storage; the Collection Worker; the MongoDB data store; Results/Analysis; the Output API; and the Insight front ends (Connect Insight for Students, Engrade Insight for Teachers, and future Insights).]
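The receiver, queue, and worker arrangement in the diagram can be sketched with in-memory stand-ins. Everything here is an assumption for illustration: `queue.Queue` stands in for SQS, a list for long-term storage, and a dict for the MongoDB data store, and the exact hand-offs between the real components are not stated in the slides.

```python
import queue

event_queue = queue.Queue()   # stand-in for SQS
long_term_storage = []        # stand-in for the raw-event archive
data_store = {}               # stand-in for MongoDB, keyed by event @id


def receive(event):
    """Receiver side: archive the raw event and enqueue it for processing."""
    long_term_storage.append(event)
    event_queue.put(event)


def work():
    """Worker side: drain the queue and write each event to the data store."""
    processed = 0
    while True:
        try:
            event = event_queue.get_nowait()
        except queue.Empty:
            return processed
        data_store[event["@id"]] = event
        processed += 1


for i in range(3):
    receive({"@id": f"event-{i}", "verb": "started"})
processed = work()
print(processed, "events written")
```

The point of the queue is that the receiver can accept events faster than the worker writes them, so a burst of input does not have to be absorbed by the data store directly.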

Why MongoDB?
• JSON-LD input suggested a document store
• MongoDB accessible and well documented
• Provided needed performance and capacity
• Support from MongoDB Inc. (formerly 10gen)
  • Six-month development support contract
  • Dedicated consultants
  • Ongoing support contract

Data Flow Through the LAP
Standardized education events (Caliper)
• Caliper (JSON-LD) produced by triggers in the Connect Oracle database
• Triggered from student/educator activity and sent to the LAP input API
• LAP then verifies input, transforms it into the MongoDB schema, calculates aggregates, and sends results to data visualizations
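The "verifies input" step might look something like the sketch below. The list of required fields is an assumption drawn from the sample events in this deck, not the LAP's actual validation rules, and `verify_event` is a hypothetical name.

```python
# Assumed minimum set of top-level fields, based on the sample events.
REQUIRED_FIELDS = ("@context", "@type", "@id", "verb", "actor", "object")


def verify_event(event):
    """Return a list of problems; an empty list means the event passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in event]
    for part in ("actor", "object"):
        value = event.get(part)
        if value is not None and not (isinstance(value, dict) and "@id" in value):
            problems.append(f"{part} must be an object with an @id")
    return problems


good = {
    "@context": "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
    "@type": "AssessmentEvent",
    "@id": "example:event/1",
    "verb": "started",
    "actor": {"@id": "example:actor/1", "@type": "student"},
    "object": {"@id": "example:assessment/1", "@type": "assessment"},
}
bad = {"verb": "started", "actor": "someone"}

print(verify_event(good))
print(verify_event(bad))
```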

Data Flow Through the LAP
Standardized education events (Caliper examples)
1. Assessment Created
2. Assessment Attempt Started
3. Assessment Attempt Submitted
4. Assessment Attempt Graded
An assessment is an online homework assignment, quiz, or test associated with a McGraw-Hill digital textbook.

Assessment Created Caliper Event
{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
  "@type": "AssessmentEvent",
  "@id": "mhe-caliper:connect-000/eventId/ea7db7cf-2ed9-43a3-b9d4-1472265157c5",
  "generatedAtTime": "2012-11-01T08:00:01",
  "verb": "created",
  "actor": {
    "@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc3Nw==",
    "@type": "instructor"
  },
  "startedAtTime": "2012-11-01T08:00:00",
  "object": {
    "@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type": "assessment",
    "category": "homework",
    "origin": "ASSESSMENT",
    "topics": ["addition", "subtraction"],
    "maxSubmissionsAllowed": 3,
    "maxTimeAllowed": 10800,
    "maxOutcomePossible": 100.0,
    "startDate": "2012-11-01T04:00:00",
    "dueDate": "2012-11-05T08:00:00",
    "assessmentName": "Sample Assignment Zero",
    "noFeedback": false,
    "ALEDisplayName": "Critical Missions",
    "attemptDeductions": false,
    "lateSubmissionDeduction": true,
    "studyAttempts": true,
    "forceSubmission": true
  },
  "learningContext": {
    "enrolledIn": "section",
    "course": "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section": "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName": "Intro to Algebra",
    "sectionName": "MWF 10-11",
    "timeZone": "America/New_York"
  }
}

Assessment Attempt Started Caliper Event
{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
  "@type": "AssessmentEvent",
  "@id": "mhe-caliper:connect-000/eventId/87231361-6c9c-4ef6-8ea3-49e39e78eb4d",
  "generatedAtTime": "2012-11-01T08:00:01",
  "verb": "started",
  "actor": {
    "@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==",
    "@type": "student"
  },
  "startedAtTime": "2012-11-04T11:00:00",
  "object": {
    "@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type": "assessment"
  },
  "learningContext": {
    "enrolledIn": "section",
    "course": "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section": "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName": "Intro to Algebra",
    "sectionName": "MWF 10-11",
    "timeZone": "America/New_York"
  },
  "attemptCount": 1
}

Assessment Attempt Submitted Caliper Event
{
  "@context": "http://purl.imsglobal.org/ctx/caliper/v1/AssessmentEvent",
  "@type": "AssessmentEvent",
  "@id": "mhe-caliper:connect-000/eventId/b1db77ea-44a7-4f99-a819-e8b7e142f457",
  "generatedAtTime": "2012-11-01T08:00:01",
  "verb": "submitted",
  "actor": {
    "@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ==",
    "@type": "student"
  },
  "startedAtTime": "2012-11-04T11:45:01",
  "object": {
    "@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type": "assessment",
    "maxSubmissionsAllowed": 3,
    "maxTimeAllowed": 10800,
    "dueDate": "2012-11-08T12:00:00"
  },
  "learningContext": {
    "enrolledIn": "section",
    "course": "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section": "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName": "Intro to Algebra",
    "sectionName": "MWF 10-11",
    "timeZone": "America/New_York"
  },
  "attemptCount": 1,
  "timeTaken": 1800
}

Assessment Attempt Graded Caliper Event
{
  "@context": "http://mheducation.com/mhe-caliper/v1/OutcomeEvent",
  "@type": "OutcomeEvent",
  "@id": "mhe-caliper:connect-000/eventId/d7c28248-8e14-495e-9a5c-bb1cc1e0882d",
  "generatedAtTime": "2012-11-01T08:00:01",
  "verb": "graded",
  "actor": {
    "@id": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc3Nw==",
    "@type": "instructor"
  },
  "startedAtTime": "2012-11-04T19:00:00",
  "object": {
    "@id": "mhe-caliper:connect-000/assessment/MjYwMTM4Nzc4Mg==",
    "@type": "assessment",
    "student": "mhe-caliper:connect-000/actor/MjYwMTM4Nzc4MQ=="
  },
  "learningContext": {
    "enrolledIn": "section",
    "course": "mhe-caliper:connect-000/course/MjYwMTM4Nzc4NQ==",
    "section": "mhe-caliper:connect-000/section/MjYwMDU1MTc0Ng==",
    "courseName": "Intro to Algebra",
    "sectionName": "MWF 10-11",
    "timeZone": "America/New_York"
  },
  "attemptCount": 1,
  "outcome": 0.85,
  "percentDeducted": 5
}

Data Flow Through the LAP
Constraints on developing schema
• Several learning activities require multiple Caliper events
  • Example: student starts, submits, and is graded to complete a quiz
• No guarantee that external applications will send events in chronological order
• May receive duplicate events
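One way to cope with these constraints is to key every event by its unique ID and to track which verbs have arrived per attempt in a set, so neither duplicates nor arrival order matter. This is a minimal sketch under that assumption; the names `record`, `seen_event_ids`, and `verbs_by_attempt` are hypothetical and the tuples are simplified stand-ins for Caliper events.

```python
seen_event_ids = set()
verbs_by_attempt = {}   # (student, assessment, attempt) -> set of verbs seen


def record(event_id, student, assessment, attempt, verb):
    """Record an event once; return False if it is a duplicate."""
    if event_id in seen_event_ids:
        return False
    seen_event_ids.add(event_id)
    verbs_by_attempt.setdefault((student, assessment, attempt), set()).add(verb)
    return True


# Events arrive out of chronological order, and one arrives twice.
arrivals = [
    ("e3", "s1", "a1", 1, "graded"),
    ("e1", "s1", "a1", 1, "started"),
    ("e3", "s1", "a1", 1, "graded"),      # duplicate
    ("e2", "s1", "a1", 1, "submitted"),
]
accepted = [record(*arrival) for arrival in arrivals]
print(accepted)
print(verbs_by_attempt[("s1", "a1", 1)])
```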

MongoDB Schema Version 0.1
V0.1
• Two-schema model (student and class)
• Class collection describes the class, section, and assignments
• Student collection
  • Assessment array updated when attempt is complete
  • All events for an activity
  • Attempts for each activity in a sub-array

MongoDB Schema Version 0.1
V0.1 Problems
• Too embedded
• Difficult to update a student doc
• Updates need a query-logic-update cycle (read the doc, apply logic in the application, write it back)

MongoDB Schema Version 0.2
• Remove nested arrays
• Move attempts doc up to the top level

MongoDB Schema Version 0.2
V0.2 Problems
• Still have query-logic-update
• Difficult to do atomically and maintain deterministic state

MongoDB Schema Version 0.3
• Remove arrays altogether
• Replace arrays with assessment and attempt docs, each of which contains several sub-docs

MongoDB Schema Version 0.3
V0.3
• Atomic updates now much easier
• Save raw Caliper event in event collection
• Only update student collection if all required events are in event collection
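The V0.3 rule (store the raw event first, touch the student doc only once every required event is present) can be sketched with plain dicts standing in for the two collections. The required verb set, the flattened event fields, and the `attempts` sub-doc layout are assumptions for illustration, not the production schema.

```python
REQUIRED_VERBS = {"started", "submitted", "graded"}   # assumed from the quiz example

events = {}     # stand-in for the event collection, keyed by event id
students = {}   # stand-in for the student collection, keyed by class_id:student_id


def attempt_key(event):
    return (event["class_id"], event["student_id"], event["assessment_id"], event["attempt"])


def ingest(event):
    """Store the raw event, then write the attempt sub-doc once all required events exist."""
    events[event["id"]] = event   # same id overwrites, so duplicates are harmless
    related = [e for e in events.values() if attempt_key(e) == attempt_key(event)]
    if not REQUIRED_VERBS <= {e["verb"] for e in related}:
        return False              # still waiting for at least one required event
    graded = next(e for e in related if e["verb"] == "graded")
    doc_id = f'{event["class_id"]}:{event["student_id"]}'
    doc = students.setdefault(doc_id, {"_id": doc_id, "attempts": {}})
    # Sub-doc keyed by assessment and attempt number, instead of an array entry.
    doc["attempts"][f'{event["assessment_id"]}:{event["attempt"]}'] = {"outcome": graded["outcome"]}
    return True


base = {"class_id": "c1", "student_id": "s1", "assessment_id": "a1", "attempt": 1}
results = [
    ingest({**base, "id": "e3", "verb": "graded", "outcome": 0.85}),
    ingest({**base, "id": "e1", "verb": "started"}),
    ingest({**base, "id": "e1", "verb": "started"}),     # duplicate
    ingest({**base, "id": "e2", "verb": "submitted"}),
]
print(results)
print(students)
```

Because the student doc is written from the complete set of stored events rather than patched event by event, the result is the same whatever order the events arrive in. In MongoDB, a write like the last step could be expressed as a single update with `$set` on a dotted field path, which is atomic for a single document.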

Query Utilization
• 3 basic queries to build visualization for CIS
  • All student docs for current class
  • All student docs for current student
  • Class doc for current class
• All queries are on indexed parameters
  • Student doc _id = class_id:student_id
  • Class doc _id = class_id
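A sketch of how the three queries could be phrased as MongoDB filter documents, given the `_id` convention above. The slides do not show the actual queries: the anchored prefix regex on `_id` and the separate indexed `student_id` field are both assumptions, and the helper names are hypothetical. The filters are built as plain dicts, so no database connection is needed to run this.

```python
import re


def student_doc_id(class_id, student_id):
    """Compose the student doc _id as class_id:student_id."""
    return f"{class_id}:{student_id}"


def students_in_class_filter(class_id):
    """All student docs for a class: anchored prefix match on _id (assumed approach)."""
    return {"_id": {"$regex": "^" + re.escape(class_id) + ":"}}


def docs_for_student_filter(student_id):
    """All student docs for a student: assumes a separate indexed student_id field."""
    return {"student_id": student_id}


def class_doc_filter(class_id):
    """The class doc for a class: exact match on _id."""
    return {"_id": class_id}


class_filter = students_in_class_filter("c1")
print(student_doc_id("c1", "s1"))
print(class_filter)
print(docs_for_student_filter("s1"))
print(class_doc_filter("c1"))
```

A case-sensitive regex anchored with `^` on a literal prefix is a form MongoDB can serve from an index, which is why the prefix match is written that way here.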

Infrastructure
• All servers and storage are in AWS
• Backups done using EBS snapshots
• DB size estimated to grow about 500 GB/year
• Data size estimate small enough for an unsharded cluster
• 3-member replica sets
• Write to primary, read from primary and secondaries
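For a three-member replica set with reads spread across members, a client connection string might be assembled as below. The host names, database name, and replica set name are hypothetical, and the talk does not say which read preference was used; `nearest` is shown only because it allows reads from the primary or a secondary. Writes in a replica set always go to the primary.

```python
from urllib.parse import urlencode

# Hypothetical hosts for a three-member replica set.
hosts = ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"]

# replicaSet and readPreference are standard MongoDB connection string options;
# the values here are assumptions.
options = {"replicaSet": "lap-rs0", "readPreference": "nearest"}

uri = "mongodb://" + ",".join(hosts) + "/lap?" + urlencode(options)
print(uri)
```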

Performance
• Estimated peak load of 100 events/sec = 100 kB/sec
• Average load of 1,500,000 events/day
• Max of 2,500,000 events/day
• Initially planned on a sharded, replicated cluster, but for now do not need this
• Added SQS queue to handle periods of very high load
• Upgraded from MongoDB 2.6 to 3.0 (about 10x faster)
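The figures above can be cross-checked with a little arithmetic. This sketch uses only the numbers on the slides, plus the assumption that 1 kB = 1000 bytes and that the per-event size implied by the peak figures also holds on average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

peak_events_per_sec = 100
peak_bytes_per_sec = 100_000       # 100 kB/sec, taking 1 kB = 1000 bytes
avg_events_per_day = 1_500_000
max_events_per_day = 2_500_000

# Implied event size from the peak figures.
bytes_per_event = peak_bytes_per_sec / peak_events_per_sec

# Daily totals expressed as sustained rates.
avg_events_per_sec = avg_events_per_day / SECONDS_PER_DAY
max_day_events_per_sec = max_events_per_day / SECONDS_PER_DAY

# Raw event volume per year at the average daily load.
raw_gb_per_year = avg_events_per_day * bytes_per_event * 365 / 1e9

print(f"{bytes_per_event:.0f} bytes/event")
print(f"{avg_events_per_sec:.1f} events/sec on an average day")
print(f"{max_day_events_per_sec:.1f} events/sec on the busiest day")
print(f"{raw_gb_per_year:.1f} GB/year of raw events")
```

At about 1 kB per event, the average daily load works out to roughly 550 GB of raw events per year, in line with the growth estimate of about 500 GB/year, and the estimated peak of 100 events/sec is several times the sustained daily rate.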

Conclusions
• We have a learning analytics platform in production utilizing a MongoDB data store
• After several iterations we developed a MongoDB schema which:
  • Handles data coming in arbitrary order with duplicates
  • Performs one-step, atomic inserts
  • Has high performance during peak loads