
Learning Analytics Community Exchange

Interoperability Study – Assessment and Allied Activities

Draft for Public Comment

By: Adam Cooper

Published: 09 December 2014

Keywords: learning analytics, assessment, quiz, essay, survey, activity data

This working document considers the kind of events that are likely to

occur in mainstream assessment processes, and allied activity such as

questionnaire responses. It does so from the perspective of asking:

“which events are of interest for learning analytics?” Its purpose is to

identify that set of events and attributes which may be considered to be

commonplace, and therefore a candidate for cross-platform data stores,

analysis-time mapping, improving interoperability, and standardisation.

Audience: this is a relatively technical document aimed at readers with

experience in software development and architecture, or development

of interoperability standards, etc.


Contents

1. Introduction
   Caveat
2. Assessment Analytics Context
3. A Simple Core Model
   Assumptions
   Data for All Events
   Core Model Events and Their Attributes
   Possible Missing Pieces
4. Current Standards
   PSLC DataShop
   ADL Experience API (xAPI)
   IMS LIS
   IMS Caliper
   MOOCdb
5. Source Material
   Technical
   References
6. About


1. Introduction

This working document1 explores the processes of various kinds of mainstream assessment and

allied activities from the perspective of asking “which events may be of interest for learning

analytics?” This exploration is not ab-initio, but takes IMS QTI as a starting-point conceptual model.

The events of interest include those triggered by the learner as well as events that pertain to the

learner’s actions, for example scoring.

The immediate purpose in conducting this study is to determine that set of events and attributes

which may be considered to be commonplace, and therefore a candidate for cross-platform data

stores, analysis-time mapping, improving interoperability, and potentially standardisation.

The following are considered to be in scope as “assessment and allied activities”:

- Objective questions delivered electronically and scored automatically.
- Manual essay marking, including using “electronic management of assessment” tools such as Turnitin Grademark, and automatic originality evaluation (often termed plagiarism detection).
- Surveys/questionnaires delivered electronically, but not scored. For example, end of module satisfaction surveys.
- Competency examinations, observation of candidate behaviour and assessment against objective descriptions of skills.
- Objective questions delivered physically and scored automatically (e.g. optical mark reading).
- Use of “clickers”.
- Module/course/unit level grading.
- Assessment of portfolio evidence.
- Double marking, moderation, and other managed quality assurance and adjustment activities.
- Automated score adjustment, such as lateness penalties, “extra credit” options, etc.

For those situations where electronic delivery occurs, this may be achieved in fully online, or online-

offline-sync scenarios. Events may originate from one or both of client (user device) and server

(responsible for delivery).

These activities do not involve an identical set of events but have sufficient overlap to motivate an

exploration of the common ground between them. Such a common vocabulary – the specific one

developed in this paper will be referred to as the Core Model – should make it easier to develop re-

usable analytical methods2, i.e. to be general purpose. Additions may be required to allow for

assessment activities that do not fit precisely into the assumed stereotypes.

Although a general purpose Core Model promises benefits through consistency, an over-general

approach forces a degree of abstraction, or extensive optionality, which would make consistent use

of the Core Model more difficult, consequently making analysis more difficult or impractical. Hence,

the following are considered out of scope on the grounds that they represent a small fraction of the assessment activity currently undertaken in educational/training establishments yet would require the addition of complexity into a Core Model3:

- Adaptive testing.
- Peer assessment.
- Assessment in the context of Intelligent Tutoring Systems.
- Specialist assessment models/theories such as Item Response Theory.

1 As a working document, it is subject to change without archival of previous revisions.

2 By considering a range of applications, it is assumed that the resulting Core Model is more likely to be general purpose and not to only be suited to one or two models for how assessment happens.

Section 5, “Source Material”, gives references to the technical and other sources used.

Caveat

Despite resting on the existing, and widely implemented, basis of IMS QTI, this study should be considered speculative. It is a starting point from which to consider each of the bulleted applications given in the introduction as general cases that summarise a considerable range of real-

world variety.

2. Assessment Analytics Context

Before considering the specifics of events in assessment and allied activities, it is sensible to briefly

consider the ultimate application of the data for learning analytics. What is it that people4 are likely

to want to do with this data?

It seems reasonable to claim that assessment processes are the oldest source of data about learning

yet data from assessment has not received a great deal of attention in the literature on learning

analytics, except at a coarse level: predictions based on macro-level outcomes and grade point

averages, analytics looking at the relationship between summative outcomes and candidate

attributes, or simple visualisations of scores. This belies two realities: the ease with which

assessment analytics can be aligned to current teaching and learning practice; the extensive history

of psychometrics. Brief comment is made on these two points, although the place of assessment in

teaching and learning practice is taken as self-evident, followed by comment on existing research

literature from the Learning Analytics and Educational Data Mining communities.

Public discourse in the e-Learning space makes it clear that there is growing interest in the Electronic

Management of Assessment5 (EMA). Although the focus of attention is not on learning analytics,

these platforms are relevant in what they make possible. Combining this with the evolving support

for assessment activities in software such as Moodle, Blackboard, and Turnitin, and a sizable base for

specialised e-Assessment platforms6 suggests that there is a good basis for assessment analytics.

3 They are, arguably, better dealt with in a special-purpose, rather than general-purpose, scheme.

4 The assumption here is that these people are working in the context of an education or training organisation, or a software/content provider to those organisations. Organisations specialising in testing and assessment are not the expected users of the Core Model.

5 For example, Jisc, which supports effective use of technology across UK Higher and Further Education, has a current (2014) project on the topic - http://www.jisc.ac.uk/research/projects/electronic-management-of-assessment .

6 The 2014 survey of Technology Enhanced Learning, http://www.ucisa.ac.uk/tel , conducted by the UK organisation UCISA, showed that the following percentages of respondents had these as centrally-supported e-assessment tools: Blackboard (50%), Moodle (29%), Questionmark Perception (23%).


Alignment of assessment analytics to current practice could be expanded upon in various ways, but

the overall idea is that current practice provides a jumping-off point for adapting practice to

incorporate more use of the results of data analysis. This might involve improving the identification

of learner uncertainty/difficulty, providing more actionable feedback, improving the assessment

instruments, identifying weaknesses in the learning activities and resources, etc. These practices

occur today, but they could be enhanced in scale and quality with access to data with suitable

structure and level of detail.

The learning analytics discourse has given surprisingly little attention to these aspects, given the way

assessment affects all learners and is embedded in teaching and learning practice (Ellis 2013). Ellis’s

work is interesting both because she identifies assessment as a good entry point for learning

analytics with teaching staff, and because she focuses particularly on electronic support for teacher-

marked essays, which challenges the common assumption that assessment analytics is only about

“quizzes” containing objective questions. The use cases emerging from the use of Electronic

Management of Assessment (EMA) systems for essay marking7 provide a useful counter-balance to

the prevailing view of “e-assessment emphasis”.

Although psychometrics – the theory and techniques of psychological measurement – is a field with

much that is accessible only to assessment specialists and research workers, it includes Classical Test

Theory (CTT), which should be accessible to anyone competent to carry out learning analytics8. CTT

can be employed for the following cases9 in a typical learning analytics setting (an illustrative computational sketch follows the list):

- Item level difficulty, discrimination, reliability, etc. IMS QTI includes support for item level usage statistics.
- Test-level utility, determining whether the test tells us anything useful about the candidates.
- Inter-rater reliability, in which the consistency of scoring by two or more human or computer-based markers is compared statistically.
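As an illustration of what the first two of these involve in practice, the following sketch computes item difficulty (the proportion of a sample answering an item correctly) and a point-biserial discrimination index from a matrix of dichotomous item scores. It is written in TypeScript purely for illustration; all names are invented and this is not part of the Core Model.

```typescript
// Illustrative sketch only: classical test theory item statistics from a
// matrix of dichotomous (0/1) item scores, one row per candidate.
// All names here are invented for the example.

type ScoreMatrix = number[][]; // scores[candidate][item], values 0 or 1

function itemDifficulty(scores: ScoreMatrix, item: number): number {
  // Proportion of the sample answering the item correctly (the "p value").
  const correct = scores.reduce((n, row) => n + row[item], 0);
  return correct / scores.length;
}

function itemDiscrimination(scores: ScoreMatrix, item: number): number {
  // Point-biserial correlation between the item score and the total score
  // on the remaining items (higher values indicate better discrimination).
  const itemScores = scores.map(row => row[item]);
  const restScores = scores.map(row => row.reduce((s, v) => s + v, 0) - row[item]);
  const mean = (xs: number[]) => xs.reduce((s, v) => s + v, 0) / xs.length;
  const mItem = mean(itemScores);
  const mRest = mean(restScores);
  let cov = 0, varItem = 0, varRest = 0;
  for (let i = 0; i < scores.length; i++) {
    cov += (itemScores[i] - mItem) * (restScores[i] - mRest);
    varItem += (itemScores[i] - mItem) ** 2;
    varRest += (restScores[i] - mRest) ** 2;
  }
  return cov / Math.sqrt(varItem * varRest);
}
```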

These do not cover all aspects of assessment-related analytics, being predominantly about the

instruments of assessment being exercised on a cohort/sample10. They are oriented towards, and

commonly used to evaluate, the quality of summative assessments and are less suited to, and used

for, supporting the processes of teaching and learning, although CTT is used to identify possible

issues with learning activity design and learning resource content.

The Educational Data Mining (EDM) conference proceedings and journal, the Learning Analytics and

Knowledge (LAK) conference proceedings, and the Journal of Learning Analytics contain numerous

references to assessment. On the whole, however, these are not particularly relevant to this study, oriented as it is to practical learning analytics activities in the near- to mid-term, because they:

- fall outside either the likely competence or educational practices of typical educational establishments; or
- deal with summative results rather than more granular records of activity.

7 Cath Ellis’s presentation to the ALT ELESIG – video at http://vimeo.com/85331242 – and to eLearning Forum Asia – slides at http://elfasia.org/2012/wp-content/uploads/2011/11/Breakout-Session2A-Speaker_2.pdf – make the case for assessment analytics to improve student learning.

8 The Questionmark white paper, “Item Analysis Analytics” by Greg Pope (available via https://www.questionmark.com/us/whitepapers/Pages/default.aspx), illustrates this scope of accessibility. Moodle also provides a “quiz statistics report”, which includes standard CTT measures (https://docs.moodle.org/28/en/Quiz_statistics_report). Software libraries exist for CTT, for example for R: http://cran.r-project.org/web/views/Psychometrics.html

9 Note: the Core Model will not consider these statistics per se as they are the product of analysis; we are only concerned with the capture of data to support these kinds of analysis.

10 For example, item difficulty is only meaningful when referred to a sample; the same question might be trivial to a mathematics undergraduate and impossible for a middle school student.

There are some exceptions to this general statement, for example a poster at the 2014 EDM

Conference entitled “Towards Uncovering the Mysterious World of Math Homework” (Feng 2014). A

notable exception is a paper from the 2009 EDM Conference describing the use of process mining

tools to explore patterns of navigation and response events in the taking of online objective tests

(Pechenizkiy et al. 2009), further described in a chapter of the Handbook of Educational Data Mining

(Trcka et al. 2011). The process mining approach is also applicable to situations where events from

different sources overlap, for example where video watching and question answering combine to

give a richer picture of student activity outside the nominal “attempting a question”. The chapter by

Trcka et al. is also interesting in that it describes the use of an established piece of process mining software, ProM, with an existing XML workflow logging language, MXML, although MXML has now been

superseded by XES (Günther & Verbeek 2014), which has been submitted to IEEE for

standardisation.

In summary, the context imagined for this study includes four different kinds of situation in which

assessment and related events have a role in “closing the loop”:

- Assessment instruments are designed and used. Assessment analytics allows for the determination of reliability, efficiency, and validity and the identification of which assessments, items, or markers/scorers are in need of improvement.
- Courses are designed and delivered. Assessment analytics may allow for the objective determination of topics that require more clarity in presentation, opportunities to practice, under-pinning material, etc. This may be achieved in “real time” during delivery or in periodic re-design.
- Informative (“instructional”) resources and low-stakes assessment are typically combined in LMSs and similar tools. Understanding patterns of activity across these two kinds of resource can help in the improvement of the informative resources.
- Students are accustomed to receiving marks and written feedback, but assessment analytics can help to pinpoint particular areas as next steps for improvement in knowledge, skills, or performance-acts (e.g. essay construction), etc. in ways that are more precise and convincing.

Finally, it is noted that learning analytics may also be used as an assessment instrument, where

outcome measures are derived from captured activity data. This kind of approach may be able to

extend the diversity of assessment opportunities, especially to include more natural/authentic

situations in a scalable way, for example to encompass ideas of membership and process, or skilled

practice. This is not, however, the focus of this paper.


3. A Simple Core Model

The summary from the previous section has implications at the level of data. It indicates the

potential for the gathering, management, and analysis of assessment-related data that goes well

beyond summative scores11 to incorporate:

- Detail on the strengths and weaknesses in submitted work in a form that allows for corrective action.
- Detail at per-item level.
- Detail in the sequence of events including time-on-task, pathways taken, etc.

These indicate where a Core Model should include detail.

Assumptions

Granularity of Events

It has been assumed that the Core Model will focus on events - at instances in time, not over

durations - that a learner would identify as significant in the assessment process. This does not

generally give data such as “time spent answering question N”, but such statistics may be computed

at analysis time. The data that would be used to compute such derived quantities should be stored

in any case, since it may be useful for computing numerous other metrics or identifying various kinds

of patterns. The principle of storing the most atomic data rather than derived data should offer the

greatest potential utility to a range of analytics situations, including some yet to be practiced. Non-

derived data is also very important for tracking down errors, and it may be a necessary component

of transparent analytics, in which later challenges or queries must be addressed.
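To make the point about analysis-time derivation concrete, the following TypeScript sketch computes a simple “time spent on item N” figure from stored atomic events; the event and field names are illustrative stand-ins for the Core Model events defined later in this section, not a prescribed format.

```typescript
// Illustrative sketch only: deriving "time spent on item N" at analysis time
// from atomic, timestamped events, rather than storing the duration itself.
// Event and field names are stand-ins for the Core Model events defined below.

interface LoggedEvent {
  action: "Access" | "Submit" | string; // Core Model action name
  clientTime: string;                   // ISO 8601 clock time of the learner's device
  itemIds: string[];                    // Iid values the event refers to
}

function timeOnItemMs(events: LoggedEvent[], itemId: string): number | undefined {
  // Earliest presentation of the item and the latest submission covering it.
  const shown = events.filter(e => e.action === "Access" && e.itemIds.includes(itemId));
  const submitted = events.filter(e => e.action === "Submit" && e.itemIds.includes(itemId));
  if (shown.length === 0 || submitted.length === 0) return undefined;
  const first = Math.min(...shown.map(e => Date.parse(e.clientTime)));
  const last = Math.max(...submitted.map(e => Date.parse(e.clientTime)));
  return last - first;
}
```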

Treating the events as single atomic items also allows the same vocabulary to be used for the range

of different cases indicated as in scope in the introduction; there will be variety in the lifecycles of

some cases but these may be expressed using a common vocabulary. For example, whereas an

online delivery system might have a very well-defined idea that an attempt at an assessment item

can be defined by the interval between showing that question and a response being made, this is of

dubious utility when an essay is submitted and marked by a tutor. Yet both include a submission

event and the production of a score/grade.

IMS QTI as a Base Model and Vocabulary

IMS QTI has evolved over a number of years and is built on the experience and expertise of members

of the assessment industry, as well as having numerous implementations in software. This is good

reason to believe that it correctly captures many of the key concepts of assessment design and

online delivery, and does so with discrimination. Furthermore, the development of QTI by multiple

participants in the project team gives some assurance that it reflects a range of practices.

The scope of the Core Model, as outlined in the previous sections, differs from QTI in that the Core

Model focuses on the learner/candidate experience, whereas QTI focuses on expressing assessment

content, how responses should be transformed to outcomes, and a means to convey results (although the Results Reporting part is the least implemented).

11 This document is not concerned with interoperability at the level of summative scores. While it may be an exaggeration to say that this is “a solved problem”, there are several approaches for which there is evidence: IMS LIS Outcomes (and a profile for IMS LTI), SCORM, ADL eXperience API, and IMS QTI Results.

Using IMS QTI as a base model and vocabulary for the Simple Core Model should both reduce the

effort required to describe the Core Model and increase its quality. The effort required becomes

more a case of re-expressing some QTI ideas from a candidate/learner perspective, and investigating

applicability outside the umbrella scenario of e-assessment in QTI (although the QTI Results

Reporting specification is explicit in its coverage of assessment other than by testing).

Minimal Assessment Resource Metadata, No Assessment Resource Content

In practice, there will generally be information about the assessment resource, in addition to the

resource itself, that describes or controls how it is delivered, scored, managed, etc. This information

would be applicable to all learners being assessed in a given instance of the assessment. While this

information will be necessary for some analyses, for the purposes of developing a Simple Core

Model, an approach of minimising metadata and avoiding content has been adopted because:

- Either it would greatly inflate the task in hand to propose a common model, or such a model already exists (e.g. IMS QTI);
- Capturing learner-level data in a consistent and unified form is taken to be the currently-significant missing piece of the educational technology jigsaw, whereas the metadata and content already exists (although not necessarily in a consistent and unified form);
- Candidate activity and metadata have very different lifecycles and rate of production/change;
- Light-weight event payloads are sought in the interest of scalability and performance.

In general, the operationalisation of “minimal assessment resource metadata” will be the inclusion

in the event payload of identifiers to related facts.

The aim is to strenuously avoid following a design path that leads to a data model that looks like a

data warehouse schema for assessment on the grounds that this would be to create something that

is too challenging to adopt, as well as being questionable as an architecture for event logging. The

use of identifiers allows later processing into an OLAP12 data cube, etc., should that be required.

Individual as the Subject of Assessment

The discussion in this document assumes it is individuals that are the subject of assessment.

Data for All Events

Time

At least three times may be applicable: 1) the clock time of the device being used at the time of the

event; 2) the clock time of the server storing event logs for analytics and 3) the clock time of the

delivery system (learning management or e-Assessment server). Time zone differences, and the

possibility of online-offline-sync uses, mean that discriminating between these may be very

important for analysis; from the point of view of analysing the learning process, the learner’s local

time is highly relevant.

12 OnLine Analytical Processing, an established business intelligence approach to efficiently querying multi-dimensional data.


The Core Model includes, for all events:

- Clock time of the device being used for access, since this is the best estimator of the learner’s local time.
- Clock time of the delivery system, since this will reflect course delivery timings (e.g. release times, deadlines).

It is assumed that the logging server stores its own time as a matter of course, and that this is

available when event data is extracted from store for analysis.

Identifiers

The following are assumed to be identified for all events logged:

- A user identifier. This may require cross-mapping in analysis pre-processing13.
- A session identifier. This would show the authentication session for online use, or its equivalent for off-line cases (e.g. an observation session in a competency assessment).
- One or more identifiers for the assessment and its component parts. See below.
- Identification of the application and version that originates the data. This may influence the meaning or significance attributed to the data when it is analysed. It is equivalent to tool_consumer_info_* in IMS LTI.

In addition, when the assessment items are delivered electronically:

- An “attempt” session identifier. This has application-specific meaning for what constitutes an attempt on an item in an online delivery system. For IMS QTI compliant software, this is the Candidate Session identifier14.
- Identification of the containing learning resource/activity. This may include identification of the “learning context”. These should at least include equivalents for resource link and context ids as defined in IMS LTI. This provides information on where the assessment was launched from and the course/module it is part of.

Identifiers for the assessment and its component parts are structured as follows:

- The assessment. This is always present and must uniquely identify a single assessment opportunity. This is distinct from the QTI usage, which refers to the test (a measuring instrument). An assessment opportunity could be described as an instance of a test available in a limited time window, and generally to a limited number of candidates, but it also includes assessment other than by test. It is identical to the LineItem concept as defined in IMS LIS (Outcomes Management Service).
- An assessment part. Assessment parts distinguish independently submit-able units of the assessment. This must be unique within an assessment and must be present when a subset of items in an assessment is submitted. See the QTI specification of “submissionMode” for an account of the QTI equivalent “testPart” in simultaneous submission mode.
- A section. Section identifiers distinguish groups of assessment items that are presented together, but not with items in another section. For example, if sets of questions are shown one screen-page at a time, the section identifier would show which questions were presented together. There may be 1 or more sections in a part. If the items in a section are submittable (for outcome processing), then those items also comprise a part. This must be unique within an assessment and must be present when a subset of items in an assessment is shown.
- An item. An atomic unit of scoring that may contain information, instructions, and more than one unit of interaction (e.g. two related multiple choice selections). The concept of an item, and its relationship to interactions, is as described in the QTI Implementation Guide but is not dependent on a delivery system implementing QTI.

Identifiers for these will be abbreviated Aid, Pid, Sid, Iid in the following.

13 It is common practice to expose different user ids to external tools, and to use different identifiers for data exports that may contain PII (e.g. forum text), as an aspect of data security.

14 This statement is in need of verification. The QTI ASI specification glossary defines Candidate Session in terms of interactions with items but does not make clear what the meaning of the term is when multiple items are viewed simultaneously.
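By way of illustration only, the timestamps and identifiers described above could be gathered into a common structure carried by every event, along the lines of the following TypeScript sketch; the field names are invented for the example and the Core Model does not prescribe any particular serialisation.

```typescript
// Illustrative sketch only: a possible shape for the data carried by every
// Core Model event. Field names are invented; the Core Model itself does not
// prescribe a serialisation.

interface CommonEventData {
  clientTime: string;          // clock time of the learner's device (ISO 8601)
  deliverySystemTime: string;  // clock time of the delivery system (ISO 8601)

  userId: string;              // may require cross-mapping at analysis time
  sessionId: string;           // authentication session or its off-line equivalent
  application: { id: string; version: string }; // originating application

  assessmentId: string;        // Aid - a single assessment opportunity
  partId?: string;             // Pid - independently submit-able unit
  sectionId?: string;          // Sid - items presented together
  itemIds?: string[];          // Iid values involved in the event

  // Only when items are delivered electronically:
  attemptSessionId?: string;   // e.g. QTI Candidate Session identifier
  resourceLinkId?: string;     // containing learning resource (cf. IMS LTI)
  contextId?: string;          // course/module context (cf. IMS LTI)
}
```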

Core Model Events and Their Attributes

Following on from the brief outline of learning analytics requirements, and taking IMS QTI as a

conceptual reference-point, a Core Model is advanced in the following two tables. These outline,

separately, the kind of event (Table 1) and the nature of the attributes needed to capture the

necessary facts about the event (Table 2).

Access
Attributes: Aid; Sid; Array of Iid15; Resume; [NumAttempt]
Notes: At least one item is presented to the respondent16. If only some items in an assessment are presented, Sid must be specified. This event must always occur in cases of e-assessment and may also occur if, for example, the task description for an assignment is provided electronically but the activity of producing the assignment is not tracked.

Response Change
Attributes: Aid; Iid; Item response
Notes: This event, which only applies for fully-tracked electronic delivery (and is not required), allows for the capture of the current state of the response to a given item after it changes17, but before submission or leaving the item(s).

TimeOut
Attributes: Aid; Sid; Array of Iid
Notes: The delivery system reached the maximum time allowed for a response and prevented further interaction. This may, but need not be, immediately followed by submission of responses for scoring, signalled by a separate event.

Get Hint
Attributes: Aid; Sid; Array of Iid; Identifier for the hint resource
Notes: This is a no-penalty hint requiring no response processing. See section 7.6 in the QTI specification (infoControl class). Hints will usually be at the level of an item.

Leave
Attributes: Aid; Sid; Array of Iid; Array of item responses
Notes: The respondent discontinued interaction in a controlled manner (e.g. due to clicking “next”). The current state of their responses may be saved, according to application design, but there is no response processing. The state following this event is equivalent to the pendingSubmission value of sessionStatus in the QTI Results Reporting specification.

Submit
Attributes: Aid; Pid; Array of Iid; Array of item responses; KeyId; OutcomeExpected; [NumAttempt]
Notes: Response(s) have been submitted, either by an explicit user action or automatically (e.g. following a time-out). Depending on the assessment, it may be that items are submitted individually, in sets (“parts”), or for the whole assessment. In all cases, the 1..* values for the submitted items are included, along with the root Aid. The state following this event is equivalent to the pendingResponseProcessing value of sessionStatus in the QTI Results Reporting specification. For essay assignments and similar cases, the Iid could be considered to be redundant but it must be given for consistency. This may be the first event recorded for some assessments.

Start observing
Attributes: Aid; Pid; KeyId; Observer identity
Notes: This is for cases where observable behaviour, rather than a response to a question or assignment, is being assessed. This would apply to competence assessment of vocational skills, observation-based assessment of collaboration, etc. This event marks the start of an observation. In some senses, this is comparable to a submission, and the KeyId has equivalent purpose.

Stop observing
Attributes: KeyId
Notes: Signals that a period of observation has ended. Depending on the situation, outcomes may be determined after “stop observing” or between the start and end of an observation period.

Outcome determined
Attributes: OutcomeLevel; OutcomeStatus; Identifier(s) for assessment, part, or items (according to value of OutcomeLevel); Array of OutcomeLists; Assessor identifier; Array of KeyIds
Notes: OutcomeLevel indicates whether the outcome is for an assessment, or an assessment part, or comprises a set of item-level outcomes. For cases of double marking, two outcomes would arise from one submission. An assessment-level outcome may arise from multiple submissions/observations, so multiple KeyIds may be required. Conversely, a single submission of all assessment items may lead to multiple “outcome determined” events, for example if objective questions are mixed with those requiring text responses and human marking. The state following this event is equivalent to the “final” value of sessionStatus in the QTI Results Reporting specification. NB: this does not mean that the outcome is finalised for the assessment; moderation or penalties may apply.

Outcome adjusted
Attributes: {as outcome determined}; AdjustmentReason
Notes: This allows for various kinds of adjustment to be applied, where the officially-recorded outcome includes a penalty or bonus, e.g. a lateness penalty. The pre-adjusted outcome is a more accurate indication of the student’s ability. This means that adjustments due to moderation (etc.) are not “outcome adjusted” events (see the explanation for the OutcomeStatus attribute below).

Table 1 – Core Model Events

15 Of length 1 for single item presentation and listing all item identifiers in the given assessment or section that is identified. The same applies to all cases where “Array of Iid” is indicated in this table.

16 This may be an item for which no response is possible, e.g. initial instructions.

17 Some detail may be required here if the Core Model were to be developed into a technical specification, although it may be best left as an application-specific rule (e.g. JavaScript onChange event in HTML) as to what constitutes a change.

Attribute: Adjustment reason (Enumeration)
Notes: Lateness, additional credit.

Attribute: Assessor identity
Notes: An identifier of the person or software responsible for declaring the outcome. Although the word “assessor” is used, this could also apply to cases where an outcome declaration is essentially a ratification or verification act.

Attribute: Identifier for the hint resource

Attribute: Item response (an array of {interaction type, cardinality, interaction response, response type})
Notes: Based on the IMS QTI item/interaction model and vocabulary (see below).

Attribute: NumAttempt (Integer)
Notes: The index number of the attempt according to the delivery engine, if known. IMS QTI compliant systems are required to maintain this information but it is not reasonable to expect all tracked applications to do so.

Attribute: Observer identity
Notes: An identifier of the person or software responsible for observing the performance. There will usually be an outcome-determined record with the same agent.

Attribute: OutcomeList (an array of {outcome label, outcome value, outcome data type, outcome reference})
Notes: One or more outcomes, which may be associated with an assessment, part or item. This is substantially modelled on IMS QTI (see below).

Attribute: OutcomeExpected (Boolean)
Notes: An outcome is expected for this submission. A value of false would apply to a questionnaire. Although this could be inferred from the identity of the application (see “Identifiers”, above), an explicit attribute avoids the need for look-up tables.

Attribute: OutcomeLevel (Enumerated value: assessment, part, item)
Notes: Specifies whether the outcome record contains outcomes for individual items (typically item scores), an outcome for an assessment part, or an outcome for the entire assessment.

Attribute: OutcomeStatus (Enumerated value: provisional, final)
Notes: This allows for provisional outcomes to be appropriately marked, and final outcomes, determined by processes such as verification, ratification, moderation, etc., to be clearly identified.

Attribute: Resume (Boolean)
Notes: A flag to indicate previously saved responses were restored to the items presented in the Access event.

Attribute: KeyId
Notes: A unique identifier for the submission or observation, usable to ensure correlation of submission/observation with outcomes for cases where outcome processing is delayed and multiple submissions/observations are permitted. Timestamps should allow this correlation to be inferred; this identifier ensures the correlation is accurately known.

Table 2 - Attributes for Core Model Events
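To make Table 1 and Table 2 concrete, the following sketch shows a hypothetical Submit event and the corresponding “Outcome determined” event for a two-item objective quiz, written as TypeScript object literals. All identifiers and field names are invented; only the event actions and attribute meanings come from the tables above.

```typescript
// Illustrative sketch only: a Submit event and the corresponding
// "Outcome determined" event for a two-item objective quiz.
// All identifiers and field names are invented for the example.

const submitEvent = {
  action: "Submit",
  clientTime: "2014-11-20T10:32:05+01:00",
  deliverySystemTime: "2014-11-20T09:32:07Z",
  userId: "user-2841",
  sessionId: "sess-77f2",
  assessmentId: "aid-quiz-week3",           // Aid
  partId: "pid-1",                          // Pid
  itemIds: ["iid-q1", "iid-q2"],            // Array of Iid
  itemResponses: [
    { interactionType: "choiceInteraction", cardinality: "single",
      baseType: "identifier", response: "choiceB" },
    { interactionType: "textEntryInteraction", cardinality: "single",
      baseType: "string", response: "42" },
  ],
  keyId: "key-9c31",
  outcomeExpected: true,
};

const outcomeDeterminedEvent = {
  action: "Outcome determined",
  clientTime: "2014-11-20T10:32:06+01:00",
  deliverySystemTime: "2014-11-20T09:32:08Z",
  userId: "user-2841",
  sessionId: "sess-77f2",
  assessmentId: "aid-quiz-week3",
  outcomeLevel: "item",
  outcomeStatus: "provisional",
  itemIds: ["iid-q1", "iid-q2"],
  outcomeLists: [
    [{ label: "SCORE", value: 1, dataType: "integer" }],
    [{ label: "SCORE", value: 0, dataType: "integer" }],
  ],
  assessorId: "auto-scorer",
  keyIds: ["key-9c31"],
};
```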

Item Response Details

IMS QTI models an assessment item as one or more interactions. In many online quizzes or

questionnaires, there will only be a single interaction, for example a single-selection multiple choice.

Multiple interaction items allow for cases where the score for an item depends on the response to

both interactions. Hence an Item Response is an array, although commonly of length 1.

The base set of interaction types (which may be extended if necessary) is as defined in the IMS QTI

2.1 ASI Information Model. The cardinality value, drawn from the IMS QTI enumeration, captures the

difference between, for example, a multiple choice where only one option may be chosen and

where multiple options may be chosen; although these are often described as different “question

types”, the QTI approach of using interaction types with a qualifier is adopted.


The interaction response type should be drawn from the baseType enumeration of IMS QTI and the

encoding of the response should follow the QTI specifications for how choices, pairings, etc. are

expressed.

Some responses may be files18, for example an essay, photograph, presentation slides, video, etc.
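The following TypeScript sketch illustrates the two situations just described: an item with more than one interaction, and a file response located by URL. The type name and example values are invented; the interaction types, cardinalities, and base types shown follow the IMS QTI 2.1 vocabularies.

```typescript
// Illustrative sketch only: Item Responses for (a) an item with two related
// interactions and (b) an essay submitted as a file, located by URL.
// Names are invented; interaction and base types follow IMS QTI 2.1.

type ItemResponse = {
  interactionType: string;  // e.g. "choiceInteraction", "uploadInteraction"
  cardinality: "single" | "multiple" | "ordered" | "record";
  baseType: string;         // e.g. "identifier", "string", "file"
  response: string | string[];
}[];

// (a) Two related multiple-choice selections scored together as one item.
const twoInteractionItem: ItemResponse = [
  { interactionType: "choiceInteraction", cardinality: "single",
    baseType: "identifier", response: "choiceA" },
  { interactionType: "choiceInteraction", cardinality: "multiple",
    baseType: "identifier", response: ["choiceC", "choiceD"] },
];

// (b) An essay assignment: the file stays in the submission system and the
// analytics store only holds a URL to locate it (see footnote 18).
const essayItem: ItemResponse = [
  { interactionType: "uploadInteraction", cardinality: "single",
    baseType: "file", response: "https://example.org/submissions/essay-123.docx" },
];
```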

Outcome List Elements

For assessment based on objective questions, the computation of assessment-level outcomes from

item-level scores follows from mathematical formulae or algorithms, but for human-marked or

observation-based assessment there may be an equivalent structure in the form of a marking

scheme. Marking schemes have a role in learning analytics of non-objective assessment as they

naturally indicate strengths and weaknesses in relation to the intention of assessment. Marking

schemes may take a variety of forms:

- Marking guide in which the marker can freely assign a score against several dimensions of quality up to a specified maximum.
- A rubric19, a matrix approach to scoring performance in essays, competency examinations, etc. The matrix defines various dimensions of quality (attributes being assessed) and several descriptions of typical performance against each dimension that match certain scores or level values. For each dimension, the human marker chooses the description that best matches their subject to determine the score/level to assign to the outcome that corresponds to the dimension.
- Checklist of criteria, in which a yes/no decision maps to a non-zero/zero score for each criterion. This can be viewed as a special case of a rubric.

The approach to outcomes should allow for a level of detail beyond a simple summative score for

both objective item-based assessment and human-marked assessment against a scheme of some

kind. This leads to the following approach to capturing outcome information.

Each outcome for an assessment etc., and there may be several outcomes, comprises:

- outcome label: a name for the outcome, which may be specific to an assessment, testing application, institution, etc. This includes the outcome variable names, and their specified usage, as defined in the QTI specification (SCORE, DURATION20 and PASSED). The approach of using section or part identifiers as prefixes when designating section or part SCOREs etc., as described in the QTI specification, is not necessary since the scope of the outcome label is given by the combination of “outcome level” and “identifier” attributes of the “outcome determined” event. Outcome labels may be used to identify dimensions of quality in a marking guide or rubric.
- outcome value: the value of the outcome, e.g. an integer or decimal score, a letter grade, etc.
- outcome data type: the data type of the outcome value, using the baseType enumeration from IMS QTI.
- outcome reference: an optional reference, by URI, to an externally-defined21 learning objective, performance criterion, etc. This should be interpreted as an imprecise mapping, and not necessarily as an indicator that the external outcome was achieved.

18 In practice, these are expected to be stored in the system handling the submission and not to be duplicated into an analytics data store. Consequently, the assumption is that a URL will be used to locate the response.

19 The word “rubric” is also often used to refer to instructions given to candidates in an assessment.

20 DURATION should only be used when the test delivery engine tracks time spent.

It may be useful to determine which items were not answered (this is common in applications of

classical test theory), or to generalise this to assessments. An outcome label of MISSING is reserved

for this purpose. Items that are not answered should use this outcome label and other cases where

response(s) have not been submitted but where an outcome is recorded may use MISSING.

It is commonplace for assignments to be subjected to some form of originality evaluation, commonly

referred to as plagiarism detection software. Strictly speaking, the determination of plagiarism is

generally a human judgement informed by the results of an automated originality evaluation. The

result of originality evaluation may be captured using the reserved outcome label of “ORIGINALITY”.

In addition, software to support the electronic management of assessment often supports the use of

“comment banks”, i.e. pre-defined comments that may be selected by the marker. Since comments

are usually used when a weakness is identified, these allow for patterns of weakness to be explored

within a cohort, or between cohorts undertaking the same assessment task, etc. To accommodate

this use case an outcome label of “COMMENT_TAGS” is proposed to contain a space-separated list

of comment tags (assessment-scoped comment identifiers) to be associated with the outcome.
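As an illustration of the outcome structure just described, the sketch below shows a possible OutcomeList for a human-marked essay scored against a two-dimension rubric, with an originality score and comment tags. The labels other than SCORE, ORIGINALITY and COMMENT_TAGS, and all values and URIs, are invented for the example.

```typescript
// Illustrative sketch only: an OutcomeList for a rubric-marked essay.
// Outcome labels other than SCORE, ORIGINALITY and COMMENT_TAGS, and all
// values and URIs, are invented for the example.

type OutcomeEntry = {
  label: string;       // e.g. SCORE, a rubric dimension, ORIGINALITY, COMMENT_TAGS
  value: number | string | boolean;
  dataType: string;    // QTI baseType, e.g. "integer", "float", "string"
  reference?: string;  // optional URI to an externally-defined objective/criterion
};

const essayOutcomes: OutcomeEntry[] = [
  { label: "SCORE", value: 14, dataType: "integer" },
  { label: "ARGUMENT_QUALITY", value: 8, dataType: "integer",
    reference: "https://example.org/outcomes/module101/ilo-2" },
  { label: "USE_OF_SOURCES", value: 6, dataType: "integer" },
  { label: "ORIGINALITY", value: 0.93, dataType: "float" },
  // Space-separated, assessment-scoped comment identifiers.
  { label: "COMMENT_TAGS", value: "weak-referencing missing-conclusion", dataType: "string" },
];
```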

Possible Missing Pieces

These are features omitted from the Core Model but which may have some merit, and may need

further thought or discussion.

Activity Type

It may be useful in practice to know what stereotype the delivery application conforms to since this

will indicate the event patterns expected. There is likely to be some variation so an activity type

indication is likely to be just an “indication”, a hint. This is left as a missing piece because it is not

clear how to balance specificity of type vs number of type definitions required. At one end of the

spectrum, it may simply hint at the stereotype – e.g. be a label such as quiz, assignment, survey,

grading, etc. – and at the other becomes equivalent to an identification of the application, or maybe

further broken down. Indeed, it may be most useful to simply declare that Activity Type is an

application-specific vocabulary, so to allow analytics scripts to contain code to handle similarities

and differences according to local rules; attempts to specify stereotypes may be doomed.

Requesting Explanations

Some delivery systems can provide post-submission explanation that is intended to help the

candidate understand the problem with an incorrect response. It may be useful for the request of

such information to be recorded. Note: the act of requesting an explanation is assumed to be

educationally-relevant, rather than this information being automatically provided in the normal

course of events. The QTI endAttemptInteraction would lead to logging of a “request explanation”

event, in addition to submission and outcome events.

21 External to the assessment; this may be a reference to an intended learning outcome in an educational establishment’s module specifications, or to a state-wide educational standard.


Sequence Index of Items

Since items may sometimes be shuffled, it may be useful to record the sequence index of items to

indicate the actual order of presentation. This could be achieved by Iid ordering or a sequence index

being added to the Access event.

Assessment/item Metadata

Although a general principle of minimising metadata was advanced in the section “Minimal

Assessment Resource Metadata, No Assessment Resource Content”, this may have been applied

over-rigidly. In some cases, it may be practical to avoid adding additional attributes. For example an

outcome label MAXSCORE is indicated in the QTI documentation, but not referred to in the Core Model above.

Event Patterns, State Transitions and Lifecycles

The Core Model has avoided dealing with event sequences and their relationship to delivery system

state-transitions and the lifecycle of assessment and related processes. Practical implementation

work on capturing these events should attend to this temporal aspect. It is likely that there are some

recurrent patterns that are shared by similar kinds of activity; it would be useful to gather these to

support convergent practice, but it is felt to be over-speculative to propose such patterns in the

absence of evidence.

There is one existing state model in the IMS QTI 2.1 specification; it defines a state model for

compliant delivery engines, as well as an enumeration for sessionStatus in the QTI results reporting

specification (as indicated in Table 1). While not all “assessment and allied activities” will be QTI

compliant, and not all delivery engine transitions are necessarily useful for learning analytics – with a

learner/learning focus rather than a delivery-engine focus – a mapping would be generally

informative as well as being of particular relevance to QTI implementations.
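As a minimal starting point for such a mapping, the sketch below simply records the sessionStatus equivalences already noted in Table 1; it is indicative only and not a full delivery-engine state model.

```typescript
// Illustrative sketch only: the QTI Results Reporting sessionStatus values
// that Table 1 identifies as equivalent to the state following certain
// Core Model events. Indicative, not a complete delivery-engine mapping.

const sessionStatusAfterEvent: Record<string, string> = {
  "Leave": "pendingSubmission",
  "Submit": "pendingResponseProcessing",
  "Outcome determined": "final",
};
```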

4. Current Standards

The word “standards” is used loosely to include proposed generic data storage/access patterns.

PSLC DataShop

The Tutor Message format (v4) was consulted22, and found to contain a few details on assessment-

related semantics:

- The semantic_event accommodates RESULT, ATTEMPT, and HINT_REQUEST. A further free-form 30-character string permits expression of a subtype to the semantic event.
- The action_evaluation element has preferred values that include CORRECT and INCORRECT.
- TMF includes a skill data element, intended to associate a “knowledge component” (a concept specific to intelligent tutoring systems) with other data.

The mapping from existing TMF data to the Core Model is quite minimal. Alternatively, the Tutor

Message Format is general purpose, and it may be possible to profile it (describe how it should be

used, with definition of appropriate vocabularies) to fully express the Core Model.

22 http://pslcdatashop.web.cmu.edu/dtd/guide/tutor_message_dtd_guide_v4.pdf


ADL Experience API (xAPI)

The xAPI does not specify vocabularies for the events given in the Core Model but it does specify

data structures for interactions and outcomes/results and includes features that allow vocabularies

for event types, which are referred to as activity verbs in the xAPI specification. ADL and the xAPI

community expect that these vocabularies will be published online, separately from the core specification.

Concerning the Built-in Features

The relevant built-in features are listed below, with section references referring to xAPI v1.0.1

documentation.

- Interaction Activities (section 4.1.4). These are limited to interactions as defined in SCORM and this part of xAPI specifies how to describe the activity rather than the user’s activity.
- A Result object (section 4.1.5). This “represents a measured outcome related to the statement in which it is included.” This breaks the principle of atomic events (see the section “Assumptions”, above) since it includes the result as part of another statement of activity. In addition to breaking the principle – which is not, of course, un-challengeable – such bundling is not appropriate for some of the assessment and allied activities indicated in the introduction to this document, for example when assessment presentation, submission, and scoring are quite separate events, each with their own attributes.
- A Score object (section 4.1.5), which is part of the Result object. This handles only a single numerical outcome.

It would be possible to use Result and Score to capture some of the information in the Core Model in

some situations but, relative to the Core Model, a considerable loss of information would occur.

Correspondingly, it would not be possible to extract sufficient information from an xAPI statement to

express the events as per the Core Model. To get around this problem would require the addition of

quite a few extensions, essentially to capture a series of component events within an umbrella

assessment activity record.

In conclusion: if using xAPI to capture detailed assessment (and related) events, Result should be

avoided and externally-defined vocabularies preferred. This will give a more uniform approach than

using Results extensions, since it avoids making assessment a “special case”.

Concerning Externally-defined Vocabularies

The Tin Can Registry23 and the ADL xAPI vocabulary24 list event verbs that may be used with xAPI,

including “saved” and “submitted” (http://activitystrea.ms/schema/1.0/{save,submitted}), which are

borrowed from Activity Streams, and “answered” (http://adlnet.gov/expapi/verbs/answered). These

map on to the Core Model in the case when the object is an assessment or related activity. Another

verb is “completed”, also borrowed from Activity Streams, but the semantics of “completed” are not

a perfect match to the submission of an assignment; it is submission that would be tracked in

practice. There are also verbs “viewed” (http://id.tincanapi.com/verb/viewed) and “resumed” that

could be correlated with “Accessed” in the Core Model. The verbs “passed”, “failed”, and

“mastered” are essentially special cases of the Core Model concept of an outcome; they are

insufficient to cope with variety but could be used alongside, as commonly understood summative outcomes.

23 https://registry.tincanapi.com/#home/verbs

24 http://adlnet.gov/expapi/verbs/

An application profile for capturing assessment and allied events would have to nuance these

definitions. It may be cleaner to coin new verbs for the Core Model event types and this would be

necessary for some of them in any case.

The Tin Can Registry also includes a recipe entitled Checklist Performance Observation25, which

describes how a session of pass/fail assessments of a set of predetermined tasks would be recorded

using the Experience API with standard verbs, with their particular interpretation signalled by an

identifier for the recipe. This matches one of the use cases for the Core Model, and illustrates one

way in which the other use cases could be mapped to xAPI. It also suggests that the Core Model

could be taken forward as a series of recipes using common concepts to address various use cases;

the Core Model could remain as a resource for new recipes, but the existence of defined recipes

would make it easier to select templates in common situations.

It would require working through quite a few use-case driven examples to clarify the relationship

between the Core Model and xAPI + vocabularies, but the tentative conclusions are that:

- Both could be used for self-contained objective tests with simple numerical outcomes and basic skill assessments.
- It would be possible to create new verbs and usage recipes for xAPI along the lines of the Core Model.
- It would be necessary to use the xAPI extension feature to transport all of the attributes in Table 2, again with a requirement to create at least one URI for unique identification of type.
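By way of illustration, a Core Model Submit event for a single objective item might be expressed as an xAPI statement roughly as follows (shown here as a TypeScript object literal). The verb id and activity type are real ADL URIs; the context extension URIs, account details and activity ids are invented, and in line with the discussion above the Result object is not used.

```typescript
// Illustrative sketch only: one possible xAPI rendering of a Core Model
// "Submit" event. The verb id and activity type are real ADL URIs; the
// extension URIs, account details and activity ids are invented.

const submitStatement = {
  actor: {
    objectType: "Agent",
    account: { homePage: "https://lms.example.org", name: "user-2841" },
  },
  verb: {
    id: "http://adlnet.gov/expapi/verbs/answered",
    display: { "en-US": "answered" },
  },
  object: {
    objectType: "Activity",
    id: "https://lms.example.org/assessments/aid-quiz-week3/items/iid-q1",
    definition: {
      type: "http://adlnet.gov/expapi/activities/cmi.interaction",
      interactionType: "choice",
    },
  },
  // Core Model attributes carried as context extensions rather than Result.
  context: {
    extensions: {
      "https://example.org/xapi/ext/assessment-id": "aid-quiz-week3",
      "https://example.org/xapi/ext/key-id": "key-9c31",
      "https://example.org/xapi/ext/item-responses": [
        { interactionType: "choiceInteraction", cardinality: "single",
          baseType: "identifier", response: "choiceB" },
      ],
      "https://example.org/xapi/ext/outcome-expected": true,
    },
  },
  timestamp: "2014-11-20T10:32:05+01:00",
};
```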

IMS LIS

Overall, the scenario of use of LIS Outcomes is very different to the event tracking approach of the

Core Model. LIS is concerned with data synchronisation between student record systems and

learning management systems, and so is concerned with a small number of high-stakes summative outcomes. Nevertheless, these are of interest for learning analytics, so it might be useful to capture only these summative events into the same data store as, for example, video use, even if assessments are not

tracked in detail. LIS outcome data should also correlate precisely with final summative Core Model

outcome events.

The points of contact between the Core Model and LIS Outcomes are summarised as follows. As noted in

“Identifiers”, the assessment entity in the Core Model is identical to the LineItem concept as defined

in IMS LIS (Outcomes Management Service). If replicating LIS interactions as tracking events26, the

Core Model would be used to capture the Result on completion of CreateResult(),

CreateByProxyResult(), ReplaceResult(), and UpdateResult(). OutcomeStatus is an equivalent of LIS

statusofResult, and not lineItemType, which conceptually aligns with the outcome label.

25 https://registry.tincanapi.com/#profile/20/recipes

26 This should not be understood as a recommendation that LIS web service calls be replicated as tracking events; LIS activity is unlikely to be synchronous with learner events.

Page 18: Learning Analytics Community Exchange - LACE Projectlaceproject.eu › publications › public-drafts › wp7-assess.pdf · Double marking, moderation, and other managed quality assurance

Interoperability Study – Assessment And Allied Activities

16

The Core Model does not include an equivalent to the LIS ResultValue, which gives the permitted range of outcomes; this is a consequence of the principle of minimal metadata in the Core Model.
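
To illustrate how such a correlation could be realised, the following is a minimal sketch assuming a simplified dictionary representation on both sides; the key names are invented for illustration, and real LIS data is exchanged as XML via web service calls rather than as Python dictionaries.

# Hypothetical sketch only: mapping a (simplified) LIS Outcomes result record
# onto a Core Model outcome event. All field names are illustrative shorthand,
# not the normative LIS or Core Model (Table 2) names.
def lis_result_to_core_outcome_event(lis_result: dict) -> dict:
    """Map one simplified LIS result record to a Core Model outcome event."""
    return {
        "eventType": "Outcome",                          # assumed Core Model event label
        "assessment": lis_result["lineItemSourcedId"],   # LIS LineItem == Core Model assessment entity
        "learner": lis_result["personSourcedId"],
        "outcome": lis_result["resultScore"],
        "outcomeStatus": lis_result.get("statusofResult"),  # statusofResult, not lineItemType
        "timestamp": lis_result.get("dateTime"),
    }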

IMS Caliper

Work on IMS Caliper (http://imsglobal.org/caliper) is in progress and will be published by IMS on completion. Subject to this occurring during 2015, this “draft for public comment” document will be revised accordingly.

MOOCdb

MOOCdb (http://moocdb.csail.mit.edu/wiki/index.php?title=MOOCdb) includes some support for assessment and the documentation states:

“Due to the online nature of submissions, assessments are handled in different ways. Assessments could be done by the computer via simple check mechanisms or automated algorithms, peer review, evaluation by instructors and/or graders. For some courses multiple assessors are used. The MOOCdb schema captures this [these] situations.”

MOOCdb is, however, very much focussed on the submission event and a minimal representation of the outcome (it uses “assessment” to refer to the outcome) as a single floating-point number in the range 0-1. It also appears to lack information, such as an outcome status, that would be necessary in multiple-assessor or staged-assessment scenarios.

It appears to be possible to express MOOCdb data in the Core Model except that:

- MOOCdb lacks any specification of interaction types and response format.
- MOOCdb includes assessment structure and metadata (e.g. deadline, weighting) that were intentionally not included in the Core Model.

Expressing the Core Model in MOOCdb would be limited to the MOOCdb Submissions and Assessment tables (the Problems table contains structure and metadata) and would only be possible if the SCORE outcome is used. This transformation would lose quite a lot of information in many scenarios.
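
As a minimal sketch of such a lossy projection, assuming illustrative column names rather than the exact MOOCdb schema, and a simple dictionary form of the Core Model events:

# Hypothetical sketch: projecting a Core Model submission event and its SCORE
# outcome onto MOOCdb-style "submissions" and "assessments" rows. Column names
# are approximations for illustration; consult the MOOCdb documentation for
# the real table definitions.
def core_events_to_moocdb(submit_event: dict, outcome_event: dict):
    """Project a Core Model submission event and its SCORE outcome onto MOOCdb-style rows."""
    submission_row = {
        "user_id": submit_event["learner"],
        "problem_id": submit_event["assessment"],
        "submission_timestamp": submit_event["timestamp"],
    }
    assessment_row = {
        "problem_id": outcome_event["assessment"],
        # Only a single float in the range 0-1 can be carried, so a SCORE outcome
        # is required; outcome status, assessor information, etc. are lost here.
        "assessment_grade": float(outcome_event["outcome"]),
        "assessment_timestamp": outcome_event["timestamp"],
    }
    return submission_row, assessment_row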

5. Source Material

Technical

ADL Experience API (xAPI)

This is sometimes known as Tin Can API, from the project that initially developed it. The developers continue to provide information and software support to adopters.

Core resources are:

- The Experience API 1.0.1 specification - http://www.adlnet.gov/tla/experience-api/technical-specification/
- The ADL xAPI vocabulary - http://adlnet.gov/expapi/verbs/
- The Tin Can Registry - https://registry.tincanapi.com/#home/verbs

IMS LIS (Learning Information Services)


LIS comprises a number of parts, supporting the principal data exchanges between student record systems and learning management systems. Only the “outcomes service” is relevant to this study.

- The original LIS v1.0 specification - http://www.imsglobal.org/lis/lisv2p0p1/OMSInfoModelv1p0.html
- A public draft v1.0 showing a compatible use with IMS LTI - http://www.imsglobal.org/lti/ltiv1p2pd/ltiOMIv1p0pd.html

IMS QTI (Question and Test Interoperability)

Whenever “IMS QTI” is written in this document, the reference should be understood to refer to version 2.1. It is available from http://www.imsglobal.org/question/. Particular sections of relevance are:

- Implementation Guide
- Assessment Test, Section and Item Information Model (ASI)
- Results Reporting



6. About ...

Acknowledgements

The author would like to thank Brian Kelly and Tore Hoel for reviewing the v0.2.1 draft.

This document was produced with funding from the European Commission Seventh Framework Programme as part of the LACE Project, grant number 619424.

About the Author

Adam works for Cetis, the Centre for Educational Technology and Interoperability Standards, at the University of Bolton, UK. He rather enjoys data wrangling and hacking about with R. He is a member of the UK Government Open Standards Board and of the Information Standards Board for Education, Skills and Children’s Services, and is a strong advocate of open standards and open system architecture. Adam is leading the work package on interoperability and data sharing.

About this document

(c) 2014, Adam Cooper. Licensed for use under the terms of the Creative Commons Attribution v4.0 licence. Attribution should be “by Adam Cooper, for the LACE Project (http://www.laceproject.eu)”.

For more information, see the LACE Publication Policy: http://www.laceproject.eu/publication-policy/. Note, in particular, that some images used in LACE publications may not be freely re-used.

This is a public draft document for comment; the latest version and an explanation of how to comment are available from http://www.laceproject.eu/dpc/assessment-events-learning-analytics-interoperability-study/. The final version will be linked to from there.

About LACE

The LACE project brings together existing key European players in the field of learning analytics and educational data mining who are committed to building communities of practice and sharing emerging best practice in order to make progress towards four objectives.

Objective 1 – Promote knowledge creation and exchange

Objective 2 – Increase the evidence base

Objective 3 – Contribute to the definition of future directions

Objective 4 – Build consensus on interoperability and data sharing

http://www.laceproject.eu @laceproject