
Specialist Diploma in Applied Learning and Teaching

Module: Conduct and Review of Assessment

© 2013 Adrian Chow. All rights reserved.

Section A: CONDUCTING ASSESSMENT

Purpose of Practical Test

The driving test (theory and practical) constitutes a performance assessment encompassing three dimensions: (a) knowledge, (b) skills, and (c) affective components, each requiring different test points. For example, the remembering level of Bloom's Taxonomy (e.g. traffic signs) is assessed through the theory test, whereas the actual psychomotor skills of engaging the gears and brakes, together with attitude (road safety), are assessed through the practical test.

The purpose of the Practical Driving Test (PDT) centres on assessment of learning against the intended learning outcomes, i.e. surfacing evidence that cannot be obtained in the theory test segments: for example, the candidate's reactions under real traffic conditions, such as overtaking vehicles or selecting gears to match speed in heavy or light traffic. Here, the PDT serves to maintain standards as per stakeholder requirements (government stipulations on certification of Class 3 drivers), certifying that students have achieved certain standards and selecting students (differentiating competent from incompetent drivers).

The PDT complements the basic and advanced theory written tests as a summative assessment on an outcome-based education (OBE) basis, reflecting the nature of the subject matter being tested: the candidate must demonstrate skills in all three dimensions, given that the objective of the assessment is to ensure the candidate is road-worthy. The dimensions can be interlinked; for example, checklist item 41 requires the candidate to demonstrate both cognitive skill (understanding of pedestrian crossings) and an attitude of pedestrian safety, since attaining road-worthy status centres on the candidate's ability to practise safe driving, i.e. posing no danger to oneself or other road users.

In addition, the PDT merges with formative assessment should failure occur, given the presence of elements of assessment for learning: the assessor and candidate diagnose performance gaps and specific ways of improvement from the actual on-road performance.

Strengths of written test modes

Assessment can be done for large groups, and less effort is required for grading, for example when using MCQs as the test format.

Enables testing of different levels of learning order, for example remembering (traffic signs) and application (scenario-based questions).

Higher validity through assessing specific skills linked to learning outcomes, for example a question aimed at the understanding level: “Do you pull the handbrake while parking?”

Lower subjectivity when applying restricted-response answers, e.g. “What is the speed limit when driving on an expressway?”

Ensures the candidate's understanding of traffic signs and reaction to typical traffic scenarios is certified before actual practical application, enhancing the candidate's preparedness on the road, as the safety of both the candidate and other road users is of utmost importance.

Limitations of written test modes

May be open to guessing (MCQs) and bluffing (short-answer questions).

Validity and reliability of the test mode are subject to the expertise of the question developer, i.e. the ability to draft questions centred on specific learning outcomes.

Not appropriate for assessing psychomotor and affective skills, for example coordination of hand and feet when engaging gears, or display of safety attitudes (keeping a safe following distance from the vehicle in front).

Strengths of practical test modes

Enables assessment under real-life situations where the skills attained are applied in preparation for life, i.e. the student's ability to drive a car safely and competently in traffic.

Focuses on both process and product assessment, i.e. specific tasks (reversing, braking) as well as safe and competent driving both in-circuit and on the road, via a detailed industry-standard checklist for recording and scoring with a clear marking scheme.

Provides motivation for students, as the assessment encourages the application of learning in a natural setting: students can see the process and product outcomes of attained skills (engaging the clutch, monitoring traffic via the side mirrors) and their relevance to their role as a driver.

Enables evaluation of complex learning outcomes and skills that cannot be evaluated with written tests, through extended performance tasks that are broader, less structured, and assess a broad range of skills. For example, making a safe U-turn in heavy traffic and rainy conditions requires the interaction of traffic-reading knowledge, vehicle-handling skills, and a safety attitude.

Limitations of practical test modes

Requires considerable time and effort to administer and grade; for example, the assessor must observe and record the candidate's correct and incorrect actions and attitudes to support the awarding of comments and penalty points, while ensuring his own and the candidate's safety given the actual movement of the vehicle.

Reliance on the integrity and expertise of the assessor, i.e. being a subject matter expert (SME) in the stipulated knowledge, skills and abilities (KSA) of the checklist, with sound judgement in assessing the attitude domain, e.g. is wrong gear usage due to nervousness or lack of awareness?

Judgement and scoring may be subjective due to the halo or horn effect (impressions of the candidate) and bias (stereotyping of lady drivers).

Uncatered-for events may arise, for example assessor fatigue, road conditions, familiarity, breakdown of the car, and the impact of other drivers (giving way to or obstructing the candidate's manoeuvres).

Use of the Checklist

The checklist serves as a rubric for evaluating the quality of the candidate's responses across the knowledge, skills and affective components (performance skills) in the PDT.

The checklist enables acceptable responses to be distinguished from unacceptable ones through evaluative criteria (e.g. mounting the kerb) and quality definitions, i.e. the way qualitative differences in students' responses are to be judged (striking the kerb incurs a 10-point penalty, whereas mounting the kerb is an immediate failure). Its scoring strategy encompasses both an analytic approach (judgement on each criterion, e.g. rolling backwards) and a holistic approach (based on overall impression, for tasks that are more than the sum of their components: a candidate may be proficient in psychomotor skills yet lack the required safety attitude).
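
A minimal sketch of how such a combined analytic/holistic scoring scheme could work in code is shown below. The item names, point values and the 20-point failure cut-off are illustrative assumptions, not the official Traffic Police scheme:

```python
# A minimal sketch (not the official Traffic Police scheme) of how a PDT
# checklist can combine analytic scoring (per-item demerit points) with
# holistic pass/fail rules (immediate-failure items). Item names, point
# values and the 20-point cut-off are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    description: str
    points: int              # demerit points if the fault is observed
    immediate_failure: bool  # e.g. mounting the kerb

CHECKLIST = [
    ChecklistItem("Strike kerb", 10, False),
    ChecklistItem("Mount kerb", 0, True),
    ChecklistItem("Fail to signal in good time", 4, False),
    ChecklistItem("Roll backwards on slope", 6, False),
]

FAIL_THRESHOLD = 20  # assumed cut-off for accumulated demerit points

def assess(observed_indices):
    """Return (passed, total_points) for the checklist items ticked."""
    ticked = [CHECKLIST[i] for i in observed_indices]
    if any(item.immediate_failure for item in ticked):
        return False, None          # holistic rule overrides the point count
    total = sum(item.points for item in ticked)
    return total < FAIL_THRESHOLD, total

print(assess([0, 2]))  # (True, 14): faults recorded, but still a pass
print(assess([1]))     # (False, None): immediate failure
```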

Here, the checklist facilitates the gathering of evidence that the candidate displays high-level skills on tasks that are important and authentic (resembling real-world challenges), i.e. being able to drive a car safely under real traffic conditions independently.

Explanation for how the checklist helps the candidate.

The checklist makes the candidate aware of the areas and weightage of performance to be tested (validity and fairness), based on the learning outcomes of the course. For example, proceeding on a red light ranks as an immediate failure given its safety element, versus incorrect braking technique (a demerit of 2 points).

Deep learning occurs as the candidate is able to evaluate his/her own performance against the stipulated performance tasks, which supports assessment for learning (tester feedback at the end of the PDT) and assessment as learning (students use assessment information to adapt their learning process and develop new understanding of driving).

Motivation increases for candidates through clarified assessment goals and more meaningful learning (learning for life), i.e. relevance to future pursuits (as a competent driver).

Explanation for how the checklist helps the tester.

The checklist with scoring is useful for assessment of learning because it contains clear attributes, i.e. qualitative descriptions of the underlying characteristics of each performance criterion, linked to the learning outcomes.

The checklist serves as a “roadmap” for the tester to unearth evidence by observing candidate performance in a sequential manner, i.e. circuit assessment followed by on-road assessment. In addition, vital tasks are repeated in both the circuit and on-road assessments, with a focus on the safety element (e.g. failing to signal in good time).

The checklist guides the tester in covering the performance aspects that are necessary for successful performance and applicable in various contextual settings.

The tester can utilise the checklist for diagnostic purposes, i.e. giving feedback to the candidate on performance and highlighting the performance gaps that need to be closed, in the event of both a pass and a fail.

The checklist supports the tester in awarding a pass/fail status through documentation of observed skills, maintaining standards as per stakeholder requirements (Traffic Police).

Limitations of the checklist as an assessment rubric lie in:

Misinterpretation of the language used, and issues of inter- and intra-rater reliability given assessors' differing perceptions of what constitutes a mistake or safe driving.

Subjectivity arising from the halo and horn effects, i.e. first impressions, initial performance, and stereotyping (women drivers, teenagers).

The assessor is required to be an expert in the subject matter domain, as scoring is based on judgement of the candidate's performance. For example, the assessor must be able to judge candidates' responses under varying road conditions and take a holistic view of what constitutes safe and competent driving.

Section B: INTERPRETING STUDENT RESPONSES

Number of markers                       3
Number of responses in Batch 1          30
Number of responses in Batch 2          30
1st IRR coefficient score (Batch 1)     0.567
2nd IRR coefficient score (Batch 2)     0.615
Benchmarking Excel File for Batch 1     See attached Excel File
Benchmarking Excel File for Batch 2     See attached Excel File

Critique

i. Analyse the significance of the variation in the IRR coefficient scores.

The IRR coefficient score increased from 0.567 (Batch 1) to 0.615 (Batch 2).

The variation in the scores arises from increased inter-rater and intra-rater reliability as the benchmarker worked through the responses where markers awarded different marks (Questions 4 and 28). Subjectivity in the interpretation of the questions and in the perception of the rubrics gives rise to the variation. Batch 2 shows a higher IRR score because the markers gained a better understanding of the questions and rubrics, resulting in a more consistent interpretation of the students' responses (intra-rater reliability).
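
As an illustration, the sketch below computes one common IRR statistic, the average pairwise Cohen's kappa across the three markers; the assignment does not state which coefficient was actually used, and the score excerpt is hypothetical:

```python
# A sketch of one possible IRR statistic: the average pairwise Cohen's
# kappa across the three markers. The scores below are a hypothetical
# excerpt; the real data sits in the attached Excel files.
from collections import Counter
from itertools import combinations

def cohen_kappa(a, b):
    """Cohen's kappa for two markers' scores on the same responses."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n           # raw agreement
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

def average_pairwise_kappa(scores_by_marker):
    pairs = list(combinations(scores_by_marker, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

batch = [            # rows = markers, columns = responses (hypothetical)
    [1, 2, 1, 2, 1],
    [2, 2, 1, 1, 1],
    [2, 2, 1, 2, 1],
]
print(round(average_pairwise_kappa(batch), 3))
```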

ii. Discuss the implication on the effectiveness of the benchmarking exercise.

The benchmarking exercise enables resolution of the subjectivity in interpreting the questions, answers and rubrics by giving markers the opportunity to see where others agree or disagree in interpreting student performance; highlighting the differences in the scoring of the various questions facilitates the development of a common understanding of the questions and the marking scheme.

The role of the benchmarker as a neutral party permits the voicing of rationales for mark allocation given the markers' differing levels of subject matter expertise (SME) in the topics, enabling markers to reflect on their initial interpretations and thus achieve higher intra-rater reliability in light of the new understanding.

iii. Justify possible reasons for the observed IRR coefficient scores.

Reasons for the observed coefficient scores centre on:

Different interpretations and understanding of the questions, answers and rubrics by markers.

Overly lenient or overly stringent marking.

The marker's expertise in assessing, e.g. lack of knowledge of the topic or of how to use the assessment rubrics.

Subjectivity in marking, i.e. halo and horn effects from the impression formed by initial answers (the first few questions).

Human variables, i.e. fatigue, or overzealous adherence by assessors to self-conceived “fairness” in marking.

iv. Suggest EITHER how to improve the benchmarking exercise OR what can be repeated for successful benchmarking exercises in future.

The benchmarking process may be improved by understanding the objectives of the exercise, i.e. achieving an inter-rater coefficient near 0.7 for reliable marking.

The principal assessor should facilitate discussion of the differences and refine the rubrics, with consensus on the awarding of marks.

Ensure equality in the voicing of opinions, i.e. guard against a dominant member imposing views through designation or subject expertise.

Agree on the selection of the benchmarker, preferably a knowledge expert in the course accepted by the team members, to minimise conflicts and direct the benchmarking exercise towards fulfilment of its objectives.

Minimise human error by preventing assessment fatigue through selecting an appropriate sample size, and by avoiding contamination from irrelevant variables, e.g. comparisons with past cohorts' answers or best answers.

Benchmarking Excel File for Batch 1 and Batch 2

Question   Batch 2 Scores (M1 M2 M3)   Batch 1 Scores (M1 M2 M3)
1          1  2  2                     1  2  2
2          2  2  2                     2  2  2
3          1  1  1                     1  1  1
4          1  1  1                     2  1  2
5          1  1  1                     1  1  1
6          1  1  1                     1  1  1
7          1  2  1                     1  2  1
8          2  2  2                     2  2  2
9          2  2  1                     2  2  1
10         2  2  2                     2  2  2
11         1  0  0                     1  0  0
12         2  2  2                     2  2  2
13         1  1  2                     1  1  2
14         2  1  2                     2  1  2
15         2  2  2                     2  2  2
16         0  1  0                     0  1  0
17         2  2  2                     2  2  2
18         2  2  2                     2  2  2
19         1  1  1                     1  1  1
20         2  1  2                     2  1  2
21         2  2  2                     2  2  2
22         0  1  0                     0  1  0
23         2  2  2                     2  2  2
24         1  1  1                     1  1  1
25         2  2  2                     2  2  2
26         1  1  1                     1  1  1
27         1  1  1                     1  1  1
28         1  1  1                     2  1  1
29         1  1  1                     1  1  1
30         2  0  2                     2  0  2

(M1–M3 denote the three markers.)

Section C: ANALYSING STUDENT RESULTS

Cronbach’s Alpha Value

Type in the Cronbach’s Alpha Value

0.517 (Set E)

Type in your analysis of the reliability of the test

Item-Total Statistics

Question   Cronbach's Alpha if Item Deleted
Q1         .231
Q2         .226
Q3         .838
Q4         .404

The reliability of the scores leans towards the lower range, as a higher value reflects greater reliability. Analysis of the scores indicates that Q1 and Q2 add to the reliability of the total test scores: removing Q1 would reduce the overall reliability to .231, and removing Q2 would reduce it to .226.

Question 3 does not add to the reliability, and removing it would raise the total score to a good reliability of .838.
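
A minimal sketch of how Cronbach's alpha and the alpha-if-item-deleted figures are computed is shown below; the dataset is hypothetical (the actual Set E responses are in the attached file):

```python
# A minimal sketch of Cronbach's alpha and "alpha if item deleted" for a
# four-question test. The student scores below are hypothetical; the
# actual Set E dataset is in the attached file.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)   # sample variance

def cronbach_alpha(rows):
    k = len(rows[0])                        # number of items
    item_var = sum(variance(list(col)) for col in zip(*rows))
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var / total_var)

rows = [              # one row of Q1..Q4 scores per student (hypothetical)
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [2, 2, 2, 1],
    [1, 2, 3, 2],
    [2, 2, 4, 2],
]
print("alpha:", round(cronbach_alpha(rows), 3))
for i in range(len(rows[0])):               # alpha if item i is deleted
    reduced = [[v for j, v in enumerate(r) if j != i] for r in rows]
    print(f"alpha if Q{i + 1} deleted:", round(cronbach_alpha(reduced), 3))
```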

Q1

Type in your calculations for the test results of your given dataset for Q1

Marks %

Q. No.   1 mark   2 marks   3 marks   4 marks
Q1       18%      64%       0%        0%

Describe its level of difficulty

Test Specifications

Question      Full Marks   Marks allotted across levels
Q1            2            1 + 1
Q2            2            1 + 1
Q3            4            2 + 2
Q4            2            1 + 1
Total Marks   10           Competent 3, Proficient 4, Advanced 3

Question 1 indicates that 82% of students attained at least 1 mark and 64% attained 2 marks, which suggests the question is set at an appropriate level of difficulty.
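
The cumulative-attainment arithmetic used in these difficulty analyses (converting the per-mark percentages in the tables into “at least m marks” percentages) can be sketched as:

```python
# The cumulative-attainment arithmetic used in the difficulty analyses:
# turning "% of students scoring exactly m marks" into "% scoring at
# least m marks". The Q1 row from the table above is used as input.

def cumulative_attainment(pct_at_mark):
    """pct_at_mark[i] = % of students scoring exactly i+1 marks."""
    return [sum(pct_at_mark[i:]) for i in range(len(pct_at_mark))]

q1 = [18, 64, 0, 0]               # % scoring exactly 1, 2, 3, 4 marks
print(cumulative_attainment(q1))  # [82, 64, 0, 0] -> 82% got at least 1 mark
```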

Q2

Type in your calculations for the test results of your given dataset for Q2

Marks %

Q. No.   1 mark   2 marks   3 marks   4 marks
Q2       0%       64%       0%        0%

Describe its level of difficulty

Question 2 indicates that 64% of students attained 2 marks, which is aligned with the test specification table's allocation of 2 marks for advanced-level status. This lends credence to its Cronbach's alpha figure, i.e. removing this question would lower the reliability of the total test scores.

Q3

Type in your calculations for the test results of your given dataset for Q3

Marks %

Q. No.   1 mark   2 marks   3 marks   4 marks
Q3       20%      20%       20%       20%

Describe its level of difficulty

Question 3 indicates 80% of students achieved at least 1 mark, 60% at least 2 marks, 40% at least 3 marks, and 20% all 4 marks. The question is significantly above the intended level of difficulty, given that only 20% of students achieved 4 marks.

Q4

Type in your calculations for the test results of your given dataset for Q4

Marks %

Q. No.   1 mark   2 marks   3 marks   4 marks
Q4       34%      32%       0%        0%

Describe its level of difficulty

Question 4 indicates 66% of students achieved at least 1 mark, with 32% achieving 2 marks. The question is significantly above the intended level of difficulty, as a high percentage of students (68%) were unable to achieve 2 marks.

Type in your evaluation of how well the test differentiates students’ performance in accordance with the test specification.

The test does not fully align with the test specification objectives, as Questions 3 and 4 are set above the intended level of difficulty. This may stem from the quality of the test questions, or from testing outcomes that were not communicated to the students during the instructional stage; acceptance of the test results may require moderation.

There is a need to review Questions 3 and 4 given their negative impact on the reliability of the total test scores. The test raises concerns about student ability and question difficulty, i.e. the low percentage of students achieving proficient- and advanced-level scores (a mismatch between expected and actual performance in the test), which raises the question: did the course achieve its objective of equipping students with the needed knowledge, skills and abilities (transfer of learning)?

Section D: PRESENTING AND COMMUNICATING STUDENT RESULTS

The process of communicating results to students should be guided by the reason for assessment, and alignment between that reason and the reporting approach adopted is critical.

The process of communicating results in this case encompasses:

Reporter (evaluative; assessment of learning), centred on providing evidence of achievement, reporting results at the end of key stages (summative assessment), and judging and reporting students’ performance in relation to course standards through classifying and labelling with symbols (A, B, C, D or F).

Assessment as learning, centred on developing and supporting students’ active participation in their own learning through student involvement, i.e. self-assessment by students (viewing the marks awarded for each answer they provided) and opportunities to improve learning (a feedback package containing common mistakes and samples of good answers).

Here, it seems the reason for assessment lies in summative evaluation of students’ understanding of a particular topic, with the approach focused on sorting students into “bands” of proficiency (A, B, C, D, F). The approach appears clinical in nature, with the onus of learning from the test results resting on students’ acumen in evaluating their own gaps in knowledge.

In addition, there is a lack of qualitative feedback, i.e. comments that would enhance learning, as not all students are capable of self-directed learning; some may require both hard and soft scaffolding strategies. For example, content-related comments would enable students to evaluate their answers against the “samples of good answers” provided.

The case shows a high probability that a descriptive feedback approach (assessment for learning) is absent, as seen in the lack of elaboration of what the students have or have not achieved and the non-existence of specific ways to improve. This may derive from the summative nature of the test, and it can be assumed that a mechanistic approach exists for providing information on the reassessment process to the students, i.e. the feedback package.

Good feedback pivots on offering students “direction for moving their learning forward” (Earl, 2003, p. 90). This entails specifying what students need to gain from the feedback, aligned with course objectives and learning outcomes. Ways to improve the feedback process include:

Adoption of the ICE approach: getting students to reflect on the fundamentals and basic facts (Ideas), drawing relationships and patterns between ideas (Connections), and using new knowledge to extend ideas and concepts (Extensions). This requires a change agent to specify ways of improvement via a diagnostic approach, with the identification of learning gaps that are criterion-referenced and involve both teacher and student. For example, the teacher acts as a “well of knowledge” to be tapped by students as they reflect on why answers they deemed correct were in fact wrong.

Complementing quantitative analysis with qualitative comments and diagnosis for structured and free essays, to facilitate students’ reflection and opportunities to improve learning. For example, beyond stating marks: clear, constructive comments on the appropriateness of answers, and suggestions of other approaches to answering the questions.

Moving towards a learner-centred environment by providing opportunities for students to voice the “forms” of feedback that would be conducive to enhancing learning, for example flagging overly complex terminology in comments, or requesting more specific criterion-based feedback.

ANNEX A

Annex A.pdf

ANNEX B

Annex B.pdf

END OF PAPER