Validity and Reliability: Neither Valid nor Reliable; Reliable but not Valid; Valid & Reliable; Fairly Valid but not very Reliable. Think in terms of ‘the purpose of tests’ and the ‘consistency’ with which the purpose is fulfilled/met.

Upload: joseph-oliver

Post on 23-Dec-2015


Page 1

Validity and Reliability

Neither Valid nor Reliable

Reliable but not Valid

Valid & Reliable

Fairly Valid but not very Reliable

Think in terms of ‘the purpose of tests’ and the ‘consistency’ with which the purpose is fulfilled/met

Page 2

Validity
Depends on the PURPOSE
E.g. a ruler may be a valid measuring device for length, but isn’t very valid for measuring volume
Measuring what ‘it’ is supposed to
Matter of degree (how valid?)
Specific to a particular purpose!
Must be inferred from evidence; cannot be directly measured
Learning outcomes
1. Content coverage (relevance?)
2. Level & type of student engagement (cognitive, affective, psychomotor) – appropriate?

Page 3

Reliability

Consistency in the type of result a test yields
Time & space
Participants

Not a perfectly similar result, but ‘very close to’ being similar

When someone says you are a ‘reliable’ person, what do they really mean?

Are you a reliable person?

Page 4

What do you think…?
Forced-choice assessment forms are high in reliability, but weak in validity (true/false)
Performance-based assessment forms are high in both validity and reliability (true/false)
A test item is said to be unreliable when most students answered the item wrongly (true/false)
When a test contains items that do not represent the content covered during instruction, it is known as an unreliable test (true/false)
Test items that do not successfully measure the intended learning outcomes (objectives) are invalid items (true/false)
Assessments that do not represent student learning well enough are definitely invalid and unreliable (true/false)
A valid test can sometimes be unreliable (true/false)
If a test is valid, it is reliable! (by-product)

Page 5

Question…

In the context of what you understand about VALIDITY and RELIABILITY, how do you go about establishing/ensuring them in your own test papers?

Page 6

Indicators of quality

Validity
Reliability
Utility
Fairness

Question: how are they all inter-related?

Page 7

Types of validity measures

Face validity
Construct validity
Content validity
Criterion validity
1. Predictive
2. Concurrent
Consequences validity

Page 8

Face Validity
Does it appear to measure what it is supposed to measure?

Example: Let’s say you are interested in measuring ‘Propensity towards violence and aggression’. By simply looking at the following items, state which ones qualify to measure the variable of interest:
Have you been arrested?
Have you been involved in physical fighting?
Do you get angry easily?
Do you sleep with your socks on?
Is it hard to control your anger?
Do you enjoy playing sports?

Page 9

Construct Validity
Does the test measure the ‘human’ CHARACTERISTIC(s) it is supposed to?
Examples of constructs or ‘human’ characteristics:
Mathematical reasoning
Verbal reasoning
Musical ability
Spatial ability
Mechanical aptitude
Motivation

Applicable to PBA/authentic assessment
Each construct is broken down into its component parts
E.g. ‘motivation’ can be broken down to:
Interest
Attention span
Hours spent
Assignments undertaken and submitted, etc.
All of these sub-constructs, put together, measure ‘motivation’

Page 10

Content Validity

How well do elements of the test relate to the content domain?

How closely does the content of questions in the test relate to the content of the curriculum?

Directly relates to instructional objectives and the fulfillment of the same!

Major concern for achievement tests (where content is emphasized)

Can you test students on things they have not been taught?

Page 11

How to establish Content Validity?

Instructional objectives (looking at your list)
Table of Specification
E.g. At the end of the chapter, the student will be able to do the following:
1. Explain what ‘stars’ are
2. Discuss the types of stars and galaxies in our universe
3. Categorize different constellations by looking at the stars
4. Differentiate between our star, the Sun, and all other stars

Page 12

Table of Specification (An Example)

                          Categories of Performance (Mental Skills)
Content areas             Knowledge    Comprehension    Analysis    Total
1. What are ‘stars’?
2. Our star, the Sun
3. Constellations
4. Galaxies
Total                                                               Grand Total
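A table of specification like the one above is filled in with the number of test items planned for each content area and mental skill. The following is a minimal sketch of how the row, column, and grand totals could be tallied. The item counts are hypothetical and are not taken from the slides; only the content areas and skill categories come from the example.

```python
# Hypothetical item counts per content area and mental skill.
# The numbers are illustrative assumptions, not values from the slides.
skills = ["Knowledge", "Comprehension", "Analysis"]
blueprint = {
    "1. What are 'stars'?": {"Knowledge": 3, "Comprehension": 2, "Analysis": 0},
    "2. Our star, the Sun": {"Knowledge": 2, "Comprehension": 2, "Analysis": 1},
    "3. Constellations": {"Knowledge": 2, "Comprehension": 1, "Analysis": 2},
    "4. Galaxies": {"Knowledge": 1, "Comprehension": 2, "Analysis": 2},
}

# Row totals: how many items each content area receives.
row_totals = {area: sum(counts[s] for s in skills) for area, counts in blueprint.items()}

# Column totals: how many items target each mental skill.
column_totals = {s: sum(counts[s] for counts in blueprint.values()) for s in skills}

# The grand total must be the same whether summed by rows or by columns.
grand_total = sum(row_totals.values())
assert grand_total == sum(column_totals.values())

# Print the completed table.
print(f"{'Content area':<24}" + "".join(f"{s:>15}" for s in skills) + f"{'Total':>8}")
for area, counts in blueprint.items():
    cells = "".join(f"{counts[s]:>15}" for s in skills)
    print(f"{area:<24}{cells}{row_totals[area]:>8}")
print(f"{'Total':<24}" + "".join(f"{column_totals[s]:>15}" for s in skills) + f"{grand_total:>8}")
```

Comparing the row totals against the time spent teaching each content area is one way to check whether the test samples the content domain in proportion to instruction.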

Page 13

Criterion Validity

The degree to which performance on a test (predictor) correlates with performance on relevant criterion measures (a concrete criterion in the "real" world)

If they do correlate highly, it means that the test (predictor) is a valid one!

E.g. if you taught skills relating to ‘public speaking’ and had students do a test on it, the test can be validated by looking at how it relates to actual performance (public speaking) of students inside or outside of the classroom
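The correlation between a test and its criterion can be computed directly. The sketch below assumes Pearson's product-moment correlation as the measure of association and uses made-up scores for the public speaking example; the function name `pearson_r` and both score lists are illustrative assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / sqrt(var_x * var_y)

# Hypothetical data for six students (not from the slides):
# written test on public speaking (predictor) and rated speeches (criterion).
test_scores = [55, 62, 70, 74, 81, 90]
speech_ratings = [50, 58, 65, 72, 85, 88]

validity_coefficient = pearson_r(test_scores, speech_ratings)
print(round(validity_coefficient, 2))
```

A coefficient close to 1 would support the test as a predictor of the criterion; a coefficient near 0 would suggest the test does not track actual public speaking performance.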

Page 14

Two Types of Criterion Validity

Concurrent Criterion Validity = how well performance on a test estimates current performance on some valued measure (criterion)? (e.g. a test of dictionary skills can estimate students’ current skills in the actual use of a dictionary – observation)

Predictive Criterion Validity = how well performance on a test predicts future performance on some valued measure (criterion)? (e.g. a reading readiness test might be used to predict students’ achievement in reading)

Both are only possible IF the predictors are VALID

Page 15

Consequences Validity

The extent to which the assessment served its intended purpose

Did the test improve performance? Motivation? Independent learning?

Did it distort the focus of instruction?

Did it encourage or discourage creativity? Exploration? Higher order thinking?

Page 16

Factors that can lower Validity
Unclear directions
Difficult reading vocabulary and sentence structure
Ambiguity in statements
Inadequate time limits
Inappropriate level of difficulty
Poorly constructed test items
Test items inappropriate for the outcomes being measured
Tests that are too short
Improper arrangement of items (complex to easy?)
Identifiable patterns of answers
Teaching
Administration and scoring
Students
Nature of criterion

Page 17

Reliability

Measure of consistency of test results from one administration of the test to the next

Generalizability – consistency (interwoven concepts) – if a test item is reliable, it can be correlated with other items to collectively measure a construct or content mastery

A component of validity

Length of assessment

Page 18

Measuring Reliability

Test-retest
Give the same test twice to the same group with any time interval between tests

Equivalent forms (similar in content, difficulty level, arrangement, type of assessment, etc.)
Give two forms of the test to the same group in close succession

Split-half
Test has two equivalent halves. Give test once, score two equivalent halves (odd items vs. even items)

Cronbach Alpha (SPSS)
Inter-item consistency – one test – one administration

Inter-rater Consistency (subjective scoring)
Calculate the percent of exact agreement, or use Pearson's product moment and find out the coefficient of determination (SPSS)
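The slides point to SPSS for these calculations. As a rough illustration of what the measures compute, here is a minimal sketch in plain Python. It assumes the standard textbook formulas: the Spearman-Brown correction for the split-half estimate (a step the slides do not mention) and the usual variance-based formula for Cronbach's alpha. All function names and the sample scores are hypothetical.

```python
from math import sqrt
from statistics import pvariance

def pearson_r(x, y):
    """Pearson correlation; also serves as the test-retest or equivalent-forms coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def split_half(item_scores):
    """Odd-even split-half reliability, stepped up with the Spearman-Brown correction.

    item_scores holds one row per student and one column per item.
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(item_scores):
    """Inter-item consistency from a single administration of one test."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def percent_exact_agreement(rater_a, rater_b):
    """Share of scripts (in percent) to which two raters gave exactly the same score."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100 * matches / len(rater_a)

# Hypothetical right/wrong (1/0) scores: 5 students by 4 items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
print(round(split_half(scores), 2), round(cronbach_alpha(scores), 2))

# Hypothetical rubric scores from two raters marking the same four essays.
print(percent_exact_agreement([3, 4, 2, 5], [3, 4, 3, 5]))
```

For test-retest or equivalent forms, `pearson_r` would be called with the two sets of total scores from the same group; squaring its result gives the coefficient of determination.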

Page 19

How to improve Reliability?

Quality of items: concise statements, homogeneous wording (some sort of uniformity)

Adequate sampling of content domain; comprehensiveness of items

Longer assessment – less distorted by chance factors

Developing a scoring plan (esp. for subjective items – rubrics)

Ensure VALIDITY
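The point about longer assessments is often quantified with the Spearman-Brown prophecy formula, which the slides do not name. The sketch below assumes that formula and assumes the added items are comparable in quality to the existing ones; the function name and the example figures are illustrative only.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    length_factor = 2 means the test is doubled with comparable items;
    length_factor = 0.5 means it is cut in half.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Hypothetical case: a short quiz with reliability 0.60 is doubled in length.
predicted = spearman_brown(0.60, 2)
print(round(predicted, 2))
```

Under these assumptions the gain from each added item shrinks as the test grows, so lengthening helps most when the starting test is short.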