Chapter 6: Selecting Measurement Instruments
Educational Research: Competencies for Analysis and Application, 9th edition. Gay, Mills, & Airasian.
© 2009 Pearson Education, Inc. All rights reserved.
Chapter 6: Selecting Measurement Instruments
Objectives
State the relation between a variable and a construct, and distinguish among categories of variables (e.g., categorical and quantitative; dependent and independent) and the scales to measure them (e.g., nominal, ordinal, interval, and ratio).
Define measurement, and describe ways to interpret measurement data.
Selecting Measurement Instruments
Objectives
Describe the types of measuring instruments used to collect data in qualitative and quantitative studies (e.g., cognitive, affective, and projective tests).
Define validity, and differentiate among content, criterion-related, construct, and consequential validity.
Selecting Measurement Instruments
Objectives
Explain how to measure reliability, and differentiate among stability, equivalence, equivalence and stability, internal consistency, and scorer/rater reliability.
Identify useful sources of information about specific tests, and provide strategies for test selection.
Provide guidelines for test construction and test administration.
Data & Constructs
Data are the pieces of information you collect and use to examine your topic.
You must determine what type of data to collect.
A construct is an abstraction that cannot be observed directly but is invented to explain behavior (e.g., intelligence, motivation, ability).
Constructs & Variables
Constructs must be operationally defined to be observable and measurable.
Variables are operationally defined constructs.
Variables are placeholders that can assume any one of a range of values.
Variables may be measured by instruments.
Measurement Scales
The measurement scale is a system for organizing data.
Knowing your measurement scale is necessary to determine the type of analysis you will conduct.
Measurement Scales
Nominal variables describe categorical data (e.g., gender, political party affiliation, school attended, marital status). Nominal variables are qualitative.
Quantitative variables fall on a continuum and include ordinal, interval, and ratio variables.
Measurement Scales
Ordinal variables describe rank order with unequal units (e.g., order of finish, ranking of schools or groups as levels).
Interval variables have equal intervals between values but no true zero point (e.g., achievement, attitude, and test scores).
Measurement Scales
Ratio variables have all of the characteristics of the other levels and also include a true zero point (e.g., total number of correct items on a test, time, distance, weight).
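The practical consequence of the four scales is which summary statistics are meaningful. A minimal Python sketch with hypothetical data (the variable names and values below are illustrative only):

```python
from statistics import mean, median, mode

# Hypothetical samples, one per measurement scale
party = ["Dem", "Rep", "Dem", "Ind", "Dem"]   # nominal: unordered categories
finish_order = [1, 2, 2, 3, 4]                 # ordinal: ranks with unequal units
attitude_scores = [34, 40, 38, 45, 43]         # interval: equal units, no true zero
items_correct = [18, 20, 15, 19, 17]           # ratio: true zero point exists

# Nominal data support only the mode; ordinal data add the median;
# interval and ratio data support the mean as well.
print(mode(party))            # most frequent category
print(median(finish_order))   # middle rank
print(mean(attitude_scores))  # arithmetic mean
print(mean(items_correct))    # mean; ratio statements ("twice as many") are valid
```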
Independent & Dependent Variables
Dependent variables are those believed to depend on or to be caused by another variable. Dependent variables are also called criterion variables.
Independent variables are the hypothesized cause of the dependent variable. An independent variable must have at least two levels. Independent variables are also called experimental variables, manipulated variables, or treatment variables.
Characteristics of Instruments
There are three major ways for researchers to collect data.
A researcher can administer a standardized test (e.g., an achievement test).
A researcher can administer a self-developed instrument (e.g., a survey you might develop).
A researcher can record naturally occurring events or use already available data (e.g., recording the off-task behavior of a student in a classroom).
Instruments
Using standardized instruments takes less time than developing an instrument.
With standardized instruments, results from different studies that use the same instrument can be compared.
At times researchers may need to develop their own instruments. Designing an effective instrument requires expertise and time.
Instruments
A test is a formal, systematic procedure for gathering information about people. Tests may measure:
Cognitive characteristics (e.g., thinking, ability)
Affective characteristics (e.g., feelings, attitudes)
Instruments
A standardized test is administered, scored, and interpreted the same way across administrations (e.g., the ACT, SAT, or Stanford Achievement Test).
Instruments
Assessment refers to the process of collecting, synthesizing, and interpreting information, including data from tests as well as from observations. Assessment may be formal or informal, and its data numerical or textual.
Measurement is the process of quantifying or scoring assessment information. Measurement occurs after data collection.
Instruments
Qualitative researchers often use interviews and observations.
Quantitative researchers often use paper-and-pencil (or electronic) methods.
Selection methods: The respondent selects from possible answers (e.g., multiple-choice items).
Supply methods: The respondent has to provide an answer (e.g., essay items).
Instruments
Performance assessments emphasize student process and require creation of a product (e.g., completing a project).
Interpreting Instrument Data
Raw score: The number or point value of items correct (e.g., 18 of 20 items correct).
Norm-referenced scoring: The student's performance is compared with the performance of others (e.g., grading on a curve).
Interpreting Instrument Data
Criterion-referenced scoring: The student's performance is compared to a preset standard (e.g., class tests).
Self-referenced scoring: How an individual student's scores change over time is measured (e.g., speeded math facts tests).
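The three scoring interpretations can be sketched in a few lines of Python; the student names, scores, and the cutoff below are hypothetical:

```python
# Hypothetical raw scores on one 20-item test
scores = {"Ana": 18, "Ben": 12, "Cal": 15, "Dee": 9}

def percentile_rank(name):
    """Norm-referenced: percent of the other students this student outscored."""
    others = [s for n, s in scores.items() if n != name]
    return 100 * sum(1 for s in others if s < scores[name]) / len(others)

def meets_criterion(name, cutoff=14):
    """Criterion-referenced: compare the score to a preset standard."""
    return scores[name] >= cutoff

def self_referenced_gain(earlier, later):
    """Self-referenced: change in the same student's score over time."""
    return later - earlier

print(percentile_rank("Ana"))       # the highest scorer outscored everyone
print(meets_criterion("Ben"))       # 12 is below the hypothetical cutoff of 14
print(self_referenced_gain(9, 15))  # an improvement of 6 points
```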
Types of Instruments
Cognitive tests measure intellectual processes (e.g., thinking, memorizing, calculating, analyzing).
Standardized tests measure an individual's current proficiency in given areas of knowledge or skill.
Standardized tests are often given as a test battery (e.g., the Iowa Test of Basic Skills, CTBS).
Types of Instruments
Diagnostic tests provide scores to facilitate identification of strengths and weaknesses (e.g., tests given for diagnosing reading disabilities).
Aptitude tests measure potential and are used for prediction, rather than measuring what has already been learned (e.g., the Wechsler Scales).
Affective Instruments
Affective tests measure affective characteristics (e.g., attitude, emotion, interest, personality).
Attitude scales measure what a person believes or feels.
Likert scales measure agreement on a scale.
Strongly agree, Agree, Undecided, Disagree, Strongly disagree
Affective Instruments
Semantic differential scales require the individual to indicate an attitude by marking a position on a scale anchored by bipolar adjectives:
Fair 3 2 1 0 -1 -2 -3 Unfair
Rating scales may require a participant to check the most appropriate description (e.g., 5 = always; 4 = almost always; 3 = sometimes…).
The Thurstone and Guttman scales are also used to measure attitudes.
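Likert-type items are usually scored by summing the coded responses. The sketch below assumes a 5-point coding and reverse-codes a negatively worded item, which is a common scoring convention rather than something the slides prescribe:

```python
# One hypothetical participant's answers to four items, coded
# 1 = strongly disagree .. 5 = strongly agree
responses = [4, 5, 2, 4]
negatively_worded = {2}   # zero-based index of items to reverse-code

def likert_total(answers, reversed_items, points=5):
    """Sum item codes, flipping negatively worded items (1<->5, 2<->4)."""
    total = 0
    for i, answer in enumerate(answers):
        total += (points + 1 - answer) if i in reversed_items else answer
    return total

print(likert_total(responses, negatively_worded))  # 4 + 5 + (6 - 2) + 4 = 17
```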
Additional Inventories
Interest inventories assess personal likes and dislikes (e.g., occupational interest inventories).
Values tests assess the relative strength of a person’s values (e.g., Study of Values instrument).
Additional Inventories
Personality inventories present participants with statements describing behaviors characteristic of given personality traits, and the participant responds to each statement (e.g., the MMPI).
Projective tests were developed to eliminate some of the concerns with self-report measures. These tests are ambiguous so that presumably the respondent will project true feelings (e.g., Rorschach).
Criteria for Good Instruments
Validity refers to the degree to which a test measures what it is supposed to measure.
Validity is the most important test characteristic.
Criteria for Good Instruments
There are several established forms of validity:
Content validity
Criterion-related validity: concurrent validity and predictive validity
Construct validity
Consequential validity
Content Validity
Content validity addresses whether the test measures the intended content area.
Content validity is an initial screening type of validity.
Content validity is sometimes referred to as face validity.
Content validity is determined by expert judgment (content validation).
Content Validity
Content validity is concerned with both:
Item validity: Do the test items measure the intended content?
Sampling validity: Do the items sample the total content area being tested?
One example of a lack of content validity is a math test with heavy reading requirements: it may measure not only math but also reading ability, and is therefore not a valid math test.
Criterion-Related Validity
Criterion-related validity is determined by relating performance on a test to performance on an alternative test or other criterion measure.
Correlation coefficients are used to quantify this relationship.
Criterion-Related Validity
There are two types of criterion-related validity:
Concurrent: Scores on a test are correlated with scores on an alternative test given at the same time (e.g., two measures of reading achievement).
Predictive: The degree to which a test predicts how well a person will do in a future situation (e.g., the GRE, with the predictor represented by the GRE score and the criterion represented by success in graduate school).
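Both forms of criterion-related validity come down to a correlation between two lists of scores. A minimal sketch of the Pearson validity coefficient, using hypothetical predictor (admissions-test) and criterion (later GPA) scores:

```python
import math

# Hypothetical scores for six people
predictor = [150, 160, 145, 170, 155, 165]   # e.g., admissions-test scores
criterion = [2.8, 3.2, 2.5, 3.8, 3.0, 3.5]   # e.g., later graduate GPA

def pearson_r(x, y):
    """Pearson product-moment correlation, the usual validity coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson_r(predictor, criterion)
print(round(r, 3))  # a high validity coefficient for this made-up data
```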
Construct Validity
Construct validity is the most important form of validity.
Construct validity assesses what the test is actually measuring.
It is very challenging to establish construct validity.
Construct Validity
Construct validity requires both confirmatory and disconfirmatory evidence.
Scores on a test should relate to scores on similar tests and should NOT relate to scores on tests of unrelated constructs.
For example, scores on a math test should be more highly correlated with scores on another math test than with scores on a reading test.
Consequential Validity
Consequential validity refers to the extent to which an instrument creates harmful effects for the user.
Some tests may harm the test taker. For example, a measure of anxiety may make a person more anxious.
Validity
Some factors that threaten validity include:
Unclear directions
Confusing or unclear items
Vocabulary or required reading ability too difficult for test takers
Subjective scoring
Cheating
Errors in administration
Self-Report Instruments
There are some concerns with data derived from self-report instruments.
One concern is response set, the tendency for a participant to respond in a certain way (e.g., social desirability).
Bias may also play a role in self-report instruments (e.g., cultural norms).
Reliability
Reliability refers to the consistency with which an instrument measures a construct.
Reliability is expressed as a reliability coefficient based upon a correlation.
Reliability coefficients should be reported for all measures.
Reliability affects validity. There are several forms of reliability.
Reliability
Test-retest (stability) reliability measures the stability of scores over time.
To assess test-retest reliability, a test is given to the same group twice and the two sets of scores are correlated.
The correlation is referred to as the coefficient of stability.
Reliability
Alternate-forms (equivalence) reliability measures the relationship between two versions of a test that are intended to be equivalent.
To assess alternate-forms reliability, both tests are given to the same group and the scores on the two forms are correlated.
The correlation is referred to as the coefficient of equivalence.
Reliability
Equivalence-and-stability reliability is represented by the relationship between equivalent versions of a test given at two different times.
To assess it, one form is given first, an equivalent form is given after a time interval, and the two sets of scores are correlated.
The correlation is referred to as the coefficient of stability and equivalence.
Reliability
Internal consistency reliability represents the extent to which the items in a test are consistent with one another.
Split-half: The test is divided into halves and a correlation is taken between the scores on the two halves.
Coefficient alpha and the Kuder-Richardson formulas measure the relationships among all items and the total score of a test.
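Both internal-consistency indices can be computed directly from an item-score matrix. A sketch with hypothetical 0/1 item scores; the Spearman-Brown step-up shown for the split-half approach is the conventional correction, not something the slides spell out:

```python
# Hypothetical scores: 5 examinees x 4 items, each scored 0 or 1
items = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(matrix):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(matrix[0])
    item_vars = [variance([row[i] for row in matrix]) for i in range(k)]
    total_var = variance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

def spearman_brown(half_r):
    """Steps a split-half correlation up to the full test length."""
    return 2 * half_r / (1 + half_r)

print(round(cronbach_alpha(items), 3))
print(round(spearman_brown(0.6), 3))
```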
Reliability
Scorer and rater reliabilities reflect the extent to which independent scorers, or a single scorer over time, agree on a score.
Interjudge (inter-rater) reliability: Consistency between two or more independent scorers.
Intrajudge (intra-rater) reliability: Consistency of one scorer over time.
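The simplest inter-rater index is percent agreement: the proportion of observations on which two scorers made the same judgment. A sketch with hypothetical on-task/off-task codings (chance-corrected indices such as Cohen's kappa are often preferred in practice):

```python
# Two raters' codings of the same ten classroom observations (hypothetical)
rater_a = ["on", "off", "on", "on", "off", "on", "off", "on", "on", "off"]
rater_b = ["on", "off", "on", "off", "off", "on", "off", "on", "on", "on"]

def percent_agreement(a, b):
    """Proportion of observations the two raters coded identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

print(percent_agreement(rater_a, rater_b))  # 8 of 10 codings agree -> 0.8
```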
Reliability
The standard error of measurement (SEM) is an estimate of how often one can expect errors of a given size in an individual's test score.
SEM = SD × √(1 − r)
SEM = standard error of measurement
SD = standard deviation of the test scores
r = the reliability coefficient
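The formula above translates directly into code. A sketch with a hypothetical standard deviation of 10 and reliability of .91:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 10, reliability r = .91
sem = standard_error_of_measurement(10, 0.91)
print(round(sem, 2))  # about 3 points; an observed score of 50 suggests
                      # a true score roughly in the band 47-53
```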
Selecting a Test
Once you have defined the purpose of your study:
1. Determine the type of test that you need.
2. Identify and locate appropriate tests.
3. Determine which test to use after a comparative analysis.
Selecting a Test
There are several sources of information and reviews about available tests. These are a good place to start when selecting a test:
MMY: The Mental Measurements Yearbook is the most comprehensive source of test information.
Pro-Ed publications
ETS Test Collection database
Professional journals
Test publishers and distributors
Selecting a Test
When comparing the tests you have located and deciding which to use, attend to each of the following:
First, examine validity.
Next, consider reliability.
Consider ease of test use.
Ensure participants have not previously been exposed to the test.
Ensure sensitive information is not unnecessarily included.