TRANSCRIPT
RSCH 6109: Assessment & Evaluation Methods
Classification of Items
Response Formats
Scoring Procedures
Select Item Writing Guidelines
MCQs
Likert Rating Scales
1. Purpose & Framework
2. Test Specifications or Blueprint
3. Item Construction
4. Field Testing
5. Evaluation & Revision
Test Design & Construction
AERA, APA, & NCME (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
Drummond, R.J. (2004). Appraisal Procedures for Counselors and Helping Professionals, 5th Ed. New Jersey: Pearson Publishing.
Step 1 Delineate Purpose
Phase 1 Establish the need
Phase 2 Define the objectives & test parameters
Step 2 Develop test specs or blueprint
Phase 3 Seek advisory committee input
Step 3 Develop items or tasks & field test
Phase 4 Write questions
Phase 5 Field test
Phase 6 Review items
Step 4 Assemble & evaluate test
Phase 7 Assemble final version
Phase 8 Secure technical data
Step 1: Delineate the Purpose & Framework
The purpose and framework delineate what the test is intended to measure.
Step 2: Prepare the Table of Specifications
The table of specifications typically describes the specific format of the items, the response format, and the type of scoring procedures.
Step 3: Develop Test Items or Tasks
Test Design & Construction
Defining the Purpose
1. The purpose statement is like a mission statement for the test
2. Define the construct to be measured
3. Define the population with whom the test is to be used
4. Determine the target audience for the information the test provides (the test users)
5. Define the nature of the decisions to be made based on the information the test provides
What is a Construct?
1. A construct is an unobservable quality, ability, or attribute
2. Theory holds that each person possesses some “amount” of the construct
3. We can’t directly observe or measure that “amount” or level
4. We rely on outward behaviors as indicators of the latent, or underlying, construct
5. Contrast blood pressure, which can be measured directly, with depression, which must be inferred from indicators
Defining the Content Domain
Sources for defining the domain:
1. Theory
2. Literature
3. Expert opinion
4. Qualitative research
The goal is to include all aspects of the construct you intend to measure.
Step 1: Delineate the Purpose & Framework
Example:
(Optimal) The purpose of the Counselor Achievement Test (CAT) is to assess counseling students’ knowledge, skills, and abilities for effective counseling services. The framework of the CAT is modeled after the National Counselor Examination (NCE) and includes eight content areas. The CAT will consist of 24-32 selected- and constructed-response items, as well as performance tasks. The CAT will be a criterion-referenced measure.
(Typical) The purpose of the Study Habits Scale (SHS) is to assess college students’ study habits. The SHS includes between 18 and 30 items. The framework of the SHS is based on the work of Blai (1993). The SHS is a self-report measure designed to identify students’ study attitudes and behaviors.
Test Design & Construction
Step 2: Develop the test specifications or blueprint
Test Design & Construction
The table of specifications or test blueprint typically describes the number of items, the specific classification of the items and response format, and the type of scoring procedures.
Sample Table of Specifications for CAT

TTL#  Content Area                        Item Classification* (K C AP AN S E)  Format (# of Items)
3     Human growth and development        1 1 1                                 MCQ (2), Constructed Response (1)
3     Social and cultural foundations     1 1 1                                 MCQ (2), Constructed Response (1)
3     Helping relationships               1 1 1                                 MCQ (2), Constructed Response (1)
3     Group work                          1 1 1                                 MCQ (2), Constructed Response (1)
3     Career and lifestyle development    1 1 1                                 MCQ (2), Constructed Response (1)
3     Appraisal                           1 1 1                                 MCQ (2), Constructed Response (1)
3     Research and program evaluation     1 1 1                                 MCQ (2), Constructed Response (1)
3     Professional orientation & ethics   1 1 1                                 MCQ (2), Constructed Response (1)
24    Total                               K=6, C=6, AP=6, AN=2, S=2, E=2

*Refers to Bloom’s Taxonomy of Educational Objectives (1956). K=knowledge, C=comprehension, AP=application, AN=analysis, S=synthesis, and E=evaluation.
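The totals in the sample blueprint can be cross-checked with a little arithmetic. The sketch below is only an illustration: the names `CONTENT_AREAS`, `FORMAT_PER_AREA`, `LEVEL_TOTALS`, and `blueprint_summary` are hypothetical, and it uses only the row and total figures shown in the table.

```python
# Content areas and per-area format counts as listed in the sample table.
CONTENT_AREAS = [
    "Human growth and development",
    "Social and cultural foundations",
    "Helping relationships",
    "Group work",
    "Career and lifestyle development",
    "Appraisal",
    "Research and program evaluation",
    "Professional orientation & ethics",
]
FORMAT_PER_AREA = {"MCQ": 2, "Constructed Response": 1}

# Totals row of the table, by Bloom level.
LEVEL_TOTALS = {"K": 6, "C": 6, "AP": 6, "AN": 2, "S": 2, "E": 2}


def blueprint_summary(areas, format_per_area, level_totals):
    """Cross-check that item counts by area and by level agree."""
    items_per_area = sum(format_per_area.values())
    total_by_area = items_per_area * len(areas)
    total_by_level = sum(level_totals.values())
    format_totals = {fmt: n * len(areas) for fmt, n in format_per_area.items()}
    return {
        "items_per_area": items_per_area,
        "total_by_area": total_by_area,
        "total_by_level": total_by_level,
        "format_totals": format_totals,
        "consistent": total_by_area == total_by_level,
    }


summary = blueprint_summary(CONTENT_AREAS, FORMAT_PER_AREA, LEVEL_TOTALS)
print(summary)
```

A check like this is useful when a blueprint is revised: if an item is added to one content area, the level totals must change too, or the two counts stop agreeing.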
Developing Items
1. Determine the target administration time and number of items
2. Consider intended use and practical constraints – cost, complexity of scoring, etc.
3. Consider the purpose and the stakes involved in decision making
4. Initially write at least twice as many items as needed
5. Contrast a screening test with a diagnostic test
Screening Tests
1. Short
2. Easy to administer
3. Inexpensive
4. Easy to score
5. Maximizes sensitivity: makes the correct decision when the condition of interest is present (minimizes false negatives)
Diagnostic Tests
1. Longer
2. More complex to administer
3. More expensive
4. Harder to score
5. Maximizes specificity: makes the correct decision when the condition of interest is not present (minimizes false positives)
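Sensitivity and specificity, as contrasted above, are simple proportions from a table of test decisions against true status. The following is a minimal sketch using the standard definitions; the function names and the example counts are hypothetical and not taken from the slides.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the condition whom the test flags."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no cases with the condition present")
    return true_pos / total


def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the condition whom the test clears."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no cases with the condition absent")
    return true_neg / total


# Hypothetical counts: 50 people with the condition, 100 without.
sens = sensitivity(true_pos=45, false_neg=5)
spec = specificity(true_neg=80, false_pos=20)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Fewer false negatives raise sensitivity, and fewer false positives raise specificity, which is why the two test types emphasize different error rates.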
Step 2: Develop the test specifications or blueprint
Test Design & Construction
The table of specifications typically describes the specific classification of the items, the response format, and the type of scoring procedures.
Item Classifications: Bloom and Krathwohl (1956)
Knowledge: Define, Identify, List, Name
Comprehension: Convert, Explain, Summarize
Application: Compute, Determine, Solve
Analysis: Analyze, Differentiate, Relate
Synthesis: Design, Devise, Formulate, Plan
Evaluation: Compare, Critique, Evaluate, Judge
Bloom et al.’s Taxonomy of Educational Objectives (Cognitive Domain)
Knowledge: Remembering previously learned material. Requires recall of facts, procedures, rules, or events. Verbs: Define, Recall, Identify, List, Name.
Comprehension: Grasping the meaning of material. Requires reformulation, restatement, translation, or interpretation of content, or identification of relationships. Verbs: Convert, Explain, Summarize.
Application: Using information in concrete situations. Requires use of information in a setting or context other than where it was learned. Verbs: Compute, Demonstrate, Solve.
Analysis: Breaking down material into parts. Requires recognition of logical errors, comparison of components, or differentiation between components. Verbs: Analyze, Infer, Differentiate, Relate.
Synthesis: Putting parts together into a whole. Requires production of something original, solution to an unfamiliar problem, or combination of parts in an unusual way. Verbs: Design, Construct, Combine, Formulate.
Evaluation: Judging the value of a thing for a given purpose using definitive criteria. Requires formation of judgements about the worth or value of ideas, products, or procedures that have a specific purpose. Verbs: Discriminate, Critique, Evaluate, Judge.
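As a rough aid when drafting a blueprint, the verbs listed for each level can be used to make a first-pass guess at an item's classification. This is only a sketch: `BLOOM_VERBS` and `classify_stem` are hypothetical names, the lookup only inspects the first word of the stem, and a leading verb is a hint rather than a substitute for expert judgement.

```python
# Verbs as listed for each level of the cognitive domain above.
BLOOM_VERBS = {
    "Knowledge": ["Define", "Recall", "Identify", "List", "Name"],
    "Comprehension": ["Convert", "Explain", "Summarize"],
    "Application": ["Compute", "Demonstrate", "Solve"],
    "Analysis": ["Analyze", "Infer", "Differentiate", "Relate"],
    "Synthesis": ["Design", "Construct", "Combine", "Formulate"],
    "Evaluation": ["Discriminate", "Critique", "Evaluate", "Judge"],
}

# Reverse lookup: lowercase verb -> level name.
VERB_TO_LEVEL = {
    verb.lower(): level
    for level, verbs in BLOOM_VERBS.items()
    for verb in verbs
}


def classify_stem(stem):
    """Return the Bloom level suggested by the stem's first word, or None."""
    words = stem.split()
    if not words:
        return None
    first_word = words[0].strip(".,:;").lower()
    return VERB_TO_LEVEL.get(first_word)


print(classify_stem("Summarize the purpose of a test blueprint."))
print(classify_stem("Critique the scoring rubric shown below."))
print(classify_stem("What is a construct?"))
```

Stems that do not begin with a listed verb return `None` and need to be classified by hand.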
Response Formats:
Selected-Response
Response sets are provided and the user is forced to select among the choices. Examples include: MCQ, T/F, Yes/No, Matching, and Likert Ratings
Constructed-Response
No response sets are provided and the user is forced to provide a unique response. Examples include: Short Answer & Extended Answer.
Performance Tasks
No response sets are provided and the user is required to develop a product or perform some task or set of tasks. Examples include: Restricted and Extended Performance Tasks.
Test Design & Construction
Selected-Response Formats
1. Multiple Choice Questions (MCQ)
Multiple choice items include a question or STEM followed by a number of possible responses or OPTIONS. These options make up the RESPONSE SET of the item.
2. True – False Questions
True – false items include a stem and two discrete options. These options can be “True-False”, “Yes-No”, “Always-Never”, etc.
3. Matching Items
Matching exercises consist of two columns of information. The student is required to select the item in the second column which best reflects the item in the first column.
4. Likert Rating Scale Items
Likert ratings include a scale ranging from one extreme to another. The anchors of the scale vary depending on the nature of the statement.
Constructed-Response Formats (Optimal Performance)
1. Short Answer Questions
Completion or short answer formats consist of questions that can be answered with a word or short phrase, or a statement having one or more omitted words.
2. Limited Essay Questions
Limited essay questions consist of tasks or items requiring students to give brief, concise responses.
3. Extended Essay Questions
Extended essay questions consist of tasks or items that allow students freedom to choose the form and scope of their responses.
Format: Advantages / Disadvantages
MCQ
Advantages: Assesses a broad range of skills in a limited amount of time. Scoring can be done quickly and objectively.
Disadvantages: Difficult and time consuming to write higher order cognitive items. Most items assess knowledge through comprehension. Guessing reduces validity of scores.
True-False
Advantages: Numerous items can be administered in a brief amount of time. Easy to write and objective to score.
Disadvantages: Limited in complexity. Guessing reduces validity of scores. Not appropriate for optimal performance measures.
Matching
Advantages: Assesses a broad range of skills in a limited time. Scoring can be done quickly and objectively.
Disadvantages: Higher order cognitive skills are difficult to assess. Guessing reduces validity of scores.
Short Answer
Advantages: Numerous items can be administered in a short time. Moderately easy to write and score items. Guessing is difficult.
Disadvantages: Limited to items that require very few words. Spelling errors can make scoring difficult.
Essay
Advantages: Assesses a broad range of skills, particularly higher order cognitive skills. Guessing is difficult.
Disadvantages: Time consuming to administer and score. Limited content can be sampled during a test period. Scoring can be subjective.
Scoring Procedures:
Selected-Response
Typically, selected-response items have one correct answer and are scored as right or wrong (a.k.a. dichotomous scoring). However, some tests may weight responses differently.
Rating scale items are typically added together for a total score. For example, ten 5-point Likert rating scale items would yield a score range from 10 to 50. Typically, a higher score denotes stronger agreement, satisfaction, etc. with the overall construct.
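The summed-score arithmetic for rating scales can be sketched as follows. The function names `likert_total` and `score_range` are hypothetical, and the sketch assumes every item is scored from 1 up to the number of scale points.

```python
def score_range(n_items, n_points=5):
    """Minimum and maximum possible total for n_items rated 1..n_points."""
    return n_items, n_items * n_points


def likert_total(responses, n_points=5):
    """Sum item ratings after checking each one is on the scale."""
    for rating in responses:
        if not 1 <= rating <= n_points:
            raise ValueError(f"rating {rating} is outside 1..{n_points}")
    return sum(responses)


# Ten 5-point items, as in the example above.
print(score_range(10))            # (10, 50)

# Hypothetical responses from one respondent.
responses = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]
print(likert_total(responses))    # 38
```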
Scoring Procedures (continued):
Constructed-Response
These formats are relatively more subjective, time consuming, and expensive to score.
Short-answer items require a list of acceptable answers.
Extended response items typically require a scoring rubric. A scoring rubric is a table describing the criteria for scoring, including detailed descriptions for varying degrees of performance. The scoring rubric may yield a holistic or analytic score. Holistic scores refer to the overall impression of the response (or behavior) and analytic scores refer to the discrete dimensions of the response (or behavior). Holistic scores yield one overall score and analytic scores typically yield sub-scores as well as an overall score.
Performance tasks vary depending on the nature and complexity of the tasks. Scoring procedures may require a checklist, Likert rating scale, or rubric.
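The difference between an analytic score and a holistic score can be illustrated with a small sketch. Everything named here is hypothetical: the dimensions (organization, accuracy, mechanics), the 1-4 scale, and the function `analytic_score`. Summing sub-scores into the overall score is also an assumption, since a real rubric may weight dimensions differently.

```python
def analytic_score(ratings, max_per_dimension=4):
    """Return sub-scores plus an overall score for one response.

    ratings maps each rubric dimension to the points awarded on it.
    """
    for dimension, points in ratings.items():
        if not 1 <= points <= max_per_dimension:
            raise ValueError(f"{dimension}: {points} is off the scale")
    return {
        "sub_scores": dict(ratings),
        "overall": sum(ratings.values()),
        "max_possible": max_per_dimension * len(ratings),
    }


# Analytic scoring: one rating per dimension, reported with the total.
result = analytic_score({"organization": 3, "accuracy": 4, "mechanics": 2})
print(result)

# Holistic scoring, by contrast, records a single overall impression.
holistic = 3
print(holistic)
```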
Supplemental Information: MCQ Alternatives
1. Confidence Weighting
The student indicates which answer they believe is correct and how certain they are that it is correct. Answers given with more confidence are weighted more heavily than answers given with less confidence.
2. Answer Until Correct (AUC)
The student chooses alternatives until the correct response is selected. Once it is selected, the student moves on to the next item.
3. Elimination & Inclusion Scoring
The student is asked to either cross out all the alternatives that are incorrect (elimination) or circle the alternatives that are most likely correct (inclusion).
4. Multiple-Answer Format
The student is told that any number of the options might be correct. Each item is scored by subtracting the number of incorrect answers from the number of correct answers.
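The multiple-answer scoring rule can be sketched as below. The function name `multiple_answer_score` is hypothetical, and the sketch assumes "correct" and "incorrect" answers refer to the options the student selected: keyed options selected count for the student, and non-keyed options selected count against.

```python
def multiple_answer_score(selected, keyed):
    """Correct selections minus incorrect selections for one item."""
    selected = set(selected)
    keyed = set(keyed)
    n_correct = len(selected & keyed)     # selected and keyed correct
    n_incorrect = len(selected - keyed)   # selected but not keyed
    return n_correct - n_incorrect


# Hypothetical item where options A and B are keyed as correct.
key = {"A", "B"}
print(multiple_answer_score({"A", "B"}, key))       # 2
print(multiple_answer_score({"A", "B", "D"}, key))  # 1
print(multiple_answer_score({"C", "D"}, key))       # -2
```

Under this reading a student can earn a negative score on an item, so a test developer would need to decide whether to floor item scores at zero.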
Sample Item: Confidence Weighting
Please respond to the following items by circling the letter that corresponds to the correct response. In addition, please rate your level of confidence with your response to each item by circling the corresponding confidence level.
What is the main advantage of using a table of specifications when preparing an achievement test?
A. It reduces the amount of time required. (+0)
B. It improves the sampling of content. (+1)
C. It makes the construction of test items easier. (+0)
D. It increases the objectivity of the test. (+0)
Please circle the number that corresponds to the best descriptor for your level of confidence with the answer chosen:
5 = Extremely Confident
4 = Fairly Confident
3 = Neutral
2 = Fairly Unconfident
1 = Extremely Unconfident
Scoring Guide: Multiply the points for the chosen answer by the level of confidence. For this example, a student who chose B (+1) with a confidence level of 4 would receive 1 × 4 = 4 out of a possible 5 points.
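The confidence-weighting scoring guide amounts to one multiplication per item. A minimal sketch follows; `OPTION_POINTS` mirrors the point values shown for the sample item, while the function name `confidence_weighted_score` is hypothetical.

```python
# Point values for the sample item's options, as shown above.
OPTION_POINTS = {"A": 0, "B": 1, "C": 0, "D": 0}


def confidence_weighted_score(choice, confidence, option_points=OPTION_POINTS):
    """Points for the chosen option multiplied by the 1-5 confidence rating."""
    if confidence not in range(1, 6):
        raise ValueError("confidence must be an integer from 1 to 5")
    return option_points[choice] * confidence


print(confidence_weighted_score("B", 4))  # 4 (correct, fairly confident)
print(confidence_weighted_score("B", 5))  # 5 (maximum for this item)
print(confidence_weighted_score("A", 5))  # 0 (incorrect, however confident)
```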
Sample Item: AUC
Please answer the following items by removing the overlay that corresponds to your response. If the answer chosen reveals an “INCORRECT” response, continue selecting until you reveal the “CORRECT” response. Once you have identified the “CORRECT” response you have completed the item and should move on to the next question.
What is the main advantage of using a table of specifications when preparing an achievement test?
A. It reduces the amount of time required.
B. It improves the sampling of content.
C. It makes the construction of test items easier.
D. It increases the objectivity of the test.
Scoring Guide:
1st Attempt = 100%
2nd Attempt = 66%
3rd Attempt = 33%
4th Attempt = 0%
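The AUC percentages in the scoring guide are consistent with credit that falls by equal steps with each extra attempt. The sketch below assumes that linear rule and rounds down to a whole percent, which reproduces the values listed for a four-option item; the function name `auc_percent` is hypothetical.

```python
def auc_percent(attempts, n_options=4):
    """Percent credit when the correct option is found on the given attempt.

    Assumes credit drops linearly from 100% on the first attempt to 0% on
    the last possible attempt, rounded down to a whole percent.
    """
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    if not 1 <= attempts <= n_options:
        raise ValueError("attempts must be between 1 and n_options")
    return 100 * (n_options - attempts) // (n_options - 1)


for attempt in range(1, 5):
    print(attempt, auc_percent(attempt))
```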
Sample Item: Elimination Scoring
Please respond to the following items by circling the letter that corresponds to the correct response. In addition, please draw a line through those items that you confidently believe are incorrect.
What is the main advantage of using a table of specifications when preparing an achievement test?
Scoring Guide
A. It reduces the amount of time required. (+0%)
B. It improves the sampling of content. (+85%)
C. It makes the construction of test items easier.
D. It increases the objectivity of the test.