Testing Validity: Internal Validity of Test Items and Item Analysis


Upload: melissa-powell

Post on 28-Nov-2014



Course VALIDITY & ASSESSMENT

Learner/Practitioner Assessment Project

Purpose of the Assessment: The purpose of the test was to assess knowledge of nurses completing an in-service training about “Patient Safety”.

Persons being assessed: Learners who took the “Patient Safety” test are nursing staff on the 8th floor of Vanderbilt University Medical Center, with varying amounts of experience inside and outside Vanderbilt and varying total years of nursing experience. The learners attended the inservice, which counts towards their 4 hours of annual required inservice time. This credit works as a motivator to get nurses to attend inservices.

Framework – content: The content for the inservice was derived from current findings published by the Institute for Healthcare Improvement safety initiative called Transforming Care at the Bedside (TCAB) (Viney et al. 2006). The concepts in the inservice were presented to staff to help explain key quality and safety concepts about inpatient acute hospital falls, hospital medication errors, adverse events, and nosocomial pressure ulcers. One arm of the recommendations stemming from TCAB is that nurses and teams benefit from current knowledge and awareness of evidence-based research regarding patient safety and hospital quality improvement.

Framework – measurement and outcome level: The assessment for this inservice used a criterion-referenced framework. The level of learning outcome being assessed is 3A, Learning: Declarative Knowledge, measured by posttest (Moore et al. 2009). The passing score for this test was 70%. Learners who did not achieve a score of 70% or greater did not receive a full hour of inservice time. Of the 37 learners taking the assessment, 33 scored above this 70% mark.

Data Collection tool: A 12-item true/false posttest delivered online. The link was emailed to each attendee the Friday following the 4 separate nightshift and dayshift inservice events. The test was not proctored, and there was no discussion of using other resources. Attendees were told that the test would be based on the PowerPoint lecture and that 70% would be passing.

Person(s) completing the data collection tool: Participants in the inservice completed the test.

Frequency of data collection and the sample: The test was assigned once after the inservices and had to be taken online within two weeks of the inservice for full inservice time. It is a one-time test with no remediation. 100% of inservice attendees took the test.

Descriptive Results from the data set:


Two people are missing from some of this data: one learner did not answer one item, and another attendee was not a nurse but an ancillary staff member. Their data were removed from reliability testing and item analysis. The first bar chart (Table 1) describes all test takers and their percent of items correct; the mean is 91% and the standard deviation is 16.5.

TABLE 1.

[Bar chart of all test takers. Axis labels: Number of Learners, Percent Correct.]


TABLE 2.

All Learners Percent Correct

Percent Correct   Frequency   Percent   Valid Percent   Cumulative Percent
33.33             1           2.7       2.7             2.7
41.67             1           2.7       2.7             5.4
58.33             1           2.7       2.7             8.1
66.67             1           2.7       2.7             10.8
75.00             2           5.4       5.4             16.2
83.33             1           2.7       2.7             18.9
91.67             7           18.9      18.9            37.8
100.00            23          62.2      62.2            100.0
Total             37          100.0     100.0
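The summary figures quoted for all test takers (mean of about 91%, standard deviation of about 16.5, and 33 of 37 learners passing) can be recomputed from the Table 2 frequencies. The sketch below is only an illustration: the variable names are hypothetical, and it assumes the reported standard deviation is the sample standard deviation (n − 1 denominator).

```python
from statistics import mean, stdev

# Percent-correct scores and their frequencies, copied from Table 2.
table2 = {
    33.33: 1, 41.67: 1, 58.33: 1, 66.67: 1,
    75.00: 2, 83.33: 1, 91.67: 7, 100.00: 23,
}

# Expand the frequency table into one score per learner.
scores = [score for score, count in table2.items() for _ in range(count)]

n_learners = len(scores)
mean_score = mean(scores)
sd_score = stdev(scores)  # sample SD (n - 1 denominator), an assumption
n_passing = sum(1 for s in scores if s >= 70)  # 70% passing mark

print(f"n = {n_learners}, mean = {mean_score:.1f}, SD = {sd_score:.2f}")
print(f"{n_passing} of {n_learners} learners scored 70% or greater")
```

Under these assumptions the recomputed values agree with the mean, standard deviation, and pass count reported in the text.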

TABLE 3.

Reliability Statistics

Cronbach's Alpha   Cronbach's Alpha Based on Standardized Items   N of Items
.855               .866                                           11

In Table 3, the number of items for which we could perform reliability testing is 11. One item is excluded from the reliability measure because not all learners answered it.
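For readers who want to see how a Cronbach's Alpha such as the one in Table 3 is obtained, here is a minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The raw item responses are not included in this report, so the `toy` data below are hypothetical and will not reproduce the .855 value; the function name is also an assumption.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for complete cases.

    responses: list of rows, one per learner; each row is a list of
    item scores (here 0 = incorrect, 1 = correct).
    """
    k = len(responses[0])  # number of items
    # Sample variance of each item across learners.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each learner's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 4 learners x 3 items (NOT the study data).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy)
print(f"alpha = {alpha:.2f}")
```

Only learners with a complete set of answers can be passed to a function like this, which matches the report's removal of incomplete cases before reliability testing.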


TABLE 4.

Item                    Mean    Std. Deviation   N
gait_belts_scored       .7429   .44344           35
device_pu_scored        .8857   .32280           35
toiletting_scored       .9429   .23550           35
reimbursed_scored       .9143   .28403           35
rrt_scored              .9714   .16903           35
reliability_scored      .9714   .16903           35
transfusion_scored      .9714   .16903           35
ebp_fall_scored         .9714   .16903           35
stop_pu_scored          .8571   .35504           35
fall_liability_scored   .8857   .32280           35
zero_scored             .8286   .38239           35

Item wording:

gait_belts_scored: Gait belts are used to prevent falls.

device_pu_scored: Device related pressure ulcers may be unpreventable when a patient is compromised nutritionally, has poor perfusion and must have device secured in place for life support.

toiletting_scored: Per Vanderbilt policy, if you assist a patient to the toilet, you must stay with them.

reimbursed_scored: As of 2012, hospitals are reimbursed related to their patient safety scores.

rrt_scored: Rapid Response Systems were designed to prevent failure to rescue. Calling Rapid Response for first recognition of trigger is the reliable way to ensure Rapid Response Systems remain reliable.

reliability_scored: Hospital reliability and nursing communication related to patient safety must include checklists, standardized communication formats and information system checks.

transfusion_scored: Transfusion errors begin at the point of collecting the specimen.

ebp_fall_scored: Some hospitals are using hip protectors and helmets on patients who are known for falling.

stop_pu_scored: Pressure ulcers are prevented by appropriate surface selection, regular repositioning and turning, optimizing temperature control, and preventing moisture/providing moisture barrier products.

fall_liability_scored: Patients who fall who have stated a high desire for independence, who have stated they do not have to use the call bell, can not hold us liable if they fall and are hurt.

zero_scored: Falls are preventable and achieving zero falls has been attained in other hospitals.

Table 4 presents the item statistics. The column labeled Mean gives, for each item, the proportion of learners who answered it correctly. Two people are missing from this data: one learner did not answer every question, and another attendee was not a nurse but an ancillary staff member. Their data were removed from reliability testing and item analysis. The first item, “Gait belts are used to prevent falls,” is a false statement, and it has the lowest proportion correct (.7429). A true statement would be “Gait belts are used to prevent injury during falls.” I suspect learners missed this item because the distinction is subtle, which makes the question a little tricky.
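Because each item is scored 0 or 1, the Table 4 means can be read as item difficulty (proportion correct) and converted back into counts of correct answers. The sketch below does this for all 11 items. The names are hypothetical, and the standard deviation check assumes the sample formula (n − 1 denominator) for a 0/1 variable.

```python
from math import sqrt

N = 35  # learners included in the item analysis (Table 4)

# Proportion correct per item, copied from the Mean column of Table 4.
item_means = {
    "gait_belts_scored": 0.7429,
    "device_pu_scored": 0.8857,
    "toiletting_scored": 0.9429,
    "reimbursed_scored": 0.9143,
    "rrt_scored": 0.9714,
    "reliability_scored": 0.9714,
    "transfusion_scored": 0.9714,
    "ebp_fall_scored": 0.9714,
    "stop_pu_scored": 0.8571,
    "fall_liability_scored": 0.8857,
    "zero_scored": 0.8286,
}

def item_summary(p_reported, n=N):
    """Return (number correct, sample SD) for a 0/1 scored item."""
    n_correct = round(p_reported * n)
    p = n_correct / n  # exact proportion behind the rounded mean
    sd = sqrt(p * (1 - p) * n / (n - 1))
    return n_correct, sd

summary = {name: item_summary(p) for name, p in item_means.items()}
hardest = min(item_means, key=item_means.get)  # lowest proportion correct

for name, (n_correct, sd) in summary.items():
    print(f"{name:<22} {n_correct:>2}/{N} correct, SD = {sd:.5f}")
print("Hardest item:", hardest)
```

Read this way, the gait belt item was answered correctly by 26 of the 35 learners, while the four easiest items were each missed by only one learner.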

TABLE 5.

Item-Total Statistics

Item                    Scale Mean if   Scale Variance if   Corrected Item-Total   Squared Multiple   Cronbach's Alpha
                        Item Deleted    Item Deleted        Correlation            Correlation        if Item Deleted
gait_belts_scored       9.2000          2.929               .690                   .                  .834
device_pu_scored        9.0571          3.291               .664                   .                  .833
toiletting_scored       9.0000          3.471               .737                   .                  .832
reimbursed_scored       9.0286          3.499               .558                   .                  .842
rrt_scored              8.9714          3.793               .533                   .                  .848
reliability_scored      8.9714          3.793               .533                   .                  .848
transfusion_scored      8.9714          3.852               .441                   .                  .852
ebp_fall_scored         8.9714          3.970               .260                   .                  .859
stop_pu_scored          9.0857          3.081               .775                   .                  .822
fall_liability_scored   9.0571          3.114               .838                   .                  .817
zero_scored             9.1143          3.692               .228                   .                  .876

(Item wording is as listed in Table 4.)

Table 5 presents the item-total statistics. The “Cronbach’s Alpha if Item Deleted” values range from .817 to .876, so internal consistency remains very good whichever single item is removed. Cronbach’s Alpha serves as an index of internal consistency and as an approximation to test-retest reliability.
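One way to read Table 5 is to compare each “Cronbach’s Alpha if Item Deleted” value with the overall alpha of .855 from Table 3 and to look for items with low corrected item-total correlations. The sketch below uses the Table 5 values; the variable names are hypothetical, and no cutoff is applied because the report does not state one.

```python
OVERALL_ALPHA = 0.855  # Cronbach's Alpha from Table 3

# item: (corrected item-total correlation, alpha if item deleted), Table 5.
table5 = {
    "gait_belts_scored": (0.690, 0.834),
    "device_pu_scored": (0.664, 0.833),
    "toiletting_scored": (0.737, 0.832),
    "reimbursed_scored": (0.558, 0.842),
    "rrt_scored": (0.533, 0.848),
    "reliability_scored": (0.533, 0.848),
    "transfusion_scored": (0.441, 0.852),
    "ebp_fall_scored": (0.260, 0.859),
    "stop_pu_scored": (0.775, 0.822),
    "fall_liability_scored": (0.838, 0.817),
    "zero_scored": (0.228, 0.876),
}

# Items whose removal would raise alpha above the overall value.
raises_alpha = sorted(
    name for name, (_, alpha_del) in table5.items() if alpha_del > OVERALL_ALPHA
)

# Items with the weakest and strongest relationship to the rest of the test.
weakest = min(table5, key=lambda name: table5[name][0])
strongest = max(table5, key=lambda name: table5[name][0])

print("Deleting these items would raise alpha:", raises_alpha)
print("Lowest item-total correlation:", weakest)
print("Highest item-total correlation:", strongest)
```

Removing any of the other nine items would lower alpha, which is consistent with retaining them.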

Measurement Characteristics:

Reliability

We are able to compute measures of internal consistency, such as the test item intercorrelations, and to accept or reject questions according to their reliability coefficients. We were able to accept all items; removing the last item would make little difference (alpha would change from .855 to .876, Table 5).

The index of consistency I used was Cronbach’s Alpha. It was 0.86 for 11 test items, which is a very good level of internal consistency. The standard deviation for this test is 16.54 and the mean score is 91.2; that is, the average test score of all test participants was 91.2%. The standard error of measurement (SEM) is an estimate of error to use in interpreting an individual’s test score. A test score is an estimate of a person’s “true” test performance. Using a reliability coefficient (r) and the test’s standard deviation (sd), we can calculate this value:

SEM = sd × √(1 − r)

The standard error of measurement for the test scores of the test participants was 6.40.

With 99% confidence, the true test score for a learner scoring at the mean lies between 74.69 and 100 (91.2 ± 16.51, capped at 100). With 95% confidence, it lies between 78.66 and 100 (91.2 ± 12.54, capped at 100).
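The SEM arithmetic can be sketched as follows. The function names are hypothetical, and the multipliers 2.58 and 1.96 are assumed because they reproduce the reported margins of 16.51 and 12.54. Note that entering the reported standard deviation (16.54) and the Table 3 alpha (.855) into the formula gives roughly 6.3, slightly below the reported 6.40. The gap may come from rounding of the reliability coefficient, so the bands below use the reported SEM.

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: sd * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)

def score_band(score, sem_value, z, floor=0.0, ceiling=100.0):
    """Confidence band around a score, limited to the 0-100% scale."""
    half_width = z * sem_value
    return max(floor, score - half_width), min(ceiling, score + half_width)

MEAN_SCORE = 91.2    # mean percent correct reported in the text
REPORTED_SEM = 6.40  # SEM reported in the text

computed_sem = sem(16.54, 0.855)  # from the reported SD and Table 3 alpha
low99, high99 = score_band(MEAN_SCORE, REPORTED_SEM, z=2.58)
low95, high95 = score_band(MEAN_SCORE, REPORTED_SEM, z=1.96)

print(f"SEM from sd = 16.54, r = .855: {computed_sem:.2f}")
print(f"99% band: {low99:.2f} to {high99:.2f}")
print(f"95% band: {low95:.2f} to {high95:.2f}")
```

The upper limits are capped at 100 because a learner cannot score above 100% on the test.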

Validity

The validity of this assessment rests on its being a measure of how much learners understood of the concepts and ideas presented in a staff inservice about safety and quality. Nurses who do not have a general understanding of key ideas about safety and quality may have less motivation to implement new processes and strategies to improve quality and safety.

Decisions: Those who score 70% or greater on this assessment will be given a full inservice hour towards the total 4 hours required by the department. Those who score less than 70% receive only a half hour. This assessment would be formative in that it would give learners feedback about where they have weaknesses or where they could do further study.

Content validity was assured because each question on the test was quoted exactly from the inservice and from the PowerPoint slides shown at the inservice. The content was related to the learning objectives given at the beginning of the class.

Construct validity for the content of the inservice is related to the importance of understanding key points about patient safety and quality in the hospital setting. These ideas are also key points reflected in The Joint Commission’s National Patient Safety Goals. Vanderbilt University Medical Center also has 5 Pillar Goals for 2012 that relate to patient safety and quality, including preventing falls and pressure ulcers. The questions came directly from the lecture, and the content of the assessment is the content of the inservice materials.

When taking Kane’s “argument-based approach to validity” and using “Criterion 1: Clarity of the Argument,” the inservice lecture and test are based directly on the newest evidence-based points that support a better understanding of the content of the Transforming Care at the Bedside initiative and the National Patient Safety Goals set by The Joint Commission. These points of evidence lay the foundation for understanding patient safety and quality improvement initiatives that are occurring in American hospitals. The inservice was conducted as a way to spread the latest evidence-based information and increase the nurses’ base knowledge. According to “Criterion 2: Coherence of the Argument,” the inservice transmits evidence-based information that is relevant to a nurse’s work, and the test is a way to measure the transmission of that information. According to “Criterion 3: Plausibility of the Assumptions,” it is very plausible that the test is valid because, when a test question is true, it is quoted exactly from the lecture and PowerPoint slides; when a test question is false, the statement is changed in a simple way to make it false.

Other sources of “error” and of unwanted variance might undermine the measurement characteristics of this assessment. I’ve listed nine examples of possible sources of error.

1. The test taker not being present during the inservice would undermine the results of the test.

2. The questions must be phrased in a clear, non-confusing way.

3. There could be, and was, an attendee who was not a nurse but an ancillary staff member who wanted to attend and take the test. I did not include this person in the reliability and item analysis calculations.

4. Other factors, such as learning or reading disabilities that any of the participants may have, may interfere with test-taking ability.

5. There could have been a distractor that caused a test participant to accidentally mark an answer they did not intend.

6. The test was given through REDCap; scoring was precise and was completed using SPSS.

7. Some of the nurses may have already known the information, to the degree that the inservice was unnecessary.

8. While this patient safety inservice is not given to improve patient safety directly, it is given to improve the nurses’ motivation and involvement in unit and patient safety awareness.

9. It is possible that those who scored poorly had already met their inservice time requirement and did not take the test seriously.

There are numerous other possible sources of error (Kane 1992).

Improvement Plan

1. The first way to ensure that knowledge is being gained is to use this test as a pre-test as well. I could assign the test before giving the class to assess baseline knowledge.

2. One aspect that could be improved upon is content validity. I could approach this by having a few nurse colleagues assess the test for content as well as question writing (Miller & Linn 2000).

3. Next, I could repeat this test and measure test item intercorrelations. I could re-conduct this inservice on another floor with a separate, new cohort and see whether and how the data differ.


References

1. Viney MM, Batcheller JM, Houston S, Belcik KB. Transforming Care at the Bedside: Designing New Care Systems in an Age of Complexity. Journal of Nursing Care Quality. 2006;21(2):143–150.

2. Rutherford P, Moen R, Taylor J. TCAB: The “How” and the “What.” AJN, American Journal of Nursing. 2009;109:5–17.

3. Kane MT. An argument-based approach to validity. Psychological Bulletin. 1992;112(3):527–535.

4. Moore DE Jr. Achieving desired results and improved outcomes: integrating planning and assessment throughout learning activities. Journal of Continuing Education in the Health Professions. 2009;29(1):1.

5. Miller DM, Linn RL. Validation of performance-based assessments. Applied Psychological Measurement. 2000;24(4):367.