CHAPTER 5: Test Scores as Composites

This chapter is about the quality of items in a test.


TRANSCRIPT

  • *CHAPTER 5: Test Scores as Composites. This chapter is about the quality of items in a test.

  • *Test Scores as Composites

    What is a composite test score? A composite test score is a total test score created by summing two or more subtest scores, e.g., the WAIS-IV Full Scale IQ consists of 1-Verbal Comprehension Index, 2-Perceptual Reasoning Index, 3-Working Memory Index, and 4-Processing Speed Index. Your Qualifying Examinations and EPPP exams are also composite test scores.

  • *Item Scoring Schemes [skeems]/Systems

    We have two different scoring systems:

    1. Dichotomous scores: restricted to 0 and 1, such as scores on true-false and multiple-choice questions.

    2. Non-dichotomous scores: not restricted to 0 and 1; can have a range of possible points, such as in essays (1, 2, 3, 4, 5...).

  • *Dichotomous Scheme Examples

    1. The space between nerve cell endings is called the: a. Dendrite b. Axon c. Synapse d. Neutron (In this item, responses a, b, and d are scored 0; response c is scored 1.)

    2. Teachers in public school systems should have the right to strike. a. Agree b. Disagree (In this item, a response of Agree is scored 1; Disagree is scored 0.) Or, you can use True or False.

  • *Practical Implication for Test Construction

    Variance and covariance measure the quality of items in a test. Reliability and validity measure the quality of the entire test.

    σ² = SS/N (used for one set of data)

    Variance is the degree of variability of scores from the mean.

  • *Practical Implication for Test Construction

    Correlation is based on a statistic called covariance (COVxy or Sxy):

    COVxy = SP/(N - 1) (used for two sets of data)

    Covariance is a number that reflects the degree to which two variables vary together.

    r = SP/√(SSx · SSy)

  • *Variance

    Population: σ² = SS/N

    Sample: s² = SS/(n - 1), or SS/df

    SS = Σx² - (Σx)²/N, or SS = Σ(x - x̄)²

    SS = Sum of Squared Deviations from the Mean

  • *Covariance

    Covariance is a number that reflects the degree to which two variables vary together.

    Original Data:

    X  Y
    1  3
    2  6
    4  4
    5  7

  • *Covariance

    COVxy = SP/(N - 1)

    Two ways to calculate the SP:

    SP = Σxy - (Σx · Σy)/N

    SP = Σ(x - x̄)(y - ȳ)

    SP requires two sets of data; SS requires only one set of data.
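The SS, SP, covariance, and correlation formulas above can be sketched in a few lines of Python, using the slide's example data (X = 1, 2, 4, 5 and Y = 3, 6, 4, 7). This is an illustrative sketch, not part of the original slides:

```python
# Minimal sketch of the SS / SP / covariance / correlation computations
# from the slides, using the example data table.

def sum_of_squares(xs):
    """SS = sum of squared deviations from the mean (one set of data)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def sum_of_products(xs, ys):
    """SP = sum of products of paired deviations (two sets of data)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]

SP = sum_of_products(X, Y)                                 # 6.0
cov_xy = SP / (len(X) - 1)                                 # COVxy = SP/(N-1) = 2.0
r = SP / (sum_of_squares(X) * sum_of_squares(Y)) ** 0.5    # r = SP/sqrt(SSx*SSy) = 0.6
```

With these four score pairs, SSx = SSy = 10, so the correlation works out to 6/10 = 0.6.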

  • *Descriptive Statistics for Dichotomous Data

  • Total Score Variance

    As the proportion of total variance attributed to true variance increases, the degree of reliability increases. *

  • *Descriptive Statistics for Dichotomous DataItem Variance & Covariance

  • *Descriptive Statistics for Dichotomous Data

    P = Item Difficulty:

    P = (# of examinees who answered an item correctly) / (total # of examinees), or P = f/N

    See handout. The higher the P value, the easier the item.


  • Relationship between Item Difficulty P and Variance (quality)

    [Figure: P = Item Difficulty on the horizontal axis, from 0 (difficult) to 1 (easy); item variance peaks in the middle, at P = 0.5.] *
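The curve described above can be checked numerically: for a dichotomous item, the standard item variance is p(1 - p), which is largest at medium difficulty. A small Python sketch (not from the slides):

```python
# Dichotomous item variance is p * q = p * (1 - p); a quick check that it
# peaks at medium difficulty (p = 0.5), which is why medium-difficulty
# items contribute the most score variance.

def item_variance(p):
    return p * (1 - p)

variances = {p: item_variance(p) for p in (0.1, 0.3, 0.5, 0.7, 0.9)}
best_p = max(variances, key=variances.get)   # 0.5
```

Note the symmetry: an item that is very easy (p = 0.9) and one that is very hard (p = 0.1) contribute equally little variance.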

  • *Non-dichotomous Score Examples

    1. Write a grammatically correct German sentence using the first person singular form of the verb verstehen. (A maximum of 3 points may be awarded and partial credit may be given.)

    2. An intellectually disabled person is a nonproductive member of society. 5. Strongly agree 4. Agree 3. No opinion 2. Disagree 1. Strongly disagree (Scores can range from 1 to 5 points, with high scores indicating a positive attitude toward intellectually disabled citizens.)

  • *Descriptive Statistics for Non-dichotomous Variables

  • *Descriptive Statistics for Non-dichotomous Variables

  • *Variance of a Composite C

    σ²a = SSa/Na, σ²b = SSb/Nb

    σ²C = σ²a + σ²b

    Ex. From the WAIS-III: FSIQ = VIQ + PIQ

    If there are more than 2 subtests, σ²C = σ²a + σ²b + σ²c: calculate the variance for each subtest and add them up. Ex. next

  • *Calculate the composite variance for this test and the next; use σ²C = σ²a + σ²b

  • *Calculate the composite variance for this test and the previous one; use σ²C = σ²a + σ²b

  • *Variance of a Composite C

    More than 2 subtests

    Ex. WAIS-IV Full Scale IQ, which consists of a-Verbal Comprehension Index, b-Perceptual Reasoning Index, c-Working Memory Index, and d-Processing Speed Index. σ²C = σ²a + σ²b + σ²c + σ²d

  • **Suggestions to Increase the Total Score Variance of a Test

    1. Increase the number of items in a test.
    2. Keep item difficulties (p) in the medium range.
    3. Items with similar content have higher correlations and higher covariance.
    4. Item score and total score variances alone are not indices (in-d-cz) of test quality (reliability and validity).

  • **1-Increase the Number of Items in a Test (how to calculate the test variance)

    The variance for a test of 25 items is higher than the variance for a test of 20 items.

    σ²test = N·σ²x + N(N - 1)·COVx

    Ex. If COVx = item covariance = 0.10, σ²x = item variance = 0.20, and N = # of items in a test: first try N = 20, which gives a test variance of 42 for 20 items; then try N = 25, which gives a test variance of 65 for 25 items.

  • *2-Item Difficulties

    Item difficulties should be almost equal for all of the items, and difficulty levels should be in the medium range.

  • *3-Items with Similar Content have Higher Correlations & Higher Covariance

  • *4- Item Scores & Total Scores Variances Alone are not Indices of Test Quality

    Variance and covariance are important and necessary; however, they are not sufficient to determine test quality. To determine a higher level of test quality we use reliability and validity.

  • UNIT II: RELIABILITY

    CHAP 6: RELIABILITY AND THE CLASSICAL TRUE SCORE MODEL
    CHAP 7: PROCEDURES FOR ESTIMATING RELIABILITY
    CHAP 8: INTRODUCTION TO GENERALIZABILITY THEORY
    CHAP 9: RELIABILITY COEFFICIENTS FOR CRITERION-REFERENCED TESTS

    *

  • *CHAPTER 6: Reliability and the Classical True Score Model

    Reliability (ρ): Reliability is a measure of consistency/dependability; a test measures the same thing more than once and results in the same outcome. Reliability refers to the consistency of examinees' performance over repeated administrations of the same test or parallel forms of the test (Linda Crocker text).

  • THE MODERN MODELS*

  • *TYPES OF RELIABILITY*

    Test-Retest (2 administrations): A measure of stability. Administer the same test/measure at two different times to the same group of participants. Coefficient: r(test1, test2). Ex. IQ test.

    Parallel/Alternate (Equivalent) Forms (2 administrations): A measure of equivalence. Administer two different forms of the same test to the same group of participants. Coefficient: r(testA, testB). Ex. stats test.

    Test-Retest with Alternate Forms (2 administrations): A measure of stability and equivalence. On Monday, you administer form A to the 1st half of the group and form B to the second half. On Friday, you administer form B to the 1st half and form A to the 2nd half.

    Inter-Rater (1 administration): A measure of agreement. Have two raters rate behaviors and then determine the amount of agreement between them. Coefficient: percentage of agreement.

    Internal Consistency (1 administration): A measure of how consistently each item measures the same underlying construct (e.g., depression). Correlate performance on each item with overall performance across participants. Methods: Cronbach's alpha, Kuder-Richardson, split-half, Hoyt's method.

  • Test-Retest

    Class IQ Scores

    Students  X (1st time, Mon)  Y (2nd time, Fri)
    John      125                120
    Jo        110                112
    Mary      130                128
    Kathy     122                120
    David     115                120
    *
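The test-retest reliability for the class IQ data above is simply the Pearson correlation between the Monday and Friday scores. A short Python sketch (the helper function is illustrative, not from the slides):

```python
# Test-retest reliability estimated as the Pearson correlation between
# Monday (X) and Friday (Y) administrations of the same IQ test.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # SP
    ssx = sum((x - mx) ** 2 for x in xs)                    # SSx
    ssy = sum((y - my) ** 2 for y in ys)                    # SSy
    return sp / (ssx * ssy) ** 0.5

monday = [125, 110, 130, 122, 115]   # John, Jo, Mary, Kathy, David
friday = [120, 112, 128, 120, 120]
r = pearson_r(monday, friday)        # approximately 0.89
```

A coefficient near 0.89 would indicate good stability of the scores over the week.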

  • Parallel/alternate Forms

    Scores on 2 forms of stats tests

    Students  Form A  Form B
    John      95      92
    Jo        84      82
    Mary      90      88
    Kathy     76      80
    David     81      78
    *

  • Test-Retest with Alternate Forms

    On Monday, you administer form A to 1st half of the group and form B to the second half. On Friday, you will administer form B to 1st half of the group and form A to the 2nd half

    Students  Form A to 1st group (Mon)  Students  Form B to 2nd group (Mon)
    David     85                         Mark      82
    Mary      94                         Jane      95
    Jo        78                         George    80
    John      81                         Mona      80
    Kathy     67                         Maria     70

    Next slide*

  • Test-Retest with Alternate Forms

    On Friday, you administer form B to the 1st half of the group and form A to the second.

    Students  Form B to 1st group (Fri)  Students  Form A to 2nd group (Fri)
    David     85                         Mark      82
    Mary      94                         Jane      95
    Jo        78                         George    80
    John      81                         Mona      80
    Kathy     67                         Maria     70
    *

  • *HOW RELIABILITY IS MEASURED

    Reliability is measured by using a correlation coefficient: r(test1, test2) or r(x, y).

    Reliability coefficients indicate how scores on one test change relative to scores on a second test. They can range from 0.00 to 1.00: 1.00 = perfect reliability; 0.00 = no reliability.

  • THE CLASSICAL MODEL*

  • *A CONCEPTUAL DEFINITION OF RELIABILITY: CLASSICAL MODEL

  • Classical Test Theory

    The Observed Score: X = T + E. X is the score you actually record or observe on a test.

    The True Score: T = X - E; the difference between the Observed score and the Error score is the True score. The T score is a reflection of the examinee's true knowledge.

    The Error Score: E = X - T; the difference between the Observed score and the True score is the Error score. E comprises the factors that cause the True score and the Observed score to differ. *

  • *A CONCEPTUAL DEFINITION OF RELIABILITY

    (X) Observed Score: X = T + E. The score that is actually observed; it consists of two components, a True score and an Error score.

  • *A CONCEPTUAL DEFINITION OF RELIABILITY

    True Score: T = X - E. A perfect reflection of the true value for an individual; a theoretical score.

  • *A CONCEPTUAL DEFINITION OF RELIABILITY

    Method error is due to characteristics of the test or testing situation; trait error is due to individual characteristics.

    Conceptually, Reliability = True Score / Observed Score = True Score / (True Score + Error Score)

    Reliability of the observed score becomes higher if error is reduced!

  • *A CONCEPTUAL DEFINITION OF RELIABILITY

    Error Score: E = X - T, the difference between the Observed and True score; X = T + E.

    Ex. 95 = 90 + 5, or 85 = 90 - 5. The difference between T and X is 5 points, so E = 5.

  • *The Classical True Score Model

    X = T + E

    X = the observed test score
    T = the individual's true knowledge (true score)
    E = the random error component

  • *Classical Test Theory: What Makes up the Error Score?

    E = X - T. The Error score consists of 1-Method Error and 2-Trait Error.

    1-Method Error: the difference between True and Observed scores resulting from the test or testing situation.

    2-Trait Error: the difference between True and Observed scores resulting from the characteristics of the examinees.

    See next slide

  • *What Makes up the Error Score?

  • *Expected Value of the True Score

    Definition of the True Score: the True score is defined as the expected value of the examinee's test scores (the mean of observed scores) over many repeated testings with the same test.

  • *Error Score

    Definition of the Error Score: the expected value of the error scores for an examinee over many repeated testings is zero.

    ε(Ej) = ε(Xj) - Tj = Tj - Tj = 0

    ε(Ej) = expected value of the error; Tj = examinee's true score. Ex. next

  • *Error Score

    X - E = T: the difference between the Observed score and the Error score is the True score (scores are from the same examinee).

    98 - 8 = 90    88 + 2 = 90    80 + 10 = 90    100 - 10 = 90
    95 - 5 = 90    81 + 9 = 90    88 + 2 = 90     90 - 0 = 90

    The errors sum to zero: -8 + 2 + 10 - 10 - 5 + 9 + 2 - 0 = 0

  • **INCREASING THE RELIABILITY OF A TEST (Meaning Decreasing Error): 7 Steps

    1. Increase sample size (n)
    2. Eliminate unclear questions
    3. Standardize testing conditions
    4. Moderate the degree of difficulty of the tests (P)
    5. Minimize the effects of external events
    6. Standardize instructions (directions)
    7. Maintain consistent scoring procedures (use a rubric)

  • **Increasing Reliability of your Items in a Test

  • **Increasing Reliability Cont..

  • *How Reliability (ρ) is Measured for an Item/Score

    ρ = True Score / (True Score + Error Score), or ρ = T/(T + |E|)

    0 ≤ ρ ≤ 1

    Note: In this formula you always add your error (the difference between T and X) to the True score in the denominator as a positive quantity, whether E is positive or negative.

  • Which Item has the Highest Reliability?

    The maximum score for this question is 10 points; ρ = T/(T + |E|).

    E = +2, T = 8:   8/10  = 0.80
    E = -3, T = 6:   6/9   = 0.666
    E = +7, T = 1:   1/8   = 0.125
    E = -1, T = 9:   9/10  = 0.90
    E = +4, T = 6:   6/10  = 0.60
    E = -4, T = 6:   6/10  = 0.60
    E = +1, T = 7:   7/8   = 0.875
    E =  0, T = 10:  10/10 = 1.0
    E = -5, T = 4:   4/9   = 0.444
    E = +6, T = 3:   3/9   = 0.333  -> more error, lower reliability
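The item-level ratios above follow directly from ρ = T/(T + |E|), with the error always entering the denominator as a magnitude. A minimal Python sketch (illustrative, not from the slides):

```python
# The slide's item-level reliability: rho = T / (T + |E|).
# The error is added to the denominator as a magnitude, whether the
# original error was positive or negative.

def item_reliability(true_score, error):
    return true_score / (true_score + abs(error))

p1 = item_reliability(8, +2)    # 8/10 = 0.80
p2 = item_reliability(6, -3)    # 6/9  = 0.666...
p3 = item_reliability(10, 0)    # 10/10 = 1.0 (no error, perfect reliability)
```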

  • How Classical Reliability (ρ) is Measured for a Test

    X = T + E; ρ = σ²T/σ²X (for an essay item/score)

    Examinees:
    1. X1 = t1 + e1  Ex. 10 = 7 + 3
    2. X2 = t2 + e2  Ex. 8 = 5 + 3
    3. X3 = t3 + e3  Ex. 6 = 4 + 2

    Then calculate the variances: σ²X = 4 and σ²T = 2.33 *

  • How Classical Reliability (ρ) is Measured for a Test

    Reliability coefficient for all items: ρx1x2 = σ²T/σ²X

    For the previous example: ρx1x2 = 2.33/4.00 = 0.58

    ρk = σ²T/σ²X

    *

  • How the Reliability Coefficient (ρ) is Measured for a Test

    T + E = X
    3 + 2 = 5
    4 + 3 = 7
    8 + 5 = 13
    9 + 5 = 14
    2 + 1 = 3
    1 + 1 = 2
    8 + 1 = 9
    7 + 3 = 10

    ρ = σ²T/σ²X = 9.643/19.554 = 0.4930 *
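The worked example above can be verified in Python: reliability is the ratio of true-score variance to observed-score variance, here using sample variances, SS/(n - 1). An illustrative sketch, not part of the slides:

```python
# Verifying the slide's worked example: rho = var(T) / var(X),
# with X = T + E and sample variances SS / (n - 1).

def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

T = [3, 4, 8, 9, 2, 1, 8, 7]
E = [2, 3, 5, 5, 1, 1, 1, 3]
X = [t + e for t, e in zip(T, E)]    # observed = true + error

var_T = sample_variance(T)           # approximately 9.643
var_X = sample_variance(X)           # approximately 19.554
rho = var_T / var_X                  # approximately 0.493
```

About half the observed-score variance here is true-score variance, so the test's reliability is roughly 0.49.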

  • Reliability Coefficient (ρ) for Parallel Test Forms

    Reliability Coefficient (ρ) = the correlation between scores on parallel test forms. Next slide

    *

  • X - E = T: Scores on Parallel Test Forms

    X (Test A)        Y (Test B)
    98 - 2 = 96       95 - 6 = 89
    88 + 2 = 90       80 + 6 = 86
    80 + 11 = 91      87 - 4 = 83
    100 - 8 = 92      75 + 12 = 87
    95 - 3 = 92       90 - 5 = 85
    81 + 12 = 93      82 - 2 = 80
    88 + 1 = 89       86 - 3 = 83
    90 - 3 = 87       85 + 6 = 91

    r = SP/√(SSx · SSy) = 0.882 *

  • **Reliability Coefficient and Reliability Index

    The item-reliability index provides a measure of the test's internal consistency.

  • **Reliability Coefficient and Reliability Index

    Reliability Coefficient: ρx1x2 = σ²T/σ²X

    Reliability Index: ρxt = σT/σX

    Therefore ρx1x2 = (ρxt)², or ρxt = √ρx1x2, just like the relationship between σ² and σ.

    The higher the item-reliability index, the higher the internal consistency of the test.

  • **Reliability Coefficient and Reliability Index

    Reliability Coefficient: ρX1X2 = σ²T/σ²X. The reliability coefficient is the correlation coefficient that expresses the degree of reliability of a test.

    Reliability Index: ρXT = σT/σX. The reliability index is the correlation coefficient that expresses the degree of relationship between the True (T) and Observed (X) scores of a test. It is the square root of the reliability coefficient.

  • *Reliability of a Composite (C = a + b + ... + k)

    Two ways to determine/predict the reliability of composite test scores:

    1-Spearman-Brown Prophecy Formula: allows us to estimate the reliability of a composite of parallel tests when the reliability of one of these tests is known. Ex. next

    2-Cronbach's Alpha (α) or Coefficient (α)

  • *Next week: the Split-Half Reliability Method, which is the same as the Spearman-Brown Prophecy Formula when K = 2

    *


  • **1. Spearman-Brown Prophecy Formula

    ρkk = K·ρ / (1 + (K - 1)·ρ)

    where K is the factor by which the test length is changed and ρ is the reliability of one component test.

  • If N or K = 2, then we can call it the Split-Half Reliability Method, which is used for measuring internal consistency reliability (see next chapter). The effect of changing test length can also be estimated by using the Spearman-Brown Prophecy Formula, just like increasing the variance of a test by increasing the # of items in a test (Chapter 5). *

  • *The Spearman-Brown Prophecy Formula is used for:

    a. Correcting for one half of the test by estimating the reliability of the whole test.
    b. Determining how many additional items are needed to increase reliability up to a certain level.
    c. Determining how many items can be eliminated without reducing reliability below a predetermined level.

    *
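The three uses above all rest on the same formula. A minimal Python sketch of the Spearman-Brown prophecy formula (the numbers are illustrative, not from the slides):

```python
# Spearman-Brown prophecy formula: rho_kk = K * rho / (1 + (K - 1) * rho),
# where K is the factor by which the test is lengthened (or shortened)
# and rho is the reliability of one component test.

def spearman_brown(rho, k):
    return k * rho / (1 + (k - 1) * rho)

# K = 2 is the split-half correction: a half-test reliability of 0.60
# prophesies a full-test reliability of 0.75.
full_test = spearman_brown(0.6, 2)     # 0.75

# The formula also runs in reverse: halving a test with reliability 0.75
# (K = 0.5) takes us back to 0.60.
half_test = spearman_brown(0.75, 0.5)  # 0.6
```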

  • *Reliability of a Composite (C = a + b + ... + k)

    2-Cronbach's Alpha (α) or Coefficient (α) is a preferred statistic. It allows us to estimate the reliability of a composite when we know the composite score variance and/or the covariances among all its components. Next slide

  • *Reliability of a Composite (C = a + b + ... + k)

    2-Cronbach's Alpha (α) or Coefficient (α):

    α = (K/(K - 1)) · (1 - Σσ²i/σ²C)

    K = # of tests = 3
    σ²i = variance of each test: σ²ta = 2, σ²tb = 3, σ²tc = 4
    σ²C = composite score variance = 12
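Plugging the slide's numbers into the alpha formula can be sketched in a few lines of Python (illustrative, not part of the slides):

```python
# Cronbach's alpha: alpha = (K/(K-1)) * (1 - sum(item variances) / composite variance),
# with the slide's numbers: K = 3 subtests with variances 2, 3, 4 and a
# composite-score variance of 12.

def cronbach_alpha(subtest_variances, composite_variance):
    k = len(subtest_variances)
    return (k / (k - 1)) * (1 - sum(subtest_variances) / composite_variance)

alpha = cronbach_alpha([2, 3, 4], 12)   # (3/2) * (1 - 9/12) = 0.375
```

With these numbers, α = 0.375: the subtest variances account for 9 of the 12 units of composite variance, leaving relatively little shared covariance, hence the low alpha.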

  • *The Standard Error of Measurement (σE or σM)

    A standard error of measurement is a tool used to estimate or infer how far an observed score deviates from a true score. The standard error of measurement is the mean of the standard deviations (σ) of all errors (E) made by several examinees. Next slide. E = X - T

  • *The Standard Error of Measurement (σE or σM)

    The standard error of measurement is the mean of the standard deviations (σ) of all errors (E) made by several examinees. E = X - T

    Examinees  Test 1            Test 2  Test 3  Test 4
    1.         E = 95 - 90 = 5   4       3       4
    2.         E = 85 - 86 = 1   1       3       2
    3.         E = 90 - 95 = 5   3       1       3
    4.         E = 95 - 93 = 2   2       4       1
               σ1                σ2      σ3      σ4

  • **The Standard Error of Measurement (σE)

    1. Find the σs of these errors (E) for all of the examinees' tests.
    2. The mean/average of these σs is called the Standard Error of Measurement.

    σE = σx·√(1 - ρxx')

    ρxx' = r = reliability coefficient (or use ρx1x2 for parallel tests)
    σx = standard deviation for a set of observed scores (X)

  • **The Standard Error of Measurement (σE) is a tool used to estimate or infer how far an observed score (X) deviates from a true score (T).

    σE = σx·√(1 - ρxx')

    ρxx' = r = reliability coefficient (use ρx1x2 for parallel tests) = 0.91
    σx = standard deviation for a set of observed scores = 10

    σE = 10·√(1 - 0.91) = 3. Next slide
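The SEM computation above, with the slide's numbers (reliability 0.91, observed-score SD of 10), can be sketched in Python (illustrative, not part of the slides):

```python
# Standard error of measurement: sigma_E = sigma_X * sqrt(1 - rho),
# using the slide's numbers: rho = 0.91 and sigma_X = 10.

def standard_error_of_measurement(sd_observed, reliability):
    return sd_observed * (1 - reliability) ** 0.5

sem = standard_error_of_measurement(10, 0.91)   # 10 * sqrt(0.09) = 3.0
```

Note how the formula captures the inverse relationship with reliability: at ρ = 1.0 the SEM is 0, and as ρ falls toward 0 the SEM grows toward the full observed-score standard deviation.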

  • *The Standard Error of Measurement (σE)

    This means the average difference between the True scores (T) and Observed scores (X) is 3 points for all examinees, which is called the Standard Error of Measurement.

    The standard error of measurement is inversely related to reliability: as σE increases, the reliability decreases.
