Overview of field trial analysis procedures

National Research Coordinators Meeting Windsor, June 2008

NRCMeetingWindsor

June 2008

Content of presentation

• Purposes of field trial analysis

• Methodologies applied
  – IRT Rasch model
  – Factor analysis

• Criteria used
  – Fit indices
  – Item and scale statistics

Purposes of field trial

• Test feasibility of construct measurement

• Review scalability of item material

• Check test and questionnaire length

• Compare different formats (items with and without “don’t know” category)

• Inform on relationships between constructs and variables

• Compare results from on-line and paper surveys

Data included in analysis

• Data from 31 countries included in international analyses

• Instrument data from
  – Student test (98 cognitive items)
  – Student questionnaire
  – Teacher questionnaire
  – School questionnaire
  – Regional instruments (cognitive and questionnaire data)

• Comparison of on-line and paper surveys (international option)

Types of analysis

• Review of frequencies and means

• Review of correlations between variables and constructs

• Review of reliabilities and item-score correlations

• IRT (Rasch) Scaling results

• Exploratory and Confirmatory Factor Analysis

Analysis reports

• Part 1 in NRC(June08)2.pdf
  – Analysis of cognitive test items
  – Analysis of student questionnaire data
  – Four appendix documents (a, b, c and d)

• Part 2 in NRC(June08)3.pdf
  – Analysis of teacher questionnaire data
  – Analysis of school questionnaire data
  – Two appendix documents (a and b)
  – Additional document on comparison of paper and online mode (NRC(June08)3c.pdf)

Cognitive test data analysis

• Review of omitted, invalid and “not reached” responses

• Analysis of item difficulties, discrimination and Rasch model fit

• Differential item functioning
  – Gender groups
  – Countries

• Analysis of dimensionality

• Trend item analysis

Not reached items (medians)

The IRT “Rasch” Model

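• Modelling probability of getting a correct response

$$P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

• Modelling probability of getting an incorrect response

$$P(X_{ni} = 0) = \frac{1}{1 + \exp(\theta_n - \delta_i)}$$

where $\theta_n$ is the ability of person $n$ and $\delta_i$ the difficulty of item $i$, both in logits.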
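
A minimal sketch of the two Rasch response probabilities, assuming `theta` is a student's ability and `delta` an item's difficulty, both in logits. The function names are illustrative and are not taken from ACER ConQuest or any other software used in the analysis.

```python
import math


def rasch_p_correct(theta: float, delta: float) -> float:
    """Probability of a correct response: exp(theta - delta) / (1 + exp(theta - delta))."""
    return math.exp(theta - delta) / (1.0 + math.exp(theta - delta))


def rasch_p_incorrect(theta: float, delta: float) -> float:
    """Probability of an incorrect response: 1 / (1 + exp(theta - delta))."""
    return 1.0 / (1.0 + math.exp(theta - delta))


# Hypothetical example: ability equal to item difficulty
print(rasch_p_correct(0.0, 0.0), rasch_p_incorrect(0.0, 0.0))
```

When ability equals item difficulty both probabilities are 0.5, and for any pair of values the two functions sum to 1.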

IRT curves

MC Item statistics (example)

Example

IRT models for categorical data

• Extension of Rasch model with additional step parameters

• Partial credit model has different step parameters for each item

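$$P_{x_i}(\theta_n) = \frac{\exp \sum_{k=0}^{x_i} (\theta_n - \delta_i + \tau_{ik})}{\sum_{h=0}^{m_i} \exp \sum_{k=0}^{h} (\theta_n - \delta_i + \tau_{ik})}, \qquad x_i = 0, 1, \ldots, m_i$$

where $\theta_n$ is the latent trait of person $n$, $\delta_i$ the location of item $i$ and $\tau_{ik}$ its step parameters.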
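
A minimal sketch of the category probabilities under the partial credit model, assuming an item location `delta`, a list of step parameters `taus` (one per step, so an item with scores 0 to m has m entries) and a latent trait value `theta`, all in logits. The function name is hypothetical, and the sketch uses the common convention that the term for category 0 is zero, which does not change the probabilities because that term cancels in the ratio.

```python
import math


def pcm_probabilities(theta: float, delta: float, taus: list[float]) -> list[float]:
    """Return [P(X=0), ..., P(X=m)] for one item under the partial credit model."""
    # Cumulative sums of (theta - delta + tau_k); the sum for category 0 is 0.
    cumulative = [0.0]
    for tau in taus:
        cumulative.append(cumulative[-1] + (theta - delta + tau))
    weights = [math.exp(value) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical three-category item (scores 0, 1, 2)
probabilities = pcm_probabilities(theta=0.5, delta=0.0, taus=[0.4, -0.4])
print(probabilities)
```

With a single step parameter of zero the function reduces to the dichotomous Rasch model, which is one way to sanity-check it.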

Partial Credit Model – probabilities

Test item difficulties and abilities

                 |
   2             |
                X|
                X|57
               XX|
               XX|
               XX|48 67
              XXX|23
              XXX|
             XXXX|78
            XXXXX|31 76
   1        XXXXX|45 80
          XXXXXXX|17 41 42 72 97
           XXXXXX|12
           XXXXXX|59 68 73 75 98
          XXXXXXX|61 77 79 83 91
         XXXXXXXX|8 11 60 90
        XXXXXXXXX|85 89
       XXXXXXXXXX|14 29 37 47 53 84 94
         XXXXXXXX|6 65 92
        XXXXXXXXX|27 63 70 88
   0     XXXXXXXX|22 44 64 81 93
        XXXXXXXXX|2 7 9 32 50 71
        XXXXXXXXX|4 13 55 62 96
        XXXXXXXXX|5 15 20 24
         XXXXXXXX|16 25 26 30 58
         XXXXXXXX|18 46 49 51 52 82 86
            XXXXX|1 21 43 54 66
            XXXXX|3 19 28 34 35 74
           XXXXXX|40 56
  -1         XXXX|69
              XXX|10 95
              XXX|33 36
               XX|
                X|
                X|
                X|38 87
                X|39
                 |
                 |
  -2             |
====================================================================
Each 'X' represents 106.7 cases
====================================================================

Gender DIF

• Gender effect directly estimated with ACER ConQuest

• Reflects difference in logits if item parameters had been estimated separately for males and females

• Differences for combined effect > 0.3 flagged (effect * 2)
  – DPC item stats: separate effects reported

• Generally few cognitive test items had Gender DIF
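
The flagging rule can be sketched as follows, assuming each item has one estimated gender effect in logits and that the effect for the other gender group has the opposite sign, so the difference between the groups is twice the effect. The item labels, the values and the use of the absolute value are illustrative assumptions, not output from ACER ConQuest.

```python
def flag_gender_dif(gender_effects: dict[str, float], threshold: float = 0.3) -> dict[str, float]:
    """Return the combined effect (effect * 2) for items exceeding the threshold."""
    flagged = {}
    for item, effect in gender_effects.items():
        combined = 2 * effect  # difference in logits between the two groups
        if abs(combined) > threshold:
            flagged[item] = combined
    return flagged


# Hypothetical item labels and effects
example_effects = {"item_a": 0.05, "item_b": -0.20, "item_c": 0.16}
flagged_items = flag_gender_dif(example_effects)
print(sorted(flagged_items))
```

Here `item_b` and `item_c` are flagged because their combined effects exceed 0.3 logits in absolute size, while `item_a` is not.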

Item-by-country interaction

Item dimensionality

• No clear pattern of dimensionality with regard to old CIVED and new ICCS items
  – High correlation between the two test parts (0.87)

• High correlation for sub-dimensions
  – Cognitive dimensions: 0.89
  – Content dimensions: 0.93

Review of CIVED trend items

Coder reliability

[Figure: coder reliability chart, axis from 0 to 100, for items IC2PDO1, IC2BIO1, IC2MBO1, IC2ETO1, IC2VSO1, IC2RRO1, IC2WFO1 and IC2WFO2]

Summary for test item analysis

• Very positive results regarding scalability of test items

• Support for uni-dimensionality of test items

• Few items to be deleted or modified

• Open-ended items performed generally well (except one item)

National reports

• Purpose: Checking of national item statistics and review of possible explanations

• Only for cognitive test items

• Graphical displays

Item statistics in graphical form

National item fit and discrimination

[Figure legend, shown for each item: international value; national value; summary of the national values (mean +/- 1 SD)]

National Item difficulties and thresholds

[Figure legend, shown for each item: international value; national value; summary of the national values (mean +/- 1 SD)]

National item review list

[Review list entries: Item #1, Item #2, Item #3, Item #4, Item #5]

International Summary

Item statistics

• DPC provided NRCs with item statistics

• Review of
  – category frequencies
  – point biserials (correlations)
  – Rasch parameter and fit
  – Gender DIF information
  – Difficulty in percentage correct (national and international)

Questionnaire item analysis

• Review of frequencies (including for omitted and invalid responses)

• Comparison of different formats (with and without “don’t know” categories)

• Analysis of scaling properties (reliabilities, Rasch modelling)

• Analysis of dimensionality

• Analysis of relationships between variables and constructs

Correlations

• Reporting of Pearson’s correlation coefficients

• Used to review whether expected relationships are found in data (e.g. correlations between indicators of social background)

• Correlation with test performance regularly reported for student scales

• Criteria (not “scientific”):
  – < 0.1 Not substantial
  – 0.1 – 0.2 Weak
  – 0.2 – 0.5 Moderate
  – > 0.5 Strong
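
As a rough sketch, Pearson's coefficient and the descriptive bands can be written as below. The function names are hypothetical. Two choices are assumptions rather than part of the stated criteria: the bands are applied to the absolute value of the coefficient, and a value exactly on a shared boundary (0.2 or 0.5) is assigned to the moderate band.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's product-moment correlation between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(variation_x * variation_y)


def describe_correlation(r: float) -> str:
    """Map a correlation to the descriptive bands used for review."""
    size = abs(r)  # assumption: the sign is ignored when judging strength
    if size < 0.1:
        return "not substantial"
    if size < 0.2:
        return "weak"
    if size <= 0.5:
        return "moderate"
    return "strong"


print(describe_correlation(-0.35))
```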

Reliabilities

• Cronbach’s alpha coefficient
  – Is influenced by number of items!

• Item-by-total correlation

• Criteria
  – < 0.60 Poor
  – 0.60 – 0.70 Marginally satisfactory
  – > 0.70 Good
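
A minimal sketch of Cronbach's alpha, assuming complete data arranged as one list of item scores per respondent. The second function uses the standardised form of alpha, k·r / (1 + (k − 1)·r), to illustrate why the coefficient depends on the number of items k for a fixed mean inter-item correlation r. Function names and data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha from rows of item scores (one row per respondent)."""
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def standardised_alpha(k: int, mean_r: float) -> float:
    """Standardised alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


# Hypothetical two-item scale answered by four respondents
example_scores = [[1, 2], [2, 1], [3, 3], [4, 4]]
print(round(cronbach_alpha(example_scores), 3))
print(round(standardised_alpha(4, 0.3), 2), round(standardised_alpha(8, 0.3), 2))
```

With the same mean inter-item correlation of 0.3, four items give a standardised alpha of about 0.63 and eight items about 0.77, so a longer scale can cross the 0.70 criterion without its items being any more closely related.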

Example of scale information

Exploratory factor analysis

• Used for exploring dimensionality for sets of items

• VARIMAX rotation
  – Assumes factors to be uncorrelated

• PROMAX rotation
  – Allows factors to be correlated

• Not always reported as it was used primarily in preliminary analysis steps

Confirmatory Factor Analysis

• Model estimation based on variances and covariances
  – LISREL and SAS CALIS estimates
  – Maximum Likelihood (items assumed to be continuous)

• Analysis to confirm expected factor structure

• Model fit indices indicate whether the model “fits the data”

Example of CFA

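[Path diagram: three latent factors (VOTEPART, POLPART and INFPART, each shown with 1.00) measured by twelve items]

Item   Loading   Residual variance
I03A   0.86      0.26
I03B   0.87      0.25
I03C   0.64      0.59
I03D   0.64      0.59
I03E   0.82      0.32
I03F   0.74      0.45
I03G   0.74      0.45
I04C   0.68      0.53
I04D   0.67      0.54
I04E   0.77      0.41
I04F   0.74      0.45
I04G   0.74      0.45

Values shown between the latent factors: 0.43, 0.39, 0.70

Chi-Square=2788.41, df=51, P-value=0.00000, RMSEA=0.070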

CFA by country

Fit indices

• RMSEA (Root mean square error of approximation)
  – > 0.10 Poor model fit
  – 0.10 – 0.05 Marginally satisfactory model fit
  – < 0.05 Close model fit

• RMR (Root mean square residual)
  – > 0.10 Poor model fit
  – 0.10 – 0.05 Marginally satisfactory model fit
  – < 0.05 Close model fit

• CFI (Comparative fit index) and NNFI (Non-normed fit index)
  – < 0.70 Poor model fit
  – 0.80 – 0.90 Marginally satisfactory model fit
  – > 0.90 Close model fit
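
The criteria can be expressed as simple classification rules. This is a sketch with hypothetical function names. Values exactly on a boundary are assigned to the marginal band, which is an assumption, and CFI or NNFI values between 0.70 and 0.80 are reported as not covered because the listed criteria do not assign them to a band.

```python
def classify_rmsea_or_rmr(value: float) -> str:
    """Classify an RMSEA or RMR value (smaller is better)."""
    if value > 0.10:
        return "poor"
    if value >= 0.05:
        return "marginally satisfactory"
    return "close"


def classify_cfi_or_nnfi(value: float) -> str:
    """Classify a CFI or NNFI value (larger is better)."""
    if value > 0.90:
        return "close"
    if value >= 0.80:
        return "marginally satisfactory"
    if value < 0.70:
        return "poor"
    return "not covered by the listed criteria"


# RMSEA reported for the CFA example in this presentation
print(classify_rmsea_or_rmr(0.070))
```

Applied to the RMSEA of 0.070 reported for the CFA example, the rule returns "marginally satisfactory".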

IRT models for categorical items

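• Partial credit model models the response probability for each category depending on the latent trait
  – Item location parameter ($\delta_i$)
  – Step parameter ($\tau_{ik}$)

$$P_{x_i}(\theta_n) = \frac{\exp \sum_{k=0}^{x_i} (\theta_n - \delta_i + \tau_{ik})}{\sum_{h=0}^{m_i} \exp \sum_{k=0}^{h} (\theta_n - \delta_i + \tau_{ik})}, \qquad x_i = 0, 1, \ldots, m_i$$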

ACER ConQuest models

• ITEM+ITEM*STEP: Constrained model
  – Assumes all item parameters to be equal across countries

• ITEM-CNT+ITEM*CNT+ITEM*STEP: Unconstrained model
  – Assumes item location parameters to be different across countries

Item-by-country interaction

• Item-by-country interaction effects sum up to zero

• For review, the median of the absolute values was taken as an indicator of measurement invariance across countries

• Those values > 0.3 logits were interpreted as items with large item-by-country interaction
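
The review indicator can be sketched as follows, assuming each item has one interaction effect per country in logits. The country labels and effect values are invented for illustration and sum to zero within each item, as the estimated effects do.

```python
from statistics import median


def median_absolute_interaction(effects_by_country: dict[str, float]) -> float:
    """Median of the absolute item-by-country interaction effects for one item."""
    return median(abs(effect) for effect in effects_by_country.values())


def has_large_interaction(effects_by_country: dict[str, float], threshold: float = 0.3) -> bool:
    """True if the median absolute effect exceeds the threshold (in logits)."""
    return median_absolute_interaction(effects_by_country) > threshold


# Hypothetical effects for two items across four countries
item_one = {"country_a": 0.45, "country_b": -0.35, "country_c": 0.10, "country_d": -0.20}
item_two = {"country_a": 0.50, "country_b": -0.40, "country_c": 0.35, "country_d": -0.45}

print(has_large_interaction(item_one), has_large_interaction(item_two))
```

The absolute values are needed because the signed effects cancel out by construction. The median keeps one or two extreme countries from dominating the indicator.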

IRT result tables

Scope of analysis

• Given the narrow timeframe for analysis, not all of these analyses were carried out for all instruments

• Reviews of frequencies, computation of scale reliabilities and exploratory factor analyses were done for all field trial instruments

Analysis of relationships between context variables

• Correlation of school, teacher and student data aggregated at the school level
  – Results show quite a few of the expected relationships

• Single-level regression analysis for test performance and expected electoral participation
  – 25 percent of variance in test performance and 21 percent of variance in index of expected electoral participation explained by models

General outcomes

• Good scaling properties for most items
  – Some constructs and items not retained due to poor results

• Comparison of formats
  – No substantial differences in outcomes but large differences in missing values
  – Proposal to omit “don’t know” categories

• Encouraging results for measurement of socio-economic student background
  – Proposal not to retain household possession items

Questions or comments?