statistical analysis plan for tdts

Statistical Analysis Plan for TDTS

INTERVENTION Thinking, Doing, Talking Science (TDTS)

DEVELOPER Science Oxford (the public brand for The Oxford Trust) and Oxford Brookes University

EVALUATOR American Institutes for Research and NatCen Social Research

TRIAL REGISTRATION NUMBER

ISRCTN22499525

TRIAL STATISTICIAN

Sami Kitmitto, Ph.D.

TRIAL CHIEF INVESTIGATOR

Sami Kitmitto, Ph.D.

SAP AUTHOR Sami Kitmitto, Ph.D. and Raquel González, Ph.D.

SAP VERSION 2

SAP VERSION DATE

07/09/17

EEF DATE OF APPROVAL

21/08/17

DEVELOPER DATE OF APPROVAL

03/08/17

SAP CHANGES

Version 2: In order to facilitate comparability of EEF trials, we have revised the primary intention to treat model to align with other EEF studies. Our primary model now will only include a treatment/control indicator at the school-level and pre-test covariates at the student-level. We will not include the randomisation stratifiers (school percent FSM, number of year 5 teachers) or block (region) indicators. A secondary model will be run which will include the randomisation stratifiers. The analysis section has been revised to reflect this change.

Background

2

Table of contents

Introduction.............................................................................................................................................. 3

Study design ........................................................................................................................................... 3

Protocol changes .................................................................................................................................... 4

Randomisation ........................................................................................................................................ 4

Calculation of sample size ...................................................................................................................... 5

Follow-up ................................................................................................................................................. 6

Outcome measures ................................................................................................................................. 7

Primary outcome .............................................................................................................................. 7

Secondary outcomes ....................................................................................................................... 7

Analysis ................................................................................................................................................... 7

Primary intention-to-treat (ITT) analysis .......................................................................................... 7

Interim analyses ............................................................................................................................... 8

Imbalance at baseline ...................................................................................................................... 8

Missing data ..................................................................................................................................... 8

On-treatment analysis ...................................................................................................................... 9

Secondary outcome analyses .......................................................................................................... 9

Additional analyses .......................................................................................................................... 9

Subgroup analyses ........................................................................................................................ 10

Effect size calculation .................................................................................................................... 10

Report tables ......................................................................................................................................... 10

Pupil characteristics ........................................................................................................................... 12

Table 3: Baseline comparison ............................................................................................................... 12

Outcomes and analysis ..................................................................................................................... 12

3

Introduction

Thinking, Doing, Talking Science (TDTS) is an intervention delivered by a team from

Science Oxford (the public brand for The Oxford Trust) and Oxford Brookes University,

hereafter referred to the Oxford team, aimed at providing teachers with skills and strategies

to develop challenging enquiry-based lessons that incorporate more practical activities;

deeper thinking and discussion; and less, but better, recording. The theory is that the TDTS

professional development (PD) programme would support teachers by providing strategies

and background theory to enable them to develop more engaging and challenging lessons.

This professional development would then improve teacher self-efficacy and teaching

practices, which in turn is thought to improve pupil engagement, content knowledge, and

enquiry skills.

For this evaluation, the programme developers will use a “train the trainer” model to deliver

the intervention at scale. That is, a group of trainers will be trained in the intervention, and

these trainers will in turn provide Year 5 teachers with four one-day continuing professional

development (CPD) trainings throughout the 2016-17 school year. The trainings will occur

outside of the teachers’ schools. The teachers will then incorporate the TDTS skills and

strategies into their lesson plans and practices, thereby implementing the intervention within

their classroom.

The main research questions for the evaluation are:

When implemented at scale, what is the impact of the Thinking, Doing, Talking

Science (TDTS) programme on the a) science knowledge attainment and b) attitudes

towards science of participating pupils?

Our main research question focuses on two outcomes. The primary outcome is science

attainment of pupils at the end of Year 5 using an external science assessment. The

secondary pupil outcome is pupil science attitudes at the end of Year 5 using a pupil survey.

Study design

This study is a blocked cluster randomised control trial with randomisation at the school level within blocks. The trainers were recruited by the Oxford team, Bridget Holligan from Science Oxford and Helen Wilson from Oxford Brookes University. The trainers are primary science specialists and are knowledgeable of schools and systems in their respective regions. The trainers are from Centre for Industry Education Collaboration (CIEC) in Teesside and Lincolnshire, Science Made Simple (SMS) in Lancashire, Institute of Education at University College London (UCL) in London, Bath Spa University in Somerset, the Mathematics and Science Learning Centre at the University of Southampton in Hampshire and Early Years Science in Dorset. During the spring of 2016, the trainers in turn recruited schools to the study within their regions (as listed above). Trainers were instructed to recruit schools in their general geographic regions with an emphasis on recruiting schools that were higher than average in the proportion of pupils receiving free school meals (FSM).

For both intervention and control schools, all Year 5 teachers are required to participate in the study. The TDTS model requires that a minimum of two teachers participate per school. Hence, for one form entry schools recruited into the study the two teachers would be the one Year 5 teacher plus one additional teacher who could be a Science Coordinator or head of Key Stage.

All pupils of the selected teachers were eligible to participate in the evaluation. Recruitment of pupils occurred at the end of pupils’ Year 4 (May-June 2016) when schools distributed an opt-out form to the parents of Year 4 pupils. Pupil data will only be collected for pupils whose family did not complete an opt-out form. Data collection occurs once at the end of Year 5 for

4

the 2016-17 school year with a pupil science assessment and pupil survey about science attitudes. In addition, data from the National Pupil Database will be used to supplement additional information about the pupils.

In order to participate in the programme and evaluation, schools agreed to:

Be randomly assigned to the have the TDTS continuing professional development programme in either a) the 2016–17 school year or b) the 2017–18 school year

Complete a brief school survey describing professional development opportunities available to Year 5 teachers before randomisation

Allow all Year 5 teachers (a minimum of two teachers) to participate in the study, to participate in the programme when it is offered, and to utilise skills from that programme in their classrooms

Provide contact information (i.e., teacher email addresses) for all participating Year 5 teachers to allow the evaluation team to send a survey link directly to the teachers

Allow teachers in the study to participate in a brief teacher survey about their practices

Send an opt-out consent form to the parents of all eligible pupils in the study (i.e., all of the pupils in the classes of Year 5 teachers chosen for the study) and record any opt-outs received so that the data for these pupils are not passed on to the evaluation team

Provide pupil identification information (name, UPN, date of birth) for all pupils in the study (i.e., pupils in the Year 5 classes of teachers chosen for the study) to the evaluation team, and

Allow the administration of a science assessment and pupil survey (jointly administered) by the evaluation team to all pupils in the study at the end of the 2016–17 school year.

This is a two-arm design with a “waitlist” control group: in schools assigned to the intervention, Year 5 teachers participate in the TDTS CPD programme during the 2016-17 school year; control schools will be on a “waitlist” and continue with business as usual in the evaluation year and then be offered the TDTS programme for Year 5 teachers during the 2017–18 school year, after data collection is complete.

Protocol changes

There have not been any changes to the protocol.

Randomisation

Assignment to treatment and control groups occurred at the school level and was conducted separately within each of the seven study regions using minimisation methods.1 The minimisation included two school characteristics:

1) Percentage of pupils eligible for FSM in three categories: a. Low (9 percent FSM or less) b. Medium (between 9 and 24 percent FSM) c. High (greater than 24 percent FSM)

2) Number of Year 5 teachers in two categories: a. One or two b. Three or more

In total, 205 schools were recruited into the study and 106 were assigned into the treatment group and 99 into the control group. The minimisation process can result with unequal

1 The MimimPY program was used for school assignment. We used the “biased coin” method with a

base probability of .75.

5

numbers assigned to the treatment and control groups because the process assigns each school probabilistically one-at-a-time.

The baseline measure of pupil achievement for this study is the academic attainment at the end of Key Stage 1 for pupils in participating schools as measured by the end of Key Stage 1 assessment. Assignment was conducted by the study team in July 2016. We expect to obtain the baseline academic achievement data from the National Pupil Database in the summer of 2017.

Calculation of sample size

The choice of model selected for this study was a mixed multilevel model with three levels

(pupil, school, and region) and treatment at the second level (school level). This choice was

based on the following parameters for the study:

Schools would be recruited within region

Schools would be the unit of randomisation

Pupils would be the level of observation

Power calculations were conducted using the PowerUp tool (Version: 22/01/2015).2 The

following are parameters used in power calculations:

Alpha Level (α) = .05, two-tailed test, with Power (1-β) = .80

Interclass Correlation (ICC) = .123

Proportion of Level 2 units randomized to treatment = .50

Number of school level covariates = 2

Number of pupils per school: 32 pupils, of which 6 would be FSM pupils

Proportion of pupil variance explained by covariates (R21) = .30

Proportion of school variance explained by covariates (R22) = .40

Our goal was to choose a sample size to obtain an MDES of .18 or less. Assuming that 30

schools were recruited within each region and based on the conservative parameters listed

above, a total of 6 regions would generate an MDES of .176 for FSM pupils and .127 for all

pupils.

2 Dong, N. and Maynard, R. A. (2013). PowerUp!: A tool for calculating minimum detectable effect

sizes and sample size requirements for experimental and quasi-experimental designs. Journal of Research on Educational Effectiveness, 6(1), 24-67. doi: 10.1080/19345747.2012.673143 3 In order to determine the appropriate ICC for the power analysis, we reached out to the study team

that led the efficacy trial to ask for their recommendation. The authors of the efficacy trial calculated an ICC of .12 and therefore, this ICC was used in our power analysis. (Hanley, Pam. “Re: Correlations from thinking, talking doing science trial.” Message to Camilla Nevill and Sami Kitmitto. 7 October, 2015. Email).

6

Follow-up

Participant flow diagram4

4 Recruitment was led by the Oxford Team. Where we do not have detail regarding recruitment we

have left the field blank.

Re

cru

itm

en

t A

na

lysis

Allo

ca

tion

Agreed to participate (school n=210)

Randomised (school n= 205; pupil n=8,894)

Excluded (school n=2) Not meeting inclusion criteria (school n=3)

T1 (school n=106; pupil n=4,404)

Waitlist control (school n=99; pupil n=4,490)

Approached by telephone (school n=380) Did not agree to participate

(school n=155) Middle-schools without Y4 (school n=15)

Lost to follow up

4 schools dropped out of the study

Post-test data collected

102 schools

Not

analysed

Analysed

tbc after

data

collection

Not

analysed

Analysed

tbc after

data

collection

Post-test data collected

98 schools

Lost to follow up

1 school dropped out of the study

Mailed materials (school n~2200)

7

Outcome measures

Primary outcome

The primary outcome will be pupil science knowledge attainment, which will be measured by

a score on a science assessment administered as part of the evaluation. This study uses the

science assessment and scoring guide developed by the team from the Institute for Effective

Education (IEE) at the University of York, who conducted the efficacy trial of TDTS.5 The

items in the assessment were drawn from a larger bank of items developed by Terry Russell

and Linda McGuigan.6 The assessment includes items that address the science curriculum

content appropriate for the year group and represents a range of topics and item types (e.g.,

open-ended, closed item).7 In analysis, we will use raw scores which have a possible range

of 0 to 41.

Secondary outcomes

The secondary outcome will be pupil self-efficacy and attitudes towards science, which will

be measured at one time point by a pupil-level survey administered at the same time as the

science assessment. As with the science assessment, this study uses the pupil survey

developed for the efficacy trial of TDTS.8 That survey was adapted from a questionnaire

developed by Kind, Jones and Barmby.9 The items on the survey are on a five point scale

from “Agree a lot” to “Disagree a lot.”

We recognise the importance of summarising the items into meaningful constructs.

Preliminarily, we interpret these items to fall into the domains of self-efficacy towards science

and attitudes towards science. We will use exploratory and then confirmatory factor analysis

to determine an underlying factor structure that supports the creation of reliable indices of

pupil attitudes and self-efficacy towards science.

Analysis

Primary intention-to-treat (ITT) analysis

The primary pupil outcome measure, pupil science knowledge attainment, will be analysed

using a mixed multilevel regression model to reflect the nested nature of the data and the

method of assignment, with pupils nested within schools. To allow for comparability across

other EEF trials, the primary model will include only pupil covariates and an intervention

indicator at the school level designating which schools were part of the intervention and

control groups. School-level random effects will be included in the model by allowing the

intercept to vary randomly across schools.

5 AIR has obtained permission to use this pupil science survey from IEE.

6 Russell, T., & McGuigan, L. (2001). Science Assessment Series 1. Teachers' Guides and

Assessment Units KS1 and KS2. NFER-Nelson. Russell, T., & McGuigan, L. (2001). Science Assessment Series 2. Teachers’ Guides and Assessment Units. NFER-Nelson. 7 AIR has obtained permission to use this pupil science assessment from the developers and efficacy

trial authors. 8 AIR has obtained permission to use this pupil science survey from IEE.

9 Kind, P., Jones, K., & Barmby, P. (2007). Developing attitudes towards science measures.

International Journal of Science Education 29(7), 871–893.

8

Level 1 – Pupil level

Primary pupil outcome measure

Description Source

Primary: Pupil score on the study science assessment Study’s science assessment

Pupil-level control measures

Description Source

KS1_MATHS NPD

KS1_READWRIT NPD

Level 2 – School level (level of randomisation)

School-level control measures

Description Source

School assigned to the treatment group Study assignment

No interactions with the treatment indicator are planned for this analysis, except for the sub-

group analysis for FSM pupils, which is described below.

Interim analyses

No interim analyses are planned for this trial.

Imbalance at baseline

At the time of school assignment, only school-level information on the percent eligible for

FSM and the number of year 5 teachers was available. We will check balance in baseline

student characteristics, specifically student KS1 scores which are used as the pre-test

control in this trial. Differences in test scores will be reported as effect sizes. While there may

be imbalance due to the random nature of assignment, we do not expect imbalance at

baseline due to the low attrition of schools (less than three percent of recruited schools were

lost to attrition).

Missing data

When data are missing, the power of the analysis to detect statistically significant effects is

reduced and, depending on the mechanism by which data are missing, the estimated effects

and standard errors can potentially be biased.

If 5 percent or less of pupils have incomplete outcome information, we will conduct analysis

omitting these pupils. In other words, we will conduct analysis using listwise deletion of any

pupil with incomplete information. Previous research has found that when 5 percent of the

data are missing, bias is low across the various approaches to handling missing data in

analysis including listwise deletion.10

If greater than 5 percent of the pupils have incomplete outcome information, we will

investigate the mechanism by which data are missing and consider multilevel multiple

imputation as an alternative to listwise deletion.

10

Puma, Michael J., Robert B. Olsen, Stephen H. Bell, and Cristofer Price (2009). What to Do When Data Are Missing in Group Randomized Controlled Trials (NCEE 2009-0049). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

9

A priori, potential reasons why pupils might be missing outcomes in this study include:

School dropped out after recruitment and assignment to treatment and control groups

Pupil left the study school between the end of year 4 (i.e., at the time of recruitment

when opt-out forms were sent and school assignment to treatment or control groups

was conducted) and testing at the end of year 5

Pupil was absent on the day of testing and not reached in mop-up efforts

Pupil was not tested due to having special educational needs or disability

To investigate these mechanisms, we will first look at the extent of the missing data for each

variable to be included in the analysis and the patterns of missingness across treatment and

control groups. Second, we will estimate a model predicting missingness to examine

whether the covariates in our primary analysis model jointly (using an F-test) predict the

absence of pupil outcome data.

If any differences are found, significant at p<.05, multilevel multiple imputation will be used

for analysis and reporting. Results will be reported alongside the completers analysis.

On-treatment analysis

We will conduct a Complier Average Causal Effect (CACE) analysis of TDTS. The Oxford Team has defined an appropriate minimum level of exposure for “compliers” to use in the CACE analysis. Compliers will be schools sending at least one teacher to at least three of the four training sessions. The rationale for this exposure level is that if at least one teacher attended a session, he/she would be able to share knowledge from the training session with colleagues.

Secondary outcome analyses

The indices of pupil self-efficacy and attitudes towards science will be analysed using the

same model as described above in the primary ITT analysis, but without pupil level

covariates.

Secondary pupil outcome measure

Description Source

Secondary: Pupil self-efficacy and attitudes towards science Study pupil survey

Additional analyses

A second model will be run which will include pupil and school covariates, including all

variables used in the assignment process as well as individual pupil’s prior KS1 scores.

Intervention and control groups will be compared by including an intervention indicator at the

school level. School-level random effects will be included in the model by allowing the

intercept to vary randomly across schools. There are too few regions in the study to reliably

estimate variation in effects across them, but differences will be accounted for by including

fixed-effect region indicators.

Primary pupil outcome measure

Description Source

Primary: Pupil score on the study science assessment Study’s science assessment

10


Description Source

KS1_MATHS NPD

KS1_READWRIT NPD

Level 2 – School level (level of randomisation)

School-level control measures

Description Source

School assigned to the treatment group Study assignment

LEA[yy]_Pct_Pupils_FSM_Eligible NPD

Number of Year 5 teachers in the school Study recruitment data

Region indicator Study recruitment data

In addition, we will run exploratory analyses in which we will estimate our primary and

secondary analyses models including additional pupil level covariates (see table below).


Description Source

KS1_MATHS NPD

KS1_READWRIT NPD

KS2_GENDER NPD

KS2_YEAROFBIRTH and KS2_MONTHOFBIRTH (to calculate age)

NPD

KSF_FSM6 NPD

IDACIScore_[term][yy] NPD

Subgroup analyses

Subgroup analysis will be conducted for the population of FSM pupils using the “everFSM”

variable (“KS2_FSM6”). For this analysis, the primary and secondary outcome analysis

models will be re-estimated using data limited to this sample. Furthermore, an interaction

model will be run on the entire data. This will mirror the main primary outcome model but

also include everFSM and the interaction between everFSM and group as covariates.

Effect size calculation

Primary outcome results will be reported as raw scores as well as effect sizes that standardise the estimated impacts. The numerator will be the regression adjusted estimate of the impact of TDTS from the multi-level model and the denominator will be the standard deviation of the outcome for the full sample.11

Report tables

Table 1: Summary of impact on primary outcome

Group Effect size

(95% confidence interval)

Estimated months’ progress

EEF security rating

EEF cost rating

11

Per EEF guidance, see https://educationendowmentfoundation.org.uk/public/files/Evaluation/Analysis_for_EEF_evaluations_REVISED_Dec_2015.pdf.

11

Treatment vs. control (science assessment)

Treatment FSM vs. control FSM

(science assessment)

Table 2: Minimum detectable effect size at different stages

Stage

N [schools/pupils] (n=intervention; n=control)

Proportion of variance in outcome explained by pre-test + other covariates

ICC

Blocking/ stratification or pair matching

Power Alpha

Minimum detectable effect size (MDES)

Protocol

[180 schools/5760 pupils] (90 treatment schools, 90 control schools; 2880 pupils in treatment schools, 2880 pupils in control schools)

Pupil level: .30 School level: .40

.12 6 regions .80 .05 .13

Randomisation

[205 schools] (106 treatment schools, 99 control schools)

7 regions .80 .05

Analysis

12

Pupil characteristics

Table 3: Baseline comparison

Variable Intervention group Control group

School-level (categorical) n/N (missing) Percentage n/N (missing) Percentage

Region 1 Region 2 Region 3 Region 4 Region 5 Region 6 Region 7

12/103 (0) 7/103 (0) 13/103 (0) 13/103 (0) 11/103 (0) 12/103 (0) 17/103 (0)

11/98 (0) 7/98 (0) 12/98 (0) 15/98 (0) 10/98 (0) 9/98 (0) 17/98 (0)

School-level (continuous) n (missing) [Mean or median] n (missing) [Mean or median]

Percent eligible for FSM

Number of Y5 teachers in the school

IDACI of pupils in the school

Percent pupils classified as white British ethnic origin

Percent of pupils who are EAL

Pupil-level (categorical) n/N (missing) Percentage n/N (missing) Percentage

Eligible for FSM

Gender

Pupil-level (continuous) n (missing) [Mean or median] n (missing) [Mean or median]

Score on KS1 maths

Score on KS1 reading

IDACI

Pupil age

Outcomes and analysis

Table 4: Primary analysis

Raw means Effect size

Intervention group Control group

Outcome n (missing)

Mean (95% CI) n (missing)

Mean (95% CI)

n in model (intervention; control)

Hedges g (95% CI)

p-value

Pupil science assessment

Pupil science self efficacy and attitude survey

statistical analysis plan for tdts

Documents