
Sampling designs using the National Pupil Database Some issues for discussion by Harvey Goldstein (University of Bristol) & Tony Fielding (University of Birmingham)




Size of data set

• The data set already contains some 3 million longitudinal records and grows by about 600,000 a year.

• Carrying out reasonably complex analyses, e.g. value-added multilevel models, is already time-consuming.

• It is worth investigating the efficiency of sampling the database, either as a whole or within specific subpopulations such as LEAs (local education authorities).

• Traditional sampling theory can be used for simple statistics such as means or regression coefficients, and there is a literature on ‘power calculations’ for multilevel models (see the ESRC research project by Browne at Nottingham).
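As a minimal illustration of the traditional theory for a mean, the sketch below computes the standard error of a sample mean under simple random sampling without replacement, including the finite population correction, and the smallest sample that gives a required 95% confidence-interval half-width. It assumes simple random sampling of pupils and a standard deviation that is known or estimated from a pilot; the function names and the figures in the usage note are hypothetical, not taken from the NPD.

```python
import math


def srs_standard_error(sd: float, n: int, population_size: int) -> float:
    """SE of a sample mean under simple random sampling without replacement,
    including the finite population correction (N - n) / (N - 1)."""
    if population_size < 2 or not 0 < n <= population_size:
        raise ValueError("need population_size >= 2 and 0 < n <= population_size")
    fpc = (population_size - n) / (population_size - 1)
    return math.sqrt(fpc) * sd / math.sqrt(n)


def required_sample_size(sd: float, half_width: float, population_size: int,
                         z: float = 1.96) -> int:
    """Smallest n whose approximate CI half-width z * SE is at most half_width."""
    n0 = (z * sd / half_width) ** 2            # size ignoring the finite population
    n = n0 / (1 + (n0 - 1) / population_size)  # adjust for sampling without replacement
    return min(population_size, max(1, math.ceil(n)))
```

For example, with a hypothetical score SD of 10 and a population of 3 million records, `required_sample_size(10.0, 0.1, 3_000_000)` asks for roughly 38,000 pupils, about 1.3% of the records. Clustering of pupils within schools inflates this figure, which is where the multilevel power literature comes in.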


Special features of the NPD

• The ‘population’ characteristics are known and can be used for drawing efficient samples.

• An adaptive design is possible, e.g.:
  – Select a random subsample to determine the relationships of interest (the equivalent of a pilot study).
  – Fit a suitable model to estimate parameter values.
  – Choose the parameters of interest together with their confidence intervals.
  – Increase the sample size to establish the relationship between CI width and sample size, then extrapolate to the sample size needed to achieve the required interval width.
  – Any statistic of interest (in addition to the CI) can be chosen.
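The extrapolation step can be sketched as follows, assuming that the confidence-interval width w shrinks roughly as a power of the sample size n (w ∝ n^b, with b close to -1/2 for most estimators). The (n, w) pairs would come from refitting the chosen model on random subsamples of increasing size; the values in the usage note are illustrative, generated from w = 5/√n, not real NPD results. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def extrapolate_sample_size(observations: Sequence[Tuple[int, float]],
                            target_width: float) -> float:
    """Fit log(w) = a + b*log(n) by least squares to (n, w) pairs from
    subsamples of increasing size, then solve for n at target_width."""
    if len(observations) < 2:
        raise ValueError("need at least two (n, width) pairs")
    xs = [math.log(n) for n, _ in observations]
    ys = [math.log(w) for _, w in observations]
    k = len(xs)
    x_bar, y_bar = sum(xs) / k, sum(ys) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("subsample sizes must differ")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # close to -0.5 if width behaves like 1/sqrt(n)
    a = y_bar - b * x_bar
    if b >= 0:
        raise ValueError("interval width does not shrink with n")
    return math.exp((math.log(target_width) - a) / b)
```

For instance, `extrapolate_sample_size([(100, 0.5), (400, 0.25), (1600, 0.125)], 0.05)` returns approximately 10000, since these pairs follow w = 5/√n exactly. Estimating the exponent b, rather than fixing it at -1/2, lets the subsamples show whether the simple square-root rule actually holds for the model in question.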


Complex designs and replication

• For multilevel models, and for designs where interest focuses on special groups (e.g. low achievers), we need good choices for the number of higher-level units (schools) and the numbers sampled within the groups.

• A similar adaptive approach can be used, evaluating CIs or significance levels as design parameters are altered.

• We also have the opportunity of replicating an analysis by selecting an independent sample from the database.
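For a balanced two-level design, a simple starting point is the variance of an overall mean when J schools are sampled with m pupils in each: Var = σ_u²/J + σ_e²/(Jm), where σ_u² is the between-school variance and σ_e² the within-school variance. The sketch below uses this to find how many schools are needed for a target standard error at each candidate m. It is a simplification (a single mean, equal cluster sizes, variance components assumed known, e.g. from a pilot fit); for regression coefficients or special subgroups the adaptive approach described above would be needed. Function names and variance values are hypothetical.

```python
import math


def se_two_level(sigma_u2: float, sigma_e2: float,
                 n_schools: int, pupils_per_school: int) -> float:
    """SE of the overall mean with J schools and m pupils per school (balanced)."""
    return math.sqrt(sigma_u2 / n_schools
                     + sigma_e2 / (n_schools * pupils_per_school))


def schools_needed(sigma_u2: float, sigma_e2: float,
                   pupils_per_school: int, target_se: float) -> int:
    """Smallest J for which se_two_level(...) <= target_se."""
    per_school_var = sigma_u2 + sigma_e2 / pupils_per_school
    j = max(1, math.ceil(per_school_var / target_se ** 2))
    # Guard against floating-point rounding leaving j one off in either direction
    while se_two_level(sigma_u2, sigma_e2, j, pupils_per_school) > target_se:
        j += 1
    while j > 1 and se_two_level(sigma_u2, sigma_e2, j - 1, pupils_per_school) <= target_se:
        j -= 1
    return j


if __name__ == "__main__":
    # Hypothetical variance components, e.g. from a pilot multilevel fit
    sigma_u2, sigma_e2, target = 0.2, 0.8, 0.045
    for m in (5, 10, 20, 40, 80):
        j = schools_needed(sigma_u2, sigma_e2, m, target)
        print(f"m = {m:3d}: {j:4d} schools, {j * m:6d} pupils in total")
```

Increasing m reduces the number of schools required, but only down to σ_u²/target², because the between-school term σ_u²/J does not shrink as more pupils are taken per school. This is why the number of higher-level units is the key design choice.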


Using all the data

• When analysing a given sample we will generally also have data available that relate to the sample members, e.g.:
  – School-level averages for each pupil in a study
  – School-level data for previous schools attended
  – School-level data for previous years
  – LEA data for previous years
  – School data for neighbouring schools

• All such data can be incorporated into a model, increasing the number of variables but not the sample size.
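A minimal sketch of attaching population-based school-level variables to a sample: school means are computed over all pupils in the population file, not just the sampled ones, and merged onto each sampled record. Records are plain dictionaries here, and the field names (pupil_id, school_id, prior_score) are hypothetical stand-ins for NPD variables.

```python
from collections import defaultdict
from typing import Dict, List


def school_means(population: List[dict], field: str) -> Dict[str, float]:
    """Mean of `field` per school, computed over the whole population file."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for rec in population:
        totals[rec["school_id"]] += rec[field]
        counts[rec["school_id"]] += 1
    return {school: totals[school] / counts[school] for school in totals}


def add_school_context(sample: List[dict], population: List[dict],
                       field: str) -> List[dict]:
    """Return copies of the sampled records with a population-based school mean added."""
    means = school_means(population, field)
    return [{**rec, f"school_mean_{field}": means[rec["school_id"]]}
            for rec in sample]


if __name__ == "__main__":
    population = [
        {"pupil_id": 1, "school_id": "A", "prior_score": 10.0},
        {"pupil_id": 2, "school_id": "A", "prior_score": 20.0},
        {"pupil_id": 3, "school_id": "B", "prior_score": 30.0},
    ]
    sample = [population[0], population[2]]  # pupil 2 is not sampled
    for rec in add_school_context(sample, population, "prior_score"):
        print(rec)
```

The sample keeps its original size; each record simply gains a school_mean_prior_score variable, and pupil 1's school mean still reflects the unsampled pupil 2. The same merge pattern applies to previous schools, previous years or neighbouring schools, given a suitable key.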


Other possibilities

• Poststratification: using population distributions to re-weight statistics or to incorporate weights in model estimation.

• Setting up an archive of results that may be useful for designing samples.

• Using PLASC (the Pupil Level Annual School Census) to select a research sample, subject to appropriate permissions.
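The poststratification idea can be illustrated with a sketch that assumes population counts per stratum (e.g. by LEA or by gender) are known from the database: each sampled pupil in stratum h gets weight (N_h/N)/(n_h/n), so the weighted sample reproduces the population distribution across strata. Function names, strata and numbers are hypothetical.

```python
from typing import Dict, List


def poststrat_weights(sample_counts: Dict[str, int],
                      population_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight per stratum: population share divided by sample share."""
    if set(sample_counts) != set(population_counts):
        raise ValueError("every population stratum must appear in the sample")
    if any(c <= 0 for c in sample_counts.values()):
        raise ValueError("every stratum needs at least one sampled unit")
    n = sum(sample_counts.values())
    big_n = sum(population_counts.values())
    return {h: (population_counts[h] / big_n) / (sample_counts[h] / n)
            for h in sample_counts}


def poststratified_mean(sample: Dict[str, List[float]],
                        population_counts: Dict[str, int]) -> float:
    """Weighted mean of the sampled values using poststratification weights."""
    counts = {h: len(values) for h, values in sample.items()}
    weights = poststrat_weights(counts, population_counts)
    numerator = sum(weights[h] * y for h, values in sample.items() for y in values)
    denominator = sum(weights[h] * len(values) for h, values in sample.items())
    return numerator / denominator
```

With sample = {"A": [10.0, 20.0], "B": [30.0] * 4} and population_counts = {"A": 600, "B": 400}, the unweighted mean is 25, but stratum B is over-represented (4 of 6 sampled pupils against 40% of the population). The poststratified mean is 0.6 × 15 + 0.4 × 30 = 21. The same weights can be passed to a weighted model-fitting routine.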