Sampling designs using the National Pupil Database
Some issues for discussion
by
Harvey Goldstein (University of Bristol) &
Tony Fielding (University of Birmingham)
Size of data set
• The data set already contains some 3000k longitudinal records and increases by 600k a year.
• Carrying out reasonably complex analyses, e.g. value added multilevel models, is already time-consuming.
• Worth investigating the efficiency of sampling the database – either as a whole or for specific subpopulations such as LEAs.
• Traditional sampling theory can be used for simple statistics such as means or regression coefficients, and there is a literature for ‘power calculations’ for multilevel models (see ESRC research project by Browne at Nottingham)
Special features of the NPD
• The ‘population’ characteristics are known and can be used for drawing efficient samples.
• The possibility of an adaptive design exists, e.g.:
– Select a random subsample to determine relationships of interest (equivalent of a pilot study)
– Fit a suitable model to estimate parameter values
– Choose parameters of interest together with their confidence intervals
– Increase sample size to establish relationship between CI and sample size and extrapolate to sample size needed to achieve required interval size.
– Any statistic of interest (in addition to CI) can be chosen.
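The adaptive steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the population is synthetic, the "parameter of interest" is a simple mean rather than a multilevel model coefficient, the CI half-width is assumed to shrink as c / sqrt(n), and the finite population correction is ignored. The function names and pilot sizes are hypothetical.

```python
import math
import random
import statistics


def ci_half_width(sample, z=1.96):
    """Half-width of a normal-approximation 95% CI for the sample mean."""
    return z * statistics.stdev(sample) / math.sqrt(len(sample))


def required_sample_size(population, target_half_width,
                         pilot_sizes=(200, 400, 800, 1600), seed=0):
    """Extrapolate the sample size needed to reach target_half_width.

    Assumption: half-width ~ c / sqrt(n). The constant c is estimated by
    least squares through the origin from subsamples of increasing size.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for n in pilot_sizes:
        sample = rng.sample(population, n)   # random subsample (pilot)
        xs.append(1.0 / math.sqrt(n))        # predictor: 1 / sqrt(n)
        ys.append(ci_half_width(sample))     # observed CI half-width
    c = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return math.ceil((c / target_half_width) ** 2)


# Synthetic stand-in for a pupil-level outcome (not NPD data).
rng = random.Random(42)
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

n_needed = required_sample_size(population, target_half_width=0.25)
print(n_needed)
```

In practice the mean would be replaced by whatever model parameter or statistic is of interest, and the assumed c / sqrt(n) relationship would itself be checked against the observed half-widths before extrapolating.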
Complex designs and replication
• For multilevel models and designs where interest focuses on special groups (e.g. low achievers) we need good choices of numbers of higher level units (schools) and numbers in the groups.
• A similar adaptive approach can be used, evaluating CIs or significance levels as design parameters are altered.
• We also have the opportunity of replicating an analysis by selecting an independent sample from the database.
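As a rough illustration of how the CI changes as design parameters are altered, the sketch below uses the standard variance of an overall mean under a balanced two-level design, var_school / J + var_pupil / (J × n), for J schools and n pupils per school. The variance components and the candidate designs are hypothetical, and the finite population correction is ignored.

```python
import math


def mean_half_width(n_schools, pupils_per_school, var_school, var_pupil, z=1.96):
    """95% CI half-width for the overall mean in a balanced two-level design."""
    var_mean = (var_school / n_schools
                + var_pupil / (n_schools * pupils_per_school))
    return z * math.sqrt(var_mean)


# Hypothetical variance components: 20% of total variance between schools.
var_school, var_pupil = 20.0, 80.0

# Three candidate designs, each using 2000 pupils in total.
designs = [(50, 40), (100, 20), (200, 10)]
widths = {d: mean_half_width(*d, var_school, var_pupil) for d in designs}

for (j, n), w in widths.items():
    print(f"{j} schools x {n} pupils: half-width {w:.3f}")
```

With the same total number of pupils, spreading the sample over more schools narrows the interval here, because the between-school component is divided only by the number of schools. For a full multilevel model the same comparison would be made by refitting the model on samples drawn under each design.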
Using all the data
• When analysing a given sample we will also generally have available data related to the sample members, e.g.:
– School level averages for each pupil in a study
– School level data for previous schools attended
– School level data for previous years
– LEA data for previous years
– School data for neighbouring schools
• All such data can be incorporated into a model, increasing the number of variables but not the sample size.
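A minimal sketch of the idea, assuming a hypothetical pupil table with made-up field names and synthetic scores: the school-level average is computed from all pupils in the database and then attached to each sampled pupil as an extra covariate, so the sample gains a variable but no extra rows.

```python
import random
from collections import defaultdict
from statistics import mean

rng = random.Random(1)

# Hypothetical full pupil table: (pupil_id, school_id, prior_score).
pupils = [(i, i % 50, rng.gauss(50.0, 10.0)) for i in range(5000)]

# School-level average computed from ALL pupils, not only the sampled ones.
scores_by_school = defaultdict(list)
for _, school_id, score in pupils:
    scores_by_school[school_id].append(score)
school_mean = {s: mean(v) for s, v in scores_by_school.items()}

# Draw the analysis sample and attach the school-level covariate.
sample = rng.sample(pupils, 500)
enriched = [
    {
        "pupil_id": pid,
        "school_id": school_id,
        "prior_score": score,
        "school_mean_prior": school_mean[school_id],
    }
    for pid, school_id, score in sample
]

print(len(enriched), len(enriched[0]))
```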
Other possibilities
• Poststratification: using population distributions to re-weight statistics or to incorporate weights in model estimation.
• Setting up an archive of results that may be useful for designing samples
• Using PLASC to select a research sample – subject to appropriate permissions.
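Poststratification of a simple statistic can be sketched as follows. The strata, counts and values are hypothetical; the point is only that stratum means from the sample are re-weighted by the known population shares, which for the NPD would come from the database itself.

```python
def poststratified_mean(sample, population_counts):
    """Re-weight stratum means by known population stratum sizes.

    sample: list of (stratum, value) pairs.
    population_counts: dict mapping stratum -> population size N_h.
    """
    sums, counts = {}, {}
    for stratum, value in sample:
        sums[stratum] = sums.get(stratum, 0.0) + value
        counts[stratum] = counts.get(stratum, 0) + 1

    missing = set(population_counts) - set(counts)
    if missing:
        raise ValueError(f"no sampled units in strata: {sorted(missing)}")

    total = sum(population_counts.values())
    return sum(
        population_counts[h] / total * sums[h] / counts[h]
        for h in population_counts
    )


# Hypothetical sample in which group_a is over-represented (2 of 6 pupils)
# relative to... the reverse: it holds 30% of the population but 33% of the sample.
sample = [
    ("group_a", 40.0), ("group_a", 44.0),
    ("group_b", 52.0), ("group_b", 54.0), ("group_b", 56.0), ("group_b", 50.0),
]
population_counts = {"group_a": 300, "group_b": 700}

estimate = poststratified_mean(sample, population_counts)
unweighted = sum(v for _, v in sample) / len(sample)
print(round(estimate, 3), round(unweighted, 3))
```

The same population shares could instead be turned into unit-level weights and passed to a model-fitting routine, which is the second use mentioned above.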