1 Statistical sampling principles for the environment Marian Scott August 2013

1

Statistical sampling principles for the environment

Marian Scott

August 2013

2

Outline

• Variation
• General sampling principles
• Methods of sampling
  – Simple random sampling
  – Stratified sampling
  – Systematic sampling
  – How many samples (power calculations)
• Examples: the ECN sites, Ecomags project

3

Network design: ECN

4

Ground sampling locations: Ecomags

5

Variation

• Natural variation in the attribute of interest might be due to, for example:
  – feeding habits, if measuring sheep
  – rainfall patterns, if measuring plants

• There is also variation/uncertainty due to analytical measurement techniques.

• Natural variation may well exceed the analytical uncertainty.

• Expect, therefore, that if you measure a series of replicate samples they will vary, and if there are enough of them you may be able to define the distribution of the attribute of interest.

6

from Gilbert and Pulsipher (2007)

7

Variation

[Figure: Normal histogram of log activity. Activity (log10) of particles (Bq Cs-137) with Normal or Gaussian density superimposed. x-axis: Data (3.5 to 6.0); y-axis: Frequency (0 to 20).]

Variable   Mean    StDev    N
alllogt    4.647   0.3815   59
logt2007   4.704   0.6001   14

8

What is statistical sampling?

Statistical sampling is a process that allows inferences about properties of a large collection of things (commonly described as the population) to be made from observations on a relatively small number of individuals belonging to the population (the sample). In conducting statistical sampling, one is attempting to make inferences about the population.

9

Statistical sampling

The use of valid statistical sampling techniques increases the chance that a set of specimens (the sample, in the collective sense) is collected in a manner that is representative of the population.

Statistical sampling also allows a quantification of the precision with which inferences or conclusions can be drawn about the population.

10

Statistical sampling

• The issue of representativeness is important because of the variability that is characteristic of environmental measurements.

• Because of variability within the population, its description from an individual sample is imprecise, but the precision can be described in quantitative terms and improved by the choice of sampling design and sampling intensity (Peterson and Calvin, 1986).

11

Good books

• The general sampling textbooks by Cochran (1977) and Thompson (1992),

• the environmental statistics textbook by Gilbert (1987), and

• papers by Anderson-Sprecher et al. (1994), Crepin and Johnson (1993), Peterson and Calvin (1986), and Stehman and Overton (1994).

12

Know what you are setting out to do before you start

• describing a characteristic of interest (usually the average),

• describing the magnitude of variability of a characteristic,

• describing spatial patterns of a characteristic,

• mapping the spatial distribution,

• quantifying contamination above a background or specified intervention level,

• detecting temporal or spatial trends,

• assessing human health or environmental impacts of specific facilities, or of events such as accidental releases,

• assessing compliance with regulations.

13

Rules

• Rule 1: specify the objective

• Rule 2: use your knowledge of the environmental context

14

Use your scientific knowledge

• the nature of the population, such as the physical or biological material of interest, its spatial extent, its temporal stability, and other important characteristics,

• the expected behaviour and environmental properties of the compound of interest in the population members,

• the sampling unit (i.e., individual sample or specimen),

• the expected pattern and magnitude of variability in the observations.

15

What is the population?

• The concept of the population is important. The population is the set of all items that could be sampled, such as all fish in a lake, all people living in the UK, all trees in a spatially defined forest, or all 20-g soil samples from a field. Appropriate specification of the population includes a description of its spatial extent and perhaps its temporal stability

16

What is the sampling unit?

• The environmental context helps define the sampling unit. It is not practical to consider sampling units so small that their concentration cannot be easily measured, nor extremely large sampling units that are too difficult to manipulate or process.

• A sampling unit is a unique element of the population that can be selected as an individual sample for collection and measurement.

17

Sampling units

• In some cases, sampling units are discrete entities (e.g., animals, trees), but in others, the sampling unit might be investigator-defined and arbitrarily sized.

• Statistical sampling leads to a description of the sampled members of the population and inference(s) and conclusion(s) about the population as a whole.

18

Representativity

An essential concept is that the taking of a sufficient number of individual samples should provide a collective sample that is representative of all samples that could be taken and thus provides a true reflection of the population.

A representative collective sample should reflect the population not only in terms of the attribute of interest, but also in terms of any incidental factors that affect the attribute of interest.

Representativeness of environmental samples is difficult to demonstrate. Usually, representativeness is considered justified by the procedure used to select the samples.

19

5-step approach

• Define the objectives and questions to be answered.

• Summarize the environmental context for the quantities being measured.

• Identify the population, including spatial and temporal extent.

• Select an appropriate sampling design.

• Document the sampling design and its rationale.

20

Methods

Simple random sampling

With simple random sampling, every sampling unit in the population has, in theory, an equal probability of being included in the sample. The resulting estimator based on such a sample will be unbiased, but it may not be efficient, in either the statistical or practical senses. Simple random sampling designs are easy to describe but may be difficult to achieve in practice.

21

Population of N units, 10 randomly selected

[Figure: grid of numbered population units (labels up to 54 visible) with the 10 randomly selected units marked.]

Random digits: 5, 17, 23, 25, 31, 33, 42, 45, 46, 51

22

Methods

Stratified sampling

The population is divided into strata, each of which is likely to be more homogeneous than the entire population. In other words, the individual strata have characteristics that allow them to be distinguished from the other strata, and such characteristics are known to affect the measured attribute of interest. Some ordinary sampling method (e.g., a simple random sample or systematic sample) is used to estimate the properties of each stratum.

23

Methods

Stratified sampling

Usually, the proportion of sample observations taken in each stratum is similar to the stratum proportion of the population, but this is not a requirement. If good estimates are wanted for rare strata that have a small occurrence frequency in the population, then the number of samples taken from the rare strata can be increased. Stratified sampling is more complex and requires more prior knowledge than simple random sampling, and estimates of the population quantities can be biased if the stratum proportions are incorrectly specified.

24

Methods

Systematic sampling

Systematic sampling is probably the most commonly used method for field sampling. It is generally unbiased as long as the starting point is randomly selected and the systematic rules are followed with care. Line transects and two dimensional grids are specific types of systematic samples that are described in more detail in the spatial section.

25

Methods

Systematic sampling

Systematic sampling is often more practical than random sampling because the procedures are relatively easy to implement in practice, but this approach may miss important features if the quantity being sampled varies with regular periodicity and the sampling scheme has similar periodicity.
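The periodicity problem can be illustrated with a small sketch. The sine-wave "population", its period of 10 units, and the helper name `systematic_mean` are assumptions invented for this example; they are not data from the course.

```python
import math

def systematic_mean(values, k, start):
    """Mean of every k-th value, beginning at index `start`."""
    picked = values[start::k]
    return sum(picked) / len(picked)

# Hypothetical attribute that cycles with a period of 10 units (20 full cycles).
period = 10
values = [math.sin(2 * math.pi * i / period) for i in range(200)]
true_mean = sum(values) / len(values)  # essentially zero

# Sampling interval equal to the period: every sample hits the same phase.
aliased = systematic_mean(values, k=10, start=2)

# Sampling interval that does not match the period: the phases are mixed.
mixed = systematic_mean(values, k=7, start=2)

print(f"true mean {true_mean:.3f}, k=10 gives {aliased:.3f}, k=7 gives {mixed:.3f}")
```

When the sampling interval matches the period, the systematic sample sees only one point of the cycle, so its mean can be far from the population mean however many samples are taken.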

26

Population of N (9x6) units, 9 systematically selected

[Figure: grid of the 54 population units, numbered 1 to 54, with every sixth unit marked as selected.]

Systematic selection: 6, 12, 18, 24, 30, 36, 42, 48, 54

27

So we have sampled, what next?

Analyse the resulting data

Two of the most common sampling objectives are:

• estimation of the mean, or

• estimation of a proportion (e.g., the unknown fraction of a population > a specified value).

We consider how to achieve these under different sampling schemes.

28

Estimate the population mean

Simple random sampling

Every sampling unit in the population is expected to have an equal probability of being included in the sample. The first step requires complete enumeration of the population members. In the simple random-sampling scheme, one generates a set of random digits that are used to objectively identify the individuals to be sampled and measured.

29

Estimate the population mean

The sampling frame

In simple random sampling, one might assume a population of N units (N 100-cm² areas) and select n of these units at random. This typically involves generating n random numbers between 1 and N, which identify the units to sample. If a number is repeated, one simply generates a replacement.
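The selection step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `simple_random_sample`, the seed, and the values N = 54 and n = 10 are assumptions chosen for the example. Python's `random.sample` draws without replacement, which has the same effect as regenerating any repeated number.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Return n distinct unit labels drawn uniformly from 1..N."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    rng = random.Random(seed)
    # random.sample draws without replacement, so no label can repeat.
    return sorted(rng.sample(range(1, N + 1), n))

# Hypothetical sampling frame of 54 enumerated units, 10 to be measured.
units = simple_random_sample(N=54, n=10, seed=1)
print(units)
```

Fixing the seed makes the selection reproducible, which helps when documenting the sampling design.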

30

Sample mean and variance as estimators of the population quantities

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$$

31

Sampling error

$$\mathrm{Var}(\bar{y}) = \frac{s^2}{n}(1 - f)$$

where the sampling fraction f is given by n/N and is usually very small.
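As a worked sketch of these estimators under simple random sampling: the sample mean, the sample variance with the n - 1 divisor, and the variance of the mean with the finite-population factor (1 - f), f = n/N. The function name `srs_mean_estimate` and the four data values are hypothetical.

```python
from statistics import mean, variance

def srs_mean_estimate(y, N):
    """Sample mean, sample variance and estimated variance of the mean
    for a simple random sample y drawn from a population of N units."""
    n = len(y)
    ybar = mean(y)
    s2 = variance(y)           # uses the n - 1 divisor
    f = n / N                  # sampling fraction
    var_ybar = (1 - f) * s2 / n
    return ybar, s2, var_ybar

y = [4.0, 5.0, 6.0, 5.0]       # hypothetical measurements
ybar, s2, var_ybar = srs_mean_estimate(y, N=100)
print(ybar, s2, var_ybar)
```

With f this small the correction changes the variance by only a few percent, which is why it is often ignored in practice.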

32

Stratified random sampling

In stratified sampling, the population is divided into two or more strata that individually are more homogeneous than the entire population, and a sampling method is used to estimate the properties of each stratum. Usually, the proportion of sample observations in each stratum is similar to the stratum proportion in the population.

33

Stratified random sampling

In stratified sampling, the population of N units is first divided into sub-populations of N_1, N_2, ..., N_L units. These sub-populations are non-overlapping and together comprise the whole population. The sub-populations are called strata. They need not have the same number of units, but, to obtain the full benefit of stratification, the sub-population sizes or areas must be known. In stratified sampling, a sample is drawn from each of the strata, the size of each sample ideally in proportion to the population size or area of that stratum.
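Proportional allocation can be sketched as below. The rounding rule (largest fractional remainders receive the leftover units) and the name `proportional_allocation` are assumptions made for this illustration; other rounding rules are equally defensible.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample of n units across strata in proportion to
    the stratum sizes N_l, rounding so the allocations sum to n."""
    N = sum(stratum_sizes)
    shares = [n * N_l / N for N_l in stratum_sizes]   # ideal fractional shares
    alloc = [int(s) for s in shares]                  # round down first
    leftover = n - sum(alloc)
    # Give the remaining units to the strata with the largest remainders.
    order = sorted(range(len(shares)),
                   key=lambda l: shares[l] - alloc[l], reverse=True)
    for l in order[:leftover]:
        alloc[l] += 1
    return alloc

print(proportional_allocation([500, 300, 200], n=20))
print(proportional_allocation([50, 30, 20], n=7))
```

A very small stratum can receive zero units under strict proportionality; if estimates are needed for rare strata, the allocation for those strata would be increased by hand.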

34

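Sample mean and variance estimators of the population quantities

$$A_c = \frac{1}{N}\sum_{l=1}^{L} N_l \bar{y}_l$$

$$\mathrm{Var}(A_c) = \sum_{l=1}^{L} W_l^2 \, \frac{s_l^2}{n_l} \, (1 - f_l)$$

where W_l = N_l/N is the stratum weight and f_l = n_l/N_l is the sampling fraction in stratum l.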
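A minimal sketch of the stratified estimator, assuming the usual form: the population mean is estimated by the weighted average of the stratum means with weights W_l = N_l/N, and its variance by the sum of W_l² s_l²/n_l (1 - f_l) with f_l = n_l/N_l. The function name `stratified_estimate` and the data are hypothetical.

```python
from statistics import mean, variance

def stratified_estimate(samples, stratum_sizes):
    """Stratified estimate of the population mean and its variance.

    samples: one list of measurements per stratum
    stratum_sizes: the known stratum sizes N_l, in the same order
    """
    N = sum(stratum_sizes)
    A_c = 0.0
    var_A_c = 0.0
    for y_l, N_l in zip(samples, stratum_sizes):
        n_l = len(y_l)
        W_l = N_l / N          # stratum weight
        f_l = n_l / N_l        # stratum sampling fraction
        A_c += W_l * mean(y_l)
        var_A_c += W_l ** 2 * variance(y_l) / n_l * (1 - f_l)
    return A_c, var_A_c

# Two hypothetical strata of 60 and 40 units.
samples = [[1.0, 3.0], [10.0, 12.0, 14.0]]
A_c, var_A_c = stratified_estimate(samples, [60, 40])
print(A_c, var_A_c)
```

Each stratum needs at least two observations, otherwise its sample variance is undefined.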

35

Systematic sampling

• Systematic sampling differs from the methods of random sampling in terms of practical implementation and in terms of coverage. Again, assume there are N (= nk) units in the population. To sample n units, the first unit is selected at random from the first k units; subsequent samples are then taken every k units. Systematic sampling has a number of advantages over simple random sampling, not least of which is convenience of collection. A systematic sample is also spread more evenly over the population.
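The selection rule can be sketched as follows, assuming the common convention that the random start is drawn from the first k units and that N is an exact multiple of n. The function name `systematic_sample` and the values N = 54, n = 9 are illustrative assumptions.

```python
import random

def systematic_sample(N, n, seed=None):
    """Select n of N (= n*k) units: random start in 1..k, then every k-th unit."""
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    k = N // n
    rng = random.Random(seed)
    start = rng.randint(1, k)          # randint includes both end points
    return [start + j * k for j in range(n)]

# Hypothetical population of 54 units, 9 to be sampled, so k = 6.
selected = systematic_sample(N=54, n=9, seed=4)
print(selected)
```

Only the starting unit is random; once it is chosen, the whole sample is determined, which is why a single systematic sample gives no direct estimate of sampling variance.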

36

Systematic sampling

• Data from systematic designs are more difficult to analyze, especially in the most common case of a single systematic sample. Consider first the simpler case of multiple systematic samples. For example, xxx in pond sediment could be sampled using transects across the pond from one shoreline to the other. Samples are collected every 5m along the transect. The locations of the transects are randomly chosen. Each transect is a single systematic sample.

37

Systematic sampling

• Each sample is identified by the transect number and the location along the transect. Suppose there are i = 1, ..., t systematic samples (i.e., transects in the pond example) and y_ij is the jth observation on the ith systematic sample, for j = 1, ..., n_i. The average of the samples from the ith transect is calculated.

38

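Population mean and variance estimators

$$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$$

$$\bar{y}_{sy} = \frac{\sum_{i=1}^{t} n_i \bar{y}_i}{\sum_{i=1}^{t} n_i} = \frac{\sum_{i=1}^{t}\sum_{j=1}^{n_i} y_{ij}}{\sum_{i=1}^{t} n_i}$$

$$\mathrm{Var}(\bar{y}_{sy}) = \frac{1 - t/T}{t(t-1)} \sum_{i=1}^{t} (\bar{y}_i - \bar{y}_{sy})^2$$

Here T is taken to be the number of systematic samples (transects) that could be drawn from the population.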
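A sketch of the multiple-systematic-sample estimators, assuming the form in which the overall mean weights each transect mean by its number of observations and the variance is (1 - t/T) / (t(t - 1)) times the sum of squared deviations of the transect means from the overall mean. T, the number of transects that could have been drawn, is an assumption that must be supplied by the user; the function name and data are hypothetical.

```python
from statistics import mean

def multi_systematic_estimate(transects, T):
    """Overall mean and its estimated variance from t systematic samples.

    transects: one list of observations y_ij per transect
    T: assumed number of possible transects in the population
    """
    t = len(transects)
    if t < 2:
        raise ValueError("at least two systematic samples are needed")
    n_i = [len(tr) for tr in transects]
    ybar_i = [mean(tr) for tr in transects]
    ybar_sy = sum(n * yb for n, yb in zip(n_i, ybar_i)) / sum(n_i)
    ss = sum((yb - ybar_sy) ** 2 for yb in ybar_i)
    var_ybar_sy = (1 - t / T) / (t * (t - 1)) * ss
    return ybar_sy, var_ybar_sy

# Three hypothetical transects out of an assumed 30 possible positions.
transects = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
ybar_sy, var_ybar_sy = multi_systematic_estimate(transects, T=30)
print(ybar_sy, var_ybar_sy)
```

The variance is built from differences between transects, so it cannot be computed from a single systematic sample.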

39

• How many samples are needed to ?

40

How many sampling units do you need to collect?

• state the desired limits of precision for the population inference (how precisely does one want to know the average PCB concentration, or, what size of difference is needed to be detected and with what precision?),

• state the inherent population variability of the attribute of interest, and

• derive an equation which relates the number (n) of samples with the desired precision of the parameter estimator and the degree of significance (the chance of being wrong in the inference).

41

Number of samples

What is the power?

Power is a probability: the probability that we correctly conclude that the null hypothesis should be rejected. The null hypothesis would say there is no difference/no effect/no trend.

We want a high power
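To make the idea concrete, here is a sketch of a power calculation for one simple case: a two-sided z-test of a mean with known standard deviation. The choice of test, the function name `power_two_sided_z`, and the numbers (difference 1, standard deviation 2, n = 32) are all assumptions for illustration; a real study would use the test that matches its design.

```python
from statistics import NormalDist

def power_two_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a two-sided z-test to detect a true mean shift `delta`
    with known standard deviation `sigma` and n observations."""
    z = NormalDist()                       # standard Normal
    z_crit = z.inv_cdf(1 - alpha / 2)      # rejection threshold
    shift = delta * n ** 0.5 / sigma       # true shift in standard-error units
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (8, 16, 32, 64):
    print(n, round(power_two_sided_z(delta=1.0, sigma=2.0, n=n), 3))
```

Evaluating the function over a range of n or delta values traces out a power curve: power rises with sample size and with the size of the effect to be detected, and equals the significance level when there is no effect at all.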

42

Power Curves

43

PCB

• Suppose we want to estimate the mean concentration with an estimated standard error (e.s.e.) of 0.1 mg kg⁻¹. The variance of PCB in salmon flesh is 3.19² (so s = 3.19). How many samples would be required? Since the e.s.e. of the sample mean is s/√n, one must solve for n, for example:

44

Sample size: too big

$$n = \left(\frac{s}{\mathrm{e.s.e.}}\right)^2 = \left(\frac{3.19}{0.1}\right)^2 \approx 1018$$

Thus this degree of precision can only be achieved by increasing the number of samples taken to approximately 1000. This may well be impractical; the only solution may then be to accept a lower precision.
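The arithmetic can be checked with a short sketch that solves e.s.e. = s/√n for n. The function name `n_for_standard_error` is an assumption, and the second call, with a relaxed target of 0.2, is a hypothetical illustration of accepting lower precision.

```python
import math

def n_for_standard_error(s, target_se):
    """Smallest n for which s / sqrt(n) is no larger than target_se."""
    return math.ceil((s / target_se) ** 2)

n_strict = n_for_standard_error(s=3.19, target_se=0.1)
n_relaxed = n_for_standard_error(s=3.19, target_se=0.2)   # hypothetical relaxed target
print(n_strict, n_relaxed)
```

Because n grows with the square of s/e.s.e., doubling the acceptable standard error cuts the required number of samples to about a quarter.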

45

Outline

• Spatial sampling
  – Grid, transect and cluster sampling

• See also Spatial modelling section

46

Know what you are setting out to do before you start

• describing a characteristic of interest (usually the average),

• describing the magnitude of variability of a characteristic,

• describing spatial patterns of a characteristic,

• mapping the spatial distribution,

• quantifying contamination above a background or specified intervention level,

• detecting temporal or spatial trends,

• assessing human health or environmental impacts of specific facilities, or of events such as accidental releases,

• assessing compliance with regulations.

47

Spatial sampling

• In ecology, spatial data usually fall into one of two different general cases:

• Case 1: We assume that there is an attribute that is spatially continuous, where in principle it is possible to measure the attribute at any location defined by coordinates (x, y) over the domain or area of interest.

• Case 2: The attribute is not continuous through space; it exists and can be measured only at specific locations (see point processes in spatial session).

48

Random and stratified random sampling

In random sampling, a random sample of locations at which the attribute is to be measured is chosen from the target population of locations. If there is knowledge of different strata over the sampling domain (such as soil type), the use of a stratified sample would be recommended, and a random sample of locations would be selected within each stratum. The data set is then given by the spatial coordinates of each measurement location and the measured value of the attribute at that location.

49

Systematic sampling

Usually, for systematic sampling the region is considered as being overlaid by a grid (rectangular or otherwise), and sampling locations are at gridline intersections at fixed distance apart in each of the two directions. The starting location is expected to be randomly selected.

Both the extent of the grid and the spacing between locations are important. The sampling grid should span the area of interest (the population). If the goal of the study is to describe spatial correlations, the spacing between locations should be shorter than the range of the correlation.
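A sketch of a square sampling grid with a random starting location, laid over a rectangular region. The rectangular region, the square grid, the function name `grid_locations`, and the dimensions used are assumptions for illustration; real study areas are rarely rectangles and the spacing would be chosen from the expected correlation range.

```python
import random

def grid_locations(width, height, spacing, seed=None):
    """Grid intersections covering a width x height region, with the
    grid origin placed at random within the first grid cell."""
    if spacing <= 0 or spacing > min(width, height):
        raise ValueError("spacing must be positive and fit inside the region")
    rng = random.Random(seed)
    x0 = rng.uniform(0, spacing)       # random start in x
    y0 = rng.uniform(0, spacing)       # random start in y
    points = []
    i = 0
    while x0 + i * spacing <= width:
        j = 0
        while y0 + j * spacing <= height:
            points.append((x0 + i * spacing, y0 + j * spacing))
            j += 1
        i += 1
    return points

# Hypothetical 100 m x 60 m field sampled on a 10 m grid.
pts = grid_locations(width=100.0, height=60.0, spacing=10.0, seed=3)
print(len(pts), pts[:3])
```

The random offset is what keeps the design free of selection bias; the spacing controls how fine a spatial pattern the grid can resolve.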

50

Quadrats and transects

• A quadrat is a well-defined area within which one or more samples are taken; it is usually square or rectangular in shape, with fixed dimensions. The position and orientation of the quadrat will be chosen as part of the sampling scheme.

• A line transect is a straight line along which samples are taken, the starting point and orientation of which will be chosen as part of the sampling scheme. In addition, the number of samples to be collected along the transect and their spacing require definition.

51

Stratified sampling chosen

52

Systematic sampling

• Again, assume there are N (= nk) units in the population. To sample n units, the first unit is selected at random from the first k units; subsequent samples are then taken every k units. Systematic sampling has a number of advantages over simple random sampling, not least of which is convenience of collection.

53

Transect sampling chosen

54

Before-after-control-impact (BACI) designs

• One of the most plausible alternative explanations of a change is that the system changed ‘on its own’. That is, the observed change (from the before-event samples to the after-event samples) would have happened even in the absence of the known impact.

• One simple impact-assessment design evaluates this alternative by estimating the change at a control site presumed to be unaffected by the known event. Data are collected at four combinations of sites and times: affected and unaffected sites, each sampled before the impact and after the impact.

55

Before-after-control-impact (BACI) designs

• The impact of the known event is estimated by the interaction between sites and times, i.e., the difference between the change at the impacted site and the change at the control site. The BACI design controls for additive temporal change unrelated to the known event.

• Elaborations on the basic BACI design include using multiple control sites to estimate spatial variability and spatial trends, multiple samples from the impacted area to estimate variability within the impacted area, and very frequent sampling to better characterize the nature of the impact.
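The interaction estimate in the basic design can be sketched as a difference of differences. The function name `baci_effect` and all measurement values are hypothetical.

```python
from statistics import mean

def baci_effect(impact_before, impact_after, control_before, control_after):
    """Change at the impacted site minus change at the control site."""
    change_impact = mean(impact_after) - mean(impact_before)
    change_control = mean(control_after) - mean(control_before)
    return change_impact - change_control

# Hypothetical measurements at the two sites, before and after the event.
effect = baci_effect(
    impact_before=[8.0, 8.2, 7.8],
    impact_after=[10.0, 10.2, 9.8],
    control_before=[7.0, 7.1, 6.9],
    control_after=[7.5, 7.6, 7.4],
)
print(round(effect, 3))
```

Subtracting the change at the control site removes any additive temporal change that both sites share, leaving the part of the change attributable to the event.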

56

Graph of designs

[Figure: Scatterplot of impact, control vs time. x-axis: time (1 to 7); y-axis: Y-Data (6 to 11); series: impact, control.]

Simplest scenario: the impact occurs between time points 3 and 4.

57

Graph of designs

[Figure: Scatterplot of impact, control vs time. x-axis: time (1 to 7); y-axis: Y-Data (6 to 11); series: impact, control.]

The impact occurs between time points 3 and 4, but the impact effect declines with time.

58

Analysis of BACI designs

• BACI: before-after-control-impact

• If observations are made at the same times at the two sites, then we could analyse the differences between the sites (paired design).

• With repeated measurements, we would need to consider a repeated-measures ANOVA approach.
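One way to carry out the paired analysis is sketched below: form the impact-minus-control difference at each sampling time, then compare the mean difference before and after the event. The use of an unequal-variance (Welch-type) t statistic, the function name `paired_baci_t`, and the data are assumptions for illustration; this is not a full repeated-measures analysis and it ignores any serial correlation between time points.

```python
import math
from statistics import mean, variance

def paired_baci_t(impact, control, n_before):
    """t statistic comparing site differences after vs before the event.

    impact, control: measurements taken at the same times at the two sites
    n_before: number of sampling times that precede the event
    """
    d = [a - b for a, b in zip(impact, control)]    # paired differences
    before, after = d[:n_before], d[n_before:]
    se = math.sqrt(variance(before) / len(before) + variance(after) / len(after))
    return (mean(after) - mean(before)) / se

# Hypothetical series of 7 sampling times, event between times 3 and 4.
impact = [8.0, 8.3, 8.1, 10.2, 10.0, 10.4, 10.1]
control = [7.0, 7.2, 7.1, 7.1, 7.0, 7.3, 7.2]
t_stat = paired_baci_t(impact, control, n_before=3)
print(round(t_stat, 2))
```

Working with the differences removes variation that affects both sites at the same time, which is the main benefit of sampling the two sites simultaneously.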

59

Summary

• Sampling and monitoring the environment is carried out for many purposes, including estimation of certain characteristics.

• Many experimental and monitoring programs have multiple objectives that must be clearly specified before the sampling program is designed, because different purposes require different sampling strategies and sampling intensities in order to be efficient, and to permit general inferences.

• Good sampling underpins all our statistical modelling