TRIM for first users
TRANSCRIPT
TRIM Workshop
Arco van Strien
Wildlife statistics, Statistics Netherlands (CBS)
What is TRIM?
• TRends and Indices for Monitoring data
• Computer program for the analysis of time series of count data with missing observations
• Loglinear, Poisson regression (GLM)
• Made for the production of wildlife statistics by Statistics Netherlands (Jeroen Pannekoek / freeware / version 3.0)
Introduction
Why TRIM?
• To get better indices? No, GLM in statistical packages (Splus, Genstat...) may produce similar results
• But statistical packages are often impractical for large datasets
• TRIM is easier to use
The program of this workshop
Aim: a basic understanding of TRIM
• basic theory of imputation
• how to use TRIM to impute missing counts and to assess indices etc.
• basic theory of weighting procedure to cope with unequal sampling of areas & how to use TRIM to weight particular sites
Site    Year 1  Year 2  Year 3  Year 4  Year 5
1         20      10       8       2       3
2         20      10      12       3       2
3         16       8      10       3       3
4          8       4       6       6       5
5         10       5       7       7       8
Sum       74      37      43      21      21
Index    100      50      58      28      28

INDEX: the total (= sum of all sites) for a year divided by the total of the base year (x 100, so that the base year has index 100)
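The index definition can be checked against the complete table with a few lines of Python. This is only an illustrative sketch; the variable names are made up for the example, and the counts are those of the five sites shown in the table.

```python
# Counts per site for Year 1..Year 5, copied from the complete example table.
counts = {
    1: [20, 10, 8, 2, 3],
    2: [20, 10, 12, 3, 2],
    3: [16, 8, 10, 3, 3],
    4: [8, 4, 6, 6, 5],
    5: [10, 5, 7, 7, 8],
}

# Year total = sum over all sites for that year.
year_totals = [sum(column) for column in zip(*counts.values())]

# Index = year total divided by the base-year (Year 1) total, times 100.
base_total = year_totals[0]
indices = [round(100 * total / base_total) for total in year_totals]

print("Sum:  ", year_totals)
print("Index:", indices)
```

The printed sums and indices should match the Sum and Index rows of the table.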
Missing values affect indices

Site    Year 1  Year 2  Year 3  Year 4  Year 5
1         20      10       8       2       3
2         20      10      12       3       2
3         16       8      10       3       3
4          8       4       6       -       5
5         10       5       7       -       8
Sum       74      37      43      8??     21
Index    100      50      58     11??     28
Theory imputation
How to impute missing values?

Site    Year 1  Year 2
1          2       4
2          1       ?
Sum        3       ?
Index    100       ?

ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: TWICE THE NUMBER OF YEAR 1 (site & year effect taken into account)
Answer: site 2 in year 2 = 2, sum of year 2 = 6, index of year 2 = 200
Another example..

Site    Year 1  Year 2
1          1       2
2          3       ?
Sum        4       ?
Index    100       ?

ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: TWICE THE NUMBER OF YEAR 1
Answer: site 2 in year 2 = 6, sum of year 2 = 8, index of year 2 = 200
And another example ...

Site    Year 1  Year 2
1          1       3
2          3       ?
Sum        4       ?
Index    100       ?

ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: THREE TIMES AS MANY AS IN YEAR 1
Answer: site 2 in year 2 = 9, sum of year 2 = 12, index of year 2 = 300
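The reasoning in the two-site examples (use the year-to-year change at site 1 to fill in site 2) can be written out as a short Python sketch. The function name and argument names are invented for this illustration; this is the hand calculation from the slides, not TRIM's own algorithm.

```python
def impute_site2_year2(site1_year1, site1_year2, site2_year1):
    """Scale site 2's Year 1 count by the change observed at site 1."""
    year_effect = site1_year2 / site1_year1
    return site2_year1 * year_effect

# The three examples: (site 1 Year 1, site 1 Year 2, site 2 Year 1).
examples = [(2, 4, 1), (1, 2, 3), (1, 3, 3)]

results = []
for site1_y1, site1_y2, site2_y1 in examples:
    estimate = impute_site2_year2(site1_y1, site1_y2, site2_y1)
    sum_year1 = site1_y1 + site2_y1
    sum_year2 = site1_y2 + estimate
    index_year2 = 100 * sum_year2 / sum_year1
    results.append((estimate, sum_year2, index_year2))
    print(f"estimate = {estimate:g}, sum = {sum_year2:g}, index = {index_year2:g}")
```

Each printed line corresponds to one example: the imputed count for site 2, the Year 2 sum, and the Year 2 index.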
Try this one...

Site    Year 1  Year 2
1          ?       4
2          1       ?
Sum        ?       ?
Index    100       ?

THERE IS NOT A SINGLE SOLUTION (TRIM will prompt an ERROR)
Difficult to guess the missing values here..

Site    Year 1  Year 2  Year 3  Year 4  Year 5
1         20      10       8       2       3
2          -       -      12       3       2
3         16       8      10       3       3
4          8       4       6       -       5
5         10       5       7       -       8
Sum        ?       ?      43      8??     21
Index    100      50      58     11??     28
Estimating missing values by an iterative procedure (REQUIRED IN CASE OF MORE THAN A FEW MISSING VALUES)

Site    Year 1  Year 2  Margin
1          2       4       6
2          1       ?       1
Margin     3       4       7
First estimate of site 2, year 2: 1 x 4/7 = 0.6

Site    Year 1  Year 2       Margin
1          2       4            6
2          1       ? >> 0.6     1 >> 1.6
Margin     3       4 >> 4.6     7 >> 7.6

RECALCULATE THE MARGIN TOTALS AND REPEAT ESTIMATION OF MISSING
2nd estimate of site 2, year 2: 1.6 x 4.6/7.6 = 0.96
REPEAT AGAIN: MISSING VALUE = 1.22, 1.40, 1.54 ETC. ... >> 2

Site    Year 1  Year 2         Margin
1          2       4              6
2          1       0.96 >>>> 2    >>>> 3
Margin     3       >>>> 6         >>>> 9
Index    100     200
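The iterative procedure for the 2 x 2 example can be sketched in Python as follows. The function name, tolerance and iteration cap are assumptions made for this illustration. It mirrors the hand calculation on the slides (estimate = row margin x column margin / grand total, then recalculate the margins) and is not the faster algorithm that TRIM itself uses.

```python
def impute_iteratively(y11, y12, y21, tolerance=1e-9, max_iterations=10_000):
    """Impute the missing cell (site 2, year 2) of a 2 x 2 table of counts.

    Returns the final estimate and the list of successive estimates.
    """
    estimate = 0.0  # the missing cell contributes nothing at the start
    history = []
    for _ in range(max_iterations):
        site2_margin = y21 + estimate
        year2_margin = y12 + estimate
        grand_total = y11 + y12 + y21 + estimate
        new_estimate = site2_margin * year2_margin / grand_total
        history.append(new_estimate)
        if abs(new_estimate - estimate) < tolerance:
            break
        estimate = new_estimate
    return new_estimate, history

final_estimate, history = impute_iteratively(y11=2, y12=4, y21=1)
print("first estimates:", [round(value, 2) for value in history[:5]])
print("converged value:", round(final_estimate, 4))
```

The first estimate is 1 x 4/7, as on the slide. Later estimates differ slightly from the slide values in the second decimal because the slides round each intermediate result, but the sequence approaches the same limit of 2.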
• To get proper indices, it is necessary to estimate (impute) missings
• Missings may be estimated from the margin totals using an iterative procedure (taking into account both site and year effects). Note: TRIM uses a much faster algorithm to impute missing values.
• Assumption: year-to-year changes are similar for all sites (assumption will be relaxed later!)
• Test this assumption using a goodness-of-fit test (X2 test)
X2: COMPARE EXPECTED COUNTS WITH REAL COUNTS PER CELL (expected counts in parentheses)

Site    Year 1     Year 2     Margin
1       2 (1.8)    4 (4.2)       6
2       1 (1.2)    3 (2.8)       4
Margin  3          7            10

X2 IS THE SUMMATION OF (COUNTED - EXPECTED VALUE)^2 / EXPECTED VALUE
(2-1.8)^2/1.8 + (4-4.2)^2/4.2 ETC. >> X2 = 0.08 WITH A P-VALUE OF 0.78 >> MODEL NOT REJECTED (FITS, but note: cell values in this example are too small for a proper X2 test)
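The X2 calculation for the 2 x 2 example can be reproduced with the Python sketch below. Expected counts are computed from the margins (row total x column total / grand total). The p-value line uses a closed form that holds only for 1 degree of freedom, which is the case for a 2 x 2 table; for larger tables a statistics library would be needed. Variable names are illustrative.

```python
import math

# Observed counts: rows are sites, columns are years.
observed = [[2, 4],
            [1, 3]]

site_margins = [sum(row) for row in observed]
year_margins = [sum(column) for column in zip(*observed)]
grand_total = sum(site_margins)

# Expected count per cell under the "same year-to-year change at all sites" model.
expected = [[site * year / grand_total for year in year_margins]
            for site in site_margins]

# X2 = sum over cells of (counted - expected)^2 / expected.
x2 = sum((o - e) ** 2 / e
         for observed_row, expected_row in zip(observed, expected)
         for o, e in zip(observed_row, expected_row))

# Upper-tail chi-square probability; this formula is valid for df = 1 only.
p_value = math.erfc(math.sqrt(x2 / 2))

print("expected:", [[round(e, 1) for e in row] for row in expected])
print("X2 =", round(x2, 2), " p-value =", round(p_value, 2))
```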
Imputation without covariate (X2 = 18 and p-value = 0.18); imputed values in parentheses

Site    Year 1  Year 2  Year 3  Year 4  Year 5
1         20      10       8       2       3
2         20      10      12       3       2
3         16     (7.5)    10       3       3
4          8       4       6     (2.3)     5
5         10       5       7       7       8
Sum       74      36      43      17      21
Index    100      49      58      23      28
Using a covariate: better imputations & indices, X2 = 1.7, p = 0.99

Site    Year 1  Year 2      Year 3  Year 4      Year 5
1         20    10             8    2              3
2         20    10            12    3              2
3         16    7.5 >> 9.1    10    3              3
4          8    4              6    2.3 >> 5.4     5
5         10    5              7    7              8
Sum       74    36 >> 38      43    17 >> 20      21
Index    100    49 >> 51      58    23 >> 28      28
What is the best model?

Model   X2    df    p-value
1       191   140   0.0026   <<< rejected
2       154   133   0.09     < not rejected
3       161   143   0.14     < not rejected

Both model 2 and 3 are valid
Summary imputation theory
• To get proper indices, it is necessary to impute missings
• Assumption: year-to-year changes are similar for all sites of the same covariate category
• Test assumption using a GOF test; if p-value < 0.05, try better covariates
• If these cannot be found, the resulting indices may be of low quality (and standard errors high). See also the FAQs!
The program of this workshop
Aim: a basic understanding of TRIM
• basic theory of imputation
• how to use TRIM to impute missing counts and to assess indices etc.
• basic theory of weighting procedure to cope with unequal sampling of areas & how to use TRIM to weight particular sites
Using TRIM
• several statistical models (time effects, linear model)
• statistical complications (overdispersion, serial correlation) taken into account
• Wald tests to test significance
• model versus imputed indices
• interpretation of slope
Time effects model (skylark data) without covariate
Time effects model with covariate (0 = total, 1 = dunes, 2 = heathland)
Linear trend model (uses trend estimate to impute missing values)
Linear trend model with a changepoint at year 2
Linear trend model with changepoints at years 2 and 3
Linear trend model with all changepoints = time effects model
Use the linear trend model when:
• data are too sparse for the time effects model
• one is interested in testing trends, e.g. trends before and after a particular year (or let TRIM stepwise search for relevant changepoints)
But be careful with simple linear models!
Statistical complications:
• Serial correlation: dependence of counts on the counts of earlier years (0 = no correlation)
• Overdispersion: deviation from Poisson distribution (1 = Poisson)
Run TRIM with overdispersion = on and serial correlation = on, else standard errors and statistical tests are usually invalid
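To make "1 = Poisson" concrete: a common way to quantify overdispersion in Poisson regression is the Pearson X2 statistic divided by the degrees of freedom. The Python sketch below uses that estimator on made-up counts. Whether TRIM computes its overdispersion value in exactly this way is an assumption here, and the function name and data are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_parameters):
    """Pearson X2 divided by degrees of freedom (about 1 for Poisson data)."""
    x2 = sum((o - f) ** 2 / f for o, f in zip(observed, fitted))
    degrees_of_freedom = len(observed) - n_parameters
    return x2 / degrees_of_freedom

# Hypothetical counts, fitted by a model with a single parameter (the mean).
observed = [12, 8, 15, 5, 10]
mean_count = sum(observed) / len(observed)
fitted = [mean_count] * len(observed)

dispersion = pearson_dispersion(observed, fitted, n_parameters=1)
print("overdispersion estimate:", round(dispersion, 2))
```

A value well above 1 indicates more variation than the Poisson distribution allows, which is why standard errors computed without this correction tend to be too small.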
Running TRIM features
• TRIM command file
• output: GOF (as X2) test and Wald tests
• output (fitted values, indices)
• indices, time totals
• overall trend slope
• Frequently Asked Questions
• different models (linear trend model, changepoints, covariate)
Model run                             X2    df    p-value   Akaike's Information Criterion
1, all changepoints                   191   140   0.0026     -85
2, all changepoints plus covariate    154   133   0.09      -106
3, two changepoints plus covariate    161   143   0.14      -125
What is the best model?
Both 2 and 3 are valid. Model 3 is the most parsimonious model.

Model choice
• The indices depend on the statistical model!
• TRIM allows you to search for the best model using the GOF test, Akaike's Information Criterion and Wald tests
• In case of substantial overdispersion, one has to rely on the Wald tests
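The choice among the three model runs in this section can be written as a small selection rule: keep the models that the GOF test does not reject, then prefer the lowest Akaike value. The Python sketch below applies that rule to the figures from the model run table. The 0.05 threshold comes from the imputation summary; the data structure and names are illustrative, and the rule is not suitable when overdispersion is substantial (then the Wald tests are needed).

```python
# Model runs from the table: X2, df, p-value and Akaike's Information Criterion.
model_runs = [
    {"model": "1, all changepoints", "x2": 191, "df": 140, "p": 0.0026, "aic": -85},
    {"model": "2, all changepoints plus covariate", "x2": 154, "df": 133, "p": 0.09, "aic": -106},
    {"model": "3, two changepoints plus covariate", "x2": 161, "df": 143, "p": 0.14, "aic": -125},
]

SIGNIFICANCE_LEVEL = 0.05

# A model is rejected when the GOF p-value falls below the significance level.
valid_runs = [run for run in model_runs if run["p"] >= SIGNIFICANCE_LEVEL]

# Among the valid models, a lower Akaike value indicates a better trade-off
# between fit and number of parameters.
best_run = min(valid_runs, key=lambda run: run["aic"])

print("valid models:", [run["model"] for run in valid_runs])
print("preferred model:", best_run["model"])
```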
Wald tests
Different Wald tests to test for the significance of:
• the trend slope parameters
• changes in the slope
• deviations from a linear trend
• the effect of each covariate
TRIM generates both model indices and imputed indices

Imputed vs model indices
• Imputed indices: summation of real counts plus, for missing counts, model predictions. Closer to real counts (more realistic course in time)
• Model indices: summation of model predictions of all sites. Often more stable
Usually model and imputed indices hardly differ!
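The difference between the two kinds of indices can be shown with a small Python sketch. All numbers below are hypothetical: the observed counts contain one missing value (None), and the model predictions are invented stand-ins for what a fitted model would return. Only the way the two totals are assembled follows the definitions above.

```python
# Hypothetical observed counts (rows = sites, columns = years); None = missing.
observed = [[20, 10, 8],
            [16, None, 10],
            [8, 4, 6]]

# Hypothetical model predictions for every site and year.
predicted = [[19.0, 9.6, 9.4],
             [17.0, 8.5, 8.5],
             [8.0, 4.0, 6.0]]

def year_totals(table):
    return [sum(column) for column in zip(*table)]

def to_index(totals):
    return [100 * total / totals[0] for total in totals]

# Imputed totals: real counts, with model predictions only where counts are missing.
imputed_table = [[p if o is None else o for o, p in zip(observed_row, predicted_row)]
                 for observed_row, predicted_row in zip(observed, predicted)]
imputed_totals = year_totals(imputed_table)

# Model totals: model predictions for all sites and years.
model_totals = year_totals(predicted)

print("imputed indices:", [round(i, 1) for i in to_index(imputed_totals)])
print("model indices:  ", [round(i, 1) for i in to_index(model_totals)])
```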
TRIM computes both additive and multiplicative slopes

                 Slope    s.e.
Additive         0.0485   0.0124
Multiplicative   1.0497   0.0130

Relation: ln(1.0497) = 0.0485
Multiplicative parameters are easier to understand
Interpretation multiplicative slope
Slope of 1.05 means 5% increase a year
Standard error of 0.013 means a confidence interval of the slope plus or minus 2 x 0.013 = 0.026. Thus, slope between 1.024 and 1.076
Or, 2% to 8% increase a year = significantly different from 1
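The relation between the additive and multiplicative slope, and the confidence interval, can be checked with the Python sketch below. The standard error of the multiplicative slope is obtained here with the delta-method approximation (multiplicative slope x additive standard error), which reproduces the 0.0130 on the slide; that TRIM derives it this way is an assumption. The factor 2 follows the slide.

```python
import math

additive_slope = 0.0485
additive_se = 0.0124

# Multiplicative slope is the exponential of the additive slope.
multiplicative_slope = math.exp(additive_slope)

# Delta-method approximation of the standard error (assumed, see lead-in).
multiplicative_se = multiplicative_slope * additive_se

# Approximate confidence interval: slope plus or minus 2 standard errors.
lower = multiplicative_slope - 2 * multiplicative_se
upper = multiplicative_slope + 2 * multiplicative_se

print("multiplicative slope:", round(multiplicative_slope, 4))
print("standard error:      ", round(multiplicative_se, 4))
print("interval:            ", round(lower, 3), "to", round(upper, 3))
print("significant change:  ", lower > 1 or upper < 1)
```

An interval that excludes 1 means the yearly change differs significantly from "no change".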
Summary use of TRIM:
• choice between time effects and linear trend model
• include overdispersion & serial correlation in models
• use GOF and Wald tests for better models and indices & to test hypotheses
• choice between model and imputed indices
• use multiplicative slope
The program of this workshop
Aim: a basic understanding of TRIM
• basic theory of imputation
• how to use TRIM to impute missing counts and to assess indices etc.
• basic theory of weighting procedure to cope with unequal sampling of areas & how to use TRIM to weight particular sites
Weighting
Unequal sampling due to
• stratified random site selection, with oversampling of particular strata. Weighting results in unbiased national indices
• site selection by the free choice of observers, with oversampling of particular regions & attractive habitat types. Weighting reduces the bias of indices.
To cope with unequal sampling:
• stratify the data, e.g. into regions and habitat types
• strata are expected to have different indices & trends
• weight strata according to (1) the number of sample sites in the stratum and (2) the area of the stratum
• or weight by population size per stratum
Weighting factor for each stratum

Stratum   Total area   Area sampled        Weight factor
i         50           5 (undersampled)    2 (or 10)
k         50           10 (oversampled)    1 (or 5)

Weighting factor for stratum i = total area of i / area of i sampled
Another example ..

Stratum   Total area   Area sampled        Weight factor
i         100          5 (undersampled)    100/5 = 20 (or 4)
k         50           10 (oversampled)    50/10 = 5 (or 1)

Weighting factor for stratum i = total area of i / area of i sampled
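Both weighting tables follow from the same formula, and the "or" values are the same weights rescaled so that the smallest weight equals 1. A short Python sketch, with illustrative function and variable names:

```python
def weight_factors(strata):
    """Return raw weights (total area / area sampled) and relative weights.

    strata maps a stratum name to (total area, area sampled).
    Relative weights are the raw weights divided by the smallest raw weight.
    """
    raw = {name: total / sampled for name, (total, sampled) in strata.items()}
    smallest = min(raw.values())
    relative = {name: weight / smallest for name, weight in raw.items()}
    return raw, relative

first_example = {"i": (50, 5), "k": (50, 10)}
second_example = {"i": (100, 5), "k": (50, 10)}

for strata in (first_example, second_example):
    raw, relative = weight_factors(strata)
    print("raw:", raw, " relative:", relative)
```

Only the ratio between the weights matters for an index, which is why either set of values can be used.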
Weighting in TRIM
• include weight factor (different per stratum) in data file for each site and year record
• weight strata and combine the results to produce a weighted total (= run TRIM with weighting = on and covariate = on)
Indices for Skylark unweighted (0 = total index, 1 = dunes, 2 = heathland)
Indices for Skylark with weight factor for each dune site = 10 (0 = total index, 1 = dunes, 2 = heathland)
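How a weight factor changes the total index can be illustrated with a small Python sketch. The site records below are entirely hypothetical (they are not the Skylark data), with a weight of 10 for the dune site and 1 for the heathland sites. The weighted total is taken as the sum of weight x count over sites, which is an assumption about how the weights enter the total.

```python
# Hypothetical records: (site, stratum, weight factor, counts for three years).
records = [
    ("dune_a", "dunes", 10, [4, 6, 8]),
    ("heath_a", "heathland", 1, [20, 10, 10]),
    ("heath_b", "heathland", 1, [20, 14, 10]),
]

def yearly_totals(records, use_weights):
    """Sum the counts per year, optionally multiplied by each site's weight."""
    n_years = len(records[0][3])
    totals = [0] * n_years
    for _site, _stratum, weight, counts in records:
        factor = weight if use_weights else 1
        for year, count in enumerate(counts):
            totals[year] += factor * count
    return totals

def to_index(totals):
    return [round(100 * total / totals[0]) for total in totals]

unweighted_totals = yearly_totals(records, use_weights=False)
weighted_totals = yearly_totals(records, use_weights=True)

print("unweighted index:", to_index(unweighted_totals))
print("weighted index:  ", to_index(weighted_totals))
```

In this made-up case the dune site increases while the heathland sites decline, so giving the undersampled dunes more weight turns a declining total index into an increasing one.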
Final remarks
To facilitate the calculation of many indices on a routine basis
• TRIM in batch mode, using TRIM Command Language (see manual)
• Option to incorporate TRIM in your own automation system (Access or Delphi or so) (not in manual)

That's all, but:
• if you have any questions about TRIM, see the manual, the FAQs in TRIM or mail Arco van Strien [email protected]

Good luck!