trim for first users

TRIM Workshop
Arco van Strien
Wildlife statistics, Statistics Netherlands (CBS)

What is TRIM?
• TRends and Indices for Monitoring data
• Computer program for the analysis of time series of count data with missing observations
• Loglinear, Poisson regression (GLM)
• Made for the production of wildlife statistics by Statistics Netherlands (Jeroen Pannekoek / freeware / version 3.0)

Introduction

Why TRIM?
• To get better indices? No, GLM in statistical packages (Splus, Genstat...) may produce similar results
• But statistical packages are often impractical for large datasets
• TRIM is easier to use

The program of this workshop
Aim: a basic understanding of TRIM
• basic theory of imputation
• how to use TRIM to impute missing counts and to assess indices etc.
• basic theory of the weighting procedure to cope with unequal sampling of areas & how to use TRIM to weight particular sites

Site    Year 1  Year 2  Year 3  Year 4  Year 5
1           20      10       8       2       3
2           20      10      12       3       2
3           16       8      10       3       3
4            8       4       6       6       5
5           10       5       7       7       8
Sum         74      37      43      21      21
Index      100      50      58      28      28

INDEX: the total (= sum of all sites) for a year divided by the total of the base year, expressed as a percentage (base year = 100)
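The index definition can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not TRIM itself; the function names are made up for this example, and the counts are copied from the five-site table.

```python
# Counts per site (rows) and year (columns), copied from the example table.
counts = [
    [20, 10, 8, 2, 3],
    [20, 10, 12, 3, 2],
    [16, 8, 10, 3, 3],
    [8, 4, 6, 6, 5],
    [10, 5, 7, 7, 8],
]


def yearly_totals(table):
    """Sum the counts of all sites for each year (column sums)."""
    return [sum(column) for column in zip(*table)]


def indices(totals, base=0):
    """Express each yearly total as a rounded percentage of the base-year total."""
    return [round(100 * total / totals[base]) for total in totals]


totals = yearly_totals(counts)
index = indices(totals)
print(totals)  # [74, 37, 43, 21, 21]
print(index)   # [100, 50, 58, 28, 28]
```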

Missing values affect indices

Site    Year 1  Year 2  Year 3  Year 4  Year 5
1           20      10       8       2       3
2           20      10      12       3       2
3           16       8      10       3       3
4            8       4       6       -       5
5           10       5       7       -       8
Sum         74      37      43     8??      21
Index      100      50      58    11??      28
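A small sketch of why this goes wrong: if the missing counts are simply skipped when summing, the Year 4 total collapses to 8 and the index to 11. The helper name below is hypothetical, and `None` is used here to mark a missing count.

```python
# Same table, with the Year 4 counts of sites 4 and 5 missing (None).
counts = [
    [20, 10, 8, 2, 3],
    [20, 10, 12, 3, 2],
    [16, 8, 10, 3, 3],
    [8, 4, 6, None, 5],
    [10, 5, 7, None, 8],
]


def naive_totals(table):
    """Column sums that silently skip missing counts."""
    return [sum(c for c in column if c is not None) for column in zip(*table)]


totals = naive_totals(counts)
naive_index = [round(100 * t / totals[0]) for t in totals]
print(totals)       # [74, 37, 43, 8, 21]
print(naive_index)  # [100, 50, 58, 11, 28]
```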

Theory imputation

How to impute missing values?

Site    Year 1  Year 2
1            2       4
2            1       ?
Sum          3       ?
Index      100       ?

ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: TWICE THE NUMBER OF YEAR 1 (site & year effect taken into account)
Answer: site 2, year 2 = 2; sum = 6; index = 200

Another example..

Site    Year 1  Year 2
1            1       2
2            3       ?
Sum          4       ?
Index      100       ?

ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: TWICE THE NUMBER OF YEAR 1
Answer: site 2, year 2 = 6; sum = 8; index = 200

And another example ...

Site    Year 1  Year 2
1            1       3
2            3       ?
Sum          4       ?
Index      100       ?

ESTIMATION OF SITE 2 IN YEAR 2? SITE 1 SUGGESTS: THREE TIMES AS MANY AS IN YEAR 1
Answer: site 2, year 2 = 9; sum = 12; index = 300
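The reasoning in these three examples can be written as a short Python sketch. It only covers the two-site, two-year case with a single missing count; the function names are invented for illustration and are not part of TRIM.

```python
def impute_site2_year2(site1_y1, site1_y2, site2_y1):
    """Estimate the missing count as site 2's year-1 count times site 1's change."""
    year_ratio = site1_y2 / site1_y1   # year effect, as seen at site 1
    return site2_y1 * year_ratio       # applied to the level of site 2


def year2_result(site1_y1, site1_y2, site2_y1):
    """Return (imputed count, year-2 sum, year-2 index) for the 2 x 2 example."""
    estimate = impute_site2_year2(site1_y1, site1_y2, site2_y1)
    total_y1 = site1_y1 + site2_y1
    total_y2 = site1_y2 + estimate
    return estimate, total_y2, 100 * total_y2 / total_y1


print(year2_result(2, 4, 1))  # (2.0, 6.0, 200.0)
print(year2_result(1, 2, 3))  # (6.0, 8.0, 200.0)
print(year2_result(1, 3, 3))  # (9.0, 12.0, 300.0)
```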

Try this one...

Site    Year 1  Year 2
1            ?       4
2            1       ?
Sum          ?       ?
Index      100       ?

THERE IS NOT A SINGLE SOLUTION (TRIM will prompt an ERROR)

Difficult to guess missings here..

Site    Year 1  Year 2  Year 3  Year 4  Year 5
1           20      10       8       2       3
2            -       -      12       3       2
3           16       8      10       3       3
4            8       4       6       -       5
5           10       5       7       -       8
Sum          ?       ?      43     8??      21
Index      100      50      58    11??      28

Estimating missing values by an iterative procedure (REQUIRED IN CASE OF MORE THAN A FEW MISSING VALUES)

Site    Year 1  Year 2  Margin
1            2       4       6
2            1       ?       1
Margin       3       4       7

First estimate of site 2, year 2 (site margin x year margin / grand total): 1 x 4/7 = 0.6

Site    Year 1      Year 2      Margin
1            2           4           6
2            1    ? >> 0.6    1 >> 1.6
Margin       3    4 >> 4.6    7 >> 7.6

RECALCULATE THE MARGIN TOTALS AND REPEAT ESTIMATION OF MISSING

2nd estimate of site 2, year 2: 1.6 x 4.6/7.6 = 0.96

REPEAT AGAIN: MISSING VALUE = 1.22, 1.40, 1.54 ETC. ... >> 2

Site    Year 1       Year 2      Margin
1            2            4           6
2            1  0.96 >>>> 2    >>>> 3
Margin       3       >>>> 6    >>>> 9
Index      100          200
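The iteration on these slides can be sketched as a short loop. This is a didactic Python version of the margin-based procedure for the 2 x 2 example only, not the (much faster) algorithm TRIM actually uses; the function name and the stopping tolerance are assumptions. Because the slides round the intermediate values, the printed steps differ slightly in the second decimal, but the limit is the same.

```python
def iterate_missing(site1, site2_y1, tol=1e-9, max_iter=1000):
    """Repeatedly re-estimate the missing count of site 2 in year 2.

    Each step uses: estimate = site margin * year margin / grand total,
    with all margins recalculated from the previous estimate.
    """
    s1_y1, s1_y2 = site1
    estimate = 0.0                       # start with the missing cell empty
    history = []
    for _ in range(max_iter):
        site_margin = site2_y1 + estimate
        year_margin = s1_y2 + estimate
        grand_total = s1_y1 + s1_y2 + site2_y1 + estimate
        new_estimate = site_margin * year_margin / grand_total
        history.append(new_estimate)
        if abs(new_estimate - estimate) < tol:
            break
        estimate = new_estimate
    return new_estimate, history


final, history = iterate_missing(site1=(2, 4), site2_y1=1)
print([round(h, 2) for h in history[:5]])  # first steps, starting at 4/7
print(round(final, 4))                     # 2.0
```

With the imputed value of 2, the year 2 total is 6 and the index is 100 x 6/3 = 200, as on the slide.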

• To get proper indices, it is necessary to estimate (impute) missings
• Missings may be estimated from the margin totals using an iterative procedure (taking into account both site and year effects). (Note: TRIM uses a much faster algorithm to impute missing values.)
• Assumption: year-to-year changes are similar for all sites (assumption will be relaxed later!)
• Test this assumption using a goodness-of-fit test (X2 test)

X2: COMPARE EXPECTED COUNTS WITH REAL COUNTS PER CELL

Counted values, with expected values in parentheses:

Site    Year 1    Year 2    Margin
1       2 (1.8)   4 (4.2)        6
2       1 (1.2)   3 (2.8)        4
Margin  3         7             10

X2 IS SUMMATION OF (COUNTED - EXPECTED VALUE)^2 / EXPECTED VALUE:
(2 - 1.8)^2 / 1.8 + (4 - 4.2)^2 / 4.2 ETC. >> X2 = 0.08 WITH A P-VALUE OF 0.78
>> MODEL NOT REJECTED (FITS, but note: cell values in this example are too small for a proper X2 test)
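The X2 calculation for this 2 x 2 example can be reproduced with a few lines of Python. The sketch assumes one degree of freedom, (2 - 1) x (2 - 1), for which the chi-square tail probability can be written with the complementary error function; the function names are illustrative only.

```python
import math

# Counted values: rows are sites, columns are years.
observed = [[2, 4],
            [1, 3]]


def expected_counts(table):
    """Expected count per cell = row margin * column margin / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum of (counted - expected)^2 / expected over all cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


x2 = chi_square(observed)
# Tail probability of the chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(x2 / 2))
print(round(x2, 2), round(p_value, 2))  # 0.08 0.78
```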

Imputation without covariate (X2 = 18 and p-value = 0.18)

Imputed values in parentheses:

Site    Year 1  Year 2  Year 3  Year 4  Year 5
1           20      10       8       2       3
2           20      10      12       3       2
3           16   (7.5)      10       3       3
4            8       4       6   (2.3)       5
5           10       5       7       7       8
Sum         74      36      43      17      21
Index      100      49      58      23      28

Using a covariate: better imputations & indices, X2 = 1.7, p = 0.99

Site    Year 1      Year 2  Year 3      Year 4  Year 5
1           20          10       8           2       3
2           20          10      12           3       2
3           16  7.5 >> 9.1      10           3       3
4            8           4       6  2.3 >> 5.4       5
5           10           5       7           7       8
Sum         74    36 >> 38      43    17 >> 20      21
Index      100    49 >> 51      58    23 >> 28      28

What is the best model?

Model   X2    df    p-value
1       191   140   0.0026    <<< rejected
2       154   133   0.09      < not rejected
3       161   143   0.14      < not rejected

Both model 2 and 3 are valid

Summary imputation theory
• To get proper indices, it is necessary to impute missings
• Assumption: year-to-year changes are similar for all sites of the same covariate category
• Test assumption using a GOF test; if p-value < 0.05, try better covariates
• If these cannot be found, the resulting indices may be of low quality (and standard errors high). See also the FAQs!


Using TRIM

• several statistical models (time effects, linear model)
• statistical complications (overdispersion, serial correlation) taken into account
• Wald tests to test significance
• model versus imputed indices
• interpretation of slope

Time effects model (skylark data) without covariate

Time effects model with covariate (0 = total, 1 = dunes, 2 = heathland)

Linear trend model (uses trend estimate to impute missing values)

Linear trend model with a changepoint at year 2

Linear trend model with changepoints at year 2 and 3

Linear trend model with all changepoints = time effects model

Use the linear trend model when:
• data are too sparse for the time effects model
• one is interested in testing trends, e.g. trends before and after a particular year (or let TRIM stepwise search for relevant changepoints)

But be careful with simple linear models!

Statistical complications:
• Serial correlation: dependence of counts on those of earlier years (0 = no correlation)
• Overdispersion: deviation from the Poisson distribution (1 = Poisson)

Run TRIM with overdispersion = on and serial correlation = on, else standard errors and statistical tests are usually invalid

Running TRIM features
• TRIM command file
• output: GOF (as X2) test and Wald tests
• output (fitted values, indices)
• indices, time totals
• overall trend slope
• Frequently Asked Questions
• different models (linear trend model, changepoints, covariate)

What is the best model?

Model run                          X2    df    p-value   Akaike's Information Criterion
1, all changepoints                191   140   0.0026    -85
2, all ch. points plus covariate   154   133   0.09      -106
3, two ch. points plus covariate   161   143   0.14      -125

Both 2 and 3 are valid. Model 3 is the most parsimonious model.
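One way to read this comparison, sketched in Python: discard the models rejected by the goodness-of-fit test, then prefer the lowest Akaike value among the remaining ones. The 0.05 threshold comes from the imputation summary; picking the lowest AIC among valid models is a common convention assumed here, not a rule stated on the slide.

```python
# (model run, X2, df, p-value, AIC) as listed in the model comparison.
models = [
    ("1, all changepoints", 191, 140, 0.0026, -85),
    ("2, all changepoints plus covariate", 154, 133, 0.09, -106),
    ("3, two changepoints plus covariate", 161, 143, 0.14, -125),
]

ALPHA = 0.05  # a model is rejected when its GOF p-value is below this

# Keep only the models that the goodness-of-fit test does not reject.
valid = [m for m in models if m[3] >= ALPHA]
# Among those, take the model with the lowest AIC.
best = min(valid, key=lambda m: m[4])

print([m[0] for m in valid])
print(best[0])  # 3, two changepoints plus covariate
```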

Model choice
• The indices depend on the statistical model!
• TRIM allows you to search for the best model using the GOF test, Akaike's Information Criterion and Wald tests
• In case of substantial overdispersion, one has to rely on the Wald tests

Wald tests
Different Wald tests to test for the significance of:
• the trend slope parameters
• changes in the slope
• deviations from a linear trend
• the effect of each covariate

TRIM generates both model indices and imputed indices

Imputed vs model indices
• Imputed indices: summation of real counts plus, for missing counts, model predictions. Closer to real counts (more realistic course in time)
• Model indices: summation of model predictions of all sites. Often more stable
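The difference between the two kinds of indices can be illustrated with a small Python sketch. Both the counts and the model predictions below are hypothetical numbers chosen for illustration; they do not come from the workshop data or from a real TRIM run.

```python
# Hypothetical counts (None = missing) and hypothetical model predictions
# for 3 sites (rows) and 3 years (columns).
counts = [[10, 12, None],
          [5, None, 8],
          [20, 18, 22]]
predicted = [[10.5, 11.0, 12.5],
             [5.2, 5.6, 6.4],
             [19.3, 19.4, 23.1]]


def index(totals):
    """Yearly totals as a percentage of the first (base) year."""
    return [100 * t / totals[0] for t in totals]


def imputed_totals(counts, predicted):
    """Real counts where available, model predictions for the missing cells."""
    filled = [[p if c is None else c for c, p in zip(count_row, pred_row)]
              for count_row, pred_row in zip(counts, predicted)]
    return [sum(col) for col in zip(*filled)]


def model_totals(predicted):
    """Model predictions for every site and year, observed or not."""
    return [sum(col) for col in zip(*predicted)]


imputed_index = index(imputed_totals(counts, predicted))
model_index = index(model_totals(predicted))
print([round(i, 1) for i in imputed_index])
print([round(i, 1) for i in model_index])
```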

Usually model and imputed indices hardly differ!

TRIM computes both additive and multiplicative slopes

Additive   s.e.     Multiplicative   s.e.
0.0485     0.0124   1.0497           0.0130

Relation: ln(1.0497) = 0.0485

Multiplicative parameters are easier to understand

Interpretation multiplicative slope
Slope of 1.05 means 5% increase a year

Standard error of 0.013 means a confidence interval of about ± 2 x 0.013 = ± 0.026
Thus, slope between 1.024 and 1.076

Or, 2% to 8% increase a year = significantly different from 1
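The relation between the two slopes and the rough confidence interval can be checked with a few lines of Python. The conversion of the standard error uses the delta-method approximation (multiplicative s.e. = multiplicative slope x additive s.e.), which is an assumption made here and not stated on the slides; the interval uses the same "2 x s.e." rule of thumb as above.

```python
import math

additive_slope, additive_se = 0.0485, 0.0124

# The multiplicative slope is the exponential of the additive slope.
multiplicative_slope = math.exp(additive_slope)
# Delta-method approximation for the standard error (an assumption here).
multiplicative_se = multiplicative_slope * additive_se

lower = multiplicative_slope - 2 * multiplicative_se
upper = multiplicative_slope + 2 * multiplicative_se
# The trend is significant when the interval does not contain 1.
significant = lower > 1 or upper < 1

print(round(multiplicative_slope, 4), round(multiplicative_se, 4))  # 1.0497 0.013
print(round(lower, 3), round(upper, 3))                             # 1.024 1.076
print(significant)                                                  # True
```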

Summary use of TRIM:
• choice between time effects and linear trend model
• include overdispersion & serial correlation in models
• use GOF and Wald tests for better models and indices & to test hypotheses
• choice between model and imputed indices
• use multiplicative slope


Weighting

Unequal sampling due to
• stratified random site selection, with oversampling of particular strata. Weighting results in unbiased national indices
• site selection by the free choice of observers, with oversampling of particular regions & attractive habitat types. Weighting reduces the bias of indices.

To cope with unequal sampling:
• stratify the data, e.g. into regions and habitat types
• strata are expected to have different indices & trends
• weight strata according to (1) the number of sample sites in the stratum and (2) the area surface of the stratum
• or weight by population size per stratum

Weighting factor for each stratum

Weighting factor for stratum i = total area of i / area of i sampled

Stratum   Total area   Area sampled        Weight factor
i         50           5 (undersampled)    2 (or 10)
k         50           10 (oversampled)    1 (or 5)

Only the ratio between the weights matters, so 10 and 5 are equivalent to 2 and 1.

Another example ..

Weighting factor for stratum i = total area of i / area of i sampled

Stratum   Total area   Area sampled        Weight factor
i         100          5 (undersampled)    100/5 = 20 (or 4)
k         50           10 (oversampled)    50/10 = 5 (or 1)
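Both weighting examples follow from the same formula, which can be sketched in Python as below. The function names are illustrative; the relative weights are obtained by dividing by the smallest weight factor, which reproduces the "or" values given in the examples.

```python
def weight_factor(total_area, area_sampled):
    """Weighting factor for a stratum = total area / area sampled."""
    return total_area / area_sampled


def relative_weights(factors):
    """Rescale the factors so that the smallest one equals 1."""
    smallest = min(factors.values())
    return {stratum: f / smallest for stratum, f in factors.items()}


example_1 = {"i": weight_factor(50, 5), "k": weight_factor(50, 10)}
example_2 = {"i": weight_factor(100, 5), "k": weight_factor(50, 10)}

print(example_1, relative_weights(example_1))
# {'i': 10.0, 'k': 5.0} {'i': 2.0, 'k': 1.0}
print(example_2, relative_weights(example_2))
# {'i': 20.0, 'k': 5.0} {'i': 4.0, 'k': 1.0}
```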

Weighting in TRIM
• include weight factor (different per stratum) in data file for each site and year record
• weight strata and combine the results to produce a weighted total (= run TRIM with weighting = on and covariate = on)

Indices for Skylark unweighted (0 = total index, 1 = dunes, 2 = heathland)

Indices for Skylark with weight factor for each dune site = 10 (0 = total index, 1 = dunes, 2 = heathland)

Final remarks

To facilitate the calculation of many indices on a routine basis
• TRIM in batch mode, using TRIM Command Language (see manual)
• Option to incorporate TRIM in your own automation system (Access or Delphi or so) (not in manual)

That’s all, but:
• if you have any questions about TRIM, see the manual, the FAQs in TRIM or mail Arco van Strien

Good luck!
