1 standardization of variables maarten buis 5-12-2005


Upload: clarence-lambert

Post on 15-Jan-2016


Page 1: 1 Standardization of variables Maarten Buis 5-12-2005


Standardization of variables

Maarten Buis

5-12-2005


Recap

• Central tendency

• Dispersion

• SPSS


Standardization

• Is used to improve interpretability of variables.

• Some variables have a natural interpretable metric: e.g. income, age, gender, country.

• Others, primarily ordinal variables, do not: e.g. education, attitude items, intelligence.

• Standardizing these variables makes them more interpretable.


Standardization

• Transforming the variable to a comparable metric

– known unit

– known mean

– known standard deviation

– known range

• Three ways of standardizing:

– P-standardization (percentile scores)

– Z-standardization (z-scores)

– D-standardization (dichotomize a variable)


When you should always standardize

• When averaging multiple variables, e.g. when creating a socioeconomic status variable out of income and education.

• When comparing the effects of variables with unequal units, e.g. does age or education have a larger effect on income?
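
The first case above can be sketched in a few lines of Python. The income and education values are hypothetical (they are not taken from the course data), and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return z-scores: subtract the mean, then divide by the standard deviation."""
    m = mean(values)
    sd = pstdev(values)  # population SD (assumption; a sample SD works the same way)
    return [(v - m) / sd for v in values]

# Hypothetical data for five respondents (illustrative only).
income = [1200, 1800, 2500, 3100, 5400]   # guilders per month
education = [8, 10, 12, 14, 16]           # years of schooling

z_income = z_scores(income)
z_education = z_scores(education)

# Averaging the raw values would let income dominate because its unit is
# much larger. After standardizing, both variables are in SD units.
ses = [(zi + ze) / 2 for zi, ze in zip(z_income, z_education)]

for person, score in enumerate(ses, start=1):
    print(f"respondent {person}: SES = {score:+.2f}")
```

Because both components have mean 0, the resulting SES score also has mean 0, so positive scores are above average and negative scores below.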


P-Standardization

• Every observation is assigned a number between 0 and 100, indicating the percentage of observations below it.

• Can be read from the cumulative distribution

• In case of ties (knots): assign midpoints

• The median, quartiles, quintiles, and deciles are special cases of P-scores.


           rent    cum %   percentile
room 1      175     5.3%      5.3%
room 2      180    10.5%     10.5%
room 3      185    15.8%     15.8%
room 4      190    21.1%     21.1%
room 5      200    26.3%     26.3%
room 6      210    31.6%     36.8%
room 7      210    36.8%     36.8%
room 8      210    42.1%     36.8%
room 9      230    47.4%     47.4%
room 10     240    52.6%     55.3%
room 11     240    57.9%     55.3%
room 12     250    63.2%     65.8%
room 13     250    68.4%     65.8%
room 14     280    73.7%     73.7%
room 15     300    78.9%     81.6%
room 16     300    84.2%     81.6%
room 17     310    89.5%     89.5%
room 18     325    94.7%     94.7%
room 19     620   100.0%    100.0%
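
The percentile column for the 19 rents can be reproduced with a short Python sketch. It follows the convention the rent figures appear to use (cumulative percentage up to and including the observation, with tied observations sharing the midpoint of their cumulative percentages); the function name `percentile_scores` is made up for this illustration.

```python
def percentile_scores(values):
    """Cumulative percentage per observation; tied values share the midpoint."""
    n = len(values)
    ordered = sorted(values)
    score_for = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # 1-based rank of the first tied observation
        last = first + ordered.count(v) - 1   # 1-based rank of the last tied observation
        midpoint = (first + last) / 2
        score_for[v] = 100 * midpoint / n
    return [score_for[v] for v in values]

# Rents of the 19 rooms, in guilders.
rents = [175, 180, 185, 190, 200, 210, 210, 210, 230, 240,
         240, 250, 250, 280, 300, 300, 310, 325, 620]

scores = percentile_scores(rents)
for room, (rent, p) in enumerate(zip(rents, scores), start=1):
    print(f"room {room:2d}  rent {rent:3d}  percentile {p:5.1f}%")
```

Rooms 6, 7, and 8 all pay 210, so they occupy ranks 6 to 8 and share the midpoint rank 7, which gives 7/19 = 36.8%.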


P-standardization

• Turns the variable into a ranking, i.e. it turns the variable into an ordinal variable.

• It is a non-linear transformation: relative distances change

• Results in a fixed mean, range, and standard deviation: M=50, SD=28.6. These values can change slightly due to ties (knots).

• A histogram of a P-standardized variable approximates a uniform distribution


Linear transformation

• Say you want income in thousands of guilders instead of guilders.

• You divide INCMID by ƒ1000,-

              M          SD
Incmid        ƒ2543,-    ƒ1481,-
Incmid/1000   kƒ2.543    kƒ1.481


Linear transformation

• Say you want to know the deviation from the mean

• Subtract the mean (ƒ2543,-) from INCMID

              M          SD
Incmid        ƒ2543,-    ƒ1481,-
Incmid-M      ƒ0,-       ƒ1481,-


Recap: multiplication and addition and the number line


Linear transformation

• Adding a constant (X’ = X+c)

– M(X’) = M(X)+c

– SD(X’) = SD(X)

• Multiplying by a constant (X’ = X*c)

– M(X’) = M(X)*c

– SD(X’) = SD(X) * |c|
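
These two rules can be checked numerically. The sketch below uses hypothetical incomes (not the INCMID variable) and a negative multiplier, chosen to show why the rule for the standard deviation needs |c| rather than c.

```python
from math import isclose
from statistics import mean, pstdev

# Hypothetical monthly incomes in guilders (illustrative values only).
income = [900, 1500, 2100, 2600, 3400, 4800]

def describe(values):
    """Return (mean, population standard deviation)."""
    return mean(values), pstdev(values)

m, sd = describe(income)

# Adding a constant shifts the mean by c and leaves the SD unchanged.
c = -m
m_add, sd_add = describe([x + c for x in income])

# Multiplying by a constant scales the mean by k and the SD by |k|.
k = -0.001
m_mul, sd_mul = describe([x * k for x in income])

print(f"original:    M = {m:9.3f}  SD = {sd:9.3f}")
print(f"X + ({c}):  M = {m_add:9.3f}  SD = {sd_add:9.3f}")
print(f"X * ({k}): M = {m_mul:9.3f}  SD = {sd_mul:9.3f}")
```

The mean changes sign when multiplied by a negative constant, but the standard deviation stays positive, which is what the absolute value in the rule expresses.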


Z-standardization

• Z = (X-M)/SD

• Two steps:

– center the variable (mean becomes zero)

– divide by the standard deviation (the unit becomes standard deviation)

• Results in fixed mean and standard deviation: M=0, SD=1

• Not in a fixed range!

• Z-standardization is a linear transformation: relative distances remain intact.


Z-standardization

• Step 1: subtract the mean

• c = -M(X)

• M(X’) = M(X)+c

• M(X’) = M(X)-M(X)=0

• SD(X’)=SD(X)


Z-standardization

• Step 2: divide by the standard deviation

• c is 1/SD(X)

• M(Z) = M(X’) * c

• M(Z) = 0 * 1/SD(X) = 0

• SD(Z) = SD(X’) * c

• SD(Z) = SD(X) * 1/SD(X) = 1
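
The two steps can be traced on the 19 room rents used earlier. This is a minimal sketch; using the population standard deviation (`pstdev`) is an assumption, since the slides do not say which variant is meant.

```python
from statistics import mean, pstdev

# Rents of the 19 rooms, in guilders.
rents = [175, 180, 185, 190, 200, 210, 210, 210, 230, 240,
         240, 250, 250, 280, 300, 300, 310, 325, 620]

m = mean(rents)
sd = pstdev(rents)  # population SD (assumption)

# Step 1: subtract the mean. The mean becomes 0, the SD is unchanged.
centered = [x - m for x in rents]

# Step 2: divide by the SD. The mean stays 0, the SD becomes 1.
z = [x / sd for x in centered]

print(f"after step 1: M = {mean(centered):.3f}  SD = {pstdev(centered):.3f}")
print(f"after step 2: M = {mean(z):.3f}  SD = {pstdev(z):.3f}")
print(f"range of Z:   {min(z):.2f} to {max(z):.2f}")
```

The printed range shows that z-scores have no fixed range: the room with a rent of 620 ends up more than three standard deviations above the mean.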


Normal distribution

• Normal distribution = Gauss curve = Bell curve

• Formula (McCall p. 120)

– Note the (x-μ)² part

– apart from that all you have to remember is that the formula is complicated

• Normal distribution occurs when a large number of small random events cause the outcome: e.g. measurement error
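
For reference, the normal density in its usual textbook form is given below. The symbols μ (mean) and σ (standard deviation) are the common convention and are assumed here; the notation in McCall may differ slightly.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```

The squared distance from the mean appears with a minus sign inside the exponent, so the curve is symmetric around μ and falls off quickly as x moves away from it.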


Normal distribution

• Other examples: the height of individuals, intelligence, attitude

• But: the variables Education, Income and age in Eenzaam98 are not normally distributed


Z-scores and the normal distribution

• Z-standardization will not result in a normally distributed variable

• Standardization is NOT the same as normalization

• We will not discuss normalization (but it does exist)

• But: if the original variable is normally distributed, then the z-standardized variable will have a standard normal distribution.
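
One way to see that z-standardization leaves the shape of the distribution alone is to compare skewness before and after. The sketch below uses the 19 room rents from the P-standardization example and defines skewness as the mean of the cubed z-scores, which is one common definition and an assumption here.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z = (X - M) / SD, using the population SD."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

def skewness(values):
    """Mean of cubed z-scores: 0 if symmetric, positive for a long right tail."""
    return mean([z ** 3 for z in z_scores(values)])

# Rents of the 19 rooms, in guilders; the value 620 creates a long right tail.
rents = [175, 180, 185, 190, 200, 210, 210, 210, 230, 240,
         240, 250, 250, 280, 300, 300, 310, 325, 620]

skew_raw = skewness(rents)
skew_z = skewness(z_scores(rents))

print(f"skewness of rents:    {skew_raw:.3f}")
print(f"skewness of z-scores: {skew_z:.3f}")
```

Both numbers are the same: a skewed variable stays exactly as skewed after z-standardization, so it does not become normal.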


Standard normal distribution

• Normal distribution with M=0 and SD=1.

• Table A in Appendix 2 of McCall

• Important numbers (to be remembered):

– 68% of the observations lie between ± 1 SD

– 90% of the observations lie between ± 1.64 SD

– 95% of the observations lie between ± 1.96 SD

– 99% of the observations lie between ± 2.58 SD
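
These four numbers can be checked without a printed table, for example with `NormalDist` from Python's standard `statistics` module (available from Python 3.8). The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(z):
    """Proportion of a standard normal distribution between -z and +z."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

shares = {z: share_within(z) for z in (1, 1.64, 1.96, 2.58)}

for z, share in shares.items():
    print(f"within ± {z:<4} SD: {100 * share:.1f}%")
```

The values on the slide are rounded: the exact share within ± 1 SD, for instance, is a little over 68%.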


Why bother?

• If you know:

– that a variable is normally distributed

– the mean and standard deviation

• Then you know the percentage of observations above or below an observation

• These numbers are a good approximation, even if the variable is not exactly normally distributed
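
As a worked illustration of this reasoning, suppose a test score is normally distributed with mean 100 and standard deviation 15. These numbers are hypothetical and chosen only to make the arithmetic easy.

```python
from statistics import NormalDist

# Hypothetical example: scores assumed normal with M = 100 and SD = 15.
mean_score, sd_score = 100, 15
observation = 130

# Convert the observation to a z-score, then look it up in the
# standard normal distribution (what Table A does on paper).
z = (observation - mean_score) / sd_score
below = NormalDist().cdf(z)
above = 1 - below

print(f"z = {z:.2f}")
print(f"{100 * below:.1f}% of observations lie below {observation}")
print(f"{100 * above:.1f}% of observations lie above {observation}")
```

Knowing only the mean, the standard deviation, and the assumption of normality is enough to place a score of 130 near the top 2% of the distribution.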


P & Z standardization

• Both give a distribution with fixed mean, standard deviation, and unit

• P-standardization also gives a fixed range

• Both are relative to the sample: if you take observations out, then you have to re-compute the standardized variables
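
The last point can be illustrated with the room rents from the P-standardization example: dropping the most expensive room changes the z-score of every remaining room. This is a sketch that assumes the population standard deviation.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Z = (X - M) / SD, computed relative to the values passed in."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

# Rents of the 19 rooms, in guilders.
rents = [175, 180, 185, 190, 200, 210, 210, 210, 230, 240,
         240, 250, 250, 280, 300, 300, 310, 325, 620]

z_all = z_scores(rents)            # all 19 rooms
z_trimmed = z_scores(rents[:-1])   # without room 19 (rent 620)

# Room 18 (rent 325) is at index 17 in both lists.
print(f"room 18, all rooms:       z = {z_all[17]:.2f}")
print(f"room 18, room 19 removed: z = {z_trimmed[17]:.2f}")
```

The rent of room 18 did not change, but its z-score did, because the mean and standard deviation it is compared against belong to a different sample.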


P & Z-standardization

• When interpreting Z-standardized variables one uses percentiles

• With P-standardization one decreases the scale of measurement to ordinal, BUT this improves interpretability.


Student recap


Do before Wednesday

• Read McCall chapter 5

• Understand Appendix 2, table A

• Do exercises 5.7-5.28