Multilevel Modeling

Upload: adrienne-evans

Post on 03-Jan-2016


TRANSCRIPT

Page 1: Multilevel Modeling

Multilevel Modeling

Page 2: Multilevel Modeling

Multilevel Question

• It turns out that simple random sampling is very expensive.

• For example: traveling to Moscow, Idaho, to give a survey to a single student.

• The subsets are conventionally called primary sampling units, or psu's. In a two-stage sample, first a sample is drawn from the primary sampling units (the first-stage sample), and within each psu included in the first-stage sample, a sample of population elements is drawn (the second-stage sample).

• This can be extended to situations with more than two levels, e.g., individuals within households within municipalities, and then is called a multistage sample.

Page 3: Multilevel Modeling

These are examples of two-level data structures, but extensions to multiple levels are possible:

10 cities
-> In each city: 5 schools
-> In each school: 2 classes
-> In each class: 5 students
-> Each student given the test twice

Page 4: Multilevel Modeling

What is Multilevel or Hierarchical Linear Modeling?

Nested Data Structures

Page 5: Multilevel Modeling

Individuals Undivided

Unit of Analysis = Individuals

Page 6: Multilevel Modeling

Individuals Nested Within Groups

Unit of Analysis = Individuals + Classes

Page 7: Multilevel Modeling

… and Further Nested

Unit of Analysis = Individuals + Classes + Schools

Page 8: Multilevel Modeling

Examples of Multilevel Data Structures

Neighborhoods are nested within communities

Families are nested within neighborhoods

Children are nested within families

Page 9: Multilevel Modeling

Examples of Multilevel Data Structures

Schools are nested within districts

Classes are nested within schools

Students are nested within classes

Page 10: Multilevel Modeling

Multilevel Data Structures

Level 4 District (l)

Level 3 School (k)

Level 2 Class (j)

Level 1 Student (i)

Page 11: Multilevel Modeling

2nd Type of Nesting

Repeated Measures Nested Within Individuals

Focus = Change or Growth

Page 12: Multilevel Modeling

Time Points Nested Within Individuals

Page 13: Multilevel Modeling

Nested Data

Data nested within a group tend to be more alike than data from individuals selected at random.

Nature of group dynamics will tend to exert an effect on individuals.

Page 14: Multilevel Modeling

Nested Data

Intraclass correlation (ICC) provides a measure of the clustering and dependence of the data

0 (very independent) to 1.0 (very dependent)

Details discussed later
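
For a two-level random-intercept model, the ICC is commonly written as the share of the total variance that lies between groups. The symbols below are notation assumed here for illustration ($\sigma_u^2$ for the between-group variance, $\sigma_e^2$ for the within-group variance); they are not taken from this slide.

```latex
\[
  \rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
\]
```

With $\rho = 0$ all of the variance is within groups; as $\rho$ approaches 1, members of the same group become nearly identical.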

Page 15: Multilevel Modeling

Multilevel Modeling Seems New But….

Extension of General Linear Modeling

Simple Linear Regression

Multiple Linear Regression

ANOVA

ANCOVA

Repeated Measures ANOVA

Page 16: Multilevel Modeling

Why Multilevel Modeling vs. Traditional Approaches?

Traditional Approaches – 1-Level

1. Individual level analysis (ignore group)

2. Group level analysis (aggregate data and ignore individuals)

Page 17: Multilevel Modeling

Problems with Traditional Approaches

1. Individual level analysis (ignore group)

Violation of independence of data assumption leading to misestimated standard errors (standard errors are smaller than they should be).

Page 18: Multilevel Modeling

Problems with Traditional Approaches

2. Group level analysis (aggregate data and ignore individuals)

Aggregation bias = the meaning of a variable at Level-1 (e.g., individual level SES) may not be the same as the meaning at Level-2 (e.g., school level SES)

Page 19: Multilevel Modeling

Example:

           Before        After
Patient   SBP   DBP    SBP   DBP
 1        210   130    201   125
 2        169   122    165   121
 3        187   124    166   121
 4        160   104    157   106
 5        167   112    147   101
 6        176   101    145    85
 7        185   121    168    98
 8        206   124    180   105
 9        173   115    147   103
10        146   102    136    98
11        174    98    151    90
12        201   119    168    98
13        198   106    179   110
14        148   107    129   103
15        154   100    131    82

Paired t-test: the average change in DBP is significantly different from zero (p = 0.000951)

Unpaired t-test: the average change in DBP is significantly different from zero (p = 0.036)
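
The gap between the two p-values comes from how each test treats the within-patient pairing. The sketch below recomputes both t statistics for the DBP columns using only the Python standard library. It assumes the unpaired test is the pooled-variance (Student) version, which the slide does not specify, and it reports t statistics only, not p-values.

```python
import math
import statistics

# Diastolic blood pressure (DBP) for the 15 patients in the table above.
dbp_before = [130, 122, 124, 104, 112, 101, 121, 124, 115, 102, 98, 119, 106, 107, 100]
dbp_after = [125, 121, 121, 106, 101, 85, 98, 105, 103, 98, 90, 98, 110, 103, 82]


def paired_t(before, after):
    """t statistic for the mean within-patient change (df = n - 1)."""
    diffs = [b - a for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def unpaired_t(x, y):
    """Pooled-variance two-sample t statistic (df = n_x + n_y - 2).

    Assumption: the slide's unpaired test is the pooled-variance version.
    """
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se


t_paired = paired_t(dbp_before, dbp_after)
t_unpaired = unpaired_t(dbp_before, dbp_after)
print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

Both tests use the same mean difference; the paired version divides it by a smaller standard error because between-patient variation cancels out of the differences.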

Page 20: Multilevel Modeling

Multilevel Approach

2 or more levels can be considered simultaneously

Can analyze within- and between-group variability

Page 21: Multilevel Modeling

How Many Levels Are Usually Examined?

2 or 3 levels very common

15 students x 10 classes x 10 schools

= 1,500

Page 22: Multilevel Modeling

Types of Outcomes

Continuous Scale (Achievement, Attitudes)

Binary (pass/fail)

Categorical with 3+ categories

Page 23: Multilevel Modeling

Effect for estimation of a mean

The results that follow hold if the sample is a two-stage sample using random sampling with replacement at either stage, or if the sampling fractions are so low that the difference between sampling with and without replacement is negligible.

Page 24: Multilevel Modeling

Effect for estimation of a mean

Since considerations for the choice of a design are always of an approximate nature, only those designs are considered here where each level-two unit contains the same number of level-one units.

Level-two units will sometimes be referred to as clusters.

The number of level-two units is denoted N.

The number of level-one units within each level-two unit is denoted n.

These numbers are called the level-two sample size and the cluster size, respectively.

The total sample size is Nn.

If in reality the number of level-one units fluctuates between level-two units, it will almost always be a reasonable approximation to use for n the average number of sampled level-one units per level-two unit.

Page 25: Multilevel Modeling

Effect for estimation of a mean

Suppose that the mean of some variable Y is to be estimated in a population which has a two-level structure. As an example, Y could be the duration of hospital stay after a certain operation, under the condition that there are no complications or additional health problems.

Random Intercept

\[ y_{ij} = \mu + u_j + e_{ij}, \qquad j = 1, \dots, N, \quad i = 1, \dots, n \]

\[ \operatorname{var}(u_j) = \sigma_u^2 \]

level-one variance \( \operatorname{var}(e_{ij}) = \sigma_e^2 \)

Estimate \( \mu \)

Page 26: Multilevel Modeling

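Effect for estimation of a mean

The overall sample mean is:

\[ \hat{\mu} = \frac{1}{Nn} \sum_{j=1}^{N} \sum_{i=1}^{n} y_{ij} \]

The variance of this estimator is:

\[ \operatorname{var}(\hat{\mu}) = \frac{n\sigma_u^2 + \sigma_e^2}{Nn} \]

compared to the variance of the estimator if it came from a simple random sample:

\[ \frac{\sigma_u^2 + \sigma_e^2}{Nn} \]

where \( \sigma_u^2 = \operatorname{var}(u_j) \) and \( \sigma_e^2 = \operatorname{var}(e_{ij}) \).

1. This increase in complexity carries over to regression, etc.
2. This is a relatively simple model; more complex models lead to more complex calculations that require the calculation of large covariance matrices.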
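
To make the comparison concrete, the sketch below evaluates both variance expressions for a random-intercept model: (n·σ_u² + σ_e²)/(Nn) for the two-stage sample and (σ_u² + σ_e²)/(Nn) for a simple random sample of the same size. The names `sigma_u2` and `sigma_e2` and all numeric values are hypothetical, chosen only for illustration. The ratio of the two variances is the quantity usually called the design effect, 1 + (n − 1)·ICC.

```python
def var_mean_two_stage(sigma_u2, sigma_e2, N, n):
    """Variance of the overall sample mean with N clusters of n units each."""
    return (n * sigma_u2 + sigma_e2) / (N * n)


def var_mean_srs(sigma_u2, sigma_e2, N, n):
    """Variance of the mean of a simple random sample of total size N * n."""
    return (sigma_u2 + sigma_e2) / (N * n)


# Hypothetical variance components and design, for illustration only.
sigma_u2, sigma_e2 = 2.0, 8.0   # level-two and level-one variances
N, n = 20, 10                   # number of clusters, cluster size

ratio = var_mean_two_stage(sigma_u2, sigma_e2, N, n) / var_mean_srs(sigma_u2, sigma_e2, N, n)

# The same ratio expressed through the intraclass correlation.
icc = sigma_u2 / (sigma_u2 + sigma_e2)
design_effect = 1 + (n - 1) * icc

print(f"variance ratio = {ratio:.2f}, design effect = {design_effect:.2f}")
```

The ratio does not depend on N, only on the cluster size and the ICC, which is why even a modest ICC can inflate the variance noticeably when clusters are large.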

Page 27: Multilevel Modeling

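Easier Case

\[ y_{ij} = \beta x_{ij} + u_j + e_{ij}, \qquad j = 1, \dots, N, \quad i = 1, \dots, n \]

level-2 effect \( u_j \) is constant

level-one variance \( \operatorname{var}(e_{ij}) = \sigma_e^2 \)

Notice that \( x_{ij} \) can be observed but \( u_j \) cannot, and \( u_j \) cannot be controlled.

\[ y_{ij} - \bar{y}_j = \beta (x_{ij} - \bar{x}_j) + (u_j - u_j) + (e_{ij} - \bar{e}_j) \]

Since \( u_j - u_j = 0 \), we can now find the optimal \( \beta \).

Another alternative to this operation is to add a dummy variable for each individual.

The effect of each level-2 unit is a constant (fixed), not a random variable.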
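
The subtraction of unit means described on this slide can be sketched on simulated data. Everything in the block is hypothetical (the slope `beta_true`, the three unit effects, the noise levels); it only illustrates that removing each unit's mean from x and y eliminates the constant unit effect, whereas a pooled regression that ignores the units is distorted when x is related to that effect.

```python
import random
import statistics

random.seed(0)
beta_true = 1.5                   # hypothetical slope
unit_effects = [-4.0, 0.0, 5.0]   # hypothetical constant u_j, one per level-2 unit

# Simulate y_ij = beta * x_ij + u_j + e_ij, with x_ij related to u_j.
data = []  # tuples of (unit index j, x_ij, y_ij)
for j, u in enumerate(unit_effects):
    for _ in range(50):
        x = u + random.gauss(0, 1)
        y = beta_true * x + u + random.gauss(0, 0.5)
        data.append((j, x, y))


def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x (with an intercept)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Pooled regression: ignores the unit effects entirely.
pooled = ols_slope([x for _, x, _ in data], [y for _, _, y in data])

# Within transformation: subtract each unit's own mean from x and y.
xw, yw = [], []
for j in range(len(unit_effects)):
    xs = [x for k, x, _ in data if k == j]
    ys = [y for k, _, y in data if k == j]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    xw += [x - mx for x in xs]
    yw += [y - my for y in ys]
within = ols_slope(xw, yw)

print(f"pooled slope = {pooled:.2f}, within slope = {within:.2f}")
```

Regressing y on x plus one dummy variable per unit would give the same slope as the within estimate; the demeaning simply avoids estimating the dummies explicitly.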

Page 28: Multilevel Modeling

Software to do Multilevel Modeling

SAS Users

PROC MIXED

Extension of General Linear Modeling

Simple Linear Regression

Multiple Linear Regression

ANOVA

ANCOVA

Repeated Measures ANOVA

PROC REG

PROC GLM

PROC ANOVA

Page 29: Multilevel Modeling

Example: Family and Gender

The response variable Height measures the heights (in inches) of 18 individuals.

The individuals are classified according to Family and Gender

data heights;
   input Family Gender$ Height @@;
   datalines;
1 F 67   1 F 66   1 F 64   1 M 71   1 M 72   2 F 63
2 F 63   2 F 67   2 M 69   2 M 68   2 M 70   3 F 63
3 M 64   4 F 67   4 F 66   4 M 67   4 M 67   4 M 69
;
run;

This differs from the “Effect for estimation of a mean” slides because now we have more cluster levels, but no random intercepts.

Page 30: Multilevel Modeling

Example: Family and Gender

The PROC MIXED statement invokes the procedure. The CLASS statement instructs PROC MIXED to consider both Family and Gender as classification variables.

Dummy (indicator) variables are, as a result, created corresponding to all of the distinct levels of Family and Gender.

For these data, Family has four levels and Gender has two levels.

proc mixed data=heights;
   class Family Gender;
   model Height = Gender Family Family*Gender / s;
run;

s: requests that a solution for the fixed-effects parameters be produced along with their approximate standard errors

Page 31: Multilevel Modeling

Family and Gender

Run program simple-proc_mixed2.sas

What happens when you try to use the statement CLASS in a PROC REG?

Ordinary linear regression estimates a single set of coefficients, whereas in HLM coefficients are estimated for each group unit (e.g., school).

Page 32: Multilevel Modeling

Dorsal shells in lizards

Two-sample t-test: the small observed difference is not significant (p = 0.1024).

Page 33: Multilevel Modeling

Mother effect

We have 102 lizards from 29 mothers

Mother effects might be present.

Hence a comparison between male and female animals should be based on within-mother comparisons.

Page 34: Multilevel Modeling

Mother effect

[Figure: number of dorsal shells plotted by mother]

Page 35: Multilevel Modeling

First Choice

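\[ Y_{ij} = \mu + \alpha_i + \beta x_{ij} + \varepsilon_{ij} \]

\( Y_{ij} \) is the \( j \)th measurement for the \( i \)th mother

\( \mu \): Overall mean

\( \beta x_{ij} \): Gender effect (\( x_{ij} = 0 \) for males, \( x_{ij} = 1 \) for females)

\( \alpha_i \): Mother effect

\( \beta \): Parameter of Interest

Too many parameters, hence we need restrictions on \( \alpha_i \): \( \sum_i \alpha_i = 0 \)

Residual Distribution: \( \varepsilon_{ij} \sim N(0, \sigma^2_{res}) \)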

Test for a ‘sex’ effect, correcting for ‘mother’ effects.

β can be interpreted as the average difference between males and females for each mother.

This is a more complex example than the “Effect for estimation of a mean” slides because now we have a variable x_ij for each observation.

Page 36: Multilevel Modeling

SAS program

proc mixed data = lizard;
   class mothc;
   model dors = sex mothc;
run;

Source   F Value   Pr > F
SEX         7.19   0.0091
MOTHC       3.95   <.0001

1. Highly significant mother effect.
2. Significant gender effect.
3. Many degrees of freedom are spent on the estimation of the mother effect, which is not even of interest.

Notice that in the previous example, “Family and Gender”, gender was used to define a level (cluster); here it is just a variable. In the previous example, was it assumed that individuals of the same gender were “clustered” (correlated)? Now gender is just an input variable.

Page 37: Multilevel Modeling

Later in this semester…

Note the different nature of the two factors:

SEX: defines 2 groups of interest

MOTHER: defines 29 groups not of real interest. A new sample would imply other mothers.

In practice, one therefore considers the factor ‘mother’ as a random factor.

The factor ‘sex’ is a fixed effect.

Thus the model is a mixed model.

In general, models can contain multiple fixed and/or random factors.

With \( \alpha_i \) denoting the mother effect:

Fixed Effects Model: \( \alpha_i \) constant, \( \sum_i \alpha_i = 0 \)

Random Effects Model: \( \alpha_i \sim N(0, \sigma^2) \)

As in the “Effect for estimation of a mean” slides

Page 38: Multilevel Modeling

Later in this semester…

Note the different nature of the two factors:

SEX: defines 2 groups of interest

MOTHER: defines 29 groups not of real interest. A new sample would imply other mothers.

In practice, one therefore considers the factor ‘mother’ as a random factor.

The factor ‘sex’ is a fixed effect.

Thus the model is a mixed model.

In general, models can contain multiple fixed and/or random factors.

proc mixed data = lizard;
   class mothc;
   model dors = sex / solution;
   random mothc;
run;

Page 39: Multilevel Modeling

More terminology

Fixed coefficient: A regression coefficient that does not vary across individuals

Random coefficient: A regression coefficient that does vary across individuals

Page 40: Multilevel Modeling

Is a variable random or fixed effect?

LaMotte 1983, pp. 138–139

Treatment levels used are the only ones about which inferences are sought => Fixed Effect

Inferences are sought about a broader collection of treatment effects than those used in the experiment, or the treatment levels are not selected purposefully => Random Effect

Page 41: Multilevel Modeling

More terminology

Balanced design: Equal number of observations per unit

Unbalanced design: Unequal number of observations per unit

Unconditional model: Simplest level 2 model; no predictors of the level 1 parameters (e.g., intercept and slope)

Conditional model: Level 2 model contains predictors of level 1 parameters

Page 42: Multilevel Modeling

Weighted Data

Problem:

[Figure: two charts comparing minority and white voters as Pct. of Voting Population and as Pct. of People who have a phone]

Solution: Give more “weight” to the minority people with telephones

Page 43: Multilevel Modeling

Weighted Data

Not limited to 2 categories

[Figure: Pct. of Voting Population and Pct. of People who have a phone for four categories: Minority/Dem., White/Rep., Minority/Rep., White/Dem.]

How many categories? As many as there are significant

Page 44: Multilevel Modeling

Proportion

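Suppose minority voters are 1/3 of the voting population but only 1/6 of the people with a phone

\[ \frac{1}{6} \times \,? = \frac{1}{3} \;\Rightarrow\; ? = 2 \]

\[ \frac{5}{6} \times \,? = \frac{2}{3} \;\Rightarrow\; ? = \frac{4}{5} = .8 \]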

Needless to say that in reality this is a much more complex issue

A sampling weight for a given data point is the number of receipts in the target population which that sample point represents.
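
The two weights on this slide follow from dividing each group's share of the target population by its share of the people reachable by phone. A minimal sketch with exact fractions, assuming only the two groups named on the slide (the white share of 2/3 is the complement of the stated 1/3; the dictionary names are illustrative):

```python
from fractions import Fraction

# Shares stated on the slide; the white shares are the complements.
population_share = {"minority": Fraction(1, 3), "white": Fraction(2, 3)}
phone_share = {"minority": Fraction(1, 6), "white": Fraction(5, 6)}

# Weight = share in the target population / share among people with a phone.
weights = {group: population_share[group] / phone_share[group] for group in population_share}

for group, weight in weights.items():
    print(group, weight, float(weight))
```

Under-represented groups receive a weight above 1 and over-represented groups a weight below 1.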

Page 45: Multilevel Modeling

Which weight do we need to use?

Oversimplified example (don’t take seriously)

[Figure: minority and white voters as Pct. of People who have a phone, Pct. of Voting Population in 2008, and Pct. of Voting Population in 2010]

Page 46: Multilevel Modeling

Proportion

1. 100 minority + 500 white voters answer the phone survey
2. 75 minority voters will vote for candidate X
3. 250 white voters will vote for candidate X
4. Non-weighted conclusion: 325/600 = 54.17% of the voters will vote for candidate X
5. Weighted conclusion:
   1. 75 minority = 75% of minority with phone => (.75)*(1/6) = 12.5% of people with phone; * 2 weight => 25% of the voting population
   2. 250 white = 50% of white people with phone => (.5)*(5/6) = 41.67% of people with phone; * .8 weight => 33.33%
   3. 25% + 33.33% = 58.33%

Suppose minority voters are 1/3 of the voting population but only 1/6 of the people with phone
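
The arithmetic on this slide can be checked with a few lines of Python. The counts and the weights (2 for minority, .8 for white) are the ones given in the slides; the variable names are illustrative only.

```python
# Survey counts from the slide.
respondents = {"minority": 100, "white": 500}
votes_for_x = {"minority": 75, "white": 250}

# Sampling weights from the "Proportion" slide.
weights = {"minority": 2.0, "white": 0.8}

# Unweighted share: every respondent counts once.
unweighted = sum(votes_for_x.values()) / sum(respondents.values())

# Weighted share: each respondent counts as `weight` members of the population.
weighted = sum(weights[g] * votes_for_x[g] for g in respondents) / sum(
    weights[g] * respondents[g] for g in respondents
)

print(f"unweighted = {unweighted:.2%}, weighted = {weighted:.2%}")
```

The weighted total in the denominator stays at 600 here because the weights rescale the groups to their population shares without changing the overall sample size.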

Page 47: Multilevel Modeling

SAS Weighted Mean

proc means data=sashelp.class;
   var height;
run;

proc means data=sashelp.class;
   weight weight;
   var height;
run;

Page 48: Multilevel Modeling

Weighted PROC MIXED

/* Unweighted fit */
proc mixed data=sashelp.class covtest;
   class Sex;
   model height = Sex Age / solution;
run;

/* Weighted fit: each observation is weighted by the variable weight */
proc mixed data=sashelp.class covtest;
   class Sex;
   model height = Sex Age / solution;
   weight weight;
run;

Notice the (fairly small) difference between the two fits in, for example, the coefficients of the model (Solution for Fixed Effects / Estimates).

Page 49: Multilevel Modeling

Farms Example

It's stratified by regions within Iowa and Nebraska.

Regress on farm area, with separate intercept and slope for each state

Page 50: Multilevel Modeling

References

LaMotte, L. R. (1983). Fixed-, random-, and mixed-effects models. In S. Kotz, N. L. Johnson, and C. B. Read (Eds.), Encyclopedia of Statistical Sciences.