categorical independent variablesbrunner/oldclass/appliedf11/... · 2011-10-23 · indicator dummy...

Post on 25-Dec-2019


Categorical Independent Variables

And multiple comparisons

One-way Analysis of variance

•  Categorical IV
•  Quantitative DV
•  p categories (groups)
•  H0: All population means equal
•  Normal conditional distributions
•  Equal variances

Analysis means to split up

•  With no IV, best predictor is the overall mean
•  Variation to be explained is SSTO, sum of squared differences from the overall mean
•  With an IV, best predictor is the group mean
•  Variation still unexplained is SSW, sum of squared differences from the group means

SSTO = SSB + SSW
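A minimal Python sketch of this decomposition, using invented data for three hypothetical groups. SSB is taken here to be the between-groups sum of squares: squared differences of the group means from the overall mean, weighted by group size.

```python
# Hypothetical data: a quantitative DV in three groups (values invented for illustration).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0, 10.0],
    "C": [1.0, 2.0, 3.0],
}

all_values = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_values) / len(all_values)
group_means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

# SSTO: squared differences from the overall mean (best predictor with no IV).
ssto = sum((y - grand_mean) ** 2 for y in all_values)
# SSW: squared differences from each observation's own group mean.
ssw = sum((y - group_means[g]) ** 2 for g, ys in groups.items() for y in ys)
# SSB: squared differences of group means from the overall mean, weighted by group size.
ssb = sum(len(ys) * (group_means[g] - grand_mean) ** 2 for g, ys in groups.items())

print(ssto, ssb, ssw)
```

The two printed pieces SSB and SSW add up to SSTO, apart from floating-point rounding.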

ANOVA Summary Table

Source            DF      Sum of Squares   Mean Square          F Value    Pr > F
Model             p − 1   SSB              MSB = SSB/(p − 1)    MSB/MSW    p-value
Error             n − p   SSW              MSW = SSW/(n − p)
Corrected Total   n − 1   SSTO

$$H_0: \mu_1 = \cdots = \mu_p$$

R² is the proportion of variation explained by the independent variable
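The table entries can be sketched in a few lines of Python. The sums of squares, p and n below are invented for illustration, and R² is computed as SSB/SSTO. The p-value is left out because it needs the F distribution, which the Python standard library does not provide.

```python
# Hypothetical inputs (invented for illustration): p groups, n observations in total.
p, n = 3, 10
ssb, ssw = 73.5, 9.0
ssto = ssb + ssw            # SSTO = SSB + SSW

msb = ssb / (p - 1)         # Model mean square
msw = ssw / (n - p)         # Error mean square
f_value = msb / msw         # F statistic on (p - 1, n - p) degrees of freedom
r_squared = ssb / ssto      # proportion of variation explained by the IV

print(f"F = {f_value:.3f} on ({p - 1}, {n - p}) df, R^2 = {r_squared:.3f}")
```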

Contrasts

$$c = a_1\mu_1 + a_2\mu_2 + \cdots + a_p\mu_p$$

$$\hat{c} = a_1\bar{Y}_1 + a_2\bar{Y}_2 + \cdots + a_p\bar{Y}_p$$

Overall F-test is a test of p-1 contrasts
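A small Python sketch of estimating one contrast. The weights of a contrast sum to zero. The sample means, group sizes and MSW are invented, and the standard error formula assumes the equal-variance model of the one-way ANOVA.

```python
import math

# Hypothetical sample means, group sizes and MSW (invented for illustration).
means = [5.0, 8.5, 2.0]      # Y-bar_1, Y-bar_2, Y-bar_3
sizes = [3, 4, 3]            # n_1, n_2, n_3
msw = 9.0 / 7                # error mean square from the ANOVA table

# Contrast weights: average of groups 1 and 2 versus group 3. They sum to zero.
a = [0.5, 0.5, -1.0]
assert abs(sum(a)) < 1e-12

# c-hat = a_1 * Y-bar_1 + ... + a_p * Y-bar_p
c_hat = sum(aj * ybar for aj, ybar in zip(a, means))
# Standard error of c-hat under equal variances: sqrt(MSW * sum(a_j^2 / n_j)).
se = math.sqrt(msw * sum(aj ** 2 / nj for aj, nj in zip(a, sizes)))
t_stat = c_hat / se          # compare with a t distribution on n - p df

print(c_hat, se, t_stat)
```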

Multiple Comparisons

•  Most hypothesis tests are designed to be carried out in isolation
•  But if you do a lot of tests and all the null hypotheses are true, the chance of rejecting at least one of them can be a lot more than α. This is inflation of the Type I error rate.
•  Multiple comparisons (sometimes called follow-up tests, post hoc tests, probing) offer a solution.
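To get a feel for the size of the inflation, here is a sketch under one added assumption that is not in the slides: the k tests are independent. Then the chance of at least one rejection when every null hypothesis is true is 1 − (1 − α)^k.

```python
alpha = 0.05

def familywise_error(k: int, alpha: float = alpha) -> float:
    """P(at least one rejection) for k independent tests when every H0 is true."""
    return 1.0 - (1.0 - alpha) ** k

for k in (1, 5, 10, 20):
    print(k, round(familywise_error(k), 3))
```

With dependent tests the exact value differs, but the qualitative point stands.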

Multiple comparisons

•  Protect a family of tests against Type I error at some joint significance level α
•  If all the null hypotheses are true, the probability of rejecting at least one is no more than α

Multiple comparisons of contrasts in a one-way design: Assume all means are equal in the population

•  Bonferroni
•  Tukey
•  Scheffé

Bonferroni

•  Based on Bonferroni’s inequality
•  Applies to any collection of k tests
•  Assume all k null hypotheses are true
•  Event Aj is that null hypothesis j is rejected.
•  Do the tests as usual
•  Reject each H0 if p < 0.05/k
•  Or, adjust the p-values. Multiply them by k, and reject if pk < 0.05

$$\Pr\left\{\bigcup_{j=1}^{k} A_j\right\} \leq \sum_{j=1}^{k} \Pr\{A_j\}$$
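The adjustment can be sketched in Python as follows. The raw p-values are invented, and capping the adjusted value at 1 is a common convention added here, not something the slide states.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests k, capping the result at 1."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]

# Hypothetical raw p-values from k = 4 tests planned in advance.
raw = [0.003, 0.012, 0.020, 0.300]
adjusted = bonferroni_adjust(raw)
rejected = [p_adj < 0.05 for p_adj in adjusted]

print(adjusted)
print(rejected)
```

Rejecting when the adjusted p-value is below 0.05 gives the same decisions as comparing each raw p-value with 0.05/k.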

Bonferroni

•  Advantage: Flexibility
•  Advantage: Easy to do
•  Disadvantage: Must know what all the tests are before seeing the data
•  Disadvantage: A little conservative; the true joint significance level is less than α.

Tukey (HSD)

•  Based on the distribution of the largest mean minus the smallest.
•  Applies only to pairwise comparisons of means
•  If sample sizes are equal, it’s most powerful, period
•  If sample sizes are not equal, it’s a bit conservative

Scheffé

•  Find the usual critical value for the initial test. Multiply by p-1. This is the Scheffé critical value.
•  Family includes all contrasts: Infinitely many!
•  You don’t need to specify them in advance
•  Based on the union-intersection principle. More details later, after F-tests.

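Scheffé

•  Follow-up tests cannot be significant if the initial overall test is not. Not quite true of Bonferroni and Tukey.
•  If the initial test (of p-1 contrasts) is significant, there is a single contrast that is significant (not necessarily a pairwise comparison)
•  Because the critical value is multiplied by p-1, the adjusted p-value is the tail area beyond F divided by (p-1), under the F distribution of the initial test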
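Written out for a single contrast in the one-way design, the Scheffé rule looks like this. The symbols F_c (the F statistic for the contrast) and f_α (the usual critical value of the initial test) are notation introduced here for illustration, and the formula assumes the equal-variance one-way model.

```latex
F_c = \frac{\hat{c}^{\,2}}{MSW \sum_{j=1}^{p} a_j^2 / n_j},
\qquad
\text{reject } H_0: c = 0 \ \text{ if } \ F_c > (p-1)\, f_{\alpha}(p-1,\; n-p)
```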

Which method should you use?

•  If the sample sizes are nearly equal and you are only interested in pairwise comparisons, use Tukey because it's most powerful
•  If the sample sizes are not close to equal and you are only interested in pairwise comparisons, there is (amazingly) no harm in applying all three methods and picking the one that gives you the greatest number of significant results. (It’s okay because this choice could be determined in advance based on number of treatments, α and the sample sizes.)
•  If you are interested in contrasts that go beyond pairwise comparisons and you can specify all of them before seeing the data, Bonferroni is almost always more powerful than Scheffé. (Tukey is out.)
•  If you want lots of special contrasts but you don't know exactly what they all are, Scheffé is the only honest way to go, unless you have a separate replication data set.

Dummy Variables

$$Y_i = \beta_0 + \beta_1 x_{i,1} + \epsilon_i$$

•  X = 1 means Drug, X = 0 means Placebo
•  Population mean is E(Y|x) = β0 + β1x
•  For patients getting the drug, mean response is β0 + β1
•  For patients getting the placebo, mean response is β0
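A quick numerical check in Python with invented data: fitting least squares to a 0/1 dummy variable gives an intercept equal to the placebo sample mean and a slope equal to the difference between the two sample means.

```python
# Hypothetical responses (invented): x = 1 for Drug, x = 0 for Placebo.
x = [1, 1, 1, 1, 0, 0, 0]
y = [12.0, 15.0, 11.0, 14.0, 9.0, 10.0, 8.0]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Ordinary least squares with a single predictor.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

drug_mean = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
placebo_mean = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)

print(b0, placebo_mean)        # intercept next to the placebo sample mean
print(b0 + b1, drug_mean)      # intercept + slope next to the drug sample mean
```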

Regression test of H0: β1 = 0

•  Same as an independent t-test
•  Same as a one-way ANOVA with 2 categories
•  Same t, same F, same p-value.
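The agreement between t and F can be checked numerically. The sketch below uses invented data and the pooled-variance (equal variances) two-sample t statistic, and compares its square with the one-way ANOVA F for two groups.

```python
import math

# Hypothetical data (invented for illustration).
drug = [12.0, 15.0, 11.0, 14.0]
placebo = [9.0, 10.0, 8.0]

def mean(values):
    return sum(values) / len(values)

def sum_sq(values):
    """Sum of squared deviations from the group's own mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

n1, n2 = len(drug), len(placebo)

# Pooled-variance two-sample t statistic.
pooled_var = (sum_sq(drug) + sum_sq(placebo)) / (n1 + n2 - 2)
t_stat = (mean(drug) - mean(placebo)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# One-way ANOVA F statistic with p = 2 groups.
grand_mean = mean(drug + placebo)
ssb = n1 * (mean(drug) - grand_mean) ** 2 + n2 * (mean(placebo) - grand_mean) ** 2
msb = ssb / (2 - 1)
msw = (sum_sq(drug) + sum_sq(placebo)) / (n1 + n2 - 2)
f_stat = msb / msw

print(t_stat ** 2, f_stat)     # the two values agree up to rounding
```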

Drug A, Drug B, Placebo

•  x1 = 1 if Drug A, zero otherwise
•  x2 = 1 if Drug B, zero otherwise
•  E(Y|x) = β0 + β1x1 + β2x2

Regression coefficients are contrasts with the category that has no indicator: the reference category.

$$H_0: \mu_1 = \mu_2 = \mu_3 \iff \beta_1 = \beta_2 = 0$$

Indicator dummy variable coding with intercept

•  Need p-1 indicators to represent a categorical IV with p categories
•  If you use p dummy variables, trouble
•  Regression coefficients are contrasts with the category that has no indicator
•  Call this the reference category
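A Python sketch with invented data, taking Placebo as the reference category. Rather than calling a regression routine, it checks the normal equations directly: a coefficient vector is the least squares solution when the residuals are orthogonal to every column of the design matrix.

```python
# Hypothetical data (invented): Placebo is the reference category (no indicator).
data = {
    "DrugA":   [12.0, 15.0, 11.0, 14.0],
    "DrugB":   [10.0, 13.0, 10.0],
    "Placebo": [9.0, 10.0, 8.0],
}

rows, y = [], []
for group, values in data.items():
    for v in values:
        # Columns: intercept, x1 (Drug A indicator), x2 (Drug B indicator).
        rows.append([1.0, float(group == "DrugA"), float(group == "DrugB")])
        y.append(v)

means = {g: sum(v) / len(v) for g, v in data.items()}

# Candidate coefficients: the reference mean, then differences from the reference mean.
b = [means["Placebo"],
     means["DrugA"] - means["Placebo"],
     means["DrugB"] - means["Placebo"]]

# Residuals y - Xb, then X'(y - Xb), which is zero at the least squares solution.
residuals = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(rows, y)]
normal_eqs = [sum(row[j] * r for row, r in zip(rows, residuals)) for j in range(3)]

print(b)
print(normal_eqs)
```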

Now add a quantitative variable (covariate)

•  x1 = Age
•  x2 = 1 if Drug A, zero otherwise
•  x3 = 1 if Drug B, zero otherwise
•  E(Y|x) = β0 + β1x1 + β2x2 + β3x3

What do you report?

Set all covariates to their sample mean values

•  And compute Y-hat for each group
•  Call it an “adjusted” mean, or something like “average university GPA adjusted for High School GPA.”
•  SAS calls it a least squares mean (lsmeans)
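A sketch of the adjusted-mean calculation in Python. The fitted coefficients and the ages are invented; in practice they would come from the regression output and the data.

```python
# Hypothetical fitted coefficients (invented) for
# Y-hat = b0 + b1*age + b2*x2 + b3*x3, where x2 and x3 indicate Drug A and Drug B.
b0, b1, b2, b3 = 20.0, -0.1, 4.0, 2.0

ages = [34, 51, 47, 62, 29, 55]          # hypothetical sample values of the covariate
mean_age = sum(ages) / len(ages)

indicators = {"Drug A": (1, 0), "Drug B": (0, 1), "Placebo": (0, 0)}

# Y-hat for each group with the covariate set to its sample mean.
adjusted_means = {
    group: b0 + b1 * mean_age + b2 * x2 + b3 * x3
    for group, (x2, x3) in indicators.items()
}

print(adjusted_means)
```

Because there is no interaction term in this model, the differences between adjusted means do not depend on the age at which they are evaluated.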

Test whether the average response to Drug A and Drug B is different from response to the placebo, controlling for age. What is the null hypothesis?

Show your work
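One way the work could go, assuming the model E(Y|x) = β0 + β1x1 + β2x2 + β3x3 with x1 = Age and x2, x3 the indicators for Drug A and Drug B (Placebo as reference category):

```latex
\begin{aligned}
\tfrac{1}{2}\big[(\beta_0 + \beta_1 x_1 + \beta_2) + (\beta_0 + \beta_1 x_1 + \beta_3)\big]
  &= \beta_0 + \beta_1 x_1 \\
\iff\quad \beta_0 + \beta_1 x_1 + \tfrac{1}{2}(\beta_2 + \beta_3) &= \beta_0 + \beta_1 x_1 \\
\iff\quad H_0:\ \beta_2 + \beta_3 &= 0
\end{aligned}
```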

We want to avoid this kind of thing. It can get complicated.

A common error

•  Categorical IV with p categories
•  p dummy variables (rather than p-1)
•  And an intercept
•  There are p population means represented by p+1 regression coefficients - not unique
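The non-uniqueness can be seen in the design matrix itself. A small Python sketch with an invented layout of three categories:

```python
# Hypothetical design: p = 3 categories, two observations each.
categories = ["A", "A", "B", "B", "C", "C"]
levels = ["A", "B", "C"]

# One row per observation: intercept followed by p indicator columns.
X = [[1] + [int(c == level) for level in levels] for c in categories]

# In every row the indicators add up to the intercept column, so the columns are
# linearly dependent and the least squares coefficients are not unique.
dependent = all(row[0] == sum(row[1:]) for row in X)

print(dependent)
```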

But suppose you leave off the intercept

•  Now there are p regression coefficients and p population means

•  The correspondence is unique, and the model can be handy -- less algebra

•  Called cell means coding
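With cell means coding the normal equations are easy to solve by hand, because X'X is diagonal. A Python sketch with invented data:

```python
# Hypothetical data (invented for illustration).
data = {
    "DrugA":   [12.0, 15.0, 11.0, 14.0],
    "DrugB":   [10.0, 13.0, 10.0],
    "Placebo": [9.0, 10.0, 8.0],
}
levels = list(data)
p = len(levels)

# One indicator column per category and no intercept column.
rows, y = [], []
for group, values in data.items():
    for v in values:
        rows.append([float(group == level) for level in levels])
        y.append(v)

# Normal equations (X'X) b = X'y. Here X'X is diagonal with the group sizes
# on the diagonal, and X'y holds the group totals.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
assert all(xtx[i][j] == 0.0 for i in range(p) for j in range(p) if i != j)

# Each coefficient is a group total divided by a group size: the group sample mean.
b = [xty[i] / xtx[i][i] for i in range(p)]

print(dict(zip(levels, b)))
```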

Cell means coding: p indicators and no intercept

Add a covariate: x4

Effect coding

•  p-1 dummy variables for p categories
•  Include an intercept
•  Last category gets -1 instead of zero
•  What do the regression coefficients mean?

Meaning of the regression coefficients

With effect coding

•  Intercept is the Grand Mean
•  Regression coefficients are deviations of group means from the grand mean
•  Equal population means is equivalent to zero coefficients for all the dummy variables
•  Last category is not a reference category
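A Python sketch with invented data that checks these meanings at the sample level. Note that with unequal group sizes the "grand mean" estimated by the intercept is the unweighted average of the group means, not the overall sample mean. As a shortcut, the code verifies the normal equations instead of calling a regression routine.

```python
# Hypothetical data (invented); the last category, Placebo, is coded -1.
data = {
    "DrugA":   [12.0, 15.0, 11.0, 14.0],
    "DrugB":   [10.0, 13.0, 10.0],
    "Placebo": [9.0, 10.0, 8.0],
}
levels = list(data)

def effect_row(group):
    """Intercept plus p-1 effect-coded dummy variables for one observation."""
    if group == levels[-1]:
        return [1.0] + [-1.0] * (len(levels) - 1)
    return [1.0] + [float(group == level) for level in levels[:-1]]

rows = [effect_row(g) for g, values in data.items() for _ in values]
y = [v for values in data.values() for v in values]

means = {g: sum(v) / len(v) for g, v in data.items()}
grand_mean = sum(means.values()) / len(means)      # unweighted mean of group means

# Candidate coefficients: grand mean, then deviations of group means from it.
b = [grand_mean] + [means[level] - grand_mean for level in levels[:-1]]

# b is the least squares solution if the residuals are orthogonal to every column.
residuals = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(rows, y)]
normal_eqs = [sum(row[j] * r for row, r in zip(rows, residuals)) for j in range(len(b))]

print(b)
print(normal_eqs)
```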

Sometimes speak of the “main effect” of a categorical variable

•  More than one categorical IV (factor)
•  Marginal means are average group mean, averaging across the other factors
•  This is loose speech: There are actually p main effects for a variable, not one
•  Blends the “effect” of an experimental variable with the technical statistical meaning of effect.
•  It’s harmless

Add a covariate: Age = x1

Regression coefficients are deviations from the average conditional population mean (conditional on x1).

So if the regression coefficients for all the dummy variables equal zero, the categorical IV is unrelated to the DV, controlling for the covariates.

We will see later that effect coding is very useful when there is more than one categorical independent variable and we are interested in interactions --- ways in which the relationship of an independent variable with the dependent variable depends on the value of another independent variable.

What dummy variable coding scheme should you use?

•  Whichever is most convenient
•  They are all equivalent, if done correctly
•  Same test statistics, same conclusions

Interactions

•  Interaction between independent variables means “It depends.”
•  Relationship between one IV and the DV depends on the value of another IV.
•  Can have
   – Quantitative by quantitative
   – Quantitative by categorical
   – Categorical by categorical

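Quantitative by Quantitative

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \epsilon$$

$$E(Y|x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$$

For fixed x2

$$E(Y|x) = (\beta_0 + \beta_2 x_2) + (\beta_1 + \beta_3 x_2) x_1$$

Both slope and intercept depend on value of x2

And for fixed x1, slope and intercept relating x2 to E(Y) depend on the value of x1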
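A short Python illustration with invented coefficient values, showing how the line relating x1 to E(Y) changes as x2 is fixed at different values:

```python
# Hypothetical coefficients (invented for illustration).
b0, b1, b2, b3 = 1.0, 2.0, 0.5, -0.25

def expected_y(x1, x2):
    """E(Y|x) for the model with a product term."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

def line_in_x1(x2):
    """Intercept and slope of E(Y|x) as a function of x1, for fixed x2."""
    return b0 + b2 * x2, b1 + b3 * x2

for x2 in (0.0, 4.0, 8.0):
    intercept, slope = line_in_x1(x2)
    print(x2, intercept, slope)
```

The same rearrangement with the roles of x1 and x2 swapped gives intercept b0 + b1*x1 and slope b2 + b3*x1.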

Quantitative by Categorical

•  Interaction means slopes are not parallel
•  Form a product of quantitative variable by each dummy variable for the categorical variable
•  For example, three treatments and one covariate: x1 is the covariate and x2, x3 are dummy variables

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3 + \epsilon$$

General principle

•  Interaction between A and B means
   – Relationship of A to Y depends on value of B
   – Relationship of B to Y depends on value of A
•  The two statements are formally equivalent

$$E(Y|x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3$$

Group   x2   x3   E(Y|x)
1       1    0    (β0 + β2) + (β1 + β4) x1
2       0    1    (β0 + β3) + (β1 + β5) x1
3       0    0    β0 + β1 x1
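The rows of the table can be generated mechanically. A Python sketch with invented coefficient values, indexed to match β0 through β5:

```python
# Hypothetical coefficients beta[0] .. beta[5] (invented for illustration).
beta = {0: 10.0, 1: 0.5, 2: 3.0, 3: 1.0, 4: -0.2, 5: 0.1}

# Dummy variable values (x2, x3) for each group; group 3 is the reference category.
indicators = {1: (1, 0), 2: (0, 1), 3: (0, 0)}

def group_line(group):
    """Intercept and slope of E(Y|x) as a function of x1 within one group."""
    x2, x3 = indicators[group]
    intercept = beta[0] + beta[2] * x2 + beta[3] * x3
    slope = beta[1] + beta[4] * x2 + beta[5] * x3
    return intercept, slope

for group in (1, 2, 3):
    print(group, group_line(group))
```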

What null hypothesis would you test for

•  Parallel slopes
•  Compare slopes for group one vs three
•  Compare slopes for group one vs two
•  Equal regressions
•  Interaction between group and x1
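One possible set of answers, read off from the group-by-group regression lines for the model with x2, x3 as indicators for groups one and two (group three as reference) and product terms x1x2 and x1x3:

```latex
\begin{aligned}
\text{Parallel slopes:} \quad & H_0: \beta_4 = \beta_5 = 0 \\
\text{Slopes, group one vs three:} \quad & H_0: \beta_4 = 0 \\
\text{Slopes, group one vs two:} \quad & H_0: \beta_4 = \beta_5 \\
\text{Equal regressions:} \quad & H_0: \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \\
\text{Interaction between group and } x_1\text{:} \quad & H_0: \beta_4 = \beta_5 = 0
\end{aligned}
```

Parallel slopes and no interaction are the same hypothesis, which is the point of the "slopes are not parallel" description of interaction.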

What to do if H0: β4 = β5 = 0 is rejected

•  How do you test Group “controlling” for x1?
•  A good choice is to set x1 to its sample mean, and compare treatments at that point.
•  How about setting x1 to sample mean of the group (3 different values)?
•  With random assignment to Group, all three means just estimate E(X1), and the mean of all the x1 values is a better estimate.
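As a worked sketch, with x̄1 denoting the sample mean of all the x1 values (notation introduced here) and the same model with group three as reference, the treatment differences at that point are:

```latex
\begin{aligned}
E(Y \mid \text{group 1}, \bar{x}_1) - E(Y \mid \text{group 3}, \bar{x}_1) &= \beta_2 + \beta_4 \bar{x}_1 \\
E(Y \mid \text{group 2}, \bar{x}_1) - E(Y \mid \text{group 3}, \bar{x}_1) &= \beta_3 + \beta_5 \bar{x}_1 \\
E(Y \mid \text{group 1}, \bar{x}_1) - E(Y \mid \text{group 2}, \bar{x}_1) &= (\beta_2 - \beta_3) + (\beta_4 - \beta_5)\, \bar{x}_1
\end{aligned}
```

So no group differences at x̄1 corresponds to H0: β2 + β4 x̄1 = β3 + β5 x̄1 = 0, treating x̄1 as a fixed constant.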

Categorical by Categorical

•  Soon
•  But first, an example
