NOMINAL RESPONSES:
BASELINE-CATEGORY LOGIT MODELS (Agresti 7.1)
Kathy Fung and Lin Zhang
Statistics 6841 Project
Winter 2005
23/4/19 2
Objective
Introduction of NOMINAL RESPONSES (BASELINE-CATEGORY LOGIT MODELS)
The Concept and Example
Model Definition
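In the standard notation of Agresti Section 7.1 (an assumption here, since the slide's own formula is not reproduced in this text), let π_j(x) = P(Y = j | x) for a response with J categories and take the last category J as the baseline. The baseline-category logit model can then be sketched as:

```latex
\log \frac{\pi_j(\mathbf{x})}{\pi_J(\mathbf{x})}
  = \alpha_j + \beta_j^{\prime} \mathbf{x},
  \qquad j = 1, \ldots, J-1 .
```

There are J − 1 logit equations, each with its own intercept and slope vector. The choice of baseline is arbitrary: the SAS runs later in the deck use food=1 as the baseline in PROC LOGISTIC.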
Some Notes:
• With categorical predictors, X² and G² goodness-of-fit statistics provide a model check when data are not sparse.
• When an explanatory variable is continuous or the data are sparse, such statistics are still valid for comparing nested models differing by relatively few terms.
Alligator Food Choice Example
SAS code of Table 7.1

* SAS for Baseline-Category Logit Models with Alligator Data in Table 7.1;
data gator;
   infile 'K:\CSU Hayward\Stat 6841\project\gator.txt';
   input lake gender size food count;
run;

* Baseline-category (generalized) logit model with food=1 as the reference;
proc logistic;
   freq count;
   class lake size / param=ref;
   model food(ref='1') = lake size / link=glogit aggregate scale=none;
run;

* Same main-effects model fit with CATMOD;
proc catmod;
   weight count;
   population lake size gender;
   model food = lake size / pred=freq pred=prob;
run;
Output
The LOGISTIC Procedure

Model Information
Data Set                       WORK.GATOR
Response Variable              food
Number of Response Levels      5
Frequency Variable             count
Model                          generalized logit
Optimization Technique         Fisher's scoring

Number of Observations Read     80
Number of Observations Used     56
Sum of Frequencies Read        219
Sum of Frequencies Used        219

Response Profile
Ordered Value   food   Total Frequency
1               1      94
2               2      61
3               3      19
4               4      13
5               5      32

Logits modeled use food=1 as the reference category.
NOTE: 24 observations having nonpositive frequencies or weights were excluded since they do not contribute to the analysis.
Output
Class Level Information
Class   Value   Design Variables
lake    1       1  0  0
        2       0  1  0
        3       0  0  1
        4       0  0  0
size    1       1
        2       0

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Output
Deviance and Pearson Goodness-of-Fit Statistics
Criterion   Value     DF   Value/DF   Pr > ChiSq
Deviance    17.0798   12   1.4233     0.1466
Pearson     15.0429   12   1.2536     0.2391
Number of unique profiles: 8

Model Fit Statistics
Criterion   Intercept Only   Intercept and Covariates
AIC         612.363          580.080
SC          625.919          647.862
-2 Log L    604.363          540.080

Testing Global Null Hypothesis: BETA=0
Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   64.2826      16   <.0001
Score              57.2475      16   <.0001
Wald               49.7584      16   <.0001
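The fit statistics in this output are tied together by simple arithmetic. The following is a minimal Python sketch, assuming the usual definitions AIC = -2 Log L + 2p and SC = -2 Log L + p·log(n), with n taken as the sum of frequencies (219) and p counted as 4 intercepts plus, for the full model, 3 lake terms and 1 size term in each of the 4 logits. The function and variable names are illustrative, not SAS names.

```python
import math

# Values copied from the Model Fit Statistics table in the LOGISTIC output.
neg2logl_null = 604.363   # -2 Log L, intercepts only
neg2logl_full = 540.080   # -2 Log L, intercepts plus lake and size
n = 219                   # sum of frequencies used (assumed n for SC)

# Parameter counts: 4 logits, each with 1 intercept (+ 3 lake + 1 size terms).
p_null = 4
p_full = 4 * (1 + 3 + 1)


def aic(neg2logl, p):
    """Akaike information criterion."""
    return neg2logl + 2 * p


def sc(neg2logl, p, n_obs):
    """Schwarz criterion (BIC)."""
    return neg2logl + p * math.log(n_obs)


# Likelihood-ratio test of BETA=0: difference in -2 Log L and in parameters.
lr_stat = neg2logl_null - neg2logl_full
lr_df = p_full - p_null

# Residual df for the deviance: 8 profiles x 4 logits minus fitted parameters.
gof_df = 8 * 4 - p_full

print(round(aic(neg2logl_null, p_null), 3), round(aic(neg2logl_full, p_full), 3))
print(round(sc(neg2logl_null, p_null, n), 3), round(sc(neg2logl_full, p_full, n), 3))
print(round(lr_stat, 3), lr_df, gof_df)
```

Under these assumptions the computed values agree with the table to rounding, and the degrees of freedom come out as 16 for the global test and 12 for the goodness-of-fit statistics.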
Output
Type 3 Analysis of Effects
Effect   DF   Wald Chi-Square   Pr > ChiSq
lake     12   35.4890           0.0004
size      4   18.7593           0.0009

Analysis of Maximum Likelihood Estimates
Parameter   food   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   2      1    -1.5490    0.4249           13.2890           0.0003
Intercept   3      1    -3.3139    1.0528            9.9081           0.0016
Intercept   4      1    -2.0931    0.6622            9.9894           0.0016
Intercept   5      1    -1.9043    0.5258           13.1150           0.0003
lake 1      2      1    -1.6583    0.6129            7.3216           0.0068
lake 1      3      1     1.2422    1.1852            1.0985           0.2946
lake 1      4      1     0.6951    0.7813            0.7916           0.3736
Output
Analysis of Maximum Likelihood Estimates (continued)
Parameter   food   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
lake 1      5      1     0.8262    0.5575            2.1959           0.1384
lake 2      2      1     0.9372    0.4719            3.9443           0.0470
lake 2      3      1     2.4583    1.1179            4.8360           0.0279
lake 2      4      1    -0.6532    1.2021            0.2953           0.5869
lake 2      5      1     0.00565   0.7766            0.0001           0.9942
lake 3      2      1     1.1220    0.4905            5.2321           0.0222
lake 3      3      1     2.9347    1.1161            6.9131           0.0086
lake 3      4      1     1.0878    0.8417            1.6703           0.1962
lake 3      5      1     1.5164    0.6214            5.9541           0.0147
size 1      2      1     1.4582    0.3959           13.5634           0.0002
size 1      3      1    -0.3513    0.5800            0.3668           0.5448
size 1      4      1    -0.6307    0.6425            0.9635           0.3263
size 1      5      1     0.3316    0.4483            0.5471           0.4595
Output
Odds Ratio Estimates
Effect        food   Point Estimate   95% Wald Confidence Limits
lake 1 vs 4   2       0.190           0.057     0.633
lake 1 vs 4   3       3.463           0.339    35.343
lake 1 vs 4   4       2.004           0.433     9.266
lake 1 vs 4   5       2.285           0.766     6.814
lake 2 vs 4   2       2.553           1.012     6.437
lake 2 vs 4   3      11.685           1.306   104.508
lake 2 vs 4   4       0.520           0.049     5.490
lake 2 vs 4   5       1.006           0.219     4.608
lake 3 vs 4   2       3.071           1.174     8.032
lake 3 vs 4   3      18.815           2.111   167.717
lake 3 vs 4   4       2.968           0.570    15.447
lake 3 vs 4   5       4.556           1.348    15.400
size 1 vs 2   2       4.298           1.978     9.339
size 1 vs 2   3       0.704           0.226     2.194
size 1 vs 2   4       0.532           0.151     1.875
size 1 vs 2   5       1.393           0.579     3.354
Output
The CATMOD Procedure

Data Summary
Response            food     Response Levels      5
Weight Variable     count    Populations         16
Data Set            GATOR    Total Frequency    219
Frequency Missing   0        Observations        56

Population Profiles
Sample   lake   size   gender   Sample Size
 1       1      1      1        13
 2       1      1      2        26
 3       1      2      1         7
 4       1      2      2         9
 5       2      1      1         5
 6       2      1      2        15
 7       2      2      1        26
 8       2      2      2         2
 9       3      1      1        12
10       3      1      2        12
11       3      2      1        28
12       3      2      2         1
13       4      1      1        27
14       4      1      2        14
15       4      2      1        12
16       4      2      2        10
Output
Response Profiles
Response   food
1          1
2          2
3          3
4          4
5          5

Maximum Likelihood Analysis
Maximum likelihood computations converged.

Maximum Likelihood Analysis of Variance
Source             DF   Chi-Square   Pr > ChiSq
Intercept           4   70.39        <.0001
lake               12   35.49        0.0004
size                4   18.76        0.0009
Likelihood Ratio   44   52.48        0.1784
Output
Analysis of Maximum Likelihood Estimates
Parameter   Function Number   Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept   1                  1.1514    0.2343           24.14        <.0001
Intercept   2                  0.4317    0.2737            2.49        0.1147
Intercept   3                 -0.6795    0.3818            3.17        0.0751
Intercept   4                 -0.9745    0.4049            5.79        0.0161
lake 1      1                 -0.2391    0.3458            0.48        0.4892
lake 1      2                 -1.9977    0.4946           16.31        <.0001
lake 1      3                 -0.6556    0.6071            1.17        0.2802
lake 1      4                  0.1736    0.5654            0.09        0.7589
lake 2      1                  0.5814    0.5061            1.32        0.2506
lake 2      2                  1.4184    0.5250            7.30        0.0069
lake 2      3                  1.3810    0.6279            4.84        0.0278
lake 2      4                 -0.3542    0.9153            0.15        0.6988
lake 3      1                 -0.9293    0.3836            5.87        0.0154
lake 3      2                  0.0925    0.3910            0.06        0.8131
lake 3      3                  0.3467    0.5130            0.46        0.4991
lake 3      4                 -0.1240    0.5830            0.05        0.8316
size 1      1                 -0.1658    0.2241            0.55        0.4595
size 1      2                  0.5633    0.2525            4.98        0.0257
size 1      3                 -0.3414    0.3257            1.10        0.2945
size 1      4                 -0.4811    0.3564            1.82        0.1770
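The CATMOD estimates look different from the LOGISTIC ones, yet both procedures report the same Wald chi-squares for lake (35.49) and size (18.76). The Python sketch below reconciles the size coefficients, assuming CATMOD's defaults: the last response level (food=5) is the baseline, and class effects use deviation-from-mean (+1/-1) coding. Under those assumptions, twice a CATMOD size estimate should equal the corresponding difference of LOGISTIC logits. The dictionary and function names are illustrative.

```python
# LOGISTIC size-1 estimates: log odds of food j versus food 1 (reference coding).
# The entry for food 1 is 0 because it is the baseline itself.
logistic_size = {1: 0.0, 2: 1.4582, 3: -0.3513, 4: -0.6307, 5: 0.3316}

# CATMOD size-1 estimates: function j is the log odds of food j versus food 5.
catmod_size = {1: -0.1658, 2: 0.5633, 3: -0.3414, 4: -0.4811}


def catmod_size_difference(j):
    """Size 1 minus size 2 effect on log(pi_j / pi_5) under +1/-1 coding."""
    return 2 * catmod_size[j]


def logistic_size_difference(j):
    """Same contrast rebuilt from logits that use food 1 as the baseline."""
    return logistic_size[j] - logistic_size[5]


for j in sorted(catmod_size):
    print(j, round(catmod_size_difference(j), 4), round(logistic_size_difference(j), 4))
```

The two columns agree to rounding error in the reported estimates, which suggests the two procedures fit the same model under different parameterizations.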
Table 7.2
Some Test Results for Table 7.2
• The data are sparse, with 219 observations scattered among 80 cells. Thus, G² is more reliable for comparing models than for testing fit.
• The statistics G²[( ) | (G)] = 2.1 and G²[(L + S) | (G + L + S)] = 2.2, each based on df = 4, suggest simplifying by collapsing the table over gender. (Other analyses, not presented here, show that adding interaction terms including G does not improve the fit significantly.)
• The G² and X² values for the collapsed table indicate that both L and S have effects.
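To see why G² differences of 2.1 and 2.2 on 4 degrees of freedom argue for dropping gender, the upper-tail chi-square probabilities can be computed directly. This is a minimal Python sketch using the closed-form tail probability that holds for even degrees of freedom; the function name is illustrative, and a statistics library would normally be used instead.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the finite-sum closed form, which is valid only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# G^2 differences quoted on the slide, each on 4 degrees of freedom.
p_gender_alone = chi2_sf_even_df(2.1, 4)
p_gender_given_lake_size = chi2_sf_even_df(2.2, 4)

print(round(p_gender_alone, 3), round(p_gender_given_lake_size, 3))
```

Both p-values are about 0.7, so neither comparison gives evidence that gender improves the fit.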
Table 7.3
Table 7.4
Prediction Equation for Log Odds of Selecting Invertebrates Instead of Fish
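Rounding the food=2 rows of the LOGISTIC estimates to two decimals gives the fitted equation below. This assumes that food codes 1 and 2 denote fish (F) and invertebrates (I), and that lake codes 1, 2, and 3 denote Hancock, Oklawaha, and Trafford with George as the reference lake. That coding is consistent with the interpretation on this slide but is not stated in the output itself.

```latex
\log \frac{\hat{\pi}_I}{\hat{\pi}_F}
  = -1.55 + 1.46\, s - 1.66\, z_H + 0.94\, z_O + 1.12\, z_T
```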
• where s = 1 for size ≤ 2.3 meters and 0 otherwise,
• zH is a dummy variable for Lake Hancock (zH = 1 for alligators in that lake and 0 otherwise), and
• zO and zT are dummy variables for Lakes Oklawaha and Trafford.
• Size of alligators has a noticeable effect. For a given lake, for small alligators the estimated odds that primary food choice was invertebrates instead of fish are exp(1.46) = 4.3 times the estimated odds for large alligators;
• the Wald 95% confidence interval is exp[1.46 ± 1.96(0.396)] = (2.0, 9.3).
• The lake effects indicate that the estimated odds that the primary food choice was invertebrates instead of fish are relatively higher at Lakes Trafford and Oklawaha and relatively lower at Lake Hancock than they are at Lake George.
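The odds ratio and Wald interval quoted above can be reproduced from the unrounded LOGISTIC estimate for size (1.4582, standard error 0.3959). This is a minimal Python sketch; the function name is illustrative and z = 1.96 is the usual normal quantile for a 95% interval.

```python
import math


def odds_ratio_with_wald_ci(estimate, std_error, z=1.96):
    """Exponentiate a log odds ratio and the endpoints of its Wald interval."""
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return math.exp(estimate), lower, upper


# Size effect on log odds of invertebrates versus fish (LOGISTIC output).
or_hat, ci_lower, ci_upper = odds_ratio_with_wald_ci(1.4582, 0.3959)

print(round(or_hat, 2), round(ci_lower, 2), round(ci_upper, 2))
```

The result is about 4.30 with limits of about 1.98 and 9.34, matching both the slide's (2.0, 9.3) and the "size 1 vs 2" row for food 2 in the Odds Ratio Estimates table.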
Further Estimate Calculation
Estimating Response Probabilities (Model)
The equation that expresses multinomial logit models directly in terms of response probabilities is
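In the standard notation of Agresti Section 7.1 (assumed here, with category J as the baseline and the identifiability constraints α_J = 0 and β_J = 0), the response probabilities can be written as:

```latex
\pi_j(\mathbf{x})
  = \frac{\exp(\alpha_j + \beta_j^{\prime} \mathbf{x})}
         {1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \beta_h^{\prime} \mathbf{x})},
  \qquad j = 1, \ldots, J-1,
\qquad
\pi_J(\mathbf{x})
  = \frac{1}{1 + \sum_{h=1}^{J-1} \exp(\alpha_h + \beta_h^{\prime} \mathbf{x})} .
```

The probabilities sum to 1 by construction, and taking the log of π_j(x)/π_J(x) recovers the linear predictor α_j + β_j′x.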
Estimating Response Probabilities (Results)
• From Table 7.4 the estimated probability that a large alligator in Lake Hancock has invertebrates as the primary food choice is 0.023.
• The estimated probabilities for reptile, bird, other, and fish are 0.072, 0.141, 0.194, and 0.570.
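These probabilities can be recomputed from the LOGISTIC estimates. The Python sketch below assumes that food codes 1 to 5 stand for fish, invertebrates, reptile, bird, and other, that lake 1 is Hancock, and that size 1 is the small class. That mapping is inferred from the slides rather than stated in the output, and the names in the code are illustrative.

```python
import math

# LOGISTIC estimates with food=1 (assumed: fish) as the baseline category.
# Keys are the non-baseline food codes 2..5.
intercept = {2: -1.5490, 3: -3.3139, 4: -2.0931, 5: -1.9043}
lake1 = {2: -1.6583, 3: 1.2422, 4: 0.6951, 5: 0.8262}    # assumed: Hancock
size1 = {2: 1.4582, 3: -0.3513, 4: -0.6307, 5: 0.3316}   # assumed: small


def response_probabilities(in_lake1, is_size1):
    """Return {food code: estimated probability} for one lake/size setting."""
    eta = {
        j: intercept[j] + lake1[j] * in_lake1 + size1[j] * is_size1
        for j in intercept
    }
    denom = 1.0 + sum(math.exp(v) for v in eta.values())
    probs = {1: 1.0 / denom}                       # baseline category
    probs.update({j: math.exp(v) / denom for j, v in eta.items()})
    return probs


# Large alligator (is_size1=0) in Lake Hancock (in_lake1=1).
probs = response_probabilities(in_lake1=1, is_size1=0)
for food in sorted(probs):
    print(food, round(probs[food], 3))
```

Under the assumed coding, the five values agree with the probabilities quoted on the slide to three decimals.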
Quality vs. Quantity
Summary and Conclusion