The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.

Defining the Goldilocks problem

Jane E. Miller, PhD

Overview

• Defining the Goldilocks problem
• Understanding why type of variable matters
• Understanding why range of values matters
• Outlining the steps to avert Goldilocks problems
  – Later podcasts fill in the details

What is “the Goldilocks problem” in multivariate regression?

• As Goldilocks discovered, she and each of the Three Bears preferred different-sized chairs.
  – One chair was too big,
  – One chair was too small,
  – One chair was just right!

• Likewise, different variables in a multivariate regression often require different-sized contrasts to illustrate the meaning of their coefficients.

Review: Interpretation of regression coefficients

• Ordinary least squares (OLS) coefficients (βs) estimate the change in the dependent variable (Y) for a 1-unit increase in an independent variable (Xi), with the result in the units of the dependent variable.

• Logit coefficients estimate the effect of a 1-unit increase in Xi on the log-odds of the outcome under study.
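
Because a logit coefficient is expressed in log-odds, it is often easier to read once exponentiated into an odds ratio. The sketch below shows that conversion for a 1-unit contrast and for a larger one. The coefficient value and the function name `logit_odds_ratio` are hypothetical, chosen only for illustration.

```python
import math

def logit_odds_ratio(beta, contrast=1.0):
    """Odds ratio implied by a logit coefficient for a given contrast.

    beta is the change in log-odds per 1-unit increase in X, so a
    `contrast`-unit increase changes the log-odds by beta * contrast,
    and exponentiating gives the multiplicative change in the odds.
    """
    return math.exp(beta * contrast)

# Hypothetical logit coefficient, for illustration only.
beta_logit = 0.05

print(round(logit_odds_ratio(beta_logit), 3))               # 1-unit contrast
print(round(logit_odds_ratio(beta_logit, contrast=10), 3))  # 10-unit contrast
```

Note that the odds ratio for a 10-unit contrast is the 1-unit odds ratio raised to the 10th power, not 10 times as large.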

Common pitfalls in interpreting regression coefficients

• Assessing which independent variables are the “most important” by directly comparing the sizes of the estimated coefficients (βs).

• Direct comparison of βs implies that a 1-unit increase in each independent variable is the pertinent contrast for that variable.
  – Problematic because many multivariate models include different:
    • Types of variables (levels of measurement).
    • Ranges and scales of continuous variables.

Why does type of variable matter?

• Continuous independent variables
• Categorical independent variables
  – Nominal
  – Ordinal

Considerations for contrast size: Continuous variables

• Different continuous variables have different levels and ranges of values:
  – Age in a sample of students might vary from 5 to 17 years
    • A 12-unit range among values in the single to double digits
  – Their annual family incomes could vary from $0 to millions of dollars
    • A million+ unit range with a median value likely in the five digits

Problem: Directly comparing βs for continuous variables with different scales

• Although a 1-year increase in age might be a relevant contrast, a $1 increase in annual family income in the US today would be trivial.

• Directly comparing the βs on age and income implicitly assumes that a 1-unit increase fits the scale of both variables.
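
One way to make this point explicit is to write the estimated effect of a chosen contrast as the coefficient times the size of that contrast. The notation below is a sketch: the symbol for the contrast size and the $10,000 income contrast are illustrative assumptions, not values from the book.

```latex
\Delta \hat{Y} = \beta_i \times \Delta X_i
\qquad\text{e.g.}\qquad
\Delta \hat{Y}_{\text{age}} = \beta_{\text{age}} \times 1 \text{ year},
\quad
\Delta \hat{Y}_{\text{income}} = \beta_{\text{income}} \times \$10{,}000
```

Comparing the two products, rather than the two raw βs, puts each variable on a contrast that fits its own scale.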

Considerations for contrast size: Categorical variables

• The numeric codes used as shorthand for categorical variables have no mathematical meaning.
  – E.g., dummy variable “boy” coded
      1 = boy
      0 = girl
  – No such thing as a 1-unit increase in “genderness.”
• Such binary variables only span a 1-unit range, so multiunit changes are not applicable.

Problem: Interpreting directionality of nominal variable codes

• The values of nominal variables such as gender or race have no natural order.
  – Any rank ordering of categories of those variables is arbitrary.
    • An artifact of how the analyst chose to code the categories.
  – Could equally well code gender as 1 = male, 2 = female
  – Thus the directionality implied by a 1-unit increase is misleading.
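
The arbitrariness of the coding can be seen in a small sketch. With a single 0/1 predictor, the OLS slope equals the difference between the two group means, so switching which category is coded 1 flips the sign of β without changing its size. The birth weights below are invented for illustration, and `dummy_coefficient` is a hypothetical helper, not something from the book.

```python
from statistics import mean

# Hypothetical birth weights in grams, invented for illustration.
weights = [3400, 3500, 3600, 3300, 3250, 3200]
sex = ["boy", "boy", "boy", "girl", "girl", "girl"]

def dummy_coefficient(outcome, labels, coded_as_one):
    """OLS slope on a single 0/1 dummy: mean outcome of the category
    coded 1 minus mean outcome of the category coded 0."""
    ones = [y for y, g in zip(outcome, labels) if g == coded_as_one]
    zeros = [y for y, g in zip(outcome, labels) if g != coded_as_one]
    return mean(ones) - mean(zeros)

beta_boy = dummy_coefficient(weights, sex, coded_as_one="boy")
beta_girl = dummy_coefficient(weights, sex, coded_as_one="girl")

# Same magnitude, opposite sign: the direction depends only on the coding.
print(beta_boy, beta_girl)
```

This is why a coefficient on a nominal variable should be described as a difference between named categories rather than as the effect of a "1-unit increase."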

Codes for ordinal variables

• Codes for categories of ordinal variables are rankable and might appear to have numeric meaning.
  – E.g., categories for self-rated health might be coded:
      1: excellent
      2: very good
      3: good
      4: fair
      5: poor

Problem: Interpreting ordinal values as if they were continuous

• Unlike integer values of a continuous variable like age in years, the numeric distance between categories of an ordinal variable cannot be assumed to be uniform.

• E.g., respondents might perceive a bigger difference between “good” and “fair” health than between “very good” and “excellent” health.

Problem: Interpreting ordinal values as if they were continuous

• Unlike integer values of a continuous variable, the numeric distance between categories of an ordinal variable cannot be assumed to be uniform, even when categories have numeric units attached. E.g., income groups often
  • Are of varying widths
    – E.g., <$20K, $20K–$39K, $40K–$79K, $80K–$160K
  • Include an open-ended top category (e.g., >$160K).
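
To see how uneven those categories are, the sketch below computes the width of each income band listed above. Two assumptions are made here that the slide does not state: the lowest band starts at $0, and adjacent bands meet at the next round number (so "$20K–$39K" is treated as 20 up to 40).

```python
# Income bands from the slide, in thousands of dollars.
# Assumed: lowest band starts at 0; bands meet at round numbers.
bands = [
    ("<$20K", 0, 20),
    ("$20K-$39K", 20, 40),
    ("$40K-$79K", 40, 80),
    ("$80K-$160K", 80, 160),
    (">$160K", 160, None),  # open-ended top category: no upper bound
]

def band_width(lower, upper):
    """Width of a band, or None when the band is open-ended."""
    return None if upper is None else upper - lower

width_list = [band_width(lo, hi) for _, lo, hi in bands]

# A "1-unit increase" in the ordinal code spans a different dollar
# amount at each step, and is undefined for the top category.
for code, ((label, _, _), width) in enumerate(zip(bands, width_list), start=1):
    print(code, label, width)
```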

Problem: Comparing βs on categorical and continuous variables

• Given the different interpretations of βs on continuous and categorical variables, if a model includes both types of independent variables, one cannot compare their βs without considering the pertinent-sized contrast for each variable.
  – For mother’s age (a continuous variable), the contrast can vary by more than 1 unit (year) across cases.
  – For gender, the contrast is one category versus the other, and no more than a “1-unit” increase is possible.
  – Even if βboy > βmother’s age (117.2 and 10.7, respectively), one cannot conclude that gender is a “more important” determinant of birth weight than mother’s age.
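
The birth weight example can be worked through numerically. The two coefficients below are the ones quoted on the slide; the 15-year contrast for mother's age is an assumption made here for illustration, since the slide does not specify one.

```python
# Coefficients quoted on the slide (grams of birth weight).
beta_boy = 117.2         # boy versus girl
beta_mother_age = 10.7   # per 1-year increase in mother's age

# Hypothetical contrast for mother's age (assumed, not from the slide).
age_contrast_years = 15

effect_boy = beta_boy * 1                           # only a 1-unit contrast is possible
effect_age = beta_mother_age * age_contrast_years   # multi-year contrast

# Age difference at which the two estimated effects are equal.
break_even_years = beta_boy / beta_mother_age

print(round(effect_boy, 1), round(effect_age, 1), round(break_even_years, 1))
```

Under these numbers, any contrast in mother's age of roughly 11 years or more corresponds to a larger estimated difference in birth weight than the boy/girl contrast, even though βboy is about 11 times the size of βmother’s age.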

Why does range of values matter?

• When is a 1-unit change
  – Too big?
  – Too small?

When is a 1-unit increase too big?

• For independent variables whose values in your data:
  – fall mostly between 0 and 1,
  – are clustered within a few units of one another,
  – or are by definition restricted to between 0.0 and 1.0, e.g.,
    • Variables measured in proportions
    • Gini coefficients

• In such situations, apply a <1.0 unit contrast to assess the effect of a change in Xi on Y.

Proportions versus percentages

• Researchers are often sloppy about variables measured in proportions, instead labeling them as percentages (or vice versa)
  – The percentage equivalent of a proportion is by definition 100 times as large.
• Must convey the correct scale of the variable used in the model so β can be interpreted correctly.
  – For variables measured as a proportion, a 1-unit increase is too large,
  – For those measured in percentages, a 1-unit increase often is too small.
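
The effect of mislabeling the scale can be sketched as follows. Because a percentage is 100 times its proportion, the coefficient per percentage point is 1/100 of the coefficient per 1.0 increase in the proportion. The coefficient value and the 5-point contrast are hypothetical, and `to_percentage_scale` is an illustrative helper name.

```python
def to_percentage_scale(beta_proportion):
    """Coefficient per 1 percentage point, given the coefficient per
    1.0 increase in the same variable measured as a proportion."""
    return beta_proportion / 100

# Hypothetical coefficient on a variable measured as a proportion (0.0 to 1.0).
beta_prop = 40.0
beta_pct = to_percentage_scale(beta_prop)

# The same substantive contrast expressed on each scale:
effect_on_proportion_scale = beta_prop * 0.05   # +0.05 in the proportion
effect_on_percentage_scale = beta_pct * 5       # +5 percentage points

print(beta_pct)
print(round(effect_on_proportion_scale, 6), round(effect_on_percentage_scale, 6))
```

A reader who is told the variable is a percentage but is handed the proportion-scale β would overstate the effect of a 1-point change by a factor of 100.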

When is a 1-unit increase too small?

• For independent variables with
  – A high level or wide range of values
  – Imprecise measurement of values
• E.g., for blood pressure, a 1 millimeter of mercury (mm Hg) difference is too small to be
  – clinically meaningful
  – observed with precision

• In such situations, apply a >1.0 unit contrast to assess the effect of a change in Xi on Y.

Goldilocks issues for the dependent variable: range of values

• Evaluate what a 1-unit increase means given the range and scale of the dependent variable.

• β = 1.0 on a dummy variable is
  – A trivially small effect in a model predicting birth weight (which ranges from about 400 to 5,900 grams).
  – A substantial effect in a model predicting grade point average on the usual 4-point scale.
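
One rough way to compare these two cases is to express β as a share of the dependent variable's range. The sketch below does that for the two examples above. The birth weight range is the one given on the slide; treating the 4-point GPA scale as running from 0.0 to 4.0 is an assumption, and `share_of_range` is a hypothetical helper.

```python
def share_of_range(beta, low, high):
    """Coefficient expressed as a fraction of the outcome's range."""
    return beta / (high - low)

beta = 1.0

birth_weight_share = share_of_range(beta, 400, 5900)  # grams, range from the slide
gpa_share = share_of_range(beta, 0.0, 4.0)            # assumed 0.0 to 4.0 scale

print(f"{birth_weight_share:.4%} of the birth weight range")
print(f"{gpa_share:.0%} of the GPA range")
```

The same β of 1.0 covers a tiny fraction of the birth weight range but a quarter of the GPA range.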

Goldilocks issues for the dependent variable: model specification

• Ordinal dependent variables such as birth weight categories (e.g., very low, low, normal, and high birth weight) should not be modeled using OLS models.
  – OLS models imply that the numeric codes for those categories are values of a continuous dependent variable.
  – Instead, use techniques such as ordered logit or other methods for ordered categorical dependent variables (Powers and Xie 2000).

Prose interpretation and comparison of βs is critical

• If βs are only reported in a table or prose, you leave it to readers to:
  – Notice the different types and scales of the variables,
  – Figure out pertinent-sized contrasts for each variable in the model.

• Readers will then be more likely to make Goldilocks errors when they assess the meaning of the βs on different variables in your model.

• Your job as the author is to write about the results in ways that avert Goldilocks errors of interpretation.

Steps for resolving Goldilocks problems

1. Getting acquainted with the units and distribution of your independent and dependent variables.
2. Applying theoretical and empirical criteria to choose a suitably-sized contrast for each independent variable.

3. Using precise, complete labeling of units and categories in prose, tables, and charts.

4. Interpreting the results in prose to clearly communicate the substantive meaning of the βs based on suitably-sized contrasts for each variable.
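
Steps 1 and 2 can be sketched in code. The example below summarizes the distribution of two variables and reports the interquartile range and standard deviation as possible empirical starting points for a contrast size. Those two statistics are assumptions chosen here for illustration; the slide does not name specific criteria, and the data and the `describe` helper are hypothetical.

```python
import statistics

def describe(values):
    """Basic distributional summary for one variable."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "max": max(values),
        "median": median,
        "iqr": q3 - q1,                      # candidate empirical contrast
        "stdev": statistics.stdev(values),   # another candidate
    }

# Hypothetical sample data, invented for illustration.
ages = [5, 7, 9, 11, 13, 15, 17]                                   # years
incomes = [0, 18_000, 35_000, 52_000, 70_000, 120_000, 1_500_000]  # dollars

for name, values in [("age", ages), ("income", incomes)]:
    summary = describe(values)
    print(name, {k: round(v, 1) for k, v in summary.items()})
```

An empirical summary like this is only a starting point; the theoretical criteria in step 2 (what size of difference is meaningful in the real-world context) still have to be applied by the analyst.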

Summary

• A “one-size-fits-all” approach to interpreting regression coefficients is often misleading because variables
  – Have different types (levels of measurement),
  – Have different units of measurement,
  – Have varying distributions of values,
  – Occur in different real-world circumstances.

• These issues require careful thought about how to present βs to convey the substantive meaning of the β for each variable.

Suggested resources

• Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Chapter 10, on the Goldilocks problem
  – Chapter 4, on types of variables, units, and distribution

• Miller, J. E. and Y. V. Rodgers, 2008. “Economic Importance and Statistical Significance: Guidelines for Communicating Empirical Research.” Feminist Economics 14 (2): 117–49.

Suggested online resources

• Podcasts on
  – Interpreting multivariate regression coefficients
  – Resolving the Goldilocks problem
    • Measurement and variables
    • Model specification
    • Presenting results

Suggested practice exercises

• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Questions #1, 2, and 7 in the problem set for chapter 10.
  – Suggested course extensions for chapter 10:
    • “Reviewing” exercises #1 through 5.
    • “Applying statistics and writing” question #1.
    • “Revising” questions #1, 2, 3, 5, and 9.
  – “Getting to know your variables” assignment

Contact information

Jane E. Miller, [email protected]

Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html