workshop 3 on focusing on the case qualitative modelling

Post on 28-Mar-2015


Workshop 3 on ‘Focusing on the Case’

Qualitative Modelling

Topics Covered Herein

Qualitative Regression Models
  The Odds Ratio – an online ppt to learn about it
  The log-odds as a Dependent Variable

Alternatives:
  Dummy variables as independent variables
  Log-Linear analysis without a dep/indep causal structure being superimposed

Discussion of Causality in Multi-Level Situations
  Nested multi-level situations
  Non-Nested multi-level stratified reality

Plurality of Causes – Is it operationalisable?

Suggested Readings

Qualitative Regression Models

Consider the participation of men and women in the labour force.

In logistic regression, in particular, the measurement mode is not assigned a priori except in so far as cases (in this case people) are identified as bearers of data. Even here, the ‘case’ may bear not only the weight of causes that operate on or through persons, but also structural relations between larger classes of people and institutional factors that affect such a person. The variables in the regression are of a variety of levels of measurement (not all are continuous). The dependent variable in particular is qualitative in character.

We make two translations of the act of ‘entering’ a labour market. First, we allow people to declare how they have done that; secondly, we group these declarations into the larger categories ‘active’ and ‘inactive’. We then transform this binary categorisation into a new continuous variable, the logit of activity. The logit is defined as the log of the odds of being active.

The odds of being active are defined as the ratio of the probability of being active to the probability of being inactive.

We take the logarithm of the odds, giving a new number on a wider scale ranging from negative to positive values. The logit, i.e. the log odds, is not constrained to be between 0 and 1.
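The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `odds` and `logit` and the example probabilities are chosen here and do not come from the workshop data.

```python
import math

def odds(p):
    """Odds of an event: probability of it happening over probability of it not happening."""
    return p / (1.0 - p)

def logit(p):
    """Log of the odds; defined for 0 < p < 1 and unbounded in both directions."""
    return math.log(odds(p))

# Illustrative probabilities only (not taken from the data set).
for p in (0.1, 0.5, 0.9):
    print(f"p={p:.1f}  odds={odds(p):.3f}  logit={logit(p):+.3f}")
```

A probability below one half gives a negative logit, a probability of exactly one half gives zero, and a probability above one half gives a positive logit.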

One equation, for instance, summarising some results in this particular case, can be represented as follows:

  log of the odds of employment = -1.47*LTLI + 0.27*London + 0.61*Degree - 0.76*Noqual + 0.92*Wife + 0.61

Each number shows whether the odds of being employed are raised or lowered by the presence of a given characteristic. In this equation, the following definitions are used.

  LTLI = Long-term limiting illness (specifically, the person reports that they are unable to do some forms of work due to an illness or other disabling condition)

  London = Lives in Inner or Outer London or the rest of the Southeast

  Degree = Has a degree and/or a higher degree

  Noqual = Has no qualifications, i.e. no CSEs or O-Levels or other qualifications

  Wife = Is married or cohabiting, and is female
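As a rough sketch of how such an equation can be read, the Python below evaluates it for a few invented profiles. It assumes that each variable is a 0/1 indicator and that the final .61 is the constant term; neither is stated outright in the slide. The profiles and the helper names `log_odds` and `probability` are hypothetical.

```python
import math

# Coefficients copied from the equation in the text.
COEFFS = {"LTLI": -1.47, "London": 0.27, "Degree": 0.61, "Noqual": -0.76, "Wife": 0.92}
CONSTANT = 0.61  # assumption: the trailing .61 is the constant term

def log_odds(person):
    """Constant plus the coefficient of every characteristic coded 1 for this person."""
    return CONSTANT + sum(COEFFS[name] * person.get(name, 0) for name in COEFFS)

def probability(log_odds_value):
    """Invert the logit: convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds_value))

# Multiplicative effect of each characteristic on the odds themselves.
odds_multiplier = {name: math.exp(coef) for name, coef in COEFFS.items()}

# Hypothetical profiles, for illustration only.
profiles = {
    "baseline": {},
    "graduate in London": {"London": 1, "Degree": 1},
    "LTLI, no qualifications": {"LTLI": 1, "Noqual": 1},
}
for label, person in profiles.items():
    value = log_odds(person)
    print(f"{label}: log odds {value:+.2f}, probability {probability(value):.2f}")
```

A negative coefficient gives an odds multiplier below 1 (the odds are lowered), and a positive coefficient gives a multiplier above 1 (the odds are raised).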

In conclusion, critics of statistics argue that statisticians seek only regularities and assume closure in reality. In this section we showed that methodological closure can be assumed, and regularities within the data-set can be sought, without assuming closure in reality.

Look at the Appendix to the handout to see whether the coefficients exhibit regularity.

Some do.

Some don’t.

What would you do?

Transition to Multi-Level Models

H1 - wage variation across occupational groups, acting as a proxy for a variety of labour-market segmenting practices, notably boundaries limiting potential entrants into occupations, is considerable even after person-level characteristics and occupation-specific characteristics are accounted for.

A graph of these advantages, with a constant slope for education years, is called a random-intercept model. It is also called a fixed-effects model, since the effect of education and SEGPOINT is a fixed effect rather than having a slope that varies from one SOC2 group to another.
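The idea of a random intercept combined with a fixed slope can be sketched as follows. All numbers and occupation labels are hypothetical and chosen only for illustration; they are not estimates from the workshop data.

```python
# Hypothetical values, for illustration only.
GRAND_INTERCEPT = 1.50    # overall intercept of lnwage
EDUCATION_SLOPE = 0.05    # fixed effect: the same slope in every occupation
OCCUPATION_SHIFT = {"A": 0.20, "B": 0.00, "C": -0.15}  # occupation-specific intercept shifts

def predicted_lnwage(occupation, education_years):
    """Random-intercept prediction: each occupation has its own line, all lines are parallel."""
    return GRAND_INTERCEPT + OCCUPATION_SHIFT[occupation] + EDUCATION_SLOPE * education_years

# Because the slope is shared, the gap between two occupations
# is the same at every level of education.
gap_at_10 = predicted_lnwage("A", 10) - predicted_lnwage("C", 10)
gap_at_16 = predicted_lnwage("A", 16) - predicted_lnwage("C", 16)
print(gap_at_10, gap_at_16)
```

In a model where the slope also varies by occupation, the lines would no longer be parallel and the gap would depend on the level of education.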

The model with the slope of lnwage on education varying by SOC2 occupational group looks like this; the slope differentials are only just statistically significant (t = 2.5 or so):

H1: supported by the change in variance explained by the SOC2 categories in themselves.

MODEL 1: Empty model showing the gross variance of LnWage
MODEL 2: Empty model with Gender in
MODEL 3: Complex model with Gender and other factors in

Between-occupation correlation of wages (defined as: variance of the intercepts for the different occupations, relative to the total variance)
  Model 1: .338   Model 2: .287   Model 3: .274

Change in the variance explained relative to Model 1
  Model 1: -   Model 2: .051 (falls by 15% of .338)   Model 3: .064 (falls by 19% of .338)

Statistical test of the change in the Log Likelihood corresponding to the change from Model 1
  Model 1: -

-2LL
  Model 1: 6361   Model 2: 6289   Model 3: 3675
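The arithmetic behind these figures can be checked with a short Python sketch. The correlations and -2LL values are copied from the table. The p-value helper assumes that Model 2 differs from Model 1 by a single parameter (Gender), so that the change in -2LL can be referred to a chi-square distribution with 1 degree of freedom; the slide does not state the degrees of freedom, so this is an assumption, and no p-value is attempted for Model 3.

```python
import math

between_occ_corr = {"Model 1": 0.338, "Model 2": 0.287, "Model 3": 0.274}
minus_2ll = {"Model 1": 6361, "Model 2": 6289, "Model 3": 3675}

def drop_from_model_1(model):
    """Absolute and proportional fall in the between-occupation correlation."""
    drop = between_occ_corr["Model 1"] - between_occ_corr[model]
    return drop, drop / between_occ_corr["Model 1"]

def deviance_change(model):
    """Fall in -2LL relative to Model 1."""
    return minus_2ll["Model 1"] - minus_2ll[model]

def chi2_p_value_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

for model in ("Model 2", "Model 3"):
    drop, share = drop_from_model_1(model)
    print(f"{model}: correlation falls by {drop:.3f} ({share:.0%}); -2LL falls by {deviance_change(model)}")

# Assumption: one extra parameter (Gender) between Model 1 and Model 2.
print("Model 2 vs Model 1, p-value under 1 df:", chi2_p_value_1df(deviance_change("Model 2")))
```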

Under this particular model, however, the obvious association of the predicted wage with the 'numerical label' of the SOC2 category disappears. Notice:

-2LL
  Model 1: 6361   Model 2: 6289   Model 3: 3675

n of raw cases
  Model 1: 4050   Model 2: 4050   Model 3: 4050

n of level 2 cases
  Model 1: 77   Model 2: 77   Model 3: 77

H2b) Furthermore, the interaction effect of SEGPOINT with GENDER in linear regression is not significant. (Need to test again for multilevel.)

THE TEST: basic two-level model, lacking some controls:

SAME MODEL but with SEGFEM = SEGPOINT * FEMALE:


It appears that women gain less than men from being in male-dominated occupations; equivalently, since the statistical test is symmetric, men lose more from being in women-dominated occupations than women do. The graph illustrates this; note that the y-intercept has not been corrected for other factors, but the slopes show the differentiated SEGPOINT effect. The slope for women is lower and the slope for men is relatively higher. In the equation, the direct impact of SEGPOINT (which now applies to males only) is 0.025, or 2.5% per 10% change of SEGPOINT: men lose 2.5% for each 10% they move downward toward a female-dominated occupation. Women, relative to the mean, lose only (0.025 - 0.011) = 0.014, or 1.4%.
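The slope arithmetic in this paragraph can be written out as a small Python sketch. The value 0.025 is the SEGPOINT coefficient quoted above; the interaction coefficient of -0.011 for SEGFEM is inferred from the subtraction in the text rather than read from a printed table, so treat it as an assumption. The function names are illustrative.

```python
SEGPOINT_EFFECT = 0.025   # direct SEGPOINT coefficient (applies to men), per 10% step
SEGFEM_EFFECT = -0.011    # inferred SEGPOINT * FEMALE interaction coefficient

def segpoint_slope(female):
    """Effect on lnwage of one 10% step in SEGPOINT, by gender."""
    return SEGPOINT_EFFECT + (SEGFEM_EFFECT if female else 0.0)

def lnwage_change(female, steps_of_10_percent):
    """Change in lnwage for a move of the given number of 10% steps (negative = downward)."""
    return segpoint_slope(female) * steps_of_10_percent

print("men, one step down:  ", lnwage_change(False, -1))
print("women, one step down:", lnwage_change(True, -1))
```

Reading a change in lnwage as a percentage change in the wage is an approximation that works well for small coefficients such as these.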


The use of SOC2 as a level assumes, in statistical discourse, that the occupational categories are a 'random sample of a global population of occupational categories'. This population does not explicitly refer to any population outside Great Britain or in a year other than 1999/2000; rather, it is a hypothetical construction that enables us to use inferential discourse in deriving statistical significance values for the variables that exist at SOC2 level, notably 'SEGPOINT'. Here, inference at Level 2 means inferring from the sample data to the population. (Since this population is purely hypothetical, and we actually have a full account of every SOC2 category for 1999/2000, this is slightly misleading. By comparison, the sampling at Level 1, for individuals, and the weighting of the sample cases make a lot of sense and are based in reality.)

This graph shows merely that for a given occupation, whose mean education is on the x-axis, the predicted level of lnwage is the height of the point on the y-axis.

The corresponding multi-level random-effects model, with tiny variations in slopes and a better fit than EDMEAN alone, looks like this (very similar to the original SOC2 with EDSCALE model):

Summary

Discuss topics covered
Reiterate welcome
Reminder to submit paperwork
Wrap-up
