Empirical Methods for Investigating Governance
TRANSCRIPT
-
8/3/2019 Empirical Methods for Investigating Governance
1/43
MEANS AND ENDS: A COMPARATIVE STUDY
OF EMPIRICAL METHODS FOR INVESTIGATING
GOVERNANCE AND PERFORMANCE
Carolyn J. Heinrich and Laurence E. Lynn, Jr.
The University of Chicago
DRAFT
September 1999
Prepared for the Fifth National Public Management Research Conference, George Bush School of
Public Service, Texas A&M University, College Station, Texas, December 3-4, 1999, with the support
of the Pew Charitable Trusts.
ABSTRACT
Scholars within different disciplines employ a wide range of empirical approaches to understanding how, why and with what consequences government is
organized. We first review recent statistical modeling efforts in the areas of education,
job-training, welfare reform and drug abuse treatment and assess recent advances in
quantitative research designs. We then estimate governance models with two different
data sets in the area of job training using three different statistical approaches:
hierarchical linear models (HLM); ordinary least squares (OLS) regression models using
individual level data; and OLS models using outcome measures aggregated at the site or
administrator level. We show that HLM approaches are in general superior to OLS
approaches in that they produce (1) a fuller and more precise understanding of
complex, hierarchical relationships in government, (2) more information about the amount of variation explained by statistical models at different levels of analysis, and (3)
increased generalizability of findings across different sites or organizations with varying
characteristics. The notable inconsistencies in the estimated OLS regression coefficients
are of particular interest to the study of governance, since these estimated relationships
are nearly always the primary focus of public policy and public management research.
Table of Contents
Introduction
Empirical Governance Research: Observations on the State of the Art
Some Improvements in Models and Methods
Lingering Limitations of Conventional Approaches
Multilevel Approaches to Governance Research
Applications of Multilevel Modeling
Education
Drug Abuse Treatment
Employment and Training
Comparing Hierarchical Linear Model and Ordinary Least Squares Results
Model Specifications
HLM and OLS Model Results
Conclusions
Tables
Introduction
Scholars of governance within political science, public policy, and public administration describe
their efforts to understand how, why and with what consequences government is organized and
managed as getting inside of or breaking open the "black box" of program implementation (Lynn,
Heinrich and Hill 1999). A wide range of research designs from case studies and historical
accounts to more formal models that include quantitative analysis are employed to explicate the
processes that establish the means and ends of governmental activity, and, in some studies, to assess the
implications of administration and management for individual-level and program outcomes.
Recently, reflecting world-wide interest in performance management, scholars have begun to
advocate research strategies that relate the measurable effects of public programs and policies to the
specific administrative practices and program or institutional features that seem to produce them (Lynn,
Heinrich and Hill, 1999; Mead, 1997, 1999; Smith and Meier, 1994; Milward and Provan, 1998; and
Roderick, 1999). Mead (1997) argues that program impact studies that neglect the influence of local
administrative capacity and structures have little value to policy makers and program administrators.
However, scholars have long recognized the theoretical and methodological difficulties associated with
identifying and describing complex interrelationships across multiple administrative levels within public
organizations and showing how different structural and administrative arrangements, collectively termed
governance, affect program outcomes.
This paper is concerned with assessing the advantages and disadvantages of different research
strategies that may be used in the empirical study of governance and performance. We first review
studies in several disciplines and policy areas including education, welfare reform, job-training and
drug abuse treatment to determine the extent to which advances in statistical modeling, and, in
particular, in hierarchical or multilevel modeling, as well as collaborations between researchers and
public officials, increase the potential for more accurate and informative governance research. Then,
based on analyses of two different data sets that have individual-level observations, we will compare the
performance of three different statistical approaches: hierarchical linear models (HLM); ordinary least
squares (OLS) regression models using individual level data; and OLS models using outcome measures
aggregated at the site or administrator level. We will show that, in general, multilevel modeling strategies
are more likely to produce unbiased estimates of policy, administrative or structural variable effects on
outcomes than traditional, ordinary least squares approaches, particularly when the extent of cross-level
effects operating at the multiple levels of analysis is relatively high.
Empirical Governance Research: Observations on the State of the Art
Most relationships in government and social systems involve activities and interactions that span
multiple levels of organization or systemic structures. Empirical studies designed to analyze these
relationships typically focus on program processes or outcomes at a single organizational (or individual)
level. Some studies group or aggregate individuals (or units of analysis of some type) at a higher level of
organization or structure and attempt to explain average effects or outcomes (e.g., for local offices or
agencies.) Other studies, including experimental and non-experimental program evaluations, analyze the
influence of organizational or structural factors on individual or lower-level unit outcomes by controlling
for these factors in individual-level regressions or by estimating separate individual-level regressions for
different organizational units. These analytical approaches all suffer from the limitations of conventional
statistical methods for estimating linear models with multiple levels of data.
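A minimal simulation (hypothetical numbers, using only `numpy`) illustrates the core limitation: when clients are nested within sites, outcomes within a site share a common component, and the effective amount of independent information is far smaller than the raw client count suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level data: 50 sites, 40 clients per site.
# A shared site effect u_j violates the OLS independence assumption.
n_sites, n_per = 50, 40
u = rng.normal(0.0, 1.0, n_sites)              # site-level random effect (var = 1)
e = rng.normal(0.0, 2.0, (n_sites, n_per))     # client-level noise (var = 4)
y = u[:, None] + e                             # outcomes, shape (sites, clients)

# ANOVA-style variance decomposition.
between = y.mean(axis=1).var(ddof=1)           # variance of site means
within = y.var(axis=1, ddof=1).mean()          # average within-site variance
sigma_u2 = between - within / n_per            # estimated site-level variance
icc_hat = sigma_u2 / (sigma_u2 + within)       # intraclass correlation (true: 0.20)

# Design effect: how much the variance of a mean is understated
# if the 2,000 clients are (wrongly) treated as independent.
deff = 1 + (n_per - 1) * icc_hat
print(f"estimated ICC = {icc_hat:.2f}")
print(f"design effect = {deff:.1f}")
```

Even a modest within-site correlation makes conventional OLS standard errors, computed as if all 2,000 clients were independent, substantially too optimistic.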
Statistical modeling efforts designed to explain individual-level outcomes based on data from
experimental and non-experimental analyses frequently account for factors related to program
administration and implementation with a single program indicator variable, such as a school or local
office indicator. In these studies, we gain little understanding of the interactions and influence of
specific organizational or structural factors on program outcomes. While experimental evaluations of
public programs such as those conducted by the Manpower Demonstration Research Corporation
(MDRC) have consistently included process evaluation components, the qualitative data are
subsequently used for descriptive or interpretive purposes rather than for establishing causal
relationships between administrative practices and outcomes. This use of process analysis is informative
and a potentially valuable complement to quantitative analyses. When they are not incorporated into the
statistical models, however, process analyses tend to be overshadowed in presentations of findings
concerning program impacts.
Some Improvements in Models and Methods
Both researchers and public officials are coming to recognize that accounting for average
program outcomes or impacts provides little information to public managers about how they can
improve program performance. For example, Mathematica Policy Research and its subcontractors are
presently conducting an experimental evaluation of the Job Corps program that involves over 100 sites
across the country and links client data to information about program administration and services
provided at the sites. In addition, Manpower Demonstration Research Corporation (MDRC)
investigators are currently engaged in research, utilizing multi-site, experimental Job Opportunities and
Basic Skills (JOBS) evaluation data combined with the rich array of survey data of program
administrators and staff, that departs from the traditional experimental approach to program evaluation
by formally incorporating process data analyses into the modeling strategies. Unfortunately, relatively
few researchers have access to these types of data sets, substantial in size and collected through costly experimental designs, that allow them to avert challenging statistical issues such as selection bias and comparison group inequalities, inadequate sample sizes, and other data-related problems.
Progress is also being made, however, in the area of non-experimental methodologies using
individual-level data obtained through administrative and other non-experimental sources. One of the
advantages of non-experimental over experimental approaches is that they are better suited to estimating
the heterogeneous effects of heterogeneous treatments or services on clients, and sorting out the
differential effects that programs can have on various client groups. Such information is more likely to
be useful to program administrators than simple average impact estimates.
An example of this type of research is that of Heckman, LaLonde and Smith (forthcoming).
They have produced an exhaustive analysis of the methodological lessons learned in evaluating social
programs through the use of both experimental and non-experimental evaluation methodologies. They
present a comprehensive discussion of a broad array of econometric models and estimators including
their properties, assumptions and information about the way they condition and transform the data to
guide researchers in their use of these methodologies. Somewhat surprising is their conclusion that
there is no single, inherently preferable method or econometric estimator for evaluating public programs: "too much emphasis has been placed on formulating alternative econometric methods for correcting selection bias and too little [attention] given to the quality of the underlying data." Heckman, LaLonde,
and Smith suggest that more effort should be invested in improving the quality of data used in studying
the effects of public programs than in the development of formal econometric methods to overcome
problems generated by inadequate data. More specifically, they show that if biases are clearly defined,
comparable people in the same geographical areas are compared, and relevant background data on
clients are collected (using the same survey questionnaires), problems in using non-experimental
methodologies for evaluating program outcomes will be much less than formerly believed.
Lingering Limitations of Conventional Approaches
These advances in non-experimental evaluation methodologies, in combination with an
increasing number of longer-term collaborations between public officials and scholars engaged in
governance and evaluation research, have made the use of client-level administrative data in statistical
models of program outcomes more feasible and frequent. Lingering problems still constrain what we
can learn from these types of client-level data analyses, however.
One problem is that these models typically explain only a small percentage of the total variation
in individual outcomes. Individual-level data exhibit considerable random variation, and there are also
likely to be a number of unmeasured influences on outcomes at the individual level. In educational
policy research, for example, the oft-cited Coleman Report finding that "schools bring little influence to bear on a child's achievement that is independent of his background and general social context . . ." has undoubtedly been discouraging to educational research. Smith and Meier (1994) argue that, given
the well-established distance between system characteristics and individual performance, using
individual-level data to study educational system performance is a flawed approach.
A second problem is that procedures to assess what portion of the explained variation can be
attributed to any policy or administrative variables included in these types of models are hardly ever
straightforward. For example, Jennings and Ewalt (1998) studied the influence of increased
coordination and administrative consolidation in JTPA programs on ten JTPA participant outcomes
while controlling for demographic and socioeconomic characteristics of participants. Their models
account for 5-29 percent of the total variation in individual outcomes, and the administrative variables
are statistically significant in about half of these models. Some questions that arise include: How much of
the total variation in client outcomes is attributable to policy or program design and implementation
factors? How much of the portion of variation attributable to such factors is explained by the two
administrative variables included? Are there other potentially important administrative variables not
incorporated in these models that might change the observed effects of the coordination and
consolidation variables that are included? We are left not only with uncertainty about how much of a
difference the organization of these programs makes, but also with unclear policy prescriptions for
program administrators (i.e., should they consolidate or not?).
Such limitations in modeling using individual-level outcomes lead Mead (1997, 1999) and
others to urge more research that models administrative processes and program outcomes across
multiple sites using client data aggregated at the site level. Mead (1999) describes this type of research
as "performance analysis": process research that draws formal, statistical connections between administrative practices and outcomes, with programs or sites as the unit of analysis. He argues that "variation [in outcomes] across programs tends to be more systematic," and, therefore, explanatory models using these data tend to be strong. In fact, the proportion of variation explained in
organizational, program- or site-level regressions (as indicated by R2 values) is typically considerably
higher than in similar individual-level regressions. In Mead's (forthcoming) study of the influence of
JOBS program requirements (clients' active/inactive statuses) on changes in Wisconsin welfare
caseloads controlling for caseload demographics and economic factors, he explains 76 percent of the
variation in welfare caseload changes.
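The mechanics behind these higher R2 values can be reproduced with a toy example (hypothetical data, `numpy` only): aggregation averages away individual-level noise, so the same underlying relationship explains far more of the variation in site means than in individual outcomes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 60 sites, 50 clients each. The predictor has both a
# between-site component (z) and a within-site one (w); outcome noise (e)
# is purely individual-level.
n_sites, n_per, beta = 60, 50, 0.5
z = rng.normal(0.0, 1.0, n_sites)
w = rng.normal(0.0, 1.0, (n_sites, n_per))
e = rng.normal(0.0, 2.0, (n_sites, n_per))
x = z[:, None] + w
y = beta * x + e

def r2(x, y):
    """R-squared from a one-predictor OLS fit."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return 1 - resid.var() / y.var()

r2_indiv = r2(x.ravel(), y.ravel())              # client-level regression
r2_group = r2(x.mean(axis=1), y.mean(axis=1))    # same data, site means
print(f"individual-level R2: {r2_indiv:.2f}")
print(f"site-level R2:       {r2_group:.2f}")
```

The site-level fit explains several times more variance not because the substantive relationship is stronger (it is identical by construction) but because averaging over 50 clients removes most of the individual-level noise.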
Sandfort (1998), who studied service technologies in Michigan's Work First program and their
relationship to program outcomes, also maintains that the unit of analysis in policy studies of welfare
reform should be the program or organization. She argues that "the more crucial forces shaping policy are within the organizations themselves," and that individual-level data should be placed within their larger, critical organizational context. In her county-level analyses, she models the proportion of
welfare recipients combining welfare and work in an average month and the proportion leaving welfare.
She includes county-level measures of the proportions of service providers offering specific service
technologies (e.g., job search assistance, soft skills, etc.) and four service delivery structure measures
(e.g., Project Zero, non-profit agency, etc.). She also includes several measures of welfare recipient
demographics. Despite the fairly limited set of explanatory variables available to her, Sandfort explains
approximately 60 percent of the variation in welfare program outcomes.
While Sandfort's work is a noteworthy example of this type of research, it also illustrates how
data access problems can constrain site-level analyses. She acknowledges that her minimal information
on welfare caseload characteristics might contribute to omitted variable bias in her models. Potentially
more problematic for policy analyses, however, is her qualitative finding that there is significant
variation in the service technology used by Work First providers in the same county, even though they
face the same local economic environment. This suggests that potentially important variation in service
delivery approaches at the service provider level is obscured in county-level aggregates used in the
regressions. The services clients take up at this lower level might be related to their individual
characteristics as well as to those of the service providers.
Mead is clear about what he views as the main shortcoming of his 1999 study of Wisconsin
welfare caseloads: the inability to evaluate the effects of work policies on caseloads as definitively as
program impacts on individuals, since cross-sectional analyses "explain variations in change around the state [between counties] rather than the overall trend." The variation being explained in site- or
program-level models is not variation in test scores or earnings but rather variation between sites or
programs in average outcomes. It is inappropriate to use the findings of regression models at one level
of hierarchy to infer what might be going on at lower levels, although information from case studies and
qualitative data analyses can help inform us about these inter-relationships at other levels.
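This slippage, inferring individual-level relationships from group-level coefficients, can be made concrete with a small simulation (hypothetical data): here the within-district slope is negative even though the district-level regression finds a strongly positive one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 40 districts, 30 students each. Within every district
# the x-y slope is -1, but districts with higher mean x have much higher
# mean y, so the district-level slope is strongly positive.
n_dist, n_per = 40, 30
mu = rng.normal(0.0, 1.0, n_dist)                        # district mean of x
x = mu[:, None] + rng.normal(0.0, 1.0, (n_dist, n_per))
y = (2.0 * mu[:, None] - 1.0 * (x - mu[:, None])
     + rng.normal(0.0, 0.5, (n_dist, n_per)))

def slope(x, y):
    return np.polyfit(x, y, 1)[0]

between = slope(x.mean(axis=1), y.mean(axis=1))               # district-level fit
within = np.mean([slope(x[j], y[j]) for j in range(n_dist)])  # mean within-district slope
print(f"district-level slope:          {between:+.2f}")
print(f"average within-district slope: {within:+.2f}")
```

A researcher with only the district-level regression would draw exactly the wrong conclusion about what the relationship looks like for individual students.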
Ferguson's (1991) research on 900 Texas school districts illustrates this type of slippage in
discussing site-level model findings. He uses OLS regressions to explain district average reading and
math scores with a wealth of district-level administrative, structural, socioeconomic and context
measures. He reports positive, statistically significant relationships between student test scores and
higher teacher exam scores, smaller classes and more experienced teachers. He
concludes that "higher-quality schooling produces better reading skills among public school students."
His use of explanations of variation in average school district test scores to draw implications for
students' outcomes ignores the fact that, within districts, there are schools, grades and classrooms
where many of these same factors may be interacting with other administrative and individual-level
factors at these levels to influence student achievement. He further suggests that researchers should
combine the results of studies examining different levels or components of a hierarchical system to link
teacher salaries to teacher quality, teacher quality to students' test scores, and students' test scores to
earnings later in life. Such meta-analyses, while useful for addressing some questions, still risk neglecting
important factors that interact at the multiple levels of hierarchy within school systems.
Recent advances in statistical methodologies allow for empirical analyses of factors interacting at
multiple levels of hierarchy within government and social systems. Such advances show considerable
promise for improving knowledge of how governance affects public sector performance. Research
designs that integrate quantitative and qualitative information and that are based on multi-level models
and on data sets that include individual level observations are conceptually demanding and expensive,
however. Is the extra effort justified in terms of the results that are produced in comparison with less
complex designs? We address this question next.
Multilevel Approaches to Governance Research
While some forms of multilevel modeling have been in use for close to two decades, recent
work by Bryk, Goldstein, Kreft, Raudenbush and Singer has advanced the use of these models in
education and related fields of social policy research. New statistical packages have also been
developed to make these techniques more accessible to researchers. 1
Applications of Multilevel Modeling
Multilevel statistical models have many different potential applications across a number of
disciplinary fields, including sociology, biology and economics, among others. In this paper, we focus
1 A Multilevel Modeling Newsletter and a Harvard University website (maintained by Singer) provide
technical assistance to researchers and promote the dissemination of new research findings on the use of multilevel
(or hierarchical linear) modeling. Some of these statistical techniques, such as the nonlinear form known as
hierarchical generalized linear models (HGLM), are so new that the software developers issue disclaimers with the
release of these programs.
on the use of multilevel models to formulate and test hypotheses about how factors or variables
measured at one level of an administrative hierarchy might interact with variables at another level. The
existence of these types of cross-level interactions or effects is at the crux of the development of
multilevel modeling techniques.
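To make the cross-level idea concrete, the standard two-level specification (in the notation popularized by Bryk and Raudenbush; the symbols below are generic, not estimates from this paper) can be written as:

```latex
% Level 1: clients i nested in sites j, with a client-level predictor X
Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}

% Level 2: a site-level predictor W models the level-1 coefficients
\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}
\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}

% Substituting gives the combined (mixed) model:
Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij}
       + \gamma_{11} W_j X_{ij} + \bigl(u_{0j} + u_{1j} X_{ij} + r_{ij}\bigr)
```

The gamma-11 term is the cross-level interaction, and the composite error in parentheses is neither independent across clients in the same site nor homoskedastic, which is exactly where the OLS assumptions fail.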
In multilevel models, the assumption of independence of observations in the traditional OLS
approach is dropped, and relationships in the data, rather than assumed to be fixed over contexts, are
allowed to vary. The extent to which multilevel modeling improves statistical estimation in comparison
to OLS models depends on the potential for and strength of cross-level effects in the data and the
corresponding extent of variation in the dependent variable to be explained at the different levels of
analyses. When significant cross-level interactions are present but ignored in OLS modeling efforts,
problems arise, including reduced (or inflated) precision of estimates, mis-specification and subsequent
misestimation of model coefficients, and aggregation bias.
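A small Monte Carlo sketch (hypothetical data, `numpy` only) of the precision problem: when a shared site-level effect is ignored, pooled OLS reports standard errors for site-level coefficients that can be several times too small.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: outcome depends on a site-level variable W
# (true coefficient 0.3) plus a site random effect u that pooled OLS ignores.
n_sites, n_per, gamma, reps = 30, 50, 0.3, 500
est, naive_se = [], []
for _ in range(reps):
    W = rng.normal(0.0, 1.0, n_sites)
    u = rng.normal(0.0, 1.0, n_sites)              # ignored site effect
    e = rng.normal(0.0, 1.0, (n_sites, n_per))
    y = (gamma * W[:, None] + u[:, None] + e).ravel()
    X = np.column_stack([np.ones(n_sites * n_per), np.repeat(W, n_per)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)      # pooled OLS fit
    resid = y - X @ b
    s2 = resid @ resid / (y.size - 2)              # classical error variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    est.append(b[1])
    naive_se.append(se)

print(f"true sampling SD of the estimate: {np.std(est):.3f}")
print(f"average OLS-reported SE:          {np.mean(naive_se):.3f}")
```

In this setup the OLS-reported standard error is several times smaller than the estimator's actual sampling variability, so significance tests for site-level coefficients are badly overstated.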
Because multilevel modeling expands the possibilities for investigating hierarchical relationships
and cross-level interactions involving two or three levels of organization, many see it as providing a link
between theory and practice in organizational studies (Kreft, 1996.) Bryk and Raudenbush (1992)
criticized the neglect of hierarchical relationships in traditional OLS approaches as fostering "an impoverished conceptualization" that has discouraged the formulation of hypotheses about effects
occurring at and across different levels. Goldstein (1992) also sees multilevel modeling as an
"explorative tool" for theory development about relationships within and between levels of social
systems. He cautions, however, that exploratory analyses should not be substituted for well-grounded
substantive theories and that multilevel models should not be seen as a panacea for all types of complex
data analysis problems. As Kreft (1996) points out, a particular statistical model cannot be optimal in general, only in specific research contexts, and models should be selected based on both the theory or research questions being tested and the type of data collected.
To illustrate with a governance example, if a functioning hierarchy of structural arrangements and
of management activities originating at one level does indeed influence activity at other (particularly
lower) levels of the organization, as they are presumably intended to do (or might do in unintended
ways), then we should anticipate and model the interdependence among hierarchically-ordered
variables. The absence of such cross-level interactions, on the other hand, might imply a high degree of
compartmentalization, or loose coupling across levels, and of sub-unit independence within the
organization. Furthermore, the presence of significant higher-level effects on organizational performance
in the absence of interdependence among hierarchical variables might suggest that lower-level
characteristics are essentially irrelevant to the efficacy of higher-level governance. While many policy
makers dream of circumstances where lower levels of the organization do not influence policy success,
empirical findings to this effect should probably be regarded with some suspicion.
Our literature review suggests that the application of actual hierarchical models in governance
and public management research is of quite recent vintage. Earlier research employed multi-level
concepts but not necessarily hierarchical models. For example, Meyer and Goes (1987), in their study of non-profit hospitals' adoption of innovative technologies, described their analytical approach as "hierarchical regression," but a careful review of studies such as these shows that multilevel modeling
techniques are not in fact utilized. Meyer and Goes assigned their explanatory variables to different
subsets according to the level of analysis to which they apply, e.g., an organizational subset, a leader
subset, an environmental subset, etc., and entered the different subsets into the regression model in stages, examining changes in explained variation (R2) as the variables are added. Unlike HLM, this analytical strategy does not allow for analyses of cross-level effects between variables in the different subsets.
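A minimal sketch of this staged-entry strategy (hypothetical variable subsets and coefficients): each subset is added in turn and the change in R2 recorded, but because no term multiplies variables from different subsets, a genuine cross-level interaction goes undetected until it is modeled explicitly.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical variable subsets by level of analysis, plus a cross-level
# interaction (org x leader) that staged entry of subsets cannot reveal.
n = 300
org = rng.normal(size=(n, 2))      # organizational subset
leader = rng.normal(size=(n, 2))   # leader subset
env = rng.normal(size=(n, 1))      # environmental subset
y = (org @ [0.5, 0.2] + leader @ [0.3, 0.0] + 0.4 * env[:, 0]
     + 0.6 * org[:, 0] * leader[:, 0]          # the cross-level interaction
     + rng.normal(size=n))

def r_squared(X, y):
    """R-squared of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ b
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - (resid @ resid) / tss

X, r2_prev = np.empty((n, 0)), 0.0
for name, block in [("organizational", org), ("+ leader", leader), ("+ environment", env)]:
    X = np.column_stack([X, block])
    r2_now = r_squared(X, y)
    print(f"{name:>15}: R2 = {r2_now:.3f} (gain = {r2_now - r2_prev:.3f})")
    r2_prev = r2_now

# Only an explicit interaction term recovers the cross-level effect.
r2_inter = r_squared(np.column_stack([X, org[:, 0] * leader[:, 0]]), y)
print(f"with org x leader interaction: R2 = {r2_inter:.3f}")
```

The staged R2 gains attribute all explained variance to main effects of the subsets; the jump from adding the interaction term is the part of the story the staged strategy never sees.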
Education
Given the large body of empirical research on educational processes and the ongoing, critical
concern for education policy and outcomes, it is not surprising that education researchers have led social
science efforts to develop and apply hierarchical linear models to the analysis of relationships in public
service delivery systems. The early studies of researchers who have published most extensively on the
use of multilevel or hierarchical linear models in education, including Harvey Goldstein (University of London), Anthony Bryk (University of Chicago) and Stephen Raudenbush (Michigan State University), first emerged in the mid- to late 1980s (Goldstein, 1986, 1987, 1989; Bryk and Raudenbush, 1987,
1988.) Bryk and Raudenbush, for example, applied these techniques to analyze school-level effects on
students' growth in mathematics achievement scores and were surprised by the high proportion of
variance in growth rates that was found to be between schools (83%). They continued their
research and developed the Hierarchical Linear Modeling (HLM) statistical program that is now widely
used in social science research (1992, 1999). The research of Goldstein and his colleagues has also
progressed steadily, with a considerable number of applications focused on the British educational
system, including larger-scale school performance reviews mandated by the British government (1992,
1995, 1996.)
More recently, Roderick and Camburn (1997) and Roderick (1999) have been examining the
Chicago public school system's decision to end social promotion and increase students' achievement.
They are drawing upon the wealth of data generated by the Consortium on Chicago School
Research, which has collaborated with the Chicago Public Schools to develop data sets and
methodologies for multilevel studies of school reform implementation.
Roderick and Camburn used hierarchical generalized linear models (the non-linear form of
HLM) to test hypotheses about students' likelihood of failing courses and their likelihood of subsequent
recovery from grade failure. Their models allowed them to assess the potential effectiveness of three
alternative strategies (individual- and system-focused) for improving student performance: (1) improving
the educational preparation of students before they enter high school, (2) creating transition years to
ease stress and increase support for students, and (3) instituting large-scale, school-wide restructuring
and reform efforts to improve teaching practices and school environments. They found a number of
important relationships among individual- and school-level variables and generated strong evidence of
school-level effects that suggest, in their words, "governance and instructional environments . . . matter."
Presently, Roderick (1999) is using three-level hierarchical linear models to analyze changes in
students' grades and test scores over time (level 1); students' paths (promotion, retention, summer
school participation, etc.) through the new policy's implementation (within schools and across years)
and the influence of student characteristics (level 2); and the effectiveness of schools' responses to these
policies as a function of school demographics and characteristics, measures of policy implementation
and teachers' classroom strategies, and the school environment and prior school development (level
3). This study also includes an extensive qualitative component with intensive case studies of each
school's approach to policy implementation and a longitudinal investigation of students' experiences
under the promotional policy.
Drug Abuse Treatment
Early large-scale studies on drug abuse treatment effectiveness included: (1) the Drug Abuse
Reporting Program (DARP), which collected data from approximately 44,000 clients and 52 federally-
funded treatment programs between 1969 and 1972, and (2) the Treatment Outcome Prospective
Study (TOPS), which was intended to expand the data collected in DARP and involved more than
11,000 patients in 41 programs between 1979 and 1981. Longitudinal (non-experimental) analyses of
the cost-effectiveness of various drug abuse treatment modalities were conducted using these client-level
data, although information about programs or organizations was limited in focus to services delivered
and program environments.
These research efforts were followed by other major studies, including the Outpatient Drug
Abuse Treatment Systems (ODATS) study and the Drug Abuse Treatment Outcomes Study
(DATOS). ODATS, which is continuing, surveys unit directors and supervisors in drug abuse treatment
programs to obtain rich, organization-level data on characteristics of the programs, their environments
and their clients. ODATS has progressed through four waves of data collection from a total of more
than 600 programs since 1984. In contrast, a major strength of the DATOS research is the
extensiveness of client-level data obtained from more than 10,000 adults in 99 drug abuse treatment
programs between 1991 and 1993. Research using these data sets addresses questions about program design, treatment practices, and client outcomes (D'Aunno, Sutton and Price, 1991; Fletcher, Tims
and Brown, 1997). Our own exploration of these data suggests that adequate information for a
multilevel investigation of governance and performance is lacking.
In an early study on the effectiveness of methadone treatment for heroin addiction, Attewell and
Gerstein (1979) drew on organizational theory to develop a hierarchical conceptual model of policy
implementation that "link[s] the macrosociology of federal policy on opiate addiction to the microsociology of methadone treatment" (311). They used a case-study approach, including
observational research in clinics, interviews with clients, and analyses of program records from clinics
over multiple years, to investigate managerial responses at the program level to government policy and
institutional regulation, as well as clients' responses and behavior in reaction to subsequent program changes.
Based on qualitative analysis of these observations, they found that compromised policies at the
federal level resulted in ineffective local management practices and poor outcomes for clients.
Gerstein now directs the National Treatment Improvement Evaluation Study (NTIES), which
should permit quantitative, multilevel analyses of drug abuse treatment policies and programs. In the
final report on the NTIES evaluation (1997), Gerstein et al. described how a two-level design permeated every level of the project. This study evaluates both administrative and clinical
(client) processes and outcomes for over 6,000 clients in up to nearly 800 programs. Like the effort led
by the Consortium on Chicago School Research, the design of the NTIES project provides a model for
researchers who are considering plans for a multi-site, multilevel study in any field.
Employment and Training
Our own multilevel study and a separate study by Heinrich (1999) on administrative structures
and management/incentive policies in JTPA programs provide the basis for our comparison of multilevel
modeling techniques with the individual-level and site-level modeling approaches. Heinrich and Lynn
(1999) used data collected during the National JTPA Study on individuals' characteristics and earnings
and employment outcomes, as well as administrative and policy data obtained from the sixteen study
sites over a three-year period, to estimate hierarchical linear models. They found that both site-level
administrative structures and local management strategies (including performance incentives) had a
significant influence on client outcomes.
In her multilevel study of local JTPA service providers and their contracts with a single JTPA
agency, Heinrich also examined the influence of organizational structure or form (i.e., public nonprofit,
private nonprofit, and for-profit service providers) and the use of performance incentives in service
provider contracts on client outcomes, controlling for client characteristics and the services they
received. She similarly found significant effects of the use of performance incentives by local JTPA
agencies on client outcomes.
The data used in these two studies allow for a comparison of different statistical approaches.
Further, the extent of cross-level interaction among hierarchical variables in these two sets of data is quite different. Differences in the extent of intra-class correlation in hierarchical data have important
implications for the relative advantages and disadvantages of using multilevel modeling strategies in
different research contexts, as we shall show.
Comparing Hierarchical Linear Model and Ordinary Least Squares Results
Different models may yield different answers to the same question. Thus researchers should
select modeling approaches that not only fit the data but that are also appropriate ways to address the
questions or hypotheses of interest. In our studies of JTPA programs, two different levels of analysis are represented: (1) the client or individual level, and (2) the site (service delivery area) or contract level, which made it possible to organize and fit the data using several different modeling strategies. For OLS
regressions of individual-level outcomes, the site-level (or contract-level) administrative and
management/incentive policy data were linked to the individual participant records, so that all
participants in a given site and year (or served under a specific contract) had the same site-level (or
contract level) variable values. For the site-level or contract-level OLS regressions, the individual-level
data were collated by site or by contract, and average measures of these variables were entered into the
models, along with the site- or contract-level administrative and policy variables. In the hierarchical
linear models, each of these two levels of data was formally represented by its own sub-model, with
each sub-model specifying the structural relations occurring and the residual variability observed at that
level.
The presence of significant intra-class correlations in hierarchical data (described further in the
following section) violates basic assumptions of the OLS regression model, including: (1) the
independence of observations, and (2) that the number of independent observations is equal for all
variables. One of the most widely extolled features of hierarchical linear models is the capability they
provide for partitioning variance into components associated with the different levels of analysis, and
subsequently allowing the detection and exploration of differences across contexts or groups. For
example, large between-group variances will indicate that an overall regression will mis-estimate
relationships for the individual groups.
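The practical cost of the violated independence assumption can be quantified with the Kish design effect, a standard result (not drawn from the paper itself) giving the factor by which intra-class correlation inflates the sampling variance that OLS assumes away; a minimal Python sketch with illustrative values:

```python
# Kish design effect: variance inflation for an estimate computed from
# clustered data when OLS wrongly assumes independent observations.
# rho is the intra-class correlation; n is the number of individuals
# per site. The specific values below are illustrative, not the paper's.

def design_effect(rho: float, n: int) -> float:
    """Return the factor by which clustering inflates sampling variance."""
    return 1.0 + (n - 1) * rho

# Even rho = 0.03, with a few hundred clients per site, inflates the
# variance roughly tenfold relative to the independence assumption:
print(round(design_effect(0.03, 300), 2))  # -> 9.97
```

Even the "very small" 3 percent site-level share reported later in the paper is therefore enough, with large within-site samples, to make naive OLS standard errors substantially overconfident.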
Model Specifications
One strategy for exploring multilevel data is to first estimate an unconditional means model.
This simple model expresses the outcome, Yij, as a linear combination of the grand mean of Yij (m00, a fixed component) and two random components: the variability between sites or groups (u0j), and the
residual variance associated with the ith unit or individual in the jth site or group (rij). Following a
multilevel modeling approach, the level one individual outcome model is: Yij = b0j + rij , and the level
two model is expressed as a function of the overall mean and random deviations from that mean: b0j=
m00 + u0j. Substituting the level two sub-model into the level one sub-model yields the multilevel model:
Yij = m00 + u0j + rij . (Eq. 1)
Using the covariance parameter estimates from the unconditional means model, one can test
hypotheses about whether the variability between groups and the residual variability within groups are
significantly different from zero. This information may also be used to estimate the intra-class
correlation, which indicates what portion of the total variance in outcomes occurs between sites or
groups (Bryk and Raudenbush 1992 and Singer 1997). A high proportion of intra-class correlation in
the data would suggest that OLS analyses are likely to produce misleading results. As a general rule of thumb, Kreft (1996) defines high intra-class correlation as larger than r = 0.25 (i.e., more than 25 percent of the total variation lying between sites or groups), although much smaller proportions of total variance at
the site- or group-level may be statistically significant and warrant exploration.
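As a concrete illustration, the unconditional means model (Eq. 1) and the intra-class correlation can be estimated with the MixedLM routines in Python's statsmodels package; the data below are simulated for the sketch (the site counts, variances, and variable names are our assumptions, not the NJS or contract data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate hierarchical data following Eq. 1, Yij = m00 + u0j + rij, with a
# true between-site variance of 1 and within-site variance of 4 (ICC = 0.20).
rng = np.random.default_rng(0)
n_sites, n_per_site = 40, 50
site = np.repeat(np.arange(n_sites), n_per_site)
u0j = rng.normal(0.0, 1.0, n_sites)               # site deviations
rij = rng.normal(0.0, 2.0, n_sites * n_per_site)  # individual residuals
df = pd.DataFrame({"y": 10.0 + u0j[site] + rij, "site": site})

# The unconditional means model: an intercept-only mixed model with a
# random intercept for each site.
fit = smf.mixedlm("y ~ 1", df, groups=df["site"]).fit()

tau2 = float(fit.cov_re.iloc[0, 0])  # estimated between-site variance
sigma2 = float(fit.scale)            # estimated within-site variance
icc = tau2 / (tau2 + sigma2)         # share of total variance between sites
print(round(icc, 2))                 # should land near the true 0.20
```

The estimated ICC is then compared against a benchmark such as Kreft's r = 0.25 rule of thumb, or tested directly against zero using the covariance parameter estimates.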
The results reported below were derived from the two separate studies of JTPA programs
discussed earlier: the analyses of data from the sixteen National JTPA Study sites over three years, and
the analyses of data from Heinrich's study of JTPA training providers and their contracts with a local
JTPA agency. The estimation of unconditional means models showed that for the 16 NJS sites (or 48
observations over three years), a very small but still statistically significant percentage (about 3%) of the
total variation in participant outcomes was between sites, (or at the site-level). In the study of
approximately 400 JTPA service provider contracts, a much larger percentage (6-39%) of the total
variation was at the contract administration level. These simple statistics suggest that we should expect
more cross-level interaction in the study of JTPA contracts, and that the results of the three different modeling strategies (individual-level OLS models, site-level or contract-level OLS models, and two-level hierarchical linear models, or HLM) would be more likely to diverge in the contract study findings.
When investigating possible cross-level interactions in hierarchical data, one is advised to begin
with a theory about which variables at the various levels would be expected to interact as well as about
the nature of the interactions. At the second (group or site) level, sub-models denoting the relationships
between level one and level two variables may specify fixed or randomly varying intercepts and/or
slopes. The full multilevel approach, in which both intercepts and slopes vary randomly, is sometimes
used for exploring the full range of potential cross-level effects in hierarchical data. This approach is
similar to fitting a different regression model within each of the level two groups or sites, and this is
typically efficient only when there is a relatively small number of level two observations with large
numbers of level-one cases within each group or site. In our study of the sixteen NJS sites, we
estimated a full multilevel model (also known as an "intercepts- and slopes-as-outcomes" model), which we will also report below. In modeling JTPA participants' earnings outcomes following their participation in JTPA programs, the level one (individual) sub-model is specified as follows:
Yij = b0j + b1jX1j + ...+ bnjXnj + rij, (Eq. 2)
where Yij is a measure of a participant's post-program earnings; the subscript j denotes the site and allows each site to have a unique intercept and slope for each of the level one (individual characteristic)
predictors (X1j to Xnj); the residual, rij, is assumed to be normally distributed with homogeneous variance across sites. In the level two (site) sub-model shown below, all of the predictors (Wj) are
measured at the site level, (i.e., variables describing administrative structures, performance incentive
policies, contracting practices, and economic conditions at the sites):
b0j = g00 + g01W1j + ... + g0nWnj + u0j (Eq. 3)
b1j = g10 + g11W1j + ... + g1nWnj + u1j
. . .
bnj = gn0 + gn1W1j + ... + gnnWnj + unj
The level one and level two sub-models together define the intercepts- and slopes-as-outcomes model.
In the level two sub-model, the level one intercept and beta coefficients are expressed as a linear
function of the level two predictors. In interpreting the results of this model, one examines the estimated
values of the level two coefficients (g01 to gnn) to determine which site-level variables help predict: (1)
why some sites realize better average earnings outcomes than others, and (2) how the effects of some
level one (client-level) variables on outcomes vary across sites.
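An intercepts- and slopes-as-outcomes specification of this kind (Eqs. 2-3) can be sketched in Python with statsmodels: below, x stands in for a single client-level predictor, w for a single site-level predictor, and the x:w term carries the cross-level interaction (g11). All names and simulated values are our assumptions, not the paper's data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_sites, n_per = 30, 40
site = np.repeat(np.arange(n_sites), n_per)
w = rng.binomial(1, 0.5, n_sites)            # site-level predictor (Wj)
x = rng.normal(0, 1, n_sites * n_per)        # client-level predictor (Xij)
u0 = rng.normal(0, 1.0, n_sites)             # random intercepts (u0j)
u1 = rng.normal(0, 0.5, n_sites)             # random slopes (u1j)
y = (2.0 + 1.5 * w[site] + u0[site]            # b0j = g00 + g01*Wj + u0j
     + (1.0 + 0.8 * w[site] + u1[site]) * x    # b1j = g10 + g11*Wj + u1j
     + rng.normal(0, 1.0, n_sites * n_per))    # rij
df = pd.DataFrame({"y": y, "x": x, "w": w[site], "site": site})

# re_formula="~x" lets both the intercept and the slope on x vary randomly
# across sites; "x * w" adds the cross-level interaction term (g11).
fit = smf.mixedlm("y ~ x * w", df, groups=df["site"], re_formula="~x").fit()
print(fit.fe_params[["x", "w", "x:w"]])
```

The estimated coefficient on x:w is the analogue of the g11-type coefficients one examines to see how a site-level variable moderates a client-level effect.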
The results of our estimation of the intercepts- and slopes-as-outcomes model revealed very
few statistically significant relationships among level one and level two predictors, thus indicating that
there was little significant variation across the sites in the effects of client-level variables on outcomes. These findings suggested that we could simplify our model; that is, the relationships between
the level one and level two variables did not appear to vary randomly across the sites, and thus
randomly varying slopes were not necessary. This is also the point at which we brought our theory of
governance in JTPA programs to bear more definitively on the modeling process. For example, we did
not expect the relationship between having a Private Industry Council as the administrative entity (a level
two variable) and the effects of participants' gender (a level one variable) on earnings outcomes to vary
across the sites and years. Rather, we expected (and the intercepts- and slopes-as-outcomes model
results confirmed) that the relationships between administrative structure and the effects of individual-
level characteristics such as gender on outcomes were fairly constant (or fixed) across sites and years.
When one assumes fixed effects for the level one predictors, a different level two sub-model is
specified to combine with the level one sub-model (Eq. 2). This level two sub-model specification, a
variation of the random-intercept model, is:
b0j = g00 + g01W1j + ... + g0nWnj + u0j
b1j = g10 , . . . , bnj = gn0 (Eq. 4)
As Equation 4 shows, the relationships between the level two (site-level) variables and the effects of level one (client-level) predictors on earnings outcomes are fixed (b1j = g10, . . . , bnj = gn0). Combining the level one sub-model (Eq. 2) and this level two sub-model (i.e., substituting Eq. 4 into Eq. 2), the
multilevel model derived is:
Yij = g00 + g01W1j + ... + g0nWnj + g10X1j + ...+ gn0Xnj + u0j + rij. (Eq. 5)
Through estimation of this hierarchical linear model (Eq. 5), one obtains coefficient values for all level
one (X1j to Xnj) and level two (W1j to Wnj) predictors that account for the interrelationships among
these variables (as specified in the level two sub-model, Eq. 4) and that indicate the direction and significance of their effects on participants' earnings.
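The simplified random-intercept specification (Eq. 5), with all level one slopes fixed, maps onto the default MixedLM setup in statsmodels; a sketch with simulated data and hypothetical variable names (x for a client trait, w for a site policy):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
site = np.repeat(np.arange(25), 60)
w = rng.normal(0, 1, 25)                 # site-level predictor (Wj)
x = rng.normal(0, 1, 25 * 60)            # client-level predictor (Xij)
u0 = rng.normal(0, 1, 25)                # random intercepts (u0j)
# Yij = g00 + g01*Wj + g10*Xij + u0j + rij, with g00=1, g01=0.5, g10=2:
y = 1.0 + 0.5 * w[site] + 2.0 * x + u0[site] + rng.normal(0, 1, 25 * 60)
df = pd.DataFrame({"y": y, "x": x, "w": w[site], "site": site})

# With no re_formula, MixedLM fits only a random intercept per group, so
# the slope on x is fixed across sites, as in Eq. 5.
fit = smf.mixedlm("y ~ x + w", df, groups=df["site"]).fit()
print(fit.fe_params.round(2))
```

The fixed-effect estimates returned here correspond to the g10-to-gn0 and g01-to-g0n coefficients discussed in the text.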
Equation 5 above was used in estimating the hierarchical linear models presented in Tables 1
and 2 for the study of the sixteen NJS sites over three years. In Heinrich's study of service provider
contracts, two of the multilevel models (of participants earnings in the first post-program quarter and
their pre- to post-program quarterly earnings changes) employ this same specification (i.e., fixed level
two effects), while the other model specifies both fixed effects and a random effect in the level two sub-
model. The level two sub-model for this second specification is shown below:
b0j = g00 + g01W1j + ... + g0nWnj + u0j
b1j = g10 + g13W3j + u1j (random effect)
b2j = g20 , . . . , bnj = gn0 (fixed effects) (Eq. 6)
HLM and OLS Model Results
The findings of the hierarchical linear models are shown in the first column of Tables 1-5. The
second and third columns in each table show the results of the individual-level OLS and site-level OLS
regressions, estimated using the same data and exactly the same set of dependent and explanatory
variables as in the multilevel models.
In examining the findings in these tables, the fixed effect coefficient estimates (g10 to gn0) of the HLM models (in the first column) are directly comparable to the OLS beta coefficient estimates of the individual-level regressions (in the second column). In the site-level regressions, the variables that are indicator (or
binary) in form at the individual level (e.g., single head of household, welfare recipient, etc.) are aggregated
and become average proportions at the site-level. To allow for comparisons of these site-level OLS
coefficients with the coefficient estimates of binary variables in the other models, these coefficient estimates
are multiplied by their site-level average values to calculate estimated effects for the average individual (in
the third column).
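This aggregation step can be sketched in pandas (toy numbers and column names, not the study's data): collapsing client records to site means turns binary indicators into proportions, and a site-level coefficient times the average proportion yields the kind of "effect for the average individual" reported in the third column:

```python
import pandas as pd

clients = pd.DataFrame({
    "site":     [1, 1, 1, 2, 2, 2],
    "earnings": [900.0, 1200.0, 0.0, 1500.0, 800.0, 1100.0],
    "welfare":  [1, 0, 1, 0, 0, 1],   # binary at the individual level
})

# Collapse to site level: the binary indicator becomes a proportion
# (2/3 of site 1's clients and 1/3 of site 2's are welfare recipients).
site_level = clients.groupby("site").mean()
print(site_level)

# Scale a hypothetical site-level OLS coefficient by the average
# proportion to get an estimated effect for the average individual.
coef = -7.91                          # illustrative coefficient value
avg_effect = coef * site_level["welfare"].mean()
print(round(avg_effect, 2))
```

Note that the groupby-mean step is exactly where individual-level variation, often the bulk of total variation, is discarded.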
The random effect estimated in the HLM model of hourly wages at termination (in the service
provider contracts study) indicates that there is a statistically significant, cross-level interaction between
the effects of contract performance incentives and the proportion of participants under age 18 that
varies across sites. The positive sign on this random coefficient indicates that, on average, sites with
higher proportions of young participants that also include performance incentives in their providers' contracts realize better hourly wage outcomes for participants. (For additional
discussion of the substantive findings of the models shown in Tables 1-5, see Heinrich and Lynn (1999) and Heinrich (1999).)
We begin the technical comparison of these modeling strategies by turning to Tables 1 and 2,
which display the results of the NJS data analyses. It is apparent that the HLM (column 1) and
individual-level OLS (column 2) estimated variable coefficients are very close for both individual-level
and site-level predictors. This is particularly evident in Table 2 (the model of participants' first post-program year earnings), where 97 percent of the site-level variation is explained by the model. In
general, these findings confirm that where a very small percentage of variation occurs at the site level
(approximately 3%), OLS and HLM methods are likely to produce comparable estimates of individual
and site-level effects. Another reason for the similarity of these two sets of results is that statistical tests
(performed using HLM model output) showed that all of the statistically significant variation at the site
level was explained away by the predictors included. That is, there was no statistically significant
variation at the site level that remained to be explained or accounted for in these models (or no omitted
variable bias at level two).
One might reasonably ask what the advantage is of using HLM in these cases. First, we can
identify how much of the variation in outcomes lies at the different levels of analysis. Second, we can
assess what proportion of this variation (at both site- and individual-levels) is explained by our models
and whether any statistically significant variation remains to be explained. In addition, researchers can
use various analytical strategies to examine and check for patterns or irregularities in the residuals at
both the site- or group-level (u0j) and the individual-level (rij). Bryk, Raudenbush and Congdon (1999)
and Goldstein (1995) describe a number of these techniques, such as Q-Q plots, plots of empirical Bayes (level two) residuals versus least squares residuals, and plots of empirical Bayes residuals against level two predictors, to assess model fit and reliability.
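Generic analogues of these checks (not the cited authors' own code) can be sketched with statsmodels and scipy on simulated data: the empirical Bayes site-level residuals (u0j) come from the fitted model's random_effects, the level one residuals (rij) from resid, and probplot supplies Q-Q coordinates:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(3)
site = np.repeat(np.arange(30), 40)
y = 5.0 + rng.normal(0, 1, 30)[site] + rng.normal(0, 2, 30 * 40)
df = pd.DataFrame({"y": y, "site": site})
fit = smf.mixedlm("y ~ 1", df, groups=df["site"]).fit()

# Empirical Bayes (shrunken) estimates of the site-level residuals u0j,
# one per site:
eb = np.array([float(re.iloc[0]) for re in fit.random_effects.values()])

# Q-Q coordinates for the level one residuals rij; plotting osm against
# osr (or checking their correlation) is a quick normality diagnostic.
(osm, osr), _ = stats.probplot(np.asarray(fit.resid), dist="norm")
qq_corr = float(np.corrcoef(osm, osr)[0, 1])
print(len(eb), round(qq_corr, 3))
```

Patterns in the empirical Bayes residuals, such as a few extreme sites or a relationship with a level two predictor, flag exactly the kind of unexplained site-level variation discussed above.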
Comparing HLM and individual-level OLS results for the service provider contract models
(Tables 3-5), where there was a much larger percentage of variation at level two (or between
contracts), the variable coefficient estimates are still similar, although not as close as those in Tables 1
and 2. The differences in estimated coefficient values are more noticeable in Tables 3 and 4, where
approximately 30-40 percent of the total variation was at the contract level. While the level two
variables in these models did substantially reduce the amount of contract-level variation that was not
accounted for, there were still statistically significant differences between the outcomes by contract that
remained to be explained.
The most striking findings of this investigation of modeling strategies, however, can be seen in
the comparison of the site-level OLS model results with those of the HLM and individual-level OLS
regressions. In contrast to the comparable findings of the HLM and individual-level OLS models, the
site-level models produce both inconsistent and seemingly inaccurate estimates of some of the
individual- and site-level coefficients. (See the italicized numbers in the third column of Tables 1-5.)
While the percent of variation explained in the site- (or contract) level OLS models and the HLM
models is similar, the size, sign and statistical significance of some of the coefficient values and estimated
effects differ noticeably across different outcomes in the respective studies as well as from the HLM and
individual-level OLS model results. Given that some of the seemingly anomalous estimated effects in the
site-level OLS models of JTPA participant outcomes are contrary to the findings of other JTPA
research (e.g., the positive effect of being a high school dropout on earnings in four of the five site-level
OLS models), we believe that it is the site-level OLS models that are probably inaccurate. These
findings also imply, contrary to Mead's argument, that modeling administrative processes and program outcomes across multiple sites with data on clients aggregated at the site level may be a less reliable
approach than similar (multiple-site) client-level data analyses.
The notable inconsistencies in the site- or contract-level policy/administrative/structural
coefficients are of particular importance for the study of governance, since these variables are nearly
always the primary focus of public policy or administration studies. In many of the studies (some
discussed earlier) that use site- or organization-level approaches, it is common to see researchers
reporting high levels of variation explained with a relatively small number of policy or governance
variables. A few, such as Mead (forthcoming), make it clear that site- or organization-level OLS
models are not explaining variation in individual outcomes, but rather variation between average
outcomes across the sites or organizations. Our findings underscore that ignoring the variation in
individual-level outcomes and the potential cross-level effects between variables operating at individual-
and site- or organization levels may well lead to inaccurate estimates of policy/administrative/structural
variable effects.
In a recent study that also compared multilevel modeling strategies to individual- and group-level
OLS regressions, Krull and MacKinnon (1999) reached a similar conclusion. In discussing the
individual- versus group-level models, they also pointed out that when individual-level data are
aggregated, the ability to predict individual-level variation, which frequently comprises the majority of
total variation, is eliminated. Therefore, researchers should expect that individual and group level
analyses of the same data might indicate relationships that differ in both magnitude and direction.
Overall, they concluded that multilevel-based estimates of the standard error showed considerably less
bias than OLS-based estimates, and that OLS analyses were less efficient than multilevel analyses
(433).
To summarize, in the absence of multilevel analyses, researchers are unable to determine how
much of the total variation in outcomes lies at the site- or organization level (i.e., the extent of intra-class
correlation) and how much of it one is able to explain with a given model specification. In Table 2,
where the amount of intra-class correlation was small and the site-level variables included in the models
explained nearly all of the site-level variation in outcomes, the estimates of policy/administrative/structural effects produced by the three different modeling strategies were much closer. Without this
information, however, how does one assess the probable accuracy of estimated effects? While some
researchers support their quantitative studies with qualitative, hands-on components, it is also not
uncommon for them to report some findings that are inconsistent with their hypothesized effects. In
these cases, how does one ascertain whether it is the theory or the model specification that is in error?
The results of the analyses presented here suggest that more attention should be given to multilevel
modeling as a strategy for empirically investigating the linkages between governance and performance.
Conclusions
Multilevel modeling holds considerable promise for governance research. Rapidly increasing
computing capacity and new developments in statistical theories have now made programs for multilevel
modeling (HLM, HOMALS, VARCL, BIRAM, and SAS mixed models are a few examples; see Kreft
and Aschbacher 1994) accessible to anyone willing to invest some time in learning about the underlying
theories and how to apply them. In a recent workshop, "Models and Methods for the Empirical Study of Governance," Ann Chih Lin asked, however, whether our quest to advance the empirical study of governance will drive a push to create "Godzilla-like" data sets and their subsequent analysis and re-analysis. She noted that developing and supporting the analyses of large-scale, multilevel (and
frequently longitudinal) data sets such as those described in this paper require substantial resources that
might otherwise provide support to many smaller projects. One might question, for example, whether
substantially more knowledge might be gained from a multi-site, multilevel empirical study of drug abuse
treatment programs (such as the one NTIES might allow) than from a number of smaller-scale case studies like that produced by Attewell and Gerstein.
While the creation or re-analysis of multi-site, multilevel data sets might not always be feasible
or the best use of scarce research funds, we believe that when it is possible to develop and work with
these types of data and methods, the advantages gained in terms of (1) a fuller and more precise
understanding of complex, hierarchical relationships, (2) more information about the amount of variation
explained by statistical models at different levels of analysis, and (3) increased generalizability of findings
across different sites or organizations with varying (observable) characteristics make the investment in multilevel modeling worthwhile.
When one doesn't know how much of the total variation in the dependent variable (e.g., a
program outcome) lies at the various levels of organization (i.e., the extent of intra-class correlation), the
results of an individual- or higher-level OLS regression should be interpreted with considerable caution.
As in any scientific field, research that attempts to replicate the most important findings of these studies
is desirable, although this also becomes more challenging when data sets (and subsequently statistical
models) are not directly comparable. Case-study or other qualitative research components can provide
important background for the interpretation of OLS regression findings in these cases, but they typically
do not make the findings more generalizable across a range of program or organizational contexts.
When presenting and discussing their findings, governance researchers should be clear not only about
what they are able to measure and explain in their models but also about the limitations on these findings
attributable to the models, methods, and data employed.
TABLE 1: Hierarchical linear and OLS models of JTPA participants first post-program
quarter earnings outcomes (National JTPA Study data analyses)
Earnings in first post-program quarter
Predictors - (individual level) | Hierarchical linear model | OLS - individual level | OLS - site level | (average effect)
Intercept 190.55 (0.40) 208.92 (0.51) 33.00 (0.02)
Gender (1=male) 517.88*** (6.51) 513.19*** (6.46) -21.26*** (-2.73) -903.55
Age 22-29 years 369.75*** (3.98) 374.64*** (4.04) 3.91 (0.52) 110.65
Age 30-39 years 240.91** (2.36) 244.29** (2.40) 37.59*** (3.88) 967.94
Age 40 and over years 53.84 (0.42) 57.86 (0.45) -15.17 (-1.00) 165.65
Black -235.16** (-2.29) -239.48** (-2.35) -12.69** (-2.33) -365.47
Hispanic -109.56 (-0.90) -133.60 (-1.11) -0.33 (-0.06) -3.55
Divorced, widowed or separated 87.89 (1.02) 91.86 (1.07) -32.28*** (-4.53) -864.78
No high school degree -350.57*** (-4.52) -349.11*** (-4.51) 20.70* (1.87) 929.22
Some post high school education 360.81*** (3.58) 357.83*** (3.55) 2.41** (0.34) 41.77
Welfare recipient at time of application -293.05*** (-3.71) -298.60*** (-3.78) -7.91 (-1.41) -425.00
Children under age six 63.58 (0.76) 66.71 (0.79) -20.20 (-1.40) -445.01
Employment-unemployment transition in year before enrollment -295.66*** (-3.92) -297.27*** (-3.95) 40.57*** (5.31) 2582.69
Earnings in year before enrollment 0.09*** (9.70) 0.09*** (9.72) -0.11 (-1.23)
Received classroom training 100.36 (1.22) 99.52 (1.22) -5.42 (-1.57) -387.26
Received on-the-job training 388.36*** (3.58) 388.34*** (3.56) 26.41*** (3.39) 457.42
Predictors - (site level)
PIC is the administrative entity 446.41*** (3.60) 436.12*** (4.16) 883.00*** (4.80) 404.68
PIC and LEO/CEO are equal partners -472.55** (-2.00) -436.75** (-2.12) 1170.10*** (3.53) 682.52
Percent of services provided directly by administrative entity -548.28 (-1.45) -487.64 (-1.53) -259.20 (-0.56)
Percent of performance-based contracts -650.32* (-1.91) -550.17* (-1.90) 1198.20** (2.36)
Weight accorded to employment rate standard 4260.41*** (3.21) 4188.11*** (3.75) 341.00 (0.22)
Minimum number of standards sites must meet to qualify for performance bonuses 21.13 (1.10) 17.33 (1.12) -2.62 (-0.09)
Earnings in first post-program quarter (Table 1, continued)
Predictors | Hierarchical linear model | OLS - individual level | OLS - site level | (average effect)
Requirement that performance bonuses must be used to serve highly disadvantaged groups -242.70** (-2.02) -252.56** (-2.40) -1543.80*** (-3.44) -289.46
Southern region 433.03 (1.49) 362.93 (1.43) -1643.00*** (-2.77) -410.75
Midwestern region 535.74*** (2.90) 538.03** (3.31) 11.50 (0.03) 3.59
Western region 825.04** (2.22) 752.67** (2.32) -445.40 (-0.82) -113.35
Unemployment rate 11725.86*** (2.67) 11546.00*** (3.10) 5105.00 (0.90)
Model predicting power - percent of variation explained by model: 6% individual-level, 86% between-site | R2 = 11.3% | R2 = 85.4%
Coefficient value (t-ratio in parentheses): *significant at a
TABLE 2: Hierarchical linear and OLS models of JTPA participants first post-program
year earnings outcomes (National JTPA Study data analyses)
Earnings in first post-program year
Predictors - (individual level) | Hierarchical linear model | OLS - individual level | OLS - site level | (average effect)
Intercept 1117.10 (0.76) 1093.36 (0.77) 15892.00*** (3.52)
Gender (1=male) 2144.18*** (7.76) 2143.20*** (7.76) -8.79 (-0.37) -373.58
Age 22-29 years 1455.00*** (4.51) 1456.07*** (4.51) -68.84*** (-2.83) -1948.17
Age 30-39 years 1000.99*** (2.82) 1000.36*** (2.82) -92.64*** (-2.62) -2385.48
Age 40 and over years 397.21 (0.89) 398.69 (0.90) 6.86 (0.16) 280.71
Black -1079.14*** (-3.04) -1081.02*** (-3.05) -24.82* (-1.79) -714.82
Hispanic -699.64 (-1.66) -714.26* (-1.70) 4.47 (0.28) 48.14
Divorced, widowed or separated 325.55 (1.09) 326.86 (1.09) -48.64** (-2.24) -1303.07
No high school degree -1424.55*** (-5.29) -1423.04*** (-5.29) -42.09 (-1.22) -1889.42
Some post high school education 1046.91*** (2.99) 1047.70*** (2.99) -65.02*** (-2.90) -1126.80
Welfare recipient at time of application -1006.49*** (-3.67) -1012.47*** (-3.69) -49.09*** (-2.91) -2637.61
Children under age six 496.51* (1.70) 500.15* (1.71) -69.46* (-1.67) -1530.20
Employment-unemployment transition in year before enrollment -862.80*** (-3.30) -865.79*** (-3.31) -4.08 (-0.15) -259.73
Earnings in year before enrollment 0.33*** (10.59) 0.33*** (10.59) 0.40 (1.30)
Received classroom training 125.71 (0.44) 132.74 (0.47) -23.74*** (-2.52) -1696.22
Received on-the-job training 1195.17*** (3.17) 1197.98*** (3.18) -40.20 (-1.53) -696.26
Predictors - (site level)
PIC is the administrative entity 1737.40*** (4.59) 1727.15*** (4.74) 1626.90*** (2.98) 745.12
PIC and LEO/CEO are equal partners -1933.65*** (-2.61) -1949.44*** (-2.73) -438.90 (-0.56) -255.88
Percent of services provided directly by administrative entity -2618.57** (-2.26) -2604.93** (-2.35) -564.00 (-0.43)
Percent of performance-based contracts -2719.45*** (-2.60) -2709.02*** (-2.69) -2033.00* (-1.80)
Weight accorded to employment rate standard 15887.75*** (3.93) 15888.00*** (4.09) 15710.00*** (3.13)
Minimum number of standards sites must meet to qualify for performance bonuses 22.25 (0.39) 21.50 (0.40) 102.00 (1.16) 11.74
Earnings in first post-program year (Table 2, continued)
Predictors | Hierarchical linear model | OLS - individual level | OLS - site level | (average effect)
Requirement that performance bonuses must be used to serve highly disadvantaged groups -866.66** (-2.30) -865.32** (-2.36) -1376.00 (-1.35) -258.00
Southern region 2035.83** (2.24) 2025.88** (2.30) 3101.00** (2.15) 775.25
Midwestern region 1936.15*** (3.33) 1940.48*** (3.44) 4367.00*** (3.99) 1364.69
Western region 3215.92*** (2.76) 3214.16*** (2.85) 3760.00*** (2.81) 940.00
Unemployment rate 49955.52*** (3.71) 50558.00*** (3.90) 58873.00*** (3.53)
Model predicting power (percent of variation explained): 13% individual-level, 97% between-site (hierarchical linear model); adjusted R2 = 13.2% (individual-level OLS); adjusted R2 = 87.6% (site-level OLS)
Coefficient value (t-ratio in parentheses): *significant at a
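The three estimation strategies compared in Table 2 can be illustrated with a minimal, self-contained sketch on synthetic two-level data, using the MixedLM routine in Python's statsmodels. The variable names (earnings, prior_earn, policy, site) are hypothetical stand-ins for illustration only, not the JTPA measures analyzed in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic two-level data: participants nested in sites (illustrative only).
rng = np.random.default_rng(0)
n_sites, n_per_site = 30, 50
site = np.repeat(np.arange(n_sites), n_per_site)
site_policy = rng.binomial(1, 0.5, n_sites)        # site-level predictor
site_effect = rng.normal(0, 500, n_sites)          # unobserved site intercepts
prior_earn = rng.normal(8000, 2000, n_sites * n_per_site)
earnings = (9000 + 0.3 * prior_earn + 1500 * site_policy[site]
            + site_effect[site] + rng.normal(0, 3000, site.size))
df = pd.DataFrame({"earnings": earnings, "prior_earn": prior_earn,
                   "site": site, "policy": site_policy[site]})

# (1) Hierarchical linear model: a random intercept for each site.
hlm = smf.mixedlm("earnings ~ prior_earn + policy", df,
                  groups=df["site"]).fit()

# (2) Individual-level OLS: ignores the nesting, so standard errors on
#     site-level predictors tend to be understated.
ols_ind = smf.ols("earnings ~ prior_earn + policy", df).fit()

# (3) Site-level OLS on aggregated means: discards within-site variation.
agg = df.groupby("site").mean()
ols_site = smf.ols("earnings ~ prior_earn + policy", agg).fit()

for name, res in [("HLM", hlm), ("Individual OLS", ols_ind),
                  ("Site-level OLS", ols_site)]:
    print(f"{name}: policy coef = {res.params['policy']:.1f}")
```

On data like these, all three recover the site-level policy effect in expectation, but the aggregated model's coefficients are far less stable, which is the pattern the tables in this paper document with real JTPA data.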
Table 3: Hierarchical linear and OLS models of JTPA participants'
hourly wages at termination (study of service provider contracts)
Hourly wage at termination model predictors | Hierarchical linear model | Individual-level OLS | Contract(or)-level OLS

Individual-level variables
Intercept | 2.02*** (11.28) | 2.37*** (20.77) | 2.85*** (3.30)
Participant characteristics
Under age 18 years | -0.31*** (-4.56) | -0.53*** (-9.23) | -0.001 (-0.42) -0.04
Age 22-29 years | 0.34*** (4.67) | 0.63*** (9.94) | 0.012*** (2.78) 0.21
Age 30-39 years | 0.37*** (4.86) | 0.68*** (10.12) | 0.019*** (3.94) 0.28
Age 40 years and over | 0.51*** (5.94) | 1.02** (15.50) | 0.022*** (5.40) 0.29
Male | -0.03 (-0.86) | -0.05 (-1.33) | -0.005 (-1.44) -0.26
African-American | -0.10** (-2.26) | -0.11** (-2.53) | 0.002 (0.57) 0.12
Hispanic | -0.01 (-0.09) | -0.02 (-0.25) | 0.004 (0.84) 0.04
Single head of household | 0.11** (1.96) | 0.09 (1.61) | -0.008 (-1.53) -0.15
Welfare recipient | -0.17*** (-4.10) | -0.20*** (-4.83) | -0.007* (-1.66) -0.20
No high school degree | -0.15*** (-2.80) | -0.02 (-0.34) | 0.001 (0.27) 0.02
Post high-school education | 0.13*** (2.50) | 0.20*** (3.61) | 0.002 (0.42) 0.03
College graduate | 0.32 (1.48) | 0.63*** (2.85) | -0.003 (-0.22) -0.003
Minimal work history | 0.03 (0.60) | -0.03 (-0.65) | 0.001 (0.43) 0.05
Unemployed at application | -0.30*** (-4.17) | -0.28*** (-3.76) | -0.006 (-0.75) -0.31
Not in labor force | -0.54*** (-7.00) | -0.68*** (-8.65) | -0.014* (-1.79) -0.62
Zero earnings in pre-program year | -0.43*** (-8.96) | -0.67*** (-14.72) | -0.009*** (-2.99) -0.36
Training services
Received basic/remedial education | -0.10 (-1.54) | -0.14*** (-2.96) | -0.005** (-2.24) 0.10
Received vocational training | 0.41*** (5.91) | 0.39*** (8.31) | 0.001 (0.59) 0.04
Received on-the-job training | 1.40*** (17.53) | 1.35*** (24.58) | 0.022*** (7.06) 0.31
Received job search/job club | -0.02 (-0.16) | 0.09 (0.91) | 0.013*** (3.18) 0.07
Length of training (in months) | -0.09*** (-17.14) | -0.09*** (-19.27) | -0.092*** (-3.16) -0.48
Economic/environmental factors
Percent change in employment, 1988-1989 | -8.68*** (-3.83) | -16.28*** (-7.56) | -22.75 (-1.24)
Percent change in employment, 1989-1990 | 2.22*** (9.67) | 2.26*** (11.18) | 2.98** (2.32)
Percent change in employment, 1990-1991 | 0.26 (0.94) | 0.70*** (4.08) | 1.59* (1.75)
Percent change in employment, 1991-1992 | 32.01* (1.87) | 24.35** (2.16) | 73.01 (1.16)
Percent change in employment, 1992-1993 | 51.37 (1.17) | 1.63 (0.07) | -84.47 (-0.96)
Contract-level variables
Private, nonprofit contractor | 0.29* (1.84) | 0.10* (1.75) | 0.23 (1.17) 0.15
For-profit contractor | 0.77*** (4.10) | 0.40*** (6.12) | 0.53** (2.18) 0.11
Performance incentives in contract | 0.34*** (2.72) | 0.32*** (7.11) | 0.13 (0.78) 0.09
Random effect: Under age 18 years by performance incentives in contract | 0.30*** (2.67) | n.a. | n.a.
Predicting power (percent of variation explained) | 9.0% (individual); 68.0% (contract) | adjusted R2 = 34.7% | adjusted R2 = 68.2%
Coefficient value (t-ratio in parentheses): *significant at a
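The random effect reported at the bottom of Table 3, an individual-level slope (under age 18) that is allowed to vary with a contract-level condition (performance incentives), can be sketched as a mixed model with a random slope and a cross-level interaction. The following is a hedged illustration on synthetic data using statsmodels; all variable names (wage, under18, incentives, contract) are hypothetical, not the study's actual measures.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: participants nested in contracts. The effect of being
# under age 18 (individual level) is allowed to differ by whether the
# contract carries performance incentives (contract level).
rng = np.random.default_rng(1)
n_contracts, n_per = 40, 40
contract = np.repeat(np.arange(n_contracts), n_per)
incentives = rng.binomial(1, 0.5, n_contracts)
under18 = rng.binomial(1, 0.2, n_contracts * n_per)
u0 = rng.normal(0, 0.3, n_contracts)   # random contract intercepts
u1 = rng.normal(0, 0.1, n_contracts)   # random slopes on under18
wage = (6.0 + 0.3 * incentives[contract]
        - 0.5 * under18 + 0.3 * under18 * incentives[contract]
        + u0[contract] + u1[contract] * under18
        + rng.normal(0, 0.8, contract.size))
df = pd.DataFrame({"wage": wage, "under18": under18,
                   "incentives": incentives[contract],
                   "contract": contract})

# Random intercept plus a random slope on under18 (re_formula), and the
# fixed cross-level interaction under18 x incentives.
m = smf.mixedlm("wage ~ under18 * incentives", df,
                groups=df["contract"], re_formula="~under18").fit()
print(m.params[["under18", "incentives", "under18:incentives"]])
```

A significant interaction term indicates that the individual-level slope shifts with the contract-level condition; the OLS specifications in the table have no analogue for the random portion of that slope, which is why those cells report "n.a."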
Table 4: Hierarchical linear and OLS models of JTPA participants' first post-program
quarter earnings outcomes (study of service provider contracts)
First post-program quarter earnings model predictors | Hierarchical linear model | Individual-level OLS | Contract(or)-level OLS

Individual-level variables
Intercept | 1367*** (19.03) | 1427*** (23.65) | 2352.9*** (4.81)
Participant characteristics
Under age 18 years | -145*** (-4.61) | -150*** (-4.94) | 2.0 (0.89) 79
Age 22-29 years | 238*** (6.15) | 248*** (7.11) | 4.5* (1.91) 79
Age 30-39 years | 339*** (8.41) | 341*** (9.29) | 13.0*** (4.51) 189
Age 40 years and over | 315*** (7.33) | 280*** (7.85) | 10.4*** (4.87) 136
Male | 8 (0.39) | -10 (-0.53) | -3.2 (1.56) 166
African-American | -114*** (-4.72) | -113*** (-5.14) | -4.1*** (-2.93) -247
Hispanic | 49 (1.29) | 78** (2.18) | -1.2 (-0.49) 11
Single head of household | 29 (0.95) | 69** (2.30) | 1.0 (0.36) 19
Welfare recipient | -93*** (-4.17) | -77*** (-3.52) | -2.5 (-1.14) -72
No high school degree | -68** (-2.39) | -82*** (-3.06) | 1.1 (0.63) -19
Post high-school education | -15 (-0.50) | -0.2 (-0.007) | -10.8*** (3.32) -156
College graduate | -94 (-0.96) | -59 (-0.61) | -6.0 (0.95) -6
Minimal work history | -87*** (-4.04) | -90*** (-4.38) | 0.7 (0.54) 33
Unemployed at application | -250** (-6.38) | -270*** (-6.87) | -8.3* (-1.86) -430
Not in labor force | -411*** (-9.71) | -442*** (-10.53) | -12.0*** (-2.79) -529
Zero earnings in pre-program year | -513*** (-21.24) | -574*** (-24.31) | -7.3*** (-4.77) -292
Training services
Received basic/remedial education | -39 (-1.37) | -89*** (-3.87) | -1.9* (-1.84) -37
Received vocational training | 38 (1.15) | 78*** (3.07) | -0.4 (-0.35) -18
Received on-the-job training | 433*** (11.23) | 513*** (17.02) | 13.2*** (7.97) 187
Received job search/job club | 183*** (3.07) | 217*** (4.24) | 6.7*** (2.59) 34
Length of training (in months) | -18*** (-7.16) | -16*** (-6.92) | -17.1 (-1.13) -89
Economic/environmental factors
Percent change in employment, 1988-1989 | -2169* (-1.72) | -2055* (1.88) | 7358.9 (0.76)
Percent change in employment, 1989-1990 | 540*** (4.63) | 442*** (4.46) | 235.2 (0.37)
Percent change in employment, 1990-1991 | -52 (-0.45) | -11 (-0.13) | -241.0 (-0.54)
Percent change in employment, 1991-1992 | 602 (0.08) | 3784 (0.70) | 11390 (0.38)
Percent change in employment, 1992-1993 | -12194 (-0.71) | -22309* (-1.85) | -61197 (-1.35)
Contract-level variables
Private, nonprofit contractor | -53 (-1.07) | -26 (-0.90) | -179.3* (-1.81) -114
For-profit contractor | 55 (0.91) | 48 (1.43) | -113.1 (-0.92) -24
Performance incentives in contract | 86** (2.11) | 53** (2.32) | 97.4 (1.12) 65
Predicting power (percent of variation explained) | 9.0% (individual); 87.0% (contract) | adjusted R2 = 33.4% | adjusted R2 = 66.1%
Coefficient value (t-ratio in parentheses): *significant at a
Table 5: Hierarchical linear and OLS models of JTPA participants' pre- to post-program
quarterly earnings change outcomes (study of service provider contracts)
Pre- to post-program quarterly earnings change model predictors | Hierarchical linear model | Individual-level OLS | Contract(or)-level OLS

Individual-level variables
Intercept | 447*** (6.76) | 497*** (8.51) | 2343.7*** (3.74)
Participant characteristics
Under age 18 years | -79*** (-2.64) | -108*** (-3.72) | -0.8 (-0.29) -32
Age 22-29 years | -10 (-0.26) | -12 (-0.35) | 1.5 (0.49) 26
Age 30-39 years | -75* (-1.93) | -85** (-2.36) | -2.5 (-0.67) -36
Age 40 years and over | -75* (-1.83) | -106*** (-3.00) | -3.4 (-1.24) -44
Male | -7 (-0.37) | -14 (-0.74) | -3.0 (-1.15) -155
African-American | -117*** (-5.00) | -100** (-4.62) | -3.8** (-2.13) -229
Hispanic | 10 (0.27) | 22 (0.64) | -8.1*** (-2.57) -74
Single head of household | 85*** (2.84) | 114*** (3.91) | 1.5 (0.43) 28
Welfare recipient | 25 (1.16) | 32 (1.53) | 1.0 (0.37) 29
No high school degree | -1 (-0.03) | -6 (-0.25) | 1.3 (0.65) 23
Post high-school education | -44 (-1.46) | -46 (-1.56) | -20.7*** (-4.95) -298
College graduate | -148 (-1.44) | -146 (-1.42) | 1.2 (0.16)
Minimal work history | 56*** (2.68) | 49** (2.44) | 11.9 (1.09) 90
Unemployed at application | -274*** (-7.08) | -284*** (-7.36) | -15.5*** (-2.70) -802
Not in labor force | -311*** (-7.52) | -321*** (-7.88) | -15.5*** (-2.81) -684
Zero earnings in pre-program year | 111*** (4.69) | 61*** (2.65) | -1.0 (-0.52) -40
Training services
Received basic/remedial education | -65*** (-2.48) | -78*** (-3.56) | 0.3 (0.25) 6
Received vocational training | -9 (-0.31) | 51** (2.05) | 2.5* (1.69) 119
Received on-the-job training | 206*** (5.62) | 239*** (8.03) | 13.6*** (6.37) 193
Received job search/job club | 21 (0.36) | 58 (1.12) | 3.7 (1.13) 19
Length of training (in months) | -7*** (-2.80) | -7*** (-3.27) | -43.2** (-2.23) -226
Economic/environmental factors
Percent change in employment, 1988-1989 | -3169*** (-2.58) | -4779*** (-4.39) | -17066 (-1.37)
Percent change in employment, 1989-1990 | 290*** (2.60) | 287*** (2.95) | 410.9 (0.51)
Percent change in employment, 1990-1991 | -304*** (-2.83) | -213*** (-2.69) | -526.4 (-0.94)
Percent change in employment, 1991-1992 | -20042*** (-2.81) | -10025* (-1.92) | 15988 (0.41)
Percent change in employment, 1992-1993 | -43810*** (-2.81) | -18981* (-1.61) | 40950 (0.71)
Contract-level variables
Private, nonprofit contractor | -15 (-0.36) | -15 (-0.54) | -254.6** (-2.01) -162
For-profit contractor | 25 (0.49) | 17 (0.51) | -412.5*** (-2.62) -87
Performance incentives in contract | 163*** (4.70) | 123*** (5.53) | 271.7** (2.44) 181
Predicting power (percent of variation explained) | 2.0% (individual); 49.0% (contract) | adjusted R2 = 5.3% | adjusted R2 = 65.4%
Coefficient value (t-ratio in parentheses): *significant at a
REFERENCES
Arum, Richard, "Do Private Schools Force Public Schools to Compete?" American Sociological
Review 61:1 (February 1996): 29-46.

Attewell, Paul and Dean R. Gerstein, "Government Policy and Local Practice," American Sociological
Review 44 (April 1979): 311-327.

Bryk, Anthony, Stephen Raudenbush and Richard Congdon, Hierarchical Linear and Nonlinear
Modeling with the HLM/2L and HLM/3L Program, Chicago: Scientific Software International, 1999.

Bryk, Anthony S. and Stephen W. Raudenbush, Hierarchical Linear Models: Applications and
Data Analysis Methods, London: Sage Publications, 1992.

Bryk, Anthony S. and Stephen W. Raudenbush, "On Heterogeneity of Variance in Experimental
Studies: A Challenge to Conventional Interpretations," Psychological Bulletin 104:3 (1988): 396-404.

Bryk, Anthony S. and Stephen W. Raudenbush, "Application of Hierarchical Linear Models to
Assessing Change," Psychological Bulletin 101:1 (1987): 147-158.

D'Aunno, Thomas, Robert I. Sutton, and Richard H. Price, "Isomorphism and External Support in
Conflicting Institutional Environments: A Study of Drug Abuse Treatment Units," Academy of
Management Journal 34:3 (1991): 636-661.

Ferguson, Ronald F., "Paying for Public Education: New Evidence on How and Why Money Matters,"
Harvard Journal on Legislation 28 (1991): 465-498.

Fletcher, Bennet W., Frank M. Tims, and Barry S. Brown, "Drug Abuse Treatment Outcome Study
(DATOS): Treatment Evaluation Research in the United States," Psychology of Addictive Behaviors
11:4 (1997): 216-229.

Gerstein, Dean R., A. Rupa Datta, Julia S. Ingels, Robert A. Johnson, Kenneth A. Rasinski, Sam
Schildhaus, Kristine Talley, Kathleen Jordan, Dane B. Phillips, Donald W. Anderson, Ward G.
Condelli, and James S. Collins, Final Report: National Treatment Improvement Evaluation Study,
U.S. Department of Health and Human Services, March 1997.

Goldstein, Harvey, and S. Thomas, "Using Examination Results as Indicators of School and College
Performance," Journal of the Royal Statistical Society 159:1 (1996): 149-163.

Goldstein, Harvey, Multilevel Statistical Models, New York: Halsted Press, 1995.

Goldstein, Harvey, "Statistical Information and the Measurement of Education Outcomes," Journal of
the Royal Statistical Society 155 (1992): 313-315.

Goldstein, Harvey, "Models for Multilevel Response Variables with an Application to Growth Curves,"
in R.D. Bock (ed.), Multilevel Analysis of Educational Data, New York: Academic Press, 1989.

Goldstein, Harvey, Multilevel Models in Educational and Social Research, London: Oxford
University Press, 1987.

Goldstein, Harvey, "Multilevel Mixed Linear Model Analysis Using Iterative Generalized Least Squares,"
Biometrika 73 (1986): 43-56.

Gray, J., D. Jesson, Harvey Goldstein and J. Rasbash, "A Multilevel Analysis of School Improvement:
Changes in Schools' Performance Over Time," School Effectiveness and School Improvement 6
(1995): 97-114.

Heckman, James J., Robert J. LaLonde and Jeffrey A. Smith, "The Economics and Econometrics of
Active Labor Market Programs," prepared for the Handbook of Labor Economics, Volume III, Orley
Ashenfelter and David Card, editors.

Heckman, James J., Carolyn J. Heinrich and Jeffrey A. Smith, "Assessing the Performance of
Performance Standards in Public Bureaucracies," American Economic Review 87:2 (1997): 389-395.

Heinrich, Carolyn J. and Laurence E. Lynn, Jr., "Governance and Performance: The Influence of
Program Structure and Management on Job Training Partnership Act (JTPA) Program Outcomes,"
presented at the Workshop on Models and Methods for the Empirical Study of Governance, University
of Arizona, April 29-May 1, 1999.

Heinrich, Carolyn J., "Organizational Form and Performance: An Empirical Investigation of Nonprofit
and For-profit Job-training Service Providers," working paper, National Bureau of Economic Research
and The University of Chicago, 1998.

Jennings, Edward T., "Building Bridges in the Intergovernmental Arena: Coordinating Employment and
Training Programs in the American States," Public Administration Review 54:1 (January/February
1994): 52-60.

Jennings, Edward T. and JoAnn Gomer Ewalt, "Interorganizational Coordination, Administrative
Consolidation and Policy Performance," Public Administration Review 58:5 (September/October
1998): 417-428.

Kreft, Ita G., "Are Multilevel Techniques Necessary? An Overview, Including Simulation Studies,"
unpublished manuscript, California State University, Los Angeles, 1996.

Kreft, Ita G. and Pamela R. Aschbacher, "Measurement and Evaluation Issues in Education: The Value
of Multivariate Techniques in Evaluating an Innovative High School Reform Program," International
Journal of Educational Research 21 (1994): 181-196.

Lynn, Laurence E., Jr., Carolyn J. Heinrich and Carolyn J. Hill, The Empirical Study of
Governance: Theories, Models, and Methods, Georgetown University Press, forthcoming, 2000.

Mead, Lawrence M., "Optimizing JOBS: Evaluation Versus Administration," Public Administration
Review 57:2 (March/April 1997): 113-123.

Mead, Lawrence M., "Performance Analysis," unpublished manuscript, New York University, 1999.

Mead, Lawrence M., "The Decline of Welfare in Wisconsin," Journal of Public Administration
Research and Theory, forthcoming.

Meier, Kenneth J., "Bureaucracy and Democracy: The Case for More Bureaucracy and Less
Democracy," Public Administration Review 57:3 (May/June 1997): 193-199.

Meier, Kenneth J. and Joseph Stewart, "The Impact of Representative Bureaucracies: Educational
Systems and Public Policies," American Review of Public Administration 22:3 (September 1992):
157-171.

Meier, Kenneth J., Joseph Stewart and Robert E. England, "The Politics of Bureaucratic Discretion:
Educational Access as an Urban Service," American Journal of Political Science 35:1 (1991): 155-177.

Meyer, Alan D. and James B. Goes, "How Organizations Adopt and Implement New Technologies,"
Best Papers Proceedings, Academy of Management (Forty-seventh Annual Meeting of the Academy
of Management, New Orleans, Louisiana, August 9-12, 1987), pp. 175-179.

Milward, H. Brinton and Keith G. Provan, "Governing Service Provider Networks," presented at the
EGOS 14th Colloquium, Maastricht University, The Netherlands, 1998.

Roderick, Melissa, "Evaluating Chicago's Efforts to End Social Promotion," presented at the
Workshop on Models and Methods for the Empirical Study of Governance, University of Arizona,
April 29-May 1, 1999.

Roderick, Melissa and Eric Camburn, "Risk and Recovery: Course Failures in the Early Years of High
School," unpublished manuscript, January 1997.

Sandfort, Jodi, "The Structural Impediments to Front-line Human Service Collaboration: The Case of
Welfare Reform," presented at the Annual Meeting of the American Political Science Association,
Boston, September 1998.

Singer, Judith D., "Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and
Individual Growth Models," Journal of Educational and Behavioral Statistics, forthcoming.

Smith, Kevin B. and Kenneth J. Meier, "Politics, Bureaucrats and Schools," Public Administration
Review 54:4 (November/December 1994): 551-558.