Empirical Methods for Investigating Governance
TRANSCRIPT
-
8/3/2019 Empirical Methods for Investigating Governance
1/43
MEANS AND ENDS: A COMPARATIVE STUDY
OF EMPIRICAL METHODS FOR INVESTIGATING
GOVERNANCE AND PERFORMANCE
Carolyn J. Heinrich and Laurence E. Lynn, Jr.
The University of Chicago
DRAFT
September 1999
Prepared for the Fifth National Public Management Research Conference, George Bush School of
Public Service, Texas A&M University, College Station, Texas, December 3-4, 1999, with the support
of the Pew Charitable Trusts.
ABSTRACT
Scholars within different disciplines employ a wide range of empirical approaches to understanding how, why and with what consequences government is
organized. We first review recent statistical modeling efforts in the areas of education,
job-training, welfare reform and drug abuse treatment and assess recent advances in
quantitative research designs. We then estimate governance models with two different
data sets in the area of job training using three different statistical approaches:
hierarchical linear models (HLM); ordinary least squares (OLS) regression models using
individual level data; and OLS models using outcome measures aggregated at the site or
administrator level. We show that HLM approaches are in general superior to OLS
approaches in that they produce (1) a fuller and more precise understanding of
complex, hierarchical relationships in government, (2) more information about the amount of variation explained by statistical models at different levels of analysis, and (3)
increased generalizability of findings across different sites or organizations with varying
characteristics. The notable inconsistencies in the estimated OLS regression coefficients
are of particular interest to the study of governance, since these estimated relationships
are nearly always the primary focus of public policy and public management research.
Table of Contents
Introduction
Empirical Governance Research: Observations on the State of the Art
Some Improvements in Models and Methods
Lingering Limitations of Conventional Approaches
Multilevel Approaches to Governance Research
Applications of Multilevel Modeling
Education
Drug Abuse Treatment
Employment and Training
Comparing Hierarchical Linear Model and Ordinary Least Squares Results
Model Specifications
HLM and OLS Model Results
Conclusions
Tables
Introduction
Scholars of governance within political science, public policy, and public administration describe
their efforts to understand how, why and with what consequences government is organized and
managed as getting inside of or breaking open the "black box" of program implementation (Lynn,
Heinrich and Hill 1999). A wide range of research designs from case studies and historical
accounts to more formal models that include quantitative analysis are employed to explicate the
processes that establish the means and ends of governmental activity, and, in some studies, to assess the
implications of administration and management for individual-level and program outcomes.
Recently, reflecting world-wide interest in performance management, scholars have begun to
advocate research strategies that relate the measurable effects of public programs and policies to the
specific administrative practices and program or institutional features that seem to produce them (Lynn,
Heinrich and Hill, 1999; Mead, 1997, 1999; Smith and Meier, 1994; Milward and Provan, 1998; and
Roderick, 1999). Mead (1997) argues that program impact studies that neglect the influence of local
administrative capacity and structures have little value to policy makers and program administrators.
However, scholars have long recognized the theoretical and methodological difficulties associated with
identifying and describing complex interrelationships across multiple administrative levels within public
organizations and showing how different structural and administrative arrangements, collectively termed
governance, affect program outcomes.
This paper is concerned with assessing the advantages and disadvantages of different research
strategies that may be used in the empirical study of governance and performance. We first review
studies in several disciplines and policy areas including education, welfare reform, job-training and
drug abuse treatment to determine the extent to which advances in statistical modeling, and, in
particular, in hierarchical or multilevel modeling, as well as collaborations between researchers and
public officials, increase the potential for more accurate and informative governance research. Then,
based on analyses of two different data sets that have individual-level observations, we will compare the
performance of three different statistical approaches: hierarchical linear models (HLM); ordinary least
squares (OLS) regression models using individual level data; and OLS models using outcome measures
aggregated at the site or administrator level. We will show that, in general, multilevel modeling strategies
are more likely to produce unbiased estimates of policy, administrative or structural variable effects on
outcomes than traditional, ordinary least squares approaches, particularly when the extent of cross-level
effects operating at the multiple levels of analysis is relatively high.
Empirical Governance Research: Observations on the State of the Art
Most relationships in government and social systems involve activities and interactions that span
multiple levels of organization or systemic structures. Empirical studies designed to analyze these
relationships typically focus on program processes or outcomes at a single organizational (or individual)
level. Some studies group or aggregate individuals (or units of analysis of some type) at a higher level of
organization or structure and attempt to explain average effects or outcomes (e.g., for local offices or
agencies.) Other studies, including experimental and non-experimental program evaluations, analyze the
influence of organizational or structural factors on individual or lower-level unit outcomes by controlling
for these factors in individual-level regressions or by estimating separate individual-level regressions for
different organizational units. These analytical approaches all suffer from the limitations of conventional
statistical methods for estimating linear models with multiple levels of data.
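A minimal simulation (hypothetical numbers, using only `numpy`) illustrates the core limitation: when clients are nested within sites, outcomes within a site share a common component, and the effective amount of independent information is far smaller than the raw client count suggests.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level data: 50 sites, 40 clients per site.
# A shared site effect u_j violates the OLS independence assumption.
n_sites, n_per = 50, 40
u = rng.normal(0.0, 1.0, n_sites)              # site-level random effect (var = 1)
e = rng.normal(0.0, 2.0, (n_sites, n_per))     # client-level noise (var = 4)
y = u[:, None] + e                             # outcomes, shape (sites, clients)

# ANOVA-style variance decomposition.
between = y.mean(axis=1).var(ddof=1)           # variance of site means
within = y.var(axis=1, ddof=1).mean()          # average within-site variance
sigma_u2 = between - within / n_per            # estimated site-level variance
icc_hat = sigma_u2 / (sigma_u2 + within)       # intraclass correlation (true: 0.20)

# Design effect: how much the variance of a mean is understated
# if the 2,000 clients are (wrongly) treated as independent.
deff = 1 + (n_per - 1) * icc_hat
print(f"estimated ICC = {icc_hat:.2f}")
print(f"design effect = {deff:.1f}")
```

Even a modest within-site correlation makes conventional OLS standard errors, computed as if all 2,000 clients were independent, substantially too optimistic.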
Statistical modeling efforts designed to explain individual-level outcomes based on data from
experimental and non-experimental analyses frequently account for factors related to program
administration and implementation with a single program indicator variable, such as a school or local
office indicator. In these studies, we gain little understanding of the interactions and influence of
specific organizational or structural factors on program outcomes. While experimental evaluations of
public programs such as those conducted by the Manpower Demonstration Research Corporation
(MDRC) have consistently included process evaluation components, the qualitative data are
subsequently used for descriptive or interpretive purposes rather than for establishing causal
relationships between administrative practices and outcomes. This use of process analysis is informative
and a potentially valuable complement to quantitative analyses. When they are not incorporated into the
statistical models, however, process analyses tend to be overshadowed in presentations of findings
concerning program impacts.
Some Improvements in Models and Methods
Both researchers and public officials are coming to recognize that accounting for average
program outcomes or impacts provides little information to public managers about how they can
improve program performance. For example, Mathematica Policy Research and its subcontractors are
presently conducting an experimental evaluation of the Job Corps program that involves over 100 sites
across the country and links client data to information about program administration and services
provided at the sites. In addition, Manpower Demonstration Research Corporation (MDRC)
investigators are currently engaged in research, utilizing multi-site, experimental Job Opportunities and
Basic Skills (JOBS) evaluation data combined with the rich array of survey data of program
administrators and staff, that departs from the traditional experimental approach to program evaluation
by formally incorporating process data analyses into the modeling strategies. Unfortunately, relatively
few researchers have access to these types of data sets, substantial in size and collected through costly experimental designs, that allow them to avert challenging statistical issues such as selection bias and comparison group inequalities, inadequate sample sizes, and other data-related problems.
Progress is also being made, however, in the area of non-experimental methodologies using
individual-level data obtained through administrative and other non-experimental sources. One of the
advantages of non-experimental over experimental approaches is that they are better suited to estimating
the heterogeneous effects of heterogeneous treatments or services on clients, and sorting out the
differential effects that programs can have on various client groups. Such information is more likely to
be useful to program administrators than simple average impact estimates.
An example of this type of research is that of Heckman, LaLonde and Smith (forthcoming).
They have produced an exhaustive analysis of the methodological lessons learned in evaluating social
programs through the use of both experimental and non-experimental evaluation methodologies. They
present a comprehensive discussion of a broad array of econometric models and estimators including
their properties, assumptions and information about the way they condition and transform the data to
guide researchers in their use of these methodologies. Somewhat surprising is their conclusion that
there is no single, inherently preferable method or econometric estimator for evaluating public programs: "too much emphasis has been placed on formulating alternative econometric methods for correcting selection bias and too little [attention] given to the quality of the underlying data." Heckman, LaLonde,
and Smith suggest that more effort should be invested in improving the quality of data used in studying
the effects of public programs than in the development of formal econometric methods to overcome
problems generated by inadequate data. More specifically, they show that if biases are clearly defined,
comparable people in the same geographical areas are compared, and relevant background data on
clients are collected (using the same survey questionnaires), problems in using non-experimental
methodologies for evaluating program outcomes will be much less than formerly believed.
Lingering Limitations of Conventional Approaches
These advances in non-experimental evaluation methodologies, in combination with an
increasing number of longer-term collaborations between public officials and scholars engaged in
governance and evaluation research, have made the use of client-level administrative data in statistical
models of program outcomes more feasible and frequent. Lingering problems still constrain what we
can learn from these types of client-level data analyses, however.
One problem is that these models typically explain only a small percentage of the total variation
in individual outcomes. Individual-level data exhibit considerable random variation, and there are also
likely to be a number of unmeasured influences on outcomes at the individual level. In educational
policy research, for example, the oft-cited Coleman Report finding that "schools bring little influence to bear on a child's achievement that is independent of his background and general social context . . ." has undoubtedly been discouraging to educational research. Smith and Meier (1994) argue that, given
the well-established distance between system characteristics and individual performance, using
individual-level data to study educational system performance is a flawed approach.
A second problem is that procedures to assess what portion of the explained variation can be
attributed to any policy or administrative variables included in these types of models are hardly ever
straightforward. For example, Jennings and Ewalt (1998) studied the influence of increased
coordination and administrative consolidation in JTPA programs on ten JTPA participant outcomes
while controlling for demographic and socioeconomic characteristics of participants. Their models
account for 5-29 percent of the total variation in individual outcomes, and the administrative variables
are statistically significant in about half of these models. Some questions that arise include: How much of
the total variation in client outcomes is attributable to policy or program design and implementation
factors? How much of the portion of variation attributable to such factors is explained by the two
administrative variables included? Are there other potentially important administrative variables not
incorporated in these models that might change the observed effects of the coordination and
consolidation variables that are included? We are left not only with uncertainty about how much of a
difference the organization of these programs makes, but also with unclear policy prescriptions for
program administrators (i.e., should they consolidate or not?).
Such limitations in modeling using individual-level outcomes lead Mead (1997, 1999) and
others to urge more research that models administrative processes and program outcomes across
multiple sites using client data aggregated at the site level. Mead (1999) describes this type of research
as "performance analysis": process research that draws formal, statistical connections between administrative practices and outcomes, with programs or sites as the unit of analysis. He argues that "variation [in outcomes] across programs tends to be more systematic," and, therefore, explanatory models using these data tend to be strong. In fact, the proportion of variation explained in
organizational, program- or site-level regressions (as indicated by R2 values) is typically considerably
higher than in similar individual-level regressions. In Mead's (forthcoming) study of the influence of
JOBS program requirements (clients' active/inactive statuses) on changes in Wisconsin welfare
caseloads controlling for caseload demographics and economic factors, he explains 76 percent of the
variation in welfare caseload changes.
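The mechanics behind these higher R2 values can be reproduced with a toy example (hypothetical data, `numpy` only): aggregation averages away individual-level noise, so the same underlying relationship explains far more of the variation in site means than in individual outcomes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 60 sites, 50 clients each. The predictor has both a
# between-site component (z) and a within-site one (w); outcome noise (e)
# is purely individual-level.
n_sites, n_per, beta = 60, 50, 0.5
z = rng.normal(0.0, 1.0, n_sites)
w = rng.normal(0.0, 1.0, (n_sites, n_per))
e = rng.normal(0.0, 2.0, (n_sites, n_per))
x = z[:, None] + w
y = beta * x + e

def r2(x, y):
    """R-squared from a one-predictor OLS fit."""
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    return 1 - resid.var() / y.var()

r2_indiv = r2(x.ravel(), y.ravel())              # client-level regression
r2_group = r2(x.mean(axis=1), y.mean(axis=1))    # same data, site means
print(f"individual-level R2: {r2_indiv:.2f}")
print(f"site-level R2:       {r2_group:.2f}")
```

The site-level fit explains several times more variance not because the substantive relationship is stronger (it is identical by construction) but because averaging over 50 clients removes most of the individual-level noise.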
Sandfort (1998), who studied service technologies in Michigan's Work First program and their
relationship to program outcomes, also maintains that the unit of analysis in policy studies of welfare
reform should be the program or organization. She argues that "the more crucial forces shaping policy are within the organizations themselves," and that individual-level data should be placed within their larger, critical organizational context. In her county-level analyses, she models the proportion of
welfare recipients combining welfare and work in an average month and the proportion leaving welfare.
She includes county-level measures of the proportions of service providers offering specific service
technologies (e.g., job search assistance, soft skills, etc.) and four service delivery structure measures
(e.g., Project Zero, non-profit agency, etc.). She also includes several measures of welfare recipient
demographics. Despite the fairly limited set of explanatory variables available to her, Sandfort explains
approximately 60 percent of the variation in welfare program outcomes.
While Sandfort's work is a noteworthy example of this type of research, it also illustrates how
data access problems can constrain site-level analyses. She acknowledges that her minimal information
on welfare caseload characteristics might contribute to omitted variable bias in her models. Potentially
more problematic for policy analyses, however, is her qualitative finding that there is significant
variation in the service technology used by Work First providers in the same county, even though they
face the same local economic environment. This suggests that potentially important variation in service
delivery approaches at the service provider level is obscured in county-level aggregates used in the
regressions. The services clients take up at this lower level might be related to their individual
characteristics as well as to those of the service providers.
Mead is clear about what he views as the main shortcoming of his 1999 study of Wisconsin
welfare caseloads: the inability to evaluate the effects of work policies on caseloads as definitively as
program impacts on individuals, since cross-sectional analyses "explain variations in change around the state [between counties] rather than the overall trend." The variation being explained in site- or
program-level models is not variation in test scores or earnings but rather variation between sites or
programs in average outcomes. It is inappropriate to use the findings of regression models at one level
of hierarchy to infer what might be going on at lower levels, although information from case studies and
qualitative data analyses can help inform us about these inter-relationships at other levels.
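This slippage, inferring individual-level relationships from group-level coefficients, can be made concrete with a small simulation (hypothetical data): here the within-district slope is negative even though the district-level regression finds a strongly positive one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 40 districts, 30 students each. Within every district
# the x-y slope is -1, but districts with higher mean x have much higher
# mean y, so the district-level slope is strongly positive.
n_dist, n_per = 40, 30
mu = rng.normal(0.0, 1.0, n_dist)                        # district mean of x
x = mu[:, None] + rng.normal(0.0, 1.0, (n_dist, n_per))
y = (2.0 * mu[:, None] - 1.0 * (x - mu[:, None])
     + rng.normal(0.0, 0.5, (n_dist, n_per)))

def slope(x, y):
    return np.polyfit(x, y, 1)[0]

between = slope(x.mean(axis=1), y.mean(axis=1))               # district-level fit
within = np.mean([slope(x[j], y[j]) for j in range(n_dist)])  # mean within-district slope
print(f"district-level slope:          {between:+.2f}")
print(f"average within-district slope: {within:+.2f}")
```

A researcher with only the district-level regression would draw exactly the wrong conclusion about what the relationship looks like for individual students.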
Ferguson's (1991) research on 900 Texas school districts illustrates this type of slippage in
discussing site-level model findings. He uses OLS regressions to explain district average reading and
math scores with a wealth of district-level administrative, structural, socioeconomic and context
measures. He reports positive, statistically significant relationships between student test scores and
higher teacher exam scores, smaller classes and more experienced teachers. He
concludes that "higher-quality schooling produces better reading skills among public school students."
His use of explanations of variation in average school district test scores to draw implications for
students' outcomes ignores the fact that, within districts, there are schools, grades and classrooms
where many of these same factors may be interacting with other administrative and individual-level
factors at these levels to influence student achievement. He further suggests that researchers should
combine the results of studies examining different levels or components of a hierarchical system to link
teacher salaries to teacher quality, teacher quality to students' test scores, and students' test scores to
earnings later in life. Such meta-analyses, while useful for addressing some questions, still risk neglecting
important factors that interact at the multiple levels of hierarchy within school systems.
Recent advances in statistical methodologies allow for empirical analyses of factors interacting at
multiple levels of hierarchy within government and social systems. Such advances show considerable
promise for improving knowledge of how governance affects public sector performance. Research
designs that integrate quantitative and qualitative information and that are based on multi-level models
and on data sets that include individual level observations are conceptually demanding and expensive,
however. Is the extra effort justified in terms of the results that are produced in comparison with less
complex designs? We address this question next.
Multilevel Approaches to Governance Research
While some forms of multilevel modeling have been in use for close to two decades, recent
work by Bryk, Goldstein, Kreft, Raudenbush and Singer has advanced the use of these models in
education and related fields of social policy research. New statistical packages have also been
developed to make these techniques more accessible to researchers. 1
Applications of Multilevel Modeling
Multilevel statistical models have many different potential applications across a number of
disciplinary fields, including sociology, biology and economics, among others. In this paper, we focus
1 A Multilevel Modeling Newsletter and a Harvard University website (maintained by Singer) provide
technical assistance to researchers and promote the dissemination of new research findings on the use of multilevel
(or hierarchical linear) modeling. Some of these statistical techniques, such as the nonlinear form known as
hierarchical generalized linear models (HGLM), are so new that the software developers issue disclaimers with the
release of these programs.
on the use of multilevel models to formulate and test hypotheses about how factors or variables
measured at one level of an administrative hierarchy might interact with variables at another level. The
existence of these types of cross-level interactions or effects is at the crux of the development of
multilevel modeling techniques.
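To make the cross-level idea concrete, the standard two-level specification (in the notation popularized by Bryk and Raudenbush; the symbols below are generic, not estimates from this paper) can be written as:

```latex
% Level 1: clients i nested in sites j, with a client-level predictor X
Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}

% Level 2: a site-level predictor W models the level-1 coefficients
\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}
\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}

% Substituting gives the combined (mixed) model:
Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij}
       + \gamma_{11} W_j X_{ij} + \bigl(u_{0j} + u_{1j} X_{ij} + r_{ij}\bigr)
```

The gamma-11 term is the cross-level interaction, and the composite error in parentheses is neither independent across clients in the same site nor homoskedastic, which is exactly where the OLS assumptions fail.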
In multilevel models, the assumption of independence of observations in the traditional OLS
approach is dropped, and relationships in the data, rather than assumed to be fixed over contexts, are
allowed to vary. The extent to which multilevel modeling improves statistical estimation in comparison
to OLS models depends on the potential for and strength of cross-level effects in the data and the
corresponding extent of variation in the dependent variable to be explained at the different levels of
analyses. When significant cross-level interactions are present but ignored in OLS modeling efforts,
problems arise, including reduced (or inflated) precision of estimates, mis-specification and subsequent
misestimation of model coefficients, and aggregation bias.
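A small Monte Carlo sketch (hypothetical data, `numpy` only) of the precision problem: when a shared site-level effect is ignored, pooled OLS reports standard errors for site-level coefficients that can be several times too small.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: outcome depends on a site-level variable W
# (true coefficient 0.3) plus a site random effect u that pooled OLS ignores.
n_sites, n_per, gamma, reps = 30, 50, 0.3, 500
est, naive_se = [], []
for _ in range(reps):
    W = rng.normal(0.0, 1.0, n_sites)
    u = rng.normal(0.0, 1.0, n_sites)              # ignored site effect
    e = rng.normal(0.0, 1.0, (n_sites, n_per))
    y = (gamma * W[:, None] + u[:, None] + e).ravel()
    X = np.column_stack([np.ones(n_sites * n_per), np.repeat(W, n_per)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)      # pooled OLS fit
    resid = y - X @ b
    s2 = resid @ resid / (y.size - 2)              # classical error variance
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    est.append(b[1])
    naive_se.append(se)

print(f"true sampling SD of the estimate: {np.std(est):.3f}")
print(f"average OLS-reported SE:          {np.mean(naive_se):.3f}")
```

In this setup the OLS-reported standard error is several times smaller than the estimator's actual sampling variability, so significance tests for site-level coefficients are badly overstated.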
Because multilevel modeling expands the possibilities for investigating hierarchical relationships
and cross-level interactions involving two or three levels of organization, many see it as providing a link
between theory and practice in organizational studies (Kreft, 1996.) Bryk and Raudenbush (1992)
criticized the neglect of hierarchical relationships in traditional OLS approaches as fostering "an impoverished conceptualization" that has discouraged the formulation of hypotheses about effects
occurring at and across different levels. Goldstein (1992) also sees multilevel modeling as an
"explorative tool" for theory development about relationships within and between levels of social
systems. He cautions, however, that exploratory analyses should not be substituted for well-grounded
substantive theories and that multilevel models should not be seen as a panacea for all types of complex
data analysis problems. As Kreft (1996) points out, a particular statistical model cannot be optimal in general, only in specific research contexts, and models should be selected based on both the theory or research questions being tested and the type of data collected.
To illustrate with a governance example, if a functioning hierarchy of structural arrangements and
of management activities originating at one level does indeed influence activity at other (particularly
lower) levels of the organization, as they are presumably intended to do (or might do in unintended
ways), then we should anticipate and model the interdependence among hierarchically-ordered
variables. The absence of such cross-level interactions, on the other hand, might imply a high degree of
compartmentalization, or loose coupling across levels, and of sub-unit independence within the
organization. Furthermore, the presence of significant higher-level effects on organizational performance
in the absence of interdependence among hierarchical variables might suggest that lower-level
characteristics are essentially irrelevant to the efficacy of higher-level governance. While many policy
makers dream of circumstances where lower levels of the organization do not influence policy success,
empirical findings to this effect should probably be regarded with some suspicion.
Our literature review suggests that the application of actual hierarchical models in governance
and public management research is of quite recent vintage. Earlier research employed multi-level
concepts but not necessarily hierarchical models. For example, Meyer and Goes (1987), in their study of non-profit hospitals' adoption of innovative technologies, described their analytical approach as "hierarchical regression," but a careful review of studies such as these shows that multilevel modeling
techniques are not in fact utilized. Meyer and Goes assigned their explanatory variables to different
subsets according to the level of analysis to which they apply, e.g., an organizational subset, a leader
subset, an environmental subset, etc., and entered the different subsets into the regression model in stages, examining changes in explained variation (R2) as the variables are added. Unlike HLM, this analytical strategy does not allow for analyses of cross-level effects between variables in the different subsets.
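A minimal sketch of this staged-entry strategy (hypothetical variable subsets and coefficients): each subset is added in turn and the change in R2 recorded, but because no term multiplies variables from different subsets, a genuine cross-level interaction goes undetected until it is modeled explicitly.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical variable subsets by level of analysis, plus a cross-level
# interaction (org x leader) that staged entry of subsets cannot reveal.
n = 300
org = rng.normal(size=(n, 2))      # organizational subset
leader = rng.normal(size=(n, 2))   # leader subset
env = rng.normal(size=(n, 1))      # environmental subset
y = (org @ [0.5, 0.2] + leader @ [0.3, 0.0] + 0.4 * env[:, 0]
     + 0.6 * org[:, 0] * leader[:, 0]          # the cross-level interaction
     + rng.normal(size=n))

def r_squared(X, y):
    """R-squared of an OLS fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ b
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - (resid @ resid) / tss

X, r2_prev = np.empty((n, 0)), 0.0
for name, block in [("organizational", org), ("+ leader", leader), ("+ environment", env)]:
    X = np.column_stack([X, block])
    r2_now = r_squared(X, y)
    print(f"{name:>15}: R2 = {r2_now:.3f} (gain = {r2_now - r2_prev:.3f})")
    r2_prev = r2_now

# Only an explicit interaction term recovers the cross-level effect.
r2_inter = r_squared(np.column_stack([X, org[:, 0] * leader[:, 0]]), y)
print(f"with org x leader interaction: R2 = {r2_inter:.3f}")
```

The staged R2 gains attribute all explained variance to main effects of the subsets; the jump from adding the interaction term is the part of the story the staged strategy never sees.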
Education
Given the large body of empirical research on educational processes and the ongoing, critical
concern for education policy and outcomes, it is not surprising that education researchers have led social
science efforts to develop and apply hierarchical linear models to the analysis of relationships in public
service delivery systems. The early studies of researchers who have published most extensively on the
use of multilevel or hierarchical linear models in education, including Harvey Goldstein (University of London), Anthony Bryk (University of Chicago) and Stephen Raudenbush (Michigan State University), first emerged in the mid- to late 1980s (Goldstein, 1986, 1987, 1989; Bryk and Raudenbush, 1987,
1988.) Bryk and Raudenbush, for example, applied these techniques to analyze school-level effects on
students' growth in mathematics achievement scores and were surprised by the high proportion of
variance in growth rates that was found to be between schools (83%). They continued their
research and developed the Hierarchical Linear Modeling (HLM) statistical program that is now widely
used in social science research (1992, 1999). The research of Goldstein and his colleagues has also
progressed steadily, with a considerable number of applications focused on the British educational
system, including larger-scale school performance reviews mandated by the British government (1992,
1995, 1996.)
More recently, Roderick and Camburn (1997) and Roderick (1999) have been examining the
Chicago public school system's decision to end social promotion and increase students' achievement.
They are drawing upon the wealth of data generated by the Consortium on Chicago School
Research, which has collaborated with the Chicago Public Schools to develop data sets and
methodologies for multilevel studies of school reform implementation.
Roderick and Camburn used hierarchical generalized linear models (the non-linear form of
HLM) to test hypotheses about students' likelihood of failing courses and their likelihood of subsequent
recovery from grade failure. Their models allowed them to assess the potential effectiveness of three
alternative strategies (individual- and system-focused) for improving student performance: (1) improving
the educational preparation of students before they enter high school, (2) creating transition years to
ease stress and increase support for students, and (3) instituting large-scale, school-wide restructuring
and reform efforts to improve teaching practices and school environments. They found a number of
important relationships among individual- and school-level variables and generated strong evidence of
school-level effects that suggest, in their words, "governance and instructional environments . . . matter."
Presently, Roderick (1999) is using three-level hierarchical linear models to analyze changes in
students' grades and test scores over time (level 1); students' paths (promotion, retention, summer
school participation, etc.) through the new policy's implementation (within schools and across years)
and the influence of student characteristics (level 2); and the effectiveness of schools' responses to these
policies as a function of school demographics and characteristics, measures of policy implementation
and teachers' classroom strategies, and the school environment and prior school development (level
3). This study also includes an extensive qualitative component with intensive case studies of each
school's approach to policy implementation and a longitudinal investigation of students' experiences
under the promotional policy.
Drug Abuse Treatment
Early large-scale studies on drug abuse treatment effectiveness included: (1) the Drug Abuse
Reporting Program (DARP), which collected data from approximately 44,000 clients and 52 federally-
funded treatment programs between 1969 and 1972, and (2) the Treatment Outcome Prospective
Study (TOPS), which was intended to expand the data collected in DARP and involved more than
11,000 patients in 41 programs between 1979 and 1981. Longitudinal (non-experimental) analyses of
the cost-effectiveness of various drug abuse treatment modalities were conducted using these client-level
data, although information about programs or organizations was limited in focus to services delivered
and program environments.
These research efforts were followed by other major studies, including the Outpatient Drug
Abuse Treatment Systems (ODATS) study and the Drug Abuse Treatment Outcomes Study
(DATOS). ODATS, which is continuing, surveys unit directors and supervisors in drug abuse treatment
programs to obtain rich, organization-level data on characteristics of the programs, their environments
and their clients. ODATS has progressed through four waves of data collection from a total of more
than 600 programs since 1984. In contrast, a major strength of the DATOS research is the
extensiveness of client-level data obtained from more than 10,000 adults in 99 drug abuse treatment
programs between 1991 and 1993. Research using these data sets addresses questions about program design, treatment practices, and client outcomes (D'Aunno, Sutton and Price, 1991; Fletcher, Tims
and Brown, 1997). Our own exploration of these data suggests that adequate information for a
multilevel investigation of governance and performance is lacking.
In an early study on the effectiveness of methadone treatment for heroin addiction, Attewell and
Gerstein (1979) drew on organizational theory to develop a hierarchical conceptual model of policy
implementation that "link[s] the macrosociology of federal policy on opiate addiction to the microsociology of methadone treatment" (311). They used a case-study approach, including
observational research in clinics, interviews with clients, and analyses of program records from clinics
over multiple years, to investigate managerial responses at the program level to government policy and
institutional regulation, as well as clients' responses and behavior in reaction to subsequent program changes.
Based on qualitative analysis of these observations, they found that compromised policies at the
federal level resulted in ineffective local management practices and poor outcomes for clients.
Gerstein now directs the National Treatment Improvement Evaluation Study (NTIES), which
should permit quantitative, multilevel analyses of drug abuse treatment policies and programs. In the
final report on the NTIES evaluation (1997), Gerstein et al. described how a two-level design permeated every level of the project. This study evaluates both administrative and clinical
(client) processes and outcomes for over 6,000 clients in up to nearly 800 programs. Like the effort led
by the Consortium on Chicago School Research, the design of the NTIES project provides a model for
researchers who are considering plans for a multi-site, multilevel study in any field.
Employment and Training
Our own multilevel study and a separate study by Heinrich (1999) on administrative structures
and management/incentive policies in JTPA programs provide the basis for our comparison of multilevel
modeling techniques with the individual-level and site-level modeling approaches. Heinrich and Lynn
(1999) used data collected during the National JTPA Study on individuals' characteristics and earnings
and employment outcomes, as well as administrative and policy data obtained from the sixteen study
sites over a three-year period, to estimate hierarchical linear models. They found that both site-level
administrative structures and local management strategies (including performance incentives) had a
significant influence on client outcomes.
In her multilevel study of local JTPA service providers and their contracts with a single JTPA
agency, Heinrich also examined the influence of organizational structure or form (i.e., public nonprofit,
private nonprofit, and for-profit service providers) and the use of performance incentives in service
provider contracts on client outcomes, controlling for client characteristics and the services they
received. She similarly found significant effects of the use of performance incentives by local JTPA
agencies on client outcomes.
The data used in these two studies allow for a comparison of different statistical approaches.
Further, the extent of cross-level interaction among hierarchical variables in these two sets of data is quite different. Differences in the extent of intra-class correlation in hierarchical data have important
implications for the relative advantages and disadvantages of using multilevel modeling strategies in
different research contexts, as we shall show.
Comparing Hierarchical Linear Model and Ordinary Least Squares Results
Different models may yield different answers to the same question. Thus researchers should
select modeling approaches that not only fit the data but that are also appropriate ways to address the
questions or hypotheses of interest. In our studies of JTPA programs, two different levels of analysis are represented: (1) the client or individual level, and (2) the site (service delivery area) or contract level, which made it possible to organize and fit the data using several different modeling strategies. For OLS
regressions of individual-level outcomes, the site-level (or contract-level) administrative and
management/incentive policy data were linked to the individual participant records, so that all
participants in a given site and year (or served under a specific contract) had the same site-level (or
contract level) variable values. For the site-level or contract-level OLS regressions, the individual-level
data were collated by site or by contract, and average measures of these variables were entered into the
models, along with the site- or contract-level administrative and policy variables. In the hierarchical
linear models, each of these two levels of data was formally represented by its own sub-model, with
each sub-model specifying the structural relations occurring and the residual variability observed at that
level.
The presence of significant intra-class correlations in hierarchical data (described further in the
following section) violates basic assumptions of the OLS regression model, including: (1) the
independence of observations, and (2) that the number of independent observations is equal for all
variables. One of the most widely extolled features of hierarchical linear models is the capability they
provide for partitioning variance into components associated with the different levels of analysis, and
subsequently allowing the detection and exploration of differences across contexts or groups. For
example, large between-group variances will indicate that an overall regression will mis-estimate
relationships for the individual groups.
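The practical cost of the violated independence assumption can be quantified with the Kish design effect, a standard result (not drawn from the paper itself) giving the factor by which intra-class correlation inflates the sampling variance that OLS assumes away; a minimal Python sketch with illustrative values:

```python
# Kish design effect: variance inflation for an estimate computed from
# clustered data when OLS wrongly assumes independent observations.
# rho is the intra-class correlation; n is the number of individuals
# per site. The specific values below are illustrative, not the paper's.

def design_effect(rho: float, n: int) -> float:
    """Return the factor by which clustering inflates sampling variance."""
    return 1.0 + (n - 1) * rho

# Even rho = 0.03, with a few hundred clients per site, inflates the
# variance roughly tenfold relative to the independence assumption:
print(round(design_effect(0.03, 300), 2))  # -> 9.97
```

Even the "very small" 3 percent site-level share reported later in the paper is therefore enough, with large within-site samples, to make naive OLS standard errors substantially overconfident.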
Model Specifications
One strategy for exploring multilevel data is to first estimate an unconditional means model.
This simple model expresses the outcome, Yij, as a linear combination of the grand mean of Yij (m00, a fixed component) and two random components: the variability between sites or groups (u0j), and the
residual variance associated with the ith unit or individual in the jth site or group (rij). Following a
multilevel modeling approach, the level one individual outcome model is: Yij = b0j + rij , and the level
two model is expressed as a function of the overall mean and random deviations from that mean: b0j=
m00 + u0j. Substituting the level two sub-model into the level one sub-model yields the multilevel model:
Yij = m00 + u0j + rij . (Eq. 1)
Using the covariance parameter estimates from the unconditional means model, one can test
hypotheses about whether the variability between groups and the residual variability within groups are
significantly different from zero. This information may also be used to estimate the intra-class
correlation, which indicates what portion of the total variance in outcomes occurs between sites or
groups (Bryk and Raudenbush 1992 and Singer 1997). A high proportion of intra-class correlation in
the data would suggest that OLS analyses are likely to produce misleading results. As a general rule of thumb, Kreft (1996) defines high intra-class correlation as larger than r = 0.25 (i.e., more than 25 percent of the total variation lying between sites or groups), although much smaller proportions of total variance at
the site- or group-level may be statistically significant and warrant exploration.
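As a concrete illustration, the unconditional means model (Eq. 1) and the intra-class correlation can be estimated with the MixedLM routines in Python's statsmodels package; the data below are simulated for the sketch (the site counts, variances, and variable names are our assumptions, not the NJS or contract data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate hierarchical data following Eq. 1, Yij = m00 + u0j + rij, with a
# true between-site variance of 1 and within-site variance of 4 (ICC = 0.20).
rng = np.random.default_rng(0)
n_sites, n_per_site = 40, 50
site = np.repeat(np.arange(n_sites), n_per_site)
u0j = rng.normal(0.0, 1.0, n_sites)               # site deviations
rij = rng.normal(0.0, 2.0, n_sites * n_per_site)  # individual residuals
df = pd.DataFrame({"y": 10.0 + u0j[site] + rij, "site": site})

# The unconditional means model: an intercept-only mixed model with a
# random intercept for each site.
fit = smf.mixedlm("y ~ 1", df, groups=df["site"]).fit()

tau2 = float(fit.cov_re.iloc[0, 0])  # estimated between-site variance
sigma2 = float(fit.scale)            # estimated within-site variance
icc = tau2 / (tau2 + sigma2)         # share of total variance between sites
print(round(icc, 2))                 # should land near the true 0.20
```

The estimated ICC is then compared against a benchmark such as Kreft's r = 0.25 rule of thumb, or tested directly against zero using the covariance parameter estimates.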
The results reported below were derived from the two separate studies of JTPA programs
discussed earlier: the analyses of data from the sixteen National JTPA Study sites over three years, and
the analyses of data from Heinrich's study of JTPA training providers and their contracts with a local
JTPA agency. The estimation of unconditional means models showed that for the 16 NJS sites (or 48
observations over three years), a very small but still statistically significant percentage (about 3%) of the
total variation in participant outcomes was between sites, (or at the site-level). In the study of
approximately 400 JTPA service provider contracts, a much larger percentage (6-39%) of the total
variation was at the contract administration level. These simple statistics suggest that we should expect
more cross-level interaction in the study of JTPA contracts, and that the results of the three different modeling strategies (individual-level OLS models, site-level or contract-level OLS models, and two-level hierarchical linear models, or HLM) would be more likely to diverge in the contract study findings.
When investigating possible cross-level interactions in hierarchical data, one is advised to begin
with a theory about which variables at the various levels would be expected to interact as well as about
the nature of the interactions. At the second (group or site) level, sub-models denoting the relationships
between level one and level two variables may specify fixed or randomly varying intercepts and/or
slopes. The full multilevel approach, in which both intercepts and slopes vary randomly, is sometimes
used for exploring the full range of potential cross-level effects in hierarchical data. This approach is
similar to fitting a different regression model within each of the level two groups or sites, and this is
typically efficient only when there is a relatively small number of level two observations with large
numbers of level-one cases within each group or site. In our study of the sixteen NJS sites, we
estimated a full multilevel model (also known as an "intercepts- and slopes-as-outcomes" model), which we will also report below. In modeling JTPA participants' earnings outcomes following their participation in JTPA programs, the level one (individual) sub-model is specified as follows:
Yij = b0j + b1jX1j + ...+ bnjXnj + rij, (Eq. 2)
where Yij is a measure of a participant's post-program earnings; the subscript j denotes the site and allows each site to have a unique intercept and slope for each of the level one (individual characteristic)
predictors (X1j to Xnj); the residual, rij, is assumed to be normally distributed with homogeneous variance across sites. In the level two (site) sub-model shown below, all of the predictors (Wj) are
measured at the site level, (i.e., variables describing administrative structures, performance incentive
policies, contracting practices, and economic conditions at the sites):
b0j = g00 + g01W1j + ... + g0nWnj + u0j (Eq. 3)
b1j = g10 + g11W1j + ... + g1nWnj + u1j
. . .
bnj = gn0 + gn1W1j + ... + gnnWnj + unj
The level one and level two sub-models together define the intercepts- and slopes-as-outcomes model.
In the level two sub-model, the level one intercept and beta coefficients are expressed as a linear
function of the level two predictors. In interpreting the results of this model, one examines the estimated
values of the level two coefficients (g01 to gnn) to determine which site-level variables help predict: (1)
why some sites realize better average earnings outcomes than others, and (2) how the effects of some
level one (client-level) variables on outcomes vary across sites.
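An intercepts- and slopes-as-outcomes specification of this kind (Eqs. 2-3) can be sketched in Python with statsmodels: below, x stands in for a single client-level predictor, w for a single site-level predictor, and the x:w term carries the cross-level interaction (g11). All names and simulated values are our assumptions, not the paper's data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_sites, n_per = 30, 40
site = np.repeat(np.arange(n_sites), n_per)
w = rng.binomial(1, 0.5, n_sites)            # site-level predictor (Wj)
x = rng.normal(0, 1, n_sites * n_per)        # client-level predictor (Xij)
u0 = rng.normal(0, 1.0, n_sites)             # random intercepts (u0j)
u1 = rng.normal(0, 0.5, n_sites)             # random slopes (u1j)
y = (2.0 + 1.5 * w[site] + u0[site]            # b0j = g00 + g01*Wj + u0j
     + (1.0 + 0.8 * w[site] + u1[site]) * x    # b1j = g10 + g11*Wj + u1j
     + rng.normal(0, 1.0, n_sites * n_per))    # rij
df = pd.DataFrame({"y": y, "x": x, "w": w[site], "site": site})

# re_formula="~x" lets both the intercept and the slope on x vary randomly
# across sites; "x * w" adds the cross-level interaction term (g11).
fit = smf.mixedlm("y ~ x * w", df, groups=df["site"], re_formula="~x").fit()
print(fit.fe_params[["x", "w", "x:w"]])
```

The estimated coefficient on x:w is the analogue of the g11-type coefficients one examines to see how a site-level variable moderates a client-level effect.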
The results of our estimation of the intercepts- and slopes-as-outcomes model revealed very
few statistically significant relationships among level one and level two predictors, thus indicating that
there was little significant variation across the sites in the effects of client-level variables on outcomes. These findings suggested that we could simplify our model; that is, the relationships between
the level one and level two variables did not appear to vary randomly across the sites, and thus
randomly varying slopes were not necessary. This is also the point at which we brought our theory of
governance in JTPA programs to bear more definitively on the modeling process. For example, we did
not expect the relationship between having a Private Industry Council as the administrative entity (a level
two variable) and the effects of participants' gender (a level one variable) on earnings outcomes to vary
across the sites and years. Rather, we expected (and the intercepts- and slopes-as-outcomes model
results confirmed) that the relationships between administrative structure and the effects of individual-
level characteristics such as gender on outcomes were fairly constant (or fixed) across sites and years.
When one assumes fixed effects for the level one predictors, a different level two sub-model is
specified to combine with the level one sub-model (Eq. 2). This level two sub-model specification, a
variation of the random-intercept model, is:
b0j = g00 + g01W1j + ... + g0nWnj + u0j
b1j = g10 , . . . , bnj = gn0 (Eq. 4)
As Equation 4 shows, the relationships between the level two (site-level) variables and the effects of level one (client-level) predictors on earnings outcomes are fixed (b1j = g10, . . . , bnj = gn0). Combining the level one sub-model (Eq. 2) and this level two sub-model (i.e., substituting Eq. 4 into Eq. 2), the
multilevel model derived is:
Yij = g00 + g01W1j + ... + g0nWnj + g10X1j + ...+ gn0Xnj + u0j + rij. (Eq. 5)
Through estimation of this hierarchical linear model (Eq. 5), one obtains coefficient values for all level
one (X1j to Xnj) and level two (W1j to Wnj) predictors that account for the interrelationships among
these variables (as specified in the level two sub-model, Eq. 4) and that indicate the direction and significance of their effects on participants' earnings.
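The simplified random-intercept specification (Eq. 5), with all level one slopes fixed, maps onto the default MixedLM setup in statsmodels; a sketch with simulated data and hypothetical variable names (x for a client trait, w for a site policy):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
site = np.repeat(np.arange(25), 60)
w = rng.normal(0, 1, 25)                 # site-level predictor (Wj)
x = rng.normal(0, 1, 25 * 60)            # client-level predictor (Xij)
u0 = rng.normal(0, 1, 25)                # random intercepts (u0j)
# Yij = g00 + g01*Wj + g10*Xij + u0j + rij, with g00=1, g01=0.5, g10=2:
y = 1.0 + 0.5 * w[site] + 2.0 * x + u0[site] + rng.normal(0, 1, 25 * 60)
df = pd.DataFrame({"y": y, "x": x, "w": w[site], "site": site})

# With no re_formula, MixedLM fits only a random intercept per group, so
# the slope on x is fixed across sites, as in Eq. 5.
fit = smf.mixedlm("y ~ x + w", df, groups=df["site"]).fit()
print(fit.fe_params.round(2))
```

The fixed-effect estimates returned here correspond to the g10-to-gn0 and g01-to-g0n coefficients discussed in the text.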
Equation 5 above was used in estimating the hierarchical linear models presented in Tables 1
and 2 for the study of the sixteen NJS sites over three years. In Heinrich's study of service provider
contracts, two of the multilevel models (of participants earnings in the first post-program quarter and
their pre- to post-program quarterly earnings changes) employ this same specification (i.e., fixed level
two effects), while the other model specifies both fixed effects and a random effect in the level two sub-
model. The level two sub-model for this second specification is shown below:
b0j = g00 + g01W1j + ... + g0nWnj + u0j
b1j = g10 + g13W3j + u1j (random effect)
b2j = g20 , . . . , bnj = gn0 (fixed effects) (Eq. 6)
HLM and OLS Model Results
The findings of the hierarchical linear models are shown in the first column of Tables 1-5. The
second and third columns in each table show the results of the individual-level OLS and site-level OLS
regressions, estimated using the same data and exactly the same set of dependent and explanatory
variables as in the multilevel models.
In examining the findings in these tables, the fixed effect coefficient estimates (g10 to gn0) of the HLM models (in the first column) are directly comparable to the OLS beta coefficient estimates of the individual-level regressions (in the second column). In the site-level regressions, the variables that are indicator (or
binary) in form at the individual level (e.g., single head of household, welfare recipient, etc.) are aggregated
and become average proportions at the site-level. To allow for comparisons of these site-level OLS
coefficients with the coefficient estimates of binary variables in the other models, these coefficient estimates
are multiplied by their site-level average values to calculate estimated effects for the average individual (in
the third column).
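This aggregation step can be sketched in pandas (toy numbers and column names, not the study's data): collapsing client records to site means turns binary indicators into proportions, and a site-level coefficient times the average proportion yields the kind of "effect for the average individual" reported in the third column:

```python
import pandas as pd

clients = pd.DataFrame({
    "site":     [1, 1, 1, 2, 2, 2],
    "earnings": [900.0, 1200.0, 0.0, 1500.0, 800.0, 1100.0],
    "welfare":  [1, 0, 1, 0, 0, 1],   # binary at the individual level
})

# Collapse to site level: the binary indicator becomes a proportion
# (2/3 of site 1's clients and 1/3 of site 2's are welfare recipients).
site_level = clients.groupby("site").mean()
print(site_level)

# Scale a hypothetical site-level OLS coefficient by the average
# proportion to get an estimated effect for the average individual.
coef = -7.91                          # illustrative coefficient value
avg_effect = coef * site_level["welfare"].mean()
print(round(avg_effect, 2))
```

Note that the groupby-mean step is exactly where individual-level variation, often the bulk of total variation, is discarded.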
The random effect estimated in the HLM model of hourly wages at termination (in the service
provider contracts study) indicates that there is a statistically significant, cross-level interaction between
the effects of contract performance incentives and the proportion of participants under age 18 that
varies across sites. The positive sign on this random coefficient indicates that, on average, sites with
higher proportions of young participants that also include performance incentives in their providers' contracts realize better hourly wage outcomes for participants. (For additional
discussion of the substantive findings of the models shown in Tables 1-5, see Heinrich and Lynn (1999) and Heinrich (1999).)
We begin the technical comparison of these modeling strategies by turning to Tables 1 and 2,
which display the results of the NJS data analyses. It is apparent that the HLM (column 1) and
individual-level OLS (column 2) estimated variable coefficients are very close for both individual-level
and site-level predictors. This is particularly evident in Table 2 (the model of participants' first post-program year earnings), where 97 percent of the site-level variation is explained by the model. In
general, these findings confirm that where a very small percentage of variation occurs at the site level
(approximately 3%), OLS and HLM methods are likely to produce comparable estimates of individual
and site-level effects. Another reason for the similarity of these two sets of results is that statistical tests
(performed using HLM model output) showed that all of the statistically significant variation at the site
level was explained away by the predictors included. That is, there was no statistically significant
variation at the site level that remained to be explained or accounted for in these models (or no omitted
variable bias at level two).
One might reasonably ask what the advantage is of using HLM in these cases. First, we can
identify how much of the variation in outcomes lies at the different levels of analysis. Second, we can
assess what proportion of this variation (at both site- and individual-levels) is explained by our models
and whether any statistically significant variation remains to be explained. In addition, researchers can
use various analytical strategies to examine and check for patterns or irregularities in the residuals at
both the site- or group-level (u0j) and the individual-level (rij). Bryk, Raudenbush and Congdon (1999)
and Goldstein (1995) describe a number of these techniques, such as Q-Q plots, plots of empirical Bayes (level two) residuals versus least squares residuals, and plots of empirical Bayes residuals against level two predictors, to assess model fit and reliability.
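Generic analogues of these checks (not the cited authors' own code) can be sketched with statsmodels and scipy on simulated data: the empirical Bayes site-level residuals (u0j) come from the fitted model's random_effects, the level one residuals (rij) from resid, and probplot supplies Q-Q coordinates:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(3)
site = np.repeat(np.arange(30), 40)
y = 5.0 + rng.normal(0, 1, 30)[site] + rng.normal(0, 2, 30 * 40)
df = pd.DataFrame({"y": y, "site": site})
fit = smf.mixedlm("y ~ 1", df, groups=df["site"]).fit()

# Empirical Bayes (shrunken) estimates of the site-level residuals u0j,
# one per site:
eb = np.array([float(re.iloc[0]) for re in fit.random_effects.values()])

# Q-Q coordinates for the level one residuals rij; plotting osm against
# osr (or checking their correlation) is a quick normality diagnostic.
(osm, osr), _ = stats.probplot(np.asarray(fit.resid), dist="norm")
qq_corr = float(np.corrcoef(osm, osr)[0, 1])
print(len(eb), round(qq_corr, 3))
```

Patterns in the empirical Bayes residuals, such as a few extreme sites or a relationship with a level two predictor, flag exactly the kind of unexplained site-level variation discussed above.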
Comparing HLM and individual-level OLS results for the service provider contract models
(Tables 3-5), where there was a much larger percentage of variation at level two (or between
contracts), the variable coefficient estimates are still similar, although not as close as those in Tables 1
and 2. The differences in estimated coefficient values are more noticeable in Tables 3 and 4, where
approximately 30-40 percent of the total variation was at the contract level. While the level two
variables in these models did substantially reduce the amount of contract-level variation that was not
accounted for, there were still statistically significant differences between the outcomes by contract that
remained to be explained.
The most striking findings of this investigation of modeling strategies, however, can be seen in
the comparison of the site-level OLS model results with those of the HLM and individual-level OLS
regressions. In contrast to the comparable findings of the HLM and individual-level OLS models, the
site-level models produce both inconsistent and seemingly inaccurate estimates of some of the
individual- and site-level coefficients. (See the italicized numbers in the third column of Tables 1-5.)
While the percent of variation explained in the site- (or contract) level OLS models and the HLM
models is similar, the size, sign and statistical significance of some of the coefficient values and estimated
effects differ noticeably across different outcomes in the respective studies as well as from the HLM and
individual-level OLS model results. Given that some of the seemingly anomalous estimated effects in the
site-level OLS models of JTPA participant outcomes are contrary to the findings of other JTPA
research (e.g., the positive effect of being a high school dropout on earnings in four of the five site-level
OLS models), we believe that it is the site-level OLS models that are probably inaccurate. These
findings also imply, contrary to Mead's argument, that modeling administrative processes and program outcomes across multiple sites with data on clients aggregated at the site level may be a less reliable
approach than similar (multiple-site) client-level data analyses.
The notable inconsistencies in the site- or contract-level policy/administrative/structural
coefficients are of particular importance for the study of governance, since these variables are nearly
always the primary focus of public policy or administration studies. In many of the studies (some
discussed earlier) that use site- or organization-level approaches, it is common to see researchers
reporting high levels of variation explained with a relatively small number of policy or governance
variables. A few, such as Mead (forthcoming), make it clear that site- or organization-level OLS
models are not explaining variation in individual outcomes, but rather variation between average
outcomes across the sites or organizations. Our findings underscore that ignoring the variation in
individual-level outcomes and the potential cross-level effects between variables operating at individual-
and site- or organization levels may well lead to inaccurate estimates of policy/administrative/structural
variable effects.
In a recent study that also compared multilevel modeling strategies to individual- and group-level
OLS regressions, Krull and MacKinnon (1999) reached a similar conclusion. In discussing the
individual- versus group-level models, they also pointed out that when individual-level data are
aggregated, the ability to predict individual-level variation, which frequently comprises the majority of
total variation, is eliminated. Therefore, researchers should expect that individual and group level
analyses of the same data might indicate relationships that differ in both magnitude and direction.
Overall, they concluded that multilevel-based estimates of the standard error showed considerably less
bias than OLS-based estimates, and that OLS analyses were less efficient than multilevel analyses
(433).
To summarize, in the absence of multilevel analyses, researchers are unable to determine how
much of the total variation in outcomes lies at the site- or organization level (i.e., the extent of intra-class
correlation) and how much of it one is able to explain with a given model specification. In Table 2,
where the amount of intra-class correlation was small and the site-level variables included in the models
explained nearly all of the site-level variation in outcomes, the estimates of policy/administrative/structural effects produced by the three different modeling strategies were much closer. Without this
information, however, how does one assess the probable accuracy of estimated effects? While some
researchers support their quantitative studies with qualitative, hands-on components, it is also not
uncommon for them to report some findings that are inconsistent with their hypothesized effects. In
these cases, how does one ascertain whether it is the theory or the model specification that is in error?
The results of the analyses presented here suggest that more attention should be given to multilevel
modeling as a strategy for empirically investigating the linkages between governance and performance.
Conclusions
Multilevel modeling holds considerable promise for governance research. Rapidly increasing
computing capacity and new developments in statistical theories have now made programs for multilevel
modeling (HLM, HOMALS, VARCL, BIRAM, and SAS mixed models are a few examples; see Kreft
and Aschbacher 1994) accessible to anyone willing to invest some time in learning about the underlying
theories and how to apply them. In a recent workshop, "Models and Methods for the Empirical Study of Governance," Ann Chih Lin asked, however, whether our quest to advance the empirical study of governance will drive a push to create "Godzilla-like" data sets and their subsequent analysis and re-analysis. She noted that developing and supporting the analyses of large-scale, multilevel (and
frequently longitudinal) data sets such as those described in this paper require substantial resources that
might otherwise provide support to many smaller projects. One might question, for example, whether
substantially more knowledge might be gained from a multi-site, multilevel empirical study of drug abuse
treatment programs (such as the one NTIES might allow) than from a number of smaller-scale case studies like that produced by Attewell and Gerstein.
While the creation or re-analysis of multi-site, multilevel data sets might not always be feasible
or the best use of scarce research funds, we believe that when it is possible to develop and work with
these types of data and methods, the advantages gained in terms of (1) a fuller and more precise
understanding of complex, hierarchical relationships, (2) more information about the amount of variation
explained by statistical models at different levels of analysis, and (3) increased generalizability of findings
across different sites or organizations with varying (observable) characteristics make the investment in multilevel modeling worthwhile.
When one doesn't know how much of the total variation in the dependent variable (e.g., a
program outcome) lies at the various levels of organization (i.e., the extent of intra-class correlation), the
results of an individual- or higher-level OLS regression should be interpreted with considerable caution.
As in any scientific field, research that attempts to replicate the most important findings of these studies
is desirable, although this also becomes more challenging when data sets (and subsequently statistical
models) are not directly comparable. Case-study or other qualitative research components can provide
important background for the interpretation of OLS regression findings in these cases, but they typically
do not make the findings more generalizable across a range of program or organizational contexts.
When presenting and discussing their findings, governance researchers should be clear not only about
what they are able to measure and explain in their models but also about the limitations on these findings
attributable to the models, methods, and data employed.
TABLE 1: Hierarchical linear and OLS models of JTPA participants first post-program
quarter earnings outcomes (National JTPA Study data analyses)
Earnings in first post-program quarter
Predictors - (individual level) | Hierarchical linear model | OLS - individual level | OLS - site level | (average effect)
Intercept 190.55 (0.40) 208.92 (0.51) 33.00 (0.02)
Gender (1=male) 517.88*** (6.51) 513.19*** (6.46) -21.26*** (-2.73) -903.55
Age 22-29 years 369.75*** (3.98) 374.64*** (4.04) 3.91 (0.52) 110.65
Age 30-39 years 240.91** (2.36) 244.29** (2.40) 37.59*** (3.88) 967.94
Age 40 and over years 53.84 (0.42) 57.86 (0.45) -15.17 (-1.00) 165.65
Black -235.16** (-2.29) -239.48** (-2.35) -12.69** (-2.33) -365.47
Hispanic -109.56 (-0.90) -133.60 (-1.11) -0.33 (-0.06) -3.55
Divorced, widowed or separated 87.89 (1.02) 91.86 (1.07) -32.28*** (-4.53) -864.78
No high school degree -350.57*** (-4.52) -349.11*** (-4.51) 20.70* (1.87) 929.22
Some post high school education 360.81*** (3.58) 357.83*** (3.55) 2.41** (0.34) 41.77
Welfare recipient at time of application -293.05*** (-3.71) -298.60*** (-3.78) -7.91 (-1.41) -425.00
Children under age six 63.58 (0.76) 66.71 (0.79) -20.20 (-1.40) -445.01
Employment-unemployment transition in year before enrollment -295.66*** (-3.92) -297.27*** (-3.95) 40.57*** (5.31) 2582.69
Earnings in year before enrollment 0.09*** (9.70) 0.09*** (9.72) -0.11 (-1.23)
Received classroom training 100.36 (1.22) 99.52 (1.22) -5.42 (-1.57) -387.26
Received on-the-job training 388.36*** (3.58) 388.34*** (3.56) 26.41*** (3.39) 457.42
Predictors - (site level)
PIC is the administrative entity 446.41*** (3.60) 436.12*** (4.16) 883.00*** (4.80) 404.68
PIC and LEO/CEO are equal partners -472.55** (-2.00) -436.75** (-2.12) 1170.10*** (3.53) 682.52
Percent of services provided directly by administrative entity -548.28 (-1.45) -487.64 (-1.53) -259.20 (-0.56)
Percent of performance-based contracts -650.32* (-1.91) -550.17* (-1.90) 1198.20** (2.36)
Weight accorded to employment rate standard 4260.41*** (3.21) 4188.11*** (3.75) 341.00 (0.22)
Minimum number of standards sites must meet to qualify for performance bonuses 21.13 (1.10) 17.33 (1.12) -2.62 (-0.09)
Earnings in first post-program quarter (Table 1, continued)
Predictors | Hierarchical linear model | OLS - individual level | OLS - site level | (average effect)
Requirement that performance bonuses must be used to serve highly disadvantaged groups -242.70** (-2.02) -252.56** (-2.40) -1543.80*** (-3.44) -289.46
Southern region 433.03 (1.49) 362.93 (1.43) -1643.00*** (-2.77) -410.75
Midwestern region 535.74*** (2.90) 538.03** (3.31) 11.50 (0.03) 3.59
Western region 825.04** (2.22) 752.67** (2.32) -445.40 (-0.82) -113.35
Unemployment rate 11725.86*** (2.67) 11546.00*** (3.10) 5105.00 (0.90)
Model predicting power - percent of variation explained by model: 6% individual-level, 86% between-site | R2 = 11.3% | R2 = 85.4%
Coefficient value (t-ratio in parentheses): *significant at a
TABLE 2: Hierarchical linear and OLS models of JTPA participants first post-program
year earnings outcomes (National JTPA Study data analyses)
Earnings in first post-program year
Predictors - (individual level) | Hierarchical linear model | OLS - individual level | OLS - site level | (average effect)
Intercept 1117.10 (0.76) 1093.36 (0.77) 15892.00*** (3.52)
Gender (1=male) 2144.18*** (7.76) 2143.20*** (7.76) -8.79 (-0.37) -373.58
Age 22-29 years 1455.00*** (4.51) 1456.07*** (4.51) -68.84*** (-2.83) -1948.17
Age 30-39 years 1000.99*** (2.82) 1000.36*** (2.82) -92.64*** (-2.62) -2385.48
Age 40 and over years 397.21 (0.89) 398.69 (0.90) 6.86 (0.16) 280.71
Black -1079.14*** (-3.04) -1081.02*** (-3.05) -24.82* (-1.79) -714.82
Hispanic -699.64 (-1.66) -714.26* (-1.70) 4.47 (0.28) 48.14
Divorced, widowed or separated 325.55 (1.09) 326.86 (1.09) -48.64** (-2.24) -1303.07
No high school degree -1424.55*** (-5.29) -1423.04*** (-5.29) -42.09 (-1.22) -1889.42
Some post high school education 1046.91*** (2.99) 1047.70*** (2.99) -65.02*** (-2.90) -1126.80
Welfare recipient at time of application -1006.49*** (-3.67) -1012.47*** (-3.69) -49.09*** (-2.91) -2637.61
Children under age six 496.51* (1.70) 500.15* (1.71) -69.46* (-1.67) -1530.20
Employment-unemployment transition in year before enrollment -862.80*** (-3.30) -865.79*** (-3.31) -4.08 (-0.15) -259.73
Earnings in year before enrollment 0.33*** (10.59) 0.33*** (10.59) 0.40 (1.30)
Received classroom training 125.71 (0.44) 132.74 (0.47) -23.74*** (-2.52) -1696.22
Received on-the-job training 1195.17*** (3.17) 1197.98*** (3.18) -40.20 (-1.53) -696.26
Predictors - (site level)
PIC is the administrative entity 1737.40*** (4.59) 1727.15*** (4.74) 1626.90*** (2.98) 745.12
PIC and LEO/CEO are equal partners -1933.65*** (-2.61) -1949.44*** (-2.73) -438.90 (-0.56) -255.88
Percent of services provided directly by administrative entity -2618.57** (-2.26) -2604.93** (-2.35) -564.00 (-0.43)
Percent of performance-based contracts -2719.45*** (-2.60) -2709.02*** (-2.69) -2033.00* (-1.80)
Weight accorded to employment rate standard 15887.75*** (3.93) 15888.00*** (4.09) 15710.00*** (3.13)
Minimum number of standards sites must meet to qualify for performance bonuses 22.25 (0.39) 21.50 (0.40) 102.00 (1.16) 11.74
Earnings in first post-program year (Table 2, continued)
Predictors | Hierarchical linear model | OLS - individual level | OLS - site level | (average effect)
Requirement that performance bonuses must be used to serve highly disadvantaged groups -866.66** (-2.30) -865.32** (-2.36) -1376.00 (-1.35) -258.00
Southern region 2035.83** (2.24) 2025.88** (2.30) 3101.00** (2.15) 775.25
Midwestern region 1936.15*** (3.33) 1940.48*** (3.44) 4367.00*** (3.99) 1364.69
Western region 3215.92*** (2.76) 3214.16*** (2.85) 3760.00*** (2.81) 940.00
Unemployment rate 49955.52*** (3.71) 50558.00*** (3.90) 58873.00*** (3.53)
Model predicting power (percent of variation explained): 13% individual-level, 97% between-site (hierarchical linear model); adjusted R2 = 13.2% (individual-level OLS); adjusted R2 = 87.6% (site-level OLS)
Coefficient value (t-ratio in parentheses): *significant at a
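The three estimation strategies compared in Table 2 can be illustrated with a minimal, self-contained sketch on synthetic two-level data, using the MixedLM routine in Python's statsmodels. The variable names (earnings, prior_earn, policy, site) are hypothetical stand-ins for illustration only, not the JTPA measures analyzed in the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic two-level data: participants nested in sites (illustrative only).
rng = np.random.default_rng(0)
n_sites, n_per_site = 30, 50
site = np.repeat(np.arange(n_sites), n_per_site)
site_policy = rng.binomial(1, 0.5, n_sites)        # site-level predictor
site_effect = rng.normal(0, 500, n_sites)          # unobserved site intercepts
prior_earn = rng.normal(8000, 2000, n_sites * n_per_site)
earnings = (9000 + 0.3 * prior_earn + 1500 * site_policy[site]
            + site_effect[site] + rng.normal(0, 3000, site.size))
df = pd.DataFrame({"earnings": earnings, "prior_earn": prior_earn,
                   "site": site, "policy": site_policy[site]})

# (1) Hierarchical linear model: a random intercept for each site.
hlm = smf.mixedlm("earnings ~ prior_earn + policy", df,
                  groups=df["site"]).fit()

# (2) Individual-level OLS: ignores the nesting, so standard errors on
#     site-level predictors tend to be understated.
ols_ind = smf.ols("earnings ~ prior_earn + policy", df).fit()

# (3) Site-level OLS on aggregated means: discards within-site variation.
agg = df.groupby("site").mean()
ols_site = smf.ols("earnings ~ prior_earn + policy", agg).fit()

for name, res in [("HLM", hlm), ("Individual OLS", ols_ind),
                  ("Site-level OLS", ols_site)]:
    print(f"{name}: policy coef = {res.params['policy']:.1f}")
```

On data like these, all three recover the site-level policy effect in expectation, but the aggregated model's coefficients are far less stable, which is the pattern the tables in this paper document with real JTPA data.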
Table 3: Hierarchical linear and OLS models of JTPA participants'
hourly wages at termination (study of service provider contracts)
Hourly wage at termination model predictors | Hierarchical linear model | Individual-level OLS | Contract(or)-level OLS

Individual-level variables
Intercept | 2.02*** (11.28) | 2.37*** (20.77) | 2.85*** (3.30)
Participant characteristics
Under age 18 years | -0.31*** (-4.56) | -0.53*** (-9.23) | -0.001 (-0.42) -0.04
Age 22-29 years | 0.34*** (4.67) | 0.63*** (9.94) | 0.012*** (2.78) 0.21
Age 30-39 years | 0.37*** (4.86) | 0.68*** (10.12) | 0.019*** (3.94) 0.28
Age 40 years and over | 0.51*** (5.94) | 1.02** (15.50) | 0.022*** (5.40) 0.29
Male | -0.03 (-0.86) | -0.05 (-1.33) | -0.005 (-1.44) -0.26
African-American | -0.10** (-2.26) | -0.11** (-2.53) | 0.002 (0.57) 0.12
Hispanic | -0.01 (-0.09) | -0.02 (-0.25) | 0.004 (0.84) 0.04
Single head of household | 0.11** (1.96) | 0.09 (1.61) | -0.008 (-1.53) -0.15
Welfare recipient | -0.17*** (-4.10) | -0.20*** (-4.83) | -0.007* (-1.66) -0.20
No high school degree | -0.15*** (-2.80) | -0.02 (-0.34) | 0.001 (0.27) 0.02
Post high-school education | 0.13*** (2.50) | 0.20*** (3.61) | 0.002 (0.42) 0.03
College graduate | 0.32 (1.48) | 0.63*** (2.85) | -0.003 (-0.22) -0.003
Minimal work history | 0.03 (0.60) | -0.03 (-0.65) | 0.001 (0.43) 0.05
Unemployed at application | -0.30*** (-4.17) | -0.28*** (-3.76) | -0.006 (-0.75) -0.31
Not in labor force | -0.54*** (-7.00) | -0.68*** (-8.65) | -0.014* (-1.79) -0.62
Zero earnings in pre-program year | -0.43*** (-8.96) | -0.67*** (-14.72) | -0.009*** (-2.99) -0.36
Training services
Received basic/remedial education | -0.10 (-1.54) | -0.14*** (-2.96) | -0.005** (-2.24) 0.10
Received vocational training | 0.41*** (5.91) | 0.39*** (8.31) | 0.001 (0.59) 0.04
Received on-the-job training | 1.40*** (17.53) | 1.35*** (24.58) | 0.022*** (7.06) 0.31
Received job search/job club | -0.02 (-0.16) | 0.09 (0.91) | 0.013*** (3.18) 0.07
Length of training (in months) | -0.09*** (-17.14) | -0.09*** (-19.27) | -0.092*** (-3.16) -0.48
Economic/environmental factors
Percent change in employment, 1988-1989 | -8.68*** (-3.83) | -16.28*** (-7.56) | -22.75 (-1.24)
Percent change in employment, 1989-1990 | 2.22*** (9.67) | 2.26*** (11.18) | 2.98** (2.32)
Percent change in employment, 1990-1991 | 0.26 (0.94) | 0.70*** (4.08) | 1.59* (1.75)
Percent change in employment, 1991-1992 | 32.01* (1.87) | 24.35** (2.16) | 73.01 (1.16)
Percent change in employment, 1992-1993 | 51.37 (1.17) | 1.63 (0.07) | -84.47 (-0.96)
Contract-level variables
Private, nonprofit contractor | 0.29* (1.84) | 0.10* (1.75) | 0.23 (1.17) 0.15
For-profit contractor | 0.77*** (4.10) | 0.40*** (6.12) | 0.53** (2.18) 0.11
Performance incentives in contract | 0.34*** (2.72) | 0.32*** (7.11) | 0.13 (0.78) 0.09
Random effect: Under age 18 years by performance incentives in contract | 0.30*** (2.67) | n.a. | n.a.
Predicting power (percent of variation explained) | 9.0% (individual); 68.0% (contract) | adjusted R2 = 34.7% | adjusted R2 = 68.2%
Coefficient value (t-ratio in parentheses): *significant at a
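The random effect reported at the bottom of Table 3, an individual-level slope (under age 18) that is allowed to vary with a contract-level condition (performance incentives), can be sketched as a mixed model with a random slope and a cross-level interaction. The following is a hedged illustration on synthetic data using statsmodels; all variable names (wage, under18, incentives, contract) are hypothetical, not the study's actual measures.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: participants nested in contracts. The effect of being
# under age 18 (individual level) is allowed to differ by whether the
# contract carries performance incentives (contract level).
rng = np.random.default_rng(1)
n_contracts, n_per = 40, 40
contract = np.repeat(np.arange(n_contracts), n_per)
incentives = rng.binomial(1, 0.5, n_contracts)
under18 = rng.binomial(1, 0.2, n_contracts * n_per)
u0 = rng.normal(0, 0.3, n_contracts)   # random contract intercepts
u1 = rng.normal(0, 0.1, n_contracts)   # random slopes on under18
wage = (6.0 + 0.3 * incentives[contract]
        - 0.5 * under18 + 0.3 * under18 * incentives[contract]
        + u0[contract] + u1[contract] * under18
        + rng.normal(0, 0.8, contract.size))
df = pd.DataFrame({"wage": wage, "under18": under18,
                   "incentives": incentives[contract],
                   "contract": contract})

# Random intercept plus a random slope on under18 (re_formula), and the
# fixed cross-level interaction under18 x incentives.
m = smf.mixedlm("wage ~ under18 * incentives", df,
                groups=df["contract"], re_formula="~under18").fit()
print(m.params[["under18", "incentives", "under18:incentives"]])
```

A significant interaction term indicates that the individual-level slope shifts with the contract-level condition; the OLS specifications in the table have no analogue for the random portion of that slope, which is why those cells report "n.a."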
Table 4: Hierarchical linear and OLS models of JTPA participants' first post-program
quarter earnings outcomes (study of service provider contracts)
First post-program quarter earnings model predictors | Hierarchical linear model | Individual-level OLS | Contract(or)-level OLS

Individual-level variables
Intercept | 1367*** (19.03) | 1427*** (23.65) | 2352.9*** (4.81)
Participant characteristics
Under age 18 years | -145*** (-4.61) | -150*** (-4.94) | 2.0 (0.89) 79
Age 22-29 years | 238*** (6.15) | 248*** (7.11) | 4.5* (1.91) 79
Age 30-39 years | 339*** (8.41) | 341*** (9.29) | 13.0*** (4.51) 189
Age 40 years and over | 315*** (7.33) | 280*** (7.85) | 10.4*** (4.87) 136
Male | 8 (0.39) | -10 (-0.53) | -3.2 (1.56) 166
African-American | -114*** (-4.72) | -113*** (-5.14) | -4.1*** (-2.93) -247
Hispanic | 49 (1.29) | 78** (2.18) | -1.2 (-0.49) 11
Single head of household | 29 (0.95) | 69** (2.30) | 1.0 (0.36) 19
Welfare recipient | -93*** (-4.17) | -77*** (-3.52) | -2.5 (-1.14) -72
No high school degree | -68** (-2.39) | -82*** (-3.06) | 1.1 (0.63) -19
Post high-school education | -15 (-0.50) | -0.2 (-0.007) | -10.8*** (3.32) -156
College graduate | -94 (-0.96) | -59 (-0.61) | -6.0 (0.95) -6
Minimal work history | -87*** (-4.04) | -90*** (-4.38) | 0.7 (0.54) 33
Unemployed at application | -250** (-6.38) | -270*** (-6.87) | -8.3* (-1.86) -430
Not in labor force | -411*** (-9.71) | -442*** (-10.53) | -12.0*** (-2.79) -529
Zero earnings in pre-program year | -513*** (-21.24) | -574*** (-24.31) | -7.3*** (-4.77) -292
Training services
Received basic/remedial education | -39 (-1.37) | -89*** (-3.87) | -1.9* (-1.84) -37
Received vocational training | 38 (1.15) | 78*** (3.07) | -0.4 (-0.35) -18
Received on-the-job training | 433*** (11.23) | 513*** (17.02) | 13.2*** (7.97) 187
Received job search/job club | 183*** (3.07) | 217*** (4.24) | 6.7*** (2.59) 34
Length of training (in months) | -18*** (-7.16) | -16*** (-6.92) | -17.1 (-1.13) -89
Economic/environmental factors
Percent change in employment, 1988-1989 | -2169* (-1.72) | -2055* (1.88) | 7358.9 (0.76)
Percent change in employment, 1989-1990 | 540*** (4.63) | 442*** (4.46) | 235.2 (0.37)
Percent change in employment, 1990-1991 | -52 (-0.45) | -11 (-0.13) | -241.0 (-0.54)
Percent change in employment, 1991-1992 | 602 (0.08) | 3784 (0.70) | 11390 (0.38)
Percent change in employment, 1992-1993 | -12194 (-0.71) | -22309* (-1.85) | -61197 (-1.35)
Contract-level variables
Private, nonprofit contractor | -53 (-1.07) | -26 (-0.90) | -179.3* (-1.81) -114
For-profit contractor | 55 (0.91) | 48 (1.43) | -113.1 (-0.92) -24
Performance incentives in contract | 86** (2.11) | 53** (2.32) | 97.4 (1.12) 65
Predicting power (percent of variation explained) | 9.0% (individual); 87.0% (contract) | adjusted R2 = 33.4% | adjusted R2 = 66.1%
Coefficient value (t-ratio in parentheses): *significant at a
Table 5: Hierarchical linear and OLS models of JTPA participants' pre- to post-program
quarterly earnings change outcomes (study of service provider contracts)
Pre- to post-program quarterly earnings change model predictors | Hierarchical linear model | Individual-level OLS | Contract(or)-level OLS

Individual-level variables
Intercept | 447*** (6.76) | 497*** (8.51) | 2343.7*** (3.74)
Participant characteristics
Under age 18 years | -79*** (-2.64) | -108*** (-3.72) | -0.8 (-0.29) -32
Age 22-29 years | -10 (-0.26) | -12 (-0.35) | 1.5 (0.49) 26
Age 30-39 years | -75* (-1.93) | -85** (-2.36) | -2.5 (-0.67) -36
Age 40 years and over | -75* (-1.83) | -106*** (-3.00) | -3.4 (-1.24) -44
Male | -7 (-0.37) | -14 (-0.74) | -3.0 (-1.15) -155
African-American | -117*** (-5.00) | -100** (-4.62) | -3.8** (-2.13) -229
Hispanic | 10 (0.27) | 22 (0.64) | -8.1*** (-2.57) -74
Single head of household | 85*** (2.84) | 114*** (3.91) | 1.5 (0.43) 28
Welfare recipient | 25 (1.16) | 32 (1.53) | 1.0 (0.37) 29
No high school degree | -1 (-0.03) | -6 (-0.25) | 1.3 (0.65) 23
Post high-school education | -44 (-1.46) | -46 (-1.56) | -20.7*** (-4.95) -298
College graduate | -148 (-1.44) | -146 (-1.42) | 1.2 (0.16)
Minimal work history | 56*** (2.68) | 49** (2.44) | 11.9 (1.09) 90
Unemployed at application | -274*** (-7.08) | -284*** (-7.36) | -15.5*** (-2.70) -802
Not in labor force | -311*** (-7.52) | -321*** (-7.88) | -15.5*** (-2.81) -684
Zero earnings in pre-program year | 111*** (4.69) | 61*** (2.65) | -1.0 (-0.52) -40
Training services
Received basic/remedial education | -65*** (-2.48) | -78*** (-3.56) | 0.3 (0.25) 6
Received vocational training | -9 (-0.31) | 51** (2.05) | 2.5* (1.69) 119
Received on-the-job training | 206*** (5.62) | 239*** (8.03) | 13.6*** (6.37) 193
Received job search/job club | 21 (0.36) | 58 (1.12) | 3.7 (1.13) 19
Length of training (in months) | -7*** (-2.80) | -7*** (-3.27) | -43.2** (-2.23) -226
Economic/environmental factors
Percent change in employment, 1988-1989 | -3169*** (-2.58) | -4779*** (-4.39) | -17066 (-1.37)
Percent change in employment, 1989-1990 | 290*** (2.60) | 287*** (2.95) | 410.9 (0.51)
Percent change in employment, 1990-1991 | -304*** (-2.83) | -213*** (-2.69) | -526.4 (-0.94)
Percent change in employment, 1991-1992 | -20042*** (-2.81) | -10025* (-1.92) | 15988 (0.41)
Percent change in employment, 1992-1993 | -43810*** (-2.81) | -18981* (-1.61) | 40950 (0.71)
Contract-level variables
Private, nonprofit contractor | -15 (-0.36) | -15 (-0.54) | -254.6** (-2.01) -162
For-profit contractor | 25 (0.49) | 17 (0.51) | -412.5*** (-2.62) -87
Performance incentives in contract | 163*** (4.70) | 123*** (5.53) | 271.7** (2.44) 181
Predicting power (percent of variation explained) | 2.0% (individual); 49.0% (contract) | adjusted R2 = 5.3% | adjusted R2 = 65.4%
Coefficient value (t-ratio in parentheses): *significant at a
REFERENCES
Arum, Richard, "Do Private Schools Force Public Schools to Compete?" American Sociological
Review 61:1 (February 1996): 29-46.

Attewell, Paul and Dean R. Gerstein, "Government Policy and Local Practice," American Sociological
Review 44 (April 1979): 311-327.

Bryk, Anthony, Stephen Raudenbush and Richard Congdon, Hierarchical Linear and Nonlinear
Modeling with the HLM/2L and HLM/3L Program, Chicago: Scientific Software International, 1999.

Bryk, Anthony S. and Stephen W. Raudenbush, Hierarchical Linear Models: Applications and
Data Analysis Methods, London: Sage Publications, 1992.

Bryk, Anthony S. and Stephen W. Raudenbush, "On Heterogeneity of Variance in Experimental
Studies: A Challenge to Conventional Interpretations," Psychological Bulletin 104:3 (1988): 396-404.

Bryk, Anthony S. and Stephen W. Raudenbush, "Application of Hierarchical Linear Models to
Assessing Change," Psychological Bulletin 101:1 (1987): 147-158.

D'Aunno, Thomas, Robert I. Sutton, and Richard H. Price, "Isomorphism and External Support in
Conflicting Institutional Environments: A Study of Drug Abuse Treatment Units," Academy of
Management Journal 34:3 (1991): 636-661.

Ferguson, Ronald F., "Paying for Public Education: New Evidence on How and Why Money Matters,"
Harvard Journal on Legislation 28 (1991): 465-498.

Fletcher, Bennet W., Frank M. Tims, and Barry S. Brown, "Drug Abuse Treatment Outcome Study
(DATOS): Treatment Evaluation Research in the United States," Psychology of Addictive Behaviors
11:4 (1997): 216-229.

Gerstein, Dean R., A. Rupa Datta, Julia S. Ingels, Robert A. Johnson, Kenneth A. Rasinski, Sam
Schildhaus, Kristine Talley, Kathleen Jordan, Dane B. Phillips, Donald W. Anderson, Ward G.
Condelli, and James S. Collins, Final Report: National Treatment Improvement Evaluation Study,
U.S. Department of Health and Human Services, March 1997.

Goldstein, Harvey, and S. Thomas, "Using Examination Results as Indicators of School and College
Performance," Journal of the Royal Statistical Society 159:1 (1996): 149-163.

Goldstein, Harvey, Multilevel Statistical Models, New York: Halsted Press, 1995.

Goldstein, Harvey, "Statistical Information and the Measurement of Education Outcomes," Journal of
the Royal Statistical Society 155 (1992): 313-315.

Goldstein, Harvey, "Models for Multilevel Response Variables with an Application to Growth Curves,"
in R.D. Bock (ed.), Multilevel Analysis of Educational Data, New York: Academic Press, 1989.

Goldstein, Harvey, Multilevel Models in Educational and Social Research, London: Oxford
University Press, 1987.

Goldstein, Harvey, "Multilevel Mixed Linear Model Analysis Using Iterative Generalized Least Squares,"
Biometrika 73 (1986): 43-56.

Gray, J., D. Jesson, Harvey Goldstein and J. Rasbash, "A Multilevel Analysis of School Improvement:
Changes in Schools' Performance Over Time," School Effectiveness and School Improvement 6
(1995): 97-114.

Heckman, James J., Robert J. LaLonde and Jeffrey A. Smith, "The Economics and Econometrics of
Active Labor Market Programs," prepared for the Handbook of Labor Economics, Volume III, Orley
Ashenfelter and David Card, editors.

Heckman, James J., Carolyn J. Heinrich and Jeffrey A. Smith, "Assessing the Performance of
Performance Standards in Public Bureaucracies," American Economic Review 87:2 (1997): 389-395.

Heinrich, Carolyn J. and Laurence E. Lynn, Jr., "Governance and Performance: The Influence of
Program Structure and Management on Job Training Partnership Act (JTPA) Program Outcomes,"
presented at the Workshop on Models and Methods for the Empirical Study of Governance, University
of Arizona, April 29-May 1, 1999.

Heinrich, Carolyn J., "Organizational Form and Performance: An Empirical Investigation of Nonprofit
and For-profit Job-training Service Providers," working paper, National Bureau of Economic Research
and The University of Chicago, 1998.

Jennings, Edward T., "Building Bridges in the Intergovernmental Arena: Coordinating Employment and
Training Programs in the American States," Public Administration Review 54:1 (January/February
1994): 52-60.

Jennings, Edward T. and JoAnn Gomer Ewalt, "Interorganizational Coordination, Administrative
Consolidation and Policy Performance," Public Administration Review 58:5 (September/October
1998): 417-428.

Kreft, Ita G., "Are Multilevel Techniques Necessary? An Overview, Including Simulation Studies,"
unpublished manuscript, California State University, Los Angeles, 1996.

Kreft, Ita G. and Pamela R. Aschbacher, "Measurement and Evaluation Issues in Education: The Value
of Multivariate Techniques in Evaluating an Innovative High School Reform Program," International
Journal of Educational Research 21 (1994): 181-196.

Lynn, Laurence E., Jr., Carolyn J. Heinrich and Carolyn J. Hill, The Empirical Study of
Governance: Theories, Models, and Methods, Georgetown University Press, forthcoming, 2000.

Mead, Lawrence M., "Optimizing JOBS: Evaluation Versus Administration," Public Administration
Review 57:2 (March/April 1997): 113-123.

Mead, Lawrence M., "Performance Analysis," unpublished manuscript, New York University, 1999.

Mead, Lawrence M., "The Decline of Welfare in Wisconsin," Journal of Public Administration
Research and Theory, forthcoming.

Meier, Kenneth J., "Bureaucracy and Democracy: The Case for More Bureaucracy and Less
Democracy," Public Administration Review 57:3 (May/June 1997): 193-199.

Meier, Kenneth J. and Joseph Stewart, "The Impact of Representative Bureaucracies: Educational
Systems and Public Policies," American Review of Public Administration 22:3 (September 1992):
157-171.

Meier, Kenneth J., Joseph Stewart and Robert E. England, "The Politics of Bureaucratic Discretion:
Educational Access as an Urban Service," American Journal of Political Science 35:1 (1991): 155-177.

Meyer, Alan D. and James B. Goes, "How Organizations Adopt and Implement New Technologies,"
Best Papers Proceedings, Academy of Management (Forty-seventh Annual Meeting of the Academy
of Management, New Orleans, Louisiana, August 9-12, 1987), pp. 175-179.

Milward, H. Brinton and Keith G. Provan, "Governing Service Provider Networks," presented at the
EGOS 14th Colloquium, Maastricht University, The Netherlands, 1998.

Roderick, Melissa, "Evaluating Chicago's Efforts to End Social Promotion," presented at the
Workshop on Models and Methods for the Empirical Study of Governance, University of Arizona,
April 29-May 1, 1999.

Roderick, Melissa and Eric Camburn, "Risk and Recovery: Course Failures in the Early Years of High
School," unpublished manuscript, January 1997.

Sandfort, Jodi, "The Structural Impediments to Front-line Human Service Collaboration: The Case of
Welfare Reform," presented at the Annual Meeting of the American Political Science Association,
Boston, September 1998.

Singer, Judith D., "Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and
Individual Growth Models," Journal of Educational and Behavioral Statistics, forthcoming.

Smith, Kevin B. and Kenneth J. Meier, "Politics, Bureaucrats and Schools," Public Administration
Review 54:4 (November/December 1994): 551-558.