
    MEANS AND ENDS: A COMPARATIVE STUDY

    OF EMPIRICAL METHODS FOR INVESTIGATING

    GOVERNANCE AND PERFORMANCE

    Carolyn J. Heinrich and Laurence E. Lynn, Jr.

    The University of Chicago

    DRAFT

    September 1999

    Prepared for the Fifth National Public Management Research Conference, George Bush School of

    Public Service, Texas A&M University, College Station, Texas, December 3-4, 1999, with the support

    of the Pew Charitable Trusts.


    ABSTRACT

Scholars within different disciplines employ a wide range of empirical approaches to understanding how, why and with what consequences government is

    organized. We first review recent statistical modeling efforts in the areas of education,

    job-training, welfare reform and drug abuse treatment and assess recent advances in

    quantitative research designs. We then estimate governance models with two different

    data sets in the area of job training using three different statistical approaches:

    hierarchical linear models (HLM); ordinary least squares (OLS) regression models using

    individual level data; and OLS models using outcome measures aggregated at the site or

    administrator level. We show that HLM approaches are in general superior to OLS

    approaches in that they produce (1) a fuller and more precise understanding of

complex, hierarchical relationships in government, (2) more information about the amount of variation explained by statistical models at different levels of analysis, and (3)

    increased generalizability of findings across different sites or organizations with varying

    characteristics. The notable inconsistencies in the estimated OLS regression coefficients

    are of particular interest to the study of governance, since these estimated relationships

    are nearly always the primary focus of public policy and public management research.


    Table of Contents

Introduction

Empirical Governance Research: Observations on the State of the Art

Some Improvements in Models and Methods

Lingering Limitations of Conventional Approaches

Multilevel Approaches to Governance Research

Applications of Multilevel Modeling

Education

Drug Abuse Treatment

Employment and Training

Comparing Hierarchical Linear Model and Ordinary Least Squares Results

Model Specifications

HLM and OLS Model Results

Conclusions

Tables


    Introduction

    Scholars of governance within political science, public policy, and public administration describe

    their efforts to understand how, why and with what consequences government is organized and

managed as "getting inside of" or "breaking open" the "black box" of program implementation (Lynn,

    Heinrich and Hill 1999). A wide range of research designs from case studies and historical

    accounts to more formal models that include quantitative analysis are employed to explicate the

    processes that establish the means and ends of governmental activity, and, in some studies, to assess the

    implications of administration and management for individual-level and program outcomes.

    Recently, reflecting world-wide interest in performance management, scholars have begun to

    advocate research strategies that relate the measurable effects of public programs and policies to the

    specific administrative practices and program or institutional features that seem to produce them (Lynn,

    Heinrich and Hill, 1999; Mead, 1997, 1999; Smith and Meier, 1994; Milward and Provan, 1998; and

    Roderick, 1999). Mead (1997) argues that program impact studies that neglect the influence of local

    administrative capacity and structures have little value to policy makers and program administrators.

    However, scholars have long recognized the theoretical and methodological difficulties associated with

    identifying and describing complex interrelationships across multiple administrative levels within public

    organizations and showing how different structural and administrative arrangements, collectively termed

    governance, affect program outcomes.

    This paper is concerned with assessing the advantages and disadvantages of different research

    strategies that may be used in the empirical study of governance and performance. We first review

    studies in several disciplines and policy areas including education, welfare reform, job-training and


    drug abuse treatment to determine the extent to which advances in statistical modeling, and, in

    particular, in hierarchical or multilevel modeling, as well as collaborations between researchers and

    public officials, increase the potential for more accurate and informative governance research. Then,

based on analyses of two different data sets that have individual level observations, we will compare the

    performance of three different statistical approaches: hierarchical linear models (HLM); ordinary least

    squares (OLS) regression models using individual level data; and OLS models using outcome measures

    aggregated at the site or administrator level. We will show that, in general, multilevel modeling strategies

    are more likely to produce unbiased estimates of policy, administrative or structural variable effects on

    outcomes than traditional, ordinary least squares approaches, particularly when the extent of cross-level

    effects operating at the multiple levels of analysis is relatively high.

    Empirical Governance Research: Observations on the State of the Art

    Most relationships in government and social systems involve activities and interactions that span

    multiple levels of organization or systemic structures. Empirical studies designed to analyze these

    relationships typically focus on program processes or outcomes at a single organizational (or individual)

    level. Some studies group or aggregate individuals (or units of analysis of some type) at a higher level of

    organization or structure and attempt to explain average effects or outcomes (e.g., for local offices or

agencies). Other studies, including experimental and non-experimental program evaluations, analyze the

    influence of organizational or structural factors on individual or lower-level unit outcomes by controlling

    for these factors in individual-level regressions or by estimating separate individual-level regressions for

    different organizational units. These analytical approaches all suffer from the limitations of conventional

    statistical methods for estimating linear models with multiple levels of data.


    Statistical modeling efforts designed to explain individual-level outcomes based on data from

    experimental and non-experimental analyses frequently account for factors related to program

    administration and implementation with a single program indicator variable, such as a school or local

    office indicator. In these studies, we gain little understanding of the interactions and influence of

    specific organizational or structural factors on program outcomes. While experimental evaluations of

    public programs such as those conducted by the Manpower Demonstration Research Corporation

    (MDRC) have consistently included process evaluation components, the qualitative data are

    subsequently used for descriptive or interpretive purposes rather than for establishing causal

    relationships between administrative practices and outcomes. This use of process analysis is informative

    and a potentially valuable complement to quantitative analyses. When they are not incorporated into the

    statistical models, however, process analyses tend to be overshadowed in presentations of findings

    concerning program impacts.

    Some Improvements in Models and Methods

    Both researchers and public officials are coming to recognize that accounting for average

    program outcomes or impacts provides little information to public managers about how they can

    improve program performance. For example, Mathematica Policy Research and its subcontractors are

    presently conducting an experimental evaluation of the Job Corps program that involves over 100 sites

    across the country and links client data to information about program administration and services

    provided at the sites. In addition, Manpower Demonstration Research Corporation (MDRC)

    investigators are currently engaged in research, utilizing multi-site, experimental Job Opportunities and

    Basic Skills (JOBS) evaluation data combined with the rich array of survey data of program


    administrators and staff, that departs from the traditional experimental approach to program evaluation

    by formally incorporating process data analyses into the modeling strategies. Unfortunately, relatively

few researchers have access to these types of data sets (substantial in size and collected through costly

experimental designs) that allow them to avert challenging statistical issues such as selection bias and

    comparison group inequalities, inadequate sample sizes, and other data-related problems.

    Progress is also being made, however, in the area of non-experimental methodologies using

    individual-level data obtained through administrative and other non-experimental sources. One of the

    advantages of non-experimental over experimental approaches is that they are better suited to estimating

    the heterogeneous effects of heterogeneous treatments or services on clients, and sorting out the

    differential effects that programs can have on various client groups. Such information is more likely to

    be useful to program administrators than simple average impact estimates.

    An example of this type of research is that of Heckman, LaLonde and Smith (forthcoming).

    They have produced an exhaustive analysis of the methodological lessons learned in evaluating social

    programs through the use of both experimental and non-experimental evaluation methodologies. They

    present a comprehensive discussion of a broad array of econometric models and estimators including

    their properties, assumptions and information about the way they condition and transform the data to

    guide researchers in their use of these methodologies. Somewhat surprising is their conclusion that

there is no single, inherently preferable method or econometric estimator for evaluating public programs:

"too much emphasis has been placed on formulating alternative econometric methods for correcting

selection bias and too little [attention] given to the quality of the underlying data." Heckman, LaLonde,

    and Smith suggest that more effort should be invested in improving the quality of data used in studying


    the effects of public programs than in the development of formal econometric methods to overcome

    problems generated by inadequate data. More specifically, they show that if biases are clearly defined,

    comparable people in the same geographical areas are compared, and relevant background data on

    clients are collected (using the same survey questionnaires), problems in using non-experimental

    methodologies for evaluating program outcomes will be much less than formerly believed.

    Lingering Limitations of Conventional Approaches

    These advances in non-experimental evaluation methodologies, in combination with an

    increasing number of longer-term collaborations between public officials and scholars engaged in

    governance and evaluation research, have made the use of client-level administrative data in statistical

    models of program outcomes more feasible and frequent. Lingering problems still constrain what we

    can learn from these types of client-level data analyses, however.

    One problem is that these models typically explain only a small percentage of the total variation

    in individual outcomes. Individual-level data exhibit considerable random variation, and there are also

    likely to be a number of unmeasured influences on outcomes at the individual level. In educational

    policy research, for example, the oft-cited Coleman Report finding that schools bring little influence

    to bear on a childs achievement that is independent of his background and general social context . . .

    has undoubtedly been discouraging to educational research. Smith and Meier (1994) argue that, given

    the well-established distance between system characteristics and individual performance, using

    individual-level data to study educational system performance is a flawed approach.

    A second problem is that procedures to assess what portion of the explained variation can be

    attributed to any policy or administrative variables included in these types of models are hardly ever


    straightforward. For example, Jennings and Ewalt (1998) studied the influence of increased

    coordination and administrative consolidation in JTPA programs on ten JTPA participant outcomes

    while controlling for demographic and socioeconomic characteristics of participants. Their models

    account for 5-29 percent of the total variation in individual outcomes, and the administrative variables

    are statistically significant in about half of these models. Some questions that arise include: How much of

the total variation in client outcomes is attributable to policy or program design and implementation

    factors? How much of the portion of variation attributable to such factors is explained by the two

    administrative variables included? Are there other potentially important administrative variables not

    incorporated in these models that might change the observed effects of the coordination and

    consolidation variables that are included? We are left not only with uncertainty about how much of a

    difference the organization of these programs makes, but also with unclear policy prescriptions for

program administrators (i.e., should they consolidate or not?).

Such limitations in modeling using individual-level outcomes lead Mead (1997, 1999) and

    others to urge more research that models administrative processes and program outcomes across

    multiple sites using client data aggregated at the site level. Mead (1999) describes this type of research

as "performance analysis": process research that draws formal, statistical connections between

administrative practices and outcomes, with programs or sites as the unit of analysis. He argues that

"variation [in outcomes] across programs tends to be more systematic," and, therefore, explanatory

models using these data tend to be strong. In fact, the proportion of variation explained in

organizational, program- or site-level regressions (as indicated by R² values) is typically considerably

higher than in similar individual-level regressions. In Mead's (forthcoming) study of the influence of


JOBS program requirements (clients' active/inactive statuses) on changes in Wisconsin welfare

    caseloads controlling for caseload demographics and economic factors, he explains 76 percent of the

    variation in welfare caseload changes.

Sandfort (1998), who studied service technologies in Michigan's Work First program and their

    relationship to program outcomes, also maintains that the unit of analysis in policy studies of welfare

    reform should be the program or organization. She argues that the more crucial forces shaping policy

    are within the organizations themselves, and that individual-level data should be placed within their

    larger, critical organizational context. In her county-level analyses, she models the proportion of

    welfare recipients combining welfare and work in an average month and the proportion leaving welfare.

    She includes county-level measures of the proportions of service providers offering specific service

    technologies (e.g., job search assistance, soft skills, etc.) and four service delivery structure measures

    (e.g., Project Zero, non-profit agency, etc.). She also includes several measures of welfare recipient

    demographics. Despite the fairly limited set of explanatory variables available to her, Sandfort explains

approximately 60 percent of the variation in welfare program outcomes.

While Sandfort's work is a noteworthy example of this type of research, it also illustrates how

    data access problems can constrain site-level analyses. She acknowledges that her minimal information

    on welfare caseload characteristics might contribute to omitted variable bias in her models. Potentially

    more problematic for policy analyses, however, is her qualitative finding that there is significant

    variation in the service technology used by Work First providers in the same county, even though they

    face the same local economic environment. This suggests that potentially important variation in service

    delivery approaches at the service provider level is obscured in county-level aggregates used in the


    regressions. The services clients take up at this lower level might be related to their individual

    characteristics as well as to those of the service providers.

    Mead is clear about what he views as the main shortcoming of his 1999 study of Wisconsin

    welfare caseloads: the inability to evaluate the effects of work policies on caseloads as definitively as

program impacts on individuals, since cross-sectional analyses explain "variations in change around the

state [between counties] rather than the overall trend." The variation being explained in site- or

    program-level models is not variation in test scores or earnings but rather variation between sites or

    programs in average outcomes. It is inappropriate to use the findings of regression models at one level

    of hierarchy to infer what might be going on at lower levels, although information from case studies and

    qualitative data analyses can help inform us about these inter-relationships at other levels.

Ferguson's (1991) research on 900 Texas school districts illustrates this type of slippage in

    discussing site-level model findings. He uses OLS regressions to explain district average reading and

    math scores with a wealth of district-level administrative, structural, socioeconomic and context

    measures. He reports positive, statistically significant relationships between student test scores and

    higher teacher exam scores, smaller classes and more experienced teachers. He

    concludes that higher-quality schooling produces better reading skills among public school students.

    His use of explanations of variation in average school district test scores to draw implications for

students' outcomes ignores the fact that, within districts, there are schools, grades and classrooms

    where many of these same factors may be interacting with other administrative and individual-level

    factors at these levels to influence student achievement. He further suggests that researchers should

    combine the results of studies examining different levels or components of a hierarchical system to link


teacher salaries to teacher quality, teacher quality to students' test scores, and students' test scores to

    earnings later in life. Such meta-analyses, while useful for addressing some questions, still risk neglecting

    important factors that interact at the multiple levels of hierarchy within school systems.

    Recent advances in statistical methodologies allow for empirical analyses of factors interacting at

    multiple levels of hierarchy within government and social systems. Such advances show considerable

    promise for improving knowledge of how governance affects public sector performance. Research

    designs that integrate quantitative and qualitative information and that are based on multi-level models

    and on data sets that include individual level observations are conceptually demanding and expensive,

    however. Is the extra effort justified in terms of the results that are produced in comparison with less

    complex designs? We address this question next.

    Multilevel Approaches to Governance Research

    While some forms of multilevel modeling have been in use for close to two decades, recent

    work by Bryk, Goldstein, Kreft, Raudenbush and Singer has advanced the use of these models in

    education and related fields of social policy research. New statistical packages have also been

developed to make these techniques more accessible to researchers.[1]

    Applications of Multilevel Modeling

    Multilevel statistical models have many different potential applications across a number of

    disciplinary fields, including sociology, biology and economics, among others. In this paper, we focus

[1] A Multilevel Modeling Newsletter and a Harvard University website (maintained by Singer) provide

    technical assistance to researchers and promote the dissemination of new research findings on the use of multilevel

    (or hierarchical linear) modeling. Some of these statistical techniques, such as the nonlinear form known as

    hierarchical generalized linear models (HGLM), are so new that the software developers issue disclaimers with the

    release of these programs.


    on the use of multilevel models to formulate and test hypotheses about how factors or variables

    measured at one level of an administrative hierarchy might interact with variables at another level. The

existence of these types of cross-level interactions or effects is at the crux of the development of

    multilevel modeling techniques.

    In multilevel models, the assumption of independence of observations in the traditional OLS

approach is dropped, and relationships in the data, rather than being assumed fixed over contexts, are

    allowed to vary. The extent to which multilevel modeling improves statistical estimation in comparison

    to OLS models depends on the potential for and strength of cross-level effects in the data and the

    corresponding extent of variation in the dependent variable to be explained at the different levels of

    analyses. When significant cross-level interactions are present but ignored in OLS modeling efforts,

    problems arise, including reduced (or inflated) precision of estimates, mis-specification and subsequent

    misestimation of model coefficients, and aggregation bias.
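These consequences are straightforward to demonstrate by simulation. The sketch below is a minimal illustration in Python with the statsmodels package (the data are simulated for this purpose, not the JTPA data analyzed later in this paper): clustered observations are generated with a site-level predictor and a random site effect, and the standard error on the site-level coefficient is compared across a pooled OLS regression and a random-intercept multilevel model.

```python
# A minimal simulation of the problem described above: pooled OLS treats
# clustered observations as independent and so understates the standard
# error on a site-level predictor. Simulated data, not the JTPA data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for site in range(40):                       # 40 sites
    w = rng.normal()                         # site-level predictor W_j
    u = rng.normal()                         # random site effect u_0j
    for _ in range(50):                      # 50 clients per site
        x = rng.normal()                     # individual-level predictor X_ij
        y = 0.5 * w + 0.3 * x + u + rng.normal()
        rows.append({"site": site, "w": w, "x": x, "y": y})
df = pd.DataFrame(rows)

ols = smf.ols("y ~ w + x", df).fit()                         # pooled OLS
hlm = smf.mixedlm("y ~ w + x", df, groups=df["site"]).fit()  # random intercept

# OLS acts as if there were 2,000 independent observations; the multilevel
# model recognizes that the coefficient on w rests on only 40 sites.
print("OLS se(w):", ols.bse["w"])
print("HLM se(w):", hlm.bse["w"])
```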

    Because multilevel modeling expands the possibilities for investigating hierarchical relationships

    and cross-level interactions involving two or three levels of organization, many see it as providing a link

between theory and practice in organizational studies (Kreft, 1996). Bryk and Raudenbush (1992)

    criticized the neglect of hierarchical relationships in traditional OLS approaches as fostering an

    impoverished conceptualization that has discouraged the formulation of hypotheses about effects

    occurring at and across different levels. Goldstein (1992) also sees multilevel modeling as an

    explorative tool for theory development about relationships within and between levels of social

    systems. He cautions, however, that exploratory analyses should not be substituted for well-grounded

    substantive theories and that multilevel models should not be seen as a panacea for all types of complex

  • 8/3/2019 Empirical Methods for Investigating Governence

    15/43

    [ 11 ]

    data analysis problems. As Kreft (1996) points out, a particular statistical model cannot be optimal in

general, only in specific research contexts, and models should be selected based on both the theory

    or research questions being tested and the type of data collected.

    To illustrate with a governance example, if a functioning hierarchy of structural arrangements and

    of management activities originating at one level does indeed influence activity at other (particularly

    lower) levels of the organization, as they are presumably intended to do (or might do in unintended

    ways), then we should anticipate and model the interdependence among hierarchically-ordered

    variables. The absence of such cross-level interactions, on the other hand, might imply a high degree of

    compartmentalization, or loose coupling across levels, and of sub-unit independence within the

    organization. Furthermore, the presence of significant higher-level effects on organizational performance

    in the absence of interdependence among hierarchical variables might suggest that lower-level

    characteristics are essentially irrelevant to the efficacy of higher-level governance. While many policy

    makers dream of circumstances where lower levels of the organization do not influence policy success,

    empirical findings to this effect should probably be regarded with some suspicion.

    Our literature review suggests that the application of actual hierarchical models in governance

    and public management research is of quite recent vintage. Earlier research employed multi-level

concepts but not necessarily hierarchical models. For example, Meyer and Goes (1987), in their study

of non-profit hospitals' adoption of innovative technologies, described their analytical approach as

"hierarchical regression," but a careful review of studies such as these shows that multilevel modeling

    techniques are not in fact utilized. Meyer and Goes assigned their explanatory variables to different

subsets according to the level of analysis to which they apply (e.g., an organizational subset, a leader


subset, an environmental subset, etc.) and entered the different subsets into the regression model in

stages, examining changes in explained variation (R²) as the variables are added. Unlike HLM

    modeling, this analytical strategy does not allow for analyses of cross-level effects between variables in

    the different subsets.

    Education

    Given the large body of empirical research on educational processes and the ongoing, critical

    concern for education policy and outcomes, it is not surprising that education researchers have led social

    science efforts to develop and apply hierarchical linear models to the analysis of relationships in public

    service delivery systems. The early studies of researchers who have published most extensively on the

use of multilevel or hierarchical linear models in education, including Harvey Goldstein (University of

London), Anthony Bryk (University of Chicago) and Stephen Raudenbush (Michigan State University),

first emerged in the mid- to late 1980s (Goldstein, 1986, 1987, 1989; Bryk and Raudenbush, 1987,

1988). Bryk and Raudenbush, for example, applied these techniques to analyze school-level effects on

students' growth in mathematics achievement scores and were surprised by the high proportion of

variance in growth rates that was found to be between schools (83%). They continued their

research and developed the Hierarchical Linear Modeling (HLM) statistical program that is now widely

used in social sciences research (1992, 1999). The research of Goldstein and his colleagues has also

    progressed steadily, with a considerable number of applications focused on the British educational

    system, including larger-scale school performance reviews mandated by the British government (1992,

1995, 1996).

    More recently, Roderick and Camburn (1997) and Roderick (1999) have been examining the


Chicago public school system's decision to end social promotion and increase students' achievement.

    They are drawing upon the wealth of data generated by the Consortium on Chicago School

    Research, which has collaborated with the Chicago Public Schools to develop data sets and

    methodologies for multilevel studies of school reform implementation.

    Roderick and Camburn used hierarchical generalized linear models (the non-linear form of

HLM) to test hypotheses about students' likelihood of failing courses and their likelihood of subsequent

    recovery from grade failure. Their models allowed them to assess the potential effectiveness of three

    alternative strategies (individual- and system-focused) for improving student performance: (1) improving

    the educational preparation of students before they enter high school, (2) creating transition years to

    ease stress and increase support for students, and (3) instituting large-scale, school-wide restructuring

    and reform efforts to improve teaching practices and school environments. They found a number of

    important relationships among individual- and school-level variables and generated strong evidence of

school-level effects that suggest, in their words, that "governance and instructional environments . . . matter."

    Presently, Roderick (1999) is using three-level hierarchical linear models to analyze changes in

students' grades and test scores over time (level 1); students' paths (promotion, retention, summer

school participation, etc.) through the new policy's implementation (within schools and across years)

and the influence of student characteristics (level 2); and the effectiveness of schools' responses to these

    policies as a function of school demographics and characteristics, measures of policy implementation

and teachers' classroom strategies, and the school environment and prior school development (level

    3). This study also includes an extensive qualitative component with intensive case studies of each

school's approach to policy implementation and a longitudinal investigation of students' experiences


    under the promotional policy.

    Drug Abuse Treatment

    Early large-scale studies on drug abuse treatment effectiveness included: (1) the Drug Abuse

    Reporting Program (DARP), which collected data from approximately 44,000 clients and 52 federally-

    funded treatment programs between 1969 and 1972, and (2) the Treatment Outcome Prospective

    Study (TOPS), which was intended to expand the data collected in DARP and involved more than

    11,000 patients in 41 programs between 1979 and 1981. Longitudinal (non-experimental) analyses of

    the cost-effectiveness of various drug abuse treatment modalities were conducted using these client-level

    data, although information about programs or organizations was limited in focus to services delivered

    and program environments.

    These research efforts were followed by other major studies, including the Outpatient Drug

    Abuse Treatment Systems (ODATS) study and the Drug Abuse Treatment Outcomes Study

    (DATOS). ODATS, which is continuing, surveys unit directors and supervisors in drug abuse treatment

    programs to obtain rich, organization-level data on characteristics of the programs, their environments

    and their clients. ODATS has progressed through four waves of data collection from a total of more

    than 600 programs since 1984. In contrast, a major strength of the DATOS research is the

    extensiveness of client-level data obtained from more than 10,000 adults in 99 drug abuse treatment

programs between 1991 and 1993. Research using these data sets addresses questions about program

design, treatment practices, and client outcomes (D'Aunno, Sutton and Price, 1991; Fletcher, Tims

    and Brown, 1997). Our own exploration of these data suggests that adequate information for a

    multilevel investigation of governance and performance is lacking.


    In an early study on the effectiveness of methadone treatment for heroin addiction, Attewell and

    Gerstein (1979) drew on organizational theory to develop a hierarchical conceptual model of policy

implementation that "link[s] the macrosociology of federal policy on opiate addiction to the

microsociology of methadone treatment" (311). They used a case-study approach, including

    observational research in clinics, interviews with clients, and analyses of program records from clinics

    over multiple years, to investigate managerial responses at the program level to government policy and

institutional regulation, as well as clients' responses and behavior in reaction to subsequent program changes.

    Based on qualitative analysis of these observations, they found that compromised policies at the

    federal level resulted in ineffective local management practices and poor outcomes for clients.

    Gerstein now directs the National Treatment Improvement Evaluation Study (NTIES), which

    should permit quantitative, multilevel analyses of drug abuse treatment policies and programs. In the

final report on the NTIES evaluation study (1997), Gerstein et al. described how a two-level

    design permeated every level of the project. This study evaluates both administrative and clinical

(client) processes and outcomes for over 6,000 clients in nearly 800 programs. Like the effort led

    by the Consortium on Chicago School Research, the design of the NTIES project provides a model for

    researchers who are considering plans for a multi-site, multilevel study in any field.

    Employment and Training

    Our own multilevel study and a separate work by Heinrich (1999) on administrative structures

    and management/incentive policies in JTPA programs provide the basis for our comparison of multilevel

    modeling techniques with the individual-level and site-level modeling approaches. Heinrich and Lynn

(1999) used data collected during the National JTPA Study on individuals' characteristics and earnings


    and employment outcomes, as well as administrative and policy data obtained from the sixteen study

    sites over a three-year period, to estimate hierarchical linear models. They found that both site-level

    administrative structures and local management strategies (including performance incentives) had a

    significant influence on client outcomes.

    In her multilevel study of local JTPA service providers and their contracts with a single JTPA

    agency, Heinrich also examined the influence of organizational structure or form (i.e., public nonprofit,

    private nonprofit, and for-profit service providers) and the use of performance incentives in service

    provider contracts on client outcomes, controlling for client characteristics and the services they

    received. She similarly found significant effects of the use of performance incentives by local JTPA

    agencies on client outcomes.

    The data used in these two studies allow for a comparison of different statistical approaches.

Further, the extent of cross-level interactions among hierarchical variables in these two sets of data is

    quite different. Differences in the extent of intra-class correlation in hierarchical data have important

    implications for the relative advantages and disadvantages of using multilevel modeling strategies in

    different research contexts, as we shall show.

    Comparing Hierarchical Linear Model and Ordinary Least Squares Results

    Different models may yield different answers to the same question. Thus researchers should

    select modeling approaches that not only fit the data but that are also appropriate ways to address the

questions or hypotheses of interest. In our studies of JTPA programs, two different levels of analysis

    are represented: (1) the client or individual level, and (2) the site (service delivery area) or contract level,

    which made it possible to organize or fit the data using several different modeling strategies. For OLS


    regressions of individual-level outcomes, the site-level (or contract-level) administrative and

    management/incentive policy data were linked to the individual participant records, so that all

    participants in a given site and year (or served under a specific contract) had the same site-level (or

    contract level) variable values. For the site-level or contract-level OLS regressions, the individual-level

    data were collated by site or by contract, and average measures of these variables were entered into the

    models, along with the site- or contract-level administrative and policy variables. In the hierarchical

    linear models, each of these two levels of data was formally represented by its own sub-model, with

    each sub-model specifying the structural relations occurring and the residual variability observed at that

    level.
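As a concrete sketch of these three data arrangements (the tables and column names below are hypothetical stand-ins for the actual JTPA files):

```python
# Sketch of the three data arrangements described above; the tables and
# column names are hypothetical stand-ins for the JTPA files.
import pandas as pd

clients = pd.DataFrame({   # one row per participant
    "site": [1, 1, 2, 2], "year": [1, 1, 1, 1],
    "earnings": [3200, 4100, 2800, 3900], "hs_dropout": [1, 0, 0, 1],
})
sites = pd.DataFrame({     # one row per site-year: administrative measures
    "site": [1, 2], "year": [1, 1], "perf_incentive": [1, 0],
})

# (1) Individual-level OLS: every participant in a site-year carries the
# same site-level administrative values.
individual = clients.merge(sites, on=["site", "year"], how="left")

# (2) Site-level OLS: participant variables are averaged by site-year
# (binary indicators become proportions), then joined to the site table.
site_level = (clients.groupby(["site", "year"], as_index=False)
                     .mean(numeric_only=True)
                     .merge(sites, on=["site", "year"], how="left"))

# (3) HLM: both levels are kept distinct; the individual records supply the
# level one sub-model and the site table enters the level two sub-model.
```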

    The presence of significant intra-class correlations in hierarchical data (described further in the

    following section) violates basic assumptions of the OLS regression model, including: (1) the

    independence of observations, and (2) that the number of independent observations is equal for all

    variables. One of the most widely extolled features of hierarchical linear models is the capability they

    provide for partitioning variance into components associated with the different levels of analysis, and

    subsequently allowing the detection and exploration of differences across contexts or groups. For

    example, large between-group variances will indicate that an overall regression will mis-estimate

    relationships for the individual groups.

    Model Specifications

One strategy for exploring multilevel data is to first estimate an "unconditional means model."

This simple model expresses the outcome, $Y_{ij}$, as a linear combination of the grand mean of $Y_{ij}$ ($\mu$, a

fixed component) and two random components: the variability between sites or groups ($u_j$) and the

residual variance associated with the $i$th unit or individual in the $j$th site or group ($r_{ij}$). Following a

multilevel modeling approach, the level one individual outcome model is $Y_{ij} = \beta_{0j} + r_{ij}$, and the level

two model is expressed as a function of the overall mean and random deviations from that mean:

$\beta_{0j} = \mu_{00} + u_{0j}$. Substituting the level two sub-model into the level one sub-model yields the multilevel model:

$$Y_{ij} = \mu_{00} + u_{0j} + r_{ij}. \qquad \text{(Eq. 1)}$$

    Using the covariance parameter estimates from the unconditional means model, one can test

    hypotheses about whether the variability between groups and the residual variability within groups are

    significantly different from zero. This information may also be used to estimate the intra-class

    correlation, which indicates what portion of the total variance in outcomes occurs between sites or

    groups (Bryk and Raudenbush 1992 and Singer 1997). A high proportion of intra-class correlation in

the data would suggest that OLS analyses are likely to produce misleading results. As a general rule of

thumb, Kreft (1996) defines high intra-class correlation as larger than $\rho = 0.25$ (i.e., more than 25

    percent of the variation between sites or groups), although much smaller proportions of total variance at

    the site- or group-level may be statistically significant and warrant exploration.
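In software, the unconditional means model and the intra-class correlation, $\rho = \tau_{00}/(\tau_{00} + \sigma^2)$, can be obtained as in the following sketch, which reuses the simulated df from the earlier sketch (with real data, one would substitute the client-level file):

```python
# Sketch: fit the unconditional means model (Eq. 1) and compute the
# intra-class correlation rho = tau_00 / (tau_00 + sigma^2).
import statsmodels.formula.api as smf

null_fit = smf.mixedlm("y ~ 1", df, groups=df["site"]).fit(reml=True)

tau_00 = null_fit.cov_re.iloc[0, 0]  # between-site variance, Var(u_0j)
sigma2 = null_fit.scale              # within-site residual variance, Var(r_ij)
icc = tau_00 / (tau_00 + sigma2)
print(f"share of variance between sites: {icc:.3f}")  # Kreft: > 0.25 is high
```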

    The results reported below were derived from the two separate studies of JTPA programs

    discussed earlier: the analyses of data from the sixteen National JTPA Study sites over three years, and

    the analyses of data from Heinrichs study of JTPA training providers and their contracts with a local

    JTPA agency. The estimation of unconditional means models showed that for the 16 NJS sites (or 48

    observations over three years), a very small but still statistically significant percentage (about 3%) of the

total variation in participant outcomes was between sites (or at the site level). In the study of

    approximately 400 JTPA service provider contracts, a much larger percentage (6-39%) of the total


    variation was at the contract administration level. These simple statistics suggest that we should expect

more cross-level interactions between levels of analysis in the study of JTPA contracts, and that the

results of the three different modeling strategies, namely individual-level OLS models, site-level or contract-

level OLS models, and (two-level) hierarchical linear models (HLM), would be more likely to diverge

    in the contract study findings.

    When investigating possible cross-level interactions in hierarchical data, one is advised to begin

    with a theory about which variables at the various levels would be expected to interact as well as about

    the nature of the interactions. At the second (group or site) level, sub-models denoting the relationships

    between level one and level two variables may specify fixed or randomly varying intercepts and/or

    slopes. The full multilevel approach, in which both intercepts and slopes vary randomly, is sometimes

    used for exploring the full range of potential cross-level effects in hierarchical data. This approach is

    similar to fitting a different regression model within each of the level two groups or sites, and this is

    typically efficient only when there is a relatively small number of level two observations with large

    numbers of level-one cases within each group or site. In our study of the sixteen NJS sites, we

    estimated a full, multilevel model (also known as an intercepts- and slopes-as-outcomes model),

which we will also report below. In modeling JTPA participants' earnings outcomes following

their participation in JTPA programs, the level one (individual) sub-model is specified as follows:

$$Y_{ij} = \beta_{0j} + \beta_{1j}X_{1j} + \ldots + \beta_{nj}X_{nj} + r_{ij}, \qquad \text{(Eq. 2)}$$

where $Y_{ij}$ is a measure of a participant's post-program earnings; the subscript $j$ denotes the site and

    allows each site to have a unique intercept and slope for each of the level one (individual characteristic)


predictors ($X_{1j}$ to $X_{nj}$), and the residual, $r_{ij}$, is assumed to be normally distributed with homogeneous

variance across sites. In the level two (site) sub-model shown below, all of the predictors ($W_j$) are

measured at the site level (i.e., variables describing administrative structures, performance incentive

    policies, contracting practices, and economic conditions at the sites):

$$\begin{aligned}
\beta_{0j} &= \gamma_{00} + \gamma_{01}W_{1j} + \ldots + \gamma_{0n}W_{nj} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}W_{1j} + \ldots + \gamma_{1n}W_{nj} + u_{1j} \\
&\ \ \vdots \\
\beta_{nj} &= \gamma_{n0} + \gamma_{n1}W_{1j} + \ldots + \gamma_{nn}W_{nj} + u_{nj}
\end{aligned} \qquad \text{(Eq. 3)}$$

    The level one and level two sub-models together define the intercepts- and slopes-as-outcomes model.

    In the level two sub-model, the level one intercept and beta coefficients are expressed as a linear

    function of the level two predictors. In interpreting the results of this model, one examines the estimated

values of the level two coefficients ($\gamma_{01}$ to $\gamma_{nn}$) to determine which site-level variables help predict: (1)

    why some sites realize better average earnings outcomes than others, and (2) how the effects of some

    level one (client-level) variables on outcomes vary across sites.
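A specification of this kind can be sketched as follows, again with the hypothetical simulated variables used above (one individual-level predictor x and one site-level predictor w):

```python
# Sketch of the intercepts- and slopes-as-outcomes model (Eqs. 2-3) with
# one predictor at each level: the slope on the individual-level x varies
# randomly across sites and is modeled by the site-level w through the
# cross-level interaction x:w.
import statsmodels.formula.api as smf

slopes_fit = smf.mixedlm(
    "y ~ x + w + x:w",      # fixed parts: gamma_10, gamma_01, gamma_11
    df,
    groups=df["site"],
    re_formula="~x",        # random intercept u_0j and random slope u_1j
).fit()
print(slopes_fit.summary())  # the x:w row is the cross-level interaction
```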

    The results of our estimation of the intercepts- and slopes-as-outcomes model revealed very

    few statistically significant relationships among level one and level two predictors, thus indicating that

there was little significant variation in how the effects of client-level variables on outcomes varied across

    the sites. These findings suggested that we could simplify our model; that is, the relationships between

    the level one and level two variables did not appear to vary randomly across the sites, and thus

    randomly varying slopes were not necessary. This is also the point at which we brought our theory of


    governance in JTPA programs to bear more definitively on the modeling process. For example, we did

    not expect the relationship between having a Private Industry Council as the administrative entity (a level

two variable) and the effects of participants' gender (a level one variable) on earnings outcomes to vary

    across the sites and years. Rather, we expected (and the intercepts- and slopes-as-outcomes model

    results confirmed) that the relationships between administrative structure and the effects of individual-

    level characteristics such as gender on outcomes were fairly constant (or fixed) across sites and years.

    When one assumes fixed effects for the level one predictors, a different level two sub-model is

specified to combine with the level one sub-model (Eq. 2). This level two sub-model specification, a

    variation of the random-intercept model, is:

$$\begin{aligned}
\beta_{0j} &= \gamma_{00} + \gamma_{01}W_{1j} + \ldots + \gamma_{0n}W_{nj} + u_{0j} \\
\beta_{1j} &= \gamma_{10}, \;\ldots,\; \beta_{nj} = \gamma_{n0}
\end{aligned} \qquad \text{(Eq. 4)}$$

As in Eq. 4 above, the relationships between the level two (site-level) variables and the effects of

level one (client-level) predictors on earnings outcomes are fixed ($\beta_{1j} = \gamma_{10}, \ldots, \beta_{nj} = \gamma_{n0}$). Combining the

level one sub-model (Eq. 2) and this level two sub-model (i.e., substituting Eq. 4 into Eq. 2), the

multilevel model derived is:

$$Y_{ij} = \gamma_{00} + \gamma_{01}W_{1j} + \ldots + \gamma_{0n}W_{nj} + \gamma_{10}X_{1j} + \ldots + \gamma_{n0}X_{nj} + u_{0j} + r_{ij}. \qquad \text{(Eq. 5)}$$

Through estimation of this hierarchical linear model (Eq. 5), one obtains coefficient values for all level

one ($X_{1j}$ to $X_{nj}$) and level two ($W_{1j}$ to $W_{nj}$) predictors that account for the interrelationships among


these variables (as specified in the level two sub-model, Eq. 4) and that indicate the direction and

significance of their effects on participants' earnings.
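In estimation terms, the move from Eqs. 2 and 4 to Eq. 5 amounts to dropping the random slopes, as in this brief sketch (same hypothetical variables as above):

```python
# Sketch of Eq. 5: fixed slopes for all predictors and a random intercept
# only; omitting re_formula yields an intercept-only random effect.
import statsmodels.formula.api as smf

ri_fit = smf.mixedlm("y ~ x + w", df, groups=df["site"]).fit()
print(ri_fit.params)  # intercept gamma_00, slopes gamma_10 (x), gamma_01 (w)
```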

    Equation 5 above was used in estimating the hierarchical linear models presented in Tables 1

and 2 for the study of the sixteen NJS sites over three years. In Heinrich's study of service provider

contracts, two of the multilevel models (of participants' earnings in the first post-program quarter and

    their pre- to post-program quarterly earnings changes) employ this same specification (i.e., fixed level

    two effects), while the other model specifies both fixed effects and a random effect in the level two sub-

    model. The level two sub-model for this second specification is shown below:

$$\begin{aligned}
\beta_{0j} &= \gamma_{00} + \gamma_{01}W_{1j} + \ldots + \gamma_{0n}W_{nj} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}W_{3j} \quad \text{(random effect)} \\
\beta_{2j} &= \gamma_{20}, \;\ldots,\; \beta_{nj} = \gamma_{n0} \quad \text{(fixed effects)}
\end{aligned} \qquad \text{(Eq. 6)}$$

    HLM and OLS Model Results

    The findings of the hierarchical linear models are shown in the first column of Tables 1-5. The

    second and third columns in each table show the results of the individual-level OLS and site-level OLS

    regressions, estimated using the same data and exactly the same set of dependent and explanatory

    variables as in the multilevel models.

In examining the findings in these tables, the fixed effect coefficient estimates ($\gamma_{10}$ to $\gamma_{n0}$) of the HLM

models (in the first column) are directly comparable to the OLS beta coefficient estimates of the individual-

level regressions (in the second column). In the site-level regressions, the variables that are indicator (or

    binary) in form at the individual level (e.g., single head of household, welfare recipient, etc.) are aggregated


    and become average proportions at the site-level. To allow for comparisons of these site-level OLS

    coefficients with the coefficient estimates of binary variables in the other models, these coefficient estimates

    are multiplied by their site-level average values to calculate estimated effects for the average individual (in

    the third column).
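To illustrate the rescaling with invented numbers (not taken from Tables 1-5): if the site-level OLS coefficient on the proportion of high school dropouts were 2,000 and the average site-level proportion were 0.30, the estimated effect reported for the average individual would be

$$2{,}000 \times 0.30 = 600.$$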

    The random effect estimated in the HLM model of hourly wages at termination (in the service

    provider contracts study) indicates that there is a statistically significant, cross-level interaction between

    the effects of contract performance incentives and the proportion of participants under age 18 that

    varies across sites. The positive sign on this random coefficient indicates that, on average, sites with

    higher proportions of young participants that also include performance incentives in the contracts of

    providers who serve them will improve hourly wage outcomes for participants. (For additional

discussion of the substantive findings of the models shown in Tables 1-5, see Heinrich and Lynn, 1999,

and Heinrich, 1999.)

    We begin the technical comparison of these modeling strategies by turning to Tables 1 and 2,

    which display the results of the NJS data analyses. It is apparent that the HLM (column 1) and

    individual-level OLS (column 2) estimated variable coefficients are very close for both individual-level

    and site-level predictors. This is particularly evident in Table 2, (the model of participants first post-

    program year earnings), where 97 percent of the site-level variation is explained by the model. In

    general, these findings confirm that where a very small percentage of variation occurs at the site-level

    (approximately 3%), OLS and HLM methods are likely to produce comparable estimates of individual

    and site-level effects. Another reason for the similarity of these two sets of results is that statistical tests

(performed using HLM model output) showed that all of the statistically significant variation at the site


    level was explained away by the predictors included. That is, there was no statistically significant

variation at the site level that remained to be explained or accounted for in these models (i.e., no omitted

variable bias at level two).

    One might reasonably ask what the advantage is of using HLM in these cases. First, we can

identify how much of the variation in outcomes lies at the different levels of analysis. Second, we can

    assess what proportion of this variation (at both site- and individual-levels) is explained by our models

    and whether any statistically significant variation remains to be explained. In addition, researchers can

    use various analytical strategies to examine and check for patterns or irregularities in the residuals at

    both the site- or group-level (u0j) and the individual-level (rij). Bryk, Raudenbush and Congdon (1999)

and Goldstein (1995) describe a number of these techniques, such as Q-Q plots, plots of empirical

Bayes (level two) versus least squares residuals, and plots of empirical Bayes residuals against level-two

predictors, to assess model fit and reliability.
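One of these diagnostics can be sketched as follows: a normal Q-Q plot of the estimated level-two residuals $u_{0j}$ taken from the random-intercept fit in the earlier sketch (matplotlib and scipy are assumed to be available):

```python
# Sketch: normal Q-Q plot of the empirical Bayes (level two) residuals
# u_0j from the random-intercept model fit earlier (ri_fit).
import matplotlib.pyplot as plt
import scipy.stats as st

u0 = [re.iloc[0] for re in ri_fit.random_effects.values()]  # one u_0j per site
st.probplot(u0, dist="norm", plot=plt)
plt.title("Q-Q plot of level-two residuals")
plt.show()
```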

    Comparing HLM and individual-level OLS results for the service provider contract models

    (Tables 3-5), where there was a much larger percentage of variation at level two (or between

    contracts), the variable coefficient estimates are still similar, although not as close as those in Tables 1

    and 2. The differences in estimated coefficient values are more noticeable in Tables 3 and 4, where

    approximately 30-40 percent of the total variation was at the contract level. While the level two

    variables in these models did substantially reduce the amount of contract-level variation that was not

    accounted for, there were still statistically significant differences between the outcomes by contract that

    remained to be explained.

    The most striking findings of this investigation of modeling strategies, however, can be seen in


    the comparison of the site-level OLS model results with those of the HLM and individual-level OLS

    regressions. In contrast to the comparable findings of the HLM and individual-level OLS models, the

    site-level models produce both inconsistent and seemingly inaccurate estimates of some of the

    individual- and site-level coefficients. (See the italicized numbers in the third column of Tables 1-5.)

    While the percent of variation explained in the site- (or contract) level OLS models and the HLM

    models is similar, the size, sign and statistical significance of some of the coefficient values and estimated

    effects differ noticeably across different outcomes in the respective studies as well as from the HLM and

    individual-level OLS model results. Given that some of the seemingly anomalous estimated effects in the

    site-level OLS models of JTPA participant outcomes are contrary to the findings of other JTPA

    research (e.g., the positive effect of being a high school dropout on earnings in four of the five site-level

    OLS models), we believe that it is the site-level OLS models that are probably inaccurate. These

findings also imply, contrary to Mead's argument, that modeling administrative processes and program

outcomes across multiple sites with data on clients aggregated at the site level may be a less reliable

    approach than similar (multiple-site) client-level data analyses.

    The notable inconsistencies in the site- or contract-level policy/administrative/structural

    coefficients are of particular importance for the study of governance, since these variables are nearly

    always the primary focus of public policy or administration studies. In many of the studies (some

    discussed earlier) that use site- or organization-level approaches, it is common to see researchers

    reporting high levels of variation explained with a relatively small number of policy or governance

    variables. A few, such as Mead (forthcoming), make it clear that site- or organization-level OLS

    models are not explaining variation in individual outcomes, but rather variation between average


    outcomes across the sites or organizations. Our findings underscore that ignoring the variation in

    individual-level outcomes and the potential cross-level effects between variables operating at individual-

    and site- or organization levels may well lead to inaccurate estimates of policy/administrative/structural

    variable effects.

    In a recent study that also compared multilevel modeling strategies to individual- and group-level

    OLS regressions, Krull and MacKinnon (1999) reached a similar conclusion. In discussing the

    individual- versus group-level models, they also pointed out that when individual-level data are

    aggregated, the ability to predict individual-level variation, which frequently comprises the majority of

    total variation, is eliminated. Therefore, researchers should expect that individual and group level

    analyses of the same data might indicate relationships that differ in both magnitude and direction.

    Overall, they concluded that multilevel-based estimates of the standard error showed considerably less

    bias than OLS-based estimates, and that OLS analyses were less efficient than multilevel analyses

    (433).
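The aggregation problem is easy to demonstrate by simulation. In the sketch below, which uses illustrative names and parameter values rather than anything drawn from the JTPA data, the true individual-level effect of an indicator is negative, but because sites with more such participants also have higher site intercepts, a regression on site means produces a large positive coefficient:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    frames = []
    for s in range(16):                               # 16 hypothetical sites
        p = rng.uniform(0.1, 0.6)                     # site's share of dropouts
        site_effect = 8.0 * p + rng.normal(0, 0.5)    # intercept rises with that share
        d = rng.binomial(1, p, 250)                   # individual dropout status
        y = 2.0 - 1.0 * d + site_effect + rng.normal(0, 1, 250)
        frames.append(pd.DataFrame({"site": s, "dropout": d, "y": y}))
    df = pd.concat(frames, ignore_index=True)

    # Individual-level estimate, holding sites fixed: close to the true -1.0
    print(smf.ols("y ~ dropout + C(site)", df).fit().params["dropout"])

    # Site-level OLS on aggregated means: the sign flips to strongly positive
    means = df.groupby("site")[["y", "dropout"]].mean()
    print(smf.ols("y ~ dropout", means).fit().params["dropout"])

Both regressions use exactly the same underlying observations; only the level of analysis differs.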

    To summarize, in the absence of multilevel analyses, researchers are unable to determine how

    much of the total variation in outcomes lies at the site- or organization level (i.e., the extent of intra-class

    correlation) and how much of it one is able to explain with a given model specification. In Table 2,

    where the amount of intra-class correlation was small and the site-level variables included in the models

explained nearly all of the site-level variation in outcomes, the estimates of policy/administrative/structural effects produced by the three different modeling strategies were much closer.

    information, however, how does one assess the probable accuracy of estimated effects? While some

    researchers support their quantitative studies with qualitative, hands-on components, it is also not


    uncommon for them to report some findings that are inconsistent with their hypothesized effects. In

    these cases, how does one ascertain whether it is the theory or the model specification that is in error?

    The results of the analyses presented here suggest that more attention should be given to multilevel

    modeling as a strategy for empirically investigating the linkages between governance and performance.
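The intra-class correlation itself is obtained directly from an unconditional (intercept-only) multilevel model. A minimal sketch, again with hypothetical file and variable names and assuming the statsmodels mixed-model routine:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("program_outcomes.csv")    # hypothetical data file

    # Null model: no predictors, only a random intercept for each site
    null = smf.mixedlm("earnings ~ 1", df, groups=df["site"]).fit()

    tau00 = null.cov_re.iloc[0, 0]    # between-site variance component
    sigma2 = null.scale               # within-site (individual-level) variance
    icc = tau00 / (tau00 + sigma2)
    print(f"Share of total outcome variation lying between sites: {icc:.2f}")

When this share is small, as in the Table 2 models, single-level estimates will tend to agree with the multilevel ones; when it is large, estimated policy or administrative effects should not be trusted without a multilevel check.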

    Conclusions

    Multilevel modeling holds considerable promise for governance research. Rapidly increasing

    computing capacity and new developments in statistical theories have now made programs for multilevel

    modeling (HLM, HOMALS, VARCL, BIRAM, and SAS mixed models are a few examples; see Kreft

    and Aschbacher 1994) accessible to anyone willing to invest some time in learning about the underlying

theories and how to apply them. At the recent workshop "Models and Methods for the Empirical Study of Governance," however, Ann Chih Lin asked whether our quest to advance the empirical study of governance will drive a push to create "Godzilla-like" data sets and their subsequent analysis and re-analysis. She noted that developing and supporting the analyses of large-scale, multilevel (and

    frequently longitudinal) data sets such as those described in this paper require substantial resources that

    might otherwise provide support to many smaller projects. One might question, for example, whether

    substantially more knowledge might be gained from a multi-site, multilevel empirical study of drug abuse

treatment programs (such as the NTIES data might allow) than from a number of smaller-scale case studies like that produced by Attewell and Gerstein.

    While the creation or re-analysis of multi-site, multilevel data sets might not always be feasible

    or the best use for sparse research funds, we believe that when it is possible to develop and work with


    these types of data and methods, the advantages gained in terms of (1) a fuller and more precise

    understanding of complex, hierarchical relationships, (2) more information about the amount of variation

    explained by statistical models at different levels of analysis, and (3) increased generalizability of findings

across different sites or organizations with varying (observable) characteristics make the investment in multilevel modeling worthwhile.

When one doesn't know how much of the total variation in the dependent variable (e.g., a

    program outcome) lies at the various levels of organization (i.e., the extent of intra-class correlation), the

    results of an individual- or higher-level OLS regression should be interpreted with considerable caution.

    As in any scientific field, research that attempts to replicate the most important findings of these studies

is desirable, although this also becomes more challenging when data sets (and consequently statistical

    models) are not directly comparable. Case-study or other qualitative research components can provide

    important background for the interpretation of OLS regression findings in these cases, but they typically

    do not make the findings more generalizable across a range of program or organizational contexts.

    When presenting and discussing their findings, governance researchers should be clear not only about

    what they are able to measure and explain in their models but also about the limitations on these findings

    attributable to the models, methods, and data employed.


TABLE 1: Hierarchical linear and OLS models of JTPA participants' first post-program quarter earnings outcomes (National JTPA Study data analyses)

Earnings in first post-program quarter

Predictors (individual level) | Hierarchical linear model | OLS, individual level | OLS, site level (average)
Intercept | 190.55 (0.40) | 208.92 (0.51) | 33.00 (0.02)
Gender (1=male) | 517.88*** (6.51) | 513.19*** (6.46) | -21.26*** (-2.73) -903.55
Age 22-29 years | 369.75*** (3.98) | 374.64*** (4.04) | 3.91 (0.52) 110.65
Age 30-39 years | 240.91** (2.36) | 244.29** (2.40) | 37.59*** (3.88) 967.94
Age 40 years and over | 53.84 (0.42) | 57.86 (0.45) | -15.17 (-1.00) 165.65
Black | -235.16** (-2.29) | -239.48** (-2.35) | -12.69** (-2.33) -365.47
Hispanic | -109.56 (-0.90) | -133.60 (-1.11) | -0.33 (-0.06) -3.55
Divorced, widowed or separated | 87.89 (1.02) | 91.86 (1.07) | -32.28*** (-4.53) -864.78
No high school degree | -350.57*** (-4.52) | -349.11*** (-4.51) | 20.70* (1.87) 929.22
Some post-high school education | 360.81*** (3.58) | 357.83*** (3.55) | 2.41** (0.34) 41.77
Welfare recipient at time of application | -293.05*** (-3.71) | -298.60*** (-3.78) | -7.91 (-1.41) -425.00
Children under age six | 63.58 (0.76) | 66.71 (0.79) | -20.20 (-1.40) -445.01
Employment-unemployment transition in year before enrollment | -295.66*** (-3.92) | -297.27*** (-3.95) | 40.57*** (5.31) 2582.69
Earnings in year before enrollment | 0.09*** (9.70) | 0.09*** (9.72) | -0.11 (-1.23)
Received classroom training | 100.36 (1.22) | 99.52 (1.22) | -5.42 (-1.57) -387.26
Received on-the-job training | 388.36*** (3.58) | 388.34*** (3.56) | 26.41*** (3.39) 457.42

Predictors (site level)
PIC is the administrative entity | 446.41*** (3.60) | 436.12*** (4.16) | 883.00*** (4.80) 404.68
PIC and LEO/CEO are equal partners | -472.55** (-2.00) | -436.75** (-2.12) | 1170.10*** (3.53) 682.52
Percent of services provided directly by administrative entity | -548.28 (-1.45) | -487.64 (-1.53) | -259.20 (-0.56)
Percent of performance-based contracts | -650.32* (-1.91) | -550.17* (-1.90) | 1198.20** (2.36)
Weight accorded to employment rate standard | 4260.41*** (3.21) | 4188.11*** (3.75) | 341.00 (0.22)
Minimum number of standards sites must meet to qualify for performance bonuses | 21.13 (1.10) | 17.33 (1.12) | -2.62 (-0.09)
Requirement that performance bonuses must be used to serve highly disadvantaged groups | -242.70** (-2.02) | -252.56** (-2.40) | -1543.80*** (-3.44) -289.46
Southern region | 433.03 (1.49) | 362.93 (1.43) | -1643.00*** (-2.77) -410.75
Midwestern region | 535.74*** (2.90) | 538.03** (3.31) | 11.50 (0.03) 3.59
Western region | 825.04** (2.22) | 752.67** (2.32) | -445.40 (-0.82) -113.35
Unemployment rate | 11725.86*** (2.67) | 11546.00*** (3.10) | 5105.00 (0.90)
Model predicting power (percent of variation explained) | 6% individual-level; 86% between-site | R2 = 11.3% | R2 = 85.4%

Coefficient value (t-ratio in parentheses): *significant at a


TABLE 2: Hierarchical linear and OLS models of JTPA participants' first post-program year earnings outcomes (National JTPA Study data analyses)

Earnings in first post-program year

Predictors (individual level) | Hierarchical linear model | OLS, individual level | OLS, site level (average)
Intercept | 1117.10 (0.76) | 1093.36 (0.77) | 15892.00*** (3.52)
Gender (1=male) | 2144.18*** (7.76) | 2143.20*** (7.76) | -8.79 (-0.37) -373.58
Age 22-29 years | 1455.00*** (4.51) | 1456.07*** (4.51) | -68.84*** (-2.83) -1948.17
Age 30-39 years | 1000.99*** (2.82) | 1000.36*** (2.82) | -92.64*** (-2.62) -2385.48
Age 40 years and over | 397.21 (0.89) | 398.69 (0.90) | 6.86 (0.16) 280.71
Black | -1079.14*** (-3.04) | -1081.02*** (-3.05) | -24.82* (-1.79) -714.82
Hispanic | -699.64 (-1.66) | -714.26* (-1.70) | 4.47 (0.28) 48.14
Divorced, widowed or separated | 325.55 (1.09) | 326.86 (1.09) | -48.64** (-2.24) -1303.07
No high school degree | -1424.55*** (-5.29) | -1423.04*** (-5.29) | -42.09 (-1.22) -1889.42
Some post-high school education | 1046.91*** (2.99) | 1047.70*** (2.99) | -65.02*** (-2.90) -1126.80
Welfare recipient at time of application | -1006.49*** (-3.67) | -1012.47*** (-3.69) | -49.09*** (-2.91) -2637.61
Children under age six | 496.51* (1.70) | 500.15* (1.71) | -69.46* (-1.67) -1530.20
Employment-unemployment transition in year before enrollment | -862.80*** (-3.30) | -865.79*** (-3.31) | -4.08 (-0.15) -259.73
Earnings in year before enrollment | 0.33*** (10.59) | 0.33*** (10.59) | 0.40 (1.30)
Received classroom training | 125.71 (0.44) | 132.74 (0.47) | -23.74*** (-2.52) -1696.22
Received on-the-job training | 1195.17*** (3.17) | 1197.98*** (3.18) | -40.20 (-1.53) -696.26

Predictors (site level)
PIC is the administrative entity | 1737.40*** (4.59) | 1727.15*** (4.74) | 1626.90*** (2.98) 745.12
PIC and LEO/CEO are equal partners | -1933.65*** (-2.61) | -1949.44*** (-2.73) | -438.90 (-0.56) -255.88
Percent of services provided directly by administrative entity | -2618.57** (-2.26) | -2604.93** (-2.35) | -564.00 (-0.43)
Percent of performance-based contracts | -2719.45*** (-2.60) | -2709.02*** (-2.69) | -2033.00* (-1.80)
Weight accorded to employment rate standard | 15887.75*** (3.93) | 15888.00*** (4.09) | 15710.00*** (3.13)
Minimum number of standards sites must meet to qualify for performance bonuses | 22.25 (0.39) | 21.50 (0.40) | 102.00 (1.16) 11.74
Requirement that performance bonuses must be used to serve highly disadvantaged groups | -866.66** (-2.30) | -865.32** (-2.36) | -1376.00 (-1.35) -258.00
Southern region | 2035.83** (2.24) | 2025.88** (2.30) | 3101.00** (2.15) 775.25
Midwestern region | 1936.15*** (3.33) | 1940.48*** (3.44) | 4367.00*** (3.99) 1364.69
Western region | 3215.92*** (2.76) | 3214.16*** (2.85) | 3760.00*** (2.81) 940.00
Unemployment rate | 49955.52*** (3.71) | 50558.00*** (3.90) | 58873.00*** (3.53)
Model predicting power (percent of variation explained) | 13% individual-level; 97% between-site | adjusted R2 = 13.2% | adjusted R2 = 87.6%

Coefficient value (t-ratio in parentheses): *significant at a


Table 3: Hierarchical linear and OLS models of JTPA participants' hourly wages at termination (study of service provider contracts)

Hourly wage at termination model predictors | Hierarchical linear model | OLS, individual level | OLS, contract(or) level

Individual-level variables
Intercept | 2.02*** (11.28) | 2.37*** (20.77) | 2.85*** (3.30)

Participant characteristics
Under age 18 years | -0.31*** (-4.56) | -0.53*** (-9.23) | -0.001 (-0.42) -0.04
Age 22-29 years | 0.34*** (4.67) | 0.63*** (9.94) | 0.012*** (2.78) 0.21
Age 30-39 years | 0.37*** (4.86) | 0.68*** (10.12) | 0.019*** (3.94) 0.28
Age 40 years and over | 0.51*** (5.94) | 1.02** (15.50) | 0.022*** (5.40) 0.29
Male | -0.03 (-0.86) | -0.05 (-1.33) | -0.005 (-1.44) -0.26
African-American | -0.10** (-2.26) | -0.11** (-2.53) | 0.002 (0.57) 0.12
Hispanic | -0.01 (-0.09) | -0.02 (-0.25) | 0.004 (0.84) 0.04
Single head of household | 0.11** (1.96) | 0.09 (1.61) | -0.008 (-1.53) -0.15
Welfare recipient | -0.17*** (-4.10) | -0.20*** (-4.83) | -0.007* (-1.66) -0.20
No high school degree | -0.15*** (-2.80) | -0.02 (-0.34) | 0.001 (0.27) 0.02
Post-high school education | 0.13*** (2.50) | 0.20*** (3.61) | 0.002 (0.42) 0.03
College graduate | 0.32 (1.48) | 0.63*** (2.85) | -0.003 (-0.22) -0.003
Minimal work history | 0.03 (0.60) | -0.03 (-0.65) | 0.001 (0.43) 0.05
Unemployed at application | -0.30*** (-4.17) | -0.28*** (-3.76) | -0.006 (-0.75) -0.31
Not in labor force | -0.54*** (-7.00) | -0.68*** (-8.65) | -0.014* (-1.79) -0.62
Zero earnings in pre-program year | -0.43*** (-8.96) | -0.67*** (-14.72) | -0.009*** (-2.99) -0.36

Training services
Received basic/remedial education | -0.10 (-1.54) | -0.14*** (-2.96) | -0.005** (-2.24) 0.10
Received vocational training | 0.41*** (5.91) | 0.39*** (8.31) | 0.001 (0.59) 0.04
Received on-the-job training | 1.40*** (17.53) | 1.35*** (24.58) | 0.022*** (7.06) 0.31
Received job search/job club | -0.02 (-0.16) | 0.09 (0.91) | 0.013*** (3.18) 0.07
Length of training (in months) | -0.09*** (-17.14) | -0.09*** (-19.27) | -0.092*** (-3.16) -0.48

Economic/environmental factors
Percent change in employment, 1988-1989 | -8.68*** (-3.83) | -16.28*** (-7.56) | -22.75 (-1.24)
Percent change in employment, 1989-1990 | 2.22*** (9.67) | 2.26*** (11.18) | 2.98** (2.32)
Percent change in employment, 1990-1991 | 0.26 (0.94) | 0.70*** (4.08) | 1.59* (1.75)
Percent change in employment, 1991-1992 | 32.01* (1.87) | 24.35** (2.16) | 73.01 (1.16)
Percent change in employment, 1992-1993 | 51.37 (1.17) | 1.63 (0.07) | -84.47 (-0.96)

Contract-level variables
Private, nonprofit contractor | 0.29* (1.84) | 0.10* (1.75) | 0.23 (1.17) 0.15
For-profit contractor | 0.77*** (4.10) | 0.40*** (6.12) | 0.53** (2.18) 0.11
Performance incentives in contract | 0.34*** (2.72) | 0.32*** (7.11) | 0.13 (0.78) 0.09
Random effect: Under age 18 years by performance incentives in contract | 0.30*** (2.67) | n.a. | n.a.
Predicting power (percent of variation explained) | 9.0% (individual); 68.0% (contract) | adjusted R2 = 34.7% | adjusted R2 = 68.2%

Coefficient value (t-ratio in parentheses): *significant at a


Table 4: Hierarchical linear and OLS models of JTPA participants' first post-program quarter earnings outcomes (study of service provider contracts)

First post-program quarter earnings model predictors | Hierarchical linear model | OLS, individual level | OLS, contract(or) level

Individual-level variables
Intercept | 1367*** (19.03) | 1427*** (23.65) | 2352.9*** (4.81)

Participant characteristics
Under age 18 years | -145*** (-4.61) | -150*** (-4.94) | 2.0 (0.89) 79
Age 22-29 years | 238*** (6.15) | 248*** (7.11) | 4.5* (1.91) 79
Age 30-39 years | 339*** (8.41) | 341*** (9.29) | 13.0*** (4.51) 189
Age 40 years and over | 315*** (7.33) | 280*** (7.85) | 10.4*** (4.87) 136
Male | 8 (0.39) | -10 (-0.53) | -3.2 (1.56) 166
African-American | -114*** (-4.72) | -113*** (-5.14) | -4.1*** (-2.93) -247
Hispanic | 49 (1.29) | 78** (2.18) | -1.2 (-0.49) 11
Single head of household | 29 (0.95) | 69** (2.30) | 1.0 (0.36) 19
Welfare recipient | -93*** (-4.17) | -77*** (-3.52) | -2.5 (-1.14) -72
No high school degree | -68** (-2.39) | -82*** (-3.06) | 1.1 (0.63) -19
Post-high school education | -15 (-0.50) | -0.2 (-0.007) | -10.8*** (3.32) -156
College graduate | -94 (-0.96) | -59 (-0.61) | -6.0 (0.95) -6
Minimal work history | -87*** (-4.04) | -90*** (-4.38) | 0.7 (0.54) 33
Unemployed at application | -250** (-6.38) | -270*** (-6.87) | -8.3* (-1.86) -430
Not in labor force | -411*** (-9.71) | -442*** (-10.53) | -12.0*** (-2.79) -529
Zero earnings in pre-program year | -513*** (-21.24) | -574*** (-24.31) | -7.3*** (-4.77) -292

Training services
Received basic/remedial education | -39 (-1.37) | -89*** (-3.87) | -1.9* (-1.84) -37
Received vocational training | 38 (1.15) | 78*** (3.07) | -0.4 (-0.35) -18
Received on-the-job training | 433*** (11.23) | 513*** (17.02) | 13.2*** (7.97) 187
Received job search/job club | 183*** (3.07) | 217*** (4.24) | 6.7*** (2.59) 34
Length of training (in months) | -18*** (-7.16) | -16*** (-6.92) | -17.1 (-1.13) -89

Economic/environmental factors
Percent change in employment, 1988-1989 | -2169* (-1.72) | -2055* (1.88) | 7358.9 (0.76)
Percent change in employment, 1989-1990 | 540*** (4.63) | 442*** (4.46) | 235.2 (0.37)
Percent change in employment, 1990-1991 | -52 (-0.45) | -11 (-0.13) | -241.0 (-0.54)
Percent change in employment, 1991-1992 | 602 (0.08) | 3784 (0.70) | 11390 (0.38)
Percent change in employment, 1992-1993 | -12194 (-0.71) | -22309* (-1.85) | -61197 (-1.35)

Contract-level variables
Private, nonprofit contractor | -53 (-1.07) | -26 (-0.90) | -179.3* (-1.81) -114
For-profit contractor | 55 (0.91) | 48 (1.43) | -113.1 (-0.92) -24
Performance incentives in contract | 86** (2.11) | 53** (2.32) | 97.4 (1.12) 65
Predicting power (percent of variation explained) | 9.0% (individual); 87.0% (contract) | adjusted R2 = 33.4% | adjusted R2 = 66.1%

Coefficient value (t-ratio in parentheses): *significant at a


Table 5: Hierarchical linear and OLS models of JTPA participants' pre- to post-program quarterly earnings change outcomes (study of service provider contracts)

Pre- to post-program quarterly earnings change model predictors | Hierarchical linear model | OLS, individual level | OLS, contract(or) level

Individual-level variables
Intercept | 447*** (6.76) | 497*** (8.51) | 2343.7*** (3.74)

Participant characteristics
Under age 18 years | -79*** (-2.64) | -108*** (-3.72) | -0.8 (-0.29) -32
Age 22-29 years | -10 (-0.26) | -12 (-0.35) | 1.5 (0.49) 26
Age 30-39 years | -75* (-1.93) | -85** (-2.36) | -2.5 (-0.67) -36
Age 40 years and over | -75* (-1.83) | -106*** (-3.00) | -3.4 (-1.24) -44
Male | -7 (-0.37) | -14 (-0.74) | -3.0 (-1.15) -155
African-American | -117*** (-5.00) | -100** (-4.62) | -3.8** (-2.13) -229
Hispanic | 10 (0.27) | 22 (0.64) | -8.1*** (-2.57) -74
Single head of household | 85*** (2.84) | 114*** (3.91) | 1.5 (0.43) 28
Welfare recipient | 25 (1.16) | 32 (1.53) | 1.0 (0.37) 29
No high school degree | -1 (-0.03) | -6 (-0.25) | 1.3 (0.65) 23
Post-high school education | -44 (-1.46) | -46 (-1.56) | -20.7*** (-4.95) -298
College graduate | -148 (-1.44) | -146 (-1.42) | 1.2 (0.16) 1
Minimal work history | 56*** (2.68) | 49** (2.44) | 1.9 (1.09) 90
Unemployed at application | -274*** (-7.08) | -284*** (-7.36) | -15.5*** (-2.70) -802
Not in labor force | -311*** (-7.52) | -321*** (-7.88) | -15.5*** (-2.81) -684
Zero earnings in pre-program year | 111*** (4.69) | 61*** (2.65) | -1.0 (-0.52) -40

Training services
Received basic/remedial education | -65*** (-2.48) | -78*** (-3.56) | 0.3 (0.25) 6
Received vocational training | -9 (-0.31) | 51** (2.05) | 2.5* (1.69) 119
Received on-the-job training | 206*** (5.62) | 239*** (8.03) | 13.6*** (6.37) 193
Received job search/job club | 21 (0.36) | 58 (1.12) | 3.7 (1.13) 19
Length of training (in months) | -7*** (-2.80) | -7*** (-3.27) | -43.2** (-2.23) -226

Economic/environmental factors
Percent change in employment, 1988-1989 | -3169*** (-2.58) | -4779*** (-4.39) | -17066 (-1.37)
Percent change in employment, 1989-1990 | 290*** (2.60) | 287*** (2.95) | 410.9 (0.51)
Percent change in employment, 1990-1991 | -304*** (-2.83) | -213*** (-2.69) | -526.4 (-0.94)
Percent change in employment, 1991-1992 | -20042*** (-2.81) | -10025* (-1.92) | 15988 (0.41)
Percent change in employment, 1992-1993 | -43810*** (-2.81) | -18981* (-1.61) | 40950 (0.71)

Contract-level variables
Private, nonprofit contractor | -15 (-0.36) | -15 (-0.54) | -254.6** (-2.01) -162
For-profit contractor | 25 (0.49) | 17 (0.51) | -412.5*** (-2.62) -87
Performance incentives in contract | 163*** (4.70) | 123*** (5.53) | 271.7** (2.44) 181
Predicting power (percent of variation explained) | 2.0% (individual); 49.0% (contract) | adjusted R2 = 5.3% | adjusted R2 = 65.4%

Coefficient value (t-ratio in parentheses): *significant at a


REFERENCES

Arum, Richard, "Do Private Schools Force Public Schools to Compete?" American Sociological Review 61:1 (February 1996): 29-46.

Attewell, Paul and Dean R. Gerstein, "Government Policy and Local Practice," American Sociological Review 44 (April 1979): 311-327.

Bryk, Anthony, Stephen Raudenbush and Richard Congdon, Hierarchical Linear and Nonlinear Modeling with the HLM/2L and HLM/3L Program, Chicago: Scientific Software International, 1999.

Bryk, Anthony S. and Stephen W. Raudenbush, Hierarchical Linear Models: Applications and Data Analysis Methods, London: Sage Publications, 1992.

Bryk, Anthony S. and Stephen W. Raudenbush, "On Heterogeneity of Variance in Experimental Studies: A Challenge to Conventional Interpretations," Psychological Bulletin 104:3 (1988): 396-404.

Bryk, Anthony S. and Stephen W. Raudenbush, "Application of Hierarchical Linear Models to Assessing Change," Psychological Bulletin 101:1 (1987): 147-158.

D'Aunno, Thomas, Robert I. Sutton, and Richard H. Price, "Isomorphism and External Support in Conflicting Institutional Environments: A Study of Drug Abuse Treatment Units," Academy of Management Journal 34:3 (1991): 636-661.

Ferguson, Ronald F., "Paying for Public Education: New Evidence on How and Why Money Matters," Harvard Journal on Legislation 28 (1991): 465-498.

Fletcher, Bennett W., Frank M. Tims, and Barry S. Brown, "Drug Abuse Treatment Outcome Study (DATOS): Treatment Evaluation Research in the United States," Psychology of Addictive Behaviors 11:4 (1997): 216-229.

Gerstein, Dean R., A. Rupa Datta, Julia S. Ingels, Robert A. Johnson, Kenneth A. Rasinski, Sam Schildhaus, Kristine Talley, Kathleen Jordan, Dane B. Phillips, Donald W. Anderson, Ward G. Condelli, and James S. Collins, Final Report: National Treatment Improvement Evaluation Study, U.S. Department of Health and Human Services, March 1997.

Goldstein, Harvey, and S. Thomas, "Using Examination Results as Indicators of School and College Performance," Journal of the Royal Statistical Society 159:1 (1996): 149-163.

Goldstein, Harvey, Multilevel Statistical Models, New York: Halsted Press, 1995.

Goldstein, Harvey, "Statistical Information and the Measurement of Education Outcomes," Journal of the Royal Statistical Society 155 (1992): 313-315.

Goldstein, Harvey, "Models for Multilevel Response Variables with an Application to Growth Curves," in R.D. Bock (ed.), Multilevel Analysis of Educational Data, New York: Academic Press, 1989.

Goldstein, Harvey, Multilevel Models in Educational and Social Research, London: Oxford University Press, 1987.

Goldstein, Harvey, "Multilevel Mixed Linear Model Analysis Using Iterative Generalized Least Squares," Biometrika 73 (1986): 43-56.

Gray, J., D. Jesson, Harvey Goldstein and J. Rasbash, "A Multilevel Analysis of School Improvement: Changes in Schools' Performance Over Time," School Effectiveness and School Improvement 6 (1995): 97-114.

Heckman, James J., Robert J. LaLonde and Jeffrey A. Smith, "The Economics and Econometrics of Active Labor Market Programs," prepared for the Handbook of Labor Economics, Volume III, Orley Ashenfelter and David Card, editors.

Heckman, James J., Carolyn J. Heinrich and Jeffrey A. Smith, "Assessing the Performance of Performance Standards in Public Bureaucracies," American Economic Review 87:2 (1997): 389-395.

Heinrich, Carolyn J. and Laurence E. Lynn, Jr., "Governance and Performance: The Influence of Program Structure and Management on Job Training Partnership Act (JTPA) Program Outcomes," presented at the Workshop on Models and Methods for the Empirical Study of Governance, University of Arizona, April 29-May 1, 1999.

Heinrich, Carolyn J., "Organizational Form and Performance: An Empirical Investigation of Nonprofit and For-profit Job-training Service Providers," working paper, National Bureau of Economic Research and The University of Chicago, 1998.

Jennings, Edward T., "Building Bridges in the Intergovernmental Arena: Coordinating Employment and Training Programs in the American States," Public Administration Review 54:1 (January/February 1994): 52-60.

Jennings, Edward T. and JoAnn Gomer Ewalt, "Interorganizational Coordination, Administrative Consolidation and Policy Performance," Public Administration Review 58:5 (September/October 1998): 417-28.

Kreft, Ita G., "Are Multilevel Techniques Necessary? An Overview, Including Simulation Studies," unpublished manuscript, California State University, Los Angeles, 1996.

Kreft, Ita G. and Pamela R. Aschbacher, "Measurement and Evaluation Issues in Education: The Value of Multivariate Techniques in Evaluating An Innovative High School Reform Program," International Journal of Educational Research 21 (1994): 181-196.

Lynn, Laurence E., Jr., Carolyn J. Heinrich, and Carolyn J. Hill, The Empirical Study of Governance: Theories, Models, and Methods, Georgetown University Press, forthcoming, 2000.

Mead, Lawrence M., "Optimizing JOBS: Evaluation Versus Administration," Public Administration Review 57:2 (March/April 1997): 113-123.

Mead, Lawrence M., "Performance Analysis," unpublished manuscript, New York University, 1999.

Mead, Lawrence M., "The Decline of Welfare in Wisconsin," Journal of Public Administration Research and Theory, forthcoming.

Meier, Kenneth J., "Bureaucracy and Democracy: The Case for More Bureaucracy and Less Democracy," Public Administration Review 57:3 (May/June 1997): 193-99.

Meier, Kenneth J. and Joseph Stewart, "The Impact of Representative Bureaucracies: Educational Systems and Public Policies," American Review of Public Administration 22:3 (September 1992): 157-71.

Meier, Kenneth J., Joseph Stewart and Robert E. England, "The Politics of Bureaucratic Discretion: Educational Access as an Urban Service," American Journal of Political Science 35:1 (1991): 155-177.

Meyer, Alan D. and James B. Goes, "How Organizations Adopt and Implement New Technologies," Best Papers Proceedings, Academy of Management (Forty-seventh Annual Meeting of the Academy of Management, New Orleans, Louisiana, August 9-12, 1987), pp. 175-179.

Milward, H. Brinton and Keith G. Provan, "Governing Service Provider Networks," presented at the EGOS 14th Colloquium, Maastricht University, The Netherlands, 1998.

Roderick, Melissa, "Evaluating Chicago's Efforts to End Social Promotion," presented at the Workshop on Models and Methods for the Empirical Study of Governance, University of Arizona, April 29-May 1, 1999.

Roderick, Melissa and Eric Camburn, "Risk and Recovery: Course Failures in the Early Years of High School," unpublished manuscript, January 1997.

Sandfort, Jodi, "The Structural Impediments to Front-line Human Service Collaboration: The Case of Welfare Reform," presented at the Annual Meeting of the American Political Science Association, Boston, September 1998.

Singer, Judith D., "Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models," Journal of Educational and Behavioral Statistics, forthcoming.

Smith, Kevin B. and Kenneth J. Meier, "Politics, Bureaucrats and Schools," Public Administration Review 54:4 (November/December 1994): 551-558.