
ABMA, TINEKE A.

(b. 1964, Joure, Holland). Ph.D. Institute for Health Care Policy and Management, Erasmus University of Rotterdam; M.A. Health Care Administration, Erasmus University, Rotterdam; B.Sc. Nursing, Health Care Academy of Groningen.

Since 1990, Abma has worked at the Institute for Health Care Policy and Management, Erasmus University, Rotterdam, and is affiliated with the Institute for Health Ethics, University of Maastricht. She has been involved in numerous evaluations, including responsive evaluations of palliative care programs for cancer patients, rehabilitation programs for psychiatric patients, quality of care improvement in coercion and constraint in psychiatry, and injury prevention programs for students in the performing arts.

Her primary contribution to the field of evaluation is the exploration of how narrative and dialogue can create responsive evaluation approaches that strengthen stakeholders’ contribution to policy making in transformational health-care programs. Social constructionist and postmodern thought, hermeneutics, and the ethics of care inform her evaluation practice. Primary influences on her evaluation work include Robert Stake, Egon Guba, Yvonna Lincoln, Thomas Schwandt, Jennifer Greene, Zygmunt Bauman, Rosi Braidotti, Kenneth Gergen, Sheila McNamee, Dian Hosking, Hans-Georg Gadamer, Guy Widdershoven, Joan Tronto, and Margaret Walker.

Abma edited Telling Tales: On Evaluation and Narrative; Dialogue in Evaluation, a special issue of the journal Evaluation; and “Responsive Evaluation,” an issue of New Directions for Evaluation. Her dissertation was nominated for the Van Poelje Prize by the Dutch Association for Public Administration.

ABT ASSOCIATES

Founded by Clark Abt in 1965, Abt Associates is one of the world’s largest for-profit research firms, with over 1000 employees located in nine corporate and 25 project offices around the world. Abt Associates applies rigorous research techniques to investigate a wide range of issues relating to social and economic policy formulation, international development, and business research. Abt clients are found in all levels of government, business, and industry, as well as nonprofit organizations and foundations. Abt Associates is one of the 100 largest employee-owned companies in the United States, with gross revenues exceeding $184 million in fiscal year 2002.

—Jeffrey G. Tucker

ACCESSIBILITY

Accessibility is a common criterion in evaluation, especially of programs, services, products, and information, when the target audience (such as disabled individuals) or the intervention (such as online teaching or Web pages) is presumed to present special challenges of access. These two factors often coincide in evaluations, as for example in evaluating Web-based teaching for persons with disabilities. Many government agencies have developed guidelines for determining accessibility of a range of services.

ACCOUNTABILITY

Accountability is a state of, or a process for, holding someone to account to someone else for something—that is, being required to justify or explain what has been done. Although accountability is frequently given as a rationale for doing evaluation, there is considerable variation in who is required to answer to whom, concerning what, through what means, and with what consequences. More important, within this range of options, the ways in which evaluation is used for accountability are frequently so poorly conceived and executed that they are likely to be dysfunctional for programs and organizations.

In its narrowest and most common form, accountability focuses on simple justification by requiring program managers to report back to funders (either separate organizations or the decision makers within their own organization) on their performance compared to agreed plans and targets.

In theory, this sounds attractive. It seems likely that such a system will contribute to good outcomes for programs and organizations through providing an incentive system that encourages managers and staff to focus on and achieve better performance and through providing information for decision makers that will enable them to reward and maintain good performance and intervene in cases of poor performance. In practice, as has been repeatedly found, many systems of accountability of this type are subject to several forms of corruption and hence are likely to reduce the sense of responsibility for and quality of performance.

The most common problem is a too-narrow focus on justification through meeting agreed targets for service delivery outputs. In organizations in which this is the case, and in which, additionally, rewards and sanctions for individuals and organizations are tightly tied to the achievement of pre-established targets, goal displacement is highly likely (in goal displacement, people seek to achieve the target even at the expense of no longer achieving the objective). The most notorious example comes from the Vietnam War, during which the emphasis on body counts, used as a proxy for success in battles, led to increased killing of civilians in one-sided and strategically unimportant battles. Public sector examples of this sort of problem abound, but there are also many private sector examples in which senior managers have been rewarded handsomely for achieving specific targets at the cost of the long-term viability of the company.

Another common effect is that what gets measured gets done, as intended, but what is not measured is no longer valued or encouraged. A program may turn to “creaming”: selecting easier clients so that targets of throughput or outcomes can be achieved, at the cost of reduced access for those who most need the service. Other important values for the organization, such as cooperation across different units of the organization, may no longer be encouraged because of the emphasis on achieving one’s own targets. Finally, there are many reported cases in which such a system encourages data corruption: Reported outcomes are exaggerated or modified to match targets.

Disquiet about the effects of this sort of evaluation is at the heart of concerns about high-stakes testing of children in schools, in which case serious sanctions for children, teachers, and schools follow poor performance in standardized tests.

Even if rewards and sanctions are not tightly tied to these forms of accountability—that is, there are few consequences built into the accountability system—other problems arise: cynicism about the value of monitoring and evaluation and the commitment of decision makers to reasonable decision-making processes.

What are the alternatives? It is not necessary to abandon the notion of being accountable for what has been done but to return to the meaning and focus on systems that both justify and explain what has been done. This requires careful consideration of who is being held accountable, to whom, for what, how, and with what consequences. More thoughtful and comprehensive approaches to accountability should demonstrably support good performance and encourage responsibility. Some have referred to this as smart accountability.

The first issue to consider is who is being held accountable. Those being held accountable are most often program managers but could and possibly should include staff, senior managers, and politicians. Politicians often claim that their accountability is enacted through elections, but these are clearly imperfect because they are infrequent, involve multiple issues, and often reflect party allegiances rather than responses to specific issues.

The second issue is to whom these parties are being held accountable and how. They are most often held accountable to those making funding decisions but could and possibly should be accountable to the community, citizens, service users, consumer advocacy groups, taxpayers, relevant professions, and international organizations for compliance with international agreements and conventions. These different audiences for accountability have been labeled upwards accountability and outwards accountability, respectively. Parties are most often held accountable through performance indicators shown in regular reports to funders, but may also be held accountable through diverse methods such as annual reports, spot inspections, detailed commentaries on performance indicators, or public meetings and reporting.

Real accountability to citizens would involve making appropriate information accessible to citizens, together with some process for feedback and consequences. Annual reports from government departments, corporate entities, and projects are one method for providing this information but are usually used instead as a public relations exercise, highlighting the positive and downplaying problems. The information availability that has been produced by the Internet has created many more opportunities for reporting information to citizens who have ready access to the Internet if there is a central agency willing to provide the information. So, for example, in Massachusetts, parents seeking to choose a residential neighborhood, and hence a schooling system, can access detailed performance indicators comparing resources spent on education, demographic characteristics, activities undertaken, and test scores in different neighborhoods to help inform their decision. However, there remain difficulties for parents in synthesizing, interpreting, and applying this information to decisions about what would be best for their particular children. Without appropriate analysis and interpretation, there are risks that clients and taxpayers will draw dangerously wrong conclusions. For example, some senior bureaucrats have advocated making surgeons accountable to the public by publishing their rates for complication and death. The obvious problem is that without suitable adjustment for various factors, the surgeons who treated the most severe cases or the patients with the poorest health would appear less effective.

The third aspect that needs to be addressed is that for which people are being held accountable. Given the concerns outlined so far, it is clear that consideration must be given to more than simply meeting targets. Other aspects of performance that may need to be included are coverage (matching actual clients with intended target groups), treatment (providing services in the agreed way and in the agreed amount), fiscal management (spending the money on the agreed inputs, proper controls against fraud), and legal compliance (ensuring procedural adherence to laws, policies, and regulations). It is not reasonable, of course, to expect programs to be able to simultaneously meet unilaterally set targets for all of these. An example would be schools or hospitals that are expected to meet standards for open access to all cases, legal requirements about curriculum or standards of care, and fiscal targets linked to reduced budgets and are punished for not meeting the same outcomes for students or patients as organizations with fewer competing accountabilities.

Accountability needs to be understood not just in terms of reporting compliance and meeting targets but in terms of explaining and justifying legitimate variations, including necessary trade-offs between competing imperatives and accountability. Easy achievement of a timid target can easily be reported. Understandable and legitimate differences between the target and actual performance will require space and a sympathetic audience for a more detailed explanation.

Accountability also needs to go beyond those outcomes that are directly under the control of those who are being held accountable—for example, employment outcomes of school students or long-term family outcomes for children in foster care. Although it is not reasonable to suggest that the performance of these programs and managers should be assessed only by these long-term outcomes, it is also unreasonable to base it only on the completion of units of service that they can totally control. A better form of accountability expects them to be aware of the subsequent causal chain and to be actively seeking to have a beneficial effect on it or to be redeveloping the program so that it is more likely to do so.

The final and most important aspect of an accountability system is the consequences for those providing the reports. Reputational accountability and market accountability, where adverse performance can affect credibility and market pressures, respectively, depend on evidence of performance being available to be scrutinized by relevant parties. More usually, accountability systems focus on reporting discrepancies between targets and performance to funders, the assumption being that they will use this information in future funding and policy decisions. However, accountability systems rarely provide sufficient information to make it possible for funders to decide if such discrepancies should be followed by decreased funding (as a sanction), increased funding (to improve the quality or quantity of services being provided), or termination of the function.

Accountability requires a much more comprehensive explanation of performance, an incentive system that encourages improvement of performance rather than misreporting and distortion of it, and a commitment to address learning as well as accountability. In other words, accountability systems need to be a tool for informed judgment and management rather than a substitute for them. This is the smart accountability that has been increasingly advocated.

Smart accountability includes demonstrating responsible, informed management; including appropriate risk management, such as cautious trials of difficult or new approaches; and a commitment to identify and learn from both successes and mistakes. The incentive system for accountability needs to reward intelligent failure (competent implementation of something that has since been found not to work), discourage setting easy targets, discourage simply reporting compliance with processes or targets, and encourage seeking out tough criticism.

The acid test of a good accountability system is that it encourages responsibility and promotes better performance.

—Patricia J. Rogers

Further Reading
American Evaluation Association. (2002). American Evaluation Association position statement on high stakes testing in preK-12 education. Retrieved April 23, 2004, from http://www.eval.org/hst3.htm
Mayne, J. (2001). Science, research, & development: Evaluation workshops. Seminar 1: Performance measurement. Retrieved April 23, 2004, from http://www.dpi.vic.gov.au/dpi/nrensr.nsf/FID/-341A311BA4C565A7CA256C090017CA16?OpenDocument#1
Perrin, B. (1999). Performance measurement: Does the reality match the rhetoric? American Journal of Evaluation, 20(1), 101-114.

ACCREDITATION

Accreditation is a process and an outcome, and it exists in two main forms. First, there is accreditation offered by an institution or awarding body to individuals on the basis of academic or training credit already gained within the institution or as a direct result of prior or current learning. Second, there is accreditation sought by an institution from an outside awarding body through some form of formal professional review.

The processes of accreditation make credible the autonomous privilege of an organization or body to confer academic or training awards. Furthermore, accreditation is commonly used to form new programs or to open opportunities for a wider adoption of courses leading to the confirmation of awards. In the context of wider participation in education and training, it is not unusual to hear questions about whether academic standards are the same across a nation or state or comparable between institutions capable of awarding academic or training credit. The idea here is not to point the finger at any single phase of education or the quality assurance process but to address difficult questions openly. The reality is a raft of issues concerned with comparing and formally evaluating standards of attainment.

STANDARDS AND PHILOSOPHY

The process of accreditation is a complex course of action that attempts to evaluate standards of attainment. Indeed, the definition and evaluation of attainable, objective standards relevant to the discipline or domain are often part of the expressed goals for accreditation. This argument may appear to ascribe greater worth to evaluating the outcomes of a training process rather than understanding the kinds of personal change and development expected during the training. However, whether we are outcome or process driven, it is important to know what principles and ideals drive accreditation.

As suggested in the foregoing argument, a philosophy of accreditation is more often determined by the definition of the concerns and principles underpinning the process. Focusing on such principles should diminish any undue reliance on evaluation of course outcomes as a single focus of assessment. Indeed, confidence in a single overriding criterion for assessment can create a negative model of noncompliance. For example, close matching of prior experiential learning to learning outcomes of particular modules within a program can capture some evidence of equivalence of experience, but it is less robust as evidence of corresponding learning. This approach can result in the accreditation of the matching process rather than learning and leads us to ask the question, “Are we measuring the comparability of standards, the quality of learning, or the similarity of experience?”

ACCREDITED LEARNING

Accreditation of students’ or trainees’ learning can apply where awards are credit rated, with all program and award routes leading to awards offered within the institution, calibrated for credit as an integral part of the validation process. The outcome of this accreditation process is credit, a recognizable educational “currency” that students can seek to transfer from one award to another, from one program to another, and from one institution to another. Although credit can be transferred, the marks achieved by the student are not usually transferable.

Accredited learning is defined as formal learning, including learning assessed and credit rated or certificated by the institution or an external institution or similar awarding body and learning that has not been assessed but that is capable of assessment for the purpose of awarding credit.

Credit gained in the context of a named award may be transferred to another named award. However, credit transfer across named awards is not automatic. Transfer of credit from one award route to another is dependent on the learning outcomes being deemed by the institution as valid for the new award. To be recognized as contributing credit to an award, the evidence of the accredited learning must be capable of demonstrating the following:

• Authenticity, by evidence that the applicant completed what was claimed
• Direct comparison, by evidence of a matching of the learning outcomes with those expected of comparable specified modules approved by the university for the award sought
• Currency, by evidence that the learning achieved is in keeping with expectations of knowledge current in the area of expertise required

ACCREDITATION OF EXPERIENTIAL LEARNING

Experiential learning is defined as learning achieved through experience gained by an individual outside formalized learning arrangements. An applicant may apply for the award of credit on the evidence of experiential learning. Such evidence must be capable of assessment and of being matched against the learning outcomes of the program for which the applicant is seeking credit.

The normal forms of assessment for the accreditation of experiential learning include the following:

• A structured portfolio with written commentary and supporting evidence
• A structured interview plus corroborating evidence
• Work-based observation plus a portfolio or other record
• Assignments or examinations set for relevant, approved modules or units

FORMAL PROFESSIONAL REVIEW

Institutions that are intent on gaining accreditation must freely and autonomously request a formal evaluation from an awarding organization or body. Accreditation involves an external audit of the institution’s ability to provide a service of high quality by comparing outcomes to a defined standard of practice, which is confirmed by peer review.

Current accreditation arrangements rely on the assumption that only bona fide members of a profession should judge the activities of their peers, and by criteria largely or wholly defined by members of the profession. Historically, there are some interesting examples of nonexpert examination of the professional practices of institutions (for example, Flexner’s examination of medical schools in the United States and Canada in the early 1900s), but this form of lay review is not at all typical of the way modern forms of formal review for accreditation have grown up.

Rather, accreditation of an institution has largely become a formal process of program evaluation by peers, which, if successful, testifies that an institution or awarding body:

• has a purpose appropriate to the phase of education or domain of training
• has physical and human resources, teaching schemes, assessment structures, and support services sufficient to realize that purpose on a continuing basis
• maintains clearly specified educational objectives that are consistent with its mission
• is successful in achieving its stated objectives
• is able to provide evidence that it is accomplishing its mission


The requirements, policies, processes, procedures, and decisions of accreditation are predicated on a full commitment to integrity and on institutions dedicating themselves to being learning societies, capable of inclusive and democratic activity. The existence of a single accreditation process in a variety of learning contexts requires adherence to agreed-on evaluation criteria within a common framework of formal program evaluation, and this may not always be possible or desirable. Indeed, it is the development of an adequate accreditation design that is the single most important activity in matching the accreditation process to the context to which it is being applied.

FRAMEWORK OF FORMAL PROFESSIONAL REVIEW

The development of an adequate accreditation design is best placed within a conceptual framework of formal professional review underpinned by a common value system and an agreed-on set of criteria. A model of formal professional review is shown in Figure 1.

Values

Underpinning the review process is an agreed-on set of fundamental principles and values relevant to the awarding body and the institution under review, and these principles and values are shared by all. For example, one of the values might be that the highest duty of all concerned is to preserve life and to labor to improve the quality of life. This is a foundation from which the right to award accreditation is offered and sought. Without such foundations, the review process might well become a somewhat arid inspection of organizational structures, course content, and assessment practices.

Criteria

The formal review is structured around a set of requirements made specific in the form of observable criteria that are grouped in appropriate ways and listed in a convenient form for reference and substantiation. These criteria are embedded in the structure of the value system underpinning the common concern and exist wholly within the parameters of the shared principles. The formulation of criteria affects the organization of the review and the nature of how accordance is recorded.

Comparison

A comparison is made (Figure 2) between two estates of accreditation: institutional or individual practice and the standards of the awarding body. Comparisons are made by observation of practice, scrutiny of documentation, and engagement in professional discourse. These processes lie fully within the agreed-on criteria, so when contextual and other interesting matters arise, these are not carried forward into the next stage. The extent to which there is an overlap between the two estates provides initial findings for the assessment.

Evaluation

Reviewers consider the nature and content of the overlap between the two estates and how that might affect their overall judgment. Value is ascribed and evidence is reviewed to discriminate between discernment, opinion, and critique. Some elements of the comparison may be eliminated from the evaluation process as not pertinent in this context (whether accreditation should be awarded). An argument could be made that evaluation is best carried out by peer professionals, who bring expert experience, knowledge, and understanding to the process, and that this expertise goes beyond commonsense approaches.

Figure 1   Framework of Formal Professional Review (from base to apex: Values, Criteria, Comparison, Evaluation, Moderation, Impact; binding dimensions: Consistency and Integration)

Figure 2   Estates of Accreditation (the overlap between the standards of an awarding body and institutional or individual practice and outcomes provides the initial findings for the assessment)

Moderation

In this part of the process, the reviewers affirm their judgments and authenticate their findings by grounding them back into the evidence base, verifying the sources of their evidence, and ensuring the rigor of the process that has taken place. If thought necessary or made necessary by some sampling arrangement, alternative opinions are sought and compared with the substantive findings and outcomes. Perceptions are sought of the extent to which practice in assessment and evaluation has been informed by an agreed-on set of fundamental principles and values.

Before the final part of the framework of formal professional review is considered, two dimensions of the framework should be examined that not only form the glue that binds the framework together but make the importance of the impact phase more apparent when they are defined.

Consistency

The consistency dimension is one in which judgments can be made about the extent to which what appears on the face of the institution can be consistently found throughout its structures and organization. At a time when institutions are increasingly positioning themselves as “student facing” or “business facing,” it is becoming vital to take a metaphorical slice across the organization to see to what extent attempts to create a compliant façade are supported with scaffolding by the remainder of the organization. Toward the top of the Framework of Formal Professional Review (Figure 1), there is less depth to the organization, and thus approaches to assessment, moderation, and the student experience should be more transparent.

Integration

The division of any process into sections inevitably raises questions about the extent to which judgments about an institution should contain an element of gestalt. Indeed, the extent to which an organized whole is greater than the sum of its parts is a judgment that needs to be made. The successful integration of the structures, provision, programs, and procedures of any institution is a significant part of its ability to provide evidence that it is accomplishing its mission.

Impact

At the apex of Figure 1 is Impact. The impact is often on the individual learner, whose experience is difficult to capture but whose story of personal change and development is pivotal to full understanding of the nature of what takes place in the overlap between the two estates of accreditation. What has happened to this individual and to all the other individuals in the impact section of the framework is of momentous importance to making things better in classrooms, in hospital wards and theaters, in workplaces and homes. It could be argued that securing intelligence of what happens to individual learners and how they are changed by their learning experiences remains an unconquered acme of accreditation.

—Malcolm Hughes and Saville Kushner

Further Reading
Evans, N. (1992). Experiential learning: Its assessment and accreditation. New York: Routledge.
Council for National Academic Awards. (1990). Accreditation: The American experience. London: Author.
Fretwell, D. (2003). A framework for evaluating vocational education and training (VET). European Journal of Education, 38(2), 177-190.
Stufflebeam, D. L. (2001). The metaevaluation imperative. American Journal of Evaluation, 22(2), 183-209.

ACCURACY. See BIAS, OBJECTIVITY, RELIABILITY, VALIDITY

ACHIEVEMENT

In education, achievement is the construct of understanding student learning and educational effectiveness. Because what a student knows and can do cannot be measured directly but must be inferred, determining the nature and level of a student’s achievement—identifying achievement in real-life cases—is necessarily problematic. Apparent precision in achievement test scores belies the elusiveness of achievement and maintains public confidence that test scores accurately indicate achievement. Although increasingly prominent since the mid-19th century, standardized high-stakes achievement testing has generated concern and opposition. Thus, as achievement has become ever more crucial in educational accountability and evaluation, determining achievement has become correspondingly controversial.

—Linda Mabry

ACTION RESEARCH

The main features of action research are as follows:

• It includes a developmental aim that embodies a professional ideal and that all those who participate are committed to realizing in practice.
• It focuses on changing practice to make it more consistent with the developmental aim.
• In identifying and explaining inconsistencies between aspiration and practice (such explanation may lie in the broader institutional, social, and political context), it problematizes the assumptions and beliefs (theories) that tacitly underpin professional practice.
• It involves professional practitioners in a process of generating and testing new forms of action for realizing their aspirations and thereby enables them to reconstruct the theories that guide their practice.
• It is a developmental process characterized by reflexivity on the part of the practitioner.

From an action research perspective, professional practice is a form of research and vice versa.

Good action research is informed by the values practitioners want to realize in their practice. In social work, for example, it is defined by the professional values (e.g., client empowerment, antioppressive practice) social workers want to realize. Professional values are ideas about what constitutes a professionally worthwhile process of working with clients and colleagues. Such values specify criteria for identifying appropriate modes of interaction. In other words, they define the relationship between the content of professional work, practitioners, and their various clients.

Terms such as care, education, empowerment, autonomy, independence, quality, justice, and effectiveness all specify qualities of that relationship. Good action research is developmental; namely, it is a form of reflective inquiry that enables practitioners to better realize such qualities in their practice. The tests for good action research are very pragmatic ones. Will the research improve the professional quality of the transactions between practitioners and clients or colleagues? Good action research might fail this particular test if it generates evidence to explain why improvement is impossible under the circumstances, in which case it justifies a temporary tolerance of the status quo. In each case, action research provides a basis for wise and intelligent decision making. A decision to wait awhile with patience until the time is ripe and circumstances open new windows of opportunity is sometimes wiser than repeated attempts to initiate change.

These are not extrinsic tests but ones that are continuously conducted by practitioners within the process of the research itself. If practitioners have no idea whether their research is improving their practice, then its status as action research is very dubious indeed. It follows from this that action research is not a different process from that of professional practice. Rather, action research is a form of practice and vice versa. It fuses practice and research into a single activity. Those who claim they have no time for research because they are too busy working with clients and colleagues misunderstand the relationship. They are saying they have no time to change their practice in any fundamental sense. When practice strategies are viewed as hypothetical probes into ways of actualizing professional values, they constitute the core activities of a research process, a process that must always be distinguished from research on practice by outsiders.

Action research aims to realize values in practice. Practitioner action research may use outsider research, but it always subordinates the generation of propositional knowledge to the pursuit of practical situational understanding.

ACTION RESEARCH DEVELOPS THE CURRICULUM

Good action research always implies practice development. Practice is never simply a set of statements about the content of activities. It always specifies a mode of interaction. If it is specified in terms of specific behavioral objectives, then the message to clients and colleagues is that the objectives of professional work define them in terms of deficits and that remedies for these deficits may be found only within a dependency relationship. The question then is whether such practices enable professional workers to represent their work in a form that is consistent with the nature of facilitative, caring, or educational relationships.

Professional practice embodies psychological, sociological, political, and ethical theories. It is through action research that the potential of these theories can be assessed. Through action research, practices are deconstructed and reconstructed in both content and form. Practice designs (plans, guidelines, etc.) need not simply determine practice; rather, by means of action research, they themselves can be shaped through practice. Good practice planning will not only specify the content of action but articulate general principles governing the form in which it is to be enacted. In other words, it should specify action hypotheses in the form of strategies for realizing such principles, which practitioners in general can explore in their particular professional contexts. Such strategies need not be confined to immediate interpersonal processes but can refer to the wider organizational practices that shape social relationships and the amount of time available to various participants for working on certain kinds of tasks. A good practice design not only specifies good professional practice but also provides guidance on how to realize it. The outcome of good action research is not simply improvement in the quality of professional work for those engaged in it but the systematic articulation of what this involves and how others might achieve it. Good action research does not generate private knowledge for an elite core of staff. It renders what they have achieved public and open to professional scrutiny.

ACTION RESEARCH IMPLIES REFLEXIVE PRACTICE, NOT SIMPLY REFLECTIVE PRACTICE

Good action research generates evidence to support judgments about the quality of practice. The evidence is always about the mode of interaction. Practitioner research that focuses on everything other than the interpersonal conditions established by practitioners is not good action research. Good action research is always reflexive, not merely reflective. One can reflect about all manner of things other than one’s own actions. Evidence of client outcomes does not, in isolation, constitute evidence of practice quality. Outcomes need to be explained. The quality of immediate practice activities is only one possible explanation for success or failure. Other kinds of evidence need to be collected before the contribution of the individual practitioner’s decision making to client outcomes can be judged. Outcome data may provide a basis for hypothesizing about the nature of this contribution, but the hypotheses will need to be tested against other evidence concerning the professional process. Outcome data are very indirect evidence of quality. Judging the quality of outcomes and the quality of practice are different enterprises.

ACTION RESEARCH INVOLVES GATHERING DATA ABOUT PRACTICE FROM DIFFERENT POINTS OF VIEW

Evidence about the quality of practice can be gathered from a number of sources: practitioners’ own accounts of their practice, clients’ and colleagues’ accounts, peer observations of each other’s practice, “outsiders’” observational accounts, and video and audio recordings of professional transactions. This process of gathering data from a multiplicity of sources is called triangulation. There are three fundamental sources of evidence: from observers and from the major participants, that is, the practitioner and her or his clients and colleagues. In a fully developed action research process, practitioners will be comparing and contrasting the accounts of observers and clients or colleagues with their own.

ACTION RESEARCH DEFINES RATHER THAN APPLIES QUALITY INDICATORS

These are the sources of evidence, but how do practitioners make sense of this evidence? How do they know what to look for? Do they need a precoded checklist of quality indicators—specific behaviors that are indicative of the qualities they want to realize in their practice? The problem with suggesting that they do is that it preempts and distorts what is involved in doing action research. Quality indicators cannot be predefined because it is the task of action research to define them. When practitioners choose certain courses of action as a possible means of realizing their educational values in practice, they are exploring the question, “What actions are indicative of those values?” Evidence then has to be collected to determine whether the means selected are indicative of the professional values the practitioner espouses. Practitioners may select a course of action in the belief that it will facilitate the empowerment of clients or colleagues and, in the light of triangulation data, discover this belief to be problematic. Clients may, for example, report that they experience the actions as constraints, and such reports may appear to be consistent with observational accounts. The evidence thus renders the practitioners’ actions problematic as quality indicators and challenges practitioners to redefine what constitutes good practice in the circumstances they confront.

Professional quality indicators are determined through action research, not in advance of it. Prespecifications of quality indicators prescribe what practitioners must do to realize professional values. By standardizing responses, they render the responses insensitive to context and substitute standardized assessments of performance in the place of action research. Good action research acknowledges the fact that what constitutes quality in professional practice cannot be defined independently of the particular set of circumstances a practitioner confronts. It can only be defined in situ through action research. Practitioners may, through action research, generalize indicators across a range of contexts, but this outcome provides a source of hypotheses to be tested and not a prescriptive straightjacket that preempts practitioners from ultimately judging what actions are indicative of quality in particular circumstances. Good action research involves grounding such judgments in triangulated case data.

—John Elliott

Further Reading
Argyris, C., Putnam, R., & Smith, D. M. (1985). Action science: Concepts, methods and skills for research and intervention. San Francisco: Jossey-Bass.
Carr, W., & Kemmis, S. (1986). Becoming critical: Education, knowledge and action research. London: Falmer.
Elliott, J. (1991). Action research for educational change. Milton Keynes, UK: Open University Press.
Kemmis, S., & McTaggart, R. (1988). The action research planner (3rd ed.). Geelong, Victoria, Australia: Deakin University Press.

ACTIVE LEARNING NETWORK FOR ACCOUNTABILITY AND PERFORMANCE IN HUMANITARIAN ACTION (ALNAP)

Established in 1997, the Active Learning Network for Accountability and Performance in Humanitarian Action (ALNAP) is an international interagency forum dedicated to improving the quality and accountability of humanitarian action by sharing lessons; identifying common problems; and, where appropriate, building consensus on approaches. ALNAP consists of 51 full members and approximately 300 observer members. Member representatives are drawn from the policy, operations, evaluation, and monitoring sections of organizations involved in humanitarian action.

ACTIVITY THEORY

Activity theory is an approach to psychology based on Marx’s dialectical materialism that was developed by revolutionary Russian psychologists Vygotsky, Leont’ev, and Luria in the 1920s and 1930s. The focus of analysis is the activity undertaken by a person (subject) with a purpose (object) that is mediated by psychological tools (systems of numbers, language) and often performed in collaboration with others. An activity occurs in a cultural context that includes conventions, such as cultural rules, and forms of relationships, such as a division of labor.

See also SYSTEMS AND SYSTEMS THINKING

Further Reading
Nardi, B. (Ed.). (1996). Context and consciousness: Activity theory and human-computer interaction. Cambridge, MA: MIT Press.

ADELMAN, CLEM

(b. 1942, Tring, England). Ph.D. Science Education, London University; B.Sc. Education, London University.

Adelman began teaching about and doing research and evaluation in 1964. Since 1972 he has worked at the Center for Applied Research in Education (CARE), University of East Anglia, United Kingdom, and has also held appointments at the University of Reading and the University of Trondheim in Norway.

At CARE, Adelman initially worked on the Ford Foundation Teaching Project, which promoted self-evaluation and classroom action research, and then on an ethnographic study of 3- to 5-year-old children in schools. He became increasingly involved in evaluation problems arising from fieldwork and case studies, especially those in which the evaluators aspired to be democratic in their principles and conduct. He has worked on various evaluations, including those concerning arts education; bilingual schooling in Boston, MA; assessment of oral English; school-industry links; and residential care for the handicapped.

His work on the feasibility of institutional self-evaluation has contributed to a broad understanding of how and why educational organizations engage in evaluation. His book (with Robin Alexander) Self-Evaluating Institution: Practice and Principles in the Management of Educational Change reflects these contributions. Some of his other books include The Politics and Ethics of Evaluation; Guide to Classroom Observation (with Rob Walker); and the edited volume Uttering, Muttering: Collecting, Using and Reporting Talk for Social and Educational Research.

Adelman’s work has been influenced by fellow evaluators Bob Stake, Barry MacDonald, Ernie House, and Gene Glass, as well as Bela Bartok, H. S. Becker, Ernest Gellner, Stephen J. Gould, Nat Hentoff, Georg Simmel, and Spinoza.

Adelman has lived and worked in the south of France, Norway, northwest China, and Hungary. He is an accomplished saxophone player.

ADVERSARIAL EVALUATION. See JUDICIAL MODEL OF EVALUATION

ADVOCACY IN EVALUATION

Advocacy, defined in many of the debates as taking sides, is seen as one of the more intractable matters in contemporary evaluation. For example, Linda Mabry notes, “Whether reports should be advocative and whether they can avoid advocacy is an issue which has exercised the evaluation community in recent years. The inescapability [from a constructionist stance] of an evaluator’s personal values as a fundamental undergirding for reports has been noted but resisted by objectivist evaluators [who have] focused on bias management through design elements and critical analysis. At issue is whether evaluation should be proactive or merely instrumental.” M. F. Smith, considering the future of evaluation, writes, “The issue (of independence/objectivity versus involvement/advocacy) is what gives me the most worry about the ability of our field to be a profession.”

Mabry’s and Smith’s concerns seem well justified. Granted, the issue of advocacy in evaluation is complex in that examining one aspect (for example, the nature of objectivity in social science) leads to another aspect (such as methods of bias control), then another (such as the role of the evaluator as an honest broker, a voice for the disenfranchised, or something else), then another as quickly. The issue is high stakes in that the approach taken may lead to different evaluation processes, designs, measures, analyses, conclusions, and, probably, consequences. With a few exceptions, there appears to be little systematic comparative study on this point. The issue of advocacy is divisive in our field in that beliefs about advocacy seem deeply and sometimes rancorously held. Last, despite the development of evaluation standards and guidelines, in practice, there is no more than chance agreement and considerable unpredictability on what experienced evaluators would do in quite a few ethical situations involving advocacy.

Often, advocacy issues are posed as questions: What is the role of the evaluator? Is impartiality a delusion? In particular, do the Guiding Principles of the American Evaluation Association place a priority on Principle E (Responsibilities for General and Public Welfare) before Principles A through D, if all cannot be satisfied equally? Does credibility require objectivity? These questions were prominent in the 1994 examination of the future of evaluation, based on statements of more than 25 leading evaluators, and they became even more so in the 2000 statements.

PART OF THE CONTEXT

Relatively powerful groups largely fund evaluations—foundations; large philanthropies such as the United Way; local, state, and federal governments; boards and directors of organizations. The questions the evaluator is to help answer are specified initially by the organizations funding the evaluation. In some instances, so are the evaluation designs (for example, use of randomized control groups to help rule out alternative explanations of results); so are the constructs to be measured (for example, reading readiness); and, at times, so are the measures themselves, particularly the performance indicators.

Speculatively, all might have been well if, in the early decades of evaluation, most studies had shown reasonably positive effects, leading to more services for more people in need. Evaluations, however, often have yielded macronegative results: no measurable evidence of program benefits. Lacking evidence of benefits, programs may be closed down or revised and underlying policies discredited. If one believes in the evaluation results, this is good utilization of evaluation. Truly ineffective programs waste funds and, worse, deprive service recipients of possibly much more effective assistance if something different were tried: If it doesn’t work, programmatically, try another approach.

Many of the programs under fairly constant review, revision, or sometimes closure, however, affected low-income people, minorities, and persons of color; many public expenditures benefiting relatively wealthier persons or groups never got evaluated. Evaluations of such programs tended to focus on service delivery, efficiency, and costs rather than effectiveness. In addition, being perceived as an unfriendly critic or executioner is painful. Understandably, evaluators—particularly those at the local level but also many distinguished evaluation theorists—looked carefully at the adequacy of the evaluation frameworks, designs, measures, analyses, and processes. Could the results derived through evaluation approaches seen as biased toward the most privileged groups be trusted? If not, as a matter of social justice and practicality, another approach should be tried, one more likely to “level the playing field.” Thus around the mid-1970s and early 1980s, evaluation began to split along the fault line of the possibility or impossibility of objectivity.

THE POSSIBILITY OF OBJECTIVITY

Some evaluators conclude that evaluator neutrality is at the heart of evaluation; that it is necessary for credibility; that meaningful, reliable, valid information about a situation usually is available; and that frameworks offering objective, trustworthy answers are possible. Recognizing (a) the need and benefits of listening to stakeholders; (b) the value of including, where appropriate, extensive study of process and context that can help explain how observed outcomes came about; and (c) the value of mixed methods, theorists such as Rossi and Freeman, Boruch, Chelimsky, Scriven, and Stufflebeam have emphasized methodologies they see as offering the most certain, least equivocal answers to client questions.

In this view, social good means spending money on programs that work by criteria seen as their raison d’être by the funders and by others, backed by strong evidence to rule out alternative explanations or rule in plausible causes. One tries to understand the hopes and the fears for the program, to look for unintended as well as intended consequences, to study the program as it actually happens as well as the program as intended. Pragmatically, because stakeholder involvement can make for better evaluations and better utilization, it is an important part of the evaluation process. Also, in this view, evaluators try to understand, explain, and show what was happening, including context, that best accounts for the findings. They should not, however, take the side of any stakeholder group but strive to be impartial sources of reliable information. Theory-driven evaluation and its application in program logic models (Chen); realist evaluation (Henry, Mark, Julnes); frameworks connecting context, input, processes, and results (Stufflebeam); integrative meta-analyses (Shadish, Lipsey, Light); and contemporary experimental designs (Boruch) evolved from these concerns.

THE IMPOSSIBILITY OF OBJECTIVITY

Other evaluators, such as Lincoln and Guba, Mertens, and House and Howe, conclude that evaluations in which the more powerful have an exclusive or primary say in all aspects of the evaluation are inherently flawed, both as a matter of social justice and because of the impossibility of objectivity. Evaluators should be open and up-front in acknowledging their own value stances about programs; should accept that all sources of information about programs will reflect the value stances of the people with whom they are interacting; and, as a matter of responsible evaluation, should give extra weight to the most disenfranchised. Approaches such as Fourth Generation Evaluation (Lincoln and Guba), Utilization-Focused Evaluation (Patton), and Empowerment Evaluation (Fetterman) emphasize meaning as a social construct; place strongest emphasis on diverse stakeholder involvement and control, particularly the participation of service deliverers and service recipients; and see the evaluator role as “friendly critic” or consultant in helping the learning organization examine itself.

In one variant of this general approach, the evaluator becomes an open advocate for social justice as she or he perceives social justice in each case, for the most disenfranchised and the least powerful. This could mean, for example, making the best possible case for the continuation of a program offering what the evaluator saw as social benefits, such as employment of low-income persons as program staff. It could mean that negative evidence potentially leading to loss of jobs or services for low-income persons might not be reported. It could mean that evidence of what the evaluator perceives as a social injustice (an interviewee tells of an instance of sexual harassment) is reported and the program director threatened, when the line of inquiry was not part of the original study. In other words, the evaluator takes sides for or against the interests of certain, primarily less powerful groups before, during, and after the evaluation.

Ahn offers some information on evaluators’ “program entanglements” as they affect practice decisions. Based on interviews of a range of evaluators, Ahn reports,

Many evaluators considered and identified “the least powerful or marginalized group” in the program, whose voices are rarely heard, as crucial in their work. And their allegiance to this group was addressed in a variety of ways in their program. DM, who viewed “emancipation” as the role of her evaluation, for example, spoke of the importance of evaluators being attentive to the needs of “people who could be hurt the most,” giving voices to them and promoting their participation. R was also concerned with questions of power. For example, in reporting her evaluation findings, she devoted the largest section in her report to presenting the perspectives of the least powerful who had the most to lose.

Where the voices of contrasting evaluators can be heard on the same evaluation situation, as in Michael Morris' ethics in evaluation series in the American Journal of Evaluation, the powerful urge to administer on-the-spot social justice (as an advocate for an individual, program, or principle) is perhaps startlingly clear in ways the theorists probably had not intended.

TOWARD COMMON GROUND

Both broad approaches can be caricatured. No constructionist, for example, advocates making up imaginary data to make a program look good. Most would present negative findings, but in ways they believed would be appropriate and constructive, such as informally and verbally. And no "neopositivist" ignores human and contextual factors in an evaluation or mindlessly gets meaningless information from flawed instruments to answer an obviously biased question.

There is much common ground. The leading proponents of advocacy in evaluation indicate that they mean advocacy for the voices of all stakeholders, not only the most powerful, and advocacy for the quality of the evaluation itself. They are concerned with fairness to all stakeholders in questions, designs, measures, decision making, and process, not with taking sides before an evaluation begins. And so are leading proponents of more positivist approaches.

Nonetheless, too little is known about actual decisions in evaluation practice. Recently, some evaluation organizations have been making all aspects of their studies transparent, even crystalline, from raw data to final reports (for example, the Urban Institute's New Federalism evaluation series). Articles focused on specific evaluations and specific evaluators, such as the American Journal of Evaluation's interviews with Len Bickman and with Stewart Donaldson, are showing how some evaluators work through practice choices. This kind of clarity should help us sort out conflicts among standards and principles and establish some agreed-on instances, in a sort of clinical-practice, case-law sense.

There seems to be considerable agreement that, as evaluators, our common ground is fairness. Our warrant is our knowledge of many ways of fairly representing diverse interests, understanding complexity and context, and wisely presenting what can be learned from a systematic, data-based inquiry. Our need is to anchor our debates on neutrality and advocacy in analysis of specific practice decisions so all can understand what constitutes nobly seeking fairness and what, notoriously exceeding our warrant.

—Lois-ellin Datta


Further Reading
Ahn, J. (2001, November). Evaluators' program entanglements: How they related to practice decisions. Paper presented at the annual meeting of the American Evaluation Association, St. Louis.
Datta, L.-e. (1999). The ethics of evaluation neutrality and advocacy. In J. L. Fitzpatrick & M. Morris (Eds.), Current and emerging ethical issues in evaluation. New Directions for Evaluation, 82, 77-88.
House, E. R., & Howe, K. R. (1998). The issue of advocacy in evaluations. American Journal of Evaluation, 19(2), 233-236.
Mabry, L. (2002). In living color: Qualitative methods in educational evaluation. In T. Kelleghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 167-185). Dordrecht, The Netherlands: Kluwer Academic.
Scriven, M. (1997). Truth and objectivity in evaluation. In E. Chelimsky & W. Shadish (Eds.), Evaluation for the 21st century: A handbook (pp. 477-500). Thousand Oaks, CA: Sage.
Smith, M. F. (2002). The future of the evaluation profession. In T. Kelleghan & D. L. Stufflebeam (Eds.), International handbook of educational evaluation (pp. 373-386). Dordrecht, The Netherlands: Kluwer Academic.

AESTHETICS

Aesthetics is a field of study within the discipline of philosophy that, at least currently and during much of the 20th century, has addressed questions about the nature and function of art. Aesthetic theory became part of the discourse within the evaluation field primarily through the work of Elliot Eisner. Eisner used aesthetic theory from a number of philosophers (such as Susanne Langer and John Dewey) to conceptualize and create a justifying rationale for an approach to evaluation based on art criticism. Among other things, Eisner's connoisseurship-criticism model emphasized the use of literary techniques to capture and communicate the aesthetic dimensions of the phenomena being evaluated.

—Robert Donmoyer

See also CONNOISSEURSHIP

Further Reading
Dewey, J. (1980). Art as experience. New York: Perigee Books.
Langer, S. (1957). Problems of art. New York: Scribner.

AFFECT

Affect refers to observable behavior or self-reports that express a subjectively experienced feeling or emotion. Affect and attitude are sometimes used interchangeably. Whether affect is independent of cognition is a debated topic. On the one hand is the assertion that people can have a purely emotional reaction to something without having processed any information about it. On the other hand is the assertion that at least some cognitive (albeit not always conscious) processing is necessary to evoke an emotional response. Affective measures are used in evaluation. Self-reports of affect provide information on preferences, although indirect measures of affect are sometimes considered more robust; that is, there is likely to be more stability shown in what people are observed to do than in what they say they do. Because interventions may focus specifically on changing people's affect (for example, emotional responses to gender, race, environmentalism, and so on), it is important for evaluators to establish valid indicators of affect.

AGGREGATE MATCHING

In conditions where participants cannot be randomly assigned to program and control groups, the use of proper nonrandomized control groups is recommended to more accurately assess the effects of the independent variable under study. Aggregate matching is a procedure for devising matched controls in quasiexperimental evaluation research. Individuals are not matched, but the overall distributions in the experimental and control groups on each matching variable are made to correspond. For example, as a result of this procedure, similar proportions of characteristics such as gender and race would be found in both the program and comparison groups.
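As an illustration only (not drawn from the entry), the following Python sketch shows one simple way such a comparison group might be assembled: candidates are added greedily until the comparison group's overall proportions on the matching variables roughly mirror those of the program group. The variable names, the greedy selection rule, and the toy data are assumptions made for the example.

# Illustrative sketch of aggregate matching: the comparison group's overall
# distributions (not individual cases) are made to resemble the program group's.
import random
from collections import Counter

def proportions(group, variable):
    """Proportion of each category of `variable` within `group`."""
    counts = Counter(person[variable] for person in group)
    total = len(group)
    return {category: n / total for category, n in counts.items()}

def aggregate_match(program_group, candidate_pool, variables, size, seed=0):
    """Greedily pick `size` candidates so the comparison group's proportions
    on `variables` (e.g., gender, race) approximate the program group's."""
    random.seed(seed)
    targets = {v: proportions(program_group, v) for v in variables}
    pool = list(candidate_pool)
    random.shuffle(pool)
    comparison = []

    def imbalance(group):
        # Sum of absolute gaps between the group's and the target proportions.
        return sum(
            abs(proportions(group, v).get(cat, 0.0) - share)
            for v in variables
            for cat, share in targets[v].items()
        )

    for _ in range(size):
        # Add whichever remaining candidate most reduces the imbalance.
        best = min(pool, key=lambda person: imbalance(comparison + [person]))
        comparison.append(best)
        pool.remove(best)
    return comparison

# Example: the matched group mirrors the program group's gender and race mix.
program = [{"gender": "F", "race": "B"}] * 30 + [{"gender": "M", "race": "W"}] * 20
pool = [{"gender": g, "race": r} for g in "FM" for r in "BW" for _ in range(50)]
matched = aggregate_match(program, pool, ["gender", "race"], size=50)
print(proportions(matched, "gender"), proportions(matched, "race"))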

—Marco A. Muñoz

AID TO FAMILIES WITH DEPENDENT CHILDREN (AFDC)

Building on the Depression-era Aid to Dependent Children program, Aid to Families with Dependent Children (AFDC) provided financial assistance to needy families from the 1960s to 1996. The federal government provided broad guidelines and program requirements, and states were responsible for program formulation, benefit determinations, and administration. Eligibility for benefits was based on a state's standard of need as well as the income and resources available to the recipient. In 1996, the Personal Responsibility and Work Opportunity Reconciliation Act replaced the AFDC program with the Temporary Assistance for Needy Families program.

—Jeffrey G. Tucker

ALBÆK, ERIK

(b. 1955, Denmark). Ph.D. and M.A. in Political Science, University of Aarhus, Denmark.

Albæk is Professor of Public Administration, Department of Economics, Politics and Public Administration, Aalborg University, Denmark. Previous appointments include Eurofaculty Professor of Public Administration, Institute of International Relations and Political Science, University of Vilnius, Lithuania, and Associate Professor of Public Administration, University of Aarhus, Denmark. He has been an American Council of Learned Societies Fellow and Visiting Scholar in Administration, Planning, and Social Policy, Graduate School of Education, Harvard University, and in the Science, Technology, and Society program at Massachusetts Institute of Technology. Currently, he is Editor of Scandinavian Political Studies and previously was Editor of GRUS. Since 1989, he has participated in a variety of cross-national research projects with colleagues throughout Europe and the United States.

His primary contributions to evaluation focus on the history, utilization, and functions of evaluation, as well as decision theory, and his primary intellectual influences are to be found in the works of Herbert Simon, Charles Lindblom, James March, and Carol Weiss. He wrote HIV, Blood and the Politics of "Scandal" in Denmark, cowrote Nordic Local Government: Developmental Trends and Reform Activities in the Postwar Period, and coedited Crisis, Miracles, and Beyond: Negotiated Adaptation of the Danish Welfare State. In addition, he has authored numerous articles and book chapters in both English and Danish.

ALKIN, MARVIN C.

(b. 1934, New York). Ed.D. Stanford University; M.A. Education and B.A. Mathematics, San Jose State College.

Alkin was instrumental in helping to shape the field of evaluation through his work on evaluation utilization and comparative evaluation theory. He drew attention to ways of categorizing evaluation theories and provided the discipline with a systematic analysis of the way in which evaluation theories develop. As a professor, Alkin developed novel ways of teaching graduate-level evaluation courses, including the use of simulation and role-playing.

His interest in systems analysis was cultivated through his association with Professor Fred MacDonald in the Educational Psychology Department at Stanford University. His thinking and early writings on cost-benefit and cost-effectiveness analysis were fostered by Professor H. Thomas James, also at Stanford University. He was also influenced by his collegial relationships with other evaluators, such as Dan Stufflebeam, Bob Stake, and Michael Quinn Patton.

Alkin founded and served as the Director of the Center for the Study of Evaluation at the University of California, Los Angeles (UCLA), which gave him the opportunity to expand his thinking about issues related to evaluation theory. He was Editor-in-Chief for the Encyclopedia of Educational Research (6th edition) and Editor of Educational Evaluation and Policy Analysis (1995-1997). He is also a Founding Editor of Studies in Educational Evaluation. He received the Paul F. Lazarsfeld Award from the American Evaluation Association for his contributions to the theories of evaluation.

He is a grandfather of five and a dedicated UCLA basketball fan, having attended the full season every year since 1965.

ALPHA TEST

A term used frequently in software development, alpha test refers to the first phase of testing in the development process. This phase includes unit, component, and system testing of the product. The term alpha derives from the first letter of the Greek alphabet.
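As a hedged illustration of the unit-testing component of an alpha phase (not part of the entry itself), the minimal Python sketch below exercises a toy function; the divide() function and its expected behavior are invented for the example.

# Minimal sketch of the kind of unit test an alpha phase might include.
import unittest

def divide(numerator, denominator):
    """Toy function under test: plain division with an explicit error."""
    if denominator == 0:
        raise ValueError("denominator must be nonzero")
    return numerator / denominator

class DivideAlphaTests(unittest.TestCase):
    def test_basic_quotient(self):
        self.assertEqual(divide(10, 4), 2.5)

    def test_zero_denominator_rejected(self):
        with self.assertRaises(ValueError):
            divide(1, 0)

if __name__ == "__main__":
    unittest.main()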

ALTSCHULD, JAMES W.

(b. 1939, Cleveland, Ohio). Ph.D. Educational Research and Development, M.S. Organic Chemistry, The Ohio State University; B.A. Chemistry, Case Western Reserve University.


A professor of education at The Ohio State University, Altschuld has contributed to the field of evaluation, especially in the areas of needs assessment and the evaluation of science education and technology, and has coauthored influential books on both topics. He has worked collaboratively with Belle Ruth Witkin on an approach to needs assessment, a central construct in evaluation practice. Together they have written two books on needs assessment: Planning and Conducting Needs Assessments: A Practical Guide and From Needs Assessment to Action: Transforming Needs Into Solution Strategies. Along with collaborator David Kumar, Altschuld has developed a model of the evaluation of science education programs, reflected in his edited volume Evaluation of Science and Technology Education at the Dawn of a New Millennium. Altschuld has contributed substantially to evaluation as a profession through his position papers on certification of evaluators and in The Directory of Evaluation Training Programs, published in 1995.

He considers Ruth Altschuld, his wife, and Belle Ruth Witkin, a colleague, to be the major intellectual influences in his life.

Altschuld received the American Evaluation Association Alva and Gunnar Myrdal Evaluation Practice Award in 2002; in 2000, The Ohio State University College of Education Award for Research and Scholarship; in 1997, the Best Program Evaluation Research Award (with Kumar) from the Society for Information Technology and Teacher Evaluation; in 1990, the Evaluation Recognition Award from the Ohio Program Evaluators' Group; and in 1988, The Ohio State University Distinguished Teaching Award.

Altschuld is a reluctant but regular jogger and a devoted grandfather to Andrew and Lindsay.

AMBIGUITY

Ambiguity refers to the absence of an overall framework with which to interpret situations. Multiple meanings can exist side by side, and this often causes confusion and conflict. Ambiguity differs from uncertainty. Whereas uncertainty—a shortage of information—can be reduced by facts, new and more objective information will not reduce ambiguity because facts do not have an inherent meaning. Ambiguity poses particular problems for evaluators: How is one to evaluate a policy or program if the underlying concept has no clear meaning? A possible way out is to adopt a responsive approach that takes ambiguity as a departure point for reflexive dialogues on the evaluated practice.

—Tineke A. Abma

Further Reading
Abma, T. A. (2002). Evaluating palliative care: Facilitating reflexive dialogues about an ambiguous concept. Medicine, Health Care and Philosophy, 4(3), 259-276.
Weick, K. (1995). Sensemaking in organizations. Thousand Oaks, CA: Sage.

AMELIORATION

To engage in evaluation is to engage in an activity that has the potential to improve the evaluand or, more generally, the human condition. Logically speaking, no evaluation in and of itself must necessarily purport to be helpful, and making a value judgment does not entail providing prescriptions, remediation, or amelioration. Michael Scriven has clearly delineated the distinction between doing evaluation, making a value judgment, and making a recommendation. In a theory of evaluation, this is an important distinction. In evaluation practice, however, the making of value judgments and the provision of recommendations become blurred because we have come to expect the work of evaluators and the purpose of evaluation to be more than the simple rendering of value judgments: The purpose of evaluation is also to make things better. Even acknowledging the serious logical and conceptual problems of moving from evaluative to prescriptive claims or actions, evaluation is meant to be helpful.

The assertion that evaluation should lead to improvement is in most senses self-evident. What is less evident is how evaluation is expected to contribute to making things better. Indeed, different approaches to evaluation conceptualize helpfulness and improvement differently. Speaking broadly, amelioration takes the form of progress either through science or through democratic processes. The first sort of amelioration is typical of quasiexperimental, decision-making, and systems analysis approaches to evaluation. The second sort of help is typical of participatory, collaborative, and deliberative approaches to evaluation.

Evaluations that are based on amelioration through science focus on the methods used (quasiexperimental or at least controlled) because these methods permit the examination of causal hypotheses. It is these causal relationships that are the key to amelioration—if one knows what causes what (e.g., whole-language teaching causes higher reading achievement), then this causal claim can be used to improve programs or services by choices that reflect the causal claim (e.g., adopting whole-language pedagogy). These causal claims might be reflected in their contribution to a general theory (of, say, academic achievement) or program theory (of, say, reading programs).

Evaluations that are based on amelioration through democratic processes assume that the meanings of good and right are socially constructed rather than scientifically discovered. This view suggests that truth claims are not natural causal laws but rather informed, sophisticated interpretations that are tentatively held. These interpretations depend on deliberation and dialogue, and it is this emphasis on evaluation process that flags this perspective of amelioration. Participatory, deliberative, and democratic approaches to evaluation reflect the ameliorative assumption based on faith in inclusiveness, participation, public dialogue, and constructivism as the means to improving or helping. Programs, services, and communities will be better as a result of an evaluation that includes stakeholders in genuine ways, thus enabling self-determination in problem definitions and solutions.

Further Reading
Scriven, M. (1995). The logic of evaluation and evaluation practice. In D. Fournier (Ed.), Reasoning in evaluation. New Directions for Program Evaluation, 68.

AMERICAN EVALUATION ASSOCIATION (AEA)

The American Evaluation Association (AEA) is an international professional association of evaluators devoted to the application and exploration of program evaluation, personnel evaluation, technology evaluation, and many other forms of evaluation. The association was formed in 1986 with the merger of the Evaluation Network and the Evaluation Research Society. AEA's mission is to improve evaluation practices and methods, increase evaluation use, promote evaluation as a profession, and support the contribution of evaluation to the generation of theory and knowledge about effective human action. AEA has approximately 3000 members, representing all 50 states in the United States as well as many other countries.

AMERICAN INSTITUTES FOR RESEARCH (AIR)

Founded in 1946, the American Institutes for Research (AIR) is a not-for-profit research corporation with a long history of research, evaluation, and policy analysis. AIR's staff of some 800 professionals performs basic and applied research, provides technical support, and conducts analyses using established methods from the behavioral and social sciences. Program areas focus on education, health, individual and organizational performance, and quality-of-life issues.

—Jeffrey G. Tucker

AMERICAN JOURNAL OF EVALUATION

The American Journal of Evaluation (AJE) is an official, peer-reviewed journal sponsored by AEA. Between 1986 and 1997, the journal was published under the title Evaluation Practice. Prior to 1986, a predecessor publication, Evaluation News, was sponsored by the Evaluation Network, one of the two organizations that merged to create AEA. AJE's mission is to publish original papers about the methods, theory, practice, and findings of evaluation. The general goal of AJE is to publish the best work in and about evaluation. Blaine Worthen was Editor during the transition from Evaluation Practice to AJE, and Melvin M. Mark succeeded him in 1999. M. F. Smith and Tony Eichelberger previously served as editors of Evaluation Practice.

—Melvin M. Mark

ANALYSIS

Analysis may be defined as the separation of an intellectual or material whole into its constituent parts and the study of those parts and their interrelationships in making up the whole. Analysis has both a qualitative dimension (what something is) and a quantitative dimension (how much of that something there is). In logic, analysis also refers to the tracing of things to their source, the search for original principles. In evaluation, judging evaluands requires analysis: identifying the important aspects of the evaluand and discerning how much of each is present. The opposite of analysis is synthesis.

APPLIED RESEARCH

Applied research refers to the use of social science inquiry methods in situations where generalizability may be limited. Such research provides answers to questions dealing with a delimited group of persons, behaviors, or outcomes. Applied research contrasts with basic research, which has the purpose of addressing fundamental questions with wide generalizability; for example, testing a hypothesis derived from a theory in economics. Both applied and basic researchers can use any of the social science research methods, such as survey, experimental, and qualitative methods. Differences between the research roles do not relate to methods of inquiry; they relate to the purposes of the investigation. Applied researchers focus on concrete and practical problems; basic researchers focus on problems that are more abstract and less likely to have immediate application.

Evaluation provides many avenues for applied research. For example, an evaluator might perform a needs assessment to determine whether a program aimed at a particular group of clients should be planned and implemented. A community psychologist who surveys directors of homeless shelters to assess the need for a substance abuse counseling program is performing applied research. The range of generalizability is limited to the community being surveyed. The problem being addressed is practical, not theoretical. Formative evaluation activities involve applied research almost exclusively. For example, a director of corporate training might use methods such as surveys, interviews, observations, and focus groups to revise and refine instructional materials. The purpose of the research is to obtain feedback about which aspects of the material should be changed for specific users.

An evaluation can sometimes have dimensions of both applied and basic research. For example, a summative evaluation can have both practical uses and implications for theory. A demonstration project on preschool education could simultaneously reveal the merit of the project and test a theory about the impact of instructional activities on the school readiness of 4-year-olds. Although evaluation is associated more with applied research than with basic research, the latter has strongly influenced some evaluation frameworks. A prime example is theory-driven evaluation, which focuses on structuring the evaluation to test whether predicted relationships among variables are verified by the program. An evaluator using this approach might employ statistical models, such as path analysis, that require a set of hypothesized relationships to be tested with empirical data derived from program participants.
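As a hedged sketch only (not from the entry), the toy model below illustrates what "testing hypothesized relationships with empirical data" can look like in code: a simple participation -> engagement -> readiness path model whose coefficients are estimated with two ordinary least-squares regressions in NumPy. The variable names and simulated data are assumptions introduced for demonstration.

# Toy path-analysis-style check of hypothesized relationships.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothesized paths: program participation -> engagement -> school readiness.
participation = rng.integers(0, 2, size=n).astype(float)
engagement = 0.8 * participation + rng.normal(0, 1, size=n)
readiness = 0.5 * engagement + 0.2 * participation + rng.normal(0, 1, size=n)

def ols(y, *predictors):
    """Least-squares coefficients (intercept first) for y on the predictors."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# Path a: participation -> engagement; paths b and c': engagement and
# participation -> readiness. Estimates should recover roughly 0.8, 0.5, 0.2.
a = ols(engagement, participation)[1]
_, b, c_prime = ols(readiness, engagement, participation)
print(f"a={a:.2f}, b={b:.2f}, direct c'={c_prime:.2f}, indirect a*b={a * b:.2f}")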

—Joseph M. Petrosko

APPRAISAL

Appraisal, sometimes used as a synonym for evaluation, refers most specifically to estimating the market or dollar value of an object, such as a property or a piece of art or jewelry. The term is also used in valuing intangibles, such as investments; business and industry value, solvency, and liability; and even psychological traits such as motivation.

APPRECIATIVE INQUIRY

Appreciative inquiry is a method and approach to inquiry that seeks to understand what is best about a program, organization, or system in order to create a better future. The underlying assumptions of appreciative inquiry suggest that what we focus on becomes our reality, that there are multiple realities and values that need to be acknowledged and included, that the very act of asking questions influences our thinking and behavior, and that people will have more enthusiasm and motivation to change if they see possibilities and opportunities for the future. Appreciative inquiry is based on five principles:

1. Knowledge about an organization and the destiny of that organization are interwoven.

2. Inquiry and change are not separate but are simultaneous. Inquiry is intervention.

3. The most important resources we have for generating constructive organizational change or improvement are our collective imagination and our discourse about the future.


4. Human organizations are unfinished books. An organization's story is continually being written by the people within the organization, as well as by those outside who interact with it.

5. Momentum for change requires large amounts of both positive affect and social bonding—things such as hope, inspiration, and sheer joy in creating with one another.

Appreciative inquiry is often implemented as a "summit" that lasts from 2 to 5 days and includes 20 to 2500 people. During their time together, participants engage in a four-stage process of discovery, dream, design, and destiny, during which they respond to a series of questions that seek to uncover what is working well, what they want more of, and how the ideal might become reality. Appreciative inquiry questions might include, "As you reflect on your experience with the program, what was a high point?" "When did you feel most successful in terms of your contributions to the project?" "What are the most outstanding moments or stories from this organization's past that make you most proud to be a member of this organization?" "What are the things that give life to the organization when it is most alive, most effective, most in tune with the overarching vision?"

Appreciative inquiry and participatory, collaborative, and learning-oriented approaches to evaluation share several similarities. For the most part, they are catalysts for change; emphasize the importance of dialogue and, through questioning, seek to identify values, beliefs, and assumptions throughout the process; are based on the social construction of reality; stress the importance of stakeholder involvement; embrace a systems orientation; and reflect an action orientation and the use of results.

Appreciative inquiry is being used to evaluate a wide variety of programs and services around the world. While some evaluators use appreciative inquiry as an overarching framework (as with utilization-focused or empowerment frameworks), others are adopting appreciative inquiry principles to construct interview protocols and procedures. Using appreciative inquiry for evaluation may be particularly useful (a) for framing and implementing developmental and formative evaluations, (b) as a method to focus an evaluation study, (c) as an interviewing technique, and (d) as a means to increase an organization's commitment to engaging in evaluation work.

—Hallie Preskill

Further Reading
Hammond, S. A. (1996). The thin book of appreciative inquiry. Plano, TX: CSS.
Watkins, J. M., & Cooperrider, D. (2000). Appreciative inquiry: A transformative paradigm. OD Practitioner, 32(1), 6-12.
Watkins, J. M., & Mohr, B. J. (2001). Appreciative inquiry: Change at the speed of imagination. San Francisco: Jossey-Bass.

APPROPRIATENESS

In the focus on effectiveness, efficiency, and economy, the appropriateness of programs, projects, policies, and products is often ignored—to the detriment of sound judgments about worth. To evaluate appropriateness, one of two comparisons is made. The program may be compared to the needs of the intended clients, using any of the techniques of needs analysis. Alternatively, the program can be evaluated in terms of its compliance with process. In health, for example, some evaluations focus on appropriate care, including treatment of conditions (heart disease) or events (childbirth). Appropriateness can be determined through expert review of individual cases.

—Patricia J. Rogers

ARCHIVES

An archive is a place in which past and current records and artifacts of ongoing value are protected and made available to people such as evaluators. Such material often forms part of historic memory and can enhance understandings of cultures, organizations, and programs. Archives may contain primary or secondary sources such as memorabilia (e.g., photographs, documents, letters), equipment, newspaper articles, rare books, minutes of meetings, and records. Locations of archives vary: They can be found in government departments, libraries, museums, newspaper offices, universities, private companies, and religious organizations.

From archives, evaluators may gain valuable insights into organizations and programs that may not have been apparent before and that could not have been discovered in any other way. Frequently, evaluators must go to the archival site, although some archives are now being made available through microfiche and the Internet.

—Rosalind Hurworth


ARGUMENT

In an evaluation context, argument is the framework, or methodological reasoning, used to persuade an audience of the worth or value of something. Rarely do evaluations find definitive answers or compelling conclusions; most often, evaluations appeal to an audience's reason and understanding to persuade people that the findings of an evaluation are plausible and actionable. Different audiences want different information from an evaluation and will find different constructions of that information compelling. Evaluators use information and data collected during the evaluation process to make different arguments to different audiences or stakeholders of the evaluation. Evaluation is a process of deriving criteria for what counts as important in the evaluation (either as an expert, in concert with program managers or staff, or through participatory processes for deriving these criteria); collecting, analyzing, and interpreting data; and presenting those analyses and interpretations to audiences of interested parties.

Argumentation (methodological reasoning) is an inherent element of designing and carrying out an evaluation; the points of decision that confront evaluators as they work to understand a program or policy and put together the most compelling description possible of their findings are similar to the decision points necessary to construct a valid argument. Evaluators buttress their arguments with data from the evaluation process. However, persuasion comes into play when presenting those facts to audiences: For example, school boards and parents may find measures of student achievement most compelling; educators and school administrators may find evidence of heightened interest in learning, gathered through student interviews, most compelling. An evaluator's job is to present the information in as unbiased a fashion as possible and make the most credible, persuasive argument to the right audiences. Rather than convince and demonstrate, an evaluator persuades and argues; rather than amass and present evidence that is compelling and certain, an evaluator strives to present evidence that is widely accepted as credible.

—Leslie K. Goodyear

Further Reading
House, E. R. (1980). Evaluating with validity. Beverly Hills, CA: Sage.

ARTISTIC EVALUATION

Artistic evaluation is a general term that could be used to refer to a number of ideas and activities within the evaluation field. Three possible meanings are discussed here.

EISNER’S ARTISTIC EVALUATION MODEL

Possibly the most obvious referent would be the approach to educational evaluation that Elliot Eisner has fashioned using art criticism as a model. Eisner's connoisseurship-criticism model can be considered an artistic approach to evaluation for at least four reasons.

First, Eisner draws on a number of aesthetic theories, including those of John Dewey and Susanne Langer, to conceptualize and justify his educational connoisseurship and criticism model. Second, the model emphasizes the use of artistic and literary forms of discourse to describe the program that is being evaluated. (Only evocative, somewhat poetic language can capture the aesthetic dimensions of what is being studied, Eisner argues.) Third, the model emphasizes the eclectic use of social science theory to interpret the phenomena being evaluated, and this eclectic use of theory can be thought of as artistic, at least in a metaphorical sense, because it is not rule governed or systematized. Finally, Eisner argues that social phenomena are like works of art in a number of respects, including their complexity, and he applies John Dewey's dictum about evaluating works of art to evaluating educational and other social phenomena: The worth of complex phenomena cannot be assessed by applying a predetermined standard; rather, the evaluator's judgment must be employed.

ARTISTIC FORMS OF DISPLAYING EVALUATION DATA AND RESULTS

The term artistic evaluation might also be used to reference the growing use of art forms and artistic techniques to display evaluation findings. Evaluators' use of alternative forms of data display—including forms that are rooted in the arts—normally is motivated by a desire to communicate more effectively with different evaluation audiences, including audiences that are unlikely to respond positively to (or, for that matter, even to read) traditional social science–sounding evaluation reports.

An example of an artistic data display technique that has been used to communicate evaluation findings in contracted evaluation studies is readers' theater. Readers' theater is a stylized, nonrealistic form of theater in which actors hold scripts and audience members are asked to use their imaginations to visualize details that, in realistic theater, would be provided for them by the scenery and costume designers. In this respect, readers' theater productions are like radio plays; the difference, of course, is that, in a readers' theater production, the actors are visible to the audience and may engage in some visually observable behavior to symbolize certain significant activities (e.g., an actor might turn his or her back to the audience to symbolize that he or she has left the scene).

Robert Donmoyer and Fred Galloway used readers' theater in an evaluation project funded by the Ball Foundation. The project assessed the foundation's educational reform project in a Southern California school district. Specifically, the readers' theater data display technique was used to communicate evaluation findings to an audience of teachers and administrators involved with the reform. Teachers and administrators in the district also served as readers and actors in the production of the readers' theater script.

The readers' theater script, titled Voices in Our Heads: A Montage of Ideas Encountered During the Evaluation of the Ball Foundation's Community of Schools Project in the Chula Vista Elementary School District, was constructed from quotations excerpted from interview transcripts. The script frequently juxtaposed different and, at times, antithetical points of view about the foundation-funded reform initiative in the district.

Those present judged the presentation to be both an enjoyable and an effective way to communicate important ideas emerging from the evaluation to an audience that was unlikely to read the rather lengthy written report that the two evaluators had produced. Foundation officials, in fact, were so pleased with this artistic approach to reporting evaluation findings that they contracted with the two evaluators to evaluate the annual conference the foundation sponsored for the schools and school districts it funded and to report the results of that evaluation in another readers' theater production.

Before proceeding, it should be noted that the use of alternative modes of displaying data—including modes rooted in the arts—is hardly a novel idea. Both Robert Stake, in his discussions of his "responsive" approach to evaluation, and Elliot Eisner, in discussing his educational connoisseurship and criticism model of evaluation, endorsed the use of artistic modes of data display in evaluation work as early as the mid-1970s.

THE PROCESS MEANING OF ARTISTIC EVALUATION

There is at least one other meaning that could be associated with the term artistic evaluation. The term might refer to the fact that all forms of evaluation have a serendipitous, unplanned element. Even when an evaluator is intent on rigidly following predetermined and prespecified standard operating procedures and making the evaluation process systematized and rule governed, things do not always turn out as planned, and, consequently, an improvisational element always creeps into evaluation work. To state this point another way: Effective evaluation work inevitably requires a degree of artistry, metaphorically speaking.

For some, the artistic dimension of evaluation work is not seen as a necessary evil. Indeed, at times, the artistic element of evaluation work is lauded and brought front and center in discussions about the designs to be employed in evaluation studies. A positive view of the use of artistry in designing and conducting evaluation studies is on display in Michael Quinn Patton's book Creative Evaluation, for example. Whether or not one decides to embrace and cultivate the serendipitous aspects of evaluation work as Patton clearly does, these dimensions will be present in some form and to some degree in all evaluation work. In this respect, all evaluations can be considered to be, to a greater or lesser extent, artistic evaluation.

—Robert Donmoyer

See also CONNOISSEURSHIP

Further Reading
Donmoyer, R., & Yennie-Donmoyer, J. (1995). Data as drama: Reflections on the use of readers' theater as a mode of qualitative data display. Qualitative Inquiry, 1, 397-408.
Patton, M. Q. (1988). Creative evaluation. Thousand Oaks, CA: Sage.


ASSESSMENT

From the Latin, "to sit beside," assessment means an evaluative determination. Roughly synonymous with testing and evaluation in lay terms, assessment has become the term of choice in education for determining the quality of student work for purposes of identifying the student's level of achievement. A more important distinction is between the terms assessment and measurement, because educational constructs such as achievement, like most social phenomena, cannot be directly measured but can be assessed.

—Linda Mabry

ASSOCIATIONS, EVALUATION

A hierarchy of evaluation organizations is slowly emerging. At the global level are organizations such as the International Organization for Cooperation in Evaluation (IOCE) and the International Development Evaluation Association (IDEAS). The IOCE is a loose coalition of some 50 regional and national evaluation organizations from around the world. The mission of the IOCE is to legitimate and strengthen evaluation societies and associations by promoting the systematic use of evaluation in civil society. Its intent is to build evaluation capacity, develop principles and procedures in evaluation, encourage the development of new societies and associations, procure resources for cooperative activity, and be a forum for the exchange of good practice and theory in evaluation.

The initiative to establish IDEAS arose from the lack of an international organization representing the professional interests and intellectual needs of development evaluators, particularly in transition economies and the developing world. IDEAS seeks to fulfill that need by creating a strong body of committed voluntary members worldwide, particularly from developing countries and transition economies, that will support essential, creative, and innovative development evaluation activities; enhance capacity; nurture partnerships; and advance learning and sharing of knowledge with a view to improving the quality of people's lives.

The regional level of the hierarchy is made up of evaluation organizations that have a geographical focus that spans two or more countries. The regional evaluation organizations registered to attend the IOCE inaugural assembly include the African Evaluation Association, the Australasian Evaluation Society (AES), the European Evaluation Society, the International Program Evaluation Network (IPEN) of Russia and the Independent States, and the Program for Strengthening the Regional Capacity for Evaluation of Rural Poverty Alleviation Projects in Latin America and the Caribbean. Most, if not all, regional evaluation organizations are based on individual memberships. To a lesser extent, they serve as umbrella organizations for the national evaluation organizations that lie within their geographical focus. Like their national counterparts, regional evaluation organizations offer many benefits to their memberships.

The national level of the hierarchy is made up of evaluation organizations that operate throughout a single country. The national evaluation organizations registered to attend the IOCE inaugural assembly include the American Evaluation Association (AEA), Associazione Italiana di Valutazione, Canadian Evaluation Society (CES), Egyptian Evaluation Society, Eritrean National Evaluation Association, Ghana Evaluators Association, Israeli Association for Program Evaluation, IPEN-Georgia, IPEN-Ukraine, IPEN-Russia, Kenya Evaluation Association, Malawi Network of Evaluators, Malaysian Evaluation Society, Rede Brasileira de Monitoramento e Avaliação, Réseau nigérien de Suivi Evaluation (ReNSE), Réseau de Suivi et Evaluation du Rwanda, Sociedad Española de Evaluación, Société Française de l'Evaluation, South African Evaluation Network, Sri Lankan Evaluation Association, United Kingdom Evaluation Society, and Zimbabwe Evaluation Society.

The structure of regional and national evaluation organizations arises organically in response to contextual variables that are unique to their respective areas of geographic focus. In democratic countries, a common structure is the association or society. (A notable exception is Utvärderarna, a very loose evaluation network in Sweden.) In nondemocratic countries, informal networks whose memberships are not registered with the government have often been found to be preferable. There also appears to be a natural progression in the development of regional and national evaluation organization structure. National organizations often begin as informal networks. As the organizations mature and contextual variables change, oftentimes the networks begin to formalize. Eventually, some networks take the step of becoming legal organizations that are formally recognized by their governments.

Regional and national evaluation organizations provide many benefits to their members. Conferences are a common benefit. These meetings provide an opportunity for professional development and networking. Some organizations seek to connect members who have common interests. The American Evaluation Association does this by grouping members into 32 interest groups that deal with a wide variety of evaluation topics. Journals are another common benefit. Journals provide members with news of the profession as well as information about the latest approaches and methods. Some regional and national evaluation organizations (e.g., AEA, AES, CES, IPEN) have defined ethical codes to guide the conduct of their members. Finally, several regional and national evaluation organizations (e.g., Swiss Evaluation Society, German Evaluation Association, African Evaluation Society, Australasian Evaluation Society) are in various stages of publishing program evaluation standards. In many cases, these standards are based on the Joint Committee on Standards for Educational Evaluation's Program Evaluation Standards.

The subnational level of the hierarchy is made up of evaluation organizations that operate within a region, state, or province of a country. Examples of these types of organizations include the Société québécoise d'évaluation de programme, the Société Wallonne de l'Evaluation et de la Prospective, the provincial chapters of the CES, and the local affiliates of the American Evaluation Association. The IOCE has created an enabling environment that permits international cooperation at the subnational level. For example, the Michigan Association for Evaluation (one of AEA's local affiliates) and the Ontario Province Chapter of CES have been exploring opportunities for cooperation such as joint workshops and conferences, professional exchanges, shared publications, and so on. It may be at the subnational level that individuals most directly experience the benefits of international cooperation.

—Craig Russon

Further Reading
International Organization for Cooperation in Evaluation. (2002). [Home page]. Retrieved April 27, 2004, from http://home.wmis.net/~russon/ioce/

ATTITUDES

Attitude is a predisposition to classify objects, people, ideas, and events and to react to them with some degree of evaluative consistency. Inherent in an attitude is a judgment about the goodness or rightness of a person, thing, or state. Attitudes are mental constructs that are inferred and are manifest in conscious experience of an inner state, in speech, in behavior, and in physiological symptoms. Measurements of attitudes are often evaluation data, as they may reflect responses to an evaluand.

See also AFFECT

AUDIENCE

An audience is the identified receiver of the findings or knowledge products (evidence, conclusions, judgments, or recommendations) from the evaluative investigation. One can think of a hierarchy of audiences—primary, secondary, and so on. The primary audience is an individual or group to which the findings are directed during the evaluation. Early identification of a key individual who has influence in an organization can significantly affect the utilization of evaluation findings in that organization, and there is evidence that influential groups can play a similar role.

—John M. Owen

AUDITING

Broadly defined, auditing is a procedure in which an independent third party systematically examines the evidence of adherence of some practice to a set of norms or standards for that practice and issues a professional opinion. For several years, this general idea has informed the process of metaevaluation—a third-party evaluator examines the quality of a completed evaluation against some set of standards for evaluation. In addition to this generic way of thinking about the nature of auditing and its relevance for evaluation, one can, more specifically, examine the relations between the practices of program evaluation and program and performance auditing at state and national levels. For many years, these activities have existed side by side as distinct practices with different professional cultures, literatures, and academic preparation (e.g., training in financial and performance auditing or in disciplines of social science research). During the last several decades, each practice has gone through substantial changes that have influenced the dialogue between the two practices on issues of purpose and methodology. As defined by the Comptroller General of the United States, a performance audit is "an objective and systematic examination of evidence of the performance of a government organization, program, activity, or function in order to provide information to improve public accountability and facilitate decision-making." A program audit is a subcategory of performance auditing, in which a key objective is to determine whether program results or benefits established by the legislature or other authorizing bodies are being achieved.

Both evaluation and performance auditing share an interest in establishing their independence and in warranting the credibility of their professional judgments. Furthermore, both practices are broadly concerned with assessing performance. However, some observers have argued that evaluation and performance auditing differ in the ways they conceive of and accomplish that aim. Some of the differences between the two practices include the following: Auditors address normative questions (questions of what is, in light of what should be), whereas evaluators are more concerned with descriptive and impact questions. Auditors work more independently of the auditee than evaluators do with their clients. Auditors are more exclusively focused on management objectives, performance, and controls than are evaluators. Auditors work with techniques for generating evidence and analyzing data that make it possible to provide quick feedback to auditees; evaluations often (though not always) have a longer time frame. Although both auditors and evaluators base their judgments on evidence, not on impressions, and both rely on an extensive kit of tools and techniques for generating evidence, they often make use of those tools in different ways. For example, auditors plan the steps in an audit, but an evaluator is more likely to establish a study design that may well take into account examining related evaluation studies and their results. Both study designs and reporting in evaluation are likely to include great detail on methods; for example, interview schedules, the process of selecting interviewees, and the conditions of interviewing. Evaluators also often draw on multiple methods of generating and analyzing data, more so than auditors. Finally, auditors operate under statutory authority; evaluators work as fee-for-service consultants or as university-based researchers. Other observers have argued that the practices of auditing and evaluation, although they often exist independently of one another, are being blended together as a resource pool for decision makers responsible for public programs. In this circumstance, an amalgamated picture is emerging of professional objectives (e.g., placing high value on independence, strict attention to documentation of evidence), purpose (e.g., combining normative, descriptive, and impact questions), and methodologies (e.g., making use of a wide range of techniques).

—Thomas A. Schwandt

Further Reading
Chelimsky, E. (1985). Comparing and contrasting auditing and evaluation: Some notes on their relationship. Evaluation Review, 9, 485-503.
Wisler, C. (Ed.). (1996). Evaluation and auditing: Prospects for convergence. New Directions for Evaluation, 71.

AUTHENTICITY

Authenticity is defined as a report writer's attempt to present the voices of the evaluands, integrated with the context of the action, in a way that seems to correspond with and represent the lived experience of the evaluands. Claims of authenticity are best corroborated by the evaluand's supportive feedback on the report, perhaps as part of the pursuit of construct validity. Criteria to be satisfied include whether the report is accurate, has appropriate coverage, and is balanced and fair. Audio and visual recordings of the action do not in themselves provide authenticity when transcribed and otherwise interpreted by the evaluator, for they lack the experiential accounts and are, for all their detail, partial, like all other data. Evaluators who have employed the concept of authenticity in their work include Elliot Eisner, Rob Walker, Terry Denny, and Saville Kushner. The origins of the concept of authenticity can be found in The Confessions of J. J. Rousseau and also in the works of Camus and Heidegger.

—Clem Adelman

See also CONTEXT


AUTHORITY OF EVALUATION

Authority in evaluation is contingent on myriad reciprocal and overlapping considerations, some obvious and some subtle, as well as on starkly different perspectives as to what constitutes evaluation as a professional practice.

WITHIN THE PROFESSIONAL COMMUNITY

Broad distinctions between stakeholder-oriented and expert-oriented evaluation approaches have implications for authority. Expert-oriented evaluation approaches invest authority in a professional evaluator (e.g., Stufflebeam's context-input-process-product, or CIPP, model), a connoisseur (i.e., Eisner's connoisseurship approach), or a group of professionals or technical experts (e.g., evaluation or accreditation teams, advisory and blue-ribbon panels). In contrast, stakeholder-oriented approaches may confer authority on program personnel, participants, and relevant others or may involve shared authority. Less explicitly, a stakeholder-oriented evaluator may retain authority while also closely attending to stakeholder aims, priorities, and criteria.

An important consideration in the last half of the 20th century has been the variety of approaches to evaluation, roughly corresponding (but not limited) to the so-called models of evaluation, of which there are about a dozen, depending on categorization schemes. Even the question of whether the evaluator is obliged to render an evaluative judgment regarding the quality, worth, merit, shortcomings, or effectiveness of a program depends, in part, on his or her approach. According to some, the evaluator's basic responsibility is to define foci and questions, determine data collection methods, establish analytic criteria and strategies, and report evaluative conclusions (e.g., Scriven's approach). In other approaches, such decisions and activities are undertaken collaboratively with clients. In articulating the continuum (controversially), Scriven has decried as "not quite evaluation" approaches in which the evaluator leaves authority for final judgment to program participants (e.g., Stake's responsive evaluation), and as "more than evaluation" (i.e., consultation) those approaches in which the evaluator continues to work with clients after an evaluation report to ensure its useful implementation (e.g., Patton's utilization-focused evaluation) and approaches in which overarching goals include improved working relationships among program personnel (e.g., Greene's participatory evaluation) or their improved political efficacy (e.g., Fetterman's empowerment evaluation).

In any approach—perhaps appropriately, perhaps inevitably—some stakeholder aims, priorities, and criteria will be emphasized over others. For example, all approaches are vulnerable to threats of managerialism, the prioritizing of managers' interests and concerns, because managers are often the contracting agents. All approaches are also susceptible to clientism (attempts to ensure future contracts by giving clients what they want) and, to varying degrees, to the emergence of relationships between evaluators and stakeholders. The day-by-day conduct and internal politics of an evaluation demonstrate that authority in evaluation both varies and fluctuates.

Locus of authority is also affected by evaluation purpose. With its general intent to assess program accomplishment at a specific point in time (e.g., the end of a granting agency's designated funding period), summative evaluation may tend toward the investment of authority in an evaluator. By comparison, formative evaluation (monitoring to improve program implementation) may tend toward shared authority, as ongoing use of evaluation findings by program personnel is the aim. Purposes more specific than these broad strokes also shade evaluation authority.

Authority in an evaluation is influenced by the nature of the evaluator's position vis-à-vis the program itself or the organization of which the program is a part. External evaluation by an outside professional contracted for the purpose of conducting a specific evaluation is often considered more credible than internal evaluation conducted by program personnel or by an organization's internal evaluation unit because of the external evaluator's presumed independence and lack of investment in the program. Independence suggests that external evaluation may tend toward locating authority in the evaluator. Internal evaluators may be subject to particularly intense encroachments on their authority. Even where there is formal provision for independence, they must report findings to their superiors and colleagues who, if threatened or dissatisfied, may actively undermine their credibility or status. Internal evaluators may find themselves caught between reporting fully and accurately, as expected in competent practice, and keeping their jobs. This complication of independence suggests that authority may tend to be more diffuse in internal evaluation.

Calls for metaevaluation (e.g., by Scriven and Stufflebeam), especially in large-scale, high-impact, and expensive projects, imply that even well-placed and well-exercised authority in evaluation guarantees neither competent practice nor credibility to clients. Metalevel review of evaluation projects further disperses authority in evaluation, signifying that even the most autocratic evaluator may not be the final authority regarding his or her own evaluation studies.

BEYOND THE PROFESSIONAL COMMUNITY

Clients may feel that authority should be theirs regarding the focus of an evaluation, dissemination of results, and perhaps other matters, such as identifying key contacts and informants and selecting methods (e.g., specifying an interview study or a population to be surveyed). Having commissioned the evaluation for their own reasons, clients may insist, blatantly or subtly, on their authority, as they would with other types of outsourced services. Wrangles may ensue over how an evaluation should be conducted and over publication of findings, even after good-faith contract negotiations. Difficulty may be unavoidable when evaluators believe they have an ethical responsibility to report publicly or to right-to-know or need-to-know audiences but clients fear that broad dissemination may damage their programs by revealing hurtful deficiencies. In this way, ethical consideration of the public interest may give the public an authoritative influence in evaluation.

Funding agencies frequently require evaluations of the programs they support and constitute important audiences for evaluation reports. Consequently, they may exercise real authority over some aspects of an evaluation, as when their review panels accept or reject proposals on the basis of whether they conform to the funding agency’s interests (e.g., the World Bank’s interest in economic aspects of educational projects). Even an applicant’s awareness of the evaluation designs of successful prior proposals may give funding agencies oblique authority over evaluations.

Where program managers and program funders are distinct entities rather than one and the same, and where the interests of program managers and funders clash, their struggles for authority over evaluations are predictable. For example, formative first-year evaluation reporting may be expected to serve program managers’ interests in improving the program and simultaneously serve funders’ decisions about funding continuation or termination. Full reporting of early program difficulties may be opposed by program managers because of the risks to personnel but be demanded by funders, each tugging for authority over reporting.

AUTHORITY IN MEANING MAKING

Who should determine the criteria or standards of quality against which a program will be judged, and whether the criteria for determining the quality, worth, merit, shortcomings, or effectiveness of a program should be preordinate or explicit, are issues of authority regarding meaning making, framing the interpretation of data and the development of findings. These issues have been debated in the professional community, sometimes as a matter of criteriality. Partly reflecting evaluators’ models or approaches to evaluation, the responsibility to identify or devise criteria is seen by some as an obligatory part of the evaluator’s role. Alternatively, clients may justifiably feel that the authority to determine criteria of quality should be the responsibility of professional organizations in their respective fields (e.g., early childhood education standards, rehabilitation standards) or should be shared, giving them a right to comment on whether the standards are sufficiently sensitive for their programs or appropriate for programs of their kind.

Other evaluators consider that useful, sensitive criteria are not well captured in formal, explicit statements, even standards promulgated within the program or within its field of endeavor. Preordinate criteria, determined at the start of an evaluation when relatively little is known about the program, may be ill advised and could mislead an evaluation, most seriously if taken as guideposts from the outset, exercising authority over all aspects of the study from initial focus to final analysis. From this perspective, emergent criteria intuited by the evaluator during the conduct of the evaluation, as the program becomes familiar, may seem more nuanced, more appropriate, and more useful. However, explicit, preordinate criteria can foster client awareness and encourage client input in ways not available to unstated intuitive criteria, rendering claims about the greater sensitivity of intuitive criteria vulnerable to challenge.

Audiences may unknowingly exercise authority regarding evaluation results as evaluators strive to achieve accessibility in reporting and to encourage appropriate use. Evaluators often try to be conscious of prospective audiences as they design and conduct their studies, considering what types of data may be meaningful to readers and hearers, for example, and, as they analyze data, considering what types of interpretations may be useful and how to present them comprehensibly. Audiences take what they can and what they will from an evaluation report. Their background (educational level, cultural values, personal experiences, socioeconomic status, and the like) will affect what they glean from anything they read. Audiences’ varied agendas also affect their understanding and use of information. Accordingly, evaluators are commonly admonished to consider accessible report formats and styles and to try to predict likely uses and misuses of reports.

Ultimately, those empowered to make decisions affecting programs assume authority for the implementation of evaluation-stimulated program changes. Such decisions can occur at many program levels, with one group of stakeholders citing an evaluation report to press for changes opposed by another group of stakeholders citing the same report. For example, from their reading of a report, program managers may move to expand intake of information about their clients’ backgrounds to fine-tune service delivery, and beneficiaries may better understand their rights from reading the same report and resist deeper incursions into their privacy. Stakeholders’ common failures to implement evaluation findings, failures to implement them appropriately, and attempts to suppress reports and discredit evaluators have raised enduring issues of utility and responsibility. They also raise issues of authority because a stakeholder might know better than an evaluator what would constitute appropriate interpretations of data, appropriate report dissemination, and appropriate implementation of findings. An evaluator, however conscientious regarding appropriate distribution of authority, cannot claim omniscience or prevent error.

POSTMODERN CONCERN ABOUT AUTHORITY

One way to think about these issues is to see them as leading inexorably to troubling postmodern perspectives regarding authority. Postmodernism is characterized by doubt and denial of the legitimacy of the understandings and cultural practices underlying modern societies and their institutions—including evaluation. Postmodernism views science as error prone, methods as value laden, logic and rationality as hopelessly discontinuous with reality, and strategies for social improvement as the beginnings of new failures and oppressions by new authorities.

In its implacable skepticism about truth and knowledge, postmodernism implies that all evaluators abuse their authority, misrepresenting to a greater or lesser degree the programs they intend to document. Postmodernists view evaluators (and others) as unable to escape their personal or professional ways of knowing and, because meanings vary across contexts and persons, unable to compel language to convey fully what they think they know. Falling especially on outsiders such as external evaluators, postmodern opposition to metanarratives and to the hidden presence of authors (and their values and agendas) in their texts implies that evaluation reports are necessarily insensitive to insiders, diminishing real persons, who are reconfigured as mere stakeholders. Postmodernism implies condemnation of evaluators’ presumptions of authority, as authors whose claims of expertise regarding the quality of the program override and overwrite program participants’ understandings, which are based on lived experience. Postmodernism also recognizes that authorizing evaluation is an exercise of power, with evaluators both assuming power and serving the powerful, contributing to a status quo that undeniably privileges some and oppresses others.

In this dark recognition, it is not clear how evaluators, heirs of the 18th-century Enlightenment, might revise their professional practices so as to promote appropriate and legitimate authority in evaluation. Encouraging clients and stakeholders to demand a share or a balance of authority would not necessarily better protect the accuracy, sensitivity, or utility of an evaluation or report. Fuller negotiation or sharing of authority implies the possibility of consensus, but clarifying or redrawing boundaries and expectations might promote dissensus instead. If, for example, the criteria for judging program quality were open to input by various interested parties, the many values dear to the many stakeholders might prove so diverse that standards could not be selected or devised without neglecting or offending some.

Similarly, favoring readerly texts, which explicitly leave interpretive options open to readers for their own meaning making, over writerly texts, disparaged by postmodernists as improper usurpations of authority by authors, might engender stakeholder confusion and professional reproach, especially from colleagues who hold that provision of evaluative interpretations or findings is requisite to competent practice. When authority for conducting evaluation is dispersed beyond reclamation and divisiveness becomes entrenched, the evaluation of a program can be thwarted. It is not clear that absence of authority would serve stakeholders or society better than modernist evaluation does.

THE RELATIONSHIP BETWEEN AUTHORITY AND RESPONSIBILITY IN EVALUATION

Has the evaluator the authority or the responsibility to decide the focus, methods, findings, and audiences for an evaluation? The considerations offered here regarding this question imply new questions: Has the evaluator the authority or the responsibility to advocate for a particular focus, methods, interpretation, audience, or stakeholder group? Has the evaluator the authority or the responsibility to advocate for evaluation as a profession, for evaluation’s professional codes of conduct, or for social science generally? Has the evaluator the authority or the responsibility to advocate for his or her personal views of social improvement or equity?

If the credibility and social utility of the profession are to be preserved, being a good professional may very well require something like professional disinterest rather than advocacy for any group or value. However, if being a good professional requires such disinterest, then evaluators foreclose on their obligations as citizens by restricting their participation in the democratic fray within programs or within the political contexts in which programs exist, and they similarly foreclose on their obligations as human beings by restricting challenges to inequity they might otherwise take up.

Appropriate exercise of authority in evaluation is not a simple matter but is contingent on methodological views, interpersonal politics, contextual exigencies, and philosophical perspectives, none of which are absolute, unequivocally clear, or static. In a given situation, either an exercise of authority or a refusal to exercise unilateral authority might be justified on similar grounds. Appropriate exercise of authority is ultimately a judgment call in an environment where different judgments can be made and defended—and must be.

—Linda Mabry
