
Chapter Six

Training Evaluation

Training and Leadership Development: A Healthy Investment at Sisters of Charity Providence Hospital

At Sisters of Charity Providence Hospital, a 304-bed hospital in Columbia, South Carolina, a training function that addresses performance gaps and supports quality service is a top priority. Training and development programs are linked to the hospital's strategic goals: to become an employer of choice and to provide service and operational excellence. It is not only important to develop programs supporting the strategy but also to provide evidence as to how they contribute. One priority has been to develop leaders from its current managers. The hospital's first program, Leading Edge, included a curriculum that focused on financial and performance management, identifying and recruiting top talent, and change management. The focus of the training was based on the middle managers' performance needs and weaknesses. The training used a variety of methods, but each program included a set of team-building activities and a performance review of the organization. Formal processes were implemented to hold leaders accountable for the performance of their area. The principal objectives included (1) increased earnings before interest, depreciation, and amortization and (2) improvement of employee and patient satisfaction. Each goal was met during the first year of the program, and improvements have continued. For example, to measure a behavior change in manager feedback, questions were added to employee surveys. Results indicate that there has been a significant increase in feedback by leaders who have attended the training program.

Objectives

After reading this chapter, you should be able to

1. Explain why evaluation is important.
2. Identify and choose outcomes to evaluate a training program.
3. Discuss the process used to plan and implement a good training evaluation.
4. Discuss the strengths and weaknesses of different evaluation designs.
5. Choose the appropriate evaluation design based on the characteristics of the company and the importance and purpose of the training.
6. Conduct a cost-benefit analysis for a training program.


Another priority has been to use training to improve patient care. This has led to the hospital's investment in a clinical simulation training program that recreates real patient experiences in a safe practice environment. Five practice patient rooms have been constructed, each with patient simulators. The "patients" can cough, vomit, and reproduce other bodily functions, as well as communicate their medical needs. Employees can work alone or in teams during the simulations to improve their patient care and problem-solving skills. Trainers provide time for reflection and provide feedback to help employees learn. The realistic work environments have improved patient care, and employee satisfaction has increased. Also, the retention rate for new first-year nurse graduates has increased 18 percent.

Source: Based on "Sisters of Charity Providence Hospitals: The Healthy Glow of Learning," T+D (October 2008): 63-64.

    INTRODUCTION

As the opening vignette illustrates, the Sisters of Charity Providence Hospital training function wanted to show that the time, money, and effort devoted to training actually made a difference. That is, the training function was interested in assessing the effectiveness of training programs. Training effectiveness refers to the benefits that the company and the trainees receive from training. Benefits for trainees may include learning new skills or behavior. Benefits for the company may include increased sales and more satisfied customers. A training evaluation measures specific outcomes or criteria to determine the benefits of the program. Training outcomes or criteria refer to measures that the trainer and the company use to evaluate training programs. To determine the effectiveness of the program, the hospital had to conduct an evaluation. Training evaluation refers to the process of collecting the outcomes needed to determine whether training is effective. For Sisters of Charity Providence Hospital, some of the outcomes included an increase in feedback by leaders, financial indicators (e.g., increased earnings before interest), and an improvement in patient care satisfaction. The hospital also had to be confident that its information-gathering process was providing it with accurate conclusions about the effectiveness of its training programs. The evaluation design refers to the collection of information (including what, when, how, and from whom) that will be used to determine the effectiveness of the training program. Information about the evaluation design used by the hospital is not provided in the opening vignette. However, any organization that evaluates training has to be confident that training, rather than some other factor, is responsible for changes in the outcomes of interest (e.g., patient care satisfaction and earnings).

Recall the Instructional Systems Design model shown in Figure 1.1 and the topics covered in Chapters 2 through 5. The information from the needs assessment, the characteristics of the learning environment, and the steps taken to ensure transfer of training should all be used to develop an evaluation plan. In order to identify appropriate training outcomes, a company needs to look at its business strategy, its organizational analysis (Why are we conducting training? How is it related to the business?), its person analysis (Who needs training?), its task analysis (What is the training content?), the learning objectives of the training, and its plan for training transfer.

This chapter will help you understand why and how to evaluate training programs. The chapter begins by discussing the types of outcomes used in training program evaluation.


The next section of the chapter discusses issues related to choosing an evaluation design. An overview of the types of designs is presented. The practical factors that influence the type of design chosen for an evaluation are discussed. The chapter concludes by reviewing the process involved in conducting an evaluation.

    REASONS FOR EVALUATING TRAINING

Companies are investing millions of dollars in training programs to help gain a competitive advantage. Companies invest in training because learning creates knowledge; often, it is this knowledge that distinguishes successful companies and employees from those who are not. Research summarizing the results of studies that have examined the linkage between training and human resource outcomes (attitudes and motivation, behaviors, human capital), organizational performance outcomes (performance and productivity), or financial outcomes (profits and financial indicators) has found that companies that conduct training are likely to have more positive human resource outcomes and greater performance outcomes.1 The influence of training is largest for organizational performance outcomes and human resource outcomes and weakest for financial outcomes. This result is not surprising, given that training can least directly affect an organization's financial performance and may do so through its influence on human resource practices. As emphasized in Chapter 2, "Strategic Training," training is more strongly related to organizational outcomes when it is matched with the organization's business strategy and capital intensity. Because companies have made large dollar investments in training and education and view training as a strategy to be successful, they expect the outcomes or benefits related to training to be measurable.

For example, consider CompUSA.2 CompUSA's goal was to focus its human resource strategy by aligning it with the strategies for improving the business. Senior management believed that one way to promote growth was through building engagement with team members. Employee engagement includes employees' feelings toward support, recognition, development, and sharing the company's goals. Employee engagement results in satisfied employees, which translates into better customer service. CompUSA administered an engagement survey to more than 200 stores. The engagement survey results were used to build a strategic training plan. Each year the training function meets to discuss training priorities based on the survey results. Regional human resource managers as well as store managers set improvement goals. All employees are linked to the idea of creating a winning culture that promotes customer satisfaction. The engagement measure is used to track the influence of training programs. It is linked to company strategy, tracks progress over time, and has a direct link to training. The common use of engagement builds accountability for managers and provides a valued progress measure for the entire company.

Training evaluation provides a way to understand the return that investments in training produce and provides the information needed to improve training.3 If the company receives an inadequate return on its investment in training, the company will likely reduce its investment in training or look for training providers outside the company who can provide training experiences that improve performance, productivity, customer satisfaction, or whatever other outcomes the company is interested in achieving. Training evaluation provides the data needed to demonstrate that training does offer benefits to the company. Training evaluation involves both formative and summative evaluation.4


Formative Evaluation

Formative evaluation refers to the evaluation of training that takes place during program design and development. That is, formative evaluation helps to ensure that (1) the training program is well organized and runs smoothly and (2) trainees learn and are satisfied with the program. Formative evaluation provides information about how to make the program better; it usually involves collecting qualitative data about the program. Qualitative data include opinions, beliefs, and feelings about the program. Formative evaluations ask customers, employees, managers, and subject-matter experts their opinions on the description of the training content and objectives and the program design. These people are also asked to evaluate the clarity and ease of use of a part of the training program that is provided to them in the way that it will be delivered (e.g., online, face-to-face, video).5 The formative evaluation is conducted either individually or in groups before the program is made available to the rest of the company. Trainers may also be involved to measure the time requirements of the program. As a result of the formative evaluation, training content may be changed to be more accurate, easier to understand, or more appealing. The training method can be adjusted to improve learning (e.g., provide trainees with more opportunities to practice or give feedback). Also, introducing the training program as early as possible to managers and customers helps in getting them to buy into the program, which is critical for their role in helping employees learn and transfer skills. It also allows their concerns to be addressed before the program is implemented.

Pilot testing refers to the process of previewing the training program with potential trainees and managers or with other customers (persons who are paying for the development of the program). Pilot testing can be used as a "dress rehearsal" to show the program to managers, trainees, and customers. It should also be used for formative evaluation. For example, a group of potential trainees and their managers may be asked to preview or pilot test a Web-based training program. As they complete the program, trainees and managers may be asked to provide their opinions as to whether graphics, videos, or music used in the program contributed to (or interfered with) learning. They may also be asked how easy it was to move through the program and complete the exercises, and they may be asked to evaluate the quality of feedback the training program provided after they completed the exercises. The information gained from this preview would be used by program developers to improve the program before it is made available to all employees. St. George Bank developed a new Web-based training system for bank tellers.6 Before the program was provided to all bank tellers, it was reviewed by a small group of bank tellers considered to be typical users of the program. The tellers provided suggestions for improvement, and the instructional designers incorporated their suggestions into the final version of the program.

Summative Evaluation

Summative evaluation refers to an evaluation conducted to determine the extent to which trainees have changed as a result of participating in the training program. That is, have trainees acquired knowledge, skills, attitudes, behavior, or other outcomes identified in the training objectives? Summative evaluation may also include measuring the monetary benefits (also known as return on investment) that the company receives from the program. Summative evaluation usually involves collecting quantitative (numerical) data through tests, ratings of behavior, or objective measures of performance such as volume of sales, accidents, or patents.


From the discussion of summative and formative evaluation, it is probably apparent to you why a training program should be evaluated:

1. To identify the program's strengths and weaknesses. This includes determining if the program is meeting the learning objectives, if the quality of the learning environment is satisfactory, and if transfer of training to the job is occurring.

2. To assess whether the content, organization, and administration of the program (including the schedule, accommodations, trainers, and materials) contribute to learning and the use of training content on the job.

3. To identify which trainees benefit most or least from the program.

4. To assist in marketing programs through the collection of information from participants about whether they would recommend the program to others, why they attended the program, and their level of satisfaction with the program.

5. To determine the financial benefits and costs of the program.

6. To compare the costs and benefits of training versus nontraining investments (such as work redesign or a better employee selection system).

7. To compare the costs and benefits of different training programs to choose the best program.

    OVERVIEW OF THE EVALUATION PROCESS

Before the chapter explains each aspect of training evaluation in detail, you need to understand the evaluation process, which is summarized in Figure 6.1. The previous discussion of formative and summative evaluation suggests that training evaluation involves scrutinizing the program both before and after the program is completed. Figure 6.1 emphasizes that training evaluation must be considered by managers and trainers before training has actually occurred. As was suggested earlier in this chapter, information gained from the training design process shown in Figure 1.1 is valuable for training evaluation.

FIGURE 6.1 The Evaluation Process

1. Conduct a needs analysis.
2. Develop measurable learning objectives and analyze transfer of training.
3. Develop outcome measures.
4. Choose an evaluation strategy.
5. Plan and execute the evaluation.

Source: Based on D. A. Grove and C. Ostroff, "Program Evaluation," in Developing Human Resources, ed. K. N. Wexley (Washington, DC: Bureau of National Affairs, 1991): 5-185 to 5-220; K. Kraiger, D. McLinden, and W. Casper, "Collaborative Planning for Training Impact," Human Resource Management (Winter 2004): 337-51.

The evaluation process should begin with determining training needs (as discussed in Chapter 3). Needs assessment helps identify what knowledge, skills, behavior, or other learned capabilities are needed. Needs assessment also helps identify where the training is expected to have an impact. Needs assessment helps focus the evaluation by identifying the purpose of the program, the resources needed (human, financial, company), and the outcomes that will provide evidence that the program has been effective.7 The next step in the process is to identify specific, measurable training objectives to guide the program. The characteristics of good objectives are discussed in Chapter 4. The more specific and measurable these objectives are, the easier it is to identify relevant outcomes for the evaluation. Besides considering the learning and program objectives in developing learning outcomes, it is also important to consider the expectations of those individuals who support the program and have an interest in it (stakeholders such as trainees, managers, and trainers).8 If the needs assessment was done well, the stakeholders' interests likely overlap considerably with the learning and program objectives. Analysis of the work environment to determine transfer of training (discussed in Chapter 5) can be useful for determining how training content will be used on the job. Based on the learning objectives and analysis of transfer of training, outcome measures are designed to assess the extent to which learning and transfer have occurred.

Once the outcomes have been identified, the next step is to determine an evaluation strategy. Factors such as expertise, how quickly the information is needed, change potential, and the organizational culture should be considered in choosing a design. Planning and executing the evaluation involves previewing the program (formative evaluation) as well as collecting training outcomes according to the evaluation design. The results of the evaluation are used to modify, market, or gain additional support for the program. The results of the evaluation should also be used to encourage all stakeholders in the training process (including managers, employees, and trainers) to design or choose training that helps the company meet its business strategy and helps managers and employees meet their goals.9

    OUTCOMES USED IN THE EVALUATION OF TRAINING PROGRAMS

To evaluate its training program, a company must decide how it will determine the program's effectiveness; that is, it must identify what training outcomes or criteria it will measure.

One of the original frameworks for identifying and categorizing training outcomes was developed by Kirkpatrick. Table 6.1 shows Kirkpatrick's four-level framework for categorizing training outcomes.10

TABLE 6.1 Kirkpatrick's Four-Level Framework of Evaluation Criteria

Level  Criteria   Focus
4      Results    Business results achieved by trainees
3      Behavior   Improvement of behavior on the job
2      Learning   Acquisition of knowledge, skills, attitudes, behavior
1      Reactions  Trainee satisfaction

Source: Based on D. L. Kirkpatrick, "Evaluation," in The ASTD Training and Development Handbook, 2d ed., ed. R. L. Craig (New York: McGraw-Hill, 1996): 294-312.

The hierarchical nature of Kirkpatrick's framework suggests that higher level outcomes should not be measured unless positive changes occur in lower level outcomes. For example, if trainees do not like a course, no learning will occur. Also, the framework implies that changes at a higher level (e.g., results) are more beneficial than changes at a lower level (e.g., learning). However, the framework has been criticized for a number of reasons. First, research has not found that each level is caused by the level that precedes it in the framework, nor does evidence suggest that the levels differ in importance.11 Second, the approach does not take into account the purpose of the evaluation. The outcomes used for evaluation should relate to the training needs, the program learning objectives, and the strategic reasons for training. Third, use of the approach suggests that outcomes can and should be collected in an orderly manner, that is, measures of reaction followed by measures of learning, behavior, and results. Realistically, learning measures need to be collected at approximately the same time as reaction measures, near the end of the training program, in order to determine whether learning has occurred.

As a result of these criticisms, both training practitioners and academic researchers have developed a more comprehensive model of training criteria; that is, additional training outcomes have been added to Kirkpatrick's original framework. Accordingly, training outcomes have been classified into six categories, as shown in Table 6.2: reaction outcomes, learning or cognitive outcomes, behavior and skill-based outcomes, affective outcomes, results, and return on investment.12

Table 6.2 shows training outcomes, D. L. Kirkpatrick's five-level framework for categorizing training outcomes, and a description of each of the outcomes and how they are measured. Both level 1 and level 2 outcomes (reactions and learning) are collected at the completion of training, before trainees return to the job. Level 3 outcomes (behavior/skills) can also be collected at the completion of training to determine trainees' behavior or skill level at that point. To determine whether trainees are using training content back on the job (i.e., whether transfer of training has occurred), level 3, level 4, and/or level 5 outcomes can be collected. Level 3 criteria can be collected to determine whether behavior/skills are being used on the job. Level 4 and level 5 criteria (results and return on investment) can also be used to determine whether training has resulted in an improvement in business results such as productivity or customer satisfaction. These criteria also help to determine whether the benefits of training exceed their costs.

Reaction Outcomes

Reaction outcomes refer to trainees' perceptions of the program, including the facilities, trainers, and content. (Reaction outcomes are often referred to as a measure of "creature comfort.") They are often called class or instructor evaluations. This information is typically collected at the program's conclusion. You probably have been asked to complete class or instructor evaluations either at the end of a college course or a training program at work. Reactions are useful for identifying what trainees thought was successful or what inhibited learning. Reaction outcomes are level 1 (reaction) criteria in Kirkpatrick's framework.

Reaction outcomes are typically collected via a questionnaire completed by trainees. A reaction measure should include questions related to the trainee's satisfaction with the instructor, training materials, and training administration (ease of registration, accuracy of course description) as well as the clarity of course objectives and usefulness of the training content.13 Table 6.3 shows a reaction measure that contains questions about these areas.


TABLE 6.2 Evaluation Outcomes

Reactions (level 1). What is measured: learners' satisfaction. Examples: comfortable training room; useful materials and program content. Methods of measurement: surveys, interviews.

Learning or cognitive (level 2). What is measured: principles, facts, techniques, procedures, or processes the learners have acquired. Examples: electrical principles; safety rules; steps in interviewing. Methods of measurement: tests, work samples.

Behavior and skills (level 2 or 3). What is measured: technical or motor skills or behaviors acquired by learners. Examples: preparing a dessert; sawing wood; landing an airplane; listening. Methods of measurement: tests; observations; self, peer, customer, and/or manager ratings; work samples.

Affective (level 2 or 3). What is measured: learners' attitudes and motivation. Examples: tolerance for diversity; safety attitudes; customer service orientation. Methods of measurement: attitude surveys, interviews, focus groups.

Results (level 4). What is measured: payoffs for the company. Examples: productivity; quality; costs; repeat customers; customer satisfaction; accidents. Methods of measurement: observation; performance data from records or company databases.

Return on investment (level 5). What is measured: identification and comparison of learning benefits with training costs. Example: dollar value of productivity divided by training costs. Method of measurement: economic value.

Source: Based on K. Kraiger, J. K. Ford, and E. Salas, "Application of Cognitive, Skill-Based, and Affective Theories of Learning Outcomes to New Methods of Training Evaluation," Journal of Applied Psychology 78 (2) (1993): 311-328; K. Kraiger, "Decision-Based Evaluation," in K. Kraiger (ed.), Creating, Implementing, and Managing Effective Training and Development (San Francisco: Jossey-Bass, 2002): 331-375; D. Kirkpatrick, "Evaluation," in The ASTD Training and Development Handbook, 2nd ed., ed. R. L. Craig (New York: McGraw-Hill, 1996): 294-312.


An accurate evaluation needs to include all the factors related to a successful learning environment.14 Most instructor or class evaluations include items related to the trainer's preparation, delivery, ability to lead a discussion, organization of the training materials and content, use of visual aids, presentation style, ability and willingness to answer questions, and ability to stimulate trainees' interest in the course. These items come from trainers' manuals, trainer certification programs, and observation of successful trainers. Conventional wisdom suggests that trainees who like a training program (who have positive reactions) learn more and are more likely to change behaviors and improve their performance (transfer of training). Is this the case? Recent research results suggest that reactions have the largest relationship with changes in affective learning outcomes.15 Also, research has found that reactions are significantly related to changes in declarative and procedural knowledge, which challenges previous research suggesting that reactions are unrelated to learning. For courses such as diversity training or ethics training, trainee reactions are especially important because they affect learners' receptivity to attitude change. Reactions have been found to have the strongest relationship with post-training motivation, trainee self-efficacy, and declarative knowledge when technology is used for instructional delivery. This suggests that for online or e-learning training methods, it is important to ensure that it is easy for trainees to access them and that the training content is meaningful, i.e., linked to their current job experiences, tasks, or work issues.
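As a rough illustration of how data from a reaction measure like the one in Table 6.3 (shown next) might be summarized, here is a minimal sketch; the item wording, ratings, and flag threshold are hypothetical assumptions, not from the text:

```python
# Minimal sketch: summarizing 1-5 Likert reaction data.
# The responses and the 3.5 follow-up threshold are hypothetical.

def summarize_reactions(responses, flag_below=3.5):
    """responses maps item text to a list of 1-5 ratings.
    Returns per-item means and the items flagged for follow-up."""
    means = {item: sum(r) / len(r) for item, r in responses.items()}
    flagged = [item for item, m in means.items() if m < flag_below]
    return means, flagged

responses = {
    "The instructor was prepared.": [5, 4, 5, 4, 5],
    "There was enough time to learn the course content.": [3, 2, 4, 3, 3],
    "Overall, I was satisfied with the course.": [4, 4, 5, 3, 4],
}
means, flagged = summarize_reactions(responses)
for item, m in sorted(means.items(), key=lambda kv: kv[1]):
    print(f"{m:.2f}  {item}")
print("Needs attention:", flagged)
```

A summary like this only identifies items worth investigating (e.g., too little time to learn); it says nothing by itself about whether learning or transfer occurred.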

TABLE 6.3 Sample Reaction Measure

Read each statement below. Indicate the extent to which you agree or disagree with each statement using the following scale:

1 = Strongly Disagree, 2 = Disagree, 3 = Neither, 4 = Agree, 5 = Strongly Agree

1. I had the knowledge and skills needed to learn in this course.
2. The facilities and equipment made it easy to learn.
3. The course met all of the stated objectives.
4. I clearly understood the course objectives.
5. The way the course was delivered was an effective way to learn.
6. The materials I received during the course were useful.
7. The course content was logically organized.
8. There was enough time to learn the course content.
9. I felt that the instructor wanted us to learn.
10. I was comfortable asking the instructor questions.
11. The instructor was prepared.
12. The instructor was knowledgeable about the course content.
13. I learned a lot from this course.
14. What I learned in this course is useful for my job.
15. The information I received about the course was accurate.
16. Overall, I was satisfied with the instructor.
17. Overall, I was satisfied with the course.

Learning or Cognitive Outcomes

Cognitive outcomes are used to determine the degree to which trainees are familiar with principles, facts, techniques, procedures, or processes emphasized in the training program. Cognitive outcomes measure what knowledge trainees learned in the program. Cognitive outcomes are level 2 (learning) criteria in Kirkpatrick's framework. Typically, pencil-and-paper tests are used to assess cognitive outcomes. Table 6.4 provides an example of items from a pencil-and-paper test used to measure trainees' knowledge of decision-making skills. These items help to measure whether a trainee knows how to make a decision (the process he or she would use). They do not help to determine if the trainee will actually use decision-making skills on the job.

Behavior and Skill-Based Outcomes

Skill-based outcomes are used to assess the level of technical or motor skills and behaviors. Skill-based outcomes include acquisition or learning of skills (skill learning) and use of skills on the job (skill transfer). Skill-based outcomes relate to Kirkpatrick's level 2 (learning) and level 3 (behavior). The extent to which trainees have learned skills can be evaluated by observing their performance in work samples such as simulators. Skill transfer is usually determined by observation. For example, a resident medical student may perform surgery while the surgeon carefully observes, giving advice and assistance as needed. Trainees may be asked to provide ratings of their own behavior or skills (self-ratings). Peers, managers, and subordinates may also be asked to rate trainees' behavior or skills based on their observations. Because research suggests that the use of only self-ratings likely results in an inaccurately positive assessment of skill or behavior transfer of training, it is recommended that skill or behavior ratings be collected from multiple perspectives (e.g., managers and subordinates or peers).16 Table 6.5 shows a sample rating form. This form was used as part of an evaluation of a training program developed to improve school principals' management skills.

TABLE 6.4 Sample Test Items Used to Measure Learning

For each question, check all that apply.

1. If my boss returned a piece of work to me and asked me to make changes on it, I would:
__ Prove to my boss that the work didn't need to be changed.
__ Do what the boss said, but show where changes are needed.
__ Make the changes without talking to my boss.
__ Request a transfer from the department.

2. If I were setting up a new process in my office, I would:
__ Do it on my own without asking for help.
__ Ask my boss for suggestions.
__ Ask the people who work for me for suggestions.
__ Discuss it with friends outside the company.

Source: Based on A. P. Carnevale, L. J. Gainer, and A. S. Meltzer, Workplace Basics Training Manual (San Francisco: Jossey-Bass, 1990): 8-12.

TABLE 6.5 Sample Rating Form Used to Measure Behavior

Rating task: Consider your opportunities over the past three months to observe and interact with the principal/assistant principal you are rating. Read the definition and behaviors associated with the skill. Then complete your ratings using the following scale:

1 = Always, 2 = Usually, 3 = Sometimes, 4 = Seldom, 5 = Never

I. Sensitivity: Ability to perceive the needs, concerns, and personal problems of others; tact in dealing with persons from different backgrounds; skill in resolving conflict; ability to deal effectively with people concerning emotional needs; knowing what information to communicate to whom.

To what extent in the past three months has the principal or assistant principal:
__ 1. Elicited perceptions, feelings, and concerns of others?
__ 2. Expressed verbal and nonverbal recognition of the feelings, needs, and concerns of others?
__ 3. Taken actions that anticipated the emotional effects of specific behaviors?
__ 4. Accurately reflected the point of view of others by restating it, applying it, or encouraging feedback?
__ 5. Communicated all information to others that they needed to perform their job?
__ 6. Diverted unnecessary conflict with others in problem situations?

II. Decisiveness: Ability to recognize when a decision is required and act quickly. (Disregard the quality of the decision.)

To what extent in the past three months has this individual:
__ 7. Recognized when a decision was required by determining the results if the decision was made or not made?
__ 8. Determined whether a short- or long-term solution was most appropriate to various situations encountered in the school?
__ 9. Considered decision alternatives?
__ 10. Made a timely decision based on available data?
__ 11. Stuck to decisions once they were made, resisting pressures from others?

Affective Outcomes

Affective outcomes include attitudes and motivation. Affective outcomes that might be collected in an evaluation include tolerance for diversity, motivation to learn, safety attitudes, and customer service orientation. Affective outcomes can be measured using surveys. Table 6.6 shows an example of questions on a survey used to measure career goals, plans, and interests. The specific attitude of interest depends on the program objectives. Affective outcomes relate to Kirkpatrick's level 2 (learning) or level 3 (behavior) depending on how they are evaluated. If trainees were asked about their attitudes on a survey, that would be considered a learning measure. For example, attitudes toward career goals and interests might be an appropriate outcome to use to evaluate training focusing on employees self-managing their careers.

TABLE 6.6 Example of Affective Outcomes: Career Goals, Plans, and Interests

1. At this time I have a definite career goal in mind.
2. I have a strategy for achieving my career goals.
3. My manager is aware of my career goals.
4. I have sought information regarding my specific areas of career interest from friends, colleagues, or company career sources.
5. I have initiated conversations concerning my career plans with my manager.

Results

Results are used to determine the training program's payoff for the company. Examples of results outcomes include increased production and reduced costs related to employee turnover, accidents, and equipment downtime, as well as improvements in product quality or customer service.17 Results outcomes are level 4 (results) criteria in Kirkpatrick's framework. For example, Kroger, the supermarket chain, hires more than 100,000 new employees each year who need to be trained.18 Kroger collected productivity data for an evaluation comparing cashiers who received computer-based training to those who were trained in the classroom and on the job. The measures of productivity included the rate of scanning grocery items, recognition of produce that had to be identified and weighed at the checkout, and the amount of time that store offices spent helping the cashiers deal with more complex transactions such as food stamps and checks.

LQ Management LLC, the parent company of La Quinta Inns and Suites, is responsible for training for the hotels.19 The company recently implemented a new sales strategy designed to generate the best available rate for each hotel location based on customer demand and occupancy rates. A training program involving an experiential game (Buddy's View) was used to engage staff in understanding how the new sales strategy would impact the business as well as to improve customer service. To evaluate the effectiveness of the program, the company collects business results (Kirkpatrick's level 4 criteria), specifically, percent changes in service quality and customers' intent to return before and after the program.

Return on Investment

Return on investment (ROI) refers to comparing the training's monetary benefits with the cost of the training. ROI is often referred to as level 5 evaluation (see Table 6.2). Training costs can be direct and indirect.20 Direct costs include salaries and benefits for all employees involved in training, including trainees, instructors, consultants, and employees who design the program; program material and supplies; equipment or classroom rentals or purchases; and travel costs. Indirect costs are not related directly to the design, development, or delivery of the training program. They include general office supplies, facilities, equipment, and related expenses; travel and expenses not directly billed to one program; training department management and staff salaries not related to any one program; and administrative and staff support salaries. Benefits are the value that the company gains from the training program.
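To make the arithmetic concrete, here is a minimal sketch with hypothetical figures. The cost categories follow the text; the dollar amounts, and the expression of ROI as net benefits divided by total costs (one common formulation), are illustrative assumptions:

```python
# Illustrative ROI arithmetic with hypothetical figures (not from the text).
direct_costs = {
    "trainee and instructor salaries and benefits": 42_000,
    "program materials and supplies": 6_500,
    "classroom rental": 3_000,
    "travel": 4_500,
}
indirect_costs = {
    "general office supplies and facilities": 2_000,
    "allocated training department overhead": 5_000,
}
benefits = 90_000  # hypothetical dollar value the company gains

total_cost = sum(direct_costs.values()) + sum(indirect_costs.values())
net_benefit = benefits - total_cost
roi_pct = net_benefit / total_cost * 100  # net benefits as a % of costs

print(f"Total cost:  ${total_cost:,}")   # $63,000
print(f"Net benefit: ${net_benefit:,}")  # $27,000
print(f"ROI:         {roi_pct:.0f}%")    # 43%
```

The hard part in practice is not this division but assigning a credible dollar value to the benefits, which is why results and ROI outcomes are collected far less often than reactions or learning.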

The Northwest Airlines technical operations training department includes 72 instructors who are responsible for training thousands of aircraft technicians and more than 10,000 outside vendors who work on maintaining the Northwest aircraft fleet.21 Each of the training instructors works with one type of aircraft, such as the Airbus 320. Most of the department's training is instructor-led in a classroom, but other instruction programs use a simulator or take place in an actual airplane.

By tracking department training data, which allowed for training evaluation, the technical operations department was able to demonstrate its worth by showing how its services contribute to the airline's business. For example, the technical operations department reduced the cost of training an individual technician by 16 percent; increased customer satisfaction through training; increased training productivity; made the case for upper management to provide financial resources for training; and improved postcourse evaluations, knowledge, and performance gains.

To achieve these results, the technical operations training department developed the Training Quality Index (TQI). The TQI is a computer application that collects data about training department performance, productivity, budget, and courses and allows for detailed analysis of the data. TQI tracks all department training data in five categories: effectiveness, quantity, perceptions, financial impact, and operational impact. The quality of training is included under the effectiveness category. For example, knowledge gain relates to the difference in trainees' pretraining and posttraining knowledge measured by exams. The system can provide performance reports that relate to budgets, the cost of training per student per day, and other costs of training. The measures that are collected are also linked to department goals, to department strategy, and, ultimately, to Northwest Airlines' overall strategy. Questions that were often asked before TQI was developed but couldn't easily be answered (such as how the cost of training can be justified, what the operational impact of training is, and what amount of training technicians have received) can now be answered through the TQI system. Training demand can be compared against passenger loads and the number of flying routes to determine the right number of trainers in the right locations to support business needs. These adjustments increase customer satisfaction and result in positive views of the training operations.

    DETERMINING WHETHER OUTCOMES ARE APPROPRIATE

An important issue in choosing outcomes is to determine whether they are appropriate. That is, are these outcomes the best ones to measure to determine whether the training program is effective? Appropriate training outcomes need to be relevant, reliable, discriminative, and practical.22

Relevance

Criteria relevance refers to the extent to which training outcomes are related to the learned capabilities emphasized in the training program. The learned capabilities required to succeed in the training program should be the same as those required to be successful on the job. The outcomes collected in training should be as similar as possible to what trainees learned in the program. That is, the outcomes need to be valid measures of learning. One way to ensure the relevancy of the outcomes is to choose outcomes based on the learning objectives for the program. Recall from Chapter 4 that the learning objectives show the expected action, the conditions under which the trainee is to perform, and the level or standard of performance.

Figure 6.2 shows two ways that training outcomes may lack relevance. Criterion contamination refers to the extent that training outcomes measure inappropriate capabilities or are affected by extraneous conditions. For example, if managers' evaluations of job performance are used as a training outcome, trainees may receive higher ratings of job performance simply because the managers know they attended the training program, believe the program is valuable, and therefore give high ratings to ensure that the training looks like it positively affects performance. Criteria may also be contaminated if the conditions under which the outcomes are measured vary from the learning environment. That is, trainees may be asked to perform their learned capabilities using equipment, time constraints, or physical working conditions that are not similar to those in the learning environment.

For example, trainees may be asked to demonstrate spreadsheet skills using a newer version of spreadsheet software than they used in the training program. This demonstration likely will result in no changes in their spreadsheet skills from pretraining levels. In this case, poor-quality training is not the cause of the lack of change in their spreadsheet skills. Trainees may have learned the necessary spreadsheet skills, but the environment for the evaluation differs substantially from the learning environment, so no change in skill level is observed.

Criteria may also be deficient. Criterion deficiency refers to the failure to measure training outcomes that were emphasized in the training objectives. For example, the objectives of a spreadsheet skills training program emphasize that trainees both understand the commands available on the spreadsheet (e.g., compute) and use the spreadsheet to calculate statistics using a data set. An evaluation design that uses only learning outcomes such as a test of knowledge of the purpose of keystrokes is deficient, because the evaluation does not measure outcomes that were included in the training objectives (e.g., use the spreadsheet to compute the mean and standard deviation of a set of data).

FIGURE 6.2 Criterion Deficiency, Relevance, and Contamination. [Diagram: the overlap between the outcomes identified by the needs assessment and included in the training objectives and the outcomes measured in the evaluation represents relevance; outcomes measured in the evaluation but unrelated to the training objectives represent contamination; outcomes related to the training objectives but left unmeasured represent deficiency.]

Reliability

Reliability refers to the degree to which outcomes can be measured consistently over time. For example, a trainer gives restaurant employees a written test measuring knowledge of safety standards to evaluate a safety training program they attended. The test is given before (pretraining) and after (posttraining) employees attend the program. A reliable test includes items for which the meaning or interpretation does not change over time. A reliable test allows the trainer to have confidence that any improvements in posttraining test scores from pretraining levels are the result of learning that occurred in the training program, not test characteristics (e.g., items are more understandable the second time) or the test environment (e.g., trainees performed better on the posttraining test because the classroom was more comfortable and quieter).
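One standard way to estimate this kind of consistency (an assumption added here, not a method the text prescribes) is a test-retest correlation: administer the same test twice to a group that receives no training in between and correlate the two sets of scores. A minimal sketch with hypothetical scores:

```python
# Illustrative test-retest reliability estimate (hypothetical scores,
# not from the text): Pearson correlation between two administrations
# of the same safety-knowledge test, with no training in between.
from statistics import mean

def pearson(x, y):
    """Sample Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

time1 = [70, 82, 65, 90, 75, 60, 88]  # first administration
time2 = [72, 80, 68, 92, 73, 63, 85]  # second administration, weeks later
print(f"Test-retest reliability: r = {pearson(time1, time2):.2f}")
```

A correlation near 1.0 suggests the test measures the same thing each time, so pre-to-post gains can more plausibly be attributed to learning rather than to the test itself.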

Discrimination

Discrimination refers to the degree to which trainees' performance on the outcome actually reflects true differences in performance. For example, a paper-and-pencil test that measures electricians' knowledge of electrical principles must detect true differences in trainees' knowledge of electrical principles. That is, the test should discriminate on the basis of trainees' knowledge of electrical principles. (People who score high on the test have a better understanding of the principles of electricity than do those who score low.)
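A common psychometric check of this property (again an added illustration, not a procedure from the text) is an item discrimination index: compare how often high and low overall scorers answer a given item correctly. The data below are hypothetical:

```python
# Illustrative item discrimination check (hypothetical data, not from
# the text). An item that high scorers pass and low scorers fail
# discriminates on knowledge; one that everyone passes (or fails) does not.

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """item_correct: 1/0 per trainee for one item; total_scores: test totals.
    Compares the top and bottom 'fraction' of trainees (27% is customary)."""
    n = max(1, round(len(total_scores) * fraction))
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low, high = order[:n], order[-n:]
    p_high = sum(item_correct[i] for i in high) / n
    p_low = sum(item_correct[i] for i in low) / n
    return p_high - p_low  # -1 to +1; higher means better discrimination

item = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]            # one test item, 10 trainees
totals = [88, 92, 55, 90, 60, 48, 85, 95, 52, 78]  # total test scores
print(f"Discrimination index: {discrimination_index(item, totals):.2f}")
```

Here the item is answered correctly by the top scorers and missed by the bottom scorers, so the index is high; an index near zero would flag an item that fails to separate trainees who know the material from those who do not.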

Practicality

Practicality refers to the ease with which the outcome measures can be collected. One reason companies give for not including learning, performance, and behavior outcomes in their evaluation of training programs is that collecting them is too burdensome. (It takes too much time and energy, which detracts from the business.) For example, in evaluating a sales training program, it may be impractical to ask customers to rate the salesperson's behavior because this would place too much of a time commitment on the customer (and probably damage future sales relationships).

    EVALUATION PRACTICES

Figure 6.3 shows outcomes used in training evaluation practices. Surveys of companies' evaluation practices indicate that reactions (an affective outcome) and cognitive outcomes are the most frequently used outcomes in training evaluation.23 Despite the less frequent use of behavior and results outcomes, research suggests that training can have a positive effect on these outcomes.24 Keep in mind that while most companies are conducting training evaluations, some surveys indicate that 20 percent of all companies are not!

FIGURE 6.3 Training Evaluation Practices. [Bar chart of the percentage of companies using each outcome: Reaction 91%; Cognitive 54%; Behavior 23%; Results 8%; Return on Investment (actual or projected) 3%.]

Note: Respondents were companies that participated in the ASTD Benchmarking Forum.
Source: Based on B. Sugrue and R. Rivera, 2005 State of the Industry (Alexandria, VA: American Society for Training and Development, 2005): 15.

Which Training Outcomes Should Be Collected?

From our discussion of evaluation outcomes and evaluation practices, you may have the mistaken impression that it is necessary to collect all five levels of outcomes to evaluate a training program. While collecting all five levels of outcomes is ideal, the training program objectives, which should be linked to the broader business strategy (as discussed in Chapter 2), determine which outcomes to collect. To ensure adequate training evaluation, companies should collect outcome measures related to both learning (levels 1 and 2) and transfer of training (levels 3, 4, or 5).

It is important to recognize the limitations of choosing to measure only reaction and cognitive outcomes. Consider the previous discussions of learning (Chapter 4) and transfer of training (Chapter 5). Remember that for training to be successful, learning and transfer of training must occur. Figure 6.4 shows the multiple objectives of training programs and their implications for choosing evaluation outcomes. Training programs usually have objectives related to both learning and transfer. That is, they want trainees to acquire knowledge and cognitive skill and also to demonstrate the use of the knowledge or strategy in their on-the-job behavior. As a result, to ensure an adequate training evaluation, companies must collect outcome measures related to both learning and transfer.

Ernst & Young's training function uses knowledge testing (level 2) for all of the company's e-learning courses, which account for 50 percent of training.25 New courses and programs use behavior transfer (level 3) and business results (level 4). Regardless of the program, the company's leaders are interested in whether the trainees feel that training has been a good use of their time and money and whether they would recommend it to other employees (level 1). The training function automatically tracks these outcomes. Managers use training and development to encourage observable behavior changes in employees that will result in business results such as client satisfaction and lower turnover, which they also monitor.

Note that outcome measures are not perfectly related to each other. That is, it is tempting to assume that satisfied trainees learn more and will apply their knowledge and skill to the job, resulting in behavior change and positive results for the company. However, research indicates that the relationships among reaction, cognitive, behavior, and results outcomes are small.26

FIGURE 6.4 Training Program Objectives and Their Implications for Evaluation. [Diagram mapping each training objective to the outcomes that evaluate it. Learning objective: Reactions (Did trainees like the program? Did the environment help learning? Was material meaningful?); Learning/Cognitive (pencil-and-paper tests); Behavior/Skill-Based (performance on a work sample). Transfer objective: Behavior/Skill-Based (ratings by peers or managers based on observation of behavior; performance on work equipment); Affective (trainees' motivation or job attitudes); Results (Did the company benefit through sales, quality, productivity, and reduced accidents and complaints?).]

    noe30344_ch06_215-256.qxd 29/8/09 1:51 AM Page 230Confirming Pages

  • Chapter 6 Training Evaluation 231

Which training outcome measure is best? The answer depends on the training objectives. For example, if the instructional objectives identified business-related outcomes such as increased customer service or product quality, then results outcomes should be included in the evaluation. As Figure 6.4 shows, both reaction and cognitive outcomes may affect learning. Reaction outcomes provide information regarding the extent to which the trainer, facilities, or learning environment may have hindered learning. Learning or cognitive outcomes directly measure the extent to which trainees have mastered training content. However, reaction and cognitive outcomes do not help determine how much trainees actually use the training content in their jobs. As much as possible, evaluation should include behavior or skill-based, affective, or results outcomes to determine the extent to which transfer of training has occurred, that is, whether training has influenced a change in behavior, skill, or attitude or has directly influenced objective measures related to company effectiveness (e.g., sales).

How long after training should outcomes be collected? There is no accepted standard for when the different training outcomes should be collected. In most cases, reactions are usually measured immediately after training.27 Learning, behavior, and results should be measured after sufficient time has elapsed to determine whether training has had an influence on these outcomes. Positive transfer of training is demonstrated when learning occurs and positive changes in skill-based, affective, or results outcomes are also observed. No transfer of training is demonstrated if learning occurs but no changes are observed in skill-based, affective, or results outcomes. Negative transfer is evident when learning occurs but skills, affective outcomes, or results are less than at pretraining levels. Results of evaluation studies that find no transfer or negative transfer suggest that the trainer and the manager need to investigate whether a good learning environment (e.g., opportunities for feedback and practice) was provided in the training program, whether trainees were motivated and able to learn, and whether the needs assessment correctly identified training needs. These three transfer outcomes amount to a simple decision rule, illustrated in the sketch below.
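As a rough illustration of that decision rule (the category definitions come from the text; the function, its inputs, and the example values are hypothetical):

```python
# Illustrative decision rule for interpreting transfer results.
# The three categories follow the text; everything else is assumed.

def classify_transfer(learning_occurred, pre_outcome, post_outcome):
    """Classify transfer given a skill-based, affective, or results
    measure taken before and after training."""
    if not learning_occurred:
        return "no learning: revisit learning environment and needs assessment"
    if post_outcome > pre_outcome:
        return "positive transfer"
    if post_outcome < pre_outcome:
        return "negative transfer"
    return "no transfer"

print(classify_transfer(True, pre_outcome=3.1, post_outcome=3.8))  # positive transfer
print(classify_transfer(True, pre_outcome=3.1, post_outcome=2.6))  # negative transfer
print(classify_transfer(True, pre_outcome=3.1, post_outcome=3.1))  # no transfer
```

In practice the pre/post comparison would also need to account for measurement error and the alternative explanations discussed in the next section, not just the raw direction of change.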

    EVALUATION DESIGNS

The design of the training evaluation determines the confidence that can be placed in the results, that is, how sure a company can be that training is either responsible for changes in evaluation outcomes or has failed to influence the outcomes. No evaluation design can ensure that the results of the evaluation are completely due to training. What the evaluator strives for is to use the most rigorous design possible (given the circumstances under which the evaluation occurs) to rule out alternative explanations for the results of the evaluation.

This discussion of evaluation designs begins by identifying these alternative explanations that the evaluator should attempt to control for. Next, various evaluation designs are compared. Finally, this section discusses practical circumstances that the trainer needs to consider in selecting an evaluation design.

Threats to Validity: Alternative Explanations for Evaluation Results

Table 6.7 presents threats to the validity of an evaluation. Threats to validity refer to factors that will lead an evaluator to question either (1) the believability of the study results or (2) the extent to which the evaluation results are generalizable to other groups of trainees and situations.28 The believability of study results refers to internal validity. The internal threats to validity relate to characteristics of the company (history), the outcome measures (instrumentation, testing), and the persons in the evaluation study (maturation, regression toward the mean, mortality, initial group differences). These characteristics can cause the evaluator to reach the wrong conclusions about training effectiveness. An evaluation study needs internal validity to provide confidence that the results of the evaluation (particularly if they are positive) are due to the training program and not to another factor. For example, consider a group of managers who have attended a communication skills training program. At the same time that they attend the program, it is announced that the company will be restructured. After the program, the managers may become better communicators simply because they are scared that otherwise they will lose their jobs. Perhaps no learning actually occurred in the training program!

Trainers are also interested in the generalizability of the study results to other groups and situations (i.e., they are interested in the external validity of the study). As shown in Table 6.7, threats to external validity relate to how study participants react to being included in the study and the effects of multiple types of training. Because evaluation usually does not involve all employees who have completed a program (or who may take training in the future), trainers want to be able to say that the training program will be effective in the future with similar groups.

TABLE 6.7 Threats to Validity

Threats to Internal Validity

Company
  History: An event occurs that produces changes in training outcomes.

Persons
  Maturation: Changes in training outcomes result from trainees' physical growth or emotional state.
  Mortality: Study participants drop out of the study (e.g., leave the company).
  Initial group differences: The training group differs from the comparison group on individual differences that influence outcomes (knowledge, skills, ability, behavior).

Outcome Measures
  Testing: Trainees are sensitized to perform well on posttest measures.
  Instrumentation: Trainees' interpretation of outcomes changes over the course of the evaluation.
  Regression toward the mean: High- and low-scoring trainees move toward the middle or average on the posttraining measure.

Threats to External Validity
  Reaction to pretest: Use of a test before training causes the trainee to pay attention to the material on the test.
  Reaction to evaluation: Being evaluated causes the trainee to try harder in the training program.
  Interaction of selection and training: Characteristics of the trainee influence program effectiveness.
  Interaction of methods: Results of trainees who received different training methods can be generalized only to trainees who receive the same training in the same order.

Source: Based on T. D. Cook, D. T. Campbell, and L. Peracchio, "Quasi-Experimentation," in Handbook of Industrial and Organizational Psychology, 2d ed., Vol. 1, eds. M. D. Dunnette and L. M. Hough (Palo Alto, CA: Consulting Psychologists Press, 1990): 491-576.



    Methods to Control for Threats to Validity

Because trainers often want to use evaluation study results as a basis for changing training programs or demonstrating that training does work (as a means to gain additional funding for training from those who control the training budget), it is important to minimize the threats to validity. There are three ways to minimize threats to validity: the use of pretests and posttests in evaluation designs, comparison groups, and random assignment.

Pretests and Posttests  One way to improve the internal validity of study results is to first establish a baseline or pretraining measure of the outcome. Another measure of the outcomes can be taken after training; this is referred to as a posttraining measure. A comparison of the posttraining and pretraining measures can indicate the degree to which trainees have changed as a result of training.

Use of Comparison Groups  Internal validity can be improved by using a control or comparison group. A comparison group refers to a group of employees who participate in the evaluation study but do not attend the training program. The comparison employees have personal characteristics (e.g., gender, education, age, tenure, skill level) as similar to the trainees as possible. Use of a comparison group in training evaluation helps to rule out the possibility that changes found in the outcome measures are due to factors other than training. The Hawthorne effect refers to employees in an evaluation study performing at a high level simply because of the attention they are receiving. Use of a comparison group helps to show that any effects observed are due specifically to the training rather than to the attention the trainees are receiving. Use of a comparison group also helps to control for the effects of history, testing, instrumentation, and maturation because both the comparison group and the training group are treated similarly, receive the same measures, and have the same amount of time to develop.

For example, consider an evaluation of a safety training program. Safe behaviors are measured before and after safety training for both trainees and a comparison group. If the level of safe behavior improves for the training group from pretraining levels but remains relatively the same for the comparison group at both pretraining and posttraining, the reasonable conclusion is that the observed differences in safe behaviors are due to the training and not some other factor, such as the attention given to both the trainees and the comparison group by asking them to participate in the study.
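The logic of this comparison amounts to a simple difference-in-differences calculation. The sketch below is a generic illustration of that arithmetic; the percentages are hypothetical, not data from any study:

# Hypothetical percentages of safe work behaviors for each group
trained_pre, trained_post = 70.0, 92.0        # training group
comparison_pre, comparison_post = 71.0, 73.0  # comparison group (no training)

# The change attributable to training is the trainees' change minus the
# change the comparison group showed anyway (history, attention, and so on).
training_effect = (trained_post - trained_pre) - (comparison_post - comparison_pre)
print(f"Estimated training effect: {training_effect:.1f} percentage points")  # 20.0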

Random Assignment  Random assignment refers to assigning employees to the training or comparison group on the basis of chance. That is, employees are assigned to the training program without consideration of individual differences (ability, motivation) or prior experiences. Random assignment helps to ensure that trainees are similar in individual differences such as age, gender, ability, and motivation. Because it is often impossible to identify and measure all the individual characteristics that might influence the outcome measures, random assignment ensures that these characteristics are equally distributed in the comparison group and the training group. Random assignment helps to reduce the effects of employees dropping out of the study (mortality) and differences between the training group and comparison group in ability, knowledge, skill, or other personal characteristics.
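In practice, random assignment is a one-line operation. A minimal sketch, using invented employee names and a fixed seed so that the illustration is reproducible:

import random

employees = ["Alvarez", "Brown", "Chen", "Davis", "Evans", "Foster", "Garcia", "Hughes"]

random.seed(42)            # fixed seed only so this example is repeatable
random.shuffle(employees)  # chance, not ability or motivation, determines order

midpoint = len(employees) // 2
training_group = employees[:midpoint]    # attends the program
comparison_group = employees[midpoint:]  # measured but not trained

print("Training:  ", training_group)
print("Comparison:", comparison_group)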

Keep in mind that random assignment is often impractical. Companies want to train employees who need training. Also, companies may be unwilling to provide a comparison group. One solution to this problem is to identify the factors in which the training and comparison groups differ and control for these factors in the analysis of the data (a statistical



procedure known as analysis of covariance). Another method is to determine trainees' characteristics after they are assigned and ensure that the comparison group includes employees with similar characteristics.
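For readers who want to see what this statistical control looks like, the following is a sketch of an analysis of covariance using the pandas and statsmodels libraries; the data frame, the scores, and the group labels are all hypothetical:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "group": ["trained"] * 4 + ["comparison"] * 4,
    "pre":  [60, 72, 55, 68, 61, 70, 57, 66],  # pretraining scores (covariate)
    "post": [78, 88, 74, 85, 63, 72, 58, 69],  # posttraining scores
})

# Regressing the posttest on group membership while holding the pretest
# constant adjusts the group comparison for initial group differences.
model = smf.ols("post ~ C(group) + pre", data=df).fit()
print(model.params)  # the C(group) coefficient is the adjusted training effect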

Types of Evaluation Designs

A number of different designs can be used to evaluate training programs.29 Table 6.8 compares each design on the basis of who is involved (trainees, comparison group), when measures are collected (pretraining, posttraining), the costs, the time it takes to conduct the evaluation, and the strength of the design for ruling out alternative explanations for the results. As shown in Table 6.8, research designs vary based on whether they include pretraining and posttraining measurement of outcomes and a comparison group. In general, designs that use pretraining and posttraining measures of outcomes and include a comparison group reduce the risk that alternative factors (other than the training itself) are responsible for the results of the evaluation. This increases the trainer's confidence in using the results to make decisions. Of course, the trade-off is that evaluations using these designs are more costly and take more time to conduct than do evaluations not using pretraining and posttraining measures or comparison groups.

Posttest Only

The posttest-only design refers to an evaluation design in which only posttraining outcomes are collected. This design can be strengthened by adding a comparison group (which helps to rule out alternative explanations for changes). The posttest-only design is appropriate when trainees (and the comparison group, if one is used) can be expected to have similar levels of knowledge, behavior, or results outcomes (e.g., same number of sales, equal awareness of how to close a sale) prior to training.

Consider the evaluation design that Mayo Clinic used to compare two methods for delivering new manager training.30 Mayo Clinic is one of the world's leading centers of medical education and research. Recently, Mayo has undergone considerable growth because a new hospital and clinic have been added in the Phoenix area (Mayo Clinic is also located in Rochester, Minnesota). In the process, employees who were not fully prepared were moved into management positions, which resulted in increased employee dissatisfaction and employee turnover rates. After a needs assessment indicated that employees were leaving because of dissatisfaction with management, Mayo decided to initiate a new training program designed to help the new managers improve their skills. There was some debate whether the training would be best administered in a classroom or one-on-one with a coach. Because of the cost implications of using coaching instead of classroom training (the costs were higher for coaching), Mayo decided to conduct an evaluation using a posttest comparison group design. Before training all managers, Mayo held three training sessions. No more than 75 managers were included in each session. Within each session, managers were divided into three groups: a group that received four days of classroom training, a group that received one-on-one training from a coach, and a group that received no training (a comparison group). Mayo collected reaction (did the trainees like the program?), learning, transfer, and results outcomes. The evaluation found no statistically significant differences in the effects of the coaching compared with classroom training and with no training. As a result, Mayo decided to rely on classroom courses for new managers and to consider coaching only for managers with critical and immediate job issues.

TABLE 6.8 Comparison of Evaluation Designs

Design                                  Groups                   Pretraining  Posttraining  Cost   Time   Strength
Posttest only                           Trainees                 No           Yes           Low    Low    Low
Pretest/posttest                        Trainees                 Yes          Yes           Low    Low    Med.
Posttest only with comparison group     Trainees and comparison  No           Yes           Med.   Med.   Med.
Pretest/posttest with comparison group  Trainees and comparison  Yes          Yes           Med.   Med.   High
Time series                             Trainees                 Yes          Yes, several  Med.   Med.   Med.
Time series with comparison group       Trainees and comparison  Yes          Yes, several  High   Med.   High
  and reversal
Solomon four-group                      Trainees A               Yes          Yes           High   High   High
                                        Trainees B               No           Yes
                                        Comparison A             Yes          Yes
                                        Comparison B             No           Yes

Pretest/Posttest

The pretest/posttest refers to an evaluation design in which both pretraining and posttraining outcome measures are collected. There is no comparison group. The lack of a comparison group makes it difficult to rule out the effects of business conditions or other factors as explanations for changes. This design is often used by companies that want to evaluate a training program but are uncomfortable with excluding certain employees or that intend to train only a small group of employees.

Pretest/Posttest with Comparison Group

The pretest/posttest with comparison group refers to an evaluation design that includes trainees and a comparison group. Pretraining and posttraining outcome measures are collected from both groups. If improvement is greater for the training group than for the comparison group, this finding provides evidence that training is responsible for the change. This type of design controls for most of the threats to validity.

Table 6.9 presents an example of a pretest/posttest comparison group design. This evaluation involved determining the relationship between three conditions or treatments and learning, satisfaction, and use of computer skills.31 The three conditions or treatments (types of computer training) were behavior modeling, self-paced study, and lecture. A comparison group was also included in the study. Behavior modeling involved watching a video showing a model performing the key behaviors necessary to complete a task; in this case, the task involved procedures on the computer. (Behavior modeling is discussed in detail in Chapter 7.)

Forty trainees were included in each condition. Measures of learning included a test consisting of 11 items designed to measure information that trainees needed to know to operate the computer system (e.g., "Does formatting destroy all data on the disk?"). Also, trainees' comprehension of computer procedures (procedural comprehension) was measured by presenting trainees with scenarios on the computer screens and asking them what



would appear next on the screen. Use of computer skills (a skill-based learning outcome) was measured by asking trainees to complete six computer tasks (e.g., changing directories). Satisfaction with the program (reaction) was measured by six items (e.g., "I would recommend this program to others").

As shown in Table 6.9, measures of learning and skills were collected from the trainees prior to attending the program (pretraining). Measures of learning and skills were also collected immediately after training (posttraining time 1) and four weeks after training (posttraining time 2). The satisfaction measure was collected immediately following training.

The posttraining time 2 measures collected in this study help to determine the occurrence of training transfer and retention of the information and skills. That is, immediately following training, trainees may have appeared to learn and acquire skills related to computer training. Collection of the posttraining measures four weeks after training provides information about trainees' level of retention of the skills and knowledge.

Statistical procedures known as analysis of variance and analysis of covariance were used to test for differences between pretraining measures and posttraining measures for each condition. Also, differences between each of the training conditions and the comparison group were analyzed. These procedures determine whether differences between the groups are large enough to conclude with a high degree of confidence that the differences were caused by training rather than by chance fluctuations in trainees' scores on the measures.
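As a rough illustration of the kind of test involved (not the study's actual analysis), the sketch below runs a one-way analysis of variance on hypothetical posttraining scores using the scipy library:

from scipy import stats

# Hypothetical posttraining skill scores for each condition
behavior_modeling = [88, 91, 85, 90, 87]
self_paced        = [80, 78, 83, 81, 79]
lecture           = [82, 84, 80, 85, 81]
comparison        = [70, 73, 69, 72, 71]

f_stat, p_value = stats.f_oneway(behavior_modeling, self_paced, lecture, comparison)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value indicates the between-group differences are too large to be
# explained by chance fluctuations in trainees' scores.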

Time Series

Time series refers to an evaluation design in which training outcomes are collected at periodic intervals both before and after training. (In the other evaluation designs discussed here, training outcomes are collected only once after and perhaps once before training.) The strength of this design can be improved by using reversal, which refers to a time period in which participants no longer receive the training intervention. A comparison group can also be used with a time series design. One advantage of the time series design is that it allows an analysis of the stability of training outcomes over time. Another advantage is that using both reversal and a comparison group helps to rule out alternative explanations for the evaluation results. The time series design is frequently used to evaluate training programs that focus on improving readily observable outcomes (such as accident rates, productivity, and absenteeism) that vary over time.

Table 6.10 shows a time series design that was used to evaluate how much a training program improved the number of safe work behaviors in a food manufacturing plant.32 This plant was experiencing an accident rate similar to that of the mining industry, the most dangerous area of work. Employees were engaging in unsafe behaviors such as putting their hands into conveyors to unjam them (resulting in crushed limbs).

TABLE 6.9 Example of a Pretest/Posttest Comparison Group Design

                          Pretraining  Training  Posttraining Time 1  Posttraining Time 2
Lecture                   Yes          Yes       Yes                  Yes
Self-paced study          Yes          Yes       Yes                  Yes
Behavior modeling         Yes          Yes       Yes                  Yes
No training (comparison)  Yes          No        Yes                  Yes

Source: Based on S. J. Simon and J. M. Werner, "Computer Training through Behavior Modeling, Self-Paced, and Instructional Approaches: A Field Experiment," Journal of Applied Psychology 81 (1996): 648-59.



To improve safety, the company developed a training program that taught employees safe behaviors, provided them with incentives for safe behaviors, and encouraged them to monitor their own behavior. To evaluate the program, the design included a comparison group (the Makeup Department) and a trained group (the Wrapping Department). The Makeup Department is responsible for measuring and mixing ingredients, preparing the dough, placing the dough in the oven and removing it when it is cooked, and packaging the finished product. The Wrapping Department is responsible for bagging, sealing, and labeling the packaging and stacking it on skids for shipping. Outcomes included observations of safe work behaviors. These observations were taken over a 25-week period.

The baseline shows the percentage of safe acts prior to introduction of the safety training program. Training directed at increasing the number of safe behaviors was introduced after approximately five weeks (20 observation sessions) in the Wrapping Department and 10 weeks (50 observation sessions) in the Makeup Department. Training was withdrawn from the Wrapping and Makeup Departments after approximately 62 observation sessions. The withdrawal of training resulted in a reduction of the work incidents performed safely (to pretraining levels). As shown, the number of safe acts observed varied across the observation period for both groups. However, the number of safe behaviors increased after the training program was conducted for the trained group (Wrapping Department). The level of safe acts remained stable across the observation period. (See the intervention period.) When the Makeup Department received training (at 10 weeks, or after 50 observations), a similar increase in the percentage of safe behaviors was observed.
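A simple way to summarize a time series evaluation is to average the outcome within each phase. The sketch below does so for one department using hypothetical weekly percentages; the phase boundaries are invented for illustration:

# Hypothetical weekly percentages of incidents performed safely
weekly_safe_pct = [72, 70, 74, 71, 73,                      # baseline
                   85, 88, 90, 91, 89, 92, 90, 91, 93, 92,  # intervention (training)
                   76, 74, 73, 75, 72]                      # reversal (training withdrawn)

phases = {
    "baseline": weekly_safe_pct[:5],
    "intervention": weekly_safe_pct[5:15],
    "reversal": weekly_safe_pct[15:],
}

for phase, values in phases.items():
    print(f"{phase:>12}: mean {sum(values) / len(values):.1f}% safe")

# A rise during the intervention and a drop back toward baseline during the
# reversal support the conclusion that training produced the change.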

Solomon Four-Group

The Solomon four-group design combines the pretest/posttest comparison group and the posttest-only control group designs. In the Solomon four-group design, a training group and a comparison group are measured on the outcomes both before and after training.

TABLE 6.10 Example of a Time Series Design

[Figure: two panels, one for the Wrapping Department and one for the Makeup Department, plotting the percentage of incidents performed safely (y-axis, 50 to 100 percent) against observation sessions (x-axis, 1 to 65) across the baseline, intervention, and reversal periods. Full title: "Percentage of Items Performed Safely by Employees in Two Departments of a Food Manufacturing Plant during a 25-Week Period of Time."]

Source: J. Komaki, K. D. Barwick, and L. R. Scott, "A Behavioral Approach to Occupational Safety: Pinpointing Safe Performance in a Food Manufacturing Plant," Journal of Applied Psychology 63 (1978). Copyright 1978 by the American Psychological Association. Adapted by permission.



Another training group and a control group are measured only after training. This design controls for most threats to internal and external validity.

An application of the Solomon four-group design is shown in Table 6.11. This design was used to compare the effects of training based on integrative learning (IL) with traditional (lecture-based) training in manufacturing resource planning. Manufacturing resource planning is a method for effectively planning, coordinating, and integrating the use of all resources of a manufacturing company.33 The IL-based training differed from the traditional training in several ways. IL-based training sessions began with a series of activities intended to create a relaxed, positive environment for learning. The students were asked what manufacturing resource planning meant to them, and attempts were made to reaffirm their beliefs and unite the trainees around a common understanding of manufacturing resource planning. Students presented training material and participated in group discussions, games, stories, and poetry related to the manufacturing processes.

Because the company was interested in the effects of IL relative to traditional training, groups who received traditional training were used as the comparison group (rather than groups who received no training). A test of manufacturing resource planning (a knowledge test) and a reaction measure were used as outcomes. The study found that participants in the IL-based learning groups learned slightly less than participants in the traditional training groups. However, IL-group participants had much more positive reactions than did those in the traditional training program.

Considerations in Choosing an Evaluation Design

There is no one appropriate evaluation design. An evaluation design should be chosen based on an evaluation of the factors shown in Table 6.12. There are several reasons why no evaluation or a less rigorous evaluation design may be more appropriate than a more rigorous design that includes a comparison group, random assignment, or pretraining and posttraining measures. First, managers and trainers may be unwilling to devote the time and effort necessary to collect training outcomes. Second, managers or trainers may lack the expertise to conduct an evaluation study. Third, a company may view training as an investment from which it expects to receive little or no return. A more rigorous evaluation design (pretest/posttest with comparison group) should be considered if any of the following conditions are true:34

    1. The evaluation results can be used to change the program.

2. The training program is ongoing and has the potential to have an important influence on employees or customers.

    3. The training program involves multiple classes and a large number of trainees.

4. Cost justification for training is based on numerical indicators. (Here the company has a strong orientation toward evaluation.)

TABLE 6.11 Example of a Solomon Four-Group Design

          Pretest  Training     Posttest
Group 1   Yes      IL-based     Yes
Group 2   Yes      Traditional  Yes
Group 3   No       IL-based     Yes
Group 4   No       Traditional  Yes

Source: Based on R. D. Bretz and R. E. Thompsett, "Comparing Traditional and Integrative Learning Methods in Organizational Training Programs," Journal of Applied Psychology 77 (1992): 941-51.



5. Trainers or others in the company have the expertise (or the budget to purchase expertise from outside the company) to design the evaluation study and analyze the data collected.

    6. The cost of the training creates a need to show that it works.

7. There is sufficient time for conducting an evaluation. Here, information regarding training effectiveness is not needed immediately.

8. There is interest in measuring change (in knowledge, behavior, skill, etc.) from pretraining levels or in comparing two or more different programs.

For example, if the company is interested in determining how much employees' communication skills have changed as a result of a training program, a pretest/posttest comparison group design is necessary. Trainees should be randomly assigned to training and no-training conditions. These evaluation design features offer a high degree of confidence that any communication skill change is the result of participation in the training program.35 This type of evaluation design is also necessary if the company wants to compare the effectiveness of two training programs.

Evaluation designs without pretests or comparison groups are most appropriate in situations in which the company is interested in identifying whether a specific level of performance has been achieved. (For example, are employees who participated in training able to adequately communicate their ideas?) In these situations, companies are not interested in determining how much change has occurred but rather in whether the trainees have achieved a certain proficiency level.

One company's evaluation strategy for a training course delivered to the company's tax professionals shows how company norms regarding evaluation and the purpose of training influence the type of evaluation design chosen.36 This accounting firm views training as an effective method for developing human resources. Training is expected to provide a good return on investment. The company used a combination of affective, cognitive, behavior, and results criteria to evaluate a five-week course designed to prepare tax professionals to understand state and local tax law. The course involved two weeks of self-study and three weeks of classroom work. A pretest/posttest comparison design was used. Before they took the course, trainees were tested to determine their knowledge of state and local tax laws,

TABLE 6.12 Factors That Influence the Type of Evaluation Design

  Change potential: Can the program be modified?
  Importance: Does ineffective training affect customer service, safety, product development, or relationships among employees?
  Scale: How many trainees are involved?
  Purpose of training: Is training conducted for learning, results, or both?
  Organization culture: Is demonstrating results part of company norms and expectations?
  Expertise: Can a complex study be analyzed?
  Cost: Is evaluation too expensive?
  Time frame: When is the information needed?

Source: Based on S. I. Tannenbaum and S. B. Woods, "Determining a Strategy for Evaluating Training: Operating within Organizational Constraints," Human Resource Planning 15 (1992): 63-81.



and they completed a survey designed to assess their self-confidence in preparing accurate tax returns. The evaluators also identified the trainees' (accountants') billable hours related to calculating state and local tax returns and the revenue generated by the activity. After the course, evaluators again identified billable hours and surveyed trainees' self-confidence. The results of the evaluation indicated that the accountants were spending more time doing state and local tax work than before training. Also, the trained accountants produced more revenue doing state and local tax work than did accountants who had not yet received the training (comparison group). There was also a significant improvement in the accountants' confidence following training, and they were more willing to promote their expertise in state and local tax preparation. Finally, after 15 months, the revenue gained by the company more than offset the cost of training. On average, the increase in revenue for the trained tax accountants was more than 10 percent.

    DETERMINING RETURN ON INVESTMENT

Return on investment (ROI) is an important training outcome. This section discusses how to calculate ROI through a cost-benefit analysis. Cost-benefit analysis in this situation is the process of determining the economic benefits of a training program using accounting methods that look at training costs and benefits. Training cost information is important for several reasons:37

    1. To understand total expenditures for training, including direct and indirect costs.

    2. To compare the costs of alternative training programs.

3. To evaluate the proportion of money spent on training development, administration, and evaluation as well as to compare monies spent on training for different groups of employees (exempt versus nonexempt, for example).

    4. To control costs.

There is an increased interest in measuring the ROI of training and development programs because of the need to show the results of these programs to justify funding and to increase the status of the training and development function.38 Most trainers and managers believe that training and development activities provide value, such as productivity or customer service improvements, cost reductions, time savings, and decreased employee turnover. ROI provides evidence of the economic value provided by training and development programs. However, it is important to keep in mind that ROI is not a substitute for other program outcomes that provide data regarding the success of a program based on trainees' reactions and whether learning and training transfer have occurred.

Consider the use of ROI at LensCrafters. LensCrafters brings the eye doctor, a wide selection of frames and lenses, and the lens-making laboratory together in one location.39 LensCrafters has convenient locations and hours of operation, and it has the ability to make eyewear on-site. Emphasizing customer service, the company offers a one-stop location and promises to make eyewear in one hour. Dave Palm, a training professional at LensCrafters, received a call from a concerned regional manager. He told Palm that although company executives knew that LensCrafters employees had to be well trained to design eyewear and that employees were satisfied with the training, the executives wanted



to know whether the money they were investing in training was providing any return. Palm decided to partner with the operations people to identify how to link training to measurable outcomes such as profitability, quality, and sales. After conversations with the operations employees, he decided to link training to waste from mistakes in quality and remakes, store performance and sales, and customer satisfaction. He chose two geographic regions for the evaluation study and compared the results from these two regions with results from one that had not yet received the training. Palm found that all stores in the two regions that received training reduced waste, increased sales, and improved customer satisfaction. As a result, LensCrafters allotted its training department more financial resources ($10 million a year for training program development and administration) than any other optical retail competitor. Because the training department demonstrated that it does contribute to business operations, it also received money to develop a multimedia-based training system.

The process of determining ROI begins with an understanding of the objectives of the training program.40 Plans are developed for collecting data related to measuring these objectives. The next step is to isolate, if possible, the effects of training from other factors that might influence the data. Last, the data are converted to a monetary value and ROI is calculated. Choosing evaluation outcomes and designing an evaluation that helps isolate the effects of training were explained earlier in the chapter. The following sections discuss how to determine costs and benefits and provide examples of cost-benefit analysis and ROI calculations.
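Expressed as arithmetic, one common formulation computes ROI as net benefits relative to costs. The sketch below is a generic illustration with hypothetical dollar figures, not a procedure prescribed by the chapter:

def roi_percent(monetary_benefits: float, training_costs: float) -> float:
    """Return ROI as a percentage: net benefits relative to costs.

    Assumes the benefits have already been isolated from non-training
    factors and converted to a monetary value.
    """
    return (monetary_benefits - training_costs) / training_costs * 100

# Hypothetical program: $150,000 in benefits against $50,000 in costs
print(f"ROI = {roi_percent(150_000, 50_000):.0f}%")  # ROI = 200%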

Because ROI analysis can be costly, it should be limited to certain training programs. ROI analysis is best for training programs that are focused on an operational issue (measurable, identifiable outcomes are available), are linked to a companywide strategy (e.g., better customer service), are expensive, are highly visible, have management interest, are attended by many employees, and are permanent.41 At Deloitte & Touche, the tax and auditing firm, managers don't require analysis of ROI for many training programs.42

Because knowledge is the product at Deloitte & Touche, investment in training is seen as an important part of the business. Deloitte & Touche makes money through the billable hours its consultants provide to clients. Training helps prepare the consultants to serve clients' needs. ROI is primarily calculated for courses or programs that are new or expensive. For example, ROI analysis was conducted for a simulation designed to help new employees learn more quickly how to service clients. At Deloitte & Touche, use of the simulation has resulted in new hires being 30 to 40 percent faster in serving clients, resulting in an ROI of over $66 billion after subtracting program costs.

Determining Costs

One method for comparing the costs of alternative training programs is the resource requirements model.43 The resource requirements model compares equipment, facilities, personnel, and materials costs across different stages of the training process (needs assessment, development, training design, implementation, and evaluation). Use of the resource requirements model can help determine overall differences in costs among training programs. Also, costs incurred at different stages of the training process can be compared across programs.

Accounting can also be used to calculate costs.44 Seven categories of cost sources are costs related to program development or purchase, instructional materials for trainers



and trainees, equipment and hardware, facilities, travel and lodging, salary of the trainer and support staff, and the cost of lost productivity while trainees attend the program (or the cost of temporary employees who replace the trainees while they are at training). This method also identifies when the costs are incurred. One-time costs include those related to needs assessment and program development. Costs per offering relate to training site rental fees, trainer salaries, and other costs that are realized every time the program is offered. Costs per trainee include meals, materials, and lost productivity or expenses incurred to replace the trainees while they attend training.
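Under these categories, a total program cost can be tallied as one-time costs, plus per-offering costs times the number of offerings, plus per-trainee costs times the number of trainees. A minimal sketch with hypothetical figures:

# Hypothetical cost figures, grouped by when they are incurred
one_time_costs = 20_000    # needs assessment and program development
cost_per_offering = 5_000  # site rental and trainer salary for each session
cost_per_trainee = 400     # meals, materials, lost productivity per person

offerings = 6
trainees = 120

total_cost = one_time_costs + cost_per_offering * offerings + cost_per_trainee * trainees
print(f"Total training cost: ${total_cost:,}")  # Total training cost: $98,000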

Determining Benefits

To identify the potential benefits of training, the company must review the original reasons that the training was conducted. For example, training may have been conducted to reduce production costs or overtime costs or to increase the amount of repeat business. A number of methods may be helpful in identifying the benefits of training:

1. Technical, academic, and practitioner literature summarizes the benefits that have been shown to relate to a specific training program.

    2. P