
Decision Sciences Journal of Innovative Education, Volume 6, Number 2, July 2008. Printed in the U.S.A.

    EMPIRICAL RESEARCH

An Evaluation of Factors Regarding Students' Assessment of Faculty in a Business School

Richard L. Peterson, Mark L. Berenson, Ram B. Misra, and David J. Radosevich

Department of Management & Information Systems, Montclair State University, Montclair, NJ 07043, e-mail: [email protected], [email protected], [email protected], [email protected]

    ABSTRACT

Student faculty ratings are used at most institutions of higher learning for three important reasons. First, the ratings provide direct feedback to the faculty, and this enables faculty to adjust their teaching styles. Second, the ratings provide the administration with information intended to assist in guiding and mentoring faculty toward more effective pedagogical performance in the classroom. Third, the ratings also provide the administration with information to be used in the reappointment, tenure, and promotion processes, as well as for assignment of salary range adjustments and teaching awards. To be of real value, however, all of this is predicated on the use of a valid and reliable faculty-rating instrument along with a system designed to provide both the faculty and the administration with norming reports that allow for appropriate comparisons of ratings. This article reports such a study conducted within a large department of a business school and recommends that the process used be adapted by other business school departments and other academic units across the university and at other universities to ensure a more universally appropriate usage of students' ratings.

Subject Areas: Norming Report, Rating Instrument, and Student Faculty Ratings.

    INTRODUCTION

Brightman (2005) discusses the importance of mentoring faculty to improve teaching and student learning. He contends that, to establish a good mentoring system aimed at assisting in retaining junior faculty and preventing burnout by senior faculty, two factors must be in place: (1) a valid and reliable student evaluation instrument and (2) a meaningful norming report. He opines that many institutions employ rating instruments that lack validity and reliability and, even worse, do not even display a norming report. He states, and we most certainly concur, that "[i]t is unfair to compare a faculty member teaching a required core class (sophomore-level class in statistics) with another faculty member teaching a senior-level elective course or graduate course in decision support systems." He then outlines what must be found in a valid and reliable rating system and in a norming report.

Special thanks to I-Lin (Christy) Tu for her many contributions to this project and to Harvey Brightman for his suggestions in his invited article in this journal in 2005 that led us to rethink and expand our research efforts.

Corresponding author.

Using Brightman's suggestions as a guide, our objective in initiating this study is to better assess the results of our student faculty evaluations and use this feedback appropriately when providing advice and guidance to faculty and providing appropriate input to the dean and provost.

The remainder of this article is organized to include a literature review, the background for the study, a presentation of the research methodology and a conceptual framework articulating the hypotheses examined, a description of the data set, a presentation of key findings, the development of a set of norming reports, a discussion indicating the scope and limitations of the study and their impact on the key findings, and a summary and list of recommendations for further research.

    LITERATURE REVIEW

Before business school faculty became interested in research related to students' assessment of faculty, other disciplines, particularly the social sciences, had examined this topic. While social scientists have approached their research from a motivational point of view, research from business school faculty has treated students as consumers (Lawrence & Sharma, 2002; Singh, 2002). Nevertheless, as expected, the factors that impact students' ratings of professors' classroom performance are similar. A broad search in the JSTOR database yielded over 5,200 listings. A narrower search in the JSTOR database using the string "Student Ratings of College Teaching" yielded 85 listings, and a similar search in the ABI/INFORMS database yielded 34 listings. Roughly half of these overlapped with the first search.

The current literature can be grouped into four categories: (1) the factors influencing students' ratings; (2) the use of students' ratings by the administration (Costin, Greenough, & Menges, 1971) along with alternate means such as peer evaluation and administrative ratings (Greenwood & Ramagli, 1980); (3) the relationship between student learning and ratings; and (4) the impact of students' ratings on faculty behavior in conducting the course. Because the focus of this article is in understanding the factors that might be influencing the students' ratings (Category 1) and how these ratings might be used by administrators (Category 2), the literature review will mainly deal with these two categories.

Costin et al. (1971) presented a comprehensive review of research related to the evaluation of college teaching by students. The topics covered included reliability of student ratings. Their review covers close to 120 research articles reporting the findings of studies involving almost every conceivable factor that might be considered to have a role in determining students' ratings of the faculty. In presenting these factors, we propose to put them in two categories: teaching-related factors and non-teaching-related factors. This distinction is important as the non-teaching factors cast enough doubt on the ratings to put the effect of teaching-related factors on ratings in question. For example, bad teachers may take solace in the fact that their ratings are low because they refuse to dilute their courses by skipping hard material or because they are tough graders. This line of thinking gains credence as some research indicates such phenomena (Everett, 1977; Hamilton, 1980; Lima, 1981).

Teaching-Related Factors

Early research efforts concerned student perceptions of qualities displayed by the most effective teachers (Crawford & Bradshaw, 1968; Costin, 1968; French, 1957; Pohlmann, 1975). Such teachers demonstrated:

• Thorough knowledge of the subject.
• Genuine interest in teaching the material and ability to create interest in students for the subject.
• Well-planned and organized class sessions.
• Clear and understandable explanations, using relevant examples.
• Flexibility and concern for students' needs.

Although there is no debate about these factors being good measures of teaching, it is sometimes questioned whether students can be objective enough to give honest ratings without being influenced by other, non-teaching-related factors.

Non-Teaching Factors

The numerous non-teaching factors that have been studied by researchers are classified here into five categories: (1) those related to grading (Voeks & French, 1960); (2) those related to the attributes of students; (3) those related to the attributes of teachers; (4) those related to the attributes of the course; and (5) those related to environmental factors, such as class size.

It is a long-standing belief among many professors that "easy does it." Zangenehzadeh (1988) concluded that student ratings of faculty have resulted in changing teachers' grading behavior. Bacon and Novotny (2002) found a positive correlation between the perception of easiness and the ratings given by the students for hypothetical teachers (e.g., "what if you had a teacher who was very easy in grading . . .") at the undergraduate level but not at the graduate level. However, other research studies (Costin et al., 1971) do not support that finding.

Factors related to student attributes include:

• Class designation, freshman through senior (Costin et al., 1971). Note that seniors gave higher ratings than did less experienced freshmen.
• Gender (Myers & Dugan, 1996; Ward, Cook, Ward, & Wilson, 1999; Wilson & Doyle, 1976). Note that, except for particular situations, gender interactions did not typically impact the ratings.
• Grade expectations (Bejar & Doyle, 1976; Hamilton, 1980). Note that prior experience led students to expect instructor traits to covary in specific ways, but that these expectations had little, if any, bearing on the evaluations given. A related factor is the difference between the expected grade and the target grade.
• Cultural background: a measurable difference among U.S. and Eastern-country students in their perception of teachers' classroom instructional behavior (Burba, Petrosko, & Boyle, 2001).
• Major (Costin et al., 1971). Note that students had a higher level of interest in courses in their major, and this was reflected in higher ratings.
• Course level difficulty: "soft" course versus "hard" course (Everett, 1977). Note that students favored professors who emphasized lower-level cognitive material.
• Student performance (Frey, Dale, Leonard, & Beatty, 1975). Note that there was a high positive correlation between the students' ratings and their educational achievement (i.e., higher grades).
• Students' knowledge about the disposition (i.e., administrative use, course improvement, etc.) of the rating results (Driscoll & Goodwin, 1979). Note that students who were told of the use of their evaluations gave higher ratings compared to those who were not told.
• Students' opinions about the value of student ratings (Small & Mahon, 2005). Note that there was a strong positive relationship between the quality of ratings and students' perceptions of the value given to their ratings by the teacher and by the administration.

Among the teachers' attributes influencing the ratings are the teacher's position or rank and expectations (demanding or non-demanding), as well as experience, training, communication skills, and age (Blackburn & Lawrence, 1986).

The influence of course characteristics on students' ratings was studied by Aigner and Thum (1986). More recent research indicates that demanding courses can result in lower ratings (Paswan & Young, 2002).

Among the environmental factors are class size (Crittenden, Norr, & LaBailly, 1975; Hamilton, 1980) and the time period for obtaining the ratings from the students: at the middle of the semester, near the end of the semester, well before the final examination, right before the final examination, or after the final examination. Students in larger classes gave lower ratings. Frey (1973) compared the final examination performances of seven sections of introductory calculus with the student ratings of the instructors. To obtain the ratings, half of the students were contacted during the last week of classes and the other half during the first week of the subsequent semester. Although both sets of ratings correlated positively with students' final examination performance, the ratings made during the first week of the subsequent semester showed a stronger relationship with student performance.

    BACKGROUND

The central objective of our research is to identify variables that provide a valid rationale for assigning faculty into norming groups for the purpose of comparing one member of that group to all other members of that group. Given the objective and the data set we had available, we selected the following six variables to explore:

Semester Effect: Course offerings for fall and spring semesters, although overlapping, are certainly not identical, and ratings for each semester need to be compared. The literature, especially Hamilton (1980), suggests only that intrasemester timing of the data collection has an impact on ratings. It could be that semester ratings differ due to such influences as an increased student focus on graduation, summer employment, between-semester holidays, and the like.

Course Session Effect: Institutions of higher education, especially commuter schools, have different populations of students during the day and evening. Day students tend to be full-time students who are employed part-time, whereas evening students may work as much as 40-50 hours per week. Differences in student performance or interest in the subject matter may vary between these two groups.

Faculty Type Effect: There is not an identical mix of full-time and adjunct faculty across the day and evening classes. Assuming that academically qualified faculty ground their knowledge of, interest in, and ability to teach a subject on a theoretical basis compared with the practical grounding of professionally qualified (adjunct) faculty, a difference in ratings between the two groups of faculty may appear.

Course Level Effect: The level (100, 200, etc.) of the course where the rating occurs follows the notion from the literature (Everett, 1977) that students favored professors who emphasized lower-level cognitive material. Assuming that faculty have greater expectations in higher-level courses, faculty teaching at, say, the 300 (i.e., junior) level may be more like other faculty teaching at that level and not like faculty teaching courses at other levels.

Course Focus Effect: Everett (1977) explored the influence of "soft" versus "hard" courses, whereas Costin et al. (1971) suggested that a student's major influences the student's ratings. "Soft" and "hard" are relative and dependent on the perception of the rater. It might be more informative to separately measure the focus of the course (quantitative, systems, or theory/practice) and then the reason the student is in the course.

Course Type: In most curricula the listed courses are required of all students in a major, are required of students in a minor or concentration, or they are electives. It seems reasonable to suppose a difference in ratings depending on the reason the students are in the class.

In the remainder of this article we explore the influence of these six effects on faculty ratings.

    RESEARCH METHODOLOGY

Faculty members at Montclair State University typically teach three courses per semester and may be evaluated by students in each course. Participation in the student faculty evaluation process is contractual (required) for all untenured faculty, adjuncts, and tenured faculty intending to be considered for promotion or salary range adjustment. Other senior tenured faculty members are asked to participate on a voluntary basis, and the majority of them do.

The School of Business at Montclair State University uses a student faculty evaluation instrument for each course that has 10 items measured on a five-point Likert scale. Faculty members distribute the evaluation forms in any class session they choose in the last 2 weeks of the semester. Final examinations at Montclair State University are scheduled during a special week following the last class session.


The faculty receive their average ratings on these 10 items, along with the overall simple average of all 10 items. This feedback from the student evaluations is distributed to the faculty early in the following semester, well after the grading for the current semester is over.

We first formulated key research questions regarding the student evaluation instrument. Once we developed the necessary database to be used, we conducted exploratory analyses of class size. Next, we assessed the validity and reliability of the student evaluation instrument. Subsequently, we performed a preliminary evaluation of correlations between students' ratings of faculty and grades assigned by faculty to students. Next, we conducted exploratory analyses of the distribution of the class grade point average (GPA, calculated by converting letter grades into numerical grades) assigned by faculty and the distribution of ratings of faculty assigned by students. This was followed by an examination of class GPA assigned by faculty who participated in the student rating process versus those who did not request student ratings. We then analyzed six primary research questions pertinent to the development of a set of norming reports that concern possible differences in students' ratings of faculty by semester (i.e., fall versus spring), by course session (i.e., day versus evening), by faculty type (i.e., full-time versus adjunct), by course level (i.e., 100-200, 300, 400, and 500), by course focus (i.e., courses with either a quantitative, systems, or theory/practice emphasis), and by course type (i.e., required courses versus student-chosen courses). These findings were compared and contrasted with those obtained through an analysis of other, secondary questions pertaining to faculty grading of students. Finally, we developed and implemented a set of customized norming reports based on the analysis of specific research questions.

    PRIMARY RESEARCH QUESTIONS

As Brightman (2005) emphasized, in addition to a valid and reliable rating instrument, meaningful norming reports must be available (Cohen, 1980; Frey, 1973) for administrators and supervisors to be able to properly assist faculty in improving pedagogical delivery.

Every professor has his or her own ideas of what prompts students to give good feedback, honest feedback, and bad feedback. This study intended to investigate several factors that may impact the feedback and use these results to develop a set of appropriate, customized norming reports. An analysis of the primary research questions provided below enables the development of the aforementioned set of norming reports.

• Semester Effect: Is there evidence of a fall versus spring effect in the ratings of faculty provided by the students?
• Course Session Effect: Is there evidence of a difference in the students' ratings of faculty in day versus evening classes?
• Faculty Type Effect: Is there evidence of a difference in the students' ratings between full-time versus adjunct faculty?
• Course Level Effect: Is there evidence of differences in the students' ratings across four business school course levels (100 or 200 = sophomore level or below, 300 = junior level, 400 = senior level, or 500 = graduate level)?
• Course Focus Effect: Is there evidence of a course emphasis effect in the students' ratings across courses with a quantitative focus, a systems focus, or a theory/practice focus?
• Course Type Effect: Is there evidence of a difference in the students' ratings of required core business school courses versus courses taken (both required and elective) as part of a selected discipline concentration?

    SECONDARY RESEARCH QUESTIONS

In addition to the analysis of the above primary research questions, which pertain to the student evaluations and directly impact the development of the norming reports, an analysis of secondary research questions of interest, which concern the grades faculty assign to the students, is given for completeness. In particular, the following questions are also addressed:

• Semester Effect: Is there evidence of a fall versus spring effect in the grades assigned by faculty to the students?
• Course Session Effect: Is there evidence of a difference in the grades assigned by faculty to students taking day versus evening classes?
• Faculty Type Effect: Is there evidence of a difference in the grades assigned to students by full-time versus adjunct faculty?
• Course Level Effect: Is there evidence of differences in the grades assigned by faculty to students across four business school course levels (100 or 200 = sophomore level or below, 300 = junior level, 400 = senior level, or 500 = graduate level)?
• Course Focus Effect: Is there evidence of a course emphasis effect shown in the grades assigned by faculty to students across courses with a quantitative focus, a systems focus, or a theory/practice focus?
• Course Type Effect: Is there evidence of a difference in the grades assigned by faculty to students in core business school courses versus courses taken (both required and elective) as part of a selected discipline concentration?

    THE DATA

A data set containing the students' average ratings of all faculty participants in Management & Information Systems Department courses was created over four consecutive semesters (two full academic years). There were approximately 90 sections per semester that were offered (355 sections over 2 years).

Figure 1 provides the key components of the 10-item evaluation instrument utilized for obtaining students' responses on five-point Likert scales using letters A to E, which are then converted to the values 1-5 (A or 1 being the best rating score). A computer program compiles the responses in each class and provides the simple average rating on each of the 10 items, as well as an average of the 10 averages.


    Figure 1: Student evaluation instrument components.

Course Number / Semester / Year / Instructor's Name
Student-assigned ratings of instructor on each of ten items
(1 = Strongly Agree, 2 = Agree, 3 = Uncertain, 4 = Disagree, 5 = Strongly Disagree)

Items being rated:
Q1. Instructor demonstrates importance/relevance of subject
Q2. Instructor encourages critical thinking
Q3. Instructor has well-planned presentations
Q4. Instructor demonstrates enthusiasm in teaching the subject
Q5. Instructor provides clear explanations
Q6. Instructor encourages student participation/expression
Q7. Instructor is readily accessible to students
Q8. Instructor provides appropriate evaluation of student performance
Q9. Instructor should be recommended to a friend
Q10. Instructor delivers the course with a level of excellence one should expect
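The compilation described above is a simple averaging exercise. The following sketch (with hypothetical responses; the article does not publish its compiling program) produces the 10 item averages and the overall average for one class section.

    import numpy as np

    # Hypothetical responses for one class section: rows are students, columns are
    # items Q1..Q10, with letters A..E already converted to 1 (best) .. 5 (worst).
    responses = np.array([
        [1, 2, 1, 1, 2, 1, 2, 1, 1, 1],
        [2, 2, 3, 1, 2, 2, 2, 2, 2, 2],
        [1, 1, 1, 1, 1, 1, 2, 1, 1, 1],
    ])

    item_means = responses.mean(axis=0)  # average rating on each of the 10 items
    overall_mean = item_means.mean()     # the 11th summary figure: the average of the 10 averages
    print(np.round(item_means, 2), round(overall_mean, 2))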

The administration provides these 11 summary ratings to each faculty member for each course in which the student ratings are obtained. Averages closer to 1.0 are considered truly outstanding. Such a system is, of course, the reverse of what is used when faculty give grades to students. At almost all institutions of higher learning a grade of A is equated to 4.0, B to 3.0, and so on, with a grade of F equated to 0.0. Thus a student grade point average (GPA) closer to 4.0 is considered truly outstanding. Therefore, in the ideal situation where an excellent teacher motivates the students and encourages learning, which will result in enhanced student performance and higher course grades, it should be expected that the students will also give better ratings to this teacher, resulting in a significant correlation between the grades the faculty member awards the students and the ratings the faculty member receives from the students.

In addition to the student evaluations of the faculty, the data set was constructed by merging a file containing the grade distributions assigned by each faculty member in these corresponding sections. Moreover, the data set also includes a file containing the grade distributions assigned in department sections by faculty who did not participate in the student-rating program. In particular, the grade distribution assigned by the faculty (i.e., the numbers of A, B, C, D, and F grades) is recorded along with the total number of assigned grades, course name, number and section, a coded faculty identifier, and information about the level of the course, the type of course, whether the course was offered in the fall or spring semester, whether the course was offered in the day or evening, and whether the course was offered by a full-time or adjunct faculty member. Using the specific information on faculty grade distributions, a program was written to compute the class GPAs by assigning 4.0 points to the grade A, 3.0 points to the grade B, and so on down to 0.0 points for the grade F. This information is now part of each course record in the data set.
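The class-GPA computation described above can be sketched in a few lines; the grade counts below are hypothetical, and the article's own program is not published.

    GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

    def class_gpa(grade_counts: dict) -> float:
        """Convert a section's letter-grade distribution into its class GPA."""
        total_grades = sum(grade_counts.values())
        total_points = sum(GRADE_POINTS[g] * n for g, n in grade_counts.items())
        return total_points / total_grades

    # Example: 10 A's, 12 B's, 5 C's, 1 D, and 1 F give a class GPA of 3.0.
    print(class_gpa({"A": 10, "B": 12, "C": 5, "D": 1, "F": 1}))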


    EXPLORATORY AND PRELIMINARY FINDINGS

Selecting Class Sections for Inclusion in the Study: An Exploratory Analysis of Class Size

Figure 2 is a stem-and-leaf display of the number of students enrolled in each of the 355 class sections offered by the Management & Information Systems Department over four consecutive semesters.
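Figure 2 uses Tukey-style split stems, with L, T, F, S, and H holding leaves 0-1, 2-3, 4-5, 6-7, and 8-9, respectively. A rough sketch of how such a display can be generated from a list of enrollments follows (illustrative only; the authors presumably used a statistics package).

    from collections import defaultdict

    def stem_and_leaf(values):
        """Print a five-way split stem-and-leaf display with leaf unit 1:
        L holds leaves 0-1, T 2-3, F 4-5, S 6-7, H 8-9."""
        labels = "LTFSH"
        buckets = defaultdict(list)
        for v in sorted(int(v) for v in values):
            stem, leaf = divmod(v, 10)
            buckets[(stem, labels[leaf // 2])].append(str(leaf))
        stems = [stem for stem, _ in buckets]
        for stem in range(min(stems), max(stems) + 1):
            for label in labels:
                print(f"{stem}{label} {''.join(buckets.get((stem, label), []))}")

    stem_and_leaf([1, 1, 12, 17, 25, 25, 31, 34, 49])  # hypothetical enrollments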

As might be anticipated for a public university setting, the distribution of class size is slightly skewed left. Brightman (2005) had suggested that a trimmed mean be used to evaluate data sets of student evaluations with classes of 15 students or fewer, but it was thought here to evaluate the smaller class sections separately. Because Brightman did not provide a rationale for the class size cut point, we chose to take a more liberal position and exclude classes of 13 students or fewer. Although our decision may also be deemed arbitrary, the question that needed to be considered was how small the class size must be, relative to other class sizes, before it can be considered an outlier. Tukey (1977) suggested that in general any data value located a distance more than 1.5 times the interquartile range below the first quartile (Q1) can be considered an outside value or possible outlier (Velleman & Hoaglin, 1981), whereas proponents of total quality management flag as an outlier any data value located more than 3.0 standard deviations below the mean, when the observations are approximately normally distributed (Levine, Stephan, Krehbiel, & Berenson, 2008). For the distribution of class sizes presented in Figure 2, classes of size four [Q1 - (1.5)(Q3 - Q1) = 22 - (1.5)(34 - 22) = 4] would qualify for this distinction for any-shaped data set, and classes of size one [mean - 3.0S = 27.09 - (3.0)(8.78) = 0.75, or about 1] would qualify for normally distributed data sets. However, in our study we deemed classes of such sizes far too small to be considered for inclusion. Students in small classes may feel pressured that their evaluations could be identified by a faculty member who may yet see the same students again in other, small-sized, required or elective classes.

Figure 2: Stem-and-leaf display: total number of students per class in 355 sections.

Stem-and-leaf of the number of students in each class (leaf unit = 1.0)

0L 1111
0T 22
0F
0S 67
0H 8999999
1L 0000001
1T 22233
1F 4455555
1S 66666666667777777
1H 88888889999999
2L 00000011111111111
2T 222222222222222333333333333
2F 44444444444445555555555555555555555555
2S 666666666666677777777777777
2H 88888888888888999999
3L 0000000001111111111111111
3T 222222222222222222333333333333333333
3F 4444444444444444444444444444444445555555555555555555555555
3S 666666666777777
3H 88888999999999
4L 00000000
4T 3
4F 5
4S
4H 9
5L
5T 2
5F 4

Removing 27 class sections of size 13 or fewer from this study (7.6% of those offered during the timeframe of analysis) resulted in a data set containing 328 sections over the four semesters. These 27 small-sized classes are displayed in Figure 2 in italics. We note further that 268 (or 81.7%) of these larger sections participated in the student ratings program and 60 did not.
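Both cut points quoted above follow directly from the reported class-size summary statistics. The sketch below reproduces them (using the reported Q1 = 22, Q3 = 34, mean = 27.09, and standard deviation = 8.78) and then applies the 14-student cutoff actually used, with a hypothetical list of section sizes.

    q1, q3 = 22, 34          # quartiles reported for the 355 section sizes
    mean, sd = 27.09, 8.78   # mean and standard deviation reported for the 355 section sizes

    tukey_fence = q1 - 1.5 * (q3 - q1)  # 22 - 1.5 * 12 = 4 students
    three_sigma = mean - 3.0 * sd       # 27.09 - 26.34 = 0.75, i.e., about 1 student

    enrollments = [3, 9, 14, 22, 25, 31, 35, 40]     # hypothetical section sizes
    retained = [n for n in enrollments if n >= 14]   # the study's own cutoff
    print(tukey_fence, round(three_sigma, 2), retained)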

Assessing Validity and Reliability of the Student Evaluation Ratings Instrument

Based on the suggestions of Brightman (2005), this study commenced with an assessment of the reliability and validity of the rating instrument that has been used in the business school at Montclair State University. Centra (1979) had concluded that a valid rating instrument must contain factors measuring presentation ability, organization and clarity, fairness of grading, student interaction, and student motivation. He opined that, when such factors are present in the rating instrument, the negative perceptions often expressed by faculty regarding evaluations as mere popularity contests (Centra, 1982), not related to student learning (Frey, 1973), are effectively addressed.

    Correlation matrix of student ratings

For the 268 class sections that provided students' ratings of the faculty, Figure 3 presents the correlation matrix of all pair-wise associations between the 11 average ratings.

Figure 3: Correlation matrix of average student evaluation ratings for 268 class sections.

              Q1     Q2     Q3     Q4     Q5     Q6     Q7     Q8     Q9     Q10    Overall Avg.
Q1           1.000
Q2           0.912  1.000
Q3           0.884  0.871  1.000
Q4           0.901  0.867  0.850  1.000
Q5           0.880  0.857  0.893  0.824  1.000
Q6           0.856  0.895  0.843  0.791  0.875  1.000
Q7           0.835  0.841  0.781  0.808  0.779  0.821  1.000
Q8           0.825  0.786  0.836  0.789  0.843  0.813  0.805  1.000
Q9           0.861  0.840  0.863  0.825  0.937  0.880  0.804  0.884  1.000
Q10          0.940  0.922  0.918  0.888  0.926  0.904  0.861  0.873  0.927  1.000
Overall Avg. 0.947  0.936  0.936  0.908  0.949  0.929  0.888  0.909  0.951  0.981  1.000
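A matrix like Figure 3 is straightforward to reproduce from a table of section-level item averages. The sketch below assumes a pandas DataFrame with one row per section and columns Q1 through Q10 (our naming, not the authors'); the Cronbach's alpha function is added only as one conventional summary of internal consistency and is not a statistic reported in the article.

    import pandas as pd

    ITEMS = [f"Q{i}" for i in range(1, 11)]

    def item_correlations(sections: pd.DataFrame) -> pd.DataFrame:
        """Pairwise Pearson correlations among the 10 item averages, as in Figure 3."""
        return sections[ITEMS].corr()

    def cronbach_alpha(sections: pd.DataFrame) -> float:
        """Conventional internal-consistency index computed from the item averages."""
        k = len(ITEMS)
        item_variances = sections[ITEMS].var(ddof=1)
        total_variance = sections[ITEMS].sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)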


... teaching effectiveness. The administrators of our university developed the items and determined that they represented the scope of teaching behaviors that were needed for administrative purposes. Second, in a future study, we will also be assessing the overall construct validity through convergent validity, where we will correlate student evaluations in a particular course with faculty peer ratings in that course, normally completed as part of a classroom observation session. Although the measuring instrument used by the faculty is different, similar kinds of information are obtained, and it would be possible to create a scale based on the responses provided. The difficulty with such assessment, however, is that a faculty member typically teaches three courses per semester over a 14-week period and the student evaluations, though perhaps biased by more recent events, typically consider the work over the entire semester. The faculty rating is a one-time experience, typically done during the middle weeks of the semester. Nevertheless, despite these limitations, a faculty rating scale is currently in development so that peer ratings can be correlated with the student ratings for the same courses. Thus, at this juncture the student evaluation instrument's reliability has been ascertained, but its construct validity needs to be further examined and will be part of our continued research in this area.

Assessing Association between Grades Assigned by Faculty and Students' Ratings of Faculty

Figure 4 presents the Pearsonian correlations between class GPA assigned by faculty and the corresponding averages of students' ratings of the faculty on the 10-item student evaluation instrument, along with the overall average rating, in 268 class sections. Note the negative and highly statistically significant relationships between the class GPA and the students' ratings in these class sections. The weakest correlation, r = -.238, for the class GPA with Q3, is very highly statistically significant, but the coefficient of determination is, nevertheless, rather low (r² = .057).

Figure 4: Pearsonian correlations of class GPAs and average student evaluations for 268 sections.

Item          Correlation with GPA
Q1            -0.262
Q2            -0.282
Q3            -0.238
Q4            -0.299
Q5            -0.290
Q6            -0.313
Q7            -0.296
Q8            -0.314
Q9            -0.334
Q10           -0.272
Overall Avg.  -0.312


Figure 5: Histogram with superimposed normal curve of class GPA given by faculty in 328 class sections over four consecutive semesters. [Histogram of class GPA, roughly 1.50 to 4.00; Std. Dev. = .44, Mean = 3.04, N = 328.]

Thus, there is a moderate association between the grades assigned by faculty and the evaluations given to the faculty by the students across all items being rated. Note that the correlation is negative only because, in the rating instrument, the score of 1 is the best score and 5 is the worst.
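The values in Figure 4, and the r² = .057 noted above, can be reproduced with an ordinary Pearson correlation. The sketch below assumes a DataFrame with a class_gpa column, item-average columns Q1 through Q10, and an overall_avg column (names are ours).

    import pandas as pd
    from scipy import stats

    def gpa_rating_correlations(sections: pd.DataFrame) -> pd.DataFrame:
        """Pearson r, r squared, and p value of class GPA against each average rating."""
        rows = []
        for item in [f"Q{i}" for i in range(1, 11)] + ["overall_avg"]:
            r, p = stats.pearsonr(sections["class_gpa"], sections[item])
            rows.append({"item": item, "r": r, "r_squared": r ** 2, "p_value": p})
        return pd.DataFrame(rows)

    # With r = -0.238 (the weakest entry in Figure 4), r squared is about .057.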

Following the confirmation of the reliability of the student evaluation rating instrument (along with the assumption of validity), as well as the examination of association between faculty grades assigned and student ratings of faculty, a thorough preliminary investigation of faculty grades was undertaken. Below are the key findings.

Analysis of Faculty Grade Distribution (Class GPA) by Class Section

Because class GPA is assumed to be normally distributed, we used the Lilliefors procedure (Lilliefors, 1967) to test the normality assumption for our data set. This procedure is appropriate for testing the normality goodness-of-fit assumption when the underlying population's parameters are unknown and the sample estimates of the mean and standard deviation are used as proxies. Figure 5 shows a histogram of the class GPA assigned by faculty for 328 class sections over four consecutive semesters. As can be observed, the normality assumption is quite reasonable. The null hypothesis that a normal distribution is a good fit to the data is not rejected using the traditional .05 level of significance (the test statistic L = .0401 < L.05 = .0489).
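A hedged sketch of that normality check, using the Lilliefors implementation in statsmodels (the article does not say which software was used), is shown below; class_gpas stands for the vector of 328 section GPAs.

    import numpy as np
    from statsmodels.stats.diagnostic import lilliefors

    def normality_not_rejected(class_gpas: np.ndarray, alpha: float = 0.05) -> bool:
        """Lilliefors goodness-of-fit test with the mean and standard deviation
        estimated from the sample; True means normality is not rejected."""
        statistic, p_value = lilliefors(class_gpas, dist="norm")
        return p_value > alpha

    # The reported statistic, L = .0401, falls below the .05 critical value of .0489
    # for n = 328, so the normal fit is not rejected.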


Figure 6: Normal probability plot of class GPA given by faculty in 268 class sections. [Plot of GPA (1.0 to 4.0) against standard normal quantiles (-3 to +3).]

Similar assessments of normality were performed on the class GPA assigned by faculty in the 268 class sections submitting student evaluations, as well as on the student ratings from these 268 class sections. Figures 6 and 7, respectively, display these normal probability plots. The assumption of normality as a good fit to the data set is once again met for the class GPA (the test statistic L = .0459 < L.05 = .0541).

Comparing Faculty Grades Assigned in Classes Administering versus Not Administering the Student Evaluation Instrument

We analyzed the class GPAs given in 268 sections taught by faculty who administered student evaluations against the class GPAs assigned in 60 sections taught by faculty who did not administer student evaluations. Figure 8 shows the box-and-whisker plots. Given the significant difference in the variability between the two groups (S1 = .464, S2 = .347, F = 1.79, two-tailed p value = .009), the two-sample separate-variance t test demonstrates that the slight differences in central tendency are due to chance (X̄1 = 3.042, X̄2 = 3.005, t = .704, two-tailed p value = .483). There is no evidence of a difference in the mean class GPA assigned in courses by faculty who do the student evaluations and those who do not.
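That two-step comparison (an F test of the variances followed by the appropriate t test) can be sketched as follows with scipy; the function name and array arguments are ours, standing in for the 268 and 60 section GPAs.

    import numpy as np
    from scipy import stats

    def compare_mean_gpas(gpa_obtaining: np.ndarray, gpa_not_obtaining: np.ndarray):
        """F test for equality of variances, then a pooled- or separate-variance
        (Welch) t test depending on that result."""
        var1, var2 = gpa_obtaining.var(ddof=1), gpa_not_obtaining.var(ddof=1)
        f_stat = var1 / var2
        df1, df2 = len(gpa_obtaining) - 1, len(gpa_not_obtaining) - 1
        f_p = 2 * min(stats.f.sf(f_stat, df1, df2), stats.f.cdf(f_stat, df1, df2))

        equal_var = f_p >= 0.05  # here p = .009, so the separate-variance t test is used
        t_stat, t_p = stats.ttest_ind(gpa_obtaining, gpa_not_obtaining, equal_var=equal_var)
        return f_stat, f_p, t_stat, t_p

With S1 = .464 and S2 = .347, the variance ratio is .464² / .347², or about 1.79, matching the F value reported above.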


Figure 7: Normal probability plot of student ratings of faculty in 268 class sections. [Plot of student rating (1.0 to 3.5) against standard normal quantiles (-3 to +3).]

Given this important finding, a thorough investigation of all research hypotheses based on the 268 class sections that provided student evaluations of the faculty was undertaken.

    ANALYSES OF RESEARCH QUESTIONS

The results from the analyses of the primary research questions and secondary research questions are summarized in Table 1. The various hypotheses formulated from these research questions and other tangential questions are listed along with group sample sizes, test procedures used, and p values obtained. The particular test procedures were selected following an examination of their assumptions.

The results summarized for the first six hypotheses in Table 1 enable the development of an appropriate set of norming reports. Factors that need to be included for purposes of comparison are course session, course level, course focus, and course type.

Findings Based on Primary Research Questions Regarding Student Ratings of Professors

    Semester effect

H1: There is no semester effect based on students' ratings of faculty; that is, there is no evidence of a difference between the mean student ratings given to faculty in the fall versus spring semesters (H01: μFS = μSS).


Figure 8: Box-and-whisker plot of class GPAs assigned by faculty who administered student evaluations (268 sections) versus faculty who didn't administer student evaluations (60 sections). [Side-by-side box plots of class GPA, roughly 1.0 to 4.5, for the obtaining (N = 268) and not obtaining (N = 60) groups.]

Results: Do not reject H01. There was no semester effect based on students' ratings of faculty. Differences in students' ratings of faculty in the fall versus spring semesters were due to chance (X̄FS = 1.617, X̄SS = 1.553, p value = .208) and this factor was eliminated from further consideration in our department with respect to norming report development.

    Course session effect

H2: There is no course session effect based on students' ratings of faculty; that is, there is no evidence of a difference between the mean student ratings given to faculty in the day versus evening sessions (H02: μD = μE).

Results: Reject H02. Students who took day classes rated their professors significantly better than those students who took evening classes (X̄D = 1.535, X̄E = 1.656, p value = .017). The former student group is typically younger and works part-time; the latter group (comprising undergraduate and graduate students) is typically older and works full-time. One possible explanation is that older and full-time working students are more demanding (i.e., have higher expectations) than their counterparts taking day classes. Another possible explanation could be that older and full-time working students may have less time to study and thus become unhappy with being forced to do homework.


Table 1: Summary of results from analyses of research questions.

Primary
  H1. Student ratings by semester: fall (138) vs. spring (130); pooled-variance t; p = .208
  H2. Student ratings by session: day (156) vs. evening (112); pooled-variance t; p = .017
  H3. Student ratings by faculty type: full-time (181) vs. adjunct (87); pooled-variance t; p = .478
  H4. Student ratings by course level: 100-200 (88), 300 (125), 400 (32), 500 (23); separate-variance F; p = .039
      100-200 vs. 300: Games-Howell; p > .050
      100-200 vs. 400: Games-Howell; p = .050
      100-200 vs. 500: Games-Howell; p > .050
      300 vs. 400: Games-Howell; p > .050
      300 vs. 500: Games-Howell; p > .050
      400 vs. 500: Games-Howell; p > .050
      400 vs. avg. (100-200, 300, and 500): Brown-Forsythe; p = .047
  H5. Student ratings by course focus: quant (91), systems (78), theory/practice (99); separate-variance F; p = .043
      Quant vs. systems: Games-Howell; p > .050
      Quant vs. theory/practice: Games-Howell; p > .050
      Systems vs. theory/practice: Games-Howell; p > .050
      Systems vs. avg. (quant and theory/practice): Brown-Forsythe; p = .043
  H6. Student ratings by course type: core (166) vs. major required and elective (102); separate-variance t; p = .008

Secondary
  H7. Faculty grades by semester: fall (138) vs. spring (130); pooled-variance t; p = .432
  H8. Faculty grades by session: day (156) vs. evening (112); pooled-variance t; p = .216
  H9. Faculty grades by faculty type: full-time (181) vs. adjunct (87); pooled-variance t; p = .000
  H10. Faculty grades by course level: 100-200 (88), 300 (125), 400 (32), 500 (23); ANOVA F; p = .000
      100-200 vs. 300: Tukey HSD; p = .000
      100-200 vs. 400: Tukey HSD; p = .000
      100-200 vs. 500: Tukey HSD; p = .000
      300 vs. 400: Tukey HSD; p = .635
      300 vs. 500: Tukey HSD; p = .005
      400 vs. 500: Tukey HSD; p = .226
  H11. Faculty grades by course focus: quant (91), systems (78), theory/practice (99); ANOVA F; p = .098
  H12. Faculty grades by course type: core (166) vs. major required and elective (102); pooled-variance t; p = .024

Note: Selected tests were based on evaluations of assumptions of normality and equality of variance; significance was assessed at the .05 and .01 levels.
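The mix of pooled-variance, separate-variance, and ANOVA procedures in Table 1 follows from checking the assumptions first. A rough sketch of that selection logic for the two-group hypotheses is given below (scipy assumed; the Games-Howell and Brown-Forsythe post hoc procedures used for H4 and H5 require a dedicated package and are not reproduced here).

    from scipy import stats

    def two_group_test(x, y, alpha: float = 0.05):
        """Choose the pooled-variance or the separate-variance (Welch) t test after
        a variance-homogeneity check, mirroring the approach behind Table 1.
        (Levene's test is used here; the article does not name its own check.)"""
        _, levene_p = stats.levene(x, y)
        equal_var = levene_p >= alpha
        name = "pooled-variance t" if equal_var else "separate-variance t"
        t_stat, p_value = stats.ttest_ind(x, y, equal_var=equal_var)
        return name, t_stat, p_value

    def multi_group_test(*groups):
        """One-way ANOVA F test across course levels or course foci (as in H10, H11)."""
        return stats.f_oneway(*groups)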


    Faculty type effect

H3: There is no type-of-faculty effect based on students' ratings; that is, there is no evidence of a difference between the mean student ratings given to full-time versus adjunct faculty (H03: μFT = μAdj).

Results: Do not reject H03. There was no evidence of a type-of-faculty effect based on students' evaluations. Differences in students' ratings of full-time versus adjunct faculty were due to chance (X̄FT = 1.573, X̄Adj = 1.612, p value = .478) and this factor was eliminated from further consideration in our department with respect to norming report development.

    Course level effect

H4: There is no course level effect based on students' ratings of faculty; that is, there is no evidence of a difference among the mean student ratings given to faculty across the different course levels (H04: μ100 = μ200 = μ300 = μ400 = μ500).

Results: Reject H04. Students in senior-level courses rated their professors significantly better than did students in sophomore-level courses or below (X̄400 = 1.456, X̄100-200 = 1.616, p value = .050), and students in senior-level courses rated their professors significantly better than did students taking all other undergraduate or graduate-level courses combined (X̄400 = 1.456, X̄combined = 1.603, p value = .047). Given that the 400- or senior-level courses are (a) in the discipline concentration, (b) student-selected electives, or (c) the required business capstone, one possible explanation for their significantly better student evaluations is what might be termed a "familiarity effect." The students may know the professors by the time they take these courses and, therefore, they should experience less anxiety about taking them as opposed to students who must take core required 100-, 200-, or 300-level undergraduate courses and core required graduate (500-level) courses, where they are typically encountering a professor for the first time.

    Course focus effect

H5: There is no course focus effect based on students' ratings of faculty; that is, there is no evidence of a difference among the mean student ratings given to faculty based on differing course focus (H05: μQ = μS = μTP).

Results: Reject H05. Students taking systems courses rated their teachers significantly better than those taking quantitatively oriented courses or theory/practice courses combined (X̄S = 1.501, X̄Q&TP = 1.620, p value = .043). This could possibly have resulted because business students are more comfortable with courses dealing with computer applications than with either quantitatively oriented courses or theory and practice courses requiring much writing.

Course type effect

H6: There is no course type effect based on students' ratings of faculty; that is, there is no evidence of a difference between the mean student ratings given to faculty teaching core versus either major required or elective courses (H06: μC = μMR&E).


Results: Reject H06. Students taking major required and elective courses rated their professors significantly better than those taking core courses (X̄C = 1.640, X̄MR&E = 1.497, p value = .008). One possible interpretation is that students are more interested in or more serious about their major concentration or selected elective courses; hence they give better ratings.

Findings Based on Secondary Research Questions Regarding Grades Assigned by Faculty

Given the significant correlation found between the average ratings assigned by students to faculty and the class GPA assigned by faculty to students in the various class sections, it was interesting to observe both concordance and discordance with respect to the factors investigated as part of our secondary hypotheses.

    Semester effect

H7: There is no semester effect based on grades assigned by faculty; that is, there is no evidence of a difference between the mean class GPA given by faculty in the fall versus spring semesters (H07: μFS = μSS).

Results: Do not reject H07. There was no semester effect based on faculty grading of students. Differences in faculty grading in the fall versus spring semesters were due to chance (X̄FS = 3.021, X̄SS = 3.065, p value = .432). This finding is concordant with the students' ratings of faculty by semester, and the semester factor was eliminated from further study in our department.

    Course session effect

H8: There is no course session effect based on grades assigned by faculty; that is, there is no evidence of a difference between the mean class GPA given by faculty in the day versus evening sessions (H08: μD = μE).

Results: Do not reject H08. There was no evidence of a session effect based on the grades faculty assigned to students. Even though evening courses are composed of both undergraduate and graduate class sections, the differences were due to chance (X̄D = 3.013, X̄E = 3.084, p value = .216). It should be noted here that this result is discordant with the students' ratings of faculty by session (H2), where it was found that the average day ratings were significantly better than the average evening ratings. This suggests that the faculty make no distinction between day and evening students when assigning grades.

    Faculty type effect

H9: There is no type-of-faculty effect based on grades assigned to students; that is, there is no evidence of a difference between the mean class GPAs given by full-time versus adjunct faculty (H09: μFT = μAdj).

Results: Reject H09. There was evidence of highly significant differences between full-time and adjunct faculty with respect to the grades they assign students. Specifically, full-time faculty gave significantly higher grades than did adjuncts (X̄FT = 3.119, X̄Adj = 2.883, p value = .000). Perhaps this is due in part to the fact that full-time faculty are more likely to teach higher-level courses. Nevertheless, the finding here is discordant with the previously observed results regarding student ratings of faculty. No type-of-faculty effect was found based on student evaluations, and thus this factor was eliminated from further consideration in our department with respect to norming report development.

    Course level effect

H10: There is no course level effect based on grades assigned by faculty; that is, there is no evidence of a difference among the mean class GPA given by faculty across the different course levels (H010: μ100 = μ200 = μ300 = μ400 = μ500).

Results: Reject H010. As would be anticipated, there was a positive trend effect in the faculty-assigned class GPAs based on course level. Breaking down the class GPA given by faculty in 100-200-, 300-, 400-, and 500-level courses, significantly higher grades were given in higher-level courses (X̄100-200 = 2.689, X̄300 = 3.162, X̄400 = 3.253, X̄500 = 3.454, p value = .000).

Course focus effect

H11: There is no course focus effect based on grades faculty give to students; that is, there is no evidence of a difference among the mean class GPA given by faculty based on differing course focus (H011: μQ = μS = μTP).

Results: Do not reject H011. There was no evidence of a course focus effect based on the grades faculty assigned to students (X̄Q = 3.032, X̄S = 3.131, X̄TP = 2.981, p value = .098). Differences in faculty grading among quantitative-oriented classes, systems classes, and theory/practice classes were insignificant and due to chance. This finding is discordant with the students' ratings of faculty by course focus, although the two sets of means for the types of courses (quantitative, systems, and theory/practice) are in the same sequential order.

    Course type effect

H12: There is no course type effect based on grades assigned by the faculty; that is, there is no evidence of a difference between the mean class GPA faculty give in core versus either major required or elective courses (H012: μC = μMR&E).

Results: Reject H012. As would be anticipated, faculty gave significantly higher grades to students taking major required and elective courses than to those taking core courses (X̄C = 2.992, X̄MR&E = 3.124, p value = .024). This finding is concordant with the students' ratings of faculty by course type. One possible explanation for this finding is that perhaps students are more serious about their major and elective courses because they perceive a more relevant learning experience and hence put in greater effort, and thus obtain significantly better grades in such courses.

    A SET OF CUSTOMIZED NORMING REPORTS

In past semesters faculty at the business school received the results of the previous semester's course evaluations as a packet that consisted of, for each course: (1) a single sheet listing each evaluation item and the corresponding mean for each of the 10 items, along with a mean of the means; (2) a simple summary table of the individual responses to each item; (3) an ordered array of the single, overall mean rating of each class in the department; and (4) a stem-and-leaf display of these same mean ratings. The latter two items were an attempt to help faculty compare their results with those obtained by other department colleagues.

Figure 9: An illustrated set of norming reports.

Based on the research conducted herein, the development of customized sets of norming reports (see Figure 9) greatly expands the evaluation information available to the faculty by providing detailed data to allow comparisons of one faculty member's single-course results to other groups of scores obtained by similar normed groups. For example, in Figure 9 the four boxes represent four different comparisons. Report 1 compares the results obtained for a particular faculty member in a particular course section in one semester with results received by all faculty teaching that same course in that semester. Report 2 compares the results obtained for a particular faculty member in a particular course section in one semester with results received by all faculty teaching that course in previous semesters. Reports 3 and 4 (and other reports not shown) provide comparisons within the particular semester (Report 3) and for previous semesters (Report 4) based on other norming groupings. In Figure 9, for example, the norming group for Reports 3 and 4 represents the course level (i.e., in this case 400 level rather than not 400 level). Other potential norming groups (not shown) include course session (i.e., day or evening), course focus (i.e., systems emphasis or non-systems emphasis), and course type (i.e., required core business school courses or courses taken as part of a selected discipline concentration, or electives).

Within each table the faculty member is provided helpful statistics. In Report 1, for example, column two ("Evaluation") shows the mean rating for this individual on each of the items. Column three lists the means on each of the items for all faculty in the norming group, in this case all faculty teaching the particular course that semester. Column four provides the standard deviations (SD) on each of the items for all faculty in the norming group. Column five lists the corresponding standardized scores, Z, the deviations from expectations relative to the standard deviations. Finally, column six identifies the percentile of the report owner's class score on each item in comparison to the norming group. So, for example, looking at item 1 (Q1) in Report 1, this faculty member's resulting student rating was in the 63rd percentile, some 37% below the best rating in the norming group.
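The Z scores and percentiles in these reports are standard calculations. A minimal sketch, assuming the norming group's item means are available as an array, follows (recall that lower raw ratings are better on this instrument).

    import numpy as np
    from scipy import stats

    def norming_columns(own_item_mean: float, group_item_means: np.ndarray):
        """Columns five and six of Report 1: the standardized score of one
        instructor's item mean against the norming group, and its percentile."""
        z = (own_item_mean - group_item_means.mean()) / group_item_means.std(ddof=1)
        percentile = stats.percentileofscore(group_item_means, own_item_mean)
        return z, percentile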

For the individual faculty member and for administration alike, the norming reports provide a means of assessing class evaluation results both within and among semesters by comparing individual results with various norming groups. In the example above, the 63rd percentile rating on Q1 (demonstrating the importance/relevance of the subject matter) might be a catalyst for self-reflection and/or discussion with administration as to what instructor behaviors to modify to change that rating in future evaluations. Comparisons of the results for this item across the various norming groups provide additional insights into behaviors that might impact future results.

Using customized sets of norming reports would assist the administration and faculty to better understand the student ratings and plans for improvement. For example, suppose one professor teaches INFO 270 (a core statistics course) in the day, INFO 375 (a core operations management course) in the evening, and INFO 501 (a core statistics course) in the graduate program. Her first customized set of 10 reports must show comparisons with other INFO 270 classes, with other day session classes, with other non-400-level course classes, with other non-systems-focused classes, and with other required B-school core classes. The current course student ratings would be compared on these five factors over the most recent semester, as well as historically over several previous semesters. Note that her sets of norming reports for INFO 375 and INFO 501 classes would have the same reference groupings as the above set with respect to course level (i.e., non-400-level course classes), course focus (i.e., non-systems-focused classes), and course type (i.e., required business school core classes). They would, however, differ from the first set based on session (i.e., the latter two are evening classes) and the specific course (i.e., INFO 375 or INFO 501).
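Assembling the comparison groups described in this example amounts to a handful of filters over the section-level data set. The sketch below uses column names we have assumed (course, session, level, focus, course_type, semester); the authors' own report generator is not described at this level of detail.

    import pandas as pd

    def norming_groups(sections: pd.DataFrame, course: str, semester: str) -> dict:
        """Return the five comparison groups for one course offering in one semester,
        matching the groupings described above."""
        current = sections[sections["semester"] == semester]
        target = current[current["course"] == course].iloc[0]
        return {
            "same course": current[current["course"] == course],
            "same session": current[current["session"] == target["session"]],
            "same level group": current[(current["level"] == 400) == (target["level"] == 400)],
            "same focus group": current[(current["focus"] == "systems") == (target["focus"] == "systems")],
            "same course type": current[current["course_type"] == target["course_type"]],
        }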

    DISCUSSION

Scope and Limitations

Overall, this study makes two primary contributions to the literature. First, it provides an empirical examination of the factors that might be influencing students' ratings. Second, it offers suggestions on how these ratings might be used by administrators. These two contributions offer more evidence supporting Brightman's (2005) recommendations that any evaluation instrument focused on faculty teaching must (1) be reliable and valid and (2) have a meaningful norming report.

The results from this study highlighted the fact that there are non-teaching-related factors that influence the students' assessment of professors. These findings are quite informative and have implications for such assessments that are intended to capture teaching-related factors (e.g., knowledge of the topic, effectiveness of delivering the material). Specifically, providing evidence that class session (day/evening), class level, class focus, and class type may differentially influence student assessments of faculty should signal caution in how faculty teaching evaluations are interpreted and used for administrative purposes. For example, faculty members who teach only day classes would be expected to have better ratings than if they taught the same classes solely in the evening. In accordance with Brightman (2005), it would be unfair to compare faculty members without using the correct norming reports.

There are several issues associated with these findings that need to be addressed. First, it is expected that faculty would use the feedback from these evaluations to make changes to their pedagogical style and improve as teachers. However, it may be the case that faculty instead adapt their teaching styles to game the system. That is, they may engage in behaviors designed to enhance their ratings but not necessarily improve student learning. Some professors may make easier examinations so that they can give higher grades, teach higher-level classes, teach classes that students elect to take, or teach only during the day. Although student ratings of their teaching may increase, this is not in line with the ultimate goal of enhancing student learning. Establishing more appropriate norming reports may be a better way to evaluate faculty accurately and diminish the probability of gaming the system.

Another issue concerns limiting the study to classes of sufficient size to provide student ratings and to examine the faculty-assigned grade distribution. In this study, we included only classes with 14 or more students since we reasoned that classes with 13 or fewer students, although not statistical outliers, were too small for appropriate analysis. Students in small classes may feel pressured that their evaluations could be identified by a faculty member, who may yet see the same students again in other small required or elective classes. It might be useful for future research to examine these smaller classes to determine if the results are similar to what we found for the larger classes. Conversely, it would also be helpful to determine whether these effects operate in large lecture hall classes and in other disciplines.

The nature of any instrument's reliability and validity needs to be carefully examined. In this study, the student evaluation instrument's reliability was ascertained. However, its validity can only be assumed, and the demonstration of such validity will be part of our continued research in this area. One way to assess the validity of the instrument would be to have peer evaluations performed by fellow faculty members or by a department chairperson and then correlate their ratings of teaching effectiveness with the student ratings. Further, it is also important to look at the individual items of the instrument, as some may be double-barreled items that confuse readers and others may not adequately assess teaching-related factors. It might also be useful to alter the sequence of scaled responses by wording some of the items with positive phrases and others with negative phrases. This method may capture whether students are paying careful attention to the items or simply marking the same response for each item in a hurried fashion. Finally, future developers of these instruments should also carefully define, select, and measure the factors included in their instruments. Failing to consider the issues related to the actual items or student response tendencies will only diminish the validity of the scale and have profound implications for the administrative use of the ratings.
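As an illustration of the two psychometric checks discussed above, the following sketch computes Cronbach's alpha for internal consistency and a simple Pearson correlation between class-level student ratings and peer ratings as one possible convergent-validity check. All values are invented for demonstration; this is not the instrument or data used in the study.

```python
# Reliability (Cronbach's alpha) and a simple convergent-validity correlation.
import numpy as np

def cronbach_alpha(item_scores):
    """item_scores: 2-D array, rows = respondents, columns = items."""
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()    # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total scores
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical responses: five students rating three items on a 1-5 scale.
ratings = np.array([[5, 4, 5], [4, 4, 4], [3, 3, 2], [5, 5, 4], [2, 3, 2]])
print(round(cronbach_alpha(ratings), 2))

# Hypothetical class-level means paired with peer ratings of the same classes.
student_means = np.array([4.3, 3.8, 4.6, 3.2, 4.1])
peer_ratings  = np.array([4.0, 3.5, 4.4, 3.6, 4.2])
print(round(np.corrcoef(student_means, peer_ratings)[0, 1], 2))
```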

Summary and Recommendations

In sum, these findings highlight the importance of empirically investigating the role of non-teaching factors in the assessment of faculty by students. Several non-teaching factors may unduly influence the ratings and cloud the true assessment of teaching-related factors. Our findings about course level (100, 200, 300, 400, and 500) are similar to what Costin et al. (1971) found in their research: senior-level courses had better ratings. Our findings on course focus (quantitative, systems, and theory/practice) were similar to those of Aigner and Thum (1986), who called this factor course characteristics; more demanding courses resulted in lower ratings. This was also confirmed by Paswan and Young (2002). Even though Blackburn and Lawrence (1986) found that the rank of faculty could influence student ratings, we did not observe a significant effect between full-time and adjunct faculty. Finally, one interesting finding of this research is the significant difference in average student ratings for day and night sessions (classes). As anticipated, students rated faculty lower in core courses, and likewise, faculty assigned lower grades in those courses. There was no semester effect with respect to students' ratings and faculty grades.

Based on these findings, several recommendations can be offered. First, it is recommended that this process be adapted by other business school departments and other academic units across the university. Employing this methodology elsewhere would help ensure a more universally appropriate usage of students' ratings by business schools as well as by other institutions of higher learning.

Second, it is recommended that other departments, other schools within the university, and other institutions of higher learning wishing to establish appropriate sets of norming reports should consider a possible semester factor effect not observed here. For example, average student ratings and class GPA may demonstrate a semester factor effect in courses that run in a one-year sequence, such as those typically observed in business school offerings of accounting as well as micro- and macroeconomics, and, outside a business school, in such offerings as one-year courses in mathematics, physics, chemistry, language, writing, or literature. Although we did not observe a semester effect in our data set for one-year classes, it may be possible that the effect is evident in other departments or universities.

Third, it is recommended that future research examine the possibility of rating bias at the extremes of the rating scales. There were some students at the extreme high end of the rating scale and other students at the extreme low end. It would be informative to examine why students in the same class perceived the same teacher in such discrepant ways, other than simply based on the grades they were receiving in the class at the time of completing the evaluation form.
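One illustrative way to operationalize such an examination, assuming a 1-to-5 response scale, is to flag classes in which substantial shares of students sit at both extremes of the scale. The rule, thresholds, and data below are hypothetical and are offered only as a starting point for this kind of follow-up analysis.

```python
# Flag classes whose ratings cluster at both extremes (polarized perceptions).
def extreme_split(ratings, low=2, high=4, threshold=0.30):
    """Return True if at least `threshold` of students rate at or below `low`
    AND at least `threshold` rate at or above `high`."""
    n = len(ratings)
    lo_share = sum(r <= low for r in ratings) / n
    hi_share = sum(r >= high for r in ratings) / n
    return lo_share >= threshold and hi_share >= threshold

print(extreme_split([1, 1, 2, 5, 5, 5, 4, 2, 5, 1]))   # True: polarized class
print(extreme_split([4, 4, 5, 4, 3, 4, 5, 4, 4, 5]))   # False: consistently high
```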

Finally, it is recommended that departments and universities systematically develop their own norming reports and carefully incorporate them into the development of a mentoring system for junior faculty, adjuncts, or any other professor looking to improve his or her teaching. Specifically, it would be very useful to log these semester-by-semester reports into a faculty file. This would enable both the administrator and the faculty member to reflect on student evaluation ratings over time, by course, to examine whether improvements in the student ratings have occurred. Using detailed record-keeping procedures would allow more timely and accurate feedback to be provided to individual faculty members, especially junior faculty, who may still be developing their unique pedagogical styles. The detailed reports may facilitate a more thorough mentoring process. Most mentoring may occur in an informal manner, but the development of norming reports may make mentoring programs more formal and effective, given that the feedback is more relevant and targeted.
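A minimal sketch of such record keeping, assuming a simple per-faculty CSV log, is shown below; the file layout, directory, and field names are hypothetical and would need to fit an institution's own systems.

```python
# Append one semester's norming-report summary to a per-faculty log file.
import csv
from pathlib import Path

def log_report(faculty_id, semester, course, item_percentiles,
               log_dir=Path("faculty_logs")):
    """Append semester, course, and per-item percentiles to <faculty_id>.csv."""
    log_dir.mkdir(exist_ok=True)
    path = log_dir / f"{faculty_id}.csv"
    new_file = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            header = ["semester", "course"] + [f"Q{i + 1}_pct" for i in range(len(item_percentiles))]
            writer.writerow(header)
        writer.writerow([semester, course] + list(item_percentiles))

# Hypothetical usage: three item percentiles for one course in one semester.
log_report("jdoe", "Fall 2007", "INFO 270", [63, 71, 58])
```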

    REFERENCES

Aigner, D. J., & Thum, F. D. (1986). On student evaluation of teaching ability. The Journal of Economic Education, 17, 243–265.

Bacon, D. R., & Novotny, J. (2002). Exploring achievement striving as a moderator of the grading leniency effect. Journal of Marketing Education, 24, 4–14.

Bejar, I. I., & Doyle, K. O. (1976). The effect of prior expectations on the structure of student ratings of instruction. Journal of Educational Measurement, 13, 151–154.

Berenson, M. L., Levine, D. M., & Goldstein, M. (1983). Intermediate statistical methods and applications: A computer package approach. Englewood Cliffs, NJ: Prentice Hall.

Blackburn, R. T., & Lawrence, J. H. (1986). Aging and the quality of faculty job performance. Review of Educational Research, 56, 265–290.

Brightman, H. J. (2005). Mentoring faculty to improve teaching and student learning. Decision Sciences Journal of Innovative Education, 3, 191–203.

Brown, M. B., & Forsythe, A. B. (1974). The ANOVA and multiple comparisons for data with heterogeneous variances. Biometrics, 30, 719–724.

Burba, F. J., Petrosko, J. M., & Boyle, M. A. (2001). Appropriate and inappropriate instructional behaviors for international training. Human Resource Development Quarterly, 12, 267–283.

Centra, J. (1979). Determining faculty effectiveness: Assessing teaching, research, and service for personnel decisions and improvements. San Francisco: Jossey-Bass.

Centra, J. (1982). Determining faculty effectiveness. San Francisco: Jossey-Bass.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

Cohen, P. (1980). Effectiveness of student-rating feedback for improving college instruction: A meta-analysis of findings. Research in Higher Education, 21, 321–341.

Costin, F. (1968). Survey of opinions about lectures. Department of Psychology, University of Illinois.


Costin, F., Greenough, W. T., & Menges, R. J. (1971). Student ratings of college teaching: Reliability, validity, and usefulness. Review of Educational Research, 41, 511–535.

Crawford, P. L., & Bradshaw, H. L. (1968). Perceptions of characteristics of effective university teachers: A scaling analysis. Educational and Psychological Measurement, 28, 1079–1085.

Crittenden, K. S., Norr, J. L., & LeBailly, R. K. (1975). Size of university classes and student evaluation of teaching. The Journal of Higher Education, 46, 461–470.

Driscoll, L. A., & Goodwin, W. L. (1979). The effects of varying information about use and disposition of results on university students' evaluations of faculty and courses. American Educational Research Journal, 16(1), 25–37.

Everett, M. D. (1977). Student evaluations of teaching and the cognitive level of economics courses. The Journal of Economic Education, 8, 100–103.

French, G. M. (1957). College students' concept of effective teaching determined by an analysis of teacher ratings. Dissertation Abstracts, 17, 1380–1381.

Frey, P. W. (1973). Student ratings of teaching: Validity of several rating factors. Science, 182, 83–85.

Frey, P. W., Leonard, D. W., & Beatty, W. W. (1975). Student ratings of instruction: Validation research. American Educational Research Journal, 12, 435–444.

Games, P. A., & Howell, J. F. (1976). Pairwise multiple comparison procedures with unequal n's and/or variances: A Monte Carlo study. Journal of Educational Statistics, 1, 113–125.

Greenwood, G. E., & Ramagli, H. J., Jr. (1980). Alternatives to student ratings of college teaching. The Journal of Higher Education, 51, 673–684.

Hamilton, L. C. (1980). Grades, class size, and faculty status predict teaching evaluations. Teaching Sociology, 8(1), 47–62.

Kutner, M., Nachtsheim, C., Neter, J., & Li, W. (2005). Applied linear statistical models (5th ed.). New York: McGraw-Hill/Irwin.

Lawrence, S., & Sharma, U. (2002). Commodification of education and academic labour using the balanced scorecard in a university setting. Critical Perspectives on Accounting, 13, 661–677.

Levine, D. M., Stephan, D., Krehbiel, T. C., & Berenson, M. L. (2008). Statistics for managers using Microsoft Excel (5th ed.). Upper Saddle River, NJ: Prentice Hall.

Lilliefors, H. W. (1967). On the Kolmogorov-Smirnov test for normality with mean and variance unknown. Journal of the American Statistical Association, 62, 399–402.

Lima, A. K. (1981). An economic model of teaching effectiveness. The American Economic Review, 71, 1056–1059.

Myers, D. J., & Dugan, K. B. (1996). Sexism in graduate school classrooms: Consequences for students and faculty. Gender and Society, 10, 330–350.


Paswan, A. K., & Young, J. A. (2002). Student evaluation of instructor: A nomological investigation using structural equation modeling. Journal of Marketing Education, 24, 193–202.

Pohlmann, J. T. (1975). A description of teaching effectiveness as measured by student ratings. Journal of Educational Measurement, 12(Spring), 49–54.

Singh, G. (2002). Educational consumers or educational partners: A critical theory analysis. Critical Perspectives on Accounting, 13, 681–700.

Small, J., & Mahon, S. (2005). New evaluation form, same old attitude? A study of perceptions of the lecturer evaluation process. Journal of Eastern Caribbean Studies, 30(2), 75–90.

Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

Velleman, P. F., & Hoaglin, D. C. (1981). Applications, basics, and computing of exploratory data analysis. Boston: Duxbury Press.

Voeks, V. W., & French, G. M. (1960). Are student-ratings of teachers affected by grade? The report of three studies at the University of Washington. The Journal of Higher Education, 31, 330–334.

Ward, S. P., Cook, E. D., Ward, D. R., & Wilson, T. E., Jr. (1999). The effect of student gender on perceptions of instructor behavior and teaching effectiveness in the upper level accounting classroom. Journal of Accounting and Finance Research, 7(3), 15.

Wilson, D., & Doyle, K. O., Jr. (1976). Student ratings of instruction: Student and instructor interactions. The Journal of Higher Education, 47, 465–470.

Zangenehzadeh, H. (1988). Grade inflation: A way out. The Journal of Economic Education, 19, 217–226.

Richard L. Peterson is a professor and the chairperson of the Management and Information Systems Department in the School of Business at Montclair State University. He received a BS in Education from Edinboro State University, an MS in Educational Psychology from Penn State, and a PhD in Liberal Arts (Interdisciplinary, including communications, computer science, and educational psychology) from Penn State. Prior to joining Montclair State, he founded and managed a full-service, custom consultancy focused on business process innovation and change. He is highly regarded among students and faculty for his innovative approaches to learning. He teaches courses in systems analysis and design. His research interests extend across a number of disciplines, including pedagogy, decision making, networking, and process innovation. His recent research has appeared in Managerial and Decision Economics and the Journal of Academy of Business and Economics.

Mark L. Berenson is a professor of Management and Information Systems in the School of Business at Montclair State University. He is also Professor Emeritus of Statistics in the Zicklin School of Business, Baruch College, City University of New York (CUNY). He received his BA in economic statistics and his MBA in business statistics from CCNY and his PhD in quantitative methods in business from CUNY. Over the years he has received numerous awards for teaching, innovative contributions to statistics education, and for service. At Montclair State he is the chair of the School of Business Learning Goals and Assessment Committee. He is a coauthor of eleven statistics texts published by Prentice Hall, and his research has appeared in Review of Business Research, The American Statistician, Communications in Statistics, Psychometrika, Education and Psychological Measurement, Journal of Health Administration Research, Encyclopedia of Measurement and Statistics, and the Encyclopedia of Statistical Sciences.

Ram B. Misra is a professor of Management and Information Systems at Montclair State University. He received his PhD in Operations Research with a minor in Statistics from Texas A&M University. He also earned an Executive MBA from the Columbia School of Business. Prior to joining Montclair State University, he was an executive director at Telcordia Technologies (formerly Bell Communications Research). He has over 20 years of telecom industry experience that spans Bell Labs, Bell Communications Research, and Telcordia Technologies. Before joining Bell Labs, he taught at Texas A&M University and the University of Houston. He has published in journals such as IEEE Transactions, International Journal of Management Research, International Journal of Production Research, and Naval Logistics Review.

David J. Radosevich is an associate professor at Montclair State University with a joint appointment in the Department of Management and Information Systems and in the Department of Psychology. He received his BA in psychology from Western Maryland College and his PhD in industrial/organizational psychology from the University at Albany, State University of New York. He has taught a wide variety of both graduate and undergraduate classes in organizational behavior, human resource management, research methods, statistics, and psychology. His consulting work has been in the areas of selection, training, needs assessment, and performance management. He has published in journals such as Journal of Applied Psychology, Contemporary Educational Psychology, Seoul Journal of Business, International Journal of Business Research, Review of Business Research, and Innovate.