
IEEE TRANSACTIONS ON EDUCATION, VOL. E-18, NO. 3, AUGUST 1975

Factors in the Evaluation of Instructors by Students
HERBERT C. RATZ, SENIOR MEMBER, IEEE

Abstract-The evaluation of instructors by students through some sort of anonymous questionnaire is a procedure that has become established at many Universities. This produces numerical data that are used to classify the relative teaching effectiveness of instructors. Such instructor ratings have been collected over a four year period in the Electrical Engineering Department at the University of Waterloo, Canada. These data were analyzed statistically to investigate the possible effects of factors other than classroom performance, such as teaching a subject for the first time, teaching service courses in other departments, subject matter, level of research support, admission year of the class, and class size. The majority of instructors were indistinguishable, although a few others were consistently different from this majority. Some observations are made, from this experience, regarding the significance of questionnaire ratings, their use in the University system, and the improvement of the quality of teaching.

I. INTRODUCTION

The Electrical Engineering Department at the University of Waterloo, Ontario, has used the same questionnaire procedure since 1969 to sample student opinion regarding the instructors in all courses. The results provide feedback both to the individual instructor and to the Chairman of the Department. The objectives, presumably, are to improve the quality of teaching, and to provide a measure of teaching effectiveness in the form of an instructor rating. Because of this latter use, questions naturally arise as to the consistency of the ratings and their dependence on factors other than the instructor's performance, such as subject matter, class size, teaching assignments, a particular group of students, the instructor's research activity, etc. This paper reports the results of a statistical study made on the data from questionnaires accumulated over a four year period.

The questionnaire consisted of both a "free form" part and a "scaled" part. The latter can be quantified, encoded, and analyzed statistically, and attached to the instructor's name for use elsewhere in the University system. Unfortunately, scaled forms are not particularly useful at detecting problems and specifying points for improvement, and the dependence of the results upon the arbitrary scaling selected is largely unknown. Nevertheless, numerical data are readily processed, have a fascination for academic administrators, and such information is, therefore, the type we have analyzed. It is very important, however, to take note of the existence of the "free form" part used in conjunction with the questionnaire, since this is a superior way for the instructor to learn what is going on, and where improvement would increase his effectiveness as a teacher [2]. At the very least, student evaluation responses indicate the instructor's communication skills, and the "free form" part tells him directly some important aspects of these skills which can affect the quality of his teaching. These are in addition to the numerical measure of platform performance, which may give very few hints that are really helpful to him.

Manuscript received October 15, 1974; revised March 28, 1975. The author is with the Department of Electrical Engineering, University of Waterloo, Waterloo, Ont., Canada.

The possible immediate benefits of the free-form feedback are lost in our case because all forms are held until after the end of the examination period, presumably to avoid jeopardy to the respondents. However, the replies are anonymous, so there can be no individual identification in any case. Although the students are thus doubly protected, the ethics are not nearly so clear concerning the use in the University system of the numerical ratings obtained from the scaled questionnaire that become associated with the name of a particular instructor. By-passing the issue of employing student ratings as measures of overall teaching quality, there is the question of the place of anonymous evaluations as factors in faculty career decisions. Such evidence has been described as "irresponsible, inadmissible in any just procedure; it is all immature, all incompetent regarding the subject" [6]. But one cannot decry numerical ratings as such, since, like counting published papers, the results of such personnel deliberations, whether subjectively or objectively arrived at, are undoubtedly numerical in value to several significant figures; namely, in dollars. Yet the effect of anonymous questionnaires on salary, promotion, and tenure could be like that of a hidden credit rating without, at the same time, yielding any constructive indicators that would improve teaching effectiveness. Indeed, there is evidence that teaching deliberately directed towards achieving a good average student rating is detrimental to the quality of education in any broader sense of intellectual development [13].

Personnel decisions by Department Heads on salary and promotion have always purported to include research ability or potential, publication activity, and teaching quality. The debate between teaching and research is not about the validity of these functions, but is usually concerned with the merit of particular measures in each case. However, because of the "halo effect" (good at one thing, say publications; therefore good at others, like teaching), Department Heads have tended to judge good researchers as also good teachers where student ratings show no connection, so that professorial rank becomes correlated with research ability (publication productivity) and not with teaching quality as indicated by students' evaluations [5].

The connection between teaching and research not only depends upon the measures chosen in each case, but also upon the institutional setting of these activities. For example, if perceived research ability strongly influences teaching assignments, then such a policy may contribute directly to differences in teaching effectiveness measures. A good researcher can become a good teacher if promoted and asked only to teach small classes of advanced students who have elected subjects in his speciality.


The spread of questionnaire evaluation of teaching during the sixties seemed at least partly motivated by the desire to provide a safety valve for student participation more than by any planned intent to pay attention to the results [10]. In this way, the system can be made to appear to correct poor teaching and to placate dissatisfied customers, although the dissatisfaction may not be related to the quality of teaching according to other measures [9]. Thus a token step towards student involvement in this area has established a permanent structure with possible effects which must now be taken seriously. Recently, reservations have been expressed about both what is being measured and how the measures are used [5].

Questionnaire procedures in which the quality of service is judged by the immediate reaction of the consumer have been characterized as "gastronomic tests" [12]. The basic justification for recording scaled measures of student reaction during a teaching term is based on the viewpoint of students as consumers. It may not be related at all to any objective measure of knowledge or skills acquired. In some reports, students have evaluated most highly those instructors from whom they learned the least [9]. Zelby [13] finds that students favor secure, straightforward teaching of textbook material over methods more demanding of powers of generalization or insight that seem less comfortable. If we are not to regard the current student body as consumers, then former students or alumni might provide a better perspective [10], or the use of external examinations might be more objective [4]. Lancaster [7] reports on a scheme using delayed student opinions acquired two years after the teaching experience. Certainly students are customers to a considerable extent, but society in a much broader sense is both the consumer and purchaser of the university system and may, with some justification, view the student as a product of that system.

We would conclude that the free form response provides the best immediate feedback for the improvement of an instructor's platform performance. Questionnaires which tabulate the students' gastronomic reaction to a teacher may be a useful indicator of the effectiveness of his communication skills, but it definitely should not be inferred that such ratings represent a complete measure of the quality of education. In this context, we have analyzed over four years of student ratings on all instructors and in all courses in the Electrical Engineering program at Waterloo. These data have been categorized in different ways in an attempt to identify any consistency that might exist between the ratings and certain objective factors.

II. PROCEDURE AND METHODS

The Faculty of Engineering at the University of Waterloo operates a Co-operative program on a trisemester basis. The normal four year program is taken in eight semesters, from 1A and 1B to 4A and 4B, interwoven with six work terms. A freshman class enters 1A in the Fall term and then divides into two streams ("A" and "B") which alternate academic and work terms in a leap-frog fashion until they finish together in 4B in a Winter term. Each subject is of one term duration, and the total program, including work terms, occupies 14 semesters. The data available for this study cover the 13-term period from Winter 1969 until Winter 1973 inclusive. During that period, the same questionnaire was used to collect student opinion on all courses in the Electrical Engineering program (years 2, 3, and 4), and also on those so-called "service course" subjects taught by instructors from the Electrical Engineering Department in the common first year or in other departments in the Engineering Faculty.

The questionnaire is filled out during a scheduled lecture period during the ninth or tenth week of a term, and the results are held by the Departmental Administrative Assistant until all examination results for that term are finalized. The numerical ratings are tabulated and given to the instructor and the Departmental Chairman, and the comments or free-form section is given only to the instructor. Figs. 1 and 2 show the questionnaire, with Fig. 1 being the very simple "free form" part.

Shown in Fig. 2 is the page of questions which is the part of the evaluation procedure that yields the instructor ratings. The first eleven questions use the graded scale with the values:

A = outstanding = 5, to E = poor = 1.

The instructor's rating is obtained by averaging the answers to the first 10 questions. The numerical scale from the questionnaire has a range from 1 to 5 with a central value of 3.22. For convenience, we have scaled all the ratings so that this central value becomes 100 points, and the range of values is from 31 to a maximum of 155 points on this normalized scale.
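As an illustrative aside (not part of the original study), the rating calculation just described can be sketched in Python; the letter-to-number mapping is taken from the questionnaire, while the function name and the example card are assumptions:

    # Minimal sketch: average of questions 1-10, rescaled so that the
    # central value 3.22 maps to 100 points (range about 31 to 155).
    LETTER_VALUE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
    CENTRAL_VALUE = 3.22
    POINTS_AT_CENTRE = 100.0

    def instructor_rating(answers):
        """answers: the letters given for questions 1-10 on one response card."""
        raw = sum(LETTER_VALUE[a] for a in answers) / len(answers)   # 1.0 .. 5.0
        return raw * POINTS_AT_CENTRE / CENTRAL_VALUE                # normalized points

    # A card answering "B" to every question scores 4/3.22*100, about 124 points.
    print(round(instructor_rating(["B"] * 10)))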

Questions 11 to 14 inclusive were not used in obtaining the rating of an instructor as a teacher. Nevertheless, they were included as factors which might influence a student's evaluation of teaching effectiveness. For example, there are many anecdotal data among faculty members indicating that the academically better students (as measured by question 14) are more appreciative of an instructor's teaching effectiveness, and conversely, that poor instructor ratings come from poorer students. To test this, we ran the original response cards and computed correlations between the average of questions 1 to 10 and the responses to questions 11, 12, and 14. A total of 7,097 cards were involved. The resulting correlation coefficients were subjected to significance tests using Fisher's "z" test and a "t" test [14].
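A minimal sketch of a Fisher "z" significance test for a correlation coefficient of the kind referred to above; the arrays are random stand-ins for the response-card data, and scipy is assumed for the basic statistics:

    import numpy as np
    from scipy import stats

    def fisher_z_test(x, y):
        """Pearson correlation with a Fisher z significance test against r = 0."""
        r, _ = stats.pearsonr(x, y)
        z = np.arctanh(r) * np.sqrt(len(x) - 3)   # Fisher's z statistic
        p = 2 * stats.norm.sf(abs(z))             # two-sided p-value
        return r, p

    # Illustrative use with random stand-ins for the response cards.
    rng = np.random.default_rng(0)
    rating_avg = rng.normal(100, 14, 500)                     # average of questions 1-10
    interest_q11 = 0.3 * rating_avg + rng.normal(0, 14, 500)  # question 11 response
    print(fisher_z_test(rating_avg, interest_q11))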

In all other tests, the average ratings by a class for a particular subject in a given semester were used and categorized according to one identified factor. Then, the possible significance of that factor could be investigated statistically. Contingency tables are suitable for this purpose [16], and in particular, we employed the median contingency test [15]. In this procedure, the number of occurrences in each category is counted above or below the grand median, and the resulting distribution is tested by a chi-squared statistic against the null hypothesis of uniformity. The median contingency test is a non-parametric test, but the more powerful one-way analysis of variance and F-test requires assumptions about the normality of the data and the homogeneity of the variance. We report results of the application of analysis of variance only in those cases where the validity of these assumptions has been satisfactorily verified.
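A minimal sketch of the median contingency test as described (counts above and below the grand median in each category, tested with a chi-squared statistic); the ratings and the grouping variable shown are hypothetical:

    import numpy as np
    from scipy import stats

    def median_contingency_test(ratings, categories):
        """Counts above/below the grand median per category, chi-squared test."""
        ratings = np.asarray(ratings)
        categories = np.asarray(categories)
        grand_median = np.median(ratings)
        table = [[np.sum((categories == c) & (ratings > grand_median)),
                  np.sum((categories == c) & (ratings <= grand_median))]
                 for c in sorted(set(categories))]
        chi2, p, dof, _ = stats.chi2_contingency(table)
        return chi2, p

    # Hypothetical example: class ratings grouped into four class-size categories.
    ratings = [96, 104, 99, 101, 88, 112, 103, 97, 95, 105, 110, 90]
    size_group = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
    print(median_contingency_test(ratings, size_group))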


COMMENT SHEET

Please turn in this comment sheet on Faculty Members with your answer sheet.

COURSE NUMBER:                    PROFESSOR:

Your Professor would like to know if there is something you believe he has done especially well in his teaching of this course.

Your Professor would also like to know what specific things you believe might be done to improve his teaching of this course.

Fig. 1. The "free form" comment sheet used with the instructor evaluation.

SURVEY OF STUDENT OPINION OF TEACHING

Please answer all questions as follows:
A = outstanding; B = superior; C = average; D = below average; E = poor

1. Articulation and lecture manner
2. Sensitive to class reaction, understanding and suggestions
3. Interprets abstract ideas and theories clearly
4. Gets me interested in his subject
5. Has helped broaden my interest
6. Stresses important material
7. Makes good use of examples and illustrations
8. Inspires class confidence in his knowledge of the subject
9. Has given me new viewpoints or appreciations
10. Is clear and understandable in his explanations
11. Rate your interest in the subject matter of the course, independent of the instructor.
12. Estimate the average hours per week of private study associated with the lectures and problems of this course. Pick the case nearest to your estimate: A = 13, B = 10, C = 7, D = 4, E = 1 hour.
13. Estimate the average hours per week used to complete laboratory assignments (if any) outside the laboratory. Use same scale as question 12.
14. Enter your overall average grade for the last complete academic term: A = 87-93, B = 80-86, C = 73-79, D = 66-72, E = 59-65 per cent.

Fig. 2. Questionnaire for the evaluation of instructors.

Where a hypothesis regarding the normality of data is of interest, we have employed the chi-squared test for uniformity on eight equally likely intervals, as computed from the mean and variance of the data, under the null hypothesis of no departure from normality [16]. Bartlett's test for homogeneity of variances [14] was used when a one-way classification was being investigated.
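A sketch of these two checks under the stated assumptions (eight equally likely intervals computed from the sample mean and variance; Bartlett's test taken from scipy); the data shown are random stand-ins:

    import numpy as np
    from scipy import stats

    def chi2_normality_test(x, bins=8):
        """Chi-squared goodness-of-fit on equally likely intervals of a fitted normal."""
        x = np.asarray(x)
        probs = np.linspace(0, 1, bins + 1)[1:-1]                 # interior quantiles
        edges = stats.norm.ppf(probs, loc=x.mean(), scale=x.std(ddof=1))
        observed = np.bincount(np.searchsorted(edges, x), minlength=bins)
        expected = len(x) / bins
        chi2 = ((observed - expected) ** 2 / expected).sum()
        # bins - 1 - 2 degrees of freedom (mean and variance estimated from the data)
        return chi2, stats.chi2.sf(chi2, bins - 3)

    g1, g2, g3 = np.random.default_rng(1).normal(100, 14, (3, 40))
    print(chi2_normality_test(np.concatenate([g1, g2, g3])))
    print(stats.bartlett(g1, g2, g3))   # homogeneity of variances across groups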

Identical cases of instructor and subject combinations were investigated using paired comparisons and the "t" test on the differences of the pairs [14]. This test assumes normality for the differences, but equal variances for the variables being compared are not required.
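A minimal sketch of the paired-comparison "t" test on the differences of the pairs, using hypothetical first-offering and repeat-offering ratings for the same instructor-subject combinations:

    from scipy import stats

    # Hypothetical normalized ratings for the same instructor-subject combinations:
    # first offering vs. the mean of later offerings of the same subject.
    first_offering  = [ 96, 101,  88, 104,  99,  93, 107,  95]
    later_offerings = [103, 105,  96, 108, 104, 100, 110, 102]

    result = stats.ttest_rel(later_offerings, first_offering)
    print(result.statistic, result.pvalue)   # paired "t" test on the differences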

In all statistical significance tests, the level of significance for the detection of an effect is 0.1%, although any level between 0.1% and 10% supports the same conclusions concerning the presence or absence of an effect.

The complete data listed 75 instructors and 45 subjects, but there were entries for only 195, or less than 6%, of the possible combinations. Moreover, about half of these combinations had only one entry, indicating that those teaching tasks were undertaken by an instructor for the first and only time. The total raw data contained 349 entries, or an average of 1.79 entries in each instructor-subject combination for which there was any entry at all. On the average, each instructor taught 2.60 different subjects, and each subject had 4.8 different instructors. Part of the cause of this situation lies with the Co-operative Program and its complex schedule, which precludes any simple recycling period. Nevertheless, identification of the situation was followed by scheduling policies which have substantially reduced the frequency of "one-off" teaching tasks.

The data are not the result of a designed statistical experiment but rather constitute a historical record. The data from this record have been categorized in different ways, and the results inspected for evidence of association between the ratings and the factor selected for the classification. Since we do not have a fully randomized experimental design in advance, it is not possible to explore all the various interactions among the factors, nor to relate cause and effect.

However, one-way classification of all pertinent available data can show whether the factor being classified is associated with the instructor ratings.

We observed that the fraction of subjects being taught by an instructor for the n-th time decreased exponentially. This suggested comparing the cases of first offerings with the n ≥ 2 cases. Data for this test involved all the instructors from the Electrical Engineering Department who had taught the same subject or subjects more than once in the sophomore and junior years. This included 25 instructors, 13 different subjects, and 39 different instructor-subject combinations. For each combination, paired differences were calculated between the "first offering" of a subject and subsequent offerings by the same instructor. Only the entries for the second and subsequent times the course was taught were compared with the corresponding class sizes. In this way, any interaction between the class size test and any "first-offering" effect was avoided.

It has been alleged sometimes that a certain class has a collective personality with a consistently favorable or unfavorable effect on instructor ratings. To test this, data were analyzed for all class streams for which there were four or more semesters with complete reports. The averages for the six classes so obtained could then be compared for any consistent difference between groups of students.

One obvious difficulty in testing for the significance of instructor evaluation by students is the possible interaction with the subject being taught.


Because of the large number of isolated instructor-subject combinations in the original data, some selection mechanism was necessary to choose those instructors who had taught several subjects, and those subjects taught by several instructors. The original instructor-subject matrix has 75 × 45 cells, mostly empty, but some with several entries. Rows and columns of this matrix were discarded if they contained only 1 or 2 non-empty cells. In this way the data matrix was reduced by successive iterations until a matrix of 15 instructors × 16 subjects remained, such that on the average each instructor had taught 3.6 of the 16 subjects and each subject had had 3.4 of the 15 instructors. A sketch of this reduction appears after the list below. As one might expect, this reduced data matrix showed groups of instructors somewhat loosely associated with groups of subjects. The subjects fell into four groups, which we designate A, B, C, and D, such that if an instructor had taught any one subject in a group, then he also had taught over half the subjects in that group. The number of subjects in each group, along with a rough indication of the subject material, are as follows:

A (2 subjects): Introductory "service courses" to other Engineering Departments;
B (5 subjects): Electronics, circuits, field theory;
C (4 subjects): Electromagnetics, energy conversion, power;
D (5 subjects): Networks, control systems, communications.
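The iterative reduction just described, in which rows and columns with two or fewer non-empty cells are repeatedly discarded, can be sketched as follows; the occupancy pattern is a random stand-in for the original 75 × 45 instructor-subject matrix:

    import numpy as np

    def reduce_matrix(occupied):
        """Iteratively drop rows/columns with <= 2 non-empty cells."""
        rows = np.arange(occupied.shape[0])
        cols = np.arange(occupied.shape[1])
        changed = True
        while changed:
            keep_r = occupied.sum(axis=1) > 2     # instructors with 3+ subjects left
            keep_c = occupied.sum(axis=0) > 2     # subjects with 3+ instructors left
            changed = not (keep_r.all() and keep_c.all())
            occupied = occupied[keep_r][:, keep_c]
            rows, cols = rows[keep_r], cols[keep_c]
        return rows, cols

    # Random 75 x 45 occupancy pattern standing in for the instructor-subject matrix.
    rng = np.random.default_rng(2)
    occupied = rng.random((75, 45)) < 0.06
    kept_instructors, kept_subjects = reduce_matrix(occupied)
    print(len(kept_instructors), "instructors x", len(kept_subjects), "subjects remain")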

III. TESTS AND RESULTS

Any consistent effect related to the first time a subject is taught, as compared with second and further repeats of the same subject, was investigated using data for Electrical Engineering instructors and subjects in the sophomore and junior years only, as described earlier. There were 103 entries in the data, with a mean for first-time offerings of 100.6. Second and subsequent offerings of these same subjects by the same instructors were, on the average, 6.2 points higher. The paired differences did not depart significantly from normality, and this mean difference tested as very significant using a "t" test. Thus there is a favorable increment of about 6% in the rating of an instructor in this group when he is presenting a subject which he has taught before.

There would seem to be an obvious difference between lecturing to a large audience in one of the mass enrollment subjects and teaching a small group who have elected a particular course [11]. Cornwell [3] has concluded that for enrollments over 20 there is no substantial class size effect, but that higher ratings are obtained by instructors of classes under 20 in number. Again, data involving Electrical Engineering Department instructors teaching Electrical Engineering classes in the sophomore and junior years were examined to see if there was any correlation with the size of the classes. In this case, the data related to subjects which the instructor had taught at least once before. We had 78 entries of an instructor rating with the corresponding class size, and both ratings and class sizes showed no significant departure from normality. The class sizes were grouped into four categories and the median contingency test applied to the ratings, but no significant departure from the uniform distribution was detected. The same conclusion results from applying a "t" test to the correlation coefficient, and therefore we conclude there is no correlation with class sizes.

Cornwell [3] indicates a higher rating, by about 13%, for instructors of classes under 20 in size. The range of class sizes in our study (28 to 75) did not include classes that small, so that our results are not inconsistent with those of Cornwell and support his conclusions for class sizes over 20.

The possibility was investigated that a particular class or student group would consistently rate all instructors either more favorably or more unfavorably throughout its academic program. Six such classes were compared, for which there were four to six semesters of reports available from each, and a total of 213 entries. The median contingency test detected no significant differences among these six classes. Moreover, the entries did not depart from normality, and the variances of each class were homogeneous according to Bartlett's test. Therefore we employed the more powerful analysis of variance for a one-way classification, and the F-test. Again, we could detect no significant difference among these classes.

The data were categorized according to the four subject groups obtained from the reduced data matrix described above, yielding a total of 169 entries.

These were assigned to a median contingency table, and tests of the resulting distribution indicated a significant difference among the groups. However, the majority of the contribution to the chi-squared statistic derived from subject group "A", which had a mean of 90.1 compared to that of 102.2 for the other subjects. Therefore, we conclude, tentatively, that there is a 12% decrement in average student rating associated with these "service courses."

Next, the data were categorized by the 15 instructors as identified from the reduced data matrix. The 128 values were entered in a median contingency table, and the resulting chi-squared test indicated a very significant departure from the uniformity expected under the null hypothesis. Therefore, it would be natural to conclude that there is a difference among the instructors and that the ratings by students are a significant indicator of teaching effectiveness which can be associated with each instructor. Certainly, the statistical test is significant, but the final conclusion could be misleading.

This point can be demonstrated by removing from the test the four or five instructors whose ratings contributed most to the test statistic. There remains a pool of a two-thirds majority of the instructors. Now members of this selected pool are completely indistinguishable on the median contingency table test. Moreover, the ratings for this pool do not differ significantly from a normal distribution, and their variances are homogeneous by Bartlett's test. Accordingly, a one-way analysis of variance can be used, but it also shows no significant difference among the members of this majority pool.

The point, then, is that the majority of the instructors formed an indistinguishable pool as far as these ratings were concerned, with a mean and standard deviation of:

100 ± 13.7.

The remaining instructors, a minority, fell into two groups that were on the average ±14%, or one standard deviation, above and below the average of the majority pool.


There are reports that the best rated teachers are those who also obtain external research support [1], in spite of anecdotal data that persist in describing researchers as bad teachers. To test this with our data, we compared student ratings with a measure of external research support. Faculty members in Electrical Engineering at the University of Waterloo obtain financial support for their research from a number of sources. Almost all, however, have some support from the National Research Council of Canada, and all NRC Grants-in-Aid to Universities in Canada in Electrical Engineering are awarded through the same central committee. Thus these grants, while only one possible source of research support, are very widespread and based, in some sense, upon a common measure of research activity in the preceding years. Accordingly, we took the total of NRC awards over the three fiscal years 1972-1975 and compared these totals with the average student ratings of the 15 instructors over the 1969-1973 period.

A two-by-two contingency table obtained from the medians in each quantity failed to reveal any association whatever between student ratings and research grants in these data.

Finally, all the response cards were processed to ascertain if the answers to questions 11, 12, and 14 (see Fig. 2) gave any clue to the average of questions 1 to 10 employed as an instructor evaluation. The overall averages for the three subsidiary questions were:

109.6 ± 28 (question 11, scaled as per the instructor ratings);
5.9 ± 3.0 (hours/week of home study);
72.2 ± 8.0 (previous term average, percent).

The results of the tests for correlation between these subsidiary questions and the instructor evaluation were not as anticipated. There was no important linear correlation between the average of questions 1 to 10 and the responses to questions 12 and 14 concerning individual student effort and previous semester achievement. On the other hand, there was a very significant correlation with the response to question 11, which enquires after "interest in the subject matter of the course, independent of the instructor." The correlation coefficient in this case was:

0.29 ± 0.01

between the instructor's rating and student interest in the subject matter, "independent of the instructor."

IV. CONCLUSIONS

The results indicate that the rating of teaching effectiveness by students has a standard deviation of about 14%, which represents fluctuations that have not been accounted for. Thus, short-term variations of this size may not be significant, and longer term averages are more appropriate as indicators. A convenient way of maintaining such a running average is to weight previous samples exponentially into the past. This results in "low-pass" smoothing and makes it unnecessary to recalculate the average from all past values as each new value becomes available. The algorithm is:

(New average) = (Previous average) + α (New value − Previous average)
              = α (New value) + (1 − α) (Previous average).

Thus, as each new value becomes available, the average is updated. The constant α determines the weighting sequence and the reduction in variance achieved.

The reduction in variance is α/(2 − α) for an infinite series of exponentially weighted past values. It is of interest to know the reduction in variance obtained by this smoothing procedure for a finite number, K, of past values, since this indicates how the above limiting reduction is reached as K increases. This is shown in Fig. 3. If a value of α is selected, the family of curves shows how the variance is reduced with increasing K until the limit is reached. The reduction in the variance at this limit is equivalent to that obtained by the uniform averaging of (2 − α)/α samples. For example, with a smoothing constant of α = 0.2, the standard deviation would be reduced from 14% to under 5% after 10 samples have been included in the weighted average. This assumes, of course, that the fluctuations result from random or uninteresting causes rather than from any systematic and meaningful factor.
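A sketch of the running-average update given above, together with the limiting variance reduction α/(2 − α); the sequence of term ratings and the value of α shown are hypothetical:

    def smoothed_ratings(values, alpha):
        """Exponentially weighted running average of successive term ratings."""
        average = values[0]
        averages = [average]
        for new_value in values[1:]:
            average = average + alpha * (new_value - average)   # update rule from the text
            averages.append(average)
        return averages

    alpha = 0.2
    print(smoothed_ratings([96, 104, 99, 112, 101, 95, 108], alpha))
    # Limiting variance reduction for an infinite series of past values:
    print("variance factor:", alpha / (2 - alpha))              # ~0.11 for alpha = 0.2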

Some conclusions can be drawn from our investigation about the association of certain factors with student questionnaire ratings. An instructor appears less satisfactory to his class the first time he presents a course, by about 6%. An instructor lecturing in "service courses" to students in other departments can expect a decrement in his reception of about 12%. On the other hand, our analysis found no consistent effects attributable to subject matter, level of research support, students grouped by year of admission, or class size (for classes over 20).

Although there were significant differences among instructors, it is important to note that the majority (over two-thirds) were indistinguishable on the basis of these student rating reports, and the differences seen from time to time among members of this pool were not consistent. Hence, some smoothing or averaging such as given above would be appropriate. A few instructors consistently averaged 14% different from this majority pool, so that significant differences can be established using student ratings in some cases. While the identification of these exceptional cases may be useful, the value of the numerical ratings in all other cases as a quantitative measure of teaching performance is questionable.

The results of the correlation between a student's evaluation of his instructor's performance and his interest in the subject "independent of the instructor" call into doubt the real meaning of the questions on this type of questionnaire, and the interpretation to be placed on scaled measures of student reaction to those questions [8]. It would seem that the questions may have a variety of meanings to the respondents which are different from that intended. Hence, it could be that the standard deviation of 14% results from averaging these particular ten questions. The choice of question is important, and probably the fewer, the better.

The use of one scaled measure implies a one-dimensional problem; but the opposite of non-inspirational teaching may be inspiring, or challenging, or trivial, or entertaining. Obviously, "teaching and learning effectiveness" in any real sense is more complex than a single gastronomic response. Finally, the free form part of the student response provides the instructor with information in uncoded form concerning his apparent strengths and weaknesses and is, therefore, likely to be more helpful in improving his communication skills.


Fig. 3. The reduction in variance with smoothing constant α, after K samples.

V. REFERENCES

[1] J. B. Bresler, "Teaching Effectiveness and Government Awards," Science, vol. 160, 1968, pp. 164-167.
[2] D. W. Brooks and H. Levenson, "Questions and Answers about Answers to Questions," Jour. of Chemical Education, vol. 51, 1974, pp. 161-162.
[3] C. D. Cornwell, "Statistical Treatment of Data from Student Teaching Evaluation Questionnaires," Jour. of Chemical Education, vol. 51, 1974, pp. 155-160.
[4] P. K. Gessner, "Evaluation of Instruction," Science, vol. 180, 1973, pp. 566-570.
[5] J. R. Hayes, "Research, Teaching, and Faculty Fate," Science, vol. 172, 1971, pp. 227-230.
[6] J. H. Hildebrand, "On Grading without Judgment," Bull. Amer. Assoc. Univ. Prof., vol. 31, 1945, pp. 638-641.
[7] O. E. Lancaster, "Measuring Teaching Effectiveness," IEEE Trans. on Education, vol. 16, 1973, pp. 138-142.
[8] E. M. Larsen, "Students' Criteria for Response to Teaching Evaluation Questionnaires," Jour. of Chemical Education, vol. 51, 1974, pp. 163-165.
[9] M. Rodin and B. Rodin, "Student Evaluation of Teachers," Science, vol. 177, 1972, pp. 1164-1166.
[10] M. E. Schaff and B. R. Siebring, "What do Chemistry Professors Think about Evaluation of Instruction?" Jour. of Chemical Education, vol. 51, 1974, pp. 152-154.
[11] B. R. Siebring, "A Survey of the Literature Concerning Student Evaluation of Teaching," Jour. of Chemical Education, vol. 51, 1974, pp. 150-151.
[12] L. W. Zelby, "On Teaching Effectiveness," IEEE Trans. on Education, vol. 15, 1972, pp. 30-31.
[13] L. W. Zelby, "Student-Faculty Evaluation," Science, vol. 183, 1974, pp. 1267-1270.
[14] I. N. Gibra, "Probability and Statistical Inference for Scientists and Engineers," Prentice-Hall, Inc., New Jersey, 1973, Chaps. 10 and 14.
[15] W. L. Hays and R. L. Winkler, "Statistics: Probability, Inference, and Decision," Vol. II, Holt, Rinehart and Winston, Inc., New York, 1970, Chap. 12.
[16] M. G. Kendall and A. Stuart, "The Advanced Theory of Statistics," Vol. II, Hafner Publishing Co., New York, 1961, Chaps. 30 and 33.
