
THE VALIDITY OF HIGHER-ORDER QUESTIONS AS A PROCESS INDICATOR OF EDUCATIONAL QUALITY

Robert D. Renaud*,† and Harry G. Murray**

One way to assess the quality of education in post-secondary institutions is through the use of performance indicators. Studies that have compared currently popular process indicators (e.g., library size, percentage of faculty with PhD) found that after controlling for incoming student ability, these process indicators tend to be weakly associated with student outcomes (Pascarella and Terenzini, 2005). In addition, while much research has found that students increase their critical thinking skills as a result of attending college, little is known about what goes on during the college experience that contributes to this. The purpose of this research was to examine the validity of higher-order questions on tests and assignments as a process indicator by comparing it with gains in critical thinking skills among college students as an outcome indicator. The present research consisted of three studies that used different designs, samples, and instruments. Overall, it was found that frequency of higher-order questions can be a valid process indicator as it is related to gains in students' critical thinking skills.

KEY WORDS: critical thinking; higher-order; educational quality; process indicators; performance indicators; college; university.

INTRODUCTION

While we have long known that students learn and develop in a wide variety of ways as a result of attending college, we are only beginning to get a clearer idea of what is going on within colleges that contributes to these gains (see review by Pascarella and Terenzini, 2005). More specifically, Pascarella and Terenzini point out that the degree to which a student progresses in a particular area of learning is determined to a considerable degree by his or her amount of involvement. As well, we are finding out much more about what is happening within colleges to foster greater levels of student involvement. The implications stemming from the progress in this area apply to several perspectives. Perhaps the most direct application of these findings would be with the individual instructor, who can decide which instructional approaches and classroom activities would best contribute towards the intended student outcomes. On a slightly broader level, having a clearer understanding of what furthers student achievement would enable administrators to make more appropriate policy decisions. With respect to the issues concerning the assessment of educational quality, with quality in higher education broadly defined as the amount the students have learned and developed as a result of their enrollment, evaluators have the opportunity to develop more valid performance indicators.

*Department of Educational Administration, Foundations, and Psychology, University of Manitoba, Winnipeg, Manitoba, Canada.

**Department of Psychology, University of Western Ontario, London, Ontario, Canada. Address correspondence to: Robert D. Renaud, Department of Educational Administration, Foundations, and Psychology, University of Manitoba, Winnipeg, Manitoba, Canada; E-mail: [email protected]

Research in Higher Education, Vol. 48, No. 3, May 2007 (© 2007). DOI: 10.1007/s11162-006-9028-1

Performance indicators are often divided into input, process, and output indicators (e.g., Borden and Bottrill, 1994). Astin (1970) outlines a conceptual model of college impact that represents these categories. Student inputs are the characteristics a new student brings into college (e.g., aspirations, aptitude). Moreover, student inputs can be in the form of either variable measures that change with time (e.g., cognitive development), or static personal attributes (e.g., race, sex). The process indicators (or college environment, as referred to by Astin) refer to those aspects of the institution capable of affecting the student (e.g., teaching quality, physical characteristics). Student outputs refer to those aspects of student learning and development that the college influences or attempts to influence.

The input-process-output model is a fairly common approach in program evaluation research. Of particular importance in the present research is the relation between the processes and the outcomes. One way of judging the validity of a process variable is to compare it with an outcome variable (Scheirer, 1994). Scheirer points out that if the processes have any influence on the outcomes (as intended), then these two factors should be associated. In other words, high levels of output should be preceded by high levels of the process variables and vice-versa. The important point here is that, without measuring the empirical relation between both the input and process variables and the outcomes, the degree to which one can conclude that the program is effective (i.e., schools are fostering student learning) is difficult to confirm.

One main reason for focusing on the relation between institutional process indicators and student outcomes is in response to the rapidly growing popularity of college and university annual rankings, which focus primarily on inputs and processes while little (if any) attention is paid to outcomes (e.g., Maclean's, U.S. News and World Report). Perhaps the most controversial aspect of these rankings is that they are based largely on the unsupported assumption that the indicators used to rank institutions are in fact related to student outcomes or, in other words, accurately reflect the level of quality within an institution (Astin and Panos, 1969; Ball and Halwachi, 1987; Conrad and Blackburn, 1985; Nedwek and Neal, 1994; Solmon, 1973, 1975; Tan, 1986). Another serious limitation with college rankings is the redundancy among the indicators, in that many of the indicators are highly related to one another. For example, Astin (1971) reported that admission selectivity is highly correlated with an institution's prestige, faculty-student ratio, library size, faculty salaries, endowment, research funds, academic competitiveness among students, and political orientation. In a later study, Astin and Henson (1977) found that selectivity correlated strongly with several traditional performance indicators including tuition, average faculty salary, educational and general expenditures per student, value of endowment per student, percentage graduate students, and educational and general expenditures. Therefore, it is hardly surprising to see that a school that is highly selective will also score highly on most if not all other indicators. A third concern with institutional rankings is that a particular performance indicator used to compare schools (e.g., faculty-student ratio) may not be based on consistent criteria (Conrad and Blackburn, 1985; Solmon, 1973, 1975). For example, what exactly defines "faculty" in determining the faculty-student ratio could be tenured faculty only at one institution and anyone who teaches a course (including graduate students and part-time instructors) at another institution. Although there are other limitations with respect to rankings and performance indicators (see Bruneau and Savage, 2002), the three limitations described above appear to be the most relevant and substantial.

In summary, a central question with respect to measuring educational quality is, how much do institutional characteristics as reflected in performance indicators actually contribute to student learning and development? In their extensive review on how college contributes toward a wide range of student outcomes, Pascarella and Terenzini (2005) conclude that the relationship between the institutional characteristics typically used in annual rankings and student outcomes is generally weak and inconsistent. In other words, among colleges that are similar in terms of incoming student ability, there seems to be little relation between structural features such as library size and output measures like cognitive skills. One possible explanation suggested by Nordvall and Braxton (1996) is that current indicators are inappropriate because they are far too removed from the level of actual classrooms and courses, and thus, it is difficult to formulate feasible educational policies based on them. Second, there is the problem of misuse of indicators. Several sources (e.g., Hossler, 2000; McGuire, 1995; Webster, 1992) suggest that some of the performance indicator information provided by institutions is either incomplete, inaccurate or, in an effort to reflect the most positive image, deliberately false. Thus, even if a set of meaningful and valid indicators is established, they will be only as informative to the degree they are properly measured and interpreted. Given these concerns with the currently popular indicators, one process indicator that may be more valid is the use of higher-order questions in classes.

The definition of a higher-order question in this study corresponds to the top four levels within Bloom's (1956) taxonomy of educational objectives within the cognitive domain, namely, application, analysis, synthesis, and evaluation. Briefly, an acceptable response at a particular level assumes that one can exhibit the cognitive processes at all of the lower levels. For example, being asked to design a study to determine how much student learning is caused by teacher enthusiasm would represent a synthesis-level item. This would require a student to know about each aspect of the study such as research design and data collection (knowledge); know what each aspect of the study means, such as why an ANOVA would be an appropriate statistic (comprehension); apply these abstract concepts to a particular situation (application); and tie each of the separate concepts together such that each component (e.g., sample size, how enthusiasm and learning are measured) becomes an integral part of the newly created product (synthesis). Perhaps the clearest distinction between lower- and higher-order questions, as noted by Bloom, is that while lower-order questions are designed to elicit existing answers (e.g., from the textbook, directly from the lecture), higher-order questions require novel answers in that they cannot simply be recalled.

It appears that the earliest consistent empirical support for the use of higher-order questions comes from several studies carried out by Braxton and Nordvall. The frequency of use of higher-order questions was found to be positively related to year level of course (Braxton, 1993), selectivity of the institution (Braxton, 1993; Braxton and Nordvall, 1985), whether a course is required or optional (Braxton and Nordvall, 1985), and quality of the graduate department from which a faculty member earned his or her degree (Braxton and Nordvall, 1988). In sum, these studies provide clear validity evidence for the corresponding inputs and processes.


There are three main advantages in focusing on higher-order questions as a process indicator. First, this variable is relatively easy to measure. Like most other performance indicators, frequency of higher-order questions is (1) a quantitative variable that can be measured quite reliably and objectively; (2) available from a large number of schools and many classes within each school; and (3) measurable in a non-intrusive way that does not interfere with regular classroom or administrative operations. The second main advantage is that, unlike many other performance indicators, frequency of higher-order questions is something that occurs at a classroom or course level and that instructors can control directly in their classrooms. Finally, higher-order questions can provide additional and, compared with student ratings, more objective data on the quality of the course (Nordvall and Braxton, 1996). Among the many possible outcome variables of interest, the current research examined gains in critical thinking skill because it would logically be expected to be enhanced by increased exposure to higher-order questions and, as most educators would agree, is one of the most important goals of the educational process (Facione, 1990; Halpern, 1996, 2001).

In comparing the processes involved in higher-order thinking (application, analysis, synthesis, and evaluation) with commonly defined aspects of critical thinking, it appears that critical thinking comprises a significant portion of higher-order thinking (Ennis, 1985; Facione, 1990; Ferguson, 1986; Halpern, 1998; King, 1995; Paul, 1993; Tsui, 1999). For example, two elements commonly included in the definition of critical thinking, namely the ability to identify assumptions and the ability to evaluate evidence (e.g., Ennis, 1985; Furedy and Furedy, 1985; Pascarella and Terenzini, 2005; Watson and Glaser, 1980), are also listed as major components within the analysis level as described by Bloom (1956).

Aside from the conceptual overlap between higher-order thinking and critical thinking, there are several other reasons why it would be worthwhile to explore the relation between these two variables. Perhaps the biggest reason is that the suitability of focusing on a particular process like using higher-order questions in class would be best confirmed by determining its relation with a specific outcome. Second, among the many possible ways in which students learn and develop as a result of attending college, the one outcome that would be expected to be most directly influenced would be critical thinking. Finally, previous findings suggest that these two constructs are indeed clearly linked together.

From a broader perspective, the results of several recent studies support the conclusion that higher-order questions may have a positive impact on students' critical thinking skills. Using a qualitative approach to obtain a detailed account of what instructional activities best contribute toward gains in critical thinking skills, Tsui (2002) concluded that writing was one general factor that was clearly influential. Although the Tsui (2002) study did not set out to focus on higher-order questions specifically, one could reasonably presume that the activities involved in the process of writing, as outlined by Tsui, included at least some degree of the type of thinking that would be needed to respond to a higher-order question. Similarly, in a series of studies by Williams and colleagues (Williams, Oliver, Allin, Winn, and Booher, 2003; Williams, Oliver, and Stockdale, 2004; Williams and Stockdale, 2003), students who scored better on exams with items requiring logical reasoning tended to show larger pre-test to post-test gains in critical thinking skills. As with the Tsui (2002) study, the results of the Williams studies provide indirect support for the effect of higher-order questions on critical thinking skills.

Previous studies that have explored the correlation between the frequency of higher-order questions and critical thinking have generally focused on higher-order questions either as those asked by the instructor during lectures, or those listed on tests and assignments. Except for the findings of Logan (1976), who found a positive relation between the frequency of higher-order questions asked by the instructor while teaching and gains in critical thinking, most other studies (Foster, 1983; Smith, 1977, 1980; Winne, 1979) found little relation. One possible explanation for the weak association between asking higher-order questions during class and gains in critical thinking skills may have to do with the lack of variability in asking higher-order questions (e.g., Smith, 1977). In comparison, this link tended to be positive among studies in which students either created their own higher-order questions (Keeley, Ali, and Gebring, 1998; King, 1989, 1990, 1995) or answered higher-order questions provided by the instructor on assignments and exams (Gadzella, Ginther, and Bryant, 1996; Gareis, 1995; Willis, 1992).

The Current Research

Given that higher-order questioning is a potentially valuable process or performance indicator variable that has not been studied extensively in terms of how it influences critical thinking, the goal of this research was to attempt to confirm empirically that the frequency of higher-order questions occurring on tests and assignments is a valid process indicator of educational quality, in that it correlates significantly and positively with gains in critical thinking skill. Three studies were conducted to test this hypothesis.


The first study was a split-plot experimental design with students from three sections of an educational psychology course. As such, there were three levels of the between-subjects factor (i.e., course sections), and two levels of the within-subjects independent variable controlled by the researcher (i.e., lower- and higher-order questions). Each student was given six assignments during the course, with a given assignment containing both low-level questions based on one chapter of the text and high-level questions based on another chapter. After each consecutive set of two assignments (representing four chapters of material from the text), students were given a test containing four critical thinking questions, with each question focusing on one of the chapters covered in the preceding two assignments. Thus, there were three tests in total, each providing a within-subjects comparison of topics taught with and without higher-order questions. It was predicted that students would earn higher scores on the critical thinking questions from chapters where higher-order assignment questions were given than from chapters where lower-order assignment questions were given.

The second study was a true experiment that involved students enrolled in an introductory psychology course. The experiment consisted of a pre-test measure of critical thinking, answering review questions based on a passage from the text used in the course, and finally a post-test measure of critical thinking. The independent variable was the level of review questions (lower- or higher-order). It was predicted that the experimental group would show larger pre-test to post-test gains in critical thinking compared to the control group.

The last study was a correlational design that compared mean course gain in critical thinking with the proportion of higher-order questions used on assignments and tests across courses from different year levels and disciplines. Gains in critical thinking were measured with pre-test and post-test measures consisting of abbreviated versions of the WGCTA (Forms A and B). It was predicted that the frequency of use of higher-order questions on assignments and tests would be positively correlated with class mean gains in critical thinking.

STUDY 1

This study used a split-plot design with three class sections. Within each section, students were given three tests and, in preparation for each test, they were given lower- and higher-order assignment questions. The purpose of this study was to determine if students would earn higher scores on critical thinking test items that pertain to higher-order assignment questions compared to critical thinking test items that pertain to lower-order assignment questions.

Method

Participants

Within a large university in central Canada, a total of 131 undergraduate students from three sections of an optional introductory educational psychology course participated in this study. The classes ran from September to December 1998 (n = 49), January to April 1999 (n = 53), and July to August 1999 (n = 29). All sections were taught by the first author. Although students were not randomly assigned to course sections, it is reasonable to expect that the groups were similar in terms of any possible confounding influence (e.g., motivation) due to the fact that students typically chose a particular section based on factors unrelated to the variables of interest in this study (e.g., scheduling with other courses). Enrollment in this course typically consists of students who are 19-22 years of age, and approximately 80% of the students are female. Participating students were not informed of any aspects of the study in order to preclude possible biases. However, ethical approval to conduct the study was obtained with the assurance that (1) the data were obtained entirely from students' grades on normal course assignments and tests, such that conducting the study did not interfere with any aspect of the course (i.e., content coverage, teaching, evaluation); and (2) every student was given the same set of tasks (i.e., assignments and tests) and was assessed according to the same standards.

For a student's data to be included for analysis, he or she had to meet all three of the following criteria. First, to help ensure that a student was exposed to the independent variable (i.e., lower- and higher-order questions on assignments) to an acceptable degree, a student had to have submitted at least five of the six assignments in the course. Second, data were retained only for students who obtained a total score of 41 or less out of 48 on test questions based on lower-order assignment questions, thus making it possible for a student's score on higher-order test questions to exceed that on lower-order questions. A ceiling effect was found for some students who obtained near perfect scores on most of the test questions pertaining to lower-order assignment questions, such that the student's scores on test questions corresponding to higher-order assignment questions could, at most, be only marginally higher than his or her scores on lower-order test questions. Third, to obtain a complete measure of the dependent variable, a student had to have written all three tests in the course. Forty-one students failed to meet one or more of these three criteria, leaving 90 students for the analyses.¹

Materials

The independent variable was the level of questions (lower- vs. higher-order) in each of six assignments over the entire course. In all sections, each assignment focused on two consecutive chapters of the textbook. Most students completed each assignment within 2-3 pages. The textbook used in each section was Educational Psychology, 6th edition, by Gage and Berliner (1998). In terms of grading for each section, each assignment was given equal weight, with all six assignments collectively accounting for 20% of the final grade in the course.

In the first two class sections, each assignment contained six lower-order questions based on one chapter and two higher-order questions based on the other. For example, Assignment 1 contained two higher-order questions based on Chapter 1 of the textbook, and six lower-order questions covering Chapter 2. In total, the six assignments covered 12 chapters of the textbook. Given that the chapters from which higher- and lower-order assignment questions were drawn were selected at random, and that higher-order questions were intended to be more difficult than the lower-order questions, it was possible that how well students did on a particular question was partially a function of the content of the chapter. To help deal with this, lower- and higher-order questions on assignments used in the second class section focused on chapters opposite to those used in the first class section. For example, Assignment 1 in the second class contained six lower-order questions from Chapter 1 and two higher-order questions covering Chapter 2.

For the third class section, the structure of the assignments was the same as for the first two sections, except for one main difference. Instead of students doing an assignment containing lower- and higher-order questions written by the instructor, students themselves were required to compose two lower-order multiple-choice and two lower-order short-answer essay questions based on one chapter in the text, and one higher-order multiple-choice and one higher-order essay question for another chapter.

In all three class sections, the dependent variable was performance on critical thinking questions in each of three non-cumulative term tests covering successive thirds of the course. Each critical thinking question was of short-answer essay format, with the answer expected to be roughly half a page in length. Each term test contained four critical thinking questions, with each question focusing on one of the chapters covered in one of the two corresponding assignments. For example, the first test contained four critical thinking questions covering Chapters 1-4 of the textbook, which corresponded to the first two assignments. In terms of grading, the critical thinking questions accounted for about half of the marks on each test, with each test worth 20% of the final grade in the course.

Procedure

During the first class, each student was given a course outline that explained the format of the assignments and tests, along with the exact dates when assignments were due and tests were given. Students were strongly encouraged to complete assignments not only to maximize their final grades, but also as a study aid in preparing for the next test. In the first two class sections, each assignment was handed out in class approximately 1 week before it was due and was graded and returned to students roughly 1 week after the due date. In the third class section, the details of all six assignments were handed out to students during the first class. In addition to assigning a grade for each question on the assignment, written constructive feedback was provided for questions that were not written and/or answered adequately. The majority of the grading in each section was done by a trained assistant who was unaware of this study. In each section, the assistant graded each entire assignment and half of the critical thinking questions in each test (i.e., two questions). The remainder of the grading was done by the researcher/instructor.

FIG. 1. Mean critical thinking scores based on lower- and higher-order assignment questions (Study 1, all classes).

Results

To assess the overall difference between lower- vs. higher-order questions on critical thinking and to identify under which conditions that difference would exist, a 3 × 2 × 3 split-plot analysis of variance was performed. There was one between-subjects factor, namely class section (A) with three levels, and two within-subjects factors, namely level of questions (i.e., lower- vs. higher-order) (B) and testing occasion (C) with three levels. As outlined previously, each test consisted of four critical thinking questions, with two of those questions pertaining to chapters from which lower-order assignment questions were given, and the other two based on chapters from which higher-order assignment questions were given. The dependent variable was the total score on the two critical thinking questions on a test corresponding to either the lower- or higher-order assignment questions. Each critical thinking test question was worth eight marks. Therefore, scores could range from a low of 0 to a perfect score of 16.
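To make the layout of this design concrete, the sketch below (Python; not the authors' analysis code, and the file and column names are hypothetical) checks only the key two-level within-subjects contrast by collapsing across testing occasions. The full 3 × 2 × 3 split-plot ANOVA reported next would require a dedicated mixed-design routine rather than this simplified comparison.

```python
# Minimal sketch of the lower- vs. higher-order within-subjects contrast (Study 1).
# Assumes a hypothetical long-format file with one row per student x test x question level.
import pandas as pd
from scipy import stats

df = pd.read_csv("study1_scores.csv")            # columns: student, test, question_level, ct_score
wide = df.pivot_table(index=["student", "test"],
                      columns="question_level",   # values 'lower' or 'higher'
                      values="ct_score").reset_index()

# Average each student's scores across the three testing occasions,
# leaving one lower-order and one higher-order score per student.
per_student = wide.groupby("student")[["lower", "higher"]].mean()

# Paired comparison of the two conditions; for a simple two-condition
# repeated-measures design, the ANOVA F for this effect equals t squared.
t, p = stats.ttest_rel(per_student["higher"], per_student["lower"])
print(f"t = {t:.2f}, F = t^2 = {t * t:.2f}, p = {p:.3f}")
```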

In this study, the most relevant main effect was the within-subjects effect of level of questions on assignments. Because this factor consisted of two levels, the assumption of circularity was not applicable. Across all classes and testing occasions, the mean score on the critical thinking test questions based on higher-order assignment questions (M = 11.49, SD = 1.84) was significantly higher than the corresponding mean score based on lower-order assignment questions (M = 11.07, SD = 1.76), F(1, 87) = 4.36, p < .05, and the proportion of variance accounted for by this effect was .048.
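The reported proportion of variance is consistent with partial eta-squared recovered from the F ratio and its degrees of freedom; this is a standard conversion added here for clarity and is not shown in the original:

$$\eta_p^2 = \frac{F \cdot df_{\mathrm{effect}}}{F \cdot df_{\mathrm{effect}} + df_{\mathrm{error}}} = \frac{4.36 \times 1}{4.36 \times 1 + 87} \approx .048$$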

A significant two-way interaction was found between level of questions and testing occasion, F(2, 174) = 5.41, p < .01. Figure 1 shows the means of test questions based on lower- and higher-order assignment questions collapsed across the three classes for each time period. It was found that the mean score of critical thinking questions based on higher-order questions (M = 12.34, SD = 2.01) was higher than the mean score of critical thinking questions based on lower-order questions (M = 10.98, SD = 2.41) in the first test, whereas the corresponding mean scores in both the second and third tests did not differ significantly. A significant three-way interaction between level of questions, testing occasion, and class section was also found, F(4, 174) = 54.67, p < .01. Contrary to expectation, in Class 1 the mean critical thinking score based on lower-order questions (M = 10.78, SD = 2.81) was significantly higher than that representing higher-order assignment questions (M = 9.22, SD = 3.61) on the third test. In contrast, the mean critical thinking score based on higher-order questions on the third test in Class 2 (M = 13.18, SD = 1.84) was significantly higher than that representing lower-order questions (M = 10.53, SD = 3.49). Similarly, for Class 3, the mean critical thinking score based on higher-order questions (M = 12.06, SD = 1.98) was greater than that reflecting lower-order questions (M = 9.19, SD = 2.11) on the first test. In sum, while the findings with respect to the main effect for level of questions appear to be encouraging, as the overall mean critical thinking score based on higher-order questions was higher than that associated with lower-order questions, the simple main effect comparisons were complex and confusing, and failed to identify any patterns that might suggest under which particular conditions this effect is more likely to occur.

Discussion

It was predicted that students would earn higher scores on critical thinking test questions when the questions pertained to a chapter from which higher-order assignment questions were given than from a chapter from which lower-order assignment questions were given. The main feature of this study was that it provided a within-subjects comparison of topics taught with and without higher-order questions. As such, some possible confounds such as incoming knowledge of educational psychology or critical thinking ability were controlled for. Overall, the results offer mild statistical support for this hypothesis. However, the findings could be considered more notable for at least two reasons. Although the fact that level of assignment questions accounted for only 4.8% of the variance may not seem particularly noteworthy, when considered in an educational context, it could mean the difference of a letter grade (e.g., going from a "C" to a "B") for some students. Furthermore, this finding is even more impressive when one considers the fact that the independent variable (i.e., level of assignment questions) had only two levels, which would tend to attenuate its relationship with a dependent variable that could take on many possible values (i.e., critical thinking test item) (Prentice and Miller, 1992).

One possible reason why this study did not find a stronger main effect for level of questions may stem from the significant interaction between level of questions and testing occasion. Of the three tests, the only significant difference was found on the first test, in which the mean critical thinking score based on higher-order assignment questions was higher than that based on lower-order questions. In each class section, students were informed at the beginning of the course that each of the three tests was in exactly the same format, and was non-cumulative. Therefore, some students may have modified their studying strategies after the first test so as to obscure the effects of the lower- vs. higher-order questions on subsequent assignments. For example, a student may have felt that all four questions on the first test were quite challenging in that they did not simply ask for information to be recalled. Expecting similar types of questions on the next two tests, that student may have studied each chapter more thoroughly (especially those chapters that corresponded to lower-order assignment questions) by thinking about its content in ways that were not covered explicitly in either the text or the lectures. Secondly, in this course, the first test was given relatively early in comparison to when mid-term exams are typically given in other classes. In this situation, some students may not have had other immediate commitments (e.g., essays, tests in other classes) and, therefore, could have had more time to prepare for the first test. Conversely, the second and third tests occurred when it was more likely that students had other commitments, especially the third test happening during the final exam period. A third explanation for the seemingly small main effect has to do with the main limitation of a field experiment. In this study, there was a clear manipulation of the independent variable of interest (i.e., level of questions on assignments), and the topics were randomly assigned to treatment levels. However, because this study was conducted in actual classes, the dependent variable may have been difficult to influence because of extraneous factors.

Considering the conditions under which this research was carried out, it appears that the impact of lower- vs. higher-order questions on critical thinking is notable. In addition to the main limitation outlined above, the fact that this effect was even detectable during the short time span of the course (about 13 weeks) makes this finding even more impressive.

STUDY 2

The purpose of this study was to compare two groups in terms of their mean pre-test to post-test gains in student critical thinking skills in a true experiment. During the treatment phase, all students were given a short passage to read along with review questions to answer. Students in the experimental group were given higher-order review questions, while students in the control group were given lower-order review questions.

Method

Participants

Within the same university as Study 1, the participants in this study consisted of 190 undergraduate students registered in a first-year introductory psychology course. Students who chose to participate in this study were given credit toward fulfilling the research participation requirement in this course. Before the experiment began, all participating students signed a consent form indicating that they were taking part in a study that compares study strategies. Participants were randomly assigned to either an experimental group or a control group, with 96 and 94 students, respectively.

To minimize the degree to which the data used in the analyses were contaminated with outlying values, each subject's data had to meet the following criteria for inclusion. Separate analyses were performed for each of the general and subject-specific critical thinking tests. Looking at the general test, there were three criteria for inclusion. First, because the maximum score on both the general pre-test and post-test was 10 points, a subject's data were removed if he or she obtained a perfect score on the general pre-test, thus precluding pre-test-to-post-test gain. During the treatment phase of the experiment, to help ensure that a subject had put forth a reasonable effort toward reading the passage and answering the review questions that pertained to the passage, each subject had to obtain a minimum score of three out of eight on the review questions. Although this criterion was applied to all subjects, it was intended more for those in the higher-order group to ensure that they had engaged in a sufficient level of higher-order thinking. Of the 26 subjects whose data were removed for failing to meet this criterion, 25 came from the higher-order group. Finally, as an indication that a subject had put as much effort into answering the questions on the general post-test as on the pre-test, and considering how a subject's scores could decline slightly from pre-test to post-test because of factors other than critical thinking ability (e.g., the different questions in each version), a subject's data were removed if his or her post-test score was four or more points below his or her pre-test score. Based on these three criteria, 157 subjects were retained for analyses on the general test, with 66 and 91 subjects in the experimental and control conditions, respectively. With respect to the course-specific psychology portion of the critical thinking test, there were two further criteria for inclusion in the data analysis. First, as with the general subtest, each subject had to obtain a minimum score of three out of eight on the review questions. Because the review exercise (i.e., treatment) was the same in both the general and psychology analyses, the number of subjects removed from each group was the same. Regarding the level of student motivation in completing the post-test, it is possible that a student who is less motivated may simply select answers to the multiple-choice questions at random. With each of the 15 items in the psychology subtest having four options, the probability of someone obtaining up to four out of 15 correct by picking answers at random is .69. Therefore, to help ensure that a post-test score was the result of a genuine effort rather than a random selection, the second criterion for inclusion was that a subject's post-test score had to be at least five out of 15. Based on these two criteria, 95 subjects were retained for analyses on the psychology test, with 37 and 58 subjects in the experimental and control conditions, respectively.
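As a check on the .69 figure quoted above (a reconstruction of the arithmetic, not a calculation reported in the original), treating blind guessing as a binomial variable with 15 items and a .25 chance of success per item gives

$$P(X \le 4) = \sum_{k=0}^{4} \binom{15}{k} (0.25)^{k} (0.75)^{15-k} \approx 0.686$$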

Materials

All subjects were given a pre-test and a parallel post-test measure of critical thinking, each consisting of both a general subtest and a course-specific subtest. Each critical thinking test consists of 25 multiple-choice questions intended to measure the degree to which a student can engage in a particular aspect of critical thinking. Each item is followed by four or five options representing varying degrees of correctness or applicability, from which students selected the most appropriate option. Scores on either test could range from a minimum of 0 to a maximum of 25. To measure general critical thinking ability, the first ten questions in the pre-test and post-test were adapted from the Watson-Glaser Critical Thinking Appraisal (WGCTA). These questions focus on everyday situations that most people would likely be familiar with. To concur with the overall format of the WGCTA, the general items used in this study reflected each of the five main components of critical thinking covered by the WGCTA (i.e., inference, recognition of assumptions, deduction, interpretation, and evaluation of arguments), with two items reflecting each component. The remaining 15 questions focused on a selected passage from a chapter on personality theory from the introductory psychology textbook that was used in the course. The chapter on personality theory was chosen because it was not scheduled to be covered in class for at least a month after the completion of this study.

Between the pre- and post-test measures of critical thinking, all subjects were asked to read the passage on personality theory. This passage consisted of nine textbook pages, which most students read easily within 25 min. Along with the assigned reading, each subject was given a set of review questions that pertained to the passage. The independent variable in this experiment was the level of review questions subjects answered as they read the passage. Subjects assigned to the experimental condition were given four higher-order critical thinking questions, each of which was in a short-answer essay format and answerable within half a page. Subjects in the control condition were given eight lower-order recall questions, each of which was also in a short-answer open format and answerable within a quarter of a page. The reason for assigning a greater number of lower-order questions than higher-order questions was to ensure that students in both groups were spending about the same length of time on the reading assignment before starting the post-test measure of critical thinking. It was suspected that the lower-order questions would be easier and, therefore, take less time and space to answer compared to the higher-order questions.

Procedure

Subjects were tested in groups ranging in size from 3 to 10. To ensure that the experimental and control groups had roughly equal numbers of subjects, and that subjects in both groups were tested at the same times, half of the subjects in each testing session were randomly assigned to the experimental condition and half to the control condition. Before the experiment began, each participant signed a consent form indicating his or her understanding of the experiment and willingness to participate. To encourage students to put forth their best effort, students were told before the experiment began that anyone who scored higher than the predicted population average on both critical thinking tests and the chapter questions would receive a one-dollar lottery ticket at the end of the experiment. In fact, every participant received a lottery ticket. The "predicted population average" was a fictitious goal that enabled the experimenter to justify giving every participant a lottery ticket more easily than would an absolute goal such as 50% correct.

To begin, all subjects were given Form A as the critical thinking pre-test, which most subjects completed in about 20 min. After all subjects had completed the pre-test, they were given the assigned reading and review questions at the same time. Before the students began this phase, they were instructed to briefly look over the questions before reading the passage so they could identify more readily which parts of the passage to pay close attention to in order to answer the questions. Most subjects took approximately 40 min to read the passage and answer all of the questions. After each subject had finished answering his or her review questions based on the assigned reading, they were given Form B as the critical thinking post-test. After completing the post-test, each subject received a debriefing form outlining the purpose of this experiment in detail and a lottery ticket.

Results

On the pre-test, internal consistency estimates for the 10 items in the general subtest and the 15 items in the psychology subtest were .41 and .00, respectively. Those for the corresponding subtests in the post-test were .50 for the general subtest and .08 for the psychology subtest. These estimates for the general subtest suggest that the test items reflect the same construct at least to a moderate degree. On the other hand, surprisingly low reliability estimates for the psychology tests indicate that the test items fail to clearly represent a single construct. An item analysis was performed on each of the four subtests to determine if any particular items had a noticeably detrimental effect on internal consistency reliability. In each of the subtests, it was found that omitting as many as half of the items with the lowest item-total correlations made only a marginal improvement in reliability estimates. Moreover, omitting items had little impact on the degree to which the pre-test and post-test means differed between the control and experimental groups. Therefore, all items were retained for analysis.
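For readers unfamiliar with internal consistency estimation, the sketch below shows how a standard coefficient, Cronbach's alpha, is computed from a respondents-by-items score matrix; the specific estimator used in the study is not identified in the text, and the data array here is a hypothetical stand-in rather than the study's data.

```python
# Cronbach's alpha from a respondents x items matrix of scored responses.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array with rows = examinees and columns = item scores (e.g., 0/1)."""
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of individual item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of examinees' total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

# Hypothetical demonstration data: 150 examinees answering 10 dichotomous items.
rng = np.random.default_rng(0)
demo = rng.integers(0, 2, size=(150, 10))
print(round(cronbach_alpha(demo), 2))   # near zero for unrelated random items, as expected
```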

FIG. 2. Mean pre-test–post-test critical thinking test scores (General) (Study 2).


Mean scores on the review exercises were 7.7 out of eight for the lower-order question group and 4.4 out of eight for the higher-order question group. This indicates that while most students in the lower-order condition had little difficulty in correctly answering the review questions, students in the higher-order condition were not as successful. However, it should be pointed out that the higher-order review questions were clearly more difficult compared to those in the lower-order condition.

The mean score for each group on the general pre-test and post-test is shown in Fig. 2. The two groups showed virtually the same amount of gain, with the control group going from M = 5.95 (SD = 1.81) on the pre-test to M = 6.45 (SD = 1.91) on the post-test, while the experimental group went from M = 6.53 (SD = 1.32) to M = 7.02 (SD = 1.76) on the pre-test and post-test, respectively. Note that the higher mean score of the experimental group compared to the control group on the general pre-test was primarily due to the majority of omitted subjects, who tended to score relatively low on this test, coming from the experimental group, as explained earlier. Using the ANCOVA procedure with pre-test scores as the covariate, there was no significant difference between the groups' post-test scores, with group membership accounting for only 1% of the total variance.
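A minimal sketch of the ANCOVA procedure just described, assuming hypothetical file and variable names (an illustration of the method, not the authors' code):

```python
# ANCOVA for Study 2: post-test critical thinking score by group,
# with the pre-test score entered as a covariate.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("study2_scores.csv")        # hypothetical columns: pre, post, group
model = smf.ols("post ~ pre + C(group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))       # F test for group, adjusted for pre-test
```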

The mean score for each group on the psychology pre-test and post-test is shown in Fig. 3. It may be noted that the group receiving higher-order review questions showed a larger gain from pre-test (M = 5.38, SD = 1.59) to post-test (M = 6.46, SD = 1.45) than the group receiving lower-order questions, with M = 5.16 (SD = 1.63) and M = 5.79 (SD = 0.91) for pre-test and post-test, respectively. Using the ANCOVA procedure controlling for psychology pre-test scores, the mean post-test score of the group that received higher-order review questions was significantly higher than that of the group that received the lower-order review questions, F(1, 92) = 7.94, p < .01, and the effect of the treatment accounted for 7.9% of the total variance.

FIG. 3. Mean pre-test–post-test critical thinking test scores (Psychology) (Study 2).

Discussion

The purpose of this study was to determine the effect of higher-order questions on gain in critical thinking skills. This was a "true experiment" in which all or nearly all possible confounds were controlled for. On the general test of critical thinking, there was virtually no difference (in both statistical and practical terms) between the lower- and higher-order groups in pre-test to post-test gain. However, for the course-specific test of critical thinking, the results showed that students who answered higher-order review questions showed a larger pre-test to post-test gain than students who answered lower-order review questions. In addition to being significant both in statistical and practical terms, this finding is noteworthy for another reason. As with intelligence, it is not unreasonable to think that an appreciable gain in critical thinking skills would occur only after having been exposed to a variety of courses and experiences over an extended period. Therefore, given that one could expect only a small effect under the restricted conditions of this experiment, namely minimal exposure to the independent variable in such a short time frame (less than 2 hours), these results are encouraging.

An ongoing concern in controlled experiments like this is the degreeto which students are putting forth a genuine effort to complete thetasks involved in the experiment. The students who took part in thisstudy did so to help fulll their research participation requirement inthe introductory psychology course. This requirement is based solely onthe number of research studies in which the student participates, andhas nothing to do with the quality of the participation. As an incentivefor the students in this study to try their best in completing the criticalthinking tests and the review questions, each student was promised alottery ticket if his or her scores were above a hypothetical standardon all three tasks (i.e., pre-test, review questions, post-test). Admittedly,in this study, the effectiveness of this incentive is questionable for


First, although both groups showed a pre-test to post-test improvement on the course-specific psychology subtest, it was somewhat surprising that there was not a larger improvement, given that the critical thinking tests referred to a relatively short passage that students had read through for about 45 min. Second, the lower scores obtained on the review questions by students in the higher-order condition (4.4 out of eight, on average) may suggest that these students were not engaged in higher-order thinking as much as was expected.

Another factor that may have affected scores on both the pre-test and post-test measures of critical thinking is the number of answers selected simply by guessing. All students were instructed to answer all of the multiple-choice questions and to take their best guess if they were not completely sure of the correct answer. In this situation, it is more difficult to know whether a correct answer was chosen because the student knew the answer or simply guessed. This would explain why many of the items, particularly those in the psychology subtests, had negative item-total correlations, which diminished internal consistency. With the large amount of error in the observed scores on the psychology subtests, it is more difficult to accurately assess a student's level of critical thinking ability. Therefore, although the findings of the current study are encouraging when the context is considered, the lack of a clearly effective incentive, along with the limitations of the critical thinking measures, indicates that these results should be regarded with appropriate caution.

One way to address the concern regarding students' motivation to try their best on each of the tasks in the experiment would be to provide a more valuable incentive. At the post-secondary level, one of the clearest, most immediate incentives is grades. Therefore, one possibility would be to include the experimental tasks as a small part of a course, with a corresponding weight toward the final grade. To deal with the ethical issues involved, such an experiment could use a within-subjects design (as in Study 1) such that each student receives exactly the same materials from which he or she will be assigned a grade. For example, the study could take place during a class period, with the pre-test and post-test critical thinking measures focusing only on relevant course material. The review questions could consist of both lower-order questions that pertain to one half of the passage (e.g., based on one chapter of the text) and higher-order questions that pertain to the rest (e.g., based on another chapter).


STUDY 3

This study used a correlational design that compared the mean pre-test to post-test gain in critical thinking in each course with the proportion of higher-order questions used on assignments and tests, across courses from different year levels and disciplines.

Method

Participants

The participants in this study were 781 students enrolled in 24 one-semester Year 2 to Year 4 courses in a variety of disciplines within the same university as in Study 1. As with Study 2, the most important criterion for inclusion was that each student had to have completed both the pre- and post-test measures of critical thinking. In addition, to deal with the concern that some students may not have put forth a genuine effort on the critical thinking tests, a cutoff was established both for the pre-test score and for the decline from pre-test to post-test. For a student's data to be included in the analysis, he or she had to have scored at least 18 out of 40 on the pre-test and to have dropped from pre-test to post-test by no more than three points. The only criterion for class mean data to be included was that the instructor had to provide a blank copy of each piece of work (e.g., assignments, tests) that was to be graded. Course materials were obtained from all classes except one. Of the 387 students who wrote both tests, 313 students from 23 classes met all of the above criteria for inclusion.
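As an illustration only, the inclusion rule described above can be expressed in a few lines of code; the file name and column names below are assumptions, not the authors' data format.

```python
# A minimal sketch of the student-level inclusion criteria, under assumed
# column names (this is an illustration, not the authors' code).
import pandas as pd

df = pd.read_csv("study3_students.csv")  # hypothetical file

# Keep students who wrote both tests, scored at least 18 out of 40 on the
# pre-test, and declined by no more than three points from pre- to post-test.
complete = df.dropna(subset=["pretest", "posttest"])
included = complete[
    (complete["pretest"] >= 18)
    & ((complete["pretest"] - complete["posttest"]) <= 3)
]
print(f"{len(included)} of {len(complete)} students met the criteria")
```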

Materials

Gains in critical thinking were measured with abbreviated pre-test and post-test versions of the Watson-Glaser Critical Thinking Appraisal (WGCTA). For each of the two forms, Form A and Form B, roughly half of the items from each of the five WGCTA subtests were used in their original format. The predictor variable in this correlational study was the proportion of higher-order questions on tests and assignments in a course, as indicated by official course documents such as handouts, tests, and the course outline. The criterion variable was the mean pre-test to post-test gain in critical thinking skill.

Procedure

At least 4 weeks prior to the start of the semester, course instructors were contacted by e-mail to obtain permission to include their classes in the study. Instructors who agreed to participate signed a consent form indicating their willingness to participate and their understanding of the study.

In the first 2 weeks of each course, students were given the critical thinking pre-test during the first part of a regular class meeting. Before the test was administered, students were informed that (1) they were participating in a study examining the critical thinking skills of university students, (2) their participation was entirely optional, and (3) their scores would have no bearing on their grade in the course. To counter the possibility that pre-test to post-test gains might be influenced by one version of the test being more difficult than the other, half of the classes received Form A as the pre-test and Form B as the post-test, while the other half received the forms in the opposite order. Almost all students were able to complete both the pre-test and the post-test within 20 min.

Throughout the semester, copies of course outlines, assignments, and tests were obtained from each course instructor. After all course materials were received for a particular course, each question on each piece of work was coded as either a higher-order or a lower-order question. A higher-order question was judged to be one that appeared to reflect one of the top four levels of Bloom's taxonomy of cognitive objectives, namely application, analysis, synthesis, or evaluation. A lower-order question was one that reflected one of the two levels at the bottom of the taxonomy, namely knowledge and comprehension. All questions were coded by the first author as either lower-order or higher-order. While it was relatively straightforward to classify a particular question on a test or assignment, many courses required students to write an essay or give a presentation, and the difficulty here is in determining what proportion of the entire project involved higher-order objectives. For example, a student could be faced with a test question such as "Outline the pros and cons of drug testing in sports." On a test, one can reasonably assume that most if not all of the response to this question would require higher-order thinking. In comparison, a student asked to write an essay or give a presentation on "Drug testing in sports" could submit an essay with most of its content resulting from higher-order thinking, or could give a presentation on the topic that included little more than definitions and facts that are readily available. Therefore, because of the potentially wide range in the proportion of higher-order thinking that might go into essays and presentations, each essay or presentation was conservatively classified as half lower-order and half higher-order.


To obtain an estimate of the proportion of coursework in each of the two Bloom categories, each test question and each essay or presentation was weighted according to how much of the final course grade it represented. For example, if a particular question was worth 20% of a test, and the test was worth 30% of the grade in the course, then that question received a weight of .06. The proportion of higher-order content for a particular course was then obtained by summing the weights of the higher-order questions. To provide an indication of the accuracy with which the test questions were classified, each question in the course materials from five courses was independently classified by another expert rater to provide an estimate of inter-rater reliability. Because the purpose of this research required each question to be classified as either lower- or higher-order, inter-rater reliability was calculated as the proportion of questions in the sample of five courses that both raters classified in the same way. Based on a total of 215 questions from the five courses, both raters gave the same classification for 74.0% of the questions.
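The weighting scheme and the agreement index described above are simple enough to sketch in code. The following illustration uses made-up items and ratings; it is not the coding instrument used in the study.

```python
# A sketch of the grade-weighted proportion of higher-order coursework.
# Each item records its Bloom classification, its share of the piece of work
# it appears on, and that piece's share of the final course grade.
from dataclasses import dataclass

@dataclass
class Item:
    higher_order: float        # 1.0 higher-order, 0.0 lower-order, 0.5 for essays/presentations
    share_of_piece: float      # e.g. a question worth 20% of its test -> 0.20
    piece_share_of_grade: float  # e.g. that test worth 30% of the course -> 0.30

items = [
    Item(1.0, 0.20, 0.30),   # higher-order test question: weight 0.20 * 0.30 = 0.06
    Item(0.0, 0.80, 0.30),   # remaining lower-order questions on that test
    Item(0.5, 1.00, 0.40),   # essay counted half lower-, half higher-order
    Item(0.0, 1.00, 0.30),   # lower-order final assignment
]

# Weights across all graded work should sum to 1 for a complete course.
weights = [i.share_of_piece * i.piece_share_of_grade for i in items]
higher = sum(w * i.higher_order for w, i in zip(weights, items))
proportion_higher = higher / sum(weights)
print(f"Proportion of higher-order coursework: {proportion_higher:.2f}")

# Inter-rater agreement as used in the text: the share of questions that two
# independent raters classified identically (cf. the 74.0% figure reported).
rater_a = ["H", "L", "H", "L", "L", "H"]
rater_b = ["H", "L", "L", "L", "L", "H"]
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(f"Inter-rater agreement: {agreement:.1%}")
```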

In the final 2 weeks of each course, students were given the critical thinking post-test during the first part of a regular class meeting. Each class received the version opposite to the one it had received at the start of the course. Within 2 weeks after the end of the course, a letter was mailed to each participating instructor. This letter served two functions: to thank the instructor for participating in the study and to offer feedback in the form of class mean scores on the pre-test and post-test.

FIG. 4. Scatterplot of mean pre-test – post-test gain in critical thinking test scores and proportion of higher-order questions for each class (Study 3, n = 23).


Results

As the items in each version were adapted from the WGCTA, each version of the test used in this study contains the same five subtests, with eight items in each subtest. In Form A, the internal consistency estimates ranged from .31 for the Deduction subtest to .78 for the Recognition of Assumptions subtest; those for Form B ranged from .11 for the Evaluation of Arguments subtest to .82 for the Recognition of Assumptions subtest. These estimates indicate that while the items in the Recognition of Assumptions subtest (both Forms A and B) focus on a particular construct quite well, the internal consistency of each of the remaining subtests is moderate at best.
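The text does not state which internal consistency index was computed. Assuming a coefficient of the Cronbach's alpha (KR-20) type over the eight dichotomously scored items of one subtest, a minimal sketch would be:

```python
# Illustrative Cronbach's alpha for one 8-item subtest; the 0/1 responses
# below are randomly generated placeholders, not the study's data.
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """item_scores: respondents x items matrix of 0/1 scores."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(0)
scores = rng.integers(0, 2, size=(6, 8))  # 6 hypothetical students, 8 items
print(f"alpha = {cronbach_alpha(scores):.2f}")
```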

In comparing the two versions, it appeared that Form B was slightly more difficult than Form A. Across the 12 classes that were given Form A as the pre-test and Form B as the post-test, the mean pre-test and post-test scores were 27.45 and 27.18, respectively, for a mean decline of .27. Across the remaining 11 classes, which received the tests in the opposite order, the mean pre-test and post-test scores were 25.22 and 27.84, respectively, for a mean gain of 2.62. Because the value of interest was the mean gain in critical thinking scores for each class, the difference between the two groups of classes in their mean gains was added as a constant to the mean gain (or decline) of each class that received Form A as the pre-test, in order to make the two groups of classes more comparable. Therefore, a constant of 2.89 (the difference between −.27 and 2.62) was added to the mean gain of each class that received Form A as the pre-test.
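The equating step amounts to shifting one set of class means by the difference between the two groups' mean gains. A small sketch with hypothetical class means (in the study, the resulting constant was 2.89):

```python
# Illustrative form-equating adjustment; the class mean gains are made up.
gains_a_first = [-0.5, 0.1, -0.3]   # classes given Form A as the pre-test
gains_b_first = [2.4, 2.9, 2.5]     # classes given Form B as the pre-test

def mean(xs):
    return sum(xs) / len(xs)

# Constant = difference between the two groups' mean gains.
constant = mean(gains_b_first) - mean(gains_a_first)
adjusted_a_first = [g + constant for g in gains_a_first]
print(f"constant added: {constant:.2f}")
print("adjusted gains:", [round(g, 2) for g in adjusted_a_first])
```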

Figure 4 shows the scatterplot of the mean gain in critical thinking scores and the proportion of higher-order questions for each class. The proportion of higher-order questions in each class ranged from .11 to .80, with a mean of .36. The mean gain in critical thinking scores ranged from 1.31 to 4.38, with an overall mean of 2.58. The correlation between mean gain in critical thinking scores and proportion of higher-order questions was r(22) = .42, p < .05, with frequency of use of higher-order questions accounting for 17.8% of the variation in critical thinking gain scores.
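For illustration, the class-level association reported above corresponds to a simple Pearson correlation between two vectors of class means; the values below are hypothetical and are not the study's data.

```python
# Illustrative class-level correlation (the study had n = 23 classes).
from scipy import stats

prop_higher_order = [0.11, 0.25, 0.36, 0.52, 0.80]   # per-class proportions
mean_gain = [1.4, 2.1, 2.6, 3.0, 4.1]                # per-class mean gains

r, p = stats.pearsonr(prop_higher_order, mean_gain)
print(f"r = {r:.2f}, p = {p:.3f}, variance explained = {r**2:.1%}")
```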

Discussion

The purpose of this study was to determine the extent to which asking higher-order questions on tests and assignments in university courses is associated with gains in student critical thinking skills. The significant positive correlation found in this study was particularly encouraging given that this was a purely observational study that spanned only about a 3-month period.


An interesting feature of this study, compared to the other two studies, was that it used the same method of assessing the process indicator (i.e., proportion of higher-order questions) as might be used in an actual assessment of educational quality. In addition, this study attempted to classify assignment and test questions in a more accurate manner. Compared to studies such as that of Braxton and Nordvall (1985), where the proportion of higher-order questions was measured on one examination only, this study included all course materials and calculated the proportions based on each question's weight toward the final grade in the course. As pointed out earlier, there are two benefits of this approach. First, it is unobtrusive, as it neither interferes with teaching in the classroom nor requires much time from an instructor. Second, unlike many other process indicators such as library size, the proportion of higher-order questions is one that influences student study activities and is under the direct control of the instructor.

The most obvious limitation of this study was the lack of control over confounding variables such as the effects of other courses. Even within a particular class, it is unlikely that more than a small proportion of students were taking the same combination of other courses. Admittedly, it is difficult to conclude how much a student's gain in critical thinking skills was influenced by the higher-order questions in a particular course compared to what that student experienced in his or her other courses, especially when it was not known what other courses each student was taking. Another reason to regard these results with caution is that, following Bloom (1956), each question was classified as either lower- or higher-order based on the wording of the question. This assumes that the answers to higher-order questions were neither covered in class nor could be taken directly from the text; in other words, that each answer to a higher-order question was a novel response based on inference or deduction from information learned in class or from the text. A related limitation was that the raters who classified each question were not familiar with the content of many of the disciplines in this sample, which may have reduced the level of agreement between the two raters. Finally, although these findings seem encouraging, perhaps an even greater effect would have been found if the study had been conducted over an entire academic year (i.e., two semesters) and had included first-year courses. It is plausible that first-year courses would show the smallest proportion of higher-order questions and, when included with courses at each other year level, would increase the range of that variable. Unfortunately, most first-year courses at this university are too large (over 100 students) to allow for efficient administration of the pre-test and post-test measures.

The main recommendation for this type of study would be to consider training raters in each discipline from which course materials are obtained. For example, an experienced engineering instructor might be better able to classify questions from an engineering course than a researcher from an unrelated discipline would be. In this sense, it would seem easier to train an instructor who is already knowledgeable in a particular discipline to classify questions according to Bloom's taxonomy than to have an instructor who knows Bloom's taxonomy quite well but is unfamiliar with the content of the discipline at hand.

GENERAL DISCUSSION

The present research consisted of three studies designed to determine the impact of higher-order questions, as a process variable, on student gain in critical thinking skills, as an outcome variable. While it is difficult for a single study to avoid at least one serious methodological limitation, the three studies in the present research used different designs, samples, and instruments to address the main design limitations associated with any single study. One study compared the amount of higher-order questions on tests and assignments in actual classes to pre-test to post-test gains in critical thinking; of the other two studies, both experimental, one compared groups of students given lower- vs. higher-order questions in actual classes, and the other was a true experiment done in a laboratory that related the level of review questions to pre-test to post-test gains while controlling for possible confounding variables.

The rationale behind the main research question in these studies was to validate the use of higher-order questions as a process indicator of educational quality. Previous studies have found little relation between typical process variables (e.g., library size, proportion of faculty with a PhD) and student outcomes (see review by Pascarella and Terenzini, 2005). It appears that most of the process variables used in previous research in this area were chosen partly on the basis of expediency and partly on the basis of a presumed relation with student outcomes. In theory, many of those indicators seem reasonable. For example, it would not be unreasonable to believe that students will learn more when taught by more qualified faculty (i.e., those with a PhD). Perhaps one reason why most process indicators fail to show a relation with student outcomes is that they are far too removed from what happens in actual courses and classrooms and, thus, are not directly linked with student learning. In other words, student learning is more likely to be correlated with a process variable when that variable has a direct connection with student learning.


Although we are beginning to learn more about what happens during the college experience that contributes to gains in critical thinking skills (see reviews by McMillan, 1987; Pascarella and Terenzini, 2005), the current studies provide a detailed, empirical look at one particular process variable, namely the frequency of use of higher-order questions, that appears to have a direct impact on the development of critical thinking.

Overall, the findings of this research clearly indicate that students are more likely to improve their critical thinking skills when they have answered higher-order questions in their coursework. These findings are encouraging for at least two reasons. First, reaching the same conclusion with different methods and samples provides converging, and therefore more convincing, evidence of the effect of higher-order questions on gains in critical thinking skills. This effect was found in an observational study and in two experiments, one done in the field and one in the laboratory. Second, these results are impressive when one considers the short time frame in which the studies were conducted. Pascarella and Terenzini (2005) point out that students typically show about a 19% improvement in critical thinking skills over their full university experience, from freshman to senior year. In the present research, an effect was found in two studies conducted over the course of one semester and in a laboratory experiment that lasted under 2 hours. Moreover, Halpern (2001) points out that it would be unreasonable to expect a substantial gain over a short period (i.e., one semester or less).

Beyond the demonstrated validity of using higher-order questions as a performance indicator, several implications stem from this research. As mentioned earlier, the proportion of higher-order questions can be measured easily, objectively, and unobtrusively, and unlike many typical indicators, this variable is less likely to be distorted by inconsistent operational definitions or falsification. Second, the use of higher-order questions may be a more acceptable indicator because, unlike more popular indicators such as library size or selectivity in admission, it is directly under an instructor's control. In other words, an instructor can usually choose what types of questions to use in assignments and tests based on the objectives of the course.

Perhaps the most obvious issue that can be addressed by this type of research is the evaluation of educational quality. In addition to the typical comparisons one could make, such as comparing psychology departments across several schools, this research could help to answer the long-debated question of whether the quality of education has improved or declined over the years.


For example, looking at higher-order questions, one could obtain course materials from courses taught 30 years ago and compare those with materials from the same or similar courses taught today. While many expert opinions have been offered (e.g., Bercuson, Bothwell, and Granatstein, 1984, 1997), very few studies (e.g., Stalker, 1986) have attempted an objective approach to answering this question. Finally, as Braxton (1990, 1993) has pointed out, it appears that some faculty lack sufficient training in preparing higher-order questions. The clear benefits of using higher-order questions in course work implied by the findings of these studies demonstrate the need to include training in this area as part of faculty development.

While the results of this research clearly support the use of higher-order questions as an indicator of educational quality, a couple of important qualifications are necessary. First, it must be emphasized that focusing on higher-order questions does not by any means give a complete picture of educational quality. As covered extensively in their review, Pascarella and Terenzini (2005) conclude that students learn and develop in many ways as a result of attending a post-secondary institution, and these outcomes are influenced by many factors that occur during this period. This research compared only one process with one outcome. Second, despite the findings of this research and the importance of critical thinking skills as an outcome variable, it would be a mistake to cast a negative light on a course that places little or no emphasis on higher-order questions. The proportion of higher-order questions in a particular course ought to be a function of the course objectives. For example, in an introductory course in a discipline such as biology, the objective may be to have students learn a large number of terms to serve as a foundation to draw upon in subsequent upper-year courses, in which they are expected to engage in higher-order thinking when responding to more challenging problems.

A clear limitation in each of the three studies in the present research was the multiple-choice format of the critical thinking tests. One could argue that a student needs to engage in critical thinking in order to differentiate between the most appropriate option and attractively worded distractors. However, it would certainly be preferable to use a critical thinking test that better reflects this process in everyday life and allows students to construct rather than select their answers. A related limitation has to do with the lack of a precise definition of critical thinking. As outlined earlier, critical thinking has been defined as a construct consisting of several aspects. It is possible that the proportion of questions representing each aspect of critical thinking differed from pre-test to post-test, which may have reduced the degree to which the pre-test and post-test were comparable.


This possible variation in the proportion of the types of critical thinking questions may also explain why one version was found to be more difficult than the other in a particular study (e.g., in Study 3, Form B was found to be more difficult than Form A). While the results of this research seem encouraging given its short time frame, clearer effects might have been found over a longer period; Dressel and Mayhew (1954) found that the biggest gains in critical thinking skills occur during the first 2 years of college. A second limitation is that the findings in each of these studies may have been attenuated somewhat by the narrow observed variance in the amount of higher-order questions. Barnes (1983) and Smith (1977) note that college instructors have been found to rely very heavily on lower-order questions. One possible reason higher-order questions are used so infrequently is that faculty are not adequately trained to write them (Barnes, 1983; Braxton, 1990; Nordvall and Braxton, 1996). Another factor could be the academic disciplines examined. Nordvall and Braxton point out that lower-level objectives and lower-level questions may be more appropriate for disciplines such as mathematics and chemistry, which have a high level of consensus on theory and methods, while higher-level objectives are more likely to occur in disciplines such as history or philosophy, which encompass a variety of theoretical perspectives. Similarly, increased variance and statistical power could be obtained by including other academic processes, such as written instructions for term papers and assignments. However, the benefit of assessing the amount of higher-order questions more thoroughly across courses from varied disciplines may pose another problem, as the accuracy with which someone classifies higher-order items may vary with his or her level of knowledge of the subject (Braxton and Nordvall, 1985).

Bloom (1956) outlines two main conditions for being able to correctly classify test or assignment questions. First, one needs to either know, or at least make assumptions about, the context in which the material was learned. For example, if students were given the test item "Outline three criticisms of Piaget's theory of cognitive development," the person classifying this item would have to judge whether the lecture and/or the text merely described Piaget's stages of cognitive development without explicitly covering any particular strengths or limitations, or whether these criticisms were clearly explained. In the former situation one would label this an item at the evaluation level, while in the latter it could be considered a knowledge-level item. Bloom also suggests that classification accuracy can be enhanced by actually attempting to solve the problem, and having judges trained to look for particular phrases within a question can help in this regard.


For example, King (1995) provided a list of phrases, such as "What is the best ___ and why?", that commonly appear in higher-order questions at various levels of Bloom's taxonomy. Therefore, with the assumption that either the answer to such a question has not been explicitly covered in the course or the student has not learned it before starting the course, judges ought to be fairly consistent in at least determining whether or not a given question is higher-order.
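To make the phrase-based approach concrete, a rough and unvalidated sketch of a question-stem matcher in the spirit of King's (1995) lists is shown below; the stem list here is a hypothetical sample, not King's published list, and such a heuristic would at most supplement, not replace, a trained rater.

```python
# Illustrative question-stem matcher; the stems below are assumed examples.
HIGHER_ORDER_STEMS = (
    "what is the best",
    "compare",
    "what would happen if",
    "evaluate",
    "what are the strengths and weaknesses",
)

def looks_higher_order(question: str) -> bool:
    """Return True if the question contains any higher-order stem."""
    q = question.lower()
    return any(stem in q for stem in HIGHER_ORDER_STEMS)

print(looks_higher_order("What is the best explanation of these results, and why?"))  # True
print(looks_higher_order("Define operant conditioning."))                             # False
```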

In addition to the specific recommendations outlined in each study, one general recommendation would be to validate other process indicators that are directly linked with student outcomes. In a manner similar to the approach used in this research, future studies could use different designs varying in their levels of internal and external validity. One viable process variable is the amount of instructor feedback to students. While the amount of feedback has been found to be positively related to admission selectivity (Ewell, 1989), it seems that it has not been explored much in terms of its relation to student outcomes. Another process variable that could be explored further is the level and type of student involvement in the classroom. In an observational study, Smith (1977) found that classroom participation and student-teacher interactions were positively related to gains in students' critical thinking skills.

The aim of this area of research is to determine which aspects of an institution one should examine in order to obtain the most valid assessment of that institution's performance. As the need for post-secondary institutions to be accountable to students, parents, industry, and even investors increases, further research in this area is essential so that more informed decisions can be made regarding the assessment and comparison of schools in terms of the quality of the education they provide.

NOTES

1. While some of the criteria for inclusion are clearly fixed (e.g., having completed all tests), the particular cutoff scores for others may not be as straightforward to justify (e.g., specific test scores). Although the latter type of criteria could be considered somewhat arbitrary, each cutoff score can be at least partly justified in that it was set at an arguably reasonable level in an attempt to address some of the relevant validity issues (e.g., ceiling effect, lack of motivation). In each of the three studies, the pattern of results based on all cases with complete data was similar to that reported here.


REFERENCES

Astin, A. W. (1970). The methodology of research on college input, Part one. Sociology of Education 43: 223–254.
Astin, A. W. (1971). Open admissions and programs for the disadvantaged. Journal of Higher Education 42: 629–647.
Astin, A. W., and Henson, J. W. (1977). New measures of college selectivity. Research in Higher Education 6: 1–9.
Astin, A. W., and Panos, R. J. (1969). The Educational and Vocational Development of College Students, American Council on Education, Washington, D.C.
Ball, R., and Halwachi, J. (1987). Performance indicators in higher education. Higher Education 16: 393–405.
Barnes, C. (1983). Questioning in the college classrooms. In: Ellner, C., and Barnes, C. (eds.), Studies in College Teaching: Experimental Results, Theoretical Interpretations and New Perspectives, D. C. Heath, Lexington, MA.
Bercuson, D. J., Bothwell, R., and Granatstein, J. L. (1984). The Great Brain Robbery: Universities on the Road to Ruin, McClelland and Stewart, Toronto, Canada.
Bercuson, D. J., Bothwell, R., and Granatstein, J. L. (1997). The Crisis in Canada's Universities: Petrified Campus, Random House of Canada, Toronto, Canada.
Bloom, B. S. (1956). Taxonomy of Educational Objectives: Cognitive Domain, McKay, New York.
Borden, V. M. H., and Bottrill, K. V. (1994). Performance indicators: History, definitions, and methods. In: Borden, V. M. H., and Banta, T. W. (eds.), Using Performance Indicators to Guide Strategic Decision Making, Jossey-Bass, San Francisco.
Braxton, J. M. (1990). Course-level academic processes as indicators of the quality of undergraduate education. Instructional Developments 1: 8–10.
Braxton, J. M. (1993). Selectivity and rigor in research universities. Journal of Higher Education 64: 657–675.
Braxton, J. M., and Nordvall, R. C. (1985). Selective liberal arts colleges: Higher quality as well as higher prestige? Journal of Higher Education 56: 536–554.
Braxton, J. M., and Nordvall, R. C. (1988). Quality of graduate department origin of faculty and its relationship to undergraduate course examination questions. Research in Higher Education 28: 145–159.
Bruneau, W., and Savage, D. C. (2002). Counting Out the Scholars: How Performance Indicators Undermine Universities and Colleges, Lorimer, Toronto.
Conrad, C. F., and Blackburn, R. T. (1985). Program quality in higher education: A review and critique of literature and research. In: Smart, J. C. (ed.), Higher Education: Handbook of Theory and Research (Vol. 1), Agathon, New York, pp. 283–308.
Dressel, P. L., and Mayhew, L. B. (1954). General Education: Explorations in Evaluation, American Council on Education, Washington, D.C.
Ennis, R. H. (1985). A logical basis for measuring critical thinking skills. Educational Leadership 43: 44–48.
Ewell, P. T. (1989). Institutional characteristics and faculty/administrator perceptions of outcomes: An exploratory analysis. Research in Higher Education 30: 113–136.
Facione, P. A. (ed.) (1990). Critical Thinking: A Statement of Expert Consensus for Purposes of Educational Assessment and Instruction, American Philosophical Association, ERIC ID 315 423.
Ferguson, N. B. L. (1986). Encouraging responsibility, active participation, and critical thinking in general psychology students. Teaching of Psychology 13: 217–218.


Foster, P. (1983). Verbal participation and outcomes in medical education: A study of third-year clinical discussion groups. In: Ellner, C., and Barnes, C. (eds.), Studies in College Teaching: Experimental Results, Theoretical Interpretations and New Perspectives, D. C. Heath, Lexington, MA.
Furedy, C., and Furedy, J. (1985). Critical thinking: Toward research and dialogue. In: Donald, J., and Sullivan, A. (eds.), Using Research to Improve Teaching (New Directions for Teaching and Learning No. 23), Jossey-Bass, San Francisco.
Gadzella, B. M., Ginther, D. W., and Bryant, G. W. (1996). Teaching and learning critical thinking skills. Paper presented at the International Congress of Psychology (26th, Montreal, Quebec, August 1996).
Gage, N. L., and Berliner, D. C. (1998). Educational Psychology (6th Ed.), Houghton Mifflin Company, Boston.
Gareis, K. C. (1995). Critiquing articles cited in the introductory textbook: A writing assignment. Teaching of Psychology 22: 233–235.
Halpern, D. F. (1996). Thought and Knowledge: An Introduction to Critical Thinking (3rd Ed.), Erlbaum, Mahwah, NJ.
Halpern, D. F. (1998). Teaching critical thinking for transfer across domains. American Psychologist 53: 449–455.
Halpern, D. F. (2001). Assessing the effectiveness of critical thinking instruction. The Journal of General Education 50: 270–286.
Hossler, D. (2000). The problem with college rankings. About Campus 5: 20–24.
Keeley, S. M., Ali, R., and Gebing, T. (1998). Beyond the sponge model: Encouraging students' questioning skills in abnormal psychology. Teaching of Psychology 25: 270–274.
King, A. (1989). Effects of self-questioning training on college students' comprehension of lectures. Contemporary Educational Psychology 14: 1–16.
King, A. (1990). Enhancing peer interaction and learning in the classroom through reciprocal questioning. American Educational Research Journal 27: 664–687.
King, A. (1995). Inquiring minds really do want to know: Using questioning to teach critical thinking. Teaching of Psychology 22: 13–17.
Logan, G. H. (1976). Do sociologists teach students to think more critically? Teaching Sociology 4: 29–48.
McGuire, M. D. (1995). Validity issues for reputational studies. In: Walleri, R. D., and Moss, M. K. (eds.), Evaluating and Responding to College Guidebooks and Rankings, Jossey-Bass, San Francisco, pp. 45–59.
McMillan, J. H. (1987). Enhancing college students' critical thinking: A review of studies. Research in Higher Education 26: 3–29.
Nedwick, B. P., and Neal, J. E. (1994). Performance indicators and rational management tools: A comparative assessment of projects in North America and Europe. Research in Higher Education 35: 75–103.
Nordvall, R. C., and Braxton, J. M. (1996). An alternative definition of quality of undergraduate education: Toward usable knowledge for improvement. Journal of Higher Education 67: 483–497.
Pascarella, E. T., and Terenzini, P. T. (2005). How College Affects Students: A Third Decade of Research (Vol. 2), Jossey-Bass, San Francisco.
Paul, R. (1993). Critical Thinking: How to Prepare Students for a Rapidly Changing World (3rd Ed.), Sonoma State University Center for Critical Thinking and Moral Critique, Rohnert Park, CA.
Prentice, D. A., and Miller, D. T. (1992). When small effects are impressive. Psychological Bulletin 112: 160–164.


Scheirer, M. (1994). Designing and using process evaluation. In: Wholey, J. S., Hatry, H. P., and Newcomer, K. E. (eds.), Handbook of Practical Program Evaluation, Jossey-Bass, San Francisco.
Smith, D. G. (1977). College classroom interactions and critical thinking. Journal of Educational Psychology 69: 180–190.
Smith, D. G. (1980). College instruction: Four empirical views. Instruction and outcomes in an undergraduate setting. Paper presented at the Annual Meeting of the American Educational Research Association, Boston, April 1980.
Solmon, L. C. (1973). The definition and impact of college quality. In: Solmon, L., and Taubman, P. (eds.), Does College Matter? Academic Press, New York, pp. 77–105.
Solmon, L. C. (1975). The definition of college quality and its impact on earnings. Explorations in Economic Research 2: 537–587.
Stalker, R. G. (1986). Is the quality of university education declining? Survey of faculty attitudes and longitudinal comparison of undergraduate honours theses. Unpublished Honours Thesis, University of Western Ontario, London, Canada.
Tan, D. L. (1986). The assessment of quality in higher education: A critical review of the literature and research. Research in Higher Education 24: 223–265.
Tsui, L. (1999). Courses and instruction affecting critical thinking. Research in Higher Education 40: 185–200.
Tsui, L. (2002). Fostering critical thinking through effective pedagogy: Evidence from four case studies. Journal of Higher Education 73: 740–763.
Watson, G. B., and Glaser, E. M. (1980). Watson-Glaser Critical Thinking Appraisal, The Psychological Corporation, San Antonio.
Webster, D. (1992). Are they any good? Rankings of undergraduate education in U.S. News and World Report and Money. Change 24: 19–31.
Williams, R. L., Oliver, R., Allin, J. L., Winn, B., and Booher, C. S. (2003). Psychological critical thinking as a course predictor and outcome variable. Teaching of Psychology 30: 220–223.
Williams, R. L., Oliver, R., and Stockdale, S. (2004). Psychological versus generic critical thinking as predictors and outcome measures in a large undergraduate human development course. Journal of General Education 53: 37–58.
Williams, R. L., and Stockdale, S. L. (2003). High performing students with low critical thinking skills. Journal of General Education 52: 199–225.
Willis, S. A. (1992). Integrating levels of critical thinking into writing assignments for introductory psychology students. Paper presented at the annual meeting of the American Psychological Association, Washington, DC, August 1992.
Winne, P. H. (1979). Experiments relating teachers' use of higher cognitive questions to student achievement. Review of Educational Research 49: 13–50.

Received September 6, 2005
