

JOURNAL OF RESEARCH IN SCIENCE TEACHING VOL. 32, NO. 7, PP. 749-775 (1995)

Evaluating Prospective Elementary Teachers’ Understanding of Science and Mathematics in a Model Preservice Program

Teresa M. McDevitt and Rod Troyer

Educational Psychology, College of Education, University of Northern Colorado, Greeley, Colorado 80639

Anthony L. Ambrosio

Department of Administration, Counseling, Educational and School Psychology, Wichita State University, Wichita, Kansas 67260-0123

Henry W. Heikkinen

Mathematics and Science Teaching Center, University of Northern Colorado, Greeley, Colorado 80639

Erica Warren

Department of Educational Psychology, Research and Measurement, University of Georgia, Athens, Georgia 30605

Abstract

Two cohorts of students preparing to become elementary teachers participated in a model program in science and mathematics. These students were compared to other students taking similar courses on their conceptual understandings of science and mathematics, their investigative proficiencies, and their beliefs about effective methods of teaching these subjects. Instruments included newly developed tests of understanding, existing standardized achievement tests, and instruments devised for the evaluation to elicit conceptions of appropriate ways to teach science and mathematics to elementary children. Results from individual courses indicated that students participating in the model program developed more thorough understandings and more reform-minded beliefs related to teaching science and mathematics. Issues associated with the assessment and evaluation of innovative programs in science and mathematics are discussed, and recommendations for teacher preparation are offered.

Nationwide, clusters of new curricula and methods are being tested on the next generation of teachers. Without scrutiny, we cannot claim that these innovations merit continuation or even refinement. Unfortunately, comprehensive data that might guide teacher education programs are scarce (Galluzzo, 1986; Galluzzo & Craig, 1990). Without public discussion of evaluation data, reform will remain rhetoric (see Hurd, 1993). The present investigation adds to the limited existing discourse on the impact of different teacher education programs. It focuses specifically

© 1995 by the National Association for Research in Science Teaching. Published by John Wiley & Sons, Inc. CCC 0022-4308/95/070749-27


750 McDEVITT ET AL.

on the conceptual understandings, beliefs about teaching, and academic achievement in science and mathematics of students preparing to teach at the elementary level.

Need for Reform in the Preparation of Teachers in Science and Mathematics

The fundamental position played by science and mathematics in today’s society underscores the need to inspire citizens to high levels of mastery. Yet, confidence in this attainment has eroded. Reproaches have been issued about an unprepared citizenry and a diminished pipeline of trained professionals (cf. Berryman, 1983; Jones, 1988). Many educational, political, and corporate organizations observe that optimal preparation in science, mathematics, and other subjects is not taking place in America’s schools (e.g., American Association for the Advancement of Science, 1989; National Commission on Excellence in Education, 1983; National Science Board Commission on Precollege Education in Mathematics, Science, and Technology, 1983). These observations have been elaborated on by researchers who conclude that instruction in these and other subjects is conceptually fragmented, inaccurate, unmotivating, unfocused, biased to encourage some subgroups over others, and misguided in its emphasis on memorization of isolated facts (e.g., Brown & Campione, 1991; Brown, Cooney, & Jones, 1990; Dempster, 1993; Eylon & Linn, 1988; Kahle, 1990; Oakes, 1990; Tobin, 1990).

Reform movements have gained force to address these inadequacies; these movements focus on the need to prepare teachers to enhance meaningful learning and conceptual change on the part of children, incorporate investigative hands-on activities, and relate material to children’s previous understandings and current interests (e.g., Champagne & Bunce, 1991; National Council of Teachers of Mathematics, 1991; St. John, 1992). Many educators also propose that teachers learn to encourage children to participate in small groups and become aware of how to ensure that all children are challenged and encouraged. Finally, such transformations of the curriculum have implications for assessment; teachers must learn to assess children’s learning in valid and sensitive ways.

An Attempt at Reform: The Preservice Elementary Science/Mathematics Project

At the University of Northern Colorado, a group of faculty, administrators, and experienced elementary teachers initiated an optional program of study in 1987 for prospective elementary teachers. The university received funding from the National Science Foundation to design, implement, and evaluate a model program in science, mathematics, and related education courses. As part of the program, students completed nine courses, most of which contributed to the fulfillment of customary general education, professional teacher education, and elementary certification requirements at the university. Students completed the program during their first 2.5 years on campus.

With one exception (Equity Issues in a Technological Society), all program courses were based on existing university offerings. Program courses were substantially revised to address project objectives. Instructors of project courses encouraged students to solve problems, conduct experiments, and participate in other engaging hands-on activities. Course materials and teaching strategies that had a bearing on elementary classroom activities were highlighted, and the relevance of material to students’ everyday lives was addressed. Cooperative learning was a key feature of instruction, and project instructors worked in team capacities with experienced elementary teachers in course revision and delivery (Heikkinen, McDevitt, & Stone, 1992).

Previous reports on the project have documented a positive effect on students’ attitudes, concerns, and beliefs about teaching and learning (Constas, 1992; Gardner, McDevitt, & Constas, 1990, 1991; McDevitt, Heikkinen, Alcorn, Ambrosio, & Gardner, 1993). For example, project students became more eager to teach science and mathematics than did a comparable control group, and they became more concerned with sharing their insights with colleagues and extending the strategies they had learned. Yet, the objectives of the project extended beyond attitude change to improvements in understanding of content and methods of teaching. In this article, we report evaluation results about students’ conceptual understanding and achievement in science and mathematics and their views about teaching these subjects.

Program Evaluation Issues

Making judgments about the merit of a new program is risky business. Typically, the entire research enterprise is conducted quickly and at low cost. Evaluators have restricted freedom to design programmatic features that might clarify causal effects. Threats to validity abound. Not surprisingly, program evaluation data frequently have no use beyond their immediate sphere. Yet, evaluation data are sorely needed in teacher education.

The present evaluation was designed to provide comprehensive information about the experiences of students participating in a model program in science and mathematics. Project students were compared with other students enrolled, during the same semester, in nonproject sections of the same courses. Additional information about methods and threats to validity is reported elsewhere (McDevitt et al., 1993).

Evaluative information was obtained at the end of the semester from project students enrolled in specific project courses and from other students enrolled in nonproject sections of the same courses. Occasionally, we were able to compare the end-of-semester data to comparable information collected at the beginning of the semester. There was no control section for the Equity Issues in a Technological Society course because it was offered only to project students (for evaluation data related to this specific course, see Ambrosio, McDevitt, Gardner, & Heikkinen, 1991; Gardner et al., 1990, 1991).

Differences between Project and Control Courses

Project and control instructors came from a variety of backgrounds. With the exception of one control instructor, all possessed doctoral degrees. Both groups included diversity in backgrounds; slightly more of the project instructors were tenured or held tenure-track positions. On a few occasions, project instructors taught both project and nonproject sections of a course. In addition, project students have been previously compared with other university students; they were similar in entry achievement and attitudes toward science and mathematics to other students preparing for elementary teaching who matriculated at the university as freshmen at the same time (McDevitt et al., 1993). All project students were seeking certification in elementary teaching. With only a few exceptions, students in control sections of courses were likewise seeking certification in elementary teaching. The few exceptions were control students who sought certification at another level (e.g., secondary).

Programmatic differences existed in the overall experiences of students participating in project and control sections. Students in control sections had numerous options in their programs (for example, they had more flexibility in the specific science courses they took, and the sequence and stage in which they took them). Thus, it is not possible to depict a uniform program. Project students, in contrast, participated as two cohorts in a defined sequence of nine courses. This defined sequencing of courses contributed to conceptual coherence across courses (instructors teaching courses later in the sequence were assured that certain concepts had been encountered by students in earlier courses) and to significant social support among students, who came to develop social bonds over time as they repeatedly worked together. Project



students were informed of objectives during recruitment and subsequent administrative and class meetings.

In addition, we assume that control students had less exposure to instructional strategies targeted by project instructors, such as inquiry-based methods (e.g., as implemented through learning cycle perspectives), cooperative learning techniques, attention to the needs of students from diverse backgrounds, and selection of content underscoring connections to everyday decision making and problem solving. Lecture is perhaps the most common format of university science instruction (Spector, 1987), and student comments indicated that lecture was more often featured in control sections than in project sections. In addition, project students were more often exposed to classroom teachers, who served a key role in the revision and delivery of project courses (Heikkinen et al., 1992). Finally, project courses tended to include earlier and more frequent field-based experiences (McDevitt et al., 1993). Hence, project sections attempted to incorporate several proposed strategies for reform, whereas control sections were more typical of our nation’s broad survey-coverage orientation.

Nature of Methods Used in the Evaluation

Careful consideration must be given to evaluation methods employed with innovative programs. In particular, we must address limitations identified with traditional standardized instruments (see Haladyna, Nolen, & Haas, 1991; Morison, 1992; Romberg, Zarinnia, & Collis, 1990; Tucker, 1991). Out of these concerns have come arguments for alternative methods (Baker, 1988, 1990; Baron, 1987; Resnick, 1987). In contrast to many traditional assessment devices, alternative assessments may incorporate a variety of approaches; they give students time to answer in a thoughtful manner; they are not restricted to paper-and-pencil assessments of skills and attitudes; they include measurement of classroom dynamics as well as the performance of individuals; and they are analyzed quantitatively and qualitatively. In addition, the assessments are integrated with learning activities (Kulm & Stuessy, 1991). Specific to science and mathematics, tests allow assessment of conceptual knowledge, process skills, and higher-order thinking in addition to factual knowledge and skills (Murnane & Raizen, 1988).

In this evaluation, instruments were designed with the following characteristics. In most cases, our assessment devices allowed students to phrase responses in their own terms, and not merely to choose among defined alternatives. Researchers have depicted higher-order thinking as nonalgorithmic (Resnick, 1987); thus, our assessment devices had to allow students to go beyond taking a path of action that was fully specified in advance. Interrater reliability of responses to open-ended tasks was addressed by ensuring that two or more raters always established reliability blinded to group membership. Concerns are often raised about the generalizability of performance assessments; a range of different types of instruments was administered to cope with this possible limitation. Students’ understanding of subject matter content in science and mathematics, methods of teaching these subjects, and familiarity with related educational issues, such as equity concerns, combine to affect children’s conceptual understanding in these domains (Ball & McDiarmid, 1990; Shulman, 1986). Hence, these general capacities merit examination in the evaluation of this and other programs.

A range of conceptual understandings and beliefs was assessed: (a) scientific investigative skills (e.g., interpreting graphs and data and designing studies); (b) understanding of basic concepts (e.g., formulating applications of concepts such as metric measurement); (c) beliefs about how to teach science and mathematics effectively (e.g., students were asked to describe general effective teaching strategies for science and mathematics); (d) achievement in science and mathematics (e.g., through standardized multiple-choice tests in earth science and physical science); and (e) understanding of how to teach in a fashion that encourages all students (e.g., students were asked to evaluate learning situations).

We expected project students to perform at higher levels than control students on instruments requiring investigative skills such as designing experiments. We also expected that their knowledge of mathematical concepts that nourish science, such as scientific notation, would be more thorough. In addition, we predicted that their beliefs about ideal teaching would be more consistent with project-endorsed strategies, such as using conceptually based hands-on activities and addressing needs of diverse groups of children. We should note that although we tended to hold expectations about how project and control groups would differ in their responses to open-ended questions, we typically developed coding schemes that represented the range of actual responses obtained, rather than merely analyzing preconceived elements.

There were three standardized instruments in the pool of measurement devices. Content was pared down and sharply focused for project students, making it likely that they would perform more poorly on these instruments than control students, who were exposed to more typical survey coverage. We did not view existing standardized tests as prime indicators of understanding for preservice teachers. However, standardized tests have enjoyed a long record of administration, and their psychometric properties tend to be established. Also, when innovative and traditional programs are compared, conclusions on their impact often depend on the match between the curricula and the nature of the instrumentation (Walker & Schaffarzick, 1974). Including such measures provided an indication of whether content paring measurably restricted students’ broad-based knowledge of science and mathematics.

Limitations to instrumentation should be acknowledged. We selected and developed specific instruments based on concepts of the course specified in university catalog descriptions as well as through conversations with course instructors. In addition, the evaluation team endeavored to infuse a variety of formats into the instrument pool, and there was no single conceptual target that pervaded it. This approach, admittedly somewhat fragmented, does offer the advantage of the diversity of formats. Given the dearth of available models of appropriate evaluation instruments for preservice science and mathematics courses, it seemed worthwhile to us to use a range of formats that efficiently measured key concepts. We should also acknowledge that time available to administer instruments was uneven across courses, leading to variation in the depth of information available.

Methods and Results

Overview

In the following sections, instruments administered to two cohorts of students are described for individual courses. These descriptions are followed by summaries of results for particular courses.

Subjects. Two cohorts of project students participated in this model program. The first cohort entered in Fall 1988 (Cohort 1). The second cohort entered in Fall 1989 (Cohort 2). Similarly, there were two cohorts of control students. Students in both project and control groups pursued a variety of academic majors. Comparison of project and control groups indicated similarity on demographic variables (Table 1). There was a slight tendency for the project group to contain a larger proportion of women. Only a small proportion of each group was classified as minority. Actual class configurations and numbers of students taking individual


[Table 1. Frequencies of Background Variables for Project and Control Groups. For each course, the table reports frequencies under four columns (Project/Cohort 1, Control/Cohort 1, Project/Cohort 2, Control/Cohort 2), broken down by gender identity (M/F) and racial/ethnic group (W/H/B/A/N). The individual cell values are not recoverable from this source.]

Note. Abbreviated titles are used for courses. M = male; F = female; W = White; H = Hispanic, Latino, or Chicano; B = Black/African American; A = Asian; N = Native American. Numbers of project students tended to decrease with subsequent courses with attrition. Students in control courses tended to be different individuals from course to course.



assessment devices sometimes differed slightly because no makeup examinations were given. The attrition apparent in Table 1 for project students was almost entirely accounted for by their departure from the university. In virtually all cases, course-control students were also preparing to become teachers. Sample size for project students ranged from 32 to 64; sample size for control students ranged from 7 to 67.

Materials and Results

Courses are described here in the order in which they were taken (although there were typically two project courses offered per cohort each semester, so some of the courses were offered concurrently). Included in each class description is a summary of the course taken from the university’s bulletin and other sources, a description of assessment devices used, and the corresponding results. Unless otherwise stated, only a posttest was given. An α level of .05 was established for statistical tests, although lower obtained probability levels are noted when appropriate.

Earth Science Concepts For Elementary Teachers

Overview

This course was an investigation of the basic concepts of earth science. Typically, instructional formats included lecture, discussion, and laboratory investigations. The project group experienced two laboratory sessions and one lecture per week. The control group received two lectures and one laboratory session per week. Evaluation instruments included one standardized test and one instrument measuring students’ interpretation of a graph and ability to design an experiment.

Assessment Devices

American Geological Institute/National Science Teachers Association Earth Science Test (Cohort 1 only). The American Geological Institute/National Science Teachers Association Earth Science Test (1988, Version A) was designed to assess the outcomes of beginning-level earth science courses for Grades 9 through 12 (Callister & Mayer, 1988). Students completed Part 1 of the test, which consisted of sections on earth history, oceans and lakes, solid earth, and atmosphere and space. A total of 60 multiple-choice questions were selected. Items focused on content knowledge, although some required interpretation of graphs or data.

Investigations in Earth Science (Cohort 2 only). The Investigations in Earth Science instrument was developed for the project to assess students’ understanding of basic earth science concepts and ability to design an experiment. The instrument contained three primary questions. First, an experiment addressing rates at which both soil and water absorb and radiate heat energy was described and pictured, with results shown on an adjoining graph. Students were asked to draw conclusions from the data given. Students received up to 12 points for their interpretations. Specifically, they obtained one point for each of the following relationships they mentioned: (a) There was a similar pattern of change for both soil and water temperatures. (b) Temperature changes were influenced by the presence or lack of light. (c) Heat gained equals heat lost. (d) Soil temperatures change more rapidly than water temperatures. (e) Water retains heat longer. (f) The experiment measured temperatures of three different materials: air, water, and soil. (g) Air temperatures change more rapidly than either soil or water temperatures. (h) Temperatures over soil change more rapidly than temperatures over water. (i) The experiment measured rate variables. (j) The experiment measured four different conditions (temperature of soil, temperature of water, temperature of air over soil, and temperature of air over water). (k) At least three of the temperatures can be ranked. (l) The four temperatures rank in the order: air over soil > air over water > soil > water.

On the second item, students were asked to list four variables that affect how quickly a sample of water evaporates. Students received up to four points for mentioning the variables of temperature, humidity, quantity, (surface) area, wind, and barometric pressure. A follow-up question asked students to formulate a hypothesis regarding how one of the proposed variables might affect evaporation. One point was awarded for stating a hypothesis; an additional point was awarded for justifying it.

In the third part, we provided students with a list of equipment and asked them to design an experiment to test the hypothesis listed in Part 2. One point was awarded for each of the following characteristics: (a) Experiments had construct validity. (b) Experiments included manipulated variables. (c) Simple experiments manipulated only one variable. (d) Nonmanipulated variables were acknowledged. (e) Extraneous variables were controlled. (f) Identified variables were controlled. (g) Conditions minimized the effects of unidentified variables. (h) Analogous models, such as a sponge to model atmospheric humidity, were useful. (i) Actual real-world experimental conditions can provide data that are more valid than analogous models. (j) Some of the selected equipment was used appropriately. (k) Proper experimental control required proper equipment use. (l) Experimental design was feasible, even if incorrectly or incompletely modeled. (m) Experimental design was feasible, correct, and complete. (n) The description of the procedure was clear.
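The capped point tallies used throughout these rubrics (for instance, up to four points for distinct evaporation variables in Part 2) follow a simple pattern. As a hypothetical sketch only, the tally logic might look as follows; the keyword set stands in for the human raters' judgment, which the project actually relied on:

```python
# Hypothetical rubric tally: one point per distinct creditable element
# mentioned, capped at the item's maximum (as in the four-point
# evaporation-variables item). The keyword set below is illustrative.
CREDITABLE = {"temperature", "humidity", "quantity", "area", "wind", "pressure"}

def tally(mentioned_terms, max_points):
    """Count distinct creditable elements in a response, capped at max_points."""
    credited = {term for term in mentioned_terms if term in CREDITABLE}
    return min(len(credited), max_points)

# A response naming five creditable variables still earns only 4 points.
print(tally(["temperature", "wind", "humidity", "area", "pressure"], 4))  # 4
print(tally(["temperature", "color"], 4))  # 1
```

The cap keeps prolific but shallow responses from dominating the score, which matches how the rubric descriptions above bound each item's maximum.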

Interrater reliability was calculated from a sample of 32 students: 16 project and 16 control. Raters coded blinded to group membership. Interrater agreement across 34 items had a mean of 85%, with a range of 66-100%. Correlation of total scores for both raters was r = .88. For all 34 items, Cronbach’s (1951) alpha was .90.
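The reliability indices reported above (percent agreement across items, correlation of raters' total scores, and Cronbach's alpha) are standard computations. A minimal sketch, using made-up rating data rather than the project's, shows how two of them can be obtained:

```python
from statistics import pvariance

def percent_agreement(rater1, rater2):
    """Proportion of items on which two raters assigned the same code."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def cronbach_alpha(items):
    """Cronbach's alpha: items is one list of scores per item,
    each inner list holding one score per student."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

# Made-up binary codings for illustration only.
rater1 = [1, 0, 1, 1, 0, 1, 1, 0]
rater2 = [1, 0, 0, 1, 0, 1, 1, 1]
print(percent_agreement(rater1, rater2))  # 0.75

items = [[1, 1, 0, 0, 1], [1, 1, 0, 0, 0], [1, 0, 0, 0, 1]]
print(cronbach_alpha(items))  # ~0.75
```

Alpha compares the summed item variances to the variance of total scores; when items covary (students strong on one item tend to be strong on others), the total-score variance exceeds the sum and alpha rises toward 1, as in the .90 reported above.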

Results

American Geological Institute/National Science Teachers Association Earth Science Test. No significant differences were found between groups. Out of a possible 60, the project mean was 34.5 (SD = 8.01); the control mean was 36.3 (SD = 8.05).

Investigations in Earth Science. Project students outperformed control students on both the conceptual and design portions of the instrument. The project students’ grand mean was 16.68 (SD = 6.34); control students’ grand mean was 9.28 (SD = 5.63), F(1, 116) = 44.88, p < .001. Significant group differences were found for all subtotals. The largest difference was on the third question (design an experiment). The project mean was 7.27 (SD = 4.15), whereas the control mean was 2.66 (SD = 4.42), F(1, 116) = 34.14, p < .001. There was also great variation in responses to interpretation of the graph. The project mean was 5.12 (SD = 2.78); the control mean was 3.41 (SD = 2.48), F(1, 116) = 12.37, p < .001. For evaporation variable naming, the project mean was 2.73 (SD = 0.71), whereas the control mean was 2.12 (SD = 1.09), F(1, 116) = 13.12, p < .001. For evaporation hypotheses, the project mean was 1.57 (SD = 0.62), whereas the control mean was 1.09 (SD = 0.78), F(1, 116) = 13.78, p < .001.
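The F statistics in these two-group comparisons come from one-way analyses of variance: the ratio of between-group to within-group mean squares. A sketch of the computation, on toy data rather than the evaluation's scores, is:

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of score lists (one per group)."""
    all_scores = [score for group in groups for score in group]
    n, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    # df_between = k - 1 and df_within = n - k; the comparisons above
    # report F(1, 116), i.e., two groups totaling 118 students.
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy example: two small groups with clearly separated means.
print(one_way_f([[1, 2, 3], [3, 4, 5]]))  # 6.0
```

With two groups, F is simply the square of the corresponding t statistic, so the reported F(1, 116) values carry the same information as two-sample t tests on 118 students.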



Summary of Results. Project and control students did not differ on the standardized test of knowledge of earth science. However, project students were more proficient at designing an experiment, interpreting a graph, defining variables that affect evaporation rates, and formulating hypotheses.

Fundamentals of Mathematics I

Overview

This course is the first of two mathematics courses designed for prospective arithmetic teachers. The course content included numeration, elementary set theory, problem solving, number theory, and a development of the systems of whole numbers, integers, and rational numbers. Evaluation instruments included a multiple-choice test regarding mathematics instructional equipment, and an instrument measuring students’ appreciation of the everyday applications of mathematics as well as its multicultural origins.

Assessment Devices

Math Equipment Test (Cohort 1 only, pretest and posttest). The Math Equipment Test was a 20-item multiple-choice test designed to assess a student’s knowledge about using equipment and exercises (e.g., abacus, the mathematics balance, tangrams) to demonstrate mathematical principles (this instrument was developed by Charles McNerney, an instructor for the course and a member of the project’s senior staff). One example of a test question is: “Tangrams are used primarily with the study of which of the following? (a) Geometry; (b) number theory; (c) the history of mathematics; (d) problem solving; (e) none of the above.” (The correct answer is d.) Internal consistency could not be estimated on this instrument because of variations in item difficulty level.

Utility, Origin, and Applications of Mathematics Test (Cohort 2 only). The Utility, Origin, and Applications of Mathematics test was a 12-item instrument designed for use in the project and intended to measure a student’s perceptions of the use of mathematics. Four items specifically relating to equity issues and mathematical usage were analyzed for this report.

In the first question, we asked students whether they agreed or disagreed with the statement, “In their everyday lives and jobs, few people use very much mathematics.” Tallies were kept of those who agreed or disagreed with the statement, and of those who defended their answer. The second question asked students what they might tell a predominantly Hispanic third-grade class about the history and origins of mathematics. The answers were subjected to categoric analysis. Categories were as follows: (a) Mathematics is acultural (for example, “Everyone uses math”). (b) Mathematics has a Hispanic history (for example, “The Mayans had the concept of zero”). (c) Generic reference was made to the special (unique) needs of the class (for example, “I would have to do some research before I could answer this question”). (d) Generic teaching approaches were described, such as “Bring in role models.” The third question asked the students to name any women mathematicians they could think of, or their contributions to the field of mathematics. Two points were awarded for naming a female mathematician, such as Lady Lovelace, Amelia Earhart, Madame Curie, or “my ninth-grade teacher” (extreme latitude was allowed). One point was given for any generic reference to women’s contributions (for example, “Wasn’t it some lady who wrote the first computer program?”). The fourth question asked the students to list as many manipulatives as they could, along with their associated use and the appropriate concept being addressed.

Interrater reliability between two raters coding blinded as to group membership was calculated from a sample of 24 students: 12 project and 12 control. Across all items there was 95.5% agreement, with a range of 79-100%.
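Percent agreement of this kind is simply the proportion of items on which two coders assign the same category, expressed as a percentage. A minimal sketch; the function name and the ratings below are illustrative, not data from the study:

```python
def percent_agreement(rater_a, rater_b):
    """Return the percentage of cases on which two raters agree."""
    if len(rater_a) != len(rater_b):
        raise ValueError("rating lists must be the same length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)

# Hypothetical codes for one item across 10 students
a = ["agree", "agree", "disagree", "agree", "agree",
     "disagree", "agree", "agree", "agree", "disagree"]
b = ["agree", "agree", "disagree", "agree", "disagree",
     "disagree", "agree", "agree", "agree", "disagree"]
print(percent_agreement(a, b))  # 90.0
```

In practice such agreement figures are computed item by item and then summarized as an overall percentage and range, as reported above.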

Results

Math Equipment Test. A t-test was performed on both pre- and post-test MET scores. Results indicated a significant post-test difference between project (M = 9.90, SD = 2.1) and control course sections (M = 7.01, SD = 2.4), t(74) = 5.42, p < .001. The pretest difference was not significant [M = 6.91 (SD = 2.16) for project and M = 6.50 (SD = 2.27) for control groups]. Thus, project students became more knowledgeable about appropriate equipment and exercises for facilitating understanding of basic mathematical concepts.
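The group comparisons throughout this section rest on two-sample t statistics of the general form t = (M1 - M2) / SE. A stdlib-only sketch of the separate-variance (Welch) version, which the evaluation uses where group variances are unequal; the scores below are invented for illustration:

```python
from statistics import mean, variance

def welch_t(x, y):
    """Separate-variance (Welch) t statistic for two independent samples."""
    vx, vy = variance(x), variance(y)          # sample variances (n - 1)
    se = (vx / len(x) + vy / len(y)) ** 0.5    # standard error of the difference
    return (mean(x) - mean(y)) / se

# Hypothetical post-test scores for two course sections
project = [12, 10, 9, 11, 10, 13, 9, 10, 11, 12]
control = [7, 8, 6, 9, 7, 8, 7, 6, 8, 7]
print(round(welch_t(project, control), 2))  # 6.56
```

The pooled-variance version differs only in how the standard error is estimated; with roughly equal group variances the two statistics are close.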

Utility, Origin, and Applications of Mathematics Test. Results indicated that project students had a more thorough understanding of the multicultural origins of mathematics and its use in everyday life. A repeated measures multivariate analysis of variance showed that the pattern of responses differed significantly between project and control groups, F(1, 59) = 16.15, p < .01. Of the project group, 98% disagreed with the statement, “In their everyday lives and jobs, few people use very much mathematics,” compared to 62% of the control group [F(1, 59) = 19.69, p < .01]. The third-grade Hispanic class question produced a varied pattern of group differences. The largest difference was that 58% of the project group, compared to 15% of the control group, mentioned that mathematics has Hispanic origins [F(1, 59) = 8.33, p < .01]. Another large difference was found for the general pedagogy category, with the control group (38%) responding more often than did the project group (10%) [F(1, 59) = 6.28, p < .01]. The second most frequent response by both groups, that mathematics was acultural, was mentioned by 30% of both the project and control groups. The least common answer, the need to do research, had no response from the control group, compared to 6% of the project group.

When asked to name a specific female mathematician, 23% of the project group and 8% of the control group did so. Another 38% of the project group made a generic reference to a female mathematician, compared to 8% of the control group. By point total for naming female mathematicians (possible range 0-2), the project group had a mean of 0.83 (SD = 0.78), compared to the control group mean of 0.23 [(SD = 0.60), F(1, 59) = 8.33, p < .05]. The project group listed more manipulatives [M = 2.13 (SD = 1.70)] than did the control group [M = 0.39 (SD = 0.51), F(1, 59) = 13.22, p < .01].

Summary of Results. Project students were more knowledgeable about appropriate equipment and activities for demonstrating mathematical concepts. They advocated for the everyday uses of mathematics, were more fluent about the multicultural origins of mathematical progress, and listed more possible manipulatives for teaching mathematics than did control students.

Effective Instruction in Elementary School Mathematics

Overview

The purpose of the course was to examine issues, trends, and practices in elementary school mathematics programs. The course covered content, methods, materials, and foundations of learning and teaching. One instrument focused on students’ categorization and critique of mathematical lessons.

Assessment Device

Alternative Evaluation (Cohorts 1 and 2). The Alternative Evaluation was constructed for use in this evaluation; it measured students’ ability to evaluate mathematical lessons. In Part 1, students were asked to label four mathematical learning activities as being “essentially computational or problem solving in nature.” In addition, students were asked to provide a rationale for their answers. For example, one activity asked children to write down a mathematical sum, hide it from view, and then find three addends from a magic square that would yield that sum. The winner was the child who found that sum first. One point was given to students for correctly labeling the activity; an additional point was given for formulation of an adequate rationale, such as noting that specific computational problems required algorithmic responses, and problem-solving items had answers that were not obvious.

In Part 2, subjects were shown four pictures of teachers and children engaged in educational activities. Students were asked to write a brief paragraph describing how each picture relates to student-teacher interactions. An interaction score was calculated from the number of pictures in which students identified the nature of the interaction between the characters (e.g., “The teacher encourages discussion”). An equity score was calculated from the number of pictures in which students mentioned an equitable or inequitable teaching strategy (e.g., “Stereotypical sex roles are portrayed”).

Cohort 2 received a revised version that dispensed with Part 1, the computational-problem-solving portion, in favor of expanding Part 2. Higher quality photo processing was employed that allowed for clarity of children’s gender, ethnicity, and disability status (for example, a student with Down’s Syndrome was photographed in a natural learning environment). As a negative example of student-teacher interaction, a worksheet of addition problems was included. An example of a learning situation that could be coded as inequitable included a picture of a girl standing in the background taking notes while two boys were doing a hands-on experiment. In total, the revised version included eight examples of learning situations. Students were asked, “What do you like about this lesson?” and “How might it be improved?”

Responses were categorized into one or more of seven instructional features. Up to two points were given per picture in each category (for the number of times they referred to each category). The seven categories included (a) equity (e.g., “Have the girl do the balance and the boy record”); (b) active engagement, such as manipulatives or hands-on experience (e.g., “The students are experimenting and investigating. They will be able to see the sun in relation to the earth: seasons, day, night”); (c) relevance or interest (e.g., “Students would consider it fun, and would be learning too”); (d) application of mathematics to natural phenomena (e.g., “It relates to life”); (e) importance of specific content (e.g., “Learning the concept of money is very important”); (f) cooperative learning (e.g., “I really like the fact that the students are working in groups”); (g) general pedagogic responses (e.g., “It is important to keep their attention so they will grasp onto the concept”).

Interrater reliability for the revised version of the Alternative Evaluation was calculated with 10 project and 10 control students. Two raters coding blinded as to group membership had an average correlation across all seven items of r = .85 (range .71-.95).

Results

Alternative Evaluation. For Cohort 1, the computational-problem-solving score (from Part 1) and equity and interaction subtest scores (from Part 2) were subjected to separate one-way analyses of variance. Significant group differences were found only on the equity component. Project students [M = 0.60 (SD = 1.13)] identified more equitable and inequitable teaching strategies in the pictures than did control students [M = 0.08 (SD = 0.28), F(1, 89) = 5.80, p < .01].

For Cohort 2, a multivariate analysis of variance was computed with group as the between-subjects variable and criteria of evaluation as the within-subjects variable (there were seven levels to represent the categories described earlier). The pattern of responses between groups was different [F(1, 6) = 2.45, p < .05]. Significant subtotal differences were again found only on the equity subtotal, with the project group [M = 1.74 (SD = 2.20)] and control group [M = 0.50 (SD = 0.60), F(1, 70) = 6.45, p < .01]. There was a nonsignificant tendency for the control group responses to focus more on general pedagogic remarks.

For analysis on a more general level, three variables of specific interest to the project (equity, hands-on, and application to natural phenomena) were combined and analyzed as a composite. A t-test showed the composite number of responses to be significantly higher for the project group [M = 9.89 (SD = 4.50)] than for the control group [M = 7.23 (SD = 2.79), t(61.8) = 3.04, p < .01] (a separate-variance estimate was used).

Summary of Results. In Cohorts 1 and 2, project students identified more equitable and inequitable strategies in representations of mathematics lessons. In Cohort 2, project students focused more on project-related concerns (e.g., equitable instruction, hands-on activities). Control students tended to discuss general pedagogic issues more often.

Fundamentals of Mathematics 2

Overview

This course is a continuation of Fundamentals of Mathematics 1, described previously. This course is not typically required for students seeking certification in elementary teaching unless they are also seeking their bachelor’s degree in mathematics. The Cohort 2 control class was small because the curriculum was restructured at the time. One evaluation instrument focused on students’ understanding of basic mathematical concepts, as well as their ability to formulate everyday examples of the concepts.

Assessment Device

Applications of Mathematics Concepts to Natural Phenomenon (Cohorts 1 and 2). The Applications of Mathematics Concepts to Natural Phenomenon, designed for this project, consisted of five open-ended questions. Students were asked to describe or define scientific notation, metric measurement, graphing, polyhedra, and direct and inverse variation. Each question contained a follow-up question asking for examples or applications of the concept.

A coding scheme was developed that represented component features inherent in each concept. For example, scientific notation responses accumulated one point if the answer included a reference to ease of computation, another for referring to exponents in general, another for specifically mentioning a power of 10, and full credit for the scientific definition: x × 10^n, where 1 ≤ x < 10. As another example, students accumulated one point for giving polyhedra definitions implying or stating that they were multiple-sided, one point for solid, one point for three-dimensional, and one point for noncurved surface.
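Component-feature coding of this kind is an additive rubric: each feature detected in a response earns one point. A sketch under the simplifying assumption that features can be flagged by keyword matching; the keyword lists and function are illustrative only, not the project’s actual coding scheme:

```python
# Illustrative component features for the scientific-notation item
FEATURES = {
    "ease of computation": ["easier", "ease", "simplif"],
    "exponents in general": ["exponent"],
    "power of 10": ["power of 10", "power of ten", "10^"],
}

def score_response(text):
    """One point per component feature present in the response."""
    text = text.lower()
    return sum(
        any(keyword in text for keyword in keywords)
        for keywords in FEATURES.values()
    )

answer = "It uses exponents (a power of 10) to make computation easier."
print(score_response(answer))  # 3
```

The study’s raters coded by hand; the point of the sketch is only that each concept decomposes into independently creditable features.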

Interrater reliability was calculated for the Applications of Mathematics Concepts to Natural Phenomenon with 15 control and 15 project students. Two coders categorized responses blinded as to group affiliation. On 35 items, overall interrater agreement was 94%, with a range of 88-100%. Across the 30 subjects, interrater agreement was 95%, with a range of 83-100%. Total scores between raters correlated at r = .99.

Results

A within-subjects analysis of variance was computed with type of variable as the within-subjects variable (10 levels, with five definitions of substantive variables: scientific notation, metric measurement, graphing, polyhedra, and direct and inverse variation; and five variables reflecting the number of examples) and group (project, control) as the between-subjects variable. Patterns of responses were different for the groups [F(9, 58) = 5.01, p < .001 for Cohort 1, and F(9, 36) = 5.70, p < .001 for Cohort 2].

Follow-up t-tests were computed to clarify the domains in which the two groups differed. A subtotal was created of responses to the five substantive variables; a second subtotal was created of the number of examples and applications. Alphas indicated that the two composites represented fairly high degrees of internal consistency. Standardized item alphas were computed (the formula accounts for unequal variances of items) for Cohorts 1 and 2 combined; the definition subtotal alpha was .78, the example subtotal alpha was .71, and the alpha for the entire instrument was .83. For Cohort 1, a t-test showed differences on the definition subtotal between the project group [M = 11.56 (SD = 3.03)] and the control group [M = 7.15 (SD = 3.39), t(66) = 5.61, p < .001], and also for the example/application subtotal, with the project group [M = 11.56 (SD = 3.03)] and control group [M = 7.00 (SD = 4.06), t(66) = 4.56, p < .001]. For Cohort 2, there were also significant differences in the definition subtotal between the project students [M = 12.85 (SD = 3.57)] and control students [M = 7.43 (SD = 3.41), t(44) = 3.72, p < .001], and also in the example/application subtotal, with the project group [M = 13.31 (SD = 5.78)] and the control group [M = 4.57 (SD = 3.46), t(44) = 3.85, p < .001]. Analysis of individual items indicates that the differences between the Cohort 1 project and control groups were largest for direct and inverse variation and scientific notation and smallest for metric measurement. For Cohort 2, differences were largest with direct and inverse variation and scientific notation and smallest with graphing and metric measurement.
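A standardized item alpha is computed from the average inter-item correlation, alpha = k * rbar / (1 + (k - 1) * rbar), which is why it accommodates unequal item variances. A stdlib-only sketch; the item scores are invented for illustration:

```python
from itertools import combinations
from statistics import mean, stdev

def pearson_r(x, y):
    """Sample Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def standardized_alpha(items):
    """Cronbach's standardized alpha: k * rbar / (1 + (k - 1) * rbar)."""
    k = len(items)
    rbar = mean(pearson_r(a, b) for a, b in combinations(items, 2))
    return k * rbar / (1 + (k - 1) * rbar)

# Hypothetical scores on three items for five students
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 4, 4, 5],
    [1, 3, 3, 5, 4],
]
print(round(standardized_alpha(items), 2))  # 0.94
```

Because it works from correlations rather than raw covariances, items measured on different scales contribute equally to the estimate.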

Summary of Results. Project students defined basic mathematical concepts more accurately than did control students. They also provided more applications of concepts.

Educational Psychology

Overview

The Educational Psychology course was concerned with principles and classroom applications of human development, learning, motivation, behavior management, and educational testing and evaluation. One instrument focused on the criteria students mentioned when describing the relevance of educational psychology concepts for teachers.

Assessment Device

Jane Doe (Cohort 1 only). The Jane Doe instrument was a one-item, open-ended question designed to assess a student’s understanding of the relevance of educational psychology to teaching elementary students. Students were asked to imagine that they were a third-grade teacher who was approached by a colleague, Jane Doe, who was considering entering a master’s degree program. However, Jane Doe did not want to study educational psychology because the field “has absolutely no relevance” to teaching science, mathematics, or other subjects. Students were asked to either agree or disagree with Jane Doe and to justify their responses.

Responses to the Jane Doe instrument were coded based on whether they were arguments or counterarguments to Jane Doe’s statement. Counterarguments were further coded into one or more of 10 categories. Each of the categories represented various areas of contribution from educational psychology to teaching. Responses were coded into the following categories: (a) developmental stages or phases (“If you’re teaching something (division, for example) that they are not ready for, you will not be effective”); (b) general characteristics of students (“I feel a teacher should know the background of her students and if they are acting abnormally in class, she might be able to decipher why”); (c) individual differences (“individual students”); (d) equity and multiculturalism (“I’ve learned about inequities on tests, in books, everywhere, and how to overcome them”); (e) learning and cognition (“. . . it would help her learn how students think and retain information”); (f) instructional strategies (“It will help the teacher know how to construct her lesson so that students will learn the most they can in it”); (g) assessment techniques (“. . . how to write . . . tests correctly”); (h) discipline, management, and reinforcement (“The class will . . . help the teacher deal with discipline and classroom management”); (i) motivation and self-concept (“The teacher should be aware of . . . how to motivate the child”); and (j) miscellaneous values (“I think it’s very important to learn educational techniques from the standpoint of what works with children”). Explicit mention of science or mathematics was an additional category. Students received one point for each component mentioned. An interrater reliability of 91% agreement was obtained for the Jane Doe instrument (the range was 79.7-98.4%).

Results

There were no significant differences between project and control students in the number of counterarguments or the number of arguments offered. However, analysis of individual response categories revealed significant differences between project and control students for two specific categories. Project students, more than control course students, cited the importance of educational psychology in providing valuable information about instructional strategies (such as planning lessons, setting objectives, using different instructional techniques, and adopting a teaching style) [χ²(1, N = 64) = 4.95, p < .05; 80.0% of the project and 52.6% of the control students mentioned this component] and assessment (such as monitoring progress of students and providing feedback to students about their performance) [χ²(1, N = 64) = 4.75, p < .05; 37.8% of the project and 10.5% of the control students mentioned this component]. Thus, although project and control students did not differ in the number of domains in which they saw the field of educational psychology as contributing to teaching, there were differences in specific domains. Specifically, project students were more convinced that they could apply information from the field to actions in the classroom, and they also believed that instructional strategies and assessment techniques should be based on research and theory. Other topics that did not differentiate the two groups focused on more theoretically based perspectives on teaching and learning.
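Chi-square comparisons of this kind come from a 2 x 2 table of group (project/control) by response (mentioned/not mentioned). A sketch of the Pearson chi-square for such a table; the counts are a hypothetical reconstruction consistent with the reported N of 64 and the 80.0% versus 52.6% mention rates (group sizes of 45 and 19 are assumed here, not stated in the text):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: project, control; columns: mentioned, did not mention
chi2 = chi_square_2x2(36, 9, 10, 9)
print(round(chi2, 2))  # 4.95
```

The shortcut formula n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)] is algebraically equivalent to summing (observed - expected)^2 / expected over the four cells.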

Summary of Results. Project and control students did not differ in the number of domains of educational psychology they referred to in their analysis of relevancies to everyday teaching. However, project students did focus on topical areas that lend themselves directly to teacher applications (i.e., instructional strategies and assessment methods).

Physical Science Concepts for Elementary Teachers

Overview

This course was an investigation of basic physical science concepts, emphasizing their application to the physical world. Emphasis was on giving prospective teachers a general understanding of physical science. Evaluation instruments included two standardized tests, a measurement requiring students to interpret a graph, formulate predictions, and design an experiment, and an assessment of students’ analysis of physical science instructional activities.

Assessment Devices

Physical Science Exam (Cohort 1 only, pretest and post-test). Cohort 1 took a 50-item multiple-choice examination consisting of items adapted from the Assurance Test Bank (Assurance Incorporated, 1984); the instrument was originally developed as a pool of eighth-grade physical science items. The Physical Science exam was intended to serve as a broad survey of factual knowledge in this domain. Some items were corrected or slightly altered. The test achieved an alpha of .63.

Investigations in Physical Science (Cohorts 1 and 2). Investigations in Physical Science was designed for the project to assess students’ understanding of certain concepts of physical science, as well as their ability to design experiments in this area (slight improvements were made in the instrument and coding scheme between Cohort 1 and 2 administrations; the revised versions are reported here). The Investigations in Physical Science consisted of three parts. The first was a distance-time graph of two cars. Students were asked to compare the speed, acceleration, and distance of two cars. Students received up to two points for the distance comparison if they addressed distance traveled or noted that distance was linearly related to time for Car 1 but not Car 2 (e.g., “Car 1 went farther”). For speed, students received two points if they explicitly compared the speed of the cars (e.g., “Car 2 traveled at a greater speed than Car 1”). We awarded students two points for acceleration if they explicitly addressed this dimension (e.g., “Car 1’s speed is constant; Car 2’s speed is changing”). Students were awarded partial credit for responses that alluded to these concepts but did not articulate them directly.

With the second part, students were asked to make predictions concerning a circuit drawing of a voltage source and three lightbulbs, with the third lightbulb being shorted out. Students were asked to assess three consequences of the circuit’s being shorted out: specifically, “What happens to the brightness of Bulb 3? Why?” and “What happens to the brightness of Bulbs 1 and 2? Why?” Students received up to two points for noting that Bulb 3 dimmed or went out. They received up to three points for noting that electricity follows the path of least resistance as a reason for their predictions. They also received up to two points for noting that the brightness of Bulbs 1 and 2 would increase and an additional three points for noting that the current would increase. The third part of the Investigations in Physical Science asked students to complete the design of an experiment concerning the molecular weights of two gases, and then to hypothesize the results. In the first version of the instrument (for Cohort 1), students were also asked to design an instrument.

Interrater reliability was calculated for the 1988 version of the Investigations in Physical Science from a sample of 31 students: 15 control and 16 project. There were two coders, each coding the sample blinded as to group membership. On 25 items, overall interrater agreement was 96.5%, with a range of 84-100%. Across the 31 subjects, interrater agreement was 96.1%, with a range of 83-100%. Correlation of total scores between raters was r = .98.

Classroom Activity Instrument (Cohort 2 only). The Classroom Activities instrument assessed students’ beliefs about the appropriateness of predefined instructional activities for addressing specific constructs. The instrument consisted of 16 classroom activities, such as making fudge or demonstrating a Van de Graaff generator, from which the students were to choose the single best and worst activities for enhancing children’s understanding of specific constructs. Eight of the examples were ones project and control students had seen in class; eight were novel activities.

In addition, the Classroom Activity instrument assessed students’ ability to recognize concepts in applied settings. Students were asked to indicate which concepts were addressed by the activities. A total of 13 constructs were listed along the top of the instrument: mass, weight, acceleration, electricity, saturation, states of matter, buoyant force, solubility, crystal formation, density, centripetal force, chemical reaction, and gravity. A pool of experts in physical science education (2 professors, 1 classroom teacher, and 1 graduate student) was formed to substantiate identification of good and poor teaching activities and the appropriate knowledge level responses.

For each individual classroom activity, the total score was calculated by summing one point for each correctly identified concept minus a half point for each concept erroneously identified. The Classroom Activity instrument was baited with two non-hands-on activities (a worksheet on Boyle’s law and a lecture on static cling) with the expectation that many students would list one as the worst classroom activity. Correct best activities were taken from a larger pool of answers including mixing baking soda and vinegar, rubbing a balloon on one’s head and sticking it on the wall, and the Van de Graaff demonstration. For the knowledge portion of the Classroom Activity instrument, alpha was .91.
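The knowledge scoring rule (one point per correct identification, half a point off per erroneous one) can be expressed directly as set operations. A sketch; the concept names come from the instrument, but the expert key and student response below are invented:

```python
def activity_score(identified, correct):
    """+1 for each correctly identified concept, -0.5 for each error."""
    hits = len(identified & correct)         # concepts correctly identified
    false_alarms = len(identified - correct)  # concepts erroneously identified
    return hits - 0.5 * false_alarms

# Hypothetical expert key and student response for one activity
correct = {"electricity", "states of matter", "chemical reaction"}
response = {"electricity", "chemical reaction", "gravity"}
print(activity_score(response, correct))  # 1.5
```

The half-point penalty discourages students from simply marking every construct for every activity.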

National Assessment of Educational Progress: Physics and Chemistry Portions Only (Cohort 2 only, pretest and post-test). Cohort 2 completed an examination consisting of 28 items taken from the National Assessment of Educational Progress test. Items focused on physics and chemistry and were selected from the secured and public release items for Grades 11 and 12 (National Assessment of Educational Progress, 1986). Items were selected based on their relevance to the concepts typically taught in the course. The internal consistency of this instrument could not be measured because of sizable variations in item difficulty level.

Results

Physical Science Exam. Group differences were found only on the pretest [F(1, 23) = 12.34, p < .001]. The mean for the project group pretest was 27.02 (SD = 5.27), with M = 23.92 (SD = 4.30) for the control group. Post-test scores were as follows: project M = 34.60 (SD = 4.87); control M = 34.09 (SD = 4.96).

Investigations in Physical Science. Cohort 1 project students had an overall mean of 14.53 (SD = 4.20), whereas Cohort 1 control students had an overall mean of 6.04 (SD = 5.58). This difference was significant [F(1, 89) = 65.99, p < .001]. Differences were found for all subtotals, with statistics ranging from F(1, 89) = 7.00 to 85.56, with all ps < .01. The largest difference was on the design an experiment subtotal, with the project group scoring M = 5.67 (SD = 1.71), and the control group scoring M = 1.27 (SD = 2.28).

On the revised version of the Investigations in Physical Science, an analysis of variance showed a smaller but significant difference between Cohort 2 project and control students [F(1, 115) = 9.80, p < .002]. Cohort 2 project students scored a mean of 9.60 (SD = 3.63); control students scored a mean of 7.43 (SD = 3.52). No significant group differences were obtained for the graph subtotal.

Classroom Activities Instrument. Of the project group, 51% correctly identified a worst answer, compared to 35% of the control group. A one-tailed t-test showed this to be a significant difference [t(116) = 1.71, p < .05]. The scores on the best answer (project = 22%, control = 13%) were not significantly different. Significant differences were also found for total score for knowledge [t(114) = 2.71, p < .01], with the project group scoring M = 15.09 (SD = 4.48), and the control group scoring M = 12.17 (SD = 6.02) (maximum score was 35). Surprisingly, both groups had better knowledge scores for novel activities than for activities they had seen in class. In both situations, the project group scored significantly higher than the control group. For novel stimuli, the project mean was 8.54 (SD = 3.33) and the control mean was 7.13 (SD = 3.56), t(114) = 2.08, p < .02. For in-class stimuli, the project mean was 6.55 (SD = 1.86); the control mean was 5.11 (SD = 3.13), t(113) = 2.70, p < .01.

National Assessment of Educational Progress. No group differences were found on either the pretest or the post-test. The mean for the project group pretest was 14.60 (SD = 3.55), with M = 14.18 (SD = 3.12) for the control group. Post-test scores were as follows: project M = 17.78 (SD = 3.42); control M = 17.25 (SD = 2.79).

Summary of Results. There were no group differences at post-test phases for either of the standardized achievement tests (although project students did score higher on the pretest of one of the achievement tests). Project students did perform at higher levels in designing an experiment; assessing consequences of a circuit being shorted out; and comparing two vehicles in terms of distance traveled, speed, and acceleration (Cohort 1). More project students than control students agreed with experts about worst instructional activities for teaching specific physical science concepts. Project students were also more aware of appropriate links between instructional activities and underlying concepts.

Teaching Science in the Elementary School

Overview

This course examined objectives, methods, and materials needed for teaching elementary science. Students were taught to combine science with other curriculum areas; to analyze and apply a variety of teaching strategies; to manage time and materials effectively; and to analyze, modify, and teach curriculum materials and science lessons. One instrument required students to appraise narratives of science lessons.

Assessment Device

Teaching Science in Elementary School Instrument (Cohorts 1 and 2). The Teaching Science in Elementary School instrument was an open-ended instrument developed for this project to assess students’ ability to evaluate a lesson in terms of the teaching strategies employed. We provided respondents with summaries of two classroom lessons and asked them to evaluate the instruction. The first scenario is a three-page protocol of a 12-minute lesson on how plants grow. In this lesson to a second-grade class, Mr. Diaz begins by writing 11 concepts on the board (e.g., root, soil, germinating seed). He maintains a teacher-dominated recitation mode. When not addressing the entire class, he asks individual boys (only) to respond. He interrupts children who do not respond with a fully correct answer. Students do not have an opportunity to engage in hands-on activities or practice investigative skills. The following represents an excerpt from the lesson:

Mr. Diaz: Some plants need seeds and some don’t. There are two types of plants that make seeds: flowering plants and conifers. With flowering plants, the flowers make the seeds. Flowers also make fruit, and the fruit protects the seeds. Another kind of plant that makes seeds is conifers. Conifers make seeds inside cones. Does anyone know of the name of a plant that makes seeds inside cones?

Julie: Roses make-

Mr. Diaz (interrupting her): Roses are flowering plants, Julie. Mike, do you know?

Mike (trying): Pine cone trees?

Mr. Diaz (nodding his head): Yes, Mike, pine trees make pine cones. They are a type of conifer. . . .

Mr. Diaz plans to drill the children on these concepts and then will switch topics to animals (cold-blooded and warm-blooded vertebrates and animal behavior).

The second scenario is a three-page account of a 22-minute lesson on the effect of light on the appearance of plants. Mr. Montoya, also working with a second-grade class, wants the children to understand the effect of light on the appearance of plants (e.g., color, height, number of leaves). He begins the lesson by asking children about what plants need to grow. He accepts a variety of answers and then demonstrates planting small geranium stalks. In discussing where they should place their plants to determine the effects of light on plant growth, both boys and girls offer suggestions. Mr. Montoya asks the children to return to their groups of three and to plant geranium plants into two cups. Although he offers guidance, the children have flexibility in choosing locations. Mr. Montoya plans to ask the children to chart the growth of the plants over a 3-week period. He will ask them to offer possible explanations for their findings and has plans to extend their understandings.

Thus, each scenario depicts the interactions of the class and teacher. Students were asked to list the best and worst aspects of each lesson. Students were given one point for each aspect mentioned. Tallies were kept in each of the following categories: (a) equitable or inequitable teaching strategy (e.g., “He only involves a few in the lessons,” “He used gender-free language”), (b) wait time or interruption (e.g., “He interrupts,” “He waited before responding”), (c) use of hands-on activities or lack thereof (e.g., “He allowed no active hands-on participation,” “He engaged students in experimentation”), (d) teacher encouragement of meaningful learning or rote instruction (e.g., “He lectures too much,” “He made the lessons meaningful for students”), (e) teacher feedback to elicit positive or negative student emotion (e.g., “He intimidated the students,” “He gave the students a feeling of accomplishment and self-confidence”), (f) cooperative or noncooperative learning strategy (e.g., “There was too much whole-class work,” “He allowed students to work together”), (g) teacher encouragement or discouragement of student ideas and participation (e.g., “There was no encouragement for student involvement or participation,” “He had the students give lots of examples”), (h) lengthy or unhurried lesson (e.g., “The lesson was hurried,” “The lesson proceeded at a good pace”), and (i) general effective or ineffective teacher characteristics (e.g., “He repeats himself too many times,” “He kept lessons well structured”). Interrater reliability was calculated from a sample of 13 project students and 7 control students from Cohort 2. Two raters, coding the sample blinded as to group membership, had an average of 85% agreement across all nine categories, with a range of 75-100%.
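The per-category agreement figure reported here is a simple proportion of matching codes between two raters. As an illustration only (not the authors' actual scoring program), a minimal sketch in Python; the category names and 0/1 codes are hypothetical toy data:

```python
def percent_agreement(rater1, rater2):
    """Percent of students for whom two raters assigned the same code,
    computed separately for each coding category.

    rater1, rater2: dicts mapping category name -> list of 0/1 codes,
    one code per student, in the same student order for both raters."""
    agreement = {}
    for category in rater1:
        codes1, codes2 = rater1[category], rater2[category]
        matches = sum(a == b for a, b in zip(codes1, codes2))
        agreement[category] = 100.0 * matches / len(codes1)
    return agreement

# Toy data: 0/1 codes for 20 students in two hypothetical categories.
r1 = {"equity": [1, 1, 0, 1] * 5, "wait_time": [0, 1] * 10}
r2 = {"equity": [1, 0, 0, 1] * 5, "wait_time": [0, 1] * 10}
result = percent_agreement(r1, r2)
# equity: 15 of 20 codes match -> 75.0; wait_time: all match -> 100.0
```

Averaging the per-category values would give the overall figure analogous to the 85% reported above.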

Results

For Cohort 1, an analysis of variance showed significant group differences in total number of responses [F(1, 59) = 12.92, p < .001]. The project group [M = 9.55 (SD = 2.18)] mentioned more good and bad aspects in the teaching scenarios than did the control group [M = 7.71 (SD = 1.72)]. We also found group differences for Cohort 2 in total number of responses [F(1, 121) = 38.27, p < .001]. The project group [M = 9.79 (SD = 2.18)] mentioned more good and bad aspects in the teaching scenarios than did the control group [M = 7.31 (SD = 2.26)].

Cohort 1 showed individual category differences on the number of comments related to equity [t(59) = 2.88, p < .01], student emotion [t(59) = 2.42, p < .05], and general effective characteristics [t(59) = 2.36, p < .05]. In each case, the project group made more comments than did the control group. Cohort 2 showed differences on equity [t(60) = 4.12, p < .001], wait time [t(60) = 3.05, p < .01], meaningful learning [t(60) = 2.06, p < .05], cooperative learning [t(60) = 3.16, p < .01], and general effective characteristics [t(60) = 3.06, p < .01]. Again, in each case the project group made more comments than did the control group.
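A one-way ANOVA like the one above can be recomputed directly from the reported group summary statistics. The sketch below is illustrative only: the per-group sample sizes were not reported, so the 30/31 split is a guess chosen to match the error degrees of freedom (59), and the resulting F will not exactly reproduce the printed 12.92:

```python
def f_oneway_from_stats(ns, means, sds):
    """One-way ANOVA F statistic from per-group summary statistics
    (sample sizes, means, standard deviations)."""
    k = len(ns)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-groups sum of squares: weighted squared deviations of group means.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    # Within-groups sum of squares, recovered from the group SDs.
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Cohort 1 means and SDs from the text; the 30/31 group sizes are assumed.
f, df1, df2 = f_oneway_from_stats([30, 31], [9.55, 7.71], [2.18, 1.72])
```

For two groups this F is equivalent to the square of the independent-samples t statistic on the same data.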

Summary of Results. Project students were more comprehensive in their analyses of the teaching vignettes. Project students in Cohorts 1 and 2 more often mentioned positive and negative features relevant to equity. There were other criteria that favored project students, but the specific dimensions differed across Cohorts 1 and 2.

Biological Science Concepts for Elementary Teachers

Overview

This course was an investigation of basic biological concepts through lecture, discussion, and laboratory investigations. One instrument included a series of questions about students’ selection of biological concepts to teach to children and their appreciation of the personal relevance of biological concepts to their everyday lives.

Assessment Device

Teaching Biology Questionnaire (Cohorts 1 and 2). The Teaching Biology questionnaire, developed for the purpose of this project, consisted of open-ended questions assessing students’ ability to apply and integrate their understanding of biological concepts in the elementary classroom. Question 1 asked the subjects to describe two biology topics they would select for an imaginary third-grade class. They were then asked to defend the appropriateness of their choices. Students were asked in Question 2 to describe and defend the most effective ways for elementary students to learn about concepts of biology and their applications. Question 3 asked what resources might be consulted in planning biology lessons in the elementary classroom. Two additional questions addressed how biology was personally relevant to students.

On the first question, reasons given for choosing topics were assessed. One set of categories was specifically relevant to the focus of the project section as well as the general objectives of the project: (a) students can personally relate to the topic (e.g., “Children are interested in their own bodies”), (b) the topic lends itself to hands-on activities (e.g., “Children can test their own senses”), (c) the topic can lead to behavioral change in students (e.g., “If they know about AIDS, they can act reasonably and prudently”). Other categories included: (d) students need to know and understand this topic (e.g., “Senses are important”), (e) it gives the student an understanding of the world around us (e.g., “All life is based on the cell”), (f) references to the age level of students (e.g., “It would be at their level of ability”), and (g) miscellaneous comments (e.g., “I understand the topic”).

Students’ responses to Question 2 focused on the most effective methods for teaching biology to elementary children. Categories were as follows: (a) hands-on activities and exploration (hands-on experiments and activities), (b) learning-cycle methods (“The most effective way is through learning cycle methods”), (c) cooperative group interactions (e.g., base groups), (d) multimedia presentations (e.g., viewing pictures and slides), (e) other formats that encourage active involvement (besides those listed earlier, such as discussions), (f) lecture and explanation (e.g., lecture to introduce concepts and ideas), and (g) miscellaneous comments (e.g., taking notes).

Students’ responses to why they selected a particular method were coded into one or more of the following categories: (a) the experience builds on prior knowledge (e.g., “It helps them build connections”), (b) the experience stimulates children’s curiosity, motivation, and involvement (e.g., “Children find experiments fun”), (c) students need a variety of modalities (e.g., “They need something they can see, feel, and experience”), (d) the experience is the best way to remember the material (e.g., “Children remember what they do more than what they are told”), (e) it ensures high-level learning and understanding (e.g., “It makes the topic more real and understandable”), and (f) miscellaneous comments (e.g., “They can answer their own questions”).

Simple frequency counts were taken in responses to Question 3, regarding the number of resources students mentioned. In response to Question 4, frequencies were calculated on whether responses suggested behavioral change (e.g., calories in nuts: “I’ve learned how many calories there are in nuts, and now I eat fewer nuts”), concern for the world (e.g., greenhouse effect: “It will affect the future of this planet”), concern for self (e.g., reproductive system: “I learned exactly how my menstrual cycle works”), and also for nonspecific answers (e.g., ecosystems: “help me understand how everything stays in balance”).

For the first two questions and their follow-ups, interrater reliabilities were determined separately. Two raters, coding blinded as to group membership, from equally split project-control samples of 30 to 40 students had overall agreement across 21 items of 93%, with a range of 75-100%. Question 3 was a simple frequency count. For Question 4, frequency counts within categories and across subjects were correlated between the same two raters, yielding a mean r of .85 (range .75-.90).
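The Question 4 reliabilities above are Pearson correlations between the two raters' frequency counts. A self-contained sketch of that computation, with hypothetical toy counts (not the study's data):

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-student frequency counts from two raters in one category.
rater1_counts = [3, 5, 2, 8, 6, 4]
rater2_counts = [2, 5, 3, 7, 6, 4]
r = pearson_r(rater1_counts, rater2_counts)
```

Averaging such r values across categories would yield the mean r of the kind reported (.85 here).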

Results

For the first question, concerning the appropriateness of a chosen topic, the most frequently cited reason for both Cohort 1 project and control groups was that students needed to know or understand the topic (project = 64.3%, control = 59.6%). The only group difference was that the chosen topic would lead to behavioral change in the student; 59.5% of project students and 26.9% of control students mentioned this characteristic [χ2(1, N = 72) = 10.17, p < .01]. Other answers having more than 25% response included “Students can relate to the topic” (project = 40.5%, control = 51.9%) and mentioning that the topic was age appropriate (control = 34.6%).

The patterns of responses for Cohort 2 were somewhat different. Again, there was only one group difference; this time, however, it was for the students need to know response: project = 58.3%, control = 79.5% [χ2(1, N = 72) = 3.94, p < .05]. One similarity between Cohorts 1 and 2 was that the students need to know response was the most common answer for both control and project groups. The Cohort 2 project students had more than 25% response to all but the hands-on response. The Cohort 2 control students had less than 25% for age appropriate and hands-on.

For the question related to most effective ways of teaching biology to elementary children, Cohort 1 had one definitive response: hands-on (88.1% of the project and 94.2% of the control students mentioned this characteristic). All other responses had less than 25% response, although two (cooperative learning and the learning cycle) had significant group differences, with the project group responding more frequently in both cases [respectively, χ2(1, N = 72) = 6.35 and 5.17, with both ps < .05]. For Cohort 2, the definitive answer was again hands-on, although this time there was a group difference: 97.2% of the project students and 64.1% of the control students offered this criterion [χ2(1, N = 72) = 12.8, p < .01]. The control group had one other response over 25%: multimedia. The control group response rate of 43.6% for multimedia was significantly higher than the project response rate of 13.9% [χ2(1, N = 72) = 7.97, p < .01]. No other answers received more than 25% response.

Regarding the question of why the chosen methods are effective, the category of experience improves memory received a response from 39.0% of the Cohort 1 project students and 55.8% of the Cohort 1 control students. The higher learning/meaning evolves category received a response from 41.5% of the Cohort 1 project students and 30.8% of the Cohort 1 control students. The only significant difference for Cohort 1 was for the answer that students relate to the topic; this response was offered by 14.6% of the project students but only 1.9% of the control students [χ2(1, N = 72) = 5.3, p < .02]. Cohort 2 differed significantly on the higher learning/meaning evolves response; 58.3% of the project students and 27.8% of the control students provided this explanation [χ2(1, N = 72) = 6.85, p < .01]. Cohort 2 project and control students had high response rates in the experience improves memory category (project = 47.2%, control = 50.0%). Both Cohort 2 project and control students also had more than a 25% response rate in the behavioral change category (project = 27.8%, control = 36.1%).
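The group comparisons above rest on the Pearson chi-square statistic for a 2×2 table (group × mentioned/not mentioned). A sketch of that computation, with hypothetical counts rather than the study's data, and without the continuity correction (which the authors may or may not have applied):

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (1 df, no continuity correction)
    for a 2x2 contingency table given as [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Hypothetical counts: mentioned / did not mention a response, by group.
chi2 = chi_square_2x2([[20, 10], [10, 20]])
```

The resulting statistic is compared against the chi-square distribution with 1 degree of freedom to obtain the reported p values.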

A repeated measures multiple analysis of variance was used to analyze Questions 3 and 4. Cohort 1 project students listed significantly more resources to be consulted [M = 4.29 (SD = 2.11)] than did the Cohort 1 control students [M = 3.27 (SD = 1.75), F(1, 92) = 6.52, p < .01]. The same was true for Cohort 2 project [M = 4.56 (SD = 1.92)] and control [M = 3.18 (SD = 1.62), F(1, 73) = 11.32, p < .001] students. The Cohort 1 project students also listed significantly more personally relevant topics [M = 8.24 (SD = 2.51)] than did the Cohort 1 control students [M = 5.12 (SD = 2.49), F(1, 92) = 36.38, p < .001]. Again, the same was true for Cohort 2: project M = 8.36 (SD = 3.16); control M = 3.54 (SD = 1.67), F(1, 73) = 69.09, p < .001.

Summary of Results. Reasons for selecting particular biological topics varied across Cohorts 1 and 2. For Cohort 1, project students more often articulated the value of identifying topics that lead to behavioral change. For Cohort 2, control students more often selected topics because of the general need to know basis. In their analyses of the most effective ways of learning biological concepts and applications, Cohort 1 project students more often mentioned cooperative learning and learning cycle methods than did control students. For Cohort 2, project students more often mentioned hands-on activities, and control students more often mentioned multimedia activities. In response to the question about why they chose particular methods, Cohort 1 project students more often mentioned students’ tendency to relate to the topic. Cohort 2 project students more often discussed issues associated with meaningful learning. Finally, project students cited more resources when teaching biology, and they listed more biological concepts that were personally relevant to them.

Discussion

Investigative Proficiencies and Conceptual Understandings

In general, the results of this investigation suggest that project courses were superior to control courses in enhancing students’ investigative capacities. Effects seemed to be particularly strong on tasks that required students to design experiments and interpret data. For example, results from the earth science course revealed large differences in favor of project students in designing experiments and interpreting graphs. Project students more often designed experiments with ideal features such as controlled variables, defined procedures, and proper use of equipment. They also made more accurate interpretations about graphs of heat absorption and radiation. Similarly, in the physical science course, large differences were obtained in students’ interpretations of graphs about speed and acceleration and in their formulation of experiments. As we expected, project students also displayed more refined knowledge of mathematical concepts. For instance, they received higher scores on their ability to define mathematical concepts (e.g., scientific notation, metric measurement). They were also more fluent in the number of examples and applications they provided of these concepts.

Beliefs about Effective Methods of Teaching Science and Mathematics

Project students more consistently referred to the need to teach in an equitable fashion, and were more knowledgeable about contributions of women and minority groups in the progress of science and mathematics. In the initial mathematics course, project students more often articulated specific innovations in mathematical reasoning formulated by Hispanic and Meso-American Indians. In the mathematics methods course, they more often made comments about equitable and inequitable features of mathematics lessons. Similarly, they more often mentioned concerns with equity in their critique of narrative science lessons during the science methods course.

In addition to pedagogic concerns with equity, project students frequently specified the values of hands-on activities and investigative learning strategies. They were more knowledgeable about appropriate uses of manipulatives in the first mathematics course, they more often mentioned hands-on strategies in the mathematics methods course, and they revealed greater appreciation for selecting material that related to children’s everyday actions and interests in the biological science course. In contrast, control students more often expressed general pedagogic concerns. This difference in specificity was also borne out in the educational psychology course. Project students focused on specific instructional and assessment strategies, whereas both groups mentioned theoretical advances.


Performance on Standardized Achievement Tests

There were no differences between project and control students on the standardized multiple-choice tests. Of the three such instruments we administered (one in the earth science course and two in the physical science course), there were no differences between project and control students at the end of the semester. We believe that this is a noteworthy finding given the difference between project and control courses in instructional design. Project courses routinely required more laboratory work and less lecture. Control students, therefore, had the advantage of more lecture time (if more survey coverage in and of itself can be construed as an advantage!). Even so, project students seemed to maintain ground on achievement tests (although we cannot conclude that our failure to obtain significant differences definitively indicates equivalence). It is possible that project students developed broad perspectives on particular areas of science through required outside readings. Another possibility is that project students did well on clusters of items that addressed the more focused topics developed in class, and did poorly on other types of items. Yet another interpretation is that focused instruction in project courses fostered insights that aided reasoning on particular types of multiple-choice items. One of the long-standing criticisms of standardized tests is that their global scores do not furnish us with detailed information about specific competencies.

Discrepancies between Alternative Assessments and Standardized Tests

Questions arise as to why the instruments developed specifically for this project produced results that were consistently different from those of the standardized tests. The consistency with which sets of individual alternative assessments produced superior performance on investigative skills and reform-minded instructional strategies can be juxtaposed against the failure of the standardized instruments to reveal differences. Although the standardized instruments we employed in this evaluation are arguably among the best on the market for secondary and introductory college-level science, they were not sensitive to the skills and conceptual understandings targeted in this project. These results underscore the need for refinement of assessment of teacher preparation experiences. In fact, needs are urgent in this area, because states, universities, and school districts are currently struggling with ways to assess teachers for certification, licensure, and promotion. For example, Colorado recently mandated in House Bill 91-1005 (Educator Licensure Act) that the Colorado Teachers and Special Services Professional Standards Board define the outcomes and evaluation procedures associated with the preparation of teachers and special service providers in Colorado schools and develop requirements for three levels of licensure: provisional, professional, and master. The challenges facing such boards are enormous. They must determine ways to assess how well school professionals accomplish tasks as disparate as accommodating diverse learners, facilitating children’s learning of subject matter, communicating effectively, and devising valid and sensitive assessment techniques. Although standardized instruments can measure some aspects of these competencies, we contend that alternative assessments should supplement the pool. Alternative assessments are particularly vital in subject matter teaching.
In science teaching, for example, teachers must know how to design experiments and interpret graphs themselves before they can instruct children on how to do so. Finding ways to measure these and other subject-specific competencies will be a time-consuming but consequential investment for educators and evaluators. The high-stakes uses of such assessments make it critical to obtain solid evidence about their validity and reliability (cf. Morison, 1992). Our experience indicates that teams of individuals with expertise in science and mathematics content and methods, educational psychology and assessment, and school-based teaching and learning are needed to construct and validate these assessment devices.

Evaluating Preparation of Teachers in Science and Mathematics

The results of this evaluation indicate that participation in a model program led preservice teachers to develop relatively high levels of investigative proficiency and conceptual understanding, as well as beliefs about effective methods of teaching science and mathematics that are consistent with many calls for reform. These findings are compatible with the results of other evaluation data related to the project that we summarized earlier. Although many threats to validity cannot be ruled out in this evaluation, the consistency of the results is impressive. Questions about long-term impact arise, and we are presently investigating the program’s graduates as they undertake student teaching and their initial full-time classroom assignments. Of particular interest to us is their level of resilience to traditional views about teaching and learning in science and mathematics. Our hope is that they will take leadership roles in systemic movements to improve instruction in science and mathematics.

Recommendations for Elementary Teacher Preparation

It is not possible to isolate with certainty the specific features of the program that were responsible for its apparent positive impact. Further analysis and debate about innovative approaches are vital to the improvement of teacher preparation in science and mathematics. Amidst such debate, however, something must be done to improve existing programs. We hold this perception about the need for changes nationwide, as well as the need for progress at our own institution, where issues associated with long-term institutionalization present enormous challenges.

With a sense that much remains undone, we close this article with a summary of recommendations for teacher preparation proposed in 1992 by senior staff associated with the Preservice Project (Aas et al., 1992). One of their fundamental tenets was that we must model effective teaching strategies in our preservice courses. In addition to serving as powerful models, professors have the opportunity to challenge students’ misconceptions and deepen their understanding of scientific and mathematical concepts. Toward the end of effective modeling, project faculty members found it worthwhile to collaborate closely with experienced teachers in course planning, delivery, and revision. In terms of instructional format, they endeavored to engage students with hands-on and minds-on activities. Other strategies included cooperative learning, systematic attention to the existence of inequities in science and mathematics teaching, and curriculum coordination and integration between science and mathematics content and methods courses and related education courses. As we search for guidelines to improve teacher preparation, standards established by professional organizations such as the National Council of Teachers of Mathematics and the National Science Teachers Association deserve our attention. Mechanisms for campuswide discussion of such standards need to be established.
Other recommendations from senior staff focused on integrating perspectives gained through field experiences, extending the influence of experienced classroom teachers to other teacher preparation initiatives, providing appropriate instructional environments for delivery of preservice content, pedagogy, and other education courses (e.g., in terms of class size, nature of teaching environment, and scheduling procedures), rewarding faculty members for participating in intense and sustained efforts related to curricular reform, considering the benefits of establishing a particular order for a subset of the teacher preparation courses, and capitalizing on the value of a new course developed specifically for the purposes of the project (Equity Issues in a Technological Society).

Undoubtedly, these recommendations need to be adapted and supplemented for consideration at other institutions. Other factors must be taken into account at individual institutions, including the complexity of the campus environment; the nature of its teacher preparation programs; beliefs about teaching and learning held by participating faculty members; the number of faculty members able to devote time to revising courses and administering innovative programs; the dynamics of decision making for curricular revisions; administrative support for such initiatives; the history of collaboration with local school systems; and mutual respect and trust developed between faculty members specializing in science, mathematics, and education. Despite the many instantiations of reform that are likely to arise, one common factor is essential to all changes: appropriate evaluation plans. Yet, many visions exist about appropriate evaluation strategies, and more time needs to be invested in discussing how we should evaluate transformations to our teacher preparation programs.

This study was funded in part by a grant from the National Science Foundation (NSF) (TEI-8751476) to the University of Northern Colorado. The views in this report do not necessarily reflect the position or policy of NSF, and no official endorsement by NSF should be inferred. The authors thank the following individuals for their assistance with the project and its evaluation: Wallace Aas, Wilbur Bergquist, Kathy Cochran, Mark Constas, Tina Danahy, Clay Gorman, Jay Hackett, Alice Horton, M. Lynn James, Ivo Lindauer, Chuck McNerney, Gayle Munson, Jeanne Ormrod, Rick Silverman, and Matt Smith.

References

Aas, W., Adams, D., Alcorn, J., Cochran, K., Constas, M., Gardner, A., Hackett, J., Heikkinen, H., James, M.L., Lindauer, I., McDevitt, T., McNerney, C., Ormrod, J., & Silverman, F. (1992, February). Recommendations for teacher preparation: An interim report from UNC’s Pre-Service Elementary Science/Mathematics Project. Unpublished manuscript.

Ambrosio, A.L., McDevitt, T.M., Gardner, A.L., & Heikkinen, H.W. (1991, August). Factors related to equitable teaching: Implications for an equity issues course. Paper presented at the annual meeting of the American Psychological Association, San Francisco, CA.

American Association for the Advancement of Science (1989). Project 2061: Science for all Americans. Washington, DC: Author.

Assurance Incorporated. (1984). Assurance Test Bank. Tucson, AZ: Author.

Baker, E.L. (1988). Can we fairly measure the quality of education? NEA Today, 6, 9-14.

Baker, E.L. (1990). Developing comprehensive assessments of higher order thinking. In G. Kulm (Ed.), Assessing higher order thinking in mathematics (pp. 7-20). Washington, DC: American Association for the Advancement of Science.

Ball, D.L., & McDiarmid, G. W. (1990). The subject-matter preparation of teachers. In W.R. Houston (Ed.), Handbook of research on teacher education: A project of the Association of Teacher Educators (pp. 437-449). New York: Macmillan.

Baron, J.B. (1987). Evaluating thinking skills in the classroom. In J.B. Baron & R.J. Sternberg (Eds.), Teaching thinking skills: Theory and practice (pp. 221-247). New York: WH Freeman.

Baron, J.B. (1991). Performance assessment: Blurring the edges of assessment, curriculum, and instruction. In G. Kulm & S.M. Malcom (Eds.), Science assessment in the service of reform (pp. 247-266). Washington, DC: American Association for the Advancement of Science.


Berryman, S.E. (1983). Who will do science? New York: Rockefeller Foundation.

Brown, A.L., & Campione, J.C. (1991). Interactive learning environments and the teaching of science and mathematics. In M. Gardner, J.G. Greeno, F. Reif, A.H. Schoenfeld, A. diSessa, & E. Stage (Eds.), Toward a scientific practice of science education (pp. 111-139). Hillsdale, NJ: Erlbaum.

Brown, S.I., Cooney, T.J., & Jones, D. (1990). Mathematics teacher education. In W.R. Houston (Ed.), Handbook of research on teacher education: A project of the Association of Teacher Educators (pp. 639-656). New York: Macmillan.

Callister, J.C., & Mayer, V.J. (1988). NSTA's New Earth Science Test. The Science Teacher, 55, 32-34.

Champagne, A.B., & Bunce, D.M. (1991). Learning theory-based science teaching. In S.M. Glynn, R.H. Yeany, & B.K. Britton (Eds.), The psychology of learning science (pp. 21- 35). Hillsdale, NJ: Erlbaum.

Constas, M.A. (1992). Qualitative analysis as a public event: The documentation of category development procedures. American Educational Research Journal, 29, 253-266.

Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Dempster, F.N. (1993, February). Exposing our students to less should help them learn more. Phi Delta Kappan, 74, 433-437.

Eylon, B., & Linn, M.C. (1988). Learning and instruction: An examination of four research perspectives in science education. Review of Educational Research, 58, 251-301.

Galluzzo, G.R. (1986). Teacher education program evaluation: Organizing or agonizing? In J.D. Raths & L.G. Katz (Eds.), Advances in teacher education (Vol. 2, pp. 222-237). Norwood, NJ: Ablex.

Galluzzo, G.R., & Craig, J.R. (1990). Evaluation of preservice teacher education programs. In W.R. Houston (Ed.), Handbook of research on teacher education: A project of the Association of Teacher Educators (pp. 599-616). New York: Macmillan.

Gardner, A.L., McDevitt, T.M., & Constas, M. (1990, April). Weaving a network of early support for girls in science: Empowering elementary teachers. Paper presented at the annual meeting of the National Association for Research in Science Teaching, Atlanta, GA.

Gardner, A.L., McDevitt, T.M., & Constas, M. (1991, March). Empowering elementary teachers to develop science skills and interests among members of underrepresented groups. Paper presented at the annual meeting of the National Science Teachers Association, Houston, TX.

Glynn, S.M., Yeany, R.H., & Britton, B.K. (1991). A constructivist view of learning science. In S.M. Glynn, R.H. Yeany, & B.K. Britton (Eds.), The psychology of learning science (pp. 3-19). Hillsdale, NJ: Erlbaum.

Heikkinen, H.W., McDevitt, T.M., & Stone, B.J. (1992). Classroom teachers as agents of reform in university teacher preparation programs. Journal of Teacher Education, 43, 283-289.

Hurd, P.D. (1993). Comment on science education research: A crisis of confidence. Journal of Research in Science Teaching, 30, 1009-1011.

Jones, L.V. (1988). School achievement trends in mathematics and science, and what can be done to improve them. In E.Z. Rothkopf (Ed.), Review of Research in Education, 15, 307-341. Washington, DC: American Educational Research Association.

Kahle, J.B. (1990). Real students take chemistry and physics: Gender issues. In K. Tobin, J.B. Kahle, & B.J. Fraser (Eds.), Windows into science classrooms: Problems associated with higher-level cognitive learning (pp. 92-134). London: Falmer Press.

Kulm, G., & Stuessy, C. (1991). Assessment in science and mathematics education reform. In G. Kulm & S.M. Malcom (Eds.), Science assessment in the service of reform (pp. 71-87). Washington, DC: American Association for the Advancement of Science.


McDevitt, T.M., Heikkinen, H.W., Alcorn, J.K., Ambrosio, A.L., & Gardner, A.L. (1993). Evaluation of the preparation of teachers in science and mathematics: An assessment of preservice teachers' attitudes and beliefs. Science Education, 77, 593-610.

Morison, P. (1992). Testing in American schools: Issues for research and policy. Social Policy Report of the Society for Research in Child Development, 6, 1-25.

Murnane, R.J., & Raizen, S.A. (1988). Improving indicators of the quality of science and mathematics education in grades K-12. Washington, DC: National Academy Press.

National Assessment of Educational Progress (1986). Science assessment. Princeton, NJ: National Assessment of Educational Progress.

National Council of Teachers of Mathematics (1991). Professional standards for teaching mathematics. Reston, VA: Author.

National Commission on Excellence in Education (1983). A nation at risk: The imperative for educational reform. Washington, DC: U.S. Government Printing Office.

National Science Board Commission on Precollege Education in Mathematics, Science, and Technology (1983). Educating Americans for the 21st century. Washington, DC: National Science Foundation.

Oakes, J. (1990). Opportunities, achievement, and choice: Women and minority students in science and mathematics. In C.B. Cazden (Ed.), Review of Research in Education (pp. 153-222). Washington, DC: American Educational Research Association.

Resnick, L.B. (1987). Education and learning to think. Washington, DC: National Academy Press.

Romberg, T.A., Zarinnia, E.A., & Collis, K.F. (1990). A new world view of assessment in mathematics. In G. Kulm (Ed.), Assessing higher order thinking in mathematics (pp. 21-38). Washington, DC: American Association for the Advancement of Science.

Spector, B.S. (1987). Excellence in preservice elementary teacher education in science. In J.E. Penick (Ed.), Focus on excellence: Preservice elementary teacher education in science (pp. 5-8). Washington, DC: National Science Teachers Association.

St. John, M. (1992). Science education for the 1990s: Strategies for change. Inverness, CA: Inverness Research Associates.

Shulman, L. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15, 4-14.

Tobin, K. (1990). Teacher mind frames and science learning. In K. Tobin, J.B. Kahle, & B.J. Fraser (Eds.), Windows into science classrooms: Problems associated with higher-level cognitive learning (pp. 33-91). London: Falmer Press.

Tucker, M.S. (1991). Why assessment is now issue number one. In G. Kulm & S.M. Malcom (Eds.), Science assessment in the service of reform (pp. 3-15). Washington, DC: American Association for the Advancement of Science.

Walker, D.F., & Schaffarzick, J. (1974). Comparing curricula. Review of Educational Research, 44, 83-111.

Received April 20, 1994 Revised January 17, 1995 Accepted January 27, 1995