ESERA 2013 e-proceedings, Part 11
Strand 11
Evaluation and assessment of
student learning and development
CONTENTS

1. Introduction. Robin Millar, Jens Dolin. p. 1
2. Performance assessment of practical skills in science in teacher training programs useful in school. Ann Mutvei Berrez, Jan-Eric Mattsson. p. 3
3. Development of an instrument to measure children's systems thinking. Kyriake Constantinide, Michalis Michaelides, Costantinos P. Constantinou. p. 13
4. Development of a two-tier test instrument for geometrical optics. Claudia Haagen, Martin Hopf. p. 24
5. Strengthening assessment in high school inquiry classrooms. Chris Harrison. p. 31
6. Analysis of student concept knowledge in kinematics. Andreas Lichtenberger, Andreas Vaterlaous, Clemens Wagner. p. 38
7. Measuring experimental skills in large-scale assessments: Developing a simulation-based test instrument. Martin Dickmann, Bodo Eickhorst, Heike Theyssen, Knut Neumann, Horst Schecker, Nico Schreiber. p. 50
8. The notion of authenticity according to PISA: An empirical analysis. Laura Weiss, Andreas Mueller. p. 59
9. Examining whether secondary school students make changes suggested by expert or peer assessors in the science web-portfolio. Olia Tsivitanidou, Zacharias Zacharia, Tasos Hovardas. p. 68
10. Sources of difficulties in PISA science items. Florence Le Hebel, Andree Tiberghien, Pascale Montpied. p. 76
11. In-context items in a nationwide examination: Which knowledge and skills are actually assessed? Nelio Bizzo, Ana Maria Santos Gouw, Paulo Sergio Garcia, Paulo Henrique Nico Monteiro, Luiz Caldeira Brant de Tolentino-Neto. p. 85
12. Predicting success of freshmen in chemistry using moderated multiple linear regression analysis. Katja Freyer, Matthias Epple, Elke Sumfleth. p. 93
13. Testing student conceptual understanding of electric circuits as a system. Hildegard Urban-Woldron. p. 101
14. Process-oriented and product-oriented assessment of experimental skills in physics: A comparison. Nico Schreiber, Heike Theyssen, Horst Schecker. p. 112
15. Modelling and assessing experimental competence: An interdisciplinary progress model for hands-on assessments. Susanne Metzger, Christoph Gut, Pitt Hild, Josiane Tardent. p. 120
16. Effects of self-evaluation on students' achievements in chemistry education. Inga Kallweit, Insa Melle. p. 128
INTRODUCTION
Strand 11 focuses on the evaluation and assessment of student learning and development. Many studies presented in other conference strands, of course, involve the assessment of student learning or of affective characteristics and outcomes such as students' attitudes or interests, and use existing instruments or new ones developed for the study in hand. In such studies, assessment instruments are tools used to explore and answer other questions of interest. In Strand 11, the emphasis is on the development, validation and use of assessment instruments; the focus is on the instrument itself. These can include standardized tests, achievement tests, high-stakes tests, and instruments for measuring attitudes, interests, beliefs, self-efficacy, science process skills, conceptual understandings, and so on. They may be developed with a view to making assessment more authentic in some sense, to facilitate formative assessment, or to improve summative assessment of student learning.
Fifteen papers presented in this strand are included in this book of e-proceedings. Four of them discuss the development of new or modified instruments to assess students' conceptual understanding of a science topic. Two use the two-tier multiple-choice format that many researchers have found valuable for probing understanding, to explore the topics of electric circuits and geometrical optics. Another explores the factors that may underlie the observed patterns in students' responses, trying to tease out the relative importance of mathematical and physical ideas in determining performance on questions about kinematics. A fourth paper begins the exploration of a relatively new science domain, systems thinking. Here assessment items have a particularly significant role to play in helping to define the domain in operational terms and in facilitating discussion within the science education research community.
Four papers explore issues concerning the assessment of practical competence and skills. One looks at the general issue of developing a model to describe progress in carrying out hands-on activities; another focuses more specifically on experimental skills in physics; and a third considers performance assessment in the context of initial teacher education. The fourth paper looks at the potential use of simulations as surrogates for bench practical activities. Work in this domain is important, as science educators seek a better understanding of the factors that lead to variation in students' responses to practical tasks.
Three papers look in different ways at the influence of contexts on students' responses to tasks. Two take the PISA studies as their starting point, looking in detail at the thinking of students as they respond to PISA tasks and questioning the extent to which the PISA interpretation of authenticity enhances student interest and engagement with assessment tasks. Both point to the value of listening to students talking about their thinking as they answer questions, and suggest that this may be quite different from what we would expect, and perhaps hope. A third paper with an interest in the effects of contextualisation presents data from a study in Brazil comparing students' answers to sets of parallel questions with fuller and with abridged contextual information. The findings have implications for item design, and suggest that reading demands should be kept carefully in check if we aim to assess science learning.
Three papers in this section explore the formative use of assessment. One focuses on the assessment of learning that results from inquiry-based science teaching. Another looks at the ways in which students respond to formative feedback on their work. The context for this study is web portfolios, but the research question has wider applicability to other forms of feedback and across science content more generally. The third uses an experimental design to explore the impact on student learning, in a topic on chemical reactions, of a self-evaluation instrument that asks students to monitor their own learning and to take action to address areas in which they judge themselves to be weak.
All of the papers described above collect data from students of secondary school age or prospective teachers. The final paper in this strand looks at the potential use of an attitude assessment instrument to predict undergraduate students' success in chemistry learning.
Together, the set of papers highlights the key role of assessment items and instruments as operational definitions of intended learning outcomes, bringing greater clarity to the constructs used and to our understanding of learning in the domains that they study.
Jens Dolin and Robin Millar
PERFORMANCE ASSESSMENT OF PRACTICAL
SKILLS IN SCIENCE IN TEACHER TRAINING
PROGRAMS USEFUL IN SCHOOL
Ann Mutvei and Jan-Eric Mattsson
School of Natural Sciences, Technology and Environmental Studies, Södertörn University, Sweden.
Abstract: There is a general shift towards an understanding of knowledge not as a matter of remembering facts but as the skill to use what is learnt under different circumstances. On this view, knowledge should also be useful on occasions outside school. The same shift can be identified in the development of new tests designed to assess knowledge.
In courses in biology, chemistry and physics focused on didactics we have developed performance assessments aimed at assessing the understanding of general scientific principles through simple practical investigations. Although designed to assess whether specific goals are attained, we discovered how small alterations of performance assessments promoted the development of didactic skills. Performance assessments may act as tools for the academic teacher and the school teacher, and may enhance student understanding of the theory.
This workshop focused on performance assessments of the ability to present skills and to develop new ideas. Together with the other participants we presented, discussed and explained a practical approach to performance assessments in science education. The emphasis was to demonstrate this assessment tool and to give experience of using it.
We performed elaborative tasks as they may be used by teachers working at different levels, assessed the performances and evaluated the learning outcome of the activity. Different assessment rubrics were presented and tested at the workshop. Learning by doing filled the major part of the workshop, but there were also opportunities for discussion, sharing ideas and suggestions for further development.
The activities performed may be seen as models open to further development into new assessments.
Keywords: assessment, rubric, practical skills, knowledge requirement
INTRODUCTION
During the last ten or fifteen years there has been a general shift towards an understanding of knowledge not as a matter of remembering facts but as the skill to use what is learnt under different, more or less practical, circumstances. On this view, knowledge should also be useful on occasions outside school. Traditional textbooks often had facts arranged in a linear and hierarchical order. More recent books focus on the development of the thoughts and ideas of the student by presenting general principles underpinned by good examples, diagnoses, questions to discuss, reflective tasks without any presentation of a correct answer, etc. (cf. Audesirk et al. 2008, Hewitt et al. 2008, Reece et al. 2011, Trefil & Hazen 2010). A similar development can be found in teacher training programs, where lectures and traditional text seminars have to some extent been replaced by more interactive forms of teaching. We also found this development in examinations at our own university, where tests designed to assess knowledge of literature content have been replaced by tests in which students have to show their capacity to use their knowledge.
Practical performance assessments are important when assessing the abilities or skills of students in teacher training programs. In science courses in biology, chemistry and physics focused on didactics we have for several years developed performance assessments focused on understanding of general scientific principles, but based on simple practical investigations or studies. Although designed to assess whether students reached the goals of a specific course, we have often discovered how small alterations of these performance assessments have promoted the development of the didactic skills of the student. Thus, they may act as assessment tools for the academic teacher, as models for assessments in school, and as enhancements of students' theoretical understanding of the subject. The assessments may be made on oral or written reports, during guided excursions, museum visits or practical experiments, on traditional or aesthetic diaries, or on self-diagnoses or diagnoses made by other students based on certain criteria.
We have worked for several years with teacher training programs focused on work in primary and secondary schools, with further education for teachers, and with university students studying biology and chemistry. The wide range of courses and students has given us experience of how to work with different contents adapted to the different ages of students at school. From this we have identified basic problems and needs of understanding, some shared across subjects and some subject-specific. These experiences also give us the opportunity to contribute to national seminars and conferences.
THE CURRICULUM IN SWEDISH SCHOOLS
The new curriculum in Sweden for the primary and lower secondary schools (Skolverket 2010), as well as the new one for the upper secondary school, puts the emphasis on students' skills rather than knowledge of facts. It is the ability to use the knowledge that is to be assessed. This development is a global trend; see e.g. Eurasian Journal of Mathematics, Science & Technology Education 8(1). This is a great change compared to earlier curricula, especially when compared to the common interpretation and implementation of these at the local level. A similar development has occurred in the universities in Sweden. Today the intended learning outcomes should be described in the syllabi as abilities the student can show after finishing the course, together with how this should be done.
Many teachers have problems with this view as they are used to assessing students' ability to reproduce facts. These teachers find it hard to understand how to work with performance assessments instead of tests targeting knowledge of facts. They often ask for clear directions and expect strict answers instead of guidelines on how to improve their own ability to work with performance assessments.
Teaching according to these new curricula starts with the design of performance assessments suitable for assessing a specific skill, and with the creation of a rubric for the assessment. Thereafter the teacher plans the exercises beneficial for student development, and finally decides the time needed and plans the activities accordingly.
Figure 1. How to plan learning situations.
As an example of how teachers may work with this method, we designed a performance assessment of practical skills and presented it as a workshop at ESERA 2013.
HOW TO DESIGN A PERFORMANCE ASSESSMENT OF PRACTICAL
SCIENCE SKILLS
In order to design a workshop on performance assessments of skills, we tried to do as teachers are supposed to do at school. The emphasis was to demonstrate this assessment tool and to give participants the opportunity to experience it under realistic conditions. Thus, these performance assessments are constructed in accordance with the curriculum in Sweden from 2011 (Skolverket 2010), but they are probably useful for anyone who wants to assess abilities or skills rather than memories of facts or texts. We tried to present, explain and familiarize the participants with a practical approach to performance assessments in science education at school.
The skill of assessment has to be learned. Where teachers are used to assessing skills, these are normally of a more or less theoretical kind: they are used to assessing the quality of the language used or the correctness of a mathematical calculation. Assessment of practical skills does not have to be more complicated, but it has to be trained. According to the Swedish curriculum, 150-200 assessments of each student, in each of about 15 school subjects, should be made at the end of years 6 and 9, and many of these refer to practical skills. In order to simplify this monstrous task it is both possible and necessary to assess several skills in more than one subject on one occasion.
We had prepared four similar activities, all with the same material (candle, wick and matchbox) but with different purposes. They were supposed to represent studies of mass transfer, energy transformation, technical design, and phase changes. The last of these is presented here in detail.
General principles of performance assessments
In the preparations we followed the directions of the Swedish curriculum for the compulsory school (Skolverket 2010). We selected the core content and the knowledge requirements relevant for phase transitions as the foundation for developing the performance assessment. Usually teachers start with the knowledge requirements, interpret these and design tests for assessing students' skills according to the requirements, design suitable learning situations or practical training of the skills, and finally decide what parts of the core content should be used (Figure 1). Here we started with the core content, as there were some specific areas of knowledge we wanted to study. When the core content had been selected, the assessment rubric was developed by interpreting and dissecting the knowledge requirements.
Core content
The teaching in science studies should, in this case, according to the curriculum for primary and secondary school (Skolverket 2010), deal with the core content presented in Table 1.
Table 1
Core content in the Swedish compulsory school curriculum relevant for phase transitions and scientific studies.

In years 1-3:
- Various forms of water: solids, liquids and gases. Transitions between the forms: evaporation, boiling, condensation, melting and solidification.
- Simple scientific studies.

In years 4-6:
- Simple particle model to describe and explain the structure, recycling and indestructibility of matter. Movements of particles as an explanation for transitions between solids, liquids and gases.
- Simple systematic studies. Planning, execution and evaluation.

In years 7-9:
- Particle models to describe and explain the properties of phases, phase transitions and distribution processes for matter in air, water and the ground.
- Systematic studies. Formulating simple questions, planning, execution and evaluation.
- The relationship between chemical experiments and the development of concepts, models and theories.
Knowledge requirements
The knowledge requirements are related to the age of the students and show a clear progression through school. At the end of the third, sixth and ninth year there are clearly defined knowledge requirements (Table 2). Grades are introduced in the sixth year, and levels for grades E (lowest), C and A (highest) are described in the curriculum. Grades D and B are also used: grade D means that the knowledge requirements for grade E and most of those for C are satisfied, and grade B correspondingly for grades C and A.
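The D/B rule above can be sketched as a small function. This is purely our illustrative reading of the rule: the function name, the input format (fraction of each grade level's requirements satisfied), and the 0.5 threshold for "most of" are assumptions, not part of the curriculum.

```python
def overall_grade(criteria_met):
    """Sketch of the Swedish E-A grading ladder with D and B interpolated.

    criteria_met maps each grade level to the fraction of its knowledge
    requirements a pupil satisfies, e.g. {"E": 1.0, "C": 0.8, "A": 0.1}.
    The 0.5 cutoff for "most of" is an assumption for illustration.
    """
    if criteria_met["E"] < 1.0:
        return "F"          # not all E requirements met
    if criteria_met["C"] >= 1.0:
        if criteria_met["A"] >= 1.0:
            return "A"      # all A requirements met
        if criteria_met["A"] >= 0.5:
            return "B"      # all C and most of A
        return "C"          # all C, less than most of A
    if criteria_met["C"] >= 0.5:
        return "D"          # all E and most of C
    return "E"              # all E, less than most of C
```

For example, a pupil meeting all E requirements and most (but not all) C requirements would be graded D under this reading.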
Table 2
Knowledge requirements for different years and grades.

Year 3:
Based on clear instructions, pupils can carry out [...] simple studies dealing with nature and people, power and motion, and also water and air.

Year 6, grade E:
Pupils can talk about and discuss simple questions concerning energy. Pupils can carry out simple studies based on given plans and also contribute to formulating simple questions and planning which can be systematically developed. In their work, pupils use equipment in a safe and basically functional way. Pupils can [...] contribute to making proposals that can improve the study.

Year 6, grade C:
Pupils can talk about and discuss simple questions concerning energy. Pupils can carry out simple studies based on given plans and also formulate simple questions and planning which after some reworking can be systematically developed. In their work, pupils use equipment in a safe and appropriate way. Pupils can [...] make proposals which after some reworking can improve the study.

Year 6, grade A:
Pupils can talk about and discuss simple questions concerning energy. Pupils can carry out simple studies based on given plans and also formulate simple questions and planning which after some reworking can be systematically developed. In their work, pupils use equipment in a safe, appropriate and effective way. Pupils can [...] make proposals which can improve the study.

Year 9, grade E:
Pupils can talk about and discuss questions concerning energy. Pupils can carry out studies based on given plans and also contribute to formulating simple questions and planning which can be systematically developed. In their studies, pupils use equipment in a safe and basically functional way. Pupils apply simple reasoning about the plausibility of their results and contribute to making proposals on how the studies can be improved. Pupils have basic knowledge of energy, matter, [...] and show this by giving examples and describing these with some use of the concepts, models and theories.

Year 9, grade C:
Pupils can talk about and discuss questions concerning energy. Pupils can carry out studies based on given plans and also formulate simple questions and planning which after some reworking can be systematically developed. In their studies, pupils use equipment in a safe and appropriate way. Pupils apply developed reasoning about the plausibility of their results and make proposals on how the studies can be improved. Pupils have good knowledge of energy, matter, [...] and show this by explaining and showing relationships with relatively good use of the concepts, models and theories.

Year 9, grade A:
Pupils can talk about and discuss questions concerning energy. Pupils can carry out studies based on given plans and also formulate simple questions and planning that can be systematically developed. In their investigations, pupils use equipment in a safe, appropriate and effective way. Pupils apply well-developed reasoning concerning the plausibility of their results in relation to possible sources of error, make proposals on how the studies can be improved, and identify new questions for further study. Pupils have very good knowledge of energy, matter, [...] and show this by explaining and showing relationships between them and some general characteristics with good use of the concepts, models and theories.
Assessments of knowledge requirements
The knowledge requirements were interpreted and dissected into smaller units in order to construct an assessment rubric adapted to the inquiry. Five main skills were selected from the knowledge requirements: Use of theory, Improvement of the experiment, Explanations, Relate, and Discuss. In order to make the assessment rubric more general we decided not to use the grades of the curriculum but instead recognized three levels of skill: Sufficient, Good, and Better, corresponding to the grades E, C and A respectively. In all cases we also gave examples of relevant student answers. This is a more or less necessary requirement in order to make sure that the performer, assessor or teacher really understands what is meant by a specific requirement (Arter & McTighe 2001, Jönsson 2011).
As an example, consider the knowledge requirement "Pupils can carry out studies based on given plans and also contribute to formulating simple questions and planning which can be systematically developed. In their studies, pupils use equipment in a safe and basically functional way. Pupils apply simple reasoning about the plausibility of their results and contribute to making proposals on how the studies can be improved." (Year 9, level E). This requirement contains information that may be dissected into several units.
First it is necessary to look at the five skills of the students that are going to be assessed and to identify the suitable requirements for each skill. The students are supposed to "carry out studies based on given plans". In this case the experiment is very simple (light and observe a burning candle) and hardly useful for assessing this specific skill. They shall also "contribute to formulating simple questions and planning which can be systematically developed". This requirement can be further developed to suit the five skills.
In order to show this skill it is necessary to have some knowledge about the theory and to use it in a suitable way. The skill "use of theory" is a necessary condition for this and may be formulated as "The student draws simple conclusions partly related to chemical models and theories." This criterion is also in concordance with the requirement to apply "simple reasoning about the plausibility of their results and contribute to making proposals on how the studies can be improved", which may be formulated in the rubric as "the student discusses the observations and contributes suggestions for improvements" when assessing improvement of the experiment.
In a similar way the assessment of the remaining three skills may be developed into more specific criteria adapted to this experiment (Table 3).
In order to make it possible for the student to understand what is expected, it is necessary to clarify the requirement criteria and give realistic examples of these requirements. The meaning of words differs between disciplines, not only in the academic world but also in school (cf. Chanock 2000). This has consequences when students get feedback, as they often do not understand the academic discourse with its specific concepts and fail to use the feedback later (Lea & Street 2006). Criteria combined with explicit examples are necessary to solve this problem (Sadler 1987). This is also important when designing assessment rubrics (Busching 1998, Arter & McTighe 2001). Thus, for every criterion there has to be at least one example given. In Table 3 this is exemplified for every combination of skill and grade requirement.
Table 3
Assessment rubric for assessing skills in an experiment on phase changes. Example student answers are given in parentheses.

Use of theory
- Sufficient: The student draws simple conclusions partly related to chemical models and theories. (I can see stearic acid in solid, liquid and gas phase.)
- Good: The student draws conclusions based on chemical models and theories. (The heat of the candle causes the transfer between the phases.)
- Better: The student draws well-founded conclusions from chemical models and theories. (Stearic acid must be in the gas phase and mix with oxygen in order to burn.)

Improvement of the experiment
- Sufficient: The student discusses the observations and contributes suggestions for improvements. (Observe more burning candles.)
- Good: The student discusses different interpretations of the observations and suggests improvements. (Remove the wick and relight the candle.)
- Better: The student discusses well-founded interpretations of the observations, and whether they are reasonable, and based on these suggests improvements which allow enquiry into new questions. (Heat a small amount of stearic acid and try to light the gas phase above it.)

Explanations
- Sufficient: The student gives simple and relatively well-founded explanations. (The stearic acid melts by heat produced by the flame.)
- Good: The student gives developed and well-founded explanations. (The change from liquid to gaseous phase also depends on the heat from the flame.)
- Better: The student presents theoretically developed and well-founded explanations. (All phase changes from solid to liquid or liquid to gas need energy.)

Relate
- Sufficient: The student gives examples of processes similar to those in the experiment, related to questions about energy, environment, health and society. (The warmth of the sun melts the ice on the lake at the end of the winter.)
- Good: The student generalizes and describes the occurrence of phenomena similar to those in the experiment, related to questions about energy, environment, health and society. (In the frying pan it is hot enough for butter to melt, and in the sauna water vaporizes.)
- Better: The student discusses the occurrence of the observed phenomena in everyday life, their use, and their impact on environment, health and society. (The phase change from liquid to gaseous phase cools you down when you are sweating.)

Discuss
- Sufficient: The student contributes to a discussion of the occurrence in society of the phenomena studied, makes statements partly based on facts, and describes some possible consequences. (Gases are often transported in the liquid phase, which has a lower volume.)
- Good: The student describes and discusses the occurrence in society of the phenomena studied and makes statements based on facts and fairly complicated physical relations and theories. (The bottle of a gas stove has fuel mainly in the liquid phase, but it is transported in the hose and burnt in the gaseous phase.)
- Better: The student uses the experiment as a model, discusses the occurrence in society of the phenomena studied, and draws statements and consequences based on facts and complicated physical relations and theories. (The phase change from liquid to gaseous phase cools you down when you are sweating.)
WORKSHOP
We had prepared four similar activities, all with the same material (candle, wick and matchbox) but with different purposes. The activities represented studies of mass transfer, energy transformation, technical design, and phase changes. At the workshop three groups were formed, omitting the study of technical design. The groups were not informed about the differences between the aims of their experiments. They were constructed to include people with as varied a background as possible: participants from one specific country, or from similar fields such as chemistry or physics, were allocated to different groups. They performed elaborative tasks similar to those used by teachers working at different levels, assessed the performance and evaluated the learning outcome of the activity. Within each group one person was selected to assess the activities the others carried out. The assessor was to focus not only on the results of the discussions within the group but also to try to evaluate the process, as the aim was to assess the skills of the participants rather than the content of their knowledge.
Discussion
The aim was to demonstrate how peer reviewing within the group may be used to produce information of several kinds beneficial for the performance assessment of science education at school. Discussions arose among the participants about how an integrated approach, especially in relation to other subjects in school, improved the usefulness of the methods. Learning by doing followed by discussion became the major part of the workshop, with sharing of ideas and suggestions for further development.
Most of the participants had weak knowledge of assessments of practical skills, expressed their astonishment at the positive result of the workshop, and showed curiosity about using the method. Some of the participants also showed didactic skills when explaining to the others the aspects of the experiment they mastered, a good example of the importance of variation in the skills of group members.
The persons who made the assessments expressed the need for further practice. They realized the complexity of assessing different skills at the same time as assigning the grade. They also expressed a wish to develop this ability, as they realized the strength of assessing several skills on one occasion. Further, the participants noted the importance of questions like the last one in the instructions (Appendix) in order to assess the quality of the relation between theory and practice.
Conclusion
Although based on a simple experiment with a burning candle, the workshop gave an opportunity to discuss and understand theories regarded as difficult to understand from the viewpoint of the student, or difficult to teach from the viewpoint of the teacher. The experiments, although similar, were of different character, thus reflecting a wide spectrum of possibilities.
The activities performed may therefore be seen as models or examples that may be further developed into new assessments according to the content of the subject.
REFERENCES
Arter, J. A. & McTighe, J. (2001). Scoring Rubrics in the Classroom. Corwin.
Audesirk, T., Audesirk, G., & Byers, G. B. (2008). Life on Earth, 5th ed. San Francisco: Pearson Education.
Busching, B. (1998). Grading inquiry projects. New Directions for Teaching and Learning, 74: 89-96.
Chanock, K. (2000). Comments on essays: do students understand what tutors write? Teaching in Higher Education, 5(1): 95-105.
Eurasian Journal of Mathematics, Science & Technology Education, 8(1).
Hewitt, P. G., Suchocki, J. & Hewitt, L. A. (2008). Conceptual Physical Science, 4th ed. San Francisco: Pearson Education.
Jönsson, A. (2011). Lärande bedömning [Learning assessment]. Gleerups.
Lea, M. R. & Street, B. V. (2006). The academic literacies model: theory and applications. Theory into Practice, 45(4): 368-377.
Reece, J. B., Urry, L. A., Cain, M. L., Wasserman, S. A., Minorsky, P. V. & Jackson, R. B. (2011). Campbell Biology, Global Edition. Pearson.
Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of Education, 13(2): 191-209.
Skolverket (Swedish National Agency for Education). (2010). Curriculum for the Compulsory School, Preschool Class and the Recreation Centre 2011. Skolverket.
Trefil, J. & Hazen, R. M. (2010). The Sciences: An Integrated Approach. Wiley.
APPENDIX
INQUIRY OF A BURNING CANDLE
This is an experiment on phase changes
1. Light the candle and observe the changes of phase.
2. Which changes of phase can you observe?
3. Where do they occur?
4. Why do they occur?
5. What happens in the different phases?
6. How might you improve the experiment?
7. Give examples of phase changes in daily life and in society.
INQUIRY OF A BURNING CANDLE
This is an experiment on energy transformation
1. Light the candle and observe the energy transformations.
2. Which changes of energy form can you observe?
3. Where do they occur?
4. Why do they occur?
5. What happens during the different energy transformations?
6. How might you improve the experiment?
7. Give examples of energy transformations in daily life and in society.
INQUIRY OF A BURNING CANDLE
This is an experiment on mass transfer
1. Light the candle and observe the mass transfer.
2. Which types of mass transfer can you observe?
3. Where do they occur?
4. Why do they occur?
5. What happens to the candle due to this mass transfer?
6. How might you improve the experiment?
7. Give examples of mass transfer in daily life and in society.
INQUIRY OF A BURNING CANDLE
This is an experiment on candle design
1. Light the candle and discuss the design of the candle.
2. Which different parts can you observe in the candle?
3. Where are they and how are they joined?
4. What function do the different parts have?
5. Why is the candle made in that way?
6. How might you improve the experiment?
7. Give examples of similar designs in daily life and in society.
DEVELOPMENT OF AN INSTRUMENT TO MEASURE
CHILDREN'S SYSTEMS THINKING
Kyriake Constantinide, Michalis Michaelides and Costas P. Constantinou
University of Cyprus
Abstract: Systems thinking is a higher order thinking skill required to meet the demands
of social, environmental, technological and scientific advancements. Science abounds in
systems and makes system function a core object of investigation and analysis. As a
consequence, science teaching can be a valuable framework for developing systems
thinking. In order to approach this methodically, it becomes important to specify the
aspects that constitute the systems thinking construct, design curriculum materials to
help students develop these aspects, and develop instruments for evaluating students'
competence and monitoring the learning process. The present study aims at the
development of an instrument for the standardized assessment of systems thinking. It
draws on a methodology that follows a cyclic procedure for instrument development and
validation, in which the literature, experts, students and educators all contribute.
Currently, the assessment instrument is in its second cycle of field testing: data from
about 900 students have been collected and used to develop a first version of a
validated test and a scale for measuring 10-14-year-old children's systems thinking.
The test consists of multiple-choice scenario items that draw their content from
everyday life. We present the methodology we are following, providing some examples of
multiple-choice items to demonstrate their development and transformation throughout
the process.
Keywords: systems thinking, assessment, test development
BACKGROUND
The rate of advancements in scientific knowledge and technology and the widespread
demands on young people to participate actively in solving problems in almost every
aspect of our lives have reoriented the role of education in general and science teaching
in particular. Nowadays, science teaching aims at developing scientifically literate people
with flexible thinking skills and an ability to participate critically in meaningful
discourse. More specifically, it aims at helping students acquire positive attitudes towards
learning and science, a variety of experiences, conceptual understanding, epistemological
awareness, practical and scientific skills and creative thinking skills (Constantinide,
Kalyfommatou & Constantinou, 2001).
The definitions of systems thinking described in the literature (e.g., Senge, 1990; Thier &
Knott, 1992; Booth Sweeney, 2001; Ben-Zvi Assaraf & Orion, 2005) include thinking
about a system, meaning a number of interacting items that produce a result over a period
of time. According to the Benchmarks for Science Literacy (AAAS, 1993), systems
thinking is an essential component of higher order thinking, whereas Kali, Orion and
Eylon (2003) refer to systems thinking as a higher-order thinking skill required in
scientific, technological, and everyday domains. Senge (1990) claims that systemic
thinkers are able to change their own mental models and to control their way of thinking
and their problem-solving process. Therefore, defining systems thinking, promoting it
through curricula, and measuring it should be an essential priority for education.
Science teaching and learning can provide a valuable framework for developing such
skills, since science abounds in systems and makes system function a core object of
investigation and analysis.
Several structured teaching attempts to promote systems thinking are reported in the
literature, making the development of instruments for measuring systems thinking, and
for evaluating the effectiveness of such curricula, a necessity. The most common means
of evaluating systems thinking reported thus far include tests (e.g. Riess & Mischo,
2009), interviews (e.g. Hmelo-Silver & Pfeffer, 2004) and computer simulations and
logs (e.g. Sheehy, Wylie, McGuinness & Orchard, 2000). Some researchers, in order to
triangulate their data, used a combination of data sources (e.g. Ben-Zvi Assaraf &
Orion, 2005). Almost all of these means include tasks in which a problem is introduced
and the subjects have to propose solutions or predict the behavior of the system and
its elements. Nevertheless, to date there is no validated instrument, and prior
research has not provided a scale, for measuring the systems thinking of children aged
10-14. The purpose of this paper is to describe the on-going development of the
Systems Thinking Assessment (STA), a test designed to assess systems thinking.
RESEARCH METHODOLOGY
Systems Thinking Assessment (STA): purpose and specifications
The STA will be used to measure the quality of thinking about systems by children aged
10-14 and the effectiveness of curricula designed to promote systems thinking. It
consists of multiple-choice items set in the context of everyday phenomena familiar to
children of this age range. The stem of each item presents a scenario, and children are
asked to choose the best possible answer among four alternatives.
Multiple-choice items have advantages and disadvantages. Provided the other quality
criteria are met, grading a multiple-choice test is objective, since any grader would
mark an item in the same way. Moreover, many items can be administered in a short
amount of time, allowing sufficient coverage of the content domain under study.
Multiple-choice items are also more reliable than other question formats: on a possible
readministration of a test, a subject is more likely to produce the same answers to
multiple-choice questions than to open-ended ones. A basic disadvantage of
multiple-choice questions is that they do not provide much information on the subjects'
thinking processes, namely the reasons for which they answer each item the way they do.
Nevertheless, the test development procedure and Rasch analysis minimize the effect of
this disadvantage on the results.
In order to be able to make generalizations, there was an intentional effort to include
items that draw on various systems: physical-biological systems (such as the water
cycle, a forest, a dam or food webs), mechanical-electrical systems (such as a bicycle
or a car) and socioeconomic systems (such as a family, a village or a store). Moreover,
where possible, a picture or a diagram was added to the item's wording, so as to make
the item clearer and the test more visually appealing.
We have adopted the following operational definition of systems thinking, which relies
on four strands:
(a) System definition: identifying the essential elements of a system, its temporal
boundaries and its emergent phenomena.
(b) System interactions: reasoning about causes and effects when interactions take
place within the system.
(c) System balance: recognizing the relation between interactions and the system's
balance.
(d) Flows: reasoning about the relation of inflows and outflows in a system and
recognizing cyclic flows of matter or energy.
The STA's cyclic development procedure
Figure 1 presents the cyclic nature of the STA development. The definition of systems
thinking at the center of the cycle concerns both the abilities that constitute it and
the items that measure it. The involved parties (experts, educators, students and the
existing literature) provide feedback on the definition of systems thinking through
data that establish the test's validity and reliability.
Figure 1. Development procedure for STA. The definition of systems thinking (abilities
and items) sits at the center of the cycle; feedback comes from the literature (content
validity), experts (content validity), educators (face validity) and students (test
administration and interview data: construct, criterion and face validity, and
reliability).
The STA has already undergone its first cycle of development. Reviewing the literature
led to 13 abilities that seemed to define systems thinking. The original items were
developed and administered to a small number of 10-year-old students. Qualitative and
quantitative data led to modifications (content and wording changes) and the
development of new items. Two experts gave feedback on the test's content validity.
Further improvements were carried out, and two educators with experience with children
aged 10-14 examined the face validity of the test. The revised version was once again
administered to a small number of 10-year-old students and, after the necessary
modifications, the final form of the test, with 52 multiple-choice items, was
administered to 900 students. Rasch modeling led to a scale showing item difficulty and
student ability.
Based on a broader literature review and the development of separate examples for each
ability, the second development cycle began with revising the 13-ability schema,
reducing the abilities to 10 and the items to 41. The revised test was given to
approximately 90 students aged 10-14. Test and item difficulty indices, item
discrimination indices and frequencies were calculated, and items were either modified
or replaced. Afterwards, 16 students participated in interviews, answering the items
while following a think-aloud protocol (Ericsson & Simon, 1998). Non-effective items
were replaced or modified.
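The classical item statistics mentioned above (difficulty index, discrimination index
and answer frequencies) can be computed as in the following sketch. The function names
and data are illustrative, not taken from the STA study; difficulty is the proportion
of correct answers, and discrimination compares an upper and a lower scorer group.

```python
def item_difficulty(responses, key):
    """Proportion of respondents answering the item correctly."""
    return sum(1 for r in responses if r == key) / len(responses)

def item_discrimination(scores, item_correct, fraction=0.27):
    """Upper-lower group discrimination index for one item.

    scores       -- total test score per student
    item_correct -- 1/0 per student for this item
    fraction     -- share of students in the upper/lower groups (27% is customary)
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, int(len(scores) * fraction))
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower

def alternative_frequencies(responses, alternatives="ABCD"):
    """Relative frequency of each answer alternative (as in Tables 2 and 3)."""
    n = len(responses)
    return {a: sum(1 for r in responses if r == a) / n for a in alternatives}
```

On this reading, the bicycle item's pre-pilot values (difficulty 0.21, discrimination
0.3) mean that 21% answered correctly and that high scorers outperformed low scorers
on the item by 30 percentage points.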
The latest version of the items is under evaluation by independent experts. Graduate
and PhD students in science learning, academics specialized in science teaching or
psychology, and international researchers with experience in measuring systems
thinking will provide feedback on the test by first solving it and then judging its
efficiency based on a structured protocol. Finally, an expert panel will be convened,
during which any problems will be discussed until the panel reaches consensus. The
revised test will be given to four educators to evaluate its face validity. The test
will then be administered to 100 students aged 10-14 to statistically assess its
clarity and its developmental validity. The improved test will finally be administered
to 500 students and the data will be analyzed using Rasch modeling. Confirmatory
factor analysis will be carried out in order to assess the 10-ability structure of the
construct.
RESULTS
At the final stage of the first cycle of the STA development, the test was administered
to about 900 students. The Rasch statistical model provided a scale for the 52 items of
the STT on which both the subjects' scores and the items' degrees of difficulty are
presented (Figure 2).
It is evident that the 52 items of the test fit the model well. Both students' scores
and item difficulties are distributed uniformly along the scale. Students' scores vary
between -2.16 and 2.37 logits, whereas item difficulty varies between -2.41 and 2.53
logits.
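For readers unfamiliar with the logit scale: in the Rasch (one-parameter logistic)
model underlying this analysis, person ability and item difficulty sit on the same
scale, and the probability of a correct response depends only on their difference. A
minimal sketch of the standard formula (illustrative code, not from the study):

```python
import math

def rasch_probability(theta, b):
    """Rasch (1PL) model: probability that a student of ability theta (logits)
    answers an item of difficulty b (logits) correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

A student whose ability equals an item's difficulty has a 50% chance of success, which
is why person and item distributions can be read off the same axis in Figure 2.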
* Each symbol represents 4 students
Figure 2. Scale of the STT (at the end of the first cycle)
Table 1
Statistical values for the 52 STA items for the whole sample and the four groups

Statistical index                  Total     5th gr.   6th gr.   1st gr.   2nd gr.
                                   sample    Primary   Primary   Second.   Second.
                                   (n=848)   (n=219)   (n=249)   (n=137)   (n=243)
Mean                  (items*)      0.00      0.00      0.00      0.00      0.00
                      (persons)    -0.01     -0.30     -0.05      0.14      0.21
Standard deviation    (items)       0.97      0.96      0.97      1.08      1.03
                      (persons)     0.72      0.66      0.73      0.73      0.68
Separability**        (items)       0.99      0.97      0.98      0.96      0.98
                      (persons)     0.81      0.77      0.81      0.81      0.78
Mean infit mean sq.   (items)       1.00      1.00      1.00      1.00      1.00
                      (persons)     1.00      1.00      1.00      1.00      1.00
Mean outfit mean sq.  (items)       1.01      1.02      1.02      1.01      1.01
                      (persons)     1.01      1.02      1.02      1.01      1.01
Infit t               (items)      -0.12     -0.13     -0.03      0.00      0.04
                      (persons)    -0.04     -0.07     -0.03     -0.02     -0.01
Outfit t              (items)       0.09      0.05      0.09      0.05      0.08
                      (persons)     0.02      0.04      0.03     -0.01      0.02

* L = 52 items
** Separability: a value of 1 indicates high reliability, a value of 0 very low reliability.
Table 1 shows the statistical values of the Rasch model for the whole sample and the
four subgroups (5th and 6th primary grades and 1st and 2nd secondary grades)
separately. It is evident that, for the whole sample and the subgroups, item
reliability values are over .95, whereas person reliability values are over .76.
Although the generally accepted values for such a scale are over .90 (Wright, 1985),
the person reliability may be accepted. Furthermore, the mean infit mean square for
both items and persons equals 1 for the whole sample and the subgroups, while the mean
outfit mean square is either 1.01 or 1.02. Infit t and outfit t range from -0.13 to
0.09. The subjects' standard deviation is rather small (SD = 0.72), indicating
uniformity in the sample's behavior: students aged 10-14 respond to the STT as a
homogeneous group. Moreover, the subjects' mean score increases with age, suggesting
developmental validity of the test. Rasch analysis also showed that the items receive
infit values from .87 to 1.18, which fall within the generally accepted range of
.77-1.30 (Adams & Khoo, 1993). Three of the items have an outfit value over 1.30, but
since the difference between the infit and outfit values for these items is small,
they remain in the test.
This is an on-going study and, at the moment, the test is in its second cycle of
development. Test administration, interviews with students, and feedback from experts
and educators provide data to validate the items. The way the data from each stage
were analyzed is indicated in Tables 2 and 3, presented in the next subsection. At the
end of the second cycle, Rasch analysis as well as confirmatory factor analysis will
be conducted and the results will be published.
Two examples of item development
The development of two items through the STA construction cycles can be seen in Tables
2 and 3. The bicycle item presented in Table 2 refers to the strand "System
definition", and more specifically to the ability of identifying the essential
elements of a system; it was revised during the procedure. The apple tree item
presented in Table 3 refers to the strand "System balance", and more specifically to
the ability of identifying reinforcing and balancing loops. It was replaced by a
different item because of problematic item statistics during the pre-pilot phase of
the second development cycle.
Table 2
The development of the bicycle item (item wording translated into English)

1st cycle

Pre-pilot
  Item: Which are the fewest elements that a bicycle that can roll should have?
    A. frame, two wheels, pedals, chain
    B. frame, two wheels, gears, handle bar
    C. frame, two wheels, pedals, seat
    D. frame, two wheels
  Comments: Students did not understand the wording of the stem.
    Frequencies per alternative: A 0.41, B 0.09, C 0.32, D 0.18
  Action: Change wording of stem and alternatives.

To experts
  Item: Which are the elements that a bicycle SHOULD have in order to roll, when
  someone is pushing it?
    A. frame, two wheels, chain, pedals, handle bar
    B. frame, two wheels, chain, pedals
    C. frame, two wheels, chain
    D. frame, two wheels
  Comments: Experts related the item to two initially separate abilities (abilities
    1.1 and 1.2 were afterwards unified).
  Action: Keep as is.

To educators
  Item: (as above)
  Comments: OK.
  Action: Keep as is.

Pilot
  Item: (as above)
  Comments: Frequencies per alternative: A 0.56, B 0.19, C 0.00, D 0.25
  Action: Revise distractor.

Final administration
  Item: (stem as above; alternative C changed to: frame, two wheels, chain, handle bar)
  Comments: Frequencies per alternative: A 0.58, B 0.07, C 0.16, D 0.18
  Action: Change wording of the stem.

2nd cycle

Pre-pilot
  Item: Which are the elements that a bicycle SHOULD NECESSARILY have in order to
  roll, when someone is pushing it? (alternatives as above)
  Comments: Difficulty index 0.21; discrimination index 0.3; alternatives OK.
    Frequencies per alternative: A 0.40, B 0.15, C 0.24, D 0.21
  Action: Keep as is.

Interviews (first set)
  Item: (as above)
  Comments: Correct answer with correct reasoning (4/11); wrong answer (7/11);
    suggestion of other alternatives (2/11) (wheels, pedals, handle bar);
    alternative B not chosen by anyone.
  Action: Change alternative content.

Interviews (second set)
  Item: (stem as above; alternative B changed to: frame, two wheels, pedals,
  handle bar)
  Comments: Correct answer with correct reasoning (1/5); wrong answer (4/5).
  Action: Keep as is.
Table 3
The development of the apple tree item (item wording translated into English)

Item: Mr George planted a small apple tree 10 years ago. Now the apple tree is quite
big. As the apple tree grows,
  A. it needs more water.
  B. it needs less water.
  C. the tree's need for water does not change.
  D. it does not need extra water, since it has already grown.

1st cycle
  Pre-pilot: -
  To experts: -
  To educators: Keep as is.
  Pilot: Frequencies per alternative: A 0.38, B 0.31, C 0.13, D 0.19. Keep as is.
  Final administration: Frequencies per alternative: A 0.48, B 0.18, C 0.25, D 0.07.
    Keep as is.

2nd cycle
  Pre-pilot: Difficulty index 0.43 (OK); discrimination index -0.3.
    Frequencies per alternative: A 0.43, B 0.21, C 0.28, D 0.07. Item replaced.
CONCLUSION
Systems thinking is a higher order skill, important in dealing with everyday phenomena
and in solving problems. At the same time, science is a field with plenty of systems
to analyze and model. Despite the widespread research on curriculum development for
systems thinking, no validated tests have been developed to evaluate the effectiveness
of such curricula. The STA is being developed following a cyclic and iterative
procedure. It aspires to be a useful instrument for assessing curricula designed to
promote systems thinking in upper-primary and lower-secondary school students.
REFERENCES
Adams, R. J. & Khoo, S. T. (1993). Quest: The interactive test analysis system.
Camberwell, Victoria: ACER.
American Association for the Advancement of Science (1993). Benchmarks for science
literacy. New York: Oxford University Press.
Constantinide, K., Kalyfommatou, N. & Constantinou, C. P. (2001). The development of
modeling skills through computer based simulation of an ant colony. In Proceedings
of the Fifth International Conference on Computer Based Learning in Science, July
7-12, 2001, Masaryk University, Faculty of Education, Brno, Czech Republic.
Ben-Zvi Assaraf, O. & Orion, N. (2005). Development of system thinking skills in the
context of earth system education. Journal of Research in Science Teaching, 42(5),
518-560.
Booth Sweeney, L. B. (2001). When a butterfly sneezes. Waltham: Pegasus
Communications.
Ericsson, K. A. & Simon, H. A. (1998). How to study thinking in everyday life:
Contrasting think-aloud protocols with descriptions and explanations of thinking.
Mind, Culture, and Activity, 5, 178-186.
Hmelo-Silver, C. E. & Pfeffer, M. G. (2004). Comparing expert and novice
understanding of a complex system from the perspective of structures, behaviors,
and functions. Cognitive Science, 28, 127-138.
Kali, Y., Orion, N., & Eylon, B. (2003). The effect of knowledge integration
activities on students' perception of the earth's crust as a cyclic system. Journal
of Research in Science Teaching, 40, 545-565.
Riess, W., & Mischo, C. (2009). Promoting systems thinking through biology lessons.
International Journal of Science Education, 1-21.
Senge, P. (1990). The fifth discipline: The art and practice of the learning
organization. New York: Doubleday.
Sheehy, N., Wylie, J., McGuinness, C. & Orchard, G. (2000). How children solve
environmental problems: Using computer simulations to investigate systems thinking.
Environmental Education Research, 6(2), 109-126.
Thier, H. D. & Knott, R. C. (1992). Subsystems and variables. Teacher's guide, Level
3, Science Curriculum Improvement Study. Hudson: Delta Education.
DEVELOPMENT OF A TWO-TIER TEST-INSTRUMENT
FOR GEOMETRICAL OPTICS
Claudia Haagen and Martin Hopf
University of Vienna, AECCP, Vienna, Austria
Abstract: Light is part of our everyday life. Nevertheless, students face enormous
difficulties in explaining everyday optical phenomena with the help of scientific
concepts. Usually they rely on alternative conceptions deduced from everyday
experience, which are often in conflict with scientific views. The identification of
such alternative conceptions is one of the most important prerequisites for promoting
conceptual change (Duit & Treagust 2003). Investigating students' conceptions with
interviews is quite time consuming and difficult to handle in school settings.
Multiple-choice tests, on the other hand, frequently depict the conceptual knowledge
base in a superficial way. The main aim of our project is to develop a two-tier
multiple-choice test which reliably and validly diagnoses year-8 students'
understanding of geometrical optics. So far, we have developed and empirically tested
a first (N=643) and a second test version (N=367), partly based on items from the
literature. Though the overall results are promising, the quality of the items differs
considerably: a number of items do not have appropriate distractors for the second
tier. In addition, students' and teachers' feedback on the test indicates that some
items pose problems due to their wording or the kind of representation chosen. For a
closer analysis of these problematic items, the qualitative method of student
interviews was chosen. Semi-structured, problem-based interviews were conducted with
29 year-8 students after their formal instruction in optics. Based on the results of
these interviews, test items were revised and extended.
Keywords: geometrical optics, two-tier multiple choice test, test development
INTRODUCTION
Despite everyday experience with light, understanding geometrical optics turns out to
be difficult for students. Physics education research shows that students hold
numerous conceptions about optics which differ from scientifically adequate concepts
(Duit 2009). Alternative conceptions are very stable; research shows that formal
instruction frequently fails to transform them into scientifically accepted ideas
(Andersson & Kärrqvist 1983; Fetherstonhaugh & Treagust 1992; Galili 1996; Langley et
al. 1997).
Teachers' knowledge about their students' learning difficulties is one important
prerequisite for the design of successful instruction. Exploring students' conceptual
knowledge base can provide important feedback: it can support students in their
individual learning process and can serve as a basis for further teaching decisions.
In general, there are two main methods used for examining students' conceptual
knowledge: interviews and open-ended questionnaires. The most effective methods, like
interviews, are very time consuming and difficult for teachers to handle in classroom
situations. In search of a way out of this dilemma, we encountered the method of
two-tier tests as used by, e.g., Treagust (2006) and Law and Treagust (2008).
"Two-tiered test items are items that require an explanation or defence for the answer
[...]" (Wiggins and McTighe 1998, p. 14, as cited in Treagust 2006). Each item
consists of two parts, called tiers. The first part of the item is a multiple-choice
question whose distractors include known student alternative conceptions. In the
second part of each item, students have to justify the choice made in the first part
by choosing among several given reasons (Treagust 2006).
Research on alternative conceptions in optics has mainly used the methods of
interviews or questionnaires with open answers (Andersson & Kärrqvist 1983; Driver et
al. 1985; Guesne 1985; Viennot 2003). In addition, multiple-choice tests were
developed (Bardar et al. 2006; Chen et al. 2002; Chu et al. 2009; Fetherstonhaugh &
Treagust 1992). These tests focus on various age groups and on different content areas
within geometrical optics. We have, however, not found a psychometrically valid test
instrument designed to portray the basic conceptions in geometrical optics of students
at the lower secondary level.
Our main research objective is the development of a multiple-choice test instrument
for year-8 students which is able to portray the students' conceptions in geometrical
optics.
DEVELOPMENT OF THE TEST INSTRUMENT
The test instrument has so far been developed in two phases. In the first phase, the
content area of the test was identified based on the Austrian year-8 curriculum. Then
students' conceptions related to the key ideas of the content area were investigated
through intensive literature research. Finally, items for the test were selected from
already existing assessment tools for geometrical optics and adapted to the two-tier
structure where possible. Where a second tier was added to existing items, the
distractors for this second tier were taken from research on students' conceptions.
Additionally, some items were newly developed. The final version of the test was tried
out with N=643 year-8 students.
The results of this first test phase were used to revise the first test version. The
second test version was tested with N=367 year-8 students after their conventional
instruction in geometrical optics in year-8. This version consisted of 20 two-tier
items and 6 one-tier items, which were partly taken from the literature
(Fetherstonhaugh & Treagust 1992; Kutluay 2005; Bardar et al. 2006; Chu et al. 2009).
The results of the statistical analysis with SPSS, together with students' and
teachers' feedback on the test, indicated a potential for improvement. Some items did
not have appropriate distractors for the second tier, while others seemed to pose
problems due to their wording or the kind of representations (Colin et al. 2002)
chosen.
Consequently, semi-structured, problem-based interviews were conducted with year-8
students after their instruction in geometrical optics. These interviews were carried
out for the following reasons: firstly, we wanted to make sure that the distractors
which had been taken from the literature were exhaustive; secondly, the interviews
were to investigate the response space of the newly developed items; finally, the
language and the graphical representations used in the items were to be validated by
students.
Participants and Setting
We interviewed 29 students (17 female, 12 male) after their instruction in geometrical
optics. The students attended year-8 in 5 different schools. The students went to 8 different
classes and thus had 8 different physics teachers. The schools in our sample covered
all the different school types available in Austria at the year-8 level.
The interviews were conducted in the school setting. Each student was interviewed
individually. The average duration of the interviews was 19.5 minutes.
METHOD
We carried out semi-structured, problem-based interviews (Lamnek, 2002; Mayring, 2002;
Witzel, 1985). The interviews were based on seven selected items of the second test
version. The students were just given the item task without any distractors. The interview
followed a four step structure for each item. The students had to:
paraphrase the task of the item
describe the graphical representation used in the item
answer the item
account for the answer given
Figure 1. Flow chart of the structure of the interviews
Data analysis
The interviews were recorded and transcribed. Afterwards, they were analysed with
MAXQDA, following the method of qualitative content analysis of Mayring (2010) and
Gropengießer (2008).
The data were analysed with respect to three main categories: language issues, the forms of
visual representation used, and students' conceptions related to the content of the items. As
far as language issues are concerned, we were interested in how students interpreted the task
of an item on the basis of the text given. Additionally, we tried to identify unfamiliar words
and expressions as well as overly long or complicated sentences.
For the visual representations, our main aim was to find out whether the students were able
to grasp the content or the situation represented in visual form.
The final category, on students' conceptions, was intended to map the response space for
the problems posed, and thus to give a good overview of students' conceptions.
FINDINGS
The findings presented here are results of the empirical testing of the second test version
(N=376). The reliability of the test was established by a Cronbach's alpha coefficient of
α = 0.77. An overview of the test and item statistics for the 20 two-tier items is given in
Figure 2.
Figure 2. Test and item statistics of the second test version
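The reliability coefficient reported for the test can be reproduced from a persons-by-items
score matrix. The sketch below is illustrative only: the response data are hypothetical, not
the study's actual responses.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix of item scores."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # per-item sample variances
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# hypothetical 0/1 responses of 6 students to 4 dichotomous items
data = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(round(cronbach_alpha(data), 2))  # → 0.67
```

With dichotomous items, as here, the formula reduces to the Kuder-Richardson
coefficient KR-20; a real analysis would use the full N=376 response matrix.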
Two-tier items were answered correctly in only 37.2% of cases on average, whereas one-
tier items were solved on average in 47.41% of cases. The solution frequencies of one-tier
items (8.5% - 88.0%) were higher than those of two-tier items, which varied between
3.0% and 57.2%. This effect is well known from research using two-tier items. Among
other factors, it is mainly caused by the reduced probability of guessing correctly, since the
second tier requires students to account for the choice made in tier one (cf. e.g. Tan &
Treagust 2002).
This is also regarded as one way of distinguishing students who possess only superficial
factual knowledge of phenomena from students with deeper conceptual knowledge: the
latter are not only able to give a correct answer in the first tier of a multiple-choice item,
but can also give a correct reason for their choice. As reported elsewhere (cf. Haagen &
Hopf 2012), a more detailed analysis of the items indicated that most of the two-tier items
used had a higher potential for portraying students' conceptions in detail than one-tier
items.
The second part of the findings section concentrates on the results of the interviews. As
mentioned above, the interviews were used to find appropriate distractors for items that did
not yet have a second tier. This paper focuses on this issue; in the following, one example
of adding a second tier with the help of the interview results is reported.
For the topic of continuous propagation of light, the item shown in Figure 3 was used.
Figure 3. One-tier item of test version two concerning the key idea of continuous
propagation of light
For those students who indicated in the first tier that they assumed light propagates a
different distance from the campfire during day and night, we obtained 6 different
categories of reasons, as shown in Figure 4.
Figure 4. Reasons for a different propagation distance of light from a campfire during day
and night
Each of these categories was translated back into students' language, either by taking a
student statement directly from the interviews or by modifying a student statement slightly
in order to fulfil psychometric guidelines for distractor construction. This procedure led to
the second tier for this item, as presented in Figure 5.
Figure 5. Two-tier item of test version two concerning the key idea of continuous
propagation of light
CONCLUSION
In conclusion, the analysis of the second test version showed that the two-tier items of the
test are well able to portray several types of students' conceptions known from the
literature. On the other hand, the results indicated that some items still needed revision and
improvement. The results obtained from the interviews were integrated into the third test
version, which remains to be tested.
REFERENCES
Andersson, B.; Kärrqvist, C. (1983): How Swedish pupils, aged 12-15 years, understand
light and its properties. IJSE 5(4), 387–402.
Bardar, E.M.; Prather, E.E.; Brecher, K.; Slater, T.F. (2006): Development and validation
of the light and spectroscopy concept inventory. Astronomy Education Review 5,
103.
Chu, H.E.; Treagust, D.; Chandrasegaran, A.L. (2009): A stratified study of students'
understanding of basic optics concepts in different contexts using two-tier multiple-
choice items. RSTE 27, 253–265.
Colin, P.; Chauvet, F.; Viennot, L. (2002): Reading images in optics: students' difficulties
and teachers' views. IJSE 24(3), 313–332.
Driver, R.; Guesne, E.; Tiberghien, A. (Eds.) (1985): Children's ideas in science.
Buckingham: Open University Press.
Duit, R. (2009): Bibliography STCSE: Students' and teachers' conceptions and science
education. Retrieved October 20, 2009.
Duit, R.; Treagust, D.F. (2003): Conceptual change: a powerful framework for improving
science teaching and learning. IJSE 25(6), 671–688.
Fetherstonhaugh, T.; Treagust, D.F. (1992): Students' understanding of light and its
properties: Teaching to engender conceptual change. SE 76(6), 653–672.
Galili, I. (1996): Students' conceptual change in geometrical optics. IJSE 18(7), 847–868.
Guesne, E. (1985): Light. In: R. Driver, E. Guesne and A. Tiberghien (Eds.): Children's
ideas in science. Buckingham: Open University Press, 10–32.
Langley, D.; Ronen, M.; Eylon, B.S. (1997): Light propagation and visual patterns:
Preinstruction learners' conceptions. JRST 34(4), 399–424.
Law, J.F.; Treagust, D.F. (2008): Diagnosis of student understanding of content specific
science areas using on-line two-tier diagnostic tests. Curtin University of
Technology.
Mayring, P. (2010): Qualitative Inhaltsanalyse. Weinheim: Beltz.
Treagust, D.F. (2006): Diagnostic assessment in science as a means to improving
teaching, learning and retention. In: UniServe Science Symposium Proceedings:
Assessment in science teaching and learning. Sydney, 2006. UniServe Science.
Treagust, D.F.; Glynn, S.M.; Duit, R. (1995): Diagnostic assessment of students' science
knowledge. In: Learning science in the schools: Research reforming practice 1,
327–436.
Viennot, L. (2003): Teaching physics. With U. Besso, F. Chauvet, P. Colin, F.
Hirn-Chaine, W. Kaminski and S. Rainson. Springer Netherlands.
STRENGTHENING ASSESSMENT IN HIGH SCHOOL
INQUIRY CLASSROOMS
Chris Harrison
King's College London
Abstract: Inquiry provides both the impetus and experience that helps students
acquire problem solving and lifelong learning skills. Teachers on the Strategies for
Assessment of Inquiry Learning in Science Project (SAILS) strengthened their
inquiry pedagogy by focusing on seeking assessment evidence for formative
action. Observing learners in the classroom as they carry out investigations, listening
to learners piece together evidence in a group discussion, reading through answers to
homework questions and watching learners respond to what is being offered as
possible solutions to problems all provide plentiful and rich assessment data for
teachers.
Keywords: Inquiry, Assessment, Teacher change
BACKGROUND
The European Parliament and Council (2006) identified and defined the key
competencies necessary for personal fulfillment, active citizenship, social inclusion
and employability in our modern-day society. These included communication skills
both in mother tongue and foreign languages, mathematical, scientific, digital and
technological competencies, social and civic competencies, cultural awareness and
expression, entrepreneurship and learning to learn. These key competencies formed
the foundation for the approach that our European Framework 7 project (EUFP7)
Strategies for Assessment of Inquiry Learning in Science Project (SAILS) took to
developing, researching and understanding how teachers might strengthen their
teaching of inquiry-based science education.
Since the Rocard Report (2007) recommended that school science teaching should
move from a deductive to an inquiry approach to science learning, there have been
several EUFP7 projects such as S-TEAM, ESTABLISH, Fibonacci, PRIMAS and
Pathway, whose remit has been to support groups of teachers across Europe in
bringing about this radical change in practice. These projects have been successful in
highlighting the importance of IBSE across Europe. They have also enabled us to
determine the range of understanding of what the term inquiry means to teachers
across Europe, and to establish to what extent skills and competencies that are
developed through inquiry practices have been identified.
Inquiry-based science education (IBSE) has proved its efficacy at both primary and
secondary levels in increasing children's and students' interest and attainment levels
(Minner et al, 2009; Osborne et al, 2008) while at the same time stimulating teacher
motivation (Wilson et al, 2010). One area that has remained problematic for teachers,
and has been cited as one of the areas limiting the development of IBSE within schools,
is assessment (Wellcome, 2011). The EUFP7 project Strategies for Assessment of
Inquiry Learning in Science (SAILS) aims to prepare science teachers not only to be
able to teach science through inquiry, but also to be confident and competent in the
assessment of their students' learning through inquiry. The literature suggests that
teacher change is a slow and often difficult process, and never more so than when an
initiative requires teachers to review and change their assessment practices (Harrison,
2012).
Part of the reason for this slow implementation of IBSE in science classrooms is the
time lag between the introduction of new ideas and the training of teachers at both
in-service and pre-service level. While this situation should improve over the next few
years, there is a fundamental problem with an IBSE approach, and this lies with
assessment. While the many EU IBSE projects have produced teaching materials,
they have not produced support materials to help teachers with the assessment of this
approach. Linked to this is the low level of IBSE-type items in national and
international assessments, which sends the message to teachers that IBSE skills are
not considered important in science education. It is clear that an assessment model
and support materials are needed to help teachers assess IBSE learning in their
classrooms if this approach is to be further developed and sustained in classrooms
across Europe.
Inquiry Skills
Inquiry skills are what learners use to make sense of the world around them. These
skills are important both for creating citizens who can make sense of the science in the
world they live in, so that they can make informed decisions, and for developing
scientific reasoning in those undertaking future scientific careers or careers that require the
logical approach that science encourages. An inquiry approach not only helps
youngsters develop a set of skills, such as critical thinking, that they may find useful in
a variety of contexts; it can also help them develop their conceptual understanding of
science, and it encourages students' motivation and engagement with science.
The term inquiry has figured prominently in science education, yet it refers to at least
three distinct categories of activities: what scientists do (e.g., conducting
investigations using scientific methods), how students learn (e.g., actively inquiring
through thinking and doing into a phenomenon or problem, often mirroring the
processes used by scientists), and a pedagogical approach that teachers employ
(e.g., designing or using curricula that allow for extended investigations) (Minner et
al, 2009). However, whether it is the scientist, student, or teacher who is doing or
supporting inquiry, the act itself has some core components.
Inquiry-based science education is an approach to teaching and learning science that is
conducted through the process of raising questions and seeking answers (Wenning,
2005, 2007). An inquiry approach fits within a constructivist paradigm in that it
requires the learner to take note of new ideas and contexts and question how these fit
with their existing understanding. It is not about the teacher delivering a curriculum
of knowledge to the learner but rather about the learner building an understanding
through guidance and challenge from their teacher and from their peers.
Some of the key characteristics of inquiry-based learning are:
Students are engaged with a difficult problem or situation that is open-ended
to such a degree that a variety of solutions or responses are conceivable.
Students have control over the direction of the inquiry and the methods or
approaches that are taken.
Students draw upon their existing knowledge and they identify what their
learning needs are.
The different tasks stimulate curiosity in the students, which encourages them
to continue to search for new data or evidence.
The students are responsible for the analysis of the evidence and also for
presenting evidence in an appropriate manner which defends their solution to
the initial problem (Kahn & O'Rourke, 2005).
In our view, these inquiry skills are developed and experienced through working
collaboratively with others