ESERA 2013 e-proceedings, Part 11
Strand 11
Evaluation and assessment of
student learning and development
CONTENTS

1. Introduction. Robin Millar, Jens Dolin. p. 1
2. Performance assessment of practical skills in science in teacher training programs useful in school. Ann Mutvei Berrez, Jan-Eric Mattsson. p. 3
3. Development of an instrument to measure children's systems thinking. Kyriake Constantinide, Michalis Michaelides, Costantinos P. Constantinou. p. 13
4. Development of a two-tier test instrument for geometrical optics. Claudia Haagen, Martin Hopf. p. 24
5. Strengthening assessment in high school inquiry classrooms. Chris Harrison. p. 31
6. Analysis of student concept knowledge in kinematics. Andreas Lichtenberger, Andreas Vaterlaous, Clemens Wagner. p. 38
7. Measuring experimental skills in large-scale assessments: Developing a simulation-based test instrument. Martin Dickmann, Bodo Eickhorst, Heike Theyssen, Knut Neumann, Horst Schecker, Nico Schreiber. p. 50
8. The notion of authenticity according to PISA: An empirical analysis. Laura Weiss, Andreas Mueller. p. 59
9. Examining whether secondary school students make changes suggested by expert or peer assessors in the science web-portfolio. Olia Tsivitanidou, Zacharias Zacharia, Tasos Hovardas. p. 68
10. Sources of difficulties in PISA science items. Florence Le Hebel, Andree Tiberghien, Pascale Montpied. p. 76
11. In-context items in a nationwide examination: Which knowledge and skills are actually assessed? Nelio Bizzo, Ana Maria Santos Gouw, Paulo Sergio Garcia, Paulo Henrique Nico Monteiro, Luiz Caldeira Brant de Tolentino-Neto. p. 85
12. Predicting success of freshmen in chemistry using moderated multiple linear regression analysis. Katja Freyer, Matthias Epple, Elke Sumfleth. p. 93
13. Testing student conceptual understanding of electric circuits as a system. Hildegard Urban-Woldron. p. 101
14. Process-oriented and product-oriented assessment of experimental skills in physics: A comparison. Nico Schreiber, Heike Theyssen, Horst Schecker. p. 112
15. Modelling and assessing experimental competence: An interdisciplinary progress model for hands-on assessments. Susanne Metzger, Christoph Gut, Pitt Hild, Josiane Tardent. p. 120
16. Effects of self-evaluation on students' achievements in chemistry education. Inga Kallweit, Insa Melle. p. 128
INTRODUCTION
Strand 11 focuses on the evaluation and assessment of student learning and development. Many studies presented in other conference strands, of course, involve the assessment of student learning or of affective characteristics and outcomes such as students' attitudes or interests, and use existing instruments or new ones developed for the study in hand. In such studies, assessment instruments are tools used to explore and answer other questions of interest. In Strand 11, the emphasis is on the development, validation and use of assessment instruments; the focus is on the instrument itself. These can include standardized tests, achievement tests, high-stakes tests, and instruments for measuring attitudes, interests, beliefs, self-efficacy, science process skills, conceptual understandings, and so on. They may be developed with a view to making assessment more authentic in some sense, to facilitate formative assessment, or to improve summative assessment of student learning.
Fifteen papers presented in this strand are included in this book of e-proceedings. Four of them discuss the development of new or modified instruments to assess students' conceptual understanding of a science topic. Two use the two-tier multiple-choice format that many researchers have found valuable for probing understanding, to explore the topics of electric circuits and geometrical optics. Another explores the factors that may underlie the observed patterns in students' responses, trying to tease out the relative importance of mathematical and physical ideas in determining performance on questions about kinematics. A fourth paper begins the exploration of a relatively new science domain, systems thinking. Here assessment items have a particularly significant role to play in helping to define the domain in operational terms and in facilitating discussion within the science education research community.
Four papers explore issues concerning the assessment of practical competence and skills. One looks at the general issue of developing a model to describe progress in carrying out hands-on activities; another focuses more specifically on experimental skills in physics; and a third considers performance assessment in the context of initial teacher education. The fourth paper looks at the potential use of simulations as surrogates for bench practical activities. Work in this domain is important, as science educators seek a better understanding of the factors that lead to variation in students' responses to practical tasks.
Three papers look in different ways at the influence of contexts on students' responses to tasks. Two take the PISA studies as their starting point, looking in detail at the thinking of students as they respond to PISA tasks and questioning the extent to which the PISA interpretation of authenticity enhances student interest and engagement with assessment tasks. Both point to the value of listening to students talking about their thinking as they answer questions, and suggest that this may be quite different from what we would expect, and perhaps hope. A third paper with an interest in the effects of contextualisation presents data from a study in Brazil comparing students' answers to sets of parallel questions with fuller and with abridged contextual information. The findings have implications for item design, and suggest that reading demands should be kept carefully in check if we aim to assess science learning.
Three papers in this section explore the formative use of assessment. One focuses on the assessment of learning that results from inquiry-based science teaching. Another looks at the ways in which students respond to formative feedback on their work. The context for this study is web portfolios, but the research question has wider applicability to other forms of feedback and across science content more generally. The third uses an experimental design to explore the impact on student learning, in a topic on chemical reactions, of a self-evaluation instrument that asks students to monitor their own learning and to take action to address areas in which they judge themselves to be weak.
All of the papers described above collect data from students of secondary school age or prospective teachers. The final paper in this strand looks at the potential use of an attitude assessment instrument to predict undergraduate students' success in chemistry learning.
Together, the set of papers highlights the key role of assessment items and instruments as operational definitions of intended learning outcomes, bringing greater clarity to the constructs used and to our understanding of learning in the domains that they study.
Jens Dolin and Robin Millar
PERFORMANCE ASSESSMENT OF PRACTICAL
SKILLS IN SCIENCE IN TEACHER TRAINING
PROGRAMS USEFUL IN SCHOOL
Ann Mutvei and Jan-Eric Mattsson
School of Natural Sciences, Technology and Environmental Studies, Södertörn University, Sweden.
Abstract: There is a general shift towards an understanding of knowledge not as a matter of remembering facts but as the skill to use what is learnt under different circumstances. On this view, knowledge should also be useful on occasions outside school. The same shift can be identified in the development of new tests designed to assess knowledge.
In courses in biology, chemistry and physics focused on didactics we have developed performance assessments aimed at assessing the understanding of general scientific principles through simple practical investigations. Although designed to assess whether specific goals are attained, we discovered how small alterations of performance assessments promoted the development of didactic skills. Performance assessments may act as tools for the academic teacher and the school teacher, and may enhance student understanding of the theory.
This workshop focused on performance assessments of the ability to present skills and to develop new ideas. Together with the other participants we presented, discussed and explained a practical approach to performance assessments in science education. The emphasis was to demonstrate this assessment tool and to give experience of using it.
We performed elaborative tasks as they may be used by teachers working at different levels, assessed the performances and evaluated the learning outcome of the activity. Different assessment rubrics were presented and tested at the workshop. Learning by doing filled the major part of the workshop, but there were also opportunities for discussion, sharing ideas and suggestions for further development.
The activities performed may be seen as models open to further development into new assessments.
Keywords: assessment, rubric, practical skills, knowledge requirement
INTRODUCTION
During the last ten or fifteen years there has been a general shift towards an understanding of knowledge not as a matter of remembering facts but as the skill to use what is learnt under different, more or less practical, circumstances. On this view, knowledge should also be useful on occasions outside school. Traditional textbooks often had facts arranged in a linear and hierarchical order. More recent books focus on the development of the thoughts and ideas of the student by presenting general principles underpinned by good examples, diagnoses, questions to discuss, reflective tasks without any presentation of a correct answer, etc. (cf. Audesirk et al. 2008, Hewitt et al. 2008, Reece et al. 2011, Trefil & Hazen 2010). A similar development can be found in teacher training programs, where lectures and traditional text seminars have to some extent been replaced by more interactive forms of teaching. We also found this development in examinations at our own university, where tests designed to assess knowledge of literature content have been replaced by tests in which students have to show their capacity to use their knowledge.
Practical performance assessments are important when assessing the abilities or skills of students in teacher training programs. In science courses in biology, chemistry and physics focused on didactics we have for several years developed performance assessments focused on understanding of general scientific principles, but based on simple practical investigations or studies. Although designed to assess whether students reached the goals of a specific course, we have often discovered how small alterations of these performance assessments have promoted the development of the didactic skills of the student. Thus, they may act as assessment tools for the academic teacher, as models for assessments in school, and as enhancements of students' theoretical understanding of the subject. The assessments may be made on oral or written reports, during guided excursions, museum visits or practical experiments, on traditional or aesthetic diaries, or on self-diagnoses or diagnoses made by other students based on certain criteria.
We have worked for several years with teacher training programs focused on work in primary and secondary schools, with further education for teachers, and with university students studying biology and chemistry. The wide range of courses and students has given us experience of how to work with different contents adapted to the different ages of students at school. From this we have identified basic problems and needs of understanding, some shared across subjects and some subject-specific. These experiences also give us the opportunity to contribute to national seminars and conferences.
THE CURRICULUM IN SWEDISH SCHOOLS
The new curriculum in Sweden for the primary and lower secondary schools (Skolverket 2010), as well as the new one for the upper secondary school, puts the emphasis on students' skills rather than knowledge of facts. It is the ability to use the knowledge that is to be assessed. This development is a global trend; see e.g. Eurasian Journal of Mathematics, Science & Technology Education 8(1). This is a great change compared to earlier curricula, especially when compared to the common interpretation and implementation of these at the local level. A similar development has occurred in the universities in Sweden. Today the intended learning outcomes should be described in the syllabi as abilities the student can show after finishing the course, together with how this should be done.
Many teachers have problems with this view as they are used to assessing students' ability to reproduce facts. These teachers find it hard to understand how to work with performance assessments instead of tests targeting knowledge of facts. They often ask for clear directions and expect strict answers instead of guidelines on how to improve their own ability to work with performance assessments.
Teaching according to these new curricula starts with the design of performance assessments suitable for assessing a specific skill, and with the creation of a rubric for the assessment. Thereafter the teacher plans the exercises beneficial for student development, and finally decides the time needed and plans the activities accordingly.
Figure 1. How to plan learning situations.
As an example of how teachers may work with this method, we designed a performance assessment of practical skills and presented it as a workshop at ESERA 2013.
HOW TO DESIGN A PERFORMANCE ASSESSMENT OF PRACTICAL
SCIENCE SKILLS
In order to design a workshop on performance assessments of skills, we tried to do as teachers are supposed to do at school. The emphasis was to demonstrate this assessment tool and to give participants the opportunity to experience it under realistic conditions. Thus, these performance assessments are constructed in accordance with the curriculum in Sweden from 2011 (Skolverket 2010), but they are probably useful for anyone who wants to assess abilities or skills rather than memories of facts or texts. We tried to present, explain and familiarize the participants with a practical approach to performance assessments in science education at school.
The skill of assessment has to be learned. Where teachers are used to assessing skills, these are normally of a more or less theoretical kind: they are used to assessing the quality of the language used or the correctness of a mathematical calculation. Assessment of practical skills does not have to be more complicated, but it has to be trained. According to the Swedish curriculum, 150-200 assessments of each student, in each of about 15 school subjects, should be made at the end of years 6 and 9, and many of these refer to practical skills. In order to simplify this monstrous task it is both possible and necessary to assess several skills in more than one subject on one occasion.
We had prepared four similar activities, all with the same material (candle, wick and matchbox) but with different purposes. They were supposed to represent studies of mass transfer, energy transformation, technical design, and phase changes. The last of these is presented here in detail.
General principles of performance assessments
In the preparations we followed the directions of the Swedish curriculum for the compulsory school (Skolverket 2010). We selected the core content and the knowledge requirements relevant for phase transitions as the foundation for developing the performance assessment. Usually teachers start with the knowledge requirements, interpret these and design tests for assessing students' skills according to the requirements, design suitable learning situations or practical training of the skills, and finally decide what parts of the core content should be used (Figure 1). Here we started with the core content, as there were some specific areas of knowledge we wanted to study. When the core content had been selected, the assessment rubric was developed by interpreting and dissecting the knowledge requirements.
Core content
The teaching in science studies should, in this case, according to the curriculum for primary and secondary school (Skolverket 2010), deal with the core content presented in Table 1.
Table 1
Core content in the Swedish compulsory school curriculum relevant for phase transitions and scientific studies.

In years 1-3:
- Various forms of water: solids, liquids and gases. Transitions between the forms: evaporation, boiling, condensation, melting and solidification.
- Simple scientific studies.

In years 4-6:
- Simple particle model to describe and explain the structure, recycling and indestructibility of matter. Movements of particles as an explanation for transitions between solids, liquids and gases.
- Simple systematic studies. Planning, execution and evaluation.

In years 7-9:
- Particle models to describe and explain the properties of phases, phase transitions and distribution processes for matter in air, water and the ground.
- Systematic studies. Formulating simple questions, planning, execution and evaluation.
- The relationship between chemical experiments and the development of concepts, models and theories.
Knowledge requirements
The knowledge requirements are related to the age of the students and show a clear progression through school. At the end of the third, sixth and ninth year there are clearly defined knowledge requirements (Table 2). Grades are introduced in the sixth year, and levels for grades E (lowest), C and A (highest) are described in the curriculum. Grades D and B are also used: grade D means that the knowledge requirements for grade E and most of those for C are satisfied, and grade B correspondingly for grades C and A.
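The D/B rule above can be sketched as a small function. This is purely our illustrative reading of the rule: the function name, the input format (fraction of each grade level's requirements satisfied), and the 0.5 threshold for "most of" are assumptions, not part of the curriculum.

```python
def overall_grade(criteria_met):
    """Sketch of the Swedish E-A grading ladder with D and B interpolated.

    criteria_met maps each grade level to the fraction of its knowledge
    requirements a pupil satisfies, e.g. {"E": 1.0, "C": 0.8, "A": 0.1}.
    The 0.5 cutoff for "most of" is an assumption for illustration.
    """
    if criteria_met["E"] < 1.0:
        return "F"          # not all E requirements met
    if criteria_met["C"] >= 1.0:
        if criteria_met["A"] >= 1.0:
            return "A"      # all A requirements met
        if criteria_met["A"] >= 0.5:
            return "B"      # all C and most of A
        return "C"          # all C, less than most of A
    if criteria_met["C"] >= 0.5:
        return "D"          # all E and most of C
    return "E"              # all E, less than most of C
```

For example, a pupil meeting all E requirements and most (but not all) C requirements would be graded D under this reading.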
Table 2
Knowledge requirements for different years and grades.

Year 3:
Based on clear instructions, pupils can carry out [...] simple studies dealing with nature and people, power and motion, and also water and air.

Year 6, grade E:
Pupils can talk about and discuss simple questions concerning energy. Pupils can carry out simple studies based on given plans and also contribute to formulating simple questions and planning which can be systematically developed. In their work, pupils use equipment in a safe and basically functional way. Pupils can [...] contribute to making proposals that can improve the study.

Year 6, grade C:
Pupils can talk about and discuss simple questions concerning energy. Pupils can carry out simple studies based on given plans and also formulate simple questions and planning which after some reworking can be systematically developed. In their work, pupils use equipment in a safe and appropriate way. Pupils can [...] make proposals which after some reworking can improve the study.

Year 6, grade A:
Pupils can talk about and discuss simple questions concerning energy. Pupils can carry out simple studies based on given plans and also formulate simple questions and planning which after some reworking can be systematically developed. In their work, pupils use equipment in a safe, appropriate and effective way. Pupils can [...] make proposals which can improve the study.

Year 9, grade E:
Pupils can talk about and discuss questions concerning energy. Pupils can carry out studies based on given plans and also contribute to formulating simple questions and planning which can be systematically developed. In their studies, pupils use equipment in a safe and basically functional way. Pupils apply simple reasoning about the plausibility of their results and contribute to making proposals on how the studies can be improved. Pupils have basic knowledge of energy, matter, [...] and show this by giving examples and describing these with some use of the concepts, models and theories.

Year 9, grade C:
Pupils can talk about and discuss questions concerning energy. Pupils can carry out studies based on given plans and also formulate simple questions and planning which after some reworking can be systematically developed. In their studies, pupils use equipment in a safe and appropriate way. Pupils apply developed reasoning about the plausibility of their results and make proposals on how the studies can be improved. Pupils have good knowledge of energy, matter, [...] and show this by explaining and showing relationships with relatively good use of the concepts, models and theories.

Year 9, grade A:
Pupils can talk about and discuss questions concerning energy. Pupils can carry out studies based on given plans and also formulate simple questions and planning that can be systematically developed. In their investigations, pupils use equipment in a safe, appropriate and effective way. Pupils apply well-developed reasoning concerning the plausibility of their results in relation to possible sources of error, make proposals on how the studies can be improved, and identify new questions for further study. Pupils have very good knowledge of energy, matter, [...] and show this by explaining and showing relationships between them and some general characteristics with good use of the concepts, models and theories.
Assessments of knowledge requirements
The knowledge requirements were interpreted and dissected into smaller units in order to construct an assessment rubric adapted to the inquiry. Five main skills were selected from the knowledge requirements: Use of theory, Improvement of the experiment, Explanations, Relate, and Discuss. In order to make the assessment rubric more general we decided not to use the grades of the curriculum but instead recognized three levels of skill: Sufficient, Good, and Better, corresponding to the grades E, C and A respectively. In all cases we also gave examples of relevant student answers. This is a more or less necessary requirement in order to make sure that the performer, assessor or teacher really understands what is meant by a specific requirement (Arter & McTighe 2001, Jönsson 2011).
As an example, consider the knowledge requirement "Pupils can carry out studies based on given plans and also contribute to formulating simple questions and planning which can be systematically developed. In their studies, pupils use equipment in a safe and basically functional way. Pupils apply simple reasoning about the plausibility of their results and contribute to making proposals on how the studies can be improved." (Year 9, level E). This requirement contains information that may be dissected into several units.
First it is necessary to look at the five skills of the students that are going to be assessed and to identify the suitable requirements for each skill. The students are supposed to "carry out studies based on given plans". In this case the experiment is very simple (light and observe a burning candle) and hardly useful for assessing this specific skill. They shall also "contribute to formulating simple questions and planning which can be systematically developed". This requirement can be further developed to suit the five skills.
In order to show this skill it is necessary to have some knowledge about the theory and to use it in a suitable way. The skill "use of theory" is a necessary condition for this and may be formulated as "The student draws simple conclusions partly related to chemical models and theories." This criterion is also in concordance with the requirement to apply "simple reasoning about the plausibility of their results and contribute to making proposals on how the studies can be improved", which may be formulated in the rubric as "the student discusses the observations and contributes suggestions for improvements" when assessing improvement of the experiment.
In a similar way the assessment of the remaining three skills may be developed into more specific criteria adapted to this experiment (Table 3).
In order to make it possible for the student to understand what is expected, it is necessary to clarify the requirement criteria and give realistic examples of these requirements. The meaning of words differs between disciplines, not only in the academic world but also in school (cf. Chanock 2000). This has consequences when students get feedback, as they often do not understand the academic discourse with its specific concepts and fail to use the feedback later (Lea & Street 2006). Criteria combined with explicit examples are necessary to solve this problem (Sadler 1987). This is also important when designing assessment rubrics (Busching 1998, Arter & McTighe 2001). Thus, for every criterion there has to be at least one example given. In Table 3 this is exemplified for every combination of skill and grade requirement.
Table 3
Assessment rubric for assessing skills in an experiment on phase changes. Example student answers are given in parentheses.

Use of theory
- Sufficient: The student draws simple conclusions partly related to chemical models and theories. (I can see stearic acid in solid, liquid and gas phase.)
- Good: The student draws conclusions based on chemical models and theories. (The heat of the candle causes the transfer between the phases.)
- Better: The student draws well-founded conclusions from chemical models and theories. (Stearic acid must be in the gas phase and mix with oxygen in order to burn.)

Improvement of the experiment
- Sufficient: The student discusses the observations and contributes suggestions for improvements. (Observe more burning candles.)
- Good: The student discusses different interpretations of the observations and suggests improvements. (Remove the wick and relight the candle.)
- Better: The student discusses well-founded interpretations of the observations, and whether they are reasonable, and based on these suggests improvements which allow enquiry into new questions. (Heat a small amount of stearic acid and try to light the gas phase above it.)

Explanations
- Sufficient: The student gives simple and relatively well-founded explanations. (The stearic acid melts by heat produced by the flame.)
- Good: The student gives developed and well-founded explanations. (The change from liquid to gaseous phase also depends on the heat from the flame.)
- Better: The student presents theoretically developed and well-founded explanations. (All phase changes from solid to liquid or liquid to gas need energy.)

Relate
- Sufficient: The student gives examples of processes similar to those in the experiment, related to questions about energy, environment, health and society. (The warmth of the sun melts the ice on the lake at the end of the winter.)
- Good: The student generalizes and describes the occurrence of phenomena similar to those in the experiment, related to questions about energy, environment, health and society. (In the frying pan it is hot enough for butter to melt, and in the sauna water vaporizes.)
- Better: The student discusses the occurrence of the observed phenomena in everyday life, their use, and their impact on environment, health and society. (The phase change from liquid to gaseous phase cools you down when you are sweating.)

Discuss
- Sufficient: The student contributes to a discussion of the occurrence in society of the phenomena studied, makes statements partly based on facts, and describes some possible consequences. (Gases are often transported in the liquid phase, which has a lower volume.)
- Good: The student describes and discusses the occurrence in society of the phenomena studied and makes statements based on facts and fairly complicated physical relations and theories. (The bottle of a gas stove has fuel mainly in the liquid phase, but it is transported in the hose and burnt in the gaseous phase.)
- Better: The student uses the experiment as a model, discusses the occurrence in society of the phenomena studied, and draws statements and consequences based on facts and complicated physical relations and theories. (The phase change from liquid to gaseous phase cools you down when you are sweating.)
WORKSHOP
We had prepared four similar activities, all with the same material (candle, wick and matchbox) but with different purposes. The activities represented studies of mass transfer, energy transformation, technical design, and phase changes. At the workshop three groups were formed, omitting the study of technical design. The groups were not informed about the differences between the aims of their experiments. They were constructed to include people with as varied a background as possible: participants from one specific country, or from similar fields such as chemistry or physics, were allocated to different groups. They performed elaborative tasks similar to those used by teachers working at different levels, assessed the performance and evaluated the learning outcome of the activity. Within each group one person was selected to assess the activities the others carried out. The assessor was to focus not only on the results of the discussions within the group but also to try to evaluate the process, as the aim was to assess the skills of the participants rather than the content of their knowledge.
Discussion
The aim was to demonstrate how peer reviewing within the group may be used to produce information of several kinds beneficial for the performance assessment of science education at school. Discussions arose among the participants about how an integrated approach, especially in relation to other subjects in school, improved the usefulness of the methods. Learning by doing followed by discussion became the major part of the workshop, with sharing of ideas and suggestions for further development.
Most of the participants had weak knowledge of assessments of practical skills, expressed their astonishment at the positive result of the workshop, and showed curiosity about using the method. Some of the participants also showed didactic skills when explaining to the others the aspects of the experiment they mastered, a good example of the importance of variation in the skills of group members.
The persons who made the assessments expressed the need for further practice. They realized the complexity of assessing different skills at the same time as assigning the grade. They also expressed a wish to develop this ability, as they realized the strength of assessing several skills on one occasion. Further, the participants noted the importance of questions like the last one in the instructions (Appendix) in order to assess the quality of the relation between theory and practice.
Conclusion
Although based on a simple experiment with a burning candle, the workshop gave an opportunity to discuss and understand theories regarded as difficult to understand from the viewpoint of the student, or difficult to teach from the viewpoint of the teacher. The experiments, although similar, were of different character, thus reflecting a wide spectrum of possibilities.
The activities performed may therefore be seen as models or examples that may be further developed into new assessments according to the content of the subject.
REFERENCES
Arter, J. A. & McTighe, J. (2001). Scoring Rubrics in the Classroom. Corwin.
Audesirk, T., Audesirk, G., & Byers, G. B. (2008). Life on Earth, 5th ed. San Francisco: Pearson Education.
Busching, B. (1998). Grading inquiry projects. New Directions for Teaching and Learning, 74: 89-96.
Chanock, K. (2000). Comments on essays: do students understand what tutors write? Teaching in Higher Education, 5(1): 95-105.
Eurasian Journal of Mathematics, Science & Technology Education, 8(1).
Hewitt, P. G., Suchocki, J. & Hewitt, L. A. (2008). Conceptual Physical Science, 4th ed. San Francisco: Pearson Education.
Jönsson, A. (2011). Lärande bedömning [Learning assessment]. Gleerups.
Lea, M. R. & Street, B. V. (2006). The academic literacies model: theory and applications. Theory into Practice, 45(4): 368-377.
Reece, J. B., Urry, L. A., Cain, M. L., Wasserman, S. A., Minorsky, P. V. & Jackson, R. B. (2011). Campbell Biology, Global Edition. Pearson.
Sadler, D. R. (1987). Specifying and promulgating achievement standards. Oxford Review of Education, 13(2): 191-209.
Skolverket (Swedish National Agency for Education). (2010). Curriculum for the Compulsory School, Preschool Class and the Recreation Centre 2011. Skolverket.
Trefil, J. & Hazen, R. M. (2010). The Sciences: An Integrated Approach. Wiley.
APPENDIX
INQUIRY OF A BURNING CANDLE
This is an experiment on phase changes
1. Light the candle and observe the changes of phase.
2. Which changes of phase can you observe?
3. Where do they occur?
4. Why do they occur?
5. What happens in the different phases?
6. How might you improve the experiment?
7. Give examples of phase changes in daily life and in society.
INQUIRY OF A BURNING CANDLE
This is an experiment on energy transformation
1. Light the candle and observe the energy transformations.
2. Which changes of energy form can you observe?
3. Where do they occur?
4. Why do they occur?
5. What happens during the different energy transformations?
6. How might you improve the experiment?
7. Give examples of energy transformations in daily life and in society.
INQUIRY OF A BURNING CANDLE
This is an experiment on mass transfer
1. Light the candle and observe the mass transfer.
2. Which types of mass transfer can you observe?
3. Where do they occur?
4. Why do they occur?
5. What happens to the candle due to this mass transfer?
6. How might you improve the experiment?
7. Give examples of mass transfer in daily life and in society.
INQUIRY OF A BURNING CANDLE
This is an experiment on candle design
1. Light the candle and discuss the design of the candle.
2. Which different parts can you observe in the candle?
3. Where are they and how are they joined?
4. What function do the different parts have?
5. Why is the candle made in that way?
6. How might you improve the experiment?
7. Give examples of similar designs in daily life and in society.
DEVELOPMENT OF AN INSTRUMENT TO MEASURE
CHILDREN'S SYSTEMS THINKING
Kyriake Constantinide, Michalis Michaelides and Costas P. Constantinou
University of Cyprus
Abstract: Systems thinking is a higher order thinking skill required to meet the demands
of social, environmental, technological and scientific advancements. Science abounds in
systems and makes system function a core object of investigation and analysis. As a
consequence, science teaching can be a valuable framework for developing systems
thinking. In order to approach this methodically, it becomes important to specify the
aspects that constitute the systems thinking construct, design curriculum materials to
help students develop these aspects, and develop instruments for evaluating students'
competence and monitoring the learning process. The present study aims at the
development of an instrument for the standardized assessment of systems thinking. It
draws on a methodology that follows a cyclic procedure for instrument development and
validation, in which the literature, experts, students and educators all contribute.
Currently, the assessment instrument is in its second cycle of field testing: data from
about 900 students have been collected and used to develop a first version of a
validated test and a scale for measuring 10-14-year-old children's systems thinking.
The test consists of multiple-choice scenario items that draw their content from
everyday life. We present the methodology we are following, providing some examples of
multiple-choice items to demonstrate their development and transformation throughout
the process.
Keywords: systems thinking, assessment, test development
BACKGROUND
The rate of advancements in scientific knowledge and technology and the widespread
demands on young people to participate actively in solving problems in almost every
aspect of our lives have reoriented the role of education in general and science teaching
in particular. Nowadays, science teaching aims at developing scientifically literate people
with flexible thinking skills and an ability to participate critically in meaningful
discourse. More specifically, it aims at helping students acquire positive attitudes towards
learning and science, a variety of experiences, conceptual understanding, epistemological
awareness, practical and scientific skills and creative thinking skills (Constantinide,
Kalyfommatou & Constantinou, 2001).
The definitions of systems thinking described in the literature (e.g., Senge, 1990; Thier &
Knott, 1992; Booth Sweeney, 2001; Ben-Zvi Assaraf & Orion, 2005) include thinking
about a system, meaning a number of interacting items that produce a result over a period
of time. According to the Benchmarks for Science Literacy (AAAS, 1993), systems
thinking is an essential component of higher order thinking, whereas Kali, Orion and
Eylon (2003) refer to systems thinking as a higher-order thinking skill required in
scientific, technological, and everyday domains. Senge (1990) claims that systemic
thinkers are able to change their own mental models and to control their way of thinking
and their problem-solving process. Therefore, defining systems thinking, promoting it
through curricula, and measuring it should be an essential priority for education.
Science teaching and learning can provide a valuable framework for developing such
skills, since science abounds in systems and makes system function a core object of
investigation and analysis.
Several structured teaching attempts to promote systems thinking are reported in the
literature, making the development of instruments for measuring systems thinking, and
for evaluating the effectiveness of such curricula, a necessity. The most common means
of evaluating systems thinking reported thus far include tests (e.g. Riess & Mischo,
2009), interviews (e.g. Hmelo-Silver & Pfeffer, 2004) and computer simulations and
logs (e.g. Sheehy, Wylie, McGuinness & Orchard, 2000). Some researchers, in order to
triangulate their data, used a combination of data sources (e.g. Ben-Zvi Assaraf &
Orion, 2005). Almost all of these means include tasks in which a problem is introduced
and the subjects have to propose solutions or predict the behavior of the system and
its elements. Nevertheless, to date there is no validated instrument, and prior
research has not provided a scale, for measuring the systems thinking of children aged
10-14. The purpose of this paper is to describe the on-going development of the
Systems Thinking Assessment (STA), a test designed to assess systems thinking.
RESEARCH METHODOLOGY
Systems Thinking Assessment (STA): purpose and specifications
The STA will be used to measure the quality of thinking about systems by children aged
10-14 and the effectiveness of curricula designed to promote systems thinking. It
consists of multiple-choice items set in the context of everyday phenomena familiar to
children of this age range. The stem of each item presents a scenario, and children are
asked to choose the best possible answer among four alternatives.
Multiple-choice items have advantages and disadvantages. Provided the other quality
criteria are met, grading a multiple-choice test is objective, since any grader would
mark an item in the same way. Moreover, many items can be administered in a short
amount of time, allowing sufficient coverage of the content domain under study.
Multiple-choice items are also more reliable than other question formats: on a possible
readministration of a test, a subject is more likely to produce the same answers to
multiple-choice questions than to open-ended ones. A basic disadvantage of
multiple-choice questions is that they do not provide much information on the subjects'
thinking processes, namely the reasons for which they answer each item the way they do.
Nevertheless, the test development procedure and Rasch analysis minimize the effect of
this disadvantage on the results.
In order to be able to make generalizations, there was an intentional effort to include
items that draw on various systems: physical-biological systems (such as the water
cycle, a forest, a dam or food webs), mechanical-electrical systems (such as a bicycle
or a car) and socioeconomic systems (such as a family, a village or a store). Moreover,
where possible, a picture or a diagram was added to the item's wording, so as to make
the item clearer and the test more visually appealing.
We have adopted the following operational definition of systems thinking, which relies
on four strands:
(a) System definition: identifying the essential elements of a system, its temporal
boundaries and its emergent phenomena.
(b) System interactions: reasoning about causes and effects when interactions take
place within the system.
(c) System balance: recognizing the relation between interactions and the system's
balance.
(d) Flows: reasoning about the relation of inflows and outflows in a system and
recognizing cyclic flows of matter or energy.
The STA's cyclic development procedure
Figure 1 presents the cyclic nature of the STA development. The definition of systems
thinking at the center of the cycle concerns both the abilities that constitute it and
the items that measure it. The involved parties (experts, educators, students and the
existing literature) provide feedback on the definition of systems thinking through
data that establish the test's validity and reliability.
Figure 1. Development procedure for STA. The definition of systems thinking (abilities
and items) sits at the center of the cycle; feedback comes from the literature (content
validity), experts (content validity), educators (face validity) and students (test
administration and interview data: construct, criterion and face validity, and
reliability).
The STA has already undergone its first cycle of development. Reviewing the literature
led to 13 abilities that seemed to define systems thinking. The original items were
developed and administered to a small number of 10-year-old students. Qualitative and
quantitative data led to modifications (content and wording changes) and the
development of new items. Two experts gave feedback on the test's content validity.
Further improvements were carried out, and two educators with experience with children
aged 10-14 examined the face validity of the test. The revised version was once again
administered to a small number of 10-year-old students and, after the necessary
modifications, the final form of the test, with 52 multiple-choice items, was
administered to 900 students. Rasch modeling led to a scale showing item difficulty and
student ability.
Based on a broader literature review and the development of separate examples for each
ability, the second development cycle began with revising the 13-ability schema,
reducing the abilities to 10 and the items to 41. The revised test was given to
approximately 90 students aged 10-14. Test and item difficulty indices, item
discrimination indices and frequencies were calculated, and items were either modified
or replaced. Afterwards, 16 students participated in interviews, answering the items
while following a think-aloud protocol (Ericsson & Simon, 1998). Non-effective items
were replaced or modified.
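The classical item statistics mentioned above (difficulty index, discrimination index
and answer frequencies) can be computed as in the following sketch. The function names
and data are illustrative, not taken from the STA study; difficulty is the proportion
of correct answers, and discrimination compares an upper and a lower scorer group.

```python
def item_difficulty(responses, key):
    """Proportion of respondents answering the item correctly."""
    return sum(1 for r in responses if r == key) / len(responses)

def item_discrimination(scores, item_correct, fraction=0.27):
    """Upper-lower group discrimination index for one item.

    scores       -- total test score per student
    item_correct -- 1/0 per student for this item
    fraction     -- share of students in the upper/lower groups (27% is customary)
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, int(len(scores) * fraction))
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_correct[i] for i in upper) / k
    p_lower = sum(item_correct[i] for i in lower) / k
    return p_upper - p_lower

def alternative_frequencies(responses, alternatives="ABCD"):
    """Relative frequency of each answer alternative (as in Tables 2 and 3)."""
    n = len(responses)
    return {a: sum(1 for r in responses if r == a) / n for a in alternatives}
```

On this reading, the bicycle item's pre-pilot values (difficulty 0.21, discrimination
0.3) mean that 21% answered correctly and that high scorers outperformed low scorers
on the item by 30 percentage points.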
The latest version of the items is under evaluation by independent experts. Graduate
and PhD students in science learning, academics specialized in science teaching or
psychology, and international researchers with experience in measuring systems
thinking will provide feedback on the test by first solving it and then judging its
efficiency based on a structured protocol. Finally, an expert panel will be convened,
during which any problems will be discussed until the panel reaches consensus. The
revised test will be given to four educators to evaluate its face validity. The test
will then be administered to 100 students aged 10-14 to statistically assess its
clarity and its developmental validity. The improved test will finally be administered
to 500 students and the data will be analyzed using Rasch modeling. Confirmatory
factor analysis will be carried out in order to assess the 10-ability structure of the
construct.
RESULTS
At the final stage of the first cycle of the STA development, the test was administered
to about 900 students. The Rasch statistical model provided a scale for the 52 items of
the STT on which both the subjects' scores and the items' degrees of difficulty are
presented (Figure 2).
It is evident that the 52 items of the test fit the model well. Both students' scores
and item difficulties are distributed uniformly along the scale. Students' scores vary
between -2.16 and 2.37 logits, whereas item difficulty varies between -2.41 and 2.53
logits.
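For readers unfamiliar with the logit scale: in the Rasch (one-parameter logistic)
model underlying this analysis, person ability and item difficulty sit on the same
scale, and the probability of a correct response depends only on their difference. A
minimal sketch of the standard formula (illustrative code, not from the study):

```python
import math

def rasch_probability(theta, b):
    """Rasch (1PL) model: probability that a student of ability theta (logits)
    answers an item of difficulty b (logits) correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

A student whose ability equals an item's difficulty has a 50% chance of success, which
is why person and item distributions can be read off the same axis in Figure 2.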
* Each symbol represents 4 students
Figure 2. Scale of the STT (at the end of the first cycle)
Table 1
Statistical values for the 52 STA items for the whole sample and the four groups

Statistical index                  Total     5th gr.   6th gr.   1st gr.   2nd gr.
                                   sample    Primary   Primary   Second.   Second.
                                   (n=848)   (n=219)   (n=249)   (n=137)   (n=243)
Mean                  (items*)      0.00      0.00      0.00      0.00      0.00
                      (persons)    -0.01     -0.30     -0.05      0.14      0.21
Standard deviation    (items)       0.97      0.96      0.97      1.08      1.03
                      (persons)     0.72      0.66      0.73      0.73      0.68
Separability**        (items)       0.99      0.97      0.98      0.96      0.98
                      (persons)     0.81      0.77      0.81      0.81      0.78
Mean infit mean sq.   (items)       1.00      1.00      1.00      1.00      1.00
                      (persons)     1.00      1.00      1.00      1.00      1.00
Mean outfit mean sq.  (items)       1.01      1.02      1.02      1.01      1.01
                      (persons)     1.01      1.02      1.02      1.01      1.01
Infit t               (items)      -0.12     -0.13     -0.03      0.00      0.04
                      (persons)    -0.04     -0.07     -0.03     -0.02     -0.01
Outfit t              (items)       0.09      0.05      0.09      0.05      0.08
                      (persons)     0.02      0.04      0.03     -0.01      0.02

* L = 52 items
** Separability: a value of 1 indicates high reliability, a value of 0 very low reliability.
Table 1 shows the statistical values of the Rasch model for the whole sample and the
four subgroups (5th and 6th primary grades and 1st and 2nd secondary grades)
separately. It is evident that, for the whole sample and the subgroups, item
reliability values are over .95, whereas person reliability values are over .76.
Although the generally accepted values for such a scale are over .90 (Wright, 1985),
the person reliability may be accepted. Furthermore, the mean infit mean square for
both items and persons equals 1 for the whole sample and the subgroups, while the mean
outfit mean square is either 1.01 or 1.02. Infit t and outfit t range from -0.13 to
0.09. The subjects' standard deviation is rather small (SD = 0.72), indicating
uniformity in the sample's behavior: students aged 10-14 respond to the STT as a
homogeneous group. Moreover, the subjects' mean score increases with age, suggesting
developmental validity of the test. Rasch analysis also showed that the items receive
infit values from .87 to 1.18, which fall within the generally accepted range of
.77-1.30 (Adams & Khoo, 1993). Three of the items have an outfit value over 1.30, but
since the difference between the infit and outfit values for these items is small,
they remain in the test.
This is an on-going study and, at the moment, the test is in its second cycle of
development. Test administration, interviews with students, and feedback from experts
and educators provide data to validate the items. The way the data from each stage
were analyzed is indicated in Tables 2 and 3, presented in the next subsection. At the
end of the second cycle, Rasch analysis as well as confirmatory factor analysis will
be conducted and the results will be published.
Two examples of item development
The development of two items through the STA construction cycles can be seen in Tables
2 and 3. The bicycle item presented in Table 2 refers to the strand "System
definition", and more specifically to the ability of identifying the essential
elements of a system; it was revised during the procedure. The apple tree item
presented in Table 3 refers to the strand "System balance", and more specifically to
the ability of identifying reinforcing and balancing loops. It was replaced by a
different item because of problematic item statistics during the pre-pilot phase of
the second development cycle.
Table 2
The development of the bicycle item (item wording translated into English)

1st cycle

Pre-pilot
  Item: Which are the fewest elements that a bicycle that can roll should have?
    A. frame, two wheels, pedals, chain
    B. frame, two wheels, gears, handle bar
    C. frame, two wheels, pedals, seat
    D. frame, two wheels
  Comments: Students did not understand the wording of the stem.
    Frequencies per alternative: A 0.41, B 0.09, C 0.32, D 0.18
  Action: Change wording of stem and alternatives.

To experts
  Item: Which are the elements that a bicycle SHOULD have in order to roll, when
  someone is pushing it?
    A. frame, two wheels, chain, pedals, handle bar
    B. frame, two wheels, chain, pedals
    C. frame, two wheels, chain
    D. frame, two wheels
  Comments: Experts related the item to two initially separate abilities (abilities
    1.1 and 1.2 were afterwards unified).
  Action: Keep as is.

To educators
  Item: (as above)
  Comments: OK.
  Action: Keep as is.

Pilot
  Item: (as above)
  Comments: Frequencies per alternative: A 0.56, B 0.19, C 0.00, D 0.25
  Action: Revise distractor.

Final administration
  Item: (stem as above; alternative C changed to: frame, two wheels, chain, handle bar)
  Comments: Frequencies per alternative: A 0.58, B 0.07, C 0.16, D 0.18
  Action: Change wording of the stem.

2nd cycle

Pre-pilot
  Item: Which are the elements that a bicycle SHOULD NECESSARILY have in order to
  roll, when someone is pushing it? (alternatives as above)
  Comments: Difficulty index 0.21; discrimination index 0.3; alternatives OK.
    Frequencies per alternative: A 0.40, B 0.15, C 0.24, D 0.21
  Action: Keep as is.

Interviews (first set)
  Item: (as above)
  Comments: Correct answer with correct reasoning (4/11); wrong answer (7/11);
    suggestion of other alternatives (2/11) (wheels, pedals, handle bar);
    alternative B not chosen by anyone.
  Action: Change alternative content.

Interviews (second set)
  Item: (stem as above; alternative B changed to: frame, two wheels, pedals,
  handle bar)
  Comments: Correct answer with correct reasoning (1/5); wrong answer (4/5).
  Action: Keep as is.
Table 3
The development of the apple tree item (item wording translated into English)

Item: Mr George planted a small apple tree 10 years ago. Now the apple tree is quite
big. As the apple tree grows,
  A. it needs more water.
  B. it needs less water.
  C. the tree's need for water does not change.
  D. it does not need extra water, since it has already grown.

1st cycle
  Pre-pilot: -
  To experts: -
  To educators: Keep as is.
  Pilot: Frequencies per alternative: A 0.38, B 0.31, C 0.13, D 0.19. Keep as is.
  Final administration: Frequencies per alternative: A 0.48, B 0.18, C 0.25, D 0.07.
    Keep as is.

2nd cycle
  Pre-pilot: Difficulty index 0.43 (OK); discrimination index -0.3.
    Frequencies per alternative: A 0.43, B 0.21, C 0.28, D 0.07. Item replaced.
CONCLUSION
Systems thinking is a higher order skill, important in dealing with everyday phenomena
and in solving problems. At the same time, science is a field with plenty of systems
to analyze and model. Despite the widespread research on curriculum development for
systems thinking, no validated tests have been developed to evaluate the effectiveness
of such curricula. The STA is being developed following a cyclic and iterative
procedure. It aspires to be a useful instrument for assessing curricula designed to
promote systems thinking in upper-primary and lower-secondary school students.
REFERENCES
Adams, R. J. & Khoo, S. T. (1993). Quest: The interactive test analysis system.
Camberwell, Victoria: ACER.
American Association for the Advancement of Science (1993). Benchmarks for science
literacy. New York: Oxford University Press.
Constantinide, K., Kalyfommatou, N. & Constantinou, C. P. (2001). The development of
modeling skills through computer based simulation of an ant colony. In Proceedings
of the Fifth International Conference on Computer Based Learning in Science, July
7-12, 2001, Masaryk University, Faculty of Education, Brno, Czech Republic.
Ben-Zvi Assaraf, O. & Orion, N. (2005). Development of system thinking skills in the
context of earth system education. Journal of Research in Science Teaching, 42(5),
518-560.
Booth Sweeney, L. B. (2001). When a butterfly sneezes. Waltham: Pegasus
Communications.
Ericsson, K. A. & Simon, H. A. (1998). How to study thinking in everyday life:
Contrasting think-aloud protocols with descriptions and explanations of thinking.
Mind, Culture, and Activity, 5, 178-186.
Hmelo-Silver, C. E. & Pfeffer, M. G. (2004). Comparing expert and novice
understanding of a complex system from the perspective of structures, behaviors,
and functions. Cognitive Science, 28, 127-138.
Kali, Y., Orion, N., & Eylon, B. (2003). The effect of knowledge integration
activities on students' perception of the earth's crust as a cyclic system. Journal
of Research in Science Teaching, 40, 545-565.
Riess, W., & Mischo, C. (2009). Promoting systems thinking through biology lessons.
International Journal of Science Education, 1-21.
Senge, P. (1990). The fifth discipline: The art and practice of the learning
organization. New York: Doubleday.
Sheehy, N., Wylie, J., McGuinness, C. & Orchard, G. (2000). How children solve
environmental problems: Using computer simulations to investigate systems thinking.
Environmental Education Research, 6(2), 109-126.
Thier, H. D. & Knott, R. C. (1992). Subsystems and variables. Teacher's guide, Level
3, Science Curriculum Improvement Study. Hudson: Delta Education.
DEVELOPMENT OF A TWO-TIER TEST-INSTRUMENT
FOR GEOMETRICAL OPTICS
Claudia Haagen and Martin Hopf
University of Vienna, AECCP, Vienna, Austria
Abstract: Light is part of our everyday life. Nevertheless, students face enormous
difficulties in explaining everyday optical phenomena with the help of scientific
concepts. Usually they rely on alternative conceptions deduced from everyday
experience, which are often in conflict with scientific views. The identification of
such alternative conceptions is one of the most important prerequisites for promoting
conceptual change (Duit & Treagust 2003). Investigating students' conceptions with
interviews is quite time consuming and difficult to handle in school settings.
Multiple-choice tests, on the other hand, frequently depict the conceptual knowledge
base in a superficial way. The main aim of our project is to develop a two-tier
multiple-choice test which reliably and validly diagnoses year-8 students'
understanding of geometrical optics. So far, we have developed and empirically tested
a first (N=643) and a second test version (N=367), partly based on items from the
literature. Though the overall results are promising, the quality of the items differs
considerably: a number of items do not have appropriate distractors for the second
tier. In addition, students' and teachers' feedback on the test indicates that some
items pose problems due to their wording or the kind of representation chosen. For a
closer analysis of these problematic items, the qualitative method of student
interviews was chosen. Semi-structured, problem-based interviews were conducted with
29 year-8 students after their formal instruction in optics. Based on the results of
these interviews, test items were revised and extended.
Keywords: geometrical optics, two-tier multiple choice test, test development
INTRODUCTION
Despite everyday experience with light, understanding geometrical optics turns out to
be difficult for students. Physics education research shows that students hold
numerous conceptions about optics which differ from scientifically adequate concepts
(Duit 2009). Alternative conceptions are very stable; research shows that formal
instruction frequently fails to transform them into scientifically accepted ideas
(Andersson & Kärrqvist 1983; Fetherstonhaugh & Treagust 1992; Galili 1996; Langley et
al. 1997).
Teachers' knowledge about their students' learning difficulties is one important
prerequisite for the design of successful instruction. Exploring students' conceptual
knowledge base can provide important feedback: it can support students in their
individual learning process and can serve as a basis for further teaching decisions.
In general, there are two main methods used for examining students' conceptual
knowledge: interviews and open-ended questionnaires. The most effective methods, like
interviews, are very time consuming and difficult for teachers to handle in classroom
situations. In search of a way out of this dilemma, we encountered the method of
two-tier tests as used by, e.g., Treagust (2006) and Law and Treagust (2008).
"Two-tiered test items are items that require an explanation or defence for the answer
[...]" (Wiggins and McTighe 1998, p. 14, as cited in Treagust 2006). Each item
consists of two parts, called tiers. The first part of the item is a multiple-choice
question whose distractors include known student alternative conceptions. In the
second part of each item, students have to justify the choice made in the first part
by choosing among several given reasons (Treagust 2006).
Research on alternative conceptions in optics has mainly used the methods of
interviews or questionnaires with open answers (Andersson & Kärrqvist 1983; Driver et
al. 1985; Guesne 1985; Viennot 2003). In addition, multiple-choice tests were
developed (Bardar et al. 2006; Chen et al. 2002; Chu et al. 2009; Fetherstonhaugh &
Treagust 1992). These tests focus on various age groups and on different content areas
within geometrical optics. We have, however, not found a psychometrically valid test
instrument designed to portray the basic conceptions in geometrical optics of students
at the lower secondary level.
Our main research objective is the development of a multiple-choice test instrument
for year-8 students which is able to portray the students' conceptions in geometrical
optics.
DEVELOPMENT OF THE TEST INSTRUMENT
The test instrument has so far been developed in two phases. In the first phase, the
content area of the test was identified based on the Austrian year-8 curriculum. Then
students' conceptions related to the key ideas of the content area were investigated
through intensive literature research. Finally, items for the test were selected from
already existing assessment tools for geometrical optics and adapted to the two-tier
structure where possible. Where a second tier was added to existing items, the
distractors for this second tier were taken from research on students' conceptions.
Additionally, some items were newly developed. The final version of the test was tried
out with N=643 year-8 students.
The results of this first test phase were used to revise the first test version. The
second test version was tested with N=367 year-8 students after their conventional
instruction in geometrical optics in year-8. This version consisted of 20 two-tier
items and 6 one-tier items, which were partly taken from the literature
(Fetherstonhaugh & Treagust 1992; Kutluay 2005; Bardar et al. 2006; Chu et al. 2009).
The results of the statistical analysis with SPSS, together with students' and
teachers' feedback on the test, indicated a potential for improvement. Some items did
not have appropriate distractors for the second tier, while others seemed to pose
problems due to their wording or the kind of representations (Colin et al. 2002)
chosen.
Consequently, semi-structured, problem-based interviews were conducted with year-8
students after their instruction in geometrical optics. These interviews were carried
out for the following reasons: firstly, we wanted to make sure that the distractors
which had been taken from the literature were exhaustive; secondly, the interviews
were to investigate the response space of the newly developed items; finally, the
language and the graphical representations used in the items were to be validated by
students.
Participants and Setting
We interviewed 29 students (17 female, 12 male) after their instruction in geometrical
optics. The students attended year-8 in 5 different schools. The students went to 8 different
classes and thus had 8 different physics teachers. The schools in our sample covered
all the different school types available in Austria at the year-8 level.
The interviews were conducted in the school setting. Each student was interviewed
individually. The average duration of the interviews was 19.5 minutes.
METHOD
We carried out semi-structured, problem-based interviews (Lamnek, 2002; Mayring, 2002;
Witzel, 1985). The interviews were based on seven selected items of the second test
version. The students were just given the item task without any distractors. The interview
followed a four step structure for each item. The students had to:
paraphrase the task of the item
describe the graphical representation used in the item
answer the item
account for the answer given
Figure 1. Flow chart of the structure of the interviews
Data analysis
The interviews were recorded and transcribed. Afterwards, they were analysed with
MAXQDA, following the method of qualitative content analysis of Mayring (2010) and
Gropengießer (2008).
The data were analysed with respect to three main categories: language issues, the forms of
visual representation used, and students' conceptions related to the content of the items. As
far as language issues are concerned, we were interested in how students interpreted the task
of an item on the basis of the text given. Additionally, we tried to identify unfamiliar words
and expressions as well as overly long or complicated sentences.
For the visual representations, our main aim was to find out whether the students were able
to grasp the content or the situation represented in visual form.
The final category, on students' conceptions, was intended to map the response space for
the problems posed, and thus to give a good overview of students' conceptions.
FINDINGS
The findings presented here are results of the empirical testing of the second test version
(N=376). The reliability of the test was established by a Cronbach's alpha coefficient of
α = 0.77. An overview of the test and item statistics for the 20 two-tier items is given in
Figure 2.
Figure 2. Test and item statistics of the second test version
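The reliability coefficient reported for the test can be reproduced from a persons-by-items
score matrix. The sketch below is illustrative only: the response data are hypothetical, not
the study's actual responses.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix of item scores."""
    k = len(scores[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*scores)]  # per-item sample variances
    total_var = variance([sum(row) for row in scores])   # variance of total scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# hypothetical 0/1 responses of 6 students to 4 dichotomous items
data = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(round(cronbach_alpha(data), 2))  # → 0.67
```

With dichotomous items, as here, the formula reduces to the Kuder-Richardson
coefficient KR-20; a real analysis would use the full N=376 response matrix.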
Two-tier items were answered correctly in only 37.2% of cases on average, whereas one-
tier items were solved on average in 47.41% of cases. The solution frequencies of one-tier
items (8.5% - 88.0%) were higher than those of two-tier items, which varied between
3.0% and 57.2%. This effect is well known from research using two-tier items. Among
other factors, it is mainly caused by the reduced probability of guessing correctly, since the
second tier requires students to account for the choice made in tier one (cf. e.g. Tan &
Treagust 2002).
This is also regarded as one way of distinguishing students who possess only superficial
factual knowledge of phenomena from students with deeper conceptual knowledge: the
latter are not only able to give a correct answer in the first tier of a multiple-choice item,
but can also give a correct reason for their choice. As reported elsewhere (cf. Haagen &
Hopf 2012), a more detailed analysis of the items indicated that most of the two-tier items
used had a higher potential for portraying students' conceptions in detail than one-tier
items.
The second part of the findings section concentrates on the results of the interviews. As
mentioned above, the interviews were used to find appropriate distractors for items that did
not yet have a second tier. This paper focuses on this issue; in the following, one example
of adding a second tier with the help of the interview results is reported.
For the topic of continuous propagation of light, the item shown in Figure 3 was used.
Figure 3. One-tier item of test version two concerning the key idea of continuous
propagation of light
For those students who indicated in the first tier that they assumed light propagates a
different distance from the campfire during day and night, we obtained 6 different
categories of reasons, as shown in Figure 4.
Figure 4. Reasons for a different propagation distance of light from a campfire during day
and night
Each of these categories was translated back into students' language, either by taking a
student statement directly from the interviews or by modifying a student statement slightly
in order to fulfil psychometric guidelines for distractor construction. This procedure led to
the second tier for this item, as presented in Figure 5.
Figure 5. Two-tier item of test version two concerning the key idea of continuous
propagation of light
CONCLUSION
In conclusion, the analysis of the second test version showed that the two-tier items of the
test are well able to portray several types of students' conceptions known from the
literature. On the other hand, the results indicated that some items still needed revision and
improvement. The results obtained from the interviews were integrated into the third test
version, which remains to be tested.
REFERENCES
Andersson, B.; Kärrqvist, C. (1983): How Swedish pupils, aged 12-15 years, understand
light and its properties. IJSE 5(4), 387–402.
Bardar, E.M.; Prather, E.E.; Brecher, K.; Slater, T.F. (2006): Development and validation
of the light and spectroscopy concept inventory. Astronomy Education Review 5,
103.
Chu, H.E.; Treagust, D.; Chandrasegaran, A.L. (2009): A stratified study of students'
understanding of basic optics concepts in different contexts using two-tier multiple-
choice items. RSTE 27, 253–265.
Colin, P.; Chauvet, F.; Viennot, L. (2002): Reading images in optics: students' difficulties
and teachers' views. IJSE 24(3), 313–332.
Driver, R.; Guesne, E.; Tiberghien, A. (Eds.) (1985): Children's ideas in science.
Buckingham: Open University Press.
Duit, R. (2009): Bibliography STCSE: Students' and teachers' conceptions and science
education. Retrieved October 20, 2009.
Duit, R.; Treagust, D.F. (2003): Conceptual change: a powerful framework for improving
science teaching and learning. IJSE 25(6), 671–688.
Fetherstonhaugh, T.; Treagust, D.F. (1992): Students' understanding of light and its
properties: Teaching to engender conceptual change. SE 76(6), 653–672.
Galili, I. (1996): Students' conceptual change in geometrical optics. IJSE 18(7), 847–868.
Guesne, E. (1985): Light. In: R. Driver, E. Guesne and A. Tiberghien (Eds.): Children's
ideas in science. Buckingham: Open University Press, 10–32.
Langley, D.; Ronen, M.; Eylon, B.S. (1997): Light propagation and visual patterns:
Preinstruction learners' conceptions. JRST 34(4), 399–424.
Law, J.F.; Treagust, D.F. (2008): Diagnosis of student understanding of content specific
science areas using on-line two-tier diagnostic tests. Curtin University of
Technology.
Mayring, P. (2010): Qualitative Inhaltsanalyse. Weinheim: Beltz.
Treagust, D.F. (2006): Diagnostic assessment in science as a means to improving
teaching, learning and retention. In: UniServe Science Symposium Proceedings:
Assessment in science teaching and learning. Sydney, 2006. UniServe Science.
Treagust, D.F.; Glynn, S.M.; Duit, R. (1995): Diagnostic assessment of students' science
knowledge. In: Learning science in the schools: Research reforming practice 1,
327–436.
Viennot, L. (2003): Teaching physics. With U. Besso, F. Chauvet, P. Colin, F.
Hirn-Chaine, W. Kaminski and S. Rainson. Springer Netherlands.
STRENGTHENING ASSESSMENT IN HIGH SCHOOL
INQUIRY CLASSROOMS
Chris Harrison
King's College London
Abstract: Inquiry provides both the impetus and experience that helps students
acquire problem solving and lifelong learning skills. Teachers on the Strategies for
Assessment of Inquiry Learning in Science Project (SAILS) strengthened their
inquiry pedagogy by focusing on seeking assessment evidence for formative
action. Observing learners in the classroom as they carry out investigations, listening
to learners piece together evidence in a group discussion, reading through answers to
homework questions and watching learners respond to what is being offered as
possible solutions to problems all provide plentiful and rich assessment data for
teachers.
Keywords: Inquiry, Assessment, Teacher change
BACKGROUND
The European Parliament and Council (2006) identified and defined the key
competencies necessary for personal fulfillment, active citizenship, social inclusion
and employability in our modern-day society. These included communication skills
both in mother tongue and foreign languages, mathematical, scientific, digital and
technological competencies, social and civic competencies, cultural awareness and
expression, entrepreneurship and learning to learn. These key competencies formed
the foundation for the approach that our European Framework 7 project (EUFP7)
Strategies for Assessment of Inquiry Learning in Science Project (SAILS) took to
developing, researching and understanding how teachers might strengthen their
teaching of inquiry-based science education.
Since the Rocard Report (2007) recommended that school science teaching should
move from a deductive to an inquiry approach to science learning, there have been
several EUFP7 projects such as S-TEAM, ESTABLISH, Fibonacci, PRIMAS and
Pathway, whose remit has been to support groups of teachers across Europe in
bringing about this radical change in practice. These projects have been successful in
highlighting the importance of IBSE across Europe. They have also enabled us to
determine the range of understanding of what the term inquiry means to teachers
across Europe, and to establish to what extent skills and competencies that are
developed through inquiry practices have been identified.
Inquiry-based science education (IBSE) has proved its efficacy at both primary and
secondary levels in increasing children's and students' interest and attainment levels
(Minner et al, 2009; Osborne et al, 2008) while at the same time stimulating teacher
motivation (Wilson et al, 2010). One area that has remained problematic for teachers,
and has been cited as one of the areas limiting the development of IBSE within schools,
is assessment (Wellcome, 2011). The EUFP7 project Strategies for Assessment of
Inquiry Learning in Science (SAILS) aims to prepare science teachers not only to be
able to teach science through inquiry, but also to be confident and competent in the
assessment of their students' learning through inquiry. The literature suggests that
teacher change is a slow and often difficult process, and never more so than when an
initiative requires teachers to review and change their assessment practices (Harrison,
2012).
Part of the reason for this slow implementation of IBSE in science classrooms is the
time lag between the introduction of new ideas and the training of teachers at both
in-service and pre-service level. While this situation should improve over the next few
years, there is a fundamental problem with an IBSE approach, and this lies with
assessment. While the many EU IBSE projects have produced teaching materials,
they have not produced support materials to help teachers with the assessment of this
approach. Linked to this is the low level of IBSE-type items in national and
international assessments, which sends the message to teachers that IBSE skills are
not considered important in science education. It is clear that an assessment model
and support materials are needed to help teachers assess IBSE learning in their
classrooms if this approach is to be further developed and sustained in classrooms
across Europe.
Inquiry Skills
Inquiry skills are what learners use to make sense of the world around them. These
skills are important both for creating citizens who can make sense of the science in the
world they live in, so that they can make informed decisions, and for developing
scientific reasoning in those undertaking future scientific careers or careers that require the
logical approach that science encourages. An inquiry approach not only helps
youngsters develop a set of skills, such as critical thinking, that they may find useful in
a variety of contexts; it can also help them develop their conceptual understanding of
science, and it encourages students' motivation and engagement with science.
The term inquiry has figured prominently in science education, yet it refers to at least
three distinct categories of activities: what scientists do (e.g., conducting
investigations using scientific methods), how students learn (e.g., actively inquiring
through thinking and doing into a phenomenon or problem, often mirroring the
processes used by scientists), and a pedagogical approach that teachers employ
(e.g., designing or using curricula that allow for extended investigations) (Minner et
al, 2009). However, whether it is the scientist, student, or teacher who is doing or
supporting inquiry, the act itself has some core components.
Inquiry-based science education is an approach to teaching and learning science that is
conducted through the process of raising questions and seeking answers (Wenning,
2005, 2007). An inquiry approach fits within a constructivist paradigm in that it
requires the learner to take note of new ideas and contexts and question how these fit
with their existing understanding. It is not about the teacher delivering a curriculum
of knowledge to the learner but rather about the learner building an understanding
through guidance and challenge from their teacher and from their peers.
Some of the key characteristics of inquiry-based learning are:
Students are engaged with a difficult problem or situation that is open-ended
to such a degree that a variety of solutions or responses are conceivable.
Students have control over the direction of the inquiry and the methods or
approaches that are taken.
Students draw upon their existing knowledge and they identify what their
learning needs are.
The different tasks stimulate curiosity in the students, which encourages them
to continue to search for new data or evidence.
The students are responsible for the analysis of the evidence and also for
presenting evidence in an appropriate manner which defends their solution to
the initial problem (Kahn & O'Rourke, 2005).
In our view, these inquiry skills are developed and experienced through working
collaboratively with others