final paper assessment.docx

Upload: maierbruggeringo

Post on 02-Jun-2018

  • 8/10/2019 final paper assessment.docx

    1/14

    EFL Testing and Assessment

    Heany, Helen SS 2014

    26.7.2014

    Final written Assignment

    Ingo Maierbrugger

    (Mrn.: a0708160)

    Elisabeth Gottesheim

    (Mrn.: a0)

    CONTENTS

    0. Introduction

    ----------------------- PART 1 ------------------

    1.1. Test specifications
    1.2. Test development process
    1.2.1 Selecting texts: Deciding as a group
    1.2.2 Writing Items: Taming the test technique
    1.2.3 Revising Items and Testlet: The importance of feedback
    1.2.4 Final Version and Reflections: The logistics of the test writing process

    ----------------------- PART 2 ------------------

    2.1 Pilot test results
    2.2. Revisions for a main trial
    2.3. Test usefulness
    2.4. Measures to improve test usefulness (could be integrated in II.3)

    ----------------------- PART 3 ------------------

    3.1 Reflection Ingo Maierbrugger: More than a red ink pen
    3.2 Reflection Elisabeth Gottesheim

    Bibliography

    Appendix

    Introduction:

    "The proper relationship of testing and teaching is surely that of a partnership"
    Hughes (2003: 7)

    Hughes (2003) observes in the first chapter of his book that many test users, teachers
    as well as students, mistrust tests, and he even admits that they are in most cases
    right in doing so, as "a great deal of language testing is of very poor quality". This
    unsatisfactory condition, however, was, as Hughes admits, what initially led him to
    write his book in order to help language teachers improve their testing.

    Following Hughes's mission, and also using his book, our course was aimed at future
    language teachers, who should learn right from the beginning how to design tests that
    prove useful rather than harmful to their teaching. In other words, this course taught
    its students how to use and design tests successfully, both from a theoretical point of
    view and in practice-oriented group work.

    The completion of this group work in particular, which is outlined in the following
    paper, covered once again all important stages of creating a successful test, from
    establishing the test specifications to evaluating its final results.

    In short, the course as a whole, and the group work in particular, have taught future
    teachers how to use tests purposefully. This paper thus aims to show how we have learned
    to bring teaching and testing, just as Hughes wants it, into the relationship of a
    partnership.

    1. Test specification:

    1.1 General Statement of Purpose:

    This test is designed to assess the English reading proficiency of students in the last
    forms (7th or 8th grade) of Academic Secondary School (AHS) who are preparing for the
    Matura. As the test seeks to aid students in their preparation for the Matura exam, the
    level of performance it is designed to measure is, as in the Matura, the B2 level of
    the CEFR. Moreover, in terms of test content, specifically the chosen texts, the testing
    techniques and the time setting, this test tries to simulate the conditions of a Matura
    exam as authentically as possible.

    Thus, with respect to the test takers, the purpose of the test is:

    a.) to inform them about their reading proficiency in light of the upcoming Matura

    b.) to give them a practice test in which they will find conditions similar to the
    Matura

    In this way, the test can be regarded as a proficiency test which provides test takers
    with a rough estimate of how they might do in an actual Matura exam. As test results are
    not included in the students' course grade, the test can also be regarded as an
    approximate diagnostic test, which gives test takers insights into their performance on
    a Matura-like test. It might show them, for example, which testing method they still
    have problems with, or whether they have to increase their reading speed in order to
    complete the whole test in time. However, to gain this kind of diagnostic feedback
    effectively, it would be advisable to spend some time, when the corrected tests are
    handed back, discussing the test results as well as the students' overall experiences
    with the test format.

    In sum, this reading test is a criterion-referenced proficiency test which closely
    follows the test specifications of the Matura B2 level. Thus the test, which will be
    objectively scored, not only provides useful feedback to test takers who are preparing
    for the Matura, but also gives them a chance to practise the format. In other words,
    besides the actual test results, this test contributes to creating a beneficial backwash
    for the new Matura, as it, in the words of Hughes (2003: 55), helps to "ensure that the
    test [namely the Matura] is known and understood by students".

    1.2 Test focus (Test construct):

    Like the Matura, this test uses the test construct embedded in the description of the
    B2 reading proficiency standards as defined in the CEFR, and thus tries to cover all
    main components that together form overall reading comprehension. The test therefore
    assesses the following reading operations:

    1. Reading for gist

    2. Reading for information / important details

    3. Reading for main ideas and supporting details

    4. Making propositional inferences

    5. Reading to deduce the meaning of words / phrases

    Or, put in the terminology of Khalifa and Weir (2008), this test tries to elicit both
    careful and expeditious reading on a global as well as a local level. However, with
    regard to our group's testlet, expeditious reading is only indirectly involved in the
    completion of the tasks: it is used to locate relevant information in the text, but the
    test taker then needs to engage in careful reading in order to extract the answers to
    the individual tasks.

    1.3 Test Takers: Age 16 and upwards; the majority are L1 German speakers

    1.4 Test content:

    The texts and task types candidates are expected to be able to deal with are also
    essentially in line with the demands of the Matura exam. However, with regard to overall
    length and coherence within the individual texts, some changes had to be made due to
    logistical constraints, so that the final test can be seen as a slightly shortened
    version of a Matura reading exam.

    1.4.1. Authenticity of text: Authentic, not simplified, but in some cases shortened to
    fit logistical constraints. As texts are mainly taken from the internet, the layout is
    changed for the paper version; however, paragraphing and overall text structure are
    kept as authentic as possible.

    1.4.2. Text types: General interest articles, book reviews, etc.; owing to task formats
    and layout restrictions, mainly non-literary texts

    1.4.3. Discourse type: narrative, argumentative, descriptive, expository, persuasive

    1.4.4. Topic area: In order not to influence the performance of test takers, selected
    topics should be neither too provocative (i.e. topics which might cause offence or
    emotional distress) nor too boring for the average reader in the test takers' age group.

    1.4.5. Number of words: 500 words precisely, for logistical reasons

    1.4.6 Number of texts: 3; each text is accompanied by a different testing method

    1.4.7 Test methods:

    a. Multiple Choice

    b. True / False with Justification

    c. Gap filling

    d. Short answers

    1.5. Number of items per text: 8, which means 24 items altogether

    1.6. Weighting per item: 1 point per item (for TFJ, both parts need to be correct for
    1 point)

    1.7. Time for test: 45 minutes for 3 texts with 8 items each. Compared with the Matura
    exam, which consists of 4 texts with 8 items each and lasts 60 minutes, this test allows
    the same time per individual section, namely 15 minutes; it merely features one section
    fewer in order to fit into school lessons, which last 50 minutes. As Hughes (2003: 141)
    points out, in assessing reading proficiency, reading speed is a feature of prime
    importance, which combines with the number and difficulty of items to determine the
    amount of time needed for the test. It is therefore especially important that the time
    setting of the test follows the demands of the Matura exam, so that the practice for the
    students is as authentic as possible.
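
    The figures in 1.5 to 1.7 can be checked with a few lines of arithmetic. The following
    is only a minimal sketch: the constants are taken from the specification above, while
    the names (TFJResponse, score_tfj, minutes_per_section) are hypothetical and were
    introduced purely for illustration.

```python
from dataclasses import dataclass

# Figures taken from the test specification (1.5 to 1.7).
TEXTS = 3
ITEMS_PER_TEXT = 8
TEST_MINUTES = 45
MATURA_TEXTS = 4
MATURA_MINUTES = 60


@dataclass
class TFJResponse:
    """Hypothetical record of one marked True/False-with-Justification item."""
    choice_correct: bool
    justification_correct: bool


def score_tfj(response: TFJResponse) -> int:
    """Award the single point only if both parts are correct (spec 1.6)."""
    return 1 if response.choice_correct and response.justification_correct else 0


def minutes_per_section(total_minutes: int, sections: int) -> float:
    """Average time available for one text with its items."""
    return total_minutes / sections


max_score = TEXTS * ITEMS_PER_TEXT
print(max_score)                                          # 24
print(minutes_per_section(TEST_MINUTES, TEXTS))           # 15.0
print(minutes_per_section(MATURA_MINUTES, MATURA_TEXTS))  # 15.0
print(score_tfj(TFJResponse(True, False)))                # 0
```

    Both tests thus allow 15 minutes per section, and a TFJ answer with a correct choice
    but a wrong justification earns no point.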

    2. Test development process

    2.1 Selecting Texts: Deciding as a group

    The first step in compiling our group's testlet, namely selecting an appropriate text,
    turned out to be both the easiest and the hardest part of the whole creation process.
    It was the easiest part in the sense that the requirements we had for the text (around
    500 words, authentic English, and not disturbing topic-wise) were so broad that within
    only a few minutes hundreds of adequate texts could be found on the internet. The hard
    part, however, was then to decide which of these countless texts might be the most
    suitable one for our purpose. To solve this issue, group discussions as well as the
    text mapping procedure proved to be highly useful tools for finally making a successful
    choice.

    Thus, after every group member had individually decided on one text, this first
    selection was presented in a group discussion. As everyone wanted to "sell" his or her
    text to the other group members, the group discussion proved to be a good place to
    discuss the advantages and disadvantages of the texts; the main issues discussed were
    the possible appeal to test takers and the beneficial backwash of the chosen text types.

    However, it was the text mapping procedure which helped to reveal the inner quality of
    the texts and thus to show which text was written coherently and clearly enough to gain
    the most consensus points. This procedure, proposed by Urquhart & Weir (1998: 306-7),
    limited our final choice to two texts: a description of how to apply for a US visitor
    visa, which scored the most consensus points, and a book review of A Million Ways to
    Die in the West.

    In a final group discussion we came to the conclusion that the visitor visa text had
    the most consensus points because it was specifically written and designed to be easily
    understood by the reader. So, in order not to make the final testlet too easy, we
    decided to use a more traditional text type, namely the book review, whose main
    advantage was a paragraph structure which would give candidates, in the words of Hughes
    (2003: 142), "a good number of fresh starts". Additionally, we agreed that a book
    review would be a useful text type with regard to what pupils are expected to read in
    school and later at university.

    So in the end we based our final decision, as Hughes (2003: 142) puts it, ultimately on
    "experience, judgment and a certain amount of common sense". And as we had come to our
    final choice together as a team, we could include in the decision-making process the
    experiences and general judgment not just of one individual mind but of four different
    people, which finally gave us the assurance that we had made a successful choice.

    2.2. Writing Items: Taming the test technique

    The starting point of the item writing process was the consensus information points
    which had been raised in the text mapping procedure, as an ability to answer the
    question(s) correctly implies that the text has been understood (Heany 2011). For this
    reason we assigned three consensus points to each group member, who then had to use
    them to create three test items.

    After the individual work we decided, in a later group discussion, which 8 final task
    items we would choose out of our pool of 12 possible items. In doing so, we again
    exchanged thoughts and opinions about individual items and, more importantly, about the
    coherence between those items. With reference to our initial test specifications, we
    noticed that nearly all the items we had individually created were directed towards the
    understanding of main ideas and important details, and that the reading operations
    "reading for gist" and "making inferences" were therefore almost not addressed at all
    by our test items. In retrospect, two factors can be identified which were responsible
    for this first small setback in our test writing phase.

    Firstly, as mentioned before, because the test items were formulated on the basis of
    consensus points, it was quite logical that these items would target pieces of
    information that were given straightforwardly by the text and not just implied by it,
    as the consensus points themselves consisted of concrete information remembered by a
    reader.

    Secondly, the test technique itself, namely True/False, also seemed at first glance to
    call for concrete pieces of information rather than for gist or deductions, because it
    was simply easier to ask yes-and-no questions if you had a concrete fact in mind (e.g.
    Q10: "Albert is a coward."). Hughes also observed this problem with the multiple choice
    question format, of which the TF format can be regarded as just a sub-form (see Hughes
    2011: 79); consequently, Hughes concluded that this technique severely restricts what
    can be tested (Hughes 2011: 77).

    However, after we had discussed the issue of our somewhat restricted question focus, we
    nonetheless managed to come up with two items (Q13-14) which addressed the gist of the
    book review rather than only concrete pieces of information, and which prompted test
    takers to engage in inferring strategies in order to answer them. The idea underlying
    these items, namely to ask about the attitudes and opinions of the author of the text,
    was taken from an example of a Matura exam. In this way we learned, by imitating other
    people's work, how to overcome the initial problems we had with our assigned test
    technique, and moreover how to use this technique for a wider purpose than just asking
    for straightforwardly given concrete information.
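
    The coverage check described in this section, comparing the drafted items against the
    reading operations listed in the test specification, could be sketched as follows. This
    is only an illustration under assumptions: the operation labels paraphrase the list in
    1.2, and the item labels and their tagging are hypothetical, not our group's actual
    items.

```python
from collections import Counter
from typing import Dict, List

# Reading operations paraphrased from the test specification (1.2).
OPERATIONS: List[str] = [
    "gist",
    "information/important details",
    "main ideas and supporting details",
    "propositional inference",
    "meaning of words/phrases",
]


def uncovered_operations(item_tags: Dict[str, str]) -> List[str]:
    """Return the operations that no item in the pool addresses."""
    counts = Counter(item_tags.values())
    return [op for op in OPERATIONS if counts[op] == 0]


# Hypothetical tagging of a small draft pool (item label -> operation).
draft_pool = {
    "A": "main ideas and supporting details",
    "B": "information/important details",
    "C": "main ideas and supporting details",
}

print(uncovered_operations(draft_pool))
# ['gist', 'propositional inference', 'meaning of words/phrases']
```

    A list like this makes it visible at a glance which operations still need an item
    before the final selection is made.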

    2.3. Revising Items and Testlet: The importance of feedback

    The revision of our test items already started in the first group discussion, in which
    each of us presented three individual test items. At this point we identified the items
    which seemed most useful to all of us and, as described before, came up with new or, in
    one case, transformed items which were to target reading operations other than just
    reading for main ideas and supporting details.

    After this first revision phase we felt content with, and even quite proud of, our pool
    of now 8 revised task items, so that we could understand what Hughes meant when he
    referred to home-made task items as being perceived as "minor works of art, or even, it
    sometimes seems, [as] our babies" (Hughes 2003: 58).

    Moreover, Hughes was also right when he spoke about the difficulties that come with
    handing your "baby" over to others, who are to evaluate it and give you feedback about
    the quality of the self-made items. However, despite the fear that one's own work would
    be criticized too harshly, the moderation process, in which members of another group
    evaluated our group's items, proved to be highly useful. Through this method not only
    were minor problems within individual items detected, such as spelling and ambiguous
    wording, but new important points were also raised which we, as the producing group,
    had simply not noticed before. For example, it was brought to our attention that the
    very heading of our text contained the answer to our first question (namely that this
    was MacFarlane's first written novel) and that the answers to the last 5 of our
    True/False items were all False, which might have been distracting to the test takers.

    Both problems were quite obvious; to us, however, as we had worked continuously on the
    items and the text, these flaws were invisible. We were so focused on other things that
    it never occurred to us to re-read the text's heading, and we knew the answers to our
    items by heart, so that we never bothered to actually tick off the correct solutions in
    the answer boxes, which would have instantly revealed that nearly all items had to be
    answered with False.
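
    A simple check of the answer key would have revealed this pattern before moderation.
    The following is a minimal sketch under assumptions: the function names are
    hypothetical, and the example key is invented except for the one feature reported
    above, namely that the last five answers are all False.

```python
from itertools import groupby
from typing import List


def longest_run(key: List[bool]) -> int:
    """Length of the longest streak of identical answers in the key."""
    return max((len(list(group)) for _, group in groupby(key)), default=0)


def share_true(key: List[bool]) -> float:
    """Proportion of items whose correct answer is True."""
    return sum(key) / len(key) if key else 0.0


# Hypothetical 8-item key; only the five trailing False answers
# reflect what the moderators actually pointed out.
answer_key = [True, False, True, False, False, False, False, False]

print(longest_run(answer_key))  # 5
print(share_true(answer_key))   # 0.25
```

    A long streak or a very low share of True answers is a signal to rewrite some items,
    which is what we did in the revision described below.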

    In conclusion, the fact that our proofreaders saw our testlet for the first time gave
    them a different perspective on our work. They saw the big picture, and so they could
    perceive problems which were invisible to us. As Alderson, Clapham and Wall (1995: 39)
    stress: "It is absolutely crucial in all test development [..] that some person or
    persons other than the individual item writer(s) look closely at each item."

    Revising the items was easy: we dropped the first question altogether for being too
    easy, and transformed some of the last five questions so that they now had True as
    their correct answer. Thus, the required corrections were made without difficulty;
    seeing where they were necessary was the crucial point, and here we clearly benefited
    from other people's help.

    2.4. Final Version and Reflections: The logistics of the test writing process

    In the end we managed to finish our testlet in time and were quite confident that our
    work was fit for its task. In retrospect, it had become clear that throughout the whole
    test writing process, next to individual ideas and creativity, or maybe even more
    importantly than these, the exchange of thoughts with others, inside or outside the
    working team, was the force that drove the creation process forward to the final
    compilation of the testlet.

    However, and this might have been the point responsible for some minor difficulties in
    the overall creation process, the moments in which the group members were actually
    physically together in the classroom and had time to discuss the issues that had arisen
    proved to be a little too short to discuss all of them sufficiently. Alternative online
    modes of contacting each other were quite laborious, as you could only interact
    indirectly with your colleagues instead of engaging in a real face-to-face
    conversation, which would have solved problems instantly.

    In the case of our group, however, this slight lack of group time was not too harmful
    to the overall testlet development, as we were luckily given organizational as well as
    logistical support by our course teacher. But if we were to design another test or
    testlet with higher stakes, and moreover found ourselves in charge of moderating the
    test creation process, it would definitely be a good idea to assign plenty of time to
    real-life group meetings and to ensure that these meetings happen on a regular basis.
    As Hughes (2003: 58) observed, test development is best thought of as a task to be
    carried out by a team. And certainly, as we also experienced throughout our group work,
    a team works best if its members are actually physically together and can thus really
    engage in an open exchange of thoughts and ideas, and so work together effectively in
    creating a successful test.

    3.1 Reflection (Ingo Maierbrugger): More than a red ink pen

    After reading the course name "EFL Testing and Assessment" for the first time, I
    expected that the whole course would be about how to grade Schularbeiten (written class
    tests) and Hausarbeiten (take-home assignments) and nothing more. I simply had the
    fixed idea in my mind that everything that was called assessment involved a red ink pen
    and a teacher reading through some students' texts in order to find mistakes and mark
    them.

    This course, however, has shown me that testing consists of more than just this aspect
    of grading, which is in fact only done in relation to the assessment of writing;
    rather, it is a form of finding and presenting information that plays an important role
    in our educational system as well as in our culture as a whole. In this way, the course
    not only widened my understanding of testing in general, that is, with respect to the
    broader role it plays in our society (placement tests, backwash effect, etc.); I have
    also learned that in the overall process of testing, grading is just one part of a
    series of activities which are all necessary in order to create and later administer a
    successful test.

    Especially with regard to the test creation process, which I had never thought about
    before this course, I have seen throughout our course as well as our group work how
    many aspects and details have to be considered in order to create a well-functioning
    test. Thus, I finally realized how much work actually goes into the creation of
    something that I had grown so used to during my school and student days. And so my
    perspective on the matter has changed throughout the course: from that of a test taker
    to that of a test maker.

    And as a test maker I have learned that a task as laborious as the compilation of a
    testlet proved to be is a job that is best done in a group and with the help of others.
    As described in the test development part of this paper, group discussions, the
    exchange of thoughts and ideas, and feedback and proofreading from others are vitally
    important in creating a successful test. And I am quite confident that these modes of
    working together are also applicable in many professional fields other than test
    making.

    In conclusion, this course as well as our group project have first of all taught me
    that testing is more than just marking with a red ink pen. Secondly, and this is for me
    the point that I will especially take home from this course, it was proved to me that
    working together really ensures that the final result is of high quality, and, as I
    have said before, this observation surely holds true for more than just test making.

    Bibliography:

    Alderson, Charles; Clapham, Caroline; Wall, Dianne. 1995. Language test construction
    and evaluation. Cambridge: Cambridge University Press.

    Heany, Hellen. 2011. Explanation of text mapping technique (rationale, method), moodle
    course content.

    Hughes, Arthur. 2003. Testing for Language Teachers. Cambridge: Cambridge University
    Press.

    Khalifa, Hannan; Weir, Cyrill. 2008. Cambridge ESOL: Research Notes, 2-16.

    Urquhart, A. H.; Weir, Cyrill. 1998. Reading in a Second Language: Process, Product and
    Practice. London: Longman.

    Appendix 1: First draft items and testlet

    Appendix 2: Testlet plus items Final version