standards in language testing

Group Presentation for Module 3 / Unit 8 by Ben & David Standards in Language Testing: Working with the EALTA Guidelines


Post on 13-Jan-2015


Page 1: Standards In Language Testing

Group Presentation for Module 3 / Unit 8 by Ben & David

Standards in Language Testing:

Working with

the EALTA Guidelines

Page 2: Standards In Language Testing

Standards in Language Testing

• Attempt to define good practice

• Alderson, Clapham & Wall define ‘standards’ as: “an agreed set of guidelines which should be consulted and, as far as possible, heeded in the construction and evaluation of a test” (1995: 236)

• Can holistic standards be applied to all tests?

• What ideals should they describe?

• How prescriptive should they be?

Page 3: Standards In Language Testing

Presentation Outline:

PART ONE:

Applying the EALTA Guidelines to TEEP

PART TWO:

A critical look at the EALTA Guidelines and a comparison with another well-known set of standards

PART THREE:

Conclusions

Page 4: Standards In Language Testing

PART ONE:

What is EALTA?

• European Association for Language Testing and Assessment

• independent, professional association supported financially by European Community

• aims to promote understanding of theoretical principles of language testing and assessment and improvement and sharing of practices

• promotes adherence to principles of transparency, accountability and quality

Page 5: Standards In Language Testing

What are the EALTA Guidelines?

• Aimed at those involved in (a) training teachers in testing and assessment

(b) classroom testing and assessment

(c) test development in national or institutional testing centres

• Questions for consideration rather than principled statements

• need to be filtered through particular context of each ‘test situation’ - Davies: ethics and standards are about “maintaining a balance between the rights of the individual and the demands of the social.”

Page 6: Standards In Language Testing

What is TEEP?

• Test of English for Educational Purposes

• proficiency test in English for academic purposes

• taken by overseas students intending to study at both Reading University and other institutions of higher education in Britain

• assesses proficiency in reading, listening and writing in 3 separate sub-tests

• separate grammar test (used to separate borderline cases)

Page 7: Standards In Language Testing

Our Task:

Applying

the EALTA Guidelines

to TEEP

Page 8: Standards In Language Testing

EALTA & BEST PRACTICE

EALTA: “test developers are encouraged to engage in dialogue with decision makers in their institutions and ministries to ensure that decision makers are aware of both good and bad practice, in order to enhance the quality of assessment systems and practices.”

TEEP

• major revision was sanctioned by The University of Reading in 1999, based on:

– items functioning unpredictably, highlighting difficulties in quality (O’Sullivan, 1999)

– a needs analysis too out-dated to reflect modern needs/views of language competence (O’Sullivan, 2000)

Page 9: Standards In Language Testing

TEEP: TEST PURPOSE & SPECIFICATION

• Clearly stated purpose

• Clearly described test-taker (347 in 2004)

• Test Specifications – Handbooks – for each audience

• Test methods & tasks described and exemplified

• Description of constructs underlying sub-tests

• Performance data for 2001, 2003, 2004 given

Page 10: Standards In Language Testing

TEEP: STUDENT PERFORMANCE

• Most TEEP candidates attend Pre-sessional course at Centre for Applied Language Studies (CALS) at Reading University with objective of acceptance to the University’s academic courses

• minimum course entry levels set to ensure that a level of 6.5 or 7.0 is achievable

• Entry requirements for university institutions normally 6.5 or 7.0

• CALS use TEEP plus continuous assessment

Page 11: Standards In Language Testing

TEEP 2004

54% achieved a grade of at least 6.5 (compared with 70% in 2003 and 66% in 2001)

Page 12: Standards In Language Testing

TEEP: PURPOSE & SPECIFICATION

Outstanding Information:

• No descriptions of misuse

• No evidence linking TEEP to CEFR (only advice for candidates at B1 or below to improve their language competence before sitting TEEP)

– Candidates should be “intermediate at the very least”

• No rating scales and no information about changes to scales & band descriptions (despite reference to revisions to marking criteria for Writing Paper between 1999 & 2001)

Page 13: Standards In Language Testing

TEEP: TEST DESIGN & ITEM WRITING

• Item analyses in 3 Examiner’s Reports suggest ‘systematic procedures’ are in place…

• No information about

– Qualifications

– Training

– Item-writing guidelines

– Feedback to item writers

• Little information about item revision

– “items were working at a very acceptable level”

– First 2 test item FVs of 0.25 & 0.5 though “actually designed as relatively easy introduction”

– No further details

Page 14: Standards In Language Testing

TEEP: QUALITY CONTROL & TEST ANALYSES

• Several quality control procedures

– Independent experts invited to conduct test revisions

– Development of new Writing Rating Scale

– Biannual rater standardisation

– Writing tests double-marked

– Inter-rater reliability studies (83% of double ratings within 0.5 of a band of each other)

– Results reported as Overall Score based on Reading, Listening & Writing papers (with Grammar test results only significant in critical level decisions)

– Results sent directly to candidate & Admissions Dept.

– Reporting of Standard Error of Measurement (68% certain that score of 7.0 on the TEEP will be within range 6.73 to 7.27)
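The two figures on this slide can be reproduced with a short sketch. It is illustrative only: the function names are hypothetical, the SEM value of 0.27 is inferred from the reported range (7.0 ± 0.27), and the classical formula SEM = SD × √(1 − reliability) is a standard textbook definition that the TEEP reports are not confirmed to have used. Treating "within 0.5 of a band" as a difference of at most 0.5 is likewise an assumption.

```python
import math


def score_band(observed, sem, z=1.0):
    """Return (low, high) for observed ± z * SEM.

    z = 1.0 corresponds to roughly 68% confidence under a normal model.
    """
    return (observed - z * sem, observed + z * sem)


def sem_from_reliability(sd, reliability):
    """Classical SEM: score standard deviation times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def within_tolerance_rate(first, second, tolerance=0.5):
    """Proportion of double ratings that differ by at most `tolerance` bands."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    close = sum(1 for a, b in zip(first, second) if abs(a - b) <= tolerance)
    return close / len(first)


# SEM of 0.27 is assumed from the range quoted on the slide.
low, high = score_band(7.0, 0.27)
print(round(low, 2), round(high, 2))  # 6.73 7.27
```

Calling `within_tolerance_rate` on two lists of first and second ratings gives the kind of agreement percentage quoted above (83% in the TEEP study); the ratings themselves are not published, so no real data is shown here.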

Page 15: Standards In Language Testing

TEEP: QUALITY CONTROL & TEST ANALYSES

• Variety of statistical studies:

– All Facet Vertical Rulers

– Raters Measurement Report

– Scale Criteria Measurement Report

– Category Statistics

– Probability Curves

– Expected Score Ogive

– Item Fit Statistics

• Classical item analysis in 2004 Report

– mean FV = 0.501! (“a fairly satisfactory result, though somewhat on the low side”)

– Point Biserial Correlation = 0.3 (“a very satisfactory outcome”)
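For readers unfamiliar with the two classical statistics quoted here, the following is a minimal sketch of how a facility value (FV) and a point-biserial correlation are usually computed for a dichotomously scored item. The function names and the toy data are hypothetical; the 2004 Report's own computation (for example, whether the item is excluded from the total score) is not described in the source.

```python
import math


def facility_value(item_scores):
    """Proportion of candidates answering the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and candidates' total scores.

    Uses r = (M1 - M0) / s * sqrt(p * q), where s is the population
    standard deviation of the total scores.
    """
    n = len(item_scores)
    if n == 0 or n != len(total_scores):
        raise ValueError("need two non-empty lists of equal length")
    p = facility_value(item_scores)
    q = 1.0 - p
    mean_total = sum(total_scores) / n
    sd = math.sqrt(sum((t - mean_total) ** 2 for t in total_scores) / n)
    if p in (0.0, 1.0) or sd == 0.0:
        raise ValueError("correlation undefined: no variance in item or totals")
    correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    incorrect = [t for i, t in zip(item_scores, total_scores) if i == 0]
    mean_correct = sum(correct) / len(correct)
    mean_incorrect = sum(incorrect) / len(incorrect)
    return (mean_correct - mean_incorrect) / sd * math.sqrt(p * q)


# Toy data (invented for illustration): four candidates, one item.
item = [1, 1, 0, 0]
totals = [4, 3, 2, 1]
print(facility_value(item))  # 0.5
print(round(point_biserial(item, totals), 3))
```

An FV near 0.5 means about half the candidates answered correctly, which is why a mean of 0.501 can be read as "somewhat on the low side" for a whole test; a positive point-biserial indicates that stronger candidates tended to get the item right.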

Page 16: Standards In Language Testing

TEEP: QUALITY CONTROL & TEST ANALYSES

Outstanding Information:

• No piloting information

– No trials data or information about revisions

• 2004 Report states that items “too easy or too difficult for the candidates … will be reviewed and re-trialled before this version of the test is used again”

• Brief mention of (only) 3 test versions

• No descriptions of version equivalence

• No details about rater training

• No descriptions of rater monitoring policy

• No information about complaints/appeals

Page 17: Standards In Language Testing

TEEP: TEST ADMINISTRATION & SECURITY

• Only 3 Examiner’s Reports in 25 years

• “intention” (2004) for annual report

• No information about the training/monitoring of administrators

– “Strict invigilation instructions are always followed, which are designed to not only keep the test secure but also to alleviate examination stress”

• certificate has some basic security features (original signature/stamp)

– “At all stages, the TEEP test is secure.”

Page 18: Standards In Language Testing

TEEP: FURTHER FINDINGS

• High stakes testing

• No concrete information about how TEEP keeps pace with changes in CALS curriculum, or if curriculum keeps pace with TEEP

• No information about how alternative assessment (including a Speaking test) is conducted, or about its impact on TEEP candidates

• No information about washback

Page 19: Standards In Language Testing

PART TWO:

A critical and alternative view of the EALTA Guidelines

A) Potential problems with the EALTA Guidelines

B) The ILTA Codes as a possible alternative

Page 20: Standards In Language Testing

EALTA Mission Statement:

“The purpose of EALTA is to promote the understanding of theoretical principles of language testing and assessment, and the improvement and sharing of testing and assessment practices throughout Europe.”

Applying Mission Statement to EALTA Guidelines:

A) Understanding of theoretical principles of language testing and assessment

B) Sharing of testing and assessment practices

C) Improvement of testing and assessment practices

Page 21: Standards In Language Testing

How well do the Guidelines live up to the EALTA Mission Statement?

Goal A (promote the understanding of theoretical principles of language testing throughout Europe):

YES √

Goal B (promote the sharing of testing and assessment practices throughout Europe):

YES √

Page 22: Standards In Language Testing

Goal C (promote the improvement of testing and assessment practices throughout Europe):

1. If understanding and sharing were promoted, it is also likely that improvement was promoted

2. One major problem: the Guidelines are in the form of QUESTIONS, rather than STATEMENTS

MAYBE

Page 23: Standards In Language Testing

Alan Davies (2007, 437-438):

“There needs to be a description of the standard or level, an explicit statement of the measure that will indicate that the level has or has not been reached and a means of reporting that decision through grades, scores, impressions, profiles and so on … Description, measure and report, these three stages are essential …” (underlining ours)

Page 24: Standards In Language Testing

A Code of Ethics (according to Davies, 2007)

REPRESENTS:

• a set of principles influenced by “moral philosophy”

• a guide to “good professional conduct”

• a “benchmark of satisfactory ethical behaviour by members of a profession”

• a ‘blending’ of principles of benevolence, non-maleficence, justice, a respect for autonomy and for civil society

DOES NOT REPRESENT:

• statutes or regulations

• guidelines for practice

Page 25: Standards In Language Testing

Sample “Principles” from ILTA’s Code of Ethics

(available for public consultation on ILTA’s webpage at http://www.iltaonline.com )

Principle 1: “Language testers shall have respect for the humanity and dignity of each of their test takers. They shall provide them with the best possible professional consideration and shall respect all persons’ needs, values and cultures in the provision of their language testing service.”

Principle 6: “Language testers shall share the responsibility of upholding the integrity of the language testing profession.”

Principle 9: “Language testers shall regularly consider the potential effects, both short and long term on all stakeholders of their projects, reserving the right to withhold their professional service on the grounds of conscience.”

Page 26: Standards In Language Testing

A Code of PRACTICE (according to Davies, 2007)

• meant to specify or instantiate points mentioned in Code of Ethics

• identifies minimum requirements for practice in profession and focuses on clarification of professional misconduct

Page 27: Standards In Language Testing

Sample items from ILTA’s Code of Practice

(available for public consultation on ILTA’s webpage at http://www.iltaonline.com )

Item A2: “All tests, regardless of their purpose or use, must provide information which allows valid inferences to be made. Validity refers to the accuracy of the inferences and uses that are made on the basis of the test’s scores. If, for example….” (Item continues for 5 more lines)

Item B2: “A test designer must decide on the construct to be measured and state explicitly how that construct is to be operationalised.”

Item B6: “Those doing the scoring should be trained for the task and both inter and intra-rater reliability should be calculated and published.”

Item D3: “Those preparing and administering publicly available tests should publish validity and reliability estimates and bias reports for the test, along with sufficient explanation to allow potential test takers and test users to decide if the test is suitable in their situation.”

Page 28: Standards In Language Testing

CONCLUSIONS

1. A personal view about the assignment

2. Standards mean that we should all play by the same rules

3. A final interpretation

Page 29: Standards In Language Testing

CONCLUSIONS

1. A personal view about the assignment, and about standards in general

Page 30: Standards In Language Testing

CONCLUSIONS

2. Standards mean that we should all play by the same rules

Page 31: Standards In Language Testing

CONCLUSIONS

3. The final word on standards & ethics

“It has been suggested that ethics in language testing is no more than an extended validity. This is the argument of Alderson, Clapham and Wall (1995), that ethics is made up of a combination of validity and washback. Validity, and particularly consequential validity, is defined by Messick (1989) as being concerned with the social consequences of test use and how test interpretations are arrived at. Gibbs (1994) considers that consequential validity represents a shift from: ‘a purely technical perspective to a test-use perspective – which I would characterise as an ethical perspective’ (Gibbs, p.146).”

(Davies, 2007: 432)

Page 32: Standards In Language Testing

References

Alderson, J. C., Clapham, C., & Wall, D. (1995). Standards in language testing: The state of the art. In J.C. Alderson, C. Clapham, & D. Wall. Language Test Construction and Evaluation. (pp 235-260). Cambridge: Cambridge University Press.

Boyd, K. and Davies, A. (2002) Doctors’ orders for language testers: the origin and purpose of ethical codes. Language Testing, 19 (3), 296-322.

Davidson, F., Turner, C., & Huhta, A. (1997). Language testing standards. In C. Clapham & D. Corson (Eds.), Encyclopedia of language and education, Volume 7: Language testing and assessment (pp. 303-311). Dordrecht: Kluwer.

Davies, A. (1997) Introduction: the limits of ethics in language testing. Language Testing 14 (3) 235-241.

Davies, A. (2007) Ethics, professionalism, rights and codes. In E. Shohamy & N.H. Hornberger (Eds.) Encyclopedia of language and education (2nd Ed.), Volume 7: Language Testing and Assessment (pp.419-443). Springer Science + Business Media.

Hamp-Lyons, L. (1997). Washback, impact and validity: ethical concerns. Language Testing, 14 (3) 295-303.

Howe, K.R. (1994) Standards, assessment and equality of educational opportunity. Educational researcher 23, 27-33.

Lynch, B.K. (1997). In search of the ethical test. Language Testing 14 (3) 315-327.

Spolsky, B. (1997). The ethics of gatekeeping tests: What have we learned in a hundred years? Language Testing 14 (3) 242-247.

Page 33: Standards In Language Testing

References

(n.d.). EALTA guidelines for good practice in language testing and assessment. Retrieved from http://www.ealta.eu.org/guidelines.htm

(n.d.). TEEP general description. Retrieved from http://www.cals.rdg.ac.uk/teep/index.asp

(n.d.). TEEP history / background. Retrieved from http://www.cals.rdg.ac.uk/teep/background.asp

(n.d.). TEEP - information for candidates. Retrieved from http://www.cals.rdg.ac.uk/teep/candidates.asp

(n.d.). TEEP - faqs. Retrieved from http://www.cals.rdg.ac.uk/teep/faq.asp

(n.d.). TEEP Extended Handbook, 2001 (incorporating Examiner's Report). Retrieved from http://www.cals.rdg.ac.uk/teep/files/2001_teep_extended_handbook.pdf

(n.d.). TEEP Examiner’s Report, 2003. Retrieved from http://www.cals.rdg.ac.uk/teep/files/2003_teep_examiners_report.pdf

(n.d.). TEEP Examiner’s Report, 2004. Retrieved from http://www.cals.rdg.ac.uk/teep/files/2004_teep_examination_report.pdf

(2009). ILTA code of ethics. Retrieved from http://www.iltaonline.com

(2009). ILTA guidelines for practice. Retrieved from http://www.iltaonline.com

Page 34: Standards In Language Testing

Thank you…..for your time & attention.

Here’s something to make you smile….