Exploring Mismatches of Scores from Automated Writing Evaluation (AWE) Software and Instructor Rating in ESL Classes from SFL perspective

Zhi Li, Volker Hegelheimer
ALT, Iowa State University
MIDTESOL12, Ames

Criterion®, an example of AWE

Criterion Research at ISU
Large research group on AWE

Focus on Holistic Scores

[Figure: Criterion scores vs. instructor scores for a Criterion score of 6 on Paper 4. Instructor score categories: [A], [A-], [B+], [B], [B-], [C+], [C]. Values shown for Paper 4: 6, 11, 6, 5, 9, 2.]

Background (Study presented at LTRC 2012)

AWE Related Research

Figure 1. The decomposition of e-rater into features and microfeatures. Enright & Quinlan, 2010, p. 320

Systemic Functional Linguistics (SFL)

Why SFL (also known as Systemic Functional Grammar, SFG)?

Grammatical structure and meaning (Ravelli, 2000). SFL as a tool for genre analysis (Donohue, 2012; Lee, 2006)

Systemic Functional Linguistics (SFL)

Metafunction                                              Grammatical system
Ideational: representing experience of reality            TRANSITIVITY
Interpersonal: enacting social relations                  MOOD
Textual: presenting messages as text in context           THEME

Adapted from Christie & Unsworth, 2000, p. 9

Transitivity Analysis

You           will receive   a package     this week.
Participant   Process        Participant   Circumstance

Process types: Material, Mental, Verbal, Behavioural, Relational, Existential (there be)

Participant features: Human / Non-human, Concrete / Abstract, Specific / Non-specific

Theme/Rheme

Fortunately,            the proposal      was accepted.
(Interpersonal) Theme   (Topical) Theme   Rheme

Theme types: Topical, Textual, Interpersonal

(Immediate) Thematic Progression Patterns

T2 = T1
Especially in recent society, a lot of people T1#R1 live in their own busy world. But most people T2#R2 also need to communicate with others and build a close relationship with them.

T2 = R1
So I usually T1#R1 chat a lot with my virtual friends. Most of the chatting detail T2#R2 would be some problems I felt sad.

T2 = S1
So the electronic-communication tools like online social networks, instant messaging and E-mail T1#R1 appear and become widely used in people of all ages. This T2#R2 is really an impact in human society.

T2 = TN
Therefore, they T1#R1 do not think they are not respected or ignored. You T2#R2 want to tell your real friends for some reason,

Adapted from Daneš (1985)
Note: T2 = the theme in the second clause complex, T1 = the theme in the first clause complex, S1 = the first sentence, R1 = the rheme in the first clause complex, TN = new theme.
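To make the four labels concrete, here is a toy sketch of how a pair of clause complexes might be labelled once the topical themes and rhemes have been segmented by hand. It is not the procedure used in the study: the word-overlap heuristic, the stopword list, the demonstrative list, and the four-letter prefix used in place of stemming are all illustrative assumptions, and the function names are hypothetical.

```python
import re

# Hypothetical helper sets; chosen only for illustration.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "to", "with", "my"}
DEMONSTRATIVES = {"this", "that", "these", "those"}
PREFIX_LEN = 4  # crude stand-in for stemming, so "chat" matches "chatting"


def content_keys(text):
    """Return the truncated, lowercased content words of a theme or rheme."""
    words = re.findall(r"[a-z']+", text.lower())
    return {w[:PREFIX_LEN] for w in words if w not in STOPWORDS}


def progression_pattern(t1, r1, t2):
    """Guess the progression label for the theme of the second clause complex."""
    t2_words = re.findall(r"[a-z']+", t2.lower())
    # A bare demonstrative is read as pointing back to the whole first sentence.
    if len(t2_words) == 1 and t2_words[0] in DEMONSTRATIVES:
        return "T2 = S1"
    keys = content_keys(t2)
    if keys & content_keys(t1):   # theme repeats the first theme
        return "T2 = T1"
    if keys & content_keys(r1):   # theme picks up the first rheme
        return "T2 = R1"
    return "T2 = TN"              # nothing shared: treated as a new theme


if __name__ == "__main__":
    print(progression_pattern("a lot of people",
                              "live in their own busy world",
                              "most people"))
    print(progression_pattern("I",
                              "chat a lot with my virtual friends",
                              "Most of the chatting detail"))
```

A heuristic like this misses pronoun reference and synonymy, so it would need manual checking against annotated clause complexes before any of its counts were trusted.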

Writing samples

All the writing samples were taken from Engl101C classes in Fall 2011. The assignment was an argumentative essay on virtual friends. Each instructor rating was the average of the two closest ratings from a panel of experienced ESL instructors, based on the Engl101C assignment rubric.
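The "average of two closest ratings" rule can be sketched as follows. This assumes ratings are expressed on a numeric scale (the grade-point values in the usage line are hypothetical) and that ties between equally close pairs are resolved in favour of the lower pair; the slides do not specify either point.

```python
def average_of_two_closest(ratings):
    """Average the pair of ratings with the smallest absolute difference."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    ordered = sorted(ratings)
    # After sorting, the closest pair is always a pair of neighbours.
    low, high = min(zip(ordered, ordered[1:]),
                    key=lambda pair: pair[1] - pair[0])
    return (low + high) / 2


if __name__ == "__main__":
    # Hypothetical panel of three raters on a 4-point scale.
    print(average_of_two_closest([3.0, 4.0, 3.7]))
```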

Paper                 101C1112  101C306  101C723  101C1119  101C301  101C704
T Grade               A         A        A        B         B        B
Language              Arabic    Chinese  Chinese  Chinese   Arabic   Chinese
Flesch Reading Ease   43.9      69.1     58.4     64.7      55.7     58.4
F-K Grade Level       14        7.6      9.6      8.7       12.1     9.6
Word Count            997       911      897      911       986      905

Note: the Criterion scores for all the papers are 6, the highest.
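For reference, the two readability indices in the table are conventionally computed from word, sentence, and syllable counts. The sketch below uses the standard published formulas; the slides do not say which tool produced the values, and the counts in the usage lines are made up for illustration.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease formula (higher means easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid Grade Level formula (approximate US grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_reading_ease(100, 5, 150), 1))
    print(round(flesch_kincaid_grade(100, 5, 150), 1))
```

Both indices depend only on sentence length and syllable density, so they say nothing about the process types or thematic choices examined in the SFL analysis.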

Data Analysis
Transitivity

Paper      Existential  Relational   Mental       Material     Verbal      Behavioural
101C1112   2 (2.7%)     31 (41.3%)   6 (8.0%)     25 (33.3%)   1 (1.3%)    10 (13.3%)
101C723    4 (4.4%)     21 (23.1%)   14 (15.4%)   38 (41.8%)   5 (5.5%)    9 (9.9%)
101C306    0 (0.0%)     23 (26.7%)   15 (17.4%)   16 (18.6%)   6 (7.0%)    26 (30.2%)
101C704    2 (3.2%)     25 (39.7%)   7 (11.1%)    23 (36.5%)   2 (3.2%)    4 (6.3%)
101C301    3 (4.4%)     22 (32.4%)   7 (10.3%)    17 (25.0%)   2 (2.9%)    17 (25.0%)
101C1119   2 (2.7%)     19 (25.3%)   8 (10.7%)    18 (24.0%)   7 (9.3%)    21 (28.0%)

Process types in the writing samples (percentages in parentheses)
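The percentages in this table are each count divided by the paper's total number of analysed clauses. A minimal sketch that reproduces them from the raw counts (the variable and function names are illustrative, not from the study):

```python
PROCESS_TYPES = ["Existential", "Relational", "Mental",
                 "Material", "Verbal", "Behavioural"]

# Raw counts copied from the table, in PROCESS_TYPES order.
COUNTS = {
    "101C1112": [2, 31, 6, 25, 1, 10],
    "101C723":  [4, 21, 14, 38, 5, 9],
    "101C306":  [0, 23, 15, 16, 6, 26],
    "101C704":  [2, 25, 7, 23, 2, 4],
    "101C301":  [3, 22, 7, 17, 2, 17],
    "101C1119": [2, 19, 8, 18, 7, 21],
}


def process_percentages(counts):
    """Share of each process type among all clauses in one paper, in percent."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


if __name__ == "__main__":
    for paper, counts in COUNTS.items():
        row = dict(zip(PROCESS_TYPES, process_percentages(counts)))
        print(paper, sum(counts), row)
```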

Data Analysis

[Figure: bar chart of process types (EX, RE, ME, MA, VE, BE) for matched papers and mismatched papers; vertical axis from 0.0 to 35.0]

Comparison of process types between the two sets of writing samples (in averaged percentage; analysis unit is clause)
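The "averaged percentage" for each group can be sketched as the mean of the per-paper percentages. The grouping below is an assumption: since every paper received a Criterion score of 6, papers with an instructor grade of A are treated as matched and papers with a grade of B as mismatched. The slides do not spell this grouping out.

```python
from statistics import mean

PROCESS_TYPES = ["EX", "RE", "ME", "MA", "VE", "BE"]

# Per-paper percentages copied from the process-type table.
# Assumed grouping: instructor grade A = matched, grade B = mismatched.
MATCHED = {
    "101C1112": [2.7, 41.3, 8.0, 33.3, 1.3, 13.3],
    "101C306":  [0.0, 26.7, 17.4, 18.6, 7.0, 30.2],
    "101C723":  [4.4, 23.1, 15.4, 41.8, 5.5, 9.9],
}
MISMATCHED = {
    "101C1119": [2.7, 25.3, 10.7, 24.0, 9.3, 28.0],
    "101C301":  [4.4, 32.4, 10.3, 25.0, 2.9, 25.0],
    "101C704":  [3.2, 39.7, 11.1, 36.5, 3.2, 6.3],
}


def averaged_percentages(group):
    """Mean percentage per process type across the papers in one group."""
    columns = zip(*group.values())
    return [round(mean(col), 1) for col in columns]


if __name__ == "__main__":
    print(dict(zip(PROCESS_TYPES, averaged_percentages(MATCHED))))
    print(dict(zip(PROCESS_TYPES, averaged_percentages(MISMATCHED))))
```

With three papers per group, these averages are descriptive only.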

Data Analysis
Theme

[Figure: bar chart of T2 = T1, T2 = R1, T2 = TN, T2 = S1, and Textual theme for matched papers and mismatched papers; vertical axis from 0.0 to 70.0]

Comparison of thematic progression between the two sets of writing samples (in averaged percentage; analysis unit is clause complex)
Note: T2 = the theme in the second clause complex, T1 = the theme in the first clause complex, S1 = the first sentence, R1 = the rheme in the first clause complex, TN = new theme.

Data Analysis

Paper      T2 = T1    T2 = R1    T2 = TN    T2 = S1   Textual theme
101C1112   3 (8%)     11 (30%)   18 (49%)   5 (13%)   24 (65%)
101C723    8 (15%)    13 (25%)   25 (48%)   6 (12%)   30 (58%)
101C306    8 (16%)    16 (32%)   23 (46%)   3 (6%)    33 (66%)
101C704    3 (7%)     9 (21%)    29 (69%)   1 (2%)    29 (69%)
101C301    14 (27%)   6 (11%)    27 (52%)   5 (9%)    21 (40%)
101C1119   15 (30%)   11 (22%)   20 (40%)   4 (8%)    30 (60%)

Theme types in the writing samples (percentages in parentheses)
Note: T2 = the theme in the second clause complex, T1 = the theme in the first clause complex, S1 = the first sentence, R1 = the rheme in the first clause complex, TN = new theme.
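The table does not state the denominator for the textual theme column. The figures appear consistent with dividing by the total number of clause complexes that received a progression label (the sum of the four pattern counts); this is an inference from the numbers, not something the slides state. A short sketch of that reading, with illustrative names:

```python
# (pattern counts for T2=T1, T2=R1, T2=TN, T2=S1; textual theme count),
# copied from the table.
THEME_COUNTS = {
    "101C1112": ([3, 11, 18, 5], 24),
    "101C723":  ([8, 13, 25, 6], 30),
    "101C306":  ([8, 16, 23, 3], 33),
    "101C704":  ([3, 9, 29, 1], 29),
    "101C301":  ([14, 6, 27, 5], 21),
    "101C1119": ([15, 11, 20, 4], 30),
}


def textual_theme_share(pattern_counts, textual_count):
    """Percent of labelled clause complexes that carry a textual theme.

    Assumption: the denominator is the sum of the four pattern counts.
    """
    total = sum(pattern_counts)
    return round(100 * textual_count / total)


if __name__ == "__main__":
    for paper, (patterns, textual) in THEME_COUNTS.items():
        print(paper, sum(patterns), textual_theme_share(patterns, textual))
```

Under the same denominator, the four pattern percentages in the table are reproduced to within one percentage point.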

Conclusions

The analyses of the two sets of writing samples indicate some differences in process types, thematic progression patterns, and theme types, which, to some

this argumentative writing assignment.

Implications & Future Studies

So what?

With a bigger sample of student writing or an annotated corpus, significant differences could be spotted to distinguish papers of various quality. If these features could be automatically identified and quantified, they could become part of a future AWE scoring system.

A systematic SFL analysis of a larger sample of student writing
Inferential statistical analysis of the data (multiple regression)

Selected references

Donohue, J. P. (2012). Using systemic functional linguistics in academic writing development: An example from film studies. Journal of English for Academic Purposes, 11, 4-16.

Enright, M. K., & Quinlan, T. (2010). Complementing human judgment of essays written by English language learners with e-rater® scoring. Language Testing, 27, 317-334.

Lee, S. H. (2006). The use of interpersonal resources in argumentative/persuasive essays by East-Asian ESL and Australian tertiary students. Unpublished dissertation. Sydney: University of Sydney.

Mickan, P., & Slater, S. (2003). Text analysis and the assessment of academic writing. International English Language Testing System, 4(2), 59-88.

Ravelli, L. (2000). Getting started with functional analysis of texts. In L. Unsworth (Ed.), Researching language in schools and communities: Functional linguistic perspectives (pp. 27-64). London and Washington: Continuum.

Schwarz, L., Bartsch, S., Eckart, R., & Teich, E. (2008). Exploring Automatic Theme Identification: A Rule-Based Approach. Text Resources and Lexical Knowledge. Selected Papers from the 9th Conference on Natural Language Processing KONVENS 2008 (pp. 15-26). Mouton de Gruyter.

Thank you! Your questions and comments will be greatly appreciated.

Zhi Li, [email protected]
Volker Hegelheimer, [email protected]
Criterion research group: http://volkerh.public.iastate.edu/awe