
Page 1: Automatic Question Generation for Vocabulary Assessment

Automatic Question Generation for Vocabulary Assessment

J. C. Brown (CMU), M. Eskenazi (CMU), G. A. Frishkoff (University of Pittsburgh)

HLT/EMNLP 2005

Page 2: Automatic Question Generation for Vocabulary Assessment

Abstract

REAP (Reader-Specific Lexical Practice)

We describe an approach to automatically generating questions for vocabulary assessment.

We suggest that these automatically generated questions give a measure of vocabulary skill that correlates well with subject performance on independently developed human-written questions.

Page 3: Automatic Question Generation for Vocabulary Assessment

Measuring Vocabulary Knowledge

Vocabulary knowledge has many components, such as knowledge of the spoken form, the written form, grammatical behavior, collocation behavior, word frequency, conceptual meaning, ... (Nation, 1990).

In this work, we focus on knowledge of conceptual word meaning.

Page 4: Automatic Question Generation for Vocabulary Assessment

Question Types

We generated 6 types of questions: definition, synonym, antonym, hypernym, hyponym, and cloze questions.

To choose the correct sense of the word, POS tagging is used, with a small portion human-annotated.
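As a concrete illustration, here is a minimal sketch of how question stems and answers for several of these types could be drawn from a WordNet-style lexical resource. It assumes NLTK's WordNet interface and a pre-selected word sense; it is not the authors' implementation, and the cloze type is omitted because it requires example sentences.

```python
# Minimal sketch (not the authors' code): deriving question stems and answers
# for some of the six question types from WordNet via NLTK. Assumes the
# target sense has already been chosen; cloze questions are omitted because
# they require example sentences.
from nltk.corpus import wordnet as wn


def question_stems(word):
    """Return (question_type, stem, answer) triples for one target word."""
    synsets = wn.synsets(word)
    if not synsets:
        return []                     # no questions can be generated for this word
    sense = synsets[0]                # placeholder: the real system picks the sense
    items = [("definition",
              f"Which word means: '{sense.definition()}'?", word)]
    # synonym question: other lemmas in the same synset
    synonyms = [l.name() for l in sense.lemmas() if l.name() != word]
    if synonyms:
        items.append(("synonym", f"Which word is a synonym of '{word}'?", synonyms[0]))
    # antonym question: antonym links on the sense's lemmas
    antonyms = [a.name() for l in sense.lemmas() for a in l.antonyms()]
    if antonyms:
        items.append(("antonym", f"Which word is an antonym of '{word}'?", antonyms[0]))
    # hypernym / hyponym questions: taxonomic neighbours of the synset
    for rel, related in (("hypernym", sense.hypernyms()),
                         ("hyponym", sense.hyponyms())):
        if related:
            items.append((rel, f"Which word is a {rel} of '{word}'?",
                          related[0].lemmas()[0].name()))
    return items


print(question_stems("lenitive"))
```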

Page 5: Automatic Question Generation for Vocabulary Assessment

Question Forms

Each of the 6 types of questions can be generated in several forms, the primary ones being wordbank and multiple-choice.

Page 6: Automatic Question Generation for Vocabulary Assessment

Question Forms

Following Coniam (1997), the system chooses distractors of the same POS and similar frequency to the correct answer. Frequencies come from Kilgarriff's (1995) word frequency database, based on the British National Corpus (BNC) and POS-tagged using the CLAWS tagger.

The system chooses 20 words from this database of the same POS and of equal or similar frequency to the correct answer, then randomly chooses distractors from among them.
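A minimal sketch of this distractor-selection step follows, assuming a hypothetical `freq_db` list of (word, POS, frequency) tuples standing in for Kilgarriff's BNC-based database; it is not the authors' code.

```python
# Minimal sketch of the distractor-selection step, assuming a hypothetical
# `freq_db` list of (word, pos, frequency) tuples in place of Kilgarriff's
# BNC-based frequency database; not the authors' implementation.
import random


def choose_distractors(answer, answer_pos, answer_freq, freq_db,
                       pool_size=20, n_distractors=4):
    """Pick distractors of the same POS and closest frequency to the answer."""
    # candidates: same POS as the correct answer, excluding the answer itself
    candidates = [(word, abs(freq - answer_freq))
                  for word, pos, freq in freq_db
                  if pos == answer_pos and word != answer]
    # keep the pool_size words closest in frequency to the correct answer
    candidates.sort(key=lambda pair: pair[1])
    pool = [word for word, _ in candidates[:pool_size]]
    # randomly choose the distractors from that pool
    return random.sample(pool, min(n_distractors, len(pool)))


# toy usage with an invented frequency table
toy_db = [("placid", "ADJ", 120), ("arduous", "ADJ", 115), ("succinct", "ADJ", 110),
          ("opaque", "ADJ", 140), ("verdant", "ADJ", 95), ("run", "VERB", 5000)]
print(choose_distractors("lenitive", "ADJ", 118, toy_db))
```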

Page 7: Automatic Question Generation for Vocabulary Assessment

Question Coverage

We attempted to generate questions for 156 low-frequency and rare English words.

No questions could be generated for 16 (9%) of these words.

All four question types could be generated for 75 (50%) of the words.

Page 8: Automatic Question Generation for Vocabulary Assessment

Experiment Design

The human-generated questions were developed by a group of three learning researchers.

Examples of each question type were hand-written for each of the 75 words.

Page 9: Automatic Question Generation for Vocabulary Assessment

Experiment Design

Two of the human-generated question types, synonym and cloze questions, were similar in form to the corresponding computer-generated question types.

The other three were an inference task, a sentence completion task, and a question based on the Osgood semantic differential task.

Page 10: Automatic Question Generation for Vocabulary Assessment

Experiment Design

Inference task: “Which of the following is most likely to be lenitive?” Choices: “a glass of iced tea,” “a shot of tequila,” “a bowl of rice,” “a cup of chowder.”

Page 11: Automatic Question Generation for Vocabulary Assessment

Sentence completion task: “The music was so lenitive…,”

“…it was tempting to lie back and go to sleep.”

“…it took some concentration to appreciate the complexity.”

Page 12: Automatic Question Generation for Vocabulary Assessment

Semantic differential task: participants were asked to classify a word such as “lenitive” along one of Osgood’s three dimensions (e.g., more good or more bad).

Osgood’s three dimensions: valence (good–bad), potency (strong–weak), and activity (active–passive).

Page 13: Automatic Question Generation for Vocabulary Assessment

We administered a battery of standardized tests, including the Nelson-Denny Reading Test, the Raven’s Matrices Test, and the Lexical Knowledge Battery.

21 native English-speaking adults participated in two experiment sessions.

Page 14: Automatic Question Generation for Vocabulary Assessment

Session 1 lasted 1 hour and included the battery of vocabulary and reading-related assessments described above.

Session 2 lasted 2–3 hours and comprised 10 tasks, including the 5 human-generated and 4 computer-generated question types.

Confidence ratings were collected on a 1–5 scale.

Page 15: Automatic Question Generation for Vocabulary Assessment

Experiment Results

We report on the following aspects of this study:
• participant performance on questions
• correlations between question types
• correlations with confidence ratings

Page 16: Automatic Question Generation for Vocabulary Assessment

participant performance on questions

Mean accuracy scores for each question type varied from .5286 to .6452.

The lowest was the computer-generated cloze task (.5286); the highest were the computer-generated definition task and the human-generated semantic differential task, both with mean accuracy scores of .6452.

Page 17: Automatic Question Generation for Vocabulary Assessment

correlations between question types

r > .7, p < .01 for all correlations.

The correlation between the computer-generated synonym and the human-generated synonym questions was particularly high (r = .906), as was the correlation between the human and computer cloze questions (r = .860).
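For reference, a correlation of this kind could be computed as in the sketch below; the per-participant accuracy values are invented for illustration, and scipy is assumed rather than whatever tool the authors actually used.

```python
# Minimal sketch of the correlation analysis; the per-participant accuracy
# values below are invented for illustration, and scipy is assumed.
from scipy.stats import pearsonr

computer_synonym_acc = [0.60, 0.72, 0.55, 0.80, 0.65]   # hypothetical accuracies
human_synonym_acc    = [0.58, 0.75, 0.50, 0.82, 0.70]

r, p = pearsonr(computer_synonym_acc, human_synonym_acc)
print(f"r = {r:.3f}, p = {p:.3f}")
```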

Page 18: Automatic Question Generation for Vocabulary Assessment

correlations with confidence ratings

The average correlation between accuracy on the question types and confidence ratings for a particular word was .265 (low).

This may be because participants thought they knew these words but were confused by their rarity, or because confidence simply does not correlate well with accuracy.

Page 19: Automatic Question Generation for Vocabulary Assessment

Conclusion

Computer-generated questions give a measure of vocabulary skill for individual words that correlates well with human-written questions and standardized assessments of vocabulary skill.