Grammar Profile for Spoken Learner Data
By Brendan Flanagan 1, Emiko Kaneko 2, Emi Izumi 3, Sachio Hirokawa 4
1 Kyushu University, JSPS Research Fellow; 2 Aizu University; 3 Doshisha University; 4 Kyushu University


Overview

• Introduction
• Equivalent Proficiency Levels
• Grammar Pattern Item Dataset
• SVM & Optimal Feature Selection
• Characteristic Grammar Profiles
    • A1 vs A2
    • A2 vs B1
    • B1 vs B2
• Conclusion
Introduction
Equivalent Proficiency Levels: The NICT-JLE Corpus and CEFR-J

The NICT-JLE Corpus is made up of 1280 transcripts of the ACTFL-ALC SST (Standard Speaking Test) English oral proficiency interview test.

There are 9 proficiency levels based on the SST scoring method.
Equivalent Proficiency Levels: The NICT-JLE Corpus and CEFR-J

SST Level 4 is categorized as CEFR-J Level A2 (in this presentation).

Target proficiency levels (CEFR-J): A1, A2, B1, B2

CEFR-J Level    # Samples (SST 4 as CEFR-J A1)    # Samples (SST 4 as CEFR-J A2)
A1              236                               257
A2              738                               717
B1              263                               263
B2              40                                40
Grammar Pattern Item Dataset
• The NICT-JLE corpus exam and data structure:

Stage    Task    Follow-up
1
2        ●       ●
3        ●       ●
4        ●       ●
5

• Each section was preprocessed to count the occurrences of 493 grammar patterns, e.g.:

Grammar pattern                                           # 00015    # 00253    # 00287
1: Personal pronoun, nominative (I) + be: I am            2          2          4
1-1: Personal pronoun, nominative (I) + be: I am not      0          0          0
1-2: Personal pronoun, nominative (I) + be: Am I ...?     0          0          0
Grammar Pattern Item Dataset
• The NICT-JLE corpus exam and data structure: stages 1 to 5, where stages 2, 3, and 4 each have a "Task" and a "Follow-up" section.
• Each section was preprocessed to count the occurrences of 493 grammar patterns.
• Excluded the "Follow-up" section from the analysis as it contains free dialog.
• Target sections for analysis.
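
The counting step on this slide could be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not show how the 493 patterns were matched, so the regular expressions, the `PATTERNS` table, and the helpers `count_patterns` and `to_vector` are hypothetical stand-ins covering just the three example patterns. It also assumes that the base pattern "I am" does not count the negated form, which the slide does not confirm.

```python
import re

# Hypothetical regexes for the three example patterns shown on the slide.
# Assumption: the base pattern "I am" does not count the negated form.
PATTERNS = {
    "1: I am": re.compile(r"\bI am\b(?! not\b)"),
    "1-1: I am not": re.compile(r"\bI am not\b"),
    "1-2: Am I ...?": re.compile(r"\bAm I\b[^.?!]*\?"),
}


def count_patterns(text):
    """Return {pattern name: number of matches in text}."""
    return {name: len(rx.findall(text)) for name, rx in PATTERNS.items()}


def to_vector(counts):
    """Turn the counts into a fixed-order feature vector."""
    return [counts[name] for name in PATTERNS]
```

Calling `to_vector(count_patterns(section_text))` on each target section would give one row of counts per transcript, in the same layout as the table above.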
SVM & Grammar Item Dataset
• The preprocessed dataset was then vectorized to create a special-purpose search engine using GETA[1].
• The dataset was divided into randomly selected parts to evaluate the classification performance of SVM models by 10-fold cross validation.
• SVMlight[2] linear kernel was used to train/test models.
• To rank the importance of grammar items for feature selection, initially an SVM model was trained using all features.
• The SVM model score for each individual grammar item wi was analyzed to determine the weight(wi) ranking.

[1] http://geta.cs.nii.ac.jp
[2] http://svmlight.joachims.org
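
The procedure on this slide (random 10-fold split, linear SVM, ranking grammar items by weight) might look roughly like the sketch below. It is not SVMlight: to stay self-contained it swaps in a tiny stochastic subgradient trainer for the hinge loss, without a bias term, and with arbitrary values for `lam`, `lr`, and `epochs`. All function names are hypothetical. Labels are assumed to be +1 and -1 for one pair of levels (for example A2 vs B1), and ranking by the absolute value |wi| is an assumption; the slide only says "weight(wi) ranking". For a linear model the score is f(x) = Σ wi·xi, so the sign of wi indicates which of the two levels a grammar item points toward.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=50, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. No bias term, to keep the sketch short.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * dot(w, X[i])
            w = [wj * (1.0 - lr * lam) for wj in w]  # regularization shrink
            if margin < 1:  # inside the margin or misclassified
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


def rank_features(w):
    """Feature indices sorted by |w_i|, largest first (assumed criterion)."""
    return sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)


def cross_validate(X, y, k=10):
    """Fraction of samples classified correctly when each is held out once."""
    correct = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in fold)
    return correct / len(X)
```

With `X` as the grammar pattern count vectors and `y` as the level labels, `rank_features(train_linear_svm(X, y))` gives a candidate ordering of grammar items, and `cross_validate(X, y)` estimates how well the pair of levels is separated.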
SVM & Optimal Feature Selection
Grammar Profiles: Extracting Characteristics
Analysis By SVM
Grammar Profiles: Classification
Grammar Profiles: Extracting Characteristics
Visualization
Grammar Profiles: Visualizing Characteristics
Conclusion

• Classified the English proficiency levels of data in a spoken learner corpus by SVM.
• Characteristic grammar items for each CEFR-J level were extracted.
• To aid interpretation of the results, we visualized grammar item features by decision tree.
• In future work, we will extract the error features of spoken learner data.