Grammar Profile for Spoken Learner Data
By Brendan Flanagan (1), Emiko Kaneko (2), Emi Izumi (3), Sachio Hirokawa (4)
(1) Kyushu University, JSPS Research Fellow
(2) Aizu University
(3) Doshisha University
(4) Kyushu University
-
Overview
• Introduction
• Equivalent Proficiency Levels
• Grammar Pattern Item Dataset
• SVM & Optimal Feature Selection
• Characteristic Grammar Profiles
• A1 vs A2
• A2 vs B1
• B1 vs B2
• Conclusion
-
Introduction
-
Equivalent Proficiency Levels: The NICT-JLE Corpus and CEFR-J
The NICT-JLE Corpus consists of 1280 transcripts of the ACTFL-ALC SST (Standard Speaking Test), an English oral proficiency interview test.
The SST scoring method defines 9 proficiency levels.
-
Equivalent Proficiency Levels: The NICT-JLE Corpus and CEFR-J
SST Level 4 is categorized as CEFR-J Level A2 (in this presentation).
Target proficiency levels (CEFR-J): A1, A2, B1, B2

CEFR-J Level   # Samples (SST 4 as CEFR-J A1)   # Samples (SST 4 as CEFR-J A2)
A1             236                              257
A2             738                              717
B1             263                              263
B2             40                               40
-
Grammar Pattern Item Dataset
• The NICT JLE corpus exam and data structure:

Stage   Task   Follow-up
1
2       ●      ●
3       ●      ●
4       ●      ●
5

• Each section was preprocessed to count the occurrences of 493 grammar patterns, e.g.:

Grammar pattern                                         # 00015   # 00253   # 00287
1: Personal pronoun, nominative (I) + be: I am          2         2         4
1-1: Personal pronoun, nominative (I) + be: I am not    0         0         0
1-2: Personal pronoun, nominative (I) + be: Am I ...?   0         0         0
-
Grammar Pattern Item Dataset
• The "Follow-up" sections were excluded from the analysis because they contain free dialog.
• Target sections for analysis: as marked in the stage table.
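The counting step can be pictured with a small sketch. The regular expressions below are hypothetical stand-ins for the three example items shown on the slide; the actual definitions of the 493 grammar patterns are not given in this presentation, and the section labels and sample text are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical regexes for the three example items on the slide.
# The real pattern definitions are not given in the source.
PATTERNS = {
    "1: I am": re.compile(r"\bI\s+am\b", re.IGNORECASE),
    "1-1: I am not": re.compile(r"\bI\s+am\s+not\b", re.IGNORECASE),
    "1-2: Am I ...?": re.compile(r"\bAm\s+I\b[^.?!]*\?", re.IGNORECASE),
}


def count_patterns(sections):
    """Count pattern occurrences over (section_name, text) pairs.

    Sections named "Follow-up" are skipped, mirroring the exclusion
    described on the slide.
    """
    counts = Counter({name: 0 for name in PATTERNS})
    for section_name, text in sections:
        if section_name.lower() == "follow-up":
            continue
        for name, regex in PATTERNS.items():
            counts[name] += len(regex.findall(text))
    return counts


# Invented sample transcript, for illustration only.
sample_sections = [
    ("Task", "I am a student. I am not tired. Am I late?"),
    ("Follow-up", "I am here. I am there."),
]
sample_counts = count_patterns(sample_sections)
for name, value in sample_counts.items():
    print(name, value)
```

In this sketch item 1 also counts the "I am" inside "I am not"; whether the original item 1 includes its sub-items is an assumption.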
-
SVM & Grammar Item Dataset
• The preprocessed dataset was then vectorized to create a special-purpose search engine using GETA [1].
• The dataset was divided into randomly selected parts to evaluate the classification performance of SVM models by 10-fold cross-validation.
• The SVMlight [2] linear kernel was used to train and test the models.
• To rank the importance of grammar items for feature selection, an SVM model was initially trained using all features.
• The SVM model score for each individual grammar item w_i was analyzed to determine the weight(w_i) ranking.
[1] http://geta.cs.nii.ac.jp
[2] http://svmlight.joachims.org
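The evaluation and ranking steps above can be sketched in plain Python. This is not SVMlight or GETA: a simple batch subgradient-descent linear SVM stands in for SVMlight, the three grammar items (item_a, item_b, item_c) and their counts are invented toy data, and all function names and hyperparameters are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss (no bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # this sample violates the margin
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, x):
    """Return +1 or -1 depending on the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def cross_validate(X, y, k=10):
    """Accuracy over k folds: train on k-1 folds, test on the held-out fold."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        w = train_linear_svm(train_X, train_y)
        correct += sum(predict(w, X[i]) == y[i] for i in test_idx)
    return correct / len(X)


def rank_features(w, names):
    """Sort grammar items from most positive to most negative weight."""
    return sorted(zip(names, w), key=lambda pair: pair[1], reverse=True)


# Invented toy data: counts of three hypothetical grammar items per transcript.
# Label +1 stands for the higher level of a pair, -1 for the lower level.
names = ["item_a", "item_b", "item_c"]
X, y = [], []
for i in range(10):
    X.append([2 + i % 2, 1, 0])
    y.append(1)
    X.append([0, 1, 2 + i % 2])
    y.append(-1)

accuracy = cross_validate(X, y, k=10)
ranking = rank_features(train_linear_svm(X, y), names)
print("10-fold accuracy:", accuracy)
for name, weight in ranking:
    print(name, round(weight, 3))
```

Under this reading, items with large positive weights would characterize one level of a pair and items with large negative weights the other, while items with weights near zero would be candidates for removal during feature selection.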
-
SVM & Optimal Feature Selection
-
Grammar Profiles: Extracting Characteristics
-
Analysis By SVM
-
Grammar Profiles: Classification
-
Grammar Profiles: Extracting Characteristics
-
Visualization
-
Grammar Profiles: Visualizing Characteristics
-
Conclusion
• Classified the English proficiency levels of data in a spoken learner corpus using SVM.
• Extracted characteristic grammar items for each CEFR-J level.
• To aid interpretation of the results, visualized the grammar item features with a decision tree.
• In future work, we will extract the error features of spoken learner data.