intelligent computer-assisted language learning · icall development cycle 1. defining target group...
Self-presentation: Elena Volodina
● 1998 PhD in Linguistics (Moscow, Russia)
● 2008 MA in Language Technologies (Gothenburg, Sweden)
2010 - ... Research Engineer (Språkbanken) → 2017 - … Researcher (SB-Text)
● Lärka development
● ICALL research
● Second language resources and algorithms
● L2 Swedish infrastructure
● L2 profiles
● ...
“Teachers never tell you their first names because they don't want you to Google them”
https://spraakbanken.gu.se/eng/personal/elena
Focus on literacy
● Dutch study:
● → Average reading comprehension ~B1 level
Velleman, E., van der Geest, T.: Online test tool to determine the CEFR reading comprehension level of text. Procedia Computer Science 27 (2014)
Literacy: Sweden
● PIAAC study focusing on literacy
● Sweden among the 5 “best” of 23 countries on average
● Largest discrepancy between native-born and non-native-born citizens
● → high unemployment rate
● → higher risk of deteriorated health
OECD. 2013. OECD Skills Outlook 2013. First Results from the Survey of Adult Skills.
PIAAC. 2013. Survey of Adult Skills (PIAAC).
SCB. 2013. Tema utbildning, rapport 2013:2, Den internationella undersökningen av vuxnas färdigheter. Statistiska centralbyrån.
Societal need
2015: of 9.9 million citizens, 2.2 million (i.e. 22.2%) have a foreign background (Statistiska centralbyrån)
What can we do?
Cause versus symptoms
Natural Language Processing (+ technical competence)
Computer-Assisted Language Learning (+ pedagogical competence)
NLP + CALL = ICALL
ICALL development cycle
1. Defining target group
2. Defining language skill
3. Developing resources
4. Developing tools & algorithms
5. Developing prototype
6. Evaluating prototype
7. Maintenance
ICALL development cycle, step 1 (defining target group):
Adults vs kids
Healthy vs special needs
…
ICALL development cycle, step 2 (defining language skill):
Writing, speaking, reading, listening, vocabulary, grammar…
ICALL development cycle
Research
Not research?
Not research
Why do we need resources (data)?
L2 exercises
Context-free, understandable, level-appropriate
Appropriate
https://spraakbanken.gu.se/larkalabb/infl-mc
Sentence selection need
Target vocabulary and grammar need
• Vocabulary exercise (L2)
• Inflection exercise (L2)
• Bundled gaps (L2)
• Word-based exercises (L2: e.g. listening)
• …
• Exercises for students of linguistics (L1)
→ Corpus of course books
Produced by experts FOR L2 learners
→ reading comprehension texts
→ exercises
→ recordings of listening excerpts
COCTAILL corpus
Elena Volodina, Ildikó Pilán, Stian Rødven Eide and Hannes Heidarsson 2014. You get what you annotate: a pedagogically annotated corpus of coursebooks for Swedish as a Second Language. NEALT Proceedings Series 22
COCTAILL
COCTAILL ingredients
How it looks: text topics
https://spraakbanken.gu.se/larkalabb/editor
COCTAILL qualitative explorations: topics across levels
COCTAILL explorations: target skills across levels
From COCTAILL to a graded L2 receptive vocabulary: SVALex
Thomas François, Elena Volodina, Ildikó Pilán, Anaïs Tack. 2016. SVALex: a CEFR-graded lexical resource for Swedish foreign and second language learners. Proceedings of LREC 2016, Slovenia.
http://cental.uclouvain.be/cefrlex/svalex/
From course books to automatic CEFR level assessment in texts & sentences: HitEx
Ildikó Pilán, Elena Volodina, Lars Borin. (2016). Candidate sentence selection for language learning exercises: from a comprehensive framework to an empirical evaluation. TAL Journal: Special issue NLP for learning and Teaching. Volume 57, Number 3.
HitEx
Machine learning (supervised training)
Course books (training data) → features → trained classifier
Diagram labels: POS tags, lexicons, readability studies, dependency relations, language learning resources
80% correct (texts)
63% correct (sentences)
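The supervised set-up on this slide (levelled course-book texts in, features out, a trained classifier at the end) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the two surface features, the nearest-centroid classifier and the toy training sentences are stand-ins, not the features or the classifier HitEx actually uses (the slide lists POS tags, lexicons and dependency relations, which need a real annotation pipeline).

```python
from statistics import mean


def extract_features(text):
    """Two toy surface features: mean sentence length (tokens), mean word length (chars)."""
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s.split() for s in normalized.split(".") if s.strip()]
    tokens = [tok for sent in sentences for tok in sent]
    return [mean(len(s) for s in sentences), mean(len(t) for t in tokens)]


def train(labelled_texts):
    """Average the feature vectors per CEFR level (a nearest-centroid 'model')."""
    by_level = {}
    for text, level in labelled_texts:
        by_level.setdefault(level, []).append(extract_features(text))
    return {lvl: [mean(col) for col in zip(*vecs)] for lvl, vecs in by_level.items()}


def predict(model, text):
    """Return the level whose centroid is closest (squared Euclidean distance)."""
    feats = extract_features(text)

    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(model, key=lambda lvl: dist(model[lvl]))


# Hypothetical stand-in for levelled course-book texts.
TRAINING = [
    ("Jag bor här. Jag har en katt.", "A1"),
    ("Hon är glad. Vi äter nu.", "A1"),
    ("Regeringsformen antogs efter omfattande parlamentariska förhandlingar mellan riksdagspartierna.", "C1"),
    ("Arbetarrörelsen betonade internationell solidaritet framför nationell patriotism under industrialiseringen.", "C1"),
]

model = train(TRAINING)
print(predict(model, "Han har en hund."))
```

With real data the same three steps apply, only with richer features and a stronger learner; the accuracy figures on the slide refer to the actual HitEx models, not to this sketch.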
Features
https://spraakbanken.gu.se/larkalabb/hitex
From SVALex & text classification experiments to text evaluation: TextEval
• https://spraakbanken.gu.se/larkalabb/texteval
• Text analysis platform
• Assessment of learner-written language and expert-written texts
• CEFR level (machine learning)
• Highlighting vocabulary by CEFR level (based on graded word lists)
• Out-of-vocabulary items are a challenge
Ildikó Pilán, Elena Volodina and David Alfter. 2016. Coursebook texts as a helping hand for classifying linguistic complexity in language learners' writings. Proceedings of the workshop on Computational Linguistics for Linguistic Complexity (CL4LC), COLING 2016, Osaka, Japan.
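The vocabulary highlighting described above can be sketched as a lookup against a graded word list, with unknown words flagged as out-of-vocabulary. The word list, its levels and the function names below are hypothetical; a real system would look up lemmas (e.g. in SVALex or SweLLex) rather than raw lowercase tokens.

```python
# Toy graded word list; the entries and their levels are illustrative assumptions.
GRADED_WORDS = {
    "jag": "A1",
    "bor": "A1",
    "i": "A1",
    "en": "A1",
    "lägenhet": "A2",
    "självständig": "B2",
}


def annotate_levels(text, graded=GRADED_WORDS):
    """Pair each token with its CEFR level, or 'OOV' if it is not in the list."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    return [(tok, graded.get(tok, "OOV")) for tok in tokens if tok]


def oov_rate(annotated):
    """Share of tokens that the graded list does not cover."""
    return sum(1 for _, level in annotated if level == "OOV") / len(annotated)


annotated = annotate_levels("Jag bor i en lägenhet i Guntorp.")
print(annotated)
print(f"OOV rate: {oov_rate(annotated):.2f}")
```

Proper names such as "Guntorp" end up as OOV here, which is one simple illustration of why out-of-vocabulary items are a challenge.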
https://spraakbanken.gu.se/larkalabb/texteval
Example text
• Den 6 juni utnämndes till Sveriges officiella nationaldag först år 1983, och blev helgdag 2005. Före 1983 var dagen känd som svenska flaggans dag, men har firats som inofficiell nationaldag sedan 1916 – dessförinnan var den känd som Gustafsdagen. Huvudskälet till firandet är just att den då 27-årige Gustav Vasa valdes till kung av Sverige den 6 juni 1523, varpå Kalmarunionen upplöstes och Sverige blev självständigt. Även 1809 och 1974 års regeringsformer, som båda skrevs under den 6 juni, anges som skäl att högtidlighålla dagen. En vanlig stereotyp är att svenskars nationaldagsfirande varken är särskilt omfattande eller patriotiskt – vanligtvis i jämförelse med norrmännens 17 maj-firande. Jonas Engman, sakkunnig i traditionsfrågor vid Nordiska museet, anser att norrmännen snarare är undantaget. - Vi tittar gärna på Norge och frågar oss varför vi inte gör som norrmännen. Norge är dock nog mer ovanligt i sitt firande, sett till övriga Norden. De har varit med om krig, men det har även finländarna och danskarna. Den nationella identiteten spelade nog en stor roll under upplösningen av unionen med Sverige, säger Engman till TT. En kluven dag. Jonas Engman påpekar att den svenska attityden till den 6 juni präglas av en viss kluvenhet. - Bland annat har arbetarrörelsen, som betonade internationell solidaritet över nationell patriotism, varit inflytelserik här. Nationalismen bröt också ut sent hos oss – under sent 1800-tal – medan många av de andra europeiska nationalstaterna tillkom redan efter Napoleonkrigen. Vi är stolta över Sverige på olika sätt, mycket patriotism finns exempelvis i idrotten, säger han. Rättad: I en tidigare version av texten uppgavs fel antal år sedan första officiella firandet.
SweLL pilot
Elena Volodina, Ildikó Pilán, Ingegerd Enström, Lorena Llozhi, Peter Lundkvist, Gunlög Sundberg, Monica Sandell. 2016. SweLL on the rise: Swedish Learner Language corpus for European Reference Level studies. Proceedings of LREC 2016, Slovenia.
SweLL pilot
Topics
L1s
Age
Non-lemmatized items
From SweLL-pilot to productive vocabulary: SweLLex
Elena Volodina, Ildikó Pilán, Lorena Llozhi, Baptiste Degryse, Thomas François. 2016. SweLLex: second language learners' productive vocabulary. Proceedings of the workshop on NLP4CALL&LA. NEALT Proceedings Series / LiUP
From SweLLex and SVALex to level classification of new vocabulary: Siwoco
• https://spraakbanken.gu.se/larkalabb/siwoco
• Automatic prediction of single word lexical complexity
• SVM, Logistic regression, MLP classifier
• Features: Word length, syllables, suffix length, gender, homonymy, polysemy, compounds, N-grams, topic distribution
• Validation through crowdsourcing
David Alfter, Elena Volodina. 2018. Towards Single Word Lexical Complexity Prediction. Proceedings of the 13th Workshop on Innovative Use of NLP for Building Educational Applications, NAACL 2018.
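A few of the surface features listed for Siwoco (word length, syllables, character n-grams) can be computed without any linguistic resources. The sketch below is an assumption-laden illustration: the vowel-group syllable heuristic, the boundary markers and the function names are not taken from the Siwoco implementation, and features such as gender, polysemy or topic distribution would need lexicons and corpora.

```python
VOWELS = set("aeiouyåäö")  # Swedish vowel letters


def count_syllables(word):
    """Approximate the syllable count as the number of vowel groups."""
    groups, prev_vowel = 0, False
    for ch in word.lower():
        is_vowel = ch in VOWELS
        if is_vowel and not prev_vowel:
            groups += 1
        prev_vowel = is_vowel
    return groups


def char_ngrams(word, n=2):
    """Character n-grams with ^ and $ marking the word boundaries."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def word_features(word):
    """Collect the resource-free features into one dictionary."""
    return {
        "length": len(word),
        "syllables": count_syllables(word),
        "bigrams": char_ngrams(word, 2),
    }


print(word_features("nationaldag"))
```

A dictionary like this would then be vectorized and passed to one of the classifiers named on the slide (SVM, logistic regression or MLP).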
Second chance: starting NOT from scratch
Grant information
Elena Volodina, Beata Megyesi, Mats Wirén, Lena Granstedt, Julia Prentice, Monica Reichenberg, Gunlög Sundberg. 2016. A Friend in Need? Research agenda for electronic Second Language infrastructure. Proceedings of SLTC 2016, Umeå, Sweden
SweLL promises (main)
1. Deliver a well-annotated (gold standard) corpus of L2 essays
• 600 essays: approx. 100 per CEFR level (A1-C1) + 100 for a control L1 learner corpus
• Incl. manual error annotation & manually checked linguistic annotation
• Make available for research (and public?)
SweLL promises (main)
2. Set a platform (and workflow) for
• Continuous upload of new essays
• Manual error annotation
• Automatic linguistic annotation
SweLL promises (main)
• Set a platform for browsing L2 essays
  • In concordance fashion (+ parallel view)
  • In full-text fashion
SweLL focus (main)
• Adult learners (16+ years)
• Healthy learners
• Written essays (no speech data)
• Where possible: longitudinal data
SweLL promises (side path, rather experimental)
• Design a set of exercises
  • To elicit (structured) responses that would answer some interesting research questions
  • To create, in this way, a database that could be used for research
• Further develop the Lärka platform for
  • Deploying the above exercises
  • Linking user answers to their individual ”profiles” (age, gender, L1s, …)
Data
Curious “time & effort” fact: data vs experiments
[Chart: essay corpus (SweLL corpus) creation and SweLL-based publications, 2013-2017. Y-axis: number of articles (0-10). Series: corpus creation, experiments. Data labels: 9 and 2.]
Lifetime of corpora vs tools
• Corpus creation costs both time and money, but:
• Well-documented, representative, reliably annotated and available corpora are used far beyond their initial research purpose
• Penn TreeBank (Marcus et al., 1993; cited 6813 times) is still used for research (e.g. Pawar, A., & Mago, V., 2018)
• ICLE (Granger, 1998; cited 358 times) → modern research (e.g. Möller, 2017)
• Whereas tools trained on corpora get outdated as research makes progress
Lifetime of tools
https://spraakbanken.gu.se/larka/archive (2012-2016)
https://spraakbanken.gu.se/larkalabb/ (2016-...)
Tools decay, data stay
Available data
Corpus availability (and the legal hassle)
• Necessary step according to GDPR (EU General Data Protection Regulation)
• Names and identities cannot be revealed or traced to the real person
• Everyone has the right to know which databases he/she is represented in
• Everyone has the right to withdraw from the database
• Hence, we cannot destroy the ”Name ↔ ID” mapping keys if we want to have (longitudinal) data
• Anyone can demand access to the data (according to the Principle of Public Access to Official Records, Swedish law)
• → however, no right to use the information!
SweLL agreement form: https://goo.gl/5hKuew
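Why the Name ↔ ID mapping keys have to be kept can be shown with a small sketch: without the key, a learner's later essays cannot be attached to the same pseudonymous ID, and a withdrawal request cannot be honoured. The class, its methods and the ID format below are hypothetical illustrations, not the SweLL project's actual data handling, and a real deployment would store the key separately and encrypted.

```python
import secrets


class PseudonymRegistry:
    """Keeps the Name <-> ID key apart from the pseudonymous essay store."""

    def __init__(self):
        self._id_by_name = {}   # the sensitive mapping key (never released)
        self.essays_by_id = {}  # what researchers would see

    def id_for(self, name):
        """Return the learner's stable pseudonymous ID, creating it on first use."""
        if name not in self._id_by_name:
            self._id_by_name[name] = "L" + secrets.token_hex(4)
        return self._id_by_name[name]

    def add_essay(self, name, text):
        """Store an essay under the learner's ID (enables longitudinal data)."""
        self.essays_by_id.setdefault(self.id_for(name), []).append(text)

    def is_represented(self, name):
        """Right to know: is this person in the database?"""
        return name in self._id_by_name

    def withdraw(self, name):
        """Right to withdraw: delete the key and all essays; return essays removed."""
        learner_id = self._id_by_name.pop(name, None)
        if learner_id is None:
            return 0
        return len(self.essays_by_id.pop(learner_id, []))


registry = PseudonymRegistry()
registry.add_essay("Anna Andersson", "first essay ...")
registry.add_essay("Anna Andersson", "second essay, one term later ...")
print(registry.is_represented("Anna Andersson"), registry.withdraw("Anna Andersson"))
```

Both essays land under one ID only because the key survives between uploads, which is the longitudinal-data argument made on the slide.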
GDPR
• Restrictions on the use of personal information to protect ”subjects”, i.e. natural persons
• Important consequences for learner corpora (L2) projects, IF you want data to be available for research!
• Metadata precautions
• Text de-identification and pseudonymization
• Name-ID mapping keys handling
SweLL: L2 infrastructure project
No information on the country of birth
Birth year: 5-year spans, e.g. 2000-2004
No exact date for entering the L2 country
No information on school or teacher
Pseudonymization of text data: names, cities, ages, professions, etc.
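The metadata precautions above can be sketched as a small sanitizing step: the birth year is coarsened to a 5-year span and the fields that must not be stored are dropped. The field names and the record layout are assumptions made for this example; the handling of the entry date is left out because the slide only says it is not stored exactly.

```python
# Hypothetical field names for the information the slide says is not kept.
DROPPED_FIELDS = {"country_of_birth", "school", "teacher"}


def birth_year_span(year, width=5):
    """Map a birth year to its span, e.g. 2003 -> '2000-2004'."""
    start = year - (year % width)
    return f"{start}-{start + width - 1}"


def sanitize_metadata(record):
    """Drop disallowed fields and replace the exact birth year with a span."""
    clean = {
        key: value
        for key, value in record.items()
        if key not in DROPPED_FIELDS and key != "birth_year"
    }
    if "birth_year" in record:
        clean["birth_year_span"] = birth_year_span(record["birth_year"])
    return clean


raw = {"birth_year": 2003, "school": "Example School", "l1": "Spanish"}
print(sanitize_metadata(raw))
```

Coarsening rather than deleting the birth year keeps age available as a research variable while making individuals harder to single out.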
Example essay (translation into English + mocking errors)
• I live in Guntorp on apartement . I live with my boyfriend . His name is Hans . The apartement mine has a pattio and tree room . Jag enjoy there in Guntorp but a lot of time to goto shop , fortifive minut . I have the bus and the Guntop train . Jag lived in Norway bifore , in Tromsö . It was less than Gunntorp . I enjoy their too becaus I had more friends. I think it is hard to have friends here . But I enjoy better job here . In Tromsö jobbe I only on one website . In Guntorp I work on many website . I am webdevelooper . But Guntorp is closser to Spain than Tromsö . It is important how one lives because I am not in my country . I mess my mother and my father but I live her with my boyfriend .
To-dos (1)
• Test NER on original learner texts:
→ Can NER speed up the process? Noise? What about essays reviewing books and films, political events, etc.?
• Automate pseudonymization for English (partly done for Swedish): lists, consistency of geographical name replacements, etc.
→ Assess the risk of introducing errors that were not in the original text and find ways of avoiding them
→ Add the possibility of placing the whole text in a ”cultural” context, e.g. Astrid Lindgren’s or Hungarian, etc.
• Test replicating grammatical forms (and errors?) in pseudonymized segments
→ e.g. Stadsbibliotekets → The Volvo’s
→ Assess the possibility of projecting MSDs from the original text and evaluate their reliability
• Link to Lärka and crowdsource pseudo-tag corrections by essay writers. Learn from ”correction reports”
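One of the to-dos above, consistency of name replacements, can be illustrated with a small sketch: the same original name should always receive the same pseudonym within a text. The class `Pseudonymizer`, the replacement pool and the list of known city names are all hypothetical; in practice the names would come from NER or manual tagging, which is not shown here.

```python
from itertools import cycle


class Pseudonymizer:
    """Assign each original name one stable replacement from a pool."""

    def __init__(self, replacements):
        # cycle() reuses replacements once the pool is exhausted, so a real
        # list must be longer than the number of distinct names in a text.
        self._pool = cycle(replacements)
        self._mapping = {}

    def replace(self, name):
        # First occurrence draws a new pseudonym; later ones reuse it.
        if name not in self._mapping:
            self._mapping[name] = next(self._pool)
        return self._mapping[name]


cities = Pseudonymizer(["A-city", "B-city", "C-city"])  # hypothetical pool
known_cities = {"Tromsö", "Guntorp"}  # assumed output of an NER step
tokens = "I moved from Tromsö to Guntorp , but Tromsö was smaller".split()
out = [cities.replace(t) if t in known_cities else t for t in tokens]
print(" ".join(out))  # I moved from A-city to B-city , but A-city was smaller
```

Keeping the mapping per essay means that comparisons between places in the text survive pseudonymization.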
![Page 70: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/70.jpg)
Reliable and interesting data
![Page 71: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/71.jpg)
Annotation makes data interesting/useful (you get what you annotate)
![Page 72: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/72.jpg)
Annotation…
• …is now the place where linguistics hides in NLP (Fort, 2016)
  • Parts of speech
  • Base forms of the words (lemmas)
  • Syntactic and semantic information
  • …
Karën Fort. 2016. Collaborative Annotation for Reliable Natural Language Processing. Wiley.
![Page 73: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/73.jpg)
Annotation…
• …can ”hide” other disciplines than linguistics
  • (so-called) error annotation
  • Target skills
  • Receptive vs productive skills
  • Level of proficiency in a (second/foreign) language
  • Text genres
  • …
![Page 74: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/74.jpg)
Implications (for L2 corpora)
• Take other disciplines’ perspectives into account, at least
  • NLP interests
  • Second Language Acquisition research questions (or a minor share of those)
• It is worth investing time and money in a resource, and working on:
  • Corpus design (representativity, balance, availability)
  • Corpus metadata
  • Corpus annotation & annotation reliability
![Page 75: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/75.jpg)
NLP needs
• NLP often
  • is ”applied” to other research disciplines and
  • seeks to assist with other disciplines’ research questions
• but there is a range of (traditional) questions
  • (automatic) error detection
  • (automatic) error correction
  • (automatic) essay grading
  • (automatic) essay classification (e.g. by level of proficiency, genre, topic, grade…)
  • L1 identification
  • Linguistic complexity studies (syntax, vocabulary, etc.)
  • (semi-automatic) anonymization (pseudonymization)
  • Writing support / feedback
  • …
![Page 76: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/76.jpg)
SLA needs
• Longitudinal L2 data underlying mental representations and developmental processes (e.g. Myles, 2005)
• Speech data (e.g. Myles, 2005)
• Task-based data (e.g. Alexopoulou et al., 2017)
• Individual cognitive processes (scores from intelligence tests, motivation tests, aptitude tests; Granger & Paquot, 2017)
• …
![Page 77: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/77.jpg)
SweLL corpus design principles
• Representativeness
  • (most popular) immigrant languages
  • age and gender
  • levels of proficiency
  • various tasks?
  • L2 vs L1 learners/writers
• Balance
• Annotation
• Documentation
Hovy, E.H., Lavid, J.M. 2010. Towards a ”Science” of corpus annotation: a new methodological challenge for corpus linguistics.
Pre-annotation decisions
Post-annotation work
![Page 78: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/78.jpg)
Representative data
![Page 79: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/79.jpg)
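Corpus design

| L1s | A1 M | A1 F | A2 M | A2 F | B1 M | B1 F | B2 M | B2 F | C1 M | C1 F | Control group M | Control group F | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arabic | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| Dari/Persian | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| English | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| Greek | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| Croatian/BKS | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| Sorani | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| Kurmanji | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| Somali | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| Spanish | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| Tigrinya | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | X | X | 50 |
| Total | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 600 |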
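The totals in the design above follow from simple arithmetic, sketched here as a sanity check. The variable names are illustrative only; the figures (5 writers per cell, 50 per control-group column) are read off the table.

```python
l1s = ["Arabic", "Dari/Persian", "English", "Greek", "Croatian/BKS",
       "Sorani", "Kurmanji", "Somali", "Spanish", "Tigrinya"]
levels = ["A1", "A2", "B1", "B2", "C1"]
genders = ["M", "F"]
per_cell = 5             # writers per L1 x level x gender cell
control_per_gender = 50  # control-group column totals from the table

per_l1 = len(levels) * len(genders) * per_cell     # row total per L1
learners = per_l1 * len(l1s)                       # all L2 writers
total = learners + control_per_gender * len(genders)

print(per_l1, learners, total)  # 50 500 600
```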
![Page 80: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/80.jpg)
Annotation campaign management
Adriane Boyd
![Page 81: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/81.jpg)
[Flowchart: annotation campaign workflow. Hovy et al. 2010. Towards a ”Science” of corpus annotation…]
1. Building a corpus (data, metadata)
2. Tagset, guidelines, tool
3. Pilot with a corpus sample
4. Qualitative analysis (comparing annotators’ decisions)
5. Quantitative analysis (inter-annotator agreement)
6. Annotating corpus (biweekly meeting)
7. Post-campaign: delivery, maintenance
Decision points (yes / no):
• Representative? Balanced? Accessible?
• Reliable annotation? Stable annotation? Appropriate tagset? Guidelines?
![Page 82: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/82.jpg)
[Flowchart: extended annotation campaign workflow. Fort. 2016. Collaborative annotation…]
1. Building a corpus (data, metadata)
2. Tagset, guidelines, tool
3. Pilot with a corpus sample
4. Qualitative analysis (comparing annotators’ decisions)
5. Quantitative analysis (inter-annotator agreement)
6. Mini-reference corpus for annotator training
7. Annotator training (collective, individual)
8. Annotating corpus (regular checks)
9. Random manual checks by experts
10. Corpus publication or reviewing or correction, delivery, maintenance
Decision points (yes / no):
• Representative? Balanced? Accessible?
• Reliable annotation? Stable annotation? Appropriate tagset? Guidelines?
Also in the diagram: learning curves, checks, updates to tagset, guidelines
![Page 84: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/84.jpg)
Annotation quality
• Reliability & stability → through inter-annotator agreement checks
• Reproducibility → agreement of an annotator with themselves (intra-annotator agreement)
• Random manual checks of the annotations by experts or evaluators
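As an illustration of an agreement check, the sketch below computes Cohen's kappa for two annotators who each assigned one label per item. The slides do not say which agreement measure SweLL uses, so kappa is an assumption here, and the label codes in the example are invented. The same function can be applied to two passes by one annotator for intra-annotator agreement.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotated independently at their own rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical tags for six segments (O = orthography, S = syntax, M = morphology)
annotator_1 = ["O", "S", "O", "M", "O", "S"]
annotator_2 = ["O", "S", "M", "M", "O", "O"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # 0.478
```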
![Page 85: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/85.jpg)
Error taxonomy
![Page 86: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/86.jpg)
Error annotation
• Don’t say the ”E-word”!
  • Negative connotation (SLA)
  • Norm deviations – not better, though
  • Interlanguage phenomenon (Díaz-Negrillo et al., 2009)
  • Practice-oriented view as a ”non-norm adequate form” (Dobric, 2015)
  • Unexpected uses (Gaillat et al. 2014)
  • Cross-disciplinary misunderstanding?
• Ideal to counter-balance error annotation with so-called ”can-do” annotation
  → would allow for e.g. CAF analysis (Complexity, Accuracy, Fluency) (Wolfe-Quintero et al., 1998)
  → would probably help (a bit) to close the gap between SLA, LCR & NLP
![Page 87: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/87.jpg)
Error annotation
• Don’t say the ”E-word”! (Julia Prentice, EuroSLA, submitted)
  • Negative connotation (SLA)
  • Norm deviations – not better, though
  • Interlanguage phenomenon (Díaz-Negrillo et al., 2009)
  • Practice-oriented view as a ”non-norm adequate form” (Dobric, 2015)
  • Cross-disciplinary misunderstanding?
• Ideal to counter-balance error annotation with so-called ”can-do” annotation
  → would allow for e.g. CAF analysis (Complexity, Accuracy, Fluency) (Wolfe-Quintero et al., 1998)
  → would probably help (a bit) to close the gap between SLA, LCR & NLP
”What’s in a name? That which we call a rose by any other name would smell as sweet.” (Shakespeare)
![Page 88: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/88.jpg)
Error → Correction annotation
![Page 89: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/89.jpg)
Ideal picture (errors + can-do’s)
Phenomenon → annotation:
• Linguistic element absent → no annotation
• Linguistic element present, but in a deviating form → error-annotated segment / can-do annotated segment
• Linguistic element present in a correct form → can-do annotated segment
![Page 90: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/90.jpg)
Taxonomy
Taxonomies are like underwear; everyone needs them, but no one wants someone else’s
Anon
Standards are like tooth brushes; everyone likes the idea of them, but no one wants someone else’s
Anon
Egon Stemle, EURAC, Italy
![Page 91: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/91.jpg)
![Page 92: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/92.jpg)
SweLL pre-pilot experiment
• ASK versus Merlin taxonomy
  • …was used by project researchers on 2 essays (i.e. producing 4 files each)
  • …time was taken
  • …experiences were recorded
![Page 93: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/93.jpg)
SweLL pre-pilot experiment
• Summary
  • It takes twice as long to use the Merlin taxonomy
  • The ASK taxonomy (L2 Norwegian) is closer to L2 Swedish
  • ASK lacks some useful tags
  • Decision: enrich the ASK taxonomy with a few Merlin tags
![Page 94: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/94.jpg)
Taxonomy ambiguity
![Page 95: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/95.jpg)
Taxonomy ambiguity
![Page 96: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/96.jpg)
Taxonomy ambiguity
![Page 97: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/97.jpg)
Normalization
* I has was
• Rewriting the L2 learner original in a normative way, creating a so-called target hypothesis (Lüdeling et al., 2005)
![Page 98: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/98.jpg)
Normalization
* I has was → I have been? I was? I had?
![Page 99: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/99.jpg)
Normalization: basic principles
• Minimal change
• Positive assumption
• Lexical and grammatical competence prior to functional and structural correctness
![Page 100: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/100.jpg)
Minimal change…
![Page 101: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/101.jpg)
Example
* Jag trivs mycket bor med dem. (Eng) I enjoy much live with them.
Potential target hypotheses:
• Jag trivs mycket bra med dem → Minimal change (seemingly) → Error: wrong word / spelling?
• Jag trivs mycket med att bo med dem → Lexical competence of BO, verb → Errors: idiomaticity error (trivs) + wrong verb form (bo)
![Page 102: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/102.jpg)
![Page 103: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/103.jpg)
Why normalization as a separate step?
• It helps to build a better understanding of a learner’s linguistic competence
• It can be outsourced to SLA researchers
• Error annotation depends on the change applied to the original text, and as such it is not ERROR annotation but CORRECTION annotation
• Inter-annotator agreement with respect to error codes can be objectively measured only if the annotators are working on the same normalized version
![Page 107: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/107.jpg)
SweLL normalization tool
• Transformation-based
• String matching & calculating diff
• Linking on the fly (original – normalized versions)
• Parallel text
• Coming (if ever):
  • Drop-down menus for error codes
  • Drag-and-drop (spaghetti view)
  • Three-tier representation (original, spell-corrected, normalized)
• Desired:
  • Support with automatic spelling error detection
Dan Rosén, developer
Arild Matsson, research engineer
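The idea of "string matching & calculating diff" to link original and normalized versions can be sketched with Python's standard `difflib`. This is a stand-in for illustration only, not the SweLL tool's actual implementation; the function name `link_tokens` and the whitespace tokenization are assumptions.

```python
from difflib import SequenceMatcher


def link_tokens(original, normalized):
    """Return the token spans that differ between the two versions."""
    src = original.split()
    tgt = normalized.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    links = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'equal' spans link one-to-one; keep only the changed spans.
        if op != "equal":
            links.append((op, src[i1:i2], tgt[j1:j2]))
    return links


print(link_tokens("Jag trivs mycket bor med dem",
                  "Jag trivs mycket bra med dem"))
# [('replace', ['bor'], ['bra'])]
```

Each changed span is a natural place to attach a correction code, which matches the point that annotation is tied to the chosen target hypothesis. A different hypothesis, such as "Jag trivs mycket med att bo med dem", would yield different spans and therefore different codes.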
![Page 108: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/108.jpg)
SweLL normalization & error-annotation tool – hands-on demo
• https://spraakbanken.gu.se/swell/dev/
![Page 109: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/109.jpg)
Inter-annotator agreement (IAA), pilot1
![Page 110: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/110.jpg)
What to compare?
![Page 111: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/111.jpg)
![Page 112: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/112.jpg)
![Page 113: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/113.jpg)
![Page 114: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/114.jpg)
![Page 115: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/115.jpg)
COCTAILL “ingredients”
![Page 116: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/116.jpg)
IAA: How it looks: text example
Freja looks into Jonas's horoscope: You are playful, and if you can choose, you'd spend the day getting to know better somebody you are acquainted with. The evening will be romantic.
And then into her own: The love life is a mess, but otherwise, the day will be funny, sensual and entertaining. Don't work yourself up. You will receive compliments from somebody in your surrounding.
![Page 117: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/117.jpg)
IAA: How it looks: text topics to choose from
![Page 118: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/118.jpg)
IAA: How it looks: result
(1) culture and traditions, (2) daily life, (3) relations with other people, (4) religion; myth and legends
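When annotators may pick several topics per text, one simple way to compare two annotators is set overlap. The sketch below uses the Jaccard index; this measure is an assumption for illustration, since the slides do not state how topic agreement was computed, and the two topic sets are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of labels chosen by both annotators among all labels chosen."""
    if not a and not b:
        return 1.0  # neither annotator chose anything: treat as agreement
    return len(a & b) / len(a | b)


# Hypothetical topic choices for the horoscope text
annotator_1 = {"daily life", "relations with other people"}
annotator_2 = {"daily life", "relations with other people",
               "culture and traditions"}
print(round(jaccard(annotator_1, annotator_2), 2))  # 0.67
```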
![Page 119: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/119.jpg)
Intra- & inter-annotator agreement…
• ”…if humans can agree on something at N%, systems will achieve (N-10%)…” (Hovy & Lavid, 2010)
• ”In Składnica, a Polish treebank, 20% of the agreed annotations were in fact wrong.” (Fort, 2016; Wolinski et al., 2011)
• ”Whatever measure(s) is/are employed, the annotation manager has to determine the tolerances: when the agreement is good enough?” (Hovy & Lavid, 2010)
• ”…perhaps it doesn’t matter what the agreement level is, as long as poor agreements are seriously investigated.” (Hovy & Lavid, 2010)
![Page 120: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/120.jpg)
Finally
• Central question in manual annotation: how to obtain reliable, useful and consistent annotations?
• Annotation in corpora has a theoretical impact: empirical observations → extension/redefinition of theory
• Annotation in corpora has a practical impact: application within teaching, tool and algorithm building
![Page 121: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/121.jpg)
The NLP community generally is not very concerned with the theoretical linguistic soundness. The Corpus Linguistics community does not seem to seek ”reliability” in the annotation process and results. (Hovy and Lavid, 2010)
![Page 122: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/122.jpg)
Lesson 1
● Do not underestimate the time it takes to collect and prepare data
● Preparing a resource can be a research & development project in itself (e.g. structured input from exercises for VPs and NPs + providing feedback on that)
![Page 123: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/123.jpg)
Time-effect ratio consequences
● Researchers skip compiling their own data
→ use what is available
→ in the end often targeting English
![Page 124: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/124.jpg)
Lesson 2
● Take time to study legal regulations, so as not to waste previously collected data
→ There are “loopholes”, but not without information loss
![Page 125: Intelligent Computer-Assisted Language Learning · ICALL development cycle 1. Defining target group 2. Defining language skill 3. Developing resources 4. Developing tools & algorithms](https://reader036.vdocuments.net/reader036/viewer/2022071100/5fd8c96d53a93023024846ba/html5/thumbnails/125.jpg)
Question to you
● In your Master's thesis
→ Do you plan to collect & prepare data yourself?
→ or is data already available?