EUROCALL 2007 - University of Ulster, 5 - 8 September
Developing annotation solutions for online data-driven learning
Pascual Pérez-Paredes and Jose María Alcaraz
SACODEYL
Universidad de Murcia, Spain
System Aided Compilation and Open Distribution of European
Youth Language
225836-CP-1-2005-1-ES-MINERVA-M
Developing annotation solutions for online data-driven learning
1. Annotation in CL
2. Annotating corpora for the FL classroom
3. Challenges of pedagogical annotation
4. Developing annotation solutions
5. SACODEYL annotator
Domain analysis
Requirements and software specification
• Add-on
• Needs of the research community
• Annotation = analysis
• Annotation = processing
Annotation in Corpus Linguistics
Why annotate?
Annotation gives corpus users both refined information-retrieval capabilities and the means for
subsequent treatment of the data
Annotation
• Can be automatic, semi-automatic or manual
• Can be performed by one or different annotators or software operators
• Reflects the ultimate aim of the meta-information being added to the corpus
• Non-polysemic ambiguity: Poesio and Artstein (2005)
• Interest in L2 speakers’ errors: Abe and Tono (2005)
Strong research paradigm rooted in
grammatical tagging, including morphological and syntactic information
(Garside, Leech and McEnery 1997).
2 Annotating corpora for the FL classroom
2.1 Corpora in the FL classroom
Interest in corpora and FLT:
• Volumes: Sinclair 2004; Braun, Kohn and Mukherjee 2006; Hidalgo, Quereda and Santana 2007
• SIG EUROCALL
• 1st International Conference on Corpus-Based Approaches to ELT, November 2007
Normalisation is still an issue:
• Mauranen (2004:99) points out that for a teaching method to become an important innovation, it has to “make its way to the normal classroom where teachers and students can use it as part of their everyday routine, with not too much extra hassle”.
• Chambers 2007: major obstacles
• Braun 2007: secondary education
2 Annotating corpora for the FL classroom
2.2 Annotating with a view to learning
• Braun (2007): pedagogically motivated corpora
(a) provide a more systematic range of material than individual texts or scattered collections of activities and, if well-designed, (b) offer a wider range of idiolects than the average material.
• Braun (2006) states that thematic annotation, including topic keys and section titles, is particularly useful in the implementation of pedagogically motivated corpora.
<event start="0m0" end="1m24" video="horse_caravanning_ie" duration="1m24" wordcount="223">
  <topic>
    <topic_title>What we do</topic_title>
    <topic_key>02 What we do</topic_key>
    <content_key/>
  </topic>
  <speaker name="Dieter">In the 60s, in the late 60s, I had worked in Germany for a while and I decided that I wanted to have my children reared in Ireland. So we came back from Germany, working for the Irish Tourist Board and started this enterprise <break/>. It's lovely now with the sunshine, we don't always have it like this, but very often. We started with 12 and then 20 caravans, and now we have about 35. And it's been a basis of what which we can live as a family, raise our children in a nice environment. We work very hard for three months and then have a very relaxed time of it, nine months. And in that time then I took on as a hobby computers, and Mary took on tour-guiding. So we have various different aspects to what we do. The horse caravans is a very intensive work just for those three months, but it's very enjoyable because we mix in the family a quiet nine months where we are very much en famille with the children, you can concentrate on them much more than if we were nine-to-five workers. And then the intensity of the three months means that we can also have our children employed, and learning how to work, learning how to deal with people. So, good mixture, isn't it.<cut/></speaker>
</event>
• The annotators have a pedagogical use of the text in mind when approaching the annotation stage.
• The tags <topic_title>, <topic_key> and <content_key> highlight the relevance of the communicative purpose of texts, that is, the topics and the contents that characterize them.
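To make the role of these tags concrete, here is a minimal sketch of how a teacher-facing tool might read the pedagogical metadata of one section. It uses Python's standard xml.etree.ElementTree module; the function name describe_event is hypothetical, the sample is abridged from the event shown earlier, and nothing here is meant to reproduce how the SACODEYL tools actually work.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <event> sample from the slides; the speaker text is shortened.
SAMPLE = """<event start="0m0" end="1m24" video="horse_caravanning_ie" duration="1m24" wordcount="223">
<topic><topic_title>What we do</topic_title><topic_key>02 What we do</topic_key><content_key/></topic>
<speaker name="Dieter">We started with 12 and then 20 caravans, and now we have about 35.<cut/></speaker>
</event>"""


def describe_event(xml_text):
    """Return the pedagogical metadata attached to one <event> section.

    Hypothetical helper written for illustration only.
    """
    event = ET.fromstring(xml_text)
    topic = event.find("topic")
    return {
        "video": event.get("video"),
        "duration": event.get("duration"),
        "topic_title": topic.findtext("topic_title"),
        "topic_key": topic.findtext("topic_key"),
        # An empty <content_key/> element is reported as an empty string.
        "content_key": topic.findtext("content_key") or "",
        "speakers": [s.get("name") for s in event.iter("speaker")],
    }


info = describe_event(SAMPLE)
print(info["topic_title"], "|", info["topic_key"], "|", info["speakers"])
```

The point of the sketch is that the topic and content tags can be read without touching the transcript itself, which is what lets a teacher browse sections by communicative purpose.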
Corpus
Language data
Annotation
Language
Metadata
Pedagogy
Remember the “Why annotate?” slide
Annotation gives corpus users both refined information-retrieval capabilities and the means for
subsequent treatment of the data
PEDAGOGY
Linguistic analysis of interest in FLT
Tsui (2004)
Corpus-based studies focus on 4 areas of description:
1. Lexical collocation
2. Syntactic patterning
3. Genre analysis
4. Discourse structure and cohesion
Word based and relying
on co-occurrence of grammatical word-class tags
Linguistics comes first:
Researcher/Linguist -> Linguistic analysis of interest in FLT -> DDL materials; concordances and corpus -> End user
Pedagogy comes first:
Material developer/Teacher/Learner -> Pedagogical analysis (and annotation) of language corpora -> Pedagogy-driven DDL -> End user
• Problem-oriented tagging
• Corpus applications in FLT still need to gain a status of their own
CHALLENGES
Leech (1993) maxims
– it should be possible to remove the annotation from the text;
– if desired, the annotation could be extracted from the text;
– the annotation should be based on guidelines available to every user;
– it should be made clear how and by whom the annotation was carried out;
– it should be based on widely agreed and theory-neutral principles.
DESIGN
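The first two maxims can be illustrated with a short sketch: given one annotated section, recover the plain transcript on the one hand and the annotation on the other. This is only an illustration in Python's standard xml.etree.ElementTree; the function names speaker_text and annotation_only are hypothetical, and the sample is abridged from the event shown earlier.

```python
import xml.etree.ElementTree as ET


def speaker_text(xml_text):
    """Return the plain transcript of one annotated section, markup removed."""
    event = ET.fromstring(xml_text)
    turns = []
    for speaker in event.iter("speaker"):
        # itertext() collects the text around inline tags such as <break/> and <cut/>.
        raw = "".join(speaker.itertext())
        turns.append(" ".join(raw.split()))
    return "\n".join(turns)


def annotation_only(xml_text):
    """Return the topic metadata on its own, without the transcript."""
    event = ET.fromstring(xml_text)
    return [
        (topic.findtext("topic_title"), topic.findtext("topic_key"))
        for topic in event.iter("topic")
    ]


# Abridged section; the speaker text is shortened for the example.
SECTION = (
    '<event start="0m0" end="1m24">'
    "<topic><topic_title>What we do</topic_title>"
    "<topic_key>02 What we do</topic_key><content_key/></topic>"
    '<speaker name="Dieter">We started with 12 and then 20 caravans, '
    "and now we have about 35.<cut/></speaker></event>"
)

print(speaker_text(SECTION))
print(annotation_only(SECTION))
```

Because the markup is kept apart from the running text, both views come from the same file and neither destroys the other.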
• Presuppositions and foundations: antecedent implications in the literature
• Annotation oriented towards pedagogical uses
EPISTEMOLOGY
• Mukherjee (2006): corpora in language pedagogy for (a) dictionaries and material, (b) database and (c) representative samples of learner language.
EPISTEMOLOGY
• Meunier (2002): methodological influence. Use of classroom concordancing and an inductive approach to learning, leading to a “rehabilitation” of grammar (p. 135)
EPISTEMOLOGY
• Bernardini (2000): inductive and deductive learning, a probabilistic notion of language, and a learning pedagogy that resolves the attention-to-form/meaning dichotomy
EPISTEMOLOGY
• Bernardini (2000):
learners as either researchers or travellers
EPISTEMOLOGY
• Bernardini (2004): potential of corpora as a linguistic aid: favour descriptive insights and discovery learning
EPISTEMOLOGY
• Pérez-Paredes (2003, 2004): integrative paradigm of CL in FLT
EPISTEMOLOGY
TECHNOLOGY
• User-friendly: non-computational linguists
• Multilingual support
• Standard-compliant: reusability and valorisation
Developing Annotation Solutions
From Challenges To Requirements
From a software engineering perspective, development can be viewed as the following process:
Input -> Software Engineering -> Output
From Requirements To Solutions
Input Requirements
• Input = user requirements
• Changing approach = changing requirements
• Identifying new requirements
– Five perspectives: Context, Data, Actors, Epistemology, Empirical
[Diagram: input details and analysis process for the five perspectives]
Actors & Context. Linguistic Engineering vs Pedagogical Engineering
Researching
• Powerful tool
• Research oriented
• Extensible & modular
• Specific domain
• Efficient
• Complexity
• Ad-hoc solutions
• Mandatory
Teaching
• Pedagogic tool
• Learning oriented
• Friendly
• General domain
• Practical
• Simplicity
• Organizational
• Optional
Data. Grammatical vs Pedagogical
Linguistic Engineering
• Large amount of data (representative Corpora)
• Grammatical Annotation
• Oriented to retrieve statistical Information
Learning
• Reduced set of data
• Pedagogical annotation
• Oriented to retrieve learning information (hierarchical structures & selective information)
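As a rough illustration of selective retrieval over a reduced, pedagogically annotated data set, the sketch below picks out only the sections filed under a given topic key. The corpus wrapper element, the second event and the function name sections_by_topic are all invented for this example; only the first event echoes the sample shown earlier, and the actual SACODEYL file layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-corpus: the <corpus> wrapper and the second event are
# invented for illustration; only the first event echoes the earlier sample.
CORPUS = """<corpus>
  <event video="horse_caravanning_ie" duration="1m24">
    <topic><topic_title>What we do</topic_title><topic_key>02 What we do</topic_key><content_key/></topic>
    <speaker name="Dieter">We started with 12 and then 20 caravans.</speaker>
  </event>
  <event video="example_interview" duration="0m40">
    <topic><topic_title>Free time</topic_title><topic_key>05 Free time</topic_key><content_key/></topic>
    <speaker name="Anna">I play football at the weekend.</speaker>
  </event>
</corpus>"""


def sections_by_topic(corpus_xml, topic_key):
    """Return (video, speaker, text) for every section annotated with topic_key."""
    root = ET.fromstring(corpus_xml)
    hits = []
    for event in root.iter("event"):
        keys = [key.text for key in event.iter("topic_key")]
        if topic_key not in keys:
            continue
        for speaker in event.iter("speaker"):
            # Collapse whitespace so the transcript reads as one line.
            text = " ".join("".join(speaker.itertext()).split())
            hits.append((event.get("video"), speaker.get("name"), text))
    return hits


print(sections_by_topic(CORPUS, "05 Free time"))
```

Unlike a frequency count over a large corpus, the query returns whole sections chosen by topic, which is the kind of unit a teacher or learner would work with.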
Epistemological & Empirical
• Multidisciplinary support
• Multilingual support
• Multi-corpus management
• Multi-purpose support
• Based on standards
Choosing Software Life Cycle
Spiral Approach
[Diagram: spiral through Analysis, Design, Implementing and Testing; axes: maturity, time]
Why?
Output. SACODEYL Annotator
SACODEYL Annotator characteristics:
• Pedagogical motivation
• Teaching oriented
• Friendly interface
• Multi-language (UTF)
• Standardization (TEI)
• Multi-purpose
Developing annotation solutions for online data-driven learning
Contact information
Pascual Pérez-Paredes [email protected]
Jose María Alcaraz [email protected]
Universidad de Murcia, Spain