Bridging informal MOOCs & formal English for Academic Purposes programmes with language corpora
DESCRIPTION
Presented at the Teaching and Language Corpora (TaLC) Conference in Lancaster on July 23, 2014. Based on collaborative work with the FLAX Language Project (Shaoqun Wu and Ian Witten) and the Language Centre at Queen Mary University of London (Martin Barge, William Tweddle, Saima Sherazi).
TRANSCRIPT
Bridging informal MOOCs & formal EAP programmes with language corpora
Alannah Fitzgerald, Shaoqun Wu, Ian Witten, Martin Barge, William Tweddle, Saima Sherazi
https://www.flickr.com/photos/library_of_congress/8725417555/
Today’s TaLC Session...
• Development of Tools and Language Corpora
– Design-Based Research with the FLAX Project
• Openness in Corpus-Based Tools, Resources & Practices
• New & Old Contexts of Learning, Teaching & Research with Corpus-Based Approaches
– Bridging Formal & Informal Higher Education with Open Do-It-Yourself ESAP Language Collections
Who are we in this FLAX research & development collaboration?
FLAX Language at Waikato University
http://flax.nzdl.org
FLAX image by Jane Galloway, reused with permission for non-commercial purposes
FLAX Language Project at the Greenstone Digital Library Lab, Waikato University NZ
Professor Ian Witten, FLAX Project Lead
Dr Shaoqun Wu, FLAX Project Lead Researcher & Developer
Data Mining with Weka MOOC
https://www.youtube.com/user/WekaMOOC/videos?sort=p&flow=grid&view=0
OER Research Hypotheses
http://oerresearchhub.org/collaborative-research/hypotheses/
Research with Queen Mary U. of London
http://language-centre.sllf.qmul.ac.uk/home
Openness across formal & informal higher education
MOOCs You May Know
Openness in Mainstream MOOCs?
http://www.michaelbransonsmith.net/blog/2012/12/19/day-of-the-mooc-now-animated/
The End of the University As We Know It
“The future looks like this: Access to college-level education will be free for everyone; the residential college campus will become largely obsolete; tens of thousands of professors will lose their jobs; the bachelor’s degree will become increasingly irrelevant; and ten years from now Harvard will enroll ten million students.” (Harden, 2013)
http://www.the-american-interest.com/article.cfm?piece=1352
The Education Apocalypse: #opened13 Keynote
“Where in the stories we’re telling about the future of education are we seeing salvation? Why would we locate that in technology and not in humans, for example? Why would we locate that in markets and not in communities? What happens when we embrace a narrative about the end-times — about education crisis and education apocalypse? Who’s poised to take advantage of this crisis narrative? Why would we believe a gospel according to artificial intelligence, or according to Harvard Business School [Christensen’s Disruptive Innovation theory], or according to Techcrunch...?” (Watters, 2013)
http://hackeducation.com/2013/11/07/the-education-apocalypse/
Current MOOC Language Issues
• Mainstream MOOCs (Coursera, edX, Udacity) are predominantly in the English Language
– MOOC learners are not registered as language learners
• Impact on retention and course completion
• Crowdsourcing and funding for commercial translations of MOOCs is currently limited
– Translations of lectures only do not assist with assessment requirements in e.g. English-medium MOOCs
• Receptive versus productive language needs
• Mainstream MOOCs do not (in most cases) license content openly as Open Educational Resources (OER)
– Open licensing with Creative Commons is vital for developing derivative resources to support language learning
– Building linguistic support into MOOC learning platforms? e.g. a combination of translation and corpus-based tools?
• Online learning offers a compelling case for corpus-based approaches
Openness in Mainstream EAP??
Be Free to Do Whatever You Want!
• Open Resources for ESAP Soup Dragons:
– Building & Sharing Open ESAP Corpora to Promote DIY Corpus-Based Approaches
– Developing Automated Interactivity into ESAP Corpora
– Developing ESAP Course Book and Lesson Plan Derivatives
– Researching and Developing ESAP Corpora & Derivatives
– Researching and Developing Corpus Tools e.g. Interfaces
http://en.wikipedia.org/wiki/The_Soup_Dragons
Open-source language tools development
Google-esque Interface Designs
Designed for the non-expert corpus user, namely: learners, teachers, subject academics, instructional designers and language resource developers.
Introducing the Wikipedia Miner Toolkit (Milne & Witten, 2013)
Building Interactivity into FLAX Language Collections
FLAX Activities Continued
FLAX Across Platforms
• FLAX Website flax.nzdl.org for hosting open online language collections
• Building directly onto the Web with OER
• FLAX multilingual open-source software for downloading onto your PC
• For offline use
• Building collections out of sight using All Rights Reserved content
• FLAX for MOODLE plug-in
• FLAX for MOOC Platforms?
• FLAX in conjunction with translation technologies?
Training Videos for FLAX on YouTube
http://www.youtube.com/watch?v=fysDzYjbhh0
Domain-specific open language collections building
Collaboration with Subject Specialists
“In the emerging academic literacies approach involving cooperation between subject specialists and writing teachers, the aim is to help the students develop metacognitive awareness of the roles and functions of writing in that discipline, to enable them to stand back from it and observe how it functions, and then to help them gradually participate in the genres, where genre is understood as a constellation of actions rather than a list of formal features.” (Breeze, 2012)
Earth’s Virology Professor with Coursera MOOCs
“Natural science might be characterized as a discipline of discovery, identifying and describing entities that had not been previously considered. As a result, natural science employs a large set of highly technical words, like dextrinoid, electrophoresis, and phallotoxins. Most of these words do not have commonplace synonyms, because they refer to entities, characteristics, or concepts that are not normally discussed in everyday conversation.” (Biber, 2006)
Virology Language Collection in FLAX
Type of media in the FLAX Virology Collection | Number of items in the FLAX Virology Collection
Podcast audio transcripts (This Week in Virology) | 130
YouTube video transcripts (2013 virology course at Columbia, also in Coursera) | 110
Academic blog posts (Virology Blog) | 540
Open Access research articles (relevant to virology course and divided into paper sections) | 40
Streaming Open Lectures/Podcasts
Virology Collocations
Virology Terms and Concept Support
Domain-specific Collocations
We focus on lexical collocations with noun-based structures because they are the most salient and important patterns in topic-specific text:
• verb + noun e.g. detect virus particles
• noun + noun e.g. tobacco mosaic virus
• adjective + noun e.g. negative strand virus
• noun + of + noun e.g. genome of the virus
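The four noun-based patterns above can be sketched as a simple match over part-of-speech-tagged text. This is only an illustration of the idea, not a description of how FLAX extracts collocations: the tag names (VERB, NOUN, ADJ), the `PATTERNS` table, the function names and the hand-tagged sample are all assumptions made for this sketch, and determiners (as in "genome of the virus") are not handled.

```python
from collections import Counter

# Simplified part-of-speech patterns for the four noun-based collocation types.
# The tag names (VERB, NOUN, ADJ) are an assumption for this sketch, not FLAX's tagset.
PATTERNS = {
    "verb + noun": ("VERB", "NOUN"),
    "noun + noun": ("NOUN", "NOUN"),
    "adjective + noun": ("ADJ", "NOUN"),
    "noun + of + noun": ("NOUN", "of", "NOUN"),
}


def matches(window, pattern):
    """Return True if each (word, tag) pair fits its pattern slot.

    A slot written in upper case is compared with the tag; a lower-case
    slot such as "of" is compared with the word itself.
    """
    for (word, tag), slot in zip(window, pattern):
        if slot.isupper():
            if tag != slot:
                return False
        elif word.lower() != slot:
            return False
    return True


def extract_collocations(tagged_tokens):
    """Count adjacent word sequences that match one of the PATTERNS."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        size = len(pattern)
        for start in range(len(tagged_tokens) - size + 1):
            window = tagged_tokens[start:start + size]
            if matches(window, pattern):
                phrase = " ".join(word.lower() for word, _ in window)
                counts[(name, phrase)] += 1
    return counts


# Hand-tagged toy input; a real pipeline would use a part-of-speech tagger.
sample = [
    ("Researchers", "NOUN"), ("detect", "VERB"), ("virus", "NOUN"),
    ("particles", "NOUN"), ("in", "ADP"), ("the", "DET"),
    ("genome", "NOUN"), ("of", "ADP"), ("viruses", "NOUN"),
]
collocations = extract_collocations(sample)
```

Matching on tags rather than on fixed word lists is what lets the same four patterns surface topic-specific vocabulary from any collection, whether virology or law.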
Lexical Bundles
“Lexical bundles” are multi-word sequences with distinctive syntactic patterns and discourse functions that are commonly used in academic prose (Biber & Barbieri, 2007; Biber et al., 2003, 2004).
Typical patterns in the virology MOOC lectures include:
• noun phrase + of e.g. a DNA copy of
• prepositional phrase + of e.g. at the end of
• it + verb/adjective phrase e.g. it turns out that
• be + noun/adjective phrase e.g. is an example of
• verb phrase + that e.g. you can see that
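One common way to find candidate bundles is to count recurring word sequences of a fixed length and keep those that appear often and in more than one text. The sketch below illustrates that idea under stated assumptions: the function names, the thresholds `min_count` and `min_texts`, and the three invented sentences standing in for lecture transcripts are hypothetical and are not taken from the FLAX project.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lexical_bundles(texts, n=4, min_count=2, min_texts=2):
    """Return n-word sequences that recur across several texts.

    min_count and min_texts are illustrative thresholds chosen for this
    toy example, not values taken from the FLAX project.
    """
    counts = Counter()
    text_ids = defaultdict(set)
    for text_id, text in enumerate(texts):
        tokens = tokenize(text)
        for start in range(len(tokens) - n + 1):
            gram = " ".join(tokens[start:start + n])
            counts[gram] += 1
            text_ids[gram].add(text_id)
    return {
        gram: count
        for gram, count in counts.items()
        if count >= min_count and len(text_ids[gram]) >= min_texts
    }


# Invented sentences standing in for lecture transcripts.
lectures = [
    "It turns out that the genome is small. You can see that here.",
    "It turns out that this is an example of a DNA copy of the genome.",
    "At the end of the cycle it turns out that the virus escapes.",
]
bundles = lexical_bundles(lectures, n=4)
```

On these toy sentences the function keeps "it turns out that" and the overlapping "turns out that the". Requiring a sequence to occur in several texts helps filter out phrases that are frequent only because one lecture repeats them.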
ESAP Law Collections in FLAX at QMUL
Type of media in the FLAX Law Collections | Number and source of items in the FLAX Law Collections
Podcast audio files & transcripts (OpenSpires) | 10-15 Lectures (Oxford Law Faculty & the Centre for Socio-Legal Studies)
MOOC lecture transcripts & videos (streamed via YouTube & Vimeo) | 4 MOOC Collections: Copyright Law (Harvard/edX), English Common Law (Uni. of London/Coursera), Age of Globalization (Texas at Austin/edX), Environmental Law & Politics (OpenYale)
Student PhD thesis writing & Pre-sessional for Law ESAP essays | 10-20 EThOS Theses at the British Library; 20+ Essays from QMUL Law Pre-sessional
British Law Report Corpus (BLaRC) (Marin, 2012) | 8-million word corpus derived from freely available content on the BAILII website
Open Access research articles (relevant to QMUL Law Pre- and In-Sessional language courses) | 40 Articles (DOAJ - Directory of Open Access Journals)
Working with Full Texts
Collocations Within ESAP Collections
Linking to the FLAX Learning Collocations Collection (BNC, BAWE, Wikipedia)
Good Ol’ Part-Of-Speech Tagging
Wikify Your Collections
Lexical Bundles
FLAX HTML Formatting Tool
Researching resources at the interface of openness for academic English
Key Data Sets Will Consist Of:
• Online survey data
– MOOC learners for evaluation of collections
– Language Teaching professionals on perceptions of OER
• Offline data for evaluation of collections and course book derivatives of the collections for ESAP
– Survey and Think-Aloud Protocols to evaluate the FLAX Language System
– Student texts from Law students (Queen Mary University of London).
• Interview and focus-group data (f2f and online via Skype)
– With stakeholders (language teachers, academics, MOOC providers) involved in the development of the academic language collections used in this research.
Interfacing Communities
http://videolectures.net/ocwc2014_fitzgerald_resources/
FLAX Multilingual Open-Source Software
http://videolectures.net/ocwc2014_fitzgerald_multilingual_world/
References
• Biber, D., Conrad, S., & Cortes, V. (2003). Lexical bundles in speech and writing: an initial taxonomy. In A. Wilson, P. Rayson, & T. McEnery (Eds.), Corpus linguistics by the Lune: A festschrift for Geoffrey Leech (pp. 71–92). Frankfurt/Main: Peter Lang.
• Biber, D., Conrad, S., & Cortes, V. (2004). If you look at . . .: lexical bundles in university teaching and textbooks. Applied Linguistics, 25, 371–405.
• Biber, D. (2006). University Language: A corpus-based study of spoken and written registers. Amsterdam: John Benjamins.
• Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific Purposes, 26, 263–286.
• Breeze, R. (2012). Rethinking Academic Writing Pedagogy for the European University. Amsterdam: Rodopi.
• Harden, N. (2013). The end of the university as we know it. The American Interest. Retrieved from http://www.the-american-interest.com/article.cfm?piece=1352
• Milne, D. & Witten, I.H. (2013). An open-source toolkit for mining Wikipedia. Artificial Intelligence, 194, 222-239.
• Watters, A. (2013). The Education Apocalypse #opened13. Retrieved from http://www.hackeducation.com/2013/11/07/the-education-apocalypse/
Thank You
FLAX Language Project http://flax.nzdl.org/
Shaoqun Wu: [email protected] / Ian Witten: [email protected]
OER Research Hub http://oerresearchhub.org/
Alannah Fitzgerald: [email protected]; @AlannahFitz; www.alannahfitzgerald.org TOETOE Blog; Slideshare: http://www.slideshare.net/AlannahOpenEd/
The Language Centre – Queen Mary University of London http://language-centre.sllf.qmul.ac.uk/
Martin Barge [email protected] / William Tweddle [email protected] / Saima Sherazi [email protected]