Building and analysing your own corpus 1. Building a corpus.

Upload: horace-gray

Post on 18-Jan-2016

Building and analysing your own corpus

1. Building a corpus.

Why bother with corpora?

• “Language users cannot accurately report language usage, even their own” (Sinclair, 1987)

• “Using a language is a skill that most people are not conscious of; they cannot examine it in detail, but simply use it to communicate” (Sinclair, 1995)

• “There are many facts about language that cannot be discovered by just thinking about it, or even reading and listening very intently” (Sinclair, 1995)

• As language teachers and professionals, we often have strong intuitions about language use… Corpus-based research, however, shows us that our intuitions are often completely wrong. (Biber, 2005)

• There are many free online corpora like COCA or COHA, but you could also build your own corpus.

1. Building a corpus.

• You can collect data from a variety of sources, but the most important thing to remember is that you need to save it in plain text (.txt) format.

• It also needs to be fairly big to make the corpus analysis worthwhile (I would recommend at least 100,000 tokens).
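
To check whether a collection of .txt files reaches the suggested size, a rough token count is enough. The sketch below is only an approximation: the regular expression is an assumed definition of a token (AntConc's own token definition is configurable and may give a slightly different figure), and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed token definition: runs of letters or digits, optionally joined
# by an apostrophe or hyphen (so "don't" and "well-known" count as one token).
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*")


def count_tokens(text: str) -> int:
    """Count the tokens in a string using the assumed pattern above."""
    return len(TOKEN_PATTERN.findall(text))


def corpus_size(folder: str) -> int:
    """Add up the token counts of every .txt file directly inside a folder."""
    total = 0
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the count going if a file has odd encoding.
        total += count_tokens(path.read_text(encoding="utf-8", errors="replace"))
    return total
```

Calling `corpus_size("my_corpus")` (a hypothetical folder name) returns a single number that can be compared with the 100,000-token guideline.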

White House briefings

• Transcripts of the press conferences

• http://www.whitehouse.gov/briefing-room/press-briefings

The Brown family

• Part of the Brown family of corpora (which includes Brown, Frown, LOB, FLOB and BE06)

• http://www.Helsinki.fi/varieng/CoRD/corpora/index.html

International Corpus of English

• ICE

• Twenty-four research teams preparing electronic corpora of their own national or regional variety of English for comparative purposes (e.g. Indian English, Australian English, South African English)

• http://ice-corpora.net/ice/index.htm

Corpora galore

• Learner corpora

• Courtroom discourse

• Academic English

• Specialised small corpora:

• RIP

• Sex education

• Where you collect your data from will depend on the type of corpus that you need to create, but the principles remain the same.

Downloading a newspaper

• 1. Go to a database like LexisNexis or Westlaw

• 2. Check which newspapers are available (be careful, the listings can be misleading: Westlaw claims to have Corriere but actually has only the articles that have been translated into English)

• 3. Choose the newspaper that you want

• 4. Use the name of the newspaper as the search term

• 5. If there is an option to remove duplicates, select it

• 6. Choose one day only for the date range

• 7. Download that day's articles and save as .txt.

• To download the articles, click on the save icon on the right of the screen, which will open another window. Make sure that you download all the articles in text format.

• 8. If there are more than 500 articles for one day, you will have to download them in batches: 1-500, then 501-1000 (or whatever the maximum is).

• Click on the link to open and save your new file.

• 9. Remember to save the file with a sensible name that includes the paper and date, e.g. GUA20121015 (for the Guardian from 15 Oct 2012).
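
The naming pattern in step 9 (a short paper code followed by the date as YYYYMMDD) can be generated rather than typed, which helps avoid typos across many days. This is a minimal sketch: the function name, the `.txt` extension in the result, and the `_1`, `_2` suffix for days split into several downloads are assumptions, not part of the original instructions.

```python
from datetime import date
from typing import Optional


def corpus_filename(paper_code: str, day: date, part: Optional[int] = None) -> str:
    """Build a name such as GUA20121015.txt from a paper code and a date."""
    name = f"{paper_code.upper()}{day:%Y%m%d}"
    if part is not None:
        # Assumed convention for days downloaded in batches (1-500, 501-1000, ...).
        name += f"_{part}"
    return name + ".txt"


print(corpus_filename("GUA", date(2012, 10, 15)))          # GUA20121015.txt
print(corpus_filename("GUA", date(2012, 10, 15), part=2))  # GUA20121015_2.txt
```

Because the date is written year first, the files sort chronologically in an ordinary alphabetical file listing.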

c. Building a corpus of fiction language from Project Gutenberg

• Project Gutenberg contains about 30,000 books which are no longer bound by copyright restrictions.

• This could be very useful if you wanted to look at different time periods, or different genres e.g. children’s writing.

• Go to http://www.gutenberg.org

• Think of a book you would like to download, for instance The Princess and the Goblin. Type its title into the search box on the left.

• Scroll down the page to select a text only format and click to open.

• The text file will open within your browser.

• Copy and paste into either WordPad (look under ‘programs’ then ‘accessories’) or a Word document. Remember to save as .txt.

• There is a large introductory section at the beginning of the file which could skew your results.

• In order to tell AntConc to ignore this, you will have to enclose it in angle brackets: <The Project Gutenberg Etext of blah blah blah>
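
For more than a handful of books, the bracketing can be scripted. The sketch below follows the advice above by wrapping everything up to a marker line in one pair of angle brackets. The function name and the default marker are assumptions: many Project Gutenberg files contain a line beginning `*** START OF`, but the wording varies between files, so check each file and pass a different marker where needed. Any angle brackets already inside the header are removed so that the new pair stays balanced.

```python
def wrap_header(text: str, marker: str = "*** START OF") -> str:
    """Enclose everything up to and including the marker line in angle brackets."""
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if marker in line:
            header = "".join(lines[: i + 1])
            body = "".join(lines[i + 1:])
            # Drop existing angle brackets (e.g. around e-mail addresses)
            # so the header contains exactly one opening and one closing bracket.
            header = header.replace("<", "").replace(">", "")
            return "<" + header.rstrip("\r\n") + ">\n" + body
    # Marker not found: return the text unchanged rather than guess.
    return text
```

If the marker is missing the text comes back untouched, so it is worth opening a few output files to confirm that the header really has been bracketed before loading them into AntConc.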

• Save your document as text with a sensible name, e.g. ‘PrincessGoblin’, and make sure it is saved somewhere that you can find it easily.

Class task

• Academic English: research paper introductions.

• Go to the CL 2015 abstract book.

• Copy the introduction paragraph and save it in text format.

• You need to decide how to label the files.

• Collect as many as you can.

Introduction to research

• What phraseologies can you discover from a corpus of the introductions to research papers?

• You can use this to help you write your own abstract for your project.
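
One simple way to start looking for recurring phraseology is to count word sequences (n-grams) in the collected introductions. The sketch below is an illustration only, not the method used in class: the function name and the tokenising pattern are assumptions, and the counts ignore sentence boundaries, so some sequences will span two sentences.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int = 3) -> Counter:
    """Count every sequence of n consecutive words in a lower-cased text."""
    # Assumed token definition: letters, with an optional apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # zip over n shifted copies of the token list yields each n-word window.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


sample = "This paper aims to show. This paper aims to explore."
for phrase, freq in ngram_counts(sample).most_common(2):
    print(freq, phrase)
```

On a real corpus, the most frequent three- and four-word sequences are a reasonable starting list of candidate phrases to examine in a concordancer.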