Introducing Corpus Linguistics: AntConc and Project Gutenberg. Dr Glenn Hadikin

Upload: grant-smith

Post on 22-Dec-2015

• Download two magazines
• Conduct a ‘keyword’ query

What is corpus linguistics?

Corpus linguistics is the study of large bodies of naturally occurring text that are ‘visible’ to corpus analysis software.

• https://www.gutenberg.org

When you see this, press Ctrl+A to select all the text and then Ctrl+C to copy it.

Open WordPad and press Ctrl+V to paste all the text into it.

Press ‘save as’, choose ‘plain text’ and give it a filename such as boysandgirls.txt

• That’s how I got the boysandgirls.txt file on the website.

• The girls.txt file was made by the same procedure but is a copy of ‘The Girl’s Own Paper’ from 1886
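
The copy, paste and ‘save as plain text’ steps can also be sketched in Python. This is only an illustration, not how the files on the website were made. It assumes the pasted text contains Project Gutenberg's usual "*** START OF" and "*** END OF" marker lines; the function names are hypothetical.

```python
def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the Project Gutenberg START and END marker lines.

    Assumption: the markers begin with "*** START OF" and "*** END OF".
    If a marker is missing, that end of the text is left untouched.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1      # the body begins on the line after the marker
        elif line.startswith("*** END OF"):
            end = i            # the body stops just before the marker
            break
    return "\n".join(lines[start:end]).strip()


def save_plain_text(text: str, filename: str) -> None:
    """Write the text as a UTF-8 plain-text file that a corpus tool can open."""
    with open(filename, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Removing the licence text first keeps words such as ‘Gutenberg’ out of the later word counts. Calling save_plain_text(body, "boysandgirls.txt") would then mirror the ‘save as’ step.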

Go to ‘file’ and open ‘boysandgirls.txt’

You can type any common word into the search box at the bottom to check that it is working.
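
For anyone curious what a concordance search does behind the scenes, here is a minimal keyword-in-context sketch in Python. It is not AntConc's own code; the function name and the default width of 30 characters are assumptions made for illustration.

```python
import re


def concordance(text: str, word: str, width: int = 30) -> list[str]:
    """Return one keyword-in-context line per whole-word, case-insensitive match."""
    flat = " ".join(text.split())   # collapse line breaks so each context reads as one line
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the search word lines up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines
```

Printing each returned line for a common word such as ‘the’ gives a rough equivalent of the concordance window.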

Go to ‘tool preferences’, ‘add files’, upload ‘girls.txt’ and press ‘load’ – this is called a reference file

Before any keyword analysis you must create a ‘wordlist’
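
A word list is simply a count of how often each word occurs. A minimal Python sketch follows, assuming that a word is a run of letters (with straight apostrophes allowed inside) and that upper and lower case are merged. AntConc's own token definition may differ, so counts need not match exactly.

```python
import re
from collections import Counter


def word_list(text: str) -> Counter:
    """Count lower-cased word tokens (runs of letters, apostrophes allowed inside)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)
```

Calling word_list(text).most_common(20) lists the twenty most frequent words, which in most English texts will be led by function words such as ‘the’.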

• Any guesses what words or ideas will be key in ‘boysandgirls’ compared with ‘girls’?
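
A keyword is a word that is unusually frequent in one corpus compared with a reference corpus. The sketch below uses the log-likelihood statistic, a common keyness measure. Whether it matches the setting in a particular AntConc installation is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Keyness of a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:                                  # a zero count contributes nothing
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(target: Counter, reference: Counter, top: int = 10) -> list[tuple[str, float]]:
    """Rank words that are relatively more frequent in target than in reference."""
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference[word]                    # Counter returns 0 for unseen words
        if a / c > b / d:                      # keep only words over-represented in the target
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]
```

Here target and reference would be word counts for ‘boysandgirls’ and ‘girls’. Passing them in the opposite order gives the words that are key in ‘girls’ instead.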

Click on a word to explore further…

You can go back to ‘tool preferences’ and press ‘swap’ to reverse the comparison, so that ‘girls’ becomes the target and ‘boysandgirls’ the reference.

Check there are 1117 occurrences of ‘the’ to make sure the files have swapped correctly.

If the ‘boysandgirls’ keyword list comes back (with ‘illustrated’ at the top), go back to ‘tool preferences’, clear and reload the ‘boysandgirls’ reference corpus.

• Would similar patterns come up in 21st century books?

Thank you – all invited to our book launch in Blackwells book shop tomorrow at 5pm.