![Page 1: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/1.jpg)
Corpus Linguistics and Stylistics
PALA Summer School, Maribor, 2014
![Page 2: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/2.jpg)
In this lecture...
• Stylistics and style
• Combining stylistics + corpus linguistics
• Examples of studies combining corpus linguistics and stylistics
  – Analysis of genres
  – Analysis of the works by particular authors
  – Analysis of individual texts
  – Analysis of variation inside texts
• Corpus Tools
  – WMatrix
![Page 3: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/3.jpg)
Stylistics
Stylistics is the study of literature using methods, theories and concepts from linguistics (Leech and Short 2007: 1).
It is “[...] the study of the relationship between linguistic form and literary function [...]” (Leech and Short 2007: 3).
![Page 4: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/4.jpg)
Linguistic style
‘Style is a way in which language is used’ (Leech and Short 2007: 31)
‘[S]tyle consists in choices made from the repertoire of the language.’ (Leech and Short 2007: 31)
![Page 5: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/5.jpg)
Linguistic style
‘Stylistic choice is limited to those aspects of linguistic choice which concern alternative ways of rendering the same subject matter’ (Leech and Short 2007: 31)
e.g. horse vs. steed but not horse vs. dog
![Page 6: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/6.jpg)
Linguistic style
• Style and genre, e.g. science fiction, romance novels, etc.
• Style and author
• Style and text
• Style and parts of texts (e.g. the narration or speech of different characters)
![Page 7: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/7.jpg)
Ways of analysing style
• Analyst’s intuitions
• ‘Manual’ comparative analysis
![Page 8: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/8.jpg)
Ways of analysing style
Style and comparison
‘Even if style is defined as that variety of language which correlates with context, the recognition and analysis of styles are squarely based on comparison. The essence of variation, and thus of style, is difference, and differences cannot be analysed and described without comparison.’ (Enkvist 1973: 21)
![Page 9: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/9.jpg)
Ways of analysing style
• Comparative analysis – manually
  – OK for shorter texts/extracts
• Comparative analysis – using computers:
  – Corpus linguistic methods/tools
  – Especially useful for longer texts – prose fiction
![Page 10: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/10.jpg)
Combining corpus linguistics and stylistics
• The ‘corpus turn’ (Leech and Short 2007: 284).
• On-going trend in stylistics to use methods and tools from corpus linguistics for the analysis of literary and other texts.
• Usually referred to as corpus stylistics
• Other terms:
  – digital stylistics (Louw 2008)
  – electronic text analysis (Adolphs 2006)
![Page 11: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/11.jpg)
Examples of studies
• Combining corpus linguistics and stylistics
  – Analysis of genres
  – Analysis of the works by particular authors
  – Analysis of individual texts
  – Analysis of variation inside texts
![Page 12: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/12.jpg)
Genre style
• Biber (1988) – multivariate statistical techniques
  – factor analysis
  – many different variables
  – variables = linguistic features (e.g. passive constructions)
• e.g. narrative versus non-narrative texts
  – important variables = past tense verbs, 3rd person pronouns, perfect aspect, present participle clauses
  – High scores = narrative
  – Low scores = non-narrative
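As a rough illustration of how a score on such a dimension might be obtained, the sketch below standardises each feature across texts and sums the results. It is a simplification and not Biber's full procedure: the feature rates (per 1,000 words) and the text names are invented, and only the four features listed above are used.

```python
from statistics import mean, stdev

# Features named on the slide; the order matches the value lists below.
FEATURES = ["past_tense", "third_person_pronouns",
            "perfect_aspect", "present_participle_clauses"]

# Hypothetical rates per 1,000 words (invented for illustration only).
texts = {
    "fiction_sample": [80.0, 60.0, 12.0, 3.0],
    "news_sample": [45.0, 30.0, 9.0, 1.5],
    "academic_sample": [20.0, 10.0, 5.0, 1.0],
}

def dimension_scores(texts):
    """Sum each text's z-scores over all features."""
    names = list(texts)
    # Regroup the data so that each tuple holds one feature across all texts.
    columns = list(zip(*(texts[name] for name in names)))
    scores = {name: 0.0 for name in names}
    for column in columns:
        mu, sd = mean(column), stdev(column)
        for name, value in zip(names, column):
            scores[name] += (value - mu) / sd
    return scores

scores = dimension_scores(texts)
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```

With these invented figures the fiction sample ends up with the highest score and the academic sample with the lowest, which is the "high = narrative, low = non-narrative" reading of the dimension.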
![Page 13: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/13.jpg)
A range not a dichotomy
[Figure: text-types ranked along the narrative / non-narrative dimension, with annotations marking the top and the bottom text-types]
There exists a whole range of text-types in the middle – it’s not just a two-way distinction.
Note also: spoken and written genres are mixed together along the dimension.
![Page 14: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/14.jpg)
Genre style – direct speech
Corpus-based study of speech, writing and thought presentation (Semino and Short 2004)
![Page 15: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/15.jpg)
Genre style – direct speech
Corpus of 260,000 (approx) words of (late) 20th century written British English
• 120 text samples
• 2,000 (approx) words each, amounting to a total of 258,348 words
The corpus is divided into three sections.
![Page 16: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/16.jpg)
Genre style – direct speech
Corpus divided into three sections:
– prose fiction (87,709 words)
– newspaper news reports (83,603 words)
– biography and autobiography (87,036 words)
Each genre section is further divided into ‘serious’ and ‘popular’ sub-sections.
![Page 17: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/17.jpg)
Genre style – direct speech
• Corpus tagged – manually
<sptag cat=NRS next=DS s=0.37 w=7>The theme park’s manager, Mike Slattery said: <sptag cat=DS next=NRS s=1.63 w=18>‘By closing Crinkley Bottom, the council has shot Morecambe in the foot. And I’m out of a job.’
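A minimal sketch of how annotation in this format could be read back by a script. The attribute names (`cat`, `next`, `s`, `w`) come from the example above; the regular expression, the assumption that attributes always appear in this order, and the function name `parse_sptags` are illustrative and not part of the original project's tooling. The `s` value is kept as-is without interpretation.

```python
import re

# The example line from the slide, containing two annotated stretches.
annotated = (
    "<sptag cat=NRS next=DS s=0.37 w=7>The theme park’s manager, Mike Slattery said: "
    "<sptag cat=DS next=NRS s=1.63 w=18>‘By closing Crinkley Bottom, the council has shot "
    "Morecambe in the foot. And I’m out of a job.’"
)

# Assumed: every tag carries these four attributes in this fixed order.
TAG = re.compile(r"<sptag cat=(\w+) next=(\w+) s=([\d.]+) w=(\d+)>")

def parse_sptags(text):
    """Return one record per tag with its category and the text it covers."""
    matches = list(TAG.finditer(text))
    records = []
    for i, match in enumerate(matches):
        # A tagged stretch runs until the next tag or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        records.append({
            "cat": match.group(1),
            "next": match.group(2),
            "s": float(match.group(3)),
            "w": int(match.group(4)),
            "text": text[match.end():end].strip(),
        })
    return records

records = parse_sptags(annotated)
for record in records:
    # Compare the declared word count with a simple whitespace token count.
    print(record["cat"], record["w"], len(record["text"].split()))
```

For this example the declared `w` values agree with a whitespace token count of each stretch, which makes a simple consistency check on manual tagging possible.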
![Page 18: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/18.jpg)
Genre style – direct speech
| Section of the corpus | Number of instances of DS |
|---|---|
| Whole corpus | 2,974 |
| Fiction | 1,569 |
| Press | 770 |
| (Auto)biography | 635 |

| Fiction sub-section | Number of instances of DS |
|---|---|
| Serious | 629 |
| Popular | 940 |
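Because the three sections are not exactly the same size, the raw counts can be put on a common footing by normalising them per 1,000 words. The sketch below uses only the section sizes and DS counts given on the slides; the normalisation itself is added here for illustration and is not taken from Semino and Short's presentation of the results.

```python
# Figures copied from the slides: section sizes in words and DS counts.
section_words = {"Fiction": 87_709, "Press": 83_603, "(Auto)biography": 87_036}
ds_counts = {"Fiction": 1_569, "Press": 770, "(Auto)biography": 635}

def per_thousand(count, words):
    """Normalised frequency: occurrences per 1,000 words."""
    return count / words * 1000

rates = {name: per_thousand(ds_counts[name], section_words[name])
         for name in section_words}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} DS instances per 1,000 words")
```

On these figures fiction has roughly 17.9 instances of DS per 1,000 words, against roughly 9.2 for press and 7.3 for (auto)biography.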
![Page 19: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/19.jpg)
Authorial style
• Studies attempting to ‘fingerprint’ authors: i.e. to identify linguistic items that distinguish the works by one author from those of others.
• Burrows (1987): study of Jane Austen’s novels focusing on closed-class words, such as the, and, of, a and to.
• Burrows found that these words can distinguish the works of different authors, different novels, and even the words spoken by different characters.
![Page 20: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/20.jpg)
Authorial style
• Hoover (2002) studied a series of corpora containing chunks from novels by different authors.
• For example, he looked at a corpus containing the first 30,000 words of 29 novels by 17 different authors.
• The distribution of the 300 most frequent words in the corpus as a whole correctly clusters the novels of 15 out of the 17 authors.
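The general idea of comparing texts by their most frequent words can be sketched as follows. This is not Hoover's procedure: a nearest-neighbour comparison with Manhattan distance stands in for cluster analysis, the four "texts" are invented one-line samples, and all function names are hypothetical.

```python
from collections import Counter

def top_words(texts, n):
    """The n most frequent words in the corpus as a whole."""
    total = Counter()
    for tokens in texts.values():
        total.update(tokens)
    return [word for word, _ in total.most_common(n)]

def profile(tokens, vocab):
    """Relative frequency of each vocabulary word in one text."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocab]

def manhattan(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

def nearest_neighbours(texts, n):
    """For each text, name the other text with the closest profile."""
    vocab = top_words(texts, n)
    profiles = {name: profile(tokens, vocab) for name, tokens in texts.items()}
    result = {}
    for name, p in profiles.items():
        others = [(manhattan(p, q), other)
                  for other, q in profiles.items() if other != name]
        result[name] = min(others)[1]
    return result

# Invented samples; real studies use tens of thousands of words per text.
texts = {
    "author_a_1": "the man and the dog and the cat and the bird".split(),
    "author_a_2": "the boy and the girl and the horse and the cow".split(),
    "author_b_1": "of course it was a matter of time it was said".split(),
    "author_b_2": "of late it was a question of taste it was clear".split(),
}

for name, neighbour in nearest_neighbours(texts, n=5).items():
    print(name, "->", neighbour)
```

In this toy setup each sample is paired with the other sample by the same invented author, purely on the basis of high-frequency function words.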
![Page 21: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/21.jpg)
Authorial style
• An analysis of the most frequent word sequences (n-grams) can also be useful, e.g.
  – of the
  – in the
  – to the
  – it was
  – he was
  – and the
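Counting frequent word sequences needs very little code. The sketch below assumes simple lower-casing and whitespace tokenisation, which is cruder than what dedicated corpus tools do; the function names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous sequences of n tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def most_frequent_ngrams(text, n=2, top=5):
    """The most frequent n-grams in a text, with their raw frequencies."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n)).most_common(top)

sample = "it was the best of times it was the worst of times"
for gram, freq in most_frequent_ngrams(sample, n=2, top=3):
    print(" ".join(gram), freq)
```

Setting `n=3` or higher gives longer sequences of the kind usually called clusters.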
![Page 22: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/22.jpg)
Authorial style
• Mahlberg (2007, 2009, 2012)
• Corpus stylistics and Dickens’s fiction
• Also shows that analysis of frequent word sequences (clusters) can be useful.
• Clusters containing body parts
  – “his hands in his pockets”
  – “his head on one side”
  – “his hands upon his”
![Page 23: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/23.jpg)
Text style
• Stubbs’s (2005) study of Joseph Conrad’s Heart of Darkness, first published in 1899.
• Marlow, the protagonist and first-person narrator, tells of how he was contracted to travel up a river in the Belgian Congo, in order to find an ivory trader called Kurtz, who was the subject of stories of madness and suspect practices. However, Kurtz dies while travelling back down the river.
![Page 24: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/24.jpg)
Text style
• Main themes
  – ‘hypocrisy of the colonizers’
  – ‘unreliability of progress and civilization’
  – ‘breakdowns in communication’
  – Light vs. dark
  – Restraint vs. frenzy
  – Appearance vs. reality
  – Marlow’s ‘unreliable and distorted knowledge’
(Stubbs 2005: 8-9)
![Page 25: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/25.jpg)
Text style
• Used WordSmith Tools (Scott 2007)
• Compared one novel with a corpus of fictional texts of around 700,000 words
• Overused words in novel include: seemed, mystery, darkness, absurd, horror, terror, desolation
• Several words concern uncertainty, perception and knowledge.
• Coincide with some of the novel’s themes
![Page 26: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/26.jpg)
Text style
• Stubbs shows how the application of corpus methods can provide:
  – further justification for well-established interpretations,
  – new insights into the language and meaning potential of the text.
![Page 27: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/27.jpg)
Text style: variation inside texts
• Culpeper (2002) used WordSmith Tools to do a key-word analysis of the speech of the main characters in Romeo and Juliet
• A file with the words spoken by each character was compared to a ‘reference corpus’ containing the words of all the other characters.
• Findings are relevant to an understanding of how the characters are linguistically constructed (characterisation).
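A key-word comparison of this kind can be sketched with the log-likelihood statistic, one of the measures commonly used for keyness. Which statistic and thresholds were used in the study is not stated on the slide, so this choice is an assumption; the token lists below are invented and the function names are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness of a word seen a times in a target text of c words
    and b times in a reference corpus of d words."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, min_freq=2):
    """Words relatively more frequent in the target, ranked by keyness."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    rows = []
    for word, a in target.items():
        b = reference[word]
        # Keep only reasonably frequent words that are overused in the target.
        if a >= min_freq and a / c > b / d:
            rows.append((word, a, log_likelihood(a, b, c, d)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Invented toy data standing in for one character's speech and everyone else's.
character = "if or if yet i fear if or".split()
others = "the and the of to the and a".split()

for word, freq, score in keywords(character, others):
    print(f"{word} ({freq}) LL={score:.2f}")
```

The output is a ranked list with raw frequencies in brackets, in the same shape as a key-word list for a character; interpreting why the words are key remains a manual task.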
![Page 28: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/28.jpg)
Text style: variation inside texts
Juliet’s key-words (raw frequencies in brackets):
If (31), Or (25), Sweet (16), Be (59), News (9), My (92), Night (27), I (138), Would (20), Yet (18), Thou (71), Words (5), Name (11), Nurse (20), Tybalt’s (6), Send (7), Husband (7), That (82), Swear (5)
![Page 29: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/29.jpg)
Text style: variation inside texts
Key-words such as if, or, would, yet can be related to Juliet’s tendency to express uncertainty and anxiety throughout the play:
‘I fear it is: and yet, methinks, it should not, For he hath still been tried a holy man’ (IV.iii.)
[Context: Wondering whether the Friar has supplied sleeping potion or poison]
![Page 30: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/30.jpg)
Corpus tools
Corpus tools make comparison relatively easy
• WordSmith Tools (Scott 2007)
• WMatrix (Rayson 2009)
• AntConc (Anthony 2011)
• MLCT (Piao)
![Page 31: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/31.jpg)
Summary
• Style is the way in which language is used.
• The notion of ‘style’ is fundamentally based on comparison.
• Corpus linguistic methods are relevant to the analysis of style in fiction/literature.
• They have been applied to the analysis of genres, authors and texts.
• Manual analysis and interpretation of the output from corpus tools is needed.
![Page 32: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/32.jpg)
Summary
[...] ‘corpus stylistics’ is not purely a quantitative study of literature. Rather, it is still a qualitative stylistic approach to the study of the language of literature, combined with or supported by corpus-based quantitative methods and technology. (Ho 2011: 10)
![Page 33: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/33.jpg)
References
Culpeper, J. (2009) “Keyness: words, parts-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet”, International Journal of Corpus Linguistics, 14(1): 29-59.
Ho, Y. (2011) Corpus Stylistics in Principles and Practice: A Stylistic Exploration of John Fowles’ The Magus. London: Continuum.
Leech, G. (2008) Language in Literature: Style and Foregrounding. Harlow, UK: Pearson.
Louw, B. (2008) “Consolidating empirical method in data-assisted stylistics: Towards a corpus-attested glossary of literary terms”, in Zyngier, S., Bortolussi, M., Chesnokova, A. and Auracher, J. (eds.) Directions in Empirical Literary Studies, pp. 243-264. Amsterdam: Benjamins.
Mahlberg, M. (2007) “Clusters, key clusters and local textual functions in Dickens”, Corpora 2(1): 1-31.
Mahlberg, M. (2009) “Corpus Stylistics and the Pickwickian watering-pot”, in Baker, P. (ed.) Contemporary Corpus Linguistics, pp. 47-63. London: Continuum.
Mahlberg, M. (2012) Corpus Stylistics and Dickens’s Fiction. London: Routledge.
McIntyre, D. (2010) “Dialogue and Characterization in Quentin Tarantino’s Reservoir Dogs: A Corpus Stylistic Analysis”, in McIntyre, D. and Busse, B. (eds.) Language and Style, pp. 162-182. Basingstoke: Palgrave.
McIntyre, D. and Walker, B. (2010) “How can corpora be used to explore the language of poetry and drama?”, in McCarthy, M. and O’Keeffe, A. (eds.) The Routledge Handbook of Corpus Linguistics. London: Routledge.
Widdowson, H. G. (2008) “The Novel Features of Text. Corpus Analysis and Stylistics”, in Gerbig, A. and Mason, O. (eds.) Language, People, Numbers: Corpus Stylistics and Society, pp. 293-304. Amsterdam: Rodopi.
![Page 34: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/34.jpg)
WMatrix
![Page 35: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/35.jpg)
WMatrix
• Web-based corpus tool
• Developed by Paul Rayson at Lancaster University
• Automated grammatical and semantic analysis of texts/corpora
• A web-based front end for CLAWS and USAS
![Page 36: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/36.jpg)
WMatrix
Using a web interface:
• Texts are uploaded onto the Wmatrix server (at Lancaster)
• The upload procedure automatically adds
  (i) Grammatical or Part of Speech (POS) tags;
  (ii) Semantic tags
![Page 37: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/37.jpg)
WMatrix
• CLAWS grammatical (POS) tagger
  CLAWS = Constituent Likelihood Automatic Word-tagging System
• USAS semantic tagger
  USAS = UCREL Semantic Analysis System
• (UCREL = University Centre for Computer Corpus Research on Language)
![Page 38: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/38.jpg)
WMatrix
USAS
• Assigns tags to each word using a hierarchical framework of categorization
• Based originally on McArthur’s (1981) Longman Lexicon of Contemporary English
![Page 39: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/39.jpg)
The 21 Top Level Semantic Categories of the USAS Tag-set
A - GENERAL & ABSTRACT TERMS
B - THE BODY & THE INDIVIDUAL
C - ARTS & CRAFTS
E - EMOTION
F - FOOD & FARMING
G - GOVERNMENT & PUBLIC DOMAIN
H - ARCHITECTURE, HOUSING & THE HOME
I - MONEY & COMMERCE (IN INDUSTRY)
K - ENTERTAINMENT
L - LIFE & LIVING THINGS
M - MOVEMENT, LOCATION, TRAVEL, TRANSPORT
N - NUMBERS & MEASUREMENT
O - SUBSTANCES, MATERIALS, OBJECTS, EQUIPMENT
P - EDUCATION
Q - LANGUAGE & COMMUNICATION
S - SOCIAL ACTIONS, STATES & PROCESSES
T - TIME
W - WORLD & ENVIRONMENT
X - PSYCHOLOGICAL ACTIONS, STATES & PROCESSES
Y - SCIENCE & TECHNOLOGY
Z - NAMES & GRAMMAR
![Page 40: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/40.jpg)
WMatrix
G - Government and the public domain
• G1 - Government, politics and elections
  – G1.1 - Government, etc.
  – G1.2 - Politics
• G2 - Crime, law and order
• G3 - War, defence and the army: weapons
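The hierarchical shape of the tags means a fine-grained tag can be rolled up to its parent categories mechanically. The sketch below handles only plain tags of the form shown on this slide (a letter followed by dotted numbers); any additional markers that real tagger output may carry are outside its scope, and the function names are hypothetical.

```python
import re
from collections import Counter

# Labels copied from the slide for the G field only.
LABELS = {
    "G": "Government and the public domain",
    "G1": "Government, politics and elections",
    "G1.1": "Government, etc.",
    "G1.2": "Politics",
    "G2": "Crime, law and order",
    "G3": "War, defence and the army: weapons",
}

def ancestors(tag):
    """Return the chain from the top-level field down to the tag itself."""
    match = re.fullmatch(r"([A-Z])(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"unsupported tag format: {tag!r}")
    letter, numbers = match.groups()
    parts = numbers.split(".")
    chain = [letter]
    for depth in range(1, len(parts) + 1):
        chain.append(letter + ".".join(parts[:depth]))
    return chain

def top_level_counts(tags):
    """Frequency of each top-level field in a sequence of tags."""
    return Counter(ancestors(tag)[0] for tag in tags)

for level in ancestors("G1.2"):
    print(level, "-", LABELS[level])
```

Rolling tags up like this is what makes it possible to compare texts at a coarse semantic level as well as at the level of individual sub-categories.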
![Page 41: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/41.jpg)
WMatrix
Allows analysis of texts at:
– the word level
– the grammatical level (POS)
– and the semantic level
![Page 42: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/42.jpg)
WMatrix
Allows text comparison at:
– the word level
– the grammatical level (POS)
– and the semantic level
![Page 43: Corpus Linguistics and Stylistics PALA Summer School, Maribor, 2014](https://reader037.vdocuments.net/reader037/viewer/2022103021/56649c6e5503460f94920b8e/html5/thumbnails/43.jpg)
WMatrix
Keyness
• Word level – Key-words
• Grammatical level – Key-POS
• Semantic level – Key-concepts