ComPair: Compare and Visualise the Usage of Language David Beavan University of Glasgow [email protected] @DavidBeavan

DESCRIPTION

Talk given at Digital Humanities 2011 (DH2011) in Stanford, USA on 21 June 2011.

Web site: http://www.scottishcorpus.ac.uk/corpus/bnc/compair.php
Abstract: https://dh2011.stanford.edu/wp-content/uploads/2011/05/DH2011_BookOfAbs.pdf

This paper will demonstrate ComPair, a new tool to investigate and compare word usage, encouraging new ways to explore language variation. While remaining focussed on usability and the promotion of navigation, this tool represents an evolutionary step forward from the author’s previous award-winning visualisation applications. This paper will introduce the methods and technologies at its core, perform a demonstration of the tool and discuss opportunities for further collaboration.

TRANSCRIPT

Page 1: ComPair: compare and visualise the usage of language, David Beavan, University of Glasgow, DH2011

ComPair: Compare and Visualise the Usage of Language

David Beavan University of Glasgow [email protected] @DavidBeavan

Page 2:

‘You shall know a word by the company it keeps’

Firth, John R., 1957. Modes of meaning. Oxford: Oxford University Press.

Page 3:

Collocation

•  Words which go together
•  More than by chance, they show an association

•  Take a corpus
•  Search for a term (node word)
•  Examine words in a window (e.g. 5) either side of node
•  Aggregate these co-occurring words
•  Rank (e.g. by frequency or collocational strength)
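
The collocation steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the code behind the tools shown in this talk: the function name `collocates`, the simple regular-expression tokeniser and the sample sentence are all assumptions made for the example, and ranking here is by raw frequency.

```python
import re
from collections import Counter


def collocates(text, node, window=5):
    """Count the words found within `window` tokens either side of `node`."""
    # Assumed tokeniser: lower-case runs of letters, digits and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        # Aggregate every token in the window except this node occurrence.
        counts.update(tokens[start:i] + tokens[i + 1:end])
    return counts


# Hypothetical two-sentence "corpus" for demonstration.
sample = ("Stanford University is in California. "
          "The university at Stanford hosts DH2011.")
counts = collocates(sample, "stanford", window=2)
ranked = counts.most_common()  # rank by frequency, highest first
```

With a real corpus the same counts would normally be fed into a collocational strength measure before ranking, since raw frequency favours common function words.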

Page 4:

‘Stanford’ collocate search via Davies, Mark. (2004-) BYU-BNC: The British National Corpus. Available online at http://corpus.byu.edu/bnc.

Collocates

Page 5:

Collocate Cloud

‘Stanford’ search via Beavan, David. (2008-) BNC Collocate Cloud. Available online at http://www.scottishcorpus.ac.uk/corpus/bnc/collocatecloud.php

Page 6:

Collocate Cloud properties

•  100 most frequent collocates listed alphabetically
•  Font size shows frequency of word
•  Brightness shows collocational strength of word
•  Interactively create new clouds

•  Best New Idea for Improving a Current Web-Based Tool, 2008 TADA Research Evaluation eXchange (T-REX)
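
The cloud properties listed above (most frequent collocates, alphabetical order, font size from frequency, brightness from strength) could be mapped to display attributes roughly as follows. This is a sketch under stated assumptions: the linear scaling, the point-size range, the function name `cloud_entries` and the sample figures are all hypothetical, and the actual Collocate Cloud may scale its values differently.

```python
def cloud_entries(collocates, top_n=100, min_pt=10, max_pt=40):
    """Map {word: (frequency, strength)} to (word, font_pt, brightness) tuples.

    Keeps the `top_n` most frequent words and returns them alphabetically.
    Brightness is in the range 0.0 (dim) to 1.0 (bright).
    """
    top = sorted(collocates.items(), key=lambda kv: kv[1][0], reverse=True)[:top_n]
    if not top:
        return []
    freqs = [freq for _, (freq, _) in top]
    strengths = [strength for _, (_, strength) in top]

    def scale(value, low, high):
        # Assumed linear scaling into 0..1; a flat range maps to the midpoint.
        return 0.5 if high == low else (value - low) / (high - low)

    entries = []
    for word, (freq, strength) in sorted(top):  # alphabetical by word
        size = min_pt + scale(freq, min(freqs), max(freqs)) * (max_pt - min_pt)
        brightness = scale(strength, min(strengths), max(strengths))
        entries.append((word, round(size), round(brightness, 2)))
    return entries


# Hypothetical (frequency, strength) figures, not taken from the BNC.
demo = {
    "university": (120, 6.5),
    "california": (40, 8.0),
    "professor": (80, 5.0),
}
entries = cloud_entries(demo)
```

Each tuple can then be handed to whatever renders the cloud, for example as a CSS font size and a colour lightness value.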

Page 7:

Comparison

•  Investigate and compare word usage
   –  Expose attitudes and cultures
   –  Investigate degrees of synonymy

•  Semantic prosody
   –  How synonymous words can actually take on positive or negative connotations

•  Applications for language learning
   –  Examine real-world usage of words

Page 8:
Page 9:
Page 10:
Page 11:

ComPair properties

•  Visualise usage of two node words
•  Distribute 150+ collocates on a continuum
•  Colour shows attraction to node
•  Brightness shows degree of collocational attraction

•  Currently uses British National Corpus
•  Can be applied to any corpus or dataset (in progress)

Page 12:

ComPair how-to

•  Take two collocate word lists
   –  Same corpus, different node words
   –  Different corpora, same node word

•  Calculate collocational strength towards each node
   –  Mutual Information etc.

•  Place collocates on continuum between node words
   –  Those with attraction to a single node appear near that node
   –  Those with little attraction to either node appear central and dim
   –  Those with attraction to both nodes appear central and bright
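
One way to realise the how-to steps above is sketched below. It is an illustration under assumptions, not ComPair's actual implementation: the pointwise form of Mutual Information (without a window-size correction), the clipping of negative scores to zero, the placement formula and the names `mutual_information` and `place` are all hypothetical choices that merely reproduce the three behaviours listed on the slide.

```python
import math


def mutual_information(pair_freq, node_freq, coll_freq, corpus_size):
    """Pointwise MI: log2(observed co-occurrence / co-occurrence expected by chance)."""
    if pair_freq == 0:
        return 0.0  # assumption: treat unseen pairs as showing no attraction
    return math.log2(pair_freq * corpus_size / (node_freq * coll_freq))


def place(strength_a, strength_b, max_strength):
    """Return (position, brightness) for one collocate.

    position: 0.0 = at node A, 0.5 = central, 1.0 = at node B.
    brightness: 0.0 = dim, 1.0 = bright.
    """
    # Assumption: clip each strength into 0..max_strength, then normalise to 0..1.
    a = min(max(strength_a, 0.0), max_strength) / max_strength
    b = min(max(strength_b, 0.0), max_strength) / max_strength
    position = 0.5 + 0.5 * (b - a)  # pulled towards whichever node attracts more
    brightness = max(a, b)          # weak attraction to both nodes stays dim
    return position, brightness


# Hypothetical counts: the pair occurs 16 times, twice the chance expectation of 8.
mi = mutual_information(pair_freq=16, node_freq=100, coll_freq=80, corpus_size=1000)

only_a = place(8.0, 0.0, max_strength=8.0)   # near node A, bright
neither = place(0.8, 0.8, max_strength=8.0)  # central, dim
both = place(8.0, 8.0, max_strength=8.0)     # central, bright
```

Under this scheme a collocate attracted to only one node sits beside it, while collocates shared by both nodes and collocates attracted to neither both land in the middle and are told apart by brightness alone.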

Page 13:

ComPair: http://www.scottishcorpus.ac.uk/corpus/bnc/compair.php

David Beavan University of Glasgow [email protected] @DavidBeavan