British National Corpus


Upload: laura-p

Post on 21-Jun-2015


TRANSCRIPT

Page 1: British national corpus


The British National Corpus (BNC) is one of the most important corpora in the field of linguistics. The BNC contains British English data from the late twentieth century and covers a variety of genres.

Page 2: British national corpus

CREATION OF THE BRITISH NATIONAL CORPUS (BNC)

The project was developed by a consortium called the BNC Consortium, together with the British Library and the British Academy. The academic research centres involved include the University Centre for Computer Corpus Research on Language and the Oxford University Computing Services.

Page 3: British national corpus

The construction of the corpus began in 1991 and was finished in 1994. Although no more texts have been added to the corpus since then, it was revised in 2001 with the publication of BNC World and again in 2007 with a new edition called the BNC XML Edition.

Page 4: British national corpus

Two smaller sub-corpora have been derived from the BNC:

● The BNC Sampler is a collection of two million words: one million written and one million spoken.

● The BNC Baby collects four samples of about one million words each, every one belonging to a different genre.

Page 5: British national corpus

The British National Corpus follows the Guidelines of the Text Encoding Initiative (TEI). The corpus is made up of two parts:

● Written part (90%): It covers data from several sources such as books, periodicals, brochures and leaflets. In addition, the written part covers regional and national newspapers, journals for all ages and interests, academic books, popular fiction, university essays, etc.

Page 6: British national corpus

● Spoken part (10%): It consists of orthographic transcriptions of informal conversations and of spoken language collected in different contexts.
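To give an idea of what TEI-style encoding looks like in practice, here is a minimal sketch in Python. The XML fragment is invented for this example and is not taken from the corpus. It assumes word elements `w` with a headword attribute `hw` and a word-class attribute `c5`, along the lines of the XML Edition; the real files should be checked before relying on these names.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment written for this example; not copied from the corpus.
# The element and attribute names (s, w, c, hw, c5, pos) are assumptions.
SAMPLE = """
<s n="1">
  <w c5="AT0" hw="the" pos="ART">The </w>
  <w c5="NN1" hw="corpus" pos="SUBST">corpus </w>
  <w c5="VBZ" hw="be" pos="VERB">is </w>
  <w c5="AJ0" hw="large" pos="ADJ">large</w>
  <c c5="PUN">.</c>
</s>
"""


def read_words(xml_text):
    """Return (word form, headword, word-class tag) for every <w> element."""
    root = ET.fromstring(xml_text.strip())
    return [
        (w.text.strip(), w.get("hw"), w.get("c5"))
        for w in root.iter("w")
    ]


words = read_words(SAMPLE)
for form, headword, tag in words:
    print(f"{form:10} {headword:10} {tag}")
```

Because every word carries its own tags, searches by headword or by part of speech become simple attribute look-ups.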

Page 7: British national corpus

WHY USE THE BRITISH NATIONAL CORPUS (BNC)?

The BNC can be used to discover aspects of a word that we did not know about and to check our assumptions about its meaning. In other words, the corpus helps us find out what a word actually means, not just what we think it means. The BNC also offers many options, for instance checking whether a word collocates with a given set of words or whether it is grammatically correct in specific contexts.

Page 8: British national corpus

If we search for the word “bent” followed by the preposition “on”, the BNC shows that this combination of words appears in specific contexts. From a grammatical point of view, the examples in the British National Corpus indicate that “bent on” can only be followed by a noun or noun phrase, or by a verb with the -ing suffix. Let’s look at it in the next image:

Page 9: British national corpus
Page 10: British national corpus

HOW TO USE THE BRITISH NATIONAL CORPUS

There are two ways of using the British National Corpus, which differ in complexity:

● Xaira: It can be used to check the spelling of a word, to compare different variants and measure their frequency of use, and to find out whether a certain word occurs in the BNC.

● The BNC Simple Search: It is a quick way of searching for a word or phrase. This type of search can be used to check the spelling of a word and also to compare the frequency and variants of a word.

Page 11: British national corpus

If we use the BNC Simple Search, we type the word or phrase we want to find in the search box. Once the search has run, the screen shows a list of up to 50 selected instances, headed by a note of the total frequency of the word or phrase.
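The behaviour described above (a total frequency plus at most 50 sample instances) can be imitated on a small scale. The following is only a sketch in Python over a few invented sentences; the function name `simple_search` and its parameters are hypothetical and do not describe how the real service is implemented.

```python
import random
import re

# Toy stand-in for the corpus: a handful of invented sentences.
SENTENCES = [
    "She was bent on winning the race.",
    "He bent the wire into a hook.",
    "They seemed bent on revenge.",
    "The road bent sharply to the left.",
]


def simple_search(phrase, sentences, max_hits=50, seed=0):
    """Return (total frequency, up to max_hits sentences containing phrase)."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    total = 0
    matching = []
    for sentence in sentences:
        hits = len(pattern.findall(sentence))
        if hits:
            total += hits
            matching.append(sentence)
    # When there are too many matching sentences, show a random selection.
    if len(matching) > max_hits:
        matching = random.Random(seed).sample(matching, max_hits)
    return total, matching


total, examples = simple_search("bent on", SENTENCES)
print(f"Total frequency: {total}")
for example in examples:
    print(" -", example)
```

The total is counted over all matches, while the list of examples is capped, which is why the two numbers can differ for frequent words.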

Page 12: British national corpus
Page 13: British national corpus

If we want to make more complex queries, we can add the following characters to the search words. The _ character matches any single word, the = character restricts a word to a given part of speech, and braces {} enclose a pattern expression.
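A rough way to understand these three characters is to write a tiny matcher for them. The sketch below, in Python, works on a few invented, hand-tagged tokens; the tags, the helper names `term_matches` and `find`, and the treatment of `{}` as a regular expression are assumptions made for the example, not a description of the real search engine.

```python
import re

# Invented tokens with illustrative part-of-speech tags (word, tag).
TOKENS = [
    ("they", "PNP"), ("bank", "VVB"), ("on", "PRP"), ("a", "AT0"),
    ("pair", "NN1"), ("of", "PRF"), ("shoes", "NN2"), ("near", "PRP"),
    ("the", "AT0"), ("bank", "NN1"),
]


def term_matches(term, word, tag):
    """Check one query term against one tagged token."""
    if term == "_":                       # any single word
        return True
    if term.startswith("{") and term.endswith("}"):
        return re.fullmatch(term[1:-1], word) is not None
    if "=" in term:                       # word restricted to one tag
        wanted_word, wanted_tag = term.split("=", 1)
        return word == wanted_word and tag == wanted_tag
    return word == term                   # plain word


def find(query, tokens):
    """Return every run of tokens that matches the space-separated query."""
    terms = query.split()
    hits = []
    for start in range(len(tokens) - len(terms) + 1):
        window = tokens[start:start + len(terms)]
        if all(term_matches(t, w, tag) for t, (w, tag) in zip(terms, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits


print(find("a _ of", TOKENS))     # any word between "a" and "of"
print(find("bank=VVB", TOKENS))   # "bank" only when tagged as a verb
print(find("{sho.*}", TOKENS))    # words matching a pattern
```

In this sketch the query is split on spaces, so a pattern inside braces cannot itself contain a space.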

Page 14: British national corpus

In addition, the search screen offers four options under “display”: LIST, CHART, KWIC and COMPARE. There are three more options under the label “search string”: word, collocation and POS list.

Page 15: British national corpus
Page 16: British national corpus

In addition, there is a section called “sorting and limits”. The results can be sorted by frequency, by relevance or in alphabetical order.

Page 17: British national corpus

The corpus includes several categories or labels for texts of different kinds, such as “spoken”, “fiction”, “magazine”, “newspaper” and “non-academic texts”.

Page 18: British national corpus

For instance, if we look for the word “couch”, the corpus shows that it collocates with words such as lying, lay, room and potato. After clicking on one of these words, several examples appear on the screen. The corpus therefore allows us to search for a word or phrase and, at the same time, to find its collocations. Looking for a collocation is as easy as typing the word to be searched; an asterisk then appears automatically in the collocation box. Once the search has run, the corpus displays a list of the words that collocate with the search word. Let’s see:

Page 19: British national corpus
Page 20: British national corpus
Page 21: British national corpus
Page 22: British national corpus
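The idea behind a collocation search can be sketched in a few lines of Python: count the words that appear within a few positions of the search word. The text, the window size and the stop-word list below are all invented for the example, and the real interface may rank collocates differently.

```python
from collections import Counter

# Invented text; punctuation is already separated from the words.
TEXT = (
    "he was lying on the couch . the couch potato lay on the couch all day . "
    "she sat on the couch in the living room ."
)

# Very small illustrative stop-word list.
STOPWORDS = {"the", "on", ".", "was", "he", "she", "in", "all"}


def collocates(node, tokens, window=4, stopwords=frozenset()):
    """Count the words found within `window` tokens of each `node` occurrence."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for neighbour in tokens[start:i] + tokens[i + 1:end]:
            # Skip stop words and other occurrences of the node itself.
            if neighbour not in stopwords and neighbour != node:
                counts[neighbour] += 1
    return counts


result = collocates("couch", TEXT.split(), window=4, stopwords=STOPWORDS)
for word, frequency in result.most_common(5):
    print(word, frequency)
```

Sorting the counts by frequency, as `most_common` does here, corresponds to one of the sorting options mentioned earlier.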

The KWIC (Key Word in Context) display enriches the search because it helps the person who is looking for a word to see in which grammatical structures and contexts the word appears. For example, if we look for the word “shoes”, the corpus highlights in colours the different words which are used with it: “a new pair of”, “the soles of our”, “the second hand”, “new polished”, “thousands of”, etc.
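A KWIC display can be imitated with a short Python function that lines up the search word with a few words of context on each side. This is only a sketch on an invented text; the names `kwic` and `format_kwic` are hypothetical, and the colour coding of the real interface is not reproduced.

```python
# Invented text for the example.
TEXT = (
    "She bought a new pair of shoes yesterday. The soles of our shoes "
    "were worn, so thousands of shoes were recycled."
)


def kwic(keyword, text, width=4):
    """Return (left context, key word, right context) for each occurrence."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        # Ignore case and surrounding punctuation when comparing.
        if token.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def format_kwic(lines):
    """Right-align the left contexts so that the key words form a column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left:>{pad}}  [{key}]  {right}" for left, key, right in lines]


for line in format_kwic(kwic("shoes", TEXT)):
    print(line)
```

Aligning the key word in one column is what makes recurring patterns to its left and right easy to spot.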

Page 23: British national corpus
Page 24: British national corpus
Page 25: British national corpus

COMPARISON BETWEEN THE BRITISH NATIONAL CORPUS AND THE COCA

In terms of size there is a huge difference between the two corpora, as the COCA is about four times bigger than the BNC. The COCA is made up of more than 410 million words, whereas the BNC covers 100 million words. In terms of composition, the COCA is divided into five genres (spoken, fiction, popular magazines, newspapers and academic texts), and each of those genres makes up 20% of the total.

Page 26: British national corpus

By contrast, the BNC is strictly divided into a written part (90%) and a spoken part (10%). The COCA also deals with more recent language, because the corpus has been updated, while the BNC focuses more on everyday language.
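The figures in this comparison can be checked with a little arithmetic. The Python sketch below uses only the rounded numbers given above (100 million and 410 million words, a 90% / 10% split, and 20% per COCA genre), so the results are approximations rather than exact corpus counts.

```python
# Rounded sizes taken from the comparison above.
BNC_WORDS = 100_000_000
COCA_WORDS = 410_000_000


def share(total_words, percent):
    """Return the number of words that `percent` of the corpus represents."""
    return total_words * percent // 100


ratio = COCA_WORDS / BNC_WORDS            # how many times bigger the COCA is
bnc_written = share(BNC_WORDS, 90)        # written part of the BNC
bnc_spoken = share(BNC_WORDS, 10)         # spoken part of the BNC
coca_per_genre = share(COCA_WORDS, 20)    # one COCA genre

print(f"COCA / BNC size ratio: {ratio:.1f}")
print(f"BNC written: {bnc_written:,} words")
print(f"BNC spoken: {bnc_spoken:,} words")
print(f"COCA per genre: {coca_per_genre:,} words")
```

On these figures, the spoken section of the COCA alone is several times larger than the spoken part of the BNC.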

Page 27: British national corpus

INFORMATION SOURCES

British National Corpus. (2011, April 9th). In Wikipedia. Retrieved 19:40, April 9th, 2011, from http://en.wikipedia.org/w/index.php?title=British_National_Corpus&oldid=328182118

British National Corpus. Retrieved April 9th, 2011, from http://www.natcorp.ox.ac.uk/

BYU-BNC: British National Corpus. Mark Davies / Brigham Young University. Retrieved 19:40, April 9th, 2011, from http://corpus.byu.edu/bnc/

Encoding the British National Corpus. Retrieved 19:40, April 9th, 2011, from http://xml.coverpages.org/bnc-encoding2.html

“Phrases in English” (PIE) and the British National Corpus. Retrieved 19:40, April 9th, 2011, from http://pie.usna.edu/