Magnus Huber - The Old Bailey Corpus: Spoken English in the 18th and 19th Centuries


Page 1

The Old Bailey Corpus

Spoken English in the 18th and 19th centuries

The use of historical court records in the investigation of language change

Digital History Seminar, 21 February 2012

Magnus Huber

Department of English

University of Giessen

Otto-Behaghel-Str. 10B

D-35394 Giessen, Germany

[email protected]

Page 2

Structure

1. Introduction
1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics
1.2 The Proceedings of the Old Bailey
1.3 Turning the Proceedings into a linguistic corpus
2. How linguistically accurate is OBC?
2.1 Comparison with alternative accounts
2.2 Language event and its representation
2.3 Internal consistency: negative contraction
2.4 Sociolinguistic potential: relative clauses
3. Brief summary

Page 3

1. Introduction

1.1 Corpus linguistics, sociolinguistics and sociohistorical linguistics

Definition of linguistic corpus: generally speaking, a (usually large) collection of machine-readable texts used as a database in linguistic analyses

Importance of spoken language: spoken language precedes written language

Page 4

[Figure: Percentage of (ng):[n] by social class and sex (female, male). Source: Peter Trudgill (1974), The social differentiation of English in Norwich.]

MMC middle middle class; LMC lower middle class; UWC upper working class; MWC middle working class; LWC lower working class

Example: drinking, (ng):[n] = [drɪŋkɪn]

Page 5

Historical linguistics: language change

ye > you in subject position

'when ye come set it in sech rewle as ye seeme best' (1465)

'And thus in hast fare you hartely well' (1545)

Page 6

Sociohistorical linguistics

Gender-related change: ye > you

Page 7

1.2 The Proceedings of the Old Bailey

• Old Bailey = London's Central Criminal Court
• meets 8 times a year, from the 1830s 10 times a year
• "Proceedings" published 1674-1913
• start as a commercial enterprise: publishers send scribes into the courtroom
• proceedings taken down in shorthand
• sold privately by publishers
• City of London gains more and more control during the 18th century

Page 8

• 2100+ volumes

• ca. 200,000 trials

• ca. 134 million words

Page 9

www.oldbaileyonline.org

Page 10

<unit id="t17330510-1"><trial><info><identifier>t17330510-1</identifier><source>173305100002</source><header>Sarah Sanders, theft: specified place, 10 May 1733.</header> <pfro>17330510</pfro><ntrial>2</ntrial><psession>17330404</psession><nsession>17330628</nsession></info>

<p>1. <person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for <off><theft type="specified place">stealing a Portugal Piece of Gold, value 36 s. a Gold Ring, value 10 s. a Gold Ring set with Vermillion Stones, value 7 s. 6d. a Silver Girdle Buckle, value 10 s. three Aprons, a Shirt, a Shift, and 2 Ells of Holland, the Goods of <person gender="m"><victim gender="m"><given>John </given><surname>Underwood </surname></victim> </person>, in his House</theft></off>, <cd>March 4</cd>.</p>

<p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>, she came to me very well recommended, but had not staid above ten Weeks before several [. . .]

Original computerized Proceedings (Sheffield)

Page 11

(Same Sheffield XML excerpt as on page 10.)

Page 12

Sociolinguistically useful XML tags in the Sheffield Proceedings

• name

<given>Sarah</given> <surname>Sanders</surname>

• year

<identifier>t17180110-1</identifier>

• gender

<defend gender="f">

• age

<age>43</age>

• profession

<deflabel>Servant</deflabel>

• origin

<crimeloc>Tottenham</crimeloc>
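The tags listed above are enough to recover a speaker's basic social profile from the markup. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes a simplified, well-formed fragment modelled on the Sarah Sanders excerpt on page 10 (the real Sheffield files are larger, and the excerpt there is truncated). The helper `defendant_profile` is hypothetical. The year is read from the identifier, which follows the pattern t + YYYYMMDD + -n (t17330510-1 is the trial of 10 May 1733).

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed fragment modelled on the Sheffield excerpt on page 10;
# the real trial files contain many more elements.
SAMPLE_TRIAL = """<trial>
  <info><identifier>t17330510-1</identifier></info>
  <p><person gender="f"><defend gender="f"><given>Sarah </given><surname>Sanders </surname></defend></person>, was indicted for stealing.</p>
  <p>John Underwood. The Prisoner was my <deflabel>Servant</deflabel>.</p>
</trial>"""


def defendant_profile(trial_xml):
    """Hypothetical helper: collect sociolinguistic attributes of the defendant."""
    root = ET.fromstring(trial_xml)
    identifier = root.findtext("info/identifier", default="")
    defend = root.find(".//defend")
    return {
        # identifiers look like t + YYYYMMDD + -n, so characters 1-4 hold the year
        "year": int(identifier[1:5]),
        "given": defend.findtext("given", default="").strip(),
        "surname": defend.findtext("surname", default="").strip(),
        "gender": defend.get("gender"),
        # in this excerpt <deflabel> records the defendant's occupation
        "profession": (root.findtext(".//deflabel") or "").strip(),
    }


print(defendant_profile(SAMPLE_TRIAL))
```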

Page 13

1.3 Turning the Proceedings into a linguistic corpus of early spoken English

Page 14

(Same Sheffield XML excerpt as on page 10, with a <speech> tag added to mark spoken language.)

Page 15

Tagging spoken language

• Need for automatic annotation
• Perl script identifying non-linguistic patterns that indicate spoken language in the original Proceedings:
  – layout
  – metalinguistic information
• Linguistic markers indicating spoken language? > 1st and 2nd person pronouns
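The slide asks whether 1st and 2nd person pronouns could serve as a linguistic marker of speech. The following hedged Python sketch shows how such a marker might be measured: the share of word tokens that are 1st or 2nd person pronouns. The pronoun list and the helper `pronoun_density` are illustrative assumptions. The source gives no threshold for classifying a passage as speech, so the sketch only compares two passages taken from the page-10 excerpt.

```python
import re

# Illustrative (not exhaustive) 1st- and 2nd-person pronoun forms,
# including older forms such as ye, thou and thee.
FIRST_SECOND_PERSON = {
    "i", "me", "my", "mine", "myself", "we", "us", "our", "ours", "ourselves",
    "you", "your", "yours", "yourself", "yourselves", "ye", "thou", "thee", "thy", "thine",
}


def pronoun_density(text):
    """Share of word tokens that are 1st- or 2nd-person pronouns."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(token in FIRST_SECOND_PERSON for token in tokens)
    return hits / len(tokens)


# Narrative frame vs. witness speech, both from the page-10 excerpt
narrative = "Sarah Sanders, was indicted for stealing a Portugal Piece of Gold"
testimony = "The Prisoner was my Servant, she came to me very well recommended"
print(round(pronoun_density(narrative), 3), round(pronoun_density(testimony), 3))
```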

Page 16

Automatic speech tagging, e.g. "Q. – A." sequences:

Q. Did you see him on Sunday night? - A. Yes, at Walworth, on Sunday night, the 12th of January, at one o'clock - I am sure of that.</p>

(On the slide, the question and the answer are each marked with <speech> ... </speech> tags.)
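The original tagger was a Perl script. The following is a Python analogue using the standard `re` module, sketched only for the "Q. – A." layout shown above. It assumes that "Q." opens a question, that "- A." opens the answer, and that an answer runs to the next "- Q." or to the end of the paragraph. The pattern `QA_PATTERN` and the helper `tag_qa` are hypothetical, and the exact placement of the tags in OBC may differ.

```python
import re

# Assumed layout: "Q." opens a question, "- A." opens the answer, and an answer
# runs until the next "- Q." or the end of the paragraph.
QA_PATTERN = re.compile(
    r"Q\.\s*(?P<q>.+?)\s*-\s*A\.\s*(?P<a>.+?)(?=\s*-\s*Q\.|$)",
    re.DOTALL,
)


def tag_qa(paragraph):
    """Wrap each question and answer of a Q.-A. sequence in <speech> tags."""
    def wrap(match):
        return f"Q. <speech>{match['q']}</speech> - A. <speech>{match['a']}</speech>"
    return QA_PATTERN.sub(wrap, paragraph)


example = ("Q. Did you see him on Sunday night? - A. Yes, at Walworth, on Sunday night, "
           "the 12th of January, at one o'clock - I am sure of that.")
print(tag_qa(example))
```

The non-greedy groups stop at the first following "- A." or "- Q.", so a dash inside an answer (as in "at one o'clock - I am sure") is not mistaken for a new turn.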

Page 17

Sociobiographical speech event annotation

The New Bailey Tag Assistant

Page 18

Social data file
• XML format
• attributes of every speaker in OBC
• plus: scribe, printer, publisher

<xml>
  <document name="19100426">
    . . .
    <speaker id="271">
      <sex>m</sex>
      <age></age>
      <given>Thomas</given>
      <surname>Tuckey</surname>
      <occupation>Warder</occupation>
      <occupation2></occupation2>
      <hiscolabel>Prison Guard</hiscolabel>
      <hiscocode>58930</hiscocode>
      <hiscolabel2></hiscolabel2>
      <hiscocode2></hiscocode2>
      <crimescene></crimescene>
      <birthplace></birthplace>
      <workplace>Wormwood Scrubs Prison</workplace>
      <placeofresidence></placeofresidence>
      <role>witness</role>
    </speaker>
    . . .
  </document>
</xml>
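Records in the social data file can be flattened into one dictionary per speaker for counting or cross-tabulation. The following is a minimal sketch with Python's `xml.etree.ElementTree`, assuming the element names shown above. The helper `read_speakers` is hypothetical. Empty elements such as `<age></age>` become `None`, and the sample uses a single `<document>` as its root instead of the full file's outer `<xml>` wrapper. The `<hiscocode>` field holds the HISCO occupational code (van Leeuwen, Maas and Miles 2002).

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the social data file; the real file wraps all
# <document> elements in an outer root, here one <document> is the root.
SOCIAL_DATA = """<document name="19100426">
  <speaker id="271">
    <sex>m</sex>
    <age></age>
    <given>Thomas</given>
    <surname>Tuckey</surname>
    <occupation>Warder</occupation>
    <hiscolabel>Prison Guard</hiscolabel>
    <hiscocode>58930</hiscocode>
    <workplace>Wormwood Scrubs Prison</workplace>
    <role>witness</role>
  </speaker>
</document>"""


def read_speakers(xml_text):
    """Return one dict per <speaker>, with empty fields mapped to None."""
    root = ET.fromstring(xml_text)
    records = []
    for document in root.iter("document"):  # iter() also visits the root itself
        for speaker in document.iter("speaker"):
            record = {child.tag: (child.text or "").strip() or None for child in speaker}
            record["id"] = speaker.get("id")
            record["document"] = document.get("name")
            records.append(record)
    return records


for rec in read_speakers(SOCIAL_DATA):
    print(rec["document"], rec["id"], rec["surname"], rec["hiscocode"], rec["role"])
```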

Page 19

2. How linguistically accurate is OBC?

2.1 Comparison with alternative accounts, e.g. the trial of John Ayliffe (17591024-27) vs. the alternative account The tryal at large of John Ayliffe

Proceedings (718 words) vs. Tryal (1290 words):

Proceedings: Thomas. I am clerk to Mr Jones, a Stationer in the Temple.
Tryal: Henry Thomas. I am clerk to Mr Jones, a Stationer, in the Temple.

Proceedings: Hargrave. By Mr Ayliffe: I saw him seal and deliver it.
Tryal: Walter Hargrave. By Mr Ayliffe. – I saw him sign, seal, and deliver it, as his act and deed.

Proceedings: ./. (no corresponding passage)
Tryal: John Fannen. I am not sure; but to the best of my remembrance, it was sometime the beginning of December last, at Mr Fox's house.

Page 20

Proceedings (718 words) vs. Tryal (1290 words), continued:

Proceedings: Hargrave. Because he said he was not willing Mr Fox should know of it?
Tryal: Walter Hargrave. The reason Mr Ayliffe gave, was, that he would not on any account have it come to Mr Fox's ears.

Proceedings: Thomas. I can't particularly say that; sometimes we leave a blank by the gentlemens desire, perhaps they may add another covenant, or something of that sort, I can't recollect the reason for that.
Tryal: Henry Thomas. I cannot positively say. – We sometimes leave out the conclusion by gentlemen's desire, in order that they may add a covenant, or some such thing, if it should be thought necessary; but I cannot particularly recollect the reason why the conclusion was omitted in this case.
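One crude way to make the difference between the two accounts concrete is to compare word tokens in a parallel passage. The sketch below uses Python's standard library only; the helpers `words` and `compare` are hypothetical. It takes Walter Hargrave's answer as quoted on page 19 and lists the Tryal words that the Proceedings version lacks. It is an illustration, not the method behind the word counts on the slides.

```python
import re

# Parallel renderings of Walter Hargrave's answer, as quoted on page 19.
PROCEEDINGS = "I saw him seal and deliver it."
TRYAL = "I saw him sign, seal, and deliver it, as his act and deed."


def words(text):
    """Lower-cased word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def compare(shorter, longer):
    """Token counts of both versions and the word types only the longer one has."""
    short_words, long_words = words(shorter), words(longer)
    missing = sorted(set(long_words) - set(short_words))
    return len(short_words), len(long_words), missing


print(compare(PROCEEDINGS, TRYAL))
# Whole-trial ratio from the slide: 718 / 1290 words
print(round(718 / 1290, 3))
```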

Page 21

2.2 Language event ↔ written representation

Trial proceedings (e.g. Old Bailey Proceedings): speech event → perception by scribe → shorthand script → expanding shorthand → proofreading → typesetting

Letters: formulation → writing

Page 22

Gurney (1752), Brachygraphy: or short-writing: 'to take a Speech, or Sermon verbatim, as a Person talks in common' (p. 3)

Scribes:
Thomas Gurney (1749-1770)
Joseph Gurney (1770-1782)

Page 23

Recording linguistic details

• no distinction between inflected and uninflected auxiliaries:
  [shorthand sign] = 'may' or 'mayst'
  [shorthand sign] = 'can' or 'canst'
  [shorthand sign] = 'should' or 'shouldst'
• a dot placed at the top left of the noun phrase = the allomorphs a and an
• auxiliary contractions: 'you will' (you w-il) vs. 'you'll' (you-l), but 'it will' ~ 'twill' are both written with │ (│ = <t> and it)

Page 24

2.3 Internal consistency: negative contraction

e.g. do not > don't, need not > needn't, was not > wasn't

[Figure: NEG contraction in % by period, 1732-1759, 1760-1789, 1790-1819, 1820-1849, 1850-1879, 1880-1913; N = 1,344,244]

Page 25

Negative contraction in the OBC, 1732-1912: 1. Lexeme?

AUX form     % contr.   N
do not       28.9       189,776
will not     27.7       17,302
shall not    20.6       4,172
cannot       13.3       106,005
are not      3.2        11,552
dare not     3.1        260
need not     0.6        2,136
did not      0.4        429,143
does not     0.4        9,539
have not     0.4        44,038
could not    0.2        85,361
is not       0.2        47,142
must not     0.2        1,620
would not    0.2        52,123
had not      0.1        72,395
has not      0.1        9,244
should not   0.1        20,192
was not      0.1        64,574
may not      0.0        1,271
might not    0.0        2,404
ought not    0.0        1,221
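The % contr. column gives the contracted share of all tokens of a form, i.e. contracted / (full + contracted) × 100, and N is the total of both variants. The following is a minimal Python sketch of that calculation on raw text. It assumes a hypothetical four-item mapping `NEGATIVE_PAIRS` and ignores spelling variants (e.g. curly apostrophes or can not) that a real count over the OBC would have to handle.

```python
import re

# Hypothetical, illustrative subset of full and contracted negative forms.
NEGATIVE_PAIRS = {
    "do not": "don't",
    "did not": "didn't",
    "will not": "won't",
    "cannot": "can't",
}


def contraction_rates(text):
    """Per full form: (% contracted, N), where N = full + contracted tokens."""
    lowered = text.lower()
    rates = {}
    for full, contracted in NEGATIVE_PAIRS.items():
        n_full = len(re.findall(rf"\b{re.escape(full)}\b", lowered))
        n_contr = len(re.findall(rf"\b{re.escape(contracted)}\b", lowered))
        n = n_full + n_contr
        rates[full] = (round(100 * n_contr / n, 1), n) if n else (None, 0)
    return rates


sample = "I do not know. I don't know. I did not see him. He won't come. I will not say."
print(contraction_rates(sample))
```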

Page 26

Negative contraction in the OBC, 1732-1912: 2. Frequency?

(Same table as on page 25.)

Page 27

Negative contraction in the OBC, 1732-1912: 3. Tense?

(Same table as on page 25.)

Page 28

Explaining the absence of negative contraction

• combination of phonology and genre
• n't is phonetically reduced and less salient than not
• do-don't [u - o(u)] vs. did-didn't [ɪ - ɪ]
  can-can't vs. could-couldn't
  will-won't vs. would-wouldn't
  shall-shan't vs. should-shouldn't
• negative contraction is (nearly) absent where nothing in the form, such as a change of stem vowel in the negative, allows the listener to tell the positive and the negative apart

Page 29

Hierarchy of perceptual difference between positive and negative contracted forms

Pair             V change   C change/addition   Score
do-don('t)       1          1                   2
will-won('t)     1          1                   2
shall-shan('t)   0.5        1                   1.5
can-can('t)      0.5        0                   0.5
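The Score column is the sum of the vowel-change and consonant-change values. The sketch below recomputes it in Python and checks whether, for these four pairs, the score ordering agrees with the contraction rates on page 25 (do not 28.9%, will not 27.7%, shall not 20.6%, cannot 13.3%). Pairing can-can't with the cannot row is an assumption. Agreement on four data points is a consistency check, not a test of the hypothesis.

```python
# (V change, C change/addition) from the hierarchy table, plus % contracted for
# the same auxiliary from the page-25 table (the "cannot" row is used for can).
HIERARCHY = {
    "do": (1.0, 1.0, 28.9),
    "will": (1.0, 1.0, 27.7),
    "shall": (0.5, 1.0, 20.6),
    "can": (0.5, 0.0, 13.3),
}


def score(v_change, c_change):
    """Score column: vowel change plus consonant change/addition."""
    return v_change + c_change


def score_tracks_rate(table):
    """True if, ordered by % contracted, the scores never increase."""
    ordered = sorted(table.values(), key=lambda row: row[2], reverse=True)
    scores = [score(v, c) for v, c, _ in ordered]
    return all(a >= b for a, b in zip(scores, scores[1:]))


for aux, (v, c, rate) in HIERARCHY.items():
    print(f"{aux:<6} score={score(v, c):.1f}  contracted={rate}%")
print(score_tracks_rate(HIERARCHY))
```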

Page 30

2.4 Sociolinguistic potential: relative clauses

• random extracts of speech events from OBC: 20,000 words/decade (10,000 words each for male and female speakers)
• 2500+ relative clauses, of which 1533 restrictive

Restrictive relativizers by period (n and %):

          1720-1779      1780-1839      1840-1913      ∑
          n      %       n      %       n      %       n      %
that      259    53.8    240    45.4    136    26.0    635    41.4
zero      107    22.2    118    22.3    201    38.4    426    27.8
which     70     14.6    97     18.3    92     17.6    259    16.9
who       38     7.9     69     13.0    89     17.0    196    12.8
whom      6      1.2     2      0.4     5      1.0     13     0.8
whose     1      0.2     3      0.6     0      0.0     4      0.3
∑         481            529            523            1533
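The % columns are column shares: each count divided by its period total (∑). Recomputing them from the counts is a quick check on the table's arithmetic. The following is a minimal Python sketch; the helper `column_percentages` is hypothetical.

```python
# Counts of restrictive relativizers per period, from the table above.
COUNTS = {
    "that": (259, 240, 136),
    "zero": (107, 118, 201),
    "which": (70, 97, 92),
    "who": (38, 69, 89),
    "whom": (6, 2, 5),
    "whose": (1, 3, 0),
}
PERIODS = ("1720-1779", "1780-1839", "1840-1913")


def column_percentages(counts):
    """Return per-period totals and each relativizer's share in % (1 decimal)."""
    totals = [sum(col) for col in zip(*counts.values())]
    shares = {
        rel: [round(100 * n / total, 1) for n, total in zip(row, totals)]
        for rel, row in counts.items()
    }
    return totals, shares


totals, shares = column_percentages(COUNTS)
print(dict(zip(PERIODS, totals)))
print(shares["that"], shares["zero"])
```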

Page 31

Diagram 1: Distribution of that with regard to animacy of the head (chart of percentages, 0-100%)

             1720-1779   1780-1839   1840-1913
non-human    121         164         105
human        137         76          31

1720-1779 vs 1780-1839: p = 0.000
1720-1779 vs 1840-1913: p = 0.000
1780-1839 vs 1840-1913: p = 0.070

Page 32

Diagram 2: Distribution of that and pronominal relativizers with human heads (chart of percentages, 0-100%)

        1720-1779   1780-1839   1840-1913
PRN     49          72          93
that    137         76          31

1720-1779 vs 1780-1839: p = 0.000
1720-1779 vs 1840-1913: p = 0.000
1780-1839 vs 1840-1913: p = 0.000
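Diagrams 1 and 2 plot proportions rather than counts. The following short Python sketch derives those proportions from the counts printed with the diagrams; the helper `share` is hypothetical. It computes the share of human heads among that-relatives (Diagram 1) and the share of that among relativizers with human heads (Diagram 2) for each period.

```python
# Counts per period (1720-1779, 1780-1839, 1840-1913) from Diagrams 1 and 2.
THAT_NON_HUMAN = (121, 164, 105)
THAT_HUMAN = (137, 76, 31)
PRN_HUMAN = (49, 72, 93)


def share(part, other):
    """Percentage of `part` in part + other for each period (1 decimal)."""
    return [round(100 * p / (p + o), 1) for p, o in zip(part, other)]


print("human heads among that-relatives:", share(THAT_HUMAN, THAT_NON_HUMAN))
print("that among relatives with human heads:", share(THAT_HUMAN, PRN_HUMAN))
```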

Page 33

Diagram 3: Relativizers by gender (excl. genitives) (chart of percentages, 0-100%)

          1720-1779     1780-1839     1840-1913
          f      m      f      m      f      m
PRN       43     71     56     112    66     119
zero      53     54     66     52     110    73
that      124    134    108    132    72     64

f 1720-1779 vs 1780-1839: p = 0.135    m 1720-1779 vs 1780-1839: p = 0.033
f 1720-1779 vs 1840-1913: p = 0.000    m 1720-1779 vs 1840-1913: p = 0.000
f 1780-1839 vs 1840-1913: p = 0.000    m 1780-1839 vs 1840-1913: p = 0.000

Further p-values shown on the chart: p = 0.135, p = 0.001, p = 0.000
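The slides do not name the significance test. Assuming a Pearson chi-square test of independence without continuity correction, each period comparison in Diagram 3 is a 2 × 3 table (two periods by PRN, zero, that) with 2 degrees of freedom. For that case the upper-tail p-value has the closed form exp(-χ²/2). The following standard-library Python sketch implements that assumption. For 1720-1779 vs 1780-1839 it gives p ≈ 0.135 for women and p ≈ 0.033 for men, matching the reported values. This is consistent with, though not proof of, such a test.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(chi2):
    """Upper-tail p for a chi-square with 2 degrees of freedom: exp(-x/2)."""
    return math.exp(-chi2 / 2)


# Rows: two periods; columns: PRN, zero, that (Diagram 3 counts).
women = [[43, 53, 124], [56, 66, 108]]  # f, 1720-1779 vs 1780-1839
men = [[71, 54, 134], [112, 52, 132]]   # m, 1720-1779 vs 1780-1839

for label, table in (("f", women), ("m", men)):
    stat = pearson_chi2(table)
    print(label, round(stat, 2), round(p_value_df2(stat), 3))
```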

Page 34

Diagram 4: Zero relativizer by gender (excl. genitives) (chart of percentages, 0-100%)

          1720-1779     1780-1839     1840-1913
          f      m      f      m      f      m
other     167    205    164    244    138    173
zero      53     54     66     52     110    73

f 1720-1779 vs 1780-1839: p = 0.268    m 1720-1779 vs 1780-1839: p = 0.326
f 1720-1779 vs 1840-1913: p = 0.000    m 1720-1779 vs 1840-1913: p = 0.022
f 1780-1839 vs 1840-1913: p = 0.000    m 1780-1839 vs 1840-1913: p = 0.001
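For the 2 × 2 comparisons in Diagram 4 (zero vs. other, two periods), the Pearson statistic has the shortcut form χ² = N(ad − bc)² / ((a+b)(c+d)(a+c)(b+d)) with 1 degree of freedom, where p = erfc(√(χ²/2)). The following standard-library Python sketch again assumes an uncorrected Pearson test, since the slides do not name one. For 1720-1779 vs 1780-1839 it yields p ≈ 0.268 (f) and p ≈ 0.326 (m), in line with the diagram.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail p for 1 degree of freedom: P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Diagram 4 counts; rows = periods, columns = (other, zero).
women = (167, 53, 164, 66)  # f, 1720-1779 vs 1780-1839
men = (205, 54, 244, 52)    # m, 1720-1779 vs 1780-1839

for label, cells in (("f", women), ("m", men)):
    stat = chi2_2x2(*cells)
    print(label, round(stat, 3), round(p_value_df1(stat), 3))
```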

Page 35

Thank you

Page 36

References

• Gurney, Thomas. 1752. Brachygraphy: or short-writing. 2nd ed. London: [no publisher].
• Nevalainen, Terttu and Helena Raumolin-Brunberg (eds). 1996. Sociolinguistics and language history: studies based on the corpus of early English correspondence. Amsterdam: Rodopi.
• Trudgill, Peter. 1974. The social differentiation of English in Norwich. Cambridge: Cambridge University Press.
• van Leeuwen, Marco H.D., Ineke Maas and Andrew Miles. 2002. HISCO: Historical international standard classification of occupations. Leuven: Leuven University Press.