ig ihr 2012
Using texts to explore historical texts: Examples from Lake District literature and the
Registrar General’s Reports
Ian Gregory
Lancaster University
Acknowledgements:
Alistair Baron, Patricia Murrieta-Flores, Andrew Hardie, and Paul Rayson (Lancaster)
Claire Grover (Edinburgh) – providing access to the geo-referenced Histpop data
Richard Deswarte – help with the Histpop data
What is GIS?
Change in Infant Mortality in England & Wales, 1851-2001
[Figure: infant mortality rate (IMR), y-axis 0 to 180, x-axis at ten-year intervals from 1851 to 2001]
Traditional HGIS: Infant mortality decline in England & Wales, 1851-1911
[Figure: y-axis % national rate (-30 to 30); x-axis decades, 1850s to 1900s; series 1 to 8]
Source: Gregory (2008), Annals of the Assoc. of American Geographers
Distant Reading
Graphs (p. 16) Maps (p. 55) Trees (p. 73)
Moretti (2005) Graphs, Maps, Trees
Literary Mapping of the Lakes
• British Academy funded pilot project with David Cooper and Sally Bushell
• Two tours of the Lake District
– Thomas Gray, 1769 (9,000 words)
• Proto-Picturesque
– ST Coleridge, 1802 (10,000 words)
• Romantic
• Aims:
– Can we create a GIS of text?
– What can it offer to literary research?
• Method:
– Texts typed up by hand
– Places tagged manually
– Conversion
– Analysis
Place names coded in XML
<p in_text="Y">On Sunday Augt. 1st - half after 12 I had a Shirt, cravat, 2 pair of
Stockings, a little paper & half a dozen Pens, a German Book (Voss's Poems)
& a little Tea & Sugar, with my Night Cap, packed up in my natty green oil-
skin, neatly squared, and put into my <format format_type="I">net</format>
Knapsack / and the Knap-sack on my back & the Besom stick in my hand, which
for want of a better, and in spite of <person>Mrs C.</person> &
<person>Mary</person>, who both raised their voices against it, especially as I left
the Besom scattered on the Kitchen Floor, off I sallied - over the
Bridge<my_comment><pl_name visited="Y">Greta Bridge,
Keswick</pl_name></my_comment>, thro' the Hop-Field, thro' the <pl_name
visited="Y">Prospect Bridge</pl_name> at <pl_name
visited="Y">Portinscale</pl_name>, so on by the tall Birch that grows out of the
center of the huge Oak, along into <pl_name visited="Y">Newlands</pl_name>--
<pl_name visited="Y">Newlands</pl_name> is indeed a lovely Place-the houses…
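The tagging scheme above can be read back out of the text with a few lines of code. The sketch below is an illustration only, not the project's actual conversion step: it uses a regular expression rather than a strict XML parser, because the excerpt contains bare "&" characters that a strict parser would reject. The function name `extract_places` and the abridged sample string are assumptions made for this example.

```python
import re

# Matches <pl_name visited="...">...</pl_name>, allowing a line break inside the tag.
PL_NAME = re.compile(r'<pl_name\s+visited="([^"]*)"\s*>(.*?)</pl_name>', re.DOTALL)

def extract_places(tagged_text):
    """Return (place name, visited?) pairs in the order they occur (hypothetical helper)."""
    places = []
    for visited, name in PL_NAME.findall(tagged_text):
        # Collapse line breaks and repeated spaces inside the tag body.
        clean = " ".join(name.split())
        places.append((clean, visited == "Y"))
    return places

# Abridged from the Coleridge excerpt above.
sample = (
    "thro' the <pl_name\n"
    'visited="Y">Prospect Bridge</pl_name> at <pl_name\n'
    'visited="Y">Portinscale</pl_name>, so on into '
    '<pl_name visited="Y">Newlands</pl_name>'
)
for place, visited in extract_places(sample):
    print(place, visited)
```

Each pair can then be passed to a gazetteer lookup to obtain coordinates.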
Convert to a GIS
OS 1:50,000 gazetteer – all places on 1:50,000 maps
• Accuracy
• Spelling problems
• Disambiguation
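A minimal sketch of the gazetteer step, showing how the spelling and disambiguation problems listed above surface in practice. It is not the project's method: the toy gazetteer, its made-up grid coordinates, the `match_place` helper and the 0.8 similarity cutoff are all assumptions for illustration. Near-miss spellings are handled with the standard library's `difflib.get_close_matches`.

```python
import difflib

# Toy gazetteer: name -> list of candidate (x, y) locations.
# Coordinates are invented placeholders, not real OS grid references.
GAZETTEER = {
    "Keswick": [(10, 20)],
    "Portinscale": [(8, 21)],
    "Newlands": [(6, 17), (90, 55)],  # two candidates: needs disambiguation
}

def match_place(name, gazetteer, cutoff=0.8):
    """Look up a tagged place name, tolerating small spelling differences."""
    if name in gazetteer:
        matched = name
    else:
        close = difflib.get_close_matches(name, list(gazetteer), n=1, cutoff=cutoff)
        if not close:
            return {"name": name, "status": "unmatched", "coords": None}
        matched = close[0]
    candidates = gazetteer[matched]
    status = "ambiguous" if len(candidates) > 1 else "matched"
    return {"name": matched, "status": status, "coords": candidates}

for query in ["Portinscal", "Newlands", "Xyzzy"]:
    print(query, "->", match_place(query, GAZETTEER))
```

Ambiguous and unmatched names are returned rather than guessed, so that they can be resolved by hand.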
Coleridge & Gray in a GIS
Smoothed surface of Gray’s places
All mentions Visits
Smoothed surface of Coleridge’s places
All mentions Visits
Class intervals are 10 equal intervals of the all-mentions surface. Bandwidth = 10 km
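The smoothing behind these surfaces can be sketched as a kernel density estimate. The slides give the bandwidth (10 km) and the classing rule (10 equal intervals taken from the all-mentions surface) but not the kernel, so the Gaussian kernel below is an assumption, as are the function names and the invented point coordinates (in km).

```python
import math

def kernel_density(points, x, y, bandwidth_km=10.0):
    """Gaussian kernel density at (x, y); all coordinates in km (kernel choice assumed)."""
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2 * bandwidth_km ** 2))
    return total / (2 * math.pi * bandwidth_km ** 2)

def equal_interval_class(value, max_value, n_classes=10):
    """Class 1..n_classes using equal intervals between 0 and max_value."""
    if max_value <= 0:
        return 1
    return min(int(value / max_value * n_classes) + 1, n_classes)

# Invented example points: every mention, and the subset actually visited.
all_mentions = [(0, 0), (2, 1), (3, 3), (25, 30)]
visits = [(0, 0), (3, 3)]

# Class breaks come from the all-mentions surface and are reused for visits,
# so the two maps share one legend.
grid = [(gx, gy) for gx in range(0, 31, 5) for gy in range(0, 31, 5)]
max_all = max(kernel_density(all_mentions, gx, gy) for gx, gy in grid)
for gx, gy in [(0, 0), (25, 30)]:
    d = kernel_density(visits, gx, gy)
    print((gx, gy), equal_interval_class(d, max_all))
```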
Comparing Coleridge and Gray
Green: Only in Gray
Yellow: Evenly in both
Red: Only in Coleridge
All mentions Visits
Mapping Emotional Response
Gray Coleridge
Physical Characteristics of Tours
[Figure: Population density of places mentioned, shown on normal and logged scales, for four groups: STC Not visited, STC Visited, Gray Not visited, Gray Visited]
[Figure: Altitude of mentions, as % of mentions by height band (100-unit bands from 0 to 99 up to 800+), split into Visited and Didn't visit/Unclear, with one panel each for Gray and Coleridge]
Close Reading with Internet Mapping
http://www.lancs.ac.uk/mappingthelakes http://www.lancs.ac.uk/mappingthelakes/v2
The Histpop Collection
• Covers the printed reports published in the Census and the Registrar General’s Annual Reports, 1801-1937
• Nearly 13,000,000 words
• Georeferenced by C. Grover (University of Edinburgh)
• Here we are concerned only with the Registrar General’s Reports, 1851-1911
– Total: 3,750,000 words
– England & Wales: 2,000,000 words
• http://www.histpop.org
Dot maps of place-name instances
Place-name instances, 1850s
Density Smoothing
Cluster identification: Standard deviations of density
www.histpop.org
Extract place-names
Ranked by word frequency      Ranked by kernel density
Place            Cnt          Place            Density   Cnt
North Shields    300          Bermondsey       .5849     6
London           294          Newington        .5842     4
Durham           207          Spitalfields     .5835     1
Nottingham       193          Whitechapel      .5835     1
Liverpool        171          Stepney          .5823     2
Hawarden         145          Rotherhithe      .5809     5
Grantham         131          London           .5803     294
Cardington       125          Shoreditch       .5794     1
Linslade         121          Bethnal Green    .5788     4
Wakefield        121          Camberwell       .5787     12
58th by kernel density: Southwick (nr Sunderland), density .3498, Cnt 1
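Identifying clusters by standard deviations of density can be sketched as a z-score threshold over the smoothed surface. The slides do not say how many standard deviations were used, so the 1.5 threshold, the function names and the cell densities below are all assumptions for illustration.

```python
import statistics

def density_z_scores(densities):
    """Express each cell's density in standard deviations from the mean."""
    values = list(densities.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return {cell: 0.0 for cell in densities}
    return {cell: (d - mean) / sd for cell, d in densities.items()}

def high_density_cells(densities, n_sd=1.5):
    """Cells whose density is at least n_sd standard deviations above the mean."""
    z = density_z_scores(densities)
    return sorted(cell for cell, score in z.items() if score >= n_sd)

# Invented densities for five grid cells.
cells = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0, "e": 10.0}
print(high_density_cells(cells))
```

Adjacent cells that pass the threshold would then be grouped into a single cluster.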
Collocation
• “In Southwick and Monkwearmouth offensive nuisances abound.”
• “At Royton, in Oldham, where the drainage was imperfect, typhoid fever was prevalent”
• “The deaths in the Liverpool workhouse, in the Mount Pleasant sub-district of Liverpool, were above 100 more than in the same period of the two previous years, owing chiefly to an epidemic of measles among children of German emigrants temporarily located in this institution; there were also 101 deaths from typhus, nearly all of which occurred in the workhouse.”
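Collocation of this kind can be sketched as a search for place-names that fall within a fixed number of words of a term of interest. The window size, the tokenizer and the `place_collocates` helper are assumptions for illustration; the slides do not state the settings used. The example sentence is the Royton quotation above.

```python
import re

def tokenize(text):
    """Split text into word tokens, keeping internal hyphens."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)

def place_collocates(text, terms, places, window=10):
    """Return (place, term) pairs where the place lies within `window` tokens of the term."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() in terms:
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if tokens[j] in places:
                    hits.append((tokens[j], tok.lower()))
    return hits

sentence = ("At Royton, in Oldham, where the drainage was imperfect, "
            "typhoid fever was prevalent")
print(place_collocates(sentence, {"typhoid"}, {"Royton", "Oldham"}))
```

This version only handles single-word place-names; multi-word names would need to be joined into one token first.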
KWIC of “West Bromwich”
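A key word in context (KWIC) listing shows each occurrence of a search term with a few words on either side. The sketch below is a generic illustration rather than the concordancer used for the slide; the `kwic` function and its `width` parameter are assumptions, and the sample text is taken from the Liverpool quotation above.

```python
def kwic(text, keyword, width=4):
    """Return (left context, node, right context) for each occurrence of keyword."""
    tokens = text.split()
    key = [k.lower() for k in keyword.split()]
    n = len(key)
    lines = []
    for i in range(len(tokens) - n + 1):
        # Compare ignoring case and surrounding punctuation.
        window = [t.strip('.,;:"“”').lower() for t in tokens[i:i + n]]
        if window == key:
            left = " ".join(tokens[max(0, i - width):i])
            node = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            lines.append((left, node, right))
    return lines

text = ("The deaths in the Liverpool workhouse, in the Mount Pleasant "
        "sub-district of Liverpool, were above 100 more")
for left, node, right in kwic(text, "Mount Pleasant", width=3):
    print(f"{left:>25} | {node} | {right}")
```

Multi-word search terms such as “West Bromwich” work the same way, because the keyword is matched token by token.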
Most common words in clusters
• Uses Mutual Information scores – top 10 for each cluster, excluding place-names, numbers, and punctuation
• 1 (North-East): Fog, took [changes in rainfall or temperature took place], largest [changes in weather], least [as largest], dense [weather related], greatest [weather], observatory, Asiatic [cholera], Halos [lunar or solar], thunder. WEATHER
• 2 (Wakefield): Falls, rain, seen [meteorological phenomena or “swallows”], reading, fell [snow or rain], number [met. readings], June, March. WEATHER
• 3 (South Lancs): declining [marriages, births or mortality], incorporated [boundary changes], noted [health or weather], cubic [cubic feet – earth movement for sanitation], workhouse, sail [Irish emigrants sailing from Liverpool], observatory, aurora, salutary [salutary effects that led to death], took [weather]. MIXED
• 4 (Oxon to Beds): cuckoo [was first heard], infirmary, Regius Professor, intermittent [intermittent fevers], sleet, solar, halos, least [rainfall or temperature], heard [thunder], thunder. WEATHER
• 5 (London): changed [changed water supply], anemometer, exclusively [supplied by one water company], hospital, command [front matter], Junction [Grand Junction Water Company], Company [almost always water company], pipes, Bills [Bills of Mortality], asylum, sewage. WATER SUPPLY
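The slides do not give the Mutual Information formula used, so the sketch below assumes one common formulation: the pointwise score log2(f_wc × N / (f_w × N_c)), where f_wc is a word's frequency in the text around a cluster, f_w its frequency in the whole corpus, N_c the number of cluster tokens and N the number of corpus tokens. The function names and the tiny token lists are invented for illustration, and the cluster tokens are assumed to be a subset of the corpus.

```python
import math
from collections import Counter

def mi_scores(cluster_tokens, corpus_tokens):
    """Pointwise mutual information of each word with the cluster (formula assumed)."""
    cluster_counts = Counter(cluster_tokens)
    corpus_counts = Counter(corpus_tokens)
    n_cluster, n_corpus = len(cluster_tokens), len(corpus_tokens)
    scores = {}
    for word, f_wc in cluster_counts.items():
        f_w = corpus_counts[word]
        if f_w == 0:
            continue  # cluster text should be part of the corpus; skip otherwise
        scores[word] = math.log2(f_wc * n_corpus / (f_w * n_cluster))
    return scores

def top_words(scores, place_names, k=10):
    """Top k words by score, excluding place-names, numbers and punctuation."""
    kept = [(w, s) for w, s in scores.items()
            if w.isalpha() and w not in place_names]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))[:k]

# Invented token lists.
corpus = ["fog", "fog", "rain", "the", "the", "the", "the", "London"]
cluster = ["fog", "fog", "the", "London"]
print(top_words(mi_scores(cluster, corpus), place_names={"London"}))
```

Scores of this kind favour rare words, which is one reason a minimum frequency threshold is often applied in practice.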
“Company” in Cluster 5
Mentions of diseases collocating to place-names
Disease           Mentions, 1850-1911
Diarrhoea         1555
Diphtheria        1261
Dysentery         332
Measles           1513
Scarlet fever     964
Smallpox          333
Whooping cough    23
[Figure: bar chart of the frequencies above; y-axis Frequency, 0 to 1600]
Mentions of diseases from 1850 to 1910
[Figure: Diseases related to placenames; y-axis Mentions (0 to 700), x-axis Decades (1850 to 1910); one series each for whooping cough, smallpox, dysentery, scarlet fever, diphtheria, measles and diarrhoea]
Places that collocate with “measles”
www.histpop.org
Comparing texts with statistics
[Figure: % (y-axis, 0 to 40) by Urban Level (x-axis, 1 to 8); series: Mentions of measles, Districts, Population]
Urban level   % national pop (1911)   Sample areas
1             9.4                     Stow on the Wold (Glou), Whitchurch (Hants.), Hexham (N’humb), Oakham (Rutland), Northallerton (N.Rid.), Holbeach (Lincs)
2             13.0                    Cockermouth (Cumb), Chippenham (Wilts), Bridport (Dorset), Bangor (Carn), Alton (Hants), Pembroke (Pembs)
3             17.8                    Guildford (Surrey), Redruth (Corn), York (E.Rid), Bucklow (Chesh), Chorley (Lancs), Maidstone (Kent)
4             18.7                    Swansea, Canterbury, Hastings, Rochdale, Bolton, Wolverhampton
5             18.0                    Sheffield, Leeds, Oxford, Southampton, Coventry, Edmonton (Mdlsex)
6             11.9                    Exeter, Hull, Nottingham, Portsmouth, Leicester, Salford (Lancs)
7             9.0                     Most of London, also Manchester, Liverpool and Birmingham
8             2.1                     Only London, mainly East End
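One way to read a comparison like this is as a ratio of each urban level's share of mentions to its share of population: values above 1 mean the level is mentioned more often than its population alone would suggest. The sketch below uses the population shares from the table, but the mention counts are invented placeholders (the slide's values survive only in the chart), and the `representation_ratio` helper is an assumption for illustration.

```python
# % of national population in 1911 by urban level, from the table above.
POP_SHARE = {1: 9.4, 2: 13.0, 3: 17.8, 4: 18.7, 5: 18.0, 6: 11.9, 7: 9.0, 8: 2.1}

def representation_ratio(mention_counts, pop_share):
    """Share of mentions divided by share of population, per urban level."""
    total = sum(mention_counts.values())
    ratios = {}
    for level, pop_pct in pop_share.items():
        mention_pct = 100.0 * mention_counts.get(level, 0) / total
        ratios[level] = mention_pct / pop_pct
    return ratios

# Invented mention counts, NOT taken from the slides.
hypothetical_mentions = {1: 20, 2: 40, 3: 90, 4: 150, 5: 160, 6: 140, 7: 300, 8: 100}
for level, ratio in representation_ratio(hypothetical_mentions, POP_SHARE).items():
    print(level, round(ratio, 2))
```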
Do mentions of “Diarrhoea, dysentery and cholera” correlate with deaths from these diseases?
                                                        IMRchdidy   Mchdiady
Kendall's tau_b   IMRchdidy   Correlation Coefficient   1.000       .225**
                              Sig. (1-tailed)                       .000
                              N                         626         626
                  Mchdiady    Correlation Coefficient   .225**      1.000
                              Sig. (1-tailed)           .000
                              N                         626         626
Spearman's rho    IMRchdidy   Correlation Coefficient   1.000       .290**
                              Sig. (1-tailed)                       .000
                              N                         626         626
                  Mchdiady    Correlation Coefficient   .290**      1.000
                              Sig. (1-tailed)           .000
                              N                         626         626
**. Correlation is significant at the 0.01 level (1-tailed).
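The table reports rank correlations, which compare the ordering of the two variables rather than their raw values. As a minimal sketch of how Spearman's rho is computed, the code below ranks each variable (tied values share an average rank) and takes the Pearson correlation of the ranks. The function names and the short example lists are invented; this does not reproduce the 626-observation result above.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented values: a death rate and a mention count for four districts.
rates = [12.0, 15.5, 9.1, 20.3]
mentions = [3, 2, 1, 7]
print(round(spearman_rho(rates, mentions), 3))
```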
Geographical Text Analysis
• Combination of Corpus Linguistics and GIS allows us to:
– 1. Geographical approach:
• Ask where this corpus is talking about
• Identify place-names in areas that the corpus concentrates on.
• Find out what it is saying about these places
– 2. Theme of interest approach:
• Find out which places are associated with our theme
• Find out what it is saying in relation to this theme
• Find out what other themes are associated with these places
• Compare geography of place-name mentions with statistical evidence to explore biases in sources
Further work
• HistPop
• BL’s 19th Century Newspapers
• Other sources