Page 1
Wikimedia/British Library map mapping project
– review and latest update
How to find 50,000 maps in a haystack of 1,000,000 images; geolocate them, and categorise them
... on a budget of no not many euros.
James Heald, Wikimedia volunteer
(User:Jheald)
Kimberly Kowal, British Library
[email protected]
Page 2
1,000,000 images
Fantastic, but …
Page 3
Very limited metadata
Page 4
Very limited metadata
Commons said no bulk upload
Page 5
Volunteer response…
Create a subject index by book…
Page 6
… encouraging images to be uploaded by the book
(20,000 so far – majority by one user)
Page 7
… however, manual categorisation of images is very, very time-consuming.
Page 8
Could anything be done more automatically…
?
Page 9
Maps: natural classification, given co-ordinates
Could anything be done more automatically…
?
Page 10
So: find the maps on Flickr, and tag them…
Page 11
… using the index to drive the process
31 Oct
Page 12
… using the index to drive the process
31 Oct
Page 13
… using the index to drive the process
31 Oct
Page 14
… using the index to drive the process
03 Nov
Page 15
… using the index to drive the process
17 Dec
Page 16
… using the index to drive the process
19 Dec
Page 17
But how many maps were there ?
Oct 31
Page 18
But how many maps were there ?
Oct 31
Page 19
But how many maps were there ?
Nov 2
Page 20
But how many maps were there ?
Nov 7
Page 21
But how many maps were there ?
Nov 14
Page 22
But how many maps were there ?
Dec 1
Page 23
But how many maps were there ?
Dec 10
Page 24
But how many maps were there ?
Dec 17
Page 25
But how many maps were there ?
Dec 28
Page 26
50,000 maps in all:

                            classmark   detailed
                   totals     index       index
                   ------   ---------   --------
misc                16074       14091       1983
Europe              13136        6254       6882
British Isles        7191         269       6922
North America        6758        1524       5234
  USA                5782        1209       4573
Asia                 2736        1280       1456
Africa               2300        1075       1225
South America         895         659        236

-- including 20,000 found independently by @Quasimondo, machine-assisted using his own pattern recognition methods
Page 27
Geo-location, using the Klokan/BL Georeferencer
(Free alternatives are also available)
Next step:
Page 28
10x more images than the BL has ever attempted before
Next step:
Page 29
Success allows the old map to be laid over the top of a modern one
Page 30
Pilot run of 3,000 completed
Page 31
Now characterised by location …
Pilot run of 3,000 completed
Page 33
All that is needed to identify individual continents …
Page 35
… nation …
… nations …
Page 37
… and beyond
… and beyond.
Page 38
Ready to be uploaded to Commons…
Page 39
Ready to be uploaded to Commons…
… almost
Page 40
To do list:
Better subject identification
Reasonable Commons categorisation
Page 41
To do/1: Subject identification
Current: OSM Nominatim, 4 votes out of 5
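The "4 votes out of 5" rule can be read as a simple agreement threshold. The following is a minimal sketch, assuming that five points are sampled from each georeferenced map, each point is reverse-geocoded to a place name, and a name is accepted only when at least four of the five agree. Which points are actually sampled is not stated on the slide. The function name and the sample labels are hypothetical, and no call to Nominatim is made here.

```python
from collections import Counter
from typing import Optional, Sequence


def vote_on_subject(labels: Sequence[Optional[str]], needed: int = 4) -> Optional[str]:
    """Return the label shared by at least `needed` sample points, else None.

    `labels` holds one reverse-geocoded place name per sample point;
    None marks a point for which no name came back.
    """
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        return None
    winner, votes = counts.most_common(1)[0]
    return winner if votes >= needed else None


# Hypothetical results for the five sample points of two maps.
print(vote_on_subject(["France", "France", "France", "France", "Belgium"]))
print(vote_on_subject(["France", "France", "France", "Belgium", "Belgium"]))
```

A strict threshold like this keeps false positives low, at the cost of leaving maps that straddle a border without a subject.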
Page 42
To do/1: Subject identification
Small features: Look up on Wikidata, find plausible candidate
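One way to read "find plausible candidate" for small features is to pick the nearest item with coordinates close to the map's centre. The sketch below assumes that interpretation; the 5 km cut-off, the function names and the item identifiers are all hypothetical, and the candidate list would in practice come from a Wikidata lookup that is not shown.

```python
import math
from typing import Iterable, Optional, Tuple

Candidate = Tuple[str, float, float]  # (item id, latitude, longitude)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def plausible_candidate(
    centre: Tuple[float, float],
    candidates: Iterable[Candidate],
    max_km: float = 5.0,  # assumed cut-off, not taken from the slides
) -> Optional[str]:
    """Return the id of the nearest candidate within `max_km`, else None."""
    best, best_dist = None, max_km
    for item_id, lat, lon in candidates:
        dist = haversine_km(centre[0], centre[1], lat, lon)
        if dist <= best_dist:
            best, best_dist = item_id, dist
    return best


# Hypothetical map centre and candidates.
centre = (51.5, -0.1)
candidates = [("item-A", 51.501, -0.1), ("item-B", 52.5, -0.1)]
print(plausible_candidate(centre, candidates))
```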
Page 43
To do/1: Subject identification
Large features: can be over-cautious
Need better idea of size of candidate features…
Page 44
To do/1: Subject identification
Large features: … so compare with typical existing maps
Page 45
To do/2: Categorisation
Principle on Commons is to refine into groups of 'human manageable' size.
~ 4 to 40 images (larger for series)
Good for humans, less good for machines... wildly different categorisation depths & naming
Page 46
To do/2: Categorisation
Routine upload and management categories ... straightforward enough.
Maps from collection uploaded on <date>
Maps from collection uploaded on <date> with categorisation to confirm
Images from <book>
but then ...
Page 47
To do/2: Categorisation
Countries:
  Old maps of <country>
  Old maps of part of <country>
Cities:
  Old maps of <city>
  Old maps of cities in <country>
  Old maps of cities in <part of country>
  + "<city>" itself ?
Features (i.e. buildings, castles, cathedrals, battlefields, etc.):
  <Feature> / Plans of <Feature>
  Plans of <feature-type>s in <place>
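The naming patterns above lend themselves to simple templating. The following is a minimal sketch that only generates candidate category names; the function names are hypothetical, the plural is formed by naively appending "s" as in the pattern on the slide, and whether each generated category actually exists on Commons would still need checking.

```python
from typing import List, Optional


def country_categories(country: str, whole_country: bool = True) -> List[str]:
    """Candidate category for a map of a country, or of part of it."""
    if whole_country:
        return [f"Old maps of {country}"]
    return [f"Old maps of part of {country}"]


def city_categories(city: str, country: str, region: Optional[str] = None) -> List[str]:
    """Candidate categories for a city map, most specific first."""
    names = [f"Old maps of {city}"]
    if region is not None:
        names.append(f"Old maps of cities in {region}")
    names.append(f"Old maps of cities in {country}")
    names.append(city)  # the open question on the slide: '"<city>" itself ?'
    return names


def feature_categories(feature: str, feature_type: str, place: str) -> List[str]:
    """Candidate categories for a plan of a building, battlefield, etc."""
    return [
        f"Plans of {feature}",
        feature,
        # Naive plural: wrong for types such as "church"; needs a lookup in practice.
        f"Plans of {feature_type}s in {place}",
    ]


print(city_categories("Bruges", "Belgium"))
print(feature_categories("Conwy Castle", "castle", "Wales"))
```

Returning an ordered list lets an uploader try the most specific name first and fall back to broader ones.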
Page 48
To do/3: Strengthening Wikidata
<feature-type> should be given by P31 ("instance of") -> church, castle, cathedral, battlefield, etc.
But data often not yet there...
Need to supply: WP category mining (care needed: "category spillage"), databases (if PD), etc.
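Reading the feature type from P31 can be sketched as a query against the Wikidata Query Service. The code below is an illustration under assumptions rather than the project's actual tooling: it builds the SPARQL query and request URL, and parses a response in the standard SPARQL JSON results format. The helper names, the item id and the sample response are placeholders, and no network request is made.

```python
import re
from typing import Dict, List
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def p31_query(item_id: str) -> str:
    """Build a SPARQL query for the 'instance of' (P31) values of one item."""
    if not re.fullmatch(r"Q[1-9]\d*", item_id):
        raise ValueError(f"not a Wikidata item id: {item_id!r}")
    return (
        "SELECT ?type ?typeLabel WHERE { "
        f"wd:{item_id} wdt:P31 ?type . "
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } '
        "}"
    )


def p31_url(item_id: str) -> str:
    """URL that would return the query results as JSON."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": p31_query(item_id), "format": "json"})


def feature_types(response: Dict) -> List[str]:
    """Extract type labels; an empty list means P31 is not yet filled in."""
    bindings = response["results"]["bindings"]
    return [b["typeLabel"]["value"] for b in bindings if "typeLabel" in b]


# Placeholder response in the SPARQL JSON results format.
sample = {"results": {"bindings": [{"typeLabel": {"type": "literal", "value": "castle"}}]}}
print(feature_types(sample))
print(feature_types({"results": {"bindings": []}}))
```

An empty result is the "data often not yet there" case, which is where category mining or other sources would have to fill the gap.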
Page 49
To do list
There is work to do…
But with some work (and some human mop-up), automated upload + reasonable categorisation should be possible.
Page 50
State of play
Georeferencing is underway
Index pages now have “to georef” templates.
Page 51
State of play
Main progress page is live
Page 52
Conclusions:
Tiered levels of wiki-pages leading to image searches can be used to drive large projects
Even ad-hoc rough indexes are useful
Commons's own old maps should be next (~ 60,000)
Georeferencing is fun -- come and give it a try