Under the Hood: How GeoNames Aggregates over 35 Sources into One Data Set
DESCRIPTION
Speaker: Marc Wick
TRANSCRIPT
GeoNames is ... aggregator of free geo data
I am ... Marc Wick
self-employed software engineer, Switzerland
GeoNames
"Under the Hood: How GeoNames Aggregates many Sources into One Data Set"
GeoNames, Marc Wick Web 2.0 Expo - 8. Nov 2007 Berlin 2
GeoNames Feature Density Map
GeoNames - Gazetteer
Pragmatic, useful, ease of use
Over 6.5 million features
CC-BY licence
9 feature classes
Screen shot Berlin
Origins and Goal
Proprietary application
Team up together
Contribute modifications to central database
Applications switch to GeoNames from proprietary aggregation
Challenge
A lot of data IS available
Many providers
Languages
Scripts
GeoNames Ambassadors
GeoNames contact
Speak local language
Know local situation
Data Sources
National Mapping Agencies
Statistical Offices
Postal codes
National Geospatial-Intelligence Agency (NGA)
Applications using GeoNames
− Data files
− Manual modifications
US vs Europe
US data is freely available
European data is not available
Rest of the World?
Consequences
Future of geodata availability
We believe basic geodata will be free in most countries
Why:
− Economy
− Traffic Policy and Road Safety (road signs)
Free Availability is only a First Step
Who aggregates data
GeoNames
Supranational mapping agencies
Supranational organisations
INSPIRE
Problems and Solutions I
Shape / GML
Datum reprojection
FWTools / GDAL/OGR
PostGIS / EPSG / native tools / custom implementation
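To make the reprojection problem concrete: a coordinate delivered in a national grid has to be converted to WGS84 latitude/longitude before it can be merged with other sources. The sketch below is an illustration only, not the GeoNames import code. It assumes the Swiss CH1903/LV03 grid as the input system and uses swisstopo's published approximate formulas (accuracy on the order of a metre); the function name is made up for this example and the coefficients should be checked against the official document before use. In practice a tool such as GDAL/OGR or PostGIS would do this from the EPSG definitions.

```python
def ch1903_to_wgs84(easting, northing):
    """Approximate CH1903/LV03 (metres) -> WGS84 (decimal degrees)."""
    # Shift to the projection origin in Bern and scale to units of 1000 km.
    y = (easting - 600000.0) / 1_000_000.0
    x = (northing - 200000.0) / 1_000_000.0

    # Longitude and latitude in units of 10000 arc seconds.
    lon = (2.6779094
           + 4.728982 * y
           + 0.791484 * y * x
           + 0.1306 * y * x ** 2
           - 0.0436 * y ** 3)
    lat = (16.9023892
           + 3.238272 * x
           - 0.270978 * y ** 2
           - 0.002528 * x ** 2
           - 0.0447 * y ** 2 * x
           - 0.0140 * x ** 3)

    # 10000 arc seconds correspond to 100/36 degrees.
    return lat * 100.0 / 36.0, lon * 100.0 / 36.0


lat, lon = ch1903_to_wgs84(600000, 200000)  # grid origin in Bern
print(round(lat, 4), round(lon, 4))  # 46.9511 7.4386
```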
Problems and Solutions II
Feature codes not 1:1
Non-ASCII
Country codes
Admin1 codes
Pattern matching
Transliteration
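For the non-ASCII problem, a first approximation of an ASCII name can be derived with Unicode normalization. The sketch below is an assumption about one possible approach, not the method used by GeoNames: it only strips diacritics from Latin letters, and names in other scripts (Cyrillic, Arabic, ...) or letters without a decomposition (such as "ß" or "ø") would still need a per-script transliteration table. The function name is hypothetical.

```python
import unicodedata


def fold_to_ascii(name):
    """Strip diacritics; return None if non-ASCII characters remain."""
    # NFKD splits "ü" into "u" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    try:
        stripped.encode("ascii")
    except UnicodeEncodeError:
        return None  # needs a script-specific transliteration table
    return stripped


print(fold_to_ascii("Zürich"))  # Zurich
print(fold_to_ascii("Genève"))  # Geneve
print(fold_to_ascii("Берлин"))  # None
```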
Place name matching
Geocoding
Distance
Feature type and feature code
Reverse geocoding, compare name similarity
− Levenshtein distance
− Letter pair similarity
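The two name similarity measures named on the slide can be sketched as follows. These are textbook definitions, not the GeoNames implementation: the slide does not say how the scores are combined or which thresholds are used, and reading "letter pair similarity" as a Dice coefficient over adjacent character pairs is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def letter_pairs(text):
    """Adjacent letter pairs of each word, upper-cased."""
    pairs = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def letter_pair_similarity(a, b):
    """Dice coefficient over letter pairs: 0.0 (disjoint) to 1.0 (same pairs)."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    shared = 0
    for pair in pairs_a:
        if pair in pairs_b:
            pairs_b.remove(pair)  # each pair may be matched only once
            shared += 1
    return 2.0 * shared / total


print(levenshtein("Berlin", "Berlino"))                       # 1
print(round(letter_pair_similarity("Berlin", "Berlino"), 2))  # 0.91
```

Because the pair measure works word by word, it tolerates reordered words better than edit distance, which can help when sources write the same name in a different word order.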
Wikipedia GeoTemplates
Proliferation of GeoFormats
No consensus, Anarchy
Examples
− <geo>48 46 36 N 121 48 51 W</geo>
− {{coor d|48.7767|N|121.8142|W|}}
− Berlin : |lat_deg = 52|lat_min = 31
− ... (Any template you could possibly think of is used somewhere)
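A minimal sketch of how the first two example formats could be normalized to decimal degrees. The regular expressions and function names are assumptions written for this illustration; they cover only the two patterns shown on the slide, and every further template variant would need its own rule.

```python
import re

# <geo>deg min sec N|S deg min sec E|W</geo>
GEO_TAG = re.compile(
    r"<geo>\s*(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+([NS])\s+"
    r"(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s+([EW])\s*</geo>")
# {{coor d|lat|N|S|lon|E|W|...}} with decimal degrees
COOR_D = re.compile(
    r"\{\{coor d\|([\d.]+)\|([NS])\|([\d.]+)\|([EW])\|[^}]*\}\}")


def signed(value, hemisphere):
    """South and west become negative decimal degrees."""
    return -value if hemisphere in "SW" else value


def parse_coordinates(wikitext):
    """Return (lat, lon) in decimal degrees, or None if no known pattern matches."""
    m = GEO_TAG.search(wikitext)
    if m:
        lat = int(m[1]) + int(m[2]) / 60 + float(m[3]) / 3600
        lon = int(m[5]) + int(m[6]) / 60 + float(m[7]) / 3600
        return signed(lat, m[4]), signed(lon, m[8])
    m = COOR_D.search(wikitext)
    if m:
        return signed(float(m[1]), m[2]), signed(float(m[3]), m[4])
    return None


lat, lon = parse_coordinates("<geo>48 46 36 N 121 48 51 W</geo>")
print(round(lat, 4), round(lon, 4))  # 48.7767 -121.8142
print(parse_coordinates("{{coor d|48.7767|N|121.8142|W|}}"))  # (48.7767, -121.8142)
```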
Alternate Names
...
Italian : Berlino
English : Berlin
Arabic : برلين
Korean : 베를린
Thai : เบอร์ลิน
Russian : Берлин
Chinese : 柏林
Marathi : बर्लिन
... (ca 100 names)
Postal codes
Geocode – postal code numeric distance
Accuracy, completeness
ScribbleMaps by Robert Kosara
Data Dump
Flat csv files
Simple format
Ease of use
Full daily dump
Daily modifications
RDF
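Reading such a flat file takes only a few lines. The sketch below assumes tab as the delimiter and the column order listed in `COLUMNS`; both are assumptions that should be checked against the readme shipped with the dump. The sample row uses made-up values for illustration and is not copied from the real data.

```python
import csv
import io

# Assumed column order; verify against the readme shipped with the dump.
COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature_class", "feature_code", "country_code", "cc2",
    "admin1_code", "admin2_code", "admin3_code", "admin4_code",
    "population", "elevation", "dem", "timezone", "modification_date",
]


def read_features(lines):
    """Yield one dict per feature from tab-separated dump lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(COLUMNS):
            continue  # skip malformed lines rather than guess
        record = dict(zip(COLUMNS, row))
        record["latitude"] = float(record["latitude"])
        record["longitude"] = float(record["longitude"])
        yield record


# Illustrative row with made-up values, not copied from the real dump.
sample = "\t".join([
    "1", "Berlin", "Berlin", "Berlino,Berlín", "52.52", "13.41",
    "P", "PPLC", "DE", "", "16", "", "", "", "3400000", "", "43",
    "Europe/Berlin", "2007-11-08",
]) + "\n"

for feature in read_features(io.StringIO(sample)):
    print(feature["name"], feature["feature_code"], feature["latitude"])
    # Berlin PPLC 52.52
```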
Web Services
Search
− Ranking: TF-IDF, relevancy
− I18n
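The idea behind TF-IDF ranking can be sketched in a few lines. This is the textbook formula on a toy index, not the scoring used by the GeoNames search service; a full-text engine such as Lucene adds further factors (length normalization, boosts). The document ids and texts are hypothetical.

```python
import math
from collections import Counter


def tf_idf_rank(query, documents):
    """Rank document ids by the summed tf * idf of the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / doc_freq[term]) + 1.0  # +1 keeps common terms non-zero
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


places = {
    "a": "berlin germany capital",
    "b": "berlin new hampshire united states",
    "c": "bern switzerland capital",
}
print(tf_idf_rank("berlin capital", places))  # ['a', 'c', 'b']
```

Pure term statistics are only one part of relevancy for place names; signals such as population or feature code would typically be mixed in, but the slide does not say which ones are used.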
Hierarchy Web Services
Hierarchy
Child
Neighbour
Sibling
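The four hierarchy operations can be pictured on a small in-memory model. The data and function names below are hypothetical and deliberately incomplete (only one neighbour is listed per country); this illustrates what each operation returns, not the web service interface.

```python
# Hypothetical toy data: child -> parent, plus an explicit adjacency table.
PARENT = {
    "Europe": "Earth",
    "Switzerland": "Europe",
    "Germany": "Europe",
    "Zurich": "Switzerland",
    "Bern": "Switzerland",
}
NEIGHBOURS = {"Switzerland": {"Germany"}, "Germany": {"Switzerland"}}


def hierarchy(place):
    """Path from the root down to the place itself."""
    path = [place]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def children(place):
    """Direct children of a place, sorted by name."""
    return sorted(c for c, p in PARENT.items() if p == place)


def siblings(place):
    """Other places that share the same parent."""
    parent = PARENT.get(place)
    if parent is None:
        return []
    return [c for c in children(parent) if c != place]


def neighbours(place):
    """Adjacent places according to the adjacency table."""
    return sorted(NEIGHBOURS.get(place, set()))


print(hierarchy("Zurich"))        # ['Earth', 'Europe', 'Switzerland', 'Zurich']
print(children("Switzerland"))    # ['Bern', 'Zurich']
print(siblings("Zurich"))         # ['Bern']
print(neighbours("Switzerland"))  # ['Germany']
```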
Gtopo30
SRTM3
JDBC
Database : Postgres (PostGIS)
Lucene
− Full Text Index
− TF-IDF
Tomcat (Java)
Apache
− mod_rewrite
JSON
jdom.org (XML)
ROME (RSS)
JMS
− ActiveMQ
Libraries
Java
Drupal
Ruby
PHP
Perl
Python
Lisp
Synchronization
Daily dump
Daily modification
JMS
RDF dump, periodically
Linked Data
Applications using GeoNames
Thousands of applications
Search
Site navigation
Geocoding
Thank you for your attention.