Survey frame geocoding using administrative data sources
Mirosław Migacz, Chief GIS Specialist
Central Statistical Office of Poland
Dublin, 2 XI 2017
Survey frames
• Survey frame statistical units:
  • persons,
  • buildings,
  • enterprises,
  • farms.
• Georeference – localization of statistical units:
  • facilitates field work for interviewers,
  • facilitates survey management,
  • enables survey result presentation on maps.
TERYT register components and the corresponding spatial datasets:
• TERC (voivodeship, powiat, gmina): National Register of Boundaries (PRG)
• SIMC (locality): National Register of Geographical Names (PRNG)
• BREC (statistical region, census enumeration area): statistical region and census enumeration area boundaries
• ULIC (street): street axes
• NOBC (building, dwelling): statistical address points
Spatial data in official statistics
• reference material collection: January 2009 – December 2009
• reference material processing: January 2010 – May 2010
• address point acquisition: January 2010 – June 2010
• address point database update: continuously since July 2010
PBA – spatial address databases
PBA vs survey frames
• PBA
  • locations of buildings with at least one dwelling.
• Survey frames
  • OBS – frame for social surveys,
  • OBR – frame for agricultural surveys,
  • BJS – statistical unit database (enterprises).
SURVEY FRAME (OBS, OBR, BJS)
• address data
• TERYT identifiers
SPATIAL DATA
• address points
• TERYT identifiers
GEOCODING
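The diagram above describes geocoding as a join between survey frame records and address points that share TERYT identifiers and address data. The sketch below illustrates that idea with the standard library only. The field names (`simc`, `ulic`, `number`, `x`, `y`) and the choice of join key are assumptions made for illustration, since the slides do not give the actual frame or PBA schemas. The identifier and coordinate values are taken from the Węgorzyno example later in the deck.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (locality ID, street ID, house number)


def address_key(simc: str, ulic: Optional[str], number: str) -> Key:
    """Build a normalised join key; the street ID may be missing."""
    return (simc.strip(), (ulic or "").strip(), number.strip().upper())


def geocode(frame: List[dict], points: List[dict]) -> List[dict]:
    """Copy x/y from address points onto frame records with an identical key."""
    index: Dict[Key, Tuple[float, float]] = {}
    for p in points:
        index[address_key(p["simc"], p.get("ulic"), p["number"])] = (p["x"], p["y"])

    geocoded = []
    for rec in frame:
        xy = index.get(address_key(rec["simc"], rec.get("ulic"), rec["number"]))
        # Records without a matching address point keep empty coordinates.
        geocoded.append({**rec, "x": xy[0] if xy else None, "y": xy[1] if xy else None})
    return geocoded


# Hypothetical records for illustration only.
points = [{"simc": "0980062", "ulic": "08828", "number": "1", "x": 281563.44, "y": 636550.11}]
frame = [
    {"unit": "A", "simc": "0980062", "ulic": "08828", "number": "1 "},
    {"unit": "B", "simc": "0980062", "ulic": "08828", "number": "2"},
]
result = geocode(frame, points)
```

Unit A acquires coordinates because its key matches after whitespace normalisation; unit B has no matching address point and stays without a georeference.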
Survey frame geocoding
PBA, OBS, OBR, BJS, other sources
PBA vs survey frames
PRG, OBS, OBR, BJS, LPIS
complementary
Survey frames vs data sources
PRG, OBS, OBR, BJS, LPIS, PBA
Farm Structure Survey 2016
Survey frame geocoding
PRG, OBS, OBR, BJS, LPIS, PBA
Improvement of the use of administrative sources (ESS.VIP ADMIN)
Survey frame geocoding
PRG, OBS, OBR, BJS, LPIS, PBA
Improvement of the use of administrative sources (ESS.VIP ADMIN) - application
Survey frame geocoding
gmina (LAU2)
EMUiA
PRG
address points for all buildings
National Register of Boundaries (PRG)
register of:
- localities
- streets
- addresses
Register of localities, streets and addresses (EMUiA)
Address point
Locality
Street
Register of localities, streets and addresses (EMUiA)
Administrative unit
Locality
TERYT identifier – voidable
Register of localities, streets and addresses (EMUiA)
Street name
Locality
TERYT identifier – voidable
Street
Register of localities, streets and addresses (EMUiA)
street
address
locality
Register of localities, streets and addresses (EMUiA)

Address point
Gmina   Locality    Loc. ID   Street     Str. ID   Addr. #   X           Y
name    Węgorzyno   X         Kolejowa   X         1         281563.44   636550.11

Locality
Gmina ID   Locality    Loc. ID
X          Węgorzyno   0980062

Street
Gmina ID   Locality / Loc. ID   Street     Str. ID
X          X                    Kolejowa   08828
EMUiA – problems
• multiple localities with the same name within one voivodeship
• multiple streets with the same name within one voivodeship / gmina / locality
• typing errors
• completeness issues
EMUiA – solutions
• assign gmina ID to localities:
  • pairing by locality ID with the TERYT locality register
  • spatial join
• assign gmina ID to address points:
  • spatial join
• assign locality ID to address points:
  • pairing by both gmina ID and locality name
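The two techniques named above, a spatial join and pairing on a compound key, can be sketched as follows. This is a standard-library illustration only, not the workflow actually used (which the slides suggest was GIS-based). The gmina and locality identifiers (`G1`, `G2`, `L001`, `L002`) and the square boundaries are hypothetical, coordinates are treated as planar, and polygon holes and points lying exactly on a boundary are not handled.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: count how many edges a horizontal ray from pt crosses."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_gmina(pt: Point, gminas: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Spatial join: return the ID of the first gmina polygon containing pt."""
    for gmina_id, ring in gminas.items():
        if point_in_polygon(pt, ring):
            return gmina_id
    return None


def assign_locality(gmina_id: Optional[str], name: str,
                    localities: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Pair on (gmina ID, locality name) so same-named localities stay distinct."""
    if gmina_id is None:
        return None
    return localities.get((gmina_id, name.strip().lower()))


# Hypothetical data: two neighbouring gminas, each with a locality of the same name.
gminas = {
    "G1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "G2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
localities = {("G1", "nowa wieś"): "L001", ("G2", "nowa wieś"): "L002"}

gmina_a = assign_gmina((1.0, 1.0), gminas)
gmina_b = assign_gmina((3.0, 1.0), gminas)
locality_a = assign_locality(gmina_a, "Nowa Wieś", localities)
locality_b = assign_locality(gmina_b, "Nowa Wieś ", localities)
```

Pairing on the locality name alone would not distinguish the two localities; adding the gmina ID obtained from the spatial join resolves the ambiguity listed among the EMUiA problems.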
EMUiA – solutions
• assign street ID to address points:
  • pairing by street name with the street feature class
  • pairing with TERYT street catalogue […]
Pairing w/ TERYT street catalogue
Locality ID (SIMC)
Street ID
Street name variations:
• NAZWA_1
• ULICA_1: NAZWA_2 + NAZWA_1
• ULICA_2: NAZWA_1 + NAZWA_2
• ULICA_3: CECHA + NAZWA_2 + NAZWA_1 (CECHA + NAZWA_1 if NAZWA_2 IS NULL)
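The name variations listed above can be derived from the three catalogue fields with a small helper. The function name `street_name_variants` and the sample values are illustrative assumptions; only the field names CECHA, NAZWA_1 and NAZWA_2 and the composition rules come from the slide.

```python
from typing import Dict, Optional


def street_name_variants(cecha: Optional[str], nazwa_1: str,
                         nazwa_2: Optional[str]) -> Dict[str, str]:
    """Compose the four name variants; empty or NULL parts are skipped."""

    def join(*parts: Optional[str]) -> str:
        return " ".join(p.strip() for p in parts if p and p.strip())

    return {
        "NAZWA_1": join(nazwa_1),
        "ULICA_1": join(nazwa_2, nazwa_1),
        "ULICA_2": join(nazwa_1, nazwa_2),
        # Falls back to CECHA + NAZWA_1 when NAZWA_2 is NULL.
        "ULICA_3": join(cecha, nazwa_2, nazwa_1),
    }


# Illustrative values only.
two_part = street_name_variants("ul.", "Mickiewicza", "Adama")
one_part = street_name_variants("ul.", "Kolejowa", None)
```

For a one-part name such as Kolejowa, NAZWA_1, ULICA_1 and ULICA_2 coincide, and ULICA_3 reduces to the prefixed form "ul. Kolejowa".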
Pairing w/ TERYT street catalogue
Pairing by:
• SIMC + NAZWA_1
• SIMC + ULICA_1
• SIMC + ULICA_2
• SIMC + ULICA_3
• NAZWA_1
• ULICA_1
• ULICA_2
• ULICA_3
Locality ID (SIMC)
Street ID
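One way to read the list of pairing rules is as a cascade from the strictest key (locality ID plus name) to the loosest (name only). The sketch below assumes that the rules are tried in the listed order and that the first rule yielding any match wins; the slides do not state this, so treat it as an assumption. The function `pair_street`, the catalogue layout, and all rows except the Kolejowa / 0980062 / 08828 example are hypothetical.

```python
from typing import List, Optional, Tuple

FIELDS = ("NAZWA_1", "ULICA_1", "ULICA_2", "ULICA_3")


def norm(text: str) -> str:
    """Lower-case and collapse whitespace before comparing names."""
    return " ".join(text.lower().split())


def pair_street(simc: Optional[str], street: str,
                catalogue: List[dict]) -> Tuple[str, List[str]]:
    """Return (rule used, matching street IDs), trying the strictest rule first."""
    target = norm(street)
    for use_simc in (True, False):  # pass 1: SIMC + name, pass 2: name only
        if use_simc and not simc:
            continue
        for field in FIELDS:
            ids = sorted({
                row["ulic"] for row in catalogue
                if norm(row[field]) == target
                and (not use_simc or row["simc"] == simc)
            })
            if ids:
                return ("SIMC + " if use_simc else "") + field, ids
    return "none", []


# Catalogue rows with precomputed name variants (mostly hypothetical values).
catalogue = [
    {"simc": "0980062", "ulic": "08828", "NAZWA_1": "Kolejowa",
     "ULICA_1": "Kolejowa", "ULICA_2": "Kolejowa", "ULICA_3": "ul. Kolejowa"},
    {"simc": "0000002", "ulic": "99999", "NAZWA_1": "Kolejowa",
     "ULICA_1": "Kolejowa", "ULICA_2": "Kolejowa", "ULICA_3": "ul. Kolejowa"},
    {"simc": "0000001", "ulic": "12345", "NAZWA_1": "Mickiewicza",
     "ULICA_1": "Adama Mickiewicza", "ULICA_2": "Mickiewicza Adama",
     "ULICA_3": "ul. Adama Mickiewicza"},
]

with_locality = pair_street("0980062", "ul. Kolejowa", catalogue)
name_only = pair_street(None, "Kolejowa", catalogue)
```

When the locality ID is missing, the name-only rules can return several street IDs for one address point, which is how a single address point ends up with multiple matched records.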
Pairing w/ TERYT street catalogue
• pairing addresses with the street catalogue by street names (string),
• multiple matches -> address point records are multiplied,
• result: 13,635,270 matched address point records (initial number of address points: 7,533,868),
• 275,453 (3.6%) out of 7,533,868 address points have a street name present but no street ID assigned.
Survey frame geocoding
• agricultural survey frame: slightly more than half of the records qualified for pairing (identifiers present) acquired a georeference,
• other survey frames: Q4 2017, Q1 2018
Conclusions on source data
• hope for data quality improvement over time (the PRG dataset tested is dated 13.06.2016),
• other techniques for record matching in order to assign identifiers to more address points:
  • building an address locator for ArcGIS geocoding tools,
  • string distance analyses (e.g. stringdist Python module).
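As an illustration of the string distance idea, the sketch below matches a misspelled street name to the closest catalogue name. It does not use the stringdist module mentioned above; a plain Levenshtein distance written with the standard library stands in for it. The helper `closest_street`, the threshold of 2 edits, and the candidate names are assumptions chosen for the example.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_street(name: str, candidates: Iterable[str],
                   max_distance: int = 2) -> Optional[Tuple[str, int]]:
    """Return the nearest candidate within max_distance edits, else None."""
    best: Optional[Tuple[str, int]] = None
    for cand in candidates:
        d = levenshtein(name.lower(), cand.lower())
        if d <= max_distance and (best is None or d < best[1]):
            best = (cand, d)
    return best


# Hypothetical typing error resolved against a short list of street names.
match = closest_street("Kolejwoa", ["Kolejowa", "Kościelna", "Kolista"])
```

A threshold keeps distant names from being paired; in practice it would need tuning, because short street names can differ by only one or two characters while being distinct streets.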
Mirosław Migacz
Chief GIS Specialist
Central Statistical Office of Poland
@mireslav
www.linkedin.com/in/migacz
www.slideshare.net/MirosawMigacz