Presented at the 2nd ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information (GEOCROWD) 2013
Exploratory analysis of OpenStreetMap for land use classification
Jacinto Estima and Marco Painho
www.isegi.unl.pt
2nd International Workshop on Crowdsourced and Volunteered Geographic Information (GEOCROWD) 2013
21st International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2013)
November 5 - 8, 2013 — Orlando, Florida, USA
Agenda
• Introduction
• Objective
• Related work:
– Volunteered Geographic Information
– VGI initiatives (examples)
– Research using VGI
• Material and Methods
• Results and Discussion
• Conclusions
GEOCROWD 2013 • ACM SIGSPATIAL GIS 2013 • November 5 - 8, 2013, Orlando, Florida, USA
Introduction
• The availability of VGI on the web has grown exponentially in recent years
• An inventory made by Elwood in 2009 identified 99 active VGI initiatives
• Research has already been conducted in some areas:
– Emergency response
– Navigation
– Land use/cover validation
– etc.
• To the best of our knowledge, no study has used OSM in the production of land use/cover databases
Objective
• Objective:
– Conduct an exploratory analysis of the OSM database for land use/cover production, using Corine Land Cover (CLC) as reference data
• Contributions:
– Make a first attempt to relate the two nomenclatures
– Evaluate the quality of the OSM land use classification over continental Portugal, taking CLC as reference data, to assess whether it can be used as ground truth for LULC validation in the future
Volunteered Geographic Information
• “Spatial” type of User Generated Content (UGC) contributed by volunteers
• Has been growing exponentially since 2005:
– Evolution of important technologies (Web 2.0, GPS, etc.)
– Willingness of private citizens to contribute
• 70% of the initiatives counted by Elwood started after 2005 (when Google Maps was launched)
• Issues:
– Heterogeneity, absence of formal structures and quality control procedures, absence of metadata
• Advantages:
– Quantity, temporal coverage, and the local knowledge of its contributors
VGI initiatives (examples)
• HD Traffic™ from TomTom (real-time traffic data)
• OpenStreetMap (OSM) (aims to provide free geographic data to anyone)
• Wikimapia (based on Google Maps)
• Flickr (Kisilevich downloaded a total of 86,314,466 geotagged photos in 2010)
• Map Tube (“Place to put maps”)
• “Did you feel it” (USGS initiative for earthquake mapping)
• Etc.
Research using VGI
• Fritz et al. developed a platform that uses a global network of volunteers to help improve the quality of global land cover maps
• Leung and Newsam (2010) ran experiments to automatically derive maps of what-is-where from large collections of georeferenced photos (achieving a classification accuracy of around 75%)
• Estima and Painho (2013) explored the possibility of using Flickr photos as a source of ground truth data to help in the accuracy assessment phase of land use/cover production
• OSM:
– Over et al. (2010) studied, for the first time, the possibility of generating interactive 3D city models based on free geo-data available from OSM and public domain height information provided by the Shuttle Radar Topography Mission
– Al-Bakri and Fairbairn (2012) used OSM and Ordnance Survey (OS) data to take a step towards the integration of geospatial datasets from varied sources (focusing on semantic and structural similarities)
Study area and Datasets (1)
• Study area
– The study area is continental Portugal
– The land cover is mainly composed of agricultural and forest areas (around 95%)
• Datasets
– OSM database (only polygon datasets were used to quantify areas): Buildings, Landuse and Natural Areas datasets
– Corine Land Cover (CLC) database for the CLC2006 inventory, version 16 (04/2012), vector format
– Portuguese official administrative boundaries database, “Carta Administrativa Oficial de Portugal” (CAOP), vector format
Study area and Datasets (2)
[Figure: panels a), b), and c)]
• CLC nomenclature
• OSM nomenclature available from http://wiki.openstreetmap.org/wiki/Map_Features
• Assumptions:
– We assume that the time difference between the CLC and OSM databases (2006 for CLC and 2013 for OSM) does not represent a major issue, considering a yearly average land cover change in Europe of 0.23%
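The size of the change implied by this assumption can be checked with a short calculation. The sketch below assumes that the 0.23% yearly average applies uniformly over the seven years between the two datasets, which the slides do not state; the variable names are illustrative.

```python
# Rough check of the time-gap assumption: how much land cover could have
# changed between the CLC inventory (2006) and the OSM extract (2013)?
# Assumption (not from the slides): the 0.23% yearly average for Europe
# applies uniformly in every year of the gap.

YEARLY_CHANGE = 0.0023  # 0.23% per year, as quoted on the slide
CLC_YEAR = 2006
OSM_YEAR = 2013

years = OSM_YEAR - CLC_YEAR

# Additive estimate: the same share of the territory changes every year.
linear_change = YEARLY_CHANGE * years

# Compounded estimate: each year's change applies to the previous year's state.
compound_change = (1 + YEARLY_CHANGE) ** years - 1

print(f"Years between datasets: {years}")
print(f"Additive estimate:   {linear_change:.2%}")
print(f"Compounded estimate: {compound_change:.2%}")
```

Both estimates come out at roughly 1.6% of the land cover over the whole period.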
Methods
1. Analysis of OSM datasets (nomenclature and area of coverage)
2. Analysis and establishment of a relationship between the nomenclatures (OSM and CLC)
3. Analysis of the coverage of each OSM class using CLC level 1 as reference
4. Analysis of the matching degree between related classes
5. Analysis of the OSM spatial distribution
1. Analysis of OSM datasets
Areas of coverage of OSM datasets

Dataset                           Area (ha)    Country coverage (%)
Natural areas                     140006.95    1.57%
Landuse                           144350.23    1.62%
Buildings                           7057.61    0.08%
Total                                          3.27%
Overlapping areas                              -0.03%
Total (after removing overlaps)                3.24%
Existing classification differences (overlapping areas)

Natural areas dataset   Landuse dataset      Buildings dataset   Area (ha)
Forest                  Military             None                5.24
                        Residential          Reservoir_cover     0.02
                        Recreation_ground    Hospital            0.25
Park                    Commercial           None                0.01
                        Residential          Museum              0.39
                                             Cafe                0.05
                                             Chapel              0.01
                                             Church              0.00
                                             House               0.03
                                             Library             0.03
                                             Museum              0.08
                                             Public              0.02
                                             Public_building     0.37
                                             Restaurant          0.03
                                             Roof                0.01
                                             Theatre             0.03
                                             Toilets             0.00
                                             Yes                 0.01
2. Relationship between the OSM and CLC nomenclatures
OSM classes         CLC classes
(Landuse dataset)   Level 3            Level 2     Level 1
Abutters            111-112-121        11-12       1
Allotments          242                24          2
Basin               512                51          5
Beach               331                33          3
Brownfield          133                13          1
Cemetery            111-112            11          1
Commercial          121                12          1
Conservation        313-312-311        31          3
Construction        133                13          1
Farm                222-231-241-242    22-23-24    2
Farmland            222-231-241-242    22-23-24    2
Farmyard            222-231-241-242    22-23-24    2
Field               ?                  ?           ?
Garages             122                12          1
Garden              142                14          1
Grass               231-321            23-32       2-3
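CLC codes are hierarchical: the first digit of a level 3 code is its level 1 class and the first two digits are its level 2 class. The Level 2 and Level 1 columns can therefore be derived from the Level 3 column. The sketch below illustrates this with a few rows copied from the table; the dictionary and the function name `clc_levels` are hypothetical and are not taken from the authors' workflow.

```python
# Sketch: derive CLC level 2 and level 1 codes from level 3 codes by
# truncating digits (e.g. "231" -> "23" -> "2").
# The rows are copied from the table; all names are illustrative.

OSM_TO_CLC_LEVEL3 = {
    "Allotments": ["242"],
    "Basin": ["512"],
    "Cemetery": ["111", "112"],
    "Conservation": ["313", "312", "311"],
    "Farmland": ["222", "231", "241", "242"],
    "Grass": ["231", "321"],
    "Field": [],  # "?" in the table: no CLC class assigned
}


def clc_levels(level3_codes):
    """Return (level3, level2, level1) code lists, order-preserving, no duplicates."""
    def prefixes(n):
        seen = []
        for code in level3_codes:
            prefix = code[:n]
            if prefix not in seen:
                seen.append(prefix)
        return seen

    return prefixes(3), prefixes(2), prefixes(1)


for osm_class, codes in OSM_TO_CLC_LEVEL3.items():
    level3, level2, level1 = clc_levels(codes)
    # Classes without a CLC counterpart are printed as "?", as in the table.
    print(osm_class,
          "-".join(level3) or "?",
          "-".join(level2) or "?",
          "-".join(level1) or "?")
```

A class such as Grass ends up with two level 1 codes (2 and 3), so it cannot be assigned to a single level 1 class from the nomenclature alone.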
3. Coverage analysis of OSM datasets
Coverage areas from CLC level 1 and OSM

CLC classes     Area from CLC (ha)    Area from OSM (ha)    Class coverage (%)
unclassified    ---                   7036.75               ---
1               309716.89             62407.48              20.15
2               4199177.27            34309.93              0.82
3               4259642.22            98536.62              2.31
4               28777.11              64.59                 0.22
5               110906.66             82621.61              74.50
1. Assigned each OSM class its corresponding CLC level 1 class
2. Dissolved by CLC level 1 class
3. Removed overlapping areas (not deducted): 1.39% of the total OSM area
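The class coverage column appears to be the OSM area divided by the CLC area of the same level 1 class. That reading is an assumption, but it reproduces the published percentages. A minimal sketch, with the areas copied from the table and illustrative names:

```python
# Areas in hectares, copied from the coverage table (CLC level 1 classes).
clc_area = {1: 309716.89, 2: 4199177.27, 3: 4259642.22, 4: 28777.11, 5: 110906.66}
osm_area = {1: 62407.48, 2: 34309.93, 3: 98536.62, 4: 64.59, 5: 82621.61}


def class_coverage(osm_ha, clc_ha):
    """Share of a CLC class area that is covered by OSM polygons, in percent."""
    return 100.0 * osm_ha / clc_ha


# Assumed definition: coverage = OSM area / CLC area for the same class.
coverage = {c: class_coverage(osm_area[c], clc_area[c]) for c in clc_area}

for c, pct in coverage.items():
    print(f"CLC class {c}: {pct:.2f}%")
```

The unclassified OSM area (7036.75 ha) has no CLC counterpart, so it is left out of the calculation.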
4. Matching degree between classes (areas)
Confusion matrix of CLC vs. OSM classifications
Classification accuracy

Class     Classification accuracy (%)
1         84.3%
2         46.6%
3         83.5%
4         1.2%
5         99.5%
Global    76.7%
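The confusion matrix itself is not reproduced in this transcript, so the sketch below uses a small made-up matrix only to show how per-class and global accuracy are commonly derived from an area-based confusion matrix. The numbers are hypothetical, and the choice to normalize each class by its row total is an assumption; the slides do not say which normalization was used.

```python
# Hypothetical area-based confusion matrix (hectares). NOT the authors' data.
# Rows: class assigned by OSM; columns: CLC level 1 class at the same location.
classes = [1, 2, 3]
matrix = [
    [80.0, 15.0, 5.0],
    [10.0, 50.0, 40.0],
    [5.0, 10.0, 85.0],
]


def per_class_accuracy(m):
    """Matched (diagonal) area divided by the row total, for each class."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]


def overall_accuracy(m):
    """Total matched (diagonal) area divided by the total area."""
    matched = sum(m[i][i] for i in range(len(m)))
    total = sum(sum(row) for row in m)
    return matched / total


for cls, acc in zip(classes, per_class_accuracy(matrix)):
    print(f"Class {cls}: {acc:.1%}")
print(f"Global: {overall_accuracy(matrix):.1%}")
```

Because the global value weights each class by its area, a class with a small area and low accuracy has little effect on it.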
5. OSM spatial distribution
Spatial distribution of OSM classified areas over continental Portugal
Distribution of classes’ coverage areas by continental Portuguese districts
Conclusions
• Made a first attempt to relate the OSM and CLC nomenclatures
• Determined the classification accuracy of OSM polygon features based on CLC level 1 classes
• Analyzed the OSM spatial distribution
• Results show that it might be worth studying OSM against the more detailed CLC levels 2 and 3
• Classification accuracy of 76.7% (the remaining 23.3% needs further investigation):
– Not all classes have similar accuracy
– We believe that OSM might be used, for instance, as another source of ground truth data in the validation process of LULC databases
Conclusions (2)
• Further investigation needed:
– Correspondence between OSM and the 3 levels of CLC
– Ways to avoid the “not_known” class and classes without description
– The cause of discrepancies between the two classifications (errors or just different views?)
– The real effect of conflicting overlapping areas (1.39%) (could the contributors' level of trust decide which one is correct?)
Thank you for your attention