Data Field Guide
overview, access, standards, codebook, practices, visualization, workflow, context, sources
OpenStreetMap Data Field Guide | Fall 2017 1
OpenStreetMap
overview
Atlanta’s Geographic Map is a subset of the data available on OpenStreetMap.org. OpenStreetMap
(OSM) is reportedly the largest open-source global mapping data service in existence today. It is an
ongoing, free and editable map built by volunteers as an alternative to other map providers.
Over 13 years, the map has been accessed by individuals, governments and companies for personal,
communal and commercial interests through an open-content license. It is a community mapping
project, supported by a MediaWiki and continuously developed using several open source components
by a community of users.
Historically focused on mapping the UK, OSM was started by an individual, Steve Coast, through tax-funded
projects to create huge map datasets. It has held curated geo-data since 2004 and has gained a global
community, now at 3 million registered users, who maintain the software, the map data and the
channels that present the data as applications. Since 2006, it has been backed by the
OpenStreetMap Foundation and espouses the Open Geospatial Convention.
The geo-data is collected from the public through events and communal activity. OSM’s main aim is to
encourage the collection, growth and distribution of geospatial data. Its unique features are the
localization of map data and access to the underlying data points. The data is collected a) by novices
and experts, b) as disparate data points and bulk uploads, and c) by individuals, companies and civic
bodies, so the data on OSM is varied. However, once on OSM, any of it can be contested or refined by
the local population.
access
It is possible to export data in several formats. Users often extract sections of the data by selecting an
area on the map, or specific features by writing queries on Overpass Turbo. Building queries
requires an understanding of the main structure of map data on OSM; many online resources focus on
this aspect and serve as references for extracting these data points.
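As a minimal sketch of what such a query can look like, the snippet below assembles an Overpass QL query for every node, way and relation that carries a given key within a radius of a point. The function name and the coordinates are hypothetical placeholders chosen for illustration; they are not taken from this guide.

```python
def build_amenity_query(lat: float, lon: float, radius_m: int, key: str = "amenity") -> str:
    """Return an Overpass QL query for all elements carrying `key` near a point."""
    # (around:radius,lat,lon) keeps only elements within radius_m metres of the point.
    around = f"(around:{radius_m},{lat},{lon})"
    return (
        "[out:json][timeout:25];\n"
        "(\n"
        f'  node["{key}"]{around};\n'
        f'  way["{key}"]{around};\n'
        f'  relation["{key}"]{around};\n'
        ");\n"
        "out center;"
    )


# Hypothetical centre point on the east side of Atlanta (an assumption, not from the guide).
query = build_amenity_query(33.7700, -84.3640, 300)
print(query)
```

The printed text can be pasted into the Overpass Turbo query editor; nothing here contacts a server.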
Data is available at the map site itself: https://www.openstreetmap.org. Popular formats include PBF,
OSM, and XML data extracts. There are multiple APIs. Overpass Turbo (http://overpass-turbo.eu/) offers
read-only access and a simple query builder, and is used by most people. For working with
raw geo-data there is a separate Editing API, and for embedding maps there is a Web Map Framework.
There are also processed data providers that offer data in Shapefile, GeoJSON and KML formats. Special
data like global coastlines, other land/water polygons and clip map areas can be found on
www.openstreetmapdata.com.
OSM data is fairly massive: even in compressed form it exceeds 30 GB. A limited dataset, like a 3 mile
radius around the Eastside beltline trail in Atlanta, is a little under 2 MB, so most people export and
extract a subset of the map data. OSM is updated continuously, with daily updates on the API. Releases
are scheduled weekly for the overall database and daily for the US and European databases.
While it is not recommended, it is possible to download OSM map data as a full database (Planet.osm).
Smaller extracts at the continent, country and metro-area level can be made using the Overpass Turbo API.
Once data is downloaded, it can be cleaned or configured as required using other database tools.
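To illustrate what a downloaded extract looks like before any cleaning, here is a small sketch that reads an OSM XML fragment with Python's standard library. The sample data, the name `SAMPLE_OSM_XML` and the helper `summarize` are made up for this example; a real extract would be read from a file and would be far larger.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample in the OSM XML layout: nodes and ways, each with optional tags.
SAMPLE_OSM_XML = """<osm version="0.6">
  <node id="1" lat="33.7701" lon="-84.3641">
    <tag k="amenity" v="restaurant"/>
    <tag k="name" v="Example Diner"/>
  </node>
  <node id="2" lat="33.7705" lon="-84.3650"/>
  <node id="3" lat="33.7710" lon="-84.3655">
    <tag k="amenity" v="bench"/>
  </node>
  <way id="10">
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="footway"/>
  </way>
</osm>
"""


def summarize(xml_text):
    """Count top-level element types and collect the nodes that carry tags."""
    root = ET.fromstring(xml_text)
    element_counts = Counter(child.tag for child in root)
    tagged_nodes = []
    for node in root.iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if tags:  # untagged nodes usually only give shape to a way
            tagged_nodes.append(
                (node.get("id"), float(node.get("lat")), float(node.get("lon")), tags)
            )
    return element_counts, tagged_nodes


element_counts, tagged_nodes = summarize(SAMPLE_OSM_XML)
print(dict(element_counts))
for node_id, lat, lon, tags in tagged_nodes:
    print(node_id, lat, lon, tags)
```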
standards
In December 2006, OpenStreetMap started using Yahoo’s aerial photography as a backdrop for map
production. A new online data editor, Potlatch, also came into use in place of the older JOSM to
onboard new users. These changes affect data collection, but the underlying data standards have remained
stable for OSM in terms of the kind of elements observed and recorded by mappers.
OSM’s tags describe the geographic attributes of the physical artifact.
Features in the real world are tagged as key-value pairs. For example, to tag a food location, restaurant
is attached as the value to amenity as the key (amenity=restaurant). Tags are attached to data structures,
namely the nodes, ways and relations explained below.
Notably, a node can carry many tags. Relations can also be used to create route maps that
visualize different “walks.” For example, bus routes in an area can be modeled with bus stops as nodes,
routes as directional ways, and a relation that ties them together and includes any restrictions on paths.
Nodes
Single points with latitude and longitude details, identifiable by a unique node id assigned by the platform.
Ways
Ordered lists of nodes with at least one tag. A way can be open or closed, depending on whether its nodes create a closed area. Similarly, a one-way street can be set to that boolean value and still be tagged.
Relations
These define how one or more tags, nodes or ways may be linked. They model local, geographical and logical data. A relation can be an ordered list or a single node, with the relation defining the role of each feature in the overall map.
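The three element types can be sketched as plain Python data classes, using the bus route example from this section. The class names, ids and coordinates are illustrative assumptions, not OSM's own implementation; the tag keys shown (highway, oneway, type, route) are common OSM keys.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: list  # ordered references to node ids
    tags: dict = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A way is closed when it ends on the node it started from.
        return len(self.node_ids) > 2 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    type: str  # "node", "way" or "relation"
    ref: int   # id of the referenced element
    role: str = ""


@dataclass
class Relation:
    id: int
    members: list
    tags: dict = field(default_factory=dict)


# Hypothetical data: two bus stops, a one-way street and a route relation linking them.
stop_a = Node(1, 33.7701, -84.3641, {"highway": "bus_stop", "name": "Stop A"})
stop_b = Node(2, 33.7712, -84.3650, {"highway": "bus_stop", "name": "Stop B"})
street = Way(10, [1, 3, 2], {"highway": "residential", "oneway": "yes"})
route = Relation(
    100,
    [Member("node", 1, "stop"), Member("way", 10), Member("node", 2, "stop")],
    {"type": "route", "route": "bus"},
)

print(street.is_closed())
print([(m.type, m.ref, m.role) for m in route.members])
```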
codebook
The tags explained in TagInfo showcase a dataset in the format explained above.
The main structure of Key, Objects, Nodes, Ways, Relations, Users, Values and Prevalent Values helps to
understand what kind of data is most likely to be found (in terms of nodes, ways etc.) on OSM. It also
points to the related wiki page for each tag, to check what the tag means and associate it with the dataset.
For the dataset I chose to analyze, seen in the general map area of the Eastside Beltline data, I:
1. Reviewed the keys to see the kind of prevalent tags (in prevalent values)
2. Extracted data through a turbo query, “amenity=*”, to export the dataset of 6101 records on the Overpass Turbo API, downloading the content in KML format
3. Cleaned and transformed the data using database editing software (I used Carto for cleaning the data)
Figure: OSM dataset available on search term “eastside beltline”
However, regular users typically manipulate data using Osmosis, a powerful command-line tool for
processing raw OSM data.
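A rough sketch of the "prevalent values" idea, applied to an exported set of records rather than to TagInfo itself: count how often each value of a key occurs. The records and the function name `prevalent_values` are invented for illustration.

```python
from collections import Counter

# Invented records standing in for an exported dataset; each dict is one feature's tags.
records = [
    {"amenity": "parking"},
    {"amenity": "restaurant", "cuisine": "pizza"},
    {"amenity": "parking"},
    {"amenity": "bench"},
    {"amenity": "restaurant"},
    {"amenity": "parking"},
    {"highway": "footway"},  # has no amenity key, so it is skipped
]


def prevalent_values(records, key, top=3):
    """Return the `top` most frequent values recorded for `key`."""
    counts = Counter(r[key] for r in records if key in r)
    return counts.most_common(top)


print(prevalent_values(records, "amenity"))
```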
www.taginfo.openstreetmap.org
practices
The perspectives quoted here are mainly from an interview with an executive at OSM and from blogs online.
They explain the data collection practices, gaps and errors that users are likely to encounter in
OpenStreetMap (OSM) data.
The main contention was that while OSM may be more up to date than traditional map data, what is
missing depends on what users do or don’t map. To serve the OSM user base, a diverse and dynamic
community is essential for collection, correction and updates to data.
I. Data Collection (and Tagging)
Data collection efforts center around teaching people how to get started with mapping data. They are
encouraged to map what they are familiar with, especially when their motivation is to help during
disaster management. Typical contributors are highly educated and tend to form smaller communities
around the city area. Web products like Mapbox create interfaces using this data and enable a more
“divide and conquer” approach. This is a better data correction/collection process than a one-time
communal effort, which, as seen in the case of Atlanta’s mapathon (2010), is not sustainable. Open Data
Kit, a local/mobile-based data collection app, is an alternative source. However, adding a new tag on OSM
does not do much by itself. If a tag gets popular in usage, the community steps in to adopt or retire the tag.
II. Data Gaps
Data gaps on OSM seem to be in cities that are car-focused (Dallas, TX; Los Angeles, CA). Efforts in
countries like Japan include importing data and adding cultural richness to OSM, but the main drive
is for data to help with disaster management, as in Tanzania and Indonesia, where it is someone’s daily
work to collect data. A look at the list of contributors also indicates which countries’ governments were
involved in OSM’s data generation. Another aspect is the inherited nature of the database (from the UK).
Points of interest show up, but not directions or coverage for certain road types, because the OSM map
background layout is modeled on UK motorways, unlike US maps.
Contributors: http://wiki.openstreetmap.org/wiki/Contributors
III. Data Errors
Errors in data are perceived to be misunderstandings of what needed to be mapped and are often
corrected by individuals who are part of the community. However, it must be noted that errors, even
of omission, reflect blind spots people exhibit, considering they map for specific purposes. For example,
most map users are men and, because women traditionally have less time, points of interest (POIs) like
childcare are not mapped though brothels are. Errors such as missing connections in the mapped roads
can be detected, but they require people to go look and resolve the issue themselves.
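One simple way such missing connections might be flagged is to list way endpoints that no other way shares. This is only a sketch under that assumption, with made-up ids and a hypothetical function name; a flagged endpoint may be a genuine dead end, which is why someone still has to go and look.

```python
from collections import defaultdict


def dangling_ends(ways):
    """Return (way_id, node_id) pairs for way endpoints that no other way uses.

    `ways` maps a way id to its ordered list of node ids.
    """
    ways_per_node = defaultdict(set)
    for way_id, node_ids in ways.items():
        for node_id in node_ids:
            ways_per_node[node_id].add(way_id)

    dangling = []
    for way_id, node_ids in ways.items():
        if node_ids[0] == node_ids[-1]:
            continue  # closed loops have no loose end
        for end in (node_ids[0], node_ids[-1]):
            if len(ways_per_node[end]) == 1:
                dangling.append((way_id, end))
    return dangling


# Invented road network: ways 100 and 101 meet at node 3, way 102 touches nothing.
roads = {100: [1, 2, 3], 101: [3, 4, 5], 102: [6, 7]}
print(dangling_ends(roads))
```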
data visualization
The sample visualization addresses: What does the beltline provide access to in terms of amenities,
transport, leisure and recreation at present towards its vision of connecting Atlanta’s 45 neighborhoods
through a 22 mile loop?
Process
For this visualization, I limited the scope to a sample set of the Eastside beltline and amenities tagged
within a 300m radius on OpenStreetMap.
I acquired data by first running a query through the wizard for amenity=* on Overpass Turbo for the
Eastside beltline and exporting it in the KML format.
Uploading this file into an online editor, I was able to parse the data visually and identify 45 distinct
amenities as tags. I continued to filter the data in the CSV format to form categories (represented in the
chart) and sub-categories.
To represent the data I used templates from RawGraphs.io. I tried using relational representations to
showcase relative size and hierarchies.
I refined these further for clarity and created a variation without the large outlier (car parking), which
skewed the dataset because of the kind of data mapped in a nearby residential area (Virginia Highlands).
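The grouping step can be approximated in a few lines. The sketch below rolls amenity values up into categories, writes a category/sub-category/count table as CSV, and produces a second variant without the parking outlier. The category mapping, sample values and function names are hypothetical; they are not the categories used for the charts in this guide.

```python
import csv
import io
from collections import Counter

# Hypothetical mapping from amenity value (sub-category) to a broader category.
CATEGORY_OF = {
    "parking": "transport",
    "bicycle_parking": "transport",
    "restaurant": "food",
    "cafe": "food",
    "bench": "rest",
}


def hierarchy_rows(amenities, exclude=()):
    """Return sorted (category, sub-category, count) rows, skipping excluded values."""
    counts = Counter(a for a in amenities if a not in exclude)
    return sorted((CATEGORY_OF.get(name, "other"), name, n) for name, n in counts.items())


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "subcategory", "count"])
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample of amenity values, dominated by parking like the dataset described above.
sample = ["parking"] * 5 + ["restaurant", "restaurant", "cafe", "bench", "bench", "bench", "fountain"]

all_rows = hierarchy_rows(sample)
without_parking = hierarchy_rows(sample, exclude={"parking"})
print(to_csv(all_rows))
print(to_csv(without_parking))
```

A table in this shape could then be loaded into a charting tool to draw hierarchical charts.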
Interpretation
The Sunburst charts visually represent how much of the mapped data is parking, possibly
because OSM is used for navigation. Food follows, and then rest areas (benches). The secondary-level
sub-categories are also visible in this data representation. These are not complete representations of
the real world; however, they help to identify the data on OSM.
data workflow
How people collect data
1. Armchair mapping: tracelogs, OpenStreetCam, Mapillary, Pic4Carto, GPS traces, street view, aerial imagery (Bing)
2. Outdoor mapping: tracelogs, timelogs, paper, field paper, mobile/GPS
Data collection formats: video, images, photo, audio, timestamps
Devices: camera, computers, GPS and paper; unique and bulk uploads
Upload processes
Single data points > Changeset > Potlatch or JOSM editors
Bulk uploads > Review by proposal > Upload via editors
How people edit and review
1. Adding and reviewing notes in datasets
2. Tag voting and usage
3. Country level: local representation
4. Review proposals (select committee)
5. Conflicts resolved at local level
MediaWiki as the resource repository; review forum and error tracking with OSM Inspector, KeepRight and Osmose
Dataset
Nodes, ways, areas and tags (new tags added by proposal to the community)
OpenStreetMap Community
Individuals, local groups and companies access and build data
context
As a community mapping project, OSM relies on volunteers who collect data offline by surveying local
areas; the maps are rendered using Mapnik and Leaflet. The primary data representation, in the discursive
context of a standard visual map, is OpenStreetMap Carto, a stylesheet for the largest open
multi-contributor map of its kind. While some elements like map tiles are rendered using Bing’s aerial
imagery where available, the potential and the principles underlying OpenStreetMap (OSM) rely on local
knowledge, including how decisions were made and ways to participate in them.
One “operational context” of contribution is that of disaster-relief workers, who are engaged in
mapping a bulk of these areas. OSM volunteers focus on collecting data in these areas by creating
one-time projects based on demand. The data on these projects is collected in great detail by a
concentrated set of users to aid relief efforts. It is one of the few contexts where OSM mapping work
may be paid for to encourage participation. For example, for armchair mapping (memory), users are given
resources to be aware of the folksonomy (tagging) and options for bulk upload in some areas.
People who seek to contribute to OSM can easily access the website or download the app to map data.
However, data at this point is open to being contested by locals on the OSM website.
Performing calculations is possible after extracting the data, but not directly on the interface provided
by OSM. This seems to be the intent, so users have the option to interpret data as they see fit.
iOS apps like Atlanta Map Offline Navigation provide an alternate interface to the data.
Sources + author
http://wiki.openstreetmap.org/
https://www.openstreetmap.us/about/
http://paulnorman.ca/blog/2015/11/openstreetmap-carto-complexity/
https://www.directionsmag.com/article/1823
https://www.theguardian.com/technology/2014/jan/14/why-the-world-needs-openstreetmap
http://learnosm.org/en/osm-data/data-overview/
http://overpass-turbo.eu/
This codebook was prepared by Udaya Lakshmi, a PhD student in Human-Centered Computing. She has
a background in user interface design, human computer interaction, and marketing communications.
She can be reached at udaya[at]gatech.edu.