
data field guide

overview, access, standards, codebook, practices, visualization, workflow, context, sources

OpenStreetMap Data Field Guide | Fall 2017

OpenStreetMap

overview

Atlanta’s Geographic Map is a subset of the data available on OpenStreetMap.org. OpenStreetMap

(OSM) is reportedly the largest open-source global mapping data service in existence today. It is an

ongoing, free and editable map built by volunteers as an alternative to other map providers.

Over 13 years, the map has been accessed by individuals, governments and companies for personal,

communal and commercial interests through an open-content license. It is a community mapping

project, supported by a MediaWiki and continuously developed using several open source components

by a community of users.

Historically focused on mapping the UK, OSM was started by an individual, Steve Coast, in response to tax-funded projects that created huge map datasets without making them freely available. It has contained curated geo-data since 2004 and has gained a global community, now at 3 million registered users, who maintain the software, the map data and the applications that present the data. Since 2006, it has been backed by the OpenStreetMap Foundation and espouses the Open Geospatial Convention.

The geo-data is collected from the public through events and communal activity. The project's main aim is to encourage the collection, growth and distribution of geospatial data. The unique features of OSM are its localization of map data and its access to the underlying data points. Because data is contributed a) by novices and experts, b) as disparate data points and bulk uploads, and c) by individuals, companies and civic bodies, the data on OSM is varied. However, once on OSM, any of it can be contested or refined by the local population.


access

It is possible to export data in several formats. Users often extract sections of the data by selecting an area on the map, or select specific features by writing queries against the Overpass API through Overpass Turbo. Building queries requires an understanding of the main structure of map data on OSM; many online resources focus on this aspect and serve as references for extracting these data points.
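To make the query structure concrete, here is a minimal sketch of how an Overpass QL query for tagged features near a point might be assembled. The function name, the coordinates and the 300 m radius are illustrative assumptions, not values taken from this guide's dataset; the resulting text is the kind of query that can be pasted into the Overpass Turbo editor.

```python
def build_overpass_query(key, lat, lon, radius_m, value=None):
    """Return Overpass QL text selecting elements with a tag near a point.

    All arguments are illustrative; nothing here contacts a server.
    """
    # ["key"] matches any value of the key; ["key"="value"] matches one value.
    tag_filter = f'["{key}"]' if value is None else f'["{key}"="{value}"]'
    around = f"(around:{radius_m},{lat:.4f},{lon:.4f})"
    lines = ["[out:json][timeout:25];", "("]
    for element in ("node", "way", "relation"):
        lines.append(f"  {element}{tag_filter}{around};")
    lines += [");", "out center;"]
    return "\n".join(lines)


# Hypothetical point; replace with a location of interest.
query = build_overpass_query("amenity", 33.77, -84.36, 300)
print(query)
```

Building the query as plain text keeps the sketch independent of any client library; the same string could be sent to an Overpass endpoint by whatever HTTP tool is at hand.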

Data is available at the map site itself: https://www.openstreetmap.org. Popular formats include PBF and OSM (XML) data extracts. There are multiple APIs. Overpass Turbo (http://overpass-turbo.eu/) is a web front end to the read-only Overpass API with a simple query builder, and it is what most people use. For working with raw geo-data there is a separate Editing API, and for embedding maps there is a Web Map Framework.

There are also processed data providers that offer data in Shapefile, GeoJSON and KML formats. Special data such as global coastlines, other land/water polygons and clip map areas can be found on www.openstreetmapdata.com

OSM data is massive: even in compressed form it exceeds 30 GB. A limited dataset, such as a 3 mile radius around the Eastside Beltline trail in Atlanta, is a little under 2 MB, so most people export and extract a subset of the map data. OSM is updated continuously, with daily updates available through the API. Weekly releases are scheduled for the overall database, and daily releases for the US and European databases.

While it is not recommended, it is possible to download OSM map data as a full database (Planet.osm). Smaller extracts at the continent, country and metro-area level can be made using the Overpass Turbo API. Once data is downloaded, it can be cleaned or configured as required using other database tools.
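As a rough illustration of what working with a downloaded extract can look like, the sketch below reads a tiny OSM XML fragment with Python's standard library and summarizes it. The sample XML is hand-written for this example (its ids, coordinates and tags are made up), and the function name is an assumption; a real extract would be much larger and is often better handled by dedicated tools.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample in the OSM XML layout; all values are hypothetical.
SAMPLE_OSM_XML = """
<osm version="0.6">
  <node id="1" lat="33.7700" lon="-84.3600">
    <tag k="amenity" v="restaurant"/>
  </node>
  <node id="2" lat="33.7710" lon="-84.3610"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="footway"/>
  </way>
</osm>
"""


def summarize_osm(xml_text):
    """Count element types and list the elements that carry tags."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    tagged = []
    for element in root:
        if element.tag not in ("node", "way", "relation"):
            continue
        counts[element.tag] += 1
        # Each <tag k="..." v="..."/> child is one key-value pair.
        tags = {t.get("k"): t.get("v") for t in element.findall("tag")}
        if tags:
            tagged.append((element.tag, element.get("id"), tags))
    return counts, tagged


counts, tagged = summarize_osm(SAMPLE_OSM_XML)
print(dict(counts))
print(tagged)
```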


standards

In December 2006, OpenStreetMap started using Yahoo’s aerial photography as a backdrop for map production. In addition, a new online data editor, Potlatch, started being used instead of the older JOSM to onboard new users. These changes affect data collection, but the underlying data standards have remained stable in terms of the kinds of elements observed and recorded by mappers.

OSM’s tags describe the geographic attributes of the physical artifact.

Features in the real world are tagged as key-value pairs. For example, to tag a food location, restaurant is attached as the value to amenity as the key (amenity=restaurant). Tagged features are represented by data structures called nodes, ways and relations, explained below.
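The key-value idea can be sketched in a few lines. The feature names below are invented for illustration and the helper functions are assumptions of this example, not part of any OSM tool; the point is only that a tag is a key paired with a value, and that "*" is commonly used to mean "any value".

```python
def parse_tag(text):
    """Split a 'key=value' string into a (key, value) pair."""
    key, _, value = text.partition("=")
    return key.strip(), value.strip()


def has_tag(tags, key, value="*"):
    """True if tags holds the key with any value ('*') or one specific value."""
    return key in tags and (value == "*" or tags[key] == value)


# Hypothetical features, each with a dictionary of tags.
features = [
    {"name": "Hypothetical Diner", "tags": {"amenity": "restaurant", "cuisine": "pizza"}},
    {"name": "Example Bench", "tags": {"amenity": "bench"}},
    {"name": "Sample Trail", "tags": {"highway": "footway"}},
]

key, value = parse_tag("amenity=restaurant")
restaurants = [f["name"] for f in features if has_tag(f["tags"], key, value)]
amenities = [f["name"] for f in features if has_tag(f["tags"], *parse_tag("amenity=*"))]
print(restaurants)
print(amenities)
```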

Notably, a node can carry many tags. Relations can also be used to create route maps that visualize different “walks.” For example, a bus route in an area can be modeled as a relation that links bus stops as nodes and route segments as directional ways, and that can include any restrictions on paths.

Nodes: Single points with latitude and longitude details, each identifiable by a unique node id assigned by the platform.

Ways: Ordered lists of nodes with at least one tag. A way can be open or closed, depending on whether its nodes enclose an area. Similarly, a one-way street can be marked with a boolean-valued tag (for example, oneway=yes) and still carry other tags.

Relations: These define how one or more tags, nodes or ways are linked. They model local, geographical and logical data. A relation can be an ordered list of members or a single node, with roles defining the part each feature plays in the overall map.
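The three element types can be pictured as simple records. The sketch below is one possible in-memory model, assuming made-up ids and coordinates; the class and field names are choices made for this example and do not mirror any particular OSM library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: list  # ordered references to node ids
    tags: dict = field(default_factory=dict)

    def is_closed(self):
        # A closed way starts and ends on the same node.
        return len(self.node_ids) > 2 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Relation:
    id: int
    members: list  # (element type, element id, role) triples
    tags: dict = field(default_factory=dict)


# Hypothetical elements for illustration only.
stop = Node(1, 33.7700, -84.3600, {"highway": "bus_stop"})
street = Way(10, [1, 2, 3], {"highway": "residential", "oneway": "yes"})
park = Way(11, [4, 5, 6, 4], {"leisure": "park"})
bus_route = Relation(
    100,
    [("node", 1, "stop"), ("way", 10, "")],
    {"type": "route", "route": "bus"},
)

print(street.is_closed(), park.is_closed())
print([member[0] for member in bus_route.members])
```

Keeping ways as lists of node ids, rather than copies of the nodes, reflects the idea that several ways can share the same node.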


http://wiki.openstreetmap.org/wiki/Potlatch


codebook

The tags documented on TagInfo showcase a dataset in the format explained above. Its main columns (Key, Objects, Nodes, Ways, Relations, Users, Values and Prevalent Values) help to understand what kind of data is most likely to be found on OSM (in terms of nodes, ways, etc.). TagInfo also links to the related wiki page for each tag, so one can check what the tag means and associate it with the dataset.
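To show what per-key statistics of this kind amount to, here is a small sketch that tallies, for each key, how many objects use it, how they split across element types, and which values are most common. The sample elements are invented and the function names are assumptions; this is not how TagInfo itself is implemented.

```python
from collections import Counter, defaultdict


def key_statistics(elements):
    """elements: iterable of (element_type, tags) pairs."""
    stats = defaultdict(lambda: {"objects": 0, "types": Counter(), "values": Counter()})
    for element_type, tags in elements:
        for key, value in tags.items():
            stats[key]["objects"] += 1
            stats[key]["types"][element_type] += 1
            stats[key]["values"][value] += 1
    return stats


def prevalent_values(stats, key, n=3):
    """Return the n most common values recorded for a key."""
    return stats[key]["values"].most_common(n)


# Hypothetical sample data.
sample = [
    ("way", {"amenity": "parking"}),
    ("way", {"amenity": "parking"}),
    ("node", {"amenity": "parking"}),
    ("node", {"amenity": "restaurant", "cuisine": "pizza"}),
    ("node", {"amenity": "restaurant"}),
    ("node", {"amenity": "bench"}),
]

stats = key_statistics(sample)
print(stats["amenity"]["objects"], dict(stats["amenity"]["types"]))
print(prevalent_values(stats, "amenity", 2))
```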

For the dataset I chose to analyze, seen in the general map area of the Eastside Beltline data, I took the following steps:

1. Reviewed the keys to see the kinds of prevalent tags (in prevalent values)

2. Extracted data through an Overpass Turbo query, “amenity=*”, to export the dataset of 6101 records, downloading the content in KML format

3. Cleaned and transformed the data using database editing software (I used Carto for cleaning the data)

Figure: OSM dataset available on search term “eastside beltline”

However, data manipulation by regular users is done using Osmosis, a powerful command-line tool to

process raw OSM data.


https://taginfo.openstreetmap.org/keys



practices

The perspectives quoted here are mainly from an interview with an executive at OSM and from blogs online. They explain the data collection practices, gaps and errors that users are likely to encounter in OpenStreetMap (OSM) data.

The main contention was that while OSM may be more up to date than traditional map data, its gaps reflect whatever users choose not to map. To serve the OSM user base, a diverse and dynamic community is essential for the collection, correction and updating of data.

I. Data Collection (and Tagging)

Data collection efforts center on teaching people how to get started with mapping data. New mappers are encouraged to map what they are familiar with, especially when their motivation is to help during disaster management. Typical contributors are highly educated and tend to form smaller communities around city areas. Web products like Mapbox create interfaces using this data and enable more of a “divide and conquer” approach. This is a better data correction and collection process than a one-time communal effort, which, as seen in the case of Atlanta’s mapathon (2010), is not sustainable. Open Data Kit, a local, mobile-based data collection app, is an alternative source. However, simply adding a new tag on OSM has little effect by itself. Only if a tag becomes popular in usage does the community step in to adopt or retire it.



II. Data Gaps

Data gaps on OSM seem to be in cities that are car-focused (Dallas, TX; Los Angeles, CA). Efforts in countries like Japan include importing data and adding cultural richness to OSM, but the main drive is for data to help with disaster management, as in Tanzania and Indonesia, where it is someone’s daily work to collect data. A look at the list of contributors also indicates which countries’ governments were involved in OSM’s data generation. Another aspect is the inherited nature of the database (from the UK). Points of interest show up, but not directions or coverage for certain road types, because the OSM map background layout follows UK conventions (motorways) rather than those of US maps.

Contributors: http://wiki.openstreetmap.org/wiki/Contributors

III. Data Errors

Errors in data are perceived to be a misunderstanding of what needed to be mapped and are often

corrected by the individuals who are part of the community. However, it must be noted that errors, even

of omission, reflect blindspots people exhibit considering they map for specific purposes. For example,

most map users are men and because women traditionally have less time, points of interests (POIs) like

childcare are not mapped though brothels are. Errors can be detected, like connections missing in the

roads mapped, but require people to go look and resolve the issue themselves.
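One way a missing road connection might be flagged is to look for way endpoints that no other way shares. The sketch below is a simplified assumption of how such a check could work, using invented road names and node ids; it is not the method used by any specific quality-assurance tool, and a flagged endpoint may be a legitimate dead end, which is why a person still has to look.

```python
from collections import Counter


def dangling_endpoints(ways):
    """ways: dict mapping a way name to its ordered list of node ids.

    Returns the endpoints of each way that no other way touches.
    """
    usage = Counter()
    for node_ids in ways.values():
        for node_id in set(node_ids):
            usage[node_id] += 1  # number of ways that include this node

    report = {}
    for name, node_ids in ways.items():
        if not node_ids:
            continue
        ends = {node_ids[0], node_ids[-1]}
        loose = sorted(n for n in ends if usage[n] == 1)
        if loose:
            report[name] = loose
    return report


# Hypothetical road network: the first two roads meet at node 3,
# while the third shares no node with anything else.
roads = {
    "Hypothetical Ave": [1, 2, 3],
    "Example St": [3, 4, 5],
    "Sample Ln": [6, 7],
}

report = dangling_endpoints(roads)
print(report)
```

Ways with both endpoints unconnected, like the third road here, are the strongest candidates for review.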



data visualization

The sample visualization addresses the question: what does the Beltline currently provide access to, in terms of amenities, transport, leisure and recreation, as it works towards its vision of connecting Atlanta’s 45 neighborhoods through a 22 mile loop?

Process

For this visualization, I limited the scope to a sample set: the Eastside Beltline and the amenities tagged within a 300 m radius of it on OpenStreetMap.
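A radius limit like this can be checked with a great-circle distance. The sketch below uses the haversine formula with a mean Earth radius; the center point and the sample locations are made-up coordinates, and the function names are assumptions of this example rather than part of the original workflow.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(points, center, radius_m):
    """Keep the (name, lat, lon) points that lie within radius_m of center."""
    lat0, lon0 = center
    return [p for p in points if haversine_m(lat0, lon0, p[1], p[2]) <= radius_m]


# Hypothetical center and points.
center = (33.7700, -84.3600)
points = [
    ("near bench", 33.7710, -84.3600),   # about 0.001 degrees of latitude away
    ("far parking", 33.7800, -84.3600),  # about 0.01 degrees of latitude away
]

nearby = within_radius(points, center, 300)
print([name for name, _, _ in nearby])
```

For a trail rather than a single point, the same check could be repeated against sample points along the line, keeping a feature if any of them is close enough.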

I acquired data by first running a query through the wizard for amenity=* on Overpass Turbo for the Eastside Beltline, and exporting the result in KML format.

Uploading this file into an online editor, I was able to parse data visually to identify 45 distinct amenities

as tags. I continued to filter data in the CSV format to form categories (represented in the chart) and

sub-categories.

To represent the data I used templates from RawGraphs.io. I tried using relational representations to

showcase relative size and hierarchies.

I refined these further for clarity and created a variation without the large outlier (car parking). The outlier stems from the kind of data mapped in a nearby residential area (Virginia-Highland), which skewed the dataset.
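The grouping and the outlier-free variation can be sketched as follows. The counts and the category mapping below are invented for illustration (they are not the figures from the actual dataset), and the function names are assumptions; the output is the kind of two-level hierarchy a sunburst or similar chart template expects.

```python
from collections import defaultdict

# Hypothetical mapping from amenity value to a chart category.
CATEGORY_OF = {
    "parking": "transport",
    "bicycle_parking": "transport",
    "restaurant": "food",
    "cafe": "food",
    "bench": "rest",
}

# Hypothetical counts per amenity value.
amenity_counts = {
    "parking": 120,
    "bicycle_parking": 10,
    "restaurant": 40,
    "cafe": 15,
    "bench": 30,
}


def build_hierarchy(counts, category_of, exclude=()):
    """Group amenity counts into category -> sub-category -> count."""
    hierarchy = defaultdict(dict)
    for amenity, count in counts.items():
        if amenity in exclude:
            continue  # drop outliers for the alternative chart
        category = category_of.get(amenity, "other")
        hierarchy[category][amenity] = count
    return dict(hierarchy)


def category_totals(hierarchy):
    """Sum the sub-category counts for each category."""
    return {category: sum(subs.values()) for category, subs in hierarchy.items()}


full = category_totals(build_hierarchy(amenity_counts, CATEGORY_OF))
no_parking = category_totals(build_hierarchy(amenity_counts, CATEGORY_OF, exclude={"parking"}))
print(full)
print(no_parking)
```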


http://overpass-turbo.eu/

https://carto.com/


Interpretation

The sunburst charts show that most of the data mapped is parking, possibly because OSM is used for navigation. Food follows, and then rest areas (benches). The secondary-level sub-categories are also visible in this data representation. These charts are not complete representations of the real world, but they help to identify what data is on OSM.


data workflow


How people collect data

1. Armchair mapping: Tracelogs, OpenStreetCam, Mapillary, Pic4Carto, GPS traces, Street view, Aerial (Bing)

2. Outdoor mapping: Tracelogs, Timelogs, Paper, Field paper, Mobile/GPS

Data collection formats: video, images, photo, audio, timestamps

Devices: camera, computers, GPS and paper; unique and bulk uploads

Upload processes

Single data points > Changeset > Potlatch and JOSM editors

Bulk uploads > Review by proposal > Upload via editors

How people edit and review

1. Adding and reviewing notes in datasets

2. Tags voting and usage

3. Country level: local representation

4. Review proposals (select committee)

5. Conflicts resolved at local level

MediaWiki as the resource repository; review forum and error tracking with OSM Inspector, KeepRight and Osmose

Dataset

Nodes, ways, areas and tags (new tags added by proposal to community)

OpenStreetMap community

Individuals, local groups and companies access and build data

context

As a community mapping project, OSM relies on volunteers who collect data offline by surveying local areas; the resulting maps are rendered and displayed using Mapnik and Leaflet. The primary data representation, in the discursive context of a standard visual map, is OpenStreetMap Carto, a stylesheet for the largest open multi-contributor map of its kind. While some elements like map tiles are rendered using Bing’s aerial imagery where available, the potential and the principles underlying OpenStreetMap (OSM) rely on local knowledge, including how decisions were made and ways to participate in them.

An “operational context” of contribution is found among the disaster-relief workers who are engaged in mapping the bulk of these areas. OSM volunteers focus on collecting data in these areas by creating one-time projects based on demand. The data for these projects is collected in great detail by a concentrated set of users to aid relief efforts. It is one of the few contexts where OSM mapping work may be paid for to encourage participation. For example, for armchair mapping (memory), users are given resources to make them aware of the folksonomy (tagging) and of the options for bulk upload in some areas.

People who seek to contribute to OSM can easily access the website or download the app to map data. However, data added this way can still be contested by locals on the OSM website.

Performing calculations is possible after extracting the data, but not directly on the interface provided

by OSM. This seems to be the intent, so users have the option to interpret data as they see fit.

iOS apps like Atlanta Map Offline Navigation provide an alternate interface to the data.


http://paulnorman.ca/blog/2015/11/openstreetmap-carto-complexity/


sources + author

http://wiki.openstreetmap.org/

https://www.openstreetmap.us/about/


https://www.directionsmag.com/article/1823

https://www.theguardian.com/technology/2014/jan/14/why-the-world-needs-openstreetmap

http://learnosm.org/en/osm-data/data-overview/


This codebook was prepared by Udaya Lakshmi, a PhD student in Human-Centered Computing. She has

a background in user interface design, human computer interaction, and marketing communications.

She can be reached at udaya[at]gatech.edu.

