Demolitions and Dalí: web dev and data in a graph database


Upload: nicholas-doiron

Post on 22-Jan-2018

TRANSCRIPT


OK, graph databases

• Instead of tables and SQL

• Nodes and relationships

• Specialized queries

• Not everything is a graph (and this is not sponsored)

Install / Update Neo4j

• Neo4j

• http://localhost:7474

• Community Edition 3.0.3

• Python, PIP, and Py2Neo

• py2neo.__version__ = '3b1'

Step 0 - installing

• Install Neo4j - neo4j.com/install

• brew on Mac

• DigitalOcean has Linux instructions

• change default password

• Trouble installing locally?

• heroku addons:add graphene

Who uses graphs?

• Panama Papers

• IMDB / Six Degrees of Kevin Bacon

• Especially:

• social networks, research data, maps

• anywhere number of joins is large, indefinite, or unlimited
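The "Six Degrees of Kevin Bacon" case shows why an indefinite number of joins is awkward in SQL: every extra hop is another self-join, while a graph traversal just keeps walking. A minimal pure-Python sketch of that traversal, assuming a tiny made-up co-appearance graph (the `CO_STARS` data and the `degrees_of_separation` helper are hypothetical, not from the talk):

```python
from collections import deque

# Hypothetical co-appearance graph: person -> people they have worked with.
CO_STARS = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
    "E": set(),
}


def degrees_of_separation(graph, start, goal):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == goal:
                return hops + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None
```

A graph database runs this kind of search natively, so the query does not need to know in advance how many hops separate two people.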

Cypher

MoMA.org

• PostgreSQL sync to “The Museum System” CMS (outside our control)

Who uses MoMA.org?

• Tourists

• Researchers

• Distant art fans

• Members

The trouble with tables

• Many joins to get people, titles, photos, additional relationship info

• Speed of query

• Difficult to write new queries

Art Graph DB

• Did Picasso collaborate with other artists in his lifetime?

• Are any artists credited as painter, director, sculptor, etc.? (maybe an art EGOT)
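Both questions reduce to following artist-to-artwork links. A rough in-memory sketch of the logic, assuming a hypothetical list of `(artist, artwork, role)` credits (the data and function names are invented for illustration, and the "in his lifetime" date filter is left out):

```python
from collections import defaultdict

# Hypothetical credits: (artist, artwork, role). Not real MoMA data.
CREDITS = [
    ("Artist A", "Work 1", "painter"),
    ("Artist B", "Work 1", "painter"),
    ("Artist A", "Work 2", "director"),
    ("Artist C", "Work 3", "sculptor"),
]


def collaborators(credits, artist):
    """Artists who share at least one artwork with `artist`."""
    works = {work for name, work, _ in credits if name == artist}
    return {name for name, work, _ in credits if work in works and name != artist}


def multi_role_artists(credits, minimum=2):
    """Artists credited in at least `minimum` distinct roles, sorted by name."""
    roles = defaultdict(set)
    for name, _, role in credits:
        roles[name].add(role)
    return sorted(name for name, held in roles.items() if len(held) >= minimum)
```

In the graph database the same two questions become short pattern matches instead of multi-table joins.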

Let’s build that graph

• Artists and artworks

• Basic bio data, MoMA ID -> Artist node

• Future DB: all people connected

• Title, date, MoMA ID -> Artwork node

• ARTIST_OF relationship (include order)
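One way to sketch this model is as parameterized Cypher statements built in Python. The `Artist` and `Artwork` labels and the `ARTIST_OF` relationship come from the slide; the property names (`moma_id`, `name`, `title`, `date`, `credit_order`) and the helper functions are assumptions. The `$param` form is used here; older Neo4j 3.x releases such as the one on the slides may need the `{param}` form instead.

```python
def artist_statement(moma_id, name):
    """Return (cypher, params) that creates or reuses an Artist node."""
    cypher = "MERGE (a:Artist {moma_id: $moma_id}) SET a.name = $name"
    return cypher, {"moma_id": moma_id, "name": name}


def artwork_statement(moma_id, title, date):
    """Return (cypher, params) that creates or reuses an Artwork node."""
    cypher = (
        "MERGE (w:Artwork {moma_id: $moma_id}) "
        "SET w.title = $title, w.date = $date"
    )
    return cypher, {"moma_id": moma_id, "title": title, "date": date}


def artist_of_statement(artist_id, artwork_id, credit_order):
    """Return (cypher, params) linking an artist to an artwork, keeping order."""
    cypher = (
        "MATCH (a:Artist {moma_id: $artist_id}), "
        "(w:Artwork {moma_id: $artwork_id}) "
        "MERGE (a)-[r:ARTIST_OF]->(w) "
        "SET r.credit_order = $credit_order"
    )
    params = {
        "artist_id": artist_id,
        "artwork_id": artwork_id,
        "credit_order": credit_order,
    }
    return cypher, params
```

Each pair can then be handed to whichever client is in use. Using `MERGE` rather than `CREATE` keeps a re-run of the scraper from duplicating nodes.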

Let’s build that graph

• git clone https://github.com/mapmeld/graph

• Building a scraper for MoMA

Demolitions and Dalí in a Graph Database

Nick Doiron - @mapmeld

Cypher


On to OSM

If you’re interested

• Google: MapZen Extracts

• download a city

• for this script, download the OSM XML file

• if you like PostGIS, there is a download (no import script)

Benefits of OSM

• Open to use / full data

• Open to edit / choose tags

• HOT community

• Civil e-mail lists (Crimea)


Google on OSM

• “Our maps represent what you or I need to do on a day-to-day basis in the developed part of the world”

• — Google Maps Geospatial Technologist (quoted in FastCompany)

In Haiti and worldwide


XML data

• Nodes, ways, and relations

• Ways made up of multiple nodes

• Relations contain nodes and ways

• Practically:

• Multiple ways connect / combine

• Tags are a community construct
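To make the node / way / tag layout concrete, here is a small sketch that reads ways out of OSM XML with Python's standard library. The sample document is hand-written with made-up IDs and tags, and `parse_ways` is a hypothetical helper, not the script from the talk:

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in the OSM XML layout; IDs and tags are made up.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="18.54" lon="-72.34"/>
  <node id="2" lat="18.55" lon="-72.33"/>
  <node id="3" lat="18.56" lon="-72.32"/>
  <way id="100">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Example Street"/>
  </way>
  <way id="101">
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def parse_ways(xml_text):
    """Return {way_id: {"nodes": [...], "tags": {...}}} for every <way>."""
    root = ET.fromstring(xml_text)
    ways = {}
    for way in root.iter("way"):
        ways[way.get("id")] = {
            # <nd ref="..."> children list the node IDs that make up the way.
            "nodes": [nd.get("ref") for nd in way.findall("nd")],
            # <tag k="..." v="..."> children carry the community-defined tags.
            "tags": {tag.get("k"): tag.get("v") for tag in way.findall("tag")},
        }
    return ways
```

For a full city extract, a streaming parser such as `ET.iterparse` is likely a better fit than loading the whole file at once.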

Smart Renderer

• When is a <way> a line (cul-de-sac) or a polygon (river, lake, parking lot)?

• Has to support world’s fonts

• Tag for real life, not for the renderer
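The line-or-polygon question can be sketched as a heuristic: a way can only be an area if it is closed (first node equals last node), and even then its tags decide. The rule set below is a deliberate simplification with an assumed short list of area-like keys; real renderers use much longer rule tables.

```python
# Assumed, incomplete list of tag keys that usually describe areas.
AREA_KEYS = {"building", "landuse", "natural", "amenity", "leisure"}


def is_polygon(node_ids, tags):
    """Guess whether a way should be drawn as an area rather than a line."""
    closed = len(node_ids) >= 4 and node_ids[0] == node_ids[-1]
    if not closed:
        return False
    if tags.get("area") == "no":
        return False
    if tags.get("area") == "yes":
        return True
    if "highway" in tags:
        # A looping road (for example around a cul-de-sac) is still a line.
        return False
    return any(key in tags for key in AREA_KEYS)
```

The explicit `area` tag is checked first so that mappers can override the guess, which fits the "tag for real life" advice: the data describes the place and the renderer adapts.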

Building graph data

• Script adds all roads to Neo4j

• Includes an array of node ids (can mix content types, similar to a document database)

• If two ways share a node with the same ID, link them both ways <—>
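The linking rule can be sketched without a database: index ways by the node IDs they contain, then connect every pair of ways that meet at the same node, in both directions. The `link_ways` helper and its input shape are assumptions for illustration, not the talk's actual script.

```python
from collections import defaultdict
from itertools import combinations


def link_ways(way_nodes):
    """way_nodes: {way_id: [node_id, ...]} -> set of (way_a, way_b) links.

    Every link is stored in both directions, matching the <—> rule.
    """
    ways_at_node = defaultdict(set)
    for way_id, node_ids in way_nodes.items():
        for node_id in node_ids:
            ways_at_node[node_id].add(way_id)

    links = set()
    for way_ids in ways_at_node.values():
        # Any two ways that touch the same node are connected.
        for a, b in combinations(sorted(way_ids), 2):
            links.add((a, b))
            links.add((b, a))
    return links
```

Indexing by node first avoids comparing every way against every other way, which matters once a whole city's roads are loaded.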

Cypher + OSM

* you can put an index on schema fields now

Problem

Google Prediction API

• Prediction based on a CSV

• Categorization or numerical

• Google generates a model and estimates accuracy

• Not allowed in Myanmar

Predicting Houses

• Format 60,000+ rows of database export

• Choose categories to predict 2-3 years

• Competing models determine how important each column is

• Can it parse dates? Find patterns

• Edging up to ~74 percent accuracy

Network effect

• Adding network of streets

• Now tokens include not just my street and neighbors, but shared streets
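A rough sketch of how street-network tokens might be added to each row before export, assuming a free-text feature column. The token prefixes, the `connections` map, and the `street_tokens` helper are all hypothetical; the slides do not give the actual format.

```python
def street_tokens(street, connections):
    """Build a token string: the house's own street plus connecting streets.

    connections: {street_name: iterable of street names that share a node}.
    """
    own = "street_" + street.replace(" ", "_")
    shared = sorted(
        "near_" + other.replace(" ", "_") for other in connections.get(street, ())
    )
    return " ".join([own] + shared)
```

For example, `street_tokens("Oak St", {"Oak St": {"Elm St", "Main St"}})` gives `"street_Oak_St near_Elm_St near_Main_St"`, so two houses on different streets that share a cross street end up with a token in common.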

Network effect

• Most demolitions have one house on their street demolished (it’s them)


Network effect

• Google Prediction API reported 81% accuracy

• But is it good?

• Early optimization studies moved fire stations and left neighborhoods vulnerable

• City can’t maintain it… hasn’t continued to open their data

Looking forward

• Ideas for graph databases?

• Ways to release large graph data: as an API? As JSON files? As a Neo4j dump?

• Ideas for statisticians / future research?

Demolitions and Dalí in a Graph Database

Nick Doiron - @mapmeld