Streaming GIS using PostGIS & SQLstream

Julian Hyde - Chief Architect

Sunil Mujumdar - Founding Engineer


The Data Crunch

» Data volumes rising fast
   » Human-originated data (e.g. e-commerce purchases) rising fast
   » Machine-generated data (e.g. e-commerce events and network packets) rising faster
   » Sensor data (e.g. GIS-enabled mobile phones, road sensors) faster still
» Every business needs answers with lower latency
» Every significant problem is massively parallel & distributed:
   » Geographically distributed organizations
   » Multiple boxes for scale
   » Exploit multiple cores


The world is no longer flat

• In a data warehouse, all records are equally important
• In many real-world applications, recent & close events are much more important

[Figure: axes labeled Time and Space, with "Now" and "Here" marked]


Case study: Mozilla


Data management is hard

» If you make a mistake, the system won’t be fast enough

» Can’t afford to lose data

» New technologies are very difficult to use
   » MapReduce
   » NoSQL
   » Multi-threaded programming in Java, C++, Erlang, Scala, …

» Collaborate, interoperate, evolve


SQL – life in the old dinosaur yet

» Widely spoken
» Rich
» Orthogonal
» Declarative
   » Tune your system without changing your logical schema
   » Apps don’t interfere with each other
» Adaptive
   » Route around failure
   » Exploit available resources
   » Make tradeoffs to meet QoS goals


Streaming SQL: example #1

Tweets about this conference:

» SELECT STREAM ROWTIME, author, text
  FROM Tweets
  WHERE text LIKE '%#PGWest%'
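This query never terminates. Each tweet is tested as it arrives, and a match is emitted straight away. Below is a minimal Java sketch of that row-at-a-time filter and projection. It assumes a hypothetical Tweet class and uses a plain Iterator as the source, so it is an illustration only and not SQLstream's API.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical row type standing in for the Tweets stream.
final class Tweet {
    final long rowtime; // ROWTIME, assumed to be epoch milliseconds
    final String author;
    final String text;

    Tweet(long rowtime, String author, String text) {
        this.rowtime = rowtime;
        this.author = author;
        this.text = text;
    }
}

final class TweetFilter {
    // WHERE text LIKE '%#PGWest%': a case-sensitive substring test;
    // a NULL text never matches, as in SQL.
    static boolean matches(Tweet t) {
        return t.text != null && t.text.contains("#PGWest");
    }

    // Pull rows one at a time and emit each match immediately;
    // nothing is buffered, so memory use does not grow with the stream.
    static void run(Iterator<Tweet> source, Consumer<String> sink) {
        while (source.hasNext()) {
            Tweet t = source.next();
            if (matches(t)) {
                // SELECT STREAM ROWTIME, author, text
                sink.accept(t.rowtime + " " + t.author + " " + t.text);
            }
        }
    }
}
```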


Streaming SQL basics

» Streams:
   » CREATE STREAM Tweets (
         author VARCHAR(20),
         text VARCHAR(140));
» Relational operators have streaming counterparts:
   » Project (SELECT)
   » Filter (WHERE)
   » Union
   » Join
   » Aggregation (GROUP BY)
   » Windowed aggregation (e.g. SUM(x) OVER window)
   » Sort (ORDER BY)
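Among these operators, Union is where stream order matters most. Suppose both inputs arrive in ROWTIME order and the output must stay in that order. A row from one input can then be emitted only after the other input has reached at least that time. The sketch below shows that merge in Java. It reduces rows to their rowtimes, which is a hypothetical simplification, and uses finite iterators in place of live streams. OrderedUnion is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class OrderedUnion {
    // Merge two rowtime-sorted inputs into one rowtime-sorted output.
    // Each row is represented only by its rowtime in this sketch.
    static List<Long> merge(Iterator<Long> a, Iterator<Long> b) {
        List<Long> out = new ArrayList<>();
        Long x = a.hasNext() ? a.next() : null;
        Long y = b.hasNext() ? b.next() : null;
        while (x != null || y != null) {
            // Emit the earlier head; ties go to the first input.
            if (y == null || (x != null && x <= y)) {
                out.add(x);
                x = a.hasNext() ? a.next() : null;
            } else {
                out.add(y);
                y = b.hasNext() ? b.next() : null;
            }
        }
        return out;
    }
}
```

On live streams, hasNext() would block until the slower input produced a row. That is why a quiet input can hold back the merged output.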


Streaming SQL: example #2

» Each minute, return the number of clicks on each web page:

» SELECT STREAM ROWTIME, uri, COUNT(*)
  FROM PageRequests
  GROUP BY FLOOR(ROWTIME TO MINUTE), uri
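ROWTIME only increases, so a minute's counts are final as soon as the first row of the next minute arrives. The minimal Java sketch below implements that tumbling-window count. It assumes millisecond rowtimes and uses a hypothetical MinuteCounter class. Each output row is labeled with its window's start time. That label is an assumption about presentation and does not reflect SQLstream's rule for the output ROWTIME.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MinuteCounter {
    private static final long MINUTE_MS = 60_000L;

    private long currentMinute = Long.MIN_VALUE; // FLOOR(ROWTIME TO MINUTE) of the open window
    private final Map<String, Long> counts = new LinkedHashMap<>();
    private final List<String> output = new ArrayList<>();

    // Called once per PageRequests row; rowtime must be non-decreasing.
    void onRow(long rowtimeMs, String uri) {
        long minute = Math.floorDiv(rowtimeMs, MINUTE_MS) * MINUTE_MS;
        if (minute < currentMinute) {
            throw new IllegalArgumentException("ROWTIME went backwards");
        }
        if (minute > currentMinute) {
            flush();               // the earlier minute can no longer change
            currentMinute = minute;
        }
        counts.merge(uri, 1L, Long::sum);
    }

    // Emit one row per uri for the window being closed, then reset.
    void flush() {
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            output.add(currentMinute + " " + e.getKey() + " " + e.getValue());
        }
        counts.clear();
    }

    List<String> output() {
        return output;
    }
}
```

The exception on a backwards ROWTIME shows why monotonicity is a prerequisite here. Without it, no window could ever be closed safely.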


Streaming SQL: Time

» ROWTIME pseudo-column
   » Provided by source application or generated by system
» WINDOW
   » Present in regular SQL (e.g. SQL:2003) but more important in streaming SQL
   » Defines a ‘working set’ for streaming JOIN, GROUP BY, windowed aggregation
» Monotonicity (“sortedness”)
   » Prerequisite for certain streaming operations


Streaming SQL: example #3

Find all orders from New York that shipped within an hour:

» CREATE VIEW compliant_orders AS
  SELECT STREAM *
  FROM orders OVER sla
  JOIN shipments
    ON orders.id = shipments.orderid
  WHERE city = 'New York'
  WINDOW sla AS (RANGE INTERVAL '1' HOUR PRECEDING)
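One way to evaluate this join incrementally has three steps. Keep the last hour of orders in a buffer, expire old orders as time advances, and probe the buffer whenever a shipment arrives. Below is a minimal Java sketch of that approach. It assumes both streams are delivered interleaved in ROWTIME order, and SlaJoin and Order are hypothetical classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SlaJoin {
    private static final long HOUR_MS = 3_600_000L;

    // Hypothetical order row: ROWTIME (epoch ms), id, city.
    static final class Order {
        final long rowtime;
        final int id;
        final String city;

        Order(long rowtime, int id, String city) {
            this.rowtime = rowtime;
            this.id = id;
            this.city = city;
        }
    }

    // Orders inside the "sla" window, oldest first.
    private final Deque<Order> window = new ArrayDeque<>();
    private final List<Integer> compliant = new ArrayList<>();

    void onOrder(Order o) {
        window.addLast(o);
    }

    // A shipment arrives: drop orders more than one hour old
    // (an order exactly one hour old is kept), then probe the buffer.
    void onShipment(long rowtime, int orderId) {
        while (!window.isEmpty() && window.peekFirst().rowtime < rowtime - HOUR_MS) {
            window.removeFirst();
        }
        for (Order o : window) {
            // ON orders.id = shipments.orderid WHERE city = 'New York'
            if (o.id == orderId && "New York".equals(o.city)) {
                compliant.add(o.id);
            }
        }
    }

    List<Integer> compliant() {
        return compliant;
    }
}
```

The buffer only ever holds one hour of orders, so the join's memory is bounded by the window rather than by the stream's history.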


Streaming SQL: more

» Usual advanced SQL stuff:
   » Schemas, views, tables
   » Ability to nest queries
   » User-defined functions and transforms
» Interoperate with 3rd party systems
   » Adapters make external systems look like read/write streams
   » Push/pull
   » Active/passive
» Interact with databases:
   » As source (change-data capture)
   » Lookup (e.g. GIS lookup; normalizing current data using historic norms)
   » As sink (populating the data warehouse)


Real-time road traffic monitoring

1. Map vehicle positions to road segments
2. Compute average speed of each road segment
3. Detect traffic incidents

[Figure: line segments representing sections of freeway, with vehicle positions]

» Vehicle position: vehicle id, latitude, longitude, speed, timestamp
» 15,000 vehicles with sensors
» Each vehicle transmits once per minute
» Road network through New South Wales, Australia


Copyright © 2010 SQLstream, Inc.

Road traffic analytics architecture

[Figure: architecture diagram with components: log files POSDATA_n.txt and POSDATA_nnn.txt, Position Log Stream, Parse, RoadInfoLookup, PostGIS, SQLstream, Traffic Analytics, Dashboard, Google Earth]


Gathering input data

-- Define the Foreign Stream for reading log data
CREATE OR REPLACE FOREIGN STREAM "PositionLogStream" (
    MESSAGE VARCHAR(132))
SERVER "PositionLogReader"
OPTIONS (file_pattern 'POSDATA.*\.txt')
DESCRIPTION 'Raw Vehicle Position Log Stream';
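The file_pattern option reads like a regular expression over file names. The slide does not say how the reader applies it. The sketch below assumes Java regex semantics with whole-name matching and shows which names the pattern would then accept. PositionLogFiles is a hypothetical name.

```java
import java.util.regex.Pattern;

final class PositionLogFiles {
    // Pattern copied from the OPTIONS clause; whole-name matching is an assumption.
    private static final Pattern FILE_PATTERN = Pattern.compile("POSDATA.*\\.txt");

    static boolean accepts(String fileName) {
        return FILE_PATTERN.matcher(fileName).matches();
    }
}
```

Under these assumptions, the escaped dot requires a literal ".txt" suffix, and matching is case-sensitive.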


Problem 1: Map vehicle positions to road segments

SELECT STREAM segmentId,
    roadElementId,
    vehiclePositionX,
    vehiclePositionY,
    velocityX,
    velocityY
FROM (TABLE RoadInfoLookup(
    CURSOR (SELECT STREAM * FROM VehiclePositions),
    'postgis_source.properties', -- data source properties
    'road_segment',              -- table name
    'v_latitude',                -- latitude column name
    'v_longitude'))              -- longitude column name


SQLstream user-defined transform (UDX)

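» import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.sql.ResultSet;
  import java.sql.SQLException;

  public class RoadInfoLookupUdx {
      // Reads each input row, looks up its road element in PostGIS,
      // and writes an enriched row to the output stream.
      public static void RoadInfoLookup(
          ResultSet trafficInfoIn,
          PreparedStatement roadSegmentInfoOut) throws SQLException
      {
          while (trafficInfoIn.next()) {
              double latitude = trafficInfoIn.getDouble(1);
              double longitude = trafficInfoIn.getDouble(2);
              int roadElementId = getInfo(latitude, longitude);
              roadSegmentInfoOut.setDouble(1, latitude);
              roadSegmentInfoOut.setDouble(2, longitude);
              roadSegmentInfoOut.setInt(3, roadElementId);
              // etc.
              roadSegmentInfoOut.executeUpdate();
          }
      }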


Helper method to access PostGIS

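» // PostGIS connection; how it is opened is not shown on the slide.
  private static Connection connection;
  private static PreparedStatement pstmt;

  private static int getInfo(
      double latitude,
      double longitude) throws SQLException
  {
      // First time through, prepare query.
      // srid is a placeholder from the slide and must match uts_geom's SRID.
      if (pstmt == null) {
          pstmt = connection.prepareStatement(
              "select … from road_segments where "
              + "ST_Distance(uts_geom, ST_GeomFromText(?, srid)) "
              + "< width");
      }
      // ST_GeomFromText takes WKT; a WKT point is POINT(x y), i.e. longitude first.
      pstmt.setString(1, "POINT(" + longitude + " " + latitude + ")");
      try (ResultSet rset = pstmt.executeQuery()) {
          // -1 is a hypothetical "no segment within range" value.
          return rset.next() ? rset.getInt(1) : -1;
      }
  }
}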
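The WHERE clause keeps road segments whose distance from the vehicle is less than the segment's width. For a point and a two-point line in a projected coordinate system, ST_Distance returns the planar minimum distance, which can be computed in memory as sketched below. Several parts of this sketch are assumptions: the Segment class, the use of projected x/y coordinates, and the choice of the nearest qualifying segment over whichever row the query happens to return first.

```java
import java.util.List;

final class SegmentMatcher {
    // Hypothetical road segment: endpoints in projected coordinates plus a width.
    static final class Segment {
        final int roadElementId;
        final double x1, y1, x2, y2, width;

        Segment(int roadElementId, double x1, double y1, double x2, double y2, double width) {
            this.roadElementId = roadElementId;
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
            this.width = width;
        }
    }

    // Minimum Euclidean distance from point (px, py) to segment (x1,y1)-(x2,y2).
    static double distance(Segment s, double px, double py) {
        double dx = s.x2 - s.x1;
        double dy = s.y2 - s.y1;
        double lenSq = dx * dx + dy * dy;
        // Parameter of the projection onto the infinite line, clamped to the segment.
        double t = lenSq == 0 ? 0 : ((px - s.x1) * dx + (py - s.y1) * dy) / lenSq;
        t = Math.max(0, Math.min(1, t));
        double cx = s.x1 + t * dx;
        double cy = s.y1 + t * dy;
        return Math.hypot(px - cx, py - cy);
    }

    // Nearest segment with distance < width, or -1 if none (hypothetical sentinel).
    static int match(List<Segment> segments, double px, double py) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Segment s : segments) {
            double d = distance(s, px, py);
            if (d < s.width && d < bestDist) {
                best = s.roadElementId;
                bestDist = d;
            }
        }
        return best;
    }
}
```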


Problem 2: Compute average speed

» Streaming query computes the average over a 15-minute sliding window
» Results are written to a Google Earth file (and elsewhere)

-- Average road element speeds
CREATE OR REPLACE VIEW "EstimatedReSpeeds"
DESCRIPTION 'Estimated RE Speeds' AS
SELECT STREAM "roadElementID",
    AVG("vSpeed") OVER "last15" AS "reSpeed",
    "reSpeedLimit"
FROM "Stage3"
WINDOW "last15" AS (
    PARTITION BY "roadElementID"
    RANGE INTERVAL '15' MINUTE PRECEDING);
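With PARTITION BY, each road element keeps its own 15-minute working set, and each new row updates the average for its element only. The minimal Java sketch below keeps a running sum per partition. It assumes millisecond rowtimes that never decrease, and SlidingAverage is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class SlidingAverage {
    private static final class Entry {
        final long rowtime;
        final double speed;

        Entry(long rowtime, double speed) {
            this.rowtime = rowtime;
            this.speed = speed;
        }
    }

    private final long rangeMs; // window length, e.g. 15 minutes for "last15"
    private final Map<Integer, Deque<Entry>> windows = new HashMap<>();
    private final Map<Integer, Double> sums = new HashMap<>();

    SlidingAverage(long rangeMs) {
        this.rangeMs = rangeMs;
    }

    // Add one row to its partition and return AVG(vSpeed) over that
    // partition's window ending at this row.
    double onRow(int roadElementId, long rowtimeMs, double speed) {
        Deque<Entry> w = windows.computeIfAbsent(roadElementId, k -> new ArrayDeque<>());
        double sum = sums.getOrDefault(roadElementId, 0.0) + speed;
        w.addLast(new Entry(rowtimeMs, speed));
        // Evict rows older than the RANGE boundary; a row exactly rangeMs old stays.
        // The loop always stops at the row just added, so the deque never empties.
        while (w.peekFirst().rowtime < rowtimeMs - rangeMs) {
            sum -= w.removeFirst().speed;
        }
        sums.put(roadElementId, sum);
        return sum / w.size();
    }
}
```

The running sum makes each update amortized O(1), at the cost of some floating-point drift over very long runs.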


Problem 3: Incident detection

» Use Bollinger bands to detect outliers (for normally distributed data, about 99.7% of values lie within 3 standard deviations of the mean)

CREATE OR REPLACE VIEW "Incidents"
DESCRIPTION 'Detect incidents' AS
SELECT STREAM ...
FROM (SELECT STREAM "roadElementID",
          AVG("vSpeed") OVER "lastMinute" AS "avgSpeedLastMinute",
          AVG("vSpeed") OVER "last15" AS "avgSpeedLast15",
          STDDEV("vSpeed") OVER "last15" AS "stddevSpeedLast15",
          "reSpeedLimit", ...
      FROM "Stage3"
      WINDOW "last15" AS (PARTITION BY "roadElementID" RANGE INTERVAL '15' MINUTE PRECEDING),
             "lastMinute" AS (PARTITION BY "roadElementID" RANGE INTERVAL '1' MINUTE PRECEDING))
WHERE "avgSpeedLastMinute" < "avgSpeedLast15" - 3 * "stddevSpeedLast15";
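The test compares the last minute's average speed with the lower Bollinger band of the last 15 minutes, defined as mean - 3 × standard deviation. Below is a minimal Java sketch of that check for one road element. It assumes the two windows have already been collected into arrays and that STDDEV means the sample standard deviation (n - 1 denominator). IncidentDetector is a hypothetical name.

```java
final class IncidentDetector {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (n - 1 denominator); assumed to match STDDEV.
    static double stddev(double[] xs) {
        double m = mean(xs);
        double squares = 0;
        for (double x : xs) {
            squares += (x - m) * (x - m);
        }
        return Math.sqrt(squares / (xs.length - 1));
    }

    // Incident when the last-minute average falls below the lower band
    // of the last 15 minutes: avgSpeedLast15 - 3 * stddevSpeedLast15.
    static boolean isIncident(double[] last15Speeds, double[] lastMinuteSpeeds) {
        if (last15Speeds.length < 2 || lastMinuteSpeeds.length == 0) {
            return false; // not enough data for a standard deviation
        }
        double lowerBand = mean(last15Speeds) - 3 * stddev(last15Speeds);
        return mean(lastMinuteSpeeds) < lowerBand;
    }
}
```

The last minute also falls inside the 15-minute window, so slow readings widen the very band they are tested against. With few samples, even a large drop may therefore not be flagged.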


Summary

• Emergence of data problems that are:

– Real-time

– Geospatial

– High throughput

• In particular, Intelligent Transport Systems (ITS) analytics

• Need to combine streaming, GIS and relational (SQL)

• Technology synergy:

– PostGIS is a mature GIS implementation, integrates SQL with GIS

– SQLstream integrates SQL with streaming


Any questions?


Thank you for attending!

Further reading:

» “Data in Flight” by Julian Hyde (Communications of the ACM, Vol. 53, No. 1, pages 48-52)

Blogs:

» http://www.sqlstream.com/blog

» http://julianhyde.blogspot.com

Twitter:

» @julianhyde

» @sunil_mujumdar