state of geospatial bigdata - university of redlands...@mraad 2016 hql drop table if exists logs;...

124
@mraad 2016 Mansour Raad thunderheadxpler.blogspot.com [email protected] @mraad State of Geospatial BigData

Upload: others

Post on 30-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

@mraad 2016

Mansour Raad thunderheadxpler.blogspot.com

[email protected] @mraad

State of Geospatial BigData

@mraad 2016

“WHERE” IS UBIQUITOUS !

@mraad 2016

• Where is the closest ATM ?

• Where is the best location to place my store ?

• Where is UBL ?

• Where is the next Ebola/Zika outbreak ?

@mraad 2016

A BIT OF HISTORY…

With Esri Specifically :-)

@mraad 2016

~1990

@mraad 2016

http://en.wikipedia.org/wiki/Shapefile

@mraad 2016

1995

@mraad 2016

RDBMS ID,Name,Address,Lat,Lon

Spatial Data Engine

APITCP/IP

SQL

@mraad 2016

>1996-2005

@mraad 2016

RDBMS ID,Name,Address,Lat,Lon

xDBC

@mraad 2016

SPATIAL INDEXING

http://en.wikipedia.org/wiki/Spatial_database#Spatial_index

@mraad 2016

QUADTREE

http://en.wikipedia.org/wiki/Quadtree

@mraad 2016

R-TREE

http://en.wikipedia.org/wiki/R-tree

@mraad 2016

(NOT SO) MODERN DAY…

@mraad 2016

STORY TIME…

@mraad 2016

U.S.Demographic

Data

@mraad 2016

@mraad 2016

FOR EACH LOCATIONFOR EACH DEMOGRAPHIC

⬇50 MILE HEATMAP

@mraad 2016

@mraad 2016

TRADITIONAL MEANS…

14 Days

850 GB Raster Files

@mraad 2016

BETTER WAY ?

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

BIG DATA ?

@mraad 2016

U R IN BIGDATA SPACE IF…

Volume

Velocity Variety

@mraad 2016

BUT THEN I’VE SEEN…

Volume Velocity Variety

Veracity Validity

Visualization Vulnerability

Value

→ data at rest → data in motion → many types → data in doubt → data that is correct → data in patterns → data at risk → data that is meaningful

@mraad 2016

I’M STICKING WITH…

Volume

Velocity Variety

@mraad 2016

@mraad 2016

NOSQL(NOT ONLY SQL :-)

@mraad 2016

GEOJSON

http://geojson.org/

@mraad 2016

{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" }}

@mraad 2016

• Points

• Lines

• Polygons

• Multipoints

• Multilines

• Multipolygons

• Geometry Collection

@mraad 2016

Key1→Value1Key2→Value2…KeyN→ValueN

Sorted

byte[] byte[]

@mraad 2016

GEOHASH

http://en.wikipedia.org/wiki/Geohash

@mraad 2016

00 10-180 180

-90

90

01 11

if left of vertical center set left bit to 0 else 1if lower of horizontal center set right bit 0 else 1

@mraad 2016

0000 1000-180 180

-90

90

0100 1100

0010

0001 0011

1111

@mraad 2016

0000 1000-180 180

-90

90

0100 1100

0010

0001 0011

1111

@mraad 2016

0000 1000-180 180

-90

90

0100 1100

0010

0001 0011

1111

@mraad 2016

SPACE FILLING CURVES~ 1880

@mraad 2016

N DIM→1DIM

@mraad 2016

HILBERT CURVE

http://en.wikipedia.org/wiki/Space-filling_curve

@mraad 2016

SPACE LINEARIZATION

@mraad 2016

SPATIAL SUPPORT

RTree

@mraad 2016

INDIRECT SUPPORT

@mraad 2016

WHAT IS OLD…IS NEW AGAIN !

@mraad 2016

SPATIAL MIDDLEWARE

@mraad 2016

NoSQL ID,Name,Address,Lat,Lon

Spatial Data Engine

http://ws://

NoSQLAPI

@mraad 2016

NOT A BIGGER OX…

@mraad 2016

HADOOP.APACHE.ORG

@mraad 2016

WHAT’S IN A NAME ?

http://blog.pivotal.io/pivotal/products/demystifying-hadoop-in-5-pictures

@mraad 2016

WHAT IS HADOOP ?

• Library / Framework

• Very Very Large Un/Structured Dataset

• Multi Node Distributed Processing

• Resilient To Commodity Hardware Failure

@mraad 2016

HADOOP BASIC STACK

Hadoop Distributed File System (HDFS)

Yet Another Resource Negotiator (YARN)

Commodity Servers

MapReduce

@mraad 2016

OTHER HADOOP PROJECTS• Avro - Serialization / RPC System

• HBase - Distributed Columnar Database

• Hive - Ad Hoc “SQL” Interface

• Pig - Data Flow Parallel Execution (AML)

• ZooKeeper - Coordination Service

• More….

@mraad 2016

HDFS• Distributed File System

• Lots and Lots of Commodity Drives

• Fault Tolerant

• Loves Big Files

• “POSIX” Like Interface

@mraad 2016

HDFS

NameNode

DataNode DataNode DataNode

HDFS Client

@mraad 2016

HDFS Resilience !HDFS

DataNode DataNode DataNode

@mraad 2016

BigData

Program

@mraad 2016

BigData

Program

@mraad 2016

MAPREDUCE

http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html

@mraad 2016

WHAT IS MAPREDUCE ?• Parallel Fault Tolerant Framework

• Splits Large Input

• Invoke User Defined “Map” Function

• Shuffle and Sort

• Invoke User Defined “Reduce” Function

@mraad 2016

MAPREDUCE & HDFS

Data Node

Task Tracker

Data Node

Task Tracker

Data Node

Task Tracker

Name Node

Job TrackerClient

.jar

@mraad 2016

WRITING MR IS HARD…

@mraad 2016

HOW ABOUT….NO PROGRAMING ???

@mraad 2016

@mraad 2016

APACHE HIVE

“SQL” ⬇

MapReduce Job

@mraad 2016

HQLdrop table if exists logs;

create external table if not exists logs( ip string, method string, uri string, status string, bytes int, time_taken int, referrer string, user_agent string ) partitioned by (year int, month int, day int, hour int) row format delimitedfields terminated by '\t' lines terminated by '\n' stored as textfile location ‘hdfs://hadoop:8020/logs/';

@mraad 2016

OTHER ADHOC ENGINES

• Cloudera Impala

• Facebook Presto

• SparkSQL

• Bypass MR generation / Direct HDFS Access

@mraad 2016

WHAT ABOUT SPATIAL ?

@mraad 2016

@mraad 2016

GIS TOOLS FOR HADOOP

• Computational Geometry Library

• Hive Spatial UDF Functions

• GeoProcessing Extensions to ArcMap

@mraad 2016

GEOMETRY LIBRARY• Points / Lines / Polygons

• I/O (GeoJSON,WTK,WBT,Shape)

• Spatial Relations (inside, touches, intersects,…)

• Spatial Operations (buffer, cut, convex hull,…)

• In-Memory Spatial Index

@mraad 2016

API USAGE IN BIGDATA

• Map-only jobs - GeoEnrichment

• Given set of locations

• Given demographic area

• Augment location with demographic attributes

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

TELCO• CDR - Call Data Record

• DateTime, UUID, LatLon, Duration, Status, etc…

• Drop Call Emerging HotSpot

• Street Traffic Condition

• Massive Spatial Join million x million polygon

• Overlay with Demographic polygons

• Overlay with Current Weather

• Overlay with Social Media

@mraad 2016

TELEMATICS• Feedback to Engineering

• Car to Car Communication

• Street Condition Detection

• Best Route Prediction (EV)

• Overlay with weather

@mraad 2016

Insurance as you drive !

@mraad 2016

Observations

@mraad 2016

‹#›A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam, Hadoop Summit Brussels, 15 April 2015 © Copyright - Port of Rotterdam

Title

Frank Cremer (Geomatik) Mansour Raad (ESRI)

A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam

Hadoop Summit, Brussels, 15 April 2015

‹#›A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam, Hadoop Summit Brussels, 15 April 2015 © Copyright - Port of Rotterdam

Image

Access information in three clicks

‹#›A Hadoop-enabled Ship Tracking Application for the Port of Rotterdam, Hadoop Summit Brussels, 15 April 2015 © Copyright - Port of Rotterdam

Text

Usage of ship position data

• Harbour master • Incident analysis • Safety checks

• Capacity management • Identifying bottlenecks • Planning decision support

• Environmental management • Pollution (NOx) calculations • Speed measures to reduce pollutions

?

@mraad 2016

DdΔ

(Lat,lon)

WhereisΔ≃0?

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

Selected'pickups' Corresponding'drop2offs' Density'of'passenger'drop2offs'

Turtle'Bay'–'UN'

@mraad 2016

Shu$le'Loca,ons'!!!!!!!!!!Pickup!!!!!!!!!!!Drop+off!

Mission: Fast Dynamic Segmentation for Linear Referencing

Segment (L-M) has Green, Blue, Red attributes

Event From Mile M to P

Mile M Mile PMile L

Step 1 - Clean and Bulk Load Roads

Roads.gdb Features

ElasticSearch { roadid:”state_code_road_id”, shape: “wkt-string” }

BulkLoad Feature - {json}

Step 2 - New Dyn-Seg GeoProcess

10M Eventsyear statecode routeid mile1 mile2 key val

Parallel Distributed

Share-Nothing “GeoProcessing”

Clean Roads ElasticSearchIndex Every Document/Field { roadid=“xxx” wkt=“multilinestring(((x y,…) IRI=1 F_SYSTEM=2 …. }

{json}

Step 3 - Query and Display

ArcPy

ArcGISElasticSearch

DataCenters

ArcGIS → Nerdalize

sshd

ElasticSearch DynSeg GP

nerdalize.comArcPy

@mraad 2016

YOU CAN DO IT TOO !

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

~180 million entries - small :-)

@mraad 2016

IMPORT BY TIME/SPACE

*.csv MRImport hdfs://…/yyyy/mm/dd/hh/uuid.csv

@mraad 2016

create external table if not exists trips (pickupdatetime string,dropoffdatetime string,pickupx double,pickupy double,dropoffx double,dropoffy double,passengercount int,triptime int,tripdist double,rc25 string,rc50 string,rc100 string,rc200 string) partitioned by (year int, month int, day int, hour int)row format delimitedfields terminated by '\t'lines terminated by '\n'stored as textfile;

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

@mraad 2016

STATISTICAL SIGNIFICANCE ?

@mraad 2016

HOTSPOT ANALYSIS

http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000010000000

@mraad 2016

PROCESSING EVOLUTION

Transaction - Batch

Operational - Dashboard

Analytics - Exploration

Intelligent - Realtime / Predictive

@mraad 2016

WHAT IS NEXT ?• In Memory

• Native Spatial Index In NoSQL DB

• Native Spatial Types (Point, Line, …)

• Out-of-the-box Spatial Operators / Operations

• Distributed/Disconnected GPU Integration

• Visualization via Gamification

@mraad 2016

Mansour Raad thunderheadxpler.blogspot.com

[email protected] @mraad

Q&A