geospatial big data

Geospatial Big Data Dr. Fabio Petroni

Upload: fabio-petroni-phd

Post on 10-Apr-2017


TRANSCRIPT

Geospatial Big Data
Dr. Fabio Petroni

2 Document Classification: KPMG Public

© 2016 KPMG LLP, a UK limited liability partnership and a member firm of the KPMG network of independent member firms affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved.

Motivation

•  exponential growth in the volume of spatio-temporal data
•  e.g., total number of Foursquare check-ins: ∼8 billion


Examples of Geospatial Big Data Analysis


Case Study

Monitor how sentiment about Brexit evolves over time and across geographies


Challenges

•  Scalability - storing, processing and visualizing large-scale spatio-temporal data
•  three dimensions (latitude, longitude, time) → one dimension (lexicographical ordering of keys in a table)

[Figure: B+ tree]
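One way to picture the reduction from three dimensions to a single sortable key is bit interleaving. The sketch below is only an illustration under stated assumptions: the function names, the 4-bit precision, the time range scaled to [0, 1) and the sample points are all hypothetical, and the deck does not say that time is interleaved in this way.

```python
def to_bits(value: float, lo: float, hi: float, n_bits: int) -> str:
    """Repeatedly halve the range [lo, hi) and record one bit per halving."""
    bits = []
    for _ in range(n_bits):
        mid = (lo + hi) / 2
        if value >= mid:
            bits.append("1")
            lo = mid
        else:
            bits.append("0")
            hi = mid
    return "".join(bits)


def interleaved_key(lat: float, lon: float, t: float,
                    t_min: float, t_max: float, n_bits: int = 4) -> str:
    """Interleave longitude, latitude and time bits into one string key.

    Assumption: equal precision for all three dimensions (hypothetical choice).
    """
    lon_bits = to_bits(lon, -180.0, 180.0, n_bits)
    lat_bits = to_bits(lat, -90.0, 90.0, n_bits)
    t_bits = to_bits(t, t_min, t_max, n_bits)
    return "".join(a + b + c for a, b, c in zip(lon_bits, lat_bits, t_bits))


# Hypothetical points: (lat, lon, time), with time already scaled to [0, 1).
points = [(51.509865, -0.118092, 0.75), (40.7, -74.0, 0.10), (-33.9, 151.2, 0.50)]

# A sorted table only needs to compare these strings lexicographically.
keys = sorted(interleaved_key(lat, lon, t, 0.0, 1.0) for lat, lon, t in points)
for key in keys:
    print(key)
```

Because the keys are plain strings, any store that keeps rows in lexicographic order can hold them without knowing that they encode three dimensions.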


Solution: Geohashes

•  binary string in which each character indicates alternating divisions of the global longitude-latitude rectangle

[Figure: the globe divided into two cells, labelled 0 and 1]
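The alternating divisions can be sketched in a few lines of Python. This is a minimal illustration, assuming the common convention that the first bit splits longitude and that 1 marks the upper half of the current range; the function name `geohash_bits` is hypothetical. The coordinates are those of the London data point shown later in the deck.

```python
def geohash_bits(lat: float, lon: float, n_bits: int) -> str:
    """Encode a point as a binary geohash of n_bits characters."""
    lon_lo, lon_hi = -180.0, 180.0
    lat_lo, lat_hi = -90.0, 90.0
    bits = []
    for i in range(n_bits):
        if i % 2 == 0:
            # Even positions halve the current longitude range.
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append("1")
                lon_lo = mid
            else:
                bits.append("0")
                lon_hi = mid
        else:
            # Odd positions halve the current latitude range.
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append("1")
                lat_lo = mid
            else:
                bits.append("0")
                lat_hi = mid
    return "".join(bits)


london = geohash_bits(51.509865, -0.118092, 8)
print(london)  # 01111010
```

The eight bits agree with the prefix `#01111010…` that the deck shows for London, which suggests, but does not prove, that the deck uses the same convention.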


Solution: Geohashes

[Figure: the globe divided into four cells, labelled 00, 10, 01 and 11]


Solution: Geohashes

[Figure: the globe divided into 16 cells, labelled with the 4-bit geohashes 0000 to 1111]


Solution: Z-order Traversal

•  z-order traversal of the globe via 4-bit geohashes

[Figure: the 16 cells labelled 0000 to 1111, connected in z-order]
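A small sketch of why sorting 4-bit geohashes yields a z-order traversal: when the codes are sorted lexicographically, the visited grid cells trace the curve. It assumes even-position bits encode the longitude column and odd-position bits the latitude row; `cell_position` is a hypothetical helper name.

```python
from itertools import product


def cell_position(code: str) -> tuple:
    """Return (column, row) of a geohash cell.

    Even-position bits form the column index (longitude),
    odd-position bits form the row index (latitude).
    """
    col = int(code[0::2], 2)
    row = int(code[1::2], 2)
    return col, row


# All 16 four-bit geohashes in lexicographic order, as a sorted table stores them.
codes = sorted("".join(bits) for bits in product("01", repeat=4))

# The sequence of grid cells visited when scanning the table from first to last key.
path = [cell_position(code) for code in codes]
for code, (col, row) in zip(codes, path):
    print(code, "-> column", col, "row", row)
```

The first four keys stay inside one 2x2 block before the scan jumps to the next block, which is the property that keeps nearby cells close together in the key order.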


Locality Aware Index

•  a cluster node holds neighboring data points

[Figure: cluster nodes 1, 2, 3 and 4]
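The idea that one node holds neighboring points can be sketched with range partitioning over sorted keys. Everything here is an assumption made for illustration: the split points, the four-node layout, the helper names and the sample keys (only `01111010` comes from the deck's London example).

```python
from bisect import bisect_left, bisect_right

# Hypothetical split points: each node owns one contiguous range of keys.
# node 1: keys < "01", node 2: ["01", "10"), node 3: ["10", "11"), node 4: >= "11"
SPLITS = ["01", "10", "11"]


def node_for(key: str) -> int:
    """Return the 1-based node that owns a geohash key."""
    return bisect_right(SPLITS, key) + 1


def prefix_scan(sorted_keys: list, prefix: str) -> list:
    """Return all keys starting with prefix, using two binary searches.

    Keys contain only "0" and "1", so prefix + "2" is an exclusive upper bound.
    """
    lo = bisect_left(sorted_keys, prefix)
    hi = bisect_left(sorted_keys, prefix + "2")
    return sorted_keys[lo:hi]


keys = sorted(["00010110", "01111010", "01111011",
               "01110001", "10100101", "11000011"])

nearby = prefix_scan(keys, "0111")
print(nearby)                         # keys that share the cell 0111
print({node_for(k) for k in nearby})  # all of them live on a single node
```

A query for one cell therefore becomes a contiguous range scan that touches one node, rather than a scatter across the cluster.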


KPMG open source pipeline / stack

•  layers: visualization, processing, storage
•  storage components named on the slide: HDFS, Accumulo
•  captions: large-scale data analysis; query and share data


Experiments – GDELT Data

•  the GDELT Project monitors broadcast, print and web news from across the entire world
•  GDELT Global Knowledge Graph (GKG)
   -  hyper-edges → represent news stories
   -  vertices → represent persons, organizations, locations, etc.

[Figure: example hypergraph with two hyper-edges, e1 and e2; labels shown: Hillary Clinton, Donald Trump, Washington, D.C., New York City, London, http://www.bbc.co.uk/…, http://www.nytimes.com/…, Tone: -3.7, Tone: +5.1]
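A hypergraph of this kind can be held in memory as a list of hyper-edges plus an inverted index from vertices to edges. The sketch below uses placeholder vertices and tone values, since the deck does not state which entities or tones belong to which story; the class and function names are hypothetical and are not the GKG schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperEdge:
    """One news story linking any number of vertices (hypothetical structure)."""
    edge_id: str
    vertices: frozenset
    tone: float


# Placeholder data for illustration only; not taken from GDELT.
edges = [
    HyperEdge("e1", frozenset({"person_a", "person_b", "location_x"}), tone=1.5),
    HyperEdge("e2", frozenset({"person_a", "location_y"}), tone=-2.0),
]


def build_index(hyper_edges: list) -> dict:
    """Map each vertex to the ids of the hyper-edges (stories) that mention it."""
    index = defaultdict(set)
    for edge in hyper_edges:
        for vertex in edge.vertices:
            index[vertex].add(edge.edge_id)
    return dict(index)


index = build_index(edges)
print(sorted(index["person_a"]))    # stories that mention person_a
print(sorted(index["location_x"]))  # stories located at location_x
```

Unlike an ordinary graph edge, a hyper-edge can connect more than two vertices, which is why each story stores a set rather than a pair.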


Experiments - Brexit Dataset

•  ∼200,000 data points (news stories)
•  fields of each data point: First Location, Date and Time, URL, Average Tone
•  example data point:
   -  Date: 2 October 2016
   -  First Location: London (51.509865, -0.118092), geohash #01111010…
   -  Average Tone: -2.76
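As a rough illustration, one data point could be modelled as a small record. The class and field names are hypothetical; the values are the ones shown on the slide. The slide gives no URL and no time of day for this example, so the sketch stores `None` for the URL and keeps only the date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class NewsStory:
    """One data point of the dataset (hypothetical field names)."""
    first_location: str
    lat: float
    lon: float
    day: date
    url: Optional[str]
    average_tone: float


# Values from the slide; the URL is not shown there, so it is left unset.
example = NewsStory(
    first_location="London",
    lat=51.509865,
    lon=-0.118092,
    day=date(2016, 10, 2),
    url=None,
    average_tone=-2.76,
)
print(example)
```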


GeoServer / OpenLayers Data Visualization


GeoServer / OpenLayers Heatmap - GeoMesa Plugin


Shiny / Leaflet - Interactive Data Visualization


Aggregating Data With Apache Spark

1.  project data points on a covering set of polygons
2.  calculate aggregate statistics

Are these points in Australia?
•  1010…
•  0010…
•  0000…
•  1000…
•  1011…
•  1001…
•  1100…
•  …
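The two steps can be sketched in plain Python as a stand-in for the map and reduce stages of a Spark job. The sketch makes several assumptions: regions are approximated by sets of geohash prefixes rather than exact polygons, the region names and prefixes in `COVER` are hypothetical, and all sample points are invented except `01111010` with tone -2.76, which is the deck's London example.

```python
from collections import defaultdict

# Hypothetical cover: region name -> geohash prefixes whose cells approximate it.
COVER = {
    "region_a": ["1000", "1001"],
    "region_b": ["0111"],
}


def region_of(geohash: str):
    """Step 1: project a point onto the cover by prefix matching."""
    for region, prefixes in COVER.items():
        if any(geohash.startswith(prefix) for prefix in prefixes):
            return region
    return None  # the point falls outside every covered region


def average_tone(points: list) -> dict:
    """Step 2: aggregate the tone per region (sum and count, then divide)."""
    sums = defaultdict(lambda: [0.0, 0])
    for geohash, tone in points:
        region = region_of(geohash)
        if region is None:
            continue
        sums[region][0] += tone
        sums[region][1] += 1
    return {region: total / count for region, (total, count) in sums.items()}


points = [("10001101", -1.0), ("10010010", 3.0),
          ("01111010", -2.76), ("11110000", 5.0)]
result = average_tone(points)
print(result)
```

A prefix test is a string comparison, so the projection step needs no geometry library for points that fall in cells entirely inside a region; cells that straddle a border would still need an exact point-in-polygon check.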


Aggregating Data With Apache Spark (continued)

[Figure: the globe partitioned into geohash cells 0, 11, 1000, 1001, 1010 and 1011]


Average Tone Per Country

[Figure: bar charts titled "News stories per country"; y-axis: number of news stories (0 to 60,000); x-axis: country codes UK, US, CH, GM, AS, JA, EI, BE, NO, PO, FR, CA, IN, MX, RS, NL, LG, IT, SZ, SP; panels: OVERALL, PRE-BREXIT, POST-BREXIT]
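The three panels correspond to a count of stories per country, split by period. The sketch below assumes the split point is the referendum day, 23 June 2016, and that stories from that day count as post-Brexit; the deck does not state the cut-off it used, and the sample records are invented.

```python
from collections import Counter
from datetime import date

# Assumption: periods are split at the day of the UK referendum.
REFERENDUM_DAY = date(2016, 6, 23)


def stories_per_country(stories: list) -> dict:
    """Count stories per country code for each period and overall."""
    counts = {
        "PRE-BREXIT": Counter(),
        "POST-BREXIT": Counter(),
        "OVERALL": Counter(),
    }
    for country, day in stories:
        period = "PRE-BREXIT" if day < REFERENDUM_DAY else "POST-BREXIT"
        counts[period][country] += 1
        counts["OVERALL"][country] += 1
    return counts


# Invented records for illustration: (country code, publication day).
stories = [
    ("UK", date(2016, 5, 1)),
    ("UK", date(2016, 7, 1)),
    ("US", date(2016, 7, 2)),
    ("UK", date(2016, 6, 23)),
]
counts = stories_per_country(stories)
for period, counter in counts.items():
    print(period, dict(counter))
```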


Conclusions

•  We have presented an architecture for geospatial big data storage, processing and visualization

•  Completely open-source!

•  Fast and efficient: the aggregation takes a few minutes on Apache Spark, running on an AWS cluster of a few machines

Thank you!

Dr. Fabio Petroni


The KPMG name, logo and “cutting through complexity” are registered trademarks or trademarks of KPMG International. Designed by CREATE | CRT057939

The information contained herein is of a general nature and is not intended to address the circumstances of any particular individual or entity. Although we endeavour to provide accurate and timely information, there can be no guarantee that such information is accurate as of the date it is received or that it will continue to be accurate in the future. No one should act on such information without appropriate professional advice after a thorough examination of the particular situation.


kpmg.com/uk