Coping with IoT Data · 7/19/2016 · Coping with IoT Data on Google Cloud Platform · Jen Tong


Post on 12-Oct-2020


Page 1: Coping with IoT Data · 7/19/2016  · Coping with IoT Data On Google Cloud Platform. Jen Tong Developer Advocate Google Cloud Platform @MimmingCodes mimming.com. Agenda IoT Data

Jen Tong
Developer Advocate

Coping with IoT Data
On Google Cloud Platform


Jen Tong

Developer Advocate
Google Cloud Platform

@MimmingCodes
mimming.com


Agenda

● IoT Data Challenges
● A use case
● A recipe
● Demos
  ○ Simulate the IoT
  ○ Capture with Pub/Sub
  ○ Wrangle with Dataflow
  ○ Analyze with BigQuery


Confidential & ProprietaryGoogle Cloud Platform 4

About you

● Electrical engineers?
● Web developers?
● Data scientists?
● Mechanical engineers?
● Not engineers at all?

Photo credit: Matt Chan

Data

photo credit - taniwha on flickr

Photo credit: Matt Chan

Data

photo credit - wemake_cc on flickr


Data


Big data


Google Research Publications



Open Source Implementations

Bigtable

Flume

Dremel


Managed Cloud Versions

Bigtable → Bigtable
Flume → Dataflow
Dremel → BigQuery


Coping with big data


Big data


Really big data

[Figure: data accumulating day by day: Tuesday, Wednesday, Thursday]


Infinite data

[Figure: timeline with hourly ticks from 1:00 to 14:00]


Delayed data

[Figure: timeline with hourly ticks from 8:00 to 14:00, with four markers labeled 8:00]


Batch Patterns: Creating Structured Data

MapReduce


Batch Patterns: Repetitive Runs

MapReduce

[Figure: one run per day: Tuesday, Wednesday, Thursday]


Batch Patterns: Time Based Windows

MapReduce

Tuesday [11:00 - 12:00)

[12:00 - 13:00)

[13:00 - 14:00)

[14:00 - 15:00)

[15:00 - 16:00)

[16:00 - 17:00)

[18:00 - 19:00)

[19:00 - 20:00)

[21:00 - 22:00)

[22:00 - 23:00)

[23:00 - 0:00)
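
The hourly, half-open windows listed above can be computed by rounding each timestamp down to a window boundary. The following is a minimal sketch in plain Java, not Dataflow SDK code; the class and method names are hypothetical, and it assumes windows are aligned to the Unix epoch in UTC.

```java
import java.time.Duration;
import java.time.Instant;

/** Sketch: map a timestamp to the fixed, half-open window [start, end) that contains it. */
final class FixedWindowSketch {

    /** Start of the window of the given size containing eventTime (aligned to the epoch, UTC). */
    static Instant windowStart(Instant eventTime, Duration size) {
        long sizeMillis = size.toMillis();
        long millis = eventTime.toEpochMilli();
        // floorMod keeps the result correct even for timestamps before 1970.
        return Instant.ofEpochMilli(millis - Math.floorMod(millis, sizeMillis));
    }

    /** Exclusive end of that window. */
    static Instant windowEnd(Instant eventTime, Duration size) {
        return windowStart(eventTime, size).plus(size);
    }

    public static void main(String[] args) {
        Duration oneHour = Duration.ofHours(1);
        Instant event = Instant.parse("2016-07-19T11:42:07Z");
        System.out.println("[" + windowStart(event, oneHour) + " - " + windowEnd(event, oneHour) + ")");
    }
}
```

Because the window is half-open, an event stamped exactly 12:00 belongs to [12:00 - 13:00), not to [11:00 - 12:00).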


Batch Patterns: Sessions

MapReduce

[Figure: per-user sessions for Jose, Lisa, Ingo, Asha, Cheryl, and Ari, spread across Tuesday and Wednesday]


Streaming Patterns: Element-wise transformations

[Figure: Processing Time axis with hourly ticks from 8:00 to 14:00]


Streaming Patterns: Aggregating Time Based Windows

[Figure: Processing Time axis with hourly ticks from 8:00 to 14:00]


Delayed data

[Figure: timeline with hourly ticks from 8:00 to 14:00, with four markers labeled 8:00]


Streaming Patterns: Event Time Based Windows

[Figure: Input and Output rows plotted against an Event Time axis and a Processing Time axis, each with hourly ticks from 10:00 to 15:00]


Streaming Patterns: Session Windows

[Figure: Input and Output rows plotted against an Event Time axis and a Processing Time axis, each with hourly ticks from 10:00 to 15:00]
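
A session window groups events that are separated by less than some gap of inactivity. The sketch below illustrates that idea in plain Java for the events of a single key; it is not Dataflow SDK code, the names are hypothetical, and it assumes all events are available at once, so it ignores triggers and late data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: group the event times of one key into sessions separated by a gap of inactivity. */
final class SessionWindowSketch {

    /**
     * Returns one {firstEventMillis, lastEventMillis} pair per session. A new session
     * starts whenever an event is at least gapMillis after the previous event.
     */
    static List<long[]> sessions(long[] eventTimesMillis, long gapMillis) {
        long[] sorted = eventTimesMillis.clone();
        Arrays.sort(sorted); // order by event time, not by arrival order
        List<long[]> result = new ArrayList<>();
        for (long t : sorted) {
            if (!result.isEmpty() && t - result.get(result.size() - 1)[1] < gapMillis) {
                result.get(result.size() - 1)[1] = t; // extend the current session
            } else {
                result.add(new long[] {t, t}); // start a new session
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The event at 5 arrives last but still joins the first session.
        for (long[] s : sessions(new long[] {0, 10, 100, 105, 5}, 30)) {
            System.out.println(s[0] + " .. " + s[1]);
        }
    }
}
```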


The use case


The game


Problem scope

● Intermittent connectivity
● Inconsistent data delivery timing
● Large, endless stream of data
● Multiple input and output streams
● Bursts of activity
● Integrate and synchronize multiple event streams


Solution requirements

● Keep up with the event streams
● Respond in real-time
● Scale up and down with demand
● Process data once
● Accommodate late-arriving data
● Detect anomalies


A recipe

The recipe

Data → Pub/Sub → Dataflow → BigQuery


Cloud Pub/Sub


How Pub/Sub works

[Figure: Topics → Subscriptions → Subscribers, with delivery labeled Push or Pull]


OSS alternative


Cloud Pub/Sub features

● Asynchronous messaging
● Many-to-many
● Push and pull
● At-least-once message delivery
● REST/JSON API
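
At-least-once delivery means a subscriber may see the same message more than once, so handlers are usually written to tolerate redelivery. Here is a minimal sketch of one way to do that in plain Java, keyed on the message ID that the service assigns to each message. It does not use the Pub/Sub client library; the class and method names are hypothetical, and the in-memory set is an assumption that only suits a single, short-lived process.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

/** Sketch: make a message handler idempotent by remembering which message IDs it has seen. */
final class IdempotentSubscriber {
    private final Set<String> seenIds = new HashSet<>(); // hypothetical store; unbounded here
    private final Consumer<String> handler;

    IdempotentSubscriber(Consumer<String> handler) {
        this.handler = handler;
    }

    /**
     * Handles one delivery. Returns true if the payload was processed and false if it was
     * a redelivery that was skipped. In both cases the caller can acknowledge the message.
     */
    boolean onMessage(String messageId, String payload) {
        if (!seenIds.add(messageId)) {
            return false; // already processed this ID
        }
        handler.accept(payload);
        return true;
    }

    public static void main(String[] args) {
        IdempotentSubscriber subscriber = new IdempotentSubscriber(System.out::println);
        subscriber.onMessage("id-1", "temperature=21");
        subscriber.onMessage("id-1", "temperature=21"); // redelivery, skipped
    }
}
```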


Nonfunctional stuff

● Globally available
● Automatic scaling
● Replicated storage
● Encrypted on the wire and at rest


Demo time!

Pub/Sub injector


Cloud Dataflow


Dataflow Pipelines

Data sources


Dataflow Pipelines

Pipeline Steps


Dataflow Pipelines

Destinations


Dataflow Pipelines


OSS alternative


Features

● Unified model for streaming and batch analysis
● Once-and-only-once input element processing
● Autoscaling
● Toolkit of complex transforms
● Support for event-time stream processing
  ○ Handles late data
● Session windowing
● Real-time analytics, dashboard, and alerts


What it's good at

ETL
• Filtering
• Transformation
• Movement

Analysis
• Extract insights
• Batch
• Continuous


Demo time!

Start a pipeline


Google BigQuery


OSS alternative


BigQuery

● Scales flat to petabytes
● SQL dialect
● User defined functions
● REST, Web UI, ODBC
● 1TB free each month



Off topic demo!

Count stuff


SELECT count(word)
FROM publicdata:samples.shakespeare

Words in Shakespeare


SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_20150212_01]

Wikipedia hits over 1 hour


SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_201505]

Wikipedia hits over 1 month


Several years of Wikipedia data

SELECT sum(requests) as total
FROM [fh-bigquery:wikipedia.pagecounts_201105],
     [fh-bigquery:wikipedia.pagecounts_201106],
     [fh-bigquery:wikipedia.pagecounts_201107],
     ...


SELECT SUM(requests) AS total
FROM TABLE_QUERY(
  [fh-bigquery:wikipedia],
  'REGEXP_MATCH(table_id, r"pagecounts_2015[0-9]{2}$")')

Several years of Wikipedia data


How about a RegExp

SELECT SUM(requests) AS total
FROM TABLE_QUERY(
  [fh-bigquery:wikipedia],
  'REGEXP_MATCH(table_id, r"pagecounts_2015[0-9]{2}$")')
WHERE (REGEXP_MATCH(title, '.*[dD]inosaur.*'))


Demo time!

BigQuery


Wrap up


Big data

Photo credit: Matt Chan

Data

photo credit - wemake_cc on flickr

Photo credit: Matt Chan

Data

photo credit - taniwha on flickr


Thank you!

Jen Tong
Developer Advocate
Google Cloud Platform
@MimmingCodes

Free trial: cloud.google.com

Slides: mimming.com/presos


Bonus stuff


Dataflow

Ozmg code


PCollections -- pipeline collections

● Collection of data in a pipeline

● Bounded or unbounded in size

{Seahawks, NFC, Champions, ...}

{..., “NFC Champions #Seahawks”, “Seahawks third #superbowl!”, ... “Je suis #12thMan”, “#GoHawks”, ...}


ParDo -- Parallel Do transformation

● Process each PCollection element independently using a user-provided DoFn

● Corresponds to both the map and reduce phases in Hadoop

{Seahawks, NFC, Champions, ...}

{KV<S, Seahawks>, KV<C, Champions>, KV<S, Seattle>, KV<N, NFC>, ...}

Key by initial letter
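
The "key by initial letter" step can be imitated in plain Java to show what element-wise processing means: the function sees one element at a time and emits one key/value pair. This is a sketch only, not Dataflow SDK code; the class and method names are hypothetical, and it assumes every input string is non-empty.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Sketch: an element-wise "key by initial letter" step, written without the Dataflow SDK. */
final class KeyByInitialSketch {

    /** The per-element function: sees exactly one element and emits one key/value pair. */
    static Map.Entry<Character, String> keyByInitial(String word) {
        return new SimpleEntry<>(Character.valueOf(word.charAt(0)), word);
    }

    /** Applies the function to every element independently, as a ParDo would. */
    static List<Map.Entry<Character, String>> applyToAll(List<String> words) {
        List<Map.Entry<Character, String>> out = new ArrayList<>();
        for (String word : words) {
            out.add(keyByInitial(word));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(applyToAll(Arrays.asList("Seahawks", "NFC", "Champions")));
    }
}
```

Because no element depends on any other, a runner is free to spread the calls across many workers.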


ParDo example

{Seahawks, NFC, Champions, ...}

Lowercase

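// Dataflow SDK for Java 1.x
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.PCollection;

PCollection<String> tweets = …;
tweets.apply(ParDo.of(
    new DoFn<String, String>() {
      @Override
      public void processElement(ProcessContext c) {
        // Emit the lowercased form of each input element.
        c.output(c.element().toLowerCase());
      }
    }));

{seahawks, nfc, champions, ...}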


GroupByKey

{KV<S, Seahawks>, KV<C, Champions>, KV<S, Seattle>, KV<N, NFC>, ...}

GroupByKey

{KV<S, {Seahawks, Seattle, …}>, KV<N, {NFC, …}>, KV<C, {Champions, …}>}

● Gathers all PCollection elements with the same key

● Shuffle phase in Hadoop
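
The grouping step can be sketched in plain Java as collecting values into a map keyed by the pair's key. This is an in-memory illustration only, not Dataflow SDK code; the class, the `kv` helper, and the choice of a sorted map are all assumptions made for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: gather all values that share a key, which is what the shuffle achieves. */
final class GroupByKeySketch {

    /** Hypothetical helper for building key/value pairs. */
    static Map.Entry<Character, String> kv(char key, String value) {
        return new SimpleEntry<>(Character.valueOf(key), value);
    }

    static Map<Character, List<String>> groupByKey(List<Map.Entry<Character, String>> pairs) {
        Map<Character, List<String>> grouped = new TreeMap<>(); // sorted only for stable printing
        for (Map.Entry<Character, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        System.out.println(groupByKey(Arrays.asList(
            kv('S', "Seahawks"), kv('C', "Champions"), kv('S', "Seattle"), kv('N', "NFC"))));
    }
}
```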


GroupByKey & Combine

● Compute the most common value for each key with GroupByKey and DoFn

● DoFn needs to see all of the elements

● A CombineFn is easier to optimize than a DoFn that must see all of the elements

{KV<S, Seahawks>, KV<C, Champion>, KV<S, Seattle>, KV<N, NFC>, ...}

GroupByKey

{KV<S, {Seahawks, Seattle, …}>, KV<N, {NFC, …}>, KV<C, {Champion, …}>}

Combine.groupedValues(TopFn)

{KV<S, Seahawks>, KV<N, NFC>, KV<C, Champion>}
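
To see why a combiner can avoid holding every element for a key, here is a sketch of "most common value" written as an accumulator that can be built up in pieces and merged. It is plain Java rather than SDK code; the method names only echo the create/add/merge/extract lifecycle of a combine function, and breaking ties alphabetically is an assumption made so the result is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: "most common value" as a mergeable accumulator of per-value counts. */
final class MostCommonSketch {

    static Map<String, Integer> createAccumulator() {
        return new HashMap<>();
    }

    /** Folds one value into an accumulator; only counts are kept, not the elements. */
    static Map<String, Integer> addInput(Map<String, Integer> acc, String value) {
        acc.merge(value, 1, Integer::sum);
        return acc;
    }

    /** Merges two partial accumulators, for example from two different workers. */
    static Map<String, Integer> mergeAccumulators(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> merged = new HashMap<>(a);
        b.forEach((value, count) -> merged.merge(value, count, Integer::sum));
        return merged;
    }

    /** Returns the value with the highest count (ties: alphabetically first), or null if empty. */
    static String extractOutput(Map<String, Integer> acc) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : acc.entrySet()) {
            int count = e.getValue();
            if (count > bestCount || (count == bestCount && best != null && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> workerA = addInput(addInput(createAccumulator(), "Seahawks"), "Seattle");
        Map<String, Integer> workerB = addInput(createAccumulator(), "Seahawks");
        System.out.println(extractOutput(mergeAccumulators(workerA, workerB)));
    }
}
```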


Windows

● Divide or group elements of a PCollection into windows
  ○ Fixed Windows: hourly, daily, …
  ○ Sliding Windows
  ○ Sessions

● Required for GroupByKey transforms on an unbounded PCollection

Nighttime Mid-Day Nighttime


Composite PTransforms

● Build new PTransforms from existing transforms
● Some utilities are included in the SDK:
  ○ Count, RemoveDuplicates, Join, Min, Max, Sum…
● Define your own
  ○ DoSomething, DoSomethingElse...
● Why bother?
  ○ Code reuse
  ○ Easy to monitor

[Figure: Count as a composite transform made of Pair With Ones, GroupByKey, and Sum Values]
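
The three smaller steps named for Count can be written as three small functions and then composed. The sketch below does this in plain Java to show the shape of a composite; it is not the SDK's Count transform, and every name in it is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: a "count" built by composing three smaller steps. */
final class CountSketch {

    /** Step 1: pair every element with the number 1. */
    static List<Map.Entry<String, Integer>> pairWithOnes(List<String> elements) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String element : elements) {
            pairs.add(new SimpleEntry<>(element, Integer.valueOf(1)));
        }
        return pairs;
    }

    /** Step 2: gather the ones that belong to the same element. */
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Step 3: sum the ones for each key. */
    static Map<String, Integer> sumValues(Map<String, List<Integer>> grouped) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int total = 0;
            for (int one : entry.getValue()) {
                total += one;
            }
            sums.put(entry.getKey(), total);
        }
        return sums;
    }

    /** The composite: callers see a single reusable step. */
    static Map<String, Integer> count(List<String> elements) {
        return sumValues(groupByKey(pairWithOnes(elements)));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("#GoHawks", "#Seahawks", "#GoHawks")));
    }
}
```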


How BigQuery works


Qualities of a good RDBMS


● Inserts & locking
● Indexing
● Cache
● Query planning


Storing data

[Figure: Table → Columns → Disks]


Reading data: Life of a BigQuery

SELECT sum(requests) as sum
FROM (
  SELECT requests, title
  FROM [fh-bigquery:wikipedia.pagecounts_201501]
  WHERE (REGEXP_MATCH(title, '[Jj]en.+'))
)


Life of a BigQuery

[Figure: query tree with one Mixer (M) above two Leaf nodes (L), above Storage]

Life of a BigQuery

[Figure: query tree with a Root Mixer (M), two Mixers (M), four Leaf nodes (L), and Storage]

Life of a BigQuery

[Figure: query tree (Root Mixer, two Mixers, four Leaf nodes, Storage) with the Query shown]

Life of a BigQuery

[Figure: query tree (Root Mixer, two Mixers, four Leaf nodes, Storage), annotated with: SELECT requests, title]

Life of a BigQuery

[Figure: query tree (Root Mixer, two Mixers, four Leaf nodes, Storage), annotated with: 5.4 Bil; SELECT requests, title; WHERE (REGEXP_MATCH(title, '[Jj]en.+'))]

Life of a BigQuery

[Figure: query tree (Root Mixer, two Mixers, four Leaf nodes, Storage), annotated with: 5.4 Bil; SELECT requests, title; WHERE (REGEXP_MATCH(title, '[Jj]en.+')); 5.8 Mil; SELECT sum(requests)]

Life of a BigQuery

[Figure: query tree (Root Mixer, two Mixers, four Leaf nodes, Storage), annotated with: 5.4 Bil; SELECT requests, title; WHERE (REGEXP_MATCH(title, '[Jj]en.+')); 5.8 Mil; SELECT sum(requests), shown at two levels of the tree]
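
One way to read the tree: each leaf scans its own slice of storage, filters rows, and returns a partial sum, and each mixer adds up the partial sums below it. The sketch below imitates that flow in plain Java. It is an illustration of the idea, not BigQuery's implementation; the class names and the sharding are hypothetical, and using Matcher.find() (a match anywhere in the title) is an assumption about how the filter behaves.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: a filtered SUM evaluated as leaf scans whose partial sums are combined by mixers. */
final class QueryTreeSketch {

    /** One row of the hypothetical page-count table. */
    static final class Row {
        final String title;
        final long requests;

        Row(String title, long requests) {
            this.title = title;
            this.requests = requests;
        }
    }

    /** Leaf: scan one shard, keep rows whose title matches, and return a partial sum. */
    static long leaf(List<Row> shard, Pattern titleFilter) {
        long partial = 0;
        for (Row row : shard) {
            if (titleFilter.matcher(row.title).find()) {
                partial += row.requests;
            }
        }
        return partial;
    }

    /** Mixer: combine the partial sums from the level below. */
    static long mixer(List<Long> partials) {
        long total = 0;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        Pattern filter = Pattern.compile("[Jj]en.+");
        long leafA = leaf(Arrays.asList(new Row("Jennifer", 5), new Row("Dinosaur", 7)), filter);
        long leafB = leaf(Arrays.asList(new Row("jenkins", 3)), filter);
        System.out.println(mixer(Arrays.asList(leafA, leafB)));
    }
}
```

Only the small partial sums travel up the tree, which is why the row counts shrink from billions at the leaves to a single number at the root.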