IoT NY - Google Cloud Services for IoT

IoT NY - Cloud services for IoT

James Chittenden, Google Cloud Platform Solutions Engineer

Upload: james-chittenden

Post on 15-Jan-2017


TRANSCRIPT


+James Chittenden (Big Data Cloud Engineer)


Google confidential │ Do not distribute

Agenda

1. Big Data the Cloud Way: why would you?

2. Fully Managed: NoOps Ingest, Process & Analyse

3. Hands-On Demo: Building an Event Streaming Pipeline

Big Data at Google (a.k.a. Data at Google)

20-?? billion devices will be connected by 2020

$4-11 trillion economic impact

54% of top-performing companies will invest more in sensors this year

Sources: Gartner, PwC, McKinsey

What is IoT?

IoT is a period of transformation

Phone IoT Phone

Wearables

Watches

Phones

Cars

Home Appliances

Existing Business Owned Equipment

IoT is a transition from not connected to connected

Back in the 70s ….

The PC

The Result

A datacenter is not a collection of computers; a datacenter is a computer.

The same is happening in the Cloud today

State-of-the-art data centers.

For the past 17 years, Google has been building out the fastest, most powerful, highest-quality cloud infrastructure on the planet.

Google's Big Data innovations go far back

Figure: timeline from 2002 to 2014 showing GFS, MapReduce, Bigtable, Dremel, Colossus, Spanner, FlumeJava, MillWheel, Dataflow and BigQuery.

Extends the Android platform to IoT devices

Weave - IoT Protocol and Schema

Google Glass at Work

Nest - solutions for the connected home

Health and Wearables

Google Cloud Platform

Management

Mobile

Services

Compute

Big Data

Networking

Storage

Developer Tools

Fully Managed: NoOps Ingest, Process & Analyze

Manage the Entire Lifecycle of Big Data

Figure: the Capture, Store, Process and Analyze stages, labelled with Pub/Sub, Kafka, Cloud Storage, Cloud SQL, Cloud Datastore, Dataflow, Hadoop/Spark and BigQuery.

Big Data Architecture with Google managed services

Figure labels: Your Data; PubSub; Cloud Storage; BigTable; Dataflow; BigQuery; Fast ETL, Regex, JSON, UDFs; Spreadsheets; BI Tools; Coworkers; Applications + Reports; GCS-Hadoop Connector; Hadoop on Compute Engine (unmanaged); Cloud Dataproc (managed).

Building what’s next

Google BigQuery makes complex data analysis simple

Scales automatically

No setup or administration

Stream up to 100,000 rows per second

Easily integrates with third-party software

BigQuery use at Google: DoubleClick Support

Question: find the root cause of why an ad was or was not delivered in the last 30 days.

SELECT date, rejection_reason, COUNT(*)
FROM line_item_table.last30days
WHERE line_item_id = 56781234
GROUP BY date, rejection_reason

1.2B rows scanned, result in ~5 seconds!
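To make the shape of that result concrete, here is a minimal in-memory sketch of the same aggregation in plain Java: filter on one line item, then count rows per (date, rejection_reason) pair, which is what the query's select list implies. The Row class, the sample dates and the rejection reasons are hypothetical and only there for illustration; BigQuery itself runs this as SQL over the full table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory equivalent of the aggregation; the Row shape and sample data are hypothetical. */
final class RejectionCountSketch {
    /** Only the three columns the query touches. */
    static final class Row {
        final String date;
        final long lineItemId;
        final String rejectionReason;

        Row(String date, long lineItemId, String rejectionReason) {
            this.date = date;
            this.lineItemId = lineItemId;
            this.rejectionReason = rejectionReason;
        }
    }

    /** Counts rows for one line item per "date | rejection_reason" pair. */
    static Map<String, Long> countByDateAndReason(List<Row> rows, long lineItemId) {
        Map<String, Long> counts = new TreeMap<>();
        for (Row row : rows) {
            if (row.lineItemId == lineItemId) {                        // WHERE line_item_id = ...
                String group = row.date + " | " + row.rejectionReason; // one group per (date, reason)
                counts.merge(group, 1L, Long::sum);                    // COUNT(*)
            }
        }
        return counts;
    }

    /** Made-up rows: three for the line item of interest, one for another line item. */
    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("2017-01-01", 56781234L, "budget_exhausted"));
        rows.add(new Row("2017-01-01", 56781234L, "budget_exhausted"));
        rows.add(new Row("2017-01-02", 56781234L, "frequency_cap"));
        rows.add(new Row("2017-01-02", 99999999L, "budget_exhausted"));
        return rows;
    }

    public static void main(String[] args) {
        countByDateAndReason(sampleRows(), 56781234L)
            .forEach((group, count) -> System.out.println(group + " -> " + count));
    }
}
```

The loop touches every row once, which is fine for a handful of rows; the point of the slide is that BigQuery does the equivalent scan over 1.2B rows without any cluster to manage.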

BigQuery scales “Google scale”

Streaming ingest at peak: 2.3 million rows per second

Largest data lake on BigQuery: 38 petabytes

Largest query by data size: 2.1 petabytes

Largest query by rows: 10.5 trillion rows

What is BigQuery?

Externalization of Google Dremel

Convenience of SQL

Petabyte-Scale and Fast

Fully Managed, No-Ops Data Warehouse


Google Cloud Dataflow makes complex data analysis simple

Merges batch and stream processing

Data processing pipelines

Monitoring interface

Significantly lower cost

Runs on Google or Cloudera Spark (GitHub)

What is Cloud Dataflow?

Cloud Dataflow is a collection of SDKs for building batch or streaming parallelized data processing pipelines.

Cloud Dataflow is a fully managed service for executing optimized parallelized data processing pipelines.

Cloud Pub/Sub

• Globally redundant
• Low latency (sub sec.)
• Batched read/write
• Custom labels
• Push & Pull
• Auto expiration

Figure: Publishers A, B and C send Messages 1, 2 and 3 to Topics A, B and C. Subscriptions XA and XB feed Subscriber X, Subscription YC feeds Subscriber Y, and Subscription ZC feeds Subscriber Z, so Message 3 is delivered to both Y and Z.
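The topic and subscription labels above describe a fan-out: each subscription attached to a topic gets its own copy of every message published there. A minimal in-memory sketch of that behaviour follows. It is not the Cloud Pub/Sub client API; the class and method names (FanOutSketch, subscribe, publish, pull) are made up for illustration, and the wiring of subscriptions to topics is read off the subscription names (XA on Topic A, XB on Topic B, YC and ZC on Topic C).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory illustration of topic/subscription fan-out; not the Cloud Pub/Sub client API. */
final class FanOutSketch {
    // topic name -> names of the subscriptions attached to it
    private final Map<String, List<String>> subscriptionsByTopic = new HashMap<>();
    // subscription name -> messages waiting to be pulled
    private final Map<String, List<String>> backlog = new HashMap<>();

    void subscribe(String topic, String subscription) {
        subscriptionsByTopic.computeIfAbsent(topic, t -> new ArrayList<>()).add(subscription);
        backlog.putIfAbsent(subscription, new ArrayList<>());
    }

    /** Every subscription on the topic receives its own copy of the message. */
    void publish(String topic, String message) {
        List<String> subscriptions = subscriptionsByTopic.getOrDefault(topic, new ArrayList<>());
        for (String subscription : subscriptions) {
            backlog.get(subscription).add(message);
        }
    }

    /** Returns and clears the pending messages of one subscription. */
    List<String> pull(String subscription) {
        List<String> pending = backlog.getOrDefault(subscription, new ArrayList<>());
        List<String> delivered = new ArrayList<>(pending);
        pending.clear();
        return delivered;
    }

    /** Wires up the assumed topology and publishes one message per topic. */
    static FanOutSketch figureTopology() {
        FanOutSketch bus = new FanOutSketch();
        bus.subscribe("Topic A", "Subscription XA");
        bus.subscribe("Topic B", "Subscription XB");
        bus.subscribe("Topic C", "Subscription YC");
        bus.subscribe("Topic C", "Subscription ZC");
        bus.publish("Topic A", "Message 1");
        bus.publish("Topic B", "Message 2");
        bus.publish("Topic C", "Message 3");
        return bus;
    }

    public static void main(String[] args) {
        FanOutSketch bus = figureTopology();
        for (String s : List.of("Subscription XA", "Subscription XB", "Subscription YC", "Subscription ZC")) {
            System.out.println(s + " -> " + bus.pull(s));
        }
    }
}
```

Because YC and ZC both hang off Topic C, Message 3 shows up in both backlogs, while pulling a subscription drains only that subscription.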

Dataflow goodies

Autoscaling mid-job

Fully managed - No-Ops

Intuitive Data Processing Framework

Batch and Stream Processing in one

Liquid sharding mid-job


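// Batch pipeline as shown on the slide. ExtractTags and ExpandPrefixes are
// user-defined DoFns that the slide does not show.
Pipeline p = Pipeline.create();
p.begin()
    .apply(TextIO.Read.from("gs://…"))        // read input text from Cloud Storage
    .apply(ParDo.of(new ExtractTags()))       // extract tags from each element
    .apply(Count.create())                    // count occurrences of each tag
    .apply(ParDo.of(new ExpandPrefixes()))    // expand each tag into its prefixes
    .apply(Top.largestPerKey(3))              // keep the 3 largest values per key
    .apply(TextIO.Write.to("gs://…"));        // write results to Cloud Storage
p.run();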
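The slide does not show what ExtractTags and ExpandPrefixes do, so the following plain-Java walk-through rests on assumptions: that ExtractTags emits every #hashtag in a line, and that ExpandPrefixes keys each counted tag by each of its prefixes (the usual autocomplete setup). Under those assumptions, the stages extract, count, expand prefixes and top N per key can be sketched without the Dataflow SDK as follows. All names here (TopTagsSketch, extractTags, topTagsPerPrefix) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java walk-through of the pipeline stages; the DoFn behaviour is assumed. */
final class TopTagsSketch {
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    /** Assumed ExtractTags: emit every #hashtag found in a line, lower-cased. */
    static List<String> extractTags(String line) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(line);
        while (m.find()) {
            tags.add(m.group(1).toLowerCase());
        }
        return tags;
    }

    /** Runs extract, count, expand prefixes and top N per prefix over in-memory lines. */
    static Map<String, List<String>> topTagsPerPrefix(List<String> lines, int n) {
        // Count stage: tag -> number of occurrences.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String tag : extractTags(line)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        // Assumed ExpandPrefixes: key each tag by every one of its prefixes.
        Map<String, List<String>> tagsByPrefix = new TreeMap<>();
        for (String tag : counts.keySet()) {
            for (int end = 1; end <= tag.length(); end++) {
                tagsByPrefix.computeIfAbsent(tag.substring(0, end), p -> new ArrayList<>()).add(tag);
            }
        }
        // Top stage: keep the n most frequent tags per prefix, ties broken alphabetically.
        Comparator<String> byCountDesc = (a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        };
        for (List<String> tags : tagsByPrefix.values()) {
            tags.sort(byCountDesc);
            if (tags.size() > n) {
                tags.subList(n, tags.size()).clear();
            }
        }
        return tagsByPrefix;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("#iot rocks", "#iot and #io", "#cloud #iot #io #ion");
        System.out.println(topTagsPerPrefix(lines, 3).get("io"));
    }
}
```

Everything here runs on one machine and holds all counts in memory; the value of expressing the same steps as a pipeline is that the service can parallelize each stage across workers.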

Deploy

Schedule & Monitor


800 RPS 1200 RPS 5000 RPS 50 RPS


// Streaming variant of the same pipeline: the text source and sink are swapped
// for Pub/Sub topics, and elements are grouped into 5-minute fixed windows.
// ExtractTags and ExpandPrefixes are user-defined DoFns that the slide does not show.
Pipeline p = Pipeline.create();
p.begin()
    .apply(PubsubIO.Read.from("input_topic"))
    .apply(Window.<Integer>by(FixedWindows.of(5, MINUTES)))
    .apply(ParDo.of(new ExtractTags()))
    .apply(Count.create())
    .apply(ParDo.of(new ExpandPrefixes()))
    .apply(Top.largestPerKey(3))
    .apply(PubsubIO.Write.to("output_topic"));
p.run();
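The Window step is what lets an unbounded stream be aggregated: each element is assigned to a fixed 5-minute window by its timestamp, and counts are then computed per window rather than over the whole stream. A minimal plain-Java sketch of that assignment follows. It is not the Dataflow SDK; the class and method names are hypothetical, and timestamps are assumed to be non-negative epoch milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of fixed (tumbling) windows; not the Dataflow SDK. */
final class FixedWindowSketch {
    static final long WINDOW_MILLIS = 5 * 60 * 1000L; // 5 minutes, as in FixedWindows.of(5, MINUTES)

    /** Start of the window containing the timestamp (epoch millis, assumed non-negative). */
    static long windowStart(long timestampMillis) {
        return timestampMillis - (timestampMillis % WINDOW_MILLIS);
    }

    /** Counts events per window, keyed by window start. */
    static Map<Long, Integer> countPerWindow(long[] eventTimestampsMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimestampsMillis) {
            counts.merge(windowStart(ts), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] events = {0L, 299_999L, 300_000L, 750_000L};
        System.out.println(countPerWindow(events));
    }
}
```

Windows here are half-open: an event at 299,999 ms still belongs to the window starting at 0, while one at exactly 300,000 ms opens the next window.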


Nighttime Mid-Day Nighttime

Demo Time

Ingest: Pub/Sub

Process: Cloud Dataflow

Analyse: BigQuery

Git: https://github.com/james-google/event-streams-dataflow

Questions?

Thank You

James Chittenden