Google Cloud and Data Pipeline Patterns

Upload: lynn-langit

Post on 15-Feb-2017

Category: Technology


TRANSCRIPT

Page 1: Google Cloud and Data Pipeline Patterns

1

Google Cloud & Data Pipeline Patterns

@LynnLangit

Page 2: Google Cloud and Data Pipeline Patterns

2

Google Cloud in Australia

Data center here in 2017

Page 3: Google Cloud and Data Pipeline Patterns

3

GCP and Patterns

1. Modern Cloud by Example

Developer-first
• Fast, flexible and cheap
• Virtual Machines / GCE
• Storage / GCS

2. GCP Data Pipeline Patterns

Servers ➡ Containers ➡ Functions
• Data Warehouse
• Internet of Things (IoT)
• Bioinformatics

Page 4: Google Cloud and Data Pipeline Patterns

4

Confidential & Proprietary, Google Cloud Platform

Demo – Storage / GCS

Page 5: Google Cloud and Data Pipeline Patterns

5

Page 6: Google Cloud and Data Pipeline Patterns

6

Demo – Virtual Machines / GCE

Page 7: Google Cloud and Data Pipeline Patterns

7

Virtual Machines / GCE
• Fast
  • Spin up in seconds
  • Tools - SSH, gcloud, console
• Flexible
  • Custom sizing – slider
  • OS variety – Linux or Windows
• Cheap and Simple
  • Automatic discount for sustained use
  • Preemptible VMs

Storage / GCS
• Fast
  • Very fast within region
  • Tools included
• Flexible
  • 4 storage options
  • Simple to use / understand
• Cheap
  • Pricing by type

Page 8: Google Cloud and Data Pipeline Patterns

8

Pipeline Architectures

Page 9: Google Cloud and Data Pipeline Patterns

9

Data Warehousing

Page 10: Google Cloud and Data Pipeline Patterns

10

Big Data > Data Warehouse

• Google Analytics
• DoubleClick Campaign Manager
• Customer Lists / Reference Data
• Export Ad Data: Cloud Storage
• Id matching: Cloud Dataflow
• Reference table, Query / Compute: BigQuery
• Relevant Users: Cloud Storage
• Marketing List
• Analysts: Data Studio 360 Dashboards
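
The "Id matching" step on this slide joins exported ad data against customer reference data to produce the relevant users. The following is only a minimal plain-Python sketch of that join, not Cloud Dataflow code; the field names (`user_id`, `campaign`, `segment`) and the sample rows are hypothetical.

```python
def match_ids(ad_events, customers):
    """Inner-join ad events to customer reference rows on user_id."""
    # Index the reference data once so each lookup is O(1).
    by_id = {c["user_id"]: c for c in customers}
    matched = []
    for event in ad_events:
        customer = by_id.get(event["user_id"])
        if customer is not None:
            # Keep the event fields and attach the customer's segment.
            matched.append({**event, "segment": customer["segment"]})
    return matched


# Hypothetical sample data, for illustration only.
ad_events = [
    {"user_id": "u1", "campaign": "spring"},
    {"user_id": "u2", "campaign": "spring"},
    {"user_id": "u9", "campaign": "summer"},
]
customers = [
    {"user_id": "u1", "segment": "loyal"},
    {"user_id": "u2", "segment": "new"},
]
relevant_users = match_ids(ad_events, customers)
```

Events with no matching customer (here `u9`) are dropped, which is what makes the output a list of relevant users rather than all ad traffic.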

Page 11: Google Cloud and Data Pipeline Patterns

11

Demo – BigQuery

Page 12: Google Cloud and Data Pipeline Patterns

12

Big Data > Log Processing

Lanes: Batch, Streaming

• Log Storage: Cloud Storage
• Log Streaming: Cloud Pub/Sub
• Log Processing: Cloud Dataflow
• Log Analytics: BigQuery
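
The slide shows one processing step fed by both a batch source and a streaming source. A rough way to picture that is a single function that accepts any iterable of log lines. This is a plain-Python sketch under stated assumptions, not Dataflow code; the `METHOD PATH STATUS` line format and both sample sources are hypothetical.

```python
from collections import Counter


def parse_line(line):
    """Parse a hypothetical 'METHOD PATH STATUS' log line into a dict."""
    method, path, status = line.split()
    return {"method": method, "path": path, "status": int(status)}


def count_by_status(lines):
    """Count status codes over any iterable: a list (batch) or a generator (streaming)."""
    return Counter(parse_line(line)["status"] for line in lines)


# Batch: lines already at rest, e.g. read from stored log files.
batch_lines = ["GET / 200", "GET /missing 404", "POST /login 200"]


def stream():
    """Streaming stand-in: lines arrive one at a time."""
    yield "GET /health 200"
    yield "GET /boom 500"


batch_counts = count_by_status(batch_lines)
stream_counts = count_by_status(stream())
```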

Page 13: Google Cloud and Data Pipeline Patterns

13

Big Data > Time Series Analysis

Lanes: Batch, Streaming Time Series

• Time Series Files: Cloud Storage
• Streaming: Cloud Pub/Sub
• Time Series Processing: Cloud Dataflow
• Processing: Cloud Dataproc
• Storage: BigQuery
• Storage: Cloud Storage
• Storage: Cloud Bigtable*
• Analysis: Cloud Datalab
• ML: Cloud ML

*Note: Use Bigtable with NoSQL workloads of 1 TB or more
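
Time series processing of the kind shown here usually means grouping points into windows and aggregating each window. The sketch below assumes fixed, non-overlapping windows and `(timestamp_seconds, value)` tuples. Both are illustrative choices rather than anything stated on the slide, and it is plain Python rather than Dataflow code.

```python
from collections import defaultdict


def window_means(points, window_seconds=60):
    """Group (timestamp, value) points into fixed windows and average each window."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Round the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical readings: (seconds since start, value).
points = [(0, 10.0), (30, 20.0), (60, 5.0), (119, 15.0), (120, 7.0)]
means = window_means(points)
```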

Page 14: Google Cloud and Data Pipeline Patterns

14

Big Data > Complex Event Processing

Lanes: Streaming, Batch

• Cloud Apps: Compute Engine (shown twice)
• On-Premises Databases
• On-Premises Applications
• Mobile Devices: Push Notifications
• Streaming: Cloud Pub/Sub (Transactions)
• Processing: Cloud Dataflow (Transaction Streams)
• Rules Engine: Cloud Dataflow
• Rules Engine: Cloud Dataproc
• Messaging: Cloud Pub/Sub (Rules Actions)
• Push to Devices: App Engine
• Processed Events: Cloud Bigtable (Events Time Series)
• Cloud Data: Cloud Storage
• ETL: Cloud Dataflow (Transform Data)
• Data Warehouse: BigQuery (Execution Results)
• Data Analysis: Cloud Datalab
• Report & Share: Business Analysis
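
The "Rules Engine" boxes take transaction events in and emit rule actions. One very small way to sketch that idea is a list of named predicates, each paired with an action. The rule names, thresholds, and event fields below are all hypothetical; the slide does not say how its rules are expressed.

```python
def evaluate(event, rules):
    """Return the actions of every rule whose predicate matches the event."""
    return [action for _name, predicate, action in rules if predicate(event)]


# Hypothetical rules: (name, predicate, action to emit).
rules = [
    ("large_amount", lambda e: e["amount"] > 1000, "flag_for_review"),
    ("foreign", lambda e: e["country"] != "AU", "notify_customer"),
]

# Each transaction event yields zero or more actions for the messaging step.
actions_large = evaluate({"amount": 1500, "country": "AU"}, rules)
actions_foreign = evaluate({"amount": 50, "country": "NZ"}, rules)
actions_none = evaluate({"amount": 50, "country": "AU"}, rules)
```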

Page 15: Google Cloud and Data Pipeline Patterns

15

Core Products for Data Warehousing

Files
• Cloud Storage

Compute
• BigQuery
• Cloud Dataflow

Other
• 3rd party ETL
• 3rd party dashboards

More on BigQuery…
• Interactive or Batch query
• ANSI SQL compliant
• Cost control - Purchase ‘slots’
• NoOps Data Warehouse
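
As a reminder of what a standard SQL aggregate looks like, here is a small query of the kind a warehouse answers. To keep the example runnable without a cloud account it uses Python's built-in `sqlite3` as a local stand-in; it is not the BigQuery client, and the `logs` table and its rows are hypothetical.

```python
import sqlite3

# A standard-SQL aggregate: hits per status code, most frequent first.
QUERY = """
SELECT status, COUNT(*) AS hits
FROM logs
GROUP BY status
ORDER BY hits DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [("/", 200), ("/a", 200), ("/b", 404)],  # hypothetical rows
)
rows = conn.execute(QUERY).fetchall()
conn.close()
```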

Page 16: Google Cloud and Data Pipeline Patterns

16

Internet of Things

Page 17: Google Cloud and Data Pipeline Patterns

17

Internet of Things > MQTT

• MQTT Devices
• Cloud Load Balancing
• Auto-scaled Broker Tier (custom MQTT broker): MQTT Broker on Compute Engine, RabbitMQ
• IoT Topic: Cloud Pub/Sub
• Stream Analytics: Cloud Dataflow
• IoT Warehouse: BigQuery
• IoT Application: App Engine
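
In this pattern the broker tier sits between MQTT devices and a Pub/Sub topic, so something has to decide which device topics are forwarded where. The sketch below implements MQTT topic-filter matching (`+` for one level, `#` for the remaining levels) and a first-match routing table. The routing table and the Pub/Sub topic names are hypothetical, and the special handling of topics beginning with `$` is left out for brevity.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic matches a filter using '+' and '#' wildcards."""
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True  # '#' matches all remaining levels, including none
        if i >= len(t_parts):
            return False  # filter has more levels than the topic
        if f != "+" and f != t_parts[i]:
            return False  # literal level differs
    return len(f_parts) == len(t_parts)


def route(topic, routes):
    """Return the destination of the first matching filter, or None."""
    for topic_filter, destination in routes:
        if topic_matches(topic_filter, topic):
            return destination
    return None


# Hypothetical routing table: MQTT filter -> Pub/Sub topic name.
routes = [
    ("sensors/+/temperature", "iot-temperature"),
    ("sensors/#", "iot-all-sensors"),
]
```

Order matters in the table: the more specific filter is listed first so temperature readings do not fall through to the catch-all.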

Page 18: Google Cloud and Data Pipeline Patterns

18

Internet of Things > Sensor stream ingest and processing

Stages: Ingest, Pipelines, Storage, Analytics, Application & Presentation

Devices
• Standard Devices: HTTPS
• Constrained Devices: Non-TCP, e.g. BLE, via Gateway

Products shown
• App Engine
• Container Engine
• Compute Engine
• Cloud Storage
• Cloud Pub/Sub
• Cloud Dataflow (shown twice)
• Cloud Datastore
• Cloud Bigtable
• BigQuery
• Cloud Dataproc
• Cloud Datalab
• Monitoring
• Logging
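
The gateway in this diagram speaks to constrained, non-TCP devices on one side and to the cloud ingest endpoint on the other. One plausible job for it is to batch small readings into JSON payloads before sending them on. The sketch below only builds the payloads; the batch size, the payload shape, and the reading fields are assumptions, and no network call is made.

```python
import json


def batch_readings(readings, max_batch=3):
    """Yield JSON payloads, each carrying at most max_batch readings."""
    for start in range(0, len(readings), max_batch):
        yield json.dumps({"readings": readings[start:start + max_batch]})


# Hypothetical readings collected from BLE devices.
readings = [{"device": f"ble-{i}", "temp_c": 20 + i} for i in range(5)]
payloads = list(batch_readings(readings))
```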

Page 19: Google Cloud and Data Pipeline Patterns

19

Retail > Beacons and Targeted Marketing

• Beacons: Proximity Notifications
• Mobile: Push Notifications
• Office Business Systems
• Messaging: Cloud Pub/Sub (Proximity Streams)
• Processing: Cloud Dataflow (Stream Processing)
• Events: Cloud Bigtable (Proximity Events)
• Analytics: BigQuery (Data Warehouse)
• Messaging: Cloud Pub/Sub (Queued Notifications)
• Notifications: App Engine (Push to Devices)
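
Between the proximity stream and the queued notifications, the processing step has to decide which proximity events deserve a push. A common concern is not notifying the same device repeatedly while it lingers near one beacon. The sketch below assumes time-ordered `(timestamp, device, beacon)` events and a cooldown in seconds; both are illustrative assumptions, not details from the slide.

```python
def notifications(events, cooldown=300):
    """Keep one notification per (device, beacon) pair per cooldown period."""
    last_sent = {}
    out = []
    for ts, device, beacon in events:
        key = (device, beacon)
        # Send if this pair was never notified or the cooldown has elapsed.
        if key not in last_sent or ts - last_sent[key] >= cooldown:
            out.append((ts, device, beacon))
            last_sent[key] = ts
    return out


# Hypothetical, time-ordered proximity events.
events = [(0, "d1", "b1"), (50, "d2", "b1"), (100, "d1", "b1"), (400, "d1", "b1")]
queued = notifications(events)
```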

Page 20: Google Cloud and Data Pipeline Patterns

20

Core Products for IoT

Files & Storage
• Cloud Storage
• Cloud Bigtable

Compute & Ingest
• Cloud Pub/Sub
• BigQuery
• Cloud Dataflow

Page 21: Google Cloud and Data Pipeline Patterns

21

Demo – Machine Learning

Page 22: Google Cloud and Data Pipeline Patterns

22

Bioinformatics

Page 23: Google Cloud and Data Pipeline Patterns

23

Life Sciences > Patient Monitoring

Lanes: Patient, Analytics

• Patient Monitors (pulse, blood sugar, exercise)
• Ingest: Cloud Pub/Sub
• Storage: Cloud Bigtable
• Analytics Process
• Data: Prediction API
• Alerts / Notifications: Cloud Pub/Sub
• Health Care Professional
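
The slide routes monitor readings through an analytics step and out as alerts to a health care professional. The slide uses the Prediction API for that step; the sketch below swaps in a much simpler range check purely to show the shape of "reading in, alerts out". The limits, metric names, and patient id are hypothetical and are not clinical guidance.

```python
def check_reading(reading, limits):
    """Return alert messages for any metric outside its (low, high) limits."""
    alerts = []
    for metric, value in reading["metrics"].items():
        # Metrics without configured limits are never flagged.
        low, high = limits.get(metric, (float("-inf"), float("inf")))
        if not low <= value <= high:
            alerts.append(
                f"{reading['patient_id']}: {metric}={value} outside [{low}, {high}]"
            )
    return alerts


# Hypothetical limits for illustration only; not clinical guidance.
limits = {"pulse": (40, 120), "blood_sugar": (4.0, 10.0)}
reading = {"patient_id": "p-001", "metrics": {"pulse": 135, "blood_sugar": 5.5}}
alerts = check_reading(reading, limits)
```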

Page 24: Google Cloud and Data Pipeline Patterns

24

Life Sciences > Variant Analysis

Groups: Private Datasets, Public Datasets, Analytics

• High Throughput Genome Sequencers
• Scientist
• Data Ingest: Genomics (BAM, FASTQ)
• Patient Data: Cloud Storage
• Illumina Platform: Cloud Storage
• MSSNG Autism: Cloud Storage
• 1000 Genomes: Cloud Storage
• Ref Genomes: Cloud Storage
• TCGA: Cloud Storage
• Online Analytics: BigQuery
• Batch Analytics: Cloud Dataflow
• Lab Notebooks: Cloud Datalab
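
FASTQ is one of the two ingest formats named on this slide. For readers who have not seen it, each record is four lines: an `@` identifier, the sequence, a `+` separator, and a quality string of the same length. The reader below is a minimal sketch that assumes this simple four-line layout with no wrapped sequences; the sample reads are made up.

```python
def read_fastq(lines):
    """Yield (identifier, sequence, quality) tuples from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        # Default to "" so a truncated record fails validation below.
        seq = next(it, "").strip()
        plus = next(it, "").strip()
        qual = next(it, "").strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


# Hypothetical reads, for illustration only.
sample = ["@read1", "GATTACA", "+", "IIIIIII", "@read2", "ACGT", "+", "FFFF"]
records = list(read_fastq(sample))
```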

Page 25: Google Cloud and Data Pipeline Patterns

25

Life Sciences > Genomics, Secondary Analysis

Stages: Ingest, Elastic Cluster, Storage, Analytics

• High Throughput Genome Sequencers
• Carrier Interconnect
• Cloud Network
• Cloud Load Balancing
• Ingest Server: Compute Engine
• HPC Cluster: Compute Engine (10 Nodes)
• Raw Datafiles: Cloud Storage
• Processed Data: Cloud Storage
• Metadata: Cloud SQL
• Online Analytics: BigQuery
• Lab notebooks: Cloud Datalab
• Scientist
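
With raw data files in storage and a 10-node cluster doing the secondary analysis, the work has to be divided among nodes somehow. The slide does not say how, so the sketch below assumes the simplest scheme, round-robin assignment by file index. The file names are hypothetical.

```python
def assign_round_robin(files, node_count=10):
    """Assign files to nodes 0..node_count-1 in round-robin order."""
    assignments = {node: [] for node in range(node_count)}
    for index, name in enumerate(files):
        assignments[index % node_count].append(name)
    return assignments


# Hypothetical raw data files waiting in storage.
files = [f"sample_{i:03d}.fastq" for i in range(25)]
plan = assign_round_robin(files)
```

Round-robin balances file counts but not file sizes; a real scheduler would likely weigh by size or use a work queue.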

Page 26: Google Cloud and Data Pipeline Patterns

26

Core Products for Bioinformatics

• Cloud Storage
• BigQuery
• Compute Engine
• Cloud Dataflow
• Public datasets on GCP

Page 27: Google Cloud and Data Pipeline Patterns

27

Page 28: Google Cloud and Data Pipeline Patterns

28

“The Future is Functional” - @LynnLangit