The journey of Moving from AWS ELK to GCP Data Pipeline

Build DMP on top of GCP

VMFive - Randy Huang

Uploaded by randy-huang, posted on 14-Apr-2017




Agenda

• Migrated Pipeline to GCP

• Cost Comparison

• Business Use Case

• Fluentd Demo


ELK + AWS EMR

• Kinesis

• Lambda


Pros & Cons

• Pros:

• Well supported.

• Well documented.

• Easy to find references.

• Cons:

• High cost.

• Not open source.

• Scale has to be decided up front.


Pipeline on GCP

• Dataflow

• BigQuery

• Machine Learning

• Data Visualization

• Compute Engine

• Global Load Balancing


Data Studio


Architecture diagram (streaming and batch):

• Time series streaming: Cloud Pub/Sub

• Batch storage: Cloud Storage

• Processing: Cloud Dataflow

• Storage: BigQuery

• BI Analysis
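In the streaming half of an architecture like this, Cloud Dataflow typically groups time-series events read from Pub/Sub into time windows before writing aggregates to BigQuery. The in-memory Python sketch below imitates fixed (tumbling) windows so the idea can be run without GCP. The 60-second window, the `(timestamp, event_type)` input shape, and the function names are assumptions. A real pipeline would express the same step with the Apache Beam SDK, e.g. `beam.WindowInto(window.FixedWindows(60))`.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

WINDOW_SECONDS = 60  # assumed window size; not taken from the slides


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Start (epoch seconds) of the fixed window that contains timestamp ts."""
    return ts - ts % size


def count_per_window(events: Iterable[Tuple[int, str]], size: int = WINDOW_SECONDS) -> List[dict]:
    """Count (timestamp, event_type) pairs per fixed window and event type.

    The returned dicts are shaped like rows that could be written to BigQuery.
    """
    counts = defaultdict(int)
    for ts, event_type in events:
        counts[(window_start(ts, size), event_type)] += 1
    return [
        {"window_start": start, "event_type": event_type, "count": n}
        for (start, event_type), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    sample = [(0, "impression"), (30, "impression"), (59, "click"), (61, "impression")]
    for row in count_per_window(sample):
        print(row)
```

Fixed windows keep each aggregate row independent of the others. In a real streaming job, the harder questions are late data and triggers, which this sketch ignores.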


Targeting Engines (architecture diagram):

• Data Sources: device-related, behavior-related and 3rd-party data, each on Cloud Pub/Sub

• Transform Data

• Machine Learning: Spark MLlib on Cloud Dataproc; hosted models on Cloud Machine Learning

• Applications: API backend on Compute Engine, App Engine, Real-Time Prediction API, Redis on Compute Engine


Pros & Cons

• Pros:

• Cost-effective.

• Operationally efficient.

• Google has your back.

• Cons:

• APIs/SDKs change constantly.

• Some services are still in beta.

• Docs are scattered across many places.


Workflow Monitoring

• Digdag (comparable to Airflow/Oozie/Luigi)

• Native support for Python & Ruby

• Multi-cloud

• Modular

• Workflow as code

• Docker support

• Alerting to Slack
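Digdag's `py>` operator runs a Python function as a workflow step, which is one way its native Python support shows up in practice. The module below is a hypothetical sketch. `daily_table`, the `dmp` dataset, and the `events_YYYYMMDD` naming are illustrative only. The module also assumes that `py>` passes the built-in `session_date` parameter to a function argument of the same name. `digdag.env.store` is importable only inside Digdag; it publishes the result as `${target_table}` for later tasks.

```python
"""Hypothetical Digdag task module, saved as tasks.py next to the workflow.

A .dig workflow could call it like this (names are illustrative):

    +resolve_table:
      py>: tasks.daily_table
    +report:
      echo>: loading into ${target_table}
"""
from datetime import date


def daily_table(session_date: str, dataset: str = "dmp", prefix: str = "events") -> str:
    """Return a date-sharded BigQuery table id such as dmp.events_20170414.

    Assumes session_date arrives as YYYY-MM-DD, the format Digdag uses for it.
    """
    day = date.fromisoformat(session_date)
    table = f"{dataset}.{prefix}_{day:%Y%m%d}"
    try:
        import digdag  # only importable when running under Digdag's py> operator

        digdag.env.store({"target_table": table})  # expose ${target_table} downstream
    except ImportError:
        pass  # running outside Digdag (e.g. in a unit test): just return the value
    return table
```

Keeping the logic in a plain function means it can be unit-tested without a Digdag server. The workflow file then only wires the steps together.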


Digdag Sample


Digdag


Cost Comparison

• $2000 per month on AWS

• About $200 per month on GCP for production

• About another $200 per month for development

• 50M events per month
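Normalizing the slide's figures by event volume makes the comparison easier to read. The arithmetic below uses only the numbers on the slide. The GCP figures are approximate ("about"), and it is an assumption that the $2000 AWS bill covered a comparable scope.

```python
EVENTS_PER_MONTH = 50_000_000  # from the slide
AWS_MONTHLY_USD = 2000         # monthly AWS bill quoted on the slide
GCP_PROD_MONTHLY_USD = 200     # "about", production
GCP_DEV_MONTHLY_USD = 200      # "about", development


def usd_per_million_events(monthly_usd: float, events: int = EVENTS_PER_MONTH) -> float:
    """Monthly cost divided by monthly volume, in USD per one million events."""
    return monthly_usd / (events / 1_000_000)


def saving(new_usd: float, old_usd: float = AWS_MONTHLY_USD) -> float:
    """Fraction of the old monthly bill that is saved."""
    return 1 - new_usd / old_usd


if __name__ == "__main__":
    gcp_total = GCP_PROD_MONTHLY_USD + GCP_DEV_MONTHLY_USD
    print(f"AWS:            ${usd_per_million_events(AWS_MONTHLY_USD):.2f} per 1M events")       # $40.00
    print(f"GCP prod only:  ${usd_per_million_events(GCP_PROD_MONTHLY_USD):.2f} per 1M events")  # $4.00
    print(f"GCP prod + dev: ${usd_per_million_events(gcp_total):.2f} per 1M events")             # $8.00
    print(f"Saving: {saving(GCP_PROD_MONTHLY_USD):.0%} (prod only), "
          f"{saving(gcp_total):.0%} (prod + dev)")                                              # 90%, 80%
```

Even counting the development environment, the GCP setup costs about one fifth of the AWS bill for the same 50M events.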


Business Use Cases

• Digital Ads Targeting

• User Behavior Tagging

• BI

• GEO Reporting

• KPI Reporting

• User Demographics


Some Tips

• BigQuery incident (https://status.cloud.google.com/incident/bigquery/18022): handled by Fluentd's retry and HA

• Dataflow's SDK and docs are not in sync

• Dataflow side inputs have a bug in streaming mode

• Compute Engine SLB: set up both TCP and UDP forwarding (Fluentd's in_forward receives events over TCP and heartbeats over UDP)
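The retry half of that tip rests on Fluentd's buffered output. A failed flush is retried with growing waits and, if it keeps failing, can be handed to a `<secondary>` output. The Python function below is a simplified model of that behavior, not Fluentd's code. Its parameters loosely mirror Fluentd's `retry_wait`, `retry_exponential_backoff_base` and `retry_max_interval` buffer settings. `send` and `secondary` are hypothetical stand-ins for the BigQuery output and a fallback such as a local file. Fluentd can also randomize the waits; this sketch does not.

```python
import time
from typing import Callable, Optional, Sequence

Chunk = Sequence[dict]


def flush_with_retry(
    send: Callable[[Chunk], None],
    chunk: Chunk,
    retry_wait: float = 1.0,
    backoff_base: float = 2.0,
    max_interval: float = 60.0,
    max_retries: int = 5,
    secondary: Optional[Callable[[Chunk], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Send a chunk, backing off exponentially between failed attempts.

    Returns "primary" if `send` eventually succeeded, "secondary" if the chunk
    was handed to the fallback; re-raises the last error if there is none.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        if attempt:
            # Waits grow as retry_wait * base**0, base**1, ... capped at max_interval.
            sleep(min(retry_wait * backoff_base ** (attempt - 1), max_interval))
        try:
            send(chunk)
            return "primary"
        except Exception as exc:  # a real plugin would retry only transient errors
            last_error = exc
    if secondary is not None:
        secondary(chunk)
        return "secondary"
    raise last_error
```

Injecting `sleep` keeps the function testable. Routing exhausted chunks to a secondary output is what prevents data loss during a longer outage.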


Fluentd Update

• Release notes for v0.14

• Sub-second event flush

• New plugin APIs support dynamic placeholders in configuration values

(e.g., path /my/dest/${tag}/mydata.%Y-%m-%d.log)

• Secure Forward
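The `${tag}` and `%Y-%m-%d` placeholders in the example path are resolved per event, so records from different tags or days end up in different files. The Python sketch below imitates that resolution to show which file each record would land in. `expand_path` and `group_by_path` are hypothetical helpers. In Fluentd itself (as we understand the v0.14 API), placeholders are filled from the buffer chunk keys, so `tag` and `time` need to be listed in the `<buffer>` section.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def expand_path(template: str, tag: str, event_time: datetime) -> str:
    """Resolve ${tag} and strftime directives in a path template.

    Time directives are expanded first so a '%' inside a tag is never
    handed to strftime.
    """
    return event_time.strftime(template).replace("${tag}", tag)


def group_by_path(template: str,
                  events: Iterable[Tuple[str, datetime, dict]]) -> Dict[str, List[dict]]:
    """Bucket (tag, time, record) events by the file they would be written to."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for tag, event_time, record in events:
        buckets[expand_path(template, tag, event_time)].append(record)
    return dict(buckets)


if __name__ == "__main__":
    template = "/my/dest/${tag}/mydata.%Y-%m-%d.log"
    print(expand_path(template, "nginx.access", datetime(2017, 4, 14, 12, 30)))
```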


Demo

• Nginx -> Fluentd -> BigQuery -> Data Studio

• MySQL -> Fluentd -> BigQuery
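In the first demo, Fluentd tails the Nginx access log, parses each line into fields, and ships the records to BigQuery (for example with the fluent-plugin-bigquery output). The stdlib-only sketch below reproduces just the parsing step for Nginx's default `combined` log format, showing how one line becomes one row. The field names, the UTC conversion, and `parse_line` itself are assumptions rather than the demo's actual schema. A real setup would use Fluentd's own bundled `nginx` parser.

```python
import json
import re
from datetime import datetime, timezone
from typing import Optional

# nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'^(?P<remote>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def _dash_to_none(value: str) -> Optional[str]:
    """nginx logs '-' for empty fields; map that to NULL for BigQuery."""
    return None if value == "-" else value


def parse_line(line: str) -> Optional[dict]:
    """Turn one access-log line into a flat dict suitable for a BigQuery row."""
    m = COMBINED.match(line.rstrip("\n"))
    if m is None:
        return None  # Fluentd would report a parse error; here the line is skipped
    local = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "time": local.astimezone(timezone.utc).isoformat(),
        "remote": m["remote"],
        "user": _dash_to_none(m["user"]),
        "method": m["method"],
        "path": m["path"],
        "code": int(m["code"]),
        "size": int(m["size"]),
        "referer": _dash_to_none(m["referer"]),
        "agent": _dash_to_none(m["agent"]),
    }


if __name__ == "__main__":
    sample = ('203.0.113.7 - - [14/Apr/2017:10:15:32 +0800] '
              '"GET /ads/banner?id=42 HTTP/1.1" 200 512 "-" "Mozilla/5.0"')
    print(json.dumps(parse_line(sample)))  # one newline-delimited JSON row
```

Converting timestamps to UTC before loading avoids mixing the server's local offset into BigQuery `TIMESTAMP` columns. Data Studio can then apply a display time zone.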