Building Serverless Data Infrastructure in the AWS Cloud Ryan Plant @ryan_plant November 10, 2017

Upload: ryan-plant

Post on 21-Jan-2018

TRANSCRIPT

Page 1: Building Serverless Data Infrastructure in the AWS Cloud

Building Serverless Data Infrastructure in the AWS Cloud

Ryan Plant @ryan_plant

November 10, 2017

Page 2: Building Serverless Data Infrastructure in the AWS Cloud

Thanks to our Sponsors!

Partners

Premier

Marquee:

Prize:

Page 3: Building Serverless Data Infrastructure in the AWS Cloud

Get the app! Give feedback!

Page 4: Building Serverless Data Infrastructure in the AWS Cloud

WHAT WE’LL COVER

The New Data Economy

Reference Architecture

Using the AWS Cloud

Page 5: Building Serverless Data Infrastructure in the AWS Cloud

The world’s most valuable resource is no longer oil, but data…

May 6th, 2017

Page 6: Building Serverless Data Infrastructure in the AWS Cloud

Data => Revenue (but extraction, refinement, packaging, and distribution needed)

Page 7: Building Serverless Data Infrastructure in the AWS Cloud

DW

Traditional Data Warehousing

Volume, variety, and velocity…

Advanced analytics…

Artificial intelligence…

“What got us here won’t (entirely) get us there…”

Mostly proprietary…

Costly and complex to scale…

Page 8: Building Serverless Data Infrastructure in the AWS Cloud

Next Generation Data Infrastructure

(i.e. the “data lake”)

Page 9: Building Serverless Data Infrastructure in the AWS Cloud

James “Data Lake” Dixon

Page 10: Building Serverless Data Infrastructure in the AWS Cloud

If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state…

Page 11: Building Serverless Data Infrastructure in the AWS Cloud

From Data Warehouses to Lakes

A data pond, lake, or ocean is not a product; it’s an architecture… (and architecture is a principled and pattern-oriented approach to building systems)

Any and all data…

Any source and format…

Any time…

Page 12: Building Serverless Data Infrastructure in the AWS Cloud

WHAT WE’LL COVER

The New Data Economy

Reference Architecture

Using the AWS Cloud

Page 13: Building Serverless Data Infrastructure in the AWS Cloud

APPS & SOURCES

STORAGE AND PROCESSING LAYER

Ingestion

Storage

Catalog

Processing

Analytics & Artificial Intelligence

SERVING LAYER

Models & Marts

API

Search

DATA OPS

Security

Config

Telemetry

Cost Mgmt

Page 14: Building Serverless Data Infrastructure in the AWS Cloud

DATA OPS

Security

Config

Telemetry

Cost Mgmt

SERVING LAYER

Models & Marts

API

Search

APPS & SOURCES

STORAGE AND PROCESSING LAYER

Ingestion

Storage

Catalog

Processing

Analytics & Artificial Intelligence

Page 15: Building Serverless Data Infrastructure in the AWS Cloud

Data Ingestion Pipelines

SERVICE SERVICE SERVICE

MONOLITH MONOLITH MONOLITH

Change Data Capture (CDC)

STREAMS

MESSAGING

FILE EXTRACTS

STORAGE

source data aggregated, stored indefinitely

many supported formats

append

append

PUT

Security: segregation & encryption

Page 16: Building Serverless Data Infrastructure in the AWS Cloud

Storage and Catalog

STORAGE

RAW REFINED

Catalog

• Register source and schema
• Data attribute inventory
• Relationships and dependencies
• Etc…

data

Ingestion
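The catalog responsibilities listed on this slide (register source and schema, keep an attribute inventory, track relationships and dependencies) can be sketched in a few lines of Python. This is only an illustration: `DatasetEntry`, `Catalog`, and the example dataset names are hypothetical and do not come from the talk or from any AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record (hypothetical shape, for illustration only)."""
    name: str
    source: str
    schema: dict                      # attribute name -> type name
    zone: str = "raw"                 # "raw" or "refined"
    depends_on: list = field(default_factory=list)


class Catalog:
    """In-memory stand-in for a data catalog."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Register source and schema under the dataset name.
        self._entries[entry.name] = entry

    def attributes(self):
        # Attribute inventory: attribute name -> datasets that carry it.
        inventory = {}
        for entry in self._entries.values():
            for attr in entry.schema:
                inventory.setdefault(attr, []).append(entry.name)
        return {attr: sorted(names) for attr, names in inventory.items()}

    def dependents(self, name):
        # Relationships: which datasets were derived from `name`.
        return sorted(
            e.name for e in self._entries.values() if name in e.depends_on
        )


catalog = Catalog()
catalog.register(DatasetEntry(
    "orders_raw", "orders-service",
    {"order_id": "string", "customer_id": "string", "total": "double"},
))
catalog.register(DatasetEntry(
    "orders_by_customer", "orders-service",
    {"customer_id": "string", "order_count": "bigint"},
    zone="refined", depends_on=["orders_raw"],
))
```

Calling `catalog.attributes()` then shows which datasets share an attribute such as `customer_id`, and `catalog.dependents("orders_raw")` lists the refined datasets built from it.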

Page 17: Building Serverless Data Infrastructure in the AWS Cloud

Catalog

Raw to Refined Processing Pipelines

STORAGE

RAW REFINED

Processing Pipelines

data

Ingestion

C1 C2 C3 C..n

• Preserve RAW data; enrich only
• Apply transforms to create new, REFINED datasets (e.g. customer partitioned views)
• Catalog new datasets
• Enable new use cases:
  • Reporting/Analytical views
  • Machine/Deep Learning

X Y Z

ALL DATA
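A minimal sketch of the raw-to-refined idea on this slide: the RAW records are left untouched, and a transform produces a new REFINED dataset partitioned by customer. The record fields (`customer_id`, `total`) and the function name `refine_by_customer` are assumptions made for the example, not part of the talk.

```python
from collections import defaultdict


def refine_by_customer(raw_records):
    """Build a customer-partitioned view without mutating the raw records."""
    refined = defaultdict(list)
    for record in raw_records:
        enriched = dict(record)  # copy, so the RAW record is preserved
        # Enrichment only: add a derived field, never overwrite source data.
        enriched["total_cents"] = round(record["total"] * 100)
        refined[record["customer_id"]].append(enriched)
    return dict(refined)


raw = [
    {"customer_id": "C1", "order_id": "o-1", "total": 12.50},
    {"customer_id": "C2", "order_id": "o-2", "total": 3.00},
    {"customer_id": "C1", "order_id": "o-3", "total": 7.25},
]
refined = refine_by_customer(raw)
```

Here `refined` holds one partition per customer (C1, C2), while `raw` is unchanged and can be reprocessed later with different transforms.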

Page 18: Building Serverless Data Infrastructure in the AWS Cloud

Processing Pipelines

Catalog

Analytics and AI

STORAGE

RAW REFINED

data

Ingestion

Analytics and Artificial Intelligence

C1 C2 C3 C..n

ALL DATA

X Y Z

… … …

Page 19: Building Serverless Data Infrastructure in the AWS Cloud

DATA OPS

Security

Config

Telemetry

Cost Mgmt

APPS & SOURCES

STORAGE AND PROCESSING LAYER

Ingestion

Storage

Catalog

Processing

Analytics & Artificial Intelligence

SERVING LAYER

Models & Marts

API

Search

Page 20: Building Serverless Data Infrastructure in the AWS Cloud

Processing Pipelines

Catalog

Curation and Serving

STORAGE

RAW REFINED

data

Ingestion

Analytics and Artificial Intelligence

C1 C2 C3 C..n

ALL DATA

X Y Z

Models and Marts

… … …

Search

… … …

Page 21: Building Serverless Data Infrastructure in the AWS Cloud

Processing Pipelines

Catalog

STORAGE

RAW REFINED

data

Ingestion

Analytics and Artificial Intelligence

C1 C2 C3 C..n

ALL DATA

X Y Z

Models and Marts

… … …

Search

… … …

API

Page 22: Building Serverless Data Infrastructure in the AWS Cloud

APPS & SOURCES

STORAGE AND PROCESSING LAYER

Ingestion

Storage

Catalog

Processing

Analytics & Artificial Intelligence

SERVING LAYER

Models & Marts

API

Search

DATA OPS

Security

Config

Telemetry

Cost Mgmt

Page 23: Building Serverless Data Infrastructure in the AWS Cloud

WHAT WE’LL COVER

The New Data Economy

Reference Architecture

Using the AWS Cloud

Page 24: Building Serverless Data Infrastructure in the AWS Cloud

Lots of software, hardware, etc.

Page 25: Building Serverless Data Infrastructure in the AWS Cloud

TRADITIONAL INVESTMENT IN NEXT GENERATION DATA

Page 26: Building Serverless Data Infrastructure in the AWS Cloud

CAPITAL AND RISK BARRIERS

acquire/write and maintain software

procure, install, and maintain hardware

get commercial real estate license

Page 27: Building Serverless Data Infrastructure in the AWS Cloud
Page 28: Building Serverless Data Infrastructure in the AWS Cloud

PUBLIC CLOUD ECONOMIES OF SCALE

Page 29: Building Serverless Data Infrastructure in the AWS Cloud

CLOUD OPTIMIZATION

Infrastructure as a Service

Someone else’s hardware and real estate

Your software, your (virtual) servers

Platform as a Service

Someone else’s software, servers, hardware and real estate

Your custom application software

Software as a Service

Someone else’s application software, you provide the data

(everything else doesn’t matter)

Cycle Time

Capital Optimization

Differentiation Focus

High

Higher

Highest

Page 30: Building Serverless Data Infrastructure in the AWS Cloud

Go Serverless! (as much as possible)

Page 31: Building Serverless Data Infrastructure in the AWS Cloud

Principles for event-driven, reactive data infrastructure primed for serverless architectures

• everything is an event: messages, log entries, file I/Os, clock alarms, etc.
• listen for events: trigger a handler with an event
• stateless event handling: avoid state, persist as event source, handoff as soon as possible
• automation through orchestration and coordination
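These principles can be illustrated with a small in-process sketch. It is not an AWS API: `EventBus`, the handler names, and the event types `file.created` and `file.validated` are all hypothetical. Handlers are stateless functions of the incoming event, every event is appended to a log (the event source), and each handler hands off by returning follow-up events.

```python
from collections import defaultdict


class EventBus:
    """Toy event dispatcher; a stand-in for a managed trigger mechanism."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.log = []  # append-only event log: the only persisted state

    def on(self, event_type, handler):
        # Listen for events: register a handler for one event type.
        self._handlers[event_type].append(handler)

    def emit(self, event):
        # Persist the event first, then trigger every matching handler.
        self.log.append(event)
        for handler in self._handlers[event["type"]]:
            for follow_up in handler(event):
                self.emit(follow_up)  # hand off as soon as possible


def on_file_created(event):
    # Stateless: the result depends only on the incoming event.
    return [{"type": "file.validated", "key": event["key"]}]


def on_file_validated(event):
    # End of this small chain: nothing further to hand off.
    return []


bus = EventBus()
bus.on("file.created", on_file_created)
bus.on("file.validated", on_file_validated)
bus.emit({"type": "file.created", "key": "orders/2017-11-10/part-0.json"})
```

After the single `emit` call, `bus.log` contains the original event followed by the derived one, which is what lets a pipeline be replayed or audited from its event history.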

Page 32: Building Serverless Data Infrastructure in the AWS Cloud

Ingestion

Storage

SQS

SNS

Kinesis

DynamoDB/RDS

event triggers

y = f(x)

y = f(x, y)

y = f([x, y])

event handlers

AWS Glacier (archival)

/{source}-raw/{key}/YYYY-MM-DD

/{source}-refined/{key}/YYYY-MM-DD

AWS Lambda

AWS S3 (ready)

KMS (encryption)

lifecycle policies

IAM + Directory (access control)

CloudWatch/Trail

to S3 direct

AWS Step Functions (coordinated state)
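The storage layout on this slide reads as a naming template: one location per source and zone (raw or refined), then a key, then a date. A small helper can make that template concrete. The function name `object_prefix` and the example values are assumptions for illustration; the slide only gives the pattern itself.

```python
from datetime import date


def object_prefix(source, key, zone, day):
    """Render the slide's /{source}-{zone}/{key}/YYYY-MM-DD template."""
    if zone not in ("raw", "refined"):
        raise ValueError(f"unknown zone: {zone!r}")
    # date.isoformat() yields YYYY-MM-DD, matching the template.
    return f"/{source}-{zone}/{key}/{day.isoformat()}"


raw_prefix = object_prefix("orders", "customer", "raw", date(2017, 11, 10))
refined_prefix = object_prefix("orders", "customer", "refined", date(2017, 11, 10))
```

Keeping raw and refined data under parallel prefixes means access control, encryption, and lifecycle policies can be applied per zone.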

Page 33: Building Serverless Data Infrastructure in the AWS Cloud

Catalog

Storage

Sources

Ingestion

AWS Glue (serverless ETL/ELT)

source crawlers

metadata

classifier

classifier

doSomething(…) {…}

trigger

Processing Pipelines

jobs and job runner

To Targets
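To make the classifier role on this slide concrete, here is a toy stand-in that guesses the format of a text sample. It is not the AWS Glue API and does not reflect how Glue classifiers are implemented; the function `classify` and its three labels are assumptions chosen for the example.

```python
import json


def classify(sample):
    """Guess the format of a text sample: 'json', 'csv', or 'unknown'."""
    text = sample.strip()
    try:
        # Only objects and arrays count as JSON documents here.
        if isinstance(json.loads(text), (dict, list)):
            return "json"
    except ValueError:
        pass
    lines = text.splitlines()
    comma_counts = {line.count(",") for line in lines}
    # Treat as CSV when every line has the same non-zero comma count.
    if len(lines) >= 2 and len(comma_counts) == 1 and comma_counts.pop() > 0:
        return "csv"
    return "unknown"
```

A crawler could run a function like this against a sample of each new object and record the result as catalog metadata, which downstream jobs then use to pick a parser.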

Page 34: Building Serverless Data Infrastructure in the AWS Cloud

Catalog

Storage

Sources & Targets

Ingestion

Processing Pipelines

AWS Glue (serverless ETL/ELT)

AWS EMR (Managed Hadoop)

Streaming

Kinesis

Batch

AWS Batch

Targets & Sources

Ingestion

Serving Layer

Page 35: Building Serverless Data Infrastructure in the AWS Cloud

Catalog

Storage

Processing Pipelines

AWS Glue (serverless ETL/ELT)

Serving Layer

AWS Elasticsearch (managed ES)

AWS Redshift Spectrum (Parallel DW)

Sources

Ingestion

AWS Athena (Ad-hoc Query)

Page 36: Building Serverless Data Infrastructure in the AWS Cloud

Catalog

Storage

Processing Pipelines

Serving Layer

Sources

Ingestion

AWS API Gateway (serverless APIs)

AWS QuickSight (visualization)

AWS Cognito (Web/Mobile Identity and SSO)

Page 37: Building Serverless Data Infrastructure in the AWS Cloud

WHAT WE’LL COVER

The New Data Economy

Reference Architecture

Using the AWS Cloud

Page 38: Building Serverless Data Infrastructure in the AWS Cloud

CLOUD OPTIMIZATION

Infrastructure as a Service

Someone else’s hardware and real estate

Your software, your (virtual) servers

Platform as a Service

Someone else’s software, servers, hardware and real estate

Your custom application software

Software as a Service

Someone else’s application software, you provide the data

(everything else doesn’t matter)

Cycle Time

Capital Optimization

Differentiation Focus

High

Higher

Highest

Page 39: Building Serverless Data Infrastructure in the AWS Cloud

CLOUD OPTIMIZATION

Infrastructure as a Service

Someone else’s hardware and real estate

Your software, your (virtual) servers

Platform as a Service

Someone else’s software, servers, hardware and real estate

Your custom application software

Software as a Service

Someone else’s application software, you provide the data

(everything else doesn’t matter)

You are likely here…

Aim here…

TBD

Opportunity!

Public Cloud R&D Investment

Page 40: Building Serverless Data Infrastructure in the AWS Cloud

SERVERLESS: USE CAUTION

The floor is wet (and is constantly getting mopped!)

The edges are sharp:
• Development, Test, Debug tools and experience
• Configuration and Deployment challenges
• Variable, non-deterministic performance

Extremely new (but inevitable) paradigm…

Page 41: Building Serverless Data Infrastructure in the AWS Cloud