AWS Summit Auckland - Building a Server-less Data Lake on AWS


Upload: amazon-web-services

Post on 22-Jan-2017


TRANSCRIPT

Page 1: AWS Summit Auckland - Building a Server-less Data Lake on AWS

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Sebastien Menant, Enterprise Solutions Architect & Nam Je Cho, Enterprise Solutions Architect,

Amazon Web Services

Chris Riddell, Senior Software Engineer, Parrot Analytics

Building a Server-less Data Lake on AWS

Technical 301

Page 2: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Session Depth

Business 101

Technical 201

Technical 301

Technical 401

Page 3: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Agenda

• What is a Data Lake?

• Why You Need a Data Lake

• Building the Data Lake

• Demo

• Next Steps

Page 4: AWS Summit Auckland - Building a Server-less Data Lake on AWS

What is a Data Lake?

Page 5: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Definition

“A data lake provides massive storage for

any kind of data, enormous processing

power and the ability to handle virtually

limitless concurrent tasks or jobs”

- Wikipedia

Page 6: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Characteristics of a Data Lake

• Collect Everything

• Dive in Anywhere

• Flexible Access

Page 7: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Why You Need a Data Lake

Page 8: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 9: AWS Summit Auckland - Building a Server-less Data Lake on AWS

What About Modern Business Needs?

Page 10: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Big Data… and The Hadoop Ecosystem

Page 11: AWS Summit Auckland - Building a Server-less Data Lake on AWS

But Both are Complementary

Amazon EMR

Amazon Redshift

Page 12: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 13: AWS Summit Auckland - Building a Server-less Data Lake on AWS

[Diagram: STORAGE (Amazon S3) shared by multiple COMPUTE clusters (Amazon EMR)]

Page 14: AWS Summit Auckland - Building a Server-less Data Lake on AWS

New Business Outcomes and Capabilities

• Enable New Insights in Your Data

• Cost Savings of Compute and Storage

• Use the Right Tool for the Job

• Increase Durability of Data

• Charge Storage Costs to Owner

• Streaming and Real-time Analysis

Retain all your data, for years!

Page 15: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Building the Data Lake

Page 16: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Beware

Page 17: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Building Blocks of the Data Lake

Storage and Ingestion

Catalogue and Search

Security

API and UI

Page 18: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Storage and Ingestion

Storage and Ingestion

Catalogue and Search

Security

API and UI

Page 19: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Requirements for Storage

• Multi-year Scalable Storage Capability

• High Durability

• Store Raw Data from Any Input Sources

• Support for Any Data Type

• Low Cost

Page 20: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Key Services for Storage

Amazon S3

1. Highly Scalable and Durable

2. Security and Encryption

3. Lifecycle Management

4. Event Notifications

5. Versioning

Amazon Glacier

1. Long-term Archival Storage

2. Lifecycle Integration with S3

3. Extremely Low-cost

4. Vault Lock

Page 21: AWS Summit Auckland - Building a Server-less Data Lake on AWS

[Diagram: Storage and Ingestion block with Amazon S3 and Amazon Glacier]

Page 22: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Recommendations #1

• S3 Buckets

  • Close to Users and Compute

  • Select Region for Regulatory Compliance

• Naming

  • Human-readable Path

  • Random Hash Prefix for Optimal Partitioning

• Format

  • Structured vs Unstructured + Compression

  • CSV, Parquet, ORC, JSON, XML, logs, etc.

  • GZIP for small files, Avro, LZO, Snappy
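
The naming recommendations above (a human-readable path plus a random hash prefix) can be combined in one key. The following is a minimal sketch, not the presenters' implementation: the function name `build_object_key`, the `dataset/YYYY/MM/DD/filename` layout, and the 4-character prefix length are all assumptions chosen for illustration.

```python
import hashlib


def build_object_key(dataset: str, year: int, month: int, day: int,
                     filename: str, prefix_len: int = 4) -> str:
    """Return an S3 object key: a short hash prefix, then a readable path.

    The layout and prefix length are illustrative assumptions.
    """
    # Human-readable part: dataset/YYYY/MM/DD/filename
    readable = f"{dataset}/{year:04d}/{month:02d}/{day:02d}/{filename}"
    # Deterministic hex digest of the readable path; its leading characters
    # spread keys across the key space while staying reproducible.
    digest = hashlib.sha256(readable.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{readable}"


key = build_object_key("clickstream", 2016, 5, 3, "events.csv.gz")
print(key)
```

Because the prefix is derived from the path rather than drawn at random, the same inputs always map to the same key, so a reader can recompute where an object lives without a lookup table.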

Page 23: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Recommendations #2

• Optimise

  • Store Everything

  • Use Large Files with Split-able Format

  • Lifecycle Policies for Cost-savings

  • Tagging for Cost Allocation

• Security

  • Encryption

  • Bucket Policies, ACL, Tagging, CloudTrail
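
As an illustration of "Lifecycle Policies for Cost-savings", the sketch below builds a lifecycle configuration that moves objects under a prefix to Glacier after a number of days. The helper name `glacier_lifecycle_config`, the `raw/` prefix, and the day counts are assumptions; the dictionary follows the shape accepted by boto3's `put_bucket_lifecycle_configuration`, which is only referenced in a comment so the block runs without AWS credentials.

```python
from typing import Optional


def glacier_lifecycle_config(prefix: str, archive_after_days: int = 90,
                             expire_after_days: Optional[int] = None) -> dict:
    """Build an S3 lifecycle configuration for one prefix (illustrative)."""
    rule = {
        # Rule ID derived from the prefix; the naming scheme is an assumption.
        "ID": "archive-" + (prefix.strip("/").replace("/", "-") or "all"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        if expire_after_days <= archive_after_days:
            raise ValueError("expiry must come after the Glacier transition")
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


config = glacier_lifecycle_config("raw/", archive_after_days=90)
# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-data-lake-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config)
```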

Page 24: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Requirements for Ingestion

• Batch File Support

• Traditional ETL

• Streaming Data

• Consumption of any Dataset as a Stream

• Low Latency Analytics

• Replay-ability from the Data Lake

• Server-less ETL Capabilities

Page 25: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Key Services for Ingestion

Amazon Kinesis Firehose

1. Easy to use with Agent

2. Automatic Elasticity

3. Near Real-time

4. Simultaneous Destinations

Amazon Kinesis Streams

1. Enables Custom Processing

2. Continuous Data Collection

3. Real-time

4. API Driven for Custom Apps

Page 26: AWS Summit Auckland - Building a Server-less Data Lake on AWS

[Diagram: Data Sources feeding an Amazon Kinesis Stream spanning three Availability Zones; other labels: AWS Lambda, KCL App, EMR, S3, DynamoDB, Redshift, Elasticsearch]

Page 27: AWS Summit Auckland - Building a Server-less Data Lake on AWS

[Diagram: Storage and Ingestion block with Amazon S3, Amazon Glacier and Amazon Kinesis]

Page 28: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Recommendations

• Reminder

  • Added Complexity needs Business Justification

• Select the Right Tools

  • Real-time Analysis: Apache Spark Streaming, Storm, Flink

  • Firehose to Redshift for BI and Dashboards

• Tips

  • AWS Lambda for ETL Transformation

  • Persist Streams into S3
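
The tip "AWS Lambda for ETL Transformation" could look like the sketch below: a Lambda handler that receives a batch from a Kinesis stream, decodes each record, applies a transformation, and returns newline-delimited JSON ready to be persisted to S3. The event shape (`Records[].kinesis.data`, base64-encoded) is the one Lambda uses for Kinesis stream events. The `transform` rule (lower-case keys, drop empty values) is a made-up example, and the S3 write is left out so the block runs without AWS access.

```python
import base64
import json


def transform(record: dict) -> dict:
    """Hypothetical ETL step: lower-case keys and drop empty values."""
    return {k.lower(): v for k, v in record.items() if v not in (None, "")}


def handler(event, context=None):
    """Decode and transform every Kinesis record in the batch."""
    lines = []
    for rec in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(rec["kinesis"]["data"])
        cleaned = transform(json.loads(payload))
        lines.append(json.dumps(cleaned, sort_keys=True))
    # Newline-delimited JSON; a real function would write this to S3.
    return "\n".join(lines)
```

Returning the batch as one newline-delimited document keeps the persisted objects larger, in line with the earlier advice to prefer large files over many small ones.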

Page 29: AWS Summit Auckland - Building a Server-less Data Lake on AWS

http://amzn.to/23DWr5O

Page 30: AWS Summit Auckland - Building a Server-less Data Lake on AWS

http://amzn.to/1SRk8wG

Page 31: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Catalogue and Search

Storage and Ingestion

Catalogue and Search

Security

API and UI

Page 32: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Requirements for Catalogue and Search

• Metadata Index

• Automated Metadata Processing

• Discovery and Search

• Data Classification

• Server-less and Event-driven

Page 33: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Key Services for Catalogue and Search

AWS Lambda

1. Server-less

2. Event Driven

3. Auto Scaling

4. Real-time

Amazon DynamoDB

1. NoSQL

2. Streams

3. Logstash Plugin

Amazon Elasticsearch Service

1. Deploy Simply

2. Easy Admin

3. Kibana

Page 34: AWS Summit Auckland - Building a Server-less Data Lake on AWS

[Diagram: Catalogue and Search block with AWS Lambda, Amazon DynamoDB and Amazon Elasticsearch]

Page 35: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Recommendations

• Tips

  • Start Small and Simple… add Capabilities

  • File names, size, state, dates, tags, owner

  • Region, versions, lineage, relationships

  • Search Metadata and Object Content

• Events

  • S3 Triggers Lambda

  • DynamoDB Streams

  • Logstash Plugin to Elasticsearch
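
The "S3 Triggers Lambda" step can be sketched as a function that turns an S3 event notification into metadata items for the index. The event fields used (`Records[].s3.bucket.name`, `s3.object.key`, `s3.object.size`, `awsRegion`, `eventTime`, `eventName`) are part of the S3 notification format; the output attribute names and the function name `metadata_items` are assumptions, and the DynamoDB write itself is omitted so the block runs offline.

```python
from urllib.parse import unquote_plus


def metadata_items(event: dict) -> list:
    """Extract one catalogue item per object from an S3 event notification."""
    items = []
    for rec in event.get("Records", []):
        s3 = rec["s3"]
        # Object keys are URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(s3["object"]["key"])
        items.append({
            "bucket": s3["bucket"]["name"],
            "key": key,
            "file_name": key.rsplit("/", 1)[-1],
            "size_bytes": s3["object"].get("size", 0),
            "version_id": s3["object"].get("versionId"),
            "region": rec.get("awsRegion"),
            "event_time": rec.get("eventTime"),
            "event_name": rec.get("eventName"),
        })
    return items
```

Starting with these few attributes matches the advice to start small; tags, owner, and lineage can be added to the item later without changing the trigger.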

Page 36: AWS Summit Auckland - Building a Server-less Data Lake on AWS

http://amzn.to/23E9LUp

Page 37: AWS Summit Auckland - Building a Server-less Data Lake on AWS

http://amzn.to/1TQVBwp

Page 38: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Security

Storage and Ingestion

Catalogue and Search

Security

API and UI

Page 39: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Requirements for Security

• Data Encryption at Rest

• Authentication

• Authorisation

Page 40: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Key Services for Security

AWS IAM

1. Users and Roles

2. Identity Federation

3. Multi Factor Authentication

4. Granular Permissions

AWS KMS

1. Seamless Service Integration

2. Extensive Compliance

AWS CloudHSM

SSE-S3

Page 41: AWS Summit Auckland - Building a Server-less Data Lake on AWS

[Diagram: Security block with AWS KMS and AWS IAM]

Page 42: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Recommendations

• Start Early

  • Security Needs Practice!

  • Federate with your Corporate Directory

• Best Practice

  • Use CloudTrail and CloudWatch

  • Encrypt Where Possible

  • Select Bucket Region for Regulatory Compliance

• Tips

  • IAM Policies, S3 Versioning and MFA Delete

  • Lambda for Data Masking
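
One way to act on "Encrypt Where Possible" is a bucket policy that rejects uploads which do not request server-side encryption. The sketch below generates such a policy as JSON. The `Null` condition on `s3:x-amz-server-side-encryption` is a standard S3 policy condition; the function name, the statement `Sid`, and the bucket name are illustrative assumptions, and this is only one of several possible controls.

```python
import json


def require_encryption_policy(bucket: str) -> str:
    """Return a bucket policy (JSON) denying unencrypted PutObject calls."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnencryptedUploads",  # illustrative statement ID
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            # Matches requests where the encryption header is absent.
            "Condition": {
                "Null": {"s3:x-amz-server-side-encryption": "true"},
            },
        }],
    }
    return json.dumps(policy, indent=2)


print(require_encryption_policy("my-data-lake-bucket"))  # hypothetical bucket
```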

Page 43: AWS Summit Auckland - Building a Server-less Data Lake on AWS

API and UI

Storage and Ingestion

Catalogue and Search

Security

API and UI

Page 44: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Requirements for API and UI

• Serve Data and Capabilities to Customers

• Programmatically

• Search Catalogue

• Run Compute

• Extend Access Control Management

• And… Use of Familiar Visualisation Tools

Page 45: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Key Services for API and UI

Amazon API Gateway

1. Performance at Any Scale

2. Create RESTful Frontend

3. Managed API Lifecycle

AWS Lambda

1. Enables Server-less API

2. Custom Logic for Services

3. Automatic Scaling

Page 46: AWS Summit Auckland - Building a Server-less Data Lake on AWS

[Diagram: API and UI block with Amazon API Gateway and AWS Lambda]

Page 47: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Recommendations

• Tips

• Go Server-less!

• Extend Existing AWS Services and Build Custom Logic

• Data Management, Processing and Transformations

• API Gateway for Data Access

• Serve the Data, Search and Compute via RESTful APIs

• Distribute a Custom SDK

• Extend the Solution

• Build Advanced Security Controls using Metadata Index
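
To make "Serve the Data, Search and Compute via RESTful APIs" concrete, here is a minimal sketch of a Lambda function behind API Gateway that searches the catalogue. It assumes the Lambda proxy integration event and response shape (`queryStringParameters` in, `statusCode`/`headers`/`body` out). The in-memory `CATALOGUE` list is a stand-in for the metadata index, and the query parameter name `q` and the item fields are invented for the example.

```python
import json

# Stand-in for the metadata index (DynamoDB / Elasticsearch in the deck).
CATALOGUE = [
    {"key": "raw/clickstream/2016/05/03/events.csv.gz", "owner": "web-team"},
    {"key": "raw/billing/2016/05/invoices.parquet", "owner": "finance"},
]


def _response(status: int, body: dict) -> dict:
    """Build a response in the API Gateway Lambda proxy format."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def handler(event, context=None):
    """Return catalogue entries whose key or owner contains ?q=<term>."""
    params = event.get("queryStringParameters") or {}
    term = (params.get("q") or "").lower()
    if not term:
        return _response(400, {"error": "missing query parameter 'q'"})
    hits = [item for item in CATALOGUE
            if term in item["key"].lower() or term in item["owner"].lower()]
    return _response(200, {"count": len(hits), "items": hits})
```

Keeping the handler free of direct service calls makes it easy to test locally; swapping `CATALOGUE` for a real index query is the only change needed to connect it to the catalogue layer.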

Page 48: AWS Summit Auckland - Building a Server-less Data Lake on AWS

The Whole Picture…

Storage and Ingestion

Catalogue and Search

Security

API and UI

Page 49: AWS Summit Auckland - Building a Server-less Data Lake on AWS

[Diagram: the complete architecture. USERS; API and UI: Amazon API Gateway, AWS Lambda; Catalogue and Search: AWS Lambda, Amazon DynamoDB, Amazon Elasticsearch; Storage and Ingestion: Amazon S3, Amazon Glacier, Amazon Kinesis; Security: AWS KMS, AWS IAM; also shown: Amazon EMR, Amazon RDS, Amazon Redshift]

Page 50: AWS Summit Auckland - Building a Server-less Data Lake on AWS

A Data Lake is…

• Foundation of Data Storage and Streaming Data

• Metadata index to help Categorise and Govern

• Search Index to Enable Data Discovery

• Robust Set of Security Controls

• Governance Through Technology Not Policy

• Interface to Expose Data and Capabilities to Users

Page 51: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Demo

Page 52: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Building Catalogue and Search

[Diagram: Data Flow. Labels: Data Source, S3 Bucket, Lambda, DynamoDB (Metadata Index), Logstash, ElasticSearch]

Page 53: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 54: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 55: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 56: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 57: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 58: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 59: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 60: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 61: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 62: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 63: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 64: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 65: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 66: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 67: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 68: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 69: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 70: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 71: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 72: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 73: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 74: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 75: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Next Steps

Page 76: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Proof of Concept

Page 77: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Next Steps

• How to Get Started

• AWS Documentation

• Getting Started Guide

• AWS Training & Certification

• Big Data on AWS

• AWS Partner Network

• AWS Professional Services

• Big Data Specialists

Page 78: AWS Summit Auckland - Building a Server-less Data Lake on AWS

AWS Training & Certification

Intro Videos & Labs

Free videos and labs to help you learn to work with 30+ AWS services – in minutes!

Training Classes

In-person and online courses to build technical skills – taught by accredited AWS instructors

Online Labs

Practice working with AWS services in a live environment – learn how related services work together

AWS Certification

Validate technical skills and expertise – identify qualified IT talent or show you are AWS cloud ready

Learn more: aws.amazon.com/training

Page 79: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Your Training Next Steps:

Visit the AWS Training & Certification pod to discuss your training plan & AWS Summit training offer

Register & attend AWS instructor led training

Get Certified

AWS Certified? Visit the AWS Summit Certification Lounge to pick up your swag

Learn more: aws.amazon.com/training

Page 80: AWS Summit Auckland - Building a Server-less Data Lake on AWS
Page 81: AWS Summit Auckland - Building a Server-less Data Lake on AWS

Thank You!