instructure-aws-bigdata -...

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data on AWS Quick overview and deep’ish dive on Realtime analytics Ben Snively AWS Specialist SA – Big Data and Analytics [email protected]

Upload: lamthien

Post on 17-Mar-2018


TRANSCRIPT

Page 1: Instructure-AWS-BigData - schd.wsschd.ws/hosted_files/pandamonium2017/f0/Instructure-AWS-BigData.pdf · Elasticsearch Amazon$Kinesis$ ... KinesisStreams KinesisFirehose Kinesis’Analytics

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Big Data on AWS

Quick overview and deep’ish dive on Real-time analytics

Ben Snively

AWS Specialist SA – Big Data and Analytics

[email protected]

Page 2:

Growing Data, Faster data, from more sources

1.2 ZB in 2015

44 ZB by 2020

180 ZB by 2025

Data is being generated faster and faster

More and more data sources

80 billion devices - 2025

500 million tweets daily

GB, TB, PB, EB, ZB

Source: IDC, 2015

IoT

Social Media

Enterprise Systems

Page 3:

Drivers for Big Data

Requirements for Solution

Velocity, Volume, Variety

(5 Vs) Veracity, Variability

(7 Vs) Visualization, Value

Page 4:

Big Data was Meant for the Cloud

Big data: Variety, volume, and velocity requiring new tools
Cloud computing: Variety of compute, storage, and networking options

Big data: Potentially massive datasets
Cloud computing: Massive, virtually unlimited capacity

Big data: Iterative, experimental style of data manipulation and analysis
Cloud computing: Iterative, experimental style of infrastructure deployment/usage

Big data: Frequently not steady-state workload; peaks and valleys
Cloud computing: At its most efficient with highly variable workloads

Big data: Absolute performance not as critical as “time to results”; shared resources are a bottleneck
Cloud computing: Parallel compute projects allow each workgroup to have more autonomy, get faster results

Page 5:

Common Big Data Flow

Generate

Collect, Orchestrate, Store

Analyze

Lower Cost, Increased Velocity

Traditionally - Highly constrained

Page 6:

One tool to rule them all

Page 7:

AWS Big Data Platform

Stages: Collect, Orchestrate, Store, Analyze

Services shown: EMR, EC2, Glacier, S3, Import/Export, Kinesis, Direct Connect, Machine Learning, Redshift, DynamoDB, AWS Database Migration Service, AWS Lambda, AWS IoT, AWS Data Pipeline, Kinesis Analytics, Amazon SNS, AWS Snowball, Amazon SWF, Amazon Athena, Amazon QuickSight, Amazon Aurora, AWS Glue

Page 8:

Real-time Analytics - Amazon Kinesis Platform

Page 9:

Amazon Kinesis: Streaming Data Done the AWS Way

Makes it easy to capture, deliver, and process real-time data streams

Pay as you go, no up-front costs

Elastically scalable

Right services for your specific use cases

Real-time latencies

Easy to provision, deploy, and manage

Page 10:

Amazon Kinesis: Streaming data made easy

Services make it easy to capture, deliver and process streams on AWS

Amazon Kinesis Streams
For Technical Developers
Build your own custom applications that process or analyze streaming data

Amazon Kinesis Firehose
For all developers, data scientists
Easily load massive volumes of streaming data into S3, Amazon Redshift and Amazon Elasticsearch

Amazon Kinesis Analytics
For all developers, data scientists
Easily analyze data streams using standard SQL queries
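
To make "analyze data streams using standard SQL queries" more concrete, here is a minimal sketch of one common streaming query shape: counting events per key in fixed (tumbling) time windows. It is plain Python, not the Kinesis Analytics SQL dialect, and it is not taken from the slides. The function name `tumbling_window_counts`, the `(timestamp, key)` event shape, and the 60-second default window are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]],
    window_seconds: int = 60,
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key).

    Each event is a hypothetical (timestamp_in_seconds, key) pair. Windows
    are fixed, non-overlapping intervals of `window_seconds`, which is
    roughly what a windowed GROUP BY expresses in a streaming SQL query.
    """
    counts: Counter = Counter()
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(0.0, "a"), (59.9, "a"), (60.0, "a"), (61.0, "b")]
    for (start, key), n in sorted(tumbling_window_counts(sample).items()):
        print(start, key, n)
```

In a managed service the windowing and state handling are done for you; the sketch only shows what result such a query produces.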

Page 11:

Region Availability (today)

Kinesis Streams, Kinesis Firehose, Kinesis Analytics, Lambda

Page 12:

Sending & Reading Data from Kinesis Streams

Sending: AWS SDK, LOG4J, Flume, Fluentd, AWS Mobile SDK, Kinesis Producer Library

Consuming: Get* APIs, Kinesis Client Library + Connector Library, Apache Storm, Amazon Elastic MapReduce, AWS Lambda, Apache Spark, Apache Flink
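
As an illustration of the "Sending" side with the AWS SDK, the sketch below prepares events for the Kinesis Streams `PutRecords` call and sends them in batches. It is a hedged example, not the presenter's code. The helper names (`to_entries`, `batched`, `send_events`), the `device_id` partition-key field, and the JSON encoding are assumptions. The `client` argument is expected to behave like `boto3.client("kinesis")`, whose `put_records` accepts at most 500 records per request.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_entries(events: Iterable[Dict[str, Any]], key_field: str) -> Iterator[Dict[str, Any]]:
    """Turn each event dict into a PutRecords entry (Data bytes + PartitionKey)."""
    for event in events:
        yield {
            "Data": json.dumps(event).encode("utf-8"),
            # The partition key decides which shard receives the record.
            "PartitionKey": str(event[key_field]),
        }


def batched(entries: Iterable[Dict[str, Any]],
            size: int = MAX_RECORDS_PER_CALL) -> Iterator[List[Dict[str, Any]]]:
    """Group entries into lists of at most `size` items."""
    batch: List[Dict[str, Any]] = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_events(client: Any, stream_name: str,
                events: Iterable[Dict[str, Any]],
                key_field: str = "device_id") -> int:
    """Send events via client.put_records.

    Returns the total FailedRecordCount reported by the service. This sketch
    does not retry failed records; a real producer would.
    """
    failed = 0
    for batch in batched(to_entries(events, key_field)):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With boto3 installed and credentials configured, a call could look like `send_events(boto3.client("kinesis"), "my-stream", events)`, where `my-stream` is a placeholder stream name. Passing the client in as an argument keeps the batching logic testable without network access.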

Page 13:

Amazon Kinesis Streams vs. Amazon Kinesis Firehose

Amazon Kinesis Streams is for use cases that require custom processing, per incoming record, with sub-1 second processing latency, and a choice of stream processing frameworks.

Amazon Kinesis Firehose is for use cases that require zero administration, ability to use existing analytics tools based on Amazon S3, Amazon Redshift and Amazon Elasticsearch, and a data latency of 60 seconds or higher.
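
The rule of thumb above can be sketched as a small decision helper. This is only an illustration of the slide's two criteria (custom per-record processing and acceptable latency); the function name `choose_kinesis_service`, its parameters, and the treatment of latencies between 1 and 60 seconds as a Streams case are assumptions, not AWS guidance.

```python
FIREHOSE_MIN_LATENCY_SECONDS = 60  # the slide's "60 seconds or higher"


def choose_kinesis_service(max_latency_seconds: float,
                           needs_custom_processing: bool) -> str:
    """Pick a service using the slide's two criteria.

    Assumption: any latency requirement tighter than 60 seconds, or any need
    for custom per-record processing, points to Kinesis Streams; otherwise
    the zero-administration option, Kinesis Firehose, is preferred.
    """
    if needs_custom_processing:
        return "Kinesis Streams"
    if max_latency_seconds < FIREHOSE_MIN_LATENCY_SECONDS:
        return "Kinesis Streams"
    return "Kinesis Firehose"


if __name__ == "__main__":
    print(choose_kinesis_service(0.5, needs_custom_processing=False))
    print(choose_kinesis_service(300, needs_custom_processing=False))
```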

Page 14:

Demonstration

Page 15:

S3 is the Data Lake

Page 16:

Data Lake Reference Architecture

Athena, Glue

Page 17:

Processing & Analytics

Categories: Real-time, Batch, AI & Predictive, BI & Data Visualization, Transactional & RDBMS

Services shown:

AWS Lambda
Apache Storm on EMR
Apache Flink on EMR
Spark Streaming on EMR
Elasticsearch Service
Kinesis Analytics, Kinesis Streams
DynamoDB (NoSQL DB)
Aurora (Relational Database)
EMR (Hadoop, Spark, Presto)
Redshift (Data Warehouse)
Athena (Query Service)
Amazon Lex (Speech recognition)
Amazon Rekognition
Amazon Polly (Text to speech)
Machine Learning (Predictive analytics)
Kinesis Streams & Firehose

Page 18:

Data Lake Architecture with AWS Tools

Diagram labels: Data Sources, Amazon S3 Data Lake, Streaming Analytics Tools, Data Science Sandbox, Visualization / Reporting

Services shown:

Amazon Kinesis Streams & Firehose
AWS Lambda
Apache Storm on EMR
Apache Flink on EMR
Spark Streaming on EMR
Amazon Kinesis Analytics
Amazon EMR (Hadoop / Spark)
Amazon Redshift (Data Warehouse)
Amazon DynamoDB (NoSQL DB & Graph DB)
Amazon Elasticsearch Service
Amazon Aurora (Relational Database)
Amazon Machine Learning (Machine Learning)
Open Source Tool of Choice on EC2

Page 19:

Summary

• AWS enables you to build sophisticated big data applications
  • Retrospective, Real-time, Predictive

• You can build incrementally, adding use cases and increasing scale as you go

• AWS provides a broad range of security and auditing features to enable you to meet your security requirements

• AWS makes it easy to build hybrid applications that span across your datacenters and the AWS Cloud

https://aws.amazon.com/big-data/ also /iot