Big Data: Best Practices on AWS

Post on 15-Apr-2017


© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Javier Ros, Solution Architect

Jun, 2016

Big Data: Best Practices on AWS

Agenda

Big Data challenges

Design Patterns on AWS

RavenPack. Big Data for Financial Applications

Shopping cart

Big Data Challenges

Volume

Velocity

Variety

Simplify Big Data Processing

data answers

Time to Answer (Latency)

Throughput

Cost

ingest / collect

store

process / analyze

consume / visualize

On-Demand Big Data Analytics

Young Huang. Director, Big Data Analytics.

“We were able to save about 90% over the EC2 On-Demand cost”

Clickstream Analysis

Suneel Sajnani. Senior VP of Enterprise Technology

Kinesis and Spark to process more than 30TB per day

Event-driven Extract, Transform, Load (ETL)

Brian Filppu. Director of Business Intelligence

Kinesis, Lambda and EMR for 16 million events per day

Smart Applications

Joe Emison. Founder & Chief Technology Officer

“Amazon Machine Learning democratizes the process of building predictive models. It's easy and fast to use, and has machine-learning best practices encapsulated in the product, which lets us deliver results significantly faster than in the past.”

June 2, 2016

RavenPack

AWS Summit Madrid

Mapping the World’s Big Data for Financial Applications

Jose Luis Cruz ‒ Operations Manager

jlcruz@ravenpack.com

● What is RavenPack?

● Current Use Cases in the Cloud

● What’s Next?

ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90

• RavenPack delivers big data analytics to financial professionals

• 80% of big data is unstructured

• Only 29% of decisions are based on big data.

RavenPack at a Glance



Top hedge funds and investment banks use RavenPack for trading and risk management

RavenPack processes hundreds of thousands of documents each day

We produce machine-readable analytics for each document in real time (<250 ms)

Archive of 300+ million documents spanning 20+ years

RavenPack at a Glance


Classic Model

• 6 Servers, 19 KVM virtual machines

• Limited Storage - Expensive to Upgrade

• Multiple Points of Failure

Use Case: Realtime Classification

[Diagram. Components: Collectors, RT Feed, Snapshots, Classifier, RDBMS, Files]


Cloud Model using AWS

• CloudFormation to model the Stack

• Unlimited, Distributed Storage

• Easy redundancy, failover and backup

Use Case: Realtime Classification

[Diagram. Components: Collectors, RT Feed, Snapshots, Classifiers; Amazon EC2, AWS CloudFormation, Amazon DynamoDB, Amazon S3, Amazon RDS, Amazon CloudSearch, Amazon Redshift, Amazon Kinesis]


Classic Model

• Same Limited Set of Servers, Same RDBMS

• Can affect Realtime System, Backups

• Full archive, 4-6 Classifiers → 6 weeks!

Use Case: History Classification

[Diagram. Components: RDBMS, Files, Classifiers]


Cloud Model using AWS

• Servers on Demand, Distributed Storage

• Independent of Realtime System

• Full archive, 100 Classifiers → from 6 weeks to 3 days!

Use Case: History Classification
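The speed-up quoted on this slide can be sanity-checked with a little arithmetic. The sketch below assumes the reclassification work divides evenly across classifiers, which is an idealisation the slide does not state; the function names are illustrative only.

```python
def ideal_days(baseline_days, baseline_workers, new_workers):
    """Days needed if the same total work scaled perfectly linearly."""
    return baseline_days * baseline_workers / new_workers


def parallel_efficiency(baseline_days, baseline_workers, new_days, new_workers):
    """Observed speed-up divided by the ideal (linear) speed-up."""
    observed_speedup = baseline_days / new_days
    ideal_speedup = new_workers / baseline_workers
    return observed_speedup / ideal_speedup


SIX_WEEKS = 6 * 7  # days, taken from the slide

# The slide gives a range of 4-6 classifiers for the classic model.
for workers in (4, 6):
    print(
        workers,
        round(ideal_days(SIX_WEEKS, workers, 100), 2),
        round(parallel_efficiency(SIX_WEEKS, workers, 3, 100), 2),
    )
```

Under that assumption, 100 classifiers would ideally need roughly 1.7 to 2.5 days, so the reported 3 days corresponds to a parallel efficiency of somewhere between 0.56 and 0.84.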

[Diagram. Components: Coordinator, Classifiers, two Availability Zones; Amazon EC2, AWS CloudFormation, Amazon DynamoDB, Amazon S3, Amazon RDS, Amazon Redshift]


• Structured BIG DATA available:
  - Consensus and estimates
  - Online purchases
  - Bank and credit card transactions
  - Satellite photographic information

• Can improve current analytics or create new ones

• Challenges:
  - Amount of data available
  - Mapping all those different datasets

• Solution: Kinesis + Redshift + EMR

Future: Incorporating Structured Data

[Diagram. Components: Amazon S3, Amazon Redshift, Amazon EC2, Amazon EMR, Amazon Kinesis]


Download a Custom “Slice” of Analytics Data

• Provide a Web API and Web Service

• Let client specify parameters:
  - Data Set and Time Range
  - Entities and Events
  - Filters

• Leverage Amazon Redshift and S3

• Compression and Multiple Output Formats

Future: Self-Service Data

[Diagram. Components: Amazon S3, Amazon Redshift, Amazon API Gateway, Amazon EC2, AWS Lambda]


• Let Clients upload Proprietary Content to the Amazon Virtual Private Cloud (VPC):
  - Internal documents / research
  - Email, Instant Messaging
  - CRM, bug tracking system
  - Client support call transcriptions
  - ...

• Provision Computing and Storage Resources on a Per Project Basis

• View Private Analytics in Isolation or Alongside Standard RavenPack Analytic Data Sets

• Everything Goes Away when Project Completes

Future: The RavenPack Cloud

[Diagram. Components: Amazon DynamoDB, Amazon RDS, Amazon S3, Amazon Redshift, Amazon EC2, AWS CloudFormation, Amazon CloudSearch]

RavenPack International

Thanks for listening!

ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +44 (0) 782 783 8282

Jose Luis Cruz: jlcruz@ravenpack.com

Shopping Cart

http://amzn.to/BigDataSummit

Shopping cart

Business Metrics

Time to buy

Time to cancel

Number of sales

Sales per country

% buy

Architecture

[Architecture diagram. Components: client, mobile client, API Server, cart events, Amazon Kinesis, Amazon S3 (shown twice), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon QuickSight]

Customer events

{
  "type": "productAdded",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "product": 937293
}

{
  "type": "productRemoved",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "product": 937293
}

{
  "type": "cartBuy",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "productlist": [34, 253]
}

{
  "type": "cartDiscard",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "productlist": [2353, 1355, 1234]
}

Amazon Kinesis Firehose
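The customer events above are the kind of record an API server could push into a Firehose delivery stream. The following is a minimal sketch of that step, not the presenters' actual code: the helper names are hypothetical, and each event is written as one newline-terminated JSON line on the assumption that downstream jobs read the S3 objects line by line. The client is passed in so the logic can be exercised without AWS credentials.

```python
import json


def encode_event(event):
    """Serialize one cart event as a single newline-terminated JSON line."""
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def send_events(firehose, stream_name, events):
    """Send events in batches; return how many records Firehose rejected."""
    failed = 0
    # PutRecordBatch accepts at most 500 records per call.
    for start in range(0, len(events), 500):
        batch = [{"Data": encode_event(e)} for e in events[start:start + 500]]
        response = firehose.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

In a real deployment the client would come from `boto3.client('firehose')` and the stream name would be whatever the delivery stream was called; neither is given in the slides. A non-zero return value means some records need to be retried.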


Architecture

[Architecture diagram. Components: client, mobile client, API Server, cart events, Amazon Kinesis, Amazon S3 (shown twice), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon QuickSight, AWS Data Pipeline]

AWS Data Pipeline

Amazon EMR (Elastic MapReduce)

Pig script

-- Load the day's raw cart events from S3.
DATA = LOAD 's3://shoppingcart-summit/streams/$inputdate/*'
    USING JsonLoader('type:chararray,timestamp:int,customer:int,cart:long,product:chararray, productlist: chararray');

-- Drop records that have no event type.
DATA2 = FILTER DATA BY type is not null;

-- One group per shopping cart.
CARTS = GROUP DATA2 BY cart;

-- Per-cart summary. Note: IsEmpty(BUY) is true when the cart has NO cartBuy event.
CARTDATA = FOREACH CARTS {
    LOGIN = FILTER DATA2 BY type == 'login';
    ADDED = FILTER DATA2 BY type == 'productAdded';
    REMOVED = FILTER DATA2 BY type == 'productRemoved';
    BUY = FILTER DATA2 BY type == 'cartBuy';
    GENERATE MAX(DATA2.customer) AS customer, group AS cart,
        MAX(DATA2.timestamp)-MIN(DATA2.timestamp) AS duration, IsEmpty(BUY) AS buy,
        COUNT_STAR(ADDED) AS added, COUNT_STAR(REMOVED) AS removed,
        MAX(DATA2.timestamp)-MAX(ADDED.timestamp) AS thinking,
        MIN(LOGIN.timestamp) AS timestamp, '\"\"';
};

-- Write the per-cart summary as comma-separated text under the redshift/ prefix.
STORE CARTDATA INTO 's3://shoppingcart-summit/redshift/$inputdate/' USING PigStorage(',');
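To make the per-cart columns easier to follow, here is a rough Python equivalent of the aggregation the Pig script performs. It is a sketch for illustration, not a drop-in replacement: it omits the login timestamp and the trailing literal column, and its `bought` flag is true when a cartBuy event exists, whereas the script's `buy` column stores `IsEmpty(BUY)`, which has the opposite sense.

```python
from collections import defaultdict


def summarize_carts(events):
    """Group events by cart and compute per-cart columns similar to the Pig script."""
    by_cart = defaultdict(list)
    for event in events:
        if event.get("type") is not None:  # mirrors FILTER ... type is not null
            by_cart[event["cart"]].append(event)

    summaries = []
    for cart, items in by_cart.items():
        stamps = [e["timestamp"] for e in items]
        added = [e["timestamp"] for e in items if e["type"] == "productAdded"]
        removed = [e for e in items if e["type"] == "productRemoved"]
        summaries.append({
            "customer": max(e["customer"] for e in items),
            "cart": cart,
            "duration": max(stamps) - min(stamps),
            "bought": any(e["type"] == "cartBuy" for e in items),
            "added": len(added),
            "removed": len(removed),
            # Seconds between the last added product and the last event;
            # None when nothing was added (Pig would yield null here).
            "thinking": max(stamps) - max(added) if added else None,
        })
    return summaries
```

Feeding it a list of dictionaries shaped like the customer events shown earlier yields one summary dictionary per cart.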

Amazon QuickSight

Architecture

[Architecture diagram. Components: client, mobile client, API Server, cart events, Amazon Kinesis, Amazon S3 (shown twice), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon QuickSight, AWS Data Pipeline]

Machine learning and smart applications

Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available

Your data + machine learning = smart applications

Introducing Amazon Machine Learning

Easy to use, managed machine learning service built for developers

Robust, powerful machine learning technology based on Amazon’s internal systems

Create models using your data already stored in the AWS cloud

Deploy models to production in seconds

1. Train model
2. Evaluate and optimize
3. Retrieve predictions

Building smart applications with Amazon ML

- Create a Datasource object pointing to the processed shopping cart data

- Explore and understand your data

- Transform data and train your model


Building smart applications with Amazon ML

- Understand model quality

- Adjust model interpretation

Explore model quality

Fine-tune model interpretation
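For a binary model, adjusting the interpretation usually comes down to choosing the score threshold above which a prediction counts as positive (Amazon ML's default cut-off is 0.5). The sketch below shows how precision and recall move as that threshold changes. The scored data is invented for illustration, and treating a score equal to the threshold as positive is an assumption of this sketch.

```python
def precision_recall(scored, threshold):
    """scored: list of (score, actual_label) pairs, with labels 0 or 1."""
    tp = sum(1 for s, y in scored if s >= threshold and y == 1)
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)
    fn = sum(1 for s, y in scored if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Invented evaluation data: (model score, did the cart end in a purchase?)
SCORED = [(0.95, 1), (0.80, 1), (0.65, 0), (0.55, 1), (0.40, 0), (0.30, 1), (0.10, 0)]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, precision_recall(SCORED, threshold))
```

Raising the threshold trades recall for precision: fewer carts are flagged as likely purchases, but a larger share of the flagged ones really are.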


Building smart applications with Amazon ML

- Batch predictions

- Real-time predictions

Real-time predictions for interactive applications

Your application

Query for predictions with

Amazon ML real-time API

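import boto3

ml = boto3.client('machinelearning')

# Request a real-time prediction for one shopping cart.
prediction = ml.predict(
    MLModelId='ml-dZxbrDXAstA',
    Record={
        'customer': '4634',
        'cart': '13661535770434',
        # … (remaining attributes elided on the slide)
    },
    PredictEndpoint='https://realtime.machinelearning….'
)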
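An application typically wraps this call so that it can pass in a cart summary and get back a label and score. The helper below is a hypothetical sketch: `predict_purchase` and its parameters are not from the slides. It relies on two details of the real-time API as understood here: every `Record` value must be a string, and for binary models the response's `predictedScores` map is keyed by the predicted label.

```python
def predict_purchase(ml_client, model_id, endpoint, cart_row):
    """Call the real-time API and return (label, score) for one cart."""
    # Amazon ML expects every Record value as a string; skip missing values.
    record = {key: str(value) for key, value in cart_row.items() if value is not None}
    response = ml_client.predict(
        MLModelId=model_id, Record=record, PredictEndpoint=endpoint
    )
    prediction = response["Prediction"]
    label = prediction.get("predictedLabel")
    score = prediction.get("predictedScores", {}).get(label)
    return label, score
```

Passing the client in as an argument keeps the helper easy to test with a stub; in production it would be the `boto3.client('machinelearning')` object created above.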

Architecture

[Architecture diagram. Components: client, mobile client, API Server, cart events, Amazon Kinesis, Amazon S3 (shown twice), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon QuickSight, AWS Data Pipeline]

http://amzn.to/BigDataSummit
