Big Data: Best Practices on AWS
Post on 15-Apr-2017
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Javier Ros, Solution Architect
Jun, 2016
2016 Big Data: Best Practices on AWS
Agenda
Big Data challenges
Design Patterns on AWS
RavenPack. Big Data for Financial Applications
Shopping cart
Big Data Challenges
Volume
Velocity
Variety
Simplify Big Data Processing
data answers
Time to Answer (Latency)
Throughput
Cost
ingest / collect
store
process / analyze
consume / visualize
On-Demand Big Data Analytics
Young Huang. Director, Big Data Analytics.
“We were able to save about 90% over the EC2 on-demand cost”
Clickstream Analysis
Suneel Sajnani. Senior VP of Enterprise Technology
Kinesis and Spark to process more than 30TB per day
Event-driven Extract, Transform, Load (ETL)
Brian Filppu. Director of Business Intelligence
Kinesis, Lambda and EMR for 16 million events per day
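As a minimal sketch of the event-driven pattern named above, the following Python shows how a Lambda function might decode a batch of Kinesis records. The handler name, the `transform` step and the `make_test_event` helper are illustrative assumptions, not details from the talk; only the event shape (`Records[].kinesis.data`, base64-encoded) follows the Kinesis-to-Lambda integration.

```python
import base64
import json


def transform(event_body):
    """Hypothetical transform step: keep only the fields we care about."""
    return {"type": event_body.get("type"), "timestamp": event_body.get("timestamp")}


def handler(event, context):
    """Decode each Kinesis record in the batch and apply the transform."""
    results = []
    for record in event["Records"]:
        # Kinesis delivers the payload base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        results.append(transform(json.loads(payload)))
    return results


def make_test_event(bodies):
    """Build a fake Kinesis batch for local testing (assumed helper)."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(b).encode()).decode()}}
            for b in bodies
        ]
    }
```

Calling `handler(make_test_event([...]), None)` locally exercises the decoding logic without any AWS resources.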
Smart Applications
Joe Emison. Founder & Chief Technology Officer
“Amazon Machine Learning democratizes the process of building predictive
models. It's easy and fast to use, and has machine-learning best practices
encapsulated in the product, which lets us deliver results significantly faster than
in the past.”
June 2, 2016
RavenPack
AWS Summit Madrid
Mapping the World’s Big Data for Financial Applications
Jose Luis Cruz ‒ Operations Manager
jlcruz@ravenpack.com
● What is RavenPack?
● Current Use Cases in the Cloud
● What’s Next?
ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
• RavenPack delivers big data analytics to financial professionals
• 80% of big data is unstructured
• Only 29% of decisions are based on big data.
RavenPack at a Glance
Top hedge funds and investment banks use RavenPack for trading and risk management
RavenPack processes hundreds of thousands of documents each day
We produce machine-readable analytics for each document in real time (<250 ms)
Archive of 300+ million documents covering 20+ years
RavenPack at a Glance
Classic Model
• 6 Servers, 19 KVM virtual machines
• Limited Storage - Expensive to Upgrade
• Multiple Points of Failure
Use Case: Realtime Classification
RDBMS
Collectors
RT Feed
Snapshots
Classifier
Files
Cloud Model using AWS
• CloudFormation to model the Stack
• Unlimited, Distributed Storage
• Easy redundancy, failover and backup
Use Case: Realtime Classification
Amazon EC2
AWS CloudFormation
Amazon DynamoDB
Amazon S3
Amazon RDS
Amazon CloudSearch
Amazon Redshift
Amazon Kinesis
RT Feed
Snapshots
Classifiers
Collectors
Classic Model
• Same Limited Set of Servers, Same RDBMS
• Can affect Realtime System, Backups
• Full archive, 4-6 Classifiers → 6 weeks!
Use Case: History Classification
RDBMS
Files
Classifiers
Classifiers
Cloud Model using AWS
• Servers on Demand, Distributed Storage
• Independent of Realtime System
• Full archive, 100 Classifiers → from 6 weeks to 3 days!
Use Case: History Classification
Amazon EC2
AWS CloudFormation
Amazon DynamoDB
Amazon S3
Amazon RDS
Amazon Redshift
Availability Zone
Availability Zone
...
Classifiers
Coordinator
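The history-classification figures quoted above (6 weeks with 4-6 classifiers, 3 days with 100) can be turned into a rough speedup and parallel-efficiency estimate. This is a back-of-the-envelope sketch that assumes individual classifiers are equally fast in both setups, which the slides do not state.

```python
# Figures quoted on the slides.
CLASSIC_DAYS = 6 * 7        # "6 weeks" with 4-6 classifiers
CLOUD_DAYS = 3              # "3 days" with 100 classifiers
CLOUD_CLASSIFIERS = 100

speedup = CLASSIC_DAYS / CLOUD_DAYS


def efficiency(classic_classifiers):
    """Observed speedup divided by the increase in classifier count."""
    scale = CLOUD_CLASSIFIERS / classic_classifiers
    return speedup / scale


# Evaluate both ends of the quoted 4-6 range.
for n in (4, 6):
    print(f"{n} -> {CLOUD_CLASSIFIERS} classifiers: "
          f"{speedup:.0f}x faster, efficiency {efficiency(n):.0%}")
```

Under that assumption the run is 14 times faster, at somewhere between 56% and 84% of ideal linear scaling depending on whether the classic setup used 4 or 6 classifiers.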
• Structured BIG DATA available:
Consensus and estimates
Online purchases
Bank and credit card transactions
Satellite imagery
• Can improve current analytics or create new ones
• Challenges
Amount of data available
Mapping all those different datasets
• Solution: Kinesis + Redshift + EMR
Future: Incorporating Structured Data
Amazon S3
Amazon Redshift
Amazon EC2
Amazon EMR
Amazon Kinesis
Download a Custom “Slice” of Analytics Data
• Provide a Web-API and Web Service
• Let client specify parameters
Data Set and Time Range
Entities and Events
Filters
• Leverage Amazon Redshift and S3
• Compression and Multiple Output Formats
Future: Self-Service Data
Amazon S3
Amazon Redshift
Amazon API Gateway
Amazon EC2
AWS Lambda
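One way the self-service "slice" could be served is by translating the client's parameters into a Redshift `UNLOAD` statement that writes compressed files to S3. The sketch below is only an illustration: the `SliceRequest` type, the `DATASETS` whitelist, and the table and column names (`event_date`, `entity_id`) are all hypothetical, and the S3 prefix and IAM role are assumed to come from trusted server configuration rather than from the client.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical whitelist: dataset names map to table names we control,
# so client input is never pasted into SQL as an identifier.
DATASETS = {"equities": "analytics_equities", "macro": "analytics_macro"}


@dataclass(frozen=True)
class SliceRequest:
    dataset: str
    start: date
    end: date
    entity_ids: tuple  # integer entity identifiers (assumed)


def build_unload_sql(req, s3_prefix, iam_role_arn):
    """Return a Redshift UNLOAD statement for the requested slice."""
    if not req.entity_ids:
        raise ValueError("at least one entity id is required")
    table = DATASETS[req.dataset]  # KeyError on an unknown dataset
    ids = ", ".join(str(int(i)) for i in req.entity_ids)  # integers only
    query = (
        f"SELECT * FROM {table} "
        f"WHERE event_date BETWEEN '{req.start.isoformat()}' "
        f"AND '{req.end.isoformat()}' AND entity_id IN ({ids})"
    )
    # The inner query is itself a quoted string, so single quotes are doubled.
    quoted = query.replace("'", "''")
    return (
        f"UNLOAD ('{quoted}') TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' DELIMITER ',' GZIP"
    )
```

Whitelisting the dataset and coercing entity ids to integers keeps client input out of the SQL text; a production service would likely add further validation and other output formats.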
• Let Clients upload Proprietary Content to the Amazon Virtual Private Cloud (VPC)
Internal documents / research
Email, Instant Messaging
CRM, bug tracking system
Client support call transcriptions
...
• Provision Computing and Storage Resources on a Per Project Basis
• View Private Analytics in Isolation or Alongside Standard RavenPack Analytic DataSets
• Everything Goes Away when Project Completes
Future: The RavenPack Cloud
Amazon DynamoDB
Amazon RDS
Amazon S3
Amazon Redshift
Amazon EC2
AWS CloudFormation
Amazon CloudSearch
RavenPack International
Thanks for listening!
ravenpack.com | info@ravenpack.com | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +44 (0) 782 783 8282
Jose Luis Cruz: jlcruz@ravenpack.com
Shopping Cart
http://amzn.to/BigDataSummit
Shopping cart
Business Metrics
Time to buy
Time to cancel
Number of sales
Sales per country
% buy
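Several of the metrics listed above can be derived from one summary row per cart. The following is a rough sketch; the row shape (`duration` in seconds between a cart's first and last event, and a `bought` flag) is an assumption made for illustration, and sales per country is left out because the sample events carry no country field.

```python
from statistics import mean


def cart_metrics(carts):
    """Compute number of sales, % buy, and mean time to buy or cancel.

    `carts` is a list of dicts with keys `duration` (seconds) and
    `bought` (bool). This shape is assumed, not taken from the talk.
    """
    bought = [c["duration"] for c in carts if c["bought"]]
    cancelled = [c["duration"] for c in carts if not c["bought"]]
    return {
        "number_of_sales": len(bought),
        "pct_buy": 100.0 * len(bought) / len(carts) if carts else 0.0,
        "time_to_buy": mean(bought) if bought else None,
        "time_to_cancel": mean(cancelled) if cancelled else None,
    }
```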
Architecture
client
mobile client
API Server
Cart event
Cart event
Amazon Kinesis
Amazon S3
Amazon S3
Amazon EMR
Amazon Redshift
Amazon Machine Learning
Amazon QuickSight
Customer events
{
  "type": "productAdded",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "product": 937293
}
{
  "type": "productRemoved",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "product": 937293
}
{
  "type": "cartBuy",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "productlist": [34, 253]
}
{
  "type": "cartDiscard",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "productlist": [2353, 1355, 1234]
}
Amazon Kinesis Firehose
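A minimal sketch of how the API server might push one cart event into a Firehose delivery stream with boto3's `put_record`. The function names and the stream name in the usage comment are hypothetical. The producer appends a newline because Firehose concatenates records as-is, and one event per line is what a line-oriented JSON loader downstream would expect.

```python
import json


def encode_event(event):
    """Serialize one cart event as a newline-terminated JSON line."""
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def send_event(firehose_client, stream_name, event):
    """Send one event; `firehose_client` is a boto3 'firehose' client."""
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": encode_event(event)},
    )


# Usage (requires AWS credentials; the stream name is hypothetical):
#   import boto3
#   send_event(boto3.client("firehose"), "shoppingcart-events",
#              {"type": "productAdded", "timestamp": 1462465948,
#               "customer": 5, "cart": 203438, "product": 937293})
```

Passing the client in as an argument keeps the serialization logic testable without network access.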
Architecture
client
mobile client
API Server
Cart event
Cart event
Amazon Kinesis
Amazon S3
Amazon S3
Amazon EMR
Amazon Redshift
Amazon Machine Learning
Amazon QuickSight
AWS Data Pipeline
AWS Data Pipeline
Amazon Elastic MapReduce (EMR)
Pig script
-- Load one day of line-delimited JSON cart events from S3.
DATA = LOAD 's3://shoppingcart-summit/streams/$inputdate/*'
    USING JsonLoader('type:chararray,timestamp:int,customer:int,cart:long,product:chararray, productlist: chararray');
-- Drop records without an event type, then group the rest by cart.
DATA2 = FILTER DATA BY type is not null;
CARTS = GROUP DATA2 BY cart;
-- Emit one summary row per cart.
-- Note: IsEmpty(BUY) is true when the cart has no cartBuy event.
CARTDATA = FOREACH CARTS {
    LOGIN = FILTER DATA2 BY type == 'login';
    ADDED = FILTER DATA2 BY type == 'productAdded';
    REMOVED = FILTER DATA2 BY type == 'productRemoved';
    BUY = FILTER DATA2 BY type == 'cartBuy';
    GENERATE MAX(DATA2.customer) AS customer, group AS cart,
        MAX(DATA2.timestamp)-MIN(DATA2.timestamp) AS duration, IsEmpty(BUY) AS buy,
        COUNT_STAR(ADDED) AS added, COUNT_STAR(REMOVED) AS removed,
        MAX(DATA2.timestamp)-MAX(ADDED.timestamp) AS thinking,
        MIN(LOGIN.timestamp) AS timestamp, '\"\"';
};
-- Write comma-separated rows to the S3 prefix used for the Redshift load.
STORE CARTDATA INTO 's3://shoppingcart-summit/redshift/$inputdate/' USING PigStorage(',');
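To make the per-cart aggregation easier to follow, here is a rough plain-Python equivalent of the Pig script above. It is an illustrative sketch, not the code used in the talk: it omits the login timestamp and the trailing placeholder column, and its `bought` flag is true when a purchase happened, whereas the script's `IsEmpty(BUY)` is true when none did.

```python
from collections import defaultdict


def summarize_carts(events):
    """Group events by cart and derive the per-cart columns."""
    by_cart = defaultdict(list)
    for e in events:
        if e.get("type") is not None:  # mirrors the FILTER on null type
            by_cart[e["cart"]].append(e)

    rows = []
    for cart, evs in by_cart.items():
        times = [e["timestamp"] for e in evs]
        added = [e["timestamp"] for e in evs if e["type"] == "productAdded"]
        rows.append({
            "customer": max(e["customer"] for e in evs),
            "cart": cart,
            "duration": max(times) - min(times),
            "bought": any(e["type"] == "cartBuy" for e in evs),
            "added": len(added),
            "removed": sum(e["type"] == "productRemoved" for e in evs),
            # Time between the last product added and the last event.
            "thinking": max(times) - max(added) if added else None,
        })
    return rows
```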
Amazon QuickSight
Machine learning and smart applications
Machine learning is the technology that
automatically finds patterns in your data and
uses them to make predictions for new data
points as they become available
Your data + machine learning = smart applications
Introducing Amazon Machine Learning
Easy to use, managed machine learning service built for developers
Robust, powerful machine learning technology based on Amazon’s internal systems
Create models using your data already stored in the AWS cloud
Deploy models to production in seconds
1. Train model
2. Evaluate and optimize
3. Retrieve predictions
Building smart applications with Amazon ML
- Create a Datasource object pointing to the shopping cart
processed data
- Explore and understand your data
- Transform data and train your model
Building smart applications with Amazon ML
- Understand model quality
- Adjust model interpretation
Explore model quality
Fine-tune model interpretation
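For a binary model such as "will this cart be bought?", one common way to adjust model interpretation is to move the score cut-off above which a prediction counts as positive. The sketch below uses made-up scores and outcomes, not data from the talk, to show how precision and recall shift with the threshold.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Made-up scores and true outcomes (True = the cart was bought).
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, precision_recall(scores, labels, threshold))
```

On this toy data, raising the threshold from 0.3 to 0.7 lifts precision from 0.75 to 1.0 while recall drops from 1.0 to about 0.67, which is the trade-off being tuned.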
Building smart applications with Amazon ML
- Batch predictions
- Real-time predictions
Real-time predictions for interactive applications
Your application
Query for predictions with
Amazon ML real-time API
import boto3

ml = boto3.client('machinelearning')
prediction = ml.predict(
    MLModelId='ml-dZxbrDXAstA',
    Record={
        'customer': '4634',
        'cart': '13661535770434',
        # … (remaining fields elided on the slide)
    },
    PredictEndpoint='https://realtime.machinelearning….'
)
http://amzn.to/BigDataSummit