Big Data: Best Practices on AWS
![Page 1: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/1.jpg)
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Javier Ros, Solution Architect
June 2016
Big Data: Best Practices on AWS
![Page 2: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/2.jpg)
Agenda
Big Data challenges
Design Patterns on AWS
RavenPack. Big Data for Financial Applications
Shopping cart
![Page 3: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/3.jpg)
Big Data Challenges
Volume
Velocity
Variety
![Page 4: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/4.jpg)
Simplify Big Data Processing
data answers
Time to Answer (Latency)
Throughput
Cost
ingest / collect
store
process / analyze
consume / visualize
![Page 5: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/5.jpg)
On-Demand Big Data Analytics
Young Huang. Director, Big Data Analytics.
“We were able to save about 90% over the EC2 On-Demand cost”
![Page 6: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/6.jpg)
Clickstream Analysis
Suneel Sajnani. Senior VP of Enterprise Technology
Kinesis and Spark to process more than 30TB per day
![Page 7: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/7.jpg)
Event-driven Extract, Transform, Load (ETL)
Brian Filppu. Director of Business Intelligence
Kinesis, Lambda and EMR for 16 million events per day
![Page 8: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/8.jpg)
Smart Applications
Joe Emison. Founder & Chief Technology Officer
“Amazon Machine Learning democratizes the process of building predictive
models. It's easy and fast to use, and has machine-learning best practices
encapsulated in the product, which lets us deliver results significantly faster than
in the past.”
![Page 9: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/9.jpg)
June 2, 2016
RavenPack, AWS Summit Madrid
Mapping the World’s Big Data for Financial Applications
Jose Luis Cruz ‒ Operations [email protected]
![Page 10: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/10.jpg)
● What is RavenPack?
● Current Use Cases in the Cloud
● What’s Next?
![Page 11: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/11.jpg)
ravenpack.com | [email protected] | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +34 952 90 73 90
RavenPack at a Glance
• RavenPack delivers big data analytics to financial professionals
• 80% of big data is unstructured
• Only 29% of decisions are based on big data
![Page 12: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/12.jpg)
RavenPack at a Glance
• Top hedge funds and investment banks use RavenPack for trading and risk management
• RavenPack processes hundreds of thousands of documents each day
• We produce machine-readable analytics for each document in real time (under 250 ms)
• Archive of more than 300 million documents spanning more than 20 years
![Page 13: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/13.jpg)
Use Case: Realtime Classification
Classic Model
• 6 Servers, 19 KVM virtual machines
• Limited Storage - Expensive to Upgrade
• Multiple Points of Failure
Diagram: Collectors, RT Feed, Snapshots, Classifier, RDBMS, Files
![Page 14: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/14.jpg)
Use Case: Realtime Classification
Cloud Model using AWS
• CloudFormation to model the Stack
• Unlimited, Distributed Storage
• Easy redundancy, failover and backup
Diagram: Collectors, RT Feed, Snapshots, Classifiers, AWS CloudFormation, Amazon EC2, Amazon DynamoDB, Amazon S3, Amazon RDS, Amazon CloudSearch, Amazon Redshift, Amazon Kinesis
![Page 15: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/15.jpg)
Use Case: History Classification
Classic Model
• Same Limited Set of Servers, Same RDBMS
• Can affect Realtime System, Backups
• Full archive, 4-6 Classifiers → 6 weeks!
Diagram: RDBMS, Files, Classifiers
![Page 16: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/16.jpg)
Use Case: History Classification
Cloud Model using AWS
• Servers on Demand, Distributed Storage
• Independent of Realtime System
• Full archive, 100 Classifiers → from 6 weeks to 3 days!
Diagram: Coordinator, Classifiers, two Availability Zones, AWS CloudFormation, Amazon EC2, Amazon DynamoDB, Amazon S3, Amazon RDS, Amazon Redshift
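The figures quoted on the two History Classification slides imply roughly the following speedup. This is only a back-of-the-envelope check, assuming a week of 7 days and evaluating the classic model's "4-6 Classifiers" at both ends of that range; the variable names are illustrative and not from the talk.

```python
# Figures quoted on the slides.
classic_days = 6 * 7           # "6 weeks" with the classic model
cloud_days = 3                 # "3 days" with the cloud model
classic_classifiers = (4, 6)   # "4-6 Classifiers"
cloud_classifiers = 100        # "100 Classifiers"

# How much faster the full-archive run became.
speedup = classic_days / cloud_days

# How many times more classifiers were used, at both ends of the classic range.
scale_out = [cloud_classifiers / n for n in classic_classifiers]

# Fraction of ideal (linear) scaling that the speedup represents.
efficiency = [speedup / s for s in scale_out]

print(f"speedup: {speedup:.0f}x")
for n, s, e in zip(classic_classifiers, scale_out, efficiency):
    print(f"from {n} classifiers: {s:.1f}x more workers, {e:.0%} of linear scaling")
```

Under these assumptions the run is 14 times faster with roughly 17 to 25 times as many classifiers, so the scaling is sublinear but still substantial.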
![Page 17: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/17.jpg)
Future: Incorporating Structured Data
• Structured big data available:
    Consensus and estimates
    Online purchases
    Bank and credit card transactions
    Satellite imagery
• Can improve current analytics or create new ones
• Challenges:
    Amount of data available
    Mapping all those different datasets
• Solution: Kinesis + Redshift + EMR
Diagram: Amazon S3, Amazon Redshift, Amazon EC2, Amazon EMR, Amazon Kinesis
![Page 18: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/18.jpg)
Future: Self-Service Data
Download a Custom “Slice” of Analytics Data
• Provide a Web API and Web Service
• Let clients specify parameters:
    Data Set and Time Range
    Entities and Events
    Filters
• Leverage Amazon Redshift and S3
• Compression and Multiple Output Formats
Diagram: Amazon S3, Amazon Redshift, Amazon API Gateway, Amazon EC2, AWS Lambda
![Page 19: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/19.jpg)
Future: The RavenPack Cloud
• Let clients upload proprietary content to the Amazon Virtual Private Cloud (VPC):
    Internal documents / research
    Email, Instant Messaging
    CRM, bug tracking system
    Client support call transcriptions
    ...
• Provision computing and storage resources on a per-project basis
• View private analytics in isolation or alongside standard RavenPack analytic datasets
• Everything goes away when the project completes
Diagram: AWS CloudFormation, Amazon EC2, Amazon DynamoDB, Amazon RDS, Amazon S3, Amazon Redshift, Amazon CloudSearch
![Page 20: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/20.jpg)
RavenPack International
Thanks for listening!
ravenpack.com | [email protected] | AMERICAS Tel: (646) 277-7339 | EMEA-APAC Tel: +44 (0) 782 783 8282
Jose Luis Cruz: [email protected]
![Page 21: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/21.jpg)
Shopping Cart
http://amzn.to/BigDataSummit
![Page 22: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/22.jpg)
Shopping cart
![Page 23: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/23.jpg)
Business Metrics
Time to buy
Time to cancel
Number of sales
Sales per country
% of carts bought
![Page 24: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/24.jpg)
Architecture
Diagram: client, mobile client, API Server, cart events, Amazon Kinesis, Amazon S3 (shown twice), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon QuickSight
![Page 25: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/25.jpg)
Customer events
{
  "type": "productAdded",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "product": 937293
}
{
  "type": "productRemoved",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "product": 937293
}
{
  "type": "cartBuy",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "productlist": [34, 253]
}
{
  "type": "cartDiscard",
  "timestamp": 1462465948,
  "customer": 5,
  "cart": 203438,
  "productlist": [2353, 1355, 1234]
}
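One way an API server might produce events of this shape and hand them to a delivery stream is sketched below. It is a minimal sketch, not the implementation from the talk: the helper names (`make_cart_event`, `encode_record`, `send_event`) and the stream name in the usage note are hypothetical, and the only AWS call used is the boto3 Firehose client's `put_record`.

```python
import json
import time


def make_cart_event(event_type, customer, cart, product=None,
                    productlist=None, timestamp=None):
    """Build a cart event dict shaped like the examples above."""
    event = {
        "type": event_type,
        # Unix time in seconds, as in the sample events.
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "customer": customer,
        "cart": cart,
    }
    if product is not None:
        event["product"] = product
    if productlist is not None:
        event["productlist"] = list(productlist)
    return event


def encode_record(event):
    """Serialize one event as a newline-terminated JSON line (bytes)."""
    return (json.dumps(event) + "\n").encode("utf-8")


def send_event(firehose_client, stream_name, event):
    """Send one event through a boto3 Firehose client."""
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": encode_record(event)},
    )
```

With AWS credentials configured, usage could look like `send_event(boto3.client("firehose"), "cart-events", make_cart_event("productAdded", 5, 203438, product=937293))`, where `"cart-events"` is an assumed stream name. Terminating each record with a newline keeps the objects delivered to S3 readable one event per line.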
![Page 26: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/26.jpg)
Amazon Kinesis Firehose
![Page 27: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/27.jpg)
Amazon Kinesis Firehose
![Page 28: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/28.jpg)
Architecture
Diagram: client, mobile client, API Server, cart events, Amazon Kinesis, Amazon S3 (shown twice), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon QuickSight, AWS Data Pipeline
![Page 29: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/29.jpg)
AWS Data Pipeline
![Page 30: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/30.jpg)
Amazon Elastic MapReduce (EMR)
![Page 31: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/31.jpg)
Pig script
-- Load the raw cart events for one day ($inputdate is a script parameter).
DATA = LOAD 's3://shoppingcart-summit/streams/$inputdate/*' USING JsonLoader('type:chararray,timestamp:int,customer:int,cart:long,product:chararray,productlist:chararray');
-- Drop records that have no event type.
DATA2 = FILTER DATA BY type is not null;
-- One group per shopping cart.
CARTS = GROUP DATA2 BY cart;
CARTDATA = FOREACH CARTS {
    LOGIN = FILTER DATA2 BY type == 'login';
    ADDED = FILTER DATA2 BY type == 'productAdded';
    REMOVED = FILTER DATA2 BY type == 'productRemoved';
    BUY = FILTER DATA2 BY type == 'cartBuy';
    -- Note: IsEmpty(BUY) is true when the cart has no cartBuy event.
    GENERATE MAX(DATA2.customer) AS customer, group AS cart,
        MAX(DATA2.timestamp)-MIN(DATA2.timestamp) AS duration, IsEmpty(BUY) AS buy,
        COUNT_STAR(ADDED) AS added, COUNT_STAR(REMOVED) AS removed,
        MAX(DATA2.timestamp)-MAX(ADDED.timestamp) AS thinking,
        MIN(LOGIN.timestamp) AS timestamp, '\"\"';
};
-- Write one comma-separated row per cart under the redshift/ prefix.
STORE CARTDATA INTO 's3://shoppingcart-summit/redshift/$inputdate/' USING PigStorage(',');
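For readers less familiar with Pig, the per-cart aggregation can be approximated in plain Python as below. This is an illustrative sketch rather than a translation used in the talk: the function name `summarize_carts` is hypothetical, it leaves out the login timestamp and the trailing constant column, and its `bought` field is true when a `cartBuy` event exists (the opposite sense of `IsEmpty(BUY)`).

```python
from collections import defaultdict


def summarize_carts(events):
    """Group events by cart and compute per-cart fields similar to the Pig script."""
    by_cart = defaultdict(list)
    for event in events:
        # Mirrors: FILTER DATA BY type is not null
        if event.get("type") is not None:
            by_cart[event["cart"]].append(event)

    summaries = []
    for cart, cart_events in by_cart.items():
        stamps = [e["timestamp"] for e in cart_events]
        added = [e["timestamp"] for e in cart_events if e["type"] == "productAdded"]
        removed = [e for e in cart_events if e["type"] == "productRemoved"]
        summaries.append({
            "customer": max(e["customer"] for e in cart_events),
            "cart": cart,
            # Seconds between the first and last event of the cart.
            "duration": max(stamps) - min(stamps),
            "bought": any(e["type"] == "cartBuy" for e in cart_events),
            "added": len(added),
            "removed": len(removed),
            # Seconds between the last product added and the last event.
            "thinking": max(stamps) - max(added) if added else None,
        })
    return summaries
```

The `duration` and `thinking` fields correspond to the "time to buy" style business metrics listed earlier, computed per cart.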
![Page 32: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/32.jpg)
Amazon QuickSight
![Page 33: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/33.jpg)
Architecture
Diagram: client, mobile client, API Server, cart events, Amazon Kinesis, Amazon S3 (shown twice), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon QuickSight, AWS Data Pipeline
![Page 34: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/34.jpg)
Machine learning and smart applications
Machine learning is the technology that automatically finds patterns in your data and uses them to make predictions for new data points as they become available.
Your data + machine learning = smart applications
![Page 35: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/35.jpg)
Introducing Amazon Machine Learning
Easy to use, managed machine learning service built for developers
Robust, powerful machine learning technology based on Amazon’s internal systems
Create models using your data already stored in the AWS cloud
Deploy models to production in seconds
![Page 36: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/36.jpg)
Building smart applications with Amazon ML
1. Train model  2. Evaluate and optimize  3. Retrieve predictions
- Create a Datasource object pointing to the processed shopping cart data
- Explore and understand your data
- Transform data and train your model
![Page 37: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/37.jpg)
Building smart applications with Amazon ML
1. Train model  2. Evaluate and optimize  3. Retrieve predictions
- Understand model quality
- Adjust model interpretation
![Page 38: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/38.jpg)
Explore model quality
![Page 39: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/39.jpg)
Fine-tune model interpretation
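For a binary model, fine-tuning the interpretation typically comes down to choosing the score threshold above which a prediction counts as positive. The sketch below illustrates that trade-off with made-up scores and labels; none of the numbers come from the talk, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative model scores and true outcomes (True = the cart was bought).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [True, True, False, True, False, False]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold trades recall for precision: fewer carts are flagged as likely purchases, but a larger share of the flagged ones really are.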
![Page 40: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/40.jpg)
Building smart applications with Amazon ML
1. Train model  2. Evaluate and optimize  3. Retrieve predictions
- Batch predictions
- Real-time predictions
![Page 41: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/41.jpg)
Real-time predictions for interactive applications
Your application queries for predictions with the Amazon ML real-time API:
import boto3

ml = boto3.client('machinelearning')
prediction = ml.predict(
    MLModelId='ml-dZxbrDXAstA',
    Record={
        'customer': '4634',
        'cart': '13661535770434',
        # … (remaining attributes elided on the slide)
    },
    # The endpoint URL is truncated on the slide.
    PredictEndpoint='https://realtime.machinelearning….'
)
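The application then has to interpret the response. The sketch below assumes the response shape documented for binary models in the Amazon ML `Predict` API (a `Prediction` dict with `predictedLabel` and a `predictedScores` map keyed by that label); the helper name `interpret_prediction` and the sample response are illustrative only.

```python
def interpret_prediction(response):
    """Return (is_positive, score) from a binary-model Predict response."""
    prediction = response["Prediction"]
    label = prediction["predictedLabel"]        # "0" or "1" for binary models
    score = prediction["predictedScores"][label]
    return label == "1", score


# Hand-written sample response, not real output from the talk's model.
sample = {
    "Prediction": {
        "predictedLabel": "1",
        "predictedScores": {"1": 0.87},
        "details": {"PredictiveModelType": "BINARY", "Algorithm": "SGD"},
    }
}

likely_to_buy, score = interpret_prediction(sample)
print(likely_to_buy, score)
```

Keeping this parsing in one small function makes it easier to adapt if the model type (and therefore the response fields) changes.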
![Page 42: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/42.jpg)
Architecture
Diagram: client, mobile client, API Server, cart events, Amazon Kinesis, Amazon S3 (shown twice), Amazon EMR, Amazon Redshift, Amazon Machine Learning, Amazon QuickSight, AWS Data Pipeline
http://amzn.to/BigDataSummit
![Page 43: Big Data: Mejores prácticas en AWS](https://reader034.vdocuments.net/reader034/viewer/2022042619/58f136251a28ab352c8b456b/html5/thumbnails/43.jpg)