Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
TRANSCRIPT
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE
Ryosuke Iwanaga, Solutions Architect, Amazon Web Services Japan
October 2016
Agenda
• Recommendation and DSSTNE
• Data science productivity with AWS
Note: The details shown here are not the actual Amazon case, but a general pattern
Recommendation and DSSTNE
Product Recommendations
What are people who bought items A, B, C … Z most likely to purchase next?
Input and Output
Input: Purchase history for each customer
Output: The probability of buying each product, for each customer
Machine Learning for Recommendation
Lots of algorithms:
• Matrix Factorization
• Logistic Regression
• Naïve Bayes
• etc.
=> Neural Networks
Neural Networks for Product Recommendations
[Diagram: neural network with an Input layer (10K-10M units), a Hidden layer (100-1K units), and an Output layer (10K-10M units)]
This Is A Huge Sparse Data Problem
• Uncompressed sparse data either eats a lot of memory or eats a lot of bandwidth uploading it to the GPU (see the sketch below)
• Naively running networks on uncompressed sparse data leads to lots of multiplications of zero by zero. This wastes memory, power, and time
• Product recommendation networks can have billions of parameters, which cannot fit in a single GPU
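To make the memory argument concrete, here is a minimal sketch in Scala (not DSSTNE's actual internal format) of the standard sparse trick: store only the indices of purchased items instead of a 0/1 vector over the whole catalog, so cost scales with the number of purchases rather than the catalog size.

    // A minimal sketch: each customer's purchase history as sorted item
    // indices instead of a dense 0/1 vector over the entire catalog.
    case class SparseVector(size: Int, indices: Array[Int]) {
      // A dot product with a dense weight column touches only the non-zeros,
      // so the cost is O(purchases) instead of O(catalog size).
      def dot(weights: Array[Float]): Float =
        indices.foldLeft(0.0f)((acc, i) => acc + weights(i))
    }

    object SparseDemo extends App {
      val catalogSize = 10000000 // e.g. a 10M-product catalog
      val purchases   = SparseVector(catalogSize, Array(42, 1337, 999999))
      // Dense storage: 10M floats (~40 MB) per customer.
      // Sparse storage: 3 ints (~12 bytes) per customer.
      println(s"non-zeros: ${purchases.indices.length} of ${purchases.size}")
    }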
Framework Requirements (2014)
• Efficient support for large input and output layers
• Efficient handling of sparse data (i.e. don't store zeros)
• Automagic multi-GPU support for large networks and scaling
• Avoids multiplying zero and/or by zero
• 24-hour or less training and recommendations turnaround
• Human-readable descriptions of networks
DSSTNE: Deep Sparse Scalable Tensor Network Engine*
• A neural network framework released into OSS by Amazon
• Optimized for large sparse data problems and for fully connected layers
• Extremely efficient model-parallel multi-GPU support
• 100% deterministic execution
• Full SM 3.x, 5.x, and 6.x support (Kepler or better GPUs)
• Distributed training support OOTB (~20 lines of MPI calls)
*Pronounced "Destiny"
Describes Neural Networks as JSON Objects

    {
        "Version" : 0.7,
        "Name" : "AE",
        "Kind" : "FeedForward",
        "SparsenessPenalty" : {
            "p" : 0.5,
            "beta" : 2.0
        },
        "ShuffleIndices" : false,
        "Denoising" : {
            "p" : 0.2
        },
        "ScaledMarginalCrossEntropy" : {
            "oneTarget" : 1.0,
            "zeroTarget" : 0.0,
            "oneScale" : 1.0,
            "zeroScale" : 1.0
        },
        "Layers" : [
            { "Name" : "Input", "Kind" : "Input", "N" : "auto", "DataSet" : "input", "Sparse" : true },
            { "Name" : "Hidden", "Kind" : "Hidden", "Type" : "FullyConnected", "N" : 128, "Activation" : "Sigmoid", "Sparse" : true },
            { "Name" : "Output", "Kind" : "Output", "Type" : "FullyConnected", "DataSet" : "output", "N" : "auto", "Activation" : "Sigmoid", "Sparse" : true }
        ],
        "ErrorFunction" : "ScaledMarginalCrossEntropy"
    }
Summary for DSSTNE
• Very efficient performance for sparse fully-connected NNs
• Multiple GPUs via model parallelism and data parallelism
• Declare NNs in a human-readable format (JSON definition)
• 100% deterministic execution
Data science productivity with AWS
Productivity
Agile iteration is the most important factor for productivity: design => train => predict => evaluate => design => …
Training: GPU (DSSTNE and others). Pre/post-processing: CPU.
How do we unify these different workloads? Data scientists don't want to use too many tools.
What are Containers?
• OS virtualization
• Process isolation
• Images
• Automation
[Diagram: App1 and App2, each with their own Bins/Libs, running directly on the Server, with no per-app Guest OS as in VMs]
Deep Learning meets Docker (Containers)
There are a lot of Deep Learning frameworks: DSSTNE, Caffe, Theano, TensorFlow, etc.
To compare each framework using the same input and output:
• Containerize each framework
• Just swap the container image and configuration (see the sketch below)
• No more worrying about setting up machines!
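As an illustration of the "just swap the image" point, a tiny Scala sketch; the image names and commands are hypothetical:

    // An illustrative sketch (names are hypothetical): with every framework
    // packaged as a Docker image, switching frameworks means switching a
    // string, not reinstalling a machine.
    case class DlJob(image: String, command: Seq[String])

    object SwapFrameworks extends App {
      val jobs = Map(
        "dsstne"     -> DlJob("mycompany/dsstne:latest",     Seq("train", "-c", "config.json")),
        "tensorflow" -> DlJob("mycompany/tensorflow:latest", Seq("python", "train.py")),
        "caffe"      -> DlJob("mycompany/caffe:latest",      Seq("caffe", "train", "--solver=solver.prototxt"))
      )
      // Same input/output contract, a different image per experiment.
      val chosen = jobs("dsstne")
      println(s"docker run ${chosen.image} ${chosen.command.mkString(" ")}")
    }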
Spark moves at interactive speed
[Diagram: a Spark job's DAG: RDDs A-F flow through map, filter, groupBy, and join operators, split into Stages 1-3, with cached partitions reused across stages]
• Massively parallel
• Uses DAGs instead of map-reduce for execution
• Minimizes I/O by storing data in DataFrames in memory
• Partitioning-aware to avoid network-intensive shuffle (a small pipeline sketch follows)
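A small Spark (Scala) sketch of the kind of pipeline such a DAG describes, using the same operators named on the slide (filter, join, groupBy) plus caching; the bucket paths and column names are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.collect_list

    object PreprocessPurchases {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("preprocess").getOrCreate()
        import spark.implicits._

        // Hypothetical datasets: (customerId, itemId, ts) and (itemId, category)
        val purchases = spark.read.parquet("s3://my-bucket/purchases/")
        val catalog   = spark.read.parquet("s3://my-bucket/catalog/")

        val recent      = purchases.filter($"ts" > "2016-01-01")   // filter
        val joined      = recent.join(catalog, "itemId")           // join
        val perCustomer = joined
          .groupBy($"customerId")                                  // groupBy
          .agg(collect_list($"itemId").as("items"))
          .cache()                                                 // cached partitions reused downstream

        perCustomer.write.parquet("s3://my-bucket/training-input/")
        spark.stop()
      }
    }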
Apache Zeppelin notebook to develop queries
Architecture
Control CPU cluster and GPU cluster
Both CPU and GPU jobs are submitted via Spark driver
CPU jobs: Normal Spark tasks running on Amazon EMR
GPU jobs: Spark submits jobs to Amazon ECS (not only DSSTNE but also other DL frameworks, via Docker)
Amazon EMR
Why EMR?
Automation | Decoupled | Elastic | Integration | Low-cost | Current
Why EMR? Automation
• EC2 provisioning
• Cluster setup
• Hadoop configuration
• Installing applications
• Job submission
• Monitoring and failure handling
Why EMR? Decoupled Architecture
• Separate compute and storage
• Resize and shut down with no data loss
• Point multiple clusters at the same data on Amazon S3
• Easily evolve infrastructure as technology evolves
• HDFS for iterative and disk-I/O-intensive workloads
• Save with Spot and Reserved Instances
Why EMR? Decouple Storage and Compute
[Diagram: Amazon S3 as the shared storage layer, fed by Amazon Kinesis (Streams, Firehose), with a shared Hive external metastore (e.g. Amazon RDS) and workload-specific clusters (different sizes, different versions) all pointing at the same data:
• Persistent cluster for interactive queries (Spark-SQL | Presto | Impala)
• Transient cluster for batch jobs (X hours nightly), adding/removing nodes
• Hadoop jobs and ETL jobs]
    CREATE EXTERNAL TABLE t_name (..)
    ...
    LOCATION 's3://bucketname/path-to-file/';
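Because the metastore is shared, the same S3-backed table is visible from any cluster. A minimal Spark sketch of querying it on EMR, assuming the table t_name from the snippet above:

    import org.apache.spark.sql.SparkSession

    object QueryExternalTable extends App {
      // enableHiveSupport() lets Spark use the shared Hive metastore on EMR,
      // so transient and persistent clusters see the same S3-backed tables.
      val spark = SparkSession.builder
        .appName("query")
        .enableHiveSupport()
        .getOrCreate()

      spark.sql("SELECT COUNT(*) FROM t_name").show()
      spark.stop()
    }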
EMR 5.0 - Applications
Amazon ECS
Amazon EC2 Container Service (ECS)
• Container Management at Any Scale
• Flexible Container Placement
• Integration with the AWS Platform
Components of Amazon ECS
Task: Actual containers running on instances
Task Definition: Definition of the containers and environment for a task (see the sketch below)
Cluster: Fleet of EC2 instances on which tasks run
Manager: Manages cluster resources and the state of tasks
Scheduler: Places tasks considering cluster status
Agent: Coordinates between EC2 instances and the Manager
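A hedged sketch of creating a Task Definition with the AWS SDK for Java, called from Scala as in the architecture above; the container name, image, and resource sizes are placeholders, not the talk's actual values:

    import com.amazonaws.services.ecs.AmazonECSClientBuilder
    import com.amazonaws.services.ecs.model.{ContainerDefinition, RegisterTaskDefinitionRequest}

    object RegisterDsstneTask extends App {
      val ecs = AmazonECSClientBuilder.defaultClient()

      // One Task Definition per Deep Learning framework; the image name
      // and resource sizes below are hypothetical.
      val container = new ContainerDefinition()
        .withName("dsstne-train")
        .withImage("mycompany/dsstne:latest")
        .withMemory(60000) // MiB reserved on the GPU instance
        .withCpu(4096)

      ecs.registerTaskDefinition(
        new RegisterTaskDefinitionRequest()
          .withFamily("dsstne-train")
          .withContainerDefinitions(container))
    }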
How Amazon ECS runs a Task
[Diagram: a Task Definition is handed to the Scheduler, which consults the Manager for Cluster state; the Agent on a chosen EC2 instance starts the Task]
Integration with Spark and ECS
Install AWS SDK for Java on the EMR cluster
Create Task Definition for each Deep Learning framework
Call the RunTask API; the ECS Scheduler will try to find enough space to run it (see the sketch below)
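A minimal sketch of that RunTask call from the Spark driver, again via the AWS SDK for Java; the cluster name, task definition family, and command are placeholders:

    import com.amazonaws.services.ecs.AmazonECSClientBuilder
    import com.amazonaws.services.ecs.model.{ContainerOverride, RunTaskRequest, TaskOverride}
    import scala.collection.JavaConverters._

    object SubmitGpuJob extends App {
      val ecs = AmazonECSClientBuilder.defaultClient()

      // Override the container's command per job, e.g. pointing DSSTNE at
      // this run's config; names and flags are placeholders.
      val overrides = new TaskOverride().withContainerOverrides(
        new ContainerOverride()
          .withName("dsstne-train")
          .withCommand(List("train", "-c", "config.json", "-b", "256", "-e", "10").asJava))

      // The ECS Scheduler picks a GPU instance with enough free resources.
      val result = ecs.runTask(
        new RunTaskRequest()
          .withCluster("gpu-cluster")
          .withTaskDefinition("dsstne-train")
          .withCount(1)
          .withOverrides(overrides))

      result.getTasks.asScala.foreach(t => println(s"Started task: ${t.getTaskArn}"))
    }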
Training: model parallel (one large model split across GPUs)
Prediction: data parallel (the same model replicated, with each GPU scoring a different batch of customers)
Why AWS?
Scalability
Fully-managed services
GPU instances
Summary
Amazon Personalization runs on AWS
Spark and Zeppelin provide a single interface for data scientists
DSSTNE helps run deep learning on huge, sparse neural networks
Use Amazon EMR for CPU and Amazon ECS for GPU. You can do it too!
Thank you!