Standalone Spark Deployment for Stability and Performance
Totango
❖ Leading Customer Success Platform
❖ Helps companies retain and grow their customer base
❖ Advanced actionable analytics for subscription and recurring revenue
❖ Founded @ 2010
❖ Infrastructure on AWS cloud
❖ Spark for batch processing
❖ ElasticSearch for serving layer
About Me
Romi Kuntsman
Senior Big Data Engineer @ Totango
Working with Apache Spark since v1.0
Working with AWS Cloud since 2008
Spark on AWS - first attempts
❖ We tried Amazon EMR (Elastic MapReduce) to install Spark on YARN
➢ Performance hit per application (starts a Spark instance for each)
➢ Performance hit per server (running services we don't use, like HDFS)
➢ Slow and unstable cluster resizing (often got stuck and had to be recreated)
❖ We tried the spark-ec2 script to install Spark Standalone on AWS EC2 machines
➢ Serial (not parallel) initialization of multiple servers - slow!
➢ Scripts unmaintained since Spark became available on EMR (see above)
➢ Doesn't integrate with our existing systems
Spark on AWS - road to success
❖ We decided to write our own scripts to integrate and control everything
❖ Understood all Spark components and configuration settings
❖ Deployment based on Chef, like we do on all servers
❖ Integrated monitoring and logging, like we have in all our systems
❖ Full server utilization - running exactly what we need and nothing more
❖ Cluster hanging or crashing no longer happens
❖ Seamless cluster resize without hurting any existing jobs
❖ Able to upgrade to any version of Spark (not dependent on a third party)
What we'll discuss
❖Separation of Spark Components
❖Centralized Managed Logging
❖Monitoring Cluster Utilization
❖Auto Scaling Groups
❖Termination Protection
❖Upstart Mechanism
❖NewRelic Integration
❖Chef-based Instantiation
Data w/ Romi
Ops w/ Alon
Separation of Components
❖Spark Master Server (single)
➢Master Process - accepts requests to start applications
➢History Process - serves history data of completed applications
❖Spark Slave Server (multiple)
➢Worker Process - handles workload of applications on server
➢External Shuffle Service - handles data exchange between workers
➢Executor Process (one per core - for running apps) - runs actual code
Configuration - Deploy Spread Out
❖spark.deploy.spreadOut (SPARK_MASTER_OPTS)
➢true (default) = spread each application's cores across all workers
➢false = fill up all cores of one worker before moving on to the next
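As a minimal sketch, the setting can be passed to the master as a JVM system property in conf/spark-env.sh. The value false is used here only for illustration; the slides do not say which value was chosen.

```shell
# conf/spark-env.sh on the master (sketch; the chosen value is an assumption)
# false = pack each application onto as few workers as possible
SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"
export SPARK_MASTER_OPTS
```

Packing applications onto fewer workers can leave other workers idle, which may make them easier to remove when the cluster is scaled down.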
Configuration - Cleanup
❖spark.worker.cleanup.* (SPARK_WORKER_OPTS)
➢.enabled = true (turn on mechanism to clean up app folders)
➢.interval = 1800 (run every 1800 seconds, or 30 minutes)
➢.appDataTtl = 1800 (remove finished applications after 30 minutes)
❖We have 100s of applications per day, each with its own jars and logs
❖Rapid cleanup is essential to avoid filling up disk space
❖We collect the logs before cleanup - details in following slides ;-)
❖Only cleans up files of completed applications
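The three cleanup settings above could be wired into conf/spark-env.sh on each worker roughly as follows. This is a sketch that reuses the values from the slide; the file location is the standard Spark one, not something the slides state.

```shell
# conf/spark-env.sh on each worker (sketch using the values from the slide)
SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
# Run the cleanup every 1800 seconds (30 minutes)
SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.interval=1800"
# Remove work directories of applications that finished more than 1800 seconds ago
SPARK_WORKER_OPTS="$SPARK_WORKER_OPTS -Dspark.worker.cleanup.appDataTtl=1800"
export SPARK_WORKER_OPTS
```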
External Shuffle Service
❖Preserves shuffle files written by executors
❖Serves shuffle files to other executors that want to fetch them
❖If (when) one executor crashes (OOM etc.), others can still access its shuffle files
❖We run the shuffle service itself in a separate process from the executor
❖To enable: spark.shuffle.service.enabled=true
❖Config: spark.shuffle.io.* (see documentation)
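One way to switch the service on is through spark-defaults.conf, using the property name spark.shuffle.service.enabled. The sketch below assumes the same file is read by both the worker daemons and the submitted applications, and the ./conf fallback directory is an assumption for illustration.

```shell
# Sketch: append the shuffle-service switch to spark-defaults.conf.
# SPARK_CONF_DIR falling back to ./conf is an assumption for illustration.
conf_dir="${SPARK_CONF_DIR:-./conf}"
mkdir -p "$conf_dir"
cat >> "$conf_dir/spark-defaults.conf" <<'EOF'
spark.shuffle.service.enabled  true
EOF
```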
Logging - components
❖Master Log (/logs/spark-runner-org.apache.spark.deploy.master.Master-*)
➢Application registration, worker coordination
❖History Log (/logs/spark-runner-org.apache.spark.deploy.history.HistoryServer-*)
➢Access to history, errors reading (e.g. I/O from S3, not found)
❖Worker Log (/logs/spark-runner-org.apache.spark.deploy.worker.Worker-*)
➢Executor management (launch, kill, ACLs)
❖Shuffle Log (/logs/org.apache.spark.deploy.ExternalShuffleService-*)
➢External Executor Registrations
Logging - applications
❖Application Logs (/mnt/spark-work/app-12345/execid/stderr)
➢All output from executor process, including your own code
❖Using LogStash to gather logs from all applications together
input {
  file {
    path => "/mnt/spark-work/app-*/*/std*"
    start_position => beginning
  }
}
filter {
  grok {
    match => [ "path", "/mnt/spark-work/%{NOTSPACE:application}/.+/%{NOTSPACE:logtype}" ]
  }
}
output {
  file {
    path => "/logs/applications.log"
    message_format => "%{application} %{logtype} %{message}"
  }
}
Monitoring Cluster Utilization
❖Spark reports metrics (Codahale) through Graphite
➢Master metrics - running applications and their status
➢Worker metrics - used cores, free cores
➢JVM metrics - memory allocation, GC
❖We use Anodot to view and track metric trends and anomalies
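Spark's Graphite reporting is configured in conf/metrics.properties. The sketch below writes such a file. The property keys follow Spark's metrics.properties.template, while the host name, port, reporting period, and prefix are placeholders rather than values from the talk.

```shell
# Sketch: write a metrics.properties that sends Spark metrics to Graphite.
# graphite.example.com, port 2003, period 10s and the "spark" prefix are placeholders.
conf_dir="${SPARK_CONF_DIR:-./conf}"
mkdir -p "$conf_dir"
cat > "$conf_dir/metrics.properties" <<'EOF'
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=spark
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
EOF
```

The two JvmSource lines enable the JVM metrics (memory, GC) for the master and worker daemons.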
Auto Scaling Group Components
❖Auto Scaling Group
➢Scale your group up or down flexibly
➢Supports health checks and load balancing
❖Launch Configuration
➢Template used by the ASG to launch instances
➢User Data script for post-launch configuration
❖User Data
➢ Install prerequisites and fetch instance info
➢ Install and start Chef client
➢Sanity checks throughout
[Diagram: Launch Configuration, User Data, Auto Scaling Group, EC2 Instances]
Auto Scaling Group resizing in AWS
❖ Scheduled
➢Set the desired size according to a specified schedule
➢Good for scenarios with predictable, cyclic workloads.
❖Alert-Based
➢Set specific alerts that trigger a cluster action
➢Alerts can monitor instance health properties (resource usage)
❖Remote-triggered
➢Using the AWS API/CLI, resize the cluster however you want
Resizing the ASG with Jenkins
❖We use schedule-based Jenkins jobs that utilize the AWS CLI
➢Each job sets the desired Spark cluster size
➢Makes it easy for our Data team to make changes to the schedule
➢Desired size can be manually overridden if needed
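A Jenkins job of this kind might boil down to a single AWS CLI call. The sketch below wraps it in a helper function. The function name and its arguments are hypothetical, while set-desired-capacity and its two flags are standard AWS CLI.

```shell
# Hypothetical helper for a scheduled Jenkins job: set the desired size of an ASG.
# Usage: resize_spark_cluster <asg-name> <desired-capacity>
resize_spark_cluster() {
  asg_name="$1"
  desired="$2"
  aws autoscaling set-desired-capacity \
    --auto-scaling-group-name "$asg_name" \
    --desired-capacity "$desired"
}
```

For example, `resize_spark_cluster spark-workers 20` would request 20 instances in a group named spark-workers (a made-up name). A manual override is the same call with a different number.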
Termination Protection
❖When scaling down, ASG treats all nodes as equal termination candidates
❖We want to avoid killing instances with currently running jobs
❖To achieve this, we used a built-in feature of ASG - termination protection
❖Any instance in the ASG can be set as protected, thus preventing termination when scaling down the cluster.
# Protect this instance from scale-in while Spark executor processes are running
if [ "$(ps -ef | grep executor | grep spark | wc -l)" -ne 0 ]; then
  aws autoscaling set-instance-protection --protected-from-scale-in …
fi
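The snippet above shows only the protect branch. For scale-down to work, protection presumably also has to be released once the instance is idle; whether the original cron script does this is not shown. A sketch of both branches follows. The function names and arguments are hypothetical, and the flags are those of the standard set-instance-protection command.

```shell
# Sketch: toggle scale-in protection based on whether Spark executors are running.
# Function names and arguments are hypothetical; intended to run periodically (e.g. cron).

# Count running executor JVMs by the package of their main class.
count_spark_executors() {
  ps -ef | grep -v grep | grep -c 'org.apache.spark.executor'
}

# Usage: update_scale_in_protection <instance-id> <asg-name>
update_scale_in_protection() {
  instance_id="$1"
  asg_name="$2"
  if [ "$(count_spark_executors)" -gt 0 ]; then
    protection="--protected-from-scale-in"      # busy: keep the instance
  else
    protection="--no-protected-from-scale-in"   # idle: allow termination
  fi
  aws autoscaling set-instance-protection \
    --instance-ids "$instance_id" \
    --auto-scaling-group-name "$asg_name" \
    "$protection"
}
```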
Upstart Jobs for Spark
❖ Every Spark component has an Upstart job that does the following
➢Set Spark niceness (process priority in CPU resource distribution)
➢Start the required Spark component and ensure it stays running
■ The default Spark daemon script runs in the background
■ For Upstart, we modified the script to run in the foreground
❖ nohup nice -n "$SPARK_NICENESS"…&
vs
❖ nice -n "$SPARK_NICENESS" ...
NewRelic Monitoring
❖ Cloud-based Application and Server monitoring
❖Supports multiple alert policies for different needs
➢Who to alert, and what triggers the alerts
❖Newly created instances are auto-assigned the default alert policy
Policy Assignment using AWS Lambda
❖Spark instances have their own policy in NewRelic
❖Each instance has to ask NewRelic to be reassigned to the new policy
➢Parallel reassignment requests may collide and override each other
❖Solution - during provisioning and shutdown, each instance does the following:
➢Puts a record in an AWS Kinesis stream that contains its hostname and its desired NewRelic policy ID
➢The record triggers an AWS Lambda script that uses the NewRelic API to reassign the given hostname to the given policy ID
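The instance side of this flow could look roughly like the sketch below. The function name, the JSON field names, and the partition key are all assumptions; only put-record and its flags are standard AWS CLI. Note that AWS CLI v2 expects the --data value to be base64-encoded unless --cli-binary-format raw-in-base64-out is set.

```shell
# Hypothetical helper: publish this host's desired NewRelic policy to a Kinesis stream.
# Usage: request_policy_assignment <stream-name> <policy-id>
request_policy_assignment() {
  stream_name="$1"
  policy_id="$2"
  host="$(hostname)"
  # One fixed partition key sends every request to the same shard,
  # so the consumer sees them one after another instead of in parallel.
  aws kinesis put-record \
    --stream-name "$stream_name" \
    --partition-key "newrelic-policy" \
    --data "{\"hostname\": \"$host\", \"policy_id\": \"$policy_id\"}"
}
```

Routing every request through one ordered stream is a plausible reason this design avoids the collisions of parallel requests, though the slides do not spell that out.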
Chef
❖Configuration Management Tool, can provision and configure instances
➢Describe an instance's state as code, let Chef handle the rest
➢Typically works in server/client mode - client updates every 30 minutes
➢Besides provisioning, also prevents configuration drift
❖Vast amount of plugins and cookbooks - the sky's the limit!
❖Configures all the instances in our DC
Spark Instance Provisioning
❖ Setup Spark
➢Setup prerequisites - users, directories, symlinks and jars
➢ Download and extract spark package from S3
❖Configure termination protection cron script
❖Configure upstart conf files
❖Place spark config files
❖Assign NewRelic policy
❖Add shutdown scripts
➢Delete instance from chef database
➢Remove from NewRelic monitoring policy