techtalk v2.0 - performance tuning cassandra + aws
TRANSCRIPT
Eddie Garcia, VP of InfoSec and Services Gazzang, Inc.
I/O Performance tuning for Cassandra running on AWS with Gazzang
Today’s Agenda
• Tips and Tricks to achieve high performance when running Cassandra on AWS
• Configuration tuning for Cassandra
• Tools to benchmark raw file system I/O
• AWS available AMIs to boost performance
• Stress testing on AWS i2 HVM instances
• Configuring AWS EC2 instances with SSDs and EBS storage with PIOPS
4/24/14 © Gazzang, Inc. -- CONFIDENTIAL -- 2
Performance tuning
• Tuning at every layer
  – Tune the AWS layer
  – Tune the Cassandra layer
  – Tune the file system / security layer
Tune the AWS layer
• i2 HVM instances provide better I/O than other instance types
• i2 instances support SSD TRIM for better SSD health and performance over time
• Use the Amazon Linux distribution AMI or kernel version 3.8 and greater for higher I/O performance
• Use the Amazon Linux distribution AMI for built-in SR-IOV (single root I/O virtualization) drivers to enable higher-performance AWS Enhanced Networking when running in a VPC
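The kernel recommendation above can be checked on a node with a few lines of Python. This is a minimal sketch, not an official tool: the function names are made up for illustration, and the 3.8 threshold is simply the number quoted on the slide.

```python
import re

MIN_KERNEL = (3, 8)  # threshold quoted on the slide above


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '3.4.73-64.112.amzn1.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(release: str, minimum: tuple = MIN_KERNEL) -> bool:
    """True when the kernel release is at least the recommended version."""
    return kernel_version(release) >= minimum
```

On a live node the release string can be read with `platform.release()` from the standard library and passed to `meets_minimum`.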
Amazon Linux AMI Instance Types and Sizes
http://aws.amazon.com/amazon-linux-ami/
Amazon Linux AMI Instance Types and On-Demand Cost in US East
http://aws.amazon.com/ec2/pricing/
Tune the Cassandra layer
• Follow the Cassandra best practices published by DataStax: http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html
• Put the data directory on the mounted ephemeral instance storage; avoid EBS storage for maximum I/O performance
• IMPORTANT: You must have a backup strategy when using ephemeral storage, for example using S3 for backups
• RAID-0 (stripe) of SSDs is supported, but Cassandra also does a great job of using all mounted drives without RAID
• Scale by adding smaller instances rather than increasing instance size
Tune the Cassandra layer
• Cassandra writes immutable sstable files to disk. It then compacts multiple sstables into one larger sstable, with some cleanup occurring along the way, which also helps TRIM
• The more OS memory the better: on read, sstables are cached as normal memory-mapped files loaded into OS memory
• Increasing the JVM heap size can cause performance issues for Cassandra during garbage collection ("Death by Garbage Collection")
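To show why a large instance does not need a proportionally large heap, here is a sketch of the default heap sizing rule as I understand it from the stock cassandra-env.sh script: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8 GB)). Treat the formula as an assumption and check the script shipped with your version. The function name is made up for illustration.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Default max heap, assuming the rule max(min(RAM/2, 1024MB), min(RAM/4, 8192MB))."""
    half = min(system_memory_mb // 2, 1024)      # half of RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)   # quarter of RAM, capped at 8 GB
    return max(half, quarter)


# Example: a hypothetical node with 61 GiB of RAM stays capped at 8 GB of heap,
# leaving the rest to the OS page cache that serves sstable reads.
heap_mb = default_max_heap_mb(61 * 1024)
```

Under this rule the heap stops growing at 8 GB, which is consistent with the advice above: extra memory is more useful to the OS than to the JVM.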
Tune the file system / security layer
Tune the file system layer
• Format the file system with ext4 (rather than ext3) or xfs, if supported by your chosen Linux distribution
• Use the most current Linux version for your distribution; many performance fixes are available only in newer kernels
• Use IOZone or other file system tests before and after configuration changes to benchmark raw file I/O before loading your Cassandra data
Tune the file security layer
• Use block-level encryption, dedicating an entire SSD volume
• Encrypt the cluster before loading data whenever possible
• Use systems that support hardware encryption acceleration like Intel AES-NI: http://aws.amazon.com/ec2/instance-types
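Whether an instance exposes AES-NI can be confirmed from inside the guest. The sketch below assumes an x86 Linux host, where the CPU lists an `aes` entry on the `flags` line of /proc/cpuinfo. The function names are illustrative.

```python
def has_aes_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if "aes" in flags.split():
                return True
    return False


def host_has_aes_ni(path: str = "/proc/cpuinfo") -> bool:
    """Check the running host; returns False if the file cannot be read."""
    try:
        with open(path) as handle:
            return has_aes_flag(handle.read())
    except OSError:
        return False
```

Running `host_has_aes_ni()` before setting up encryption is a cheap way to catch an instance type without hardware acceleration.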
Test and measure
Performance Testing
• When testing performance, reduce the number of variables that can affect the test
  – Stopping and starting a server can switch your instance to a different host with different performance
  – Time of day when you run tests can affect the performance
  – Eliminate cached in-memory data from prior tests, which may contaminate your results
  – Avoid testing on systems with unknown state and size of data
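One way to eliminate cached data between runs on Linux is to flush dirty pages and then ask the kernel to drop its page cache. This is a minimal sketch, assuming a Linux host and root privileges. The function name is illustrative, and the `path` parameter exists only so the sketch can be tried against an ordinary file.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"  # Linux kernel control file


def drop_page_cache(path: str = DROP_CACHES) -> None:
    """Flush dirty pages, then ask the kernel to drop clean caches (needs root)."""
    os.sync()  # write dirty pages to disk first so they become droppable
    with open(path, "w") as handle:
        handle.write("3\n")  # 3 = free page cache plus dentries and inodes
```

This clears only the OS cache. Caches inside the Cassandra process are a separate matter and are not touched here.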
Cassandra Test Environment
[Diagram: Cassandra Stress Client; Cassandra Nodes 1-6; storage options EBS clear text, EBS 4K PIOPS, SSD clear text, SSD encrypted; IOZone Tests; Cassandra Stress Tests; S3 Backups]
Test Environment Specifications
Instance: i2.2xlarge
AZ: us-east-1a
AMI Information: amzn-ami-hvm-2013.09.2.x86_64-ebs (ami-e9a18d80)
Linux Distribution: Amazon Linux AMI release 2013.09
Kernel Version: 3.4.73-64.112.amzn1.x86_64
Drive Layout:
Filesystem             Size  Used  Avail  Use%  Mounted on
/dev/xvda1             7.9G  1.8G  6.1G   23%   /            (EBS backed for tests, ephemeral is better)
tmpfs                  30G   0     30G    0%    /dev/shm
/dev/xvdb              734G  197M  697G   1%    /mount/ssd1  (Cleartext test SSD)
/dev/mapper/encrypted  734G  36G   662G   6%    /encrypted   (Encrypted test SSD)
Cassandra Stress Client: m1.medium
Cassandra Cluster: 6 Nodes
DataStax enterprise: dse-libcassandra-3.2.2-1.noarch
Cassandra: version 1.2.12.2
Java HotSpot(TM) 64-Bit Server VM/1.6.0_45
IOZone SSD vs. Non-SSD
IOZone test configuration
time iozone -ORa -s 163840 -r 16384
Iozone: Performance Test of File I/O
Version $Revision: 3.420 $
Compiled for 64 bit mode.
Build: linux-AMD64
OPS Mode. Output is in operations per second.
Excel chart generation enabled
Auto Mode
File size set to 163840 KB
Record Size 16384 KB
Command line used: iozone -ORa -s 163840 -r 16384
Time Resolution = 0.000001 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
http://www.iozone.org/
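To repeat the same IOZone run against several mounts, the command line can be assembled programmatically. This is a sketch under assumptions: the helper name and the test file path are illustrative, and `-f` (the IOZone option for naming the temporary test file) is not part of the command shown in the talk.

```python
import shlex


def iozone_command(file_size_kb: int, record_size_kb: int, test_file: str = "") -> list:
    """Build the argument list for the IOZone run used in this talk."""
    cmd = [
        "iozone",
        "-ORa",                      # -O: ops per second, -R: Excel report, -a: auto mode
        "-s", str(file_size_kb),     # file size in KB
        "-r", str(record_size_kb),   # record size in KB
    ]
    if test_file:
        cmd += ["-f", test_file]     # assumption: place the test file on the mount under test
    return cmd


# Same sizes as the slide; the path under /mount/ssd1 is a hypothetical file name.
print(shlex.join(iozone_command(163840, 16384)))
print(shlex.join(iozone_command(163840, 16384, "/mount/ssd1/iozone.tmp")))
```

The resulting list can be passed to `subprocess.run` once per mount so that every storage option is measured with identical parameters.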
Cassandra Test Environment
[Diagram: Cassandra Node; storage options EBS clear text, EBS 4K PIOPS encrypted, SSD, SSD encrypted; IOZone Tests]
real 1m6.360s user 0m0.084s sys 0m0.911s
real 0m15.223s user 0m0.115s sys 0m1.391s
real 0m9.951s user 0m0.291s sys 0m3.595s
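The wall-clock figures above are easier to compare once converted to seconds. The sketch below parses `time` output of the form shown and expresses each run relative to the fastest. The transcript does not say which storage option produced which timing, so the runs are left unlabeled. The function name is illustrative.

```python
import re


def parse_real_seconds(time_output: str) -> float:
    """Convert the 'real XmY.YYYs' field of shell `time` output to seconds."""
    match = re.search(r"real\s+(\d+)m([\d.]+)s", time_output)
    if match is None:
        raise ValueError(f"no 'real' field found in: {time_output!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# The three timings exactly as they appear in the transcript.
runs = [
    "real 1m6.360s user 0m0.084s sys 0m0.911s",
    "real 0m15.223s user 0m0.115s sys 0m1.391s",
    "real 0m9.951s user 0m0.291s sys 0m3.595s",
]
seconds = [parse_real_seconds(run) for run in runs]
fastest = min(seconds)
for value in seconds:
    print(f"{value:8.3f} s  ({value / fastest:.2f}x the fastest run)")
```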
Cassandra stress
The cassandra-stress tool
• A Java-based stress testing utility for benchmarking and load testing a Cassandra cluster.
• The binary installation of the tool also includes a daemon, which in larger-scale testing can prevent potential skews in the test results by keeping the JVM warm.
• Modes of operation:
  – Inserting: Loads test data.
  – Reading: Reads test data.
  – Indexed range slicing: Works with RandomPartitioner on indexed tables.
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCStress_t.html
Current Cassandra stress test configuration
• Cassandra stress test command
  – <cassandra home>/tools/bin/cassandra-stress -l 3 -o insert -n 100000000 -i 1 -e ONE -c 10 -d <Cassandra Node IPs> -t 150 -f T1.csv &
• In the stress test, client stress test nodes 1-3 each target two separate Cassandra nodes. Client node #4 targets all Cassandra nodes.
  – Client #1 -> CAS 1, 2
  – Client #2 -> CAS 3, 4
  – Client #3 -> CAS 5, 6
  – Client #4 -> CAS 1, 2, 3, 4, 5, 6
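The client-to-node mapping above can be turned into one command line per stress client. This is a sketch, not the scripts used for the talk: the node IP addresses and the install path are hypothetical placeholders, and the flag comments reflect my reading of the legacy cassandra-stress options.

```python
import shlex

# Which Cassandra nodes each stress client targets (from the slide above).
CLIENT_TARGETS = {
    "client1": [1, 2],
    "client2": [3, 4],
    "client3": [5, 6],
    "client4": [1, 2, 3, 4, 5, 6],
}

# Hypothetical node addresses; the slides do not list real IPs.
NODE_IPS = {n: f"10.0.0.{n}" for n in range(1, 7)}


def stress_command(client: str, cassandra_home: str = "/opt/cassandra") -> list:
    """Build the cassandra-stress argument list for one client (path is an assumption)."""
    nodes = ",".join(NODE_IPS[n] for n in CLIENT_TARGETS[client])
    return [
        f"{cassandra_home}/tools/bin/cassandra-stress",
        "-l", "3",          # replication factor
        "-o", "insert",     # operation: write test data
        "-n", "100000000",  # number of keys
        "-i", "1",          # progress report interval in seconds
        "-e", "ONE",        # consistency level
        "-c", "10",         # columns per key
        "-d", nodes,        # comma-separated target nodes
        "-t", "150",        # client threads
        "-f", "T1.csv",     # output file
    ]


for name in CLIENT_TARGETS:
    print(name, shlex.join(stress_command(name)))
```

Generating the commands from a single table keeps every client on identical parameters, so only the target nodes differ between runs.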
Cassandra Test Environment
[Diagram: Stress Clients 1-4; Cassandra Nodes 1-6; SSD clear text, SSD encrypted; Cassandra Stress Tests]
Benchmark clear text vs encrypted inserts (write)
Summary
• Test in your environment with your data; results will vary greatly with OS, HW and application configurations
  – Baseline before you tune
  – Tune
  – Test after tuning
  – Measure
  – Rinse and repeat twice
• Security and performance are not mutually exclusive; encryption can coexist with high I/O performance
• Do your homework: configure and run tests that map to your use case
About Gazzang
• Headquartered in Austin, Texas
• Focus on securing sensitive data in cloud and big data environments
• Enable customers to meet compliance requirements like HIPAA, PCI, FIPS and FERPA
• Satisfy internal security mandates
• Protect valuable client information
Gazzang is focused on data at-rest encryption
Security in the cloud is a layered approach
26 4/24/14 Gazzang - All rights reserved 2013
Data in process (in application)
Data at rest (storage)
Data in transit (SSL)
and key management
Thank you!
Gazzang, Inc www.gazzang.com Eddie Garcia VP of InfoSec and Services [email protected]