Using Cassandra in Cloudian,
an S3 Cloud Storage System
August 8, 2012
Gary Ogasawara
Cloudian, Inc.
Cassandra Summit 2012 (#cassandra12)
Copyright © 2012 Cloudian Inc. & KK. All Rights Reserved.
What is Cloudian?
Cloudian =
S3 Cloud Storage
as Packaged Software
Cloudian Features
1. Full Amazon S3 API compatibility, including error codes.
2. Multi-datacenter, peer-to-peer architecture. No single point of failure.
3. Multi-tenant: QoS controls, billing, and reporting by each User and each Group.
4. Public and Private Clouds.
5. Elastic capacity: start small and scale out as needed.
6. System, Group, and User management by Management Console or REST API.
7. Easy-to-use packaged software, backed by 24x7 carrier-grade support.
Cloudian Objectives
1. Fully packaged software
• Hide NoSQL complexity
• Easy install/upgrade
• HyperStore: best-fit store
• Easy to deploy on existing hardware/network.
• Flexible for different customer types.
• Scalable. Start small and grow.
2. S3 API full compatibility
• Use S3 ecosystem applications “as is”.
• API already designed.
3. Complete service platform
• User/Group Provisioning
• Cluster Management
• Reporting
• Billing
• Turnkey system.
• Can choose integration points with existing systems.
Object vs. File vs. Block Storage
Abstraction Level:
OBJECTS: HTTP, Application Level
FILES: NAS (NFS, CIFS), OS User Level
BLOCKS: SAN (iSCSI), OS Kernel Level
S3 Ecosystem
Libraries, applications, gateways, etc. using Amazon S3 can simply be re-pointed to Cloudian.
Public
Private
Hybrid
S3 Functions
• HTTP REST API. PUT, POST, GET, DELETE, HEAD.
• Objects organized into buckets.
• Security. Requests authenticated using keyed HMAC with symmetric keys. Also, HTTPS option, client-side encryption, server-side encryption.
• Access control lists (ACLs) define access rights to bucket and object.
• Accounting of bytes inbound, outbound, stored, and HTTP request counts. Billing by tiered rating plans per accounting type, per region.
• Multi-part uploads. Allows uploading large objects in multiple parts.
• Versioning. Multiple versions of the same object.
• Location constraint. Buckets can be assigned to a specific region. Each region has its own domain.
• …
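As an illustration of the keyed-HMAC authentication listed above, here is a minimal sketch of S3-style request signing (the HMAC-SHA1 scheme used by the S3 REST API at the time). It assumes a request with no x-amz- headers; the function names and the endpoint host parameter are hypothetical, chosen only to show that the same signing works against any S3-compatible endpoint.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def sign_s3_request(secret_key, method, resource, date, content_md5="", content_type=""):
    """Return the Base64 HMAC-SHA1 signature for a request without x-amz- headers."""
    # String to sign: verb, Content-MD5, Content-Type, Date, then the resource path.
    string_to_sign = "\n".join([method, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def build_headers(endpoint_host, access_key, secret_key, method, resource):
    """Build Host, Date, and Authorization headers (hypothetical helper)."""
    date = formatdate(usegmt=True)  # RFC 1123 date, e.g. "Wed, 08 Aug 2012 17:00:00 GMT"
    signature = sign_s3_request(secret_key, method, resource, date)
    return {
        "Host": endpoint_host,  # re-pointing a client means changing only this host
        "Date": date,
        "Authorization": "AWS {}:{}".format(access_key, signature),
    }
```

Because the secret key is symmetric, the server recomputes the same signature from the request and compares it with the one in the Authorization header.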
Works with leading Cloud Compute Platforms
Cloudian-Citrix CloudStack (May 9, 2012)
Cloudian-OpenStack (October 21, 2011)
Cloudian Customers
Private Hybrid
Public
Channel Partners:
Why Cassandra?
• Scalable
  - Add capacity by adding nodes to a running system.
  - Distributed (P2P architecture), no single point of failure.
• Reliable
  - Resilient to network or hardware failures.
  - Multi-datacenter replication.
  - Tuneable data consistency level.
• Features
  - TTL, secondary indexes, counters, compression, encryption, …
• Fast
  - Write path especially fast.
Cassandra in Cloudian
• v1.0.7 in use (started at 0.7.x)
• Forked to add customizations
• Hector client
• Data stored includes:
  - Object metadata
  - Reports/logs
  - Counters for rate control
  - …
Cloudian: Logical Architecture
[Diagram. Components: Management Console (Web UI servlets: Login, Account profile / Security keys, Reports, Data Explorer), Applications, Admin Server, S3 Server. Data Servers: Credentials DB, AccountInfo & QoS DB (Cassandra), UserData DB (Cassandra), Reports DB (Cassandra). Links are labeled HTTPS, HTTP, and HTTP or HTTPS (S3).]
Minimum Redundant Configuration
[Diagram. A load balancer (LB) receives browser requests for the UI (HTTPS, sticky sessions) and application requests for S3 (HTTP/HTTPS), and forwards them over HTTP/S to two nodes. Each node runs Server, Servlets, Cassandra, and a Credentials DB.]
Multi-Datacenter Example
• 2 datacenters / 4 nodes per datacenter
• Storage objects, reports, profiles replicated across DCs by Cassandra.
• Credentials DB (Redis) has local DC slave and single global master.
[Diagram. DC1 and DC2, each with four nodes running S3/Admin/HyperStore, CMC, Cassandra, and Redis. One node in DC1 hosts the Redis master (M); all other Redis instances are slaves (S).]
Network Scaling Example
[Diagram. Region 1: DC 1-1, DC 1-2. Region 2: DC 2-1. Region 3: DC 3-1, DC 3-2.]
Cassandra for Object Store
• Dynamically decide how to store each object (Cassandra or file system).
• Cassandra better for small objects.
• Large objects split into multiple parts and chunks.
• Row key: Object name + version + part info + timestamp
• Column name: Unused
[Diagram. Column Family under the Random Partitioner: each Row key holds a single Column Name / Value pair.]
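The row-key layout above can be sketched as follows. This is only an illustration: the "|" delimiter, the part/chunk encoding, and the tiny default chunk size are assumptions, not details given in the slides.

```python
def split_into_chunks(data, chunk_size):
    """Split a bytes object into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def make_row_key(object_name, version, part, chunk, timestamp_ms):
    """Compose a row key: object name + version + part info + timestamp.

    The "|" delimiter and the "p<part>.c<chunk>" encoding are hypothetical.
    """
    part_info = "p{}.c{}".format(part, chunk)
    return "|".join([object_name, version, part_info, str(timestamp_ms)])


def rows_for_part(object_name, version, part, data, timestamp_ms, chunk_size=4):
    """Return (row_key, value) pairs, one row per chunk; the column name is unused."""
    return [
        (make_row_key(object_name, version, part, index, timestamp_ms), chunk)
        for index, chunk in enumerate(split_into_chunks(data, chunk_size))
    ]
```

With the Random Partitioner, each chunk's row key hashes independently, so the chunks of one large object spread across the ring.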
Cassandra for Object Metadata
• Size, ETag, MD5, timestamp, ACL, part info, version, etc.
• Old versions of metadata format supported.
• Row key: Group + user + bucket
• Column names: Object name + version + part info + timestamp
• Wide rows. Column sorting used for bucket listing.
[Diagram. Column Family under the Random Partitioner: each Row Key holds many Column Name / Value pairs, sorted by Column Name.]
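To show why sorted columns in a wide row make bucket listing cheap, here is a small in-memory sketch. The BucketRow class stands in for one Cassandra row; it is a hypothetical model for illustration, not the Hector API or Cloudian's code.

```python
import bisect


class BucketRow:
    """One wide row: columns are kept sorted by name, as within a Cassandra row."""

    def __init__(self):
        self._names = []    # sorted column names (object name + version + ...)
        self._values = {}   # column name -> metadata dict

    def put(self, column_name, metadata):
        if column_name not in self._values:
            bisect.insort(self._names, column_name)
        self._values[column_name] = metadata

    def list_objects(self, prefix="", max_keys=1000):
        """Return up to max_keys (name, metadata) pairs whose name starts with prefix."""
        # All names sharing a prefix are contiguous in sorted order.
        start = bisect.bisect_left(self._names, prefix)
        result = []
        for name in self._names[start:]:
            if not name.startswith(prefix) or len(result) >= max_keys:
                break
            result.append((name, self._values[name]))
        return result


def row_key(group, user, bucket):
    """Row key: group + user + bucket (the "|" delimiter is an assumption)."""
    return "{}|{}|{}".format(group, user, bucket)
```

A bucket listing with a prefix then becomes a single column slice over one row rather than a scan across many rows.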
Cassandra for Account Info
DATA MODEL
• User
  - ID, name, contact info, etc.
• Group
  - ID, name, contact info, etc.
• Rating Plan
• Security Credentials
• QoS Counters
NOTES
• “Static” data. Fixed number of columns.
• Secondary index in User CF on groupID. Allows query to get all userIDs for a specified groupID.
• Could be put in a relational DB like MySQL, but no need to add another component.
Quality of Service / SLA Management
• Configurable maximum limits per region at per-user, per-group, and system level.
  - Requests/minute
  - Storage bytes
  - Storage objects
  - Data Bytes Inbound
  - Data Bytes Outbound
• While a limit is reached, requests are rejected.
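A requests/minute limit of this kind can be sketched with a fixed-window counter. The in-memory dictionary below stands in for the Cassandra counters mentioned earlier in the deck; the class name, scope names, and limit layout are assumptions made for illustration.

```python
from collections import defaultdict


class RequestRateLimiter:
    """Fixed-window requests/minute limiter at user, group, and system level."""

    def __init__(self, limits):
        # limits: {(level, identifier): max requests per minute}; hypothetical layout.
        self._limits = limits
        self._counts = defaultdict(int)  # (scope, minute) -> requests counted

    def allow(self, user, group, now_seconds):
        """Count the request and return True, or return False if any scope is at its limit."""
        minute = int(now_seconds // 60)
        scopes = [("user", user), ("group", group), ("system", "*")]
        for scope in scopes:
            limit = self._limits.get(scope)
            if limit is not None and self._counts[(scope, minute)] >= limit:
                return False  # rejected requests are not counted
        for scope in scopes:
            self._counts[(scope, minute)] += 1
        return True
```

The counter key includes the minute, so a rejected user is admitted again as soon as the next window starts.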
Cassandra for Reports
DATA MODEL
• “Raw” column family
  - User, Group, System
  - Transaction type (HTTP GET, PUT, DELETE)
  - Object path
  - Size
  - …
• “Rollup” column families
  - RollupHour. Summarizes data for each hour using Raw data.
  - RollupDay. Summarizes data for each day using RollupHour data.
  - RollupMonth. Summarizes data for each month using RollupDay data.
NOTES
• High write rate. Low read rate.
• Rollup tables used for direct queries.
• Automatic deletion using Cassandra TTL (time-to-live).
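The cascading rollups can be illustrated with a single aggregation function applied twice, raw to hourly and hourly to daily. The record fields used here (user, operation, timestamp, bytes, requests) are assumed for the sketch and are not the actual column family schema; the monthly rollup is omitted because months are not a fixed number of seconds.

```python
from collections import defaultdict

HOUR = 3600
DAY = 24 * HOUR


def rollup(records, bucket_seconds):
    """Sum request counts and bytes per (user, operation, time bucket)."""
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for rec in records:
        bucket_start = rec["timestamp"] - rec["timestamp"] % bucket_seconds
        key = (rec["user"], rec["operation"], bucket_start)
        # A raw record counts as one request; a rollup record carries its own count.
        totals[key]["requests"] += rec.get("requests", 1)
        totals[key]["bytes"] += rec["bytes"]
    return [
        {"user": user, "operation": op, "timestamp": ts,
         "requests": value["requests"], "bytes": value["bytes"]}
        for (user, op, ts), value in sorted(totals.items())
    ]
```

For example, `rollup(rollup(raw, HOUR), DAY)` builds daily totals from hourly ones, so each level reads only the level below it and the raw data can expire by TTL.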
Cassandra: Wish List
1. Repair
• Slow, impact on performance, difficult to monitor progress, manual operator action required.
2. Compaction
• Heavy performance impact. Hard to tune. Capacity planning difficult.
3. Schema changes
• Fixed in 1.1.
4. Large column slices.
5. Caches (row and key) not useful. Slower performance, large memory use.
6. JMX too slow. Need to directly use and expose Java interfaces.
HyperStore™
HyperStore: Management policies tailored for different object types.
• Object metadata is still stored in Cassandra.
• Use Cassandra’s distributed systems methods for data partitioning, replication, node health detection.
• Fork Cassandra source for customizations.
Benefits:
• Better performance
• More capacity per node
• Higher disk utilization
• Storage layer flexibility
[Diagram. Cloudian S3 Storage Server: S3 REST API, NFS, Admin, Credentials, Accounting (Cassandra), Reporting (Cassandra), HyperStore Manager, Data Store (Cassandra), Data Store (File System).]
HyperStore: Hybrid Storage Example
• Optimal solution is to choose the storage method that minimizes latency.
• Generally, you want to maximize/minimize U, a performance metric, based on random variables X using a mixture of N storage layers.
• In a simple case,
  - U: average latency
  - X = {object size}
  - N = {cassandra, ext4 fs}
[Figure. U plotted against X for Storage 1 and Storage 2, with the optimal choice marked.]
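In the simple case above, the choice reduces to picking the layer with the lowest estimated latency for a given object size. The sketch below assumes hypothetical linear latency models; the coefficients are made up for illustration and are not measurements from the deck.

```python
def choose_storage(object_size_kb, latency_models):
    """Return the storage layer whose estimated latency is lowest for this object size."""
    return min(latency_models, key=lambda name: latency_models[name](object_size_kb))


# Hypothetical latency models (ms) as a function of object size in KB.
# Coefficients are invented for the example; real ones would come from measurement.
EXAMPLE_MODELS = {
    "cassandra": lambda kb: 2.0 + 0.10 * kb,
    "ext4": lambda kb: 8.0 + 0.02 * kb,
}
```

With these example models the two curves cross at 75 KB: smaller objects go to Cassandra and larger ones to the file system.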
HyperStore: Faster Reads & Writes
[Charts. PUT latency (ms) vs. object size (0.5, 5, 50, 500 KB) for PUT-Cass and PUT-HS; GET latency (ms) vs. object size (0.5, 5, 50, 500 KB) for GET-Cass and GET-HS. Annotations: >30% faster, >400% faster.]
HyperStore: Less Compaction
Test load: 20 tps, 10 threads, 2MB data

No HyperStore
                 PUT      GET      LIST     DELETE
Operations       50478    1679     3642     422
Latency (msec)   149.78   314.80   41.60    34.50

With HyperStore
                 PUT      GET      LIST     DELETE
Operations       50559    9195     3575     2224
Latency (msec)   96.64    35.63    28.14    23.93

[Charts for each configuration: iostat % utilization; io read/write (MB).]
Finally
Cassandra and other enabling technologies have allowed “leveling the playing field” for cloud storage providers.
Info: www.cloudian.com
• Download trial version.
• Coming soon:
• #1 best seller in “Database” category on amazon.co.jp.
Cloudian S3 API Compliance
[Diagram comparing Amazon S3 and Cloudian S3 by feature layer:]
• Basic Object Store: RESTful API, Objects in Buckets w/ Metadata
• Basic S3 Compatibility: Put, Get, Head, Delete, etc.
• Distributed & Replicated
• Multi-Datacenter Support
• Versioning
• Content Sharing
• Multi-part Upload
• Tiered Storage
• Location constraint
• Client Library & Error Code Compatible
• Multi-Tenant Service (Dashboard, QoS, Monitoring, Admin, Reports, Billing)
• Packaged & Supported
• Integration Ready with Turnkey Installation
Complete Service Platform
Admin & User Dashboard
Billing
Reporting
Quality of Service & SLA Management
Monitoring
Monitoring
• Open-source network management system
• Used for node and application monitoring
• Gemini provides a template file for Cloudian-specific monitoring
• Cloudian monitoring uses JMX statistics that are output by Cassandra and Cloudian servers
Accounting, Usage Data and Billing
• Per-Group, Per-User and Global
  - Accounting data is maintained per group, per group user, and on a global basis.
  - Separate for each region. Admin API can retrieve data for 1 region or all regions.
• Attributes
  - Storage. Bytes and objects stored. The amount of bytes (in GB-months) and objects stored.
  - Data Transfer. Bytes in and out. For both inbound (writes) and outbound (reads) data, the number of bytes transmitted.
  - Requests. Number of requests. The total number of requests, HTTP type, URI.
• Billing rules
  - Billing rules as a weighted sum of the accounting attributes can be configured per group.
  - Example:
    - Storage bytes: For first 5 GB-month ($0.10/GB-month), then $0.08/GB-month.
    - PUT requests: $0.01 per 1,000 requests.
    - GET requests: $0.001 per 1,000 requests.
    - Data Transfer IN: $0.00 per GB.
    - Data Transfer OUT: For first 1 GB ($0.10/GB), for next 9 GB ($0.05/GB), then ($0.03/GB).
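The example billing rules above can be worked through in code. This is a minimal sketch using the rates from the example; the function names and the tier representation are assumptions, not the product's rating-plan format.

```python
from decimal import Decimal

# (tier size, unit price); a size of None means "everything beyond the earlier tiers".
STORAGE_TIERS = [(5, "0.10"), (None, "0.08")]                  # per GB-month
TRANSFER_OUT_TIERS = [(1, "0.10"), (9, "0.05"), (None, "0.03")]  # per GB


def tiered_charge(quantity, tiers):
    """Charge a quantity against successive tiers and return the total as a Decimal."""
    remaining = Decimal(str(quantity))
    total = Decimal("0")
    for size, price in tiers:
        if remaining <= 0:
            break
        used = remaining if size is None else min(remaining, Decimal(str(size)))
        total += used * Decimal(price)
        remaining -= used
    return total


def monthly_bill(storage_gb_months, put_requests, get_requests, gb_in, gb_out):
    """Weighted sum of the accounting attributes using the example rates."""
    return (
        tiered_charge(storage_gb_months, STORAGE_TIERS)
        + Decimal(put_requests) / 1000 * Decimal("0.01")
        + Decimal(get_requests) / 1000 * Decimal("0.001")
        + Decimal(str(gb_in)) * Decimal("0.00")
        + tiered_charge(gb_out, TRANSFER_OUT_TIERS)
    )
```

For 20 GB-months stored, 10,000 PUTs, 100,000 GETs, 50 GB in, and 15 GB out, the charges are $1.70 + $0.10 + $0.10 + $0.00 + $0.70 = $2.60.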
Cloudian Management Console (CMC)
CloudStack Integration
[Diagram. CloudStack Zones, each containing a POD with a Cluster of HOSTs, Primary Storage, a CloudStack System VM, and Secondary Storage. Secondary Storage (Snapshot, Backup, ISO) is attached over NFS.]