![Page 1: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/1.jpg)
Getting Started with DataStax Enterprise
A Technical Overview
Confidential 1
![Page 2: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/2.jpg)
![Page 3: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/3.jpg)
Agenda
Why Cassandra?
Why DataStax Enterprise?
How to Evaluate?
![Page 4: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/4.jpg)
Why Cassandra?
![Page 5: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/5.jpg)
What is Apache Cassandra?
Apache Cassandra™ is a massively scalable NoSQL database.
• Continuous availability
• High-performing writes and reads
• Linear scalability
• Multi-data center support
![Page 6: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/6.jpg)
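Cassandra is Fault Tolerant
[Figure: ring of eight nodes with tokens 10, 20, 30, 40, 50, 60, 70, 80, accessed by two clients]
Replication Factor = 3

| Token | Order_id | Qty | Sale |
|-------|----------|-----|------|
| 70    | 1001     | 10  | 100  |
| 44    | 1002     | 5   | 50   |
| 15    | 1003     | 30  | 200  |

If a node fails or goes down temporarily, we can still retrieve the data from the other 2 nodes.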
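The replica placement behind this slide can be sketched in a few lines of Java. This is a simplified illustration, not Cassandra's implementation: it assumes each node owns the token range ending at its own token, walks the ring clockwise to pick the next replicas, and ignores virtual nodes, racks, and data centers. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class ReplicaPlacement {
    /**
     * Returns the node tokens that hold a row: the first node whose token is
     * greater than or equal to the row token (wrapping around the ring),
     * followed by the next nodes clockwise until rf replicas are chosen.
     */
    static List<Integer> replicasFor(int rowToken, TreeSet<Integer> ring, int rf) {
        List<Integer> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        Integer node = ring.ceiling(rowToken);
        if (node == null) {
            node = ring.first(); // wrap around past the highest token
        }
        int copies = Math.min(rf, ring.size());
        while (replicas.size() < copies) {
            replicas.add(node);
            node = ring.higher(node);
            if (node == null) {
                node = ring.first();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        // Node tokens from the slide's eight-node ring.
        TreeSet<Integer> ring = new TreeSet<>(Arrays.asList(10, 20, 30, 40, 50, 60, 70, 80));
        for (int token : new int[] {70, 44, 15}) {
            System.out.println("token " + token + " -> nodes " + replicasFor(token, ring, 3));
        }
    }
}
```

With RF = 3, every row lives on three distinct nodes, so losing any single node still leaves two copies readable.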
![Page 7: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/7.jpg)
The NoSQL Performance Leader
Netflix Cloud Benchmark… (Source: Netflix Tech Blog)
End Point Independent NoSQL Benchmark
• Highest in throughput…
• Lowest in latency…
“In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.”
Source: Solving Big Data Challenges for Enterprise Application Performance Management, benchmark paper presented at the Very Large Database Conference, 2013.
![Page 8: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/8.jpg)
Linearly Scalable
[Figure: clusters of 2, 4, and 8 nodes, labeled 100,000, 200,000, and 400,000 txns per sec]
Simply add nodes to double or quadruple performance and capacity.
![Page 9: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/9.jpg)
Multi Data Center Support
[Figure: two eight-node rings (tokens 10 to 80 and 15 to 85), labeled East Data Center and West Data Center, each serving a client]
Data Center Outage Occurs
No interruption to the business
![Page 10: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/10.jpg)
Built for Modern Online Applications
• Architected for today’s needs
• Linear scalability at lowest cost
• 100% uptime
• Operationally simple
![Page 11: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/11.jpg)
Agenda
Why Cassandra?
• Scale with ease
• Always on
• Deploy across data centers
![Page 12: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/12.jpg)
Agenda
Why Cassandra?
• Scale with ease
• Always on
• Deploy across data centers
Why DataStax Enterprise?
![Page 13: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/13.jpg)
DataStax delivers Apache Cassandra to the Enterprise
![Page 14: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/14.jpg)
Why DataStax?
DataStax supports both the open source community and modern business enterprises.
Open Source / DataStax Enterprise
Apache Cassandra (Cassandra Chair and 30% of committers)
Community Edition / Enterprise Edition (Tested & Certified for Production)
OpsCenter: Standard / Enterprise (Alerts, Automated Management Services, Cluster Management)
DevCenter
Drivers/Connectors
Online Documentation
Online Training
Mailing Lists and Forums
Security: Standard / Enterprise (Kerberos Authentication & SSL Encryption)
Built-in Real-time Analytics
Built-in Enterprise Search
In-Memory Database Option
Expert Support (24x7x365)
Consultative Support
Onsite Training
![Page 15: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/15.jpg)
DataStax OpsCenter
• Visual browser-based UI
• Point-and-click administration
• Visual cluster management
• Proactive alerts
• Built-in external notifications
• Visual backup operations
![Page 16: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/16.jpg)
Cassandra Query Language (CQL)
DataStax DevCenter – a free, visual query tool for creating and running CQL statements against Cassandra and DataStax Enterprise.
![Page 17: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/17.jpg)
Cassandra Security
Internal Authentication
• Internal validation of authorized users
• Simple to implement & easy to understand
• No learning curve
Object Permission Management
• Deep control over who can add/change/delete/read data
• Uses familiar GRANT/REVOKE from the relational world
• No learning curve
Client to Node Encryption
• Ensures data cannot be captured/stolen en route to a server
• Data is safe both in flight from/to a database and on the database
• Complete coverage is ensured
![Page 18: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/18.jpg)
DataStax Enterprise Security
External Authentication
• External validation of authorized users
• Leverages Kerberos & LDAP
• Single sign-on to all data domains
Transparent Data Encryption
• Protects sensitive data at rest through on-disk encryption
• No changes needed at application level
• Encrypts both Cassandra and Hadoop data
Data Auditing
• Audit trail of all accesses and changes
• Control to audit only what’s needed
• Uses log4j interface to ensure performance & efficient audit operations
![Page 19: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/19.jpg)
Built-in Enterprise Search
• Delivers Solr integration
• Very fast performance
• Search indexes span multiple data centers (regular Solr cannot)
• Online scalability via adding new nodes
• Built-in failover; continuously available
[Figure: ring of four nodes, each running C* & Solr]
![Page 20: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/20.jpg)
Built-In Enterprise Analytics
• Real-time analytics on Cassandra hot data
• MapReduce, Hive, Pig, Sqoop, and Mahout
• No single points of failure
[Figure: ring of four nodes, each running C* & Hadoop, labeled "Enterprise Analytics: MapReduce, Hive, Pig, More", "Continuous availability", and "Integrated big data platform"]
![Page 21: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/21.jpg)
Agenda
Why Cassandra?
• Scale with ease
• Always on
• Deploy across data centers
Why DataStax Enterprise?
• Enterprise-ready capabilities
• 24x7x365 support
![Page 22: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/22.jpg)
Agenda
Why Cassandra?
• Scale with ease
• Always on
• Deploy across data centers
Why DataStax Enterprise?
• Enterprise-ready capabilities
• 24x7x365 support
How to Evaluate?
![Page 23: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/23.jpg)
Evaluation Process
Download & install binaries or sandbox
Leverage use cases to identify needs
Install DSE/OpsCenter on servers
Design/Modify data model
Implement data model
Load sample data
Stress test servers
Develop application
1) R&D Mode
2) POC Cycle
3) Optimize
Add Nodes (C*, SOLR, and/or Hadoop)
![Page 24: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/24.jpg)
A Typical POC Environment
• Ideally at least 4 nodes, RF=3
• Hardware per node:
  • At least 8 cores
  • At least 16 GB RAM (the more the better)
  • SSD physically attached
  • Linux (ideally 3.x kernel for improved buffer cache)
• Each environment has its own steps/requirements:
  • EC2, Rackspace, Google Compute, other cloud providers
  • In-house servers
  • In-house server VMs
![Page 25: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/25.jpg)
Tailored to Meet Your Needs
FREE Resources
• DSE Sandbox
• DSE for Non-Production
• OpsCenter (Standard)
• DevCenter
• DataStax Academy
• Community Forums
• White Papers & Documentation
PAID Services
• Onsite Consulting
• Remote Consulting
• Onsite Training
• Public Training
PAID Subscription
• Production DSE Max
• Production DSE Pro
• Production DSE Standard
• Non-Production DSE Max
• Non-Production DSE Pro
• Non-Production DSE Standard
PAID Bundles
• Quick Start Enterprise
• Quick Start Standard
Customer Benefits
• Customer Success Manager
• Proactive Guidance
• Free Health Check
• Free Migration Assessment
• Monthly Bulletin Best Practices
![Page 26: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/26.jpg)
The Right Mix of Support Resources
Education & Training Planning & Design Develop & Test
Training Consulting Support
How to use DataStax Enterprise
Learn DataStax admin features
How to use integrated search
How to use integrated analytics
DataStax Enterprise architecture
Data modeling with DataStax
Cluster tuning and performance
Best practices and planning
Troubleshooting errors
Experiencing unexpected results
Clarification on documentation
Critical issue support
Production Support
![Page 27: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/27.jpg)
Available Online Resources
• Patrick McFadin’s data modeling series
• CQL/Data modeling on DataStax
• Virtual training
• Java driver sample code
• SOLR documentation and tutorial on DataStax
• Analytics documentation
• GitHub code samples
• Advanced time series best practices
Massively Scale a DB!
![Page 28: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/28.jpg)
Agenda
Why Cassandra?
• Scale with ease
• Always on
• Deploy across data centers
Why DataStax Enterprise?
• Enterprise-ready capabilities
• 24x7x365 support
How to Evaluate?
• Evaluate efficiently
![Page 29: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/29.jpg)
Q&A and Next Steps
Want to learn more about the evaluation process?
• Contact your account manager or email us at
Want access to more Cassandra resources?
• Visit Planet Cassandra at www.planetcassandra.com
![Page 30: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/30.jpg)
Appendix
![Page 31: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/31.jpg)
EC2 Install Process with Linux AMIs
• Read through EC2 production planning: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningEC2_c.html
• Go for i2.2xlarge to i2.4xlarge
• Create a security group: http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/install/installAMIsecurity.html
• Pick a reputable, reliable Linux image to start with, preferably one with the 3.x kernel
• Run through the wizard and start the AMIs
• Install the prerequisites: http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJREJNAabout_c.html
• Install the DSE node (depends on OS): http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/install/installTOC.html
• Follow the "What's next" steps at the bottom of the installation instructions, including configuring the DSE node for a single or multiple data centers (the topology should be planned in advance): http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/deploy/deploySingleDC.html#deploySingleDC or http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/deploy/deployMultiDC.html#deployMultiDC
• Follow and set the recommended production settings: http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html
![Page 32: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/32.jpg)
Cassandra Architecture Basics – One Node
Organizes Data in Partitions
• Inserted data is written to a Commit Log as well as a MemTable
• MemTables are flushed to disk in an SSTable based on size
• SSTables are immutable
• Changes to a partition are written to additional SSTables
• Deletes write tombstones
[Figure: Node 1 holding row data under partition keys 75 and 9]
![Page 33: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/33.jpg)
Background – How Cassandra Stores Data
• Model brought from BigTable*
• Partition key and a lot of cells
• Cell names sorted (UTF8, Int, Timestamp, etc.)
• CQL creates timestamp if not specified
[Figure: a partition key followed by 1 to 2 billion cells, each with a cell name, cell value, timestamp, and TTL]
©2013 DataStax Confidential. Do not distribute without consent. 33
![Page 34: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/34.jpg)
Cassandra Architecture Basics – Multi Data Center
• Nodes can be arranged in multiple data centers
• Cassandra replicates data efficiently between remote data centers
• Each data center can have a different RF
• Use data centers to segment nodes for different query patterns
[Figure: Nodes 1 to 5 holding replicated row data, grouped into data centers labeled Boston and San Francisco, with workloads labeled Real Time and Analytics]
![Page 35: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/35.jpg)
Reading Data
    /* Demonstrate an easy way to query data. */
    // Assumes an open Session named "session" (DataStax Java driver).
    // Uses com.datastax.driver.core.{ResultSet, Row} and
    // com.datastax.driver.core.exceptions.{NoHostAvailableException, QueryExecutionException}.
    try {
        ResultSet result = session.execute(
            "SELECT password FROM user " +
            "WHERE username = 'user2';");
        if (result.isExhausted())
            return;  // no matching row
        Row user = result.one();
        System.out.println("Password is: " + user.getString("password"));
    } catch (NoHostAvailableException ex) {
        System.out.println("No Host Available");
    } catch (QueryExecutionException ex) {
        // Thrown when a valid query cannot be executed, e.g. too few replicas responded.
        System.out.println("Requested consistency level not met");
    }
![Page 36: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/36.jpg)
Prepared Statements
    // Assumes an open Session named "session" (DataStax Java driver).
    PreparedStatement statement = session.prepare(
        "INSERT INTO user (username, password) " +
        "VALUES (?, ?);");
    BoundStatement boundStatement = new BoundStatement(statement);
    try {
        session.execute(boundStatement.bind("user4", "user4password"));
    } catch (NoHostAvailableException ex) {
        System.out.println("Host Not Available");
    } catch (QueryExecutionException ex) {
        // A valid query that could not be executed (unavailable replicas, timeouts).
        System.out.println("Requested consistency level not met");
    } catch (QueryValidationException ex) {
        // A query rejected as syntactically incorrect, invalid, or unauthorized.
        System.out.println("Syntax error, invalid query, or not authorized");
    }
![Page 37: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/37.jpg)
Query-Driven Data Modeling
Start by addressing the queries that you will need to answer
• Your data model should match them directly
Think about:
• The actions your application needs to perform
• How you want to access the data
• What are the use cases?
• What does the data look like?
![Page 38: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/38.jpg)
Queries (cont)
What are you trying to retrieve?
• Does it need to be ordered?
• Is there any nesting of data?
• Do you need to group data?
• Do you need to filter data?
Does data expire?
Does data need to be retrieved in chronological order?
![Page 39: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/39.jpg)
Relational Concept - Denormalization
• Combine table columns into a single view
• No joins
• All in how you set the data for fast reads

Employees

| id | First   | Last  | Dept        |
|----|---------|-------|-------------|
| 1  | Edgar   | Codd  | Engineering |
| 2  | Raymond | Boyce | Math        |

    SELECT First, Last, Dept
    FROM employees
    WHERE id = '1';
![Page 40: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/40.jpg)
Time Series – Patterns
• Examples: medical devices, energy devices/equipment, financial data
• Applications for sensors, clickstreams, historical data
• Typically requires very high-volume writes
• Usually coupled with a need to analyze or search the data using real-time analytics
• Great fit for DSE Cassandra, SOLR, Analytics Nodes

Row layout: one partition per StationID, holding 1…N (Timestamp, Value/s) cells.

| StationID | 20130611T01:01:01 | 20130611T01:01:11 | 20130611T01:01:21 |
|-----------|-------------------|-------------------|-------------------|
| FLGAZ101  | 74.34             | 74.28             | 74.41             |
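The layout above, one partition per station with cells sorted by timestamp, can be mimicked in memory to show why range reads are cheap. The following is a minimal sketch, not DSE code: `TimeSeriesStore` and its methods are hypothetical names, and timestamps are kept as fixed-width strings in the slide's format so that lexicographic order equals chronological order.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class TimeSeriesStore {
    // Partition key (station id) -> cells sorted by timestamp.
    private final Map<String, NavigableMap<String, Double>> partitions = new HashMap<>();

    /** Appends one reading to the station's partition. */
    void insert(String stationId, String timestamp, double value) {
        partitions.computeIfAbsent(stationId, k -> new TreeMap<>()).put(timestamp, value);
    }

    /** Returns the readings for one station between two timestamps, inclusive, in time order. */
    NavigableMap<String, Double> range(String stationId, String from, String to) {
        NavigableMap<String, Double> cells = partitions.get(stationId);
        if (cells == null) {
            return new TreeMap<>();
        }
        return cells.subMap(from, true, to, true);
    }

    public static void main(String[] args) {
        TimeSeriesStore store = new TimeSeriesStore();
        store.insert("FLGAZ101", "20130611T01:01:21", 74.41);
        store.insert("FLGAZ101", "20130611T01:01:01", 74.34);
        store.insert("FLGAZ101", "20130611T01:01:11", 74.28);
        System.out.println(store.range("FLGAZ101", "20130611T01:01:05", "20130611T01:01:30"));
    }
}
```

Because every reading for a station sits in one sorted structure, a time-window query touches a single partition and returns results already in chronological order.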
![Page 41: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/41.jpg)
Hardware
• Ideal node:
  • Processor: 8 CPU cores
  • Memory: 16 - 64 GB RAM, with 8 GB of heap
  • Network: at least a Gigabit card
  • Disks: lots of small disks using JBOD or basic RAID (0 or 10), but prefer SSDs
• Exact needs vary by use case
• Production planning:
  • http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/architecture/architecturePlanningHardware_c.html
![Page 42: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/42.jpg)
Cassandra Query Language (CQL)
• Very similar to RDBMS SQL syntax
• Create objects via DDL (e.g. CREATE…)
• Core DML commands supported: INSERT, UPDATE, DELETE
• Query data with SELECT
• Leverage Java drivers to execute queries via PreparedStatements and ResultSets

    SELECT * FROM USERS
    WHERE STATE = 'TX';
![Page 43: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/43.jpg)
Cassandra is Durable
[Figure: a client write goes to the Commit Log and to a MemTable in memory; MemTables are flushed to disk as SSTables]
• Data is organized into Partitions
• Inserted data is written to a Commit Log for a node as well as a MemTable
• MemTables are flushed to disk in an SSTable based on size
• SSTables are immutable
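The write path described on this slide can be sketched as a toy model. This is an illustration under simplifying assumptions, not how Cassandra is implemented: the flush threshold counts entries rather than bytes, the commit log is never truncated, and reads prefer the newest SSTable instead of reconciling cells by timestamp. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class WritePathSketch {
    final List<String> commitLog = new ArrayList<>();              // append-only log of every write
    TreeMap<String, String> memTable = new TreeMap<>();            // in-memory, sorted by key
    final List<SortedMap<String, String>> ssTables = new ArrayList<>(); // immutable flushed tables
    final int flushThreshold;                                      // entries, standing in for size

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Records the write in the commit log first, then in the MemTable. */
    void write(String key, String value) {
        commitLog.add(key + "=" + value);
        memTable.put(key, value);
        if (memTable.size() >= flushThreshold) {
            flush();
        }
    }

    /** Freezes the current MemTable as a read-only SSTable and starts a new MemTable. */
    void flush() {
        if (memTable.isEmpty()) {
            return;
        }
        ssTables.add(Collections.unmodifiableSortedMap(memTable));
        memTable = new TreeMap<>();
    }

    /** Checks the MemTable first, then SSTables from newest to oldest. */
    String read(String key) {
        if (memTable.containsKey(key)) {
            return memTable.get(key);
        }
        for (int i = ssTables.size() - 1; i >= 0; i--) {
            String value = ssTables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch(2);
        node.write("a", "1");
        node.write("b", "2"); // reaches the threshold and triggers a flush
        node.write("a", "3"); // the update goes to the new MemTable; the old SSTable is untouched
        System.out.println(node.read("a") + " " + node.read("b") + " " + node.ssTables.size());
    }
}
```

An update never rewrites a flushed SSTable; the newer value simply shadows the older one, which is why changes to a partition end up spread across additional SSTables.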
![Page 44: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/44.jpg)
Overview of Replication in Cassandra
• Replication is controlled by the replication factor (RF). A replication factor of 1 means there is only one copy of a row in a cluster. A replication factor of 2 means there are two copies of a row stored in a cluster
• Replication is controlled at the keyspace level in Cassandra
• The RF determines how many additional nodes get a copy of the partition
[Figure: ring with RF=3, showing the original row on one node and a copy of the row on each of two other nodes]
![Page 45: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/45.jpg)
Data Model
• The schema used in Cassandra is modeled after Google Bigtable. It is a row-oriented, column structure
• A keyspace is akin to a database in the RDBMS world
• A column family is similar to an RDBMS table but is more flexible/dynamic
• A row in a column family is indexed by its key
[Figure: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB]
![Page 46: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/46.jpg)
Tunable Data Consistency
• Choose between strong and eventual consistency (one to all responding) depending on the need
• Can be done on a per-operation basis, and for both reads and writes
• Handles multi-data center operations
Writes: Any, One, Quorum, Local_Quorum, Each_Quorum, All
Reads: One, Quorum, Local_Quorum, Each_Quorum, All
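The arithmetic behind these levels is short enough to sketch. A quorum is a majority of the replicas, and a read is guaranteed to see the latest acknowledged write when the replicas contacted for the write and for the read must overlap, that is, when W + R > RF. The class and method names below are hypothetical, chosen only for illustration.

```java
class ConsistencyMath {
    /** Number of replicas that form a quorum (a majority) for a given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * True when any set of writeReplicas nodes and any set of readReplicas nodes
     * must share at least one node, which gives strong consistency.
     */
    static boolean overlaps(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("RF=" + rf + " quorum=" + q);
        System.out.println("QUORUM write + QUORUM read strong? " + overlaps(q, q, rf));
        System.out.println("ONE write + ONE read strong? " + overlaps(1, 1, rf));
    }
}
```

With RF = 3, Quorum writes and Quorum reads each touch two replicas and always overlap, while One/One can miss the latest write and is only eventually consistent.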
![Page 47: Getting Started with DataStax Enterprise from a Technical Perspective](https://reader038.vdocuments.net/reader038/viewer/2022102922/54c665a44a79592d268b4581/html5/thumbnails/47.jpg)
Thank You