Apache Cassandra: NoSQL in the Enterprise

Post on 01-Nov-2014

TRANSCRIPT

Apache Cassandra: NoSQL in the Enterprise, today

Jonathan Ellis, CTO

@spyced

Cassandra Job Trends (indeed.com)

“Big Data” trend

Why Big Data Matters

Research done by McKinsey & Company shows the eye-opening, 10-year category growth rate differences between businesses that smartly use their big data and those that do not.

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

?

✤ Financial
✤ Social Media
✤ Advertising
✤ Entertainment
✤ Energy
✤ E-tail
✤ Health care
✤ Government

Some users

Common use cases

✤ Time series data
✤ Messaging
✤ Ad tracking
✤ Data mining
✤ User activity streams
✤ User sessions
✤ Anything requiring: scalable + performant + highly available

Why Cassandra?

✤ Fully distributed, no SPOF
✤ Multi-master, multi-DC
✤ Linearly scalable
✤ Larger-than-memory datasets
✤ Best-in-class performance (not just writes!)
✤ Fully durable
✤ Integrated caching
✤ Tuneable consistency
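The "tuneable consistency" bullet can be illustrated with the usual replica-overlap rule: a read is guaranteed to see the latest acknowledged write when the replicas contacted for the read and for the write must overlap, that is, when R + W > N for a replication factor of N. The sketch below is only an illustration of that arithmetic, not Cassandra's implementation. `ONE`, `QUORUM`, and `ALL` are Cassandra consistency level names; the helper functions are hypothetical names chosen for this example.

```python
from enum import Enum


class Level(Enum):
    """A subset of Cassandra's consistency level names."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def replicas_required(level: Level, replication_factor: int) -> int:
    """Number of replicas that must acknowledge a request (hypothetical helper)."""
    if level is Level.ONE:
        return 1
    if level is Level.QUORUM:
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    return replication_factor  # Level.ALL


def reads_see_latest_write(write: Level, read: Level, replication_factor: int) -> bool:
    """True when the read set and the write set must share at least one replica."""
    w = replicas_required(write, replication_factor)
    r = replicas_required(read, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for write, read in [(Level.ONE, Level.ONE),
                        (Level.QUORUM, Level.QUORUM),
                        (Level.ONE, Level.ALL)]:
        print(write.value, read.value, reads_see_latest_write(write, read, 3))
```

The trade-off being tuned is latency and availability against consistency: lower levels wait for fewer replicas, higher levels tolerate fewer unavailable replicas.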

Classic partitioning with SPOF

[Diagram: request router, partitions 1 to 4, one master with two slaves]

Fully distributed, no SPOF

[Diagram: client connected to peer nodes; partition labels p1, p1, p1, p3, p6]
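One way to picture a layout with no single point of failure is a hash ring in which every node is a peer and each key is stored on several consecutive nodes. The sketch below is a simplified illustration under stated assumptions, not Cassandra's actual partitioner: it gives each node a single token, uses MD5 only as a convenient hash, and the `Ring` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (the hash choice is an assumption)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical peer-to-peer ring: no router or master, only nodes ordered by token."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        """Insert a node at its token position; no other node moves."""
        bisect.insort(self._ring, (token(node), node))

    def replicas(self, key):
        """Walk clockwise from the key's token and collect the next distinct nodes."""
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_left(tokens, token(key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("user:42"))
    ring.add_node("node-e")
    print(ring.replicas("user:42"))
```

Because any node can compute `replicas(key)` from the same ring, a client can contact any peer, and adding a node displaces at most one replica per key in this simplified model.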

Performance summary

“With Cassandra, we get better business agility, and we don’t have to plan capacity in advance, we don’t need to ask permission of other people to build things for us, and we don’t worry about running out of space or power.”

Adrian Cockcroft, Cloud Architect

Netflix on Cassandra

✤ Could not build datacenters fast enough
✤ Made decision to go to cloud (AWS)
✤ Applications include Netflix’s subscriber system, A/B testing, and viewing history service

✤ Over a year in, Netflix finds Cassandra to be
  ✤ Fast
  ✤ Cost-effective
  ✤ Scalable
  ✤ Flexible
  ✤ Reliable: no SPOF

“Without Cassandra, our engineers would’ve had to create something that could scale to our needs, that would’ve prevented us from focusing on building product and solving problems for Backupify’s users, which are far more important tasks.”

Matt Conway, VP Engineering

Backupify on Cassandra

✤ Cloud-based utility that enables businesses and consumers to back up, search, and restore the content of popular online applications such as Google Apps, Gmail, Facebook, Twitter, and Blogger

✤ Cassandra findings:
  ✤ Solved scaling, allowing engineers to focus on their business
  ✤ DataStax OpsCenter made it easy to monitor the health and performance of their cluster
  ✤ Reliable, redundant and scalable data storage helped eliminate down-time
  ✤ Ability to offer not only backup and storage, but also analysis

“You can seamlessly add new nodes and expand your total capacity without deteriorating the performance of the data store. Cassandra has allowed us to scale very effectively.”

Harry Robertson, Tech Lead

Ooyala on Cassandra

✤ Ooyala provides a suite of technologies and services that support content owners in managing, analyzing and monetizing the digital video they publish online

✤ Cassandra findings:
  ✤ Classic “Big Data” problem did not require re-architecting
  ✤ Delivered ability to respond to increasingly sophisticated analytic needs of customers
  ✤ Developers spend time building application features, not figuring out how to scale

“Cassandra has allowed us to build bigger features faster and more reliably, while using less money and without needing to expand our staff.”

Kyle Ambroff, Sr. Engineer

Formspring on Cassandra

✤ Users of Formspring engage with and learn more about each other by asking and responding to questions. Close to 4B responses in the system and 30M unique users

✤ Cassandra experience
  ✤ No sharding needed: just add nodes to scale
  ✤ Performance: the popular users with many followers saw no speed reduction. No more memcached!
  ✤ Flexibility of a schema-optional architecture is very developer friendly

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

?

The evolution of Analytics

Analytics + Realtime

The evolution of Analytics

Analytics Realtime

replication

The evolution of Analytics

ETL

Big data

Analytics (Hadoop)

Realtime (“NoSQL”)

DataStax Enterprise

DataStax Enterprise re-unifies realtime and analytics

Portfolio Demo dataflow

Portfolios

Historical Prices

Intermediate Results

Largest loss

Portfolios

Live Prices for today

Largest loss
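The dataflow above (portfolios plus prices, then intermediate results, then largest loss) can be sketched in a few lines. This is a minimal illustration under assumptions, not the demo's actual code: the slides do not say how "largest loss" is defined, so the sketch assumes it means the largest drop in total portfolio value between consecutive days. The function names, tickers, and prices are hypothetical.

```python
def portfolio_values(holdings, prices_by_day):
    """Intermediate result: total portfolio value for each day, in day order.

    holdings maps ticker -> number of shares; prices_by_day is a list of
    dicts mapping ticker -> price, one dict per day (assumed layout).
    """
    return [
        sum(shares * prices[ticker] for ticker, shares in holdings.items())
        for prices in prices_by_day
    ]


def largest_loss(values):
    """Largest drop between consecutive days; 0.0 if the value never falls."""
    drops = [earlier - later for earlier, later in zip(values, values[1:])]
    return max([0.0] + drops)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    holdings = {"AAA": 10, "BBB": 5}
    historical = [
        {"AAA": 10.0, "BBB": 20.0},
        {"AAA": 9.0, "BBB": 22.0},
        {"AAA": 8.0, "BBB": 18.0},
    ]
    values = portfolio_values(holdings, historical)
    print(values, largest_loss(values))
```

The same two functions would apply to the second half of the diagram by passing the live prices for today instead of the historical series.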

Operations

✤ “Vanilla” Hadoop
  ✤ 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Region Server,...)
  ✤ Single points of failure
  ✤ Can't separate online and offline processing

✤ DataStax Enterprise
  ✤ Single, simplified component
  ✤ Self-organizes based on workload
  ✤ Peer to peer
  ✤ JobTracker failover

Managing & Monitoring Big Data

✤ DataStax OpsCenter manages and monitors all Cassandra and Hadoop operations

Questions?
