cassandra summit 2014: cyanite, better graphite storage with apache cassandra
DESCRIPTION
Presenter: Pierre-Yves Ritschard, CTO at Exoscale. Graphite is the go-to tool of sysadmins everywhere to store and retrieve time-series data. Cyanite is an alternative Graphite-compatible daemon which uses Cassandra as its main storage engine. The talk will focus on how to build efficient time-series data models in Cassandra and how the ecosystem of tools around Cassandra can help in processing time series in batches, and will provide architectural insight into how to build truly scalable time-series pipelines.
TRANSCRIPT
BETTER GRAPHITE STORAGE WITH CYANITE
PIERRE-YVES RITSCHARD, @PYR
#CASSANDRASUMMIT
@PYR
CTO at exoscale, the safe home for your cloud applications
Open source developer: pithos, cyanite, riemann, collectd…
Recovering Operations Engineer
AIM OF THIS TALK
Presenting graphite and its ecosystem
Presenting cyanite
Showcasing simplicity through cassandra
OUTLINE
Graphite overview
The problem with graphite
Cyanite solutions & internals
Looking forward
GRAPHITE OVERVIEW
FROM THE SITE
Graphite does two things:
1. Store numeric time-series data
2. Render graphs of this data on demand
http://graphite.readthedocs.org
SCOPE
A metrics tool
Not a complete monitoring solution
Interacts with metric submission tools
WHY ARE METRICS IMPORTANT?
Outside the scope of this talk
Narrowing the gap between map and territory
GRAPHITE COMPONENTS
whisper
carbon
graphite-web
WHISPER
RRD-like storage library
Written in Python
Each file contains different roll-up periods and an aggregation method
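To make "roll-up period plus aggregation method" concrete, here is a minimal sketch, assuming points held in an in-memory list rather than whisper's on-disk archives: raw (timestamp, value) points are grouped into fixed-width buckets and each bucket is reduced with one aggregation function. The function name `roll_up` is hypothetical; the method names mirror whisper's average, sum, last, max and min.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

# Reduction functions named after whisper's aggregation methods.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "average": mean,
    "sum": sum,
    "last": lambda values: values[-1],
    "max": max,
    "min": min,
}

def roll_up(points: Iterable[Tuple[int, float]], seconds_per_point: int,
            method: str = "average") -> List[Tuple[int, float]]:
    """Hypothetical helper: aggregate (timestamp, value) points into coarser buckets."""
    aggregate = AGGREGATORS[method]
    buckets: Dict[int, List[float]] = {}
    for timestamp, value in sorted(points):
        # Align each timestamp down to the start of its bucket.
        bucket = timestamp - (timestamp % seconds_per_point)
        buckets.setdefault(bucket, []).append(value)
    return [(bucket, aggregate(values)) for bucket, values in sorted(buckets.items())]

raw = [(0, 1.0), (10, 3.0), (20, 5.0), (60, 2.0), (70, 4.0)]
print(roll_up(raw, 60))         # [(0, 3.0), (60, 3.0)]
print(roll_up(raw, 60, "sum"))  # [(0, 9.0), (60, 6.0)]
```

The choice of method matters: averaging suits gauges such as load, while counters usually want `sum` so totals survive the roll-up.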
CARBON
Asynchronous (twisted) TCP and UDP service to input time-series data
Simple storage rules
Split across several daemons
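Carbon's simplest input is its line-oriented plaintext protocol, one `<metric path> <value> <timestamp>` line per data point, conventionally on TCP port 2003 (the port used in the Cyanite configuration later in this deck). The sketch below parses and re-serializes such lines; it is an illustration, not carbon's own parser, and the `Metric`, `parse_plaintext_line` and `format_plaintext_line` names are hypothetical.

```python
from typing import NamedTuple

class Metric(NamedTuple):
    path: str
    value: float
    timestamp: int

def parse_plaintext_line(line: str) -> Metric:
    """Parse one plaintext-protocol line: '<path> <value> <timestamp>'."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
    path, value, timestamp = fields
    # Tolerate clients that send fractional epoch timestamps.
    return Metric(path, float(value), int(float(timestamp)))

def format_plaintext_line(metric: Metric) -> str:
    """Serialize a metric back to the wire format, newline-terminated."""
    return f"{metric.path} {metric.value} {metric.timestamp}\n"

m = parse_plaintext_line("collectd.web01.cpu-0.idle 97.5 1400000000\n")
print(m)  # Metric(path='collectd.web01.cpu-0.idle', value=97.5, timestamp=1400000000)
print(format_plaintext_line(m), end="")  # collectd.web01.cpu-0.idle 97.5 1400000000
```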
CARBON-CACHE
Main carbon daemon
Temporarily caches values in RAM
Writes out to whisper
CARBON-AGGREGATOR
Aggregates data and forwards to carbon-cache
Less I/O strain on the filesystem
At the expense of resolution
CARBON-RELAY
Provides sharding and replication
Forwards to appropriate carbon-cache processes based on a provided hashing method
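One hashing method carbon-relay can use is consistent hashing: destinations are placed on a ring and each metric path goes to the next destination clockwise. The following is a generic consistent-hashing sketch written for illustration, not carbon's implementation; the `HashRing` class, the md5-based positions and the 100 virtual points per node are all assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple

class HashRing:
    """Generic consistent-hash ring; each node owns several virtual points."""

    def __init__(self, nodes: List[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self.ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(key: str) -> int:
        # md5 only spreads keys evenly around the ring; no security role here.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._position(f"{node}:{i}"), node))

    def get_node(self, metric_path: str) -> str:
        # First virtual point at or after the path's position, wrapping around.
        index = bisect.bisect_left(self.ring, (self._position(metric_path), ""))
        return self.ring[index % len(self.ring)][1]

paths = [f"collectd.web{i:03d}.load" for i in range(1000)]
ring = HashRing(["carbon-a", "carbon-b", "carbon-c"])
before = {p: ring.get_node(p) for p in paths}
ring.add_node("carbon-d")
moved = [p for p in paths if ring.get_node(p) != before[p]]
print(f"{len(moved)} of {len(paths)} paths moved")
```

Adding a node only relocates the paths that now hash to it, but with whisper the files behind those paths still have to be copied over by hand, which is part of why topology changes are painful with graphite.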
GRAPHITE-WEB
Simple Django-based HTTP API
Persists configuration to SQL
Data query and manipulation through a very simple DSL
Graph rendering
Composer client interface to build graphs
## sum CPU values
sumSeries("collectd.web01.cpu-*")
## provide memory percentage
alias(asPercent(web01.mem.used, sumSeries(web01.mem.*)), "mem percent")
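The pointwise semantics behind these two expressions can be sketched as follows, assuming every series is already aligned to the same time slots (graphite-web's real functions work on `TimeSeries` objects with start, end and step, and handle consolidation). Missing values are `None`; this sketch skips them when summing, roughly as graphite does, and the function names are hypothetical.

```python
from typing import List, Optional

Series = List[Optional[float]]  # one value per time slot; None means "no data"

def sum_series(*series: Series) -> Series:
    """Pointwise sum that skips missing values (None if every input is missing)."""
    result: Series = []
    for values in zip(*series):
        present = [v for v in values if v is not None]
        result.append(sum(present) if present else None)
    return result

def as_percent(series: Series, total: Series) -> Series:
    """Pointwise series / total * 100, None where either side is missing or zero."""
    return [
        None if v is None or t is None or t == 0 else v / t * 100
        for v, t in zip(series, total)
    ]

used = [512.0, 768.0, None]
free = [1536.0, 1280.0, 2048.0]
total = sum_series(used, free)
print(total)                    # [2048.0, 2048.0, 2048.0]
print(as_percent(used, total))  # [25.0, 37.5, None]
```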
SCREENSHOTS
ARCHITECTURE OVERVIEW
MODULARITY IN GRAPHITE
Recently improved
A module can implement a storage strategy for graphite-web
Carbon modularity is a bit harder
THE GRAPHITE ECOSYSTEM
A wealth of tools are now graphite-compatible
STATSD
Very popular metric service to integrate into applications
Aggregates events in n-second windows
Ships them off to graphite
statsd.increment 'session.open'
statsd.gauge 'session.active', 370
statsd.timing 'pdf.convert', 320
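On the wire, each of these calls becomes one small UDP datagram in statsd's text format, `<bucket>:<value>|<type>`, with `c` for counters, `g` for gauges and `ms` for timers, plus an optional `|@<rate>` sample rate. The sketch below builds those payloads; `statsd_line` and `send` are hypothetical helpers, and 8125 is assumed as the conventional statsd port.

```python
import socket
from typing import Optional

def statsd_line(bucket: str, value: float, metric_type: str,
                sample_rate: Optional[float] = None) -> str:
    """Build one statsd payload: '<bucket>:<value>|<type>[|@<rate>]'."""
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported metric type: {metric_type}")
    line = f"{bucket}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line

def send(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    # statsd listens on UDP: fire-and-forget, no reply is expected.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))

print(statsd_line("session.open", 1, "c"))      # session.open:1|c
print(statsd_line("session.active", 370, "g"))  # session.active:370|g
print(statsd_line("pdf.convert", 320, "ms"))    # pdf.convert:320|ms
```

Using UDP keeps instrumentation cheap for the application: a lost datagram costs one sample, never a blocked request.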
COLLECTD
Very popular collection daemon with a graphite destination
Every conceivable system metric
A wealth of additional metric sources (such as a fast statsd server)
<plugin write_graphite>
  <carbon>
    Host "graphite-host"
  </carbon>
</plugin>
GRAPHITE-API
Alternative to graphite-web
Shares data manipulation code
No persistence of configuration
GRAFANA
Increasingly popular alternative to graphite-web, with graphite-api
Inspired by the kibana project for logstash
Optional persistence of configuration to elasticsearch
RIEMANN
Distributed system monitoring solution
(def graph! (graphite {:host "graphite-server"}))
(streams (where (service "http.404") (rate 5 graph!)))
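Roughly speaking, `(rate 5 graph!)` sums the metric of every matching `http.404` event over 5-second windows, divides by the window length, and forwards the resulting per-second rate to graphite. The batch function below is a hypothetical approximation over a list of (time, metric) pairs; Riemann itself does this incrementally on a wall-clock timer, and the name `windowed_rate` is invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def windowed_rate(events: Iterable[Tuple[float, float]],
                  interval: int = 5) -> List[Tuple[int, float]]:
    """Sum event metrics per `interval`-second window and divide by the interval."""
    sums: Dict[int, float] = defaultdict(float)
    for time, metric in events:
        window = int(time // interval) * interval  # start of the event's window
        sums[window] += metric
    return [(window, total / interval) for window, total in sorted(sums.items())]

# Eight http.404 events (metric 1 each) spread over two 5-second windows.
events = [(0.5, 1), (1.2, 1), (3.9, 1), (4.0, 1), (5.1, 1), (7.7, 1), (8.0, 1), (9.9, 1)]
print(windowed_rate(events))  # [(0, 0.8), (5, 0.8)]
```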
AND A LOT MORE
syslog-ng
logstash
descartes
tasseo
jmxtrans
HIGH VALUE PROJECT
Active and friendly developer community
Growing ecosystem
Very few contenders
THE PROBLEM WITH GRAPHITE
ESSENTIALLY A SINGLE-HOST SOLUTION
Built in the days when cacti reigned
Innovative project at the time, which decoupled collection from storage and display
THE WHISPER FILE FORMAT
One file per metric
Optimized for space, not speed
Plenty of seeks
Only shared storage option is NFS…
In many ways, it can be seen as RRD in Python
SCALING STRATEGIES
Tacked on after the fact
The decoupled architecture means that both graphite-web and carbon need upfront knowledge of shard locations
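A back-of-the-envelope sketch of why whisper trades speed for bounded space: each metric's file is allocated at creation for its full retention, using fixed-size big-endian records (a 16-byte header, 12 bytes per archive descriptor, 12 bytes per timestamp/value point). Every update then writes at a computed offset inside that file, which is where the seeks come from. The `whisper_file_size` helper is hypothetical; the struct formats match whisper's header, archive-info and point layouts.

```python
import struct
from typing import List, Tuple

# Header: aggregation type, max retention, xFilesFactor, archive count.
METADATA_SIZE = struct.calcsize("!2LfL")    # 16 bytes
# Per archive: offset, seconds per point, number of points.
ARCHIVE_INFO_SIZE = struct.calcsize("!3L")  # 12 bytes
# Per point: 32-bit timestamp plus 64-bit double value.
POINT_SIZE = struct.calcsize("!Ld")         # 12 bytes

def whisper_file_size(archives: List[Tuple[int, int]]) -> int:
    """Bytes needed for (seconds_per_point, points) archives in one whisper file."""
    header = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)
    return header + sum(points * POINT_SIZE for _, points in archives)

# 10-second points for a week plus 10-minute points for two years.
print(whisper_file_size([(10, 60480), (600, 105120)]))  # 1987240
```

At roughly 2 MB per metric, 10,000 metrics with that retention occupy about 20 GB on disk from day one, spread across 10,000 files.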
SCALING OVERVIEW
IT GETS A BIT HAIRY
Cluster topology must be stored on all nodes
Manual replication mechanism (through carbon-relay)
Changing cluster topology means re-assigning shards by hand
WHAT GRAPHITE CAN KEEP
Persistence of configuration
Local data manipulation
WHAT GRAPHITE WOULD NEED
Automatic shard assignment
Replication
Easy management
Easy cluster topology changes (horizontal scalability)
THE CYANITE APPROACH
Leveraging Apache Cassandra to store time-series
Leveraging Graphite for the interface
A CASSANDRA-BACKED CARBON REPLACEMENT
Written in clojure
Async I/O
No more whisper files
Fast storage
Horizontally scalable
Interfaced with graphite-web through graphite-cyanite
CYANITE DUTIES
Providing graphite-compatible input methods (carbon listeners)
Providing a way to retrieve metric names and metric time-series
Implemented as two protocols:
A metric-store
A path-store
The rest is up to the graphite ecosystem, through graphite-cyanite
The recommended companion is graphite-api
GETTING UP AND RUNNING
A simple configuration file
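Cyanite's own protocol definitions are not shown in the talk, so the following is only a sketch of the split of responsibilities between the two protocols named above: a metric-store that writes and reads points, and a path-store that indexes metric names and resolves graphite globs such as `collectd.web01.cpu-*`. All names (`MetricStore`, `PathStore`, `insert`, `fetch`, `register`, `find`) are hypothetical, and the in-memory classes stand in for the Cassandra-backed ones. Glob support is simplified to `*`, `?` and `[...]` per dot-separated segment; graphite's `{a,b}` alternatives are not handled.

```python
import bisect
from fnmatch import fnmatchcase
from typing import Dict, List, Protocol, Set, Tuple

Point = Tuple[int, float]  # (timestamp, value)

class MetricStore(Protocol):
    """Hypothetical interface: persist and read back time-series points."""
    def insert(self, path: str, timestamp: int, value: float) -> None: ...
    def fetch(self, path: str, start: int, end: int) -> List[Point]: ...

class PathStore(Protocol):
    """Hypothetical interface: index metric names and resolve graphite globs."""
    def register(self, path: str) -> None: ...
    def find(self, pattern: str) -> List[str]: ...

class MemoryMetricStore:
    def __init__(self) -> None:
        self.series: Dict[str, List[Point]] = {}

    def insert(self, path: str, timestamp: int, value: float) -> None:
        # Keep each series sorted by timestamp so range reads are a slice.
        bisect.insort(self.series.setdefault(path, []), (timestamp, value))

    def fetch(self, path: str, start: int, end: int) -> List[Point]:
        points = self.series.get(path, [])
        lo = bisect.bisect_left(points, (start, float("-inf")))
        hi = bisect.bisect_right(points, (end, float("inf")))
        return points[lo:hi]

class MemoryPathStore:
    def __init__(self) -> None:
        self.paths: Set[str] = set()

    def register(self, path: str) -> None:
        self.paths.add(path)

    def find(self, pattern: str) -> List[str]:
        # Match segment by segment so '*' never crosses a dot.
        wanted = pattern.split(".")
        matches = []
        for path in self.paths:
            segments = path.split(".")
            if len(segments) == len(wanted) and all(
                fnmatchcase(seg, pat) for seg, pat in zip(segments, wanted)
            ):
                matches.append(path)
        return sorted(matches)

metrics: MetricStore = MemoryMetricStore()
paths: PathStore = MemoryPathStore()
for path, ts, value in [("collectd.web01.cpu-0.idle", 60, 97.0),
                        ("collectd.web01.cpu-1.idle", 60, 95.0),
                        ("collectd.web01.memory.used", 60, 512.0)]:
    metrics.insert(path, ts, value)
    paths.register(path)

print(paths.find("collectd.web01.cpu-*.idle"))
# ['collectd.web01.cpu-0.idle', 'collectd.web01.cpu-1.idle']
print(metrics.fetch("collectd.web01.cpu-0.idle", 0, 120))  # [(60, 97.0)]
```

Keeping the two concerns apart means the path index can be served from a different store than the points, while a finder on the graphite side only needs these two operations.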
carbon:
  host: "127.0.0.1"
  port: 2003
  readtimeout: 30
  rollups:
    - period: 60480
      rollup: 10
    - period: 105120
      rollup: 600
http:
  host: "0.0.0.0"
  port: 8080
logging:
  level: info
  files:
    - "/var/log/cyanite/cyanite.log"
store:
  cluster: 'localhost'
  keyspace: 'metric'
GRAPHITE-CYANITE
with graphite-web:
STORAGE_FINDERS = (
    'cyanite.CyaniteFinder',
)
CYANITE_URLS = (
    'http://host:port',
)
with graphite-api:
cyanite:
  urls:
    - http://cyanite-host:port
finders:
  - cyanite.CyaniteFinder
LEADING ARCHITECTURE DRIVERS
Simplicity
Optimize for speed
As few moving parts as possible
Multi-tenancy
Resource efficiency
Remain compatible with the graphite ecosystem
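Reading the rollups in the configuration above as "`period` points, each `rollup` seconds wide" (an interpretation the numbers support, not something the slides spell out), the retention of each resolution is a single multiplication: 60480 × 10 s is one week and 105120 × 600 s is 730 days. The sketch also shows one plausible way a reader could pick a resolution for a query, namely the finest rollup whose retention still covers the requested span; that selection rule and the helper names are assumptions.

```python
from datetime import timedelta
from typing import Dict, List, Optional

# The two rollups from the configuration: `period` points, `rollup` seconds each.
ROLLUPS: List[Dict[str, int]] = [
    {"period": 60480, "rollup": 10},
    {"period": 105120, "rollup": 600},
]

def retention(r: Dict[str, int]) -> timedelta:
    """How far back a rollup reaches: number of points times seconds per point."""
    return timedelta(seconds=r["period"] * r["rollup"])

def pick_rollup(rollups: List[Dict[str, int]],
                query_span_seconds: int) -> Optional[Dict[str, int]]:
    """Hypothetical rule: finest rollup whose retention covers the query span."""
    for r in sorted(rollups, key=lambda r: r["rollup"]):
        if r["period"] * r["rollup"] >= query_span_seconds:
            return r
    return None

for r in ROLLUPS:
    print(r["rollup"], "s per point ->", retention(r))
# 10 s per point -> 7 days, 0:00:00
# 600 s per point -> 730 days, 0:00:00
```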
CYANITE INTERNALS
CASSANDRA IS GREAT FOR TIME-SERIES
It bears repeating
High write-to-read ratio workload
No manual shard allocation or reassignment
Sorted wide columns mean efficient retrieval of data
A NEW STACK
SIMPLE SCHEMA
CREATE TABLE "metric" (
  tenant text,
  period int,
  rollup int,
  path text,
  time bigint,
  data list<double>,
  PRIMARY KEY((tenant, period, rollup, path), time)
);
TAKING ADVANTAGE OF WIDE COLUMNS
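To make the wide-column idea concrete, here is a hypothetical in-memory model of the `metric` table: each (tenant, period, rollup, path) partition holds cells sorted by `time`, so a graph query is one partition lookup plus one contiguous slice, with no scatter-gather across nodes. Two details are assumptions rather than something the slides state: timestamps are aligned down to the rollup boundary, and the `data list<double>` column collects raw samples for a slot (an append in the style of CQL's `data = data + [value]` list update) which are averaged when read. The class and method names are invented for illustration.

```python
import bisect
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

PartitionKey = Tuple[str, int, int, str]  # (tenant, period, rollup, path)

class WideRowTable:
    """In-memory stand-in for the metric table: partitions of time-sorted cells."""

    def __init__(self) -> None:
        # Partition key -> sorted clustering keys (time); (key, time) -> samples.
        self.times: Dict[PartitionKey, List[int]] = defaultdict(list)
        self.cells: Dict[Tuple[PartitionKey, int], List[float]] = {}

    def append(self, tenant: str, period: int, rollup: int, path: str,
               timestamp: int, value: float) -> None:
        key = (tenant, period, rollup, path)
        slot = timestamp - timestamp % rollup  # align to the rollup boundary
        if (key, slot) not in self.cells:
            bisect.insort(self.times[key], slot)
            self.cells[(key, slot)] = []
        self.cells[(key, slot)].append(value)  # collect raw samples for the slot

    def range_query(self, key: PartitionKey, start: int,
                    end: int) -> List[Tuple[int, float]]:
        """One partition, one contiguous slice of clustering keys."""
        times = self.times.get(key, [])
        lo, hi = bisect.bisect_left(times, start), bisect.bisect_right(times, end)
        return [(t, mean(self.cells[(key, t)])) for t in times[lo:hi]]

table = WideRowTable()
for ts, v in [(0, 1.0), (4, 3.0), (12, 5.0), (25, 7.0)]:
    table.append("default", 60480, 10, "web01.load", ts, v)

key = ("default", 60480, 10, "web01.load")
print(table.range_query(key, 0, 19))  # [(0, 2.0), (10, 5.0)]
```

Because the rollup is part of the partition key, each resolution of a metric lives in its own partition, and a query at a given resolution never touches the others.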
LOOKING FORWARD
REPLACING MORE GRAPHITE PARTS, EXTENDING FUNCTIONALITY
Implement graphite's data manipulation functions
Remove the need for graphite-api or graphite-web when using grafana
Finish providing multi-tenancy options
PICKLE SUPPORT
Easier integration in existing architectures
Would allow integration with carbon-relay
ALTERNATIVE INPUT METHODS
Support queue input of metrics
Collectd already supports shipping graphite data to Apache Kafka
Support the statsd protocol directly
PROVIDE A CYANITE LIBRARY
Easy, standard-compliant storage from JVM-based applications
BATCH OPERATIONS
Compactions of rolled-up series
Dynamic thresholds
Great opportunity to leverage the cassandra & spark interaction
A FEW TAKE-AWAYS
Cassandra enabled a quick win in about 1100 lines of clojure
Greatly simplified scaling strategy
Building block for a lot more
Good way to reduce technology creep if you're already using cassandra
THANKS !Cyanite owes a lot to:
Max Penet (@mpenet) for the great alia library
Bruno Renie (@brutasse) for graphite-api, graphite-cyanite and the initial nudge
Datastax for the awesome cassandra java-driver
Its contributors
Apache Cassandra, obviously
@pyr – #CassandraSummit