Five Signs You Have Outgrown Cassandra (and What to Do About It)
Executive Summary
Cassandra is a well-known NoSQL database, maintained under the Apache Foundation and commercialized by a number of companies. While it’s easy for organizations such as yours to start with Cassandra, you are (or soon will be) facing increasingly large costs and complexity of day-to-day operations as your application load grows.
This impacts not only your Line of Business (LOB) budget, but also your operational stability, and further, your customer experience. Your Cassandra infrastructure hampers your organization's ability to be agile, to compete, and to bring new products and services to market. Aerospike, the leading enterprise-grade NoSQL database, can save you 5x or more in Total Cost of Ownership (TCO) while providing proven, unparalleled uptime and availability. Aerospike is used in production and trusted by industry-leading organizations for their mission-critical applications.
Five Signs
What are the five signs that your company may have outgrown Cassandra?
About Aerospike
Founded in 2009, Aerospike has diligently focused on building a mission-critical, highly available, distributed, and record-oriented key-value NoSQL database. Aerospike powers the AdTech industry; its customers include AdForm, Applovin, AppNexus, BlueKai, InMobi, Rubicon Project, TradeDesk, and many others. Aerospike also drives innovation in a number of other sectors, including Telecommunications (with Nokia, HP Enterprise, Airtel, NTT, and Viettel), Financial Services, Gaming (with King, DraftKings, and Curse), and eCommerce (with Williams-Sonoma and Kayak).
Designed and built to exploit the characteristics of Flash/SSD and poised to take advantage of storage class memory, Aerospike provides unprecedented value to its customers. Our technology is driving fundamental changes in how people think about, store, and access their data; it's the key ingredient for building rich, engaging applications and services. Aerospike is driving digital transformation across many industries by enabling our customers to build relevant systems of engagement; this includes better recommendation engines in retail and marketing, fraud prevention in payment processing and cybercrime detection, and billing and service enablement in telecommunications. Aerospike’s combination of extraordinary uptime, high availability, and consistent performance all but eliminates service disruptions for your customers.
Five Signs You Have Outgrown Cassandra
There are many business and technical demands driving your organization: delivering new applications faster, reducing costs, providing a reliable and engaging experience, maintaining your Net Promoter Score, driving digital transformations, and more. How do these map to the signs that you have outgrown your Cassandra cluster?
Sign #1: Your Cassandra Clusters Are Growing at an Unexpected Rate & You’re Worried about TCO
It's a dirty little secret: the NoSQL community and vendors have encouraged you to build big - really big - database clusters. It became a matter of honor to be running thousands of nodes (and in Apple’s case, 75,000 nodes [1]). But what are the consequences of such expansive clusters? In a large cluster, a node failure goes from a theoretical possibility to a daily - if not hourly - occurrence. It’s dozens of hard drives or SSDs per server, over hundreds or thousands of servers. Vendors directly benefit from your big clusters. Ever notice how they price by the node? There’s no incentive for them to reduce the node count.
[1] http://cassandra.apache.org/
To illustrate our point, let’s use the data on drive failures presented by Google at FAST '16. This research showed a failure rate of 1-2% for SSDs and 2-20% for HDDs. What does this mean in practical terms? Let’s start with one server with one drive and work our way up to 75,000 servers each with 4 drives, as you would see in very large deployments. The failure rates in increasingly large server deployments are illustrated in Table 1 below:
Table 1. Observable failure rates in server deployments
The larger the cluster, the more components you have; hence, hardware failure goes from a mere possibility to a routine event that occurs daily, if not hourly. Server sprawl creates more hardware failures that your operations teams need to deal with. By contrast, smaller clusters mean fewer components, which reduces the number of actual failures you must deal with.
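The arithmetic behind this argument can be sketched in a few lines of Python. This is a rough illustration only: it assumes the quoted percentages are annual failure rates and that drives fail independently, neither of which is stated in the text above, and the intermediate cluster sizes are hypothetical.

```python
def expected_failures_per_day(servers, drives_per_server, annual_failure_rate):
    """Expected number of drive failures per day across a deployment.

    Assumes (not stated in the text) that the failure rate is annual and
    that drive failures are independent of one another.
    """
    total_drives = servers * drives_per_server
    return total_drives * annual_failure_rate / 365.0


# Cluster sizes between the two endpoints named in the text are hypothetical.
for servers, drives in [(1, 1), (100, 4), (1_000, 4), (75_000, 4)]:
    ssd_low = expected_failures_per_day(servers, drives, 0.01)
    hdd_high = expected_failures_per_day(servers, drives, 0.20)
    print(f"{servers:>6} servers x {drives} drives: "
          f"{ssd_low:.4f} (SSD, 1%) to {hdd_high:.4f} (HDD, 20%) failures/day")
```

Under these assumptions, 75,000 servers with 4 drives each work out to several drive failures per day even at the lowest SSD rate, and several per hour at the highest HDD rate.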
Cassandra does a great job of horizontal scaling: you simply add more nodes. The more important question is, are you able to fully utilize each node before you need to buy, provision and manage another... and another... and another? You know the answer: you have found that Cassandra cannot fully utilize a database node. Thus, when you hit a resource limit - either CPU, storage IOPS, or DRAM for the JVM heap - your only alternative is to scale out. Yet each node you add creates more complexity. Each node you add also results in significantly greater cost, because Cassandra vendors like to charge by the node. Further, reliability suffers: the law of large numbers means that you will see actual failures, and see them more frequently than you can imagine.
Because Cassandra cannot use the full performance of your servers, you are forced to perform some unnatural acts, best described by the Cassandra community and its committers:
“Instead of scaling the compute side over the metal, we do silly things like run multiple instances per box…” [2]
Indeed, running multiple Cassandra instances per box is so common that DataStax, one of the Cassandra vendors, created a Multi-Instance feature as part of their Enterprise version to automate this deployment topology. However, running multiple instances per server just adds further operational complexity and compounding failure modes when a server goes down.
Why is Cassandra so inefficient with compute resources? The use of Java - with multiple JVM providers, and with a number of garbage collection (GC) strategies (e.g., HotSpot’s CMS, G1, etc.) - creates many variables that developers and ops people can try to optimize and tune [3][4]. To get the most out of a node, you need to carefully read the logs, adjust JVM parameters, debug [5], look at thread dumps [6], etc. Naturally, you need to do so for each different workload and cluster configuration, especially if the hardware is different. And when you upgrade the hardware on your existing cluster? Yes, you need to retune all over again. What if you add a new workload to existing data? You’ve guessed it - you need to retune.
This tuning is difficult, and changes are prone to error [7]. Most ops teams find it easier to expand the cluster using the same configs and hardware profile. It’s a practical - though costly - approach for the Line of Business owner, IT budget owner, or whoever has to pay the bill.
Think back on how you initially sized your cluster. How did you accomplish this task? You followed the best practices from numerous community blogs, the Apache wiki, or documentation from one or more of the vendors. Hopefully, you took into account the additional storage space needed depending on your choice of compaction strategy (STCS vs. LCS). You may have taken into account space for snapshots. Were you then surprised when the application was deployed and used a huge amount of additional storage space? This is where you needed to know with precision which features the application team used when the application was constructed, as noted by one community user:
“...we attempted to use a CQL Map to store analytics data, we saw 30X data size overhead vs. using a simpler storage format and Cassandra’s old storage format, now called COMPACT STORAGE. Ah, that’s where the name comes from: COMPACT, as in small, lightweight. Put another way, Cassandra and CQL’s new default storage format is NOT COMPACT, that is, large and heavyweight.” [8]
[2] https://issues.apache.org/jira/browse/CASSANDRA-7486
[3] https://issues.apache.org/jira/browse/CASSANDRA-8150
[4] https://issues.apache.org/jira/browse/CASSANDRA-7486
[5] https://alexzeng.wordpress.com/2013/05/25/debug-cassandrar-jvm-thread-100-cpu-usage-issue/
[6] https://support.datastax.com/hc/en-us/articles/204226009-Taking-Thread-dumps-to-Troubleshoot-High-CPU-Utilization
[7] https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
[8] http://blog.parsely.com/post/1928/cass/
As we will describe later, how you model data - and which features you use in Cassandra - dramatically affects your utilization, reliability and response times.
Finally, your compaction and backup strategy can also have a huge impact on your CAPEX. Because you are reliant on compaction to reduce storage requirements, you may end up backing up older generations of the data again and again until the compactions can catch up. This may require significant additional storage capacity, as was noted by Rohit Shekhar of Datos.io in his team’s experiments:
“Case in point: [...] secondary storage was as high as 12 times the primary storage for level compaction.” [9]
Sign #2: Peak Loads Are Causing Service Disruptions
Ingesting huge amounts of data - either periodically or as part of the regular usage of the application - can be critical in many applications. However, as the data is written, the application will need to read and modify the same data; such mixed workloads constitute the normal pattern for applications like activity streams, profile stores, trade stores, etc. Write-once workloads, where the data is never modified, like log streams, are not the norm.
If mixed reads and writes are such a common use case, why is this pattern such a problem for Cassandra? Quite simply, this is due to architectural choices made by the designers of Cassandra: namely, its log-structured file system and the eventual consistency of data.
Kyle Kingsbury’s (a.k.a. @aphyr’s) post about Cassandra [10] states that without vector clocks, Cassandra has to rely on a very precise usage model from its users. If users do not adhere to this model, Cassandra will lose acknowledged writes, meaning that there are few guarantees of reading the correct information. As a developer, you can try to code around the problem with various consistency models [11], so you can at least get a quorum across the copies held on the nodes of the cluster for reads and writes. This adds unpredictable latency to any operation, as the operation can only be as fast as the slowest node it must wait for. The most common solution is to cache more of the data to avoid disk reads; this leads to larger clusters and more DRAM, violating many of the tuning guides regarding the size of the JVM heap.
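To make the quorum trade-off concrete, here is a minimal Python sketch. It assumes the usual majority rule (a quorum is more than half of the replicas) and the common overlap condition that read and write replica counts together exceed the replication factor; the function names and latency figures are illustrative and not taken from any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: more than half of the copies."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


def latency_waiting_for(replica_latencies_ms, required_acks: int) -> float:
    """The operation completes when the required_acks-th fastest replica answers."""
    return sorted(replica_latencies_ms)[required_acks - 1]


rf = 3
print(quorum(rf))                        # replicas needed for a quorum
print(overlaps(1, 1, rf))                # single-replica reads and writes
print(overlaps(quorum(rf), quorum(rf), rf))

# Hypothetical per-replica latencies: one replica is busy compacting.
latencies = [2.0, 3.0, 40.0]
print(latency_waiting_for(latencies, 1))
print(latency_waiting_for(latencies, quorum(rf)))
print(latency_waiting_for(latencies, rf))
```

The sketch shows why stronger consistency levels widen latency variance: the more replicas an operation must wait for, the more likely it is to wait on a slow one.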
That’s not the only challenge for mixed workloads, as was noted by the venerable Oracle corporation:
“Cassandra uses consistent hashing over a peer-to-peer architecture where every node in the system can handle any read-write request, so arbitrary nodes become coordinators of requests when they do not actually hold the data involved in the request operation. That means both an extra network hop (minimum) for each call and it means the failure of a single node can have system wide performance impacts as other arbitrary nodes change their behavior in response to the failed node.” [12]
[9] https://datos.io/backup-challenges-cassandra-compaction/
[10] https://aphyr.com/posts/294-jepsen-cassandra
[11] https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
One Cassandra vendor also acknowledged this behavior in their documentation:
“Client read or write requests can be sent to any node in the cluster. When a client connects to a node with a request, that node serves as the coordinator for that particular client operation. The coordinator acts as a proxy between the client application and the nodes that own the data being requested. The coordinator determines which nodes in the ring should get the request based on how the cluster is configured.” [13]
Thus, even in healthy clusters, there are inevitable network hops to service the simplest of requests. These compounding factors lead to a wide variance in read latencies. Choosing a sensible partition key is only a partial solution: it simply limits the number of nodes that must be checked rather than eliminating the need in the first place.
Whenever nodes become unavailable, further memory pressure [14] is applied to the coordinator node, since it needs to keep track of any hinted handoff for writes that will need to be re-applied later. This memory pressure can lead to instability throughout the cluster, as we will see in the next sign.
Compactions add another layer of complexity. In any log-structured merge file system, you need to periodically prune and compress the trees, removing older and redundant versions of the data and cleaning tombstones (deleted records). Cassandra has both major and minor compactions, which the community spends a lot of time figuring out how to tune for the given workload and hardware [15]. This is a significant problem: while compactions run, they adversely affect the read and write latency and throughput of operations, and impact your SLAs (again). You know this when your log files start to get sprinkled with the following types of error messages:
How can you better deal with peak load, then? You will want to expand your Cassandra cluster as part of your regular capacity planning, or to deal with seasonal events like holiday sales. But don’t wait until the last moment to expand your cluster, or you risk being too late. Some guides, such as the Threat Stack Blog, state:
“Budget days to bring a node into the cluster. If you’ve vertically scaled [with fewer large nodes], then it will take over a week.” [16]
[12] http://www.oracle.com/technetwork/database/nosqldb/overview/ondb-cassandra-hbase-2014-2344569.pdf
[13] http://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archIntro.html
[14] http://www.datastax.com/dev/blog/modern-hinted-handoff
[15] https://medium.com/@foundev/how-i-tune-cassandra-compaction-7c16fb0b1d99#.78lo047w7
[16] http://blog.threatstack.com/scaling-cassandra-lessons-learned
And as you expand your Cassandra cluster, expect operational impact and missed SLAs for your application. As one community user remarked:
“The bottom line, is that your queries do have a higher chance of failing before the new node is fully-streamed.” [17]
Sign #3: You’ve Learned to Live With Cascading Failures
For mission-critical systems, availability is the most crucial aspect of a data layer. After all, isn’t availability why you picked AP from the CAP theorem and selected Apache Cassandra in the first place? You’ve chosen a system that has distribution and replication of data, so when a node becomes unavailable - momentarily or permanently - the data lives elsewhere. Right?
Wrong, actually. Data distribution sounds well and good (and is the correct solution), but if a single node outage causes a cascading failure across your cluster, every node becomes a single point of failure for the cluster. And cascading failures are common [18][19], especially when CPU pressure causes nodes to stop self-reporting [20], triggering cluster rebalances that put further CPU and I/O pressure on the surviving nodes. As one Cassandra user noted, “Cassandra seems to have two modes: fine and catastrophic” [21].
The failure of one node has often been observed to cause cascading failures [22][23] across the whole cluster. Research papers have shown these problems to be systemic with Cassandra [24].
What, then, are the typical sources of cascading failures? They include:
• Memory pressure caused by hinted handoff during failover
• Compactions trashing the row cache
• I/O and memory pressure from memtable flushes during high load
• Compactions causing I/O, and thus, CPU pressure
• Compactions not occurring fast enough, causing memory pressure
• Memory use causing frequent garbage collection, and thus, CPU pressure
• A large number of tables, causing memory pressure
Let us tackle each of these in turn.
Memory pressure caused by hinted handoff during failover - As noted by one Cassandra vendor in several pages of their documentation, the cause of this is clear. If you rely on Cassandra’s ability to store writes on a coordinator node to replay later when the designated node returns to the cluster, this functionality consumes large quantities of memory; thus, another node’s outage causes significant memory pressure on all the other nodes:
"If this happens on many nodes at once this could become [sic] substantial memory pressure on the coordinator. So the coordinator tracks how many hints it is currently writing, and if this number gets too high it will temporarily refuse writes (with) whose replicas include the misbehaving nodes." [25]
[17] http://stackoverflow.com/questions/37283424/best-way-to-add-multiple-nodes-to-existing-cassandra-cluster
[18] http://danluu.com/postmortem-lessons/
[19] https://www.usenix.org/conference/osdi14/technical-sessions/presentation/yuan
[20] https://moz.com/devblog/cassandra-in-production-things-we-learned/
[21] http://www.slideshare.net/planetcassandra/pd-melting-cass/12?src=clipshare
[22] http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%[email protected]%3E
[23] http://www.stackdriver.com/post-mortem-october-23-stackdriver-outage/
[24] http://ucare.cs.uchicago.edu/pdf/socc14-cbs.pdf
This is not just a theoretical problem; it's a very real one that’s a function of your data usage and data design with hinted handoff, as was noted by one Cassandra user:
“Serializing the big rows causes high memory pressure…” [26]
Users of Cassandra often recommend disabling hinted handoffs - and thus reducing availability - to avoid cascading failures:
“Don’t use hinted handoffs (ANY or LOCAL_ANY quorum). In fact, just disable them in the configuration. It’s too easy to lose data during a prolonged outage or load spike, and if a node went down because of the load spike you’re just going to pass the problem around the ring, eventually taking multiple or all nodes down” [27]
Compactions trashing the row cache - As noted by another Cassandra vendor, compactions, which increase during a single node outage as greater load is applied on surviving nodes, also increase pressure on the I/O, memory, and CPU of those nodes:
“Cassandra compaction thrashes the [O/S] page cache, because it reads and writes everything, and after compaction the most frequently used data is likely to no longer be in the cache.” [28]
A Cassandra cluster is often sized using assumptions about the effectiveness of the row cache; an ineffective row cache leads to a greater number of connections and transactions in flight. This causes difficulty (at the very least, performance issues) for surviving nodes. The benefits of the cache are negated, however, when the blocks being cached are paged out by another database feature.
I/O and memory pressure from memtable flushes during high load - Flushing memtables is critical because writes are blocked until the flush succeeds [29]. But there is a cascading effect if flushing is not tuned correctly:
“...proper tuning of these thresholds is important in making the most of available system memory, without bringing the node down for lack of memory.” [30]
Indeed, during a node outage, your carefully selected tuning becomes invalid.
[25] http://www.datastax.com/dev/blog/modern-hinted-handoff
[26] http://java.cz/dwn/1003/72451_CassandraCZJUG_horky.pdf
[27] http://blog.threatstack.com/scaling-cassandra-lessons-learned
[28] http://www.scylladb.com/technology/memory/
[29] https://docs.datastax.com/en/cassandra/2.1/cassandra/dml/dml_write_path_c.html
[30] https://wiki.apache.org/cassandra/MemtableThresholds
Compactions causing I/O, and thus, CPU pressure - Choosing Leveled vs. Size-Tiered compactions will dramatically change I/O and CPU pressure. This is a function of the reads and writes at this moment in time: a predominantly read-heavy application will get the adverse effects of a data load job, causing pressure on both I/O and CPU.
As was noted by one vendor:
“Since SSTables are immutable, this process puts a lot of pressure on disk io as SSTables are read from disk, combined and written back to disk.” [31]
Compactions not occurring fast enough, causing memory pressure - The immutable nature of Cassandra's log structure means that the process of compactions is not only inevitable; depending on your data access patterns, it may downright imprison you. Indeed, when “the compaction is not able to complete” [32], this causes unavailability. As was also expressed on Target’s tech blog:
“The nodes would OOM frequently when compacting a specific column family… What I discovered is that Cassandra was reading a lot of tombstones each time, and this was putting lots of extra data on the heap. This would just snowball when the cluster was under load, and blow the heap.” [33]
Memory use causing frequent garbage collection, and thus, CPU pressure - With a Java-based codebase, garbage collection is inevitable and uncontrollable. Tuning is possible, but often described as a “dark art”. Unfortunately, the side effects of garbage collection are real, as users report:
“...garbage collection was happening 20+ times a second, even when Cassandra was under tiny load.” [34]
“In both cases the C* nodes end up doing garbage collection for ~90 secs per sweep” [35]
This impacts the latency of responses and throughput of the system, as vital system resources are used to manage memory.
A large number of tables, causing memory pressure - The way in which you constructed the data schema can also impact the memory pressure, and changes to application design and use can radically change hardware requirements. As noted by Ryan Svihla, a Solution Architect at DataStax:
“There is in general a cluster max effectively [sic] limit on table counts. Anything over 300 starts to create significant heap pressure.” [36]
[31] http://www.planetcassandra.org/blog/impact-of-shared-storage-on-cassandra/
[32] http://stackoverflow.com/questions/29273276/cassandra-node-heap-pressure-during-compaction-after-bulk-load
[33] http://target.github.io/infrastructure/tuning-cassandra
[34] http://target.github.io/infrastructure/tuning-cassandra
[35] http://stackoverflow.com/questions/29273276/cassandra-node-heap-pressure-during-compaction-after-bulk-load
[36] https://medium.com/@foundev/domain-modeling-around-deletes-1cc9b6da0d24#.goi7cxibs
Sign #4: Your Operations Team Is Growing Disproportionately & The Cost of Support Is Concerning
The number of aspects that an operational team must consider at cluster provisioning time is large. This leads to a time-consuming process for provisioning each cluster. Worse, the operations team must continually monitor Cassandra clusters for changes in application patterns, and re-tune the clusters on a frequent basis. Failure to retune leads not only to poor performance, but also (eventually) to CPU pressure, which limits garbage collection capabilities; this causes memory pressure, and in turn, outages.
The community and committers are well aware that Cassandra cannot utilize compute resources effectively, as can be seen on Cassandra’s own issue tracking system:
“Instead of scaling the compute side over the metal, we do silly things like run multiple instances per box. It’s not really silly if it gets results, but it is an example of where we do something tactically, get so used to it as a necessary complexity, and then just keep taking for granted that this is how we do it.” [37]
In order to increase utilization, clusters are forced to get wider, and multiple Cassandra nodes are commonly required on the same compute node. This pattern greatly increases operational complexity, necessitating not just more time from your operations staff to plan and deploy, but also more complex and compounding failure modes to diagnose and fix.
The setup of the JVM and other tuning parameters for the specific workload and hardware means there is no “out of the box” setting that will consistently work. Tuning is inevitable, as the committers of Cassandra themselves have noted:
“...there's a bunch of different workloads and a bunch of different hardware that C* runs on, and the idea of having a default that's optimal for everyone is unrealistic. It may very well be that G1 is a better "good enough" default for most distributions, large heap or no, and that's the conversation on IRC…” [38]
The fact that tuning is a necessity can have a significant impact on day-to-day operations, as Daniel Parker, an Engineer at Target, observed:
“We were having to restart nodes frequently to clear the heap. This was not an option to continue doing, especially pre-peak when we expect nearly 10x the traffic volume.” [39]
[37] https://issues.apache.org/jira/browse/CASSANDRA-7486
[38] https://issues.apache.org/jira/browse/CASSANDRA-10403
[39] http://target.github.io/infrastructure/tuning-cassandra
Sign #5: Hiring Dedicated Cassandra Experts Has Become Unavoidable and Difficult
Technical resources are always hard to find, employ and retain. Finding staff with deep expertise - even Apache code committers - is a step beyond most organizations’ capabilities; yet, this is often recommended to effectively operate Cassandra clusters. Is it truly a badge of honor to have a team of committers like Netflix or Apple [40], or even to employ and motivate a team of “Cassandra whisperers” [41]? Does your business cost structure allow several hundreds of thousands of dollars per year in head-count to maintain free software? Can you justify buying a company [42] just for its Cassandra expertise? As Ted Wallace, VP of Data Delivery at BlueKai, noted:
“We ultimately found out that to do Cassandra you need people who are focused on keeping Cassandra alive and running. We didn’t want to invest in creating a team of ‘Cassandra whisperers’. We didn’t want to be experts at managing Cassandra” [43]
Your operations staff must also understand the differences between the multiple versions of Cassandra, and be able to tune effectively for different versions. Will your staff choose the generic Apache Cassandra distribution, or a vendored version created by your Cassandra distributor? If your team selects a Cassandra distributor, will they provide current releases? Will they keep up with the community for features, bug fixes, vulnerabilities and currency, or will your operations staff need to patch a vendor distribution with fixes from the open source version? Even a core committer like DataStax will defer the migration from Apache Cassandra 2.1 to 3.0 in their enterprise releases; is your organization willing to wait? As narrated by DataStax themselves in 2016:
“Today, it typically takes DataStax four to six months to certify a new, major version of open source Cassandra and ensure it is ready for enterprise deployments. This time may shorten as the tick-tock process drives down defect rates.” [44]
[40] http://www.planetcassandra.org/mvps/
[41] http://www.aerospike.com/blog/bluekai-nosql-speed-scale-simplicity/
[42] http://appleinsider.com/articles/15/03/25/apple-acquires-big-data-analytics-firm-acunu
[43] http://www.aerospike.com/blog/bluekai-nosql-speed-scale-simplicity/
[44] http://web.archive.org/web/20160322221453/http://www.datastax.com/2016/01/comparing-open-source-apache-cassandra-and-datastax-enterprise-release-models
The reality is that it often takes a minimum of eight months to vendor in Apache code. Here is an example of the release timeline for the DataStax Enterprise Edition:
Figure 1. Vendoring in Cassandra releases from Apache
You will be left with a choice between the following options: wait until a patch is vendored in, adopt a newer Apache release yourself (and abandon your support subscription), or fix the code base you have.
Committing changes between distributions is just one more complex task for your operations staff to undertake. Your staff’s harder challenge occurred much earlier in the process - when the application developers designed the schema, chose the primary and partition keys, and decided which features to use. As Juan Valencia, Principal Engineer at ShareThis, offered:
“We've made a lot of mistakes in data modeling over the course of development. Setting up our data models correctly was tricky.” [45]
Distributed counters seem like a reasonable feature for a distributed database, especially when performing real-time analytics, and the application team likely chose them. But as noted by Andrew Montalenti, CTO of Parse.ly:
“When I’m in a good mood, I sometimes ask questions about Counters in the Cassandra IRC channel, and if I’m lucky, long-time developers don’t laugh me out of the room. Sometimes, they just call me a “brave soul”... All of this is to say: Cassandra Counters - it’s a trap! Run!” [46]
Your operations staff can’t rest by just creating a checklist of features used; they must know how the data is used, and what its lifecycle is. Consider something simple like a queue, where you need to maintain order and also expunge data as soon as it’s processed. That simple data model design leads to operational problems down the road, as Cassandra then needs to manage large numbers of tombstones (deleted records). As Ryan Svihla, a Solution Architect at DataStax, remarked:
[45] http://www.informationweek.com/strategic-cio/why-we-picked-cassandra-for-big-data/a/d-id/1318250
[46] http://blog.parsely.com/post/1928/cass/
“You realize that based on your queue workflow instead of 5 records you’ll end up with millions and millions per day for your short lived queue, your query times end up missing SLA and you realize this won’t work for your tiny cluster.” [47]
The use of TTL (Time-To-Live) as a mechanism to remove data after a specified time period is another potential trap. As one Cassandra user observed in their own attempts to use TTLs:
“...[TTL auto-expire] was able to effect a denial of service for all logins through creating a large amount of garbage [tombstone] records. Once the records for these failed logins had expired, all queries to this table started timing out.” [48]
Finally, your operations staff must absorb the consequences of your development staff picking the wrong primary or partition key. A poor choice inevitably ends up with hotspot nodes, which can cause one or more of the following: memory pressure, CPU pressure, or I/O pressure. This leads to the kind of cascading failures described above. One story from the community demonstrates this chain reaction:
“The failing node, in fact, was a hotspot! Because of an error in a primary key of one of a high loaded table! Table's data was not properly distributed across all cluster nodes. And the large portion of data was concentrated on that node! This led to two problems: 1 - significant amount of queries (read/write) were addressed to that node; 2 - huge keys - about ~5 megs per key; These two problems led to node load and, due to huge keys, instability (high pressure on GC).” [49]
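The hotspot effect described in this story can be reproduced with a small Python sketch. It is a simplification under stated assumptions: nodes are chosen by hashing the partition key modulo the node count (real Cassandra clusters use a token ring, not modulo placement), and the key names, node count, and 90% skew are hypothetical.

```python
import hashlib
from collections import Counter


def node_for(partition_key: str, node_count: int) -> int:
    """Map a partition key to a node index.

    Simplified stand-in for token-based placement: hash the key and take
    it modulo the number of nodes.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def load_per_node(partition_keys, node_count: int):
    """Count how many writes land on each node."""
    counts = Counter(node_for(key, node_count) for key in partition_keys)
    return [counts.get(node, 0) for node in range(node_count)]


NODES = 6

# Poor key choice: 90% of 10,000 writes share a single partition key.
skewed_keys = ["tenant-1"] * 9_000 + [f"tenant-{i}" for i in range(2, 1_002)]

# Better key choice: one partition key per user, so writes spread out.
spread_keys = [f"user-{i}" for i in range(10_000)]

print("skewed:", load_per_node(skewed_keys, NODES))
print("spread:", load_per_node(spread_keys, NODES))
```

With the skewed key, one node receives at least 9,000 of the 10,000 writes no matter how many nodes are added, which is why scaling out does not relieve a hotspot caused by the data model.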
[47] https://medium.com/@foundev/domain-modeling-around-deletes-1cc9b6da0d24#.goi7cxibs
[48] https://www.tildedave.com/2014/03/01/application-failure-scenarios-with-cassandra.html
[49] https://www.reddit.com/r/cassandra/comments/3uzlnp/cassandra_high_gc_pressure/
Aerospike: The Enterprise-Grade NoSQL
As noted earlier, Aerospike was designed and built from the ground up to take advantage of modern computing architectures. Aerospike undertook an alternative approach, building from a clean sheet a technology stack and fundamental IP that has enabled its customers to obtain and enjoy an unprecedented low TCO (Total Cost of Ownership), unparalleled uptime, high availability, and reduced database infrastructure complexity. Let's explore how Aerospike has achieved this.
Reduced TCO
Aerospike is written in C by a development team with deep expertise in networking, storage and databases. By removing the layers of file system, block, and page caches, and instead building a proprietary log-structured file system designed for the way flash devices work, Aerospike can deliver unprecedented resource utilization. The bottom line is that Aerospike clusters are sized to have (on average) at least 5 times fewer servers than the equivalent Cassandra cluster. For instance, AdForm was able to decrease its number of nodes from 32 with Cassandra to 3 with Aerospike, and achieve a fourfold expansion of data [50]. Aerospike enabled the company to sustain the identical throughput as with their old Cassandra cluster, but with lower - and consistent - latency, and unmatched availability. As Jakob Bak, AdForm’s CTO, notes:
“With Aerospike, we have been able to drastically cut down on the number of Cassandra servers, which provided a great cost reduction.” [51]
Lower cost. Higher availability. Predictable performance. You get to pick all three with Aerospike. The recently published YCSB benchmark comparing Aerospike and Cassandra shows in great detail how to prove to yourself this reduction in costs. Using the standard YCSB benchmark, we observed the following gains:
Table 2. Summary of YCSB Benchmark, Aerospike vs. Cassandra
[50] https://vimeo.com/101290545 and http://www.aerospike.com/adform-divorces-cassandra-scales-4x-with-2x-reduced-servers/
[51] http://www.aerospike.com/industry/adtech/adform-divorces-cassandra-scales-performance-by-4x-with-2x-fewer-servers/
Table 2 shows Aerospike’s ability to fully utilize the hardware. But it's not just speeds and feeds that prove Aerospike’s superiority. Accordingly, let's illustrate a simple TCO calculation - based on actual Aerospike customer data - with the savings our customers were able to achieve (Table 3):
Table 3. Cost Comparison, Aerospike vs. Cassandra
Table 3 represents a composite of multiple Cassandra replacements derived from actual customer implementations. The table depicts a 3-year TCO comparison - for exactly the same problem set - using a Cassandra solution vs. an Aerospike solution. To generate this table, we first estimated the size of the Cassandra cluster required; we then estimated the size of the required Aerospike cluster under the same assumptions. The existing Cassandra clusters used HDDs, while the Aerospike cluster was sized to use SSDs. Despite the cost difference between both drive types (SSDs cost more), the cumulative 3-year TCO savings obtained by using an Aerospike solution are very clear and real. Even if your Cassandra costs are sunk costs in the short term (for example, because you have pre-purchased a year's worth of instance hours with a cloud provider), switching to Aerospike right now starts the process of saving money. Using the data above, this would still result in a savings of nearly $6M by Year 3.
As Aerospike is a native C implementation, there are no inefficiencies from a Java runtime. The primary key index is stored as a parentless red-black tree, enabling ultra-fast key lookups in DRAM; the data is then retrieved from the proprietary log-structured file system. This file system allows parallelization across all the devices on the chassis; a typical Aerospike node will have 8-12 SSDs - sometimes as many as 16. By reading and writing in parallel to all devices, and removing the block and page caches, Aerospike can fully utilize all the IOPS and disk slots available before running out of CPU.
Aerospike’s implementation is not faster simply because it uses C. It uses performance-tuned libraries, such as re-implementations of msgpack, and specific tested versions of JEMalloc. The code is a highly optimized, reference-counted multithreaded implementation, requiring certain developer skills in which Aerospike specializes.
Predictable Performance
Since Aerospike has master-based replication, operations are forwarded to the primary node for the record. This is equivalent to having a specific coordinator node for each portion of the data. The Primary Key is hashed by a cryptographic algorithm, RIPEMD-160, also used by Bitcoin, which has had zero detected hash collisions in any use. The first twelve bits of this generated hash identify the partition where the record resides.
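As a rough illustration of how twelve bits of a 160-bit digest select one of 4,096 partitions, consider the Python sketch below. Two assumptions are made loudly: SHA-1 is used purely as a stand-in 160-bit hash, because RIPEMD-160 is not guaranteed to be available in every Python build, and "first twelve bits" is read as the twelve most significant bits of the digest, which the text does not specify. The function names are hypothetical and this is not Aerospike's client code.

```python
import hashlib

PARTITION_BITS = 12                      # twelve bits, as stated in the text
PARTITION_COUNT = 1 << PARTITION_BITS    # 4096 possible partitions


def key_digest(key: str) -> bytes:
    """Return a 20-byte (160-bit) digest of the key.

    Stand-in only: SHA-1 replaces RIPEMD-160 here so the sketch runs on
    any Python build; both produce 160-bit digests.
    """
    return hashlib.sha1(key.encode("utf-8")).digest()


def partition_id(digest: bytes) -> int:
    """Take the first twelve bits of the digest as the partition number.

    Assumption: 'first' means most significant bits of the first two bytes.
    """
    first_two_bytes = int.from_bytes(digest[:2], "big")
    return first_two_bytes >> (16 - PARTITION_BITS)


digest = key_digest("user-42")
print(len(digest) * 8, "bit digest ->", "partition", partition_id(digest))
```

Because the partition number is a pure function of the key, any client that knows which node owns each partition can compute the destination of a request locally, without asking an intermediate node.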
Aerospike supports all the popular languages (C/C++, C#, Java, Python, Go, Node.js, PHP, Ruby, Perl, and Erlang) with a vendor-supported language client. These high-performance native clients provide translation of native data types to and from Aerospike, as well as interoperability between languages, greatly improving developer productivity.
Unlike the proxy model used in Cassandra, where requests are rerouted by the Coordinator node, Aerospike clients maintain a dynamic partition map. This identifies the master node for each partition, which enables the client to route the read or write request directly to the correct node without any additional network hops. Because the data is written synchronously to all copies of the data, there is no need to do any form of quorum read across the cluster to get a consistent version of the data. As AdForm’s CTO, Jakob Bak, opines:
“Even more important is the super fast key-value store and extraordinary predictability we get with Aerospike, providing the responsiveness our clients require to compete in the crowded Internet and mobile markets.” [52]
[52] http://www.aerospike.com/industry/adtech/adform-divorces-cassandra-scales-performance-by-4x-with-2x-fewer-servers/
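The client-side routing described above can be sketched as a lookup table from partition to master node. This is an illustrative model only, not the Aerospike client API: the class `PartitionMapClient`, its methods, and the node names are all hypothetical.

```python
from typing import Dict


class PartitionMapClient:
    """Hypothetical client that remembers which node masters each partition."""

    def __init__(self) -> None:
        self._master_of: Dict[int, str] = {}

    def update(self, partition: int, node: str) -> None:
        # Called whenever the cluster reports a change of ownership.
        self._master_of[partition] = node

    def route(self, partition: int) -> str:
        # One dictionary lookup picks the target node: no coordinator hop.
        return self._master_of[partition]


client = PartitionMapClient()
client.update(2748, "node-a")
first_target = client.route(2748)    # request goes straight to node-a

client.update(2748, "node-b")        # ownership moves, the map is refreshed
second_target = client.route(2748)   # later requests go straight to node-b
```

Because the map lives in the client, a change of ownership only requires refreshing the table; the request path itself stays a single hop.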
This approach enables Aerospike to excel at mixed read/write workloads, without the unnecessary complications and latency impact of eventually consistent systems. To this point, Valery Vybornov, Head of R&D at IMHO Vi, notes:
“It met our demands for random access response time (mixed reads/writes) under our typical load of several hundred writes/some thousand reads per second.” [53]
Yet Aerospike is about more than maintaining predictable performance for a short burst of time, or shining in 5-minute benchmarks. Aerospike enables predictable performance over days and months: the type of performance that allows your business and applications to grow seamlessly as you broaden your presence in local and international markets. As Ko Baryimes, Kayak’s SVP of Technology, observes:
“As we continue to rapidly expand into international markets, we needed a solution that was reliable and could scale to serve offers across our network. Aerospike enabled us to achieve multi-key gets in less than 3 milliseconds, deploy with ease and scale with very low jitter.” [54]
Predictable scaling was key to the success of Aerospike at marketing analytics firm [x+1], as their CTO, Patrick DeAngelis, expresses:
“We’ve seen Aerospike scale to a few billion key values with no compromise to performance, and we’ve even seen response times under 1 ms, which is phenomenal.” [55]
[53] http://www.aerospike.com/blog/scaling-to-meet-russias-rapid-internet-ad-trajectory/
[54] http://www.aerospike.com/press-releases/kayak-selects-aerospike-delivers-personalized-offers-at-scale/
[55] http://s3-us-west-1.amazonaws.com/aerospike-fd/wp-content/uploads/2015/02/x-1casestudy_2012.pdf
As we saw above, the recent Cassandra benchmark results showed the overall TCO savings that are possible with Aerospike. But this is just part of the story. A comparison of the variance of Aerospike vs. Cassandra (that is, the range of response times and throughput, rather than a simple average) tells a very interesting tale:
Figure 2. Measured variance in read/write throughput and latency
As Figure 2 illustrates, during the twelve hours that the benchmark ran, throughput and latency for both read and write operations varied hugely for Cassandra (marked in blue). In contrast, this variance remained in a very narrow band for Aerospike (marked in red). What this translates to is consistent response and throughput characteristics for your applications: Aerospike is very predictable, making it trivial for you to meet your application SLAs not just now, but also in the future.
Proven Reliability
Aerospike uses a shared-nothing architecture, where all nodes are peers, without any node having a specific role. It is built with a unique master-based cluster algorithm. For each partition, a single node is the owner, or primary node. If a replication factor is defined, then N other nodes hold a copy of each record in that partition for reliability. If you lose a node, then you have another copy. And unlike other systems, Aerospike writes synchronously across all copies of the data.
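The behavior described above, a primary plus replicas with synchronous writes, can be modeled with a toy sketch. It is an in-memory illustration under simplifying assumptions (no network, no partial failures), and the `Partition` class and node names are hypothetical rather than Aerospike internals.

```python
from typing import Any, Dict, List


class Partition:
    """Toy model: one primary plus replicas, all written before acknowledging."""

    def __init__(self, nodes: List[str]) -> None:
        if not nodes:
            raise ValueError("a partition needs at least one node")
        self.master = nodes[0]  # the first node listed acts as primary
        self.copies: Dict[str, Dict[str, Any]] = {node: {} for node in nodes}

    def write(self, key: str, value: Any) -> bool:
        # Synchronous replication: every copy is updated before the ack.
        for store in self.copies.values():
            store[key] = value
        return True

    def read(self, key: str) -> Any:
        # Reads are served by the current primary; no quorum read is needed
        # because all copies were written before the write was acknowledged.
        return self.copies[self.master][key]

    def fail_node(self, node: str) -> None:
        # Losing a node drops its copy; a surviving replica takes over.
        del self.copies[node]
        if node == self.master:
            if not self.copies:
                raise RuntimeError("no surviving copy of the partition")
            self.master = next(iter(self.copies))


partition = Partition(["node-a", "node-b", "node-c"])
partition.write("user:1234", {"segment": "sports"})
partition.fail_node("node-a")            # the primary is lost
value_after_failure = partition.read("user:1234")
```

In this model the surviving replica already holds every acknowledged write, which is why it can take over as primary without replaying or reconciling data.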
When a node fails or is removed from the cluster, any node that has a secondary copy of the partition can be instantly promoted to be the master of that partition, without the typical delays imposed by a consensus algorithm. Aerospike uses the Paxos algorithm to ensure consensus across the cluster, but since each node is equal in its role and holds the current state of the data, any of the secondary nodes can be promoted. This has allowed an organization like AppNexus to minimize downtime, as expressed by its CTO, Geir Magnusson:
“2.5 million impressions a second at peak, although we can go much higher, and we see north of 90 Billion impressions per day and this is a 24×7 business with 100% uptime with Aerospike.” [56]
Valery Vybornov, Head of R&D at IMHO Vi, has a very similar story:
“We also considered the Aerospike database’s maturity and stability - most notably that it has been running in production non-stop for almost four years.” [57]
Beyond “fair-weather” performance and availability, you also need a system that can deal with human error (e.g., “fat-fingering”) or natural disasters (e.g., the weather). Aerospike has the ability to ensure availability across regions using Cross Datacenter Replication (XDR). During Superstorm Sandy, which hit the East Coast of the United States in 2012, this feature allowed adMarketplace to maintain availability without a hitch, as the company’s CTO, Mike Yudin, narrates:
“We do a 100% uptime. We lost one of our data centers in the flood [Hurricane Sandy], and it’s not just the data center itself that lost power, it’s the entire network infrastructure of the tri-state major area, all the backbones... How did we do this? We do this by having redundant, not only redundant equipment within the data center, but also the globally load-balanced infrastructure across multiple locations. If one gets flooded, then traffic just gets shifted into the data center that survives. The trick here of course is to make sure that your location has all the same data and all the same intelligence as the system that got destroyed.” [58]
[56] http://www.aerospike.com/why-appnexus-uses-aerospike/
[57] http://www.aerospike.com/blog/scaling-to-meet-russias-rapid-internet-ad-trajectory/
[58] http://www.aerospike.com/blog/super-storm-sandy-and-100-uptime/
Peopleware
Hiring and retaining staff is always a critical task in any organization. You need to ensure that the services and applications you build and deploy meet the time to market (TTM) needs of the organization. Yet people costs should never be excluded from this calculation. The long-term care and maintenance of your infrastructure is a real cost which you need to drive down, if only to spend your company's energy on unique business functions, not database maintenance. From an operational perspective, Aerospike radically simplifies running and maintaining a distributed database, as expressed by BlueKai’s VP of Delivery, Ted Wallace:
“It just works and then you move on. We’ve done that repeatedly over the course of the past 2.5 years and it’s always just worked. That’s been awesome.” [59]
Contrast this with the typical advice from the Cassandra community; for instance, from Sam Bisbee, CTO of Threat Stack:
“Don’t understaff Cassandra. This is hard as a startup, but recognize going in that it could require 1 to 2 FTEs as you ramp up, maybe more depending on how quickly you scale up.” [60]
Operational Simplicity
The true value of complex technology is its simplicity of usage, especially in challenging production environments. You do not want to consume time and energy as the cluster expands, or when you refresh the hardware or migrate to a new data center. You want to use your database infrastructure like a utility, adding capacity as and when you need to, without the need to perform extensive planning for maintenance windows. As Tapad’s CEO, Dag Liodden, states:
“Aerospike makes upgrading simple. That’s the beauty of this product. There’s no planning required. You can take servers down, and still have the system running.” [61]
Aerospike achieves this with the operational simplicity of self-forming and self-healing clusters, which are both rack-aware and data center-aware. No downtime is required to add or remove nodes from a cluster; it automatically redistributes partitions of data to ensure the new compute resources are efficiently used. Rack awareness also ensures that data is correctly separated to avoid compounding failures. Using a proprietary algorithm, nodes can be restarted (for example, to perform a software upgrade) while the memory contents are preserved, allowing the process to restart in just seconds without the need to warm the memory buffers and caches, as the data was never evicted from DRAM.
[59] http://www.aerospike.com/blog/bluekai-nosql-speed-scale-simplicity/
[60] http://blog.threatstack.com/scaling-cassandra-lessons-learned
[61] http://s3-us-west-1.amazonaws.com/aerospike-fd/wp-content/uploads/2015/02/Tapad_CaseStudy_101012.pdf
As Ted Wallace from BlueKai opines:
“In the case of Aerospike, when we need more capacity in our cluster, we get a machine ready, add it to the cluster and step away. It just works. That is very empowering and very rewarding when you don’t have to have somebody spend days creating a documented process, to get approvals, you don’t have to send notifications out to your customers because there is downtime because you have to do some massive database maintenance. It just works and then you move on.” [62]
Amey Patil, Big Data Engineer at Crowdfire, adds:
“Due to its master-master replication model, we do not have to worry about rebalancing, failover or recovery! This has definitely pleased our DevOps team. ;-)” [63]
Complex technology does not have to be complex to use and operate. Period.
[62] http://www.aerospike.com/blog/bluekai-nosql-speed-scale-simplicity/
[63] https://crowdfire.engineering/why-we-chose-aerospike-over-other-databases-1dfa2d66a292#.27jfii8t1
Summary
In Table 4 below, we contrast the five signs you’ve outgrown Cassandra with their corresponding solutions using Aerospike:
Table 4. Characteristics of Aerospike vs. Cassandra across key attributes
Aerospike believes in three core principles:
1. Being built for modern computing architectures, and ready for the next generation of hardware
2. Master-based clustering, which means simple scaling and failover
3. Simple Developer Experience (DX)
Being built for modern computing architectures, and ready for the next generation of hardware: By designing early and understanding the deep and significant technology trends, Aerospike has positioned itself as the only NoSQL database system designed and equipped to fully utilize Flash/SSD and storage class memory systems. By being able to fully parallelize reads and writes to storage, Aerospike can drive 12, 14, even 16 SSD/Flash devices per chassis; it runs out of IOPS before you run out of CPU cycles. This is a huge cost savings over trying to use DRAM and cache-based solutions [64]. These characteristics make cloud-based deployments viable for high transactional loads typically only reserved for on-premise deployments. High-memory instances and in-memory solutions are not a cost-effective way to go; you need the ability to blend the speed of DRAM access with the cost effectiveness of Flash/SSD (as Aerospike does). Increasingly, cloud providers are moving to Flash/SSD-based systems to drive better utilization and reduce power and cooling costs versus traditional HDD systems. They are a leading indicator of what compute architectures will look like for everybody in the next 18-24 months. You need a solution designed and optimized for modern computing architectures.
Master-based clustering, which means simple scaling and failover: Availability is a key ingredient for most applications. It's a form of brand insurance, because any time your infrastructure (of which your database is a part) is unavailable, this impacts your brand perception. Whether this manifests as a decline when swiping a credit card because the system cannot process a “rainy day” load, or as the inability to provide suitable recommendations (and the resulting lost opportunity cost), it comes down to brand and perception. It's no longer simply getting an empty or system maintenance page when the website is unavailable; it's all about ensuring your audience remains engaged, regardless of whether your infrastructure is having a rainy day or not. Master-based clustering and Paxos consensus algorithms make up the core technical reasons why Aerospike provides near-instantaneous failover. Aerospike can therefore sustain mixed read/write workloads with ease and with predictable throughput and latency, even on rainy days. This not only satisfies your key availability requirements; it also significantly reduces your overall TCO when compared to eventually consistent systems like Cassandra. As you have seen, Aerospike has unmatched availability. And availability is a core attribute of your customers’ perception of your services and offerings. This is no lab experiment: these are real-world examples of applications handling the most demanding workloads, 24x7.
[64] http://www.aerospike.com/blog/bluekai-flashssd-speed-at-scale/
Simple Developer Experience (DX): Developers have become the early adopters of technology, often choosing a technology before the operations (or “DevOps”) team is aware of a new project. APIs have to be natural, simple and current. You don’t want your vendor to be a laggard, which is why Aerospike published its DX Manifesto in 2015. We support the languages and frameworks you need to build efficient and flexible applications, cutting the time to market that your business is demanding, and ensuring that you remain competitive.
Aerospike is the next-generation, enterprise-grade NoSQL solution. Aerospike has a fundamentally unique architecture that has helped customers like BlueKai, Applovin, ShareThis, AdForm, InMobi, PubMatic, NexTag and Curse cost-effectively convert significant applications from Cassandra to Aerospike. Converting to Aerospike has allowed these organizations to achieve more predictable performance, improve uptime and availability, and significantly decrease TCO.
Our recent YCSB benchmark goes into great detail on the performance gains of an Aerospike solution. If you’re concerned about your Cassandra implementation, contact Aerospike at (+1) 408-462-AERO (408-462-2376), or fill out the form at http://www.aerospike.com/contact-us.
Aerospike is the high-performance NoSQL database that delivers Speed at Scale. Aerospike is purpose-built for the real-time transactional workloads that support mission-critical applications. These workloads have the mandate to deliver informed and immediate decisions for verticals like Financial Services, AdTech, and eCommerce. The unique combination of speed, scale, and reliability can deliver up to 10x performance or 1/10th the cost compared to most other databases.
2525 E. Charleston Road, Suite 201, Mountain View, CA 94043
Tel: 408.462.AERO (2376) www.aerospike.com
© 2017 Aerospike, Inc. All rights reserved. Aerospike and the Aerospike logo are trademarks or registered trademarks of Aerospike. All other names and trademarks are for identification purposes and are the property of their respective owners. (WP04-101317)
Appendix
Product Comparison
Please refer to the chart below for a summary of key product differences between Aerospike and Cassandra: [65]
[65] Platform support through Apache: http://www.planetcassandra.org/cassandra/