In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode
TRANSCRIPT
![Page 1: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/1.jpg)
In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode
Sandeep Deshmukh, Ashish Tadose
![Page 2: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/2.jpg)
Project Status
Mentor List
• Ted Dunning: Apache Member, MapR
• Alan Gates: Apache Member, Hortonworks
• Taylor Goetz: Apache Member, Hortonworks
• Justin Mclean: Apache Member, Class Software
• Chris Nauroth: Apache Member, Hortonworks
• Hitesh Shah: Apache Member, Hortonworks
Apex in Apache Incubation Stage
![Page 3: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/3.jpg)
Apache Apex (Incubating) Committer List
Open-sourced in July 2015
Over 50 committers already… and growing…
![Page 4: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/4.jpg)
Apex Platform Overview
Enterprise Edition
![Page 5: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/5.jpg)
Directed Acyclic Graph (DAG)
Application Programming Model
• A Stream is a sequence of data tuples
• An Operator takes one or more input streams, performs computations, and emits one or more output streams
• Each Operator is YOUR custom business logic in Java, or a built-in operator from our open source library
• An Operator has many instances that run in parallel, and each instance is single-threaded
• A Directed Acyclic Graph (DAG) is made up of operators and streams
[Diagram: Operators connected by streams of Tuples, ending in an Output Stream]
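The programming model on this slide can be illustrated with a small, framework-free sketch. This is plain Python and not the Apex API: the `Operator` and `Dag` classes, their methods, and the operator names are hypothetical and exist only for this illustration. Each operator turns one input tuple into zero or more output tuples, and streams are the directed edges between operators.

```python
from collections import defaultdict


class Operator:
    """Hypothetical operator: wraps a function mapping one tuple to a list of tuples."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def process(self, tup):
        return self.fn(tup)


class Dag:
    """Hypothetical DAG: operators are nodes, streams are directed edges."""

    def __init__(self):
        self.operators = {}
        self.streams = defaultdict(list)  # upstream name -> downstream names

    def add_operator(self, op):
        self.operators[op.name] = op
        return op

    def add_stream(self, source, sink):
        self.streams[source.name].append(sink.name)

    def run(self, source, tuples):
        """Push tuples through the graph; return what each leaf operator emitted."""
        results = defaultdict(list)

        def push(name, tup):
            for out in self.operators[name].process(tup):
                downstream = self.streams.get(name, [])
                if not downstream:
                    results[name].append(out)
                for nxt in downstream:
                    push(nxt, out)

        for tup in tuples:
            push(source.name, tup)
        return dict(results)


dag = Dag()
parse = dag.add_operator(Operator("parse", lambda line: line.split()))
upper = dag.add_operator(Operator("upper", lambda word: [word.upper()]))
length = dag.add_operator(Operator("length", lambda word: [len(word)]))
dag.add_stream(parse, upper)
dag.add_stream(parse, length)

output = dag.run(parse, ["in memory", "data grid"])
```

The sketch runs everything in one thread. It does not model the parallel, single-threaded operator instances mentioned above.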
![Page 6: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/6.jpg)
Apex Component Overview
[Diagram labels: Hadoop Edge Node; DT RTS Management Server; CLI; REST API; Hadoop Node; YARN Container; Apex App Master; Streaming Container; Thread 1 … Thread N; Op1, Op2, Op3. Legend: Part of Community Edition]
![Page 7: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/7.jpg)
Apex Features…
• Native Hadoop Integration
• Partitioning and Scaling out
• Advanced Windowing Support
• Stateful Fault-tolerance
• Processing Semantics
• Compute Locality
• Dynamic updates
![Page 8: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/8.jpg)
Apache Apex-Malhar
![Page 9: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/9.jpg)
IMC Components in Apex
• Processing data in-motion
• Preventing data-loss – buffer server
• In-memory data stores for querying data
![Page 10: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/10.jpg)
Why In-Memory Computing?
Typical latencies
![Page 11: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/11.jpg)
Why In-Memory Computing?
In-memory computing will have long term, disruptive impact by radically changing users expectations, application design principles, product's architectures and vendor's strategies
RAM is the new disk, disk the new tape
In-memory computing is the future of computing.. it offers massive not only in TCO reduction but across all four value dimensions: performance, process, process innovation, simplification and flexibility.
![Page 12: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/12.jpg)
In Memory Data Grid - IMDG
What are IMDGs?
• IMDGs host data in memory and distribute it across a cluster of commodity servers
• The main access patterns are key/value access, MapReduce, various forms of HPC-like processing, and limited distributed querying and indexing capabilities
Why are they important?
• Performance – using RAM is faster than using disk
• Extremely high availability of data, by keeping it in memory and in a highly distributed cluster
• Data structure – using a key/value store allows greater flexibility for the application developer; the object store is similar in interface to a typical concurrent hash map
• Scalable data partitioning
• Transactional ACID support
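The idea of a hash-map-like store whose entries are spread over several servers can be sketched as follows. This is a toy model, not Geode code: `PartitionedStore` and its methods are hypothetical names, and each Python dict merely stands in for the memory of one server.

```python
import zlib


class PartitionedStore:
    """Hypothetical in-memory key/value store split across several 'servers'."""

    def __init__(self, num_partitions):
        # Each dict stands in for the memory of one server in the cluster.
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key, default=None):
        return self.partitions[self._partition_for(key)].get(key, default)

    def size(self):
        return sum(len(p) for p in self.partitions)


store = PartitionedStore(num_partitions=4)
for i in range(100):
    store.put(f"user-{i}", {"id": i})

found = store.get("user-42")
missing = store.get("user-999")
total = store.size()
```

The caller sees a single map with `put` and `get`, while the key's hash decides which partition holds the entry. Redundant copies and transactions, which the slide also lists, are left out of this sketch.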
![Page 13: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/13.jpg)
![Page 14: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/14.jpg)
High Level Architecture - Geode
![Page 15: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/15.jpg)
Geode Features
Core Features
• Linear scalability & latency-minimizing data distribution
• Performance-optimized persistence - high availability & durability
• Configurable consistency - region types {partitioned, replicated & local}
• Distributed transactions
• Cluster resilience & failover
Advanced Features
• Server Function Execution - send computation to data
• Asynchronous Events - deliver events to a receiver without impacting the write path
• Continuous Queries & Client subscriptions - useful for refreshing client cache
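"Send computation to data" can be pictured with a minimal sketch: a function runs against each partition where the data lives, and only the small partial results travel back to the caller. The helper `execute_on_partitions` and the sample data are hypothetical and do not use the Geode function execution API.

```python
def execute_on_partitions(partitions, fn, combine):
    """Run fn on each partition locally, then combine the small partial results."""
    partials = [fn(partition) for partition in partitions]
    return combine(partials)


# Three dicts stand in for the data hosted by three servers.
partitions = [
    {"a": 10, "b": 20},
    {"c": 5},
    {"d": 1, "e": 4},
]

total = execute_on_partitions(
    partitions,
    fn=lambda part: sum(part.values()),  # would run next to the data
    combine=sum,                         # would run on the caller
)
```

In this toy version the caller receives three integers, one per partition, and never pulls the five entries across the network.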
![Page 16: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/16.jpg)
![Page 17: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/17.jpg)
Application Patterns
• Caching for speed and scale – Read-through, Write-through, Write-behind
• Geode as the OLTP system of record – Data in-memory for low latency, on disk for durability
• Parallel compute engine
• Real-time analytics
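The three caching modes named on this slide differ in when the backing store is touched. The sketch below is a simplified illustration under stated assumptions: the `Cache` class is hypothetical, a plain dict plays the role of the slower backing store, and none of this is Geode's API.

```python
from collections import deque


class Cache:
    """Hypothetical cache in front of a slower backing store (a dict here)."""

    def __init__(self, backing_store, write_behind=False):
        self.backing_store = backing_store
        self.write_behind = write_behind
        self.entries = {}
        self.pending = deque()  # queued writes, used only in write-behind mode
        self.loads = 0          # how many times the backing store was read

    def get(self, key):
        # Read-through: on a miss, load from the backing store and keep the value.
        if key not in self.entries:
            self.loads += 1
            self.entries[key] = self.backing_store.get(key)
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        if self.write_behind:
            # Write-behind: acknowledge now, persist later.
            self.pending.append((key, value))
        else:
            # Write-through: persist before returning.
            self.backing_store[key] = value

    def flush(self):
        # Drain queued writes into the backing store, oldest first.
        while self.pending:
            key, value = self.pending.popleft()
            self.backing_store[key] = value


database = {"k1": "v1"}

through = Cache(database)
first = through.get("k1")
second = through.get("k1")      # served from memory, no second load
through.put("k2", "v2")         # visible in the database immediately

behind = Cache(database, write_behind=True)
behind.put("k3", "v3")
before_flush = "k3" in database
behind.flush()
after_flush = "k3" in database
```

Write-behind keeps the write path short at the cost of a window in which the backing store lags the cache. Write-through closes that window but makes every write wait for the store.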
![Page 18: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/18.jpg)
Geode Reads With Consistent Latency and CPU
• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers
• Partitioned region with redundancy and 1K data size
[Chart: speedup, latency (ms), and CPU % plotted against the number of server hosts (2, 4, 6, 8, 10)]
![Page 19: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/19.jpg)
Geode 3.5-4.5X Faster Than Cassandra for YCSB
![Page 20: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/20.jpg)
Roadmap
• HDFS persistence
• Off-heap storage
• Lucene indexes
• Spark integration
• Cloud Foundry service
…and other ideas from the Geode community!
![Page 21: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/21.jpg)
Streaming meets In Memory Data Grid
![Page 22: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/22.jpg)
Apex + Geode
Apex Operator check-pointing in Geode store
• Better latency for checkpoint operations than HDFS check-pointing
• Makes the Apex DAG a complete in-memory pipeline
• https://issues.apache.org/jira/browse/APEXCORE-283
Write Apex data streams to Geode store
• Apex output operator implementation which writes data to a Geode region
• Use cases
  • Ingest streaming data in Geode for further processing
  • Store data processed by the Apex pipeline in Geode store to serve user queries
• https://malhar.atlassian.net/projects/MLHR/issues/MLHR-1942
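Checkpointing operator state to an in-memory store can be sketched roughly as below. This is not the APEXCORE-283 implementation and uses neither Apex nor Geode APIs: `InMemoryCheckpointStore` and `CountingOperator` are hypothetical, a dict stands in for a Geode region, and `pickle` is an assumed serialization choice.

```python
import pickle


class InMemoryCheckpointStore:
    """Hypothetical stand-in for a data-grid region holding serialized operator state."""

    def __init__(self):
        self.region = {}

    def save(self, operator_id, window_id, state):
        self.region[(operator_id, window_id)] = pickle.dumps(state)

    def load_latest(self, operator_id):
        windows = [w for (op, w) in self.region if op == operator_id]
        if not windows:
            return None, None
        latest = max(windows)
        return latest, pickle.loads(self.region[(operator_id, latest)])


class CountingOperator:
    """Hypothetical stateful operator that counts the tuples it has seen per key."""

    def __init__(self, operator_id, store):
        self.operator_id = operator_id
        self.store = store
        self.counts = {}

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1

    def end_window(self, window_id):
        # Checkpoint the state at the window boundary.
        self.store.save(self.operator_id, window_id, self.counts)

    def recover(self):
        window_id, state = self.store.load_latest(self.operator_id)
        self.counts = state if state is not None else {}
        return window_id


store = InMemoryCheckpointStore()
op = CountingOperator("counter", store)

for key in ["a", "b", "a"]:
    op.process(key)
op.end_window(1)

op.process("a")  # arrives after the checkpoint; lost in the simulated crash

replacement = CountingOperator("counter", store)
recovered_window = replacement.recover()
```

The replacement operator resumes from the state saved at window 1. Tuples that arrived after that checkpoint are not in the restored state, so a real pipeline would have to replay them from upstream.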
![Page 23: In-Memory Computing, Storage & Analysis: Apache Apex + Apache Geode](https://reader030.vdocuments.net/reader030/viewer/2022021500/58cfc7141a28ab7c6e8b50d9/html5/thumbnails/23.jpg)
Questions???
Thank You…