SOA & Big data
Some of the challenges and opportunities
Arnon Rotem-Gal-Oz
Sept 2012 – iOS 6 launched with a new maps application
But something went terribly wrong…
http://theamazingios6maps.tumblr.com/
• It isn’t just about getting all the data there
• Algorithms are cool but we need humans in the loop
• Hire the right people
• Test! Test! Test!
It isn’t just one pile of data
Integrating Big data & SOA
Yoel Ben Avraham - http://www.flickr.com/photos/epublicist/3546059144/
Data Refinery
Ofer Berger http://www.haifacity.com/allsites/allpic/a/A1738/A1738Pic3326.jpg
ETL integration, DB integration, File-based integration, Online integration
Department Server DB
[Figure: a cloud of objects labeled with arbitrary three-letter codes (ASB, BLT, HDL, AFT, TGI, FRY, and so on)]
Object soup
Services
Invoices
Customer
Promotions
Orders
[Figure: Service. Components: service, endpoint, messages, contracts, policy, service consumer. Relations: describes, exposes, sends/receives, binds to, implements, governed by, adheres to, understands, serves.]
Interactions
Customer
Categories Agents
Integrating Big data & SOA
Yoel Ben Avraham - http://www.flickr.com/photos/epublicist/3546059144/
[Figure: roles: initiator, coordinator, participator, service, service consumer. Labels: create context, register / registration, perform activity, activities and replies, compensate, prepare / commit / undo (protocol). Key: SOA component, pattern component, concern/attribute, relation.]
Saga
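The "perform activity" and "compensate" labels on this slide can be illustrated with a minimal sketch. It is not the implementation behind the slide: the `SagaCoordinator` class, the step tuples, and the stock/payment activities are all hypothetical names chosen for illustration. The idea shown is only that completed activities are remembered and undone in reverse order when a later activity fails.

```python
class SagaCoordinator:
    """Hypothetical coordinator: remembers completed activities so they
    can be compensated in reverse order if a later activity fails."""

    def __init__(self):
        self._completed = []  # (name, compensate) pairs, oldest first

    def run(self, steps):
        """Run (name, activity, compensate) steps; return True on success."""
        for name, activity, compensate in steps:
            try:
                activity()
            except Exception:
                self.compensate_all()
                return False
            self._completed.append((name, compensate))
        return True

    def compensate_all(self):
        """Undo completed activities, most recent first."""
        while self._completed:
            _, compensate = self._completed.pop()
            compensate()


# Hypothetical participants that record what they did.
log = []

def reserve_stock():
    log.append("reserve stock")

def release_stock():
    log.append("release stock")

def charge_card():
    raise RuntimeError("card declined")

def refund_card():
    log.append("refund card")

def run_order_saga():
    log.clear()
    coordinator = SagaCoordinator()
    ok = coordinator.run([
        ("stock", reserve_stock, release_stock),
        ("payment", charge_card, refund_card),
    ])
    return ok, list(log)
```

In this sketch the failed payment is never compensated, because it never completed; only the stock reservation is undone.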
Hadoop Cluster
NIM
Interaction Recordings
ETL
Customer HBase
Raw(HDFS)
Interactions
HBase
Data Management
HBase
HCatalog
Resolved Interactions (HDFS)
Categories HBase
HBase
Resolved Interactions (HDFS)
So, what’s the problem?
& Big data can’t move
Performance of joins in a distributed system sucks!
Node 1
customers A-H
Interactions 0-99
Node 2
customers I-M
Interactions 100-199
Node 3
customers N-Z
Interactions 200-299
{"Interaction": {"id": "5", "participants": {"customer": [{"surname": "McDonalds", "name": "Old"}]}}}
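One way to read the document above is as a denormalized interaction: the customer details are copied into the interaction so that reading it needs no join across nodes. The sketch below shows that idea only. The `customers` lookup, the `customer_ids` field, and the `denormalize` helper are assumptions made for illustration and do not come from the slides.

```python
# Hypothetical customer store, keyed by an assumed customer id.
customers = {"c1": {"surname": "McDonalds", "name": "Old"}}

def denormalize(interaction, customers):
    """Embed copies of the participating customers in the interaction,
    so a reader does not need to join against the customer data."""
    embedded = [dict(customers[cid]) for cid in interaction["customer_ids"]]
    return {
        "Interaction": {
            "id": interaction["id"],
            "participants": {"customer": embedded},
        }
    }

def interaction_document():
    # "customer_ids" is an assumed field on the normalized interaction.
    return denormalize({"id": "5", "customer_ids": ["c1"]}, customers)
```

The trade-off in this sketch is the usual one for denormalization: reads stay local, but a change to a customer has to be propagated to every document that embeds a copy.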
Cookie cutter scalability
Cell architecture
Node 2
Node 3
Node 1
Node N
Cell Architecture
BUS
Categories Customers
Interactions Reference Data
ORCA
…
HBase HDFS HBase
HBase HBase HBase
Initiate business process
Workflow engine
Endpoint
Workflow instance
Invoke services
Manage process
Route request
Host workflows
Schedule
Service
Endpoint Service
Manage workflows
Monitor workflows
Orchestration
Map Reduce processing pipeline
Resolve Customer IDs (Customer)
Categorize Segment (Categorization)
Update Segment document (Interaction)
Map pipeline
Segment Row
Retrieve segment data - create segment document (Interaction)
Write Categories Results (Categorization)
Write Interaction (Interaction)
Customers local cache
InteractionID, Segment Row
Map
Prepare data mart Export (Datamart)
Update Interaction document (Interaction)
Reduce pipeline
Interaction & Segments
Categorize Interaction (Categorization)
Write Categories Results (Categorization)
Write Interaction (Interaction)
Reduce
Write Interaction (Interaction)
Hadoop Map/Reduce
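The shape of the pipeline on this slide (segment rows mapped with a local customers cache, keyed by interaction ID, then reduced per interaction) can be sketched in plain Python. This is a toy, in-process stand-in and not Hadoop code: the row fields, the `CUSTOMER_CACHE` contents, and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Assumed local cache: caller number -> customer id.
CUSTOMER_CACHE = {"555-0100": "cust-1"}

def map_segment(row):
    """Map step: resolve the customer from the local cache and emit
    (interaction id, segment) so segments group by interaction."""
    customer_id = CUSTOMER_CACHE.get(row["caller"], "unknown")
    segment = {"segment_id": row["segment_id"], "customer_id": customer_id}
    return row["interaction_id"], segment

def reduce_interaction(interaction_id, segments):
    """Reduce step: combine all segments of one interaction."""
    return {
        "interaction_id": interaction_id,
        "customers": sorted({s["customer_id"] for s in segments}),
        "segments": sorted(s["segment_id"] for s in segments),
    }

def run_pipeline(rows):
    """Simulate the shuffle between map and reduce with a dictionary."""
    grouped = defaultdict(list)
    for row in rows:
        key, value = map_segment(row)
        grouped[key].append(value)
    return [reduce_interaction(key, grouped[key]) for key in sorted(grouped)]
```

Resolving customers from a cache held next to the mapper is one way to avoid a distributed join at this stage; whether the real pipeline does exactly this is not stated on the slide.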
Data Facets
In-memory
Data grid Columnar
Graph
Indexing
NewSQL
Columnar
Caching
HBase
Hypertable
Neo4j
Apache Solr Attivio
IndexTank
RavenDB
Cassandra
MongoDB
CouchDB
ScaleBase
VoltDB
Amazon RDS
HP Vertica
EMC Greenplum
IBM Netezza
Microsoft PDW
Aster Data
ParAccel
Memcached GigaSpaces Redis GridGain
Oracle Coherence
WebSphere eXtreme Scale
Pregel
Hama
SAP HANA Oracle Exadata
Accumulo
Document Relational
Analytics/MPP
Key-value store
Distributed file systems
Hadoop GlusterFS
Multi-tiered data
Data warehouse (Hadoop/HBase): 20 years detailed, aggregated
Datamart(s) (RDBMS): 6-12 months detailed, 1-3 years aggregated
Cube (MOLAP): 6-? months aggregated
Real-time (in memory): 1-7 days detailed
Data is multi-tiered
Multi-tiered data
Data warehouse (Hadoop/HBase): 20 years detailed, aggregated
Real-time: 1-7 days detailed
Datamart(s) (Columnar): 6-12 months detailed
Data is multi-tiered
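A rough sketch of what multi-tiered data means for a reader of detailed records: the age of a record decides which tier can serve it. The retention limits below take the upper bound of each range on the slide (7 days, 12 months, 20 years) and approximate months and years in days; the `TIERS` table and `tier_for` function are hypothetical names, not part of the original design.

```python
from datetime import date, timedelta

# Assumed retention of detailed data per tier, hottest tier first.
TIERS = [
    (timedelta(days=7), "real-time"),
    (timedelta(days=365), "datamart"),
    (timedelta(days=365 * 20), "data warehouse"),
]

def tier_for(record_date, today):
    """Return the hottest tier that still holds detailed data of this age,
    or None when the record is older than every tier retains."""
    age = today - record_date
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return None
```

For example, `tier_for(date(2012, 9, 28), date(2012, 9, 30))` picks the real-time tier, while a record from several years earlier falls through to the data warehouse.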
SOA leaves us with a lot of isolated data
Subscribed/ pulled data
Pull data
Data backend
Endpoint
Out
Load
Report
Ingest
Clean Join
Transform
Transpose
Produce reports
Report
Endpoint
Request
Raw data
ODS/DM
SQL endpoint
SQL endpoint
Landing area
Service
Aggregated Reporting
Landing
Raw data
DW/ODS
Views
Transformation service
Load service
Report service
Report tool
Data mart
Raw data (HDFS)
Aggregation map/reduce
HBase
ETL (map/reduce + ETL)
Drill through REST API
Details Aggregates
Takeaways
SOA & Big data are better together
Arnon Rotem-Gal-Oz
[email protected] http://www.nice.com
http://arnon.me/soa-patterns
[email protected] http://arnon.me
@arnonrgo