soa & big data

Post on 10-May-2015

Category:

Technology

DESCRIPTION

Some of the challenges and opportunities

TRANSCRIPT

SOA & Big data

Arnon Rotem-Gal-Oz

Sept 2012 – iOS 6 launched with a new maps application

But something went terribly wrong…

http://theamazingios6maps.tumblr.com/

•  It isn't just about getting all the data there

•  Algorithms are cool but we need humans in the loop

•  Hire the right people

•  Test! Test! Test!

It isn't just one pile of data

Integrating Big data & SOA

Yoel Ben Avraham - http://www.flickr.com/photos/epublicist/3546059144/

Data Refinery

Ofer Berger http://www.haifacity.com/allsites/allpic/a/A1738/A1738Pic3326.jpg

•  ETL integration

•  DB integration

•  File-based integration

•  Online integration

Department Server DB

[Figure: a scattered cloud of objects labeled with three-letter codes (ASB, BLT, HDL, AFT, TGI, FRY, ...)]

Object soup

[Figure: the same cloud of three-letter-coded objects (ASB, BLT, HDL, AFT, TGI, FRY, ...)]

Services

Invoices, Customer, Promotions, Orders

[Figure: SOA components and their relations]

Components: Service, Endpoint, Messages, Contracts, Policy, Service consumer

Relations: describes, exposes, sends/receives, binds to, implements, governed by, adheres to, understands, serves

Key: component, relation

Interactions

Customer

Categories

Agents

[Figure: the Saga pattern]

Roles: Service consumer, Initiator, Coordinator*, Participator, Service

Concerns: create context, register/registration, perform activity, compensate, prepare/commit/undo

Relations: protocol, activities and replies

Key: SOA component, pattern component, concern/attribute, relation

Saga
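A minimal sketch of the "perform activity / compensate" idea named in the Saga figure. The `Activity` class, `run_saga` function, and the reserve/charge steps are hypothetical names chosen for illustration; the slides show only the pattern's roles, not an implementation, and a real coordinator would also handle registration and messaging.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Activity:
    """One saga step: an action plus the action that undoes it (hypothetical type)."""
    name: str
    perform: Callable[[], None]
    compensate: Callable[[], None]


def run_saga(activities: List[Activity]) -> bool:
    """Perform activities in order; if one fails, compensate the completed ones in reverse."""
    completed: List[Activity] = []
    for activity in activities:
        try:
            activity.perform()
            completed.append(activity)
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
    return True


log: List[str] = []


def fail() -> None:
    raise RuntimeError("payment service unavailable")


ok = run_saga([
    Activity("reserve", lambda: log.append("reserve"), lambda: log.append("undo reserve")),
    Activity("charge", fail, lambda: log.append("undo charge")),
])
print(ok, log)
```

The failed step itself is not compensated because it never completed; only the earlier, successful steps are undone.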

[Figure: Hadoop cluster data flow]

Source: NIM Interaction Recordings

Stages and stores: ETL, Raw (HDFS), Resolved Interactions (HDFS), Customer (HBase), Interactions (HBase), Categories (HBase), Data Management (HBase), HCatalog

So, what's the problem?

& Big data can't move

Performance of joins in a distributed system sucks!

Node 1: customers A-H, interactions 0-99

Node 2: customers I-M, interactions 100-199

Node 3: customers N-Z, interactions 200-299
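To make the join problem concrete, here is a small sketch that uses the partitioning shown on the slide (customers by surname initial, interactions by ID range) and counts how many interaction-to-customer lookups would have to cross nodes. The function names and the sample pairs are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

# Partitioning taken from the slide.
CUSTOMER_RANGES: Dict[int, Tuple[str, str]] = {1: ("A", "H"), 2: ("I", "M"), 3: ("N", "Z")}
INTERACTION_RANGES: Dict[int, range] = {1: range(0, 100), 2: range(100, 200), 3: range(200, 300)}


def customer_node(surname: str) -> int:
    """Return the node that stores a customer, based on the surname's first letter."""
    initial = surname[0].upper()
    for node, (low, high) in CUSTOMER_RANGES.items():
        if low <= initial <= high:
            return node
    raise ValueError(f"no node for surname {surname!r}")


def interaction_node(interaction_id: int) -> int:
    """Return the node that stores an interaction, based on its ID range."""
    for node, ids in INTERACTION_RANGES.items():
        if interaction_id in ids:
            return node
    raise ValueError(f"no node for interaction {interaction_id}")


def remote_lookups(pairs: List[Tuple[int, str]]) -> int:
    """Count (interaction, customer) pairs whose two sides live on different nodes."""
    return sum(1 for iid, surname in pairs if interaction_node(iid) != customer_node(surname))


# Hypothetical sample: (interaction ID, customer surname)
sample = [(5, "McDonalds"), (42, "Adams"), (150, "Jones"), (250, "Brown")]
remote = remote_lookups(sample)
print(remote)
```

Because the two data sets are partitioned on unrelated keys, a join has no reason to find both sides on the same node, so some lookups become network hops.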

{"Interaction": {
    "id": "5",
    "participants": {
        "customer": [
            {"surname": "McDonalds", "name": "Old"}
        ]
    }
}}
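The document above embeds the customer inside the interaction, which avoids a join at read time. A minimal sketch of building such a denormalized document follows; `build_interaction_document` is a hypothetical helper, and only the field names come from the slide.

```python
import json
from typing import Any, Dict, List


def build_interaction_document(interaction_id: str,
                               customers: List[Dict[str, str]]) -> Dict[str, Any]:
    """Copy customer details into the interaction so reads need no join."""
    return {
        "Interaction": {
            "id": interaction_id,
            "participants": {"customer": [dict(c) for c in customers]},
        }
    }


doc = build_interaction_document("5", [{"surname": "McDonalds", "name": "Old"}])
surnames = [c["surname"] for c in doc["Interaction"]["participants"]["customer"]]
print(json.dumps(doc, indent=2))
```

The trade-off is duplication: if a customer's details change, every interaction document that embeds them has to be updated or tolerated as stale.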

Cookie cutter scalability

Cell architecture

[Figure: identical cells, Node 1, Node 2, Node 3, ... Node N]

Cell Architecture

[Figure: contents of one cell]

Labels: BUS, Categories, Customers, Interactions, ReferenceData, ORCA; data stores: HBase, HDFS
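The slides do not say how requests are assigned to cells. One common option, shown here purely as an assumption, is to hash a partition key (the `tenant-42` key and the `cell-N` names are hypothetical) so that the same key always lands in the same self-contained cell.

```python
import hashlib
from typing import List


def cell_for(key: str, cells: List[str]) -> str:
    """Pick a cell deterministically by hashing the partition key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(cells)
    return cells[index]


cells = ["cell-1", "cell-2", "cell-3"]
first = cell_for("tenant-42", cells)
second = cell_for("tenant-42", cells)
print(first)
```

Plain modulo hashing remaps most keys when a cell is added, so a real deployment would more likely use a lookup table or consistent hashing.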

                 

[Figure: the Orchestration pattern]

Components: Workflow engine, Workflow instance, Endpoint, Service

Concerns: host workflows, manage workflows, monitor workflows, manage process, schedule

Relations: initiate business process, route request, invoke services

Orchestration

Map Reduce processing pipeline

[Figure: Hadoop Map/Reduce job; each step is tagged with a service name in parentheses; steps listed in no particular order]

Map pipeline

Data: Segment Row; Customers local cache; InteractionID, Segment Row

•  Retrieve segment data - create segment document (Interaction)

•  Resolve Customer IDs (Customer)

•  Categorize Segment (Categorization)

•  Update Segment document (Interaction)

•  Write Categories Results (Categorization)

•  Write Interaction (Interaction)

Reduce pipeline

Data: Interaction & Segments

•  Categorize Interaction (Categorization)

•  Update Interaction document (Interaction)

•  Prepare data mart Export (Datamart)

•  Write Categories Results (Categorization)

•  Write Interaction (Interaction)

Hadoop Map/Reduce
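The shape of the pipeline in the figure can be sketched in plain Python: a map step enriches each segment row and emits it keyed by interaction ID, and a reduce step works on all segments of one interaction. This is an in-process illustration, not Hadoop code; the field names, the keyword-based categorization rule, and the cache contents are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical stand-in for the "Customers local cache" in the figure.
CUSTOMERS_CACHE: Dict[str, str] = {"c-1": "Old McDonalds"}


def map_segment(row: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Resolve the customer, categorize the segment, and key it by interaction ID."""
    segment = dict(row)
    segment["customer"] = CUSTOMERS_CACHE.get(row["customer_id"], "unknown")
    segment["category"] = "complaint" if "refund" in row["text"].lower() else "general"
    return row["interaction_id"], segment


def reduce_interaction(interaction_id: str, segments: List[Dict[str, str]]) -> Dict[str, object]:
    """Combine all segments of one interaction into a single interaction summary."""
    categories = sorted({s["category"] for s in segments})
    return {"id": interaction_id, "segments": len(segments), "categories": categories}


def run(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Map every row, group by key (the shuffle), then reduce each group."""
    grouped: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        key, segment = map_segment(row)
        grouped[key].append(segment)
    return [reduce_interaction(k, v) for k, v in sorted(grouped.items())]


rows = [
    {"interaction_id": "5", "customer_id": "c-1", "text": "I want a refund"},
    {"interaction_id": "5", "customer_id": "c-1", "text": "Thanks, bye"},
    {"interaction_id": "7", "customer_id": "c-9", "text": "Opening hours?"},
]
results = run(rows)
print(results)
```

Keeping the customer cache local to the mapper is what lets the map step resolve customer IDs without a cross-node join.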


Data Facets

[Figure: landscape of data store types and example products]

Types: in-memory, data grid, caching, key-value store, columnar, document, graph, indexing, relational, NewSQL, analytics/MPP, distributed file systems

Products: HBase, Hypertable, Cassandra, Accumulo, Neo4j, Pregel, Hama, Apache Solr, Attivio, IndexTank, RavenDB, MongoDB, CouchDB, ScaleBase, VoltDB, Amazon RDS, HP Vertica, EMC Greenplum, IBM Netezza, Microsoft PDW, Aster Data, ParAccel, SAP HANA, Oracle Exadata, Memcached, GigaSpaces, Redis, GridGain, Oracle Coherence, WebSphere eXtreme Scale, Hadoop, GlusterFS

Multi-tiered data

•  Data warehouse (Hadoop/HBase): 20 years detailed; aggregated

•  Datamart(s) (RDBMS): 6-12 months detailed; 1-3 years aggregated

•  Cube (MOLAP): 6-? months aggregated

•  Real-time (in memory): 1-7 days detailed

Data is multi-tiered

Multi-tiered data

•  Data warehouse (Hadoop/HBase): 20 years detailed; aggregated

•  Datamart(s) (Columnar): 6-12 months detailed

•  Real-time: 1-7 days detailed

Data is multi-tiered
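One consequence of tiering is that a query for detailed data has to be sent to the tier that still holds data of that age. A minimal sketch follows, assuming the upper bound of each retention range on the slide (7 days, 12 months, 20 years); the `tier_for` function and the tier names as strings are illustrative only.

```python
from datetime import timedelta
from typing import List, Tuple

# Detailed-data retention per tier, using the upper bounds from the slide (assumption).
TIERS: List[Tuple[str, timedelta]] = [
    ("real-time", timedelta(days=7)),
    ("datamart", timedelta(days=365)),
    ("data warehouse", timedelta(days=365 * 20)),
]


def tier_for(age: timedelta) -> str:
    """Return the fastest tier that still retains detailed data of the given age."""
    for name, max_age in TIERS:
        if age <= max_age:
            return name
    raise ValueError("data is older than the longest retention window")


print(tier_for(timedelta(days=3)), tier_for(timedelta(days=90)), tier_for(timedelta(days=2000)))
```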

SOA leaves us with a lot of isolated data

[Figure: the Aggregated Reporting pattern]

Components: Service, Endpoint, SQL endpoint, Data backend, Landing area (raw data), ODS/DM

Concerns: pull data, subscribed/pulled data, ingest, clean, join, transform, transpose, load, produce reports

Relations: request, report, out

Aggregated Reporting
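The clean, join, and report steps named in the Aggregated Reporting figure can be sketched as a small pipeline over data pulled from two services. The record shapes, the customer and order feeds, and the function names are hypothetical; the slides name the steps but do not define them.

```python
from collections import defaultdict
from typing import Dict, List


def clean(records: List[dict]) -> List[dict]:
    """Drop records that cannot be joined because they lack a customer ID."""
    return [r for r in records if r.get("customer_id") is not None]


def join(orders: List[dict], customers: List[dict]) -> List[dict]:
    """Attach the customer name to every order."""
    names = {c["customer_id"]: c["name"] for c in customers}
    return [{**o, "name": names.get(o["customer_id"], "unknown")} for o in orders]


def report(rows: List[dict]) -> Dict[str, float]:
    """Aggregate order amounts per customer name."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["name"]] += row["amount"]
    return dict(totals)


# Hypothetical data landed from a Customer service and an Orders service.
customers = [{"customer_id": 1, "name": "Old McDonalds"}]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 5.5},
    {"customer_id": None, "amount": 99.0},
]
totals = report(join(clean(orders), clean(customers)))
print(totals)
```

The join happens once, inside the reporting store, so the source services stay autonomous and report consumers never query them directly.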

[Figures: numbered reporting data flows]

Labels: Landing, Raw data, DW/ODS, Views, Transformation service, Load service, Report service, Report tool, Data mart, Raw data (HDFS), Aggregation map/reduce, HBase, ETL (map/reduce + ETL), Drill through REST API, Details, Aggregates

Takeaways

SOA & Big data are better together

Arnon Rotem-Gal-Oz

arnonr@nice.com http://www.nice.com

http://arnon.me/soa-patterns

arnon@rgoarchitects.com http://arnon.me

 

@arnonrgo  
