slash n: tech talk track 2 – distributed transactions in soa - yogi kulkarni, ganesh subramanian

23
Distributed Transactions in SOA Dealing with data consistency issues Yogi Kulkarni Ganesh Subramanian

Upload: slashn

Post on 07-Dec-2014

2.091 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Distributed Transactions in SOA Dealing with data consistency issues

Yogi Kulkarni

Ganesh Subramanian

Page 2: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

• Experiences from recent project to rebuild Flipkart's Supply

Chain Platform as a SOA

• 20 new services

o Separate databases, data-access only through public API

o HTTP + JSON API

Context

Page 3: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Order

Management

HTTP + JSON API

Fulfillment

HTTP + JSON API

Procurement

HTTP + JSON API

Warehouse

HTTP + JSON API

Doc

HTTP + JSON API

Supplier

HTTP + JSON API

Shipping

HTTP + JSON API

Accounting

HTTP + JSON API

Flipkart Supply Chain: High Level View

Page 4: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

• Strict data consistency requirements

o Cannot afford to lose Orders, Payments,

Invoices, Inventory, Shipments, etc

• Business-transactions involving multi-service

updates

• Have to deal with "partial failures"

Challenges

Page 5: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Failure Scenario 1

Other service "disappears" after it was asked to do something

• Retry?

• What if B had already processed the request?

do_Y

do_X

Service

A

Service

B

?

• B crashed

• Network error

Page 6: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Failure Scenario 2

Caller crashed after the other service did what it was told to do

Caller crashed. On restarting:

- Should it ask B to undo what it did?

- Should it resume from where it crashed?

do_Y

do_X

Service

A

Service

B

Page 7: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

"Transaction" concept?

Distributed Transactions to the Rescue

Types of distributed transactions

- XA transactions using 2-phase commit

- Stateless Queued transactions*

- Stateful Queued transactions*

* From: Transactional Information Systems - Weikum & Vossen http://books.google.co.in/books?id=wV5Ran71zNoC

Keeping data consistent in the face of highly concurrent

access and despite all sorts of failures*

Page 8: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

XA Transactions

Distributed

Transaction

Coordinator

2-phase commit

Application

DB DB

Message

Queue

Resource

Manager

Resource

Manager Resource Manager

Page 9: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

XA Transactions

Pros

o Simple programming model for caller

Cons

o Breaks service encapsulation if underlying DB participates in

the transaction

o Performance issues due to lock hold times

o Can block when coordinator fails

o Service-level 2 PC not standardized or mature e.g. WS-

AtomicTransaction Specification from OASIS Group

Page 10: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Stateless Queued Transactions

A B

DB DB

Transaction Scope:

DB Commit +

enqueue message

1

Transaction Scope:

Dequeue message +

DB Commit

2

Worker Worker

Page 11: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Stateless Queued Transactions

Pros

o Works well for simple request-reply interactions

Cons

o Difficult to deal with complex multi-step transactions

o Performance issues if XA transaction is used between DB &

MQ

o Idempotency checks need to be implemented to prevent

duplicate processing

o Message ordering problem

o 2 "interfaces" - Messages and main HTTP API

Page 12: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Stateful Queued Transactions

Image from: Transactional Information Systems - Weikum & Vossen

Page 13: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Stateful Queued Transactions

Requires a state-machine / workflow-like engine

Pros

o Simplifies implementation of stateful processes spanning multiple services

o Convenient for flows which fork & join

Cons

o State-engine is complex to build

o Open source and commercial tools not a good fit

o BPM, Workflow, ESB tools are integration focussed

o Heavyweight

Page 14: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

What we did

• Abstract out messaging

• Services expose a single HTTP interface for sync and async

• Local transactions & relayer to avoid XA issues

• Idempotency check infrastructure

Make it very simple for a service to participate in asynchronous interactions under such transactional guarantees

Most interactions were one-hop request-reply, so chose Stateless Queued Transactions

Page 15: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Service A Service B Service C

Restbus Server

Restbus Client Restbus Client Restbus Client

MQ

Restbus

Page 16: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Application Business Logic Restbus

Filter Relayer

Database inbound_messages outbound_messages

Restbus Server

Service B

POST /some_resource/:id

{"data":"some data"}

Don't see restbus headers.

Forwarding to App

POST /some_resource/:id

X_RESTBUS_MESSAGE_ID: IDxyz1234

{"data":"some data"}

Msg id `IDxyz1234`was not

processed before.

Adding it to inbound_messages

I have to do an update on Service C !!

Let me write to outbound_messages

Msg id `IDxyz1234`was already

processed.

Sending the stored response back

Found new messages to be relayed.

Order them by group id and send in

sequence to restbus

Restbus was not reachable. I will

retry processing the rest of the

messages in the group later

Sending the stored response back

Restbus Client

Service A

Page 17: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Instance 3 Instance 2

Instance 1

MQ

MQ Connector

HTTP Connector

C1 C2 C4 C3

Msg Handler

Service A Service B Service C

C5 C6

Push the message to

retry queue

Restbus Server

Page 18: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Restbus

MQ

Order

Consumer Payment

Consumer

Order

Management Accounting

Idempotency

Filter Idempotency

Filter

Relayer Relayer

Order DB Payments DB

Inbound

Messages

Inbound

Messages

Outbound

Messages

Outbound

Messages

Send

Message Poll

Messages

Poll

Messages

Insert Message Insert Message

Insert Payment

POST /callback

(Payment Created)

HTTP

POST /payments

HTTP

Get messages to

be relayed

Begin Txn

Insert Order

Insert create

payment

Message

Begin Txn

Push

response to

queue

Commit Txn

Push Messages

HTTP

Commit Txn

Approve

Order

Page 19: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Restbus Features

o Pluggable MQ

o End-to-end group-based message ordering

o Dynamic load balancing of consumers in a cluster

o Multi-level retries and sideline queues

Page 20: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Restbus - Tech Stack

o Netty-based Java server

o JMS - Swiftmq and HornetQ

o Zookeeper

Page 21: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

Restbus at Flipkart

o Live in production for past 4 months

o Backbone of Flipkart's supply chain system

o 20 services, 200 queues, 30 topics

o 3 million messages & events a day

Page 22: Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,  Ganesh Subramanian

o Resiliency because of message-persistence, relay, auto-retries, idempotency

o Monitoring is critical to deal with increased complexity of

async operations

o Service-responsibility paritioning is important to prevent read-

then-write race conditions - "tell-don't-ask"

o Synchronous inter-service update calls == lots of partial

failures

Learnings

o "Orchestrator" services get complex fast

o HTTP is excellent integration glue

o Unified interface

o HTTP semantics directly usable