csis0402 system architecture distributed computing - middleware technology (1) k.p. chow university...

CSIS0402 System Architecture CSIS0402 System Architecture Distributed ComputingDistributed Computing

- - Middleware Middleware Technology (1)Technology (1)

K.P. Chow

University of Hong Kong

Early DaysEarly Days

Mainframe terminal networks Connecting “dumb” green-screen

terminals to mainframe Terminal identified by line id

Programs

Terminal handler

TerminalsDistributed network Each machine can have many

connections (TCP/IP port no.) Each machine is identified by its

IP address

Programs

Network interface

Distributed SystemsDistributed Systems

First distributed system: ARPANET– 1969: 4-node network

– 1972: about 50 computers Communication between companies in the same industry:

– SWIFT (Society for Worldwide Interbank Financial Telecommunication) for international money transfers in the financial industry

– IATA (International Air Transport Association) in the airline industry Vendor specific network architecture (1970s)

– IBM’s System Network Architecture (SNA)

– Sperry’s Distributed Computing Architecture (DCA)

– Burroughs’ Network Architecture (BNA)

– DEC’s Distributed Network Architecture (DNA)

– Basic services: file transfer, remote printing, terminal transfer, remote file access

– Distributed applications: e-mail

Open Systems Interconnection (OSI)Open Systems Interconnection (OSI)

Force IT vendors to implement the same standards, e.g. RS232 and X.25 Use International Organization for Standardization (ISO) as the standards

authority Based on OSI (Open Systems Interconnection) series of standards

– OSI Basic Reference Model: seven-layer model

Application

Presentation

Session

Transport

Network

Data Link

Physical

Application

Presentation

Session

Transport

Network

Data Link

Physical

Sender Receiver

Flow of data

Flow of data

Protocol between each layer

TCP/IPTCP/IP Open Systems Standards

– Too slow

– Standards are too complex, e.g. OSI virtual terminal standard

Unix– Open systems

– Cheap and portable

– Many versions: Berkeley version and AT&T version

– Network support: TCP/IP TCP/IP

– Internet Protocol (IP): network standard

– Transmission Control Protocol (TCP): session standard for program-to-program communication over IP

– Provides application programming interfaces (APIs) for sending messages/data over the network: Sockets

– Services: telnet, snmp, ftp

TCP UDP

IP

Ethernet ATM

Session

Transport

Network

Data Link

Why Middleware?Why Middleware?

When building distributed systems, need to build the middleware: Easy of use: compare to Sockets Location transparency:

– Application should not need to know the network and application address

– Possible to move an application from a machine to another machine with different network address without recompilation

Message delivery integrity: messages should not be lost or duplicated Message content integrity: messages should not be corrupted Language transparency: a program using middleware should be able to

communicate with another program written in a different language Support client/server model: server provides a service for the client

Traditional Middleware ModelsTraditional Middleware Models

Remote Procedure Call (RPC) Remote data access Transaction processing Distributed transaction processing Message queue

Remote Procedure Call (RPC)Remote Procedure Call (RPC)

Access a remote service through remote procedure calls: the syntax in the client (the caller) and the server (the called) remains the same as if they were on the same machine

Known RPC mechanism:– Open Network Computing (ONC) from SUN

– Distributed Computing Environment (DCE) from Open Software Foundation (OSF)

Stub

Skeleton

Interface Definition Language (IDL)

Compilation Environment

Client ServerPrice = getPrice(Product);

Price = getPrice(Product);

Remote Procedure Calls (RPC)Remote Procedure Calls (RPC)

Stub and skeletons– Generated by IDL compiler

– Small chunks of C code that are compiled and linked into the client and server programs

Stub: converts parameters into a string of bits and sends the message over the network

Skeleton: takes the messages and converts it back to parameters, and calls the server program

Marshalling: handles different data format, e.g. byte ordering

Call getPrice(string Product) return float // price

010011010110011110001010

getPrice(string Product) return float // price

Parameters marshalled into a string of bits

Client

Server

RPC and Multi-threadingRPC and Multi-threading

Using RPC– A client is blocked when it calls a remote procedure

– Client may be left waiting due to message lost, server too slow, or server halted

– Server is also blocked when process a request Multi-thread

– Allow client to read from the keyboard or the mouse while waiting for remote procedure call returns

– Allow server to handle many clients simultaneously, e.g. 1000 threads to handle 1000 clients

– May have synchronization problem during resources sharing: need to use locks or semaphores, e.g. withdraw money from the same account from 2 different ATMs

– Difficult to test and debug multi-thread program

Remote Database AccessRemote Database Access

Provides the ability to read or write to a database that is physically on a different machine from the client program

Vendor specific, e.g. Oracle’s Oracle Transparent gateway, IBM’s DRDA

ODBC (Open Database Connectivity)– Remote database access provided by Microsoft

– Standard programming interface (only) for Windows, not middleware

– Vendors will have to provides the client and server middleware

– ODBC client software that is written to communicate remotely to an Oracle database will not communicate to an IBM database

Remote Database Access – Microsoft’s ApproachRemote Database Access – Microsoft’s Approach

Move away from ODBC, use OLE DB instead

OLE DB uses COM-objects, rather complex

ADO (Active Data Objects): another standard by Microsoft– A simpler COM-

compliant interface

– Uses OLE DB, and OLE DB uses an ODBC data provider

ADO

OLE DB

ODBC

Application or Tool

Database

Host DB interface

Remote DB access

Remote DB Access – SQL ParsingRemote DB Access – SQL Parsing

Parse step - turns the SQL commands into a query plan:– Define tables that will be accessed

– List of indexes to be used

– Filtered by which expression

– Define the output Execute query – parameters will be

provided– Output can be any length, e.g. a

million rows, large overhead on network

Optimization: can be achieved using stored procedure

Good for ad-hoc queries Not good for transaction processing

Parse Query

Execute Query

SQL Text

Client Server

Query Output Description

Execute command plus parameters

Query Plan

Output Data

Transaction ProcessingTransaction Processing Transaction

– A transaction is a sequence of operations that are carried out together and form a single unit

– Must be either end up completed (committed) or is completely undone– E.g. withdraw money from a bank account:

write a record of the debit update the account total update the bank teller record

Early transaction processing system:– Handled by transaction monitor, e.g. CICS on IBM MVS, TIP on Unisys

2200, COMS on Unisys A Series– On a single machine (mainframe)

Distributed transaction processing: databases were on different machines

How important is transaction?– Business involves transaction, not customer, e.g.

Give the bank a check to settle the credit card bill (atomic action) The bank has a complex business process to ensure the each payment is settled

(transaction)

Transaction Characteristics – ACID PropertiesTransaction Characteristics – ACID Properties

A for atomic: the transaction is never half done, if there is an error, it is completely undone

C for consistent: the transaction changes the DB from one consistent state to another consistent state, i.e.– The DB data integrity constraints hold true

– The DB need not be consistent within the transaction

– Includes explicit data integrity (e.g. product codes must be between 8 to 10 digits long) and internal integrity constraints (e.g. all index entries must point at valid records)

I for isolation: the data updates within a transaction are not visible to other transaction until the transaction is completed

D for durable: when a transaction is done, it really is done and the updates will not disappear at sometime in the future

Transactional ArchitectureTransactional Architecture Local transactions: all operations are carried out on the same database

engine Distributed transactions: involve multiple database servers, which are

usually deployed on different hosts, requires a transaction manager to co-ordinate the processing and managed by a 2-phase commit process

Global transactions: a distributed transaction that encompasses not only different servers, but different application, e.g. EJB application that sends messages to a legacy system using JMS

Transaction demarcation:– Transaction demarcation is defined as the specification of transaction

boundaries: indicating when a transaction begins and ends

– The execution of a program “crosses a transactional boundary” whenever there is a change in transaction context

– A transaction context is the association of a transaction with an application component

Classic ExampleClassic Example

Transfer money from a savings account to a current account, one of the two followings should occur:– Money is withdrawn from savings account and deposited into the current

account

– Money is not taken out from the savings account and is not deposited into the current account

Classic implementation:begin_transaction()

withdraw_from_savings()

deposit_to_current()

commit_transaction() With one DB, the DBMS can serve as the transaction coordinator With multiple DBs, we need distributed transaction coordinator to

coordinate the processing using 2-phase commit

Transaction using ObjectsTransaction using Objects

Class SavingsAccount with the following withdraw method that can be reused:

Class CurrentAccount with the following deposit method that can be reused:

withdraw(amount){

begin_transaction();…withdraw_money;…end_transaction();

}

deposit(amount){

begin_transaction();…deposit_money;…end_transaction();

}

Transaction involves multiple objectsTransaction involves multiple objects

Implement the transfer operation:transfer(savingsAccount, currentAccount, amount)

{

savingsAccount.withdraw(amount);

currentAccount.deposit(amount);

} Problem:

– savingsAccount.withdraw(amount) and currentAccount.deposit(amount) are in different transactions

Will the following work?Will the following work?

Remove the transaction boundaries from withdraw and deposit

Add the transaction boundary to the transfer:transfer(savingsAccount, currentAccount, amount)

{

begin_transaction();



end_transaction();

}

Problem: withdraw() and deposit() are no longer transactionally protected

withdraw(amount){

…withdraw_money;…

}

deposit(amount){

…deposit_money;…

}

Automatic Transaction Boundary ManagementAutomatic Transaction Boundary Management

No transaction boundary management code in the business method:transfer(savingsAccount, currentAccount, amount)

{



}

Transactions are defined by the run-time environment using transaction attributes, e.g. Required

Supported by modern transaction servers, e.g. MS MTS, J2EE

withdraw(amount){


}

deposit(amount){


}

Transaction AttributesTransaction Attributes

TransferSessiontransfer(savingsAccount,currentAccount,amount) RequiresNew{

savingsAccount.withdraw(amount);currentAccount.deposit(amount);

}

SavingsAccountwithdraw(amount) Required{


}

CurrentAccountdeposit(amount) Required{


}

Transactional ContextTransactional Context

TranferSession

transfer( )

SavingsAccount

withdraw( )

CurrentAccount

deposit( )

Transaction Manager

2.Register TransferSession

TransferSession

SavingsAccount

CurrentAccount

1.Invoke transfer()

3.Create a new Transactional Context and add TransferSession

4.Invoke withdraw()

RequiresNew

Required

Required

5.Register SavingsAccount

6.Add SavingAccount

7.Invoke deposit()

8.Register CurrentAccount

9.Add CurrentAccount

Transaction Context X

Transactional Context (cont)Transactional Context (cont)

TranferSession

transfer( )

SavingsAccount

withdraw( )

CurrentAccount

deposit( )

Transaction Manager

2.

TransferSession

SavingsAccount

CurrentAccount

1.Invoke transfer()4.Invoke withdraw()

RequiresNew

Required

Required

7.Invoke deposit()

Transaction Context X

3.

5.

6.

8

9.

10. Check TxCtx X to make sure that updates to each will work

11. Commit all updates or

roll back all updates

Transaction Isolation LevelTransaction Isolation Level

Isolation: a transaction should be ignorance of the existence of any other transactions– If a second transaction attempts to read the data being modified by the

first transaction, that second transaction will not see any changes the first transaction has made

E.g. if my wife read a joint saving account balance at the same time I was transferring money from saving to current, the following events may occur:– I see my saving account balance $1000

– I initiate a transfer of $500 from saving to current

– The system debits my saving account

– My wife requests the balance for the saving account and sees $1000

– The system credits my current account Full transaction isolation: a transaction act on the DB as if it were the

only thread operating on the DB– Unable to support high concurrency requirement in real systems

Isolation LevelsIsolation Levels

To achieve better concurrency ANSI SQL identifies 4 distinct transaction isolation levels

– To balance performance needs with data integrity needs Isolation conditions:

– Dirty read: a transaction A views the uncommitted changes of another transaction B, if transaction B rolls back its change, transaction A is said to have “dirty” data

– Nonrepeatable read: a transaction A reads different data from the same query when it is issued multiple times and other transactions have changed the rows between reads by transaction A. A transaction that mandates repeatable reads will not see the committed changes made by other transactions.

– Phantom read: a phantom reads deals with changes in other transactions that would result in new rows matching the transaction’s WHERE clause. E.g. if transaction A reads all accounts with balance < 100 and A performs 2 reads, a phantom read allows for new rows to appear in second read based on changes made by other transactions

ANSI SQL Transaction Isolation LevelsANSI SQL Transaction Isolation Levels

Read uncommitted transactions: allows dirty reads, nonrepeatable reads, and phantom reads

Read committed transactions: only data committed to the DB may be read, can perform nonrepeatable and phantom reads

Repeatable read transactions: committed, repeatable reads as well as phantom reads are allowed, nonrepeatable reads are not allowed

Serializable transactions: only committed, repeatable reads are allowed, phantom reads are not allowed

The support of different isolation levels depends on the database engine

Distributed Transaction ProcessingDistributed Transaction Processing

Steps

1. The client tells the middleware that a transaction begins

2. The client calls server A

3. Server A updates the database

4. The client calls the server B

5. Server B updates its database

6. The client tells the middleware that the transaction ends

If the updates to 2nd DB failed (5), updates to 1st DB are rolled back

To maintain ACID properties, all locks acquired by DB cannot be released until end of transaction (6)

Begin Transaction

Commit

Client

Server A

Update DB on A

Server B

Update DB on B

2 phase commit

X/OpenX/Open

A consortium to establish the standards for distributed transaction processing (now called Open Group)

Standard protocol between the middleware and the database is called XA protocol

XA compliant: a DB that can co-operate with X/Open DTP middleware in a two-phase commit protocol

X/Open standard for client/server contains 3 protocols:– Protocol based on SNA LU6.2 (IBM): peer-to-peer protocol with no

marshalling

– Protocol based on DCE’s remote procedure calls (from Encina): parameter marshalling and threads are blocked during a call

– XATMI protocol based on ATMI (from Tuxedo): uses message format called View Buffers, supports both RPC-like calls and unblocked calls

Message QueuingMessage Queuing

Program-to-queue communication Message queue: like a very fast mail box, i.e. put a message in a box

without the recipient being active Actions:

– Put: puts a message into the queue

– Get: takes a message out of the queue Message queue software:

– Transfer of messages from queue to queue

– Ensures message arrives eventually and only one copy of the message is placed in the destination queue

Queue

Put

PutQueue

Get

Get

Message Queue SoftwareMessage Queue Software

Queue have names The queues are independent of program, i.e. many programs can

perform Puts and Gets on the same queue, and a program can access many queues

If the network goes down, messages will wait in the queue until the network comes up

The queues can be put onto a disk so that the queue is not lost even the system goes down

The queue can be a resource manager and co-operate with a transaction manager, i.e. if the message is put in a queue during a transaction and the transaction later aborted, then the DB will be rolled back and the message is removed from the queue

Some message queue systems can cross networks of different types, e.g. message goes through SNA leg, TCP/IP leg and then a Novell IPX leg

Efficient: used in applications require sub-second response time

Message Queue SoftwareMessage Queue Software

Products: IBM MQSeries, Microsoft MSMQ, BEA Systems Tuxedo/Q Disadvantage: no parameter marshalling, up to the sender and the

receiver to interpret the data Peer-to-peer instead of client/server

MQ vs. DTP – DTP solutionMQ vs. DTP – DTP solution

Example: transfer money from account A to account B using DTP

Actions “Debit Account A” and “Credit Account B” are done in one distributed transaction

Any failure aborts the whole transaction

Disadvantages: Performance degrade due to

two-phase commit If either of the system is down

or network is down, the transaction cannot take place

Debit Account A

Machine X

Start Transaction

Initiate Partner Transaction

2 phase commit

Machine Y

Credit Account B

Start Transaction

MQ vs. DTP – MQ solutionMQ vs. DTP – MQ solution

Message queue solution to the same problem

Message is not allowed to reach machine Y until the first transaction has committed (make sure 1st transaction is not aborted)

Flaw (not atomic): If the destination transaction

fails, money is taken out of A and disappears

Debit Account A

Machine X

Start Transaction

Send Message

Commit

Machine Y

Credit Account B

Start Transaction

Read Transaction

Commit

MQ vs. DTP – MQ with ReversalMQ vs. DTP – MQ with Reversal

Reversal transaction in case transaction cannot be completed on Machine Y

Flaw (not isolated): Fails if account A is deleted

before the reversal takes effect, i.e. action “Credit Account A” cannot be processed

Practical solution: Wait for a complaint and do

manual adjustment

Debit Account A

Machine X

Start Transaction

Send Message

Commit

Machine Y

FAIL !!

Start Transaction

Read Message

Commit

Send MessageStart Transaction

Read Message

Credit Account A

What will happen to these technologies?What will happen to these technologies?

Technology Replacement

RPC Component middleware

Distributed transaction middleware Component middleware

MSMQ Provides a component interface to message queue and supports sending objects

MQSeries Will exist

Remote data access Will exist to solve simple problems that require quick solution

csis0402 system architecture distributed computing - middleware technology (1) k.p. chow university...

Documents

node network

distributed systems

different network address

remote data access

att version network

layer slide

open systems standards

client slide