csis0402 system architecture distributed computing - middleware technology (1) k.p. chow university...
TRANSCRIPT
CSIS0402 System Architecture CSIS0402 System Architecture Distributed ComputingDistributed Computing
- - Middleware Middleware Technology (1)Technology (1)
K.P. Chow
University of Hong Kong
Early DaysEarly Days
Mainframe terminal networks Connecting “dumb” green-screen
terminals to mainframe Terminal identified by line id
Programs
Terminal handler
TerminalsDistributed network Each machine can have many
connections (TCP/IP port no.) Each machine is identified by its
IP address
Programs
Network interface
Distributed SystemsDistributed Systems
First distributed system: ARPANET– 1969: 4-node network
– 1972: about 50 computers Communication between companies in the same industry:
– SWIFT (Society for Worldwide Interbank Financial Telecommunication) for international money transfers in the financial industry
– IATA (International Air Transport Association) in the airline industry Vendor specific network architecture (1970s)
– IBM’s System Network Architecture (SNA)
– Sperry’s Distributed Computing Architecture (DCA)
– Burroughs’ Network Architecture (BNA)
– DEC’s Distributed Network Architecture (DNA)
– Basic services: file transfer, remote printing, terminal transfer, remote file access
– Distributed applications: e-mail
Open Systems Interconnection (OSI)Open Systems Interconnection (OSI)
Force IT vendors to implement the same standards, e.g. RS232 and X.25 Use International Organization for Standardization (ISO) as the standards
authority Based on OSI (Open Systems Interconnection) series of standards
– OSI Basic Reference Model: seven-layer model
Application
Presentation
Session
Transport
Network
Data Link
Physical
Application
Presentation
Session
Transport
Network
Data Link
Physical
Sender Receiver
Flow of data
Flow of data
Protocol between each layer
TCP/IPTCP/IP Open Systems Standards
– Too slow
– Standards are too complex, e.g. OSI virtual terminal standard
Unix– Open systems
– Cheap and portable
– Many versions: Berkeley version and AT&T version
– Network support: TCP/IP TCP/IP
– Internet Protocol (IP): network standard
– Transmission Control Protocol (TCP): session standard for program-to-program communication over IP
– Provides application programming interfaces (APIs) for sending messages/data over the network: Sockets
– Services: telnet, snmp, ftp
TCP UDP
IP
Ethernet ATM
Session
Transport
Network
Data Link
Why Middleware?Why Middleware?
When building distributed systems, need to build the middleware: Easy of use: compare to Sockets Location transparency:
– Application should not need to know the network and application address
– Possible to move an application from a machine to another machine with different network address without recompilation
Message delivery integrity: messages should not be lost or duplicated Message content integrity: messages should not be corrupted Language transparency: a program using middleware should be able to
communicate with another program written in a different language Support client/server model: server provides a service for the client
Traditional Middleware ModelsTraditional Middleware Models
Remote Procedure Call (RPC) Remote data access Transaction processing Distributed transaction processing Message queue
Remote Procedure Call (RPC)Remote Procedure Call (RPC)
Access a remote service through remote procedure calls: the syntax in the client (the caller) and the server (the called) remains the same as if they were on the same machine
Known RPC mechanism:– Open Network Computing (ONC) from SUN
– Distributed Computing Environment (DCE) from Open Software Foundation (OSF)
Stub
Skeleton
Interface Definition Language (IDL)
Compilation Environment
Client ServerPrice = getPrice(Product);
Price = getPrice(Product);
Remote Procedure Calls (RPC)Remote Procedure Calls (RPC)
Stub and skeletons– Generated by IDL compiler
– Small chunks of C code that are compiled and linked into the client and server programs
Stub: converts parameters into a string of bits and sends the message over the network
Skeleton: takes the messages and converts it back to parameters, and calls the server program
Marshalling: handles different data format, e.g. byte ordering
Call getPrice(string Product) return float // price
010011010110011110001010
getPrice(string Product) return float // price
Parameters marshalled into a string of bits
Client
Server
RPC and Multi-threadingRPC and Multi-threading
Using RPC– A client is blocked when it calls a remote procedure
– Client may be left waiting due to message lost, server too slow, or server halted
– Server is also blocked when process a request Multi-thread
– Allow client to read from the keyboard or the mouse while waiting for remote procedure call returns
– Allow server to handle many clients simultaneously, e.g. 1000 threads to handle 1000 clients
– May have synchronization problem during resources sharing: need to use locks or semaphores, e.g. withdraw money from the same account from 2 different ATMs
– Difficult to test and debug multi-thread program
Remote Database AccessRemote Database Access
Provides the ability to read or write to a database that is physically on a different machine from the client program
Vendor specific, e.g. Oracle’s Oracle Transparent gateway, IBM’s DRDA
ODBC (Open Database Connectivity)– Remote database access provided by Microsoft
– Standard programming interface (only) for Windows, not middleware
– Vendors will have to provides the client and server middleware
– ODBC client software that is written to communicate remotely to an Oracle database will not communicate to an IBM database
Remote Database Access – Microsoft’s ApproachRemote Database Access – Microsoft’s Approach
Move away from ODBC, use OLE DB instead
OLE DB uses COM-objects, rather complex
ADO (Active Data Objects): another standard by Microsoft– A simpler COM-
compliant interface
– Uses OLE DB, and OLE DB uses an ODBC data provider
ADO
OLE DB
ODBC
Application or Tool
Database
Host DB interface
Remote DB access
Remote DB Access – SQL ParsingRemote DB Access – SQL Parsing
Parse step - turns the SQL commands into a query plan:– Define tables that will be accessed
– List of indexes to be used
– Filtered by which expression
– Define the output Execute query – parameters will be
provided– Output can be any length, e.g. a
million rows, large overhead on network
Optimization: can be achieved using stored procedure
Good for ad-hoc queries Not good for transaction processing
Parse Query
Execute Query
SQL Text
Client Server
Query Output Description
Execute command plus parameters
Query Plan
Output Data
Transaction ProcessingTransaction Processing Transaction
– A transaction is a sequence of operations that are carried out together and form a single unit
– Must be either end up completed (committed) or is completely undone– E.g. withdraw money from a bank account:
write a record of the debit update the account total update the bank teller record
Early transaction processing system:– Handled by transaction monitor, e.g. CICS on IBM MVS, TIP on Unisys
2200, COMS on Unisys A Series– On a single machine (mainframe)
Distributed transaction processing: databases were on different machines
How important is transaction?– Business involves transaction, not customer, e.g.
Give the bank a check to settle the credit card bill (atomic action) The bank has a complex business process to ensure the each payment is settled
(transaction)
Transaction Characteristics – ACID PropertiesTransaction Characteristics – ACID Properties
A for atomic: the transaction is never half done, if there is an error, it is completely undone
C for consistent: the transaction changes the DB from one consistent state to another consistent state, i.e.– The DB data integrity constraints hold true
– The DB need not be consistent within the transaction
– Includes explicit data integrity (e.g. product codes must be between 8 to 10 digits long) and internal integrity constraints (e.g. all index entries must point at valid records)
I for isolation: the data updates within a transaction are not visible to other transaction until the transaction is completed
D for durable: when a transaction is done, it really is done and the updates will not disappear at sometime in the future
Transactional ArchitectureTransactional Architecture Local transactions: all operations are carried out on the same database
engine Distributed transactions: involve multiple database servers, which are
usually deployed on different hosts, requires a transaction manager to co-ordinate the processing and managed by a 2-phase commit process
Global transactions: a distributed transaction that encompasses not only different servers, but different application, e.g. EJB application that sends messages to a legacy system using JMS
Transaction demarcation:– Transaction demarcation is defined as the specification of transaction
boundaries: indicating when a transaction begins and ends
– The execution of a program “crosses a transactional boundary” whenever there is a change in transaction context
– A transaction context is the association of a transaction with an application component
Classic ExampleClassic Example
Transfer money from a savings account to a current account, one of the two followings should occur:– Money is withdrawn from savings account and deposited into the current
account
– Money is not taken out from the savings account and is not deposited into the current account
Classic implementation:begin_transaction()
withdraw_from_savings()
deposit_to_current()
commit_transaction() With one DB, the DBMS can serve as the transaction coordinator With multiple DBs, we need distributed transaction coordinator to
coordinate the processing using 2-phase commit
Transaction using ObjectsTransaction using Objects
Class SavingsAccount with the following withdraw method that can be reused:
Class CurrentAccount with the following deposit method that can be reused:
withdraw(amount){
begin_transaction();…withdraw_money;…end_transaction();
}
deposit(amount){
begin_transaction();…deposit_money;…end_transaction();
}
Transaction involves multiple objectsTransaction involves multiple objects
Implement the transfer operation:transfer(savingsAccount, currentAccount, amount)
{
savingsAccount.withdraw(amount);
currentAccount.deposit(amount);
} Problem:
– savingsAccount.withdraw(amount) and currentAccount.deposit(amount) are in different transactions
Will the following work?Will the following work?
Remove the transaction boundaries from withdraw and deposit
Add the transaction boundary to the transfer:transfer(savingsAccount, currentAccount, amount)
{
begin_transaction();
savingsAccount.withdraw(amount);
currentAccount.deposit(amount);
end_transaction();
}
Problem: withdraw() and deposit() are no longer transactionally protected
withdraw(amount){
…withdraw_money;…
}
deposit(amount){
…deposit_money;…
}
Automatic Transaction Boundary ManagementAutomatic Transaction Boundary Management
No transaction boundary management code in the business method:transfer(savingsAccount, currentAccount, amount)
{
savingsAccount.withdraw(amount);
currentAccount.deposit(amount);
}
Transactions are defined by the run-time environment using transaction attributes, e.g. Required
Supported by modern transaction servers, e.g. MS MTS, J2EE
withdraw(amount){
…withdraw_money;…
}
deposit(amount){
…deposit_money;…
}
Transaction AttributesTransaction Attributes
TransferSessiontransfer(savingsAccount,currentAccount,amount) RequiresNew{
savingsAccount.withdraw(amount);currentAccount.deposit(amount);
}
SavingsAccountwithdraw(amount) Required{
…withdraw_money;…
}
CurrentAccountdeposit(amount) Required{
…deposit_money;…
}
Transactional ContextTransactional Context
TranferSession
transfer( )
SavingsAccount
withdraw( )
CurrentAccount
deposit( )
Transaction Manager
2.Register TransferSession
TransferSession
SavingsAccount
CurrentAccount
1.Invoke transfer()
3.Create a new Transactional Context and add TransferSession
4.Invoke withdraw()
RequiresNew
Required
Required
5.Register SavingsAccount
6.Add SavingAccount
7.Invoke deposit()
8.Register CurrentAccount
9.Add CurrentAccount
Transaction Context X
Transactional Context (cont)Transactional Context (cont)
TranferSession
transfer( )
SavingsAccount
withdraw( )
CurrentAccount
deposit( )
Transaction Manager
2.
TransferSession
SavingsAccount
CurrentAccount
1.Invoke transfer()4.Invoke withdraw()
RequiresNew
Required
Required
7.Invoke deposit()
Transaction Context X
3.
5.
6.
8
9.
10. Check TxCtx X to make sure that updates to each will work
11. Commit all updates or
roll back all updates
Transaction Isolation LevelTransaction Isolation Level
Isolation: a transaction should be ignorance of the existence of any other transactions– If a second transaction attempts to read the data being modified by the
first transaction, that second transaction will not see any changes the first transaction has made
E.g. if my wife read a joint saving account balance at the same time I was transferring money from saving to current, the following events may occur:– I see my saving account balance $1000
– I initiate a transfer of $500 from saving to current
– The system debits my saving account
– My wife requests the balance for the saving account and sees $1000
– The system credits my current account Full transaction isolation: a transaction act on the DB as if it were the
only thread operating on the DB– Unable to support high concurrency requirement in real systems
Isolation LevelsIsolation Levels
To achieve better concurrency ANSI SQL identifies 4 distinct transaction isolation levels
– To balance performance needs with data integrity needs Isolation conditions:
– Dirty read: a transaction A views the uncommitted changes of another transaction B, if transaction B rolls back its change, transaction A is said to have “dirty” data
– Nonrepeatable read: a transaction A reads different data from the same query when it is issued multiple times and other transactions have changed the rows between reads by transaction A. A transaction that mandates repeatable reads will not see the committed changes made by other transactions.
– Phantom read: a phantom reads deals with changes in other transactions that would result in new rows matching the transaction’s WHERE clause. E.g. if transaction A reads all accounts with balance < 100 and A performs 2 reads, a phantom read allows for new rows to appear in second read based on changes made by other transactions
ANSI SQL Transaction Isolation LevelsANSI SQL Transaction Isolation Levels
Read uncommitted transactions: allows dirty reads, nonrepeatable reads, and phantom reads
Read committed transactions: only data committed to the DB may be read, can perform nonrepeatable and phantom reads
Repeatable read transactions: committed, repeatable reads as well as phantom reads are allowed, nonrepeatable reads are not allowed
Serializable transactions: only committed, repeatable reads are allowed, phantom reads are not allowed
The support of different isolation levels depends on the database engine
Distributed Transaction ProcessingDistributed Transaction Processing
Steps
1. The client tells the middleware that a transaction begins
2. The client calls server A
3. Server A updates the database
4. The client calls the server B
5. Server B updates its database
6. The client tells the middleware that the transaction ends
If the updates to 2nd DB failed (5), updates to 1st DB are rolled back
To maintain ACID properties, all locks acquired by DB cannot be released until end of transaction (6)
Begin Transaction
Commit
Client
Server A
Update DB on A
Server B
Update DB on B
2 phase commit
X/OpenX/Open
A consortium to establish the standards for distributed transaction processing (now called Open Group)
Standard protocol between the middleware and the database is called XA protocol
XA compliant: a DB that can co-operate with X/Open DTP middleware in a two-phase commit protocol
X/Open standard for client/server contains 3 protocols:– Protocol based on SNA LU6.2 (IBM): peer-to-peer protocol with no
marshalling
– Protocol based on DCE’s remote procedure calls (from Encina): parameter marshalling and threads are blocked during a call
– XATMI protocol based on ATMI (from Tuxedo): uses message format called View Buffers, supports both RPC-like calls and unblocked calls
Message QueuingMessage Queuing
Program-to-queue communication Message queue: like a very fast mail box, i.e. put a message in a box
without the recipient being active Actions:
– Put: puts a message into the queue
– Get: takes a message out of the queue Message queue software:
– Transfer of messages from queue to queue
– Ensures message arrives eventually and only one copy of the message is placed in the destination queue
Queue
Put
PutQueue
Get
Get
Message Queue SoftwareMessage Queue Software
Queue have names The queues are independent of program, i.e. many programs can
perform Puts and Gets on the same queue, and a program can access many queues
If the network goes down, messages will wait in the queue until the network comes up
The queues can be put onto a disk so that the queue is not lost even the system goes down
The queue can be a resource manager and co-operate with a transaction manager, i.e. if the message is put in a queue during a transaction and the transaction later aborted, then the DB will be rolled back and the message is removed from the queue
Some message queue systems can cross networks of different types, e.g. message goes through SNA leg, TCP/IP leg and then a Novell IPX leg
Efficient: used in applications require sub-second response time
Message Queue SoftwareMessage Queue Software
Products: IBM MQSeries, Microsoft MSMQ, BEA Systems Tuxedo/Q Disadvantage: no parameter marshalling, up to the sender and the
receiver to interpret the data Peer-to-peer instead of client/server
MQ vs. DTP – DTP solutionMQ vs. DTP – DTP solution
Example: transfer money from account A to account B using DTP
Actions “Debit Account A” and “Credit Account B” are done in one distributed transaction
Any failure aborts the whole transaction
Disadvantages: Performance degrade due to
two-phase commit If either of the system is down
or network is down, the transaction cannot take place
Debit Account A
Machine X
Start Transaction
Initiate Partner Transaction
2 phase commit
Machine Y
Credit Account B
Start Transaction
MQ vs. DTP – MQ solutionMQ vs. DTP – MQ solution
Message queue solution to the same problem
Message is not allowed to reach machine Y until the first transaction has committed (make sure 1st transaction is not aborted)
Flaw (not atomic): If the destination transaction
fails, money is taken out of A and disappears
Debit Account A
Machine X
Start Transaction
Send Message
Commit
Machine Y
Credit Account B
Start Transaction
Read Transaction
Commit
MQ vs. DTP – MQ with ReversalMQ vs. DTP – MQ with Reversal
Reversal transaction in case transaction cannot be completed on Machine Y
Flaw (not isolated): Fails if account A is deleted
before the reversal takes effect, i.e. action “Credit Account A” cannot be processed
Practical solution: Wait for a complaint and do
manual adjustment
Debit Account A
Machine X
Start Transaction
Send Message
Commit
Machine Y
FAIL !!
Start Transaction
Read Message
Commit
Send MessageStart Transaction
Read Message
Credit Account A
What will happen to these technologies?What will happen to these technologies?
Technology Replacement
RPC Component middleware
Distributed transaction middleware Component middleware
MSMQ Provides a component interface to message queue and supports sending objects
MQSeries Will exist
Remote data access Will exist to solve simple problems that require quick solution