Distributed Systems 2006 3
Basic Client/Server Properties
Often little communication among clients
Client-to-server ratio often high
– Need for caching to achieve performance
[Figure: clients accessing a primary server and its database; labels: access objects, access files, access relations]
Caching
Storing shared state at both client and server
– Requests may be handled locally
State may become out of date if it is updated outside the cache
– Stale vs. consistent
Coherent caching
– Guarantees that the cache is not stale
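A minimal Python sketch of the difference between a plain cache and a coherent one. All names (`Server`, `CachingClient`, `demo`) are hypothetical, and coherence is obtained here by one possible technique, server-driven invalidation: the server remembers which clients cache data and invalidates their copies on every write, which makes the server stateful.

```python
class Server:
    """Holds the authoritative shared state."""

    def __init__(self):
        self.data = {}
        self.caching_clients = []  # only filled in by coherent clients

    def read(self, key):
        return self.data.get(key)

    def write(self, key, value):
        self.data[key] = value
        # Coherent caching: invalidate every registered client's copy.
        for client in self.caching_clients:
            client.invalidate(key)


class CachingClient:
    def __init__(self, server, coherent):
        self.server = server
        self.cache = {}
        if coherent:
            # The server now has to track this client: a stateful design.
            server.caching_clients.append(self)

    def read(self, key):
        if key not in self.cache:              # miss: ask the server
            self.cache[key] = self.server.read(key)
        return self.cache[key]                 # hit: handled locally, possibly stale

    def invalidate(self, key):
        self.cache.pop(key, None)


def demo(coherent):
    server = Server()
    server.write("x", 1)
    client = CachingClient(server, coherent)
    client.read("x")        # client now caches x = 1
    server.write("x", 2)    # update made outside the client's cache
    return client.read("x")


print(demo(coherent=False))  # 1: stale value served from the cache
print(demo(coherent=True))   # 2: the write invalidated the cached copy
```

Without invalidation the cache is only a hint; with it, every write costs the server a message per caching client.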
Stateless and Stateful
Stateless architectures
– Server does not track clients or ensure that cached data is up to date; clients are independently responsible
– Server may change state without notifying clients, so shared state may become stale: caches serve only as hints
– May take load off the server and simplifies error recovery
– Example: distributed file systems
Stateful architectures
– Clients may take actions assuming that local state is correct
– Servers typically need to track clients
– Mimic the behavior of non-replicated, non-shared data, often by locking data
– Example: transactional databases
[Figure: primary server and database]
Distributed File Systems
Two major stateless examples
– Sun NFS on Unix/Linux/...
– Windows NTFS
Emulate the local file access interface
– File access on a remote server using RPC
– Caching to improve performance
[Figure: layers of a distributed file system: Application, Data Cache, Buffer Pool, File Store]
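A sketch of why stateless servers and unreliable RPC fit together, loosely modelled on NFS's READ request, which carries a file handle, an offset and a count. The classes and the loss simulation are hypothetical: the server keeps no record of clients or open files, the client remembers its own file position, and because a repeated read has the same effect as a single one, the client can simply retry after a timeout.

```python
import random


class StatelessFileServer:
    """Keeps file contents, but no record of clients, open files, or positions."""

    def __init__(self, files):
        self.files = files  # file handle -> bytes

    def read(self, handle, offset, count):
        # Each request is self-contained, so any request can be served in isolation.
        return self.files[handle][offset:offset + count]


def unreliable_call(func, *args, rng, loss_rate):
    """Simulated RPC: the request is sometimes lost and the caller times out."""
    if rng.random() < loss_rate:
        raise TimeoutError("no reply")
    return func(*args)


class RemoteFile:
    """Client side: the client, not the server, remembers the current position."""

    def __init__(self, server, handle, rng, loss_rate=0.5):
        self.server, self.handle = server, handle
        self.rng, self.loss_rate = rng, loss_rate
        self.position = 0

    def read(self, count):
        while True:  # retry until success; a repeated read changes nothing
            try:
                data = unreliable_call(self.server.read, self.handle, self.position,
                                       count, rng=self.rng, loss_rate=self.loss_rate)
                break
            except TimeoutError:
                continue
        self.position += len(data)
        return data


server = StatelessFileServer({7: b"hello, world"})
f = RemoteFile(server, 7, random.Random(42))
print(f.read(5), f.read(7))  # b'hello' b', world'
```

If the server crashes and restarts, nothing is lost: the next request still contains everything needed to serve it.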
Sun NFS
Developed by Sun in the 1980s to provide a file system for diskless clients
– NFS Version 4 (2000) introduces some server state, as in NTFS...
Unix File Systems
A virtual file system (VFS) layer allows many actual file systems to be used
Unix files are described using inodes
– Contain mode, ownership information, time stamps, addresses of actual file blocks, ...
A directory is a file containing a list of names and their inode numbers
Commands
– E.g., mount, create, open, read, write
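A simplified sketch of how a path is resolved through directories and inodes. The `Inode` fields are a heavily reduced, hypothetical subset of what a real inode holds, and using inode number 2 for the root directory follows a convention of file systems such as ext2 rather than anything stated on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    mode: str                                    # simplified: "dir" or "file"
    owner: str
    blocks: list = field(default_factory=list)   # stands in for block addresses
    entries: dict = field(default_factory=dict)  # directories only: name -> inode number


ROOT_INODE = 2  # root directory's inode number (a common convention, assumed here)


def lookup(inodes, path):
    """Resolve an absolute path to an inode number, one component at a time."""
    number = ROOT_INODE
    for name in path.strip("/").split("/"):
        if not name:
            continue                             # "/" itself, or "//" in a path
        directory = inodes[number]
        if directory.mode != "dir":
            raise NotADirectoryError(name)
        if name not in directory.entries:
            raise FileNotFoundError(path)
        number = directory.entries[name]
    return number


inodes = {
    2: Inode("dir", "root", entries={"home": 11}),
    11: Inode("dir", "root", entries={"notes.txt": 12}),
    12: Inode("file", "alice", blocks=[b"hello"]),
}
n = lookup(inodes, "/home/notes.txt")
print(n, inodes[n].owner, b"".join(inodes[n].blocks))  # 12 alice b'hello'
```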
NFS Operations
RPC is not reliable: retry until success
Mount
– Client issues a mount RPC request
– Server checks the request for validity: is the file available, and is the user allowed to access it?
– Server returns a handle
– Client stores the handle in its remote mount table
Create, open
– Client file system sees that the path maps to a remotely mounted file system and sends a request to the server
– Server takes action and returns a virtualized inode (vnode)
– Example of reliability problems: applications create and delete lock files for mutual exclusion, but a create that is retried after a timeout may fail because the first attempt already succeeded
Read, write
– Use vnodes
– Client caches vnodes and a subset of the blocks of files
• Freshness interval during which no cache validation is made
• Otherwise, validation based on timestamps of files
– Write-through cache policy
• Client applications see their own modifications; modifications are written back to the server eventually
• Another source of problems: even though the cache is validated at the client, the server does not see the complete state
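The read path with a freshness interval and timestamp validation can be sketched as follows. The class names, the integer clock, and the single `get_mtime` validation request are assumptions for illustration, not actual NFS client code; the point is that a client may serve stale data until the freshness interval since its last validation has expired.

```python
class FileServer:
    def __init__(self):
        self.content, self.mtime = {}, {}

    def write(self, name, data, now):
        self.content[name], self.mtime[name] = data, now

    def get_mtime(self, name):
        """Cheap validation request: returns only the modification time."""
        return self.mtime[name]

    def read(self, name):
        return self.content[name], self.mtime[name]


class ClientCache:
    def __init__(self, server, freshness_interval):
        self.server = server
        self.t = freshness_interval
        self.entries = {}       # name -> (data, server mtime, time of last validation)
        self.validations = 0    # number of validation requests sent

    def read(self, name, now):
        if name in self.entries:
            data, mtime, validated_at = self.entries[name]
            if now - validated_at < self.t:
                return data                       # fresh enough: no validation at all
            self.validations += 1
            if self.server.get_mtime(name) == mtime:
                self.entries[name] = (data, mtime, now)
                return data                       # unchanged on server: still valid
        data, mtime = self.server.read(name)      # miss or outdated: fetch again
        self.entries[name] = (data, mtime, now)
        return data


server = FileServer()
server.write("a.txt", "v1", now=0)
cache = ClientCache(server, freshness_interval=3)
print(cache.read("a.txt", now=1))   # v1 (fetched from the server)
server.write("a.txt", "v2", now=2)  # another client updates the file
print(cache.read("a.txt", now=3))   # v1 (stale, but within the freshness interval)
print(cache.read("a.txt", now=5))   # v2 (validation sees a newer timestamp)
```

A longer freshness interval means fewer validation requests but a longer window of possibly stale reads.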
Real-World File Access?
[Baker et al., 1991]: Study of file access patterns
– Most file access is sequential
– Little sharing of files between applications: one application creates a file, another application reads it
– Files are either very short-lived or very long-lived
Stateful distributed file systems
– Andrew File System
– Sprite file system
– CODA
– XFR
Transactional Databases
Assumptions
– All interaction between client and server is stateful
– Interaction consists of short sequences of operations issued at clients:
• begin -> {read, update}* -> {commit, abort}
Structure interaction in transactions
– Statefulness
• E.g., data read and written during a transaction; an ongoing transaction is shared state
80% of all reliable distributed computing is based on transactions
– Just like 80% of all programming is done in Visual Basic
– In this course, transactions are just one reliability technique
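A minimal in-memory sketch of the begin -> {read, update}* -> {commit, abort} interface. `Database` and `Transaction` are hypothetical names; updates are buffered in a private workspace so that abort simply discards them and commit installs them all at once. There is no concurrency control here, and nothing is made durable.

```python
class Database:
    def __init__(self, data):
        self.data = dict(data)

    def begin(self):
        return Transaction(self)


class Transaction:
    """begin -> {read, update}* -> {commit, abort}, with updates kept tentative."""

    def __init__(self, db):
        self.db = db
        self.writes = {}    # private workspace of tentative updates
        self.active = True

    def _check_active(self):
        if not self.active:
            raise RuntimeError("transaction has already committed or aborted")

    def read(self, key):
        self._check_active()
        if key in self.writes:              # see own tentative updates first
            return self.writes[key]
        return self.db.data[key]

    def update(self, key, value):
        self._check_active()
        self.writes[key] = value

    def commit(self):
        self._check_active()
        self.db.data.update(self.writes)    # install all updates together
        self.active = False

    def abort(self):
        self._check_active()
        self.writes.clear()                 # nothing reached the database
        self.active = False


db = Database({"doctor1": 100, "doctor2": 100})
for outcome in ("abort", "commit"):
    t = db.begin()
    t.update("doctor1", t.read("doctor1") + 10)
    t.update("doctor2", t.read("doctor2") - 10)
    if outcome == "abort":
        t.abort()
    else:
        t.commit()
    print(outcome, db.data)
# abort {'doctor1': 100, 'doctor2': 100}
# commit {'doctor1': 110, 'doctor2': 90}
```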
Properties of Transactions
The basic unit of interaction with a database server
Guaranteed properties (ACID)
– Atomicity
• Each transaction is executed to completion or not at all
• Rollbacks may be necessary
– Consistency
• Each transaction takes the database from one consistent state to another
• Transactions are kept small so as to maximize concurrency
– Isolation (independence)
• Transactions execute independently of each other
• Intermediate effects of one transaction are not visible to concurrent transactions
– Durability
• Results of committed transactions are persistent
Serializability
Isolation of concurrent transactions
– Makes it easier to program database applications
If T1 and T2 execute concurrently (interleaved), the effect is the same as if
– T1 executed, then T2, or
– T2 executed, then T1
Simple but slow approach
– Run transactions serially, i.e., with no interleaving...
The point is that the database server may improve performance by interleaving the steps of transactions, as long as the result is serializable!
Lost Update
T1
– salary1 = doctor1.getSalary()
– doctor1.setSalary(salary1 + bonus)
– salary2 = doctor2.getSalary()
– doctor2.setSalary(salary2 - bonus)
T2
– salary1 = doctor1.getSalary()
– doctor1.setSalary(salary1 + bonus)
– salary3 = doctor3.getSalary()
– doctor3.setSalary(salary3 - bonus)
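The extracted slide no longer shows which interleaving it depicted, so the sketch below picks one that produces a lost update (an assumption). Each transaction is a Python generator that performs one step of the slide's pseudocode per `next()` call, and `run` is a hypothetical scheduler that executes steps in a given order. Both transactions read doctor1's salary before either writes it, so T1's bonus is overwritten.

```python
def transfer(salaries, src, dst, bonus):
    """One transaction from the slide (comments show T1's lines);
    each next() call runs exactly one step."""
    s1 = salaries[dst]           # salary1 = doctor1.getSalary()
    yield
    salaries[dst] = s1 + bonus   # doctor1.setSalary(salary1 + bonus)
    yield
    s2 = salaries[src]           # salary2 = doctor2.getSalary()
    yield
    salaries[src] = s2 - bonus   # doctor2.setSalary(salary2 - bonus)
    yield


def run(schedule, transactions):
    """Execute one step of the named transaction for each entry in schedule."""
    for name in schedule:
        next(transactions[name])


def fresh():
    return {"doctor1": 100, "doctor2": 100, "doctor3": 100}


# Interleaved: both read doctor1 before either writes it.
lost = fresh()
run(["T1", "T2", "T1", "T2", "T1", "T1", "T2", "T2"],
    {"T1": transfer(lost, "doctor2", "doctor1", 10),
     "T2": transfer(lost, "doctor3", "doctor1", 10)})
print(lost)    # {'doctor1': 110, 'doctor2': 90, 'doctor3': 90}

# Serial: T1 runs to completion, then T2.
serial = fresh()
run(["T1"] * 4 + ["T2"] * 4,
    {"T1": transfer(serial, "doctor2", "doctor1", 10),
     "T2": transfer(serial, "doctor3", "doctor1", 10)})
print(serial)  # {'doctor1': 120, 'doctor2': 90, 'doctor3': 90}
```

Either serial order leaves doctor1 at 120, so the interleaved run, in which the salaries no longer sum to 300, cannot be serializable.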
A Serializable Transaction Interleaving
T1
– salary1 = doctor1.getSalary()
– doctor1.setSalary(salary1 + bonus)
– salary2 = doctor2.getSalary()
– doctor2.setSalary(salary2 - bonus)
T2
– salary1 = doctor1.getSalary()
– doctor1.setSalary(salary1 + bonus)
– salary3 = doctor3.getSalary()
– doctor3.setSalary(salary3 - bonus)
Inconsistent Retrievals
T1
– ward1.discharge(patient)
– ward2.admit(patient)
T2
– sum += ward1.getNoPatients()
– sum += ward2.getNoPatients()
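A similar generator-based sketch, with hypothetical names and ward contents, of one interleaving that yields an inconsistent retrieval: T2 counts the patients after T1 has discharged the patient from ward1 but before it admits them to ward2.

```python
def move_patient(wards, patient, src, dst):
    """T1: each next() call runs one step."""
    wards[src].remove(patient)   # ward1.discharge(patient)
    yield
    wards[dst].add(patient)      # ward2.admit(patient)
    yield


def count_patients(wards, result):
    """T2: adds up the ward sizes in two steps (sum starts at 0)."""
    result["sum"] = len(wards["ward1"])    # sum += ward1.getNoPatients()
    yield
    result["sum"] += len(wards["ward2"])   # sum += ward2.getNoPatients()
    yield


wards = {"ward1": {"ann", "bob"}, "ward2": {"cid"}}
result = {}
transactions = {"T1": move_patient(wards, "bob", "ward1", "ward2"),
                "T2": count_patients(wards, result)}
for name in ["T1", "T2", "T2", "T1"]:  # T2 runs between T1's two steps
    next(transactions[name])
print(result["sum"], sum(len(p) for p in wards.values()))  # 2 3
```

Any serial order would count all three patients; here the moved patient is in neither ward while T2 reads.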
Achieving Serializability
Avoiding conflicts
– The result of T1.a.read and T2.a.write depends on their order of execution
– The result of T1.a.write and T2.a.write depends on their order of execution
For every object a on which T1 and T2 have conflicting accesses:
– T1 needs to make all its accesses to a before T2 accesses a
– Or vice versa, with the same order for every such object
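This condition is what the standard precedence-graph test checks: draw an edge Ti -> Tj whenever a step of Ti conflicts with a later step of Tj; the schedule is conflict-serializable exactly when the graph has no cycle. A Python sketch, with schedules encoded as hypothetical `(transaction, kind, object)` tuples:

```python
from itertools import combinations

# A schedule is a list of (transaction, kind, object) steps; kind is "r" or "w".


def conflicting(op1, op2):
    """Different transactions, same object, and at least one write."""
    (t1, kind1, obj1), (t2, kind2, obj2) = op1, op2
    return t1 != t2 and obj1 == obj2 and "w" in (kind1, kind2)


def precedence_edges(schedule):
    """Edge Ti -> Tj if a step of Ti conflicts with a later step of Tj."""
    return {(schedule[i][0], schedule[j][0])
            for i, j in combinations(range(len(schedule)), 2)
            if conflicting(schedule[i], schedule[j])}


def is_conflict_serializable(schedule):
    """True iff the precedence graph is acyclic. Repeatedly remove transactions
    with no incoming edge; a cycle leaves transactions that can never be removed."""
    edges = precedence_edges(schedule)
    remaining = {t for t, _, _ in schedule}
    while remaining:
        free = {t for t in remaining
                if not any(a in remaining and b == t for a, b in edges)}
        if not free:
            return False
        remaining -= free
    return True


lost_update = [("T1", "r", "doctor1"), ("T2", "r", "doctor1"),
               ("T1", "w", "doctor1"), ("T2", "w", "doctor1")]
interleaved = [("T1", "r", "doctor1"), ("T1", "w", "doctor1"),
               ("T2", "r", "doctor1"), ("T2", "w", "doctor1"),
               ("T1", "r", "doctor2"), ("T1", "w", "doctor2"),
               ("T2", "r", "doctor3"), ("T2", "w", "doctor3")]
print(is_conflict_serializable(lost_update))  # False: T1 -> T2 and T2 -> T1
print(is_conflict_serializable(interleaved))  # True: equivalent to T1 then T2
```

Conflict serializability is a sufficient condition: a schedule that passes this test is serializable, though some serializable schedules fail it.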
Concurrency Control
Locking (1)
– Use read/write locks on objects that are accessed
• Read locks are non-exclusive (shared)
• Write locks are exclusive
– Take these locks before objects are accessed
Optimistic
– Operate on tentative versions of objects
– Just before commit, check with overlapping transactions whether there is a potential conflict on an object
– If so, abort; otherwise commit
Time-stamped
– Timestamp transactions uniquely
– Validate operations, e.g.,
• A write on an object is allowed only if the object was last read and last written by earlier transactions
• A read on an object is allowed only if the object was last written by an earlier transaction
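A sketch of the two timestamp rules above, assuming integer timestamps where a smaller value means an earlier transaction. The class is hypothetical and applies writes immediately; a real timestamp-ordering scheduler would also keep tentative versions so that an aborted transaction's writes can be undone.

```python
class Abort(Exception):
    pass


class TimestampOrdering:
    """Per object, remember the largest timestamps that have read and written it.
    Timestamps are positive integers; smaller means earlier."""

    def __init__(self, values):
        self.values = dict(values)
        self.read_ts = {}
        self.write_ts = {}

    def read(self, ts, obj):
        # Read allowed only if obj was last written by an earlier (or the same) transaction.
        if ts < self.write_ts.get(obj, 0):
            raise Abort(f"T{ts} may not read {obj}")
        self.read_ts[obj] = max(self.read_ts.get(obj, 0), ts)
        return self.values[obj]

    def write(self, ts, obj, value):
        # Write allowed only if obj was last read and last written by earlier transactions.
        if ts < self.read_ts.get(obj, 0) or ts < self.write_ts.get(obj, 0):
            raise Abort(f"T{ts} may not write {obj}")
        self.write_ts[obj] = ts
        self.values[obj] = value   # simplification: applied immediately, not tentatively


db = TimestampOrdering({"doctor1": 100})
s1 = db.read(1, "doctor1")            # T1 (timestamp 1) reads
s2 = db.read(2, "doctor1")            # T2 (timestamp 2) reads
try:
    db.write(1, "doctor1", s1 + 10)   # a later transaction already read doctor1
    t1_aborted = False
except Abort:
    t1_aborted = True
db.write(2, "doctor1", s2 + 10)       # T2's write is allowed
s1 = db.read(3, "doctor1")            # T1 restarts with the new timestamp 3
db.write(3, "doctor1", s1 + 10)
print(t1_aborted, db.values["doctor1"])  # True 120
```

Applied to the lost-update interleaving, the rules abort T1 instead of letting it overwrite T2's read, and the restarted T1 sees T2's update.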
Deadlock
T1
– ward1.discharge(patient1)
– ward2.admit(patient1)
T2
– ward2.discharge(patient2)
– ward1.admit(patient2)
Locking (2)
Avoiding Deadlocks
Deadlock prevention
– E.g., requesting locks in a predefined order
Deadlock detection
– Cycles in the wait-for graph
Timeout
– Abort a transaction that has waited too long for a lock
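Deadlock detection amounts to finding a cycle in the wait-for graph. A depth-first-search sketch, where `waits_for` (a hypothetical representation) maps each transaction to the transactions whose locks it is waiting for; the ward deadlock above is the cycle T1 -> T2 -> T1.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a cycle in the wait-for graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {t: WHITE for t in waits_for}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in waits_for.get(t, ()):
            if color.get(u, WHITE) == GREY:
                return path[path.index(u):]   # back edge: u is waiting on itself
            if color.get(u, WHITE) == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in list(waits_for):
        if color[t] == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


# T1 holds the lock on ward1 and waits for ward2; T2 holds ward2 and waits for ward1.
print(find_deadlock({"T1": {"T2"}, "T2": {"T1"}}))  # ['T1', 'T2']
print(find_deadlock({"T1": {"T2"}, "T2": set()}))   # None
```

Once a cycle is found, one transaction in it is chosen as a victim and aborted, which releases its locks.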
Ensuring Serializability
Two-phase locking
– Growing phase: during the transaction
• Acquire a write lock on an object if the transaction will update it
• Acquire a read lock on an object if it will only read it
• Don’t release!
– Shrinking phase: at abort/commit
• Release locks when the transaction is aborted or when its updates are persistent
Serializability?
– The order of conflicting updates cannot change
• Reads commute
• If T2 updates an object after T1 has read or updated it, T2 will have to wait
• If T2 reads an object after T1 has updated it, T2 will have to wait
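A sketch of the lock table behind two-phase locking as described above, where all locks are held until commit or abort (often called strict two-phase locking). `LockManager` and its methods are hypothetical; `try_lock` only reports whether the caller must wait, and leaves the waiting itself to the caller.

```python
class LockManager:
    """Read/write locks for two-phase locking: acquired during the transaction,
    all released together at commit or abort."""

    def __init__(self):
        self.readers = {}   # object -> set of transactions holding a read lock
        self.writer = {}    # object -> transaction holding the write lock

    def try_lock(self, txn, obj, mode):
        """Growing phase. Return True if txn now holds the lock, False if it must wait."""
        if mode not in ("read", "write"):
            raise ValueError(mode)
        readers = self.readers.setdefault(obj, set())
        holder = self.writer.get(obj)
        if holder is not None and holder != txn:
            return False                  # another transaction has updated obj
        if mode == "read":
            if holder != txn:             # a write lock already covers reading
                readers.add(txn)
            return True                   # read locks are shared
        if readers - {txn}:
            return False                  # other transactions have read obj
        readers.discard(txn)              # upgrade own read lock, if any
        self.writer[obj] = txn
        return True

    def release_all(self, txn):
        """Shrinking phase: called once, when txn commits or aborts."""
        for holders in self.readers.values():
            holders.discard(txn)
        for obj in [o for o, t in self.writer.items() if t == txn]:
            del self.writer[obj]


locks = LockManager()
print(locks.try_lock("T1", "doctor1", "write"))  # True: T1 will update doctor1
print(locks.try_lock("T2", "doctor1", "write"))  # False: T2 has to wait
print(locks.try_lock("T1", "doctor2", "write"))  # True
locks.release_all("T1")                          # T1 commits
print(locks.try_lock("T2", "doctor1", "write"))  # True
```

Requesting the write lock up front, as the slide suggests for objects that will be updated, avoids the deadlock that arises when two readers both try to upgrade to a write lock.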
Two-Phase Locking Example
T1
– salary1 = doctor1.getSalary()
– doctor1.setSalary(salary1 + bonus)
– salary2 = doctor2.getSalary()
– doctor2.setSalary(salary2 - bonus)
T2
– salary1 = doctor1.getSalary()
– doctor1.setSalary(salary1 + bonus)
– salary3 = doctor3.getSalary()
– doctor3.setSalary(salary3 - bonus)