distributed operating systems neeraj suri

Distributed Operating Systems

Neeraj Suri

www.deeds.informatik.tu-darmstadt.de

First: Evaluation Forms!

• Please fill in the provided forms for the evaluation of our lecture

• Your answers provide us with feedback for improving the lecture/exercises/labs

• Online: www.fachschaft.informatik.tu-darmstadt.de/feedback

• Two volunteers should bring the completed forms to Fachschaft Informatik (Raum D120, S2|02)

• Thanks!

Coverage

• DS Paradigms
  – DS & OS’s
  – Services and models
  – Communication
  – File Systems
• Coordination
  – Dist. ME
  – Dist. Co-ordination
  – Synchronization
• DS Scheduling & Misc. Issues

What is a Distributed System?

“A distributed system is the one preventing you from working because of the failure of a machine that you had never heard of.”
Leslie Lamport

• Multiple computers sharing (same) state and interconnected by a network

Distribution: Example Pro/Cons

All the Good Stuff: High-Perf, Distributed Access, Scalable, Heterogeneous, Sharing (Concurrency), Load Balancing (Migration, Relocation), FT, …

• Bank account database (DB) example
  – Naturally centralized: easy consistency and performance
  – Fragment DB among regions: exploit locality of reference, security & reduce reliance on network for remote access
  – Replicate each fragment for fault tolerance
• But, we now need (additional) DS techniques
  – Route request to right fragment
  – Maintain access/consistency of fragments as a whole database
  – Maintain access/consistency of each fragment’s replicas
  – …
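The "route request to right fragment" step, combined with replica fallback, can be sketched roughly as follows. This is only an illustration: the region names, server names, the `FRAGMENTS` and `ACCOUNT_REGION` tables, and the `route` function are all hypothetical and not taken from the lecture.

```python
# Hypothetical layout: each region's fragment is replicated on several servers.
FRAGMENTS = {
    "north": ["north-db-1", "north-db-2"],
    "south": ["south-db-1", "south-db-2"],
}

# Hypothetical directory mapping each account to the region that owns it.
ACCOUNT_REGION = {"acct-100": "north", "acct-200": "south"}


def route(account, failed=frozenset()):
    """Return the first live replica of the fragment that holds `account`."""
    region = ACCOUNT_REGION[account]       # step 1: pick the right fragment
    for replica in FRAGMENTS[region]:      # step 2: pick a live replica of it
        if replica not in failed:
            return replica
    raise RuntimeError(f"no live replica for fragment {region!r}")
```

Calling `route("acct-100", failed={"north-db-1"})` falls back to the second replica of the same fragment. Keeping the replicas consistent with each other is a separate problem that this sketch does not address.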

Transparency: Global Access

• Illusion of a single computer across a DS

Multiprocessor OS Types (1)

• Each CPU has its own operating system
  – Shared bus: communication blocking & CPU idling!

• Master-Slave multiprocessors
  – Master is a bottleneck!

• Symmetric Multiprocessors (SMP multiprocessor model)
  – Eliminates the master-CPU bottleneck, but raises issues of ME (mutual exclusion) and synchronization
  – Mutex on OS?

OS’s for DS’s

• Loosely-coupled OS
  – A collection of computers, each running its own OS; the OS’s allow sharing of resources across machines
  – AKA Network Operating System (NOS)
  – Manages heterogeneous multicomputer DS
  – Difference: provides local services to remote clients via remote login
  – Data transfer from remote OS to local OS via FTP (File Transfer Protocol)
• Tightly-coupled OS
  – OS tries to maintain a single global view of the resources it manages
  – AKA Distributed Operating System (DOS)
  – Manages multiprocessors & homogeneous multicomputers
  – Similar “local access feel” as a non-distributed, standalone OS
  – Data migration or computation migration modes (entire process or threads)

Network Operating Systems (NOSs)

Distributed Operating Systems: DOS’s

Client Server Model for DOS & NOS

Middleware

• Can we have the best of both worlds?
  – Scalability and openness of a NOS
  – Transparency and relative ease of a DOS
• Solution: additional layer of SW above NOS: “Middleware”
  – Mask heterogeneity
  – Improve distribution transparency (and others)

Middleware (& Openness)

File System-Based Middleware

• Approach: make a DS look like a big file system
• Transfer:
  – (a) upload/download model (work done locally)
  – (b) remote access model (work done remotely)
• Naming:
  – (a) Two file systems
  – (b) Naming transparency: all clients have the same view of the FS
  – (c) Some clients with a different FS view
• Semantics of file sharing (ordering and session semantics)
  – (a) single processor gives sequential consistency
  – (b) distributed system may return an obsolete value

Shared Object-Based Middleware

• Approach: make a DS look like objects (variables + methods)
• Easy scaling to large systems
• Replicated objects (C++, Java)
• Flexibility
• Main elements of a CORBA-based system
  – Common Object Request Broker Architecture
  – inter-ORB protocol
• Internet structured object (Globe)
  – A distributed shared object in the Internet can have its state copied on multiple computers at once
  – How to maintain sequential consistency of write operations?

OS’s, DS’s & MW

• Know the above table well!!!

Network Hardware

Network Services and Protocols

Network Services

(blocking)

(non-blocking)

Client-Server Communications

• Unbuffered msg. passing
  – send(addr,msg), recv(addr,msg)
  – all request and reply at C/S level
  – all msg. acks between kernels only
  – msg. directed at a process
• Buffered msg. passing
  – msg. sent to kernel mailbox or kernel/user interface socket
• blocking? non-blocking?
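The difference between blocking and non-blocking receives on a buffered mailbox can be sketched in a few lines. Here Python's `queue.Queue` stands in for a kernel mailbox, and `send`, `recv_blocking` and `recv_nonblocking` are hypothetical names chosen for the sketch, not a real kernel interface.

```python
import queue

mailbox = queue.Queue()  # stands in for a kernel mailbox


def send(msg):
    """Buffered send: the message is queued even if nobody is receiving yet."""
    mailbox.put(msg)


def recv_blocking(timeout=1.0):
    """Blocking receive: wait for a message; raises queue.Empty on timeout."""
    return mailbox.get(timeout=timeout)


def recv_nonblocking():
    """Non-blocking receive: return None at once if the mailbox is empty."""
    try:
        return mailbox.get_nowait()
    except queue.Empty:
        return None
```

With unbuffered passing there is no mailbox to park the message in, so a message that arrives before the matching `recv` has been issued must be dropped or held by the kernel in some other way.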

Remote Procedure Calls

• Synchronous/Asynchronous (blocking/non-blocking) communication
  – [Sync] client generates request: client → stub → kernel
  – [Sync] kernel blocks the process until the reply is received from the server
  – [Async] buffers msg

RPC & Stubs (Dummy Procedure in place of RPC)

Client side:
• [C] call “client stub” procedure
• [CS] prepare msg. buffer
• [CS] load parameters into buffer
• [CS] prepare msg. header
• [CS] send trap to kernel
• [K] context switch to kernel
• [K] copy msg. to kernel
• [K] determine server address (NS)
• [K] put address in header
• [K] set up network interface
• [K] start timer for msg

Server side (in execution order):
• [K] process interrupt (save PC, kernel state)
• [K] check packet for validity
• [K] decide which stub to assign
• [K] see if stub is waiting
• [K] copy msg. to stub
• [K] context switch to server stub
• [SS] set up parameter stack/unbundle
• [SS] call server
• [S] process req; initiate “server stub”

C: Client; CS: Client Stub; S: Server; SS: Server Stub; K: Kernel; NS: Name Server
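Stripped of the kernel work, the stub steps above amount to packing parameters into a message on the client and unpacking them on the server. The sketch below shows that idea only. The JSON encoding, the `PROCEDURES` table and the in-process `transport` function are assumptions made for illustration; a real RPC system would use its own wire format and go through the kernel and network.

```python
import json


def add(a, b):
    """The real procedure, living on the server."""
    return a + b


PROCEDURES = {"add": add}  # hypothetical dispatch table on the server


def server_stub(request_bytes):
    """Unbundle the parameters, call the server procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def transport(request_bytes):
    """Stand-in for kernel + network: hands the message to the server stub."""
    return server_stub(request_bytes)


def client_stub_add(a, b):
    """Looks like a local call; actually builds a message and sends it."""
    request = json.dumps({"proc": "add", "args": [a, b]}).encode("utf-8")
    reply = transport(request)
    return json.loads(reply.decode("utf-8"))["result"]
```

The caller simply writes `client_stub_add(2, 3)` and never sees the message buffer, which is the point of the dummy procedure.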

Remote Procedure Call

• Implementation Issues
• Can we pass pointers? (local context…)
  – call by reference becomes copy-restore (but might fail)
• Weakly typed languages (C) allow computations (say, product of arrays without array size specs)
  – can client stub determine unspecified size to pass on?
• Not always possible to determine parameter types
• Cannot use global variables
  – C/S may get moved to remote machine
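One way copy-restore "might fail" is aliasing: if the same array is passed as two parameters, the stub ships two independent copies, and the result differs from true call by reference. The sketch below illustrates this. The function names are hypothetical, and Python lists stand in for arrays passed by pointer.

```python
def add_into(a, b):
    """Procedure body: a[i] += b[i]. With call by reference, a and b may alias."""
    for i in range(len(a)):
        a[i] += b[i]


def remote_add_into(a, b):
    """Copy-restore stub: copy the arguments out, run remotely, copy them back."""
    a_copy, b_copy = list(a), list(b)   # copy: marshal the referenced data
    add_into(a_copy, b_copy)            # the "server" works on its own copies
    a[:] = a_copy                       # restore: overwrite the caller's data
    b[:] = b_copy


# Distinct arguments: both calling conventions agree.
p, q = [1], [10]
remote_add_into(p, q)

# Aliased arguments: call by reference doubles x, copy-restore does not.
x = [1]
add_into(x, x)
y = [1]
remote_add_into(y, y)
```

After this runs, `x` has been doubled in place, while `y` ends up with whatever the last restore wrote, because the second copy never saw the update made through the first.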

RPC Failures?

• C/S failure vs. communication failure?
• Who detects? Timeouts?
• Does it matter if a node (C/S) failed BEFORE or AFTER a request arrived? BEFORE or AFTER a request is processed?
• Client failure: orphan requests? Add expiration counters

• Server crash?

Communication

• Delivers messages despite
  – communication link(s) failure
  – process failures
• Main kinds of failures to tolerate
  – Timing (link and process)
  – Omission (link and process)
  – Value

Communication: Reliable Delivery

• Omission failure tolerance (degree k): tolerate up to k omission failures
• Design choices:
  a) Error masking (spatial): several (> k) links
  b) Error masking (temporal): repeat k+1 times
  c) Error recovery: detect error and recover
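Temporal masking (choice b) works because a link that can lose at most k messages cannot lose all of k+1 copies. A minimal simulation is sketched below, assuming a worst-case link that drops the first transmissions it sees. `make_lossy_link` and `send_with_temporal_masking` are hypothetical helpers for illustration.

```python
def make_lossy_link(max_omissions):
    """Hypothetical link that drops the first `max_omissions` messages it sees."""
    state = {"dropped": 0}
    delivered = []

    def transmit(msg):
        if state["dropped"] < max_omissions:
            state["dropped"] += 1   # omission failure: the message is lost
            return
        delivered.append(msg)

    return transmit, delivered


def send_with_temporal_masking(transmit, msg, k):
    """Repeat the message k+1 times so that at most k omissions cannot lose it."""
    for _ in range(k + 1):
        transmit(msg)
```

If the link drops fewer than k copies, the receiver sees duplicates, so in practice the receiver would also need to discard repeated messages (for example by sequence number).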

Reliable Delivery (cont.)

• Error detection and recovery: ACKs and timeouts
• Positive ACK: sent when a message is received
  – Timeout on sender without ACK: sender retransmits
• Negative ACK (NACK): sent when a message loss is detected
  – Needs sequence #s or time-based reception semantics
• Tradeoffs
  – Positive ACKs: usually faster failure detection
  – NACKs: fewer msgs…
• Q: what kind of situations are good for
  – Spatial error masking?
  – Temporal error masking?
  – Error detection and recovery with positive ACKs?
  – Error detection and recovery with NACKs?
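Both recovery styles can be sketched in a few lines. In this illustration `channel` is a hypothetical callable that returns True when an ACK arrives before the sender's timeout, and `make_channel` simulates a link that loses the first few transmissions. None of these names come from the lecture.

```python
def send_with_ack(msg, channel, max_retries=3):
    """Positive ACK: retransmit until an ACK arrives before the timeout.

    Returns the number of transmissions that were needed.
    """
    for attempt in range(1, max_retries + 2):
        if channel(msg):
            return attempt
    raise TimeoutError(f"no ACK after {max_retries + 1} transmissions")


def missing_sequence_numbers(received_seqs):
    """NACK: the receiver spots losses as gaps in the sequence numbers it sees."""
    expected = 0
    nacks = []
    for seq in received_seqs:
        if seq > expected:
            nacks.extend(range(expected, seq))  # ask for these again
        expected = max(expected, seq + 1)
    return nacks


def make_channel(losses):
    """Simulated channel that loses the first `losses` transmissions."""
    remaining = {"n": losses}

    def channel(_msg):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            return False   # timeout expired, no ACK
        return True        # ACK received

    return channel
```

Note that the NACK receiver only notices a loss when a later message arrives. A lost final message goes undetected by sequence numbers alone, which is why the slide also mentions time-based reception semantics.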

Resilience to Sender Failure

• Multicast FT-Communication harder than point-to-point
  – Basic problem is of failure detection
  – A subset of receivers may receive the msg, then the sender fails
• Solutions depend on flavor of multicast reliability
  a) Unreliable: no effort to overcome link failures
  b) Best-effort: some steps taken to overcome link failures
  c) Reliable: participants coordinate to ensure that all or none of correct recipients get it (sender failed in b)

Distributed-File Systems

• Multiple users, multiple sites, multiple files & storage
• Transparency of services (local = remote)
  – network transparency
  – location transparency/independence (migration transparency)
  – name transparency (symbolic pathnames)
  – universal access from all sites
  – concurrency (use of resources) transparency
• Availability/Performance
  – fault tolerance
  – security
  – scalability
• Sanity of Ops
  – sequencing of actions
  – cache/consistency

Typical Model: Client Server (NFS)

(a) Upload/Download Model: file resides on FS, moved to client on request
  1: Initial file moved from file server to client (work copy)
  2: File ops done at client’s local cache
  3: Final, updated file returned to server
  • storage needs on client/FS?
  • 2nd client trying to access file: consistency of data? ME?

(b) Remote Access Model: file resides on FS
  – file ops done on FS
  – C/FS comm. via RPC for all file ops
  • data consistency
  • performance?
  • network dependence?

Transparency

• location transparency (file name does not reveal the file’s physical storage location)
  – /server X/dir Y/f.n
  – server X identified by name, not by physical ID or location
  – file moves to server Z: transparent?
• location independence (file name does not need to be changed when the file’s physical storage location changes)
  – file accessible by name not by path
  – (a) /server group name/<rest of path>
  – (b) file/dir mounting (exporting): symbolic linkage
    • tree rooted at /home/X is mounted on the /rule/home/X point; the users of “rule” can see the X file system as if it were a directory under /rule/home/X (the mounting point)
    • access ~/home/X [~ location transparency]
  – MOUNT Table at client? client access flexibility
  – MOUNT Table at server? server transparency for updates and consistency
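A client-side mount table can be pictured as a map from mount points to exported remote trees, consulted on every path lookup. The sketch below uses the /rule/home/X example from the slide. The server name `serverX`, the `MOUNT_TABLE` layout and the `resolve` function are assumptions made for illustration.

```python
# Hypothetical client-side mount table: mount point -> (server, exported root).
MOUNT_TABLE = {
    "/rule/home/X": ("serverX", "/home/X"),
}


def resolve(path):
    """Map a client path to (server, remote path) via the longest mount point."""
    best = None
    for mount_point in MOUNT_TABLE:
        # Match whole path components only, so /rule/home/Xtra does not match.
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("local", path)
    server, exported_root = MOUNT_TABLE[best]
    return (server, exported_root + path[len(best):])
```

If the tree moves to another server, only the table entry changes and the path that users type stays the same, which is the location independence the slide describes.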

Consistency

• Is locally cached copy of the data consistent with the master copy?

• Client-initiated approach
  – Client initiates a validity check
  – Server checks whether the local data are consistent with the master copy
• Server-initiated approach
  – Server records, for each client, the (parts of) files it caches
  – When server detects a potential inconsistency, it must react
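The two validation approaches can be contrasted in a small sketch. Version numbers as the consistency check and invalidation callbacks as the server's reaction are assumptions chosen for illustration. The `Server` and `Client` classes are hypothetical and not a real DFS interface.

```python
class Server:
    def __init__(self):
        self.files = {}      # name -> (version, data): the master copies
        self.callbacks = {}  # name -> clients caching it (server-initiated state)

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)
        # Server-initiated: react by telling caching clients their copy is stale.
        for client in self.callbacks.pop(name, set()):
            client.invalidate(name)

    def fetch(self, name, client=None):
        if client is not None:
            self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def version(self, name):
        return self.files[name][0]


class Client:
    def __init__(self, server, server_initiated):
        self.server = server
        self.server_initiated = server_initiated
        self.cache = {}  # name -> (version, data)

    def invalidate(self, name):
        self.cache.pop(name, None)

    def read(self, name):
        cached = self.cache.get(name)
        if cached is not None:
            # Client-initiated: check the cached version against the master copy.
            if self.server_initiated or cached[0] == self.server.version(name):
                return cached[1]
        registrant = self if self.server_initiated else None
        self.cache[name] = self.server.fetch(name, registrant)
        return self.cache[name][1]
```

The client-initiated variant pays one validity check per read, while the server-initiated variant makes the server remember who caches what.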

Caching and Remote Service

• Servers contacted only occasionally in caching (not for each access)
  – Reduces server load and network traffic
  – Enhances potential for scalability
• Total network overhead in transmitting big chunks of data (caching) is lower than a series of responses to specific requests (remote-service)
• Caching best with infrequent writes
  – With frequent writes, substantial overhead incurred to overcome cache-consistency problem

Caching and Remote Service (Cont.)

• Who (co-ordinates) caching in a DFS?

• Robust? Access control handled by …?

• What policy?
  – Write-through
  – Delayed write (NFS v2: 3 sec for data block, 30 sec for dir. block, forced write after that)
  – Write-on-close (NFS v3+: write file block back to server)
  – Sender initiated (RFS: server maintains global client view … not scalable)

Caching & File Replication

• File replicas reside on failure-independent machines
• Improves availability and can shorten service time
• Naming scheme maps a replicated file name to a particular replica
  – Existence of replicas should be invisible to higher levels
  – Replicas must be distinguished from one another by different lower-level names
• BUT: Updates: replicas of a file denote the same logical entity, and thus an update to any replica must be reflected on all other replicas
• Demand replication: reading a nonlocal replica causes it to be cached locally, thereby generating a new nonprimary replica
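The naming and update rules above can be sketched as one logical file backed by several lower-level replica names. The `ReplicatedFile` class and the replica names used with it are hypothetical, and the write path here simply updates every replica in turn without handling failures during the update.

```python
class ReplicatedFile:
    """One logical file name backed by replicas with distinct lower-level names."""

    def __init__(self, replica_names):
        self.replicas = {name: "" for name in replica_names}

    def read(self, unavailable=frozenset()):
        """Serve the read from any replica that is currently reachable."""
        for name, contents in self.replicas.items():
            if name not in unavailable:
                return contents
        raise IOError("no replica available")

    def write(self, contents):
        """An update to the logical file must be reflected on every replica."""
        for name in self.replicas:
            self.replicas[name] = contents
```

Callers use only `read` and `write`, so the existence of replicas stays invisible at the higher level, while the dictionary keys play the role of the lower-level names.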

Stateful File Service (RFS, AFS)

• Server remembers last request!
  – Client opens a file
  – Server fetches information about the file from its disk, stores it in its memory, and gives the client a connection identifier unique to the client and the open file
  – Server maintains request/file status in centralized tables: open/close file, data consistency, file op. ordering etc.
  – Identifier is used for subsequent accesses until the session ends
  – Server must reclaim the main-memory space used by clients who are no longer active
  – Performance! Fewer disk accesses, shorter msgs
  – Stateful server knows if a file was opened for sequential access and can thus read ahead the next blocks; file locking, good cache predictions
  – Caching (client and server) with concurrent write invalidation (state info)
  – Poor FT: a server crash requires complex state recovery! Restore state by recovery protocol based on a dialog with clients, or abort operations that were underway when the crash occurred

• Server needs to be aware of client failures in order to reclaim space allocated to record the state of crashed client processes (orphan detection and elimination)

Stateless File Server (NFS)

• Acts on per-request basis (client sends req to server; server executes req and replies; server “deletes” all info about client post-request): no state information stored!
  – Each request identifies the file and position in the file
  – No need to establish and terminate a connection by open and close operations
  – (+) no server space used for file tables etc.
  – (+) FT: server crashes: no state recovery; client crashes: no effect on server consistency
  – (-) file sharing/locking
  – (-) long msg. and network dependency
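The contrast between the stateless service and the stateful service described earlier shows up directly in the request signatures. The sketch below is illustrative only: `FILES`, `StatelessServer` and `StatefulServer` are hypothetical, and an in-memory dictionary stands in for the disk.

```python
FILES = {"notes.txt": b"hello distributed world"}  # stand-in for the disk


class StatelessServer:
    """Every request names the file and the position; nothing is remembered."""

    def read(self, name, offset, count):
        return FILES[name][offset:offset + count]


class StatefulServer:
    """open() creates per-client state; later requests carry only an identifier."""

    def __init__(self):
        self.open_files = {}  # connection id -> [name, current offset]
        self.next_id = 0

    def open(self, name):
        self.next_id += 1
        self.open_files[self.next_id] = [name, 0]
        return self.next_id

    def read(self, conn_id, count):
        name, offset = self.open_files[conn_id]
        self.open_files[conn_id][1] = offset + count  # server tracks the position
        return FILES[name][offset:offset + count]

    def close(self, conn_id):
        del self.open_files[conn_id]  # reclaim the table entry
```

If the stateful server restarts, its `open_files` table is gone and an old connection identifier no longer means anything, whereas the same stateless request can be resent to a restarted server unchanged. The price is the longer message: file name and offset travel with every request.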

Distinctions

• Some environments require stateful service
  – A server employing server-initiated cache validation cannot provide stateless service, since it maintains a record of which files are cached by which clients

– UNIX use of file descriptors and implicit offsets is inherently stateful; servers must maintain tables to map the file descriptors to inodes, and store the current offset within a file

Example: NFS: Stateless Server

• No file open/close semantics visible to clients (unlike UNIX); file access done using file handles + locking protocol on file for gen # consistency

• Session semantics: file attributes known to other files only at client/server instance (at time of discrete READ/CLOSE)

• Consistency processes: 4.2 BSD/NFS delayed write (x secs after write); NFS v3+ write-on-close
• Stateless: nice FT for clients
  – server crash: clients hang (NFS server mount or RPC service not responding)
  – ideally: no dedicated server
  – reality: dedicated, customized fall-back server lists

ANDREW FS: Stateful Server

• Very large systems (5000+)
• Clients and servers structured in clusters interconnected by a backbone LAN
• A cluster consists of a collection of workstations and a cluster server and is connected to the backbone by a router
• Key mechanism for remote file operations is whole file caching from servers
  – Opening a file causes it to be cached, in its entirety, on the local disk
  – A client workstation interacts with Vice servers only during opening and closing of files
  – Files modified locally and updated only on CLOSE
• Reading and writing bytes of a file are done by the kernel on the cached copy, without Venus intervention
• Caches contents of directories and symbolic links, for path-name translation
• Exceptions to the caching policy are modifications to directories, which are made directly on the server responsible for that directory
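Whole-file caching with write-back on close can be sketched as below. The `ViceServer` and `Workstation` classes, the file name and the transfer counter are hypothetical illustrations of the policy only. They do not model the actual AFS protocol or its cache validation.

```python
class ViceServer:
    def __init__(self):
        self.files = {"paper.txt": "draft 1"}
        self.transfers = 0  # counts whole-file transfers to/from the server

    def fetch(self, name):
        self.transfers += 1
        return self.files[name]

    def store(self, name, contents):
        self.transfers += 1
        self.files[name] = contents


class Workstation:
    def __init__(self, server):
        self.server = server
        self.local_disk = {}  # whole-file cache on the local disk
        self.dirty = set()

    def open(self, name):
        if name not in self.local_disk:
            self.local_disk[name] = self.server.fetch(name)  # whole file on open

    def read(self, name):
        return self.local_disk[name]      # served from the cached copy

    def write(self, name, contents):
        self.local_disk[name] = contents  # modifies the cached copy only
        self.dirty.add(name)

    def close(self, name):
        if name in self.dirty:
            self.server.store(name, self.local_disk[name])   # update on CLOSE
            self.dirty.discard(name)
```

Any number of reads and writes between `open` and `close` cost no server traffic, but other clients keep seeing the old contents until the writer closes the file.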

ANDREW (Cont.)

• Clients are presented with a partitioned space of file names: a local name space and a shared name space

• Dedicated servers present the shared name space to the clients as a homogeneous, identical, and location-transparent file hierarchy

• The local name space is the root file system of a workstation, from which the shared name space descends

• Fids are location transparent; therefore, file movements from server to server do not invalidate cached directory contents

• Location information is kept on a volume basis, and the information is replicated on each server

AFS File System

Client's view

ANDREW Implementation

• Client processes are interfaced to a UNIX kernel with the usual set of system calls

• Venus carries out path-name translation component by component
• The UNIX file system is used as a low-level storage system for both servers and clients
  – The client cache is a local directory on the workstation’s disk

• Server processes access UNIX files directly by their inodes to avoid the expensive path name-to-inode translation routine
