distributed operating systems. coverage distributed systems (ds) paradigms –ds...

Download Distributed Operating Systems. Coverage Distributed Systems (DS) Paradigms –DS & OS’s –Services and Models –Communication –Distributed File Systems

Post on 23-Dec-2015




0 download

Embed Size (px)


  • Slide 1
  • Distributed Operating Systems
  • Slide 2
  • Coverage Distributed Systems (DS) Paradigms DS & OSs Services and Models Communication Distributed File Systems Coordination Distributed Mutual Exclusion (ME) Distributed Co-ordination Synchronization DS Scheduling & Misc. Issues
  • Slide 3
  • What is a Distributed System A distributed system is the one preventing you from working because of the failure of a machine that you had never heard of. Leslie Lamport Multiple computers sharing (same) state and interconnected by a network A distributed system is a collection of independent computers that appear to the users of the system as a single computer. Tanenbaum shared memory multiprocessor message passing multiprocessor distributed system
  • Slide 4
  • Distribution: Example Pro/Cons All the Good Stuff: High Performance, Distributed Access, Scalable, Heterogeneous, Sharing (Concurrency), Load Balancing (Migration, Relocation), Fault Tolerance, Bank account database (DB) example Naturally centralized: easy consistency and performance Fragment DB among regions: exploit locality of reference, security & reduce reliance on network for remote access Replicate each fragment for fault tolerance But, we now need (additional) DS techniques Route request to right fragment Maintain access/consistency of fragments as a whole database Maintain access/consistency of each fragments replicas
  • Slide 5
  • Transparency: Global Access Illusion of a single computer across a DS Distribution transparency: Access transparency + location transparency + replication transparency + fragmentation transparency Fragmentation Hide wether the resource is fragmented or not
  • Slide 6
  • Multiprocessor OS Types (1) -Each CPU has its own operating system -Shared bus Comm. blocking & CPU idling! Bus
  • Slide 7
  • Multiprocessor OS Types (2) Master-Slave multiprocessors Bus -Master-Slave multiprocessors -Master is a bottleneck!
  • Slide 8
  • Multiprocessor OS Types (3) Symmetric Multiprocessors SMP multiprocessor model Bus -Eliminates the CPU bottleneck, but have issues associated to ME, synchronization -Mutex on OS?
  • Slide 9
  • OSs for DSs Loosely-coupled OS A collection of computers each running their own OS, OSs allow sharing of resources across machines A.K.A. Network Operating System (NOS) Manages heterogeneous multicomputer DS Difference: provides local services to remote clients via remote logging Data transfer from remote OS to local OS via FTP (File Transfer Protocols) Tightly-coupled OS OS tries to maintain single global view of resources it manages A.K.A. Distributed Operating System (DOS) Manages multiprocessors & homogeneous multicomputers Similar local access feel as a non-distributed, standalone OS Data migration or computation migration modes (entire process or threads)
  • Slide 10
  • Network Operating Systems (NOS)
  • Slide 11
  • Distributed Operating Systems (DOS)
  • Slide 12
  • Client Server Model for DOS & NOS
  • Slide 13
  • Middleware Can we have the best of both worlds? Scalability and openness of a NOS Transparency and relative ease of a DOS Solution: additional layer of SW above NOS Mask heterogeneity Improve distribution transparency (and others) Middleware (MW)
  • Slide 14
  • Middleware (& Openness) Document-based middleware (e.g. WWW) Coordination-based MW (e.g., Linda, publish subscribe, Jini etc.) File system based MW Shared object based MW
  • Slide 15
  • File System-Based Middleware (1) (a) (b) Approach: make a DS look like a big file system Transfer models: (a)Download/upload model (work done locally) (b)Remote access model (work done remotely)
  • Slide 16
  • (a) Two file systems (b) Naming Transparency: All clients have same view of FS (c) Some clients with different FS views File System-Based Middleware (2)
  • Slide 17
  • File System-Based Middleware (3) Semantics of file sharing (ordering and session semantics) (a) single processor gives sequential consistency (b) distributed system may return obsolete value
  • Slide 18
  • Shared Object-Based Middleware (1) Main elements of CORBA based system Common Object Request Broker Architecture Approach: make a DS look like objects (variables + methods) Easy scaling to large systems replicated objects (C++, Java) flexibility inter-ORB protocol Example 1: Main elements of CORBA (Common Object Request Broker Architecture) based system: - Object Request Broker (ORB) - Internet Inter- ORB Protocol (IIOP)
  • Slide 19
  • Shared Object-Based Middleware (2) Example 2: Globe [1] - provides a model of distributed shared objects and basic support services (naming, locating objects etc.) - an object is completely self-contained - designed to scale to a billion users, a trillion objects around the world [1] M. van Steen, P. Homburg, and A. Tanenbaum. Globe: A Wide-Area Distributed System, 1999. GIDS: Globe Infrastructure Directory Service
  • Slide 20
  • Shared Object-Based Middleware (3) A distributed shared object in Globe can have its state copied on multiple computers at once how to maintain sequential consistency of write operations?
  • Slide 21
  • Shared Object-Based Middleware (4) Internal structure of a Globe object
  • Slide 22
  • OSs, DSs & MW
  • Slide 23
  • Network Hardware The Internet
  • Slide 24
  • Network Services and Protocols Network Services (blocking) (non-blocking) Internet Protocol (IP) Transmission Control Protocol (TCP): Connection-oriented Universal Datagram Protocol (UDP): Connectionless Network services:
  • Slide 25
  • Client-Server Communications Unbuffered msg. passing send(addr,msg), recv(addr,msg) all request and reply at C/S level all msg. acks between kernels only Buffered msg. passing msg. sent to kernel mailbox or kernel/user interface socket client server kernel msg. directed at a process client server kernel blocking? non-blocking?
  • Slide 26
  • Remote Procedure Calls (RPC) Synchronous/Asynchronous (blocking/non-blocking) communication [Sync] client generated request, STUB kernel [Sync] kernel blocks process till reply received from server [ASync] buffers msg
  • Slide 27
  • RPC & Stubs (Dummy Procedure i.p.o RPC) [C] call client stub procedure [CS] prepare msg. buffer [CS] load parameters into buffer [CS] prepare msg. header [CS] send trap to kernel [K] context switch to kernel [K] copy msg. to kernel [K] determine server address (NS) [K] put address in header [K] set up network interface [K] start timer for msg [S] process req; initiate server stub [SS] call server [SS] set up parameter stack/unbundle [K] context switch to server stub [K] copy msg. to stub [K] see if stub is waiting [K] decide which stub to assign [K] check packet for validity [K] process interrupt (save PC, kernel state) NS C: Client; CS: Client Stub, [K] Kernel S: Server; SS: Server Stub, NS : Network service
  • Slide 28
  • Remote Procedure Call Implementation Issues: Can we pass pointers? (local context) call by reference becomes copy-restore (but might fail) Weakly typed languages (e.g., C allows computations, say product of arrays sans array size specs) can client stub determine unspecified size to pass on? Not always possible to determine parameter types Cannot use global variables C/S may get moved to remote machine
  • Slide 29
  • RPC Failures? C/S failure vs. communication failure? Who detects? Timeouts? Does it matter if a node (C/S) failed BEFORE or AFTER a request arrived? BEFORE or AFTER a request is processed? Client failure: orphan requests? add expiration counters Server crash?
  • Slide 30
  • Communication Delivers messages despite communication link(s) failure process failures Main kinds of failures to tolerate timing (link and process) omission (link and process) value
  • Slide 31
  • Communication: Reliable Delivery Omission failure tolerance (degree k). Design choices : a)Error masking (spatial): several (> k) links b)Error masking (temporal): repeat K+1 times c)Error recovery: detect error and recover
  • Slide 32
  • Reliable Delivery (cont.) Error detection and recovery: ACKs and timeouts Positive ACK: sent when a message is received Timeout on sender without ACK: sender retransmits Negative ACK: sent when a message loss detected Needs sequence #s or time-based reception semantics Tradeoffs Positive ACKs faster failure detection usually NACKs : fewer msgs Q: what kind of situations are good for Spatial error masking? Temporal error masking? Error detection and recovery with positive ACKs? Error detection and recovery with NACKs?
  • Slide 33
  • Resilience to Sender Failure Multicast FT-Communication harder than point-to-point Basic problem is of failure detection Subsets of senders may receive msg, then sender fails Solutions depend on flavor of multicast reliability a)Unreliable: no effort to overcome link failures b)Best-effort: some steps taken to overcome link failures c)Reliable: participants coordinate to ensure that all or none of correct recipients get it (sender failed in b)


View more >