erkay savas distributed systems cs403/534...
TRANSCRIPT
1
Introduction
CS403/534Distributed Systems
Erkay SavasSabanci University
2
Issues
1. Communication2. Processes3. Naming4. Synchronization5. Consistency & Replication6. Fault Tolerance7. Security
7 important principals
3
Distributed Systems: Paradigms
1. Distributed Object-Based Systems• CORBA, DCOM, Globe
2. Distributed File Systems• Sun NFS, Coda• Plan 9, xFS, SFS
3. Distributed Document-Based Systems• WWW, Lotus Notes
4. Distributed Coordination-Based Systems• TIB/Rendezvous, Sun JINI
4
Distributed System: DefinitionA distributed system is a piece of software ensuring that:
A collection of independent computers that appears to its users as a single coherent system.
Two aspects:1. Independent computers and2. Software that makes everything look like a
single system (middleware)
5
Machine A Machine B Machine C
Distributed System: Organization
Local OS Local OS Local OS
Middleware Service
Distributed Applications
Network
6
Goals of Distributed Systems
• Connecting resources and users• Distribution transparency• Openness• Scalability
7
Transparency in a Distributed System
Different forms of transparency in a distributed system.
Hide whether a (software) resource is in memory or on diskPersistence
Hide the failure and recovery of a resource (masking failures)Failure
Hide that a resource may be shared by several competitive users while keeping it at consistent state.Concurrency
Hide that a resource is replicatedReplication
Hide that a resource may be moved to another location while in useRelocation
Hide that a resource may move to another locationMigration Hide where a resource is locatedLocation
Hide differences in data representation and how a resource is accessedAccess
DescriptionTransparency
8
Degree of TransparencyObservation: aiming at full distribution transparency may be too much:
– Users may be located in different continents; distribution is apparent and not something you want to hide
– Completely hiding failures of networks and nodes is(theoretically and practically) impossible.
– You cannot distinguish a slow computer from a failing one.– You can never be sure that a server actually performed an
operation before a crash– Full transparency will cost performance, exposing the distributed
nature of the system• Keeping Web caches exactly up-to-date with the master copy• Immediately flushing write operations to disk for fault tolerance
9
Openness of Distributed Systems Open distributed system: Extending Services
– services given according to standard rules that describes the syntax and semantics of those services,
– standardization of interfaces– interoperability– Systems should support portability of applications.
Achieving openness: At least make the distributed system independent from the heterogeneity of the underlying environment
– Hardware– Platforms– Languages
10
ScalabilityAt least three types of scalability:
– Size scalability: Number of users and resources – Geographical scalability: Distance between users and
resources– Administrative scalability: Number of administrative
domains
Most systems emphasize only, to a certain extent, size scalability
Today, the challenge lies in geographical and administrative scalability
11
Obstacles for Scalability
Examples of scalability limitations.
routing algorithms based on complete information (load on all machines and lines, run a graph theory algorithm to compute all the optimal routes on a single machine)
Centralized algorithms
A single on-line telephone book; DNS implemented as a single table.Centralized data
A single server for all usersCentralized services
ExampleConcept
12
Characteristics of Distributed Algorithms
1. No machine has complete information about the system state
2. Machine make decisions based only on local information
3. Failure of one machine does not ruin the algorithm
4. There is no assumption that a global clockexists
13
Techniques for ScalingThree techniques:1. Hiding communication latencies
• Hide communication latency in the case of geographical scalability
• use, for example, asynchronous communication2. Distribution
• Partition components into smaller parts and spread them across multiple machines
• Move computations to clients (Java applets)• Decentralized naming services (DNS, tree of domains)• Decentralized information services (WWW)
3. Replication
14
Hiding Communication Latencies
1.4
Reducing the communication by checking the form for syntactic errors at client side through the use of a Java applet.
15
Distribution of DNS
1.5
Resolving, for example, flits.cs.vu.nl
An example of dividing the DNS name space into non-overlapping zones.
16
Techniques for Scaling (2)
Replication: Make copies of the same data available at different machines
– Replicated file servers (mainly for fault tolerance)– Replicated databases– Mirrored Web sites– Large-scale distributed shared memory systems
Caching: allow client processes to access local copies
– Web caches (browser/Web proxy)– File caching (at server and client)
17
Scaling – The ProblemsObservation: Applying replication as a scaling technique is easy, however
– Having multiple copies leads to inconsistencies;– Always keeping copies consistent requires global
synchronization on each modification– Global synchronization precludes large-scale solutions
Observation: If we can tolerate inconsistencies, we may reduce the need for global synchronization.Observation: Tolerating inconsistencies is application dependent
18
Hardware Concepts
• Multiprocessors• Multicomputers
– homogeneous– heterogeneous (network of computers)
19
Multiprocessors and Multicomputers
1.6
Different basic organizations and memories in distributed computer systems
20
Homogeneous Multicomputer Systems
a) Gridb) Hypercube
1-9
21
Networks of ComputersHigh degree of node heterogeneity
– High-performance parallel systems – High-end PCs and workstations– Simple network computers – Mobile computers (palmtops, laptops)
High degree of network heterogeneity– Local-area gigabit networks– Wireless connections– Long-haul, high latency connections
Ideally, a distributed system hides these differences
22
Software Concepts
An overview between • DOS (Distributed Operating Systems)• NOS (Network Operating Systems)• Middleware
Provide distribution transparency
Additional layer atop of NOS implementing general-purpose servicesMiddleware
Offer local services to remote clients
Loosely-coupled operating system for heterogeneous multicomputers (LAN and WAN)
NOS
Hide and manage hardware resources
Tightly-coupled operating system for multi-processors and homogeneous multicomputers
DOS
Main GoalDescriptionSystem
23
Multiprocessor Operating Systems
• Distinguishing feature of multiprocessors operating systems is the support for multiple processors having access to a shared memory.
• The communication is achieved through shared memory
• Data in shared memory must be protected against concurrent access
• Protection is done through synchronization primitives– Semaphores– Monitors
24
Semaphores• An integer with two operations: down and up• If the value of semaphore is greater than 0, the down operation decrements the value of semaphore and continues
• If the value is 0, the calling process is blocked• The up operation checks if there are any blocked
processes that were unable to complete an earlier down operation. If so, it unblocks one of them
• Otherwise, it simply increments the semaphore value
• An unblocked process can simply continue by returning from the down operation.
25
Monitors (1)• A monitor is a programming language construct similar to
an object– data and methods
• However, only one process can call the monitor at a timemonitor Counter {
private:
int count = 0;
public:
int value() { return count;}
void incr () { count = count + 1; }
void decr() { count = count – 1; }
}
26
Monitors (2)A monitor that conditionally blocks a processExample: a process blocks when count is 0
monitor Counter {
private:
int count = 0;
int blocked_procs = 0;
condition unblocked;
public:
int value () { return count;}
void incr () {
if (blocked_procs == 0)
count = count + 1;
else
signal (unblocked);
}
void decr() {
if (count == 0) {
blocked_procs = blocked_procs + 1;
wait (unblocked);
blocked_procs = blocked_procs – 1;
}
else
count = count – 1;
}
}
Condition variables
27
Multicomputer Operating Systems (1)
General structure of a multicomputer operating system
Machine A Machine B Machine C
kernel kernel kernel
Distributed operating system services
Distributed Applications
Network
28
Multicomputer Operating Systems (2)
Some characteristics:– kernel on different computers is generally the same– Services are generally (transparently) distributed
across computersHarder than multiprocessor OS
– Virtual (distributed) shared memory requires OS to maintain global memory map in software.
– When there is no shared (virtual) memory, processor communication through message passing
– No simple systemwide synchronization mechanisms
29
Network Operating System (1)
General structure of a network operating system
Machine A Machine B Machine C
kernel kernel kernel
Distributed Applications
Network
Network OSservices
Network OSservices
Network OSservices
30
Network Operating System (2)Some characteristics:
– Each computer has its own OS with networking facilities– Computers work independently (they may even have different OS)– Services are tied to individual nodes (ftp, telnet, WWW)– Highly file oriented (basically, processors share only files)
A service that is commonly provided by network OS is to allow a user to log into another machine remotely by using a command such as
rlogin machine_name
31
Network Operating System (3)
Two clients and a server in a network operating system
1-20
32
Network Operating System (4)
Different clients may mount the servers in different places
1.21
33
Positioning Middleware
General structure of a distributed system as middleware
Machine A Machine B Machine C
kernel kernel kernel
Distributed Applications
Network
Network OSservices
Network OSservices
Network OSservices
Middleware services
34
Middleware
Some characteristics:– OS on different computers need not be generally the
same– Services are generally (transparently) distributed
across computersNeed for middleware: Best of both worlds
– Scalability and openness of NOS– Transparency and related ease of use of DOS.
35
Middleware Services (1)Communication Services: Abandon primitive socket-based messages in favor of:
– Procedure calls across networks– Remote-object method invocation– Message-queuing systems– Advanced communication streams– Event notification service
Information system services:– Large-scale, system-wide naming services– Advanced directory services (search engines)– Location services for tracking mobile objects
36
Middleware Services (2)Control services: Services giving applications control over when, where, and how they access data
– Distributed transaction processing– Code migration
Security services: Services for secure processing and communication:
– Authentication and authorization services– Simple encryption services– Auditing services– Access Control
Others:– Persistent storage facilities– Data caching & replication
37
Middleware and Openness
In an open middleware-based distributed system, the protocols used by each middleware layer should be the same, as well as the interfaces they offer to applications.
Network OS
Middleware
Application
Network OS
Middleware
Application
Commonprotocol
sameprogramming
interface
38
Comparison between Systems
A comparison between multiprocessor operating systems, multicomputer operating systems, network operating systems, and middleware based distributed systems
OpenOpenClosedClosedOpenness
VariesYesModeratelyNoScalability
Per nodePer nodeGlobal, distributed
Global, centralResource management
Model specificFilesMessagesShared
memoryBasis for communication
NNN1Number of copies of OS
NoNoYesYesSame OS on all nodes
HighLowHighVery HighDegree of transparency
Multicomp.Multiproc.Middleware-based DS
Network OS
Distributed OSItem
39
The Client-Server Model
Characteristics: – There are processes offering services (servers)– There are processes using services (clients)– Clients and servers can be distributed across different
machines– Clients follow request/reply model with respect to
using services
40
Request/Reply model
General interaction between a client and a server.
Client
Server
Request
Provideservice
Reply
wait for result
time
41
Clients and ServersServers: Generally provide services related to shared resources
– Servers for file systems, databases, implementation repositories
– Servers for shared, linked documents (WWW, Lotus Notes)
– Servers for shared applications– Servers for shared distributed objects
Clients: Allow remote service access– Interface transforming client’s local service calls to
request/reply messages
42
Application Layering
Traditional three-layered view:– User-interface layer contains units for an
application’s user interface– Processing layer contains the functions of an
application, i.e. without specific data– Data layer contains the data that a client wants to
manipulate through processing layer(important property is that the data is persistent)
43
Example: Layering
The general organization of an Internet search engine into three different layers
1-28
44
Client-Server Architectures
• Two alternatives for distributing layers of an application:
1. Two-tiered: client/single server configuration2. Three-tiered: each layer on separate machine
45
Traditional Two-Tiered Configurations
Alternative client-server organizations (a) – (e).
1-29
46
Three-Tiered Architectures
An example of a server acting as a client.
1-30
47
Alternative C/S Architectures (1)Cooperating servers: Service is physically distributed across a a collection of servers:
– Traditional multi-tiered architectures– Replicated file systems– Network news services– Large-scale naming systems (DNS, X.500)– Workflow systems
Cooperating clients: Distributed applications exists by virtue of client collaboration (peer-to-peer distribution)
– Teleconferencing where each client owns a workstation– Publish/subscribe architectures in which role of client
and server is blurred
48
Modern Architectures
An example of horizontal distribution of a Web service.
1-31