meng han presentation 09/11/2013 cs8320 – advanced operating systems fall 2013 – section 2.6...

DISTRIBUTED SYSTEMS MAJOR DESIGN ISSUES

Meng Han

Presentation 09/11/2013
CS8320 – Advanced Operating Systems
Fall 2013 – Section 2.6 Presentation

Outline

Introduction

Distributed System Design Issues

• Object Models and Naming Schemes
• Distributed Coordination
• Interprocess Communication
• Distributed Resources
• Fault Tolerance and Security

Design for big-data

Summary

References

Introduction

A distributed system mainly involves [1]:
• Coordination of concurrent distributed processes
• Management of distributed resources
• Functioning of distributed algorithms

However…
• The network may be UNRELIABLE
• Components may be UNTRUSTED

These raise design and implementation issues, in particular how to support transparency.

Introduction

Design and implementation issues:
• How to model and identify objects in the system
• How to coordinate the interaction among objects
• How objects communicate with each other
• How shared or replicated objects are managed in a controlled fashion
• How to protect objects and the security of the system

Object Models and Naming Schemes

Objects in a computer system: processes, data files, memory, devices, processors, and networks.

Objects are encapsulated in servers:
• Process servers, file servers, memory servers, etc.
• A client is a null server that accesses object servers.

Object Models and Naming Schemes

Identify a server [2]:
• by name (name server)
• by either physical or logical address (network server)
• by the service that the server provides

The following all depend on the naming scheme for system objects: the structure of the system, management of the name space, name resolution, and access methods.
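The three ways of identifying a server listed above can be sketched as a toy in-memory registry. This is only an illustration: the class `NameServer`, its methods, and the example names and addresses are hypothetical and come neither from the slides nor from any real naming service.

```python
class NameServer:
    """Toy registry; every name and address used with it is hypothetical."""

    def __init__(self):
        self._address_of = {}   # name -> address
        self._name_at = {}      # address -> name
        self._providers = {}    # service -> set of server names

    def register(self, name, address, services=()):
        """Record a server under its name, its address, and its services."""
        self._address_of[name] = address
        self._name_at[address] = name
        for service in services:
            self._providers.setdefault(service, set()).add(name)

    def resolve(self, name):
        """Name resolution: map a name to an address (KeyError if unknown)."""
        return self._address_of[name]

    def name_at(self, address):
        """Identify a server by its address."""
        return self._name_at[address]

    def providers(self, service):
        """Identify servers by the service they provide (sorted names)."""
        return sorted(self._providers.get(service, ()))


def build_example():
    """Build a small registry with made-up servers for illustration."""
    ns = NameServer()
    ns.register("fs1", ("10.0.0.5", 2049), services=["file"])
    ns.register("fs2", ("10.0.0.6", 2049), services=["file"])
    ns.register("ps1", ("10.0.0.7", 7000), services=["process"])
    return ns
```

A real name server would also have to deal with a distributed name space, caching, and stale entries, which this sketch leaves out.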

Distributed Coordination

Coordination is needed to achieve synchronization. Different types of synchronization:

Barrier synchronization
• Processes must reach a common synchronization point before they can continue.

Condition coordination
• A process must wait for a condition that will be set asynchronously by other interacting processes, to maintain some ordering of execution.

Mutual exclusion
• Concurrent processes must have mutual exclusion when accessing a critical shared resource.
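The three kinds of synchronization can be illustrated on a single machine with threads from Python's standard `threading` module. This is a sketch under a strong simplifying assumption: the threads share memory, whereas distributed processes would have to build the same behaviour out of messages. The function name `run_demo` and the fields of `state` are made up for the example.

```python
import threading


def run_demo(n_workers=3, increments=1000):
    """Run workers that use a barrier, a condition, and a lock in turn."""
    barrier = threading.Barrier(n_workers)   # barrier synchronization
    ready = threading.Condition()            # condition coordination
    lock = threading.Lock()                  # mutual exclusion
    state = {"go": False, "arrived": 0, "counter": 0}

    def worker():
        with lock:
            state["arrived"] += 1
        # Nobody passes this point until all n_workers have reached it.
        barrier.wait()
        # Wait for a condition that another thread sets asynchronously.
        with ready:
            ready.wait_for(lambda: state["go"])
        # Critical section: only one thread updates the counter at a time.
        for _ in range(increments):
            with lock:
                state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    with ready:                              # set the condition and wake waiters
        state["go"] = True
        ready.notify_all()
    for t in threads:
        t.join()
    return state
```

With the lock in place, the final counter equals `n_workers * increments`; without it, concurrent updates could be lost.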

Synchronization Issues

State information is sent by messages:
• Typically only partial state information is known about other processes, which makes synchronization difficult.
• Information is not current because of transfer time delay.

The decision whether a process may continue must rely on a message resolution protocol.

Centralized coordinator: a central point of failure.

Deadlocks [3]
• Circular waiting for the other process
• Deadlock detection and recovery strategies

Synchronization Issues

Deadlocks

Four conditions must hold for a deadlock to occur:
• Exclusive use
• Hold and wait
• No preemption
• Cyclical wait

The problem of deadlocks can be handled in the following ways: prevention, avoidance, and detection.
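Detection can be illustrated by looking for a cycle in a wait-for graph, where an edge P → Q means process P is waiting for a resource held by Q. The sketch below assumes that a single node already holds the whole graph; it is not a distributed detection algorithm such as the one in [3], and the function name `find_cycle` is made up for the example.

```python
def find_cycle(wait_for):
    """Return a list of processes forming a cyclical wait, or None.

    wait_for maps each process to the processes it is waiting on.
    """
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(p):
        colour[p] = GREY
        path.append(p)
        for q in wait_for.get(p, ()):
            c = colour.get(q, WHITE)
            if c == GREY:             # back edge: q is on the current path
                return path[path.index(q):]
            if c == WHITE:
                found = visit(q)
                if found:
                    return found
        path.pop()
        colour[p] = BLACK
        return None

    for p in list(wait_for):
        if colour.get(p, WHITE) == WHITE:
            found = visit(p)
            if found:
                return found
    return None
```

A returned cycle identifies the processes involved, which a recovery strategy could then use to pick a victim to abort or roll back.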

Deadlock Prevention

Schemes that guarantee deadlocks can never happen because of the way the system is structured.
• One of the four conditions is prevented, thus preventing deadlocks.
• For example, impose an order on the resources and require processes to request resources in increasing order. This prevents cyclical wait and thus makes deadlocks impossible.
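The resource-ordering rule can be sketched with local locks. The sketch assumes threads on one machine and a fixed, globally agreed ranking of resources; the class `OrderedLocks` and the function `ordering_demo` are hypothetical names used only for illustration.

```python
import threading


class OrderedLocks:
    """Locks that are always acquired in one fixed global order."""

    def __init__(self, names):
        self._rank = {name: i for i, name in enumerate(names)}
        self._locks = {name: threading.Lock() for name in names}

    def acquire_all(self, wanted):
        """Acquire the wanted locks in increasing rank; return that order."""
        ordered = sorted(set(wanted), key=self._rank.__getitem__)
        for name in ordered:
            self._locks[name].acquire()
        return ordered

    def release_all(self, held):
        """Release locks in the reverse of the order they were acquired."""
        for name in reversed(held):
            self._locks[name].release()


def ordering_demo(rounds=200):
    """Two workers ask for A and B in opposite orders without deadlocking."""
    locks = OrderedLocks(["A", "B"])
    done = []

    def worker(wanted):
        for _ in range(rounds):
            held = locks.acquire_all(wanted)   # reordered to A, then B
            locks.release_all(held)
        done.append(tuple(wanted))

    threads = [
        threading.Thread(target=worker, args=(["A", "B"],), daemon=True),
        threading.Thread(target=worker, args=(["B", "A"],), daemon=True),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return sorted(done)
```

Because both workers end up locking A before B regardless of the order they asked for, neither can hold B while waiting for A, so the cyclical-wait condition cannot arise.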

Interprocess Communication

Lower level:
• Interprocess communication can be accomplished by using simple message-passing primitives.

Higher level:
• Logical communication methods provide transparency: they hide the physical details of message passing.

Two important concepts:
• The client/server model
• Remote Procedure Call (RPC)

The Client/Server Model

The client/server model is a programming paradigm for structuring processes in distributed systems [4].

[Figure: client/server model. Labels: client, server, kernel (on each side), network; request and reply (logical communication); actual communication.]

The RPC Model

The remote procedure call model is similar to the local procedure call model, in which:
• The caller places arguments to a procedure in a specific location (such as a result register).
• The caller temporarily transfers control to the procedure.
• When the caller gains control again, it obtains the results of the procedure from the specified location.
• The caller then continues program execution.

The RPC Model

On the server side, a process is dormant (inactive, sleeping), awaiting the arrival of a call message.
• When one arrives, the server process computes a reply that it then sends back to the requesting client.
• After this, the server process becomes dormant again.
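The call and reply cycle can be mimicked in one process with a server thread and message queues. This sketch makes several assumptions: queues stand in for the network, JSON stands in for argument marshalling, and the procedure table `PROCEDURES` with its single `add` entry is hypothetical. It does not use a real RPC library.

```python
import json
import queue
import threading

# Hypothetical table of remotely callable procedures.
PROCEDURES = {"add": lambda a, b: a + b}


def server_loop(requests):
    """Sleep until a call message arrives, reply, then go dormant again."""
    while True:
        message = requests.get()           # dormant while the queue is empty
        if message is None:                # sentinel: shut the server down
            break
        call, reply_box = message
        request = json.loads(call)         # unmarshal procedure name and args
        result = PROCEDURES[request["proc"]](*request["args"])
        reply_box.put(json.dumps({"result": result}))


def remote_call(requests, proc, *args):
    """Client stub: marshal the call, send it, and block for the reply."""
    reply_box = queue.Queue(maxsize=1)
    requests.put((json.dumps({"proc": proc, "args": args}), reply_box))
    reply = reply_box.get(timeout=5)       # caller is suspended until the reply
    return json.loads(reply)["result"]


def demo():
    """Start a server thread, make one call, and shut the server down."""
    requests = queue.Queue()
    server = threading.Thread(target=server_loop, args=(requests,))
    server.start()
    try:
        return remote_call(requests, "add", 2, 3)
    finally:
        requests.put(None)
        server.join()
```

From the caller's side, `remote_call(requests, "add", 2, 3)` looks like an ordinary function call, which is the transparency RPC aims for. Lost messages and server crashes, which a real RPC system must handle, are ignored here.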


Distributed Resources

Load distribution
• Multiprocessor scheduling (static)
• Load sharing (dynamic)

Distributed shared memory

Distributed file systems

Load Distribution

Multiprocessor scheduling [5]
• Minimize communication overhead with efficient scheduling.

Load sharing
• Process migration strategy and mechanism

Distributed File Systems and Distributed Shared Memory

Distributed file systems
• Issues are based on a file point of view.

Distributed shared memory
• Issues are based on a process's perception of the system.

The common issues central to both:
• Sharing and replication of data

Fault Tolerance and Security

Security threats and failures are both system faults.

The problem of failures can be alleviated if there is redundancy in the system.
• The system should transparently handle failures or removal of machines, network links, and other resources without loss of data or functionality.
• This should hold true for both the system itself and its applications.
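One way redundancy hides a failure from the caller is to retry a read against the next replica. The sketch assumes that every replica holds the same data and that a failed replica signals its failure by raising an exception; `ReplicaUnavailable`, `make_replica`, and `read_with_failover` are hypothetical names used only for illustration.

```python
class ReplicaUnavailable(Exception):
    """Raised when a replica cannot serve a request."""


def make_replica(data, alive=True):
    """Return a read function standing in for one replica of the data."""
    def read(key):
        if not alive:
            raise ReplicaUnavailable(key)
        return data[key]
    return read


def read_with_failover(replicas, key):
    """Try each replica in turn; the caller never sees a single failure."""
    for read in replicas:
        try:
            return read(key)
        except ReplicaUnavailable:
            continue                       # fall through to the next replica
    raise ReplicaUnavailable(f"no replica could serve {key!r}")
```

Keeping the replicas consistent with one another on writes is the harder part of replication and is not shown here.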

Fault Tolerance and Security

Security [6]
• Authentication: clients, and also servers and messages, must be authenticated.
• Authorization: access control has to be performed across a physical network with heterogeneous components under different administrative units using different security models.
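Message authentication can be sketched with a keyed hash (HMAC) from Python's standard `hmac` and `hashlib` modules. The sketch assumes the sender and receiver already share a secret key; how that key is distributed, and the protocols discussed in [6], are outside its scope. The function names `sign` and `verify` are made up for the example.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag for the message under a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag; compare_digest avoids leaking timing information."""
    return hmac.compare_digest(sign(key, message), tag)
```

A receiver that recomputes the tag and finds a mismatch knows that the message was altered or that the sender does not hold the key. On its own this does not prevent replay of an old, validly signed message.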

Design for BIG-DATA

Emergence of Big Data

Big data is a foundational element of social networking and Web 2.0-based information companies. An enormous amount of data is generated as a result of democratization and ecosystem factors such as the following:
• Mobility trends
• Data access and consumption
• Ecosystem capabilities

Design for BIG-DATA

• Mobility trends: mobile devices, mobile events and sharing, and sensory integration
• Data access and consumption: Internet, interconnected systems, social networking, and convergent interfaces and access models
• Ecosystem capabilities: major changes in the information processing model and the availability of an open source framework; general-purpose computing and unified network integration


Summary

Given the system architectures, we summarized the important design and implementation issues.

These issues include object models and naming schemes, interprocess communication and synchronization, data sharing and replication, and failure and recovery.

These problems are unique to distributed systems.

References

[1] Randy Chow & Theodore Johnson, 1997, “Distributed Operating Systems & Algorithms”, (Addison-Wesley), pp. 45–50, 61–63.

[2] Suresh Sridharan, 2006, “Distributed Operating Systems”, (University of Wisconsin, Madison). http://pages.cs.wisc.edu/~dusseau/Classes/CS739/Writeups/Survey.pdf

[3] Chandy, K. Mani, Jayadev Misra, and Laura M. Haas. “Distributed deadlock detection.” ACM Transactions on Computer Systems (TOCS) 1.2 (1983): 144-156.

[4] Holliday, J., and Amr El Abbadi. “Distributed deadlock detection.” Encyclopedia of Distributed Computing. Kluwer Academic Publishers, Dordrecht (accepted for publication) (2005).

[5] Babaoglu, Ozalp, and Keith Marzullo. “Consistent global states of distributed systems: Fundamental concepts and mechanisms.” Distributed Systems 2 (1993): 12.

[6] Krishna Sankar, Andrew Balinsky, Darrin Miller, Sri Sundaralingam. (Feb 18, 2005) “EAP Authentication Protocols for WLANs”.

[7] Bohlouli, Mahdi, et al. “Towards an Integrated Platform for Big Data Analysis.” Integration of Practice-Oriented Knowledge Technology: Trends and Prospectives. Springer Berlin Heidelberg, 2013. 47-56.

[8] Wolf, Marilyn. “Computers as components: principles of embedded computing system design.” Access Online via Elsevier, 2012.

[9] Provost, Foster, and Tom Fawcett. “Data Science and its Relationship to Big Data and Data-Driven Decision Making.” Big Data 1.1 (2013): 51-59.

www.gsu.edu
