ics362 – distributed systems dr. ken cosh week 1

ICS362 – Distributed Systems

Dr. Ken Cosh

Week 1

Course Description

This course provides an introduction to the basic issues in the design and implementation of distributed systems. Topics include communication, processes, naming, synchronisation, consistency and replication, fault tolerance and security.

Course Objectives

On completion of this course students will be able to:

– 3.1 Discuss key elements to consider when managing Distributed Systems, such as security, fault tolerance, consistency and replication.

– 3.2 Compare the differences between Object Based Systems, File Systems, Web Based Systems and Co-ordination Based Systems.

References

1) (Compulsory) Distributed Systems, Principles and Paradigms, 2nd Edition, Andrew S. Tanenbaum & Maarten van Steen, 2007.

2) Distributed Systems, Concepts and Design, 4th Edition, George Coulouris, Jean Dollimore, Tim Kindberg, 2005.

Topics

Introduction
Architectures
Processes
Communication
Naming
Synchronisation
Consistency / Replication
Fault Tolerance
Security
Example Systems

Assessment

1. Quizzes and Presentations - 30%
2. Midterm exam - 30%
3. Final exam - 40%

Course Info.

Mon / Wed 12:30-14:00
Room PC319
Office Hours: By Appointment

NOTE: Plagiarised work will receive a mark of 0.

What is a Distributed System?

“A distributed system is a collection of independent computers that appears to its users as a single coherent system.” (Tanenbaum)

“Hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.” (Coulouris)

Key Features

Components are autonomous.
Users think they are dealing with a single system.
This requires some collaboration.

Note: The challenges involved are independent of the type of computers used.

Characteristics of DS

How it works is hidden from the user.
Interaction is consistent and uniform.
Easy to scale.
Continuously available, even if some parts are out of order.

Layered Architecture

Commonly implemented through layers & middleware

[Figure: Applications A, B and C run on top of a distributed system layer (middleware) that spans several computers, each running its own local OS (Local OS 1 to 4), connected by a network.]

Goals

Make resources available
Hide the fact that resources are distributed
– Distribution Transparency
Be open
Be scalable

Make Resources Available

E.g. printers, storage facilities, data, files, webpages, networks etc.
– For economic reasons
– For collaboration reasons
– To create virtual organisations

This produces challenges:
– Security
– Privacy

Distribution Transparency

An important goal of distributed systems is to hide the fact that processes and resources are physically distributed, enabling users to use the system without worrying about where the resources are.

• Access Transparency
• Location Transparency
• Migration Transparency
• Relocation Transparency
• Replication Transparency
• Concurrency Transparency
• Failure Transparency

Access Transparency

Different resources may represent data in different formats, but this shouldn’t be an issue for the user.

– A user on an Intel workstation sending data to a Sun SPARC machine shouldn’t be concerned that Intel orders its bytes in little endian format (low-order byte first) while SPARC uses big endian format (high-order byte first).

Differences in file naming conventions, such as the path separator ‘/’ (Unix) versus ‘\’ (Windows), should also not concern the user.
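Access transparency is usually provided by marshalling data into an agreed format before it crosses the network. The sketch below uses Python's standard `struct` module and assumes network byte order (big endian) as the agreed wire format; `marshal_int` and `unmarshal_int` are illustrative names, not part of any particular middleware.

```python
import struct
import sys
from pathlib import PurePosixPath, PureWindowsPath


def marshal_int(value: int) -> bytes:
    """Encode a 32-bit unsigned integer in network byte order ('!' = big endian)."""
    return struct.pack("!I", value)


def unmarshal_int(data: bytes) -> int:
    """Decode a 32-bit unsigned integer from network byte order."""
    return struct.unpack("!I", data)[0]


value = 0x12345678
print(sys.byteorder)                     # 'little' on Intel x86 machines
print(struct.pack("<I", value).hex())    # little endian: 78563412 (low-order byte first)
print(struct.pack(">I", value).hex())    # big endian:    12345678 (high-order byte first)

# Whatever the sender's native byte order, the receiver decodes the same value.
assert unmarshal_int(marshal_int(value)) == value

# File naming: both conventions parse to the same logical path components.
print(PureWindowsPath(r"docs\report.txt").parts)   # ('docs', 'report.txt')
print(PurePosixPath("docs/report.txt").parts)      # ('docs', 'report.txt')
```

Agreeing on one wire format means neither side needs to know the other's native representation; each converts only between its own format and the agreed one.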

Location Transparency

Location transparency means that the physical position of a resource is hidden from the user. This is normally achieved through naming, where only logical names are used, e.g.
– http://cis.payap.ac.th/index.php

Where is it (physically)? Has it always been there?

Migration / Relocation Transparency

In the previous web address, you have no idea whether index.php has always been on the cis.payap.ac.th server, or when it might have moved there. If resources can be moved without affecting the way they are accessed, then migration transparency is provided. If they can be moved while they are being accessed, then relocation transparency is provided: consider a user who keeps working on a wireless laptop while moving around.
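Location and migration transparency both rest on the client using only a logical name and resolving it at access time. The following is a minimal sketch assuming a hypothetical in-memory `NameService`; the physical locations (`server-a`, `server-b`) are made up for illustration. When the resource migrates, only the binding changes and the client code stays the same.

```python
class NameService:
    """Hypothetical name service mapping logical names to physical locations."""

    def __init__(self) -> None:
        self._bindings: dict[str, str] = {}

    def bind(self, name: str, location: str) -> None:
        # (Re)binding a name is how a migrated resource becomes reachable again.
        self._bindings[name] = location

    def resolve(self, name: str) -> str:
        return self._bindings[name]


def fetch(ns: NameService, name: str) -> str:
    # The client never stores the physical location; it resolves the name on every access.
    location = ns.resolve(name)
    return f"fetched {name} from {location}"


ns = NameService()
NAME = "cis.payap.ac.th/index.php"
ns.bind(NAME, "server-a:/var/www/index.php")   # hypothetical location
print(fetch(ns, NAME))

# The resource migrates to another machine: only the binding changes.
ns.bind(NAME, "server-b:/srv/www/index.php")   # hypothetical new location
print(fetch(ns, NAME))
```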

Replication Transparency

The efficiency of distributed systems can be improved greatly by placing replicas (copies) of a resource physically closer to users. Replication transparency enables the system to do this without the user knowing that they are using a replica.
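A minimal sketch of replication transparency, assuming the system picks the replica with the lowest measured latency: the client only calls `read()` and never names a replica. `Replica`, `ReplicatedStore`, the locations and the latency figures are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    location: str
    latency_ms: float                        # hypothetical measured round-trip time from this client
    data: dict = field(default_factory=dict)


class ReplicatedStore:
    """Clients call read(); which copy answers is hidden from them."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas
        self.last_served_by = None

    def read(self, key: str) -> str:
        nearest = min(self.replicas, key=lambda r: r.latency_ms)
        self.last_served_by = nearest.location   # recorded for illustration only
        return nearest.data[key]


pages = {"index.php": "<h1>Welcome</h1>"}
store = ReplicatedStore([
    Replica("Bangkok", latency_ms=40.0, data=dict(pages)),
    Replica("Chiang Mai", latency_ms=5.0, data=dict(pages)),
    Replica("Frankfurt", latency_ms=180.0, data=dict(pages)),
])
print(store.read("index.php"))   # same content whichever replica answers
print(store.last_served_by)      # Chiang Mai
```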

Concurrency Transparency

A goal of distributed systems is often the sharing of resources between users. These users may wish to access, or even update, the same data at the same time (concurrently). An important challenge when designing distributed systems is how to deal with concurrent accesses:
– How to maintain consistency when different users use the same resource in different ways.
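One common way to keep concurrent updates consistent is mutual exclusion. In this sketch, Python threads stand in for different users and a `threading.Lock` protects the read-modify-write step; in a real distributed system the lock would have to come from a lock manager or the database, which is an assumption here rather than something the slide specifies.

```python
import threading


class SharedCounter:
    """A shared resource updated by several users at once (threads stand in for users)."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Without the lock, two users could read the same old value and one update would be lost.
        with self._lock:
            current = self.value
            self.value = current + 1


def user(counter: SharedCounter, updates: int) -> None:
    for _ in range(updates):
        counter.increment()


counter = SharedCounter()
users = [threading.Thread(target=user, args=(counter, 10_000)) for _ in range(4)]
for t in users:
    t.start()
for t in users:
    t.join()
print(counter.value)   # 40000: no update is lost
```

Concurrency transparency means each user can write code like `user()` above without knowing that others are updating the same resource at the same moment.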

Failure Transparency

“You know you have one when the crash of a computer you’ve never heard of stops you from getting any work done!”

Failure transparency tries to mask failures such as this.

It is difficult to distinguish between a resource that has failed and one that is merely performing badly (slowly).

– Consider opening a webpage: is the server dead or just painfully slow? How long should the browser wait?
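The usual tool here is a timeout, and a timeout alone cannot separate "dead" from "slow". This sketch simulates servers with threads and waits on a reply queue with a timeout; `request`, `make_slow_server` and `dead_server` are hypothetical stand-ins, not a real HTTP client.

```python
import queue
import threading
import time
from typing import Callable, Optional


def request(server: Callable[[queue.Queue], None], timeout_s: float) -> Optional[str]:
    """Send a request to a simulated server and wait at most timeout_s for the reply."""
    replies: queue.Queue = queue.Queue()
    threading.Thread(target=server, args=(replies,), daemon=True).start()
    try:
        return replies.get(timeout=timeout_s)
    except queue.Empty:
        return None   # timed out: the client cannot tell "dead" from "slow"


def make_slow_server(delay_s: float) -> Callable[[queue.Queue], None]:
    def server(replies: queue.Queue) -> None:
        time.sleep(delay_s)              # simulated processing / network delay
        replies.put("page contents")
    return server


def dead_server(replies: queue.Queue) -> None:
    return                               # simulated crash: never replies


print(request(make_slow_server(0.01), timeout_s=1.0))   # page contents
print(request(make_slow_server(2.0), timeout_s=0.1))    # None (alive, but too slow)
print(request(dead_server, timeout_s=0.1))              # None (crashed)
```

From the client's side the last two calls are indistinguishable, which is why choosing how long the browser should wait is a genuine design trade-off.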

Complete Transparency?

Complete transparency isn’t always necessary.
– E.g. a daily newspaper arriving at 7am, regardless of location in the world.

Nor is it always possible.
– The physics behind signal transmission (e.g. the speed of light) imposes communication delays that cannot be hidden.

Openness

A further goal of distributed systems is openness: each component conforms to a set of open standards, so that different parts of the system can make use of the services they require.

This is normally achieved through modules that offer services specified through interfaces, written in a standard IDL (Interface Definition Language).

An IDL specifies the syntax of a service (operation names, parameter types, return values); harder to specify is the semantics, i.e. what the services actually do.
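Python has no IDL, but an abstract base class can play a similar role in a sketch: it pins down names, parameters and return types. `FileService` and both implementations are hypothetical. The second implementation conforms to the interface syntactically yet behaves differently, which is exactly the semantic gap described above.

```python
from abc import ABC, abstractmethod


class FileService(ABC):
    """Interface in the role of an IDL: it fixes names, parameters and return types (syntax).
    Semantics, such as what read() does for a missing file, can only be described informally."""

    @abstractmethod
    def read(self, name: str) -> bytes: ...

    @abstractmethod
    def write(self, name: str, data: bytes) -> None: ...


class InMemoryFileService(FileService):
    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def read(self, name: str) -> bytes:
        return self._files[name]

    def write(self, name: str, data: bytes) -> None:
        self._files[name] = data


class ShoutingFileService(InMemoryFileService):
    # Same syntax as the interface, different semantics: data is silently upper-cased.
    def write(self, name: str, data: bytes) -> None:
        super().write(name, data.upper())


def copy_file(src: FileService, dst: FileService, name: str) -> None:
    # Works with any implementation that conforms to the interface.
    dst.write(name, src.read(name))


a, b = InMemoryFileService(), ShoutingFileService()
a.write("notes.txt", b"hello")
copy_file(a, b, "notes.txt")
print(b.read("notes.txt"))   # b'HELLO': the interface was respected, the meaning was not
```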

Openness

Interface specifications should be complete and neutral, so that the distributed systems built on them are interoperable and portable:

– Interoperability refers to how well two different systems (possibly from different manufacturers) can co-exist, making use of each other’s services.

– Portability refers to whether an application written for system A can run, without modification, on system B (which implements the same interfaces).

Openness

Another feature of open systems is flexibility. Systems should be flexible to enable users to specialise their interactions without affecting other users or components.

Flexibility is often achieved through designing systems as a collection of small, replaceable or adaptable components.

Scalability

A further goal of distributed systems is that they should be scalable, that is, able to grow:
– Scalable by size: more users or resources can be added to the system.
– Scalable by location: resources and users may be physically distant.
– Scalable by administration: the system remains easy to manage as it grows.

Scalability

One problem often encountered when scaling a system is centralisation:

– Centralised services
– Centralised data
– Centralised algorithms

Imagine how the internet would work if there were only a single DNS table, and every address resolution request had to be directed through that one computer.

Scalability

Another problem affecting scalability concerns whether synchronous communication is actually practical.

– Many existing systems were designed for synchronous communication, leaving a ‘client’ blocked until a reply is sent back.

The laws of physics (including the speed of light) limit the speed of communication between physically distant resources.
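A back-of-the-envelope calculation shows why blocking on every reply struggles over long distances. The sketch assumes signals in optical fibre travel at about 200,000 km/s (roughly two thirds of the speed of light in a vacuum) and ignores routing, queuing and processing delays, so real figures would be higher; the 10,000 km distance is hypothetical.

```python
# Assumption: signals in optical fibre travel at roughly 200,000 km/s (about 2/3 of c).
FIBRE_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on request/reply time imposed by physics alone (no queuing or processing)."""
    return 2 * distance_km * 1000 / FIBRE_SPEED_KM_PER_S


rtt = min_round_trip_ms(10_000)   # hypothetical client and server 10,000 km apart
print(rtt)                        # 100.0 (milliseconds)
blocked_s = 100 * rtt / 1000      # 100 synchronous requests, one after another
print(blocked_s)                  # 10.0 (seconds spent blocked waiting for replies)
```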

Scalability & Administration

What happens when a system needs to scale across multiple, independent administrative domains?
– Conflicting policies on resource usage, management and security

Solving Scalability (briefly & currently)

Hiding Communication Latencies
– Essentially asynchronous communication: rather than waiting for a reply, the client continues with other work, and a special handler (e.g. a thread) completes the request when the reply arrives.
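A minimal sketch of hiding latency with Python's `concurrent.futures`: each request is submitted to a thread pool and a callback (`on_reply`, a hypothetical handler) processes the reply whenever it arrives. `remote_call` only simulates network latency with `time.sleep`; it does not contact a real server.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def remote_call(request_id: int) -> str:
    time.sleep(0.1)                  # simulated network latency (assumption)
    return f"reply to {request_id}"


replies: list[str] = []


def on_reply(future: Future) -> None:
    # Handler run when a reply arrives; the client did not block waiting for it.
    replies.append(future.result())


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    for i in range(5):
        pool.submit(remote_call, i).add_done_callback(on_reply)
    print("client carries on with other work...")
# Leaving the with-block waits for outstanding requests, only so the demo can print them.
elapsed = time.perf_counter() - start
print(sorted(replies))
print(f"{elapsed:.2f}s")             # typically close to 0.1 s rather than 5 x 0.1 s
```

The five latencies overlap instead of adding up, which is the point of not waiting for each reply in turn.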

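Distribution
– Splitting a component into smaller parts and spreading them across the system, e.g. DNS splits the name space into domains such as .com, .th and .edu, handled by different servers.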
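The effect of distribution can be sketched with a heavily simplified, hypothetical name space: a root that only knows which server handles each top-level domain, and per-domain servers that only know their own names. Real DNS has more levels (e.g. .th, then ac.th, then payap.ac.th) and caches answers; the addresses below are from the reserved documentation ranges and are not real.

```python
from collections import Counter

# Hypothetical, simplified zone data: each "server" only knows its own part of the name space.
ROOT = {"com": "com-server", "th": "th-server", "edu": "edu-server"}
ZONES = {
    "com-server": {"www.example.com": "198.51.100.7"},
    "th-server": {"cis.payap.ac.th": "203.0.113.10"},
    "edu-server": {"www.example.edu": "192.0.2.44"},
}
queries_per_server: Counter = Counter()


def resolve(name: str) -> str:
    top_level = name.rsplit(".", 1)[-1]   # e.g. "th" for cis.payap.ac.th
    server = ROOT[top_level]              # root only knows who handles each top-level domain
    queries_per_server[server] += 1       # load lands on that server, not on one central table
    return ZONES[server][name]


for host in ["cis.payap.ac.th", "www.example.com", "cis.payap.ac.th"]:
    print(host, "->", resolve(host))
print(dict(queries_per_server))           # {'th-server': 2, 'com-server': 1}
```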

Replication
– For example caching: keeping a copy of the data closer to where it is requested.
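A minimal client-side cache sketch, assuming entries are kept for a fixed time-to-live (TTL) before the client goes back to the origin. `CachingClient` and its parameters are hypothetical; the `clock` parameter exists only so the expiry logic can be tested deterministically.

```python
import time
from typing import Callable


class CachingClient:
    """Keeps local copies of remote data for ttl_s seconds (a simple client-side cache)."""

    def __init__(self, fetch_remote: Callable[[str], str], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_remote = fetch_remote
        self._ttl_s = ttl_s
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}   # key -> (value, expiry time)
        self.remote_calls = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                  # served from the nearby copy
        self.remote_calls += 1               # cache miss or expired: go to the origin
        value = self._fetch_remote(key)
        self._cache[key] = (value, now + self._ttl_s)
        return value


client = CachingClient(lambda key: f"contents of {key}", ttl_s=60.0)
client.get("index.php")
client.get("index.php")
print(client.remote_calls)   # 1: the second request never left the client
```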

Replication & Scalability

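Replication can have a downside for scalability:
– Consistency problems: updating one copy makes it differ from the others, and keeping all copies consistent requires extra synchronisation.
– How big a problem is this?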
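The consistency problem can be seen in a few lines. In this hypothetical sketch, `Replica` and `propagate` are illustrative names: one copy is updated, and a read from the other copy returns the old value until the update is propagated.

```python
class Replica:
    """A hypothetical copy of a data store."""

    def __init__(self, data: dict[str, int]) -> None:
        self.data = dict(data)


def propagate(source: Replica, target: Replica) -> None:
    # Bring target up to date; in practice this costs messages (and time) for every copy.
    target.data = dict(source.data)


original = Replica({"price": 100})
cached_copy = Replica(original.data)   # replica placed near the client

original.data["price"] = 120           # update applied to one copy only
print(cached_copy.data["price"])       # 100: a stale read

propagate(original, cached_copy)
print(cached_copy.data["price"])       # 120: consistent again
```

Whether the stale read in the middle matters depends on the application, which is why the question above has no single answer.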

Complexity

Clearly, designing a DS is a complex task. Some common false assumptions add to the complexity:

– The network is reliable
– The network is secure
– The network is homogeneous
– The topology doesn’t change
– Latency is zero
– Bandwidth is infinite
– Transport cost is zero
– There is one administrator

Examples of DS

Distributed Computing Systems
– Cluster Computing
– Grid Computing

Distributed Information Systems
– Transaction Processing Systems
– Enterprise Application Integration

Distributed Pervasive Systems

Distributed Computing Systems

Used for high-performance computing tasks.

When the price/performance ratio of PCs and workstations improved, it became financially and technically attractive to build supercomputers by hooking up a collection of simple computers on a high-speed network.

Cluster Computing

Homogeneous hardware
Master node handles allocation of tasks and the user interface
E.g. Beowulf Linux clusters
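The master/compute-node split can be sketched in a single process, with threads standing in for compute nodes and queues standing in for the high-speed network. This is an illustration of task allocation only; a real cluster would use separate machines and a batch scheduler or message-passing library, and the squaring "work" is a placeholder.

```python
import queue
import threading


def compute_node(node_id: int, tasks: queue.Queue, results: queue.Queue) -> None:
    while True:
        task = tasks.get()
        if task is None:                              # sentinel from the master: no more work
            break
        results.put((task, task * task, node_id))    # stand-in computation: squaring


def master(work_items, num_nodes: int = 4):
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    nodes = [threading.Thread(target=compute_node, args=(n, tasks, results))
             for n in range(num_nodes)]
    for node in nodes:
        node.start()
    for item in work_items:                           # the master allocates tasks...
        tasks.put(item)
    for _ in nodes:                                   # ...then tells every node to stop
        tasks.put(None)
    for node in nodes:
        node.join()
    collected = []
    while not results.empty():                        # safe: all nodes have finished
        collected.append(results.get())
    return sorted(collected)                          # (task, result, node_id), ordered by task


for task, result, node_id in master(range(6)):
    print(f"task {task} -> {result} (node {node_id})")
```

Because the sentinels are queued after all the tasks, every task is handed out before any node is told to stop.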

Grid Computing

Heterogeneous hardware
– No assumptions about hardware, OS, networks, administrative domains or security policies

Resources from different organisations are brought together to allow collaboration, essentially realising a virtual organisation.
– Towards Service Oriented Architectures

Distributed Information Systems

Arose when business information systems moved into a networked environment:
– Sharing data between functional units
– Sharing functionality both internally and externally

Transaction Processing Systems

Consider a transaction as an operation on a database.
– Handled through Remote Procedure Calls (RPCs)

Each transaction should have 4 characteristics (ACID):
– Atomic
– Consistent
– Isolated
– Durable

ACID

Atomic
– Either the whole transaction happens, or none of it.

Consistent
– Certain invariants must remain true, e.g. the total amount of money in a bank must be the same before and after internal transfers (even if, momentarily during the transaction, this isn’t true).

Isolated
– Two concurrently running transactions should not interfere with each other.

Durable
– Once a transaction commits, there is no going back: its changes are permanent.
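The following sketch uses Python's built-in `sqlite3` module (a single local database, not a distributed one) to show atomicity and the bank-balance invariant. The table, account names and the `transfer` helper are hypothetical. The credit is applied before the debit on purpose, so that when the debit fails the earlier credit is visibly rolled back.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Rule enforced by the database: no account may go negative.
    conn.execute("CREATE TABLE accounts ("
                 "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    try:
        with conn:   # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:   # CHECK failed: the credit above is undone too (atomicity)
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


bank = make_bank()
print(transfer(bank, "alice", "bob", 30), balances(bank))    # True {'alice': 70, 'bob': 80}
print(transfer(bank, "bob", "alice", 500), balances(bank))   # False {'alice': 70, 'bob': 80}
print(sum(balances(bank).values()))                          # 150, as before any transfer
```

In a distributed transaction the same all-or-nothing guarantee must hold across several databases, which requires extra coordination between them.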

Enterprise Application Integration

Applications are built on top of databases but separated from them.
– So these applications may need to communicate directly with each other.

This leads to different kinds of communication middleware:
– Remote Procedure Calls (RPC)
– Remote Method Invocations (RMI)
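RPC makes a call to another application look like a local procedure call. As one concrete illustration, Python's standard library ships an XML-RPC server and client (`xmlrpc.server`, `xmlrpc.client`). The `get_balance` procedure and its data are hypothetical, and both ends run in one process on localhost only to keep the sketch self-contained.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

BALANCES = {"alice": 100, "bob": 50}   # hypothetical data owned by an "accounts" application


def get_balance(account: str) -> int:
    return BALANCES.get(account, 0)


# Server side: the accounts application exposes a procedure (port 0 = any free port).
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_balance, "get_balance")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: another application calls it as if it were a local function.
proxy = ServerProxy(f"http://127.0.0.1:{port}/")
balance = proxy.get_balance("alice")
print(balance)                         # 100

server.shutdown()
server.server_close()
```

The middleware hides the marshalling, the HTTP request and the reply parsing behind an ordinary-looking call; RMI applies the same idea to methods on remote objects.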

Distributed Pervasive Systems

Thus far systems have been ‘stable’, i.e. relatively permanent fixed nodes with high quality connections.

– Pervasive systems integrate mobile / embedded computing devices.

Small, battery-powered, mobile, wirelessly connected nodes which blend into their environment.

– Nodes should be able to discover local services and react accordingly
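Real pervasive systems use discovery protocols such as multicast DNS or UPnP; the sketch below implements neither. It is a hypothetical in-memory `ServiceDirectory` in which each advertisement is a lease that expires unless renewed, one plausible way to cope with nodes that come and go. Times are passed in explicitly to keep it deterministic.

```python
class ServiceDirectory:
    """Hypothetical directory of nearby services. Each advertisement is a lease that expires
    unless renewed, so nodes that leave (or whose battery dies) disappear automatically."""

    def __init__(self, lease_s: float = 30.0) -> None:
        self.lease_s = lease_s
        self._expiry: dict[tuple[str, str], float] = {}   # (service type, node) -> expiry time

    def advertise(self, service_type: str, node: str, now: float) -> None:
        self._expiry[(service_type, node)] = now + self.lease_s

    def discover(self, service_type: str, now: float) -> list[str]:
        return sorted(node for (stype, node), expires in self._expiry.items()
                      if stype == service_type and expires > now)


directory = ServiceDirectory(lease_s=30.0)
directory.advertise("temperature", "sensor-kitchen", now=0.0)
directory.advertise("temperature", "sensor-garden", now=0.0)
print(directory.discover("temperature", now=10.0))   # ['sensor-garden', 'sensor-kitchen']

directory.advertise("temperature", "sensor-kitchen", now=20.0)   # kitchen renews its lease
print(directory.discover("temperature", now=40.0))   # ['sensor-kitchen']
```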

E.g. Home Systems, Electronic Health Care Systems, Sensor Networks
