ICS362 – Distributed Systems
Dr. Ken Cosh
Week 1
Course Description
This course provides an introduction to the basic issues in the design and implementation of distributed systems. Topics include communication, processes, naming, synchronisation, consistency and replication, fault tolerance and security.
Course Objectives
On completion of this course students will be able to:
– 3.1 Discuss key elements to consider when managing Distributed Systems, such as security, fault tolerance, consistency and replication.
– 3.2 Compare differences between different Object Based Systems, File Systems, Web Based Systems and Co-ordination Based Systems.
References
1) (Compulsory) Distributed Systems: Principles and Paradigms, 2nd Edition, Andrew S. Tanenbaum & Maarten Van Steen, 2007.
2) Distributed Systems: Concepts and Design, 4th Edition, George Coulouris, Jean Dollimore & Tim Kindberg, 2005.
Topics
Introduction
Architectures
Processes
Communication
Naming
Synchronisation
Consistency / Replication
Fault Tolerance
Security
Example Systems
Assessment
1. Quizzes and Presentations - 30%
2. Midterm exam - 30%
3. Final exam - 40%
Course Info.
Mon / Wed 12:30-14:00
Room PC319
Office Hours: By Appointment
NOTE: Plagiarism = 0.
What is a Distributed System?
“A distributed system is a collection of independent computers that appears to its users as a single coherent system.” (Tanenbaum)
“Hardware or software components located at networked computers communicate and coordinate their actions only by passing messages.” (Coulouris)
Key Features
Components that are autonomous
Users think they are dealing with a single system
This requires some collaboration
Note: The challenges involved are independent of the type of computers used.
Characteristics of DS
How it works is hidden from the user
Interaction is consistent & uniform
Scalability
Continuously available, even if some parts are out of order
Layered Architecture
Commonly implemented through layers & middleware
[Figure: layered diagram. Applications A, B and C sit on top of a Distributed System Layer (Middleware), which sits on top of Local OS 1, Local OS 2, Local OS 3 and Local OS 4, all connected by a Network.]
Goals
Make Resources Available
Hide the fact that resources are distributed
– Distribution Transparency
Be Open
Be Scalable
Make Resources Available
E.g. printers, storage facilities, data, files, webpages, networks etc.
– For economic reasons
– For collaboration reasons
– To create virtual organisations
This produces challenges
– Security
– Privacy
Distribution Transparency
An important goal of distributed systems is to hide the fact that processes / resources are physically distributed, enabling users to use the system without worrying about where the resources are.
– Access Transparency
– Location Transparency
– Migration Transparency
– Relocation Transparency
– Replication Transparency
– Concurrency Transparency
– Failure Transparency
Access Transparency
Different Resources may represent data in different formats, but this shouldn’t be an issue for the user.
– A user on an Intel workstation sending data to a Sun SPARC machine shouldn’t be concerned that Intel orders its bytes in little endian format (low order bytes first) while SPARC uses big endian format (high order bytes first).
Different file naming conventions should also not be of concern to the user, e.g. whether the path separator is ‘/’ or ‘\’.
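To make the byte ordering point concrete, here is a minimal Python sketch using the standard `struct` module. The value `0x12345678` is just an illustrative number, not something from the course material. It packs the same integer in both byte orders, then shows how agreeing on a single wire format hides the difference from the user.

```python
import struct

value = 0x12345678  # illustrative 32-bit value

# "<I" packs a 4-byte unsigned integer little endian (low order byte first).
little = struct.pack("<I", value)
# ">I" packs the same integer big endian (high order byte first).
big = struct.pack(">I", value)

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# "!" means network byte order (big endian). If sender and receiver both
# use it on the wire, neither needs to know the other's native order.
wire = struct.pack("!I", value)
(received,) = struct.unpack("!I", wire)
print(received == value)  # True
```

Middleware usually performs this kind of conversion on the user's behalf, which is what access transparency asks for.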
Location Transparency
Location Transparency refers to the physical position of a resource, which should be hidden from the user. This is normally achieved through naming, where only logical names are used, e.g.
– http://cis.payap.ac.th/index.php
Where is it (physically)?
Has it always been there?
Migration / Relocation Transparency
In the previous web address, you have no idea whether index.php has always been on the cis.payap.ac.th server, or when it might have moved there.
If resources can be moved without affecting the way the resource is accessed, then migration transparency is provided.
If that movement occurs while the resource is being accessed, then relocation transparency is provided. Consider moving around with a wireless laptop while staying connected.
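One way to picture location and migration transparency is a name service that maps logical names to physical hosts. The sketch below is a toy, in-memory illustration only: `NameService`, `fetch` and the host names are all hypothetical and do not describe how any real naming system is built.

```python
class NameService:
    """Toy mapping from logical names to physical hosts (hypothetical)."""

    def __init__(self):
        self._locations = {}

    def register(self, name, host):
        # Registering an existing name again models moving the resource.
        self._locations[name] = host

    def resolve(self, name):
        return self._locations[name]


names = NameService()
names.register("cis/index", "server-a.example")  # made-up host name


def fetch(name):
    # The client only ever uses the logical name.
    return f"content of {name} served by {names.resolve(name)}"


before = fetch("cis/index")
names.register("cis/index", "server-b.example")  # the resource migrates
after = fetch("cis/index")

print(before)
print(after)
```

The client code calls `fetch("cis/index")` both times; only the mapping inside the name service changes when the resource moves.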
Replication Transparency
The efficiency of distributed systems can be improved greatly by locating replicas (copies) of a resource physically closer to a user. Replication transparency enables the system to do this, without the user knowing they are using a replica.
Concurrency Transparency
A goal of distributed systems is often sharing of resources between users. These users may wish to access or even update the same data at the same time (concurrently). An important challenge when designing distributed systems is how to deal with concurrent accesses.
– How to maintain consistency when different users use the same resource in different ways.
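As a small, single-machine illustration of the consistency problem, the sketch below has several threads update one shared counter. It assumes a lock is an acceptable way to serialise the updates. Real distributed systems need more than a local lock, and the `Counter` class is purely illustrative.

```python
import threading


class Counter:
    """Shared resource whose updates are serialised by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may perform the read-modify-write.
        with self._lock:
            self.value += 1


counter = Counter()


def worker():
    for _ in range(10_000):
        counter.increment()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```

Each user (thread) behaves as if it had the counter to itself, yet the final value is consistent.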
Failure Transparency
“You know you have one when the crash of a computer you’ve never heard of stops you from getting any work done!”
Failure Transparency tries to mask failures such as this.
It is difficult to distinguish between a resource that has failed and a resource which is performing badly (slowly).
– Consider opening a webpage - is it dead or painfully slow? How long should the browser wait?
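A common, though imperfect, way to mask such failures is to give each request a deadline and retry a few times before reporting an error. The sketch below simulates this without a real network. `call_with_retry`, `flaky` and the chosen delays are hypothetical, and the built-in `TimeoutError` stands in for "no reply before the deadline".

```python
import time


def call_with_retry(operation, attempts=3, delay=0.01):
    """Try an operation a few times, waiting longer after each failure."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except TimeoutError as err:
            last_error = err
            time.sleep(delay * (2 ** attempt))  # simple exponential backoff
    # The caller cannot tell whether the resource is dead or just slow.
    raise last_error


calls = {"n": 0}


def flaky():
    # Simulated remote call: the first two attempts get no timely reply.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("no reply before the deadline")
    return "page content"


result = call_with_retry(flaky)
print(result, "after", calls["n"], "attempts")
```

Note that the retry loop only hides the failure. It still cannot decide whether the server was dead or merely slow, which is exactly the difficulty raised on this slide.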
Complete Transparency?
Complete Transparency isn’t always completely necessary.
– E.g. daily newspaper arriving at 7am regardless of location in the world.
Nor is it always possible.
– Physics behind signal transmission.
Openness
A further goal of distributed systems is openness - that any resource conforms to a set of open standards. Doing so enables different parts of the system to make use of required services.
This is normally achieved through modules that offer services specified through interfaces, using a standard IDL (Interface Definition Language).
The IDL specifies the syntax of the services; harder to specify is the semantics, i.e. what the services actually do.
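The gap between syntax and semantics can be illustrated with a small sketch. Python has no IDL, so an abstract base class stands in for the interface definition here. `Directory`, the two implementations and the entry names are all hypothetical.

```python
from abc import ABC, abstractmethod


class Directory(ABC):
    """Stand-in for an interface definition: it fixes only the signature."""

    @abstractmethod
    def lookup(self, name: str) -> str:
        ...


class ExactDirectory(Directory):
    def __init__(self, entries):
        self._entries = dict(entries)

    def lookup(self, name: str) -> str:
        return self._entries[name]  # names must match exactly


class CaseInsensitiveDirectory(Directory):
    def __init__(self, entries):
        self._entries = {k.lower(): v for k, v in entries.items()}

    def lookup(self, name: str) -> str:
        return self._entries[name.lower()]  # case is ignored


entries = {"printer": "host-1"}  # made-up entry
exact = ExactDirectory(entries)
relaxed = CaseInsensitiveDirectory(entries)

print(relaxed.lookup("Printer"))  # host-1
try:
    exact.lookup("Printer")
    exact_found = True
except KeyError:
    exact_found = False
print(exact_found)  # False
```

Both classes satisfy the same interface, yet a client that works with one may fail with the other, because the interface says nothing about how names are matched.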
Openness
Distributed Systems should be complete and neutral, and in doing so should be interoperable and portable;
– Interoperability refers to how well two different systems (possibly from different manufacturers) can co-exist, making use of each other’s services.
– Portability refers to whether an application written for system A can be used by system B.
Openness
Another feature of open systems is flexibility. Systems should be flexible to enable users to specialise their interactions without affecting other users or components.
Flexibility is often achieved through designing systems as a collection of small, replaceable or adaptable components.
Scalability
A further goal of Distributed Systems is that they should be scalable - that is, that they can grow:
– Scalable by size; more users or resources can be added to the system.
– Scalable by location; resources and users may be physically distant.
– Scalable by administration; the system remains easy to manage as it grows.
Scalability
One problem often encountered when scaling a system is centralisation.
– Centralised services
– Centralised data
– Centralised algorithms
Imagine how the internet would work if there were only a single DNS table, and every address resolution request had to be directed to the one computer holding it.
Scalability
Another problem affecting scalability concerns whether synchronous communication is actually possible.
– Many existing systems were designed for synchronous communication.
The laws of physics (including the speed of light) limit the speed of communication between physically distant resources.
– Leaving a ‘client’ blocked until a reply is sent back.
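A rough back-of-the-envelope calculation shows why this matters. The sketch below computes the lower bound on round-trip time imposed by the speed of light. The 10,000 km distance is only an example, and the two-thirds factor for optical fibre is an approximation.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum


def min_round_trip_ms(distance_km, speed_fraction=1.0):
    """Lower bound on round-trip time, ignoring routing and processing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * speed_fraction)
    return 2 * one_way_s * 1000


# Example: client and server 10,000 km apart (illustrative distance).
vacuum_rtt = min_round_trip_ms(10_000)
# Light in optical fibre travels at roughly two thirds of that speed.
fibre_rtt = min_round_trip_ms(10_000, speed_fraction=2 / 3)

print(f"vacuum: {vacuum_rtt:.1f} ms, fibre: {fibre_rtt:.1f} ms")
```

So a blocked client waits at least about 67 ms per request at this distance (about 100 ms over fibre), however fast the server is.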
Scalability & Administration
What happens when a system needs to scale across multiple, independent administrative domains?
– Conflicting policies on resource usage, management and security
Solving Scalability (briefly & currently)
Hiding Communication Latencies
– Essentially asynchronous communication. Not waiting for a reply; instead a special handler (thread) completes the earlier request when its reply arrives.
Distribution
– Splitting a component into smaller parts, e.g. DNS splits into .com, .th, .edu etc.
Replication
– For example caching. A copy of the data closer to the request.
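Hiding communication latency can be sketched with Python's standard `concurrent.futures` module. Here `remote_call` is a hypothetical stand-in for a request to a distant server, with the delay simulated by `time.sleep`. The caller issues several requests, carries on with local work, and only collects the replies afterwards.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_call(x):
    # Hypothetical remote request; the sleep simulates network latency.
    time.sleep(0.05)
    return x * 2


with ThreadPoolExecutor(max_workers=4) as pool:
    # Issue all requests without waiting for any reply.
    futures = [pool.submit(remote_call, i) for i in range(4)]

    # The caller is not blocked and can do useful local work meanwhile.
    local_work = sum(range(100))

    # Collect the replies once they are needed.
    results = [f.result() for f in futures]

print(local_work, results)  # 4950 [0, 2, 4, 6]
```

The four simulated requests overlap, so the total wait is close to one latency period rather than four.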
Replication & Scalability
Replication can have a negative effect on scalability
– Consistency problems
– How big a problem is this?
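The consistency problem is easy to reproduce with a toy cache. In the sketch below, `Origin` and `Cache` are hypothetical classes and the "price" value is invented. Once the original data changes, the cached copy is stale until something invalidates it.

```python
class Origin:
    """The authoritative copy of the data."""

    def __init__(self):
        self.data = {"price": 100}


class Cache:
    """A replica kept close to the client (no automatic invalidation)."""

    def __init__(self, origin):
        self.origin = origin
        self.copy = {}

    def read(self, key):
        if key not in self.copy:
            self.copy[key] = self.origin.data[key]  # fetch on first use
        return self.copy[key]

    def invalidate(self, key):
        self.copy.pop(key, None)


origin = Origin()
cache = Cache(origin)

first = cache.read("price")    # 100, fetched from the origin
origin.data["price"] = 120     # the original is updated
stale = cache.read("price")    # still 100: the replica is inconsistent
cache.invalidate("price")
fresh = cache.read("price")    # 120 after invalidation

print(first, stale, fresh)  # 100 100 120
```

Keeping every replica up to date requires extra messages for each update, which is where replication starts to work against scalability.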
Complexity
Clearly designing a DS is a complex task. Some common false assumptions adding to complexity:
– The network is reliable
– The network is secure
– The network is homogeneous
– The topology doesn’t change
– Latency is zero
– Bandwidth is infinite
– Transport cost is zero
– There is one administrator
Examples of DS
Distributed Computing Systems
– Cluster Computing
– Grid Computing
Distributed Information Systems
– Transaction Processing Systems
– Enterprise Application Integration
Distributed Pervasive Systems
Distributed Computing Systems
For high performance computing tasks
When the price/performance ratio of PCs and workstations improved, it became financially & technically attractive to build supercomputers by hooking up a collection of simple computers on a high speed network.
Cluster Computing
Homogeneous hardware
Master node handles allocation of tasks and user interface
E.g. Beowulf Linux clusters
Grid Computing
Heterogeneous Hardware
– No assumptions about hardware, OS, networks, administrative domains, security policies
Resources from different organisations are brought together to allow collaboration – essentially realising a virtual organisation.
– Towards Service Oriented Architectures
Distributed Information Systems
When Business Information Systems moved into a networked environment.
– Sharing data between functional units
– Sharing functionality both internally and externally
Transaction Processing Systems
Consider a transaction as an operation on a database.
– Handled through Remote Procedure Calls (RPCs)
Each transaction should have 4 characteristics (ACID)
– Atomic
– Consistent
– Isolated
– Durable
ACID
Atomic
– Either the whole transaction happens, or none of it.
Consistent
– Certain invariants must remain true – e.g. the total amount of money in a bank must remain the same before and after internal transfers (even if momentarily during the transaction this isn’t true).
Isolated
– Two concurrently running transactions should not interfere with each other.
Durable
– Once a transaction commits, there is no going back.
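The bank transfer example can be tried out with Python's built-in `sqlite3` module. This is only a single-machine sketch. The `accounts` table, the names and the balances are invented for illustration, and a `CHECK` constraint stands in for the bank's rule that balances may not go negative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the credit to dst has been rolled back as well


def total(conn):
    return conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice

print(ok, failed, total(conn))  # True False 150
```

The failed transfer leaves no trace (atomic), and the total is 150 before and after both calls (consistent), even though bob had briefly been credited inside the second transaction.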
Enterprise Application Integration
Applications are built on top of databases – separated from the databases.
– So these applications may need to communicate with each other.
Which leads to different communication middleware
– RPC
– Remote Method Invocations (RMI)
Distributed Pervasive Systems
Thus far systems have been ‘stable’, i.e. relatively permanent fixed nodes with high quality connections.
– Pervasive systems integrate mobile / embedded computing devices.
Small, battery-powered, mobile, wirelessly connected nodes which blend into their environment.
– Nodes should be able to discover local services and react accordingly
E.g. Home Systems, Electronic Health Care Systems, Sensor Networks