rabeeajaffari.files.wordpress.com




Types of Distributed Computing Systems:

There are two types of distributed computing systems: multiprocessor systems and multicomputer systems.

1. Multiprocessor Systems: A multiprocessor system is a computer that has more than one CPU on its motherboard, making it capable of running different processes, or different threads belonging to the same process, on different CPUs. When these systems were first released there were just two cores in a single CPU, but there are now options for four, six and even eight. Example: Intel's Core processors (i3, i5, i7, etc.)

2. Multicomputer systems

A multicomputer system is a system made up of several independent computers interconnected by a telecommunications network. Multicomputer systems can be homogeneous or heterogeneous. A homogeneous distributed system is one where all CPUs are similar and are connected by a single type of network. They are often used for parallel computing, a kind of distributed computing in which every computer works on a different part of a single problem.

In contrast, a heterogeneous distributed system is one that can be made up of all sorts of different computers, potentially with vastly differing memory sizes, processing power and even basic underlying architecture. They are in widespread use today; many companies adopt this architecture because of the speed with which hardware becomes obsolete and the cost of upgrading a whole system at once.

Distributed Computing Models

There are three models practiced in distributed computing: cluster computing, grid computing, and parallel computing/processing.

1. Cluster Computing:

In computing, clustering is the use of multiple homogeneous computers to form what appears to users as a single highly available system. It is used for load balancing as well as for high availability. Clusters are usually deployed to improve speed and/or reliability over that of a single computer, while typically being much more cost-effective than a single computer of comparable speed or reliability.

A common use of cluster computing is to load balance traffic on high-traffic Web sites. Cluster computing can also be used as a relatively low-cost form of parallel processing for scientific and other applications that lend themselves to parallel operations.
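As a sketch of the load-balancing idea, a dispatcher can hand incoming requests to cluster nodes in round-robin order. The node names here are hypothetical; real load balancers also weigh node health and load:

```python
from itertools import cycle

# Hypothetical pool of identical (homogeneous) cluster nodes.
nodes = ["web-1", "web-2", "web-3"]
next_node = cycle(nodes)  # endless round-robin iterator over the pool

def dispatch(request):
    """Assign a request to the next node in rotation."""
    return (request, next(next_node))

assignments = [dispatch(f"req-{i}") for i in range(6)]
# Six requests spread evenly: each node receives exactly two.
```

Round-robin is the simplest policy; it works well when nodes are homogeneous, which is exactly the cluster case described above.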

2. Grid computing: It is a form of distributed computing in which an organization uses its existing computers to handle long-running computational tasks.


Grid computing applies the resources of many computers in a network to a single problem at the same time, usually a scientific or technical problem that requires a great number of processing cycles or access to large amounts of data. Grid computing has the design goal of solving problems too big for any single supercomputer, whilst retaining the flexibility to work on multiple smaller problems. It provides a multi-user environment. Its secondary aims are better exploitation of the available computing power and catering for the intermittent demands of large computational exercises. Example: the SETI@Home (Search for Extraterrestrial Intelligence) project, in which thousands of people share the unused processor cycles of their PCs in the vast search for signs of "rational" signals from outer space.

Grid computing requires the use of software that can divide and farm out pieces of a program to as many as several thousand computers. Grid computing can be thought of as distributed and large-scale cluster computing and as a form of network-distributed parallel processing.
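The "divide and farm out" step can be sketched with Python's standard concurrent.futures module. The work-unit function and the chunking scheme here are illustrative assumptions, not how any particular grid middleware works; a real grid sends the units to remote machines rather than local threads:

```python
from concurrent.futures import ThreadPoolExecutor

def process_work_unit(chunk):
    """One independent work unit: sum a slice of the data."""
    return sum(chunk)

data = list(range(1, 101))  # the full problem: sum the numbers 1..100
# Divide the problem into independent work units (chunks of 25).
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

# Farm the units out to workers and gather the partial results.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(process_work_unit, chunks))

total = sum(partial_sums)  # combine the partial results: 5050
```

The pattern is the same at grid scale: the coordinator only splits work and combines results, so thousands of machines can contribute independently.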

Grid computing appears to be a promising trend for three reasons: (1) its ability to make more cost-effective use of a given amount of computer resources; (2) it is a way to solve problems that can't be approached without an enormous amount of computing power; and (3) it suggests that the resources of many computers can be cooperatively, and perhaps synergistically, harnessed and managed as collaboration toward a common objective.

Grid computing is also attractive to large commercial enterprises with complex computational problems that aim to fully exploit their internal computing power; such deployments are known as internal grids.

Differences between cluster and grid computing

Cluster computing                           | Grid computing
--------------------------------------------|--------------------------------------------
Homogeneous                                 | Heterogeneous
Computers in a single location or complex   | Distributed by nature over a LAN, MAN or WAN
Tightly coupled systems                     | Loosely coupled (decentralization)
Single system image                         | Diversity and dynamism
Centralized job management & scheduling     | Distributed job management & scheduling

3. Parallel Processing:

Also known as parallel computing, it is the simultaneous use of more than one CPU or processor core to execute a program or multiple computational threads. Ideally, parallel processing makes programs run faster because there are more engines (CPUs or cores) running them.

With single-CPU, single-core computers, it is possible to perform parallel processing by connecting the computers in a network. However, this type of parallel processing requires very sophisticated software called distributed processing software, which makes it viable to use idle processor cycles across the network effectively. The different kinds of workers can be:

Different CPUs in a multiprocessor system
Different machines in a distributed system
Different threads in the same core
Different cores in the same CPU

Needs of Parallel Programming:

Multiple "processes" active simultaneously solving a given problem, generally on multiple processors.
Communication and synchronization of those processes (this forms the core of parallel programming efforts).
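A minimal sketch of the synchronization point, using Python's standard threading module: several threads update a shared counter, and a lock prevents their updates from interleaving. The counts and worker function are illustrative:

```python
import threading

counter = 0
lock = threading.Lock()  # serializes access to the shared counter

def worker(increments):
    global counter
    for _ in range(increments):
        with lock:          # synchronization: one thread updates at a time
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for all workers: a simple communication point

# With the lock held around each update, the result is deterministic:
# 4 workers x 10_000 increments = 40_000.
```

Without the lock, the read-modify-write on `counter` can interleave and lose updates; this is exactly the coordination problem that parallel programming has to solve.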

Processing Elements Architecture (Flynn’s Taxonomy):

First proposed by Michael J. Flynn in 1966, Flynn's taxonomy is a classification of parallel computer architectures. Parallel systems are more difficult to program than single-processor computers because parallel architectures vary widely and the processes of multiple CPUs must be coordinated and synchronized.

The crux of parallel processing is the number of CPUs. Based on the number of concurrent instruction streams (single or multiple) and data streams (single or multiple) available in the architecture, computing systems are classified into four major categories:

1. SISD (Single Instruction, Single Data): This architecture involves a single processor executing a single instruction stream to operate on data stored in a single memory. Speed is limited by the rate at which the computer can transfer information internally. In SISD, machine instructions are processed sequentially, and computers adopting this model are popularly called sequential computers. Most conventional computers have the SISD architecture. All the instructions and data to be processed have to be stored in primary memory. Ex: IBM PC, workstations, the traditional von Neumann single-CPU computer.

2. SIMD (Single Instruction, Multiple Data): This architecture involves a single multiprocessor machine performing the same action (retrieve, calculate, or store) simultaneously on two or more pieces of data. Each processor performs the same instruction on its local data as it progresses through the instruction stream, with the instructions issued by a controller processor. Machines based on the SIMD model are well suited to scientific computing, since it involves many vector and matrix operations: the data elements of a vector can be divided into multiple sets (N sets for an N-PE system) so that each processing element (PE) can process one data set. Ex: CRAY's vector processing machines.
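The lockstep behaviour can be sketched in plain Python: one "instruction" is applied by every processing element to its own local data set. This only models the semantics; real SIMD hardware issues a single machine instruction over a vector register rather than looping:

```python
def simd_step(instruction, data_sets):
    """Every PE applies the same instruction to its own local data set.
    Each inner list stands in for one PE's local memory."""
    return [[instruction(x) for x in local] for local in data_sets]

# Four PEs, each holding one slice of a vector; the single
# instruction "double the value" is broadcast to all of them.
data_sets = [[1, 2], [3, 4], [5, 6], [7, 8]]
result = simd_step(lambda x: 2 * x, data_sets)
# result == [[2, 4], [6, 8], [10, 12], [14, 16]]
```

The key property shown is that every PE runs the identical instruction; only the data differs, which is what distinguishes SIMD from MIMD below.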

3. MISD (Multiple Instruction, Single Data): An MISD computing system is a multiprocessor machine capable of executing different instructions on different PEs, with all of them operating on the same dataset. This architecture is not commercially available yet. Each processor operates under the control of an instruction stream issued by its own control unit; therefore the processors are potentially all executing different programs on the same data while solving different sub-problems of a single problem.

4. MIMD (Multiple Instruction, Multiple Data): Unlike SISD and MISD machines, an MIMD computer works asynchronously. Each processor uses its own data and executes its own program (or part of the program). MIMD machines are easier and cheaper to build by putting together off-the-shelf processors. Multiple computer instructions, which may or may not be the same, and which may or may not be synchronized with each other, perform actions simultaneously on two or more pieces of data. There are two types of MIMD machines:

Shared memory (tightly coupled) MIMD: It is easy to construct, but a memory-component or processor failure affects the whole system, and increasing the number of processors leads to memory contention.

Distributed memory (loosely coupled) MIMD: The machines of this category are undoubtedly the fastest growing part of the family of high-performance computers. Unlike shared-memory MIMD, they are easily and readily expandable and highly reliable (a CPU failure does not affect the whole system). Accessing local memory is much faster than accessing remote memory; if most accesses are to local memory, overall memory bandwidth increases linearly with the number of processors.


Levels of parallelism:

Levels of parallelism can also be based on the lumps of code (grain size) that are potential candidates for parallelism. The table below lists categories of code granularity for parallelism. All approaches to creating parallelism based on code granularity share a common goal: to boost processor efficiency by hiding the latency of lengthy operations such as memory or disk accesses.

Grain Size  | Code Item                              | Parallelized by
------------|----------------------------------------|----------------
Very Fine   | Instruction                            | Processor
Fine        | Loop/Instruction block                 | Compiler
Medium      | Standard one-page function             | Programmer
Large       | Program - separate heavyweight process | Programmer

The different levels of parallelism are also depicted in the figure below. Among the four levels of parallelism, the first two are supported transparently, either by the hardware or by parallelizing compilers; the programmer mostly handles the last two. The three important models used in developing applications are the shared-memory model, the distributed-memory model (message-passing model), and the distributed-shared-memory model.

1. Task level parallelism: It refers to multiple processes, tasks, or programs running concurrently.


2. Thread level parallelism: Also known as function parallelism or control parallelism, it distributes execution processes (threads) across different parallel computing nodes, so that multiple threads or instruction sequences from the same application can execute concurrently.

3. Data level parallelism: DLP refers to the act of performing the same operation on multiple data items simultaneously. A classic example of DLP is performing an operation on an image in which processing each pixel is independent of the ones around it (such as brightening).
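The brightening example can be sketched directly. Each pixel is processed independently of its neighbours, so the per-pixel operation could be split across any number of workers; the clamp at 255 assumes 8-bit greyscale pixels:

```python
def brighten(pixels, amount):
    """Add 'amount' to every pixel, clamped to the 8-bit range 0..255.
    Each pixel is independent of the others, so this loop is
    trivially data-parallel: any worker could handle any pixel."""
    return [min(255, p + amount) for p in pixels]

image_row = [10, 120, 250]
bright = brighten(image_row, 40)
# bright == [50, 160, 255]  (250 + 40 clamps at 255)
```

Because there are no dependencies between iterations, a vectorizing compiler or SIMD hardware can execute many of these additions in a single step.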

4. Instruction level parallelism: It takes advantage of sequences of instructions that require different functional units (such as the load unit, the ALU, etc.). Different architectures approach this in different ways, but the idea is to have these non-dependent instructions executing simultaneously to keep the functional units busy as often as possible.

Parallel Languages:

Some parallel languages, like SISAL and PCN, are not very popular among application programmers, who are unwilling to learn a completely new language for parallel programming. Hence, traditional high-level languages (like C and FORTRAN) that offer extensions or run-time libraries are a viable alternative.

Note: Look into high-level languages, and the libraries they offer, that help to implement task-level and thread-level parallelism.

Thread Programming models:

There are three thread programming models:

1. The boss/worker model: It involves a single thread, the boss, accepting input for the entire program. Based on that input, the boss passes specific tasks off to one or more worker threads. The boss either creates workers dynamically or maintains a thread pool.

The boss/worker model works well with servers (database servers, file servers, window managers, and the like). The complexities of dealing with asynchronously arriving requests and communications are encapsulated in the boss. The specifics of handling requests and processing data are delegated to the workers.

In this model, it is important to minimize the frequency with which the boss and workers communicate. The boss cannot afford to be blocked by its workers while new requests pile up at the inputs. Likewise, you can't create too many interdependencies among the workers: if every request requires every worker to share the same data, all workers will suffer a slowdown.
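A minimal boss/worker sketch with Python threads and a standard queue; the task payloads (squaring numbers) are illustrative. The queue keeps boss-to-worker communication down to a single cheap hand-off per task, as recommended above:

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    """Worker thread: pull tasks until the boss sends a stop sentinel."""
    while True:
        item = tasks.get()
        if item is None:        # sentinel: no more work for this worker
            break
        results.put(item * item)

# The boss: create a small pool, hand out work, then signal shutdown.
pool = [threading.Thread(target=worker) for _ in range(3)]
for t in pool:
    t.start()
for n in range(10):
    tasks.put(n)                # the boss accepts input and delegates it
for _ in pool:
    tasks.put(None)             # one sentinel per worker
for t in pool:
    t.join()

squares = sorted(results.get() for _ in range(10))
# squares == [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Note that the boss never inspects results while delegating; asynchronous arrival is absorbed by the queue, which is the encapsulation the model describes.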


2. The peer model: Unlike the boss/worker model, in which one thread is in charge of work assignments for the other threads, in the peer model all threads work concurrently on their tasks without a specific leader. Whereas the boss/worker model employs a stream of input requests to the boss, the peer model makes each thread responsible for its own input. A peer knows its own input ahead of time, has its own private way of obtaining its input, or shares a single point of input with other peers. The peer model is suitable for applications that have a fixed or well-defined set of inputs, such as matrix multipliers, parallel database search engines, and prime number generators.
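In a peer-model sketch there is no boss: each thread knows its own slice of the input ahead of time and works on it directly. The slicing scheme and the summing task are illustrative:

```python
import threading

data = list(range(100))     # a fixed, well-defined input set
partials = [0, 0, 0, 0]     # one result slot per peer

def peer(index):
    """Each peer owns a fixed slice of the input: no coordinator
    hands out work, and no two peers touch the same slot."""
    start = index * 25
    partials[index] = sum(data[start:start + 25])

peers = [threading.Thread(target=peer, args=(i,)) for i in range(4)]
for t in peers:
    t.start()
for t in peers:
    t.join()

total = sum(partials)  # 0 + 1 + ... + 99 == 4950
```

Because each peer writes only to its own slot, no lock is needed; the fixed input set is what makes this static partitioning possible.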

3. A thread pipeline: It assumes:

A long stream of input
A series of sub-operations (known as stages or filters) through which every unit of input must be processed
Each processing stage can handle a different unit of input at a time

Applications in which the pipeline might be useful are image processing, text processing, RISC (reduced instruction set computing which passes each instruction through the stages of decoding, fetching operands, computation, and storing results) instruction processors or any application that can be broken down into a series of filter steps on a stream of input. When designing a multithreaded program according to the pipeline model, you should aim at balancing the work to be performed across all stages; that is, all stages should take about the same amount of time to complete.
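A two-stage thread pipeline sketch with queues between the stages; the stages here (parse a string, then square the number) are illustrative stand-ins for filters like decode/fetch/execute:

```python
import queue
import threading

stage1_in = queue.Queue()
stage2_in = queue.Queue()
done = queue.Queue()

def stage1():
    """First filter: parse each raw input string into an int."""
    while True:
        item = stage1_in.get()
        if item is None:
            stage2_in.put(None)   # pass the shutdown signal downstream
            break
        stage2_in.put(int(item))

def stage2():
    """Second filter: square each parsed number."""
    while True:
        item = stage2_in.get()
        if item is None:
            break
        done.put(item * item)

threads = [threading.Thread(target=stage1), threading.Thread(target=stage2)]
for t in threads:
    t.start()
for raw in ["1", "2", "3", "4"]:
    stage1_in.put(raw)            # a long stream of input units
stage1_in.put(None)
for t in threads:
    t.join()

out = [done.get() for _ in range(4)]
# out == [1, 4, 9, 16]; order is preserved because each stage is one thread
```

While stage 2 squares one number, stage 1 can already be parsing the next, which is the overlap the model aims for; balanced stage times keep neither queue from growing without bound.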
