
FAULT TOLERANCE IN CLUSTER COMPUTING

Guided By-
Mr. Ankush Agrawal
Mr. Praveen Rai

Submitted By-
Ravindra Pratap Singh
Garima Kaushik
Kamini Saraswat

OUTLINE

- Introduction
- Purpose
- Requirements
- Advantages Of Linux
- Objective
- Sub-Objective
- Research Gap
- Basic MPI Commands
- Message Passing Interface
- Working Strategy
- Graphical Representation

INTRODUCTION

What Is a Cluster?

A cluster is a set of connected computers that work together so that they can be viewed as a single system. It works on a master-slave model.

What Is Cluster Computing?

Cluster computing is also known as high-performance computing (HPC), as it is used to solve large problems in less time than other techniques. HPC may include parallel, cluster, grid, cloud, and green computing.

CONTINUED...

What Is a Fault?

A fault is any error or unwanted condition that may arise in a system and cause it to stop execution. Faults may be natural or man-made.

What Is Fault Tolerance?

Fault tolerance is the ability of a system to tolerate certain types of faults so that we still get the correct final outcome, e.g. despite a faulty processor.

PURPOSE

The purpose of cluster technology is to eliminate single points of failure. When availability of data is your paramount consideration, clustering is ideal. Using a cluster, we can avoid single points of failure such as:

- Network card failure
- Processor failure
- Motherboard failure

REQUIREMENTS

Software Environment

- Operating system: Ubuntu 10.04 LTS
- MPICH2 package
- Open MPI
- libshem-dev
- libmpich2-dev

ADVANTAGES OF USING LINUX

The following are some advantages of using Linux:

- Linux is readily available on the Internet and can be downloaded without cost.
- It is easy to fix bugs and improve system performance.
- Users can develop or fine-tune hardware drivers, which can easily be made available to other users.
- Most importantly for this project, Linux makes it easy to create several copies of a process on one processor, which helps enhance the performance of the system.

OBJECTIVE

We are working on the Linux operating system and on the communication patterns of clusters using MPI.

Our aim is to find faults, and to recover from those faults which cause unexpected behaviour (errors, bugs, etc.).

MESSAGE PASSING INTERFACE (MPI)

The generic form of message passing in parallel computing is the Message Passing Interface.

It is used as a medium of communication among the nodes.

In message passing, data is moved from the address space of one process to that of another by means of cooperative operations such as a send/receive pair.
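To make the send/receive pair concrete, here is a minimal sketch in C (our own illustration, not from the original slides); it assumes at least two MPI processes:

```c
/* Minimal sketch: rank 0 sends a message that rank 1 receives. */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    char msg[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* internal process number   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of MPI processes   */
    MPI_Get_processor_name(name, &len);     /* external processor name   */

    printf("Process %d of %d on %s\n", rank, size, name);

    if (rank == 0 && size > 1) {            /* cooperative send ...      */
        strcpy(msg, "hello from rank 0");
        MPI_Send(msg, (int)strlen(msg) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {                 /* ... matched by a receive  */
        MPI_Recv(msg, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}
```

With MPICH2 or Open MPI installed, such a program is typically compiled with mpicc and launched with mpirun -np 2.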

BASIC MPI ROUTINES/COMMANDS

For communication among different processes, the following routines are used:

- MPI_Send: send a message to another process.
- MPI_Recv: receive a message from another process.
- MPI_Gather, MPI_Gatherv: gather data from participating processes into a single structure.
- MPI_Comm_size(): number of MPI processes.
- MPI_Comm_rank(): internal process number.
- MPI_Get_processor_name(): external processor name.

CONTINUED...

- MPI_Scatter, MPI_Scatterv: break a structure into portions and distribute those portions to other processes.
- MPI_Allgather, MPI_Allgatherv: gather data from different processes into a single structure that is then sent to all participants (gather-to-all).
- MPI_Alltoall, MPI_Alltoallv: gather data and then scatter it to all participants (all-to-all scatter/gather).
- MPI_Bcast: broadcast data to other processes.
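As an illustration of these collective routines (again our own sketch, not from the slides), the following program broadcasts a factor with MPI_Bcast, distributes one integer per process with MPI_Scatter, and collects the scaled results with MPI_Gather:

```c
/* Sketch: broadcast a factor, scatter an array, scale the local
 * piece on every rank, then gather the results back at the root. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i;
    int factor = 0, piece = 0;
    int data[64];                 /* assumes at most 64 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {              /* root prepares the inputs */
        factor = 2;
        for (i = 0; i < size; i++)
            data[i] = i + 1;
    }

    MPI_Bcast(&factor, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* everyone gets factor */
    MPI_Scatter(data, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);
    piece *= factor;                                     /* local computation    */
    MPI_Gather(&piece, 1, MPI_INT, data, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (i = 0; i < size; i++)
            printf("%d ", data[i]);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}
```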

COMMUNICATION PATTERNS

The cluster computer works on four communication patterns:

1. Single Direction Communication

2. Pair-based Communication

3. Pre-posted Communication

4. All-start Communication

SINGLE DIRECTION COMMUNICATION

Processes are paired off, with the lower rank sending messages to the higher rank in a tight loop.

Each individual pair synchronizes before communication begins.
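A minimal sketch of this pattern (our illustration; it assumes an even number of ranks and a fixed iteration count):

```c
/* Sketch of single-direction communication: ranks are paired
 * (0->1, 2->3, ...); the lower rank sends in a tight loop after
 * the pair synchronizes with a zero-byte exchange. */
#include <mpi.h>

#define ITERS 1000

int main(int argc, char **argv)
{
    int rank, buf = 0, i;
    char dummy = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* even rank pairs with the next rank, odd with the previous */
    int partner = (rank % 2 == 0) ? rank + 1 : rank - 1;

    /* pair synchronization before communication begins */
    MPI_Sendrecv(&dummy, 0, MPI_BYTE, partner, 0,
                 &dummy, 0, MPI_BYTE, partner, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    for (i = 0; i < ITERS; i++) {
        if (rank % 2 == 0)        /* lower rank: sends only    */
            MPI_Send(&buf, 1, MPI_INT, partner, 1, MPI_COMM_WORLD);
        else                      /* higher rank: receives only */
            MPI_Recv(&buf, 1, MPI_INT, partner, 1, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```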

PAIR-BASED COMMUNICATION

Each process communicates with a small number of remote processes in each communication phase.

Communication is paired, so that a given process is both sending and receiving messages with exactly one other process at a time, rotating to a new process when communication is complete.
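This rotation can be sketched as follows (our illustration; the XOR partner schedule assumes a power-of-two number of ranks):

```c
/* Sketch of pair-based communication: in each phase every rank
 * exchanges exactly one message with exactly one partner, chosen
 * by XOR-ing the rank with the phase number, so the partner
 * rotates through all other processes. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, phase, in, out;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    out = rank;
    for (phase = 1; phase < size; phase++) {
        int partner = rank ^ phase;   /* new partner each phase */
        /* send and receive with exactly one process at a time */
        MPI_Sendrecv(&out, 1, MPI_INT, partner, phase,
                     &in,  1, MPI_INT, partner, phase,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```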

PRE-POSTED COMMUNICATION

Expected message receives for the next communication phase are posted before the computation phase starts.

This guarantees that the receive buffer will be available during the communication phase.
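A minimal sketch (our illustration, assuming a simple ring of processes): the receive is pre-posted with MPI_Irecv before the computation, so the matching send always finds a posted buffer:

```c
/* Sketch of pre-posted communication on a ring: every rank posts
 * its receive before computing, then sends to its neighbour. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, in = 0, out;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int from = (rank + size - 1) % size;   /* ring neighbours */
    int to   = (rank + 1) % size;

    /* pre-post the receive for the coming communication phase */
    MPI_Irecv(&in, 1, MPI_INT, from, 0, MPI_COMM_WORLD, &req);

    out = rank * rank;                     /* computation phase */

    /* communication phase: the buffer is guaranteed available */
    MPI_Send(&out, 1, MPI_INT, to, 0, MPI_COMM_WORLD);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```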

ALL-START COMMUNICATION

It is very similar to pre-posted communication, but it does not guarantee that all receives are pre-posted.

After the computation, MPI_Waitall is called.

A call to MPI_Waitall can be used to wait for all pending operations in a list.
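A minimal sketch of this pattern (our illustration, again assuming a ring): all nonblocking operations are started up front, and a single MPI_Waitall completes them after the computation:

```c
/* Sketch of all-start communication: nonblocking send and receive
 * are both started first; MPI_Waitall then waits for every pending
 * operation in the request list. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int from = (rank + size - 1) % size;   /* ring neighbours */
    int to   = (rank + 1) % size;
    int in = 0, out = rank;

    MPI_Irecv(&in,  1, MPI_INT, from, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&out, 1, MPI_INT, to,   0, MPI_COMM_WORLD, &reqs[1]);

    /* ... computation phase overlaps with the communication ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* complete all pending ops */

    MPI_Finalize();
    return 0;
}
```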

WORKING STRATEGY

- Installation of Ubuntu 10.04 LTS.
- Installation of C in Ubuntu 10.04 LTS.
- Use of the terminal.
- Installation of the MPICH2 package on our Linux system.
- Study of basic Linux commands and other Linux features.
- Study of MPI, its basic commands and syntax.
- Execution of basic Linux and MPI commands.
- Execution of a matrix program using C on the Linux platform.

CONTINUED...

- Execution of basic programs using MPI.
- Execution of parallel computing.
- We will generate faults, then detect them, and at last recover from them by assigning the task of a faulty process to some other process so as to overcome the failure.

We will apply fault tolerance techniques, i.e.:

- Coordinated checkpoints (sketched below)
- Message logging
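As a rough sketch of coordinated checkpointing (our illustration, not the project's implementation; the file-name scheme is hypothetical): all ranks synchronize at a barrier and then write their state, so the saved files form a consistent global checkpoint that a restarted process could reload:

```c
/* Sketch of coordinated checkpointing: barriers bracket the save,
 * so no message is in flight while state is written to disk. */
#include <mpi.h>
#include <stdio.h>

static void checkpoint(int rank, int step, int state)
{
    char path[64];
    snprintf(path, sizeof path, "ckpt_rank%d.dat", rank);  /* hypothetical name */

    MPI_Barrier(MPI_COMM_WORLD);   /* coordinate: all ranks reach the same step */
    FILE *f = fopen(path, "w");
    if (f) {
        fprintf(f, "%d %d\n", step, state);
        fclose(f);
    }
    MPI_Barrier(MPI_COMM_WORLD);   /* all ranks finished writing */
}

int main(int argc, char **argv)
{
    int rank, state = 0, step;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (step = 1; step <= 10; step++) {
        state += rank * step;      /* stand-in for the real computation */
        if (step % 5 == 0)         /* checkpoint every 5 steps */
            checkpoint(rank, step, state);
    }

    MPI_Finalize();
    return 0;
}
```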

RESEARCH GAP

Up to now, fault tolerance has not been applied to these communication patterns.

To overcome this problem, we need to introduce fault tolerance into the communication patterns so as to reach the correct final outcome.

GRAPHICAL REPRESENTATION

[Figure: chart plotting monthly values from September through May; vertical axis scaled 0 to 400.]