
Page 1: Hadoop  fault tolerance

A survey on Hadoop: Fault Tolerance and Optimization of Fault Tolerance Model

Group-11
Project Guide: Mr. R Patgiri

Members: Pallav (10-1-5-023), Prabhakar Barua (10-1-5-017), Prabodh Hend (10-1-5-053), Prem Chandra (09-1-5-062), Jugal Assudani (10-1-5-068)

Page 2: Hadoop  fault tolerance

What is Apache Hadoop?

• Large-scale, open-source software framework
▫ Yahoo! has been the largest contributor to date
• Dedicated to scalable, distributed, data-intensive computing
• Handles thousands of nodes and petabytes of data
• Supports applications under a free license
• 2 Hadoop subprojects:
▫ HDFS: Hadoop Distributed File System with high-throughput access to application data
▫ MapReduce: A software framework for distributed processing of large data sets on computer clusters

Page 3: Hadoop  fault tolerance

Hadoop MapReduce

• MapReduce is a programming model and software framework first developed by Google (Google's MapReduce paper was published in 2004)
• Intended to facilitate and simplify the processing of vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner
▫ Petabytes of data
▫ Thousands of nodes
• Computational processing occurs on both:
▫ Unstructured data: filesystem
▫ Structured data: database

Page 4: Hadoop  fault tolerance

Hadoop Distributed File System (HDFS)

• Inspired by the Google File System
• Scalable, distributed, portable filesystem written in Java for the Hadoop framework
▫ Primary distributed storage used by Hadoop applications
• HDFS can be part of a Hadoop cluster or can be a stand-alone general-purpose distributed file system
• An HDFS cluster primarily consists of
▫ A NameNode that manages file system metadata
▫ DataNodes that store the actual data
• Stores very large files in blocks across machines in a large cluster
▫ Reliability and fault tolerance ensured by replicating data across multiple hosts (see the sketch below)
• Has data awareness between nodes
• Designed to be deployed on low-cost hardware
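As an illustrative aside (not from the slides): a minimal sketch of how an application could control the replication factor behind this fault tolerance through the standard HDFS Java API. The file path and replication factor here are assumptions for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Default replication factor for files this client creates (3 is the HDFS default).
    conf.set("dfs.replication", "3");
    FileSystem fs = FileSystem.get(conf);
    // Raise the replication factor of one (illustrative) existing file to 4 copies.
    fs.setReplication(new Path("/user/hadoop/input/file1.txt"), (short) 4);
    fs.close();
  }
}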

Page 5: Hadoop  fault tolerance

Typical Hadoop cluster integrates MapReduce and HDFS

• Master/slave architecture
• Master node contains
▫ Job tracker node (MapReduce layer)
▫ Task tracker node (MapReduce layer)
▫ Name node (HDFS layer)
▫ Data node (HDFS layer)
• Multiple slave nodes contain
▫ Task tracker node (MapReduce layer)
▫ Data node (HDFS layer)
• MapReduce layer has job and task tracker nodes
• HDFS layer has name and data nodes

Page 6: Hadoop  fault tolerance

Hadoop simple cluster graphic

Page 7: Hadoop  fault tolerance

MapReduce framework

• Per cluster node:
▫ Single JobTracker per master
1. Responsible for scheduling the jobs' component tasks on the slaves
2. Monitors slave progress
3. Re-executes failed tasks
▫ Single TaskTracker per slave
1. Executes the tasks as directed by the master

Page 8: Hadoop  fault tolerance

MapReduce core functionality

• Code usually written in Java, though it can be written in other languages with the Hadoop Streaming API
• Two fundamental pieces:
▫ Map step
1. Master node takes the large problem input and slices it into smaller sub-problems; distributes these to worker nodes
2. A worker node may do this again, leading to a multi-level tree structure
3. Worker processes the smaller problem and hands the result back to the master
▫ Reduce step
1. Master node takes the answers to the sub-problems and combines them in a predefined way to get the output/answer to the original problem

Page 9: Hadoop  fault tolerance

MapReduce core functionality (II)

• Data flow beyond the two key pieces (map and reduce):
▫ Input reader – divides input into appropriately sized splits which get assigned to a Map function
▫ Map function – maps file data to smaller, intermediate <key, value> pairs
▫ Compare function – input for Reduce is pulled from the Map intermediate output and sorted according to the compare function
▫ Reduce function – takes intermediate values and reduces them to a smaller solution handed back to the framework
▫ Output writer – writes file output

Page 10: Hadoop  fault tolerance

MapReduce core functionality (III)

• A MapReduce Job controls the execution
▫ Splits the input dataset into independent chunks
▫ These are processed by the map tasks in parallel

• The framework sorts the outputs of the maps
• The sorted map output is then sent to the reduce tasks to be combined
• Both the input and output of the job are stored in a filesystem
• The framework handles scheduling

▫ Monitors and re-executes failed tasks

Page 11: Hadoop  fault tolerance

MapReduce input and output

• MapReduce operates exclusively on <key, value> pairs
• Job Input: <key, value> pairs
• Job Output: <key, value> pairs
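For reference, the flow of key/value types through a job in the classic org.apache.hadoop.mapred API (the one used by the WordCount example that follows) is:

(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)

so a mapper is parameterised as Mapper<K1, V1, K2, V2> and a reducer as Reducer<K2, V2, K3, V3>.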

Page 12: Hadoop  fault tolerance

Input and Output (II)

Page 13: Hadoop  fault tolerance

Example: WordCount

Two input files:
file1: "hello world hello moon"
file2: "goodbye world goodnight moon"

Three operations:
map
combine
reduce

Page 14: Hadoop  fault tolerance

Output per step

MAP
First map:
< hello, 1 >
< world, 1 >
< hello, 1 >
< moon, 1 >

MAP
Second map:
< goodbye, 1 >
< world, 1 >
< goodnight, 1 >
< moon, 1 >

Page 15: Hadoop  fault tolerance

Output per step

COMBINE
First map:
< moon, 1 >
< world, 1 >
< hello, 2 >

Second map:
< goodbye, 1 >
< world, 1 >
< goodnight, 1 >
< moon, 1 >

Page 16: Hadoop  fault tolerance

Map class (II)

Two maps are generated (one per file).

First map emits:
< hello, 1 >
< world, 1 >
< hello, 1 >
< moon, 1 >

Second map emits:
< goodbye, 1 >
< world, 1 >
< goodnight, 1 >
< moon, 1 >

Page 17: Hadoop  fault tolerance

Hadoop MapReduce Word Count Source

public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    // Tokenize the input line and emit <word, 1> for every token.
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}

Page 18: Hadoop  fault tolerance

/**
 * A reducer class that just emits the sum of the input values.
 */
public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}

Page 19: Hadoop  fault tolerance

Hadoop MapReduce Word Count Driver

public void run(String inputPath, String outputPath) throws Exception {
  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("wordcount");

  // the keys are words (strings)
  conf.setOutputKeyClass(Text.class);
  // the values are counts (ints)
  conf.setOutputValueClass(IntWritable.class);

  conf.setMapperClass(MapClass.class);
  // the combine step shown earlier reuses the reducer to sum per-map counts
  conf.setCombinerClass(Reduce.class);
  conf.setReducerClass(Reduce.class);

  FileInputFormat.addInputPath(conf, new Path(inputPath));
  FileOutputFormat.setOutputPath(conf, new Path(outputPath));

  JobClient.runJob(conf);
}
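A minimal main method that could invoke this driver, assuming the enclosing class is WordCount as the JobConf above suggests:

public static void main(String[] args) throws Exception {
  // args[0]: input path in HDFS, args[1]: output path (must not already exist)
  new WordCount().run(args[0], args[1]);
}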

Page 20: Hadoop  fault tolerance

PROPOSITIONS

Page 21: Hadoop  fault tolerance

1. Addition of one more "Backup" state in the Hadoop Pipeline

• Motive
In a Hadoop pipeline, when a system failure occurs the whole MapReduce process is computed again. Although this is a straightforward and the most plausible remedy, it means that even when the system turns faulty after the mapping process has finished, the whole MapReduce process is repeated. If the intermediate key-value pairs can be backed up, then even after a fault in the system the intermediate key-value data can be fed to another cluster of reducers.

Page 22: Hadoop  fault tolerance

1. Addition of one more "Backup" state in the Hadoop Pipeline

Page 23: Hadoop  fault tolerance

1. Addition of one more "Backup" state in the Hadoop Pipeline

Page 24: Hadoop  fault tolerance

1. Addition of one more "Backup" state in the Hadoop Pipeline

• Once this backup state is installed in the pipeline, the adaptability of Hadoop in terms of fault tolerance will increase. Because the intermediate data is preserved for the reducer, even if the earlier cluster is non-functional the preserved data can be fed into a new reducer. The scheduling decisions taken by the master remain unchanged, i.e. O(M + R), where M and R are the numbers of mappers and reducers, but the time required to allocate a new cluster for recomputing MapReduce is saved. Once the reducer completes the process, the intermediate data is removed from the backup device.
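A minimal sketch of how the proposed backup state could hook into the map output path: a wrapper that forwards every intermediate pair both to the normal collector and to a backup store. BackupStore and BackupOutputCollector are assumed names for this illustration, not Hadoop APIs.

import java.io.IOException;
import org.apache.hadoop.mapred.OutputCollector;

// Hypothetical persistent store for intermediate <key, value> pairs.
interface BackupStore<K, V> {
  void append(K key, V value) throws IOException;
  void clear() throws IOException; // called once the reducer has finished
}

// Wraps the normal collector so every intermediate pair is also backed up.
class BackupOutputCollector<K, V> implements OutputCollector<K, V> {
  private final OutputCollector<K, V> delegate;
  private final BackupStore<K, V> backup;

  BackupOutputCollector(OutputCollector<K, V> delegate, BackupStore<K, V> backup) {
    this.delegate = delegate;
    this.backup = backup;
  }

  public void collect(K key, V value) throws IOException {
    backup.append(key, value);    // preserve the pair for a replacement reducer
    delegate.collect(key, value); // normal MapReduce path
  }
}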

Page 25: Hadoop  fault tolerance

1. Addition of one more "Backup" state in the Hadoop Pipeline

Advantages & Disadvantages

Advantages
• Unnecessary recomputation of the map phase is avoided.
• Backup is created as a new state in the pipeline.

Disadvantages
• Cost overhead.
• There will be some delay in the reduce function, as the backup will also be accessing the data.

Page 26: Hadoop  fault tolerance

2. Protocol for Supernode

• Motive

Hadoop usually does not have any communication between its slave nodes in which control signals regarding their status are shared. This proposal tries to establish a different type of communication between the slave nodes, so that when any node turns faulty the neighbouring nodes try to do its job. In this new infrastructure configuration, neighbouring nodes behave as a supernode, and each node knows the other nodes in the supernode and the tasks they have been assigned. Thus, in case one of the nodes fails, another node in the supernode can take over the role of the failed one without the JobTracker needing to know about it or take any action.

Page 27: Hadoop  fault tolerance

2. Protocol for Supernode

Page 28: Hadoop  fault tolerance

2. Protocol for Supernode

Page 29: Hadoop  fault tolerance

2. Protocol for Supernode

Page 30: Hadoop  fault tolerance

2. Protocol for Supernode

Page 31: Hadoop  fault tolerance

2. Protocol for Supernode
• Detection of Fault

The nodes of the supernode (Node 1, Node 2, Node 3, Node 4, Node 5) ping each other periodically after every interval T (T < heartbeat) and keep track of the responses. If there is no response from Node 1 for a certain time interval, it is assumed to be a non-functional node.

• Information Node 1 sends to its neighbours during normal execution
1. Task information
2. Location of data in shared memory
3. Progress of task (checkpoint)

• Failure of a node
Nodes 1, 2, 3, 4 and 5 ping each other periodically and should get a response from the other nodes. If no response comes from a node for a period of three handshake times, it is considered a faulty node (sketched below).
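A minimal sketch of this ping-based detection; the class name, the interval T and the three-miss threshold are illustrative, not part of Hadoop.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Tracks peer liveness inside one supernode; a peer is declared faulty after
// three consecutive ping intervals (each of length T < heartbeat) with no response.
class PeerMonitor {
  private static final int MAX_MISSES = 3;
  private final Map<String, Integer> missedPings = new ConcurrentHashMap<>();

  // Called every interval T for each neighbouring node in the supernode.
  void recordPing(String nodeId, boolean responded) {
    if (responded) {
      missedPings.put(nodeId, 0);
    } else {
      missedPings.merge(nodeId, 1, Integer::sum);
    }
  }

  boolean isFaulty(String nodeId) {
    return missedPings.getOrDefault(nodeId, 0) >= MAX_MISSES;
  }
}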

Page 32: Hadoop  fault tolerance

2. Protocol for Supernode

• Reassignment of tasks to Node 2
Any of the neighbours 2, 3, 4, 5 which has completed its job or is about to complete it, i.e. has free resources, will assign itself this task, and its task tracker will schedule it.

• Revival of Node 1
If Node 1 starts working again and tries to gain access to the shared memory where Node 2 is already performing the task originally allocated to Node 1, the Control Unit of the shared memory will prevent Node 1 from accessing that shared memory location.

Page 33: Hadoop  fault tolerance

2. Protocol for Supernode

• Control Unit
The Control Unit is present in the shared memory. Its job is to manage all the shared memory in the cluster. It also prevents simultaneous access by more than one node to a particular memory segment (a minimal sketch of this access rule follows below).

• Client app request
Before this structure was theorised (i.e. in the current system in use), the client app requests from the main node the address of the reducers where it may find the required answers. Now the client app will instead request, via the name node, the CU of the shared memory associated with its task; the address of that shared memory is kept track of by the main node.
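A minimal sketch of the Control Unit's access rule: each shared-memory segment is granted to at most one node at a time, so a revived Node 1 is refused while Node 2 holds its segment. The class and method names are assumptions for illustration.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Control Unit: grants each shared-memory segment to at most one node at a time.
class ControlUnit {
  private final ConcurrentMap<String, String> segmentOwner = new ConcurrentHashMap<>();

  // A node asks for exclusive access to a segment; false if another node already holds it.
  boolean acquire(String segmentId, String nodeId) {
    String owner = segmentOwner.putIfAbsent(segmentId, nodeId);
    return owner == null || owner.equals(nodeId);
  }

  // Released when the task using that segment completes.
  void release(String segmentId, String nodeId) {
    segmentOwner.remove(segmentId, nodeId);
  }
}

// Example: Node 2 has taken over Node 1's task.
//   cu.acquire("seg-42", "node-2")  -> true
//   cu.acquire("seg-42", "node-1")  -> false (Node 2 already owns the segment)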

Page 34: Hadoop  fault tolerance

2. Protocol for Supernode
• Advantages
1. More fault tolerant

When any node becomes non-functional, a nearby node in the supernode which is near completion or has already completed its task reassigns itself to the task of the faulty node, the description of which is present in the shared memory. Therefore a faulty node does not have to wait for the Master node to notice its non-functionality, which reduces execution time in case any of the nodes gets faulty.

2. Shared Memory
By the use of shared memory, the memory of a particular node does not get blocked in case the node gets faulty; it remains available to other nodes, with controlled access to it.

3. Control Unit
The Control Unit prevents simultaneous access to the same memory block, thereby providing data integrity. It also points the client application to the data it requires.

Page 35: Hadoop  fault tolerance

2. Protocol for Supernode

• Disadvantages

• Cost overhead
Extra hardware is needed to implement this structure.

• Bandwidth consumption
There will be some bandwidth consumption during the interaction between the nodes.

Page 36: Hadoop  fault tolerance

System Of Slaves

Motive
In Hadoop, every node is a single piece of commodity hardware; a vision of a system of nodes has never been implemented. If a node turns non-functional, its task is reassigned by the master to another node. The time taken to reassign this task, and all the overhead caused during this process, can be reduced by making a system of multiple nodes act as a unit.

Page 37: Hadoop  fault tolerance

System Of Slaves

Page 38: Hadoop  fault tolerance

System Of Slaves

Theorization
Unlike conventional Hadoop MapReduce, each slave is now considered as a system of slaves which contains multiple nodes. Similar to the process of task assignment in Hadoop, here too the master distributes the task among several systems of slaves. The name node informs each system about the location of its data in the shared memory. The system seeks the data and, after it has got the access permission, N1 and N2 acquire the data and start computation. If N1 becomes non-functional, N2 resumes N1's functioning by completing its own task and then moving forward to N1's task. N2 gets the information about N1 just by going through the checkpoints and output data. If the whole system fails, the supernode algorithm comes into play.

Page 39: Hadoop  fault tolerance

System Of Slaves

Page 40: Hadoop  fault tolerance

System Of Slaves

Division of data: if the chunk size is 64 MB, N1 and N2 will each get 32 MB of data, although N1 and N2 cannot access the data simultaneously. First, when the MapReduce job is assigned to the system, N1 acquires its data; here N1 is given higher priority. After N1 takes its data, N2 proceeds. However, if N1 turns faulty, after some time N2 will start processing its own data, and while submitting its checkpoint it will notice the unavailability of N1.
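A minimal sketch of this division and takeover rule; the constants and method names are illustrative only.

// Splits one 64 MB chunk between the two nodes (N1, N2) of a system of slaves.
class ChunkAssignment {
  static final long CHUNK_SIZE = 64L * 1024 * 1024; // 64 MB
  static final long HALF = CHUNK_SIZE / 2;           // 32 MB per node

  // N1 has priority and takes the first half of the chunk.
  long[] rangeForN1() { return new long[] { 0, HALF }; }

  // N2 proceeds after N1; if N1 has missed its checkpoint, N2 also takes over N1's half.
  long[] rangeForN2(boolean n1Faulty) {
    return n1Faulty ? new long[] { 0, CHUNK_SIZE } : new long[] { HALF, CHUNK_SIZE };
  }
}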

Page 41: Hadoop  fault tolerance

THANK YOU