advance mapreduce concepts - module 4

15
Advance MapReduce Concepts

Upload: rohit-agrawal

Post on 15-Apr-2017

57 views

Category:

Technology


0 download

TRANSCRIPT

Advance MapReduceConcepts

Counters• Counter provides a way to measure the progress or the number of operations

that occur within MapReduce programs.

Built-in counter groups

Task counters

File System Counters

File InputFormat Counters

File OutputFormat Counters

Creating Custom Counters• First step is to create an Enum that will contain the names of all custom

counters for a particular job.public enum CustomCounters {VALID, INVALID};

• Inside the map or reduce task, the counter can be adjustedif(validRecord) context.getCounter(CustomCounters.VALID).increment(1); // increase the counter by 1else if(invalidRecord) context.getCounter(CustomCounters.INVALID).increment(1); // increase the counter by 1

• The custom counter values will be displayed alongside the built-in counter values on the summary web page for a job viewed through the JobTracker.

• The values can be accessed programmaticallylong validCounterValue = job.getCounters().findCounter(CustomCounters.VALID).getValue();

Serialization• Serialization is the process of turning structured objects into a byte

stream for transmission over a network or for writing to persistent storage.

• Deserialization is the reverse process of turning a byte stream back into a series of structured objects.

• Hadoop uses its own serialization format, Writables, which is certainly compact and fast, but not so easy to extend or use from languages other than Java.

The Writable Interface• The Writable interface defines two methods

• one for writing its state to a DataOutput binary stream• one for reading its state from a DataInput binary stream

Writable class hierarchy

Custom Writables Examplepublic class TextPair implements WritableComparable<TextPair>{private Text first;private Text second;public TextPair() {set(new Text(), new Text());}public TextPair(String first, String second) {set(new Text(first), new Text(second));}

public TextPair(Text first, Text second) {set(first, second);}

public void set(Text first, Text second) {this.first = first;this.second = second;}

public Text getFirst() {return first;}

public Text getSecond() {return second;}

public void write(DataOutput out) throws IOException {first.write(out); second.write(out);}

public void readFields(DataInput in) throws IOException {first.readFields(in);second.readFields(in);}

@Overridepublic int hashCode() {return first.hashCode() * 163 + second.hashCode();}

@Overridepublic boolean equals(Object o) {if (o instanceof TextPair) {TextPair tp = (TextPair) o;return first.equals(tp.first) && second.equals(tp.second);}return false;}

@Overridepublic String toString() {return first + "\t" + second;}

public int compareTo(TextPair tp) {int cmp = first.compareTo(tp.first);if (cmp != 0) {return cmp;}return second.compareTo(tp.second);}}

Error Handling• Handling non-fatal errors that need to be tracked• In the mapper:

if (some_error_condition){ context.getCounter(COUNTER_GROUP, COUNTER).increment(1);}

• In the client:boolean okay = job.waitForCompletion(true);if (okay){ Counters counters = job.getCounters(); Counter bwc = counters.findCounter(COUNTER_GROUP, COUNTER); System.out.println("Errors" + bwc.getDisplayName()+":" + bwc.getValue());}

Compression• It reduces the space needed to store files.• It speeds up data transfer across the network, or to or from disk.

Tuning

Map Side Tuning Properties

Reduce Side Tuning Properties