only aggressive elephants are fast elephants -...

Only Aggressive Elephants are Fast Elephants

Jun Fan, Vijay Sukhadeve

Computer Science Dept.

Worcester Polytechnic Institute (WPI)

Introduction/Motivation

Typical analysts Problem analyzing Web Logs

Source IP

Visit date

Add Revenue

Modifications to map Reduce Jobs for records of interest

Different Filter criteria


The approach is modify as we go along

Slow query run times - Not acceptable


Use Trojan Indexes ( Hadoop++)

Expensive Index Creation

Question is same old. ( Which attribute to Index)

Note - Trojan Index is a covering index consisting of a sparse directory over the sorted split data.

Idea

HDFS Keeps copies (default 3) of data

Failover

Same Physical Layout

Instead – Use different sort order – Indexes in replica. How? – (Use different vertical Layout)

Obviously – Use configurations

Physical Design Algorithm(Intelligent)

HAIL

Changes upload pipeline of HDFS

Create different Clustered Indexes on each data block replica.

Sorting plus indexing done in memory

Cluster index on sourceIP

Replica – Index on visitdate

Figure 1: The HAIL upload pipeline

Figure 2: HAIL data column index

HAIL

Considerations

Never split row into two blocks – different from HDFS

Convert data block into PAX(Pattern attribute Across)

Create sparse clustered B(+) Tree – single root directory (no multilevel tree) because block sizes are less than 1GB

Never flush data and checksum to Disk

CONCLUSION This paper-work presents HAIL (Hadoop Aggressive Indexing Library).

HAIL improves the upload pipeline of HDFS to create different clustered

indexes on each replica. As a consequence each HDFS block will be

available in at least three different sort orders and with different indexes.

HAIL also works for a larger number of replicas.

A major advantage of HAIL is that the long upload and indexing times which had to

be invested on previous systems are not required anymore.

Hadoop++ created indexes per logical HDFS block whereas HAIL creates different

indexes for each physical replica.

HAIL is experimentally compared with Hadoop as well as Hadoop++ using

different datasets and a number of different clusters. The results demonstrated the

high

efficiency of HAIL. Users can upload their datasets up to 1.6x faster than Hadoop

and run jobs up to 68x faster than Hadoop.

HAIL

Research Challenges

Create Indexes while upload

Modify MR to exploit sort order and indexes at run time.

QUESTIONS?

LOGO

www.nordridesign.com

Bob’s Perspective

—Hadoop’s Perspective

HAIL’s Perspective

Query Pipeline

http://www.nordridesign.com/

LOGO


Bob’s Perspective

Writes a MapReduce job, including a job

configuration class, a map function, and a

reduce function.

Specifies the HailInputFormat to enables

his MapReduce job to read HAIL Blocks.

Annotates his map function to specify the

selection predicate and the projected

attributes required by his MapReduce job.


LOGO


Bob’s Perspective

SELECT sourceIP

FROM UserVisits WHERE visitDate

BETWEEN ‘1999-01-01’ AND ‘2000-01-

01’;

@HailQuery(filter="@3 between(1999-

01-01,2000-01-01)", projection={@1})

void map(Text key, Text v) { ... }

@3 = visitDate @1 = sourceIP; the

position of attributes in UserVisits


LOGO


Bob’s Perspective

In case do not specify filter predicates,

MapReduce will perform a full scan on the

HAIL Blocks as in standard Hadoop.

At query time, HAIL checks whether an

index exists on the filter attribute using

the Index Metadata of a data block.


LOGO


Bob’s Perspective

Use a HailRecord object as input value in

the map function to directly read the

projected attributes without splitting the

record into attributes.

Do not have to filter out the incoming

records, because this is automatically

handled by HAIL via the HailQuery

annotation


LOGO


Bob’s Perspective

void map(Text key, Text v) {

String[] attributes = v.toString().split(",");

if (DateUtils.isBetween(attributes[2],

"1999-01-01", "2000-01-01"))

output(attributes[0], null);

}

void map(Text key, HailRecord v) {

output(v.getInt(1), null);

}


LOGO


Hadoop’s Perspective

JobClient copies all the resources needed

to run the MapReduce job.

JobClient breaks the input into smaller

pieces called input splits using the

InputFormat to a distinct HDFS block.

JobClient retrieves for each input split all

datanode locations having a replica of that

block.

JobClient submits the job to the

JobTracker with a set of splits to process


LOGO


Hadoop’s Perspective

JobTracker creates a map task for each

input split and JobTracker allocates the

map task to the TaskTracker.

The map task uses a RecordReader UDF

in order to read its input data block from

the closest datanode.

The RecordReader breaks block into

records and makes a call to the map

function for each record and writes its

output back to HDFS


LOGO



HailInputFormat uses a more elaborate

splitting policy, called HailSplitting.

HailSplitting is to map one input split to

several data blocks whenever a

MapReduce job performs an index scan

over its input.

Allow HAIL to reduce the number of map

tasks to process and hence to reduce the

aggregated cost of initializing and finalizing

map tasks


LOGO



To improve data locality, HailSplitting first

clusters the blocks of the input of an

incoming MapReduce job by locality.

HailSplitting produces as many collections

of blocks as there are datanodes storing

at least one block of the given input.

For each collection of blocks, HailSplitting

creates as many input splits as map slots

each TaskTracker has.


LOGO



Creates a map task per resulting input

split and schedules these map tasks to the

replicas having the matching index.

If no relevant index exists, HAIL,

scheduling falls back to standard Hadoop

scheduling by optimizing data locality only.


LOGO



The HailRecordReader is responsible for

retrieving the records that satisfy the

selection predicate of MapReduce jobs.

Those records are then passed to the

map function.

For example, need to find all records

having visitDate between(1999-01-01,

2000-01-01)


LOGO



First open an input stream to the block

having the required index by using the

getHostsWithIndex method of each

BlockLocation so as to choose the closest

datanode with the required index.

Read the index entirely into main

memory (typically a few KB) to perform

an index lookup.


LOGO



Reconstruct the projected attributes of

qualifying tuples from PAX to row layout.

Make a call to the map function for each

qualifying tuple.


LOGO


Query Pipeline


LOGO


Experiments

Six different clusters. Each one is a

physical 10-node cluster. Each node has

one 2.66GHz Quad Core Xeon

processor running 64-bit platform Linux

openSuse 11.1 OS, 4x4GB of main

memory, 6x750GB SATA hard disks.

compared (1) Hadoop, (2) Hadoop++ and

(3) HAIL


LOGO


Experiments


LOGO


Experiments

the Synthetic dataset is well suited for

binary representation


LOGO


Experiments


LOGO


Experiments

Tideal = #MapTasks/

(#ParallelMapTasks *

Avg(TRecordReader))

Toverhead = Tend-to-end - Tideal


LOGO


Experiments

HAIL-1Idx creates an index on the same

attribute for all three replicas.


THANK YOU

Questions!

Only Aggressive Elephants are

fast Elephants Vijay Sukhadeve and Fan Jun

f

http://www.istockphoto.com/stock-photo-14423885-red-curtain.php?st=66287df

Motivation(...))

fast querying

One Day in Bob´s Life

Detail View of TaskTracker 5

Adaptive Indexer

Mapper Block 42

Block Metadata

map(K, V) Index Metadata

{...} Index d 5 a 0 1 1 1 0 1 0 1 0 1 . . .

b 0 1 0 1 1 0 1 0 0 0 . . .

c 1 1 0 0 1 0 1 0 1 1 . . .

d 0 0 0 0 0 0 0 0 1 1 . . .

m

bob$ head UserVisits.dat

170.122.19.7|0eqt.html|1994-4-27|336.43|NetAnts/1.2|MLI|MLI-HE|Hertzsprung-Russell|7

161.110.31.8|0gzhwrmzfd.html|2003-7-16|176.15|MojeekBot|GUY|GUY-ZQ|astrochemistry|7

166.103.36.25|0fwrbsuzn.html|1991-2-14|182.915|KAIST AITrc|CAF|CAF-AZ|molecules|1

175.101.2.40|0nfxnbnuu.html|1988-3-24|130.32|Mozilla/5.0|MKD|MKD-RQ|A&A|7

156.100.40.7|0iczechptml|1983-12-5|39.906|Mozilla/5.0|MOZ|MOZ-RF|cosmology:|5

174.118.37.24|0setzifibmk.html|1976-10-11|71.28|Robozilla/1.|MLT|MLT-HT|GALAXIES|5

173.125.36.22|0hhzmry.html|1974-8-27|15.58|Incutio HttpClient|MTQ|MTQ-RE|microwave|2

SOURCE IP 14.1 n.html|2002-4-18|7.930|Mozilla/4.03|MYT|MYT-PK|techniques|1 5.1|0fwrbjsuz 168.1 164.110.6.28|0vpkgsvohgkxdz.html|2004-5-1|223.571|Pagestacker|ASM|ASM-WN|function|4 161.107.40.29|0ctxwdzsbm.html|1975-1-20|273.854|Privoxy/3.0|BHR|BHR-MD|individual|6

158.118.25.1|0sfwtqxjwaavk.html|1977-5-23|160.077|Toutatis x.x-|JOR|JOR-TW|nebulae|5

169.113.12.45|0vxwzb.html|2001-1-20|351.129|Wavefire/0.8-dev|TLS|TLS-PW|spots|2

155.106.15.8|0vlmwypbhhlucr.html|1975-1-19|239.101|OutfoxBot/0.x|MDG|MDG-MG|radial|5

150.101.22.8|0aoclhkqqzcas.html|1973-4-30|220.202|MSIE 4.0|EGY|EGY-KB|occultations|8

175.109.8.39|0jsup.html|1983-9-8|352.6177|RSSOwl/1.2.3|ATA|ATA-BV|interferometers|7

163.111.9.17|0vefhgquwbzgh.html|1976-6-18|268.3|Mozilla/4.05|SOM|SOM-IV|surveys|8

160.122.23.10|0dnbzhr.html|1990-5-24|237.3636|NextGenSearchBot|VUT|VUT-ZL|infrared|6

160.106.6.40|0rwjyocdra.html|1981-10-29|94.965|libwww-perl/5.6|VIR|VIR-FI|Scuti|5

155.110.37.21|0az567.html|1979-1-27|375.101|Jakarta-HttpClient|LBN|LBN-XS|MEDIUM|6

URL DURATION

Uploading UserVisits.dat (2.6 TB) to HDFS...

Stop

30 minutes later...

Uploading UserVisits.dat (2.6 TB) to HDFS...

Stop

Uploading 2.6 TB of UserVisits.dat

completed

Map Progress

Reduce Progress

Stop

Reduce Progress

Stop

Map Progress

Map Progress

Reduce Progress

Stop

MapReduce Job 4711

canceled

Creating Trojan Index on Attribute 0...

Stop

Jo

b

run

tim

e

[s]

0

100 nodes [ J. Dittrich, J.-A. Quiane-Ruiz,A. Jindal,Y. Kargin,V. Setty, and J. Schad. Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without it Even Noticing). PVLDB 2010. ]

30

60

90

120 Hadoop Hadoop++ (Trojan Index)

Trojan Index Query Time

Stop


Stop


MapReduce Job 1337

canceled

Trojan Index Creation Time

0

10000

20000

30000

40000

50000

10 nodes 50 nodes 100 nodes

run

tim

e

[se

co

nd

s]

Hadoop HadoopDB

Hadoop++(256MB) Hadoop++(1GB)

(I)ndex Creation (C)o-Partitioning Data (L)oading

L L C

I

C

I

L L

C C

I

I

L L

C

I

C

I

[ J. Dittrich, J.-A. Quiane-Ruiz,A. Jindal,Y. Kargin,V. Setty, and J. Schad. Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without it Even Noticing). PVLDB 2010. ]

Hadoop Aggressive Indexing Library

vs.

HDFS

+ MapReduce

HDFS

HDFS

...

Bob

http://www.istockphoto.com/stock-

photo- 5 9 1 1 3 4 - l o ng- l i ve- the-

king.php

DN1 DN2 DN3 DN4 DN5 DN6 DNn







HDFS

horizontal partitions

DNn DN6 DN5 DN4 DN3 DN2 DN1

...

HDFS


...

HDFS blocks 64MB (default) horizontal partitions

HDFS

4 5 6

7 8 9

...

1 2 3 1 1 2 3


2 3


... 1

4 5 6 4 5 6

2

3

1 1 2 2

3

3

HDFS

DNn DN5 DN4 DN3 DN2 DN1

... 1

5 6 4

7

2

5 6 3 5

8 9

1 1 2 3

4 4

7 8 9

HDFS

... 4 8 5 6 4 9

2 9 1 7 3 8

HDFS

DN3 DN4 DN5

4 5 6

7 8 9

2

3 6

DN6

7 8 9

1

4 7

DNn

2 8

3 6

DN6

1 7 2 9

5 6 3 5

DN1 DN2

DN6 DN5 DN4 DN3 DN2 DN1

... 4 8 5 6 4 9 3 6

2 9 1 7 3 8 2 8

5 6 3 5

1 7 2 9

Failover

HDFS

MapReduce

DN6 DN5 DN4 DN3 DN2

... 4 8 5 6 4 9 3 6

2 9 1 7 3 8 2 8

HDFS

1

4 7

DNn

1

4 7

DNn

1 7 2 9

5 6 3 5

DN1


... 4 8 5 6 4 9 3 6

2 9 1 7 3 8 2 8

5 6 3 5

1 7 2 9

HDFS


... 4 8 5 6 4 9 3 6

2 9 1 7 3 8 2 8

5 6 3 5

1 7 2 9

MapReduce

HDFS

map(row) -> set of (ikey, value)


... 1

4 8 5 6 4 9 3 6 4 7

2 9 1 7 3 8 2 8

Map Phase

HDFS

MapReduce map(row) -> set of (ikey, value)

1

4 7

DNn

1

4 7

DNn

1 7 2 9

5 6 3 5

DN1


... 2 9 1 7

4 8 5 6

Map Phase

HDFS


M1 M2 M3 M4 M5 M6 M7


... 2 9 1 7

4 8 5 6

Map Phase

HDFS


M1 M2 M3 M4 M5 M6 M7


... 1

4 8 5 6 4 9 3 6

2 9 1 7 3 8 2 8

Map Phase

6‘ 8‘ 1‘ 3‘ 2‘ 7‘

9 1 2 3

7 6 8 4 7

HDFS


M1 M2 M3 M4 M5 M6 M7

1

4 7

DNn

3 8 2 8

4 9 3 6

1 7 2 9

5 6 3 5

1

4 7

DNn

1 7 2 9

5 6 3 5

3 8 2 8

4 9 3 6

1 7 2 9

5 6 3 5

9‘

DN2


... 2 9 1 7 3 8 2 8

4 5

1 7 2 9

4 8 5 6 4 9 3 6 5 6 3 5

Map Phase

9‘ 5‘ 3‘ 4‘ 2‘ 6‘ 8‘ 1‘

HDFS

MapReduce

M8 M9

map(docID, document) -> set of (term, docID)

HDFS

+ MapReduce

vs.

HDFS HAIL

+ MapReduce

HDFS

+ MapReduce

1

4 7

7‘

DNn

HDFS HAIL

+ MapReduce

HAIL

HAIL

...

Bob


HAIL

horizontal partitions


...

HAIL


...

HDFS blocks 64MB (default) horizontal partitions

HAIL

4 5 6

7 8 9

...

1 2 3 1 1 2 3


2 3

HAIL


... 1

4

2

3

5 6

1 1 2 2

3

3

4 5 6

HAIL


...

4 5 6 4 5 6

1 1 1 2 2 2 3

3 3

HAIL

... 1 1 1 2

3

2 2

3

3 5 6 4 5 5 6 4 4 6


4 5 6

7 8 9

4 5 6

7 8 9

7 8 9


...

HAIL

1 1 1 2 2 2 3

3 6 3 5 5 6 4 5 6 4 4


...

HAIL

1 1 1 2 2 2 3

3 6 5 6 3 5 5 6 4 4 4

7 8 9 7 8 9

...

HAIL

1 1

5 6 4

1

4 5 6 4

2

3 5

2 2

3 6

3 9 9

8

8

9

8 7 7

7


7 8 9

7 8 9


...

HAIL

2 9 1 7 3 8 2 8

4 8 5 6 4 9

1

4 7 3 6 3 5 5 6

1 7 2 9


... 1 7 1 1 2 2 9 2 7 3 8

3 6 5 6 3 5 4 8 5 6 4 9 4 7

8 9

Failover

HAIL

Details

Network

Network

PCK

2

C

B

A PCK

1

PCK

2

PCK

1

Block Metadata

C

B

A

Block Metadata

C

B

A

Network

...

8 forward Block Metadata

1

append

12

HAIL Client Datanode DN1 upload

check

acknowledge

10

PCK

2

PCK

1

0010

1110

ACK 1

3 2 1

ACK 2

3 2 1

ACK 1

3

ACK 1

3 2 1 ACK 2

3 2 forward

13

15

9

14

HDFS Namenode Block directory HAIL Replica directory

register register

ACK 2

3

convert

2

4 5

6 reassemble

7

reassemble

Block Metadata

C

B

A

Index Metadata Index A

HAIL Block

0001001100011 C 0111100111111

1001110111011 B 1010101101010

Block Metadata 0010011101101 A 1101001101101

Index Metadata Index C

Bob

A B C

1

preprocess

11

notify

get location

3

OK

Datanode DN3

HAIL Upload Pipeline

HAIL Query Pipeline

MapReduce Pipeline

System's Perspective

Hadoop MapReduce Pipeline

HDFS HDFS

Task Tracker Job Tracker Job Client

Split Phase Scheduler Map Phase

for each spliti in splits {

allocate spliti to closest DataNode storing blocki splits[] }

Record Reader:

- Perform Index Scan

- Perform Post-Filtering

- for each Record invoke

map(HailRecord)

3

send allocate

Map Task

chose 4 computing

Node read

6 block42

block42

C DN3 DN4

block42

A

DN5

DN6

block42

B

DN7

DNn DN2

...

2

run

Job 5

7 output store

...

@HailQuery( filter="@3 between(1999-01-01, 2000-01-01)",

projection={@1})

void map(Text k, HailRecord v) {

output(v.getInt(1), null);

}

...

Bob's Perspective

HAIL Annotation

Bob

DN1

for each blocki in input { locations = blocki.getHostWithIndex(@3); splitBuilder.add(locations,

blocki); } splits[] = splitBuilder.result;

Experiments

PAX Block build

0101001010111

0110010111010

0101010101010

0011010001100

0110100010111

0101000110110

PAX Block build

0101001010111

0110010111010

0101010101010

0011010001100

0110100010111

0101000110110 PAX Block

0101001010111

0110010111010

0101010101010

0011010001100

0110100010111

0101000110110

HAIL Block

0000100100011

0011110111111

1011000110110

1000101110101

1000101010001

0111101100110

1 write Job

MapReduce Job

Main Class

map(...)

reduce(...)

Upload Times

0

1700

3400

5100

6800

0 3 1 2

Number of created indexes

5766

704 712 717

671

3472

1132

Uplo

ad

tim

e [

sec]

Hadoop Hadoop++ HAIL

all with 3 replicas

Upload Time

0

1700

5766

5100

3472

0 1 2


3

712 717 704 671 1132

Uplo

ad

tim

e [

sec]


all with 3 replicas

Upload Time

0

1700

3400

5100

6800

0 1 2


3

717 712 704 671

5766

3472

1132

Uplo

ad

tim

e [

sec]


all with 3 replicas

Upload Time

0

1700

3400

5100

6800

0 1 2


3

717 712 704 671

5766

3472

1132

Uplo

ad

tim

e [

sec]


all with 3 replicas

Upload Time

0

1075

2150

3225

4300

3 (default)

5 6 7 10

1700 1254 1089 956 717

1773

1132

Uplo

ad

tim

e [

sec]

Number of created replicas

Hadoop HAIL

Hadoop upload time with 3 replicas 3710

2712 2256

Replica Scalability

0

550

1100

1650

2200

Syn UV Syn UV Syn UV

1476 1486

633

1530

684

1742

600

1026

1836

918

1284

827

Uplo

ad

Tim

e [sec]

Number of Nodes

Hadoop HAIL

10 nodes 50 nodes 100 nodes

Scale-Out

Query Times

Individual Jobs: Weblog, RecordReader

0

1000

2000

3000

4000

73 83 75

Bob-Q1 Bob-Q2 Bob-Q3 Bob-Q4 Bob-Q5

683 333

53 52

2864 2917 2776 2442 2470

2112 2156

3358

RR

Ru

ntim

e

[ms]

MapReduce Jobs

Hadoop Hadoop ++ HAIL

Individual Jobs: Weblog, Job

375

0

750

1125

1500

Bob-Q1 Bob-Q2 Bob-Q3 Bob-Q4

MapReduce Jobs

Bob-Q5

602 598 651598 598 601

10991145 10991143

705

1160 942 1006 1094

Jo

b

Ru

ntim

e

[se

c]


Scheduling Overhead

1125

750

375

0

Bob-Q1 Bob-Q2 Bob-Q3 Bob-Q4 Bob-Q5

MapReduce Jobs

1500

Jo

b

Ru

ntim

e

[se

c]

Hadoop Hadoop++ HAIL Overhead

HAIL Scheduling

Summary(Summary(...))

fast querying

0

375

750

1125

1500

Bob-Q1

15 15 22

Bob-Q2 Bob-Q3 Bob-Q4

MapReduce Jobs

Bob-Q5

65 16

10991145 10991143

651 705

1160 942 1006 1094

Job

Runtim

e

[se

c]


Individual Jobs: Weblog

Failover

Failover

750

375

0

1500

1125

Hadoop HAIL

Systems

HAIL-1Idx

Job

Runtim

e

[sec]

Hadoop HAIL Slowdown

10 nodes, one killed

10.3 % slowdown

1099

10.5 % slowdown 5.5 % slowdown

598 598

19 21 12

6 15 14

DN1

4 12 1

7

Hadoop Scheduling

HAIL

MapReduce

5

11 2 22

23 sort order

7 map tasks (aka waves)

7 times scheduling overhead


HAIL Split DN1

6 15 14

4 12 1

23 7

HAIL Scheduling

HAIL

MapReduce

5

11 2 22

19 21 12

sort order

1 map task (aka wave)

1 times scheduling overhead


Query Times with HAIL Scheduling

Summary(Summary(...))

fast querying

only aggressive elephants are fast elephants -...

Documents