TACKLING BIG DATA WITH THE ELEPHANT IN THE ROOM

Upload: bti360

Post on 16-Aug-2015


WHAT’S THE PROBLEM WITH BIG DATA?

–  Volume
–  Variety
–  Velocity

WHAT’S THE SOLUTION TO BIG DATA?

“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.” – Grace Hopper

HADOOP’S SOLUTION

Hadoop Ecosystem
–  Sqoop
–  Pig
–  Hive
–  HBase
–  Mahout
–  Flume
–  Oozie
–  …

Hadoop Core Components
–  Hadoop Distributed File System
–  MapReduce

WHAT IS HDFS?

HOW DOES HDFS WORK?

The Large Data File is split into blocks:
–  Block #1
–  Block #2

Each block is then stored as three copies:
–  Block #1, Block #1, Block #1
–  Block #2, Block #2, Block #2
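
The split-and-replicate picture above can be sketched in a few lines of Python. This is a simplified illustration, not how HDFS is implemented: the node names, the 5-byte block size, and the round-robin placement are all assumptions made for the example (real HDFS uses much larger blocks and a rack-aware placement policy). The replication factor of 3 matches the three copies shown on the slide.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: round-robin is a stand-in; it is not HDFS's real policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(1, num_blocks + 1):
        start = block_id - 1  # shift the starting node for each block
        placement[block_id] = [nodes[(start + k) % len(nodes)] for k in range(replication)]
    return placement


# Hypothetical 10-byte "large" file and 5-byte blocks, so it splits into two blocks.
blocks = split_into_blocks(b"0123456789", block_size=5)
# Hypothetical node names.
placement = place_replicas(len(blocks), nodes=["node-a", "node-b", "node-c", "node-d"])
for block_id, holders in placement.items():
    print(f"Block #{block_id}: {holders}")
```

Because every block lives on three different nodes, losing any single node still leaves two copies of each block available.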

WHAT IS MAP-REDUCE?

Core Ideas
–  Data Locality
–  Parallelism
–  Block Independence

Three Stages
1.  Map
2.  Swap & Sort
3.  Reduce
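
To make the three stages concrete, here is a minimal single-machine sketch. It is not the Hadoop API: `map_reduce`, `tag_parity`, and `total` are hypothetical names, and a thread pool stands in for cluster nodes. It illustrates parallelism and block independence (each block is mapped without looking at any other block); data locality is not modeled.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(blocks, map_fn, reduce_fn, workers=2):
    # Stage 1, Map: every block is processed on its own, in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, blocks))

    # Stage 2, Swap & Sort: gather all values that share a key.
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)

    # Stage 3, Reduce: fold each key's values into one result, keys in sorted order.
    return {key: reduce_fn(key, grouped[key]) for key in sorted(grouped)}


def tag_parity(block):
    """Hypothetical map function: label each number as odd or even."""
    return [("even" if n % 2 == 0 else "odd", n) for n in block]


def total(key, values):
    """Hypothetical reduce function: sum the values for one key."""
    return sum(values)


# Two hypothetical blocks of numbers.
result = map_reduce([[1, 2, 3], [4, 5, 6]], tag_parity, total)
print(result)  # {'even': 12, 'odd': 9}
```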

WORD COUNT MAP

Node 1 (Mapper)
–  Input: the cat sat on the mat
   map() output: the 1, cat 1, sat 1, on 1, the 1, mat 1
–  Input: the aardvark sat on the …
   map() output: the 1, aardvark 1, sat 1, on 1, the 1

Node 2 (Mapper)
–  Input: the mahout drove the …
   map() output: the 1, mahout 1, drove 1, the 1
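
The map step for word count can be sketched as a plain Python function. `map_line` is a hypothetical name, not a Hadoop class, and the input lines are the visible parts of the slide's text; the truncated "…" tails are left out rather than guessed.

```python
def map_line(line: str) -> list[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word, 1) for word in line.split()]


# Lines held by each node, as shown on the slide (truncated tails omitted).
node_1_lines = ["the cat sat on the mat", "the aardvark sat on the"]
node_2_lines = ["the mahout drove the"]

# Each node maps only its own lines; no node needs data from the other.
node_1_output = [pair for line in node_1_lines for pair in map_line(line)]
node_2_output = [pair for line in node_2_lines for pair in map_line(line)]

print(node_1_output)
print(node_2_output)
```

Note that the mapper does no counting of its own: repeated words such as "the" are emitted once per occurrence and are only added up in the reduce stage.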

WORD COUNT SWAP & SORT

Map output grouped by key on each mapper node:
–  Node 1: aardvark 1; cat 1; mat 1; on 1,1; sat 1,1; the 1,1,1,1
–  Node 2: drove 1; mahout 1; the 1,1

After the swap, the values for each key are merged and spread across Node 3, Node 4, and Node 5:
–  aardvark 1
–  cat 1
–  mat 1
–  mahout 1
–  sat 1,1
–  drove 1
–  on 1,1
–  the 1,1,1,1,1,1
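
The swap and sort stage can be sketched as routing every pair to a reducer chosen from its key, so that all values for one word end up in the same place. The names `partition` and `swap_and_sort` are hypothetical, and `crc32` is only a stand-in hash chosen because it gives the same result on every run; which word lands on which reducer here will not necessarily match the slide.

```python
from collections import defaultdict
from zlib import crc32


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key; the same key always goes to the same reducer."""
    return crc32(key.encode("utf-8")) % num_reducers


def swap_and_sort(map_outputs, num_reducers=3):
    """Route each (key, value) pair to one reducer and group values by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            reducers[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its keys in sorted order.
    return [dict(sorted(groups.items())) for groups in reducers]


# Map output from the two mapper nodes in the word count example.
node_1 = [("the", 1), ("cat", 1), ("sat", 1), ("on", 1), ("the", 1), ("mat", 1),
          ("the", 1), ("aardvark", 1), ("sat", 1), ("on", 1), ("the", 1)]
node_2 = [("the", 1), ("mahout", 1), ("drove", 1), ("the", 1)]

reducer_inputs = swap_and_sort([node_1, node_2])
for i, groups in enumerate(reducer_inputs):
    print(f"Reducer {i}: {groups}")
```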

WORD COUNT REDUCER

Reducer 0, Reducer 1, and Reducer 2 (on Node 3, Node 4, and Node 5) each sum the values for their keys:

Key        Input          Output
aardvark   1              1
cat        1              1
mat        1              1
mahout     1              1
sat        1,1            2
drove      1              1
on         1,1            2
the        1,1,1,1,1,1    6
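
The reduce step is the simplest of the three. A minimal sketch, with `reduce_counts` as a hypothetical name: each call receives one word and all of the 1s collected for it, and returns the total. The grouped input is written out by hand from the example text, in which "sat" and "on" each occur twice and "the" six times.

```python
def reduce_counts(word: str, counts: list[int]) -> tuple[str, int]:
    """Sum all the 1s collected for a single word."""
    return word, sum(counts)


# Grouped values per word, as delivered by the swap and sort stage.
grouped = {
    "aardvark": [1],
    "cat": [1],
    "mat": [1],
    "mahout": [1],
    "sat": [1, 1],
    "drove": [1],
    "on": [1, 1],
    "the": [1, 1, 1, 1, 1, 1],
}

totals = dict(reduce_counts(word, counts) for word, counts in grouped.items())
print(totals)
```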

TAKE-AWAYS

Hadoop Ecosystem
–  Sqoop
–  Pig
–  Hive
–  HBase
–  Mahout
–  Flume
–  Oozie
–  …

Hadoop Core Components
–  Hadoop Distributed File System
–  MapReduce

QUESTIONS?