Hadoop: An Introduction
TRANSCRIPT
![Page 1: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/1.jpg)
By :- Rishi Arora
www.rishiarora.com
![Page 2: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/2.jpg)
![Page 3: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/3.jpg)
![Page 4: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/4.jpg)
![Page 5: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/5.jpg)
![Page 6: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/6.jpg)
![Page 7: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/7.jpg)
![Page 8: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/8.jpg)
![Page 9: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/9.jpg)
Companies by Estimated Number of Servers
![Page 10: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/10.jpg)
![Page 11: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/11.jpg)
![Page 12: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/12.jpg)
![Page 13: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/13.jpg)
Source : http://www.ibmbigdatahub.com/infographic/four-vs-big-data
![Page 14: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/14.jpg)
WHY BIG DATA? WHY NOW?
2,500 exabytes of new information were created in 2012, with the Internet as the primary driver.
The digital universe grew by 62% last year to 800,000 petabytes and will grow to 1.2 "zettabytes" this year.
Source: An IDC White Paper – As the Economy Contracts, the Digital Universe
![Page 15: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/15.jpg)
Problems with Big Data?
![Page 16: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/16.jpg)
Problem #1
Reading from and Writing to Disk Is Slow
1 TB drives are read at only ~100 MB/sec
Solution
Use disks in parallel:
1 HDD = 100 MB/sec
100 HDDs = 10 GB/sec
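The arithmetic above can be sketched as a quick back-of-the-envelope estimate (idealized throughput, ignoring seek time and coordination overhead):

```python
# Back-of-the-envelope: time to read 1 TB at ~100 MB/s,
# sequentially on one drive vs. spread across 100 drives in parallel.

TB = 1_000_000              # 1 TB expressed in MB
SPEED = 100                 # single-drive throughput, MB/s

single = TB / SPEED                  # one drive reads everything
parallel = TB / (SPEED * 100)        # 100 drives each read 1/100th

print(f"1 drive:    {single / 60:.0f} minutes")   # ~167 minutes
print(f"100 drives: {parallel:.0f} seconds")      # ~100 seconds
```

This is exactly the observation Hadoop builds on: aggregate disk bandwidth scales with the number of machines, so the data is spread across many drives and read in parallel.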
![Page 17: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/17.jpg)
Problem #2
Hardware Failure / Single Machine Failure
Solution
Keep multiple copies of data
![Page 18: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/18.jpg)
Problem #3
Merging Data from Different Reads
Solution
Only completed results are taken into consideration; failed results are ignored.
Data needs to be compressed before being sent across the network.
![Page 19: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/19.jpg)
DISTRIBUTED
FAULT TOLERANT
SCALABLE
FLEXIBLE
INTELLIGENT
![Page 20: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/20.jpg)
Hadoop Components
HDFS – Distributed File System
MapReduce
![Page 21: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/21.jpg)
HDFS Overview
• Designed for a modest number of large files (millions
instead of billions)
• Sequential access, not random access
• Write once, read many
• Data is split into chunks and stored across multiple nodes
as blocks
• The namenode maintains the block locations
• Blocks are replicated over the data nodes
• Single namespace, universally accessible
• Computation is moved to the data – data locality
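The splitting and replication bullets can be illustrated with a toy simulation. This is not HDFS code — the round-robin placement and the parameter names are illustrative — but the defaults mirror HDFS conventions (128 MB blocks, replication factor 3):

```python
import itertools

def place_blocks(file_size_mb, datanodes, block_size_mb=128, replication=3):
    """Split a file into fixed-size blocks and assign each block to
    `replication` data nodes, round-robin style. Returns the kind of
    block map a namenode conceptually maintains:
    block id -> list of node names holding a copy."""
    num_blocks = -(-file_size_mb // block_size_mb)   # ceiling division
    nodes = itertools.cycle(datanodes)
    return {b: [next(nodes) for _ in range(replication)]
            for b in range(num_blocks)}

# A 350 MB file on a 4-node cluster: 3 blocks, each stored on 3 nodes.
blocks = place_blocks(350, ["dn1", "dn2", "dn3", "dn4"])
print(blocks)
```

A client asks the namenode for this map, then reads each block directly from one of the listed data nodes — which is also how computation gets scheduled next to the data.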
![Page 22: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/22.jpg)
MapReduce Overview
• Tasks are distributed to multiple nodes
• Each node processes the data stored on that node
• Consists of two phases:
• Map – reads input data and outputs intermediate
keys and values
• Reduce – values with the same key are sent to
the same reducer for further processing
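The two phases can be seen in the canonical word-count example. The sketch below is a pure-Python simulation of map, shuffle, and reduce — not actual Hadoop API code:

```python
from collections import defaultdict

def map_phase(line):
    # Map: read a line of input, emit intermediate (key, value) pairs
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group all values sharing a key, so every
    # occurrence of a key lands at exactly one reducer
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: aggregate all values for one key
    return key, sum(values)

lines = ["hadoop is fast", "hadoop is scalable"]
intermediate = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)   # {'hadoop': 2, 'is': 2, 'fast': 1, 'scalable': 1}
```

In real Hadoop, each mapper runs on the node holding its input block, and the framework performs the shuffle across the network between the map and reduce phases.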
![Page 23: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/23.jpg)
Hadoop Ecosystem
• HDFS / HDFS v2 (YARN)
• ZOOKEEPER – Coordinator
• FLUME – Log Collector
• SQOOP – Data Exchanger
• Workflow
• PIG – Scripting
• HIVE – SQL Query
• Machine Learning
• Column Store
![Page 24: Hadoop An Introduction](https://reader030.vdocuments.net/reader030/viewer/2022021506/5873c02d1a28abbc788b647d/html5/thumbnails/24.jpg)
Thank You !!