Concepts on Hadoop
TRANSCRIPT
Hadoop, by Chris Sharkey (@shark2900)
Why use Hadoop?
• Uses commodity hardware
• Stores petabytes of data reliably
• Allows for huge distributed computations
• Open-source project and ecosystem
Core concepts
[Diagram: a single computer runs both a Task Tracker (MapReduce) and a Data Node (HDFS). In a cluster, a Job Tracker coordinates the Task Trackers and a Name Node coordinates the Data Nodes across many such machines.]
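The MapReduce model above can be sketched in plain Python (a conceptual illustration only, not the actual Hadoop Java API): map emits key/value pairs for each input split, the framework shuffles and groups them by key, and reduce aggregates each group.

```python
from collections import defaultdict

# Conceptual sketch of MapReduce word count (not the real Hadoop API).

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in the input split."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return (key, sum(values))

documents = ["hadoop stores data", "hadoop processes data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster, map tasks run on many Task Trackers in parallel and the shuffle moves data across the network; the logic per key is the same.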
The ecosystem
Pig
• Programming language
• High-level interface for MapReduce
• A 'compiler for MapReduce'
[Diagram: Pig sits on top of MapReduce, which sits on HDFS.]
Hive
• SQL-like interface
• Querying & RD functionality
• Familiar to traditional business-intelligence operations
[Diagram: Hive sits on top of MapReduce, which sits on HDFS.]
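As a rough analogy for Hive's SQL-like querying (using Python's built-in sqlite3 rather than actual HiveQL, since this is only a sketch of the idea): the word count that needed hand-written map and reduce code becomes a single declarative GROUP BY.

```python
import sqlite3

# Analogy only: Hive runs HiveQL over tables stored in HDFS; here sqlite3
# stands in to show the declarative, SQL-like style of query Hive offers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("hadoop",), ("stores",), ("data",), ("hadoop",), ("processes",), ("data",)],
)

rows = conn.execute(
    "SELECT word, COUNT(*) FROM words GROUP BY word ORDER BY word"
).fetchall()
print(rows)  # [('data', 2), ('hadoop', 2), ('processes', 1), ('stores', 1)]
```

Under the hood, Hive compiles such a query into MapReduce jobs, which is why it feels familiar to business-intelligence users while still scaling across the cluster.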
HBase
• NoSQL database
• Built on top of HDFS
• Real-time updates and access
[Diagram: HBase added to the stack alongside MapReduce, on top of HDFS.]
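HBase's data model can be sketched conceptually (a plain-Python analogy, not the real HBase client API): a table maps a row key to column families of cells, and reads and writes address individual cells in real time rather than going through a batch MapReduce job.

```python
# Conceptual analogy of HBase's row-key / column-family model
# (not the actual HBase client API).
table = {}

def put(row_key, family, column, value):
    """Write one cell, addressed by row key, column family, and column."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

def get(row_key, family, column):
    """Read one cell directly by key -- no batch job required."""
    return table.get(row_key, {}).get(family, {}).get(column)

put("user42", "info", "name", "Chris")
put("user42", "info", "handle", "@shark2900")
print(get("user42", "info", "name"))  # Chris
```

The key point is the access pattern: unlike MapReduce, which scans whole datasets, HBase answers single-row lookups and updates immediately, while still storing its data on HDFS.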
Zookeeper
• Coordination services for multi-server architectures
• Distributed-application management with added reliability
[Diagram: Zookeeper added alongside HBase, MapReduce, and HDFS.]
Combined System
[Diagram: the combined stack, with Pig and Hive on top of MapReduce, HBase and Zookeeper alongside, all built on HDFS.]
Summary
• Powerful system for 'big data'
• Commodity hardware
• Redundant and reliable
• Ecosystem affords modularity
• Ecosystem affords relevance
• Distributed analytics: learn from petabytes of data
Questions?
Thank you