JerrinJoseph Hadoop ppt
cis.csuohio.edu/~sschung/cis611/JerrinJoseph_Hadoop_ppt.pdf
TRANSCRIPT
APACHE HADOOP
JERRIN JOSEPH
CSU ID#2578741
CONTENTS
• Hadoop
• Hadoop Distributed File System (HDFS)
• Hadoop MapReduce
• Introduction
• Architecture
• Operations
• Conclusion
• References
ABSTRACT
• Hadoop is an efficient Big Data handling tool.
• It reduced data processing time from 'days' to 'hours'.
• The Hadoop Distributed File System (HDFS) is the data storage unit of Hadoop.
• Hadoop MapReduce is the data processing unit, which works on the distributed processing principle.
INTRODUCTION
• What is Big Data?
• Bulk amount
• Unstructured
• Many applications need to handle huge amounts of data (on the order of 500+ TB per day)
• If a regular machine needs to transmit 1 TB of data through 4 channels: about 43 minutes.
• What about 500 TB?
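The arithmetic behind the slide's figures can be sketched in a few lines of Python, assuming the transfer time scales linearly with data size (the 43-minutes-per-TB rate is taken from the slide, everything else here is illustration):

```python
# Rough transfer-time estimate from the slide's numbers:
# 4 channels together move 1 TB in ~43 minutes.
TB = 10**12  # bytes

def transfer_minutes(size_bytes, minutes_per_tb=43):
    """Scale the 1 TB / 43 min figure linearly to other sizes."""
    return size_bytes / TB * minutes_per_tb

print(f"1 TB:   {transfer_minutes(1 * TB):.0f} min")
big = transfer_minutes(500 * TB)
print(f"500 TB: {big:.0f} min (~{big / 60 / 24:.1f} days)")  # 21500 min, ~14.9 days
```

At 500 TB the same hardware would need roughly two weeks, which is the scale problem Hadoop's distributed approach addresses.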
HADOOP
• "The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models" [1]
• Core components:
• HDFS: stores large data sets across clusters of computers.
• Hadoop MapReduce: performs the distributed processing using simple programming models.
HADOOP : KEY FEATURES
• High scalability
• Highly tolerant to software & hardware failures
• High throughput
• Best for fewer, larger files
• Performs fast, parallel execution of jobs
• Provides streaming access to data
• Can be built out of commodity hardware
HADOOP: DRAWBACKS
• Not good for low-latency data access
• Not good for large numbers of small files
• Does not support files with multiple writers
• Does not encrypt at the storage level or network level
• Has a highly complex security model
• Hadoop is not a database: hence a stored file cannot be altered.
HADOOP ARCHITECTURE
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
• Storage unit of Hadoop
• Relies on the principles of a Distributed File System
• HDFS has a Master-Slave architecture
• Main components:
• Name Node: Master
• Data Node: Slave
• 3 replicas (default) for each block
• Default block size: 64 MB
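How the 64 MB block size and 3-way replication interact can be illustrated with a short Python sketch (an illustration of the arithmetic, not HDFS code):

```python
import math

BLOCK_SIZE = 64 * 1024**2   # default HDFS block size: 64 MB
REPLICATION = 3             # default replication factor

def block_count(file_bytes):
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_bytes / BLOCK_SIZE)

def raw_storage(file_bytes):
    """Bytes actually consumed across the cluster, counting all replicas."""
    return file_bytes * REPLICATION

one_gb = 1024**3
print(block_count(one_gb))                # 16 blocks for a 1 GB file
print(raw_storage(one_gb) / 1024**3)      # 3.0 GB of raw cluster storage
```

The large block size is why HDFS favors few large files: a million tiny files would each occupy a block entry in the Name Node's metadata.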
HDFS: KEY FEATURES
• Highly fault tolerant (automatic failure recovery system)
• High throughput
• Designed to work with systems with very large files (sizes in TB) that are few in number.
• Provides streaming access to file system data; specifically good for write-once-read-many files (for example, log files).
• Can be built out of commodity hardware; HDFS doesn't need highly expensive storage devices.
HDFS ARCHITECTURE
NAME NODE
• Master of HDFS
• Maintains and manages data on the Data Nodes
• High-reliability machine (can even be RAID)
• Expensive hardware
• Stores NO data; just holds metadata!
• Secondary Name Node:
• Reads from the RAM of the Name Node and stores it to hard disks periodically.
• Active & Passive Name Nodes from Gen2 Hadoop
DATA NODES
• Slaves in HDFS
• Provide data storage
• Deployed on independent machines
• Responsible for serving read/write requests from clients.
• The data processing is done on the Data Nodes.
HDFS OPERATION
• The client makes a write request to the Name Node.
• The Name Node responds with information about the available Data Nodes and where the data is to be written.
• The client writes the data to the addressed Data Node.
• Replicas for all blocks are automatically created by the data pipeline.
• If the write fails, the Data Node notifies the client, which gets a new location to write.
• If the write completes successfully, an acknowledgement is given to the client.
• Writes in Hadoop are non-posted (acknowledged).
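The write flow above can be sketched as a toy Python simulation of the described protocol (all class and method names here are made up for illustration; this is not the real HDFS client API):

```python
def hdfs_write(name_node, data, max_retries=3):
    """Toy model of the write flow described above."""
    for _ in range(max_retries):
        # 1. Client asks the Name Node where to write.
        data_node = name_node.allocate_data_node()
        # 2. Client writes to the addressed Data Node; the data
        #    pipeline (not modeled here) would replicate the block.
        if data_node.write(data):
            return "ack"          # 3. Success -> acknowledgement to client
        # 4. On failure, loop back and get a new location.
    raise IOError("write failed on all attempted Data Nodes")

# Minimal stand-ins so the sketch runs:
class FlakyDataNode:
    def __init__(self, healthy): self.healthy = healthy
    def write(self, data): return self.healthy

class NameNode:
    def __init__(self, nodes): self.nodes = iter(nodes)
    def allocate_data_node(self): return next(self.nodes)

nn = NameNode([FlakyDataNode(False), FlakyDataNode(True)])
print(hdfs_write(nn, b"block"))  # "ack" after one retry
```

The retry loop is the point: the client, not the Name Node, reacts to a failed Data Node and asks for a new location.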
HDFS: FILE WRITE
HDFS: FILE READ
HADOOP MAPREDUCE
HADOOP MAPREDUCE
• Simple programming model
• Hadoop's processing unit
• MapReduce also has a Master-Slave architecture
• Main components:
• Job Tracker: Master
• Task Tracker: Slave
• Derived from Google's MapReduce
• Data is not fetched to the Master Node; it is processed at the Slave Nodes, which return the output to the Master.
• Implemented using Maps and Reduces
• Input is split by FileInputFormat
• Maps
• Inherit the Mapper class
• Produce (key, value) pairs as intermediate results from the data.
• Reduces
• Inherit the Reducer class
• Produce the required output from the intermediate results produced by the Maps.
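The division of labor between Maps and Reduces can be illustrated with the classic word count, written here as plain Python rather than the Hadoop Java API (the shuffle step is normally done by the framework, not user code):

```python
from collections import defaultdict

def map_phase(text):
    # Mapper: emit one (key, value) pair per word.
    return [(word, 1) for word in text.split()]

def shuffle(pairs):
    # Group intermediate values by key (the framework's job).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine each key's values into the final output.
    return {key: sum(values) for key, values in groups.items()}

pairs = map_phase("to be or not to be")
print(reduce_phase(shuffle(pairs)))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Because each map call and each reduce call is independent, the framework can run them in parallel across many Task Trackers.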
JOB TRACKER
• Master in MapReduce
• Receives job requests from clients
• Governs the execution of jobs
• Makes the task scheduling decisions
TASK TRACKER
• Slave in MapReduce
• Governs the execution of tasks
• Periodically reports the progress of tasks
MAPREDUCE ARCHITECTURE
MAPREDUCE OPERATIONS
APACHE HIVE
HIVE
• Built on top of Hadoop
• Supports an SQL-like query language: Hive-QL
• Data in Hive is organized into tables
• Provides structure for unstructured Big Data
• Works with data inside HDFS
• Tables
• Data: a file or group of files in HDFS
• Schema: metadata stored in a relational database
• Each table has a corresponding HDFS directory
• Data in a table is serialized
• Supports primitive column types and nestable collection types (Array and Map)
HIVE QUERY LANGUAGE
• SQL-like language
• DDL: to create tables with specific serialization formats
• DML: to load data from external sources and insert query results into Hive tables
• Does not support updating or deleting rows in existing tables
• Supports multi-table insert
• Supports custom map-reduce scripts written in any language
• Can be extended with custom functions (UDFs)
• User-Defined Table-Generating Function (UDTF)
• User-Defined Aggregation Function (UDAF)
• External interfaces:
• Web UI: management
• Hive CLI: run queries, browse tables, etc.
• API: JDBC, ODBC
• Metastore:
• System catalog that contains metadata about Hive tables
• Driver:
• Manages the life cycle of a Hive-QL statement during compilation, optimization and execution
• Compiler:
• Translates a Hive-QL statement into a plan consisting of a DAG of map-reduce jobs
HIVE ARCHITECTURE
HIVE ACHIEVEMENTS & FUTURE PLANS
• First step toward providing a warehousing layer for Hadoop (a web-based map-reduce data processing system)
• Accepts only a subset of SQL; working to subsume full SQL syntax
• Has a rule-based optimizer; plans to build a cost-based optimizer
• Enhancing the JDBC and ODBC drivers for interaction with commercial BI tools
• Working on making it perform better
APACHE HBASE
HBASE
• Distributed, column-oriented database on top of Hadoop/HDFS
• Provides low-latency access to single rows out of billions of records
• Column-oriented:
• OLAP
• Best for aggregation
• High compression rate: few distinct values
• Does not have a fixed schema or data types
• Built for wide tables: millions of columns, billions of rows
• Denormalized data
• Master-Slave architecture
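The compression point (columns with few distinct values compress well) can be seen with a simple run-length encoding sketch in Python; RLE stands in here for the block encodings a column store would actually use:

```python
from itertools import groupby

def run_length_encode(column):
    """Collapse runs of repeated values. Effective when a column has
    few distinct values, as is typical in column-oriented storage."""
    return [(value, len(list(run))) for value, run in groupby(column)]

status = ["OK"] * 5 + ["FAIL"] * 2 + ["OK"] * 3
print(run_length_encode(status))  # [('OK', 5), ('FAIL', 2), ('OK', 3)]
```

Ten stored values shrink to three (value, count) pairs; a row-oriented layout would interleave this column with others and break up the runs.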
HBASE ARCHITECTURE
HMASTER SERVER
• Like the Name Node in HDFS
• Manages and monitors HBase cluster operations
• Assigns Regions to Region Servers
• Handles load-balancing and splitting
REGION SERVER
• Like a Data Node in HDFS
• Highly scalable
• Handles read/write requests
• Communicates directly with clients
INTERNAL ARCHITECTURE
• Tables are split into Regions
• Each Region holds a Store per Column Family
• A Store consists of a MemStore and store files, which are divided into Blocks
APACHE ZOOKEEPER
ZOOKEEPER
• What is ZooKeeper?
• A distributed coordination service for distributed applications
• Acts like a centralized repository
• Challenges for distributed applications
• ZooKeeper goals
ZOOKEEPER ARCHITECTURE
• Always an odd number of nodes.
• The Leader is elected by voting.
• Both the Leader and the Followers can connect to clients and perform read operations.
• Write operations are performed only by the Leader.
• Observer nodes address scaling problems.
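The odd-number rule follows from majority voting; the quorum arithmetic can be sketched as:

```python
def quorum(ensemble_size):
    """Minimum votes needed for a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """Nodes that can fail while a majority still survives."""
    return ensemble_size - quorum(ensemble_size)

for n in (3, 4, 5):
    print(n, quorum(n), tolerated_failures(n))
# A 4-node ensemble tolerates only 1 failure -- no better than 3 nodes,
# which is why ensembles use an odd number of nodes.
```

Adding a fourth node raises the quorum from 2 to 3 without improving fault tolerance, so even-sized ensembles cost more for no gain.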
ZOOKEEPER DATA MODEL
• Znodes:
• Similar to directories in a file system
• Containers for data and other nodes
• Store statistical information and user data up to 1 MB
• Used to store and share configuration information between applications
• Znode types:
• Persistent nodes
• Ephemeral nodes
• Sequential nodes
• Watch: an event system for client notification
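The hierarchical znode namespace can be modeled as a small Python tree (an illustration of the data model only; the class and method names are invented, not the real ZooKeeper client API):

```python
class ZNode:
    """Toy znode: holds user data (capped at 1 MB) and child nodes."""
    MAX_DATA = 1024**2

    def __init__(self, data=b""):
        if len(data) > self.MAX_DATA:
            raise ValueError("znode data is limited to 1 MB")
        self.data = data
        self.children = {}

    def create(self, path, data=b""):
        # Walk to the parent (which must already exist), then attach.
        node = self
        parts = path.strip("/").split("/")
        for name in parts[:-1]:
            node = node.children[name]
        node.children[parts[-1]] = ZNode(data)

    def get(self, path):
        node = self
        for name in path.strip("/").split("/"):
            node = node.children[name]
        return node.data

root = ZNode()
root.create("/config")
root.create("/config/db_url", b"jdbc:mysql://host/db")
print(root.get("/config/db_url"))  # b'jdbc:mysql://host/db'
```

Like the real thing, each node is both a data holder and a directory, which is what makes znodes convenient for shared configuration.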
PROJECTS & TOOLS ON HADOOP
• HBase
• Hive
• Pig
• Jaql
• ZooKeeper
• AVRO
• UIMA
• Sqoop
CONCLUSION
• Hadoop is a successful solution for Big Data handling.
• Hadoop has expanded from a simple project to the level of a platform.
• The projects and tools built on Hadoop are proof of its success.
REFERENCES
[1] "Apache Hadoop", http://hadoop.apache.org/
[2] “Apache Hive”, http://hive.apache.org/
[3] “Apache HBase”, https://hbase.apache.org/
[4] “Apache ZooKeeper”, http://zookeeper.apache.org/
[5] Jason Venner, "Pro Hadoop", Apress Books, 2009
[6] "Hadoop Wiki", http://wiki.apache.org/hadoop/
[7] Jiong Xie, Shu Yin, Xiaojun Ruan, Zhiyang Ding, Yun Tian, James Majors, Adam Manzanares, Xiao Qin, "Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters", 19th International Heterogeneity in Computing Workshop, Atlanta, Georgia, April 2010
[8] Dhruba Borthakur, "The Hadoop Distributed File System: Architecture and Design", The Apache Software Foundation, 2007
[9] "Apache Hadoop", http://en.wikipedia.org/wiki/Apache_Hadoop
[10] "Hadoop Overview", http://www.revelytix.com/?q=content/hadoop-overview
[11] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, "The Hadoop Distributed File System", Yahoo!, Sunnyvale, California, USA, in: Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium
[12] Vinod Kumar Vavilapalli, Arun C Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, Eric Baldeschwieler, "Apache Hadoop YARN: Yet Another Resource Negotiator", ACM Symposium on Cloud Computing 2013, Santa Clara, California
[13] Raja Appuswamy, Christos Gkantsidis, Dushyanth Narayanan, Orion Hodson, Antony Rowstron, "Scale-up vs Scale-out for Hadoop: Time to rethink?", Microsoft Research, ACM Symposium on Cloud Computing 2013, Santa Clara, California