hadoop hive presentation
Hadoop
Agenda
• Problems with traditional large-scale systems
• Requirements for new approaches
• What is Hadoop?
• Why Hadoop?
• Overview of Hadoop
• HDFS
• Map Reduce
• Applications
• Conclusion
Problems with traditional large-scale systems
• Data volume grows day by day
• Network failures
• Server failures
• Loss of data
• High cost
• Distributed computing requires manual processing
Requirements for new approaches
• Data should be stored in a distributed manner and processed in parallel
• High performance at low cost
• Scalable
• Simple to access and process
• Fault tolerant
What is Hadoop?
An open-source framework
Processes large amounts of data
Why Hadoop?
• Accessible
• Scalable
• Robust
• Simple
Overview of Hadoop
It handles three types of data:
Structured
Semi-structured
Unstructured
It analyzes and processes large amounts of data (petabyte scale)
Comparison with traditional databases
RDBMS
• Stores gigabytes of data
• Supports batch and interactive processing
• Allows updates
• Schemas must be defined
• Only structured data
HADOOP
• Stores petabytes of data
• Batch processing only
• Does not allow updates; it follows WORM (write once, read many)
• Schemas not required
• Supports all three types of data
Components
Hadoop can be divided into 2 parts
1. HDFS – Hadoop Distributed File System
2. MapReduce Programming model
Hadoop Distributed File System
It is a distributed file system
Runs on commodity hardware
Provides high-throughput access to application data
Suitable for applications that have large data sets
Designed to store very large amounts of data (terabytes or petabytes)
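As a rough illustration of how a distributed file system can hold a file larger than any single disk, the sketch below cuts a file into fixed-size blocks and assigns each block to several machines. The 128 MB block size and replication factor of 3 match common HDFS defaults, but the function names, node names, and round-robin placement are assumptions made for this example; real HDFS placement is rack-aware and more involved.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, a common HDFS default block size
REPLICATION = 3                 # common HDFS default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block needed to hold the file."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (hypothetical round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + i) % len(nodes)] for i in range(replication)
        ]
    return placement


nodes = ["node1", "node2", "node3", "node4"]   # hypothetical machine names
blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
placement = place_replicas(len(blocks), nodes)
```

With these values the 300 MB file becomes two full blocks plus one smaller block, and every block lives on three different machines, so losing any one machine leaves two copies of each block.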
Core Architectural Goal of HDFS
An HDFS instance may consist of thousands of server machines, so hardware failure is expected rather than exceptional.
The core goal is to detect faults and recover from them quickly and automatically.
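One way to picture automated fault detection and recovery is a heartbeat check followed by re-replication. The sketch below is a simplified model, not HDFS code: the 30-second timeout, the node names, and both helper functions are assumptions chosen for illustration.

```python
HEARTBEAT_TIMEOUT = 30.0  # seconds of silence before a node is presumed dead (assumed value)


def find_dead_nodes(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return the set of nodes whose last heartbeat is older than the timeout."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def re_replicate(placement, dead, live_nodes, replication=3):
    """Drop dead nodes from each block's replica list and top it up from live nodes."""
    repaired = {}
    for block_id, holders in placement.items():
        survivors = [n for n in holders if n not in dead]
        candidates = [n for n in live_nodes if n not in survivors]
        missing = max(0, replication - len(survivors))
        repaired[block_id] = survivors + candidates[:missing]
    return repaired


# Hypothetical cluster state: node2 was last heard from 60 seconds ago.
last_heartbeat = {"node1": 95.0, "node2": 40.0, "node3": 98.0, "node4": 99.0}
placement = {0: ["node1", "node2", "node3"], 1: ["node2", "node3", "node4"]}

dead = find_dead_nodes(last_heartbeat, now=100.0)
live = [n for n in last_heartbeat if n not in dead]
placement = re_replicate(placement, dead, live)
```

No operator intervenes here: the silent node is detected from timestamps alone, and each affected block is copied to another live machine until it is back to three replicas.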
MapReduce Programming Model
MapReduce applies a divide-and-conquer approach to the data
It schedules execution across a set of machines
It manages inter-process communication
The reducers process the output of all mappers to produce the final output
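The divide-and-conquer idea can be sketched on a single machine: split the input into chunks, hand each chunk to a worker, and combine the partial results. Here a thread pool from the Python standard library stands in for the cluster's machines; the chunk size, worker count, and helper names are assumptions for illustration, whereas in a real Hadoop job the framework does the scheduling across nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, chunk_size):
    """Divide the input into consecutive chunks of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def process_chunk(chunk):
    """Work done independently on one chunk (here: a partial sum)."""
    return sum(chunk)


def run_job(data, chunk_size=4, workers=3):
    chunks = split(data, chunk_size)
    # The pool plays the role of the scheduler handing chunks to machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # A final step combines the partial results into one answer.
    return sum(partials)


total = run_job(list(range(1, 11)))  # the numbers 1 through 10
```

Because each chunk is processed without reference to the others, adding workers only changes how the chunks are distributed, not the final result.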
MapReduce Programming Model
– MAP
• map() function that processes a key/value pair to generate a set of intermediate key/value pairs
– REDUCE
• reduce() function that merges all intermediate values associated with the same intermediate key
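Word counting is the usual first example of this pair of functions. The sketch below runs in plain Python, without Hadoop, and is only meant to show the shape of the two functions; the names map_fn, reduce_fn, and run_mapreduce, as well as the sample input, are assumptions for this illustration.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit an intermediate (word, 1) pair for every word in the line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all values that share the same intermediate key."""
    return word, sum(counts)


def run_mapreduce(records):
    # Shuffle: group intermediate values by key before reducing.
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in grouped.items())


# Input records are (key, value) pairs: here (line number, line text).
lines = [(0, "Hadoop stores data"), (1, "Hadoop processes data")]
counts = run_mapreduce(lines)
```

The grouping step between the two functions is what guarantees that reduce_fn sees every value for a given word in one call.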
Applications
Conclusion