Introduction to Hadoop Technology
![Page 1: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/1.jpg)
HADOOP TECHNOLOGY
Presented by:- Manish S. Borkar
Poly 6th sem, IT branch, Nagpur Polytechnic, Nagpur
![Page 2: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/2.jpg)
Contents
• Introduction to Hadoop
• Why Hadoop ?
• Pillars of Hadoop
• Architecture of Hadoop
• HDFS Architecture
• MapReduce
• Hadoop Projects
• Conclusion
• References
![Page 3: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/3.jpg)
Introduction to Hadoop
HADOOP IS AN OPEN-SOURCE FRAMEWORK WRITTEN IN JAVA.
DISTRIBUTED DATA STORAGE.
PARALLEL PROCESSING OF DATA.
HADOOP WAS CREATED BY DOUG CUTTING AND MIKE CAFARELLA IN 2005.
HADOOP USES A CLUSTER OF COMMODITY SERVERS IN A TIGHTLY CONNECTED NETWORK.
![Page 4: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/4.jpg)
Cluster of machines running Hadoop at Yahoo!
![Page 5: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/5.jpg)
Why Hadoop ?
SCENARIO 1:-
Processing Vcards:
Example of VCARD
• BEGIN : VCARD
• N : Manish Borkar
• INSTT : Nagpur Polytechnic, Nagpur
• DESIG : Student
• EMAIL : [email protected]
• URL : http://www.facebook.com/oasisfoundation
• URL : http://www.twitter.com/manishborkar
• END : VCARD
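Processing one such card on a single machine is simple, which is why the scenario only becomes a problem at scale. As a rough sketch, the parser below reads the simplified "KEY : value" layout shown on the slide. The function name `parse_vcard` and the sample card are hypothetical, and this is not a full parser for the vCard standard.

```python
def parse_vcard(text):
    """Parse one simplified vCard into a dict of field name -> list of values."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, so URLs keep their "http://" part.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a "KEY : value" shape
        key, value = key.strip().upper(), value.strip()
        if key in ("BEGIN", "END"):
            continue  # card delimiters carry no data
        # A field such as URL may repeat, so values are collected in a list.
        fields.setdefault(key, []).append(value)
    return fields


# Hypothetical sample card, invented for illustration only.
SAMPLE = """\
BEGIN : VCARD
N : Jane Doe
DESIG : Student
URL : http://example.com/jane
URL : http://example.org/jane
END : VCARD
"""

card = parse_vcard(SAMPLE)
```

Running this over a handful of cards is trivial; running it over terabytes of them is where a single machine reaches its limits.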
![Page 6: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/6.jpg)
![Page 7: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/7.jpg)
SCENARIO 2:-
• 1 GB – 10 GB – 100 GB: limits
• More investments
• 10 TB – 100 TB: again limits
• Data from Facebook, Twitter, RFID readers, sensors
• Structured / unstructured
![Page 8: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/8.jpg)
![Page 9: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/9.jpg)
![Page 10: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/10.jpg)
Here comes the solution:
Hadoop…
![Page 11: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/11.jpg)
![Page 12: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/12.jpg)
Pillars of Hadoop
• Hadoop Distributed File System (HDFS) – a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
• Hadoop YARN – a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications.
• Hadoop MapReduce – a programming model for large-scale data processing.
![Page 13: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/13.jpg)
Architecture of Hadoop
![Page 14: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/14.jpg)
HDFS Architecture
![Page 15: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/15.jpg)
NameNode:- The HDFS namespace is a hierarchy of files and directories. Files and directories are represented on the NameNode by inodes.
DataNode:- Each block replica on a DataNode is represented by two files in the local native filesystem. The first file contains the data itself, and the second file records the block's metadata.
HDFS Client:- User applications access the filesystem using the HDFS client, a library that exports the HDFS filesystem interface.
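The three roles above can be illustrated with a small in-memory toy model. This is only a sketch, not HDFS code: the class names `ToyNameNode`, `ToyDataNode`, and `ToyClient`, the 4-byte block size, the replication factor of 2, and the round-robin placement are all assumptions chosen for illustration, and a CRC32 checksum stands in for the block metadata.

```python
import zlib
from itertools import cycle

BLOCK_SIZE = 4      # bytes per block; tiny on purpose (hypothetical value)
REPLICATION = 2     # replicas per block (hypothetical value)


class ToyDataNode:
    """Stores each block replica as two entries: the data and its metadata."""

    def __init__(self, name):
        self.name = name
        self.data = {}   # block_id -> bytes (stands in for the data file)
        self.meta = {}   # block_id -> checksum (stands in for the metadata file)

    def write_block(self, block_id, payload):
        self.data[block_id] = payload
        self.meta[block_id] = zlib.crc32(payload)

    def read_block(self, block_id):
        payload = self.data[block_id]
        # Verify the replica against its recorded metadata before returning it.
        if zlib.crc32(payload) != self.meta[block_id]:
            raise IOError(f"corrupt replica of block {block_id} on {self.name}")
        return payload


class ToyNameNode:
    """Keeps the namespace: path -> ordered list of (block_id, datanode names)."""

    def __init__(self):
        self.namespace = {}
        self.next_block_id = 0

    def allocate_block(self, path, locations):
        block_id = self.next_block_id
        self.next_block_id += 1
        self.namespace.setdefault(path, []).append((block_id, locations))
        return block_id


class ToyClient:
    """Library-style entry point that applications use to read and write files."""

    def __init__(self, namenode, datanodes):
        self.namenode = namenode
        self.datanodes = {dn.name: dn for dn in datanodes}
        self._ring = cycle(datanodes)  # round-robin placement (an assumption)

    def write(self, path, content):
        for offset in range(0, len(content), BLOCK_SIZE):
            chunk = content[offset:offset + BLOCK_SIZE]
            targets = [next(self._ring) for _ in range(REPLICATION)]
            block_id = self.namenode.allocate_block(path, [dn.name for dn in targets])
            for dn in targets:
                dn.write_block(block_id, chunk)

    def read(self, path):
        parts = []
        for block_id, locations in self.namenode.namespace[path]:
            # Read from the first listed replica; a real client would fail over.
            parts.append(self.datanodes[locations[0]].read_block(block_id))
        return b"".join(parts)


namenode = ToyNameNode()
datanodes = [ToyDataNode(f"dn{i}") for i in range(3)]
client = ToyClient(namenode, datanodes)
client.write("/user/demo/hello.txt", b"hello hadoop")
restored = client.read("/user/demo/hello.txt")
```

The point of the split is that the NameNode holds only the namespace and block locations, while the file contents live on the DataNodes; the client hides both behind one read/write interface.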
![Page 16: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/16.jpg)
MapReduce in Hadoop
• MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster.
• A MapReduce job usually splits the input data set into independent chunks, which are processed by the map tasks in a completely parallel manner.
• A MapReduce job is a unit of work that the client wants performed: it consists of the input data, the MapReduce program, and configuration information. Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks.
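The splitting of a job into independent chunks can be sketched in a few lines. The example below is not Hadoop's API: the function names `split_input`, `map_task`, `reduce_task`, and `run_job`, the chunk size, and the use of a thread pool to stand in for cluster nodes are all assumptions for illustration. Each chunk is handed to its own map task, and a single reduce task combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_input(records, chunk_size):
    """Split the input data set into independent chunks (input splits)."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_task(chunk):
    """One map task: here it finds the longest record length in its own chunk."""
    return max(len(record) for record in chunk)


def reduce_task(partials):
    """One reduce task: combine the per-chunk results into the final answer."""
    return max(partials)


def run_job(records, chunk_size=2, workers=4):
    """Run the toy job on a non-empty list of records."""
    chunks = split_input(records, chunk_size)
    # Map tasks share no state, so they can run in any order or in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, chunks))
    return reduce_task(partials)


longest = run_job(["a", "bbb", "cc", "dddd", "e"])
```

Because no map task depends on another, adding more workers (or, in a real cluster, more machines) lets more chunks be processed at the same time.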
![Page 17: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/17.jpg)
THE PROGRAMMING MODEL OF MAPREDUCE
Map, written by the user, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.
![Page 18: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/18.jpg)
The Reduce function, also written by the user, accepts an intermediate key I and a set of values for that key. It merges together these values to form a possibly smaller set of values.
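A common way to illustrate this model is word counting. The following is a minimal single-process sketch, not Hadoop code: `map_fn`, `shuffle`, `reduce_fn`, and `word_count` are hypothetical names, and `shuffle` stands in for the grouping step that the MapReduce library performs between Map and Reduce.

```python
from collections import defaultdict


def map_fn(_doc_id, text):
    """Map: take an input pair and emit intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all intermediate values that share the same intermediate key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key, values):
    """Reduce: merge the values for one key into a smaller set (here, one sum)."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a dict of doc_id -> text."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_fn(doc_id, text))
    return dict(reduce_fn(key, values) for key, values in shuffle(intermediate).items())


counts = word_count({
    "d1": "Hadoop stores data",
    "d2": "hadoop processes data",
})
```

Only `map_fn` and `reduce_fn` correspond to the code a user would write; the grouping in between is the library's job.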
![Page 19: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/19.jpg)
MAPREDUCE ARCHITECTURE
![Page 20: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/20.jpg)
Hadoop Projects
Pig
Mahout
Hive
Avro
Storm
![Page 21: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/21.jpg)
Distributors of Hadoop
Amazon Web Services
Apache Bigtop
Cascading
Cloudera
Cloudspace
Datameer
![Page 22: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/22.jpg)
Users Of Hadoop
![Page 23: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/23.jpg)
Conclusion
• As the amount of data being stored around the globe continues to rise, so does the cost of storing it, processing it, and extracting meaningful patterns from it, which makes it difficult for organizations to afford. Hadoop is then the best choice for this growing need, thanks to its ease of handling and its ability to store large amounts of data.
![Page 24: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/24.jpg)
References
[1] UNIX Filesystems: Evolution, Design, and Implementation. Wiley Publishing, Inc., 2003.
[2] The diverse and exploding digital universe. http://www.emc.com/digitaluniverse, 2009.
[3] Hadoop. http://hadoop.apache.org, 2009.
[4] Apache Hadoop. en.wikipedia.org/wiki/Apache_Hadoop
[5] HDFS (Hadoop Distributed File System) architecture. http://hadoop.apache.org/common/docs/current/hdfs_design.html, 2009.
![Page 25: Introduction to Hadoop Technology](https://reader034.vdocuments.net/reader034/viewer/2022052218/55ce2c8ebb61ebcf528b47f9/html5/thumbnails/25.jpg)
THANK YOU
Any Questions……?