HADOOP - files.meetup.com/2808892/HadoopBigData_Presentation.pdf
Beginners Guide
HADOOP
Agenda
● What is Hadoop?
● What is Big Data?
● Architecture
● MapReduce
● Querying Data
● Examples of Hadoop in Action
What is Hadoop?
● Open Source project
● Written in Java
● Optimised to handle
  – Massive amounts of data through parallelism
  – A variety of data (structured, unstructured, semi-structured)
  – Using commodity hardware
● Great performance
● Reliability provided through replication
● Not for OLTP, not for OLAP/DSS
● Good for Big Data
What is Big Data?
● RFID Reader
● 2 billion Internet users
● 4.7 billion mobile phones
● 7 TB of data processed by Twitter every day
● 10 TB of data processed by Facebook every day
● About 80% of data is unstructured
Architecture
● HDFS
● MapReduce
● Types of Nodes
● Topology Awareness
● Writing a File to HDFS
● HDFS CLI
Architecture
● Two main components
  – Distributed File System
  – MapReduce Engine
HDFS
● HDFS runs on top of an existing file system
● Designed to handle very large files with streaming data access
● Uses blocks to store a file or parts of a file
HDFS – Blocks
● File blocks
  – 64 MB to 128 MB
  – 1 HDFS block is backed by many OS blocks
● Advantages of blocks
  – Fixed size, easy to calculate how many fit on a disk
  – A file can be larger than any disk in the cluster
  – If a file or a chunk of a file is smaller than the block, only the needed space is used
  – Fits well with replication
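The block arithmetic above can be sketched in a few lines of Python. This is only an illustration, not HDFS code: the function name `split_into_blocks` is hypothetical, and the 64 MB block size is taken from the lower end of the range on the slide.

```python
from typing import List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 64 * MB) -> List[int]:
    """Return the size in bytes of each block needed to store a file.

    Hypothetical helper for illustration; it only does the arithmetic
    and does not talk to a real HDFS cluster.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full_blocks
    if remainder:
        # The last block only occupies the space it actually needs.
        blocks.append(remainder)
    return blocks


blocks = split_into_blocks(200 * MB)
print(len(blocks), [size // MB for size in blocks])  # 4 [64, 64, 64, 8]
```

A 200 MB file therefore needs three full blocks plus one 8 MB block, and each of those blocks can be placed on a different disk.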
HDFS – Replication
● Blocks are replicated to multiple nodes
● Allows node failure without data loss
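A toy simulation can show why replication tolerates a node failure. The sketch below makes several assumptions that are not on the slide: a replication factor of 3, a simple round-robin placement (real HDFS uses a topology-aware policy), and the hypothetical helper names `place_replicas` and `lost_blocks`.

```python
from typing import Dict, List, Set


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = 3) -> Dict[int, Set[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumption: round-robin is a stand-in for the real placement policy.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block_id: {nodes[(block_id + i) % len(nodes)] for i in range(replication)}
        for block_id in range(num_blocks)
    }


def lost_blocks(placement: Dict[int, Set[str]], failed: List[str]) -> List[int]:
    """Return the blocks whose replicas all live on failed nodes."""
    failed_set = set(failed)
    return [block for block, holders in placement.items() if holders <= failed_set]


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(6, nodes)

# One node fails: every block still has two live replicas.
print(lost_blocks(placement, ["node2"]))  # []

# Three nodes fail at once: blocks stored only on those three are lost.
print(lost_blocks(placement, ["node1", "node2", "node3"]))  # [0, 4]
```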
MapReduce Engine
● Technology from Google
● A MapReduce program consists of map and reduce functions
● A MapReduce job is divided into tasks that run in parallel
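To make the map and reduce functions concrete, here is a minimal word-count sketch in plain Python. It is not the Hadoop API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and everything runs sequentially in one process, whereas Hadoop would split the work into tasks that run in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(_key: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Iterator[Tuple[str, int]]:
    """Reduce: sum all the counts emitted for one word."""
    yield word, sum(counts)


def run_job(records: List[Tuple[int, str]]) -> Dict[str, int]:
    """Run map, group by key, then reduce, all in a single process."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            grouped[out_key].append(out_value)  # group values by key

    output = {}
    for key in sorted(grouped):
        for out_key, out_value in reduce_fn(key, grouped[key]):
            output[out_key] = out_value
    return output


lines = [(0, "big data on hadoop"), (1, "hadoop handles big files")]
print(run_job(lines))
# {'big': 2, 'data': 1, 'files': 1, 'hadoop': 2, 'handles': 1, 'on': 1}
```

Because each call to `map_fn` depends only on its own input line, and each call to `reduce_fn` only on the values for one key, the calls could be distributed over many machines without changing the result.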
Types of Nodes
● HDFS Nodes
  – NameNode
  – DataNode
● MapReduce Nodes
  – JobTracker
  – TaskTracker
● NameNode
  – Only 1 per Hadoop Cluster
  – Manages the filesystem namespace and metadata
  – Single point of failure
  – Large memory requirements
● DataNode
  – Many per Hadoop Cluster
  – Manages blocks with data and serves them to the clients
  – Periodically reports to the NameNode the list of blocks it stores
  – Commodity hardware
● JobTracker Node
  – 1 per Hadoop Cluster
  – Receives job requests submitted by clients
  – Schedules and monitors MapReduce jobs on TaskTrackers
● TaskTracker Node
  – Many per Hadoop Cluster
  – Executes MapReduce operations
Topology Awareness
Writing a File to HDFS
HDFS Command Line
● File System Shell
MapReduce
● Map Operation
● Reduce Operation
● Submitting a MR job
● The Shuffle
● Data Types
● Fault tolerance
● Scheduling / Task Execution
Map Operation
Reduce Operation
Submitting a MR Job
Data Types
● Key / Value
● Lists
● Simple data flow example
Fault Tolerance
● Task Failure
  – If a child task fails, its JVM reports the failure to the TaskTracker.
  – If a child task hangs, it is killed. The JobTracker reschedules the task on another machine.
  – If the task continues to fail, the job is failed.
● TaskTracker Failure
  – JobTracker receives no heartbeat
  – TaskTracker is removed from the pool
● JobTracker Failure
  – Single point of failure; the job fails
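The task-failure rules above (reschedule on another machine, fail the job if the task keeps failing) can be sketched as a retry loop. This is a simplified model, not Hadoop code: `run_with_retries`, `JobFailed`, and `flaky_task` are hypothetical names, and the limit of four attempts is an assumed value chosen for illustration.

```python
from typing import Callable, List, Tuple


class JobFailed(Exception):
    """Raised when a task has failed on every allowed attempt."""


def run_with_retries(task: Callable[[str], str], machines: List[str],
                     max_attempts: int = 4) -> Tuple[str, int]:
    """Run `task`, moving to the next machine after each failure.

    Returns the task result and the number of attempts used.
    max_attempts=4 is an assumption for this sketch.
    """
    errors = []
    for attempt in range(max_attempts):
        machine = machines[attempt % len(machines)]
        try:
            return task(machine), attempt + 1
        except Exception as exc:  # any task error counts as a failed attempt
            errors.append(f"{machine}: {exc}")
    raise JobFailed("; ".join(errors))


def flaky_task(machine: str) -> str:
    """Hypothetical task that always fails on node1."""
    if machine == "node1":
        raise IOError("disk error")
    return f"done on {machine}"


print(run_with_retries(flaky_task, ["node1", "node2", "node3"]))
# ('done on node2', 2)
```

The sketch covers only task-level failures. A failed JobTracker has nothing above it to perform the rescheduling, which is why the slide calls it a single point of failure.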
Scheduling
● FIFO
  – Each job uses the whole Hadoop Cluster
● Fair
  – Jobs are placed in pools
● Capacity
  – Hadoop simulates for each user a separate MapReduce cluster with FIFO scheduling
Task Execution
● Speculative
  – Job execution time is sensitive to slow-running tasks
● JVM Reuse
  – Use the same JVM through configuration
Querying Data
● Pig
● Hive
Pig
● Developed by Yahoo!
● Pig is a platform for analysing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
● Two Components
  – Language (Pig Latin)
  – Execution environment
● Two Execution environments
  – Local (single JVM)
    ● pig -x local
  – Distributed system
    ● pig -x mapreduce / pig
● Running Pig
  – Script
    ● pig script-name.pig
  – Grunt
    ● pig (launches the command-line tool)
  – Embedded
    ● Calling Pig from Java
HIVE
● Hive is a data warehouse system for Hadoop that facilitates easy data summarisation, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems.
● Provides a SQL-like language, HiveQL
● Running Hive
  – Interactive
    ● hive
  – Script
    ● hive -f my-script
  – Inline
    ● hive -e 'SELECT * FROM MyTable'