hadoop basics
Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduce Execution
Big Data & Hadoop
D. Praveen KumarJunior Research Fellow
Department of Computer Science & EngineeringIndian Institute of Technology (Indian School of Mines)
Dhanbad, Jharkhand, India
Head of IT & ITES, Skill Subsist Impels Ltd, Tirupati.
March 25, 2017
Sree Venkateswara College of Engineering, Nellore, A. P.
Big Data & Hadoop
March 25, 2017 Slide: 1 / 60
1 Introduction
2 Big Data
3 Sources of Big Data
4 Tools
5 HDFS
6 Installation
7 Configuration
8 Starting & Stopping
9 Map Reduce
10 Execution
Data
Data means a value or set of values.
Examples:
March 1st, 2017
20, 30, 40
ΨΦϕ
Information
Meaningful or preprocessed data is called information.
Examples:
Data Types
The kind of data that may appear in a computer.
Examples:
int
float
char
double
Abstract data types (user-defined data types)
Traditional approaches
Traditional approaches to store and process the data
1 File system
2 RDBMS (Relational Database Management Systems)
3 Data Warehouse & Mining Tools
4 Grid Computing
5 Volunteer Computing
GUESTS = 4
Transportation from the railway station to your home (one auto or car is sufficient).
Mom can prepare food or snacks without difficulty.
Your house is sufficient for accommodation.
Facilities like beds, bathrooms, water and TV are already available for the guests to use.
You can talk to each other, crack jokes and keep the guests happy.
Expenditure is nearly Rs. 1000/-
GUESTS = 100
Transportation = 25 autos/cars or two buses
Food = catering
Accommodation = lodge
Facilities = AC, TV, and all other facilities
Maintenance = somewhat difficult
Expenditure = nearly Rs. 90,000/-
GUESTS = 10000
Transportation = 2500 autos or 500 buses
Food = catering
Accommodation = all lodges, function halls and cottages in the town
Facilities = AC, TV, and all other facilities are somewhat difficult to provide
Maintenance = more difficult
Expenditure = nearly Rs. 2,00,000/-
Grid Computing
Volunteer Computing
GUESTS = 10000000
Transportation = how many autos?
Food = ?
Accommodation = ?
Facilities = ?
Maintenance = ?
Cost = ?
Problems
The same problems arise in a computing environment:
It is difficult to handle a huge and ever-growing amount of data.
The data cannot be processed with only a few machines.
Distributing large data sets is difficult.
Constructing online or offline models is very difficult.
Solution
A single solution to all these problems is
What is Big Data?
Big data refers to voluminous amounts of structured or unstructured data that organizations can potentially mine and analyze.
Big data is a huge collection of large data sets characterized by
Data generation
How Data Is Generated
Internet of Events
The Internet is the main source of this vast amount of data.
4 Internet of Events
4 Questions of Data Analysts
1 What happened?
2 Why did it happen?
3 What will happen?
4 What is the best that can happen?
Big Data Platforms and Analytical Software
Hadoop
Here we go with
Hadoop History
Hadoop was created by Doug Cutting, the creator of Lucene.
He was also involved in a project called Nutch, from which Hadoop grew.
Nutch combined MapReduce with NDFS (Nutch Distributed File System).
Later, these parts of Nutch were moved into a separate project named Hadoop (MapReduce + HDFS, the Hadoop Distributed File System).
Hadoop
Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
Hadoop
The base Apache Hadoop framework is composed of the following modules:
Hadoop Common: contains libraries and utilities needed by other Hadoop modules
Hadoop Distributed File System (HDFS): a distributed file system that stores data
Hadoop YARN: a resource-management platform
Hadoop MapReduce: for large-scale data processing
Hadoop Components
HDFS- Goals
The design goals of HDFS
1 Very Large files
2 Streaming Data Access
3 Commodity Hardware
HDFS - Limitations
HDFS is not a good fit for
1 Lots of small files
2 Low-latency data access
3 Multiple writers, arbitrary file modifications
HDFS- Concepts
1 Blocks
2 Namenodes
3 Datanodes
4 HDFS Federation
5 HDFS High Availability
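To make the Blocks concept concrete, here is a minimal sketch of how a file maps onto fixed-size blocks and how replication multiplies raw storage. The class and method names are hypothetical and not part of the Hadoop API. The 128 MB block size is the Hadoop 2.x default and the replication factor of 3 is the usual HDFS default, both assumed here; the single-node setup later in these slides uses a replication factor of 1.

```java
// Hypothetical helper for illustration only; not part of the Hadoop API.
class HdfsBlockSketch {

    // Assumed block size: 128 MB, the Hadoop 2.x default.
    static final long DEFAULT_BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks needed to hold a file (ceiling division).
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Raw bytes stored across all DataNodes: every block is kept
    // 'replication' times, and the last block only uses the space it needs.
    static long rawStorageBytes(long fileSizeBytes, int replication) {
        return fileSizeBytes * replication;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        // A 1 GB file split into 128 MB blocks.
        System.out.println("Blocks: " + blockCount(oneGb, DEFAULT_BLOCK_SIZE));
        // Raw storage in GB with an assumed replication factor of 3.
        System.out.println("Raw GB: " + rawStorageBytes(oneGb, 3) / oneGb);
    }
}
```

The NameNode only keeps metadata about which blocks make up a file, so a few very large files cost it far less memory than the same data spread over many small files.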
Requirements
Necessary
Java >= 7
ssh
Linux OS (Ubuntu >=14.04)
Hadoop framework
Optional
Eclipse
Internet connection
Java 7 & Installation
Hadoop requires a working Java installation. Java 1.7 or later is recommended.
Use one of the following commands to install Java on Linux:
sudo apt-get install openjdk-7-jdk
(or)
sudo apt-get install default-jdk
Java PATH Setup
We need to set the Java path.
Open the .bashrc file located in the home directory:
gedit ~/.bashrc
Add the line below at the end:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Installation & Configuration of SSH
Hadoop requires SSH (Secure Shell) access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it.
Install SSH using the following command:
sudo apt-get install ssh
First, generate a DSA SSH key for the user:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Then add the public key to the authorized keys:
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Download & Extract Hadoop
Download Hadoop from the Apache Download Mirrors:
http://mirror.fibergrid.in/apache/hadoop/common/
Extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop.
$ cd /usr/local
$ sudo tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 hadoop
Add Hadoop configuration in .bashrc
Add the Hadoop configuration to the .bashrc file in the home directory:
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
Create temp file, DataNode & NameNode
Run the command below to create the NameNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/namenode
Run the command below to create the DataNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/datanode
Run the commands below to create the Hadoop tmp directory (here the user and group are both hadoop1):
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
Files to Configure
The following are the files we need to configure
core-site.xml
hadoop-env.sh
mapred-site.xml
hdfs-site.xml
Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following snippets between the <configuration> ... </configuration> tags in the core-site.xml file.
Add the property below to specify the location of the tmp directory:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>
Add the property below to specify the default file system and its port number:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
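As a quick sanity check on the name/value layout used by the site files, the following sketch parses such a property list with the JDK's built-in XML parser and prints each pair. It is an illustration only: the class SiteXmlSketch and its parse method are hypothetical, and Hadoop itself reads these files through its own Configuration class rather than through code like this.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not how Hadoop loads its configuration.
class SiteXmlSketch {

    // Returns the <name>/<value> pairs of every <property> element, in file order.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML (for example an unclosed tag) ends up here.
            throw new IllegalStateException("Could not parse site file", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>hadoop.tmp.dir</name><value>/app/hadoop/tmp</value></property>"
                + "<property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>"
                + "</configuration>";
        for (Map.Entry<String, String> e : parse(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

A stray space inside a tag or a missing closing tag makes the parse fail, which is the most common reason a hand-edited site file is rejected.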
Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Uncomment the JAVA_HOME line and give the correct path for Java:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml
In this file we add the host name and port that the MapReduce job tracker runs at. Add the following in mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>
Add properties in ... etc/hadoop/hdfs-site.xml
In the file hdfs-site.xml add the following.
Add the replication factor:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
Specify the NameNode directory:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>
Specify the DataNode directory:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>
Formatting the HDFS filesystem via the NameNode
The first step in starting up your Hadoop installation is formatting the Hadoop file system.
You need to do this only the first time you set up Hadoop.
Do not format a running Hadoop file system, as you will lose all the data currently in HDFS.
To format the file system, run the command:
hadoop namenode -format
Starting single-node cluster
Run the command:
start-all.sh
This will start up a NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager on your machine.
A nifty tool for checking whether the expected Hadoop processes are running is jps:
hadoop1@hadoop1:/usr/local/hadoop$ jps
2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
2727 DataNode
3242 NodeManager
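The check against the jps listing can also be automated. The sketch below takes jps-style output as text and reports which of the five expected daemons are missing. The class and method names are hypothetical, and the sample text is simply the listing from this slide, not output captured by the program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; it only inspects text in "pid name" form.
class JpsCheckSketch {

    // The daemons a single-node cluster is expected to run after start-all.sh.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "SecondaryNameNode", "DataNode",
            "ResourceManager", "NodeManager");

    // Returns the expected daemons that do not appear in the jps output.
    static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]); // second column is the process name
            }
        }
        List<String> missing = new ArrayList<>();
        for (String daemon : EXPECTED) {
            if (!running.contains(daemon)) {
                missing.add(daemon);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "2598 NameNode\n3112 ResourceManager\n3523 Jps\n"
                + "2917 SecondaryNameNode\n2727 DataNode\n3242 NodeManager";
        System.out.println("Missing daemons: " + missingDaemons(sample));
    }
}
```

If a daemon is reported missing, its log file under the Hadoop logs directory is usually the first place to look.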
Stopping your single-node cluster
Run the command:
stop-all.sh
This stops all the daemons running on your machine. The output will look like this:
stopping NodeManager
localhost: stopping ResourceManager
stopping NameNode
localhost: stopping DataNode
localhost: stopping SecondaryNameNode
Map-Reduce Framework
MapReduce is a programming paradigm.
It relies on two functions, Map and Reduce.
MapReduce is used to manage many large-scale computations.
The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks.
The framework tries to schedule tasks on the nodes where the data is already present.
Map-Reduce Computation Steps
The key-value pairs from each Map task are collected by a master controller and sorted by key. The keys are divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce task.
The Reduce tasks work on one key at a time, and combine all the values associated with that key in some way. The manner of combination of values is determined by the code written by the user for the Reduce function.
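The steps above can be imitated in plain Java on a single machine. This is a minimal in-memory sketch, not Hadoop code: the class MapReduceSketch and its methods are hypothetical, word counting is assumed as the task, and a sorted map stands in for the master controller's collect-and-sort step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-machine illustration of map, shuffle/sort and reduce.
class MapReduceSketch {

    // Map: emit a (word, 1) pair for every token of one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle: group all values by key; the TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: combine all values of one key; here the combination is a sum.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps in order over all input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(all).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, data=1, hadoop=3}
        System.out.println(run(Arrays.asList("big data hadoop", "hadoop big hadoop")));
    }
}
```

In a real cluster the map calls run in parallel on different nodes and the grouped keys are partitioned across several Reduce tasks; the sketch keeps everything in one process to show only the data flow.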
Hadoop - MapReduce
Hadoop - MapReduce (Word Count) Example
MapReduce - WordCountMapper
In WordCountMapper class we perform the following operations
Read a line from file
Split line into Words
Assign Count 1 to each word
WordCountMapper source code
public static class WordCountMapper
    extends Mapper<Object, Text, Text, IntWritable> {
  // Constant count of 1 emitted for every word.
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Split the input line into words and emit (word, 1) for each.
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}
MapReduce - WordCountReducer
In WordCountReducer class we perform the following operations
Sum the list of values
Assign sum to corresponding word
WordCountReducer source code
public static class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  private IntWritable result = new IntWritable();
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Sum all counts received for this word.
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
}
WordCountJob
public class WordCountJob {
  // WordCountMapper and WordCountReducer are declared static, so they
  // must be nested inside an enclosing class such as this one.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated Job constructor.
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountJob.class);
    job.setMapperClass(WordCountMapper.class);
    job.setCombinerClass(WordCountReducer.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] is the input path, args[1] the output path.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Imports to include
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
Execution of Hadoop Program in Eclipse
Step 1:
1 Start Hadoop in a terminal using the command:
$ start-all.sh
2 Use the jps command to check whether all Hadoop services have started.
Step 2: Open Eclipse
Step 3: Go to File ⇒ New ⇒ Project
Select Java Project and click the Next button
Write the project name and click the Finish button
Continue...
Step 4: Right side it creates a project
1 Right click on Project ⇒ New ⇒ Class
2 Write Name of Class and then Click Finish
3 Write MapReduce program in that class
Step 5: Write JAVA Program
Continue...
Step 6: Importing JAR files
1 Right-click the project and select Properties (Alt+Enter)
2 Select Java Build Path ⇒ click Libraries, then click Add External JARs
3 Select the jars from the following Hadoop library directories:
/usr/local/hadoop/share/hadoop/common/lib
/usr/local/hadoop/share/hadoop/hdfs/lib
/usr/local/hadoop/share/hadoop/httpfs/lib
/usr/local/hadoop/share/hadoop/mapreduce/lib
/usr/local/hadoop/share/hadoop/yarn/lib
/usr/local/hadoop/share/hadoop/tools/
Continue ....
Step 7: Set the input file path
1 Create a folder in the home directory
2 Copy text files into it
3 Select the input path
Step 8: Set the input and output paths
1 Right-click the source ⇒ Run As ⇒ Run Configurations ⇒ Arguments
2 Enter your input and output paths separated by a single space
3 Click Run