Hadoop on AWS (Amazon)
TRANSCRIPT
Hadoop Cluster Configuration on AWS EC2
----------------------------------------------------------------------------------------------------------------------------
• Buy some instances on AWS (Amazon EC2): one master and 10 slaves.
ec2-50-17-21-209.compute-1.amazonaws.com      master
ec2-54-242-251-124.compute-1.amazonaws.com    slave1
ec2-23-23-17-15.compute-1.amazonaws.com       slave2
ec2-50-19-79-241.compute-1.amazonaws.com      slave3
ec2-50-16-49-229.compute-1.amazonaws.com      slave4
ec2-174-129-99-84.compute-1.amazonaws.com     slave5
ec2-50-16-105-188.compute-1.amazonaws.com     slave6
ec2-174-129-92-105.compute-1.amazonaws.com    slave7
ec2-54-242-20-144.compute-1.amazonaws.com     slave8
ec2-54-243-24-10.compute-1.amazonaws.com      slave9
ec2-204-236-205-227.compute-1.amazonaws.com   slave10
----------------------------------------------------------------------------------------------------------------------------
• Designate one instance as the master and the other 10 as slaves.
----------------------------------------------------------------------------------------------------------------------------
• Make sure SSH works from the master to all slaves (a minimal sketch follows).
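One way to set up passwordless SSH from the master, assuming the ec2-user account of the stock Amazon Linux AMI; mykey.pem is a placeholder for your own EC2 key pair file:

# on the master: generate a key pair for ec2-user (no passphrase)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# push the public key to each slave, bootstrapping with the EC2 key pair
cat ~/.ssh/id_rsa.pub | ssh -i mykey.pem ec2-user@ec2-54-242-251-124.compute-1.amazonaws.com "cat >> ~/.ssh/authorized_keys"
# afterwards this should log in without any password prompt
ssh ec2-54-242-251-124.compute-1.amazonaws.com hostname

Repeat the key push for each of the ten slaves.
----------------------------------------------------------------------------------------------------------------------------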
• Add each instance's private IP, its public DNS name, and its alias (master/slaveN) to /etc/hosts on the master.
----------------------------------------------------------------------------------------------------------------------------
• The master's /etc/hosts file looks like this:
127.0.0.1      localhost localhost.localdomain
10.155.245.153 ec2-50-17-21-209.compute-1.amazonaws.com    master
10.155.244.83  ec2-54-242-251-124.compute-1.amazonaws.com  slave1
10.155.245.185 ec2-23-23-17-15.compute-1.amazonaws.com     slave2
10.155.244.208 ec2-50-19-79-241.compute-1.amazonaws.com    slave3
10.155.244.246 ec2-50-16-49-229.compute-1.amazonaws.com    slave4
10.155.245.217 ec2-174-129-99-84.compute-1.amazonaws.com   slave5
10.155.244.177 ec2-50-16-105-188.compute-1.amazonaws.com   slave6
10.155.245.152 ec2-174-129-92-105.compute-1.amazonaws.com  slave7
10.155.244.145 ec2-54-242-20-144.compute-1.amazonaws.com   slave8
10.155.244.71  ec2-54-243-24-10.compute-1.amazonaws.com    slave9
10.155.244.46  ec2-204-236-205-227.compute-1.amazonaws.com slave10
----------------------------------------------------------------------------------------------------------------------------
• Each slave's /etc/hosts file looks like this.
• Remove the 127.0.0.1 line on all slaves.
10.155.245.153 ec2-50-17-21-209.compute-1.amazonaws.com    master
10.155.244.83  ec2-54-242-251-124.compute-1.amazonaws.com  slave1
10.155.245.185 ec2-23-23-17-15.compute-1.amazonaws.com     slave2
10.155.244.208 ec2-50-19-79-241.compute-1.amazonaws.com    slave3
10.155.244.246 ec2-50-16-49-229.compute-1.amazonaws.com    slave4
10.155.245.217 ec2-174-129-99-84.compute-1.amazonaws.com   slave5
10.155.244.177 ec2-50-16-105-188.compute-1.amazonaws.com   slave6
10.155.245.152 ec2-174-129-92-105.compute-1.amazonaws.com  slave7
10.155.244.145 ec2-54-242-20-144.compute-1.amazonaws.com   slave8
10.155.244.71  ec2-54-243-24-10.compute-1.amazonaws.com    slave9
10.155.244.46  ec2-204-236-205-227.compute-1.amazonaws.com slave10
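With the hosts files in place, a quick check that every alias resolves and is reachable from the master (assuming passwordless SSH is already working as set up above):

for i in $(seq 1 10); do ssh slave$i hostname; done

Each iteration should print the corresponding slave's hostname without prompting for a password.
----------------------------------------------------------------------------------------------------------------------------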
• Download the Hadoop release from the Apache Hadoop releases page and unpack it in a folder on the master
  (e.g. /usr/local/hadoop-1.0.4).
----------------------------------------------------------------------------------------------------------------------------
• Open the hadoop-env.sh file in the hadoop-1.0.4/conf/ folder.
----------------------------------------------------------------------------------------------------------------------------
• Set the environment variables JAVA_HOME, HADOOP_HOME, LD_LIBRARY_PATH, HADOOP_OPTS, and HADOOP_HEAPSIZE:
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
export HADOOP_HOME=/usr/local/hadoop-1.0.4/
export LD_LIBRARY_PATH=/usr/local/hadoop-1.0.4/lib/native/Linux-amd64-64
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_HEAPSIZE=400000    # value is in MB
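Before going further it is worth confirming the Java path actually exists on this instance; a quick check, assuming the OpenJDK package above is installed:

$JAVA_HOME/bin/java -version

Note that HADOOP_HEAPSIZE is interpreted as megabytes, so the value should be matched to what the instance type can actually provide.
----------------------------------------------------------------------------------------------------------------------------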
• Open the hdfs-site.xml file
• and set the following parameters:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.log.dir</name>
    <value>/media/ephemeral0/logs</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/media/ephemeral0/tmp-${user.name}</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/media/ephemeral0/data-${user.name}</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/media/ephemeral0/name-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications
    can be specified when the file is created. The default is used if
    replication is not specified at create time.</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>536870912</value>
    <description>Default block size for new HDFS files.</description>
  </property>
</configuration>
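For scale: 536870912 is 512 × 1024 × 1024 bytes, i.e. a 512 MB block size (eight times the Hadoop 1.x default of 64 MB), and with dfs.replication set to 3 every block is stored on three of the ten slaves.
----------------------------------------------------------------------------------------------------------------------------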
• Open the mapred-site.xml file
• and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.log.dir</name>
    <value>/media/ephemeral0/logs</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx400m</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>60000</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>14</value>
    <description>The maximum number of map tasks run simultaneously
    by a task tracker.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>14</value>
    <description>The maximum number of reduce tasks run simultaneously
    by a task tracker.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/media/ephemeral0/system-${user.name}</value>
    <description>System directory used to run map and reduce tasks.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
    <description>The host and port that the MapReduce job tracker runs at.
    If "local", then jobs are run in-process as a single map and reduce
    task.</description>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
  <property>
    <name>mapred.create.symlink</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.child.ulimit</name>
    <value>unlimited</value>
  </property>
</configuration>
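A quick memory check on these settings: 14 map slots plus 14 reduce slots, each child JVM capped at -Xmx400m, can demand about 28 × 400 MB ≈ 11 GB of task heap per slave, on top of the DataNode and TaskTracker daemons, so the slot counts should be matched to the instance type's RAM.
----------------------------------------------------------------------------------------------------------------------------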
• Open the core-site.xml file
• and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/media/ephemeral0/tmp-${user.name}</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/media/ephemeral0/data-${user.name}</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/media/ephemeral0/name-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
----------------------------------------------------------------------------------------------------------------------------
• Open the masters file and set the following:
master
----------------------------------------------------------------------------------------------------------------------------
• Open the slaves file and set the following:
slave1
slave2
slave3
slave4
slave5
slave6
slave7
slave8
slave9
slave10
----------------------------------------------------------------------------------------------------------------------------
• Give ownership of /media (all the folders Hadoop uses) to ec2-user on all slaves, as sketched below.
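A minimal sketch of preparing the directories, assuming the ec2-user account and group of the stock Amazon Linux AMI; run it on every node, since the master also needs the name and log directories:

sudo mkdir -p /media/ephemeral0
sudo chown -R ec2-user:ec2-user /media/ephemeral0

Hadoop then creates the data, name, tmp, and log directories configured above under this path once it can write there.
----------------------------------------------------------------------------------------------------------------------------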
• From the master, copy the full hadoop-1.0.4 folder to every slave, e.g.:
scp -r /usr/local/hadoop-1.0.4 ec2-54-242-251-124.compute-1.amazonaws.com:/usr/local/hadoop-1.0.4
----------------------------------------------------------------------------------------------------------------------------
• Repeat the copy from the master for all ten slaves (see the loop below).
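The copy can be scripted; a minimal loop, assuming the slave aliases from /etc/hosts, passwordless SSH, and write access to /usr/local on the slaves:

for i in $(seq 1 10); do
  scp -r /usr/local/hadoop-1.0.4 slave$i:/usr/local/
done
----------------------------------------------------------------------------------------------------------------------------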
• Open ports 50000-50100 in the security group in the AWS console; the HDFS and JobTracker ports configured above (9000 and 9001) must also be reachable between the instances.
• From the master, format the NameNode and start the cluster:
hadoop namenode -format
start-all.sh
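Once start-all.sh returns, a couple of standard checks confirm the daemons are up (jps ships with the JDK, dfsadmin with Hadoop):

jps                      # on the master: NameNode, SecondaryNameNode, JobTracker
hadoop dfsadmin -report  # should report 10 live datanodes

On each slave, jps should show DataNode and TaskTracker.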
----------------------------------------------------------------------------------------------------------------------------