Hadoop on AWS (Amazon)
TRANSCRIPT
Hadoop Cluster Configuration on AWS EC2
----------------------------------------------------------------------------------------------------------------------------
• Buy some instances on AWS (Amazon EC2): one master and 10 slaves.
ec2-50-17-21-209.compute-1.amazonaws.com      master
ec2-54-242-251-124.compute-1.amazonaws.com    slave1
ec2-23-23-17-15.compute-1.amazonaws.com       slave2
ec2-50-19-79-241.compute-1.amazonaws.com      slave3
ec2-50-16-49-229.compute-1.amazonaws.com      slave4
ec2-174-129-99-84.compute-1.amazonaws.com     slave5
ec2-50-16-105-188.compute-1.amazonaws.com     slave6
ec2-174-129-92-105.compute-1.amazonaws.com    slave7
ec2-54-242-20-144.compute-1.amazonaws.com     slave8
ec2-54-243-24-10.compute-1.amazonaws.com      slave9
ec2-204-236-205-227.compute-1.amazonaws.com   slave10
----------------------------------------------------------------------------------------------------------------------------
• Designate one instance as the master and the other 10 as slaves.
----------------------------------------------------------------------------------------------------------------------------
• Make sure SSH works from the master to all slaves (a minimal sketch follows).
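One way to set up passwordless SSH from the master, assuming the ec2-user account of the stock Amazon Linux AMI; mykey.pem is a placeholder for your own EC2 key pair file:

# on the master: generate a key pair for ec2-user (no passphrase)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# push the public key to each slave, bootstrapping with the EC2 key pair
cat ~/.ssh/id_rsa.pub | ssh -i mykey.pem ec2-user@ec2-54-242-251-124.compute-1.amazonaws.com "cat >> ~/.ssh/authorized_keys"
# afterwards this should log in without any password prompt
ssh ec2-54-242-251-124.compute-1.amazonaws.com hostname

Repeat the key push for each of the ten slaves.
----------------------------------------------------------------------------------------------------------------------------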
• Add each instance's private IP, its public DNS name, and its alias (master/slaveN) to /etc/hosts on the master.
----------------------------------------------------------------------------------------------------------------------------
• The master's /etc/hosts file looks like this:
127.0.0.1      localhost localhost.localdomain
10.155.245.153 ec2-50-17-21-209.compute-1.amazonaws.com    master
10.155.244.83  ec2-54-242-251-124.compute-1.amazonaws.com  slave1
10.155.245.185 ec2-23-23-17-15.compute-1.amazonaws.com     slave2
10.155.244.208 ec2-50-19-79-241.compute-1.amazonaws.com    slave3
10.155.244.246 ec2-50-16-49-229.compute-1.amazonaws.com    slave4
10.155.245.217 ec2-174-129-99-84.compute-1.amazonaws.com   slave5
10.155.244.177 ec2-50-16-105-188.compute-1.amazonaws.com   slave6
10.155.245.152 ec2-174-129-92-105.compute-1.amazonaws.com  slave7
10.155.244.145 ec2-54-242-20-144.compute-1.amazonaws.com   slave8
10.155.244.71  ec2-54-243-24-10.compute-1.amazonaws.com    slave9
10.155.244.46  ec2-204-236-205-227.compute-1.amazonaws.com slave10
----------------------------------------------------------------------------------------------------------------------------
• Each slave's /etc/hosts file looks like this.
• Remove the 127.0.0.1 line on all slaves.
10.155.245.153 ec2-50-17-21-209.compute-1.amazonaws.com    master
10.155.244.83  ec2-54-242-251-124.compute-1.amazonaws.com  slave1
10.155.245.185 ec2-23-23-17-15.compute-1.amazonaws.com     slave2
10.155.244.208 ec2-50-19-79-241.compute-1.amazonaws.com    slave3
10.155.244.246 ec2-50-16-49-229.compute-1.amazonaws.com    slave4
10.155.245.217 ec2-174-129-99-84.compute-1.amazonaws.com   slave5
10.155.244.177 ec2-50-16-105-188.compute-1.amazonaws.com   slave6
10.155.245.152 ec2-174-129-92-105.compute-1.amazonaws.com  slave7
10.155.244.145 ec2-54-242-20-144.compute-1.amazonaws.com   slave8
10.155.244.71  ec2-54-243-24-10.compute-1.amazonaws.com    slave9
10.155.244.46  ec2-204-236-205-227.compute-1.amazonaws.com slave10
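With the hosts files in place, a quick check that every alias resolves and is reachable from the master (assuming passwordless SSH is already working as set up above):

for i in $(seq 1 10); do ssh slave$i hostname; done

Each iteration should print the corresponding slave's hostname without prompting for a password.
----------------------------------------------------------------------------------------------------------------------------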
• Download the Hadoop release from the Apache Hadoop releases page and unpack it in a folder on the master
  (e.g. /usr/local/hadoop-1.0.4).
----------------------------------------------------------------------------------------------------------------------------
• Open the hadoop-env.sh file in the hadoop-1.0.4/conf/ folder.
----------------------------------------------------------------------------------------------------------------------------
• Set the environment variables JAVA_HOME, HADOOP_HOME, LD_LIBRARY_PATH, HADOOP_OPTS, and HADOOP_HEAPSIZE:
export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64
export HADOOP_HOME=/usr/local/hadoop-1.0.4/
export LD_LIBRARY_PATH=/usr/local/hadoop-1.0.4/lib/native/Linux-amd64-64
export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_HEAPSIZE=400000    # value is in MB
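Before going further it is worth confirming the Java path actually exists on this instance; a quick check, assuming the OpenJDK package above is installed:

$JAVA_HOME/bin/java -version

Note that HADOOP_HEAPSIZE is interpreted as megabytes, so the value should be matched to what the instance type can actually provide.
----------------------------------------------------------------------------------------------------------------------------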
• Open the hdfs-site.xml file
• and set the following parameters:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.log.dir</name>
    <value>/media/ephemeral0/logs</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/media/ephemeral0/tmp-${user.name}</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/media/ephemeral0/data-${user.name}</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/media/ephemeral0/name-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications
    can be specified when the file is created. The default is used if
    replication is not specified at create time.</description>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>536870912</value>
    <description>Default block size for new HDFS files.</description>
  </property>
</configuration>
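For scale: 536870912 is 512 × 1024 × 1024 bytes, i.e. a 512 MB block size (eight times the Hadoop 1.x default of 64 MB), and with dfs.replication set to 3 every block is stored on three of the ten slaves.
----------------------------------------------------------------------------------------------------------------------------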
• Open the mapred-site.xml file
• and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.log.dir</name>
    <value>/media/ephemeral0/logs</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx400m</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>60000</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>14</value>
    <description>The maximum number of map tasks run simultaneously
    by a task tracker.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>14</value>
    <description>The maximum number of reduce tasks run simultaneously
    by a task tracker.</description>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/media/ephemeral0/system-${user.name}</value>
    <description>System directory used to run map and reduce tasks.</description>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
    <description>The host and port that the MapReduce job tracker runs at.
    If "local", then jobs are run in-process as a single map and reduce
    task.</description>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
  <property>
    <name>mapred.create.symlink</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.child.ulimit</name>
    <value>unlimited</value>
  </property>
</configuration>
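A quick memory check on these settings: 14 map slots plus 14 reduce slots, each child JVM capped at -Xmx400m, can demand about 28 × 400 MB ≈ 11 GB of task heap per slave, on top of the DataNode and TaskTracker daemons, so the slot counts should be matched to the instance type's RAM.
----------------------------------------------------------------------------------------------------------------------------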
• Open the core-site.xml file
• and set the following parameters:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/media/ephemeral0/tmp-${user.name}</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/media/ephemeral0/data-${user.name}</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/media/ephemeral0/name-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
----------------------------------------------------------------------------------------------------------------------------
• Open the masters file and set the following:
master
----------------------------------------------------------------------------------------------------------------------------
• Open the slaves file and set the following:
slave1
slave2
slave3
slave4
slave5
slave6
slave7
slave8
slave9
slave10
----------------------------------------------------------------------------------------------------------------------------
• Give ownership of /media (all the folders Hadoop uses) to ec2-user on all slaves, as sketched below.
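A minimal sketch of preparing the directories, assuming the ec2-user account and group of the stock Amazon Linux AMI; run it on every node, since the master also needs the name and log directories:

sudo mkdir -p /media/ephemeral0
sudo chown -R ec2-user:ec2-user /media/ephemeral0

Hadoop then creates the data, name, tmp, and log directories configured above under this path once it can write there.
----------------------------------------------------------------------------------------------------------------------------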
• From the master, copy the full hadoop-1.0.4 folder to every slave, e.g.:
scp -r /usr/local/hadoop-1.0.4 ec2-54-242-251-124.compute-1.amazonaws.com:/usr/local/hadoop-1.0.4
----------------------------------------------------------------------------------------------------------------------------
• Repeat the copy from the master for all ten slaves (see the loop below).
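The copy can be scripted; a minimal loop, assuming the slave aliases from /etc/hosts, passwordless SSH, and write access to /usr/local on the slaves:

for i in $(seq 1 10); do
  scp -r /usr/local/hadoop-1.0.4 slave$i:/usr/local/
done
----------------------------------------------------------------------------------------------------------------------------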
• Open ports 50000-50100 in the security group in the AWS console; the HDFS and JobTracker ports configured above (9000 and 9001) must also be reachable between the instances.
• From the master, format the NameNode and start the cluster:
hadoop namenode -format
start-all.sh
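Once start-all.sh returns, a couple of standard checks confirm the daemons are up (jps ships with the JDK, dfsadmin with Hadoop):

jps                      # on the master: NameNode, SecondaryNameNode, JobTracker
hadoop dfsadmin -report  # should report 10 live datanodes

On each slave, jps should show DataNode and TaskTracker.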
----------------------------------------------------------------------------------------------------------------------------