Tutorial: To run the MapReduce EEMD code with Hadoop on FutureGrid

-by Rewati Ovalekar


● Step 1:
– Code is available on:
  http://code.google.com/p/cyberaide/
– Download the code from:
  http://code.google.com/p/cyberaide/source/browse/#svn%2Ftrunk%2Fproject%2Fspring2011%2FEEMDAnalysis%2FEEMDJava


● Step 2:
– Create a FutureGrid account
– For further details, refer to:
  https://portal.futuregrid.org/tutorials (FutureGrid Tutorial)


● Step 3:
– Log in to FutureGrid:
  ssh [email protected]
– A login message will be displayed after a successful login


● Step 4:
– Create a jar file
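Step 4 does not say how the jar is built. The following is a minimal sketch, assuming the downloaded EEMDJava sources are plain .java files in a single directory with no dependencies beyond Hadoop, and that a copy of the Hadoop 0.20.2 core jar is available locally to compile against. The function name build_eemd_jar, the classes directory, and the output jar name are hypothetical and not part of the original project.

```shell
# Hypothetical helper: compile the sources against the Hadoop core jar and package them.
# $1 = directory containing the .java sources (assumed layout)
# $2 = path to the Hadoop core jar, e.g. hadoop-0.20.2-core.jar (assumed location)
# $3 = name of the jar to create (any name; it is passed later to "bin/hadoop jar")
build_eemd_jar() {
    src_dir=$1
    hadoop_core_jar=$2
    out_jar=$3
    mkdir -p classes || return 1
    # Compile every source file, placing the .class files under ./classes
    javac -classpath "$hadoop_core_jar" -d classes "$src_dir"/*.java || return 1
    # Package the compiled classes into the jar
    jar cf "$out_jar" -C classes .
}
```

It could be called as `build_eemd_jar EEMDJava /path/to/hadoop-0.20.2-core.jar eemd.jar`, where both paths are placeholders for your own locations.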

● Step 5:
– To transfer the jar file and the input file:
  sftp [email protected]
  put /../filepath


● Step 6:
– In order to run Hadoop on FutureGrid, create a Eucalyptus account
– For further details, refer to:
  https://portal.futuregrid.org/tutorials/eucalyptus

● Step 7:
– Once the account is approved, load the Eucalyptus tools:
  module load euca2ools


● Step 8:
– Make sure that the jar file and the input file are in the same directory as the username.private key
– Run the image which has Hadoop on it:
  euca-run-instances -k rovaleka -t c1.xlarge emi-D778156D
  -k indicates the key name
  -t indicates the type of instance
  emi-D778156D indicates the image name
  -n (optional, not used above) indicates the number of instances to run


● Step 8 (continued):
– Check the status using:
  euca-describe-instances
– Keep checking until the status is "running". Once the instance is running, you can log in to it and run Hadoop.
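Repeatedly checking the status by hand can be scripted. The sketch below is one possible way to do it, not part of the original tutorial. It assumes that euca-describe-instances prints one line per instance beginning with "INSTANCE" followed by the instance ID, and that the state (for example pending or running) appears as its own whitespace-separated field. The function names instance_state and wait_for_running and the DESCRIBE_CMD variable are hypothetical.

```shell
# Command used to list instances; overridable so the helper can be tried without a cloud.
DESCRIBE_CMD=${DESCRIBE_CMD:-euca-describe-instances}

# Print the state of one instance (e.g. pending, running).
# Assumes lines of the form "INSTANCE <instance-id> ... <state> ...".
instance_state() {
    $DESCRIBE_CMD | awk -v id="$1" '
        $1 == "INSTANCE" && $2 == id {
            for (i = 3; i <= NF; i++)
                if ($i == "pending" || $i == "running" ||
                    $i == "shutting-down" || $i == "terminated") {
                    print $i
                    exit
                }
        }'
}

# Poll until the instance is running; give up after a number of attempts.
# $1 = instance id, $2 = max attempts (default 30), $3 = seconds between attempts (default 10)
wait_for_running() {
    attempts=${2:-30}
    delay=${3:-10}
    while [ "$attempts" -gt 0 ]; do
        [ "$(instance_state "$1")" = "running" ] && return 0
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}
```

After launching an instance, `wait_for_running <instance-id>` returns successfully once the state is running, using the instance ID reported for your own instance.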


● Step 9:
– Transfer the input file and the jar file to the required VM using:
  scp -i username.private filename [email protected]:/
  (Make sure that the address is the same as the address assigned to you; otherwise it will ask for a password)
– Log in using:
  ssh -i username.private [email protected]
  (Make sure the address is the same)


SINGLE NODE

● Step 10:
– A login message will be displayed after a successful login
– Retrieve the transferred files and move them into the Hadoop folder:
  cd /..
  mv filename /opt/hadoop-0.20.2
  cd /opt/hadoop-0.20.2


● Step 11:
– To run Hadoop:
  cd /opt/hadoop-0.20.2
  bin/start-all.sh
– To check if everything is started:
  jps
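On a single-node Hadoop 0.20 setup, bin/start-all.sh normally starts five daemons: NameNode, SecondaryNameNode, DataNode, JobTracker, and TaskTracker. The sketch below shows one way to check the jps listing for all five instead of reading it by eye. The function name check_hadoop_daemons is hypothetical, and it assumes jps prints one "<pid> <ClassName>" pair per line.

```shell
# Check that the five Hadoop 0.20 daemons started by bin/start-all.sh are listed by jps.
# Reads jps output on stdin; prints each missing daemon and returns non-zero if any is absent.
check_hadoop_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # Match the class name exactly in column 2 so that "NameNode"
        # does not also match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}
```

It is meant to be used as `jps | check_hadoop_daemons`. If a daemon is reported missing, the files under the logs folder are the first place to look.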


● Step 12:
– Transfer the input file to HDFS:
  bin/hadoop dfs -copyFromLocal inputfile name_in_HDFS
– To check if it is present on HDFS:
  bin/hadoop dfs -ls
NOTE: We need to transfer the input file whenever we start Hadoop


● Step 13:
– To run the code:
  bin/hadoop jar [jarFile] EEMDHadoop [inputfilename] [required_output_file]


● Step 14:
– Retrieve the output:
  bin/hadoop dfs -copyToLocal [outputFileName] [outputfileNameToBeGiven]
  (output will be available in the part-00000 file)
– To check the logs and debug the code, go to the folder logs/userlogs
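Steps 12 to 14 always run in the same order, so they can be combined into one small wrapper. This is a sketch under stated assumptions rather than part of the original code: the function name run_eemd_job and the HADOOP variable are hypothetical, the script is assumed to run from /opt/hadoop-0.20.2, and the input is stored in HDFS under the same name as the local file.

```shell
# Path to the hadoop launcher, relative to /opt/hadoop-0.20.2 as in the steps above.
HADOOP=${HADOOP:-bin/hadoop}

# Hypothetical wrapper: upload the input, run the job, and fetch the output.
# $1 = jar file, $2 = local input file, $3 = output name in HDFS, $4 = local destination
run_eemd_job() {
    jar_file=$1
    input_file=$2
    hdfs_output=$3
    local_output=$4
    hdfs_input=$(basename "$input_file")   # reuse the file name as the name in HDFS

    # Step 12: copy the input file to HDFS
    "$HADOOP" dfs -copyFromLocal "$input_file" "$hdfs_input" || return 1
    # Step 13: run the EEMDHadoop job
    "$HADOOP" jar "$jar_file" EEMDHadoop "$hdfs_input" "$hdfs_output" || return 1
    # Step 14: copy the output directory back to the local file system
    "$HADOOP" dfs -copyToLocal "$hdfs_output" "$local_output" || return 1
    echo "results: $local_output/part-00000"
}
```

Each step stops the wrapper if it fails, so a failed upload does not go on to launch the job. Note that Hadoop refuses to write to an output name that already exists in HDFS, so a fresh output name is needed for every run.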


● Step 15:
– Stop Hadoop:
  bin/stop-all.sh
  exit


Thank you!!!
