assignment6dbs-spring2015
DESCRIPTION
Question on hadoop just for beginners so they can start learning.TRANSCRIPT
-
CSE441:DATABASESYSTEMSAssignment:6
IntroductiontoHadoopandMapReduceDeadline:9:00PM,13thApril,2015
Forthisassignment,youwouldberunningaHadoopVirtualMachineonyoursystemandwritecodeforthefollowingproblems.
CodingLanguage:Python
VirtualMachineSetup:Downloading the VM
1. Downloaditfromhttp://content.udacitydata.com/courses/ud617/ClouderaUdacityTrainingVM4.1.1.c.zip.Warningthezippedfilesizeis1.7GB.IfyouareonaWindowsmachineyouwilllikelyneedtouseWinRARtoopenthis.zipfilebecauseothermethodsfailtoopentheunzippedfile(whichexceedsthemaximumspecified4GBfora.zipfile).
2. MD5sumfilecanbefoundherehttp://content.udacitydata.com/courses/ud617/ClouderaUdacityTrainingVM4.1.1.c.zip.md5
3. Unzipit.Warningtheunzippedsizeis4.2GB4. MD5hashesforfiles:
8a610c151d4b1ebdce11542d13dd2a53ClouderaTrainingVM4.1.1.c.log 6b44c965c1c6062554bf4cc12d11e87eClouderaTrainingVM4.1.1.c.plist 46dedeba3e0affd8311431d7e370705eClouderaTrainingVM4.1.1.c.vmdk d41d8cd98f00b204e9800998ecf8427eClouderaTrainingVM4.1.1.c.vmsd 096956c1cbabeaa652ca63a2d5e14612ClouderaTrainingVM4.1.1.c.vmx c9f8a375e82ef1e9d96097850e237df9ClouderaTrainingVM4.1.1.c.vmxf 0d7c8becb5a515068e81bb303c794e4fnvram
Using Oracle VirtualBox1. DownloadandinstallVirtualBoxfromhttps://www.virtualbox.org/wiki/Downloads2. CreateanewVirtualmachine:
a. CreateanewvirtualmachinebypressingtheNewbutton:
b. Chooseaname,useType:Linux:
-
c. PressNextd. SelectmemorysizefortheVM.
e. PressNextf. SelectUseanexistingvirtualharddrivefile,clickthebuttontobrowsetothe
directoryyouunzippedtheprovidedVMimageandpressCreate.
g. StarttheVM!
-
Using VMWare1. Downloadandinstallfrom
https://my.vmware.com/web/vmware/free#desktop_end_user_computing/vmware_player/6_0
2. CreatetheVirtualMachine:a. ClickonOpenaVirtualMachineand,whenprompted,navigatetothefolderyou
unzippedtheVM,choosethefileandclickOpen.
-
b. SelectthemachineandclickPlayvirtualmachine
DatasetDownload:DatasetfortheproblemisadatasetonAirportswhichcanbedownloadedfromthecoursespageundertheresourcestab.
Problem1:WriteMapperandReducertogetthenumberofAirportsby:
1. Country2. Type
Problem2:WriteMapperandReducertofindthe
1. Country2. Region
havingthehighestnumberofairports
NOTE:Forboththeproblemsandeachpart,writeseparateMappersandReducersanddontmixtheproblem.
Resources:1. Unit2andUnit3fromthisonlinecourse(~12hours).Unit1and4
arenotneeded.https://www.udacity.com/course/ud6172. Chapter6shouldsufficewhichisalsofreetodownload.
http://go.cloudera.com/udacitylesson2
-
Deliverables/UploadFormat:RollNo/ProblemNumber
Mapper.pyReducer.py
YoucanuploadthecodefromtheVirtualMachineitself.