MASS HDFS: Multi-Agent Spatial Simulation Hadoop Distributed File System

Yun-Ming Shih
Capstone Project Term Report I
Master of Science in Computer Science & Software Engineering
University of Washington
06/17/2017

Project Committee:
Munehiro Fukuda, Committee Chair
Michael Stiber, Committee Member
Johnny Lin, Committee Member
Background

An increasing amount of data and processing needs is pushing the development of parallelized big data analysis. Most approaches deal with data that has a simple structure, such as CSV files and SQL tables. Scientific data for climate analysis has a complex structure, which is not well supported. To expand the use of parallelized big data analysis within scientific fields, Prof. Fukuda and his research group proposed a multi-agent-based method that can process multi-dimensional NetCDF data. They demonstrated its practicality by incorporating it into the University of Washington Climate Analysis (UWCA) web application, which uses NetCDF software with the Parallel-Computing Library for Multi-Agent Spatial Simulation in Java (the MASS Java library).
The original version of MASS UWCA, implemented by Jason Woodring, has one master server that reads all the data from storage and sends it to the slave servers for processing. The issue with this application is the amount of time spent reading large files (in this case, 22 GB). At the time, the slow reading performance was suspected to be caused by the design of having only the master server read and transfer data to the slaves. However, after manually duplicating the data to each of the slave servers, the improvement was far from meeting expectations. This suggests the issue may come from the implementation of reading data from the servers into Places for processing.
In Autumn of 2015, a former student, Michael O'Keefe, proposed a solution to improve UWCA performance by adding MASS Parallel I/O to the MASS Java library. Parallel I/O is the MASS Java layer that allows efficient file reading and writing from every slave node to the MASS Places. This layer does not handle file transfer from master to slaves. The implementation made opening, reading, writing, and closing files possible at each slave server, with the assumption that the files already exist there. Although the idea came from improving UWCA read performance, Michael's work has only been tested with MASS Java and has not been integrated with UWCA. My proposed project, MASS HDFS, focuses on handling data storage and data transfer. This will be done using the Hadoop Distributed File System (HDFS). In the following sections, I will discuss the literature review I conducted in choosing the file system and how I incorporated HDFS with MASS Parallel I/O.
Literature Review

In this phase of the project, I explored Big Data and Hadoop literature to help me understand the topic in depth prior to my Hadoop setup process.
Big Data

Big data can be found in three forms: structured, unstructured, and semi-structured, which combines aspects of both (e.g., an XML file). The format of structured data is well known in advance. Like a relational database, it can store, access, and process data in a fixed format. The issue with structured data is that its size grows very large (a zettabyte is one billion terabytes). Unstructured big data refers to any data with an unknown form or structure. Google search results and document processing are examples of unstructured big data. When the size is large, it is difficult to derive value from it. All forms of big data share the Four-V characteristics:
• Volume - the sheer scale of data generated and stored
• Variety - heterogeneous sources and the nature of data, both structured and unstructured
• Velocity - speed of data generation (data flows in from business processes, application logs, networks and social media sites, sensors, mobile devices, etc.)
• Variability - the inconsistency which can be shown by the data at times
With big data, businesses can utilize outside intelligence when making decisions, improve customer service over time, identify risks to products and services early, and achieve better operational efficiency. However, in geospatial domains like climate analysis, data usually have more complex structures, such as NetCDF. These data, with exponential growth in data relationships, are produced from sensors distributed over the environment that record physical changes. The data are then accessed and applied to scientific models for simulating and predicting the phenomena. Techniques for data storage, real-time data access, and handling remain challenging for big data analysis in this domain.
Hadoop - HDFS

Hadoop is a framework that enables distributed processing of large data across clusters of commodity servers. It is composed of four core components: Hadoop Common, HDFS, MapReduce, and YARN. Hadoop Common is a set of utilities and libraries that can be used by other modules of Hadoop or other programs. For example, I am using Hadoop Common to establish the connection between MASS Java Parallel I/O and HDFS. The other core components are introduced in the following sections.

HDFS Architecture: In MASS HDFS, data storage and transfer are handled by HDFS. It is formed of a NameNode, DataNodes, and a Secondary NameNode. It operates on a master-slave architecture model with one namenode and multiple datanodes.
o The namenode is the master of the cluster (UW1-320-03)
  ! Stores metadata and the file directory
  ! Metadata
    • File name, file size, number of blocks, block IDs, user, group, permission, replication, block size, etc.
  ! Metadata is stored in RAM and on disk (the disk copy means that if the namenode fails, the information can be recovered from disk)
  ! The namenode doesn't store actual data (the datanodes do)
  ! The namenode knows whether each datanode in the entire cluster is active or down
    • Datanodes send a heartbeat every 3 seconds
    • The namenode waits for 10 minutes before determining that a datanode is out of service
o Datanodes are slave servers (UW1-320-00, 01, 02, 04, 05, 06, 07)
  ! Data are stored as blocks
  ! Block sizes are usually 128 MB
  ! The data gets divided first, then stored to datanodes based on the replication factor
    • Last block size <= block size
  ! Why are blocks replicated? (See the Block Replica Policy section)
    • Reliability
      o If block 1 on datanode 1 fails, you can still get block 1 from datanodes 2 and 5.
      o If datanode 1 itself is down, then the replicas on nodes 2, 3, 4, and 5 are used to make more replicas on the available nodes so that the number of replicas still matches the replication factor.
  ! Datanodes have no knowledge of files in HDFS; they only have knowledge about blocks
  ! Datanodes scan all blocks on disk and generate a block report - the block report has a block version used for append operations
    • Block reports happen at startup and periodically
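The block-splitting rule above (every block is the configured block size except possibly the last, which holds the remainder) can be sketched in plain Java. This is a simulation of the arithmetic, not Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSplit {
    // Splits a file of fileSize bytes into HDFS-style blocks: every block
    // is blockSize bytes except possibly the last, which holds the
    // remainder (last block size <= block size).
    static List<Long> splitIntoBlocks(long fileSize, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > blockSize) {
            blocks.add(blockSize);
            remaining -= blockSize;
        }
        if (remaining > 0) blocks.add(remaining);
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with the default 128 MB block size yields
        // blocks of 128 MB, 128 MB, and 44 MB.
        System.out.println(splitIntoBlocks(300 * mb, 128 * mb));
    }
}
```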
HDFS Read Operation:
• Steps to read a file from HDFS:
  o The client must call open() - this makes an RPC call to the namenode to get the block IDs and locations for the first few blocks.
    ! The returned list is sorted by network distance.
  o The client then directly contacts the datanodes to request transfer of the queried block. If a read fails, the client contacts the next closest datanode. The same process is repeated until the whole file is transferred block by block.
  o This process of requesting a specific block from one datanode after another is concealed from the client application. The client sees this as a continuous stream of data.
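The fallback in the read steps above can be sketched as a small plain-Java simulation (no Hadoop dependency; the node names below are only illustrative):

```java
import java.util.List;
import java.util.Set;

public class ReadSimulation {
    // Simulates the client-side read fallback: the namenode returns
    // replica locations sorted by network distance; the client tries
    // each datanode in order and falls back to the next closest on
    // failure.
    static String readBlock(List<String> sortedLocations, Set<String> downNodes) {
        for (String node : sortedLocations) {
            if (!downNodes.contains(node)) {
                return node;  // block is served from this datanode
            }
        }
        throw new IllegalStateException("all replicas unreachable");
    }

    public static void main(String[] args) {
        List<String> locations = List.of("uw1-320-00", "uw1-320-04", "uw1-320-06");
        // The closest node is down, so the read falls back to uw1-320-04.
        System.out.println(readBlock(locations, Set.of("uw1-320-00")));
    }
}
```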
HDFS Write Operation:
• Steps to write a file to HDFS:
  o The client must call create() - this makes an RPC call to the namenode.
    ! The namenode ensures the file does not already exist and checks the client's write permission.
    ! The client asks the namenode to allocate the file in blocks (128 MB). Based on the replication factor, the namenode returns a list sorted by network distance.
  o The client then directly flushes the data to the closest datanode in 4 KB packets. That datanode forwards each packet to its closest datanode, and so on. Each datanode sends an acknowledgment message to its requester. This is how the load is distributed in an HDFS cluster.
  o When the number of replicas meets the replication factor, the namenode updates its block location mapping. The same procedure is repeated until all blocks are stored in HDFS. The client calls close() to complete writing data to HDFS.
  o This packet forwarding from one datanode to another is concealed from the client application. The client sees this as a continuous stream of data.
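As a rough illustration of the pipelined write above (a simulation of the packet flow, not the actual Hadoop client code), the following plain-Java sketch forwards fixed-size packets down a chain of datanodes and confirms every replica receives the whole block:

```java
public class WritePipeline {
    // Simulates the pipelined write: the client streams packets to the
    // closest datanode, which forwards each packet to the next datanode
    // in the pipeline, and so on; each hop acknowledges its requester.
    // Returns how many packets each replica received (the last packet
    // may be partial in reality; this simulation counts whole packets).
    static long[] pipelineWrite(long blockBytes, int packetBytes, int replicas) {
        long packets = (blockBytes + packetBytes - 1) / packetBytes;  // ceiling division
        long[] packetsReceived = new long[replicas];
        for (long p = 0; p < packets; p++) {
            // the client sends to node 0; each node forwards down the chain
            for (int node = 0; node < replicas; node++) {
                packetsReceived[node]++;
            }
        }
        return packetsReceived;
    }

    public static void main(String[] args) {
        // One 128 MB block in 4 KB packets with replication factor 3:
        // each of the 3 datanodes receives all 32768 packets.
        long[] r = pipelineWrite(128L * 1024 * 1024, 4096, 3);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```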
Block Replica Policy:
The block replica placement policy is based on reliability, availability, and network bandwidth utilization. Suppose we have a replication factor of 3, 4 racks, and 4 datanodes:

Scenario 1: When data is written from the outside world to HDFS (copying data into HDFS):
o A datanode is chosen randomly to store the first replica.
o Then, a node from a different rack is chosen to store the second replica.
o The third replica is stored on a different node of the same rack where the second replica is.
o This way, if one rack fails, you will still have another rack available.

Scenario 2: When data is written by some task inside the cluster:
o The first replica is stored on the datanode where the task runs.
o The second and third replicas are stored on different nodes of a single rack, but a different rack from that of the first replica.
o This way, if one rack fails, you will still have another rack available.
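A minimal plain-Java sketch of Scenario 2's placement (the "rackX-nodeY" naming and the helper's arguments are assumptions for illustration, not the HDFS implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaPlacement {
    // Sketch of the placement policy for replication factor 3 when the
    // write is issued by a task inside the cluster: replica 1 goes to
    // the writer's own node, replicas 2 and 3 go to two different nodes
    // of one other rack, so a single rack failure never loses the block.
    static List<String> place(String writerNode, List<String> otherRackNodes) {
        List<String> replicas = new ArrayList<>();
        replicas.add(writerNode);             // replica 1: node running the task
        replicas.add(otherRackNodes.get(0));  // replica 2: a remote rack
        replicas.add(otherRackNodes.get(1));  // replica 3: same remote rack, different node
        return replicas;
    }

    public static void main(String[] args) {
        // Writer sits on rack1; the other two replicas land on rack2.
        System.out.println(place("rack1-node0", List.of("rack2-node1", "rack2-node3")));
    }
}
```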
Trade-off: If the number of replicas is high, the system is highly reliable and available; however, more network bandwidth is utilized and writing is less efficient (the write operation is expensive because it consumes network bandwidth). If the number of replicas is low, the system is less reliable and available, but less network bandwidth is utilized, which gives better write performance for the same reason.
What happens when a DataNode is out of service?
If, for some reason, datanode 1 goes down, it will not send a heartbeat to the namenode. The namenode waits 10 minutes for datanode 1 to send its heartbeat and then decides that datanode 1 is out of service. Fortunately, the blocks are still available on other nodes, but the cluster will be under-replicated. As a result, the namenode schedules jobs to make more replicas on other datanodes. Each datanode that receives a new replica then sends a block report to the namenode, and the namenode updates its block location mapping.
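The repair step described above can be sketched in plain Java as a simulation of the namenode's bookkeeping (block and node names are illustrative, matching the example in the architecture section):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReReplication {
    // Simulates re-replication: when a datanode is declared dead, every
    // block it held becomes under-replicated, and new copies are
    // scheduled on surviving nodes until the replica count matches the
    // replication factor again.
    static void repair(Map<String, Set<String>> blockToNodes,
                       String deadNode, List<String> liveNodes, int factor) {
        for (Set<String> nodes : blockToNodes.values()) {
            nodes.remove(deadNode);           // this replica is lost
            for (String candidate : liveNodes) {
                if (nodes.size() >= factor) break;
                nodes.add(candidate);         // schedule a new replica
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> map = new HashMap<>();
        map.put("block1", new HashSet<>(List.of("node1", "node2", "node5")));
        // node1 dies; the namenode brings block1 back up to 3 replicas.
        repair(map, "node1", List.of("node2", "node3", "node4", "node5"), 3);
        System.out.println(map.get("block1"));
    }
}
```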
Hadoop YARN:
YARN is a completely rewritten architecture of the Hadoop cluster. It offers clear advantages in scalability, efficiency, and flexibility compared to the classical MapReduce engine in the first version of Hadoop (MRv1).
Limitations: MRv1's limitations relate to scalability, resource utilization, and support for workloads other than MapReduce. Job execution is controlled by two types of processes:
• A single master process called the JobTracker - coordinates all jobs running on the cluster and assigns map and reduce tasks to run on the TaskTrackers.
• A number of subordinate processes called TaskTrackers - run assigned tasks and periodically report progress to the JobTracker.
Issues:
1. A scalability bottleneck is caused by having a single JobTracker. Limits are reached with a cluster of 5,000 nodes and 40,000 tasks running concurrently.
2. Neither small nor large Hadoop clusters used their computational resources with optimum efficiency. The cluster administrator divides the computational resources on each slave node into a fixed number of map/reduce slots. Even when no reduce tasks are running, a node can only run as many map tasks as it has map slots, and vice versa.
3. Hadoop was designed to run only MapReduce jobs. This increases the need to support other data processing frameworks that could run on the same cluster and share resources in an efficient and fair manner.
Addressing the scalability issue: The JobTracker is responsible for:
1. Cluster resource management - managing computational resources in the cluster involves maintaining the list of live nodes and the list of available and occupied map and reduce slots, and allocating available slots to appropriate jobs and tasks according to the selected scheduling policy.
2. Task coordination - coordinating all tasks running on a cluster involves instructing TaskTrackers to start map and reduce tasks, monitoring the execution of the tasks, restarting failed tasks, speculatively running slow tasks, calculating the total values of job counters, and more.
The JobTracker constantly keeps track of thousands of TaskTrackers, hundreds of jobs, and tens of thousands of map and reduce tasks. On the other hand, TaskTrackers usually run only a dozen tasks. One solution is to reduce the responsibilities of the single JobTracker and delegate some of them to the TaskTrackers, since there are many of them in a cluster. This is done by separating the dual responsibilities of the JobTracker (cluster resource management and task coordination) into two distinct types of processes. YARN introduces a cluster manager that is only responsible for tracking live nodes and the available resources in the cluster and assigning them to tasks. For each job submitted to the cluster, a TaskTracker starts a dedicated and short-lived JobTracker to control the execution of the tasks within that job. In doing so, coordination of a job's lifecycle is spread across all the available machines in the cluster. More jobs can run in parallel and more nodes/tasks can be handled, which increases scalability.
Name changes:
• ResourceManager instead of a cluster manager
• ApplicationMaster instead of a dedicated and short-lived JobTracker
• NodeManager instead of a TaskTracker
• A distributed application instead of a MapReduce job
This research was required to determine whether YARN could be beneficial to MASS HDFS. YARN is a rewritten architecture of the Hadoop cluster, and both small and large Hadoop clusters greatly benefit from it. It is suitable for programs like MapReduce that need dynamic resource utilization on the Hadoop framework. MASS HDFS does not use YARN. In MapReduce, tasks are sent to where the data resides for processing. However, a MASS Java process does not decide where it should go to perform a task; instead, it decides which data to retrieve for the task to perform. Each agent retrieves the data and reads it into a Place for processing.
Hadoop Setup Phase:

Date          Work log

Text File
4/9           • Install Hadoop and set up
4/10          • Development environment set-up
4/11 - 4/13   • Run Michael's Parallel IO test, wrote scripts for development use
              • Successfully running MASS with 1 node (no HDFS)
4/14 - 4/15   • Stuck on MASS init and Hadoop
              • Add secondary namenode (UW1-320-09) to resolve Hadoop issue
              • Alter openTextFile method in Place class
4/17 - 4/18   • Set up distributed environment
              • Generate authentication key
              • Have trouble running MASS on remote
              • Debug issues and binding issue
              • "Hadoop class not found" issue
4/19 - 4/20   • Rewrite program due to Michael's code cleanup
              • Tested rewritten MASS HDFS, failed on connection refused
              • Bind issue also occurred when running Parallel IO
4/23 - 4/24   • Reformat Hadoop and test HDFS operations
              • Test MASS HDFS - failed on connection refused
              • Reformat Hadoop and test HDFS operations
4/25          • Reformat Hadoop and stop calling ./sbin/stop-dfs.sh
              • Michael's code is causing issues, so start on a separate project
4/26          • Reformat Hadoop
              • Create new Maven HDFS client; can't run on remote due to manifest.txt
4/27          • Recreate Maven HDFS client; can't run on remote due to manifest.txt
              • Create a non-Maven project, MassHDFS; add dependencies manually
              • Issue: MassHDFS not finding file in HDFS
4/28          • With Prof. Fukuda, successfully set up Configuration in MassHDFS
              • Issue: can't find files because the HDFS home is set to the local directory
5/1           • Running MassHDFS using the Hadoop command works
              • Modifying and running MASS HDFS using the Hadoop command also works

NetCDF File
5/2           • Start on NetCDF
              • Issue: failed on reading NetCDF1000 from HDFS
              • Debug
5/4 - 5/18    • Debug
              • Pull Michael's new changes
              • Issue: Log4j class not found
              • Issue: OutOfMemory - heap size issue
              • Issue: openForRead using the NetCDF API
5/22          • Reformat HDFS to use nodes with a replication factor of 8
              • Set Java heap size from the hadoop-env file
5/23 - 5/24   • Reformat for heap size change
              • Issue: after changing heap size in hadoop-env, still doesn't work
              • Test NetCDF on Parallel IO without MASS HDFS
5/25          • Check with Michael - NetCDF works with his branch of code, so the problem is from merging our code
              • Create a new branch and rewrite - everything works
5/26          • Small bug fix
              • Clean up
              • Test (pass) with NetCDF50, NetCDF100, and a text file

Evaluation
6/2 - 6/5     • Issue: create 10 G and 50 G dummy text files on a UW machine and run out of memory
              • Issue: trying with 2 MASS nodes but getting a ~/.ssh/id_rsa issue
              • Issue: getting authentication issues
6/7           • Create mass_java_appl (MASS application) to make sure the issue isn't from MASS
              • Issue: stuck on school machines not working
6/10 - 6/16   • Install Hadoop on dslab instead of shihy4 - failed because the file size is too large and can't log back in
              • Issue: school machines connection issue - can't log in from home
              • Term report
              • Create Write function - not working
Hadoop Installation

HDFS uses a master-slave architecture to enable automatic data distribution, and I combine Parallel IO with HDFS, which I call MASS HDFS, to handle file storage and transfer. Ideally, the number of MASS nodes should equal the number of HDFS nodes. In hdfs-site.xml, the replication factor should be set equal to the number of HDFS nodes so that every node in the cluster possesses the entire file. This way, using the same nodes for MASS and HDFS reduces network delay, since every MASS node has a copy of the file.
I am using uw1-320-03 as the master node. Servers 00, 01, 02, 03, 04, 05, 06, and 07 are set up as slave nodes, and uw1-320-09 is the secondary namenode.
In hadoop-env.sh, I set the heap size to 20 GB to avoid the OutOfMemory issue when using MASS with NetCDF files.
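For reference, a minimal hdfs-site.xml fragment matching this setup might look as follows (a sketch; only the replication value is taken from the setup described above):

```xml
<configuration>
  <!-- Replication factor equal to the number of HDFS nodes (8 here),
       so that every node in the cluster holds a full copy of each file. -->
  <property>
    <name>dfs.replication</name>
    <value>8</value>
  </property>
</configuration>
```

The heap size mentioned above is set separately in hadoop-env.sh (e.g., via the HADOOP_HEAPSIZE variable, which takes a value in megabytes).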
Text File

The original TxtFile class in Parallel IO uses the file system interface to open and read files. I enabled HDFS file reads by using the Hadoop client code in the Parallel IO TxtFile. The integrated Parallel IO (MASS HDFS) can directly read a text file from HDFS into a MASS Place properly. This code was tested using 1 MASS node as well as 4 and 8 HDFS nodes.
NetCDF File

Instead of reading the file using the file system interface, NetCDFFile uses the NetCDF API to read the file onto the local machine and then reads it into a MASS Place. To enable HDFS file reads, I used the HDFS copyToLocal method to transfer the requested file from HDFS to local storage. Then, Parallel IO reads the file from local storage into a MASS Place as in the TxtFile class. The code was tested using 1 MASS node as well as 4 and 8 HDFS nodes.
Another option I had was to change the NetCDF API implementation. However, Prof. Fukuda and I inspected the open-source code and decided to leave it as a possible future project for the time being.
Issues

Reformatting Hadoop multiple times
HDFS had to be reformatted for multiple reasons: UW server connection issues, reformatting Hadoop to test with different numbers of nodes, changing the Hadoop Java heap size, and moving Hadoop from my personal school account to dslab. I encountered several Hadoop issues with connection refusal and binding. At first, I thought the issue was caused by my implementation, but it turned out to be due to the school servers' instability and frequent calls to "./sbin/start-dfs.sh" and "./sbin/stop-dfs.sh". When one of the servers gets rebooted, it clears the HDFS configuration in the tmp directory, which requires reformatting.
Connection issues
This issue caused the main delay in my project. While working on TxtFile, I could not get MASS to run with multiple nodes because of connection issues. This caused a week of delay for both Michael and me. We suspect the issue came either from MASS or from the instability of U-Drive, so we switched to developing with only one MASS node. I am now at the end of the Hadoop phase, where both TxtFile and NetCDFFile work with one MASS node and eight Hadoop nodes. However, I am currently stuck on getting it to run with multiple MASS nodes due to an authentication error, which is causing a huge delay for my evaluation. Although I have followed the instructions and generated the authentication keys multiple times, MASS still can't run with multiple nodes on my personal account. After meeting with Prof. Fukuda, it appears that other MASS applications run correctly with multiple nodes using the dslab account, so I will be switching to dslab for the evaluation.
Development issues
Michael and I worked in parallel throughout April and May. Because my code extends Michael's code, I had to rewrite my code several times because of his changes in design, cleanup, and fixes. After Michael finished his work, I ran into a bug that stopped me from testing NetCDF for a while, causing another week of delay. I couldn't find what the problem was, but everything worked after I created a new branch from Michael's development branch and rewrote everything. This suggests the problem may have come from not resolving a merge conflict correctly.
Another issue I had was not being able to connect to the HDFS cluster correctly. Because of this problem, I wrote a separate HDFSClient program and found that the connection fails when running the program using the java command. Here is an example of my separate HDFSClient program and its usage:
After researching on the internet, I learned that this could be a bug in the Hadoop client code, as many people reported the same problem after following the Hadoop instructions and adding the configurations correctly. The HDFS home directory was still set to the local home directory, so any HDFS command performed would fail since the HDFS path doesn't exist on the local system. To solve this issue, we decided to run MASS Java using the Hadoop command (./bin/hadoop jar <jarfile> <args>) instead of "java -jar".
Next Step

As mentioned, I am trying to transfer the Hadoop setup from my personal account to the dslab account. If I can run MASS with multiple nodes using dslab, I will conduct my performance evaluation over 1, 4, and 8 nodes. Otherwise, I will discuss the issue with the research group and find out where the issue resides in MASS. After the evaluation, I will start my next development phase, System Integration, to integrate MASS HDFS with the UW Climate Analysis web application.