MASS HDFS: Multi-Agent Spatial Simulation Hadoop Distributed File System

Yun-Ming Shih
Capstone Project Term Report I
Master of Science in Computer Science & Software Engineering
University of Washington
06/17/2017

Project Committee:
Munehiro Fukuda, Committee Chair
Michael Stiber, Committee Member
Johnny Lin, Committee Member
Background

An increasing amount of data and processing needs is pushing the development of parallelized big data analysis. Most approaches deal with data that has a simple structure, such as CSV files and SQL tables. Scientific data for climate analysis has a complex structure, which is not well supported. To expand the use of parallelized big data analysis within scientific fields, Prof. Fukuda and his research group proposed a multi-agent-based method that can process multi-dimensional NetCDF data. They demonstrated its practicality by incorporating it into the University of Washington Climate Analysis (UWCA) web application, which uses NetCDF software with the Parallel-Computing Library for Multi-Agent Spatial Simulation in Java (the MASS Java library).
The original version of MASS UWCA, implemented by Jason Woodring, has one master server that reads all the data from storage and sends it to the slave servers for processing. The issue with this application is the amount of time spent reading large files (in this case, 22 GB). At the time, the slow reading performance was suspected to be caused by the design of having only the master server read and transfer data to the slaves. However, after manually duplicating the data to each of the slave servers, the improvement was far from meeting expectations. This suggests the issue may come from the implementation of reading data from the servers into Places for processing.
In Autumn of 2015, a former student, Michael O'Keefe, proposed a solution to improve UWCA performance by adding MASS Parallel I/O to the MASS Java library. Parallel I/O is the MASS Java layer that allows efficient file reading and writing from every slave node to the MASS Places. This layer does not handle file transfer from master to slaves. The implementation made opening, reading, writing, and closing files possible at each slave server, with the assumption that the files already exist there. Although the idea came from improving UWCA read performance, Michael's work has only been tested with MASS Java and has not been integrated with UWCA. My proposed project, MASS HDFS, focuses on handling data storage and data transfer. This will be done using the Hadoop Distributed File System (HDFS). In the following sections, I will discuss the literature review I conducted in choosing the file system and how I incorporated HDFS with MASS Parallel I/O.
Literature Review

In this phase of the project, I explored Big Data and Hadoop literature to help me understand the topic in depth prior to my Hadoop setup process.
Big Data

Big data can be found in three forms: structured, unstructured, and semi-structured, which combines aspects of both (e.g., an XML file). The format of structured data is well known in advance. Like a relational database, it can store, access, and process data in a fixed format. The issue with structured data is that its size grows very large (a zettabyte is one billion terabytes). Unstructured big data refers to any data with an unknown form or structure. Google search results and document processing are examples of unstructured big data. When the size is large, it is difficult to derive value from it. All forms of big data share the Four-V characteristics:
• Volume - the sheer scale of data generated and stored
• Variety - heterogeneous sources and the nature of data, both structured and unstructured
• Velocity - speed of data generation (data flows in from business processes, application logs, networks and social media sites, sensors, mobile devices, etc.)
• Variability - the inconsistency which can be shown by the data at times
With big data, businesses can utilize outside intelligence when making decisions, improve customer service over time, identify risks to products and services early, and achieve better operational efficiency. However, in geospatial domains like climate analysis, data usually have more complex structures, such as NetCDF. These data, with exponential growth in data relationships, are produced from sensors distributed over the environment that record physical changes. The data are then accessed and applied to scientific models for simulating and predicting the phenomena. Techniques for data storage, real-time data access, and handling remain challenging for big data analysis in this domain.
Hadoop - HDFS

Hadoop is a framework that enables distributed processing of large data across clusters of commodity servers. It is composed of four core components: Hadoop Common, HDFS, MapReduce, and YARN. Hadoop Common is a set of utilities and libraries that can be used by other modules of Hadoop or other programs. For example, I am using Hadoop Common to establish the connection between MASS Java Parallel I/O and HDFS. The other core components are introduced in the following sections.

HDFS Architecture: In MASS HDFS, data storage and transfer are handled by HDFS. It is formed of a NameNode, DataNodes, and a Secondary NameNode. It operates on a master-slave architecture model with one namenode and multiple datanodes.
o The namenode is the master of the cluster (UW1-320-03)
  ! Stores metadata and the file directory
  ! Metadata
    • File name, file size, number of blocks, block IDs, user, group, permission, replication, block size, etc.
  ! Metadata is stored in RAM and on disk (the disk copy means that if the namenode fails, the information can be recovered from disk)
  ! The namenode doesn't store actual data (the datanodes do)
  ! The namenode knows whether each datanode in the entire cluster is active or down
    • Datanodes send a heartbeat every 3 seconds
    • The namenode waits for 10 minutes before determining that a datanode is out of service
o Datanodes are slave servers (UW1-320-00, 01, 02, 04, 05, 06, 07)
  ! Data are stored as blocks
  ! Block sizes are usually 128 MB
  ! The data gets divided first, then stored to datanodes based on the replication factor
    • Last block size <= block size
  ! Why are blocks replicated? (See the Block Replica Policy section)
    • Reliability
      o If block 1 on datanode 1 fails, you can still get block 1 from datanodes 2 and 5.
      o If datanode 1 itself is down, then the replicas on nodes 2, 3, 4, and 5 are used to make more replicas on the available nodes so that the number of replicas still matches the replication factor.
  ! Datanodes have no knowledge of files in HDFS; they only have knowledge about blocks
  ! Datanodes scan all blocks on disk and generate a block report - the block report has a block version used for append operations
    • Block reports happen at startup and periodically
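The block-splitting rule above (every block is the configured block size except possibly the last, which holds the remainder) can be sketched in plain Java. This is a simulation of the arithmetic, not Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;

public class BlockSplit {
    // Splits a file of fileSize bytes into HDFS-style blocks: every block
    // is blockSize bytes except possibly the last, which holds the
    // remainder (last block size <= block size).
    static List<Long> splitIntoBlocks(long fileSize, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > blockSize) {
            blocks.add(blockSize);
            remaining -= blockSize;
        }
        if (remaining > 0) blocks.add(remaining);
        return blocks;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with the default 128 MB block size yields
        // blocks of 128 MB, 128 MB, and 44 MB.
        System.out.println(splitIntoBlocks(300 * mb, 128 * mb));
    }
}
```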
HDFS Read Operation:
• Steps to read a file from HDFS:
  o The client must call open() - this makes an RPC call to the namenode to get the block IDs and locations for the first few blocks.
    ! The returned list is sorted by network distance.
  o The client then directly contacts the datanodes to request transfer of the queried block. If a read fails, the client contacts the next closest datanode. The same process is repeated until the whole file is transferred block by block.
  o This process of requesting a specific block from one datanode after another is concealed from the client application. The client sees this as a continuous stream of data.
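The fallback in the read steps above can be sketched as a small plain-Java simulation (no Hadoop dependency; the node names below are only illustrative):

```java
import java.util.List;
import java.util.Set;

public class ReadSimulation {
    // Simulates the client-side read fallback: the namenode returns
    // replica locations sorted by network distance; the client tries
    // each datanode in order and falls back to the next closest on
    // failure.
    static String readBlock(List<String> sortedLocations, Set<String> downNodes) {
        for (String node : sortedLocations) {
            if (!downNodes.contains(node)) {
                return node;  // block is served from this datanode
            }
        }
        throw new IllegalStateException("all replicas unreachable");
    }

    public static void main(String[] args) {
        List<String> locations = List.of("uw1-320-00", "uw1-320-04", "uw1-320-06");
        // The closest node is down, so the read falls back to uw1-320-04.
        System.out.println(readBlock(locations, Set.of("uw1-320-00")));
    }
}
```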
HDFS Write Operation:
• Steps to write a file to HDFS:
  o The client must call create() - this makes an RPC call to the namenode.
    ! The namenode ensures the file does not already exist and checks the client's write permission.
    ! The client asks the namenode to allocate the file in blocks (128 MB). Based on the replication factor, the namenode returns a list sorted by network distance.
  o The client then directly flushes the data to the closest datanode in 4 KB packets. That datanode forwards each packet to its closest datanode, and so on. Each datanode sends an acknowledgment message to its requester. This is how the load is distributed in an HDFS cluster.
  o When the number of replicas meets the replication factor, the namenode updates its block location mapping. The same procedure is repeated until all blocks are stored in HDFS. The client calls close() to complete writing data to HDFS.
  o This packet forwarding from one datanode to another is concealed from the client application. The client sees this as a continuous stream of data.
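As a rough illustration of the pipelined write above (a simulation of the packet flow, not the actual Hadoop client code), the following plain-Java sketch forwards fixed-size packets down a chain of datanodes and confirms every replica receives the whole block:

```java
public class WritePipeline {
    // Simulates the pipelined write: the client streams packets to the
    // closest datanode, which forwards each packet to the next datanode
    // in the pipeline, and so on; each hop acknowledges its requester.
    // Returns how many packets each replica received (the last packet
    // may be partial in reality; this simulation counts whole packets).
    static long[] pipelineWrite(long blockBytes, int packetBytes, int replicas) {
        long packets = (blockBytes + packetBytes - 1) / packetBytes;  // ceiling division
        long[] packetsReceived = new long[replicas];
        for (long p = 0; p < packets; p++) {
            // the client sends to node 0; each node forwards down the chain
            for (int node = 0; node < replicas; node++) {
                packetsReceived[node]++;
            }
        }
        return packetsReceived;
    }

    public static void main(String[] args) {
        // One 128 MB block in 4 KB packets with replication factor 3:
        // each of the 3 datanodes receives all 32768 packets.
        long[] r = pipelineWrite(128L * 1024 * 1024, 4096, 3);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```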
Block Replica Policy:
The block replica placement policy is based on reliability, availability, and network bandwidth utilization. Suppose we have a replication factor of 3, 4 racks, and 4 datanodes:

Scenario 1: When data is written from the outside world to HDFS (copying data into HDFS):
o A datanode is chosen randomly to store the first replica.
o Then, a node from a different rack is chosen to store the second replica.
o The third replica is stored on a different node of the same rack where the second replica is.
o This way, if one rack fails, you will still have another rack available.

Scenario 2: When data is written by some task inside the cluster:
o The first replica is stored on the datanode where the task runs.
o The second and third replicas are stored on different nodes of a single rack, but a different rack from that of the first replica.
o This way, if one rack fails, you will still have another rack available.
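A minimal plain-Java sketch of Scenario 2's placement (the "rackX-nodeY" naming and the helper's arguments are assumptions for illustration, not the HDFS implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaPlacement {
    // Sketch of the placement policy for replication factor 3 when the
    // write is issued by a task inside the cluster: replica 1 goes to
    // the writer's own node, replicas 2 and 3 go to two different nodes
    // of one other rack, so a single rack failure never loses the block.
    static List<String> place(String writerNode, List<String> otherRackNodes) {
        List<String> replicas = new ArrayList<>();
        replicas.add(writerNode);             // replica 1: node running the task
        replicas.add(otherRackNodes.get(0));  // replica 2: a remote rack
        replicas.add(otherRackNodes.get(1));  // replica 3: same remote rack, different node
        return replicas;
    }

    public static void main(String[] args) {
        // Writer sits on rack1; the other two replicas land on rack2.
        System.out.println(place("rack1-node0", List.of("rack2-node1", "rack2-node3")));
    }
}
```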
Trade-off: If the number of replicas is high, the system is highly reliable and available; however, more network bandwidth is utilized and writing is less efficient (the write operation is expensive because it consumes network bandwidth). If the number of replicas is low, the system is less reliable and available, but less network bandwidth is utilized, which gives better write performance for the same reason.
What happens when a DataNode is out of service?
If, for some reason, datanode 1 goes down, it will not send a heartbeat to the namenode. The namenode waits 10 minutes for datanode 1 to send its heartbeat and then decides that datanode 1 is out of service. Fortunately, the blocks are still available on other nodes, but the cluster will be under-replicated. As a result, the namenode schedules jobs to make more replicas on other datanodes. Each datanode that receives a new replica then sends a block report to the namenode, and the namenode updates its block location mapping.
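The repair step described above can be sketched in plain Java as a simulation of the namenode's bookkeeping (block and node names are illustrative, matching the example in the architecture section):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ReReplication {
    // Simulates re-replication: when a datanode is declared dead, every
    // block it held becomes under-replicated, and new copies are
    // scheduled on surviving nodes until the replica count matches the
    // replication factor again.
    static void repair(Map<String, Set<String>> blockToNodes,
                       String deadNode, List<String> liveNodes, int factor) {
        for (Set<String> nodes : blockToNodes.values()) {
            nodes.remove(deadNode);           // this replica is lost
            for (String candidate : liveNodes) {
                if (nodes.size() >= factor) break;
                nodes.add(candidate);         // schedule a new replica
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> map = new HashMap<>();
        map.put("block1", new HashSet<>(List.of("node1", "node2", "node5")));
        // node1 dies; the namenode brings block1 back up to 3 replicas.
        repair(map, "node1", List.of("node2", "node3", "node4", "node5"), 3);
        System.out.println(map.get("block1"));
    }
}
```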
Hadoop YARN:
YARN is a completely rewritten architecture of the Hadoop cluster. It offers clear advantages in scalability, efficiency, and flexibility compared to the classical MapReduce engine in the first version of Hadoop (MRv1).
Limitations: MRv1's limitations relate to scalability, resource utilization, and support for workloads other than MapReduce. Job execution is controlled by two types of processes:
• A single master process called the JobTracker - coordinates all jobs running on the cluster and assigns map and reduce tasks to run on the TaskTrackers.
• A number of subordinate processes called TaskTrackers - run assigned tasks and periodically report progress to the JobTracker.
Issues:
1. A scalability bottleneck is caused by having a single JobTracker. Limits are reached with a cluster of 5,000 nodes and 40,000 tasks running concurrently.
2. Neither small nor large Hadoop clusters used their computational resources with optimum efficiency. The cluster administrator divides the computational resources on each slave node into a fixed number of map/reduce slots. Even when no reduce tasks are running, a node can only run as many map tasks as it has map slots, and vice versa.
3. Hadoop was designed to run only MapReduce jobs. This increases the need to support other data processing frameworks that could run on the same cluster and share resources in an efficient and fair manner.
Addressing the scalability issue: The JobTracker is responsible for:
1. Cluster resource management - managing computational resources in the cluster involves maintaining the list of live nodes and the list of available and occupied map and reduce slots, and allocating available slots to appropriate jobs and tasks according to the selected scheduling policy.
2. Task coordination - coordinating all tasks running on a cluster involves instructing TaskTrackers to start map and reduce tasks, monitoring the execution of the tasks, restarting failed tasks, speculatively running slow tasks, calculating the total values of job counters, and more.
The JobTracker constantly keeps track of thousands of TaskTrackers, hundreds of jobs, and tens of thousands of map and reduce tasks. On the other hand, TaskTrackers usually run only a dozen tasks. One solution is to reduce the responsibilities of the single JobTracker and delegate some of them to the TaskTrackers, since there are many of them in a cluster. This is done by separating the dual responsibilities of the JobTracker (cluster resource management and task coordination) into two distinct types of processes. YARN introduces a cluster manager that is only responsible for tracking live nodes and the available resources in the cluster and assigning them to tasks. For each job submitted to the cluster, a TaskTracker starts a dedicated and short-lived JobTracker to control the execution of the tasks within that job. In doing so, coordination of a job's lifecycle is spread across all the available machines in the cluster. More jobs can run in parallel and more nodes/tasks can be handled, which increases scalability.
Name changes:
• ResourceManager instead of a cluster manager
• ApplicationMaster instead of a dedicated and short-lived JobTracker
• NodeManager instead of a TaskTracker
• A distributed application instead of a MapReduce job
This research was required to determine whether YARN could be beneficial to MASS HDFS. YARN is a rewritten architecture of the Hadoop cluster, and both small and large Hadoop clusters greatly benefit from it. It is suitable for programs like MapReduce that need dynamic resource utilization on the Hadoop framework. MASS HDFS does not use YARN. In MapReduce, tasks are sent to where the data resides for processing. However, a MASS Java process does not decide where it should go to perform a task; instead, it decides which data to retrieve for the task to perform. Each agent retrieves the data and reads it into a Place for processing.
Hadoop Setup Phase:

Date          Work log

Text File
4/9           • Install Hadoop and set up
4/10          • Development environment set-up
4/11 - 4/13   • Run Michael's Parallel IO test, wrote scripts for development use
              • Successfully running MASS with 1 node (no HDFS)
4/14 - 4/15   • Stuck on MASS init and Hadoop
              • Add secondary namenode (UW1-320-09) to resolve Hadoop issue
              • Alter openTextFile method in Place class
4/17 - 4/18   • Set up distributed environment
              • Generate authentication key
              • Have trouble running MASS on remote
              • Debug issues and binding issue
              • "Hadoop class not found" issue
4/19 - 4/20   • Rewrite program due to Michael's code cleanup
              • Tested rewritten MASS HDFS, failed on connection refused
              • Bind issue also occurred when running Parallel IO
4/23 - 4/24   • Reformat Hadoop and test HDFS operations
              • Test MASS HDFS - failed on connection refused
              • Reformat Hadoop and test HDFS operations
4/25          • Reformat Hadoop and stop calling ./sbin/stop-dfs.sh
              • Michael's code is causing issues, so start on a separate project
4/26          • Reformat Hadoop
              • Create new Maven HDFS client; can't run on remote due to manifest.txt
4/27          • Recreate Maven HDFS client; can't run on remote due to manifest.txt
              • Create a non-Maven project, MassHDFS; add dependencies manually
              • Issue: MassHDFS not finding file in HDFS
4/28          • With Prof. Fukuda, successfully set up Configuration in MassHDFS
              • Issue: can't find files because the HDFS home is set to the local directory
5/1           • Running MassHDFS using the Hadoop command works
              • Modifying and running MASS HDFS using the Hadoop command also works

NetCDF File
5/2           • Start on NetCDF
              • Issue: failed on reading NetCDF1000 from HDFS
              • Debug
5/4 - 5/18    • Debug
              • Pull Michael's new changes
              • Issue: Log4j class not found
              • Issue: OutOfMemory - heap size issue
              • Issue: openForRead using the NetCDF API
5/22          • Reformat HDFS to use nodes with a replication factor of 8
              • Set Java heap size from the hadoop-env file
5/23 - 5/24   • Reformat for heap size change
              • Issue: after changing heap size in hadoop-env, still doesn't work
              • Test NetCDF on Parallel IO without MASS HDFS
5/25          • Check with Michael - NetCDF works with his branch of code, so the problem is from merging our code
              • Create a new branch and rewrite - everything works
5/26          • Small bug fix
              • Clean up
              • Test (pass) with NetCDF50, NetCDF100, and a text file

Evaluation
6/2 - 6/5     • Issue: create 10 G and 50 G dummy text files on a UW machine and run out of memory
              • Issue: trying with 2 MASS nodes but getting a ~/.ssh/id_rsa issue
              • Issue: getting authentication issues
6/7           • Create mass_java_appl (MASS application) to make sure the issue isn't from MASS
              • Issue: stuck on school machines not working
6/10 - 6/16   • Install Hadoop on dslab instead of shihy4 - failed because the file size is too large and can't log back in
              • Issue: school machines connection issue - can't log in from home
              • Term report
              • Create Write function - not working
Hadoop Installation

HDFS uses a master-slave architecture to enable automatic data distribution, and I combine Parallel IO with HDFS, which I call MASS HDFS, to handle file storage and transfer. Ideally, the number of MASS nodes should equal the number of HDFS nodes. In hdfs-site.xml, the replication factor should be set equal to the number of HDFS nodes so that every node in the cluster possesses the entire file. This way, using the same nodes for MASS and HDFS reduces network delay, since every MASS node has a copy of the file.
I am using uw1-320-03 as the master node. Servers 00, 01, 02, 03, 04, 05, 06, and 07 are set up as slave nodes, and uw1-320-09 is the secondary namenode.
In hadoop-env.sh, I set the heap size to 20 GB to avoid the OutOfMemory issue when using MASS with NetCDF files.
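For reference, a minimal hdfs-site.xml fragment matching this setup might look as follows (a sketch; only the replication value is taken from the setup described above):

```xml
<configuration>
  <!-- Replication factor equal to the number of HDFS nodes (8 here),
       so that every node in the cluster holds a full copy of each file. -->
  <property>
    <name>dfs.replication</name>
    <value>8</value>
  </property>
</configuration>
```

The heap size mentioned above is set separately in hadoop-env.sh (e.g., via the HADOOP_HEAPSIZE variable, which takes a value in megabytes).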
Text File

The original TxtFile class in Parallel IO uses the file system interface to open and read files. I enabled HDFS file reads by using the Hadoop client code in the Parallel IO TxtFile. The integrated Parallel IO (MASS HDFS) can directly read a text file from HDFS into a MASS Place properly. This code was tested using 1 MASS node as well as 4 and 8 HDFS nodes.
NetCDF File

Instead of reading the file using the file system interface, NetCDFFile uses the NetCDF API to read the file onto the local machine and then reads it into a MASS Place. To enable HDFS file reads, I used the HDFS copyToLocal method to transfer the requested file from HDFS to local storage. Then, Parallel IO reads the file from local storage into a MASS Place as in the TxtFile class. The code was tested using 1 MASS node as well as 4 and 8 HDFS nodes.
Another option I had was to change the NetCDF API implementation. However, Prof. Fukuda and I inspected the open-source code and decided to leave it as a possible future project for the time being.
Issues

Reformatting Hadoop multiple times
HDFS had to be reformatted for multiple reasons: UW server connection issues, reformatting Hadoop to test with different numbers of nodes, changing the Hadoop Java heap size, and moving Hadoop from my personal school account to dslab. I encountered several Hadoop issues with connection refusal and binding. At first, I thought the issue was caused by my implementation, but it turned out to be due to the school servers' instability and frequent calls to "./sbin/start-dfs.sh" and "./sbin/stop-dfs.sh". When one of the servers gets rebooted, it clears the HDFS configuration in the tmp directory, which requires reformatting.
Connection issues
This issue caused the main delay in my project. While working on TxtFile, I could not get MASS to run with multiple nodes because of connection issues. This caused a week of delay for both Michael and me. We suspect the issue came either from MASS or from the instability of U-Drive, so we switched to developing with only one MASS node. I am now at the end of the Hadoop phase, where both TxtFile and NetCDFFile work with one MASS node and eight Hadoop nodes. However, I am currently stuck on getting it to run with multiple MASS nodes due to an authentication error, which is causing a huge delay for my evaluation. Although I have followed the instructions and generated the authentication keys multiple times, MASS still can't run with multiple nodes on my personal account. After meeting with Prof. Fukuda, it appears that other MASS applications run correctly with multiple nodes using the dslab account, so I will be switching to dslab for the evaluation.
Development issues
Michael and I worked in parallel throughout April and May. Because my code extends Michael's code, I had to rewrite my code several times because of his changes in design, cleanup, and fixes. After Michael finished his work, I ran into a bug that stopped me from testing NetCDF for a while, causing another week of delay. I couldn't find what the problem was, but everything worked after I created a new branch from Michael's development branch and rewrote everything. This suggests the problem may have come from not resolving a merge conflict correctly.
Another issue I had was not being able to connect to the HDFS cluster correctly. Because of this problem, I wrote a separate HDFSClient program and found that the connection fails when running the program using the java command. Here is an example of my separate HDFSClient program and its usage:
After researching on the internet, I learned that this could be a bug in the Hadoop client code, as many people reported the same problem after following the Hadoop instructions and adding the configurations correctly. The HDFS home directory was still set to the local home directory, so any HDFS command performed would fail since the HDFS path doesn't exist on the local system. To solve this issue, we decided to run MASS Java using the Hadoop command (./bin/hadoop jar <jarfile> <args>) instead of "java -jar".
Next Step

As mentioned, I am trying to transfer the Hadoop setup from my personal account to the dslab account. If I can run MASS with multiple nodes using dslab, I will conduct my performance evaluation over 1, 4, and 8 nodes. Otherwise, I will discuss the issue with the research group and find out where the issue resides in MASS. After the evaluation, I will start my next development phase, System Integration, to integrate MASS HDFS with the UW Climate Analysis web application.