YARN Essentials
Table of Contents
YARN Essentials
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Need for YARN
The redesign idea
Limitations of the classical MapReduce or Hadoop 1.x
YARN as the modern operating system of Hadoop
What are the design goals for YARN
Summary
2. YARN Architecture
Core components of YARN architecture
ResourceManager
ApplicationMaster (AM)
NodeManager (NM)
YARN scheduler policies
The FIFO (First In First Out) scheduler
The fair scheduler
The capacity scheduler
Recent developments in YARN architecture
Summary
3. YARN Installation
Single-node installation
Prerequisites
Platform
Software
Starting with the installation
The standalone mode (local mode)
The pseudo-distributed mode
The fully-distributed mode
HistoryServer
Slave files
Operating Hadoop and YARN clusters
Starting Hadoop and YARN clusters
Stopping Hadoop and YARN clusters
Web interfaces of the Ecosystem
Summary
4. YARN and Hadoop Ecosystems
The Hadoop 2 release
A short introduction to Hadoop 1.x and MRv1
MRv1 versus MRv2
Understanding where YARN fits into Hadoop
Old and new MapReduce APIs
Backward compatibility of MRv2 APIs
Binary compatibility of org.apache.hadoop.mapred APIs
Source compatibility of org.apache.hadoop.mapred APIs
Practical examples of MRv1 and MRv2
Preparing the input file(s)
Running the job
Result
Summary
5. YARN Administration
Container allocation
Container allocation to the application
Container configurations
YARN scheduling policies
The FIFO (First In First Out) scheduler
The capacity scheduler
Capacity scheduler configurations
The fair scheduler
Fair scheduler configurations
YARN multitenancy application support
Administration of YARN
Administrative tools
Adding and removing nodes from a YARN cluster
Administrating YARN jobs
MapReduce job configurations
YARN log management
YARN web user interface
Summary
6. Developing and Running a Simple YARN Application
Running sample examples on YARN
Running a sample Pi example
Monitoring YARN applications with web GUI
YARN’s MapReduce support
The MapReduce ApplicationMaster
Example YARN MapReduce settings
YARN’s compatibility with MapReduce applications
Developing YARN applications
The YARN application workflow
Writing the YARN client
Writing the YARN ApplicationMaster
Responsibilities of the ApplicationMaster
Summary
7. YARN Frameworks
Apache Samza
Writing a Kafka producer
Writing the hello-samza project
Starting a grid
Storm-YARN
Prerequisites
Hadoop YARN should be installed
Apache ZooKeeper should be installed
Setting up Storm-YARN
Getting the storm.yaml configuration of the launched Storm cluster
Building and running Storm-Starter examples
Apache Spark
Why run on YARN?
Apache Tez
Apache Giraph
HOYA (HBase on YARN)
KOYA (Kafka on YARN)
Summary
8. Failures in YARN
ResourceManager failures
ApplicationMaster failures
NodeManager failures
Container failures
Hardware Failures
Summary
9. YARN – Alternative Solutions
Mesos
Omega
Corona
Summary
10. YARN – Future and Support
What YARN means to the big data industry
Journey – present and future
Present on-going features
Future features
YARN-supported frameworks
Summary
Index
YARN Essentials
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2015
Production reference: 1190215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-173-7
www.packtpub.com
Credits
Authors
Amol Fasale
Nirmal Kumar
Reviewers
Lakshmi Narasimhan
Swapnil Salunkhe
Jenny (Xiao) Zhang
Commissioning Editor
Taron Pereira
Acquisition Editor
James Jones
Content Development Editor
Arwa Manasawala
Technical Editor
Indrajit A. Das
Copy Editors
Karuna Narayanan
Laxmi Subramanian
Project Coordinator
Purav Motiwalla
Proofreaders
Safis Editing
Maria Gould
Indexer
Priya Sane
Graphics
Sheetal Aute
Valentina D’silva
Abhinash Sahu
Production Coordinator
Shantanu N. Zagade
Cover Work
Shantanu N. Zagade
About the Authors
Amol Fasale has more than 4 years of industry experience actively working in the fields of big data and distributed computing; he is also an active blogger in and contributor to the open source community. Amol works as a senior data system engineer at MakeMyTrip.com, a very well-known travel and hospitality portal in India, responsible for real-time personalization of online user experience with Apache Kafka, Apache Storm, Apache Hadoop, and many more. Also, Amol has active hands-on experience in Java/J2EE, Spring Frameworks, Python, machine learning, Hadoop framework components, SQL, NoSQL, and graph databases.
You can follow Amol on Twitter at @amolfasale or on LinkedIn. Amol is very active on social media. You can catch him online for any technical assistance; he would be happy to help.
Amol has completed his bachelor’s in engineering (electronics and telecommunication) from Pune University and postgraduate diploma in computers from CDAC.
The gift of love is one of the greatest blessings from parents, and I am heartily thankful to my mom, dad, friends, and colleagues who have shown and continue to show their support in different ways. Finally, I owe much to James and Arwa without whose direction and understanding, I would not have completed this work.
Nirmal Kumar is a lead software engineer at iLabs, the R&D team at Impetus Infotech Pvt. Ltd. He has more than 8 years of experience in open source technologies such as Java, JEE, Spring, Hibernate, web services, Hadoop, Hive, Flume, Sqoop, Kafka, Storm, NoSQL databases such as HBase and Cassandra, and MPP databases such as Teradata.
You can follow him on Twitter at @nirmal___kumar. He spends most of his time reading about and playing with different technologies. He has also undertaken many tech talks and training sessions on big data technologies.
He has attained his master’s degree in computer applications from Harcourt Butler Technological Institute (HBTI), Kanpur, India and is currently part of the big data R&D team in iLabs at Impetus Infotech Pvt. Ltd.
I would like to thank my organization, especially iLabs, for supporting me in writing this book. Also, a special thanks to the Packt Publishing team; without you guys, this work would not have been possible.
About the Reviewers
Lakshmi Narasimhan is a full stack developer who has been working on big data and search since the early days of Lucene and was a part of the search team at Ask.com. He is a big advocate of open source and regularly contributes and consults on various technologies, most notably Drupal and technologies related to big data. Lakshmi is currently working as the curriculum designer for his own training company, http://www.readybrains.com. He blogs occasionally about his technical endeavors at http://www.lakshminp.com and can be contacted via his Twitter handle, @lakshminp.
It’s hard to find a ready reference or documentation for a subject like YARN. I’d like to thank the author for writing a book on YARN and hope the target audience finds it useful.
Swapnil Salunkhe is a passionate software developer who is keenly interested in learning and implementing new technologies. He has a passion for functional programming, machine learning, and working with data. He has experience working in the finance and telecom domains.
I’d like to thank Packt Publishing and its staff for an opportunity to contribute to this book.
Jenny (Xiao) Zhang is a technology professional in business analytics, KPIs, and big data. She helps businesses better manage, measure, report, and analyze data to answer critical business questions and drive business growth. She is an expert in SaaS business and has experience in a variety of industry domains such as telecom, oil and gas, and finance. She has written a number of blog posts at http://jennyxiaozhang.com on big data, Hadoop, and YARN. She also actively uses Twitter at @smallnaruto to share insights on big data and analytics.
I want to thank all my blog readers. It is the encouragement from them that motivates me to deep dive into the ocean of big data. I also want to thank my dad, Michael (Tiegang) Zhang, for providing technical insights in the process of reviewing the book. A special thanks to the Packt Publishing team for this great opportunity.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
In a short span of time, YARN has attained a great deal of momentum and acceptance in the big data world.
YARN Essentials is about YARN, the modern operating system for Hadoop. This book contains all that you need to know about YARN, right from its inception to the present and future.
In the first part of the book, you will be introduced to the motivation behind the development of YARN and learn about its core architecture, installation, and administration. This part also talks about the architectural differences that YARN brings to Hadoop 2 with respect to Hadoop 1 and why this redesign was needed.
In the second part, you will learn how to write a YARN application, how to submit an application to YARN, and how to monitor the application. Next, you will learn about the various emerging open source frameworks that are developed to run on top of YARN. You will learn to develop and deploy some use case examples using Apache Samza and Storm YARN.
Finally, we will talk about the failures in YARN, some alternative solutions available on the market, and the future and support for YARN in the big data world.
What this book covers
Chapter 1, Need for YARN, discusses the motivation behind the development of YARN. This chapter discusses what YARN is and why it is needed.
Chapter 2, YARN Architecture, is a deep dive into YARN’s architecture. All the major components and their inner workings are explained in this chapter.
Chapter 3, YARN Installation, describes the steps required to set up a single-node and fully-distributed YARN cluster. It also talks about the important configurations/properties that you should be aware of while installing the YARN cluster.
Chapter 4, YARN and Hadoop Ecosystems, talks about Hadoop with respect to YARN. It gives a short introduction to the Hadoop 1.x version, the architectural differences between Hadoop 1.x and Hadoop 2.x, and where exactly YARN fits into Hadoop 2.x.
Chapter 5, YARN Administration, covers information on the administration of YARN clusters. It explains the administrative tools that are available in YARN, what they mean, and how to use them. This chapter covers various topics from YARN container allocation and configuration to various scheduling policies/configurations and in-built support for multitenancy.
Chapter 6, Developing and Running a Simple YARN Application, focuses on some real applications with YARN, with some hands-on examples. It explains how to write a YARN application, how to submit an application to YARN, and finally, how to monitor the application.
Chapter 7, YARN Frameworks, discusses the various emerging open source frameworks that are developed to run on top of YARN. The chapter then talks in detail about Apache Samza and Storm on YARN, where we will develop and run some sample applications using these frameworks.
Chapter 8, Failures in YARN, discusses the fault-tolerance aspect of YARN. This chapter focuses on various failures that can occur in the YARN framework, their causes, and how YARN gracefully handles those failures.
Chapter 9, YARN – Alternative Solutions, discusses other alternative solutions that are available on the market today. These systems, like YARN, share common inspiration/requirements and the high-level goal of improving scalability, latency, fault-tolerance, and programming model flexibility. This chapter highlights the key differences in the way these alternative solutions address the same features provided by YARN.
Chapter 10, YARN – Future and Support, talks about YARN’s journey and its present and future in the world of distributed computing.
What you need for this book
You will need a single Linux-based machine with JDK 1.6 or later installed. Any recent version of the Apache Hadoop 2 distribution will be sufficient to set up a YARN cluster and run some examples on top of YARN.
The code in this book has been tested on CentOS 6.4 but will run on other variants of Linux.
Who this book is for
This book is for big data enthusiasts who want to gain in-depth knowledge of YARN and know what really makes YARN the modern operating system for Hadoop. You will develop a good understanding of the architectural differences that YARN brings to Hadoop 2 with respect to Hadoop 1.
You will develop in-depth knowledge about the architecture and inner workings of the YARN framework.
After finishing this book, you will be able to install, administrate, and develop YARN applications. This book tells you everything you need to know about YARN, right from its inception to its present and future in the big data industry.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: “The URL for NameNode is http://<namenode_host>:<port>/ and the default HTTP port is 50070.”
A block of code is set as follows:
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
  <description>read and write buffer size of files</description>
</property>
Any command-line input or output is written as follows:
${path_to_your_input_dir}
${path_to_your_output_dir_old}
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: “Under the Tools section, you can find the YARN configuration file details, scheduling information, container configurations, local logs of the jobs, and a lot of other information on the cluster.”
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book’s title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
Questions
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Chapter 1. Need for YARN
YARN stands for Yet Another Resource Negotiator. YARN is a generic platform for managing resources in a typical cluster. It was introduced with Hadoop 2.0, an open source distributed processing framework from the Apache Software Foundation.
In 2012, YARN became one of the subprojects of the larger Apache Hadoop project. YARN is also known as MapReduce 2.0, since Apache Hadoop MapReduce has been re-architected from the ground up as Apache Hadoop YARN.
Think of YARN as a generic computing fabric to support MapReduce and other application paradigms within the same Hadoop cluster; earlier, this was limited to batch processing using MapReduce. This really changed the game, recasting Apache Hadoop as a much more powerful data processing system. With the advent of YARN, Hadoop now looks very different compared to the way it was only a year ago.
YARN enables multiple applications to run simultaneously on the same shared cluster and allows applications to negotiate resources based on need. Therefore, resource allocation/management is central to YARN.
YARN has been thoroughly tested at Yahoo! since September 2012. It has been in production across 30,000 nodes and 325 PB of data since January 2013.
Apache Hadoop YARN won the Best Paper Award at the ACM Symposium on Cloud Computing (SoCC) in 2013.
The redesign idea
Initially, Hadoop was written solely as a MapReduce engine. Since it runs on a cluster, its cluster management components were also tightly coupled with the MapReduce programming paradigm.
The concepts of MapReduce and its programming paradigm were so deeply ingrained in Hadoop that one could not use it for anything other than MapReduce. MapReduce therefore became the base for Hadoop, and as a result, the only thing that could be run on Hadoop was a MapReduce job, that is, batch processing. In Hadoop 1.x, there was a single JobTracker service that was overloaded with many things such as cluster resource management, scheduling jobs, managing computational resources, restarting failed tasks, monitoring TaskTrackers, and so on.
There was definitely a need to separate the MapReduce part (a specific programming model) from the resource management infrastructure in Hadoop. YARN was the first attempt to perform this separation.
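The separation described above can be pictured with a small toy model. This is only an illustrative sketch and not YARN's actual API: the class `ToyResourceManager`, its `request` method, and the application names are all hypothetical. The point it shows is that the resource layer only tracks capacity and grants, and never needs to know which programming model the requesting application uses.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster-wide resource manager (not YARN's API)."""
    free_memory_mb: int
    grants: dict = field(default_factory=dict)

    def request(self, app_name: str, memory_mb: int) -> bool:
        """Grant the request if enough memory is free; the application type is irrelevant."""
        if memory_mb > self.free_memory_mb:
            return False
        self.free_memory_mb -= memory_mb
        self.grants[app_name] = self.grants.get(app_name, 0) + memory_mb
        return True


# Three hypothetical applications with different programming models
# ask the same resource layer for memory.
rm = ToyResourceManager(free_memory_mb=8192)
results = [
    rm.request("mapreduce-job", 4096),
    rm.request("graph-job", 2048),
    rm.request("stream-job", 4096),  # refused: only 2048 MB remain
]
print(results, rm.free_memory_mb)
```

In this toy run, the first two requests are granted and the third is refused because it exceeds the remaining capacity; nothing in the decision depends on whether the caller is a MapReduce job or something else.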
Limitations of the classical MapReduce or Hadoop 1.x
The main limitations of Hadoop 1.x can be categorized into the following areas:
Limited scalability:
Large Hadoop clusters reported some serious limitations on scalability. This is caused mainly by a single JobTracker service, which ultimately results in a serious deterioration of the overall cluster performance because of attempts to re-replicate data and overload live nodes, thus causing a network flood. According to Yahoo!, the practical limits of such a design are reached with a cluster of ~5,000 nodes and 40,000 tasks running concurrently. Therefore, it is recommended that you create smaller and less powerful clusters for such a design.
Low cluster resource utilization:
The resources in Hadoop 1.x on each slave node (data node) are divided in terms of a fixed number of map and reduce slots. Consider the scenario where a MapReduce job has already taken up all the available map slots and now wants more new map tasks to run. In this case, it cannot run new map tasks, even though all the reduce slots are still empty. This notion of a fixed number of slots has a serious drawback and results in poor cluster utilization.
Lack of support for alternative frameworks/paradigms:
The main focus of Hadoop right from the beginning was to perform computation on large datasets using parallel processing. Therefore, the only programming model it supported was MapReduce. With the current industry needs in terms of new use cases in the world of big data, many new and alternative programming models (such as Apache Giraph, Apache Spark, Storm, Tez, and so on) are coming into the picture each day. There is definitely an increasing demand to support multiple programming paradigms besides MapReduce, to support the varied use cases that the big data world is facing.
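The low-utilization limitation listed above can be illustrated with a small sketch. It is not Hadoop code: the class names, the 8 GB node, and the 1 GB task size are all assumptions chosen for illustration. It contrasts a node with a fixed map/reduce slot split against a node that hands out generic memory-sized containers.

```python
from dataclasses import dataclass


@dataclass
class SlotNode:
    """A Hadoop 1.x-style node with a fixed split of map and reduce slots."""
    map_slots: int
    reduce_slots: int

    def runnable_map_tasks(self, pending_maps: int) -> int:
        # Map tasks may only occupy map slots; idle reduce slots are unusable.
        return min(pending_maps, self.map_slots)


@dataclass
class ContainerNode:
    """A YARN-style node that hands out generic memory-sized containers."""
    memory_mb: int

    def runnable_tasks(self, pending_tasks: int, task_memory_mb: int) -> int:
        # Any task type can use any free memory on the node.
        return min(pending_tasks, self.memory_mb // task_memory_mb)


# Hypothetical node: 8 GB of RAM, tasks needing 1 GB each.
slot_node = SlotNode(map_slots=4, reduce_slots=4)
container_node = ContainerNode(memory_mb=8192)

print(slot_node.runnable_map_tasks(pending_maps=8))           # 4
print(container_node.runnable_tasks(8, task_memory_mb=1024))  # 8
```

With eight pending map tasks, the slot-based node runs only four while its four reduce slots sit idle; the container-based node can run all eight.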
YARN as the modern operating system of Hadoop
The MapReduce programming model is, no doubt, great for many applications, but not for everything in the world of computation. Some use cases are best suited to MapReduce, but not all.
MapReduce is essentially batch-oriented, but support for real-time and near real-time processing is an emerging requirement in the field of big data.
YARN took cluster resource management capabilities out of the MapReduce system so that new engines could use these generic cluster resource management capabilities. This lightened the MapReduce system so that it can focus on the data processing part, which it is good at and will ideally continue to be.
YARN therefore turns into a data operating system for Hadoop 2.0, as it enables multiple applications to coexist in the same shared cluster. Refer to the following figure:
YARN as a modern OS for Hadoop
What are the design goals for YARN
This section talks about the core design goals of YARN:
Scalability:
Scalability is a key requirement for big data. Hadoop was primarily meant to work on a cluster of thousands of nodes with commodity hardware. Also, the cost of hardware is reducing year-on-year. YARN is therefore designed to perform efficiently on this network of a myriad of nodes.
High cluster utilization:
In Hadoop 1.x, the cluster resources were divided in terms of fixed-size slots for both map and reduce tasks. This means that there could be a scenario where map slots might be full while reduce slots are empty, or vice versa. This was definitely not an optimal utilization of resources, and it needed further optimization. YARN allocates fine-grained resources (containers) in terms of RAM, CPU, and disk, leading to an optimal utilization of the available resources.
Locality awareness:
This is a key requirement for YARN when dealing with big data; moving computation is cheaper than moving data. This helps to minimize network congestion and increase the overall throughput of the system.
Multitenancy:
With the core development of Hadoop at Yahoo!, primarily to support large-scale computation, HDFS also acquired a permission model, quotas, and other features to improve its multitenant operation. YARN was therefore designed to support multitenancy in its core architecture. Since cluster resource allocation/management is at the heart of YARN, sharing processing and storage capacity across clusters was central to the design. YARN has the notion of pluggable schedulers, and the Capacity Scheduler with YARN has been enhanced to provide a flexible resource model, elastic computing, application limits, and other necessary features that enable multiple tenants to securely share the cluster in an optimized way.
Support for programming model:
The MapReduce programming model is no doubt great for many applications, but not for everything in the world of computation. As the world of big data is still in its inception phase, organizations are heavily investing in R&D to develop new and evolving frameworks to solve a variety of problems that big data brings.
A flexible resource model:
Besides the mismatch with the emerging frameworks’ requirements, the fixed number of slots for resources had serious problems. It was therefore natural for YARN to come up with a flexible and generic resource management model.
A secure and auditable operation:
As Hadoop continued to grow to manage more tenants with a myriad of use cases across different industries, the requirements for isolation became more demanding. Also, the authorization model lacked strong and scalable authentication. This is because Hadoop was designed with parallel processing in mind, with no comprehensive security. Security was an afterthought. YARN takes this into account and includes security-related requirements in its design.
Reliability/availability:
Although fault tolerance is in the core design, in reality maintaining a large Hadoop cluster is a tedious task. All issues related to high availability, failures, failures on restart, and reliability were therefore a core requirement for YARN.
Backward compatibility:
Hadoop 1.x has been in the picture for a while, with many successful production deployments across many industries. This massive installation base of MapReduce applications and the ecosystem of related projects, such as Hive, Pig, and so on, would not tolerate a radical redesign. Therefore, the new architecture reused as much code from the existing framework as possible, and no major surgery was conducted on it. This made MRv2 able to ensure satisfactory compatibility with MRv1 applications.
Summary
In this chapter, you learned what YARN is and how it has turned out to be the modern operating system for Hadoop, making it a multiapplication platform.
In Chapter 2, YARN Architecture, we will be talking about the architecture details of YARN.
Chapter 2. YARN Architecture
This chapter dives deep into YARN architecture, its core components, and how they interact to deliver optimal resource utilization, better performance, and manageability. It also focuses on some important terminology concerning YARN.
In this chapter, we will cover the following topics:
Core components of YARN architecture
Interaction and flow of YARN components
ResourceManager scheduling policies
Recent developments in YARN
The motivation behind the YARN architecture is to support more data processing models than just MapReduce, such as Apache Spark, Apache Storm, Apache Giraph, Apache HAMA, and so on. YARN provides a platform to develop and execute distributed processing applications. It also improves efficiency and resource-sharing capabilities.
The design decision behind YARN architecture is to separate the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons, that is, a cluster-level ResourceManager (RM) and an application-specific ApplicationMaster (AM). YARN architecture follows a master-slave architectural model in which the ResourceManager is the master and the node-specific NodeManager (NM) is the slave. The global ResourceManager and the per-node NodeManagers build a generic, scalable, and simple platform for distributed application management. The ResourceManager is the supervisor component that manages the resources among the applications in the whole system. The per-application ApplicationMaster is the application-specific daemon that negotiates resources from the ResourceManager and works hand in hand with the NodeManagers to execute and monitor the application’s tasks.
The following diagram explains how the JobTracker is replaced by a global-level ResourceManager (with its ApplicationsManager) and a per-application ApplicationMaster, while the per-node TaskTracker is replaced by the per-node NodeManager. JobTracker and TaskTracker only support MapReduce applications, with less scalability and poor cluster utilization. Now, YARN supports multiple distributed data processing models with improved scalability and cluster utilization.
The ResourceManager has a cluster-level scheduler that is responsible for allocating resources to all the running tasks as per the ApplicationMaster’s requests. The primary responsibility of the ResourceManager is to allocate resources to the application(s). The ResourceManager is not responsible for tracking the status of an application or monitoring tasks. Also, it doesn’t guarantee restarting/balancing tasks in the case of application or hardware failure.
The application-level ApplicationMaster is responsible for negotiating resources, such as memory, CPU, disk, and so on, from the ResourceManager on application submission. It is also responsible for tracking an application’s status and monitoring application processes in coordination with the NodeManager.
Let’s have a look at the high-level architecture of Hadoop 2.0. As you can see, more applications can be supported by YARN than just the MapReduce application. The key component of Hadoop 2 is YARN, for better cluster resource management, and the underlying filesystem remains the Hadoop Distributed File System (HDFS), as shown in the following image:
Here are some key concepts that we should know before exploring the YARN architecture in detail:
Application: This is the job submitted to the framework, for example a MapReduce job. It could also be a shell script.
Container: This is the basic unit of hardware allocation, for example a container that has 4 GB of RAM and one CPU. Containers allow resources to be allocated at a finer granularity; they replace the fixed map and reduce slots in the previous versions of Hadoop.
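To make the container concept concrete, here is a minimal sketch of a node granting containers until its resources run out. The `Container` and `Node` classes are hypothetical stand-ins, not YARN's own classes, and the 10 GB / 4-core node is an assumed example; only the 4 GB / one CPU container size comes from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Container:
    """A bundle of resources granted on one node (illustrative, not the YARN class)."""
    memory_mb: int
    vcores: int


@dataclass
class Node:
    free_memory_mb: int
    free_vcores: int
    containers: List[Container] = field(default_factory=list)

    def allocate(self, memory_mb: int, vcores: int) -> Optional[Container]:
        """Grant a container if the node still has enough free resources."""
        if memory_mb > self.free_memory_mb or vcores > self.free_vcores:
            return None
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        container = Container(memory_mb, vcores)
        self.containers.append(container)
        return container


node = Node(free_memory_mb=10240, free_vcores=4)
first = node.allocate(memory_mb=4096, vcores=1)   # the 4 GB / one CPU example
second = node.allocate(memory_mb=4096, vcores=1)
third = node.allocate(memory_mb=4096, vcores=1)   # only 2 GB left, so refused
```

Unlike a fixed slot, the size of each container is chosen per request, so a node can host a mix of small and large containers as long as the totals fit.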
Core components of YARN architecture
Here are some core components of YARN architecture that we need to know:
ResourceManager
ApplicationMaster
NodeManager
ResourceManager
The ResourceManager acts as a global resource scheduler that is responsible for resource management and scheduling as per the ApplicationMaster’s requests for the resource requirements of the application(s). It is also responsible for the management of hierarchical job queues. The ResourceManager can be seen in the following figure:
The preceding diagram gives more details about the components of the ResourceManager. The Admin and Client service is responsible for client interactions, such as job request submission, start, restart, and so on. The ApplicationsManager is responsible for the management of every application. The ApplicationMasterService interacts with every application’s ApplicationMaster regarding resource or container negotiation, and the ResourceTrackerService coordinates between the NodeManagers and the ResourceManager. The ApplicationMasterLauncher service is responsible for launching a container for the ApplicationMaster on job submission from the client. The Scheduler and Security are the core parts of the ResourceManager. As already explained, the Scheduler is responsible for resource negotiation and allocation to the applications as per the request of the ApplicationMaster. There are three different scheduler policies, FIFO, Fair, and Capacity, which will be explained in detail later in this chapter. The security component is responsible for generating and delegating the ApplicationToken and ContainerToken used to access the application and container, respectively.
ApplicationMaster (AM)
The ApplicationMaster operates at a per-application level. It is responsible for the application’s life cycle management and for negotiating the appropriate resources from the Scheduler, tracking their status, and monitoring progress. An example is the MapReduce ApplicationMaster.
NodeManager (NM)
The NodeManager acts as a per-machine agent and is responsible for managing the life cycle of containers and for monitoring their resource usage. The core components of the NodeManager are shown in the following diagram:
The component responsible for communication between the NodeManager and the ResourceManager is the NodeStatusUpdater. The ContainerManager is the core component of the NodeManager; it manages all the containers that run on the node. The NodeHealthCheckerService is the service that monitors the node’s health and communicates the node’s heartbeat to the ResourceManager via the NodeStatusUpdater service. The ContainerExecutor is the process responsible for interacting with native hardware or software to start or stop the container process. Management of Access Control Lists (ACLs) and access token verification is performed by the Security component.
Let’s take a look at one scenario to understand YARN architecture in detail. Refer to the following diagram:
Say we have two client requests: one wants to execute a simple shell script, while the other wants to execute a complex MapReduce job. The shell script is represented in maroon, while the MapReduce job is represented in light green in the preceding diagram.
The ResourceManager has two main components, the ApplicationsManager and the Scheduler. The ApplicationsManager is responsible for accepting the client’s job submission requests, negotiating the first container in which the application-specific ApplicationMaster executes, and providing the service to restart the ApplicationMaster on failure. The responsibility of the Scheduler is to allocate resources to the various running applications with respect to the application resource requirements and available resources. The Scheduler is a pure scheduler in the sense that it provides no monitoring or tracking functions for the application. Also, it doesn’t offer any guarantees for restarting a failed task, whether the failure is in the application or in the hardware. The Scheduler performs its scheduling tasks based on the resource requirements of the application(s); it does so based on the abstract notion of the resource container, which incorporates elements such as CPU, memory, disk, and so on.
The NodeManager is the per-machine framework daemon that is responsible for the containers’ life cycles. It is also responsible for monitoring their resource usage, for example, memory, CPU, disk, network, and so on, and for reporting this to the ResourceManager accordingly. The application-level ApplicationMaster is responsible for negotiating the required resource containers from the scheduler, tracking their status, and monitoring progress. In the preceding diagram, you can see that both jobs, the shell script and MapReduce, each have an individual ApplicationMaster that negotiates resources for job execution and tracks/monitors the job execution status.
Now, take a look at the execution sequence of the application. Refer to the preceding application flow diagram.
A client submits the application to the ResourceManager. In the preceding diagram, client 1 submits a shell script request (maroon), and client 2 submits a MapReduce request (green):
1. Then, the ResourceManager allocates a container to start up the ApplicationMaster for each application submitted by a client: one ApplicationMaster for the shell script and one for the MapReduce application.
2. On startup, the ApplicationMaster registers the application with the ResourceManager.
3. After the startup of the ApplicationMaster, it negotiates with the ResourceManager for appropriate resources as per the application requirement.
4. Then, after resource allocation from the ResourceManager, the ApplicationMaster requests that the NodeManager launch the containers allocated by the ResourceManager.
5. On successful launching of the containers, the application code executes within the container, and the ApplicationMaster reports back to the ResourceManager with the execution status of the application.
6. During the execution of the application, the client can query the ApplicationMaster or the ResourceManager directly for the application status, progress updates, and so on.
7. On completion of the application, the ApplicationMaster unregisters from the ResourceManager and shuts down its own container process.
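The sequence above can be traced with a toy simulation. This is only a sketch: the two classes below are simplified stand-ins that record events in a list, the method names are invented for illustration, and no real YARN API is used.

```python
from typing import List


class ResourceManager:
    """Toy stand-in for the ResourceManager; it only records events."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.registered: set = set()

    def submit(self, app_name: str, containers_needed: int) -> "ApplicationMaster":
        self.log.append(f"client submits {app_name}")
        # Step 1: allocate a container in which the ApplicationMaster starts.
        self.log.append(f"RM allocates AM container for {app_name}")
        return ApplicationMaster(app_name, containers_needed, self)

    def allocate(self, app_name: str, count: int) -> List[str]:
        # Step 3: hand out containers requested by a registered AM.
        assert app_name in self.registered, "AM must register first"
        self.log.append(f"RM grants {count} container(s) to {app_name}")
        return [f"{app_name}-container-{i}" for i in range(1, count + 1)]


class ApplicationMaster:
    def __init__(self, app_name: str, containers_needed: int,
                 rm: ResourceManager) -> None:
        self.app_name = app_name
        self.containers_needed = containers_needed
        self.rm = rm

    def run(self) -> None:
        # Step 2: register with the ResourceManager.
        self.rm.registered.add(self.app_name)
        self.rm.log.append(f"AM for {self.app_name} registers")
        # Steps 3-4: negotiate containers, then ask NodeManagers to launch them.
        for container in self.rm.allocate(self.app_name, self.containers_needed):
            self.rm.log.append(f"NM launches {container}")
        # Step 5: report status; step 7: unregister and shut down.
        self.rm.log.append(f"AM for {self.app_name} reports FINISHED")
        self.rm.registered.discard(self.app_name)
        self.rm.log.append(f"AM for {self.app_name} unregisters")


rm = ResourceManager()
rm.submit("shell-script", containers_needed=1).run()
rm.submit("mapreduce", containers_needed=2).run()
for event in rm.log:
    print(event)
```

Step 6 (status queries from the client) is left out to keep the trace linear; in this sketch it would amount to reading `rm.log` while an application is running.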
YARN scheduler policies
As explained in the previous section, the ResourceManager acts as a pluggable global scheduler that manages and controls all the containers (resources). There are three different policies that can be applied over the scheduler, as per requirements and resource availability. They are as follows:
The FIFO scheduler
The Fair scheduler
The Capacity scheduler
The FIFO (First In First Out) scheduler
FIFO means First In First Out. As the name indicates, the job submitted first gets priority to execute; in other words, jobs run in the order of submission. FIFO is a queue-based scheduler. It is a very simple approach to scheduling, and it does not guarantee performance efficiency, as each job would use the whole cluster for execution. Other jobs may therefore keep waiting for their turn, even though a shared cluster is capable of offering more than enough resources to many users.
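The waiting behaviour of FIFO can be sketched in a few lines. This is an illustrative model, not the YARN FIFO scheduler itself; the job names and durations are made up, and each job is assumed to occupy the whole cluster while it runs.

```python
from collections import deque
from typing import Deque, List, Tuple


def fifo_schedule(jobs: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Run jobs one at a time in submission order.

    Each job is (name, duration). Returns (name, start_time, finish_time).
    """
    queue: Deque[Tuple[str, int]] = deque(jobs)
    clock = 0
    timeline = []
    while queue:
        name, duration = queue.popleft()   # oldest submission first
        timeline.append((name, clock, clock + duration))
        clock += duration                  # the job holds the whole cluster
    return timeline


# Hypothetical workload: a long job submitted just before a short one.
print(fifo_schedule([("long-etl", 60), ("short-query", 2)]))
# [('long-etl', 0, 60), ('short-query', 60, 62)]
```

The two-unit query finishes at time 62 because it has to wait for the 60-unit job ahead of it, which is the drawback the other two policies address.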
The fair scheduler
Fair scheduling is the policy of scheduling that assigns resources for the execution of applications so that all applications get an equal share of cluster resources over a period of time. For example, if a single job is running, it would get all the resources available in the cluster, and as the number of jobs increases, free resources will be given to the new jobs so that each user gets a fair share of the cluster. If two users have submitted two different jobs, a short job that belongs to one user would complete in a small time span while a longer job submitted by the other user keeps running, so long jobs will still make some progress.
In a Fair scheduling policy, all jobs are placed into job pools, specific to users; accordingly, each user gets their own job pool. A user who submits more jobs than another user will not, on average, get more resources than that user. You may even define your own customized job pools with specified configurations. Fair scheduling is preemptive: if a pool has not received its fair share of resources to run a particular task for a certain period of time, the scheduler will kill tasks in pools that are running over capacity in order to release resources to the pools that are running under capacity.
In addition to fair scheduling, the Fair scheduler allocates a guaranteed minimum share of resources to the pools. This is always helpful for the users, groups, or applications, as they always get sufficient resources for execution.
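One way to picture fair shares with guaranteed minimums is a max-min ("water-filling") calculation. The sketch below is a simplified model and not the Fair scheduler's actual implementation, which also handles weights, hierarchies, and preemption timing. The function name and pool names are hypothetical, and it assumes the minimum shares together do not exceed the cluster capacity.

```python
from typing import Dict, Optional


def fair_shares(capacity: float, demands: Dict[str, float],
                min_shares: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Split capacity so every pool gets a common level, raised to its
    minimum share and capped by its demand (max-min fairness)."""
    min_shares = min_shares or {}

    def share_at(level: float, pool: str) -> float:
        # A pool gets the common level, raised to its minimum share,
        # but never more than it actually demands.
        return min(demands[pool], max(min_shares.get(pool, 0.0), level))

    # If everything fits, every pool simply gets what it asked for.
    if sum(demands.values()) <= capacity:
        return {p: float(d) for p, d in demands.items()}

    lo, hi = 0.0, float(capacity)
    for _ in range(60):                      # bisection on the common level
        mid = (lo + hi) / 2
        if sum(share_at(mid, p) for p in demands) > capacity:
            hi = mid
        else:
            lo = mid
    return {p: share_at(lo, p) for p in demands}


# Hypothetical cluster of 100 units shared by two user pools.
print(fair_shares(100, {"alice": 100, "bob": 100}))
print(fair_shares(100, {"alice": 100, "bob": 10}))
print(fair_shares(100, {"alice": 100, "bob": 100}, min_shares={"bob": 70}))
```

In the first call the two pools split the cluster evenly; in the second, bob's small demand is met in full and alice receives the remainder; in the third, bob's guaranteed minimum of 70 takes precedence over an even split.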
The capacity scheduler
The Capacity scheduler is designed to allow applications to share cluster resources in a predictable and simple fashion, through what are commonly known as “job queues”. The main idea behind capacity scheduling is to allocate available resources to the running applications, based on individual needs and requirements. There are additional benefits when running applications using capacity scheduling, as they can access excess capacity that is not being used by any other applications.
The abstraction provided by the capacity scheduler is the queue. It supports multiple queues with capacity guarantees: a job is submitted to a queue, and each queue is allocated a capacity in the sense that a certain share of resources will be at its disposal. All the jobs submitted to a queue will access the resources allocated to that queue. Admins can control the capacity of each queue.
Here are some basic features of the capacity scheduler:
Security: Each queue has strict ACLs that control the authorization and authentication of users who can submit jobs to individual queues.
Elasticity: Free resources can be allocated to any queue beyond its capacity. If there is demand for these resources from queues that run below capacity, then as soon as the tasks scheduled on these resources have completed, the resources will be assigned to jobs on queues that run under capacity.
Operability: The admin can, at any point in time, change queue definitions and properties.
Multitenancy: A set of limits is provided to prevent a single job, user, or queue from monopolizing the resources of the queue or cluster. This ensures that the system is not overwhelmed by too many tasks, which was a particular concern in previous versions of Hadoop.
Resource-based scheduling: Resource-intensive jobs are supported, as jobs can specifically demand higher resource requirements than the default.
Job priorities: These job queues can support job priorities. Within the queue, jobs with high priority have access to resources before jobs with lower priority.
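The capacity guarantee and the elasticity feature listed above can be sketched as a two-pass allocation. This is a simplified illustration and not the Capacity scheduler's real algorithm; the function name, the queue names, and the 60/40 split are assumptions made for the example.

```python
from typing import Dict


def allocate_with_elasticity(cluster: int, capacity_pct: Dict[str, int],
                             demand: Dict[str, int]) -> Dict[str, int]:
    """Give each queue up to its guaranteed capacity, then lend idle resources."""
    guaranteed = {q: cluster * pct // 100 for q, pct in capacity_pct.items()}
    # Pass 1: every queue gets what it asks for, up to its guarantee.
    granted = {q: min(demand.get(q, 0), guaranteed[q]) for q in capacity_pct}
    idle = cluster - sum(granted.values())
    # Pass 2: idle resources go to queues that still have unmet demand.
    for queue in capacity_pct:
        extra = min(idle, demand.get(queue, 0) - granted[queue])
        granted[queue] += extra
        idle -= extra
    return granted


# Hypothetical cluster of 100 containers split 60/40 between two queues.
print(allocate_with_elasticity(100, {"prod": 60, "dev": 40},
                               {"prod": 20, "dev": 70}))
# {'prod': 20, 'dev': 70}
```

Here `dev` borrows 30 containers that `prod` is not using. If `prod` later demanded its full 60, the borrowed containers would return to it as the tasks running on them complete, as described under Elasticity.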
Recent developments in YARN architecture
The ResourceManager is a single point of failure, and it may need to restart for various reasons: bugs, hardware failure, deliberate downtime for upgrading, and so on.
We already saw how crucial the role of the ResourceManager in YARN architecture is. Because the ResourceManager is a single point of failure, if the ResourceManager in a cluster goes down, everything on that cluster will be lost.
So in a recent development of YARN, ResourceManager HA became a high priority. This recent development of YARN not only covers ResourceManager HA, but also provides transparency to users and does not require them to monitor such events explicitly and resubmit the jobs.
Recovery was overly complex in MRv1 because the JobTracker had to save too much metadata: both cluster state and per-application running state. This means that if the JobTracker dies, then all the applications in a running state will be lost.
The development of ResourceManager recovery will be done in two phases:
1. RM Restart Phase I: In this phase, all the applications will be killed while restarting the ResourceManager on failure. No state of the application can be stored. Development of this phase is almost completed.
2. RM Restart Phase II: In Phase II, application state is preserved across an RM failure; this means that applications are not killed, and they report their running state back to the RM after the RM comes back up.
The ResourceManager will be used only to save an application’s submission metadata and cluster-level information. Application state persistence and the recovery of specific information will be managed by the application itself.
As shown in the preceding diagram, in the next version, we will get a pluggable state store, such as ZooKeeper or HDFS, that can store the state of the running applications. ResourceManager HA would use a synchronized active-passive ResourceManager architectural model managed by ZooKeeper; when one goes down, the other can take over cluster responsibility without halting and without losing information.
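The idea of a pluggable state store can be sketched as an interface with interchangeable backends. Everything in this sketch is hypothetical: the `StateStore` interface, the in-memory backend standing in for ZooKeeper or HDFS, and the toy `ResourceManager` are illustrative only and do not reflect YARN's actual classes.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StateStore(ABC):
    """Hypothetical interface for a pluggable ResourceManager state store."""

    @abstractmethod
    def save(self, app_id: str, submission: dict) -> None: ...

    @abstractmethod
    def load_all(self) -> Dict[str, dict]: ...


class InMemoryStateStore(StateStore):
    """Stand-in for a ZooKeeper- or HDFS-backed store; it outlives RM objects only."""

    def __init__(self) -> None:
        self._apps: Dict[str, dict] = {}

    def save(self, app_id: str, submission: dict) -> None:
        self._apps[app_id] = dict(submission)

    def load_all(self) -> Dict[str, dict]:
        return {app_id: dict(meta) for app_id, meta in self._apps.items()}


class ResourceManager:
    def __init__(self, store: StateStore) -> None:
        self.store = store
        # On (re)start, recover whatever submissions the store still holds.
        self.apps = store.load_all()

    def submit(self, app_id: str, submission: dict) -> None:
        self.store.save(app_id, submission)   # persist before acknowledging
        self.apps[app_id] = submission


store = InMemoryStateStore()
rm = ResourceManager(store)
rm.submit("app_0001", {"name": "wordcount", "queue": "default"})

rm = ResourceManager(store)    # simulate an RM restart against the same store
print(sorted(rm.apps))         # ['app_0001']
```

Only submission metadata is stored, in line with the division of responsibility described earlier: per-application running state is left to the application itself.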
Summary
In this chapter, we covered the architectural components of YARN, their responsibilities, and their interoperations. We also focused on some major development work going on in the community to overcome the drawbacks of the current release. In the next chapter, we will cover the installation steps of YARN.
Chapter 3. YARN Installation
In this chapter, we’ll cover the installation of Hadoop and YARN and their configuration for single-node and cluster setups. Now, we will consider Hadoop as two different components: one is the Hadoop Distributed File System (HDFS), the other is YARN. The YARN components take care of resource allocation and the scheduling of the jobs that run over the data stored in HDFS. We’ll cover most of the configurations that make YARN distributed computing more optimized and efficient.
In this chapter, we will cover the following topics:
Hadoop and YARN single-node installation
Hadoop and YARN fully-distributed mode installation
Operating Hadoop and YARN clusters
Single-node installation
Let’s start with the steps for Hadoop’s single-node installation, as it’s easy to understand and set up. This way, we can quickly perform simple operations using Hadoop MapReduce and HDFS.
Prerequisites
Here are some prerequisites for Hadoop installation; make sure that they are fulfilled before you start working with Hadoop and YARN.
Platform
GNU/Linux is supported for Hadoop installation as a development as well as a production platform. The Windows platform is also supported for Hadoop installation, with some extra configuration. Here, we’ll focus on Linux-based platforms, as Hadoop is more widely used on these platforms and works more efficiently with Linux compared to Windows systems. The steps that follow are for single-node Hadoop installation on Linux systems. If you want to install it on Windows, refer to the Hadoop wiki page for the installation steps.
Software
Make sure that the following software is installed before installing Hadoop:
Java must be installed. Confirm whether the Java version is compatible with the Hadoop version that is to be installed by checking the Hadoop wiki page (http://wiki.apache.org/hadoop/HadoopJavaVersions).
SSH must be installed and SSHD must be running, as they are used by Hadoop scripts to manage remote Hadoop daemons.
Now, download the recent stable release of the Hadoop distribution from Apache mirrors and archives using the following command:
$ wget http://mirrors.ibiblio.org/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Note that at the time of writing this book, Hadoop 2.6.0 is the most recent stable release. Now use the following commands:
$ mkdir -p /opt/yarn
$ cd /opt/yarn
$ tar xvzf /root/hadoop-2.6.0.tar.gz
Starting with the installation
Now, unzip the downloaded distribution under the /etc/ directory. Change the Hadoop environmental parameters as per the following configurations.
Set the JAVA_HOME environmental parameter to the root of the Java installation from the previous step:
$ export JAVA_HOME=/etc/java/latest
Set the Hadoop home to the Hadoop installation directory:
$ export HADOOP_HOME=/etc/hadoop
Try running the hadoop command. It should display the Hadoop usage documentation; this indicates a successful Hadoop configuration.
Now, our Hadoop single-node setup is ready to run in the following modes.
The standalone mode (local mode)
By default, Hadoop runs in standalone mode as a single Java process. This mode is useful for development and debugging.
The pseudo-distributed mode
Hadoop can run on a single node in pseudo-distributed mode, in which each daemon runs as a separate Java process. To run Hadoop in pseudo-distributed mode, follow these configuration instructions. First, open /etc/hadoop/core-site.xml.
This configuration sets up the NameNode to run on localhost, port 9000. You can set the following property for the NameNode:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
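The site files all share the same shape: a configuration element holding property entries, each with a name and a value. As a rough illustration only, the following plain-Java sketch reads such a file into a map with the JDK's XML parser. The class name SiteConfigReader and its parse method are hypothetical helpers, not part of Hadoop; in a real cluster, Hadoop's own configuration classes do this work.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not a Hadoop class.
class SiteConfigReader {

    /** Reads every <property><name/><value/></property> entry into a map. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            // Malformed XML or a property without name/value ends up here.
            throw new IllegalArgumentException("Not a valid site configuration", e);
        }
    }

    public static void main(String[] args) {
        String coreSite = "<configuration><property>"
                + "<name>fs.defaultFS</name><value>hdfs://localhost:9000</value>"
                + "</property></configuration>";
        System.out.println(parse(coreSite).get("fs.defaultFS"));
    }
}
```

Reading the file back this way is a quick check that a hand-edited site file is well-formed and that the property name is spelled as intended.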
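Now navigate to /etc/hadoop/hdfs-site.xml.
By setting the following property, we are ensuring that the replication factor of each data block is 3 (which is also the default replication factor). Note that a single-node setup has only one DataNode, so only one replica of each block can actually be stored; a value of 1 is the usual choice there: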
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
Then, format the Hadoop filesystem using this command:
$ $HADOOP_HOME/bin/hdfs namenode -format
After formatting the filesystem, start the NameNode and DataNode daemons using the next command. You can see the logs under the $HADOOP_HOME/logs directory by default:
$ $HADOOP_HOME/sbin/start-dfs.sh
Now, we can see the NameNode UI on the web interface. Open http://localhost:50070/ in the browser.
Create the HDFS directories that are required to run MapReduce jobs:
$ $HADOOP_HOME/bin/hdfs dfs -mkdir /user
$ $HADOOP_HOME/bin/hdfs dfs -mkdir /user/<username>
To run a MapReduce job on YARN in pseudo-distributed mode, you need to start the ResourceManager and NodeManager daemons. First, navigate to /etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Navigate to /etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Now, start the ResourceManager and NodeManager daemons by issuing this command:
$ $HADOOP_HOME/sbin/start-yarn.sh
By navigating to http://localhost:8088/ in your browser, you can see the web interface for the ResourceManager. From here, you can monitor the applications submitted to the cluster and their status.
To stop the YARN daemons, you need to run the following command:
$ $HADOOP_HOME/sbin/stop-yarn.sh
This is how we can configure Hadoop and YARN on a single node in standalone and pseudo-distributed modes. Moving forward, we will focus on fully-distributed mode. As the basic configuration remains the same, we only need to do some extra configuration for fully-distributed mode. Single-node setup is mainly used for development and debugging of distributed applications, while fully-distributed mode is used for the production setup.
The fully-distributed mode
In the previous section, we highlighted the single-node Hadoop and YARN configurations, and in this section we’ll focus on the fully-distributed mode setup. This section describes how to install, configure, and manage Hadoop and YARN in fully-distributed, very large clusters with thousands of nodes in them.
In order to start with fully-distributed mode, we first need to download the stable version of Hadoop from the Apache mirrors. Installing Hadoop in distributed mode generally means unpacking the software distribution on each machine in the cluster or installing RPM packages. As Hadoop follows a master-slave architecture, one machine in the cluster is designated as the NameNode (NN), one as the ResourceManager (RM), and the rest of the machines, DataNodes (DN) and NodeManagers (NM), typically act as slaves.
After successfully unpacking the software distribution on each cluster machine or installing the RPMs, you need to take care of a very important part of the Hadoop installation phase: Hadoop configuration.
Hadoop typically has two types of configuration: one is the read-only default configuration (core-default.xml, hdfs-default.xml, yarn-default.xml, and mapred-default.xml), while the other is the site-specific configuration (core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml). All these files are found under the $HADOOP_HOME/conf directory.
In addition to the preceding configuration files, the Hadoop-environment and YARN-environment specific settings are found in conf/hadoop-env.sh and conf/yarn-env.sh. For the Hadoop and YARN cluster configuration, you need to set up an environment in which the Hadoop daemons can execute. The Hadoop/YARN daemons are the NameNode/ResourceManager (masters) and the DataNode/NodeManager (slaves).
First, make sure that JAVA_HOME is correctly specified on each node.
Here are the environment variables used to pass options to each daemon:
NameNode: HADOOP_NAMENODE_OPTS
DataNode: HADOOP_DATANODE_OPTS
SecondaryNameNode: HADOOP_SECONDARYNAMENODE_OPTS
ResourceManager: YARN_RESOURCEMANAGER_OPTS
NodeManager: YARN_NODEMANAGER_OPTS
WebAppProxy: YARN_PROXYSERVER_OPTS
MapReduce JobHistory Server: HADOOP_JOB_HISTORYSERVER_OPTS
For example, to run the NameNode in parallel GC mode, the following line should be added to hadoop-env.sh:
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"
Here are some important configuration parameters with respect to each daemon and its configuration files.
Navigate to conf/core-site.xml and configure it as follows:
fs.defaultFS: The NameNode URI, hdfs://<hdfshost>:<hdfsport>
<property>
<name>fs.defaultFS</name>
<value>hdfs://<hdfshost>:<hdfsport></value>
<description>The NameNode URI</description>
</property>
io.file.buffer.size: 4096, the read and write buffer size of files.
This is the buffer size for I/O (read/write) operations on sequence files stored in disk files; that is, it determines how much data is buffered in I/O pipes before being transferred to other operations during read/write operations. It should be a multiple of the OS filesystem block size.
<property>
<name>io.file.buffer.size</name>
<value>4096</value>
<description>Read and write buffer size of files</description>
</property>
Now navigate to conf/hdfs-site.xml. Here is the configuration for the NameNode:
| Parameter | Description |
| --- | --- |
| dfs.namenode.name.dir | The path on the local filesystem where the NameNode stores the namespace and transaction logs. |
| dfs.namenode.hosts | The list of permitted DataNodes. |
| dfs.namenode.hosts.exclude | The list of excluded DataNodes. |
| dfs.blocksize | A value of 268435456 sets the HDFS block size to 256 MB, which suits large filesystems. The shipped default in Hadoop 2.x is 134217728 (128 MB). |
| dfs.namenode.handler.count | A value of 100 provides more NameNode server threads to handle RPCs from a large number of DataNodes. The shipped default is 10. |
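To get a feel for what dfs.blocksize and dfs.replication mean in practice, here is a small arithmetic sketch in plain Java. It assumes that a file is simply cut into fixed-size blocks, with the last block holding the remainder, and that every block is stored dfs.replication times. The class BlockMath and its method names are made up for this illustration.

```java
// Hypothetical helper for illustration; not a Hadoop class.
class BlockMath {

    /** Number of HDFS blocks needed for a file: ceil(fileBytes / blockBytes). */
    static long blocksFor(long fileBytes, long blockBytes) {
        if (fileBytes < 0 || blockBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    /** Raw bytes stored across the cluster once each block is replicated. */
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long blockSize = 268435456L;  // the 256 MB value from the table
        System.out.println("blocks: " + blocksFor(oneGiB, blockSize));
        System.out.println("raw bytes at replication 3: " + rawBytes(oneGiB, 3));
    }
}
```

A larger block size means fewer blocks per file, and therefore fewer block entries for the NameNode to track in memory, which is one reason larger values are suggested for large filesystems.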
The configuration for the DataNode is as follows:
| Parameter | Description |
| --- | --- |
| dfs.datanode.data.dir | A comma-delimited list of paths on the local filesystems where the DataNode stores its blocks. |
Now navigate to conf/yarn-site.xml. We’ll take a look at the configurations related to both the ResourceManager and NodeManager:
| Parameter | Description |
| --- | --- |
| yarn.acl.enable | Set to true or false to enable or disable ACLs. The default value is false. |
| yarn.admin.acl | The admin ACL, which sets the admins on the cluster. The default is *, which means anyone can do admin tasks. This can be a comma-delimited list of users and groups to set more than one admin. |
| yarn.log-aggregation-enable | Set to true or false to enable or disable log aggregation. |
Now, we will take a look at the configurations for the ResourceManager in the conf/yarn-site.xml file:
| Parameter | Description |
| --- | --- |
| yarn.resourcemanager.address | The ResourceManager host:port for clients to submit jobs. |
| yarn.resourcemanager.scheduler.address | The ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. |
| yarn.resourcemanager.resource-tracker.address | The ResourceManager host:port for NodeManagers. |
| yarn.resourcemanager.admin.address | The ResourceManager host:port for administrative commands. |
| yarn.resourcemanager.webapp.address | The ResourceManager web UI host:port. |
| yarn.resourcemanager.scheduler.class | The ResourceManager Scheduler class. The values are CapacityScheduler, FairScheduler, and FifoScheduler. |
| yarn.scheduler.minimum-allocation-mb | The minimum limit of memory (in MB) to allocate to each container request at the ResourceManager. |
| yarn.scheduler.maximum-allocation-mb | The maximum limit of memory (in MB) to allocate to each container request at the ResourceManager. |
| yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | The list of permitted/excluded NodeManagers. If necessary, use these files to control the list of permitted NodeManagers. |
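The two allocation limits are easier to reason about with numbers. The sketch below is an assumption-laden model, not the scheduler's actual code: it assumes that a memory request is raised to at least the minimum, rounded up to a multiple of the minimum, and rejected when it exceeds the maximum. Real behavior depends on the scheduler in use and on the Hadoop version. The class name ContainerRequestMath and the sample limits of 1024 MB and 8192 MB are illustrative.

```java
// Hypothetical model for illustration; real schedulers may behave differently.
class ContainerRequestMath {

    /**
     * Models how a container memory request might be normalized:
     * at least minMb, rounded up to a multiple of minMb, never above maxMb.
     */
    static int normalizeMb(int requestedMb, int minMb, int maxMb) {
        if (minMb <= 0 || maxMb < minMb) {
            throw new IllegalArgumentException("invalid limits");
        }
        if (requestedMb > maxMb) {
            // Assumption: requests above the maximum are refused outright.
            throw new IllegalArgumentException("request exceeds maximum allocation");
        }
        int atLeastMin = Math.max(requestedMb, minMb);
        int roundedUp = ((atLeastMin + minMb - 1) / minMb) * minMb;
        return Math.min(roundedUp, maxMb);
    }

    public static void main(String[] args) {
        int minMb = 1024;  // sample yarn.scheduler.minimum-allocation-mb
        int maxMb = 8192;  // sample yarn.scheduler.maximum-allocation-mb
        System.out.println(normalizeMb(1500, minMb, maxMb));
        System.out.println(normalizeMb(100, minMb, maxMb));
    }
}
```

Under this model, a small minimum wastes less memory per container, while a large minimum keeps the number of distinct container sizes low.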
Now take a look at the configurations for the NodeManager in conf/yarn-site.xml:
| Parameter | Description |
| --- | --- |
| yarn.nodemanager.resource.memory-mb | The available physical memory (in MB) for the NodeManager. It defines the total memory on the NodeManager that is made available to running containers. |
| yarn.nodemanager.vmem-pmem-ratio | The maximum ratio by which the virtual memory usage of tasks may exceed physical memory. |
| yarn.nodemanager.local-dirs | A comma-separated list of directory paths on the local filesystem where intermediate data is written. |
| yarn.nodemanager.log-dirs | The path on the local filesystem where logs are written. |
| yarn.nodemanager.log.retain-seconds | The time (in seconds) to retain log files on the NodeManager. The default value is 10800 seconds. This configuration is applicable only if log aggregation is disabled. |
| yarn.nodemanager.remote-app-log-dir | The HDFS directory path to which application logs are moved after application completion, for example /logs (the shipped default is /tmp/logs). This configuration is applicable only if log aggregation is enabled. |
| yarn.nodemanager.remote-app-log-dir-suffix | The suffix appended to the remote log directory. This configuration is applicable only if log aggregation is enabled. |
| yarn.nodemanager.aux-services | The shuffle service that needs to be set for MapReduce applications (mapreduce_shuffle). |
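As a back-of-the-envelope sketch of how the two NodeManager memory settings interact, the following plain-Java snippet estimates how many equally sized containers fit on one node and how much virtual memory each container may use. It assumes memory is the only constraint (CPU cores and memory reserved for the operating system are ignored). The class NodeMemoryMath and the sample values are illustrative only.

```java
// Hypothetical helper for illustration; not a Hadoop class.
class NodeMemoryMath {

    /** How many containers of containerMb fit into the NodeManager's memory. */
    static int containersPerNode(int nodeMemoryMb, int containerMb) {
        if (nodeMemoryMb < 0 || containerMb <= 0) {
            throw new IllegalArgumentException("invalid memory sizes");
        }
        return nodeMemoryMb / containerMb;
    }

    /** Virtual memory ceiling for one container: physical size times the ratio. */
    static long virtualLimitMb(int containerMb, double vmemPmemRatio) {
        return (long) Math.floor(containerMb * vmemPmemRatio);
    }

    public static void main(String[] args) {
        int nodeMemoryMb = 8192;  // sample yarn.nodemanager.resource.memory-mb
        int containerMb = 2048;   // sample container size
        double ratio = 2.1;       // sample yarn.nodemanager.vmem-pmem-ratio
        System.out.println("containers per node: " + containersPerNode(nodeMemoryMb, containerMb));
        System.out.println("virtual memory limit (MB): " + virtualLimitMb(containerMb, ratio));
    }
}
```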
History Server
The History Server gives all YARN applications a central location in which to aggregate their completed jobs for historical reference and debugging. The settings for the MapReduce JobHistory Server can be found in the mapred-default.xml file:
mapreduce.jobhistory.address: This is the MapReduce JobHistory Server host:port. The default port is 10020.
mapreduce.jobhistory.webapp.address: This is the MapReduce JobHistory Server Web UI host:port. The default port is 19888.
mapreduce.jobhistory.intermediate-done-dir: This is the directory where history files are written by MapReduce jobs (in HDFS). The default is /mr-history/tmp.
mapreduce.jobhistory.done-dir: This is the directory where history files are managed by the MR JobHistory Server (in HDFS). The default is /mr-history/done.
Slave files
With respect to the Hadoop slave and YARN slave nodes, generally one chooses one node in the cluster as the NameNode (Hadoop master), another node as the ResourceManager (YARN master), and the rest of the machines act as both Hadoop slave DataNodes and YARN slave NodeManagers. List all the slaves in your Hadoop conf/slaves file, one hostname or IP address per line.
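The one-host-per-line format is simple enough to check with a few lines of code. The following plain-Java sketch turns the contents of such a file into a list of hosts. Skipping blank lines and treating text after a # as a comment are assumptions made for this sketch; the class name SlavesFile and the sample hostnames are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not a Hadoop class.
class SlavesFile {

    /** Returns the hosts listed one per line, ignoring blanks and # comments. */
    static List<String> hosts(String contents) {
        List<String> result = new ArrayList<>();
        for (String line : contents.split("\\R")) {
            int hash = line.indexOf('#');
            String host = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!host.isEmpty()) {
                result.add(host);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String slaves = "node1.example.com\n"
                + "node2.example.com\n"
                + "\n"
                + "# rack 2\n"
                + "10.0.0.3\n";
        System.out.println(hosts(slaves));
    }
}
```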
Operating Hadoop and YARN clusters
This is the final stage of Hadoop and YARN cluster setup and configuration. Here are the commands that need to be used to start and stop the Hadoop and YARN clusters.
Starting Hadoop and YARN clusters
To start the Hadoop and YARN cluster, use the following procedure:
1. Format a new Hadoop distributed filesystem:
$HADOOP_HOME/bin/hdfs namenode -format <cluster_name>
2. The following command is used to start HDFS. Run it on the NameNode:
$HADOOP_HOME/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
3. Run this command to start DataNodes on all slave nodes:
$HADOOP_HOME/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
4. Start YARN with the following command on the ResourceManager:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
5. Execute this command to start NodeManagers on all slaves:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager
6. Start a standalone WebAppProxy server. This is used for load-balancing purposes on a multiserver cluster:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
7. Execute this command on the designated History Server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
Stopping Hadoop and YARN clusters
To stop the Hadoop and YARN cluster, use the following procedure:
1. Use the following command on the NameNode to stop it:
$HADOOP_HOME/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
2. Issue this command on all the slave nodes to stop DataNodes:
$HADOOP_HOME/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
3. To stop the ResourceManager, issue the following command on the specified ResourceManager:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
4. The following command is used to stop the NodeManager on all slave nodes:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
5. Stop the WebAppProxy server:
$HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver
6. Stop the MapReduce JobHistory Server by running the following command on the History Server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver
Web interfaces of the Ecosystem
That covers the Hadoop and YARN setup, the configuration, and the commands used to operate Hadoop and YARN. Here are some web interfaces used by Hadoop and YARN administrators for admin tasks:
The URL for the NameNode is http://<namenode_host>:<port>/ and the default HTTP port is 50070.
The URL for the ResourceManager is http://<resourcemanager_host>:<port>/ and the default HTTP port is 8088.
[Figure: the Web UI for the NameNode]
The URL for the MapReduce JobHistory Server is http://<jobhistoryserver_host>:<port>/ and the default HTTP port is 19888.
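For scripts or monitoring tools that need these addresses, the URL pattern can be captured in a few lines. This is only a convenience sketch in plain Java: the class WebUiUrls and the example.com hostnames are hypothetical, while the three port numbers are the defaults listed above.

```java
// Hypothetical helper for illustration; not a Hadoop class.
class WebUiUrls {
    // Default HTTP ports as listed in the text.
    static final int NAMENODE_HTTP_PORT = 50070;
    static final int RESOURCEMANAGER_HTTP_PORT = 8088;
    static final int JOBHISTORY_HTTP_PORT = 19888;

    /** Builds http://<host>:<port>/ for a daemon's web interface. */
    static String url(String host, int port) {
        return "http://" + host + ":" + port + "/";
    }

    public static void main(String[] args) {
        // The hostnames are placeholders; substitute your own cluster's hosts.
        System.out.println(url("namenode.example.com", NAMENODE_HTTP_PORT));
        System.out.println(url("resourcemanager.example.com", RESOURCEMANAGER_HTTP_PORT));
        System.out.println(url("historyserver.example.com", JOBHISTORY_HTTP_PORT));
    }
}
```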
Summary
In this chapter, we covered Hadoop and YARN single-node and fully-distributed cluster setup and important configurations. We also covered the basic but important commands to administer Hadoop and YARN clusters. In the next chapter, we’ll look at the Hadoop and YARN components in more detail.
Chapter 4. YARN and Hadoop Ecosystems
This chapter discusses YARN with respect to Hadoop, since it is very important to know exactly where YARN fits in Hadoop 2.
Hadoop 2 has undergone a complete change in terms of architecture and components compared to Hadoop 1.
In this chapter, we will cover the following topics:
A short introduction to Hadoop 1
The difference between MRv1 and MRv2
Where YARN fits in Hadoop 2
Old and new MapReduce APIs
Backward compatibility of MRv2 APIs
Practical examples of MRv1 and MRv2
The Hadoop 2 release
YARN came into the picture with the release of Hadoop 0.23 on November 11, 2011. This was the alpha version of the Hadoop 0.23 major release.
The major difference between 0.23 and pre-0.23 releases is that the 0.23 release underwent a complete revamp in terms of the MapReduce engine and resource management. This 0.23 release separated out resource management and application life cycle management.
A short introduction to Hadoop 1.x and MRv1
We will briefly look at the basic Apache Hadoop 1.x and its processing framework, MRv1 (Classic), so that we can get a clear picture of the differences in Apache Hadoop 2.x MRv2 (YARN) in terms of architecture, components, and processing framework.
Apache Hadoop is a scalable, fault-tolerant distributed system for data storage and processing. The core programming model in Hadoop is MapReduce.
Since 2004, Hadoop has emerged as the de facto standard to store, process, and analyze hundreds of terabytes and even petabytes of data.
The major components in Hadoop 1.x are as follows:
NameNode: This keeps the filesystem metadata in the main memory.
DataNode: This is where the data resides in the form of blocks.
JobTracker: This assigns/reassigns MapReduce tasks to TaskTrackers in the cluster and tracks the status of each TaskTracker.
TaskTracker: This executes the tasks assigned by the JobTracker and sends the status of each task to the JobTracker.
The major components of Hadoop 1.x can be seen as follows:
A typical Hadoop 1.x cluster (shown in the preceding figure) can consist of thousands of nodes. It follows the master/slave pattern, where the NameNode and JobTracker are the masters and the DataNodes and TaskTrackers are the slaves.
The main data processing is distributed across the DataNodes in the cluster to increase parallel processing.
The master NameNode process (master for the slave DataNodes) manages the filesystem, and the master JobTracker process (master for the slave TaskTrackers) manages the tasks. The topology is seen as follows:
A Hadoop cluster can be considered to be mainly made up of two distinguishable parts:
HDFS: This is the underlying storage layer that acts as a filesystem for distributed data storage. You can put data of any format, schema, and type on it, such as structured, semi-structured, or unstructured data. This flexibility makes Hadoop fit for the data lake, which is sometimes called the bit bucket or the landing zone.
MapReduce: This is the execution layer, which is the only distributed data-processing framework in Hadoop 1.x.
Tip
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
MRv1 versus MRv2
MRv1 (MapReduce version 1) is part of Apache Hadoop 1.x and is an implementation of the MapReduce programming paradigm.
The MapReduce project itself can be broken into the following parts:
End-user MapReduce API: This is the API needed to develop a MapReduce application.
MapReduce framework: This is the runtime implementation of the various phases, such as the map phase, the sort/shuffle/merge aggregation phase, and the reduce phase.
MapReduce system: This is the backend infrastructure required to run MapReduce applications and includes things such as cluster resource management, scheduling of jobs, and so on.
Hadoop 1.x was written solely as an MR engine. Since it runs on a cluster, its cluster management component was also tightly coupled with the MR programming paradigm. The only thing that could be run on Hadoop 1.x was an MR job.
In MRv1, the cluster was managed by a single JobTracker and multiple TaskTrackers running on the DataNodes.
In Hadoop 2.x, the old MRv1 framework was rewritten to run on top of YARN. This application was named MRv2, or MapReduce version 2. It is the familiar MapReduce execution underneath, except that each job now runs on YARN.
The core difference between MRv1 and MRv2 is the way the MapReduce jobs are executed.
With Hadoop 1.x, it was the JobTracker and TaskTrackers, but now with YARN on Hadoop 2.x, it’s the ResourceManager, ApplicationMaster, and NodeManagers.
However, the underlying concept, the MapReduce framework, remains the same.
Hadoop 2 has been redefined from HDFS-plus-MapReduce to HDFS-plus-YARN.
Referring to the following figure, YARN took control of the resource management and application life cycle part of Hadoop 1.x.
YARN, therefore, results in increased ROI for Hadoop investment, in the sense that the same Hadoop 2.x cluster resources can now be used to do multiple things, such as batch processing, real-time processing, SQL applications, and so on.
Earlier, running this variety of applications was not possible, and people had to use a separate Hadoop cluster for MapReduce and a separate one to do something else.
Understanding where YARN fits into Hadoop
If we refer to Hadoop 1.x in the first figure of this chapter, then it is clear that the responsibilities of the JobTracker mainly included the following:
Managing the computational resources in terms of map and reduce slots
Scheduling submitted jobs
Monitoring the executions of the TaskTrackers
Restarting failed tasks
Performing a speculative execution of tasks
Calculating the Job Counters
Clearly, the JobTracker alone handles many responsibilities and is overloaded with work.
This overloading of the JobTracker led to its redesign, and YARN reduced the responsibilities of the JobTracker in the following ways:
Cluster resource management and scheduling responsibilities were moved to the global ResourceManager (RM)
The application life cycle management, that is, job execution and monitoring, was moved into a per-application ApplicationMaster (AM)
The global ResourceManager is seen in the following image:
If you look at the preceding figure, you will clearly see the disappearance of the single centralized JobTracker; its place is taken by a global ResourceManager.
Also, for each job, a tiny, dedicated JobTracker is created, which monitors the tasks specific to its job. This tiny JobTracker runs on a slave node.
This tiny, dedicated JobTracker is termed an ApplicationMaster in the new framework (refer to the following figure).
Also, the TaskTrackers are referred to as NodeManagers in the new framework.
Finally, looking at the JobTracker redesign (in the following figure), we can clearly see that the JobTracker’s responsibilities are broken into a per-cluster ResourceManager and a per-application ApplicationMaster:
The ResourceManager topology can be seen as follows:
Old and new MapReduce APIs
The new API (which is also known as Context Objects) was primarily designed to make the API easier to evolve in the future and is type incompatible with the old one.
The new API came into the picture from the 1.x release series. However, it was only partially supported in this series. So, the old API is recommended for the 1.x series:
| Feature / Release | 1.x | 0.23 |
| --- | --- | --- |
| Old MapReduce API | Yes | Deprecated |
| New MapReduce API | Partial | Yes |
| MRv1 runtime (Classic) | Yes | No |
| MRv2 runtime (YARN) | No | Yes |
The old and new API can be compared as follows:
| Old API | New API |
| --- | --- |
| The old API is in the org.apache.hadoop.mapred package and is still present. | The new API is in the org.apache.hadoop.mapreduce package. |
| The old API used interfaces for Mapper and Reducer. | The new API uses abstract classes for Mapper and Reducer. |
| The old API used the JobConf, OutputCollector, and Reporter objects to communicate with the MapReduce system. | The new API uses the context object to communicate with the MapReduce system. |
| In the old API, job control was done through the JobClient. | In the new API, job control is performed through the Job class. |
| In the old API, job configuration was done with a JobConf object. | In the new API, job configuration is done through the Configuration class via some of the helper methods on Job. |
| In the old API, both the map and reduce outputs are named part-nnnnn. | In the new API, the map outputs are named part-m-nnnnn and the reduce outputs are named part-r-nnnnn. |
| In the old API, the reduce() method passes values as a java.util.Iterator. | In the new API, the reduce() method passes values as a java.lang.Iterable. |
| The old API controls mappers by writing a MapRunnable, but no equivalent exists for reducers. | The new API allows both mappers and reducers to control the execution flow by overriding the run() method. |
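The Iterator versus Iterable row is the difference most visible in everyday reducer code. The following plain-Java sketch deliberately leaves out all Hadoop types to show only that contrast: an Iterator has to be drained with hasNext() and next(), whereas an Iterable can be used directly in a for-each loop. The class ReduceValueStyles and its two methods are made up for this illustration and are not reducer implementations.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; these are not Hadoop Reducer classes.
class ReduceValueStyles {

    /** Old-API style: the values arrive as an Iterator and are drained by hand. */
    static int sumOldStyle(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    /** New-API style: the values arrive as an Iterable, so for-each works. */
    static int sumNewStyle(Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(1, 2, 3);
        System.out.println(sumOldStyle(counts.iterator()));
        System.out.println(sumNewStyle(counts));
    }
}
```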
Backward compatibility of MRv2 APIs
This section discusses the scope and level of backward compatibility supported in Apache Hadoop MapReduce 2.x (MRv2).
Binary compatibility of org.apache.hadoop.mapred APIs
Binary compatibility here means that the compiled binaries should be able to run without any modification on the new framework.
Hadoop 1.x users who use the org.apache.hadoop.mapred APIs can simply run their MapReduce jobs on YARN just by pointing them to their Apache Hadoop 2.x cluster via the configuration settings.
They will not need any recompilation. All they will need to do is point their application to the YARN installation and point HADOOP_CONF_DIR to the corresponding configuration directory. The yarn-site.xml (configuration for YARN) and mapred-site.xml (configuration for MapReduce apps) files are present in the conf directory.
Also, mapred.job.tracker in mapred-site.xml is no longer necessary in Apache Hadoop 2.x. Instead, the following property needs to be added in the mapred-site.xml file to make MRv1 applications run on top of YARN:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Source compatibility of org.apache.hadoop.mapred APIs

Source incompatibility means that some code changes are required for compilation. Source incompatibility is orthogonal to binary compatibility.

Binaries for an application that is binary compatible but not source compatible will continue to run fine on the new framework. However, code changes are required to regenerate these binaries.

Apache Hadoop 2.x does not ensure complete binary compatibility with applications that use the org.apache.hadoop.mapreduce APIs, as these APIs have evolved a lot since MRv1. However, it ensures source compatibility for the org.apache.hadoop.mapreduce APIs that break binary compatibility. In other words, you should recompile the applications that use these MapReduce APIs against the MRv2 JARs.

Existing applications that use the MapReduce APIs are source compatible and can run on YARN either unchanged, after recompilation, or after minor updates.

If an MRv1 MapReduce-based application fails to run on YARN, inspect its source code and check whether it refers to the org.apache.hadoop.mapreduce APIs. If it does, recompile the application against the MRv2 JARs that are shipped with Hadoop 2.
Practical examples of MRv1 and MRv2

We will now present a MapReduce example using both the old and new MapReduce APIs.

We will write a MapReduce program in Java that finds all the anagrams present in an input file and prints them in the output file. An anagram is a word, phrase, or name formed by rearranging the letters of another, such as cinema, formed from iceman.
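Before looking at the Hadoop classes, the core idea can be sketched in plain Java: two words are anagrams exactly when their sorted letters are equal, so the sorted letters serve as the grouping key. The following is only an illustrative sketch with no Hadoop dependency; the class name AnagramSketch and its methods are hypothetical and are not part of the example code that follows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the grouping idea; no Hadoop classes are involved.
final class AnagramSketch {

    // "Map" step: the key is the word's letters in sorted order.
    static String sortedKey(String word) {
        char[] chars = word.toLowerCase().toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // "Shuffle" and "reduce" steps: group words by key and keep only
    // the groups that contain at least two words.
    static Map<String, List<String>> groupAnagrams(List<String> words) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String word : words) {
            groups.computeIfAbsent(sortedKey(word), k -> new ArrayList<>()).add(word);
        }
        groups.values().removeIf(group -> group.size() < 2);
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cinema", "iceman", "stool", "tools", "hello");
        System.out.println(groupAnagrams(words));
    }
}
```

In the MapReduce versions, the framework performs the grouping by key between the map and reduce phases, so the mapper only emits (sorted key, word) pairs and the reducer only joins the words it receives for each key.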
Here is the AnagramMapperOldAPI.java class that uses the old MapReduce API:

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    /**
     * The Anagram mapper class gets a word as a line from the HDFS input,
     * sorts the letters in the word, and writes it back to the output collector as
     * Key: sorted word (letters in the word sorted)
     * Value: the word itself as the value.
     * When the reducer runs, we can group anagrams together based on the
     * sorted key.
     */
    public class AnagramMapperOldAPI extends MapReduceBase implements
            Mapper<Object, Text, Text, Text> {

        private Text sortedText = new Text();
        private Text originalText = new Text();

        @Override
        public void map(Object keyNotUsed, Text value,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String line = value.toString().trim().toLowerCase().replace(",", "");
            System.out.println("LINE:" + line);
            StringTokenizer st = new StringTokenizer(line);
            System.out.println("----Split by space------");
            while (st.hasMoreElements()) {
                String word = (String) st.nextElement();
                char[] wordChars = word.toCharArray();
                Arrays.sort(wordChars);
                String sortedWord = new String(wordChars);
                sortedText.set(sortedWord);
                originalText.set(word);
                System.out.println("\torig:" + word + "\tsorted:" + sortedWord);
                output.collect(sortedText, originalText);
            }
        }
    }

Here is the AnagramReducerOldAPI.java class that uses the old MapReduce API:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class AnagramReducerOldAPI extends MapReduceBase implements
            Reducer<Text, Text, Text, Text> {

        private Text outputKey = new Text();
        private Text outputValue = new Text();

        public void reduce(Text anagramKey, Iterator<Text> anagramValues,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String out = "";
            // Considering words with length > 2
            if (anagramKey.toString().length() > 2) {
                System.out.println("Reducer Key:" + anagramKey);
                while (anagramValues.hasNext()) {
                    out = out + anagramValues.next() + "~";
                }
                StringTokenizer outputTokenizer = new StringTokenizer(out, "~");
                if (outputTokenizer.countTokens() >= 2) {
                    out = out.replace("~", ",");
                    outputKey.set(anagramKey.toString() + "-->");
                    outputValue.set(out);
                    System.out.println("************Writing reducer output:"
                            + anagramKey.toString() + "-->" + out);
                    output.collect(outputKey, outputValue);
                }
            }
        }
    }

Finally, to run the MapReduce program, we have the AnagramJobOldAPI.java class written using the old MapReduce API:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class AnagramJobOldAPI {

        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Usage: Anagram <input path> <output path>");
                System.exit(-1);
            }
            JobConf conf = new JobConf(AnagramJobOldAPI.class);
            conf.setJobName("AnagramJobOldAPI");
            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            conf.setMapperClass(AnagramMapperOldAPI.class);
            conf.setReducerClass(AnagramReducerOldAPI.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);
            JobClient.runJob(conf);
        }
    }

Next, we will write the same Mapper, Reducer, and Job classes using the new MapReduce API.

Here is the AnagramMapper.java class that uses the new MapReduce API:

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class AnagramMapper extends Mapper<Object, Text, Text, Text> {

        private Text sortedText = new Text();
        private Text originalText = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim().toLowerCase().replace(",", "");
            System.out.println("LINE:" + line);
            StringTokenizer st = new StringTokenizer(line);
            System.out.println("----Split by space------");
            while (st.hasMoreElements()) {
                String word = (String) st.nextElement();
                char[] wordChars = word.toCharArray();
                Arrays.sort(wordChars);
                String sortedWord = new String(wordChars);
                sortedText.set(sortedWord);
                originalText.set(word);
                System.out.println("\torig:" + word + "\tsorted:" + sortedWord);
                context.write(sortedText, originalText);
            }
        }
    }

Here is the AnagramReducer.java class that uses the new MapReduce API:

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class AnagramReducer extends Reducer<Text, Text, Text, Text> {

        private Text outputKey = new Text();
        private Text outputValue = new Text();

        public void reduce(Text anagramKey, Iterable<Text> anagramValues,
                Context context) throws IOException, InterruptedException {
            String out = "";
            // Considering words with length > 2
            if (anagramKey.toString().length() > 2) {
                System.out.println("Reducer Key:" + anagramKey);
                for (Text anagram : anagramValues) {
                    out = out + anagram.toString() + "~";
                }
                StringTokenizer outputTokenizer = new StringTokenizer(out, "~");
                if (outputTokenizer.countTokens() >= 2) {
                    out = out.replace("~", ",");
                    outputKey.set(anagramKey.toString() + "-->");
                    outputValue.set(out);
                    System.out.println("******Writing reducer output:"
                            + anagramKey.toString() + "-->" + out);
                    context.write(outputKey, outputValue);
                }
            }
        }
    }

Finally, here is the AnagramJob.java class that uses the new MapReduce API:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class AnagramJob {

        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Usage: Anagram <input path> <output path>");
                System.exit(-1);
            }
            // Job.getInstance() is used instead of the Job constructor,
            // which is deprecated in Hadoop 2.x.
            Job job = Job.getInstance();
            job.setJarByClass(AnagramJob.class);
            job.setJobName("AnagramJob");
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.setMapperClass(AnagramMapper.class);
            job.setReducerClass(AnagramReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Preparing the input file(s)

1. Create a ${Inputfile_1} file with the following contents:

       The Project Gutenberg Etext of Moby Word II by Grady Ward
       hello there draw ehllo lemons melons solemn
       Also, bluest bluets bustle sublet subtle

2. Create another file, ${Inputfile_2}, with the following contents:

       Cinema is anagram to iceman
       Second is stop, tops, opts, pots, and spot
       Stool and tools
       Secure and rescue

3. Copy these files into ${path_to_your_input_dir}.
Running the job

Run the AnagramJobOldAPI.java class and pass the following as command-line args:

    ${path_to_your_input_dir}
    ${path_to_your_output_dir_old}

Now, run the AnagramJob.java class and pass the following as command-line args:

    ${path_to_your_input_dir}
    ${path_to_your_output_dir_new}

Result

The final output is written to ${path_to_your_output_dir_old} and ${path_to_your_output_dir_new}.

These are the contents that we will see in the output file:
aceimn-->cinema,iceman,
adn-->and,and,and,
adrw-->ward,draw,
belstu-->subtle,bustle,bluets,bluest,sublet,
ceersu-->rescue,secure,
ehllo-->hello,ehllo,
elmnos-->lemons,melons,solemn,
loost-->stool,tools,
opst-->pots,tops,stop,spot,opts,
Summary

In this chapter, we started with a brief history of Hadoop releases. Next, we covered the basics of Hadoop 1.x and MRv1. We then looked at the core differences between MRv1 and MRv2 and how YARN fits into a Hadoop environment. We also saw how the JobTracker's responsibilities were broken down in Hadoop 2.x.

We also talked about the old and new MapReduce APIs, their origin, differences, and support in YARN. Finally, we concluded the chapter with some practical examples using the old and new MapReduce APIs.

In the next chapter, you will learn about the administration part of YARN.
Chapter 5. YARN Administration

In this chapter, we will focus on the administrative side of YARN and on the roles and responsibilities of a YARN administrator. We will gain a more detailed insight into the administration configuration settings and parameters, application container monitoring, and optimized resource allocation, as well as scheduling and multitenancy application support in YARN. We'll also cover the basic administration tools and configuration options of YARN.

The following topics will be covered in this chapter:

- YARN container allocation and configurations
- Scheduling policies
- YARN multitenancy application support
- YARN administration and tools
Container allocation

At a very fundamental level, a container is a group of physical resources such as memory, disk, network, and CPU. There can be one or more containers on a single machine. For example, if a machine has 16 GB of RAM and an 8-core processor, a single container could be 1 CPU core and 2 GB of RAM, which gives a total of 8 containers on that machine; alternatively, there could be a single large container that occupies all of the machine's resources. So, a container represents a share of the physical memory, CPU, network, disk, and so on in the cluster. The container's life cycle is managed by the NodeManager, and the scheduling is done by the ResourceManager. The container allocation can be seen as follows:

YARN is designed to allocate resource containers to individual applications in a shared, secure, and multitenant manner. When any job or task is submitted to the YARN framework, the ResourceManager takes care of allocating resources to the application, depending on the scheduling configuration and on the application's needs and requirements, which are communicated via the ApplicationMaster. To achieve this, the central scheduler maintains metadata about the resource requirements of all applications; this leads to efficient scheduling decisions for all the applications that run in the cluster.
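The arithmetic behind the earlier example of a node with 16 GB of RAM and 8 cores can be sketched as follows. This is only an illustration that assumes equally sized containers; the class ContainerMath and its method are hypothetical and are not part of any YARN API.

```java
// Illustrative sketch only; ContainerMath is not part of any YARN API.
final class ContainerMath {

    // How many equally sized containers fit on one node. The scarcer
    // resource (memory or CPU cores) determines the answer.
    static int containersPerNode(int nodeMemoryMb, int nodeCores,
                                 int containerMemoryMb, int containerCores) {
        if (containerMemoryMb <= 0 || containerCores <= 0) {
            throw new IllegalArgumentException("container size must be positive");
        }
        int byMemory = nodeMemoryMb / containerMemoryMb;
        int byCores = nodeCores / containerCores;
        return Math.min(byMemory, byCores);
    }

    public static void main(String[] args) {
        // The node from the text: 16 GB of RAM and 8 cores,
        // with containers of 2 GB and 1 core each.
        System.out.println(containersPerNode(16 * 1024, 8, 2 * 1024, 1));
    }
}
```

With containers of 2 GB and 1 core, both resources allow 8 containers. If each container asked for 4 GB instead, memory would become the limiting resource and only 4 containers would fit.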
Let's take a look at how container allocation happens in a traditional Hadoop setup. In the traditional Hadoop approach, each node has a predefined, fixed number of map slots and a predefined, fixed number of reduce slots. The map and reduce functions are unable to share slots, as the slots are predefined for specific operations only. This static allocation is not efficient. For example, suppose a cluster has a fixed total of 32 map slots and 32 reduce slots, and a MapReduce application uses only 16 map slots but requires more than 32 slots for its reduce operations. The reducers are unable to use the 16 free map slots, as these are reserved for mappers only, so the reduce function has to wait until some reduce slots become free.

To overcome this problem, YARN has container slots. Irrespective of the application, all containers are able to run all applications. For example, if YARN has 64 available containers in the cluster and is running the same MapReduce application, and the mapper function takes only 16 slots while the reducer requires more resource slots, then all the other free resources in the cluster are allocated to the reducer operation. This makes the operation more efficient and productive.
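The difference between static slots and generic containers can be illustrated with a small calculation. The sketch below is a deliberately simplified model that treats all tasks as wanting to run at the same time; the figure of 40 reduce tasks is an assumption standing in for "more than 32", and the class SlotsVersusContainers is hypothetical.

```java
// Simplified model; it ignores task ordering and per-container sizes.
final class SlotsVersusContainers {

    // Tasks left waiting when map and reduce slots cannot be shared.
    static int waitingWithStaticSlots(int mapTasks, int reduceTasks,
                                      int mapSlots, int reduceSlots) {
        int waitingMaps = Math.max(0, mapTasks - mapSlots);
        int waitingReduces = Math.max(0, reduceTasks - reduceSlots);
        return waitingMaps + waitingReduces;
    }

    // Tasks left waiting when any container can run any task.
    static int waitingWithContainers(int mapTasks, int reduceTasks, int containers) {
        return Math.max(0, mapTasks + reduceTasks - containers);
    }

    public static void main(String[] args) {
        int mapTasks = 16;
        int reduceTasks = 40; // assumed value for "more than 32"
        System.out.println("static slots: "
                + waitingWithStaticSlots(mapTasks, reduceTasks, 32, 32) + " tasks waiting");
        System.out.println("containers:   "
                + waitingWithContainers(mapTasks, reduceTasks, 64) + " tasks waiting");
    }
}
```

Under these assumptions, 8 reduce tasks wait in the static model even though 16 map slots sit idle, while the same total capacity expressed as 64 generic containers leaves no task waiting.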
Essentially, an application requests the resources it needs from the ResourceManager via the ApplicationMaster. The ResourceManager then responds to the application's ResourceRequest by allocating the requested resources. The ResourceRequest contains the name of the resource that has been requested; the priority of the request relative to the other ResourceRequests of the same application; the resource requirement capabilities, such as RAM, disk, CPU, network, and so on; and the number of resources. A container allocation from the ResourceManager to the application means the successful fulfillment of a specific ResourceRequest.
Container allocation to the application

Now, take a look at the following sequence diagram:

The diagram shows how container allocation is done for applications via the ApplicationMaster. It can be explained as follows:
1. The client submits the application request to the ResourceManager.
2. The ResourceManager registers the application with the ApplicationManager, generates the Application ID, and responds to the client with the successfully registered Application ID.
3. Then, the ResourceManager starts the client's ApplicationMaster in a separate available container. If no container is available, this request has to wait until a suitable container is found. Once started, the ApplicationMaster sends its registration request to the ResourceManager.
4. The ResourceManager shares the minimum and maximum resource capabilities of the cluster with the ApplicationMaster. The ApplicationMaster then decides how to use the available resources efficiently to fulfill the application's needs.
5. Depending on the resource capabilities shared by the ResourceManager, the ApplicationMaster requests that the ResourceManager allocate a number of containers on behalf of the application.
6. The ResourceManager responds to the ApplicationMaster's ResourceRequest as per the scheduling policies and resource availability. A container allocation by the ResourceManager means the successful fulfillment of the ApplicationMaster's ResourceRequest.
While the job is running, the ApplicationMaster sends heartbeats and the application's job progress information to the ResourceManager. During the runtime of the application, the ApplicationMaster can request the allocation of more containers from the ResourceManager or release containers back to it. When the job finishes, the ApplicationMaster sends a container de-allocation request to the ResourceManager and exits, freeing the container it was running in.
Container configurations

Here are some important configurations that are used to control resource containers.

To control the memory allocation to a container, the administrator needs to set the following three parameters in the yarn-site.xml configuration file:

| Parameter | Description |
| --- | --- |
| yarn.nodemanager.resource.memory-mb | This is the amount of memory in MB that the NodeManager can use for containers. |
| yarn.scheduler.minimum-allocation-mb | This is the smallest amount of memory in MB allocated to a container by the ResourceManager. The default value is 1024 MB. |
| yarn.scheduler.maximum-allocation-mb | This is the largest amount of memory in MB allocated to a container by the ResourceManager. The default value is 8192 MB. |

The CPU core allocations to a container are controlled by setting the following properties in the yarn-site.xml configuration file:

| Parameter | Description |
| --- | --- |
| yarn.scheduler.minimum-allocation-vcores | This is the minimum number of CPU cores that are allocated to a container. |
| yarn.scheduler.maximum-allocation-vcores | This is the maximum number of CPU cores that are allocated to a container. |
| yarn.nodemanager.resource.cpu-vcores | This is the number of CPU cores on the node that can be allocated to containers. |

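To make the effect of the minimum and maximum allocation settings concrete, here is a small sketch of how a memory request might be fitted between the two limits. It assumes that the scheduler rounds a request up to a multiple of the minimum allocation and rejects a request above the maximum; the exact behavior depends on the scheduler and the Hadoop version, and the class AllocationLimits is hypothetical rather than a YARN class.

```java
// Hypothetical helper; it models the limits, not YARN's actual code path.
final class AllocationLimits {

    // Fits a requested memory size between the configured limits,
    // rounding up to a multiple of the minimum allocation.
    static int normalizeMemoryMb(int requestedMb, int minAllocationMb, int maxAllocationMb) {
        if (minAllocationMb <= 0 || maxAllocationMb < minAllocationMb) {
            throw new IllegalArgumentException("invalid allocation limits");
        }
        if (requestedMb > maxAllocationMb) {
            throw new IllegalArgumentException(
                    "request of " + requestedMb + " MB exceeds the maximum allocation");
        }
        int atLeastMin = Math.max(requestedMb, minAllocationMb);
        int multiples = (atLeastMin + minAllocationMb - 1) / minAllocationMb;
        return Math.min(multiples * minAllocationMb, maxAllocationMb);
    }

    public static void main(String[] args) {
        // Using the default limits quoted in the table: 1024 MB and 8192 MB.
        System.out.println(normalizeMemoryMb(512, 1024, 8192));
        System.out.println(normalizeMemoryMb(1500, 1024, 8192));
    }
}
```

Under these assumptions, a 512 MB request is raised to 1024 MB and a 1500 MB request becomes 2048 MB, which is why choosing a very large minimum allocation can waste memory on small tasks.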
YARN scheduling policies

The YARN architecture has pluggable scheduling policies, which are chosen depending on the application's requirements and the use case defined for the running application. You can find the YARN scheduling configuration in the yarn-site.xml file. Here, you can specify the scheduling system as FIFO, capacity, or fair scheduling as per the application's needs. You can also find the scheduling information for running applications in the ResourceManager UI, where many components of the scheduling system are described briefly.

As already mentioned, there are three types of scheduling policies that the YARN scheduler follows:

- FIFO scheduler
- Capacity scheduler
- Fair scheduler
The FIFO (First In First Out) scheduler

This scheduling policy has been part of the system since Hadoop 1.0, where the JobTracker used the FIFO scheduling policy. As the name indicates, FIFO means First In First Out; that is, the job submitted first will execute first. The FIFO scheduler policy does not take application priorities into account. This policy might work efficiently for smaller jobs, but it works very inefficiently when executing larger jobs, so it is not recommended for heavily loaded clusters. The FIFO scheduler can be seen as follows:

The FIFO (First In First Out) scheduler

Here is the configuration property for the FIFO scheduler. By specifying this in yarn-site.xml, you can enable the FIFO scheduling policy in your YARN cluster:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</value>
    </property>

The capacity scheduler

The capacity scheduling policy is one of the most widely used pluggable scheduler policies. It allows multiple applications or user groups to share the Hadoop cluster resources in a secure way. This scheduling policy runs successfully and efficiently on many of the largest Hadoop production clusters.

The capacity scheduling policy allows users or user groups to share cluster resources in such a way that each user or group of users is guaranteed a certain capacity of the cluster. To enable this policy, the cluster administrator configures one or more queues, each with a precalculated share of the total cluster resource capacity; this assignment guarantees the minimum resource capacity allocation to each queue. The administrator can also configure maximum and minimum constraints on the use of cluster resources (capacity) for each queue. Each queue has its own Access Control List (ACL) policies that manage which users have permission to submit applications to which queues. ACLs also manage the read and modify permissions at the queue level so that users cannot view or modify the applications submitted by other users.
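The idea of giving each queue a precalculated share of the cluster can be sketched with a short calculation. The queue names analytics and adhoc, the 60/40 split, and the class QueueCapacitySketch are assumptions made for illustration; the sketch also assumes a single level of queues whose percentages add up to 100.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative calculation only; not the capacity scheduler's implementation.
final class QueueCapacitySketch {

    // Converts per-queue percentage shares into guaranteed memory in MB.
    static Map<String, Long> guaranteedMemoryMb(long clusterMemoryMb,
                                                Map<String, Integer> queuePercent) {
        int total = 0;
        for (int percent : queuePercent.values()) {
            total += percent;
        }
        if (total != 100) {
            throw new IllegalArgumentException(
                    "queue capacities must add up to 100, got " + total);
        }
        Map<String, Long> guaranteed = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : queuePercent.entrySet()) {
            guaranteed.put(entry.getKey(), clusterMemoryMb * entry.getValue() / 100);
        }
        return guaranteed;
    }

    public static void main(String[] args) {
        Map<String, Integer> queues = new LinkedHashMap<>();
        queues.put("analytics", 60); // hypothetical queue
        queues.put("adhoc", 40);     // hypothetical queue
        // A cluster with 100 GB of memory available to containers.
        System.out.println(guaranteedMemoryMb(100L * 1024, queues));
    }
}
```

Each queue is guaranteed its share even when the other queue is busy; whether a queue may temporarily exceed that share is governed by the maximum constraints the administrator configures.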
Capacity scheduler configurations

Capacity scheduler configurations come with Hadoop YARN by default. Sometimes, it is necessary to configure the policy in the YARN configuration files. Here is the configuration property that needs to be specified in yarn-site.xml to enable the capacity scheduler policy:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>

The capacity scheduler, by default, comes with its own configuration file named $HADOOP_CONF_DIR/capacity-scheduler.xml; this file should be present in the classpath so that the ResourceManager is able to locate it and load its properties.
The fair scheduler

The fair scheduler is one of the most widely used pluggable schedulers for large clusters. It enables memory-intensive applications to share cluster resources in a very efficient way. Fair scheduling is a policy that allocates resources to applications in such a way that all applications get, on average, an equal share of the cluster resources over a given period.

In a fair scheduling policy, if one application is running on the cluster, it might request all the cluster resources for its execution, if needed. If other applications are submitted, the policy can distribute the free resources among the applications in such a way that each application gets a fairly equal share of the cluster resources. The fair scheduler also supports preemption, in which the ResourceManager might request resource containers back from the ApplicationMaster, depending on the job configurations. Such preemption might be healthy or unhealthy.

In this scheduling model, every application is part of a queue, so resources are assigned to the queue. By default, all users share a single queue called the default queue. The fair scheduler supports many features at the queue level, such as assigning a weight to the queue (a heavyweight queue gets a higher number of resources than lightweight queues), setting the minimum and maximum shares that a queue gets, and applying a FIFO policy within the queue.

While submitting the application, users might specify the name of the queue from which the application should use resources. For example, if the application requires a higher number of resources, it can specify the heavyweight queue so that it can get all the required resources that are available there.

The advantage of using the fair scheduling policy is that every queue gets a minimum share of the cluster resources. It is very important to note that when a queue contains applications that are waiting for resources, they get the minimum resource share. On the other hand, if the queue's resources are more than enough for the application, the excess amount is distributed equally among the running applications.
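The effect of queue weights on the shares can be sketched as a simple proportional split. This is a minimal model that ignores minimum and maximum shares, preemption, and actual demand; the class FairShareSketch is hypothetical and does not reflect the fair scheduler's real implementation.

```java
// Minimal proportional-share model; real fair scheduling also considers
// demand, minimum and maximum shares, and preemption.
final class FairShareSketch {

    // Splits a cluster resource among queues in proportion to their weights.
    static double[] weightedShares(double clusterResource, double[] weights) {
        double totalWeight = 0;
        for (double weight : weights) {
            if (weight <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            totalWeight += weight;
        }
        double[] shares = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            shares[i] = clusterResource * weights[i] / totalWeight;
        }
        return shares;
    }

    public static void main(String[] args) {
        // Two equally weighted queues sharing 64 containers.
        double[] equal = weightedShares(64, new double[] {1, 1});
        System.out.println(equal[0] + " and " + equal[1]);
        // A heavyweight queue (weight 3) next to a lightweight one (weight 1).
        double[] weighted = weightedShares(64, new double[] {3, 1});
        System.out.println(weighted[0] + " and " + weighted[1]);
    }
}
```

With equal weights each queue is entitled to 32 of the 64 containers; with weights 3 and 1 the heavyweight queue is entitled to 48 and the lightweight queue to 16.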
Fair scheduler configurations

To enable the fair scheduling policy in your YARN cluster, you need to specify the following property in the yarn-site.xml file:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

The fair scheduler also has a specific configuration file for a more detailed configuration setup; you will find it at $HADOOP_CONF_DIR/fair-scheduler.xml.
YARN multitenancy application support

YARN comes with built-in multitenancy support. Now, let's have a look at what multitenancy means. Consider a housing society that has multiple apartments. Different families live in different apartments with security and privacy, but they all share the society's common areas, such as the society gate, garden, play area, and other amenities. Their apartments also share common walls. The same concept is followed in YARN: the applications running in the cluster share the cluster resources in a multitenant way. They share cluster processing capacity, cluster storage capacity, data access security, and so on. Multitenancy is achieved in the cluster by separating applications into multiple business units, for example, different queues and users for different types of applications.

Security and privacy can be achieved by configuring Linux and HDFS permissions to separate files and directories and so create tenant boundaries. This can be achieved by integrating with LDAP or Active Directory. Security is used to enforce the tenant application boundaries, and this can be integrated with the Kerberos security model.

The following diagram explains how applications run in the YARN cluster in a multitenant way:

In the preceding YARN cluster, you can see that two jobs are running on a single YARN cluster: one is a Storm job, and the other is a MapReduce job. They share the cluster scheduler, cluster processing capacity, HDFS storage, and cluster security, as well as the common cluster infrastructure, CPU, RAM, and so on. The Storm ApplicationMaster, Storm Supervisor, MapReduce ApplicationMaster, Mappers, and Reducers all run over the YARN cluster in a multitenant way by sharing cluster resources.
Administration of YARN

Now, we will take a look at some basic YARN administration configurations. YARN was introduced in Hadoop 2.0 and brought changes to the Hadoop configuration files. Hadoop and YARN have the following basic configuration files:

- core-default.xml: This file contains properties related to the system.
- hdfs-default.xml: This file contains HDFS-related configurations.
- mapred-default.xml: This configuration file contains properties related to the YARN MapReduce framework.
- yarn-default.xml: This file contains YARN-related properties.

You will find all these properties listed on the Apache website (http://hadoop.apache.org/docs/current/) in the configuration section, with detailed information on each property and its default and possible values.
Administrative tools

YARN has several administrative tools by default; you can find them using the rmadmin command. Here is a more detailed explanation of the ResourceManager admin command:

    $ yarn rmadmin -help

The rmadmin command is the command to execute MapReduce administrative commands. The full syntax is:

    hadoop rmadmin [-refreshQueues] [-refreshNodes]
    [-refreshSuperUserGroupsConfiguration] [-refreshUserToGroupsMappings]
    [-refreshAdminAcls] [-refreshServiceAcl] [-getGroups [username]] [-help
    [cmd]]

The preceding command contains the following fields:

- -refreshQueues: Reloads the queues' ACLs, states, and scheduler-specific properties. The ResourceManager will reload the mapred-queues configuration file.
- -refreshNodes: Refreshes the hosts' information at the ResourceManager.
- -refreshUserToGroupsMappings: Refreshes user-to-groups mappings.
- -refreshSuperUserGroupsConfiguration: Refreshes superuser proxy groups mappings.
- -refreshAdminAcls: Refreshes ACLs for the administration of the ResourceManager.
- -refreshServiceAcl: Reloads the service-level authorization policy file. The ResourceManager will reload the authorization policy file.
- -getGroups [username]: Gets the groups that the given user belongs to.
- -help [cmd]: Displays help for the given command, or all commands if none is specified.

The generic options supported are as follows:

- -conf <configuration file>: This will specify an application configuration file.
- -D <property=value>: This will use the value for the given property.
- -fs <local|namenode:port>: This will specify a NameNode.
- -jt <local|jobtracker:port>: This will specify a JobTracker.
- -files <comma separated list of files>: This will specify comma-separated files to be copied to the MapReduce cluster.
- -libjars <comma separated list of jars>: This will specify comma-separated JAR files to include in the classpath.
- -archives <comma separated list of archives>: This will specify comma-separated archives to be unarchived on the compute machines.

The general command-line syntax is:

    bin/hadoop command [genericOptions] [commandOptions]

Adding and removing nodes from a YARN cluster
A YARN cluster is horizontally scalable; you can add worker nodes to the cluster or remove them from it without stopping it. To add a new node, all the required software must be installed and configured on the new node.
The following property is used to add a new node to the cluster:
yarn.resourcemanager.nodes.include-path
For removing a node from the cluster, the following property is used:
yarn.resourcemanager.nodes.exclude-path
Each of the preceding two properties takes as its value the path of a local file that contains the list of nodes to be added to or removed from the cluster. This file contains either the hostnames or the IPs of the worker nodes, separated by a newline, tab, or space.
After adding or removing a node, the YARN cluster does not require a restart. It just needs to refresh the list of worker nodes so that the ResourceManager is informed about the newly added or removed nodes:
$ yarn rmadmin -refreshNodes
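As a rough illustration of the file format described above, the following Java sketch splits the contents of an include or exclude file on newlines, tabs, and spaces. The class name HostListParser and the sample hostnames are hypothetical, and this is not the code the ResourceManager itself uses to read the file; it only shows that the three separators can be treated the same way.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class HostListParser {
    // Splits the file contents on any run of whitespace (newline, tab, or space)
    // and drops duplicate entries while keeping the original order.
    static Set<String> parseHosts(String contents) {
        Set<String> hosts = new LinkedHashSet<>();
        for (String token : contents.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                hosts.add(token);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Hypothetical worker names, mixing all three separators.
        String sample = "worker1.example.com\nworker2.example.com\t10.0.0.7 worker1.example.com";
        System.out.println(parseHosts(sample));
    }
}
```

A check like this can be handy before running the refresh command, for example to confirm that a host was not listed twice by mistake.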
Administrating YARN jobs
The most important YARN admin task is administrating running YARN jobs. You can manage YARN jobs using the yarn application CLI command.
Using the yarn application command, the administrator can kill a job, list all jobs, and find out the status of a job. MapReduce jobs can be controlled by the mapred job command.
Here is the usage of the yarn application command:
usage: application
 -appTypes <Comma-separated list of application types>    Works with -list to filter applications based on their type.
 -help    Displays help for all commands.
 -kill <Application ID>    Kills the application.
 -list    Lists applications from the RM. Supports optional use of -appTypes to filter applications based on application type.
 -status <Application ID>    Prints the status of the application.
MapReduce job configurations
As MapReduce jobs now run in YARN containers instead of traditional MapReduce slots, it’s necessary to configure the MapReduce properties in mapred-site.xml. Here are some properties that can be configured to run MapReduce jobs in YARN containers:
Property    Description
mapred.child.java.opts    This property is used to set the Java heap size for the child JVMs of tasks, for example -Xmx4096m. The task-specific mapreduce.map.java.opts and mapreduce.reduce.java.opts properties take precedence when they are set.
mapreduce.map.memory.mb    This property is used to configure the memory limit of the containers that run map tasks, for example 1536 MB.
mapreduce.reduce.memory.mb    This property is used to configure the memory limit of the containers that run reduce tasks, for example 3072 MB.
mapreduce.reduce.java.opts    This property is used to set the Java heap size for the child JVMs of reducers, for example -Xmx4096m.
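The memory.mb and java.opts properties interact: the JVM heap requested through -Xmx has to fit inside the container size, with some headroom left for non-heap memory. The following Java sketch shows one way to check a pair of values before deploying them. The class name HeapFitCheck and the values passed in main are illustrative assumptions, and the rule used here (heap strictly smaller than the container) is a simplification, not a rule enforced by Hadoop in this form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeapFitCheck {
    // Matches a JVM option such as -Xmx1024m or -Xmx2g.
    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG])");

    // Returns the -Xmx heap size in MB, or -1 if no -Xmx option with a k/m/g suffix is found.
    static long heapMb(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            return -1;
        }
        long value = Long.parseLong(m.group(1));
        switch (Character.toLowerCase(m.group(2).charAt(0))) {
            case 'k': return value / 1024;   // integer division, rounds down
            case 'g': return value * 1024;
            default:  return value;          // 'm'
        }
    }

    // True when a heap size is set and is strictly smaller than the container limit.
    static boolean heapFits(long containerMb, String javaOpts) {
        long heap = heapMb(javaOpts);
        return heap >= 0 && heap < containerMb;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        System.out.println("1536 MB container, -Xmx1024m fits: " + heapFits(1536, "-Xmx1024m"));
        System.out.println("1536 MB container, -Xmx4096m fits: " + heapFits(1536, "-Xmx4096m"));
    }
}
```

With a 1536 MB container, a 1024 MB heap passes the check while a 4096 MB heap does not, so heap values should be chosen together with the container size rather than independently.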
YARN log management
The log management CLI tool is very useful for YARN application log management. The administrator can use the logs CLI command described here:
$ yarn logs
Retrieve logs for completed YARN applications.
usage: yarn logs -applicationId <application ID> [OPTIONS]
general options are:
 -appOwner <Application Owner>    AppOwner (assumed to be current user if not specified)
 -containerId <Container ID>    ContainerId (must be specified if node address is specified)
 -nodeAddress <Node Address>    NodeAddress in the format nodename:port (must be specified if container ID is specified)
Let’s take an example. If you want to print all the logs of a specific application, use the following command:
$ yarn logs -applicationId <application ID>
This command will print all the logs related to the specified application ID on the console.
YARN web user interface
In the YARN web user interface (http://localhost:8088/cluster), you can find information on cluster nodes, the containers configured on each node, and applications and their status. The YARN web interface is as follows:
Under the Scheduler section, you can see scheduling information for all applications that have been submitted, accepted by the scheduler, or are running, along with the total cluster capacity, the used and maximum capacity, and the resources allocated to each application queue. In the following screenshot, you can see the resources allocated to the default queue:
Under the Tools section, you can find the YARN configuration file details, scheduling information, container configurations, local logs of the jobs, and a lot of other information on the cluster.
Summary
In this chapter, we covered YARN container allocations and configurations, scheduling policies, and configurations. We also covered multitenancy application support in YARN and some basic YARN administrative tools and settings. In the next chapter, we will cover some useful practical examples about YARN and the ecosystem.
Chapter 6. Developing and Running a Simple YARN Application
In the previous chapters, we discussed the concepts of the YARN architecture, cluster setup, and administration. Now in this chapter, we will focus more on MapReduce applications with YARN and its ecosystems, with some hands-on examples. You previously learned what happens when a client submits an application request to the YARN cluster and how YARN registers the application, allocates the required containers for its execution, and monitors the application while it’s running. Now, we will see some practical use cases of YARN.
In this chapter, we will discuss:
Running sample applications on YARN
Developing YARN examples
Application monitoring and tracking
Now, let’s start by running some of the sample applications that come as a part of the YARN distribution bundle.
Running sample examples on YARN
Running the available sample MapReduce programs is a simple task with YARN. The Hadoop distribution ships with some basic MapReduce examples. You can find them inside $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-<HADOOP_VERSION>.jar. The location of the file may differ depending on your Hadoop installation folder structure.
Let’s include this in the YARN_EXAMPLES path:
$ export YARN_EXAMPLES=$HADOOP_HOME/share/hadoop/mapreduce
Now, we have all the sample examples in the YARN_EXAMPLES environment variable. You can access all the examples using this variable; to list all the available examples, try typing the following command on the console:
$ yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar
An example program must be given as the first argument.
The valid program names are as follows:
aggregatewordcount: This is an aggregate-based map/reduce program that counts the words in the input files
aggregatewordhist: This is an aggregate-based map/reduce program that computes the histogram of the words in the input files
bbp: This is a map/reduce program that uses Bailey-Borwein-Plouffe to compute the exact digits of Pi
dbcount: This is an example job that counts the page view counts from a database
distbbp: This is a map/reduce program that uses a BBP-type formula to compute the exact bits of Pi
grep: This is a map/reduce program that counts the matches of a regex in the input
join: This is a job that effects a join over sorted, equally-partitioned datasets
multifilewc: This is a job that counts words from several files
pentomino: This is a map/reduce tile-laying program to find solutions to pentomino problems
pi: This is a map/reduce program that estimates Pi using a quasi-Monte Carlo method
randomtextwriter: This is a map/reduce program that writes 10 GB of random textual data per node
randomwriter: This is a map/reduce program that writes 10 GB of random data per node
secondarysort: This is an example that defines a secondary sort to the reduce
sort: This is a map/reduce program that sorts the data written by the random writer
sudoku: This is a sudoku solver
teragen: This generates data for the terasort
terasort: This runs the terasort
teravalidate: This checks the results of terasort
wordcount: This is a map/reduce program that counts the words in the input files
wordmean: This is a map/reduce program that computes the average length of the words in the input files
wordmedian: This is a map/reduce program that computes the median length of the words in the input files
wordstandarddeviation: This is a map/reduce program that computes the standard deviation of the length of the words in the input files
These were the sample examples that come as part of the YARN distribution by default. Now, let’s try running some of the examples to showcase YARN capabilities.
Running a sample Pi example
To run any application on top of YARN, you need to follow this command syntax:
$ yarn jar <application_jar.jar> <arg0> <arg1>
To run a sample example that calculates the value of Pi with 16 maps and 10,000 samples per map, use the following command:
$ yarn jar $YARN_EXAMPLES/hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar pi 16 10000
Note that we are using hadoop-mapreduce-examples-2.4.0.2.1.1.0-385.jar here. The JAR version may change depending on your installed Hadoop distribution.
Once you run the preceding command, you will see the logs generated by the application on the console, as shown in the following output. The default logger configuration writes to the console. The default log level is INFO; you can change it by overriding the default logger setting, for example by setting hadoop.root.logger=WARN,console in conf/log4j.properties:
Number of Maps = 16
Samples per Map = 10000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Starting Job
11/09/14 21:12:02 INFO mapreduce.Job: map 0% reduce 0%
11/09/14 21:12:09 INFO mapreduce.Job: map 25% reduce 0%
11/09/14 21:12:11 INFO mapreduce.Job: map 56% reduce 0%
11/09/14 21:12:12 INFO mapreduce.Job: map 100% reduce 0%
11/09/14 21:12:12 INFO mapreduce.Job: map 100% reduce 100%
11/09/14 21:12:12 INFO mapreduce.Job: Job job_1381790835497_0003 completed successfully
11/09/14 21:12:19 INFO mapreduce.Job: Counters: 44
    File System Counters
        FILE: Number of bytes read=358
        FILE: Number of bytes written=1365080
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=4214
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=67
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=16
        Launched reduce tasks=1
        Data-local map tasks=14
        Rack-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=184421
        Total time spent by all reduces in occupied slots (ms)=8542
    Map-Reduce Framework
        Map input records=16
        Map output records=32
        Map output bytes=288
        Map output materialized bytes=448
        Input split bytes=2326
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=448
        Reduce input records=32
        Reduce output records=0
        Spilled Records=64
        Shuffled Maps=16
        Failed Shuffles=0
        Merged Map outputs=16
        GC time elapsed (ms)=195
        CPU time spent (ms)=7740
        Physical memory (bytes) snapshot=6143396896
        Virtual memory (bytes) snapshot=23142254400
        Total committed heap usage (bytes)=43340769024
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1848
    File Output Format Counters
        Bytes Written=98
Job Finished in 23.144 seconds
Estimated value of Pi is 3.14127500000000000000
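To give a feel for what the pi example computes, here is a minimal single-JVM Java sketch of a quasi-Monte Carlo estimate. It involves no MapReduce and no Hadoop classes. The class name HaltonPi is hypothetical, and the use of a two-dimensional Halton sequence (bases 2 and 3) as the quasi-random source is an assumption made for this illustration. Points are placed in the unit square, and the fraction that falls inside the inscribed circle approximates Pi/4.

```java
class HaltonPi {
    // Returns element number `index` (1-based) of the van der Corput sequence
    // in the given base; pairing bases 2 and 3 gives a 2-D Halton sequence.
    static double radicalInverse(long index, int base) {
        double result = 0.0;
        double fraction = 1.0 / base;
        while (index > 0) {
            result += (index % base) * fraction;
            index /= base;
            fraction /= base;
        }
        return result;
    }

    // Estimates Pi as 4 * (points inside the inscribed circle) / (total points).
    static double estimatePi(long samples) {
        long inside = 0;
        for (long i = 1; i <= samples; i++) {
            // Shift the point so that the circle of radius 0.5 is centered at the origin.
            double x = radicalInverse(i, 2) - 0.5;
            double y = radicalInverse(i, 3) - 0.5;
            if (x * x + y * y <= 0.25) {
                inside++;
            }
        }
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        // Same total sample count as 16 maps with 10,000 samples each.
        System.out.println("Estimated value of Pi is " + estimatePi(16L * 10000));
    }
}
```

In the distributed job, each map task would handle its own share of the samples and a single reducer would add up the counts, which is consistent with the 16 map tasks and 1 reduce task reported in the counters.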
You can compare the example that runs over Hadoop 1.x with the one that runs over YARN. You can hardly tell them apart by looking at the logs, but you can clearly identify the difference in performance. YARN is backward compatible with MapReduce 1.x applications, without any code change.
Monitoring YARN applications with web GUI
Now, we will look at the YARN web GUI to monitor the examples. Using the ResourceManager UI, you can monitor the application submission ID, the user who submitted the application, the name of the application, the queue to which the application was submitted, the start time, the finish time in the case of finished applications, and the final status of the application. The ResourceManager web UI differs from the UI of the Hadoop 1.x versions. The following screenshot shows the information we could get from the YARN web UI (http://localhost:8088).
Currently, the web UI shows information related to the Pi example we ran in the previous section:
The following screenshot shows the Pi example running over the YARN framework, submitted by the root user to the default queue. An ApplicationMaster is assigned to it, which is currently in the running state. Similarly, you can monitor the status of all submitted, accepted, running, finished, and failed jobs from the ResourceManager web UI.
If you drill down further, you can see the ApplicationMaster-level information of the submitted application, such as the total containers allocated to the map and reduce tasks and their running status. For example, we submitted the Pi example with 16 mappers, so in the following screenshot you can see that the total number of containers allocated to map tasks is 16, out of which 8 are completed and 8 are in the running state. You can also track the containers allocated to the reduce task and its progress from the UI:
You can see all the information displayed on the console while running the job. The same information is also displayed on the web UI in tabular form and in more detail:
All the mapper and reducer job counters and filesystem counters are displayed under the counters section of the YARN application web GUI. You can also explore the configurations of the application in the configurations section:
The following screenshot shows the statistics of the finished job, such as the total number of mappers, reducers, start time, finish time, and so on:
The following screenshot of the YARN web UI gives scheduling information about the YARN cluster, such as the cluster resource capacity and containers allocated to the application or queue:
At the end, you will see the job summary page. You may also examine the logs by clicking on the logs link provided on the job summary page.
Once a user returns to the main cluster UI, chooses the finished applications, and then selects a job that was recently run, the user will be able to see the summary page, as shown in the following screenshot:
There are a few things to note as we moved through the windows described earlier. First, as YARN manages applications, all input from YARN refers to an application. YARN itself has no data about the job inside the application; data from the MapReduce job is provided by the MapReduce framework. Therefore, there are two clearly different data streams that are combined in the web GUI: YARN applications and MapReduce framework jobs. If the framework does not provide job information, then certain parts of the web GUI will have nothing to display.
A very important fact about YARN jobs is the dynamic nature of the container allocations to the mapper and reducer tasks. These tasks are executed as YARN containers, and their respective numbers change dynamically as per the application’s needs and requirements. This feature provides much better cluster utilization due to the dynamic container (“slots” in traditional language) allocations.
YARN’s MapReduce support
MapReduce was the only use case for which the previous versions of Hadoop were developed. We know that MapReduce is mainly used for the efficient and effective processing of big data. It is also used to process graphs with millions of nodes and edges. Going forward with technology, to cater for the requirements of data locality, fault-tolerant systems, and application priorities, YARN built support for everything from a simple shell script application to a complex MapReduce application.
For data locality, the MapReduce ApplicationMaster has to find out the data block locations and request containers to process these blocks accordingly. A fault-tolerant system means the ability to handle failed tasks and act on them accordingly, such as handling failed map and reduce tasks and rerunning them in other containers if needed. Priorities are assigned to each application in the queue; the logic to handle complex intra-application priorities for map and reduce tasks has to be built into the ApplicationMaster. There is no need to start idle reducers before the mappers have processed enough data. Reducers are now under the control of the YARN ApplicationMaster and are not fixed as they had been in Hadoop version 1.
The MapReduce ApplicationMaster
The MapReduce ApplicationMaster service is made up of multiple loosely-coupled services; these services interact with each other via events. Every service is triggered by an event and produces an output event that triggers another service; this happens highly concurrently and without synchronization. All service components are registered with the central dispatcher service, and service information is shared between the multiple components via the application context (AppContext).
In Hadoop version 1, all the running and submitted jobs are purely dependent on the JobTracker, so the failure of the JobTracker results in a loss of all the running and submitted jobs. With YARN, however, the per-application ApplicationMaster takes over the job-level role of the JobTracker. The ApplicationMaster runs the application and requests resources for it. It may fail, but YARN has the capability to restart the ApplicationMaster a specified number of times and the capability to recover completed tasks. Much like the JobTracker, the ApplicationMaster keeps the metrics of the jobs currently running. The following settings in the configuration files enable MapReduce recovery in YARN.
To enable the restart of the ApplicationMaster, execute the following steps:
1. Inside yarn-site.xml, you can tune the yarn.resourcemanager.am.max-attempts property (named yarn.resourcemanager.am.max-retries in early Hadoop 2.0.x releases). The default is 2.
2. Inside mapred-site.xml, you can directly tune how many times a MapReduce ApplicationMaster should restart with the mapreduce.am.max-attempts property. The default is 2.
3. To enable recovery of completed tasks, look inside the mapred-site.xml file. The yarn.app.mapreduce.am.job.recovery.enable property enables the recovery of tasks. By default, it is true.
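The two attempt limits above work together. The following Java sketch assumes that the per-application value from mapred-site.xml is capped by the cluster-wide value from yarn-site.xml, and that a non-positive per-application value falls back to the cluster-wide one. The class and method names are hypothetical, and the sketch models only the counting logic, not the actual restart mechanism inside the ResourceManager.

```java
class AmAttemptLimit {
    // Per-application limit, capped by the cluster-wide limit. A non-positive
    // per-application value falls back to the cluster-wide limit (assumed behavior).
    static int effectiveMaxAttempts(int perApplication, int clusterWide) {
        if (perApplication <= 0 || perApplication > clusterWide) {
            return clusterWide;
        }
        return perApplication;
    }

    // True while another ApplicationMaster attempt is still allowed.
    static boolean mayRestart(int attemptsSoFar, int maxAttempts) {
        return attemptsSoFar < maxAttempts;
    }

    public static void main(String[] args) {
        // Illustrative values: the job asks for 4 attempts, the cluster allows 2.
        int max = effectiveMaxAttempts(4, 2);
        System.out.println("effective max attempts: " + max);
        System.out.println("restart after 1 failed attempt: " + mayRestart(1, max));
        System.out.println("restart after 2 failed attempts: " + mayRestart(2, max));
    }
}
```

Under this assumption, raising only the MapReduce-side property has no effect unless the cluster-wide limit is raised as well.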
Example YARN MapReduce settings
YARN has replaced the fixed slot architecture for mappers and reducers with flexible dynamic container allocation. There are some important parameters to run MapReduce efficiently, and they can be found in mapred-site.xml and yarn-site.xml. As an example, the following are some settings that have been used to run a MapReduce application on YARN:
Property    Property file    Value
mapreduce.map.memory.mb    mapred-site.xml    1536
mapreduce.reduce.memory.mb    mapred-site.xml    2560
mapreduce.map.java.opts    mapred-site.xml    -Xmx1024m
mapreduce.reduce.java.opts    mapred-site.xml    -Xmx2048m
yarn.scheduler.minimum-allocation-mb    yarn-site.xml    512
yarn.scheduler.maximum-allocation-mb    yarn-site.xml    4096
yarn.nodemanager.resource.memory-mb    yarn-site.xml    36864
yarn.nodemanager.vmem-pmem-ratio    yarn-site.xml    2.1
This YARN configuration allows a container size between 512 MB and 4 GB. If the nodes have 36 GB of RAM and a virtual-to-physical memory ratio of 2.1, each map can have at most 1536 x 2.1 = 3225.6 MB, and each reducer at most 2560 x 2.1 = 5376 MB, of virtual memory. So, a compute node configured for 36 GB of container space can support up to 24 maps or 14 reducers, or any combination of mappers and reducers allowed by the available resources on the node.
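The arithmetic behind these figures can be reproduced with a few lines of Java. The class and method names below are hypothetical; the inputs are the values from the table above, and the sketch considers memory only (it ignores CPU and any scheduler-specific rounding).

```java
class ContainerMath {
    // Virtual memory limit of a container: physical limit times the vmem-pmem ratio.
    static double vmemLimitMb(int containerMb, double vmemPmemRatio) {
        return containerMb * vmemPmemRatio;
    }

    // How many containers of one size fit into the memory a NodeManager offers.
    static int maxContainersPerNode(int nodeMb, int containerMb) {
        return nodeMb / containerMb; // integer division rounds down
    }

    public static void main(String[] args) {
        int nodeMb = 36864;   // yarn.nodemanager.resource.memory-mb
        int mapMb = 1536;     // mapreduce.map.memory.mb
        int reduceMb = 2560;  // mapreduce.reduce.memory.mb
        double ratio = 2.1;   // yarn.nodemanager.vmem-pmem-ratio

        System.out.printf("map vmem limit (MB): %.1f%n", vmemLimitMb(mapMb, ratio));
        System.out.printf("reduce vmem limit (MB): %.1f%n", vmemLimitMb(reduceMb, ratio));
        System.out.println("max maps per node: " + maxContainersPerNode(nodeMb, mapMb));
        System.out.println("max reducers per node: " + maxContainersPerNode(nodeMb, reduceMb));
    }
}
```

The two per-node counts are alternatives rather than a sum: 24 map containers alone already use all 36,864 MB, so any reducer on the node takes space away from mappers.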
YARN’s compatibility with MapReduce applications
For a smooth transition from Hadoop v1 to YARN, application backward compatibility has been a major goal of the YARN implementation team, to ensure that existing MapReduce applications that were programmed using Hadoop v1 (MRv1) APIs and compiled against them can continue to run over YARN with little or no change.
YARN ensures full binary compatibility with the Hadoop v1 (MRv1) APIs; applications that use the org.apache.hadoop.mapred APIs are fully compatible with the YARN framework, without recompilation. You can use your MapReduce JAR file and bin/hadoop to submit them directly to YARN.
YARN introduced new API changes for MapReduce applications on top of the YARN framework in org.apache.hadoop.mapreduce.
If an application was developed using the org.apache.hadoop.mapreduce APIs and compiled against the Hadoop v1 (MRv1) APIs, then unfortunately YARN doesn’t provide binary compatibility with it. The org.apache.hadoop.mapreduce APIs have gone through a YARN transition, so such an application should be recompiled against Hadoop v2 (MRv2) to run over YARN.
Developing YARN applications
To develop a YARN application, you need to keep the YARN architecture in mind. YARN is a platform that allows distributed applications to take full advantage of the resources that YARN has deployed. Currently, resources can be things such as CPU, memory, and data. Many developers who come from a server-side application-development background or from a MapReduce developer background may be accustomed to a certain flow in the development and deployment cycle.
In this section, we’ll describe the development life cycle of YARN applications. We’ll also focus on the key areas of YARN application development, such as how YARN applications launch containers, how resource allocation is done for the applications, and many other areas in detail.
The general workflow of YARN application submission is that the YARN Client communicates with the ResourceManager through the ApplicationClientProtocol to generate a new Application ID. It then submits the application to the ResourceManager to run, again via the ApplicationClientProtocol. As a part of the protocol, the YARN Client has to provide all the information the ResourceManager requires to launch the application’s first container, that is, the ApplicationMaster. The YARN Client also needs to provide details of the dependency JARs/files for the application via command-line arguments. You can also specify the dependency JARs/files in environment variables.
The following are some interface protocols that the YARN framework uses for intercomponent communication:
ApplicationClientProtocol: This protocol is used by YARN for communication between the YARN Client and the ResourceManager to launch a new application, check its status, or kill the application.
ApplicationMasterProtocol: This protocol is used by the YARN framework for communication between the ApplicationMaster and the ResourceManager. It is used by the ApplicationMaster to register/unregister itself to/from the ResourceManager and also to send resource allocation/deallocation requests to the ResourceManager.
ContainerManagementProtocol: This protocol is used for communication between the ApplicationMaster and the NodeManager to start and stop containers and to get their status updates.
The YARN application workflow
Now, take a look at the following sequence diagram that describes the YARN application workflow and also explains how container allocation is done for an application via the ApplicationMaster:
Refer to the preceding diagram for the following details:
The client submits the application request to the ResourceManager.
The ResourceManager registers the application with the ApplicationManager, generates the Application ID, and responds to the client with the successfully registered Application ID.
Then, the ResourceManager starts the client’s ApplicationMaster in a separate available container. If no container is available, this request has to wait until a suitable container is found. Once started, the ApplicationMaster sends its registration request to the ResourceManager.
The ResourceManager shares the minimum and maximum resource capabilities of the cluster with the ApplicationMaster. Then, the ApplicationMaster decides how to use the available resources efficiently to fulfill the application’s needs.
Depending on the resource capabilities shared by the ResourceManager, the ApplicationMaster requests the ResourceManager to allocate a number of containers on behalf of the application.
The ResourceManager responds to the ResourceRequest from the ApplicationMaster as per the scheduling policies and resource availability. Container allocation by the ResourceManager means the successful fulfillment of the ResourceRequest from the ApplicationMaster.
While running the job, the ApplicationMaster sends heartbeats and job progress information for the application to the ResourceManager. During the run time of the application, the ApplicationMaster can ask the ResourceManager to release containers or to allocate more. When the job finishes, the ApplicationMaster sends a container deallocation request to the ResourceManager and then exits from its own container.
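The workflow above can be pictured as an application moving through a small set of states. The following Java sketch is a toy model only: the state names and allowed transitions are a simplified assumption made for illustration, and the class AppLifecycle is hypothetical rather than part of the YARN API.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class AppLifecycle {
    // Simplified, assumed set of application states.
    enum State { SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED }

    // Allowed next states for each state in this toy model.
    private static final Map<State, Set<State>> NEXT = new EnumMap<>(State.class);
    static {
        NEXT.put(State.SUBMITTED, EnumSet.of(State.ACCEPTED, State.FAILED));
        NEXT.put(State.ACCEPTED, EnumSet.of(State.RUNNING, State.FAILED));
        NEXT.put(State.RUNNING, EnumSet.of(State.FINISHED, State.FAILED));
        NEXT.put(State.FINISHED, EnumSet.noneOf(State.class));
        NEXT.put(State.FAILED, EnumSet.noneOf(State.class));
    }

    // True when the model allows a direct move from one state to another.
    static boolean canMove(State from, State to) {
        return NEXT.get(from).contains(to);
    }

    public static void main(String[] args) {
        // Walk the successful path: submitted, accepted, running, finished.
        State[] path = { State.SUBMITTED, State.ACCEPTED, State.RUNNING, State.FINISHED };
        for (int i = 0; i + 1 < path.length; i++) {
            System.out.println(path[i] + " -> " + path[i + 1] + ": " + canMove(path[i], path[i + 1]));
        }
    }
}
```

In this model, ACCEPTED corresponds to the application waiting for a container for its ApplicationMaster, and RUNNING to the period in which the ApplicationMaster heartbeats to the ResourceManager.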
Writing the YARN client
The YARN client is required to submit a job to the YARN framework. It is a plain Java class with a main method as its entry point. The main function of the YARN client is to submit the application to the YARN environment, which starts with instantiating an org.apache.hadoop.yarn.conf.YarnConfiguration object. The YarnConfiguration object depends on finding the yarn-default.xml and yarn-site.xml files in its classpath. All these requirements need to be satisfied to run the YARN client application. The YARN client process is shown in the following image:
Once a YarnConfiguration object is instantiated in your YARN client, you have to create an org.apache.hadoop.yarn.client.api.YarnClient object using that YarnConfiguration object. The newly instantiated YarnClient object is then used to submit applications to the YARN framework through the following steps:
1. Create an instance of a YarnClient object using YarnConfiguration.
2. Initialize the YarnClient with the YarnConfiguration object.
3. Start the YarnClient.
4. Get the YARN cluster, node, and queue information.
5. Get the Access Control List information for the user running the client.
6. Create the client application.
7. Submit the application to the YARN ResourceManager.
8. Get application reports after submitting the application.
Also, the YarnClient creates a context for application submission and for the ApplicationMaster's container launch. The runnable YarnClient takes command-line arguments from the user who runs the job. The following simple code snippets for the YARN application client give a better idea of how this works.
The first step of the YARN client is to connect to the ResourceManager. The following is the code snippet for it:
// Declare the ApplicationClientProtocol handle
ApplicationClientProtocol applicationsManager;
// Instantiate YarnConfiguration
YarnConfiguration yarnConf = new YarnConfiguration(conf);
// Get the ResourceManager address; if not provided, use the default
InetSocketAddress rmAddress =
    NetUtils.createSocketAddr(yarnConf.get(
        YarnConfiguration.RM_ADDRESS,
        YarnConfiguration.DEFAULT_RM_ADDRESS));
LOGGER.info("Connecting to ResourceManager at " + rmAddress);
Configuration appsManagerServerConf = new Configuration(conf);
appsManagerServerConf.setClass(
    YarnConfiguration.YARN_SECURITY_INFO,
    ClientRMSecurityInfo.class, SecurityInfo.class);
// Initialize the ApplicationManager handle
applicationsManager = ((ApplicationClientProtocol) rpc.getProxy(
    ApplicationClientProtocol.class, rmAddress,
    appsManagerServerConf));
Once the connection between the YARN client and the ResourceManager is established, the YARN client needs to request the Application ID from the ResourceManager:
GetNewApplicationRequest newRequest =
    Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse newResponse =
    applicationsManager.getNewApplication(newRequest);
// Keep the generated Application ID for the following steps
ApplicationId appId = newResponse.getApplicationId();
The response from the ApplicationManager contains the newly generated Application ID for the application submitted by the YARN client. It also carries information about the minimum and maximum resource capabilities of the cluster (through the GetNewApplicationResponse API). Using this information, developers can set the resources required for the ApplicationMaster container to launch.
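As a minimal sketch of how a client might use those limits, the following stdlib-only Java snippet clamps a requested ApplicationMaster memory size into the range reported by the cluster. The class and method names (AmMemoryPlanner, clampToCluster) are hypothetical and not part of the YARN API, and the rounding up to a multiple of the minimum allocation is only an assumption about how a scheduler may normalize requests, so treat it as illustrative:

```java
/** Hypothetical helper for illustration; not part of the YARN API. */
final class AmMemoryPlanner {
    private AmMemoryPlanner() {}

    /**
     * Clamps the requested memory (in MB) into [minMb, maxMb] and rounds it
     * up to a multiple of minMb (assumed scheduler normalization).
     */
    static int clampToCluster(int requestedMb, int minMb, int maxMb) {
        if (minMb <= 0 || maxMb < minMb) {
            throw new IllegalArgumentException("invalid cluster limits");
        }
        // Keep the request inside the limits reported by the cluster
        int clamped = Math.max(minMb, Math.min(requestedMb, maxMb));
        // Round up to the next multiple of minMb, but never exceed maxMb
        int rounded = ((clamped + minMb - 1) / minMb) * minMb;
        return Math.min(rounded, maxMb);
    }

    public static void main(String[] args) {
        // Sample limits (1024 MB minimum, 8192 MB maximum) are made up
        System.out.println(clampToCluster(1500, 1024, 8192));
        System.out.println(clampToCluster(100, 1024, 8192));
        System.out.println(clampToCluster(16384, 1024, 8192));
    }
}
```

In a real client, the two limits would be read from the response object rather than passed in as literals.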
The YARN client needs to set up the following information to initialize the ApplicationSubmissionContext; it includes everything the ResourceManager needs to launch the ApplicationMaster:
Application information, such as the Application ID generated in the previous step
Name of the application
Queue and priority information, such as the queue to which the application is to be submitted and the priority assigned to the application
User information, that is, the user submitting the application
ContainerLaunchContext, that is, the information needed to launch the ApplicationMaster, including its local resources (such as JARs, binaries, and files)
The ContainerLaunchContext also contains security-related information (security tokens) and environment variables (classpath settings), along with the command to be executed to start the ApplicationMaster:
// Create a new ApplicationSubmissionContext for the ApplicationMaster
ApplicationSubmissionContext appContext =
    Records.newRecord(ApplicationSubmissionContext.class);
// Set the Application ID
appContext.setApplicationId(appId);
// Set the application name
appContext.setApplicationName(appName);
// Create a new container launch context for the ApplicationMaster
ContainerLaunchContext amContainer =
    Records.newRecord(ContainerLaunchContext.class);
// Set the local resources required for the ApplicationMaster:
// local files or archives as needed (for example, JAR files)
Map<String, LocalResource> localResources =
    new HashMap<String, LocalResource>();
// Copy the ApplicationMaster JAR to the filesystem and create a
// local resource that points to the destination JAR path
FileSystem fs = FileSystem.get(conf);
Path src = new Path("AppMaster.jar");
String pathSuffix = appName + "/" + appId.getId() +
    "/AppMaster.jar";
Path dst = new Path(fs.getHomeDirectory(), pathSuffix);
// Copy the file from src to the destination on HDFS
fs.copyFromLocalFile(false, true, src, dst);
// Get the HDFS file status from the path where it was copied
FileStatus jarStatus = fs.getFileStatus(dst);
LocalResource amJarResource = Records.newRecord(LocalResource.class);
// Set the type of resource: file or archive
// Archives are untarred at the destination by the framework
amJarResource.setType(LocalResourceType.FILE);
// Set the visibility of the resource
// Setting it to the most private option
amJarResource.setVisibility(LocalResourceVisibility.APPLICATION);
// Set the location from which the resource is to be copied
amJarResource.setResource(ConverterUtils.getYarnUrlFromPath(dst));
// Set the timestamp and length of the file so that the framework
// can do basic sanity checks on the local resource
// after it has been copied over, to ensure it is the same
// resource the client intended to use with the application
amJarResource.setTimestamp(jarStatus.getModificationTime());
amJarResource.setSize(jarStatus.getLen());
localResources.put("AppMaster.jar", amJarResource);
// Set the local resources into the launch context
amContainer.setLocalResources(localResources);
// Set the security tokens as needed
// amContainer.setContainerTokens(containerToken);
// Set up the environment needed for the launch context in which the
// ApplicationMaster will run
Map<String, String> env = new HashMap<String, String>();
// For example, we could set up the classpath needed.
// In the case of the shell script example, put the required resources
env.put(DSConstants.SCLOCATION, HdfsSCLocation);
env.put(DSConstants.SCTIMESTAMP, Long.toString(HdfsSCTimeStamp));
env.put(DSConstants.SCLENGTH, Long.toString(HdfsSCLength));
// Add the AppMaster.jar location to the classpath.
// By default, all the Hadoop-specific classpaths will already be
// available in $CLASSPATH, so we should be careful not to overwrite it.
StringBuilder classPathEnv = new StringBuilder("$CLASSPATH:./*");
for (String str :
        conf.get(YarnConfiguration.YARN_APPLICATION_CLASSPATH).split(",")) {
    classPathEnv.append(':');
    classPathEnv.append(str.trim());
}
// Add log4j properties to the classpath if required
classPathEnv.append(":./log4j.properties");
env.put("CLASSPATH", classPathEnv.toString());
// Set the environment variables into the container
amContainer.setEnvironment(env);
// Set the command needed to execute the ApplicationMaster
Vector<CharSequence> vargs = new Vector<CharSequence>(30);
// Set the java executable command
vargs.add("${JAVA_HOME}" + "/bin/java");
// Set the Xmx memory based on the AM memory requirements
vargs.add("-Xmx" + amMemory + "m");
// Set the class name
vargs.add(amMasterMainClass);
// Set the parameters for the ApplicationMaster
vargs.add("--container_memory " + String.valueOf(containerMemory));
vargs.add("--num_containers " + String.valueOf(numContainers));
vargs.add("--priority " + String.valueOf(shellCmdPriority));
if (!shellCommand.isEmpty()) {
    vargs.add("--shell_command " + shellCommand + " ");
}
if (!shellArgs.isEmpty()) {
    vargs.add("--shell_args " + shellArgs + " ");
}
for (Map.Entry<String, String> entry : shellEnv.entrySet()) {
    vargs.add("--shell_env " + entry.getKey() + "=" +
        entry.getValue());
}
if (debugFlag) {
    vargs.add("--debug");
}
vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR +
    "/AppMaster.stdout");
vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR +
    "/AppMaster.stderr");
// Get the final command, with tokens separated by spaces
StringBuilder command = new StringBuilder();
for (CharSequence str : vargs) {
    command.append(str).append(" ");
}
List<String> commands = new ArrayList<String>();
commands.add(command.toString());
// Set the command array into the container spec
amContainer.setCommands(commands);
// For launching an AM container, setting the user here is not
// needed
// amContainer.setUser(amUser);
Resource capability = Records.newRecord(Resource.class);
// For now only memory is supported, so we set the memory
capability.setMemory(amMemory);
amContainer.setResource(capability);
// Set the container launch context into the
// ApplicationSubmissionContext
appContext.setAMContainerSpec(amContainer);
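Because the launch command ends up as a single shell string, the spacing between tokens matters. The following is a minimal, stdlib-only sketch of the same assembly step that the preceding snippet performs for the ApplicationMaster. The AmCommandBuilder class, its parameters, and the sample class name and log directory are hypothetical stand-ins; a real client would use ApplicationConstants.LOG_DIR_EXPANSION_VAR for the log directory:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the YARN API. */
final class AmCommandBuilder {
    private AmCommandBuilder() {}

    /** Builds the launch command as one space-separated string. */
    static String build(String mainClass, int amMemoryMb,
                        int containerMemoryMb, int numContainers,
                        String logDir) {
        List<String> tokens = new ArrayList<String>();
        tokens.add("${JAVA_HOME}/bin/java");
        tokens.add("-Xmx" + amMemoryMb + "m");
        tokens.add(mainClass);
        // Each flag and its value are kept as separate tokens
        tokens.add("--container_memory");
        tokens.add(String.valueOf(containerMemoryMb));
        tokens.add("--num_containers");
        tokens.add(String.valueOf(numContainers));
        // Redirect stdout and stderr into the container log directory
        tokens.add("1>" + logDir + "/AppMaster.stdout");
        tokens.add("2>" + logDir + "/AppMaster.stderr");
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // "com.example.MyAppMaster" and "/logs" are made-up sample values
        System.out.println(
            build("com.example.MyAppMaster", 512, 256, 2, "/logs"));
    }
}
```

Keeping every flag and value as its own token and joining once at the end avoids accidentally fusing a flag with its value.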
Now the setup process is complete, and our YARN client is ready to submit the application to the ApplicationManager:
// Create the application request to send to the ApplicationsManager
SubmitApplicationRequest appRequest =
    Records.newRecord(SubmitApplicationRequest.class);
appRequest.setApplicationSubmissionContext(appContext);
// Submit the application to the ApplicationsManager
// Ignore the response: either a valid response object is
// returned on success, or an exception is thrown to denote
// the failure
applicationsManager.submitApplication(appRequest);
During this process, the ResourceManager accepts the application submission request and allocates a container for the ApplicationMaster to run in. The progress of the task submitted by the client can be tracked by communicating with the ResourceManager and requesting an application status report via the ApplicationClientProtocol:
GetApplicationReportRequest reportRequest =
    Records.newRecord(GetApplicationReportRequest.class);
reportRequest.setApplicationId(appId);
GetApplicationReportResponse reportResponse =
    applicationsManager.getApplicationReport(reportRequest);
ApplicationReport report = reportResponse.getApplicationReport();
The response to the report request received from the ResourceManager contains general application information, such as the Application ID, the queue in which the application is running, and the user who submitted the application. It also contains the ApplicationMaster details, the host on which the ApplicationMaster is running, and application-tracking information to monitor the progress of the application.
The application report also contains the application status information, such as SUBMITTED, RUNNING, FINISHED, and so on.
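A client typically keeps requesting reports until the application reaches a final state. The sketch below models that check without any Hadoop dependency. The AppState enum is a hypothetical stand-in: it copies the state names mentioned here and adds ACCEPTED, FAILED, and KILLED on the assumption that they cover the "and so on"; in a real client, the state would be read from the ApplicationReport.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical model of report polling; not part of the YARN API. */
final class AppStatePoller {
    /** Stand-in for the application states carried by a report. */
    enum AppState { SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED }

    private AppStatePoller() {}

    /** Returns true when the application will not change state any more. */
    static boolean isTerminal(AppState state) {
        return state == AppState.FINISHED
            || state == AppState.FAILED
            || state == AppState.KILLED;
    }

    /**
     * Consumes reported states until a terminal one is seen and returns it,
     * or returns null if the reports run out first.
     */
    static AppState waitForCompletion(Iterator<AppState> reports) {
        while (reports.hasNext()) {
            AppState state = reports.next();
            if (isTerminal(state)) {
                return state;
            }
            // A real client would sleep here before the next report request
        }
        return null;
    }

    public static void main(String[] args) {
        Iterator<AppState> simulated = Arrays.asList(
            AppState.SUBMITTED, AppState.RUNNING, AppState.FINISHED)
            .iterator();
        System.out.println(waitForCompletion(simulated));
    }
}
```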
Also, the client can directly query the ApplicationMaster for report information via the host:rpc_port obtained from the ApplicationReport.
Sometimes, the application may be wrongly submitted to another queue or may take longer than usual. In such cases, the client may want to kill the application. The ApplicationClientProtocol supports a forceful kill operation that sends a kill signal to the ApplicationMaster via the ResourceManager:
KillApplicationRequest killRequest =
    Records.newRecord(KillApplicationRequest.class);
killRequest.setApplicationId(appId);
applicationsManager.forceKillApplication(killRequest);
Writing the YARN ApplicationMaster
This task is the heart of the whole process. The ApplicationMaster is launched by the ResourceManager, and all the necessary information is provided by the client. As the ApplicationMaster is launched in the first container allocated by the ResourceManager, several parameters are made available to it by the ResourceManager via the environment. These parameters include the container ID of the ApplicationMaster container, the application submission time, and details about the NodeManager and the host on which the ApplicationMaster is running. Interactions between the ApplicationMaster and the ResourceManager require the ApplicationAttemptID, which is obtained from the ApplicationMaster's container ID:
Map<String, String> envs = System.getenv();
String containerIdString =
    envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV);
if (containerIdString == null) {
    throw new IllegalArgumentException(
        "ContainerId not set in the environment");
}
ContainerId containerId =
    ConverterUtils.toContainerId(containerIdString);
ApplicationAttemptId appAttemptID =
    containerId.getApplicationAttemptId();
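To make the relationship between the two IDs concrete, here is a stdlib-only sketch that derives an attempt ID string from a container ID string. It assumes the textual layout container_<clusterTimestamp>_<appId>_<attemptId>_<containerId> and the appattempt_ prefix with zero-padded fields; the ContainerIdParts class is hypothetical, the sample ID is made up, and real code should rely on ConverterUtils and ContainerId as in the preceding snippet rather than on string parsing.

```java
import java.util.Locale;

/** Hypothetical parser for illustration; not part of the YARN API. */
final class ContainerIdParts {
    final long clusterTimestamp;
    final int applicationId;
    final int attemptId;
    final long containerId;

    private ContainerIdParts(long clusterTimestamp, int applicationId,
                             int attemptId, long containerId) {
        this.clusterTimestamp = clusterTimestamp;
        this.applicationId = applicationId;
        this.attemptId = attemptId;
        this.containerId = containerId;
    }

    /** Parses the assumed five-field, underscore-separated layout. */
    static ContainerIdParts parse(String containerIdString) {
        String[] parts = containerIdString.split("_");
        if (parts.length != 5 || !"container".equals(parts[0])) {
            throw new IllegalArgumentException(
                "Unexpected container ID: " + containerIdString);
        }
        return new ContainerIdParts(
            Long.parseLong(parts[1]), Integer.parseInt(parts[2]),
            Integer.parseInt(parts[3]), Long.parseLong(parts[4]));
    }

    /** Formats the attempt ID that owns this container (assumed layout). */
    String applicationAttemptId() {
        return String.format(Locale.ROOT, "appattempt_%d_%04d_%06d",
            clusterTimestamp, applicationId, attemptId);
    }

    public static void main(String[] args) {
        ContainerIdParts parts =
            parse("container_1410901177871_0001_01_000005");
        System.out.println(parts.applicationAttemptId());
    }
}
```

The point of the sketch is only that the attempt ID is fully contained in the container ID, which is why the ApplicationMaster needs no extra lookup to find it.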
After the successful initialization of the ApplicationMaster, it needs to be registered with the ResourceManager via the ApplicationMasterProtocol. The ApplicationMaster and the ResourceManager communicate via the Scheduler interface:
// Connect to the ResourceManager and return a handle to it
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress =
    NetUtils.createSocketAddr(yarnConf.get(
        YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS));
LOG.info("Connecting to ResourceManager at " + rmAddress);
ApplicationMasterProtocol resourceManager =
    (ApplicationMasterProtocol)
    rpc.getProxy(ApplicationMasterProtocol.class, rmAddress, conf);
// Register the ApplicationMaster with the ResourceManager
// Set the required info into the registration request:
// ApplicationAttemptId,
// host on which the app master is running
// rpc port on which the app master accepts requests from the client
// tracking url for the client to track app master progress
RegisterApplicationMasterRequest appMasterRequest =
    Records.newRecord(RegisterApplicationMasterRequest.class);
appMasterRequest.setApplicationAttemptId(appAttemptID);
appMasterRequest.setHost(appMasterHostname);
appMasterRequest.setRpcPort(appMasterRpcPort);
appMasterRequest.setTrackingUrl(appMasterTrackingUrl);
RegisterApplicationMasterResponse response =
    resourceManager.registerApplicationMaster(appMasterRequest);
The ApplicationMaster sends its status to the ResourceManager via heartbeat signals, and the timeout expiry interval at the ResourceManager is defined by configuration settings in YarnConfiguration. The ApplicationMaster uses the ApplicationMasterProtocol to send heartbeats and application progress information to the ResourceManager.
Depending on the application's requirements, the ApplicationMaster can ask the ResourceManager to allocate a number of containers. For this request, the ApplicationMaster uses the ResourceRequest API to define the container specifications. The ResourceRequest contains the hostname if the containers need to be hosted on specific hosts, or the * wildcard character, which implies that any host will do. It also contains the resource capabilities, such as the memory to be allocated to each container, and a priority, so that containers for specific tasks can be allocated at a higher priority. For example, in MapReduce, containers for map tasks are requested at a higher priority than containers for reduce tasks:
// ResourceRequest
ResourceRequest request = Records.newRecord(ResourceRequest.class);
// Set up the requirements for hosts:
// whether a particular rack/host is expected
// Refer to the APIs under org.apache.hadoop.net for more details
// Here we use * because any host will do
request.setHostName("*");
// Set the number of containers
request.setNumContainers(numContainers);
// Set the priority for the request
Priority pri = Records.newRecord(Priority.class);
pri.setPriority(requestPriority);
request.setPriority(pri);
// Set up the resource type requirements
// For now, only memory is supported, so we set the memory requirements
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(containerMemory);
request.setCapability(capability);
After defining the container requests, the ApplicationMaster has to build an allocation request for the ResourceManager. The AllocateRequest consists of the requested containers, the containers to be released, the ResponseID (the ID of the response that will be sent back from the allocate call), and progress update information:
List<ResourceRequest> requestedContainers;
List<ContainerId> releasedContainers;
AllocateRequest req = Records.newRecord(AllocateRequest.class);
// The response ID set in the request will be sent back in
// the response so that the ApplicationMaster can
// match it to its original ask and act appropriately.
req.setResponseId(rmRequestID);
// Set the ApplicationAttemptId
req.setApplicationAttemptId(appAttemptID);
// Add the list of containers being asked for by the AM
req.addAllAsks(requestedContainers);
// The ApplicationMaster can ask the ResourceManager to deallocate
// containers that it no longer requires.
req.addAllReleases(releasedContainers);
// The ApplicationMaster can report its progress by setting it here
req.setProgress(currentProgress);
AllocateResponse allocateResponse = resourceManager.allocate(req);
The response to the container allocation request sent by the ApplicationMaster to the ResourceManager contains information on the containers allocated to the ApplicationMaster, the number of hosts available in the cluster, and many more such details.
Containers are not immediately assigned to the ApplicationMaster by the ResourceManager. However, once the container request has been sent to the ResourceManager, the ApplicationMaster will eventually get the containers, based on cluster capacity, priorities, and the cluster scheduling policy:
// Retrieve the list of allocated containers from the response
List<Container> allocatedContainers =
    allocateResponse.getAllocatedContainers();
for (Container allocatedContainer : allocatedContainers) {
    LOG.info("Launching shell command on a new container."
        + ", containerId=" + allocatedContainer.getId()
        + ", containerNode=" + allocatedContainer.getNodeId().getHost()
        + ":" + allocatedContainer.getNodeId().getPort()
        + ", containerNodeURI=" + allocatedContainer.getNodeHttpAddress()
        + ", containerState=" + allocatedContainer.getState()
        + ", containerResourceMemory="
        + allocatedContainer.getResource().getMemory());
    LaunchContainerRunnable runnableLaunchContainer =
        new LaunchContainerRunnable(allocatedContainer);
    Thread launchThread = new Thread(runnableLaunchContainer);
    launchThreads.add(launchThread);
    launchThread.start();
}
// Check the resources currently available in the cluster
Resource availableResources = allocateResponse.getAvailableResources();
LOG.info("Current available resources in the cluster " +
    availableResources);
// Based on this information, an ApplicationMaster can make
// appropriate decisions
// Check the completed containers
List<ContainerStatus> completedContainers =
    allocateResponse.getCompletedContainersStatuses();
for (ContainerStatus containerStatus : completedContainers) {
    LOG.info("Got container status for containerID="
        + containerStatus.getContainerId()
        + ", state=" + containerStatus.getState()
        + ", exitStatus=" + containerStatus.getExitStatus()
        + ", diagnostics=" + containerStatus.getDiagnostics());
    int exitStatus = containerStatus.getExitStatus();
    if (0 != exitStatus) {
        // Container failed
        if (-100 != exitStatus) {
            // The application job on the container returned a non-zero
            // exit code; this counts as completed
            numCompletedContainers.incrementAndGet();
            numFailedContainers.incrementAndGet();
        }
        else {
            // Something else bad happened:
            // the app job did not complete for some reason
            // We should retry, as the container was lost for some
            // reason
            numRequestedContainers.decrementAndGet();
            // We do not need to release the container, as that has
            // already been done by the ResourceManager/NodeManager.
        }
    }
    else {
        // Nothing to do:
        // the container completed successfully
        numCompletedContainers.incrementAndGet();
        LOG.info("Container completed successfully."
            + ", containerId=" + containerStatus.getContainerId());
    }
}
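The bookkeeping in the preceding loop can be isolated into a small, testable unit. The following stdlib-only sketch uses a hypothetical ContainerOutcomeTracker class; the -100 value is simply the exit status the snippet checks for a lost container, and the counter names mirror the ones used in the loop.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical bookkeeping helper; not part of the YARN API. */
final class ContainerOutcomeTracker {
    static final int SUCCESS = 0;
    // Exit status that the snippet treats as "container was lost"
    static final int LOST = -100;

    final AtomicInteger completed = new AtomicInteger();
    final AtomicInteger failed = new AtomicInteger();
    final AtomicInteger requested;

    ContainerOutcomeTracker(int requestedContainers) {
        this.requested = new AtomicInteger(requestedContainers);
    }

    /** Updates the counters for one completed-container report. */
    void onContainerCompleted(int exitStatus) {
        if (exitStatus == SUCCESS) {
            completed.incrementAndGet();
        } else if (exitStatus == LOST) {
            // The container never finished its job, so it has to be
            // requested again; it does not count as completed
            requested.decrementAndGet();
        } else {
            // The job ran and failed: counts as completed and as failed
            completed.incrementAndGet();
            failed.incrementAndGet();
        }
    }

    /** Number of containers that still have to be requested again. */
    int containersToRequestAgain(int totalContainers) {
        return totalContainers - requested.get();
    }

    public static void main(String[] args) {
        ContainerOutcomeTracker tracker = new ContainerOutcomeTracker(3);
        tracker.onContainerCompleted(0);     // success
        tracker.onContainerCompleted(1);     // job failure
        tracker.onContainerCompleted(-100);  // lost container
        System.out.println("completed=" + tracker.completed.get()
            + ", failed=" + tracker.failed.get()
            + ", toRequestAgain=" + tracker.containersToRequestAgain(3));
    }
}
```

Separating a lost container from a failed job is what allows the ApplicationMaster to retry only the work that never ran.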
After container allocation is successfully performed for the ApplicationMaster, it has to set up the ContainerLaunchContext for the tasks that will run in the allocated containers. Once the ContainerLaunchContext is set, the ApplicationMaster can request the ContainerManager to start the allocated container:
// Assume an allocated Container obtained from the AllocateResponse
// that has already been initialized
Container container;
LOG.debug("Connecting to ContainerManager for containerid=" +
    container.getId());
// Connect to the ContainerManager on the allocated container
String cmIpPortStr = container.getNodeId().getHost() + ":"
    + container.getNodeId().getPort();
InetSocketAddress cmAddress = NetUtils.createSocketAddr(cmIpPortStr);
LOG.info("Connecting to ContainerManager at " + cmIpPortStr);
ContainerManager cm = ((ContainerManager)
    rpc.getProxy(ContainerManager.class, cmAddress, conf));
// Now we set up a ContainerLaunchContext
LOG.info("Setting up container launch context for containerid=" +
    container.getId());
ContainerLaunchContext ctx =
    Records.newRecord(ContainerLaunchContext.class);
ctx.setContainerId(container.getId());
ctx.setResource(container.getResource());
try {
    ctx.setUser(UserGroupInformation.getCurrentUser().getShortUserName());
} catch (IOException e) {
    LOG.info(
        "Getting current user failed when trying to launch the container: "
        + e.getMessage());
}
// Set the environment
Map<String, String> unixEnv;
// Set up the required env.
// Please note that the launched container does not inherit
// the environment of the ApplicationMaster, so all the
// necessary environment settings will need to be set up again
// for this allocated container.
ctx.setEnvironment(unixEnv);
// Set the local resources
Map<String, LocalResource> localResources =
    new HashMap<String, LocalResource>();
// Again, the local resources of the ApplicationMaster are not copied
// over by default to the allocated container. Thus, it is the
// responsibility of the ApplicationMaster to set up all the necessary
// local resources needed by the job that will be executed on the
// allocated container.
// Assume that we are executing a shell script on the allocated
// container and that the shell script's location in the filesystem
// is known to us.
Path shellScriptPath;
LocalResource shellRsrc = Records.newRecord(LocalResource.class);
shellRsrc.setType(LocalResourceType.FILE);
shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
shellRsrc.setResource(
    ConverterUtils.getYarnUrlFromPath(shellScriptPath));
shellRsrc.setTimestamp(shellScriptPathTimestamp);
shellRsrc.setSize(shellScriptPathLen);
localResources.put("MyExecShell.sh", shellRsrc);
ctx.setLocalResources(localResources);
// Set the necessary command to execute on the allocated container
String command = "/bin/sh ./MyExecShell.sh"
    + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
    + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";
List<String> commands = new ArrayList<String>();
commands.add(command);
ctx.setCommands(commands);
// Send the start request to the ContainerManager
StartContainerRequest startReq =
    Records.newRecord(StartContainerRequest.class);
startReq.setContainerLaunchContext(ctx);
try {
    cm.startContainer(startReq);
} catch (YarnRemoteException e) {
    LOG.info("Start container failed for:" + ", containerId=" +
        container.getId());
    e.printStackTrace();
}
The ApplicationMaster gets the application status information via the ApplicationMasterProtocol. It may also monitor the application by querying the ContainerManager for the container status:
GetContainerStatusRequest statusReq =
    Records.newRecord(GetContainerStatusRequest.class);
statusReq.setContainerId(container.getId());
GetContainerStatusResponse statusResp;
try {
    statusResp = cm.getContainerStatus(statusReq);
    LOG.info("Container Status"
        + ", id=" + container.getId()
        + ", status=" + statusResp.getStatus());
} catch (YarnRemoteException e) {
    e.printStackTrace();
}
These code snippets explain how to write the YARN client and the ApplicationMaster in general. The ApplicationMaster is an application-specific entity; each application or framework that wants to run on YARN has a different ApplicationMaster, but the flow is the same. For more details on the YARN client and the ApplicationMaster for different frameworks, visit the Apache Foundation website.
Responsibilities of the ApplicationMaster
The ApplicationMaster is an application-specific library and is responsible for negotiating resources from the ResourceManager according to the client application's requirements and needs. The ApplicationMaster works with the NodeManager to execute and monitor the containers and to track the application's progress. The ApplicationMaster itself runs in one of the containers allocated by the ResourceManager, and the ResourceManager tracks the progress of the ApplicationMaster.
The ApplicationMaster provides scalability to the YARN framework, as the ApplicationMaster can provide functionality that is very similar to that of the traditional ResourceManager, so the YARN cluster is able to scale with many hardware changes. Also, by moving all the application-specific code into the ApplicationMaster, YARN generalizes the system so that it can support multiple frameworks, just by writing the ApplicationMaster.
Summary
In this chapter, you learned how to use the bundled applications that come with the YARN framework, how to develop the YARN client and the ApplicationMaster (the core parts of the YARN framework), how to submit an application to YARN, how to monitor an application, and the responsibilities of the ApplicationMaster.
In the next chapter, you will learn to write some real-time practical examples.
Chapter 7. YARN Frameworks
It's the dawn of 2015, and big data is still in its booming stage. Many new start-ups and giants are investing huge amounts into developing POCs and new frameworks to cater to a new and emerging variety of problems. These frameworks are the new cutting-edge technologies or programming models that tend to solve problems across industries in the world of big data. As corporations try to use big data, they face a new and unique set of problems that they never faced before. Hence, to solve these new problems, many frameworks and programming models are coming onto the market.
YARN's support for multiple programming models and frameworks makes it ideal for integration with these new and emerging frameworks or programming models. With YARN taking responsibility for resource management and other necessary things (scheduling jobs, fault tolerance, and so on), these new application frameworks can focus on solving the problems that they were specifically meant for.
At the time of writing this book, many new and emerging open source frameworks are already integrated with YARN.
In this chapter, we will cover the following frameworks that run on YARN:
Apache Samza
Storm on YARN
Apache Spark
Apache Tez
Apache Giraph
Hoya (HBase on YARN)
KOYA (Kafka on YARN)
We will talk in detail about Apache Samza and Storm on YARN, for which we will develop and run some sample applications. For the other frameworks, we will have a brief discussion.
Apache Samza
Samza is an open source project from LinkedIn and is currently an incubation project at the Apache Software Foundation. Samza is a lightweight distributed stream-processing framework for real-time processing of data. The version that is available for download from the Apache website is not the production version that LinkedIn uses.
Samza is made up of the following three layers:
A streaming layer
An execution layer
A processing layer
Samza provides out-of-the-box support for all the preceding three layers:
Streaming: This layer is supported by Kafka (another open source project from LinkedIn)
Execution: This layer is supported by YARN
Processing: This layer is supported by the Samza API
The following three pieces fit together to form Samza:
The following architecture should be familiar to anyone who has used Hadoop:
Before going into each of these three layers in depth, it should be noted that Samza's support is not limited to these systems. Both Samza's execution and streaming layers are pluggable and allow developers to implement alternatives as required.
Samza is a stream-processing system that runs continuous computation on infinite streams of data.
Samza provides a system to process stream data from publish-subscribe systems such as Apache Kafka. The developer writes a stream-processing task and executes it as a Samza job. Samza then routes messages between the stream-processing tasks and the publish-subscribe systems that the messages are addressed to.
Samza works a lot like Storm, the Twitter-developed stream-processing technology, except that Samza runs on Kafka, LinkedIn's own messaging system. Samza was developed with a pluggable architecture, enabling developers to use the software with other messaging systems.
Apache Samza is basically a combination of the following technologies:
Kafka: Samza uses Apache Kafka as its underlying message-passing system
Apache YARN: Samza also uses Apache YARN for task scheduling
ZooKeeper: Both YARN and Kafka, in turn, rely on Apache ZooKeeper for coordination
More information is available on the official site at http://samza.incubator.apache.org/.
We will use the hello-samza project to develop a sample example that does some real-time stream processing.
We will write a Kafka producer using the Java Kafka APIs to publish a continuous stream of messages to a Kafka topic. Finally, we will write a Samza consumer using the Samza API to process these streams from the Kafka topic in real time. For simplicity, we will just print a message and record each time a message is received in the Kafka topic.
![Page 190: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/190.jpg)
Writing a Kafka producer
Let’s first write a Kafka producer to publish messages to a Kafka topic (named storm-sentence):
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

/**
 * A simple Java class to publish messages into Kafka.
 *
 * @author nirmal.kumar
 */
public class KafkaStringProducerService {

    public Producer<String, String> producer;

    public Producer<String, String> getProducer() {
        return this.producer;
    }

    public void setProducer(Producer<String, String> producer) {
        this.producer = producer;
    }

    public KafkaStringProducerService(Properties prop) {
        // Use the parameterized type rather than the raw Producer type
        setProducer(new Producer<String, String>(new ProducerConfig(prop)));
    }

    /**
     * Loads producer.properties. Change the location of the file in the
     * main method as needed. The file has the following properties:
     * kafka.zk.connect=192.xxx.xxx.xxx
     * serializer.class=kafka.serializer.StringEncoder
     * producer.type=async
     * queue.buffering.max.ms=5000000
     * queue.buffering.max.messages=1000000
     * metadata.broker.list=192.xxx.xxx.xxx:9092
     *
     * @param filepath
     * @return
     */
    private static Properties getConfigurationProperties(String filepath) {
        File path = new File(filepath);
        Properties properties = new Properties();
        // try-with-resources closes the stream even if load() fails
        try (FileInputStream in = new FileInputStream(path)) {
            properties.load(in);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return properties;
    }

    /**
     * Publishes each message to Kafka.
     *
     * @param input
     * @param ii
     */
    public void execute(String input, int ii) {
        KeyedMessage<String, String> data =
                new KeyedMessage<String, String>("storm-sentence", input);
        this.producer.send(data);
        // Logs to the console the number of messages published (every 100000)
        if ((ii != 0) && (ii % 100000 == 0))
            System.out.println("$$$$$$$ PUBLISHED " + ii + " messages @ "
                    + System.currentTimeMillis());
    }

    /**
     * Reads each line from the input message file.
     *
     * @param file
     * @return
     * @throws IOException
     */
    private static String readFile(String file) throws IOException {
        StringBuilder stringBuilder = new StringBuilder();
        String ls = System.getProperty("line.separator");
        // try-with-resources closes the reader once the file has been read
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line = null;
            while ((line = reader.readLine()) != null) {
                stringBuilder.append(line);
                stringBuilder.append(ls);
            }
        }
        return stringBuilder.toString();
    }

    /**
     * Main method for invoking the Java application.
     * Needs two command-line arguments: the number of messages to publish
     * and the absolute path of the file containing the String message.
     *
     * @param args
     */
    public static void main(String[] args) {
        int ii = 0;
        // First argument: how many times the message is published
        int noOfMessages = Integer.parseInt(args[0]);
        String s = null;
        try {
            // Second argument: path of the file holding the message
            s = readFile(args[1]);
        } catch (IOException e) {
            e.printStackTrace();
        }
        /**
         * Instantiate the main class.
         * Change the location of producer.properties accordingly.
         */
        KafkaStringProducerService service = new KafkaStringProducerService(
                getConfigurationProperties("/home/cloud/producer.properties"));
        System.out.println("******** START: Publishing " + noOfMessages
                + " messages @ " + System.currentTimeMillis());
        // Strictly less than, so that exactly noOfMessages messages are sent
        while (ii < noOfMessages) {
            // Invoke the execute method to publish messages into Kafka
            service.execute(s, ii);
            ii++;
        }
        System.out.println("####### END: Published " + noOfMessages
                + " messages @ " + System.currentTimeMillis());
        try {
            service.producer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Create the producer.properties file at /home/cloud/producer.properties and specify this location in the previous Kafka producer Java class.
The producer.properties file will have the following information:
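As a minimal sketch of this configuration step, the following snippet writes the property keys listed in the producer class’s Javadoc to a file and reads them back with java.util.Properties. The class name ProducerPropertiesSketch, the temporary file, and the localhost values are assumptions made for illustration; the book’s own setup keeps the file at /home/cloud/producer.properties and points at real ZooKeeper and broker addresses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class ProducerPropertiesSketch {

    // Builds the settings named in the producer's Javadoc.
    // The host arguments are placeholders supplied by the caller.
    static Properties defaults(String zkHost, String brokerList) {
        Properties p = new Properties();
        p.setProperty("kafka.zk.connect", zkHost);
        p.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        p.setProperty("producer.type", "async");
        p.setProperty("queue.buffering.max.ms", "5000000");
        p.setProperty("queue.buffering.max.messages", "1000000");
        p.setProperty("metadata.broker.list", brokerList);
        return p;
    }

    // Stores the settings in a temporary .properties file and loads them back,
    // mirroring how a producer would read its configuration from disk.
    static Properties roundTrip(Properties settings) {
        try {
            Path file = Files.createTempFile("producer", ".properties");
            try {
                try (OutputStream out = Files.newOutputStream(file)) {
                    settings.store(out, "Kafka producer settings");
                }
                Properties loaded = new Properties();
                try (InputStream in = Files.newInputStream(file)) {
                    loaded.load(in);
                }
                return loaded;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties loaded = roundTrip(defaults("localhost", "localhost:9092"));
        for (String key : loaded.stringPropertyNames()) {
            System.out.println(key + "=" + loaded.getProperty(key));
        }
    }
}
```

Storing the file through Properties.store escapes characters such as the colon in the broker address, and Properties.load reverses that, so the values survive the round trip unchanged.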
Writing the hello-samza project
Let’s now write a Samza consumer and package it with the hello-samza project:
1. Download and build the hello-samza project. Check out the hello-samza project:
git clone git://git.apache.org/incubator-samza-hello-samza.git hello-samza
cd hello-samza
The output of the preceding commands can be seen here:
2. Next, we will write a Samza consumer using the Samza API to process these N messages from a Kafka topic. Go to hello-samza/samza-wikipedia/src/main/java/samza/examples/wikipedia/task and write the YarnEssentialsSamzaConsumer.java file as follows:
3. After writing the Samza consumer class in the hello-samza project, you will need to build the project:
mvn clean package
4. Create a samza directory inside the deploy directory:
mkdir -p deploy/samza
5. Finally, extract the Samza job package into the deploy/samza directory:
tar -xvf ./samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz -C deploy/samza
6. For Samza consumer properties, go to /home/cloud/hello-samza/deploy/samza/config.
7. Write a samza-test-consumer.properties file as follows:
This properties file will mainly contain the following information:
job.name: This is the name of the Samza job
yarn.package.path: This is the path of the Samza job package
task.class: This is the class of the actual Samza consumer
task.inputs: This is the Kafka topic name from which the published messages will be read
systems.kafka.consumer.zookeeper.connect: This is the ZooKeeper-related information
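The five keys described above can be sketched as a java.util.Properties object, together with a small check that none of them has been left out. This is only an illustration: the class SamzaJobConfigSketch and every value in it (the job name, package path, task class name, topic, and ZooKeeper address) are assumptions, and a real Samza job typically needs further settings that the text does not list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

final class SamzaJobConfigSketch {

    // Only the keys that the text names; not a complete Samza configuration.
    static final List<String> NAMED_KEYS = Arrays.asList(
            "job.name",
            "yarn.package.path",
            "task.class",
            "task.inputs",
            "systems.kafka.consumer.zookeeper.connect");

    // Example values; all of them are placeholders for illustration.
    static Properties example() {
        Properties p = new Properties();
        p.setProperty("job.name", "samza-test-consumer");
        p.setProperty("yarn.package.path",
                "file:///home/cloud/hello-samza/samza-job-package/target/samza-job-package-0.7.0-dist.tar.gz");
        p.setProperty("task.class",
                "samza.examples.wikipedia.task.YarnEssentialsSamzaConsumer");
        // Input written as <system>.<stream>: the kafka system, storm-sentence topic
        p.setProperty("task.inputs", "kafka.storm-sentence");
        p.setProperty("systems.kafka.consumer.zookeeper.connect", "localhost:2181");
        return p;
    }

    // Returns the named keys that are absent or blank in the given config.
    static List<String> missingKeys(Properties config) {
        List<String> missing = new ArrayList<>();
        for (String key : NAMED_KEYS) {
            String value = config.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Properties config = example();
        for (String key : NAMED_KEYS) {
            System.out.println(key + "=" + config.getProperty(key));
        }
        System.out.println("missing: " + missingKeys(config));
    }
}
```

Running a check like missingKeys before submitting the job is a cheap way to catch a typo in a key name, which would otherwise only surface once the job is already on the grid.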
Starting a grid
A Samza grid usually comprises three different systems: YARN, Kafka, and ZooKeeper. The hello-samza project comes with a script called grid to help you set up these systems. Start by running the following command:
bin/grid bootstrap
This command will download, install, and start ZooKeeper, Kafka, and YARN. It will also check out the latest version of Samza and build it. All the package files will be put in a subdirectory called deploy inside the hello-samza project’s root folder. The result of the preceding command is shown here:
The following screenshot shows that ZooKeeper, YARN, and Kafka are being started:
Once all the processes are up and running, you can check the processes, as shown in this screenshot:
The YARN ResourceManager web UI will look like this:
The YARN NodeManager web UI will look like this:
Now that we have started the grid, let’s deploy the Samza job to it:
deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file:/home/cloud/hello-samza/deploy/samza/config/samza-test-consumer.properties
Check the application processes and the RM UI. As you can see in the following screenshot, running the Samza job first creates a SamzaAppMaster and then a SamzaContainer to run the consumer that we wrote:
The ResourceManager web UI now shows the Samza application up and running:
The ApplicationMaster UI looks as follows:
The following screenshot shows the ApplicationMaster UI:
Now that our Samza consumer is up and running and listening for messages in the Kafka topic (named storm-sentence), let’s publish some messages to the Kafka topic using the Kafka producer we wrote initially. The following Java command is used to invoke the Kafka producer, which takes two command-line arguments:
N: This is the number of times the message is published into Kafka
{pathOfFileNameHavingMessage}: This is the path of the file that contains the actual string message
Create any file having a string message (strmsg10K.txt) and pass this filename and path as the second command-line argument to the Java command, as shown in the following screenshot:
As soon as these messages are published in the Kafka topic, the Samza consumer consumes them and prints the timestamp, as written in the Samza consumer code.
The result after checking the Samza consumer logs is as follows:
Storm-YARN
Apache Storm is an open source distributed real-time computation system from Twitter.
Storm helps in processing unbounded streams of data in a reliable manner. Storm can be used with any programming language. Some of the most common use cases of Storm are real-time analytics, real-time machine learning, continuous computation, ETL, and many more.
Storm-YARN is a project from Yahoo that enables the Storm cluster to be deployed and managed by YARN. Earlier, separate clusters were needed for Hadoop and Storm.
One major benefit that comes with this integration is elasticity. Batch processing (Hadoop MapReduce) is usually done on the basis of need, whereas real-time processing (Storm) is an ongoing process. When the Hadoop cluster is idle, you can leverage it for any real-time processing work.
In a typical real-time processing use case, constant and predictable loads are very rare. Storm, therefore, will need more resources during peak time when the load is greater. At peak time, Storm can steal resources from the batch jobs and give them back when the load is less.
This way, the overall resource utilization can scale up and down depending on the load and demand. This elasticity is, therefore, useful for sharing the available resources between real-time and batch processing on the basis of demand.
Another benefit is that this integration reduces the physical distance of data transfers between Storm and Hadoop. Many applications use both Storm and Hadoop on separate clusters while sharing data between them (MapReduce). For such a scenario, Storm-YARN reduces network transfers, and in turn the total cost of acquiring the data, as they share the same cluster, as shown in the following image:
Referring to the preceding diagram, Storm-YARN asks YARN’s ResourceManager to launch a Storm ApplicationMaster. The Storm ApplicationMaster then launches a Storm Nimbus server and a Storm UI server locally. It also uses YARN to allocate resources for the supervisors and finally launches them.
We will now install Storm-YARN on a Hadoop YARN cluster and deploy some Storm topologies to the cluster.
Prerequisites
The following are the prerequisites for Storm-YARN.
Hadoop YARN should be installed
Refer to the Hadoop YARN installation at http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html.
The Master Thrift service of Storm-on-YARN uses port 9000, and if Storm-YARN is launched from the NameNode, there will be a port conflict.
In this case, you will need to change the port of the NameNode in your Hadoop installation. Typically, the following processes should be up and running in Hadoop:
Apache ZooKeeper should be installed
At the time of writing this book, the Storm-on-YARN ApplicationMaster implementation does not include running ZooKeeper on YARN. Therefore, it is presumed that there is a ZooKeeper cluster already running to enable communication between Nimbus and workers.
There is an open issue for this at https://github.com/yahoo/storm-yarn/issues/22.
Installing ZooKeeper is straightforward.
Refer to http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html.
Setting up Storm-YARN
Storm-YARN is basically an implementation of the YARN client and ApplicationMaster for Storm.
The client gets a new application ID for Storm and submits the application, and the ApplicationMaster sets up the Storm components (Nimbus, Supervisor, and so on) on YARN using the containers that the ApplicationMaster requests from the ResourceManager.
Note that Storm-on-YARN is not a new implementation of Storm that works on YARN. Frameworks (such as Samza, Storm, Spark, and Tez) themselves do not need to be modified to be able to run on YARN. Only the ApplicationMaster and the YARN client code need to be written for each of the frameworks so that they run on YARN as an application just like any other. Now, proceed with the following steps:
1. Clone the Storm-YARN repository from Git:
cd storm-on-yarn-poc/
git clone https://github.com/yahoo/storm-yarn.git
cd storm-yarn
The Storm client machine refers to the machine that will submit the YARN client and ApplicationMaster to the ResourceManager.
As of now, there is a single release of Storm-on-YARN from Yahoo that contains both Storm-YARN and Storm (version 0.9.0-wip21). The Storm release is present in the lib directory of the extracted Storm-on-YARN release.
2. Build Storm-YARN using Maven:
mvn package
Alternatively, to skip the tests:
mvn package -DskipTests
3. We will get the following output:
[INFO] Scanning for projects...
[INFO]
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building storm-yarn 1.0-alpha
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] Compiling 5 source files to /home/nirmal/storm-on-yarn-poc/storm-yarn-master/target/test-classes
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default) @ storm-yarn ---
[INFO]
[INFO] --- maven-surefire-plugin:2.10:test (default-test) @ storm-yarn ---
[INFO] Tests are skipped.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ storm-yarn ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.153s
[INFO] Finished at: 2014-11-12T15:57:49+05:30
[INFO] Final Memory: 10M/118M
[INFO] ------------------------------------------------------------------------
[INFO] Final Memory: 14M/152M
[INFO] ----------------------------------------------------
4. Next, you will need to copy the storm.zip file from storm-yarn/lib to HDFS. This is because Storm-on-YARN deploys a copy of the Storm code to all the nodes of the YARN cluster using HDFS. However, the location from which this copy of the Storm code is fetched is hardcoded into the Storm-on-YARN client. Copy the storm.zip file to HDFS using the following commands:
hdfs dfs -mkdir -p /lib/storm/0.9.0-wip21
Alternatively, you can also use the following command:
hadoop fs -mkdir -p /lib/storm/0.9.0-wip21
Then copy the file:
hdfs dfs -put /home/nirmal/storm-on-yarn-poc/storm-yarn-master/lib/storm.zip /lib/storm/0.9.0-wip21/storm.zip
You can also use the following command:
hadoop fs -put /home/nirmal/storm-on-yarn-poc/storm-yarn-master/lib/storm.zip /lib/storm/0.9.0-wip21/storm.zip
The exact version of Storm might differ, in your case, from 0.9.0-wip21.
5. Create a directory to hold our Storm configuration:
mkdir -p /home/nirmal/storm-on-yarn-poc/storm-data/
cp /home/nirmal/storm-on-yarn-poc/storm-yarn-master/lib/storm.zip /home/nirmal/storm-on-yarn-poc/storm-data/
cd /home/nirmal/storm-on-yarn-poc/storm-data
unzip storm.zip
6. Add the following configuration in the storm.yaml file located at /home/nirmal/storm-on-yarn-poc/storm-data/storm-0.9.0-wip21/conf. You can change the following values as per your setup:
storm.zookeeper.servers: localhost
nimbus.host: localhost
master.initial-num-supervisors: 2
master.container.size-mb: 1024
7. Add the storm-yarn/bin folder to your PATH variable:
export PATH=$PATH:/home/nirmal/storm-on-yarn-poc/storm-data/storm-0.9.0-wip21/bin:/home/nirmal/storm-on-yarn-poc/storm-yarn-master/bin
8. Finally, launch Storm-YARN using the following command:
storm-yarn launch /home/nirmal/storm-on-yarn-poc/storm-data/storm-0.9.0-wip21/conf/storm.yaml
Launching Storm-YARN executes the Storm-YARN client, which gets an app ID from YARN’s ResourceManager and starts running the Storm-YARN ApplicationMaster. The ApplicationMaster then starts the Nimbus, Workers, and Supervisor services. You will get an output similar to the one shown in the following screenshot:
9. We can retrieve the status of our application using the following YARN command:
yarn application -list
We will get the status of our application as follows:
10. You can also see Storm-YARN running on the ResourceManager web UI at http://localhost:8088/cluster/:
11. Nimbus should also be running now, and you should be able to see it through the Nimbus web UI at http://localhost:7070/. This looks as follows:
12. The following processes should be up and running:
Getting the storm.yaml configuration of the launched Storm cluster
The machine that will use the Storm client command to submit a new topology to Storm needs the storm.yaml configuration file of the launched Storm cluster on YARN to be stored in /home/nirmal/.storm/storm.yaml.
Normally, when Storm is not run on YARN, this configuration file is edited manually, so you need to know the IP addresses of the Storm components. However, since the location where the Storm components run on YARN depends on the location of the allocated containers, Storm-on-YARN is responsible for generating storm.yaml for us. You can fetch this storm.yaml file from the running Storm-on-YARN:
$ cd
$ mkdir .storm/
$ storm-yarn getStormConfig -appId <appId> -output /home/nirmal/.storm/storm.yaml
Here, <appId> is the application ID, which you can check on the YARN application UI at port 8088.
Building and running Storm-Starter examples
In this section, we will see how to get the example code from GitHub, build it using Maven, and finally, run the examples. To perform these tasks, you’ll have to execute the following steps:
1. Get the code from GitHub. We will use the storm-starter from GitHub:
git clone https://github.com/nathanmarz/storm-starter
Cloning into 'storm-starter'...
remote: Counting objects: 756, done.
remote: Total 756 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (756/756), 171.81 KiB | 56.00 KiB/s, done.
Resolving deltas: 100% (274/274), done.
Checking connectivity... done
2. Next, go to the downloaded storm-starter directory:
cd storm-starter/
3. Check the content using the following command:
ls -ltr
-rw-r--r-- 1 nirmal nirmal 171 Nov 12 12:58 README.markdown
-rw-r--r-- 1 nirmal nirmal 5047 Nov 12 12:58 m2-pom.xml
drwxr-xr-x 3 nirmal nirmal 4096 Nov 12 12:58 multilang
-rw-r--r-- 1 nirmal nirmal 580 Nov 12 12:58 LICENSE
drwxr-xr-x 4 nirmal nirmal 4096 Nov 12 12:58 src
-rw-r--r-- 1 nirmal nirmal 929 Nov 12 12:58 project.clj
drwxr-xr-x 3 nirmal nirmal 4096 Nov 12 12:58 test
-rw-r--r-- 1 nirmal nirmal 8042 Nov 12 12:58 storm-starter.iml
4. Build the storm-starter project using Maven:
mvn -f m2-pom.xml package
Alternatively, to skip the tests:
mvn -f m2-pom.xml package -DskipTests
5. You will see an output similar to the following:
[INFO] Scanning for projects...
[INFO] Using the builder org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder with a thread count of 1
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building storm-starter 0.0.1-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] META-INF/ already added, skipping
[INFO] META-INF/maven/ already added, skipping
[INFO] Building jar: /home/nirmal/storm-on-yarn-poc/storm-starter/target/storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar
[INFO] META-INF/MANIFEST.MF already added, skipping
[INFO] META-INF/ already added, skipping
[INFO] META-INF/maven/ already added, skipping
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:21 min
[INFO] Finished at: 2014-11-12T13:05:40+05:30
[INFO] Final Memory: 30M/191M
[INFO] ------------------------------------------------------------------------
6. After the build is successful, you will see the following JAR file being created under the target directory:
storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar
7. Run the Storm topology example on the Storm-YARN cluster:
storm jar storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology word-count-topology
The output can be seen in the following screenshot:
8. Click on the topology, as shown in the following screenshot:
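The WordCountTopology example in storm-starter splits incoming sentences into words and keeps a running count for each word. The plain-Java sketch below imitates that split-then-count flow on a single machine so the logic is easy to follow. It does not use any Storm classes, the class WordCountSketch and its methods are hypothetical names, and it says nothing about how Storm distributes the work across spouts and bolts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WordCountSketch {

    // Running totals, kept in first-seen order for readable output.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // "Split" stage: break a sentence into words on whitespace.
    static String[] split(String sentence) {
        String trimmed = sentence.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // "Count" stage: increment the total for one word and return the new total.
    int count(String word) {
        return counts.merge(word, 1, Integer::sum);
    }

    // Feeds one sentence through both stages.
    void process(String sentence) {
        for (String word : split(sentence)) {
            count(word);
        }
    }

    Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        WordCountSketch sketch = new WordCountSketch();
        sketch.process("the cow jumped over the moon");
        sketch.process("the man went to the store");
        System.out.println(sketch.counts());
    }
}
```

In a real topology the two stages run as separate components connected by a stream, which is what lets Storm scale each stage independently.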
Apache Spark
Apache Spark is a fast and general engine for large-scale data processing. It was originally developed in 2009 in UC Berkeley’s AMPLab and open sourced in 2010.
The main features of Spark are as follows:
Speed: Spark enables applications in Hadoop clusters to run up to 100x faster than Hadoop MapReduce in memory and 10x faster even when running on disk.
Ease of use: Spark lets you quickly write applications in Java, Scala, or Python. You can use it interactively to query big datasets from the Scala and Python shells.
Runs everywhere: Spark runs on Hadoop YARN, on Apache Mesos, in standalone cluster mode, or in the cloud (for example, on EC2). It can access diverse data sources, including HDFS, Cassandra, HBase, S3, and any other Hadoop data source.
Generality: Spark powers a stack of high-level tools, including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these frameworks seamlessly in the same application.
Why run on YARN?
YARN enables Spark to run in a single cluster alongside other frameworks, such as Tez, Storm, HBase, and others. This avoids the need to create and manage separate and dedicated Spark clusters.
Typically, customers want to run multiple workloads on a single dataset in a single cluster. YARN, as a generic resource management layer and single data platform for all the different frameworks and engines, makes this possible.
YARN’s built-in multitenancy support allows dynamic and optimal sharing of the same cluster resources between the different frameworks that run on YARN.
YARN has pluggable schedulers to categorize, isolate, and prioritize workloads.
Apache Tez
Apache Tez is part of the Stinger initiative led by Hortonworks to make Hive enterprise ready and suitable for interactive SQL queries. The Tez design is based on research done by Microsoft on parallel and distributed computing.
Tez entered the Apache Incubator in February 2013 and graduated to a top-level project in July 2014.
Tez is basically an embeddable and extensible framework to build high-performance batch and interactive data-processing applications that need to integrate easily with YARN.
Confusion often arises when Tez is thought of as an engine. Tez is not a general-purpose engine, but more of a framework for tools to express their purpose-built needs. Tez, for example, enables Hive, Pig, and others to build their own purpose-built engines and embed them in those technologies. Projects such as Hive, Pig, and Cascading now have significant improvements in response times when they use Tez instead of MapReduce.
Tez generalizes the MapReduce paradigm to a more powerful framework based on expressing computations as a dataflow graph. Tez exists to address some of the limitations of MapReduce. For example, in a typical MapReduce job, a lot of temporary data is stored (such as each mapper’s output, which costs disk I/O), which is an overhead. In the case of Tez, this disk I/O for temporary data is avoided, thereby resulting in higher performance compared to the MapReduce model.
Also, Tez can adjust the parallelism of reduce tasks at runtime, depending on the actual data size coming out of the previous task. On the other hand, in MapReduce the number of reducers is static and has to be decided by the user before the job is submitted to the cluster.
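The idea of sizing the reduce stage from the data that actually arrives can be sketched with a little arithmetic. The snippet below is not Tez code: the class ReducerSizingSketch, the method reducersFor, and the parameters bytesPerReducer and maxReducers are all hypothetical, chosen only to show how a runtime decision differs from a fixed, user-supplied reducer count.

```java
final class ReducerSizingSketch {

    // Picks a reducer count from the observed upstream output size.
    // upstreamBytes   : bytes produced by the previous stage (known only at runtime)
    // bytesPerReducer : how much data one reducer should handle (assumed tuning knob)
    // maxReducers     : upper bound, for example the count configured up front
    static int reducersFor(long upstreamBytes, long bytesPerReducer, int maxReducers) {
        if (upstreamBytes < 0 || bytesPerReducer <= 0 || maxReducers < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division without floating point
        long needed = upstreamBytes / bytesPerReducer
                + (upstreamBytes % bytesPerReducer == 0 ? 0 : 1);
        // Always run at least one reducer, never more than the bound
        return (int) Math.max(1L, Math.min((long) maxReducers, needed));
    }

    public static void main(String[] args) {
        long perReducer = 256L * 1024 * 1024; // 256 MB per reducer, an assumed value
        System.out.println(reducersFor(1_000L * 1024 * 1024, perReducer, 100));
        System.out.println(reducersFor(10L * 1024 * 1024, perReducer, 100));
    }
}
```

With a static count, both calls in main would have to use the same number of reducers; sizing at runtime lets the small input run with a single reducer while the larger one gets several.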
The processing done by multiple MapReduce jobs can now be done by a single Tez job, as follows:
Referring to the preceding diagram, earlier (with Pig/Hive), we needed multiple MapReduce jobs to do some processing. Now, with Tez, a single job does the same; that is, the reducers (the green boxes) of the previous step feed the mappers (the blue boxes) of the next step.
The preceding image is taken from http://www.infoq.com/articles/apache-tez-saha-murthy.
Tez is not meant directly for end users; rather, it enables developers to build end-user applications with much better performance and flexibility. Traditionally, Hadoop has been a batch-processing platform to process large amounts of data. However, there are a lot of use cases for near-real-time performance of query processing. There are also several workloads, such as machine learning, that do not fit into the MapReduce paradigm. Tez helps Hadoop address these use cases.
Tez provides an expressive dataflow-definition API that lets developers create their own unique data-processing graphs (DAGs) to represent their applications’ data-processing flows. Once the developer defines a flow, Tez then provides additional APIs to inject custom business logic that will run in that flow. These APIs then combine inputs (that read data), outputs (that write data), and processors (that process data) to process the flow.
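To make the dataflow-graph idea concrete, here is a small sketch in plain Java. It is not the Tez API: the class DataflowDagSketch, its methods, and the vertex names are hypothetical. It only shows the underlying notion that processing steps are vertices, data movement is edges, and a step can run once everything upstream of it has finished.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DataflowDagSketch {

    // vertex -> vertices that consume its output
    private final Map<String, List<String>> downstream = new LinkedHashMap<>();
    // vertex -> number of upstream vertices it waits for
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addVertex(String name) {
        downstream.putIfAbsent(name, new ArrayList<>());
        inDegree.putIfAbsent(name, 0);
    }

    // Declares that the output of 'from' flows into 'to'.
    void addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        downstream.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
    }

    // Returns an order in which every vertex appears after all of its inputs
    // (Kahn's topological sort); fails if the graph contains a cycle.
    List<String> executionOrder() {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String vertex = ready.remove();
            order.add(vertex);
            for (String next : downstream.get(vertex)) {
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != remaining.size()) {
            throw new IllegalStateException("graph has a cycle, so it is not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        DataflowDagSketch dag = new DataflowDagSketch();
        dag.addEdge("read-orders", "join");
        dag.addEdge("read-users", "join");
        dag.addEdge("join", "aggregate");
        System.out.println(dag.executionOrder());
    }
}
```

Expressing the whole flow as one graph is what allows a framework to hand data from one step straight to the next instead of writing it out between separate jobs.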
Tez can also run any existing MR job without any modification. For more information on Tez, refer to http://tez.apache.org/.
Apache Giraph
Apache Giraph is a graph-processing system that uses the MapReduce model to process graphs. It is a top-level project at the Apache Software Foundation.
It is based on Google’s Pregel, which is used to calculate PageRank.
Currently, Giraph is being used by Facebook, Twitter, and LinkedIn to create social graphs of their users. Both Giraph and Pregel are based on the Bulk Synchronous Parallel (BSP) model of distributed computation, which was introduced by Leslie Valiant.
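The BSP model can be illustrated with a toy, single-process sketch: in each superstep every active vertex sends messages, a barrier follows, and only then do vertices read what they received. The example below propagates the largest value through a small graph in that style. It does not use the Giraph API; the class BspMaxValueSketch, its run method, and the sample graph are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BspMaxValueSketch {

    // Runs supersteps until no vertex sends a message.
    // values[v]    : current value of vertex v (updated in place)
    // neighbors[v] : vertices that v sends messages to
    // Returns the number of supersteps in which messages were exchanged.
    static int run(int[] values, int[][] neighbors) {
        int n = values.length;
        boolean[] active = new boolean[n];
        Arrays.fill(active, true);
        int supersteps = 0;
        while (true) {
            // Send phase: every active vertex sends its value to its neighbors.
            List<List<Integer>> inbox = new ArrayList<>();
            for (int v = 0; v < n; v++) {
                inbox.add(new ArrayList<>());
            }
            boolean anySent = false;
            for (int v = 0; v < n; v++) {
                if (!active[v]) {
                    continue;
                }
                for (int u : neighbors[v]) {
                    inbox.get(u).add(values[v]);
                    anySent = true;
                }
            }
            if (!anySent) {
                break; // every vertex has halted and no messages are in flight
            }
            supersteps++;
            // Barrier: messages become visible only after all sends are done.
            // Compute phase: adopt the largest value seen; a vertex stays
            // active only if its value changed, otherwise it votes to halt.
            for (int v = 0; v < n; v++) {
                int best = values[v];
                for (int message : inbox.get(v)) {
                    best = Math.max(best, message);
                }
                active[v] = best > values[v];
                values[v] = best;
            }
        }
        return supersteps;
    }

    public static void main(String[] args) {
        int[] values = {3, 6, 2, 1};
        int[][] neighbors = {{1}, {0, 2}, {1, 3}, {2}}; // a path 0-1-2-3
        int supersteps = run(values, neighbors);
        System.out.println(Arrays.toString(values) + " after " + supersteps + " supersteps");
    }
}
```

A halted vertex is woken again simply by receiving a larger value, which mirrors the way Pregel-style systems reactivate vertices on incoming messages.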
Support for YARN is available from release 1.1.0. For more information, refer to the official site at http://giraph.apache.org/.
HOYA (HBase on YARN)
Hoya is a project for running HBase on YARN. It is currently hosted on GitHub, but there are plans to move it to the Apache Software Foundation.
Hoya creates HBase clusters on top of YARN. It does this with a client application called the Hoya client; this application creates the persistent configuration files, sets up the HBase cluster XML files, and then asks YARN to create an ApplicationMaster, which is the Hoya AM here.
For more information, refer to https://github.com/hortonworks/hoya, http://hortonworks.com/blog/introducing-hoya-hbase-on-yarn/ and http://hortonworks.com/blog/hoya-hbase-on-yarn-application-architecture/.
![Page 224: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/224.jpg)
![Page 225: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/225.jpg)
KOYA (Kafka on YARN)
On November 5, 2014, DataTorrent, a company founded by former Yahoo! employees, announced a new project to bring the fault-tolerant, high-performance, scalable Apache Kafka messaging system to YARN.
The so-called Kafka on YARN (KOYA) project plans to leverage YARN for Kafka broker management, automatic broker recovery, and more. Planned features include a fully-HA ApplicationMaster, sticky allocation of containers (so that a restart can access local data), a web interface for Kafka, and more.
The expected release to the open source community is sometime in Q2 2015.
More information is available at https://www.datatorrent.com/introducing-koya-apache-kafka-on-apache-hadoop-2-0-yarn/.
![Page 226: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/226.jpg)
![Page 227: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/227.jpg)
Summary
This chapter talked about the different frameworks and programming models that can be run on YARN. We discussed Apache Samza and Storm on YARN in detail.
With the wide acceptance of YARN in the industry, more and more frameworks will support YARN, taking complete advantage of YARN’s generic features.
We looked at the existing frameworks that are integrated with YARN at the moment.
There is a lot more work going on in the industry to make existing and new applications run on YARN.
In Chapter 8, Failures in YARN, we will discuss how faults and failures at various levels are handled in YARN.
![Page 228: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/228.jpg)
![Page 229: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/229.jpg)
Chapter 8. Failures in YARN
Dealing with failures in distributed systems is comparatively more challenging and time consuming. Also, the Hadoop and YARN frameworks run on commodity hardware, and cluster sizes nowadays vary from several nodes to several thousand nodes. So handling failure scenarios and dealing with ever-growing scaling issues is very important. In this chapter, we will focus on failures in the YARN framework: the causes of failures and how to overcome them.
In this chapter, we will cover the following topics:
ResourceManager failures
ApplicationMaster failures
NodeManager failures
Container failures
Hardware failures
We will be dealing with the root causes of these failures and the solutions to them.
![Page 230: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/230.jpg)
ResourceManager failures
In the initial versions of the YARN framework, a ResourceManager failure meant a total cluster failure, as the ResourceManager was a single point of failure. The ResourceManager stores the state of the cluster, such as the metadata of the submitted applications, information on cluster resource containers, information on the cluster’s general configurations, and so on. Therefore, if the ResourceManager went down because of some hardware failure, there was no way to avoid manually debugging the cluster and restarting the ResourceManager. While the ResourceManager was down, the cluster was unavailable, and once it was restarted, all jobs needed a restart, so half-completed jobs lost their progress and had to be run again. In short, a restart of the ResourceManager used to restart all the running ApplicationMasters.
The latest versions of YARN address this problem in two ways. One way is by creating an active-passive ResourceManager architecture, so that when one goes down, another becomes active and takes responsibility for the cluster. The ResourceManager (RM) state can be seen in the following image:
Another way is by using the ZooKeeper ResourceManager quorum, so that the ResourceManager state is stored externally in ZooKeeper, and one ResourceManager is in an active state and one or more ResourceManagers are in passive mode, waiting for something to happen that brings them to an active state. The ResourceManager’s state can be seen in the following image:
![Page 231: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/231.jpg)
In the preceding diagram, you can see that the ResourceManager’s state is managed by ZooKeeper. Whenever there is a failure condition, the ResourceManager’s state is shared with the passive ResourceManager(s), one of which changes to an active state and takes over responsibility for the cluster, without any downtime.
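The active/passive handover can be sketched with a toy, in-memory model. This is an illustration only, not YARN or ZooKeeper code: the classes StateStore and ResourceManager and their methods are hypothetical, and the store's active field merely stands in for the lock that an external coordination service would provide. The point it shows is that application state lives outside the ResourceManager process, so a standby can take over without losing it.

```python
class StateStore:
    """In-memory stand-in for an external store such as ZooKeeper."""

    def __init__(self):
        self.active = None        # id of the RM currently holding the lock
        self.applications = {}    # application id -> state


class ResourceManager:
    def __init__(self, rm_id, store):
        self.rm_id = rm_id
        self.store = store
        self.alive = True

    def try_become_active(self):
        # Only a live RM can take the lock, and only if nobody holds it.
        if self.alive and self.store.active is None:
            self.store.active = self.rm_id
        return self.store.active == self.rm_id

    def submit(self, app_id):
        if self.store.active != self.rm_id:
            raise RuntimeError("a standby RM cannot accept applications")
        # State is written to the shared store, not kept in this process.
        self.store.applications[app_id] = "RUNNING"

    def crash(self):
        self.alive = False
        if self.store.active == self.rm_id:
            self.store.active = None   # models the lock being released


store = StateStore()
rm1 = ResourceManager("rm1", store)
rm2 = ResourceManager("rm2", store)

rm1_active = rm1.try_become_active()       # rm1 wins the election
rm2_active = rm2.try_become_active()       # rm2 stays in standby
rm1.submit("application_0001")
rm1.crash()
rm2_after_failover = rm2.try_become_active()
print(store.active, store.applications)
```

After the simulated crash, rm2 becomes active and still sees application_0001 because the state was never held only in rm1's memory.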
![Page 232: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/232.jpg)
![Page 233: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/233.jpg)
ApplicationMaster failures
Recovering an application’s state after an ApplicationMaster failure is the responsibility of the ApplicationMaster itself. When the ApplicationMaster fails, the ResourceManager simply starts another container with a new ApplicationMaster running in it for another application attempt. It is the responsibility of the new ApplicationMaster to recover the state of the older ApplicationMaster, and this is possible only when ApplicationMasters persist their state in an external location so that it can be used for future reference. Alternatively, an ApplicationMaster can simply rerun the application from scratch instead of recovering its state.
For example, an ApplicationMaster can recover its completed jobs. However, jobs that were still running when the ApplicationMaster failed, or that were halted for some reason during its recovery time frame, have their state discarded, and the ApplicationMaster simply reruns them from scratch.
The YARN framework is capable of rerunning the ApplicationMaster a specified number of times and recovering the completed tasks.
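The retry-and-recover behaviour described above can be sketched as a small simulation. The function run_application and its parameters are hypothetical and are not part of YARN; in a real cluster the attempt limit is set through configuration (the yarn.resourcemanager.am.max-attempts property), and the completed-task record would live in external storage such as HDFS rather than in a Python list.

```python
def run_application(tasks, max_attempts, fail_on):
    """Simulate ApplicationMaster attempts with recovery of completed tasks.

    tasks: ordered list of task names
    max_attempts: how many ApplicationMaster attempts are allowed
    fail_on: set of (attempt, task) pairs at which the simulated AM crashes
    """
    completed = []   # persisted outside the AM, so it survives an AM failure
    executed = []    # every (attempt, task) execution, kept for inspection
    for attempt in range(1, max_attempts + 1):
        try:
            for task in tasks:
                if task in completed:
                    continue   # recovered from an earlier attempt
                if (attempt, task) in fail_on:
                    raise RuntimeError(f"AM attempt {attempt} failed at {task}")
                executed.append((attempt, task))
                completed.append(task)
            return "SUCCEEDED", executed
        except RuntimeError:
            continue           # a new attempt is launched
    return "FAILED", executed


status, executed = run_application(
    ["t1", "t2", "t3"], max_attempts=2, fail_on={(1, "t2")}
)
print(status, executed)
```

In this run the first attempt completes t1 and crashes at t2; the second attempt skips t1 and runs only t2 and t3.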
![Page 234: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/234.jpg)
![Page 235: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/235.jpg)
NodeManager failures
Almost all nodes in the cluster run a NodeManager service daemon. The NodeManager takes care of executing a certain part of a YARN job on its individual machine, while other parts are executed on other nodes. For a 1000-node YARN cluster, there are probably around 999 NodeManagers running. So a NodeManager is a per-node agent that takes care of an individual node in the cluster.
If a NodeManager fails, the ResourceManager detects this failure using a time-out (that is, it stops receiving heartbeats from the NodeManager). The ResourceManager then removes the NodeManager from its pool of available NodeManagers. It also kills all the containers running on that node and reports the failure to all running AMs. AMs are then responsible for reacting to node failures, by redoing the work done by any containers running on that node during the fault.
If the fault causing the time-out is transient, the NodeManager will resynchronize with the ResourceManager. Similarly, if a new NodeManager joins the cluster, the ResourceManager notifies all ApplicationMasters about the availability of new resources.
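Heartbeat-based failure detection can be sketched as follows. The class NodeLivenessMonitor, its methods, and the 600-second expiry used in the example are assumptions made for this illustration; they are not YARN classes or defaults. Time is passed in explicitly so that the behaviour is deterministic.

```python
class NodeLivenessMonitor:
    """Toy model of time-out based NodeManager failure detection."""

    def __init__(self, expiry_seconds):
        self.expiry_seconds = expiry_seconds
        self.last_heartbeat = {}   # node id -> time of the last heartbeat
        self.containers = {}       # node id -> set of running container ids

    def heartbeat(self, node_id, now):
        # A heartbeat from an unknown node also (re)registers it.
        self.last_heartbeat[node_id] = now
        self.containers.setdefault(node_id, set())

    def launch(self, node_id, container_id):
        self.containers[node_id].add(container_id)

    def expire(self, now):
        """Remove timed-out nodes and return the containers lost with them."""
        lost = {}
        for node_id, seen in list(self.last_heartbeat.items()):
            if now - seen > self.expiry_seconds:
                lost[node_id] = self.containers.pop(node_id)
                del self.last_heartbeat[node_id]
        return lost


monitor = NodeLivenessMonitor(expiry_seconds=600)
monitor.heartbeat("nm1", now=0)
monitor.heartbeat("nm2", now=0)
monitor.launch("nm1", "container_01")
monitor.heartbeat("nm2", now=500)      # nm2 keeps reporting, nm1 goes silent
lost = monitor.expire(now=601)
print(lost)
```

The returned mapping is what would be reported to the affected ApplicationMasters, which then decide how to redo the lost work. A node whose fault was transient simply sends a new heartbeat and is registered again.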
![Page 236: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/236.jpg)
![Page 237: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/237.jpg)
Container failures
Whenever a container finishes, the ApplicationMaster is informed of this event by the ResourceManager. The ApplicationMaster then interprets the container status received through the ResourceManager as a success or a failure, based on the container’s exit status. The ApplicationMaster handles the failures of the job containers.
It is the responsibility of the application frameworks to manage container failures, and the responsibility of the YARN framework is to provide the necessary information to the application framework. As part of the response to the allocate API call, the ResourceManager returns information on finished containers to the corresponding ApplicationMaster. It is the responsibility of the ApplicationMaster to validate the container’s status, exit code, and diagnostic information and to take appropriate action on it; for example, the MapReduce ApplicationMaster retries map and reduce tasks by requesting new containers, until the configured failure limit is reached for the job.
To address container allocation failure scenarios, keep in mind that the ApplicationMaster obtains containers from the ResourceManager through the allocate call, and that a single AllocateResponse usually does not return any containers. The allocate call should therefore be made periodically to ensure that all containers are assigned. When a container arrives, the framework is guaranteed to have sufficient resources for it, and the ApplicationMaster will not receive more containers than it asked for. Also, the ApplicationMaster can make separate container requests (ResourceRequests), typically one per second.
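How an ApplicationMaster might act on completed-container reports can be sketched in a few lines. The ContainerStatus class and the handle_completed function below are hypothetical stand-ins written for this illustration, not the YARN client API, and the retry limit of 2 is arbitrary. The sketch treats a zero exit code as success and retries a failed task until its attempt budget is used up.

```python
from dataclasses import dataclass


@dataclass
class ContainerStatus:
    """Simplified stand-in for the status reported for a finished container."""
    task: str
    exit_code: int
    diagnostics: str = ""


def handle_completed(statuses, attempts, max_attempts):
    """Decide what to do with finished containers.

    statuses: list of ContainerStatus reported in one allocate response
    attempts: task -> number of failed attempts so far (updated in place)
    Returns (tasks to retry in a new container, tasks that failed for good).
    """
    retry, failed = [], []
    for status in statuses:
        if status.exit_code == 0:
            continue                       # the container succeeded
        attempts[status.task] = attempts.get(status.task, 0) + 1
        if attempts[status.task] < max_attempts:
            retry.append(status.task)      # request a new container for it
        else:
            failed.append(status.task)     # give up on this task
    return retry, failed


attempts = {}
first = handle_completed(
    [ContainerStatus("map_0", 0), ContainerStatus("map_1", 1, "disk error")],
    attempts,
    max_attempts=2,
)
second = handle_completed([ContainerStatus("map_1", 137)], attempts, max_attempts=2)
print(first, second)
```

A framework would typically call such a handler after every periodic allocate call, then include the retried tasks in its next round of container requests.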
![Page 238: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/238.jpg)
![Page 239: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/239.jpg)
Hardware Failures
As the Hadoop and YARN frameworks use commodity hardware for the cluster setup and scale from several nodes to several thousand nodes, all the components of Hadoop and YARN are designed on the assumption that hardware failures are very common. Therefore, these failures are automatically handled by the framework so that important data is not lost because of them. For this, Hadoop provides data replication across the nodes/racks so that even if a whole rack fails, data can be recovered from another node on another rack, and jobs can be restarted over another replica of the dataset to compute the results.
![Page 240: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/240.jpg)
![Page 241: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/241.jpg)
Summary
In this chapter, we discussed YARN failure scenarios and how these are addressed in the YARN framework. In the next chapter, we will be focusing on alternative solutions for the YARN framework. We will also see a brief overview of the most common frameworks that are closely related to YARN.
![Page 242: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/242.jpg)
![Page 243: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/243.jpg)
Chapter 9. YARN – Alternative Solutions
During the development of YARN, many other organizations simultaneously identified the limitations of Hadoop 1.x and were actively involved in developing alternative solutions.
This chapter will briefly talk about such alternative solutions and compare them to YARN. Among the most common frameworks that are closely related to YARN are:
Mesos
Omega
Corona
![Page 244: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/244.jpg)
Mesos
Mesos was originally developed at the University of California at Berkeley and later became open source under the Apache Software Foundation.
Mesos can be thought of as a highly-available and fault-tolerant operating system kernel for your clusters. It’s a cluster resource manager that provides efficient resource isolation and sharing across multiple diverse cluster-computing frameworks.
Mesos can be compared to YARN in some aspects, but a complete quantitative comparison is not possible.
We will talk about the architecture of Mesos and compare some of the architectural differences with respect to YARN. This way we will have a high-level understanding of the main differences between the two frameworks.
The preceding figure shows the main components of Mesos. It basically consists of a master process that manages slave processes running on each cluster node and Mesos applications (also called frameworks) that run tasks on these slaves.
For more information, please refer to the official site at http://mesos.apache.org/.
Here are the high-level differences between Mesos and YARN:
| Mesos | YARN |
| --- | --- |
| Mesos uses Linux container groups (http://lxc.sourceforge.net). Linux container groups provide stronger isolation but may have some additional overhead. | YARN uses simple Unix processes. |
| Mesos is primarily written in C++. | YARN is primarily written in Java with bits of native code. |
| Mesos supports both memory and CPU scheduling. | YARN schedules on memory by default (for example, you request x containers of y MB each) and can also take CPU into account; there are plans to extend it to other resources such as network and disk I/O resources. |
| Mesos introduces a distributed two-level scheduling mechanism called resource offers. Mesos decides how many resources to offer each framework, while frameworks decide which resources to accept and which computations to run on them. | YARN has a request-based approach. It allows the ApplicationMaster to ask for resources based on various criteria, including locations, and also allows the requester to modify future requests based on what was given and on the current usage. |
| Mesos leverages a pool of central schedulers (for example, classic Hadoop or MPI). | YARN, on the other hand, has a per-job scheduler. Although YARN enables late binding of containers to tasks, where each individual job can perform local optimizations, the per-job ApplicationMaster might result in greater overhead than the Mesos approach. |
![Page 245: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/245.jpg)
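The difference between offer-based and request-based scheduling can be illustrated with two small functions. Both are simplified sketches written for this comparison; mesos_style_offer and yarn_style_request are hypothetical names and do not correspond to any Mesos or YARN API. In the first, the resource manager shows the free resources and the framework chooses; in the second, the application states what it needs and the scheduler chooses a node, preferring the requested locations.

```python
def mesos_style_offer(free, framework_accepts):
    """Offer every free node to the framework; it picks the ones it wants.

    free: node -> free memory in MB
    framework_accepts: callable(node, memory_mb) -> bool, owned by the framework
    """
    return [node for node in free if framework_accepts(node, free[node])]


def yarn_style_request(free, memory_mb, preferred_nodes):
    """The application asks for memory_mb; the scheduler picks a node.

    Preferred nodes are tried first, then any node with enough free memory.
    Returns the chosen node, or None if the request cannot be satisfied yet.
    """
    candidates = [n for n in preferred_nodes if free.get(n, 0) >= memory_mb]
    candidates += [n for n in free if free[n] >= memory_mb and n not in candidates]
    return candidates[0] if candidates else None


free = {"node1": 1024, "node2": 4096, "node3": 8192}

# Offer-based: the framework accepts only nodes with at least 4 GB free.
accepted = mesos_style_offer(free, lambda node, memory_mb: memory_mb >= 4096)

# Request-based: the application wants 2 GB, ideally on node1 or node3.
chosen = yarn_style_request(free, 2048, preferred_nodes=["node1", "node3"])
print(accepted, chosen)
```

Here node1 is preferred but too small, so the request-based scheduler falls back to node3, the next preferred node that fits.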
![Page 246: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/246.jpg)
![Page 247: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/247.jpg)
Omega
Omega is Google’s next-generation cluster management system.
Omega is specifically focused on a cluster scheduling architecture that uses parallelism, shared state, and optimistic concurrency control.
From past experience, Google noticed that as the clusters and their workloads increase, the scheduler is at risk of becoming a scalability bottleneck.
Google’s production job scheduler has experienced all of this. Over the years, it has evolved into a complicated, sophisticated system that is hard to change.
A schematic overview of the scheduling architectures can be seen in the following figure:
Google identified the following two prevalent scheduler architectures shown in the preceding figure:
Monolithic schedulers: These use a single, centralized scheduling algorithm for all jobs (Google’s existing scheduler is one of these). They do not make it easy to add new policies and specialized implementations, and may not scale up to the cluster sizes one is planning for in the future.
Two-level schedulers: These have a single active resource manager that offers compute resources to multiple parallel, independent scheduler frameworks, as in Mesos and Hadoop On Demand (HOD). Their architectures do appear to provide flexibility and parallelism, but in practice their conservative resource visibility and locking algorithms limit both, and make it hard to place difficult-to-schedule “picky” jobs or to make decisions that require access to the state of the entire cluster.
![Page 248: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/248.jpg)
The solution is Omega: a new parallel scheduler architecture built around the shared state, using lock-free optimistic concurrency control, to achieve both implementation extensibility and performance scalability.
Omega’s approach reflects a greater focus on scalability, but makes it harder to enforce global properties, such as capacity, fairness, and deadlines.
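Optimistic concurrency control over shared state can be sketched as follows. This is a deliberately coarse illustration, not Omega's design: the class SharedClusterState, the schedule function, and the single global version counter are assumptions made here, whereas a production system would detect conflicts at a much finer granularity. Each scheduler works on a private snapshot and its claim is committed only if the shared state has not changed in the meantime; otherwise it takes a fresh snapshot and retries.

```python
class SharedClusterState:
    """Cluster state shared by several schedulers, guarded by a version."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)   # machine -> free CPUs
        self.version = 0

    def snapshot(self):
        # Schedulers plan against a private copy, without taking a lock.
        return self.version, dict(self.free_cpus)

    def commit(self, seen_version, machine, cpus):
        """Apply a claim only if nobody committed since the snapshot."""
        if seen_version != self.version or self.free_cpus[machine] < cpus:
            return False                   # conflict: the caller must retry
        self.free_cpus[machine] -= cpus
        self.version += 1
        return True


def schedule(state, cpus, max_retries=3):
    """Place a task needing `cpus` CPUs, retrying on commit conflicts."""
    for _ in range(max_retries):
        version, view = state.snapshot()
        machine = next((m for m, free in view.items() if free >= cpus), None)
        if machine is None:
            return None                    # nothing fits in the current view
        if state.commit(version, machine, cpus):
            return machine
    return None


state = SharedClusterState({"m1": 4})
version_a, _ = state.snapshot()            # scheduler A takes a snapshot
version_b, _ = state.snapshot()            # scheduler B takes the same one
a_committed = state.commit(version_a, "m1", 3)
b_committed = state.commit(version_b, "m1", 3)   # stale snapshot, rejected
placed = schedule(state, 1)                # a retry on fresh state succeeds
print(a_committed, b_committed, placed, state.free_cpus)
```

The rejected commit is the price of not locking: conflicts are rare when the cluster is large, so most claims succeed on the first try.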
For more information, refer to http://research.google.com/pubs/pub41684.html.
![Page 249: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/249.jpg)
![Page 250: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/250.jpg)
Corona
Corona is another project from Facebook, which is now open-sourced and hosted on GitHub at https://github.com/facebookarchive/hadoop-20/tree/master/src/contrib/corona.
Facebook, with its huge peta-scale quantity of data, suffered serious performance-related issues with the classic MapReduce framework because of the single JobTracker taking care of thousands of jobs and doing a lot of work alone.
In order to solve these issues, Facebook created Corona, which separated cluster resource management from job coordination.
In Hadoop Corona, the cluster resources are tracked by a central ClusterManager. Each job gets its own Corona JobTracker, which tracks just that particular job.
Corona entirely redesigned the MapReduce architecture to bring better cluster utilization and job scheduling, just like YARN did.
Facebook’s goals in rewriting the Hadoop scheduling framework were not the same as YARN’s. Facebook wanted quick improvements in MapReduce, but only in the part that they were using. They had no interest in running multiple heterogeneous frameworks as YARN does, or in other key design considerations of YARN.
For Facebook, doing a quick rewrite of the scheduler seemed feasible and low risk, compared to going with YARN, getting features that were not needed, understanding it, fixing its problems, and then ending up with something that didn’t address the primary goal of lowering latency.
The following are some of the key differences:
Corona does push-based scheduling and has an event-driven, callback-oriented message flow. This was critical to achieving fast, low-latency scheduling. Polling is a big part of why the Hadoop scheduler is slow and has scalability issues. YARN does not do callback-based message flow.
In Corona, the JobTracker can run in the same JVM as the JobClient (for example, Hive). Facebook had fat client machines with tons of RAM and CPU. To reduce latency, doing as much processing as possible on the client machine is preferred. In YARN, the ApplicationMaster (the counterpart of the JobTracker) has to be scheduled within the cluster. This means that there’s one extra step between starting a query and getting it running.
Corona is structured as a contrib project to the Hadoop 0.20 branch and is not a very large codebase.
Corona is integrated with the fair scheduler. YARN is more interested in the capacity scheduler.
For more information on Corona, refer to https://www.facebook.com/notes/facebook-engineering/under-the-hood-scheduling-mapreduce-jobs-more-efficiently-with-corona/10151142560538920.
![Page 251: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/251.jpg)
![Page 252: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/252.jpg)
Summary
We talked about various projects related to YARN that are available today. These systems share common inspiration and requirements, and the high-level goal of improving scalability, latency, fault tolerance, and programming-model flexibility. Their architectural differences are due to their differing design priorities. In the next chapter, we will talk about YARN’s future and support in the industry.
![Page 253: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/253.jpg)
![Page 254: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/254.jpg)
Chapter 10. YARN – Future and Support
YARN is the new modern data operating system for Hadoop 2. YARN acts as a central orchestrator to support mixed workloads/programming models, running multiple engines, and multiple access patterns such as batch processing, interactive, streaming, and real-time, in Hadoop 2.
In this chapter, we will talk about YARN’s journey and its present and future in the big data industry.
![Page 255: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/255.jpg)
What YARN means to the big data industry
It can be said that YARN is a boon to the big data industry. Without YARN, the entire big data industry would have been at serious risk. As the industry started playing with big data, new and emerging varieties of problems came into the picture, and with them new frameworks.
YARN’s support for running these new and emerging frameworks allows them to focus on solving the problems for which they were specifically meant, while YARN takes care of resource management and other necessary things (resource allocation, scheduling jobs, fault tolerance, and so on).
Had there been no YARN, these frameworks would have had to do all the resource management on their own. There are many big data projects that failed in the past due to unrealistic expectations of immature technologies.
YARN is the enabler for porting mature and enterprise-class technologies directly onto Hadoop. Without YARN, the only option in Hadoop was to use MapReduce.
![Page 256: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/256.jpg)
![Page 257: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/257.jpg)
Journey – present and future
YARN was introduced with the Hadoop 0.23 release on November 11, 2011.
Since then, there has been no looking back, and a number of releases have followed.
Finally, on October 15, 2013, Apache Hadoop 2.2.0 was released as the GA (General Availability) release of Apache Hadoop 2.x.
In October 2013, Apache Hadoop YARN won the Best Paper award at ACM SoCC (Symposium on Cloud Computing) 2013.
Apache Hadoop 2.x, powered by YARN, is no doubt the best platform for all of the Hadoop ecosystem components, such as MapReduce, Apache Hive, Apache Pig, and so on, that use HDFS as the underlying data storage.
YARN has also been adopted by other open source communities for frameworks such as Apache Giraph, Apache Tez, Apache Spark, Apache Flink, and many others.
Vendors such as HP, Microsoft, SAS, Teradata, SAP, and Red Hat, among others, are moving towards YARN to run their existing products and services on Hadoop.
People willing to modify their applications can already use YARN directly, but there are many customers/vendors who don’t want to modify their existing applications. For them, there is Apache Slider, another open source project from Hortonworks, which can deploy existing distributed applications without requiring them to be ported to YARN.
Apache Slider allows you to bridge existing always-on services and makes sure they work really well on top of YARN, without having to modify the application itself.
Slider facilitates running many long-running services and applications, such as Apache Storm, Apache HBase, Apache Accumulo, and so on, on YARN.
This initiative will definitely expand the spectrum of applications and use cases that one can actually use with Hadoop and YARN in the future.
![Page 258: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/258.jpg)
Present ongoing features
Now, let’s discuss the work currently ongoing in YARN.
Long Running Applications on Secure Clusters (YARN-896)
Support long-lived applications and long-lived containers. Refer to https://issues.apache.org/jira/browse/YARN-896.
Application Timeline Server (YARN-321, YARN-1530)
Currently, we have a JobHistoryServer for MapReduce history. The MapReduce job history server currently needs to be deployed as a trusted server in sync with the MapReduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) trusted servers (where T is the number of application types and V is the number of application versions) is clearly not scalable.
This JIRA is to create only one trusted application history server, which can have a generic UI. Refer to the following links for more information:
https://issues.apache.org/jira/browse/YARN-321
https://issues.apache.org/jira/browse/YARN-1530
Disk scheduling (YARN-2139)
Support for disk as a resource in YARN. YARN should consider disk as another resource for scheduling tasks on nodes, isolation at runtime, and spindle locality. Refer to https://issues.apache.org/jira/browse/YARN-2139.
Reservation-based scheduling (YARN-1051)
Extend the YARN RM to handle time explicitly, allowing users to reserve capacity over time. This is an important step towards SLAs, long-running services, and workflows, and helps in gang scheduling.
![Page 259: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/259.jpg)
Future features
Let’s discuss the future work in YARN.
Container Resizing (YARN-1197)
The current YARN resource management logic assumes that the resources allocated to a container are fixed during its lifetime. When users want to change the resources of an allocated container, the only way is to release it and allocate a new container of the expected size. Allowing runtime changes to the resources of an allocated container will give us better control of resource usage on the application side. Refer to https://issues.apache.org/jira/browse/YARN-1197.
Admin labels (YARN-796)
Support for admins to specify labels for nodes. Examples of labels are OS, processor architecture, and so on. Refer to https://issues.apache.org/jira/browse/YARN-796.
Container Delegation (YARN-1488)
Allow containers to delegate resources to another container. This would allow external frameworks to share not just YARN’s resource-management capabilities, but also its workload-management capabilities.
This also shows that YARN is not only focused on the Apache Hadoop ecosystem components, but also on any existing external non-Hadoop products and services that want to use Hadoop.
Also, work is going on to bring together the worlds of data and PaaS by using Docker, Google Kubernetes, and Red Hat OpenShift on YARN, so that common resource management can be done across data and PaaS workloads.
![Page 260: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/260.jpg)
![Page 261: YARN Essentials - storage.ey.md Related/PDFs and... · Operating Hadoop and YARN clusters Starting Hadoop and YARN clusters Stopping Hadoop and YARN clusters Web interfaces of the](https://reader034.vdocuments.net/reader034/viewer/2022052518/5f0dffed7e708231d43d1f9b/html5/thumbnails/261.jpg)
YARN-supported frameworks
The following is the current list of frameworks that run on top of YARN, and this list will keep getting longer in the future:
Apache Hadoop MapReduce and its ecosystem components
Apache HAMA
Open MPI
Apache S4
Apache Spark
Apache Tez
Impala
Storm
HOYA (HBase on YARN)
Apache Samza
Apache Giraph
Apache Accumulo
Apache Flink
KOYA (Kafka on YARN)
Solr
Summary
In this chapter, we briefly talked about YARN’s journey since its inception. YARN has completely changed Hadoop from the way it was earlier in the Hadoop 1.x version. Now YARN is a first-class resource management framework for supporting mixed workloads/processing frameworks.
From what can be seen and predicted, YARN is surely a hit in the big data industry and has many more new and promising features to come in the future. Currently, YARN handles memory and CPU and will coordinate additional resources such as disk and network I/O in the future.
Index
A
Access Control List (ACL)
  about / NodeManager (NM), The capacity scheduler
administrative tools
  about / Administrative tools
  commands / Administrative tools
  generic options, supporting / Administrative tools
  / Administrative tools
anagrams / Practical examples of MRv1 and MRv2
Apache Giraph
  about / Apache Giraph
  URL / Apache Giraph
Apache Hadoop 2.2.0
  about / Journey – present and future
Apache Samza
  about / Apache Samza
  Kafka / Apache Samza
  Apache YARN / Apache Samza
  ZooKeeper / Apache Samza
  Kafka producer, writing / Writing a Kafka producer
  hello-samza project, writing / Writing the hello-samza project
Apache Samza, layers
  processing layer / Apache Samza
  streaming layer / Apache Samza
  execution layer / Apache Samza
Apache Slider
  about / Journey – present and future
Apache Software Foundation
  about / Mesos
Apache Spark
  about / Apache Spark
  features / Apache Spark
  running, on YARN / Why run on YARN?
Apache Tez
  about / Apache Tez
  URL / Apache Tez
Application Context (AppContext) / The MapReduce ApplicationMaster
ApplicationMaster
  about / The MapReduce ApplicationMaster
ApplicationMaster (AM) / ApplicationMaster (AM)
  restarting / The MapReduce ApplicationMaster
  writing / Writing the YARN ApplicationMaster
  responsibilities / Responsibilities of the ApplicationMaster
  failures / ApplicationMaster failures
ApplicationMasterLauncher service
  about / ResourceManager
ApplicationMasterService
  about / ResourceManager
ApplicationsManager
  about / ResourceManager
B
backward compatibility, MRv2 APIs
  about / Backward compatibility of MRv2 APIs
  binary compatibility, of org.apache.hadoop.mapred APIs / Binary compatibility of org.apache.hadoop.mapred APIs
  source compatibility, of org.apache.hadoop.mapred APIs / Source compatibility of org.apache.hadoop.mapred APIs
Bulk Synchronous Parallel (BSP)
  about / Apache Giraph
C
capacity scheduler
  about / The capacity scheduler, The capacity scheduler
  benefits / The capacity scheduler
  features / The capacity scheduler
  configurations / Capacity scheduler configurations
cluster scheduling architecture
  about / Omega
configuration parameters
  about / The fully-distributed mode
container
  failures / Container failures
container allocation
  about / Container allocation
  to application / Container allocation to the application
container configurations
  about / Container configurations
  parameters / Container configurations
ContainerExecutor
  about / NodeManager (NM)
ContainerManager
  about / NodeManager (NM)
Context Objects / Old and new MapReduce APIs
Corona
  about / Corona
  and Facebook, differences / Corona
  URL / Corona
D
data-processing graphs (DAGs)
  about / Apache Tez
DataNodes (DN) / The fully-distributed mode
  configuring / The fully-distributed mode
Docker
  about / Future features
E
EcoSystem
  web interfaces / Web interfaces of the Ecosystem
F
Facebook
  about / Corona
  and Corona, differences / Corona
Fair scheduler / The fair scheduler
  about / The fair scheduler
  configurations / Fair scheduler configurations
FIFO scheduler / The FIFO (First In First Out) scheduler
  about / The FIFO (First In First Out) scheduler
  configurations / The FIFO (First In First Out) scheduler
fully-distributed mode
  about / The fully-distributed mode
  HistoryServer / HistoryServer
  slave files / Slave files
G
Google Kubernetes
  about / Future features
grid
  starting / Starting a grid
H
Hadoop
  URL / Software
  YARN, using in / Understanding where YARN fits into Hadoop
Hadoop 0.23
  about / Journey – present and future
Hadoop 1.x
  about / A short introduction to Hadoop 1.x and MRv1
  components / A short introduction to Hadoop 1.x and MRv1
Hadoop 2 release
  about / The Hadoop 2 release
Hadoop and YARN cluster
  operating / Operating Hadoop and YARN clusters
  starting / Starting Hadoop and YARN clusters
  stopping / Stopping Hadoop and YARN clusters
Hadoop cluster
  HDFS / A short introduction to Hadoop 1.x and MRv1
  MapReduce / A short introduction to Hadoop 1.x and MRv1
Hadoop On Demand (HOD) / Omega
hello-samza project
  writing / Writing the hello-samza project
  properties / Writing the hello-samza project
  grid, starting / Starting a grid
HistoryServer / HistoryServer
HOYA (HBase on YARN)
  about / HOYA (HBase on YARN)
  URL / HOYA (HBase on YARN)
K
Kafka producer
  writing / Writing a Kafka producer
KOYA (Kafka on YARN)
  about / KOYA (Kafka on YARN)
  URL / KOYA (Kafka on YARN)
M
MapReduce, YARN
  about / YARN’s MapReduce support
  ApplicationMaster / The MapReduce ApplicationMaster
  settings, example / Example YARN MapReduce settings
  YARN applications, developing / Developing YARN applications
MapReduce applications
  YARN, compatible with / YARN’s compatibility with MapReduce applications
MapReduce job
  configurations / MapReduce job configurations
  properties / MapReduce job configurations
MapReduce JobHistoryServer
  settings / HistoryServer
MapReduce project
  End-user MapReduce API / MRv1 versus MRv2
  MapReduce framework / MRv1 versus MRv2
  MapReduce system / MRv1 versus MRv2
Mesos
  about / Mesos
  and YARN, difference between / Mesos
  URL / Mesos
modern operating system, of Hadoop
  YARN, used as / YARN as the modern operating system of Hadoop
monolithic schedulers / Omega
MRv1
  about / A short introduction to Hadoop 1.x and MRv1
  versus MRv2 / MRv1 versus MRv2
  examples / Practical examples of MRv1 and MRv2, Running the job
MRv2
  versus MRv1 / MRv1 versus MRv2
  examples / Practical examples of MRv1 and MRv2, Preparing the input file(s)
N
NameNode (NN) / The fully-distributed mode
  configuring / The fully-distributed mode
new MapReduce API
  about / Old and new MapReduce APIs
  versus old MapReduce API / Old and new MapReduce APIs
NodeHealthCheckerService
  about / NodeManager (NM)
NodeManager (NM) / NodeManager (NM)
  configuring / The fully-distributed mode
  parameters / The fully-distributed mode
NodeManagers (NM) / The fully-distributed mode
NodeStatusUpdater
  about / NodeManager (NM)
O
old MapReduce API
  about / Old and new MapReduce APIs
  versus new MapReduce API / Old and new MapReduce APIs
Omega
  about / Omega
P
Pi example
  running / Running a sample Pi example
prerequisites, single-node installation
  platform / Platform
  softwares / Software
prerequisites, Storm-YARN
  Hadoop YARN, installing / Hadoop YARN should be installed
  Apache ZooKeeper, installing / Apache ZooKeeper should be installed
program names
  aggregatewordcount / Running sample examples on YARN
  aggregatewordhist / Running sample examples on YARN
  bbp / Running sample examples on YARN
  dbcount / Running sample examples on YARN
  distbbp / Running sample examples on YARN
  grep / Running sample examples on YARN
  join / Running sample examples on YARN
  multifilewc / Running sample examples on YARN
  pentomino / Running sample examples on YARN
  pi / Running sample examples on YARN
  randomtextwriter / Running sample examples on YARN
  randomwriter / Running sample examples on YARN
  secondarysort / Running sample examples on YARN
  sort / Running sample examples on YARN
  sudoku / Running sample examples on YARN
  teragen / Running sample examples on YARN
  terasort / Running sample examples on YARN
  teravalidate / Running sample examples on YARN
  wordcount / Running sample examples on YARN
  wordmean / Running sample examples on YARN
  wordmedian / Running sample examples on YARN
  wordstandarddeviation / Running sample examples on YARN
pseudo-distributed mode / The pseudo-distributed mode
push-based scheduling / Corona
R
redesign idea
  about / The redesign idea
  MapReduce, limitations / Limitations of the classical MapReduce or Hadoop 1.x
  Hadoop 1.x, limitations / Limitations of the classical MapReduce or Hadoop 1.x
Red Hat OpenShift
  about / Future features
Red Hat Package Managers (RPMs) / The fully-distributed mode
ResourceManager / ResourceManager
ResourceManager (RM)
  scheduler / ResourceManager
  security / ResourceManager
  RM Restart Phase I / Recent developments in YARN architecture
  RM Restart Phase II / Recent developments in YARN architecture
  about / The fully-distributed mode
  configuring / The fully-distributed mode
  parameters / The fully-distributed mode
  failures / ResourceManager failures
ResourceManager (RM), components
  ApplicationManager / NodeManager (NM)
  Scheduler / NodeManager (NM)
S
scheduler architectures
  monolithic schedulers / Omega
  two-level schedulers / Omega
single-node installation
  about / Single-node installation
  prerequisites / Prerequisites
  starting / Starting with the installation
  standalone mode (local mode) / The standalone mode (local mode)
  pseudo-distributed mode / The pseudo-distributed mode
slave files / Slave files
standalone mode (local mode) / The standalone mode (local mode)
Storm-Starter examples
  building / Building and running Storm-Starter examples
  running / Building and running Storm-Starter examples
Storm-YARN
  about / Storm-YARN
  prerequisites / Prerequisites
  setting up / Setting up Storm-YARN
  storm.yaml configuration, obtaining / Getting the storm.yaml configuration of the launched Storm cluster
  Storm-Starter examples, building / Building and running Storm-Starter examples
  Storm-Starter examples, running / Building and running Storm-Starter examples
storm.yaml configuration
  obtaining / Getting the storm.yaml configuration of the launched Storm cluster
T
two-level schedulers / Omega
W
web GUI
  YARN applications, monitoring with / Monitoring YARN applications with web GUI
Y
YARN
  used, as modern operating system of Hadoop / YARN as the modern operating system of Hadoop
  design goals / What are the design goals for YARN
  used, in Hadoop / Understanding where YARN fits into Hadoop
  multitenancy application support / YARN multitenancy application support
  sample examples, running on / Running sample examples on YARN
  sample Pi example, running / Running a sample Pi example
  compatibility, with MapReduce applications / YARN’s compatibility with MapReduce applications
  Apache Spark, running on / Why run on YARN?
  and Mesos, difference between / Mesos
  importance, to Big Data industry / What YARN means to the big data industry
  present / Journey – present and future
  future / Journey – present and future
  present on-going features / Present on-going features
  future features / Future features
YARN, features
  Long Running Applications on Secure Clusters (YARN-896) / Present on-going features
  Application Timeline Server (YARN-321, YARN-1530) / Present on-going features
  Disk scheduling (YARN-2139) / Present on-going features
  Reservation-based scheduling (YARN-1051) / Present on-going features
  Container Resizing (YARN-1197) / Future features
  Admin labels (YARN-796) / Future features
  Container Delegation (YARN-1488) / Future features
YARN-321
  URL / Present on-going features
YARN-796
  URL / Future features
YARN-896
  URL / Present on-going features
YARN-1197
  URL / Future features
YARN-1530
  URL / Present on-going features
YARN-2139
  URL / Present on-going features
YARN-supported frameworks
  about / YARN-supported frameworks
YARN administrations
  about / Administration of YARN
  configuration files / Administration of YARN
  administrative tools / Administrative tools
  nodes, adding from YARN cluster / Adding and removing nodes from a YARN cluster
  nodes, removing from YARN cluster / Adding and removing nodes from a YARN cluster
  YARN jobs, administrating / Administrating YARN jobs
  MapReduce job, configurations / MapReduce job configurations
  YARN log management / YARN log management
  YARN web user interface / YARN web user interface
YARN applications
  monitoring, with web GUI / Monitoring YARN applications with web GUI
  developing / Developing YARN applications
  ApplicationClientProtocol / Developing YARN applications
  ApplicationMasterProtocol / Developing YARN applications
  ContainerManagerProtocol / Developing YARN applications
YARN application workflow
  about / The YARN application workflow
  YARN client, writing / Writing the YARN client
  ApplicationMaster, writing / Writing the YARN ApplicationMaster
YARN architecture
  components / Core components of YARN architecture
  development / Recent developments in YARN architecture
YARN architecture, components
  ResourceManager / ResourceManager
  ApplicationMaster (AM) / ApplicationMaster (AM)
  NodeManager (NM) / NodeManager (NM)
YARN client
  writing / Writing the YARN client
YARN cluster
  nodes, adding from / Adding and removing nodes from a YARN cluster
  nodes, removing from / Adding and removing nodes from a YARN cluster
YARN jobs
  administrating / Administrating YARN jobs
YARN log management / YARN log management
YARN MapReduce settings
  example / Example YARN MapReduce settings
  properties / Example YARN MapReduce settings
YARN scheduler policies
  about / YARN scheduler policies
  FIFO scheduler / The FIFO (First In First Out) scheduler
  Fair scheduler / The fair scheduler
  capacity scheduler / The capacity scheduler
YARN scheduling policies
  about / YARN scheduling policies
  FIFO scheduler / The FIFO (First In First Out) scheduler
  capacity scheduler / The capacity scheduler
  Fair scheduler / The fair scheduler
YARN web user interface / YARN web user interface
Z
ZooKeeper
  URL / Apache ZooKeeper should be installed