Piccolo - Princeton University Computer Science… · Piccolo: Building Fast, Distributed Programs...
Posted on 17-Oct-2020
PICCOLO: BUILDING FAST, DISTRIBUTED PROGRAMS
WITH PARTITIONED TABLES
Russell Power, Jinyang Li, New York University
OSDI 2010
What is PICCOLO?
• A data-centric programming model for applications that
  • are distributed,
  • are in-memory, and
  • access and mutate some shared intermediate state.
• Allows users to specify:
  • how data is partitioned;
  • locality policies.
• MPI requires too much work!
  • Fine-grained control.
• Typical data-centric programming models
  • are good for bulk processing of on-disk data,
  • but not for in-memory applications.
• Read 1 MB sequentially from memory – 0.25 ms
• Read 1 MB sequentially from network – 10 ms
• Read 1 MB sequentially from disk – 30 ms
• Can we do better?
  • Yes: let the user figure out certain details.
  • But... why? Better performance.
[https://www.usenix.org/legacy/events/osdi10/tech/slides/power.pdf]
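Only the ratios between the read costs above matter for the argument. A short sketch that computes them, assuming per-megabyte costs of 0.25 ms (memory), 10 ms (network), and 30 ms (disk), the commonly quoted "latency numbers" these bullets appear to be based on; `READ_1MB_MS` and `slowdown_vs_memory` are hypothetical names:

```python
# Assumed cost of a 1 MB sequential read, in milliseconds.
READ_1MB_MS = {"memory": 0.25, "network": 10.0, "disk": 30.0}


def slowdown_vs_memory(medium: str) -> float:
    """How many times slower a 1 MB read from `medium` is than from memory."""
    return READ_1MB_MS[medium] / READ_1MB_MS["memory"]


if __name__ == "__main__":
    for medium in ("network", "disk"):
        print(f"{medium}: {slowdown_vs_memory(medium):.0f}x slower than memory")
```

Under these assumptions the network is 40x and the disk 120x slower than memory, which is the gap an in-memory design tries to exploit.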
Programming Model
• Control functions
  • Launch kernels
  • Create tables
  • Synchronize through barriers
  • Run on one machine
• Kernel functions
  • Distributed: many instances run together
  • Read and write to tables
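A minimal sketch of the split between control and kernel functions, assuming Python threads as stand-ins for distributed kernel instances and a lock-protected dict as a stand-in for a Piccolo table. The names `control`, `kernel`, `NUM_KERNELS`, and `NUM_KEYS` are hypothetical and are not Piccolo's API:

```python
import threading
from collections import defaultdict

NUM_KERNELS = 4  # hypothetical number of kernel instances
NUM_KEYS = 20    # hypothetical key space processed by the kernels


def kernel(instance_id, table, lock, barrier):
    """Kernel function: one of many concurrent instances; reads and writes the shared table."""
    # Instance i handles keys i, i + NUM_KERNELS, i + 2 * NUM_KERNELS, ... (disjoint slices).
    for key in range(instance_id, NUM_KEYS, NUM_KERNELS):
        # Different instances hit the same buckets, so writes are serialized with a lock.
        with lock:
            table[key % 5] += 1
    barrier.wait()  # tell the control function this instance has finished writing


def control():
    """Control function: runs on one machine, creates a table, launches kernels, waits at a barrier."""
    table = defaultdict(int)                      # "create table"
    lock = threading.Lock()
    barrier = threading.Barrier(NUM_KERNELS + 1)  # all kernel instances + the control thread
    threads = [
        threading.Thread(target=kernel, args=(i, table, lock, barrier))
        for i in range(NUM_KERNELS)
    ]
    for t in threads:
        t.start()                                 # "launch kernels"
    barrier.wait()                                # "synchronize through barriers"
    result = dict(table)
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(control())
```

Each of the five buckets ends with a count of 4, regardless of how the threads interleave, because the control function only reads the table after every kernel instance has reached the barrier.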
Tables
• Key-value stores
  • Get(Key)
  • Put(Key, Value)
  • Update(Key, Value)
  • Flush()
• User-defined accumulation functions
  • Commutative, associative
  • Local: no access to global state
  • Resolve write-write conflicts
• But... why? Writes are buffered, so updates to the same key can arrive and be applied in any order.
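Why the accumulator must be commutative and associative can be checked directly: buffered updates from different workers may reach a key's owner in any order, and the final value should not depend on that order. A minimal sketch with a hypothetical `AccumTable` class offering get/put/update (not Piccolo's actual interface):

```python
import operator


class AccumTable:
    """Toy in-memory table with get/put/update and a user-defined accumulator."""

    def __init__(self, accumulate):
        self._data = {}
        self._accumulate = accumulate  # assumed commutative and associative

    def get(self, key):
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value  # plain overwrite

    def update(self, key, value):
        # Merge the new value into the existing one instead of overwriting it.
        if key in self._data:
            self._data[key] = self._accumulate(self._data[key], value)
        else:
            self._data[key] = value

    def items(self):
        return dict(self._data)


def apply_updates(updates, accumulate):
    """Apply buffered (key, value) updates in the order given."""
    table = AccumTable(accumulate)
    for key, value in updates:
        table.update(key, value)
    return table.items()


if __name__ == "__main__":
    updates = [("a", 3), ("b", 1), ("a", 4), ("a", 2)]
    for name, acc in [("sum", operator.add), ("max", max)]:
        forward = apply_updates(updates, acc)
        backward = apply_updates(list(reversed(updates)), acc)
        print(name, forward, forward == backward)
```

With `sum` or `max`, the forward and reversed orders agree. A "last write wins" function such as `lambda old, new: new` is not commutative, and the two orders then disagree (`a` ends as 2 in one and 3 in the other).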
Partitioning and Locality
• Tables can be partitioned.
  • Assume each fragment fits in memory.
• Locality preferences
  • Co-locate certain partitions of different tables.
  • Co-locate data and execution.
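A minimal sketch of partitioning with a co-location preference, assuming a hash partitioner and a fixed partition-to-worker mapping shared by two tables (a link graph and a rank table). `partition_of`, `worker_of`, `place`, and the partition and worker counts are hypothetical:

```python
import zlib
from typing import Dict

NUM_PARTITIONS = 4  # hypothetical partition count, shared by co-located tables
NUM_WORKERS = 2     # hypothetical cluster size


def partition_of(key: str) -> int:
    """Stable hash partitioner (crc32, not Python's per-process randomized hash())."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def worker_of(partition: int) -> int:
    """Partition i lives on worker i mod NUM_WORKERS, for every co-located table."""
    return partition % NUM_WORKERS


def place(table: Dict[str, object]) -> Dict[int, Dict[str, object]]:
    """Split a table into the in-memory fragments each worker holds."""
    fragments: Dict[int, Dict[str, object]] = {w: {} for w in range(NUM_WORKERS)}
    for key, value in table.items():
        fragments[worker_of(partition_of(key))][key] = value
    return fragments


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
    ranks = {page: 0.25 for page in graph}
    g, r = place(graph), place(ranks)
    for w in range(NUM_WORKERS):
        # Same partitioner and partition count: a page's links and rank share a worker.
        assert g[w].keys() == r[w].keys()
        print(f"worker {w}: {sorted(g[w])}")
```

Because both tables use the same partition function and count, a kernel scheduled on worker `w` can read a page's links and its rank without network access, which is the "co-locate data and execution" preference.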
Checkpoints
• Asynchronous
  • Takes as arguments: tables, user data
• Synchronous
  • Takes as arguments: time interval, tables, callback
System Design
• Master node
  • Control thread
  • Assigns kernels to workers
  • The assignment is a “public announcement”.
• Worker nodes
  • Handle kernel execution
  • Principle: buffer as long as you can.
  • If a read is needed, first flush(), then read().
• Why? So that, for a single thread, the chronology makes sense: its reads reflect its own earlier writes.
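The worker-side rule "buffer as long as you can, flush before reading" can be sketched as follows. The classes `OwnerPartition` and `BufferedWriter` are hypothetical; combining buffered updates locally is an assumption of this sketch, valid only because the accumulator is commutative and associative:

```python
import operator


class OwnerPartition:
    """Stands in for a table partition held by another worker (hypothetical)."""

    def __init__(self, accumulate):
        self.data = {}
        self.accumulate = accumulate
        self.batches_received = 0  # each flush costs one network message

    def apply_batch(self, batch):
        self.batches_received += 1
        for key, value in batch.items():
            if key in self.data:
                self.data[key] = self.accumulate(self.data[key], value)
            else:
                self.data[key] = value


class BufferedWriter:
    """Worker-side handle: buffer updates as long as possible, flush before reading."""

    def __init__(self, owner):
        self.owner = owner
        self.pending = {}

    def update(self, key, value):
        # Combine locally with the same accumulator, so repeated updates to one key
        # occupy a single buffer slot.
        acc = self.owner.accumulate
        self.pending[key] = acc(self.pending[key], value) if key in self.pending else value

    def flush(self):
        if self.pending:
            self.owner.apply_batch(self.pending)
            self.pending = {}

    def get(self, key):
        self.flush()  # push our own buffered writes first, so this read sees them
        return self.owner.data[key]


if __name__ == "__main__":
    owner = OwnerPartition(operator.add)
    writer = BufferedWriter(owner)
    for _ in range(3):
        writer.update("x", 1)
    print(owner.data)              # {}: nothing sent yet
    print(writer.get("x"))         # 3: the read forces a flush
    print(owner.batches_received)  # 1: three updates, one message
```

Three updates cost a single message, and the read still observes all of them, which is the single-thread chronology the slide asks for.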
Load Balancing
• Initial allocation
  • Round-robin.
  • If the input is a distributed file, minimize inter-rack transfer.
• Dynamic load balancing
  • But... why?
    • Heterogeneous hardware configurations
  • How?
    • Never kill a running task.
    • Table partitions have to be migrated.
Work Stealing
• If a worker is free, assign it a task from the busiest worker.
• Do larger tasks first.
• Estimate a task's size by the size of its table partition.
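A minimal sketch of one work-stealing step. It assumes each worker keeps a queue of not-yet-started tasks tagged with an estimated size (the size of the task's table partition), that "busiest" means the largest total queued size, and that the idle worker takes that worker's largest pending task. Running tasks are never touched, in line with the "never kill a running task" rule. The function `steal` and the queue layout are hypothetical:

```python
from typing import Dict, List, Optional, Tuple

# (task id, estimated size); the size estimate is the size of the task's table partition.
Task = Tuple[str, int]


def steal(queues: Dict[str, List[Task]], idle_worker: str) -> Optional[Task]:
    """Move one pending task from the busiest worker to `idle_worker`.

    Only queued (not-yet-started) tasks are considered, so no running task is killed.
    Returns the stolen task, or None if there is nothing to steal.
    """
    others = {w: q for w, q in queues.items() if w != idle_worker and q}
    if not others:
        return None
    # Busiest = most estimated work still waiting in its queue.
    busiest = max(others, key=lambda w: sum(size for _, size in others[w]))
    # Larger tasks first: take the biggest pending task from that worker.
    task = max(others[busiest], key=lambda t: t[1])
    queues[busiest].remove(task)
    queues.setdefault(idle_worker, []).append(task)
    return task


if __name__ == "__main__":
    queues = {"w1": [("p0", 50), ("p1", 10)], "w2": [("p2", 5)], "w3": []}
    print(steal(queues, "w3"))  # ('p0', 50)
    print(queues)
```

In a real system the stolen task's table partition would also have to move, which is what the migration protocol below handles.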
Table Partition Migration (OLD to NEW)
• Phase 1:
  • Master says BEGIN.
  • All workers flush changes to OLD and send new requests to NEW.
  • OLD pauses and relays requests to NEW, then transfers its state.
  • NEW buffers requests but does not act on them.
  • ACKs from all.
• Phase 2:
  • Master says DO_IT_NOW to OLD and NEW.
  • All workers flush changes to OLD and send new requests to NEW.
  • OLD pauses and relays requests to NEW.
  • NEW buffers requests but does not act on them.
  • Now, OLD sends requests.
  • NEW now acts.
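The two-phase migration can be modeled in a single process to check that no update is lost and that NEW applies nothing before it holds OLD's state. This is a simplification under stated assumptions: updates are additive deltas (a sum accumulator), OLD's relaying is modeled as appending directly to NEW's buffer, and ACK collection from many workers is reduced to the return value of `begin()`. The class and method names are hypothetical:

```python
class PartitionMigration:
    """Toy, single-process model of moving one table partition from OLD to NEW."""

    def __init__(self, state):
        self.old_state = dict(state)
        self.new_state = None
        self.old_paused = False
        self.new_active = False
        self.new_buffer = []  # requests NEW holds but does not act on yet

    @staticmethod
    def _apply(state, key, delta):
        # Assumed sum accumulator, so buffered deltas may be applied in any order.
        state[key] = state.get(key, 0) + delta

    def deliver_to_old(self, key, delta):
        if self.old_paused:
            self.new_buffer.append((key, delta))  # OLD relays to NEW instead of applying
        else:
            self._apply(self.old_state, key, delta)

    def deliver_to_new(self, key, delta):
        if self.new_active:
            self._apply(self.new_state, key, delta)
        else:
            self.new_buffer.append((key, delta))  # NEW buffers, does not act

    def begin(self):
        """Phase 1 (BEGIN): OLD pauses and transfers its state; returns an ACK."""
        self.old_paused = True
        self.new_state = dict(self.old_state)
        return "ACK"

    def do_it_now(self):
        """Phase 2 (DO_IT_NOW): NEW applies everything it buffered, then serves directly."""
        if self.new_state is None:
            raise RuntimeError("DO_IT_NOW before BEGIN")
        for key, delta in self.new_buffer:
            self._apply(self.new_state, key, delta)
        self.new_buffer.clear()
        self.new_active = True


if __name__ == "__main__":
    m = PartitionMigration({"x": 10})
    m.deliver_to_old("x", 1)   # normal operation: OLD applies it
    assert m.begin() == "ACK"  # Phase 1
    m.deliver_to_old("x", 2)   # a late flush reaches OLD: relayed, not applied
    m.deliver_to_new("y", 5)   # redirected request: NEW buffers it
    print(m.new_state)         # {'x': 11}: nothing applied on NEW yet
    m.do_it_now()              # Phase 2
    m.deliver_to_new("x", 3)
    print(m.new_state)         # {'x': 16, 'y': 5}
```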
Fault Tolerance
• If one worker fails, restart all workers from the last checkpoint.
• Checkpointing
  • Need to save a consistent checkpoint without stopping execution.
• Chandy-Lamport algorithm
  • Idea: take a snapshot of the state and keep a log of changes.
  • When to do it?
    • Early: the log could grow too large.
    • Late: missed opportunity to overlap execution and checkpointing.
    • Do it when the first worker is done.
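For reference, the "snapshot plus log" idea is the textbook Chandy-Lamport algorithm: a process records its own state when it first sees a marker (or starts the snapshot), sends markers on all outgoing channels, and logs messages arriving on each incoming channel until that channel's marker arrives. The sketch below runs it on a toy money-transfer system with FIFO channels, not on Piccolo's table partitions; the classes `Process` and `Network` are hypothetical:

```python
from collections import deque


class Process:
    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance    # live state, which keeps changing during the snapshot
        self.peers = peers
        self.snapshot = None      # recorded local state
        self.recorded = {}        # peer -> in-flight amounts logged on that channel
        self.marker_from = set()  # peers whose marker has already arrived


class Network:
    """Processes connected by FIFO channels, with a Chandy-Lamport snapshot."""

    def __init__(self, balances):
        pids = list(balances)
        self.procs = {p: Process(p, b, [q for q in pids if q != p]) for p, b in balances.items()}
        self.channels = {(a, b): deque() for a in pids for b in pids if a != b}

    def transfer(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(("money", amount))

    def _take_snapshot(self, proc):
        proc.snapshot = proc.balance                 # 1. record local state
        proc.recorded = {q: [] for q in proc.peers}  # 2. start logging incoming channels
        for q in proc.peers:                         # 3. send a marker on every outgoing channel
            self.channels[(proc.pid, q)].append(("marker", None))

    def start_snapshot(self, pid):
        self._take_snapshot(self.procs[pid])

    def deliver(self, src, dst):
        kind, amount = self.channels[(src, dst)].popleft()
        proc = self.procs[dst]
        if kind == "marker":
            if proc.snapshot is None:
                self._take_snapshot(proc)  # first marker: channel src->dst is recorded empty
            proc.marker_from.add(src)      # stop logging this channel
        else:
            proc.balance += amount
            if proc.snapshot is not None and src not in proc.marker_from:
                proc.recorded[src].append(amount)  # in flight at snapshot time: log it

    def deliver_all(self):
        while any(self.channels.values()):
            for (src, dst), queue in self.channels.items():
                if queue:
                    self.deliver(src, dst)

    def snapshot_total(self):
        return sum(p.snapshot + sum(sum(v) for v in p.recorded.values())
                   for p in self.procs.values())


if __name__ == "__main__":
    net = Network({"A": 100, "B": 50})
    net.transfer("A", "B", 30)   # in flight when A starts the snapshot
    net.start_snapshot("A")
    net.transfer("B", "A", 10)   # sent before B has seen A's marker
    net.deliver_all()
    print(net.snapshot_total())  # 150: nothing lost or double-counted
```

Transfers keep flowing while the snapshot is taken, yet the recorded balances plus the logged in-flight transfer add up to the original total of 150. The longer the gap between the first and the last process recording its state, the more messages end up in the log, which is the trade-off behind "when to do it".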
Experiments
• Run on
  • a 12-node NYU network
  • EC2
• Applications implemented
  • PageRank – co-locate rank and graph
  • Distributed crawler – co-locate “polite”, “robots”, “sites”
  • K-Means
  • N-Body simulation
  • Matrix multiplication
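A minimal sketch of why PageRank fits this model: each page adds a share of its rank to the next rank of every page it links to, through a sum accumulator, so a kernel only needs its own partition of the graph and rank tables, hence the co-location. The function `pagerank_step`, the damping factor 0.85, and the assumption that every page has at least one out-link are choices of this sketch, not details from the slides:

```python
from collections import defaultdict


def pagerank_step(graph, ranks, damping=0.85):
    """One PageRank iteration phrased as accumulator updates into a next-rank table."""
    next_ranks = defaultdict(float)  # accumulator: sum
    for page, out_links in graph.items():
        # In Piccolo terms, a kernel would run this loop over its local partition;
        # co-locating `graph` and `ranks` makes both lookups for `page` local.
        share = ranks[page] / len(out_links)
        for target in out_links:
            next_ranks[target] += damping * share  # order of these updates does not matter
    n = len(graph)
    return {page: (1 - damping) / n + next_ranks[page] for page in graph}


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}  # no dangling pages, by assumption
    ranks = {page: 1 / len(graph) for page in graph}
    for _ in range(50):
        ranks = pagerank_step(graph, ranks)
    print({page: round(r, 3) for page, r in ranks.items()})
```

Because every page has an out-link, each step redistributes exactly `damping` of the total rank and adds back `1 - damping`, so the ranks keep summing to 1.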
Strengths
• Natural model for some applications.
• Configurability allows for application tuning.
• 11x and 4x performance gains for PageRank and K-Means, respectively.
• Not an all-purpose system; targets specific applications.
• Makes good choices as to what to delegate.
• Checkpointing control variables is the user’s job.
“Not-Strength”s
• The “one fails, all restart” policy.
• Associative, commutative accumulators?
• What if the master fails?
• Key-value interface: multi-entry writes?
• Too reliant on the user?
• Checkpoint scalability?
• How does it compare to other in-memory systems?