mapreduce/bigtable for distributed...

21
MapReduce/Bigtable for Distributed Op7miza7on Keith Hall, Sco@ Gilpin, Gideon Mann presented by: Slav Petrov Zurich, Thornton, New York Google Inc.

Upload: others

Post on 05-Jan-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

MapReduce/BigtableforDistributedOp7miza7on

KeithHall,Sco@Gilpin,GideonMannpresentedby:SlavPetrovZurich,Thornton,NewYork

GoogleInc.

Page 2: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

Outline

•  Large‐scaleGradientOp7miza7on– DistributedGradient– AsynchronousUpdates–  Itera7veParameterMixtures

•  MapReduceandBigtable

•  ExperimentalResults

Page 3: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

GradientOp7miza7onSeRng

Ifisdifferen7able,thensolveviagradientupdates:

Considercasewhereiscomposedofasumofdifferen7ablefunc7ons,thenthegradientupdatecanbewri@enas:

ffq

!i+1 = !i + "!

q

!fq(!i)

!i+1 = !i + "!f(!i)

!! = argmin!

f(!)

f

Goal:find

Thiscaseisthefocusofthetalk

Page 4: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

MaximumEntropyModelsX Y

S = ((x1, y1), ..., (xm, ym))

p![y|x] =1Z

exp(! · !(x, y))

! : X ! Y " Rn

:inputspace :outputspace

:featuremapping

:trainingdata

:probabilitymodel

Theobjec7vefunc7onisasumma7onoffunc7ons,eachofwhichiscomputedononedatainstance.

!! = argmin!

1m

!

i

log p!(yi|xi)

Page 5: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

DistributedGradient

!i+1 = !i + "!

q

!fq(!i)Observethatthegradientupdatecanbebrokendownintothreephases:

!fq(!i)!

q

Canbedoneinparallel

Cheaptocompute

UpdateStep

Dependsonnumberoffeatures,notdatasize.

Ateachitera7on,mustsendcompletetoeachparallelworker.

!

[Chuetal.’07]

Page 6: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

Stochas7cGradient

!i+1 = !i + "!

q

!fq(!i)Alterna7velyapproximatethesumbyasubsetoffunc7ons.Ifthesubsetissize1,thenthealgorithmisanonline:

!fq(!i)

!!+1 = !i + "!fq(!i)

Stochas7cgradientapproachesprovablyconverge,andinprac7ceoZenmuchquickerthanexactgradientcalcula7onmethods.

Page 7: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

AsynchronousUpdates

Asynchronousupdatesareadistributedextensionofstochas7cgradient.

!i

!fq(!i)

Eachworker:GetcurrentComputeUpdateglobalparametervector

Sinceeachworkerwillnotbecompu7nginlock‐step,somegradientswillbebasedonoldparameters.Nonetheless,thisalsoconverges.

[Langfordetal.’09]

Page 8: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

Itera7veParameterMixtures

Separatelyes7mateondifferentsamples,andthencombine.Itera7ve:takeresul7ngasstar7ngpointfornextroundandrerun.

!!

Forcertainclassesofobjec7vefunc7ons(e.g.maxentandperceptron),convergencecanalsobeshown.

[Mannetal.’09,McDonaldetal.‘10]

Li@lecommunica7onbetweenworkers.

Page 9: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

TypicalDatacenter(atGoogle)

Commoditymachines,withrela7velyfewcores(e.g.<=4)

Racksofmachinesconnectedbycommodityethernetswitches.

NoGPUs.NoSharedMemory.NoNAS.NomassivemulKcoremachines.

[Holzle,Barroso‘09]

Page 10: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

Implica7onsforalgorithms

•  Communica7onbetweenworkersiscostly– Nosharedmemorymeansveryhardtomaintainglobalstate

– Evenethernetcommunica7oncanbeasignificantsourceoflatency

•  Eachworkerrela7velylowpowered– NomachineshavemassiveamountsofRAMordisk

– Simpleisolatedjobsperformedinparallelwillhavesignificantgains.

Page 11: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

Outline

•  DistributedGradientOp7miza7on•  MapReduceandBigtable– Op7malConfigura7ons

•  ExperimentalResults

Page 12: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

MapReduce

1.Backupmapworkerscompensateforfailureofmaptask.

2.Sor7ngphaserequiresallmapjobstobecompletedbeforestar7ngreducephase.

Page 13: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

MapReduceImplementa7ons

DistributedGradientandItera7veParameterMixturesarestraight‐forwardtoimplementin

MapReduce.

ComputeGradient

SumandUpdate

Map

Reduce

DistributedGradient:Eachitera7onhasmul7plemappersandonereducer.

Itera7veParameterMixtures:Onemapper/reducerpair,eachrunstoconvergence,andthenthere’sanoutsidemixtureandloop.

Asynchronous–needssharedmemory.

Page 14: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

Bigtable

Distributedstorage–builtoffofGFSNotadatabase:cellsindexedbyrow/column/7me.(e.g.canrecovernmostrecentvalues)

Mul7plelayersofcaching:

Machineandnetworklevel

Transac7onprocessingisop7onal

Page 15: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

AsynchronousUpdates

UpdateParameters,ComputeGradient

SumandUpdateBigtable

Map

Reduce

Bigtable

Sor7ngturnedoffTransac7onprocessingturnedoff:overwritesarepossible

Gradientscollectedintomini‐batches

Page 16: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

Outline

•  DistributedGradientOp7miza7on•  MapReduceandBigtable

•  ExperimentalResults

Page 17: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

ExperimentalSet‐up

Click‐throughdata– A.370milliontraininginstances•  Par77onedinto200pieces•  240workermachines

– B.1.6billioninstances•  Par77onedinto900pieces•  600workermachines

– Evalua7on:185millioninstances(temporallyoccurringaZerfirstinstances)

Page 18: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

StoppingCriterionTricky

Complicatedbecausetheimplementa7onsareallslightlydifferent

•  Gradientchange:–  noglobalgradientinasychronousoritera7veparameter

mixtures•  Log‐likelihood/objec7vefunc7onvalue:

–  Asynchronouslog‐likelihoodfluctua7ngfrequently•  Fixednumberofitera7ons

–  Meansdifferentthingsforeachimplementa7on

Currentlyworkingonmorecleanperformanceresultes7ma7on–take

AUCresultswithagrainofsalt.

Page 19: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

370MillionTrainingInstances

AUC CPU(rel.) Wallclock(rel.)

NetworkUsage(rel.)

Single‐coreSGD .8557 0.10 9.64 0.03

DistributedGradient .8323 1.00 1.00 1.00

AsynchronousSGD .7283 2.17 7.50 82.47

Itera7veParameterMixture:MaxEnt

.8557 0.12 0.13 0.10

Itera7veParameterMixture:Perceptron

.8870 0.16 0.20 0.22

Observa7ons:Evenwithtuning,AsynchronousUpdatesareimprac7calIteratorParametermixturesperformwell

IPMw/240machinesis~70xasfastassinglecorew/10machinesis8xasfastassinglecore.

Page 20: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

1.6BillionTrainingInstances

AUC CPU(rel.) Wallclock(rel.)

NetworkUsage(rel.)

DistributedGradient .8117 1.00 1.00 1.00

AsynchronousSGD .8525 3.71 14.33 133.93

Itera7veParameterMixture:MaxEnt

.8904 0.17 0.15 0.15

Itera7veParameterMixture:Perceptron

.8961 0.24 0.20 0.38

Observa7ons:Networkusagemuchworsewithmoremachines,andasynchronousmuchslower.

Itera7veparametermixturess7lleffec7vewith900par77ons.

Page 21: MapReduce/Bigtable for Distributed Opmizaonlccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdfBigtable Distributed storage – built off of GFS Not a database: cells indexed by row/column

Conclusions

AsynchronousUpdatesnotsuitedtocurrentdatacenterconfigura7ons

Itera7veparametermixturesgivesignificant7meimprovementsoverdistributedgradient.