Post on 26-Mar-2020


DISTRIBUTED SYSTEMS COMMUNICATION

Last class we discussed the core challenges of building distributed systems (incremental scalability is hard, at scale failures are inevitable, constant attacks, etc.). We said that the core approach to addressing these challenges is to construct layers upon layers of systems and services that progressively raise the level of abstraction through well-defined APIs. Google's storage stack, as well as Spark and other systems, are perfect examples of that layered design for distributed systems.

In this lecture, we look at one of these abstractions: RPC, which enables communication. To cooperate, machines first need to communicate. How should they do that? RPC is the predominant way of communicating in a distributed system (DS). Interestingly enough, RPC, the simplest possible DS abstraction, reflects most of the challenges in distributed systems, which shows how fundamental those challenges are.

I. Remote Procedure Calls (RPCs)

Overall goal: To simplify the programmer's job when building distributed systems.

- many complexities when building a distributed program
- some are fundamental and must be dealt with by the programmer, while others are mechanical and can be solved by good language and runtime support
- pre-RPC, all of the gory details of socket-level communication were exposed to the programmer
- post-RPC, a remote interaction looks (nearly) identical to a local procedure call, with the system hiding (nearly) all of the differences

Example of (badly written) socket communication code:

    struct foomsg {
      u_int32_t len;
    }

    send_foo(int outsock, char *contents) {
      int msglen = sizeof(struct foomsg) + strlen(contents);
      char buf = malloc(msglen);
      struct foomsg *fm = (struct foomsg *)buf;
      fm->len = htonl(strlen(contents));
      memcpy(buf + sizeof(struct foomsg), contents, strlen(contents));
      write(outsock, buf, msglen);
    }

Any issues with it? Many…
- lots of ugly boilerplate
- bug prone
- portability details are easy to miss
- hard to understand and hard to maintain
- hard to evolve protocol
- vulnerability prone
- …

RPC: RPC tries to make net communication look just like a local procedure call (LPC):

Client-side code:

    z = fn(x, y)

Server-side code:

    fn(x, y) {
      // compute result z
      return z;
    }
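Going back to the socket example above, here is a minimal Python sketch of the same length-prefixed framing with byte order, partial writes, and partial reads handled explicitly. The names send_msg, recv_exact, and recv_msg are illustrative and do not come from the lecture. The point is only that even a careful version is mechanical boilerplate of the kind an RPC library is meant to take off the programmer's hands.

```python
import socket
import struct

# 4-byte unsigned length prefix in network (big-endian) byte order.
HEADER = struct.Struct("!I")


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one length-prefixed message; sendall retries partial writes."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock: socket.socket) -> bytes:
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


if __name__ == "__main__":
    a, b = socket.socketpair()  # connected pair standing in for a real network
    send_msg(a, b"hello")
    print(recv_msg(b))
    a.close()
    b.close()
```

Note how much of the code has nothing to do with what the application wants to say, which is exactly the complaint in the list above.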


Why the LPC abstraction? Because it's simple, elegant, and even novice programmers know how to use function calls! Other models exist, but they are (arguably) harder to think about.

RPC message diagram:

    Client              Server
       request --->
                        process
       <--- response

RPC software structure:

(figure taken from the RPC paper by Birrell et al.)

In implementations, stubs are generated automatically by RPC frameworks (libraries), which also provide the RPC runtime. The programmer only writes definitions for their data structures and protocols in an "interface definition language" (IDL). We'll see two examples of RPC libraries later.

Good things about the RPC abstraction:

- Hides gory network/marshaling details that one would have to implement if doing, e.g., network-level communication, such as byte orders (big endian vs. little endian)
- Supports evolution of the communicating components independently. (Many RPC frameworks provide explicit support for evolution, e.g., protocol version numbers, rules of when to remove/rename arguments to procedure calls, etc.)
- Allows for efficient packaging
- Authentication support
- Location independence (particularly if combined with a directory service)
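To make the stub idea concrete, here is a small hand-written Python sketch of what a generated client stub and server stub might do for a single procedure fn(x, y). Everything in it is an assumption for illustration: the wire format, the procedure id FN_ID, the names ClientStub and server_stub, and the choice of addition as the procedure body. A real framework would generate this code from the IDL and send the bytes over a network instead of calling the server stub directly.

```python
import struct

# Hypothetical wire format for one procedure, fn(x, y) -> z:
#   request  = procedure id (uint16) + x (int32) + y (int32), big-endian
#   response = z (int64), big-endian
REQUEST = struct.Struct("!Hii")
RESPONSE = struct.Struct("!q")
FN_ID = 1


def fn(x: int, y: int) -> int:
    """The real procedure, which lives only on the server."""
    return x + y


def server_stub(request: bytes) -> bytes:
    """Unmarshal the request, dispatch to the procedure, marshal the result."""
    proc_id, x, y = REQUEST.unpack(request)
    if proc_id != FN_ID:
        raise ValueError(f"unknown procedure id {proc_id}")
    return RESPONSE.pack(fn(x, y))


class ClientStub:
    """Looks like a local object; every call is really a message exchange."""

    def __init__(self, transport):
        # transport: any callable that maps request bytes to response bytes.
        self._transport = transport

    def fn(self, x: int, y: int) -> int:
        reply = self._transport(REQUEST.pack(FN_ID, x, y))
        (z,) = RESPONSE.unpack(reply)
        return z


if __name__ == "__main__":
    client = ClientStub(server_stub)  # in-process stand-in for the network
    print(client.fn(2, 3))
```

The caller writes client.fn(2, 3) and never sees the byte order or the field layout, both of which are fixed once in the format strings.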

Problems with the RPC abstraction (i.e., where the distributed nature of RPC peeks through the LPC illusion):

- Latency: LPC is fast, RPC can be really expensive (esp. if it goes over a WAN, but not only)


- Pointer transfers: the local address space isn't shared with the remote process. The programmer/RPC library has to decide whether to follow pointers and serialize in depth, or to exclude that information. Typically, IDLs avoid this problem by requiring one to specify one's messages/data structures explicitly, which avoids pointer confusion.
- Failures: they are more fundamental than in the single-process/LPC case.
    o If I issue an LPC, then I can be in one of two situations: (1) the LPC returns, therefore I know the procedure has been executed; or (2) the LPC doesn't return, in which case I know the LPC hasn't finished executing (it may still be executing, or the process that hosts both the caller and the callee might have died). So, with LPC, the caller and callee can't have a "split mind." With RPC, they can.
    o If I issue an RPC and get a return, I know the procedure has been executed remotely. But if I don't get a return, what do I know? Has the function completed on the remote side and I just don't know about it because, e.g., the response has been lost over the network? Has it not happened because, e.g., my request has been lost over the network or the remote machine is dead? Or maybe the machine/network is slow. Should I wait longer for the response? How long should I wait for a response? I can't tell the difference between these situations, and I don't know how to act. In certain scenarios, that's not a big problem, but in others it is.

For example, imagine the distributed system is composed of an ATM and its bank back-end system. A person requests a withdrawal of cash from the ATM. The ATM needs to check that the person has sufficient funds in, and withdraw the sum from, their bank account before giving out the money. The ATM requests the money transfer and then waits, and waits, and waits. No response. Should it give out the money or not, if it doesn't get a response from the bank? No simple answer. It needs some timeout (the person can't wait forever), but what should that be? Say the ATM refuses to give out the money upon a timeout. But what if the bank already deducted the money from the account? How should we deal with that? The ATM needs to keep trying to talk to the bank to tell it that it hasn't actually finished the transaction, so the bank can put the money back in the person's account. But what if the ATM itself fails at some point? Is the person's money lost forever? I would hate interacting with such an ATM. Alternatively, say the ATM gives out the money upon a timeout. This of course can open the bank to fraud: a person might quickly go to another ATM and extract the same money again.
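The ambiguity in the ATM scenario can be reproduced with a toy simulation. The sketch below is illustrative only: Bank, call_withdraw, and atm_attempt are made-up names, and the "network" is just a flag saying which message to drop. It shows that the ATM observes the same timeout whether the request or the response was lost, while the bank's state differs between the two cases.

```python
class Bank:
    """Toy bank back end holding a single account balance."""

    def __init__(self, balance: int):
        self.balance = balance

    def withdraw(self, amount: int) -> str:
        if amount > self.balance:
            return "DENIED"
        self.balance -= amount
        return "OK"


def call_withdraw(bank: Bank, amount: int, drop: str) -> str:
    """Simulate one RPC; drop is 'none', 'request', or 'response'."""
    if drop == "request":
        raise TimeoutError("no reply")  # the bank never saw the request
    reply = bank.withdraw(amount)       # the bank executed the procedure
    if drop == "response":
        raise TimeoutError("no reply")  # the reply was lost on the way back
    return reply


def atm_attempt(bank: Bank, amount: int, drop: str):
    """Return (what the ATM observed, how much the bank actually debited)."""
    before = bank.balance
    try:
        observed = call_withdraw(bank, amount, drop)
    except TimeoutError:
        observed = "TIMEOUT"
    return observed, before - bank.balance


if __name__ == "__main__":
    for drop in ("none", "request", "response"):
        print(drop, atm_attempt(Bank(100), 40, drop))
```

In the two failure cases the first element of the result is identical, so no decision rule that looks only at what the ATM observed can be right in both.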

Herein lies the biggest difficulty in distributed systems compared to single-process systems: with single machines, you largely have shared fate (i.e., either everything fails or nothing does); with distributed systems, machines/processes can fail independently, so you can have some machines that are up while others are down, and you can also have all machines up but unable to communicate due to network failures and partitions. This leads to coordination challenges, which in turn lead to inconsistencies between the decisions these machines make. So the goal of a DS, providing a "coherent service," is difficult to achieve.


II. Semantics

The preceding ATM example leads directly to a question of semantics. What should be the meaning (semantics) of an RPC invocation? Network failures imply timeouts and retransmissions: clients will end up retransmitting requests, and the server will see duplicates. How should the server behave when it sees duplicates? Below are some potential answers; different RPC frameworks implement different semantics.

At-least-once: the RPC call, once issued by the client, is eventually executed at least once, but possibly multiple times. This semantic is the easiest to implement but raises issues. To implement it, the RPC library on the client keeps reissuing the RPC until it gets a response from the server. Assuming that failures (of network, server) are temporary, the RPC will be executed at least once. This is fine for idempotent requests (e.g., marking an email in a user's inbox as read), but might be problematic for other types of requests (e.g., the ATM example, purchasing an item).
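A rough Python sketch of at-least-once retries follows. FlakyServer is a made-up test double that executes every request but loses its first few replies, and call_at_least_once is an illustrative name. The max_attempts bound is an assumption added so the example terminates; the description above retries until a response arrives.

```python
class FlakyServer:
    """Test double: executes every request but loses the first `lost` replies."""

    def __init__(self, lost: int):
        self.lost = lost
        self.balance = 100
        self.read_flags = set()

    def _reply(self, value):
        if self.lost > 0:
            self.lost -= 1
            raise TimeoutError("reply lost")
        return value

    def mark_read(self, msg_id: int) -> bool:
        """Idempotent: repeating it leaves the same state."""
        self.read_flags.add(msg_id)
        return self._reply(True)

    def withdraw(self, amount: int) -> int:
        """Not idempotent: every execution debits the account again."""
        self.balance -= amount
        return self._reply(self.balance)


def call_at_least_once(rpc, *args, max_attempts: int = 10):
    """Reissue the call until a reply arrives (bounded here for the example)."""
    for _ in range(max_attempts):
        try:
            return rpc(*args)
        except TimeoutError:
            continue
    raise TimeoutError("gave up after max_attempts")


if __name__ == "__main__":
    server = FlakyServer(lost=2)
    call_at_least_once(server.mark_read, 7)
    print(server.read_flags)   # one flag, however many times it ran

    server = FlakyServer(lost=2)
    call_at_least_once(server.withdraw, 10)
    print(server.balance)      # debited once per execution, not once per call
```

With two lost replies the withdrawal executes three times, which is the behavior that makes at-least-once unsuitable for the ATM case.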

At-most-once: the RPC call, once issued by a client, gets executed zero or one time. This semantic is the next easiest to implement, but has complications if you try to do it sensibly. To implement it, the server has to remember requests that it has already seen and processed, so that it can detect and squelch duplicates. Clients and servers need to embed unique IDs (XIDs) in messages. Something like the following:

    Server maintains s[xid] (state), r[xid] (return value)

    if s[xid] == DONE:
        return r[xid]
    x = handler(args)
    s[xid] = DONE
    r[xid] = x
    return x
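The pseudocode above can be turned into a small runnable Python sketch. This is one possible rendering, not a reference implementation: AtMostOnceServer and new_xid are illustrative names, the s and r tables are folded into a single dictionary (a key being present means DONE), and XIDs are drawn from uuid4 as one convenient way to get unique IDs. The table lives in memory only and grows without bound.

```python
import uuid


class AtMostOnceServer:
    """Suppresses duplicate requests by remembering results per XID."""

    def __init__(self, handler):
        self.handler = handler
        self.results = {}  # xid -> return value; presence means DONE

    def handle(self, xid: str, *args):
        if xid in self.results:
            # Duplicate: replay the stored reply, do not run the handler again.
            return self.results[xid]
        value = self.handler(*args)
        self.results[xid] = value
        return value


def new_xid() -> str:
    """Client side: choose a fresh ID per logical call and reuse it on retries."""
    return uuid.uuid4().hex


if __name__ == "__main__":
    account = {"balance": 100}

    def withdraw(amount: int) -> int:
        account["balance"] -= amount
        return account["balance"]

    server = AtMostOnceServer(withdraw)
    xid = new_xid()
    print(server.handle(xid, 10))  # first delivery executes the handler
    print(server.handle(xid, 10))  # retransmission gets the same reply back
    print(account["balance"])
```

The retransmitted request returns the remembered reply and the balance is debited only once.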

One complication: what happens if the server crashes/restarts? The server will lose the s, r tables. Solution: store them on disk [slow, expensive].

Another complication: even if they are stored on disk, what happens if the server crashes just before or after the call to handler()?
- after restart, the server can't tell if handler() was called
- the server must reply with "CRASH" when the client resends the request

Solution: the server has a "server nonce" that identifies the restart generation
- the client obtains the nonce during bind with the server
- the client transmits the nonce in every RPC request
- if the nonce doesn't match the server's state, return CRASH

Exactly once: this is the ideal (it resembles the LPC model most closely and it's easiest to understand), but it's surprisingly hard to implement. This is when the RPC, once issued by the client, is invoked exactly once by the server. Why is it hard to build?
- server failure at an inopportune time can stymie the call: the client blocks forever
- need to implement handler() as a transaction that includes the s[xid], r[xid] commit on the server
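As a rough illustration of the last point, the sketch below uses SQLite (from Python's standard library) as a stand-in for the server's durable store, so that the handler's effect and the record of the completed XID commit in one transaction. The schema, the names open_store and withdraw, and the fail_midway flag (which simulates a failure between the debit and the commit) are all assumptions for illustration. An in-memory database is used so the example is self-contained; surviving a real crash would need a file-backed database, and the client would still have to keep retrying.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the account table and the table of completed XIDs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS account (id INTEGER PRIMARY KEY, balance INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS done (xid TEXT PRIMARY KEY, result INTEGER)"
    )
    conn.execute("INSERT OR IGNORE INTO account VALUES (1, 100)")
    conn.commit()
    return conn


def withdraw(conn: sqlite3.Connection, xid: str, amount: int,
             fail_midway: bool = False) -> int:
    """Debit the account and record r[xid] in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT result FROM done WHERE xid = ?", (xid,)
        ).fetchone()
        if row is not None:
            return row[0]  # duplicate request: replay the stored result
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
        )
        if fail_midway:
            raise RuntimeError("simulated failure before commit")
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = 1"
        ).fetchone()
        conn.execute("INSERT INTO done VALUES (?, ?)", (xid, balance))
        return balance


if __name__ == "__main__":
    conn = open_store()
    print(withdraw(conn, "xid-a", 10))
    print(withdraw(conn, "xid-a", 10))  # retransmission, same reply
```

Because the debit and the done-table row are committed together, a failure in between leaves neither behind, and a later retry with the same XID executes the withdrawal from a clean state.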


III. Example Libraries

Some sample code using various RPC libraries can be found at https://columbia.github.io/ds1-class/lectures/02-rpc-handout.pdf.

IV. Other Communication Models

RPC is just one communication model in distributed systems. It works well when the communication pattern looks like this:

- point-to-point, request/process/response to a server or service.
- lots of communication in distributed systems looks like that, but not all.

Here are some other popular communication patterns and the corresponding communication abstractions that support them. We won't study them in depth in this course.

Acknowledgements

The preceding notes were adapted from the University of Washington distributed systems course, Steve Gribble's edition: http://courses.cs.washington.edu/courses/csep552/13sp/lectures/1/rpc.txt