distributed systems - university of cambridge · recommended reading • “distributed systems:...

23
Distributed systems Lecture 1: Introduction to distributed systems; RPC Lent 2016 Dr Robert N. M. Watson (With thanks to Dr Steven Hand) 1

Upload: others

Post on 28-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

DistributedsystemsLecture1:Introductiontodistributedsystems;RPC

Lent2016Dr RobertN.M.Watson

(WiththankstoDr StevenHand)

1

Page 2: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

RecommendedReading

• “DistributedSystems:ConceptsandDesign”,(5th Ed)Coulouris etal,Addison-Wesley2012

• “Distributed Systems: Principles and Paradigms”(2nd Ed),Tannenbaum etal,PrenticeHall,2006

• “OperatingSystems,ConcurrentandDistributedS/WDesign“,Bacon&Harris,Addison-Wesley2003– or“ConcurrentSystems”,(2nd Ed),JeanBacon,Addison-Wesley1997

2

Page 3: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

WhatareDistributedSystems?

• Asetofdiscretecomputers(“nodes”)thatcooperatetoperformacomputation– Operates“asif”itwereasinglecomputingsystem

• Examplesinclude:– Computeclusters(e.g.CERN,HPCF)– BOINC(akaSETI@Homeandfriends)– Distributedstoragesystems(e.g.NFS,Dropbox,…)– TheWeb(client/server;CDNs;andback-endtoo!)– Peer-to-peersystemssuchasTor– Vehicles,factories,buildings(?)

3

Page 4: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

Concurrentsystemsreminder• Foundationsofconcurrency:processor(s),ISAs,threads• Mutualexclusion:locks,semaphores,monitors,etc.• Producer-consumer,activeobjects,messagepassing• Races,deadlock,livelock,starvation,priorityinversion• Transactions,ACID,isolation,serialisability,schedules• 2-phaselocking,rollback,time-stampordering(TSO),optimisticconcurrencycontrol(OCC)

• Durability,write-aheadlogging,crashrecovery• Lock-freealgorithms,transactionalmemory• Operating-systemcasestudy

4

Theseproblemswerenotdifficultenough– distributedsystemsadd:lossofglobalvisibility;lossofglobalordering;newfailuremodes

Page 5: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

DistributedSystems:Advantages• Scaleandperformance– Cheapertobuy100PCsthanasupercomputer…– …andeasiertoincrementallyscaleuptoo!

• SharingandCommunication– Allowaccesstosharedresources(e.g.aprinter)andinformation(e.g.distributedFSorDBMS)

– Enableexplicitcommunicationbetweenmachines(e.g.EDI,CDNs)orpeople(e.g.email,twitter)

• Reliability– Canhopefullycontinuetooperateevenifsomepartsofthesystemareinaccessible,orsimplycrash

5

Page 6: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

DistributedSystems:Challenges

• DistributedSystemsareConcurrentSystems– Needtocoordinateindependentexecutionateachnode(c/ffirstpartofcourse)

• Failureofanycomponents(nodes,network)– Atanytime,foranyreason

• Networkdelays– Can’tdistinguishcongestionfromcrash/partition

• Noglobaltime– Trickytocoordinate,orevenagreeonordering!

6

Page 7: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

Kernel

Localnetwork/OSservices

Kernel

Localnetwork/OSservices

Middleware

• Middleware helpsapplicationauthorswritesoftwareintendedtorunonmorethanonemachineatatime. 7

E.g.,TCP/IP,Ethernet

MachineBMachineA MachineB

Kernel

Localnetwork/OSservices

Middlewareservices

Distributedapplications

Network

E.g.,Linux,BSD,

Windows

E.g.,Javaruntime

E.g.,JavaRMI

Whatyouactuallywantedto

do!

Page 8: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

Transparency&Middleware• Recalladistributedsystemshouldappear“asif”itwereexecutingonasinglecomputer

• Weoftencallthistransparency:– Userisunawareofmultiplemachines– Programmerisunawareofmultiplemachines

• How“unaware”canvaryquiteabit– e.g.webuserawarethatthere’snetworkcommunication...butnotthenumberorlocationofthemachinesinvolved

– e.g.programmermayexplicitlycodecommunication,ormayhavelayersofabstraction:middleware

8

Page 9: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

ClassicaltypesofTransparencyTransparency Description

Access Hide differences in data representation and how a resource is accessed

Location Hide where a resource is located

Migration Hide that a resource may move to another location

Relocation Hide that a resource may be moved to another location while in use

Replication Hide that a resource may be provided by multiple cooperating systems

Concurrency Hide that a resource may be simultaneously shared by several competitive users

Failure Hide the failure and recovery of a resource

Persistence Hide whether a (software) resource is in memory or on disk

9Scalability increasinglyimportant– “performancetransparency”?

Page 10: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

InthisCourse• Wewilllookattechniques,protocols&algorithmsusedindistributedsystems– inmanycases,thesewillbeprovidedforyoubyamiddlewaresoftwaresuite

– butknowinghowthingsworkwillstillbeuseful!• AssumeOS&networkingsupport– processes,threads,synchronization– basiccommunicationviamessages– (willseelaterhowassumptionsaboutmessageswillinfluencethesystemswe[can]build)

• Let’sstartwithasimpleclient-serversystems

10

Page 11: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

Client-ServerModel• 1970s:developmentofLocalAreaNetworks(LANs)• 1980s:standarddeploymentinvolvessmallnumberofservers,plusmanyworkstations– Servers:always-on,powerfulmachines– Workstations:personalcomputers

• Workstationsrequest‘service’fromserversoverthenetwork,e.g.accesstoasharedfile-system:

11

Page 12: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

Request-ReplyProtocols• Basicscheme:– Clientissuesarequestmessage– Serverperformsoperation,andsendsreply

• Simplestversionissynchronous:– clientblocksawaitingreply

• Example:HTTP1.0– Client(browser)sends“GET/index.html”– Webserverfetchesfileandreturnsit– BrowserdisplaysHTMLwebpage

• Laterwewilltalkaboutasynchronousmodels:– Clientscancontinueworkwithoutblockingawaitingreply

12

Page 13: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

HandlingErrors&Failures

• Errors areapplication-level things=>easy;-)– E.g.clientrequestsnon-existentwebpage– Needspecialreply(e.g.“404NotFound”)

• Failures aresystem-level things,e.g.:– lostmessage,client/servercrash,networkdown,…

• Tohandlefailure,clientmusttimeout ifitdoesn’treceiveareplywithinacertaintimeT– Ontimeout,clientcanretry request– (Q:whatshouldwesetTto?)

13

Page 14: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

RetrySemantics• Clientcouldtimeoutbecause:

1. Requestwaslost2. Requestwassent,butservercrashedonreceipt3. Requestwassent&received,andserverperformedoperation

(orsomeofit?),butcrashedbeforereplying4. Requestwassent&received,andserverperformedoperation

correctly,andsentreply…whichwasthenlost5. As#4,butreplyhasjustbeendelayedforlongerthanT

• Forread-onlystatelessrequests(likeHTTPGET),canretryinallcases,butwhatifrequestwasanorderwithAmazon?– Incase#1,weprobablywanttore-order…andincase#5we

wanttowaitforalittlebitlonger,andotherwisewe…erm?• Worse:wedon’tknowwhatcaseitactuallywas!

14

Page 15: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

IdealSemantics

• Whatwewantisexactly-once semantics:– Ourrequestoccursoncenomatterhowmanytimesweretry(orifthenetworkduplicatesourmessages)

• E.g.addauniqueIDtoeveryrequest– ServerremembersIDs,andassociatedresponses– Ifseesaduplicate,justreturnsoldresponse– Clientignoresduplicateresponses

• Prettytrickytoensureexactly-onceinpractice– e.g.ifserverexplodes;-)

15

Page 16: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

PracticalSemantics• Inpractice,protocolsguaranteeoneof:• All-or-nothing (atomic)semantics

– Useschemeonpreviouspage;persistentlog– (similarideatotransactionprocessing).

• At-most-once semantics– Requestcarriedoutonce,ornotatall– Ifnoreply,wedon’tknowwhichoutcomeitwas– e.g.sendonerequest;giveupontimeout

• At-least-once semantics– Retryontimeout; riskoperationoccurringagain– Okiftheoperationisread-only,oridempotent

• Note:Assumptionofnonetworkduplication

16

Serverstatenotrequired

Serverstaterequiredtosuppressretries

Page 17: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

RemoteProcedureCall(RPC)• Request/responseprotocolsareuseful– andwidelyused– butratherclunkytouse– e.g.needtodefinethesetofrequests,includinghowtheyarerepresentedinnetworkmessages

• AnicerabstractionisRemoteProcedureCall(RPC)– Programmersimplyinvokesaprocedure…– …butitexecutesonaremotemachine(theserver)– RPCsubsystemhandlesmessageformats,sending&receiving,handlingtimeouts,etc

• Aimistomakedistribution(mostly)transparent– Certainfailurecaseswouldn’thappenlocally– Distributedandlocalfunctioncallperformancedifferent

17

Page 18: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

MarshallingArguments

• RPCisintegratedwiththeprogramminglanguage– Someadditionalmagictospecifythingsareremote

• RPClayermarshals parameterstothecall,aswellasanyreturnvalue(s),e.g.

Caller RPCService RPCService RemoteFunction

call(…)

1)Marshalargs2)GenerateID4)Starttimer 5)Unmarshal args

6)RecordID

7)Marshalreturnvalues

9)Settimer10)Unmarshal

returnvalues11)Acknowledge

fun(…)

3)Sendmessage

18

8)Sendreply

Page 19: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

IDLsandStubs• Tomarshal,theRPClayer(onbothsides!)mustknow:

– howmanyargumentstheprocedurehas,– howmanyresultsareexpected,and– thetypesofalloftheabove

• TheprogrammermustspecifythisbydescribingthingsinanInterfaceDefinitionLanguage(IDL)– Inhigher-levellanguages,thismayalreadybeincludedas

standard(e.g.C#,Java)– Inothers(e.g.C),IDLispartofthemiddleware

• TheRPClayercanthenautomaticallygeneratestubs– Smallpiecesofcodeatclientandserver(seeprevious)– Mayalsoprovideauthentication,encryption– Providesintegrity,confidentiality

19

Page 20: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

Example:SunRPC• Developedmid80’sforSunUnixsystems• Simplerequest/responseprotocol:– Serverregistersoneormore“programs”(services)– Clientissuesrequeststoinvokespecificprocedureswithinaspecificservice

• Messagescanbesentoveranytransportprotocol(mostcommonlyUDP/IPandlaterTCP/IP)– RequestshaveauniquetransactionIDthatcanbeusedtodetect&handleretransmissions

– At-least-once semantics– Varioustypesofaccesstransparency includingbyte-order

20

Page 21: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

XDR:ExternalDataRepresentation

• SunRPC usedXDR fordescribinginterfaces:

21

// file: test.xprogram test {

version testver { int get(getargs) = 1; // procedure numberint put(putargs) = 2; // procedure number

} = 1; // version number} = 0x12345678; // program number

• rpcgen generates[un]marshalingcode,stubs• Singlearguments…butrecursivelyconvertvalues• Somesupportforfollowingpointerstoo

• Dataonthewirealwaysinbig-endianformat(oops!)

Page 22: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

UsingSunRPC1. WriteXDR,anduserpcgen togenerateskeletoncode2. Fillinblanks(i.e.writeclient/serverparts),compilecode3. Runserverprogram&registerwithportmapper (now:

rpcbind)– Mappingsfrom{prog#,ver#,proto}->port– (onLinux/UNIX,try“/usr/sbin/rpcinfo –p”)– Portmapper isitselfanRPCserviceonawell-knownport

4. Serverprocesswillthenlisten(),awaitingclients5. Whenaclientstarts,clientstubcallsclnt_create()

– Sends{prog#,ver#,proto}toportmapper onserver,receivesappropriateportnumbertouseforactualRPCconnection

– Clientinvokesremoteproceduresasneeded6. Recently:GSSauthentication/encryption– e.g.,Kerberos

22

Page 23: Distributed systems - University of Cambridge · Recommended Reading • “Distributed Systems: Concepts and Design”, (5th Ed) Coulouriset al, Addison-Wesley 2012 • “DistributedSystems:PrinciplesandParadigms”

Summary+nexttime• Aboutthiscourse• Advantagesandchallengesofdistributedsystems• Typesoftransparency(+scalability)• Middleware,theclient-servermodel• Errorsandretrysemantics• RPC,marshalling,SunRPC,andXDR

• Sun’sNetworkFileSystem(NFS)• Object-OrientedMiddleware(OOM)

23