distributed systems - cl.cam.ac.uk · distributed systems lecture 1: introduction to distributed...

12
Distributed systems Lecture 1: Introduction to distributed systems; RPC Lent 2016 Dr Robert N. M. Watson (With thanks to Dr Steven Hand) 1 Recommended Reading Distributed Systems: Concepts and Design”, (5 th Ed) Coulouris et al, Addison-Wesley 2012 Distributed Systems: Principles and Paradigms(2 nd Ed), Tannenbaum et al, Prentice Hall, 2006 Operating Systems, Concurrent and Distributed S/W Design“, Bacon & Harris, Addison-Wesley 2003 or “Concurrent Systems”, (2 nd Ed), Jean Bacon, Addison-Wesley 1997 2

Upload: others

Post on 28-May-2020

12 views

Category:

Documents


2 download

TRANSCRIPT

Distributedsystems

Lecture1:Introductiontodistributedsystems;RPC

Lent2016

Dr RobertN.M.Watson

(WiththankstoDr StevenHand)

1

RecommendedReading

• “DistributedSystems:ConceptsandDesign”, (5th Ed)Coulouris etal,Addison-Wesley2012

• “Distributed Systems: Principles and Paradigms”(2

nd

Ed),Tannenbaum etal,PrenticeHall,2006

• “OperatingSystems,ConcurrentandDistributedS/WDesign“,Bacon&Harris,Addison-Wesley2003

– or“ConcurrentSystems”,(2nd Ed),JeanBacon,Addison-Wesley1997

2

WhatareDistributedSystems?

• Asetofdiscretecomputers(“nodes”) that

cooperatetoperformacomputation

– Operates“asif”itwereasinglecomputingsystem

• Examples include:

– Computeclusters(e.g.CERN,HPCF)

– BOINC(akaSETI@Home andfriends)

– Distributedstoragesystems(e.g.NFS,Dropbox,…)

– TheWeb(client/server; CDNs;andback-end too!)

– Peer-to-peer systemssuchasTor

– Vehicles,factories,buildings(?)

3

Concurrentsystemsreminder

• Foundationsofconcurrency:processor(s),ISAs,threads

• Mutualexclusion:locks,semaphores,monitors,etc.

• Producer-consumer,activeobjects,messagepassing

• Races,deadlock,livelock,starvation,priorityinversion

• Transactions,ACID,isolation,serialisability, schedules

• 2-phaselocking,rollback,time-stampordering(TSO),

optimisticconcurrencycontrol(OCC)

• Durability,write-aheadlogging,crashrecovery

• Lock-freealgorithms,transactionalmemory

• Operating-systemcasestudy

4

Theseproblemswerenotdifficultenough– distributedsystemsadd:

lossofglobalvisibility;lossofglobalordering;newfailuremodes

DistributedSystems:Advantages

• Scaleandperformance– Cheapertobuy100PCsthanasupercomputer…

– …andeasiertoincrementallyscaleuptoo!

• SharingandCommunication– Allowaccesstosharedresources(e.g.aprinter)and

information(e.g.distributedFSorDBMS)

– Enableexplicitcommunicationbetweenmachines

(e.g.EDI,CDNs)orpeople(e.g.email,twitter)

• Reliability– Canhopefullycontinuetooperateevenifsomeparts

ofthesystemareinaccessible,orsimplycrash

5

DistributedSystems:Challenges

• DistributedSystemsareConcurrentSystems– Needtocoordinateindependentexecutionateachnode(c/ffirstpartofcourse)

• Failureofanycomponents(nodes,network)– Atanytime,foranyreason

• Networkdelays– Can’tdistinguishcongestionfromcrash/partition

• Noglobaltime– Trickytocoordinate,orevenagreeonordering!

6

Kernel

Local

network/OS

services

Kernel

Local

network/OS

services

Middleware

• Middleware helpsapplicationauthorswritesoftwareintendedtorunonmorethanonemachineatatime.

7

E.g.,

TCP/IP,

Ethernet

MachineBMachineA MachineB

Kernel

Local

network/OS

services

Middlewareservices

Distributedapplications

Network

E.g.,

Linux,

BSD,

Windows

E.g.,Java

runtime

E.g.,Java

RMI

Whatyou

actually

wantedto

do!

Transparency&Middleware

• Recalladistributedsystemshouldappear“asif”

itwereexecutingonasinglecomputer

• Weoftencallthistransparency:– Userisunawareofmultiplemachines

– Programmerisunawareofmultiplemachines

• How“unaware”canvaryquiteabit

– e.g.webuserawarethatthere’snetwork

communication...butnotthenumberorlocationof

themachinesinvolved

– e.g.programmermayexplicitlycodecommunication,

ormayhavelayersofabstraction:middleware

8

ClassicaltypesofTransparency

Transparency Description

Access Hide differences in data representation and how a resource is accessed

Location Hide where a resource is locatedMigration Hide that a resource may move to another location

Relocation Hide that a resource may be moved to another location while in use

Replication Hide that a resource may be provided by multiple cooperating systems

Concurrency Hide that a resource may be simultaneously shared by several competitive users

Failure Hide the failure and recovery of a resource

Persistence Hide whether a (software) resource is in memory or on disk

9Scalabilityincreasinglyimportant– “performancetransparency”?

InthisCourse

• Wewilllookattechniques,protocols&

algorithmsusedindistributedsystems

– inmanycases,thesewillbeprovidedforyoubya

middlewaresoftwaresuite

– butknowinghowthingsworkwillstillbeuseful!

• AssumeOS&networkingsupport

– processes,threads,synchronization

– basiccommunicationviamessages

– (willseelaterhowassumptionsaboutmessageswill

influence thesystemswe[can]build)

• Let’sstartwithasimpleclient-serversystems

10

Client-ServerModel

• 1970s:developmentofLocalAreaNetworks(LANs)• 1980s:standarddeploymentinvolvessmallnumberof

servers,plusmanyworkstations– Servers:always-on,powerfulmachines

– Workstations:personalcomputers

• Workstationsrequest‘service’fromserversoverthe

network,e.g.accesstoasharedfile-system:

11

Request-ReplyProtocols

• Basicscheme:

– Clientissuesarequestmessage

– Serverperformsoperation,andsendsreply

• Simplestversionissynchronous:– clientblocksawaitingreply

• Example:HTTP1.0

– Client(browser)sends“GET/index.html”

– Webserverfetchesfileandreturnsit

– BrowserdisplaysHTMLwebpage

• Laterwewilltalkaboutasynchronousmodels:

– Clientscancontinueworkwithoutblockingawaitingreply

12

HandlingErrors&Failures

• Errors areapplication-level things=>easy;-)– E.g.clientrequestsnon-existentwebpage– Needspecialreply(e.g.“404NotFound”)

• Failures aresystem-level things,e.g.:– lostmessage,client/servercrash,networkdown,…

• Tohandlefailure,clientmusttimeout ifitdoesn’treceiveareplywithinacertaintimeT– Ontimeout,clientcanretry request– (Q:whatshouldwesetTto?)

13

RetrySemantics

• Clientcouldtimeoutbecause:

1. Requestwaslost

2. Requestwassent,butservercrashedonreceipt

3. Requestwassent&received,andserverperformedoperation

(orsomeofit?),butcrashedbeforereplying

4. Requestwassent&received,andserverperformedoperation

correctly,andsentreply…whichwasthenlost

5. As#4,butreplyhasjustbeendelayedforlongerthanT

• Forread-onlystatelessrequests(likeHTTPGET),canretry

inallcases,butwhatifrequestwasanorderwithAmazon?

– Incase#1,weprobablywanttore-order…andincase#5we

wanttowaitforalittlebitlonger,andotherwisewe…erm?

• Worse:wedon’tknowwhatcaseitactuallywas!

14

IdealSemantics

• Whatwewantisexactly-once semantics:

– Ourrequestoccursoncenomatterhowmanytimes

weretry(orifthenetworkduplicatesourmessages)

• E.g.addauniqueIDtoeveryrequest– Server remembersIDs,andassociatedresponses

– Ifseesaduplicate,justreturnsoldresponse

– Clientignoresduplicateresponses

• Prettytrickytoensureexactly-once inpractice

– e.g.ifserverexplodes;-)

15

PracticalSemantics

• Inpractice,protocolsguaranteeoneof:

• All-or-nothing (atomic)semantics

– Useschemeonpreviouspage;persistentlog

– (similarideatotransactionprocessing).

• At-most-once semantics

– Requestcarriedoutonce,ornotatall

– Ifnoreply,wedon’tknowwhichoutcomeitwas

– e.g.sendonerequest;giveupontimeout

• At-least-once semantics

– Retryontimeout; riskoperationoccurringagain

– Okiftheoperationisread-only,oridempotent• Note:Assumptionofnonetworkduplication

16

Serverstate

notrequired

Serverstate

requiredto

suppress

retries

RemoteProcedureCall(RPC)

• Request/responseprotocolsareuseful– andwidely

used– butratherclunkytouse

– e.g.needtodefinethesetofrequests,includinghowthey

arerepresentedinnetworkmessages

• AnicerabstractionisRemoteProcedureCall(RPC)– Programmersimplyinvokesaprocedure…

– …butitexecutesonaremotemachine(theserver)

– RPCsubsystemhandlesmessageformats,sending&

receiving,handlingtimeouts,etc

• Aimistomakedistribution(mostly)transparent

– Certainfailurecaseswouldn’thappenlocally

– Distributedandlocalfunctioncallperformancedifferent

17

MarshallingArguments

• RPCisintegratedwiththeprogramminglanguage

– Someadditionalmagictospecifythingsareremote

• RPClayermarshals parameterstothecall,aswell

asanyreturnvalue(s),e.g.

Caller RPCService RPCService Remote

Function

call(…)

1)Marshalargs

2)GenerateID

4)Starttimer 5)Unmarshal args

6)RecordID

7)Marshal

returnvalues

9)Settimer

10)Unmarshal

returnvalues

11)Acknowledge

fun(…)

3)Send

message

18

8)Send

reply

IDLsandStubs

• Tomarshal,theRPClayer(onbothsides!)mustknow:

– howmanyargumentstheprocedurehas,

– howmanyresultsareexpected,and

– thetypesofalloftheabove

• Theprogrammermustspecifythisbydescribingthingsin

anInterfaceDefinitionLanguage(IDL)– Inhigher-level languages,thismayalreadybeincludedas

standard(e.g.C#,Java)

– Inothers(e.g.C),IDLispartofthemiddleware

• TheRPClayercanthenautomaticallygeneratestubs– Smallpiecesofcodeatclientandserver(seeprevious)

– Mayalsoprovideauthentication,encryption

– Providesintegrity,confidentiality

19

Example:SunRPC

• Developedmid80’sforSunUnixsystems

• Simplerequest/responseprotocol:

– Serverregistersoneormore“programs”(services)

– Clientissuesrequeststoinvokespecificprocedureswithin

aspecificservice

• Messagescanbesentoveranytransportprotocol

(mostcommonlyUDP/IPandlaterTCP/IP)

– RequestshaveauniquetransactionIDthatcanbeusedtodetect&handleretransmissions

– At-least-oncesemantics

– Varioustypesofaccesstransparency includingbyte-order

20

XDR:ExternalDataRepresentation

• SunRPC usedXDR fordescribinginterfaces:

21

// file: test.xprogram test {version testver {

int get(getargs) = 1; // procedure numberint put(putargs) = 2; // procedure number

} = 1; // version number} = 0x12345678; // program number

• rpcgen generates[un]marshalingcode,stubs

• Singlearguments…butrecursivelyconvertvalues

• Somesupportforfollowingpointerstoo

• Dataonthewirealwaysinbig-endianformat(oops!)

UsingSunRPC

1. WriteXDR,anduserpcgen togenerateskeletoncode

2. Fillinblanks(i.e.writeclient/serverparts),compilecode

3. Runserverprogram&registerwithportmapper (now:rpcbind)– Mappingsfrom{prog#,ver#,proto}->port

– (onLinux/UNIX,try“/usr/sbin/rpcinfo–p”)– Portmapper isitselfanRPCserviceonawell-knownport

4. Serverprocesswillthenlisten(),awaitingclients5. Whenaclientstarts,clientstubcallsclnt_create()

– Sends{prog#,ver#,proto}toportmapper onserver,receivesappropriateportnumbertouseforactualRPCconnection

– Clientinvokesremoteproceduresasneeded

6. Recently:GSSauthentication/encryption– e.g.,Kerberos

22

Summary+nexttime

• Aboutthiscourse

• Advantagesandchallengesofdistributedsystems

• Typesoftransparency (+scalability)

• Middleware,theclient-servermodel

• Errorsandretrysemantics

• RPC,marshalling,SunRPC,andXDR

• Sun’sNetworkFileSystem(NFS)

• Object-OrientedMiddleware(OOM)

23