Page 1: Models of Distributed Computing

Models of Distributed Computing

Noah Mendelsohn

Tufts University

Email: [email protected]

Web: http://www.cs.tufts.edu/~noah

COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)

Page 2: Models of Distributed Computing

© 2010 Noah Mendelsohn

Architecting a universal Web

Identification: URIs

Interaction: HTTP

Data formats: HTML, JPEG, GIF, etc.

Page 3: Models of Distributed Computing

Goals

Introduce basics of distributed system design

Explore some traditional models of distributed computing

Prepare for discussion of REST: the Web’s model

Page 4: Models of Distributed Computing

Communicating systems

Page 5: Models of Distributed Computing

Communicating systems

[Diagram: two machines, each with its own CPU, memory, and storage]

We have multiple programs, running asynchronously, sending messages

Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)

Page 6: Models of Distributed Computing

Communicating Sequential Processes

We’ve got pretty clean higher-level abstractions for use on a single machine

Page 7: Models of Distributed Computing

Communicating systems

How can we get a clean model of two communicating machines?

Page 8: Models of Distributed Computing

Large-scale systems

[Diagram: the Internet]

What are the clean abstractions on this scale?

How can we get a clean model of a worldwide network of communicating machines?

Page 9: Models of Distributed Computing

WARNING!!

This is a very big topic…

…many important approaches have been studied and used…

…there is lots of operational experience, and also formalisms…

This presentation does not attempt to be either comprehensive or balanced…the goal is to introduce some key concepts

Page 10: Models of Distributed Computing

Traditional Models of Distributed Computing

-Message Passing

Page 11: Models of Distributed Computing

Message passing

[Diagram: two machines, each with its own CPU, memory, and storage]

Programs send messages to and from each other’s memories

Page 12: Models of Distributed Computing

Half duplex: one way at a time

[Diagram: two machines exchanging messages, one direction at a time]

Page 13: Models of Distributed Computing

Full duplex: both ways at the same time

[Diagram: two machines exchanging messages in both directions at once]

Page 14: Models of Distributed Computing

Message passing

Data abstraction:

– Low level: bytes (octets)

– Sometimes: agreed metaformat (XML, C struct, etc.)

Synchronization

– Wait for message

– Timeout
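The two bullets above can be illustrated with a minimal Python sketch. It assumes a hypothetical message layout (a 4-byte id followed by a 4-byte float) that both sides have agreed on, and it uses socket.socketpair() as a stand-in for the two machines. Names such as MESSAGE_FORMAT, send_message, and receive_message are invented for this illustration.

```python
import socket
import struct

# Hypothetical agreed format: 4-byte unsigned id, then a 4-byte float,
# both in network byte order.
MESSAGE_FORMAT = "!If"
MESSAGE_SIZE = struct.calcsize(MESSAGE_FORMAT)


def send_message(sock, message_id, value):
    """Encode the fields as raw bytes (octets) and send them."""
    sock.sendall(struct.pack(MESSAGE_FORMAT, message_id, value))


def receive_message(sock, timeout_seconds):
    """Wait for one message; return None if the timeout expires first."""
    sock.settimeout(timeout_seconds)
    try:
        data = b""
        while len(data) < MESSAGE_SIZE:
            chunk = sock.recv(MESSAGE_SIZE - len(data))
            if not chunk:
                return None  # the peer closed the connection
            data += chunk
    except socket.timeout:
        return None
    return struct.unpack(MESSAGE_FORMAT, data)


# Two connected endpoints in one process stand in for two machines.
left, right = socket.socketpair()
send_message(left, 7, 2.5)
received = receive_message(right, timeout_seconds=1.0)
nothing_yet = receive_message(right, timeout_seconds=0.1)
left.close()
right.close()
```

The first receive returns the decoded fields; the second finds no message waiting and gives up when the timeout expires, which is the only synchronization this model offers.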

Page 15: Models of Distributed Computing

Interaction Patterns

Page 16: Models of Distributed Computing

Between pairs of machines

Message passing: no constraints

Common pattern: request/response

[Diagram: one machine sends a request; the other returns a response]

Page 17: Models of Distributed Computing

Traditional Models of Distributed Computing

-Client Server

Page 18: Models of Distributed Computing

Client / server

Request / response is a traffic pattern

Client / server describes the roles of the nodes

Server provides service for client

[Diagram: the client sends "Request service"; the server returns "Response"]

Page 19: Models of Distributed Computing

Client / server

Probably the most common distributed system architecture

Simple – well understood

Doesn’t explain:

– How to exploit more than 2 machines

– How to make programming easier

– How to prove correctness: though the simple model helps

Most client/server systems are request/response
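A rough Python sketch of a request/response exchange between a client role and a server role follows. The "service" (upper-casing text) and the function names are hypothetical, and socket.socketpair() plus a thread stand in for two machines. A single recv() call is assumed to return the whole short message, which a real protocol could not rely on.

```python
import socket
import threading


def serve_one_request(connection):
    """Server role: wait for a request, perform the service, reply."""
    with connection:
        request = connection.recv(1024).decode("utf-8")
        # Hypothetical service: return the request text in upper case.
        connection.sendall(request.upper().encode("utf-8"))


def call_server(connection, text):
    """Client role: send a request, then block until the response arrives."""
    with connection:
        connection.sendall(text.encode("utf-8"))
        return connection.recv(1024).decode("utf-8")


client_end, server_end = socket.socketpair()
server = threading.Thread(target=serve_one_request, args=(server_end,))
server.start()
reply = call_server(client_end, "book flight 42")
server.join()
```

The traffic pattern (one request, one response) is the same in both functions; only the roles differ, which is the distinction the slides draw.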

Page 20: Models of Distributed Computing

Traditional Models of Distributed Computing

-N-Tier

Page 21: Models of Distributed Computing

N-tier (also called multilevel client/server)

Layered

Each tier provides services for the next higher level

Reasons:

– Information hiding

– Management

– Scalability

[Diagram: three machines in a chain; each sends a request to the next tier and receives a response]

Page 22: Models of Distributed Computing

Typical N-tier system: airline reservation

[Diagram: three tiers, labeled "Browser or Phone App", "Application logic", and "Application logic": iPhone or Android Reservation Application → Flight Reservation Logic → Reservation Records]

Many commercial applications work this way

Page 23: Models of Distributed Computing

The Web itself is a 2 or 3 tier system

[Diagram: Browser (e.g. Firefox) → Proxy Cache, optional (e.g. Squid) → Web Server (e.g. Apache)]

Many commercial applications work this way

Page 24: Models of Distributed Computing

Web Reservation System

[Diagram: Browser or Phone App (Web-Based Reservation Application) → Proxy Cache, optional (e.g. Squid) → Flight Reservation Logic → Reservation Records. The links are labeled HTTP, HTTP, and "RPC? ODBC? Proprietary?"]

Many commercial applications work this way

Page 25: Models of Distributed Computing

Web Publishing System

[Diagram: Browser or Phone App → Content Distribution Network (e.g. Akamai) → Content Web Site (e.g. cnn.com) → Content Management System (database or CMS)]

Many commercial applications work this way

Page 26: Models of Distributed Computing

Advantages of n-tier system

Separation of concerns – each layer has its own role

Parallelism and performance?

– If done right: multiple mid-tier servers work in parallel

– Back end systems centralize mainly data requiring sharing & synchronization

– Mid tier can provide shared, scalable caching

Information hiding

– Mid-tier apps shielded from data layout

Security

– Credit card numbers etc. not stored at mid-tier
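To make the layering concrete, here is a small in-process Python sketch of three tiers. The class names, the flight code "TU100", and the seat count are all hypothetical. The mid-tier cache is deliberately naive (it is never invalidated), so treat it as an illustration of where caching sits and not as a workable design.

```python
class ReservationRecords:
    """Back-end tier: the only place that knows the data layout."""

    def __init__(self):
        self._seats = {"TU100": 3}  # hypothetical flight -> seats left
        self.lookups = 0

    def seats_left(self, flight):
        self.lookups += 1
        return self._seats.get(flight, 0)


class ReservationLogic:
    """Middle tier: business rules plus a cache in front of the back end."""

    def __init__(self, records):
        self._records = records
        self._cache = {}

    def is_available(self, flight):
        if flight not in self._cache:
            self._cache[flight] = self._records.seats_left(flight)
        return self._cache[flight] > 0


def render(logic, flight):
    """Front tier: presentation only; it never touches the records."""
    status = "available" if logic.is_available(flight) else "sold out"
    return f"{flight}: {status}"


records = ReservationRecords()
logic = ReservationLogic(records)
first = render(logic, "TU100")
second = render(logic, "TU100")   # answered from the mid-tier cache
unknown = render(logic, "XX999")
```

The front tier only sees is_available(), so the storage layout can change without affecting it, and the second lookup for the same flight never reaches the back end.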

Page 27: Models of Distributed Computing

Other patterns

Spanning tree

Broadcast (send to many nodes at once)

Flood

Various P2P

Etc.

Page 28: Models of Distributed Computing

Traditional Models of Distributed Computing

-Remote Procedure Call

Page 29: Models of Distributed Computing

Remote Procedure Call

The term RPC was coined by the late Bruce Nelson in his 1981 CMU PhD thesis

Key idea: an ordinary function call executes remotely

The trick: the language runtime or helper code must automatically generate code to send parameters and results

For languages like C: proxies and stubs are generated

– Not needed in dynamic languages like Ruby, JavaScript, etc.

RPC is often (erroneously IMO) used to describe any request / response system

Page 30: Models of Distributed Computing

RPC: Call remote functions automatically

Interface definition: float sqrt(float n);

Proxies and stubs generated automatically

RPC provides transparent remote invocation

[Diagram: a client machine and a server machine]

Client code:

    x = sqrt(4)

Proxy (client side):

    float sqrt(float n) {
        send n;
        read s;
        return s;
    }

Stub (server side):

    void doMsg(Msg m) {
        s = sqrt(m.s);
        send s;
    }

Server implementation:

    float sqrt(float n) {
        …compute sqrt…
        return result;
    }

Request: invoke sqrt(4)

Response: result=2 (no exception thrown)
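The proxy and stub roles can be sketched in Python as follows. This is an illustrative toy, not how any particular RPC system works: the JSON message shape, make_proxy, and stub are assumptions made for this example, and the "transport" is a direct function call where a real system would send bytes over a network.

```python
import json
import math


# --- Server side ---
def sqrt(n):
    """The real implementation, which lives on the server."""
    return math.sqrt(n)


def stub(request_bytes):
    """Server stub: unmarshal the request, call the function, marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    functions = {"sqrt": sqrt}
    try:
        result = functions[request["function"]](*request["args"])
        response = {"result": result, "error": None}
    except Exception as error:  # report the failure to the caller
        response = {"result": None, "error": str(error)}
    return json.dumps(response).encode("utf-8")


# --- Client side ---
def make_proxy(function_name, transport):
    """Build a client proxy that looks like an ordinary local function."""
    def proxy(*args):
        request = {"function": function_name, "args": list(args)}
        response_bytes = transport(json.dumps(request).encode("utf-8"))
        response = json.loads(response_bytes.decode("utf-8"))
        if response["error"] is not None:
            raise RuntimeError(response["error"])
        return response["result"]
    return proxy


# Here the transport hands the bytes straight to the stub.
remote_sqrt = make_proxy("sqrt", transport=stub)
x = remote_sqrt(4)
```

Even this toy has to decide how a server-side exception reaches the caller; here it arrives as a generic RuntimeError, which loses the original exception type.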

Page 31: Models of Distributed Computing

RPC: Pros and Cons

Pros:

– Transparency is very appealing

– Simple programming model

– Useful as organizing principle even when not fully automated

Cons

– Getting language details right is tricky (e.g. exceptions)

– No client/server overlap: doesn’t work well for long-running operations

– May not optimize large transfers well

– Not all APIs make sense to remote: e.g. answer = search(tree)

– Versioning can be a problem: client and server need to agree exactly on interface (or have rules for dealing with differences)

Page 32: Models of Distributed Computing

Traditional Models of Distributed Computing

-Distributed Object Systems

Page 33: Models of Distributed Computing

How do you build an RPC for this?

    class Point {
        int x, y;
        int getx() { return x; }
        int gety() { return y; }
    }

    class Rectangle {
        …members and constructors not shown…
        Point getUpperLeft() {…}
        Point getLowerRight() {…}
    }

    myRect = new Rectangle;
    …assume position set here…
    int a = area(myRect);  // REMOTE THIS CALL!

    int area(Rectangle r) {
        width = r.getLowerRight().getx() - r.getUpperLeft().getx();
        height = r.getLowerRight().gety() - r.getUpperLeft().gety();
        return width * height;
    }

Pass object to remote method

Call method on remoted object

Distributed Object systems make this work!
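One way to picture how such a system might make this work is the in-process Python sketch below. It is a simplified illustration under stated assumptions, not a description of CORBA or DCOM: ObjectHost, RemoteRef, and the tagged ("value" / "ref") replies are hypothetical, and a method call on the host stands in for a network round trip.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def getx(self):
        return self.x

    def gety(self):
        return self.y


class Rectangle:
    def __init__(self, upper_left, lower_right):
        self.upper_left, self.lower_right = upper_left, lower_right

    def getUpperLeft(self):
        return self.upper_left

    def getLowerRight(self):
        return self.lower_right


class ObjectHost:
    """The machine that owns the objects; it hands out ids, not copies."""

    def __init__(self):
        self.objects = {}
        self.round_trips = 0

    def export(self, obj):
        object_id = len(self.objects) + 1
        self.objects[object_id] = obj
        return object_id

    def invoke(self, object_id, method_name):
        """Stands in for one network round trip."""
        self.round_trips += 1
        result = getattr(self.objects[object_id], method_name)()
        if isinstance(result, int):
            return ("value", result)         # plain data travels by value
        return ("ref", self.export(result))  # objects travel by reference


class RemoteRef:
    """Proxy held by the other machine; forwards each call to the owner."""

    def __init__(self, host, object_id):
        self.host, self.object_id = host, object_id

    def call(self, method_name):
        kind, payload = self.host.invoke(self.object_id, method_name)
        return payload if kind == "value" else RemoteRef(self.host, payload)


def area(r):
    """Runs on the remote machine; r is a reference, not a copy."""
    width = r.call("getLowerRight").call("getx") - r.call("getUpperLeft").call("getx")
    height = r.call("getLowerRight").call("gety") - r.call("getUpperLeft").call("gety")
    return width * height


host = ObjectHost()
my_rect = Rectangle(Point(1, 2), Point(4, 6))
a = area(RemoteRef(host, host.export(my_rect)))
```

Computing one area costs eight round trips in this sketch, and the host's table keeps every exported Point alive because nothing ever releases it. Both effects hint at the performance and lifetime-management issues such systems have to address.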

Page 34: Models of Distributed Computing

Distributed object systems

In the 1990s, seemed like a great idea

Advantages of OO encapsulation & inheritance + RPC

Examples:

– CORBA (industry standard)

– DCOM (Microsoft)

Still quite widely used within enterprises

Complicated:

– Marshalling object references

– Distributed object lifetime management

– Brokering: which object provides the service today

– Remote “new”: creating objects on remote systems

– All the pros & cons of RPC, plus the above

Generally not appropriate at Internet scale

Page 35: Models of Distributed Computing

Traditional Models of Distributed Computing

-Some Other Options

Page 36: Models of Distributed Computing

Special Purpose Models

Remote File System

– Network provides transparent access to remote files

– Examples: NFS, CIFS

Remote Database

– Examples: ODBC, JDBC

Remote Device

– Remote printing, disk drive etc.

Virtual terminal

– One computer simulates an interactive terminal to another

Page 37: Models of Distributed Computing

Some other interesting models

Broadcast / multicast

– Send messages to everyone (broadcast) / named group (multicast)

Publish / subscribe (pub/sub)

– Subscribe to named events or based on query filter

– Call me whenever Pepsi’s stock price changes

– Implements a distributed associative memory

Reliable queuing

– Examples: IBM MQSeries, Java Message Service (JMS)

– Model: queued messages, preserved across hardware crashes

– Widely used for bank machine transactions and long-running (multi-day) eCommerce transactions

– Depends on disk-based transaction systems at each node to keep queues

Tuple spaces

– Pioneered by Gelernter at Yale (Linda kernel), picked up by Jini (Sun) and TSpaces (IBM)

– Network-scale shared variable space, with synchronization

– Good for queues of work to do: some cloud architectures use a related model to distribute work to servers
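As an illustration of the publish / subscribe model only, here is a minimal in-process Python sketch. The Broker class, the event names "PEP.price" and "KO.price", and the prices are hypothetical; a real pub/sub system would deliver events across machines and might support query filters as well as names.

```python
from collections import defaultdict


class Broker:
    """Routes published events to whoever subscribed to the event name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, callback):
        self._subscribers[event_name].append(callback)

    def publish(self, event_name, payload):
        # Copy the list so a callback may subscribe during delivery.
        for callback in list(self._subscribers.get(event_name, [])):
            callback(payload)


broker = Broker()
alerts = []

# "Call me whenever Pepsi's stock price changes"
broker.subscribe("PEP.price", lambda price: alerts.append(f"PEP is now {price}"))

broker.publish("PEP.price", 101.5)  # delivered to the subscriber
broker.publish("KO.price", 60.0)    # nobody subscribed, so it is dropped
```

The publisher never learns who (if anyone) received the event, which is what decouples senders from receivers in this model.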

Page 38: Models of Distributed Computing

Stateful and Stateless Protocols

Page 39: Models of Distributed Computing

Stateful and Stateless Protocols

Stateful: server knows which step (state) has been reached

Stateless:

– Client remembers the state, sends to server each time

– Server processes each request independently

Can vary with level

– Many systems, like the Web, run stateless protocols (e.g. HTTP) over streams; at the packet level, TCP streams are stateful

– HTTP itself is mostly stateless, but many HTTP requests (typically POSTs) update persistent state at the server
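The difference can be sketched in Python with a paging example. Both designs below are hypothetical illustrations: StatefulPager keeps each client's position on the server, while stateless_page expects the client to send its position with every request.

```python
class StatefulPager:
    """Stateful: the server remembers which step each client has reached."""

    def __init__(self, items, page_size):
        self.items, self.page_size = items, page_size
        self.position = {}  # client id -> index of the next item

    def next_page(self, client_id):
        start = self.position.get(client_id, 0)
        self.position[client_id] = start + self.page_size
        return self.items[start:start + self.page_size]


def stateless_page(items, page_size, start):
    """Stateless: the request carries the position; the server keeps nothing."""
    page = items[start:start + page_size]
    return page, start + page_size  # the client stores the new position


items = ["a", "b", "c", "d", "e"]

pager = StatefulPager(items, page_size=2)
stateful_first = pager.next_page("client-1")
stateful_second = pager.next_page("client-1")

stateless_first, position = stateless_page(items, 2, start=0)
stateless_second, position = stateless_page(items, 2, start=position)
```

Any server holding the same items could answer the stateless requests, whereas the stateful client must keep reaching the server that holds its position entry.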

Page 40: Models of Distributed Computing

Advantages of stateless protocols

Protocol usually simpler

Server processes each request independently

Load balancing and restart easier

Typically easier to scale and make fault-tolerant

Visibility: individual requests more self-describing

Page 41: Models of Distributed Computing

Advantages of stateful protocols

Individual messages carry less data

Server does not have to re-establish context each time

There’s usually some changing state at the server at some level, except for completely static publishing systems

Page 42: Models of Distributed Computing

Text vs. Binary Protocols

Page 43: Models of Distributed Computing

Protocols can be text or binary on the wire

Text: messages are encoded characters

Binary: any bit patterns

Pros and cons quite similar to those for text vs. binary file formats

When sending between compatible machines, binary can be much faster because no conversion needed

Most Internet-scale application protocols (HTTP, SMTP) use text for protocol elements and for all content except photo/audio/video

HTTP/2 moves to a binary format (for message size and parsing speed)
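A short Python sketch of the same data sent both ways follows. The message (a sensor id and a reading) and both wire layouts are invented for this illustration; they are not taken from any real protocol.

```python
import struct

sensor_id, reading = 1024, 98.5

# Text: the fields become characters that a person can read on the wire.
text_message = f"{sensor_id} {reading}\n".encode("ascii")

# Binary: the fields are sent as bit patterns (4-byte unsigned int,
# 8-byte float, network byte order).
binary_message = struct.pack("!Id", sensor_id, reading)

# Decoding text means parsing characters back into numbers.
id_text, reading_text = text_message.decode("ascii").split()
decoded_text = (int(id_text), float(reading_text))

# Decoding binary means reinterpreting fixed-size fields.
decoded_binary = struct.unpack("!Id", binary_message)
```

Both decoders recover the same values. The text form can be inspected with any text tool, while the binary form has a fixed size and needs no character parsing, which is the tradeoff the slide describes.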

Page 44: Models of Distributed Computing

Summary

Page 45: Models of Distributed Computing

Summary

The machine-level model is complex: multiple CPUs, memories

A number of abstractions are widely used for limited-scale distribution

RPC is among the most interesting and successful

Statefulness / statelessness is a key design tradeoff

We’ll see next time why a new model was needed for the Web