Models of Distributed Computing

Noah Mendelsohn

Tufts University

Email: noah@cs.tufts.edu

Web: http://www.cs.tufts.edu/~noah

COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)

Architecting a universal Web

Identification: URIs

Interaction: HTTP

Data formats: HTML, JPEG, GIF, etc.

Introduce basics of distributed system design

Explore some traditional models of distributed computing

Prepare for discussion of REST: the Web’s model

Communicating systems

[Diagram: CPU, Memory, Storage]

We have multiple programs, running asynchronously, sending messages

Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)

Communicating Sequential Processes

[Diagram: CPU, Memory, Storage]

We’ve got pretty clean higher-level abstractions for use on a single machine

[Diagram: CPU, Memory, Storage]

How can we get a clean model of two communicating machines?

Large scale systems

Internet

What are the clean abstractions on this scale?

How can we get a clean model of a worldwide network of communicating machines?

WARNING!!

This is a very big topic…

…many important approaches have been studied and used…

…there is lots of operational experience, and also formalisms…

This presentation does not attempt to be either comprehensive or balanced…the goal is to introduce some key concepts

Traditional Models of Distributed Computing

-Message Passing

Message passing

[Diagram: CPU, Memory, Storage]

Programs send messages to and from each other’s memories

Half duplex: one way at a time

[Diagram: CPU, Memory, Storage]

Full duplex: both ways at the same time

[Diagram: CPU, Memory, Storage]

Message passing

Data abstraction:

– Low level: bytes (octets)

– Sometimes: agreed metaformat (XML, C struct, etc.)

Synchronization

– Wait for message

– Timeout
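The two synchronization primitives above (wait for a message, give up after a timeout) can be sketched in a few lines. This is only an illustration: the `mailbox` queue is a hypothetical stand-in for a network link, and the names `receive` and `sender` are not from the slides.

```python
import queue
import threading


def receive(mailbox: queue.Queue, timeout_s: float):
    """Wait for one message; return None if the timeout expires first."""
    try:
        return mailbox.get(timeout=timeout_s)
    except queue.Empty:
        return None


def sender(mailbox: queue.Queue, payload: bytes) -> None:
    """Send a message as raw bytes (octets), the lowest-level data abstraction."""
    mailbox.put(payload)


mailbox = queue.Queue()  # assumption: an in-process queue standing in for the network

# Nothing has been sent yet, so this receive times out.
first = receive(mailbox, timeout_s=0.05)

# A second program (here, a thread) sends asynchronously; the receiver waits for it.
t = threading.Thread(target=sender, args=(mailbox, b"hello"))
t.start()
second = receive(mailbox, timeout_s=1.0)
t.join()

print(first, second)
```

The timeout matters because, unlike a local function call, a receiver cannot tell a slow sender from a dead one.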

Interaction Patterns

Between pairs of machines

Message passing: no constraints

Common pattern: request/response

[Diagram: CPU, Memory, Storage]

Request

Response

-Client Server

Client / server

Request / response is a traffic pattern

Client / server describes the roles of the nodes

Server provides service for client

[Diagram: CPU, Memory, Storage]

Request service

Response

Client / server

Probably the most common distributed system architecture

Simple – well understood

Doesn’t explain:

– How to exploit more than 2 machines

– How to make programming easier

– How to prove correctness: though the simple model helps

Most client/server systems are request/response
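A minimal sketch of the distinction drawn above: request/response is the traffic pattern, while client and server are the roles. It assumes a local `socket.socketpair()` in place of a real network connection, and the `echo:` reply format is invented for illustration.

```python
import socket
import threading


def serve_one(conn: socket.socket) -> None:
    """Server role: wait for one request, provide the service, send one response."""
    request = conn.recv(1024)
    conn.sendall(b"echo:" + request)
    conn.close()


def call(conn: socket.socket, request: bytes) -> bytes:
    """Client role: send a request, then block until the response arrives."""
    conn.sendall(request)
    return conn.recv(1024)


# Assumption: a connected socket pair on one machine stands in for the network.
client_end, server_end = socket.socketpair()
server = threading.Thread(target=serve_one, args=(server_end,))
server.start()

reply = call(client_end, b"ping")
server.join()
client_end.close()

print(reply)
```

A real protocol would also need message framing, since a stream socket does not preserve message boundaries; the short messages here sidestep that.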

-N-Tier

N-tier – also called Multilevel Client/Server

Layered

Each tier provides services for next higher level

Reasons:

– Information hiding

– Management

– Scalability

[Diagram: tiers (each with CPU, Memory, Storage) linked by Request / Response pairs]

Typical N-tier system: airline reservation

[Diagram: three tiers]

– Browser or Phone App: iPhone or Android Reservation Application

– Application logic: Flight Reservation Logic

– Application logic: Reservation Records

Many commercial applications work this way

The Web itself is a 2 or 3 Tier system

[Diagram: Browser (e.g. Firefox), Proxy Cache (optional; e.g. Squid), Web Server (e.g. Apache)]

Web Reservation System

[Diagram: tiers and the protocols between them]

– Browser or Phone App: Web-Based Reservation Application

– Proxy Cache (optional; e.g. Squid)

– Application logic: Flight Reservation Logic

– Application logic: Reservation Records

– Links: HTTP, HTTP, then RPC? ODBC? Proprietary?

Web Publishing System

[Diagram: tiers]

– Browser or Phone App: Web-Based Reservation Application

– Content Distribution Network (e.g. Akamai)

– Content Web Site (e.g. cnn.com)

– Database or CMS: Content Management System

Advantages of n-tier system

Separation of concerns – each layer has own role

Parallelism and performance?

– If done right: multiple mid-tier servers work in parallel

– Back-end systems mainly centralize data that requires sharing & synchronization

– Mid tier can provide shared, scalable caching

Information hiding

– Mid-tier apps shielded from data layout

Security

– Credit card numbers etc. not stored at mid-tier
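The separation of concerns, information hiding, and mid-tier caching listed above can be sketched with three small pieces of code, one per tier. Everything here is hypothetical: the class names, the flight code `TU150`, and the seat count are invented, and direct method calls stand in for network requests.

```python
class BackEnd:
    """Back-end tier: owns the shared records (hypothetical data)."""

    def __init__(self):
        self._records = {"TU150": 3}  # flight -> seats left
        self.lookups = 0              # counts requests that reach this tier

    def seats_left(self, flight: str) -> int:
        self.lookups += 1
        return self._records[flight]


class MidTier:
    """Mid tier: application logic plus a cache; hides the data layout."""

    def __init__(self, backend: BackEnd):
        self._backend = backend
        self._cache = {}

    def is_available(self, flight: str) -> bool:
        if flight not in self._cache:
            self._cache[flight] = self._backend.seats_left(flight)
        return self._cache[flight] > 0


def client(mid: MidTier, flight: str) -> str:
    """Front tier: talks only to the mid tier, never to the records."""
    return "available" if mid.is_available(flight) else "sold out"


backend = BackEnd()
mid = MidTier(backend)
answers = [client(mid, "TU150") for _ in range(3)]

print(answers, backend.lookups)
```

Three client requests reach the back end only once. The sketch ignores cache invalidation, which is exactly the sharing and synchronization problem that the back-end tier exists to centralize.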

Other patterns

Spanning tree

Broadcast (send to many nodes at once)

Various P2P

-Remote Procedure Call

Remote Procedure Call

The term RPC was coined by the late Bruce Nelson in his 1981 CMU PhD thesis

Key idea: an ordinary function call executes remotely

The trick: the language runtime or helper code must automatically generate code to send parameters and results

For languages like C: proxies and stubs are generated

– Not needed in dynamic languages like Ruby, JavaScript, etc.

RPC is often (erroneously IMO) used to describe any request / response system

RPC: Call remote functions automatically

Interface definition: float sqrt(float n);

Proxies and stubs generated automatically

RPC provides transparent remote invocation

[Diagram: CPU, Memory, Storage]

x = sqrt(4)

float sqrt(float n) { …compute sqrt… return result; }

float sqrt(float n) { send n; read s; return s; }

void doMsg(Msg m) { s = sqrt(m.s); send s; }

Request: invoke sqrt(4)

Response: result=2 (no exception thrown)
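The proxy and stub flow above can be sketched in a dynamic language, where no generated code is needed. This is an illustration under stated assumptions: JSON is an arbitrary choice of wire format, the message fields `fn`, `args`, and `result` are invented, and a direct function call (`transport`) stands in for the network.

```python
import json
import math


def sqrt(n: float) -> float:
    """The real implementation, living on the server."""
    return math.sqrt(n)


SERVER_FUNCTIONS = {"sqrt": sqrt}


def server_stub(request_bytes: bytes) -> bytes:
    """Server-side stub: unmarshal the request, call the function, marshal the result."""
    msg = json.loads(request_bytes.decode("utf-8"))
    result = SERVER_FUNCTIONS[msg["fn"]](*msg["args"])
    return json.dumps({"result": result}).encode("utf-8")


class Proxy:
    """Client-side proxy: any method call on it becomes a request message."""

    def __init__(self, transport):
        self._transport = transport  # assumption: a callable standing in for the network

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"fn": name, "args": list(args)}).encode("utf-8")
            response = self._transport(request)
            return json.loads(response.decode("utf-8"))["result"]
        return remote_call


remote = Proxy(transport=server_stub)
x = remote.sqrt(4)  # reads like a local call, executes "remotely"

print(x)
```

The caller writes `remote.sqrt(4)` and never sees a message, which is the transparency RPC promises.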

RPC: Pros and Cons

Pros:

– Transparency is very appealing

– Simple programming model

– Useful as organizing principle even when not fully automated

Cons:

– Getting language details right is tricky (e.g. exceptions)

– No client/server overlap: doesn’t work well for long-running operations

– May not optimize large transfers well

– Not all APIs make sense to remote: e.g. answer = search(tree)

– Versioning can be a problem: client and server need to agree exactly on interface (or have rules for dealing with differences)
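One of the tricky language details named above is exceptions: an error raised on the server has to be turned into data, sent back, and raised again on the client. The following is a hedged sketch of one way to do that, not a description of any particular RPC system; `RemoteError` and the `ok` / `error` fields are invented for illustration.

```python
import json
import math


class RemoteError(Exception):
    """Hypothetical client-side stand-in for an exception raised on the server."""


def server_stub(request_bytes: bytes) -> bytes:
    """Run the call; report failure as data, since exceptions cannot cross the wire."""
    msg = json.loads(request_bytes)
    try:
        result = math.sqrt(*msg["args"])
        return json.dumps({"ok": True, "result": result}).encode("utf-8")
    except ValueError as exc:
        return json.dumps({"ok": False, "error": str(exc)}).encode("utf-8")


def remote_sqrt(n: float) -> float:
    """Client proxy: re-raise a server-side failure as a local exception."""
    request = json.dumps({"args": [n]}).encode("utf-8")
    reply = json.loads(server_stub(request))
    if not reply["ok"]:
        raise RemoteError(reply["error"])
    return reply["result"]


good = remote_sqrt(9.0)

try:
    remote_sqrt(-1.0)
    outcome = "no error"
except RemoteError:
    outcome = "RemoteError"

print(good, outcome)
```

Note what is lost: the client sees a `RemoteError`, not the original `ValueError`, and the server-side stack trace is gone. Preserving exception types across languages and versions is part of what makes full transparency hard.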

-Distributed Object Systems

How do you build an RPC for this?

class Point {
    int x, y;
    int getx() { return x; }
    int gety() { return y; }
}

class Rectangle {
    …members and constructors not shown…
    Point getUpperLeft() {…}
    Point getLowerRight() {…}
}

myRect = new Rectangle();
…assume position set here…
int a = area(myRect); // REMOTE THIS CALL!

int area(Rectangle r) {
    int width = r.getLowerRight().getx() - r.getUpperLeft().getx();
    int height = r.getLowerRight().gety() - r.getUpperLeft().gety();
    return width * height;
}

Pass object to remote method

Call method on remoted object

Distributed Object systems make this work!

Distributed object systems

In the 1990s, seemed like a great idea

Advantages of OO encapsulation & inheritance + RPC

Examples:

– CORBA (Industry standard)

– DCOM (Microsoft)

Still quite widely used within enterprises

Complicated:

– Marshalling object references

– Distributed object lifetime management

– Brokering: which object provides the service today

– Remote “new”: creating objects on remote systems

– All the pros & cons of RPC, plus the above

Generally not appropriate at Internet scale
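To make the cost of marshalling object references concrete, here is a toy sketch of the Rectangle example in which the object stays with its owner and the remote `area` function holds only a reference. It is an assumption-laden illustration, not how CORBA or DCOM work internally: `ObjectTable`, `RemoteRef`, and the `("value", …)` / `("ref", …)` reply shapes are invented, and a method call on the table stands in for a network round trip.

```python
import itertools


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def getx(self):
        return self.x

    def gety(self):
        return self.y


class Rectangle:
    def __init__(self, upper_left, lower_right):
        self._ul, self._lr = upper_left, lower_right

    def getUpperLeft(self):
        return self._ul

    def getLowerRight(self):
        return self._lr


class ObjectTable:
    """Owner-side table mapping object ids to live objects (hypothetical)."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)
        self.calls = 0  # each invoke() stands in for one network round trip

    def export(self, obj) -> int:
        oid = next(self._ids)
        self._objects[oid] = obj
        return oid

    def invoke(self, oid: int, method: str):
        self.calls += 1
        result = getattr(self._objects[oid], method)()
        if isinstance(result, (int, float)):
            return ("value", result)          # plain data is copied
        return ("ref", self.export(result))   # objects are passed by reference


class RemoteRef:
    """Caller-side handle: method calls are forwarded to the owning table."""

    def __init__(self, table: ObjectTable, oid: int):
        self._table, self._oid = table, oid

    def __getattr__(self, method):
        def call():
            kind, payload = self._table.invoke(self._oid, method)
            return payload if kind == "value" else RemoteRef(self._table, payload)
        return call


def area(r) -> int:
    """The remoted function: it sees only references, never the objects."""
    width = r.getLowerRight().getx() - r.getUpperLeft().getx()
    height = r.getLowerRight().gety() - r.getUpperLeft().gety()
    return width * height


table = ObjectTable()
rect_ref = RemoteRef(table, table.export(Rectangle(Point(0, 0), Point(4, 3))))
a = area(rect_ref)

print(a, table.calls)
```

One call to `area` costs eight round trips back to the object's owner, and every exported reference stays in the table forever because nothing ever releases it. Those are small-scale versions of the marshalling and lifetime-management problems listed above.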

-Some Other Options

Special Purpose Models

Remote File System

– Network provides transparent access to remote files

– Examples: NFS, CIFS

Remote Database

– Examples: ODBC, JDBC

Remote Device

– Remote printing, disk drive etc.

Virtual terminal

– One computer simulates an interactive terminal to another

Some other interesting models

Broadcast / multicast

– Send messages to everyone (broadcast) / named group (multicast)

Publish / subscribe (pub/sub)

– Subscribe to named events or based on query filter

– Call me whenever Pepsi’s stock price changes

– Implements a distributed associative memory

Reliable queuing

– Examples: IBM MQSeries, Java Message Service (JMS)

– Model: queued messages, preserved across hardware crashes

– Widely used for bank machine transactions and long-running (multi-day) eCommerce transactions

– Depends on disk-based transaction systems at each node to keep queues

Tuple spaces

– Pioneered by Gelernter at Yale (Linda kernel), picked up by Jini (Sun), and TSpaces (IBM)

– Network-scale shared variable space, with synchronization

– Good for queues of work to do: some cloud architectures use a related model to distribute work to servers
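The tuple space idea (a shared variable space with synchronization, used as a queue of work) can be sketched in-process. This is a toy under stated assumptions: it runs on one machine with threads instead of across a network, `take` plays the role of Linda's blocking `in` operation (renamed because `in` is a Python keyword), and `None` in a pattern is an invented wildcard convention.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space; a real one would be shared across a network."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup: tuple) -> None:
        """Add a tuple to the space and wake any waiting takers."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _find(self, pattern: tuple):
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == v for p, v in zip(pattern, tup)
            ):
                return tup
        return None

    def take(self, pattern: tuple, timeout=None):
        """Remove and return a matching tuple, blocking until one exists or timeout."""
        with self._cond:
            if not self._cond.wait_for(lambda: self._find(pattern) is not None, timeout):
                return None
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup


space = TupleSpace()


def worker() -> None:
    """Pull jobs until none arrive for a while, posting a result for each."""
    while True:
        job = space.take(("job", None), timeout=0.2)
        if job is None:
            return
        space.out(("done", job[1], job[1] ** 2))


workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
for n in range(4):
    space.out(("job", n))

squares = sorted(space.take(("done", None, None), timeout=2.0)[1:] for _ in range(4))
for w in workers:
    w.join()

print(squares)
```

The producer never names a worker and the workers never name the producer; they coordinate only through the shape of the tuples, which is what makes the model attractive for distributing work to a changing pool of servers.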

Stateful and Stateless Protocols

Stateful: server knows which step (state) has been reached

Stateless:

– Client remembers the state, sends to server each time

– Server processes each request independently

Can vary with level

– Many systems, like the Web, run stateless protocols (e.g. HTTP) over streams; at the packet level, TCP streams are stateful

– HTTP itself is mostly stateless, but many HTTP requests (typically POSTs) update persistent state at the server
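The stateful / stateless distinction above can be sketched with a paging service written both ways. The item list, the `alice` client id, and the function names are all hypothetical.

```python
ITEMS = ["a", "b", "c", "d", "e"]


class StatefulServer:
    """Stateful: the server remembers which step each client has reached."""

    def __init__(self):
        self._position = {}  # client id -> next offset

    def next_page(self, client_id: str, size: int = 2) -> list:
        start = self._position.get(client_id, 0)
        self._position[client_id] = start + size
        return ITEMS[start:start + size]


def stateless_page(offset: int, size: int = 2) -> list:
    """Stateless: the client remembers the state and sends it with every request."""
    return ITEMS[offset:offset + size]


stateful = StatefulServer()
s1 = stateful.next_page("alice")
s2 = stateful.next_page("alice")

# After a restart (or a switch to another replica) the position is lost.
restarted = StatefulServer()
s3 = restarted.next_page("alice")

# The stateless request carries its own context, so any replica can answer it.
t2 = stateless_page(offset=2)

print(s1, s2, s3, t2)
```

The stateful request is smaller (no offset on the wire), but the restarted server sends Alice back to the first page, while the stateless request gives the same answer wherever it lands.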

Advantages of stateless protocols

Protocol usually simpler

Server processes each request independently

Load balancing and restart easier

Typically easier to scale and make fault-tolerant

Visibility: individual requests more self-describing

Advantages of stateful protocols

Individual messages carry less data

Server does not have to re-establish context each time

There’s usually some changing state at the server at some level, except for completely static publishing systems

Text vs. Binary Protocols

Protocols can be text or binary on the wire

Text: messages are encoded characters

Binary: any bit patterns

Pros and cons quite similar to those for text vs. binary file formats

When sending between compatible machines, binary can be much faster because no conversion needed

Most Internet-scale application protocols (HTTP, SMTP) use text for protocol elements and for all content except photo/audio/video

HTTP/2 moves to a binary format (for message size and parsing speed)
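A small sketch of the size and parsing tradeoff between the two encodings, using the same two values on the wire both ways. The message layout (an unsigned 32-bit count followed by a 64-bit float) is invented for illustration.

```python
import struct

count = 1_000_000
value = 3.141592653589793

# Text: encoded characters, readable in a debugger, parsed by splitting and converting.
text_msg = f"{count} {value!r}\n".encode("ascii")
t_count_s, t_value_s = text_msg.decode("ascii").split()
t_count, t_value = int(t_count_s), float(t_value_s)

# Binary: fixed-width bit patterns. "!" pins network byte order so that
# machines with different native layouts still agree on the bytes.
binary_msg = struct.pack("!Id", count, value)
b_count, b_value = struct.unpack("!Id", binary_msg)

print(len(text_msg), len(binary_msg))
```

Both forms round-trip the values exactly here, but the binary message is a fixed 12 bytes and needs no digit-by-digit conversion, while the text message can be read and typed by a human.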

Summary

The machine-level model is complex: multiple CPUs, memories

A number of abstractions are widely used for limited-scale distribution

RPC is among the most interesting and successful

Statefulness / statelessness is a key design tradeoff

We’ll see next time why a new model was needed for the Web
