models of distributed computing

Post on 12-Jan-2016

DESCRIPTION

COMP 150-IDS: Internet Scale Distributed Systems (Fall 2012). Models of Distributed Computing. Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah. Architecting a universal Web. Identification: URIs Interaction: HTTP - PowerPoint PPT Presentation

TRANSCRIPT

Models of Distributed Computing

Noah Mendelsohn
Tufts University
Email: noah@cs.tufts.edu
Web: http://www.cs.tufts.edu/~noah

COMP 150-IDS: Internet Scale Distributed Systems (Spring 2015)

© 2010 Noah Mendelsohn

Architecting a universal Web

Identification: URIs

Interaction: HTTP

Data formats: HTML, JPEG, GIF, etc.

Goals

Introduce basics of distributed system design

Explore some traditional models of distributed computing

Prepare for discussion of REST: the Web’s model

Communicating systems

[Diagram: two machines, each with its own CPU, memory, and storage]

We have multiple programs, running asynchronously, sending messages

Reference: http://www.usingcsp.com/cspbook.pdf (very theoretical)

Communicating Sequential Processes

We’ve got pretty clean higher-level abstractions for use on a single machine

Communicating systems

How can we get a clean model of two communicating machines?

Large scale systems

Internet

What are the clean abstractions on this scale?

How can we get a clean model of a worldwide network of communicating machines?

WARNING!!

This is a very big topic…

…many important approaches have been studied and used…

…there is lots of operational experience, and also formalisms…

This presentation does not attempt to be either comprehensive or balanced…the goal is to introduce some key concepts

Traditional Models of Distributed Computing

-Message Passing

Message passing

[Diagram: two machines, each with its own CPU, memory, and storage]

Programs send messages to and from each other’s memories

Half duplex: one way at a time

Full duplex: both ways at the same time

Message passing

Data abstraction:

– Low level: bytes (octets)

– Sometimes: agreed metaformat (XML, C struct, etc.)

Synchronization

– Wait for message

– Timeout
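The points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message layout (a 4-byte id plus a 4-byte float) and the helper names send_message and wait_for_message are hypothetical, and a local socket pair stands in for two machines.

```python
import socket
import struct

# Hypothetical agreed format: 4-byte unsigned id, then a 4-byte float,
# both in network (big-endian) byte order.
MESSAGE_FORMAT = "!If"
MESSAGE_SIZE = struct.calcsize(MESSAGE_FORMAT)


def send_message(sock, msg_id, value):
    """Encode the fields into bytes (octets) and send them."""
    sock.sendall(struct.pack(MESSAGE_FORMAT, msg_id, value))


def wait_for_message(sock, timeout_seconds):
    """Block until a full message arrives; return None on timeout."""
    sock.settimeout(timeout_seconds)
    data = b""
    try:
        while len(data) < MESSAGE_SIZE:
            chunk = sock.recv(MESSAGE_SIZE - len(data))
            if not chunk:
                return None  # peer closed the connection
            data += chunk
    except socket.timeout:
        return None
    return struct.unpack(MESSAGE_FORMAT, data)


# A connected pair of sockets stands in for the two machines.
left, right = socket.socketpair()
send_message(left, 7, 2.5)
received = wait_for_message(right, timeout_seconds=1.0)
nothing_yet = wait_for_message(right, timeout_seconds=0.1)  # no message sent
left.close()
right.close()
```

At the lowest level only bytes cross the connection; the struct format is the agreed metaformat that gives them meaning, and the timeout keeps the receiver from waiting forever.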

Interaction Patterns

Between pairs of machines

Message passing: no constraints

Common pattern: request/response

[Diagram: one machine sends a Request to the other, which returns a Response]

Traditional Models of Distributed Computing

-Client Server

Client / server

Request / response is a traffic pattern

Client / server describes the roles of the nodes

Server provides service for client

[Diagram: the client sends "Request service" to the server, which returns "Response"]

Client / server

Probably the most common distributed system architecture

Simple – well understood

Doesn’t explain:

– How to exploit more than 2 machines

– How to make programming easier

– How to prove correctness: though the simple model helps

Most client/server systems are request/response
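A minimal sketch of the two roles, assuming a made-up "uppercase" service: request/response is the traffic pattern, while client and server name who asks and who provides. The function names server_role and client_role are hypothetical, and a thread plus a local socket pair stand in for a separate server machine.

```python
import socket
import threading


def server_role(connection):
    """Server: wait for one request, perform the service, send the response."""
    request = connection.recv(1024).decode("utf-8")
    connection.sendall(request.upper().encode("utf-8"))


def client_role(connection, request):
    """Client: send one request, then block until the response arrives."""
    connection.sendall(request.encode("utf-8"))
    return connection.recv(1024).decode("utf-8")


client_end, server_end = socket.socketpair()
server = threading.Thread(target=server_role, args=(server_end,))
server.start()

response = client_role(client_end, "hello")

server.join()
client_end.close()
server_end.close()
```

Nothing here says how to use a third machine or how to hide the message handling from the programmer, which are the gaps the slide lists.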

Traditional Models of Distributed Computing

-N-Tier

N-tier – also called Multilevel Client/Server

Layered

Each tier provides services for next higher level

Reasons:

– Information hiding

– Management

– Scalability

[Diagram: three machines in a chain; each sends a Request to the next tier and receives a Response]

Typical N-tier system: airline reservation

[Diagram: three tiers. Browser or phone app (iPhone or Android reservation application); application logic (flight reservation logic); reservation records]

Many commercial applications work this way

The Web itself is a 2 or 3 Tier system

Browser (e.g. Firefox)

Proxy cache, optional (e.g. Squid)

Web server (e.g. Apache)

Web Reservation System

[Diagram: browser or phone app (Web-based reservation application); optional proxy cache (e.g. Squid); application logic (flight reservation logic); reservation records. The links are labeled HTTP, HTTP, and "RPC? ODBC? Proprietary?"]

Web Publishing System

[Diagram: browser or phone app (box labeled "Web-Based Reservation Application"); content distribution network (e.g. Akamai); content Web site (e.g. cnn.com); content management system (database or CMS)]

Advantages of n-tier system

Separation of concerns – each layer has own role

Parallelism and performance?

– If done right: multiple mid-tier servers work in parallel

– Back end systems centralize mainly data requiring sharing & synchronization

– Mid tier can provide shared, scalable caching

Information hiding

– Mid-tier apps shielded from data layout

Security

– Credit card numbers etc. not stored at mid-tier
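The layering above can be sketched with three small pieces of Python. All names here (ReservationRecords, ReservationLogic, front_end, flight "TU101") are hypothetical, and plain method calls stand in for the network hops between tiers.

```python
class ReservationRecords:
    """Back-end tier: owns the shared data and its layout."""

    def __init__(self):
        # Hypothetical layout: flight number -> (seats_total, seats_sold)
        self._rows = {"TU101": (100, 97)}
        self.lookups = 0  # counts how often the back end is consulted

    def seats(self, flight):
        self.lookups += 1
        return self._rows[flight]


class ReservationLogic:
    """Mid tier: application logic plus a cache shared by all front ends."""

    def __init__(self, records):
        self._records = records
        self._cache = {}

    def seats_available(self, flight):
        if flight not in self._cache:
            total, sold = self._records.seats(flight)
            self._cache[flight] = total - sold
        return self._cache[flight]


def front_end(logic, flight):
    """Front tier: presentation only; it never sees the data layout."""
    return f"{flight}: {logic.seats_available(flight)} seats left"


records = ReservationRecords()
logic = ReservationLogic(records)
first = front_end(logic, "TU101")
second = front_end(logic, "TU101")  # answered from the mid-tier cache
```

The front end depends only on seats_available, so the record layout can change without touching it. A real mid-tier cache would also need invalidation when a seat is sold, which this sketch omits.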

Other patterns

Spanning tree

Broadcast (send to many nodes at once)

Flood

Various P2P

Etc.

Traditional Models of Distributed Computing

-Remote Procedure Call

Remote Procedure Call

The term RPC was coined by the late Bruce Nelson in his 1981 CMU PhD thesis

Key idea: an ordinary function call executes remotely

The trick: the language runtime or helper code must automatically generate code to send parameters and results

For languages like C: proxies and stubs are generated

– Not needed in dynamic languages like Ruby, JavaScript, etc.

RPC is often (erroneously IMO) used to describe any request / response system

RPC: Call remote functions automatically

Interface definition: float sqrt(float n);

Proxies and stubs generated automatically

RPC provides transparent remote invocation

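[Diagram: two machines, each with its own CPU, memory, and storage]

Caller:
x = sqrt(4)

proxy (on the calling machine):
float sqrt(float n) { send n; read s; return s; }

stub (on the remote machine):
void doMsg(Msg m) { s = sqrt(m.s); send s; }

Implementation (on the remote machine):
float sqrt(float n) { …compute sqrt… return result; }

Request: invoke sqrt(4)

Response: result=2 (no exception thrown)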
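The proxy and stub roles can be tried out in a small runnable sketch. This is hand-written rather than generated, and it rests on assumptions: the JSON message shape, the names stub and make_sqrt_proxy, and the use of a thread and a local socket pair in place of a remote machine are all illustrative.

```python
import json
import math
import socket
import threading


def stub(connection):
    """Server-side stub: decode the request, call the real function, reply."""
    request = json.loads(connection.recv(1024).decode("utf-8"))
    result = math.sqrt(request["n"])  # the real implementation
    connection.sendall(json.dumps({"result": result}).encode("utf-8"))


def make_sqrt_proxy(connection):
    """Return a client-side proxy with the same signature as the real sqrt."""
    def sqrt(n):
        message = json.dumps({"invoke": "sqrt", "n": n})
        connection.sendall(message.encode("utf-8"))
        reply = json.loads(connection.recv(1024).decode("utf-8"))
        return reply["result"]
    return sqrt


client_end, server_end = socket.socketpair()
server = threading.Thread(target=stub, args=(server_end,))
server.start()

sqrt = make_sqrt_proxy(client_end)
x = sqrt(4)  # reads like a local call, but runs in the server thread

server.join()
client_end.close()
server_end.close()
```

The caller cannot tell from the call site that sqrt is remote, which is the transparency RPC aims for. The sketch does not carry exceptions back to the caller: sqrt(-1) would fail on the server side and leave the client waiting.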

RPC: Pros and Cons

Pros:

– Transparency is very appealing

– Simple programming model

– Useful as organizing principle even when not fully automated

Cons

– Getting language details right is tricky (e.g. exceptions)

– No client/server overlap: doesn’t work well for long-running operations

– May not optimize large transfers well

– Not all APIs make sense to remote: e.g. answer = search(tree)

– Versioning can be a problem: client and server need to agree exactly on interface (or have rules for dealing with differences)

Traditional Models of Distributed Computing

-Distributed Object Systems

How do you build an RPC for this?

class Point {
    int x, y;
    int getx() { return x; }
    int gety() { return y; }
}

class Rectangle {
    …members and constructors not shown…
    Point getUpperLeft() {…}
    Point getLowerRight() {…}
}

Rectangle myRect = new Rectangle();
…assume position set here…
int a = area(myRect); // REMOTE THIS CALL!

int area(Rectangle r) {
    // Each getter call below is a method call on an object passed in by the caller
    int width = r.getLowerRight().getx() - r.getUpperLeft().getx();
    int height = r.getLowerRight().gety() - r.getUpperLeft().gety();
    return width * height;
}

Pass object to remote method

Call method on remoted object

Distributed Object systems make this work!

Distributed object systems

In the 1990s, seemed like a great idea

Advantages of OO encapsulation & inheritance + RPC

Examples

– CORBA (industry standard)

– DCOM (Microsoft)

Still quite widely used within enterprises

Complicated

– Marshalling object references

– Distributed object lifetime management

– Brokering: which object provides the service today

– Remote “new”: creating objects on remote systems

– All the pros & cons of RPC, plus the above

Generally not appropriate at Internet scale

Traditional Models of Distributed Computing

-Some Other Options

Special Purpose Models

Remote File System

– Network provides transparent access to remote files

– Examples: NFS, CIFS

Remote Database

– Examples: ODBC, JDBC

Remote Device

– Remote printing, disk drive etc.

Virtual terminal

– One computer simulates an interactive terminal to another

Some other interesting models

Broadcast / multicast

– Send messages to everyone (broadcast) / named group (multicast)

Publish / subscribe (pub/sub)

– Subscribe to named events or based on query filter

– Call me whenever Pepsi’s stock price changes

– Implements a distributed associative memory

Reliable queuing

– Examples: IBM MQSeries, Java Message Service (JMS)

– Model: queued messages, preserved across hardware crashes

– Widely used for bank machine transactions and for long-running (multi-day) eCommerce transactions

– Depends on disk-based transaction systems at each node to keep queues

Tuple spaces

– Pioneered by Gelernter at Yale (Linda kernel), picked up by Jini (Sun), and TSpaces (IBM)

– Network-scale shared variable space, with synchronization

– Good for queues of work to do: some cloud architectures use a related model to distribute work to servers
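Of the models listed above, publish/subscribe is the easiest to sketch briefly. The Broker class and the event names "stock/PEP" and "stock/KO" below are hypothetical, and everything runs in one process; a real system would deliver the callbacks over a network.

```python
from collections import defaultdict


class Broker:
    """Routes each published event to the callbacks subscribed to its name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, callback):
        self._subscribers[event_name].append(callback)

    def publish(self, event_name, payload):
        # .get avoids creating an entry for names nobody subscribed to
        for callback in self._subscribers.get(event_name, []):
            callback(payload)


broker = Broker()
pepsi_prices = []

# "Call me whenever Pepsi's stock price changes"
broker.subscribe("stock/PEP", pepsi_prices.append)

broker.publish("stock/PEP", 101.5)
broker.publish("stock/KO", 60.0)   # no subscribers, so nothing is delivered
broker.publish("stock/PEP", 102.0)
```

The publisher never names its receivers; the broker matches events to interested parties by name, which is what makes the model look like an associative memory.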

Stateful and Stateless Protocols

Stateful and Stateless Protocols

Stateful: server knows which step (state) has been reached

Stateless:

– Client remembers the state, sends to server each time

– Server processes each request independently

Can vary with level

– Many systems, like the Web, run stateless protocols (e.g. HTTP) over streams; at the packet level, TCP streams are stateful

– HTTP itself is mostly stateless, but many HTTP requests (typically POSTs) update persistent state at the server
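The difference shows up clearly when paging through a list of results. The sketch below is illustrative only: ITEMS, PAGE_SIZE, StatefulServer, and stateless_next_page are hypothetical names, and function calls stand in for protocol messages.

```python
ITEMS = ["a", "b", "c", "d", "e"]
PAGE_SIZE = 2


class StatefulServer:
    """Remembers, per client, which step (page) has been reached."""

    def __init__(self):
        self._next_offset = {}

    def next_page(self, client_id):
        offset = self._next_offset.get(client_id, 0)
        self._next_offset[client_id] = offset + PAGE_SIZE
        return ITEMS[offset:offset + PAGE_SIZE]


def stateless_next_page(offset):
    """Handles each request independently; the client supplies the state."""
    page = ITEMS[offset:offset + PAGE_SIZE]
    return page, offset + PAGE_SIZE  # the client keeps the new offset


# Stateful: the request is tiny, but the server must keep per-client context.
stateful = StatefulServer()
stateful_pages = [stateful.next_page("client-1") for _ in range(2)]

# Stateless: every request carries the offset, so any server copy can answer.
offset = 0
stateless_pages = []
for _ in range(2):
    page, offset = stateless_next_page(offset)
    stateless_pages.append(page)
```

Both variants return the same pages. If the stateful server restarts, its clients lose their place; the stateless version has nothing to lose, at the cost of a slightly larger request.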

Advantages of stateless protocols

Protocol usually simpler

Server processes each request independently

Load balancing and restart easier

Typically easier to scale and make fault-tolerant

Visibility: individual requests more self-describing

Advantages of stateful protocols

Individual messages carry less data

Server does not have to re-establish context each time

There’s usually some changing state at the server at some level, except for completely static publishing systems

Text vs. Binary Protocols

Protocols can be text or binary on the wire

Text: messages are encoded characters

Binary: any bit patterns

Pros and cons quite similar to those for text vs. binary file formats

When sending between compatible machines, binary can be much faster because no conversion needed

Most Internet-scale application protocols (HTTP, SMTP) use text for protocol elements and for all content except photo/audio/video

HTTP 2.0 moving to binary (for msg size and parsing speed)
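To make the text/binary distinction concrete, here is the same small message encoded both ways. The message contents and both layouts are made up for illustration and do not correspond to any real protocol.

```python
import struct

reading = (1234567, 98.6)  # hypothetical message: an id and a temperature

# Text encoding: fields become characters that must be parsed on receipt.
text_message = f"{reading[0]},{reading[1]}\n".encode("ascii")
id_text, temp_text = text_message.decode("ascii").strip().split(",")
decoded_from_text = (int(id_text), float(temp_text))

# Binary encoding: fixed-size fields (4-byte unsigned int, 8-byte double)
# in network byte order; no character parsing is needed.
binary_message = struct.pack("!Id", *reading)
decoded_from_binary = struct.unpack("!Id", binary_message)
```

The text form can be read and debugged by eye, while the binary form has a fixed size and skips the character-to-number conversion, which is the trade-off the slide describes.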

Summary

Summary

The machine-level model is complex: multiple CPUs, memories

A number of abstractions are widely used for limited-scale distribution

RPC is among the most interesting and successful

Statefulness / statelessness is a key design tradeoff

We’ll see next time why a new model was needed for the Web
