Reliability Tools and Options Professor Ken Birman Dept. of Computer Science Cornell University

Post on 27-Jan-2015

Page 1: [ppt]

Reliability Tools and Options

Professor Ken Birman

Dept. of Computer Science

Cornell University

Page 2: [ppt]

Last Time

We saw that reliability is a complex spectrum of properties and tradeoffs

We developed the idea of e-triage

And we glanced at some technologies

Page 3: [ppt]

Today

Last of three lectures on reliability

Focus on technologies in more depth

What can they do for us?

How do they work?

How well do they integrate?

Limitations?

Scalability issues?

Page 4: [ppt]

Technologies

Communication Tools:

TCP/IP

Remote Procedure Call (or “method invocation”)

Process group membership tracking and multicast

Publish/subscribe (also called “MOMS”)

Checkpoint and Restart, perhaps with mirrored disks

Transactions and Databases

Web servers and Java/JNI/JavaScript

Components and Object-oriented architectures

Cluster fault-tolerance and load-balancing

Traditional Linux tools and “scripts”

Hardware reliability – fault-tolerant computers

Page 5: [ppt]

Sorting Things Out

Computer scientists like to think in terms of big chunks of technology that they classify into categories

Often we talk about “layers”

Lowest layers are close to the hardware

Higher layers deal with things closer to the user who sits in front of a screen

Page 6: [ppt]

Examples of Layers

Network Protocols

Operating System

Server Technologies

Middleware

Applications

Page 7: [ppt]

What Makes a Layer?

Layer uses stuff below it but nothing from above it

And the layer offers a set of services to things above it

Sometimes we imagine a layer as a thing that transforms a computer or a network into a new one with new properties!

Somewhat like looking through a set of magic eyeglasses, each one somehow transforming the world into a magic new world…

Page 8: [ppt]

Examples of Layers

Network Protocols

Operating System

Server Technologies

Middleware

Applications

Page 9: [ppt]

Operating System

Major ones are:

Windows (several variants), from Microsoft

Linux (one of many versions of Unix)

Macintosh OS

Palm OS

VxWorks, QNX

Many other minor ones

Page 10: [ppt]

Operating System

It runs the hardware for one computer

Also supports “processes”, manages memory and other resources, provides security

People refer to the OS as a “platform”

Applications use OS features and run “on” it

They don’t need to deal with special issues involving the hardware because the OS handles them

These days OS also includes components that handle networking

A modern OS is structured as a set of “objects”

Page 11: [ppt]

Protocols

These are little programs that run in applications or in the OS

They work by sending messages over network connections

Goal is to do something useful in a distributed manner

For example, network can lose packets

But web pages can’t tolerate missing chunks of data!

So web uses a protocol that resends lost packets
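The resend idea can be sketched in a few lines of Python. This is a minimal stop-and-wait simulation, not how any real web protocol is implemented: the lossy link is faked with a random number generator, and the names `send_reliably`, `loss_rate`, `seed` and `max_tries` are assumptions made up for the illustration.

```python
import random

def send_reliably(chunks, loss_rate=0.3, seed=1, max_tries=50):
    """Deliver every chunk in order over a simulated lossy link (stop-and-wait)."""
    rng = random.Random(seed)      # fixed seed keeps the run repeatable
    received = []
    transmissions = 0
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            transmissions += 1
            if rng.random() < loss_rate:
                continue           # packet lost: no ack comes back, so resend
            received.append(chunk) # packet arrived and is acknowledged
            break
        else:
            raise RuntimeError(f"gave up on chunk {seq}")
    return received, transmissions

page = ["<html>", "<body>hello</body>", "</html>"]
delivered, sent = send_reliably(page)
print(delivered == page, "after", sent, "transmissions")
```

The receiver ends up with the complete page even though some transmissions were dropped; the cost shows up as extra transmissions.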

Page 12: [ppt]

Representative Protocols?

Just look at two examples to get the feel

Don’t worry about the details

Idea is to understand the “kinds of things” each layer is doing, not the specifics

We do teach the specifics in Cornell courses

But any one of these would take weeks to cover in a comprehensive way

Page 13: [ppt]

Communication Tools: TCP/IP

The basic communication technology of the Web

Works like a telephone call:

Your browser connects to a server using its IP address (looks like 128.64.77.133)

Your request is sent as a message over the connection. The result comes back.

The connection automatically matches sending and receiving rates (easily fooled by noisy links!)

Also, automatically corrects for data loss

Page 14: [ppt]

TCP sliding window

[Figure: sender and receiver windows holding numbered segments m_i through m_{i+k+1}, with “-” marking empty slots]

sender provides data

receiver consumes data

IP packets carry segments

window has k “segments”

Receiver replies with acks and nacks. Sender resends missing data

When acknowledgement is received, segment number keeps incrementing but slot number is reused.
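The window mechanics on this slide can be sketched as a small Python simulation. It follows the slide's simplified model rather than real TCP (which acknowledges byte ranges cumulatively); the function name `sliding_window_transfer` and its parameters are hypothetical, and acks are assumed never to be lost.

```python
import random

def sliding_window_transfer(data, k=4, loss_rate=0.25, seed=7):
    """Simulate a sender with a k-slot window over a link that drops segments."""
    rng = random.Random(seed)
    base = 0                  # oldest unacknowledged segment number
    acked = set()             # segment numbers the receiver has acknowledged
    receiver_buffer = {}      # segment number -> payload, possibly out of order
    slots_used = []           # (segment number, slot) pairs, to show slot reuse
    while base < len(data):
        # Send (or resend) every unacknowledged segment inside the window.
        for seq in range(base, min(base + k, len(data))):
            if seq in acked:
                continue
            slots_used.append((seq, seq % k))  # slot number is seq modulo k
            if rng.random() >= loss_rate:      # segment survived the link
                receiver_buffer[seq] = data[seq]
                acked.add(seq)                 # assumption: the ack always arrives
        # Slide the window past every segment acknowledged in order.
        while base in acked:
            base += 1
    delivered = [receiver_buffer[i] for i in range(len(data))]
    return delivered, slots_used

message = list("reliable streams")
delivered, slots_used = sliding_window_transfer(message)
print("".join(delivered))
```

Segment numbers in `slots_used` keep growing while the slot numbers stay in the range 0 to k-1, which is the reuse the slide describes.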

Page 15: [ppt]

TCP/IP: Pros and Cons

Simple, widely supported way to communicate

Overcomes packet loss, duplication, out of order delivery

But can reduce rate down to zero when network becomes congested, easily fooled by a noisy link

Also, connections can break even if neither endpoint actually fails.

Things that use TCP, like web browsers, inherit these benefits… and these problems!

Page 16: [ppt]

Communication Tools: RPC

Idea is that each program declares a set of actions it can perform – “methods” that can be invoked using an “interface”

Client programs “bind” to interface

Send a message to invoke a method, reply comes back in form of a message too.

Special protocols overcome failure

Page 21: [ppt]

The basic RPC protocol

Client: “binds” to server; prepares, sends request; unpacks reply

Server: registers with name service; receives request; invokes handler; sends reply
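The steps in this diagram can be traced in a toy, single-process Python sketch. Nothing here is a real RPC library: `NameService`, `Server` and `Client` are hypothetical classes, JSON stands in for the wire format, and a direct method call stands in for the network.

```python
import json

class NameService:
    """Hypothetical registry mapping a service name to a server object."""
    def __init__(self):
        self._servers = {}

    def register(self, name, server):
        self._servers[name] = server

    def lookup(self, name):
        return self._servers[name]

class Server:
    def __init__(self, handlers):
        self._handlers = handlers                # method name -> callable

    def receive(self, request_bytes):
        request = json.loads(request_bytes)      # receives request
        handler = self._handlers[request["method"]]
        result = handler(*request["args"])       # invokes handler
        return json.dumps({"result": result}).encode()  # sends reply

class Client:
    def __init__(self, names, service):
        self._server = names.lookup(service)     # "binds" to server

    def call(self, method, *args):
        # prepares, sends request
        request = json.dumps({"method": method, "args": args}).encode()
        reply = self._server.receive(request)    # stands in for the network
        return json.loads(reply)["result"]       # unpacks reply

names = NameService()
names.register("calculator", Server({"add": lambda a, b: a + b}))  # registers
client = Client(names, "calculator")
total = client.call("add", 2, 3)
print(total)
```

A real system would add timeouts and retries around the call, which is where the failure cases discussed in the RPC summary come from.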

Page 22: [ppt]

RPC Summary

Basic technology in most “client-server” situations with exception of the Web

Can hide packet loss but not server failure

Can certainly fail (due to timeout) when server and client are actually both healthy

Many limitations in terms of form of data you can send, packet size, etc.

Page 23: [ppt]

When are they used?

TCP is used to transfer “objects”

Usually objects are reasonably large

Examples are email messages, files, web pages, copies of programs

RPC is used when a program asks for a service provided by some other program

Best for small requests and replies

Page 24: [ppt]

Examples of Layers

Network Protocols

Operating System

Middleware

Applications

Server Technologies

Page 25: [ppt]

Concept Of Middleware

Middleware is any kind of software tool that runs over a basic infrastructure

Provides a standard set of services for some class of applications

Idea is that OS and network may be “too general”

Middleware creates a better environment for some large class of applications that all share a need poorly addressed by the lower layers

Middleware is increasingly important

Page 26: [ppt]

Communication Middleware Example: Multicast

Broad term covering a variety of one-many communication tools

We talk about the:

Process group: set of programs for which membership is tracked

Multicast: a way of sending data to group

State transfer: brings a joining program up to date

Order, atomicity: guarantee that messages are seen in same order by all members, despite failure

Page 27: [ppt]

Virtual Synchrony Model

[Figure: timeline of processes p, q, r, s, t across successive group views]

G0={p,q}: r, s request to join

G1={p,q,r,s}: r, s added; state transfer

p fails (crash)

G2={q,r,s}: t requests to join

G3={q,r,s,t}: t added; state transfer
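The sequence of views in this figure can be replayed with a toy Python model. It only tracks membership, in-order delivery and state transfer; a real virtual synchrony implementation also needs protocols for agreeing on views and flushing in-flight messages, which are left out. The class `ProcessGroup` and its methods are hypothetical names.

```python
class ProcessGroup:
    """Toy model: a sequence of membership views plus per-member message logs."""
    def __init__(self, members):
        self.views = [tuple(sorted(members))]      # G0
        self.logs = {m: [] for m in members}       # messages delivered so far

    @property
    def view(self):
        return self.views[-1]

    def multicast(self, sender, message):
        # Every member of the current view delivers the message, in one order.
        for member in self.view:
            self.logs[member].append((sender, message))

    def join(self, *newcomers):
        donor = self.view[0]                       # any current member can donate state
        for p in newcomers:
            self.logs[p] = list(self.logs[donor])  # state transfer
        self.views.append(tuple(sorted(self.view + newcomers)))

    def fail(self, member):
        self.views.append(tuple(m for m in self.view if m != member))

g = ProcessGroup(["p", "q"])   # G0 = {p, q}
g.multicast("p", "m1")
g.join("r", "s")               # G1 = {p, q, r, s}
g.multicast("q", "m2")
g.fail("p")                    # G2 = {q, r, s}
g.join("t")                    # G3 = {q, r, s, t}
g.multicast("r", "m3")
print(g.views)
```

After the run, every surviving member holds the same log, including t, which joined last and caught up through state transfer.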

Page 28: [ppt]

Communication Middleware Example: Publish/Subscribe

Packaging of one-many communication tools into an elegant, easily understood form

Idea is that data producers “publish” information, marked with “subjects” that each item is about

Subscribers “subscribe” to the subjects of interest to them
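A minimal in-memory sketch of the idea, in Python. The class `MessageBus` and the subject names are made up for the illustration; a production message bus would also deal with networking, spooling and failures.

```python
from collections import defaultdict

class MessageBus:
    """Hypothetical in-process bus: routes published items by subject."""
    def __init__(self):
        self._subscribers = defaultdict(list)   # subject -> list of callbacks

    def subscribe(self, subject, callback):
        self._subscribers[subject].append(callback)

    def publish(self, subject, item):
        # Publishers never name the receivers; the subject does the routing.
        for callback in self._subscribers[subject]:
            callback(subject, item)

bus = MessageBus()
red_inbox, green_inbox = [], []
bus.subscribe("red", lambda subject, item: red_inbox.append(item))
bus.subscribe("green", lambda subject, item: green_inbox.append(item))

bus.publish("red", "price update")
bus.publish("green", "order filled")
bus.publish("blue", "nobody is listening")   # no subscribers: silently dropped
print(red_inbox, green_inbox)
```

Publishers and subscribers only share subject names, so either side can be added or removed without changing the other.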

Page 29: [ppt]

Conceptually, a message “bus”

Boxes are publishers (red / green subjects)

Circles are subscribers (same colors)

Disks represent spoolers used for playback

Flexible and easily extended over time

Supports huge numbers of subjects

Page 30: [ppt]

Conceptually, a message “bus”

Boxes are publishers (blue / green subjects)

Circles are subscribers (same colors)

Disks represent spoolers used for playback

Flexible and easily extended over time

Supports huge numbers of subjects

Page 32: [ppt]

Publish/Subscribe Pros and Cons

Conceptually very simple, popular

But in practice the infrastructure can be limiting and cumbersome

Often end up with more or less all processes receiving more or less all the messages, anyhow

Example of a technology that made more sense when computers were slower

Page 33: [ppt]

When Are They Used?

Process groups?

New York Stock Exchange, Swiss Exchange

French air traffic control system

AEGIS rebuild

Publish-Subscribe message bus

Most trading floors

Factory automation and process control

Some internal use for gluing databases to web sites

Page 34: [ppt]

Examples of Layers

Network Protocols

Operating System

Server Technologies

Middleware

Applications

Page 35: [ppt]

Servers

Many modern technologies follow a client-server programming model

You are the client

The server handles incoming requests

This model is probably the big success of the 1980-2000 period for computing

Normally, client connects to server on network and uses some form of RPC to talk to it

Page 36: [ppt]

Servers

Web servers

Database servers

Weblogic: a fancy web server that combines features needed for eCommerce sites

Mail servers, message queuing servers

Other application-specific servers

E.g. computer-aided design, payroll, etc…

Page 37: [ppt]

Servers

Secretly, most servers are a database, perhaps extended to know about a specific category of application or use

We call these domain-specific refinements

Idea is that an Oracle database, out of the box, is a very general platform but that a lot of work is needed to use it for, say, payroll

Databases use “transactional” model

Page 38: [ppt]

Transactions and Databases

One of the very big, well supported technologies

Associated with databases

Each program “runs a transaction”

begin

action1

action2

action3

…

commit or abort

Either entire transaction is performed, or entire transaction is erased (if disrupted by crash)
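The begin / actions / commit-or-abort pattern can be tried with Python's built-in `sqlite3` module. This is a small sketch under assumptions: the `accounts` table, the `transfer` helper and the balances are invented for the example, and an in-memory database is used so nothing touches disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        # action1: debit the source (sqlite3 opens the transaction implicitly)
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # action2: check the result of the debit
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        # action3: credit the destination
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()        # commit: every action becomes permanent
        return True
    except ValueError:
        conn.rollback()      # abort: every action is erased
        return False

ok = transfer(conn, "alice", "bob", 30)
rejected = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, rejected, balances)
```

The second transfer debits alice before discovering the overdraft, yet the rollback leaves no trace of that debit, which is the "entire transaction is erased" case.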

Page 39: [ppt]

ACID Properties

Atomicity: entire group of actions is treated as one “atomic unit”

Consistency: each transaction moves the database from one valid state to another, even though more than one can run at the same time on the same database

Isolation: but they are isolated from each other, as if only one ran at a time

Durability: committed transactions survive failures and recoveries

Page 40: [ppt]

Pros and Cons

Mixture of a powerful model with powerful, comprehensive vendor support

More or less integrated with web

But recovery can be slow

And high availability databases usually sacrifice some aspects of ACID guarantee

Note that vendors offer “replication” products but nobody uses these – performance is terrible.

Hot topic: cluster-style parallel servers

Clustering is a way to get scalability

Page 41: [ppt]

Trends in Systems

Enough on layers

In previous lectures we looked at business issues associated with the Internet

Today we have also seen lots of technology

Mixture of current systems

Emerging products and systems

Technologies

What comes next in distributed computing?

Page 42: [ppt]

Ways of posing questions

As a business question:

I want to get rich, what should I invest in?

Ultimately a flaky and meaningless question

Should ask “what should I learn about”

As a research question:

I want to be famous, what should I invent?

If you’re so smart, you should tell me!

As a big-picture question:

Where is dramatic change inevitable?

This question makes more sense than the others

Page 43: [ppt]

Looking for Exciting Change

Our goal is to anticipate dramatic, unexpected change

Is there a methodology for identifying the big opportunities?

How can we apply it to networks and distributed computing?

Page 44: [ppt]

Traditional Areas

File systems

Communications

Naming of objects, interoperation

Security

Resource management

Transactions

Extensibility

Page 45: [ppt]

Emerging areas

Scalable service management

Tools for hosting data

Mechanisms for offloading work from customers onto 3rd party solution provider systems

QoS mechanisms

Power-aware and mobility support

Page 46: [ppt]

Where are the big opportunities?

We could review these one topic at a time, but that might get dull

Can we develop a methodology for recognizing big opportunities and “leaping in”?

Page 47: [ppt]

Technology trends

Source: Scientific American, Sept. 1995

[Figure: chart of technology trends by period (1985-1990, 1990-1995, 1995-2000, 2000-2005) for CPU MIPS, memory MB, LAN Mbits, WAN Mbits, and O/S overhead; vertical axis from 0 to 700]

Note tremendous growth in WAN speeds

Page 48: [ppt]

Typical latencies (milliseconds)

[Figure: chart of typical latencies by period (1985-1990, 1990-1995, 1995-2000, 2000-2005) for disk I/O, Ethernet RPC, ATM round trip, and WAN round trip; vertical axis from 0.01 to 1000 ms in powers of ten]

WAN, disk latencies are fairly constant due to physical limitations

Page 49: [ppt]

O/S latency: the most expensive overhead on LAN communication!

[Figure: chart of O/S overhead in proportional terms; vertical axis from 0 to 40, horizontal axis spanning 1985 to 2000]

Page 50: [ppt]

Suggests?

Notice that revolutionary opportunity is triggered by technical discontinuity

To predict a revolution…

… just identify a technology sector about to be shaken up by a trend that breaks the usual relationships

… predict “big things will happen”

Page 51: [ppt]

Recent revolutions

Internet became much faster, more widely available

Operating systems became object oriented

Enabled the Web

Which enabled all sorts of B2B developments people knew were coming…

Page 52: [ppt]

Other examples?

For a long time, PCs were slow and balky, but very cheap

But around 1990 technology gave us a fast, big PC

Suddenly, desktop world yielded to PC world

Price point can trigger a discontinuity

Page 53: [ppt]

Other examples?

We used to be short on memory and hence relied heavily on disks

But around 1985 memory sizes and cost changed the equation

Suddenly massive caches made sense

Giving us ideas like log-structured file systems and new styles of caching in file and database systems

A world where 100% hit rates made sense

Page 54: [ppt]

Looking to the future?

Major discontinuities:

Move from PC to PDA/telephone hybrids

Mobility, disconnected operation

Emergence of huge numbers of computing systems that need to cooperate

Perhaps, some form of QoS?

Page 55: [ppt]

Want to have an impact?

Trick is to zero in on one of these areas

Be an early player

For example, get a mobile hand-held system and start to play with it

Lots of things in the legacy infrastructure just aren’t right for it

Your opportunity: fix a few of them by doing the obvious things

And you’ll instantly be famous!

Page 56: [ppt]

Mobile Trends

Nomadicity: increasingly powerful nomadic devices

Anticipate fusion of web browser, telephone and also PDA functionality

Some devices of this sort already exist – but they remain primitive

Low bandwidth interaction a big obstacle right now – you can’t talk to it, but typing without a keyboard is a pain

Page 57: [ppt]

Mobile trends

Communications standards

We already are seeing widespread use of wireless Ethernet cards

Bluetooth is the next big step: widespread low-power connectivity for small devices

XML helps: data objects are readily understood… fewer proprietary standards

Page 58: [ppt]

Mobile trends

Power conservation

Also better understood

Flexibility: compute faster or slower, move code or data, sleep or run more actively

Signal strength also a factor

Page 59: [ppt]

Mobile trends

Suggests a future in which

We’ll move from place to place with our computing context

In a given setting, devices find the appropriate local resources and can talk to them

And device is smart about when to ship code, when to ship data

Page 60: [ppt]

Mobile trends

But this also points to a missing link: exciting research opportunity

How to do naming of objects in this new mobile world?

User wants a single personalized name for resources and a single name space

But we also need to share things

And how to organize or structure a nomadic or wireless environment

Peer-to-peer and multi-peer opportunity will be enormous

Page 61: [ppt]

Illustrating…

A discontinuous development

From fixed infrastructure to mobile wireless one

High performance but power-aware

Fusion of previously independent technologies (voice, web, email)

Stress on existing infrastructure

We tend to adapt the existing infrastructure to the new setting

But a whole new approach may be needed

Page 62: [ppt]

Driving…

New ideas in file systems

How should we do file systems for mobile and wireless systems?

Communication

How should we do point to point and multicast for wireless peer-to-peer or “ad-hoc” networks?

Is TCP the right protocol for a wireless connection to a server?

The list goes on…

Page 63: [ppt]

Dangers

It is easy to overreach

People tend to try to do 10 things all at the same time…

Need to be incremental

Challenge? Picking the right first step

The right infrastructure can enable just about anything!

Page 64: [ppt]

But we’re out of time…

Take-aways from this lecture series?

Business roles in eCommerce

Examples of existing sectors

Some thought about business role in developing new technology-limited ventures

And some review of how technologies are structured

Leading to an angle on how to identify big emerging opportunity areas

Page 65: [ppt]

What should I know?

If you want to remember just one thing…

Remember the French air traffic control project

Where the US project overreached and failed, the French went slowly, tested like crazy, and built a better system that really worked

Scalability and stability of technology is the key

Be French!

Also drink moderate amounts of good red wine

Visit http://www.fromages.com now and then

Remember that vision of the world as 100 people…