1. Introduction

Upload: henry-peters

Post on 11-Jan-2016


TRANSCRIPT

Page 1: 1. Introduction. 2 Computing Everywhere Computing everywhere: – Desktop, Laptop, Cars, Cell phones Input devices everywhere: – Sensors, cameras, microphones

1. Introduction

2

Computing Everywhere

Computing everywhere:
– Desktop, Laptop, Cars, Cell phones

Input devices everywhere:
– Sensors, cameras, microphones

Connectivity everywhere:
– Rapid growth of bandwidth in the interior of the net
– Internet at home and office

Increased reliance on computers is inevitable

Computer systems will become invisible only when they are reliable

3

Fault-Tolerance

Why?
• Computers are increasingly being used in critical applications where system failures may have severe consequences.

How?
• By introducing redundancy (extra resources) in the computer system, e.g., hardware redundancy and software redundancy.
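
One common way to picture redundancy is to run the same computation on several replicas and vote on the results. The slide names no particular scheme, so the sketch below is only an illustration: the three-replica majority vote, the function names `majority_vote` and `run_redundant`, and the `healthy` / `faulty` replicas are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(results: Sequence[int]) -> int:
    """Return the value reported by a strict majority of replicas."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        # No value is backed by more than half of the replicas.
        raise RuntimeError("no majority: too many replicas disagree")
    return value


def run_redundant(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every replica on the same input and vote on the outputs."""
    return majority_vote([replica(x) for replica in replicas])


# Hypothetical replicas: two healthy units and one faulty unit.
def healthy(x: int) -> int:
    return x * x


def faulty(x: int) -> int:
    return x * x + 1  # models a unit that produces a wrong result


print(run_redundant([healthy, faulty, healthy], 7))  # 49
```

With three replicas the vote masks one wrong result; if all three disagree, the voter reports the disagreement instead of guessing.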

4

Need for Fault Tolerance: Universal

Natural objects:
• Fat deposits in body: survival in starvation
• Duplication of eyes: graceful degradation upon failure

Man-made objects:
• Redundancy in ordinary text
• Asking for password twice during initial set-up
• Duplicate tires in trucks

5

Mission Specific Approaches

High availability systems:
– Telephone
– Transaction processing: banks/airlines

Long life missions:
– Unscheduled maintenance too costly
– Manned and unmanned space borne systems

Critical applications:
– Real-time industrial control
– Aircraft control systems
– Life support systems

General Purpose Systems:
– CDs: encoding
– Internet: packet retransmission

6

Example of Failures - eBay Crash

eBay: giant internet auction house
– A top 10 internet business
– Market value of $22 billion
– 3.8 million users as of March 1999

June 6, 1999
– eBay system is unavailable for 22 hours with problems ongoing for several days
– Stock drops by 6.5%, $3-5 billion lost revenues
– Problems blamed on Sun server software

7

Example - Ariane 5 Rocket Crash

Ariane 5 and its payload destroyed about 40 seconds after liftoff

Error due to software bug:
– Conversion of floating point to 16-bit int
– Out of range error generated but not handled

Testing of full system under actual conditions not done due to budget limits

Estimated cost: $120 million
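
The failure class named above, a floating-point value that does not fit a 16-bit signed integer, can be sketched in a few lines. This is an illustration only and not the flight software: the function names are hypothetical, and clamping to the nearest representable value is just one possible recovery policy, assumed here for the example.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert a finite float to a 16-bit signed range, raising if it does not fit."""
    n = int(value)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return n


def to_int16_saturating(value: float) -> int:
    """Handle the out-of-range case by clamping (an assumed recovery policy)."""
    try:
        return to_int16_checked(value)
    except OverflowError:
        return INT16_MAX if value > 0 else INT16_MIN


print(to_int16_checked(123.9))       # 123
print(to_int16_saturating(1.0e6))    # 32767
print(to_int16_saturating(-1.0e6))   # -32768
```

The point of the second function is that the out-of-range condition is detected and then handled somewhere; in the first function the caller has to catch `OverflowError`, or the error propagates and stops the program.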

8

Example - The Therac-25 Failure

Therac-25 is a linear accelerator used for radiation therapy

More dependent on software for safety than predecessors (Therac-20, Therac-6)

Machine reliably treated thousands of patients, but occasionally there were serious accidents, involving major injuries and 1 death.

Software problems:
– No locks on shared variables (race conditions).
– Timing sensitivity in user interface.
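
To show what a lock on shared variables buys, here is a minimal sketch in which two fields that must always change together are guarded by one lock. It is not a reconstruction of the Therac-25 code; the class `BeamSettings`, its fields, and the two valid (mode, power) pairs are hypothetical and chosen only for illustration.

```python
import threading


class BeamSettings:
    """Two related fields that must always be read and written as a pair."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._mode = "electron"
        self._power = 1

    def update(self, mode: str, power: int) -> None:
        with self._lock:  # both fields change atomically
            self._mode = mode
            self._power = power

    def snapshot(self) -> tuple:
        with self._lock:  # a reader never sees a half-finished update
            return self._mode, self._power


VALID = {("electron", 1), ("xray", 100)}


def stress(iterations: int = 2000) -> list:
    """Run one writer and one reader concurrently; collect inconsistent reads."""
    settings = BeamSettings()
    inconsistent = []

    def writer() -> None:
        for i in range(iterations):
            if i % 2:
                settings.update("xray", 100)
            else:
                settings.update("electron", 1)

    def reader() -> None:
        for _ in range(iterations):
            snap = settings.snapshot()
            if snap not in VALID:
                inconsistent.append(snap)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return inconsistent


print(stress())  # []
```

Without the lock, a reader could in principle observe a new mode combined with an old power level, which is the kind of mixed state a race condition produces.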

9

Example - Tele Denmark

Tele Denmark Internet, ISP

August 31, 1999
– Internet service down for 3 hours
– Truck drove into the power supply cabinet at Tele Denmark
– Where were the UPSs?
  • Old ones had been disconnected for upgrade
  • New ones were on the truck!

10

Dependable

Webster Dictionary

Dependable: capable of being depended on: RELIABLE

Reliable: suitable or fit to be relied on: DEPENDABLE

Rely:
• 1) to be dependent <the system on which we rely for water>
• 2) to have confidence based on experience <someone you can rely on>

11

Dependability

Dependability is that property of a computer system such that reliance can justifiably be placed on the service it delivers.

Attributes of Dependability
• Reliability
• Availability
• Safety
• Confidentiality
• Integrity
• Maintainability

Fault tolerance is not a system requirement. Fault tolerance is one of the mechanisms that can be used to provide dependability.

12

Motivation

Extreme fault tolerance has always been around
– NASA’s deep space probes
– Medical computing devices (e.g., pacemakers)

But now fault tolerance is becoming more important
– More reliance on computers

Extreme fault tolerance
– Car controllers (e.g., anti-lock brakes), etc.

High fault tolerance
– Commercial servers (databases, web servers), file servers, etc.

Some fault tolerance
– Desktops, laptops (really!), etc.

13

Reliability R(t) - Unreliability

R(t) is the probability that the system performs as specified without interruption over the entire interval [0,t]. R(t) is conditioned on the system being operational at time t=0.

The time t can be very long, e.g., years in the case of space applications.

Unreliability F(t) is the probability that the system fails at any time in the interval [0,t].

F(t) = 1 - R(t)
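
As a worked illustration of R(t) and F(t), the sketch below assumes a constant failure rate, which gives the exponential model R(t) = exp(-λt). The slides do not state a failure model, so both the exponential form and the example rate of 1e-5 failures per hour are assumptions made for the example.

```python
import math


def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """R(t) under an assumed constant failure rate (exponential model)."""
    return math.exp(-failure_rate_per_hour * t_hours)


def unreliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability(t_hours, failure_rate_per_hour)


lam = 1e-5  # assumed failure rate, failures per hour
for years in (1, 5, 10):
    t = years * 365 * 24
    print(years, round(reliability(t, lam), 4), round(unreliability(t, lam), 4))
```

Whatever model is used for R(t), the two quantities always sum to one, and R(0) = 1 because the system is assumed operational at t = 0.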

14

Availability A(t)

A(t) is the probability that the system is up and running correctly at time t

This is different from reliability.
– Reliability considers the interval [0,t]
– Availability considers a single instant of time

Examples: transaction processing systems, e.g. reservation systems

15

Reliability vs. Availability

Example:
A system that fails, on average, once per hour but which restarts automatically in ten milliseconds is not very reliable but is highly available.

With times in milliseconds over a one-hour cycle:

Availability = Uptime/(Uptime+Downtime) = (3600000-10)/(3600000) = 0.9999972
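
The same arithmetic can be written as a small function. The helper name `availability` is an assumption for this sketch; the numbers are the ones from the example, treating each one-hour cycle as 3,600,000 ms of which 10 ms are downtime.

```python
def availability(uptime: float, downtime: float) -> float:
    """Fraction of time the system is up: uptime / (uptime + downtime)."""
    return uptime / (uptime + downtime)


cycle_ms = 3_600_000  # one hour in milliseconds
downtime_ms = 10      # one automatic restart per hour

a = availability(cycle_ms - downtime_ms, downtime_ms)
print(f"{a:.7f}")  # 0.9999972
```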

16

Nines of Availability
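
The number of nines translates directly into permitted downtime per year. As a rough sketch, assuming a 365-day year and hypothetical helper names, an availability with n nines is 1 - 10^(-n), and the yearly downtime budget follows from it:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525600, assuming a 365-day year


def availability_for_nines(nines: int) -> float:
    """Availability written with the given number of nines, e.g. 3 -> 0.999."""
    return 1.0 - 10.0 ** (-nines)


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum downtime per year that still meets the availability level."""
    return MINUTES_PER_YEAR * 10.0 ** (-nines)


for n in range(1, 7):
    print(n, f"{availability_for_nines(n):.6f}",
          f"{downtime_minutes_per_year(n):.2f} min/year")
```

Five nines, for instance, leaves a budget of a little over five minutes of downtime per year.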

17

Question ?

Which has higher availability?
• (1) two 4-hour outages / year
• (2) one 1-minute outage / day
• A(1) = (365*24-2*4)/(365*24) = 0.9991
• A(2) = (24*60-1)/(24*60) = 0.9993

For an Internet-based company such as eBay or AOL, which would be more desirable? Why?

For a hacker?

Need to specify details of acceptable outages
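
A short sketch makes the comparison concrete: the two outage patterns have almost the same availability but very different longest outages. The helper `availability_from_outages` is a hypothetical name introduced for this example.

```python
HOURS_PER_YEAR = 365 * 24
MINUTES_PER_DAY = 24 * 60


def availability_from_outages(period: float, outage_lengths: list) -> float:
    """Availability over a period, given outage lengths in the same time unit."""
    return (period - sum(outage_lengths)) / period


a1 = availability_from_outages(HOURS_PER_YEAR, [4, 4])  # two 4-hour outages per year
a2 = availability_from_outages(MINUTES_PER_DAY, [1])    # one 1-minute outage per day

print(f"A(1) = {a1:.4f}, longest outage: 4 hours")   # A(1) = 0.9991
print(f"A(2) = {a2:.4f}, longest outage: 1 minute")  # A(2) = 0.9993
```

The single availability figure hides the outage pattern, which is why the acceptable length and frequency of outages have to be specified separately.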

18

Safety S(t)

S(t) is the probability that the system does not fail in the interval [0,t] in such a manner as to cause unacceptable damage or other terrible effects.

S(t) is an attribute of a system which either operates correctly or fails in a safe manner.

Safety is a measure of the fail-safe capability of the system
– system can be unreliable, yet safe
– bias towards safe failure

19

Safety

Safety is a property of a system that it will not endanger human life or the environment

A safety-related system is one by which the safety of equipment or plant is assured

The term safety-critical system is normally used as a synonym for a safety-related system, although it may suggest a system of high criticality

20

Confidentiality

Absence of unauthorized disclosure of information

• Microsoft source code vs. Linux source code
• Web browsing
• Operating Systems Security Model
• Files
• Medical records
• Credit card transaction records
• School grades

21

Integrity

Absence of improper system state alterations

• Operating systems:
  – Files, memory, network packets
  – Linux kernel backdoor attempt
• Database records
  – Your bank account
• File transfer
  – Did I really get the right version of software X?

22

Maintainability M(t)

M(t) is the probability that a failed system will be restored within a specified period of time t

Restoration process
– locating problem, e.g. via diagnostics
– physically repairing system
– bringing system back to its operational condition
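
For a feel of how M(t) behaves, the sketch below assumes that repair times are exponentially distributed with a given mean time to repair (MTTR), so that M(t) = 1 - exp(-t/MTTR). The slides do not specify a repair-time model; the exponential form, the function name, and the 2-hour MTTR are assumptions for illustration.

```python
import math


def maintainability(t_hours: float, mttr_hours: float) -> float:
    """M(t): probability that a failed system is restored within t hours,
    under an assumed exponential repair-time model with mean mttr_hours."""
    return 1.0 - math.exp(-t_hours / mttr_hours)


mttr = 2.0  # assumed mean time to repair, in hours
for t in (0.5, 1, 2, 4, 8):
    print(t, round(maintainability(t, mttr), 3))
```

Under this model, the chance of being back in service within one MTTR is 1 - 1/e, about 63%, so the mean repair time alone is not a guarantee of restoration by that time.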

23

Ex. Dependability Requirements

Telecommunications:
• Availability, maintainability

Transportation:
• Reliability, availability, safety

Weapons:
• Safety

Nuclear systems:
• Safety

24

Dependability of Pacemaker

What matters for this system?
• Correct computation?
• Correct logic?
• Usability?

No, the safety of the patient

25

Dependability of Pacemaker

General Characteristics

Eight-bit processors, moving to 32-bit

Software:
• Approximately 30K lines, mostly “C”
• Vastly more software in external programmer

Patient data storage example:
• 200 samples/sec

Long battery life necessary:
• Device “sleeps” between heart beats

26

Dependability of Pacemaker

Is reliability the goal?
• Typical battery life is five years, but persistent storage is needed

Is availability the goal? How about an availability of 0.99999?
• This corresponds to an average of five minutes per year of downtime
• Death would result if this occurred all at once

Is safety the goal? It’s safe when it’s off - or is it?
• Leaving the system off might result in death very quickly.....

27

Security

Security is a combination of attributes:
• Integrity
• Confidentiality
• Availability

Under different situations, these attributes are more or less important:
• Denial of service is an availability issue
• Disclosure of information is a confidentiality issue

28

Performability P(L,t)

P(L,t) is the probability that the system performance will be at or above some level L at time t

Measure of the likelihood that some subset of the function is performed correctly

This differs from reliability, which dictates that all functions are performed correctly
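
One way to make P(L,t) concrete is to take a system of n identical units and let the performance level be the number of units still working. The sketch below rests on assumptions the slides do not make: units fail independently, there is no repair, and `unit_reliability` is the probability that a single unit is working at time t. The function name is hypothetical.

```python
import math


def performability(level: int, n_units: int, unit_reliability: float) -> float:
    """Probability that at least `level` of `n_units` independent units work."""
    return sum(
        math.comb(n_units, k)
        * unit_reliability ** k
        * (1.0 - unit_reliability) ** (n_units - k)
        for k in range(level, n_units + 1)
    )


r = 0.9  # assumed probability that one unit is working at time t
for level in range(0, 5):
    print(level, round(performability(level, 4, r), 4))
```

Requiring all four units (level 4) reproduces the all-functions-correct case, r**4, while lower levels give the higher probabilities associated with partial service.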

29

Graceful Degradation

The ability of a system to automatically decrease its level of performance to compensate for hardware failures and software errors
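
As a toy illustration of the idea, consider a hypothetical service backed by a pool of worker nodes that lowers its service level as workers fail instead of stopping outright. The level names and thresholds below are assumptions chosen for the example, not part of the definition.

```python
def service_level(healthy_workers: int, total_workers: int = 4) -> str:
    """Pick a reduced service level instead of failing outright."""
    if not 0 <= healthy_workers <= total_workers:
        raise ValueError("healthy_workers must be between 0 and total_workers")
    if healthy_workers == total_workers:
        return "full"            # everything available
    if healthy_workers * 2 >= total_workers:
        return "reduced"         # e.g. lower throughput, all functions kept
    if healthy_workers > 0:
        return "essential-only"  # only the most important functions
    return "down"


for healthy_count in range(4, -1, -1):
    print(healthy_count, service_level(healthy_count))
```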

30

Testability

Testability: ease of detecting presence of a fault