dependability: what is it?staff.cs.upt.ro/~vancusa/fsc/c1.pdf · •rapid advancement of hardware...

DEPENDABILITY:

WHAT IS IT? Why do we care?

Dependability and security attributes

(Avizienis, et al. 2004)

Dependability: the ability to deliver service that can justifiably be trusted

Availability: readiness for correct service (dependability and security)

Reliability: continuity of correct service (dependability)

Safety: absence of catastrophic consequences on the user(s) and the environment (dependability)

Integrity: absence of improper system alterations (dependability and security)

Maintainability: ability to undergo modifications and repairs (dependability)

Confidentiality: the absence of unauthorized disclosure of information (security)

When is it a good time for dependability?

Dependability: the ability to deliver service that can justifiably be trusted


Reliability: continuity of correct service

Safety: absence of catastrophic consequences on

the user(s) and the environment


Maintainability: ability to undergo modifications and repairs

• Graduating from university

• you have to deliver passable data (projects, papers, exams)

• if possible, continuously (without failing (too many) exams)

• while taking care not to kill/maim/etc. the teachers

• and without messing up the original system (enter the right classroom without any means of cheating on you)

• while undergoing repairs (recovering from hangovers).

Note: “The dependability and security specification of a system must include the requirements for the attributes in terms of the acceptable frequency and severity of service failures for specified classes of faults and a given use environment. One or more attributes may not be required at all for a given system.” (Avizienis, et al. 2004)

When is it good to have security?

Security



Confidentiality: the absence of unauthorized disclosure of

information

• A system that deals with ATMs and banks

• must be ready to provide the correct service (availability),

• while no unsanctioned modifications alter it, such as tiny cameras that catch you typing in your PIN (integrity),

• all the while keeping the identity of the client as private as possible (confidentiality).

Complex hardware/software

• NASA Space Shuttle

• flies with 500,000 lines of software code on board

• 3.5 million lines of code in ground control and processing

• International Space Station Alpha

• millions of lines of software

• innumerable hardware pieces for its navigation, communication, and experimentation

• Telecommunications industry

• operations for phone carriers are supported by hundreds of software systems, with hundreds of millions of lines of source code

• Avionics industry

• instruments contain their own microprocessor system with extensive embedded software

• A massive amount of hardware and complicated software also exists in the Federal Aviation Administration’s Advanced Automation System, the new generation air traffic control system.

“More common” complexity

• Windows

• 1 to 5 million lines of code

• Your car

• Embedded controllers + software

• Your cell phone

• Your toaster?

What happens when things go wrong?

Inconvenience

• malfunctions of home appliances

Economic damage

• interruptions of banking systems

Loss of life

• failures of flight systems or medical software

Who’s to blame: software? hardware?

• rapid advancement of hardware technology

• proper development of software technology has failed to keep pace in all measures, including quality, productivity, cost, and performance

• by the last decade of the 20th century, computer software had already become the major source of reported outages in many systems

• software assumes a larger burden

• while resting on a less firm foundation than hardware

“Famous” (and a bit funny) failures

• NASA Voyager project

• the Uranus encounter was in jeopardy because of late software deliveries and reduced capability in the Deep Space Network

• Space Shuttle missions

• delayed due to hardware/software interaction problems

• F-16 jet fighter

• “software problems” caused the first flight of the AFTI/F-16 jet fighter to be delayed over a year

• none of the advanced modes originally planned could be used

• Ozone hole over Antarctica

• a data analysis program had suppressed the anomalous data because it was “out of range”

• Denver International Airport

• sat empty for more than a year due to software glitches in an automated baggage-handling system
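The ozone-hole item above can be illustrated with a small sketch. It contrasts silently dropping "out of range" readings with keeping them for review. The range limits, readings, and function names are hypothetical and chosen only for illustration; they are not taken from the real analysis program.

```python
from typing import List, Tuple

# Hypothetical plausibility limits (assumption, not the real instrument's values).
VALID_RANGE = (180.0, 500.0)


def drop_out_of_range(readings: List[float]) -> List[float]:
    """Silently discard readings outside VALID_RANGE (the risky approach)."""
    low, high = VALID_RANGE
    return [r for r in readings if low <= r <= high]


def flag_out_of_range(readings: List[float]) -> Tuple[List[float], List[float]]:
    """Keep in-range readings, but also return the rejected ones for review."""
    low, high = VALID_RANGE
    accepted = [r for r in readings if low <= r <= high]
    rejected = [r for r in readings if not (low <= r <= high)]
    return accepted, rejected


# Made-up readings: two of them fall below the assumed lower limit.
readings = [310.0, 295.0, 150.0, 140.0, 305.0]

silent = drop_out_of_range(readings)
accepted, rejected = flag_out_of_range(readings)

print(silent)    # the two low readings vanish without a trace
print(rejected)  # the same two readings, preserved for a human to inspect
```

Both functions accept the same data for further processing. The difference is only whether the anomaly remains visible to someone who could recognize it as real.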

Tragic failures

• Therac-25 radiation therapy machine

• software errors in its sophisticated control systems caused malfunctions that claimed several patients’ lives in 1985 and 1986

• Computer Aided Dispatch system of the London Ambulance Service broke down right after its installation

• the service had to handle 5000 daily requests to transport patients

• Aviation industry

• misunderstandings between computers and pilots have been implicated in several airline crashes

• in some cases experts hold software control responsible because of the aircraft’s inappropriate reaction to the pilots’ desperate inquiries during an abnormal flight.

Common failures

• Many software systems and packages

• Bugs

• Microsoft is fearful of “killer bugs” which can easily wipe out all the profits of a glorious product if a recall is required on the tens of millions of copies they have sold.

• Who’s seen a blue screen in Windows?

• a fault in a switching system’s newly released software caused massive disruption of a major carrier’s long-distance network

• series of local phone outages traced to software problems

Factors that play an instrumental role in increasing the importance of reliability

• Complex and sophisticated products.

• Over the years engineering products have become more sophisticated and complex. For example, in 1935 a farm tractor had 1200 critical parts and in 1990 the number increased to around 2900. Furthermore, today a typical Boeing 747 jumbo jet airplane is composed of around 4.5 million parts, including fasteners.

• High acquisition cost.

• Many engineering products cost millions of dollars (e.g., commercial airplanes, defense systems, and space satellites). Failure of such items could result in loss of millions of dollars.

• The past well-publicized system failures.

• Three examples of these failures are the Space Shuttle Challenger disaster, the Chernobyl nuclear reactor explosion, and the Point Pleasant Bridge disaster. These disasters occurred in January 1986, April 1986, and December 1967, respectively.

• Q: What about the Fukushima nuclear plant?

The fundamental chain of dependability and security threats

Example 1: Columbia space shuttle

• NASA statement:

• “There was a hole in the protective material of a wing of the Columbia shuttle. When the shuttle re-entered Earth’s atmosphere, the hole caused the wing to overheat. Finally, the overheating of the wing caused the explosion of the shuttle.”

Solution:

Fault = hole in the wing

• belongs to the physical universe

• by activation causes an error

Error = overheat of the wing

• belongs to the information universe

• by propagation causes a failure

Failure = explosion

• belongs to the external universe

• by causation may create another fault
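The fault, error, failure chain above can be sketched as a small data model. This is only an illustrative sketch: the class and function names (Fault, Error, Failure, activate, propagate, cause_fault) are assumptions made for this example, not part of the Avizienis et al. taxonomy text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fault:
    description: str
    active: bool = False                 # a dormant fault causes no error yet
    origin: Optional["Failure"] = None   # set when a failure created this fault


@dataclass
class Error:
    description: str
    cause: Fault


@dataclass
class Failure:
    description: str
    cause: Error


def activate(fault: Fault, error_description: str) -> Error:
    """Activation: the fault becomes active and produces an error."""
    fault.active = True
    return Error(error_description, cause=fault)


def propagate(error: Error, failure_description: str) -> Failure:
    """Propagation: the error reaches the service and becomes a failure."""
    return Failure(failure_description, cause=error)


def cause_fault(failure: Failure, fault_description: str) -> Fault:
    """Causation: a failure may create another fault, starting a new chain."""
    return Fault(fault_description, origin=failure)


# The Columbia example from the slide, expressed with the sketch above.
fault = Fault("hole in the wing")
error = activate(fault, "overheat of the wing")
failure = propagate(error, "explosion")

chain = " -> ".join(
    [failure.cause.cause.description, failure.cause.description, failure.description]
)
print(chain)  # hole in the wing -> overheat of the wing -> explosion
```

Calling cause_fault on a failure yields a new Fault whose origin points back to that failure, which is how longer chains of cause and effect can be represented.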

Example 2: Genetic mechanisms drive cancer progression

Mutation inactivates tumor suppressor gene

Cells proliferate

Mutation inactivates DNA repair gene

Mutation of proto-oncogene creates an oncogene

Mutation inactivates several more tumor suppressor genes

Cancer

Cancer seen as a causal chain of faults

Fault = mutation inactivates tumor suppressor gene

Error = mutated cells proliferate

Failure = cells cannot stop themselves from turning into something else

Fault = new mutation (oncogene)

Error = twice mutated cells proliferate

Failure = cells turn into something else

Fault = new mutation (inactivate DNA repair)

Error = thrice mutated cells proliferate

Failure = cells cannot be stopped by the local control

Fault = new mutation (other tumor suppressors inactivated)


Failure = Cancer

Ways to dependability

• Fault prevention means to prevent the occurrence or introduction of faults.

• Fault tolerance means to avoid service failures in the presence of faults.

• Fault removal means to reduce the number and severity of faults.

• Fault forecasting means to estimate the present number, the future incidence, and the likely consequences of faults.
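As one possible illustration of fault tolerance from the list above, the sketch below uses majority voting over redundant replicas, so that a single faulty replica does not cause a service failure. Majority voting is just one technique chosen for this example; the function names and the injected fault are hypothetical.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(replicas: List[Callable[[int], int]], x: int) -> int:
    """Run every replica on x and return the value a strict majority agrees on."""
    results = [replica(x) for replica in replicas]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        # Too many replicas disagree: the fault cannot be masked.
        raise RuntimeError("no majority among replicas")
    return value


def square(x: int) -> int:
    """Correct replica."""
    return x * x


def faulty_square(x: int) -> int:
    """Replica with a hypothetical injected fault (off by one)."""
    return x * x + 1


# Two correct replicas outvote the faulty one, so the service stays correct.
result = majority_vote([square, faulty_square, square], 4)
print(result)  # 16
```

The fault is still present in the system. The voter only prevents it from turning into a service failure, which is the distinction between fault tolerance and fault removal.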

Do we have that in the human body?

• Fault prevention means to prevent the occurrence or introduction of faults.

• Fault tolerance means to avoid service failures in the presence of faults.

• Fault removal means to reduce the number and severity of faults.

• Fault forecasting means to estimate the present number, the future incidence, and the likely consequences of faults

Fault = mutation inactivates tumor suppressor gene

Error = mutated cells proliferate

Failure = cells cannot stop themselves from turning into something else

Fault = new mutation (oncogene)

Error = twice mutated cells proliferate

Failure = cells turn into something else

Fault = new mutation (inactivate DNA repair)

Error = thrice mutated cells proliferate

Failure = cells cannot be stopped by the local control

Fault = new mutation (other tumor suppressors inactivated)


Failure = Cancer

Note: Fault prevention and fault tolerance aim to provide the ability to deliver a service that can be trusted, while fault removal and fault forecasting aim to reach confidence in that ability by justifying that the functional and the dependability and security specifications are adequate and that the system is likely to meet them.

Readings for this chapter:

• Avizienis, Algirdas, Jean-Claude Laprie, Brian Randell, and Carl Landwehr. "Basic Concepts and Taxonomy of Dependable and Secure Computing." IEEE Transactions on Dependable and Secure Computing 1, no. 1 (2004): 11-33.

• Dhillon, B. S. Reliability, quality, and safety for engineers. Boca Raton, Florida: CRC Press, 2005.

• Lyu, Michael R. "Introduction." In Handbook of Software Reliability Engineering, edited by Michael R. Lyu, 3-25. IEEE Computer Society Press and McGraw-Hill Book Company, 2006.

• Xing, Liudong. ECE591-03 Dependable Computing and Networking Course. Spring 2012. http://www.ece.umassd.edu/Faculty/lxing/Homepage/ECE591-S12-Homepage/ECE591-S12-Lecture.html.

What’s next?

2. Software reliability 1 - introduction

• Software Reliability and System Reliability

• Software Reliability Modeling Survey

• Techniques for Prediction Analysis and Recalibration

• The Operational Profile

3. Software reliability 2 - data analysis and measurement

4. Software reliability 3 - testing, simulation and solutions

5. Hardware reliability - parallel to software

6. FMEA/FMECA, Six Sigma, HAZOP

7. Event & fault tree analysis

Administrative details

• Time:

• course will last 7 weeks

• should take approximately 6-10 hours per week of dedicated time to complete, with all its readings and assignments

• To pass the class

• Application part:

• Select a project and present it in 5 minutes

• The list will be sent as a Google Doc tomorrow

• May be a team, but each individual must speak

• Final exam:

• Present 3 examples that illustrate ideas from any course

• Examples from any domain

• Individually
