birolini_affidabilita 121130


Modeling Reliability & Availability of Complex Systems: Possibilities and Limits

Prof. Dr. A. Birolini, emeritus ETH Zürich

Bologna, 30/11/2012

1. Some basic definitions
2. Non repairable systems up to system failure
3. Repairable series-parallel structures (exact solutions, approximate expressions)
4. Repairable complex systems (preventive maintenance, imperfect switching, incomplete coverage, more than one failure mode, common cause failures, reconfigurable systems, networks, human aspects)
5. Alternative investigation methods
6. Conclusions
7. Some basic literature sources


1. Some basic definitions

Reliability ($R$, $R(t)$)

Probability that the item is able to perform as required for a given time interval

$\tau$ = failure-free time, $F(t) = \Pr\{\tau \le t\}$, $R(t) = \Pr\{\tau > t\} = 1 - F(t)$, $R(0) = 1$

$E[\tau] = \int_0^\infty R(t)\,dt = MTTF$ (mean operating time to failure)

at system level: $R_{Si}(t) = \Pr\{\text{system up in } (0, t] \mid Z_i \text{ is entered at } t = 0\}$, $MTTF_{Si} = \int_0^\infty R_{Si}(t)\,dt$

$\tau'$ = repair (restoration) time, $G(t) = \Pr\{\tau' \le t\}$, $G(0) = 0$

$E[\tau'] = \int_0^\infty (1 - G(t))\,dt = MTTR$ (mean time to repair (restoration))

at system level: $MTTR_S$

Availability, Point Availability ($A(t)$, $PA(t)$)

Probability that the item is in a state to perform as required at a given instant

steady-state value: $PA = MTTF / (MTTF + MTTR)$

at system level: $PA_S = MTTF_S / (MTTF_S + MTTR_S)$

(see e.g. [3] for other forms of availability)
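The two definitions above, MTTF as the integral of the reliability function and the steady-state availability MTTF/(MTTF + MTTR), can be checked numerically. The following is a minimal sketch; the function names, the exponential example and the numerical values (failure rate 10^-3, MTTR 5) are illustrative assumptions, not taken from the slides.

```python
import math

def mttf_from_reliability(reliability, horizon, steps=100_000):
    """Approximate MTTF = integral of R(t) over [0, horizon] (trapezoidal rule).

    The horizon must be large enough that R(horizon) is negligible.
    """
    h = horizon / steps
    total = 0.5 * (reliability(0.0) + reliability(horizon))
    for k in range(1, steps):
        total += reliability(k * h)
    return total * h

def steady_state_availability(mttf, mttr):
    """Steady-state point availability PA = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

# Hypothetical item with constant failure rate, i.e. R(t) = exp(-lam * t)
lam = 1e-3
mttf = mttf_from_reliability(lambda t: math.exp(-lam * t), horizon=30 / lam)
pa = steady_state_availability(mttf, mttr=5.0)
```

For a constant failure rate the integral reproduces MTTF = 1/λ, so the sketch mainly illustrates how an arbitrary R(t) could be plugged in.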



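Failure Rate ($\lambda(t)$)

$\lambda(t) = \lim_{\delta t \downarrow 0} \frac{1}{\delta t} \Pr\{t < \tau \le t + \delta t \mid \tau > t\} = \frac{f(t)}{1 - F(t)} = -\frac{dR(t)/dt}{R(t)}$

With $R(0) = 1$: $R(t) = e^{-\int_0^t \lambda(x)\,dx}$; for $\lambda(t) = \lambda$: $R(t) = e^{-\lambda t}$

Repair Rate ($\mu(t)$)

$\mu(t) = \frac{g(t)}{1 - G(t)}$; for $\mu(t) = \mu$: $G(t) = 1 - e^{-\mu t}$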
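The relation between failure rate and reliability function, R(t) = exp(-∫ λ(x) dx) with R(0) = 1, can be sketched as follows. The helper name and both example failure rates (a constant one and a linearly increasing one, as a stand-in for wear-out) are assumptions chosen for illustration.

```python
import math

def reliability_from_failure_rate(failure_rate, t, steps=10_000):
    """R(t) = exp(-integral_0^t lambda(x) dx), assuming R(0) = 1 (trapezoidal rule)."""
    if t == 0:
        return 1.0
    h = t / steps
    integral = 0.5 * (failure_rate(0.0) + failure_rate(t))
    for k in range(1, steps):
        integral += failure_rate(k * h)
    return math.exp(-integral * h)

# Constant failure rate: R(t) = exp(-lam * t)
lam = 2e-4
r_const = reliability_from_failure_rate(lambda x: lam, t=1000.0)

# Hypothetical increasing failure rate lambda(x) = a * x: R(t) = exp(-a * t**2 / 2)
a = 1e-6
r_wear = reliability_from_failure_rate(lambda x: a * x, t=1000.0)
```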

Reliability Block Diagram

Block diagram showing how failures of subitems, represented by the blocks, can result in a failure of the item

The reliability block diagram is an event diagram:

in series: all elements which must operate

in parallel: elements which can fail (redundant elements)

removed: all elements which are not relevant for the required function



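2. Non repairable systems up to system failure

Table 2.1 Series-parallel structures and associated reliability functions ($R_S = R_{S0}(t)$; $R_i = R_i(t)$, $R_i(0) = 1$)

No. | Reliability Block Diagram | Reliability Function | Remarks
1 | $E_i$ | $R_S = R_i$ | One-item structure; $\lambda(t) = \lambda_i \Rightarrow R_i(t) = e^{-\lambda_i t}$
2 | $E_1, E_2, \ldots, E_n$ in series | $R_S = \prod_{i=1}^{n} R_i$ | Series structure; $\lambda_S(t) = \lambda_1(t) + \ldots + \lambda_n(t)$
3 | $E_1, E_2$ in parallel (1-out-of-2) | $R_S = R_1 + R_2 - R_1 R_2$ | 1-out-of-2 redundancy; $R_1(t) = R_2(t) = e^{-\lambda t} \Rightarrow R_S(t) = 2e^{-\lambda t} - e^{-2\lambda t}$
4 | $E_1, E_2, \ldots, E_n$ in parallel (k-out-of-n) | $E_1 = \ldots = E_n = E$, $R_1 = \ldots = R_n = R$: $R_S = \sum_{i=k}^{n} \binom{n}{i} R^i (1 - R)^{n-i}$ | k-out-of-n redundancy; for $k = 1$: $R_S = 1 - (1 - R)^n$
5 | $E_1, E_2, E_3$ in series, in parallel with $E_4, E_5$ in series, followed by $E_6, E_7$ in series | $R_S = (R_1 R_2 R_3 + R_4 R_5 - R_1 R_2 R_3 R_4 R_5)\, R_6 R_7$ | Series-parallel structure
6 | $E_1, E_2, E_3$ in parallel (2-out-of-3) followed by the voter $E_\nu$ (diagram label: Alarm) | $E_1 = E_2 = E_3 = E$, $R_1 = R_2 = R_3 = R$: $R_S = (3R^2 - 2R^3)\, R_\nu$ | Majority redundancy; general case $(n + 1)$-out-of-$(2n + 1)$, $n = 1, 2, \ldots$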
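The series, 1-out-of-2, k-out-of-n and majority-redundancy expressions of Table 2.1 translate directly into small functions. This is a minimal sketch with hypothetical function names and example values; it assumes independent elements, as the table does.

```python
from math import comb, prod

def series(reliabilities):
    """Series structure: product of the element reliabilities."""
    return prod(reliabilities)

def one_out_of_two(r1, r2):
    """1-out-of-2 redundancy: R1 + R2 - R1*R2."""
    return r1 + r2 - r1 * r2

def k_out_of_n(k, n, r):
    """k-out-of-n redundancy with n identical elements of reliability r."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))

def majority_2_out_of_3(r, r_voter):
    """2-out-of-3 majority redundancy in series with a voter of reliability r_voter."""
    return (3 * r**2 - 2 * r**3) * r_voter

def series_parallel_row5(r1, r2, r3, r4, r5, r6, r7):
    """(E1 E2 E3 in series) parallel to (E4 E5 in series), followed by E6 and E7."""
    return (r1 * r2 * r3 + r4 * r5 - r1 * r2 * r3 * r4 * r5) * r6 * r7

# Example values (hypothetical)
r_series = series([0.99, 0.98, 0.97])
r_parallel = one_out_of_two(0.9, 0.9)
r_2oo3 = k_out_of_n(2, 3, 0.9)
```

For k = 2, n = 3 the binomial sum and the closed form 3R² − 2R³ agree, which is a quick consistency check between rows 4 and 6.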



Methods for non series-parallel, or complex, structures

key item

successful path

state space

Boolean functions

parallel models with load sharing

elements with more than one failure mechanism or mode

mechanical structures

Table 2.1 Complex structures and associated reliability functions (key item method)

No. | Reliability Block Diagram | Reliability Function | Remarks
7 | $E_1$, $E_3$ in the upper path, $E_2$, $E_4$ in the lower path, $E_5$ as bridge | $R_S = R_5 (R_1 + R_2 - R_1 R_2)(R_3 + R_4 - R_3 R_4) + (1 - R_5)(R_1 R_3 + R_2 R_4 - R_1 R_2 R_3 R_4)$ | Bridge structure (bi-directional on $E_5$)
8 | $E_1$, $E_3$, $E_2$, $E_4$, $E_5$ | $R_S = R_4 [R_2 + R_1 (R_3 + R_5 - R_3 R_5) - R_1 R_2 (R_3 + R_5 - R_3 R_5)] + (1 - R_4) R_1 R_3$ | Bridge structure (unidirectional on $E_5$)
9 | $E_1$, $E_2$, $E_3$, $E_4$, $E_5$, $E_2$ | $R_S = R_2 R_1 (R_4 + R_5 - R_4 R_5) + (1 - R_2) R_1 R_3 R_5$ | The element $E_2$ appears twice in the reliability block diagram (not in the hardware)
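The key item method for the bi-directional bridge can be cross-checked by enumerating all 2^5 element states. The sketch below assumes the usual bridge topology (E1 and E3 in one path, E2 and E4 in the other, E5 connecting the two midpoints) and independent elements; the function names and the example value 0.9 are illustrative.

```python
from itertools import product

def bridge_key_item(r1, r2, r3, r4, r5):
    """Key item method, conditioning on E5 (bi-directional bridge)."""
    e5_up = (r1 + r2 - r1 * r2) * (r3 + r4 - r3 * r4)   # E5 works
    e5_down = r1 * r3 + r2 * r4 - r1 * r2 * r3 * r4     # E5 failed
    return r5 * e5_up + (1 - r5) * e5_down

def bridge_enumeration(reliabilities):
    """Sum the probabilities of all element states with at least one working path."""
    total = 0.0
    for state in product((0, 1), repeat=5):
        e1, e2, e3, e4, e5 = state
        works = (e1 and e3) or (e2 and e4) or (e1 and e5 and e4) or (e2 and e5 and e3)
        if works:
            p = 1.0
            for up, r in zip(state, reliabilities):
                p *= r if up else 1 - r
            total += p
    return total

r = [0.9] * 5
rs_key_item = bridge_key_item(*r)
rs_enumerated = bridge_enumeration(r)
```

Both routes give the same value, which illustrates that the key item method is simply the total probability theorem applied to the state of the chosen element.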



3. Repairable series-parallel structures

Investigation is performed using stochastic processes

Table 6.1 Basic stochastic processes for reliability & availability analysis of repairable systems

Stochastic process | Can be used in modeling | Background | Difficulty
Renewal process | One-item structure (spare part) with arbitrary failure rate, negligible repair time, new after repair | Renewal theory | Medium
Alternating renewal process | One-item repairable structure with arbitrary failure and repair rates, new after repair | Renewal theory | Medium
Markov process (MP) (finite state space, time-homogeneous, regenerative at every time $t$) | Systems of arbitrary structure whose elements have constant failure and repair rates $(\lambda_i, \mu_i)$* during the stay time in every state (not necessarily at a state change, e.g. because of load sharing) | Differential equations or integral equations | Low
Semi-Markov process (SMP) (regenerative at every state change) | Some systems with only one repair crew, whose elements have constant failure and arbitrary repair rates* | Integral equations | Medium
Semi-regenerative process (process with an embedded SMP with $\ge 2$ states) | Systems of arbitrary structure with only one repair crew, whose elements have constant failure and arbitrary repair rates* | Integral equations | High
Regenerative process with just one (or some few) regeneration state(s) | Systems of arbitrary structure whose elements have constant failure and arbitrary repair rates (in some cases constant failure rate only in a reserve state) | Integral equations | High to very high
Non regenerative process | Systems of arbitrary structure whose elements have arbitrary failure and repair rates | Partial differential equations | High to very high

* Constant failure rate can be extended to Erlang distributed failure-free times; the same holds for repair times



To simplify, let us first consider Markov processes.

Diagram of transition probabilities in $(t, t + \delta t]$ (later of transition rates, to simplify the notation)

Examples

[Figure 6.8: two diagrams with the states $Z_0$, $Z_1$, $Z_2$; a) for the point availability, b) for the reliability function ($Z_2$ absorbing)]

a) $\rho_{01} = \rho_0 = \lambda + \lambda_r$, $\rho_{10} = \rho_{21} = \rho_2 = \mu$, $\rho_{12} = \lambda$, $\rho_1 = \lambda + \mu$

b) $\rho_{01} = \rho_0 = \lambda + \lambda_r$, $\rho_{10} = \mu$, $\rho_{12} = \lambda$, $\rho_1 = \lambda + \mu$

Figure 6.8 Diagrams of the transition probabilities in $(t, t + \delta t]$ for a repairable 1-out-of-2 warm redundancy with 2 identical elements (ideal failure detection & switch, one repair crew, $Z_2$ down state, arbitrary $t$, $\delta t \downarrow 0$)



[Reliability block diagram: $E_1$ in series with the 1-out-of-2 active redundancy $E_2$, $E_3$, with $E_2 = E_3 = E$]

Distribution of

failure-free times: $F(t) = 1 - e^{-\lambda t}$ for $E$, $F(t) = 1 - e^{-\lambda_1 t}$ for $E_1$

repair times: $G(t) = 1 - e^{-\mu t}$ for $E$, $G(t) = 1 - e^{-\mu_1 t}$ for $E_1$

c) No repair priority (repair as per first-in first-out, yielding 16 states for $E_2 \ne E_3$), states $Z_0, \ldots, Z_8$:

$\rho_{01} = \rho_{23} = \rho_{47} = \lambda_1$; $\rho_{02} = \rho_{15} = 2\lambda$; $\rho_{10} = \rho_{52} = \rho_{64} = \mu_1$; $\rho_{20} = \rho_{31} = \rho_{42} = \rho_{73} = \rho_{85} = \mu$; $\rho_{24} = \rho_{38} = \rho_{56} = \lambda$

d) As c), but no further failures at system down, states $Z_0, \ldots, Z_4$:

$\rho_{01} = \rho_{23} = \lambda_1$; $\rho_{02} = 2\lambda$; $\rho_{24} = \lambda$; $\rho_{10} = \mu_1$; $\rho_{20} = \rho_{31} = \rho_{42} = \mu$

Figure A7.6 Diagram of transition probabilities in $(t, t + \delta t]$ for a repairable series-parallel structure with $E_2 = E_3 = E$, ideal failure detection & switch, one repair crew



[Figure 6.20: reliability block diagram with $E_1$ ($\lambda_1, \mu_1$), $E_2$ ($\lambda_2, \mu_2$), $E_{2'}$ ($\lambda_2, \mu_2$) and $E_3$ ($\lambda_3, \mu_3$), where $E_2$ and $E_{2'}$ form a 1-out-of-2 active redundancy ($E_{2'} = E_2$); diagram of transition probabilities with the 12 states $Z_0, \ldots, Z_{11}$; diagram label: repair priority $E_1$, $E_3$, $E_2$]

Figure 6.20 Reliability block diagram and diagram of transition probabilities in $(t, t + \delta t]$, ideal failure detection & switch, one repair crew, repair priority in the sequence $E_1, E_2, E_3$, no further failures at system down

Note: The diagram of transition probabilities would have 14 states for $E_2 \ne E_{2'}$, 16 states for totally independent elements, and 65 states for $E_2 \ne E_{2'}$, one repair crew & repair as per first-in first-out



Relationships for the reliability and point availability of systems described by time-homogeneous Markov processes (method of integral equations, $U$ = set of the system up states)

Reliability function

$R_{Si}(t) = e^{-\rho_i t} + \sum_{j \ne i,\; Z_j \in U} \int_0^t \rho_{ij}\, e^{-\rho_i x}\, R_{Sj}(t - x)\,dx$, $Z_i \in U$, $t > 0$, $R_{Si}(0) = 1$ (A7.122)

$MTTF_{Si} = \frac{1}{\rho_i} + \sum_{j \ne i,\; Z_j \in U} \frac{\rho_{ij}}{\rho_i}\, MTTF_{Sj}$, $Z_i \in U$ (A7.173)

$\rho_{ij}$ = transition rate, $\rho_i = \sum_{j = 0,\; j \ne i}^{m} \rho_{ij}$ (A7.103)

Point availability

$PA_{Si}(t) = \sum_{Z_j \in U} P_{ij}(t)$, $i = 0, \ldots, m$, $t > 0$ (A7.112)

$PA_S = \lim_{t \to \infty} PA_{Si}(t) = \sum_{Z_j \in U} P_j$ (A7.131)

$P_{ij}(t) = \delta_{ij}\, e^{-\rho_i t} + \sum_{k \ne i} \int_0^t \rho_{ik}\, e^{-\rho_i x}\, P_{kj}(t - x)\,dx$, $i, j \in \{0, \ldots, m\}$, $t > 0$, $P_{ij}(0) = \delta_{ij}$, with $\delta_{ii} = 1$ and $\delta_{ij} = 0$ for $j \ne i$ (A7.120)

$P_j = \lim_{t \to \infty} P_j(t) = \lim_{t \to \infty} P_{ij}(t)$, from $\rho_j P_j = \sum_{i = 0,\; i \ne j}^{m} P_i\, \rho_{ij}$ (A7.179), (A7.127)

One equation for $P_j$ must be dropped and replaced by $P_0 + \ldots + P_m = 1$
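For a time-homogeneous Markov model, the MTTF relation (A7.173) and the steady-state balance equations with the normalization P_0 + ... + P_m = 1 are plain linear systems. The sketch below solves them with a small Gauss-Jordan routine in pure Python. The dict-of-dicts rate representation, the function names and the numerical rates are assumptions; the example uses the 1-out-of-2 warm redundancy with ρ01 = λ + λr, ρ10 = ρ21 = μ, ρ12 = λ.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def mttf_from_rates(rates, up_states):
    """Solve rho_i*MTTF_Si - sum_{j in U, j != i} rho_ij*MTTF_Sj = 1 for all Z_i in U."""
    up = list(up_states)
    a = []
    for i in up:
        rho_i = sum(rates[i].values())  # includes transitions to down states
        a.append([rho_i if j == i else -rates[i].get(j, 0.0) for j in up])
    return dict(zip(up, solve_linear(a, [1.0] * len(up))))

def steady_state(rates):
    """Solve rho_j*P_j = sum_{i != j} P_i*rho_ij, last equation replaced by sum(P) = 1."""
    n = len(rates)
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        a[j][j] = sum(rates[j].values())
        for i in range(n):
            if i != j:
                a[j][i] = -rates[i].get(j, 0.0)
    a[n - 1] = [1.0] * n
    return solve_linear(a, [0.0] * (n - 1) + [1.0])

# Hypothetical rates for a 1-out-of-2 warm redundancy, one repair crew
lam, lam_r, mu = 1e-3, 5e-4, 0.5
rates = {0: {1: lam + lam_r}, 1: {0: mu, 2: lam}, 2: {1: mu}}

mttf_up = mttf_from_rates(rates, up_states=[0, 1])   # Z2 is the down state
p = steady_state(rates)
pa_s = p[0] + p[1]
```

Using the same rate table for both quantities reflects that only the treatment of the down state differs: it is simply left out of the MTTF system (absorbing), while it keeps its repair transition in the balance equations.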



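Results for a one-item structure

$R_{S0}(t) = e^{-\lambda t}$, $t > 0$, $MTTF_{S0} = 1/\lambda$ (6.14), (6.15)

$PA_{S0}(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu) t} \approx 1 - (\lambda/\mu)(1 - e^{-\mu t})$ for $\lambda \ll \mu$ (6.20)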

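Results for a 1-out-of-2 warm redundancy

$R_{S0}(t) \approx e^{-\lambda_S t}$, $t > 0$, $\lambda_S = \frac{1}{MTTF_{S0}} = \frac{\lambda (\lambda + \lambda_r)}{2\lambda + \lambda_r + \mu}$ (6.94)

$PA_{S0}(t) \approx PA_S + (1 - PA_S)\, e^{-\mu t}$, $t > 0$ (6.88)

$\lim_{t \to \infty} PA_{S0}(t) = PA_S = \frac{\mu (\lambda + \lambda_r + \mu)}{(\lambda + \lambda_r)(\lambda + \mu) + \mu^2} \approx 1 - \frac{\lambda (\lambda + \lambda_r)}{\mu^2}$ (6.87)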
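To get a feeling for the gain brought by redundancy, the closed-form results for the one-item structure and for the 1-out-of-2 warm redundancy (Z0 to Z1 with rate λ + λr, Z1 to Z2 with rate λ, repair rate μ, one repair crew) can be evaluated side by side. This is a minimal sketch; the function names and the numerical rates are illustrative assumptions.

```python
def one_item(lam, mu):
    """MTTF and steady-state availability of a repairable one-item structure."""
    return 1.0 / lam, mu / (lam + mu)

def warm_one_out_of_two(lam, lam_r, mu):
    """MTTF_S0 and PA_S of a repairable 1-out-of-2 warm redundancy (one repair crew)."""
    mttf = (2 * lam + lam_r + mu) / (lam * (lam + lam_r))
    pa = mu * (lam + lam_r + mu) / ((lam + lam_r) * (lam + mu) + mu**2)
    return mttf, pa

# Hypothetical rates (per hour): failure 1e-3, reserve failure 5e-4, repair 0.5
lam, lam_r, mu = 1e-3, 5e-4, 0.5
mttf_single, pa_single = one_item(lam, mu)
mttf_redundant, pa_redundant = warm_one_out_of_two(lam, lam_r, mu)

# First-order approximation of the unavailability, valid for lam << mu
unavailability_approx = lam * (lam + lam_r) / mu**2
```

With λ much smaller than μ, the MTTF grows roughly by the factor μ/(λ + λr) and the unavailability drops from about λ/μ to about λ(λ + λr)/μ².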



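Approximate Expressions for Large Series-Parallel Structures

totally independent elements

macro-structures

one repair crew and no further failures at system down

cutting states with more than k failures

clustering of states

Example using macro-structures (Table 6.10)

[Reliability block diagram of the system with $E_1$ ($\lambda_1, \mu_1$), $E_2$ ($\lambda_2, \mu_2$), $E_3$ ($\lambda_3, \mu_3$), $E_2$ ($\lambda_2, \mu_2$); reduced first to $E_1$ ($\lambda_1, \mu_1$) and the macro-structure $E_7$ ($\lambda_7, \mu_7$), then to $E_S$ ($\lambda_S, \mu_S$)]

$\lambda_7 \approx \lambda_3 + 2\lambda_2^2/\mu_2$, $\mu_7 \approx \lambda_7 / (\lambda_3/\mu_3 + 2\lambda_2^2/\mu_2^2)$ (6.180)

$\lambda_S \approx \frac{\lambda_1 \lambda_7 (\mu_1 + \mu_7)}{\mu_1 \mu_7}$, $\mu_S \approx \frac{\mu_1 \mu_7 (\mu_1 + \mu_7)}{\mu_1^2 + \mu_7^2}$ (6.181)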



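Table 6.10 Basic macro-structures, constant failure & repair rates ($\lambda$, $\mu$), active redundancy, ideal failure detection & switch, one repair crew for each macro-structure, repair priority on E, no further failure at system down

One-item structure $E$ ($\lambda, \mu$): $\lambda_S = 1/MTTF_{S0} = \lambda$, $PA_S = 1/(1 + \lambda/\mu) \approx 1 - \lambda/\mu$, $\mu_S = \mu = \frac{\lambda_S\, PA_S}{1 - PA_S} \approx \frac{\lambda_S}{1 - PA_S}$

Series structure $E_1$ ($\lambda_1, \mu_1$), ..., $E_n$ ($\lambda_n, \mu_n$): $\lambda_S = 1/MTTF_{S0} = \lambda_1 + \ldots + \lambda_n$, $PA_S = 1/(1 + \lambda_1/\mu_1 + \ldots + \lambda_n/\mu_n) \approx 1 - (\lambda_1/\mu_1 + \ldots + \lambda_n/\mu_n)$, $\mu_S \approx \frac{\lambda_S}{1 - PA_S} \approx \frac{\lambda_1 + \ldots + \lambda_n}{\lambda_1/\mu_1 + \ldots + \lambda_n/\mu_n}$ ($= \mu$ for $\mu_1 = \ldots = \mu_n = \mu$)

1-out-of-2 (active) $E_1$ ($\lambda_1, \mu_1$), $E_2$ ($\lambda_2, \mu_2$): $\lambda_S = 1/MTTF_{S0} \approx \frac{\lambda_1 \lambda_2 (\mu_1 + \mu_2)}{\mu_1 \mu_2}$, $PA_S \approx 1 - \frac{\lambda_1 \lambda_2 (\mu_1^2 + \mu_2^2)}{\mu_1^2 \mu_2^2}$, $\mu_S \approx \frac{\lambda_S}{1 - PA_S} \approx \frac{\mu_1 \mu_2 (\mu_1 + \mu_2)}{\mu_1^2 + \mu_2^2}$ ($= \mu$ for $\mu_1 = \mu_2 = \mu$)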

1-out-of-2 active ($E_1 = E_2 = E$), repair priority on E: $1/\lambda_S = MTTF_{S0} =$ …


Comparison of results

$\lambda_1$ | 1/100 | 1/100 | 1/1,000 | 1/1,000
$\lambda_2$ | 1/1,000 | 1/1,000 | 1/10,000 | 1/10,000
$\lambda_3$ | 1/10,000 | 1/10,000 | 1/100,000 | 1/100,000
$\mu_1$ | 1 | 1/5 | 1 | 1/5
$\mu_2$ | 1/5 | 1/5 | 1/5 | 1/5
$\mu_3$ | 1/5 | 1/5 | 1/5 | 1/5
$MTTF_{S0}$ (Eq. (6.178), totally independent elements) | 1.58·10^5 | 9.30·10^4 | 1.66·10^7 | 9.93·10^6
$MTTF_{S0}$ (Eq. (6.182), macro-structures) | 1.53·10^5 | 9.14·10^4 | 1.65·10^7 | 9.91·10^6
$MTTF_{S0}$ (Eq. (6.186), no further failures at system down) | 1.59·10^5 | 9.33·10^4 | 1.66·10^7 | 9.93·10^6
$MTTF_{S0}$ (Method 4, cutting) | 1.49·10^5 | 9.29·10^4 | 1.65·10^7 | 9.92·10^6
$MTTF_{S0}$ (only one repair crew) | 1.60·10^5 | 9.33·10^4 | 1.66·10^7 | 9.93·10^6
$1 - PA_S$ (Eq. (6.179), totally independent elements) | 5.25·10^-6 | 2.63·10^-5 | 5.03·10^-8 | 2.51·10^-7
$1 - PA_S$ (Eq. (6.183), macro-structures) | 2.81·10^-5 | 5.45·10^-5 | 2.62·10^-7 | 5.05·10^-7
$1 - PA_S$ (Eq. (6.189), no further failures at system down) | 6.61·10^-6 | 6.00·10^-5 | 6.06·10^-8 | 5.06·10^-7
$1 - PA_S$ (Method 4, cutting) | 2.99·10^-5 | 5.56·10^-5 | 2.65·10^-7 | 5.06·10^-7
$1 - PA_S$ (only one repair crew) | 6.58·10^-6 | 5.63·10^-5 | 6.06·10^-8 | 5.06·10^-7
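The macro-structure rows of the comparison can be approximated with a few lines of code. The sketch below is built on assumptions: the system is taken as E1 in parallel with the series connection of E3 and the 1-out-of-2 active redundancy E2, E2', and only first-order macro-structure expressions are used (series: λ_S = Σλ_i, μ_S = λ_S / Σ(λ_i/μ_i); identical 1-out-of-2: λ_S ≈ 2λ²/μ, μ_S ≈ μ; different 1-out-of-2: λ_S ≈ λ1λ2(μ1+μ2)/(μ1μ2), μ_S ≈ μ1μ2(μ1+μ2)/(μ1²+μ2²)). Function names are hypothetical.

```python
def series(*elements):
    """Series macro-structure of (lam, mu) pairs, first-order expressions."""
    lam_s = sum(lam for lam, _ in elements)
    mu_s = lam_s / sum(lam / mu for lam, mu in elements)
    return lam_s, mu_s

def parallel_identical(lam, mu):
    """1-out-of-2 active redundancy with identical elements (assumed first-order form)."""
    return 2 * lam**2 / mu, mu

def parallel_different(e1, e2):
    """1-out-of-2 active redundancy with different elements (assumed first-order form)."""
    (lam1, mu1), (lam2, mu2) = e1, e2
    lam_s = lam1 * lam2 * (mu1 + mu2) / (mu1 * mu2)
    mu_s = mu1 * mu2 * (mu1 + mu2) / (mu1**2 + mu2**2)
    return lam_s, mu_s

def system(lam1, lam2, lam3, mu1, mu2, mu3):
    """E1 parallel to [E3 in series with (E2 parallel E2')], reduced step by step."""
    e7 = series((lam3, mu3), parallel_identical(lam2, mu2))
    return parallel_different((lam1, mu1), e7)

# First column of the comparison: lam = 1/100, 1/1000, 1/10000; mu = 1, 1/5, 1/5
lam_s, mu_s = system(1 / 100, 1 / 1000, 1 / 10000, 1, 1 / 5, 1 / 5)
mttf_s0 = 1 / lam_s
unavailability = lam_s / mu_s
```

With these simplified expressions the first column comes out at roughly 1.5·10^5 for the MTTF and a few 10^-5 for the unavailability, the same order as the macro-structure row; the small differences are expected because only the leading terms are kept here.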



4. Repairable complex systems

Complex repairable systems are fault tolerant repairable systems for which a reliability block diagram does not exist or cannot easily be found

Typical situations for complex systems are

preventive maintenance

imperfect switching

incomplete coverage

more than one failure mode

common cause failures

reconfigurable systems

networks

human aspects

Constant failure rates and, in general, also constant repair rates are assumed



Example for imperfect switching

Failure mode stuck at the state occupied, ideal failure detection and localization, one repair crew, repair priority on switch

[Figure 6.24: a) for reliability, with the states $Z_0$, $Z_{0'}$, $Z_1$, $Z_{1'}$, $Z_2$; b) for availability, with the states $Z_0$, $Z_{0'}$, $Z_1$, $Z_{1'}$, $Z_2$, $Z_{2'}$, $Z_{2''}$]

Figure 6.24 Diagram of transition rates for a repairable 1-out-of-2 warm redundancy (constant failure & repair rates $(\lambda, \mu)$, $\lambda_r$, imperfect switching with its own failure & repair rates, ideal failure detection, no further failure at system down)

$MTTF_{S0} \approx$ … (6.206)

$PA_S = AA_S = P_0 + P_{0'} + P_1 + P_{1'} \approx$ … (6.209)

The effect of imperfect switching becomes negligible for …


Example for incomplete coverage

Identification of the failed element with probability $c$ (with probability $1 - c$ the 1-out-of-2 active redundancy goes down because the outputs of both elements differ), 1 repair crew

[Figure 6.28: states $Z_0$, $Z_1$, $Z_2$; transition $Z_0 \to Z_1$ with rate $2\lambda c$, transition $Z_0 \to Z_2$ with rate $2\lambda (1 - c)$; $Z_2$ down state, absorbing for reliability calculation]

Figure 6.28 State transition diagram for a repairable 1-out-of-2 active redundancy with constant failure & repair rates ($\lambda$, $\mu$), incomplete coverage

$MTTF_{S0} = \frac{\lambda (1 + 2c) + \mu}{2\lambda\,[\lambda + \mu (1 - c)]}$ (6.228)

$PA_S = AA_S = P_0 + P_1 =$ … (6.229)

The effect of incomplete coverage becomes negligible for $2\lambda (1 - c) \ll 2\lambda^2/\mu$, i.e. $1 - c \ll \lambda/\mu$ (6.230)
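How strongly the coverage c limits the gain of the redundancy can be seen by evaluating the MTTF as a function of c. The sketch assumes the three-state model of the example: Z0 to Z1 with rate 2λc, Z0 to Z2 with rate 2λ(1 − c), Z1 to Z0 with rate μ and Z1 to Z2 with rate λ (the last two rates are assumptions, as they are not legible in the diagram), Z2 absorbing. The function name and the numerical rates are illustrative.

```python
def mttf_incomplete_coverage(lam, mu, c):
    """MTTF_S0 of a repairable 1-out-of-2 active redundancy with coverage c.

    Obtained from MTTF_0 = 1/(2*lam) + c*MTTF_1 and
    MTTF_1 = 1/(lam + mu) + mu/(lam + mu) * MTTF_0 (Z2 absorbing).
    """
    return (lam * (1 + 2 * c) + mu) / (2 * lam * (lam + mu * (1 - c)))

# Hypothetical rates: failure 1e-3, repair 0.5
lam, mu = 1e-3, 0.5
mttf_by_coverage = {c: mttf_incomplete_coverage(lam, mu, c) for c in (1.0, 0.999, 0.99, 0.9, 0.0)}
```

For c = 1 the expression reduces to the ideal case (3λ + μ)/(2λ²), and for c = 0 to 1/(2λ), the mean time to the first failure of either element. Between these limits the MTTF falls quickly as soon as μ(1 − c) is no longer small compared with λ.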



Typical causes for common cause failures are

overload (electrical, thermal, mechanical)

technological weakness (material, design, production)

misuse (e.g. caused by operating or maintenance personnel)

external event

Example

[Figure 6.36: reliability block diagram of the 1-out-of-2 active redundancy $E_1$, $E_2$ ($E_1 = E_2 = E$, rates $\lambda$, $\mu$) and diagram of transition rates with the states $Z_0, \ldots, Z_5$ and the common cause transitions labeled $C$]

Figure 6.36 Reliability block diagram and diagram of transition rates for availability calculation of a 1-out-of-2 active redundancy with common cause failures (C) for different possibilities (Fig. 6.35), ideal failure detection & switch, one repair crew ($Z_1$, $Z_3$, $Z_4$, $Z_5$ down states, absorbing for reliability calculation)

$MTTF_{S0} =$ … (6.276)

$PA_S = AA_S =$ …
