birolini_affidabilita 121130
7/23/2019 Birolini_Affidabilita 121130
1/26
Modeling Reliability & Availability of Complex Systems:
Possibilities and Limits
Prof. Dr. A. Birolini, emeritus ETH Zürich
Bologna, 30/11/2012
1. Some basic definitions
2. Non-repairable systems up to system failure
3. Repairable series-parallel structures (exact solutions, approximate expressions)
4. Repairable complex systems (preventive maintenance, imperfect switching, incomplete coverage, more than one failure mode, common cause failures, reconfigurable systems, networks, human aspects)
5. Alternative investigation methods
6. Conclusions
7. Some basic literature sources
Prof. Dr. A. Birolini
1. Some basic definitions
Reliability (R, R(t))
Probability that the item is able to perform as required for a given time interval (0, t]
$\tau$ = failure-free time, $F(t) = \Pr\{\tau \le t\}$, $R(t) = \Pr\{\tau > t\} = 1 - F(t)$, $R(0) = 1$
$E[\tau] = \int_0^\infty R(t)\,dt = MTTF$ (mean operating time to failure)
at system level: $R_{Si}(t) = \Pr\{\text{system up in } (0, t] \mid Z_i \text{ is entered at } t = 0\}$, $MTTF_{Si} = \int_0^\infty R_{Si}(t)\,dt$
$\tau'$ = repair (restoration) time, $G(t) = \Pr\{\tau' \le t\}$, $G(0) = 0$
$E[\tau'] = \int_0^\infty (1 - G(t))\,dt = MTTR$ (mean time to repair (restoration)); at system level $MTTR_S$
Availability, Point Availability (A(t), PA(t))
Probability that the item is in a state to perform as required at a given instant t
steady-state value: $PA = MTTF / (MTTF + MTTR)$
at system level: $PA_S = MTTF_S / (MTTF_S + MTTR_S)$
(see e.g. [3] for other forms of availability)
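A minimal numerical sketch of these definitions, assuming hypothetical constant rates λ = 10^-3 and μ = 0.5 per hour (values not taken from the slides): MTTF and MTTR are obtained by integrating R(t) and 1 − G(t), and the steady-state availability follows as MTTF/(MTTF + MTTR). For constant rates the results should reproduce 1/λ, 1/μ and μ/(λ + μ).

```python
import math

def integrate(f, upper, steps=200_000):
    """Trapezoidal rule for the integral of f over [0, upper]."""
    h = upper / steps
    total = 0.5 * (f(0.0) + f(upper))
    for k in range(1, steps):
        total += f(k * h)
    return total * h

lam, mu = 1e-3, 0.5                       # hypothetical constant rates (per hour)
R = lambda t: math.exp(-lam * t)          # reliability function R(t)
G = lambda t: 1.0 - math.exp(-mu * t)     # distribution function of the repair time

# The infinite integrals are truncated where the integrands are below e^-40
MTTF = integrate(R, upper=40 / lam)
MTTR = integrate(lambda t: 1.0 - G(t), upper=40 / mu)
PA = MTTF / (MTTF + MTTR)                 # steady-state point availability
print(f"MTTF ~ {MTTF:.2f}, MTTR ~ {MTTR:.4f}, PA ~ {PA:.6f}")
```

The truncation limit and step count are arbitrary choices for this illustration; for non-exponential R(t) or G(t) only the two lambdas need to change.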
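Failure Rate (λ(t))
$\lambda(t) = \lim_{\delta t \downarrow 0} \frac{1}{\delta t} \Pr\{t < \tau \le t + \delta t \mid \tau > t\} = \frac{f(t)}{1 - F(t)} = -\frac{dR(t)/dt}{R(t)}$
$R(0) = 1 \;\Rightarrow\; R(t) = e^{-\int_0^t \lambda(x)\,dx}$; $\quad R(t) = e^{-\lambda t} \Leftrightarrow \lambda(t) = \lambda$
Repair Rate (μ(t))
$\mu(t) = \frac{g(t)}{1 - G(t)}$; $\quad G(t) = 1 - e^{-\mu t} \Leftrightarrow \mu(t) = \mu$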
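The two relations above, λ(t) = −R'(t)/R(t) and R(t) = exp(−∫λ(x)dx), can be checked numerically. The sketch assumes a hypothetical Weibull reliability function R(t) = exp(−(λt)^β), chosen only because its failure rate βλ(λt)^(β−1) is time-dependent and known in closed form; the parameter values are illustrative.

```python
import math

lam, beta = 1e-3, 1.5                         # hypothetical Weibull parameters
R = lambda t: math.exp(-(lam * t) ** beta)    # reliability function R(t)
rate_exact = lambda t: beta * lam * (lam * t) ** (beta - 1)  # analytic failure rate

def failure_rate(R, t, h=1e-3):
    """lambda(t) = -(dR/dt) / R(t), with the derivative by central differences."""
    dR = (R(t + h) - R(t - h)) / (2 * h)
    return -dR / R(t)

def reliability_from_rate(rate, t, steps=10_000):
    """R(t) = exp(-integral_0^t lambda(x) dx), integral by the trapezoidal rule."""
    h = t / steps
    s = 0.5 * (rate(0.0) + rate(t)) + sum(rate(k * h) for k in range(1, steps))
    return math.exp(-s * h)

t = 800.0
print(failure_rate(R, t), rate_exact(t))                 # both ~ 1.34e-3
print(reliability_from_rate(rate_exact, t), R(t))        # round trip back to R(t)
```

With β = 1 the same code reduces to the constant failure rate case R(t) = e^{−λt}.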
Reliability Block Diagram
Block diagram showing how failures of subitems, represented by the blocks, can result in a failure of the item. It is an event diagram:
in series: all elements which must operate
in parallel: elements which can fail (redundant elements)
removed: all elements which are not relevant for the required function
2. Non-repairable systems up to system failure
Table 2.1 Series-parallel structures and associated reliability functions ($R_S = R_S(t)$, $R_S(0) = 1$; $R_i = R_i(t)$, $R_i(0) = 1$)
1. One-item structure ($E_i$): $R_S = R_i$; for a constant failure rate, $R_i(t) = e^{-\lambda_i t}$
2. Series structure ($E_1, \dots, E_n$): $R_S = \prod_{i=1}^{n} R_i$; $\lambda_S(t) = \lambda_1(t) + \dots + \lambda_n(t)$
3. 1-out-of-2 redundancy ($E_1$, $E_2$): $R_S = R_1 + R_2 - R_1 R_2$; for $R_1(t) = R_2(t) = e^{-\lambda t}$: $R_S(t) = 2e^{-\lambda t} - e^{-2\lambda t}$
4. k-out-of-n redundancy ($E_1 = \dots = E_n = E$, $R_1 = \dots = R_n = R$): $R_S = \sum_{i=k}^{n} \binom{n}{i} R^i (1 - R)^{n-i}$; for k = 1: $R_S = 1 - (1 - R)^n$
5. Series-parallel structure ($E_1, \dots, E_7$): $R_S = (R_1 R_2 + R_3 R_4 R_5 - R_1 R_2 R_3 R_4 R_5)\, R_6 R_7$
6. Majority redundancy, 2-out-of-3 ($E_1 = E_2 = E_3 = E$, $R_1 = R_2 = R_3 = R$, voter $E_\nu$): $R_S = (3R^2 - 2R^3)\, R_\nu$; general case (n + 1)-out-of-(2n + 1), n = 1, 2, ...
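The reliability functions of Table 2.1 translate into a few one-line functions. This is a sketch, assuming independent elements and a hypothetical constant failure rate and mission time; the function names are illustrative. The usage lines check the special cases stated in the table (k = 1, and 2-out-of-3 equal to the majority function with an ideal voter).

```python
from math import comb, exp, prod

def series(*r):
    """Row 2: R_S = R_1 * ... * R_n."""
    return prod(r)

def one_out_of_two(r1, r2):
    """Row 3: R_S = R_1 + R_2 - R_1 R_2."""
    return r1 + r2 - r1 * r2

def k_out_of_n(k, n, r):
    """Row 4 (identical elements): R_S = sum_{i=k}^{n} C(n, i) R^i (1 - R)^(n - i)."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))

def majority_2oo3(r, r_voter=1.0):
    """Row 6: R_S = (3R^2 - 2R^3) R_voter; r_voter = 1 models an ideal voter."""
    return (3 * r**2 - 2 * r**3) * r_voter

lam, t = 1e-4, 1000.0      # hypothetical constant failure rate and mission time
R = exp(-lam * t)          # row 1: R(t) = exp(-lambda t)

print(series(R, R, R))
print(one_out_of_two(R, R), 2 * exp(-lam * t) - exp(-2 * lam * t))
print(k_out_of_n(1, 3, R), 1 - (1 - R) ** 3)
print(k_out_of_n(2, 3, R), majority_2oo3(R))
```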
Methods for non-series-parallel (complex) structures
key item
successful path
state space
Boolean functions
parallel models with load sharing
elements with more than one failure mechanism or mode
mechanical structures
Table 2.1 Complex structures and associated reliability functions (key item method)
7. Bridge structure (bidirectional on $E_5$): $R_S = R_5 (R_1 + R_2 - R_1 R_2)(R_3 + R_4 - R_3 R_4) + (1 - R_5)(R_1 R_3 + R_2 R_4 - R_1 R_2 R_3 R_4)$
8. Bridge structure (unidirectional on $E_5$): $R_S = R_4 [R_2 + R_1 (R_3 + R_5 - R_3 R_5) - R_1 R_2 (R_3 + R_5 - R_3 R_5)] + (1 - R_4) R_1 R_3$
9. $R_S = R_2 R_1 (R_4 + R_5 - R_4 R_5) + (1 - R_2) R_1 R_3 R_5$; the element $E_2$ appears twice in the reliability block diagram (not in the hardware)
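The key-item formulas for the two bridge structures can be cross-checked against brute-force enumeration of all 2^5 element states. The sketch assumes independent elements, hypothetical element reliabilities, and minimal paths that are consistent with rows 7 and 8: E1-E3, E2-E4, E1-E5-E4 and E2-E5-E3 for the bidirectional case, with E2-E5-E3 dropped when E5 is usable in one direction only.

```python
from itertools import product

def key_item_bidirectional(r1, r2, r3, r4, r5):
    """Table 2.1, row 7 (key item E5)."""
    return (r5 * (r1 + r2 - r1 * r2) * (r3 + r4 - r3 * r4)
            + (1 - r5) * (r1 * r3 + r2 * r4 - r1 * r2 * r3 * r4))

def key_item_unidirectional(r1, r2, r3, r4, r5):
    """Table 2.1, row 8 (key item E4)."""
    a = r3 + r5 - r3 * r5
    return r4 * (r2 + r1 * a - r1 * r2 * a) + (1 - r4) * r1 * r3

def enumerate_states(paths, r):
    """Exact R_S: sum the probabilities of all up/down states in which
    at least one path has all of its elements up."""
    total = 0.0
    for state in product((0, 1), repeat=len(r)):      # 1 = element up
        if any(all(state[i] for i in path) for path in paths):
            p = 1.0
            for up, ri in zip(state, r):
                p *= ri if up else 1 - ri
            total += p
    return total

r = (0.9, 0.8, 0.85, 0.7, 0.95)                 # hypothetical R_1 ... R_5
bi_paths = [(0, 2), (1, 3), (0, 4, 3), (1, 4, 2)]  # 0-based element indices
uni_paths = [(0, 2), (1, 3), (0, 4, 3)]

print(key_item_bidirectional(*r), enumerate_states(bi_paths, r))
print(key_item_unidirectional(*r), enumerate_states(uni_paths, r))
```

Enumeration grows as 2^n, which is why the key item method is preferred for hand calculation; for small structures it is a convenient independent check.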
3. Repairable series-parallel structures
Investigation is performed using stochastic processes.
Table 6.1 Basic stochastic processes for reliability & availability analysis of repairable systems
Stochastic process | Can be used in modeling | Background | Difficulty
Renewal process | One-item structure (spare part) with arbitrary failure rate, negligible repair time, new after repair | Renewal theory | Medium
Alternating renewal process | One-item repairable structure with arbitrary failure and repair rates, new after repair | Renewal theory | Medium
Markov process (MP) (finite state space, time-homogeneous, regenerative at every time t) | Systems of arbitrary structure whose elements have constant failure and repair rates ($\lambda_i$, $\mu_i$)* during the stay time in every state (not necessarily at a state change, e.g. because of load sharing) | Differential equations or integral equations | Low
Semi-Markov process (SMP) (regenerative at every state change) | Some systems with only one repair crew, whose elements have constant failure and arbitrary repair rates* | Integral equations | Medium
Semi-regenerative process (process with an embedded SMP with 2 states) | Systems of arbitrary structure with only one repair crew, whose elements have constant failure and arbitrary repair rates* | Integral equations | High
Regenerative process with just one (or some few) regeneration state(s) | Systems of arbitrary structure whose elements have constant failure and arbitrary repair rates (in some cases constant failure rate only in a reserve state) | Integral equations | High to very high
Non-regenerative process | Systems of arbitrary structure whose elements have arbitrary failure and repair rates | Partial differential equations | High to very high
* the constant failure rate can be extended to Erlang-distributed failure-free times; the same holds for repair times
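To simplify, let us first consider Markov processes:
diagram of transition probabilities in $(t, t + \delta t]$ (later, diagram of transition rates, to simplify the notation)
Examples
[Diagrams a) and b): states Z0, Z1, Z2; Z0 → Z1 with probability $(\lambda + \lambda_r)\delta t$, Z1 → Z0 with $\mu\delta t$, Z1 → Z2 with $\lambda\delta t$; in a) Z2 → Z1 with $\mu\delta t$, in b) Z2 is absorbing]
a) $\rho_{01} = \rho_0 = \lambda + \lambda_r$, $\rho_{10} = \rho_{21} = \rho_2 = \mu$, $\rho_{12} = \lambda$, $\rho_1 = \lambda + \mu$ (point availability)
b) $\rho_{01} = \rho_0 = \lambda + \lambda_r$, $\rho_{10} = \mu$, $\rho_{12} = \lambda$, $\rho_1 = \lambda + \mu$ (reliability function, Z2 absorbing)
Figure 6.8 Diagrams of the transition probabilities in $(t, t + \delta t]$ for a repairable 1-out-of-2 warm redundancy with 2 identical elements (ideal failure detection & switch, one repair crew, Z2 down state, arbitrary t, $\delta t \downarrow 0$)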
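The transition probabilities of Figure 6.8 a) can be stepped forward directly: with a small δt, P_j(t + δt) = Σ_i P_i(t)·p_ij, where p_ij is the probability on the arrow from Z_i to Z_j. This sketch assumes hypothetical rates (λ for the operating element, λ_r for the element in reserve, μ for repair) and compares the long-run value with the steady-state availability obtained from the balance equations, μ(λ + λ_r + μ)/((λ + λ_r)(λ + μ) + μ²).

```python
lam, lam_r, mu = 1e-3, 1e-4, 1e-1   # hypothetical rates (per hour)
dt = 0.01                           # small time step delta t

# Transition probabilities in (t, t + dt] of Fig. 6.8 a); rows = from Z0, Z1, Z2
T = [
    [1 - (lam + lam_r) * dt, (lam + lam_r) * dt, 0.0],
    [mu * dt, 1 - (lam + mu) * dt, lam * dt],
    [0.0, mu * dt, 1 - mu * dt],
]

def step(p):
    """One step: P_j(t + dt) = sum_i P_i(t) * T[i][j]."""
    return [sum(p[i] * T[i][j] for i in range(3)) for j in range(3)]

p = [1.0, 0.0, 0.0]                 # the system enters Z0 at t = 0
for _ in range(20_000):             # 20,000 steps of 0.01 h = 200 h
    p = step(p)

pa = p[0] + p[1]                    # Z0 and Z1 are the up states
pa_limit = mu * (lam + lam_r + mu) / ((lam + lam_r) * (lam + mu) + mu**2)
print(pa, pa_limit)
```

Because every row of T sums to 1, the stepping conserves total probability; setting p_22 = 1 and p_21 = 0 turns the same loop into the reliability case b).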
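Series-parallel structure: $E_1$ in series with a 1-out-of-2 active redundancy of $E_2$ and $E_3$ ($E_2 = E_3 = E$)
Distribution of
failure-free times: $F(t) = 1 - e^{-\lambda t}$ for E, $F(t) = 1 - e^{-\lambda_1 t}$ for $E_1$
repair times: $G(t) = 1 - e^{-\mu t}$ for E, $G(t) = 1 - e^{-\mu_1 t}$ for $E_1$
[Diagram c): states Z0 ... Z8; diagram d): states Z0 ... Z4]
c) $\rho_{01} = \rho_{23} = \rho_{47} = \lambda_1$; $\rho_{02} = \rho_{15} = 2\lambda$; $\rho_{10} = \rho_{52} = \rho_{64} = \mu_1$; $\rho_{20} = \rho_{31} = \rho_{42} = \rho_{73} = \rho_{85} = \mu$; $\rho_{24} = \rho_{38} = \rho_{56} = \lambda$
d) $\rho_{01} = \rho_{23} = \lambda_1$; $\rho_{02} = 2\lambda$; $\rho_{24} = \lambda$; $\rho_{10} = \mu_1$; $\rho_{20} = \rho_{31} = \rho_{42} = \mu$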
c) No repair priority (repair as per first-in first-out, yielding 16 states forE E1 2 )
d) As c), but no further failures at system down
Figure A7.6 Diagram of transition probabilities in $(t, t + \delta t]$ for a repairable series-parallel structure with $E_2 = E_3 = E$, ideal failure detection & switch, one repair crew
$\rho_{01} = \lambda_1$; $\rho_{02} = 2\lambda_2$; $\rho_{03} = \lambda_3$
$\rho_{10} = \mu_1$; $\rho_{17} = 2\lambda_2$; $\rho_{1,10} = \lambda_3$
$\rho_{20} = \mu_2$; $\rho_{24} = \lambda_3$; $\rho_{26} = \lambda_2$; $\rho_{27} = \lambda_1$
$\rho_{30} = \mu_3$; $\rho_{34} = 2\lambda_2$; $\rho_{3,10} = \lambda_1$
$\rho_{42} = \mu_3$; $\rho_{45} = \lambda_2$; $\rho_{48} = \lambda_1$
$\rho_{56} = \mu_3$; $\rho_{59} = \lambda_1$
$\rho_{62} = \mu_2$; $\rho_{65} = \lambda_3$; $\rho_{6,11} = \lambda_1$
$\rho_{72} = \mu_1$; $\rho_{78} = \lambda_3$; $\rho_{7,11} = \lambda_2$
$\rho_{84} = \mu_1$; $\rho_{95} = \mu_1$; $\rho_{10,3} = \mu_1$; $\rho_{11,6} = \mu_1$
[Diagram of transition probabilities: states Z0 ... Z11]
Reliability block diagram: $E_1$ ($\lambda_1$, $\mu_1$); $E_2$, $E_2'$ ($\lambda_2$, $\mu_2$) as 1-out-of-2 active redundancy ($E_2' = E_2$); $E_3$ ($\lambda_3$, $\mu_3$)
Repair priority: $E_1$, $E_3$, $E_2$
Figure 6.20 Reliability block diagram and diagram of transition probabilities in $(t, t + \delta t]$, ideal failure detection & switch, one repair crew, repair priority in the sequence $E_1$, $E_3$, $E_2$, no further failures at system down
Note: The diagram of transition probabilities would have 14 states for $E_2 \ne E_2'$, 16 states for totally independent elements, and 65 states for $E_2 \ne E_2'$, one repair crew & repair as per first-in first-out
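The 12-state model of Figure 6.20 can be solved numerically. This sketch uses the transition rates listed above with the first parameter set of the comparison of results (λ1 = 1/100, λ2 = 1/1,000, λ3 = 1/10,000, μ1 = 1, μ2 = μ3 = 1/5). Two points are assumptions of the sketch: the λ/μ labels follow from reading each arrow as a failure or a repair, and Z8 to Z11 are taken as the down states because they are left only by a repair (no further failures at system down). The results should come out of the same order as the "no FF" entries of that table.

```python
import numpy as np

l1, l2, l3 = 1 / 100, 1 / 1_000, 1 / 10_000   # failure rates (per hour)
m1, m2, m3 = 1.0, 1 / 5, 1 / 5                # repair rates (per hour)

rates = {  # (from, to): rate, as listed for Figure 6.20
    (0, 1): l1, (0, 2): 2 * l2, (0, 3): l3,
    (1, 0): m1, (1, 7): 2 * l2, (1, 10): l3,
    (2, 0): m2, (2, 4): l3, (2, 6): l2, (2, 7): l1,
    (3, 0): m3, (3, 4): 2 * l2, (3, 10): l1,
    (4, 2): m3, (4, 5): l2, (4, 8): l1,
    (5, 6): m3, (5, 9): l1,
    (6, 2): m2, (6, 5): l3, (6, 11): l1,
    (7, 2): m1, (7, 8): l3, (7, 11): l2,
    (8, 4): m1, (9, 5): m1, (10, 3): m1, (11, 6): m1,
}
n = 12
up = list(range(8))          # assumption: Z0..Z7 up, Z8..Z11 down

Q = np.zeros((n, n))
for (i, j), r in rates.items():
    Q[i, j] = r
np.fill_diagonal(Q, -Q.sum(axis=1))          # diagonal = -rho_i

# Steady state: P Q = 0, one equation replaced by sum(P) = 1
A = Q.T.copy()
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0
P = np.linalg.solve(A, b)
unavailability = P[8:].sum()                 # 1 - PA_S

# MTTF_S0: down states absorbing, solve -Q_UU * MTTF = 1 on the up states
mttf = np.linalg.solve(-Q[np.ix_(up, up)], np.ones(len(up)))
print(f"MTTF_S0 ~ {mttf[0]:.3g}, 1 - PA_S ~ {unavailability:.3g}")
```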
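Relationships for the reliability and point availability of systems described by time-homogeneous Markov processes (method of integral equations, U = set of the system up states)
Reliability function
$R_{Si}(t) = e^{-\rho_i t} + \sum_{Z_j \in U,\, j \ne i} \int_0^t \rho_{ij}\, e^{-\rho_i x}\, R_{Sj}(t - x)\, dx$, $Z_i \in U$, $t > 0$, $R_{Si}(0) = 1$ (A7.122)
$MTTF_{Si} = \frac{1}{\rho_i} + \sum_{Z_j \in U,\, j \ne i} \frac{\rho_{ij}}{\rho_i}\, MTTF_{Sj}$, $Z_i \in U$ (A7.173)
$\rho_{ij}$ = transition rate, $\rho_i = \sum_{j=0,\, j \ne i}^{m} \rho_{ij}$ (A7.103)
Point availability
$PA_{Si}(t) = \sum_{Z_j \in U} P_{ij}(t)$, $i = 0, \dots, m$, $t > 0$ (A7.112)
$PA_S = \lim_{t \to \infty} PA_{Si}(t) = \sum_{Z_j \in U} P_j$ (A7.131)
$P_{ij}(t) = \delta_{ij}\, e^{-\rho_i t} + \sum_{k=0,\, k \ne i}^{m} \int_0^t \rho_{ik}\, e^{-\rho_i x}\, P_{kj}(t - x)\, dx$, with $\delta_{ii} = 1$, $\delta_{ij} = 0$ for $j \ne i$, $P_{ii}(0) = 1$, $P_{ij}(0) = 0$ for $j \ne i$, $i, j \in \{0, \dots, m\}$, $t > 0$ (A7.120)
$P_j = \lim_{t \to \infty} P_{ij}(t)$ from $\rho_j P_j = \sum_{i=0,\, i \ne j}^{m} P_i\, \rho_{ij}$, $P_j > 0$, $j = 0, \dots, m$ (A7.179), (A7.127)
one equation for $P_j$ must be dropped and replaced by $P_0 + \dots + P_m = 1$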
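The MTTF relation (A7.173) and the balance equations (A7.127) with one equation replaced by the normalization are both small linear systems. The sketch below is one possible NumPy formulation; the function names and the convention that `rho[i, j]` holds ρ_ij with a zero diagonal are illustrative choices, and the one-item example uses hypothetical rates so the answers can be checked against 1/λ and μ/(λ + μ).

```python
import numpy as np

def mttf_from_rates(rho, up, start):
    """MTTF_Si = 1/rho_i + sum_{Zj in U, j != i} (rho_ij / rho_i) MTTF_Sj (A7.173).

    rho[i, j] holds rho_ij (zero diagonal); up lists the indices of the up states U.
    """
    rho_i = rho.sum(axis=1)                  # rho_i = sum_{j != i} rho_ij
    k = len(up)
    A = np.eye(k)
    for a, i in enumerate(up):
        for b, j in enumerate(up):
            if i != j:
                A[a, b] -= rho[i, j] / rho_i[i]
    rhs = np.array([1.0 / rho_i[i] for i in up])
    return np.linalg.solve(A, rhs)[up.index(start)]

def steady_state(rho):
    """rho_j P_j = sum_{i != j} P_i rho_ij, last equation replaced by sum(P) = 1 (A7.127)."""
    m = rho.shape[0]
    A = rho.T - np.diag(rho.sum(axis=1))     # row j: sum_i P_i rho_ij - rho_j P_j
    A[-1, :] = 1.0
    b = np.zeros(m)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Hypothetical one-item structure: Z0 up, Z1 down
lam, mu = 1e-3, 0.5
rho = np.array([[0.0, lam],
                [mu, 0.0]])
mttf = mttf_from_rates(rho, up=[0], start=0)   # should equal 1/lam
P = steady_state(rho)
pa = P[0]                                      # PA_S, should equal mu/(lam + mu)
print(mttf, pa)
```

The same two functions accept any rate matrix, for instance the three-state matrix of Figure 6.8 with up states [0, 1].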
Results for a one-item structure
$R_{S0}(t) = e^{-\lambda t}$, $t > 0$, $MTTF_{S0} = 1/\lambda$ (6.14), (6.15)
$PA_{S0}(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu) t}$ (6.20)
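Expression (6.20) is the closed-form solution of dP_0/dt = −λP_0 + μ(1 − P_0) with P_0(0) = 1. The sketch evaluates it for hypothetical rates and cross-checks one value with a crude Euler integration of that differential equation; the step size is an arbitrary choice.

```python
import math

def pa_one_item(t, lam, mu):
    """PA_S0(t) = mu/(lam + mu) + lam/(lam + mu) * exp(-(lam + mu) t), Eq. (6.20)."""
    s = lam + mu
    return mu / s + lam / s * math.exp(-s * t)

lam, mu = 1e-3, 0.5          # hypothetical rates (per hour)
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, pa_one_item(t, lam, mu))

# Euler integration of dP0/dt = -lam P0 + mu (1 - P0), P0(0) = 1, up to t = 5 h
dt, p0 = 1e-3, 1.0
for _ in range(5_000):
    p0 += dt * (-lam * p0 + mu * (1.0 - p0))
print(p0, pa_one_item(5.0, lam, mu))
```

The transient decays with time constant 1/(λ + μ), here about 2 h, after which PA_S0(t) is practically equal to μ/(λ + μ).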
Results for a 1-out-of-2 warm redundancy
$R_{S0}(t) \approx e^{-t/MTTF_{S0}}$, $t > 0$, $MTTF_{S0} = \frac{2\lambda + \lambda_r + \mu}{\lambda(\lambda + \lambda_r)}$ (6.94)
$PA_{S0}(t) \approx PA_S + (1 - PA_S)\, e^{-\mu t}$, $t > 0$ (6.88)
$PA_S = \lim_{t \to \infty} PA_{S0}(t) = \frac{\mu(\lambda + \lambda_r + \mu)}{(\lambda + \lambda_r)(\lambda + \mu) + \mu^2}$ (6.87)
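The closed forms for the warm redundancy, PA_S = μ(λ + λ_r + μ)/((λ + λ_r)(λ + μ) + μ²) and MTTF_S0 = (2λ + λ_r + μ)/(λ(λ + λ_r)), can be checked against a direct solution of the Markov model of Figure 6.8. The sketch assumes hypothetical rates; it also compares the exact reliability function, computed by eigendecomposition of the up-state block of the generator, with the approximation R_S0(t) ≈ exp(−t/MTTF_S0).

```python
import numpy as np

lam, lam_r, mu = 1e-3, 1e-4, 1e-1      # hypothetical rates (per hour)

pa_closed = mu * (lam + lam_r + mu) / ((lam + lam_r) * (lam + mu) + mu**2)
mttf_closed = (2 * lam + lam_r + mu) / (lam * (lam + lam_r))

# Generator of Fig. 6.8 a): Z0, Z1 up, Z2 down
Q = np.array([
    [-(lam + lam_r), lam + lam_r, 0.0],
    [mu, -(lam + mu), lam],
    [0.0, mu, -mu],
])
A = Q.T.copy()
A[-1, :] = 1.0                          # normalisation replaces one balance equation
P = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))
pa_numeric = P[0] + P[1]

# Fig. 6.8 b): Z2 absorbing; R_S0(t) = first row of exp(Q_UU t), summed
Q_uu = Q[:2, :2]
mttf_numeric = np.linalg.solve(-Q_uu, np.ones(2))[0]
w, V = np.linalg.eig(Q_uu)              # two distinct real eigenvalues here
t = 5_000.0
R_exact = float(np.real((V @ np.diag(np.exp(w * t)) @ np.linalg.inv(V))[0].sum()))
R_approx = float(np.exp(-t / mttf_closed))

print(pa_closed, pa_numeric)
print(mttf_closed, mttf_numeric)
print(R_exact, R_approx)
```

The approximation works here because one eigenvalue of Q_UU is close to −1/MTTF_S0 while the other is of the order of −μ, so its term dies out within a few repair times.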
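Approximate Expressions for Large Series-Parallel Structures
totally independent elements
macro-structures
one repair crew and no further failures at system down
cutting states with more than k failures
clustering of states
Example using macro-structures (Table 6.10)
[Reliability block diagrams: elements $E_1$ ($\lambda_1$, $\mu_1$), two elements $E_2$ ($\lambda_2$, $\mu_2$), $E_3$ ($\lambda_3$, $\mu_3$); reduced to $E_1$ and the macro-structure $E_7$ ($\lambda_7$, $\mu_7$), and finally to the system $E_S$ ($\lambda_S$, $\mu_S$)]
$\lambda_7 \approx \lambda_3 + 2\lambda_2^2/\mu_2$, $\quad \mu_7 \approx \frac{\lambda_3 + 2\lambda_2^2/\mu_2}{\lambda_3/\mu_3 + 2\lambda_2^2/\mu_2^2}$ (6.180)
$\lambda_S \approx \frac{\lambda_1 \lambda_7 (\mu_1 + \mu_7)}{\mu_1 \mu_7}$, $\quad MTTF_{S0} \approx 1/\lambda_S$, $\quad PA_S \approx 1 - \frac{\lambda_1 \lambda_7 (\mu_1^2 + \mu_7^2)}{\mu_1^2 \mu_7^2}$ (6.181)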
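A sketch of the macro-structure reduction for the first parameter set of the comparison of results (λ1 = 1/100, λ2 = 1/1,000, λ3 = 1/10,000, μ1 = 1, μ2 = μ3 = 1/5). It assumes that E7 combines the 1-out-of-2 active pair of E2 elements in series with E3, and that E1 and E7 then form a 1-out-of-2 active redundancy. The building blocks used are the series rule λ_S = Σλ_i, μ_S = λ_S/Σ(λ_i/μ_i) and the 1-out-of-2 approximations λ_S ≈ λ1λ2(μ1 + μ2)/(μ1μ2), 1 − PA_S ≈ λ1λ2(μ1² + μ2²)/(μ1²μ2²), μ_S ≈ μ1μ2(μ1 + μ2)/(μ1² + μ2²). The function names are hypothetical.

```python
def series_macro(parts):
    """Series: lambda_S = sum(lambda_i), mu_S = lambda_S / sum(lambda_i / mu_i)."""
    lam_s = sum(l for l, _ in parts)
    return lam_s, lam_s / sum(l / m for l, m in parts)

def one_out_of_two_macro(l1, m1, l2, m2):
    """1-out-of-2 active (approximate): returns (lambda_S, mu_S, 1 - PA_S)."""
    lam_s = l1 * l2 * (m1 + m2) / (m1 * m2)
    mu_s = m1 * m2 * (m1 + m2) / (m1**2 + m2**2)
    unavail = l1 * l2 * (m1**2 + m2**2) / (m1**2 * m2**2)
    return lam_s, mu_s, unavail

l1, l2, l3 = 1 / 100, 1 / 1_000, 1 / 10_000   # failure rates (per hour)
m1, m2, m3 = 1.0, 1 / 5, 1 / 5                # repair rates (per hour)

# Assumed reduction: (E2 || E2) in series with E3 -> E7, then E1 || E7 -> E_S
lam_pair, mu_pair, _ = one_out_of_two_macro(l2, m2, l2, m2)     # 2 l2^2/m2 and m2
lam7, mu7 = series_macro([(lam_pair, mu_pair), (l3, m3)])        # (6.180)
lam_s, mu_s, unavail_s = one_out_of_two_macro(l1, m1, lam7, mu7)  # (6.181)
print(f"lambda_7 ~ {lam7:.3g}, mu_7 ~ {mu7:.3g}")
print(f"MTTF_S0 ~ {1 / lam_s:.3g}, 1 - PA_S ~ {unavail_s:.3g}")
```

This gives about 1.52·10^5 and 2.86·10^-5, close to the MS entries of the first column (1.53·10^5 and 2.81·10^-5); the small gap may come from the exact form of (6.180) and (6.181) used for the table.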
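Table 6.10 Basic macro-structures, constant failure & repair rates (λ, μ), active redundancy, ideal failure detection & switch, one repair crew for each macro-structure, repair priority on E, no further failures at system down
One-item structure E (λ, μ): $\lambda_S = \lambda$, $\mu_S = \mu$, $MTTF_{S0} = 1/\lambda_S$, $PA_S = 1/(1 + \lambda_S/\mu_S)$, $\mu_S = \lambda_S\, PA_S/(1 - PA_S)$
Series structure $E_1$ ($\lambda_1$, $\mu_1$), ..., $E_n$ ($\lambda_n$, $\mu_n$): $\lambda_S = \lambda_1 + \dots + \lambda_n$, $MTTF_{S0} = 1/\lambda_S$, $PA_S = 1/(1 + \lambda_1/\mu_1 + \dots + \lambda_n/\mu_n)$, $\mu_S = \lambda_S\, PA_S/(1 - PA_S) = (\lambda_1 + \dots + \lambda_n)/(\lambda_1/\mu_1 + \dots + \lambda_n/\mu_n)$ (= μ for $\mu_1 = \dots = \mu_n = \mu$)
1-out-of-2 (active), $E_1$ ($\lambda_1$, $\mu_1$), $E_2$ ($\lambda_2$, $\mu_2$): $MTTF_{S0} \approx 1/\lambda_S$, $\lambda_S \approx \lambda_1 \lambda_2 (\mu_1 + \mu_2)/(\mu_1 \mu_2)$, $PA_S \approx 1 - \lambda_1 \lambda_2 (\mu_1^2 + \mu_2^2)/(\mu_1^2 \mu_2^2)$, $\mu_S \approx \lambda_S\, PA_S/(1 - PA_S) \approx \mu_1 \mu_2 (\mu_1 + \mu_2)/(\mu_1^2 + \mu_2^2)$ (= μ for $\mu_1 = \mu_2 = \mu$)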
1-out-of-2 active ($E_1 = E_2 = E$), repair priority on E
,
E2
E1
E
,
1 1 2 3 1 2
2 20
2 2
2 2
/ ( / ( ) ) ( / )
/ / )
/ /
S SMTTF = + + +
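Comparison of results (four parameter sets, columns 1 to 4)
 | 1 | 2 | 3 | 4
$\lambda_1$ | 1/100 | 1/100 | 1/1,000 | 1/1,000
$\lambda_2$ | 1/1,000 | 1/1,000 | 1/10,000 | 1/10,000
$\lambda_3$ | 1/10,000 | 1/10,000 | 1/100,000 | 1/100,000
$\mu_1$ | 1 | 1/5 | 1 | 1/5
$\mu_2$ | 1/5 | 1/5 | 1/5 | 1/5
$\mu_3$ | 1/5 | 1/5 | 1/5 | 1/5
$MTTF_{S0}$ (Eq. (6.178), totally IE) | 1.58·10^5 | 9.30·10^4 | 1.66·10^7 | 9.93·10^6
$MTTF_{S0}$ (Eq. (6.182), MS) | 1.53·10^5 | 9.14·10^4 | 1.65·10^7 | 9.91·10^6
$MTTF_{S0}$ (Eq. (6.186), no FF) | 1.59·10^5 | 9.33·10^4 | 1.66·10^7 | 9.93·10^6
$MTTF_{S0}$ (Method 4, cutting) | 1.49·10^5 | 9.29·10^4 | 1.65·10^7 | 9.92·10^6
$MTTF_{S0}$ (only one repair crew) | 1.60·10^5 | 9.33·10^4 | 1.66·10^7 | 9.93·10^6
$1 - PA_S$ (Eq. (6.179), totally IE) | 5.25·10^-6 | 2.63·10^-5 | 5.03·10^-8 | 2.51·10^-7
$1 - PA_S$ (Eq. (6.183), MS) | 2.81·10^-5 | 5.45·10^-5 | 2.62·10^-7 | 5.05·10^-7
$1 - PA_S$ (Eq. (6.189), no FF) | 6.61·10^-6 | 6.00·10^-5 | 6.06·10^-8 | 5.06·10^-7
$1 - PA_S$ (Method 4, cutting) | 2.99·10^-5 | 5.56·10^-5 | 2.65·10^-7 | 5.06·10^-7
$1 - PA_S$ (only one repair crew) | 6.58·10^-6 | 5.63·10^-5 | 6.06·10^-8 | 5.06·10^-7
(IE = independent elements, MS = macro-structures, FF = further failures at system down, Method 4 = cutting states with more than k failures)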
4. Repairable complex systems
Complex repairable systems are fault-tolerant repairable systems for which a reliability block diagram does not exist or cannot easily be found.
Typical situations for complex systems are
preventive maintenance
imperfect switching
incomplete coverage
more than one failure mode
common cause failures
reconfigurable systems
networks
human aspects
Constant failure rates and, in general, also constant repair rates are assumed.
Example for imperfect switching
failure mode "stuck at the state occupied", ideal failure detection and localization, one repair crew, repair priority on the switch
[Diagrams of transition rates: a) for reliability, states Z0, Z0', Z1, Z1', Z2; b) for availability, states Z0, Z0', Z1, Z1', Z2, Z2', Z2'']
Figure 6.24 Diagram of transition rates for a repairable 1-out-of-2 warm redundancy (constant failure & repair rates (λ, μ), reserve failure rate $\lambda_r$, imperfect switching with constant failure & repair rates of the switch, ideal failure detection, no further failure at system down)
MTTFS r r
r r
02 3
+ + + + +
+ + + +
( ) /
( ) / ( ) / (6.206)
PA AA P P P PS Sr
r
= = + + + + +
+ + +0 0 1 1 1
2 2
' '( / )
[ / ]
(6.209)
the effect of imperfect switching becomes negligiblefor
) /
Example for incomplete coverage
identification of the failed element with probability c (with probability 1 − c the 1-out-of-2 active redundancy goes down because the outputs of the two elements differ), 1 repair crew
[State transition diagram: Z0 → Z1 at rate $2\lambda c$, Z0 → Z2 at rate $2\lambda(1 - c)$, Z1 → Z0 at rate μ, Z1 → Z2 at rate λ; Z2 down state, absorbing for reliability calculation]
Figure 6.28 State transition diagram for a repairable 1-out-of-2 active redundancy with constant failure & repair rates (λ, μ), incomplete coverage
$MTTF_{S0} = \frac{\lambda(1 + 2c) + \mu}{2\lambda\,(\lambda + \mu(1 - c))}$ (6.228)
PA AA P Pc
S S c= = +
+
+ +
=
+
+0 1 22
1
11
2 2 1
22
2
2
( ).( ) (6.229)
the effect of incomplete coverage becomes negligible for
2 1 2 2 1 ( ) / > c c or (6.230)
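Applying (A7.173) to the coverage diagram (Z0 → Z1 at 2λc, Z0 → Z2 at 2λ(1 − c), Z1 → Z0 at μ, Z1 → Z2 at λ, Z2 absorbing) gives MTTF_S0 = [λ(1 + 2c) + μ]/[2λ(λ + μ(1 − c))], which reduces to (3λ + μ)/(2λ²) for c = 1. The sketch compares this expression with a direct linear solve for a few coverage values; the rates are hypothetical.

```python
import numpy as np

def mttf_coverage_closed(lam, mu, c):
    """MTTF_S0 = [lam (1 + 2c) + mu] / [2 lam (lam + mu (1 - c))]."""
    return (lam * (1 + 2 * c) + mu) / (2 * lam * (lam + mu * (1 - c)))

def mttf_coverage_numeric(lam, mu, c):
    """Solve -Q_UU * MTTF = 1 on the up states Z0, Z1 (Z2 absorbing)."""
    Q_uu = np.array([
        [-2 * lam, 2 * lam * c],       # Z0 leaves at 2 lam; to Z1 with prob. c
        [mu, -(lam + mu)],             # Z1: repair back to Z0, or failure to Z2
    ])
    return np.linalg.solve(-Q_uu, np.ones(2))[0]

lam, mu = 1e-3, 1e-1                   # hypothetical rates (per hour)
for c in (1.0, 0.99, 0.9):
    print(c, mttf_coverage_closed(lam, mu, c), mttf_coverage_numeric(lam, mu, c))
```

Even c = 0.99 roughly halves MTTF_S0 for these rates (about 2.6·10^4 h instead of 5.2·10^4 h), which illustrates why the coverage must be close to 1 for its effect to become negligible.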
Typical causes for common cause failures are
overload (electrical, thermal, mechanical)
technological weakness (material, design, production)
misuse (e.g. caused by operating or maintenance personnel)
external event
Example
1-out-of-2 active ($E_1 = E_2 = E$)
[Diagram of transition rates: states Z0 ... Z5, including common cause failure transitions]
Figure 6.36 Reliability block diagram and diagram of transition rates for availability calculation of a 1-out-of-2 active redundancy with common cause failures (C) for different possibilities (Fig. 6.35), ideal failure detection & switch, one repair crew (Z1, Z3, Z4, Z5 down states, absorbing for reliability calculation)
MTTFSC C C C C C C
021 23 21 23
1
2 3
1=
+ + + + + +
( ) ( )/ (6.276)
PA AAS SC
C CC C=