birolini_affidabilita 121130


  • 7/23/2019 Birolini_Affidabilita 121130

    1/26

    Modeling Reliability & Availability of Complex Systems:

    Possibilities and Limits

    Prof. Dr. A. Birolini, emeritus ETH Zürich

    Bologna, 30/11/2012

    1. Some basic definitions

    2. Non repairable systems up to system failure

    3. Repairable series-parallel structures (exact solutions, approximate expressions)

    4. Repairable complex systems (preventive maintenance, imperfect switching, incomplete coverage, more than one failure mode, common cause failures, reconfigurable systems, networks, human aspects)

    5. Alternative investigation methods

    6. Conclusions

    7. Some basic literature sources


    1. Some basic definitions

    Reliability (R, R(t))

    Probability that the item is able to perform as required for a given time interval

    $\tau$ = failure-free time, $F(t) = \Pr\{\tau \le t\}$, $R(t) = \Pr\{\tau > t\} = 1 - F(t)$, $R(0) = 1$

    $E[\tau] = \int_0^\infty R(t)\,dt$ = MTTF (mean operating time to failure)

    at system level: $R_{Si}(t) = \Pr\{\text{system up in } (0, t] \mid Z_i \text{ is entered at } t = 0\}$, $MTTF_{Si} = \int_0^\infty R_{Si}(t)\,dt$

    $\tau'$ = repair (restoration) time, $G(t) = \Pr\{\tau' \le t\}$, $G(0) = 0$

    $E[\tau'] = \int_0^\infty (1 - G(t))\,dt$ = MTTR (mean time to repair (restoration))

    at system level: $MTTR_S$

    Availability, Point Availability (A(t), PA(t))

    Probability that the item is in a state to perform as required at a given instant

    steady-state value: $PA = MTTF / (MTTF + MTTR)$

    at system level: $PA_S = MTTF_S / (MTTF_S + MTTR_S)$

    (see e.g. [3] for other forms of availability)
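The two definitions above, MTTF as the integral of R(t) and the steady-state availability MTTF/(MTTF + MTTR), can be checked numerically. The following is only a minimal sketch: the constant failure rate `lam`, the value of `mttr`, the truncation point `t_max` and the function names are assumptions chosen for illustration, not values from the slides.

```python
import math

def mttf_numeric(reliability, t_max, steps=200_000):
    """Trapezoidal approximation of MTTF = integral of R(t) dt over (0, t_max)."""
    h = t_max / steps
    total = 0.5 * (reliability(0.0) + reliability(t_max))
    for k in range(1, steps):
        total += reliability(k * h)
    return total * h

def steady_state_availability(mttf, mttr):
    """Steady-state point availability PA = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

lam = 1e-3   # assumed constant failure rate (per hour)
mttr = 2.0   # assumed mean time to repair (hours)

# For R(t) = exp(-lam * t) the integral should be close to 1 / lam.
mttf = mttf_numeric(lambda t: math.exp(-lam * t), t_max=20.0 / lam)
pa = steady_state_availability(mttf, mttr)
print(f"MTTF ~ {mttf:.2f} h, PA ~ {pa:.6f}")
```

The upper integration limit is truncated at 20/λ, where the remaining tail of R(t) is negligible for an exponential distribution; other distributions may need a different cut-off.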


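    Failure Rate ($\lambda(t)$)

    $\lambda(t) = \lim_{\delta t \downarrow 0} \frac{1}{\delta t} \Pr\{t < \tau \le t + \delta t \mid \tau > t\} = \frac{f(t)}{1 - F(t)} = -\frac{dR(t)/dt}{R(t)}$

    With $R(0) = 1$: $R(t) = e^{-\int_0^t \lambda(x)\,dx}$; in particular $R(t) = e^{-\lambda t}$ for $\lambda(t) = \lambda$

    Repair Rate ($\mu(t)$)

    $\mu(t) = \frac{g(t)}{1 - G(t)}$; $G(t) = 1 - e^{-\mu t}$ for $\mu(t) = \mu$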
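The relation between failure rate and reliability function, R(t) = exp(-∫ λ(x) dx) with R(0) = 1, can be illustrated with a short numerical sketch. The two failure rates used here (a constant rate and a linearly increasing rate `a * x`) and all parameter values are assumptions for illustration only.

```python
import math

def reliability_from_failure_rate(failure_rate, t, steps=10_000):
    """R(t) = exp(-integral of failure_rate(x) dx over (0, t)), trapezoidal rule."""
    h = t / steps
    integral = 0.5 * (failure_rate(0.0) + failure_rate(t))
    for k in range(1, steps):
        integral += failure_rate(k * h)
    return math.exp(-integral * h)

lam = 1e-3  # assumed constant failure rate
r_const = reliability_from_failure_rate(lambda x: lam, t=500.0)

a = 1e-6    # assumed slope of a linearly increasing failure rate (wear-out)
r_wearout = reliability_from_failure_rate(lambda x: a * x, t=500.0)

print(r_const, math.exp(-lam * 500.0))             # both equal exp(-0.5)
print(r_wearout, math.exp(-a * 500.0 ** 2 / 2.0))  # both equal exp(-0.125)
```

For a constant failure rate the result reduces to the exponential law stated above; for the increasing rate the closed form exp(-a t²/2) serves as a cross-check.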

    Reliability Block Diagram

    Block diagram showing how failures of subitems, represented by the blocks, can result in a failure of the item

    It is an event diagram:

    in series: all elements which must operate

    in parallel: elements which can fail (redundant elements)

    removed: all elements which are not relevant for the required function


    2. Non repairable systems up to system failure

    Table 2.1 Series-parallel structures and associated reliability functions ($R_S = R_{S0}(t)$; $R_i = R_i(t)$, $R_i(0) = 1$)

    No. | Reliability block diagram | Reliability function | Remarks
    1 | $E_i$ | $R_S = R_i$ | One-item structure; $\lambda(t) = \lambda_i \Rightarrow R_i(t) = e^{-\lambda_i t}$
    2 | $E_1, E_2, \dots, E_n$ in series | $R_S = \prod_{i=1}^{n} R_i$ | Series structure; $\lambda_S(t) = \lambda_1(t) + \dots + \lambda_n(t)$
    3 | $E_1, E_2$ in parallel (1-out-of-2) | $R_S = R_1 + R_2 - R_1 R_2$ | 1-out-of-2 redundancy; $R_1(t) = R_2(t) = e^{-\lambda t} \Rightarrow R_S(t) = 2e^{-\lambda t} - e^{-2\lambda t}$
    4 | $E_1, E_2, \dots, E_n$ in parallel (k-out-of-n), $E_1 = \dots = E_n = E$, $R_1 = \dots = R_n = R$ | $R_S = \sum_{i=k}^{n} \binom{n}{i} R^i (1 - R)^{n-i}$ | k-out-of-n redundancy; for $k = 1$: $R_S = 1 - (1 - R)^n$
    5 | $E_1, \dots, E_7$ | $R_S = (R_1 R_2 R_3 + R_4 R_5 - R_1 R_2 R_3 R_4 R_5)\, R_6 R_7$ | Series-parallel structure
    6 | $E_1, E_2, E_3$ (2-out-of-3) with voter $E_\nu$ and alarm, $E_1 = E_2 = E_3 = E$, $R_1 = R_2 = R_3 = R$ | $R_S = (3R^2 - 2R^3)\, R_\nu$ | Majority redundancy; general case $(n + 1)$-out-of-$(2n + 1)$, $n = 1, 2, \dots$
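The reliability functions of the series, 1-out-of-2, k-out-of-n and majority structures can be written as small helper functions. This is a minimal sketch evaluated at one fixed time point; the function names and the numerical reliabilities are assumptions for illustration.

```python
from math import comb, prod

def series(reliabilities):
    """Series structure: product of the element reliabilities."""
    return prod(reliabilities)

def one_out_of_two(r1, r2):
    """1-out-of-2 redundancy with independent elements."""
    return r1 + r2 - r1 * r2

def k_out_of_n(k, n, r):
    """k-out-of-n redundancy with n identical, independent elements."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))

def majority_2_out_of_3(r, r_voter):
    """2-out-of-3 majority redundancy in series with a voter."""
    return (3 * r**2 - 2 * r**3) * r_voter

r = 0.9  # assumed element reliability at the time of interest
print(series([r, r, r]))             # three elements in series
print(one_out_of_two(r, r))          # equals k_out_of_n(1, 2, r)
print(k_out_of_n(2, 3, r))           # equals majority_2_out_of_3(r, 1.0)
print(majority_2_out_of_3(r, 0.99))  # with an assumed voter reliability
```

With a perfect voter the majority formula coincides with the 2-out-of-3 case of the binomial sum, which is a quick consistency check between the two expressions.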


    Methods for non series-parallel, or complex, structures

    key item

    successful path

    state space

    Boolean functions

    parallel models with load sharing

    elements with more than one failure mechanism or mode

    mechanical structures

    Table 2.1 (cont.) Complex structures and associated reliability functions (key item method)

    No. | Reliability block diagram | Reliability function | Remarks
    7 | $E_1, \dots, E_5$ (bridge) | $R_S = R_5 (R_1 + R_2 - R_1 R_2)(R_3 + R_4 - R_3 R_4) + (1 - R_5)(R_1 R_3 + R_2 R_4 - R_1 R_2 R_3 R_4)$ | Bridge structure (bi-directional on $E_5$)
    8 | $E_1, \dots, E_5$ (bridge) | $R_S = R_4 [R_2 + R_1 (R_3 + R_5 - R_3 R_5) - R_1 R_2 (R_3 + R_5 - R_3 R_5)] + (1 - R_4) R_1 R_3$ | Bridge structure (unidirectional on $E_5$)
    9 | $E_1, \dots, E_5$, with $E_2$ drawn twice | $R_S = R_2 R_1 (R_4 + R_5 - R_4 R_5) + (1 - R_2) R_1 R_3 R_5$ | The element $E_2$ appears twice in the reliability block diagram (not in the hardware)
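The key item result for the bi-directional bridge can be cross-checked by enumerating all 2^5 element states. The sketch below assumes the usual bridge layout, in which the successful paths are E1-E3, E2-E4, E1-E5-E4 and E2-E5-E3; this path set is inferred from the formula, and the element reliabilities are assumed values.

```python
from itertools import product

def bridge_key_item(r1, r2, r3, r4, r5):
    """Key item method with E5 as key item (bi-directional bridge)."""
    e5_up = r5 * (r1 + r2 - r1 * r2) * (r3 + r4 - r3 * r4)
    e5_down = (1 - r5) * (r1 * r3 + r2 * r4 - r1 * r2 * r3 * r4)
    return e5_up + e5_down

# Assumed successful paths of the bridge (element numbers).
PATHS = [(1, 3), (2, 4), (1, 5, 4), (2, 5, 3)]

def bridge_enumeration(r):
    """State-space check: sum the probabilities of all states with a working path."""
    total = 0.0
    for states in product((True, False), repeat=5):
        up = dict(zip(range(1, 6), states))
        if any(all(up[e] for e in path) for path in PATHS):
            p = 1.0
            for e, ok in up.items():
                p *= r[e] if ok else 1 - r[e]
            total += p
    return total

r = {1: 0.9, 2: 0.8, 3: 0.95, 4: 0.85, 5: 0.7}  # assumed reliabilities
print(bridge_key_item(r[1], r[2], r[3], r[4], r[5]))
print(bridge_enumeration(r))
```

Enumeration grows as 2^n and is only practical for small structures, which is one reason the key item, successful path and Boolean function methods are listed as alternatives.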


    3. Repairable series-parallel structures

    Investigation is performed using stochastic processes

    Table 6.1 Basic stochastic processes for reliability & availability analysis of repairable systems

    Stochastic process | Can be used in modeling | Background | Difficulty
    Renewal process | One-item structure (spare part) with arbitrary failure rate, negligible repair time, new after repair | Renewal theory | Medium
    Alternating renewal process | One-item repairable structure with arbitrary failure and repair rates, new after repair | Renewal theory | Medium
    Markov process (MP) (finite state space, time-homogeneous, regenerative at every time t) | Systems of arbitrary structure whose elements have constant failure and repair rates ($\lambda_i$, $\mu_i$)* during the stay time in every state (not necessarily at a state change, e.g. because of load sharing) | Differential equations or integral equations | Low
    Semi-Markov process (SMP) (regenerative at every state change) | Some systems with only one repair crew, whose elements have constant failure and arbitrary repair rates* | Integral equations | Medium
    Semi-regenerative process (process with an embedded SMP with $\ge 2$ states) | Systems of arbitrary structure with only one repair crew, whose elements have constant failure and arbitrary repair rates* | Integral equations | High
    Regenerative process with just one (or some few) regeneration state(s) | Systems of arbitrary structure whose elements have constant failure and arbitrary repair rates (in some cases constant failure rate only in a reserve state) | Integral equations | High to very high
    Non regenerative process | Systems of arbitrary structure whose elements have arbitrary failure and repair rates | Partial differential equations | High to very high

    * Constant failure rate can be extended to Erlang distributed failure-free times; the same holds for repair times


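    To simplify, let us first consider Markov processes

    Diagram of transition probabilities in $(t, t + \delta t]$ (later of transition rates, to simplify the notation)

    Examples

    a) Point availability: states Z0, Z1, Z2; transition probabilities $(\lambda + \lambda_r)\delta t$ (Z0 to Z1), $\lambda\,\delta t$ (Z1 to Z2), $\mu\,\delta t$ (Z1 to Z0 and Z2 to Z1); probabilities of remaining in Z0, Z1, Z2: $1 - (\lambda + \lambda_r)\delta t$, $1 - (\lambda + \mu)\delta t$, $1 - \mu\,\delta t$

    b) Reliability function (Z2 absorbing): as a), but without the transition from Z2 to Z1; probability of remaining in Z2: 1

    Transition rates: a) $\rho_{01} = \lambda + \lambda_r$, $\rho_{10} = \rho_{21} = \mu$, $\rho_{12} = \lambda$; b) $\rho_{01} = \lambda + \lambda_r$, $\rho_{10} = \mu$, $\rho_{12} = \lambda$

    Figure 6.8 Diagrams of the transition probabilities in $(t, t + \delta t]$ for a repairable 1-out-of-2 warm redundancy with 2 identical elements (ideal failure detection & switch, one repair crew, Z2 down state, arbitrary $t$, $\delta t \downarrow 0$)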
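The diagram of transition probabilities in (t, t + δt] can be read directly as a one-step update of the state probabilities. The following sketch iterates that update for the 1-out-of-2 warm redundancy, once with repair from Z2 (point availability) and once with Z2 absorbing (reliability). The symbols `lam`, `lam_r`, `mu`, the step size `dt` and all numerical values are assumptions for illustration.

```python
def step(p, rates, dt):
    """One update of the state probabilities over (t, t + dt]."""
    new = list(p)
    for (i, j), rho in rates.items():
        flow = p[i] * rho * dt   # probability mass moving from Zi to Zj
        new[i] -= flow
        new[j] += flow
    return new

lam, lam_r, mu = 0.01, 0.005, 1.0   # assumed failure, reserve failure, repair rates
dt, n_steps = 0.01, 5000            # assumed step size, i.e. t = 50 time units

rates_availability = {(0, 1): lam + lam_r, (1, 0): mu, (1, 2): lam, (2, 1): mu}
rates_reliability = {(0, 1): lam + lam_r, (1, 0): mu, (1, 2): lam}  # Z2 absorbing

p_a = [1.0, 0.0, 0.0]   # system enters Z0 at t = 0
p_r = [1.0, 0.0, 0.0]
for _ in range(n_steps):
    p_a = step(p_a, rates_availability, dt)
    p_r = step(p_r, rates_reliability, dt)

pa_50 = p_a[0] + p_a[1]   # point availability: Z0 and Z1 are up states
r_50 = p_r[0] + p_r[1]    # reliability: probability of not having reached Z2
print(pa_50, r_50)
```

The step size must keep every product rate times `dt` well below 1; for stiff models (repair rates much larger than failure rates) the integral or differential equations are usually solved with a dedicated method instead of this simple iteration.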


    [Reliability block diagram: $E_1$ in series with a 1-out-of-2 (active) redundancy of $E_2$ and $E_3$, $E_2 = E_3 = E$]

    Distribution of

    failure-free times: $F(t) = 1 - e^{-\lambda t}$ for $E$, $F(t) = 1 - e^{-\lambda_1 t}$ for $E_1$

    repair times: $G(t) = 1 - e^{-\mu t}$ for $E$, $G(t) = 1 - e^{-\mu_1 t}$ for $E_1$

    c) No repair priority (repair as per first-in first-out, yielding 16 states for $E_1 \ne E_2$); states Z0, ..., Z8 with
    $\rho_{01} = \rho_{23} = \rho_{47} = \lambda_1$; $\rho_{02} = \rho_{15} = 2\lambda$; $\rho_{10} = \rho_{52} = \rho_{64} = \mu_1$; $\rho_{20} = \rho_{31} = \rho_{42} = \rho_{73} = \rho_{85} = \mu$; $\rho_{24} = \rho_{38} = \rho_{56} = \lambda$

    d) As c), but no further failures at system down; states Z0, ..., Z4 with
    $\rho_{01} = \rho_{23} = \lambda_1$; $\rho_{02} = 2\lambda$; $\rho_{24} = \lambda$; $\rho_{10} = \mu_1$; $\rho_{20} = \rho_{31} = \rho_{42} = \mu$

    Figure A7.6 Diagram of transition probabilities in $(t, t + \delta t]$ for a repairable series-parallel structure with $E_2 = E_3 = E$, ideal failure detection & switch, one repair crew


    [Reliability block diagram: $E_1$ ($\lambda_1$, $\mu_1$) in parallel with the series connection of the 1-out-of-2 active redundancy $E_2$, $E_2'$ ($\lambda_2$, $\mu_2$; $E_2' = E_2$) and $E_3$ ($\lambda_3$, $\mu_3$). Repair priority: $E_1$, $E_3$, $E_2$]

    [Diagram of transition probabilities with states Z0, ..., Z11]

    Transition rates:
    $\rho_{01} = \lambda_1$; $\rho_{02} = 2\lambda_2$; $\rho_{03} = \lambda_3$;
    $\rho_{10} = \mu_1$; $\rho_{17} = 2\lambda_2$; $\rho_{1\,10} = \lambda_3$;
    $\rho_{20} = \mu_2$; $\rho_{24} = \lambda_3$; $\rho_{26} = \lambda_2$; $\rho_{27} = \lambda_1$;
    $\rho_{30} = \mu_3$; $\rho_{34} = 2\lambda_2$; $\rho_{3\,10} = \lambda_1$;
    $\rho_{42} = \mu_3$; $\rho_{45} = \lambda_2$; $\rho_{48} = \lambda_1$;
    $\rho_{56} = \mu_3$; $\rho_{59} = \lambda_1$;
    $\rho_{62} = \mu_2$; $\rho_{65} = \lambda_3$; $\rho_{6\,11} = \lambda_1$;
    $\rho_{72} = \mu_1$; $\rho_{78} = \lambda_3$; $\rho_{7\,11} = \lambda_2$;
    $\rho_{84} = \mu_1$; $\rho_{95} = \mu_1$; $\rho_{10\,3} = \mu_1$; $\rho_{11\,6} = \mu_1$

    Figure 6.20 Reliability block diagram and diagram of transition probabilities in $(t, t + \delta t]$, ideal failure detection & switch, one repair crew, repair priority in the sequence $E_1$, $E_3$, $E_2$, no further failures at system down

    Note: The diagram of transition probabilities would have 14 states for $E_2 \ne E_2'$, 16 states for totally independent elements, and 65 states for $E_2 \ne E_2'$, one repair crew & repair as per first-in first-out


    Relationships for the reliability and point availability of systems described by time-homogeneous Markov processes (method of integral equations, $U$ = set of the system up states)

    Reliability function

    $R_{Si}(t) = e^{-\rho_i t} + \sum_{j \ne i,\; Z_j \in U} \int_0^t \rho_{ij}\, e^{-\rho_i x}\, R_{Sj}(t - x)\,dx$, $Z_i \in U$, $t > 0$, $R_{Si}(0) = 1$ (A7.122)

    $MTTF_{Si} = \frac{1}{\rho_i} + \sum_{j \ne i,\; Z_j \in U} \frac{\rho_{ij}}{\rho_i}\, MTTF_{Sj}$, $Z_i \in U$ (A7.173)

    $\rho_{ij}$ = transition rate, $\rho_i = \sum_{j = 0,\; j \ne i}^{m} \rho_{ij}$ (A7.103)

    Point availability

    $PA_{Si}(t) = \sum_{Z_j \in U} P_{ij}(t)$, $i = 0, \dots, m$, $t > 0$ (A7.112)

    $PA_S = \lim_{t \to \infty} PA_{Si}(t) = \sum_{Z_j \in U} P_j$ (A7.131)

    $P_{ij}(t) = \delta_{ij}\, e^{-\rho_i t} + \sum_{k = 0,\; k \ne i}^{m} \int_0^t \rho_{ik}\, e^{-\rho_i x}\, P_{kj}(t - x)\,dx$, $i, j \in \{0, \dots, m\}$, $t > 0$, $P_{ij}(0) = \delta_{ij}$, $\delta_{ij} = 0$ for $j \ne i$, $\delta_{ii} = 1$ (A7.120)

    $P_j = \lim_{t \to \infty} P_j(t) = \lim_{t \to \infty} P_{ij}(t)$, from $\rho_j P_j = \sum_{i = 0,\; i \ne j}^{m} P_i\, \rho_{ij}$ (A7.179), (A7.127)

    One equation for $P_j$ must be dropped and replaced by $P_0 + \dots + P_m = 1$
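The algebraic relations above, the linear system for the MTTF of each up state and the balance equations for the stationary state probabilities with one equation replaced by the normalization, can be solved with any linear solver. The sketch below uses a small Gauss-Jordan routine so that it needs no external library; the function names, the dictionary representation of the transition rates and the numerical values (a 1-out-of-2 warm redundancy with assumed `lam`, `lam_r`, `mu`) are illustrative assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def total_rates(rates, n_states):
    """rho_i = sum over j != i of rho_ij."""
    rho = [0.0] * n_states
    for (i, _j), r in rates.items():
        rho[i] += r
    return rho

def mttf(rates, up_states, n_states):
    """MTTF_Si for every up state: rho_i MTTF_Si - sum_j rho_ij MTTF_Sj = 1."""
    rho = total_rates(rates, n_states)
    idx = {s: k for k, s in enumerate(up_states)}
    a = [[0.0] * len(up_states) for _ in up_states]
    for s in up_states:
        a[idx[s]][idx[s]] = rho[s]
    for (i, j), r in rates.items():
        if i in idx and j in idx:
            a[idx[i]][idx[j]] -= r
    x = solve(a, [1.0] * len(up_states))
    return {s: x[idx[s]] for s in up_states}

def steady_state(rates, n_states):
    """P_j from rho_j P_j = sum_i P_i rho_ij, last equation replaced by sum P_j = 1."""
    rho = total_rates(rates, n_states)
    a = [[0.0] * n_states for _ in range(n_states)]
    for j in range(n_states):
        a[j][j] = rho[j]
    for (i, j), r in rates.items():
        a[j][i] -= r
    a[-1] = [1.0] * n_states
    return solve(a, [0.0] * (n_states - 1) + [1.0])

lam, lam_r, mu = 0.01, 0.005, 1.0   # assumed rates
rates_warm = {(0, 1): lam + lam_r, (1, 0): mu, (1, 2): lam, (2, 1): mu}

mttf_s0 = mttf(rates_warm, up_states=[0, 1], n_states=3)[0]
p = steady_state(rates_warm, n_states=3)
pa_s = p[0] + p[1]   # up states Z0 and Z1
print(mttf_s0, pa_s)
```

Only transitions between up states enter the MTTF system, so the repair transition out of the down state Z2 is ignored there, which corresponds to making Z2 absorbing for the reliability calculation.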


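    Results for a one-item structure

    $R_{S0}(t) = e^{-\lambda t}$, $t > 0$, $MTTF_{S0} = 1/\lambda$ (6.14), (6.15)

    $PA_{S0}(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu) t} \approx 1 - \frac{\lambda}{\mu}\,(1 - e^{-\mu t})$ (6.20)

    Results for a 1-out-of-2 warm redundancy

    $R_{S0}(t) \approx e^{-\lambda_S t}$, $t > 0$, $\lambda_S = \frac{1}{MTTF_{S0}} = \frac{\lambda(\lambda + \lambda_r)}{2\lambda + \lambda_r + \mu} \approx \frac{\lambda(\lambda + \lambda_r)}{\mu}$ (6.94)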

    PA ,S S S t tt PA PA e0 01( ) ( ) + > (6.88)

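    $\lim_{t \to \infty} PA_{S0}(t) = PA_S = \frac{\mu(\lambda + \lambda_r + \mu)}{(\lambda + \lambda_r)(\lambda + \mu) + \mu^2} \approx 1 - \frac{\lambda(\lambda + \lambda_r)}{\mu^2}$ (6.87)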
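For the one-item structure, the point availability has the standard closed form μ/(λ+μ) + λ/(λ+μ)·exp(-(λ+μ)t), which starts at 1 and decays to the steady-state value. The sketch below evaluates it and compares it with the approximation 1 - (λ/μ)(1 - exp(-μt)), valid for λ much smaller than μ; the rates and the time points are assumed values.

```python
import math

def point_availability_one_item(lam, mu, t):
    """PA_S0(t) for a one-item structure, new at t = 0, constant rates."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

def point_availability_approx(lam, mu, t):
    """Approximation for lam << mu."""
    return 1.0 - (lam / mu) * (1.0 - math.exp(-mu * t))

lam, mu = 1e-3, 0.5   # assumed failure and repair rates (per hour)
for t in (0.0, 2.0, 10.0, 100.0):
    exact = point_availability_one_item(lam, mu, t)
    approx = point_availability_approx(lam, mu, t)
    print(f"t = {t:6.1f} h  exact = {exact:.8f}  approx = {approx:.8f}")
```

The transient dies out within a few multiples of 1/μ, after which the point availability is practically equal to its steady-state value.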


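    Approximate Expressions for Large Series-Parallel Structures

    totally independent elements

    macro-structures

    one repair crew and no further failures at system down

    cutting states with more than k failures

    clustering of states

    Example using macro-structures (Table 6.10)

    [Reliability block diagrams: the 1-out-of-2 active redundancy of $E_2$ ($\lambda_2$, $\mu_2$) in series with $E_3$ ($\lambda_3$, $\mu_3$) is replaced by the macro-structure $E_7$ ($\lambda_7$, $\mu_7$); $E_1$ ($\lambda_1$, $\mu_1$) and $E_7$ then form the system ($\lambda_S$, $\mu_S$)]

    $\lambda_7 \approx \lambda_3 + 2\lambda_2^2/\mu_2$, with the corresponding $\mu_7$ (6.180)

    $\lambda_S \approx \frac{\lambda_1 \lambda_7 (\mu_1 + \mu_7)}{\mu_1 \mu_7}$, $\mu_S \approx \frac{\mu_1 \mu_7 (\mu_1 + \mu_7)}{\mu_1^2 + \mu_7^2}$ (6.181)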


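    Table 6.10 Basic macro-structures: constant failure & repair rates ($\lambda$, $\mu$), active redundancy, ideal failure detection & switch, one repair crew for each macro-structure, repair priority on $E_\nu$, no further failure at system down

    One-item structure $E$ ($\lambda$, $\mu$):
    $\lambda_S = \lambda = 1/MTTF_{S0}$, $\mu_S = \mu$, $PA_S = 1/(1 + \lambda_S/\mu_S) \approx 1 - \lambda_S/\mu_S$
    $\mu_S = \lambda_S\, PA_S/(1 - PA_S) \approx \lambda_S/(1 - PA_S)$

    Series structure $E_1, \dots, E_n$ ($\lambda_1, \mu_1, \dots, \lambda_n, \mu_n$):
    $\lambda_S = \lambda_1 + \dots + \lambda_n = 1/MTTF_{S0}$, $PA_S \approx 1 - (\lambda_1/\mu_1 + \dots + \lambda_n/\mu_n)$
    $\mu_S \approx \lambda_S/(1 - PA_S) \approx \frac{\lambda_1 + \dots + \lambda_n}{\lambda_1/\mu_1 + \dots + \lambda_n/\mu_n}$ ($= \mu$ for $\mu_1 = \dots = \mu_n = \mu$)

    1-out-of-2 (active), $E_1$ ($\lambda_1$, $\mu_1$), $E_2$ ($\lambda_2$, $\mu_2$):
    $1/\lambda_S = MTTF_{S0} \approx \frac{\mu_1 \mu_2}{\lambda_1 \lambda_2 (\mu_1 + \mu_2)}$
    $PA_S \approx 1 - \frac{\lambda_1 \lambda_2 (\mu_1^2 + \mu_2^2)}{\mu_1^2 \mu_2^2}$
    $\mu_S \approx \lambda_S/(1 - PA_S) \approx \frac{\mu_1 \mu_2 (\mu_1 + \mu_2)}{\mu_1^2 + \mu_2^2}$ ($= \mu$ for $\mu_1 = \mu_2 = \mu$)

    1-out-of-2 active ($E_1 = E_2 = E$) with series element $E_\nu$, repair priority on $E_\nu$:
    [expressions for $\lambda_S$, $MTTF_{S0}$ and $PA_S$ truncated in the transcript]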


    Comparison of results

    Quantity | Set 1 | Set 2 | Set 3 | Set 4
    $\lambda_1$ | 1/100 | 1/100 | 1/1,000 | 1/1,000
    $\lambda_2$ | 1/1,000 | 1/1,000 | 1/10,000 | 1/10,000
    $\lambda_3$ | 1/10,000 | 1/10,000 | 1/100,000 | 1/100,000
    $\mu_1$ | 1 | 1/5 | 1 | 1/5
    $\mu_2$ | 1/5 | 1/5 | 1/5 | 1/5
    $\mu_3$ | 1/5 | 1/5 | 1/5 | 1/5
    $MTTF_{S0}$ (Eq. (6.178), totally IE) | 1.58·10^+5 | 9.30·10^+4 | 1.66·10^+7 | 9.93·10^+6
    $MTTF_{S0}$ (Eq. (6.182), MS) | 1.53·10^+5 | 9.14·10^+4 | 1.65·10^+7 | 9.91·10^+6
    $MTTF_{S0}$ (Eq. (6.186), no FF) | 1.59·10^+5 | 9.33·10^+4 | 1.66·10^+7 | 9.93·10^+6
    $MTTF_{S0}$ (Method 4, Cutting) | 1.49·10^+5 | 9.29·10^+4 | 1.65·10^+7 | 9.92·10^+6
    $MTTF_{S0}$ (only one repair crew) | 1.60·10^+5 | 9.33·10^+4 | 1.66·10^+7 | 9.93·10^+6
    $1 - PA_S$ (Eq. (6.179), totally IE) | 5.25·10^-6 | 2.63·10^-5 | 5.03·10^-8 | 2.51·10^-7
    $1 - PA_S$ (Eq. (6.183), MS) | 2.81·10^-5 | 5.45·10^-5 | 2.62·10^-7 | 5.05·10^-7
    $1 - PA_S$ (Eq. (6.189), no FF) | 6.61·10^-6 | 6.00·10^-5 | 6.06·10^-8 | 5.06·10^-7
    $1 - PA_S$ (Method 4, Cutting) | 2.99·10^-5 | 5.56·10^-5 | 2.65·10^-7 | 5.06·10^-7
    $1 - PA_S$ (only one repair crew) | 6.58·10^-6 | 5.63·10^-5 | 6.06·10^-8 | 5.06·10^-7


    4. Repairable complex systems

    Complex repairable systems are fault tolerant repairable systems for which a reliability block diagram does not exist or cannot easily be found

    Typical situations for complex systems are

    preventive maintenance

    imperfect switching

    incomplete coverage

    more than one failure mode

    common cause failures

    reconfigurable systems

    networks

    human aspects

    Constant failure rates and, in general, also constant repair rates are assumed


    Example for imperfect switching

    Switch failure mode: stuck at the state occupied; ideal failure detection and localization, one repair crew, repair priority on the switch

    [Diagrams of transition rates: a) for reliability, with states Z0, Z0', Z1, Z1', Z2; b) for availability, with states Z0, Z0', Z1, Z1', Z2, Z2', Z2'']

    Figure 6.24 Diagram of transition rates for a repairable 1-out-of-2 warm redundancy (constant failure & repair rates ($\lambda$, $\lambda_r$, $\mu$), imperfect switching with constant failure and repair rates of the switch, ideal failure detection, no further failure at system down)

    MTTFS r r

    r r

    02 3

    + + + + +

    + + + +

    ( ) /

    ( ) / ( ) / (6.206)

    PA AA P P P PS Sr

    r

    = = + + + + +

    + + +0 0 1 1 1

    2 2

    ' '( / )

    [ / ]

    (6.209)

    the effect of imperfect switching becomes negligiblefor

    ) /


    Example for incomplete coverage

    The failed element is identified with probability $c$ (with probability $1 - c$ the 1-out-of-2 active redundancy goes down because the outputs of both elements differ); one repair crew

    [State transition diagram with states Z0, Z1, Z2: Z0 to Z1 with rate $2\lambda c$, Z0 to Z2 with rate $2\lambda(1 - c)$; Z2 down state, absorbing for reliability calculation]

    Figure 6.28 State transition diagram for a repairable 1-out-of-2 active redundancy with constant failure & repair rates ($\lambda$, $\mu$), incomplete coverage

    $MTTF_{S0} = \frac{\lambda(1 + 2c) + \mu}{2\lambda\,[\lambda + (1 - c)\mu]} \approx \frac{\mu}{2\lambda\,[\lambda + (1 - c)\mu]}$ (6.228)

    PA AA P Pc

    S S c= = +

    +

    + +

    =

    +

    +0 1 22

    1

    11

    2 2 1

    22

    2

    2

    ( ).( ) (6.229)

    The effect of incomplete coverage becomes negligible for

    $2\lambda(1 - c) \ll 2\lambda^2/\mu$, or $c > 1 - \lambda/\mu$ (6.230)
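The influence of the coverage c on the mean time to failure can be explored with a few lines of code. The sketch below assumes the following reliability model for the 1-out-of-2 active redundancy: Z0 to Z1 with rate 2λc, Z0 to Z2 with rate 2λ(1 - c), Z1 to Z0 with rate μ, Z1 to Z2 with rate λ, and Z2 absorbing. The two rates out of Z1 are an assumption, since only the rates leaving Z0 are explicit in the diagram; the numerical values are also assumed.

```python
def mttf_from_equations(lam, mu, c):
    """Solve MTTF_0 = 1/(2 lam) + c MTTF_1 and
    MTTF_1 = 1/(lam + mu) + mu/(lam + mu) MTTF_0 by substitution."""
    return (1.0 / (2.0 * lam) + c / (lam + mu)) / (1.0 - c * mu / (lam + mu))

def mttf_closed_form(lam, mu, c):
    """Closed form obtained from the two equations above."""
    return (lam * (1.0 + 2.0 * c) + mu) / (2.0 * lam * (lam + (1.0 - c) * mu))

lam, mu = 1e-3, 0.5   # assumed failure and repair rates (per hour)
for c in (1.0, 0.999, 0.99, 0.9, 0.0):
    print(f"c = {c:5.3f}  MTTF_S0 = {mttf_closed_form(lam, mu, c):12.1f} h")
```

With these assumed rates λ/μ = 0.002, so a coverage of 0.99 already reduces the mean time to failure by roughly a factor of six compared with ideal coverage, which illustrates why 1 - c has to be small compared with λ/μ for the effect to be negligible.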


    Typical causes for common cause failures are

    overload (electrical, thermal, mechanical)

    technological weakness (material, design, production)

    misuse (e.g. caused by operating or maintenance personnel)

    external event

    Example

    [Reliability block diagram: 1-out-of-2 active redundancy ($E_1 = E_2 = E$) with common cause failures (C); diagram of transition rates with states Z0, ..., Z5]

    Figure 6.36 Reliability block diagram and diagram of transition rates for availability calculation of a 1-out-of-2 active redundancy with common cause failures (C) for different possibilities (Fig. 6.35), ideal failure detection & switch, one repair crew (Z1, Z3, Z4, Z5 down states, absorbing for reliability calculation)

    MTTFSC C C C C C C

    021 23 21 23

    1

    2 3

    1=

    + + + + + +

    ( ) ( )/ (6.276)

    PA AAS SC

    C CC C=