735 reliability, m - springerextras.springer.com/2009/978-3-540-78830-0/11605119/11605119-c … ·...


42. Reliability, Maintainability, and Safety

Gérard Morel, Jean-François Pétin, Timothy L. Johnson

Within the last 20 years, digital automation has increasingly taken over manual control functions in manufacturing plants, as well as in products. With this shift, reliability, maintainability, and safety responsibilities formerly delegated to skilled human operators have increasingly shifted to automation systems that now close the loop. In order to design highly dependable automation systems, the original concept of design for reliability has been refined and greatly expanded to include new engineering concepts such as availability, safety, maintainability, and survivability. Technical definitions for these terms are provided in this chapter, as well as an overview of engineering methods that have been used to achieve these properties. Current standards and industrial practice in the design of dependable systems are noted. The integration of dependable automation systems in multilevel architectures has also evolved greatly, and new concepts of control and monitoring, remote diagnostics, software safety, and automated reconfigurability are described. An extended example of the role of dependable automation systems at the enterprise level is also provided. Finally, recent research trends, such as automated verification, are cited, and many citations from the extensive literature on this topic are provided.

42.1 Definitions
42.2 RMS Engineering
42.2.1 Predictive RMS Assessment
42.2.2 Towards a Safe Engineering Process for RMS
42.3 Operational Organization and Architecture for RMS
42.3.1 Integrated Control and Monitoring Systems
42.3.2 Integrated Control, Maintenance, and Technical Management Systems
42.3.3 Remote and e-Maintenance
42.3.4 Industrial Applications
42.4 Challenges, Trends, and Open Issues
References

Industrial automation systems increasingly embed infotronics and mechatronics technology (IMT) in order to fulfil the complex applications required by the increasing customization of both services and goods [42.2–6]. The resulting behavior of these IMT-based automation systems is shifting responsibility for system dependability [42.7] from the human operator to the automation software.

Management, engineering, and maintenance personnel have a primary responsibility to assure the reliability [42.8, 9], maintainability, and safety of all automated systems, and of manufacturing systems in particular. However, safety, reliability, and availability, as performance attributes used to assess the dependability of a system, are threatened by a rapid growth in software complexity that could limit further automation progress (Fig. 42.1).

Fig. 42.1 Growth of software complexity and its impact on system availability (after [42.1]). Normalized value (0 to 1.6) of software complexity, hardware reliability, and availability plotted against year (0 to 10): complexity growth with availability decline.

Fig. 42.2 The dependability tree (after [42.10]): dependability encompasses reliability, availability, maintainability, safety, and survivability.

Section 42.1 provides definitions of the key dependability concepts (Fig. 42.2), which extend the reliability, maintainability, and safety (RMS) concepts [42.11, 12] by characterizing the ability of a device or system to deliver correct service that can justifiably be trusted by all stakeholders in the automated process.

Methods for the design of highly dependable automation systems are then outlined in Sect. 42.2. Section 42.3 discusses methods for achieving long-term dependable operation of an existing system.

Finally, dependability has evolved from reliability/availability concerns to information control concerns, as an outgrowth of the technological deployment of information-intensive systems and the economic pressure for cost-effective automation [42.13]. Section 42.4 concludes with challenges, trends, and open issues related to system resilience, which aims to cope with system dependability in the presence of active faults, i.e., system survivability. Chapter 39 of this handbook contains information related to the concepts covered in this chapter.

42.1 Definitions

Dependability is an integrative concept that encompasses required attributes (qualities) of a system, assessed by quantitative measures (reliability, maintainability) or qualitative ones (safety), in order to cope with the chain of fault–error–failure threats of an operational system, by combining a set of means related to fault prevention, fault tolerance, fault removal, and fault forecasting [42.14].

Reliability is the ability of a device or system to perform a required function under stated conditions for a specified period of time. This property is usually measured by the probability R(t) that the system operates without failure up to time t, which can be expressed in terms of the failure rate λ(t) as

$$R(t) = \exp\left(-\int_0^t \lambda(u)\,\mathrm{d}u\right),$$

meaning

$$R(t) = \Pr(\mathrm{TTF} > t),$$

where TTF is the time to failure.

This definition of reliability is concerned with the following four key elements:

1. First, reliability is a probability. This means that there is always some chance of failure. Reliability engineering is concerned with achieving a specified probability of success, at a specified statistical confidence level.

2. Second, reliability is predicated on intended function. The system requirements specification is the criterion against which reliability is measured.

3. Third, reliability applies to a specified period of time. In practical terms, this means that a system has a specified chance of operating without failure before a final time T (i.e., for 0 < t < T).

4. Fourth, reliability is restricted to operation under stated conditions. This constraint is necessary because it is impossible to design a system for unlimited conditions. Both normal and abnormal operating environments must be addressed during design and testing.
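As a minimal numerical sketch of the reliability definition above (an illustration, not a method from this chapter), R(t) can be computed from any failure-rate function λ(u) by integrating it numerically. The function name `reliability` and the example rates are assumptions chosen for illustration; for a constant rate the result reduces to exp(−λt).

```python
import math
from typing import Callable


def reliability(hazard: Callable[[float], float], t: float, steps: int = 10_000) -> float:
    """Return R(t) = exp(-integral of hazard(u) du from 0 to t).

    The cumulative hazard is approximated with the trapezoidal rule,
    so `hazard` must be finite on [0, t].
    """
    if t <= 0.0:
        return 1.0
    h = t / steps
    area = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        area += hazard(k * h)
    return math.exp(-area * h)


# Constant failure rate (assumed 1e-4 per hour): R(t) reduces to exp(-lambda * t).
print(reliability(lambda u: 1e-4, 1000.0), math.exp(-0.1))

# Linearly increasing failure rate (Weibull shape 2, assumed scale 1000 h),
# a simple wear-out model: R(t) = exp(-(t / 1000) ** 2).
eta = 1000.0
print(reliability(lambda u: 2.0 * u / eta**2, 500.0), math.exp(-0.25))
```

Any other hazard function, for example one fitted to field data, can be passed in the same way; increasing `steps` trades speed for accuracy when the hazard is strongly curved.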

Maintainability is the ease with which a device or system can be repaired or modified to correct and prevent faults, anticipate degradation, improve performance, or adapt to a changed environment. It goes beyond simple physical accessibility, that is, the ability to reach a component to perform the required maintenance task: maintainability should be described [42.15] as the characteristic of material design and installation that determines the requirements for maintenance expenditures, including time, manpower, personnel skill, test equipment, technical data, and facilities, to accomplish operational objectives in the user's operational environment. Like reliability, maintainability can be expressed as a probability M(t), based on the repair rate μ(t), as

$$M(t) = 1 - \exp\left(-\int_0^t \mu(u)\,\mathrm{d}u\right),$$

meaning

$$M(t) = \Pr(\mathrm{TTR} < t),$$

where TTR is the time to repair.

Availability characterizes the degree to which a system or equipment is operable and in a committable state at the start of a mission, when the mission lasts for an unknown, i.e., random, time. A simple representation of availability is the proportion of time a system is in a functioning condition, which, for a system that is up at time 0, can be expressed mathematically [42.17] by

$$A(t) = \frac{\mu}{\mu+\lambda} + \frac{\lambda}{\mu+\lambda}\,e^{-(\mu+\lambda)t},$$

where λ is the constant failure rate and μ the constant repair rate, meaning

$$A(t) \equiv \Pr\big(Z(t) = 1\big),$$

with

$$Z(t) \equiv \begin{cases} 1 & \text{if the system is up at time } t,\\ 0 & \text{if the system is down at time } t. \end{cases}$$
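The availability expression above is the transient solution of a two-state (up/down) Markov model with constant rates, that is, of dA/dt = −λA + μ(1 − A) with A(0) = 1. The sketch below, assuming illustrative values λ = 0.01 h⁻¹ and μ = 0.5 h⁻¹ (not taken from the chapter), compares the closed form with a direct explicit-Euler integration of that equation.

```python
import math


def availability_closed_form(lam: float, mu: float, t: float) -> float:
    """A(t) = mu/(mu+lam) + lam/(mu+lam) * exp(-(mu+lam) t), system up at t = 0."""
    s = lam + mu
    return mu / s + lam / s * math.exp(-s * t)


def availability_markov(lam: float, mu: float, t: float, steps: int = 100_000) -> float:
    """Integrate dA/dt = -lam*A + mu*(1 - A) from A(0) = 1 with explicit Euler steps."""
    a, dt = 1.0, t / steps
    for _ in range(steps):
        a += dt * (-lam * a + mu * (1.0 - a))
    return a


lam, mu = 0.01, 0.5          # assumed failure and repair rates, per hour
for t in (0.0, 1.0, 10.0):
    print(t, availability_closed_form(lam, mu, t), availability_markov(lam, mu, t))

# Asymptote: mu / (lam + mu), i.e. MTBF / (MTBF + MTTR) with MTBF = 1/lam, MTTR = 1/mu.
print(mu / (lam + mu))
```

Both columns agree to within the Euler step error and approach μ/(λ + μ) ≈ 0.980 for these rates; the Markov formulation is the one that generalizes to systems with more than two states.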

System availability is important in achieving production-rate goals, but additional processes must be invoked to assure a high level of product quality. Historically (before 1960), a quality laboratory would draw samples from the production line and subject them to a battery of material, dimensional, and/or functional tests, with the objective of verifying that quality was being attained for a typical part. In recent years, the focus has shifted from assurance of average quality to assurance of the quality of every part produced, driven by consumer product safety concerns. Deming [42.18] and others were instrumental in developing methods for statistical process control, which use quality control data to adjust process parameters in a quality feedback loop that assures consistently high product quality; these techniques were developed and perfected in the 1970s and 1980s. Still more recently, sensors that measure critical quality variables online have been developed, and the quality feedback loop is now often automated (algorithmic statistical process control). At the same time, the standards for product quality have moved up from about two sigma to five or six sigma; in Six Sigma practice, six sigma is conventionally quoted as about 3.4 defects per million opportunities.

Fig. 42.3 Determining safety integrity level according to IEC [42.16]. The risk graph combines consequence severity (C1 minor injury; C2 minor injury or single death; C3 multiple deaths; C4 a very high number of deaths), exposure time (F1 rare to frequent; F2 frequent to continuous), possibility of avoidance (P1 possible; P2 not likely), and probability of undesirable occurrences (W1 very slight; W2 low; W3 high) to yield a required level from SIL 1 to SIL 4, or (1) no special safety requirements, or (2) a single safety function insufficient.
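Sigma levels can be translated into expected defect fractions, assuming a normally distributed quality variable with specification limits at ±k standard deviations from the target. The sketch below (function names are illustrative) computes the two-sided fraction out of specification, optionally with the 1.5σ long-term mean shift conventionally assumed in Six Sigma practice; with that shift, six sigma gives the familiar figure of about 3.4 defects per million.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def defect_fraction(k: float, shift: float = 0.0) -> float:
    """Fraction of parts outside +/- k sigma specification limits.

    `shift` moves the process mean (in sigma units) toward one limit,
    as in the 1.5-sigma shift convention of Six Sigma.
    """
    near_tail = normal_cdf(-(k - shift))   # beyond the nearer limit
    far_tail = normal_cdf(-(k + shift))    # beyond the farther limit
    return near_tail + far_tail


for k in (2, 3, 4, 5, 6):
    centered = defect_fraction(k) * 1e6
    shifted = defect_fraction(k, shift=1.5) * 1e6
    print(f"{k} sigma: {centered:14.4f} ppm centered, {shifted:10.1f} ppm with 1.5-sigma shift")
```

The two columns differ by orders of magnitude at high sigma levels, which is why quoted defect rates should always state whether the mean shift is included.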

Increasing availability consists of reducing the number of failures (reliability) and reducing the time to repair (maintainability), according to the formula

$$A(\infty) = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$$

for the asymptotic value of A(t), where MTBF is the mean time between failures and MTTR is the mean time to repair. For constant rates, MTBF = 1/λ and MTTR = 1/μ, so this equals the limit μ/(μ + λ) of the expression for A(t).
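A short worked example, with assumed figures (MTBF = 500 h, MTTR = 5 h; not taken from the chapter), shows how the two levers in the formula above trade off. Because A(∞) depends only on the ratio MTTR/MTBF, doubling MTBF and halving MTTR have exactly the same effect on asymptotic availability.

```python
HOURS_PER_YEAR = 8760.0


def steady_state_availability(mtbf: float, mttr: float) -> float:
    """A(inf) = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def downtime_per_year(mtbf: float, mttr: float) -> float:
    """Expected hours per year spent in the down state at steady state."""
    return HOURS_PER_YEAR * (1.0 - steady_state_availability(mtbf, mttr))


cases = {
    "baseline": (500.0, 5.0),                 # assumed MTBF, MTTR in hours
    "double MTBF (reliability)": (1000.0, 5.0),
    "halve MTTR (maintainability)": (500.0, 2.5),
}
for name, (mtbf, mttr) in cases.items():
    a = steady_state_availability(mtbf, mttr)
    print(f"{name:30s} A = {a:.5f}, downtime = {downtime_per_year(mtbf, mttr):.1f} h/year")
```

Both improvements roughly halve the yearly downtime of the baseline; which lever to pull is then an economic question of whether reliability or maintainability is cheaper to improve.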

Safety is the state of being safe: the condition of the automated system being protected against catastrophic consequences to the user(s) and the environment due to component failure, equipment damage, operator error, accidents, or any other undesirable abnormal event. Safety hazard mitigation can take the form of protection from the event itself or from exposure to something that causes health or economic losses. It can include protection of people and limitation of environmental impact.

Industrial automation standards (Fig. 42.3) introduce engineering and design requirements that vary according to the safety integrity level (SIL). The SIL specifies the target level of safety integrity, which can be determined by a risk-based approach that quantifies the desired average probability of failure of a designed function, the probability of a dangerous failure per hour, and the consequent severity of the failure. Combining these criteria for a given function leads to four SILs that can be associated with specific engineering guidelines and architecture recommendations; for example, SIL 4 is the most critical level, and the use of formal methods is strongly recommended to handle the complexity of software-intensive applications and to prove safety properties. To achieve RMS properties over the lifecycle of an automated system, two complementary activities must be undertaken:

• During the system development and design phase, the occurrence of faults should be prevented by using appropriate models and methods: quantitative approaches based on stochastic models can be used to perform a predictive RMS analysis, and qualitative approaches focusing on the engineering process (e.g., Six Sigma) can be used to improve the quality of the automated system and its products.
• During the operational life of the automated system, personnel should avoid or react to undesired situations by deploying appropriate safety architectures, maintenance procedures, and management methods.
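The SIL bands mentioned above are tabulated in IEC 61508 as target failure measures: an average probability of failure on demand (PFDavg) for low-demand safety functions, and an average frequency of dangerous failure per hour (PFH) for high-demand or continuous functions. The lookup below encodes those bands; the function name and interface are illustrative assumptions, and an actual SIL assignment also involves the risk analysis of Fig. 42.3 and architectural constraints that this sketch ignores.

```python
from typing import Optional

# IEC 61508 target failure measures: (SIL, lower bound inclusive, upper bound exclusive).
LOW_DEMAND_PFD_AVG = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]
HIGH_DEMAND_PFH = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7), (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]


def sil_for(measure: float, high_demand: bool) -> Optional[int]:
    """Return the SIL whose band contains `measure`, or None if outside the table."""
    bands = HIGH_DEMAND_PFH if high_demand else LOW_DEMAND_PFD_AVG
    for sil, low, high in bands:
        if low <= measure < high:
            return sil
    return None


print(sil_for(5e-4, high_demand=False))   # PFDavg of 5e-4 falls in the SIL 3 band
print(sil_for(3e-8, high_demand=True))    # PFH of 3e-8 per hour falls in the SIL 3 band
print(sil_for(0.2, high_demand=False))    # outside the SIL 1 band: None
```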

Survivability is the quantified ability of a system to continue to fulfil its mission during and after a natural or manmade disturbance. In contrast to dependability studies, which focus on the analysis of system dysfunction, resilience analysis for survivability focuses on the range of conditions over which the system will survive.

42.2 RMS Engineering

42.2.1 Predictive RMS Assessment

To evaluate and measure the various parameters that characterize system dependability, many methods and approaches have been developed. Their goal is to provide a structured framework to represent failures qualitatively and/or quantitatively. They are mainly of two types: declarative and probabilistic.

Declarative methods are designed to identify, classify, and bracket failures and to provide methods and techniques to avoid them. Most classical models use graphical classification of failures, causes, and criticality (failure mode, effects, and criticality analysis (FMECA), hazard and operability study (HAZOP), etc.), block diagrams, and fault trees to provide a graphical means of evaluating the relationships between different parts of the system (Fig. 42.4). These models incorporate predictions based on parts-count failure rates taken from historical data. While the predictions are often not very accurate in an absolute sense, they are valuable for assessing relative differences between design alternatives.
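To illustrate how a fault tree turns component data into a system-level estimate, the sketch below evaluates AND/OR gates under the common simplifying assumption that basic events are statistically independent. The tree structure, event names, and probabilities are hypothetical and are not taken from Fig. 42.4.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Basic:
    name: str
    p: float  # probability of the basic event over the mission time


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    children: List["Node"]


Node = Union[Basic, Gate]


def probability(node: Node) -> float:
    """Probability of the event at `node`, assuming independent basic events."""
    if isinstance(node, Basic):
        return node.p
    probs = [probability(child) for child in node.children]
    if node.kind == "AND":            # all inputs must occur
        result = 1.0
        for q in probs:
            result *= q
        return result
    if node.kind == "OR":             # at least one input occurs
        none_occur = 1.0
        for q in probs:
            none_occur *= 1.0 - q
        return 1.0 - none_occur
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical tree: the function is lost if the controller fails OR both redundant pumps fail.
top = Gate("OR", [
    Basic("controller", 1e-3),
    Gate("AND", [Basic("pump A", 1e-2), Basic("pump B", 1e-2)]),
])
print(probability(top))   # 1 - (1 - 1e-3) * (1 - 1e-4)
```

Comparing the top-event probability for two candidate trees is exactly the kind of relative assessment between design alternatives described above, even when the absolute input rates are uncertain.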

Probabilistic methods are designed to measure some RMS parameters in terms of probability. Models are mainly based on the complete enumeration of a system's possible states, including faulty states. These models use the state-transition notation of the classical stochastic models of discrete-event systems, such as Markov chains and Petri nets [42.19]. The benefit of Markov and stochastic Petri net approaches lies in their capability to support quantitative analysis of the models, but these models suffer from the combinatorial explosion of the number of states that occurs when modeling complex industrial systems. Moreover, these analytic approaches assume that the stochastic processes can be modeled using exponential laws with constant rates. For industrial processes that do not fit this strong Markovian hypothesis, simulation models, such as Monte Carlo simulation, are often the only practical way to evaluate the RMS parameters.
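When failure or repair times are not exponential, Monte Carlo simulation offers a practical estimate of availability. The sketch below simulates a single repairable component that alternates up and down periods, with Weibull-distributed times to failure and lognormal repair times; both laws and their parameters are illustrative assumptions. For such an alternating renewal process, long-run availability still tends to mean up time over mean cycle time, which gives a sanity check.

```python
import math
import random
from typing import Callable

Sampler = Callable[[random.Random], float]


def simulate_availability(sample_ttf: Sampler, sample_ttr: Sampler,
                          horizon: float, runs: int, seed: int = 1) -> float:
    """Estimate the mean fraction of [0, horizon] spent in the up state.

    Each run starts in the up state and alternates up periods drawn from
    `sample_ttf` with down (repair) periods drawn from `sample_ttr`.
    """
    rng = random.Random(seed)
    total_up = 0.0
    for _ in range(runs):
        t, up_time = 0.0, 0.0
        while t < horizon:
            up = sample_ttf(rng)
            up_time += min(up, horizon - t)   # clip the last up period at the horizon
            t += up
            if t >= horizon:
                break
            t += sample_ttr(rng)
        total_up += up_time
    return total_up / (runs * horizon)


def weibull_ttf(rng: random.Random) -> float:
    # Assumed wear-out law: Weibull with scale 1000 h and shape 2.
    return rng.weibullvariate(1000.0, 2.0)


def lognormal_ttr(rng: random.Random) -> float:
    # Assumed repair law: lognormal with median 10 h and log-standard deviation 0.5.
    return rng.lognormvariate(math.log(10.0), 0.5)


mtbf = 1000.0 * math.gamma(1.0 + 1.0 / 2.0)   # mean of the Weibull law
mttr = 10.0 * math.exp(0.5**2 / 2.0)          # mean of the lognormal law
estimate = simulate_availability(weibull_ttf, lognormal_ttr, horizon=100_000.0, runs=200)
print(estimate, mtbf / (mtbf + mttr))
```

The same loop extends to multi-component systems with repair queues or scheduled maintenance, where no closed form exists and the Markov assumption would be violated.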


Fig. 42.4a,b Example of declarative models. (a) Fault tree. (b) FMECA (RPN – risk priority number, Sev – severity, Occur – occurrence, Det – detectability (high detectability implies lower risk))

(a) Fault tree: Function 1 (state: degraded) related to the states of Component 1 and Component 2 (OK, regenerated1, regenerated2).

(b) FMECA extract for item 3, front door l.h.
Process function/requirement: manual application of wax inside door, to cover inner door lower surfaces at minimum wax thickness to retard corrosion.
Potential failure mode: insufficient wax coverage over specified surface.
Potential effects of failure (Sev 7): deteriorated life of door, leading to unsatisfactory appearance due to rust through paint over time and impaired function of interior door hardware.

Potential cause of failure | Occur | Current process controls | Det | RPN | Sev/Occur/Det after action | RPN after action
Manually inserted spray head not inserted far enough | 8 | Visual check each hour; 1/shift for film thickness (depth meter) and coverage | 5 | 280 | 7/2/5 | 70
Spray head clogged (viscosity too high, temperature too low, pressure too low) | 5 | Test spray pattern at start-up and after idle periods, and preventive maintenance program to clean head | 3 | 105 | 7/1/3 | 21
Spray head deformed due to impact | 2 | Preventive maintenance program to maintain head | 2 | 28 | 7/2/2 | 28
Spray time insufficient | 8 | Operator instructions and lot sampling (10 doors/shift) to check for coverage of critical areas | 7 | 392 | 7/1/7 | 49

For the first cause, the recommended action is to add a positive depth stop to the sprayer; action taken: stop added, sprayer checked online.
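The risk priority numbers in Fig. 42.4b follow the usual FMECA convention RPN = Sev × Occur × Det. Recomputing them from the ratings in the figure and ranking the causes before and after the corrective actions takes only a few lines; the data structure below is an illustrative choice.

```python
from typing import NamedTuple


class Cause(NamedTuple):
    name: str
    sev: int
    occur: int
    det: int

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection rating."""
        return self.sev * self.occur * self.det


# Ratings read from Fig. 42.4b, before and after the recommended actions.
before = [
    Cause("spray head not inserted far enough", 7, 8, 5),
    Cause("spray head clogged", 7, 5, 3),
    Cause("spray head deformed due to impact", 7, 2, 2),
    Cause("spray time insufficient", 7, 8, 7),
]
after = [
    Cause("spray head not inserted far enough", 7, 2, 5),
    Cause("spray head clogged", 7, 1, 3),
    Cause("spray head deformed due to impact", 7, 2, 2),
    Cause("spray time insufficient", 7, 1, 7),
]

for label, causes in (("before", before), ("after", after)):
    print(label)
    for c in sorted(causes, key=lambda c: c.rpn, reverse=True):
        print(f"  RPN {c.rpn:4d}  {c.name}")
```

Sorting by RPN shows where corrective effort pays off most: the two highest-ranked causes drop from 392 and 280 to 49 and 70, because their occurrence ratings fall while severity stays fixed at 7.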

Whatever approach is used, models for predictive RMS evaluation rely on collected system data that do not always reflect reality, owing to the gap between real and estimated states. This limitation reinforces the need to establish reliable gateways between RMS engineering and system deployment, so that the RMS model data can be updated with real-time information provided by the automated system.
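One common way to feed operational data back into a predictive model (a technique chosen here for illustration, not one prescribed by the chapter) is a conjugate Bayesian update of a constant failure rate. With a gamma prior Gamma(α, β) on λ and n failures observed over a cumulative operating time T, the posterior is Gamma(α + n, β + T). The prior, which could encode a parts-count estimate, and the field data below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaRate:
    """Gamma(shape, rate) belief about a constant failure rate lambda, per hour."""
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    def update(self, failures: int, operating_hours: float) -> "GammaRate":
        """Conjugate update for a Poisson failure count over the given exposure."""
        return GammaRate(self.shape + failures, self.rate + operating_hours)


# Hypothetical prior, equivalent to 2 failures in 20 000 h (mean 1e-4 per hour).
prior = GammaRate(shape=2.0, rate=20_000.0)
# Hypothetical field data reported by the automated system: 9 failures in 30 000 h.
posterior = prior.update(failures=9, operating_hours=30_000.0)
print(prior.mean, posterior.mean)   # estimate moves from 1e-4 to 11 / 50 000 per hour
```

The prior acts like pseudo-observations, so sparse early field data shift the estimate gently while accumulated operating experience eventually dominates it.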

42.2.2 Towards a Safe Engineering Process for RMS

Automation techniques have proven their effectiveness in controlling the behavior of complex systems, based on the use of suitable mathematical relationships involving feedback system dynamics during the design process. Nevertheless, the process of automating a system, as addressed by system theory for automatic control, also involves qualitative phases [42.19] that require intuitive modeling of the real phenomena to be controlled (a quantity of material, energy, or information, a robot, a cell, a plant, etc.) in order to achieve end-user goals. The modeler's intuition remains important [42.20, 21] for building the model as an abstraction of the real system, by identifying the appropriate input, output, and state variables in order to logically define the required system behavior. The main difficulty is to handle the quality of the automation engineering processes, from definition and development to deployment and operation of the target system, through standardization and the use of best practices that are generic to well-identified problem classes and whose quality has been established by experience. Capability maturity models (CMM) [42.22] and validation–verification methods guide engineers in combining prescriptive and descriptive models to meet system requirements such as RMS, but without any formal proof of the accuracy of the resulting system model. The CMM was developed in the 1990s by the Carnegie Mellon University Software Engineering Institute as a means of rating the thoroughness of a software development process (Table 42.1). Finally, the present trend to compose automation logic by assembling standardized, configurable, off-the-shelf components [42.23] strengthens the need first to better relate the modeling process to the system goals, and then to preserve these goals through the model transformations of the automation engineering chain.

Table 42.1 Capability maturity model [42.24]

Level 1: Initial
The software process is characterized as ad hoc and occasionally even chaotic. Few processes are defined and success depends on individual effort.

Level 2: Repeatable
Basic project management processes are established to track cost, schedule, and functionality. The necessary process discipline is in place to repeat earlier successes on projects with similar applications.

Level 3: Defined
The software process for both management and engineering activities is documented, standardized, and integrated into a standard software process for the organization.

Level 4: Managed
Detailed measures of the software process and product quality are collected. Both the software process and product are quantitatively understood and controlled.

Level 5: Optimizing
Continuous process improvement is enabled by quantitative feedback from the process and from piloting innovative ideas and technologies.

To pave the way toward CMM level 5, there is a growing demand for formalized methods for assuring dependability in industrial automation engineering, in order to compensate for the increasing complexity of software-intensive applications [42.25]. In particular, high levels of safety integrity, as addressed by the International Electrotechnical Commission (IEC) 61508 standard, should be formally checked and proven by mathematically sound techniques in order to verify the required completeness, consistency, unambiguity, and finally correctness of the system models throughout the definition, development, and deployment phases of the engineering lifecycle [42.26, 27].

The conformance of system models with regard to the requirements, and especially RMS features, can be assessed using:

• Assertion methods, which include the properties to be checked in the system models and then perform an a posteriori verification using automatic techniques such as model checking [42.28].
• Refinement methods, which start with the formalization of a requirement model and progressively enrich this model until a concrete model of the system that fulfils the identified requirements by construction is obtained. They can be based on:
– Semiformal mechanisms that identify and classify RMS requirements and then allocate those requirements to the functions, components, and equipment of the automated system. In this case, classical models combine computer-science approaches such as the unified modeling language (UML) with discrete-event analysis models.
– Formal mechanisms [42.29] that allow a sequence of formal models to be systematically derived while preserving the link between formal models and required properties (goals): an extension of the spiral method for software engineering.
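As a toy illustration of the assertion approach, the sketch below performs explicit-state model checking of a safety invariant: it enumerates every state reachable from the initial state by breadth-first search and returns a counterexample trace if the invariant is violated. The press/guard-door model, its transitions, and the invariant are hypothetical; dedicated model checkers use far more efficient state-space representations.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = Tuple[bool, bool]  # (door_closed, press_moving) of a hypothetical press


def successors(state: State, interlocked: bool) -> Iterable[State]:
    """Transitions of a hypothetical press controller."""
    door_closed, moving = state
    if door_closed and not moving:
        yield (True, True)              # start the press
    if moving:
        yield (door_closed, False)      # stop the press
    if not door_closed:
        yield (True, moving)            # close the guard door
    if door_closed and (not moving or not interlocked):
        yield (False, moving)           # open the door (blocked while moving if interlocked)


def press_safe(state: State) -> bool:
    """Safety invariant: the press may move only while the door is closed."""
    door_closed, moving = state
    return door_closed or not moving


def check_invariant(init: State, succ: Callable[[State], Iterable[State]],
                    invariant: Callable[[State], bool]) -> Optional[List[State]]:
    """Explore all reachable states breadth-first.

    Returns None if the invariant holds everywhere, otherwise a shortest
    trace from `init` to a violating state (the counterexample).
    """
    parent: Dict[State, Optional[State]] = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace: List[State] = []
            node: Optional[State] = state
            while node is not None:
                trace.append(node)
                node = parent[node]
            return trace[::-1]
        for nxt in succ(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


start: State = (True, False)
print(check_invariant(start, lambda s: successors(s, interlocked=True), press_safe))
print(check_invariant(start, lambda s: successors(s, interlocked=False), press_safe))
```

With the interlock, the first call prints None because the invariant holds in every reachable state; without it, the checker returns the trace "start press, then open door", which is the kind of counterexample a model checker reports back to the engineer.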

All of these techniques may be combined to address RMS issues [42.30, 31], but the emphasis on correct system definition is then shifted to the earlier requirements analysis and elicitation phases.


42.3 Operational Organization and Architecture for RMS

Taking advantage of technological advances in the field of communications (web services embedded in programmable logic controllers) and in the field of electronics and information technology (radio-frequency identification (RFID), sensor networks, embedded software components, etc.), automated systems now include an increasing share of information and communication technology distributed at the very heart of production processes and products. However, this automation comes at a price: the complexity of the control system, in terms of both the heterogeneous hardware (dedicated computers, communication networks, supply chain operations and capture, etc.) and the software functions (scheduling, control, supervisory control, monitoring, diagnosis, reconfiguration, etc.) that it houses (Fig. 42.5).

This section deals with the operational architectures and organizations required to enable active dependability of the automated system by providing information processing, storage, and communication capabilities to anticipate undesired situations or to react as effectively as possible to fault occurrences.

Fig. 42.5 Evolution of automated system architecture over time, up to today: from PLCs, remote I/O, fieldbus, and HMI connected through OPC data servers to SCADA, then to MES and the ERP, CRM, and SCM enterprise applications, with synchronization evolving from static (B2MML) to dynamic (S95/OAGIS standard) through EAI (CRM – customer relationship management, ERP – enterprise resource planning, SCM – supply chain management, MES – manufacturing execution system, OPC – OLE for process control, PLC – programmable logic controller, SCADA – supervisory control and data acquisition, EAI – enterprise application integration, HMI – human machine interface, OAGIS – open applications group integration specification)

42.3.1 Integrated Control and Monitoring Systems

In order to maintain an acceptable quality of service, dependability should no longer be treated as a redundant add-on, but should be integrated with production systems so as to become an asset in the competitive business environment. This leads to the integration of additional monitoring functions with the classical control functions of an automated system, giving the system the ability to reconfigure itself to continue some or all of its missions. The main idea is to avoid a complete shutdown of the system when a failure occurs (with a consequent reduction in the productive potential of the system). Taking advantage of the system's intrinsic flexibilities, the aim is to promote system reconfiguration using a reflex loop including:

• Failure detection reports on the normal or abnormal behavior of the system. It is mainly based on a theoretical model of the functional and dysfunctional behavior of the devices involved in the automated system.
• Diagnosis establishes a causal connection between an observed symptom and the failure that occurred, its causes, and its consequences. This function involves failure localization, to isolate the failure to a subarea of the system and/or devices; failure identification, to determine precisely the causes that brought about the failure; and prognosis, to determine whether or not there are immediate consequences of the failure on the plant's future operation.
• Reconfiguration concerns the reorganization of the hardware and/or software of a control system to ensure production within a timeframe compatible with the specifications. This function involves decision-making activities to define the most appropriate control policy and operational activities to implement the reconfigured control actions.

Fig. 42.6 Integrated control and monitoring systems (after [42.32]): a decisional unit (supervision, fault processing, error recovery) and a control unit with command filters act on the process, based on the process state observed through sensors.

Fig. 42.7 Integrated control, maintenance, and technical management: layers of automation (business level: business management processing; technical information system: technical management; shop-floor level: maintenance and control; fieldbus; process)

Integration of monitoring [42.33, 34], diagnosis [42.35], or even prognosis into control for manufacturing systems has been widely explored for discrete-event systems (Fig. 42.6), and today provides material for identifying degradation or failure modes where control reconfiguration may be required [42.36].
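The reflex loop of failure detection, diagnosis, and reconfiguration can be sketched as follows. Everything here is a hypothetical illustration: detection compares measured and model-predicted values against a threshold to produce symptoms, diagnosis looks up the symptom pattern in a fault signature table, and reconfiguration selects a degraded mode (for instance a redundant actuator) for the diagnosed fault instead of shutting the system down.

```python
from typing import Dict, FrozenSet, Optional

THRESHOLD = 0.5  # assumed residual threshold, in the same units as the measurements

# Hypothetical fault signature table: observed symptom set -> diagnosed fault.
SIGNATURES: Dict[FrozenSet[str], str] = {
    frozenset({"flow_low"}): "pump_1_degraded",
    frozenset({"flow_low", "pressure_low"}): "pump_1_failed",
    frozenset({"temp_high"}): "cooling_fault",
}

# Hypothetical reconfiguration policy: fault -> action that keeps production running.
RECONFIGURATION: Dict[str, str] = {
    "pump_1_degraded": "reduce_rate",
    "pump_1_failed": "switch_to_pump_2",   # exploit a redundant actuator
}


def detect(measured: Dict[str, float], predicted: Dict[str, float]) -> FrozenSet[str]:
    """Failure detection: flag variables whose model residual exceeds the threshold."""
    symptoms = set()
    for var, value in measured.items():
        residual = value - predicted[var]
        if abs(residual) > THRESHOLD:
            symptoms.add(f"{var}_{'high' if residual > 0 else 'low'}")
    return frozenset(symptoms)


def diagnose(symptoms: FrozenSet[str]) -> Optional[str]:
    """Diagnosis: map a symptom pattern to a known fault, or None if unknown."""
    return SIGNATURES.get(symptoms)


def control_step(measured: Dict[str, float], predicted: Dict[str, float]) -> str:
    """One pass of the reflex loop: detect, diagnose, then reconfigure."""
    symptoms = detect(measured, predicted)
    if not symptoms:
        return "normal"
    fault = diagnose(symptoms)
    if fault is None:
        return "alert_operator"                          # unknown pattern: a human decides
    return RECONFIGURATION.get(fault, "safe_shutdown")   # no degraded mode available


predicted = {"flow": 10.0, "pressure": 3.0, "temp": 60.0}
measured = {"flow": 8.9, "pressure": 2.2, "temp": 60.1}
print(control_step(measured, predicted))   # flow and pressure low -> switch_to_pump_2
```

Handing unknown symptom patterns to the operator reflects the supervisory role of the human operator; a real system would also need debouncing of symptoms and verification that the reconfigured mode is itself safe.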

Reconfiguration exploits the various flexibilities of the automated system (functional and/or material redundancies). In this way, it aims to satisfy fault-tolerance properties, which characterize the ability of a system (often computer-based) to continue operating properly when some of its components fail. Two of the most important design techniques are replication and redundancy. Replication provides multiple identical instances of the same system or subsystem, directs tasks or requests to all of them in parallel, and chooses the correct result on the basis of a quorum. Redundancy provides multiple identical instances of the same system and switches to one of the remaining instances in case of a failure. These techniques significantly increase system reliability, and are often the only viable means of doing so. However, they are difficult to design and expensive to implement, and are therefore limited to critical parts of the system.
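The two techniques can be contrasted in a few lines of code. The following minimal Python sketch rests on simplifying assumptions that the text does not make: replicas return hashable results, a failed standby instance reveals itself by raising an exception, and failures are independent. The reliability helpers use the standard formulas for n parallel instances and for triple modular redundancy (TMR) with a perfect 2-out-of-3 voter.

```python
from collections import Counter
from collections.abc import Callable, Hashable, Sequence
from typing import Optional


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Replication: every replica computes the result; a value is accepted only
    if a strict majority (the quorum) agrees on it, otherwise None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) // 2 else None


def failover(instances: Sequence[Callable[[], float]]) -> float:
    """Redundancy: use the first instance and switch to the next remaining one
    whenever the current instance fails (modeled here as raising an exception)."""
    last_error: Optional[Exception] = None
    for instance in instances:
        try:
            return instance()
        except Exception as exc:  # any exception is treated as an instance failure
            last_error = exc
    raise RuntimeError("all redundant instances failed") from last_error


def parallel_reliability(r: float, n: int) -> float:
    """Probability that at least one of n independent instances of reliability r works."""
    return 1.0 - (1.0 - r) ** n


def tmr_reliability(r: float) -> float:
    """Triple modular redundancy with a perfect 2-out-of-3 voter: 3r^2 - 2r^3."""
    return 3 * r**2 - 2 * r**3
```

The TMR formula shows that voting is not a free gain. It exceeds the reliability of a single unit only when r > 0.5: with r = 0.9 it gives 0.972, but with r = 0.4 it gives 0.352.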

Automation of these functions is clearly necessary to give the industrial production system the best reactivity when a failure occurs. Nevertheless, system stoppage is often handled by the human operator, who must act manually to put the system back into an admissible state. This justifies the use of supervision and supervisory control and data-acquisition (SCADA) systems. These systems help human operators monitor the plant and decide on the corrective actions needed to return to a normal functioning situation (reconfiguration, management of operating modes). Given the ever-increasing complexity of industrial processes, this task tends to become difficult or even impossible for the operator alone. For these reasons, much research aims to develop solutions that assist the human operator during reconfiguration.

42.3.2 Integrated Control, Maintenance, and Technical Management Systems

Building on integrated control and monitoring systems, European projects in intelligent actuation and measurement [42.37–40] demonstrated the benefit of integrating control, maintenance, and technical management (CMTM) activities [42.41]:

• To optimize control activities by exploiting the plant as efficiently as possible and taking into account real-time information about process status (device and function availability) provided by monitoring and maintenance activities
• To optimize the scheduling of maintenance activities by taking into account production constraints and objectives
• To optimize, by technical management based on validated information, the operation phase by modifying control or maintenance procedures, tools, and materials

At the shop-floor level of the production system, applying this principle means integrating the operational activities of the CMM agents responsible for the plant and its lower-level interfaces with the system devices. These activities are also linked with the business level of the enterprise (enterprise resource planning, etc.) for business-to-manufacturing integration through a manufacturing execution system (MES). They rely on collaboration between human stakeholders and technical resources that support schedule management, quality management, etc. They also support process management and maintenance management, which depend more heavily on the e-Connectivity of the supporting devices.

The expected integrated organization of shop-floor activities requires that information be made available to all the operational activities (MES or CMM). Intelligence embedded in field devices (e.g., actuators, sensors, and programmable logic controllers (PLCs)), combined with digital communication, provides an informational representation of the production process that is as efficient as possible: the system provides the right information at the right time and at the right place. In other words, the closer the data representation (e.g., in an object-oriented system) is to the physical and material flows, the better the semantics of its informational representation for integration purposes (Fig. 42.7).

At the shop-floor level, local intelligence (software) allows information processing, information storage, and communication capabilities to be distributed in field devices. It also adds new services to their classical roles, related to monitoring, validation, evaluation, decision making, etc. These services concern both the devices' own operations (increased degree of autonomy) and their application context (increased degree of component interaction).
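The following hypothetical Python sketch makes the idea of services embedded in field devices concrete. It models an "intelligent" sensor that attaches a validity status to each measurement and makes a simple local maintenance decision. The range and rate-of-change checks, the class and method names, and the threshold of three invalid samples are illustrative assumptions. They are not services defined by the cited projects.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Validity(Enum):
    VALID = "valid"
    SUSPECT = "suspect"
    INVALID = "invalid"


@dataclass
class IntelligentSensor:
    """Hypothetical field device that validates its own measurements locally."""

    low: float       # lower bound of the physically plausible range
    high: float      # upper bound of the physically plausible range
    max_step: float  # largest plausible change between consecutive samples
    last_value: Optional[float] = None
    invalid_count: int = 0

    def measure(self, raw: float) -> tuple[float, Validity]:
        """Return the raw value together with a validity status for upper layers."""
        if not self.low <= raw <= self.high:
            self.invalid_count += 1  # out-of-range samples feed self-diagnosis
            return raw, Validity.INVALID
        status = Validity.VALID
        if self.last_value is not None and abs(raw - self.last_value) > self.max_step:
            status = Validity.SUSPECT  # in range, but an implausible jump
        self.last_value = raw
        return raw, status

    def self_diagnosis(self, limit: int = 3) -> str:
        """Local decision: request maintenance after `limit` out-of-range samples."""
        return "maintenance_requested" if self.invalid_count >= limit else "ok"
```

Because validation happens in the device, the MES or CMM layer receives each value together with a statement of how far it can be trusted, rather than a bare number.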

42.3.3 Remote and e-Maintenance

Modern production equipment (manufactured by original equipment manufacturers, OEMs) is highly specialized. For example, a semiconductor manufacturing plant may have over 200 specialized production stages and over 100 equipment suppliers. In a serial process of this type, all 200 steps must operate within specification to produce an operational semiconductor at the end of the line. This type of process requires extraordinarily high reliability (and availability) of the OEM production equipment. When such equipment must be taken out of service, production loss rates of over 100 000 $/h are not uncommon, so accurate diagnosis and rapid repair of equipment are essential. Since the year 2000, OEMs have increasingly provided network-capable diagnostic interfaces to equipment. As a result, experts do not have to come to the site to make a diagnosis or repair. Instead, they can guide plant personnel in doing this, and can order and ship parts overnight. This is often termed e-Diagnostics, and is crucial to maintaining high availability of production


equipment. Using e-Diagnostics, a manufacturer may maintain remote service contracts with dozens of OEM suppliers to assure reliable operation of an entire production process.
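The following arithmetic shows why a 200-stage serial process demands such high equipment availability. The Python sketch assumes that stages fail independently and that the line runs continuously (8760 h per year). The per-stage availabilities of 0.999 and 0.9999 are hypothetical; only the 200 stages and the 100 000 $/h loss rate come from the text.

```python
import math
from collections.abc import Sequence


def series_availability(stage_availabilities: Sequence[float]) -> float:
    """Availability of a serial line in which every stage must be up,
    assuming stages fail independently."""
    return math.prod(stage_availabilities)


def expected_annual_loss(line_availability: float,
                         loss_per_hour: float,
                         hours_per_year: float = 8760.0) -> float:
    """Expected production loss per year caused by line unavailability."""
    return (1.0 - line_availability) * hours_per_year * loss_per_hour


if __name__ == "__main__":
    LOSS_PER_HOUR = 100_000  # $/h, the order of magnitude quoted in the text
    for per_stage in (0.999, 0.9999):  # hypothetical per-stage availabilities
        line = series_availability([per_stage] * 200)  # 200 serial stages
        print(f"per stage {per_stage}: line {line:.3f}, "
              f"expected loss {expected_annual_loss(line, LOSS_PER_HOUR):,.0f} $/year")
```

With 200 stages, a per-stage availability of 0.999 already drops line availability to about 0.82, while 0.9999 keeps it at about 0.98.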

In some processes, production equipment is subject to wear or usage that is predictably related, for example, to the number of parts produced. In these cases, it is possible to forecast the need for inspection, repair, or periodic replacement of critical parts, a process called prognostics. Some statistical methods for prognostics (such as Weibull analysis) are well known. However, the ability to accurately predict the need for service of an individual part is still not well developed, and is not yet widely accepted. One goal of this type of analysis is condition-based maintenance (CBM), the practice of maintaining equipment based on its condition rather than on a fixed schedule [42.43].
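Weibull analysis operates on a population of parts. The following minimal Python sketch shows what it provides, assuming a two-parameter Weibull model with life measured in parts produced and hypothetical parameters (shape β = 2.5, scale η = 40 000 parts). It computes the survival function, the conditional probability of failure over the next production interval, and a B10 life. All three are population statistics, which is consistent with the text's point that predicting the service need of an individual part remains difficult.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Probability that a part survives beyond age t: exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def conditional_failure_prob(t: float, dt: float, beta: float, eta: float) -> float:
    """Probability that a part that has survived to age t fails before t + dt."""
    return 1.0 - weibull_reliability(t + dt, beta, eta) / weibull_reliability(t, beta, eta)


def b_life(fraction_failed: float, beta: float, eta: float) -> float:
    """Age by which the given fraction of the population is expected to fail
    (e.g., B10 life for fraction_failed = 0.10)."""
    return eta * (-math.log(1.0 - fraction_failed)) ** (1.0 / beta)


if __name__ == "__main__":
    BETA, ETA = 2.5, 40_000.0  # hypothetical wear-out parameters, in parts produced
    print(f"B10 life: {b_life(0.10, BETA, ETA):,.0f} parts")
    print("P(fail in next 5,000 parts | survived 30,000): "
          f"{conditional_failure_prob(30_000, 5_000, BETA, ETA):.2f}")
```

A shape parameter above 1 models wear-out. As a result, the conditional failure probability for the same 5000-part interval grows with the age the part has already accumulated.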

Proactive maintenance is a new maintenance policy [42.44] based on prognostics, and improves on condition-based maintenance (CBM). CBM acquires real-time information in order to propose actions and to repair only when maintenance is necessary. CBM consists of equipment health monitoring to determine the equipment state, and is a kind of just-in-time maintenance. However, CBM is not able to predict the future state of equipment. The prognostic capability of proactive maintenance is based on the history of the equipment's operation, its current state, and its future operating conditions. The objective of proactive maintenance is to determine whether the system is able to accomplish its function for a given time (for example, until the next plant maintenance shutdown).

Fig. 42.8 Distributed e-Maintenance infrastructure in a power energy plant [42.42] (DCS: distributed control system) [Figure labels: sites in Belgium, France, Italy, and the Netherlands with automation, DCS, and data record historian blocks; e-Diagnostic center; investigation center; data concentrator ("data hub"); on-demand, offline, and online connections.]

Information from control systems (distributed or not), automation, data-acquisition systems, and sensors makes it possible to measure variables continuously and so produce symptoms or indicators of malfunction. The same sources also record the number of production cycles, the production time, the energy consumed, etc. This information can then be correlated with the diagnosis to assess the probabilities of the root causes. Building on these monitoring and diagnosis functions, proactive maintenance uses prognosis to propagate the drift of system behavior through time, taking future operating conditions into account. From this extrapolation, prognostics can estimate the time at which the drift will exceed a threshold and propose a time before the next potential failure. In this way, proactive maintenance can optimize maintenance actions and planning in order to minimize production downtime.
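The threshold-crossing estimate described above can be sketched as follows. The sketch assumes a single scalar degradation indicator that rises roughly linearly over time and a fixed alarm threshold; both are hypothetical. Unlike the proactive approach in the text, this linear extrapolation ignores future operating conditions. It only shows the mechanics of turning an observed drift into a time before the next potential failure, and of comparing that time with a horizon such as the time left until the next planned shutdown.

```python
import math
from collections.abc import Sequence


def fit_line(times: Sequence[float], values: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit: values ~ slope * time + intercept.
    Requires at least two distinct time stamps."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def remaining_useful_life(times: Sequence[float], values: Sequence[float],
                          threshold: float) -> float:
    """Time from the last sample until the extrapolated drift reaches the
    threshold (math.inf if the indicator is not drifting towards it)."""
    slope, intercept = fit_line(times, values)
    if slope <= 0:
        return math.inf
    t_cross = (threshold - intercept) / slope
    return max(0.0, t_cross - times[-1])


def survives_until(times: Sequence[float], values: Sequence[float],
                   threshold: float, horizon: float) -> bool:
    """Can the function be accomplished for `horizon` more time units?"""
    return remaining_useful_life(times, values, threshold) >= horizon
```

For example, an indicator sampled at times 0, 10, 20, and 30 with values 1, 2, 3, and 4 drifts by 0.1 per time unit. It therefore reaches a threshold of 6 at time 50, which leaves 20 time units after the last sample.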

Proactive maintenance makes it possible to improve maintenance actions (mean availability), to follow the degradation trend (quality of service), and to avoid the occurrence of dangerous situations (safety). Finally, it supports the operator with knowledge about the causes and effects of degradation (maintainability).

e-Maintenance is an organizational view of maintenance. The concept of e-Maintenance comes from remote maintenance capabilities coupled with information and communication capabilities. Remote maintenance was initially a matter of remote data acquisition or consultation, in which data were accessible only for a limited time. To realize e-Maintenance objectives, data storage must be organized to allow flexible access to historical data.
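Organizing storage for flexible access to history can be as simple as keeping samples time-ordered per signal, so that any time window can be retrieved by binary search. The following is a hypothetical, in-memory Python illustration; the class and tag names are invented. A production historian would also need persistent storage and remote access.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Historian:
    """Hypothetical minimal data historian: time-ordered samples stored per tag."""

    _times: dict[str, list[float]] = field(default_factory=dict)
    _values: dict[str, list[float]] = field(default_factory=dict)

    def record(self, tag: str, timestamp: float, value: float) -> None:
        """Append a sample; samples of one tag must arrive in time order."""
        times = self._times.setdefault(tag, [])
        if times and timestamp < times[-1]:
            raise ValueError("samples must be recorded in time order")
        times.append(timestamp)
        self._values.setdefault(tag, []).append(value)

    def query(self, tag: str, start: float, end: float) -> list[tuple[float, float]]:
        """All samples of `tag` with start <= timestamp <= end, found by binary search."""
        times = self._times.get(tag, [])
        values = self._values.get(tag, [])
        lo = bisect.bisect_left(times, start)
        hi = bisect.bisect_right(times, end)
        return list(zip(times[lo:hi], values[lo:hi]))
```

Because each window query costs only a binary search, long histories remain accessible after the fact. This is the property that distinguishes e-Maintenance from short-lived remote consultation.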

To improve remote maintenance, a new concept of e-Maintenance emerged at the end of the 1990s. The e-Maintenance concept integrates cooperation, collaboration, and knowledge-sharing capabilities in order to move existing maintenance processes toward new enterprise concepts: extended enterprise, supply-chain management, lean maintenance, distributed support and expert centers, etc. Built on web technologies, the e-Maintenance concept is available today, and industrial e-Maintenance software platforms exist. e-Maintenance platforms (sometimes termed asset management systems) manage all of the maintenance processes throughout the system life cycle. These processes range from engineering, maintenance, logistics, experience feedback, maintenance knowledge capitalization, and optimization to reengineering and revamping.

e-Maintenance is based not on software functions but on maintenance services that are well-defined, self-contained, and independent of the context or state of other services. With the advent of service-oriented architectures (SOA) and enterprise service-bus technologies [42.45], e-Maintenance platforms have therefore become easy to evolve, and can provide interoperability, flexibility, adaptability, and agility. e-Maintenance platforms act as a kind of hub for maintenance services based on existing, new, and future applications.
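The service-oriented principle can be illustrated with a small, hypothetical Python sketch. Each maintenance service is a self-contained function whose result depends only on its request, and a hub lets services be added or replaced by name. This in-process registry is only a stand-in for an enterprise service bus. The service names, request fields, and the vibration limit are assumptions.

```python
from collections.abc import Callable
from typing import Any

Request = dict[str, Any]
Service = Callable[[Request], Request]


class MaintenanceServiceHub:
    """Hypothetical in-process hub: services are registered by name and invoked
    with an explicit request, so callers need no knowledge of other services' state."""

    def __init__(self) -> None:
        self._services: dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        if name in self._services:
            raise ValueError(f"service {name!r} already registered")
        self._services[name] = service

    def call(self, name: str, request: Request) -> Request:
        if name not in self._services:
            raise KeyError(f"no service named {name!r}")
        return self._services[name](request)


def health_check(request: Request) -> Request:
    """Self-contained: the result depends only on the request (hypothetical limit)."""
    status = "alarm" if request["vibration"] > request.get("limit", 7.0) else "ok"
    return {"asset": request["asset"], "status": status}


def work_order(request: Request) -> Request:
    """Self-contained: turns an asset status into a maintenance action."""
    action = "inspect" if request["status"] == "alarm" else "none"
    return {"asset": request["asset"], "action": action}


if __name__ == "__main__":
    hub = MaintenanceServiceHub()
    hub.register("health_check", health_check)
    hub.register("work_order", work_order)
    status = hub.call("health_check", {"asset": "P-101", "vibration": 9.3})
    print(hub.call("work_order", status))
```

Because neither service reads the other's internal state, either one can be replaced by a new application behind the same name without changing its callers.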

42.3.4 Industrial Applications

Industrial software platforms were developed during the 1990s to provide proof of concept of this RMS modeling framework before off-the-shelf products were marketed. The first applications appeared after 2000 in various sectors such as power generation, steel production, petrochemical processing, navy logistics and maintenance support, and nuclear fuel manufacturing and waste treatment.

A common objective of these multisector applications is to reduce operating costs by increasing the availability, maintainability, and reliability of plants and systems, and to facilitate their compliance with regulations. Another common objective is to elicit and preserve the implicit knowledge that skilled operators and skilled engineers acquire while performing their tasks. Other objectives are specific to an industrial sector. For example, in process plants, understanding complex phenomena in order to anticipate maintenance operations is critical to optimizing the impact of shutdown and startup operations [42.46].

The payback period of these industrial experiments is estimated at 1 year at most. They have led to a distributed, service-oriented e-Maintenance infrastructure that guarantees, by contract, a level of availability in plant operation (Fig. 42.8).

42.4 Challenges, Trends, and Open Issues

All aspects of dependability, such as reliability, maintainability, and safety, should be viewed in a broader context that depends on both the management and the technical processes within the enterprise system. This is needed to ensure resilience to the intrinsic and extrinsic complex phenomena that occur when systems operate in changing environments. For example, MTBF is a measure of the random nature of an event. It does not predict when something will fail, but only the probability that a system will fail within a certain time boundary. Contrary to conventional wisdom, accidents often result from interactions between perfectly functioning components, i.e., before a system has reached its expected life as predicted by RMS analysis.
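The statement about MTBF can be made concrete under the common constant-failure-rate (exponential) assumption, which the text does not state explicitly. Under that assumption, the probability of failing within a time t is 1 - exp(-t/MTBF). The following minimal Python sketch uses a hypothetical MTBF of 50 000 h.

```python
import math


def prob_failure_within(t: float, mtbf: float) -> float:
    """P(failure before time t), assuming a constant failure rate 1/MTBF."""
    return 1.0 - math.exp(-t / mtbf)


if __name__ == "__main__":
    MTBF_HOURS = 50_000.0  # hypothetical unit
    for hours in (1_000.0, 8_760.0, MTBF_HOURS):
        p = prob_failure_within(hours, MTBF_HOURS)
        print(f"P(fail within {hours:,.0f} h) = {p:.3f}")
```

Under this model, the probability of failure before t = MTBF is 1 - e^(-1), or about 0.63. One year of continuous operation of a 50 000 h MTBF unit carries a failure probability of roughly 16%. Neither number says when a particular unit will fail.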

Such considerations underscore that coping with emergent behavior, one of the results of complexity, requires advanced concepts beyond traditional RMS analyses and beyond the individual mind-set of each engineering discipline. In other words, dependability assumes that cause–effect relationships can be ordered in known and


knowable ways, while resilience [42.47] should confine the contextual emergence of complex relationships, within the system and between the system and its environment, in unordered ways [42.48].

An initial challenge is to understand that the concept of system as the unique result of normal emergence within a collaborative systems engineering process leads to an ad hoc solution based on heuristics and normative process-driven guidelines [42.49].

A second challenge relies on weak emergence [42.50] to perceive, model, and check the added behaviors that arise from interactions between the component systems. This work should be driven by extensive model-driven requirements analysis, which adds more detail than current practices provide. It should be complemented by experiments such as multiagent simulation, which track self-organizing patterns in order to improve the adaptability of the component systems.

A third open challenge concerns the quality of the engineering process: determining whether a system can survive a strongly emergent event. It also concerns the adaptability of the whole enterprise, which comes into play when facing an inevitable systemic instability.

References

42.1 T.L. Johnson: Improving automation software dependability: a role for formal methods?, Control Eng. Pract. 15(11), 1403–1415 (2007)

42.2 J. Stark: Handbook of Manufacturing Automation and Integration (Auerbach, Boston 1989)

42.3 R.S. Dorf, A. Kusiak: Handbook of Design, Manufacturing and Automation (Wiley, New York 1994)

42.4 A. Ollero, G. Morel, P. Bernus, S.Y. Nof, J. Sasiadek, S. Boverie, H. Erbe, R. Goodall: From MEMS to enterprise systems, IFAC Annu. Rev. Control 26(2), 151–162 (2002)

42.5 S.Y. Nof, G. Morel, L. Monostori, A. Molina, F. Filip: From plant and logistics control to multi-enterprise collaboration, IFAC Annu. Rev. Control 30(1), 55–68 (2006)

42.6 G. Morel, P. Valckenaers, J.M. Faure, C.E. Pereira, C. Diedrich: Manufacturing plant control challenges and issues, IFAC Control Eng. Pract. 15(11), 1321–1331 (2007)

42.7 A. Avizienis, J.C. Laprie, B. Randell, C. Landwehr: Basic Concepts and Taxonomy of Dependable and Secure Computing, IEEE Trans. Dependable Secur. Comput. 1(1), 11–33 (2004)

42.8 S.E. Rigdon, A.P. Basu: Statistical Methods for the Reliability of Repairable Systems (Lavoisier, Paris 2000)

42.9 J. Moubray: Reliability-Centered Maintenance (Industrial, New York 1997)

42.10 A. Avizienis, J.C. Laprie, B. Randell: Fundamental concepts of dependability, LAAS Techn. Rep. 1145, 1–19 (2001), http://www.laas.fr

42.11 J.W. Foster, D.T. Philips, T.R. Rogers: Reliability Availability and Maintainability: The Assurance Technologies Applied to the Procurement of Production Systems (MA Press, 1979)

42.12 M. Pecht: Product Reliability, Maintainability and Supportability Handbook (CRC, New York 1995)

42.13 H. Erbe: Technologies for cost-effective automation in manufacturing, IFAC Professional Briefs (2003) pp. 1–32

42.14 IEEE: IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries (IEEE, 1990), http://ieeexplore.ieee.org/xpls/abs_all.jsp?tp=&isnumber=4683&arnumber=182763&punumber=2267

42.15 D. Kumar, J. Crocker, J. Knezevic, M. El-Haram: Reliability, Maintenance and Logistic Support. A Life Cycle Approach (Springer, Berlin, Heidelberg 2000)

42.16 IEC 61508: Functional safety of electrical/electronic/programmable electronic (E/E/PE) safety-related systems

42.17 T. Nakagawa: Maintenance Theory of Reliability (Springer, London 2005)

42.18 W.E. Deming: Out of the Crisis: For Industry, Government, Education (MIT Press, Cambridge 2000)

42.19 C.G. Cassandras, S. Lafortune: Introduction to Discrete Event Systems (Kluwer Academic, Norwell 1999)

42.20 F. Lhote, P. Chazelet, M. Dulmet: The extension of principles of cybernetics towards engineering and manufacturing, Annu. Rev. Control 23(1), 139–148 (1999)

42.21 N. Viswanadham, Y. Narahari: Performance Modeling of Automated Manufacturing Systems (Prentice-Hall, Englewood Cliffs 1992)

42.22 http://www.sei.cmu.edu/cmmi

42.23 http://www.oooneida.info

42.24 M.C. Paulk: How ISO 9001 compares with the CMM, IEEE Softw. 12(1), 74–83 (1995)

42.25 K. Polzer: Ease of use in engineering – availability and safety during runtime, Autom. Technol. Pract. 1, 49–60 (2004)

42.26 T. Shell: Systems functions implementation and behavioural modelling: system theoretic approach, Int. J. Syst. Eng. 4(1), 58–75 (2001)

42.27 A. Moik: Engineering-related formal method for the development of safe industrial automation systems, Autom. Technol. Pract. 1, 45–53 (2003)

42.28 E.M. Clarke, O. Grumberg, D.A. Peled: Model Checking (MIT Press, Cambridge 2000)


42.29 J.R. Abrial: The B Book: Assigning Programs to Meanings (Cambridge Univ. Press, Cambridge 1996)

42.30 T. Kim, D. Stringer-Calvert, S. Cha: Formal verification of functional properties of a SCR-style software requirements specification using PVS, Reliab. Eng. Syst. Saf. 87, 351–363 (2005)

42.31 J. Yoo, T. Kim, S. Cha, J.-S. Lee, H.S. Son: A formal software requirements specification method for digital nuclear plant protection systems, Syst. Softw. 74(1), 73–83 (2005)

42.32 S. Elkhattabi, D. Corbeel, J.C. Gentina: Integration of dependability in the conception of FMS, 7th IFAC Symp. on Inf. Control Probl. Manuf. Technol., Toronto (1992) pp. 169–174

42.33 R. Vogrig, P. Baracos, P. Lhoste, G. Morel, B. Salzemann: Flexible manufacturing shop, Manuf. Syst. 16(3), 43–55 (1987)

42.34 E. Zamaï, A. Chaillet-Subias, M. Combacau: An architecture for control and monitoring of discrete events systems, Comput. Ind. 36(1–2), 95–100 (1998)

42.35 A.K.A. Toguyeni, E. Craye, L. Sekhri: Study of the diagnosability of automated production systems based on functional graphs, Math. Comput. Simul. 70(5–6), 377–393 (2006)

42.36 M.G. Mehrabi, A.G. Ulsoy, Y. Koren: Reconfigurable manufacturing systems: key to future manufacturing, J. Intell. Manuf. 11(4), 403–419 (2000)

42.37 ESPRIT II-2172 DIAS Distributed Intelligent Actuators and Sensors

42.38 ESPRIT III-6188 PRIAM Pre-normative Requirements for Intelligent Actuation and Measurement

42.39 ESPRIT III-6244 EIAMUG European Intelligent Actuation and Measurement User Group

42.40 ESPRIT IV-23525 IAM-PILOT Intelligent Actuation and Measurement Pilot

42.41 J.F. Pétin, B. Iung, G. Morel: Distributed intelligent actuation and measurement system within an integrated shop-floor organisation, Comput. Ind. J. 37, 197–211 (1998)

42.42 http://www.predict.fr

42.43 http://www.openoandm.org

42.44 B. Iung, G. Morel, J.-B. Léger: Proactive maintenance strategy for harbour crane operation improvement, Robotica 21, 313–324 (2003)

42.45 F.B. Vernadat: Interoperable enterprise systems: Principles, concepts and methods, IFAC Annu. Rev. Control 31(1), 137–145 (2007)

42.46 D. Galara: Roadmap to master the complexity of process operation to help operators improve safety, productivity and reduce environmental impact, Annu. Rev. Control 30, 215–222 (2006)

42.47 http://www.resilience-engineering.org

42.48 C.F. Kurtz, D.J. Snowden: The new dynamics of strategy: sense-making in a complex and complicated world, IBM Syst. J. 42(3), 462–483 (2003)

42.49 ISO/IEC 15288, http://www.incose.org

42.50 M. Bedau: Weak Emergence, Philosophical Perspectives: Mind, Causation and World, Vol. 11 (Blackwell, Oxford 1997)
