

Bayesian modeling of multi-state hierarchical systems with multi-level information aggregation

Mingyang Li (a), Jian Liu (a, corresponding author), Jing Li (b), Byoung Uk Kim (c)

(a) Department of Systems and Industrial Engineering, The University of Arizona, Tucson, AZ 85721, USA
(b) Industrial Engineering, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85287, USA
(c) Autonomous Control Branch, Air Force Research Laboratory, Wright-Patterson AFB, OH 45433, USA

Article info

Article history: Received 22 February 2013; received in revised form 22 October 2013; accepted 10 December 2013; available online 18 December 2013

Keywords: System reliability; Multiple failure states; Hierarchical structure; Bayesian networks; Prior elicitation

Abstract

Reliability modeling of multi-state hierarchical systems is challenging because of the complex system structures and imbalanced reliability information available at different system levels. This paper proposes a Bayesian multi-level information aggregation approach to model the reliability of multi-level hierarchical systems by utilizing all available reliability information throughout the system. Cascading failure dependency among components and/or sub-systems at the same level is explicitly considered. The proposed methodology can significantly improve the accuracy of system-level reliability modeling. A case study demonstrates the effectiveness of the proposed methodology.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Hierarchical system structures are widely adopted in the design of complex engineering systems for their advantages of scalability, tractability and modularity [1]. A system is defined as “hierarchical” if it consists of multiple sub-systems, which may themselves consist of multiple sub-systems/components. A hierarchical system often contains multiple levels of hierarchy. Fig. 1 shows a three-level Electro-Mechanical Actuator (EMA) system. The system consists of two sub-systems, i.e., a Motor Power Supply (PS) sub-system and an Actuator Servo Drive (ASD) sub-system. Each sub-system consists of a number of sub-systems/components; e.g., the ASD sub-system consists of a Pulse-Width Modulation (PWM) Controller, an H-Bridge Circuit and a Direct Current (DC) Motor. A general hierarchical system may consist of more than three levels. In this paper, for convenience in describing a general hierarchical system, a functional unit at the lth level of a system is defined as a level-l element; i.e., the system is defined as the level-1 element, a sub-system it consists of is called a level-2 element, a sub-system/component of a level-2 element is called a level-3 element, and so on. According to this definition, the EMA system in Fig. 1 is a level-1 element, its sub-systems are level-2 elements, and the sub-systems/components of a level-2 element are level-3 elements. Such elements in a hierarchical system are interconnected and interact with each other, jointly contributing to the overall system functionality.

In many real-world situations, the performance levels or failure modes of elements are not restricted to binary values, i.e., functioning and failed. For example, a power generating system can function at different capacity levels, and partial failure of its constituent elements may result in different reduced capacity levels [2]. Another example is the valves used in a fluid control system, which have two common failure modes: “stuck-open” and “stuck-closed” [3]. A major difference between performance levels and failure modes is that the latter have no intrinsic order. In this paper, this difference is neglected, and both are referred to interchangeably as “failure states”. A hierarchical system with multi-state elements is therefore defined as a multi-state hierarchical system (MSHS) [4]. The scope of this paper is restricted to MSHSs, a subset of multi-state reliability systems in which the “hierarchy” corresponds to the system structure.

Reliability modeling of multi-state systems is challenging, and traditional Boolean reliability theory is no longer appropriate. Recent years have witnessed the development of effective approaches to modeling multi-state system reliability (e.g., the stochastic process approach and the universal generating function approach), as summarized in [4]. For MSHSs, one critical issue is the representation of failure relationships, which is significantly more complex among MSHS elements than in typical series or parallel systems with binary failure states. The complexity is caused by inter-level failure relationship randomness and intra-level failure dependency. Inter-level failure relationship randomness refers to the probabilistic failure relationship between elements at two adjacent levels [5]. That is, for example, given a combination of states of the constituent sub-systems, the system's failure is not deterministic; rather, it is associated with a probability between 0 and 1. Intra-level failure dependency refers to failure dependency among system elements at the same level. It can be categorized into common-cause failure, cascading failure and negative dependency failure [6]. Common-cause failure has been extensively studied [7,8], while negative dependency failure is rarely encountered and can be treated similarly to cascading failure [6], which is the focus of this paper. Cascading failure is defined as the relationship in which the under-performance or failure of one element in a hierarchical system subsequently influences the performance of, or triggers the failure of, other elements at the same level [9]. For example, in a multiprocessor system, if one sub-system, the power supply, is in its failure state of “over-voltage”, the probability that the other sub-systems, the processors, break down increases [5]. In this paper, a Bayesian Network (BN) [10] is employed to address the aforementioned complexity of probabilistic inter-level failure relationships and cascading intra-level failure dependency. Compared to traditional representation methods, such as Fault Trees or Block Diagrams, BNs allow a more flexible representation of failure relationships, as summarized in [10,11], and have been successfully applied to many multi-state engineering systems [12–14].

Reliability Engineering and System Safety 124 (2014) 158–164. Contents lists available at ScienceDirect; journal homepage: www.elsevier.com/locate/ress
0951-8320/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.ress.2013.12.001
Corresponding author: J. Liu.

Another critical issue in the reliability modeling of MSHSs is the imbalance of reliability information available for elements at different levels. Reliability information is defined as the reliability test data and prior knowledge for elements. Prior knowledge includes historical studies (e.g., past reliability test results, warranty and maintenance records, etc.) and domain knowledge (e.g., expert judgment, engineering experience, etc.). With the development of reliability data collection technology and information storage systems, reliability information for elements closer to the bottom of the hierarchy of an MSHS, e.g., components, is either abundant or easily accessible. This is because these elements are often selected from standard products with high production and deployment volumes, so comprehensive knowledge and empirical failure data can be accumulated from operations, maintenance, and inexpensive reliability tests. However, reliability information is often scarce or even absent for elements closer to the top of the hierarchy, especially for the system as a whole, since reliability tests are often costly and time-consuming and accumulated knowledge is limited. Therefore, to estimate the reliability of an MSHS, it is desirable to integrate the available reliability information from elements at all levels. In the existing literature, Bayesian approaches are a popular choice due to their capability for multi-source data fusion [15,16]. For example, Martz et al. [17,18] proposed a multi-level information aggregation approach for series and parallel systems with binary failure states; Hulting and Robinson [19] extended the method to failure-time data in series system structures; Li et al. [20] proposed a semi-parametric modeling approach for hierarchical systems with multi-level information aggregation; and Johnson et al. [21] presented a “full-Bayesian approach” to integrate all reliability information of the system. Recent work is summarized in [22,23]. However, most existing work is based on the assumption of independent failure relationships and thus cannot be applied to MSHSs with inter-level probabilistic failure relationships and intra-level cascading failures.

To fill this gap and provide a generic information aggregation framework, this paper proposes a Bayesian multi-level information aggregation approach to estimate the system reliability of an MSHS. The proposed method models system reliability by simultaneously considering (i) multiple failure states of an element, (ii) inter-level probabilistic failure relationships, and (iii) intra-level cascading failure dependency. The rest of this paper is organized as follows. Section 2 presents the BN representation of the failure relationships among elements in an MSHS. Section 3 details the proposed multi-level information aggregation method. A numerical case study in Section 4 demonstrates the effectiveness of the proposed method, and Section 5 concludes the paper.

2. BN representation of MSHS

In this paper, a BN is employed to represent the system structure of an MSHS. Compared to traditional methods such as Fault Trees, a BN has the following attractive features for representing the failure relationships of an MSHS: (i) a Fault Tree mainly represents elements with binary failure states, whereas a BN can handle multiple failure states; (ii) logic gates in a Fault Tree, such as “AND” and “OR” gates, can only represent simple, deterministic inter-level failure relationships, whereas conditional probability tables in a BN can represent complex, probabilistic relationships, with deterministic relationships as a special case (i.e., probabilities of 0 or 1); (iii) a Fault Tree treats elements at the same level as independent, whereas a BN relaxes this assumption by allowing intra-level cascading dependency among elements.
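Feature (ii) can be made concrete with a small sketch: a Fault Tree AND gate written as a CPT has only 0/1 entries, and relaxing those entries yields a probabilistic gate. The example assumes binary states for simplicity (0 = functioning, 1 = failed, i.e., 0-based rather than the 1, …, K indexing used later), and the function names and the leak/miss probabilities are hypothetical, chosen only for illustration.

```python
import numpy as np

# Binary illustration: state 0 = functioning, state 1 = failed.
# cpt[a, b, x] = Pr(X = x | A = a, B = b)

def and_gate_cpt():
    """Fault-tree AND gate: the output fails only if both inputs fail (0/1 entries)."""
    cpt = np.zeros((2, 2, 2))
    for a in range(2):
        for b in range(2):
            failed = 1 if (a == 1 and b == 1) else 0
            cpt[a, b, failed] = 1.0
    return cpt

def leaky_and_cpt(p_leak=0.05, p_miss=0.10):
    """Hypothetical probabilistic relaxation of the AND gate: the output may fail
    with probability p_leak unless both inputs failed, and may survive with
    probability p_miss even when both inputs failed."""
    cpt = np.zeros((2, 2, 2))
    for a in range(2):
        for b in range(2):
            p_fail = 1.0 - p_miss if (a == 1 and b == 1) else p_leak
            cpt[a, b, 1] = p_fail
            cpt[a, b, 0] = 1.0 - p_fail
    return cpt

print(and_gate_cpt()[:, :, 1])   # Pr(fail | A, B) for the deterministic gate
print(leaky_and_cpt()[:, :, 1])  # Pr(fail | A, B) for the probabilistic gate
```

Setting both hypothetical parameters to zero, leaky_and_cpt(0.0, 0.0), reproduces the deterministic gate exactly, which is the sense in which deterministic relationships are a special case of a CPT.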

A general BN consists of nodes, $\{X_1, \ldots, X_n\}$, and directed arcs between some of the nodes. Fig. 2 shows a BN with 5 nodes and 7 arcs [24,25]. Each node $X_i$ is a random variable. If there is a directed arc from $X_i$ to $X_j$, $X_i$ is called a “parent” of $X_j$. A node may have no parent, one parent, or several parents. An arc characterizes the probabilistic dependency of a node on its parent nodes; that is, the conditional probability distribution of a node may differ depending on the values its parents take. In this paper, a BN is employed to represent the cause-and-effect failure relationships among elements of an MSHS, in which the nodes are the elements of the MSHS. Since elements are at different levels, a slightly different notation is adopted for nodes: each node/element is denoted by $X_{(l,m)}$, where $l = 1, \ldots, L$ indexes levels and $m = 1, \ldots, M_l$ indexes elements at level $l$. According to this definition, $X_{(l,m)}$ is the $m$th element at level $l$. The values of $X_{(l,m)}$ are failure states, $\{1, 2, \ldots, K_{(l,m)}\}$; the discrete values $1, 2, \ldots, K_{(l,m)}$ range from “perfect functioning” to “complete failure” of $X_{(l,m)}$.

Fig. 1. Block diagram for an EMA system.

Fig. 2. A general BN structure.

Furthermore, arcs in the BN representation of an MSHS have some new interpretations. Specifically, arcs between nodes at the same level represent intra-level cascading failures, and arcs between nodes at different levels represent inter-level failure relationships, e.g., how the failure of sub-systems affects the failure of the system. Both kinds of failure dependency are one-way rather than two-way. In addition, compared to a general BN, the BN representation of an MSHS has two restrictions. First, arcs from level $l_1$ to level $l_2$, where $l_1 < l_2$, are not allowed. This emphasizes that the failure of an element can be caused by its constituent elements, but not the reverse. Second, arcs between non-adjacent levels are not allowed. This indicates that the reliability of a component cannot influence the reliability of the system directly, but only through the sub-system it belongs to. Fig. 3 provides a graphical description of the proposed BN representation of an MSHS.
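The two structural restrictions, together with the one-way (acyclic) nature of the dependencies, can be checked mechanically. The following is a minimal sketch, assuming each node is identified by a hypothetical (level, index) tuple mirroring $X_{(l,m)}$, with level 1 being the system. The EMA arc list is our reading of the parent sets implied by Eqs. (3)–(5) in Section 3.1, not a figure reproduced from the paper.

```python
from collections import defaultdict, deque

# Nodes are (level, index) tuples mirroring X_(l,m); level 1 is the system.

def arc_kind(src, dst):
    """Classify an arc src -> dst under the two MSHS restrictions, or raise."""
    (l_src, _), (l_dst, _) = src, dst
    if l_src == l_dst:
        return "intra"   # cascading failure within one level
    if l_src == l_dst + 1:
        return "inter"   # element influences the element it belongs to
    if l_src < l_dst:
        raise ValueError(f"{src}->{dst}: arc from an upper level to a lower level")
    raise ValueError(f"{src}->{dst}: arc skips a level")

def check_mshs(arcs):
    """Return {arc: kind} and verify the graph is acyclic (Kahn's algorithm)."""
    kinds = {arc: arc_kind(*arc) for arc in arcs}
    nodes = {n for arc in arcs for n in arc}
    indeg, succ = defaultdict(int), defaultdict(list)
    for s, d in arcs:
        succ[s].append(d)
        indeg[d] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for d in succ[n]:
            indeg[d] -= 1
            if indeg[d] == 0:
                queue.append(d)
    if visited != len(nodes):
        raise ValueError("cyclic dependency: failure dependencies must be one-way")
    return kinds

# EMA arcs as implied by Eqs. (3)-(5) (our reading of Fig. 4).
EMA_ARCS = [
    ((3, 2), (3, 1)), ((3, 1), (2, 1)), ((3, 2), (2, 1)),
    ((3, 3), (3, 4)), ((3, 5), (3, 4)),
    ((3, 3), (2, 2)), ((3, 4), (2, 2)), ((3, 5), (2, 2)),
    ((2, 1), (2, 2)), ((2, 1), (1, 1)), ((2, 2), (1, 1)),
]
print(check_mshs(EMA_ARCS))
```

An arc such as ((1, 1), (2, 1)) violates the first restriction and ((3, 1), (1, 1)) the second; both raise ValueError, as does a two-way pair of intra-level arcs.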

In addition to nodes and arcs, some other notations and concepts used in the following sections are introduced here. $PA(X_{(l,m)})$ denotes the parent set of $X_{(l,m)}$. Note that $PA(X_{(l,m)})$ can include nodes/elements at the same level as $X_{(l,m)}$ and nodes/elements at level $l+1$. $\Pr(X_{(l,m)} \mid PA(X_{(l,m)}))$ denotes the conditional probability of $X_{(l,m)}$ given its parent set. Since nodes take discrete values (failure states), this quantity can be specified by a conditional probability table (CPT). When the parent set is non-empty, the joint probability of $X_{(l,m)}$ and its parent nodes can be decomposed as

$$\Pr\big(X_{(l,m)}, PA(X_{(l,m)})\big) = \Pr\big(X_{(l,m)} \mid PA(X_{(l,m)})\big) \cdot \Pr\big(PA(X_{(l,m)})\big). \qquad (1)$$

Based on Eq. (1), the joint probability $\Pr(PA(X_{(l,m)}))$ can itself be decomposed recursively. Once the joint probability is obtained, the marginal probability of any node can be expressed as

$$\Pr(X_{(l,m)}) = \sum_{PA(X_{(l,m)})} \Pr\big(X_{(l,m)}, PA(X_{(l,m)})\big). \qquad (2)$$
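Eqs. (1) and (2) amount to weighting each CPT row by the probability of its parent configuration and summing over parents. A minimal NumPy sketch, assuming a CPT layout chosen here for illustration: leading axes index parent states and the last axis indexes the child's states. The usage lines borrow $\Pr(X_{(3,2)})$ from Table 1 and the intra-level CPT $\Pr(X_{(3,1)} \mid X_{(3,2)})$ from Table 2 in Section 4.

```python
import numpy as np

def marginal_of_child(cpt, parent_joint):
    """Eqs. (1)-(2): Pr(X) = sum over parent configurations of Pr(X | PA) * Pr(PA).

    cpt has shape parent_joint.shape + (K,): one row Pr(X | PA = pa) per parent
    configuration pa; parent_joint[pa] = Pr(PA = pa).
    """
    cpt = np.asarray(cpt, dtype=float)
    parent_joint = np.asarray(parent_joint, dtype=float)
    joint = cpt * parent_joint[..., np.newaxis]          # Eq. (1): Pr(X, PA)
    return joint.sum(axis=tuple(range(parent_joint.ndim)))  # Eq. (2): sum out PA

p32 = np.array([0.71, 0.21, 0.08])          # Table 1: Pr(X(3,2) = k)
cpt31 = np.array([[0.73, 0.15, 0.12],       # Table 2: rows are X(3,2) = 1, 2, 3
                  [0.20, 0.70, 0.10],
                  [0.10, 0.60, 0.30]])
p31 = marginal_of_child(cpt31, p32)
print(np.round(p31, 4))                     # [0.5683 0.3015 0.1302]
```

For example, the first entry is 0.73 × 0.71 + 0.20 × 0.21 + 0.10 × 0.08 = 0.5683: the cascading effect of $X_{(3,2)}$ shifts probability mass of $X_{(3,1)}$ toward the degraded states.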

CPTs in the BN can be categorized as intra-level CPTs and inter-level CPTs according to whether the corresponding parents are from the same level or not. An inter-level CPT can be expressed as $\Pr(X_{(l,m)} \mid PA_1(X_{(l,m)}))$, where $PA_1(X_{(l,m)})$ is the set of parents at a different level from $X_{(l,m)}$, i.e., level $l+1$. Similarly, an intra-level CPT can be expressed as $\Pr(X_{(l,m)} \mid PA_2(X_{(l,m)}))$, where $PA_2(X_{(l,m)})$ is the set of parents at the same level $l$ as $X_{(l,m)}$. Inter-level CPTs characterize inter-level failure relationships; intra-level CPTs characterize intra-level cascading failure dependency. For example, in Fig. 3, the CPT for $X_{(l,1)}$ given $PA_1(X_{(l,1)}) = \{X_{(l+1,1)}, X_{(l+1,2)}\}$ characterizes how the failure states of components $X_{(l+1,1)}$ and $X_{(l+1,2)}$ affect the failure of sub-system $X_{(l,1)}$. The CPT for $X_{(l+1,1)}$ given $PA_2(X_{(l+1,1)}) = \{X_{(l+1,2)}\}$ characterizes the cascading effect of the failure of component $X_{(l+1,2)}$ on the failure of component $X_{(l+1,1)}$. In this paper, both inter-level and intra-level CPTs are assumed to be known, fixed values, which can be specified from domain expertise or learned from reliability test data [10,25].

3. Bayesian multi-level information aggregation

This section presents the details of the proposed multi-level information aggregation method, which is developed to resolve the reliability information imbalance issue and improve the accuracy of reliability modeling of MSHSs.

3.1. An illustrative example

Fig. 4 shows the BN that represents the EMA system in Fig. 1,which is used to illustrate the rationale of the proposed method.The general procedure of the method is discussed in the next sub-section.

The goal of the proposed modeling method is to better estimate the reliability of the system, i.e., to estimate the probability distribution of the failure states of node $X_{(1,1)}$, denoted by $\Pr(X_{(1,1)})$. According to the BN in Fig. 4 and Eqs. (1) and (2), $\Pr(X_{(1,1)})$ can be decomposed as

$$\Pr(X_{(1,1)}) = \sum_{X_{(2,1)},\,X_{(2,2)}} \Pr(X_{(1,1)} \mid X_{(2,1)}, X_{(2,2)}) \cdot \Pr(X_{(2,2)} \mid X_{(2,1)}) \cdot \Pr(X_{(2,1)}), \qquad (3)$$

where $\Pr(X_{(2,2)} \mid X_{(2,1)})$ and $\Pr(X_{(2,1)})$ can be further decomposed as

$$\Pr(X_{(2,2)} \mid X_{(2,1)}) = \sum_{X_{(3,3)},\,X_{(3,4)},\,X_{(3,5)}} \Pr(X_{(2,2)} \mid X_{(2,1)}, X_{(3,3)}, X_{(3,4)}, X_{(3,5)}) \cdot \Pr(X_{(3,4)} \mid X_{(3,3)}, X_{(3,5)}) \cdot \Pr(X_{(3,3)}) \cdot \Pr(X_{(3,5)}), \qquad (4)$$

$$\Pr(X_{(2,1)}) = \sum_{X_{(3,1)},\,X_{(3,2)}} \Pr(X_{(2,1)} \mid X_{(3,1)}, X_{(3,2)}) \cdot \Pr(X_{(3,1)} \mid X_{(3,2)}) \cdot \Pr(X_{(3,2)}). \qquad (5)$$
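Eqs. (3)–(5) are nested sums of CPT products, which map directly onto numpy.einsum. The sketch below assumes three failure states for every element and the parent structure implied by Eqs. (3)–(5). It uses the Table 1 and Table 2 values (Section 4) as point values where they exist and fills every other CPT with randomly generated placeholders (hypothetical, not from the paper), so only the structure of the bottom-up computation is meaningful, not the printed numbers.

```python
import numpy as np

K = 3                                   # failure states per element (as in Tables 1 and 2)
rng = np.random.default_rng(0)

def random_cpt(*parent_sizes, k=K):
    """HYPOTHETICAL placeholder CPT: each row Pr(X | parents) drawn from Dirichlet(1, ..., 1)."""
    return rng.dirichlet(np.ones(k), size=parent_sizes)

# Type-I marginals (Table 1) and the intra-level CPT Pr(X(3,1) | X(3,2)) (Table 2).
p32 = np.array([0.71, 0.21, 0.08])
p33 = np.array([0.66, 0.25, 0.09])
p35 = np.array([0.82, 0.14, 0.04])
cpt31 = np.array([[0.73, 0.15, 0.12],   # indexed [x32, x31]
                  [0.20, 0.70, 0.10],
                  [0.10, 0.60, 0.30]])

# CPTs not given in this section: placeholders only.
cpt21 = random_cpt(K, K)         # [x31, x32, x21]
cpt34 = random_cpt(K, K)         # [x33, x35, x34]
cpt22 = random_cpt(K, K, K, K)   # [x21, x33, x34, x35, x22]
cpt11 = random_cpt(K, K)         # [x21, x22, x11]

# Eq. (5): sum over X(3,1) (a) and X(3,2) (b).
p21 = np.einsum("abd,ba,b->d", cpt21, cpt31, p32)
# Eq. (4): sum over X(3,3) (g), X(3,4) (h), X(3,5) (i); result indexed [x21, x22].
p22_given_21 = np.einsum("eghif,gih,g,i->ef", cpt22, cpt34, p33, p35)
# Eq. (3): sum over X(2,1) (e) and X(2,2) (f).
p11 = np.einsum("efz,ef,e->z", cpt11, p22_given_21, p21)
print(p11, p11.sum())
```

Writing each sum as one einsum keeps the index bookkeeping next to the equation it implements; an explicit loop over parent states gives the same result and is a useful cross-check.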

The proposed multi-level information aggregation method is a bottom-up approach. Specifically, it starts from the bottom level, i.e., level 3, and uses the Bayesian framework to integrate the reliability test data for $X_{(3,2)}$ with any available prior knowledge. This produces a posterior probability distribution for $X_{(3,2)}$. $X_{(3,1)}$ is at the same level as $X_{(3,2)}$ and is an element with cascading failure, i.e., the probability of failure of $X_{(3,1)}$ depends on the failure state of $X_{(3,2)}$. This dependency is characterized by $\Pr(X_{(3,1)} \mid X_{(3,2)})$, which is available from the CPT for $X_{(3,1)}$. $X_{(2,1)}$ is at level 2, and its dependency on level-3 elements is characterized by $\Pr(X_{(2,1)} \mid X_{(3,1)}, X_{(3,2)})$, which is available from the CPT for $X_{(2,1)}$. Using these CPTs and the posterior probability distribution for $X_{(3,2)}$, $\Pr(X_{(2,1)})$ can be obtained according to Eq. (5). This completes information aggregation from level 3 to level 2.

Fig. 3. BN representation of MSHS.

Fig. 4. BN representation of the EMA system.

Next, the same procedure is repeated to aggregate information from level 2 to level 1. Specifically, the $\Pr(X_{(2,1)})$ obtained previously uses information at level 3 but not the reliability information for $X_{(2,1)}$ itself, so it can be regarded as a kind of “prior” distribution for $X_{(2,1)}$, which is called the “induced prior” in this paper. An induced prior is therefore defined as a prior elicited from aggregated information. The induced prior can be combined with the “native” prior and then integrated with the reliability test data through a Bayesian framework, where a native prior is defined as a prior elicited from prior knowledge regarding the element itself. This produces a posterior probability distribution for $X_{(2,1)}$. Then, the posterior probability distribution for $X_{(2,1)}$ and the intra- and inter-level CPTs, i.e., $\Pr(X_{(2,2)} \mid X_{(2,1)})$ and $\Pr(X_{(1,1)} \mid X_{(2,1)}, X_{(2,2)})$, respectively, can be used to obtain $\Pr(X_{(1,1)})$ according to Eq. (3). This completes the estimation of system reliability. Of course, this estimate is in fact an induced prior for $X_{(1,1)}$, which can be combined with the native prior and system test data, if available, to obtain the posterior of system reliability.

This illustrative example shows that the proposed method can integrate all available information at all levels of the system to estimate system reliability. It is especially useful for addressing reliability information imbalance, a common and prominent problem for multi-level hierarchical systems. In an MSHS, reliability information for elements closer to the bottom of the hierarchy, such as the components, is either abundant or easily accessible, but information for elements closer to the top of the hierarchy, such as the sub-systems or the whole system, may be scarce or absent. Inadequate information affects the accuracy of system reliability modeling [20]. The proposed method tackles this issue by aggregating the potentially rich reliability information in a bottom-up fashion.

This example also reveals several technical issues to be addressed, including how to integrate reliability test data and a prior (either a native prior or a combination of induced and native priors) in a Bayesian framework (Section 3.2.1), and how to aggregate information from one level to the adjacent upper level (Section 3.2.2). These are discussed in detail in Section 3.2.

3.2. Proposed method for multi-level information aggregation

To facilitate illustration of the proposed method, elements in a BN representation are categorized into three types: (i) type-I elements, $X^{I}_{(l,m)}$, with $PA(X^{I}_{(l,m)}) = \emptyset$, which have no parent node; (ii) type-II elements, $X^{II}_{(l,m)}$, with $PA_1(X^{II}_{(l,m)}) \neq \emptyset$ and $PA_2(X^{II}_{(l,m)}) = \emptyset$, which only have parent node(s) from a different level; (iii) type-III elements, $X^{III}_{(l,m)}$, with $PA_2(X^{III}_{(l,m)}) \neq \emptyset$, which have at least one parent node from the same level. Recall that $PA_1(X_{(l,m)})$ and $PA_2(X_{(l,m)})$ refer to the parent sets of nodes at a different level from $X_{(l,m)}$ and at the same level as $X_{(l,m)}$, respectively. Fig. 5 gives a graphical illustration of the three element types.
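The three element types follow directly from splitting each parent set into $PA_1$ (different level) and $PA_2$ (same level). A minimal sketch, assuming a hypothetical dictionary of parent lists keyed by (level, index) tuples; the EMA parent sets are our reading of Eqs. (3)–(5), not a reproduction of Fig. 4.

```python
# Parents of each EMA node as implied by Eqs. (3)-(5); keys are (level, index).
EMA_PARENTS = {
    (1, 1): [(2, 1), (2, 2)],
    (2, 1): [(3, 1), (3, 2)],
    (2, 2): [(2, 1), (3, 3), (3, 4), (3, 5)],
    (3, 1): [(3, 2)],
    (3, 2): [],
    (3, 3): [],
    (3, 4): [(3, 3), (3, 5)],
    (3, 5): [],
}

def element_type(node, parents):
    """Type I: no parents; type II: parents only from level l+1 (PA_2 empty);
    type III: at least one parent from the same level (PA_2 non-empty)."""
    level = node[0]
    pa = parents[node]
    pa2 = [p for p in pa if p[0] == level]   # PA_2: same-level parents
    if not pa:
        return "I"
    if pa2:
        return "III"
    return "II"

types = {node: element_type(node, EMA_PARENTS) for node in EMA_PARENTS}
print(types)
```

Under this reading, $X_{(3,2)}$, $X_{(3,3)}$ and $X_{(3,5)}$ are type I, $X_{(2,1)}$ and $X_{(1,1)}$ are type II, and $X_{(3,1)}$, $X_{(3,4)}$ and $X_{(2,2)}$ are type III; $X_{(2,2)}$ is type III because of its same-level parent $X_{(2,1)}$.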

3.2.1. Integration of prior and test data in a Bayesian framework

For the $m$th element at the $l$th level, assume that $N_{(l,m)}$ identical units are tested and $Y_{k,(l,m)}$ is the number of units in failure state $k$ at the end of the test. The reliability test data can then be expressed as a $K_{(l,m)}$-dimensional random vector, $\mathbf{Y}_{(l,m)} = [Y_{1,(l,m)}, \ldots, Y_{K_{(l,m)},(l,m)}]^T$. Since $X_{(l,m)}$ can take on one of $K_{(l,m)}$ possible mutually exclusive outcomes with probability $\Pr(X_{(l,m)} = k) = p_{k,(l,m)}$, $k = 1, \ldots, K_{(l,m)}$, $\mathbf{Y}_{(l,m)}$ can be modeled by a multinomial distribution, i.e.,

$$f(\mathbf{y}_{(l,m)} \mid \mathbf{p}_{(l,m)}) = N_{(l,m)}! \cdot \prod_{k=1}^{K_{(l,m)}} \frac{p_{k,(l,m)}^{\,y_{k,(l,m)}}}{y_{k,(l,m)}!}, \qquad (6)$$

where $\mathbf{p}_{(l,m)} = [p_{1,(l,m)}, \ldots, p_{K_{(l,m)},(l,m)}]^T$ and $\mathbf{y}_{(l,m)}$ is the corresponding realization of $\mathbf{Y}_{(l,m)}$. In the Bayesian modeling framework, $\mathbf{p}_{(l,m)}$ is treated as a random vector, and a prior distribution for $\mathbf{p}_{(l,m)}$ can be specified based on the available knowledge. As discussed in Section 3.1, there are two types of priors in an MSHS: a native prior based on prior knowledge and an induced prior based on information aggregated from lower levels. A type-I element (e.g., $X_{(3,2)}$ in Fig. 4) has only a native prior, since it has no parent node and its prior distribution can only be elicited from prior knowledge of the element itself. The Dirichlet distribution is chosen as the prior for $\mathbf{p}_{(l,m)}$ because of its mathematical convenience as the conjugate prior for multinomial distributions and its flexibility in representing prior knowledge [22,26]. Specifically, the Dirichlet distribution can be expressed as

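$$\pi(\mathbf{p}_{(l,m)} \mid \boldsymbol{\alpha}^{N}_{(l,m)}) = \frac{\Gamma\big(\sum_{k=1}^{K_{(l,m)}} \alpha^{N}_{k,(l,m)}\big)}{\prod_{k=1}^{K_{(l,m)}} \Gamma(\alpha^{N}_{k,(l,m)})} \cdot \prod_{k=1}^{K_{(l,m)}} p_{k,(l,m)}^{\,\alpha^{N}_{k,(l,m)} - 1}, \qquad (7)$$

where $\boldsymbol{\alpha}^{N}_{(l,m)}$ is the hyper-parameter vector $[\alpha^{N}_{1,(l,m)}, \ldots, \alpha^{N}_{K_{(l,m)},(l,m)}]^T$. By combining Eqs. (6) and (7) through Bayes' theorem, the posterior for $\mathbf{p}^{I}_{(l,m)}$ is given by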

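$$\pi(\mathbf{p}^{I}_{(l,m)} \mid \mathbf{y}_{(l,m)}, \boldsymbol{\alpha}^{N}_{(l,m)}) = \frac{\Gamma\big(\sum_{k=1}^{K_{(l,m)}} (y_{k,(l,m)} + \alpha^{N}_{k,(l,m)})\big)}{\prod_{k=1}^{K_{(l,m)}} \Gamma(\alpha^{N}_{k,(l,m)} + y_{k,(l,m)})} \cdot \prod_{k=1}^{K_{(l,m)}} \big(p^{I}_{k,(l,m)}\big)^{\alpha^{N}_{k,(l,m)} + y_{k,(l,m)} - 1}, \qquad (8)$$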

where $\mathbf{p}^{I}_{(l,m)}$ denotes the posterior failure-state probabilities of a type-I element.

A more complicated situation arises when the prior includes both a native prior and an induced prior, as is the case for type-II elements (e.g., $X_{(2,1)}$ in Fig. 4). The difficulty lies in how to combine the two types of priors and how to integrate the combined prior with the test data, which are discussed in Steps 3 and 4 of Section 3.2.2, respectively.
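Because the Dirichlet prior of Eq. (7) is conjugate to the multinomial likelihood of Eq. (6), the posterior of Eq. (8) is again Dirichlet, with each hyper-parameter increased by the corresponding failure-state count. A minimal sketch; the function name and all numbers below are hypothetical placeholders.

```python
import numpy as np

def dirichlet_multinomial_update(alpha_prior, counts):
    """Eq. (8): a Dirichlet(alpha) prior combined with multinomial counts y
    (Eq. (6)) gives a Dirichlet(alpha + y) posterior."""
    alpha_prior = np.asarray(alpha_prior, dtype=float)
    counts = np.asarray(counts, dtype=float)
    if alpha_prior.shape != counts.shape:
        raise ValueError("need one hyper-parameter and one count per failure state")
    if np.any(alpha_prior <= 0) or np.any(counts < 0):
        raise ValueError("alpha must be positive and counts non-negative")
    return alpha_prior + counts

# Hypothetical type-I element with K = 3 failure states (placeholder numbers).
alpha_native = np.array([7.0, 2.0, 1.0])   # native prior; prior mean [0.7, 0.2, 0.1]
y = np.array([40, 8, 2])                    # test outcomes for N = 50 units
alpha_post = dirichlet_multinomial_update(alpha_native, y)
post_mean = alpha_post / alpha_post.sum()   # posterior mean of each p_k
print(alpha_post, post_mean)                # alpha_post = [47, 10, 3]; mean about [0.783, 0.167, 0.05]
```

The posterior mean for state 1, 47/60 ≈ 0.783, lies between the prior mean 0.7 and the observed frequency 40/50 = 0.8, with the test data dominating because N = 50 exceeds the prior's total weight of 10.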

3.2.2. Inter-level information aggregation

Inter-level information aggregation is a four-step recursive procedure that aggregates reliability information among elements at two adjacent levels. It starts at the bottom level of the hierarchy and recursively aggregates information in a bottom-up fashion until the system-level model is obtained. Note that this procedure applies only to type-II elements. Without loss of generality, the four-step procedure for two adjacent levels, $l+1$ and $l$, is explained as follows.

Fig. 5. Graphical illustration of element types.


Step 1: Information aggregation and induced prior generation. For a type-II element $X^{II}_{(l,m)}$, its reliability can be expressed as

$$\Pr(X^{II}_{(l,m)}) = \sum_{PA_1(X^{II}_{(l,m)})} \Pr\big(X^{II}_{(l,m)} \mid PA_1(X^{II}_{(l,m)})\big) \cdot \Pr\big(PA_1(X^{II}_{(l,m)})\big). \qquad (9)$$

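$PA_1(X^{II}_{(l,m)})$ may include type-I, type-II and type-III elements. Therefore, based on the intra-level CPTs, $\Pr(PA_1(X^{II}_{(l,m)}))$ can be further expressed as a product over the three types of elements, i.e.,

$$\Pr\big(PA_1(X^{II}_{(l,m)})\big) = \prod_{X^{III}_{(l+1,m_3)} \in PA_1(X^{II}_{(l,m)})} \Pr\big(X^{III}_{(l+1,m_3)} \mid PA_2(X^{III}_{(l+1,m_3)})\big) \cdot \prod_{X^{II}_{(l+1,m_2)} \in PA_1(X^{II}_{(l,m)})} \Pr\big(X^{II}_{(l+1,m_2)}\big) \cdot \prod_{X^{I}_{(l+1,m_1)} \in PA_1(X^{II}_{(l,m)})} \Pr\big(X^{I}_{(l+1,m_1)}\big). \qquad (10)$$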

In Eq. (10), $\Pr(X^{I}_{(l+1,m_1)})$ can be computed by Eq. (8). $\Pr(X^{II}_{(l+1,m_2)})$ can be obtained from the previous iteration of the inter-level aggregation procedure described here, since $X^{II}_{(l+1,m_2)}$ is also a type-II element. $\Pr(X^{III}_{(l+1,m_3)} \mid PA_2(X^{III}_{(l+1,m_3)}))$ can be obtained from the CPT for $X^{III}_{(l+1,m_3)}$. For example, $\Pr(X_{(1,1)})$ and $\Pr(X_{(2,1)})$ for the BN in Fig. 4 can be written explicitly as Eqs. (3) and (5) based on Eqs. (9) and (10).

Step 2: Induced prior approximation. The induced prior obtained from Eq. (9) is not in closed form, and it is computationally challenging to integrate it directly with the reliability information at level $l$. Since the native prior for $X^{II}_{(l,m)}$ follows a Dirichlet distribution, as assumed in Eq. (7), it is mathematically convenient for the subsequent prior combination step if the induced prior follows, at least approximately, the same distribution as the native prior. To approximate the induced prior, Monte Carlo simulation is employed to obtain a large number of empirical samples of $\Pr(X^{II}_{(l,m)})$. The hyper-parameter vector $\boldsymbol{\alpha}^{II}_{(l,m)}$ of $\mathbf{p}^{II}_{(l,m)}$ can then be obtained through maximum likelihood estimation [27] by fitting a Dirichlet distribution to the generated samples.
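Steps 1 and 2 can be sketched together: draw parent failure-state probability vectors from their Dirichlet posteriors, push each draw through Eqs. (9) and (10), and fit a Dirichlet to the resulting samples. This is one plausible reading of the Monte Carlo step, not the authors' implementation. It assumes two independent type-I parents, and it fits the Dirichlet by maximum likelihood using Minka's fixed-point iteration (started from a method-of-moments estimate), which may differ from the procedure in [27]. All names and numbers are hypothetical.

```python
import numpy as np
from scipy.special import digamma, polygamma

def inv_digamma(y, iters=5):
    """Invert the digamma function by Newton's method (Minka's initial guess)."""
    y = np.asarray(y, dtype=float)
    x = np.where(y >= -2.22, np.exp(y) + 0.5, -1.0 / (y - digamma(1.0)))
    for _ in range(iters):
        x = x - (digamma(x) - y) / polygamma(1, x)
    return x

def fit_dirichlet_mle(samples, tol=1e-7, max_iter=10000):
    """Dirichlet MLE via Minka's fixed-point iteration; returns the last
    iterate if max_iter is reached before the relative tolerance is met."""
    p = np.clip(np.asarray(samples, dtype=float), 1e-12, None)
    mean_log_p = np.log(p).mean(axis=0)           # sufficient statistics
    m, v = p.mean(axis=0), p.var(axis=0)
    s0 = np.median(m * (1.0 - m) / v - 1.0)       # moment estimate of sum(alpha)
    alpha = max(s0, 1e-3) * m
    for _ in range(max_iter):
        new = inv_digamma(digamma(alpha.sum()) + mean_log_p)
        if np.max(np.abs(new - alpha) / alpha) < tol:
            return new
        alpha = new
    return alpha

def induced_prior_samples(parent_alphas, cpt, n_draws, rng):
    """Monte Carlo draws of the child's failure-state pmf via Eqs. (9)-(10),
    ASSUMING two independent type-I parents with Dirichlet posteriors."""
    pa = rng.dirichlet(parent_alphas[0], size=n_draws)   # (n, K_a)
    pb = rng.dirichlet(parent_alphas[1], size=n_draws)   # (n, K_b)
    # For each draw n: sum over a, b of Pr(child | a, b) * Pr(a) * Pr(b)
    return np.einsum("na,nb,abk->nk", pa, pb, cpt)

# Hypothetical usage: every number below is a placeholder, not a value from the paper.
rng = np.random.default_rng(0)
parent_alphas = [np.array([47.0, 10.0, 3.0]), np.array([30.0, 12.0, 5.0])]
cpt = rng.dirichlet(np.ones(3), size=(3, 3))   # [x_a, x_b, x_child]
samples = induced_prior_samples(parent_alphas, cpt, 5000, rng)
alpha_ii = fit_dirichlet_mle(samples)           # induced-prior hyper-parameters
print(alpha_ii)
```

A quick sanity check for the fitting routine is to draw samples from a Dirichlet with known parameters and confirm that the fit recovers them.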

Step 3: Prior combination. The induced prior can be combined with the native prior of $X^{II}_{(l,m)}$ to formulate a new combined prior, $\mathbf{p}^{C}_{(l,m)} \sim \mathrm{Dirichlet}(\boldsymbol{\alpha}^{C}_{(l,m)})$. The hyper-parameter vector $\boldsymbol{\alpha}^{C}_{(l,m)}$ is given by

$$\boldsymbol{\alpha}^{C}_{(l,m)} = w_{(l,m)} \cdot \boldsymbol{\alpha}^{II}_{(l,m)} + (1 - w_{(l,m)}) \cdot \boldsymbol{\alpha}^{N}_{(l,m)}, \qquad (11)$$

where $w_{(l,m)} \in [0,1]$ is a weighting coefficient that balances the influence of the native prior, $\boldsymbol{\alpha}^{N}_{(l,m)}$, and that of the induced prior, $\boldsymbol{\alpha}^{II}_{(l,m)}$. Native prior parameters can be elicited from prior knowledge regarding the element itself, while induced prior parameters are obtained from the prior approximation in Step 2. The weighting coefficient can be specified by practitioners. As general guidance, if the native prior is biased while the induced prior is accurate with high confidence, a close-to-one $w_{(l,m)}$ is assigned in Eq. (11) so that the induced prior dominates the combined prior; conversely, if the native prior is believed to be accurate with high confidence owing to abundant prior knowledge, a smaller, close-to-zero $w_{(l,m)}$ can be set instead to reduce the influence of the induced prior. Eq. (11) provides a generic form for combining the native prior and induced prior in order to obtain a more reliable and accurate prior for $X_{(l,m)}$.
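Eq. (11) is a convex combination of hyper-parameter vectors, and the conjugate update of Step 4, Eq. (12), then adds the level-l test counts. A minimal sketch showing how the weight moves the posterior between the two priors; the function names, prior values and counts are hypothetical placeholders.

```python
import numpy as np

def combine_priors(alpha_induced, alpha_native, w):
    """Eq. (11): alpha_C = w * alpha_II + (1 - w) * alpha_N, with w in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    alpha_induced = np.asarray(alpha_induced, dtype=float)
    alpha_native = np.asarray(alpha_native, dtype=float)
    return w * alpha_induced + (1.0 - w) * alpha_native

def type_ii_posterior(alpha_combined, counts):
    """Eq. (12): a Dirichlet(alpha_C) prior plus multinomial counts y gives Dirichlet(alpha_C + y)."""
    return np.asarray(alpha_combined, dtype=float) + np.asarray(counts, dtype=float)

# Hypothetical inputs (placeholders, not values from the paper).
alpha_ii = np.array([16.0, 4.0, 2.0])   # induced prior, e.g. the output of Step 2
alpha_n = np.array([5.0, 3.0, 2.0])     # native prior elicited for the element
y = np.array([6, 3, 1])                 # sparse test data at level l
for w in (0.0, 0.5, 1.0):
    a_post = type_ii_posterior(combine_priors(alpha_ii, alpha_n, w), y)
    print(w, a_post, a_post / a_post.sum())   # weight, posterior alpha, posterior mean
```

With these placeholders, the posterior mean of state 1 moves from 0.55 (w = 0, native prior only) to about 0.63 (w = 0.5) and 0.69 (w = 1, induced prior only), illustrating how w controls which source of prior information dominates when test data are sparse.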

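Step 4: Posterior computation. The combined prior $\mathbf{p}^{C}_{(l,m)}$ resolves the information inadequacy problem for elements at level $l$. To integrate this prior knowledge with the available multinomial test data for $X_{(l,m)}$, Eq. (8) can be modified for type-II elements as

$$\pi\left(\mathbf{p}^{II}_{(l,m)} \mid \mathbf{y}_{(l,m)}, \boldsymbol{\alpha}^{C}_{(l,m)}\right) = \frac{\Gamma\left(\sum_{k=1}^{K_{(l,m)}} \left(y_{k,(l,m)} + \alpha^{C}_{k,(l,m)}\right)\right)}{\prod_{k=1}^{K_{(l,m)}} \Gamma\left(\alpha^{C}_{k,(l,m)} + y_{k,(l,m)}\right)} \prod_{k=1}^{K_{(l,m)}} \left(p^{II}_{k,(l,m)}\right)^{\alpha^{C}_{k,(l,m)} + y_{k,(l,m)} - 1}, \tag{12}$$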

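where $\pi(\cdot)$ is the posterior density of the failure-state probabilities $\mathbf{p}^{II}_{(l,m)}$ of type-II elements, i.e., the density of $\mathrm{Dirich}(\boldsymbol{\alpha}^{C}_{(l,m)} + \mathbf{y}_{(l,m)})$. The proposed method described in Section 3 can be summarized as follows: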

S1. Integrate the native prior and test data for type-I elements.
S2. For level $l = L-1, \ldots, 1$, repeat S3–S6 recursively, aggregating information from level $l+1$ to level $l$.
S3. Compute $\Pr(X^{II}_{(l,m)})$ based on the CPTs of the BN by aggregating the information of $\mathrm{PA}_1(X^{II}_{(l,m)})$.
S4. Approximate $\Pr(X^{II}_{(l,m)})$ by an induced prior with parameters $\boldsymbol{\alpha}^{II}_{(l,m)}$.
S5. Construct the combined prior, i.e., $\boldsymbol{\alpha}^{C}_{(l,m)} = w_{(l,m)}\,\boldsymbol{\alpha}^{II}_{(l,m)} + (1 - w_{(l,m)})\,\boldsymbol{\alpha}^{N}_{(l,m)}$.
S6. Compute the posterior for $X^{II}_{(l,m)}$ based on the combined prior and the test data at level $l$.
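Steps S3–S6 for a single type-II element with one parent can be sketched end to end. This is an illustration under stated assumptions, not the authors' code. It reuses the posterior $\mathrm{Dirich}(239.1, 78.6, 31)$ of $X_{(3,2)}$ reported in the case study, Intra-level CPT I from Table 2, and the Table 4 counts of $X_{(3,1)}$, but treating $X_{(3,1)}$ as a type-II child of $X_{(3,2)}$ alone is an assumption made for illustration. The weight `w = 0.8` and the Jeffreys native prior are hypothetical choices. For brevity, S4 uses moment matching as a simplified stand-in for the maximum likelihood fit described in Step 2.

```python
import numpy as np

rng = np.random.default_rng(42)

# Parent posterior: X(3,2) after S1, Dirich(239.1, 78.6, 31) as reported in the case study
alpha_parent_post = np.array([239.1, 78.6, 31.0])
# Intra-level CPT I (Table 2); row j = parent state, column k = child state
cpt = np.array([[0.73, 0.15, 0.12],
                [0.20, 0.70, 0.10],
                [0.10, 0.60, 0.30]])

# S3 (Monte Carlo): sample parent probability vectors and push each through the CPT,
# Pr(child = k) = sum_j Pr(child = k | parent = j) * Pr(parent = j)
parent_draws = rng.dirichlet(alpha_parent_post, size=50_000)
child_draws = parent_draws @ cpt          # draws from the induced distribution

# S4 (simplified stand-in for the MLE fit): moment-matched Dirichlet
m = child_draws.mean(axis=0)
v = child_draws.var(axis=0)
precision = np.mean(m * (1.0 - m) / v - 1.0)
alpha_induced = precision * m

# S5: Eq. (11) with a hypothetical weight and a Jeffreys native prior
w = 0.8
alpha_native = np.full(3, 0.5)
alpha_combined = w * alpha_induced + (1.0 - w) * alpha_native

# S6: Eq. (12) is Dirichlet-multinomial conjugacy, so the posterior adds the observed counts
counts = np.array([53, 32, 15])           # Table 4 counts for X(3,1), used illustratively
alpha_post = alpha_combined + counts

print("induced prior alpha:", alpha_induced.round(1))
print("posterior mean:", (alpha_post / alpha_post.sum()).round(4))
```

Because every CPT row sums to one, each propagated draw is itself a probability vector. A linear map of a Dirichlet vector is generally not Dirichlet, which is why S4 is an approximation. The induced precision is typically much larger than the parent's, so with `w` near one the prior can outweigh 100 test counts.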

4. A numerical case study

A numerical case study is conducted to illustrate the information aggregation procedure and to demonstrate the effectiveness of the proposed method, i.e., the improvement in system reliability modeling accuracy obtained by considering cascading failure. The typical hierarchical EMA system illustrated in Fig. 4 is considered without loss of generality. The EMA elements exhibit multiple performance levels due to the degradation of themselves and/or their critical sub-systems or components. For instance, over-voltage of the Motor PS is caused by the degradation of its capacitors. The inter-level probabilistic failure relationship exists among system

Table 1
Predetermined parameter values.

Components                k = 1    k = 2    k = 3
$\Pr(X_{(3,2)} = k)$      0.71     0.21     0.08
$\Pr(X_{(3,3)} = k)$      0.66     0.25     0.09
$\Pr(X_{(3,5)} = k)$      0.82     0.14     0.04

Table 2
Predetermined CPTs.

Intra-level CPT I                                                                        k = 1   k = 2   k = 3
$\Pr(X_{(3,1)} = k \mid X_{(3,2)} = 1)$                                                  0.73    0.15    0.12
$\Pr(X_{(3,1)} = k \mid X_{(3,2)} = 2)$                                                  0.20    0.70    0.10
$\Pr(X_{(3,1)} = k \mid X_{(3,2)} = 3)$                                                  0.10    0.60    0.30

Intra-level CPT II
$\Pr(X_{(3,4)} = k \mid X_{(3,3)} = 1, X_{(3,5)} = 1)$                                   0.69    0.13    0.18
⋯                                                                                        ⋯       ⋯       ⋯
$\Pr(X_{(3,4)} = k \mid X_{(3,3)} = 3, X_{(3,5)} = 3)$                                   0.05    0.20    0.75

Inter-level CPT I
$\Pr(X_{(2,1)} = k \mid X_{(3,1)} = 1, X_{(3,2)} = 1)$                                   1       0       0
⋯                                                                                        ⋯       ⋯       ⋯
$\Pr(X_{(2,1)} = k \mid X_{(3,1)} = 3, X_{(3,2)} = 3)$                                   0       0       1

Inter-level CPT II
$\Pr(X_{(1,1)} = k \mid X_{(2,1)} = 1, X_{(2,2)} = 1)$                                   1       0       0
⋯                                                                                        ⋯       ⋯       ⋯
$\Pr(X_{(1,1)} = k \mid X_{(2,1)} = 3, X_{(2,2)} = 3)$                                   0       0       1

Intra- and inter-level CPT
$\Pr(X_{(2,2)} = k \mid X_{(2,1)} = 1, X_{(3,3)} = 1, X_{(3,4)} = 1, X_{(3,5)} = 1)$     1       0       0
⋯                                                                                        ⋯       ⋯       ⋯
$\Pr(X_{(2,2)} = k \mid X_{(2,1)} = 3, X_{(3,3)} = 3, X_{(3,4)} = 3, X_{(3,5)} = 3)$     0       0       1

M. Li et al. / Reliability Engineering and System Safety 124 (2014) 158–164162


elements due to the complex inter-level relationship between an element and its constituent elements. Intra-level cascading failure dependency also exists among elements at the same system level. For instance, over-voltage of the PS sub-system may increase the failure probability of the connected ASD sub-system due to excessive current in the circuit. For simplicity, the number of failure states of every element is assumed to be three, i.e., $\mathcal{X}_{(l,m)} = \{1, 2, 3\}$, $\forall (l,m) \in \mathcal{S}$. Failure states “1”, “2” and “3” denote “perfect functioning”, “partial failure” and “complete failure”, respectively. The method remains the same for scenarios with a different number of system levels or failure states. Fig. 4 shows the system structure considered in this section. To represent cascading failure relationships, directed arcs may exist among elements at the same level, e.g., $X_{(3,2)} \rightarrow X_{(3,1)}$, as depicted in Fig. 4.

Case study settings are based on a real EMA system testbed but are adjusted for confidentiality. Tables 1 and 2 show the predetermined probabilities of failure states for components $X_{(3,2)}$, $X_{(3,3)}$ and $X_{(3,5)}$, and all CPTs. Based on Eqs. (1) and (2), the probabilities of failure states for the remaining elements can be calculated; they are shown in Table 3 and serve as the ground truth for evaluating the performance of the proposed method. To imitate the information imbalance scenario, multinomial data with sample sizes of 100, 20 and 10 for components, sub-systems and the whole system, respectively, are randomly generated (Table 4) from their true parameter values in Tables 1 and 3. In addition, accurate native priors are assigned to the components, since components are often well established and accumulated knowledge about them is abundant. For the sub-systems and the whole system, non-informative native priors are assigned instead, because prior knowledge for upper-level elements is often unavailable. A Jeffreys prior, a popular choice of non-informative prior, is adopted in this paper. For multinomial failure-state probabilities, the corresponding Jeffreys prior is a Dirichlet distribution with all parameters equal to 0.5. Thus, $\boldsymbol{\alpha}^{N}_{(2,1)}$, $\boldsymbol{\alpha}^{N}_{(2,2)}$ and $\boldsymbol{\alpha}^{N}_{(1,1)}$ are all set to $[0.5, 0.5, 0.5]^T$.
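The ground-truth row for $X_{(3,1)}$ in Table 3 can be reproduced from Tables 1 and 2. Eqs. (1) and (2) are not restated in this section, so the sketch below assumes that, for an element with the single parent $X_{(3,2)}$, they reduce to the law of total probability over the parent's states. The second part imitates the data-generation step with NumPy's multinomial sampler; the seed is arbitrary, so the simulated counts will not reproduce Table 4.

```python
import numpy as np

# Table 1: Pr(X(3,2) = k), k = 1, 2, 3
p_x32 = np.array([0.71, 0.21, 0.08])
# Table 2, Intra-level CPT I: row j = state of X(3,2), column k = state of X(3,1)
cpt_x31 = np.array([[0.73, 0.15, 0.12],
                    [0.20, 0.70, 0.10],
                    [0.10, 0.60, 0.30]])

# Law of total probability: Pr(X(3,1) = k) = sum_j Pr(X(3,1) = k | X(3,2) = j) Pr(X(3,2) = j)
p_x31 = p_x32 @ cpt_x31
print(p_x31.round(4))  # [0.5683 0.3015 0.1302], matching Table 3

# Imitating the data-generation step: 100 multinomial test outcomes for a component
rng = np.random.default_rng(7)
counts_x31 = rng.multinomial(100, p_x31)
print(counts_x31)      # seed-dependent; will not reproduce Table 4 exactly
```

For elements with several parents, such as $X_{(2,1)}$, the same sum runs over the joint parent states; that computation is not repeated here because most rows of those CPTs are elided in Table 2.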

The multi-level information aggregation procedure starts at the bottom level. Taking $X_{(3,2)}$ as an example, its posterior is first calculated as $\mathrm{Dirich}(239.1, 78.6, 31)$ by S1 of the stepwise algorithm. Then, by S3, this posterior is aggregated towards $X_{(2,1)}$ through $\Pr(X_{(3,1)} \mid X_{(3,2)})$ and $\Pr(X_{(2,1)} \mid X_{(3,1)}, X_{(3,2)})$. The resulting aggregated distribution is approximated by an induced prior in S4 and combined with the native prior in S5. The posterior of $X_{(2,1)}$, $\mathrm{Dirich}(742.8, 190.6, 169)$, is finally obtained by S6. S3–S6 are repeated until the system-level model is obtained. Fig. 6 compares the posteriors for $X_{(2,1)}$, $X_{(2,2)}$ and $X_{(1,1)}$: the proposed method significantly improves estimation precision compared with the method without information aggregation. Without information aggregation, the posteriors are relatively flat owing to the non-informative native priors and the small test samples simulated for $X_{(2,1)}$, $X_{(2,2)}$ and $X_{(1,1)}$. In addition, the proposed method improves estimation accuracy compared with the method that ignores cascading failure dependency. Table 5 lists the posterior means and 95% credible intervals of $\mathbf{p}_{(2,1)}$, $\mathbf{p}_{(2,2)}$ and $\mathbf{p}_{(1,1)}$ for each failure state; all 95% credible intervals cover the predetermined true values in Table 3.
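The posterior summaries in Table 5 can be computed directly from a Dirichlet posterior, because each marginal $p_k$ of $\mathrm{Dirich}(\boldsymbol{\alpha})$ is $\mathrm{Beta}(\alpha_k, \sum_j \alpha_j - \alpha_k)$. The sketch below assumes that the reported intervals are equal-tailed Beta quantile intervals (the paper does not state how they were computed); under that assumption, the output for $\mathrm{Dirich}(742.8, 190.6, 169)$ should agree closely with the $X_{(2,1)}$ rows of Table 5. The helper `dirichlet_summary` is hypothetical. For contrast, it also summarizes the posterior of $X_{(1,1)}$ without information aggregation, i.e., the Jeffreys prior plus the Table 4 counts $[8, 1, 1]$, which illustrates the flat posteriors in Fig. 6.

```python
import numpy as np
from scipy.stats import beta


def dirichlet_summary(alpha, level=0.95):
    """Posterior means and equal-tailed credible intervals of each p_k under Dirich(alpha).

    Each marginal of a Dirichlet is Beta(alpha_k, sum(alpha) - alpha_k).
    """
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    mean = alpha / a0
    tail = (1.0 - level) / 2.0
    lower = beta.ppf(tail, alpha, a0 - alpha)
    upper = beta.ppf(1.0 - tail, alpha, a0 - alpha)
    return mean, lower, upper


# Posterior of X(2,1) with information aggregation, as reported in the text
mean, lo, hi = dirichlet_summary([742.8, 190.6, 169.0])
print(mean.round(4), lo.round(4), hi.round(4))

# X(1,1) without aggregation: Jeffreys prior [0.5, 0.5, 0.5] plus Table 4 counts [8, 1, 1]
mean_flat, lo_flat, hi_flat = dirichlet_summary(np.full(3, 0.5) + np.array([8, 1, 1]))
print("interval widths without aggregation:", (hi_flat - lo_flat).round(3))
```

With only ten system-level observations and a non-informative prior, the intervals for $X_{(1,1)}$ span a large part of $[0,1]$, whereas the aggregated posterior of $X_{(2,1)}$ yields intervals only a few hundredths wide.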

Table 3
Calculated parameter values.

Remaining system elements   k = 1    k = 2    k = 3
$\Pr(X_{(3,1)} = k)$        0.5683   0.3015   0.1302
$\Pr(X_{(2,1)} = k)$        0.6872   0.1671   0.1457
$\Pr(X_{(1,1)} = k)$        0.6698   0.2238   0.1065
$\Pr(X_{(3,4)} = k)$        0.5516   0.2243   0.2241
$\Pr(X_{(2,2)} = k)$        0.5480   0.2837   0.1683

Table 4
Simulated multinomial test data.

Element                $X_{(3,1)}$  $X_{(3,2)}$  $X_{(3,3)}$  $X_{(3,4)}$  $X_{(3,5)}$  $X_{(2,1)}$  $X_{(2,2)}$  $X_{(1,1)}$
# of counts in k = 1   53           72           63           51           81           15           12           8
# of counts in k = 2   32           23           28           24           16           2            6            1
# of counts in k = 3   15           5            9            25           3            3            2            1

Fig. 6. Comparisons of posteriors of $X_{(2,1)}$ (left column), $X_{(2,2)}$ (middle column) and $X_{(1,1)}$ (right column). True parameter value (vertical line); w/o information aggregation (dotted curve); w/ information aggregation and assuming independent elements (dashed curve); w/ information aggregation and considering the cascading failure dependency (proposed method, solid curve).



5. Conclusion

This paper proposes a multi-level information aggregation approach to model the reliability of an MSHS. A BN is employed to represent the failure relationships among elements, and cascading failure dependency is considered. Based on the existing failure relationships, all available reliability information of the elements in an MSHS is aggregated and utilized to improve the reliability modeling of the whole system. The improved system reliability modeling will benefit system design, warranty policy making and maintenance service planning. Note that in this paper, information aggregation is considered to address the information imbalance issue (i.e., accurate lower-level information and scarce higher-level information). The fully Bayesian approach additionally considers the influence of higher-level information on lower-level system elements [21–23,25]. However, when higher-level information is limited or absent, such influence is negligible; when higher-level information is biased, such influence may be misleading. It will be interesting to compare the aggregation-based and the fully Bayesian approaches in future work with respect to computational complexity, modeling accuracy and/or precision, and data availability and quality.

Acknowledgment

The authors gratefully acknowledge the financial support from the NSF Grants (CMMI-1100949, CMMI-1069246 and CMMI-1149602).

References

[1] Simon HA. The sciences of the artificial. The MIT Press; 1996.
[2] Billinton R, Allan RN. Reliability evaluation of power systems. Springer; 1996.
[3] Abou SC. Performance assessment of multi-state systems with critical failure modes: application to the flotation metallic arsenic circuit. Reliab Eng Syst Saf 2010;95(6):614–22.
[4] Lisnianski A, Levitin G. Multi-state system reliability: assessment, optimization and applications. World Scientific Publishing Co., Inc.; 2003.
[5] Bobbio A, Portinale L, Minichino M, Ciancamerla E. Improving the analysis of dependable systems by mapping fault trees into Bayesian networks. Reliab Eng Syst Saf 2001;71(3):249–60.
[6] Sun Y, Ma L, Mathew J, Zhang S. An analytical model for interactive failures. Reliab Eng Syst Saf 2006;91(5):495–504.
[7] Mosleh A. Common cause failures: an analysis methodology and examples. Reliab Eng Syst Saf 1991;34(3):249–92.
[8] Levitin G. Incorporating common-cause failures into nonrepairable multistate series-parallel system analysis. IEEE Trans Reliab 2001;50(4):380–8.
[9] Greig GL. Second moment reliability analysis of redundant systems with dependent failures. Reliab Eng Syst Saf 1993;41(1):57–70.
[10] Jensen FV, Nielsen TD. Bayesian networks and decision graphs. Springer-Verlag; 2007.
[11] Langseth H, Portinale L. Bayesian networks in reliability. Reliab Eng Syst Saf 2007;92(1):92–108.
[12] Mi J, Li Y, Huang H, Liu Y, Zhang X. Reliability analysis of multi-state system with common cause failure based on Bayesian networks. Eksploat Niezawodn Maint Reliab 2013;15(2):169–75.
[13] Zhai S, Lin S. Bayesian networks application in multi-state system reliability analysis. In: Proceedings of the 2nd international symposium on computer, communication, control and automation (ISCCCA-13). Paris, France: Atlantis Press; 2013.
[14] Zhou Z, Jin G, Dong D, Zhou J. Reliability analysis of multistate systems based on Bayesian networks. In: 13th annual IEEE international symposium and workshop on engineering of computer based systems (ECBS); 2006.
[15] Martz HF, Waller RA. Bayesian reliability analysis. New York: Wiley; 1982.
[16] Ibrahim JG, Chen MH, Sinha D. Bayesian survival analysis. Wiley Online Library; 2005.
[17] Martz HF, Waller RA, Fickas ET. Bayesian reliability analysis of series systems of binomial sub-systems and components. Technometrics 1988:143–54.
[18] Martz HF, Waller RA. Bayesian reliability analysis of complex series/parallel systems of binomial sub-systems and components. Technometrics 1990:407–16.
[19] Hulting FL, Robinson JA. The reliability of a series system of repairable sub-systems: a Bayesian approach. Naval Res Logist 1994;41(4):483–506.
[20] Li M, Hu Q, Liu J. Proportional hazard modeling for hierarchical systems with multi-level information aggregation. IIE Trans, http://dx.doi.org/10.1080/0740817X.2013.772692.
[21] Johnson VE, Graves TL, Hamada MS, Reese CS. A hierarchical model for estimating the reliability of complex systems. Bayesian Stat 2003;7:199–213.
[22] Hamada M, Martz HF, Reese CS, Graves T, Johnson V, Wilson AG. A fully Bayesian approach for combining multilevel failure information in fault tree quantification and optimal follow-on resource allocation. Reliab Eng Syst Saf 2004;86(3):297–305.
[23] Hamada M, Wilson AG, Reese CS, Martz HF. Bayesian reliability. New York: Springer; 2008.
[24] Li J, Shi J. Knowledge discovery from observational data for process control using causal Bayesian networks. IIE Trans 2007;39(6):681–90.
[25] Wilson AG, Huzurbazar AV. Bayesian networks for multilevel system reliability. Reliab Eng Syst Saf 2007;92(10):1413–20.
[26] Langseth H, Lindqvist BH. Uncertainty bounds for a monotone multistate system. Probab Eng Inf Sci 1998;12:239–60.
[27] Yee TW, Wild CJ. Vector generalized additive models. J R Stat Soc, Ser B (Methodol) 1996:481–93.

Table 5
Statistical results of posteriors for the sub-systems and the whole system.

Element        Parameter         Posterior mean   95% credible interval
$X_{(2,1)}$    $p_{1,(2,1)}$     0.6738           [0.6459, 0.7012]
               $p_{2,(2,1)}$     0.1729           [0.1512, 0.1958]
               $p_{3,(2,1)}$     0.1533           [0.1326, 0.1751]
$X_{(1,1)}$    $p_{1,(1,1)}$     0.6637           [0.6487, 0.6785]
               $p_{2,(1,1)}$     0.2253           [0.2123, 0.2386]
               $p_{3,(1,1)}$     0.1110           [0.1013, 0.1211]
$X_{(2,2)}$    $p_{1,(2,2)}$     0.5465           [0.5409, 0.5521]
               $p_{2,(2,2)}$     0.2839           [0.2789, 0.2890]
               $p_{3,(2,2)}$     0.1696           [0.1654, 0.1738]
