Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=ujqt20

Journal of Quality Technology: A Quarterly Journal of Methods, Applications and Related Topics
ISSN: 0022-4065 (Print), 2575-6230 (Online). Journal homepage: http://www.tandfonline.com/loi/ujqt20

To cite this article: Lulu Kang, Xiaoning Kang, Xinwei Deng & Ran Jin (2018) A Bayesian hierarchical model for quantitative and qualitative responses, Journal of Quality Technology, 50:3, 290-308, DOI: 10.1080/00224065.2018.1489042

To link to this article: https://doi.org/10.1080/00224065.2018.1489042

Published online: 18 Sep 2018.



RESEARCH PAPER

A Bayesian hierarchical model for quantitative and qualitative responses

Lulu Kang (a), Xiaoning Kang (b), Xinwei Deng (c), and Ran Jin (d)

(a) Department of Applied Mathematics, Illinois Institute of Technology, Chicago, Illinois; (b) Institute of Supply Chain Analytics and International Business College, Dongbei University of Finance and Economics, Dalian, China; (c) Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia; (d) Grado Department of Industrial and Systems Engineering, Virginia Polytechnic Institute and State University, Blacksburg, Virginia

ABSTRACT
In many science and engineering systems both quantitative and qualitative output observations are collected. If modeled separately, the important relationship between the two types of responses is ignored. In this article, we propose a Bayesian hierarchical modeling framework to jointly model a continuous and a binary response. Compared with the existing methods, the Bayesian method overcomes two restrictions. First, it solves the problem in which the model size (specifically, the number of parameters to be estimated) exceeds the number of observations for the continuous response. We use one example to show how such a problem can easily occur if the design of the experiment is not proper; all the frequentist approaches would fail in this case. Second, the Bayesian model can provide statistical inference on the estimated parameters and predictions, whereas it is not clear how to obtain inference using the latest method proposed by Deng and Jin (2015), which jointly models the two responses via constrained likelihood. We also develop a Gibbs sampling scheme to generate accurate estimation and prediction for the Bayesian hierarchical model. Both simulation and a real case study illustrate the proposed method.

KEYWORDS
Bayesian hierarchical model; conjugate prior; generalized linear model; Gibbs sampling; qualitative response; quantitative response

1. Introduction

Many data collected from engineering and scientific systems contain both quantitative and qualitative (QQ) output observations or responses. For example, in the lapping stage of the wafer manufacturing process the qualitative response is the conformity of the site total indicator reading (STIR) of the wafer, which has two possible outcomes: whether or not the wafer STIR is within the tolerance. The quantitative response is the total thickness variation (TTV) of the wafer. Both response variables measure the smoothness of the wafers, which is an important geometrical quality index of the wafers. See Ning et al. (2012), Zhao et al. (2011), and Zhang et al. (2015) for detailed studies of these two quality characteristics. An interpretable and accurate statistical modeling approach is needed to find out how the controllable process variables and the covariates affect the two kinds of responses. Among all possible modeling techniques, the simplest approach is to model the two types of responses separately. We can use linear regression models for the quantitative response and generalized linear models or classification methods for the qualitative response. But doing so would ignore the possible association between the two responses. Deng and Jin (2015) have shown the necessity of jointly modeling the two kinds of responses for the lapping process experiment. Such association is important, as it provides us with insightful understanding of the system under study as well as a significant improvement in prediction accuracy.

There is some existing literature on how to jointly model mixed-type responses. One group of existing methods directly uses latent variables to model the correlations between different responses. Among them, Dunson and Herring (2005), Weiss et al. (2011), and Bello et al. (2012) are based on the Bayesian framework. The others use the frequentist approach, including Catalano and Ryan (1992) and Sammel et al. (1997). Another group of methods factorizes the joint distribution of the mixed-type responses, including Cox and Wermuth (1992), Fitzmaurice and Laird (1995, 1997), Chen et al. (2014), Yang et al. (2014), Deng and Jin (2015), and Guglielmi et al. (2016). Cox

CONTACT Lulu Kang [email protected] Department of Applied Mathematics, Illinois Institute of Technology, 10 West 32nd Street, Chicago, IL 60616, USA. Lulu Kang and Xiaoning Kang share the first authorship. Color versions of one or more figures in this article can be found online at www.tandfonline.com/ujqt. © 2018 American Society for Quality

JOURNAL OF QUALITY TECHNOLOGY, 2018, VOL. 50, NO. 3, 290-308. https://doi.org/10.1080/00224065.2018.1489042


and Wermuth (1992) discussed two factorization models for a continuous and a binary response as functions of covariates. Fitzmaurice and Laird (1995, 1997) considered the marginal distribution of the qualitative response as well as the conditional distribution of the quantitative response and proposed a likelihood-based method. Chen et al. (2014) and Yang et al. (2014) introduced the mixed graphical model to analyze the association between the quantitative and qualitative responses. Guglielmi et al. (2016) proposed a semiparametric Bayesian joint model for one continuous and two binary responses. See Deng and Jin (2015) and Guglielmi et al. (2016) for more detailed reviews.

Based on the same factorization of the joint-likelihood idea, Deng and Jin (2015) assumed a conditional model for the quantitative response given the values of the qualitative response. Specifically, the quantitative response follows different normal distributions for different qualitative response values. To achieve model sparsity the authors used the nonnegative garrote and constrained likelihood estimation method. The authors also demonstrated that this approach provides parsimonious, interpretable, and accurate models for both responses. Despite these advantages, it has two main restrictions.

The first restriction is common to all frequentist approaches. We demonstrate it using a toy example. Denote by Y and Z a continuous response and a binary qualitative response, respectively. Assume that the true model of Z follows the Bernoulli distribution with E(Z|x) = p(x) = exp(1 + x)/(1 + exp(1 + x)) and the true model of Y is Y|Z = z ~ N(1 - (1 - z)x^2, 0.3^2), i.e., E(Y|Z = 1) = 1 and E(Y|Z = 0) = 1 - x^2. We construct a 14-point design, which consists of an 8-point local D-optimal design for the logistic model log(p(x)/(1 - p(x))) = η_0 + η_1 x of Z and a 6-point D-optimal design for the quadratic regression model of Y. Figure 1(a)-(c) shows the data from the 8-point, 6-point, and their combined 14-point designs, respectively.

Clearly, if using the frequentist approach it is impossible to estimate the quadratic model of E(Y|Z = 0), as there are no observations of Y|Z = 0 at x = 1. But such a defect from the design of experiment can be remedied by the Bayesian approach with an informative prior.

The second restriction is that statistical inference is hard to obtain from the joint model developed in Deng and Jin (2015). This joint QQ model is also introduced in Section 2.1. A constrained-likelihood approach was used to estimate parameters for the joint QQ models. The authors chose the nonnegative garrote approach (Breiman 1995) to enable variable selection for a parsimonious model with meaningful interpretation. More importantly, they incorporated the heredity principles (Wu and Hamada 2011) into the nonnegative garrote in the form of constraints. Thus the estimation becomes constrained maximum likelihood estimation (MLE). Such a complicated constrained likelihood optimization often does not have an analytical solution as in linear regression models. To solve this constrained optimization Deng and Jin (2015) developed a constrained sequential quadratic programming method that iteratively searches for the optimal solution. Consequently, the classic asymptotic distribution of the maximum likelihood estimator cannot be easily applied here as in the ordinary generalized linear model. Thanks to the Bayesian framework, inferences on the estimated parameters and the predictions are naturally available from the posterior distribution and the posterior predictive distribution. What is more, we also incorporate the hierarchical ordering principle (Wu and Hamada 2011) in the marginal prior components of the linear coefficients so as to achieve sparsity and meaningful interpretation.

The two restrictions on the existing frequentist methods motivate us to investigate the Bayesian framework as an alternative. Different from some existing Bayesian approaches, we do not consider using the

Figure 1. (a) Observations from the design for Z; (b) observations from the design for Y; (c) observations from the combined design. A dashed line "- - -" denotes E(Y|Z = 1) = 1; a solid line "—" denotes E(Y|Z = 0) = 1 - x^2; point "+" denotes (x_i, y_i) with z_i = 1; point "o" denotes (x_i, y_i) with z_i = 0. (See also Figure 1 in Kang et al. 2015.)



latent variable to model the correlation between the two responses, because we prefer the clear model structure in Deng and Jin (2015). It is also different from the Bayesian method in Guglielmi et al. (2016), where the marginal distribution was applied to the continuous response, whereas our approach applies the marginal distribution to the binary response. We also propose a Bayesian hierarchical structure to link together all the conditional models for the continuous response and the marginal model for the binary response. The structure also has satisfactory performance in prediction and variable selection.

The rest of the article is outlined as follows. Section 2 introduces the Bayesian hierarchical modeling framework. In Section 3 a Gibbs sampling algorithm is developed for estimation and prediction. We use simulation studies to demonstrate the performance of the proposed method under different circumstances in Section 4; Section 5 considers the case study on the lapping process in wafer manufacturing and compares the results with previous results. The article concludes with Section 6. Our algorithm and data analysis are all implemented in R.

2. The Bayesian hierarchical model

2.1. Sampling distribution

Suppose that Y is a continuous response and Z is a binary response. The independent variable x = (x_1, ..., x_p)' ∈ R^p has p dimensions. It can contain controllable process variables and uncontrollable covariates. To jointly model the two responses Y and Z given x, consider the joint probability of Y|Z and Z. The conditional model of Y|Z is assumed to be a linear regression model, while the model of Z is a logistic regression model. Specifically, we consider a joint modeling of Y and Z as follows:

Z = 1 with probability p(x) and Z = 0 with probability 1 - p(x), with

p(x; η) = exp(f(x)'η) / (1 + exp(f(x)'η)),   [1]

where f(x) = (f_1(x), ..., f_q(x))' contains q modeling effects, including the intercept and main, interaction, and quadratic effects, etc., and η = (η_1, ..., η_q)' is a vector of coefficient parameters. Conditional on Z = z, the quantitative variable Y has the distribution

Y|Z = z ~ N( z f(x)'β^(1) + (1 - z) f(x)'β^(2), σ^2 ),   [2]

where β^(i) = (β^(i)_1, ..., β^(i)_q)', i = 1, 2, are the corresponding coefficients of the modeling effects. The parameter σ^2 is the noise variance. The above QQ model indicates that Y|Z = 1 ~ N(f(x)'β^(1), σ^2) and Y|Z = 0 ~ N(f(x)'β^(2), σ^2). The association between the two responses Y and Z is represented using the conditional model Y|Z. When the two linear models for Y|Z = 0 and Y|Z = 1 are different (i.e., β^(1) ≠ β^(2)), it is important to take into account the influence of the qualitative response Z when modeling the quantitative response Y. This conditional model structure of Y|Z and Z, called the joint QQ model, was also used in Deng and Jin (2015) and Kang et al. (2015).

Note that here we assume that the linear model for Y and the logistic model for Z involve the same modeling effects f(x). But if necessary we can assume different modeling effects for E(Y|Z = 1), E(Y|Z = 0), and logit(p(x)). In our model we let f(x) contain all the necessary terms for both Y and Z. If some terms are statistically insignificant for a certain response, the posterior means of the corresponding coefficients should be shrunk close to zero, as in Bayesian linear regression or ridge regression. Another simplification we make is to assume the same variance σ^2 for both Y|Z = 1 and Y|Z = 0. Users can easily extend the proposed model to different variances.
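The joint QQ model in Eqs. [1] and [2] is straightforward to simulate from, which is useful for checking any implementation. Below is a minimal Python sketch (the paper's own code is in R); the model vector f(x) = (1, x, x^2)', the coefficient values, and the function name are illustrative assumptions, with β^(2) chosen so that E(Y|Z = 0) = 1 - x^2 as in the Section 1 toy example.

```python
import numpy as np

def simulate_qq(rng, F, eta, beta1, beta2, sigma):
    """Simulate (y, z) from the joint QQ model:
    z_i ~ Bernoulli(p(x_i)) with logit p = f(x)'eta   (Eq. [1]),
    y_i | z_i ~ N(z f'beta1 + (1 - z) f'beta2, sigma^2)  (Eq. [2]).
    F is the n x q model matrix with rows f(x_i)'."""
    p = 1.0 / (1.0 + np.exp(-(F @ eta)))
    z = rng.binomial(1, p)
    mean = z * (F @ beta1) + (1 - z) * (F @ beta2)
    y = rng.normal(mean, sigma)
    return y, z

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)
F = np.column_stack([np.ones(n), x, x**2])        # f(x) = (1, x, x^2)'
y, z = simulate_qq(rng, F,
                   eta=np.array([1.0, 1.0, 0.0]),     # logit p(x) = 1 + x
                   beta1=np.array([1.0, 0.0, 0.0]),   # E(Y|Z=1) = 1
                   beta2=np.array([1.0, 0.0, -1.0]),  # E(Y|Z=0) = 1 - x^2
                   sigma=0.3)
```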

Denote the data as (x_i, y_i, z_i), i = 1, ..., n, where y_i ∈ R and z_i ∈ {0, 1}. The vectors y = (y_1, ..., y_n)' and z = (z_1, ..., z_n)' are the vectors of response observations. Suppose that there are n_1 observed y_i's when Z = 1 and n_2 observed y_i's when Z = 0. Let n = n_1 + n_2. Based on Eqs. [1] and [2], we can express the sampling distributions as

y | z, β^(1), β^(2), σ^2 ~ N( V_1 F β^(1) + V_2 F β^(2), σ^2 I_n ),
z_i | η ~ind Bernoulli( p(x_i; η) ) for i = 1, ..., n,   [3]

where V_1 = diag{z_1, ..., z_n} is a diagonal matrix, I_n is the n × n identity matrix, V_2 = I_n - V_1, and F is the model matrix with ith row f(x_i)'. Let p(·) denote a general density function. The sampling distribution of z is

p(z | η) ∝ exp( Σ_{i=1}^n [ z_i f(x_i)'η - log(1 + exp(f(x_i)'η)) ] ).

2.2. Prior and hyperprior distributions

The unknown parameters in the sampling distribution are θ = (β^(1), β^(2), η, σ^2). Due to the normal distribution of y, the semiconjugate marginal prior components of β^(1) and β^(2) should come from the normal distribution family. The semiconjugate marginal prior component for η is not normal (Chen and Ibrahim 2003), but we still



prefer to use the nonconjugate normal distribution as the marginal prior for η. In this way, all the coefficients of the model terms f(x) have marginal prior components of the same form, thus making it simpler to specify the hyperprior distribution of the hyperparameters. In the context of this article we consider σ^2 a nuisance parameter, so we choose a weakly informative marginal prior, Inv-χ^2(0.001, 0.001), which is a commonly considered vague proper prior distribution for variance parameters (Chapter 5, Gelman et al. 2014). To sum up, the marginal prior components for β^(1), β^(2), η, and σ^2 are

β^(i) ~ N(0, τ_1^2 R_i) for i = 1, 2,
η ~ N(0, τ_2^2 R_3),
σ^2 ~ Inv-χ^2(0.001, 0.001).   [4]

The matrices R_1, R_2, and R_3 are the correlation matrices for the normal distributions and are explained later. The marginal prior components for β^(1) and β^(2) are independent for simplicity and are also based on the assumption that the two conditional responses, Y|Z = 1 and Y|Z = 0, are not correlated. Under certain circumstances it might be necessary to assume that β^(1) and β^(2) are correlated in the prior. The proposed Bayesian hierarchical model can still be applied, but the full-conditional distribution of β^(1) and β^(2), given the rest of the parameters and data, has a slightly more complicated covariance matrix. The prior mean of η is set to zero, so that Pr(Z = 1 | x, E(η)) = 1/2. To avoid too many hyperparameters we also assume the prior means of β^(1) and β^(2) to be zero. This is reasonable as long as we center y in the data preprocessing stage. Let ȳ_1 be the sample mean of the y_i's for z_i = 1 and ȳ_2 for z_i = 0. We center the Y observations as y_i - ȳ_1 if z_i = 1 and y_i - ȳ_2 if z_i = 0, and we use the same notation y for the centered observations. Again for simplicity, we assume the same variance τ_1^2 in the marginal prior components of β^(1) and β^(2). The variance can be easily changed to different values.

The correlation matrices in Eq. [4] in the marginal prior components for β^(1), β^(2), and η are assumed to be diagonal, which means that the coefficients are independent of each other. This assumption is reasonable if we use the orthogonal polynomial basis of x, which consists of the intercept, the linear effects, the quadratic effects, and the interactions, etc., up to a user-specified order. If the controllable variable settings are from a full factorial design or an orthogonal design, then we can achieve full or near orthogonality between the bases. For the bases involving covariates, achieving full or near orthogonality is not likely. But we still assume independence for simplicity and leave the posterior distribution to correct it. Let R_i = diag{1, r_i, ..., r_i, r_i^2, ..., r_i^2, ...} for i = 1, 2, 3, where r_i ∈ (0, 1) is a hyperparameter. The power index of r_i is the same as the order of the corresponding polynomial term. For example, if the model f(x) for x ∈ R^2 is a full quadratic model and contains the terms {1, x_1, x_2, x_1^2, x_2^2, x_1 x_2}, then the corresponding prior correlation matrix should be specified as R = diag{1, r, r, r^2, r^2, r^2}. In this way the prior variance of an effect decreases exponentially as the order of the effect increases, following the hierarchy ordering principle defined in Wu and Hamada (2011). The hierarchy ordering principle can reduce the size of the model and avoid including higher-order and less-significant model terms. Such a prior distribution was first proposed by Joseph (2006) and later used by Kang and Joseph (2009) and Ai et al. (2009).

The hyperparameters in the marginal prior components of Eq. [4] are φ = (r_1, r_2, r_3, τ_1^2, τ_2^2). We use the following marginal hyperprior components:

τ_1^2, τ_2^2 ~iid Inv-χ^2(ν, δ^2),
r_1, r_2, r_3 ~iid Beta(a, b).

Here Inv-χ^2(ν, δ^2) denotes the scaled inverse chi-square distribution with ν degrees of freedom and scale δ. For τ_i^2, the inverse-χ^2 distribution is semiconjugate with the normal prior distributions of β^(1), β^(2), and η because the full-conditional distribution of τ_i^2 is also an inverse-χ^2 distribution. As r_i ∈ (0, 1), the beta distribution is an obvious choice. The parameters (ν, δ^2, a, b) can be considered as tuning parameters for the Bayesian hierarchical model.
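Draws from the scaled inverse chi-square distribution are easy to generate from a chi-square variate, since X ~ Inv-χ^2(ν, δ^2) is equivalent to X = νδ^2/W with W ~ χ^2_ν. A hedged Python sketch (`rinvchisq` is our own helper, not from the paper):

```python
import numpy as np

def rinvchisq(rng, nu, scale_sq, size=None):
    """Draw from the scaled inverse chi-square Inv-chi2(nu, scale_sq)
    via X = nu * scale_sq / chi2_nu; E[X] = nu * scale_sq / (nu - 2) for nu > 2."""
    return nu * scale_sq / rng.chisquare(nu, size=size)

rng = np.random.default_rng(1)
draws = rinvchisq(rng, nu=4.5, scale_sq=5.0, size=200_000)
# Monte Carlo mean should be near nu * scale_sq / (nu - 2) = 9.0.
```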

3. Markov chain Monte Carlo (MCMC) sampling

Given the model assumptions, the next step in a standard Bayesian approach is to sample the posterior distribution of the parameters and the posterior predictive distribution of the response variables. The modes of these distributions are usually used as the parameter estimates or the predictions at the query points, and statistical inference, such as credible intervals, can be obtained from these empirical distributions.

Our goal is to sample from the joint posterior distribution p(θ, φ | y, z). For a relatively complicated Bayesian framework, the Gibbs sampling algorithm, a popular MCMC sampling method, can sequentially update each block of parameters following the corresponding full-conditional distribution. To briefly explain the Gibbs sampler, let us use the general notation of parameter vector θ and data y. The vector θ can be divided into s blocks (i.e., θ = (θ_1, ..., θ_s)). In the tth iteration of the MCMC chain each θ_{t,j} for j = 1, ..., s is sampled from the conditional distribution p(θ_j | θ_{t-1,-j}, y), where θ_{t-1,-j} = (θ_{t,1}, ..., θ_{t,j-1}, θ_{t-1,j+1}, ..., θ_{t-1,s}). For a more detailed



illustration of the Gibbs sampler and its convergence conditions, readers can turn to Casella and George (1992) and Roberts and Smith (1994). To develop the Gibbs sampler for the Bayesian hierarchical model, we first derive the full-conditional distributions for the blocks of parameters in (θ, φ) in Section 3.1 and elaborate the Gibbs sampler in Section 3.2.

3.1. Full-conditional distributions

According to the sampling distribution, the prior, and the hyperprior distribution, we can derive the full-conditional distributions for the parameters and the hyperparameters as follows. Many details of the derivation are omitted to save space. The derivation follows the classic Bayesian theory on deriving posterior distributions, which can be found in any Bayesian statistics textbook, such as Gelman et al. (2014).

Due to the semiconjugacy it is straightforward to derive the full-conditional distribution for the linear coefficients β^(1) and β^(2) following the formula for Bayesian linear regression:

(β^(1), β^(2)) | η, σ^2, φ, y, z ~ N(μ_β, Σ_β),

where

μ_β = V_n ( F'V_1 y, F'V_2 y )',
Σ_β = σ^2 V_n = σ^2 [ F'V_1 F + (σ^2/τ_1^2) R_1^{-1},  0 ;  0,  F'V_2 F + (σ^2/τ_1^2) R_2^{-1} ]^{-1}.
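The block update for (β^(1), β^(2)) is a standard Bayesian linear-regression draw. A Python sketch under simplifying assumptions (explicit matrix inverses are used for clarity; a real implementation would prefer Cholesky factorizations, and the paper's code is in R):

```python
import numpy as np

def draw_beta(rng, F, y, z, sigma2, tau1_sq, R1, R2):
    """One Gibbs draw of (beta1, beta2) from the joint full conditional
    N(mu_b, Sigma_b), with Sigma_b = sigma2 * Vn and mu_b = Vn (F'V1 y, F'V2 y)',
    where V1 = diag(z) and V2 = I - V1 (sketch of the Section 3.1 update)."""
    V1 = np.diag(z.astype(float))
    V2 = np.diag(1.0 - z.astype(float))
    P1 = F.T @ V1 @ F + (sigma2 / tau1_sq) * np.linalg.inv(R1)
    P2 = F.T @ V2 @ F + (sigma2 / tau1_sq) * np.linalg.inv(R2)
    Vn = np.linalg.inv(np.block([[P1, np.zeros_like(P1)],
                                 [np.zeros_like(P2), P2]]))
    mu = Vn @ np.concatenate([F.T @ V1 @ y, F.T @ V2 @ y])
    return rng.multivariate_normal(mu, sigma2 * Vn)

rng = np.random.default_rng(2)
n, q = 50, 3
F = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
z = rng.binomial(1, 0.5, n)
y = rng.normal(size=n)
beta = draw_beta(rng, F, y, z, sigma2=1.0, tau1_sq=1.0,
                 R1=np.eye(q), R2=np.eye(q))       # stacked (beta1, beta2)
```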

The full-conditional distribution of σ^2 is an inverse-χ^2 distribution according to the Bayesian linear regression model (Chapter 14 of Gelman et al. 2014). Thus,

σ^2 | β^(1), β^(2), y, z ~ Inv-χ^2( n + 0.001, [ Σ_{i=1}^n (y_i - μ_i)^2 + 10^{-6} ] / (n + 0.001) ),

where the μ_i's are the elements of μ = V_1 F β^(1) + V_2 F β^(2).

Because the normal marginal prior for η is not semiconjugate with the sampling distribution of z, the exact full-conditional distribution p(η | z, r_3, τ_2^2) in Eq. [5] does not belong to any known distribution family:

p(η | r_3, τ_2^2, z) ∝ (2πτ_2^2)^{-q/2} det(R_3)^{-1/2} h(η | z, r_3, τ_2^2) ∝ h(η | z, r_3, τ_2^2),   [5]

where

h(η | z, r_3, τ_2^2) = exp( -(1/(2τ_2^2)) η'R_3^{-1}η + Σ_{i=1}^n [ z_i f(x_i)'η - log(1 + e^{f(x_i)'η}) ] ).

The Metropolis-Hastings (MH) algorithm is the go-to method for sampling from a target distribution with an unknown distribution family. It generates a candidate sample from the proposal distribution, computes the acceptance probability, and then either accepts or rejects the candidate value based on the acceptance probability. More details can be found in Chapter 11 of Gelman et al. (2014). In the case of a logistic model for the binary response Z, Holmes and Held (2006) introduced a method for sampling from the exact full-conditional distribution in Eq. [5] via auxiliary variables. We have implemented both algorithms and conducted some simulation examples, which are omitted here. Compared with the MH algorithm, the Holmes and Held (2006) algorithm is not as efficient when embedded in the Gibbs sampling algorithm in Section 3.2. We thus choose the MH algorithm to sample from Eq. [5], but readers can choose whichever of the two seems to fit their application.
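A random-walk MH update for η targets the unnormalized density h(η | z, r_3, τ_2^2) in Eq. [5]. The sketch below is illustrative Python (the proposal scale `step` is a tuning choice we introduce, not a value from the paper); `np.logaddexp(0, t)` computes log(1 + e^t) stably.

```python
import numpy as np

def log_h(eta, F, z, tau2_sq, R3_inv):
    """Log of the unnormalized full conditional h(eta | z, r3, tau2^2):
    Gaussian prior term plus the logistic log-likelihood."""
    lin = F @ eta
    return (-0.5 / tau2_sq) * (eta @ R3_inv @ eta) \
        + z @ lin - np.logaddexp(0.0, lin).sum()

def mh_eta(rng, eta, F, z, tau2_sq, R3_inv, step=0.1):
    """One random-walk Metropolis-Hastings update for eta."""
    prop = eta + step * rng.normal(size=eta.shape)
    log_a = (log_h(prop, F, z, tau2_sq, R3_inv)
             - log_h(eta, F, z, tau2_sq, R3_inv))
    return prop if np.log(rng.uniform()) < log_a else eta

rng = np.random.default_rng(3)
n, q = 100, 2
F = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.binomial(1, 0.5, n).astype(float)
eta = np.zeros(q)
for _ in range(500):                      # short chain for illustration
    eta = mh_eta(rng, eta, F, z, tau2_sq=1.0, R3_inv=np.eye(q))
```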

The full-conditional distributions of τ_1^2 and τ_2^2 follow an inverse-χ^2 distribution due to the semiconjugacy; they are

τ_1^2 | rest, y, z ~ Inv-χ^2( ν + 2q, [ β^(1)'R_1^{-1}β^(1) + β^(2)'R_2^{-1}β^(2) + νδ^2 ] / (ν + 2q) ),   [6]

τ_2^2 | rest, y, z ~ Inv-χ^2( ν + q, [ η'R_3^{-1}η + νδ^2 ] / (ν + q) ).   [7]

The exact full-conditional distributions of r_1, r_2, and r_3 are listed below:

p(r_1 | rest, y, z) ∝ |R_1|^{-1/2} exp{ -(1/(2τ_1^2)) β^(1)'R_1^{-1}β^(1) } r_1^{a-1}(1 - r_1)^{b-1},   [8]

p(r_2 | rest, y, z) ∝ |R_2|^{-1/2} exp{ -(1/(2τ_1^2)) β^(2)'R_2^{-1}β^(2) } r_2^{a-1}(1 - r_2)^{b-1},   [9]

p(r_3 | rest, y, z) ∝ |R_3|^{-1/2} exp{ -(1/(2τ_2^2)) η'R_3^{-1}η } r_3^{a-1}(1 - r_3)^{b-1}.   [10]

Although the full-conditional distributions in Eqs. [8] through [10] are straightforward to derive, they are not any known distributions. Again, we use the MH algorithm to sample from them.

To sum up this section, the merits of the proposed Bayesian framework, besides overcoming the limitations of the frequentist approach, lie in the hierarchical structure. The hyperparameters (r_1, r_2, r_3) connect the coefficient parameters β^(1), β^(2), and η by following the same marginal hyperprior. In the same way the hyperparameters (τ_1^2, τ_2^2) connect the models for



Y and Z. The hierarchical model is flexible enough to provide a reasonable fit to the data, as shown later, and such flexibility is not paid for with too many parameters.

3.2. Gibbs sampling algorithm

For the blocks of parameters β^(1), β^(2), σ^2, τ_1^2, and τ_2^2 we can easily obtain samples from the normal or inverse-χ^2 distributions. For η, r_1, r_2, and r_3 we use the MH algorithm, since their full-conditional distributions are not from any known distribution family. Denote by B and b the length of the entire MCMC chain and the burn-in sample size, respectively. The following is the Gibbs sampling algorithm for the proposed Bayesian hierarchical model.

Step 1: Set the initial values for the hyperparameters r_{1,0}, r_{2,0}, r_{3,0}, τ_{1,0}^2, and τ_{2,0}^2.

Step 2: Set the initial values for the parameters β^(1)_0, β^(2)_0, η_0, and σ_0^2 as follows.

a. Use the least squares estimates for β^(1)_0 and β^(2)_0. When the least squares estimates are not available, choose the ridge regression estimates.
b. Use the logistic regression estimate for η_0.
c. Set σ_0^2 = (1/n) (y - V_1 F β^(1) - V_2 F β^(2))'(y - V_1 F β^(1) - V_2 F β^(2)).

Step 3: For t = 1 to B, do the following:

a. Sample (β^(1)_t, β^(2)_t) from N(μ_β, Σ_β).
b. Sample σ_t^2 from Inv-χ^2( n + 0.001, [ Σ_{i=1}^n (y_i - μ_i)^2 + 10^{-6} ] / (n + 0.001) ), where the μ_i's are the elements of V_1 F β^(1)_t + V_2 F β^(2)_t.
c. Sample η_t by the MH algorithm from the distribution in Eq. [5].
d. Sample τ_{1,t}^2 and τ_{2,t}^2 from Eqs. [6] and [7].
e. Sample r_{1,t}, r_{2,t}, and r_{3,t} by the MH algorithm from the distributions in Eqs. [8] through [10].
f. If the prediction at any query point x is required, sample z_t from the sampling distribution of Z | η_t in Eq. [1] and y_t from the sampling distribution of Y | z_t, β^(1)_t, β^(2)_t, σ_t^2 in Eq. [2].

Step 4: Discard the first b samples (θ_t, φ_t) and posterior predictive samples (y_t, z_t) from the burn-in period. Using the rest of the samples, return the empirical posterior distributions for (θ, φ) and the empirical posterior predictive distributions for both responses at x.
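Step 3(f), the posterior-predictive draw at a query point, can be sketched as follows (Python; the function and variable names are our own illustrative choices, not from the paper):

```python
import numpy as np

def predict_draw(rng, f_x, eta, beta1, beta2, sigma2):
    """One posterior-predictive draw (y, z) at a query point with model
    vector f_x, given one retained MCMC draw of (eta, beta1, beta2, sigma2):
    draw z from Eq. [1], then y from the matching branch of Eq. [2]."""
    p = 1.0 / (1.0 + np.exp(-(f_x @ eta)))
    z = rng.binomial(1, p)
    mean = f_x @ (beta1 if z == 1 else beta2)
    y = rng.normal(mean, np.sqrt(sigma2))
    return y, z

rng = np.random.default_rng(6)
f_x = np.array([1.0, 0.5, 0.25])          # f(x) = (1, x, x^2)' at x = 0.5
draws = [predict_draw(rng, f_x,
                      np.array([1.0, 1.0, 0.0]),
                      np.array([1.0, 0.0, 0.0]),
                      np.array([1.0, 0.0, -1.0]), 0.09)
         for _ in range(1000)]
ys = np.array([d[0] for d in draws])      # empirical predictive sample of Y
```

Repeating this over all retained (θ_t, φ_t) draws yields the empirical posterior predictive distribution described in Step 4.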

There are three kinds of tuning parameters. The tuning parameters for the MCMC are (B, b), which can be set by considering the amount of computation needed to achieve convergence and the preset accuracy level. The tuning parameters for the algorithm are the initial values (r_{1,0}, r_{2,0}, r_{3,0}, τ_{1,0}^2, τ_{2,0}^2). The tuning parameters for the Bayesian model are the parameters of the hyperprior distribution (ν, δ^2, a, b), which can be chosen by analyzing historical data if available. Alternatively, we suggest setting (a, b) with a/(a + b) = 1/3 or 1/2 and setting ν = 2 or ν = 4.5, which lead to weakly informative distributions. Through a simulation-based sensitivity study (omitted from this article), we find that the Bayesian hierarchical model is robust to the choice of (ν, δ^2, a, b) around the suggested settings.

4. Simulation examples

4.1. A small-scale example

We first use a small-scale simulated example to demonstrate the estimation accuracy of the proposed model and the sampling algorithm. In this example we set the dimension of x as p = 5 and the sample size n = 100; we then generate the training and testing data from the joint QQ model specified in Eq. [3]. The input variable values x_1, ..., x_n are independent and identically distributed samples from N(0, Σ), where Σ = (Σ_ij)_{p×p} with the (i, j)th entry Σ_ij = 0.5^{|i-j|}. The true values of β^(1), β^(2), and η are plotted as the red vertical lines in the posterior histograms in Figures 2, 3, and 4. The true value of σ^2 is set to one, as also shown in Figure 4. We can see that the modes of the posterior distributions of the parameters θ match their respective true values quite closely.
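The AR(1)-type input covariance Σ_ij = 0.5^{|i-j|} and the input sample can be generated as follows (a Python sketch; the paper's simulations are done in R):

```python
import numpy as np

p, n = 5, 100
idx = np.arange(p)
# Sigma_ij = 0.5^{|i-j|}: correlation decays geometrically with index distance
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
rng = np.random.default_rng(7)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)  # n iid rows from N(0, Sigma)
```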

Using the Gibbs sampling algorithm in Section 3.2, we set the model tuning parameters (ν, δ^2, a, b) = (4.5, 5, 0.1, 0.2) and the MCMC tuning parameters (B, b) = (10,000, 1,000). In Step 1 we set the initial values (r_{1,0}, r_{2,0}, r_{3,0}, τ_{1,0}^2, τ_{2,0}^2) = (0.3, 0.3, 0.3, 0.5, 0.5). Gelman and Rubin's (1992) convergence diagnostic is frequently used for monitoring the convergence of MCMC output, and it is implemented in the R package coda by Plummer et al. (2006). The multivariate scale reduction factor values returned by the gelman.diag function for β^(1), β^(2), and η are 1, 1, and 1.03, respectively, and the univariate scale reduction factor for σ^2 is also 1, indicating that the MCMC chains of all the parameters have converged within B iterations.
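The scale reduction factor can also be computed directly. Below is a minimal Python sketch of the univariate Gelman-Rubin diagnostic for one scalar parameter (the paper uses coda's gelman.diag in R; this simplified version omits the degrees-of-freedom correction that coda applies):

```python
import numpy as np

def psrf(chains):
    """Univariate Gelman-Rubin potential scale reduction factor.
    `chains` has shape (m, T): m parallel chains of length T.
    Values near 1 indicate convergence."""
    m, T = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # within-chain variance
    B = T * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    var_plus = (T - 1) / T * W + B / T          # pooled variance estimate
    return np.sqrt(var_plus / W)

rng = np.random.default_rng(8)
mixed = rng.normal(size=(4, 2000))              # four well-mixed chains
stuck = mixed + 3.0 * np.arange(4)[:, None]     # chains stuck at different levels
```

For the well-mixed chains `psrf` returns a value near 1, while chains concentrated around different levels give a value far above 1.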



4.2. Comparisons with alternative methods

This subsection reports a large-scale simulation study of the proposed Bayesian hierarchical model and the posterior sampling scheme. We compare the following four methods.

- BHQQ(1): The proposed Bayesian hierarchical model with model tuning parameters (ν, δ^2, a, b) = (2, 2, 0.1, 0.1).
- BHQQ(2): The proposed Bayesian hierarchical model with model tuning parameters (ν, δ^2, a, b) = (4.5, 5, 0.1, 0.2).
- SM(F): Separate models of the two responses using a frequentist approach.
- SM(B): Separate models of the two responses using a Bayesian approach.

For both BHQQ(1) and BHQQ(2) we set the length of the MCMC chain B = 10,000 and the burn-in size b = 1,000, and let the initial values (r1,0, r2,0, r3,0, τ1,0², τ2,0²) be (0.3, 0.3, 0.3, 0.5, 0.5). Both SM(F) and SM(B) fit a linear regression model for the continuous response Y and a logistic regression for the binary response Z. SM(F) employs the LASSO technique for both the linear and logistic regression models to shrink the estimated coefficients and select the significant variables. SM(B) sets marginal normal priors for the parameters in the linear and logistic regression models.
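As a reference point, the separate-models baseline SM(F) can be sketched with scikit-learn as below; the data are synthetic and the penalty weights (`alpha`, `C`) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:4] = 2.0   # sparse truth for the continuous response Y
eta = np.zeros(p); eta[:4] = 1.5     # sparse truth for the binary response Z
y = X @ beta + rng.normal(size=n)
z = (rng.random(n) < 1 / (1 + np.exp(-(X @ eta)))).astype(int)

# SM(F): two independent L1-penalized fits, one per response.
lin = Lasso(alpha=0.1).fit(X, y)                                    # linear model for Y
logit = LogisticRegression(penalty="l1", solver="liblinear").fit(X, z)  # logistic model for Z

selected_y = np.flatnonzero(lin.coef_ != 0)
selected_z = np.flatnonzero(logit.coef_.ravel() != 0)
print("variables kept for Y:", selected_y)
print("variables kept for Z:", selected_z)
```

Because the two fits never share information, any dependence of Y on Z is ignored, which is exactly the weakness the joint BHQQ model targets.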

4.2.1. Different setups of data generation

We consider simulation cases similar to those used in Deng and Jin (2015). There are two settings for the input dimension, p = 20 and p = 50. The relatively high input-dimension settings are meant to test whether the BHQQ can accurately identify the significant variables. To generate the data from the joint QQ model specified in Eq. [3], we consider different setups of the true values of β^(1) and β^(2).

Let I1 and I2 be the index sets of the significant coefficients of β^(1) and β^(2), respectively. Denote β̄1 = {β^(1)_k : k ∈ I1} and β̄2 = {β^(2)_k : k ∈ I2}. Similarly, denote by I3 the index set of the significant coefficients of η and η̄ = {η_k : k ∈ I3}. The coefficients not in I1, I2, and I3 are set to zero. The following four setups are used:

- Setup 1: I1 and I2 are the same, and the values of β̄1 and β̄2 are similar.

- Setup 2: I1 and I2 are different, but the values of β̄1 and β̄2 are similar.

Figure 2. The histogram of the posterior samples of β^(1) compared against its true values plotted as the red vertical lines.



- Setup 3: I1 and I2 are the same, but the values of β̄1 and β̄2 are different.

- Setup 4: I1 and I2 are different, and the values of β̄1 and β̄2 are also different.

The four setups represent different relationships between Y|Z = 0 and Y|Z = 1. Under Setup 1 the model of Y remains the same for different values of Z (i.e., β^(1) = β^(2)). Under Setups 2-4 there are different degrees of dependency of Y on Z, as β^(1) and β^(2) differ to different extents. Here we define the proportion sparsity, denoted by s, as the proportion of nonzero entries in a parameter vector (i.e., the size of I1 or I2 divided by p). In each of Setups 1-4 we use two settings of s, s = 0.2 and s = 0.5. To sum up, we consider two settings of p, four settings of the differences between β^(1) and β^(2), and two settings of the model sparsity s. Overall, the full factorial combinations give 2 × 4 × 2 = 16 cases.

For each of the 16 cases the entry values of the parameter vector η̄ are generated from the uniform distribution U(−2, 2). For the cases under Setups 1 and 2 we first generate the parameter vector β̄1 with each entry value drawn from the normal distribution N(2, 1) and then obtain the entry values of β̄2 by adding a small perturbation from N(0, 0.01) to the values of β̄1. In Setups 3 and 4 the entry values of the parameter vectors β̄1 and β̄2 are generated independently from N(2, 1). The rest of the coefficients, not in η̄, β̄1, and β̄2, are zero. The n input design points x1, ..., xn in the training set are independent and identically distributed (i.i.d.) samples from N(0, Σ), where Σ = (Σij)p×p with the (i, j)th entry Σij = 0.5^|i−j|. The n points of the testing data set are i.i.d. samples from U(−2, 2). The sample size for both the training and testing data sets is n = 100. The σ² in Eq. [2] is one.
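The data-generation recipe above can be sketched as follows. The joint QQ model itself is only emulated in a simplified form here (a logistic model for Z, then a normal model for Y whose coefficient vector switches with Z), so treat the final draw of (y, z) as an illustration of the setup rather than the exact Eq. [3].

```python
import numpy as np

rng = np.random.default_rng(2025)
n, p, s = 100, 20, 0.2

# AR(1)-type covariance: Sigma_ij = 0.5^|i-j|.
idx = np.arange(p)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)   # training inputs

k = int(s * p)                       # number of significant coefficients
eta = np.zeros(p);   eta[:k] = rng.uniform(-2, 2, k)      # from U(-2, 2)
beta1 = np.zeros(p); beta1[:k] = rng.normal(2, 1, k)      # from N(2, 1)
beta2 = beta1.copy()
beta2[:k] += rng.normal(0, 0.1, k)   # Setup 1/2 style: N(0, 0.01) perturbation

# Simplified joint draw: Z ~ Bernoulli(logit^{-1}(X @ eta)), then Y | Z.
pz = 1 / (1 + np.exp(-X @ eta))
z = (rng.random(n) < pz).astype(int)
mean_y = np.where(z == 1, X @ beta1, X @ beta2)
y = mean_y + rng.normal(0, 1, n)     # sigma^2 = 1

print(X.shape, y.shape, float(z.mean()))
```

Putting the significant coefficients in the leading positions is only a convenience; in the study the index sets I1 and I2 may coincide or differ depending on the setup.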

4.2.2. Comparison criteria

We measure the performances of the four methods in three aspects: prediction accuracy, variable selection accuracy, and parameter estimation accuracy.

To measure the prediction accuracy we consider the root mean squared prediction error RMSPE = sqrt( (1/n) Σ_{i=1}^n (yi − ŷi)² ) for the quantitative response Y, where ŷi represents the predicted value (the mean,

Figure 3. The histogram of the posterior samples of β^(2) compared against its true values plotted as the red vertical lines.



median, or mode of the posterior predictive distribution) of yi in the testing data set. The prediction performance for the qualitative response Z is measured by the misclassification error ME = (1/n) Σ_{i=1}^n I(zi ≠ ẑi), where ẑi is the predicted value of zi and I(·) is an indicator function. The predicted value ẑi is set to one if the corresponding posterior predictive sample mean or median of Pr(Z = 1 | x, η_t) (η_t is the MCMC chain) is larger than 0.5.

In addition, to gauge the performance of variable selection, false positives (FP) and false negatives (FN) are defined. An FP occurs when the model estimation classifies an insignificant predictor as significant; similarly, an FN occurs when it classifies a significant predictor as insignificant. Here we use the total number of falsely selected variables c = FP + FN as the performance measure. For the frequentist approach SM(F) the selected variables are those retained by the LASSO. For the Bayesian approaches BHQQ and SM(B) the significant variables are selected based on the credible intervals computed from the 9,000 posterior samples remaining after burn-in. Denote by c_lm the number of false selections for β^(1) and β^(2), and by c_logit the number of false selections for η.
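A minimal sketch of the credible-interval selection rule just described: a coefficient is declared significant when its central credible interval excludes zero. The 95 percent level and the synthetic "posterior" draws are assumptions for illustration, since the text does not state the interval level.

```python
import numpy as np

def count_false_selections(samples, true_beta, level=0.95):
    """samples: (draws, p) posterior draws; true_beta: length-p true vector.
    Returns FP + FN under the interval-excludes-zero selection rule."""
    a = (1 - level) / 2
    lo = np.quantile(samples, a, axis=0)
    hi = np.quantile(samples, 1 - a, axis=0)
    selected = (lo > 0) | (hi < 0)       # credible interval excludes zero
    truly_sig = true_beta != 0
    fp = np.sum(selected & ~truly_sig)   # insignificant flagged as significant
    fn = np.sum(~selected & truly_sig)   # significant flagged as insignificant
    return int(fp + fn)

rng = np.random.default_rng(3)
true_beta = np.array([2.0, -1.5, 0.0, 0.0])
# Fake "posterior": 9,000 tight draws around the truth.
draws = true_beta + rng.normal(0, 0.1, size=(9000, 4))
print(count_false_selections(draws, true_beta))  # -> 0 here: no false selections
```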

Moreover, to evaluate the parameter estimation accuracy, the L2 loss for each parameter estimate is considered as follows:

L2(β^(1)) = (1/p) ||β̂^(1) − β^(1)||₂²,  L2(β^(2)) = (1/p) ||β̂^(2) − β^(2)||₂²,  L2(η) = (1/p) ||η̂ − η||₂²,

where ||·||₂ stands for the vector L2 norm. Here β̂^(1), β̂^(2), and η̂ are the estimated parameter values that are compared against the corresponding true parameter values. For the Bayesian models we use the modes of the posterior samples as the parameter estimates.
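The three comparison criteria above reduce to a few lines of numpy; the tiny vectors below are made up purely to exercise the formulas.

```python
import numpy as np

def rmspe(y, y_hat):
    """Root mean squared prediction error for the quantitative response."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def misclass_error(z, z_hat):
    """Misclassification error for the qualitative response."""
    return float(np.mean(z != z_hat))

def l2_loss(est, truth):
    """(1/p) * squared L2 distance between estimate and truth."""
    return float(np.sum((est - truth) ** 2) / truth.size)

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.0, 2.0, 3.0, 2.0])       # one prediction off by 2
z = np.array([0, 1, 1, 0]); z_hat = np.array([0, 1, 0, 0])
beta_hat = np.array([2.0, 0.0]); beta = np.array([2.0, 2.0])

print(rmspe(y, y_hat))            # sqrt(4/4) = 1.0
print(misclass_error(z, z_hat))   # 1 of 4 wrong = 0.25
print(l2_loss(beta_hat, beta))    # 4/2 = 2.0
```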

4.2.3. Simulation results

Under each of the 16 cases of the data-generation setup we repeat the simulation M = 100 times. In each simulation we generate new training and testing data and then fit BHQQ(1), BHQQ(2), SM(F), and SM(B) to make predictions. Therefore, we obtain M = 100 values for each of the seven performance measures. Table 1 presents the average values and their corresponding standard deviations (in parentheses) of the aforementioned seven criteria for p = 20. Table 2 displays the results for p = 50, and its conclusions are similar to the ones for p = 20. So we focus

Figure 4. The histogram of the posterior samples of η and σ² compared against their true values plotted as the red vertical lines.



Table 1. The averages and standard errors (in parentheses) of seven loss measures for p = 20.

                      BHQQ(1)                        BHQQ(2)                        SM(F)                          SM(B)
Setup  Measure        s=0.2          s=0.5           s=0.2          s=0.5           s=0.2          s=0.5           s=0.2          s=0.5
1      RMSPE          0.664(0.022)   0.910(0.021)    0.656(0.022)   0.908(0.021)    0.464(0.015)   0.547(0.013)    0.500(0.020)   0.576(0.016)
       ME             0.220(0.006)   0.198(0.005)    0.224(0.006)   0.199(0.005)    0.305(0.006)   0.234(0.005)    0.273(0.005)   0.236(0.005)
       c_lm           1.660(0.157)   1.880(0.134)    1.620(0.148)   1.900(0.128)    10.08(0.423)   4.860(0.200)    2.740(0.252)   1.600(0.173)
       c_logit        1.810(0.104)   4.710(0.121)    1.770(0.094)   4.790(0.123)    5.930(0.178)   6.080(0.191)    1.720(0.087)   4.950(0.108)
       L2(β^(1))      0.027(0.003)   0.031(0.002)    0.026(0.003)   0.031(0.002)    0.009(0.001)   0.013(0.001)    0.011(0.001)   0.014(0.001)
       L2(β^(2))      0.025(0.002)   0.069(0.005)    0.025(0.002)   0.068(0.005)    0.009(0.001)   0.013(0.001)    0.011(0.001)   0.014(0.001)
       L2(η)          0.075(0.008)   0.288(0.027)    0.081(0.011)   0.374(0.036)    0.060(0.003)   0.254(0.039)    0.064(0.006)   0.225(0.009)
2      RMSPE          2.346(0.030)   3.625(0.065)    2.342(0.030)   3.632(0.068)    3.127(0.030)   5.145(0.045)    3.333(0.037)   5.444(0.050)
       ME             0.252(0.006)   0.196(0.005)    0.247(0.005)   0.196(0.005)    0.326(0.006)   0.234(0.005)    0.297(0.005)   0.235(0.005)
       c_lm           1.970(0.170)   2.480(0.140)    1.900(0.176)   2.550(0.147)    17.68(0.552)   16.52(0.266)    9.700(0.250)   12.08(0.175)
       c_logit        2.410(0.112)   4.920(0.139)    2.340(0.097)   5.070(0.165)    5.830(0.278)   6.510(0.180)    2.340(0.108)   5.320(0.146)
       L2(β^(1))      0.024(0.002)   0.046(0.003)    0.023(0.002)   0.046(0.003)    0.366(0.010)   1.066(0.029)    0.435(0.015)   1.023(0.039)
       L2(β^(2))      0.028(0.002)   0.050(0.003)    0.028(0.002)   0.050(0.003)    0.429(0.012)   1.069(0.032)    0.464(0.017)   1.222(0.041)
       L2(η)          0.078(0.004)   0.291(0.019)    0.083(0.006)   0.410(0.037)    0.061(0.003)   0.192(0.008)    0.082(0.004)   0.218(0.009)
3      RMSPE          1.196(0.017)   1.455(0.021)    1.191(0.017)   1.481(0.021)    1.416(0.012)   1.724(0.017)    1.442(0.016)   1.746(0.019)
       ME             0.202(0.005)   0.204(0.005)    0.202(0.005)   0.209(0.005)    0.277(0.006)   0.254(0.005)    0.238(0.005)   0.236(0.005)
       c_lm           1.940(0.161)   2.800(0.127)    1.960(0.161)   2.830(0.132)    12.30(0.570)   11.20(0.461)    2.400(0.218)   2.540(0.200)
       c_logit        1.300(0.111)   5.670(0.153)    1.320(0.113)   5.810(0.136)    4.480(0.243)   6.270(0.224)    1.250(0.110)   5.660(0.109)
       L2(β^(1))      0.016(0.002)   0.021(0.001)    0.016(0.002)   0.021(0.001)    0.062(0.002)   0.100(0.003)    0.057(0.002)   0.094(0.004)
       L2(β^(2))      0.046(0.004)   0.105(0.010)    0.047(0.004)   0.107(0.010)    0.102(0.003)   0.185(0.003)    0.115(0.004)   0.198(0.005)
       L2(η)          0.108(0.009)   0.180(0.017)    0.126(0.012)   0.205(0.020)    0.073(0.004)   0.129(0.006)    0.091(0.007)   0.141(0.008)
4      RMSPE          1.699(0.024)   2.955(0.048)    1.714(0.026)   3.006(0.057)    2.425(0.025)   4.210(0.035)    2.512(0.031)   4.526(0.044)
       ME             0.201(0.005)   0.216(0.005)    0.201(0.006)   0.223(0.005)    0.286(0.005)   0.255(0.005)    0.251(0.005)   0.257(0.006)
       c_lm           1.680(0.153)   2.370(0.121)    1.760(0.164)   2.340(0.117)    17.86(0.546)   14.78(0.290)    6.640(0.270)   9.580(0.178)
       c_logit        2.490(0.100)   4.350(0.134)    2.380(0.094)   4.700(0.149)    5.630(0.276)   5.330(0.181)    2.600(0.114)   4.650(0.118)
       L2(β^(1))      0.029(0.003)   0.045(0.003)    0.029(0.003)   0.045(0.003)    0.278(0.008)   0.662(0.020)    0.273(0.010)   0.763(0.027)
       L2(β^(2))      0.019(0.002)   0.034(0.002)    0.020(0.002)   0.034(0.002)    0.182(0.007)   0.795(0.021)    0.218(0.009)   0.893(0.026)
       L2(η)          0.065(0.005)   0.268(0.039)    0.077(0.007)   0.329(0.036)    0.054(0.003)   0.160(0.007)    0.070(0.007)   0.176(0.009)



Table 2. The averages and standard errors (in parentheses) of seven loss measures for p = 50.

                      BHQQ(1)                        BHQQ(2)                        SM(F)                          SM(B)
Setup  Measure        s=0.2          s=0.5           s=0.2          s=0.5           s=0.2          s=0.5           s=0.2          s=0.5
1      RMSPE          2.227(0.053)   3.841(0.083)    2.188(0.054)   3.795(0.079)    0.741(0.017)   1.130(0.021)    1.174(0.027)   1.326(0.026)
       ME             0.250(0.007)   0.305(0.006)    0.255(0.007)   0.314(0.005)    0.288(0.005)   0.315(0.006)    0.308(0.007)   0.382(0.006)
       c_lm           6.120(0.268)   15.49(0.379)    6.080(0.267)   15.61(0.397)    27.90(0.997)   29.64(0.848)    13.88(0.679)   9.620(0.484)
       c_logit        6.920(0.149)   19.50(0.204)    7.060(0.160)   19.93(0.194)    13.64(0.347)   16.86(0.314)    7.540(0.113)   22.99(0.119)
       L2(β^(1))      0.098(0.006)   0.140(0.009)    0.098(0.007)   0.140(0.009)    0.009(0.001)   0.020(0.001)    0.022(0.001)   0.027(0.001)
       L2(β^(2))      0.094(0.006)   0.511(0.030)    0.093(0.006)   0.524(0.032)    0.009(0.001)   0.020(0.001)    0.022(0.001)   0.027(0.001)
       L2(η)          0.152(0.006)   0.417(0.009)    0.174(0.007)   0.429(0.008)    0.133(0.004)   0.348(0.007)    0.168(0.005)   0.481(0.007)
2      RMSPE          4.555(0.054)   8.801(0.107)    4.549(0.057)   8.844(0.123)    5.827(0.048)   10.83(0.106)    7.096(0.076)   13.23(0.140)
       ME             0.270(0.007)   0.304(0.005)    0.269(0.007)   0.304(0.006)    0.292(0.006)   0.290(0.005)    0.293(0.008)   0.366(0.008)
       c_lm           7.760(0.299)   14.43(0.420)    7.910(0.312)   14.35(0.411)    37.90(0.749)   41.06(0.424)    26.42(0.558)   40.20(0.461)
       c_logit        7.310(0.140)   19.48(0.206)    7.270(0.144)   19.77(0.211)    14.14(0.393)   15.93(0.290)    7.360(0.349)   23.50(0.115)
       L2(β^(1))      0.095(0.005)   0.868(0.061)    0.095(0.005)   0.854(0.061)    0.510(0.013)   2.020(0.039)    0.763(0.021)   2.946(0.066)
       L2(β^(2))      0.098(0.007)   0.208(0.019)    0.099(0.007)   0.207(0.019)    0.511(0.014)   1.804(0.042)    0.762(0.025)   2.585(0.064)
       L2(η)          0.154(0.010)   0.448(0.019)    0.132(0.006)   0.474(0.012)    0.096(0.003)   0.384(0.008)    0.119(0.005)   0.505(0.013)
3      RMSPE          2.833(0.040)   5.194(0.077)    2.855(0.037)   5.244(0.071)    2.860(0.031)   5.474(0.067)    3.606(0.053)   6.272(0.083)
       ME             0.269(0.008)   0.303(0.006)    0.270(0.007)   0.317(0.006)    0.281(0.006)   0.294(0.006)    0.276(0.008)   0.373(0.008)
       c_lm           7.760(0.282)   14.53(0.367)    7.720(0.263)   14.63(0.368)    26.90(1.065)   27.64(0.731)    15.50(0.664)   15.70(0.574)
       c_logit        7.280(0.135)   18.94(0.241)    7.180(0.137)   19.25(0.249)    12.78(0.391)   16.45(0.317)    6.910(0.121)   23.26(0.124)
       L2(β^(1))      0.034(0.002)   0.304(0.023)    0.034(0.003)   0.308(0.023)    0.091(0.003)   0.474(0.014)    0.162(0.006)   0.618(0.019)
       L2(β^(2))      0.230(0.014)   0.312(0.021)    0.232(0.014)   0.311(0.020)    0.181(0.005)   0.450(0.012)    0.265(0.007)   0.619(0.016)
       L2(η)          0.158(0.008)   0.555(0.012)    0.171(0.011)   0.583(0.013)    0.103(0.004)   0.475(0.009)    0.120(0.006)   0.631(0.014)
4      RMSPE          4.660(0.056)   7.482(0.095)    4.637(0.059)   7.469(0.098)    6.162(0.048)   8.813(0.092)    7.781(0.101)   10.60(0.122)
       ME             0.256(0.006)   0.311(0.007)    0.252(0.006)   0.307(0.006)    0.306(0.006)   0.293(0.005)    0.299(0.006)   0.380(0.006)
       c_lm           6.540(0.288)   18.24(0.386)    6.290(0.283)   18.63(0.390)    40.38(0.840)   39.92(0.424)    29.10(0.575)   36.86(0.428)
       c_logit        7.600(0.159)   19.65(0.243)    7.560(0.134)   19.46(0.223)    12.98(0.362)   16.98(0.305)    7.710(0.117)   22.55(0.142)
       L2(β^(1))      0.080(0.005)   0.989(0.057)    0.080(0.005)   1.026(0.058)    0.553(0.014)   1.630(0.028)    0.923(0.026)   2.141(0.043)
       L2(β^(2))      0.094(0.006)   0.099(0.006)    0.089(0.005)   0.099(0.006)    0.548(0.014)   0.884(0.023)    0.864(0.027)   1.431(0.042)
       L2(η)          0.116(0.005)   0.502(0.011)    0.124(0.010)   0.509(0.010)    0.086(0.003)   0.423(0.008)    0.113(0.005)   0.582(0.009)



on explaining only the results from Table 1. The following are our observations from the simulation.

- BHQQ(1) and BHQQ(2) have very similar performances, indicating that the proposed Bayesian hierarchical model is not sensitive to the two settings of (m, δ², a, b).

- In terms of RMSPE for Setup 1, the BHQQ is inferior to SM(F) and SM(B), but this is expected. On the one hand, β^(1) and β^(2) are similar, which implies the quantitative response Y does not depend on the qualitative response Z, so the separate models should be sufficient. On the other hand, the BHQQ involves twice as many unknown parameters for Y as the separate models, which could lead to inferior parameter estimates. The good news is that the RMSPE of the BHQQ outperforms the separate models for Setups 2-4, in which β^(1) is different from β^(2), reflecting the dependency of Y on Z. The proposed model is especially superior to the separate models for Setups 2 and 4. One possible explanation is that when the significant variables of β^(1) and β^(2) are the same (i.e., Setups 1 and 3), the separate models are still able to distinguish between the significant and insignificant variables, despite the fact that β̄1 and β̄2 are different (Setup 3). However, when the significant variables are different (i.e., Setups 2 and 4), there is a stronger dependency of Y on Z; thus the separate models have much more difficulty in identifying the significant variables.

- We also note that the BHQQ is comparable to, and sometimes slightly better than, SM(F) and SM(B) with respect to ME. This is because both the BHQQ and the separate models fit the binary response Z based on the marginal distribution p(z|η).

- In terms of c_lm, the proposed Bayesian hierarchical model remarkably dominates SM(F). It is also better than SM(B) for Setups 1 and 3 in the setting s = 0.2, and it is comparable in the setting s = 0.5. For Setups 2 and 4 the BHQQ significantly outperforms SM(B) for the same reason as mentioned above: the separate models do not perform well because of the dependency between Y and Z.

- For the parameter estimation accuracy L2(β^(1)) and L2(β^(2)), the BHQQ is inferior to SM(F) and SM(B) for Setup 1 but greatly outperforms the separate methods for Setups 2-4, for the same reason given for the RMSPE in Setup 1.

- For c_logit the proposed model is similar to SM(B) for all settings. Compared with SM(F), the proposed model gives superior performance in the setting s = 0.2 and is similar to SM(F) in the setting s = 0.5.

- In addition, we notice that the proposed model is slightly worse than the separate models with respect to L2(η). One possible reason is that the generalized linear model with the LASSO slightly outperforms the Bayesian version of the generalized linear model.

Figure 5. Histograms for the selected parameters of one replicate from Setup 2 when p = 20 and s = 0.2. (Parameters shown: β2^(1), β8^(1), β14^(1), β5^(2), β7^(2), β17^(2), η4, η7.)



To check the convergence of the MCMC chains for the parameters of interest β^(1), β^(2), and η, we use the Gelman-Rubin diagnostic through the coda package. Here we show only the multivariate scale reduction factors from one randomly selected simulation out of the 100 from Setup 2, where p = 20 and s = 0.2. Their values are 1.004, 1.003, and 1.209 for β^(1), β^(2), and η, respectively. In addition, Figure 5 displays the histograms of the posterior samples of some parameters after the burn-in period.

5. Case study: Lapping experiment

We study the experiment on the lapping process described in Section 1. The experimental data include n = 450 observations, p = 10 predictor variables x = (x1, ..., xp)′, and two responses, as described in Table 3. The quantitative response is the TTV, and the binary response is the indicator variable denoting whether or not the STIR is larger than the tolerance. The basis function f(x) here consists of just the intercept term and the predictor variables x1, ..., xp (i.e., f(x) = (1, x1, ..., xp)′).

We set the number of MCMC iterations B = 10,000 with burn-in period b = 1,000. The initial values of the hyperparameters (r1,0, r2,0, r3,0, τ1,0², τ2,0²) are set as (0.3, 0.3, 0.3, 1.5, 3). We set the model tuning parameters (m, δ², a, b) = (2, 2, 0.5, 0.5). To compare the performance of the BHQQ with SM(F), SM(B), and the method proposed by Deng and Jin (2015) (denoted by QQ(F)), we randomly split the whole data set into two groups: a training set of 350 samples and a testing set of 100 samples. For each random split we fit the four methods and make predictions. We repeat this 50 times.

Figure 6 shows the boxplots of the RMSPE values for Y and the ME values for Z. "Mean" or "Median" indicate whether we use the mean or median of the

Table 3. Measured predictor variables and quality responses in the lapping experiment.

Type                               Name                        Definition
Lapping Process Variables          x1: Pressure (N/m²)         The high pressure of the upper to lower plate
                                   x2: Rotation (Rpm)          The rotation speed
                                   x3: LPTime (Sec.)           The time for low pressure
                                   x4: HPTime (Sec.)           The time for high pressure
Quality Covariates before Lapping  x5: CTHK0 (μm)              Central thickness of wafers
                                   x6: TTV0 (μm)               Total thickness variation of wafers
                                   x7: TIR0 (μm)               Total indicator reading of wafers
                                   x8: STIR0 (μm)              Site total indicator reading (STIR) of wafers
                                   x9: BOW0 (μm)               Deviation of local warp at the center of wafers
                                   x10: WARP0 (μm)             Maximum of local warp of wafers
QQ Responses                       y: TTV (μm)                 Continuous total thickness variation of wafers
                                   z: STIR indicator (0 or 1)  Binary indicator for the conformity of STIR

Figure 6. Box plots of two prediction measures for lapping data for different methods. (Left panel: RMSPE; right panel: ME; methods on the x-axis: Mean, Median, QQ(F), SM(F), SM(B).)



9,000 posterior predictive samples as the predicted values for Y and Z. We compare them with the RMSPE and ME of the three alternative methods. For this experiment the sample size is much larger than the model size; thus all the frequentist approaches can be used. According to Figure 6, SM(F) and SM(B) perform comparably in terms of RMSPE and ME. QQ(F) has a lower prediction error than SM(F) and SM(B) since it considers the dependency of Y on Z. Nevertheless, the BHQQ greatly outperforms QQ(F) in terms of both RMSPE and ME. Overall, the proposed Bayesian hierarchical model shows an obvious improvement in prediction accuracy. Note that using the median of the posterior predictive samples is more accurate than using the mean for the Y response.

Figure 7. ACF plots for the selected parameters for the lapping data. (Parameters shown: β2^(1), β8^(1), β5^(2), β7^(2), η2, η4.)

Figure 8. ACF plots for (σ², r1, r2, r3, τ1², τ2²).



It is worth pointing out that the MEs of all four methods are relatively large, even though the proposed BHQQ method has the smallest ME. A possible explanation is that the binary response of the lapping experiment is quite noisy, which is confirmed by the QQ(F), SM(F), and SM(B) methods and by Deng and Jin (2015). Compared with SM(F) and SM(B), the BHQQ method has reduced the misclassification error, but the improvement is not as significant as it is for the continuous response. This is due to the conditional structure Y|Z: additional information from the responses z is included in the model of Y, whereas Z is still modeled marginally.

Next, we look into the analysis results of the BHQQ model for one randomly split pair of training and testing data sets. There are 34 observations with Z = 0 and 66 with Z = 1 in this particular testing set. For clear illustration we reorder the testing data such that the first 34 observations correspond to Z = 0. Figures 7 and 8 are the autocorrelation function (ACF) plots for some selected parameters and for (σ², r1, r2, r3, τ1², τ2²). The quick drop-off of the ACF in these plots, especially for β^(1), β^(2), and σ², implies the fast convergence of the Gibbs sampling iteration. Figure 9 presents the trace plots for some selected parameters. After the burn-in period the traces of the parameters fluctuate around their means with relatively stable variation, which further confirms that the posterior sampling of the parameters has converged. The rest of the parameters have similar patterns; hence their plots are omitted.
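The ACF check used above is easy to reproduce; the sketch below computes the sample autocorrelation of a synthetic AR(1) chain standing in for Gibbs output, where a quick drop-off toward zero indicates good mixing.

```python
import numpy as np

def acf(chain, max_lag=40):
    """Sample autocorrelation function of a 1-D MCMC chain."""
    x = np.asarray(chain, dtype=float) - np.mean(chain)
    var = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / var
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(11)
# AR(1) chain with modest autocorrelation, mimicking well-mixing Gibbs output.
rho, n = 0.5, 20_000
chain = np.empty(n)
chain[0] = 0.0
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + rng.normal()

r = acf(chain)
print(f"lag-0: {r[0]:.3f}, lag-1: {r[1]:.2f}, lag-40: {r[-1]:.3f}")
```

Lag-0 is 1 by construction; the lag-1 value sits near the AR coefficient and decays geometrically, which is the "quick drop-off" pattern seen in Figures 7 and 8.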

Figure 10 shows the histograms of the MCMC samples (after the burn-in ones have been removed) corresponding to the significant variables. Figure 11 displays the box plots of the posterior samples of each element of β^(1), β^(2), and η, from which we can easily identify the significant ones. Also, it is clear that β^(1) and β^(2) have very different distributions, implying that Y strongly depends on Z; the proposed Bayesian hierarchical model is able to account for this relationship.

We plot the true observations y against the predicted values ŷ for the testing set in Figure 12. The prediction is relatively accurate, since most points lie around the line y = ŷ. Figure 13 shows the true observations y (black dashed line) in the testing set and their 95 percent prediction intervals (red dotted lines) from the posterior predictive samples. Among them, 61 out of the 100 observations are correctly covered by the intervals. Figure 14 plots the estimated probability (black dashed line) and its 95 percent prediction intervals (red dotted lines) for the testing data set. The first 34 binary responses are equal to zero; they are on the left of the black solid vertical line at index = 34. The horizontal solid line represents p = 0.5. If the estimated probability is above the line p = 0.5, the predicted value is set to one, and zero otherwise. We are more confident assigning the value one or zero when the prediction interval lies entirely above or below the line p = 0.5. These predicted values are shown in Figure 15, where the red triangles are the misclassifications.
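The interval-coverage and 0.5-thresholding computations above can be sketched as follows. Everything here is a synthetic stand-in for the lapping posterior predictive samples; the 61/100 coverage reported in the text comes from the real data, not from this toy.

```python
import numpy as np

rng = np.random.default_rng(13)
n_test, n_draws = 100, 9000

# Stand-in posterior predictive draws: test point i has predictive
# distribution N(mu_i, 1), and the observed y is one fresh draw from it.
mu = rng.normal(size=n_test)
y_obs = mu + rng.normal(size=n_test)
draws = mu[None, :] + rng.normal(size=(n_draws, n_test))

# 95 percent prediction intervals and their empirical coverage.
lo = np.quantile(draws, 0.025, axis=0)
hi = np.quantile(draws, 0.975, axis=0)
coverage = float(np.mean((y_obs >= lo) & (y_obs <= hi)))

# Thresholding the posterior predictive probability of Z = 1 at 0.5.
prob_draws = rng.beta(2, 2, size=(n_draws, n_test))  # stand-in for Pr(Z = 1 | x)
z_hat = (np.median(prob_draws, axis=0) > 0.5).astype(int)

print(f"interval coverage: {coverage:.2f}")
print("class-1 predictions:", int(z_hat.sum()))
```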

Figure 9. Trace plots for the selected parameters for the lapping data. (Parameters shown: β2^(1), β8^(1), β5^(2), β7^(2), η1, η4.)



The proposed model incorrectly classifies 12 good wafers (STIR indicator is zero) as bad (STIR indicator is one) and five bad wafers as good; the misclassification error rate is 17 percent. In contrast, QQ(F) incorrectly labels 21 good wafers as bad and wrongly predicts 9 bad wafers as good. The misclassification rate of QQ(F) is 30 percent, which is worse than that of the proposed model.

6. Conclusion

We propose a Bayesian hierarchical model to analyze experiments whose outputs contain a continuous response and a binary response. The proposed method can jointly model the two types of responses and discover the dependency between them, which goes beyond simple separate modeling of the two types of responses. The latest existing method, developed by Deng and Jin (2015), can jointly model the two responses by maximizing a constrained likelihood, but it has the two restrictions discussed in the Introduction. The proposed Bayesian framework easily overcomes these two restrictions. In addition, its estimation, prediction, and inference accuracy is comparable to or much better than that of all the existing methods, including Deng and Jin (2015), as shown in Section 4.

In this article we focus only on the simplest case of the quantitative and qualitative system (i.e., a single normally distributed quantitative response and a single binary qualitative response). The proposed Bayesian hierarchical modeling approach can be extended to

β1(1) β2

(1) β3(1) β4

(1)

β7(1) β1

(2) β2(2) β3

(2)

β4(2) β7

(2) β8(2) η1

η2 η3 η9 η10

−0.10 0.00 0.10 0.20 −0.90 −0.80 −0.70 −0.60 −0.3 −0.2 −0.1 0.0 −0.35 −0.25 −0.15 −0.05

−0.3 −0.2 −0.1 0.0 −0.9 −0.7 −0.5 −0.6 −0.5 −0.4 −0.3 −0.2 −0.4 −0.2 0.0 0.1

−0.1 0.0 0.1 0.2 0.3 0.4 −0.2 0.0 0.2 0.4 −0.4 −0.2 0.0 0.1 −1.2 −0.8 −0.4

0.0 0.2 0.4 0.6 0.8 1.0 0.2 0.6 1.0 1.4 −0.8 −0.4 0.0 0.2 −0.4 0.0 0.4 0.8

02

46

8

02

46

8 8

02

46

810

01

23

45

6

01

23

45

6

01

23

45

6

01

23

45

6

01

23

45

67

0

02

46

12

34

01

23

45

0.0

1.0

2.0

3.0

0.0

1.0

2.0

0.0

1.0

2.0

0.0

1.0

2.0

0.0

1.0

2.0

Figure 10. Histograms for the parameter estimates of the significant variables for the lapping data. Red color presents elementsof bð1Þ, blue for bð2Þ, and yellow for g.

JOURNAL OF QUALITY TECHNOLOGY 305

Page 18: A Bayesian hierarchical model for quantitative and ...A Bayesian hierarchical model for quantitative and qualitative responses Lulu Kang, Xiaoning Kang, Xinwei Deng & Ran Jin ... generalized

some more complicated situations. For example, when there are multivariate quantitative responses we can assume a multivariate normal distribution model accordingly, with correlation between the different quantitative responses if needed. When the qualitative response has finitely many but more than two levels, we can assume a multinomial logistic regression model instead of the Bernoulli logistic regression model. Some more complicated cases may occur, such as multiple qualitative responses, to which the proposed approach cannot be easily extended. We plan to pursue these topics in our future research.

About the authors

Dr. Lulu Kang is an Associate Professor of theDepartment of Applied Math at Illinois Institute ofTechnology (IIT). She obtained an M.S. in Operations

[Figure residue removed: box plots showing the estimates of β(1), β(2), and η.]

Figure 11. Box plots for the parameter estimates for the lapping data.

306 L. KANG ET AL.


Research Research and a Ph.D. in Industrial Engineering from the Stewart School of Industrial and Systems Engineering at Georgia Tech. Dr. Kang has worked in various areas of statistics, including uncertainty quantification, statistical design and analysis of experiments, and Bayesian computational statistics. She has

collaborated with researchers in the mechanical, manufacturing, and environmental engineering fields. Dr. Kang is also the co-founder and Associate Program Director of the Data Science Program at IIT.

Dr. Xiaoning Kang holds a Ph.D. in Statistics from Virginia Tech. He is now an Assistant Professor at the International Business College at Dongbei University of Finance and Economics, China. His research interests include high-dimensional data analysis, large covariance matrix estimation, and statistical methodology with financial applications.

Dr. Xinwei Deng is an Associate Professor in the Department of Statistics at Virginia Tech. He received his Ph.D. from the Georgia Institute of Technology in 2009. Dr. Deng's research interests include data analytics, machine learning, design of experiments, and the interface between experimental design and machine learning. Dr. Deng actively collaborates with researchers in engineering, nanotechnology, bioinformatics, and environmental sciences. He has over 35 peer-reviewed top journal and conference publications. Dr. Deng is currently an associate editor for Technometrics, International Statistical Review, and IISE Transactions. Dr. Deng's research has been funded by the National Science Foundation and industrial collaborations.

Dr. Ran Jin is an Assistant Professor and the Director of the Laboratory of Data Science and Visualization in the Grado Department of Industrial and Systems Engineering at Virginia Tech. He received his Ph.D. in Industrial Engineering from Georgia Tech, Atlanta; his Master's degrees in Industrial Engineering and in Statistics, both from the University of Michigan, Ann Arbor; and his Bachelor's degree in Electronic Engineering from Tsinghua University, Beijing. His research focuses on data fusion in smart manufacturing, including the integration of different types of data sets (e.g., ensemble models), variables (e.g., quantitative and qualitative models), and information (e.g., product quality and equipment reliability)

Figure 12. Plot of the true observations y against the predicted y of the testing data set.

Figure 13. Plot of the true observations y of the testing data set and the corresponding 95 percent prediction intervals.

Figure 14. Plot of the estimated probabilities and the corresponding 95 percent prediction intervals using the testing data set.
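Estimated class probabilities and empirical 95 percent intervals of the kind shown in these plots can be obtained directly from MCMC output. A minimal sketch follows, using simulated draws of the linear predictor η in place of actual posterior samples from the paper's model (all array names and sizes here are illustrative assumptions, not from the authors' code):

```python
import numpy as np

# Hypothetical posterior draws: each row is one MCMC sample of the
# linear predictor eta for 5 test points (sizes are illustrative).
rng = np.random.default_rng(0)
eta_draws = rng.normal(loc=0.5, scale=0.8, size=(4000, 5))

# Logistic link: map each draw of eta to a success probability.
prob_draws = 1.0 / (1.0 + np.exp(-eta_draws))

# Point estimate: posterior mean probability for each test point.
prob_hat = prob_draws.mean(axis=0)

# Empirical 95% interval: 2.5% and 97.5% quantiles across the draws.
lower, upper = np.quantile(prob_draws, [0.025, 0.975], axis=0)

for p, lo, hi in zip(prob_hat, lower, upper):
    print(f"p_hat = {p:.3f}, 95% interval = ({lo:.3f}, {hi:.3f})")
```

The same quantile-based construction applies to the prediction intervals for the quantitative response: replace the probability draws with posterior predictive draws of y.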

Figure 15. Plot of the estimated qualitative responses Z of the testing data set. The black circles represent the correct classifications; the red triangles indicate the misclassifications.

JOURNAL OF QUALITY TECHNOLOGY 307


for synergistically modeling, monitoring, and controlling manufacturing processes and systems. He is currently serving as an Associate Editor for IISE Transactions, Focus Issue on Design and Manufacturing.

Funding

This research was supported by U.S. National Science Foundation Grants CMMI-1435902 and CMMI-1435996.

References

Ai, M., L. Kang, and V. R. Joseph. 2009. Bayesian optimal blocking of factorial designs. Journal of Statistical Planning and Inference 139 (9):3319–3328.

Bello, N. M., J. P. Steibel, and R. J. Tempelman. 2012. Hierarchical Bayesian modeling of heterogeneous cluster- and subject-level associations between continuous and binary outcomes in dairy production. Biometrical Journal 54 (2):230–248.

Breiman, L. 1995. Better subset regression using the nonnegative garrote. Technometrics 37 (4):373–384.

Casella, G., and E. I. George. 1992. Explaining the Gibbs sampler. The American Statistician 46 (3):167–174.

Catalano, P. J., and L. M. Ryan. 1992. Bivariate latent variable models for clustered discrete and continuous outcomes. Journal of the American Statistical Association 87 (419):651–658.

Chen, M. H., and J. G. Ibrahim. 2003. Conjugate priors for generalized linear models. Statistica Sinica 13 (2):461–476.

Chen, S., D. M. Witten, and A. Shojaie. 2014. Selection and estimation for mixed graphical models. Biometrika 102 (1):47–64.

Cox, D. R., and N. Wermuth. 1992. Response models for mixed binary and quantitative variables. Biometrika 79 (3):441–461.

Deng, X., and R. Jin. 2015. QQ models: Joint modeling for quantitative and qualitative quality responses in manufacturing systems. Technometrics 57 (3):320–331.

Dunson, D. B., and A. H. Herring. 2005. Bayesian latent variable models for mixed discrete outcomes. Biostatistics 6 (1):11–25.

Fitzmaurice, G. M., and N. M. Laird. 1995. Regression models for a bivariate discrete and continuous outcome with clustering. Journal of the American Statistical Association 90 (431):845–852.

Fitzmaurice, G. M., and N. M. Laird. 1997. Regression models for mixed discrete and continuous responses with potentially missing values. Biometrics 53 (1):110–122.

Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2014. Bayesian data analysis. 3rd ed., Vol. 2. Boca Raton, FL: CRC Press.

Gelman, A., and D. B. Rubin. 1992. Inference from iterative simulation using multiple sequences. Statistical Science 7 (4):457–472.

Guglielmi, A., F. Ieva, A. M. Paganoni, and F. A. Quintana. 2016. A semiparametric Bayesian joint model for multiple mixed-type outcomes: An application to acute myocardial infarction. Advances in Data Analysis and Classification 12 (2):399–423.

Holmes, C. C., and L. Held. 2006. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis 1 (1):145–168.

Joseph, V. R. 2006. A Bayesian approach to the design and analysis of fractionated experiments. Technometrics 48 (2):219–229.

Kang, L., X. Deng, and R. Jin. 2015. Bayesian D-optimal design of experiments with continuous and binary responses. Revision invited for Journal of the American Statistical Association.

Kang, L., and V. R. Joseph. 2009. Bayesian optimal single arrays for robust parameter design. Technometrics 51 (3):250–261.

Ning, Y. D., Y. Z. Bian, and B. Liu. 2012. Improving a lapping process using robust parameter design and run-to-run control. Journal of the Chinese Institute of Industrial Engineers 29 (2):111–124.

Plummer, M., N. Best, K. Cowles, and K. Vines. 2006. CODA: Convergence diagnosis and output analysis for MCMC. R News 6 (1):7–11.

Roberts, G. O., and A. F. Smith. 1994. Simple conditions for the convergence of the Gibbs sampler and Metropolis-Hastings algorithms. Stochastic Processes and Their Applications 49 (2):207–216.

Sammel, M. D., L. M. Ryan, and J. M. Legler. 1997. Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 59 (3):667–678.

Weiss, R. E., J. Jia, and M. A. Suchard. 2011. A Bayesian model for the common effects of multiple predictors on mixed outcomes. Interface Focus 1 (6):886–894.

Wu, C. J., and M. S. Hamada. 2011. Experiments: Planning, analysis, and optimization. Vol. 552. Hoboken, NJ: John Wiley & Sons.

Yang, E., Y. Baker, P. Ravikumar, G. Allen, and Z. Liu. 2014. Mixed graphical models via exponential families. Paper presented at the Seventeenth International Conference on Artificial Intelligence and Statistics, Reykjavik, Iceland, JMLR: W&CP Vol. 33, 1042–1050.

Zhang, L., K. Wang, and N. Chen. 2015. Monitoring wafers' geometric quality using an additive Gaussian process model. IIE Transactions 48 (1):1–15.

Zhao, H., R. Jin, S. Wu, and J. Shi. 2011. PDE-constrained Gaussian process model on material removal rate of wire saw slicing process. Journal of Manufacturing Science and Engineering 133 (2):021012.
