
Multivariate Input Uncertainty in Output Analysis for Stochastic Simulation

WEI XIE, Rensselaer Polytechnic Institute
BARRY L. NELSON, Northwestern University
RUSSELL R. BARTON, The Pennsylvania State University

When we use simulations to estimate the performance of stochastic systems, the simulation is often driven by input models estimated from finite real-world data. A complete statistical characterization of system performance estimates requires quantifying both input model and simulation estimation errors. The components of input models in many complex systems could be dependent. In this paper, we characterize the distribution of a random vector by its marginal distributions and a dependence measure: either product-moment or Spearman rank correlations. To quantify the impact of dependent input model and simulation estimation errors on system performance estimates, we propose a metamodel-assisted bootstrap framework. Specifically, we estimate the key properties of dependent input models with real-world data and construct a joint distribution by using the flexible NORmal To Anything (NORTA) representation. Then, we employ the bootstrap to capture the estimation error of the joint distributions, and an equation-based stochastic kriging metamodel to propagate the input uncertainty to the output mean, which also reduces the influence of simulation estimation error due to output variability. Asymptotic analysis provides theoretical support for our approach, while an empirical study demonstrates that it has good finite-sample performance.

Categories and Subject Descriptors: I.6.6 [Simulation and Modeling]: Simulation Output Analysis

General Terms: Algorithms, Experimentation

Additional Key Words and Phrases: bootstrap, confidence interval, Gaussian process, multivariate input uncertainty, NORTA, output analysis

1. INTRODUCTION

Stochastic simulation is used to estimate the behavior of complex systems that are driven by random input models. The distributions of these input models are often estimated from finite real-world data. Therefore, a complete statistical characterization of stochastic system performance requires quantifying both input and simulation estimation error. Ignoring either source of uncertainty could lead to unfounded confidence in the system performance estimate.

The choice of input models directly impacts the system performance estimates. A prevalent practice is to model the input processes as a collection of independent and identically distributed (i.i.d.) univariate distributions. However, considering that components of real inputs could be dependent, these simple models do not always faithfully

This paper is based upon work supported by the National Science Foundation under Grant No. CMMI-1068473.
Author's addresses: W. Xie (corresponding author), Department of Industrial and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180-3590; email: [email protected]; B. L. Nelson, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208; email: [email protected]; R. R. Barton, Smeal College of Business, Pennsylvania State University, University Park, PA 16802; email: [email protected].
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© YYYY ACM 1049-3301/YYYY/-ARTA $15.00
DOI: http://dx.doi.org/10.1145/0000000.0000000

ACM Transactions on Modeling and Computer Simulation, Vol. V, No. N, Article A, Publication date: YYYY.


represent the physical processes. For example, in a project planning network, the activity durations for different tasks could be correlated if they are affected by the same nuisance factors, e.g., weather conditions. In a supply chain system, the demands of a customer over different products, e.g., low-fat and whole milk, could be related. Ignoring such dependence can lead to poor estimates of system performance measures. Thus, it is desirable to build input models that can faithfully capture the dependence. In this paper we account for input models with random-vector distributions and do not consider time-series input processes.

Biller and Ghosh [2006] reviewed various approaches to construct joint input distributions. Considering the amount of information needed to specify the joint distribution, almost all these methods suffer from some serious drawbacks. In light of this difficulty, most input-modeling research focuses on methods that match only certain key properties of the input models, including the marginal distributions and some dependence measure.

We assume that dependent input models are characterized by their marginal distributions and a dependence measure. Specifically, the marginal distributions have known parametric families with unknown parameter values. The dependence between different components of input models can be measured by various criteria [Biller and Ghosh 2006]. We focus on product-moment and Spearman rank correlations in this paper. Product-moment correlation is widely used in engineering applications. The definition of product-moment correlation requires the variances of the components to be finite. Thus, we also include the use of Spearman rank correlation as a dependence measure, which finds wide application in business studies, e.g., decision and risk analysis [Clemen and Reilly 1999]. Instead of measuring linear dependence, the Spearman rank correlation is a nonparametric measure that captures the monotonic relationship between different components of input models. It does not require the variances of the components to be finite. Further, since it is based on ranks, this measure is not sensitive to outlying observations. Notice that in general dependence may be more than just pairwise, and may be nonlinear, in which case a more complex characterization is needed [Wu and Mielniczuk 2010].
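The outlier insensitivity is easy to check numerically. The following sketch (our own illustration, not from the paper) contaminates one observation of a perfectly monotone sample: the product-moment correlation is dragged below zero, while the Spearman rank correlation, which only sees ranks, stays high.

```python
import numpy as np
from scipy import stats

# 20 perfectly monotone observation pairs, then one gross outlier in y.
x = np.arange(1.0, 21.0)
y = x.copy()
y[0] = 200.0  # a single contaminated observation at the smallest x

pearson = stats.pearsonr(x, y)[0]    # product-moment: dragged below zero
spearman = stats.spearmanr(x, y)[0]  # rank-based: only the outlier's rank moves far
```

Only the outlier's rank changes substantially (from smallest to largest), so the rank-based measure degrades gracefully.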

Since marginal distribution parameters and dependence measures are estimated from real-world data, their estimation error is called input uncertainty. When we use the simulation outputs to estimate system performance, there are two sources of uncertainty: input and simulation error. To quantify the overall uncertainty about the system performance estimate, we build on Xie et al. [2014b], which proposed a metamodel-assisted bootstrapping approach to form a confidence interval (CI) accounting for the impact of input and simulation uncertainty. Further, a variance decomposition was proposed to estimate the relative contribution of input to overall uncertainty. However, our previous study in Xie et al. [2014b] was based on the assumption that the input distributions are univariate and mutually independent. The independence assumption does not hold in general for input models in many stochastic systems.

This paper is a significant enhancement of Xie et al. [2014b]. To efficiently and correctly account for uncertainty in multivariate input models, we introduce a more general metamodel-assisted bootstrapping framework; it can quantify the impact of dependent input uncertainty and simulation estimation error on system performance estimates while also reducing the influence of simulation estimation error due to finite simulation effort as compared with direct simulation methods. Specifically, we estimate marginal distribution parameters and dependence measures of input models with real-world data, and construct the joint distributions by using the flexible NORmal To Anything (NORTA) representation [Cario and Nelson 1997]. Then, the bootstrap is used to quantify the estimation error of these joint distributions, and an equation-based stochastic kriging (SK) metamodel propagates the input uncertainty


to the output mean. We can derive a CI that accounts for both simulation and input uncertainty by using this generalized metamodel-assisted bootstrapping approach. Therefore, our approach allows us to perform statistical uncertainty analysis for stochastic systems with dependence in the input models. Notice that our metamodel-assisted bootstrapping framework does not require NORTA. Other approaches that characterize dependent input models by their parametric marginal distributions and a dependence matrix could easily be employed in our framework.

There are two central contributions of this paper: First, we generalize the metamodel-assisted bootstrap framework [Xie et al. 2014b] to stochastic simulation with dependent input models; second, we propose a rigorous analysis for cases where the dependence is measured by product-moment or Spearman rank correlation.

The next section describes other research on dependent input modeling and input uncertainty analysis. This is followed by a formal description of the problem of interest in Section 3. In Section 4, we propose a generalized metamodel-assisted bootstrapping framework and provide a procedure to build a CI accounting for both input and simulation estimation error on system mean performance estimates. Our approach is supported by asymptotic analysis. We then report results on finite-sample behavior from an empirical study in Section 5 and conclude the paper in Section 6. All proofs are in the Appendix.

2. BACKGROUND

For stochastic simulations, various approaches to account for input uncertainty have been proposed; see Barton [2012] and Song et al. [2014] for reviews. The methods can be divided into Bayesian and frequentist approaches, which have their underlying merits and limitations [Xie et al. 2014a].

To faithfully capture the dependence between different components of input models, Cario and Nelson [1997] proposed the flexible NORTA distribution to represent and generate random vectors with almost arbitrary marginal distributions and product-moment correlation matrix. Clemen and Reilly [1999] used NORTA to represent dependent input models for decision and risk analysis with dependence measured by Spearman's and Kendall's rank correlations.

Biller and Corlu [2011] proposed a Bayesian approach to account for the parameter uncertainty of dependent input models. Correlated inputs are modeled with NORTA and the dependence is measured by product-moment correlation. The uncertainty around the NORTA distribution parameters estimated from real-world data is quantified by posterior distributions. For complex stochastic systems with a large number of correlated inputs, a fast algorithm draws samples from these posterior distributions to quantify the input uncertainty. Then, the direct simulation method is used to propagate the input uncertainty to the output mean by running simulations at each sample point, which could be computationally expensive for complex simulated systems. Further, the direct simulation method does not incorporate the simulation uncertainty into the Bayesian formulation.

Direct bootstrapping uses bootstrap resampling of the real-world data to measure the input uncertainty and propagates it to the output mean by direct simulation [Barton and Schruben 1993; 2001; Barton 2007; Cheng and Holland 1997]. Compared with the Bayesian approaches, the direct bootstrap can be adapted to any input model without additional analysis, and it does not need to resort to computationally expensive approaches to draw samples to quantify the input uncertainty. However, direct simulation cannot efficiently use the computational budget to reduce the impact of simulation estimation error. Further, since the statistic that is bootstrapped is the random output of a simulation, it is not a smooth function of the input data; this violates the conditions required for asymptotic validity of the bootstrap.


The metamodel-assisted bootstrapping approach was introduced by Barton et al. [2014]. The input uncertainty is measured by bootstrapping, and an equation-based SK metamodel propagates the input uncertainty to the output mean. This approach addresses some of the shortcomings of the direct bootstrap. Specifically, the metamodel can reduce the impact of simulation estimation error. Further, metamodeling makes the bootstrap statistic a smooth function of the input data, so the asymptotic validity concerns faced by the direct bootstrap method disappear. However, Barton et al. [2014] assumed that the simulation budget is not tight and that the metamodel uncertainty can be ignored. If the true mean response surface is complex, especially for high-dimensional problems with many input distributions, and the computational budget is tight, the impact of metamodel uncertainty can no longer be ignored.

The metamodel-assisted bootstrapping approach was improved in Xie et al. [2014b] to build a CI accounting for the impact of both input and metamodel uncertainty on the system mean estimates. Further, a variance decomposition was proposed to estimate the relative contribution of input to overall uncertainty, which is very useful for decision makers in determining where to put more effort to reduce estimator error. The metamodel-assisted bootstrapping approach demonstrates robust performance even when there is a tight computational budget and simulation estimation error is large. However, Xie et al. [2014b] is based on the assumption that input models are a collection of mutually independent univariate distributions.

The success of metamodel-assisted bootstrapping for stochastic simulations with independent univariate input distributions in Xie et al. [2014b] motivates us to extend it to more complex cases with dependence in the input models.

3. PROBLEM STATEMENT

The stochastic simulation output is a function of random numbers and the input model, denoted by F. To simplify notation, we do not explicitly include the random numbers. The output from the jth replication of a simulation with input model F can be written as

Y_j(F) = μ(F) + ε_j(F)

where μ(F) = E[Y_j(F)] denotes the unknown output mean and ε_j(F) represents the simulation error with mean zero. Notice that the simulation output depends on the choice of input model.
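As a toy stand-in for this output model (the simulator and numbers below are hypothetical, purely to fix ideas), averaging replications Y_j = μ + ε_j estimates the unknown output mean, and the gap between the average and μ is the simulation estimation error:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_replication(mu, noise_sd=2.0):
    """One simulation output Y_j = mu + eps_j, with E[eps_j] = 0."""
    return mu + rng.normal(0.0, noise_sd)

mu_true = 5.0  # plays the role of the unknown mu(F)
y = np.array([one_replication(mu_true) for _ in range(10_000)])
mu_hat = y.mean()  # replication average; its deviation from mu_true
                   # is the simulation estimation error
```

With more replications the simulation estimation error shrinks, but the input uncertainty discussed next does not.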

Let F = {F_1, F_2, ..., F_L} with F_1, F_2, ..., F_L mutually independent; F could be composed of univariate and multivariate joint distributions. Let F_ℓ be a d_ℓ-dimensional distribution having marginal distributions denoted by {F_{ℓ,1}, F_{ℓ,2}, ..., F_{ℓ,d_ℓ}} with d_ℓ ≥ 1. For the distributions F_ℓ with d_ℓ > 1, we focus on continuous marginal distributions with strictly increasing cumulative distribution functions (cdfs) {F_{ℓ,1}, F_{ℓ,2}, ..., F_{ℓ,d_ℓ}}.

For the ℓth distribution F_ℓ with d_ℓ > 1, we suppose that it is characterized by marginal distributions and a dependence measure: either the product-moment or the Spearman rank correlation matrix. Specifically, let a d_ℓ × 1 random vector X_ℓ ~ F_ℓ have d_ℓ × d_ℓ product-moment and Spearman rank correlation matrices denoted, respectively, by ρ_{X_ℓ} and R_{X_ℓ}:

$$\rho_{X_\ell}(i,j) = \mathrm{corr}(X_{\ell,i}, X_{\ell,j}) = \frac{\mathrm{Cov}(X_{\ell,i}, X_{\ell,j})}{\sqrt{\mathrm{Var}(X_{\ell,i})\,\mathrm{Var}(X_{\ell,j})}},$$

$$R_{X_\ell}(i,j) = \mathrm{corr}(F_{\ell,i}(X_{\ell,i}), F_{\ell,j}(X_{\ell,j})) = \frac{\mathrm{E}[F_{\ell,i}(X_{\ell,i})F_{\ell,j}(X_{\ell,j})] - \mathrm{E}[F_{\ell,i}(X_{\ell,i})]\,\mathrm{E}[F_{\ell,j}(X_{\ell,j})]}{\sqrt{\mathrm{Var}(F_{\ell,i}(X_{\ell,i}))\,\mathrm{Var}(F_{\ell,j}(X_{\ell,j}))}}$$

with i, j = 1, 2, ..., d_ℓ. Suppose these correlation matrices are positive definite. Since the correlation matrices are symmetric and their diagonal terms are 1, we can view a d_ℓ × d_ℓ correlation matrix as an element of a d*_ℓ ≡ d_ℓ(d_ℓ − 1)/2 dimensional space. Therefore, the product-moment and Spearman rank correlation matrices can be uniquely specified by d*_ℓ × 1 vectors, denoted by V^ρ_{X_ℓ} and V^R_{X_ℓ}, respectively. For F_ℓ with d_ℓ = 1, these correlation vectors are empty and d*_ℓ = 0.

For F_ℓ with ℓ = 1, 2, ..., L, we assume that the families of marginal distributions {F_{ℓ,1}, F_{ℓ,2}, ..., F_{ℓ,d_ℓ}} are known, but not their parameter values. Let an h_{ℓ,i} × 1 vector θ_{ℓ,i} denote the unknown parameters of the ith marginal distribution F_{ℓ,i}. By stacking the θ_{ℓ,i} with i = 1, 2, ..., d_ℓ together, we have a d†_ℓ × 1 parameter vector θ_ℓ^⊤ ≡ (θ_{ℓ,1}^⊤, θ_{ℓ,2}^⊤, ..., θ_{ℓ,d_ℓ}^⊤) with d†_ℓ ≡ Σ_{i=1}^{d_ℓ} h_{ℓ,i}.
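The d*_ℓ-dimensional representation is simply the strict upper triangle of the correlation matrix; a minimal sketch (function names are ours):

```python
import numpy as np

def corr_to_vec(R):
    """Strict upper triangle of a d x d correlation matrix,
    as a d(d-1)/2 vector in row-major order."""
    return R[np.triu_indices(R.shape[0], k=1)]

def vec_to_corr(v, d):
    """Rebuild the symmetric, unit-diagonal matrix from that vector."""
    R = np.eye(d)
    iu = np.triu_indices(d, k=1)
    R[iu] = v
    R[iu[1], iu[0]] = v  # mirror into the lower triangle
    return R
```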

Input models characterized by marginal distributions and correlation matrices can be specified by ϑ ≡ {(θ_ℓ; V_{X_ℓ}), ℓ = 1, 2, ..., L}, which includes d ≡ Σ_{ℓ=1}^{L} d†_ℓ + Σ_{ℓ=1}^{L} d*_ℓ elements, where V_{X_ℓ} = V^ρ_{X_ℓ} or V^R_{X_ℓ}. We call ϑ the input model parameters. Abusing notation, we can rewrite μ(F) as μ(ϑ). The true input parameters ϑ^c are unknown and are estimated from finite samples of real-world data. Thus, our goal is to find a (1 − α)100% CI [Q_L, Q_U] such that

$$\Pr\{\mu(\vartheta^c) \in [Q_L, Q_U]\} = 1 - \alpha. \quad (1)$$

Given finite real-world data, the sampling distribution of the estimator of the input model parameters, denoted by ϑ̂_m, can be used to quantify the input uncertainty. Specifically, let m_ℓ denote the number of i.i.d. real-world observations available from the ℓth input process, X_{ℓ,m_ℓ} ≡ {X^{(1)}_ℓ, X^{(2)}_ℓ, ..., X^{(m_ℓ)}_ℓ}, where the d_ℓ × 1 random vectors X^{(i)}_ℓ are i.i.d. with distribution F^c_ℓ for i = 1, 2, ..., m_ℓ. Let X_m = {X_{ℓ,m_ℓ}, ℓ = 1, 2, ..., L} be the collection of samples from all L input distributions in F^c, where m = (m_1, m_2, ..., m_L). The real-world data are a particular realization of X_m, say x^{(0)}_m, and the estimator of input model parameters ϑ̂_m = (θ̂_m; V̂^ρ_{X,m}) or (θ̂_m; V̂^R_{X,m}) is a function of X_m.

Under the assumption that the first h_{ℓ,i} marginal moments are finite for i = 1, 2, ..., d_ℓ and ℓ = 1, 2, ..., L, we can use the moment estimators for the marginal distribution parameters, denoted by θ̂_m [Xie et al. 2014b]. The usual estimators of the product-moment and Spearman rank correlations are

$$\hat{\rho}_{X_\ell,m_\ell}(i,j) = \frac{\sum_{k=1}^{m_\ell}\bigl(X^{(k)}_{\ell,i} - \bar{X}_{\ell,i}\bigr)\bigl(X^{(k)}_{\ell,j} - \bar{X}_{\ell,j}\bigr)/(m_\ell - 1)}{S_{\ell,i}\,S_{\ell,j}} \quad (2)$$

$$\hat{R}_{X_\ell,m_\ell}(i,j) = \frac{\sum_{k=1}^{m_\ell}\bigl(r(X^{(k)}_{\ell,i}) - \bar{r}(X_{\ell,i})\bigr)\bigl(r(X^{(k)}_{\ell,j}) - \bar{r}(X_{\ell,j})\bigr)}{\sqrt{\Bigl[\sum_{k=1}^{m_\ell}\bigl(r(X^{(k)}_{\ell,i}) - \bar{r}(X_{\ell,i})\bigr)^2\Bigr]\Bigl[\sum_{k=1}^{m_\ell}\bigl(r(X^{(k)}_{\ell,j}) - \bar{r}(X_{\ell,j})\bigr)^2\Bigr]}} \quad (3)$$

for i, j = 1, 2, ..., d_ℓ and ℓ = 1, 2, ..., L with d_ℓ > 1, where X̄_{ℓ,i} = Σ_{k=1}^{m_ℓ} X^{(k)}_{ℓ,i}/m_ℓ and S²_{ℓ,i} = Σ_{k=1}^{m_ℓ} (X^{(k)}_{ℓ,i} − X̄_{ℓ,i})²/(m_ℓ − 1). We denote the rank function by r(·) ≡ rank(·). In this paper, we use the uprank to estimate the Spearman rank correlations; it is defined by r(X^{(k)}_{ℓ,i}) ≡ Σ_{j=1}^{m_ℓ} I(X^{(j)}_{ℓ,i} ≤ X^{(k)}_{ℓ,i}) and r̄(X_{ℓ,i}) = Σ_{k=1}^{m_ℓ} r(X^{(k)}_{ℓ,i})/m_ℓ, with I(·) denoting the indicator function. Then, based on Equations (2) and (3), we can find the corresponding correlation estimators V̂^ρ_{X,m} and V̂^R_{X,m}.
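Equations (2) and (3) and the uprank translate directly into code; a minimal sketch (function names are ours; ties are ignored, consistent with the continuous marginals assumed above):

```python
import numpy as np

def uprank(x):
    """Uprank r(x_k) = #{j : x_j <= x_k}, as defined above."""
    x = np.asarray(x, dtype=float)
    return np.array([(x <= xk).sum() for xk in x])

def pm_corr(xi, xj):
    """Equation (2): sample product-moment correlation of two components."""
    m = len(xi)
    cov = ((xi - xi.mean()) * (xj - xj.mean())).sum() / (m - 1)
    return cov / (xi.std(ddof=1) * xj.std(ddof=1))

def spearman_corr(xi, xj):
    """Equation (3): product-moment correlation of the upranks."""
    di = uprank(xi) - uprank(xi).mean()
    dj = uprank(xj) - uprank(xj).mean()
    return (di * dj).sum() / np.sqrt((di ** 2).sum() * (dj ** 2).sum())
```

Applying these pairwise to each component pair (i, j) of the ℓth dataset fills in the estimated correlation matrices.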

The impact of input uncertainty on the system mean performance estimate is quantified by the sampling distribution of μ(ϑ̂_m). Further, since the underlying response surface μ(·) is unknown, at any ϑ we let μ̂(ϑ) denote the corresponding mean response


estimator. Thus, there are both input and simulation estimation errors in the system mean performance estimates.

For stochastic systems with dependent input models, our objective is to propose an approach that quantifies the overall impact of both input and simulation estimation error on system mean performance estimates and then builds a CI satisfying Equation (1). Further, since each simulation run could be computationally expensive and the computational budget is tight, we want to reduce the influence of simulation estimation error.

4. METAMODEL-ASSISTED BOOTSTRAPPING FRAMEWORK

For problems with parametric input distributions that are univariate and mutually independent, the metamodel-assisted bootstrapping framework was used to account for the impact of both input and simulation estimation errors on the system performance estimates in Xie et al. [2014b]. In this section, we generalize the metamodel-assisted bootstrapping approach to stochastic simulations with dependence in the input models. The procedure to build a CI quantifying the overall uncertainty of system mean performance estimates is described. Further, asymptotic analysis provides theoretical support for our approach.

To make this section easy to follow, we start with an overall description of the generalized metamodel-assisted bootstrapping framework. We assume that dependent input models can be characterized by their marginal distributions and a dependence measure, either the product-moment or the Spearman rank correlation matrix. Since this partial characterization by key properties does not uniquely determine the joint distributions except in special cases, we use the NORTA representation to construct the joint distributions. Notice that unless the true distribution is NORTA, there is unmeasured error due to incorrect input models; this error is not addressed in this paper. Further, we suppose that the parametric marginal distributions can be specified by either parameters or moments. Since the marginal moments/parameters and the correlation matrix are estimated from finite real-world data, the metamodel-assisted bootstrapping framework is proposed to quantify the overall uncertainty of the system performance estimate arising from input and simulation estimation errors. Specifically, we employ the bootstrap to capture the estimation error of the joint distributions and propagate the input uncertainty to the output mean by using an equation-based stochastic kriging metamodel that is built from the simulation outputs at a few well-chosen design points. Both simulation and metamodel uncertainty can be estimated using properties of the metamodel. Then, we derive a CI that accounts for both simulation and input uncertainty.

4.1. Bootstrap for Input Uncertainty

In this section, we describe how to employ the bootstrap to quantify the input uncertainty. The way we choose to represent input models plays an important role in the implementation of metamodel-assisted bootstrapping. For the ℓth input distribution F_ℓ, instead of using the natural parameters θ_ℓ to characterize the marginal distributions, we can use moments; see Barton et al. [2014] for an explanation. Suppose that the parametric marginal distribution F_{ℓ,i} can be uniquely characterized by its first h_{ℓ,i} finite moments, denoted by the h_{ℓ,i} × 1 vector ψ_{ℓ,i}, for i = 1, 2, ..., d_ℓ. By stacking the ψ_{ℓ,i} with i = 1, 2, ..., d_ℓ together, we have a d†_ℓ × 1 vector of marginal moments ψ_ℓ^⊤ ≡ (ψ_{ℓ,1}^⊤, ψ_{ℓ,2}^⊤, ..., ψ_{ℓ,d_ℓ}^⊤). Therefore, the input models can be characterized by the collection of moments M ≡ {M_ℓ, ℓ = 1, 2, ..., L} with M_ℓ = (ψ_ℓ; V_{X_ℓ}) and V_{X_ℓ} = V^ρ_{X_ℓ} or V^R_{X_ℓ}. Notice that there is a one-to-one mapping between the input parameters ϑ and the moments M. Abusing notation, we rewrite μ(ϑ) as μ(M).


The true moments of the dependent input models, denoted by M^c, are unknown and are estimated from a finite sample X_m. Specifically, we use standardized sample moments as estimators for the marginal distributions, denoted by ψ̂_m; see Xie et al. [2014b]. The correlation estimator V̂^ρ_{X,m} or V̂^R_{X,m} is obtained by using Equation (2) or (3). The estimation error of the input models can be quantified by the sampling distribution of M̂_m = (ψ̂_m; V̂^ρ_{X,m}) or (ψ̂_m; V̂^R_{X,m}), denoted by F^c_{M̂_m}. Therefore, the impact of input uncertainty on the system mean performance estimate can be measured by the sampling distribution of μ(M̂_m) with M̂_m ~ F^c_{M̂_m}.

Since it is hard to derive the sampling distribution F^c_{M̂_m}, we use bootstrap resampling to approximate it [Shao and Tu 1995]. Specifically, denote the index sets by A_ℓ ≡ {1, 2, ..., m_ℓ} for the ℓth distribution with ℓ = 1, 2, ..., L. Notice that since F_1, F_2, ..., F_L are mutually independent, we can bootstrap each distribution separately. The implementation of bootstrap resampling in metamodel-assisted bootstrapping is as follows.

(1) For the ℓth distribution F_ℓ with ℓ = 1, 2, ..., L, draw m_ℓ samples with replacement from the set A_ℓ to obtain the bootstrapped indices {i_1, i_2, ..., i_{m_ℓ}}; choose the corresponding samples from the real-world data x^{(0)}_m to get x^{(1)}_{ℓ,m_ℓ} ≡ {x^{(i_1)}_ℓ, x^{(i_2)}_ℓ, ..., x^{(i_{m_ℓ})}_ℓ}. Denote the collection of bootstrap samples by x^{(1)}_m = {x^{(1)}_{ℓ,m_ℓ}, ℓ = 1, 2, ..., L} and use them to calculate the bootstrapped moment estimate, denoted by M̃^{(1)}_m ≡ (ψ̃^{(1)}_m; (Ṽ^ρ_{X,m})^{(1)}) or (ψ̃^{(1)}_m; (Ṽ^R_{X,m})^{(1)}).

(2) Repeat the previous step B times to generate M̃^{(b)}_m with b = 1, 2, ..., B.
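Steps (1)–(2) amount to independent with-replacement resampling per input distribution followed by re-estimation; a minimal sketch (the `estimate` callback is a hypothetical stand-in for computing ψ̃ and Ṽ via Equations (2)–(3)):

```python
import numpy as np

def bootstrap_moments(data, estimate, B=2000, rng=None):
    """Steps (1)-(2): resample each of the L mutually independent
    datasets with replacement, then recompute the moment estimate.

    data     : list of (m_ell, d_ell) arrays, one per input distribution
    estimate : callable mapping such a list to a moment estimate
    """
    rng = rng or np.random.default_rng()
    estimates = []
    for _ in range(B):
        # independent resampling per distribution, since F_1..F_L are independent
        resample = [x[rng.integers(0, len(x), size=len(x))] for x in data]
        estimates.append(estimate(resample))
    return estimates
```

The B returned estimates approximate draws from the sampling distribution of the input moments.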

The bootstrap-resampled moments are drawn from the bootstrap distribution denoted by F_{M̃_m}(·|x^{(0)}_m), with M̃_m ~ F_{M̃_m}(·|x^{(0)}_m). For estimation of a CI, B is recommended to be a few thousand; see Xie et al. [2014a] for an explanation. In this paper, a hat (as in M̂) denotes a quantity estimated from real-world data, while a tilde (as in M̃) denotes a quantity estimated from bootstrapped data.

Theorem 4.1 shows that, as the amount of real-world data increases to infinity, the bootstrap provides a consistent estimator of the true input moments M^c.

THEOREM 4.1. Suppose F_1, F_2, ..., F_L are mutually independent and the following conditions hold.

(1) X^{(k)}_ℓ are i.i.d. with distribution F^c_ℓ for k = 1, 2, ..., m_ℓ and ℓ = 1, 2, ..., L;
(2) The marginal distribution F^c_{ℓ,i} is uniquely characterized by its first h_{ℓ,i} moments and has finite first 4h_{ℓ,i} moments for i = 1, 2, ..., d_ℓ and ℓ = 1, 2, ..., L;
(3) E(X⁴_{ℓ,i} X⁴_{ℓ,j}) < ∞ for i, j = 1, 2, ..., d_ℓ and ℓ = 1, 2, ..., L with d_ℓ > 1;
(4) As m → ∞, we have m_ℓ/m → c_ℓ for a constant c_ℓ > 0, with ℓ = 1, 2, ..., L.

Then, as m → ∞, the bootstrap moment estimator M̃_m converges a.s. to the true moments M^c.

The detailed proof of Theorem 4.1 is provided in the online appendix.

4.2. NORTA Representation

In this section, we describe how to employ the NORTA representation to construct the joint distributions and generate samples of random vectors, given the partial


characterization specified by marginal distributions and a dependence measure, either product-moment or Spearman rank correlations.

Since F = {F_1, F_2, ..., F_L} with F_1, F_2, ..., F_L mutually independent, we only need to apply NORTA separately to the distributions F_ℓ with d_ℓ > 1. Specifically, we represent X_ℓ as a transformation of a d_ℓ-dimensional standard multivariate normal (MVN) vector Z_ℓ = (Z_{ℓ,1}, Z_{ℓ,2}, ..., Z_{ℓ,d_ℓ})^⊤ with product-moment correlation matrix denoted by ρ_{Z_ℓ}:

$$X_\ell^\top = \bigl(F^{-1}_{\ell,1}[\Phi(Z_{\ell,1});\theta_{\ell,1}],\ F^{-1}_{\ell,2}[\Phi(Z_{\ell,2});\theta_{\ell,2}],\ \ldots,\ F^{-1}_{\ell,d_\ell}[\Phi(Z_{\ell,d_\ell});\theta_{\ell,d_\ell}]\bigr) \quad (4)$$

where Φ(·) denotes the cdf of the standard normal distribution. If the marginal distribution families are given, as we assume here, then the NORTA representation for F_ℓ can be specified by (θ_ℓ, ρ_{Z_ℓ}).

Let R_{Z_ℓ} denote the d_ℓ × d_ℓ Spearman rank correlation matrix of Z_ℓ. For the standard normal distribution there is a closed-form relation between product-moment and Spearman rank correlations [Clemen and Reilly 1999]:

  R_{Z_ℓ}(i, j) = (6/π) sin⁻¹(ρ_{Z_ℓ}(i, j)/2)   (5)

for i, j = 1, 2, …, d_ℓ.

Since the NORTA implementations differ for the cases where the dependence is measured by Spearman rank and by product-moment correlations, we describe them separately.

4.2.1. Spearman Rank Correlation. If the dependence in the input models is measured by the Spearman rank correlation, then R_{X_ℓ} = R_{Z_ℓ}, since the Spearman rank correlation is invariant under the monotone one-to-one transformations F_{ℓ,i}⁻¹[Φ(·)], i = 1, 2, …, d_ℓ. Therefore, given the key properties of the input models characterized by ϑ = (θ; V_{R_X}), the procedure to find NORTA representations and generate samples of X is as follows.

(1) From V_{R_{X_ℓ}}, obtain the Spearman rank correlation matrix for Z_ℓ: R_{Z_ℓ} = R_{X_ℓ}.
(2) Calculate ρ_{Z_ℓ}(i, j) = 2 sin(πR_{Z_ℓ}(i, j)/6) for i, j = 1, 2, …, d_ℓ.
(3) Generate Z_ℓ i.i.d.∼ MVN(0, ρ_{Z_ℓ}) and obtain X_ℓ by using Equation (4).
(4) Apply Steps 1-3 to all F_ℓ with d_ℓ > 1. For F_ℓ with d_ℓ = 1, use a standard approach [Nelson 2013], such as the inverse cdf method, to generate samples of X_ℓ.

By repeating this procedure, we generate samples of X and then use them to drive simulations to estimate the mean response µ(ϑ).
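The four steps above can be sketched in a few lines of code. The following is a minimal illustration, not the authors' implementation; `norta_spearman_sample` and the exponential inverse cdfs are names and marginals assumed here for the example:

```python
import numpy as np
from scipy.stats import norm

def norta_spearman_sample(n, rank_corr, inv_cdfs, seed=None):
    """Steps 1-3: draw n realizations of X_l via NORTA when the
    dependence is specified as a Spearman rank correlation matrix."""
    rng = np.random.default_rng(seed)
    R = np.asarray(rank_corr, dtype=float)
    # Step 2: invert Equation (5): rho_Z(i, j) = 2 sin(pi R(i, j) / 6).
    rho_Z = 2.0 * np.sin(np.pi * R / 6.0)
    # Step 3: draw Z ~ MVN(0, rho_Z) and apply Equation (4),
    # X_i = F_i^{-1}[Phi(Z_i)], through each marginal inverse cdf.
    Z = rng.multivariate_normal(np.zeros(R.shape[0]), rho_Z, size=n)
    U = norm.cdf(Z)
    return np.column_stack([F(U[:, i]) for i, F in enumerate(inv_cdfs)])

# Hypothetical marginals for illustration: exponentials with means 10 and 5.
inv_cdfs = [lambda u, th=th: -th * np.log1p(-u) for th in (10.0, 5.0)]
```

Because the rank correlation is preserved by the monotone marginal transforms, the empirical Spearman correlation of the output should match the target matrix regardless of the marginals chosen.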

Notice that when we use the Spearman rank correlation to measure the dependence between the components of the input models, the choices of marginal distributions and correlation are separable. Thus, for F_ℓ specified by any combination of feasible θ_ℓ and positive definite R_{X_ℓ}, we can find a corresponding NORTA representation. However, this may not be true when we use the product-moment correlation as the dependence measure.

4.2.2. Product-Moment Correlation. If the dependence between the components of the input models is measured by product-moment correlation, then the procedure to find NORTA representations becomes more complex, because the choice of marginal distributions influences the feasibility of correlation matrices. Specifically, for F_ℓ with d_ℓ > 1, there is a pairwise relation between the product-moment correlation matrices of X_ℓ and Z_ℓ:

  ρ_{X_ℓ}(i, j) = C_{ℓ,ij}[ρ_{Z_ℓ}(i, j); θ_ℓ]
               ≡ ( ∫∫ F_{ℓ,i}⁻¹[Φ(z_{ℓ,i}); θ_{ℓ,i}] F_{ℓ,j}⁻¹[Φ(z_{ℓ,j}); θ_{ℓ,j}] ϕ_{ρ_{Z_ℓ}(i,j)}(z_{ℓ,i}, z_{ℓ,j}) dz_{ℓ,i} dz_{ℓ,j} − E(X_{ℓ,i})E(X_{ℓ,j}) ) / √(Var(X_{ℓ,i}) Var(X_{ℓ,j}))   (6)

where ϕ_{ρ_{Z_ℓ}(i,j)} denotes the standard bivariate normal density with correlation ρ_{Z_ℓ}(i, j), and C_{ℓ,ij} denotes the pairwise transformation from ρ_{Z_ℓ}(i, j) to ρ_{X_ℓ}(i, j) for i, j = 1, 2, …, d_ℓ. Given θ_ℓ and ρ_{X_ℓ}, we solve the d_ℓ* moment-matching Equations (6) to find ρ_{Z_ℓ}. Unlike with Spearman rank correlation, the marginal distributions characterized by the parameters θ_ℓ play an important role in determining the value and feasibility of the correlation matrix ρ_{Z_ℓ}.

Therefore, given the key properties of the input models characterized by ϑ = (θ; V_{ρ_X}), if their NORTA representations exist, the procedure to find them and generate samples of X is as follows.

(1) Given V_{ρ_{X_ℓ}}, solve Equations (6) for the correlation matrix ρ_{Z_ℓ}.
(2) Generate Z_ℓ i.i.d.∼ MVN(0, ρ_{Z_ℓ}) and obtain X_ℓ by using Equation (4).
(3) Apply Steps 1-2 to all F_ℓ with d_ℓ > 1. For F_ℓ with d_ℓ = 1, use a standard approach [Nelson 2013], such as the inverse cdf method, to generate samples of X_ℓ.

By repeating this procedure, we generate samples of X and then use them to drive simulations to estimate µ(ϑ). When we solve for ρ_{Z_ℓ} in Step 1, there is typically no closed-form analytical solution, except for some special marginal distributions (e.g., the uniform distribution), and we need to resort to a numerical search to obtain ρ_{Z_ℓ}.
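The numerical search can be carried out one pair (i, j) at a time. The sketch below is a hypothetical helper, not the authors' implementation: it evaluates the transformation C_{ℓ,ij} of Equation (6) by Gauss-Hermite quadrature and inverts it with a scalar root finder; `rho_z_from_rho_x` and its arguments are names assumed for this example:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def rho_z_from_rho_x(target, F_inv_i, F_inv_j, mean_i, mean_j,
                     std_i, std_j, n_nodes=64):
    """Invert the pairwise map C_{l,ij} of Equation (6): find the normal
    correlation rho_Z(i, j) whose induced product-moment correlation of
    (X_i, X_j) equals the target rho_X(i, j)."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)

    def c(r):
        # Gauss-Hermite evaluation of E[X_i X_j] under a standard
        # bivariate normal (Z_i, Z_j) with correlation r, using
        # Z_i = sqrt(2) t_k and Z_j = sqrt(2)(r t_k + sqrt(1 - r^2) t_l).
        z_i = np.sqrt(2.0) * t[:, None]
        z_j = np.sqrt(2.0) * (r * t[:, None] + np.sqrt(1.0 - r * r) * t[None, :])
        # Clip the cdf values away from 1.0 so inverse cdfs with an
        # unbounded right tail stay finite at extreme quadrature nodes.
        u_i = np.clip(norm.cdf(z_i), 1e-300, 1.0 - 1e-16)
        u_j = np.clip(norm.cdf(z_j), 1e-300, 1.0 - 1e-16)
        e_xy = np.sum(w[:, None] * w[None, :]
                      * F_inv_i(u_i) * F_inv_j(u_j)) / np.pi
        return (e_xy - mean_i * mean_j) / (std_i * std_j)

    # Scalar root-finding replaces the closed form that exists only for
    # special marginals such as the uniform distribution.
    return brentq(lambda r: c(r) - target, -0.999, 0.999, xtol=1e-10)
```

For non-normal marginals the induced correlation satisfies |C(r)| ≤ |r|, so the solved ρ_Z(i, j) is at least as large in magnitude as the target ρ_X(i, j).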

Notice that this procedure only works when the correlation matrix ρ_{Z_ℓ} obtained in Step (1) is positive semidefinite, which may not hold in general [Ghosh and Henderson 2002]. That means a NORTA representation may not exist for some combinations of feasible θ_ℓ and positive definite ρ_{X_ℓ}. If the ρ_{Z_ℓ} obtained by solving the d_ℓ* Equations (6) is positive semidefinite, then (θ_ℓ, ρ_{X_ℓ}) is called NORTA feasible; otherwise, it is called NORTA infeasible. Therefore, input models with dependence measured by product-moment correlation are NORTA feasible if (θ_ℓ, ρ_{X_ℓ}) is NORTA feasible for all ℓ = 1, 2, …, L with d_ℓ > 1. Based on the study by Ghosh and Henderson [2002], NORTA infeasibility is more likely in high dimensions and with correlations close to ±1. For a NORTA infeasible (θ_ℓ, ρ_{X_ℓ}), we can find a feasible correlation matrix that is close to ρ_{X_ℓ}.

Since F₁, F₂, …, F_L are mutually independent, we can consider each F_ℓ separately. Theorem 4.2 gives a property of the NORTA feasible region: for any F_ℓ with d_ℓ > 1, if the true moment combination M_ℓ^c has a feasible NORTA representation with positive definite ρ_{Z_ℓ}^c, then there exists a small open neighborhood centered at M_ℓ^c in the (d_ℓ† + d_ℓ*)-dimensional space such that any moment combination M_ℓ in the neighborhood has a feasible NORTA representation. This property is useful when we show the asymptotic consistency of the CI built by metamodel-assisted bootstrapping to quantify both input and simulation uncertainty in Section 4.4.

THEOREM 4.2. Let Θ_ℓ ⊆ ℝ^{d_ℓ†} be the feasible domain for the marginal distribution parameters θ_ℓ and suppose θ_ℓ^c is an interior point of Θ_ℓ. Suppose there is a one-to-one continuous mapping between the marginal moments ψ_{ℓ,i} and the parameters θ_{ℓ,i} for i = 1, 2, …, d_ℓ. For F_ℓ with d_ℓ > 1, suppose the following conditions hold.

(1) F_ℓ^c has a NORTA representation (θ_ℓ^c, ρ_{Z_ℓ}^c) with ρ_{Z_ℓ}^c positive definite.
(2) At any x, the marginal distributions F_{ℓ,i}(x; θ_{ℓ,i}), density functions f_{ℓ,i}(x; θ_{ℓ,i}), and inverse distributions F_{ℓ,i}⁻¹(x; θ_{ℓ,i}) are continuously differentiable in θ_{ℓ,i} on Θ_ℓ for i = 1, 2, …, d_ℓ.
(3) For any θ_ℓ ∈ Θ_ℓ, the marginal cdfs F_{ℓ,1}(x; θ_{ℓ,1}), F_{ℓ,2}(x; θ_{ℓ,2}), …, F_{ℓ,d_ℓ}(x; θ_{ℓ,d_ℓ}) are continuous and strictly increasing in x.

Then the true moment vector M_ℓ^c = (ψ_ℓ^c; (V_{ρ_{X_ℓ}})^c) is an interior point of the NORTA feasible region: in the (d_ℓ† + d_ℓ*)-dimensional space, there exists a constant δ > 0 such that any moment combination M_ℓ in the open ball B_δ(M_ℓ^c) is NORTA feasible.

The detailed proof of Theorem 4.2 is provided in the online appendix.

4.3. Stochastic Kriging Metamodel

In the metamodel-assisted bootstrapping framework, after quantifying the input uncertainty with the bootstrap as described in Section 4.1, an equation-based SK metamodel introduced by Ankenman et al. [2010] is used to propagate this uncertainty to the output mean. The succinct review of SK in this section is based on Xie et al. [2014b].

Dependent input models characterized by moments M can be interpreted as a location x in a d = Σ_{ℓ=1}^L (d_ℓ† + d_ℓ*) dimensional space. Suppose that the underlying true (but unknown) response surface is a continuous function of the moments of the input models and can be thought of as a realization of a stationary Gaussian process (GP). We model the simulation output Y by

  Y_j(x) = β₀ + W(x) + ε_j(x).   (7)

This model includes two sources of uncertainty: the simulation output uncertainty ε_j(x) and the mean response uncertainty W(x). For many, but not all, simulation settings the output is an average of a large number of more basic outputs, so a normal approximation can be applied: ε(x) ∼ N(0, σ_ε²(x)).

Since stochastic systems with dependent input models having similar key properties tend to have close mean responses, a zero-mean, second-order stationary GP W(·) is used to account for this spatial dependence. Therefore, the uncertainty about the unknown true response surface µ(x) is represented by a GP M(x) ≡ β₀ + W(x) (note that β₀ can be replaced by a more general trend term f(x)^⊤β). Its spatial dependence is characterized by the covariance function Σ(x, x′) = Cov[W(x), W(x′)] = τ²γ(x − x′), where τ² denotes the variance and γ(·) is a correlation function that depends only on the distance x − x′. Based on prior information about the smoothness of µ(·), we can choose the form of the correlation function [Xie et al. 2010]. Considering that the mean response surfaces of most systems engineering problems have a high order of smoothness, we use the product-form Gaussian correlation function

  γ(x − x′) = exp(−Σ_{j=1}^d ζ_j (x_j − x′_j)²)   (8)

for the empirical evaluation in Section 5. Let ζ = (ζ₁, ζ₂, …, ζ_d) denote the correlation parameters. Before obtaining any simulation results, the uncertainty about µ(x) can be represented by the Gaussian process M(x) ∼ GP(β₀, τ²γ(x − x′)).

To reduce the uncertainty about µ(x), we choose an experiment design consisting of pairs D ≡ {(x_i, n_i), i = 1, 2, …, k}, with (x_i, n_i) denoting the location of, and the number of replications at, the ith design point. The simulation outputs at D are Y_D ≡ {(Y₁(x_i), Y₂(x_i), …, Y_{n_i}(x_i)); i = 1, 2, …, k}, and the sample mean at design point x_i is Ȳ(x_i) = Σ_{j=1}^{n_i} Y_j(x_i)/n_i. Let the sample means at all k design points be Ȳ_D = (Ȳ(x₁), Ȳ(x₂), …, Ȳ(x_k))^⊤; their variance is represented by the k × k diagonal matrix C = diag{σ_ε²(x₁)/n₁, σ_ε²(x₂)/n₂, …, σ_ε²(x_k)/n_k}, because the use of common random numbers is detrimental to prediction [Chen et al. 2012].

The simulation outputs Ȳ_D and the spatial dependence characterized by the covariance function Σ(·, ·) can be used to improve the prediction of the system mean at any fixed point x. Specifically, let Σ be the k × k spatial covariance matrix of the design points and let Σ(x, ·) be the k × 1 spatial covariance vector between each design point and x. If the parameters (τ², ζ, C) are known, then the metamodel uncertainty can be characterized by a refined GP M_p(x) that denotes the conditional distribution of M(x) given all simulation outputs,

  M_p(x) ∼ GP(m_p(x), σ_p²(x))   (9)

where m_p(·) is the minimum mean squared error (MSE) linear unbiased predictor

  m_p(x) = β̂₀ + Σ(x, ·)^⊤(Σ + C)⁻¹(Ȳ_D − β̂₀ · 1_{k×1}),   (10)

and the corresponding variance is

  σ_p²(x) = τ² − Σ(x, ·)^⊤(Σ + C)⁻¹Σ(x, ·) + η^⊤[1_{k×1}^⊤(Σ + C)⁻¹1_{k×1}]⁻¹η   (11)

where β̂₀ = [1_{k×1}^⊤(Σ + C)⁻¹1_{k×1}]⁻¹ 1_{k×1}^⊤(Σ + C)⁻¹Ȳ_D and η = 1 − 1_{k×1}^⊤(Σ + C)⁻¹Σ(x, ·) [Ankenman et al. 2010].

Since in reality the spatial correlation parameters τ² and ζ are unknown, maximum likelihood estimates are typically used for prediction, and the sample variances are used to estimate the simulation variances at the design points, C [Ankenman et al. 2010]. By substituting the parameter estimates (τ̂², ζ̂, Ĉ) into Equations (10) and (11), we obtain the estimated mean m̂_p(x) and variance σ̂_p²(x). Thus, the metamodel we use is µ̂(x) = m̂_p(x), with variance estimated by σ̂_p²(x). This is commonly done in the kriging literature because accounting for the parameter estimation error is intractable.

4.4. Procedure to Build the CI

Since there are both input and simulation estimation errors in the system mean performance estimate, in this section we propose a procedure to build a CI quantifying the overall uncertainty about µ(M^c). We show that as m, B → ∞, the CI has asymptotically consistent coverage.

Based on a hierarchical sampling approach, we propose the following procedure to build a (1 − α) bootstrap percentile CI.

(1) Identify a design space E for the moments M of the input models over which to fit the metamodel. Since the metamodel is used to propagate the input uncertainty measured by the bootstrapped moments M̃_m to the output mean, the design space is chosen to be the smallest ellipsoid covering the most likely bootstrapped moments. See Barton et al. [2014] for more detailed information.
(2) Use a maximin-distance Latin hypercube design to embed k design points into the design space E. Assign equal numbers of replications to the k design points to exhaust the simulation budget N, obtaining an experiment design D = {(M^{(i)}, n_i), i = 1, 2, …, k}.
(3) At the k design points, generate samples of X_ℓ by using the NORTA representations for F_ℓ with d_ℓ > 1, as described in Section 4.2, and a standard approach [Nelson 2013] for F_ℓ with d_ℓ = 1, ℓ = 1, 2, …, L. Use these samples to drive simulations and obtain outputs y_D. Compute the sample average ȳ(M^{(i)}) and sample variance s²(M^{(i)}) of the simulation outputs, i = 1, 2, …, k. Fit an SK metamodel to obtain m̂_p(·) and σ̂_p²(·) using (ȳ(M^{(i)}), s²(M^{(i)}), M^{(i)}), i = 1, 2, …, k.
(4) For b = 1 to B:
    (a) Generate bootstrapped moments M̃_m^{(b)} by following the procedure in Section 4.1.
    (b) Draw M̄_b ∼ N(m̂_p(M̃_m^{(b)}), σ̂_p²(M̃_m^{(b)})).
    Next b.
(5) Report the CI [M̄_{(⌈Bα/2⌉)}, M̄_{(⌈B(1−α/2)⌉)}], where M̄_{(1)} ≤ M̄_{(2)} ≤ ⋯ ≤ M̄_{(B)} are the sorted values.
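Steps (4)-(5) reduce to drawing one normal variate per bootstrapped moment vector from the fitted GP and reporting empirical percentiles. A minimal sketch, assuming fitted predictor callables `m_p` and `s2_p` from Step (3) (hypothetical names):

```python
import numpy as np

def metamodel_bootstrap_ci(boot_moments, m_p, s2_p, alpha=0.05, seed=None):
    """Steps (4)-(5): one GP draw per bootstrapped moment vector, then
    the bootstrap percentile interval of the sorted draws."""
    rng = np.random.default_rng(seed)
    draws = np.array([rng.normal(m_p(M), np.sqrt(s2_p(M)))
                      for M in boot_moments])
    draws.sort()
    B = len(draws)
    lo = int(np.ceil(B * alpha / 2.0)) - 1        # ceil(B alpha/2)-th order stat
    hi = int(np.ceil(B * (1.0 - alpha / 2.0))) - 1
    return draws[lo], draws[hi]
```

When the metamodel variance s2_p is negligible, the interval reflects input uncertainty alone; otherwise the GP draws inflate it to cover the remaining metamodel uncertainty as well.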


Remark 4.3. Unlike direct bootstrapping, which runs simulations at all of the samples M̃_m^{(b)}, b = 1, 2, …, B, generated to quantify the input uncertainty, metamodel-assisted bootstrapping builds a metamodel based on the simulation results at well-chosen design points and uses the metamodel to propagate the input uncertainty to the output mean. To build joint distributions to drive the simulation runs, the design points need to be NORTA feasible. Since some bootstrapped moments may be NORTA infeasible, the design space E built to cover the most likely bootstrapped samples could include moment combinations M that are also NORTA infeasible. However, when the dimension of the correlated random vector is relatively low, say d_ℓ ≤ 5, and the pairwise correlations are not extreme (close to ±1), the NORTA infeasibility problem is rare for the sample sizes of real-world data encountered in many applications, as shown by the empirical study in Section 5. Therefore, we remove any NORTA infeasible design points in the design space E in Step (2) and assign equal replications to the remaining points. Then, we construct an SK metamodel and use it to estimate the mean responses at all bootstrapped samples M̃_m^{(b)}, b = 1, 2, …, B, including the NORTA infeasible ones, because there typically exists a close NORTA approximation for infeasible bootstrapped moments [Ghosh and Henderson 2002] and µ(·) for most systems engineering problems has a high order of smoothness.

Since F₁, F₂, …, F_L are mutually independent, applying Theorem 4.2 yields the following: if the true moments M_ℓ^c are NORTA feasible with positive definite ρ_{Z_ℓ}^c for ℓ = 1, 2, …, L with d_ℓ > 1, then as the amount of real-world data increases, all the moments M in the design space E eventually become NORTA feasible. Specifically, if M_ℓ^c has a NORTA representation with positive definite ρ_{Z_ℓ}^c, it is an interior point of the NORTA feasible region, as described in Theorem 4.2. By Theorem 4.1, as m → ∞, we have a consistent moment estimator: M̃_m → M^c a.s. Since the design space E is built to cover the most likely bootstrapped samples, it automatically shrinks to a smaller and smaller region around M^c as the amount of real-world data increases. Therefore, E is eventually contained in the NORTA feasible region, and all moment combinations in the design space have feasible NORTA representations. That means that asymptotically we can ignore the NORTA infeasibility problem in the metamodel-assisted bootstrap framework.

The CI [M̄_{(⌈Bα/2⌉)}, M̄_{(⌈B(1−α/2)⌉)}] characterizes the impact of both input and metamodel uncertainty on the system performance estimate. A variance decomposition in Xie et al. [2014b] can be used to assess their relative contributions and to guide a decision maker as to where to put more effort: if the input uncertainty dominates, then collect more real-world data; if the metamodel uncertainty dominates, then run more simulations; if neither dominates, then do both to improve the estimation accuracy of µ(M^c).

If the SK parameters (τ², ζ, C) are known and we replace M̄_b in Step (4b) of the CI procedure with M_b ∼ N(m_p(M̃_m^{(b)}), σ_p²(M̃_m^{(b)})), then we can show that the resulting CI [M_{(⌈Bα/2⌉)}, M_{(⌈B(1−α/2)⌉)}] is asymptotically consistent by Theorem 4.4. In practice, (τ², ζ, C) must be estimated. Based on the sensitivity study in Xie et al. [2014b], SK can provide robust inference without accounting for parameter estimation error when we employ an adequate experiment design, e.g., the space-filling design used in this paper.

THEOREM 4.4. Suppose the conditions of Theorems 4.1 and 4.2 and the following additional assumptions hold.


Fig. 1. A stochastic activity network.

(1) ε_j(x) i.i.d.∼ N(0, σ_ε²(x)) for any x, and M(x) is a stationary, separable GP with a continuous correlation function satisfying

  1 − γ(x − x′) ≤ c / |log(‖x − x′‖₂)|^{1+δ₁}  for all ‖x − x′‖₂ ≤ δ₂   (12)

for some c > 0, δ₁ > 0 and δ₂ < 1, where ‖x − x′‖₂ = √(Σ_{j=1}^d (x_j − x′_j)²).
(2) The input processes, the simulation noise ε_j(x), and the GP M(x) are mutually independent, and the bootstrap process is independent of all of them.

Then the interval [M_{(⌈Bα/2⌉)}, M_{(⌈B(1−α/2)⌉)}] is asymptotically consistent, meaning that the iterated limit

  lim_{m→∞} lim_{B→∞} Pr{M_{(⌈Bα/2⌉)} ≤ M_p(M^c) ≤ M_{(⌈B(1−α/2)⌉)}} = 1 − α.   (13)

The detailed proof of Theorem 4.4 is provided in the online appendix.

5. EMPIRICAL STUDY

Since F₁, F₂, …, F_L are mutually independent and we addressed the case of independent univariate distributions in Xie et al. [2014b], here we consider an example with L = 1 multivariate distribution and suppress the subscript for the ℓth input distribution. We use the stochastic activity network shown in Figure 1 to examine the finite-sample performance of our metamodel-assisted bootstrapping approach. Suppose that the time required to complete task (arc) i is denoted by X_i for i = 1, 2, …, 5 and X^⊤ = (X₁, X₂, …, X₅). We wish to compute the time to complete the project, which is the length of the longest path through the network, Y = max{X₁ + X₂ + X₅, X₁ + X₄, X₃ + X₅}. We are interested in the mean response E[Y].

We assume that F^c is NORTA. The marginal distributions are X_i ∼ exp(θ_i^c) for i = 1, 2, …, 5 with means θ^c = (10, 5, 12, 11, 5)^⊤. We consider two cases, with dependence measured by either Spearman rank or product-moment correlations. The true correlation matrix, R_X^c or ρ_X^c, equals

  A ≡ [ 1   0.5  0.5  0.3  0
             1   0.5  0    0.3
                  1   0    0.3
                       1   0.1
                            1  ]

where only the upper triangle of the symmetric matrix is shown.


Therefore, the number of parameters characterizing the input model F is d = d† + d* = 5 + (5 × 4)/2 = 15. Since the true mean response µ(ϑ^c) is unknown, we ran 10⁷ replications and obtained the following results:

— Case 1: Using the Spearman rank correlation with R_X^c = A, the estimated true mean response is 27.549 with standard error 0.005.
— Case 2: Using the product-moment correlation with ρ_X^c = A, the estimated true mean response is 27.482 with standard error 0.005.
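For concreteness, the Case 1 estimate can be reproduced by a brute-force Monte Carlo run that combines the NORTA sampling of Section 4.2.1 with the longest-path response. This is an illustrative re-implementation, not the authors' code, and uses 10⁶ rather than 10⁷ replications:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta = np.array([10.0, 5.0, 12.0, 11.0, 5.0])   # exponential means theta^c

# Spearman rank correlation matrix A (symmetric completion of the
# upper triangle given in the text).
A = np.array([[1.0, 0.5, 0.5, 0.3, 0.0],
              [0.5, 1.0, 0.5, 0.0, 0.3],
              [0.5, 0.5, 1.0, 0.0, 0.3],
              [0.3, 0.0, 0.0, 1.0, 0.1],
              [0.0, 0.3, 0.3, 0.1, 1.0]])

# NORTA for Spearman dependence: invert Equation (5), then apply (4).
rho_Z = 2.0 * np.sin(np.pi * A / 6.0)
Z = rng.multivariate_normal(np.zeros(5), rho_Z, size=1_000_000)
X = -theta * np.log1p(-norm.cdf(Z))              # exponential inverse cdfs

# Project completion time: the longest of the three paths.
Y = np.maximum.reduce([X[:, 0] + X[:, 1] + X[:, 4],
                       X[:, 0] + X[:, 3],
                       X[:, 2] + X[:, 4]])
print(round(Y.mean(), 3))
```

With 10⁶ replications the estimate should land within a few hundredths of the reported 27.549.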

To evaluate our metamodel-assisted bootstrapping approach, we pretend that the input-model parameters (θ^c, R_X^c) or (θ^c, ρ_X^c) are unknown and estimate them from m i.i.d. observations from F^c; this represents obtaining "real-world data." The goal is to build a CI quantifying the impact of both input and simulation estimation error on the system mean response estimate.

We compare metamodel-assisted bootstrapping to the conditional CI and to direct bootstrapping. For the conditional CI, we fit the input distribution to the real-world data by moment matching and allocate the entire computational budget of N replications to simulating the resulting system. In direct bootstrapping, we run N/B replications of the simulation at each bootstrapped moment M̃_m^{(b)}, record the average simulation output Ȳ_b = Ȳ(M̃_m^{(b)}), and report the percentile CI [Ȳ_{(⌈Bα/2⌉)}, Ȳ_{(⌈B(1−α/2)⌉)}]. In metamodel-assisted bootstrapping, we evenly assign the N replications to k design points, run the simulations, build an SK metamodel, and record the percentile CI [M̄_{(⌈Bα/2⌉)}, M̄_{(⌈B(1−α/2)⌉)}] by following the procedure in Section 4.4.

For the input distribution with dependence characterized by either Spearman rank or product-moment correlations, Tables I and II show the statistical performance of the conditional and direct bootstrapping CIs and of metamodel-assisted bootstrapping with k = 80 design points, m = 100, 500, 1000 real-world observations, and a computational budget of N = 10³, 10⁴, and 10⁵ replications. We ran 1000 macro-replications of the entire experiment. In each macro-replication, we first generate m multivariate observations by using NORTA with parameters (θ^c, R_X^c) or (θ^c, ρ_X^c). Then, for the conditional CI, we run N replications at the estimated parameters (θ̂_m, R̂_{X,m}) or (θ̂_m, ρ̂_{X,m}) and build CIs with nominal 95% coverage of the mean response. For direct bootstrapping and metamodel-assisted bootstrapping, we use the bootstrap to generate B = 1000 sample moments to quantify the input uncertainty. Since µ(·) is unknown, we use the fixed computational budget N to propagate the input uncertainty, either via direct simulation or via the SK metamodel, to build percentile CIs with nominal 95% coverage.

When we use the product-moment correlation to measure the dependence, the probability of NORTA infeasible bootstrapped moments M̃_m^{(b)} is low, with mean 0.11% and standard deviation 0.34% for m = 100, and 0% for m = 500 and 1000. In metamodel-assisted bootstrapping, we remove any NORTA infeasible design points and evenly assign the N replications to the remaining points to build the SK metamodel.

In direct bootstrapping, for the small percentage of NORTA infeasible bootstrap-resampled moments M̃_m^{(b)}, an approximation is used to estimate the system mean response. Specifically, for each M̃_m^{(b)}, we solve the moment-matching Equations (6) for ρ_Z. If ρ_Z is positive definite, we can use the NORTA representation to generate samples of X and run simulations. Otherwise, we find a close positive semidefinite approximation to ρ_Z, denoted ρ̄_Z [Higham 2002], and let ρ_Z^a ≡ ρ̄_Z + δ′I₅ₓ₅, where I₅ₓ₅ denotes the 5 × 5 identity matrix and δ′ is a small positive value; we use δ′ = 10⁻⁵ in the empirical study. Then we set ρ_Z^a as the correlation matrix for NORTA and generate samples of X for the simulation runs.


Table I. Results of nominal 95% CIs when m = 100, 500, 1000 for Case 1, where the dependence is characterized by Spearman rank correlations.

m = 100                                  N = 10³   N = 10⁴   N = 10⁵
conditional CI         coverage          43%       14.1%     4.4%
                       CI width (mean)   2.2       0.7       0.2
                       CI width (SD)     0.19      0.1       0.02
direct bootstrap       coverage          100%      100%      99.4%
                       CI width (mean)   67.3      22.8      9.7
                       CI width (SD)     6         1.9       0.8
metamodel-assisted     coverage          97.6%     96.1%     94.8%
bootstrap              CI width (mean)   12.4      8         7.4
                       CI width (SD)     2.5       1.2       0.9

m = 500                                  N = 10³   N = 10⁴   N = 10⁵
conditional CI         coverage          76.9%     32.1%     12%
                       CI width (mean)   2.2       0.7       0.2
                       CI width (SD)     0.1       0.02      0.01
direct bootstrap       coverage          100%      100%      100%
                       CI width (mean)   66.4      21.9      7.5
                       CI width (SD)     3.6       1         0.3
metamodel-assisted     coverage          96.3%     98.7%     97.2%
bootstrap              CI width (mean)   10.3      4.5       3.5
                       CI width (SD)     2.9       0.9       0.4

m = 1000                                 N = 10³   N = 10⁴   N = 10⁵
conditional CI         coverage          83.4%     42.2%     16.7%
                       CI width (mean)   2.2       0.7       0.2
                       CI width (SD)     0.09      0.02      0.01
direct bootstrap       coverage          100%      100%      100%
                       CI width (mean)   66.5      21.7      7.2
                       CI width (SD)     3.2       0.9       0.3
metamodel-assisted     coverage          97.1%     94%       96.6%
bootstrap              CI width (mean)   10.2      3.6       2.6
                       CI width (SD)     2.9       1.1       0.3

From Tables I and II, the results for Cases 1 and 2, with dependence measured by either product-moment or Spearman rank correlations, are similar. We observe that under the same computational budget N, the conditional CIs, which account only for the simulation uncertainty, tend to undercover. The CIs obtained by direct bootstrapping are much wider and typically exhibit obvious overcoverage. The CIs obtained by metamodel-assisted bootstrapping have coverage much closer to the nominal level of 95%. As N increases and the simulation estimation error decreases, the undercoverage of the conditional CI becomes worse. Since direct bootstrapping and metamodel-assisted bootstrapping use the same set of bootstrapped samples to quantify input uncertainty, the overcoverage of the direct bootstrap represents the additional simulation uncertainty introduced while propagating the input uncertainty to the output mean. Tables I and II show that the metamodel can use the computational budget effectively and reduce the impact of simulation estimation error. Further, as the computational budget increases, the difference between the CIs obtained by the two methods diminishes.

Figure 2 shows scatter plots of the conditional CIs and the CIs obtained by direct bootstrapping and metamodel-assisted bootstrapping with m = 500, k = 80 and N = 10⁴ when we use either Spearman rank or product-moment correlations. They include results from 100 macro-replications. The horizontal axis represents (Q_L + Q_U)/2, the center of the CI, where Q_L and Q_U are the lower and upper bounds of the CI. The vertical axis is (Q_U − Q_L)/2, the half width of the CI. Region 1 contains points corresponding to CIs that underestimate and Region 3 contains points corresponding to CIs that overestimate, while Region 2 contains CIs that cover µ(F^c); see Kang and Schmeiser [1990].


Table II. Results of nominal 95% CIs when m = 100, 500, 1000 for Case 2, where the dependence is characterized by product-moment correlations.

m = 100                                  N = 10³   N = 10⁴   N = 10⁵
conditional CI         coverage          44.6%     14.3%     4.7%
                       CI width (mean)   2.2       0.7       0.2
                       CI width (SD)     0.19      0.05      0.02
direct bootstrap       coverage          100%      100%      99.1%
                       CI width (mean)   68.3      23.3      9.8
                       CI width (SD)     6.1       2         0.9
metamodel-assisted     coverage          97.5%     97.5%     96.3%
bootstrap              CI width (mean)   12.6      8.4       7.6
                       CI width (SD)     2.6       1.3       0.9

m = 500                                  N = 10³   N = 10⁴   N = 10⁵
conditional CI         coverage          74%       32.5%     9.2%
                       CI width (mean)   2.2       0.7       0.2
                       CI width (SD)     0.1       0.03      0.01
direct bootstrap       coverage          100%      100%      100%
                       CI width (mean)   67.4      22.1      7.6
                       CI width (SD)     3.7       1         0.3
metamodel-assisted     coverage          97.9%     97.4%     96.5%
bootstrap              CI width (mean)   10.7      4.6       3.6
                       CI width (SD)     2.9       0.9       0.4

m = 1000                                 N = 10³   N = 10⁴   N = 10⁵
conditional CI         coverage          82.6%     44.8%     14.4%
                       CI width (mean)   2.2       0.7       0.2
                       CI width (SD)     0.1       0.02      0.01
direct bootstrap       coverage          100%      100%      100%
                       CI width (mean)   67.1      22        7.3
                       CI width (SD)     3.3       0.9       0.3
metamodel-assisted     coverage          97.5%     95.6%     98.4%
bootstrap              CI width (mean)   10.4      3.7       2.7
                       CI width (SD)     3         1.1       0.3

The conclusions for Cases 1 and 2 are similar. The conditional CIs are too narrow and their centers have large variance; therefore, they suffer serious undercoverage. The variance of the CI centers comes from the impact of input uncertainty. Since metamodel-assisted bootstrapping accounts for both input and simulation uncertainty, its CI width is large enough to avoid undercoverage: based on the results from 1000 macro-replications, the proportion of CIs in Region 2 is close to 95%, and the CIs that fall outside tend to underestimate. The CIs obtained by the direct bootstrap are too wide; all of them are located in Region 2 and they exhibit serious overcoverage.

6. CONCLUSIONS

In this paper, we extended the metamodel-assisted bootstrapping framework of Xie et al. [2014b] to stochastic simulation with dependent input models. The input models are characterized by their marginal distribution parameters and dependence measured by either Spearman rank or product-moment correlations, which are estimated from real-world data. The joint distributions are constructed by using the flexible NORTA representation. Metamodel-assisted bootstrapping uses the bootstrap to quantify the estimation error of these joint distributions and propagates it to the output mean by using an equation-based SK metamodel. We proposed a procedure to deliver a CI quantifying the overall uncertainty of the system performance estimate. The asymptotic consistency of this interval is proved under the assumption that the true mean response surface is a realization of a GP.

An empirical study using a stochastic activity network demonstrates that, for input distributions with dependence measured by either Spearman rank or product-moment correlations, our metamodel-assisted bootstrap approach has good finite-sample performance under various quantities of real-world data and simulation budgets. When the simulation budget is tight, the metamodel-assisted bootstrap makes more effective use of the simulation budget than the direct bootstrap.

Fig. 2. Scatter plots of conditional CIs and CIs obtained by direct bootstrapping and metamodel-assisted bootstrapping when m = 500, k = 80 and N = 10⁴ for (a) Case 1 with dependence measured by Spearman rank correlation, and (b) Case 2 with dependence measured by product-moment correlation.

ELECTRONIC APPENDIX

The electronic appendix for this article can be accessed in the ACM Digital Library.


Acknowledgments

This research was partially supported by National Science Foundation Grant CMMI-1068473 and GOALI sponsor Simio LLC. The authors thank Cheng Li for assistance with the proof of Lemma A.4. Portions of this paper were previously published in the Proceedings of the 2014 Winter Simulation Conference as Xie et al. [2014c].

REFERENCES
Bruce E. Ankenman, Barry L. Nelson, and Jeremy Staum. 2010. Stochastic Kriging for Simulation Metamodeling. Operations Research 58 (2010), 371–382.
Eusebio Arenal-Gutierrez, Carlos Matran, and Juan A. Cuesta-Albertos. 1996. Unconditional Glivenko-Cantelli-Type Theorems and Weak Laws of Large Numbers for Bootstrap. Statistics & Probability Letters 26 (1996), 365–375.
Russell R. Barton. 2007. Presenting a More Complete Characterization of Uncertainty: Can It Be Done? In Proceedings of the 2007 INFORMS Simulation Society Research Workshop. INFORMS Simulation Society, Fontainebleau.
Russell R. Barton. 2012. Tutorial: Input Uncertainty in Output Analysis. In Proceedings of the 2012 Winter Simulation Conference, C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose, and A. M. Uhrmacher (Eds.). IEEE Computer Society, Washington, DC, 67–78.
Russell R. Barton, Barry L. Nelson, and Wei Xie. 2014. Quantifying Input Uncertainty via Simulation Confidence Intervals. INFORMS Journal on Computing 26 (2014), 74–87.
Russell R. Barton and Lee W. Schruben. 1993. Uniform and Bootstrap Resampling of Input Distributions. In Proceedings of the 1993 Winter Simulation Conference, G. W. Evans, M. Mollaghasemi, E. C. Russell, and W. E. Biles (Eds.). IEEE Computer Society, Washington, DC, 503–508.
Russell R. Barton and Lee W. Schruben. 2001. Resampling Methods for Input Modeling. In Proceedings of the 2001 Winter Simulation Conference, B. A. Peters, J. S. Smith, D. J. Medeiros, and M. W. Rohrer (Eds.). IEEE Computer Society, Washington, DC, 372–378.
Bahar Biller and Canan G. Corlu. 2011. Accounting for Parameter Uncertainty in Large-Scale Stochastic Simulations with Correlated Inputs. Operations Research 59 (2011), 661–673.
Bahar Biller and Soumyadip Ghosh. 2006. Multivariate Input Processes. In Handbooks in Operations Research and Management Science: Simulation, S. Henderson and B. L. Nelson (Eds.). Elsevier, Chapter 5.
Patrick Billingsley. 1995. Probability and Measure. Wiley-Interscience, New York.
Marne C. Cario and Barry L. Nelson. 1997. Modeling and Generating Random Vectors with Arbitrary Marginal Distributions and Correlation Matrix. Technical Report. Department of Industrial Engineering and Management Sciences, Northwestern University.
Xi Chen, Bruce E. Ankenman, and Barry L. Nelson. 2012. The Effect of Common Random Numbers on Stochastic Kriging Metamodels. ACM Transactions on Modeling and Computer Simulation 22 (2012), 7:1–7:20.
Russell C. H. Cheng and Wayne Holland. 1997. Sensitivity of Computer Simulation Experiments to Errors in Input Data. Journal of Statistical Computation and Simulation 57 (1997), 219–241.
Robert T. Clemen and Terence Reilly. 1999. Correlations and Copulas for Decision and Risk Analysis. Management Science 45 (1999), 208–224.
Soumyadip Ghosh and Shane G. Henderson. 2002. Properties of the NORTA Method in Higher Dimensions. In Proceedings of the 2002 Winter Simulation Conference, E. Yucesan, C. H. Chen, J. L. Snowdon, and J. M. Charnes (Eds.). IEEE Computer Society, Washington, DC, 263–269.
Shane G. Henderson, Belinda A. Chiera, and Roger M. Cooke. 2000. Generating Dependent Quasi-Random Numbers. In Proceedings of the 2000 Winter Simulation Conference, J. A. Joines, R. R. Barton, K. Kang, and P. A. Fishwick (Eds.). IEEE Computer Society, Washington, DC, 527–536.
Nicholas J. Higham. 2002. Computing the Nearest Correlation Matrix – A Problem from Finance. IMA Journal of Numerical Analysis 22 (2002), 329–343.
Keebom Kang and Bruce Schmeiser. 1990. Graphical Methods for Evaluating and Comparing Confidence-Interval Procedures. Operations Research 38, 3 (1990), 546–553.
Barry L. Nelson. 2013. Foundations and Methods of Stochastic Simulation: A First Course. Springer-Verlag.
Jun Shao and Dongsheng Tu. 1995. The Jackknife and Bootstrap. Springer-Verlag.
Eunhye Song, Barry L. Nelson, and C. Dennis Pegden. 2014. Advanced Tutorial: Input Uncertainty Quantification. In Proceedings of the 2014 Winter Simulation Conference, A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller (Eds.). IEEE Computer Society, Washington, DC.

ACM Transactions on Modeling and Computer Simulation, Vol. V, No. N, Article A, Publication date: YYYY.

Multivariate Input Uncertainty in Output Analysis for Stochastic Simulation A:19

A. W. Van Der Vaart. 1998. Asymptotic Statistics. Cambridge University Press, Cambridge, UK.
William R. Wade. 2010. An Introduction to Analysis (4th ed.). Prentice Hall.
Wei B. Wu and Jan Mielniczuk. 2010. A New Look at Measuring Dependence. In Dependence in Probability and Statistics, P. Doukhan, G. Lang, D. Surgailis, and G. Teyssiere (Eds.).
Wei Xie, Barry L. Nelson, and Russell R. Barton. 2014a. A Bayesian Framework for Quantifying Uncertainty in Stochastic Simulation. Accepted by Operations Research (2014).
Wei Xie, Barry L. Nelson, and Russell R. Barton. 2014b. Statistical Uncertainty Analysis for Stochastic Simulation. Working Paper, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL.
Wei Xie, Barry L. Nelson, and Russell R. Barton. 2014c. Statistical Uncertainty Analysis for Stochastic Simulation with Dependent Input Models. In Proceedings of the 2014 Winter Simulation Conference, A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, and J. A. Miller (Eds.). IEEE Computer Society, Washington, DC.
Wei Xie, Barry L. Nelson, and Jeremy Staum. 2010. The Influence of Correlation Functions on Stochastic Kriging Metamodels. In Proceedings of the 2010 Winter Simulation Conference, B. Johansson, S. Jain, J. Montoya-Torres, J. Hugan, and E. Yucesan (Eds.). IEEE Computer Society, Washington, DC, 1067–1078.


Online Appendix to: Multivariate Input Uncertainty in Output Analysis for Stochastic Simulation

WEI XIE, Rensselaer Polytechnic Institute
BARRY L. NELSON, Northwestern University
RUSSELL R. BARTON, The Pennsylvania State University

APPENDIX

Regarding notation: the superscript $(k)$ in $X^{(k)}_\ell$ indexes the $k$th sample of the random vector drawn from $F^c_\ell$, whereas the $k$ in $X^k_{\ell,i}$ denotes the $k$th power.

LEMMA A.1. Suppose we have $X^{(k)}_\ell \overset{\text{i.i.d.}}{\sim} F^c_\ell$ for $k=1,2,\ldots,m_\ell$ and $E(X^4_{\ell,i}X^4_{\ell,j})<\infty$ for $i,j=1,2,\ldots,d_\ell$ and $\ell=1,2,\ldots,L$ with $d_\ell>1$. As $m\to\infty$, we have $m_\ell/m\to c_\ell$ for a constant $c_\ell>0$. Then, as $m\to\infty$, the bootstrapped product-moment correlation estimator $\rho_{X,m}$ converges a.s. to the true correlation $\rho^c_X$.

PROOF. Since $F_1,F_2,\ldots,F_L$ are mutually independent, we only need to show that, as $m_\ell\to\infty$, $\rho_{X_\ell,m_\ell}(i,j)\xrightarrow{a.s.}\rho^c_{X_\ell}(i,j)$ for any $i\neq j$ and $\ell$ with $d_\ell>1$. The bootstrapped product-moment covariance estimator is
$$
\begin{aligned}
\frac{1}{m_\ell-1}\sum_{k=1}^{m_\ell}\bigl(X^{(k)}_{\ell,i}-\bar X_{\ell,i}\bigr)\bigl(X^{(k)}_{\ell,j}-\bar X_{\ell,j}\bigr)
&=\frac{1}{m_\ell-1}\sum_{k=1}^{m_\ell}\bigl(X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}-X^{(k)}_{\ell,i}\bar X_{\ell,j}-\bar X_{\ell,i}X^{(k)}_{\ell,j}+\bar X_{\ell,i}\bar X_{\ell,j}\bigr)\\
&=\frac{1}{m_\ell-1}\Bigl[\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}-m_\ell\,\bar X_{\ell,i}\bar X_{\ell,j}\Bigr]\\
&=\underbrace{\frac{m_\ell}{m_\ell-1}}_{\to 1}\cdot\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}-\underbrace{\frac{m_\ell}{m_\ell-1}}_{\to 1}\,\bar X_{\ell,i}\bar X_{\ell,j}
\end{aligned}
$$
where $\bar X_{\ell,i}=\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}/m_\ell$. By Lemma 1 in Xie et al. [2014b], we have $\bar X_{\ell,i}\xrightarrow{a.s.}E(X_{\ell,i})$ as $m_\ell\to\infty$ for $i=1,2,\ldots,d_\ell$. If $\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}/m_\ell\xrightarrow{a.s.}E(X_{\ell,i}X_{\ell,j})$ as $m_\ell\to\infty$, we can get $\rho_{X_\ell,m_\ell}(i,j)\xrightarrow{a.s.}\rho^c_{X_\ell}(i,j)$ for $i,j=1,2,\ldots,d_\ell$ by applying the continuous mapping theorem [Van Der Vaart 1998]. Let $\Sigma_{X_\ell,m_\ell}(i,j)\equiv\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}/m_\ell$ and $\Sigma^c_{X_\ell}(i,j)\equiv E(X_{\ell,i}X_{\ell,j})$. Therefore, we only need to show
$$
\Sigma_{X_\ell,m_\ell}(i,j)\xrightarrow{a.s.}\Sigma^c_{X_\ell}(i,j)\ \text{ as } m_\ell\to\infty. \tag{14}
$$
By the Chebyshev inequality, for every $\epsilon>0$, we have
$$
\Pr\{|\Sigma_{X_\ell,m_\ell}(i,j)-\Sigma^c_{X_\ell}(i,j)|>\epsilon\}\le
\frac{E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)-\Sigma^c_{X_\ell}(i,j)\bigr)^4\bigr]}{\epsilon^4}
$$


where
$$
\begin{aligned}
E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)-\Sigma^c_{X_\ell}(i,j)\bigr)^4\bigr]
&= E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)\bigr)^4\bigr]
-4\,\Sigma^c_{X_\ell}(i,j)\,E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)\bigr)^3\bigr]\\
&\quad +6\,\bigl(\Sigma^c_{X_\ell}(i,j)\bigr)^2\,E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)\bigr)^2\bigr]
-4\,\bigl(\Sigma^c_{X_\ell}(i,j)\bigr)^3\,E\bigl[\Sigma_{X_\ell,m_\ell}(i,j)\bigr]
+\bigl(\Sigma^c_{X_\ell}(i,j)\bigr)^4. 
\end{aligned}
\tag{15}
$$
Then, we work on each term on the right-hand side of Equation (15):
$$
E\bigl[\Sigma_{X_\ell,m_\ell}(i,j)\bigr]
= E\Bigl[\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\Bigr]
= E\Bigl[E\Bigl(X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\Bigm|\mathbf{X}_m\Bigr)\Bigr]
= E\Bigl[\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\Bigr]
= E[X_{\ell,i}X_{\ell,j}].
$$
Notice that
$$
E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)\bigr)^2\bigr]
= E\Bigl[\Bigl(\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\Bigr)^2\Bigr]
= \frac{1}{m_\ell^2}\,E\Bigl[\Bigl(\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\Bigr)^2\Bigr]
= \frac{1}{m_\ell^2}\,E\Bigl[\sum_{k=1}^{m_\ell}\bigl(X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\bigr)^2
+\sum_{k_1\neq k_2}X^{(k_1)}_{\ell,i}X^{(k_1)}_{\ell,j}X^{(k_2)}_{\ell,i}X^{(k_2)}_{\ell,j}\Bigr]. \tag{16}
$$
The first term on the right-hand side of Equation (16) becomes
$$
E\Bigl[\sum_{k=1}^{m_\ell}\bigl(X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\bigr)^2\Bigr]
= m_\ell\,E\Bigl[E\Bigl(\bigl(X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\bigr)^2\Bigm|\mathbf{X}_m\Bigr)\Bigr]
= m_\ell\,E\Bigl[\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\bigl(X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\bigr)^2\Bigr]
= m_\ell\,E\bigl[X^2_{\ell,i}X^2_{\ell,j}\bigr]
$$
and the second term on the right-hand side of Equation (16) becomes
$$
\begin{aligned}
E\Bigl[\sum_{k_1\neq k_2}X^{(k_1)}_{\ell,i}X^{(k_1)}_{\ell,j}X^{(k_2)}_{\ell,i}X^{(k_2)}_{\ell,j}\Bigr]
&= m_\ell(m_\ell-1)\,E\Bigl[E\Bigl(X^{(k_1)}_{\ell,i}X^{(k_1)}_{\ell,j}\Bigm|\mathbf{X}_m\Bigr)\cdot E\Bigl(X^{(k_2)}_{\ell,i}X^{(k_2)}_{\ell,j}\Bigm|\mathbf{X}_m\Bigr)\Bigr]\\
&= m_\ell(m_\ell-1)\,E\Bigl[\Bigl(\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\Bigr)^2\Bigr]\\
&= \frac{m_\ell(m_\ell-1)}{m_\ell^2}\,E\Bigl[\sum_{k=1}^{m_\ell}\bigl(X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\bigr)^2
+\sum_{k_1\neq k_2}X^{(k_1)}_{\ell,i}X^{(k_1)}_{\ell,j}X^{(k_2)}_{\ell,i}X^{(k_2)}_{\ell,j}\Bigr]\\
&= \frac{m_\ell(m_\ell-1)}{m_\ell^2}\bigl[m_\ell E(X^2_{\ell,i}X^2_{\ell,j})+m_\ell(m_\ell-1)E^2(X_{\ell,i}X_{\ell,j})\bigr]\\
&= (m_\ell-1)E(X^2_{\ell,i}X^2_{\ell,j})+(m_\ell-1)^2E^2(X_{\ell,i}X_{\ell,j}).
\end{aligned}
$$
Then, substituting them back into Equation (16),
$$
E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)\bigr)^2\bigr]
= \frac{1}{m_\ell^2}\bigl[m_\ell E(X^2_{\ell,i}X^2_{\ell,j})+(m_\ell-1)E(X^2_{\ell,i}X^2_{\ell,j})+(m_\ell-1)^2E^2(X_{\ell,i}X_{\ell,j})\bigr]
= \frac{1}{m_\ell^2}\bigl[(2m_\ell-1)E(X^2_{\ell,i}X^2_{\ell,j})+(m_\ell-1)^2E^2(X_{\ell,i}X_{\ell,j})\bigr].
$$
Similar derivations show that if $E(X^4_{\ell,i}X^4_{\ell,j})<\infty$, we have
$$
E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)\bigr)^3\bigr]
= E\Bigl[\Bigl(\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}X^{(k)}_{\ell,i}X^{(k)}_{\ell,j}\Bigr)^3\Bigr]
= \frac{6(m_\ell-1)^3}{m_\ell^4}E(X^2_{\ell,i}X^2_{\ell,j})E(X_{\ell,i}X_{\ell,j})
+\frac{(m_\ell-1)^2(m_\ell-2)^2}{m_\ell^4}E^3(X_{\ell,i}X_{\ell,j})+O(m_\ell^{-2})
$$
and
$$
E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)\bigr)^4\bigr]
= \frac{6(m_\ell-1)^2(m_\ell-2)^2(2m_\ell-3)}{m_\ell^6}E\bigl(X^2_{\ell,i}X^2_{\ell,j}\bigr)E^2\bigl(X_{\ell,i}X_{\ell,j}\bigr)
+\frac{(m_\ell-1)^2(m_\ell-2)^2(m_\ell-3)^2}{m_\ell^6}E^4\bigl(X_{\ell,i}X_{\ell,j}\bigr)+O(m_\ell^{-2}).
$$
Substituting all results into Equation (15) and then simplifying the right-hand side gives
$$
E\bigl[\bigl(\Sigma_{X_\ell,m_\ell}(i,j)-\Sigma^c_{X_\ell}(i,j)\bigr)^4\bigr]=0+O(m_\ell^{-2}). \tag{17}
$$
Therefore, we have
$$
\sum_{m_\ell=1}^{\infty}\Pr\{|\Sigma_{X_\ell,m_\ell}(i,j)-\Sigma^c_{X_\ell}(i,j)|>\epsilon\}
\le\sum_{m_\ell=1}^{\infty}\frac{c}{m_\ell^2\,\epsilon^4}<\infty \tag{18}
$$
where $c$ is a finite constant. By applying the Borel-Cantelli lemma [Billingsley 1995], we have, as $m_\ell\to\infty$, $\Sigma_{X_\ell,m_\ell}(i,j)\xrightarrow{a.s.}\Sigma^c_{X_\ell}(i,j)$ for all $i,j=1,2,\ldots,d_\ell$ and $\ell=1,2,\ldots,L$ with $d_\ell>1$. Thus, as $m\to\infty$, we have $\rho_{X,m}\xrightarrow{a.s.}\rho^c_X$.
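As an aside not in the paper, the convergence in Lemma A.1 is easy to check numerically: draw data from a known bivariate distribution, bootstrap it, and compare the bootstrapped product-moment correlation with the true value. This is only an illustrative sketch using plain NumPy; the function name and the Gaussian example are ours, not the authors'.

```python
import numpy as np

def bootstrap_corr(data: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Bootstrapped product-moment correlation: resample the rows of
    `data` with replacement, then compute the sample correlation matrix."""
    m = data.shape[0]
    resample = data[rng.integers(0, m, size=m)]
    return np.corrcoef(resample, rowvar=False)

rng = np.random.default_rng(0)
rho_true = 0.6
cov = np.array([[1.0, rho_true], [rho_true, 1.0]])

# As the real-world sample size m grows, the bootstrapped estimator
# approaches the true correlation almost surely.
for m in (100, 10_000):
    data = rng.multivariate_normal(np.zeros(2), cov, size=m)
    rho_hat = bootstrap_corr(data, rng)[0, 1]
    print(m, round(abs(rho_hat - rho_true), 3))
```

The gap printed for $m=10{,}000$ is typically an order of magnitude smaller than for $m=100$, consistent with the lemma.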

LEMMA A.2. Suppose $X^{(k)}_\ell \overset{\text{i.i.d.}}{\sim} F^c_\ell$ for $k=1,2,\ldots,m_\ell$ and $\ell=1,2,\ldots,L$. As $m\to\infty$, we have $m_\ell/m\to c_\ell$ with $\ell=1,2,\ldots,L$ for a constant $c_\ell>0$. Then, as $m\to\infty$, the bootstrapped Spearman rank correlation estimator $R_{X,m}$ converges a.s. to the true correlation $R^c_X$.


PROOF. Since $F_1,F_2,\ldots,F_L$ are mutually independent, we only need to show that, as $m_\ell\to\infty$, $R_{X_\ell,m_\ell}(i,j)\xrightarrow{a.s.}R^c_{X_\ell}(i,j)$ for any $i\neq j$ and $\ell$ with $d_\ell>1$, where the bootstrapped Spearman rank correlation is
$$
R_{X_\ell,m_\ell}(i,j)\equiv
\frac{\sum_{k=1}^{m_\ell}\bigl(r(X^{(k)}_{\ell,i})-\bar r(X_{\ell,i})\bigr)\bigl(r(X^{(k)}_{\ell,j})-\bar r(X_{\ell,j})\bigr)}
{\sqrt{\Bigl[\sum_{k=1}^{m_\ell}\bigl(r(X^{(k)}_{\ell,i})-\bar r(X_{\ell,i})\bigr)^2\Bigr]\cdot\Bigl[\sum_{k=1}^{m_\ell}\bigl(r(X^{(k)}_{\ell,j})-\bar r(X_{\ell,j})\bigr)^2\Bigr]}} \tag{19}
$$
with $r(X^{(k)}_{\ell,i})=\sum_{j=1}^{m_\ell}I\bigl(X^{(j)}_{\ell,i}\le X^{(k)}_{\ell,i}\bigr)$ and $\bar r(X_{\ell,i})=\sum_{k=1}^{m_\ell}r(X^{(k)}_{\ell,i})/m_\ell$.

The normalized bootstrapped Spearman rank covariance is
$$
\tau_{X_\ell,m_\ell}(i,j)\equiv\frac{1}{m_\ell-1}\sum_{k=1}^{m_\ell}
\Bigl(\frac{r(X^{(k)}_{\ell,i})}{m_\ell}-\frac{\bar r(X_{\ell,i})}{m_\ell}\Bigr)
\Bigl(\frac{r(X^{(k)}_{\ell,j})}{m_\ell}-\frac{\bar r(X_{\ell,j})}{m_\ell}\Bigr)
=\underbrace{\frac{m_\ell}{m_\ell-1}}_{\to 1}
\Bigl[\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\Bigl(\frac{r(X^{(k)}_{\ell,i})}{m_\ell}\,\frac{r(X^{(k)}_{\ell,j})}{m_\ell}\Bigr)
-\frac{\bar r(X_{\ell,i})}{m_\ell}\,\frac{\bar r(X_{\ell,j})}{m_\ell}\Bigr].
$$
To show $\tau_{X_\ell,m_\ell}(i,j)\xrightarrow{a.s.}E[F_{\ell,i}(X_{\ell,i})F_{\ell,j}(X_{\ell,j})]-E[F_{\ell,i}(X_{\ell,i})]E[F_{\ell,j}(X_{\ell,j})]$ as $m_\ell\to\infty$, we only need to show
$$
\frac{\bar r(X_{\ell,i})}{m_\ell}\xrightarrow{a.s.}E[F_{\ell,i}(X_{\ell,i})], \tag{20}
$$
$$
\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\Bigl(\frac{r(X^{(k)}_{\ell,i})}{m_\ell}\,\frac{r(X^{(k)}_{\ell,j})}{m_\ell}\Bigr)\xrightarrow{a.s.}E[F_{\ell,i}(X_{\ell,i})F_{\ell,j}(X_{\ell,j})]. \tag{21}
$$
Let $\widehat F_{\ell,i}$ denote the empirical marginal distribution generated by $\{X^{(1)}_{\ell,i},X^{(2)}_{\ell,i},\ldots,X^{(m_\ell)}_{\ell,i}\}$. Notice that $\widehat F_{\ell,i}(X^{(k)}_{\ell,i})=r(X^{(k)}_{\ell,i})/m_\ell$.

We first show Equation (20). By the definition of the sample rank correlation estimator,
$$
\frac{\bar r(X_{\ell,i})}{m_\ell}
=\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\frac{r(X^{(k)}_{\ell,i})}{m_\ell}
=\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\widehat F_{\ell,i}(X^{(k)}_{\ell,i})
=\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\bigl(\widehat F_{\ell,i}(X^{(k)}_{\ell,i})-F_{\ell,i}(X^{(k)}_{\ell,i})\bigr)
+\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}F_{\ell,i}(X^{(k)}_{\ell,i}). \tag{22}
$$
By Theorem 1.1 in Arenal-Gutierrez et al. [1996], as $m_\ell\to\infty$, $\|\widehat F_{\ell,i}-F_{\ell,i}\|_\infty\xrightarrow{a.s.}0$, where $\|h\|_\infty$ is the sup-norm of $h$ on $\Re$, $\|h\|_\infty=\sup_t|h(t)|$. Thus, for the first term on the right-hand side of Equation (22), we have
$$
\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\bigl(\widehat F_{\ell,i}(X^{(k)}_{\ell,i})-F_{\ell,i}(X^{(k)}_{\ell,i})\bigr)\xrightarrow{a.s.}0\ \text{ as } m_\ell\to\infty.
$$
For the second term, by following steps similar to the proof of Lemma A.1, we have
$$
\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}F_{\ell,i}(X^{(k)}_{\ell,i})\xrightarrow{a.s.}E[F_{\ell,i}(X_{\ell,i})]\ \text{ as } m_\ell\to\infty.
$$
Therefore, Equation (20) follows.

Then, to show Equation (21), we have
$$
\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\Bigl(\frac{r(X^{(k)}_{\ell,i})}{m_\ell}\,\frac{r(X^{(k)}_{\ell,j})}{m_\ell}\Bigr)
=\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\widehat F_{\ell,i}(X^{(k)}_{\ell,i})\widehat F_{\ell,j}(X^{(k)}_{\ell,j})
=\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}\bigl[\widehat F_{\ell,i}(X^{(k)}_{\ell,i})\widehat F_{\ell,j}(X^{(k)}_{\ell,j})-F_{\ell,i}(X^{(k)}_{\ell,i})F_{\ell,j}(X^{(k)}_{\ell,j})\bigr]
+\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}F_{\ell,i}(X^{(k)}_{\ell,i})F_{\ell,j}(X^{(k)}_{\ell,j}). \tag{23}
$$
Since $\|\widehat F_{\ell,i}-F_{\ell,i}\|_\infty\xrightarrow{a.s.}0$ as $m_\ell\to\infty$, we have
$$
\widehat F_{\ell,i}(X^{(k)}_{\ell,i})\widehat F_{\ell,j}(X^{(k)}_{\ell,j})-F_{\ell,i}(X^{(k)}_{\ell,i})F_{\ell,j}(X^{(k)}_{\ell,j})
=\underbrace{\widehat F_{\ell,i}(X^{(k)}_{\ell,i})}_{\le 1}\underbrace{\bigl[\widehat F_{\ell,j}(X^{(k)}_{\ell,j})-F_{\ell,j}(X^{(k)}_{\ell,j})\bigr]}_{\xrightarrow{a.s.}0}
+\underbrace{F_{\ell,j}(X^{(k)}_{\ell,j})}_{\le 1}\underbrace{\bigl[\widehat F_{\ell,i}(X^{(k)}_{\ell,i})-F_{\ell,i}(X^{(k)}_{\ell,i})\bigr]}_{\xrightarrow{a.s.}0}.
$$
Thus, the first term in Equation (23) converges to zero a.s. as $m_\ell\to\infty$. For the second term on the right-hand side of Equation (23), by following steps similar to the proof of Lemma A.1, we have
$$
\frac{1}{m_\ell}\sum_{k=1}^{m_\ell}F_{\ell,i}(X^{(k)}_{\ell,i})F_{\ell,j}(X^{(k)}_{\ell,j})\xrightarrow{a.s.}E[F_{\ell,i}(X_{\ell,i})F_{\ell,j}(X_{\ell,j})]\ \text{ as } m_\ell\to\infty. \tag{24}
$$
Therefore, Equation (21) follows. By applying the continuous mapping theorem [Van Der Vaart 1998], as $m_\ell\to\infty$, we have $\tau_{X_\ell,m_\ell}(i,j)\xrightarrow{a.s.}E[F_{\ell,i}(X_{\ell,i})F_{\ell,j}(X_{\ell,j})]-E[F_{\ell,i}(X_{\ell,i})]E[F_{\ell,j}(X_{\ell,j})]$ for $i,j=1,2,\ldots,d_\ell$.

For the Spearman rank correlation, we rewrite Equation (19) and apply the continuous mapping theorem [Van Der Vaart 1998]:
$$
R_{X_\ell,m_\ell}(i,j)
=\frac{\frac{1}{m_\ell-1}\sum_{k=1}^{m_\ell}\Bigl(\frac{r(X^{(k)}_{\ell,i})}{m_\ell}-\frac{\bar r(X_{\ell,i})}{m_\ell}\Bigr)\Bigl(\frac{r(X^{(k)}_{\ell,j})}{m_\ell}-\frac{\bar r(X_{\ell,j})}{m_\ell}\Bigr)}
{\sqrt{\Bigl[\frac{1}{m_\ell-1}\sum_{k=1}^{m_\ell}\Bigl(\frac{r(X^{(k)}_{\ell,i})}{m_\ell}-\frac{\bar r(X_{\ell,i})}{m_\ell}\Bigr)^2\Bigr]\cdot\Bigl[\frac{1}{m_\ell-1}\sum_{k=1}^{m_\ell}\Bigl(\frac{r(X^{(k)}_{\ell,j})}{m_\ell}-\frac{\bar r(X_{\ell,j})}{m_\ell}\Bigr)^2\Bigr]}}
\xrightarrow{a.s.}\frac{E[F_{\ell,i}(X_{\ell,i})F_{\ell,j}(X_{\ell,j})]-E[F_{\ell,i}(X_{\ell,i})]\cdot E[F_{\ell,j}(X_{\ell,j})]}{\sqrt{\mathrm{Var}[F_{\ell,i}(X_{\ell,i})]\cdot\mathrm{Var}[F_{\ell,j}(X_{\ell,j})]}}
=R^c_{X_\ell}(i,j)\ \text{ as } m_\ell\to\infty
$$
for all $i,j=1,2,\ldots,d_\ell$ and $\ell=1,2,\ldots,L$ with $d_\ell>1$.
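The rank statistic used throughout the proof above can be sketched in a few lines of NumPy (an illustrative aside, not the authors' code): ranks computed exactly as $r(X^{(k)}_{\ell,i})=\sum_j I(X^{(j)}_{\ell,i}\le X^{(k)}_{\ell,i})$, followed by a product-moment correlation of the ranks, as in Equation (19).

```python
import numpy as np

def ranks(x: np.ndarray) -> np.ndarray:
    """r(x_k) = sum_j I(x_j <= x_k), the rank definition below Eq. (19)."""
    return (x[None, :] <= x[:, None]).sum(axis=1)

def spearman(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank correlation: product-moment correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

x = np.array([3.0, 1.0, 4.0, 1.5, 5.0])
y = np.array([9.0, 2.0, 16.0, 2.5, 25.0])  # a strictly increasing transform of x
print(spearman(x, y))  # prints 1.0 up to rounding: ranks are preserved
```

Because a strictly increasing marginal transform leaves ranks unchanged, the Spearman correlation is invariant under such transforms, which is exactly why it pairs naturally with the NORTA marginals.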

THEOREM A.3. Suppose $F_1,F_2,\ldots,F_L$ are mutually independent and the following conditions hold.

(1) We have $X^{(k)}_\ell\overset{\text{i.i.d.}}{\sim}F^c_\ell$ with $k=1,2,\ldots,m_\ell$ and $\ell=1,2,\ldots,L$;
(2) The marginal distribution $F^c_{\ell,i}$ is uniquely characterized by its first $h_{\ell,i}$ moments and has finite first $4h_{\ell,i}$ moments for $i=1,2,\ldots,d_\ell$ and $\ell=1,2,\ldots,L$;
(3) $E(X^4_{\ell,i}X^4_{\ell,j})<\infty$ for $i,j=1,2,\ldots,d_\ell$ and $\ell=1,2,\ldots,L$ with $d_\ell>1$;
(4) As $m\to\infty$, we have $m_\ell/m\to c_\ell$ with $\ell=1,2,\ldots,L$ for a constant $c_\ell>0$.

Then, as $m\to\infty$, the bootstrap moment estimator $\mathcal{M}_m$ converges a.s. to the true moments $\mathcal{M}^c$.

PROOF. By Lemma 1 in Xie et al. [2014b], we have $\boldsymbol{\psi}_{X_\ell,m_\ell}\xrightarrow{a.s.}\boldsymbol{\psi}^c_{X_\ell}$ as $m_\ell\to\infty$ for $\ell=1,2,\ldots,L$. As $m_\ell\to\infty$, $\rho_{X_\ell,m_\ell}(i,j)\xrightarrow{a.s.}\rho^c_{X_\ell}(i,j)$ by Lemma A.1 and $R_{X_\ell,m_\ell}(i,j)\xrightarrow{a.s.}R^c_{X_\ell}(i,j)$ by Lemma A.2 for $i,j=1,2,\ldots,d_\ell$ and $\ell=1,2,\ldots,L$ with $d_\ell>1$. Therefore, we have
$$
\mathcal{M}_m=\bigl(\boldsymbol{\psi}_m;\mathbf{V}^S_{X,m}\bigr)\xrightarrow{a.s.}\mathcal{M}^c=\bigl(\boldsymbol{\psi}^c;(\mathbf{V}^S_X)^c\bigr)\ \text{ as } m\to\infty
$$
where $S=\rho$ or $R$.

LEMMA A.4. Let $\Theta_\ell\subseteq\Re^{d^\dagger_\ell}$ be the feasible domain for the marginal distribution parameters $\boldsymbol{\theta}_\ell$, and suppose $\boldsymbol{\theta}^c_\ell$ is an interior point of $\Theta_\ell$ for $\ell=1,2,\ldots,L$. For $\ell=1,2,\ldots,L$ with $d_\ell>1$, suppose:

(1) At any $x$, the marginal distributions $F_{\ell,i}(x;\boldsymbol{\theta}_{\ell,i})$, density functions $f_{\ell,i}(x;\boldsymbol{\theta}_{\ell,i})$ and inverse distributions $F^{-1}_{\ell,i}(x;\boldsymbol{\theta}_{\ell,i})$ are continuously differentiable over $\boldsymbol{\theta}_{\ell,i}$ for $i=1,2,\ldots,d_\ell$ on $\Theta_\ell$.
(2) For any $\boldsymbol{\theta}_\ell\in\Theta_\ell$, the marginal cdfs $F_{\ell,1}(x;\boldsymbol{\theta}_{\ell,1}),F_{\ell,2}(x;\boldsymbol{\theta}_{\ell,2}),\ldots,F_{\ell,d_\ell}(x;\boldsymbol{\theta}_{\ell,d_\ell})$ are continuous and strictly increasing in $x$.

Then the function $C_{\ell,ij}(\rho_{Z_\ell}(i,j);\boldsymbol{\theta}_\ell)$ is invertible and the inverse function $C^{-1}_{\ell,ij}(\rho_{X_\ell}(i,j);\boldsymbol{\theta}_\ell)$ is continuously differentiable at $(\boldsymbol{\theta}^c_\ell,\rho^c_{X_\ell})$ for $i,j=1,2,\ldots,d_\ell$ and $\ell=1,2,\ldots,L$ with $d_\ell>1$, where the pairwise mapping $C_{\ell,ij}$ is defined as
$$
\rho_{X_\ell}(i,j)=C_{\ell,ij}[\rho_{Z_\ell}(i,j);\boldsymbol{\theta}_\ell]
\equiv\frac{\int\!\!\int F^{-1}_{\ell,i}[\Phi(z_{\ell,i});\boldsymbol{\theta}_{\ell,i}]\,F^{-1}_{\ell,j}[\Phi(z_{\ell,j});\boldsymbol{\theta}_{\ell,j}]\,\varphi_{\rho_{Z_\ell}(i,j)}(z_{\ell,i},z_{\ell,j})\,dz_{\ell,i}\,dz_{\ell,j}-E(X_{\ell,i})E(X_{\ell,j})}{\sqrt{\mathrm{Var}(X_{\ell,i})\mathrm{Var}(X_{\ell,j})}}. \tag{25}
$$

PROOF. Let $g$ be the mapping such that
$$
\begin{pmatrix}\boldsymbol{\theta}_\ell\\ \mathbf{V}^\rho_{X_\ell}\end{pmatrix}
=g\begin{pmatrix}\boldsymbol{\theta}_\ell\\ \mathbf{V}^\rho_{Z_\ell}\end{pmatrix}. \tag{26}
$$
To show that the inverse mapping $C^{-1}_{\ell,ij}$ is continuous at $(\boldsymbol{\theta}^c_\ell;(\mathbf{V}^\rho_{X_\ell})^c)$, we first calculate the Jacobian determinant of $g$:
$$
|J(g)|=\begin{vmatrix}\partial\boldsymbol{\theta}_\ell/\partial\boldsymbol{\theta}_\ell & \partial\boldsymbol{\theta}_\ell/\partial\mathbf{V}^\rho_{Z_\ell}\\[2pt] \partial\mathbf{V}^\rho_{X_\ell}/\partial\boldsymbol{\theta}_\ell & \partial\mathbf{V}^\rho_{X_\ell}/\partial\mathbf{V}^\rho_{Z_\ell}\end{vmatrix}. \tag{27}
$$
We have $\partial\boldsymbol{\theta}_\ell/\partial\boldsymbol{\theta}_\ell=\mathbf{I}_{d^\dagger_\ell\times d^\dagger_\ell}$ and $\partial\boldsymbol{\theta}_\ell/\partial\mathbf{V}^\rho_{Z_\ell}=\mathbf{0}_{d^\dagger_\ell\times d^*_\ell}$, where $\mathbf{I}_{d^\dagger_\ell\times d^\dagger_\ell}$ denotes a $d^\dagger_\ell\times d^\dagger_\ell$ identity matrix and $\mathbf{0}_{d^\dagger_\ell\times d^*_\ell}$ denotes a $d^\dagger_\ell\times d^*_\ell$ zero matrix.

Since $F_{\ell,i}(x;\boldsymbol{\theta}_{\ell,i})$, $f_{\ell,i}(x;\boldsymbol{\theta}_{\ell,i})$ and $F^{-1}_{\ell,i}(x;\boldsymbol{\theta}_{\ell,i})$ are differentiable over $\boldsymbol{\theta}_{\ell,i}$ for $i=1,2,\ldots,d_\ell$, the derivative $\partial\mathbf{V}^\rho_{X_\ell}/\partial\boldsymbol{\theta}_\ell$ exists and can be calculated by applying the chain rule to Equation (25). The form of the result is complex. However, since the elements of $\mathbf{V}^\rho_{X_\ell}$ are bounded between $\pm 1$, $\partial\mathbf{V}^\rho_{X_\ell}/\partial\boldsymbol{\theta}_\ell$ is finite in the interior of $\Theta_\ell$.

Since $X_{\ell,i}$ has a continuous and strictly increasing cdf for $i=1,2,\ldots,d_\ell$, the pairwise mapping $C_{\ell,ij}$ is a continuous and strictly increasing function of $\rho_{Z_\ell}(i,j)$ by Henderson et al. [2000] and Cario and Nelson [1997]. According to the formula for $C_{\ell,ij}$ in Equation (25), it is also differentiable in $\rho_{Z_\ell}(i,j)$. Thus, $\partial\rho_{X_\ell}(i,j)/\partial\rho_{Z_\ell}(i,j)>0$ around $\rho^c_{Z_\ell}$ for $i,j=1,2,\ldots,d_\ell$, and $\partial\mathbf{V}^\rho_{X_\ell}/\partial\mathbf{V}^\rho_{Z_\ell}$ is a $d^*_\ell\times d^*_\ell$ diagonal matrix with strictly positive diagonal entries.

Therefore, by substituting into Equation (27), the Jacobian determinant of $g$ is positive:
$$
|J(g)|=\begin{vmatrix}\partial\boldsymbol{\theta}_\ell/\partial\boldsymbol{\theta}_\ell & \partial\boldsymbol{\theta}_\ell/\partial\mathbf{V}^\rho_{Z_\ell}\\[2pt] \partial\mathbf{V}^\rho_{X_\ell}/\partial\boldsymbol{\theta}_\ell & \partial\mathbf{V}^\rho_{X_\ell}/\partial\mathbf{V}^\rho_{Z_\ell}\end{vmatrix}
=\begin{vmatrix}\mathbf{I}_{d^\dagger_\ell\times d^\dagger_\ell} & \mathbf{0}_{d^\dagger_\ell\times d^*_\ell}\\[2pt] \partial\mathbf{V}^\rho_{X_\ell}/\partial\boldsymbol{\theta}_\ell & \partial\mathbf{V}^\rho_{X_\ell}/\partial\mathbf{V}^\rho_{Z_\ell}\end{vmatrix}
=\left|\frac{\partial\mathbf{V}^\rho_{X_\ell}}{\partial\mathbf{V}^\rho_{Z_\ell}}\right|>0. \tag{28}
$$
Since $|J(g)|>0$ at $(\boldsymbol{\theta}^c_\ell;(\mathbf{V}^\rho_{Z_\ell})^c)$, by applying the inverse function theorem [Wade 2010], there exists an open set $V$ containing $(\boldsymbol{\theta}^c_\ell;(\mathbf{V}^\rho_{Z_\ell})^c)$ and another open set $W$ containing $(\boldsymbol{\theta}^c_\ell;(\mathbf{V}^\rho_{X_\ell})^c)=g(\boldsymbol{\theta}^c_\ell;(\mathbf{V}^\rho_{Z_\ell})^c)$ such that $g:V\to W$ has a continuously differentiable inverse $g^{-1}:W\to V$. Therefore, the function $C_{\ell,ij}(\rho_{Z_\ell}(i,j);\boldsymbol{\theta}_\ell)$ is invertible and the inverse function $C^{-1}_{\ell,ij}(\rho_{X_\ell}(i,j);\boldsymbol{\theta}_\ell)$ is continuously differentiable at $(\boldsymbol{\theta}^c_\ell,\rho^c_{X_\ell})$.

THEOREM A.5. Let $\Theta_\ell\subseteq\Re^{d^\dagger_\ell}$ be the feasible domain for the marginal distribution parameters $\boldsymbol{\theta}_\ell$, and suppose $\boldsymbol{\theta}^c_\ell$ is an interior point of $\Theta_\ell$. Suppose there is a one-to-one continuous mapping between the marginal moments $\boldsymbol{\psi}_{\ell,i}$ and the parameters $\boldsymbol{\theta}_{\ell,i}$ for $i=1,2,\ldots,d_\ell$. For $F_\ell$ with $d_\ell>1$, suppose the following conditions hold.

(1) $F^c_\ell$ has a NORTA representation $(\boldsymbol{\theta}^c_\ell,\rho^c_{Z_\ell})$ with $\rho^c_{Z_\ell}$ positive definite.
(2) At any $x$, the marginal distributions $F_{\ell,i}(x;\boldsymbol{\theta}_{\ell,i})$, density functions $f_{\ell,i}(x;\boldsymbol{\theta}_{\ell,i})$ and inverse distributions $F^{-1}_{\ell,i}(x;\boldsymbol{\theta}_{\ell,i})$ are continuously differentiable over $\boldsymbol{\theta}_{\ell,i}$ for $i=1,2,\ldots,d_\ell$ on $\Theta_\ell$.
(3) For any $\boldsymbol{\theta}_\ell\in\Theta_\ell$, the marginal cdfs $F_{\ell,1}(x;\boldsymbol{\theta}_{\ell,1}),F_{\ell,2}(x;\boldsymbol{\theta}_{\ell,2}),\ldots,F_{\ell,d_\ell}(x;\boldsymbol{\theta}_{\ell,d_\ell})$ are continuous and strictly increasing in $x$.

Then the true moment vector $\mathcal{M}^c_\ell=(\boldsymbol{\psi}^c_\ell;(\mathbf{V}^\rho_{X_\ell})^c)$ is an interior point of the NORTA feasible region: in the $d^\dagger_\ell+d^*_\ell$ dimensional space, there exists a constant $\delta>0$ such that any moment combination $\mathcal{M}_\ell$ in the open ball $B_\delta(\mathcal{M}^c_\ell)$ is NORTA feasible.

PROOF. Let $\rho_{Z_\ell}=g_1(\boldsymbol{\theta}_\ell,\rho_{X_\ell})$. By Lemma A.4, $g_1$ is continuous at $(\boldsymbol{\theta}^c_\ell,\rho^c_{X_\ell})$. The mapping $h$ with $(\boldsymbol{\theta}_\ell,\rho_{X_\ell})=h(\boldsymbol{\psi}_\ell,\mathbf{V}^\rho_{X_\ell})$ is continuous at $\mathcal{M}^c_\ell$. Let $\lambda_{Z_\ell,\min}$ denote the minimum eigenvalue of $\rho_{Z_\ell}$; it is a continuous function of the entries of $\rho_{Z_\ell}$, say $\lambda_{Z_\ell,\min}=g_2(\rho_{Z_\ell})$ with $g_2$ continuous. Then,
$$
\lambda_{Z_\ell,\min}=g_2(\rho_{Z_\ell})=g_2\circ g_1(\boldsymbol{\theta}_\ell,\rho_{X_\ell})
=g_2\circ g_1\circ h(\mathcal{M}_\ell)\equiv f(\mathcal{M}_\ell)
$$
with $f$ continuous at $\mathcal{M}^c_\ell$.

We have $\rho^c_{Z_\ell}$ positive definite with $\lambda^c_{Z_\ell,\min}=a>0$. Then any $\rho_{Z_\ell}$ whose minimum eigenvalue lies in the open interval $(\lambda^c_{Z_\ell,\min}-\epsilon,\lambda^c_{Z_\ell,\min}+\epsilon)$ is positive definite if $\epsilon<a$. Since $f$ is continuous at $\mathcal{M}^c_\ell$, for $\epsilon=a/2$ there exists $\delta>0$ such that every moment combination $\mathcal{M}_\ell$ with $d_{\mathcal{M}_\ell}(\mathcal{M}_\ell,\mathcal{M}^c_\ell)<\delta$ has positive minimum eigenvalue $\lambda_{Z_\ell,\min}\in(\lambda^c_{Z_\ell,\min}-\epsilon,\lambda^c_{Z_\ell,\min}+\epsilon)$, where $d_{\mathcal{M}_\ell}(\cdot,\cdot)$ denotes the Euclidean distance in the $d^\dagger_\ell+d^*_\ell$ dimensional space. Therefore, $\mathcal{M}^c_\ell$ is an interior point of the NORTA feasible region.
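The feasibility condition in Theorem A.5 comes down to the minimum eigenvalue of the base correlation matrix $\rho_{Z_\ell}$ staying positive. A sketch of that check, with a crude eigenvalue-clipping repair that is only in the spirit of, and much simpler than, Higham's [2002] nearest-correlation-matrix algorithm (the function names and the example matrix are ours):

```python
import numpy as np

def is_norta_feasible(rho_z: np.ndarray, tol: float = 1e-10) -> bool:
    """A base correlation matrix is usable by NORTA iff it is positive
    definite, i.e. its minimum eigenvalue stays above zero."""
    return float(np.linalg.eigvalsh(rho_z).min()) > tol

def clip_to_correlation(rho_z: np.ndarray, floor: float = 1e-6) -> np.ndarray:
    """Crude repair: clip the eigenvalues from below, then rescale back
    to a unit diagonal. (Higham [2002] gives the proper nearest-
    correlation-matrix algorithm; this is only a quick projection.)"""
    w, v = np.linalg.eigh(rho_z)
    a = (v * np.clip(w, floor, None)) @ v.T   # v diag(clipped w) v^T
    d = np.sqrt(np.diag(a))
    return a / np.outer(d, d)

# Pairwise-valid correlations that are jointly infeasible (not PSD):
bad = np.array([[1.0, 0.9, -0.9],
                [0.9, 1.0, 0.9],
                [-0.9, 0.9, 1.0]])
print(is_norta_feasible(bad))                       # False
print(is_norta_feasible(clip_to_correlation(bad)))  # True
```

The example also illustrates why feasibility is a genuinely multivariate constraint: each pairwise correlation is individually attainable, yet the matrix as a whole is not positive semidefinite.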


THEOREM A.6. Suppose the conditions of Theorems A.3 and A.5 and the following additional assumptions hold.

(1) $\epsilon_j(\mathbf{x})\overset{\text{i.i.d.}}{\sim}N(0,\sigma^2_\epsilon(\mathbf{x}))$ for any $\mathbf{x}$, and $M(\mathbf{x})$ is a stationary, separable GP with a continuous correlation function satisfying
$$
1-\gamma(\mathbf{x}-\mathbf{x}')\le\frac{c}{|\log(\|\mathbf{x}-\mathbf{x}'\|_2)|^{1+\delta_1}}\quad\text{for all }\|\mathbf{x}-\mathbf{x}'\|_2\le\delta_2 \tag{29}
$$
for some $c>0$, $\delta_1>0$ and $\delta_2<1$, where $\|\mathbf{x}-\mathbf{x}'\|_2=\sqrt{\sum_{j=1}^{d}(x_j-x'_j)^2}$.
(2) The input processes, the simulation noise $\epsilon_j(\mathbf{x})$ and the GP $M(\mathbf{x})$ are mutually independent, and the bootstrap process is independent of all of them.

Then the interval $[M_{(\lceil B\alpha/2\rceil)},M_{(\lceil B(1-\alpha/2)\rceil)}]$ is asymptotically consistent, meaning that the iterated limit satisfies
$$
\lim_{m\to\infty}\lim_{B\to\infty}\Pr\bigl\{M_{(\lceil B\alpha/2\rceil)}\le M_p(\mathcal{M}^c)\le M_{(\lceil B(1-\alpha/2)\rceil)}\bigr\}=1-\alpha. \tag{30}
$$

PROOF. By Theorem A.3, we have $\mathcal{M}_m\xrightarrow{a.s.}\mathcal{M}^c$ as $m\to\infty$. Therefore, as $m\to\infty$, all the bootstrapped moments eventually lie in an arbitrarily small neighborhood of $\mathcal{M}^c$ and are NORTA feasible by Theorem A.5. Then, by following steps similar to the proof of Theorem 1 in Xie et al. [2014b], we can show Equation (30).
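The percentile interval whose consistency Theorem A.6 establishes is mechanically simple: sort the $B$ bootstrap metamodel outputs and pick the $\lceil B\alpha/2\rceil$-th and $\lceil B(1-\alpha/2)\rceil$-th order statistics. A minimal sketch (our helper name, not the paper's code):

```python
import math

def percentile_interval(values, alpha=0.05):
    """Percentile bootstrap CI from B bootstrap outputs: take the
    ceil(B*alpha/2)-th and ceil(B*(1-alpha/2))-th order statistics."""
    b = len(values)
    s = sorted(values)
    lo = s[math.ceil(b * alpha / 2) - 1]        # order statistics are 1-indexed
    hi = s[math.ceil(b * (1 - alpha / 2)) - 1]
    return lo, hi

# With B = 1000 bootstrap values and alpha = 0.05, the interval endpoints
# are the 25th and 975th order statistics.
print(percentile_interval(range(1, 1001)))  # (25, 975)
```

The theorem's iterated limit says this interval attains the nominal $1-\alpha$ coverage as first $B$ and then the real-world sample size $m$ grow.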
