


High-Dimensional Uncertainty Quantification via Tensor Regression with Rank Determination and Adaptive Sampling

Zichang He and Zheng Zhang, Member, IEEE

(Invited Paper)

Abstract—Fabrication process variations can significantly influence the performance and yield of nano-scale electronic and photonic circuits. Stochastic spectral methods have achieved great success in quantifying the impact of process variations, but they suffer from the curse of dimensionality. Recently, low-rank tensor methods have been developed to mitigate this issue, but two fundamental challenges remain open: how to automatically determine the tensor rank and how to adaptively pick the informative simulation samples. This paper proposes a novel tensor regression method to address these two challenges. We use an ℓq/ℓ2 group-sparsity regularization to determine the tensor rank. The resulting optimization problem can be efficiently solved via an alternating minimization solver. We also propose a two-stage adaptive sampling method to reduce the simulation cost. Our method considers both exploration and exploitation via the estimated Voronoi cell volume and a nonlinearity measurement, respectively. The proposed model is verified with synthetic and some realistic circuit benchmarks, on which our method can well capture the uncertainty caused by 19 to 100 random variables with only 100 to 600 simulation samples.

Index Terms—Tensor regression, high dimensionality, uncertainty quantification, polynomial chaos, process variation, rank determination, adaptive sampling.

I. INTRODUCTION

Fabrication process variations (e.g., surface roughness of interconnects and photonic waveguides, and random doping effects of transistors) have been a major concern in nano-scale chip design. They can significantly influence chip performance and decrease product yield [2]. Monte Carlo (MC) is one of the most popular methods to quantify chip performance under uncertainty, but it requires a huge computational cost [3]. Instead, stochastic spectral methods based on generalized polynomial chaos (gPC) [4] offer efficient solutions for fast uncertainty quantification by approximating a real uncertain circuit variable as a linear combination of stochastic basis functions [5-7]. These techniques have been increasingly used in design automation [8-15]. The main challenge of stochastic spectral methods is the curse of dimensionality: the computational cost grows very fast as the number of random parameters increases. In order to address this fundamental challenge, many high-dimensional solvers have been developed.

The preliminary results of this work were published in EPEPS 2020 [1]. This work was partly supported by NSF grants #1763699 and #1846476.

Zichang He and Zheng Zhang are with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106, USA (e-mails: [email protected]; [email protected]).

Representative techniques include (but are not limited to) compressive sensing [16, 17], hyperbolic regression [18], analysis of variance (ANOVA) [19, 20], model order reduction [21], hierarchical modeling [22, 23], and tensor methods [23, 24].

The low-rank tensor approximation has shown promising performance in solving high-dimensional uncertainty quantification problems [24-29]. With a low-rank tensor decomposition, one can reduce the number of unknown variables in uncertainty quantification to a linear function of the parameter dimensionality. However, a fundamental question remains: how can we determine the tensor rank and the associated model complexity? Because it is hard to determine a tensor rank exactly a priori [30], existing methods often use a tensor rank pre-specified by the user, or use a greedy method to update the tensor rank until convergence [24, 31, 32]. These methods often produce inaccurate rank estimates and are computationally involved. Besides rank determination, another important question is: how can we adaptively add a few simulation samples to update the model with a low computational budget? This is very important in electronic and photonic design automation, because obtaining each data sample requires time-consuming device-level or circuit-level numerical simulations.

Paper contributions. We propose a novel tensor regression method for high-dimensional uncertainty quantification. Tensor regression has been studied in machine learning and image data analysis [33-35]. There are some existing works on automatic rank determination [36-38] and adaptive sampling [39, 40] for tensor decomposition and completion. Bayesian frameworks [41, 42] can enable and guide the adaptive sampling procedure for tensor regression, but they also restrict the model choice. Focusing on uncertainty quantification, few works address tensor regression with automatic rank determination and adaptive sampling. The main contributions of this paper include:
• We formulate high-dimensional uncertainty quantification as a tensor regression problem. We further propose an ℓq/ℓ2 group-sparsity regularization method to determine the rank automatically. Based on a variational equality, the tensor-structured regression problem can be efficiently solved via a block coordinate descent algorithm with an analytical solution in each subproblem.
• We propose a two-stage adaptive sampling method to reduce the simulation cost. This method balances exploration and exploitation by combining the estimation of Voronoi cell volumes and the nonlinearity of the output function.
• We verify the proposed uncertainty quantification model on a 100-dim synthetic function, a 19-dim photonic band-pass filter, and a 57-dim CMOS ring oscillator. Our model can well capture the high-dimensional stochastic output with only 100-600 samples.

Compared with our conference paper [1], this manuscript presents the following additional results:
• The detailed implementation of the proposed method, including both the compact tensor regression solver and the adaptive sampling procedure (Sections III and IV).
• The post-processing step of extracting statistical information from the obtained tensor regression model (Section V).
• Enriched experiments (Section VI), including a demonstrative synthetic example and detailed comparisons with other methods.

II. NOTATION AND PRELIMINARIES

Throughout this paper, a scalar is represented by a lowercase letter, e.g., x ∈ R; a vector or matrix is represented by a boldface lowercase or capital letter respectively, e.g., x ∈ R^n and X ∈ R^{m×n}. A tensor, which describes a multidimensional data array, is represented by a bold calligraphic letter, e.g., X ∈ R^{n_1×n_2×···×n_d}. The (i_1, i_2, ···, i_d)-th element of a tensor X is denoted x_{i_1 i_2···i_d}. Obviously X reduces to a matrix X when d = 2, and its elements are x_{i_1 i_2}. In this section, we briefly introduce the background of generalized polynomial chaos (gPC) and tensor computation.

A. Generalized Polynomial Chaos Expansion

Let ξ = [ξ_1, ..., ξ_d] ∈ R^d be a random vector describing fabrication process variations with mutually independent components. We aim to estimate a performance metric of interest y(ξ) (e.g., chip frequency or power) under such uncertainty. We assume that y(ξ) has a finite variance under the process variations. A truncated gPC expansion approximates y(ξ) as a summation of orthonormal basis functions [4]:

  y(ξ) ≈ ŷ(ξ) = Σ_{α∈Θ} c_α Ψ_α(ξ),  (1)

where α ∈ N^d is an index vector in the index set Θ, c_α is a coefficient, and Ψ_α is a polynomial basis function of degree |α| = α_1 + α_2 + ··· + α_d. One of the most commonly used index sets is the total-degree set, which selects multivariate polynomials up to a total degree p, i.e.,

  Θ = {α | α_k ∈ N, 0 ≤ Σ_{k=1}^{d} α_k ≤ p},  (2)

leading to a total of (d+p)!/(d!p!) expansion terms. Let φ^{(k)}_{α_k}(ξ_k) denote the order-α_k univariate basis of the k-th random parameter ξ_k; the multivariate basis is constructed by taking the product of univariate orthonormal polynomial bases:

  Ψ_α(ξ) = ∏_{k=1}^{d} φ^{(k)}_{α_k}(ξ_k).  (3)

Therefore, given the joint probability density function ρ(ξ), the multivariate basis satisfies the orthonormality condition

  ⟨Ψ_α(ξ), Ψ_β(ξ)⟩ = ∫_{R^d} Ψ_α(ξ) Ψ_β(ξ) ρ(ξ) dξ = δ_{α,β}.  (4)

The detailed formulation and construction of univariate basis functions can be found in [4, 43].

In order to estimate the unknown coefficients c_α, several popular methods can be used, including intrusive (i.e., non-sampling) methods (e.g., stochastic Galerkin [44] and stochastic testing [6]) and non-intrusive (i.e., sampling) methods (e.g., stochastic collocation based on pseudo-projection or regression [45]). It is well known that the gPC expansion suffers from the curse of dimensionality: the computational cost grows exponentially as the dimension of ξ increases.
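As a concrete illustration (ours, not code from the paper), the following minimal numpy sketch evaluates the univariate bases and one multivariate basis Ψ_α(ξ) of Eq. (3), assuming uniform parameters on [-1, 1] so that the orthonormal family is the normalized Legendre polynomials; all function names are illustrative.

```python
import numpy as np

def legendre_orthonormal(x, p):
    """Evaluate phi_0(x), ..., phi_p(x): Legendre polynomials normalized
    to be orthonormal w.r.t. the uniform density on [-1, 1]."""
    P = np.zeros(p + 1)
    P[0] = 1.0
    if p >= 1:
        P[1] = x
    for n in range(1, p):
        # three-term recurrence: (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}
        P[n + 1] = ((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1)
    # E[P_n^2] = 1/(2n+1) under the uniform density, hence the scaling
    return P * np.sqrt(2 * np.arange(p + 1) + 1)

def multivariate_basis(xi, alpha, p):
    """Psi_alpha(xi) = prod_k phi_{alpha_k}(xi_k), Eq. (3)."""
    return np.prod([legendre_orthonormal(xi_k, p)[a_k]
                    for xi_k, a_k in zip(xi, alpha)])
```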

B. Tensor and Tensor Decomposition

Given two tensors X and Y ∈ R^{n_1×n_2×···×n_d}, their inner product is defined as

  ⟨X, Y⟩ := Σ_{i_1···i_d} x_{i_1···i_d} y_{i_1···i_d}.  (5)

A tensor X can be unfolded into a matrix along the k-th mode/dimension, denoted Unfold_k(X) := X_{(k)} ∈ R^{n_k × n_1···n_{k−1}n_{k+1}···n_d}. Conversely, folding the k-mode matricization back to the original tensor is denoted Fold_k(X_{(k)}) := X.

Given a d-dimensional tensor, it can be factorized as a summation of rank-1 terms, which is called the CANDECOMP/PARAFAC (CP) decomposition [46]:

  X = Σ_{r=1}^{R} a_r^{(1)} ∘ a_r^{(2)} ∘ ··· ∘ a_r^{(d)} = [[A^{(1)}, A^{(2)}, ..., A^{(d)}]],  (6)

where ∘ denotes the outer product. The last term is the Kruskal form, where the factor matrix A^{(k)} = [a_1^{(k)}, ..., a_R^{(k)}] ∈ R^{n_k×R} collects all vectors associated with the k-th dimension. The smallest R that ensures the above equality is called the CP rank. The k-th mode unfolding matrix X_{(k)} can be written with CP factors as

  X_{(k)} = A^{(k)} (A^{(\k)})^T with A^{(\k)} = A^{(d)} ⊙ ··· ⊙ A^{(k+1)} ⊙ A^{(k−1)} ⊙ ··· ⊙ A^{(1)},  (7)

where ⊙ denotes the Khatri-Rao product, which performs column-wise Kronecker products [46]. More details of tensor operations can be found in [46].
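The unfolding identity (7) is easy to check numerically. Below is a small sketch (ours); `khatri_rao` is a hypothetical helper written so that later factors in the list vary fastest, matching the column-wise Kronecker convention of [46].

```python
import numpy as np

def khatri_rao(mats):
    """Column-wise Kronecker product; later matrices vary fastest."""
    out = mats[0]
    for A in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, A).reshape(-1, A.shape[1])
    return out

rng = np.random.default_rng(0)
dims, R = [3, 4, 5], 2
A = [rng.standard_normal((n, R)) for n in dims]   # CP factors, Eq. (6)
X = np.einsum('ir,jr,kr->ijk', *A)                # assemble the full tensor

k = 1                                             # verify Eq. (7) for mode k
X_k = np.reshape(np.moveaxis(X, k, 0), (dims[k], -1), order='F')
A_rest = khatri_rao([A[j] for j in reversed(range(len(dims))) if j != k])
assert np.allclose(X_k, A[k] @ A_rest.T)
```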

III. PROPOSED TENSOR REGRESSION METHOD

A. Low-Rank Tensor Regression Formulation

To approximate y(ξ) with a tensor regression model, we choose a full tensor-product index set for the gPC expansion:

  Θ = {α = [α_1, α_2, ···, α_d] | 0 ≤ α_k ≤ p, ∀k ∈ [1, d]}.  (8)

This specifies a gPC expansion with (p+1)^d basis functions. Let i_k = α_k + 1; then we can define two d-dimensional tensors X and B(ξ) with their (i_1, i_2, ···, i_d)-th elements given by

  x_{i_1 i_2···i_d} = c_α and b_{i_1 i_2···i_d}(ξ) = Ψ_α(ξ).  (9)

Fig. 1. Visualization of the tensor rank determination. Here the gray vectors denote some shrinking tensor factors that can be removed from a CP decomposition.

Combining Eqs. (1), (8) and (9), the truncated gPC expansion can be written as a tensor inner product:

  y(ξ) ≈ ŷ(ξ) = ⟨X, B(ξ)⟩.  (10)

The tensor B(ξ) ∈ R^{(p+1)×···×(p+1)} is a rank-1 tensor that can be exactly represented as

  B(ξ) = φ^{(1)}(ξ_1) ∘ φ^{(2)}(ξ_2) ∘ ··· ∘ φ^{(d)}(ξ_d),  (11)

where φ^{(k)}(ξ_k) = [φ_0^{(k)}(ξ_k), ···, φ_p^{(k)}(ξ_k)]^T ∈ R^{p+1} collects all univariate basis functions of the random parameter ξ_k up to order p.

The unknown coefficient tensor X has (p+1)^d variables in total, but we can describe it via a rank-R CP approximation:

  X ≈ Σ_{r=1}^{R} u_r^{(1)} ∘ u_r^{(2)} ∘ ··· ∘ u_r^{(d)} = [[U^{(1)}, U^{(2)}, ..., U^{(d)}]].  (12)

This decreases the number of unknown variables to (p+1)dR, which depends only linearly on d and thus effectively overcomes the curse of dimensionality.

Our goal is to compute the coefficient tensor X given a set of data samples {ξ_n, y(ξ_n)}_{n=1}^{N} by solving the following optimization problem:

  min_{ {U^{(k)}}_{k=1}^{d} }  h(X) = (1/2) Σ_{n=1}^{N} ( y_n − ⟨[[U^{(1)}, U^{(2)}, ..., U^{(d)}]], B_n⟩ )²,  (13)

where y_n = y(ξ_n), B_n = B(ξ_n), and ξ_n denotes the n-th sample.
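Because each B_n is rank-1, the inner product in (13) can be evaluated without ever forming a (p+1)^d array, since ⟨[[U^{(1)}, ..., U^{(d)}]], B(ξ)⟩ = Σ_r ∏_k φ^{(k)}(ξ_k)^T u_r^{(k)} (the same contraction as in Eq. (25) below). A minimal sketch of this evaluation, under our own naming:

```python
import numpy as np

def surrogate_value(U, phis):
    """<[[U^(1),...,U^(d)]], B(xi)> = sum_r prod_k phi^(k)(xi_k)^T u_r^(k)."""
    prod = np.ones(U[0].shape[1])       # one entry per rank-1 term
    for Uk, phik in zip(U, phis):
        prod *= phik @ Uk               # phi^(k)(xi_k)^T U^(k), shape (R,)
    return prod.sum()

def empirical_loss(U, sample_bases, y):
    """h(X) of Eq. (13); sample_bases[n] holds the d univariate basis
    vectors phi^(k)(xi_k^n) of the n-th sample."""
    resid = np.array([yn - surrogate_value(U, phis)
                      for phis, yn in zip(sample_bases, y)])
    return 0.5 * np.sum(resid ** 2)
```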

B. Automatic Rank Determination

The low-rank approximation (12) assumes that X can be well approximated by R rank-1 terms. In practice, it is hard to determine R in advance. In this work, we leverage a group-sparsity regularization function to shrink the tensor rank from an initial estimate. Specifically, define the following vector:

  v := [v_1, v_2, ···, v_R] with v_r = ( Σ_{k=1}^{d} ‖u_r^{(k)}‖_2² )^{1/2}, ∀r ∈ [1, R].  (14)

We further use its ℓ_q norm with q ∈ (0, 1] to measure the sparsity of v:

  g(X) = ‖v‖_q, q ∈ (0, 1].  (15)

This function groups the factors of each rank-1 term together and enforces sparsity among the R groups. The rank is reduced when the r-th columns of all factor matrices are driven to zero. When q = 1, this method degenerates to a group lasso, and a smaller q leads to a stronger shrinkage force.

Based on this rank-shrinkage function, we compute the tensor-structured gPC coefficients by solving a regularized tensor regression problem:

  min_{ {U^{(k)}}_{k=1}^{d} }  f(X) = h(X) + λ g(X),  (16)

where λ > 0 is a regularization parameter. As shown in Fig. 1, after solving this optimization problem, some columns with the same column indices among all matrices U^{(k)} are close to zero. These columns can be deleted, and the actual rank of the obtained tensor becomes R̂ ≤ R, where R̂ is the number of remaining columns in each factor matrix.
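A sketch (ours) of the group norms v_r of Eq. (14), the regularizer of Eq. (15), and the column-pruning step that realizes the rank shrinkage; the tolerance is an illustrative choice:

```python
import numpy as np

def group_norms(U):
    """v_r of Eq. (14): joint l2 norm of the r-th columns of all factors."""
    return np.sqrt(sum(np.sum(Uk ** 2, axis=0) for Uk in U))

def rank_regularizer(U, q=0.5):
    """g(X) = ||v||_q of Eq. (15), q in (0, 1]."""
    v = group_norms(U)
    return np.sum(v ** q) ** (1.0 / q)

def shrink_rank(U, tol=1e-6):
    """Delete rank-1 terms whose group norm was driven (near) zero."""
    keep = group_norms(U) > tol
    return [Uk[:, keep] for Uk in U]
```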

C. A More Tractable Regularization

It is non-trivial to minimize f(X) since g(X) is non-differentiable and non-convex with respect to the U^{(k)}'s. Therefore, we replace the regularization function with a more tractable one based on the following variational equality.

Lemma 1 (Variational equality [47]). Let α ∈ (0, 2] and β = α/(2−α). For any vector y ∈ R^p, the following equality holds:

  ‖y‖_α = min_{η ∈ R^p_+}  (1/2) Σ_{r=1}^{p} y_r²/η_r + (1/2)‖η‖_β,  (17)

where the minimum is uniquely attained at η_r = |y_r|^{2−α} ‖y‖_α^{α−1}, r = 1, 2, ..., p.

Proof. See Appendix A.

If we take p = R, α = q, and y_r = v_r (defined in (14)) on the right-hand side of Eq. (17), then we have

  g(X, η) = (1/2) Σ_{r=1}^{R} v_r²/η_r + (1/2)‖η‖_{q/(2−q)}.  (18)

The original rank-shrinking function (15) is equivalent to

  g(X) = min_{η ∈ R^R_+} g(X, η).  (19)

As a result, we solve the following optimization problem as an alternative to (16):

  min_{ {U^{(k)}}_{k=1}^{d}, η }  f(X) = h(X) + λ g(X, η).  (20)

D. A Block Coordinate Descent Solver for Problem (20)

Now we present an alternating minimization solver for Problem (20). Specifically, we decompose Problem (20) into (d+1) sub-problems with respect to {U^{(k)}}_{k=1}^{d} and η, and we can obtain the analytical solution to each sub-problem.
• U^{(k)}-subproblem: Fixing η and all tensor factors except U^{(k)}, the variational equality induces a convex subproblem. Based on the first-order optimality condition, U^{(k)} can be updated analytically:

  vec(U^{(k)}) = (Φ^T Φ + λΛ)^{−1} Φ^T y,  (21)

where Φ = [Φ_1, ···, Φ_N]^T ∈ R^{N×R(p+1)} has rows Φ_n^T = vec(B^n_{(k)} U^{(\k)})^T for any n ∈ [1, N], Λ = diag(1/η_1, ..., 1/η_R) ⊗ I ∈ R^{R(p+1)×R(p+1)}, and y ∈ R^N is the collection of output simulation samples. Here ⊗ denotes a Kronecker product, U^{(\k)} is the series of Khatri-Rao products defined in Eq. (7), and B^n_{(k)} is the k-th mode matricization of the tensor B_n = B(ξ_n). For simplicity, we leave the derivation of Eq. (21) to Appendix B.

• η-subproblem: With {U^{(k)}}_{k=1}^{d} fixed, the η-subproblem takes the form of (19). According to Lemma 1, we update η as

  η_r = (v_r)^{2−q} ‖v‖_q^{q−1} + ε,  (22)

where ε > 0 is a small scalar to avoid numerical issues. If the tensor rank is reduced in the optimization, i.e., u_r^{(k)} = 0, ∀k ∈ [1, d], then η_r would become zero without ε. A numerical sketch of both updates is given below.
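The two analytical updates can be sketched as follows (our illustration; the rows Φ_n are built via the rank-1 structure derived in Appendix B, and the column-major vec ordering is an assumption made so that Φ and Λ are consistent):

```python
import numpy as np

def update_eta(U, q, eps=1e-10):
    """eta-update of Eq. (22): eta_r = v_r^(2-q) * ||v||_q^(q-1) + eps."""
    v = np.sqrt(sum(np.sum(Uk ** 2, axis=0) for Uk in U))   # Eq. (14)
    v_q = np.sum(v ** q) ** (1.0 / q)                       # ||v||_q
    return v ** (2 - q) * v_q ** (q - 1) + eps

def build_phi_rows(k, U, sample_bases):
    """Rows Phi_n^T = vec(B^n_(k) U^(\\k))^T, computed without forming the
    huge unfolding: B~^n_(k) = phi^(k)(xi_k^n) w_n^T, where w_n is the
    Hadamard product of phi^(j)(xi_j^n)^T U^(j) over all modes j != k."""
    rows = []
    for phis in sample_bases:              # phis: d univariate basis vectors
        w = np.ones(U[0].shape[1])
        for j, (Uj, phij) in enumerate(zip(U, phis)):
            if j != k:
                w *= phij @ Uj
        Btilde = np.outer(phis[k], w)      # shape (p+1, R)
        rows.append(Btilde.flatten(order='F'))  # column-major vec
    return np.asarray(rows)                # shape (N, R*(p+1))

def update_factor(k, U, sample_bases, y, lam, eta):
    """U^(k)-update of Eq. (21): a ridge-type linear solve."""
    p1, R = U[k].shape
    Phi = build_phi_rows(k, U, sample_bases)
    Lam = np.kron(np.diag(1.0 / eta), np.eye(p1))   # Λ = diag(1/η) ⊗ I
    u = np.linalg.solve(Phi.T @ Phi + lam * Lam, Phi.T @ y)
    return u.reshape(R, p1).T               # back to a (p+1, R) factor
```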

E. Discussions

We would like to highlight a few key points regarding practical implementation.
• The solution depends on the initialization. In the first iteration of updating the k-th factor matrix, we suggest the following initialization:

  Φ = [Φ_1, Φ_2, ..., Φ_N]^T with Φ_n = vec(O_{n,k})^T, ∀n ∈ [1, N], and O_{n,k} = [φ^{(k)}(ξ_k^n), ..., φ^{(k)}(ξ_k^n)] ∈ R^{(p+1)×R},  (23)

where ξ_k^n is the k-th variable of the sample ξ_n, φ^{(k)}(ξ_k^n) ∈ R^{p+1} collects all univariate basis functions of ξ_k^n up to degree p, and O_{n,k} stores R copies of φ^{(k)}(ξ_k^n). Besides, in the first iteration without adaptive sampling, we set η to an all-ones vector multiplied by a scalar factor, since we do not have a good initial guess for {U^{(k)}}_{k=1}^{d}. The value of the scalar factor has little influence as long as it keeps Eq. (21) numerically stable. In the adaptive sampling setting (see Section IV), we need to solve (20) again after adding new samples. In this case, we use a warm-start initialization, setting the initial guess of {U^{(k)}}_{k=1}^{d} to the solution obtained in the last sampling round; η can then be initialized via Eq. (22).

• The regularization parameter λ controls the strength of the rank shrinkage. To adaptively balance the empirical loss and the rank-shrinkage term, we suggest an iterative update of the parameter:

  λ = λ_0 max(η),  (24)

where λ_0 is chosen via cross validation.
• We stop the block coordinate descent solver for problem (20) when the update of the factor matrices {U^{(k)}}_{k=1}^{d} falls below a predefined threshold, or when the algorithm reaches a predefined maximal number of iterations.

The overall algorithm, including the adaptive sampling procedure that will be introduced in Section IV, is summarized in Alg. 1.

Algorithm 1: Overall Adaptive Tensor Regression
Input: Initial sample pairs {ξ_n, y(ξ_n)}_{n=1}^{N}, univariate polynomial order p, initial tensor rank R
Output: Constructed surrogate model [Eq. (25)]
while adaptive sampling does not stop do
    Construct the basis tensor B(ξ)
    if no additional samples then
        Initialize with Eq. (23)
    else
        Initialize {U^{(k)}}_{k=1}^{d} with the last solution
    while tensor regression does not stop do
        for k = 1, 2, ..., d do
            Update U^{(k)} via Eq. (21)
        Update η via Eq. (22)
        Update regularization parameter λ via Eq. (24)
    Shrink the tensor rank to R̂ if possible
    Select new sample pairs based on Alg. 2

After solving for the factor matrices {U^{(k)}}_{k=1}^{d}, i.e., the coefficient tensor X, the surrogate at a sample ξ can be efficiently evaluated as

  ŷ(ξ) = ⟨X, B(ξ)⟩ = Σ_{r=1}^{R̂} ∏_{k=1}^{d} [φ^{(k)}(ξ_k)]^T u_r^{(k)}.  (25)

In this work, the tensor X is approximated by a low-rank CP decomposition. It is also possible to use other kinds of tensor decompositions; in those cases, although the tensor ranks are defined differently, the idea of enforcing group sparsity over the tensor factors still works. It is also worth noting that (20) can be seen as a generalization of the weighted group lasso. To further exploit the sparsity structure of the gPC coefficients, many variants can be developed from the statistical regression perspective, including the sparse group lasso, tensor-structured Elastic-Net regression, and so forth [48].

IV. ADAPTIVE SAMPLING APPROACH

Another fundamental question in uncertainty quantification is how to select the parameter samples ξ for simulation. We aim to reduce the simulation cost by selecting only a few informative samples for the detailed device- or circuit-level simulations.

Given a set of initial samples Θ, we design a two-stage method to balance exploration and exploitation in our active sampling process. In the first stage, we estimate the volumes of Voronoi cells via a Monte Carlo method to measure the sampling density in each region. In the second stage, we roughly measure the nonlinearity of y(ξ) at some candidate samples via a Taylor expansion. We choose new samples that are located in a low-density region and at which y(ξ) is highly nonlinear. In our implementation, the initial samples Θ = {ξ_n, y(ξ_n)}_{n=1}^{N} are generated by the Latin Hypercube (LH) sampling method [49]. Specifically, we first generate standard LH samples {ζ_n^{LH}}_{n=1}^{N} in the hypercube [0, 1]^d, and then transform them to the practical parameter space Ω via the inverse transform of the cumulative distribution function.

Fig. 2. An example of a Voronoi diagram on [0, 1]². Each LH sample is a Voronoi cell center. The lower-right cell should be selected in the first stage since it has the largest estimated area (volume).

Generally, the initial sample size of Θ depends on the number of unknowns in the model. Since problem (20) is regularized and solved via an alternating solver, given a limited simulation budget, we set the initial size N to be smaller than the number of unknowns in our examples.

A. Exploration: Volume Estimation of Voronoi Cells

Firstly, we employ an exploration step via a space-filling sequential design. Given the existing sample set Θ, the sample density in Ω can be estimated via a Voronoi diagram [50]. Specifically, each sample ξ_n corresponds to a Voronoi cell C_n ⊂ Ω that contains all points that lie closer to ξ_n than to any other sample in Ω. The Voronoi diagram is the complete set of cells that tessellate the whole sampling space. The volume of a cell reflects its sample density: a larger volume means that the cell region is less sampled.

Here we provide a formal description of the Voronoi cell. Given two distinct samples ξ_i, ξ_j ∈ Ω, there always exists a half-plane hp(ξ_i, ξ_j) that contains all points at least as close to ξ_i as to ξ_j:

  hp(ξ_i, ξ_j) = {ξ ∈ R^d | ‖ξ − ξ_i‖ ≤ ‖ξ − ξ_j‖}.  (26)

The Voronoi cell C_i is defined as the intersection of all half-planes hp(ξ_i, ξ_j), ∀ξ_j ∈ Ω \ {ξ_i}:

  C_i = ∩_{ξ_j ∈ Ω\{ξ_i}} hp(ξ_i, ξ_j).  (27)

It is intractable to construct a precise Voronoi diagram and calculate the volumes exactly in a high-dimensional space. Fortunately, we do not need the exact Voronoi diagram: we only need to estimate the volume of each cell in order to measure its sample density, and this can be done via a Monte Carlo method.

Observation 1. In order to detect the least-dense region in Ω, we can either estimate the density of Ω directly, or estimate the density in the hypercube [0, 1]^d and then transform it to Ω. In the Monte-Carlo-based density estimation, it is fairer to choose the latter.

Example. One simple example is given in Appendix C.

Based on the above observation, we estimate the volumes of the Voronoi cells in the hypercube. Let the existing LH samples {ζ_n^{LH}}_{n=1}^{N} be the cell centers {C_n}_{n=1}^{N}. We first randomly generate M Monte Carlo samples {ψ_m}_{m=1}^{M} ⊂ [0, 1]^d. For each random sample, we calculate its Euclidean distance to the cell centers and assign it to the closest one. The volume of a cell, vol(C_n), is then estimated by counting the number of assigned random samples. The cell with the largest estimated volume is the least sampled. The MC samples assigned to this cell are collected in a set Γ. A simple example illustrating the first-round search is shown in Fig. 2. After transforming all Monte Carlo samples in Γ back to the actual parameter space Ω via inverse transform sampling, we obtain a set of candidates for the next-stage selection, denoted Γ_Ω.

The accuracy of the volume estimation depends on the number of random samples. Clearly, more Monte Carlo samples estimate the volumes more accurately, but they also incur more computational burden. As suggested by [51], to achieve a good estimation accuracy, we use M Monte Carlo samples with M = 100N.
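The first stage thus amounts to a nearest-center assignment of uniform Monte Carlo points, as in the following sketch (ours; the O(MN) pairwise-distance array is acceptable here because M depends only on N, not on the dimension d):

```python
import numpy as np

def explore_candidates(centers, rng=None):
    """Stage 1: estimate Voronoi cell volumes in [0,1]^d by Monte Carlo and
    return the MC points landing in the largest (least-sampled) cell."""
    rng = rng or np.random.default_rng()
    N, d = centers.shape
    M = 100 * N                              # M = 100N, as suggested by [51]
    psi = rng.random((M, d))                 # uniform MC samples in the cube
    # nearest-center assignment via squared Euclidean distances
    d2 = ((psi[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    owner = d2.argmin(axis=1)
    volumes = np.bincount(owner, minlength=N)   # counts ∝ cell volumes
    return psi[owner == volumes.argmax()]       # candidate set Γ (in the cube)
```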

B. Exploitation: Nonlinearity Measurement

In the second stage, we perform an exploitation search over the candidate sample set Γ_Ω. Based on the assumption that a region with a more nonlinear response is harder to capture, we choose the nonlinearity of the target function as the second-stage criterion. The first-order Taylor expansion of a function becomes less accurate as the function becomes more nonlinear. Therefore, given a sample ξ, we measure the nonlinearity of y(ξ) via the difference between the function and its first-order Taylor expansion around the closest Voronoi cell center a ∈ Ω [52]. We do not know the exact expression of y(ξ), but we have already built a surrogate model ŷ(ξ) from previous simulation samples. Therefore, the nonlinearity of y(ξ) can be roughly estimated as

  γ(ξ) = |ŷ(ξ) − ŷ(a) − ∇ŷ(a)^T (ξ − a)|.  (28)

Notice that this nonlinearity measure does not indicate the accuracy of the surrogate model, since no simulation values are used here. In the second stage, we choose the sample ξ* with the largest γ(ξ) from the candidate set Γ_Ω:

  ξ* = argmax_{ξ ∈ Γ_Ω} γ(ξ).  (29)
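A sketch (ours) of this second stage; `y_hat` and `grad_y_hat` stand for the surrogate of Eq. (25) and its gradient, which are assumed to be supplied (e.g., evaluated analytically from the CP factors):

```python
import numpy as np

def exploit_select(candidates, centers, y_hat, grad_y_hat):
    """Stage 2: pick the candidate with the largest first-order-Taylor
    residual gamma(xi), Eqs. (28)-(29)."""
    gammas = []
    for xi in candidates:
        # closest existing cell center a
        a = centers[((centers - xi) ** 2).sum(axis=1).argmin()]
        gammas.append(abs(y_hat(xi) - y_hat(a) - grad_y_hat(a) @ (xi - a)))
    return candidates[int(np.argmax(gammas))]
```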

To summarize, we select the most nonlinear sample from the least-sampled cell, which provides a good trade-off between exploration and exploitation. Based on the above, the adaptive sampling procedure is summarized in Alg. 2.

Algorithm 2: Adaptive Sampling Procedure
Input: Initial sample pairs Θ = {ξ_n, y(ξ_n)}_{n=1}^{N}
Output: Sample pairs Θ* with the additional sample
Uniformly generate M = 100N Monte Carlo samples {ψ_m}_{m=1}^{M} ⊂ [0, 1]^d
for m = 1, 2, ..., M do
    Find the cell center C_n closest to ψ_m
    vol(C_n) ← vol(C_n) + 1
Find the cell with the largest estimated volume and the sample set Γ assigned to this cell
Γ_Ω ← InverseTransformSampling(Γ)
Calculate the nonlinearity measure γ(Γ_Ω) via Eq. (28)
Select ξ* according to Eq. (29)
Θ* ← Θ ∪ {ξ*, y(ξ*)}

C. Discussion

The proposed adaptive sampling method can easily be extended to a batch version by searching for the top-K least-sampled regions in the first stage. We can stop sampling when we exceed the sampling budget or when the constructed surrogate model achieves the desired accuracy.

The sampling criteria do not rely on the structure of the targeted surrogate model; therefore, the proposed sampling method is flexible and generic. It is particularly suitable for constructing a high-dimensional polynomial model for two reasons. Firstly, the number of samples required to estimate the Voronoi cell volumes depends not on the parameter dimensionality but on the number of existing samples. Secondly, the derivative and the nonlinearity measure of the surrogate model are easy to compute.

Some variants of the proposed sampling method may be further developed. For instance, we may define a score function combining the estimated volume and the nonlinearity measure, calculate the score for each Monte Carlo sample, and select the best one. In the batch version, we may also select several top nonlinear samples from the same Voronoi cell.

V. STATISTICAL INFORMATION EXTRACTION

Based on the obtained tensor regression model ŷ(ξ) = Σ_{α∈Θ} c_α Ψ_α(ξ) = ⟨X, B(ξ)⟩, we can easily extract important statistical information such as moments and Sobol' indices.
• Moment information. The mean µ of the constructed ŷ(·) is the coefficient of the zero-order basis Ψ_0(ξ):

  µ = c_0 = x_{11···1} = Σ_{r=1}^{R̂} u_r^{(1)}(1) u_r^{(2)}(1) ··· u_r^{(d)}(1).  (30)

Here x_{11···1} is the (1, 1, ···, 1)-th element of the tensor X, and u_r^{(k)}(1) denotes the first element of the vector u_r^{(k)}. The variance of ŷ(ξ) can be estimated as

  σ² = Σ_{α∈Θ, α≠0} c_α² = ⟨X, X⟩ − x²_{11···1} = Σ_{r_1=1}^{R̂} Σ_{r_2=1}^{R̂} ∏_{k=1}^{d} u_{r_1}^{(k)T} u_{r_2}^{(k)} − µ².  (31)

• Sobol’ indices. Based on the obtained model, we can alsoextract the Sobol’ indices [53, 54] for global sensitivity

analysis. The main sensitivity index Sj measures the contri-bution by random parameter ξj along to the variance y(ξ):

Sj =Var [E [y(ξ)|ξj ]]

σ2(32)

where E[ŷ(ξ) | ξ_j] denotes the conditional expectation of ŷ(ξ) over all random variables except ξ_j. The variance of this conditional expectation can be estimated as

  Var[E[ŷ(ξ) | ξ_j]] = Σ_{i_j=2}^{p+1} x²_{1···i_j···1} = Σ_{i_j=2}^{p+1} ( Σ_{r=1}^{R̂} u_r^{(j)}(i_j) ∏_{k≠j} u_r^{(k)}(1) )².  (33)

The total sensitivity index T_j measures the contribution to the variance of ŷ(ξ) from the variable ξ_j and its interactions with all other variables:

  T_j = 1 − Var[E[ŷ(ξ) | ξ_{\j}]] / σ².  (34)

Here ξ_{\j} includes all elements of ξ except ξ_j. The involved variance of a conditional expectation is estimated as

  Var[E[ŷ(ξ) | ξ_{\j}]] = Σ_{(i_1,···,i_d): i_j=1} x²_{i_1···i_d} − x²_{11···1} = Σ_{r_1=1}^{R̂} Σ_{r_2=1}^{R̂} u_{r_1}^{(j)}(1) u_{r_2}^{(j)}(1) ∏_{k≠j} u_{r_1}^{(k)T} u_{r_2}^{(k)} − µ².  (35)

Similarly, any higher-order index representing the effect of the interaction among a set of variables can also be expressed in an analytical form; a small numerical sketch of Eqs. (30)-(33) follows.
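The moment and main-index formulas reduce to small dense operations on the factor matrices. A minimal numpy sketch (ours) of Eqs. (30)-(31) and (32)-(33):

```python
import numpy as np

def moments_from_cp(U):
    """Mean and variance from the CP factors, Eqs. (30)-(31);
    relies on the orthonormality of the gPC basis."""
    mu = np.prod([Uk[0, :] for Uk in U], axis=0).sum()      # Eq. (30)
    G = np.ones((U[0].shape[1],) * 2)
    for Uk in U:
        G *= Uk.T @ Uk          # Hadamard product of R x R Gram matrices
    var = G.sum() - mu ** 2                                 # Eq. (31)
    return mu, var

def main_sobol_index(U, j):
    """Main sensitivity index S_j, Eqs. (32)-(33)."""
    _, var = moments_from_cp(U)
    w = np.ones(U[0].shape[1])
    for k, Uk in enumerate(U):
        if k != j:
            w *= Uk[0, :]       # prod over k != j of u_r^(k)(1)
    return np.sum((U[j][1:, :] @ w) ** 2) / var             # Eq. (33)
```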

VI. NUMERICAL RESULTS

In this section, we verify the proposed tensor-regression uncertainty quantification method on one synthetic function and two photonic/electronic IC benchmarks.

A. Baseline Methods for Comparison

We compare our proposed method with the following approaches.
• Tensor regression with adaptive sampling based only on space exploration, as introduced in Section IV-A (denoted Space).
• Tensor regression with adaptive sampling based only on exploiting nonlinearity, as introduced in Section IV-B (denoted Nonlinear).
• Tensor regression with random sampling (denoted Rand). In each iteration of adding samples, new samples are simply selected at random.
• Fixed-rank tensor regression (denoted Fixed rank). This method uses a tensor ridge regularization in the regression objective function [35]:

  min_{ {U^{(k)}}_{k=1}^{d} }  f(X) = h(X) + λ Σ_{k=1}^{d} ‖U^{(k)}‖_F².  (36)

Fig. 3. Results of approximating the synthetic function. (a) Testing error on 10^5 MC samples. (b) The estimated rank. (c) Probability density functions of the function value.

Fig. 4. Sensitivity analysis of the synthetic function in (38). The proposed method fits the results from Monte Carlo [53] with 10^7 simulations very well.

TABLE I
MODEL COMPARISONS ON THE SYNTHETIC FUNCTION

Method        Sample #   Variable #   Mean      Std    Testing error
Monte Carlo   10^5       N/A          -162.95   4.80   N/A
Sparse gPC    380        5151         -163.04   2.27   2.3%
Fixed rank    380        15x100       -163.24   5.02   1.01%
Proposed      380        3x100*       -162.93   4.86   0.37%

* In the alternating solver, there are 100 subproblems with 3 unknown variables each (the rank has been shrunk).

The standard ridge regression does not induce a sparse structure; we keep the tensor rank fixed when solving Eq. (36).

• Sparse gPC expansion with a total-degree truncation [55] (denoted Sparse gPC). With the truncation scheme in Eq. (2), we compute the gPC coefficients by solving the following problem:

  min_c  (1/2) Σ_{n=1}^{N} ( y_n − Σ_{α∈Θ} c_α Ψ_α(ξ_n) )² + λ‖c‖_1.  (37)

B. Synthetic Function (100-dim)

We first consider the following high-dimensional analytical function [56]:

  y(ξ) = 3 − (5/d) Σ_{k=1}^{d} k ξ_k + (1/d) Σ_{k=1}^{d} k ξ_k³ + ξ_1 ξ_2² + ξ_2 ξ_4 − ξ_3 ξ_5 + ξ_{51} + ξ_{50} ξ_{54}² + ln( (1/(3d)) Σ_{k=1}^{d} k (ξ_k² + ξ_k⁴) ),  (38)

where the dimension d = 100, ξ_20 ∼ U([1, 3]), and ξ_k ∼ U([1, 2]) for k ≠ 20. We aim to approximate y(ξ) with a tensor-regression gPC model and to perform sensitivity analysis.

Assume that we use 2nd-order univariate basis functions for each random variable; then we need 3^100 multivariate basis functions in total. To approximate the coefficient tensor, we initialize it with a rank-5 CP decomposition and use q = 0.5 in the regularization. We initialize the training with 200 Latin-Hypercube samples and adaptively select 9 batches of additional samples, with each batch containing 20 new samples. We test the accuracy of the different models on 10^5 additional samples. Fig. 3 (a) shows the relative ℓ2 testing errors (i.e., ‖ŷ(ξ) − y(ξ)‖_2 / ‖y(ξ)‖_2) of the different methods. The testing errors may not decrease monotonically, since more samples cannot strictly guarantee convergence of the surrogate model. Nevertheless, the figure shows that more training samples generally lead to a better model approximation and that the proposed sampling method outperforms the others. Fig. 3 (b) shows the estimated tensor rank as the number of training samples increases. The proposed method shrinks the tensor rank differently from the other methods while achieving the best performance, which shows that a correct determination of the tensor rank helps the function approximation. Fig. 3 (c) plots the predicted probability density function of our obtained model, estimated via a kernel density estimator; it matches the Monte Carlo simulation result of the original function very well.

We compare the complexity and accuracy of different methods in Table I. We treat the result from 10^5 Monte Carlo simulations as the ground truth. For the other models, the mean and standard deviation are both extracted from the polynomial coefficients. Given the same amount of (limited) training samples, the proposed method achieves the highest approximation accuracy.

Now we perform sensitivity analysis to identify the random variables that are most influential on the output. Fig. 4 plots the main and total sensitivity indices from the proposed method and from a Monte Carlo estimation [53] with 10^7 simulations. With far fewer function evaluations, our proposed method can precisely identify the indices of the most dominant random variables contributing to the output variance.

Fig. 5. Schematic of a band-pass filter with 9 micro-ring resonators.

TABLE II
MODEL COMPARISONS ON THE PHOTONIC BAND-PASS FILTER

Method        Sample #   Variable #   Mean      Std      Error
Monte Carlo   10^5       N/A          21.6511   0.0988   N/A
Sparse gPC    100        210          21.6537   0.0735   0.39%
Fixed rank    100        12x19        21.6677   0.1906   0.52%
Proposed      100        3x19         21.6567   0.0955   0.16%

C. Photonic Band-pass Filter (19-dim)

Now we consider the photonic band-pass filter in Fig. 5. This photonic IC has 9 micro-ring resonators, and it was originally designed to have a 3-dB bandwidth of 20 GHz, a 400-GHz free spectral range, and a 1.55-µm operation wavelength. A total of 19 independent Gaussian random parameters are used to describe the variations of the effective phase index (n_eff) of each ring, as well as the gap (g) between adjacent rings and between the first/last ring and the bus waveguides. We aim to approximate the 3-dB bandwidth f_3dB at the DROP port with a tensor-regression gPC model.

We use 2nd-order univariate polynomial basis functions for each random parameter, giving 3^19 multivariate basis functions in total in the tensor-regression gPC model. We initialize the gPC coefficients as a rank-4 CP tensor decomposition and set q = 0.5 in our regularization. We initialize the training with 60 Latin-Hypercube samples and adaptively select 9 batches of additional samples, with each batch containing 10 new samples. We test the obtained model with 10^5 additional samples. Fig. 6 (a) shows the relative ℓ2 testing errors. The proposed method outperforms the others in the first few adaptive sampling rounds; all models perform similarly once the ranks have all shrunk to 1. Fig. 6 (b) shows the estimated tensor rank as the number of training samples increases. The tensor ranks are shrunk gradually in all cases, but our proposed method finds the best rank with the fewest samples. Fig. 6 (c) plots the predicted probability density function of our obtained result. Since this benchmark has a relatively small standard deviation, the limited approximation error appears as a discrepancy around the peak.

In order to see the influence of the tensor rank initialization, we perform one-shot approximations with different initial tensor ranks R and different regularization parameters λ, as illustrated in Fig. 7. For a specific benchmark, the best-estimated tensor rank highly depends on the number of training samples. Given the limited number of simulation samples, the rank-1 initialization works best in this example, which coincides with the results shown in Fig. 6, where the predicted rank is 1. We also compare the complexity and accuracy of all methods in Table II. The proposed method achieves the best accuracy with limited simulation samples.

D. CMOS Ring Oscillator (57-dim)

We continue with the 7-stage CMOS ring oscillator in Fig. 8. This circuit has 57 random variation parameters, including Gaussian parameters describing the temperature and the variations of threshold voltages and gate-oxide thickness, and uniformly distributed parameters describing the effective gate length/width. We aim to approximate the oscillator frequency under the process variations with a tensor-regression gPC model.

We use 2nd-order univariate basis functions for each random parameter, leading to 3^57 multivariate basis functions in total. We initialize the gPC coefficients as a rank-4 tensor and set q = 0.5 in the regularization term. We initialize the training with 500 Latin-Hypercube samples and adaptively select 300 additional samples in 6 batches. We test the obtained model with 3×10^4 additional samples. Fig. 10 (a) shows the relative ℓ2 testing errors of all methods. The proposed method outperforms the other methods significantly when the number of samples is small. Fig. 10 (b) shows that the estimated tensor rank reduces to 2 in all methods. Fig. 10 (c) plots the predicted probability density function of the obtained tensor regression model, which is indistinguishable from the result of Monte Carlo simulations.

We perform one-shot approximations with different initial tensor ranks R and different regularization parameters λ, as illustrated in Fig. 9. Consistent with the results shown in Fig. 10, a rank-2 model is more suitable in this example. We compare the proposed method with the fixed-rank model and the 2nd-order sparse gPC in Table III, where the proposed compact tensor model achieves the best approximation accuracy.

TABLE III
MODEL COMPARISONS ON THE CMOS RING OSCILLATOR

Method        Sample #   Variable #   Mean      Std      Error
Monte Carlo   3×10^4     N/A          12.7920   0.3829   N/A
Sparse gPC    600        1711         12.7931   0.3777   0.11%
Fixed rank    600        12x57        12.7929   0.3822   0.10%
Proposed      600        6x57         12.7918   0.3830   0.04%

Fig. 9. One-shot approximations for the CMOS ring oscillator with 800 training samples under different ranks and λ. In this example, the rank-2 model works the best in most cases.

Fig. 10. Results of the CMOS ring oscillator. (a) Testing error on 3×10^4 MC samples. (b) The estimated rank. (c) Probability density functions of the oscillator frequency.


Fig. 6. Results of the photonic filter. (a) Testing error on 10^5 MC samples. (b) The estimated rank. (c) Probability density functions of the 3-dB bandwidth f_3dB at the DROP port.

Fig. 7. One-shot approximations for the photonic band-pass filter with 150 training samples under different ranks and λ. The rank-1 initialization works the best in this example.

Fig. 8. Schematic of a CMOS ring oscillator.

VII. CONCLUSION

This paper has proposed a tensor regression framework for quantifying the impact of high-dimensional process variations. Via a low-rank tensor representation, this formulation reduces the number of unknown variables from an exponential function of the parameter dimensionality to a linear one, and therefore works well with a limited simulation budget. We have addressed two fundamental challenges: automatic tensor rank determination and adaptive sampling. The tensor rank is estimated via an ℓq/ℓ2-norm regularization. The simulation samples are chosen by a two-stage adaptive sampling method, which utilizes a Voronoi-cell volume estimation and a nonlinearity measure of the quantity of interest. Our model has been verified on both synthetic and realistic examples with 19 to 100 random parameters. The numerical experiments show that our method can well capture the high-dimensional stochastic performance with much less simulation data.

APPENDIX A
PROOF OF LEMMA 1

We consider two cases: α ∈ (0, 2) [47] and α = 2 [35]. When α ∈ (0, 2), κ(η) := (1/2) Σ_{r=1}^{p} y_r²/η_r + (1/2)‖η‖_β is a continuously differentiable function for any η_i ∈ (0, ∞).

When y_r ≠ 0, lim_{η_r→∞} κ(η) = ∞ and lim_{η_r→0} κ(η) = ∞. Therefore, the infimum of κ(η) exists and is attained. Setting the derivative with respect to η_r (η_r > 0) to zero (first-order optimality), we obtain

  η_r = |y_r|^{2−α} ‖η‖_{α/(2−α)}^{α−1}.  (39)

Raising (39) to the power β = α/(2−α) and summing over r yields ‖η‖_β^β = ‖y‖_α^α ‖η‖_β^{(α−1)β}, i.e., ‖η‖_{α/(2−α)} = ‖y‖_α. Substituting this back into (39) gives the optimal solution η_r = |y_r|^{2−α} ‖y‖_α^{α−1} in Lemma 1. If y_r = 0, the solution to min_{η≥0} κ(η) has η_r = 0, which is also consistent with Lemma 1.

When α = 2, ‖η‖_1 is non-differentiable. Given a scalar y_r, we have |y_r| = y_r²/(2η_r) + η_r/2 only when η_r = |y_r| (we let y_r²/(2η_r) = 0 when y_r = η_r = 0). Similarly, given a vector y ∈ R^p, we have ‖y‖_1 = (1/2) Σ_{r=1}^{p} y_r²/η_r + (1/2)‖η‖_1 only when η = |y|, which is also consistent with Lemma 1.
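A quick numerical spot-check of Lemma 1 (ours): plugging the claimed minimizer into the right-hand side of (17) reproduces ‖y‖_α.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(5)
alpha = 0.5
beta = alpha / (2 - alpha)

lhs = np.sum(np.abs(y) ** alpha) ** (1 / alpha)        # ||y||_alpha
eta = np.abs(y) ** (2 - alpha) * lhs ** (alpha - 1)    # claimed minimizer
rhs = 0.5 * np.sum(y ** 2 / eta) + 0.5 * np.sum(eta ** beta) ** (1 / beta)
assert np.isclose(lhs, rhs)   # the attained minimum equals ||y||_alpha
```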

APPENDIX B
DETAILED DERIVATION OF EQ. (21)

Let Λ = diag(1/η_1, ..., 1/η_R) and let ∗ denote a Hadamard product. We can rewrite the objective function of a U^{(k)}-subproblem as

  f_k(U^{(k)}) = (1/2) Σ_{n=1}^{N} [ y_n − ⟨U^{(k)} U^{(\k)T}, B^n_{(k)}⟩ ]² + (λ/2) Σ_{r=1}^{R} ‖u_r^{(k)}‖_2² / η_r
              = (1/2) Σ_{n=1}^{N} [ y_n − Tr( U^{(k)} (B̃^n_{(k)})^T ) ]² + (λ/2) Tr( U^{(k)} Λ U^{(k)T} ),

with B̃^n_{(k)} = B^n_{(k)} U^{(\k)}. When the dimension d is large, it is intractable to store and compute B^n_{(k)} ∈ R^{(p+1)×(p+1)^{d−1}} or U^{(\k)} ∈ R^{(p+1)^{d−1}×R}. Fortunately, based on the properties of the Khatri-Rao product, we can compute B̃^n_{(k)} as

  B̃^n_{(k)} = B^n_{(k)} U^{(\k)} = φ^{(k)}(ξ_k^n) [ φ^{(d)}(ξ_d^n)^T U^{(d)} ∗ ··· ∗ φ^{(k+1)}(ξ_{k+1}^n)^T U^{(k+1)} ∗ φ^{(k−1)}(ξ_{k−1}^n)^T U^{(k−1)} ∗ ··· ∗ φ^{(1)}(ξ_1^n)^T U^{(1)} ].

Enforcing the first-order optimality condition

  ∂f_k(U^{(k)}) / ∂U^{(k)} = − Σ_{n=1}^{N} [ y_n − Tr( U^{(k)} (B̃^n_{(k)})^T ) ] B̃^n_{(k)} + λ U^{(k)} Λ = 0,

we obtain the analytical solution in Eq. (21).

APPENDIX C
AN EXAMPLE TO SHOW OBSERVATION 1

Suppose we already have two samples {0.2, 0.6} and consider a candidate sample 0.4 in the interval [0, 1] equipped with a uniform distribution. Mapping them through the inverse of the standard Gaussian cumulative distribution function, the corresponding Gaussian-distributed samples are {−0.8416, 0.2533} and −0.2533, respectively. The PDF value at 0.2533 is larger than that at −0.8416 under a standard Gaussian distribution. Hence the candidate sample is equally close to the two existing samples in the uniform space, but it is closer to the one with the higher probability density in the Gaussian space.
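The quantile values above can be reproduced with the inverse Gaussian CDF (a sketch, ours; scipy's norm.ppf is used for Φ^{-1}):

```python
import numpy as np
from scipy.stats import norm

u = np.array([0.2, 0.6])          # existing samples in the uniform cube
c = 0.4                           # candidate sample
g, gc = norm.ppf(u), norm.ppf(c)  # -> [-0.8416, 0.2533] and -0.2533
print(np.abs(u - c))              # equal distances (0.2, 0.2) in uniform space
print(np.abs(g - gc))             # unequal distances in Gaussian space
```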

REFERENCES

[1] Z. He and Z. Zhang, “High-dimensional uncertainty quantifi-cation via active and rank-adaptive tensor regression,” in Proc.Electr. Perform. Electron. Packag. Syst. (EPEPS), 2020, pp. 1–3.

[2] D. S. Boning, K. Balakrishnan, H. Cai, N. Drego, A. Farahanchi,K. M. Gettings, D. Lim, A. Somani, H. Taylor, D. Truque et al.,“Variation,” IEEE Trans. Semicond. Manuf., vol. 21, no. 1, pp.63–71, 2008.

[3] M. H. Kalos and P. A. Whitlock, Monte Carlo methods. JohnWiley & Sons, 2009.

[4] D. Xiu and G. E. Karniadakis, “Modeling uncertainty in flowsimulations via generalized polynomial chaos,” J. Comput.Phys., vol. 187, no. 1, pp. 137–167, 2003.

[5] K. Strunz and Q. Su, “Stochastic formulation of SPICE-typeelectronic circuit simulation with polynomial chaos,” ACMTrans. Model. Comput. Simul., vol. 18, no. 4, pp. 1–23, 2008.

[6] Z. Zhang, T. A. El-Moselhy, I. M. Elfadel, and L. Daniel,“Stochastic testing method for transistor-level uncertainty quan-tification based on generalized polynomial chaos,” IEEE Trans.Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 10, pp.1533–1545, 2013.

11

[7] P. Manfredi, D. V. Ginste, D. De Zutter, and F. G.Canavero, “Stochastic modeling of nonlinear circuits viaSPICE-compatible spectral equivalents,” IEEE Trans. CircuitsSyst. I, Reg. Papers, vol. 61, no. 7, pp. 2057–2065, 2014.

[8] Z. He, W. Cui, C. Cui, T. Sherwood, and Z. Zhang, “Efficientuncertainty modeling for system design via mixed integer pro-gramming,” in Proc. Intl. Conf. Computer Aided Design, 2019,pp. 1–8.

[9] M. Ahadi and S. Roy, “Sparse linear regression (SPLINER)approach for efficient multidimensional uncertainty quantifica-tion of high-speed circuits,” IEEE Trans. Comput.-Aided DesignIntegr. Circuits Syst., vol. 35, no. 10, pp. 1640–1652, 2016.

[10] P. Manfredi and S. Grivet-Talocia, “Rational polynomial chaosexpansions for the stochastic macromodeling of network re-sponses,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67,no. 1, pp. 225–234, 2019.

[11] P. Manfredi, D. V. Ginste, D. De Zutter, and F. G. Canavero,“Generalized decoupled polynomial chaos for nonlinear circuitswith many random parameters,” IEEE Microw. Wireless Com-pon. Lett., vol. 25, no. 8, pp. 505–507, 2015.

[12] A. C. Yucel, H. Bagcı, and E. Michielssen, “An ME-PCenhanced HDMR method for efficient statistical analysis of mul-ticonductor transmission line networks,” IEEE Trans. Compon.Packag. Manuf. Technol., vol. 5, no. 5, pp. 685–696, 2015.

[13] F. Wang, P. Cachecho, W. Zhang, S. Sun, X. Li, R. Kanj,and C. Gu, “Bayesian model fusion: large-scale performancemodeling of analog and mixed-signal circuits by reusing early-stage data,” IEEE Trans. Comput.-Aided Design Integr. CircuitsSyst., vol. 35, no. 8, pp. 1255–1268, 2015.

[14] S. Zhang, W. Lyu, F. Yang, C. Yan, D. Zhou, X. Zeng,and X. Hu, “An efficient multi-fidelity Bayesian optimizationapproach for analog circuit synthesis,” in Proc. Design Autom.Conf, 2019, pp. 1–6.

[15] C. Cui and Z. Zhang, “Stochastic collocation with non-Gaussiancorrelated process variations: Theory, algorithms, and applica-tions,” IEEE Trans. Compon. Packag. Manuf. Technol., vol. 9,no. 7, pp. 1362–1375, 2018.

[16] X. Li, “Finding deterministic solution from underdeterminedequation: large-scale performance variability modeling of ana-log/RF circuits,” IEEE Trans. Comput.-Aided Design Integr.Circuits Syst., vol. 29, no. 11, pp. 1661–1668, 2010.

[17] C. Cui and Z. Zhang, “High-dimensional uncertainty quantifica-tion of electronic and photonic IC with non-Gaussian correlatedprocess variations,” IEEE Trans. Comput.-Aided Design Integr.Circuits Syst., vol. 39, no. 8, pp. 1649–1661, 2019.

[18] M. Ahadi, A. K. Prasad, and S. Roy, “Hyperbolic polynomialchaos expansion (HPCE) and its application to statistical analy-sis of nonlinear circuits,” in Proc. IEEE Workshop Signal PowerIntegr., 2016, pp. 1–4.

[19] X. Ma and N. Zabaras, “An adaptive high-dimensional stochas-tic model representation technique for the solution of stochas-tic partial differential equations,” J. Comput. Phys., vol. 229,no. 10, pp. 3884–3915, 2010.

[20] X. Yang, M. Choi, G. Lin, and G. E. Karniadakis, “AdaptiveANOVA decomposition of stochastic incompressible and com-pressible flows,” J. Comput. Phys., vol. 231, no. 4, pp. 1587–1614, 2012.

[21] T. El-Moselhy and L. Daniel, “Variation-aware interconnectextraction using statistical moment preserving model orderreduction,” in Proc. Design, Autom. Test Eur. Conf. Exhibit.,2010, pp. 453–458.

[22] Z. Zhang, X. Yang, G. Marucci, P. Maffezzoni, I. A. M.Elfadel, G. Karniadakis, and L. Daniel, “Stochastic testingsimulator for integrated circuits and MEMS: Hierarchical andsparse techniques,” in Proc. IEEE Custom Integrated CircuitsConference, 2014, pp. 1–8.

[23] Z. Zhang, X. Yang, I. V. Oseledets, G. E. Karniadakis,and L. Daniel, “Enabling high-dimensional hierarchical uncer-tainty quantification by ANOVA and tensor-train decomposi-

tion,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.,vol. 34, no. 1, pp. 63–76, 2014.

[24] Z. Zhang, T.-W. Weng, and L. Daniel, “Big-data tensor recoveryfor high-dimensional uncertainty quantification of process vari-ations,” IEEE Trans. Compon. Packag. Manuf. Technol., vol. 7,no. 5, pp. 687–697, 2017.

[25] ——, “A big-data approach to handle process variations: Un-certainty quantification by tensor recovery,” in Proc. IEEEWorkshop Signal Power Integr., 2016, pp. 1–4.

[26] P. Rai, “Sparse low rank approximation of multivariatefunctions–applications in uncertainty quantification,” Ph.D. dis-sertation, Ecole Centrale de Nantes (ECN), 2014.

[27] K. Konakli and B. Sudret, “Polynomial meta-models withcanonical low-rank approximations: Numerical insights andcomparison to sparse polynomial chaos expansions,” J. Comput.Phys., vol. 321, pp. 1144–1169, 2016.

[28] M. Chevreuil, R. Lebrun, A. Nouy, and P. Rai, “A least-squaresmethod for sparse low rank approximation of multivariatefunctions,” SIAM/ASA Int. J. Uncertain. Quantif., vol. 3, no. 1,pp. 897–921, 2015.

[29] A. Nouy, Low-rank methods for high-dimensional approxima-tion and model order reduction. Philadelphia: SIAM, 2017,pp. 171–226.

[30] C. J. Hillar and L.-H. Lim, “Most tensor problems are NP-hard,”Journal of the ACM (JACM), vol. 60, no. 6, pp. 1–39, 2013.

[31] X. Shi, H. Yan, Q. Huang, J. Zhang, L. Shi, and L. He, “Meta-model based high-dimensional yield analysis using low-ranktensor approximation,” in Proc. Design Autom. Conf, 2019, pp.1–6.

[32] K. Tang and Q. Liao, “Rank adaptive tensor recovery basedmodel reduction for partial differential equations with high-dimensional random inputs,” J. Comput. Phys., vol. 409, p.109326, 2020.

[33] H. Zhou, L. Li, and H. Zhu, “Tensor regression with applicationsin neuroimaging data analysis,” J. Amer. Statist. Assoc., vol. 108,no. 502, pp. 540–552, 2013.

[34] J. Kossaifi, Z. C. Lipton, A. Kolbeinsson, A. Khanna,T. Furlanello, and A. Anandkumar, “Tensor regression net-works,” J. Mach. Learn. Res., vol. 21, pp. 1–21, 2020.

[35] W. Guo, I. Kotsia, and I. Patras, “Tensor learning for regres-sion,” IEEE Trans. Image Process., vol. 21, no. 2, pp. 816–827,2011.

[36] Q. Zhao, L. Zhang, and A. Cichocki, “Bayesian CP factorizationof incomplete tensors with automatic rank determination,” IEEETrans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1751–1763,2015.

[37] J. Luan and Z. Zhang, “Prediction of multidimensional spatialvariation data via Bayesian tensor completion,” IEEE Trans.Comput.-Aided Design Integr. Circuits Syst., vol. 39, no. 2, pp.547–551, 2019.

[38] C. Li and Z. Sun, “Evolutionary topology search for tensornetwork decomposition,” in Proc. Int. Conf. Mach. Learn., 2020,pp. 5947–5957.

[39] A. Krishnamurthy and A. Singh, “Low-rank matrix and tensorcompletion via adaptive sampling,” in Proc. Adv. Neural Inf.Process. Syst., vol. 26, 2013.

[40] Z. He, B. Zhao, and Z. Zhang, “Active sampling for acceleratedMRI with low-rank tensors,” arXiv preprint arXiv:2012.12496,2020.

[41] R. Guhaniyogi, S. Qamar, and D. B. Dunson, “Bayesian tensorregression,” J. Mach. Learn. Res., vol. 18, no. 1, pp. 2733–2763,2017.

[42] R. Yu, G. Li, and Y. Liu, “Tensor regression meets Gaussianprocesses,” in Proc. Int. Conf. Artif. Intell. Stat. PMLR, 2018,pp. 482–490.

[43] W. Gautschi, “On generating orthogonal polynomials,” SIAM J.Sci. Stat. Comp., vol. 3, no. 3, pp. 289–317, 1982.

[44] R. G. Ghanem and P. D. Spanos, “Stochastic finite elementmethod: Response statistics,” in Stochastic finite elements: a

12

spectral approach, 1991, pp. 101–119.[45] D. Xiu, Stochastic collocation methods: A survey. Cham:

Springer International Publishing, 2016, pp. 1–18.[46] T. G. Kolda and B. W. Bader, “Tensor decompositions and

applications,” SIAM Rev., vol. 51, no. 3, pp. 455–500, 2009.[47] R. Jenatton, G. Obozinski, and F. Bach, “Structured sparse

principal component analysis,” in Proc. Intl. Conf. Artif. Intell.Stat., 2010, pp. 366–373.

[48] T. Hastie, R. Tibshirani, and M. Wainwright, Statistical learningwith sparsity: the lasso and generalizations. CRC press, 2015.

[49] M. D. McKay, R. J. Beckman, and W. J. Conover, “A compar-ison of three methods for selecting values of input variables inthe analysis of output from a computer code,” Technometrics,vol. 42, no. 1, pp. 55–61, 2000.

[50] F. Aurenhammer, “Voronoi diagrams—a survey of a fundamen-tal geometric data structure,” ACM Computing Surveys (CSUR),vol. 23, no. 3, pp. 345–405, 1991.

[51] K. Crombecq, I. Couckuyt, D. Gorissen, and T. Dhaene, “Space-filling sequential design strategies for adaptive surrogate mod-elling,” in Int. Conf. Soft Computing Techn. in Civil, Structuraland Environmental Engineering, vol. 38, 2009.

[52] S. Mo, D. Lu, X. Shi, G. Zhang, M. Ye, J. Wu, and J. Wu,“A Taylor expansion-based adaptive design strategy for globalsurrogate modeling with applications in groundwater modeling,”Water Resour. Res., vol. 53, no. 12, pp. 10 802–10 823, 2017.

[53] A. Saltelli, P. Annoni, I. Azzini, F. Campolongo, M. Ratto,and S. Tarantola, “Variance based sensitivity analysis of modeloutput. design and estimator for the total sensitivity index,”Comput. Phys. Commun., vol. 181, no. 2, pp. 259–270, 2010.

[54] I. M. Sobol, “Global sensitivity indices for nonlinear mathemat-ical models and their Monte Carlo estimates,” Math. Comput.Simulation, vol. 55, no. 1-3, pp. 271–280, 2001.

[55] N. Luthen, S. Marelli, and B. Sudret, “Sparse polynomial chaosexpansions: Literature survey and benchmark,” SIAM/ASA J.Uncertain. Quantif., vol. 9, no. 2, pp. 593–649, 2021.

[56] S. Marelli and B. Sudret, “UQLab: A framework for uncertaintyquantification in Matlab,” in Vulnerability, uncertainty, and risk:quantification, mitigation, and management, 2014, pp. 2554–2563.

Zichang He received the B.E. degree in Detection, Guidance and Control Technology in 2018 from Northwestern Polytechnical University, Xi'an, China. In 2018 he joined the Department of Electrical and Computer Engineering at the University of California, Santa Barbara as a Ph.D. student.

Zichang's research activities are mainly focused on uncertainty quantification and tensor-related topics with applications in design automation, machine learning, and quantum computing. He is the recipient of the best student paper award of the IEEE Electrical Performance of Electronic Packaging and Systems (EPEPS) conference in 2020 and the Outstanding Teaching Assistant award in the Department of ECE at UCSB in 2020 and 2021.

Zheng Zhang (M'15) received his Ph.D. degree in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology (MIT), Cambridge, MA, in 2015. He is an Assistant Professor of Electrical and Computer Engineering with the University of California at Santa Barbara (UCSB), CA. His research interests include uncertainty quantification and tensor computational methods with applications to multi-domain design automation, robust/safe and high-dimensional machine learning, and its algorithm/hardware co-design.

Dr. Zhang received the Best Paper Award of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 2014, two Best Paper Awards of IEEE Transactions on Components, Packaging and Manufacturing Technology in 2018 and 2020, and three Best Conference Paper Awards (IEEE EPEPS 2018 and 2020, IEEE SPI 2016). His Ph.D. dissertation was recognized by the ACM SIGDA Outstanding Ph.D. Dissertation Award in Electronic Design Automation in 2016, and by the Doctoral Dissertation Seminar Award (i.e., Best Thesis Award) from the Microsystems Technology Laboratory of MIT in 2015. He received the NSF CAREER Award in 2019 and a Facebook Research Award in 2020.