1 non-linear model reduction strategies for generating data driven stochastic input models nicholas...

41
C C O O R R N N E E L L L L U N I V E R S I T Y 1 C C O O R R N N E E L L L L U N I V E R S I T Y NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory Sibley School of Mechanical and Aerospace Engineering 188 Rhodes Hall Cornell University Ithaca, NY 14853-3801 Email: [email protected] URL: http://mpdc.mae.cornell.edu/

Upload: stuart-cooper

Post on 04-Jan-2016

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

1

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS

Nicholas ZabarasMaterials Process Design and Control Laboratory

Sibley School of Mechanical and Aerospace Engineering188 Rhodes HallCornell University

Ithaca, NY 14853-3801

Email: [email protected]: http://mpdc.mae.cornell.edu/

Page 2: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

2

- Thermal and fluid transport in heterogeneous media are ubiquitous

- Range from large scale systems (geothermal systems) to the small scale

- Most critical devices/applications utilize heterogeneous/polycrystalline/functionally graded materials

ANALYSIS OF HETEROGENEOUS MEDIA

- Properties depend on the distribution of material/microstructure

- But only possess limited information about the microstructure/property distribution (e.g. 2D images)

Incorporate limited information into stochastic analysis:- worst case scenarios - variations on physical properties

Hydrodynamic transport through heterogeneous permeable media

Thermal transport through polycrystalline and functionally graded materials

Page 3: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

3

FRAMEWORK FOR ANALYSIS OF HETEROGENEOUS MEDIA

Extract properties P1, P2, .. Pn, that the structure satisfies.

These properties are usually statistical: Volume fraction, 2 Point correlation, auto correlation

Reconstruct realizations of the structure satisfying the correlations.

Construct a reduced stochastic model of property variations from the

data. This model must be able to approximate the class of structures.

Solve the heterogeneous property problem in the reduced stochastic space for computing property variations.

1. Property extraction 2. Microstructure/property reconstruction

3. Reduced model4. Stochastic analysis

B. Ganapathysubramanian and N. Zabaras, "Modelling diffusion in random heterogeneous media: Data-driven models, stochastic collocation and the variational multi-scale method", Journal of Computational Physics, in press

Page 4: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

4

INPUT STOCHASTIC MODELS: LINEAR APPROACH

Methodology for creating linear models from data

- Given some limited information (either in terms of statistical correlation functions or sample microstructure/property variations)

- Utilize some reconstruction methodology to create a finite set of realizations of the property/microstructure.

- Utilize this data set to construct a model of this variability

Use Proper Orthogonal Decomposition (POD), Principal Component Analysis (PCA) to construct a reduced order model of the data.

= a1 a2 +..+ an+

Convert variability of property/microstructure to variability of coefficients.

Not all combinations allowed. Developed subspace reducing methodology1 to find the space of allowable coefficients that reconstruct plausible microstructures

B. Ganapathysubramanian and N. Zabaras, "Modelling diffusion in random heterogeneous media: Data-driven models, stochastic collocation and the variational multi-scale method", Journal of Computational Physics, in press

Page 5: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

5

LINEAR APPROACH TO MODEL GENERATION

Successfully applied to investigate effect of heterogeneous media in diffusion phenomena in two-phase microstructures.

PCA based methods are easy to implement and are well understood.

But PCA based methods are linear projection methods

Only guaranteed to discover the true structure of data lying on a linear subspace of the high dimensional input space

# of samples#

of e

igen

vec

tors

PCA works very well when the input space is linear

What about when the input space is curved/non-linear?

PCA based techniques tend to overestimate the dimensionality of the model

Further related issues:- How to generalize it to other properties/structures?

Can PCA be applied to other classes of microstructures, say, polycrystals?

- How does convergence change as the amount of information increases? Computationally?

Page 6: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

6

TOWARDS A NON-LINEAR APPROACH TO MODEL GENERATION

- Find structure of data lying on a possibly non-linear subspace of the high-dimensional input data

- PCA finds a low-dimensional embedding of the data points that best preserves their variance as measured in the high-dimensional input space. Variance is measured based on Euclidian distance. This results in a linear subspace approximation of the data

3D data

PCAPt A

Pt B

- But what if the data lie on nonlinear curve in high-dimensional space?

- Have to unfold/unravel the curve

- Need non-linear approaches

Page 7: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

7

NONLINEAR REDUCTION: THE KEY IDEA

Set of images. Each image = 64x64 = 4096 pixels

Each image is a point in 4096 dimensional space.

But each and every image is related (they are pictures of the same object). Same object but different poses.

That is, all these images lie on a unique curve (manifold) in 4096.

Can we get a parametric representation of this curve?

Problem: Can the parameters that define this manifold be extracted, ONLY given these images (points in 4096)

Solution: Each image can be uniquely represented as a point in 2D space (UD, LR).

Strategy: based on the ‘manifold learning’ problem

Different images of the same object: changes in up-down (UD) and left-right (LR) poses

Page 8: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

8

NONLINEAR REDUCTION: THE KEY IDEA (Contd.)

Based on principles from psychology and cognitive sciences.

How does the brain interpret/store/recall high-resolution images at different poses, locations?

A low-dimensional parameterization of this very high-dimensional space

Effectively constructs a surrogate space of the actual space.

Performs all operations on this surrogate low-dimensional parametric space and maps back to the input space

These ideas are being increasingly used in problems in vision enhancement, speech and motor control, data compression Different images of the same

object: changes in up-down (UD) and left-right (LR) poses

Page 9: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

9

NONLINEAR REDUCTION: EXTENSION TO INPUT MODELS

Different microstructure realizations satisfying some experimental correlations

Given some experimental correlation that the microstructure/property variation satisfies.

Construct several plausible ‘images’ of the microstructure/property.

Each of these ‘images’ consists of, say, n pixels.

Each image is a point in n-dimensional space.

But each and every ‘image’ is related.

That is, all these images lie on a unique curve (manifold) in n.

Can a low-dimensional parameterization of this curve be computed?

Strategy: based on a variant of the ‘manifold learning’ problem.

Page 10: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

10

A FORMAL DEFINITION OF THE PROBLEM

State the problem as a parameterization problem (also called the manifold learning problem)

Given a set of N unordered points belonging to a manifold embedded in a high-dimensional space n, find a low-dimensional region d that

parameterizes , where d << n

Classical methods in manifold learning have been methods like the Principle Component Analysis (PCA) and multidimensional scaling (MDS).

These methods have been shown to extract optimal mappings when the manifold is embedded linearly or almost linearly in the input space.

In most cases of interest, the manifold is nonlinearly embedded in the input space, making the classical methods of dimension reduction highly approximate.

Two approaches developed that can extract non-linear structures while maintaining the computational advantage offered by PCA1,2.

1. J. B. Tenenbaum, V. De Silva, J. C. Langford, A global geometric framework for nonlinear dimension reduction Science 290 (2000), 2319-2323.

2. S Roweis, L. Saul., Nonlinear Dimensionality Reduction by Locally Linear Embedding, Science 290 (2000) 2323--2326.

Page 11: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

11

AN INTUITIVE PICTURE OF THE STRATEGY

- Attempt to reduce dimensionality while preserving the geometry at all scales.

- Ensure that nearby points on the manifold map to nearby points in the low-dimensional space and faraway points map to faraway points in the low-dimensional space.

3D dataPCA

0 20 40 60 80 100-10

0

10

20

30

40

50

60

-10

-5

0

5

10

15

-10

-5

0

5

10

15

20

40

Linear approach Non-linear approach: unraveling the curve

Page 12: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

12

KEY CONCEPT

Pt A Pt B

Euclidian dist

Geodesic dist

1) Geometry can be preserved if the distances between the points are preserved – Isometric mapping.

2) The geometry of the manifold is reflected in the geodesic distance between points

3) First step towards reduced representation is to construct the geodesic distances between all the sample points

Page 13: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

13

MATHEMATICAL DETAILS: AN OVERVIEW

Start from N samples lying on a curve that is embedded in a high- dimensional space.

What are the properties of this curve ?

Next step is to extract some knowledge of the geometry of this curve. Utilize the concept of geodesic distance to encode this knowledge of the geometry.

How is this geodesic distance defined and computed?

Generate an isometric transformation (parameterization) from this curve to a low-dimensional region. Proof of existence of this mapping

How is this information used in numerically constructing the required mapping?

Page 14: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

14

PROPERTIES OF THE CURVE Given a set of N sample points lying on { }ixMust have a notion of distance between these sample points. Define an appropriate function

: [0, ) D:M M [that determines the difference between any two points Ensure that it satisfies the properties of positive-definiteness, symmetry and the triangle inequality.

Lemma 1: (, ) is a metric space

Lemma 2a: (, ) is a bounded metric space

Lemma 2b: (, ) is dense

Lemma 2c: (, ) is completeTheorem: (, ) is compact1.

1. B. Ganapathysubramanian and N. Zabaras, "A non-linear dimension reduction methodology for generating data-driven stochastic input models", Journal of Computational Physics, submitted.

2. V. de Silva, J. B. Tenenbaum, Unsupervised Learning of Curved Manifolds, In Proceedings of the MSRI workshop on nonlinear estimation and classification. Springer Verlag, 2002.

A compact manifold embedded in a high-dimensional space can be isometrically mapped to a region in a low-dimensional space2.

Page 15: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

15

THE ISOMETRIC MAPPING: COMPUTING THE GEODESIC

A compact manifold embedded in a high-dimensional space can be isometrically mapped to a region in a low-dimensional space.

Isometry encoded into the geodesic distances

Have no notion of the geometry of the manifold to start with. Hence cannot construct true geodesic distances!

( , ) inf{length( )}i j

Mm

D

Approximate the geodesic distance using the concept of graph distance G(i,j) : the distance of points far away is computed as a sequence of small hops.

This approximation, G, asymptotically matches the actual geodesic distance . In the limit of large number of samples1,2. (Theorem 4.5 in Ref. 1)

1 2(1 ) ( , ) ( , ) (1 ) ( , )i j i j i j GM Mm m mD D D

1. B. Ganapathysubramanian and N. Zabaras, " A non-linear dimension reduction methodology for generating data-driven stochastic input models ", submitted to Journal of Computational Physics.

2. M.Bernstein, V. deSilva, J.C.Langford, J.B.Tenenbaum, Graph approximations to geodesics on embedded manifolds, Dec 2000

Page 16: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

16

THE ISOMETRIC MAPPING: COMPUTING THE GEODESIC

Proof of theorem is based on ideas from geometry, differential algebra and simple probability arguments

Using this approximation – can compute the pair wise geodesic

distance matrix, M, between all the N sample points

Page 17: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

17

FROM THE MATRIX, M, TO A LOW-DIMENSIONAL MAP

Have N objects, the preceding developments provided a means of computing the pair wise distances between them.

Convert this set of pair wise distances into points in a low-dimensional space

This is a straightforward problem in multivariate statistical analysis1.

Can easily solve the problem using the concept of Multi Dimensional Scaling2 (MDS)

1. J. T. Kent, J. M. Bibby, K. V. Mardia, Multivariate Analysis (Probability and Mathematical Statistics), Elsevier (2006).2. T. F. Cox, M. A. A. Cox, Multidimensional scaling, 1994, Chapman and Hall

Page 18: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

18

MULTI DIMENSIONAL SCALING

Given the N x N matrix of the geodesic distances, M, with elements dij.

Compute the symmetric N x N matrix, A, with elements aij = -1/2 dij2.

Suppose that the N low-dimensional points that we are interested in finding (the parameters of the N objects in high-dimensional space) are { }iy

Denote the N x N matrix of the scalar product of these N low-

dimensional points as B,

1

.d

Tij ik jk i j

k

b y y y y

Can show that A, and B, are related as B HAHWhere H is the centering matrix, 1/ij ijh N

TB YY

Page 19: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

19

MULTI DIMENSIONAL SCALING (contd.)

B is a positive definite matrix and can be written in terms of its eigen values and eigen vectors as:

B

Where Λ is the diagonal matrix of the eigenvalues and Γ is the

corresponding matrix of eigenvectors. B will have a decaying eigen-spectrum. From and we get: TB YY B

1/ 2d d Y

The above equation is an estimate of Y in terms of the largest d eigenvalues of the eigenvalue decomposition of the squared geodesic distance matrix

From N points in a high-dimensional space to N points in a low-dimensional space

Page 20: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

20

THE NONLINEAR MODEL REDUCTION ALGORITHM

Given a set of N sample points lying on { }ix

The complete algorithm consists of three simple stepsStep 1:

- Construct the neighborhood graph: Determine which points are neighbors, Construct the weighted graph with edges given weights corresponding to the distances, , between the pointsStep 2:

- Estimate the shortest intrinsic path- the geodesic lengths: Compute

the shortest path length, M, between the points in the weighted graph, using, say, Floyd’s algorithm

Step 3:

- Construct the d-dimensional embedding: Perform classical Multi

Dimensional Scaling on the geodesic distance matrix, M, to extract a set of N points in a low-dimensional space

Page 21: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

21

CHOOSING THE OPTIMAL DIMENSIONALITY, d

The MDS procedure in the dimension reduction methodology constructs the eigen-value decomposition of a matrix. The low-dimensional representation correspond to the largest d eigenvalues of this matrix.

How is the value of d chosen?

Use ideas from recent development on estimating the intrinsic dimensionality of manifolds (Costa and Hero1)

Elegant ideas from differential geometry and graph theory.

Connect dimensionality to the rate of convergence of a graph theoretic quantity1. J.A.Costa, A.O.Hero, Geodesic Entropic Graphs for Dimension and Entropy Estimation in Manifold Learning, IEEE

Trans. on Signal Processing, 52 (2004) 2210--2221.

Page 22: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

22

CHOOSING THE OPTIMAL DIMENSIONALITY, d

Based on very elegant ideas linking graph theory with differential geometry.

Based on the theorem of Beardwood-Halton-Hammersley1,2.

Based on concepts in geometric probability, the theorem relates the structure/geometry of embedded manifolds to the graph structure of a finite set of points on these embedded manifolds

1. J.A.Costa, A.O.Hero, Manifold Learning with Geodesic Minimal Spanning Trees, arXiv:cs/0307038v12. J. Beardwood, J. H. Halton, and J. M. Hammersley, “The shortest path through many points,” Proc. Cambridge

Philosophical Society, 55 (1959) 299–327.3. B. Ganapathysubramanian and N. Zabaras, " A non-linear dimension reduction methodology for generating data-driven

stochastic input models ", submitted to Journal of Computational Physics.

The theorem states that the length functional of the minimal spanning graph is related to the entropy of density of distribution of a finite set of points in a embedded manifold (with some specific properties)

Costa and Hero1 extended/applied this theorem to general embedded manifolds

Ganapathysubramanian and Zabaras3 applied it to the model reduction problem

Page 23: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

23

CHOOSING THE OPTIMAL DIMENSIONALITY, d (Contd.)

The length functional of the minimal spanning tree of the geodesic matrix,

M, is related to the intrinsic dimensionality of the low-dimensional representation of the manifold 1,2.

1. B. Ganapathysubramanian and N. Zabaras, "A non-linear dimension reduction methodology for generating data-driven stochastic input models", submitted to Journal of Computational Physics.

2. J.A.Costa, A.O.Hero, Geodesic Entropic Graphs for Dimension and Entropy Estimation in Manifold Learning, IEEE Trans. on Signal Processing, 52 (2004) 2210--2221.

Page 24: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

24

CHOOSING THE OPTIMAL DIMENSIONALITY, d (Contd.)

log( ) log( )L a N 1da

d

with

From the above theorem, can easily extract the dimensionality.

Taking the logarithm of the main result of the algorithm,

Where L is the length functional of the minimal spanning tree of the geodesic matrix, N is the number of samples and d is the optimal dimensionality.

Use readily available algorithms (Kruskal’s algorithm or Prim’s algorithm) to compute L for different sample sizes

Perform a least squares fit for the value of a

Page 25: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

25

THE NONLINEAR MODEL REDUCTION FRAMEWORK

Given N unordered samples

N points in a low dimensional space

n. d

The procedure results in N points in a low-dimensional space. The geodesic distance + MDS step (Isomap algorithm1) results in a low-dimensional convex, connected space2, d.

1. J. B. Tenenbaum, V. De Silva, J. C. Langford, A global geometric framework for nonlinear dimension reduction ,Science 290 (2000), 2319-2323.

2. B. Ganapathysubramanian and N. Zabaras, "A non-linear dimension reduction methodology for generating data-driven stochastic input models", submitted to Journal of Computational Physics.

Using the N samples, the reduced space is given as

serves as the surrogate space for .

Access variability in by sampling over .

BUT have only come up with → map …. Need → map too

Page 26: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

26

THE REDUCED ORDER STOCHASTIC MODEL

Only have N pairs to construct → map. Various possibilities based on specific problem at hand. But have to be conscious about computational effort and efficiency.

Illustrate 3 such possibilities below. Error bounds can be computed1.

n d

n d

n

d

1. Nearest neighbor map 2. Local linear interpolation

3. Local linear interpolation with projection1. B. Ganapathysubramanian and N. Zabaras, "A non-linear dimension reduction methodology for generating data-driven

stochastic input models", submitted to Journal of Computational Physics.

Page 27: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

27

THE LOW DIMENSIONAL STOCHASTIC MODEL

Given N unordered samples

Compute pairwise geodesic distance

Perform MDS on this distance matrix

N points in a low dimensional

space

Algorithm consists of two parts.

1) Compute the low-dimensional representation of a set of N unordered sample points belonging to a high-dimensional space

For using this model in a stochastic collocation framework, must sample points in →

2) For an arbitrary point ξ € must find the corresponding point x €. Compute the mapping from →

n. d

Page 28: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

28

NUMERICAL EXAMPLE

Problem strategy:

Extract pertinent statistical information form the experimental image

Reconstruct dataset of plausible 3D microstructures

Construct a low-dimensional parametrization of this space of microstructures

Solve the SPDE for temperature evolution using this input model in a stochastic collocation framework

Given an experimental image of a two-phase metal-metal composite (Silver-Tungsten composite).

Find the variability in temperature arising due to the uncertainty in the knowledge of the exact 3D material distribution of the specified microstructure.

T= -0.5 T= 0.5

1. S. Umekawa, R. Kotfila, O. D. Sherby, Elastic properties of a tungsten-silver composite above and below the melting point of silver J. Mech. Phys. Solids 13 (1965) 229-230

Page 29: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

29

r (um)

g(r)

0 5 10 15 20

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

r (um)

g(r)

0 5 10 15 200

0.2

0.4

0.6

0.8

1

Experimental image Experimental statistics

GRF statistics

Realizations of 3D microstructure

TWO PHASE MATERIAL

Page 30: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

30

NON LINEAR DIMENSION REDUCTION

The developments detailed before are applied to find a low-dimensional representation of these 1000 microstructure samples.

The optimal representation of these points was a 9-dimensional region

Able to theoretically show that these points in 9D space form a convex region in 9.

This convex region now represents the low- dimensional stochastic input space

Use sparse grid collocation strategies to sample this space.

10

15

20-15 -10 -5 0 5 10 15

-10

-5

0

5

10

15

Log(Samples)

Log

(len

gth

ofM

ST

)

100 300 500

10000

20000

30000

40000

Page 31: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

31

# of collocation points

Err

or

100 101 102 103 10410-6

10-5

10-4

10-3

10-2

10-1

100

101

Computational domain of each deterministic problem: 65x65x65 pixels

COMPUTATIONAL DETAILS

The construction of the stochastic solution: through sparse grid collocation level 5 interpolation scheme used

Number of deterministic problems solved: 26017

Computational platform: 50 nodes on local Linux cluster (x2 3.2 GHz)

Total time: 210 minutes

Total number of DOF: 653x26017 ~ 7x109

Page 32: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

32Cornell University- Zabaras, FA9550-07-1-0139

Coupling data driven model generation with a Multiscale stochastic modeling framework

Seamlessly couple stochastic analysis with multiscale analysis.

Multiscale framework (large deformation/thermal evolution) + Adaptive stochastic collocation framework

Provides roadmap to efficiently link any validated multiscale framework

Coupled with a data-driven input model strategy to analyze realistic stochastic multiscale problems.

r (um)

g(r)

0 5 10 15 20

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

T= -0.5 T= 0.5

Limited data

Statistics extraction + model generation

Stochastic multiscale framework

Mean statistics

Higher-order statistics

Page 33: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

33

OTHER APPLICATIONS OF THIS FRAMEWORK

From unordered samples in high-dimensional space to a convex space representing the parameterization

n d

- This methodology has significant applications to problems where working in high-dimensional spaces is computationally intractable. - Can pose the problem in a low-dimensional space

3.05 3.06 3.07 3.08 3.09 3.1 3.11 3.123.03

3.04

3.05

3.06

3.07

3.08

3.09

3.1

3.11

3.12

Taylor factor along RD

Tay

lor

fact

or a

long

TD

b

2. Rolling

3. Rolling followed by drawing

1. Drawing

R

C

ad

fe

Property-process space

-1

-0.5

0

0.5

-1-0.5

00.5

1-1.8

-1.7

-1.6

-1.5

-1.4

-1.3

Process-structure space

Process pathsA100

A1000

A80

Property-structure spacevisualizing property

evolution, process-property maps, searching and contouring, representing input uncertainty, data mining …

Page 34: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

Stochastic design/optimization framework

Most critical components in many applications/devices are usually fabricated from polycrystalline/ functionally graded/ heterogeneous materials

The properties at the device-scale (thermal and electrical conductivity, elastic moduli and failure mechanisms) depend on the microstructure and material distribution at the meso-scale

Have to design the operating conditions/parameters taking into account this limited information about the microstructure.

Robust design in the presence of topological uncertainty

Given limited topological information about the microstructure, design the optimal (stochastic) heat flux to be applied on one end of the device such that a required (stochastic) temperature is maintained at the other end.Design criterion:

- Maintain a specified thermal profile on the right wall

- This thermal profile is given in terms of a pdf, usually specified in terms of moments Additional uncertainties:

- Do not know the exact microstructure that the device is made up of.

- Only know certain statistical correlations that the microstructure satisfies.

- Consider the microstructure to be a random field .Design variables:

- Must design the pdf of the optimal heat flux

Page 35: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

Stochastic design/optimization framework : Some details

Convert the problem into an optimization problem

Denote the stochastic space in which the flux, q(x) exists as q. Convert the problem of constructing the stochastic function q € q into designing the values of the stochastic function at a finite number of points in the stochastic space (collocation based design)

Solution exists to the problem in the sense of Tikhonov.

2 2 2 2 21 2

1({ }) ( ) ( ) ... ( )

2r

k ki kJ q T T T dx

1({ }) [ ( ) ( )i i

r

q i qD J q T D

2 2 22 ( ) ( ) ... ( ) ( )]

i i

k k kq k qT D T D dx

Use gradient based minimization methods. Need to compute gradient. Notion of the directional derivative

1 1 1 1

( )q qk k

i i

n nn nm j m j

q q k q k qm j m j

D w w w w D

Need to compute the directional derivative of the temperature

1

1 1

q k

i

n nm jq k q

m j

w w D

Page 36: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

Stochastic design/optimization framework (contd.)

Leads to the definition of the continuum stochastic sensitivity equations.

Linear in ‘design variable perturbation’ part of the dependant variables

( , : ) ( , : ) ( , : , ) . .i i i i ix q q x q x q q h o t

Novel, highly-efficient way to compute stochastic sensitivities based on parallel sparse grid collocation schemes.

Major advance in non-intrusive stochastic optimization – any stochastic optimization problem can now be posed as a deterministic optimization problem in a large-dimensional space. Classical gradient and gradient-free optimization methods are directly applicable.

The stochastic direct problem is converted into a set of n-decoupled deterministic equation (the collocation based approach to solve SPDEs)

The stochastic temperature sensitivity equations are simply directional derivatives of the n deterministic that form the above stochastic direct problem. The stochastic sensitivity is then represented in terms of these decoupled stochastic equations as:

1 1

( , , : ) ( , , : )q kn n

m j m jm m i q k q k i

m j

x q L L x q

Page 37: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

Stochastic design/optimization framework

Mean temperature (right wall) Standard deviation of temperature

Physical domain: 20 μm x 20μm x 20μm

Computational domain: 65x65x65

Microstructure stochastic space: 1177 collocation points

The design stochastic heat flux represented as a Bezier surface: number of parameters: 25

Designed mean heat flux (left wall)

Bounds on the flux variation (left wall)

Run optimization problem on 36 nodes of our Linux cluster

Each iteration ~ 3 hours

Solve 10593 deterministic problems in each iteration

Each deterministic problem has 65x65x65 = 275625 DOF.

Total number of design variables = 225

Zabaras et al. JCP, 2007b

Page 38: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

38Cornell University- Zabaras, FA9550-07-1-0139

1. Desired product shape

2. Desired microstructure-sensitive properties

Z

X

Y

3.052.922.792.662.532.412.282.15

Impact: 3D Multi-scale Design of Deformation Processes

- A two-length scale continuum sensitivity framework has been developed that allows the efficient and accurate computation of the effects of perturbations of macroscopic design variables (e.g. dies and preforms) on microscopic variables (slip system resistances, crystallographic texture, etc.)

- Using this sensitivity framework, the first 3D multiscale deformation process design simulator has been developed for the control of texture-dependent material properties and applied to complex processes.

Micro-scale representation (Orientation distribution function)

Bo

X

Bn+1

B’n+1

Macro-sensitivity problem driven by perturbation to macro-design variable ()

Micro-sensitivity problem driven by sensitivity of deformation gradient

Page 39: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

39Cornell University- Zabaras, FA9550-07-1-0139

Multi-scale design: Design for uniform yield strength during 3D forging process

(a) Iteration 1

Yield Strength (MPa)

75

85

95

105

115

125

135

(b) Iteration 2

(c) Iteration 3 (d) Iteration 4: Optimal solution

ODF evolution at the center of the workpiece

Strong z-axis <110> fibers (compression texture)

Distribution of (z- axis tension) yield strength obtained at various design iterations

Initial guess: Large variation in yield strength, incomplete fill

Uniform strength, Desired shape obtained

Page 40: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

Impact: Application to designing critical components with extremal properties

Forging velocity

Die/workpiece friction

Die shape

Initial preform shape

Uncertainty in constitutive

relationUncertainty in Texture

Uncertainty in grain sizes

- Significant developments towards the design of processing paths and materials for manufacturing critical components that have extremal properties.

- Dimension reduction strategy has significant applications to problems where working in high dimensional spaces is computationally intractable: visualizing property evolution, process-property maps, searching and contouring.

- Stochastic multiscale framework provides an approach to incorporate in the multiscale design framework shown earlier operational variabilities across multiple scales. Provide rigorous failure criteria and the associated probabilities.

Page 41: 1 NON-LINEAR MODEL REDUCTION STRATEGIES FOR GENERATING DATA DRIVEN STOCHASTIC INPUT MODELS Nicholas Zabaras Materials Process Design and Control Laboratory

CCOORRNNEELLLL U N I V E R S I T Y

CCOORRNNEELLLL U N I V E R S I T Y

Future Plans1. Address a number of critical issues on non-linear model reduction techniques:

basis representation, projection, solution of PDEs, SPDEs, etc.

2. Utilize model reduction strategy to compute realistic input stochastic models of polycrystalline materials based on limited information.

3. Develop a multiscale framework for modeling the effect of microscale (topological) uncertainty on the performance of macro-components. Investigate uncertainty propagation and interaction of uncertainties across multiple scales.

4. Utilize model reduction strategy as a technique for reducing problems in high- dimensional spaces (visualizing property evolution, process-property maps, searching and contouring) into more tractable low-dimensional surrogate spaces.

5. Address many open issues on stochastic optimization using sparse grid collocation. Many technologically important applications.

6. Multi-element collocation wavelets as an alternative to Lagrange polynomials (extension to Smolyak quadrature selection rule).