qregup4
TRANSCRIPT
8/14/2019 qregup4
http://slidepdf.com/reader/full/qregup4 1/11
Quasi Regression and black boxes 1
Finding important
variables and interactions
in black boxes
Art B. Owen
Stanford University
Tao Jiang
Stanford University
Quasi Regression and black boxes 2
Theme: As dimension increases, many numerical problems become more statistical.

Because:
1. the sample is inevitably sparse,
2. error depends on the unsampled part of the space,
3. worst-case error bounds are inapplicable.
Quasi Regression and black boxes 3
Example: integration

I = \int_{[0,1]^d} f(x)\,dx

Sampling methods:
1. Monte Carlo: error O(n^{-1/2})
2. Quasi-Monte Carlo: O(n^{-1} (\log n)^{d-1}), but no practical error estimate
3. Randomized quasi-Monte Carlo: replication-based error estimates, and O(n^{-3/2} (\log n)^{(d-1)/2})

Rates are asymptotic under mild conditions on f.
Also statistical: approximation.
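As an illustration of the statistical viewpoint, a minimal plain Monte Carlo integrator with its purely statistical standard error (the O(n^{-1/2}) method above); the integrand and all names here are our own illustrative choices, not from the slides.

```python
import math, random

def mc_integrate(f, d, n, seed=0):
    """Plain Monte Carlo for I = integral of f over [0,1]^d.  The estimate
    has root mean squared error O(n^{-1/2}), and the returned standard
    error is the practical, replication-free error estimate."""
    rng = random.Random(seed)
    total = total2 = 0.0
    for _ in range(n):
        y = f([rng.random() for _ in range(d)])
        total += y
        total2 += y * y
    mean = total / n
    var = (total2 - n * mean * mean) / (n - 1)  # sample variance of f(x_i)
    return mean, math.sqrt(var / n)

# Toy integrand (ours): f(x) = x_1 + ... + x_6, whose true integral is 3.
est, se = mc_integrate(lambda x: sum(x), d=6, n=4000)
```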
Quasi Regression and black boxes 4
Mortgage backed securities integrand — Paskov & Traub; Caflisch, Morokoff & Owen

Present value of 30 years of monthly cash flows.

Prepayment:
1. puts lumps into the payment stream
2. more common when interest rates are low

MBS model (from Goldman-Sachs): f(x), x \sim U[0,1]^{360}.
Interest rates r_1, \dots, r_{360}: geometric Brownian motion driven by x.
Prepayment fraction: a_1 + a_2 \arctan(a_3 + a_4 r_t), for constants a_1, \dots, a_4.
8/14/2019 qregup4
http://slidepdf.com/reader/full/qregup4 2/11
Quasi Regression and black boxes 5
QMC super on MBS

But f is very nearly additive: Latin hypercube sampling variance is below 0.1% of MC.
Also f is very nearly odd (antisymmetric): antithetic sampling variance is about 0.02% of MC.
Additive and odd: f was virtually linear in x, upon further investigation.
The curse of dimensionality is not broken by QMC; we just had an easy integrand.
QMC requires low "effective dimension" to trounce MC.
Quasi Regression and black boxes 6
ANOVA of L^2[0,1]^d — Hoeffding; Efron & Stein; Sobol'

Main effects and k-factor interactions generalizing familiar discrete ANOVA:

f(x) = \sum_{u \subseteq \{1, 2, \dots, d\}} f_u(x)

where f_u depends only on the x-components in the set u, and f_\emptyset = \int f(x)\,dx is the "grand mean". Then

\sigma^2(f) = \sum_{u \neq \emptyset} \int f_u(x)^2\,dx, \qquad \int f_u(x) f_v(x)\,dx = 0 \ \text{for}\ u \neq v,

and the sample average splits term by term:

\frac{1}{n} \sum_{i=1}^n f(x_i) = \sum_u \frac{1}{n} \sum_{i=1}^n f_u(x_i).

QMC: the x_i are very uniform in low dimensional projections.
Great for functions dominated by f_u with small |u|.
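A small numerical check of the ANOVA decomposition, using a midpoint-rule grid instead of sampling; the toy integrand f(x1, x2) = x1 + x2 + x1 x2 is our own example, not from the slides.

```python
# Midpoint-rule check of the ANOVA identity for f(x1, x2) = x1 + x2 + x1*x2
# on [0,1]^2: main effects are conditional means minus the grand mean, and
# the variance components add up to the total variance.
m = 200
pts = [(i + 0.5) / m for i in range(m)]
f = lambda x1, x2: x1 + x2 + x1 * x2

mu = sum(f(a, b) for a in pts for b in pts) / m**2        # grand mean = 1.25
g1 = [sum(f(a, b) for b in pts) / m - mu for a in pts]    # main effect f_{1}(x1)
g2 = [sum(f(a, b) for a in pts) / m - mu for b in pts]    # main effect f_{2}(x2)

var_total = sum((f(a, b) - mu) ** 2 for a in pts for b in pts) / m**2
var_1 = sum(v * v for v in g1) / m     # ~ Var(1.5*x1 - 0.75) = 0.1875
var_2 = sum(v * v for v in g2) / m
var_12 = var_total - var_1 - var_2     # interaction component, ~ 1/144
```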
Quasi Regression and black boxes 7
Isotropic integrand — Capstick & Keister; Papageorgiou & Traub; Owen

f(x) = \cos\big( \| \Phi^{-1}(x) \| / \sqrt{2} \big), \qquad x \sim U(0,1)^d,

with \Phi^{-1} applied componentwise. \int f(x)\,dx has a closed form (Mathematica), which aids comparison of methods.

Varies equally in all directions. QMC does well.
For d = 25, over 99% of the variance comes from the 1-, 2- and 3-dimensional ANOVA effects (after numerical investigation exploiting symmetry and Gaussianity).
Quasi Regression and black boxes 8
The borehole function — Morris, Mitchell, Ylvisaker

Flow from upper to lower aquifer:

f = \frac{2 \pi T_u (H_u - H_l)}{\log(r / r_w) \Big[ 1 + \frac{2 L T_u}{\log(r / r_w)\, r_w^2 K_w} + \frac{T_u}{T_l} \Big]}

r_w, r — radii of borehole and basin
T_l, T_u — transmissivities, lower and upper
H_l, H_u — potentiometric heads, lower and upper
L, K_w — length and conductivity

Diaconis: a closed form is not the same as understanding.
Which variables are important? Which interact?
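The borehole formula above is direct to code; a minimal sketch (the evaluation point below is an illustrative choice inside the usual input ranges, not from the slides):

```python
import math

def borehole(rw, r, Tu, Hu, Tl, Hl, L, Kw):
    """Flow through a borehole from the upper to the lower aquifer,
    per the slide's formula; log is the natural logarithm."""
    lnr = math.log(r / rw)
    return (2.0 * math.pi * Tu * (Hu - Hl)) / (
        lnr * (1.0 + 2.0 * L * Tu / (lnr * rw ** 2 * Kw) + Tu / Tl))

# An illustrative point inside the commonly used input ranges.
flow = borehole(rw=0.1, r=5000.0, Tu=80000.0, Hu=1000.0,
                Tl=90.0, Hl=750.0, L=1500.0, Kw=10000.0)
```

Raising the upper head H_u while holding everything else fixed increases the flow, which gives a quick sanity check on the implementation.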
8/14/2019 qregup4
http://slidepdf.com/reader/full/qregup4 3/11
Quasi Regression and black boxes 9
Black box functions

y = f(x), with no "+ \varepsilon" noise term
Examples
Semiconductors Device design Speed, heat
Aerospace Wing shape Lift, drag
Automotive Auto Frame Strength, weight
Statistics Predictors Responses
Used to design products. Cheaper than physical
experiments. Costs from milliseconds to hours. Dimension
from 3 to 300. Accuracy varies too.
Kriging widely used: Journel, Huijbregts, Sacks, Ylvisaker, Welch, Wynn, Mitchell
Quasi Regression and black boxes 10
A small neural net — Venables, Ripley

Predict log_{10}(perf) from the others:
perf published performance of computer
syct cycle time in nanoseconds
mmin minimum main memory in kilobytes
mmax maximum main memory in kilobytes
cach cache size in kilobytes
chmin minimum number of channels
chmax maximum number of channels
Function found by training on 209 examples.
Quasi Regression and black boxes 11
The n-net function

f(x) = \beta_0 + \sum_{j=1}^{6} \beta_j x_j + \sum_{k=1}^{3} \gamma_k\, S\Big( \alpha_{k0} + \sum_{j=1}^{6} \alpha_{kj} x_j \Big)

where S(z) = [1 + \exp(-z)]^{-1} is a sigmoidal function. (The fitted numeric weights were garbled in this transcript and are not reproduced.)
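The fitted network has a skip-layer form: linear terms in each input plus sigmoidal hidden units. A sketch evaluator, with placeholder weights since the slide's fitted values did not survive transcription:

```python
import math

def sigmoid(z):
    """S(z) = [1 + exp(-z)]^{-1}, the sigmoidal function from the slide."""
    return 1.0 / (1.0 + math.exp(-z))

def skip_layer_net(x, b0, linear, hidden):
    """Evaluate f(x) = b0 + sum_j linear[j]*x[j]
                     + sum_k w_k * S(a_k0 + sum_j a_kj * x[j]).
    `hidden` is a list of (w_k, a_k0, [a_k1, ..., a_kd]) triples.
    All weights here are placeholders, not the fitted values."""
    y = b0 + sum(c * xj for c, xj in zip(linear, x))
    for w, a0, a in hidden:
        y += w * sigmoid(a0 + sum(aj * xj for aj, xj in zip(a, x)))
    return y

# With zero input to the sigmoid, the hidden unit contributes w * 0.5.
y = skip_layer_net([0.0, 0.0], 1.0, [2.0, 3.0], [(1.0, 0.0, [0.0, 0.0])])
```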
Quasi Regression and black boxes 12
Given f(x) on [0,1]^d, how can we tell if f:
1. is nearly linear?
2. is nearly additive?
3. is nearly quadratic?
4. has mostly 3-factor interactions or less?
5. Which variables matter most?
6. Which interactions matter most?

We would like:
1. a systematic approach
2. that also predicts f
Quasi Regression and black boxes 17
Interpretation

Write f(x) = \phi(x)^T \beta^* + \varepsilon(x), with \varepsilon orthogonal to the basis. The variance of f is

\sum_{r \neq 0} \beta_r^2 + \int \varepsilon(x)^2\,dx.

The importance of a set S of indices is \sum_{r \in S} \beta_r^2, estimated by

\sum_{r \in S} \big[ \hat\beta_r^2 - \widehat{Var}(\hat\beta_r) \big].

Subsets of interest include:
\{r : r(1) > 0\} — involves x_1
\{r : r(1) = 0\} — does not involve x_1
\{r : \#\{j : r(j) > 0\} \le 1\} — additive part
\{r : \#\{j : r(j) > 0\} \le k\} — interactions up to order k
\{r : \sum_j r(j) \le k\} — of degree at most k
\{r : r(j) = 0 \ \text{for}\ j > 3\} — uses only the first 3 inputs
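Subset importances of this kind can be read straight off the multi-indices; a sketch with hypothetical coefficients (ours, for illustration), omitting the slide's bias correction:

```python
def importance(betas, in_subset):
    """Sum of beta_r^2 over multi-indices r in a subset of interest.
    (The bias correction, subtracting Var(beta_hat_r), is omitted here.)"""
    return sum(b * b for r, b in betas.items() if in_subset(r))

# Hypothetical coefficients keyed by multi-index r = (r(1), r(2)).
betas = {(0, 0): 2.0, (1, 0): 0.8, (0, 1): 0.5, (1, 1): 0.3, (2, 0): 0.1}

# Additive part: at most one coordinate active, excluding the grand mean.
additive = importance(betas, lambda r: any(r) and sum(rj > 0 for rj in r) <= 1)
# Everything involving x_1.
involves_x1 = importance(betas, lambda r: r[0] > 0)
```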
Quasi Regression and black boxes 18
Approximation through integration

Define \tilde f(x) = \beta_0 \phi_0(x) + \dots + \beta_{p-1} \phi_{p-1}(x) = \phi(x)^T \beta.

The optimal \beta is

\beta^* = \arg\min_\beta \int \big( f(x) - \phi(x)^T \beta \big)^2\,dx = \Big[ \int \phi(x) \phi(x)^T\,dx \Big]^{-1} \int \phi(x) f(x)\,dx;

also,

ISE = \int \big( f(x) - \phi(x)^T \beta \big)^2\,dx.
Quasi Regression and black boxes 19
Regression and quasi-regression

\beta^* = \Big[ \int \phi(x) \phi(x)^T\,dx \Big]^{-1} \int \phi(x) f(x)\,dx = \int \phi(x) f(x)\,dx

by orthogonality (the \phi_r are orthonormal).

Observations: x_i \sim U[0,1]^d, i = 1, \dots, n, IID. Let Z be the n \times p matrix with Z_{ir} = \phi_r(x_i) and Y the n \times 1 vector with Y_i = f(x_i).

Regression: \hat\beta = (Z^T Z)^{-1} Z^T Y
Quasi-regression: \tilde\beta = \frac{1}{n} Z^T Y
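A one-dimensional sketch of quasi-regression with the first three orthonormal Legendre polynomials on [0, 1]; the test function f(x) = x and the sample size are our own choices.

```python
import math, random

# First three orthonormal Legendre polynomials on [0, 1].
phi = [lambda x: 1.0,
       lambda x: math.sqrt(3.0) * (2.0 * x - 1.0),
       lambda x: math.sqrt(5.0) * (6.0 * x * x - 6.0 * x + 1.0)]

def quasi_regression(f, n, seed=1):
    """beta_r = integral of phi_r(x) f(x) dx, estimated by the plain
    average (1/n) sum_i phi_r(x_i) f(x_i) -- no matrix inversion."""
    rng = random.Random(seed)
    beta = [0.0] * len(phi)
    for i in range(1, n + 1):
        x = rng.random()
        fx = f(x)
        for r, p in enumerate(phi):
            beta[r] += (p(x) * fx - beta[r]) / i  # running-mean update
    return beta

# For f(x) = x: beta_0 = 1/2, beta_1 = sqrt(3)/6, beta_2 = 0.
beta = quasi_regression(lambda x: x, n=20000)
```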
Quasi Regression and black boxes 20
Precursors of quasi-regression

Quasi-interpolation — Chui & Diamond; Wang: "ignore the denominator" (Z^T Z) to get fast approximate interpolation.

Computer experiments — Koehler and Owen 1996 advocate quasi-regression for computer experiments.
Efromovich 1992 applies quasi-regression to sinusoids on [0,1].
Owen 1992 describes quasi-regression for Latin hypercube sampling.
Quasi Regression and black boxes 21
Accuracy in Monte Carlo sampling

Define:

N = \frac{1}{n} \sum_{i=1}^n \phi(x_i) \big( f(x_i) - \phi(x_i)^T \beta^* \big) \quad (p \times 1)

D = \frac{1}{n} \sum_{i=1}^n \phi(x_i) \phi(x_i)^T - I \quad (p \times p)

For quasi-regression, \tilde\beta - \beta^* = N + D \beta^*. For regression,

\hat\beta - \beta^* = (I + D)^{-1} \Big( \frac{1}{n} Z^T Y \Big) - \beta^* = (I + D)^{-1} N = (I - D + D^2 - D^3 \cdots) N \doteq N - D N.
Quasi Regression and black boxes 22
Fast stable updates

Define:

\hat\beta_r^{(n)} = \frac{1}{n} \sum_{i=1}^n \phi_r(x_i) f(x_i), \qquad S_r^{(n)} = \frac{1}{n} \sum_{i=1}^n \big( \phi_r(x_i) f(x_i) - \hat\beta_r^{(n)} \big)^2.

Then:

\hat\beta_r^{(n)} = \hat\beta_r^{(n-1)} + \frac{1}{n} \big( \phi_r(x_n) f(x_n) - \hat\beta_r^{(n-1)} \big)

S_r^{(n)} = \frac{n-1}{n} S_r^{(n-1)} + \frac{n-1}{n^2} \big( \phi_r(x_n) f(x_n) - \hat\beta_r^{(n-1)} \big)^2

Chan, Golub & LeVeque, who use n S_r^{(n)}. Then S_r^{(n)} / (n - 1) \doteq Var(\hat\beta_r^{(n)}).
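A sketch of these updates in the unnormalized form of Chan, Golub & LeVeque, tracking a running mean beta and the centered sum of squares S (which equals n S_r^(n) in the slide's notation); the variable names are ours.

```python
import random

def update(beta, S, n, z):
    """One stable update of the running mean beta of z_1, ..., z_n and the
    centered sum of squares S = sum_i (z_i - beta)^2, Chan-Golub-LeVeque style.
    In the slide's notation z_i = phi_r(x_i) f(x_i) and S here is n*S_r^(n)."""
    d = z - beta
    beta += d / n
    S += (n - 1) / n * d * d
    return beta, S

rng = random.Random(0)
zs = [rng.random() for _ in range(1000)]
beta = S = 0.0
for i, z in enumerate(zs, start=1):
    beta, S = update(beta, S, i, z)
# S / (n * (n - 1)) then estimates Var(beta), as on the slide.
```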
Quasi Regression and black boxes 23
Updatable accuracy estimates

Predict f(x_n) by \hat f_{n-1}(x_n); x_n is independent of \hat f_{n-1}.
Average recent squared errors:

\widehat{ISE}^{(m)} = \frac{1}{n_m - n_{m-1}} \sum_{i = n_{m-1}+1}^{n_m} \big( f(x_i) - \hat f_{i-1}(x_i) \big)^2

on the subsequence n_m = m(m+1)/2, which estimates the average ISE over the most recent \approx \sqrt{2n} values.

Diagnostic:
Large LOF and small \sum_r Var(\hat\beta_r) \Rightarrow need a bigger basis.
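The triangular-number subsequence is easy to sketch; this helper (ours) just lists the refresh points n_m:

```python
def block_ends(M):
    """Refresh points n_m = m(m+1)/2 for the honest error average:
    block m has length m, which is roughly sqrt(2n) at sample size n = n_m."""
    return [m * (m + 1) // 2 for m in range(1, M + 1)]

ends = block_ends(4)  # first four blocks end at samples 1, 3, 6, 10
```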
Quasi Regression and black boxes 24
Presented as lack-of-fit:

1 - R^2 \doteq LOF = \frac{\widehat{ISE}}{\widehat{Var}}, \qquad \widehat{Var} = AVG\big( f(x_i)^2 \big) - \hat\beta_0^2.

[Table: \log_{10}(LOF) versus the corresponding R^2; entries garbled in extraction]
Quasi Regression and black boxes 25
Costs of algebra

Method             Time            Space           Footprint
Kriging            O(n^3 + p^3)    O(n^2 + p^2)    O(n + p)
Regression         O(n p^2)        O(p^2)          O(n p)
Quasi-regression   O(n p)          O(p)            O(n p^2)

Quasi-regression allows larger n, or much larger p: p = 1,000,000 is doable by quasi-regression, not by regression. Owen, Ann. Statist. 2000.

Cost of f   Dimension
Low         Low          Easy
High        Low          Kriging
Low         High         (Quasi-)regression
High        High         [good luck]
Quasi Regression and black boxes 26
Incorporating shrinkage — Hoerl & Kennard; Efromovich; Donoho & Johnstone; Beran

\hat f_{\lambda, n}(x) = \sum_r \lambda_{rn} \hat\beta_{rn} \phi_r(x), \qquad \lambda_{rn} \in [0, 1].

Optimally,

\lambda_{rn} = \frac{\beta_r^2}{\beta_r^2 + Var(\hat\beta_{rn})}.

Shrinkage can reduce prediction variance. We use data to estimate \lambda_{rn}, e.g.

\hat\lambda_{rn} = \frac{\big( \hat\beta_r^{(n-1)} \big)^2}{\big( \hat\beta_r^{(n-1)} \big)^2 + S_r^{(n-1)}}.
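The plug-in shrinkage factor can be sketched directly, clipping to [0, 1] as the slide requires; the function name and its arguments are ours:

```python
def shrinkage_weight(beta_hat, var_hat):
    """lambda = beta^2 / (beta^2 + Var(beta_hat)), estimated by plugging in
    the current coefficient estimate and its variance estimate, clipped to
    [0, 1].  Strong coefficients keep weight ~1; noise-level ones shrink to ~0."""
    b2 = beta_hat * beta_hat
    denom = b2 + var_hat
    lam = b2 / denom if denom > 0.0 else 0.0
    return min(max(lam, 0.0), 1.0)
```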
Quasi Regression and black boxes 27
Exploiting residuals

For r \neq 0: \beta_r(f) = \beta_r(f - c) for any constant c \in R, but

Var\Big( \frac{1}{n} \sum_{i=1}^n \phi_r(x_i) \big( f(x_i) - c \big) \Big)

depends on c. Try c = \hat\beta_0. More generally,

\hat\beta_r^{(n)} = \frac{1}{n} \sum_{i=1}^n \phi_r(x_i) \Big( f(x_i) - \sum_{s \neq r} \lambda_s \hat\beta_s^{(i-1)} \phi_s(x_i) \Big).

Original quasi-regression: \lambda_s = 0, or \lambda_s = 1 for s in the fitted basis.
Self-consistent quasi-regression: \lambda_s = \hat\lambda_s \in [0, 1].

Bounding \hat R^2 by the sample variance eliminates explosive feedback.
Still updatable: \hat\beta_r and S_r.
NB: n \big( \hat\beta_r^{(n)} - \beta_r \big) is a martingale in n.
Quasi Regression and black boxes 28
N-net example

f(x) is the prediction of \log_{10}(perf).
The \phi_r are Legendre polynomials; the multivariate \phi_r are their tensor products, with per-coordinate degrees r(j) subject to order and degree limits, giving p = 1145 basis functions.

The net is fast, so n = 500,000 (about 3 min on an 800 MHz PC in Java).
[Figure: lack-of-fit (LOF) versus sample size on log-log axes; sample sizes 100 to 100,000, LOF roughly 10^-3 to 1]
[Fitted quasi-regression approximation restated; its numeric coefficients were garbled in extraction]

Additive components (fraction of sample variance):
syct 0.520    mmin 0.011    mmax 0.088
Quasi Regression and black boxes 31
Neural net results

Number of bases is 1145
###### Anova at Iteration 500000 ######
1-RSquare (LOF) is 0.0011707 at iteration 499500
Beta[0] (constant factor) is 2.0717
Sample mean is 2.0719, sample variance is 0.14359
Unbiased estimates of dimension variances
0.11441 0.026592 0.0027723 0.0 0.0 0.0
Dimension Probabilities
(Ratios of dimension variances to sample variance)
0.79676 0.18518 0.019307 0.0 0.0 0.0
Quasi Regression and black boxes 32
Neural net results, ctd.

Variances on one and two variables / sample variance (diagonal entries are main effects; the chmax main effect is not shown):

        syct          mmin          mmax          cach          chmin
syct    0.5177106
mmin    9.292114E-4   0.01069175
mmax    0.008898125   0.02590950    0.08782891
cach    0.05507833    0.006469443   0.05429608    0.1301971
chmin   0.01091619    6.212815E-4   0.008541468   0.01008703    0.03679156
chmax   2.480628E-4   4.889575E-4   2.725553E-4   0.001473632   2.348261E-4

Biggest main effect: syct, at 52%.
Biggest interaction: syct \times cach, at about 5.5%.
Quasi Regression and black boxes 37
Biggest interaction

[Figure: contour plot of the cycle time x cache size interaction, with syct and cach each on [0, 1]]
Quasi Regression and black boxes 38
2nd biggest interaction

Cycle time \times max main memory: 5.4% of \sigma^2.

[Figure: perspective plot of the syct x mmax interaction, values roughly -0.06 to 0.04]
Quasi Regression and black boxes 39
2nd biggest interaction

[Figure: contour plot of the cycle time x max main memory interaction, with syct and mmax each on [0, 1]]
Quasi Regression and black boxes 40
N-net conclusions

1. f is a fairly simple function with respect to U[0,1]^6
2. x_1 (syct) is most important, and nearly linear
3. At least one interaction is not supported by the data
4. Non-random cross-validation (leave out clusters) might help
Quasi Regression and black boxes 41
Next directions

1. MARS-like dynamic choice of basis
2. Comparisons of f and \hat f on training data
3. Decompositions of \hat f under empirical measures
4. Distinguishing structure from artifacts
5. More types of statistical/ML black boxes
6. Missing data (arise in function mining too)
7. Stopping rules
8. More basis function choices
9. Block diagonal or banded \int \phi(x) \phi(x)^T\,dx (e.g. B-splines)
10. Examples with noise (some basis functions become unusable)
Quasi Regression and black boxes 42
Robot arm function

A robot arm has 4 joints: lengths L_j, angles \theta_j.
Shoulder at (0, 0), hand at (u, v):

u = \sum_{j=1}^{4} L_j \cos\Big( \sum_{k=1}^{j} \theta_k \Big), \qquad v = \sum_{j=1}^{4} L_j \sin\Big( \sum_{k=1}^{j} \theta_k \Big)

f = \sqrt{u^2 + v^2} is the shoulder-to-hand distance, with L_j \in [0, 1] and \theta_j \in [0, 2\pi].
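The robot arm function above can be sketched directly from the sums; the function name and test configurations are ours:

```python
import math

def robot_arm(lengths, angles):
    """Shoulder-to-hand distance for a planar arm: the shoulder sits at (0, 0),
    segment j has length L_j and adds angle theta_j, and the hand ends at
    (u, v) with u = sum_j L_j cos(theta_1 + ... + theta_j), similarly for v
    with sin."""
    u = v = cum = 0.0
    for L, th in zip(lengths, angles):
        cum += th          # cumulative angle theta_1 + ... + theta_j
        u += L * math.cos(cum)
        v += L * math.sin(cum)
    return math.hypot(u, v)

# Two unit segments: a straight arm reaches distance 2, a folded arm
# returns to the shoulder.
```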