qregup4
TRANSCRIPT
8/14/2019 qregup4
http://slidepdf.com/reader/full/qregup4 1/11
Quasi Regression and black boxes 1
Finding important
variables and interactions
in black boxes
Art B. Owen
Stanford University
Tao Jiang
Stanford University
Quasi Regression and black boxes 2
Theme: As dimension increases, many numerical problems become more statistical.

Because:
1. the sample is inevitably sparse,
2. error depends on the unsampled part of the space,
3. worst-case error bounds are inapplicable.
Quasi Regression and black boxes 3
Example: integration

I = \int_{[0,1]^d} f(x)\,dx

Sampling methods:
1. Monte Carlo: error O(n^{-1/2})
2. Quasi-Monte Carlo: O(n^{-1} (\log n)^{d-1}), but no practical error estimate
3. Randomized quasi-Monte Carlo: replication-based error estimates, and O(n^{-3/2} (\log n)^{(d-1)/2})

Rates are asymptotic under mild conditions on f.
Also statistical: approximation.
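As an illustration of the statistical viewpoint, a minimal plain Monte Carlo integrator with its purely statistical standard error (the O(n^{-1/2}) method above); the integrand and all names here are our own illustrative choices, not from the slides.

```python
import math, random

def mc_integrate(f, d, n, seed=0):
    """Plain Monte Carlo for I = integral of f over [0,1]^d.  The estimate
    has root mean squared error O(n^{-1/2}), and the returned standard
    error is the practical, replication-free error estimate."""
    rng = random.Random(seed)
    total = total2 = 0.0
    for _ in range(n):
        y = f([rng.random() for _ in range(d)])
        total += y
        total2 += y * y
    mean = total / n
    var = (total2 - n * mean * mean) / (n - 1)  # sample variance of f(x_i)
    return mean, math.sqrt(var / n)

# Toy integrand (ours): f(x) = x_1 + ... + x_6, whose true integral is 3.
est, se = mc_integrate(lambda x: sum(x), d=6, n=4000)
```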
Quasi Regression and black boxes 4
Mortgage backed securities integrand — Paskov & Traub; Caflisch, Morokoff & Owen

Present value of 30 years of monthly cash flows.

Prepayment:
1. puts lumps into the payment stream
2. more common when interest rates are low

MBS model (from Goldman-Sachs): f(x), x \sim U[0,1]^{360}.
Interest rates r_1, \dots, r_{360}: geometric Brownian motion driven by x.
Prepayment fraction: a_1 + a_2 \arctan(a_3 + a_4 r_t), for constants a_1, \dots, a_4.
8/14/2019 qregup4
http://slidepdf.com/reader/full/qregup4 2/11
Quasi Regression and black boxes 5
QMC super on MBS

But f is very nearly additive: Latin hypercube sampling variance is below 0.1% of MC.
Also f is very nearly odd (antisymmetric): antithetic sampling variance is about 0.02% of MC.
Additive and odd: f was virtually linear in x, upon further investigation.
The curse of dimensionality is not broken by QMC; we just had an easy integrand.
QMC requires low "effective dimension" to trounce MC.
Quasi Regression and black boxes 6
ANOVA of L^2[0,1]^d — Hoeffding; Efron & Stein; Sobol'

Main effects and k-factor interactions generalizing familiar discrete ANOVA:

f(x) = \sum_{u \subseteq \{1, 2, \dots, d\}} f_u(x)

where f_u depends only on the x-components in the set u, and f_\emptyset = \int f(x)\,dx is the "grand mean". Then

\sigma^2(f) = \sum_{u \neq \emptyset} \int f_u(x)^2\,dx, \qquad \int f_u(x) f_v(x)\,dx = 0 \ \text{for}\ u \neq v,

and the sample average splits term by term:

\frac{1}{n} \sum_{i=1}^n f(x_i) = \sum_u \frac{1}{n} \sum_{i=1}^n f_u(x_i).

QMC: the x_i are very uniform in low dimensional projections.
Great for functions dominated by f_u with small |u|.
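A small numerical check of the ANOVA decomposition, using a midpoint-rule grid instead of sampling; the toy integrand f(x1, x2) = x1 + x2 + x1 x2 is our own example, not from the slides.

```python
# Midpoint-rule check of the ANOVA identity for f(x1, x2) = x1 + x2 + x1*x2
# on [0,1]^2: main effects are conditional means minus the grand mean, and
# the variance components add up to the total variance.
m = 200
pts = [(i + 0.5) / m for i in range(m)]
f = lambda x1, x2: x1 + x2 + x1 * x2

mu = sum(f(a, b) for a in pts for b in pts) / m**2        # grand mean = 1.25
g1 = [sum(f(a, b) for b in pts) / m - mu for a in pts]    # main effect f_{1}(x1)
g2 = [sum(f(a, b) for a in pts) / m - mu for b in pts]    # main effect f_{2}(x2)

var_total = sum((f(a, b) - mu) ** 2 for a in pts for b in pts) / m**2
var_1 = sum(v * v for v in g1) / m     # ~ Var(1.5*x1 - 0.75) = 0.1875
var_2 = sum(v * v for v in g2) / m
var_12 = var_total - var_1 - var_2     # interaction component, ~ 1/144
```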
Quasi Regression and black boxes 7
Isotropic integrand — Capstick & Keister; Papageorgiou & Traub; Owen

f(x) = \cos\big( \| \Phi^{-1}(x) \| / \sqrt{2} \big), \qquad x \sim U(0,1)^d,

with \Phi^{-1} applied componentwise. \int f(x)\,dx has a closed form (Mathematica), which aids comparison of methods.

Varies equally in all directions. QMC does well.
For d = 25, over 99% of the variance comes from the 1-, 2- and 3-dimensional ANOVA effects (after numerical investigation exploiting symmetry and Gaussianity).
Quasi Regression and black boxes 8
The borehole function — Morris, Mitchell, Ylvisaker

Flow from upper to lower aquifer:

f = \frac{2 \pi T_u (H_u - H_l)}{\log(r / r_w) \Big[ 1 + \frac{2 L T_u}{\log(r / r_w)\, r_w^2 K_w} + \frac{T_u}{T_l} \Big]}

r_w, r — radii of borehole and basin
T_l, T_u — transmissivities, lower and upper
H_l, H_u — potentiometric heads, lower and upper
L, K_w — length and conductivity

Diaconis: a closed form is not the same as understanding.
Which variables are important? Which interact?
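The borehole formula above is direct to code; a minimal sketch (the evaluation point below is an illustrative choice inside the usual input ranges, not from the slides):

```python
import math

def borehole(rw, r, Tu, Hu, Tl, Hl, L, Kw):
    """Flow through a borehole from the upper to the lower aquifer,
    per the slide's formula; log is the natural logarithm."""
    lnr = math.log(r / rw)
    return (2.0 * math.pi * Tu * (Hu - Hl)) / (
        lnr * (1.0 + 2.0 * L * Tu / (lnr * rw ** 2 * Kw) + Tu / Tl))

# An illustrative point inside the commonly used input ranges.
flow = borehole(rw=0.1, r=5000.0, Tu=80000.0, Hu=1000.0,
                Tl=90.0, Hl=750.0, L=1500.0, Kw=10000.0)
```

Raising the upper head H_u while holding everything else fixed increases the flow, which gives a quick sanity check on the implementation.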
8/14/2019 qregup4
http://slidepdf.com/reader/full/qregup4 3/11
Quasi Regression and black boxes 9
Black box functions

y = f(x), with no "+ \varepsilon" noise term
Examples
Semiconductors Device design Speed, heat
Aerospace Wing shape Lift, drag
Automotive Auto Frame Strength, weight
Statistics Predictors Responses
Used to design products. Cheaper than physical
experiments. Costs from milliseconds to hours. Dimension
from 3 to 300. Accuracy varies too.
Kriging widely used: Journel, Huijbregts, Sacks, Ylvisaker, Welch, Wynn, Mitchell
Quasi Regression and black boxes 10
A small neural net — Venables, Ripley

Predict log_{10}(perf) from the others:
perf published performance of computer
syct cycle time in nanoseconds
mmin minimum main memory in kilobytes
mmax maximum main memory in kilobytes
cach cache size in kilobytes
chmin minimum number of channels
chmax maximum number of channels
Function found by training on 209 examples.
Quasi Regression and black boxes 11
The n-net function

f(x) = \beta_0 + \sum_{j=1}^{6} \beta_j x_j + \sum_{k=1}^{3} \gamma_k\, S\Big( \alpha_{k0} + \sum_{j=1}^{6} \alpha_{kj} x_j \Big)

where S(z) = [1 + \exp(-z)]^{-1} is a sigmoidal function. (The fitted numeric weights were garbled in this transcript and are not reproduced.)
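The fitted network has a skip-layer form: linear terms in each input plus sigmoidal hidden units. A sketch evaluator, with placeholder weights since the slide's fitted values did not survive transcription:

```python
import math

def sigmoid(z):
    """S(z) = [1 + exp(-z)]^{-1}, the sigmoidal function from the slide."""
    return 1.0 / (1.0 + math.exp(-z))

def skip_layer_net(x, b0, linear, hidden):
    """Evaluate f(x) = b0 + sum_j linear[j]*x[j]
                     + sum_k w_k * S(a_k0 + sum_j a_kj * x[j]).
    `hidden` is a list of (w_k, a_k0, [a_k1, ..., a_kd]) triples.
    All weights here are placeholders, not the fitted values."""
    y = b0 + sum(c * xj for c, xj in zip(linear, x))
    for w, a0, a in hidden:
        y += w * sigmoid(a0 + sum(aj * xj for aj, xj in zip(a, x)))
    return y

# With zero input to the sigmoid, the hidden unit contributes w * 0.5.
y = skip_layer_net([0.0, 0.0], 1.0, [2.0, 3.0], [(1.0, 0.0, [0.0, 0.0])])
```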
Quasi Regression and black boxes 12
Given f(x) on [0,1]^d, how can we tell if f:
1. is nearly linear?
2. is nearly additive?
3. is nearly quadratic?
4. has mostly 3-factor interactions or less?
5. Which variables matter most?
6. Which interactions matter most?

We would like:
1. a systematic approach
2. that also predicts f
Quasi Regression and black boxes 17
Interpretation

Write f(x) = \phi(x)^T \beta^* + \varepsilon(x), with \varepsilon orthogonal to the basis. The variance of f is

\sum_{r \neq 0} \beta_r^2 + \int \varepsilon(x)^2\,dx.

The importance of a set S of indices is \sum_{r \in S} \beta_r^2, estimated by

\sum_{r \in S} \big[ \hat\beta_r^2 - \widehat{Var}(\hat\beta_r) \big].

Subsets of interest include:
\{r : r(1) > 0\} — involves x_1
\{r : r(1) = 0\} — does not involve x_1
\{r : \#\{j : r(j) > 0\} \le 1\} — additive part
\{r : \#\{j : r(j) > 0\} \le k\} — interactions up to order k
\{r : \sum_j r(j) \le k\} — of degree at most k
\{r : r(j) = 0 \ \text{for}\ j > 3\} — uses only the first 3 inputs
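Subset importances of this kind can be read straight off the multi-indices; a sketch with hypothetical coefficients (ours, for illustration), omitting the slide's bias correction:

```python
def importance(betas, in_subset):
    """Sum of beta_r^2 over multi-indices r in a subset of interest.
    (The bias correction, subtracting Var(beta_hat_r), is omitted here.)"""
    return sum(b * b for r, b in betas.items() if in_subset(r))

# Hypothetical coefficients keyed by multi-index r = (r(1), r(2)).
betas = {(0, 0): 2.0, (1, 0): 0.8, (0, 1): 0.5, (1, 1): 0.3, (2, 0): 0.1}

# Additive part: at most one coordinate active, excluding the grand mean.
additive = importance(betas, lambda r: any(r) and sum(rj > 0 for rj in r) <= 1)
# Everything involving x_1.
involves_x1 = importance(betas, lambda r: r[0] > 0)
```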
Quasi Regression and black boxes 18
Approximation through integration

Define \tilde f(x) = \beta_0 \phi_0(x) + \dots + \beta_{p-1} \phi_{p-1}(x) = \phi(x)^T \beta.

The optimal \beta is

\beta^* = \arg\min_\beta \int \big( f(x) - \phi(x)^T \beta \big)^2\,dx = \Big[ \int \phi(x) \phi(x)^T\,dx \Big]^{-1} \int \phi(x) f(x)\,dx;

also,

ISE = \int \big( f(x) - \phi(x)^T \beta \big)^2\,dx.
Quasi Regression and black boxes 19
Regression and quasi-regression

\beta^* = \Big[ \int \phi(x) \phi(x)^T\,dx \Big]^{-1} \int \phi(x) f(x)\,dx = \int \phi(x) f(x)\,dx

by orthogonality (the \phi_r are orthonormal).

Observations: x_i \sim U[0,1]^d, i = 1, \dots, n, IID. Let Z be the n \times p matrix with Z_{ir} = \phi_r(x_i) and Y the n \times 1 vector with Y_i = f(x_i).

Regression: \hat\beta = (Z^T Z)^{-1} Z^T Y
Quasi-regression: \tilde\beta = \frac{1}{n} Z^T Y
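A one-dimensional sketch of quasi-regression with the first three orthonormal Legendre polynomials on [0, 1]; the test function f(x) = x and the sample size are our own choices.

```python
import math, random

# First three orthonormal Legendre polynomials on [0, 1].
phi = [lambda x: 1.0,
       lambda x: math.sqrt(3.0) * (2.0 * x - 1.0),
       lambda x: math.sqrt(5.0) * (6.0 * x * x - 6.0 * x + 1.0)]

def quasi_regression(f, n, seed=1):
    """beta_r = integral of phi_r(x) f(x) dx, estimated by the plain
    average (1/n) sum_i phi_r(x_i) f(x_i) -- no matrix inversion."""
    rng = random.Random(seed)
    beta = [0.0] * len(phi)
    for i in range(1, n + 1):
        x = rng.random()
        fx = f(x)
        for r, p in enumerate(phi):
            beta[r] += (p(x) * fx - beta[r]) / i  # running-mean update
    return beta

# For f(x) = x: beta_0 = 1/2, beta_1 = sqrt(3)/6, beta_2 = 0.
beta = quasi_regression(lambda x: x, n=20000)
```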
Quasi Regression and black boxes 20
Precursors of quasi-regression

Quasi-interpolation — Chui & Diamond; Wang: "ignore the denominator" (Z^T Z) to get fast approximate interpolation.

Computer experiments — Koehler and Owen 1996 advocate quasi-regression for computer experiments.
Efromovich 1992 applies quasi-regression to sinusoids on [0,1].
Owen 1992 describes quasi-regression for Latin hypercube sampling.
Quasi Regression and black boxes 21
Accuracy in Monte Carlo sampling

Define:

N = \frac{1}{n} \sum_{i=1}^n \phi(x_i) \big( f(x_i) - \phi(x_i)^T \beta^* \big) \quad (p \times 1)

D = \frac{1}{n} \sum_{i=1}^n \phi(x_i) \phi(x_i)^T - I \quad (p \times p)

For quasi-regression, \tilde\beta - \beta^* = N + D \beta^*. For regression,

\hat\beta - \beta^* = (I + D)^{-1} \Big( \frac{1}{n} Z^T Y \Big) - \beta^* = (I + D)^{-1} N = (I - D + D^2 - D^3 \cdots) N \doteq N - D N.
Quasi Regression and black boxes 22
Fast stable updates

Define:

\hat\beta_r^{(n)} = \frac{1}{n} \sum_{i=1}^n \phi_r(x_i) f(x_i), \qquad S_r^{(n)} = \frac{1}{n} \sum_{i=1}^n \big( \phi_r(x_i) f(x_i) - \hat\beta_r^{(n)} \big)^2.

Then:

\hat\beta_r^{(n)} = \hat\beta_r^{(n-1)} + \frac{1}{n} \big( \phi_r(x_n) f(x_n) - \hat\beta_r^{(n-1)} \big)

S_r^{(n)} = \frac{n-1}{n} S_r^{(n-1)} + \frac{n-1}{n^2} \big( \phi_r(x_n) f(x_n) - \hat\beta_r^{(n-1)} \big)^2

Chan, Golub & LeVeque, who use n S_r^{(n)}. Then S_r^{(n)} / (n - 1) \doteq Var(\hat\beta_r^{(n)}).
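A sketch of these updates in the unnormalized form of Chan, Golub & LeVeque, tracking a running mean beta and the centered sum of squares S (which equals n S_r^(n) in the slide's notation); the variable names are ours.

```python
import random

def update(beta, S, n, z):
    """One stable update of the running mean beta of z_1, ..., z_n and the
    centered sum of squares S = sum_i (z_i - beta)^2, Chan-Golub-LeVeque style.
    In the slide's notation z_i = phi_r(x_i) f(x_i) and S here is n*S_r^(n)."""
    d = z - beta
    beta += d / n
    S += (n - 1) / n * d * d
    return beta, S

rng = random.Random(0)
zs = [rng.random() for _ in range(1000)]
beta = S = 0.0
for i, z in enumerate(zs, start=1):
    beta, S = update(beta, S, i, z)
# S / (n * (n - 1)) then estimates Var(beta), as on the slide.
```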
Quasi Regression and black boxes 23
Updatable accuracy estimates

Predict f(x_n) by \hat f_{n-1}(x_n); x_n is independent of \hat f_{n-1}.
Average recent squared errors:

\widehat{ISE}^{(m)} = \frac{1}{n_m - n_{m-1}} \sum_{i = n_{m-1}+1}^{n_m} \big( f(x_i) - \hat f_{i-1}(x_i) \big)^2

on the subsequence n_m = m(m+1)/2, which estimates the average ISE over the most recent \approx \sqrt{2n} values.

Diagnostic:
Large LOF and small \sum_r Var(\hat\beta_r) \Rightarrow need a bigger basis.
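The triangular-number subsequence is easy to sketch; this helper (ours) just lists the refresh points n_m:

```python
def block_ends(M):
    """Refresh points n_m = m(m+1)/2 for the honest error average:
    block m has length m, which is roughly sqrt(2n) at sample size n = n_m."""
    return [m * (m + 1) // 2 for m in range(1, M + 1)]

ends = block_ends(4)  # first four blocks end at samples 1, 3, 6, 10
```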
Quasi Regression and black boxes 24
Presented as lack-of-fit:

1 - R^2 \doteq LOF = \frac{\widehat{ISE}}{\widehat{Var}}, \qquad \widehat{Var} = AVG\big( f(x_i)^2 \big) - \hat\beta_0^2.

[Table: \log_{10}(LOF) versus the corresponding R^2; entries garbled in extraction]
Quasi Regression and black boxes 25
Costs of algebra

Method             Time            Space           Footprint
Kriging            O(n^3 + p^3)    O(n^2 + p^2)    O(n + p)
Regression         O(n p^2)        O(p^2)          O(n p)
Quasi-regression   O(n p)          O(p)            O(n p^2)

Quasi-regression allows larger n, or much larger p: p = 1,000,000 is doable by quasi-regression, not by regression. Owen, Ann. Statist. 2000.

Cost of f   Dimension
Low         Low          Easy
High        Low          Kriging
Low         High         (Quasi-)regression
High        High         [good luck]
Quasi Regression and black boxes 26
Incorporating shrinkage — Hoerl & Kennard; Efromovich; Donoho & Johnstone; Beran

\hat f_{\lambda, n}(x) = \sum_r \lambda_{rn} \hat\beta_{rn} \phi_r(x), \qquad \lambda_{rn} \in [0, 1].

Optimally,

\lambda_{rn} = \frac{\beta_r^2}{\beta_r^2 + Var(\hat\beta_{rn})}.

Shrinkage can reduce prediction variance. We use data to estimate \lambda_{rn}, e.g.

\hat\lambda_{rn} = \frac{\big( \hat\beta_r^{(n-1)} \big)^2}{\big( \hat\beta_r^{(n-1)} \big)^2 + S_r^{(n-1)}}.
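The plug-in shrinkage factor can be sketched directly, clipping to [0, 1] as the slide requires; the function name and its arguments are ours:

```python
def shrinkage_weight(beta_hat, var_hat):
    """lambda = beta^2 / (beta^2 + Var(beta_hat)), estimated by plugging in
    the current coefficient estimate and its variance estimate, clipped to
    [0, 1].  Strong coefficients keep weight ~1; noise-level ones shrink to ~0."""
    b2 = beta_hat * beta_hat
    denom = b2 + var_hat
    lam = b2 / denom if denom > 0.0 else 0.0
    return min(max(lam, 0.0), 1.0)
```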
Quasi Regression and black boxes 27
Exploiting residuals

For r \neq 0: \beta_r(f) = \beta_r(f - c) for any constant c \in R, but

Var\Big( \frac{1}{n} \sum_{i=1}^n \phi_r(x_i) \big( f(x_i) - c \big) \Big)

depends on c. Try c = \hat\beta_0. More generally,

\hat\beta_r^{(n)} = \frac{1}{n} \sum_{i=1}^n \phi_r(x_i) \Big( f(x_i) - \sum_{s \neq r} \lambda_s \hat\beta_s^{(i-1)} \phi_s(x_i) \Big).

Original quasi-regression: \lambda_s = 0, or \lambda_s = 1 for s in the fitted basis.
Self-consistent quasi-regression: \lambda_s = \hat\lambda_s \in [0, 1].

Bounding \hat R^2 by the sample variance eliminates explosive feedback.
Still updatable: \hat\beta_r and S_r.
NB: n \big( \hat\beta_r^{(n)} - \beta_r \big) is a martingale in n.
Quasi Regression and black boxes 28
N-net example

f(x) is the prediction of \log_{10}(perf).
The \phi_r are Legendre polynomials; the multivariate \phi_r are their tensor products, with per-coordinate degrees r(j) subject to order and degree limits, giving p = 1145 basis functions.

The net is fast, so n = 500,000 (about 3 min on an 800 MHz PC in Java).
[Figure: lack-of-fit (LOF) versus sample size on log-log axes; sample sizes 100 to 100,000, LOF roughly 10^-3 to 1]
[Fitted quasi-regression approximation restated; its numeric coefficients were garbled in extraction]

Additive components (fraction of sample variance):
syct 0.520    mmin 0.011    mmax 0.088
Quasi Regression and black boxes 31
Neural net results

Number of bases is 1145
###### Anova at Iteration 500000 ######
1-RSquare (LOF) is 0.0011707 at iteration 499500
Beta[0] (constant factor) is 2.0717
Sample mean is 2.0719, sample variance is 0.14359
Unbiased estimates of dimension variances
0.11441 0.026592 0.0027723 0.0 0.0 0.0
Dimension Probabilities
(Ratios of dimension variances to sample variance)
0.79676 0.18518 0.019307 0.0 0.0 0.0
Quasi Regression and black boxes 32
Neural net results, ctd.

Variances on one and two variables / sample variance (diagonal entries are main effects; the chmax main effect is not shown):

        syct          mmin          mmax          cach          chmin
syct    0.5177106
mmin    9.292114E-4   0.01069175
mmax    0.008898125   0.02590950    0.08782891
cach    0.05507833    0.006469443   0.05429608    0.1301971
chmin   0.01091619    6.212815E-4   0.008541468   0.01008703    0.03679156
chmax   2.480628E-4   4.889575E-4   2.725553E-4   0.001473632   2.348261E-4

Biggest main effect: syct, at 52%.
Biggest interaction: syct \times cach, at about 5.5%.
Quasi Regression and black boxes 37
Biggest interaction

[Figure: contour plot of the cycle time x cache size interaction, with syct and cach each on [0, 1]]
Quasi Regression and black boxes 38
2nd biggest interaction

Cycle time \times max main memory: 5.4% of \sigma^2.

[Figure: perspective plot of the syct x mmax interaction, values roughly -0.06 to 0.04]
Quasi Regression and black boxes 39
2nd biggest interaction

[Figure: contour plot of the cycle time x max main memory interaction, with syct and mmax each on [0, 1]]
Quasi Regression and black boxes 40
N-net conclusions

1. f is a fairly simple function with respect to U[0,1]^6
2. x_1 (syct) is most important, and nearly linear
3. At least one interaction is not supported by the data
4. Non-random cross-validation (leave out clusters) might help
Quasi Regression and black boxes 41
Next directions

1. MARS-like dynamic choice of basis
2. Comparisons of f and \hat f on training data
3. Decompositions of \hat f under empirical measures
4. Distinguishing structure from artifacts
5. More types of statistical/ML black boxes
6. Missing data (arise in function mining too)
7. Stopping rules
8. More basis function choices
9. Block diagonal or banded \int \phi(x) \phi(x)^T\,dx (e.g. B-splines)
10. Examples with noise (some basis functions become unusable)
Quasi Regression and black boxes 42
Robot arm function

A robot arm has 4 joints: lengths L_j, angles \theta_j.
Shoulder at (0, 0), hand at (u, v):

u = \sum_{j=1}^{4} L_j \cos\Big( \sum_{k=1}^{j} \theta_k \Big), \qquad v = \sum_{j=1}^{4} L_j \sin\Big( \sum_{k=1}^{j} \theta_k \Big)

f = \sqrt{u^2 + v^2} is the shoulder-to-hand distance, with L_j \in [0, 1] and \theta_j \in [0, 2\pi].
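The robot arm function above can be sketched directly from the sums; the function name and test configurations are ours:

```python
import math

def robot_arm(lengths, angles):
    """Shoulder-to-hand distance for a planar arm: the shoulder sits at (0, 0),
    segment j has length L_j and adds angle theta_j, and the hand ends at
    (u, v) with u = sum_j L_j cos(theta_1 + ... + theta_j), similarly for v
    with sin."""
    u = v = cum = 0.0
    for L, th in zip(lengths, angles):
        cum += th          # cumulative angle theta_1 + ... + theta_j
        u += L * math.cos(cum)
        v += L * math.sin(cum)
    return math.hypot(u, v)

# Two unit segments: a straight arm reaches distance 2, a folded arm
# returns to the shoulder.
```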