Learning Inhomogeneous Gibbs Models. Ce Liu ([email protected])
TRANSCRIPT
Learning Inhomogeneous Gibbs Models
How to Describe the Virtual World
Histogram
Histogram: marginal distribution of image variances
The distribution is non-Gaussian
Texture Synthesis (Heeger et al, 95)
• Image decomposition by steerable filters
• Histogram matching
FRAME (Zhu et al, 97)
• Homogeneous Markov random field (MRF)
• Minimax entropy principle to learn the homogeneous Gibbs distribution
• Gibbs sampling and feature selection
Our Problem
To learn the distribution of structural signals
Challenges
• How to learn non-Gaussian distributions in high dimensions from few observations?
• How to capture the sophisticated properties of the distribution?
• How to optimize parameters with global convergence?
Inhomogeneous Gibbs Models (IGM)
A framework to learn arbitrary high-dimensional distributions
• 1D histograms on linear features to describe high-dimensional distribution
• Maximum Entropy Principle: Gibbs distribution
• Minimum Entropy Principle: Feature Pursuit
• Markov chain Monte Carlo in parameter optimization
• Kullback-Leibler Feature (KLF)
1D Observation: Histograms
Feature $f(x): \mathbb{R}^d \to \mathbb{R}$
• Linear feature: $f(x) = f^T x$
• Kernel distance: $f(x) = \|x - f\|$
Marginal distribution: $h(z) = \int \delta(z - f^T x)\, p(x)\, dx$
Histogram: $H = \frac{1}{N} \sum_{i=1}^{N} \delta(f^T x_i)$, where $\delta(f^T x_i) = (0, \dots, 0, 1, 0, \dots, 0)$ is the indicator vector of the bin containing $f^T x_i$
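As a concrete illustration, here is a minimal numpy sketch of this histogram estimator; the feature direction, bin count, and bin range below are arbitrary choices for the example, not values from the talk.

    import numpy as np

    def feature_histogram(X, f, bins=32, z_range=(-3.0, 3.0)):
        """Histogram H of the linear feature f(x) = f^T x over N samples.

        X: (N, d) array of samples, f: (d,) feature direction.
        Returns the normalized bin counts, i.e. the empirical marginal h(z).
        """
        z = X @ f                                     # 1D responses f^T x_i
        H, _ = np.histogram(z, bins=bins, range=z_range)
        return H / max(H.sum(), 1)                    # normalize so the bins sum to 1

    # Toy usage: project 2D Gaussian samples onto a fixed direction
    X = np.random.randn(1000, 2)
    f = np.array([1.0, 0.0])
    H = feature_histogram(X, f)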
Intuition
[Figure: two feature directions $f_1$, $f_2$ through the density $f(x)$, with the corresponding marginal histograms $H_1$ and $H_2$]
Learning Descriptive Models
[Figure: along features $f_1$ and $f_2$, the observed histograms $H_1^{obs}$, $H_2^{obs}$ are matched by the synthesized histograms $H_1^{syn}$, $H_2^{syn}$; when all histograms match, $f(x) = p(x)$]
Learning Descriptive Models
Sufficient features can make the learnt model $p(x)$ converge to the underlying distribution $f(x)$
Linear features and histograms are robust compared with other higher-order statistics
Descriptive models: $\Omega = \{\, p(x) \mid h_{f_i}(z) = h^{p}_{f_i}(z),\ i = 1, \dots, m \,\}$
Maximum Entropy Principle
Maximum Entropy Model
• To generalize the statistical properties of the observed data
• To make the learnt model carry no more information than what is available
Mathematical formulation:
$p^*(x) = \arg\max\, \mathrm{entropy}(p(x)) = \arg\max \left\{ -\int p(x) \log p(x)\, dx \right\}$
subject to $H_{p_i} = H_{f_i},\ i = 1, \dots, m$
Intuition of Maximum Entropy Principle
$\Omega_1 = \{\, p(x) \mid H^{p}_{f_1}(z) = H_{f_1}(z) \,\}$
[Figure: the maximum entropy model $p^*(x)$ inside the set $\Omega_1$, with its synthesized histogram $H_1^{syn}$ along feature $f_1$]
Solution form of the maximum entropy model:
$p(x;\Lambda) = \frac{1}{Z(\Lambda)} \exp\left\{ -\sum_{i=1}^{m} \langle \lambda_i(z),\, \delta(z - f_i^T x) \rangle \right\}$
Parameter: $\Lambda = \{\lambda_i(z)\}$
Inhomogeneous Gibbs Distribution
• $\lambda_i(z)$: Gibbs potential
• $\langle \lambda_i(z),\, \delta(z - f_i^T x) \rangle$: energy contributed by the feature response $f_i^T x$
• $\Lambda = \{\lambda_i\}$: model parameters
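A minimal sketch of evaluating the unnormalized log-density of this Gibbs form, assuming each potential $\lambda_i$ is stored as a vector over the histogram bins of the response $f_i^T x$; the binning helper and bin settings are assumptions for the example, not details given in the talk.

    import numpy as np

    def bin_index(z, bins=32, z_range=(-3.0, 3.0)):
        """Map a scalar response z to its histogram bin, clipped to the valid range."""
        lo, hi = z_range
        k = int((z - lo) / (hi - lo) * bins)
        return min(max(k, 0), bins - 1)

    def log_p_tilde(x, F, Lam, bins=32, z_range=(-3.0, 3.0)):
        """Unnormalized log p(x; Lambda) = -sum_i <lambda_i, delta(f_i^T x)>.

        F: (m, d) feature directions; Lam: (m, bins) Gibbs potentials lambda_i.
        """
        energy = 0.0
        for f_i, lam_i in zip(F, Lam):
            z = float(f_i @ x)                        # 1D response of feature i
            energy += lam_i[bin_index(z, bins, z_range)]
        return -energy                                # log-density up to log Z(Lambda)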
Estimating Potential Function
• Distribution form: $p(x;\Lambda) = \frac{1}{Z(\Lambda)} \exp\left\{ -\sum_{i=1}^{m} \langle \lambda_i,\, \delta(f_i^T x) \rangle \right\}$
• Normalization: $Z(\Lambda) = \int \exp\left\{ -\sum_{i=1}^{m} \langle \lambda_i,\, \delta(f_i^T x) \rangle \right\} dx$
• Maximum likelihood estimation (MLE): let $L(\Lambda) = \frac{1}{n} \sum_i \log p(x_i;\Lambda)$, then $\Lambda^* = \arg\max L(\Lambda)$
• 1st and 2nd order derivatives; the 1st-order derivative is $\frac{\partial L}{\partial \lambda_i} = -\frac{1}{Z}\frac{\partial Z}{\partial \lambda_i} - H_i^{obs} = E_{p(x;\Lambda)}[\delta(f_i^T x)] - H_i^{obs}$
Parameter Learning
Monte Carlo integration: $E_{p(x;\Lambda)}[\delta(f_i^T x)] \approx H_i^{syn}$, so $\frac{\partial L}{\partial \lambda_i} = H_i^{syn} - H_i^{obs}$
Algorithm
Input: $\{f_i\}$, $\{H_i^{obs}(z)\}$
Initialize: $\{\lambda_i\}$, step size $s$
Loop:
  Sampling: $\{x_i\} \sim p(x;\Lambda)$
  Compute histograms: $H_i^{syn},\ i = 1{:}m$
  Update parameters: $\lambda_i \leftarrow \lambda_i + s\,(H_i^{syn} - H_i^{obs}),\ i = 1{:}m$
  Histogram divergences: $D = \sum_{i=1}^{m} KL(H_i^{syn}, H_i^{obs})$
  Reduce $s$
Until $D < \epsilon$
Output: $\Lambda = \{\lambda_i\}$, synthesized samples $\{x\}$
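A numpy sketch of this update loop, assuming a sampler draw_samples(Lam) that returns synthesized samples from $p(x;\Lambda)$ (for instance by Gibbs sampling, as on the next slide) and the feature_histogram helper sketched earlier; the step-size schedule, tolerance, and iteration cap are illustrative assumptions, not values from the talk.

    import numpy as np

    def learn_potentials(F, H_obs, draw_samples, feature_histogram,
                         bins=32, s=1.0, eps=1e-2, max_iter=200):
        """Gradient ascent on the log-likelihood: lambda_i += s * (H_i^syn - H_i^obs)."""
        m = len(F)
        Lam = np.zeros((m, bins))                     # initialize all potentials to zero
        for _ in range(max_iter):
            X_syn = draw_samples(Lam)                 # MCMC samples from p(x; Lambda)
            H_syn = np.array([feature_histogram(X_syn, f, bins) for f in F])
            Lam += s * (H_syn - H_obs)                # update every potential at once
            # total histogram divergence D = sum_i KL(H_i^syn, H_i^obs)
            D = np.sum(H_syn * np.log((H_syn + 1e-12) / (H_obs + 1e-12)))
            if D < eps:                               # until D < epsilon
                break
            s *= 0.99                                 # slowly reduce the step size
        return Lam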
Gibbs Sampling
[Figure: Gibbs sampling path in a 2D space, alternating moves along the x and y axes]
$x_1^{(t+1)} \sim \pi(x_1 \mid x_2^{(t)}, x_3^{(t)}, \dots, x_K^{(t)})$
$x_2^{(t+1)} \sim \pi(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \dots, x_K^{(t)})$
$\qquad \vdots$
$x_K^{(t+1)} \sim \pi(x_K \mid x_1^{(t+1)}, \dots, x_{K-1}^{(t+1)})$
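A minimal sketch of this coordinate-wise scheme on a toy 2D Gaussian target, where the conditionals $\pi(x_k \mid \cdot)$ are available in closed form; for the Gibbs model itself the conditionals would instead be derived from the learnt potentials. The correlation, chain length, and seed are arbitrary example values.

    import numpy as np

    def gibbs_sample_2d_gaussian(rho=0.8, T=1000, seed=0):
        """Gibbs sampling from a 2D Gaussian with unit variances and correlation rho.

        Each sweep draws x1 ~ pi(x1 | x2) and then x2 ~ pi(x2 | x1),
        exactly the coordinate-wise updates on the slide (K = 2 here).
        """
        rng = np.random.default_rng(seed)
        x = np.zeros(2)
        samples = np.empty((T, 2))
        cond_std = np.sqrt(1.0 - rho ** 2)            # std of each 1D conditional
        for t in range(T):
            x[0] = rng.normal(rho * x[1], cond_std)   # x1^(t+1) ~ pi(x1 | x2^(t))
            x[1] = rng.normal(rho * x[0], cond_std)   # x2^(t+1) ~ pi(x2 | x1^(t+1))
            samples[t] = x
        return samples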
Minimum Entropy Principle
Minimum entropy principle
• To make the learnt distribution close to the observed one:
$KL(f(x), p(x;\Lambda^*)) = \int f(x) \log \frac{f(x)}{p(x;\Lambda^*)}\, dx = E_f[\log f(x)] - E_f[\log p^*(x;\Lambda)] = \mathrm{entropy}(p^*(x;\Lambda)) - \mathrm{entropy}(f(x))$
Feature selection
• Choose the feature set $\{f^{(i)}\}$ that minimizes the entropy of the learnt model: $\arg\min\, \mathrm{entropy}(p(x;\Lambda^{*}))$
• Model with a candidate feature $f$ added: $p(x;\Lambda,\lambda) = \frac{1}{Z} \exp\left\{ -\sum_{i=1}^{m} \langle \lambda_i,\, \delta(f_i^T x) \rangle - \langle \lambda,\, \delta(f^T x) \rangle \right\}$
• Model without it: $p(x;\Lambda) = \frac{1}{Z} \exp\left\{ -\sum_{i=1}^{m} \langle \lambda_i,\, \delta(f_i^T x) \rangle \right\}$
Feature Pursuit
A greedy procedure to learn the feature set $\{f_i\}_{i=1}^{K}$
Reference model: $p_{ref}(x;\Lambda)$, the model learnt with the features selected so far
Approximate information gain of a candidate feature $f$:
$d(f) = KL(f(x), p_{ref}(x;\Lambda)) - KL(f(x), p(x;\Lambda))$
The new feature is the one that maximizes this gain, $\arg\max_f KL(f(x), p_{ref}(x;\Lambda))$
Proposition
The approximate information gain for a new feature is
$d(f) \approx KL(H^{obs}, H^{p})$
and the optimal energy function for this feature is
$\lambda = \log \frac{H^{p}}{H^{obs}}$
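In code, the proposition gives both a score for ranking candidate features and an initialization for the new potential; a small numpy sketch follows (the smoothing constant is an assumption added to avoid division by zero, not part of the talk).

    import numpy as np

    def gain_and_potential(H_obs, H_p, smooth=1e-12):
        """d(f) ~= KL(H^obs, H^p) and the optimal potential lambda = log(H^p / H^obs)."""
        H_obs = H_obs + smooth
        H_p = H_p + smooth
        gain = np.sum(H_obs * np.log(H_obs / H_p))    # approximate information gain
        lam = np.log(H_p / H_obs)                     # initial energy for the new feature
        return gain, lam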
Kullback-Leibler Feature
The Kullback-Leibler feature maximizes the divergence between the observed and synthesized histograms:
$f_{KL} = \arg\max_f KL(H^{obs}, H^{syn}) = \arg\max_f \sum_z H^{obs}(z) \log \frac{H^{obs}(z)}{H^{syn}(z)}$
Pursue the feature by
• Hybrid Monte Carlo
• Sequential 1D optimization
• Feature selection
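A brute-force sketch of selecting the KL feature over a fixed set of candidate directions, reusing the feature_histogram helper sketched earlier; in the talk the search is carried out with hybrid Monte Carlo and sequential 1D optimization rather than this exhaustive scan.

    import numpy as np

    def kl_feature(X_obs, X_syn, candidates, feature_histogram, bins=32, smooth=1e-12):
        """Pick the direction f that maximizes KL(H^obs, H^syn) between the sample sets."""
        best_f, best_kl = None, -np.inf
        for f in candidates:                          # candidate unit directions, shape (d,)
            H_obs = feature_histogram(X_obs, f, bins) + smooth
            H_syn = feature_histogram(X_syn, f, bins) + smooth
            kl = np.sum(H_obs * np.log(H_obs / H_syn))
            if kl > best_kl:
                best_f, best_kl = f, kl
        return best_f, best_kl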
Acceleration by Importance Sampling
Gibbs sampling is too slow, so sampling is accelerated by importance sampling with the reference model:
$p_{ref}(x;\Lambda_{ref}) = \frac{1}{Z_{ref}} \exp\left\{ -\sum_{i=1}^{m} \langle \lambda_i^{ref},\, \delta(f_i^T x) \rangle \right\}$
$w_j \propto \exp\left\{ -\sum_{i=1}^{m} \langle \lambda_i - \lambda_i^{ref},\, \delta(f_i^T x_j^{ref}) \rangle \right\}, \qquad x_j^{ref} \sim p(x;\Lambda_{ref})$
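A sketch of the reweighting step: samples already drawn from the reference model are reused, each weighted by the difference between the current and reference potentials, so the synthesized histograms become weighted histograms of the reference samples instead of requiring a fresh MCMC run. The bin_index helper is the one assumed earlier.

    import numpy as np

    def importance_weights(X_ref, F, Lam, Lam_ref, bin_index, bins=32, z_range=(-3.0, 3.0)):
        """w_j proportional to exp{-sum_i <lambda_i - lambda_i^ref, delta(f_i^T x_j^ref)>}."""
        log_w = np.zeros(len(X_ref))
        for j, x in enumerate(X_ref):                 # x_j^ref ~ p(x; Lambda_ref)
            for f_i, lam_i, lam_ref_i in zip(F, Lam, Lam_ref):
                k = bin_index(float(f_i @ x), bins, z_range)
                log_w[j] -= lam_i[k] - lam_ref_i[k]   # energy difference, current vs. reference
        w = np.exp(log_w - log_w.max())               # stabilize before exponentiating
        return w / w.sum()                            # normalize the weights to sum to 1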
Flowchart of IGM
[Flowchart: observed samples → observed histograms → feature pursuit → KL feature → MCMC → synthesized samples; if KL ≥ ε the loop repeats, otherwise the learnt IGM is output]
Toy Problems (1)
[Figures for two toy distributions, a mixture of two Gaussians and a circle: synthesized samples, Gibbs potentials, observed histograms, synthesized histograms, and the feature pursuit sequence]
Toy Problems (2)
Swiss Roll
Applied to High Dimensions
In high-dimensional space
• Too many features are needed to constrain every dimension
• MCMC sampling is extremely slow
Solution: dimension reduction by PCA
Application: learning a face prior model
• 83 landmarks defined to represent a face (166 dimensions)
• 524 samples
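A minimal sketch of the PCA reduction on the landmark vectors, assuming a data matrix of shape (524, 166); the number of retained components is an arbitrary choice here, not a figure from the talk.

    import numpy as np

    def pca_reduce(X, k=30):
        """Project samples onto the top-k principal components.

        X: (N, d) data matrix, e.g. 524 face shapes of 166 landmark coordinates each.
        Returns the reduced coordinates, the mean shape, and the component basis.
        """
        mean = X.mean(axis=0)
        Xc = X - mean
        U, S, Vt = np.linalg.svd(Xc, full_matrices=False)   # rows of Vt: principal directions
        basis = Vt[:k]                                       # (k, d) top-k components
        Y = Xc @ basis.T                                     # (N, k) low-dimensional coordinates
        return Y, mean, basis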
Face Prior Learning (1)
Observed face examples; synthesized face samples without any features
Face Prior Learning (2)
Synthesized with 10 features; synthesized with 20 features
Face Prior Learning (3)
Synthesized with 30 features; synthesized with 50 features
Observed Histograms
Synthesized Histograms
Gibbs Potential Functions
Learning Caricature Exaggeration
Synthesis Results
Learning 2D Gibbs Process
Observed Pattern; Triangulation; Random Pattern
[Figures: observed histograms (1)-(4), synthesized histograms (1)-(4), and the corresponding synthesized patterns (1)-(4)]