![Page 1: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/1.jpg)
LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES
Kev i n E l l i s - M I T , Dan ie l R i t c h i e - B rown Un i ve r s i t y, A r mando So la r- Lezama- M IT , Jo s hua b . Tenenbaum - M IT
Presented by : Maliha ArifAdvanced Computer Vision -CAP 641229th March 2018
![Page 2: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/2.jpg)
OUTLINE
Introduction / Motivation
Road map – Trace Hypothesis
Stage 1 – Obtaining a Trace Set using Neural Network Training Generalization to Hand Drawings
Stage 2 – Obtaining a Program from Trace setSearch AlgorithmCompression Factor
Applications
2
![Page 3: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/3.jpg)
MOTIVATIONHand Drawn Images
Trace Set- Rendered in LatexSynthesized Program
3
![Page 4: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/4.jpg)
TRACE HYPOTHESIS
Stage 1 Stage 2
4
![Page 5: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/5.jpg)
STAGE 1: NEURAL NETWORK
5
![Page 6: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/6.jpg)
NEURAL NETWORK COMPONENTS
CNN layers
Layer 1: 20 8 × 8 convolutions, 2 16 × 4 convolutions, 2 4 × 16 convolutions. Followed by 8 × 8 pooling with a stride size of 4.Layer 2: 10 8 × 8 convolutions. Followed by 4 × 4 pooling with a stride size of 4.
Drawing commands are predicted Token by Token using MLPs and STNs
Example Circle command is predicted , then X coordinate then Y coordinate
6
![Page 7: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/7.jpg)
NEURAL NETWORK
Circle command or first token is predicted using multinomial logistic regression using a Softmax layer
Subsequent Tokens are predicted using a Differentiable Attention Mechanism and previously predicted Tokens
STN ( Spatial Transformer Network )
7
![Page 8: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/8.jpg)
CNNs lack the ability to be spatially invariant in a computationally and parameter efficient manner.
Max Pooling layers in CNNs satisfy this property where the receptive fields are fixed and local.
STN ( Spatial Transformer Network ) is a dynamic mechanism that can actively spatially transform an image or feature map.
STN- SPATIAL TRANSFORMER NETWORK
Spatial Transformer Networks : Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, CVPR 2015
8
![Page 9: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/9.jpg)
STN – SPATIAL TRANSFORMER NETWORKS
Spatial transformers can be incorporated into CNNs to benefit multifarious tasks such as:
Image ClassificationCo-localizationSpatial attention
We use them here for spatial attention.
Spatial Transformer Networks : Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, CVPR 2015
9
![Page 10: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/10.jpg)
STN – SPATIAL TRANSFORMER NETWORKS
The Spatial transformer mechanism is split into 3 parts
Spatial Transformer Networks : Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, CVPR 2015 10
![Page 11: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/11.jpg)
Output pixels are defined to lie on aregular grid.
Target
Sampling Grid
Grid Generator + Sampler : Affine transform
Spatial Transformer Networks : Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, CVPR 2015
STN – SPATIAL TRANSFORMER NETWORKS
11
![Page 12: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/12.jpg)
TargetSource
Spatial Transformer Networks : Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, CVPR 2015
STN – SPATIAL TRANSFORMER NETWORKS
12
![Page 13: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/13.jpg)
STAGE 1: NEURAL NETWORK
13
![Page 14: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/14.jpg)
PROBABILITY DISTRIBUTION OF THE CNNDistribution over next drawing command
Distribution over traces
14
![Page 15: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/15.jpg)
TRAINING
Network may predict incorrectly
To make the network more robust, SMC is employed
SMC – Sequential Monte Carlo Sampling uses Pixel wise distance as a surrogate for likelihood function
15
![Page 16: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/16.jpg)
RESULTS Edit Distance = size of the symmetric difference between the ground truth trace set and the set produced by the model.
16
![Page 17: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/17.jpg)
GENERALIZING TO HAND DRAWINGS
Actual hand drawings are not used, rather noise is introduced into rendering of training target images
They are produced in the following ways:
Rescaling the image intensity by a factor chosen from [0.5, 1.5] Translating the image by ±3 pixels chosen uniformly Rendering the LATEX using the pencildraw style
17
![Page 18: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/18.jpg)
LIKELIHOOD FUNCTION
L learned
Model now predicts 2 scalars T1 –T2 and T2 - T1
where T1 is rendered output with noiseand T2 is rendered output
18
![Page 19: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/19.jpg)
EVALUATION
100 hand drawn images on graph
paper snapped to a 16 x 16 grid
IoU is intersection over union between
ground truth and predicted trace set
We use IoU to measure the system's
accuracy at recovering trace sets.
19
![Page 20: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/20.jpg)
STAGE 2 : PROGRAM SYNTHESIS
20
![Page 21: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/21.jpg)
COST FUNCTION
Cost is computed in the following way
Programs incur a cost of 1 for each command ( primitive drawing action, loop, or reflection )They incur a cost of 1/3 for each unique coefficient they use in a linear transformation
21
![Page 22: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/22.jpg)
BIAS OPTIMAL SEARCH POLICY
Bias Optimality Search Algorithm : A search algorithm is n-bias optimal w.r.t a distribution. if it is guaranteed to find a solution in ( ) after searching for at least
We use a 1 – bias optimal search algorithm which means we spend time looking in the program space.
This implies that we look in the whole program space
22
![Page 23: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/23.jpg)
TRAINING Policy is given by Define a loss to train the search Policy
D is the training corpus i.e. minimum cost programs for few trace sets. denotes parameters of the search policy
23
![Page 24: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/24.jpg)
BILINEAR MODEL
where denote a one-hot encoding of the parameter settings of sigma, 24 dimensions long
24
![Page 25: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/25.jpg)
SEARCH POLICY - COMPARISON
25
![Page 26: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/26.jpg)
COMPRESSION (TRACE SET AND PROGRAM )
Compression Factor
21/6 = 3.5x
Compression Factor
13/6 = 2.2x
26
Ising Model – Lattice Structure
Another example
![Page 27: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/27.jpg)
APPLICATIONS Correcting Errors made by the neural network using Prior Probability of programs
Hand Drawing Trace Set –Rendered in Latex
Program Synthesizer's Output27
![Page 28: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/28.jpg)
Modelling similarity between DrawingsAPPLICATIONS
28
![Page 29: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/29.jpg)
Extrapolating Figures
APPLICATIONS
Example 1 Example 2
29
![Page 30: LEARNING TO INFER GRAPHICS PROGRAMS FROM …crcv.ucf.edu/courses/CAP6412/Spring2018/ADV_Presen… · · 2018-04-17LEARNING TO INFER GRAPHICS PROGRAMS FROM HAND DRAWN IMAGES Kevin](https://reader031.vdocuments.net/reader031/viewer/2022022510/5ada2b717f8b9add658c4873/html5/thumbnails/30.jpg)
THANK YOU Discussion
30