bair retreat 3/28/17 · learning a universal driving policy • self driving as egomotion...
TRANSCRIPT
![Page 1: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/1.jpg)
BAIR Retreat 3/28/17
Trevor DarrellUC Berkeley
![Page 2: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/2.jpg)
Overview
Adversarial Domain Adaptation
Learning end-to-end driving models from crowdsourced dashcams
Vision and Language: Learning to reason to answer and explain
2
![Page 3: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/3.jpg)
Adversarial Domain Adaptation
3
Trevor
Darrell
UC Berkeley
Judy
HoffmanUC Berkeley/
Stanford
Eric
TzengUC Berkeley
![Page 4: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/4.jpg)
Source Data
backpack chair bike
fc8conv1 conv5 source
data
fc6
fc7
classificationloss
Adapting across domains ?
![Page 5: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/5.jpg)
Source Data
backpack chair bike
Target Data
?
fc8conv1 conv5
fc6
fc7
Adapting across domains ?
• Applying source classifier to target domain can yield inferior performance…
classificationloss
![Page 6: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/6.jpg)
Source Data
backpack chair bike
Target Databackpack
?
fc8
conv1 conv5fc6
fc7 labeled
target data
fc8conv1 conv5 source
data
fc6
fc7
classificationloss
shar
ed
shar
ed
shar
ed
shar
ed
shar
ed
Adapting across domains ?
• Fine tune? …..Zero or few labels in target domain
• Siamese network?…..No paired / aligned instance examples!
![Page 7: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/7.jpg)
Adapting across domains: minimize discrepancy
[ICCV 2015]
object classifier
![Page 8: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/8.jpg)
Adapting across domains: minimize discrepancy
[ICCV 2015]
object classifier
Discrepency
![Page 9: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/9.jpg)
Adapting across domains: minimize discrepancy
[ICCV 2015]
object classifier
domain classifier
![Page 10: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/10.jpg)
Adapting across domains: minimize discrepancy
[ICCV 2015]
domain classifier
![Page 11: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/11.jpg)
Adapting across domains: minimize discrepancy
[ICCV 2015]
domain classifier
![Page 12: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/12.jpg)
Adapting across domains: minimize discrepancy
[ICCV 2015]
domain classifier
![Page 13: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/13.jpg)
Adapting across domains: minimize discrepancy
[ICCV 2015]
object classifier
![Page 14: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/14.jpg)
Source Data
backpac
kchair bike
fc
8conv1 conv5
fc
6
fc
7labeled target
data
fc
8conv1 conv5source
data
fc
D
fc
6
fc
7
classification
loss
domain
confusion
loss
domain
classifier
loss
share
d
share
d
share
d
share
d
share
d
Domain Label Cross-entropy with uniform distribution
Adversarial Training of domain label predictor
and domain confusion loss:
Deep domain confusion
Unlabeled Target Data
?
[Tzeng ICCV15 ]
![Page 15: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/15.jpg)
Deep domain confusion
Train a network to minimize classification loss AND confuse two domains
[Tzeng ICCV15 ]
source
inputs
target
inputs
network
parameters
(fixed)
domain
classifier
(learn)domain classifier loss
domain classifier prediction
network
parameters (learn)
domain confusion loss
= 𝑝(𝑦𝐷 = 1|𝑥)
(cross-entropy with uniform distribution)
domain
classifier (fixed)
16
iterate
![Page 16: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/16.jpg)
Source Data + Labels
backpac
kchair bike
Unlabeled Target Data
?
Encoder
Cla
ssifie
r
Encoderclassification
loss
Adversarial Discriminative Domain Adaptation (ADDA)
Discriminator Adversarial
loss
Which loss?Shared or not?Generative or
discriminative?GAN
(in submission)
![Page 17: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/17.jpg)
Adversarial Discriminative Domain Adaptation (ADDA)
source images + labels
Clas
sifie
r
Pre-training
classlabel
source images
SourceCNN
Dis
crim
inat
or
Adversarial Adaptation
domainlabel
TargetCNN
target images
Clas
sifie
r
Testing
classlabel
TargetCNN
target image
SourceCNN
(in submission)
![Page 18: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/18.jpg)
ADDA: Adaptation on digits (in submission)
![Page 19: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/19.jpg)
ADDA: Adaptation on RGB-D (in submission)
Train on RGB
Test on depth
![Page 20: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/20.jpg)
Autonomous Driving Paradigms
1) Learn affordances to predict state; apply rules or learned classic controllers
2) Abandon engineering principles, learn “end-to-end” policy
![Page 21: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/21.jpg)
Autonomous Driving Paradigms
1) Learn affordances to predict state; apply rules or learned classic controllers
How can visual sensing be robust to new enviroments?
2) Abandon engineering principles, learn “end-to-end” policy
How to learn generic driving policies from diverse data?
![Page 22: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/22.jpg)
Autonomous Driving Paradigms
1) Learn affordances to predict state; apply rules or learned classic controllers
How can visual sensing be robust to new enviroments? …Fully Convolutional Domain Adaptation “in the wild”
2) Abandon engineering principles, learn “end-to-end” policy
How to learn generic driving policies from diverse data?…Learning end-to-end driving policy/model from crowdsourced videos
![Page 23: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/23.jpg)
BDD Dataset
BDD Video BDD Segmentation
• 720p 30fps 40s video clips
• ~50K clips
• GPS + IMU
• 720p images
• Fine instance segmentation
• Compatible with Cityscapes
![Page 24: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/24.jpg)
In-domain fully supervised FCN
Train on Cityscapes, Test on Cityscapes
![Page 25: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/25.jpg)
Domain shift: Cityscapes to SF
Train on Cityscapes, Test on San Francisco Dashcam
![Page 26: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/26.jpg)
No tunnels in CityScapes?...
![Page 27: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/27.jpg)
![Page 28: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/28.jpg)
Medium Shift: Cross Seasons Adaptation
![Page 29: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/29.jpg)
Small Shift: Cross City Adaptation
![Page 30: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/30.jpg)
Before domain confusion After domain confusion
Effect of domain confusion loss
![Page 31: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/31.jpg)
Before domain confusion After domain confusion
Effect of domain confusion loss
![Page 32: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/32.jpg)
Before domain confusion After domain confusion
Effect of domain confusion loss
![Page 33: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/33.jpg)
Before domain confusion After domain confusion
Effect of domain confusion loss
![Page 34: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/34.jpg)
BDD Dataset – static
![Page 35: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/35.jpg)
Overview
Adversarial Domain Adaptation
Learning end-to-end driving models from crowdsourced dashcams
Vision and Language: Learning to reason to answer and explain
37
![Page 36: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/36.jpg)
Learning and Adapting from Large-Scale Driving Data
• Fully Convolutional Domain Adaptation “in the wild”
• Learning end-to-end driving policy/model from dashcam videos
![Page 37: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/37.jpg)
End-to-End Paradigm
• ALVINN
• DAVE
• NVIDIA
• BDD RC Cars
• BDD WebCam
![Page 38: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/38.jpg)
AVLINN(1989)
![Page 39: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/39.jpg)
DAVE (2003)
Yann LeCun, Eric Cosatto, Jan Ben, Urs Muller, Beat Flepp: End-to-End Learning of Vision-Based Obstacle Avoidance for Off-Road Robots. Delivered at the Learning@Snowbird Workshop, April 2004.
![Page 40: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/40.jpg)
NVIDIA (2016)
![Page 41: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/41.jpg)
[Karl Zipser]
![Page 42: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/42.jpg)
model driving car, ‘direct’ mode
![Page 43: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/43.jpg)
10 future
timepoints
10 future
timepoints
conv1 96 channels
384 channels
4 channels
conv2
metadata input 4 channels
camera input
fully connected 512 channels
steering motor
![Page 44: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/44.jpg)
![Page 45: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/45.jpg)
![Page 46: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/46.jpg)
Driving Policy
![Page 47: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/47.jpg)
Learning a universal driving policy
• Self driving as egomotion prediction
• Learn general driving policy that is
applicable to all car models.
• Use a large number of easily accessible
dashcam videos as self-supervision.
![Page 48: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/48.jpg)
FCN-LSTM
Visual Encoder
• Dilated Fully Convolutional Nets could provide more spatial details than CNN
Temporal Fusion
• Fuse the visual information, vehicle state (speed and angular velocity) from
each frame
![Page 49: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/49.jpg)
FCN-LSTM
Privileged Learning
• The model should implicitly know what
objects are in the scene
• We use the semantic segmentation mask
from Cityscapes as extra source of
supervision
• It ultimately improves the learnt
representation of the dilated FCN
Vapnik V, Vashist A. A new learning paradigm: Learning using privileged information[J]. Neural Networks the Official Journal of the International Neural Network Society, 2009, 22(5-6):544-57.
![Page 50: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/50.jpg)
Dataset
Sample frames from the dataset
• Real first person driving
videos
• Diverse• City
• Highway
• Rainy days
• Nights and evenings
• Construction zones
![Page 51: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/51.jpg)
Scene and Trajectory Reconstruction of
Crowd-sourced Driving Videos using
Semantic Filtered SfM
Yang Gao*, Huazhe Xu*, Christian Hane, Fisher Yu,
Trevor Darrell
![Page 52: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/52.jpg)
Challenging Driving Videos in the Wild
Challenges
Moving Objects
Subtle behaviors
Lane changing
Slight Steering
Unknown Camera Calibration
Rolling Shutters
![Page 53: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/53.jpg)
Existing Motion-Based Method Failed to Reject Moving Object from
the Scene
Keypoints from motion-based keypoints rejection
methods
Keypoints from our Semantic Filtered SfM pipeline. Most
moving keypoints have been filtered out.
![Page 54: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/54.jpg)
Semantically Filtered SfM: 𝑆𝑓 2𝑀
Classical keypoints matching as points pair preference ranking
𝑀 𝑖1, 𝑖2 =1
𝑑 𝐼1, 𝑖1 , 𝑑 𝐼2, 𝑖2 2
M is the preference score over point pair 𝑖1, 𝑖2 , defined by distance between two low level descriptors 𝑑(⋅,⋅).
Classical matchings could be formulated as ranking based on 𝑀 ⋅,⋅
Semantics should be incorporated in SfM to be robust to moving objects
𝑀 𝑖1, 𝑖2 =𝑆𝑒𝑚𝑎𝑛𝑡𝑖𝑐 𝐼1,𝐼2 [𝑖1,𝑖2]
𝑑 𝐼1,𝑖1 ,𝑑 𝐼2,𝑖2 2
Use the FCN as a semantic term
𝑆𝑒𝑚𝑎𝑛𝑡𝑖𝑐 𝐼1, 𝐼2 𝑖1, 𝑖2 = 𝐹𝐶𝑁 𝐼1 𝑖1 ⋅ 𝐹𝐶𝑁(𝐼2)[𝑖2]
![Page 55: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/55.jpg)
City Turning Example
![Page 56: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/56.jpg)
Lots of Moving Vehicles Example
![Page 57: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/57.jpg)
Recover the subtle car backing behavior
![Page 58: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/58.jpg)
Experiments – Continuous Actions
Lane following: left and right
![Page 59: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/59.jpg)
Experiments – Continuous Actions
Intersection
![Page 60: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/60.jpg)
Experiments – Continuous Actions
Side Walk
![Page 61: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/61.jpg)
![Page 62: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/62.jpg)
BDD Dataset – video
![Page 63: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/63.jpg)
Overview
Adversarial Domain Adaptation
Learning end-to-end driving models from crowdsourced dashcams
Vision and Language: Learning to reason to answer and explain
65
![Page 64: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/64.jpg)
Explainable AI (XAI): Visual Explanations
Western Grebe
This is a Western Grebe because this bird has a long white neck, pointy yellow beak and red eye.
Laysan AlbatrosThis is a Laysan Albatross because this bird has a hooked yellow beak white neck and black back.
This bird has black and white feathers, with a white neck and a yellow beak.
Visual Explanations
Class definitions
ImageDescriptions
Image Relevance
Class discriminating
![Page 65: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/65.jpg)
Explainable Models with Implicit Capabilities
•Translate DNN hidden state into • human-interpretable language • visualizations and exemplars
Can you park here?
![Page 66: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/66.jpg)
Visual Question Answering
![Page 67: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/67.jpg)
![Page 68: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/68.jpg)
NAACL 2016: MCB with Attention• Predict spatial attentions with MCB
CN
N(R
esNet1
52
)
Tile
MCB
WE, LSTM
16k x14x14Signed
Sqrt
L2 N
orm
alization2048x14x14
2048x14x14
Co
nv, R
elu
512
x 1
4 x
14
Softm
ax
1 x
14 x
14
Weigh
ted Su
m
Co
nv
Full C
on
nected
CornMCB
16k
Softm
ax
Signed
Sqrt
L2 N
orm
alization
16k
16k
3000
2048
2048
What is the yellow food?
Attention for captioning : - K. Xu, Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Attention for VQA : - H. Xu, K. Saenko Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering- J.Lu Hierarchal Question-Image Co-Attention for Visual Question Answering
![Page 69: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/69.jpg)
Winner VQA Challenge 2016 (real open ended)
535455565758596061626364656667
UC
Be
rkel
ey &
So
ny
Nav
er
Lab
s
DLA
IT
snu
bi-
nav
erl
abs
PO
STEC
H
Bra
nd
eis
VTC
om
pu
terV
iso
n
MIL
-UT
klab
SHB
_10
26
MM
CX
VT_
CV
_Jia
sen
LV-N
US
AC
VT_
Ad
elai
de
UC
Be
rkel
ey (
DN
MN
)
CN
NA
tt
san
UC
Be
rkel
ey (
NM
N)
glo
bal
_vis
ion
vqat
eam
-dee
pe
rLST
M_N
orm
lizeC
NN
Mu
jtab
a h
asan RIT
UP
V_U
B
Bo
lei
att
vqat
eam
-lst
m_
cnn
UP
C
Test
-Ch
alle
nge
Acc
ura
cy
ou
rs
![Page 70: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/70.jpg)
Attention Visualizations
What is the woman feeding the giraffe?Carrot[Groundtruth: Carrot]
![Page 71: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/71.jpg)
Attention Visualizations
What color is her shirt?Purple[Groundtruth: Purple]
![Page 72: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/72.jpg)
Attention Visualizations
What is her hairstyle for the picture?Ponytail[Groundtruth: Ponytail]
![Page 73: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/73.jpg)
Attention Visualizations
What color is the chain on the red dress?Pink[Groundtruth: Gold]
• Correct Attention, Incorrect Fine-grained Recognition
![Page 74: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/74.jpg)
Attention Visualizations
Is the man going to fall down?No[Groundtruth: No]
![Page 75: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/75.jpg)
Attention Visualizations
What is the surface of the court made of?Clay[Groundtruth: Clay]
![Page 76: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/76.jpg)
Attention Visualizations
What sport is being played?Tennis[Groundtruth: Tennis]
![Page 77: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/77.jpg)
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
![Page 78: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/78.jpg)
![Page 79: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/79.jpg)
Human ground truth for the textual justification task.
![Page 80: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/80.jpg)
Human ground truth for the pointing task.
![Page 81: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/81.jpg)
Discussing different evidence for different images.
![Page 82: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/82.jpg)
Discussing different evidence for different questions.
![Page 83: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/83.jpg)
Discussing different evidence for different questions.
![Page 84: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/84.jpg)
Differentiating between some activities requires understanding
special equipment.
![Page 85: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/85.jpg)
Differentiating between some activities requires recognizing
specific context.
![Page 86: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/86.jpg)
Differentiating between some activities requires recognizing
specific context.
![Page 87: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/87.jpg)
Explanations when the model predicts the wrong answer.
![Page 88: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/88.jpg)
Explainable Models with Explicit Capabilities
Explain higher-level reasoning in DNNs
Explainable decision path for multi-task, control and planning
Provide structure and intermediate state
Can you park here?
![Page 89: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/89.jpg)
Explainable Models with
Explicit and Implicit Capabilities
No, because there is a person in the bus.
![Page 90: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/90.jpg)
Grounded question answering
yes
93
Is there a red shape above
a circle?
![Page 91: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/91.jpg)
Neural nets learn lexical groundings
yes
94
[Iyyer et al. 2014, Bordes et al. 2014, Yang et al. 2015, Malinowski et al., 2015]
Is there a red shape above
a circle?
![Page 92: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/92.jpg)
Semantic parsers learn composition
yes
97
[Wong & Mooney 2007, Kwiatkowski et al. 2010,Liang et al. 2011, A et al. 2013]
Is there a red shape above
a circle?
![Page 93: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/93.jpg)
Neural module networks learn both!
yesIs there a red shape above
a circle?
redandand
98
![Page 94: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/94.jpg)
Neural module networks
Is there a red shape above a circle?
red
exists true↦
↦
above ↦
99
![Page 95: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/95.jpg)
Neural module networks
Is there a red shape above a circle?
red
exists true↦
↦
above ↦
circle
red above
exists
and
100
![Page 96: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/96.jpg)
Neural module networks
yesIs there a red shape
above a circle?
red
exists true↦
↦
above ↦
circle
red above
exists
and101
![Page 97: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/97.jpg)
Q: Are there an equal number of large things and metal spheres?
Q: What size is the cylinder that is left of the brown
metal thingthat is left of the big sphere?
Q: There is a sphere with the same size as the metal cube; is
it made of the same material as the small red sphere?
Q: How many objects are either small cylinders or red things?
Questions in CLEVR test various aspects of visual reasoning
including attribute identification,counting, comparison, spatial
relationships, andlogical operations.
![Page 98: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/98.jpg)
Learning to Reason: End-to-End Module
Networks for Visual Question Answering
R. Hu, J. Andreas, M. Rohrbach, T. Darrell, K. Saenko
![Page 99: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/99.jpg)
Background
Natural language is compositional: the meaning of a sentence comes from the meanings of its components.
Different questions may require significantly different reasoning procedure.
● What kind of vehicle is the one on the left of the brown car that is next to the building?
● Why is the person running away?
![Page 100: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/100.jpg)
Background
● Neural Module Networks: dynamic inference structure for each question
● Previous work: structure from NLP parser or parser re-ranking
● This work: learned layout policy to dynamically build a network
![Page 101: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/101.jpg)
End-to-End Module Networks (N2NMN)
Components
● Layout policy p(l | q) with sequence-to-sequence RNN
● Neural modules with co-attention, dynamically assembled into a network
![Page 102: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/102.jpg)
End-to-end Training
Loss
where is the softmax loss of the answer
Optimization: policy gradient method
Easier: behavior cloning from expert layout policy
![Page 103: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/103.jpg)
Experiments on the CLEVR dataset
![Page 104: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/104.jpg)
Cloning
expert
End-to-end
optimization
after cloning
![Page 105: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/105.jpg)
Policy Search from scratch (no experts used)
Even without resorting to an expert policy during training, our method still achieves state-of-the-art
performance with reinforcement learning from scratch.
![Page 106: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/106.jpg)
Accuracy v.s. Question length
Even on long questions with 30+ words, our method still achieves relatively high accuracy (Figure a).
![Page 107: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/107.jpg)
Detailed output visualization
![Page 108: BAIR Retreat 3/28/17 · Learning a universal driving policy • Self driving as egomotion prediction • Learn general driving policy that is applicable to all car models. • Use](https://reader035.vdocuments.net/reader035/viewer/2022070801/5f0293907e708231d404f367/html5/thumbnails/108.jpg)
Overview
Adversarial Domain Adaptation
Learning end-to-end driving models from crowdsourced dashcams
Vision and Language: Learning to reason to answer and explain
113