camera-based vehicle velocity estimation using spatiotemporal depth...
TRANSCRIPT
![Page 1: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/1.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Camera-based Vehicle Velocity Estimationusing Spatiotemporal Depth and Motion Features
Graz University of Technology, Austria
* equal contribution
Moritz Kampelmuehler* [email protected]
Christoph [email protected]
Michael Mueller* [email protected]
![Page 2: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/2.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
TuSimple Velocity Estimation Challenge
o Information about the position, as well as the motion of the agents in the vehicle’ssurroundings plays an important role in motion planning.
o Traditionally, such information is perceived by an expensive range sensor, e.g LiDAR.
![Page 3: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/3.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Data statistics
o Daytime recorded video on highway
o Vehicles with relative distance ranging from 5meters to up to 90 meters.
o Size:
• Train: 1074 clips of 2s videos in 20 frames persecond. 3222 annotated vehicles.
• Test: 269 clips of videos with same format astraining data.
o Annotations:
• Relative position and velocity generated byrange sensors for longitudinal (X) and lateral(Y) direction
o Supplementary data:
• 5066 images with human labeled boundingboxes on vehicles (not used)
(ground truth velocity in m/s)(estimated velocity in m/s)
![Page 4: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/4.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Tracker
Architecture - Tracks
RGB
Input frames
Z. Kalal, K. Mikolajczyk, and J. Matas. Forward-backward error: Automatic detection of tracking failures. In ICPR 2010.
![Page 5: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/5.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Tracker
Architecture - Depth
RGB
Input frames
C. Godard, O. Mac Aodha, and G. J. Brostow. Unsupervised monocular depth estimation with left-right consistency. In CVPR, 2017.DepthNet
RGB
Input frames
![Page 6: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/6.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Tracker
Architecture – Flow
RGB
Input frames
DepthNet
RGB
Input frames
FlowNet
Eddy Ilg, Nikolaus Mayer, T. Saikia, Margret Keuper, Alexey Dosovitskiy, Thomas BroxFlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks In CVPR, 2017
![Page 7: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/7.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Tracker
FlowNet
DepthNet
width
depth
height
Overall Architecture
RGB
Input frames
stack
smooth
pool
*
![Page 8: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/8.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
4 x 60D
v
p
64D
Regressing features to velocities and positions
time
width
height
Depth
Flow
Tracks
o Dense Features
o Crop2D
o Temporal Stack & Smooth
L2 loss
Weight decay, Dropout & CReLU [1]
[1] W. Shang, K. Sohn, D. Almeida, and H. Lee. Understanding and improving convolutional neural networks via concatenated rectied linear units. CoRR,abs/1603.05201, 2016.
![Page 9: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/9.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Tracker
FlowNet
DepthNet
Overall Architecture
RGB
Input frames
stack
Backward propagation
v
p
Forward propagation
smooth
pool
*
Fully-connected Net
width
depth
height
![Page 10: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/10.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Training & Testing Results
o Regressor has to operate on diverse feature scales
o We train three Distance-models for near / med / far scales based on the box area
o Varying [hidden layers x units]: [3x40] (near), [4x60] (medium), and [4x70] (far).
o Training data for each distance is split up into 5 partitions.
o 4 are used as training set, the 5th is used for validation
o After 2000 epochs, the model with the lowest validation error is chosen
o This results in 3x5 models for the entire dataset.
o Performance gain especially for med and far cases
![Page 11: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/11.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Qualitative Results – Test set
(estimated velocity in m/s)
![Page 12: Camera-based Vehicle Velocity Estimation using Spatiotemporal Depth …feichtenhofer.github.io/pubs/Feichtenhofer_Autonomous... · 2020-04-10 · Training & Testing Results o Regressor](https://reader036.vdocuments.net/reader036/viewer/2022070719/5edeab37ad6a402d666a0045/html5/thumbnails/12.jpg)
Moritz Kampelmuehler, Michael Mueller, Christoph Feichtenhofer
Summary
o Our approach is based on spatiotemporal depth, motion and tracking features for regression of vehicle velocities
RGB
Input frames
Backward propagation
v
p
Forward propagation
Tracker
FlowNet
DepthNet
width
depth
height
stack
smooth
pool
*
o The highly abstract feature representation allows learning from few training samples
o Improvements could be made by backpropagating into the ConvNet layers which wouldbenet from larger datasets
o Implemented by two students in one week of work - Thanks to Moritz and Michael!
o Would not have been possible without the good practice of sharing code among ourcommunity!