Davide Scaramuzza - University of Zurich – Robotics and Perception Group - rpg.ifi.uzh.ch
Lecture 14: Event-based Vision
Davide Scaramuzza
Outline
Motivation
Event-based Cameras: DVS and DAVIS
Generative model
Calibration
Visualization
Life-time estimation
Pose estimation
Current flight maneuvers achieved with onboard cameras are still too slow
compared with those attainable by birds or FPV pilots
FPV-Drone race
To go fast, robots need faster sensors!
At present, the agility of a robot is limited by the latency and
temporal discretization of its sensing pipeline.
Currently, the average robot-vision algorithms have latencies of 50-200 ms.
This puts a hard bound on the agility of the platform.
[Figure: sensing-pipeline timeline: frame → computation → command, showing the latency and the temporal discretization between consecutive frames]
To go fast, robots need faster sensors!
[Censi & Scaramuzza, «Low Latency, Event-based Visual Odometry», ICRA’14]
Can we create low-latency, low-discretization perception architectures?
Yes...
...if we use a camera where pixels do not spike all at the same time
...much as we humans do.
Human Vision System
130 million photoreceptors
But only 2 million axons!
Dynamic Vision Sensor (DVS)
Event-based camera developed by Tobi Delbruck’s group (ETH & UZH).
Temporal resolution: 1 μs
High dynamic range: 140 dB
Low transmission bandwidth: ~1MB/s
Low power: 20 mW
Cost: 2,500 EUR
[Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal Contrast
Vision Sensor. 2008]
Image of the solar eclipse (March’15) captured by
a DVS (courtesy of IniLabs)
Camera vs DVS
A traditional camera outputs frames at fixed time intervals.
By contrast, a DVS outputs asynchronous events at microsecond resolution: an event is generated each time a single pixel detects an intensity change.
[Figure: frames at fixed time intervals vs. an asynchronous event stream over time]
event: (t, x, y, sign(dI(x, y)/dt))
• t: timestamp (s); (x, y): pixel coordinates
• sign (+1 or −1): increase or decrease of brightness
Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal Contrast Vision Sensor. 2008
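As a sketch (not an official driver API), the event tuple above can be represented in code; the `Event` name and the example values are hypothetical:

```python
from collections import namedtuple

# Minimal representation of a DVS event: timestamp, pixel
# coordinates, and a polarity of +1 (ON) or -1 (OFF).
Event = namedtuple("Event", ["t", "x", "y", "polarity"])

events = [
    Event(t=0.000012, x=57, y=83, polarity=+1),  # brightness increased at (57, 83)
    Event(t=0.000019, x=58, y=83, polarity=-1),  # brightness decreased at (58, 83)
]

# Unlike a frame, the stream is sparse and asynchronous: events arrive
# one by one, ordered by timestamp, only where the intensity changed.
assert events[0].t < events[1].t
```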
DVS Operating Principle [Lichtsteiner, ISCAS'09]
Each pixel responds to the log intensity, V = log I(t). Events are generated any time a single pixel sees a change in brightness larger than C:
|Δ log I| ≥ C
[Figure: log-intensity signal crossing the threshold repeatedly, emitting ON events (brightness increase) and OFF events (brightness decrease)]
[Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal Contrast Vision Sensor. 2008]
Video with DVS explanation
Camera vs Dynamic Vision Sensor
An event camera is a motion-activated edge detector:
• Naturally responds to edges
• Edges are detected in hardware
The DVS is fast
1 frame = 33 ms 1 frame = 1 ms 1 frame = 0.05 ms
[Censi, Delbruck, Scaramuzza, Low-latency localization by active led markers tracking using a dynamic vision sensor. IROS 2013]
Dynamic Vision Sensor (DVS)
Advantages
• Low latency (~1 microsecond)
• High dynamic range (140 dB instead of 60 dB)
• Very low bandwidth (only intensity changes are transmitted): ~200 Kb/s
• Low storage, processing time, and power requirements
Disadvantages
• Requires totally new vision algorithms
• No absolute intensity information (only binary intensity changes)
• Low image resolution: 240x180 pixels
Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal
Contrast Vision Sensor. 2008
High-speed cameras vs DVS
[Figure: Photron 7.5 kHz camera vs. DVS]

                           Photron Fastcam SA5   Matrix Vision Bluefox   DVS
Max fps / measurement rate 1 MHz                 90 Hz                   1 MHz
Resolution at max rate     64x16 pixels          752x480 pixels          240x180 pixels
Bits per pixel             12                    8-10                    1
Weight                     6.2 kg                30 g                    30 g
Active cooling             yes                   no                      no
Data rate                  1.5 GB/s              32 MB/s                 ~1 MB/s on average
Power consumption          150 W + lighting      1.4 W                   20 mW
Dynamic range              n/a                   60 dB                   120 dB
http://inilabs.com/videos/
Applications of event cameras
• Surveillance: moving-object detection and tracking
• Fast closed-loop control (robot goalie, pencil balancer)
• Visual odometry, SLAM, ego-motion estimation (automotive)
• High-dynamic-range and high-resolution image reconstruction
• Gesture recognition, face detection
• 3D reconstruction, 3D scanning
• Optical flow estimation
• Video compression
Related Work (1/2)
Event-based Tracking
• Conradt et al., ISCAS'09
• Drazen, 2011
• Mueller et al., ROBIO'11
• Censi et al., IROS'13
• Delbruck & Lang, Front. Neurosci.'13
• Lagorce et al., T-NNLS'14
Event-based Optic Flow
• Cook et al., IJCNN'11
• Benosman, T-NNLS'14
Event-based ICP
• Ni et al., T-RO'12

Robotic goalie with 3 ms reaction time at 4% CPU load using event-based dynamic vision sensor [Delbruck & Lang, Frontiers in Neuroscience, 2013]
Asynchronous Event-Based Multikernel Algorithm for High-Speed Visual Features Tracking [Lagorce et al., T-NNLS'14]
Event-Based Visual Flow [Benosman, T-NNLS'14]
Related Work (1/2)
Conradt, Cook, Berner, Lichtsteiner, Douglas, Delbruck, A pencil balancing robot using a pair of
AER dynamic vision sensors, IEEE International Symposium on Circuits and Systems. 2009
Related Work (2/2)
Event-based 6-DoF Localization
• Mueggler et al., IROS'14
• Weikersdorfer et al., ROBIO'12
Event-based Rotation Estimation
• Cook et al., IJCNN'11
• Kim et al., BMVC'15
Event-based Visual Odometry
• Censi & Scaramuzza, ICRA'14
Event-based SLAM
• Weikersdorfer et al., ICVS'13
Event-based 3D Reconstruction
• Carneiro et al., NN'13

Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers [Mueggler et al., IROS'14]
Simultaneous Localization and Mapping for Event-Based Vision Systems [Weikersdorfer et al., ICVS'13]
Event-based 3D reconstruction from neuromorphic retinas [Carneiro et al., NN'13]
Related Work: Event-based Tracking
Collision avoidance
• Guo, ICM'11
• Clady, FNS'14
• Mueggler, ECMR'15
Estimating absolute intensities
• Cook et al., IJCNN'11
• Kim et al., BMVC'15
HDR panorama & mosaicing
• Kim et al., BMVC'15
• Belbachir, CVPRW'14; Schraml, CVPR'15

Interacting Maps for Fast Visual Interpretation [Cook et al., IJCNN'11]
Towards Evasive Maneuvers with Quadrotors using Dynamic Vision Sensors [Mueggler et al., ECMR'15]
Simultaneous Mosaicing and Tracking with an Event Camera [Kim et al., BMVC'15]
Videos from Tobi Delbruck and Joerg Conradt groups
Videos on DVS demos
Calibration [IROS’14]
[Mueggler, Huber, Scaramuzza, Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers,
IROS’14]
Calibration of a DVS [IROS'14]
Standard pinhole camera model still valid (same optics)
Standard passive calibration patterns cannot be used:
one would need to move the camera → inaccurate corner detection
Blinking patterns (computer screen, LEDs)
ROS DVS driver + intrinsic and extrinsic stereo calibration open source:
https://github.com/uzh-rpg/rpg_dvs_ros
[Mueggler, Huber, Scaramuzza, Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers,
IROS’14]
A Simple Optical Flow Algorithm
A moving edge
Horizontal motion: white pixels become black → brightness decreases → negative events (shown in black)
A moving edge
The same edge, visualized in space-time; events are represented by dots.
At what speed is the edge moving?
v = Δx / Δt
A moving edge
At t = 2.547, x = 101; at t = 2.89, x = 133.
Speed of the edge:
v = (133 − 101) / (2.89 − 2.547) = 93.3 pix/s
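The computation above can be sketched in code, using the values from the example (variable names are our own):

```python
# Two events generated by the same moving edge (numbers from the slide).
t0, x0 = 2.547, 101   # earlier event: pixel column 101 at t = 2.547 s
t1, x1 = 2.890, 133   # later event:   pixel column 133 at t = 2.890 s

# Optical-flow speed of the edge along x: v = dx / dt
v = (x1 - x0) / (t1 - t0)
print(f"edge speed: {v:.1f} pix/s")   # edge speed: 93.3 pix/s
```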
An Image Reconstruction Algorithm
Recall: each pixel responds to the log intensity, V = log I(t), and events are generated any time a single pixel sees a change in brightness larger than C:
|Δ log I| ≥ C
[Figure: log-intensity signal crossing the threshold repeatedly, emitting ON and OFF events]
The intensity signal at the event time can be reconstructed by integration of ±C.
[Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal Contrast Vision Sensor. 2008]
[Cook et al., IJCNN'11] [Kim et al., BMVC'15]
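A minimal sketch of this per-pixel integration, assuming the threshold C is known and using a tiny hand-made event list (all names and values hypothetical):

```python
import numpy as np

C = 0.15          # contrast sensitivity threshold (assumed known)
H, W = 180, 240   # sensor resolution

# Per-pixel reconstruction of the log intensity by integrating +/-C per event,
# starting from an (unknown) reference level of zero.
log_I = np.zeros((H, W))

# events: (t, x, y, polarity) tuples with polarity in {+1, -1}
events = [(0.001, 10, 20, +1), (0.002, 10, 20, +1), (0.003, 10, 20, -1)]

for t, x, y, p in events:
    log_I[y, x] += p * C          # each event contributes one threshold step

# After two ON events and one OFF event, pixel (x=10, y=20) sits one step
# above the reference level.
print(log_I[20, 10])   # 0.15
```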
Image reconstruction
• Given the events and the camera motion (rotation), recover the absolute brightness.
• Pipeline: events + camera motion R(t) → image gradient (magnitude and direction) → solve Poisson Eq. → absolute brightness
• How is it possible? Intuitive explanation: an event camera naturally responds to edges; hence, if we know the motion, we can relate the events to "world coordinates" to get an edge/gradient map. Then, just integrate the gradient map to get the absolute intensity.
Steps:
1. Recover the gradient map of the scene
2. Integrate the gradient to obtain brightness
Image reconstruction. Step 1: compute gradient map
An event at pixel p_m is generated by a brightness change of size C. Let L = log I:
ΔL(t) ≡ L(t) − L(t − Δt) = C
In terms of the brightness map M(x, y) (panorama):
M(p_m(t)) − M(p_m(t − Δt)) = C
Using a first-order Taylor approximation:
M(p_m(t)) − M(p_m(t − Δt)) ≈ g · v Δt
with displacement v Δt = p_m(t) − p_m(t − Δt) and brightness gradient g = ∇M(p_m(t))
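Each event therefore contributes one linear constraint g · (v Δt) = C on the local gradient; with several events observed under different apparent motions, g can be recovered by least squares. A toy sketch with simulated displacements (all values are assumptions for illustration):

```python
import numpy as np

C = 0.2                          # contrast threshold (assumed known)
g_true = np.array([0.5, -0.3])   # hypothetical gradient we want to recover

# Simulate events at one map point under three different apparent velocities.
# A pixel fires when the accumulated change reaches C, i.e. after
# dt = C / (g . v), giving a displacement d = v * dt with g . d = C exactly.
velocities = np.array([[1.0, 0.0], [0.5, -1.0], [1.0, 1.0]])
disps = np.array([v * (C / (g_true @ v)) for v in velocities])

# Stack the per-event constraints g . d = C and solve in the
# least-squares sense (the core of step 1).
g_est, *_ = np.linalg.lstsq(disps, np.full(len(disps), C), rcond=None)
print(g_est)   # ~ [0.5, -0.3]
```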
Image reconstruction. Step 2: Poisson reconstruction
Integrate the gradient map g = (g_x, g_y) to get the absolute brightness M: compute the divergence of g and solve the Poisson equation
∇²M = div g = ∂g_x/∂x + ∂g_y/∂y,
which can be done fast using the FFT.
[Figure: original image, gradients in x and y, divergence, and reconstructed image]
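A minimal sketch of the FFT-based Poisson solver, assuming periodic boundary conditions and forward-difference gradients (a simplification of what the papers above use):

```python
import numpy as np

def poisson_reconstruct(gx, gy):
    """Recover M (up to an additive constant) from its forward-difference
    gradients by solving the Poisson equation with the FFT (periodic BCs)."""
    H, W = gx.shape
    # Divergence of the gradient field = discrete Laplacian of M.
    div = gx - np.roll(gx, 1, axis=1) + gy - np.roll(gy, 1, axis=0)
    # Eigenvalues of the periodic discrete Laplacian in the Fourier basis.
    wx = 2 * np.cos(2 * np.pi * np.arange(W) / W) - 2
    wy = 2 * np.cos(2 * np.pi * np.arange(H) / H) - 2
    denom = wy[:, None] + wx[None, :]
    denom[0, 0] = 1.0                      # avoid division by zero at DC
    M_hat = np.fft.fft2(div) / denom
    M_hat[0, 0] = 0.0                      # absolute offset is unobservable
    return np.real(np.fft.ifft2(M_hat))

# Round-trip check on a random "log-intensity map".
rng = np.random.default_rng(0)
M = rng.standard_normal((32, 48))
gx = np.roll(M, -1, axis=1) - M            # forward differences, periodic
gy = np.roll(M, -1, axis=0) - M
M_rec = poisson_reconstruct(gx, gy)
print(np.allclose(M_rec - M_rec.mean(), M - M.mean()))   # True
```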
Event-based Vision
Frame-based vs Event-based Vision
Problem: the DVS output is a sequence of asynchronous events rather than a standard image; thus, a paradigm shift is needed to deal with its data.
Naive solution: accumulate the events occurring over a certain time interval and adapt "standard" CV algorithms.
Drawback: this increases latency.
Instead, we want each single event to be used as it comes!
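The naive accumulation scheme can be sketched as follows (window length and event list are made up for illustration):

```python
import numpy as np

H, W = 180, 240
dt_window = 0.030   # 30 ms accumulation window (assumed)

# events: (t, x, y, polarity) tuples; here a tiny hand-made stream.
events = [(0.001, 5, 5, +1), (0.010, 5, 5, +1), (0.020, 6, 5, -1), (0.040, 7, 5, +1)]

# Naive approach: sum the polarities per pixel over a fixed time window,
# producing a pseudo-frame that standard CV algorithms can consume.
# The drawback: nothing is output until the window has elapsed.
frame = np.zeros((H, W))
for t, x, y, p in events:
    if t < dt_window:
        frame[y, x] += p

print(frame[5, 5], frame[5, 6])   # 2.0 -1.0  (last event falls outside the window)
```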
Generative Model [Censi & Scaramuzza, ICRA'14]
The generative model tells us that the probability that an event is generated depends on the scalar product between the gradient ∇I and the apparent motion u Δt:
P(e) ∝ ⟨∇I, u Δt⟩
[Figure: camera frame (X_c, Y_c, Z_c) with optical center C; image point p with gradient ∇I and apparent motion u]
[Gallego, Mueggler, Rebecq, Delbruck, Scaramuzza, Event-based Camera Pose Tracking using a Generative Event
Model, Submitted to PAMI]
[Censi & Scaramuzza, Low Latency, Event-based Visual Odometry, ICRA’14]
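A toy sketch of this idea, using the magnitude of the scalar product as an unnormalized event score (names and values are our own, not the papers' implementation):

```python
import numpy as np

# Event likelihood at a pixel is proportional to |<grad I, u dt>|:
# no events fire where the motion is parallel to an edge or where
# there is no gradient at all.
grad_I = np.array([1.0, 0.0])      # vertical edge: gradient points along x
dt = 0.001                          # time interval (assumed)

def event_score(u):
    """Unnormalized probability that this pixel fires under apparent motion u."""
    return abs(grad_I @ (u * dt))

print(event_score(np.array([100.0, 0.0])))   # 0.1 -> motion across the edge
print(event_score(np.array([0.0, 100.0])))   # 0.0 -> motion along the edge
```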
Robustness to High-speed Motion
[Gallego, Mueggler, Rebecq, Delbruck, Scaramuzza, Event-based Camera Pose Tracking using a Generative Event
Model, Submitted to PAMI]
[Censi & Scaramuzza, Low Latency, Event-based Visual Odometry, ICRA’14]
Application Experiment: Quadrotor Flip (1,200 deg/s)
• Mueggler, Huber, Scaramuzza, Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers, IROS, 2014.
• Mueggler, Baumli, Fontana, Scaramuzza, Towards Evasive Maneuvers with Quadrotors using Dynamic Vision
Sensors, European Conference on Mobile Robots (ECMR), Lincoln, 2015.
Robustness to High-Dynamic Range Scenes
Rebecq, Gallego, Scaramuzza, EMVS: Event-based Multi-View Stereo, BMVC'16. Best Industry Paper Award.
EVO: Geometric Approach to 6-DoF Event-based Tracking and Mapping in Real Time
Video: https://youtu.be/bYqD2qZJlxE
Paper: http://rpg.ifi.uzh.ch/docs/RAL16_EVO.pdf
IEEE Robotics and Automation Letters, 2016
EVO: Event-based Tracking and Mapping
Conclusions
Standard Cameras
• VO & SLAM theory for standard cameras is well established
• Biggest challenges today are reliability and robustness:
  • High-dynamic-range scenes
  • High-speed motion
  • Low-texture scenes
Event cameras open enormous possibilities!
• Standard cameras have been studied for 50 years!
• Very robust to high-speed motion and high-dynamic-range scenes
Current challenges
• Mapping still limited to circular movements
• Characterization and modeling of the sensor noise and non-idealities (biases and other dynamic properties)
Recap
DVS: a revolutionary sensor for robotics:
1. Low latency (~1 microsecond)
2. High dynamic range (120 dB instead of 60 dB)
3. Very low bandwidth (only intensity changes are transmitted)
Possible future sensing architecture:
DAVIS sensor (Dynamic and Active-pixel Vision Sensor)
• Combines an event sensor (DVS) and a standard camera in the same pixel array
• Output: frames (at 24 Hz) and events (asynchronous)
[Brandli et al. A 240x180 130dB 3us latency global shutter spatiotemporal vision sensor. IEEE JSSC, 2014]
Newer sensor: combines a standard CMOS camera and a Dynamic Vision Sensor.
[Figure: DAVIS output over time: asynchronous DVS events alongside CMOS frames]
Elias Mueggler – Robotics and Perception Group
Event-based Feature Tracking [IROS’16]
Extract Harris corners on images
Track corners using event-based Iterative Closest Points (ICP)
IROS’16 : Event-based Feature Tracking for Low-latency Visual Odometry
Event-based, Sparse Visual Odometry [IROS’16]
IROS’16 : Event-based Feature Tracking for Low-latency Visual Odometry
Release of the first Event-Camera Dataset and Simulator for SLAM
Joint work with Tobi Delbruck
http://rpg.ifi.uzh.ch/davis_data.html
References for the Sensors
DVS
P. Lichtsteiner, C. Posch, T. Delbruck. A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor. IEEE Journal of Solid-State Circuits, 2008. (PDF)
DAVIS
Brandli, Berner, Yang, Liu, Delbruck. A 240×180 130 dB 3 µs Latency Global Shutter Spatiotemporal Vision Sensor. IEEE Journal of Solid-State Circuits, 2014.
BOOK
Event-based Neuromorphic Systems. Edited by S.-C. Liu, T. Delbruck, G. Indiveri, A. Whatley, R. Douglas. Wiley, 2014.