Davide Scaramuzza - University of Zurich – Robotics and Perception Group - rpg.ifi.uzh.ch
Lecture 14: Event-based Vision
Davide Scaramuzza
Outline
Motivation
Event-based Cameras: DVS and DAVIS
Generative model
Calibration
Visualization
Life-time estimation
Pose estimation
Current flight maneuvers achieved with onboard cameras are still too slow
compared with those attainable by birds or FPV pilots
FPV-Drone race
To go fast, robots need faster sensors!
At present, the agility of a robot is limited by the latency and
temporal discretization of its sensing pipeline.
Currently, the average robot-vision algorithms have latencies of 50-200 ms.
This puts a hard bound on the agility of the platform.
[Figure: sensing-pipeline timeline: frame → computation → command, showing the latency and the temporal discretization between consecutive frames]
To go fast, robots need faster sensors!
[Censi & Scaramuzza, «Low Latency, Event-based Visual Odometry», ICRA’14]
Can we create low-latency, low-discretization perception architectures?
Yes...
...if we use a camera where pixels do not spike all at the same time
...much as we humans do.
Human Vision System
130 million photoreceptors
But only 2 million axons!
Dynamic Vision Sensor (DVS)
Event-based camera developed by Tobi Delbruck’s group (ETH & UZH).
Temporal resolution: 1 μs
High dynamic range: 140 dB
Low transmission bandwidth: ~1MB/s
Low power: 20 mW
Cost: 2,500 EUR
[Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal Contrast
Vision Sensor. 2008]
Image of the solar eclipse (March’15) captured by
a DVS (courtesy of IniLabs)
Camera vs DVS
A traditional camera outputs frames at fixed time intervals.
By contrast, a DVS outputs asynchronous events at microsecond resolution: an event is generated each time a single pixel detects an intensity change.
[Figure: frames at fixed time intervals vs. an asynchronous event stream over time]
event: (t, x, y, sign(dI(x, y)/dt))
• t: timestamp (s); (x, y): pixel coordinates
• sign (+1 or −1): increase or decrease of brightness
Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal Contrast Vision Sensor. 2008
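As a sketch (not an official driver API), the event tuple above can be represented in code; the `Event` name and the example values are hypothetical:

```python
from collections import namedtuple

# Minimal representation of a DVS event: timestamp, pixel
# coordinates, and a polarity of +1 (ON) or -1 (OFF).
Event = namedtuple("Event", ["t", "x", "y", "polarity"])

events = [
    Event(t=0.000012, x=57, y=83, polarity=+1),  # brightness increased at (57, 83)
    Event(t=0.000019, x=58, y=83, polarity=-1),  # brightness decreased at (58, 83)
]

# Unlike a frame, the stream is sparse and asynchronous: events arrive
# one by one, ordered by timestamp, only where the intensity changed.
assert events[0].t < events[1].t
```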
DVS Operating Principle [Lichtsteiner, ISCAS'09]
Each pixel responds to the log intensity, V = log I(t). Events are generated any time a single pixel sees a change in brightness larger than C:
|Δ log I| ≥ C
[Figure: log-intensity signal crossing the threshold repeatedly, emitting ON events (brightness increase) and OFF events (brightness decrease)]
[Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal Contrast Vision Sensor. 2008]
Video with DVS explanation
Camera vs Dynamic Vision Sensor
An event camera is a motion-activated edge detector:
• Naturally responds to edges
• Edges are detected in hardware
The DVS is fast
1 frame = 33 ms 1 frame = 1 ms 1 frame = 0.05 ms
[Censi, Delbruck, Scaramuzza, Low-latency localization by active led markers tracking using a dynamic vision sensor. IROS 2013]
Dynamic Vision Sensor (DVS)
Advantages
• Low latency (~1 microsecond)
• High dynamic range (140 dB instead of 60 dB)
• Very low bandwidth (only intensity changes are transmitted): ~200 Kb/s
• Low storage, processing time, and power requirements
Disadvantages
• Requires totally new vision algorithms
• No absolute intensity information (only binary intensity changes)
• Low image resolution: 240x180 pixels
Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal
Contrast Vision Sensor. 2008
High-speed cameras vs DVS
[Figure: Photron 7.5 kHz camera vs. DVS]

                           Photron Fastcam SA5   Matrix Vision Bluefox   DVS
Max fps / measurement rate 1 MHz                 90 Hz                   1 MHz
Resolution at max rate     64x16 pixels          752x480 pixels          240x180 pixels
Bits per pixel             12                    8-10                    1
Weight                     6.2 kg                30 g                    30 g
Active cooling             yes                   no                      no
Data rate                  1.5 GB/s              32 MB/s                 ~1 MB/s on average
Power consumption          150 W + lighting      1.4 W                   20 mW
Dynamic range              n/a                   60 dB                   120 dB
http://inilabs.com/videos/
Applications of event cameras
• Surveillance: moving-object detection and tracking
• Fast closed-loop control (robot goalie, pencil balancer)
• Visual odometry, SLAM, ego-motion estimation (automotive)
• High-dynamic-range and high-resolution image reconstruction
• Gesture recognition, face detection
• 3D reconstruction, 3D scanning
• Optical flow estimation
• Video compression
Related Work (1/2)
Event-based Tracking
• Conradt et al., ISCAS'09
• Drazen, 2011
• Mueller et al., ROBIO'11
• Censi et al., IROS'13
• Delbruck & Lang, Front. Neurosci.'13
• Lagorce et al., T-NNLS'14
Event-based Optic Flow
• Cook et al., IJCNN'11
• Benosman, T-NNLS'14
Event-based ICP
• Ni et al., T-RO'12

Robotic goalie with 3 ms reaction time at 4% CPU load using event-based dynamic vision sensor [Delbruck & Lang, Frontiers in Neuroscience, 2013]
Asynchronous Event-Based Multikernel Algorithm for High-Speed Visual Features Tracking [Lagorce et al., T-NNLS'14]
Event-Based Visual Flow [Benosman, T-NNLS'14]
Related Work (1/2)
Conradt, Cook, Berner, Lichtsteiner, Douglas, Delbruck, A pencil balancing robot using a pair of
AER dynamic vision sensors, IEEE International Symposium on Circuits and Systems. 2009
Related Work (2/2)
Event-based 6-DoF Localization
• Mueggler et al., IROS'14
• Weikersdorfer et al., ROBIO'12
Event-based Rotation Estimation
• Cook et al., IJCNN'11
• Kim et al., BMVC'15
Event-based Visual Odometry
• Censi & Scaramuzza, ICRA'14
Event-based SLAM
• Weikersdorfer et al., ICVS'13
Event-based 3D Reconstruction
• Carneiro et al., NN'13

Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers [Mueggler et al., IROS'14]
Simultaneous Localization and Mapping for Event-Based Vision Systems [Weikersdorfer et al., ICVS'13]
Event-based 3D reconstruction from neuromorphic retinas [Carneiro et al., NN'13]
Related Work: Event-based Tracking
Collision avoidance
• Guo, ICM'11
• Clady, FNS'14
• Mueggler, ECMR'15
Estimating absolute intensities
• Cook et al., IJCNN'11
• Kim et al., BMVC'15
HDR panorama & mosaicing
• Kim et al., BMVC'15
• Belbachir, CVPRW'14; Schraml, CVPR'15

Interacting Maps for Fast Visual Interpretation [Cook et al., IJCNN'11]
Towards Evasive Maneuvers with Quadrotors using Dynamic Vision Sensors [Mueggler et al., ECMR'15]
Simultaneous Mosaicing and Tracking with an Event Camera [Kim et al., BMVC'15]
Videos from Tobi Delbruck and Joerg Conradt groups
Videos on DVS demos
Calibration [IROS’14]
[Mueggler, Huber, Scaramuzza, Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers,
IROS’14]
Calibration of a DVS [IROS'14]
Standard pinhole camera model still valid (same optics)
Standard passive calibration patterns cannot be used:
one would need to move the camera → inaccurate corner detection
Blinking patterns (computer screen, LEDs)
ROS DVS driver + intrinsic and extrinsic stereo calibration open source:
https://github.com/uzh-rpg/rpg_dvs_ros
[Mueggler, Huber, Scaramuzza, Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers,
IROS’14]
A Simple Optical Flow Algorithm
A moving edge
Horizontal motion: white pixels become black → brightness decreases → negative events (shown in black)
A moving edge
The same edge, visualized in space-time; events are represented by dots.
At what speed is the edge moving?
v = Δx / Δt
A moving edge
At t = 2.547, x = 101; at t = 2.89, x = 133.
Speed of the edge:
v = (133 − 101) / (2.89 − 2.547) = 93.3 pix/s
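The computation above can be sketched in code, using the values from the example (variable names are our own):

```python
# Two events generated by the same moving edge (numbers from the slide).
t0, x0 = 2.547, 101   # earlier event: pixel column 101 at t = 2.547 s
t1, x1 = 2.890, 133   # later event:   pixel column 133 at t = 2.890 s

# Optical-flow speed of the edge along x: v = dx / dt
v = (x1 - x0) / (t1 - t0)
print(f"edge speed: {v:.1f} pix/s")   # edge speed: 93.3 pix/s
```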
An Image Reconstruction Algorithm
Recall: each pixel responds to the log intensity, V = log I(t), and events are generated any time a single pixel sees a change in brightness larger than C:
|Δ log I| ≥ C
[Figure: log-intensity signal crossing the threshold repeatedly, emitting ON and OFF events]
The intensity signal at the event time can be reconstructed by integration of ±C.
[Lichtsteiner, Posch, Delbruck. A 128x128 120 dB 15µs Latency Asynchronous Temporal Contrast Vision Sensor. 2008]
[Cook et al., IJCNN'11] [Kim et al., BMVC'15]
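A minimal sketch of this per-pixel integration, assuming the threshold C is known and using a tiny hand-made event list (all names and values hypothetical):

```python
import numpy as np

C = 0.15          # contrast sensitivity threshold (assumed known)
H, W = 180, 240   # sensor resolution

# Per-pixel reconstruction of the log intensity by integrating +/-C per event,
# starting from an (unknown) reference level of zero.
log_I = np.zeros((H, W))

# events: (t, x, y, polarity) tuples with polarity in {+1, -1}
events = [(0.001, 10, 20, +1), (0.002, 10, 20, +1), (0.003, 10, 20, -1)]

for t, x, y, p in events:
    log_I[y, x] += p * C          # each event contributes one threshold step

# After two ON events and one OFF event, pixel (x=10, y=20) sits one step
# above the reference level.
print(log_I[20, 10])   # 0.15
```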
Image reconstruction
• Given the events and the camera motion (rotation), recover the absolute brightness.
• Pipeline: events + camera motion R(t) → image gradient (magnitude and direction) → solve Poisson Eq. → absolute brightness
• How is it possible? Intuitive explanation: an event camera naturally responds to edges; hence, if we know the motion, we can relate the events to "world coordinates" to get an edge/gradient map. Then, just integrate the gradient map to get the absolute intensity.
Steps:
1. Recover the gradient map of the scene
2. Integrate the gradient to obtain brightness
Image reconstruction. Step 1: compute gradient map
An event at pixel p_m is generated by a brightness change of size C. Let L = log I:
ΔL(t) ≡ L(t) − L(t − Δt) = C
In terms of the brightness map M(x, y) (panorama):
M(p_m(t)) − M(p_m(t − Δt)) = C
Using a first-order Taylor approximation:
M(p_m(t)) − M(p_m(t − Δt)) ≈ g · v Δt
with displacement v Δt = p_m(t) − p_m(t − Δt) and brightness gradient g = ∇M(p_m(t))
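Each event therefore contributes one linear constraint g · (v Δt) = C on the local gradient; with several events observed under different apparent motions, g can be recovered by least squares. A toy sketch with simulated displacements (all values are assumptions for illustration):

```python
import numpy as np

C = 0.2                          # contrast threshold (assumed known)
g_true = np.array([0.5, -0.3])   # hypothetical gradient we want to recover

# Simulate events at one map point under three different apparent velocities.
# A pixel fires when the accumulated change reaches C, i.e. after
# dt = C / (g . v), giving a displacement d = v * dt with g . d = C exactly.
velocities = np.array([[1.0, 0.0], [0.5, -1.0], [1.0, 1.0]])
disps = np.array([v * (C / (g_true @ v)) for v in velocities])

# Stack the per-event constraints g . d = C and solve in the
# least-squares sense (the core of step 1).
g_est, *_ = np.linalg.lstsq(disps, np.full(len(disps), C), rcond=None)
print(g_est)   # ~ [0.5, -0.3]
```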
Image reconstruction. Step 2: Poisson reconstruction
Integrate the gradient map g = (g_x, g_y) to get the absolute brightness M: compute the divergence of g and solve the Poisson equation
∇²M = div g = ∂g_x/∂x + ∂g_y/∂y,
which can be done fast using the FFT.
[Figure: original image, gradients in x and y, divergence, and reconstructed image]
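A minimal sketch of the FFT-based Poisson solver, assuming periodic boundary conditions and forward-difference gradients (a simplification of what the papers above use):

```python
import numpy as np

def poisson_reconstruct(gx, gy):
    """Recover M (up to an additive constant) from its forward-difference
    gradients by solving the Poisson equation with the FFT (periodic BCs)."""
    H, W = gx.shape
    # Divergence of the gradient field = discrete Laplacian of M.
    div = gx - np.roll(gx, 1, axis=1) + gy - np.roll(gy, 1, axis=0)
    # Eigenvalues of the periodic discrete Laplacian in the Fourier basis.
    wx = 2 * np.cos(2 * np.pi * np.arange(W) / W) - 2
    wy = 2 * np.cos(2 * np.pi * np.arange(H) / H) - 2
    denom = wy[:, None] + wx[None, :]
    denom[0, 0] = 1.0                      # avoid division by zero at DC
    M_hat = np.fft.fft2(div) / denom
    M_hat[0, 0] = 0.0                      # absolute offset is unobservable
    return np.real(np.fft.ifft2(M_hat))

# Round-trip check on a random "log-intensity map".
rng = np.random.default_rng(0)
M = rng.standard_normal((32, 48))
gx = np.roll(M, -1, axis=1) - M            # forward differences, periodic
gy = np.roll(M, -1, axis=0) - M
M_rec = poisson_reconstruct(gx, gy)
print(np.allclose(M_rec - M_rec.mean(), M - M.mean()))   # True
```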
Event-based Vision
Frame-based vs Event-based Vision
Problem: the DVS output is a sequence of asynchronous events rather than a standard image; thus, a paradigm shift is needed to deal with its data.
Naive solution: accumulate the events occurring over a certain time interval and adapt "standard" CV algorithms.
Drawback: this increases latency.
Instead, we want each single event to be used as it comes!
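The naive accumulation scheme can be sketched as follows (window length and event list are made up for illustration):

```python
import numpy as np

H, W = 180, 240
dt_window = 0.030   # 30 ms accumulation window (assumed)

# events: (t, x, y, polarity) tuples; here a tiny hand-made stream.
events = [(0.001, 5, 5, +1), (0.010, 5, 5, +1), (0.020, 6, 5, -1), (0.040, 7, 5, +1)]

# Naive approach: sum the polarities per pixel over a fixed time window,
# producing a pseudo-frame that standard CV algorithms can consume.
# The drawback: nothing is output until the window has elapsed.
frame = np.zeros((H, W))
for t, x, y, p in events:
    if t < dt_window:
        frame[y, x] += p

print(frame[5, 5], frame[5, 6])   # 2.0 -1.0  (last event falls outside the window)
```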
Generative Model [Censi & Scaramuzza, ICRA'14]
The generative model tells us that the probability that an event is generated depends on the scalar product between the gradient ∇I and the apparent motion u Δt:
P(e) ∝ ⟨∇I, u Δt⟩
[Figure: camera frame (X_c, Y_c, Z_c) with optical center C; image point p with gradient ∇I and apparent motion u]
[Gallego, Mueggler, Rebecq, Delbruck, Scaramuzza, Event-based Camera Pose Tracking using a Generative Event
Model, Submitted to PAMI]
[Censi & Scaramuzza, Low Latency, Event-based Visual Odometry, ICRA’14]
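A toy sketch of this idea, using the magnitude of the scalar product as an unnormalized event score (names and values are our own, not the papers' implementation):

```python
import numpy as np

# Event likelihood at a pixel is proportional to |<grad I, u dt>|:
# no events fire where the motion is parallel to an edge or where
# there is no gradient at all.
grad_I = np.array([1.0, 0.0])      # vertical edge: gradient points along x
dt = 0.001                          # time interval (assumed)

def event_score(u):
    """Unnormalized probability that this pixel fires under apparent motion u."""
    return abs(grad_I @ (u * dt))

print(event_score(np.array([100.0, 0.0])))   # 0.1 -> motion across the edge
print(event_score(np.array([0.0, 100.0])))   # 0.0 -> motion along the edge
```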
Robustness to High-speed Motion
[Gallego, Mueggler, Rebecq, Delbruck, Scaramuzza, Event-based Camera Pose Tracking using a Generative Event
Model, Submitted to PAMI]
[Censi & Scaramuzza, Low Latency, Event-based Visual Odometry, ICRA’14]
Application Experiment: Quadrotor Flip (1,200 deg/s)
• Mueggler, Huber, Scaramuzza, Event-based, 6-DOF Pose Tracking for High-Speed Maneuvers, IROS, 2014.
• Mueggler, Baumli, Fontana, Scaramuzza, Towards Evasive Maneuvers with Quadrotors using Dynamic Vision
Sensors, European Conference on Mobile Robots (ECMR), Lincoln, 2015.
Robustness to High-Dynamic Range Scenes
Rebecq, Gallego, Scaramuzza, EMVS: Event-based Multi-View Stereo, BMVC'16. Best Industry Paper Award.
EVO: Geometric Approach to 6-DoF Event-based Tracking and Mapping in Real Time
Video: https://youtu.be/bYqD2qZJlxE
Paper: http://rpg.ifi.uzh.ch/docs/RAL16_EVO.pdf
IEEE Robotics and Automation Letters, 2016
EVO: Event-based Tracking and Mapping
Conclusions
Standard Cameras
• VO & SLAM theory for standard cameras is well established
• Biggest challenges today are reliability and robustness:
  • High-dynamic-range scenes
  • High-speed motion
  • Low-texture scenes
Event cameras open enormous possibilities!
• Standard cameras have been studied for 50 years!
• Very robust to high-speed motion and high-dynamic-range scenes
Current challenges
• Mapping still limited to circular movements
• Characterization and modeling of the sensor noise and non-idealities (biases and other dynamic properties)
Recap
DVS: a revolutionary sensor for robotics:
1. Low latency (~1 microsecond)
2. High dynamic range (120 dB instead of 60 dB)
3. Very low bandwidth (only intensity changes are transmitted)
Possible future sensing architecture:
DAVIS sensor (Dynamic and Active-pixel Vision Sensor)
• Combines an event sensor (DVS) and a standard camera in the same pixel array
• Output: frames (at 24 Hz) and events (asynchronous)
[Brandli et al. A 240x180 130dB 3us latency global shutter spatiotemporal vision sensor. IEEE JSSC, 2014]
Newer sensor: combines a standard CMOS camera and a Dynamic Vision Sensor.
[Figure: DAVIS output over time: asynchronous DVS events alongside CMOS frames]
Elias Mueggler – Robotics and Perception Group
Event-based Feature Tracking [IROS’16]
Extract Harris corners on images
Track corners using event-based Iterative Closest Points (ICP)
IROS’16 : Event-based Feature Tracking for Low-latency Visual Odometry
Event-based, Sparse Visual Odometry [IROS’16]
IROS’16 : Event-based Feature Tracking for Low-latency Visual Odometry
Release of the first Event-Camera Dataset and Simulator for SLAM
Joint work with Tobi Delbruck
http://rpg.ifi.uzh.ch/davis_data.html
References for the Sensors
DVS
P. Lichtsteiner, C. Posch, T. Delbruck. A 128×128 120 dB 15 µs Latency Asynchronous Temporal Contrast Vision Sensor. IEEE Journal of Solid-State Circuits, 2008. (PDF)
DAVIS
Brandli, Berner, Yang, Liu, Delbruck. A 240×180 130 dB 3 µs Latency Global Shutter Spatiotemporal Vision Sensor. IEEE Journal of Solid-State Circuits, 2014.
BOOK
Event-based Neuromorphic Systems. Edited by S.-C. Liu, T. Delbruck, G. Indiveri, A. Whatley, R. Douglas. Wiley, 2014.