Self-supervised Learning for Video Correspondence Flow (vgg/publications/2019/lai19/poster.pdf)
TRANSCRIPT
● 1. Color dropout as augmentation (sketched below)
● 2. Cycle-consistency + scheduled sampling
● 3. Restricted attention for higher resolution
Consequences: longer tracks, reduced drift
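As a rough illustration of idea 1, the sketch below shows what color (channel) dropout could look like for a batch of frames. The function name, tensor shapes and drop probability are assumptions for illustration, not the released code; at test time no dropout is applied (full RGB frames, as in the training/testing diagram further down).

```python
import torch

def channel_dropout(frames: torch.Tensor, p_drop: float = 0.8) -> torch.Tensor:
    """Randomly keep only one color channel per sample so the model cannot
    rely on trivial color matching and must learn more robust features.
    frames: (B, 3, H, W) float tensor.  (Hypothetical helper, for illustration.)"""
    if torch.rand(1).item() > p_drop:
        return frames                        # sometimes leave the frames untouched
    B, C, _, _ = frames.shape
    keep = torch.randint(0, C, (B,))         # channel to keep, chosen per sample
    mask = torch.zeros(B, C, 1, 1, device=frames.device)
    mask[torch.arange(B), keep] = 1.0
    return frames * mask * C                 # rescale so overall intensity stays comparable
```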
> Oral session (Video Analysis): 13:00 - 13:15, Thursday 12th
> Paper, code, and pretrained model available for download. Check out:
Self-supervised Learning for Video Correspondence Flow
Zihang Lai, Weidi Xie
VGG, University of Oxford
The objective of this paper is self-supervised learning of matching correspondences along videos, which we term correspondence flow. Learning only from unlabeled videos, we propose to train a “pointer” that reconstructs a target frame by copying pixels from a reference frame.
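To make the "pointer" idea concrete, here is a minimal sketch of reconstructing a target frame as a soft copy of reference pixels, weighted by an affinity between learned features. The shapes, names and temperature are assumptions for illustration, not the authors' released code, and the actual training loss may be a classification loss over quantized colors rather than a direct regression on the copied values.

```python
import torch
import torch.nn.functional as F

def copy_from_reference(feat_ref, feat_tgt, values_ref, temperature=0.07):
    """Soft 'pointer' copy: for each target location, attend over all reference
    locations and copy their values (colors during training, labels at test time).
    feat_ref, feat_tgt: (B, C, H, W) embeddings; values_ref: (B, K, H, W).
    Names, shapes and temperature are illustrative assumptions."""
    B, C, H, W = feat_ref.shape
    K = values_ref.shape[1]
    f_ref = feat_ref.flatten(2)                                   # (B, C, HW)
    f_tgt = feat_tgt.flatten(2)                                   # (B, C, HW)
    affinity = torch.einsum('bci,bcj->bij', f_tgt, f_ref) / temperature
    attn = F.softmax(affinity, dim=-1)                            # pointer over reference pixels
    v_ref = values_ref.flatten(2)                                 # (B, K, HW)
    recon = torch.einsum('bij,bkj->bki', attn, v_ref)             # copy values through the pointer
    return recon.view(B, K, H, W)
```

Training then penalizes the difference between this reconstruction (with values_ref set to the reference frame's colors) and the true target frame.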
Introduction
Our correspondences could be used to propagate many entities (e.g. segmentation masks, keypoints) along a video sequence.
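For example, reusing the hypothetical copy_from_reference sketch above, propagating a segmentation mask at test time could amount to copying one-hot labels instead of colors (an assumed usage, not the exact inference code):

```python
def propagate_mask(feat_ref, feat_tgt, mask_ref_onehot):
    """mask_ref_onehot: (B, K, H, W) one-hot segmentation of the reference frame,
    at the feature resolution. Returns soft per-class scores for every target
    location; take the argmax over K for a hard mask. (Illustrative wrapper
    around the copy_from_reference sketch above.)"""
    return copy_from_reference(feat_ref, feat_tgt, mask_ref_onehot)
```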
Qualitative results on DAVIS and JHMDB
What to do with correspondence?
[Diagram: Training uses Frame t (only R channel) → model → Frame t+1 (RGB); Testing uses Frame t (RGB) → model → Frame t+1 (RGB)]
What to learn?
Objective: learning pixel correspondences in videos without annotations!
A feature extractor that produces embeddings suitable for matching correspondences.
[Diagram: frame t and frame t+1 → feature extractor → feature t and feature t+1 → matching]
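A minimal sketch of such an encoder, under the assumption of a truncated torchvision ResNet-18 backbone with an L2-normalized embedding head (the exact architecture in the paper may differ):

```python
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class Encoder(nn.Module):
    """Dense feature extractor: frames in, per-pixel embeddings out.
    Backbone choice and embedding size are illustrative assumptions."""
    def __init__(self, dim=64):
        super().__init__()
        backbone = resnet18()
        # Keep the stem through layer2: a stride-8, 128-channel feature map.
        self.body = nn.Sequential(*list(backbone.children())[:6])
        self.head = nn.Conv2d(128, dim, kernel_size=1)

    def forward(self, x):                       # x: (B, 3, H, W)
        f = self.head(self.body(x))             # (B, dim, H/8, W/8)
        return F.normalize(f, dim=1)            # unit norm, so dot products are cosine similarities
```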
How to learn?
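Putting the ideas together, a rough, hypothetical training step might look as follows. It reuses the channel_dropout, Encoder and copy_from_reference sketches above, substitutes a simple L1 reconstruction loss for whatever loss the paper actually uses, and collapses the backward half of the cycle into a single hop, so treat it only as an outline.

```python
import random
import torch.nn.functional as F

def training_step(frames, encoder, step, total_steps):
    """frames: list of (B, 3, H, W) tensors for a short clip (illustrative)."""
    feats = [encoder(channel_dropout(f)) for f in frames]    # idea 1: color dropout
    colors = [F.avg_pool2d(f, 8) for f in frames]            # colors at the (assumed stride-8) feature resolution

    # Scheduled sampling: with a probability that grows over training, copy
    # from the model's own previous reconstruction instead of the ground-truth
    # frame, encouraging robustness over long tracks.
    p_model = min(1.0, step / total_steps)

    loss, ref_colors = 0.0, colors[0]
    for t in range(1, len(frames)):
        recon = copy_from_reference(feats[t - 1], feats[t], ref_colors)
        loss = loss + F.l1_loss(recon, colors[t])
        ref_colors = recon if random.random() < p_model else colors[t]

    # Cycle consistency (simplified here to a single backward hop): tracking
    # back from the last frame should recover the first frame.
    back = copy_from_reference(feats[-1], feats[0], colors[-1])
    loss = loss + F.l1_loss(back, colors[0])
    return loss
```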
Results
We outperform existing self-supervised learning approaches by a significant margin.
Video segmentation (DAVIS-2017)
Method            Supervised   J&F (Mean)
Optical Flow      ✗            26.0
Vondrick et al.   ✗            34.0
CycleTime         ✗            40.7
Ours              ✗            49.5
SiamMask          ✓            53.1
OSVOS             ✓            60.3
Keypoint tracking (JHMDB)
Method            Supervised   PCK@0.1
Optical Flow      ✗            49.0
Vondrick et al.   ✗            45.2
Wang et al.       ✗            57.7
Ours              ✗            58.5
ImageNet          ✓            58.4
Find more...
[Figure: restricted attention. A reference frame and a target frame; for each target location, matching is restricted to a local search region in the reference frame.]
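The sketch below illustrates one way such restricted attention could be implemented (an assumed implementation, not the released code): affinities and the soft copy are computed only inside a (2*radius+1) x (2*radius+1) window around each location, so memory scales with the window size rather than the full frame, which is what makes higher resolutions practical.

```python
import torch
import torch.nn.functional as F

def restricted_copy(feat_ref, feat_tgt, values_ref, radius=6, temperature=0.07):
    """Restricted-attention version of the soft copy: each target location only
    attends to a local search window in the reference frame.
    feat_ref, feat_tgt: (B, C, H, W); values_ref: (B, K, H, W). Illustrative."""
    B, C, H, W = feat_ref.shape
    K = values_ref.shape[1]
    win = 2 * radius + 1
    # Gather, for every spatial location, the local window of reference
    # features and reference values.
    ref_patches = F.unfold(feat_ref, win, padding=radius).view(B, C, win * win, H * W)
    val_patches = F.unfold(values_ref, win, padding=radius).view(B, K, win * win, H * W)

    tgt = feat_tgt.flatten(2)                                       # (B, C, H*W)
    # Affinity of each target location to the win*win candidates around it.
    affinity = torch.einsum('bcp,bcnp->bnp', tgt, ref_patches) / temperature
    attn = F.softmax(affinity, dim=1)                               # softmax over the window
    out = torch.einsum('bnp,bknp->bkp', attn, val_patches)          # copy within the window
    return out.view(B, K, H, W)
```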