taking’computer’vision’ into’the’wild’ · 2011-10-05 · taking’computer’vision’...

37
Taking Computer Vision Into The Wild Neeraj Kumar October 4, 2011 CSE 590V – Fall 2011 University of Washington

Upload: others

Post on 18-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Taking  Computer  Vision  Into  The  Wild  

Neeraj  Kumar  

October  4,  2011  CSE  590V  –  Fall  2011  

University  of  Washington  

Page 2: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

A  Joke  

Q.  What  is  computer  vision?  

A.  If  it  doesn’t  work  (in  the  wild),  it’s  computer  vision.  

(I’m  only  half-­‐joking)  

Page 3: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Instant  Object  RecogniWon  Paper*  

Page 4: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Instant  Object  RecogniWon  Paper*  

1.  Design  new  algorithm  

2.  Pick  dataset(s)  to  evaluate  on  3.  Repeat  unWl  conference  deadline:  

a.  Train  classifiers  b.  Evaluate  on  test  set  c.  Tune  parameters  and  tweak  algorithm  

4.  Brag  about  results  with  ROC  curves  

-­‐  Fixed  set  of  training  examples  

-­‐  Fixed  set  of  classes/objects  

*Just  add  grad  students  

-­‐  Training  examples  only  have  one  object,  oaen  in  center  of  image  -­‐  Fixed  test  set,  usually  from  same  overall  dataset  as  training  

-­‐  MTurk  filtering,  pruning  responses,  long  training  Wmes,  …  

-­‐  How  does  it  do  on  real  data?  New  classes?  

Page 5: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Instant  Object  RecogniWon  Paper  

1.  User  proposes  new  object  class  2.  System  gathers  images  from  flickr  

3.  Repeat  unWl  convergence:  a.  Choose  windows  to  label  b.  Get  labels  from  MTurk  c.  Improve  classifier  (detector)  

4.  Also  evaluate  on  Pascal  VOC  

[S.  Vijayanarasimhan  &  K.  Grauman  –  Large-­‐Scale  Live  AcWve  Learning:  

Training  Object  Detectors  with  Crawled  Data  and  Crowds  (CVPR  2011)]  

-­‐  Which  windows  to  pick?  

-­‐  Which  images  to  label?  

-­‐  What  representaWon?  

-­‐  How  does  it  compare  to  state  of  the  art?  

Page 6: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Object  RepresentaWon  

Root  from  here  

Deformable  Parts:  Root  +  Parts  +  Context  

P=6  parts,  from  bootstrap  set  

C=3  context  windows,  excluding  object  candidate,  defined  to  the  lea,  right,  above    

Page 7: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Features:  Sparse  Max  Pooling  

Bag  of  Words   Sparse  Max  Pooling  Base  features   SIFT   SIFT  

Build  vocabulary  tree   ✔   ✔  

QuanWze  features   Nearest  neighbor,  hard  decision  

Weighted  nearest  neighbors,  sparse  coded  

Aggregate  features   SpaWal  pyramid   Max  pooling  

[Y.-­‐L.  Boureau,  F.  Bach,  Y.  LeCun,  J.  Ponce  –  Learning  Mid-­‐level  Features  for  RecogniWon  (CVPR  2010]  

[J.  Yang,  K.  Yu,  Y.  Gong,  T.  Huang  –  Linear  SpaWal  Pyramid  Matching  Sparse  Coding  for  Image  ClassificaWon  (CVPR  2009)]  

Page 8: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

How  to  Generate  Root  Windows?  

100,000s  of  possible  locaWons,  aspect  raWos,  sizes  

1000s  of  images  X  

=  too  many  possibiliWes!  

Page 9: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Jumping  Windows  Training  Image   Novel  Query  Image  

•  Build  lookup  table  of  how  frequently  given  feature  in  a  grid  cell  predicts  bounding  box  

•  Use  lookup  table  to  vote  for  candidate  windows  in  query  image  a  la  generalized  Hough  transform  

Page 10: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Pick  Examples  via  Hyperplane  Hashing  

[P.  Jain,  S.  Vijayanarasimhan  &  K.  Grauman  –  Hashing  Hyperplane  Queries  to  

Near  Points  with  ApplicaWons  to  Large-­‐Scale  AcWve  Learning  (NIPS  2010)]  

•  Want  to  label  “hard”  examples                near  the  hyperplane  boundary  

•  But  hyperplane  keeps  changing,  so  have  to  recompute  distances…  

•  Instead,  hash  all  unlabeled  examples  into  table  

•  At  run-­‐Wme,  hash  current  hyperplane  to  get  index  into  table,  to  pick  examples  close  to  it  

Page 11: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Comparison  on  Pascal  VOC  

•  Comparable  to  state-­‐of-­‐the-­‐art,  beuer  on  few  classes  

•  Many  fewer  annotaWons  required!  •  Training  Wme  is  15mins  vs  7  hours  (LSVM)  vs  1  week  (SP+MKL)  

Page 12: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Online  Live  Learning  for  Pascal  

•  Comparable  to  state-­‐of-­‐the-­‐art,  beuer  on  fewer  classes  

•  But  using  flickr  data  vs.  Pascal  data,  and  automaWcally  

Page 13: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Sample  Results  Co

rrect  

Incorrect  

Page 14: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Lessons  Learned  •  It  is  possible  to  leave  the  sandbox  •  And  sWll  do  well  on  sandbox  evaluaWons  

•  Sparse  max  pooling  with  a  part  model  works  well  

•  Linear  SVMs  can  be  compeWWve  with  these  features  

•  Jumping  windows  is  MUCH  faster  than  sliding  

•  Picking  examples  to  get  labeled  is  a  big  win  

•  Linear  SVMs  also  allow  for  fast  hyperplane  hashing  

Page 15: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

LimitaWons  

“Hell  is  other  people”  

Jean-­‐Paul  Sartre  

users  

With  apologies  to    

Page 16: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’
Page 17: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

It  doesn’t  work  well  enough  

Solving  Real  Problems  for  Users  

Users  want  to  do  stuff  

Users  express  their  displeasure  gracefully  *With  apologies  to  John  Gabriel  

Page 18: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

…And  Never  The  Twain  Shall  Meet?  

Pascal  VOC  Results  from  Previous  Paper  

0  

10  

20  

30  

40  

50  

60  

70  

80  

90  

100  

bicyc.   bird   boul   car   chair   dinin.   horse   person   poue.   sofa   tvmon.   Mean  

Ours�   BoF  SP�   LLC  SP�   LSVM+HOG   SP+MKL  

Current  Best  of  Vision  Algorithms  

User  ExpectaWon  

?  

Page 19: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

SegmentaWon  

DetecWon  Shape  EsWmaWon  

Stereo  

Tracking  Geometry  

Simplify  Problem!  

Unsolved  Vision  Problems  

Page 20: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Columbia  University  

University  of  Maryland  

Smithsonian  InsWtuWon  

Page 21: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’
Page 22: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Easier  SegmentaWon  for  Leafsnap  

Page 23: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Plants  vs  Birds  

2d   3d  

Doesn’t  move   Moves  

Okay  to  pluck  from  tree   Not  okay  to  pluck  from  tree  

Mostly  single  color   Many  colors  

Very  few  parts   Many  parts  

Adequately  described  by  boundary   Not  well  described  by  boundary  

RelaWvely  easy  to  segment   Hard  to  segment  

Page 24: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Human-­‐Computer  CooperaWon  

[S.  Branson,  C.  Wah,  F.  Schroff,  B.  Babenko,  P.  Welinder,  P.  Perona,  S.  Belongie  –  Visual  RecogniWon  with  Humans  in  the  Loop  (ECCV  2010)]  

What  color  is  it?  

Red!  

Where’s  the  beak?  

Top-­‐right!  

Where’s  the  tail?  

Bouom-­‐lea!  

Describe  its  beak  

Uh,  it’s  pointy?  

Where  is  it?  

Okay.  

Bouom-­‐lea!  

<Shape  descriptor>  

Page 25: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

20  QuesWons  

hup://20q.net/  

Page 26: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

InformaWon  Gain  for  20Q  

Pick  most  informaWve  quesWon  to  ask  next  

Expected  informaWon  gain  of  class  c,  given  image  &  previous  responses  

Probability  of  ge|ng  response  ui,  given  image  &  previous  responses  

Entropy  of  class  c,  given  image  and  possible  new  response  ui  Entropy  of  class  c  right  now  

Page 27: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Answers  make  distribuWon  peakier  

Page 28: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

IncorporaWng  Computer  Vision  

Probability  of  class  c,  given  image  and  any  set  of  responses  

Bayes’  rule  

Assume  variaWons  in  user  responses  are  NOT  image-­‐dependent  

ProbabiliWes  affect  entropies!  

Page 29: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

IncorporaWng  Computer  Vision…  

…leads  to  different  quesWons  

Page 30: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Ask  for  User  Confidences  

Page 31: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Modeling  User  Responses  is  EffecWve!  

Page 32: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Birds-­‐200  Dataset  

hup://www.vision.caltech.edu/visipedia/CUB-­‐200.html  

Page 33: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Results  

Page 34: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Results  

With  fewer  quesWons,  CV  does  beuer  With  more  quesWons,  humans  do  beuer  

Page 35: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Lessons  Learned  •  Computer  vision  is  not  (yet)  good  enough  for  users  

•  But  users  can  meet  vision  halfway  

•  Minimizing  user  effort  is  key!  

•  Users  are  not  to  be  trusted  (fully)  •  Adding  vision  improves  recogniWon  

•  For  fine-­‐scale  categorizaWon,  auributes  do  beuer  than  1-­‐vs-­‐all  classifiers  if  there  are  enough  of  them  

Classifier  200  

(1-­‐vs-­‐all)  288  aur.   100  aur.   50  aur.   20  aur.   10  aur.  

Avg  #  QuesWons  

6.43   6.72   7.01   7.67   8.81   9.52  

Page 36: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

LimitaWons  •  Real  system  sWll  requires  much  human  effort  

•  Only  birds  •  CollecWng  and  labeling  data  •  Crowdsourcing?  •  Experts?  

•  Building  usable  system  

•  Minimizing  

Page 37: Taking’Computer’Vision’ Into’The’Wild’ · 2011-10-05 · Taking’Computer’Vision’ Into’The’Wild’ Neeraj’Kumar’ October4,2011 CSE’590V’–Fall’2011’

Visipedia  

hup://www.vision.caltech.edu/visipedia/