Beyond Datasets: Learning in a Fully-Labeled Real World
Thesis proposal, Alexander Sorokin
Post on 23-Dec-2015
Task
Amazon Mechanical Turk
Is this a dog? o Yes o No
Workers
Answer: Yes
Task: Dog?
Pay: $0.01
Broker
www.mturk.com
$0.01
Select examples
Joint work with Tamara and Alex Berg
http://vision.cs.uiuc.edu/annotation/data/simpleevaluation/html/horse.html
Outline something
$0.01
http://vision.cs.uiuc.edu/annotation/results/production-3-2/results_page_013.html
Data from Ramanan NIPS06
Annotation language
How good are the annotations?
Submission is   Volume   Action         Redo
Empty           6%       Reject         yes
Clearly bad     2%       Reject         yes
Almost good     4%       Accept (pay)   yes
Good            88%      Accept (pay)   no
Task: label people, box+14pts; Volume 3078 HITs
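The accept/reject policy in the table above can be tallied with a few lines of arithmetic. The sketch below is only illustrative: the names `GRADES`, `paid_share`, `redo_share`, and `expected_redo_hits` are hypothetical, and it assumes the percentages apply to the 3078-HIT volume quoted on this slide.

```python
# Each grade maps to (share of submissions in %, worker is paid, task is redone).
# The numbers are copied from the slide; the structure is a hypothetical sketch.
GRADES = {
    "Empty": (6, False, True),
    "Clearly bad": (2, False, True),
    "Almost good": (4, True, True),
    "Good": (88, True, False),
}

# Assumption: the percentages refer to this volume of HITs.
TOTAL_HITS = 3078


def paid_share():
    """Percentage of submissions that are accepted (paid)."""
    return sum(pct for pct, paid, _redo in GRADES.values() if paid)


def redo_share():
    """Percentage of submissions whose task has to be done again."""
    return sum(pct for pct, _paid, redo in GRADES.values() if redo)


def expected_redo_hits(total_hits=TOTAL_HITS):
    """Expected number of HITs to redo under the stated assumption."""
    return total_hits * redo_share() / 100


if __name__ == "__main__":
    print("paid: %d%%, redo: %d%%" % (paid_share(), redo_share()))
    print("expected HITs to redo: %.0f" % expected_redo_hits())
```

Under these assumptions, "Almost good" submissions are the only ones that are both paid and redone, which is why the paid and redo shares add up to more than 100%.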
2. Require qualification
Please read the detailed instructions to learn how to perform the task. Please confirm that you understand the instructions by answering the following questions:
Which of the following checkboxes are correct for this annotation?
No people (there are people in the image)
> 20 people (there are less than 20 people of appropriate size)
Small heads (there are unmarked small heads in the image)
Task: Put a box around every head
Annotation Method Comparison
Approach      Cost  Scale  Setup effort  Collaborative  Quality  Directed  Central  Elastic to $
MTurk         $     +++    *             no             +/+++    Yes       no       +++++
GWAP                ++++   ***           no             +        Yes       Yes      +
LabelME             ++                   Yes            ++       no        Yes
ImageParsing  $$    ++     **            no             ++++     Yes       Yes      +++
In house      $$$   +      *             no             +++      Yes       no       ++
Why is it important
Challenges
Lighting conditions
Background clutter
Lighting and background are known
Within-class variability
Viewpoint changes
Internal deformations
100 000 categories
How many instances?
10s billions total
10 000 locally
1000 examples per category 1-10 labels per object
Single image Rich sensor data
PR2 Sensing capabilities
Preliminary learning results
UChicago-VOC2008-person
Standard deviation table
Accuracy (%)  Sigma (%)  Sigma (%) at N samples
                         N=10   N=40   N=100  N=300  N=1000
99            9.95       3.15   1.57   0.99   0.57   0.31
90            30.00      9.49   4.74   3.00   1.73   0.95
60            48.99      15.49  7.75   4.90   2.83   1.55
50            50.00      15.81  7.91   5.00   2.89   1.58
30            45.83      14.49  7.25   4.58   2.65   1.45
10            30.00      9.49   4.74   3.00   1.73   0.95
1             9.95       3.15   1.57   0.99   0.57   0.31
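The sigma values in the standard deviation table are consistent with treating each test sample as an independent correct/incorrect outcome, so that sigma = sqrt(p(1 - p)) for a single sample and sqrt(p(1 - p) / N) for the mean of N samples. The sketch below recomputes the table under that assumption; the function and constant names are hypothetical and the slides do not state which formula was used.

```python
import math

# Accuracy levels (%) and sample counts as listed in the table.
ACCURACIES = [99, 90, 60, 50, 30, 10, 1]
SAMPLE_SIZES = [10, 40, 100, 300, 1000]


def sigma_percent(accuracy_pct, n=1):
    """Standard deviation (in %) of an accuracy estimate from n samples.

    Assumes independent Bernoulli outcomes with success rate accuracy_pct / 100.
    """
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def sigma_table():
    """Map each accuracy level to its sigma (%) at every sample size."""
    return {
        acc: [round(sigma_percent(acc, n), 2) for n in SAMPLE_SIZES]
        for acc in ACCURACIES
    }


if __name__ == "__main__":
    print("acc%  sigma%  " + "  ".join("N=%d" % n for n in SAMPLE_SIZES))
    for acc, row in sigma_table().items():
        cells = "  ".join("%.2f" % s for s in row)
        print("%4d  %6.2f  %s" % (acc, sigma_percent(acc), cells))
```

Read this way, a difference of one or two percentage points between two methods evaluated on a few hundred samples falls within one sigma when accuracy is near 50%.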
Experimental results
Detector confidence level
Number of detects
Number of samples
Precision
3 sigma
Recall 0.1464 0.09774 0.2482 0.16565 0.3861 0.2577 0.6144 0.41008 0.8216 0.54834
Estimate for number of positives
@78K @118K @78K @118K @78K @118K @78K @118K @78K @118K
0.0234 0.0327 0.0275 0.0305
20%-50%
2000
50%-100%
14000 14000 28011 84033 140055
5% 5%-10% 10%-20%
0.0126
996
0.8239 0.5724 0.3878 0.214 0.1165
8278 4008 1996
Acknowledgments
Special thanks to:
David Forsyth
Nicolas Loeff, Ali Farhadi, Du Tran, Ian Endres
Tamara Berg, Pushmeet Kohli
Dolores Labs (Lukas Biewald)
Willow Garage (Gary Bradsky, Alex Teichman, Daniel Munos, …)
All workers at Amazon Mechanical Turk
This work was supported in part by the National Science Foundation under IIS - 0534837 and in part by the Office of Naval Research under N00014-01-1-0890 as part of the MURI program.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the National Science Foundation or the Office of Naval Research.
PR2 Platform
• 2 laser scanners
– Fixed and tilting
• 7 cameras
– 2 stereo pairs
– 1 hi-res (5 MP)
– 2 in the arms
• Structured light
• 16 cores, 48 GB RAM
• 2 arms
What are datasets good for?
• Training
– The data is fully labeled
• Evaluation
• Tweaking the parameters
– Performance is computed automatically
• Comparing algorithms
– “They run on exact same data”
Why are datasets bad?
• Data sampling and labeling bias
• Small changes in performance are insignificant
• Parameter tweaking doesn’t generalize
• Overfitting to the datasets
• Datasets should be discarded after performance is measured