

Page 1: Sound Detection

Sound Detection

Derek Hoiem

Rahul Sukthankar (mentor)

August 24, 2004

Page 2: Sound Detection

Objective

Learn model of sound object from few (10-20) examples and distinguish from all other sounds

Examples of sound classes: gunshots, screams, laughter, car horns, meows, dog barks, etc.

Page 3: Sound Detection

Applications

“Tell me if you hear a gunshot.” (monitoring)

“Get me video clips containing dogs barking.” (search and retrieval)

“What’s going on?” (scene understanding)

Page 4: Sound Detection

Why it's difficult

Sound classes have large variations

Sounds are often ambiguous without context

Overlaid “noise” obscures sound

Page 5: Sound Detection

Sound or not?

Car horn

Laser gun

Dog bark

Which of these sounds are not from their named classes?

Page 6: Sound Detection

Previous work

Sound Classification (Wold 1996, Casey 2001, etc.)

Categorize short sound clips

Reasonable accuracy (5-20% error)

Sound Detection (Defaux 2000, Piamsa-nga 1999)

Localize and recognize sound objects in long clips

Poor performance or assumption of unrealistic conditions (e.g., very quiet background)

Page 7: Sound Detection

Detection via Windowed Search

[Diagram: Long Track → Clip 1, Clip 2, ..., Clip N → Clip Classifier]

Break the audio track into short overlapping clips

Independently classify short clips as object or non-object

Return locations of detected sound object
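The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `windowed_search` and `toy_energy_classifier`, the clip length, the hop size, and the score threshold are all assumptions made for the example, and the toy classifier only stands in for a learned clip classifier.

```python
from typing import Callable, List, Sequence, Tuple


def windowed_search(
    track: Sequence[float],
    clip_len: int,
    hop: int,
    classify: Callable[[Sequence[float]], float],
    threshold: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Score every fixed-length clip of `track` independently.

    Returns (start, end, score) for each clip whose score exceeds `threshold`.
    """
    if clip_len <= 0 or hop <= 0:
        raise ValueError("clip_len and hop must be positive")
    detections = []
    # Choosing hop < clip_len makes consecutive clips overlap.
    for start in range(0, len(track) - clip_len + 1, hop):
        clip = track[start:start + clip_len]
        score = classify(clip)
        if score > threshold:
            detections.append((start, start + clip_len, score))
    return detections


def toy_energy_classifier(clip: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned clip classifier: mean energy minus 0.5."""
    return sum(x * x for x in clip) / len(clip) - 0.5


# A silent track with a loud burst in samples 8..11.
track = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
hits = windowed_search(track, clip_len=4, hop=2, classify=toy_energy_classifier)
print(hits)
```

Only the clip that fully covers the burst scores above the threshold here; clips that overlap it halfway score exactly zero and are rejected.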

Page 8: Sound Detection

Representation

[Figure: example sounds from two classes, meows and phone rings, each passed through the steps below to produce features]

Raw representation

Time-frequency analysis: windowed Fourier transform

Extract power percentage in each band over time and total power over time

Compute features used for classification
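A rough sketch of the representation step follows. The slide does not give the window type, frame length, hop, or band layout, so the Hann window, the equal-width bands, and the function names here are assumptions chosen for illustration. A naive DFT keeps the example dependency-free; a real system would use an FFT library.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def frame_power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power in the non-negative frequency bins of a Hann-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def band_power_over_time(
    signal: Sequence[float], frame_len: int, hop: int, n_bands: int
) -> Tuple[List[List[float]], List[float]]:
    """Return (percentages, totals).

    percentages[t][b] is the share of frame t's power that falls in band b;
    totals[t] is the total power of frame t.
    """
    percentages, totals = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        power = frame_power_spectrum(signal[start:start + frame_len])
        n_bins = len(power)
        bands = [0.0] * n_bands
        for k, p in enumerate(power):
            # Equal-width bands over the frequency bins (an assumption).
            bands[k * n_bands // n_bins] += p
        total = sum(bands)
        totals.append(total)
        percentages.append([b / total if total > 0 else 0.0 for b in bands])
    return percentages, totals


# A low-frequency tone: almost all power should land in the lowest band.
tone = [math.sin(2 * math.pi * 2 * i / 32) for i in range(64)]
percentages, totals = band_power_over_time(tone, frame_len=32, hop=16, n_bands=4)
```

Normalizing each frame by its total power separates the spectral shape (the percentages) from loudness (the totals), which is why the slide lists them as two separate quantities.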

Page 9: Sound Detection

Classification Features

Diverse feature set: different sound classes are distinctive in different ways

Means and standard deviations of power at different frequencies

Bandwidth, peaks, loudness, etc.

138 features in all
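As an illustration of the kind of summary statistics listed above, the sketch below turns per-frame band power into a small feature dictionary. It is not the 138-feature set from the slides: the feature names, the choice of population standard deviation, and the "peak band" definition are assumptions made for this example.

```python
import statistics
from typing import Dict, Sequence


def summarize_band_power(
    percentages: Sequence[Sequence[float]], totals: Sequence[float]
) -> Dict[str, float]:
    """Summarize per-frame band power (percentages[t][b]) and total power (totals[t])."""
    features: Dict[str, float] = {}
    n_bands = len(percentages[0])
    for b in range(n_bands):
        series = [frame[b] for frame in percentages]
        features[f"band{b}_mean"] = statistics.fmean(series)
        features[f"band{b}_std"] = statistics.pstdev(series)
    # Loudness is approximated here by total power per frame.
    features["loudness_mean"] = statistics.fmean(totals)
    features["loudness_std"] = statistics.pstdev(totals)
    # Index of the band with the highest average share of power.
    means = [features[f"band{b}_mean"] for b in range(n_bands)]
    features["peak_band"] = float(means.index(max(means)))
    return features


# Two frames, two bands (made-up numbers).
features = summarize_band_power([[0.8, 0.2], [0.6, 0.4]], [2.0, 4.0])
```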

Page 10: Sound Detection

Classification by Decision Trees

Try to find simple rules that discriminate object from non-object

Each decision is based on a threshold of a feature value

Assign confidence based on likelihood of data for object and non-object classes at each leaf node

[Diagram: a tree with decision nodes and leaf nodes]
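The smallest possible tree, a single thresholded decision with two leaves, shows how these pieces fit together. The slides do not state the confidence formula or the split criterion, so this sketch assumes a smoothed half log-ratio of weighted class mass at each leaf and picks the split that minimizes the sum of sqrt(W+ * W-) over leaves, a common choice for confidence-rated trees. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-6  # smoothing so a pure leaf does not get infinite confidence (assumption)

Stump = Tuple[int, float, float, float]  # feature index, threshold, conf_below, conf_above


def leaf_confidence(w_pos: float, w_neg: float) -> float:
    """Positive when the leaf holds mostly object weight, negative otherwise."""
    return 0.5 * math.log((w_pos + EPS) / (w_neg + EPS))


def fit_stump(
    xs: Sequence[Sequence[float]], ys: Sequence[int], weights: Sequence[float]
) -> Stump:
    """Fit a one-decision tree on weighted data; labels are +1 (object) or -1."""
    best = None
    for f in range(len(xs[0])):
        values = sorted({x[f] for x in xs})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            w = {(side, y): 0.0 for side in (False, True) for y in (-1, 1)}
            for x, y, wt in zip(xs, ys, weights):
                w[(x[f] > thr, y)] += wt
            impurity = sum(math.sqrt(w[(s, 1)] * w[(s, -1)]) for s in (False, True))
            if best is None or impurity < best[0]:
                below = leaf_confidence(w[(False, 1)], w[(False, -1)])
                above = leaf_confidence(w[(True, 1)], w[(True, -1)])
                best = (impurity, (f, thr, below, above))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1]


def stump_score(stump: Stump, x: Sequence[float]) -> float:
    f, thr, below, above = stump
    return above if x[f] > thr else below


# One made-up feature; objects have high values.
xs = [[0.1], [0.2], [0.8], [0.9]]
ys = [-1, -1, 1, 1]
stump = fit_stump(xs, ys, [0.25] * 4)
```

A deeper tree would apply the same split search recursively inside each leaf; the confidence rule at the leaves stays the same.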

Page 11: Sound Detection

Boosted Trees

Problem: One decision tree by itself may not be a great classifier

Solution: Use several trees, with each one focusing on the mistakes of previously learned trees

AdaBoost:

Weight training data uniformly

Learn a decision tree classifier on weighted data

Re-weight data, giving more weight to incorrectly classified examples

Final classification based on linear combination of confidences from all learned decision trees
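The loop described above can be sketched as follows. To stay short, the weak learner is a one-split tree rather than the full decision trees used in the slides, and the leaf confidences, the exponential re-weighting rule, and every function name are assumptions in the style of confidence-rated AdaBoost, not details taken from the source.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-6  # smoothing for leaf confidences (assumption)
Stump = Tuple[int, float, float, float]  # feature, threshold, conf_below, conf_above


def fit_stump(xs, ys, ws) -> Stump:
    """Weighted one-split tree; each leaf stores 0.5 * log(W+ / W-) as its confidence."""
    best = None
    for f in range(len(xs[0])):
        values = sorted({x[f] for x in xs})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            w = {(side, y): 0.0 for side in (False, True) for y in (-1, 1)}
            for x, y, wt in zip(xs, ys, ws):
                w[(x[f] > thr, y)] += wt
            impurity = sum(math.sqrt(w[(s, 1)] * w[(s, -1)]) for s in (False, True))
            if best is None or impurity < best[0]:
                conf = [
                    0.5 * math.log((w[(s, 1)] + EPS) / (w[(s, -1)] + EPS))
                    for s in (False, True)
                ]
                best = (impurity, (f, thr, conf[0], conf[1]))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1]


def stump_score(stump: Stump, x: Sequence[float]) -> float:
    f, thr, below, above = stump
    return above if x[f] > thr else below


def adaboost(xs, ys, n_rounds: int) -> List[Stump]:
    n = len(xs)
    ws = [1.0 / n] * n                    # weight training data uniformly
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, ys, ws)     # learn a tree on the weighted data
        ensemble.append(stump)
        # Re-weight: examples scored with the wrong sign gain weight.
        ws = [w * math.exp(-y * stump_score(stump, x)) for x, y, w in zip(xs, ys, ws)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return ensemble


def ensemble_score(ensemble: Sequence[Stump], x: Sequence[float]) -> float:
    # Final score: sum (a linear combination with unit coefficients) of confidences.
    return sum(stump_score(s, x) for s in ensemble)


# Objects occupy the middle of the feature range, so no single split separates them.
xs = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
ys = [-1, -1, 1, 1, -1, -1]
model = adaboost(xs, ys, n_rounds=3)
```

On this toy data a single split leaves some examples with a score of zero, while a few boosted rounds combine splits on both sides of the object range.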

Page 12: Sound Detection

Examples of Decision Trees

[Figure: example decision trees; the recovered node and leaf labels follow]

Low percentage of power in low frequencies in mid-time of sound

Very high power amplitude range

Meow, Gunshot

High power amplitude range

More complex tree that focuses on examples misclassified by tree above

Gunshot

Page 13: Sound Detection

Cascade of Classifiers

Goal: eliminate false positives with few false negatives in early stages

Advantages:

Allows use of large set of negative training examples

Improves classification speed

Dangers: cannot recover from false negatives

[Diagram: Sound Clip → Stage 1 → Stage 2 → Stage 3 → Pass; a clip that fails any stage goes to Fail. Pass labels after stages 1, 2, and 3: 5%, 2%, 0.005%]
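A cascade of this shape can be sketched as a list of (scorer, threshold) stages evaluated in order. The two toy scorers, the thresholds, and the helper names are hypothetical; in the slides each stage would be a boosted classifier. The `surviving_negatives` helper illustrates how a later stage could be trained only on negatives that earlier stages let through, which is one way a cascade can draw on a large negative set.

```python
from typing import Callable, List, Sequence, Tuple

Clip = Sequence[float]
Stage = Tuple[Callable[[Clip], float], float]  # (scorer, pass threshold)


def run_cascade(stages: Sequence[Stage], clip: Clip) -> Tuple[bool, int]:
    """Return (passed, stages_evaluated). A clip is rejected at the first stage it fails."""
    for i, (score, threshold) in enumerate(stages, start=1):
        if score(clip) < threshold:
            # Early rejection: later stages never see this clip, so a false
            # negative made here cannot be recovered.
            return False, i
    return True, len(stages)


def surviving_negatives(stages: Sequence[Stage], negatives: Sequence[Clip]) -> List[Clip]:
    """Negatives that pass all current stages: the training set for the next stage."""
    return [clip for clip in negatives if run_cascade(stages, clip)[0]]


def peak_amplitude(clip: Clip) -> float:
    return max(abs(x) for x in clip)


def mean_energy(clip: Clip) -> float:
    return sum(x * x for x in clip) / len(clip)


# Hypothetical two-stage cascade: a cheap peak test, then an energy test.
stages = [(peak_amplitude, 0.5), (mean_energy, 0.3)]

quiet = [0.1, 0.0, -0.1, 0.0]
click = [0.9, 0.0, 0.0, 0.0]
loud = [0.9, -0.8, 0.7, -0.9]

results = [run_cascade(stages, clip) for clip in (quiet, click, loud)]
hard_negatives = surviving_negatives(stages[:1], [quiet, click])
```

Most clips exit at the first, cheapest stage, which is where the speed advantage comes from.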

Page 14: Sound Detection

Results: Classification Error

[Chart: Average Error vs. Stages in Cascade. x-axis: stage 1, stage 2, stage 3; y-axis: error from 0.0% to 10.0%; series: pos error, neg error]

Labels: Best Performance, Worst Performance

              stage 1         stage 2         stage 3
              pos     neg     pos     neg     pos     neg
meow          0.0%    1.4%    0.0%    1.2%    2.2%    0.8%
phone         0.0%    0.4%    4.3%    0.1%    5.9%    0.0%
car horn      0.0%    3.9%    0.6%    2.2%    3.6%    1.3%
door bell     1.4%    2.1%    2.1%    0.4%    6.3%    0.1%
swords        6.1%    1.3%    6.7%    0.1%    6.7%    0.0%
scream        0.3%    5.5%    2.7%    1.4%    5.3%    1.1%
dog bark      0.7%    1.0%    6.0%    0.3%    7.7%    0.2%
laser gun     0.0%    6.8%    4.4%    5.1%    6.7%    0.9%
explosion     4.1%    5.2%    7.5%    1.5%   12.0%    0.5%
light saber   4.8%    6.8%    9.7%    1.0%   13.9%    0.2%
gunshot       8.1%    6.1%   12.5%    2.3%   14.5%    1.1%
close door    7.9%    7.8%   14.5%    4.8%   17.6%    2.3%
male laugh    4.3%   14.7%    9.5%    9.7%   13.3%    7.0%
average       2.9%    4.4%    6.0%    2.2%    8.5%    1.1%

Page 15: Sound Detection

Results: ROC curves

Note: to approximate the negative error rate, divide the number of false positives (FP) by 25,000
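The conversion in the note is a single division. The sketch below spells it out; the divisor 25,000 comes from the note, while the function name and the false-positive count in the example are made up for illustration.

```python
NEGATIVE_DIVISOR = 25_000  # divisor quoted in the note


def approx_negative_error(false_positives: int, divisor: int = NEGATIVE_DIVISOR) -> float:
    """Approximate negative error rate as FP divided by the quoted divisor."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return false_positives / divisor


# Hypothetical reading from a ROC curve: 250 false positives.
rate = approx_negative_error(250)
print(f"{rate:.1%}")  # 250 / 25,000 = 1.0%
```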

Page 16: Sound Detection

Results: Anecdotal

Gunshots, Female Laugh, Male Laugh, Swords, Scream