mmai 2014 final

Naive Soul Guardian: Bloody Scenes Detection with Deep Convolutional Neural Network
B99902080 李李李, R03944007 李李李

Upload: ryan-chang

Post on 09-Feb-2017



Outline
- Motivation
- System Overview
- Convolutional Neural Network
- Fully-Convolutional Net
- Pixelation
- Experiment
- Future Work
- Reference
- Demo

Schedule guidance

Motivation
- Many videos contain bloody scenes, and we want to protect kids from these inappropriate scenes
- Our system aims to detect and pixelate bloody scenes automatically


System Overview
[Diagram: Videos -> Decode -> Frames -> classification (0: ignored frames, 1: pixelated frames) -> Encode -> Pixelated videos]
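The routing in the overview can be sketched in a few lines of Python. This is only an illustration, not the authors' code: `process_frames`, `classify`, and `pixelate` are hypothetical names, and it assumes that frames labeled 0 are passed through unchanged so the clip can be re-encoded in the original order.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")


def process_frames(
    frames: Iterable[Frame],
    classify: Callable[[Frame], int],
    pixelate: Callable[[Frame], Frame],
) -> List[Frame]:
    """Route each decoded frame by its predicted label.

    Label 0 (background): the frame is kept as-is (assumption).
    Label 1 (bloody): the frame is replaced by its pixelated version.
    Frame order is preserved so the result can be re-encoded.
    """
    output = []
    for frame in frames:
        if classify(frame) == 1:
            output.append(pixelate(frame))
        else:
            output.append(frame)
    return output


# Toy usage: strings stand in for frames, upper-case means "bloody".
toy_result = process_frames(
    ["a", "B", "c"],
    classify=lambda f: int(f.isupper()),
    pixelate=lambda f: "#",
)
```

In a real run, `classify` would wrap the fine-tuned network and `pixelate` the heat-map-based blur.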

Convolutional Neural Network
- Fine-tune pre-trained CaffeNet (ImageNet)
- Human-labeled frames without bounding boxes
- Predict decoded frames
  - Background (0): ignored frames
  - Bloody frame (1): fully-convolutional net

Fully-Convolutional Net
- Classification for each 227 x 227 box with stride 32 on a 451 x 451 image
- Generate an 8 x 8 classification map
- Interpolate probabilities to obtain the heat map
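The 8 x 8 map size follows from the window arithmetic: (451 - 227) / 32 + 1 = 8 positions per axis. The sketch below checks that count and shows one possible way to interpolate the map. The slides do not say which interpolation was used, so bilinear interpolation with aligned corners is an assumption, and both function names are hypothetical.

```python
def num_windows(image_size: int, window: int, stride: int) -> int:
    """Number of sliding-window positions along one axis."""
    return (image_size - window) // stride + 1


def bilinear_resize(grid, out_h, out_w):
    """Bilinearly interpolate a 2-D list of probabilities to out_h x out_w.

    Corners of the input grid are mapped onto corners of the output.
    """
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for y in range(out_h):
        # Fractional source row for this output row.
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            # Fractional source column for this output column.
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            top = grid[y0][x0] * (1 - wx) + grid[y0][x1] * wx
            bottom = grid[y1][x0] * (1 - wx) + grid[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


map_side = num_windows(451, 227, 32)
tiny_heat = bilinear_resize([[0.0, 1.0], [0.0, 1.0]], 3, 3)
```

Interpolated values stay within the range of the input probabilities, so the result can be used directly as a per-pixel weight.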


Pixelation
- Resize heat map to frame size
- Based on the heat map, blur frames with a Gaussian filter
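The slide does not spell out how the heat map drives the blur. One plausible reading, shown here purely as an assumption, is to blend each pixel between the original and a Gaussian-blurred copy, using the heat-map probability as the blend weight. The sketch uses plain Python lists for a grayscale image; all function names, the kernel radius, and sigma are hypothetical choices.

```python
import math


def gaussian_kernel(radius: int, sigma: float):
    """Normalized 1-D Gaussian weights of length 2 * radius + 1."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row or column, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(image, radius=2, sigma=1.0):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(radius, sigma)
    rows = [blur_1d(row, kernel) for row in image]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def blur_by_heat_map(image, heat, radius=2, sigma=1.0):
    """Blend original and blurred pixels; heat = 1 means fully blurred."""
    blurred = gaussian_blur(image, radius, sigma)
    return [
        [(1 - h) * p + h * b for p, b, h in zip(img_row, blur_row, heat_row)]
        for img_row, blur_row, heat_row in zip(image, blurred, heat)
    ]
```

With this blending, regions where the heat map is 0 are left untouched, and the transition between blurred and sharp regions is as smooth as the heat map itself.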

Experiment (I)
- Run on cml21
- Decoding/encoding done by FFmpeg
- Decoded frames as training/validation data
- Pos = Segments from Saw 1, 2, 3, 7, Final Destination 4, 5 + crawled images from Google Images
- Neg = Segments from The Big Bang Theory S8E11 + part of ILSVRC 2013 val/test
- Random sample Pos : Neg = 2500 : 2500
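The balanced 2500 : 2500 sampling could look roughly like the following. This is a sketch under assumptions: the slides do not describe the sampling code, so `balanced_sample`, the fixed seed, and the shuffling step are illustrative choices rather than the authors' procedure.

```python
import random


def balanced_sample(pos, neg, n_per_class=2500, seed=0):
    """Draw n_per_class items from each pool, without replacement.

    Returns a shuffled list of (item, label) pairs, with label 1 for
    positive (bloody) and 0 for negative (background) frames.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    chosen = [(p, 1) for p in rng.sample(pos, n_per_class)]
    chosen += [(n, 0) for n in rng.sample(neg, n_per_class)]
    rng.shuffle(chosen)
    return chosen


# Toy usage with hypothetical frame file names.
pos_pool = [f"pos_{i:05d}.jpg" for i in range(4000)]
neg_pool = [f"neg_{i:05d}.jpg" for i in range(6000)]
dataset = balanced_sample(pos_pool, neg_pool)
```

Sampling the same number from each class keeps a trivial always-background classifier at 50% accuracy, which gives the reported accuracy a clear baseline.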

Experiment (II)
Classification accuracy: 73.46%

Test time

Experiment (III)
Time (sec) of processing a video clip

Clip (frames, resolution)      Decoding  Classification  Heat map  Pixelation  Encoding  Average time
Saw6 (139 frames, 720x404)     0.34      41.18           22.99     72.43       0.02      0.99 sec/frame
CWL (109 frames, 1280x720)     0.79      36.95           0         0           1.24      0.36 sec/frame
FD5 (121 frames, 1024x576)     0.44      36.23           3.87      28.72       0.81      0.58 sec/frame
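The average-time column appears to be the sum of the five stage times divided by the number of frames. That formula is an inference from the numbers, not something the slide states, but it reproduces all three reported averages when the stage times are read as two-decimal values:

```python
# Stage times in seconds: decoding, classification, heat map,
# pixelation, encoding (as read from the slide's table).
timings = {
    "Saw6": (139, [0.34, 41.18, 22.99, 72.43, 0.02]),
    "CWL": (109, [0.79, 36.95, 0.0, 0.0, 1.24]),
    "FD5": (121, [0.44, 36.23, 3.87, 28.72, 0.81]),
}


def seconds_per_frame(n_frames, stage_seconds):
    """Total processing time divided by the number of frames."""
    return sum(stage_seconds) / n_frames


averages = {
    name: round(seconds_per_frame(n, stages), 2)
    for name, (n, stages) in timings.items()
}
for name, value in averages.items():
    print(f"{name}: {value:.2f} sec/frame")
```

The CWL clip has zero heat-map and pixelation time, which fits a clip in which no frame was classified as bloody, so classification dominates its total.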

Future Work
- Train our model with more diverse data to increase accuracy and reduce false positives
- Accelerate blurring and smooth boundaries
- Implement on surveillance cameras for security
- Combine shot detection and motion vectors to reduce computation

Reference
- Caffe | Deep Learning Framework: http://caffe.berkeleyvision.org/
  - Classifying ImageNet: the instant Caffe way
  - Net Surgery for a Fully-Convolutional Model
- FFmpeg: https://www.ffmpeg.org/
- ImageNet: http://www.image-net.org/
- Tutorials by Hsinfu, Shiro, Jocelyn

Demo

Finally, I wanna play a Q & A game