MMAI 2014 Final

Posted on 09-Feb-2017


Naive Soul Guardian: Bloody Scenes Detection with Deep Convolutional Neural Network

B99902080, R03944007

Outline
- Motivation
- System Overview
- Convolutional Neural Network
- Fully-Convolutional Net
- Pixelation
- Experiment
- Future Work
- Reference
- Demo


Motivation
- Many videos contain bloody scenes, and we want to protect kids from such inappropriate scenes
- Our system aims to detect and pixelate bloody scenes automatically


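System Overview

[Figure: processing pipeline. Videos are decoded into frames; frames classified as 0 become ignored frames and frames classified as 1 become pixelated frames; the frames are then encoded back into pixelated videos.]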
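The routing in the overview can be sketched as follows. This is a minimal sketch, assuming decoding and encoding (done with FFmpeg in this project) happen outside the function; `process_frames`, `classify`, and `pixelate` are hypothetical names standing in for the CNN classifier and the pixelation stage.

```python
from typing import Callable, List, TypeVar

Frame = TypeVar("Frame")


def process_frames(
    frames: List[Frame],
    classify: Callable[[Frame], int],
    pixelate: Callable[[Frame], Frame],
) -> List[Frame]:
    """Route each decoded frame by its predicted label, keeping frame order.

    Label 0 (background): the frame is passed through unchanged ("ignored").
    Label 1 (bloody): the frame is pixelated before re-encoding.
    """
    output = []
    for frame in frames:
        if classify(frame) == 1:
            output.append(pixelate(frame))
        else:
            output.append(frame)
    return output


# Toy usage: integers stand in for frames; values above 5 count as "bloody".
print(process_frames([1, 7, 3, 9], lambda f: 1 if f > 5 else 0, lambda f: -f))
# → [1, -7, 3, -9]
```

Keeping ignored frames in the output (rather than dropping them) preserves the original video length when the frames are re-encoded.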

Convolutional Neural Network
- Fine-tune a pre-trained CaffeNet (ImageNet)
- Training data: human-labeled frames without bounding boxes
- Predict on decoded frames:
  - Background (0): ignored frames
  - Bloody frame (1): passed to the fully-convolutional net
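The classifier distinguishes two labels, background (0) and bloody (1). The sketch below shows one way to turn two output scores into a frame label. It assumes the scores are unnormalized logits with index 0 = background and index 1 = bloody; `softmax` and `frame_label` are hypothetical helpers, not Caffe API calls.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_label(logits: Sequence[float]) -> Tuple[int, float]:
    """Return (label, P(bloody)) for one frame.

    Assumed output ordering: index 0 = background, index 1 = bloody.
    Ties are resolved in favour of background (label 0).
    """
    probs = softmax(logits)
    label = 1 if probs[1] > probs[0] else 0
    return label, probs[1]
```

Frames labeled 0 would be left untouched, while frames labeled 1 would go on to the fully-convolutional stage.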

Fully-Convolutional Net
- Classify each 227 x 227 box with stride 32 on a 451 x 451 image
- Generate an 8 x 8 classification map
- Interpolate the probabilities to obtain a heat map
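The 8 x 8 map size follows from sliding a 227-pixel window with stride 32 across 451 pixels: (451 - 227) / 32 + 1 = 8 positions per axis. The sketch below checks that arithmetic and upsamples a coarse probability map into a heat map. The slides do not say which interpolation was used; bilinear interpolation with aligned corners is an assumption here, and `map_size` and `bilinear_resize` are hypothetical helper names.

```python
from typing import List

Grid = List[List[float]]


def map_size(image: int, window: int, stride: int) -> int:
    """Number of window positions along one axis (no padding)."""
    return (image - window) // stride + 1


def bilinear_resize(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Upsample a coarse probability map with bilinear interpolation.

    Corners are aligned (an assumption): output (0, 0) maps to grid[0][0]
    and output (out_h - 1, out_w - 1) maps to grid[-1][-1].
    """
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Continuous source row coordinate for output row i.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


print(map_size(451, 227, 32))  # → 8
```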


Pixelation
- Resize the heat map to the frame size
- Based on the heat map, blur frames with a Gaussian filter
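A minimal sketch of heat-map-guided blurring on a grayscale frame, written in plain Python for clarity. It assumes the heat map has already been resized to the frame size and holds probabilities in [0, 1]. The per-pixel blend `h * blurred + (1 - h) * original`, the `sigma`/`radius` values, and the names `gaussian_blur` and `pixelate` are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List

Image = List[List[float]]


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(img: Image, sigma: float = 2.0, radius: int = 4) -> Image:
    """Separable Gaussian blur of a grayscale image (edge pixels are clamped)."""
    h, w = len(img), len(img[0])
    k = gaussian_kernel(sigma, radius)
    # Horizontal pass.
    tmp = [[sum(k[t + radius] * img[y][min(max(x + t, 0), w - 1)]
                for t in range(-radius, radius + 1))
            for x in range(w)] for y in range(h)]
    # Vertical pass over the horizontally blurred image.
    return [[sum(k[t + radius] * tmp[min(max(y + t, 0), h - 1)][x]
                 for t in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]


def pixelate(frame: Image, heat: Image) -> Image:
    """Assumption: blend each pixel toward its blurred value by the heat-map probability."""
    blurred = gaussian_blur(frame)
    return [[h * b + (1 - h) * f for f, b, h in zip(frow, brow, hrow)]
            for frow, brow, hrow in zip(frame, blurred, heat)]
```

Where the heat map is near 0 the frame is left almost untouched, and where it is near 1 the frame is fully blurred, so the blur follows the detected region. A practical version would use an optimized library filter such as OpenCV's `GaussianBlur` instead of nested Python loops.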

Experiment (I)
- Run on cml21
- Decoding/encoding done with FFmpeg
- Decoded frames used as training/validation data
  - Pos = segments from Saw 1, 2, 3, 7 and Final Destination 4, 5 + images crawled from Google Images
  - Neg = segments from The Big Bang Theory S8E11 + part of the ILSVRC 2013 val/test sets
- Random sample Pos : Neg = 2500 : 2500
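The balanced 2500 : 2500 sample can be drawn as in this sketch. It assumes the candidate positive and negative frames are already available as lists (for example, paths of frames decoded with FFmpeg); `balanced_sample` and its fixed `seed` are hypothetical choices made for reproducibility.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_sample(pos: Sequence[T], neg: Sequence[T], n_per_class: int,
                    seed: int = 0) -> List[Tuple[T, int]]:
    """Draw n_per_class items from each class without replacement.

    Returns a shuffled list of (item, label) pairs: 1 = bloody, 0 = background.
    """
    rng = random.Random(seed)
    picked = [(p, 1) for p in rng.sample(list(pos), n_per_class)]
    picked += [(q, 0) for q in rng.sample(list(neg), n_per_class)]
    rng.shuffle(picked)
    return picked
```

Sampling the same number from each class keeps the classifier from being biased toward whichever source contributed more frames.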

Experiment (II)
Classification accuracy: 73.46%

Experiment (III): Test time

Time (sec) to process a video clip

Clip (frames, resolution)     Decoding  Classification  Heat map  Pixelation  Encoding  Average time
Saw6 (139 frames, 720x404)    0.34      41.18           22.99     72.43       0.02      0.99 sec/frame
CWL (109 frames, 1280x720)    0.79      36.95           0         0           1.24      0.36 sec/frame
FD5 (121 frames, 1024x576)    0.44      36.23           3.87      28.72       0.81      0.58 sec/frame
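Each average time on the Experiment (III) slide equals the sum of the five stage times (decoding, classification, heat map, pixelation, encoding) divided by the clip's frame count. The short check below reproduces those averages from the reported numbers; `seconds_per_frame` is a hypothetical helper name.

```python
from typing import Sequence


def seconds_per_frame(stage_times: Sequence[float], n_frames: int) -> float:
    """Total processing time across all stages divided by the frame count."""
    return sum(stage_times) / n_frames


# Stage times (sec): decoding, classification, heat map, pixelation, encoding.
clips = {
    "Saw6": ([0.34, 41.18, 22.99, 72.43, 0.02], 139),
    "CWL": ([0.79, 36.95, 0.0, 0.0, 1.24], 109),
    "FD5": ([0.44, 36.23, 3.87, 28.72, 0.81], 121),
}

for name, (times, frames) in clips.items():
    print(f"{name}: {seconds_per_frame(times, frames):.2f} sec/frame")
# → Saw6: 0.99 sec/frame
# → CWL: 0.36 sec/frame
# → FD5: 0.58 sec/frame
```

The breakdown also shows where time goes: classification and pixelation dominate, while decoding and encoding with FFmpeg take well under a second per clip.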

Future Work
- Train our model with more diverse data to increase accuracy and reduce false positives
- Accelerate blurring and smooth boundaries
- Implement on surveillance cameras for security
- Combine shot detection and motion vectors to reduce computation

Reference
- Caffe | Deep Learning Framework: http://caffe.berkeleyvision.org/
- Classifying ImageNet: the instant Caffe way
- Net Surgery for a Fully-Convolutional Model
- FFmpeg: https://www.ffmpeg.org/
- ImageNet: http://www.image-net.org/
- Tutorials by Hsinfu, Shiro, Jocelyn

Demo

Finally, I want to play a Q & A game
