Introduction of Deep Learning · KBS · 2018-06-19
Introduction of Deep Learning
Jiang Xuan
Deep Learning
→ Most modern deep learning models are based on an artificial neural network.
→ Multilayer neural network
→ Multiple nonlinear transformations
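To make "multiple nonlinear transformations" concrete, here is a minimal Python sketch of a multilayer network's forward pass. The layer sizes, the hand-picked weights, and the choice of ReLU as the nonlinearity are illustrative assumptions, not taken from the slides.

```python
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight matrix, bias vector)

def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Affine transformation: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x: Vector) -> Vector:
    """Elementwise nonlinearity; without it, stacked affine layers collapse into one."""
    return [max(0.0, v) for v in x]

def mlp_forward(x: Vector, layers: List[Layer]) -> Vector:
    """Apply each layer in turn, with a nonlinearity after every layer except the last."""
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x

# Hypothetical 2-3-1 network with hand-picked weights.
layers: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 0.0, -1.0]),
    ([[1.0, 1.0, 1.0]], [0.0]),
]
print(mlp_forward([2.0, 1.0], layers))  # → [2.5]
```

The hidden layer maps [2, 1] to [1, 1.5, -1]; ReLU zeroes the negative unit, and the output layer sums the rest.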
Applications of Deep Learning
Deep Learning Applications in Computer Vision
• You Only Look Once (YOLO): Unified, Real-Time Object Detection
• CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning
You Only Look Once (YOLO): Unified, Real-Time Object Detection
• Introduction
→ Using YOLO, you only look once at an image to predict what objects are present and where they are.
• Traditional Object-Detection Algorithms
→ DPM (deformable parts models)
→ R-CNN
Object Detection
• R-CNN
→ Uses region proposal methods to generate potential bounding boxes in an image, then runs a classifier on these proposed boxes.
• DPM
→ Deformable parts models
→ Uses a sliding-window approach
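A minimal Python sketch of why the sliding-window approach is expensive: it only enumerates window positions at a single scale. The window size, stride, and the function name `sliding_windows` are hypothetical; a real detector such as DPM would score every window with its model and repeat the search over several scales.

```python
from typing import Iterator, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels

def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int) -> Iterator[Box]:
    """Yield every window position, left to right and top to bottom, at one scale."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)

windows = list(sliding_windows(448, 448, 64, 64, stride=32))
print(len(windows))  # → 169 positions, each needing its own classifier evaluation
```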
You Only Look Once (YOLO)
• Advantages of YOLO
→ YOLO reframes object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities.
→ Using YOLO, you only look once at an image to predict what objects are present and where they are.
Model Description
(1) Resizes the input image to 448 × 448.
(2) Runs a single convolutional network on the image.
(3) Thresholds the resulting detections by the model's confidence.
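Step (3) can be sketched in Python as a simple filter. The `Detection` record, its box format, the example detections, and the 0.5 threshold are all hypothetical; the YOLO paper additionally applies non-maximum suppression to remove duplicate boxes, which this sketch omits.

```python
from typing import List, NamedTuple, Tuple

class Detection(NamedTuple):
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x, y, w, h); format assumed for this sketch

def threshold_detections(dets: List[Detection], min_conf: float) -> List[Detection]:
    """Step (3): keep only detections whose confidence is at least min_conf."""
    return [d for d in dets if d.confidence >= min_conf]

# Hypothetical raw detections from step (2).
raw = [
    Detection("dog", 0.92, (0.3, 0.6, 0.4, 0.5)),
    Detection("bicycle", 0.15, (0.5, 0.5, 0.8, 0.6)),
    Detection("car", 0.55, (0.8, 0.2, 0.3, 0.2)),
]
kept = threshold_detections(raw, min_conf=0.5)
print([d.label for d in kept])  # → ['dog', 'car']
```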
Model Description
• Unified Detection
→ Our system divides the input image into an S × S grid.
→ Each grid cell predicts B bounding boxes and confidence scores for those boxes.
→ Each bounding box consists of 5 predictions: x, y, w, h, and confidence.
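In the YOLO paper, each grid cell also predicts C conditional class probabilities (shared by its B boxes), so the network outputs an S × S × (B · 5 + C) tensor; with S = 7, B = 2, and C = 20 for PASCAL VOC this is 7 × 7 × 30. The Python sketch below computes that shape and decodes one box into pixel coordinates, assuming x, y are offsets of the box centre within its cell and w, h are fractions of the whole image. The helper names are hypothetical.

```python
from typing import List, Tuple

def output_shape(S: int, B: int, C: int) -> Tuple[int, int, int]:
    """Each of the S*S cells predicts B boxes (x, y, w, h, confidence) plus C class probabilities."""
    return (S, S, B * 5 + C)

def cell_box_to_image(row: int, col: int, S: int, x: float, y: float,
                      w: float, h: float, img_size: int) -> Tuple[float, float, float, float]:
    """Convert one predicted box to (center_x, center_y, width, height) in pixels.

    x, y: offsets of the box centre inside cell (row, col), in [0, 1].
    w, h: box size relative to the whole image, in [0, 1].
    """
    cell = img_size / S                      # side length of one grid cell in pixels
    return ((col + x) * cell, (row + y) * cell, w * img_size, h * img_size)

def class_scores(box_conf: float, class_probs: List[float]) -> List[float]:
    """Class-specific confidence: Pr(class | object) times the box confidence."""
    return [p * box_conf for p in class_probs]

print(output_shape(S=7, B=2, C=20))                            # → (7, 7, 30)
print(cell_box_to_image(3, 3, 7, 0.5, 0.5, 0.25, 0.5, 448))   # → (224.0, 224.0, 112.0, 224.0)
```

With a 448 × 448 input and S = 7, each cell covers 64 × 64 pixels, so a box centred in cell (3, 3) lands at the image centre.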
Model Architecture
The network architecture is inspired by the GoogLeNet model for image classification.
The network has 24 convolutional layers followed by 2 fully connected layers.
Instead of the inception modules used by GoogLeNet, it simply uses 1 × 1 reduction layers followed by 3 × 3 convolutional layers.
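Why a 1 × 1 reduction layer helps can be shown by counting weights. This Python sketch ignores biases, and the channel counts (512 in, 256 after reduction, 512 out) are illustrative assumptions rather than a specific YOLO layer.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Weight count of a k x k convolution mapping c_in channels to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def direct_3x3(c_in: int, c_out: int) -> int:
    """A plain 3 x 3 convolution."""
    return conv_params(3, c_in, c_out)

def reduced_3x3(c_in: int, c_mid: int, c_out: int) -> int:
    """A 1 x 1 reduction to c_mid channels, then a 3 x 3 convolution."""
    return conv_params(1, c_in, c_mid) + conv_params(3, c_mid, c_out)

print(direct_3x3(512, 512))        # → 2359296
print(reduced_3x3(512, 256, 512))  # → 1310720
```

Halving the channels before the 3 × 3 layer cuts the weight count by roughly 44% in this configuration.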
• Pretraining Set
→ the ImageNet 1000-class competition dataset
• Detection Training Set
→ PASCAL VOC 2007 and 2012 (training and validation data)
Model Training
• Training Process
→ For pretraining, we use the first 20 convolutional layers followed by an average-pooling layer and a fully connected layer.
→ We train this network for approximately a week and achieve a single-crop top-5 accuracy of 88% on the ImageNet 2012 validation set.
→ For detection, we add four convolutional layers and two fully connected layers with randomly initialized weights.
→ Detection often requires fine-grained visual information, so we increase the input resolution of the network from 224 × 224 to 448 × 448.
→ We then train the network for about 135 epochs on the training and validation data sets from PASCAL VOC 2007 and 2012.
Experiments and Comparison
Comparing the performance and speed of fast detectors:
1. Fast YOLO is the fastest detector on record for PASCAL VOC detection.
2. Fast YOLO is still twice as accurate as any other real-time detector.
3. YOLO is 10 mAP more accurate than the fast version while still well above real-time speed.
Experiments and Comparison
1. Localization errors account for more of YOLO's errors than all other sources combined.
2. Fast R-CNN makes far fewer localization errors but far more background errors.
3. Fast R-CNN is almost 3 times more likely to predict background detections than YOLO.
VOC 2007 Error Analysis
CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning
→ CheXNet can automatically detect pneumonia from chest X-rays at a level exceeding practicing radiologists.
ChestX-ray14 dataset
• Model Description
→ A 121-layer convolutional neural network
→ Input: a chest X-ray image
→ Output: the probability of pneumonia, along with a heatmap localizing the areas of the image most indicative of pneumonia
→ Trained on the ChestX-ray14 dataset, which contains 112,120 frontal-view X-ray images of 30,805 unique patients.
14 July 2010 · Wolfgang Nejdl
Problem of traditional CNNs: as CNNs become increasingly deep, information about the input or gradient must pass through many layers, and it can vanish by the time it reaches the end (or beginning) of the network.
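A toy illustration of the vanishing-gradient problem, as a minimal Python sketch: a chain of scalar sigmoid layers with all weights set to 1 and every pre-activation fixed at 0. These are simplifying assumptions, and real CNNs use other activations, but the chain rule still multiplies one factor per layer, and the sigmoid's derivative is at most 0.25.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def chain_gradient(num_layers: int, weight: float = 1.0, z: float = 0.0) -> float:
    """Gradient through a chain of scalar layers a_l = sigmoid(weight * a_{l-1}).

    By the chain rule this is a product of one factor weight * sigmoid'(z_l) per layer;
    for simplicity every pre-activation z_l is fixed to the same value z.
    """
    grad = 1.0
    for _ in range(num_layers):
        grad *= weight * sigmoid_grad(z)
    return grad

for depth in (1, 5, 10, 20):
    print(depth, chain_gradient(depth))
```

Even in this best case the gradient shrinks by a factor of four per layer, so after 20 layers it is below 1e-12.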
How can we solve this problem?
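Model Architecture: Densely Connected Convolutional Neural Network
A 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature-maps as input.
DenseNet: DenseNet proposes a different connectivity pattern: direct connections from any layer to all subsequent layers.
Consequently, the ℓ-th layer receives the feature-maps of all preceding layers, $x_0, \ldots, x_{\ell-1}$, as input:
$$x_\ell = H_\ell([x_0, x_1, \ldots, x_{\ell-1}])$$
where $[x_0, x_1, \ldots, x_{\ell-1}]$ is the concatenation of the feature-maps produced in layers $0, \ldots, \ell-1$ and $H_\ell$ is the layer's composite transformation.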
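DenseNet's rule that each layer receives the concatenation of all preceding feature-maps can be sketched in Python with each "feature map" reduced to a single number. The `toy_h` function is a hypothetical stand-in for the real convolutional transformation; the point is the bookkeeping: with k0 input channels and growth rate k, layer ℓ sees k0 + k(ℓ − 1) channels.

```python
from typing import Callable, List, Tuple

def dense_block(x0: List[float], num_layers: int, growth_rate: int,
                h: Callable[[List[float], int], List[float]]) -> Tuple[List[float], List[int]]:
    """Dense connectivity: layer l computes x_l = H_l([x_0, ..., x_{l-1}])."""
    features = list(x0)            # running concatenation [x_0, ..., x_{l-1}]
    input_sizes: List[int] = []
    for _ in range(num_layers):
        input_sizes.append(len(features))     # this layer sees every preceding map
        new = h(features, growth_rate)        # produces growth_rate new maps
        features = features + new             # append; earlier maps are never overwritten
    return features, input_sizes             # block output: all maps concatenated

def toy_h(inputs: List[float], k: int) -> List[float]:
    """Hypothetical stand-in for H_l: k outputs equal to the mean of the inputs."""
    m = sum(inputs) / len(inputs)
    return [m] * k

# A 5-layer block with growth rate k = 4 and 3 hypothetical input channels.
out, sizes = dense_block([1.0, 2.0, 3.0], num_layers=5, growth_rate=4, h=toy_h)
print(sizes)     # → [3, 7, 11, 15, 19]
print(len(out))  # → 23
```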
• Model Architecture
→ CheXNet is a 121-layer Dense Convolutional Network trained on the ChestX-ray14 dataset.
→ We replace the final fully connected layer with one that has a single output.
→ After this fully connected layer, we apply a sigmoid nonlinearity.
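The modified output head can be sketched in Python as one linear unit followed by a sigmoid. The 4-dimensional feature vector, the weights, and the bias are hypothetical (the real DenseNet-121 feature vector is much larger); the sketch only shows how a single logit becomes a pneumonia probability.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def pneumonia_head(features: List[float], weights: List[float], bias: float) -> float:
    """Final fully connected layer with a single output, followed by a sigmoid.

    Returns a value in (0, 1), read as the probability that the image shows pneumonia.
    """
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(logit)

# Hypothetical pooled features and weights; the logit works out to exactly 0.
p = pneumonia_head([0.5, 1.0, 0.0, 2.0], [1.0, -0.5, 3.0, 0.25], bias=-0.5)
print(p)  # → 0.5
```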
Model Training
Based on ChestX-ray14
We downscale the images to 224 × 224 and normalize them based on the mean and standard deviation of images in the ImageNet training set.
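A minimal Python sketch of this preprocessing, under stated assumptions: the per-channel ImageNet mean and standard deviation are the values commonly used (for example by torchvision), a grayscale X-ray pixel is replicated into three channels, and nearest-neighbour resampling stands in for whatever resize method was actually used. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]

# Per-channel ImageNet statistics as commonly used (e.g. in torchvision), for RGB values in [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def gray_to_rgb(value: int) -> Pixel:
    """Assumption: a grayscale pixel (0..255) is scaled to [0, 1] and copied to three channels."""
    v = value / 255.0
    return (v, v, v)

def downscale_nearest(image: Sequence[Sequence], out_h: int, out_w: int) -> List[list]:
    """Nearest-neighbour resize (a stand-in; the slides do not name the resampling method)."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalize_pixel(rgb: Pixel) -> Pixel:
    """Subtract the ImageNet mean and divide by the ImageNet std, channel by channel."""
    return tuple((v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

def preprocess(gray: Sequence[Sequence[int]], size: int = 224) -> List[List[Pixel]]:
    """Downscale to size x size, then convert and normalize every pixel."""
    small = downscale_nearest(gray, size, size)
    return [[normalize_pixel(gray_to_rgb(v)) for v in row] for row in small]
```

Applied to a 448 × 448 grayscale array, `preprocess` returns a 224 × 224 grid of normalized RGB triples.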
We randomly split the dataset into:
→ training (28744 patients, 98637 images)
→ validation (1672 patients, 6351 images)
→ test (389 patients, 420 images)
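Because each split is described by its number of patients, the split appears to be made at the patient level. The Python sketch below assumes exactly that: whole patients are shuffled and assigned, so no patient's images appear in more than one split. The function name, the seed, and the image-to-patient mapping are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

def split_by_patient(image_to_patient: Dict[str, str], n_val: int, n_test: int,
                     seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Randomly assign whole patients to test, validation, and training sets.

    Every image of a given patient lands in exactly one split, so no patient
    appears in both training and evaluation data.
    """
    by_patient: Dict[str, List[str]] = defaultdict(list)
    for image, patient in image_to_patient.items():
        by_patient[patient].append(image)

    patients = sorted(by_patient)          # deterministic order before shuffling
    random.Random(seed).shuffle(patients)  # seeded shuffle for reproducibility

    test_patients = patients[:n_test]
    val_patients = patients[n_test:n_test + n_val]
    train_patients = patients[n_test + n_val:]  # all remaining patients

    def images_of(group: List[str]) -> List[str]:
        return [img for p in group for img in by_patient[p]]

    return images_of(train_patients), images_of(val_patients), images_of(test_patients)
```

With the counts above this would be called as `split_by_patient(mapping, n_val=1672, n_test=389)`, leaving the remaining 28744 of the 30,805 patients for training.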
Experiments
We compare radiologists and our model on the F1 metric (the F1 score is the harmonic average of the precision and recall of the models).
Based on 420 images from ChestX-ray14
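The F1 metric can be computed directly from binary labels. This Python sketch uses hypothetical labels for six images and treats F1 as 0 when there are no true positives.

```python
from typing import Sequence

def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (1 = pneumonia)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed cases
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined), so F1 is taken as 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels for six images: 2 true positives, 1 false positive, 1 false negative.
print(round(f1_score([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]), 4))  # → 0.6667
```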
Extension
• Improvement
→ Instead of outputting one binary label, CheXNet outputs a vector t of binary labels indicating the presence of each of the 14 pathology classes.
→ We replace the final fully connected layer in CheXNet with a fully connected layer producing a 14-dimensional output, after which we apply an elementwise sigmoid nonlinearity.
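The effect of an elementwise sigmoid on a 14-dimensional output can be sketched in Python. Each pathology receives an independent probability, so several can be flagged at once. The label list is the 14 ChestX-ray14 classes; the loss shown is plain, unweighted binary cross-entropy, chosen for this sketch, and the original training objective may weight the terms differently.

```python
import math
from typing import List

# The 14 ChestX-ray14 pathology labels.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural Thickening", "Hernia",
]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_probs(logits: List[float]) -> List[float]:
    """Elementwise sigmoid over the 14 outputs: each pathology gets an independent
    probability, so the values need not sum to 1."""
    return [sigmoid(z) for z in logits]

def binary_cross_entropy(probs: List[float], targets: List[int], eps: float = 1e-12) -> float:
    """Mean per-class binary cross-entropy (unweighted in this sketch)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)

probs = multilabel_probs([0.0] * len(PATHOLOGIES))
print(sum(probs))  # → 7.0: fourteen independent 0.5s, not a distribution summing to 1
```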
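Visualization
We visualize the areas of the image most indicative of the disease using class activation mappings (CAMs).
We feed an image into the fully trained network and extract the feature maps that are output by the final convolutional layer. The heatmap for a pathology c is the sum of these feature maps $f_k$, weighted by the final fully connected layer's weights for that class: $M_c = \sum_k w_{c,k} f_k$.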
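A minimal Python sketch of a class activation map: the final-layer feature maps are combined using the class's weights from the last fully connected layer. The two 2 × 2 feature maps and the weights are hypothetical, and upsampling the map to the input image size for display is omitted.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]

def class_activation_map(feature_maps: List[FeatureMap],
                         class_weights: List[float]) -> FeatureMap:
    """Compute M_c = sum_k w_{c,k} * f_k over the K final-layer feature maps.

    feature_maps: K maps of identical size H x W (output of the last conv layer).
    class_weights: the K weights connecting those maps to class c in the final FC layer.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for f_k, w_k in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_k * f_k[i][j]
    return cam

def hottest_cell(cam: FeatureMap) -> Tuple[int, int]:
    """(row, col) of the largest activation, i.e. the region most indicative of class c."""
    return max(((i, j) for i in range(len(cam)) for j in range(len(cam[0]))),
               key=lambda ij: cam[ij[0]][ij[1]])

maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 1.0]]]
cam = class_activation_map(maps, [2.0, -1.0])
print(cam)                # → [[2.0, 0.0], [0.0, -1.0]]
print(hottest_cell(cam))  # → (0, 0)
```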
Experiments and Comparison
CheXNet outperforms the best published results on all 14 pathologies in the ChestX-ray14 dataset.
Discussion