Deep learning
Ruxandra Stoean
http://inf.ucv.ro/~rstoean
Convolutional neural networks
(Retele neuronale convolutive)
Definitions
“For most flavors of the old generations of learning algorithms … performance will plateau. … deep learning … is the first class of algorithms … that is scalable. … performance just keeps getting better as you feed them more data” - Andrew Ng
“The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.” – Ian Goodfellow et al, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org/
“Deep learning [is] … a pipeline of modules all of which are trainable. … deep because [has] multiple stages in the process of recognizing an object and all of those stages are part of the training” - Yann LeCun
“At which problem depth does Shallow Learning end, and Deep Learning begin? Discussions with DL experts have not yet yielded a conclusive response to this question. […], let me just define for the purposes of this overview: problems of depth > 10 require Very Deep Learning.” - Jurgen Schmidhuber
https://machinelearningmastery.com/what-is-deep-learning/
Convolutional Neural Networks (CNN)
Automatic feature extraction: from low-level to high-level features
Important applications in computer vision:
Classification – There is a building in this image
Semantic segmentation – These are the pixels of the building
Object detection – There are buildings in this image
Instance segmentation – There are buildings in this image, and these are the pixels of each one
Architecture
Layers
Feature learning:
Convolution
ReLU
Pooling
Classification:
Fully connected
https://medium.com/@RaghavPrabhu/understanding-of-convolutional-neural-network-cnn-deep-learning-99760835f148
Convolution
Input
Volume of shape Width x Height x Depth
Kernel (or filter) – a shared set of weights
Forward pass – convolution between the filter and the input volume
Four hyperparameters:
Kernel size
Kernel depth, i.e. the number of filters
Stride
Zero-padding
http://cs231n.github.io/convolutional-networks/ https://www.youtube.com/watch?v=AQirPKrAyDg
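These four hyperparameters fix the spatial size of the output volume through the standard formula out = floor((in - kernel + 2*padding) / stride) + 1. A minimal base-R sketch (the function name is ours, for illustration):

```r
# output spatial size of a convolution (or pooling) along one dimension:
# out = floor((in - kernel + 2*padding) / stride) + 1
conv_out_size <- function(in_size, kernel, stride = 1, padding = 0) {
  floor((in_size - kernel + 2 * padding) / stride) + 1
}

conv_out_size(28, 3, padding = 1)   # 3x3 kernel, "same" padding: stays 28
conv_out_size(28, 5)                # 5x5 kernel, no padding: 24
conv_out_size(28, 2, stride = 2)    # 2x2 window, stride 2: 14
```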
ReLU & Pooling
Rectified Linear Unit – an activation (transfer) layer that adds nonlinearity
(Max) Pooling – shrinks the volume
Window size
Stride
http://cs231n.github.io/convolutional-networks/ https://www.youtube.com/watch?v=AQirPKrAyDg
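Both operations are simple element-wise computations and can be reproduced in base R. A sketch (function names are ours): ReLU clamps negatives to zero, and 2x2 max pooling with stride 2 keeps the maximum of each non-overlapping 2x2 block, halving width and height (even dimensions assumed):

```r
relu <- function(x) pmax(x, 0)

# 2x2 max pooling with stride 2 on a matrix with even dimensions
max_pool_2x2 <- function(m) {
  rows <- seq(1, nrow(m), by = 2)
  cols <- seq(1, ncol(m), by = 2)
  out <- matrix(0, length(rows), length(cols))
  for (i in seq_along(rows))
    for (j in seq_along(cols))
      out[i, j] <- max(m[rows[i]:(rows[i] + 1), cols[j]:(cols[j] + 1)])
  out
}

m <- matrix(c(-1, 2, 0, 3,
               4, -5, 1, 1,
               0, 0, -2, 2,
               1, 1, 0, 0), nrow = 4, byrow = TRUE)
relu(m)          # negative entries become 0
max_pool_2x2(m)  # one maximum per 2x2 block, result is 2x2
```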
Practice
Choosing a suitable architecture
Parameters depend on the problem
Long running times
Large computational power required
Small datasets in reality
Overfitting
Models are difficult to interpret
Not plug & play!
Parameterization
Convolutional kernels
Size, depth, strides
Pooling layers
Window sizes, strides
Dropout rate for the layers that control overfitting
Batch size for batch normalization, to increase learning speed
Learning rate
Number of units in the fully connected layers
Number of epochs
Initial weights
Topology
Optimizers
Activation functions
Loss functions
Parameterization can be:
Manual
Automatic (via a heuristic?)
Overfitting
When the model is too complex for the data
Working with real-world problems
Small samples
Countermeasures:
Data augmentation
Flipping, rotation, scaling, cropping, translation, Gaussian noise
Generative adversarial networks (GAN): one network generates, another evaluates
Dropout layers
Regularization
L1 and L2 weight penalties
Early stopping
Use checkpoints to save the model at each epoch
Choose the best candidate based on the validation results after the last epoch
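Early stopping and checkpointing can be wired into training through Keras callbacks. A hedged sketch using the keras R package (the file name best_model.h5 and the patience value of 3 epochs are our choices, not from the slides):

```r
library(keras)

# stop training when the validation loss has not improved for 3 epochs,
# and keep on disk only the best model seen so far
callbacks <- list(
  callback_early_stopping(monitor = "val_loss", patience = 3),
  callback_model_checkpoint(filepath = "best_model.h5",
                            monitor = "val_loss",
                            save_best_only = TRUE)
)

# then pass the callbacks to training, e.g.:
# history <- model %>% fit(x_train, y_train, epochs = 50,
#                          validation_split = 0.25, callbacks = callbacks)
```

This is a configuration fragment; running it requires an installed Keras/TensorFlow backend.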
Transfer learning (Invatare prin transfer)
CNNs need
Big data
Large computational resources
The parameters are taken from a network already trained on a large dataset
Large datasets: ImageNet, CIFAR
Pre-trained models: VGG, Inception, AlexNet, ResNet
The initial layers learn general features
The final layers are trained for the current problem
They learn the features specific to the given problem
Therefore
The data problem is solved
There are fewer parameters to train
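As a sketch of this recipe in the keras R package (assuming, for illustration, 10 target classes and 224 x 224 RGB inputs): load a convolutional base pre-trained on ImageNet, freeze its general-purpose layers, and train only a new fully connected head:

```r
library(keras)

# pre-trained VGG16 convolutional base, without its original classifier
base <- application_vgg16(weights = "imagenet", include_top = FALSE,
                          input_shape = c(224, 224, 3))
freeze_weights(base)  # the general low-level features stay fixed

# new trainable head for the current problem (10 classes assumed)
model <- keras_model_sequential() %>%
  base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")
```

Only the two dense layers are trained, which is what makes transfer learning viable on small datasets. Running this requires Keras/TensorFlow and downloads the ImageNet weights on first use.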
Frameworks for Deep Learning implementation
https://towardsdatascience.com/deep-learning-framework-power-scores-2018-23607ddf297a
Installing Keras in R
Keras is a high-level neural network API written in Python that
uses TensorFlow as its backend.
To work with Keras under R, the latest versions of the following must be installed:
R
RStudio
Anaconda (for installing Python)
Then install the keras package from RStudio (Keras and TensorFlow will be installed)
The MNIST digit recognition problem
We will apply a CNN to recognize the MNIST digits1.
library("keras")
# load the MNIST dataset
mnist <- dataset_mnist()
c(x_train, y_train) %<-% mnist$train
c(x_test, y_test) %<-% mnist$test
rbind(dim(x_train), dim(x_test))
1 http://yann.lecun.com/exdb/mnist/
Transforming the input data into CNN format
# reshape into the CNN format (Width x Height x Number of color channels)
x_train_original <- x_train
x_test_original <- x_test
x_train <- array_reshape(x_train, c(nrow(x_train), 28, 28, 1))
x_test <- array_reshape(x_test, c(nrow(x_test), 28, 28, 1))
input_shape <- c(28, 28, 1)
Transforming the classes into binary encoding
# convert the classes encoded by the digits 0-9 into binary one-hot encoding
y_train_original <- y_train
y_test_original <- y_test
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
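to_categorical simply builds a binary indicator matrix with one column per class. An equivalent in base R, for illustration (the helper name one_hot is ours):

```r
# turn digit labels 0..(num_classes-1) into a binary indicator matrix:
# each row has a single 1, in the column of its class
one_hot <- function(labels, num_classes) {
  m <- matrix(0, nrow = length(labels), ncol = num_classes)
  m[cbind(seq_along(labels), labels + 1)] <- 1
  m
}

one_hot(c(0, 3, 9), 10)  # 3 x 10 matrix with exactly one 1 per row
```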
Building the CNN architecture
# a convolutional layer with 8 filters of 3 x 3 followed by ReLU, a second convolutional layer with
# 16 filters of size 5 x 5 followed by ReLU, a Max Pooling layer of size 2 x 2 and stride 2, a
# dropout layer of 0.25, a fully connected layer with 100 units and ReLU activation, another
# dropout of 0.5, and a final fully connected layer tied to the 10 output classes with softmax activation
model <- keras_model_sequential() %>%
layer_conv_2d(filters = 8, kernel_size = c(3,3), activation = 'relu', padding="same",
input_shape = input_shape) %>%
layer_conv_2d(filters = 16, kernel_size = c(5,5), activation = 'relu', padding="same") %>%
layer_max_pooling_2d(pool_size = c(2, 2)) %>%
layer_dropout(rate = 0.25) %>%
layer_flatten() %>%
layer_dense(units = 100, activation = 'relu') %>%
layer_dropout(rate = 0.5) %>%
layer_dense(units = 10, activation = 'softmax')
summary(model)
Compiling the CNN model
# compile the model with the chosen loss function, optimizer, and metric
model %>% compile(
loss = 'categorical_crossentropy', # for multi-class classification
optimizer = 'adam', # the optimizer
metrics = c('accuracy') # accuracy as the model's performance measure
)
Training the model
# train the model
history <- model %>% fit(
x_train, y_train,
epochs = 10, # number of epochs (10 complete passes over the training set)
batch_size = 25, # batch size
validation_split = 0.25 # split the dataset into 75% training and 25% validation
)
Visualizing training and evaluating on the test set
# plot loss and accuracy over the course of training
plot(history)
# evaluate the model on the test set
model %>% evaluate(x_test, y_test)
# the CNN model's predictions
y_test_predicted <- model %>% predict_classes(x_test)
Training history
Test visualization
# display the test data together with the predicted labels, in a 6 x 6 grid
par(mfcol=c(6,6))
par(mar=c(0, 0, 3, 0), xaxs='i', yaxs='i')
for (idx in 1:36) {
im <- x_test_original[idx,,]
im <- t(apply(im, 2, rev))
if (y_test_predicted[idx] == y_test_original[idx]) {
color <- '#008800'
} else {
color <- '#bb0000'
}
image(1:28, 1:28, im, col=gray((0:255)/255),
xaxt='n', main=paste0(y_test_predicted[idx], " (", y_test_original[idx], ")"),
col.main=color)
}
Prediction on the test set. Visualization
Accuracy: 98.43%