
Extreme Learning Machines and Kernel Deep Convex Networks for Speech and Vision Tasks
Ahmed Karanath (B13104), Kansul Mahrifa (B13123)

Mentor: Dr. A. D. Dileep
School of Computing and Electrical Engineering, Indian Institute of Technology Mandi

Abstract
We explore two related approaches for the tasks of speech emotion recognition (SER), speaker identification (Spk-Id) and scene classification: a neural network trained with the Extreme Learning Machine (ELM) algorithm, and the Kernel Deep Convex Network (KDCN). We propose using these approaches to classify varying-length patterns and to reduce training time compared to back-propagation techniques. We also propose a novel approach that classifies varying-length patterns using dynamic kernels in KDCN without compromising on training time, while giving results comparable to other methods used for these tasks.

Introduction
Image and speech data consist mostly of varying-length samples. Hence there is a need for alternative methods to classify varying-length patterns in speech and images.

Figure: Speech signal waveforms of two short-duration utterances of the word "me". These signals are recorded at a sampling rate of 16 kHz.

Figure: Image of a coast from the MIT8 scene dataset. Each feature vector corresponds to a patch of the image.

Extreme Learning Machine

Figure: Schematic diagram of a single-layer feed-forward neural network on which the ELM algorithm is used.

• In this algorithm [1], the input weights w_i above are randomly assigned. The output weights β are then calculated as β = H†T, where H† is the Moore-Penrose generalized inverse of the hidden-layer output matrix H (a code sketch follows this list).

• This is extended to a kernel-based version [2]. For a test example X, the output function f(X) of the kernel-based extreme learning machine (KELM) is given as:

f(X) = [K(X, X_1), ..., K(X, X_N)] (I/C + K)^{-1} T        (1)

where K(X, X_j) is the kernel function defined through the hidden neurons of the SLFN, K is the kernel matrix over the N training examples, C is a regularization constant, and T is the matrix of training targets.

• In this kernel version, static kernels such as the linear kernel, polynomial kernel and Gaussian kernel are used.
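The ELM training rule above reduces to a few lines of linear algebra. Below is a minimal sketch in Python/NumPy; the tanh activation, hidden-layer size and function names are illustrative assumptions, not part of the original poster.

```python
import numpy as np

def elm_train(X, T, n_hidden=100, seed=0):
    """ELM training for a single-hidden-layer feed-forward network (SLFN).
    X: (n_samples, n_features) fixed-length inputs, T: (n_samples, n_classes) targets."""
    rng = np.random.default_rng(seed)
    # Input weights w_i and biases are assigned randomly and never updated.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)              # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T        # output weights: beta = H† T (Moore-Penrose inverse)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta    # network output for new fixed-length inputs
```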
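Similarly, equation (1) needs only a kernel function and one linear solve. A minimal KELM sketch, assuming a Gaussian kernel as the static kernel; the names and the regularization constant are illustrative.

```python
import numpy as np

def gaussian_kernel(a, b, gamma=0.1):
    """Static (fixed-length) kernel, one of the choices mentioned above."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kelm_fit(X_train, T, kernel, C=1.0):
    """Precompute alpha = (I/C + K)^{-1} T on the training kernel matrix K."""
    n = len(X_train)
    K = np.array([[kernel(xi, xj) for xj in X_train] for xi in X_train])
    return np.linalg.solve(np.eye(n) / C + K, T)

def kelm_predict(x, X_train, alpha, kernel):
    """Equation (1): f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + K)^{-1} T."""
    k_vec = np.array([kernel(x, xj) for xj in X_train])
    return k_vec @ alpha
```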

Issues:
• Neural networks, including ELM-based networks, require a fixed input size and cannot handle varying-length patterns.

• The hidden-layer size has to be tuned by trial and error.

Proposed Solution
Classify varying-length patterns directly with ELM using dynamic kernels. In this case the hidden-layer dimensionality need not be known, and the number of hidden nodes need not be tuned. We name this approach dynamic kernel ELM (DKELM). A sketch of the idea follows.
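A minimal illustration of the DKELM idea: KELM only needs a kernel value between two examples, so a dynamic kernel defined directly on variable-length sets of feature vectors can be plugged into the same closed-form solution. The averaged frame-pair kernel below is a hypothetical stand-in for the dynamic kernels actually used (FK, PSK, CIGMMIMK, etc.).

```python
import numpy as np

def summation_kernel(A, B, gamma=0.1):
    """Toy dynamic kernel between two variable-length patterns.
    A: (n_a, d) and B: (n_b, d) are sets of local feature vectors
    (speech frames or image patches); the lengths n_a and n_b may differ."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return float(np.exp(-gamma * sq_dists).mean())

# DKELM reuses the kelm_fit / kelm_predict routines from the KELM sketch above
# unchanged; only the kernel differs, so varying-length utterances or images
# enter the classifier without padding and without choosing a hidden-layer size:
#   alpha = kelm_fit(train_patterns, T, kernel=summation_kernel, C=10.0)
#   score = kelm_predict(test_pattern, train_patterns, alpha, kernel=summation_kernel)
```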

Kernel Deep Convex Network

The Kernel Deep Convex Network (KDCN) [3] is a neural network formed by stacking shallow neural network modules.
Architecture:
• Each module in KDCN has an input layer, a hidden layer and an output layer.

• The input to the higher modules is the concatenation of the input data and the outputs of the lower modules.

• Concatenating the outputs from the lower modules helps prevent over-fitting on the training data (see the sketch after the figure below).

Figure: Schematic diagram of KDCN with three modules.
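A minimal sketch of the stacking pattern, assuming each module is a kernel module with a Gaussian kernel and a closed-form output-weight solution as in equation (1); the kernel choice, regularization and function names are illustrative assumptions rather than the exact architecture of [3].

```python
import numpy as np

def rbf_matrix(A, B, gamma=0.1):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kdcn_train(X, T, n_modules=3, C=1.0, gamma=0.1):
    """Train KDCN-style stacked kernel modules on fixed-length inputs X.
    Each module sees the raw input concatenated with the outputs of all
    lower modules, and its weights are found by a convex, closed-form solve."""
    outputs, modules = [], []
    for _ in range(n_modules):
        feats = np.concatenate([X] + outputs, axis=1)       # input + lower-module outputs
        K = rbf_matrix(feats, feats, gamma)
        alpha = np.linalg.solve(np.eye(len(X)) / C + K, T)  # per-module KELM-style solution
        modules.append((feats, alpha))
        outputs.append(K @ alpha)                           # this module's predictions
    return modules                                          # last module gives the final outputs
```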

Issues:
• Concatenating the output of a module with the input data at the different levels is not possible for varying-length patterns.

Proposed Solution
A linear combination of kernel matrices, computed separately on the input data and on the output of the previous modules, is formed and given as input to the next module. We name this approach the dynamic kernel deep convex network (DKDCN). A sketch follows.
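A minimal sketch of the proposed kernel combination, assuming an equal-weight linear combination and a Gaussian kernel on the (fixed-length) module outputs; the weight, kernels and names here are illustrative assumptions.

```python
import numpy as np

def rbf_matrix(A, B, gamma=0.1):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def dkdcn_train(K_input, T, n_modules=3, C=1.0, weight=0.5, gamma=0.1):
    """K_input: dynamic kernel matrix computed once on the raw varying-length
    patterns (e.g. with FK, or a stand-in such as summation_kernel above).
    Since varying-length inputs cannot be concatenated with module outputs,
    each new module instead receives a linear combination of K_input and a
    kernel computed on the previous module's outputs."""
    n = K_input.shape[0]
    K, alphas = K_input, []
    for _ in range(n_modules):
        alpha = np.linalg.solve(np.eye(n) / C + K, T)   # KELM solution for this module
        alphas.append(alpha)
        Y = K @ alpha                                    # module outputs (fixed length)
        K_out = rbf_matrix(Y, Y, gamma)                  # kernel on the module outputs
        K = weight * K_input + (1 - weight) * K_out      # combined kernel for the next module
    return alphas
```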

Results
Datasets used:
• Scene classification: MIT8 scene and Vogel Schiele datasets

• Speech emotion recognition: EmoDB and FAU-AEC datasets

• Speaker identification: NIST-SRE corpora

Dynamic Kernel    MIT8 scene            Vogel Schiele
                  DKELM     DKDCN       DKELM     DKDCN
FK                82.62     82.63       75.18     74.47
PSK               63.50     62.90       56.02     56.21
CIGMMIMK          75.20     75.30       65.25     65.20
SPSK              -         -           -         -
SLPMK             -         -           -         -

Table: Comparison of classification accuracies of DKELM and DKDCN (3-layer) based classifiers on image data.

Dynamic Kernel    Speech Emotion Recognition                        Speaker Identification
                  EmoDB                    FAU-AEC
                  KELM    DKDCN   SVM      KELM    DKDCN   SVM      KELM    DKDCN   SVM
FK                88.0    86.45   87.05    63.67   -       61.54    89.14   88.20   88.54
PSK               88.40   86.0    87.46    64.9    -       62.54    87.18   -       86.18
CIGMMIMK          78.0    79.0    85.62    62.71   -       62.48    88.78   87.35   88.54
SLPSK             91.40   -       92.6     65.60   -       66.29    91.21   -       91.67

Table: Comparison of classification accuracies of KELM, DKDCN (3-layer) and SVM based classifiers on speech data.

Future Work
• The blank entries in the above tables indicate ongoing work; these results will be reported on completion.

• Study the effect of different types of intermediate kernels in DKDCN, and explore ways to improve the results.

References
[1] Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew. Extreme learning machine: Theory and applications. Neurocomputing, 70:489–501, 2006.

[2] Alexandros Iosifidis and Moncef Gabbouj. On the kernel extreme learning machine speedup. Pattern Recognition Letters, 68(P1):205–210, December 2015.

[3] Niharjyoti Sarangi and C. Chandra Sekhar. Automatic image annotation using convex deep learning models. In Proceedings of ICPRAM 2015, Portugal. SCITEPRESS - Science and Technology Publications, Lda.
