
Continuous Learning of Human Activity Models using Deep Nets

Supplementary Materials

Expt No.  Comments
1.        Fig. 1: Parameter sensitivity analysis on the UCF11 dataset.
2.        Fig. 2: Parameter sensitivity analysis on the VIRAT dataset.
3.        Fig. 3: Advantage of using deep learning.
4.        Fig. 4: Evaluation of continuous learning on some individual test instances of the KTH dataset.
5.        Fig. 5: Evaluation of continuous learning on some individual test instances of the UCF11 dataset.
6.        Fig. 6: Evaluation of continuous learning on some individual test instances of the VIRAT dataset.
7.        Table II: Empirically chosen parameter values.

TABLE I: TABLE OF CONTENTS.


UCF11 Parameter Sensitivity

[Figure 1: seven accuracy plots. Panels (a) k, (b) β, (c) ρ, (d) α, and (f) Kc plot accuracy against the percentage of training examples; panel (e) α plots accuracy against the percentage of manual labeling; panel (g) Kc plots accuracy against buffer size.]

Fig. 1. (a) The number of neurons in the hidden layer, k, varies from 100 to 1500 in increments of 100. (b) β varies from 0.5 to 5 in increments of 0.5. (c) ρ varies from 0.05 to 0.5 in increments of 0.05. The performance variation with changes in k and β is not significant, while the variation with ρ is relatively more significant. (d) and (e) show the effect of the amount of manual labeling; as expected, the performance variation is large. As more instances are presented to the framework, the performance variation increases significantly (plot (d)). If we increase the amount of manual labeling through active learning (by changing α), performance increases, as shown in plot (e). However, performance with 60%-70% manual labeling is close to the performance with 100% manual labeling. (f) and (g) show the effect of the buffer size Kc, which has a significant effect on performance. As expected, performance increases with buffer size. In plot (g), the x-axis shows the buffer size as a fraction of the total number of training instances available. Evidently, we can achieve satisfactory performance while storing fewer instances in memory.
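For context on how k, β, ρ, and λ enter training, the sketch below assumes the standard sparse-autoencoder objective: reconstruction error plus a KL-divergence sparsity penalty (weighted by β, with target mean activation ρ) plus λ-weighted weight decay. This is a minimal illustrative formulation, not the paper's exact implementation; all names and shapes are assumptions.

```python
import numpy as np

def sparse_autoencoder_loss(X, W1, b1, W2, b2, beta, rho, lam):
    """Illustrative sparse-autoencoder objective.

    Assumed shapes: X is n x m (m examples), W1 is k x n, b1 is k x 1,
    W2 is n x k, b2 is n x 1, where k is the hidden-layer size.
    beta weights the sparsity penalty, rho is the sparsity target,
    and lam is the weight-decay coefficient.
    """
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    m = X.shape[1]
    H = sigmoid(W1 @ X + b1)        # hidden activations, k x m
    X_hat = sigmoid(W2 @ H + b2)    # reconstruction, n x m

    recon = 0.5 / m * np.sum((X_hat - X) ** 2)
    rho_hat = H.mean(axis=1)        # mean activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return recon + beta * kl + decay
```

Under this formulation, a larger β pushes hidden units toward the target activation ρ more aggressively, which is consistent with sweeping both parameters jointly in the sensitivity analysis.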


VIRAT Parameter Sensitivity

[Figure 2: seven accuracy plots mirroring Fig. 1, for VIRAT. Panels (a) k, (b) β, (c) ρ, (d) α, and (f) Kc plot accuracy against the percentage of training examples; panel (e) α plots accuracy against the percentage of manual labeling; panel (g) Kc plots accuracy against buffer size.]

Fig. 2. (a) The number of neurons in the hidden layer, k, varies from 100 to 1500 in increments of 100. (b) β varies from 0.5 to 5 in increments of 0.5. (c) ρ varies from 0.05 to 0.5 in increments of 0.05. The performance variation with changes in ρ is not significant, while the variation with k and β is relatively more significant. (d) and (e) show the effect of the amount of manual labeling; as expected, the performance variation is large. Interestingly, however, with around 50%-60% manual labeling our framework can achieve performance close to that with 100% manual labeling. (f) and (g) show the effect of the buffer size Kc, which has a significant effect on performance. As expected, performance increases with buffer size. In plot (g), the x-axis shows the buffer size as a fraction of the total number of training instances available. Evidently, we can achieve satisfactory performance while storing fewer instances in memory.
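The two experiment-design parameters can be read as a selection threshold (α) and a memory cap (Kc). The sketch below shows one plausible way they could interact during a batch update; both the confidence-threshold labeling rule and the least-confident retention rule are illustrative assumptions, not the paper's documented criteria.

```python
import heapq

def select_for_labeling(confidences, alpha):
    """Active learning (illustrative): indices of instances whose
    classifier confidence falls below the threshold alpha are sent
    for manual labeling; the rest keep their predicted labels."""
    return [i for i, c in enumerate(confidences) if c < alpha]

class InstanceBuffer:
    """Fixed-capacity buffer of past training instances; the capacity
    plays the role of Kc. Assumption for illustration: the least
    confidently classified instances are the most informative, so the
    buffer evicts the most confident instance when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []  # entries are (-confidence, tiebreak, instance)

    def add(self, instance, confidence):
        item = (-confidence, id(instance), instance)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, item)
        else:
            # Push the new item, then drop the highest-confidence entry.
            heapq.heappushpop(self._heap, item)

    def contents(self):
        return [inst for _, _, inst in self._heap]
```

With Kc given as a fraction (as in plot (g)), the capacity would be that fraction of the training instances seen so far.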


[Figure 3: two plots of accuracy against the percentage of training examples, each comparing "W/o Deep Learning" with "W/ Deep Learning". Panel (a): UCF11; panel (b): VIRAT.]

Fig. 3. Plot (a) shows the benefit of using deep learning on the UCF11 dataset. The activity model that does not use deep learning is outperformed by the method that uses deep learning (A0F0) by a margin of around 1%. Plot (b) shows the benefit of using deep learning on the VIRAT dataset. The performance of the activity model that does not use deep learning is better initially, but as more instances are presented to the framework, the deep learning based method (A0F0) outperforms the other method by a margin of 0.5%. This demonstrates our framework's ability to learn under concept drift.


Individual action clips of KTH dataset

[Figure 4: ten boxes, each pairing an action-clip snapshot with a stacked bar chart for the methods A1F1, A1F0, A0F1, and A0F0. Panels: (a) Boxing, (b) Boxing, (c) Hand Clapping, (d) Hand Clapping, (e) Hand Waving, (f) Jogging, (g) Walking, (h) Boxing, (i) Walking, (j) Running.]

Fig. 4. Evaluation of continuous learning on individual action clips of the KTH dataset. Each row of this figure has two boxes; each box contains a snapshot of an action clip on the left and its corresponding evaluation on the right (as a stacked bar chart). The bar chart has four bars for the four methods A1F1, A1F0, A0F1, and A0F0, from left to right. Each bar has four segments illustrating the four batches of continuous learning, with 25%, 50%, 75%, and 100% of the training instances presented so far to the framework, from bottom to top. Each segment is colored either blue or red: a blue segment means that the action clip is classified correctly after that batch of training instances is presented, and red means misclassification. In some cases, such as (a), (d), and (g), the action clip is correctly classified from the first batch through the last. In others, such as (b), (c), (e), and (f), the action clip is misclassified initially but correctly classified later as more training instances are presented to the framework. In cases such as (h) and (i), the classification decision changes intermittently. For a few hard instances, such as (j), the action clip remains misclassified after all batches of training.
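A minimal sketch of how one such per-clip chart could be drawn with matplotlib; the correctness flags below are made-up illustrative values, not results from the paper.

```python
import matplotlib.pyplot as plt
import numpy as np

methods = ["A1F1", "A1F0", "A0F1", "A0F0"]
batches = ["25%", "50%", "75%", "100%"]

# Hypothetical correctness flags for one clip: rows are batches
# (bottom to top), columns are methods. True = correctly classified
# (blue segment), False = misclassified (red segment).
correct = np.array([[False, True,  False, True],
                    [True,  True,  False, True],
                    [True,  True,  True,  True],
                    [True,  True,  True,  True]])

fig, ax = plt.subplots()
for i in range(len(batches)):
    colors = ["blue" if c else "red" for c in correct[i]]
    # Stack one unit-height segment per batch on each method's bar.
    ax.bar(methods, height=1, bottom=i, color=colors, edgecolor="white")
ax.set_yticks([0.5, 1.5, 2.5, 3.5])
ax.set_yticklabels(batches)
ax.set_ylabel("Training instances presented so far")
plt.show()
```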


Individual action clips of UCF11 dataset

[Figure 5: ten boxes, each pairing an action-clip snapshot with a stacked bar chart for the methods A1F1, A1F0, A0F1, and A0F0. Panels: (a) Biking, (b) Basketball, (c) Diving, (d) Golf Swing, (e) Soccer Juggling, (f) Swing, (g) Tennis Swing, (h) Jumping, (i) Walking, (j) Spiking.]

Fig. 5. Evaluation of continuous learning on individual action clips of the UCF11 dataset. Each row of this figure has two boxes; each box contains a snapshot of an action clip on the left and its corresponding evaluation on the right (as a stacked bar chart). The bar chart has four bars for the four methods A1F1, A1F0, A0F1, and A0F0, from left to right. Each bar has four segments illustrating the four batches of continuous learning, with 25%, 50%, 75%, and 100% of the training instances presented so far to the framework, from bottom to top. Each segment is colored either blue or red: a blue segment means that the action clip is classified correctly after that batch of training instances is presented, and red means misclassification. In some cases, such as (a) and (d), the action clip is correctly classified from the first batch through the last. In others, such as (b), (c), (e), (f), (g), (h), and (i), the action clip is misclassified initially but correctly classified later as more training instances are presented to the framework. In cases such as (e), the classification decision changes intermittently. For a few hard instances, such as (j), the action clip remains misclassified after all batches of training.


Individual action clips of VIRAT dataset

[Figure 6: ten boxes, each pairing an action-clip snapshot with a stacked bar chart for the methods A1F1, A1F0, A0F1, and A0F0. Panels: (a) Boxing, (b) Boxing, (c) Hand Clapping, (d) Hand Clapping, (e) Hand Waving, (f) Jogging, (g) Walking, (h) Boxing, (i) Walking, (j) Running.]

Fig. 6. Evaluation of continuous learning on individual action clips of the VIRAT dataset. Each row of this figure has two boxes; each box contains a snapshot of an action clip on the left and its corresponding evaluation on the right (as a stacked bar chart). The bar chart has four bars for the four methods A1F1, A1F0, A0F1, and A0F0, from left to right. Each bar has four segments illustrating the four batches of continuous learning, with 25%, 50%, 75%, and 100% of the training instances presented so far to the framework, from bottom to top. Each segment is colored either blue or red: a blue segment means that the action clip is classified correctly after that batch of training instances is presented, and red means misclassification. In some cases, such as (e) and (i), the action clip is correctly classified from the first batch through the last. In others, such as (a), (d), (f), (g), (h), and (j), the action clip is misclassified initially but correctly classified later as more training instances are presented to the framework. In cases such as (b) and (c), the classification decision changes intermittently.


Parameter Values and Sensitivity: We have three types of parameters: feature parameters (T, L, and k), model-training parameters (β, ρ, and λ), and experiment-design parameters (Kc and α). The experimental results illustrated in Fig. 7 of the main paper were obtained using the empirically estimated parameter values shown in Table II. The choice of some of these parameter values has a relatively significant effect on performance, as shown in Fig. 8 of the main paper and in Figs. 1 and 2 of this supplementary, for the KTH, UCF11, and VIRAT datasets respectively. However, most of the parameter values are the same across the different datasets.

                  Feature                 Model training           Design
Dataset      T    L    n      k      λ        β     ρ         Kc     α
KTH          1    3    2268   400    10e−4    2     0.3       0.4    0.5
UCF11        2    3    4536   800    10e−4    3.5   0.3       0.5    0.5
VIRAT        2    3    4536   800    100e−4   2     0.1       0.5    0.5
TRECVID      2    3    4536   800    10e−4    3     0.3       0.5    0.5

TABLE II: PARAMETER VALUES
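For readability, the KTH row of Table II can be expressed as a configuration object. This is an illustrative sketch: the field grouping follows the table's column headers, the names are assumptions, and λ is copied as the Python literal 10e-4 exactly as printed in the table.

```python
from dataclasses import dataclass

@dataclass
class Params:
    # Feature parameters (see the main paper for their definitions)
    T: int
    L: int
    n: int
    k: int        # number of neurons in the hidden layer
    # Model-training parameters
    lam: float    # weight decay λ
    beta: float   # sparsity penalty weight β
    rho: float    # sparsity target ρ
    # Experiment-design parameters
    Kc: float     # buffer size, as a fraction of training instances
    alpha: float  # amount of manual labeling (active learning)

# KTH row of Table II, values copied verbatim
KTH = Params(T=1, L=3, n=2268, k=400, lam=10e-4,
             beta=2, rho=0.3, Kc=0.4, alpha=0.5)
```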