feature space decomposition for autonomous robot ... · feature space decomposition for autonomous...

2
Feature Space Decomposition for Autonomous Robot Adaptation in Programming by Demonstration Chi Zhang 1 , Hao Zhang 2 , and Lynne E. Parker 1 Abstract— Adaptation is an essential capability for intelligent robots to work in human environments. We propose a novel Fea- ture Space Decomposition (FSD) approach to effectively address the robot adaptation problem, which is directly applicable to the learning framework based on Programming by Demonstration (PbD) and Reinforcement Learning (RL). Our FSD method de- composes the high-dimensional original features into principal and non-principal feature spaces, which are respectively used to learn a generalized trajectory from demonstrations, and to form a new low-dimensional search space on RL for autonomous robot adaptation beyond demonstrations. Experimental results on real robots validate that our FSD approach enables the robots to effectively adapt to new environments, and is usually able to find optimal solutions more quickly than traditional approaches when significant environment changes occur. I. I NTRODUCTION In Programming by Demonstration (PbD), adaptability to unseen situations is a crucial problem that must be addressed when the robot operates in unstructured human environments encounters a state with no demonstrations [1], as the example in Fig. 1 shows. Reinforcement Learning (RL) is a widely used methodology to deal with this problem [2], but RL often suffers from the curse of dimensionality issue when applied on robots with a high Degree of Freedom (DoF) [2], [3]. Dimension Reduction [3], [4] is usually applied to tackle this issue by projecting the original high-dimensional learning space to a low-dimensional latent space. However, since the latent space is optimized for a specific environment during the training phase, it typically contains less variations. Accordingly, searching a solution within the latent space can be less effective for robot adaptation to new environment with unseen changes. Additionally, there is no theoretical guarantee that the optimal solution lies in the latent space [4]. In extreme cases, such approaches may not be able to find any solution, when all policies, including sub-optimal ones, fall outside the latent space. In this abstract, we propose a novel Feature Space De- composition (FSD) approach that divides all features from human demonstration data into two categories: principal and non-principal. Principal features are the ones optimized to represent the human demonstration data in the environment during training, which are equivalent to the features forming the latent space in traditional RL approaches. Non-principal features encode the information that is not optimized for the training environments, which usually contain more variations and are less restricted by the training process. Fig. 2 illus- trates how our FSD method can be directly applied in the 1 Chi Zhang and Lynne E. Parker are with the Dept. of Elec. Engr. and Comp. Sci., University of Tennessee, Knoxville, TN 37996, USA. {czhang24,leparker}@utk.edu 2 Hao Zhang is with the Dept. of Elec. Engr. and Comp. Sci., Colorado School of Mines, Golden, CO 80401, USA. [email protected] (a) (b) (c) Fig. 1. A motivating example of robot adaptation. To teach a writing task, a human instructor physically moves the arm of the NAO robot (Fig. 1(a)). In an unchanged environment (Fig. 1(b)), the robot can successfully perform the task using the learned knowledge. In new environments (e.g., a lifted and/or tilted table top as in Fig. 1(c)), robot adaptation is necessary. https://www.youtube.com/watch?v=Lr_1qdQ6BFI PbD and RL learning framework to construct a complete autonomous robot adaptation system. Our contributions are twofold. First, we introduce a new perspective of autonomous robot adaptation, i.e., learning from the outside of the restricted latent space, which can be especially beneficial when solutions may not exist in the latent space. Second, we propose a simple, yet effective FSD approach to realize our perspective for robot adaptation. The method is effective since it can be scaled to the entire feature space and thus is guaranteed to find an optimal solution even when it doesnt exist in the latent space. By reducing the search space exponentially, FSD-based RL approaches can usually find solutions more quickly. II. APPROACH A. Feature Decomposition Let {x i },i = {1, 2, ..., D} denote the features in the original data space. Then x is centered, and the covariance matrix C, eigenvectors w i and associated eigenvalues λ i are calculated by: C = E(x ¯ x T ) Cw i = λ i w i (1) where λ i denotes the importance of the corresponding com- ponent w i , and w i = {w ij },j = {1, 2, ...D} is the weight vector used to rotate the original feature set {x i } into a projected (latent) space that is encoded by the new feature set ξ i = j (w ij x i ). By accumulating the sorted eigenvalues {λ 1 2 , ..., λ p }, we get the corresponding principal features {ξ 1 2 ,...,ξ p } that contain information above a given threshold T 1 , i.e. T 1 = 95%, where p refers to the first p components covering at least 95% of the data’s spread: p i=1 λ i > 95%. In order to extract the non-principal features, we evaluate the magnitude of contribution (MoC) of each original feature x by calculating the length of x in the latent space. Since the eigenvalue represents the significance of each component, we

Upload: others

Post on 23-Jun-2020

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Feature Space Decomposition for Autonomous Robot ... · Feature Space Decomposition for Autonomous Robot Adaptation in Programming by Demonstration Chi Zhang 1, Hao Zhang2, and Lynne

Feature Space Decomposition for AutonomousRobot Adaptation in Programming by Demonstration

Chi Zhang1, Hao Zhang2, and Lynne E. Parker1

Abstract— Adaptation is an essential capability for intelligentrobots to work in human environments. We propose a novel Fea-ture Space Decomposition (FSD) approach to effectively addressthe robot adaptation problem, which is directly applicable to thelearning framework based on Programming by Demonstration(PbD) and Reinforcement Learning (RL). Our FSD method de-composes the high-dimensional original features into principaland non-principal feature spaces, which are respectively used tolearn a generalized trajectory from demonstrations, and to forma new low-dimensional search space on RL for autonomousrobot adaptation beyond demonstrations. Experimental resultson real robots validate that our FSD approach enables therobots to effectively adapt to new environments, and is usuallyable to find optimal solutions more quickly than traditionalapproaches when significant environment changes occur.

I. INTRODUCTION

In Programming by Demonstration (PbD), adaptability tounseen situations is a crucial problem that must be addressedwhen the robot operates in unstructured human environmentsencounters a state with no demonstrations [1], as the examplein Fig. 1 shows. Reinforcement Learning (RL) is a widelyused methodology to deal with this problem [2], but RLoften suffers from the curse of dimensionality issue whenapplied on robots with a high Degree of Freedom (DoF)[2], [3]. Dimension Reduction [3], [4] is usually applied totackle this issue by projecting the original high-dimensionallearning space to a low-dimensional latent space. However,since the latent space is optimized for a specific environmentduring the training phase, it typically contains less variations.Accordingly, searching a solution within the latent space canbe less effective for robot adaptation to new environmentwith unseen changes. Additionally, there is no theoreticalguarantee that the optimal solution lies in the latent space[4]. In extreme cases, such approaches may not be able tofind any solution, when all policies, including sub-optimalones, fall outside the latent space.

In this abstract, we propose a novel Feature Space De-composition (FSD) approach that divides all features fromhuman demonstration data into two categories: principal andnon-principal. Principal features are the ones optimized torepresent the human demonstration data in the environmentduring training, which are equivalent to the features formingthe latent space in traditional RL approaches. Non-principalfeatures encode the information that is not optimized for thetraining environments, which usually contain more variationsand are less restricted by the training process. Fig. 2 illus-trates how our FSD method can be directly applied in the

1Chi Zhang and Lynne E. Parker are with the Dept. of Elec. Engr.and Comp. Sci., University of Tennessee, Knoxville, TN 37996, USA.{czhang24,leparker}@utk.edu

2Hao Zhang is with the Dept. of Elec. Engr. and Comp. Sci., ColoradoSchool of Mines, Golden, CO 80401, USA. [email protected]

(a) (b) (c)

Fig. 1. A motivating example of robot adaptation. To teach a writingtask, a human instructor physically moves the arm of the NAO robot (Fig.1(a)). In an unchanged environment (Fig. 1(b)), the robot can successfullyperform the task using the learned knowledge. In new environments (e.g., alifted and/or tilted table top as in Fig. 1(c)), robot adaptation is necessary.https://www.youtube.com/watch?v=Lr_1qdQ6BFI

PbD and RL learning framework to construct a completeautonomous robot adaptation system.

Our contributions are twofold. First, we introduce a newperspective of autonomous robot adaptation, i.e., learningfrom the outside of the restricted latent space, which canbe especially beneficial when solutions may not exist in thelatent space. Second, we propose a simple, yet effective FSDapproach to realize our perspective for robot adaptation. Themethod is effective since it can be scaled to the entire featurespace and thus is guaranteed to find an optimal solution evenwhen it doesnt exist in the latent space. By reducing thesearch space exponentially, FSD-based RL approaches canusually find solutions more quickly.

II. APPROACH

A. Feature DecompositionLet {xi}, i = {1, 2, ..., D} denote the features in the

original data space. Then x is centered, and the covariancematrix C, eigenvectors wi and associated eigenvalues λi arecalculated by:

C = E(xx̄T )

Cwi = λiwi

(1)

where λi denotes the importance of the corresponding com-ponent wi, and wi = {wij}, j = {1, 2, ...D} is the weightvector used to rotate the original feature set {xi} into aprojected (latent) space that is encoded by the new featureset ξi =

∑j(wijxi).

By accumulating the sorted eigenvalues {λ1, λ2, ..., λp},we get the corresponding principal features {ξ1, ξ2, . . . , ξp}that contain information above a given threshold T1, i.e. T1 =95%, where p refers to the first p components covering atleast 95% of the data’s spread:

∑pi=1 λi > 95%.

In order to extract the non-principal features, we evaluatethe magnitude of contribution (MoC) of each original featurex by calculating the length of x in the latent space. Since theeigenvalue represents the significance of each component, we

Page 2: Feature Space Decomposition for Autonomous Robot ... · Feature Space Decomposition for Autonomous Robot Adaptation in Programming by Demonstration Chi Zhang 1, Hao Zhang2, and Lynne

Human  Demonstra-ons

Reinforcement  Learning   Ad

apted  

Solu5o

n  

Ac5on  Space    Defini5on  

Policy  Ini5aliza5on  

Reinforcement  Learning

GMM  Encoding Feature  Space  Decomposi0on

Projec5on  by  PCA  

−0.50

0.5 −0.5 0 0.5

−0.6

−0.4

−0.2

0

0.2

0.4

0.6x2

x1 x3

x4

x5

Feature  Selec5on

 

Non-­‐principal  features  {xi}

Principal  features      {ζ1,  ζ2,  …,  ζp}

Fig. 2. Overview of the complete robot adaptation system, with our newFSD approach highlighted by red color.rescale x’s projection in each basis and calculate the multi-dimensional length to represent the MoC (M ):

Mxi =

√∑j

(wijλi)2 = λi

√∑j

w2ij

(2)

Then, we sort the ranking of M to extract a compact set of{xk1 , xk2 , ..., xkn}, n ≤ D that satisfies

∑ni=1Mxki

≤ T2,∑n+1i=1 Mxki

> T2. This new set of features are referred toas the non-principal features that convey the least relevantinformation of the demonstrations, thus are less restrictedby the training process and contain large variations. Thenon-principal features will be employed to define the actionspace of RL for robot adaptation. Given that each feature hask DoFs, the dimensionality of the search space is reducedexponentially from kD to kn. It is noteworthy that the sizeof the non-principal feature set (value of n) is decided by atunable threshold T2 ∈ [0, 1], so the proposed FSD approachis scalable and guaranteed to find optimal solutions if exist.

B. Skill Learning and AdaptationIn order to find a generalized trajectory from the demon-

stration data, we apply the Gaussian Mixture Model (GMM)[5] to encode the principal features ξ, then adopt the Expec-tation Maximization [6] to estimate the parameters of GMM,and finally obtain a generalized trajectory by regression.

To allow robots to find solutions when the learned knowl-edge does not directly apply in the new environment, we usea RL module based on the non-principal features to performautomatous robot adaptation, and adopt the Gradient-DescentSARSA(λ) algorithm [7] to find the satisfactory policy.

III. EXPERIMENTAL RESULTSIn order to demonstrate the effectiveness of our approach,

we conducted experiments using a NAO robot to perform awriting task and a drinking task. In the experiments, jointangles are used as features extracted from the demonstrationdata and then applied to control the robot. The non-principalfeatures selected by T2 = 25% are {x1, x5} and {x2} forthe two tasks respectively, as Table I shows. Using the FSDapproach, the robot successfully finds the policy to adapt tothe changes in environments, as illustrated in Fig. 3. We alsoconduct empirical studies to compare our FSD approach withtwo baselines: method based on the original feature space,and method using the principal feature space [4] in Fig. 4.

TABLE IMOC VALUES OF THE ORIGINAL 5-DIM. FEATURES.

MoC (M) x1 x2 x3 x4 x5

Writing task 0.86% 22.10% 23.01% 35.97% 18.05%Drinking task 27.53% 6.20% 23.16% 22.09% 21.02%

(a) (b)

Fig. 3. (a) Human demonstrations of the writing task (first row) andexperimental results of robot adaptation to the lowered table (second row),and the tilted table (third row). (b) Human demonstrations of the drinkingtask in the obstacle free training environment (first row), and experimentalresults of failure without FSD-based adaptation (second row), and adaptationwith FSD-based approach (third row) in unseen obstacle environment. Theenvironment, e.g., the states of objects, and the robot, are simulated for RLtraining. Then the learned trajectories are applied to the physical robot.

101 102 103

−120

−100

−80

−60

−40

−20

0

Trials

Accu

mul

ated

Rew

ard

Two Non−prin. FeaturesPrin. FeaturesAll Features

(a)

100 101 102 103−35

−30

−25

−20

−15

−10

−5

Trials

Accu

mul

ated

Rew

ard

One Non−prin. FeatureTwo Non−prin. FeaturesPrin. FeaturesAll Features

(b)Fig. 4. Accumulated rewards of (a) the writing task and (b) the drinkingtask. FSD approach achieves better efficiency by converging more quicklythan the all feature space method, which doesn’t converge within 104

trials due to the curse of dimensionality. FSD approach also outperformsthe principal feature space method in terms of efficiency. A plausibleexplanation is that the principal feature space method is optimized and thusrestricted by the training environments, while the non-principle feature spaceusually contains more variations and can have a better chance to includesolutions for new situations that are different from the environments duringthe training process. FSD approach is more effective by converging to abetter rewards, compared with the principal feature space methods whichmay not be able to find an optimal solution (although it can converge), asshown by the converged smaller reward, e.g., in the drinking task. On theother hand, if an optimal solution exists in the original feature space, ourFSD approach is able to find it due to its scalability to the full feature space.

IV. CONCLUSIONS

We introduce a novel Feature Space Decomposition (FSD)approach, within the PbD and RL learning methodology, toenable robots to effectively adapt to environment changes.Results have validated that our FSD approach enables moreeffective autonomous robot adaptation than the conventionaltechniques based on the principal feature space. In addition,the FSD-based robot adaptation approach can often findsolutions more quickly.

REFERENCES

[1] A. Billard, S. Calinon, R. Dillmann, and S. Schaal, Handbook ofRobotics, ch. 59: Robot Programming by DemonstrationRobot Pro-gramming by Demonstration. Springer, Jan 2008.

[2] F. Guenter, M. Hersch, S. Calinon, and A. Billard, “Reinforcementlearning for imitating constrained reaching movements,” RSJ Adv.Robot., vol. 21, no. 13, pp. 1521–1544, 2007.

[3] J. Kober, J. A. Bagnell, and J. Peters, “Reinforcement learning inrobotics: A survey,” Int. J. Robot. Res., vol. 32, no. 11, pp. 1238–1274,2013.

[4] S. Bitzer, M. Howard, and S. Vijayakumar, “Using dimensionalityreduction to exploit constraints in reinforcement learning,” IROS, 2010.

[5] S. Calinon, F. Guenter, and A. Billard, “On learning, representing andgeneralizing a task in a humanoid robot,” IEEE Trans. on Syst. Man,and Cybern., vol. 37, no. 2, pp. 286–298, 2007.

[6] C. M. Bishop, Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc, 2006.

[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.MIT Press, 1998.