
Project assignment for the Advanced Topics of Machine Learning PhD course; instructor: Prof. Alessandro Lazaric

Reinforcement Learning in Autonomic Computing

Davide Basilio Bartolini, PhD student

Politecnico di Milano, Dipartimento di Elettronica e Informazione
[email protected]

    Abstract

Autonomic computing (AC) was born a decade ago as a response to the increasing complexity of computing systems at the level of IT infrastructures, proposing to make such infrastructures able to autonomously manage (part of) their complexity, thus easing the increasingly time- and expertise-demanding job of human developers and administrators. Since its birth, AC has developed into a lively multidisciplinary research field, leveraging theories and techniques from different disciplines, such as computer science, control theory, and artificial intelligence (AI). One of the branches of AI harnessed in AC is reinforcement learning (RL), which tackles the problem of an agent learning through trial and error interaction with a dynamic environment; RL is being employed in AC to automatically learn policies to control the runtime behavior of a computing system.

This survey, after presenting relevant concepts from the fields of both autonomic computing and reinforcement learning, reviews relevant works in the literature employing RL techniques to obtain self-management of (parts of) a computing system, which is the ultimate goal of autonomic computing.

    I. INTRODUCTION AND BACKGROUND

Prior to reviewing works coupling autonomic computing and reinforcement learning, it is convenient to report some basics about the two topics, to define concepts that will be used in this survey. For this reason, this Section provides an overview of the concepts at the base of AC and RL.

    A. Autonomic Computing

Autonomic computing was born in 2001 with a manifesto by IBM researchers [3], envisioning an IT industry where computing systems management should be automatically handled with minimal human intervention. The idea was borrowed from the autonomic nervous system in biological life, which autonomously maintains homeostasis within the organism with the central nervous system being unaware of its workings. This idea has been developed at a theoretical level, leading to a well-defined vision and to a characterization of the desired properties for an autonomic computing system [5, 11]. The research field of autonomic computing is still young and the realization of its vision has yet to be achieved, as many research challenges are still unsolved [6]. In particular, the process of applying the ideas at the base of AC to real systems, moving forward from pure theoretical investigation, is being tackled in recent times (e.g., the Metronome framework [12] applies concepts from AC to performance management at the operating system level).

Even though autonomic computing can be applied at very different levels, a common feature of any such infrastructure is the presence of a feedback control loop exploiting online information about the system and its environment to adapt towards specified goals. As shown in Figure 1, there are different formalisms to describe such a control loop.

[Figure 1: Three different representations of the control loop at the base of an autonomic computing system. (a) Self-adaptation loop [11]; (b) Monitor-Analyze-Plan-Execute with shared Knowledge base (MAPE-K) loop [5]; (c) Observe-Decide-Act (ODA) loop [12]. All three loops connect the system and its environment through sensors and actuators.]

A first version of the autonomic control scheme is named Self-adaptation control loop [11] and it is represented in Figure 1(a). This representation emphasizes the separation between the detection and decision phases. The detection process is in charge of analyzing the data coming from the sensors and of detecting when something should be changed in order to restore the system from an unwanted state into its desired working conditions. The decision process is in charge of determining what should be changed, i.e., picking the right action to be performed. A second version of the autonomic control loop is called MAPE-K [5] and it is represented in Figure 1(b). When an autonomic element is described by means of the MAPE-K representation, the component which implements the control loop is referred to as the autonomic manager, which interacts with the managed element by gathering data through sensors and acting through actuators (or effectors). This control scheme emphasizes the fact that shared knowledge about the system and its environment must be maintained in order to successfully execute the autonomic control scheme. A third version of the autonomic control loop is named ODA loop [12] and it is represented in Figure 1(c). This representation is more general with respect to the MAPE-K and Self-adaptation schemes and, being more generic, it summarizes the essence of the autonomic control loop. The steps of the ODA loop are: observation of the internal and environmental status; decision of what action (or whether any action at all) is to be taken, based on the observations; and action, i.e., perturbation of the internal or external status in order to modify it towards a better condition for the system.
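To make the structure of such a loop concrete, the following minimal sketch shows how an ODA-style control loop could be organized in code. It is purely illustrative: the sensor and actuator names, the threshold rule, and the Python formulation are assumptions of this survey's presentation, not taken from Metronome or any other surveyed framework.

```python
import time

class ThresholdPolicy:
    """Decision stage: maps the observed status to an action.
    A trivial threshold rule stands in for any decision mechanism,
    including a learned RL policy."""
    def decide(self, status):
        return "scale_up" if status.get("load", 0.0) > 0.8 else "no_op"

def observe(sensors):
    """Observe: gather internal and environmental status via sensors."""
    return {name: read() for name, read in sensors.items()}

def act(action, actuators):
    """Act: perturb the system status via the matching actuator, if any."""
    if action in actuators:
        actuators[action]()

def oda_loop(sensors, actuators, policy, steps=10, period_s=1.0):
    """Run a bounded number of Observe-Decide-Act iterations."""
    for _ in range(steps):
        status = observe(sensors)        # Observe
        action = policy.decide(status)   # Decide
        act(action, actuators)           # Act
        time.sleep(period_s)

# Hypothetical usage with stub sensors/actuators:
# oda_loop({"load": lambda: 0.9},
#          {"scale_up": lambda: print("scaling up")},
#          ThresholdPolicy(), steps=3, period_s=0.1)
```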

For the purpose of this survey, the most important phase in the autonomic control loop is indeed the one where decisions are made (i.e., the decision stage, referring to the self-adaptation or to the ODA representations, or the analyzing and planning phases, referring to the MAPE-K formalism). In fact, reinforcement learning provides the possibility of learning a decision-making mechanism through an online trial and error interaction with the system; provided that information about the current status, the desired goals, and knobs acting on system parameters is supplied by other autonomic components, the decision phase is where RL techniques can be employed to serve AC.

    B. Reinforcement Learning

Reinforcement learning is a branch of AI providing theory and techniques for learning optimal policies in a sequential decision-making situation. Differently from other learning approaches (e.g., supervised learning), RL is based on the assumption of a stochastic environment without the possibility of knowing examples of best actions for specific situations. In RL, an agent builds a policy for solving a certain problem through a trial and error process, receiving feedback from the environment in the form of a reward associated with each tried action [9]. In more detail, the trial and error approach is taken in active RL, whereas in passive RL the agent just observes the evolution of the world, trying to learn the utilities of being in various states. This survey focuses on active RL, since it is a more powerful model, allowing explicit exploration of the state space, and it is supported by the structure of the autonomic control loop (e.g., ODA; see Section I-A). Moreover, it is possible to distinguish between RL algorithms that perform a search in the space of all the possible behaviors (e.g., genetic programming) and algorithms leveraging statistical techniques to estimate the utility of states and actions [4]. Most of the works connecting autonomic computing and reinforcement learning make use of algorithms in the second of these two classes.

In the standard model for RL, an agent is represented as an entity connected to its environment through perception and action, as represented in Figure 2 (borrowed from Russell and Norvig [9]).

[Figure 2: Generic representation of an agent [9].]

It should be quite immediate to see that this model is very similar to those given for the control loop in an autonomic computing system, suggesting the applicability of RL to that context. In more detail, at each step of interaction with the environment, the agent is provided with an input i through its sensors, observing some property of the current state of the environment s; then, the agent chooses an action a to be performed through its actuators (or effectors). This action can change the state of the environment, and this change is reflected back to the agent with a scalar reinforcement signal r, usually in the form of a reward (not shown in Figure 2). The duty of the agent is to learn a behavior B, through a certain algorithm based on a trial and error process, such that the long-run sum of the rewards is maximized. More formally, assuming that the environment can be modeled according to a Markov Decision Process (MDP) [7], a RL model is described by [4]:

- a discrete set S of possible states for the environment;
- a discrete set A of possible actions the agent can perform;
- a set of scalar reinforcement signals, either binary {0, 1} or real-valued, representing the reward.

In the most simple scenario, the percepts gathered by the agent from the environment faithfully and completely describe its current status s ∈ S; in this case the environment is said to be completely observable. More complex models (representing the environment as a Partially Observable Markov Decision Process, POMDP) can take into account the possibility of the environment being only partially observable, i.e., the agent does not get a faithful and complete perception of the state of the environment, but the status is filtered through an input function I. For instance, the perception of the agent could indicate the environment to be in a certain state s_i with probability p_i, with p_i ∉ {0, 1}. The objective of the agent is to learn a policy π which maps states to actions with the aim of maximizing a certain long-run measure of reinforcement (i.e., to maximize a function of the overall achieved reward). What characterizes RL with respect to other supervised learning approaches is that the model admits the agent receiving only immediate information, with no preservation of I/O pairs.
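As a purely illustrative rendering of this model, the sketch below implements a tabular agent of the second class mentioned above (estimating the utility of state-action pairs), using a Q-learning update; the learning rate, discount factor, and exploration rate are placeholder values, not drawn from any surveyed work.

```python
import random
from collections import defaultdict

class QLearningAgent:
    """Tabular agent over discrete states S and actions A with scalar rewards."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)  # Q(s, a) utility estimates
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, s):
        # Epsilon-greedy action selection: explore (trial and error)
        # with probability epsilon, otherwise exploit the current estimates.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s_next):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])
```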

    II. REINFORCEMENT LEARNING TO AID AUTONOMIC COMPUTING

Autonomic computing systems classically rely on an autonomic manager leveraging knowledge on the system as formalized in a model defined at design time. This approach, despite making use of runtime information through an autonomic control loop, somehow fails to embody the full potential of AC, as the system model is predefined. In fact, it is ever more difficult to provide accurate models for computing systems able to let an autonomic control mechanism achieve the desired performance, and these difficulties are a strong limiting factor for the adoption of self-management techniques in contemporary computing systems [14]. Machine learning has been seen as a very promising technique to address such an issue, called the "knowledge bottleneck", as ML can be leveraged to incrementally build a system model through online learning, needing no (or very little) built-in prior knowledge. Moreover, as already pointed out in Section I-B, using the RL operational model within the autonomic control loop (e.g., the MAPE-K loop, considered by Tesauro [14]) appears direct and natural, assuming that the monitoring phase (realized through sensors) provides relevant state descriptions and reward signals. The only major mismatch between the two models is that RL policies are generally seen as reactive planners (i.e., they make immediate decisions, without explicit search or forecasting of future states [14]), while the planning phase in the MAPE-K loop is more general. This mismatch may limit RL applicability in some more complex cases, but it does not impair its use in common cases for AC, such as management of real-time applications, which lie in a reactive context [14].

The remainder of this Section provides an overview of some interesting works found in the literature building RL into an AC scenario to provide the decision phase (or, equivalently, the analyzing and planning phases) of the autonomic control loop.

A. Self-Optimization for QoS

Whiteson and Stone [16], in 2004, were among the first to consider the use of a RL module to realize self-optimization in the context of network routing. They use a learning approach to continuously improve the system performance, and a scheduling algorithm relying on a heuristic to take into account packet specificities such as priorities. The routing scheme is based on an online learning technique called Q-routing, according to which each node of a network is enhanced with a RL module maintaining a table of estimates about the time required to route packets in different ways. The algorithm is implemented in a simulation environment modeling the interactions among the nodes of a network, represented as a graph. Transmitted packets are modeled as jobs and a reward based on a utility function is associated with each completed job (i.e., routed packet). The simulation results are positive, but this approach only deals with one QoS dimension, the routing time, thus not exploiting the multi-objective capabilities of RL algorithms.
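The core of the Q-routing idea can be sketched as follows: each node keeps a table estimating the delivery time to each destination via each neighbor and refines it from the estimates reported back by neighbors. This sketch only illustrates the general update; the exact formulation, parameters, and scheduling heuristic used by Whiteson and Stone are not reproduced here.

```python
class QRoutingNode:
    """Per-node table of estimated delivery times, updated RL-style."""
    def __init__(self, node_id, alpha=0.5):
        self.node_id = node_id
        self.alpha = alpha
        self.q = {}  # q[(dest, neighbor)] = estimated time to dest via neighbor

    def best_neighbor(self, dest, neighbors):
        """Route the packet via the neighbor with the lowest current estimate."""
        return min(neighbors, key=lambda n: self.q.get((dest, n), 0.0))

    def update(self, dest, neighbor, queue_time, transit_time, neighbor_estimate):
        """Move the estimate towards the observed local delays plus the
        neighbor's own best estimate for the same destination."""
        old = self.q.get((dest, neighbor), 0.0)
        target = queue_time + transit_time + neighbor_estimate
        self.q[(dest, neighbor)] = old + self.alpha * (target - old)
```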

    B. Learning for Multiple Objectives

One of the strengths of RL applied to AC is the possibility of specifying a multi-objective reward function to obtain learning towards more than one dimension. Amoui, Salehie, Mirarab, and Tahvildari [1] describe their work in the context of autonomic software and build RL into a MAPE-K adaptation loop at the planning phase to learn a policy for selecting the best adaptive action at a given time. The reasons put forth by the authors for employing RL are mainly four:

- the chosen RL algorithm provides multi-objective learning;
- the RL agent can be modified (by adding punishments when the goals are not satisfied) to perform both reactive and deliberative decision making;
- RL provides dynamic online learning, providing the ability of adapting to previously unseen situations and of managing uncertainty;
- RL can be very time-efficient, with algorithms for decision making performing in O(m) time for m possible actions on a learned policy.

According to the authors, these reasons highlight how RL can be a promising solution to the problem of planning in an autonomic manager. The authors also address the problem of exploration through trial and error, which may be problematic in cases when making wrong decisions for learning is unacceptable. Three possible solutions are proposed:

- having a learning phase during the testing of the system to be used for exploration;
- initializing the learning algorithm with values determined by human experts, so that the initial exploration is more focused;
- relying on simulation to perform the learning phase before the actual system is implemented.

Based on this rationale, the authors propose a RL-based decision maker built on the State-Action-Reward-State-Action (SARSA) algorithm [4]; the model for the decision-making process is represented in Figure 3.

[Figure 3: Process model of the RL-based decision maker proposed by Amoui et al.; adapted from the original [1]. A state generator and a state mapper turn the attributes observed through sensors into a state s_t, a reward function produces the reward r_t, and the RL engine selects the action a_t applied to the adaptable web application through effectors.]

The monitoring process is modeled as the measurement of a set of attributes of the environment at_i, with i ∈ {1, . . . , n}, and the objectives are represented as a set of k goals, with a set of k binary variables G(s) = {g_1(s), . . . , g_k(s)} indicating whether each goal is or is not being met. The possible adaptation actions are represented in a set AC = {a_1, . . . , a_m}. The major modules involved in the process are: a state generator, discretizing the observed values of the attributes; a state mapper, aggregating the discretized attributes into a single key representing the state; a reward function, computing the reinforcement signal from the current values of the k variables in the set G(s); and a RL engine, which both updates the current state model (represented as a Q-table) and selects the next action. The mechanism is implemented and evaluated within a simulation model of a news web application, originally developed by Salehie and Tahvildari [10], and the results show that the system is able to learn to behave better than choosing actions at random, with RL used for learning a policy in a preliminary testing/tuning phase.
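The sketch below mirrors that module structure (state generator, state mapper, reward function, RL engine) around a standard SARSA update. The discretization scheme, the goal encoding, and all parameter values are illustrative assumptions, not the ones used by Amoui et al.

```python
import random
from collections import defaultdict

def state_generator(attributes, bins=4):
    """Discretize raw attribute values (assumed normalized in [0, 1])."""
    return tuple(min(int(v * bins), bins - 1) for v in attributes)

def state_mapper(discretized):
    """Aggregate the discretized attributes into a single state key."""
    return discretized  # the tuple itself serves as the key

def reward_function(goals_met):
    """Reinforcement from the binary goal-satisfaction variables g_1..g_k."""
    return sum(goals_met) / len(goals_met)

class SarsaEngine:
    """RL engine: keeps the Q-table and selects/updates actions (SARSA)."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select(self, s):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s_next, a_next):
        # On-policy SARSA update: uses the action actually selected next.
        td = r + self.gamma * self.q[(s_next, a_next)] - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td
```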

    C. Distributed and Collaborative Scenarios

One of the most interesting problems in autonomic computing research is the management of distributed adaptation policies; this approach is useful in distributed contexts where keeping a consistent global state is too complex (e.g., in a cloud computing environment). This problem has been tackled, at a theoretical level, by Dowling, Cunningham, Curran, and Cahill [2], who propose a reinforcement-learning-based model called Collaborative Reinforcement Learning (CRL) to tackle the complex, time-varying problem of coordinating autonomic components (i.e., agents) distributed in a system with no notion of a global state. CRL extends reinforcement learning with a coordination model describing how agents cooperate to solve a system-wide optimization problem decomposed into a set of Discrete Optimization Problems (DOPs). Figure 4 shows a schema representing the approach to collaborative distributed solution of DOPs. Each DOP is modeled as a MDP.

[Figure 4: Schema of the Collaborative Reinforcement Learning (CRL) approach proposed by Dowling et al. [2]. Agents acting on a partially shared environment exchange advertise(V(s)) and delegate(DOP) messages; each agent i observes its state s_i(t), performs an action a_i(t), and receives a reward r_i(t), with a decay applied to cached advertised values.]

The CRL model solves system-wide problems by specifying how individual agents (i.e., autonomic components) can either resolve a certain DOP via reinforcement learning (i.e., learning a policy to maximize a certain function of the reinforcement signal) and share the solution with the other agents, or delegate the solution to a neighboring agent. Within this model, a DOP may be delegated several times before being eventually handled by an agent; reasons for delegation may be the impossibility for an agent to solve the problem, or its estimated cost of doing so being higher than that foreseen by a neighboring agent. Details on the CRL algorithm are formalized in the paper [2], where the authors also propose a probabilistic on-demand network routing protocol based on CRL, called SAMPLE. This protocol has been implemented in a network simulator framework, and simulation results show how SAMPLE exhibits autonomic self-optimization properties leveraging the CRL algorithm.
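The solve-or-delegate choice at the heart of this coordination model can be sketched as follows; the class and method names are hypothetical and the cost comparison only loosely follows the CRL formulation, which is fully specified in the original paper [2].

```python
class CollaborativeAgent:
    """Agent that either solves a DOP locally or delegates it to a neighbor
    advertising a lower estimated cost."""
    def __init__(self, name, local_cost_estimate):
        self.name = name
        self.local_cost_estimate = local_cost_estimate  # callable: DOP -> cost
        self.advertised = {}  # neighbor name -> advertised cost estimate

    def receive_advertisement(self, neighbor, cost):
        # Cache the neighbor's advertised estimate (CRL also decays these).
        self.advertised[neighbor] = cost

    def handle(self, dop):
        """Return ('solve', self) or ('delegate', neighbor) for the given DOP."""
        own_cost = self.local_cost_estimate(dop)
        cheaper = {n: c for n, c in self.advertised.items() if c < own_cost}
        if cheaper:
            target = min(cheaper, key=cheaper.get)
            return ("delegate", target)
        return ("solve", self.name)
```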

The application of RL in a distributed AC scenario is treated, at a more practical level, also by Rao, Bu, Wang, and Xu [8], who present a distributed RL algorithm that facilitates the provisioning of virtualized resources in cloud computing in an autonomic fashion. They use a reinforcement learning algorithm to manage autonomic allocation of virtual resources to VMs upon changes in the applications' workload. By doing so, VM resources can be automatically provisioned to match the applications' current demand, rather than the peak one. The proposed approach is based on model-based RL; the states considered in the learning algorithm are the possible VM resource allocations, and the available changes to the allocations form the set of actions. The reinforcement signal is fed to the RL decision mechanism whenever it decides to adjust the resource allocation for the VMs, and it consists of performance feedback from individual VMs. After a sufficient interaction time with the environment (exploring the solution space by trying different configurations and receiving feedback), the controller is shown to be able to obtain good estimates of the allocation decisions, given the state of the workload on the different VMs. Further results show that, starting from an arbitrary initial setup, the controller is able to choose optimal resource allocations for the managed VMs.
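One possible encoding of such a state/action/reward formulation is sketched below; the resource levels, the single-step action structure, and the averaged reward are assumptions made for illustration and do not reflect the specific design of Rao et al.

```python
from itertools import product

CPU_SHARES = (1, 2, 4)  # hypothetical CPU allocation levels per VM
MEM_GB = (1, 2, 4)      # hypothetical memory allocation levels per VM

def state_space(num_vms):
    """States: one (cpu, mem) allocation for each of the num_vms VMs."""
    per_vm = list(product(CPU_SHARES, MEM_GB))
    return list(product(per_vm, repeat=num_vms))

def actions(state):
    """Actions: single-step changes (up/down one level) to one VM's resources."""
    acts = [("no_op", None)]
    for vm_idx in range(len(state)):
        for delta in ("cpu_up", "cpu_down", "mem_up", "mem_down"):
            acts.append((delta, vm_idx))
    return acts

def reward(per_vm_performance_feedback):
    """Aggregate per-VM performance feedback into one scalar reinforcement."""
    return sum(per_vm_performance_feedback) / len(per_vm_performance_feedback)
```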

    D. Hybrid Approaches

Under some circumstances, reinforcement learning may be paired with different techniques to get better results in terms of autonomic management. Vienne and Sourrouille [15] associate RL with a control mechanism to improve and adapt the QoS management policy in a dynamic execution environment. They present a middleware consisting of a layer for resource allocation with the goal of managing QoS in a computing system characterized by a dynamic runtime behavior. RL is used to estimate the benefit of taking a certain action when the environment (i.e., the managed computing system) presents a certain state. The main advantage brought by the proposed middleware is that the application designer does not need to worry much about the performance of the applications, dealing instead with high-level descriptions; moreover, the system is made capable of coping with unexpected changes in the execution context. The use of RL along with a control mechanism requires less pre-fed information on the system (e.g., a control-theory-based controller needs a precise model of the system in order to be effective). Even though the RL controller provides fewer guarantees, it requires far less a priori information on the controlled system.

Tesauro, Jong, Das, and Bennani [13] tackle the problem of avoiding wild exploration when bootstrapping the RL system in scenarios where the cost of taking overly wrong decisions is higher than their learning benefit for the algorithm. The proposed solution couples RL with a policy based on a queuing model: to bootstrap the RL controller, the managed system is ruled by the queuing-model-based policy while the RL algorithm is trained offline on the collected data. This hybrid approach allows the RL controller to bootstrap from existing management policies, substantially reducing the duration and costliness of learning. The effectiveness of the approach is tested in the context of a simple datacenter prototype.
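Structurally, this bootstrap idea amounts to logging transitions while an existing policy is in control and replaying them through the RL update rule before the learned policy takes over. The sketch below shows only that structure, under assumed function names; it is not the specific (neural-network-based) training procedure of Tesauro et al.

```python
def collect_trace(existing_policy, env_step, initial_state, steps):
    """Run the existing (e.g., queuing-model-based) policy and log transitions."""
    trace, s = [], initial_state
    for _ in range(steps):
        a = existing_policy(s)
        s_next, r = env_step(s, a)   # env_step: (state, action) -> (next_state, reward)
        trace.append((s, a, r, s_next))
        s = s_next
    return trace

def offline_train(agent, trace, epochs=10):
    """Replay logged transitions through the agent's update rule (e.g., the
    Q-learning update sketched in Section I-B) before deploying it online."""
    for _ in range(epochs):
        for s, a, r, s_next in trace:
            agent.update(s, a, r, s_next)
    return agent
```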

    III. CONCLUSIONS AND PERSPECTIVE

Within the field of autonomic management of computing systems and self-* system-management problems, reinforcement learning was introduced as a novel and radically different approach with respect to the decision-making techniques classically leveraged in such scenarios. The first applications of RL in this context are still relatively young, but different works in the literature have explored applications of RL in various scenarios. The main strength of RL with respect to other decision-making methods is requiring less system-specific knowledge while still being able to synthesize reasonably near-optimal policies. Certainly, there are still unresolved issues with RL, mainly related to near-eternal training times, highly complex state descriptions, and poor performance while learning due to random exploration, which may be too costly in some scenarios. Some of these problems have been tackled, e.g., using a hybrid approach to dispense with the cost of online learning from scratch through pure trial and error. These advances, together with applications of RL to real-world problems (e.g., resource allocation for VM management), are showing that RL can be truly effective as a decision mechanism for AC and that, with more research, it will be possible to fulfill its promise of outperforming well-established autonomic control techniques.

REFERENCES

[1] M. Amoui, M. Salehie, S. Mirarab, and L. Tahvildari. Adaptive action selection in autonomic software using reinforcement learning. In Fourth International Conference on Autonomic and Autonomous Systems (ICAS 2008), pages 175-181, March 2008.
[2] J. Dowling, R. Cunningham, E. Curran, and V. Cahill. Collaborative reinforcement learning of autonomic behaviour. In Proceedings of the 15th International Workshop on Database and Expert Systems Applications, 2004.
[3] Paul Horn. Autonomic computing: IBM's perspective on the state of information technology, Oct 2001. [Online] Available: http://www.research.ibm.com/autonomic/manifesto/autonomic_computing.pdf.
[4] Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4(1):237-285, May 1996.
[5] Jeffrey O. Kephart and David M. Chess. The vision of autonomic computing. Computer, 36(1):41-50, January 2003.
[6] J. O. Kephart. Research challenges of autonomic computing. In Proceedings of the 27th International Conference on Software Engineering (ICSE '05), pages 15-22, New York, NY, USA, 2005. IEEE.
[7] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, USA, 1st edition, 1994. ISBN 0471619779.
[8] Jia Rao, Xiangping Bu, Kun Wang, and Cheng-Zhong Xu. Self-adaptive provisioning of virtualized resources in cloud computing. SIGMETRICS Performance Evaluation Review, 39(1):321-322, June 2011.
[9] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall Press, Upper Saddle River, NJ, USA, 3rd edition, 2009. ISBN 0136042597, 9780136042594.
[10] Mazeiar Salehie and Ladan Tahvildari. A weighted voting mechanism for action selection problem in self-adaptive software. In Proceedings of the First International Conference on Self-Adaptive and Self-Organizing Systems (SASO '07), pages 328-331, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2906-2. doi: 10.1109/SASO.2007.4.
[11] Mazeiar Salehie and Ladan Tahvildari. Self-adaptive software: Landscape and research challenges. ACM Transactions on Autonomous and Adaptive Systems, 4(2):1-42, May 2009.
[12] Filippo Sironi, Davide Basilio Bartolini, Simone Campanoni, Fabio Cancare, Henry Hoffmann, Donatella Sciuto, and Marco D. Santambrogio. Metronome: operating system level performance management via self-adaptive computing. In Proceedings of the 49th Annual Design Automation Conference (DAC '12), pages 856-865, New York, NY, USA, 2012. ACM.
[13] G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. A hybrid reinforcement learning approach to autonomic resource allocation. In Proceedings of the IEEE International Conference on Autonomic Computing (ICAC '06), pages 65-73, June 2006.
[14] Gerald Tesauro. Reinforcement learning in autonomic computing: A manifesto and case studies. IEEE Internet Computing, 11(1):22-30, Jan.-Feb. 2007.
[15] Patrice Vienne and Jean-Louis Sourrouille. A middleware for autonomic QoS management based on learning. In Proceedings of the 5th International Workshop on Software Engineering and Middleware (SEM '05), pages 1-8, New York, NY, USA, 2005. ACM.
[16] Shimon Whiteson and Peter Stone. Towards autonomic computing: Adaptive network routing and scheduling. In Proceedings of the International Conference on Autonomic Computing, pages 286-287, 2004. doi: 10.1109/ICAC.2004.62.

June 29, 2012. Document produced with LaTeX.
