
  • Slide 1
  • Lecture 10 & 11, Multi-Agent Systems. University “Politehnica” of Bucharest, 2004-2005. Adina Magda Florea [email protected] http://turing.cs.pub.ro/blia_2005 http://turing.cs.pub.ro/
  • Slide 2
  • Machine Learning: Lecture outline
    1 Learning in AI (machine learning)
    2 Learning decision trees
    3 Version space learning
    4 Reinforcement learning
    5 Learning in multi-agent systems
      5.1 Learning action coordination
      5.2 Learning individual performance
      5.3 Learning to communicate
      5.4 Layered learning
    6 Conclusions
  • Slide 3
  • 1 Learning in AI
    What is machine learning? Herbert Simon defines learning as “any change in a system that allows it to perform better the second time on repetition of the same task or on another task drawn from the same population” (Simon, 1983).
    In ML the agent learns:
    - knowledge representation of the problem domain
    - problem-solving rules, inferences
    - problem-solving strategies
  • Slide 4
  • Classifying learning
    In MAS learning the agents should learn:
    - what an agent learns in ML, but in the context of MAS (both cooperative and self-interested agents)
    - how to cooperate for problem solving (cooperative agents)
    - how to communicate (both cooperative and self-interested agents)
    - how to negotiate (self-interested agents)
    Different dimensions:
    - explicitly represented domain knowledge
    - how the critic component (performance evaluation) of a learning agent works
    - the use of knowledge of the domain/environment
  • Slide 5
  • Single-agent learning
    [Diagram: data and feedback from the environment and a teacher feed the learning process; its learning results drive problem solving (K & B, inferences, strategy), whose results go to performance evaluation, which feeds back into learning.]
  • Slide 6
  • NB: Both in this diagram and the next, not all components or flow arrows are always present; it depends on the type of agent (cognitive, reactive), the type of learning, etc.
    Self-interested learning agent
    [Diagram: as in the single-agent case, but K & B covers both Self and Other agents, the teacher is replaced by feedback from another agent, and the agent exchanges communication and actions with the environment.]
  • Slide 7
  • Cooperative learning agents
    [Diagram: two learning agents, each with its own learning process and problem solving (K & B about Self and Other agents, inferences, strategy), communicating with each other and exchanging actions, data, and feedback with the shared environment; performance evaluation closes the loop.]
  • Slide 8
  • 2 Learning decision trees
    ID3 (Quinlan, 1980s) classifies training examples into several classes.
    Training examples: attributes and values. Two phases:
    - build the decision tree
    - use the tree to classify unknown instances
    Decision tree example:
      Shape     Color    Size   Classification
      circle    red      small  +
      circle    red      big    +
      triangle  yellow   small  -
      circle    yellow   small  -
      triangle  red      big    -
      circle    yellow   big    -
  • Slide 9
  • The problem of estimating an individual's credit risk on the basis of: credit history, current debt, collateral, income.
      No.  Risk (Classification)  Credit History  Debt  Collateral  Income
      1    High                   Bad             High  None        $0 to $15k
      2    High                   Unknown         High  None        $15 to $35k
      3    Moderate               Unknown         Low   None        $15 to $35k
      4    High                   Unknown         Low   None        $0 to $15k
      5    Low                    Unknown         Low   None        Over $35k
      6    Low                    Unknown         Low   Adequate    Over $35k
      7    High                   Bad             Low   None        $0 to $15k
      8    Moderate               Bad             Low   Adequate    Over $35k
      9    Low                    Good            Low   None        Over $35k
      10   Low                    Good            High  Adequate    Over $35k
      11   High                   Good            High  None        $0 to $15k
      12   Moderate               Good            High  None        $15 to $35k
      13   Low                    Good            High  None        Over $35k
      14   High                   Bad             High  None        $15 to $35k
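A minimal sketch of this table as Python data, reused by the entropy and gain snippets that follow; the attribute names and the `examples` variable are our own naming, not from the original slides.

```python
# Credit-risk training examples (rows 1-14 of the table above).
examples = [
    {"history": "bad",     "debt": "high", "collateral": "none",     "income": "0-15k",  "risk": "high"},
    {"history": "unknown", "debt": "high", "collateral": "none",     "income": "15-35k", "risk": "moderate" if False else "high"},
    {"history": "unknown", "debt": "low",  "collateral": "none",     "income": "15-35k", "risk": "moderate"},
    {"history": "unknown", "debt": "low",  "collateral": "none",     "income": "0-15k",  "risk": "high"},
    {"history": "unknown", "debt": "low",  "collateral": "none",     "income": "35k+",   "risk": "low"},
    {"history": "unknown", "debt": "low",  "collateral": "adequate", "income": "35k+",   "risk": "low"},
    {"history": "bad",     "debt": "low",  "collateral": "none",     "income": "0-15k",  "risk": "high"},
    {"history": "bad",     "debt": "low",  "collateral": "adequate", "income": "35k+",   "risk": "moderate"},
    {"history": "good",    "debt": "low",  "collateral": "none",     "income": "35k+",   "risk": "low"},
    {"history": "good",    "debt": "high", "collateral": "adequate", "income": "35k+",   "risk": "low"},
    {"history": "good",    "debt": "high", "collateral": "none",     "income": "0-15k",  "risk": "high"},
    {"history": "good",    "debt": "high", "collateral": "none",     "income": "15-35k", "risk": "moderate"},
    {"history": "good",    "debt": "high", "collateral": "none",     "income": "35k+",   "risk": "low"},
    {"history": "bad",     "debt": "high", "collateral": "none",     "income": "15-35k", "risk": "high"},
]
```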
  • Slide 10
  • Decision tree:
      Income?
        $0K-$15K  -> High risk
        $15K-$35K -> Credit history?
                       Unknown -> Debt?
                                    High -> High risk
                                    Low  -> Moderate risk
                       Bad     -> High risk
                       Good    -> Moderate risk
        Over $35K -> Credit history?
                       Unknown -> Low risk
                       Bad     -> Moderate risk
                       Good    -> Low risk
    ID3 assumes that the simplest decision tree covering all the training examples is the one that should be picked: Ockham's (Occam's) Razor, c. 1324: “It is vain to do with more what can be done with less. Entities should not be multiplied beyond necessity.”
  • Slide 11
  • Information-theoretic test selection in ID3
    - Information theory measures the information content of a message. For a message M = {m1, ..., mn} with probabilities p(mi), the information content of M is:
        I(M) = Sum i=1..n [-p(mi) * log2(p(mi))]
    - Fair coin:
        I(Coin_toss) = -p(heads)*log2(p(heads)) - p(tails)*log2(p(tails))
                     = -1/2*log2(1/2) - 1/2*log2(1/2) = 1 bit
    - Biased coin with p(heads) = 3/4:
        I(Coin_toss) = -3/4*log2(3/4) - 1/4*log2(1/4) = 0.811 bits
    - In the credit example: p(risk_high) = 6/14, p(risk_moderate) = 3/14, p(risk_low) = 5/14
    - The information in any tree that covers the examples:
        I(Tree) = -6/14*log2(6/14) - 3/14*log2(3/14) - 5/14*log2(5/14) = 1.531 bits
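A minimal sketch of this computation in Python; the function name `info_content` is ours, not from the slides.

```python
import math

def info_content(probabilities):
    """I(M) = Sum_i -p(m_i) * log2(p(m_i))."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(info_content([1/2, 1/2]))          # fair coin: 1.0 bit
print(info_content([3/4, 1/4]))          # biased coin: ~0.811 bits
print(info_content([6/14, 3/14, 5/14]))  # credit-risk classes: ~1.531 bits
```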
  • Slide 12
  • The information gain provided by making a test on attribute A at the root:
        Gain(A) = I(C) - E(A)
    where E(A) is the amount of information still needed to complete the tree after making A the root. If A has n values that partition the training set C into {C1, C2, ..., Cn}:
        E(A) = Sum i=1..n (|Ci| / |C|) * I(Ci)
    For the income attribute: C1 = {1,4,7,11}, C2 = {2,3,12,14}, C3 = {5,6,8,9,10,13}
        E(income) = 4/14 * I(C1) + 4/14 * I(C2) + 6/14 * I(C3) = 0.564
        Gain(income) = 1.531 - 0.564 = 0.967 bits
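Continuing the sketch above (it reuses `info_content` and the `examples` list), a hedged implementation of E(A) and Gain(A); the helper names are ours.

```python
from collections import Counter

def class_info(records, target="risk"):
    """I(C): information content of the class distribution of a set of records."""
    counts = Counter(r[target] for r in records)
    total = sum(counts.values())
    return info_content([c / total for c in counts.values()])

def gain(records, attribute, target="risk"):
    """Gain(A) = I(C) - E(A), where E(A) weights each partition by its size."""
    total = len(records)
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r for r in records if r[attribute] == value]
        remainder += len(subset) / total * class_info(subset, target)
    return class_info(records, target) - remainder

print(round(gain(examples, "income"), 3))   # ~0.967
print(round(gain(examples, "history"), 3))  # lower, so income is the better root test
```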
  • Slide 13
  • Assessing the performance of ID3: training set and test set; average prediction quality (“happy graphs”).
  • Broadening the applicability of decision trees:
    - Missing data: how do we classify an instance that is missing one of the test attributes? Pretend the instance has all possible values for that attribute, weight each value according to its frequency among the examples, follow all branches, and multiply the weights along each path.
    - Multivalued attributes: an attribute with a large number of possible values gets an inflated gain; the gain ratio compensates by selecting attributes according to Gain(A) / I(C_A), where I(C_A) is the information content of the attribute's own value distribution.
    - Continuous-valued attributes: discretize.
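A sketch of the gain-ratio correction, again reusing `info_content`, `gain`, `Counter`, and `examples` from the snippets above; `split_info` and `gain_ratio` are our names, and reading I(C_A) as the attribute's split information is our interpretation of the slide's formula.

```python
def split_info(records, attribute):
    """I(C_A): information content of the attribute's own value distribution."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return info_content([c / total for c in counts.values()])

def gain_ratio(records, attribute, target="risk"):
    """Gain(A) / I(C_A): penalizes attributes that split into many small subsets."""
    return gain(records, attribute, target) / split_info(records, attribute)

print(round(gain_ratio(examples, "income"), 3))
```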
  • Slide 14
  • 3 Version space learning
    - Let P and Q be the sets of objects that match the FOPL expressions p and q. Expression p is more general than q iff P ⊇ Q; we say that p covers q. Example: color(X, red) covers color(ball, red).
    - If concept p is more general than concept q, let p(x) and q(x) be descriptions that classify objects as positive examples:
        for all x: p(x) -> positive(x)
        for all x: q(x) -> positive(x)
    - p covers q iff q(x) -> positive(x) is a logical consequence of p(x) -> positive(x).
    - Concept space: obj(X, Y, Z)
    - A concept c is maximally specific if it covers all positive examples, none of the negative examples, and, for any other concept c' that covers the positive examples, c' is more general than c. The set of such concepts is S.
    - A concept c is maximally general if it covers none of the negative examples and, for any other concept c' that covers no negative example, c is more general than c'. The set of such concepts is G.
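A minimal sketch of covering and generality for the obj(Size, Color, Shape) space, assuming concepts are Python tuples in which the string "?" stands for a variable; this encoding is our choice, not from the slides.

```python
# A hypothesis is a tuple like ("?", "red", "ball"); "?" matches any value.

def covers(hypothesis, instance):
    """True if the hypothesis matches the (fully ground) instance."""
    return all(h == "?" or h == i for h, i in zip(hypothesis, instance))

def more_general(p, q):
    """p is more general than q if p matches everything q matches."""
    return all(hp == "?" or hp == hq for hp, hq in zip(p, q))

print(covers(("?", "red", "ball"), ("small", "red", "ball")))  # True
print(more_general(("?", "red", "?"), ("?", "red", "ball")))   # True
```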
  • Slide 15
  • The candidate elimination algorithm
    Algorithms for searching the concept space; the risks are overgeneralization and overspecialization.
    Specific-to-general search for the hypothesis set S (maximally specific generalizations):
    - Initialize S to the first positive training instance.
    - Let N be the set of all negative instances seen so far.
    - For each positive instance p:
        for every s in S, if s does not match p, replace s with its most specific generalization that matches p;
        delete from S all hypotheses more general than some other hypothesis in S;
        delete from S all hypotheses that match a previously observed negative instance in N.
    - For every negative instance n:
        delete all members of S that match n;
        add n to N, to check future hypotheses for overgeneralization.
  • Slide 16
  • The candidate elimination algorithm
    General-to-specific search for the hypothesis set G (maximally general specializations):
    - Initialize G to contain the most general concept in the space.
    - Let P be the set of all positive instances seen so far.
    - For each negative instance n:
        for every g in G that matches n, replace g with its most general specializations that do not match n;
        delete from G all hypotheses more specific than some other hypothesis in G;
        delete from G all hypotheses that fail to match some positive instance in P.
    - For every positive instance p:
        delete all members of G that fail to match p;
        add p to P, to check future hypotheses for overspecialization.
  • Slide 17
  • Candidate elimination: bidirectional search, maintaining both S and G (a code sketch follows the trace on the next slide).
    - Initialize G to the most general concept in the space; initialize S to the first positive training instance.
    - For each positive instance p:
        delete from G all hypotheses that fail to match p;
        for every s in S, if s does not match p, replace s with its most specific generalization that matches p;
        delete from S all hypotheses more general than some other hypothesis in S;
        delete from S all hypotheses more general than some hypothesis in G.
    - For every negative instance n:
        delete all members of S that match n;
        for every g in G that matches n, replace g with its most general specializations that do not match n;
        delete from G all hypotheses more specific than some other hypothesis in G;
        delete from G all hypotheses more specific than some other hypothesis in S.
  • Slide 18
  • Trace of candidate elimination on the obj(X, Y, Z) concept space:
      G: {obj(X, Y, Z)}       S: {}                        next: Positive obj(small, red, ball)
      G: {obj(X, Y, Z)}       S: {obj(small, red, ball)}   next: Negative obj(small, blue, ball)
      G: {obj(X, red, Z)}     S: {obj(small, red, ball)}   next: Positive obj(large, red, ball)
      G: {obj(X, red, Z)}     S: {obj(X, red, ball)}       next: Negative obj(large, red, cube)
      G: {obj(X, red, ball)}  S: {obj(X, red, ball)}       (S and G have converged)
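A compact sketch of the bidirectional algorithm for this conjunctive attribute space, reusing `covers` and `more_general` from the earlier sketch. The value domains and helper names are our assumptions, the first training instance is assumed positive, and the within-S pruning is omitted since S stays a singleton in this example. Running it reproduces the trace above.

```python
# Assumed value domains for the obj(Size, Color, Shape) space.
DOMAINS = [["small", "large"], ["red", "blue", "yellow"], ["ball", "cube", "brick"]]

def generalize(s, p):
    """Most specific generalization of s that matches positive instance p."""
    return tuple(sv if sv == pv else "?" for sv, pv in zip(s, p))

def specializations(g, n):
    """Most general specializations of g that do not match negative instance n."""
    result = []
    for pos, gv in enumerate(g):
        if gv == "?":
            result += [g[:pos] + (v,) + g[pos+1:] for v in DOMAINS[pos] if v != n[pos]]
    return result

def candidate_elimination(training):
    """training: list of (instance, is_positive) pairs; returns the S and G sets."""
    G, S = [("?", "?", "?")], []
    for inst, positive in training:
        if positive:
            G = [g for g in G if covers(g, inst)]                 # drop g failing p
            S = [generalize(s, inst) for s in S] if S else [inst]
            S = [s for s in S if any(more_general(g, s) for g in G)]
        else:
            S = [s for s in S if not covers(s, inst)]             # drop s matching n
            new_G = []
            for g in G:
                if covers(g, inst):
                    new_G += [h for h in specializations(g, inst)
                              if any(more_general(h, s) for s in S)]
                else:
                    new_G.append(g)
            # drop G members more specific than some other member of G
            G = [g for g in new_G
                 if not any(h != g and more_general(h, g) for h in new_G)]
    return S, G

training = [(("small", "red", "ball"), True), (("small", "blue", "ball"), False),
            (("large", "red", "ball"), True), (("large", "red", "cube"), False)]
print(candidate_elimination(training))  # both converge to ('?', 'red', 'ball')
```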
  • Slide 19
  • 4 Reinforcement learning
    - Combines dynamic programming and AI machine learning techniques.
    - Trial-and-error interactions with a dynamic environment.
    - The feedback of the environment is a reward, or reinforcement.
    - Two main approaches: search in the space of behaviors (e.g., with genetic algorithms), or learn the utility of states and actions using statistical techniques and dynamic programming methods.
  • Slide 20
  • A reinforcement-learning model
    - B: the agent's behavior
    - i: input = the current state of the environment
    - r: the value of the reinforcement (reward)
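A minimal sketch of this model as an agent-environment loop, using Q-learning as one concrete choice of utility-learning behavior B; the `env` object, its methods (`reset`, `step`, `actions`), and all parameter values are illustrative assumptions, not from the slides.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """B maps the input i (current state) to an action; r is the reinforcement."""
    Q = defaultdict(float)  # Q[(state, action)]: learned utility estimate
    for _ in range(episodes):
        i = env.reset()                    # i: current state of the environment
        done = False
        while not done:
            # Behavior B: epsilon-greedy over the learned utilities.
            if random.random() < epsilon:
                a = random.choice(env.actions(i))
            else:
                a = max(env.actions(i), key=lambda act: Q[(i, act)])
            next_i, r, done = env.step(a)  # r: value of the reinforcement
            best_next = 0.0 if done else max(Q[(next_i, act)]
                                             for act in env.actions(next_i))
            Q[(i, a)] += alpha * (r + gamma * best_next - Q[(i, a)])
            i = next_i
    return Q
```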