
5.1 Benefits of Probabilistic Modeling

Figure 1. Performance indicators for the 33-34-33 population with 1 subtask per task: (a) Rewards per tick, (b) Rewards per task, and (c) Percentage of winning bids.

Decision Making: Types of Tasks Agents Bid for and Win. The growth rate of average rewards per tick for P-agents shown in Figure 1(a) implies that probabilistic reasoning might help losing agents switch their bids to tasks with higher chances of winning but possibly lower rewards per task. Indeed, Figure 1(c) shows that when the environment is not overly competitive, i.e., with 50 or more available tasks, P-agents on average still win more than 50% of their bids (up to 75% in some cases, and even 100% in the closed-environment case) and get a task assigned, whereas non-P-agents only manage to win a bid about 25%–37% of the time.

To investigate further, Figure 1(b) shows that non-P-agents consistently bid for higher-rewarding tasks than those preferred by P-agents, regardless of the chances of winning. This suggests that P-agents use probabilistic modeling to trade reward magnitude for better chances of winning bids. Furthermore, Figure 1(b) shows that P-agents eventually start to bid for higher-rewarding tasks once the environment becomes saturated with tasks (e.g., > 100 tasks). This implies that probabilistic modeling helps P-agents adapt to the competitiveness of the environment.

Rate of Exploration: Uniqueness of Bids Made. To further support the above observations, we investigate the rate of exploration for new tasks among P-agents and non-P-agents, and examine how probabilistic modeling helps agents explore more of the tasks available in the environment. More specifically, we compare agents' exploration behavior by looking at the ratio between the number of unique task types they bid for (UBM) and the total number of bids they made (BM) in a game. We define this ratio as the rate of exploration (ROE) for each agent. That is, if agents bid for new types of tasks in the environment (i.e., they explore, possibly to adapt to low winning rates or reward gains), the ratio of UBM to BM will be higher, and vice versa. Figure 2 shows the average rate of exploration (i.e., UBM/BM) for all agents, with 1 subtask for each task. From Figure 2, we have the following observations.
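The ROE metric defined above can be sketched as follows; the function name and the representation of a bid history as a list of task-type identifiers are our assumptions for illustration, not details from the simulation code:

```python
def rate_of_exploration(bids):
    """ROE = UBM / BM: unique task types bid for, over total bids made in a game."""
    if not bids:
        return 0.0
    return len(set(bids)) / len(bids)

# An agent that keeps re-bidding on the same task types explores little:
print(rate_of_exploration(["transport", "transport", "scout", "transport"]))  # 0.5
```

An agent bidding on a fresh task type every time would have ROE = 1.0, while an agent fixated on a single task type approaches ROE = 1/BM.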

Figure 2. Rate of exploration (UBM/BM) for 10 to 200 tasks with 1 subtask.

First, we observe that the percentage of unique bids made by P-agents starts from a high value (50% or above), then drops rapidly when the environment has 20 to 40 tasks, and eventually converges to the level achieved by non-P-agents as more tasks become available (e.g., > 100 tasks). This supports the claim that probabilistic modeling helps agents bid rationally by avoiding bids with higher chances of losing, especially when they are acting solo. In contrast, a high rate of exploration is not observed among non-P-agents, indicating that they bid eagerly, and somewhat blindly, only for tasks with high rewards, regardless of winning probabilities.

5.4 Impact of Agent’s Born-Type

Our previous investigations have provided us with baseline performance results for agents with four bidding strategies under various needs of collaboration (i.e., with tasks that each require a different number of subtasks, e.g., 1–6 subtasks). To extend those results and further investigate the impact of probabilistic modeling, we analyze agents' performance in the same environmental settings but with different born types (i.e., the maximum level of expertise an agent was born with).

First, an apprentice agent has a novice level of expertise for all its capabilities. For such an agent, each level of expertise is generated using a Gaussian distribution with mean = 0.15 and standard deviation = 0.175.

Second, a specialist agent is an expert that has at least one capability with a high level of expertise; i.e., generated using a Gaussian distribution with mean = 0.85 and standard deviation = 0.175.

Third, a generalist agent is average in some of its capabilities and a novice in the remaining ones. For such an agent, its average levels of expertise are generated using a Gaussian distribution with mean = 0.5 and standard deviation = 0.0667, while its novice levels of expertise follow the same distribution as defined for an apprentice agent above.
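The three born types can be sketched as below. The distribution parameters come from the text; everything else is an assumption for illustration: the clipping of draws to [0, 1], the uniform-random choice of which capability a specialist excels in, and the coin flip deciding which capabilities a generalist is average in are not specified in the draft.

```python
import random

def sample_expertise(born_type, n_capabilities=7):
    """Sample per-capability expertise levels for an agent of a given born type.

    Means/standard deviations follow the text; clipping to [0, 1] is an
    assumption, since raw Gaussian draws can fall outside a valid range.
    """
    def draw(mean, sd):
        return min(1.0, max(0.0, random.gauss(mean, sd)))

    if born_type == "apprentice":            # novice in every capability
        return [draw(0.15, 0.175) for _ in range(n_capabilities)]
    if born_type == "specialist":            # at least one expert-level capability
        levels = [draw(0.15, 0.175) for _ in range(n_capabilities)]
        levels[random.randrange(n_capabilities)] = draw(0.85, 0.175)
        return levels
    if born_type == "generalist":            # average in some, novice in the rest
        return [draw(0.5, 0.0667) if random.random() < 0.5
                else draw(0.15, 0.175)
                for _ in range(n_capabilities)]
    raise ValueError(f"unknown born type: {born_type}")
```

The number of capabilities defaults to 7, matching the environment configuration in Tables 1 and 2.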

In this investigation, we focus mainly on two sets of simulations: (1) an agent population with 33% apprentices, 34% generalists, and 33% specialists, shown in Table 1; and (2) an agent population with 50% apprentices, 50% specialists, and no generalists, shown in Table 2.

Number of tasks: 10 – 200, step 10
Number of subtasks: 1 – 6, step 1
Number of agents: 100
Number of capabilities: 7
AO/TO: 0.00 – 0.05, step 0.01
AD/TD:
Agent born-type: 33% Apprentices, 34% Generalists, 33% Specialists

Table 1. Detailed configuration of the simulation environment for the 33-34-33 population.

Number of tasks: 10 – 200, step 10
Number of subtasks: 1 – 6, step 1
Number of agents: 100
Number of capabilities: 7
AO/TO: 0.00 – 0.05, step 0.01
AD/TD:
Agent born-type: 50% Apprentices, 50% Specialists

Table 2. Detailed configuration of the simulation environment for the 50-0-50 population.

5.4.1. Establishing a Baseline Using Tasks with Only One Subtask Each

We start by comparing the performance of the 33-34-33 population against the 50-0-50 population when agents completed tasks on their own, i.e., where each task requires only one subtask. These observations will serve as a baseline and provide context for our later discussions, where agents are required to team up to complete tasks. Figure 1 shows the average rewards per tick for the 33-34-33 and 50-0-50 populations with 10 to 200 tasks and 1 subtask per task. First, for both populations, Figure 1 shows that P-agents are able to obtain more rewards than non-P-agents. Also, all agents are able to improve their performance when there are more tasks in the environment (especially in an open-agent setting), agreeing with our previous conclusion.

Furthermore, Figure 1 shows that P-agents in the 50-0-50 population generate 11.42% fewer rewards on average than those in the 33-34-33 population. Understandably, this is because the 50-0-50 population has fewer agents with a sufficient level of expertise to complete tasks on their own: its generalists are replaced by apprentices. Indeed, Figure 2 shows that generalists obtained better rewards than apprentices in the 33-34-33 population setting. Thus, overall, we conclude that in the "acting solo" scenario, P-agents perform better than non-P-agents, and that agents with a higher level of expertise also perform better.

Figure 1. Average rewards per tick generated by the 33-34-33 and 50-0-50 populations with 1 subtask for each task.

Figure 2. Average rewards per tick generated by apprentices, generalists, and specialists in the 33-34-33 population with 1 subtask for each task.

Is Probabilistic Reasoning Always Beneficial? From Figure 2 we can see that the performance gap in rewards per tick between P-agents and non-P-agents increases with the level of expertise (i.e., from apprentices to specialists). A higher level of expertise gives agents more flexibility to consider different tasks, and with probabilistic reasoning, agents are able to exploit that flexibility to obtain better rewards. In contrast, apprentices benefit less from probabilistic reasoning: their insufficient level of expertise qualifies them for only a smaller subset of tasks compared to the other two types of agents. In an open-agent environment (i.e., AO=0.05), non-P-apprentices even obtain 16.51% higher rewards than P-apprentices. Similar behavior is also observed in the 50-0-50 population, shown in Figure 3. The results in Figures 2 and 3 show that probabilistic reasoning loses its advantage only among apprentices in an open-agent environment; in a closed-agent environment (i.e., AO=0.00), P-apprentices still obtain higher rewards than non-P-apprentices. Therefore, we infer that the advantage of probabilistic reasoning is affected by agent openness, especially for apprentices. Through further investigation we find that, with AO=0.05, agents stay in the same environment for an average of only N ticks before they leave, whereas they stay for 1000 ticks with AO=0.00. A higher AO replaces agents more quickly and thus gives each agent less time in the environment. Therefore, we conclude that probabilistic reasoning loses its advantage among apprentices because they have insufficient time to explore and adapt. In other words, probabilistic reasoning does not benefit apprentices in a more open environment, which matters especially in time-critical domains such as search and rescue or firefighting.
Through an investigation of agents' bidding behavior, we find that P-apprentices make and win fewer bids than non-P-apprentices (shown in Figures I and II in the Appendix) because their reasoning is based on their previous bidding history, most of which consists of failing bids caused by insufficient expertise levels.

Figure 3. Average rewards per tick generated by apprentices and specialists in the 50-0-50 population with 1 subtask for each task.

Does Level of Expertise Affect Agents' Exploration? To better understand agents' bidding strategies, we look at bidding statistics for task-specific performance. More specifically, we first look at the percentage of bids made (PercentBM) and the percentage of tasks assigned (PercentTA) by different types of agents (i.e., apprentices, generalists, and specialists) for the same task. The percentage of bids made is an important statistic for understanding how well each type of agent explores and adapts to the environment. It is also useful, when combined with the percentage of tasks assigned, for investigating whether agents are trying to avoid competition, as we shall see in the next discussion. Figure 4 shows the average percentage of bids made by apprentices, generalists, and specialists for the same task in the 33-34-33 population.[footnoteRef:1] From Figure 4 we can see that with more tasks available in the environment, the percentage of bids made by specialists for the same task decreases, because their higher level of expertise gives them more tasks to choose from. This is to be expected. Furthermore, the decrease is more pronounced among P-specialists because they are more explorative in general, and even more so in an open-task environment. Such behavior is consistent with our investigation of P-agents' uniqueness of bidding targets in Section 5.1. Logically, with more tasks available in the environment, P-agents would explore more and scatter their bids over various tasks, decreasing the percentage of bids made for the same task. However, the percentage of bids made by apprentices increases with more tasks available, as shown in Figures 4 and 5. This indicates that their lower level of expertise forces apprentices to target a limited set of tasks offering the most rewards they can get.
The increase is also partially explained by specialists exploring other tasks and leaving the competition. For generalists, we see a combination of the patterns adopted by apprentices and specialists: the percentage of bids made by generalists first increases with 10 to 50 tasks in the environment and then decreases with 60 or more tasks. That is, when the environment is competitive, with fewer than 50 available tasks, generalists first target a limited set of tasks for better rewards; once the environment becomes more saturated and the competition loosens up, they start to explore more tasks. This pattern is more pronounced in P-generalists, implying that probabilistic reasoning allows generalists and specialists to be more explorative and adaptive. Figure 5 shows the percentage of bids made for the 50-0-50 population; the pattern for apprentices and specialists is similar. Overall, we conclude that a higher level of expertise helps agents explore more of the environment.
[1: For example, if a total of 10 bids were made for task ID 474, of which 4 came from apprentices, 5 from generalists, and 1 from specialists, then the PercentBM for task ID 474 is 0.4, 0.5, and 0.1 for apprentices, generalists, and specialists, respectively. Similarly, if task ID 474 requires 5 agents to complete and the agents assigned to it are 2 apprentices, 2 generalists, and 1 specialist, then the PercentTA for task ID 474 is 0.4, 0.4, and 0.2 for apprentices, generalists, and specialists, respectively.]
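The per-task computation in the footnote's worked example can be reproduced with a short sketch; the task ID and counts come from the footnote, while the function name and the list-of-labels representation are our own:

```python
from collections import Counter

def percent_by_type(events):
    """Fraction of events (bids made, or assignment slots) per agent type.

    Works for both PercentBM (events = bids for one task) and
    PercentTA (events = agents assigned to one task).
    """
    counts = Counter(events)
    total = sum(counts.values())
    return {agent_type: n / total for agent_type, n in counts.items()}

# Bids made for task ID 474: 4 apprentices, 5 generalists, 1 specialist.
bids = ["apprentice"] * 4 + ["generalist"] * 5 + ["specialist"] * 1
print(percent_by_type(bids))
# {'apprentice': 0.4, 'generalist': 0.5, 'specialist': 0.1}

# Assignments for the same 5-agent task: 2 apprentices, 2 generalists, 1 specialist.
assigned = ["apprentice"] * 2 + ["generalist"] * 2 + ["specialist"] * 1
print(percent_by_type(assigned))
# {'apprentice': 0.4, 'generalist': 0.4, 'specialist': 0.2}
```

PercentBM-PercentTA for each agent type, used in the next discussion, is then just the per-type difference of these two dictionaries.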

Figure 4. Percentage of bids made by apprentices, generalists, and specialists in the 33-34-33 population with 1 subtask for each task.

Figure 5. Percentage of bids made by apprentices and specialists in the 50-0-50 population with 1 subtask for each task.

Are Agents Trying to Avoid Competition? To check whether agents fail to get assigned a task due to higher competition or due to an insufficient level of expertise, we look at the difference between the percentage of bids made and the percentage of tasks assigned (PercentBM-PercentTA). Intuitively, a smaller PercentBM means fewer agents of the same type are competing for the same task, and a larger PercentTA means more agents of the same type get assigned to the same task. The difference between the two is useful for checking whether agents get more tasks assigned through smart bidding or through greedy bidding: essentially, the smaller the value of PercentBM-PercentTA, the fewer bids an agent needs to get a task assigned. Figure 6 shows the PercentBM-PercentTA curves for each agent type in the 33-34-33 population. Note that for tasks with only 1 subtask, the percentage of tasks assigned to each agent type is equivalent to the percentage of agents able to complete a task on their own; this definition differs for tasks with 2 or more subtasks, as elaborated in the following sections. From Figure 6 we can see that the performance of P-agents is fairly consistent across all three agent types, with the majority of PercentBM-PercentTA values falling around 0.00. This indicates that probabilistic reasoning helps agents balance the number of bids they make and target tasks with higher chances of assignment. In other words, P-agents are able to avoid higher competition for better rewards. On the other hand, non-P-agents exhibit rather different patterns among apprentices, generalists, and specialists. For non-P-apprentices, a lower level of expertise leads to fewer bids made as well as fewer tasks assigned, especially in a closed-agent environment, where the bids made exceed the tasks assigned.
Non-P-generalists make more bids than non-P-apprentices due to their higher level of expertise; however, more of those tasks are assigned to non-P-specialists, so PercentBM-PercentTA is higher for non-P-generalists than for non-P-apprentices. Without probabilistic reasoning, non-P-generalists waste their bids on tasks for which they cannot outcompete specialists, especially in a closed-agent environment. Lastly, non-P-specialists get more tasks assigned thanks to their higher level of expertise, but target a smaller set of tasks due to their lack of probabilistic reasoning; the resulting PercentBM-PercentTA is much smaller for non-P-specialists than for P-specialists in a closed-agent environment. For the 50-0-50 population, we expect similar results for apprentices and specialists. Indeed, Figure 7 shows that the overall pattern is similar to that of the 33-34-33 population; the major difference is that, without generalists, the percentage of bids made by specialists becomes larger and the resulting PercentBM-PercentTA gets closer to 0 for non-P-specialists.

Figure 6. PercentBM-PercentTA for apprentices, generalists, and specialists in the 33-34-33 population with 1 subtask for each task.

Figure 7. PercentBM-PercentTA for apprentices and specialists in the 50-0-50 population with 1 subtask for each task.

5.4.2. Investigating Agent Collaborations

To further understand the impact of an agent's born type, we expand our previous investigation to tasks with 2 or more subtasks, where agents need to collaborate with others to complete a task. Figure 8 shows the average rewards per tick generated by uniform agents for different numbers of tasks with 1 to 6 subtasks. From Figure 8 we see that the average rewards decrease with more subtasks per task, and the decrease affects both P-agents and non-P-agents. This observation indicates that probabilistic reasoning is not powerful enough to help agents implicitly search for teammates without direct consideration of, and communication about, collaboration. In fact, the performance gap between P-agents and non-P-agents becomes smaller with more subtasks per task. From Figures III, IV, and V in the Appendix we can see that apprentices contribute only a very small portion of the overall rewards, while generalists and specialists generate most of the rewards, differentiated by a small margin.

Figure 8. Average rewards generated by uniform agents with 1 to 6 subtasks.

Appendix

Figure I. Average bids made by apprentices, generalists, and specialists in the uniform setting with 1 subtask.

Figure II. Average bids won by apprentices, generalists, and specialists in the uniform setting with 1 subtask.

Figure III. Average rewards generated by apprentices in the uniform setting with 1 to 6 subtasks.

Figure IV. Average rewards generated by generalists in the uniform setting with 1 to 6 subtasks.

Figure V. Average rewards generated by specialists in the uniform setting with 1 to 6 subtasks.

Task Similarity: Euclidean Distance (original design) vs. Cosine Similarity (new design)
Probability Given History: Look at full bidding history (original design) vs. Look at only K_PREVIOUS_BIDS (new design)

Table I. Comparison between the original code design and the new code design.
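The task-similarity change in Table I can be illustrated with a small sketch. The function names, and the representation of a task as a vector of per-capability requirement levels, are our assumptions; the point of the switch (which the draft does not elaborate) is that cosine similarity compares the *mix* of required capabilities independently of magnitude, whereas Euclidean distance conflates mix and magnitude:

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two capability-requirement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two capability-requirement vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two tasks demanding the same mix of capabilities at different magnitudes:
t1 = [0.2, 0.4, 0.0]
t2 = [0.4, 0.8, 0.0]
print(cosine_similarity(t1, t2))    # ~1.0: identical direction
print(euclidean_distance(t1, t2))   # ~0.447: nonzero despite the same mix
```

Under cosine similarity these two tasks are treated as maximally similar, which may better match an agent's intuition that they exercise the same capabilities.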
