A toy model of human cognition: Utilizing fluctuation in uncertain and non-stationary environments
Tatsuji Takahashi (1), Yu Kohno (1,2)
Seminar on science of complex systems (organized by Yukio-Pegio Gunji), Yukawa Institute for Theoretical Physics, Kyoto University, Jan. 20, 2014
(1) Tokyo Denki University, (2) JSPS (from Apr. 2014)
http://www.yukawa.kyoto-u.ac.jp/contents/seminar/detail.php?SNUM=51633


TRANSCRIPT

Page 1: Title slide

A toy model of human cognition: Utilizing fluctuation in uncertain and non-stationary environments

Tatsuji Takahashi (1), Yu Kohno (1,2)
Seminar on science of complex systems (organized by Yukio-Pegio Gunji), Yukawa Institute for Theoretical Physics, Kyoto University, Jan. 20, 2014
(1) Tokyo Denki University, (2) JSPS (from Apr. 2014)

Pages 2-7: Contents

Contents

The loosely symmetric (LS) model
Cognitive properties or cognitive biases
Analysis of reconstruction of LS
Result: Efficacy in reinforcement learning
Utilization of fluctuation in non-stationary environments

Pages 8-14: A toy model of human cognition

A toy model of human cognition

Modeling focusing on deviations from rational standards: cognitive biases
  the differences from "machines"
Principal properties implemented in a form as simple as possible
  so that it can be analyzed and applied easily
Intuition of human beings
  again, kept simple: not the policy (or strategy) that is learned through education and culture

Pages 15-20: LS as a toy model of cognition

LS as a toy model of cognition

We treat the loosely symmetric (LS) model proposed by Shinohara (2007). LS:
  models cognitive biases
  is merely a function over co-occurrence information between two events
  faithfully describes the causal intuition of humans,
    which forms the basis of decision-making and action for adaptation in the world

Pages 21-28: The loosely symmetric (LS) model

The loosely symmetric (LS) model

A quasi-probability function LS(-|-), analogous to conditional probability P(-|-).
Defined over the co-occurrence information of events p and q.
The relationship from p to q: LS(q|p).
LS describes the causal intuition of human beings the most faithfully (among more than 40 existing models).

Co-occurrence table (prior event in rows, posterior event in columns):

            q     ¬q
    p       a      b
    ¬p      c      d

P(q|p) = \frac{a}{a+b}

LS(q|p) = \frac{a + \frac{bd}{b+d}}{a + \frac{bd}{b+d} + b + \frac{ac}{a+c}}
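As a quick numerical check of the formula above, here is a minimal Python sketch (the fraction is read as a + bd/(b+d) in the numerator and a + bd/(b+d) + b + ac/(a+c) in the denominator, as reconstructed from the slide; the function names are illustrative, not from the talk):

```python
def conditional_probability(a, b, c, d):
    """P(q|p) = a / (a + b)."""
    return a / (a + b)

def loosely_symmetric(a, b, c, d):
    """LS(q|p) from the 2x2 co-occurrence table.

    a = #(p, q), b = #(p, not-q), c = #(not-p, q), d = #(not-p, not-q).
    Uses the reconstructed form
        LS(q|p) = (a + b*d/(b+d)) / (a + b*d/(b+d) + b + a*c/(a+c)).
    """
    num = a + (b * d) / (b + d) if (b + d) > 0 else a
    den = num + b + ((a * c) / (a + c) if (a + c) > 0 else 0.0)
    return num / den if den > 0 else 0.5  # neutral value when there is no data yet

# Example: p and q co-occur often (a=8, b=2) while not-p rarely yields q (c=2, d=8).
print(conditional_probability(8, 2, 2, 8))  # 0.8
print(loosely_symmetric(8, 2, 2, 8))        # about 0.73, slightly discounted vs. P(q|p)
```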

Pages 29-35: The loosely symmetric (LS) model

Inductive inference of causal relationship:
  How do humans form the intensity of the causal relationship from p to q,
  when p is the candidate cause of the effect q in focus?
  Which functional form f(a, b, c, d) best fits human causal intuition?

Meta-analysis as in Hattori & Oaksford (2007):

  Experiment    AS95   BCC03.1  BCC03.3  H03    H06    LS00   W03.2  W03.6
  r for LS      0.95   0.98     0.98     0.98   0.97   0.85   0.95   0.85
  r for ΔP      0.88   0.92     0.84     0.00   0.71   0.88   0.28   0.46
  r² for LS     0.90   0.96     0.96     0.97   0.94   0.73   0.91   0.72

Pages 36-41: In 2-armed bandit problems

In 2-armed bandit problems (more on bandit problems later)

LS used as the value function in reinforcement learning:
  The agent evaluates the actions according to the causal intuition of humans.
  Very good adaptation to the environment, both in the short term and in the long term.

[Figure: accuracy rate vs. step (1 to 1000, log scale) for LS and comparison models]

Pages 42-51: The loosely symmetric (LS) model

From the analysis of LS, we found the following cognitive properties:
  Ground-invariance (like visual attention; Takahashi et al., 2010)
  Comparative valuation
    psychology: Tversky & Kahneman, Science, 1974
    brain science: Daw et al., Nature, 2006
  Idiosyncratic, asymmetric risk attitude, as in prospect theory
    Kahneman & Tversky, Am. Psy., 1984; Boorman et al., Neuron, 2009
  Satisficing
    Simon, Psy. Rev., 1956; Kolling et al., Science, 2012

Pages 52-57: Principal human cognitive biases

Humans:
  Satisficing: do not optimize but satisfice;
    become satisfied when an option is better than the reference level.
  Comparative valuation: evaluate states and actions in a relative manner.
  Asymmetric risk attitude: recognize gain and loss asymmetrically.

Pages 58-60: Satisficing, risk attitude, and comparative evaluation

Satisficing
  When all arms are over the reference level: no further pursuit of arms beyond the reference level.
  When all arms are under the reference level: search hard for an arm over the reference level.

Risk attitude (reliability consideration)
  Options with equal expected value are compared by the reliability of their past records of wins (o) and losses (x):
    over the reference, risk-avoiding: choose 15 wins out of 20 rather than 3 out of 4 (both 75%);
    under the reference, risk-seeking: gamble on 1 win out of 4 rather than 5 out of 20 (both 25%).
  This reproduces the reflection effect.

Comparative evaluation
  Choose A1 and lose: under comparative valuation the value of A1 falls while the value of A2 rises (see-saw), so arms other than A1 get tried; under absolute valuation this does not happen.
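A minimal sketch of the satisficing choice rule described above (an illustrative reading, not the exact LS/LSVR mechanism): stay with an arm that meets the reference (aspiration) level if one exists, and search otherwise.

```python
import random

def satisficing_choice(values, reference, rng=random.Random(0)):
    """Pick an arm that clears the reference level if any does (no further
    pursuit beyond the reference); otherwise search hard by picking at random."""
    above = [i for i, v in enumerate(values) if v >= reference]
    if above:
        return above[0]  # any satisfactory arm will do: satisfice, do not optimize
    return rng.randrange(len(values))

print(satisficing_choice([0.4, 0.7, 0.6], reference=0.5))  # settles on an arm over the reference
print(satisficing_choice([0.2, 0.3, 0.1], reference=0.5))  # explores: all arms are under the reference
```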

Page 61: The generalized LS with variable reference (LSVR)

[Abstract image: LS with a variable reference]

LSVR is a generalization of LS with an autonomously adjusted reference parameter.

Pages 62-66: n-armed bandit problem (nABP)

The simplest framework in reinforcement learning, exhibiting the exploration-exploitation dilemma and the speed-accuracy tradeoff.
The task is to maximize the total reward acquired from n actions (sources) with unknown reward distributions.
A one-armed bandit is a slot machine that either gives a reward (win) or not (lose).
An n-armed bandit is a slot machine with n arms that have different probabilities of winning.
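A minimal sketch of the Bernoulli n-armed bandit environment described above (class and method names are illustrative, not from the talk):

```python
import random

class BernoulliBandit:
    """n arms; arm i pays reward 1 with unknown probability p[i], otherwise 0."""
    def __init__(self, n, rng=None):
        self.rng = rng or random.Random(0)
        self.p = [self.rng.random() for _ in range(n)]  # reward probabilities drawn from [0,1]

    def pull(self, arm):
        """Play one arm and return the (stochastic) reward."""
        return 1 if self.rng.random() < self.p[arm] else 0

    def best_arm(self):
        """The optimal arm (unknown to the agent; used only for evaluation)."""
        return max(range(len(self.p)), key=lambda i: self.p[i])
```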

Pages 67-71: Performance indices for nABP

Accuracy:
  the average percentage of trials on which the optimal action is chosen.
Regret (expected loss):
  the difference between the rewards actually accumulated and those of the best possible sequence of actions (i.e., accuracy = 1.0 throughout the trial).
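Building on the two sketches above, here is one plausible way to run an LS-valued agent on the bandit and track both indices. Treating "pull arm i" as the prior event and "reward" as the posterior event, with the pooled results of the other arms filling the not-p row, is an illustrative assumption, not necessarily the exact LS/LSVR scheme of the talk:

```python
class LSAgent:
    """Greedy agent whose action values are LS(reward | arm).

    For arm i: a = wins of i, b = losses of i,
               c = pooled wins of the other arms, d = pooled losses of the other arms.
    """
    def __init__(self, n):
        self.wins = [0] * n
        self.losses = [0] * n

    def value(self, i):
        a, b = self.wins[i], self.losses[i]
        c, d = sum(self.wins) - a, sum(self.losses) - b
        return loosely_symmetric(a, b, c, d)

    def select(self):
        return max(range(len(self.wins)), key=self.value)

    def update(self, arm, reward):
        (self.wins if reward else self.losses)[arm] += 1

# Accuracy = fraction of steps on which the optimal arm was chosen;
# regret accumulates the expected loss relative to always playing the best arm.
bandit, agent = BernoulliBandit(n=10), LSAgent(n=10)
best, hits, regret = bandit.best_arm(), 0, 0.0
for t in range(10_000):
    arm = agent.select()
    agent.update(arm, bandit.pull(arm))
    hits += (arm == best)
    regret += bandit.p[best] - bandit.p[arm]
print("accuracy:", hits / 10_000, "regret:", regret)
```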

Page 72: Result (stationary environment)

n=100; the reward probability for each action is drawn uniformly from [0,1].

[Figure: accuracy rate and expected loss vs. steps (up to 1e6) for LS, LS-VR, UCB1-tuned, and the same three with γ = 0.999]

Accuracy: highest. Regret: smallest.
The more actions there are, the better the performance of LSVR becomes. (Kohno & Takahashi, 2012; in prep.)

Page 73: Non-stationary bandits

The reward probabilities change while playing.
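A sketch of the two non-stationary settings used in the results that follow, extending the BernoulliBandit sketch above (the parameter names are illustrative): in environment 1 all probabilities are reset synchronously every 10,000 steps; in environment 2 each arm's probability is independently reset with probability 0.0001 per step.

```python
class NonStationaryBandit(BernoulliBandit):
    """Bandit whose reward probabilities change while playing."""
    def __init__(self, n, reset_every=None, reset_prob=None, rng=None):
        super().__init__(n, rng)
        self.reset_every = reset_every  # e.g. 10_000: synchronous reset (environment 1)
        self.reset_prob = reset_prob    # e.g. 1e-4: independent per-arm resets (environment 2)
        self.t = 0

    def pull(self, arm):
        self.t += 1
        if self.reset_every and self.t % self.reset_every == 0:
            self.p = [self.rng.random() for _ in self.p]
        if self.reset_prob:
            self.p = [self.rng.random() if self.rng.random() < self.reset_prob else q
                      for q in self.p]
        return super().pull(arm)
```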

Page 74: Result in non-stationary environment 1

n=16; the reward probabilities are drawn from [0,1] and are all reset every 10,000 steps.

[Figure: accuracy rate and expected loss vs. steps (up to 50,000) for LS, LS-VR, UCB1-tuned, and the same three with γ = 0.999]

Accuracy: highest. Regret: smallest. (Kohno & Takahashi, in prep.)

Pages 75-80: Result in non-stationary environment 2

Accuracy = the rate at which the action that is currently optimal is chosen.
n=20; the initial probability of each action is drawn from [0,1]; each action's probability is reset with probability 0.0001 per step.

[Figure: accuracy rate vs. steps (up to 50,000) for LS, LS-VR, UCB1-tuned, and the same three with γ = 0.999]

Even when a not-well-tried action becomes the new optimum, the agent can switch to it.
If the reward were given deterministically, this would be impossible.
Efficient search utilizing uncertainty and fluctuation in non-stationary environments.

Pages 81-85: Results

The more options there are, the better the performance of LSVR becomes.

[Figure, stationary: accuracy rate vs. steps (up to 1e6) for LS, LS-VR, UCB1-tuned, and the same three with γ = 0.999]
[Figure, non-stationary (synchronous reset): accuracy rate vs. steps (up to 50,000)]
[Figure, non-stationary 2 (independent per-arm resets): accuracy rate vs. steps (up to 50,000)]

LSVR can trace the change in non-stationary environments.
LSVR can trace the unobserved change, amplifying fluctuation.

Pages 86-91: Discussion

The cognitive biases of humans, when combined:
  work effectively for adaptation under uncertainty;
  conflate an action and the set of actions through comparative valuation;
  symbolize the whole situation into a virtual action;
  utilize the fluctuation arising from uncertainty and enable adaptation to non-stationary environments.

Pages 92-94: Conflating part and whole

Comparative valuation conflates the information of an action with that of the whole set of actions.
This is universal in living systems, from slime molds (Latty & Beekman, 2011) to neurons (Royer & Paré, 2003) to animals and human beings.

Pages 95-100: Relative evaluation is especially important

★ Relative evaluation:
★ is what even slime molds and real neural networks (conservation of synaptic weights) do; behavioral economics found that humans comparatively evaluate actions and states.
★ weakens the dilemma between exploitation and exploration through a see-saw-like competition among arms:
★ Through failure (low reward), choosing the greedy action may quickly trigger a switch to the previously second-best, non-greedy arm.
★ Through success (high reward), choosing the greedy action may quickly focus the agent on the currently greedy action, lessening the possibility of choosing non-greedy arms by decreasing the values of the other arms.

[Diagram: choose A1 and lose; under relative evaluation the value of A1 falls and the value of A2 rises (see-saw), so arms other than A1 get tried; under absolute evaluation only the value of A1 falls.]
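A small numerical illustration of the see-saw, reusing the loosely_symmetric sketch from earlier and the same pooled-counts reading (the win/loss counts are hypothetical, for illustration only):

```python
wins, losses = [3, 2], [1, 1]   # hypothetical records for arms A1 and A2

def ls_value(i):
    a, b = wins[i], losses[i]
    c, d = sum(wins) - a, sum(losses) - b
    return loosely_symmetric(a, b, c, d)

print(ls_value(0), ls_value(1))  # before the failure: A1 is valued above A2
losses[0] += 1                   # choose A1 and lose
print(ls_value(0), ls_value(1))  # after: A1's value falls and A2's rises (see-saw),
                                 # so the agent moves to A2 without explicit exploration
```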

Pages 101-106: Symbolization of the whole and comparative valuation with multiple actions

[Diagram: slot-machine arms A1, A2, ..., An, together with a virtual machine Ag representing the whole]

Comparative valuation with a virtual action: each arm Ai is compared (">" or "<"?) against the virtual action Ag that represents the whole.

Pages 107-119: Conclusion

The cognitive biases that look irrational are, when appropriately combined as in humans, actually rational for adapting to uncertain environments and for survival through evolution.
Applicable in engineering, in machine learning and robot control.
Implications for brain science (the brain as a machine-learning device):
  modeling PFC and vmPFC.

Brain science and the three cognitive biases:
  Satisficing: Kolling et al., Science, 2012.
  Comparative valuation of state-action value: Daw et al., Nature, 2006.
  Idiosyncratic risk evaluation: Boorman et al., Neuron, 2009.

Pages 120-125: Applications of bandit problems

★ Monte-Carlo tree search over game trees (Go AI)
★ Online advertisement (e.g., A/B testing)
★ Design of medical treatment
★ Reinforcement learning

Page 126: Robotic motion learning

Learning the giant-swing motion through trial and error, with no prior knowledge and under coarse-grained states.

[Figures: real robot and simulator (1st joint free, 2nd joint active); acquired reward per 1000 steps over 100,000 learning steps (typical case and average of 100 trials, LS-Q vs. Q); discretization of position, velocity, and posture states; actions A0-A2; reward definition]

Uragami, D., Takahashi, T., Matsuo, Y., Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control, BioSystems, 116, 1-9 (2014).