
Pragmatics & Game Theory:

Learning Dynamics

Roland Mühlenbernd

Winter semester 2013/14

Roland Mühlenbernd Learning Dynamics


Table of Contents

1 Introduction
    Homework
    Review: Learning Dynamics

2 Modeling pragmatic phenomena
    Q-Implicature
    I-Implicature
    M-Implicature


Homework Question 1

What fundamental insight did evolutionary biology bring to the analysis of human relationships?

A fundamental insight from evolutionary biology is that most social relationships involve combinations of cooperation and conflict.

This insight applies to communication among organisms no less than to physical actions, and indeed animal signaling has been found to involve exploitative manipulation as well as the cooperative exchange of information.

In the human case, one has to think only of threats, dangerous secrets, contaminating leakage, and incriminating questions.


Homework Question 2

What is the advantage of bribing a police officer in an indirect way? Why is a veiled bribe a rational option even if the situation is not a legal or financial matter? What kinds of costs could be involved?

In a simple case like bribing a police officer, the appeal of a veiled bribe is intuitively clear: if some officers are corrupt and would accept the bribe, but others are honest and might arrest the driver for bribery, an indirect bribe can be detected by the corrupt cop while not being blatant enough for the honest cop to prove it beyond a reasonable doubt.


In a nonlegal situation, like indirectly bribing a maître d' to get seated immediately in a restaurant, indirect speech can avoid a conflict of relationship types such as dominance and reciprocity.

An overt conflict of relationship types causes awkwardness, which carries social costs.


Homework Question 3

What two purposes does language serve according to Politeness Theory?

Politeness Theory proposes that language serves two purposes: to convey a proposition (e.g. a bribe, a command, an offer) and to negotiate and maintain a relationship.

People achieve these dual ends by using language at two levels. The literal form of a sentence is consistent with the safest relationship between speaker and hearer.

At the same time, by implicating a meaning between the lines, the speaker counts on the listener to infer its real intent, which may initiate a different relationship.


Homework Question 4

Name the three distinct types of human relationships according to Fiske and give a short description of each.

Alan Fiske has advanced the strong claim that human relationships in all cultures fall into only three distinct types:

The dominance or authority relationship is governed by the ethos, "Don't mess with me." It has a basis in the dominance hierarchies common in the animal kingdom, although in humans, it is based not just on brawn or seniority but on social recognition: how much others are willing to defer to you.

The communality or communal sharing relationship conforms to the ethos, "What's mine is thine; what's thine is mine."

The reciprocity or equality-matching relationship obeys the ethos, "You scratch my back; I'll scratch yours."


Homework Question 5

Consider the indirect threat: "Nice store you got here. Would be a real shame if something happened to it." What types of relationship are in conflict here? Explain.

The speaker pretends to be in a reciprocity relationship that fits the business context.

The speaker indirectly communicates a dominance relationship: "Don't mess with me."; "Do what I want, otherwise..."

If a police officer overheard the conversation, the speaker could not be charged with making a threat, as long as his words are indirect and superficially reflect a reciprocity relationship.


Homework Question 6

What does the plausible-deniability hypothesis say about the directness of the speaker's wording? And what does the relationship-negotiation hypothesis predict about indirect speech?

The plausible-deniability hypothesis predicts that the directness of speakers' wording of a veiled bribe or other overture (assessed on linguistic grounds) is not an arbitrary social ritual, like saying "Please" and "Thank you",

but is predictable from strategic factors affecting its expected utility, such as the proportion of honest and dishonest officers in an area, the cost of a bribe, the cost of a ticket, and the cost of a bribery charge.

For the listener's part, the directness of a speech act should predict their subjective estimates of the likelihood that the speaker intended the fraught proposition as opposed to making an innocent remark.


The relationship-negotiation hypothesis predicts that indirect speech should be judged as generating less awkwardness and discomfort,

as being more respectful,

as better acknowledging the expected relationship with the hearer (such as affection, deference, or collegiality),

and as making it easier for the participants to resume their normal relationship should the offer be rebuffed.


Overview

Game Theory and Linguistics

Language Evolution

Signaling Games

GT in Lang. Use

Indirect Speech

Pragm. Reasoning

Signaling Games

IBR model SIM


Coordination & Signaling

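Coordination game (payoff matrix):

       R   L
  R    1   0
  L    0   1

Signaling game (utility table):

       aL  aS
  tL   1   0
  tS   0   1

Messages: One or two lanterns?

Pure sender strategies:

  s1: tL → m1, tS → m2
  s2: tL → m2, tS → m1
  s3, s4: the two pooling strategies, which send the same message in both states

Pure receiver strategies:

  r1: m1 → aL, m2 → aS
  r2: m1 → aS, m2 → aL
  r3, r4: the two constant strategies, which choose the same action for both messages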


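A signaling game is a tuple SG = ⟨{S, R}, T, Pr, M, A, U⟩.

A Lewis game is defined by:

  T = {tL, tS}
  M = {m1, m2}
  A = {aL, aS}
  Pr(tL) = Pr(tS) = .5

\[ U(t_i, a_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else} \end{cases} \]

       aL  aS
  tL   1   0
  tS   0   1

[Figure: extensive form of the Lewis game. Nature (N) chooses tL or tS with probability .5 each, the sender (S) chooses m1 or m2, and the receiver (R) chooses aL or aS. The payoffs for (aL, aS) are (1, 0) after tL and (0, 1) after tS.]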


Pure strategies

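Pure strategies are contingency plans according to which players act.

sender strategy: s : T → M

receiver strategy: r : M → A

  s1: tL → m1, tS → m2
  s2: tL → m2, tS → m1
  s3, s4: the two pooling strategies, which send the same message in both states

  r1: m1 → aL, m2 → aS
  r2: m1 → aS, m2 → aL
  r3, r4: the two constant strategies, which choose the same action for both messages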


Signaling Systems

Signaling systems are combinations of pure strategies. The Lewis game has two: L1 = ⟨s1, r1⟩ and L2 = ⟨s2, r2⟩.

  L1: tL → m1 → aL, tS → m2 → aS
  L2: tL → m2 → aL, tS → m1 → aS

Signaling systems are strict Nash equilibria of the EU table:

       r1   r2   r3   r4
  s1   1    0    .5   .5
  s2   0    1    .5   .5
  s3   .5   .5   .5   .5
  s4   .5   .5   .5   .5

In signaling systems, messages associate states and actions uniquely.

Signaling systems constitute evolutionarily stable states.
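The EU table can be reproduced by enumerating all pure strategies of the Lewis game. The following is a minimal sketch, not the lecture's own code: the helper names (`pure_strategies`, `expected_utility`, `is_strict_nash`) are hypothetical, and the enumeration order produced by `itertools.product` differs from the s1..s4 / r1..r4 numbering on the slide.

```python
from itertools import product

STATES = ["tL", "tS"]
MESSAGES = ["m1", "m2"]
ACTIONS = ["aL", "aS"]
PRIOR = {"tL": 0.5, "tS": 0.5}


def utility(state, action):
    """U(t_i, a_j) = 1 if the indices match, else 0."""
    return 1.0 if state[1:] == action[1:] else 0.0


def pure_strategies(domain, codomain):
    """All functions domain -> codomain, each represented as a dict."""
    return [dict(zip(domain, images))
            for images in product(codomain, repeat=len(domain))]


def expected_utility(s, r):
    """EU(s, r) = sum over t of Pr(t) * U(t, r(s(t)))."""
    return sum(PRIOR[t] * utility(t, r[s[t]]) for t in STATES)


senders = pure_strategies(STATES, MESSAGES)      # 4 sender strategies
receivers = pure_strategies(MESSAGES, ACTIONS)   # 4 receiver strategies
table = [[expected_utility(s, r) for r in receivers] for s in senders]


def is_strict_nash(i, j):
    """True if every unilateral deviation strictly lowers the payoff."""
    best = table[i][j]
    row_ok = all(table[i][k] < best for k in range(len(receivers)) if k != j)
    col_ok = all(table[k][j] < best for k in range(len(senders)) if k != i)
    return row_ok and col_ok


signaling_systems = [(senders[i], receivers[j])
                     for i in range(len(senders))
                     for j in range(len(receivers))
                     if is_strict_nash(i, j)]

for s, r in signaling_systems:
    print(s, r)
```

Because sender and receiver share the same utility, one table serves both players, and a cell is a strict Nash equilibrium when it is the unique maximum of its row and of its column.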


Behavioral Strategies

Behavioral strategies are functions that map choice points to probability distributions over the actions available at that choice point.

behavioral sender strategy σ : T → ∆(M)

behavioral receiver strategy ρ : M → ∆(A)

  σ:  t1 ↦ [m1 ↦ .9, m2 ↦ .1]
      t2 ↦ [m1 ↦ .5, m2 ↦ .5]

  ρ:  m1 ↦ [a1 ↦ .33, a2 ↦ .67]
      m2 ↦ [a1 ↦ 1, a2 ↦ 0]
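As a worked illustration, the example strategies σ and ρ can be written as nested dictionaries and scored against each other. This sketch rests on assumptions that are not stated on the slide: a uniform prior over t1 and t2 and the Lewis-game utility (1 if the indices of state and action match, 0 otherwise). The function names are hypothetical.

```python
PRIOR = {"t1": 0.5, "t2": 0.5}  # assumption: uniform prior, as in the Lewis game

# The example strategies from the slide.
sigma = {"t1": {"m1": 0.9, "m2": 0.1},
         "t2": {"m1": 0.5, "m2": 0.5}}
rho = {"m1": {"a1": 0.33, "a2": 0.67},
       "m2": {"a1": 1.0, "a2": 0.0}}


def utility(state, action):
    """Assumed Lewis-game utility: 1 if the indices match, else 0."""
    return 1.0 if state[1:] == action[1:] else 0.0


def is_behavioral(strategy):
    """Check that every choice point maps to a probability distribution."""
    return all(
        abs(sum(dist.values()) - 1.0) < 1e-9
        and all(p >= 0 for p in dist.values())
        for dist in strategy.values()
    )


def expected_utility(sigma, rho, prior):
    """Sum of Pr(t) * sigma(m|t) * rho(a|m) * U(t, a) over all t, m, a."""
    total = 0.0
    for t, p_t in prior.items():
        for m, p_m in sigma[t].items():
            for a, p_a in rho[m].items():
                total += p_t * p_m * p_a * utility(t, a)
    return total


eu = expected_utility(sigma, rho, PRIOR)
print(round(eu, 3))  # 0.366
```

The value 0.366 comes from .5 × (.9 × .33 + .1 × 1) + .5 × (.5 × .67 + .5 × 0), which is well below the payoff of 1 that a signaling system achieves.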


Learning Dynamics & Signaling Games

Extensions in time:

agents play the game repeatedly

agents' decisions are influenced by previous encounters

application of learning dynamics such as:

    reinforcement learning
    belief learning


Best Response & Expected Utility

Playing Best Response means making a choice that maximizes the Expected Utility.

\[ EU_S(m \mid t, \beta) = \sum_{a \in A} \beta(a \mid m) \times U(t, a) \tag{1} \]

\[ EU_R(a \mid m, \beta) = \sum_{t \in T} \beta(m \mid t) \times U(t, a) \tag{2} \]

How does an agent get belief β?


Belief Learning

The belief is a result of observation

Example:

  SO   a1   a2
  m1   8    2
  m2   7    13

  β:  m1 ↦ [a1 ↦ .8, a2 ↦ .2]
      m2 ↦ [a1 ↦ .35, a2 ↦ .65]

  RO   t1   t2
  m1   6    0
  m2   4    4

  β:  t1 ↦ [m1 ↦ .6, m2 ↦ .4]
      t2 ↦ [m1 ↦ 0, m2 ↦ 1]
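The two beliefs in the example follow from the observation tables by normalizing counts: the SO table is normalized per row (per message), the RO table per column (per state). A minimal sketch, with hypothetical helper names `normalize` and `column`:

```python
def normalize(counts):
    """Turn observation counts into relative frequencies.

    Note: raises ZeroDivisionError if nothing has been observed yet.
    """
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def column(table, col):
    """Extract one column of a nested row -> column -> count table."""
    return {row: cells[col] for row, cells in table.items()}


# SO: rows are messages, columns are observed actions.
SO = {"m1": {"a1": 8, "a2": 2},
      "m2": {"a1": 7, "a2": 13}}

# RO: rows are messages, columns are observed states.
RO = {"m1": {"t1": 6, "t2": 0},
      "m2": {"t1": 4, "t2": 4}}

# Sender's belief about the receiver: beta(a | m), one distribution per message.
sender_belief = {m: normalize(row) for m, row in SO.items()}

# Receiver's belief about the sender: beta(m | t), one distribution per state.
receiver_belief = {t: normalize(column(RO, t)) for t in ("t1", "t2")}

print(sender_belief)
print(receiver_belief)
```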


Best Response as Behavioural Strategy

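behavioural sender strategy σ : T → ∆(M)

\[ \sigma(m \mid t) = \begin{cases} \frac{1}{|BR(t)|} & \text{if } m \in BR(t) \\ 0 & \text{else} \end{cases} \]

behavioural receiver strategy ρ : M → ∆(A)

\[ \rho(a \mid m) = \begin{cases} \frac{1}{|BR(m)|} & \text{if } a \in BR(m) \\ 0 & \text{else} \end{cases} \]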
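The best-response strategy can be sketched in code by scoring each option with the expected-utility formulas EU_S and EU_R from the Best Response slide and spreading probability uniformly over the maximizers. The beliefs used here are the ones from the belief-learning example; the function names, the tie tolerance, and the Lewis-game utility are assumptions made for illustration.

```python
STATES = ["t1", "t2"]
MESSAGES = ["m1", "m2"]
ACTIONS = ["a1", "a2"]


def utility(state, action):
    """Assumed Lewis-game utility: 1 if the indices match, else 0."""
    return 1.0 if state[1:] == action[1:] else 0.0


def uniform_over_best(options, score):
    """Probability 1/|BR| for each maximizer of score, 0 for the rest."""
    values = {o: score(o) for o in options}
    top = max(values.values())
    best = [o for o in options if abs(values[o] - top) < 1e-12]
    return {o: (1.0 / len(best) if o in best else 0.0) for o in options}


def sender_best_response(belief):
    """belief[m][a] = beta(a|m); returns sigma(m|t) using EU_S."""
    return {
        t: uniform_over_best(
            MESSAGES,
            lambda m: sum(belief[m][a] * utility(t, a) for a in ACTIONS),
        )
        for t in STATES
    }


def receiver_best_response(belief):
    """belief[t][m] = beta(m|t); returns rho(a|m) using EU_R."""
    return {
        m: uniform_over_best(
            ACTIONS,
            lambda a: sum(belief[t][m] * utility(t, a) for t in STATES),
        )
        for m in MESSAGES
    }


sender_belief = {"m1": {"a1": 0.8, "a2": 0.2}, "m2": {"a1": 0.35, "a2": 0.65}}
receiver_belief = {"t1": {"m1": 0.6, "m2": 0.4}, "t2": {"m1": 0.0, "m2": 1.0}}

sigma = sender_best_response(sender_belief)
rho = receiver_best_response(receiver_belief)
print(sigma)  # t1 -> m1, t2 -> m2
print(rho)    # m1 -> a1, m2 -> a2
```

With these beliefs both best responses are unique, so the resulting behavioural strategies are pure. When two options tie, each receives probability 1/2.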


Reinforcement Learning

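[Figure: sender S with one urn for each state (ts, tg), each containing balls for the messages m1 and m2; receiver R with one urn for each message (m1, m2), each containing balls for the actions as and ag.]

the sender has an urn for each state t ∈ T

each sender urn contains balls for each message m ∈ M

the sender decides by drawing from the urn for the current state t

the receiver has an urn for each message m ∈ M

each receiver urn contains balls for each action a ∈ A

the receiver decides by drawing from the urn for the received message m

successful communication → urn update

in general a signaling system emerges over time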
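The urn scheme can be sketched as a short simulation of one sender and one receiver. This is an illustrative sketch only: the initial ball count per urn, the reward of one ball per success, the uniform choice of state, and the function names are assumptions that the slide does not specify.

```python
import random

STATES = ["ts", "tg"]
MESSAGES = ["m1", "m2"]
ACTIONS = ["as", "ag"]


def utility(state, action):
    """1 if the action matches the state (same index letter), else 0."""
    return 1 if state[1:] == action[1:] else 0


def draw(urn, rng):
    """Draw one ball; each type is chosen proportionally to its count."""
    kinds = list(urn)
    return rng.choices(kinds, weights=[urn[k] for k in kinds])[0]


def simulate(rounds, seed=0, initial=1, reward=1):
    """Play the game repeatedly; reinforce both urns after each success.

    initial and reward are assumed values, not taken from the slides.
    """
    rng = random.Random(seed)
    sender_urns = {t: {m: initial for m in MESSAGES} for t in STATES}
    receiver_urns = {m: {a: initial for a in ACTIONS} for m in MESSAGES}
    successes = 0
    for _ in range(rounds):
        t = rng.choice(STATES)           # nature picks a state
        m = draw(sender_urns[t], rng)    # sender draws from the urn for t
        a = draw(receiver_urns[m], rng)  # receiver draws from the urn for m
        if utility(t, a) == 1:           # successful communication
            sender_urns[t][m] += reward
            receiver_urns[m][a] += reward
            successes += 1
    return sender_urns, receiver_urns, successes


sender_urns, receiver_urns, successes = simulate(1000)
print(sender_urns)
print(receiver_urns)
print(successes)
```

Each success adds one ball to exactly one sender urn and one receiver urn, so choices that led to success become more likely in later rounds. That feedback is what lets a signaling system emerge over time.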


Behavioural & Pure Strategies

Pure strategies are a subset of behavioural strategies.

Example:

  σ2: t1 → m2, t2 → m1
  ρ2: m1 → a2, m2 → a1

  σ2:  t1 ↦ [m1 ↦ 0, m2 ↦ 1]
       t2 ↦ [m1 ↦ 1, m2 ↦ 0]

  ρ2:  m1 ↦ [a1 ↦ 0, a2 ↦ 1]
       m2 ↦ [a1 ↦ 1, a2 ↦ 0]

Note: If an agent plays σ2 as sender and ρ2 as receiver, we say that the agent has learned the signaling language L2 = ⟨σ2, ρ2⟩.


Extensions in Time and Space

Extensions in time and space:

agents are placed in a network structure

agents play the game with direct neighbors

agents play both as sender and receiver

agents play the game repeatedly

agents' decisions are influenced by previous encounters:

implementation of learning dynamics

    best response + belief learning
    reinforcement learning


Example: Result in a SW network

Figure: Resulting structure after 30 simulation steps of 100 BL agents playing the Lewis game on a SW network. The colours blue and green represent the two signaling systems as target strategies.


Example: Result in a SW network

Figure: Resulting structure after 300 simulation steps of 100 RL agents playing the Lewis game (with lateral inhibition) on a SW network. The colours blue and green represent the two signaling systems as target strategies.


Belief Learning VS. Reinforcement Learning

            behavioural   rational   learning speed
  BL + BR   √             √          fast
  RL        √             -          slow


Neo-Gricean Pragmatics

Conversational implicature is a pragmatic phenomenon in which an utterance's intended meaning differs from its literal meaning.

Interlocutors can resolve the difference between the intended pragmatic interpretation (PI) and the literal interpretation (LI) by cooperation principles.

Levinson (2000) subdivided generalized conversational implicatures (GCIs) into:

Q-Implicature

I-Implicature

M-Implicature


Q-Implicature

(1) "Some boys came to the party."
    LI: Some, maybe all boys came. ∃ = ∃¬∀ ∨ ∀
    PI: Some but not all boys came. ∃¬∀

[Figure: strategy diagrams for LI and PI, each over the states t∀, t∃¬∀, the messages m_all, m_some, m_sbna, and the actions a∀, a∃¬∀.]


Modelling Q-implicature

Parameter settings:

  T = {t∀, t∃¬∀}
  M = {m_all, m_some, m_sbna}
  A = {a∀, a∃¬∀}
  Pr(t∀) = Pr(t∃¬∀) = .5
  κ(m_sbna) = 1
  κ(m_all) = κ(m_some) > 1

Initial LI strategy

[Figure: initial LI strategy diagram with edge weights of .5, corresponding to the urn contents below.]

Sender urns:

           m_all   m_some   m_sbna
  t∀       50      50       0
  t∃¬∀     0       50       50

Receiver urns:

            a∀     a∃¬∀
  m_all     100    0
  m_some    50     50
  m_sbna    0      100


Simulation & Results

200 RL agents play the Q-Implicature game repeatedly on a total network with random partners

all agents start with the initial urn setting that represents LI

the simulation ends when all agents have learned a pure strategy

Results:

[Figure: percentage of agents for each of two resulting strategies (diagrams over t∀, t∃¬∀ and m_all, m_some, m_sbna), plotted against κ(m_some), κ(m_all) from 1 to 5.]


I-Implicature

"What is expressed simply is stereotypically exemplified"

(2) "Billy drank a glass of milk."
    LI: A glass of any kind of milk. tc, tg
    PI: A glass of cow's milk. tc

[Figure: strategy diagrams for LI and PI, each over the states tc, tg, the messages m_cm, m_m, m_gm, and the actions ac, ag.]


Modelling I-implicature

Parameter settings:

  T = {tc, tg}
  M = {m_m, m_cm, m_gm}
  A = {ac, ag}
  Pr(tc) = .8 > Pr(tg) = .2
  κ(m_m) = 2
  κ(m_cm) = κ(m_gm) = 1

Initial LI strategy

[Figure: initial LI strategy diagram with edge weights .5, .5, 1 − p, p, 1 − p, p.]

Sender urns, for n = ⌊100 × p⌋:

         m_cm      m_m   m_gm
  tc     100 − n   n     0
  tg     0         n     100 − n

Receiver urns:

          ac     ag
  m_cm    100    0
  m_m     50     50
  m_gm    0      100
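The initial urn setting for the I-implicature game depends on the single parameter p through n = ⌊100 × p⌋. A minimal sketch of how the two urn tables could be built; the function name `initial_urns` and the dictionary layout are hypothetical choices for illustration.

```python
import math


def initial_urns(p):
    """Build the initial LI urn contents for a given p in [0, 1]."""
    n = math.floor(100 * p)
    sender = {
        "tc": {"mcm": 100 - n, "mm": n, "mgm": 0},
        "tg": {"mcm": 0, "mm": n, "mgm": 100 - n},
    }
    receiver = {
        "mcm": {"ac": 100, "ag": 0},
        "mm": {"ac": 50, "ag": 50},
        "mgm": {"ac": 0, "ag": 100},
    }
    return sender, receiver


sender, receiver = initial_urns(0.5)
print(sender)
print(receiver)
```

Every urn holds 100 balls regardless of p, so p only shifts the sender's balls between the unspecific message m_m and the specific messages. With binary floating point, `100 * p` can land slightly below an integer for some decimal values of p, so an implementation that needs exact counts may prefer to pass n directly.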


Simulation & Results

200 RL agents play the I-Implicature game repeatedly on a total network with random partners

all agents start with the initial urn setting that represents LI

the simulation ends when all agents have learned a pure strategy

Results:

[Figure: percentage of agents for each of two resulting strategies (diagrams over tc, tg and m_cm, m_m, m_gm), plotted against p from .3 to .6.]


M-Implicature

"What's said in an abnormal way isn't normal."

(3) "Billy caused the sheriff to die."
    LI: Billy killed the sheriff in any way. tp, tr
    PI: Billy killed the sheriff in an abnormal way. tr

[Figure: strategy diagrams for LI and PI, each over the states tp, tr, the messages m_k, m_ctd, and the actions ap, ar.]


Modelling the M-implicature

Parameter settings:

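  T = {tp, tr}
  M = {m_k, m_ctd}
  A = {ap, ar}
  κ(m_k) = 2, κ(m_ctd) = 1
  Pr(tp) > Pr(tr)

Initial LI strategy

[Figure: initial LI strategy diagram over the states tp, tr, the messages m_k, m_ctd, and the actions ap, ar.]

Sender urns:

        m_k   m_ctd
  tp    50    50
  tr    50    50

Receiver urns:

           ap    ar
  m_k      50    50
  m_ctd    50    50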


Simulation & Results

200 RL agents play the M-Implicature game repeatedly on a total network with random partners

all agents start with the initial urn setting that represents LI

the simulation ends when all agents have learned a pure strategy

Results:

[Figure: percentage of agents for each of two resulting strategies (diagrams over tp, tr, m_k, m_ctd, ap, ar), plotted against Pr(tp) from .51 to .57.]


Conclusion

1 Analysis of the dynamics of language change and the conventionalization of linguistic behavior by

    applying evolutionary and learning dynamics to repeated signaling games
    on players (= agents) placed in a population structure

2 Concrete experiments for Q-, I- and M-implicature showed that agents that start with a literal communication strategy stabilize on the pragmatic one for most of the parameter space

3 Results reveal that pragmatic behavior can be explained by rational deliberation (IBR) as well as by population dynamics (RL, BL)

4 Results highlight the universal power of pragmatic communication behavior as a way to maximize the efficiency of communication
