Wisdom of Crowds in Human Memory: Reconstructing Events by Aggregating Memories across Individuals

Mark Steyvers
Department of Cognitive Sciences
University of California, Irvine

Joint work with: Brent Miller, Pernille Hemmer, Mike Yi, Michael Lee, Bill Batchelder, Paolo Napoletano

Wisdom of crowds phenomenon

Group estimate often performs as well as or better than best individual in the group


Examples of wisdom of crowds phenomenon

Who Wants to Be a Millionaire?

Galton’s Ox (1907): the median of individual estimates comes close to the true answer

Tasks studied in our research

Ordering/ranking problems
- declarative memory: order of US presidents, ranking cities by size
- episodic memory: order of events (i.e., serial recall)
- predictive rankings: fantasy football

Matching problems
- assign N items to N responses
- e.g., match paintings to artists, or flags to countries

Traveling Salesman problems
- find shortest route between cities

All of these are problems involving permutations

Recollecting order from declarative memory

Place these presidents in the correct order (along a time axis):

Ulysses S. Grant, James Garfield, Rutherford B. Hayes, Abraham Lincoln, Andrew Johnson

Recollecting order from episodic memory

http://www.youtube.com/watch?v=a6tSyDHXViM&feature=related

Place scenes in correct order (serial recall)

[Figure: four scenes, A B C D, arranged along a time axis]

Goal: aggregating responses

Individual responses: D A B C, A B D C, B A D C, A C B D, A D B C

Aggregation algorithm → group answer: A B C D

Ground truth: A B C D

Question: does the group answer equal the ground truth?

Bayesian Approach

Individual responses: D A B C, A B D C, B A D C, A C B D, A D B C

Generative model: the responses are generated from a latent group answer (here A B C D)

Group answer = latent random variable

Task constraints

No communication between individuals

There is always a true answer (ground truth)

Aggregation algorithm never has access to ground truth
- unsupervised methods
- ground truth only used for evaluation

Research Goals

Aggregation of permutation data
- going beyond numerical estimates or multiple choice questions
- combinatorially complex

Incorporate individual differences
- going beyond models that treat every vote equally
- assume some individuals might be “experts”

Take cognitive processes into account
- going beyond mere statistical aggregation

Hierarchical Bayesian models

Part I: Ordering Problems

Experiment 1

Task: order all 44 US presidents

Methods
- 26 participants (college undergraduates)
- Names of presidents written on cards
- Cards could be shuffled on large table

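Measuring performance

Kendall’s tau: the number of adjacent pairwise swaps needed to turn an individual’s ordering into the true order

Ordering by individual: A B E C D
True order: A B C D E

A B E C D → A B C E D (swap E and C: 1 swap) → A B C D E (swap E and D: 1 + 1 = 2 swaps)

Kendall’s tau = 2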
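The swap count above can be computed without simulating the swaps, because the minimum number of adjacent swaps equals the number of item pairs that appear in opposite relative order. The following is a minimal sketch of that calculation; the function name `kendall_tau_distance` is illustrative and not taken from the slides.

```python
from itertools import combinations


def kendall_tau_distance(ordering, true_order):
    """Count item pairs whose relative order differs between the two sequences.

    This equals the minimum number of adjacent pairwise swaps needed to
    turn `ordering` into `true_order`.
    """
    # Map each item to its position in the true order.
    position = {item: rank for rank, item in enumerate(true_order)}
    ranks = [position[item] for item in ordering]
    # A pair is discordant when an item with a later true position comes first.
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


print(kendall_tau_distance("ABECD", "ABCDE"))  # 2
```

The pairwise count is quadratic in the number of items, which is fine for lists of 10 or 44 items.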

Empirical Results

[Figure: performance of each individual (y-axis, 0 to 500), with individuals ordered from best to worst on the x-axis; the level of random guessing is marked for reference]

Many methods for analyzing rank data…

Probabilistic models
- Thurstone (1927), Mallows (1957), Plackett-Luce (1975), Lebanon-Mao (2008)

Spectral methods
- Diaconis (1989)

Heuristic methods from voting theory
- Borda count

… however, many of these approaches were developed for preference rankings

Bayesian models constrained by human cognition

- Extension of Thurstone’s (1927) model
- Extension of Estes (1972) perturbation model

Bayesian Thurstonian Approach

Each item has a true coordinate on some dimension

[Figure: items A, B, C placed along a single dimension]

… but there is noise because of encoding and/or retrieval error

[Figure: Person 1’s distributions around the coordinates of A, B, C]

Each person’s mental representation is based on (latent) samples of these distributions

The observed ordering is based on the ordering of the samples

Person 1, observed ordering: A < B < C

People draw from distributions with common means but different variances

Person 1, observed ordering: A < B < C
Person 2, observed ordering: A < C < B

Graphical Model Notation

Plate notation: a single node $x_j$ inside a plate labeled j = 1..3 stands for the three nodes $x_1$, $x_2$, $x_3$

shaded = observed
not shaded = latent

Graphical Model of Bayesian Thurstonian Model

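Generative process, over j individuals:

$\sigma_j \sim \mathrm{Gamma}(\lambda, 1/\lambda)$
$x_{ij} \mid \mu, \sigma_j \sim \mathrm{N}(\mu_i, \sigma_j)$
$y_j = \mathrm{rank}(x_j)$

- $\mu$: latent ground truth
- $\sigma_j$: individual noise level
- $x_j$: mental representation
- $y_j$: observed ordering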
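To make the generative story concrete, here is a minimal forward simulation of a Thurstonian model of this kind: each person gets a noise level, draws one noisy sample per item around the shared true coordinates, and reports the items sorted by those samples. Several details are assumptions for illustration only: the function name `simulate_orderings`, the default hyperparameter `lam=2.0`, reading the Gamma prior as shape/scale, and treating the noise level as a standard deviation. It only generates data; it does not perform the Bayesian inference used in the talk.

```python
import random


def simulate_orderings(mu, n_people, lam=2.0, seed=0):
    """Simulate observed orderings from a Thurstonian-style generative model.

    mu: dict mapping each item to its true coordinate (the latent ground truth).
    lam: Gamma hyperparameter (hypothetical default; shape lam, scale 1/lam).
    """
    rng = random.Random(seed)
    orderings = []
    for _ in range(n_people):
        # Individual noise level; assumed here to act as a standard deviation.
        sigma = rng.gammavariate(lam, 1.0 / lam)
        # Mental representation: one noisy sample per item.
        samples = {item: rng.gauss(coord, sigma) for item, coord in mu.items()}
        # Observed ordering: items sorted by their sampled coordinates.
        orderings.append(sorted(samples, key=samples.get))
    return orderings


truth = {"A": 0.0, "B": 1.0, "C": 2.0}
for ordering in simulate_orderings(truth, n_people=5):
    print(" < ".join(ordering))
```

People with a small noise level tend to reproduce the true order, while people with a large one produce more scrambled orderings, which is the individual difference the model is meant to capture.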

(Weak) wisdom of crowds effect

[Figure: performance of the Thurstonian model plotted alongside the individuals (y-axis, 0 to 350)]

Model’s ordering is as good as best individual (but not better)

Inferred Distributions for 44 US Presidents

George Washington (1)
John Adams (2)
Thomas Jefferson (3)
James Madison (4)
James Monroe (6)
John Quincy Adams (5)
Andrew Jackson (7)
Martin Van Buren (8)
William Henry Harrison (21)
John Tyler (10)
James Knox Polk (18)
Zachary Taylor (16)
Millard Fillmore (11)
Franklin Pierce (19)
James Buchanan (13)
Abraham Lincoln (9)
Andrew Johnson (12)
Ulysses S. Grant (17)
Rutherford B. Hayes (20)
James Garfield (22)
Chester Arthur (15)
Grover Cleveland 1 (23)
Benjamin Harrison (14)
Grover Cleveland 2 (25)
William McKinley (24)
Theodore Roosevelt (29)
William Howard Taft (27)
Woodrow Wilson (30)
Warren Harding (26)
Calvin Coolidge (28)
Herbert Hoover (31)
Franklin D. Roosevelt (32)
Harry S. Truman (33)
Dwight Eisenhower (34)
John F. Kennedy (37)
Lyndon B. Johnson (36)
Richard Nixon (39)
Gerald Ford (35)
James Carter (38)
Ronald Reagan (40)
George H.W. Bush (41)
William Clinton (42)
George W. Bush (43)
Barack Obama (44)

[Figure: inferred distribution for each president, drawn using the median and minimum sigma]

Model can predict individual performance

[Figure: scatter plot of each individual’s distance to ground truth (y-axis, 50 to 300) against the inferred noise level for that individual (x-axis, 0 to 0.4); R=0.941]

Extension of Estes (1972) Perturbation Model

Main idea: item order is perturbed locally

Our extension: perturbation noise varies between individuals and items

[Figure: a true order A B C D E and a recalled order in which some items have moved to nearby positions]

Strong wisdom of crowds effect

[Figure: performance of the Thurstonian model and the perturbation model plotted alongside the individuals (y-axis, 0 to 350)]

Perturbation model’s ordering is better than best individual

Inferred Perturbation Matrix and Item Accuracy

1. George Washington (1)
2. John Adams (2)
3. Thomas Jefferson (3)
4. James Madison (4)
5. James Monroe (6)
6. John Quincy Adams (5)
7. Andrew Jackson (7)
8. Martin Van Buren (8)
9. William Henry Harrison (21)
10. John Tyler (11)
11. James Knox Polk (16)
12. Zachary Taylor (18)
13. Millard Fillmore (9)
14. Franklin Pierce (20)
15. James Buchanan (13)
16. Abraham Lincoln (15)
17. Andrew Johnson (10)
18. Ulysses S. Grant (17)
19. Rutherford B. Hayes (19)
20. James Garfield (22)
21. Chester Arthur (14)
22. Grover Cleveland 1 (23)
23. Benjamin Harrison (12)
24. Grover Cleveland 2 (25)
25. William McKinley (24)
26. Theodore Roosevelt (28)
27. William Howard Taft (26)
28. Woodrow Wilson (30)
29. Warren Harding (27)
30. Calvin Coolidge (29)
31. Herbert Hoover (31)
32. Franklin D. Roosevelt (32)
33. Harry S. Truman (33)
34. Dwight Eisenhower (34)
35. John F. Kennedy (35)
36. Lyndon B. Johnson (36)
37. Richard Nixon (38)
38. Gerald Ford (37)
39. James Carter (39)
40. Ronald Reagan (40)
41. George H.W. Bush (41)
42. William Clinton (42)
43. George W. Bush (43)
44. Barack Obama (44)

[Figure: inferred perturbation matrix, true position (rows) by output position (columns, ticks from 2 to 42), with an item accuracy panel (axis 0 to 10) highlighting Abraham Lincoln, Richard Nixon, and James Carter]

Alternative Heuristic Models

Many heuristic methods from voting theory
- E.g., Borda count method

Suppose we have 10 items
- assign a count of 10 to the first item, 9 to the second item, etc.
- add counts over individuals
- order items by the Borda count

i.e., rank by average rank across people
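The Borda count steps can be sketched in a few lines. The example below is illustrative only: the function name `borda_order` and the alphabetical tie-break are assumptions, and the five responses are the four-item example orderings shown earlier in the talk.

```python
from collections import defaultdict


def borda_order(responses):
    """Aggregate orderings by Borda count.

    Each response is a sequence of the same n items. The first item in a
    response earns n points, the second n - 1, and so on. Items are then
    ordered by total points (ties broken alphabetically, an arbitrary choice).
    """
    n = len(responses[0])
    score = defaultdict(int)
    for response in responses:
        for position, item in enumerate(response):
            score[item] += n - position
    return sorted(score, key=lambda item: (-score[item], item))


responses = ["DABC", "ABDC", "BADC", "ACBD", "ADBC"]
print(borda_order(responses))  # ['A', 'B', 'D', 'C']
```

Because every person contributes the same point scale, this treats every vote equally; it has no notion of some individuals being more reliable than others.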

Model Comparison

[Figure: performance of the Thurstonian model, the perturbation model, and Borda count plotted alongside the individuals (y-axis, 0 to 350)]

Experiment 2

78 participants

17 problems, each with 10 items
- Chronological events
- Physical measures
- Purely ordinal problems, e.g., Ten Amendments, Ten Commandments

Example results

1. Oregon (1)
2. Utah (2)
3. Nebraska (3)
4. Iowa (4)
5. Alabama (6)
6. Ohio (5)
7. Virginia (7)
8. Delaware (8)
9. Connecticut (9)
10. Maine (10)

1. Freedom of speech & relig... (1)
2. Right to bear arms (2)
3. No quartering of soldiers... (3)
4. No unreasonable searches (4)
5. Due process (5)
6. Trial by Jury (6)
7. Civil Trial by Jury (7)
8. No cruel punishment (8)
9. Right to non-specified ri... (10)
10. Power for the States & Pe... (9)

Panel labels: Perturbation Model, Thurstonian Model

Average results over 17 Problems

[Figure: mean performance (y-axis, 0 to 25) of the Thurstonian model, the perturbation model, and Borda count plotted alongside the 78 individuals]

Strong wisdom of crowds effect across problems

Predicting problem difficulty

[Figure: scatter plot of the 17 problems, showing distance of group answer to ground truth (y-axis, 0 to 18) against the dispersion (std) of noise levels across individuals (x-axis, 0.8 to 1.8); R=-0.752. Labeled problems include “ordering states geographically” and “city size rankings”]

Effect of Group Composition

How many individuals do we need to average over?


Effect of Group Size: random groups

[Figure: performance (y-axis, 7 to 14) against group size (x-axis, 0 to 80) for randomly composed groups; legend entries T=0, T=2, T=12]

Experts vs. Crowds

Can we find experts in the crowd? Can we form small groups of experts?

Approach
- Form a group for some particular task
- Select individuals with the smallest sigma (“experts”) based on previous tasks
- Vary the number of previous tasks
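The selection step can be illustrated with a short sketch: given each person’s inferred noise levels (sigma) on previous tasks, pick the k people with the smallest values. Summarizing the previous tasks by their mean is an assumption made here for illustration, as are the function name `select_experts` and the example data; the talk does not state how noise levels from several tasks are combined.

```python
def select_experts(sigma_by_person, k):
    """Pick the k individuals with the smallest inferred noise level.

    sigma_by_person: dict mapping a person id to the list of noise levels
    inferred for that person on previous tasks (hypothetical input format).
    People with no previous tasks are skipped.
    """
    mean_sigma = {
        person: sum(sigmas) / len(sigmas)
        for person, sigmas in sigma_by_person.items()
        if sigmas
    }
    # Smallest mean noise first; ties broken by person id for determinism.
    ranked = sorted(mean_sigma, key=lambda person: (mean_sigma[person], person))
    return ranked[:k]


previous = {"p1": [0.9, 1.1], "p2": [0.2, 0.4], "p3": [0.5, 0.5], "p4": []}
print(select_experts(previous, k=2))  # ['p2', 'p3']
```

No ground truth enters this selection: the noise levels come from the model fitted to the crowd’s own responses on earlier tasks.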

Group Composition based on prior performance

[Figure: performance (y-axis, 7 to 14) against group size (x-axis, 0 to 80, best individuals first), with one curve per number of previous tasks; legend entries read T=0, T=2, T=12 in one place and T = 0, T = 2, T = 8 in another]

Methods for Selecting Experts

Endogenous: no feedback required

Exogenous: selecting people based on actual performance

[Figure: two panels, one per method, each with a y-axis from 7 to 14 and an x-axis from 0 to 40]

Aggregating Episodic Memories

Study this sequence of images

Place the images in correct sequence (serial recall)

[Figure: ten images, labeled A to J]

Average results across 6 problems

[Figure: mean performance by individual; y-axis from 0 to 15, x-axis individuals 1 to 30]


Example calibration result for individuals

[Figure: scatter plot of each individual’s distance to ground truth (y-axis, 0 to 30) against inferred noise level (x-axis, 0 to 6); R=0.920 (pizza sequence; perturbation model)]

Predictive Rankings: fantasy football

South Australian Football League (32 people rank 9 teams)

Australian Football League (29 people rank 16 teams)

[Figure: one panel per league showing performance by individual (x-axis, individuals 1 to 30)]

Part II: Matching Problems

Study these combinations

[Figure: five items labeled 1 to 5, each paired with one of five items labeled A to E]

Find all matching pairs

Experiment

15 subjects

8 problems
- 4 problems with 5 items
- 4 problems with 10 items

Mean accuracy across 8 problems

[Figure: mean accuracy (y-axis, 0 to 1) for each of the 15 individuals]

Bayesian Matching Model

Proposed process
- match “known” items
- guess between remaining ones

Individual differences
- some items easier to know
- some participants know more

Graphical Model

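Plates: i items, j individuals

$\mathrm{logit}(s_{ij}) = d_i + a_j$
$x_{ij} \sim \mathrm{Bernoulli}(s_{ij})$
$p(y_{ij} \mid z, x_{ij}) = \begin{cases} 1 & \text{if } x_{ij} = 1 \\ 1/n_j! & \text{if } x_{ij} = 0 \end{cases}$

where $n_j$ is the number of items individual j does not know

- $z$: latent ground truth
- $y_j$: observed matching
- $x_j$: knowledge state
- $s_j$: probability of knowing
- $d_i$: item easiness
- $a_j$: person ability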
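The proposed process (match the known items, guess among the remaining ones) can be simulated directly. The sketch below is a forward simulation only, not the Bayesian inference from the talk. It assumes that the probability of knowing an item is the logistic function of item easiness plus person ability; the function name `simulate_matching`, the input format, and the example values are hypothetical.

```python
import math
import random


def simulate_matching(truth, easiness, ability, seed=0):
    """Simulate one matching per person under a know-or-guess process.

    truth[i] is the correct response for item i.
    easiness[i] is the easiness of item i; ability[j] is the ability of person j.
    Assumption: P(person j knows item i) = logistic(easiness[i] + ability[j]).
    """
    rng = random.Random(seed)
    matchings = []
    for a_j in ability:
        # Knowledge state: which items this person knows.
        known = [rng.random() < 1.0 / (1.0 + math.exp(-(d_i + a_j))) for d_i in easiness]
        unknown_items = [i for i, k in enumerate(known) if not k]
        # Guess: a uniformly random assignment of the leftover responses.
        leftover = [truth[i] for i in unknown_items]
        rng.shuffle(leftover)
        matching = list(truth)  # known items are matched correctly
        for item, response in zip(unknown_items, leftover):
            matching[item] = response
        matchings.append(matching)
    return matchings


truth = ["A", "B", "C", "D", "E"]
for m in simulate_matching(truth, easiness=[1.0, 0.0, 0.0, -1.0, -1.0], ability=[2.0, 0.0, -2.0]):
    print(m)
```

Note that a guessed item can still land on its correct response by chance, so observed accuracy slightly overstates how many items a person actually knew.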

Modeling results across 8 problems

[Figure: mean accuracy (y-axis, 0 to 1) of Bayesian Matching and the Hungarian Algorithm plotted alongside the 15 individuals]

Calibration at level of items and people

ITEMS

[Figure: eight scatter plots of D (actual) against D (inferred), both on axes from 0 to 1. Panel titles and correlations, in extraction order: Clothing and faces (5), R=0.318; (no title recovered), R=0.722; Animals and houses (5), R=0.433; (no title recovered), R=0.854; Weapons and faces (5), R=0.969; (no title recovered), R=0.893; Sport and faces (5), R=0.223; Sport and faces (10), R=0.898]

(for weapons and faces 10 items problem)

INDIVIDUALS

[Figure: eight scatter plots of A (actual) against A (inferred), both on axes from 0 to 1. Correlations, in extraction order: R=0.955; R=0.994; R=0.962; R=0.971; R=0.943; R=0.957; Sport and faces (5), R=0.953; Sport and faces (10), R=0.984]

Varying number of individuals

[Figure: mean accuracy (y-axis, 50 to 100) against number of individuals (x-axis, 0 to 15) for Bayesian Matching and the Hungarian Algorithm]

How predictive are subject-provided confidence ratings?

[Figure: two panels of accuracy (y-axis, 0 to 1) by number of guesses (bins 0, 1-2, 3-4, 5+). Left: number of guesses estimated by the individual, r=-.50. Right: number of guesses estimated by the model (based on variable A), r=-.81]

Another matching problem

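Languages: Dutch, Danish, Yiddish, Thai, Vietnamese, Chinese, Georgian, Russian, Japanese

Phrases (labeled A to I on the slide; the Chinese and Japanese phrases were not recovered):
- godt nytår
- gelukkig nieuwjaar
- a gut yohr
- С Новым Годом
- สวัสดีปีใหม่
- Chúc Mừng Nǎm Mới
- გილოცავთ ახალ წელს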

Modeling Results – Declarative Tasks

[Figure: mean accuracy (y-axis, 0 to 1) of Bayesian Matching and the Hungarian Algorithm plotted alongside the individuals]

Part III: Traveling Salesman Problems

Find the shortest route between cities

[Figure: 30-city problem B30-21, shown unsolved and with four tours: Optimal, Individual 5, Individual 83, Individual 60]

Dataset: Vickers, Bovet, Lee, & Hughes (2003)
- 83 participants
- 7 problems of 30 cities

TSP Aggregation Problem

Propose a good solution based on all individual solutions

Task constraints
- Data consists of city order only
- No access to city locations

Approach

Find tours with edges for which many individuals agree

Calculate agreement matrix A
- A = n × n matrix, where n is the number of cities
- $a_{ij}$ indicates the number of participants that connect cities i and j

Find the tour that maximizes the agreement $a_{ij}$ accumulated over its edges $(i, j)$

(this is itself a non-Euclidean TSP)
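A small sketch of the agreement idea: build the agreement matrix from the individual tours, score a candidate tour by the agreement on its edges, and search for the best-scoring tour. Two things here are assumptions rather than the method from the talk. First, the score is a plain sum of the edge counts, which may differ from the exact objective on the slide. Second, the search is brute force, which is feasible only for a handful of cities and would need a TSP heuristic for 30. The function names are illustrative.

```python
from itertools import permutations


def agreement_matrix(tours, n):
    """a[i][j] = number of tours that connect cities i and j (symmetric)."""
    a = [[0] * n for _ in range(n)]
    for tour in tours:
        for k in range(len(tour)):
            i, j = tour[k], tour[(k + 1) % len(tour)]  # tours are closed loops
            a[i][j] += 1
            a[j][i] += 1
    return a


def tour_agreement(tour, a):
    """Sum of agreement counts over the edges of a closed tour (assumed objective)."""
    return sum(a[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))


def best_tour_brute_force(a):
    """Try every tour that starts at city 0; only practical for very small n."""
    n = len(a)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    return max(candidates, key=lambda tour: tour_agreement(tour, a))


tours = [(0, 1, 2, 3), (0, 1, 2, 3), (0, 2, 1, 3)]
a = agreement_matrix(tours, n=4)
best = best_tour_brute_force(a)
print(best, tour_agreement(best, a))  # (0, 1, 2, 3) 10
```

Only the city order in each tour is used, which matches the task constraint that the aggregation has no access to city locations.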

Line thickness = agreement

[Figure: problem B30-21 with edges between cities drawn at a thickness that reflects agreement]

Blue = Aggregate Tour

[Figure: problem B30-21 with the aggregate tour drawn in blue]

Results averaged across 7 problems

[Figure: percent over optimal (y-axis, 0 to 18) for the individuals and for the aggregate]

Part IV: Summary & Conclusions

When do we get wisdom of crowds effect?

Independent errors
- different people knowing different things

Some minimal number of individuals
- 10-20 individuals often sufficient

What are methods for finding experts?

1) Self-reported expertise
- unreliable
- has led to claims of “myth of expertise”

2) Based on explicit scores, by comparing to ground truth
- but ground truth might not be immediately available

3) Endogenously discover experts
- Use the crowd to discover experts
- Small groups of experts can be effective

What to do about systematic biases?

In some tasks, individuals systematically distort the ground truth
- spatial and temporal distortions
- memory distortions (e.g., false memory)
- decision-making distortions

Does this diminish the wisdom of crowds effect?
- maybe… but a model that predicts these systematic distortions might be able to “undo” them

Conclusion

Effective aggregation of human judgments requires cognitive models

Psychology and cognitive science can inform aggregation models

That’s all

Do the experiments yourself:

http://psiexp.ss.uci.edu/

Online Experiments

Experiment 1 (Prior knowledge): http://madlab.ss.uci.edu/dem2/examples/

Experiment 2a (Serial Recall), study sequence of still images: http://madlab.ss.uci.edu/memslides/

Experiment 2b (Serial Recall), study video: http://madlab.ss.uci.edu/dem/


MDS solution of pairwise tau distances

[Figure: MDS solution for the problem of ordering states from west to east. Points show the individuals (each labeled with its distance to truth), the truth, and the Thurstonian model]

MDS solution of pairwise tau distances

[Figure: MDS solution for the Ten Commandments problem. Points show the individuals (each labeled with its distance to truth), the truth, and the Thurstonian model]

Hierarchical Bayesian Models

Generative models
- ordering information
- cognitively plausible
- individual differences

Group response = probability distribution over all permutations of N items
- With N = 44 items, we have 44! > 10^53 possible orderings
- Approximate inference methods: MCMC
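The size of the permutation space is easy to check with exact integer arithmetic, which shows why the distribution over orderings cannot be enumerated and has to be approximated. This is a small illustrative check; the helper name `n_orderings` is not from the slides.

```python
import math


def n_orderings(n_items):
    """Number of distinct orderings (permutations) of n_items items."""
    return math.factorial(n_items)


print(n_orderings(10))              # 3628800 orderings for a 10-item problem
print(n_orderings(44) > 10 ** 53)   # True
print(len(str(n_orderings(44))))    # 55 digits
```

Even the 10-item problems have millions of possible orderings, so sampling-based inference is a natural fit at both problem sizes.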

Model incorporating overall person ability

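Plates: j individuals, m tasks

$\sigma_j \sim \mathrm{Gamma}(\lambda, 1/\lambda)$ (overall ability)
$\sigma_{jm} \sim \mathrm{Gamma}(\sigma_j, 1/\sigma_j)$ (task-specific ability)
$x_{ijm} \mid \mu_m, \sigma_{jm} \sim \mathrm{N}(\mu_{im}, \sigma_{jm})$
$y_{jm} = \mathrm{rank}(x_{jm})$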

Average results over 17 Problems

[Figure: mean performance (y-axis, 0 to 25) of Thurstonian Model v1, Thurstonian Model v2 (the new model), Borda count, and Mode plotted alongside the 78 individuals]

Thurstonian Model – stereotyped event sequences

[Figure: three panels, each with an inferred event ordering and a calibration scatter plot (y-axis, 0 to 25; x-axis, 0 to 2)]

Bus (Recall), R=0.890: event1 (1), event2 (2), event3 (3), event4 (4), event5 (5), event6 (7), event7 (6), event8 (8), event9 (9), event10 (10)

Morning (Recall), R=0.982

Wedding (Recall), R=0.973

Thurstonian Model – “random” videos

[Figure: three panels, each with an inferred event ordering and a calibration scatter plot (y-axis, 0 to 25; x-axis, 0 to 2)]

Yogurt (Recall), R=0.908

Pizza (Recall), R=0.851: event1 (1), event2 (3), event3 (4), event4 (5), event5 (2), event6 (6), event7 (7), event8 (9), event9 (10), event10 (8)

Clay (Recall), R=0.928

Heuristic Aggregation Approach

Combinatorial optimization problem
- maximizes agreement in assigning N items to N responses

Hungarian algorithm
- construct a count matrix M
- M_ij = number of people that paired item i with response j
- find row and column permutations to maximize diagonal sum
- O(n^3)
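The objective the Hungarian algorithm solves can be shown on a tiny example. The sketch below builds the count matrix and then finds the assignment with the largest total count by exhaustive search over permutations. This brute-force search is a stand-in chosen for clarity: it returns the same optimum on small problems but takes factorial time, whereas the Hungarian algorithm reaches it in O(n^3). Function names and the example data are illustrative.

```python
from itertools import permutations


def count_matrix(matchings, n):
    """m[i][r] = number of people who paired item i with response r."""
    m = [[0] * n for _ in range(n)]
    for matching in matchings:
        for item, response in enumerate(matching):
            m[item][response] += 1
    return m


def best_assignment(m):
    """Assignment (response index per item) maximizing total agreement.

    Exhaustive search over all permutations; a brute-force stand-in for the
    Hungarian algorithm, usable only for small n.
    """
    n = len(m)
    return max(
        permutations(range(n)),
        key=lambda p: sum(m[i][p[i]] for i in range(n)),
    )


# Three people match three items to responses 0..2.
matchings = [(0, 1, 2), (0, 2, 1), (1, 0, 2)]
m = count_matrix(matchings, n=3)
print(m)                   # [[2, 1, 0], [1, 1, 1], [0, 1, 2]]
print(best_assignment(m))  # (0, 1, 2)
```

Like Borda count, this heuristic weights every person equally, in contrast to the Bayesian matching model, which infers person ability and item easiness.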

Hungarian Algorithm Example

Count matrix: rows are phrases, columns are the languages chosen by participants. Cell shading on the slide marks correct and incorrect assignments.

Columns, in order: Dutch, Danish, French, Japanese, Spanish, Arabic, Chinese, German, Italian, Russian, Thai, Vietnamese, Welsh, Georgian, Yiddish

gelukkig Nieuwjaar: 7 3 0 0 0 1 0 0 0 0 0 0 2 0 2
godt nytår: 2 3 0 0 0 0 0 2 0 2 0 0 1 3 2
bonne année: 0 0 14 0 1 0 0 0 0 0 0 0 0 0 0
(phrase not recovered): 0 0 0 9 0 0 2 0 1 0 3 0 0 0 0
feliz año nuevo: 0 0 0 0 14 0 0 0 0 0 1 0 0 0 0
عام سعيد: 0 1 0 0 0 14 0 0 0 0 0 0 0 0 0
(phrase not recovered): 0 0 0 2 0 0 12 0 0 0 0 1 0 0 0
ein gutes neues Jahr: 3 1 0 0 0 0 0 9 0 0 0 0 1 0 1
felice anno nuovo: 0 0 0 0 0 0 0 0 14 1 0 0 0 0 0
С Новым Годом: 0 0 1 0 0 0 0 0 0 11 0 0 1 2 0
สวัสดีปีใหม่: 0 0 0 1 0 0 1 0 0 0 7 1 1 4 0
Chúc Mừng Nǎm Mới: 0 0 0 0 0 0 0 0 0 1 0 11 1 2 0
Blwyddyn Newydd Dda: 0 4 0 1 0 0 0 0 0 0 1 0 6 1 2
გილოცავთ ახალ წელს: 0 0 0 2 0 0 0 1 0 0 3 2 0 1 6
a gut yohr: 3 3 0 0 0 0 0 3 0 0 0 0 2 2 2
