Page 1: Monica Tamariz  Richard Shillcock monica@ling.ed.ac.uk  rcs@cogsci.ed.ac.uk

Real World Constraints on the Mental Lexicon: Assimilation, the Speech Lexicon and the Information Structure of Spanish Words

Monica Tamariz, Richard Shillcock

Page 2:

Overview

• Use information profiles of word systems (corpus, lexicons).

• More realistic representations of speech generate flatter profiles.

• Flatter profiles reflect more efficient use of the representational space.

Page 3:

Assumptions

• Phonology plays a part in the organization of the mental lexicon.

• For maximal efficiency, information should be spread as evenly as possible over the representational space.

Page 4:

Distribution of information over the representational space

[Figure: information (entropy), on a scale from 0 to 1, plotted for representational space segments 1 to 17.]

Page 5:

Entropy

• Concept of entropy from information theory (Shannon, 1948).

• Measure of the uncertainty or information: $H = -\sum_i p_i \log p_i$

• Redundancy: $R = 1 - H$
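
As a minimal sketch of the two formulas above, the following Python computes entropy from a list of counts and derives redundancy from it. It assumes base-2 logarithms and that the H in R = 1 - H is the entropy divided by its maximum (the Hrel plotted on later slides). The slide states neither choice, and the function names are illustrative only.

```python
import math


def entropy(counts):
    """Shannon entropy H = -sum(p_i * log2 p_i), in bits, from raw counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:  # zero counts contribute nothing to the sum
            p = c / total
            h -= p * math.log2(p)
    return h


def h_rel(counts, alphabet_size):
    """Entropy divided by its maximum, log2(alphabet_size); lies in [0, 1]."""
    return entropy(counts) / math.log2(alphabet_size)


def redundancy(counts, alphabet_size):
    """R = 1 - H, with H taken as the relative entropy (an assumption)."""
    return 1.0 - h_rel(counts, alphabet_size)
```

A uniform distribution over the whole alphabet gives a relative entropy of 1 and a redundancy of 0; any skew lowers the former and raises the latter.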

Page 6:

Data sets

• Speech corpus: 707,000 word tokens.

• Speech lexicon: 42,000 word types.

• Dictionary lexicon: 28,000 headwords.

Page 7:

Transcriptions

• Citation (30 phonemes)

• Fast-speech (50 phonemes and allophones)
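
Because the two transcriptions use symbol inventories of different sizes, their raw entropies are not directly comparable. One common way to put them on the same 0 to 1 scale, assumed here because the slides do not spell out the normalisation, is to divide by the maximum entropy for an inventory of size N:

```latex
H_{\mathrm{rel}} = \frac{H}{\log_2 N}, \qquad
\log_2 30 \approx 4.91 \text{ bits (citation)}, \qquad
\log_2 50 \approx 5.64 \text{ bits (fast-speech)}
```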

Page 8:

Fast-speech transcription

• Glides

• Approximant B D G

• Consonant assimilation, e.g. [], [z], [dental n], [dental l]

Page 9:

Transcriptions

ORTHO.      CITATION    FAST-SPEECH
admitir     admitIr     amitIr
carnets     karnEts     kanEs
colgado     kolgAdo     kol Ao
quitarle    kitArle     kitAle
gracias     grAzias     Azia
gaitero     gaitEro     gatEro

Page 10:

The Information Profile

Calculate entropy: $H = -\sum_i p_i \log p_i$

Words, by segment position:

       1  2  3  4  5  6  7
 1     a  d  m  i  t  I  r
 2     t  e  r  m  I  n  a
 3     k  a  r  n  E  t  s
 4     k  o  l  g  A  d  o
 5     m  o  m  E  n  t  o
 6     f  a  m  I  l  i  a
 7     k  i  t  A  r  l  e
 8     t  e  n  E  m  o  s
 9     g  r  A  z  i  a  s
10     t  o  d  a  b  I  a
etc.

Count of phonemes in each segment position:

         P1     P2     P3     P4     P5     P6     P7
a      4189   6917   1533   1741    258   3923   9200
b      1789    947   2020   2918   1547    358      5
d      3075    164   1496    980   2066   4350    232
e      4481   8918   1003   1792    284   3427   3866
f      1104    169    766    131     26     12      2
g      1110    169   1739    677    285    654     25
i      1553   4278   1850   2561   4430   2129    106
k      3856   1075   1911    567    745   1364     59
l       567   1450   2207    994   1562   1612   1268
m      3540    972   4688   2668   3294    668      8
etc.
T.    45559  45559  45559  45559  45559  45559  45559
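
The procedure on this slide (tabulate phoneme counts per segment position, then compute the entropy of each column) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: it treats each character of the citation transcription as one segment, normalises by log2 of a 30-symbol inventory, and uses only the ten example words shown on the slide, so the resulting values are far too coarse to compare with the reported profiles.

```python
import math
from collections import Counter


def position_counts(words):
    """Count how often each symbol occurs at each segment position."""
    length = len(words[0])
    counts = [Counter() for _ in range(length)]
    for word in words:
        if len(word) != length:
            raise ValueError("all words must have the same length")
        for pos, symbol in enumerate(word):
            counts[pos][symbol] += 1
    return counts


def information_profile(words, alphabet_size):
    """Relative entropy at each segment position (assumed normalisation)."""
    profile = []
    for counter in position_counts(words):
        total = sum(counter.values())
        h = -sum((c / total) * math.log2(c / total) for c in counter.values())
        profile.append(h / math.log2(alphabet_size))
    return profile


# The ten seven-segment example words from the slide (citation transcription).
words = ["admitIr", "termIna", "karnEts", "kolgAdo", "momEnto",
         "famIlia", "kitArle", "tenEmos", "grAzias", "todabIa"]

profile = information_profile(words, alphabet_size=30)
for position, value in enumerate(profile, start=1):
    print(f"P{position}: {value:.3f}")
```

Running the same function on each word-length group separately would give one profile per length, which matches how the later slides break results down by word lengths 4 to 7.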

Page 11:

The Information Profile: Corpus, Citation

[Figure: relative entropy, on a scale from 0.5 to 1, by segment position (1 to 7), with the fitted line y = -0.0256x + 0.8941.]

Measures: slope (-m) and mean level of entropy (Hrel).
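
The two summary measures can be obtained from a profile with an ordinary least-squares line fit, sketched below. The slides do not say how the line was fitted, so least squares is an assumption, and `fit_line` is a hypothetical helper. The example points are generated from the line reported on the slide, not from the underlying data.

```python
def fit_line(ys):
    """Least-squares line through (1, ys[0]), (2, ys[1]), ...; returns (m, b)."""
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Points lying exactly on the slide's line, for segment positions 1 to 7.
points = [-0.0256 * x + 0.8941 for x in range(1, 8)]

m, b = fit_line(points)
slope = -m                               # the slides report the slope as -m
mean_level = sum(points) / len(points)   # mean level of entropy
print(f"slope (-m) = {slope:.4f}, intercept = {b:.4f}, mean = {mean_level:.4f}")
```

A flatter profile corresponds to a value of -m closer to zero.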

Page 12:

The LERR principle

(Levelling effect of realistic representations)

“Processes that make the representation of words more accurate will flatten the information profiles”

Page 13:

The effect of the transcription: Information profile slopes

• Fast speech has flatter profiles (as in other languages)

• Longer words have flatter profiles

[Figure: slope (-m), on a scale from 0 to 0.07, by word length (4 to 7), for the citation and fast-speech transcriptions.]

Page 14:

The effect of the transcription: level of entropy.

• More entropy in the citation transcription.

• Fast speech is more redundant and thus more predictable.

[Figure: mean level of entropy (Hrel), on a scale from 0.6 to 0.8, by word length (4 to 7), for the citation and fast-speech transcriptions.]

Page 15:

The Speech Lexicon: Information profile slopes.

• Speech Lexicon: the active mental lexicon represented in the brain.

• The speech lexicon has flatter profiles.

[Figure: slope (-m), on a scale from 0 to 0.07, by word length (4 to 7), for the dictionary lexicon and the speech lexicon.]

Page 16:

The Speech Lexicon: Level of entropy

• Speech lexicon: low redundancy levels.

• Level varies little across word lengths.

• Support for Butterworth's ‘Full Listing Hypothesis’.

[Figure: mean level of entropy (Hrel), on a scale from 0.6 to 0.8, by word length (4 to 7), for the dictionary lexicon and the speech lexicon.]

Page 17:

Corpus vs. Lexicon: Information profile slopes

• Corpus: representation over time.

• Lexicon: representation over space.

• The lexicon yields flatter profiles.

[Figure: slope (-m), on a scale from 0 to 0.07, by word length (4 to 7), for the corpus and the lexicon.]
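
The difference between the two representations comes down to how words are counted: a corpus weights each word by its token frequency, while a lexicon counts each word type once. The sketch below illustrates only that counting difference. The token list is a made-up toy sample, not data from the study, and `entropy_by_position` is a hypothetical helper returning unnormalised entropy in bits.

```python
import math
from collections import Counter


def entropy_by_position(weighted_words):
    """Entropy (bits) per segment position for a {word: weight} mapping."""
    length = len(next(iter(weighted_words)))
    counts = [Counter() for _ in range(length)]
    for word, weight in weighted_words.items():
        for pos, symbol in enumerate(word):
            counts[pos][symbol] += weight
    result = []
    for counter in counts:
        total = sum(counter.values())
        result.append(-sum((v / total) * math.log2(v / total)
                           for v in counter.values()))
    return result


# Hypothetical toy token stream (NOT from the speech corpus).
tokens = ["kasa", "kasa", "kasa", "peRo", "mesa", "kasa", "peRo"]

corpus_weights = Counter(tokens)                       # tokens: frequency-weighted
lexicon_weights = {word: 1 for word in corpus_weights}  # types: each counted once

corpus_profile = entropy_by_position(corpus_weights)
lexicon_profile = entropy_by_position(lexicon_weights)
print("corpus :", [round(h, 3) for h in corpus_profile])
print("lexicon:", [round(h, 3) for h in lexicon_profile])
```

In this toy sample the frequent word dominates the corpus counts, which lowers first-position entropy relative to the type-based count. This shows the mechanism only and is not evidence for the reported results.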

Page 18:

Corpus vs. Lexicon: Level of entropy.

• The lexicon generates higher entropy levels.

[Figure: mean level of entropy (Hrel), on a scale from 0.6 to 0.8, by word length (4 to 7), for the corpus and the lexicon.]

Page 19:

Discussion

• Fast-speech rules and a ‘Full List’ mental lexicon flatten the information profile.

• In the speech lexicon, the main constraint is efficiency of storage.

• In the corpus, other constraints, such as lexical segmentation, interact with the optimization of communication.

Page 20:

Conclusion

• This simple analysis of the information profile of word systems is a useful tool that can provide insights into the validity of psycholinguistic theories.
