
Page 1:

Word Representations

Advisor: Hsin-Hsi Chen
Reporter: Chi-Hsin Yu
Date: 2010.08.12

From:
- Word Representations: A Simple and General Method for Semi-Supervised Learning (ACL 2010)
- From Frequency to Meaning: Vector Space Models of Semantics (JAIR 2010)
- Representing Word Meaning and Order Information in a Composite Holographic Lexicon (Psychological Review 2007)

Page 2:

Outline
- Introduction
- Word representations
- Experimental comparisons (ACL 2010)
  - Chunking, Named Entity Recognition
- Conclusions

Page 3:

Introduction

A word representation: a mathematical object associated with each word, often a vector.

Examples:
- dog: animal, pet, four-legged, ...
- cat: animal, pet, four-legged, ...
- bird: animal, two-legged, flies, ...

Questions:
- How do we build this matrix?
- Are there representations other than matrices?

[Figure: word-by-feature matrix, rows indexed by the vocabulary]
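As a minimal sketch of the word-feature matrix above (the binary feature values are my own illustration, not from the papers), vectors like these immediately support a similarity measure such as cosine:

```python
import numpy as np

# Toy word-by-feature matrix from the slide: rows are words,
# columns are hand-picked features (hypothetical encoding).
features = ["animal", "pet", "four-legged", "two-legged", "flies"]
words = ["dog", "cat", "bird"]
M = np.array([
    [1, 1, 1, 0, 0],   # dog:  animal, pet, four-legged
    [1, 1, 1, 0, 0],   # cat:  animal, pet, four-legged
    [1, 0, 0, 1, 1],   # bird: animal, two-legged, flies
], dtype=float)

def cosine(u, v):
    """Cosine similarity, a common word-proximity measure."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(M[0], M[1]))  # dog vs. cat  -> 1.0 (identical toy features)
print(cosine(M[0], M[2]))  # dog vs. bird -> ~0.33
```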

Page 4:

Word Representations

Categorizing word representations by source:
- From humans: feature lists, semantic networks, ontologies (WordNet, SUMO, FrameNet, ...)
- From texts:
  - Frequency-based: distributional representations, Latent Semantic Indexing
  - Model-based: clustering (Brown clustering), Latent Dirichlet Allocation, embeddings (neural language model, hierarchical log-bilinear model)
  - Operations-based: random indexing (quantum informatics), holographic lexicon

Page 5:

Word Representations

Some important considerations:
- Dimension size: distributional representations > 5,000; HLBL, random indexing, LSI < 500
- Format: vector or network
- Encoded knowledge/relations/information: world knowledge (ontology), word semantics, word similarity/distance/proximity

The most important question in word representations: what is meaning?

Page 6:

Word Representations – Distributional Representations

- From texts, frequency-based
- Rows and columns correspond to tokens and events (row-column <=> token-event)
- Event = occurrence in the same document, or co-occurrence within a 5-word window
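A sketch of how the window-based token-event counts might be collected (the toy sentence and the helper name are mine):

```python
from collections import Counter

def cooccurrence_counts(tokens, window=5):
    """Count (token, event) pairs where the 'event' is another word
    appearing within `window` positions of the token."""
    counts = Counter()
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(w, tokens[j])] += 1
    return counts

tokens = "the dog chased the cat and the cat chased the bird".split()
counts = cooccurrence_counts(tokens, window=5)
print(counts[("dog", "cat")])  # 1: only the first 'cat' falls in dog's window
```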

Page 7:

Word Representations – Distributional Representations

The event can also be a pattern:
- "A door is a part of a house." → token: door:house, event: is_a_part_of

Procedures applied to the matrix [From Frequency to Meaning: Vector Space Models of Semantics, JAIR 2010]:
- Preprocessing of texts (tokenization, annotation, ...)
- Normalization/weighting
- Smoothing of the matrix (using SVD): latent meaning, noise reduction, higher-order co-occurrence, sparsity reduction
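Schematically, the normalization and SVD-smoothing steps could look like this. PPMI is one common weighting choice among those the JAIR survey discusses, and the count matrix below is a random stand-in:

```python
import numpy as np

def ppmi(C):
    """Positive pointwise mutual information weighting of a raw
    co-occurrence count matrix (one common normalization scheme)."""
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((C * total) / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0   # zero counts -> zero weight
    return np.maximum(pmi, 0.0)

def svd_smooth(X, k):
    """Rank-k SVD truncation: keeps latent structure, drops noise,
    and yields dense k-dimensional word vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

C = np.random.poisson(1.0, size=(20, 50)).astype(float)  # stand-in counts
W = svd_smooth(ppmi(C), k=5)
print(W.shape)  # (20, 5): one 5-dimensional vector per word
```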

Page 8:

Word Representations – Brown Clustering

The Brown algorithm:
- is a hierarchical clustering algorithm that clusters words to maximize the mutual information of bigrams (Brown et al., 1992);
- is a class-based bigram language model;
- runs in time O(V·K²), where V is the size of the vocabulary and K is the number of clusters.
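The full greedy merge procedure is too long to show here, but the objective it maximizes is compact. A sketch of the class-bigram mutual information for a given clustering (the tokens and cluster assignment are toy examples of mine):

```python
import numpy as np
from collections import Counter

def class_bigram_mi(tokens, cluster_of):
    """Average mutual information of adjacent class pairs, the
    objective that Brown clustering greedily maximizes."""
    pair = Counter((cluster_of[a], cluster_of[b])
                   for a, b in zip(tokens, tokens[1:]))
    n = sum(pair.values())
    left, right = Counter(), Counter()
    for (c1, c2), k in pair.items():
        left[c1] += k
        right[c2] += k
    mi = 0.0
    for (c1, c2), k in pair.items():
        p = k / n
        mi += p * np.log2(p / ((left[c1] / n) * (right[c2] / n)))
    return mi

tokens = "the dog barks the cat meows the dog runs".split()
clusters = {"the": 0, "dog": 1, "cat": 1, "barks": 2, "meows": 2, "runs": 2}
print(class_bigram_mi(tokens, clusters))
```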

Page 9:

Word Representations – Embedding

Collobert and Weston embedding (2008):
- Neural language model
- Discriminative and non-probabilistic

Hierarchical log-bilinear embedding (HLBL) (2009):
- Neural language model
- Distributed representation
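"Discriminative and non-probabilistic" refers to Collobert and Weston's training criterion: a text window's score must beat the score of the same window with its centre word corrupted, with no normalized probabilities involved. A simplified sketch of that ranking loss (a linear scorer stands in for the paper's hidden layer; all parameters are random and nothing is trained here):

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50                      # vocabulary size, embedding dimension
E = rng.normal(0, 0.1, (V, d))       # word embeddings (to be learned)
w = rng.normal(0, 0.1, 5 * d)        # scorer weights for a 5-word window

def score(window_ids):
    """Score a window: concatenate its embeddings, apply a linear scorer.
    (The real model inserts a hidden layer at this point.)"""
    return w @ E[window_ids].reshape(-1)

def ranking_loss(window_ids, corrupt_id):
    """Pairwise hinge loss: the true window should outscore the
    window whose centre word was replaced, by a margin of 1."""
    corrupted = list(window_ids)
    corrupted[len(corrupted) // 2] = corrupt_id
    return max(0.0, 1.0 - score(window_ids) + score(corrupted))

print(ranking_loss([3, 17, 42, 8, 99], corrupt_id=int(rng.integers(V))))
```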

Page 10:

Experimental Comparisons (ACL 2010)

Chunking:
- CoNLL-2000 shared task
- Linear CRF chunker (Sha and Pereira 2003)
- Data from the Penn Treebank: 7,936 sentences for training, 1,000 sentences for development

Named Entity Recognition:
- Regularized averaged perceptron model (Ratinov and Roth 2009)
- CoNLL-2003 shared task: 204k words for training, 51k words for development, 46k words for testing
- Out-of-domain evaluation set: MUC7 formal run (59k words)

Page 11:

Experimental Comparisons – Features

[Table: feature templates for the Chunking and NER systems]
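To make the feature columns concrete: when Brown clusters are the word representation, each word carries a binary path through the cluster hierarchy, and prefixes of that path give features at several granularities. A hedged sketch (the exact templates of the CRF chunker and perceptron NER system are in the paper's tables; the path string here is made up, and the prefix lengths follow common practice):

```python
def brown_cluster_features(word, cluster_path, prefix_lengths=(4, 6, 10, 20)):
    """Emit prefix features of a word's Brown cluster bit-string.
    Prefixes at several depths let the model back off from fine
    clusters to coarser ones."""
    return [f"brown_prefix_{p}={cluster_path[:p]}" for p in prefix_lengths]

# hypothetical cluster path for "dog" from a trained Brown hierarchy
print(brown_cluster_features("dog", "0110101110"))
# ['brown_prefix_4=0110', 'brown_prefix_6=011010', ...]
```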

Page 12:

Experimental Comparisons – Results

Page 13:

Experimental Comparisons – Results

Page 14:

Experimental Comparisons – Results

Page 15:

Conclusions

- Word features can be learned in advance in an unsupervised, task-inspecific, and model-agnostic manner.
- The disadvantage is that accuracy might not be as high as with a semi-supervised method that includes task-specific information and jointly learns the supervised and unsupervised tasks (Ando & Zhang, 2005, ASO; Suzuki & Isozaki, 2008; Suzuki et al., 2009).
- Future work: inducing phrase representations.

Page 16:

Thanks for your attention!

Q&A