information communication theory


Page 1: Information Communication Theory

Information Communication Theory ( 情報伝達学 )

Kentaro Inui ( 乾 健太郎 )
Naoaki Okazaki ( 岡崎 直観 )

2011-10-04 Information Communication Theory ( 情報伝達学 ) 1

Page 2: Information Communication Theory

Course Plan
• Part I ( Okazaki )
  • 10/04: Introduction
  • 10/11: Classification
  • 10/18: Part-of-speech tagging
  • 10/25: Syntactic parsing
  • 11/01: Statistical parsing
• Part II ( Inui )
  • 11/08: Features and unification
  • 11/15: Representation of meaning
  • 11/22: Computational semantics
  • 11/29: Computational lexical semantics
  • 12/06: (no class)
• Part III ( Inui, Okazaki, TAs )
  • 12/13, 12/20, 2013/01/10, 01/17, 01/24
  • Programming exercises and project from Natural Language Processing with Python ( by Steven Bird )
  • Lectures given at 計算機大演習室 (large computer lab), 1st floor of the New Student Laboratory Building for Information Engineering ( 情報新棟 )

Page 3: Information Communication Theory

Course Format
• Text ( optional )
  • Jurafsky, Daniel and Martin, James H. Speech and Language Processing. Prentice-Hall, 2009 ( 2nd Edition )
    • About ¥6,000 at amazon.co.jp
  • Bird, Steven et al. Natural Language Processing with Python. O'Reilly & Associates Inc., 2009
    • Japanese translation: 萩原 正人,中山 敬広,水野 貴明 訳 『入門 自然言語処理』 O'Reilly Japan, 2010
• Grading
  • Exercises ( given in lectures ) : 40%
  • Final report ( programming project )

Page 4: Information Communication Theory

Handouts
• If necessary, please print the handout yourself and bring it to class
• Alternatively, browse it on your laptop
• Handouts will be available (before dawn) at:
  • http://www.cl.ecei.tohoku.ac.jp/index.php?InformationCommunicationTheory
  • Username: nlp2012
  • Password: chukougishitsu

Page 5: Information Communication Theory

Contact Information

• Office hours:
  • Tue, 1:00-2:30pm or by appointment
• Office:
  • Room 305 ( 108 after Nov ), Electrical Engineering and Applied Physics Research Building No.3 ( 電気系 3号館 )
• Contact:
  • [email protected] @inuikentaro
  • [email protected] @chokkanorg

Page 6: Information Communication Theory

Introduction
Naoaki Okazaki
[email protected]
http://www.chokkan.org/
http://twitter.com/#!/chokkanorg
#nlptohoku
http://www.chokkan.org/lectures/2012nlp/p/01.pdf

Page 7: Information Communication Theory

Natural Language Processing (NLP)
• Giving computers the ability to process human language
• As old as the idea of computers themselves!
• Implementations and implications of the exciting idea
• The long-awaited dream (that has not come true yet)

[Images: Doraemon, C-3PO (Star Wars), Atom (Astro Boy)]

Page 8: Information Communication Theory

What needs to be done to understand languages as humans do?

Part I: Knowledge (disciplines)

Page 9: Information Communication Theory

Lexical semantics ( 語彙意味論 )


How much Chinese silk was exported to Western Europe by the end of the 18th century?

[Figure: compass rose with N, S, E, W]

Meaning of words

Page 10: Information Communication Theory

Compositional semantics ( 合成意味論 )


How much Chinese silk was exported to Western Europe by the end of the 18th century?

Meaning of constituents

[Figure: timeline from 1700 to 1800 labeled "The 18th Century", with "the end of" marked]

Page 11: Information Communication Theory

Compositional??? (with adjectives)

• former girl friend
• black hole
• white towel
• white wine

!?

Page 12: Information Communication Theory

Morphology ( 形態論 )


How much Chinese silk was exported to Western Europe by the end of the 18th century?

The study of word formation (breaking words down into morphemes)

• Inflection ( 屈折 )
  • is – was – being – been
  • export – exports – exporting – exported – exported
• Derivation ( 派生 )
  • China – Chinese
  • West – Western
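
As a rough illustration of inflection, the sketch below strips a few regular English suffixes and checks what remains against a small stem list. The suffix list, the stem list, and the function name `split_inflection` are assumptions made for this example only. Irregular forms such as is/was cannot be handled this way and would need a lookup table.

```python
# Regular inflectional suffixes to try (a deliberately tiny, assumed list).
INFLECTIONAL_SUFFIXES = ["ing", "ed", "s"]


def split_inflection(word, stems):
    """Return (stem, suffix) if the word is a known stem plus a suffix, else None."""
    if word in stems:
        return word, ""
    for suffix in INFLECTIONAL_SUFFIXES:
        candidate = word[: -len(suffix)]
        if word.endswith(suffix) and candidate in stems:
            return candidate, suffix
    return None  # irregular or unknown form


STEMS = {"export"}  # hypothetical stem list
for form in ["export", "exports", "exporting", "exported", "was"]:
    print(form, "->", split_inflection(form, STEMS))
```

The last call returns None, which shows why purely rule-based suffix stripping is not enough for irregular verbs.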

Page 13: Information Communication Theory

Syntax ( 統語論,文法 )

• Part-of-speech (POS): Lecture #3
  • Categorization of words, e.g., nouns, verbs, adjectives, adverbs
• Constituency: Lectures #4 and #5
  • Grouping words that may behave as a single unit or phrase
  • e.g., noun phrase, verb phrase, prepositional phrase
• Grammatical relations: Lecture #5
  • Relationship between words/constituents

Principles and rules for constructing phrases and sentences

Page 14: Information Communication Theory

Syntactic tagging and parsing
• Assign a structure to an input sentence
• Example sentence: Economic news had little effect on financial markets .
• POS tagging: JJ NN VBD JJ NN IN JJ NNS
• Constituent parsing: [Figure: tree with node labels NP, NP, NP, PP, NP, VP, PU, S]
• Dependency parsing: [Figure: arcs labeled nmod, nmod, nmod, sbj, obj, nmod, pc, p]

Nivre and Kubler (2006)
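
The three kinds of structure for the example sentence can be written down as plain data. The bracketing and the arcs below are a reconstruction that is consistent with the labels surviving from the slide's figure, not a copy of it. The "root" label, the index convention, and the helper `leaves` are assumptions for this sketch.

```python
TOKENS = ["Economic", "news", "had", "little", "effect",
          "on", "financial", "markets", "."]

# Constituent parse as nested tuples: (label, child, child, ...).
TREE = ("S",
        ("NP", ("JJ", "Economic"), ("NN", "news")),
        ("VP", ("VBD", "had"),
         ("NP",
          ("NP", ("JJ", "little"), ("NN", "effect")),
          ("PP", ("IN", "on"),
           ("NP", ("JJ", "financial"), ("NNS", "markets"))))),
        ("PU", "."))

# Dependency parse as (dependent, label, head), 1-based token indices.
# Head 0 is an artificial root node; the "root" label is an assumption.
ARCS = [(1, "nmod", 2), (2, "sbj", 3), (3, "root", 0),
        (4, "nmod", 5), (5, "obj", 3), (6, "nmod", 5),
        (7, "nmod", 8), (8, "pc", 6), (9, "p", 3)]


def leaves(node):
    """Read the words back off a constituent tree, left to right."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # pre-terminal: (POS tag, word)
    return [word for child in children for word in leaves(child)]


print(leaves(TREE))
for dependent, label, head in ARCS:
    head_word = TOKENS[head - 1] if head else "ROOT"
    print(f"{TOKENS[dependent - 1]} --{label}--> {head_word}")
```

Both representations cover the same nine tokens. The constituent tree groups words into nested phrases, while the dependency arcs give every token exactly one head.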

Page 15: Information Communication Theory

Semantic role ( 意味役割 )


• How much Chinese silk was exported to Western Europe by the end of the 18th century?
  • "by the end of the 18th century": TEMPORAL
• How much Chinese silk was exported to Western Europe by southern merchants?
  • "by southern merchants": AGENT

[Figure: timeline from 1700 to 1800 labeled "The 18th Century"]

Page 16: Information Communication Theory

Coreference ( 共参照 )
U: Where is The Green Hornet playing in Mountain View?
S: The Green Hornet is playing at the Century 16 theatre.
U: When is it playing there?
S: It’s playing at 2pm, 5pm, and 8pm.
U: I’d like 1 adult and 2 children for the first show. How much would that cost?

• What does “it” refer to?
• What does “the first show” refer to?
• What does “that” refer to?

We can guess these easily!

Page 17: Information Communication Theory

Coreference ( 共参照 )

How words like “that” or pronouns like “it” refer to previous parts of the discourse

Page 18: Information Communication Theory

Pragmatics ( 語用論 )

• Bob: Are you coming to the party?
• Jane: I’m afraid I can’t.

• Bob: Are you coming to the party?
• Jane: You know, I’m really busy.

• Bob: Could you pass me the sugar?
• Jane: Yes. Here you are.

Actions that speakers intend by their use of text

Page 19: Information Communication Theory

Discourse ( 談話 )


http://www.isi.edu/~marcu/discourse/tagging-ref-manual.pdf

Coherent structured groups of text

Page 20: Information Communication Theory

Various kinds of knowledge about language
• Morphology ( 形態論 ): meaningful components within words
• Syntax ( 文法 ): structural relationships between words
• Semantics ( 意味論 ): meanings of words, phrases, sentences
• Discourse ( 談話 ): relationships across/beyond different sentences or statements; contextual processing
• Pragmatics ( 語用論 ): relationship of meaning to the goals and intentions of speakers; how we use languages to communicate
• World knowledge ( 世界知識 ): facts of the world; common sense

Page 21: Information Communication Theory

What needs to be done to understand languages as humans do?

Part II: Ambiguity

Page 22: Information Communication Theory

Ambiguity
• We may build multiple, alternative linguistic structures and interpretations for a single input
  • I made her duck (see more examples later)
• Disambiguation (or resolution): to decide which linguistic/semantic structure/interpretation is the most appropriate (in the context)

Page 23: Information Communication Theory

Part-of-speech tagging and ambiguity

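Time flies like an arrow .

• NN VBZ IN DT NN . : "Time passes as swiftly as an arrow" ( 光陰矢のごとし )
• VB NNS IN DT NN . : "Measure the speed of flies the way you would measure an arrow" ( ハエの速度を矢のように測定せよ )
• NN NNS VBP DT NN . : "Time flies (a kind of fly) are fond of an arrow" ( 時蠅は矢を好む )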
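
To see where such alternative readings come from, the sketch below enumerates every tag sequence licensed by a small lexicon. The lexicon is a toy assumption built only from the tags listed on this slide. A real tagger would score the candidates and pick one rather than list them all.

```python
from itertools import product

# Toy lexicon (assumed for this example): tags each word may take.
LEXICON = {
    "Time": ["NN", "VB"],
    "flies": ["VBZ", "NNS"],
    "like": ["IN", "VBP"],
    "an": ["DT"],
    "arrow": ["NN"],
    ".": ["."],
}


def candidate_taggings(tokens):
    """All tag sequences obtained by choosing one tag per token."""
    return list(product(*(LEXICON[token] for token in tokens)))


TOKENS = "Time flies like an arrow .".split()
CANDIDATES = candidate_taggings(TOKENS)
for tags in CANDIDATES:
    print(" ".join(tags))
```

The three readings on the slide are among the candidates. The remaining combinations are what a disambiguation step has to rule out.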

Page 24: Information Communication Theory

Attachment ambiguity (1/3)

• I saw the girl on the hill with a telescope.

• I saw the girl on the hill with a telescope.


Page 25: Information Communication Theory

Attachment ambiguity (2/3)

• I saw the girl on the hill with a telescope.

• I saw the girl on the hill with a telescope.


Page 26: Information Communication Theory

Attachment ambiguity (3/3)

• I saw the girl on the hill with a telescope.

• I saw the girl on the hill with a telescope.
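
One way to make attachment ambiguity concrete is to count the parse trees that a grammar assigns to the sentence. The sketch below uses a CKY-style chart over a toy grammar. The grammar, the lexicon, and the function name `count_parses` are assumptions for illustration and are not taken from the slides.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # prepositional phrase attaches to the verb phrase
    ("NP", "NP", "PP"),  # prepositional phrase attaches to a noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "girl": ["N"], "hill": ["N"], "telescope": ["N"],
    "on": ["P"], "with": ["P"],
}


def count_parses(words, root="S"):
    """Count distinct parse trees for the words with a CKY-style chart."""
    n = len(words)
    chart = defaultdict(int)  # (start, end, label) -> number of trees
    for i, word in enumerate(words):
        for label in LEXICON[word]:
            chart[i, i + 1, label] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[start, end, parent] += (
                        chart[start, split, left] * chart[split, end, right])
    return chart[0, n, root]


SENTENCE = "I saw the girl on the hill with a telescope".split()
print(count_parses(SENTENCE))
```

Under this grammar the sentence has five analyses, depending on whether each prepositional phrase attaches to the verb or to a preceding noun phrase. The count grows quickly as more prepositional phrases are added.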


Page 27: Information Communication Theory

Coordination ambiguity

• Put [[the insects in the box] and [the bowl on the table]]

• Put the insects in [[the box] and [the bowl on the table]]

Page 28: Information Communication Theory

Semantic ambiguity
• Syntactic structure alone is insufficient to represent the meaning
• Distinction between syntax and semantics
  • Colorless green ideas sleep furiously (Chomsky, 1957)
• Opposite
  • John bought a book from Mary vs Mary sold a book to John
• Lexical ambiguity
  • I went to the bank… (of the river) or (to get some money)
• Quantifier
  • Every man loves a woman
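
The quantifier example has two readings: each man loves some woman, possibly a different one, or there is one particular woman whom every man loves. The sketch below evaluates both readings against a tiny hand-made model. The individuals and the `LOVES` relation are invented for illustration.

```python
# A tiny invented model of the world.
MEN = {"john", "bob"}
WOMEN = {"mary", "sue"}
LOVES = {("john", "mary"), ("bob", "sue")}


def every_man_loves_some_woman(men, women, loves):
    """Reading 1: for every man there exists a woman he loves."""
    return all(any((m, w) in loves for w in women) for m in men)


def some_woman_loved_by_every_man(men, women, loves):
    """Reading 2: there exists a single woman loved by every man."""
    return any(all((m, w) in loves for m in men) for w in women)


print(every_man_loves_some_woman(MEN, WOMEN, LOVES))
print(some_woman_loved_by_every_man(MEN, WOMEN, LOVES))
```

In this model the first reading holds and the second does not, so the two readings are genuinely different claims even though the sentence has a single syntactic structure.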


Page 29: Information Communication Theory

The state of the art of Natural Language Processing

Page 30: Information Communication Theory

Commercial world
• A lot of exciting stuff going on…

Page 31: Information Communication Theory

Machine translation (Google)


Page 32: Information Communication Theory

Machine translation (Google)


Page 33: Information Communication Theory

Watson (IBM)
• Question answering system built on IBM’s DeepQA technology
• 14-16 February 2011: Watson beat two human competitors, the biggest all-time money winner on Jeopardy! and the record holder for the longest championship streak
• Hardware
  • 2880 processor cores (3.5 GHz POWER7 eight-core processors)
  • 16 TB RAM in total
• Software
  • Written in Java and C++
  • Using Apache Hadoop framework for distributed computing
• Data
  • 200M pages (about 1M books) of structured and unstructured content
  • Consuming 4 TB of disk storage
  • Encyclopedias, dictionaries, thesauri, newswire articles, literary works

http://en.wikipedia.org/wiki/Watson_(computer)

Page 34: Information Communication Theory

Jeopardy!
• American quiz show featuring
  • history, literature, the arts, pop culture, science, sports, geography, wordplay, etc.
• Six categories are announced, each with five trivia clues
• A correct response adds the dollar value
• An incorrect response or a failure to respond within a five-second time limit deducts the dollar value
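
The scoring rule above can be written as a small function. This is only a sketch of the rule as stated on this slide. The function name `update_score` and its parameters are hypothetical, and details of the real show are ignored.

```python
def update_score(score, clue_value, correct, seconds_taken, time_limit=5.0):
    """Add the clue value for a correct, timely response; deduct it otherwise."""
    if correct and seconds_taken <= time_limit:
        return score + clue_value
    return score - clue_value


score = 0
score = update_score(score, 400, correct=True, seconds_taken=2.0)
score = update_score(score, 800, correct=False, seconds_taken=3.0)
score = update_score(score, 200, correct=True, seconds_taken=6.5)  # too slow
print(score)
```

Because a wrong or late response costs the full clue value, a player (or a system such as Watson) benefits from responding only when it is confident enough.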

http://en.wikipedia.org/wiki/Jeopardy!

Page 35: Information Communication Theory

Final Jeopardy! and the Future of Watson

• Watch the video (08:58):
  • http://www.youtube.com/watch?v=Wq0XnBYC3nQ

Page 36: Information Communication Theory

Science behind an answer
• Watch the very nice video (06:42):
  • http://www.youtube.com/watch?v=DywO4zksfXw

Page 37: Information Communication Theory

Science behind an answer
• Step 1: Question analysis
  • What type of question is being asked?
  • What is the question asking for?
• Step 2: Hypothesis generation
  • Search millions of documents for possible answers
• Step 3: Hypothesis and evidence scoring
  • Collect positive and negative evidence to support each answer
  • Score the evidence based on everything from source material reliability to whether time and locations appear correct
  • Parallelized evidence scoring for each possible answer
• Step 4: Final merging and ranking
  • Learn the importance of each piece of evidence by practicing games
  • Yield the final ranking of possible answers
  • Decide whether Watson answers the question or not based on the confidence
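
The four steps can be sketched end to end on a toy scale. Nothing below comes from DeepQA: the three-passage corpus, the reliability numbers, the keyword-overlap score, and the 0.5 confidence threshold are all assumptions chosen to keep the example small. The learning part of step 4 is omitted, so the weights are fixed by hand.

```python
# Toy corpus (assumed): (candidate answer, passage, source reliability in [0, 1]).
CORPUS = [
    ("Tokyo", "Tokyo is the capital city of Japan", 0.9),
    ("Kyoto", "Kyoto was the capital of Japan for centuries", 0.9),
    ("Osaka", "Osaka is a large city in Japan", 0.5),
]
STOPWORDS = {"what", "is", "the", "of", "a", "in", "for", "was"}


def analyze_question(question):
    """Step 1: reduce the question to a set of content keywords."""
    tokens = question.lower().rstrip("?").split()
    return {token for token in tokens if token not in STOPWORDS}


def generate_hypotheses(keywords):
    """Step 2: every passage sharing a keyword proposes its answer."""
    return [entry for entry in CORPUS
            if keywords & set(entry[1].lower().split())]


def score_evidence(keywords, passage, reliability):
    """Step 3: keyword overlap, weighted by source reliability."""
    overlap = len(keywords & set(passage.lower().split()))
    return reliability * overlap / len(keywords)


def answer(question, threshold=0.5):
    """Step 4: rank the candidates and respond only if confident enough."""
    keywords = analyze_question(question)
    ranked = sorted(
        ((score_evidence(keywords, passage, reliability), candidate)
         for candidate, passage, reliability in generate_hypotheses(keywords)),
        reverse=True)
    if not ranked or ranked[0][0] < threshold:
        return None  # not confident enough to respond
    return ranked[0][1]


print(answer("What is the capital city of Japan?"))
print(answer("What is the largest lake?"))
```

Returning None when the best score falls below the threshold mirrors the last bullet of step 4: declining to answer is a legitimate outcome when a wrong response is penalized.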


Page 38: Information Communication Theory

A shame (of NLP)
• Japanese translation of the book “Einstein: His Life and Universe,” published on 23 June 2011
• Chapter 13 was translated by computers, not by humans!
  • How this happened: http://www.amazon.co.jp/review/R29GQAF5DUOAEW/ref=cm_cr_rdp_perm
• It is very rare for a machine-translated (MT) book to be published
• Revised version was published on 17 Aug 2011

Page 39: Information Communication Theory

Imagine the original sentence
• ボルンの妻のヘートヴィヒに最大限にしてください。(そのヘートヴィヒは,彼の家族に関する彼の処理,今や説教された頃,彼が「自分がそのかなり不幸な回答に駆り立てられるのを許容していないべきでない」と自由に彼に叱った)。以上は,彼が目立つべきであり,彼女が言ったのを「科学の人里離れている寺」に尊敬します。

• Max Born's wife, Hedwig, who had freely scolded Einstein about his treatment of his family, now lectured, “[You should] not have allowed yourself to be goaded into that rather unfortunate reply.” He should show more respect, she said, for “the secluded temple of science.” (P286)

Page 40: Information Communication Theory

Passing the entrance exams for the University of Tokyo

Page 41: Information Communication Theory

Writing short science fiction stories

Page 42: Information Communication Theory

Goal of this course
• Give an overview of the issues and technologies for natural language understanding
  • What is possible/easy? What is impossible/difficult?
  • Why is this achieved or not achieved by the current technology?
• Provide fundamental theories and techniques for natural language processing
  • Some techniques are useful for other research fields
• Practice programming on real NLP tasks
  • You will become an experienced engineer!

Page 43: Information Communication Theory

Course plan
1. 4 Oct: Introduction
2. 11 Oct: Classification
  • Spam filtering, linear classifier, feature extraction, perceptron, logistic regression, evaluation (precision, recall, F1)
3. 18 Oct: Part-of-speech tagging
4. 25 Oct: Syntactic parsing
5. 1 Nov: Statistical parsing
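
As a preview of the evaluation measures listed for the classification lecture, the sketch below computes precision, recall, and F1 for a spam filter. The label names, the example predictions, and the function name `precision_recall_f1` are assumptions for illustration.

```python
def precision_recall_f1(gold, predicted, positive="spam"):
    """Precision, recall, and F1 for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


GOLD = ["spam", "spam", "spam", "ham", "ham"]
PREDICTED = ["spam", "spam", "ham", "spam", "ham"]
print(precision_recall_f1(GOLD, PREDICTED))
```

Here the filter finds two of the three spam messages and raises one false alarm, so precision and recall are both 2/3, and so is their harmonic mean, F1.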
