Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation Virach Sornlertlamvanich and Thatsanee Charoenporn [email protected], [email protected] National Electronics and Computer Technology Center, Thailand PNC 2012 Annual Conference and Joint Meetings, UC Berkeley, US., December 7-9, 2012

Page 1: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

PNC 2012 Annual Conference and Joint Meetings, UC Berkeley, US., December 7-9, 2012

Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Virach Sornlertlamvanich and Thatsanee Charoenporn

[email protected], [email protected]

National Electronics and Computer Technology Center, Thailand

Page 2: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Motivation

• Cultural Knowledge Creation
• Image and object labeling
  – Keyword and semantic relation extraction
  – Image as a focal point

• Cultural Knowledge Services
  – Service platform

Page 3: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

3 Steps in Digital Cultural Communication

Step 1: Cultural knowledge curation
  – Reuse
  – Standardization

Step 2: Cultural image annotation
  – Keyword extraction
  – Semantic relation acquisition
  – Image annotation games

Step 3: Cultural knowledge service
  – Cultural knowledge platform for application service development

Page 4: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

CULTURAL KNOWLEDGE CURATION

Page 5: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Community Co-Creation
- Input
- GPS data
- Tag
- Invitation, registration, approval

Citation
• Museum
• Museum archive
• Other departments

Community

Provincial Cultural Knowledge Base Service
- Search
  - Text
  - Filter
  - Similarity (color, structure, role, mood, image)
- Presentation
  - Location, category
  - Statistics
  - Relation

Audience

Cultural knowledge curation

Standardized Annotated Cultural Knowledge Base
- Search
- Category
- Statistics

Curation and Presentation

Institution

Community Co-Creation Cultural Knowledge Base

Page 6: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Cultural Knowledge Portal Creation

Scope of Collection

Cultural Personnel/Organization
- Artist
- Scholar
- Religious Monument
- Writer/Author
- Society/Association
- Cultural Network
- Cultural Unit

Cultural Artifact
- Archaeological Objects
- Artwork
- Visual Art
- Book/Press
- Audiovisual Media
- Utensil
- Costume

Way of Life
- Ethnic
- Religion and Belief
- Tradition and Rite
- Language and Literature
- Local Wisdom
- Performing Art and Music

Cultural Site
- Archaeological Site
- Historical Park
- Historical Site
- Architecture
- Religious Place
- Museum
- Library
- Archive
- Monument
- Theatre
- Tourism spot

Page 7: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Cultural Databank: http://www.m-culture.in.th/

Page 8: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Cultural Databank: http://www.m-culture.in.th/

Page 9: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

CULTURAL IMAGE ANNOTATION

Page 10: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Keyword Extraction

• Some keywords are readily available in the set of tags, but many of them are still missing.

• Our task is to extract those missing keywords from the description and title.

Page 11: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Keyword Extraction

• Some keywords can be linked to external pages, e.g. Wikipedia.

• Our task is to find appropriate articles corresponding to those keywords.

Page 12: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Method for KW Extraction

• Chunking model (Uchimoto et al., 2004) for keyword extraction

Page 13: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Training Data Preparation

• Generate a keyword list from tags and titles that are not shorter than 5 characters and not longer than 30 characters

• Segment descriptions using a state-of-the-art Thai word segmentation algorithm (Kruengkrai et al., 2009)

• Note that the word segmentation algorithm was trained using the ORCHID corpus and TCL’s lexicon (the contents of the ORCHID corpus and our current data are quite different)

• Label the segmented descriptions with the keyword list
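The first preparation step, building the keyword list under the 5 to 30 character limits, could be sketched as follows. The function name `build_keyword_list` and its parameters are hypothetical, and measuring "characters" as Unicode code points is an assumption; the slides do not say how length was counted.

```python
def build_keyword_list(tags, titles, min_len=5, max_len=30):
    """Collect distinct tag/title strings whose length lies in [min_len, max_len].

    Assumption: length is counted in Unicode code points, so Thai vowel
    and tone marks each count as one character.
    """
    candidates = {s.strip() for s in list(tags) + list(titles)}
    return sorted(k for k in candidates if min_len <= len(k) <= max_len)


if __name__ == "__main__":
    # Toy input; real tags and titles would come from the cultural database.
    print(build_keyword_list(["abc", "abcde"], ["abcdef", "x" * 31]))
```

Deduplicating with a set keeps the later labeling step from matching the same keyword twice.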

Page 14: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Training Data

• Description

ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว … เป็นงานฝีมือพื้นบ้าน ..

Page 15: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Labeling

• Apply BIO tagging
  – B: beginning position of a keyword
  – I: intermediate (or end) position of a keyword
  – O: other words

• If several matches are possible, select the longest one (like in the previous example)
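A minimal sketch of this labeling rule, assuming the description is already segmented into words and that a keyword matches when the concatenation of consecutive words equals it. The function `bio_label` and the `max_span` limit are hypothetical names introduced for illustration, not the authors' implementation.

```python
def bio_label(words, keywords, max_span=10):
    """Label segmented words with B-K / I-K / O, preferring the longest keyword match."""
    labels = ["O"] * len(words)
    i = 0
    while i < len(words):
        match_end = None
        # Try the longest candidate span first so that longer keywords win.
        for j in range(min(len(words), i + max_span), i, -1):
            if "".join(words[i:j]) in keywords:
                match_end = j
                break
        if match_end is None:
            i += 1
            continue
        labels[i] = "B-K"
        for k in range(i + 1, match_end):
            labels[k] = "I-K"
        i = match_end
    return labels


if __name__ == "__main__":
    # ASCII stand-ins for a segmented description and its keyword list.
    words = ["silk", "skirt", " ", "is", "craft"]
    print(bio_label(words, {"silk", "silkskirt"}))
```

Both "silk" and "silkskirt" match at the first position here; scanning spans from longest to shortest is what selects the two-word keyword.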

Page 16: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Training Data

• Description

ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว … เป็นงานฝีมือพื้นบ้าน ..

• Keyword List (extracted from tag and title)

…..
ผ้า
…..
ผ้าซิ่น
ผ้าซิ่นลายมัดหมี่บ้านปทุมแก้ว
…..

• Segmented/Tagged/Labeled Description

Word                          POS tag    Label
ผ้าซิ่น                          N          B-K
ลายมัดหมี่บ้านปทุมแก้ว            N          I-K
<space>                       P          O
เป็น                           V          O
งานฝีมือพื้นบ้าน                 N          O
……                            …..        …..

Page 17: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Chunking Model

Page 18: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Preliminary Experiment Result

• 3000 examples for training, 500 examples for testing

• Based on Margin Infused Relaxed Algorithm (MIRA), Crammer et al., 2005
  – Baseline features (Unigram and Bigram) +
  – 3 character prefix/suffix of current word +
  – 3 consecutive POS tags

• Recall=0.8256, Precision=0.9061, F1=0.8640
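The feature set and the reported scores can be illustrated with a short sketch. It rests on assumptions the slides leave open: that unigram and bigram refer to words, and that "3 consecutive POS tags" means the previous, current, and next tag. The function names `token_features` and `f1_score` are hypothetical, and the MIRA learner itself is not shown.

```python
def token_features(words, pos_tags, i):
    """Build one feature dictionary for the word at position i."""
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<BOS>"
    prev_pos = pos_tags[i - 1] if i > 0 else "<BOS>"
    next_pos = pos_tags[i + 1] if i + 1 < len(pos_tags) else "<EOS>"
    return {
        "unigram": word,
        "bigram": prev_word + "|" + word,
        "prefix3": word[:3],   # first 3 characters of the current word
        "suffix3": word[-3:],  # last 3 characters of the current word
        "pos3": "|".join([prev_pos, pos_tags[i], next_pos]),
    }


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_features(["silk", "skirt", "is"], ["N", "N", "V"], 1))
    print(round(f1_score(0.9061, 0.8256), 4))
```

Plugging the reported precision and recall into `f1_score` reproduces the F1 value on the slide to four decimal places.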

Page 19: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Semantic Relation Acquisition

• Extract common syntactic patterns between two nouns

• Our task is to acquire triples (ei, rij, ej), where
  – ei and ej are entities (keywords)
  – rij is a relationship between them

Page 20: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Extract Common Syntactic Pattern of a Predicate between Two Keywords

• Example

Title: วัดทุ่ง
Description: วัดทุ่ง มีอายุราว 500 ปี สันนิฐานว่าสร้างขึ้นในสมัยกรุงสุโขทัย

Title: วัดตราชู
Description: วัดตราชู สร้างขึ้นในสมัย กรุงศรีอยุธยาตอนต้น ราว พ.ศ.2076

Title: หลวงพ่อขาว
Description: เป็นพระพุทธรูปเก่าแก่เนื้อหินทรายปางสมาธิ ขนาดหน้าตักกว้าง ๒ ศอกประดิษฐานอยู่ในวิหารวัดหลวงวัดสันนิฐานว่าสร้างขึ้นในสมัยอยุธยา

Title: พระพุทธรูปปางมารวิชัย
Description: สร้างขึ้นในสมัยรัตนโกสินทร์ตอนต้น

Title: วิหารวัดโยธานิมิต
Description: สร้างขึ้นในสมัยพระบาทสมเด็จพระเจ้าตากสินมหาราช

Legend: anchored keyword, predicate

Page 21: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Extract Common Syntactic Pattern of a Predicate between Two Keywords

• Example

(วัดทุ่ง, สร้างขึ้นในสมัย, กรุงสุโขทัย)
(วัดตราชู, สร้างขึ้นในสมัย, กรุงศรีอยุธยาตอนต้น)
(หลวงพ่อขาว, สร้างขึ้นในสมัย, อยุธยา)
(พระพุทธรูปปางมารวิชัย, สร้างขึ้นในสมัย, รัตนโกสินทร์ตอนต้น)
(วิหารวัดโยธานิมิต, สร้างขึ้นในสมัย, พระบาทสมเด็จพระเจ้าตากสินมหาราช)

(ei, BUILT_IN, ej)
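One way to picture how such triples could be produced is a predicate lookup anchored on known keywords. The sketch below is an assumption, not the authors' method: it takes the record title as ei, searches the description for a fixed predicate string, and takes the longest known keyword that immediately follows the predicate as ej. The function `extract_triple` and its parameters are hypothetical.

```python
def extract_triple(title, description, predicate, relation, keywords):
    """Return (title, relation, keyword) if a known keyword follows the predicate."""
    pos = description.find(predicate)
    if pos < 0:
        return None
    rest = description[pos + len(predicate):].lstrip()
    # Anchor on the keyword list; prefer the longest keyword that starts here.
    candidates = [k for k in keywords if rest.startswith(k)]
    if not candidates:
        return None
    return (title, relation, max(candidates, key=len))


if __name__ == "__main__":
    # Thai example from the slides: "built in the era of" -> BUILT_IN.
    print(extract_triple(
        "วัดตราชู",
        "วัดตราชู สร้างขึ้นในสมัย กรุงศรีอยุธยาตอนต้น ราว พ.ศ.2076",
        "สร้างขึ้นในสมัย",
        "BUILT_IN",
        {"กรุงศรีอยุธยาตอนต้น", "กรุงสุโขทัย"},
    ))
```

Anchoring the object on the keyword list, rather than cutting at the next space, matters for Thai because words are not separated by whitespace.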

Page 22: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Extract Common Syntactic Pattern of a Predicate between Two Keywords

• Example

Title: กระโจมไฟบ้านโรงถ่าน
Description: สร้างโดย อพท. เมื่อปี พ.ศ.2550 เป็นท่าเทียบเรือสำหรับเรือท่องเที่ยว

Title: ศาลเจ้าตากสิน วัดบ้านค่าย
Description: ศาลปูนขนาดกลาง สร้างโดยพระครูพิพัฒน์ชยาภรณ์

Title: วัดทุ่งโฮ้งใต้
Description: สร้างขึ้นเมื่อ พ.ศ.2370 จากตำนานเล่าว่าสร้างโดยกลุ่มชาวลาวพวน

Title: ศาลพระพรหม
Description: ตั้งอยู่บริเวณสวนตุงโคม ตำบลเวียงอำเภอเมืองเชียงรายจัดสร้างโดยเทศบาลนครเชียงราย

Title: วงเวียนนิมิตร
Description: วงเวียนนิมิตรหรือวงเวียนม้าน้ำก่อสร้างโดยเทศบาลนครภูเก็ตในปี พ.ศ.2548

Legend: anchored keyword, predicate

Page 23: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Extract Common Syntactic Pattern of a Predicate between Two Keywords

• Example

(กระโจมไฟบ้านโรงถ่าน, สร้างโดย, อพท.)
(ศาลเจ้าตากสิน วัดบ้านค่าย, สร้างโดย, พระครูพิพัฒน์ชยาภรณ์)
(วัดทุ่งโฮ้งใต้, สร้างโดย, กลุ่มชาวลาวพวน)
(ศาลพระพรหม, สร้างโดย, เทศบาลนครเชียงราย)
(วงเวียนนิมิตร, สร้างโดย, เทศบาลนครภูเก็ต)

(ei, BUILT_BY, ej)

Page 24: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

ESP Game and Peekaboom, proposed by Luis von Ahn (May 25, 2006, by Pete Cashmore)

• ESP Game – In the ESP Game, the two players are shown an image and asked to enter a word that describes it. The players can’t see each other’s guesses. The aim is to enter the same word as your partner in the shortest possible time. But there’s an ulterior motive here: much of the data is recorded, and could be used to power image search engines in the future. What’s cheaper – paying thousands of Mechanical Turkers to label all the images on the web, or tricking people into doing it for free?

• Peekaboom – Peekaboom takes the ESP Game to the next level. Unlike the ESP Game, it’s asymmetrical. To start, one user is shown an image and the other sees an empty black space. The first user is given a word relating to the image, and the aim is to communicate that word to the other player by revealing portions of the image. So if the word is “eye” and the image is a face, you reveal the eye to your partner. But the real aim here is to build a better image search engine: one that could identify individual items within an image.

Page 25: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

ESP Game

• Two players are shown an image and asked to enter a word that describes it.

• The aim is to enter the same word as your partner in the shortest possible time.

[Figure: example guesses for one image ("Twitter Bird", "Bird", "Angry bird", "Bird", "Mohawk"). Purpose: to name the image.]
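The agreement rule of the ESP game can be sketched as below. This is an illustration under assumptions: each guess is a (time, word) pair, words are compared after trimming and lowercasing, and the split of the figure's example guesses between the two players is hypothetical. The function name `esp_match` is not taken from the slides.

```python
def esp_match(guesses_a, guesses_b):
    """Return (word, time) for the first word entered by both players, else None.

    Each argument is a list of (time_in_seconds, word) pairs for one player.
    """
    events = sorted(
        [(t, w.strip().lower(), "a") for t, w in guesses_a]
        + [(t, w.strip().lower(), "b") for t, w in guesses_b]
    )
    seen = {"a": set(), "b": set()}
    for t, word, player in events:
        seen[player].add(word)
        other = "b" if player == "a" else "a"
        if word in seen[other]:
            return word, t
    return None


if __name__ == "__main__":
    player_a = [(1, "Twitter Bird"), (4, "Bird")]
    player_b = [(2, "Angry bird"), (3, "Bird"), (5, "Mohawk")]
    print(esp_match(player_a, player_b))
```

Replaying guesses in time order also covers the single-player mode described later in the deck, where one side of the input would be a recorded history.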

Page 26: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Peekaboom

• One user is shown a named image and reveals the part of the image that corresponds to the name.

• The other user enters a word relating to the image.

• The aim is to enter the same word as the image is named with, in the shortest possible time.

[Figure: an image named "Bird" with example guesses "Bird", "Squirrel", and "Flying fish". Purpose: to label the object in the image.]

Page 27: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Extended Peekaboom

• One user is shown a named image and reveals the part of the image that corresponds to the name.

• The other user enters a word relating to the image.

• The aim is to enter the same word as the image is named with, in the shortest possible time.

• Any word from the same synset can be matched.

• Once a synset is selected, cross-language matching can be determined.

[Figure: an image named "Bird" with example guesses "Bird", "Squirrel", and "Flying fish"; matching is supported by AWN (Asian WordNet).]
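The synset-based matching can be illustrated with a toy lexicon. Everything in `TOY_LEXICON` is invented for illustration and is not AWN data or the AWN API; the relation names follow the abbreviations used in the experiment table, and the function `match_type` is hypothetical.

```python
# Hypothetical stand-in for AWN lookups: target word -> related words per relation.
TOY_LEXICON = {
    "bird": {
        "syn": {"fowl", "นก"},  # synonyms, including a Thai entry of the same synset
        "hyper": {"animal"},
        "hypo": {"parrot"},
        "mero": {"wing", "beak"},
        "holo": {"flock"},
    }
}

RELATIONS = ("syn", "hyper", "hypo", "mero", "holo")


def match_type(guess, target, lexicon, allowed=RELATIONS):
    """Return 'exact', the first matching relation name, or None."""
    guess = guess.strip().lower()
    target = target.strip().lower()
    if guess == target:
        return "exact"
    entry = lexicon.get(target, {})
    for relation in allowed:
        if guess in entry.get(relation, set()):
            return relation
    return None


if __name__ == "__main__":
    print(match_type("นก", "bird", TOY_LEXICON))      # cross-language synonym
    print(match_type("animal", "bird", TOY_LEXICON))  # hypernym
```

Restricting `allowed` to `("syn",)` mimics a synonym-only mode, in which a hypernym such as "animal" no longer counts as a match.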

Page 28: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Demo

• ESP-like game
  – http://m-culture.in.th/game/esp_game
  – Play mode
    • Single player mode: play against history
    • Two-player mode: guess to match each other

• Extended Peekaboom game
  – http://m-culture.in.th/game/peekaboom
  – Play mode
    • Single player mode: play against history
    • Two-player mode: guess to match each other
  – For Thai language, use AWN to support synonym, hypernym, hyponym, meronym, and holonym
  – For other languages, use AWN to support synonym only

Page 29: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Preliminary Experiment

• 18 images played by 19 persons. For each image, we allow 60 seconds to guess a proper word.

• AWN expands the matching in 67 cases, a 22% increase in the matching ratio.

Match type   Exact   Syn   Hyper   Hypo   Mero   Holo
Count        229     32    16      1      7      11
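The counts in the table can be checked with a few lines of arithmetic. The slides do not state how the 22% figure was derived, so the two ratios below are candidate readings, not the authors' definition; the variable names are illustrative.

```python
counts = {"Exact": 229, "Syn": 32, "Hyper": 16, "Hypo": 1, "Mero": 7, "Holo": 11}

# Matches that were only possible through an AWN relation.
expanded = sum(n for kind, n in counts.items() if kind != "Exact")
total = counts["Exact"] + expanded

share_of_all_matches = expanded / total          # AWN matches as a share of all matches
gain_over_exact = expanded / counts["Exact"]     # AWN matches relative to exact matches

if __name__ == "__main__":
    print(expanded, total)
    print(round(share_of_all_matches, 3), round(gain_over_exact, 3))
```

The non-exact columns sum to the 67 cases quoted on the slide, and the share of all matches comes out closest to the quoted 22%.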

Page 30: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

CULTURAL KNOWLEDGE SERVICE

Page 31: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

[Figure: four linked databases, each holding records of the form "Title, Snippet description, Tags A, B, C": the Cultural Database (culture records), the Product Database (product records), the Shop Database (shop records), and the Maker Database (maker records). The labels A, B, C, and D appear in the diagram.]

Page 32: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

To find a related Product from Culture information

[Figure: culture records and product records, each with Title, Snippet description, and Tags A, B, C.]

Page 33: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

To find the background Culture information from a Product

[Figure: product records and a culture record, each with Title, Snippet description, and Tags A, B, C.]

Page 34: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Product and Culture information relation

[Figure: culture records and product records, each with Title, Snippet description, and Tags A, B, C.]
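The slides show records that carry a title, a snippet, and tags, but do not describe how related records are found. One plausible reading, sketched here purely as an assumption, is to rank records from the other database by the number of shared tags. The `Record` class and the `related` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    snippet: str
    tags: frozenset


def related(query, candidates, min_shared=1):
    """Rank candidate records by the number of tags shared with the query record."""
    scored = [(len(query.tags & c.tags), c) for c in candidates]
    scored = [(n, c) for n, c in scored if n >= min_shared]
    # Most shared tags first; ties broken by title for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1].title))
    return [c for _, c in scored]


if __name__ == "__main__":
    culture = Record("culture", "description", frozenset({"A", "B", "C"}))
    products = [
        Record("product 1", "description", frozenset({"A", "D"})),
        Record("product 2", "description", frozenset({"A", "B", "C"})),
    ]
    print([r.title for r in related(culture, products)])
```

Because the function only compares tag sets, the same call works in both directions: from a culture record to products, or from a product back to its cultural background.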

Page 35: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Summary

• From this ESP-like game, we successfully named the images or at least obtained a list of candidates for labeling the object in the image to be used in the next extended Peekaboom game.

• Synonyms, hypernyms, hyponyms, meronyms, and holonyms from AWN help raise the matching ratio.

• Cross-language image labeling is realized through AWN synonyms.

Page 36: Keyword Extraction and Image Annotation Games to Enhance the Cultural Database Creation

Future Work

• Enhancing keyword extraction to find more term candidates for image matching

• Call for participation in the extended ESP and Peekaboom games for image labeling