Text Categorization and Images

Post on 13-Jan-2016


Text Categorization

• Text categorization (TC) refers to the automatic labeling of documents, using natural language text contained in or associated with each document, into one or more pre-defined categories.

• TC techniques can be applied to image captions to label the corresponding images.
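As a rough illustration of labeling images through their captions, the sketch below trains a small multinomial Naive Bayes classifier on caption words. The slides do not say which classifier was used, so Naive Bayes is only a stand-in here. The class name `NaiveBayesCaptionClassifier`, the `indoor`/`outdoor` labels, and the training captions are all made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a caption and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesCaptionClassifier:
    """Hypothetical helper: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, captions, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of captions
        self.vocab = set()
        for caption, label in zip(captions, labels):
            tokens = tokenize(caption)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, caption):
        tokens = tokenize(caption)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each caption word.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training captions (not from the corpus described in the slides).
train_captions = [
    "Leaders meet inside the public library",
    "Delegates gather in a conference room at the hotel",
    "A train lies in the mud beside the river",
    "Hikers cross a bridge over the marsh",
]
train_labels = ["indoor", "indoor", "outdoor", "outdoor"]

clf = NaiveBayesCaptionClassifier().fit(train_captions, train_labels)
prediction = clf.predict("Passengers wait near the river bank")
print(prediction)
```

The predicted caption label is then assigned to the image the caption belongs to. A real system would need far more training data and would likely handle documents that belong to more than one category.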

Clues for Indoor/Outdoor: Text (as opposed to Vision)

Denver Summit of Eight leaders begin their first official meeting in the Denver Public Library, June 21.

The two engines of an Amtrak passenger train lie in the mud at the edge of a marsh after the train, bound for Boston from Washington, derailed on the bank of the Hackensack River, just after crossing a bridge.

NewsBlaster Categories

• Entertainment
• Science/Technology
• Sports
• U.S. News
• World News
• Finance

Events Categories

• Politics
• Struggle
• Disaster
• Crime
• Other

Subcategories for Disaster Images

Events categories: Politics, Struggle, Disaster, Crime, Other

Category    F1
Politics    89%
Struggle    88%
Disaster    97%
Crime       90%
Other       59%
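For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall, computed separately for each category. The sketch below shows that computation under the assumption that each item has exactly one gold label and one predicted label. The slides do not describe the evaluation setup, and the labels in the example are made up, so the output is unrelated to the scores reported above. The function name `per_category_f1` is hypothetical.

```python
from collections import Counter


def per_category_f1(gold, predicted):
    """Return {category: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the true label was g
            fn[g] += 1  # missed an instance of g
    scores = {}
    for cat in set(gold) | set(predicted):
        precision = tp[cat] / (tp[cat] + fp[cat]) if tp[cat] + fp[cat] else 0.0
        recall = tp[cat] / (tp[cat] + fn[cat]) if tp[cat] + fn[cat] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[cat] = (precision, recall, f1)
    return scores


# Made-up labels for illustration only; these are not the slide's data.
gold = ["Disaster", "Disaster", "Crime", "Politics", "Other", "Crime"]
pred = ["Disaster", "Disaster", "Crime", "Politics", "Crime", "Other"]
scores = per_category_f1(gold, pred)
for cat in sorted(scores):
    p, r, f = scores[cat]
    print(f"{cat:<10} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

A catch-all category such as Other tends to score lower under this metric because its members share few distinctive words.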

Disaster subcategories: Affected People, Other, Wreckage, Workers Responding

Disaster Image Categories

• Affected People
• Other
• Wreckage
• Workers Responding

Words are Ambiguous: Workers Responding vs. Affected People

Philippine rescuers carry a fire victim March 19 who perished in a blaze at a Manila disco.

Hypothetical alternative caption: A fire victim who perished in a blaze at a Manila disco is carried by Philippine rescuers March 19.

Workers Responding Affected People
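One way to read the two captions above is that they share nearly all of their words while putting a different participant in subject position, which a plain bag-of-words representation cannot see. The sketch below measures that word overlap. The stop-word list and the helper name `content_words` are assumptions made for the example, not something taken from the slides.

```python
import re

# Small illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"a", "who", "in", "at", "is", "by"}


def content_words(caption):
    """Return the set of lowercased tokens that are not stop words."""
    tokens = re.findall(r"[a-z0-9]+", caption.lower())
    return {t for t in tokens if t not in STOP_WORDS}


original = ("Philippine rescuers carry a fire victim March 19 who perished "
            "in a blaze at a Manila disco.")
alternative = ("A fire victim who perished in a blaze at a Manila disco "
               "is carried by Philippine rescuers March 19.")

a, b = content_words(original), content_words(alternative)
jaccard = len(a & b) / len(a | b)  # shared words / all distinct words
only_in_one = sorted(a ^ b)        # words appearing in just one caption
print(f"Jaccard overlap: {jaccard:.2f}")
print("Words not shared:", only_in_one)
```

The only content words that differ are the two forms of the verb, so telling the captions apart requires information about grammatical roles and not just word counts.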

Collect Labels to Train Systems

Contributions of My Research

• Applied text categorization (TC) techniques to images using associated text.

• Created a corpus, which I hope to make public.

• Introduced two novel TC approaches.

• Integrated NLP with traditional approaches.

• Explored combination of approaches.

• Combined text and image features.
