Transcript
Page 1: Michele Merler and John R. Kender Computer Science ...mmerler/posterICIP09.pdf · 8 presentation videos, 1hr 45 mins, ~13 Slides each •Studies have been conducted in order to assess

Michele Merler and John R. Kender Computer Science Department {mmerler,jrk}@cs.columbia.edu

Columbia University

Semantic Keyword Extraction via Adaptive Text Binarization of

Unstructured Unsourced Video

Example of indexing function: the word Energy is recognized in slides across 4 different presentations

8 presentation videos, 1hr 45 mins, ~13 Slides each

• Studies have been conducted in order to assess the reliability of slides as a summarization tool [He et al.2000]

Text1 Text2 Text3 Text4 Text5 Text6 Text7 Text8 Text9 Text10

Completed Tasks

Rcscarch

Interview with Client

{ i sited Project SpaceQiant House Resident Association Meeting

LoG edgesEdges Connected

Components Constraints

Empirically validated thresholds applied to

prune non-text regions

56

32

101000

height

height

widthwidth

AreaArea

Area

FR

FR

FR

F

Geometric Constraints

Edge Density ConstraintF – FrameR – Candidate Text Region

Alignment Regions Merge Constraints

Binarization - 54 Detected Regions, 2177154 annotated pixels

N. Ground Truth

Characters

N. Recognized

CharactersPrecision Recall

13804 7376 0.5343 0.7446

N. Ground Truth

Words

N. Recognized

WordsPrecision Recall

2276 1126 0.4947 0.6651 0

2000

4000

6000

8000

Recognition Method

N.

Recogniz

ed C

hara

cte

rs

Tesseract

LAO + Tesseract

),( rgtgtccorc ssEDNN

Edit Distance between Ground Truth String and Recognized String

Videos of presentations are employed in a large variety of systems for

different purposes

• Distance or E-learning• Automatic generation of conference proceedings • Student presentations

Challenges

Result: Fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides

Integration into summarization and presentation tools such as the VAST MultiMedia Browser1 (see image above)

1. www.aquaphoenix.com/research/vastmm 2. http://code.google.com/p/tesseract-ocr 3. http://en.wikipedia.org/wiki/Letter_frequencies

Introduction Local Adaptive Otsu (LAO) Binarization

Proposed Approach

Results

Optimize for threshold T maximizing between-class variance in sliding window

22 ))()()(()()( TTTnTnT FBFBbetween

1

),,(1),,(),(

R

WyxkWyxyxT

)()()()()( 222 TTnTTnT FFBBwithin

Many videos are already archived

Low quality Lack of Structure

• Lack of additional sources of information

(e.g. electronic copies of slides)

• Not recorded by professional cameramen

• Shots cannot be used as clue

• Not edited

• Unconstrained camera movements

• Slides Clipped

• Compression

1. Segment video into semantically distinct shots based on slides

2. Slides Text Detection

10|)()(|

0),(

ji

ji

ji RxRx

RRRRmerge

2.0densityE

3. Slides Text Recognition

• Double Text Regions Size with Bilinear Interpolation

• LAO Binarization

• Tesseract2 OCR Engine• Training with 15 character sets• Height 30pt

Algorithm Precision Recall F1* t(sec) Binarization Example

Otsu 0.8611 0.8555 0.8583 0.539

Sauvola (k = 0.5) 0.9003 0.8759 0.8879 0.626

LAO 0.8831 0.9278 0.9049 2.126

LAO + Int. Hist. 0.8831 0.9278 0.9049 1.29 RecallPrecision

RecallPrecisionF1

2

Most popular fonts used in presentation slidesText reflecting English

letters frequencies3

Slides Text Recognition - 2276 words, 13804 characters

PrecisionSIMPLE RecallSIMPLE

0.71213 0.85914

PrecisionREFINED RecallREFINED

0.88584 0.680460

0.2

0.4

0.6

0.8

1

Precision Recall

•SIMPLE – use all candidate text regions

•REFINED – prune candidate text regions based on OCR output-

Slides Text Detection - 500 Frames

GT

EGT

E

EGT

TA

TATARecall

TA

TATAPrecision

Ground Truth Text Regions Area

Extracted Text Regions AreaTAE

TAGT

400 with text 100 no text

Text vs. Non-TextForeground vs. Background

Pixel Value pF BT

)(ph

2

F 2

B

Character Recognition

Word Recognition

N

r(c/w)

cor(c/w)

NRecall

gt(c/w)

cor(c/w)

N

NPrecision

• No electronic copies of the slides

• Changes in text used to assess slide changes

• The clients desire to generate wind energy • Grey Water system/Hydroelectric Energy@;gm;y,g[g_,bjns and energy efficient light bulbs

electrical/thermal energy Reallstlc energyPV and energy storage system hall energy needs

Optimal version of Sauvola’salgorithm

Localized version of Otsu’s algorithm [Otsu 79]

• Assumption: bimodal distribution (foreground/background)

• Fast implementation with Integral Histogram [Porikli et al. 05]

[Sauvola et al. 00][Otsu 79]

Dependency from k is removed!

*

Top Related