www.elsevier.com/locate/jvci

J. Vis. Commun. Image R. 15 (2004) 261–264

Editorial

Multimedia database management systems

Over the last decade we have witnessed significant growth in the digital media market. Digital video cameras have become ubiquitous with the proliferation of Internet cameras, security and monitoring cameras, and personal hand-held cameras. Meanwhile, advances in digital storage technology have made the digitization, compression, archiving, and streaming of multimedia data popular and inexpensive. Finally, the expansion of the Internet and the development of video streaming technology are providing convenient ways to distribute and use these data widely. Together, these trends indicate a promising future for digital media in a variety of applications, including entertainment, education, medicine, and online information services.

This transition is both a blessing and a challenge. On one hand, it allows an extremely flexible way of producing, delivering, and consuming audiovisual content. On the other hand, the huge amount of multimedia data poses a crucial challenge: how to efficiently store, access, index, represent, browse, and search the data. Traditional techniques that are effective in processing alphanumeric data do not work well with multimedia data. In this context, innovative research areas are emerging and new technologies are being developed to address these issues in the fields of multimedia database management, multimedia content analysis, video summarization, video indexing, browsing, and video retrieval.

The objective of this special issue is to review the latest developments in multimedia data management technologies, bringing various multimedia research efforts together. It contains the following 10 articles, which cover five major research topics (multimedia feature extraction, video event detection, semantic video context modeling and concept detection, video summarization and adaptation, and multimedia content indexing, browsing, and retrieval):

• “Framework for Measurement of the Intensity of Motion Activity of Video Segments” by Peker and Divakaran
• “Evaluation of Shape Similarity Measurement Methods for Spine X-Ray Images” by Antani, Lee, Long, and Thoma
• “Integrated Use of Different Content Derivation Techniques Within A Multimedia Database Management System” by Petkovic and Jonker
• “Real-Time View Recognition and Event Detection for Sports Video” by Zhong and Chang
• “On Supervision and Statistical Learning for Semantic Multimedia Analysis” by Naphade
• “Video Personalization and Summarization System for Usage Environment” by Tseng, Lin, and Smith
• “Bridging the Semantic Gap in Sports Video Retrieval and Summarization” by Li, Errico, Pan, and Sezan
• “Organizing a Personal Image Collection with Statistical Model-Based ICL Clustering on Spatio-temporal Camera Phone Meta-data” by Pigeau and Gelgon
• “Content-Based Retrieval for Human Motion Data” by Chiu, Chao, Wu, Yang, and Lin
• “Automatic Generation of Conference Video Proceedings” by Amir, Ashour, and Srinivasan

1047-3203/$ - see front matter © 2004 Elsevier Inc. All rights reserved.

doi:10.1016/j.jvcir.2004.08.004

The first article, by Peker and Divakaran, explores a framework for measuring the intensity of motion activity for characterizing video segments. Motion activity, which captures object motion and camera motion, provides an effective measure for discriminating video content for applications such as sports video retrieval and summarization.

The second article, by Antani et al., reports on the evaluation of two popular shape similarity measures, polygon approximation and Fourier descriptors, applied to a collection of digitized medical X-ray images of vertebrae. The experimental results show that polygon approximation performed better than Fourier descriptors. The authors discuss other factors, such as efficiency, partial matching, and similarity of closely related shapes, that need further consideration and investigation.

In the third article, Petkovic and Jonker present their work on how to extend a traditional database management system with content-based video retrieval functionality. Specifically, important issues regarding video data models, dynamic feature extraction, and extensions of different layers of the database architecture are addressed in detail. Moreover, content analysis techniques that automatically detect and recognize diverse sports events (e.g., net playing, forehand and backhand in tennis, and passing and flying-out in Formula 1 car racing) are described. The integration of these techniques with the proposed database management system has allowed efficient, scalable, and domain-independent content-based video retrieval.

A real-time structure parsing and event detection system is described in the fourth article, by Zhong and Chang, which aims at recognizing important recurrent scenes in sports videos (e.g., pitching in baseball and serving in tennis) and detecting high-level sports events such as strokes, net plays, and baseline plays. A three-stage framework is proposed to achieve this goal. In the first stage, the training phase, feature models and object rules are learned automatically or semi-automatically. In the second stage, the operation phase, optimal models are selected to adapt to new videos and subsequently used to detect target scenes. Finally, high-level events within detected scenes are recognized based on constraint models on spatio-temporal properties of segmented video objects. To facilitate access to the content structure as well as the detected events, a summarization and browsing application is also investigated in the article.

In the fifth article, “On Supervision and Statistical Learning for Semantic Multimedia Analysis,” the author presents a review of the state of the art in the active field of semantic multimedia analysis, together with his own extensive accomplishments in tackling various challenging problems on this topic. Issues discussed in the article, such as context modeling, active learning, and unsupervised structure discovery, are all active research fields. In general, this is a very informative article, with interesting results.

The sixth article, “Video Personalization and Summarization System for Usage Environment,” describes a rather thorough system for personalizing and summarizing video content according to the usage environment and addresses a number of interesting technical issues within the system. In particular, the proposed approach leverages MPEG-7 and MPEG-21 for various aspects of the system. The article also describes the overall system structure, the set of tools built for the system, and various application scenarios.

A general framework for indexing and summarizing sports broadcast programs is presented in the seventh article, “Bridging the Semantic Gap in Sports Video Retrieval and Summarization,” by Li et al. This work attempts to bridge the semantic gap between the rich meanings that a user desires and the shallowness of the content descriptions for sports video in two ways. First, it applies an event detection scheme to automatically detect all segments that contain interesting events of a particular game, such as a pitch in baseball and a field goal in football, by exploiting specific domain knowledge and well-established production patterns. Second, it analyzes and interprets independently generated rich textual metadata that describe key events of the game and synchronizes them with the detected event segments. The event segment data and the synchronized textual metadata are then merged into a rich media content description that facilitates the retrieval and summarization of various sports content.

The eighth article, “Organizing a Personal Image Collection with Statistical Model-Based ICL Clustering on Spatio-temporal Camera Phone Meta-data,” discusses clustering an image collection according to time and geo-location metadata, with a particular focus on camera phones, which are becoming more and more popular. A model-based unsupervised classification method is used. The ICL criteria are examined and optimized using the EM technique. Some example photograph-taking scenarios are discussed, and interesting experimental results are presented.

The ninth article, “Content-Based Retrieval for Human Motion Data,” presents a video indexing and retrieval system based on extracted human motion information. Specifically, assuming that each frame contains a posture, it first represents each posture with a hierarchical skeletal structure using an affine invariant feature vector. Then, for each skeletal segment (e.g., arm, leg, or torso), it constructs an index map according to the segment-posture distribution through self-organizing map (SOM) clustering. During the retrieval stage, the start and end frames (postures) of the query example are used to find candidate clips from a video collection, and then the similarity between the query and each candidate is computed using a dynamic time warping algorithm. A video collection containing Tai Chi Chuan data (a traditional Chinese martial art) is used to demonstrate the system performance.
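For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the general technique, not the implementation used in the ninth article. It assumes each frame has already been reduced to a fixed-length feature vector (here a tuple of floats) and uses Euclidean distance between frames; the names `dtw_distance` and `rank_candidates`, and the toy data, are hypothetical and chosen only for illustration.

```python
import math


def dtw_distance(query, candidate):
    """Classic dynamic time warping cost between two sequences of
    equal-dimension feature vectors (illustrative sketch only)."""
    n, m = len(query), len(candidate)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = minimal accumulated cost of aligning the first i
    # query frames with the first j candidate frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed frame-level distance: Euclidean.
            d = math.dist(query[i - 1], candidate[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # query frame repeated
                cost[i][j - 1],      # candidate frame repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def rank_candidates(query, candidates):
    """Return (name, distance) pairs sorted from most to least similar.
    `candidates` maps a hypothetical clip name to its frame sequence."""
    scored = [(name, dtw_distance(query, seq)) for name, seq in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy one-dimensional "motion" data, purely for illustration.
query = [(0.0,), (1.0,), (2.0,)]
clips = {
    "same_motion_slower": [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,)],
    "reversed_motion": [(2.0,), (1.0,), (0.0,)],
}
for name, distance in rank_candidates(query, clips):
    print(name, distance)
```

Because the alignment may repeat frames on either side, a clip that performs the same motion at a different speed can still score as a close match, which is the property that makes this family of algorithms attractive for comparing motion sequences of unequal length.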

The last article, by Amir et al., describes an application that allows a nearly automatic, real-time creation of video proceedings. The proposed video proceedings contain videos of all conference talks, as opposed to regular paper proceedings, and provide the end-user with the following functionalities: full conference coverage including presentations and panels, automated video and speech processing and indexing, efficient content search using free text queries and keywords, random and nonlinear content access, and efficient video browsing with a generated video summary and multiple synchronized views. To achieve this goal, various video analysis and indexing approaches are described, including shot boundary detection, automatic speech recognition (ASR), speech analysis, audio processing, and slide show streaming.

We thank all the authors in this special issue, who have worked very hard to write their papers within a very strict time frame. We also hope that you, the reader, find this special issue an enjoyable mix and a spotlight on new themes emerging in the Multimedia Database Management field. Finally, we thank the Journal of Visual Communication and Image Representation staff for helping us produce this issue.

Ying Li and Dr. John Smith

IBM T.J. Watson Research Center

19 Skyline Drive, Hawthorne, NY 10532, USA

E-mail address: [email protected], [email protected]

Dr. Tong Zhang

Hewlett-Packard Labs, 1501 Page Mill Road, MS1203

Palo Alto, CA 94304, USA

E-mail address: [email protected]

Prof. Shih-Fu Chang

Department of Electrical Engineering, Columbia University

New York, NY 10027

E-mail address: [email protected]
