www.elsevier.com/locate/jvci
J. Vis. Commun. Image R. 15 (2004) 261–264
Editorial
Multimedia database management systems
Over the last decade we have witnessed significant growth in the digital media market. Digital video cameras have become ubiquitous with the proliferation of Internet cameras, security and monitoring cameras, and personal hand-held cameras. Meanwhile, advances in digital storage technology have made the digitization, compression, archiving, and streaming of multimedia data popular and inexpensive. Finally, the expansion of the Internet and the development of video streaming technology are providing convenient ways to distribute and use these data widely. Together, these trends indicate a promising future for digital media in a variety of applications, including entertainment, education, medicine, and online information services.
This transition is both a blessing and a challenge. On the one hand, it enables extremely flexible ways of producing, delivering, and consuming audiovisual content. On the other hand, the huge amount of multimedia data poses a crucial challenge: how to efficiently store, access, index, represent, browse, and search the data. Traditional techniques that are effective for processing alphanumeric data do not work well with multimedia data. In this context, innovative research areas are emerging and new technologies are being developed to address these issues in the fields of multimedia database management, multimedia content analysis, video summarization, video indexing, browsing, and video retrieval.
The objective of this special issue is to review the latest developments in multimedia data management technologies, bringing various multimedia research efforts together. It contains the following 10 articles, which cover five major research topics (multimedia feature extraction; video event detection; semantic video context modeling and concept detection; video summarization and adaptation; and multimedia content indexing, browsing, and retrieval):
• “Framework for Measurement of the Intensity of Motion Activity of Video Segments” by Peker and Divakaran
• “Evaluation of Shape Similarity Measurement Methods for Spine X-Ray Images” by Antani, Lee, Long, and Thoma
• “Integrated Use of Different Content Derivation Techniques Within A Multimedia Database Management System” by Petkovic and Jonker
• “Real-Time View Recognition and Event Detection for Sports Video” by Zhong and Chang
• “On Supervision and Statistical Learning for Semantic Multimedia Analysis” by Naphade
• “Video Personalization and Summarization System for Usage Environment” by Tseng, Lin, and Smith
• “Bridging the Semantic Gap in Sports Video Retrieval and Summarization” by Li, Errico, Pan, and Sezan
• “Organizing a Personal Image Collection with Statistical Model-Based ICL Clustering on Spatio-temporal Camera Phone Meta-data” by Pigeau and Gelgon
• “Content-Based Retrieval for Human Motion Data” by Chiu, Chao, Wu, Yang, and Lin
• “Automatic Generation of Conference Video Proceedings” by Amir, Ashour, and Srinivasan
1047-3203/$ - see front matter © 2004 Elsevier Inc. All rights reserved.
doi:10.1016/j.jvcir.2004.08.004
The first article, by Peker and Divakaran, explores a framework for measuring the intensity of motion activity in order to characterize video segments. Motion activity, which captures both object motion and camera motion, provides an effective measure for discriminating video content in applications such as sports video retrieval and summarization.
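As a hedged illustration of what an intensity-of-motion-activity measure can look like, the sketch below computes the standard deviation of motion-vector magnitudes, a commonly used measure that, to our understanding, underlies the intensity attribute of the MPEG-7 motion activity descriptor. It is not claimed to be the specific framework of Peker and Divakaran; the input format (a list of (dx, dy) motion vectors per frame, e.g. one per macroblock), the averaging over frames, and the function names are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

# Hypothetical input format: one (dx, dy) motion vector per macroblock.
MotionVector = Tuple[float, float]


def frame_activity(vectors: Sequence[MotionVector]) -> float:
    """Standard deviation of motion-vector magnitudes within one frame."""
    if not vectors:
        return 0.0
    magnitudes = [math.hypot(dx, dy) for dx, dy in vectors]
    mean = sum(magnitudes) / len(magnitudes)
    variance = sum((m - mean) ** 2 for m in magnitudes) / len(magnitudes)
    return math.sqrt(variance)


def segment_activity(frames: Sequence[Sequence[MotionVector]]) -> float:
    """Average per-frame activity over a video segment (assumed pooling rule)."""
    if not frames:
        return 0.0
    return sum(frame_activity(f) for f in frames) / len(frames)
```

Because this measure is a spread rather than a mean, perfectly uniform motion (every block moving by the same vector) yields zero activity; other pooling choices, such as the mean magnitude, behave differently.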
The second article, by Antani et al., reports on an evaluation of two popular shape similarity measures, polygon approximation and Fourier descriptors, applied to a collection of digitized medical X-ray images of vertebrae. The paper reports experimental results showing that polygon approximation performed better than Fourier descriptors. The authors also discuss other factors, such as efficiency, partial matching, and similarity of closely related shapes, that need further consideration
and investigation.
In the third article, Petkovic and Jonker present their work on extending a traditional database management system with content-based video retrieval functionality. Specifically, they address in detail important issues regarding video data models, dynamic feature extraction, and extensions to the different layers of the database architecture. Moreover, they describe content analysis techniques that automatically detect and recognize diverse sports events (e.g., net playing, forehands, and backhands in tennis, and passing and flying-out in Formula 1 car races). Integrating these techniques with the proposed database management system enables efficient, scalable, and domain-independent content-based video retrieval.
A real-time structure parsing and event detection system is described in the fourth article, by Zhong and Chang. It aims to recognize important recurrent scenes in sports videos (e.g., pitching in baseball and serving in tennis) and to detect high-level sports events such as strokes, net plays, and baseline plays. A three-stage framework is proposed to achieve this goal. In the first stage (training), feature models and object rules are learned automatically or semi-automatically. In the second stage (operation), optimal models are selected to adapt to new videos and are then used to detect target scenes. Finally, high-level events within the detected scenes are recognized using constraint models on the spatio-temporal properties of segmented video objects. To facilitate access to the content structure and the detected events, a summarization and browsing application is also investigated in
the article.
In the fifth article, “On Supervision and Statistical Learning for Semantic Multimedia Analysis,” Naphade reviews the state of the art in the rapidly growing field of semantic multimedia analysis, together with his own extensive work on various challenging problems in this area. The issues discussed in the article, such as context modeling, active learning, and unsupervised structure discovery, are all active research topics. Overall, this is a very informative article with interesting results.
The sixth article, “Video Personalization and Summarization System for Usage Environment,” by Tseng et al., describes a thorough system for personalizing and summarizing video content according to the usage environment and addresses a number of interesting technical issues within the system. In particular, the proposed approach leverages MPEG-7 and MPEG-21 for various aspects of the system. The article also describes the overall system structure, the set of tools built for the system, and various application scenarios.
A general framework for indexing and summarizing sports broadcast programs is presented in the seventh article, “Bridging the Semantic Gap in Sports Video Retrieval and Summarization,” by Li et al. This work attempts to bridge the semantic gap between the rich meaning that a user desires and the shallowness of the content descriptions available for sports video through two efforts. First, it applies an event detection scheme that exploits specific domain knowledge and well-established production patterns to automatically detect all segments containing interesting events of a particular game, such as a pitch in baseball or a field goal in football. Second, it analyzes and interprets independently generated rich textual metadata that describe key events of the game and synchronizes them with the detected event segments. The event segment data and the synchronized textual metadata are then merged into a rich media content description that facilitates the retrieval and summarization of various sports content.
The eighth article, “Organizing a Personal Image Collection with Statistical Model-Based ICL Clustering on Spatio-temporal Camera Phone Meta-data,” by Pigeau and Gelgon, discusses clustering an image collection according to time and geo-location metadata, with a particular focus on camera phones, which are becoming increasingly popular. A model-based unsupervised classification method is used: the ICL (Integrated Completed Likelihood) criterion is examined and optimized using the EM technique. Several example photo-taking scenarios are discussed, and interesting experimental results are presented.
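To make the roles of EM and ICL more concrete, here is a minimal sketch, not Pigeau and Gelgon's implementation: it fits one-dimensional Gaussian mixtures to photo timestamps with EM and picks the number of clusters that maximizes the commonly used BIC-like approximation of ICL (log-likelihood, minus a BIC penalty, minus the entropy of the soft assignments). Clustering on time alone (the article also uses geo-location), the quantile initialization, the variance floor, and all function names are simplifying assumptions.

```python
import math
from typing import List, Sequence, Tuple


def _log_normal(x: float, mu: float, var: float) -> float:
    """Log density of a univariate Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def _e_step(xs: Sequence[float], weights: List[float], mus: List[float],
            variances: List[float]) -> Tuple[List[List[float]], float]:
    """Responsibilities and total log-likelihood (log-sum-exp for stability)."""
    resp: List[List[float]] = []
    loglik = 0.0
    for x in xs:
        logs = [math.log(w) + _log_normal(x, m, v)
                for w, m, v in zip(weights, mus, variances)]
        top = max(logs)
        exps = [math.exp(l - top) for l in logs]
        total = sum(exps)
        resp.append([e / total for e in exps])
        loglik += top + math.log(total)
    return resp, loglik


def fit_gmm_1d(xs: Sequence[float], k: int, iters: int = 200, min_var: float = 1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM (hypothetical helper)."""
    n = len(xs)
    ordered = sorted(xs)
    # Means start at evenly spaced quantiles, variances at the global variance.
    mus = [float(ordered[int((j + 0.5) * n / k)]) for j in range(k)]
    mean = sum(xs) / n
    global_var = max(sum((x - mean) ** 2 for x in xs) / n, min_var)
    variances = [global_var] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp, _ = _e_step(xs, weights, mus, variances)
        for j in range(k):  # M-step: re-estimate weight, mean, variance
            nj = max(sum(r[j] for r in resp), 1e-10)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    resp, loglik = _e_step(xs, weights, mus, variances)
    return weights, mus, variances, resp, loglik


def icl_score(xs: Sequence[float], k: int) -> float:
    """BIC-like ICL approximation: log L - (p/2) log n - entropy of soft labels."""
    _, _, _, resp, loglik = fit_gmm_1d(xs, k)
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    entropy = -sum(r * math.log(r) for row in resp for r in row if r > 0.0)
    return loglik - 0.5 * n_params * math.log(len(xs)) - entropy


def select_num_clusters(xs: Sequence[float], k_max: int = 5) -> int:
    """Number of clusters in 1..k_max with the highest ICL score."""
    return max(range(1, k_max + 1), key=lambda k: icl_score(xs, k))
```

Because the entropy term penalizes overlapping components, ICL tends to favor well-separated clusters, such as bursts of photos taken during distinct events, over a slightly better likelihood fit.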
The ninth article, “Content-Based Retrieval for Human Motion Data,” by Chiu et al., presents a video indexing and retrieval system based on extracted human motion information. Assuming that each frame contains a posture, the system first represents each posture as a hierarchical skeletal structure described by an affine-invariant feature vector. Then, for each skeletal segment (e.g., arm, leg, or torso), it constructs an index map according to the segment-posture distribution through self-organizing map (SOM) clustering. During retrieval, the start and end frames (postures) of the query example are used to find candidate clips in a video collection, and the similarity between the query and each candidate is then computed using a dynamic time warping algorithm. A video collection containing Tai Chi Chuan data (a traditional Chinese martial art) is used to demonstrate the system's performance.
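The final matching step can be illustrated with a textbook dynamic time warping (DTW) sketch. Here each posture is assumed to be a plain tuple of numbers compared with Euclidean distance (`math.dist`, Python 3.8+); the article's affine-invariant skeletal features and SOM-based candidate search are not reproduced, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Posture = Tuple[float, ...]  # hypothetical per-frame feature vector


def dtw_distance(query: Sequence[Posture], clip: Sequence[Posture],
                 dist: Callable[[Posture, Posture], float] = math.dist) -> float:
    """Classic O(n*m) DTW: minimal cumulative cost of a monotone alignment."""
    n, m = len(query), len(clip)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], clip[j - 1])
            # Extend the cheapest of: advance query only, advance clip only, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_candidates(query: Sequence[Posture],
                    candidates: Sequence[Sequence[Posture]]) -> List[Tuple[int, float]]:
    """Return (candidate index, DTW distance) pairs, most similar first."""
    scored = [(idx, dtw_distance(query, clip)) for idx, clip in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1])
```

In a full system, only the clips preselected by the start/end posture lookup would be passed to `rank_candidates`, which keeps the quadratic DTW cost manageable.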
The last article, by Amir et al., describes an application that enables nearly automatic, real-time creation of video proceedings. Unlike regular paper proceedings, the proposed video proceedings contain videos of all conference talks and provide the end user with the following functionality: full conference coverage, including presentations and panels; automated video and speech processing and indexing; efficient content search using free-text queries and keywords; random and nonlinear content access; and efficient video browsing with generated video summaries and multiple synchronized views. To achieve this goal, the article describes various video analysis and indexing approaches, including shot boundary detection, automatic speech recognition (ASR), speech analysis, audio processing, and slide show streaming.
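Shot boundary detection, the first of the techniques listed above, is often introduced through a simple histogram-difference test. The sketch below is that generic baseline, not the method used by Amir et al.: frames are assumed to be flat lists of 8-bit gray levels, and the bin count and threshold are hypothetical tuning parameters.

```python
from typing import List, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized gray-level histogram of a frame given as 8-bit pixel values."""
    hist = [0.0] * bins
    for pixel in frame:
        hist[min(pixel * bins // 256, bins - 1)] += 1.0
    total = float(len(frame)) or 1.0  # avoid division by zero on empty frames
    return [h / total for h in hist]


def detect_shot_boundaries(frames: Sequence[Sequence[int]], bins: int = 16,
                           threshold: float = 0.5) -> List[int]:
    """Indices i where frame i starts a new shot (L1 histogram distance > threshold)."""
    boundaries: List[int] = []
    previous = None
    for i, frame in enumerate(frames):
        current = gray_histogram(frame, bins)
        if previous is not None:
            # L1 distance between normalized histograms lies in [0, 2].
            diff = sum(abs(a - b) for a, b in zip(previous, current))
            if diff > threshold:
                boundaries.append(i)
        previous = current
    return boundaries
```

Gradual transitions such as fades and dissolves spread the change over many frames, so practical detectors typically add further tests beyond a single frame-to-frame threshold.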
We thank all the authors in this special issue, who worked very hard to write their papers on a very tight schedule. We hope that you, the reader, find this special issue an enjoyable mix and a spotlight on new themes emerging in the field of multimedia database management. Finally, we thank the staff of the Journal of Visual Communication and Image Representation for helping us produce this issue.
Ying Li and Dr. John Smith
IBM T.J. Watson Research Center
19 Skyline Drive, Hawthorne, NY 10532, USA
Dr. Tong Zhang
Hewlett-Packard Labs, 1501 Page Mill Road, MS1203
Palo Alto, CA 94304, USA
Prof. Shih-Fu Chang
Department of Electrical Engineering, Columbia University
New York, NY 10027