
INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11/N3934
January 2001/Pisa

Title: MPEG-7 Applications Document v.10

Source: Requirements Group (Anthony Vetro, Editor)

Status: Approved

MPEG-7 Applications

1. INTRODUCTION
2. MPEG-7 FRAMEWORK
3. MPEG-7 APPLICATION DOMAINS
4. “PULL” APPLICATIONS
4.1 STORAGE AND RETRIEVAL OF VIDEO DATABASES
4.2 DELIVERY OF PICTURES AND VIDEO FOR PROFESSIONAL MEDIA PRODUCTION
4.3 COMMERCIAL MUSICAL APPLICATIONS (KARAOKE AND MUSIC SALES)
4.4 SOUND EFFECTS LIBRARIES
4.5 HISTORICAL SPEECH DATABASE
4.6 MOVIE SCENE RETRIEVAL BY MEMORABLE AUDITORY EVENTS
4.7 REGISTRATION AND RETRIEVAL OF MARK DATABASES
5. “PUSH” APPLICATIONS
5.1 USER AGENT DRIVEN MEDIA SELECTION AND FILTERING
5.2 PERSONALISED TELEVISION SERVICES
5.3 INTELLIGENT MULTIMEDIA PRESENTATION
5.4 PERSONALIZABLE BROWSING, FILTERING AND SEARCH FOR CONSUMERS
5.5 INFORMATION ACCESS FACILITIES FOR PEOPLE WITH SPECIAL NEEDS
6. SPECIALISED PROFESSIONAL AND CONTROL APPLICATIONS
6.1 TELESHOPPING
6.2 BIO-MEDICAL APPLICATIONS
6.3 UNIVERSAL ACCESS
6.4 REMOTE SENSING APPLICATIONS
SEMI-AUTOMATED MULTIMEDIA EDITING
6.5 EDUCATIONAL APPLICATIONS
6.6 SURVEILLANCE APPLICATIONS
6.7 VISUALLY-BASED CONTROL
7. APPLICATIONS BASED ON SPECIFIC DESCRIPTORS & DESCRIPTION SCHEMES
7.1 TRANSCODING APPLICATIONS
8. REFERENCES
ANNEX A: SUPPLEMENTARY REFERENCES, BY APPLICATION
4.1 & 4.2 STORAGE AND RETRIEVAL OF VIDEO DATABASES & DELIVERY OF PICTURES AND VIDEO FOR PROFESSIONAL MEDIA PRODUCTION
5.1 USER AGENT DRIVEN MEDIA SELECTION AND FILTERING
5.2 INTELLIGENT MULTIMEDIA PRESENTATION
6.3 UNIVERSAL ACCESS
6.5 EDUCATIONAL APPLICATIONS
Annex B: An example architecture for MPEG-7 Pull applications

1. Introduction

This ‘MPEG-7 Applications Document’ describes a number of applications that are enabled by MPEG-7 tools. It certainly does not list all the applications enabled by MPEG-7, but rather gives an idea of the potential applications that could be supported.

The purpose of this document is to provide a better understanding of what the MPEG-7 standard is and of the functionality it could deliver. In the past, this document was used as a tool to help write concrete requirements. For this reason, the earlier sections that describe the push/pull and other specialised applications include not only a description of each application, but also application-specific requirements and the requirements they imposed on the MPEG-7 standard. These requirements are retained because they provide important insight into the development of MPEG-7.

In October 2000, MPEG-7 reached the stage of Committee Draft (CD). This stage indicates a mature specification, and, for the first time, the specification is made publicly available. Section 7 of this document describes applications that are based on actual descriptors and description schemes that are part of the specification.

2. MPEG-7 Framework

Today, more and more audio-visual information is available from many sources around the world, and people want to use this information for a variety of purposes. Before the information can be used, however, it must be located, and the growing volume of potentially interesting material makes this search increasingly difficult. This challenging situation led to the need for a solution to the problem of quickly and efficiently searching for the various types of multimedia material that interest the user; MPEG-7 aims to answer that need. Moreover, MPEG-7 enables not only this type of search, but also filtering; thus, MPEG-7 will support both push and pull applications.

MPEG-7, formally called ‘Multimedia Content Description Interface’, standardises:

A set of description schemes and descriptors


A language to specify description schemes, i.e. a Description Definition Language (DDL).

A scheme for coding the description
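As a rough, non-normative illustration of the kind of description these tools enable, the Python sketch below assembles a miniature MPEG-7-style description as an XML document. All element names here are simplified stand-ins invented for this example; the real types are defined by the DDL and the standardised description schemes.

    # A minimal sketch of an MPEG-7-style XML description (illustrative
    # element names only, not the normative MPEG-7 schema).
    import xml.etree.ElementTree as ET

    root = ET.Element("Mpeg7")
    video = ET.SubElement(ET.SubElement(root, "Description"), "Video", id="news-001")

    # Free-text annotation in two languages (cf. the multi-language
    # requirements in section 4.1).
    annot = ET.SubElement(video, "TextAnnotation")
    ET.SubElement(annot, "FreeText", lang="en").text = "Evening news broadcast"
    ET.SubElement(annot, "FreeText", lang="fr").text = "Journal televise du soir"

    # A structured field, as a database-style description would use.
    ET.SubElement(ET.SubElement(video, "Creation"), "Creator").text = "Example Broadcaster"

    print(ET.tostring(root, encoding="unicode"))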

For more details regarding the MPEG-7 background, goals, areas of interest, and work plan please refer to document N2729, “MPEG-7 Context, Objectives, and Technical Roadmap” [1]. MPEG-7’s requirements and definitions are indicated in document N2727, “MPEG-7 Requirements Document” [2].

3. MPEG-7 Application Domains

The increasing volume of audio-visual data available in our everyday lives requires effective multimedia systems that make it possible to access, interact with, and display complex and inhomogeneous information. Such needs are related to important social and economic issues, and are imperative in various professional and consumer applications, such as:

Education,
Journalism (e.g. searching speeches of a certain politician using his name, his voice or his face),
Tourist information,
Cultural services (history museums, art galleries, etc.),
Entertainment (e.g. searching a game, karaoke),
Investigation services (human characteristics recognition, forensics),
Geographical information systems,
Remote sensing (cartography, ecology, natural resources management, etc.),
Surveillance (traffic control, surface transportation, non-destructive testing in hostile environments, etc.),
Bio-medical applications,
Shopping (e.g. searching for clothes that you like),
Architecture, real estate, and interior design,
Social (e.g. dating services), and
Film, Video and Radio archives.

4. “Pull” Applications

A preliminary note on the division of this document:

There is a multitude of ways of dividing this group of applications into different categories. Originally, applications were divided by medium, but they were later categorised by delivery paradigm. This is not to imply an ordering or priority of divisions, but is simply a reflection of what was convenient at the time. The applications could also be divided by content type, user group, or position in the content.

MPEG-7 began its life as a scheme for making audio-visual material “as searchable as text is today.” Although the proposed multimedia content descriptions are now acknowledged to serve much more than search applications, retrieval remains for many the primary application of MPEG-7. These retrieval, or “pull,” applications involve databases, audio-visual archives, and the web-based Internet paradigm (a client requests material from a server).

4.1 Storage and retrieval of video databases

Application Description

Television and film archives store a vast amount of multimedia material in several different formats (digital or analogue tapes, film, CD-ROM, etc.), along with detailed descriptive information (meta-data) which may or may not be precisely timecoded. This meta-data is stored in databases with proprietary formats. There is enormous potential interest in an international standard format for the storage and exchange of descriptions that could ensure:

interoperability between video archive operators,

perennial relevance of the meta-data, and

a wider diffusion of the data to the professional and general public.

MPEG-7, in short, must accommodate visual and other search of such existing multimedia databases.

In addition, a vast amount of the older, analogue audio-visual material will be digitised in the years to come, which creates a tremendous opportunity to include content-based indexing features (which can be extracted during the digitisation/compression process) in those existing databases.

In the case of new audio-visual material, the ability to associate descriptive information within video streams at various stages of video production can dramatically improve the quality and productivity of manual, controlled-vocabulary annotation of video data in a video archive. For example, pre-production and post-production scripts, information captured or annotated during shooting, and post-production edit lists would be very useful in the retrieval and re-use of archival material.

Activities closely associated with this one are cost-efficient video sequence indexing and shot-level indexing for stock footage libraries [3].

A sample architecture is outlined in Annex B.

Application requirements

Specific requirements for these applications are:

Support of full-text descriptions as well as structured fields (database descriptions);

Multi-language support;

The ability to interoperate between different content description semantics (e.g. different database schemas, different thesauri, etc.), or to translate from each content description semantic into MPEG-7 semantics;

The ability to reference audio-visual objects or object instances and time references, even in analogue format;

The ability to include descriptions with incomplete or missing time references (a shot description that has not been timecoded);


The ability to handle multiple versions of the same document at several stages in the production process, and descriptions that apply to multiple copies of the same material.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

There must exist a class of descriptors that support unstructured (free-) text in multiple languages.

A note about text: Descriptors should depend as little as possible on a specific language. If text is needed as a descriptor, the language used must be specified in the text description and a text description may contain several translations. The character set chosen must enable the use of all languages (as appropriate to ISO).

There must exist a class of descriptors that support structured text.

There must exist a mechanism by which different MPEG-7 DS’s can interoperate.

There must be a robust linking mechanism that allows for temporal references to material with incomplete timecode.

There must be a mechanism by which document versions may be identified.
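As a loose sketch of the last two requirements above, the following Python fragment models a shot description whose time references may be absent and which carries a version identifier. The field names are assumptions made for this example, not MPEG-7 terms.

    # Illustrative only: a shot description with optional timecodes and a
    # version label, so untimecoded and multi-version material stays valid.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class ShotDescription:
        material_id: str                      # identifies the master material
        version: str                          # e.g. "rough-cut", "broadcast-master"
        annotation: str                       # free-text or controlled-vocabulary note
        start_timecode: Optional[str] = None  # None: shot never timecoded
        end_timecode: Optional[str] = None

    # A description that has not been timecoded remains valid and searchable:
    shot = ShotDescription("tape-0042", "broadcast-master",
                           "wide shot, harbour at dawn")
    assert shot.start_timecode is None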

Application relevant work and references

Bloch, G. R. (1988). From Concepts To Film Sequences. In RIAO 88, (pp. 760-767). MIT Cambridge MA.: March 21-24, 1988.

Davis, M. (1993). Media streams: An iconic visual language for video annotation. Telektronikk, 89(4), 59 - 71.

EBU/SMPTE Task Force, First Report: User Requirements, version 1.22, chapter 1: Compression, chapter 2: Metadata and File Wrappers, chapter 3: Transfer Protocols. SMPTE Journal, April 1997.

Parkes, A. P. (1989b). The Prototype CLORIS system: Describing, Retrieving and Discussing Videodisc Stills and Sequences. Information Processing and Management, 25(2), 171 - 186.

ISO/TC 46 /SC 9, Information and documentation - Presentation, identification and description of documents <http://www.nlc-bnc.ca/iso/tc46sc9/index.htm>

ISO/TC 46 /SC 9, Working Group 1 (1997) Terms of reference and tasks for the development of an International Standard Audio-visual Number (ISAN). Document ISO/TC 46/SC 9 N 235, May 1997.

See also Annex A.


4.2 Delivery of pictures and video for professional media production

Application description

[note: this section is still to be re-evaluated]

Studios need to deliver appropriate videos to TV channels. A studio may have to deliver a whole video, selected on the basis of global meta-data, or video segments, for example to edit an archive-based video, a documentary, or advertisement videos.

In this application, due to the users’ expertise, one formulates relevant and possibly detailed “pull” queries, which specify the desired features of some video segments. With present video databases, these queries are mainly based on objective characteristics at segment level. However, they can also take advantage of subjective characteristics of these segments, as perceived by one or several users.

The ability to formulate a single query on the client side, and send it to many distributed databases is very important to many production studios. The returned items should include visual abstracts, copyright and pricing information, as well as a measure of the technical quality of the source video material.

In this application, one should separate news programs, which must be made widely and instantly available for a short period of time, from other production programs, which can be retrieved on a permanent basis, usually from secondary or tertiary storage. On-line news services providing instant access to the day’s news footage are being built by many broadcasters and archives (including INA, the BBC, etc.), using proprietary formats, and would benefit from standardisation if they were to be consolidated into common services such as the Eurovision News Exchange (which currently uses broadcast channels, not databases).

Still pictures have similar applications and requirements, particularly in design. A web designer must not only create new designs but also collect graphics already available on the net for use in the web sites being designed. Other design fields have similar uses for visual search.

Application requirements

Requirements are similar to those of the previous application. They are mainly characterised by:

Support of feature-based and concept-based queries at segment level,

Support for similarity queries,

Support for different data formats,

Support for art & design specific parameters, and

Media summaries for fast browsing.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

There must exist a mechanism that embodies conceptual knowledge.

There must exist a mechanism that allows for links to media summaries.


Application relevant work and references

Aigrain, P., Joly, P., & Longueville, V. (1995). Medium Knowledge-Based Macro-Segmentation of Video into Sequences. In M. Maybury (Ed.) (pp. 5-16), IJCAI 95 - Workshop on Intelligent Multimedia Information Retrieval. Montréal: August 19, 1995

The Art Teacher Connection: http://www.primenet.com/~arted

Cohen, A., Levy, M., Roeh, I., & Gurevitch, M. (1995) Global Newsrooms, Local Audiences: A Study of the Eurovision News Exchange, Acamedia Research Monograph 12, John Libbey.

European League of Institutes of the Arts: http://www.elia.ahk.nl

Pentland, A. P., Picard, R., Davenport, G., & Haase, K. (1994). Video and Image Semantics: Advanced Tools for Telecommunications (Technical Report No. 283). MIT.

Sack, W. (1993). Coding News And Popular Culture. In The International Joint Conference on Artificial Intelligence (IJCA93) Workshop on Models of Teaching and Models of Learning. Chambery, Savoie, France.

Zhang, H., Gong, Y., & Smoliar, S. W. (1994). Automated parsing of news video. In IEEE International Conference on Multimedia Computing and Systems, (pp. 45 - 54). Boston: IEEE Computer Society Press.

See also 4.1 ‘Application relevant work and references’, and Annex A.

4.3 Commercial musical applications (Karaoke and music sales)

Application Description

The Karaoke industry is extremely large and popular. One of the aims of the pastime is to make the activity of singing in public as effortless and unintimidating as possible. Requiring a participant to recall the name and artist of a popular tune is unnecessary when one considers that the amateur performer must know the song well enough to sing it. A much friendlier interface results if you allow someone to hum a few memorable bars of the requested tune, and to have the computer find it (or a short list of alternatives, if the brief segment under-specifies the intended selection). Alternatively, one may speak the title and author, lyrics, or lyrical key words.
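The references below (e.g. Ghias et al.; Lindsay) describe coarse melodic contours as one workable representation for such queries. Assuming that representation, the Python sketch below reduces each note to an up/down/same symbol and matches queries by edit distance, so small humming errors cost little.

    # Sketch of coarse melodic-contour matching for query-by-humming
    # (one approach from the literature, not the MPEG-7 melody tools).

    def contour(pitches):
        """Reduce a pitch sequence (e.g. MIDI note numbers) to a U/D/S string."""
        return "".join("U" if b > a else "D" if b < a else "S"
                       for a, b in zip(pitches, pitches[1:]))

    def edit_distance(a, b):
        """Classic dynamic-programming Levenshtein distance."""
        dp = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, cb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                         prev + (ca != cb))
        return dp[-1]

    song  = contour([60, 60, 62, 60, 65, 64])  # "SUDUD"
    query = contour([60, 60, 62, 60, 64, 64])  # "SUDUS": one note mis-hummed
    print(edit_distance(song, query))          # 1 -> still a likely match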

A similar application, dealing with Karaoke but also related to the music sales discussed below, is enabling solo Karaoke-ists to expand their repertoire in the privacy of their own homes. Much of the industry is currently driven by people wishing to practice in their own homes. One can easily imagine a complete on-line database in which someone selects a song she knows from the radio, sings a few bars, and the entire arrangement is downloaded to her computer, with appropriate payment extracted.

Another consumer music application is a rethinking of the consumer music industry in the on-line world. The current model for music distribution is to go to a store, purchase a music compact disk, bring it home and play it. An alternative model of growing popularity is to download a compressed song from the Internet, store it on your computer hard drive, and play it via a real-time software decoder. Even at the current density of computer hard drives, more than 130 CDs can be stored on a 6 GB drive (at 128 kb/s for a stereo signal). However, the personal computer is not a home appliance that is highly compatible with a home music system (that fan noise! and why is it taking so long to boot up?). An additional issue with this new model of music distribution is that when reliability problems occur, you don’t lose just a song or two from a scratch on the CD, but an entire library of 130 CDs in the event of a hard drive crash.

However, with the advent of high-speed networking, the repository of compressed music can be moved from the home to a secure and reliable server in the network. This has several advantages. First, a network-based storage device could be made very reliable at a modest cost per-user. Second, the music appliance in the home would not have to be a computer (with associated mass storage device), but merely a small, networked, audio decoder (no fan, and it runs the moment you turn it on!). A home network link of more than 1 Mb/s downstream has sufficient capacity to deliver a 128 kb/s compressed music bitstream to the home with reliable quality of service.

Assume that the consumer has a multi-media personal computer and a high-speed network connection. When shopping for music on the Internet, there are already many sites that will let the prospective buyer browse via artist, album title or song title; show a discography of the album or artist; permit short clips of songs to be downloaded for preview; and recommend related artists or albums that the customer might also like. Considering the magnitude of the effort required to create such a database, a standardized format that would enable the cost to be shared by many retail outlets might be very welcome.

When a song is purchased, the consumer does not download it to local storage, but rather moves it to a networked server. In fact, a purchase could consist of an entry in a server database indicating that a song, perhaps at another location in the network, is “owned” by the consumer. This poses a dilemma: many consumers have large music libraries, which they might (or might not) catalog and organize in various ways by means of the physical arrangement of the CDs. Possibilities include alphabetical ordering by artist or genre, categorization by mood, or just frequency of play (a LIFO stack). Without the physical jewel boxes and liner notes to look at, these methods of organization are not possible. A method to browse the electronic catalog via any of these various means of organization is vital, and this proposal is a first step toward satisfying that need.

Now assume that the music application is a subscription service: for a flat fee per month, the consumer can play any music in the provider’s archive. This multiplies the access problem enormously. The consumer can play not just the possibly thousands of songs in his or her personal library of purchased music, but any of hundreds of thousands of songs that are commercially available. An easy method of access is vital. Furthermore, in such a service the consumer might typically compose song lists, and then play the list during a listening session. An attractive new service would be to automatically compose an “infinite” song list based on initial information provided by the consumer. For example, “play cool jazz songs that are relaxing, like the songs on the album ‘Paraiso – Jazz Brazil’ by Gerry Mulligan.” This proposal attempts to specify the underlying database of information to facilitate such a service.


Application requirements

Robust representations of melody and other musical features, allowing for reasonable errors on the part of the indexer in order to accommodate query-by-humming,

Associated information, and

Cross-modal search.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

There must exist a mechanism that supports melody and other musical features.

There must exist a mechanism that supports descriptors based on information associated with the data (e.g., textual data).

Support description schemes that contain descriptors of visual, audio, and/or other features, and support links between the different media.

Application relevant work and references

The area of query-by-humming may be the most-researched field within auditory query by content. A few example papers include:

Ghias, A., Logan, J., Chamberlain, D., & Smith, B. C. (1995). “Query by humming: musical information retrieval in an audio database,” ACM Multimedia ’95, San Francisco. <http://www.cs.cornell.edu/Info/People/ghias/publications/query-by-humming.html>

Kageyama, T., Mochizuki, K., Takashima, Y. (1993). “Melody retrieval with humming,” ICMC ‘93 Tokyo proceedings, 349-351.

Lindsay, A. (1996). “Using Contour as a Mid-Level Representation of Melody,” S.M. Thesis, MIT Media Laboratory, Cambridge, MA. <http://sound.media.mit.edu/~alindsay/thesis.html>

Quackenbush, S. (1999). “A Description Scheme and Associated Descriptors for Music” ISO IEC/JTC1/SC29/WG11 Doc m4582/p471, presented at MPEG-7 Evaluation meeting, Lancaster UK.

Wu, J. K. (1999). “Speech Annotation – A Descriptive Scheme for Various Media” ISO IEC/JTC1/SC29/WG11 Doc m4582/p629, presented at MPEG-7 Evaluation meeting, Lancaster UK.

4.4 Sound effects libraries

Application Description

Foley artists, sound designers, and the like must deal daily with extremely large databases of sound effects used for a variety of applications. Existing database management and search solutions are typically either proprietary, and therefore closed, or open but unsuitable for any serious, orderly work.

A sound designer may specify a sound effect type, for example naming the source of the sound, and select from variations on that sound. A designer may provide a prototypical sound and specify details such as “bigger, more distant, but keeping the same brightness.” One may even vocalise the type of abstract sound one seeks, in an onomatopoetic variation of query-by-humming. Essential to the application is the ability to navigate a space of similar sound effects.
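As a sketch of how such navigation might work, the Python fragment below places sound effects in a small feature space and retrieves nearest neighbours. The features and library entries are invented for the example; "brightness" stands in for a spectral-centroid-like measure.

    # Illustrative navigation of a sound-effects feature space.
    import math

    # name -> (brightness_hz, duration_s, loudness_db) -- toy values
    library = {
        "door slam (small)": (900.0, 0.4, 78.0),
        "door slam (large)": (450.0, 0.9, 84.0),
        "distant thunder":   (200.0, 4.0, 70.0),
        "glass break":       (3200.0, 0.8, 82.0),
    }

    def distance(a, b, weights):
        # Weighted Euclidean distance: a heavy weight pins a feature down.
        return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

    def most_similar(query, weights, k=2):
        return sorted(library, key=lambda n: distance(library[n], query, weights))[:k]

    # "Like a small door slam, but bigger and longer, same brightness":
    q = (900.0, 1.2, 86.0)
    print(most_similar(q, weights=(100.0, 1.0, 1.0)))  # brightness held fixed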

Application requirements

Compact representation of sound effects,

Sound source name and characteristics, and

Ability to specify classes of audio-visual objects, with features to accommodate selection.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for hierarchical descriptors and description schemes.

Support for text based descriptors.

Efficient coding of descriptors.

Application relevant work and references

Blum, T., et al., “Content-based classification, search, and retrieval of audio,” in Maybury, Mark T. (ed.) (1997) Intelligent multimedia information retrieval. Menlo Park, Calif.

Mott, R.L. (1990) “Sound Effects: Radio, TV, and Film,” Focal Press, Boston, USA.

4.5 Historical speech database

Application Description

One may search for historical events through key words spoken (“We will bury you”), key events (‘shoe banging’), the speaker (‘Nikita Khrushchev’), location and/or context (‘address to the United Nations’), date (12 October 1960), or a combination of any or all of the above, in order to call up an audio recording, an audio-visual presentation, or any other associated facts. This application can aid in education (see also 6.4, Film music education) or journalistic research.

Application requirements

Representation of the textual content of an auditory event.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:


Support for text based descriptors.

Application relevant work and references

[to come]

4.6 Movie scene retrieval by memorable auditory events

Application Description

In our post-modern world, many visual events are referred to by memorable spoken words. This is nowhere more evident than when people refer to comedic movie or television scenes (“This parrot is bleedin’ demised,” and “land shark”) or to movies by auteurs (“I’m sorry, did I ruin your concentration?” “thirty-seven!?” and “there’s only trouble and desire”) by their key words. One should be able to look up a movie (and rent a viewing of a particular scene, for example) by quoting such catch phrases. It is not hard to imagine a new market growing up around such micro-views and micro-payments, based on impulse viewing.

In a similar vein, auditory events in soundtracks may be just as accessible as spoken lines in certain circumstances. A key example is the screeching violins in the “Psycho” soundtrack at the point of the infamous shower scene. Those repeated harsh notes (“Scree-ee-ee-ee!”) are iconic to a movie-going public, and a key feature of an important movie.

Application requirements

Search by example audio.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for audio descriptors.

Application relevant work and references

[to come]

4.7 Registration and retrieval of mark databases

Application Description

Registration of marks protects the inventor or service provider from misuse or imitation, in the form of exclusive rights of exploitation enforceable through legal proceedings. A mark is a sign, or a combination of signs, capable of distinguishing the goods or services of one undertaking from those of other undertakings. In general, the sign may take the form of a two-dimensional image consisting of text, drawings or pictures, or emblems, including colors. Two-dimensional marks can be categorized into the three types listed below:

Word-in mark: contains only characters or words in the mark (best described by text annotation).

Device mark: contains graphical or figurative elements only (a shape descriptor is needed).

Composite mark: consists of characters or words and graphical elements (a combination of the above descriptors).

If a mark is registered, then no person or enterprise other than its owner may use it for goods or services identical with or similar to those for which the mark is registered. Any unauthorized use of a sign similar to the protected mark is also prohibited, if such use may lead to confusion in the minds of the public. The protection of a mark is generally not limited in time, provided its registration is periodically renewed (typically, every 10 years) and its use continues. It is estimated that the number of registrations and renewals of marks effected worldwide in 1995 was in the order of millions, and this number is expected to keep growing rapidly.

In order to register a mark, one has to make sure that no identical one has been registered before. For the “Word-in mark” and “Composite mark” types, text annotation may be adequate for retrieval from the database. The “Device mark” type, however, is characterized only by the shape of the object. In addition, this type may not have a distinct orientation or scale. When the operator enters a new mark into the database for registration, he/she wants to make sure that no identical one is already in the system, regardless of its orientation angle or scale. Furthermore, he/she may want to see which similarly shaped ones are already in the system even if there is no identical one. The search process should be robust to noise in the image and to minor variations in shape. Any relevant information, such as annotation or a textual description of the mark, should also be accessible if requested.
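One classical way to obtain the required invariance, used here purely for illustration, is to compare magnitudes of Fourier descriptors of the boundary's centroid-distance function: dividing by the DC term removes scale, and keeping only magnitudes removes rotation and starting point. A Python sketch:

    # Illustrative scale- and orientation-invariant shape signature.
    import cmath, math

    def shape_signature(boundary, n_coeffs=8):
        """boundary: (x, y) points sampled evenly around the mark outline."""
        cx = sum(x for x, _ in boundary) / len(boundary)
        cy = sum(y for _, y in boundary) / len(boundary)
        r = [math.hypot(x - cx, y - cy) for x, y in boundary]
        n = len(r)
        mags = [abs(sum(r[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t in range(n)))
                for k in range(n_coeffs + 1)]
        dc = mags[0] or 1.0
        return [m / dc for m in mags[1:]]   # scale-normalised magnitudes

    def dissimilarity(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def square(scale=1.0, angle=0.0, samples=64):
        pts = []
        for i in range(samples):
            t = 4.0 * i / samples          # walk the perimeter side by side
            side, u = int(t), t - int(t)
            x, y = [(u, 0), (1, u), (1 - u, 1), (0, 1 - u)][side]
            x, y = (x - 0.5) * scale, (y - 0.5) * scale
            pts.append((x * math.cos(angle) - y * math.sin(angle),
                        x * math.sin(angle) + y * math.cos(angle)))
        return pts

    # A square and the same square rotated and scaled match almost exactly:
    print(dissimilarity(shape_signature(square()),
                        shape_signature(square(scale=3.0, angle=0.7))))  # ~0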

A mark designer may want the same thing. In addition, to avoid possible inadvertent infringement of the copyright, the designer may wish to see whether some possible variations of the mark under design are already registered.

In this respect, it is desirable for the system to be capable of ranking the retrieved results in terms of their similarity, and of displaying the results simultaneously for comparison. At present, retrieval of similar or identical device-type marks is performed manually by a human operator, resulting in many duplicated registrations.

Therefore, there is an enormous potential need for automatic retrieval of marks by content-based similarity, not only in the international community but also domestically. The mark can be submitted to the system interactively on-line, to refine the search process in a web-based Internet paradigm [4].

Application requirements

Specific requirements for this application are:

Efficient interactive response time.

Support for a mechanism by which a mark image may be submitted for similarity based retrieval.

Support for visual based descriptors by which modifications can be made to any of the retrieved results for fine-tuning the search process (relevance feedback).

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for shape-based and content-based queries;
Support for precise, shape-oriented similarity queries;
Support for scale and orientation independence of marks;
Support for media summaries for fast browsing;
Support for linking relevant information; and
Support for D’s and DS’s invariant under transformations irrelevant to the intended features.

Application relevant work and references

Andrews, B. (1990). U.S. Patent and Trademark Office ORBIT trademark retrieval system. T-term user guide, examining attorney’s version: October 1990.

Cortelazzo, G., Mian, G. A., Vezzi, G., & Zamperoni, P. (1994). Trademark shapes description by string-matching techniques. Pattern Recognition, 27(8), 1005-1018.

Eakins, J. P. (1994). Retrieval of trademark images by shape feature. Proc. of Int. Conf. on Electronic Library and Visual Information Research, 101-109, May, 1994.

Eakins, J. P., & Shields, K., & Boardman, J. (1996). ARTISAN – a shape retrieval system based on boundary family indexing. Proc. SPIE, Storage and Retrieval for Image and Video Database IV, vol. 2670, 17-28, Feb. 1996.

Lam, C. P., Wu, J. K., & Mehtre, B. (1995). STAR - a system for trademark archival and retrieval. Proceedings 2nd Asian Conf. on Computer Vision, vol. 3, 214-217.

Kim, Y-S, & Kim, W-Y (1998). Content-Based Trademark Retrieval System Using Visually Salient Feature, Journal of Image and Vision Computing, vol. 16/12-13, August 1998.

World Intellectual Property Organization (WIPO): http://www.wipo.org/eng/dgtext.htm

5. “Push” Applications

In contrast with the above “pull” applications, the following “push” applications follow a paradigm more akin to broadcasting and the emerging webcasting. The paradigm moves from indexing and retrieval, as above, to selection and filtering. Such applications have very distinct requirements, generally dealing with streamed descriptions rather than static descriptions stored in databases.


5.1 User agent driven media selection and filtering

Application description

Filtering is essentially the converse of search. Search involves the pull of information, while filtering implies information ‘push.’ Search requests the inclusion of information, while filtering excludes data. Both pursuits benefit strongly from the same sort of meta-information.

Broadcast media are unlikely to disappear any time soon. In fact, there is a movement to make the World Wide Web, primarily a pull medium, more broadcast-like. If we can enable users to select information more appropriate to their uses and desires from a broadcast stream of 500 channels, using the same meta-information as that used in search, then this is an application for MPEG-7.

This application gives rise to several sub-types, divided primarily by type of user. A consumer-oriented selection gives rise to personalised audio-visual programmes, for example, and can go much farther than typical video-on-demand in collecting personally relevant news programmes. A content-producer-oriented selection, made at the segment or shot level, is a way of collecting raw material from archives.
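A minimal sketch of such agent-driven filtering appears below in Python. The profile structure and concept labels are invented for the example; a real agent would read them from MPEG-7 descriptions.

    # Illustrative profile-based filtering of a described broadcast stream.
    user_profile = {"local-elections": 1.0, "basketball": 0.8, "weather": -0.5}

    def score(concepts, profile):
        """Sum of profile weights for the concepts a description carries."""
        return sum(profile.get(c, 0.0) for c in concepts)

    stream = [
        {"title": "Election night special", "concepts": ["local-elections", "news"]},
        {"title": "Storm warning update",   "concepts": ["weather", "news"]},
        {"title": "Bulls vs. Knicks",       "concepts": ["basketball", "sport"]},
    ]

    selected = [item["title"] for item in stream
                if score(item["concepts"], user_profile) > 0.5]
    print(selected)  # pushed to the user; everything else is filtered out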

Application requirements

Efficient interactive response times, and

The capability to characterise a media object by a set of concepts that may be dependent on locality or language.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for descriptors and description schemes that allow multiple languages.

There must exist a mechanism by which concepts may be represented.

Application relevant work and references

Lieberman, H. (1997) “Autonomous Interface Agent,” In proceedings of Conference on Computers and Human Interface, CHI-97, Atlanta, Georgia.

Maes, P. (1994a) “Agents that Reduce Work and Information Overload,” Communications of the ACM, vol. 37, no. 7, pp. 30 - 40.

Marx, M., & C. Schmandt (1996) “CLUES: Dynamic Personalized Message Filtering,” In proceedings of Conference on Computer Supported Cooperative Work /CSCW 96, edited by M. S. Ackermann, pp. 113 - 121, Hyatt Regency Hotel, Cambridge, Mass.: ACM.

Nack, F. (1997) Considering the Application of Agent Technology for Collaboration in Media-Networked Environments. IRIS 20.


Shardanand, U., & P. Maes (1995) “Social Information Filtering: Algorithms for Automating 'Word of Mouth',” In proceedings of CHI-95 Conference, Denver, CO: ACM Press.

See also Annex A.

5.2 Personalised Television Services

Application Description

In the broadcast area, the MPEG-7 description can provide the user with assistance in the selection of broadcast data, be it for immediate or later viewing, or for recording. In a personalized broadcast scenario, the data offered to the user can be filtered from broadcast streams according to his own profile, the generation of which may be done automatically (e.g. based on location, age, gender, or previous selection behavior) or semi-automatically (e.g. based on pre-set interests). The broadcast of MPEG-7 description streams will provide Electronic Programme Guide (EPG) providers with a variety of capabilities, wherein presentation of MPEG-7 data (also along with the original AV data) will also be an important aspect. In combination with NVOD (Near-Video-on-Demand) services and recording, new functionalities become possible, such as stepping forward/backward based on keyframe selection and changes in the sequence of scenes for speed-up in presentation. Extended interactivity functionalities, related to specific events in the programmes, are of importance for future broadcast services as well. This can include "offline" interactivity based on a recorded broadcast stream, which can require identification of the associated event during callback. It can be expected that MPEG-7 data will be transmitted along with the AV data streams, or (e.g. for an EPG channel) also as separate streams [5].

Application requirements

Description of broadcast media objects in terms of, for example, content type, author, cast, parental rating, textual description, temporal relationship, locality- and language-dependent features, service provider, and IP protection.

Description of specific events in broadcast media objects.

Specification of interactivity capability related to specific events (e.g. polling) and definition of associated interaction channels (e.g. telephone numbers or hyperlinks).

Presentation of content-related data, also along with the associated media objects, and manipulation of the presentation based on the content description (e.g. by event status).

Support for APIs that define receiver-side filter functions, interaction capability, or definition of extended content description.

Support for unique presentation of the media objects and the associated content description, controllable by user interaction and description parameters.

The broadcast metadata to be put into MPEG-7 streams must be flexible; broadcasters must be able to "spice" the content based on available resources. As little information as possible should be mandatory, so that broadcasters can start the service without a huge investment. Upwards compatibility with existing standards like DVB-SI, ATSC PSIP, or metadata definitions originating from the EBU/SMPTE task forces also falls under this aspect.

Capability to set up content- or service-related links between different broadcast media objects (not necessarily transmitted simultaneously).

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for descriptors and description schemes that allow definition of broadcast media objects and events as listed above in their temporal (absolute and relative time bases, duration of validity) and physical (channel, stream) context

Support for streaming of MPEG-7 data

Support for interactivity, including identification of the related event during a callback procedure

Application relevant work and references

[to come]

5.3 Intelligent multimedia presentation

Application Description

Given the vast and increasing amount of information available, people are seeking new ways of automating and streamlining the presentation of that data. That may be accomplished by a system that combines knowledge about the context, user, application, and design principles with knowledge about the information to be displayed. Through clever application of that knowledge, one has an intelligent multimedia presentation system.

Application requirements

The ability to provide contextual and domain knowledge, and

The ability to represent events and the temporal relationships between events.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

There must exist a mechanism by which contextual information may be encoded.

There must exist a mechanism by which temporal relationships may be represented.
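One established vocabulary for such temporal relationships is Allen's interval algebra, used below purely as an illustration (MPEG-7 is not bound to it). The Python sketch classifies the relation between two event intervals:

    # Allen's thirteen interval relations between two events (illustrative).
    def allen_relation(a, b):
        """a, b: (start, end) pairs with start < end."""
        (a1, a2), (b1, b2) = a, b
        if a2 < b1:  return "before"
        if b2 < a1:  return "after"
        if a2 == b1: return "meets"
        if b2 == a1: return "met-by"
        if a1 == b1 and a2 == b2: return "equals"
        if a1 == b1: return "starts" if a2 < b2 else "started-by"
        if a2 == b2: return "finishes" if a1 > b1 else "finished-by"
        if b1 < a1 and a2 < b2: return "during"
        if a1 < b1 and b2 < a2: return "contains"
        return "overlaps" if a1 < b1 else "overlapped-by"

    # A narration spanning seconds 0-10 contains the event it describes:
    print(allen_relation((0, 10), (2, 6)))  # "contains"
    print(allen_relation((0, 5), (5, 9)))   # "meets"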

Application relevant work and references

André, E., & Rist, T. (1995). Generating Coherent Presentations Employing Textual and Visual Material. Artificial Intelligence Review, Special Volume on the Integration of Natural Language and Vision Processing, 9(2 - 3), 147 - 165.


Bordegoni, M., et al, “A Standard Reference Model for intelligent Multimedia Presentation Systems,” April 1997, pre-print. <http://www.dfki.uni-sb.de/~rist/csi97/csi97.html>

Davenport, G., & Murtaugh, M. (1995). ConText: Towards the Evolving Documentary. In ACM Multimedia 95 - Electronic Proceedings. San Francisco, California: November 5-9, 1995. <http://ic.www.media.edu/icPeople/murtaugh/acm-context/acm-context.html>

Feiner, S. K., & McKeown, K. R. (1991). Automating the Generation of Coordinated Multimedia Explanations. IEEE Computer, 24(10), 33 - 41.

Maybury, M. T. (ed.) (1993) Intelligent Multimedia Interfaces. AAAI Press/ MIT Press, Cambridge, MA.

Maybury, Mark T. (ed) (1997) Intelligent multimedia information retrieval. Menlo Park, Calif.

See also Annex A.

5.4 Personalizable Browsing, Filtering and Search for Consumers

Application Description

Imagine coming home from work late Friday evening, happy that the week is over. You want to catch up with the world and then watch ABC’s 20/20 show later that night. It is now 9 PM, and 20/20 will start in an hour, at 10 PM. You are interested in the sports events of the week and all the news about the local elections. Your smart appliance has been selecting and recording TV broadcast programs according to your general preferences all week long. It has a large amount of content, definitely much more than the hour you have. You start interacting with the appliance. You first identify yourself to the appliance by speaking your name. The appliance uses voice verification to recognize you and then invokes your user profile (which is indeed different from your spouse’s and children’s) to customize your viewing experience and anticipate your needs.

You speak to your appliance: “Show me the sports programs you have recorded.” At that moment you are interacting with the browser of the system. On your large LCD display, you see the following listed: Basketball and Soccer. Apparently, your favorite football team’s game was not broadcast that week. You are interested in the basketball games and express your desire to the appliance. On the screen there appears a Title Frame and Title Text for each of the games recorded. The Title Frame captures an important moment of the game. The Title Text is a very brief summary, containing the score, similar to those that can be found, for example, on the CNN-SI web site. You are interested in the Chicago Bulls game; you would like to spend no more than 15 minutes on sports, so you choose the 5-minute highlights option. You could also have chosen to watch only the clips that contain “slam dunks.” The system has also captured information from the web about this particular game. Your browser supports tools to enhance the audiovisual presentation with other types of material, such as textual information. The five minutes are now up. You have seen the most worthwhile moments of the game. You are impressed, and you decide to archive the game. The system will record the entire program on long-term storage media. Meanwhile, you are ready to watch the news about the local elections.

It is now 9:50 PM, and you are done with the news. In fact, you have chosen to delete all the recorded news items after seeing them. You remember to do one last thing before 10 PM that night. The next day, you want to watch the digital camcorder tape you received from your brother that day, containing footage of his new baby girl and his vacation in Spain last summer. You do not want to watch the whole 2-hour tape, but you are anxious to see what the baby looks like, as well as the new aquarium they built in Barcelona (which was not there when you visited Spain 10 years ago). You plan to take a quick look at a visual summary of the tape, browse, and perhaps watch a few segments for a couple of minutes. Your camcorder is already connected to your smart appliance. You put the tape in and start browsing its contents using your smart appliance. You quickly find out what the baby looks like by locating and playing back the right segments. Thanks to a standard (MPEG-7) means of describing audiovisual information, your browser has no difficulty browsing the contents of your brother’s tape. It is now 10:10 PM; it seems that you are 10 minutes late for 20/20. Fortunately, your appliance has been recording 20/20 since 10 PM, since it knows that you watch or record it every week. You start watching the recorded program as the recording proceeds.

Application Requirements

There should be support for descriptors and description schemes which can express the interaction between a user and multimedia material, such as the user's preferences and other user-specific information.
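A rough Python sketch of the kind of user-side description implied by this requirement follows: explicit preferences plus a usage history that updates them semi-automatically. The structure and field names are assumptions made for illustration.

    # Illustrative user preference and usage-history structures.
    from dataclasses import dataclass, field

    @dataclass
    class UserPreferences:
        user_id: str
        preferred_genres: dict = field(default_factory=dict)  # genre -> weight

    @dataclass
    class UsageHistory:
        viewed: list = field(default_factory=list)  # (programme_id, seconds_watched)

        def reinforce(self, prefs, genre, amount=0.25):
            # Semi-automatic update: watching a genre nudges its weight up.
            prefs.preferred_genres[genre] = (
                prefs.preferred_genres.get(genre, 0.0) + amount)

    prefs = UserPreferences("viewer-1", {"sport": 0.5, "news": 0.9})
    UsageHistory().reinforce(prefs, "sport")
    print(prefs.preferred_genres)  # sport weight nudged up to 0.75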

MPEG-7 Specific Requirements

There are no additional requirements specific to MPEG-7 at this time.

Application relevant work and references

P. van Beek, R. Qian, I. Sezan, “MPEG-7 Requirements for Description of Users,” ISO IEC/JTC1/SC29/WG11 Doc. M4601, MPEG Seoul.

5.5 Information access facilities for people with special needs

Application description

In our increasingly information-dependent society, we have to facilitate access to information for every individual user. However, some people face serious problems in accessing information, not because they lack the economic or technical basis but rather because they suffer from one or several disabilities, e.g. visual, auditory, motor, or cognitive disabilities. Providing active information representations might help to overcome these problems. The key issue is to allow multi-modal communication to present information optimised for the abilities of individual users.

Thus, it is important to develop technical aids to facilitate communication for people with special needs, for example a search agent that does not exclude images as an information resource for the blind, but rather makes the MPEG-7 meta-data available.


Aided by that meta-data, sonification (auditory display) or haptic display is made possible. Similarity of meta-data helps to provide a set of information in different modalities, in case the particular information is not accessible to the user.

Such applications promote full participation in society by removing the communication and information-access barriers that restrict interactions between people with and without disabilities, and they will lead to improved global commerce opportunities.

Application requirements

[no new ones apparent]

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support description schemes that contain descriptors of visual, audio, and/or other features.

Application relevant work and references

The Yuri Rubinsky Insight Foundation: http://www.yuri.org/webable/library.html#guidlinesandstandards

The Centre of Cognitive Science: http://www.cogsci.ed.ac.uk/

The Human Communication Research Centre: http://www.hcrc.ed.ac.uk/

6. Specialised Professional and Control Applications

The following potential MPEG-7 applications do not limit themselves to traditional, media-oriented multimedia content, but are functional within the meta-content representation to be developed under MPEG-7. They reach into such diverse, but data-intensive, domains as medicine and remote sensing. Such applications can only serve to increase the usefulness and reach of this proposed international standard.

6.1 Teleshopping

Application description

More and more merchandising is being conducted through catalogue sales. Such catalogues are rarely effective if they are restricted to text. The customer who browses such a catalogue is more likely to retain visual memories than text memories, and the catalogue is frequently designed to cultivate those memories. However, given the sheer size of many of these catalogues, and the fact that most people only have a vague idea of what they want ("I'll know it when I see it"), they will only be effective if it is possible to find items by successively refining and/or redirecting the search. Typically, the customer will spot something that is almost right, but not quite. He or she will then want to fine-tune the search process by interacting with the system, e.g. "I'm looking for brown shoes, a bit like those over there, but with a slightly higher heel," or "I'm looking for curtains with that sort of pattern, but in a more vivid colour."

Catalogues of items for which stock maintenance is difficult or expensive but for which the search process is essentially visual (e.g. garden design, architecture, interior decorating, oriental carpets) are especially aided by this application. For such items, detailed digital image-databases could be supported and updated centrally and accessed from distributed selling points.
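The "almost right, but not quite" interaction above amounts to relevance feedback on a feature vector: the selected item's features become the new query, with the named attribute nudged. The Python sketch below uses invented features; a real system would normalise feature scales before comparing.

    # Illustrative query refinement: "like shoe-A, but a higher heel".
    FEATURES = ("heel_height_cm", "brightness")

    catalogue = {
        "shoe-A": (2.0, 0.6),  # low heel, light brown
        "shoe-B": (5.0, 0.6),  # higher heel, same colour
        "shoe-C": (5.5, 0.2),  # high heel, dark
    }

    def refine(query, attribute, delta):
        i = FEATURES.index(attribute)
        return tuple(v + delta if j == i else v for j, v in enumerate(query))

    def nearest(query):
        return min(catalogue,
                   key=lambda n: sum((x - y) ** 2
                                     for x, y in zip(catalogue[n], query)))

    q = refine(catalogue["shoe-A"], "heel_height_cm", +3.0)
    print(nearest(q))  # shoe-B: similar colour, higher heel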

Application requirements

Support for interactive queries with few predicted constraints,

Support for precise, product-oriented similarity queries, and

MPEG-7 should operate “as fast as possible,” allowing efficient interactive response times.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for visual based descriptors.

Application relevant work and references

André, E., J. Mueller, & T. Rist (1997) “Adding Animated Presentation Agents to the Interface,” To appear in the proceedings of IJCAI 97 - Workshop on Animated Interface Agents: Making them intelligent, Nagoya, Japan.

Chavez, A., & P. Maes (1996) “Kasbah: An Agent Marketplace for Buying and Selling Goods,” In proceedings of the 1st International Conference on the Practical Application of Intelligent Agents and Multi-Agent Technology, London, UK.

Rossetto, L., & O. Morton (1997) “Push!” Wired, no. 3.03 UK, March 97, pp. 69-81.

At the moment there are very few catalogues that provide content-based image retrieval for teleshopping, and the ones that do understandably focus on applications where images of merchandise are relatively homogeneous and standardised, such as wallpaper, flooring, tiles, etc. A searchable online database of full-colour flooring samples including carpet and vinyl products can be found at <http://www.floorspecs.com>. They promise a colour match system in the near future.

6.2 Bio-medical applications

Application description

Medicine is an area in which visual recognition is often a significant technique for diagnosis. The medical literature abounds with atlases, volumes of photographs that depict normal and pathological conditions in different parts of the body, viewed at different scales. An effective diagnosis may often require the ability to recall that a given condition resembles an image in one of the atlases. The amount of material catalogued in such atlases is already large and continues to grow. Furthermore, it is often very difficult to index using textual descriptions only. Therefore, there is a growing demand for search-engines that can respond to image-driven queries. This will allow physicians to access image-based information in a way that is similar to the current keyword-based search-engines such as MEDLINE. (For example, in order to make a differential diagnosis, a radiologist might want to compare the medical records and case-histories of all patients in a medical database for which radiographs showed similar lesions or pathologies.) Furthermore, as 3D-imaging techniques keep gaining importance, such image-driven queries will have to be able to handle both 2- and 3-dimensional data. Cross-modal search will apply when one includes associated clinical auditory descriptions, such as associating a cough with a chest x-ray in order to aid diagnosis.

Biochemical interactions crucially depend on the three-dimensional structure of the participating molecules (e.g. the shape-complementarity between signal-molecules and cell-receptors, or the key-in-lock concepts for immunological recognition). Thanks to the sustained effort of a large number of biomedical laboratories, the list of molecules for which the chemical composition and spatial structure are documented is growing continuously. Given the fact that it is still extremely difficult to predict the structure of a biomolecule on the basis of its primary structure (i.e. the string of constituent atoms), searching these databases will only be helpful if it can be done on the basis of shape. It is not difficult to make the leap from the ability to search on 3-D models, as proposed by MPEG-7, to searching such molecular databases by shape. Such applications would be extremely helpful in drug-design, as one could e.g. search for biomolecules with shapes similar to a candidate-drug to get an idea of possible side effects.
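As an illustration of shape-based search over such data, the Python sketch below computes one simple rotation- and translation-invariant 3-D signature: a histogram of distances between randomly sampled point pairs (a "shape distribution"). This particular signature is chosen only for the example; the point clouds are toy stand-ins for real structural data.

    # Illustrative 3-D shape signature: pairwise-distance distribution.
    import math, random

    def shape_distribution(points, n_pairs=2000, n_bins=16, rng=random.Random(0)):
        dists = [math.dist(*rng.sample(points, 2)) for _ in range(n_pairs)]
        top = max(dists)
        hist = [0] * n_bins
        for d in dists:
            hist[min(int(n_bins * d / top), n_bins - 1)] += 1
        return [h / n_pairs for h in hist]   # normalised histogram

    def histogram_distance(h1, h2):
        return sum(abs(a - b) for a, b in zip(h1, h2))

    def sphere(n, rng):  # points on a unit sphere via normalised Gaussians
        pts = []
        for _ in range(n):
            v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
            m = math.sqrt(sum(c * c for c in v))
            pts.append(tuple(c / m for c in v))
        return pts

    rng = random.Random(1)
    a = shape_distribution(sphere(300, rng))
    b = shape_distribution(sphere(300, rng))
    rod = shape_distribution([(0.0, 0.0, 0.02 * i) for i in range(300)])
    # Two samplings of the same shape agree far better than sphere vs. rod:
    print(histogram_distance(a, b), histogram_distance(a, rod))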

Application requirements

The ability to link libraries of correlated and relevant information (e.g. images, patient history, clinical findings, medication regime),

The ability to perform on-line annotations and mark regions-of-interest with any shape or form of connectivity (from small and compact to large and diffuse),

The ability to search images that contain similar regions-of-interest, ignoring the visual characteristics of the rest of the image, and

The ability to handle n-dimensional data (and their time-evolution).

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for n-dimensional data descriptors.

Support for descriptors containing temporal information.

There must exist a mechanism by which a specific segmentation or hierarchical level may be chosen, to the exclusion of all other segments.

Application relevant work and references

One development that might be relevant is the Picture Archiving and Communication System (PACS), which allows physicians to annotate images with text, references and questions. This greatly facilitates interaction and consultation with colleagues and specialists.

Furthermore, there is the STARE project (STructured Analysis of the REtina) at the University of California, San Diego, an information system for the storage and content-based retrieval of ocular fundus images. <http://oni.ucsd.edu/stare>

6.3 Universal Access

Application Description

New classes of pervasive computing devices such as personal digital assistants (PDAs), hand-held computers (HHCs), smart phones, automotive computing devices, and wearable computers allow users more ubiquitous access to information than ever. As users begin to rely more heavily on pervasive computing devices, there is a growing need for applications that bring multimedia information to these devices. The basic idea of Universal Multimedia Access is to enable client devices with limited communication, processing, storage and display capabilities to access rich multimedia content.

Figure 1: Adaptation of multimedia content to pervasive computing devices.

Recently, several solutions have focused on adapting the multimedia content to the client devices, as illustrated in Figure 1. Universal Multimedia Access can be provided in two basic ways. The first is by storing, managing, selecting, and delivering different versions of the media objects (images, video, audio, graphics and text) that comprise the multimedia presentations. The second is by manipulating the media objects on-the-fly, for example by using methods for text-to-speech translation, image and video transcoding, media conversion, and summarization. This allows multimedia content delivery to adapt to the wide diversity of client device capabilities in communication, processing, storage, and display.
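The first approach amounts to matching stored media-version descriptions against a description of the client's capabilities. A minimal Python sketch follows; the field names and the selection rule are hypothetical illustrations of the idea (MPEG-7 provides the descriptive data for such a step, not this API):

```python
from dataclasses import dataclass

@dataclass
class MediaVersion:
    uri: str
    width: int          # pixels
    height: int
    bitrate_kbps: int   # minimum streaming bandwidth

@dataclass
class ClientDevice:
    max_width: int
    max_height: int
    bandwidth_kbps: int

def select_version(versions, device):
    """Pick the richest stored version the client can actually consume.

    Versions that exceed the screen size or the available bandwidth are
    filtered out; among the rest, the highest-bitrate one is preferred.
    Returns None if on-the-fly transcoding would be needed instead.
    """
    feasible = [v for v in versions
                if v.width <= device.max_width
                and v.height <= device.max_height
                and v.bitrate_kbps <= device.bandwidth_kbps]
    return max(feasible, key=lambda v: v.bitrate_kbps, default=None)

versions = [
    MediaVersion("clip_cif.mp4", 352, 288, 384),
    MediaVersion("clip_qcif.mp4", 176, 144, 64),
]
pda = ClientDevice(max_width=240, max_height=160, bandwidth_kbps=128)
print(select_version(versions, pda))  # -> the QCIF version fits this PDA
```

When no stored version is feasible, the second approach (on-the-fly transcoding or media conversion) takes over, guided by the descriptions listed in the requirements below.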


Application Requirements

In order for the application to work, there must exist mechanisms for the following:

Description of the resource requirements of multimedia material, such as data size, required screen size, and minimum streaming bandwidth.

Management and selection of different versions of multimedia material to adapt to client device capabilities, network infrastructure, and user preferences.

Description of the capabilities of client devices, such as display size, screen color depth, audio and video playback capability, storage space, processing power, network access bandwidth, and so forth.

Methods for manipulating, transcoding, and summarizing multimedia material.

Maintenance of the synchronization and layout of transcoded multimedia material.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

There should be some mechanism for describing media object resource requirements.

There should be some mechanism for describing the synchronization needs of multimedia presentations.

There should be some mechanism for transcoding hints that allow content authors to guide the processes of manipulating and adapting multimedia material.

There must be mechanisms that allow the direct manipulation of multimedia material.

Application relevant work and references

T. Ebrahimi and C. Christopoulos, “Can MPEG-7 be used beyond database applications?”, ISO/IEC JTC1/SC29/WG11 MPEG98/MP3861, October 1998, Atlantic City, USA

Charilaos Christopoulos, Touradj Ebrahimi, V.V. Vinod, John R. Smith, Rakesh Mohan, and Chung-Sheng Li, "MPEG-7 application: Universal Access Through Content Repurposing and Media Conversion", ISO/IEC JTC1/SC29/WG11 MPEG99/MP4433, March 1999, Seoul, Korea.

L. Bergman et al, “Integrated DS for Multimedia Content”, ISO/IEC JTC1/SC29/WG11 MPEG99/MP473, February 1999, Lancaster, UK

J.R. Smith, C.-S. Li, “P473: InfoPyramid Description Scheme for Multimedia Content”, ISO/IEC JTC1/SC29/WG11 MPEG99/MP473 (slides), February 1999, Lancaster, UK.


6.4 Remote Sensing Applications

Application description

In remote sensing applications, the scale of satellite image databases (several million images acquired according to various modalities: panchromatic, multispectral, hyperspectral, hexagonal sampling, etc.), the diversity of potential users (scientists, military, geologists, etc.), and improvements in telecommunication techniques make it necessary to define a highly efficient description standard. Until now, information search in image libraries has been based on textual information such as scene name and geographic, spectral, and temporal information, and information exchange has been achieved by means of tapes and photographs.

A challenging aspect is to provide the capability of exploiting such complex databases from on-line systems supporting the following functionalities:

textual query,

image query based on either the whole or part of a reference image (one or several spectral bands),

content-based retrieval,

browsing, and

confidentiality and data protection.

MPEG-7 should be an appropriate framework for solving such requests.
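As an illustration of an image query restricted to selected spectral bands, the following Python sketch ranks archived scenes by descriptor similarity over the requested bands only. The band names and the per-band histogram descriptors are assumptions made for the example, not part of any standard:

```python
# Hypothetical archive: scene id -> per-band descriptor (e.g. a tiny histogram).
ARCHIVE = {
    "scene-A": {"red": [0.2, 0.8], "nir": [0.6, 0.4], "swir": [0.5, 0.5]},
    "scene-B": {"red": [0.7, 0.3], "nir": [0.1, 0.9], "swir": [0.5, 0.5]},
}

def band_distance(h1, h2):
    """L1 distance between two per-band descriptors."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def query_by_bands(query, bands, archive):
    """Rank scenes by similarity over the selected spectral bands only."""
    scores = {
        sid: sum(band_distance(query[b], desc[b]) for b in bands)
        for sid, desc in archive.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1])

# Reference sub-image described in the red and near-infrared bands only.
ref = {"red": [0.25, 0.75], "nir": [0.55, 0.45]}
print(query_by_bands(ref, ["red", "nir"], ARCHIVE))  # scene-A ranks first
```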

Application requirements

Support for various description schemes,

Support for different data (e.g. multispectral, hyperspectral, and SAR) associated with various sensors (ground resolution, wavelength),

The ability to include multiple descriptions of the same documents,

The ability to link correlated information, such as similar regions of interest, and

The ability to take into account time evolution for 2D and 3D data.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for multiple descriptors and description schemes for the same data.

Support for descriptors for unique data types.

Support for descriptors embodying temporal change.

There must be a mechanism by which descriptors and description schemes within a description may be linked to other D's and DS's within the same description.

Application relevant work and references

[to come]


Semi-automated multimedia editing

Application Description

Given sufficient information about its contents, what could a multimedia object do? With sufficient information about its own structure, combined with methods for manipulating that structure, a 'smart' multimedia clip could begin to edit itself in a manner appropriate to its neighbouring multimedia. For example, a piece of music and a video clip from different sources could be combined such that the music stretches and contracts to synchronise with specific 'hit' points in the video, thus creating an appropriate and customised soundtrack.

This could be a new paradigm for multimedia, adding a ‘method’ layer on top of MPEG-7’s ‘representation’ layer. By making multimedia ‘aware,’ to an extent, one opens access to beginning users and increases productivity for experts. Such hidden intelligence on the part of the data itself shifts multimedia editing from direct manipulation to loose management of data.

Semi-automated multimedia editing is a broad category of applications. It can facilitate video editing for home users as well as experts in studios, with varying amounts of guidance or assistance throughout the process. In its simplest version, assisted editing can consist of an MPEG-7-enabled browser for the selection of video shots, using a suitable shot description language. In an intermediate version, assisted editing can include planning, i.e. proposing shot selections and edit points that satisfy a scenario expressed in a sequence description language.
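The music-synchronisation example above can be reduced to computing a piecewise time-stretch factor between successive hit points, assuming both the musical accents and the video 'hit' points are available as described time instants. A toy Python sketch under that assumption:

```python
def stretch_segments(music_accents, video_hits):
    """Compute per-segment stretch factors that map musical accents onto
    video 'hit' points. Both lists are times in seconds, equal in length,
    and start at 0."""
    factors = []
    for i in range(1, len(music_accents)):
        m = music_accents[i] - music_accents[i - 1]   # music segment length
        v = video_hits[i] - video_hits[i - 1]         # target segment length
        factors.append(v / m)                         # >1 stretch, <1 compress
    return factors

accents = [0.0, 2.0, 4.0, 6.0]   # beats in the source music
hits = [0.0, 1.8, 4.5, 6.2]      # memorable events in the video
print(stretch_segments(accents, hits))  # -> [0.9, 1.35, 0.85]
```

A real editing tool would additionally constrain the factors to musically acceptable ranges, but the descriptor-level information it consumes is exactly the kind of time-point data MPEG-7 descriptions can carry.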

Application requirements

Pointers as 'handles' that refer to the data directly, to allow manipulation of the multimedia.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Ability to link descriptors to the data that they describe.

Application relevant work and references

Bloch, G.R. (1986) Elements d'une Machine de Montage Pour l'Audio-Visuel. Ph.D. Thesis, Ecole Nationale Superieure des Telecommunications.

Nack, F. (1996) AUTEUR: The Application of Video Semantics and Theme Representation for Automated Video Editing. Ph.D. Thesis, Lancaster University.

Parkes, A.P. (1989) An Artificial Intelligence Approach to the Conceptual Description of Videodisc Images. Ph.D. Thesis, Lancaster University.

Sack, W., & Davis, M. (1994). IDIC: Assembling Video Sequences from Story Plans and Content Annotations. IEEE International Conference on Multimedia Computing Systems, Boston, MA: May 14 - 19, 1994.


Sack, W., & Don, A. (1993) Splicer: An Intelligent Video Editor (Unpublished Working Paper, MIT).

6.5 Educational applications

Application description

The challenge of using multimedia in educational software is to make as much use of the intrinsic information as possible to support different pedagogical approaches, such as summarisation, question answering, or the detection of and reaction to misunderstanding or non-understanding.

By providing direct access to short video sequences within a large database, MPEG-7 can promote the use of audio, video and film archive material in higher education in many areas:

History: Radio, television and film provide detailed accounts of many contemporary events, useful for classroom presentations, provided that a sufficiently precise (MPEG-7) description can be queried based on dates, places, personalities, etc. (see also 4.5, Historical Speech Database)

Performing arts (music, theatre): Fine-grained, standardised descriptions can be used to bring a selection of relevant documents into the classroom for special classes, using on-line video archives as opposed to costly local tape libraries. For instance, several productions of a theatrical scene, or musical work, can thus be consulted for comparison and illustration. Because classic and contemporary theatre are widely available in translation, this application can target worldwide audiences.

Film Music: A tool can be developed for improving the knowledge and skills of users in the domain of film theory/practice and film music (music for film genres). Depending on the user's background, the system should provide enough material not only to improve the user's ability to understand the complexity of each single medium, but also to handle the complex relationships between the two media, film and music. To achieve this, the system should offer an environment in which the student can perform guided/supported experiments, e.g. on editing film, mixing sound, or combining both, which requires that the system can analyse and criticise the results achieved by the user.

Thus, this system must be able to automatically generate film/sound sequences and their synchronisation based on stereotypical music/film patterns for film genres, and perhaps also on ways to creatively break the established generating rules.

Application requirements

Linking mechanisms to synchronise between MPEG-7 descriptors and other sources of information (e.g. HTML, SGML, World Wide Web services, etc.),

Mechanisms for allowing specialised vocabularies,

The ability to allow real-time operation in conjunction with a database.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:


Support for interoperation with description schemes.

Support for descriptors and description schemes that allow specialised languages or vocabularies.

There must exist a mechanism by which descriptions may link to external information.

Application relevant work and references

Margaret Boden (1991) The Creative Mind: Myths and Mechanisms, Basic Books, New York

Schank, R. C. (1994). Active Learning through Multimedia. IEEE MultiMedia, 1(1), 69 - 78.

Sharp, D., Kinzer, C., Risko, V. & the Cognition and Technology Group at Vanderbilt University. (1994). The Young Children's Video Project: Video and software tools for accelerating literacy in at-risk children. Paper presented at the National Reading Conference, San Diego, CA http://www.edc.org/FSC/NCIP/ASL_VidSoft.html

Tagg, Philip (1980, ed.). Film Music, Mood Music and Popular Music Research. Interviews, Conversations, Entretiens. SPGUMD 8002.

Tagg, Philip (1987). Musicology and the Semiotics of Popular Music. Semiotica, 66-1/3: 279-298. (This and other texts accessible on-line via http://www.liv.ac.uk/ipm/tagg/taggwbtx.htm)

See also the reference section of Semi-automated multimedia editing (above) and Annex A.

6.6 Surveillance applications

Application description

There are a number of surveillance applications in which a camera monitors sensitive areas and the system must trigger an action if some event occurs. The system may build its database from no information or limited information, and accumulate a video database and meta-data as time elapses. Meta-content extraction (at an "encoder" site) and meta-data exploitation (at a "decoder" site) should exploit the same database.

As time elapses and the database grows sufficiently large, the system, at both sites, should be able to support operations on the database such as:

Search the audio/video database for a specific event (synthetic or current data); an event here is a sequence of audio/video data.

Find similar events in the past.

Make decisions on the current data in relation to the accumulated database and/or to a priori known data.


A related application is in security and forensics, in the matching of faces or fingerprints.
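At its core, "find similar events in the past" is a nearest-neighbour search over descriptors of accumulated audio/video segments. The following Python sketch assumes a fixed-length event descriptor and a simple distance threshold; both are illustrative choices, not part of the standard:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two event descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class EventDatabase:
    """Accumulates event descriptors as time elapses (see text)."""

    def __init__(self):
        self.events = []  # (timestamp, descriptor) pairs

    def add(self, timestamp, descriptor):
        self.events.append((timestamp, descriptor))

    def similar_events(self, query, threshold):
        """Return past events whose descriptor is close to the query."""
        return [(t, d) for t, d in self.events
                if euclidean(query, d) <= threshold]

db = EventDatabase()
db.add("2001-01-10T03:12", [0.9, 0.1, 0.0])   # e.g. motion burst at a gate
db.add("2001-01-11T14:02", [0.1, 0.8, 0.1])
print(db.similar_events([0.85, 0.15, 0.0], threshold=0.2))  # first event only
```

The same database is consulted at both the "encoder" and "decoder" sites, in line with the shared-database assumption above.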

Application requirements

Real-time operation in conjunction with a database, and

Domain-specific features.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for descriptors and description schemes that allow specialised languages or vocabularies.

Support for descriptors for unique data types.

Application relevant work and references

Courtney, J. D. (1997). Automatic Video Indexing by Object Motion Analysis. Pattern Recognition, vol. 30, no. 4, 607-626.

6.7 Visually-based control

Application description

In the field of control, there have been several developments in the area of visually based control. Instead of using text-based approaches to control programming, images, visual objects, and image sequences are used to specify the control behaviour and are an integral part of the control loop (e.g. visual servoing).

One aspect of describing control information between (video) objects is that the objects are not necessarily associated via spatio-temporal relationships. Accumulation of the control and video information allows visually based functions such as redo, undo, search-by-task, or object relationship changes [6].
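Since the associations here are control relationships rather than spatio-temporal ones, one natural illustration is a labelled graph over object nodes that can be queried by task. The data structure below is a Python sketch assumed for illustration; it is not drawn from reference [6]:

```python
from collections import defaultdict

class ControlGraph:
    """Arbitrary (non spatio-temporal) associations between object nodes."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relation, node), ...]

    def relate(self, src, relation, dst):
        """Record a directed control association between two objects."""
        self.edges[src].append((relation, dst))

    def search_by_task(self, relation):
        """Find all object pairs linked by a given control relation."""
        return [(src, dst) for src, links in self.edges.items()
                for rel, dst in links if rel == relation]

g = ControlGraph()
g.relate("gripper", "grasps", "bolt-3")
g.relate("camera-1", "tracks", "gripper")
print(g.search_by_task("grasps"))  # -> [('gripper', 'bolt-3')]
```

Undo/redo then amounts to replaying or reversing the accumulated association history.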

Application requirements

Description of relationships between arbitrary object nodes, in addition to spatio-temporal relationships (as in e.g. BIFS),

Allow searches based on the arbitrary (control) associations,

MPEG-7 should operate "as fast as possible," allowing efficient interactive response times.

MPEG-7 Specific Requirements

Requirements specific to MPEG-7 are:

Support for descriptors and description schemes containing spatio-temporal relationships.

Support for descriptors containing relationships between arbitrary objects.


Application relevant work and references

Palm, S.R., Mori, T., Sato, T. (1998) Bilateral Behavior Media: Visually Based Teleoperation Control with Accumulation and Support, Submitted to Robotics & Automation Magazine special issue on Visual Servoing.

7. Applications Making Use of Specific Descriptors & Description Schemes

The applications described in this section were added after the Committee Draft of the MPEG-7 standard was issued. They represent target applications based on some of the specific descriptors and description schemes that have been adopted by the standard.

7.1 Transcoding Applications

Introduction

The amount of audiovisual (AV) content transmitted over optical, wireless and wired networks is growing. Furthermore, within each of these networks there is a variety of AV terminals that support different formats. This scenario is often referred to as Universal Multimedia Access (UMA), which has been described in section 6.3. The networks involved are often characterized by different bandwidth constraints, and the terminals themselves vary in display capabilities, processing power and memory capacity. Therefore, the content must be represented and delivered according to the current network and terminal characteristics. For example, audiovisual content stored in a compressed format such as MPEG-1, 2 or 4 has to be converted between different bit-rates and frame-rates, but must also account for the different screen sizes, decoding complexity and memory constraints of the various AV terminals.

To avoid storing the content in different compressed representations for different network bandwidths and different AV terminals, low-delay and real-time transcoding from one format to another is required. Additionally, it is critical that subjective quality be preserved. To better meet these requirements, Transcoding Hints have been defined by the MPEG-7 standard. The remainder of this section describes some specific application scenarios that could benefit from MPEG-7 transcoding hints, and provides some input regarding device capabilities.

In the following, several distinct application scenarios that could benefit from MPEG-7 transcoding hints are discussed. The transcoding hints referred to can be found in section 8 of ISO/IEC CD 15938-5 (Multimedia Description Schemes). Each of these applications differs in the way the transcoding hints may be extracted and used; however, all follow the general framework illustrated in Fig. 2. For each of the applications considered, we highlight differences in the flow of information, including the extraction and use of the transcoding hints, as well as the corresponding devices that generate and consume the description.


[Figure: an encoder (video to bits) produces content plus transcoding hints; transcoding hint extraction yields meta-data that, delivered from a database, network or local device, steers a transcoder (bits to bits) according to network/terminal characteristics, before output to the MPEG-7 consuming device and decoder (bits to video).]

Figure 2: Generic Illustration of Transcoding Hints Application Scenario.

Server/Gateway Application

This is probably the most intuitive application, and it is also extensible to network nodes such as a base station or gateway. In the case of content being stored on a server, the extraction is performed at the server, where real-time constraints may be relaxed. In the case of content being transmitted through the network, extraction has already been performed and the hints are encapsulated with the content. In both cases, transcoding should be performed in real time. No complexity is added in the consuming device (e.g., a mobile phone). In addition, since the transcoding hints have been generated offline, the server/gateway can implement the transcoding quickly and efficiently, thereby minimizing delay.

As a variation of this application scenario, Fig. 3 illustrates a situation with two servers, a Broadcast/Internet server and a Home server. The major differences from the above are that (i) there are two servers and (ii) the real-time transcoding constraints may be relaxed. In the example shown, the Broadcast/Internet server distributes content, such as MPEG-2 encoded video, with associated transcoding hints to a local home server. The home server enables a wide range of interconnections and may be connected to the client through a wireless link (e.g. Bluetooth), a cable link (e.g. IEEE 1394, USB), or portable media such as flash memory or several kinds of magneto-optical disks.

Figure 3: Two-Server Application Scenario


Client Upload/Content Sharing Application

This application assumes that image/video content is captured locally (e.g., on a mobile device or remote broadcast camera) in one format and the user would like to upload the content (e.g., to a home/central server) in another format. This application may tolerate delay, but should nevertheless be carried out efficiently and preserve quality. The media conversion may be between compression formats, such as DV to MPEG-1, or within the same compression format but at a lower frame-rate, bit-rate or picture resolution. The major objective is content sharing, either for live viewing or for interoperability with other devices. For this application, the extraction of transcoding hints and the actual transcoding are performed at the client. In another scenario, the client might have to send this content to other clients (in a Multimedia Messaging scenario, for example). Since the terminal capabilities are different, the transcoding hints will be useful for speeding up the transcoding in the gateway.

Mobile Video Camera

The illustration of Fig. 4 considers the case of a mobile video camera that is capable of extracting the transcoding hints. As many video cameras have the capability to save the video sequence (or parts of it) in different representations, the transcoding hints could be very useful. Many video cameras can also transmit the captured or stored video by cable (IEEE 1394, USB, etc.) or remote link (Bluetooth, etc.) to a home video server or other devices, which perform the transcoding themselves employing the generated transcoding hints.

A related application is a mobile live video camera with built-in transcoding hints extraction, providing video streaming over IP-based networks. The content-associated transcoding hints can help the transcoder in a gateway/network server to adapt to different target bit-rates.

Figure 4: Mobile Video Camera Scenario


Mobile Video Phone

In this application scenario (Fig. 5), video with associated transcoding hints is passed through or transcoded by video-enabled mobile phones. For example, a user downloads a short video clip of a baseball highlight scene with associated transcoding hints from the Internet using a video-enabled mobile phone. Later, the user wants to share the baseball video clip with his/her friends at the same bit-rate or at reduced bit-rate and quality.

In the first case, the videophone only has to pass the video clip and the transcoding hints to the receiver. The receiving user may still need the transcoding hints for redistribution or archiving of the video clip.

In the latter case, the mobile videophone has to convert the video clip between the different formats of the sender and receiver using the transcoding hints. Some receivers may only be interested in still images of the baseball game, which may be generated using importance and/or mosaic metadata. Finally, some receivers of the baseball video clip (or baseball still images) may want to store these on their home server in the same format as all the other baseball images or video content already there. For this purpose, the user again needs the transcoding hints.

Figure 5: Mobile Video Phone Scenario

Scalable Content Playback Application

In this application, content is being received at the client, but the terminal does not have the resources to decode and/or display the full bitstream in terms of memory, screen resolution or processing power. Therefore, the transcoding hints may be used to adaptively decode the bitstream, e.g., by dropping frames/objects or decoding only the important parts of a scene, as specified by the received transcoding hints.

Specific Usage of Transcoding Hints

The Transcoding Hints can be used for complexity reduction as well as for quality improvement in the transcoding process. The transcoding hints described here are the Difficulty Hint, the Shape Hint and the Motion Hints. The Difficulty Hint describes the bit-rate coding complexity. This hint can be used for improved bit-rate control and for bit-rate conversion from constant bit-rate (CBR) to variable bit-rate (VBR). The Shape Hint specifies the amount of change in a shape boundary over time and is proposed to overcome the composition problem when encoding or transcoding multiple video objects with different frame-rates. The Motion Hints describe (i) the motion range, (ii) the motion uncompensability and (iii) the motion intensity. This meta-data can be used for a number of tasks including anchor frame selection, encoding mode decisions, frame-rate and bit-rate control, as well as bit-rate allocation among several video objects for object-based MPEG-4 transcoding. These transcoding hints, especially the search range hints, also aim at reducing the computational complexity of the transcoding process. Further details can be found in the MDS XM document, as well as the references cited below.
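As one concrete illustration of the Difficulty Hint, a transcoder converting CBR material to VBR could allocate its bit budget across segments in proportion to their difficulty values. The following Python sketch shows such a proportional-allocation heuristic; the specific rule and the numeric values are assumptions for illustration, not the normative use of the hint:

```python
def allocate_bits(total_bits, difficulty_hints):
    """Distribute a bit budget over segments in proportion to their
    Difficulty Hint values -- a simple CBR-to-VBR conversion heuristic."""
    total_difficulty = sum(difficulty_hints)
    return [total_bits * d / total_difficulty for d in difficulty_hints]

# Four segments of equal duration; the third is the hardest to encode.
hints = [0.2, 0.2, 0.4, 0.2]        # assumed per-segment difficulty values
budget = 4_000_000                  # total bits for the whole clip
print(allocate_bits(budget, hints)) # the difficult segment gets twice the bits
```

Because the hints are computed once, offline, the transcoder avoids re-analysing the content at each adaptation point, which is the source of the speed-up claimed in the scenarios above.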

References

Kuhn, P. (2000) Camera Motion Estimation Using Feature Points in MPEG Compressed Domain, Proc. IEEE Int'l Conf. on Image Processing, Vancouver, Canada.

Kuhn, P. and Suzuki T. (2001) MPEG-7 Meta-data for Video Transcoding: Motion and Difficulty Hints, Proc. Storage and Retrieval for Multimedia Databases, SPIE 4315, San Jose, CA.

Vetro, A., Sun H. and Wang Y. (2001) Object-Based Transcoding for Adaptable Content Delivery, IEEE Trans. Circuits and Systems for Video Technology, to appear March 2001.

8. References

[1] MPEG Requirements Group, “MPEG-7 Context, Objectives, and Technical Roadmap”, Doc. ISO/MPEG N2861, MPEG Vancouver Meeting, July 1999.

[2] MPEG Requirements Group, “MPEG-7 Requirements”, Doc. ISO/MPEG N2859, MPEG Vancouver Meeting, July 1999.

[3] Rémi Ronfard, “MPEG-7 Applications in Radio, Film, and TV archives,” Doc. ISO/MPEG M2791, MPEG Fribourg Meeting, October 1997.

[4] Whoi-Yul Kim et al, “MPEG-7 Applications Document,” Doc. ISO/MPEG M3955, MPEG Atlantic City Meeting, October 1998.

[5] Jens-Rainer Ohm et al, “Broadcast Application and Requirements for MPEG-7,” Doc. ISO/MPEG M4107, MPEG Atlantic City Meeting, October 1998.

[6] Stephen Palm, “Visually Based Control: another Application for MPEG-7,” Doc. ISO/MPEG M3399, MPEG Tokyo Meeting, March 1998.


Annex A: Supplementary references, by application

4.1 & 4.2 Storage and retrieval of video databases & Delivery of pictures and video for professional media production

Aguierre Smith, T. G., & Davenport, G. (1992). The Stratification System. A Design Environment for Random Access Video. In ACM workshop on Networking and Operating System Support for Digital Audio and Video, San Diego, California

Aguierre Smith, T. G., & Pincever, N. C. (1991). Parsing Movies In Context. In Proceedings of the Summer 1991 Usenix Conference, (pp. 157-168). Nashville, Tennessee.

Aigrain, P., & Joly, P. (1994). The automatic real-time analysis of film editing and transformation effects and its applications. Computer & Graphics, 18(1), 93 - 103.

American Library Association's ALCTS/LITA/RUSA Machine-Readable Bibliographic Information Committee (1996). The USMARC Formats: Background and Principles. MARC Standards Office, Library of Congress, Washington, D.C., November 1996.

Bateman, J. A., Magnini, B., & Rinaldi, F. (1994). The Generalized Italian, German, English Upper Model. In Proceedings of the ECAI94 Workshop: Comparison of Implemented Ontologies, Amsterdam.

Bloch, G. R. (1986) Elements d'une Machine de Montage Pour l'Audio-Visuel. Ph.D., Ecole Nationale Superieure Des Telecommunications.

Bobrow, D. G., & Winograd, T. (1985). An Overview of KRL: A Knowledge Representation Language. In R. J. Brachman & H. J. Levesque (Eds.), Readings in Knowledge Representation (pp. 263 - 285). San Mateo, California: Morgan Kaufmann Publishers.

Butler, S., & Parkes, A. (1996). Film Sequence Generation Strategies for generic Automatic Intelligent Video Editing. Applied Artificial Intelligence (AAI) [Ed: Hiroaki Kitano], Vol. 11, No. 4, pp. 367-388.

Butz, A. (1995). BETTY - Ein System zur Planung und Generierung informativer Animationssequenzen (Document No. DFKI-D-95-02). Deutsches Forschungszentrum fur Kunstliche Intelligenz GmbH.

Chakravarthy, A., Haase, K. B., & Weitzman, L. (1992). A uniform Memory-based Representation for Visual Languages. In B. Neumann (Ed.), ECAI 92 Proceedings of the 10th European Conference on Artificial Intelligence, (pp. 769 - 773). Wiley, Chichester: Springer Verlag.


Chakravarthy, A. S. (1994). Toward Semantic Retrieval of Pictures and Video. In C. Baudin, M. Davis, S. Kedar, & D. M. Russell (Ed.), AAAI-94 Workshop Program on Indexing and Reuse in Multimedia Systems, (pp. 12 - 18). Seattle, Washington: AAAI Press.

Davenport, G., Aguierre Smith, T., & Pincever, N. (1991). Cinematic Primitives for Multimedia. IEEE Computer Graphics & Applications (7), 67-74.

Davenport, G., & Murtaugh, M. (1995). ConText: Towards the Evolving Documentary. In ACM Multimedia 95 - Electronic Proceedings. San Francisco, California: November 5-9, 1995. http://ic.www.media.edu/icPublications/gdlist.html

Davis, M. (1995) Media Streams: Representing Video for Retrieval and Repurposing. Ph.D., MIT.

Del Bimbo, A., Vicario, E., & Zingoni, D. (1992). A Spatio-Temporal Logic for Sequence Coding and Retrieval. In IEEE Workshop on Visual Languages, (pp. 228 - 231). Seattle, Washington: IEEE Computer Society Press.

Del Bimbo, A., Vicario, E., & Zingoni, D. (1993). Sequence Retrieval by Contents through Spatio Temporal Indexing. In IEEE Symposium on Visual Languages, (pp. 88 - 92). Bergen, Norway: IEEE Computer Society Press.

Domeshek, E. A., & Gordon, A. S. (1995). Structuring Indexing for Video. In J. Lee (Ed.), First International Workshop on Intelligence and Multimodality in Multimedia Interfaces: Research and Applications. Edinburgh University: July 13 - 14, 1995.

Gregory, J. R. (1961) Some Psychological Aspects of Motion Picture Montage. Ph.D. Thesis, University of Illinois.

Haase, K. (1994). FRAMER: A Persistent Portable Representation Library. In ECAI 94 European Conference on Artificial Intelligence, (pp. 732- 736). Amsterdam, The Netherlands.

Hampapur, A., Jain, R., & Weymouth, T. E. (1995a). Indexing in Video Databases. In Storage and Retrieval for Image and Video Databases II, (pp. 292 - 306). San Jose, California, 9 - 10 February 1995: SPIE.

Hampapur, A., Jain, R., & Weymouth, T. E. (1995b). Production Model Based Digital Video Segmentation. Multimedia Tools and Applications, 1, 9 - 46.

International Standard Z39.50: "Information Retrieval (Z39.50): Application Service Definition and Protocol Specification". http://lcweb.loc.gov/z3950/agency/

Isenhour, J. P. (1975). The Effects of Context and Order in Film Editing. AV Communication Review, 23(1), 69 - 80.


Lenat, D. B., & Guha, R. V. (1990). Building Large Knowledge-Based Systems - Representation and Inference in the Cyc Project. Reading, MA.: Addison-Wesley.

Lenat, D. B., & Guha, R. V. (1994). Strongly Semantic Information Retrieval. In C. Baudin, M. Davis, S. Kedar, & D. M. Russell (Ed.), AAAI-94 Workshop Program on Indexing and Reuse in Multimedia Systems, (pp. 58 - 68). Seattle, Washington: AAAI Press.

Mackay, W. E., & Davenport, G. (1989). Virtual Video Editing in Interactive Multimedia Applications. Communications of the ACM, 32(7), 802 - 810.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1993). Introduction to WordNet: An On-line Lexical Database (ftp://clarity.princeton.edu/pub/wordnet/5papers.ps). Cognitive Science Laboratory, Princeton University.

Nack, F. (August 1996) AUTEUR: The Application of Video Semantics and Theme Representation for Automated Film Editing. Ph.D. Thesis, Lancaster University

Nack, F. and Parkes, A. (1995). AUTEUR: The Creation of Humorous Scenes Using Automated Video Editing. Proceedings of IJCAI-95 Workshop on AI Entertainment and AI/Alife, pp. 82 - 84, Montreal, Canada, August 19, 1995.

Nack, F. and Parkes, A. (1997) Towards the Automated Editing of Theme-Oriented Video Sequences. Applied Artificial Intelligence (AAI) [Ed: Hiroaki Kitano], Vol. 11, No. 4, pp. 331-366.

Nagasaka, A., & Tanaka, Y. (1992). Automatic video indexing and full-search for video appearance. In E. Knuth & I. M. Wegener (Eds.), Visual Database Systems (pp. 113 - 127). Amsterdam: Elsevier Science Publishers.

Oomoto, E., & Tanaka, K. (1993). OVID: Design and Implementation of a Video-Object Database System. IEEE Transactions On Knowledge And Data Engineering, 5(4), 629-643.

Parkes, A. P. (1989a) An Artificial Intelligence Approach to the Conceptual Description of Videodisc Images. Ph.D. Thesis, Lancaster University.

Parkes, A. P. (1989c). Settings and the Settings Structure: The Description and Automated Propagation of Networks for Perusing Videodisk Image States. In N. J. Belkin & C. J. van Rijsbergen (Ed.), SIGIR '89, (pp. 229 - 238). Cambridge, MA:

Parkes, A. P. (1992). Computer-controlled video for intelligent interactive use: a description methodology. In A. D. N. Edwards & S. Holland (Eds.), Multimedia Interface Design in Education (pp. 97 - 116). New York: Springer-Verlag.


Parkes, A., Nack, F. and Butler, S. (1994) Artificial intelligence techniques and film structure knowledge for the representation and manipulation of video. Proceedings of RIAO '94, Intelligent Multimedia Information Retrieval Systems and Management, Vol. 2, Rockefeller University, New York, October 11-13, 1994.

Pentland, A., Picard, R., Davenport, G., & Welsh, B. (1993). The BT/MIT Project on Advanced Tools for Telecommunications: An Overview (Perceptual Computing Technical Report No. 212). MIT.

Sack, W., & Davis, M. (1994). IDIC: Assembling Video Sequences from Story Plans and Content Annotations. In IEEE International Conference on Multimedia Computing and Systems. Boston, Ma: May 14 - 19, 1994.

Sack, W., & Don, A. (1993). Splicer: An Intelligent Video Editor (Unpublished Working Paper).

Tonomura, Y., Akutsu, A., Taniguchi, Y., & Suzuki, G. (1994). Structured Video Computing. IEEE MultiMedia, 1(3), 34 - 43.

Ueda, H., Miyatake, T., Sumino, S., & Nagasaka, A. (1993). Automatic Structure Visualization for Video Editing. In ACM & IFIP INTERCHI '93, (pp. 137 - 141).

Ueda, H., Miyatake, T., & Yoshizawa, S. (1991). IMPACT: An Interactive Natural-Motion-Picture Dedicated Multimedia Authoring System. In Proc ACM CHI '91 Conference on Human Factors In Computing Systems, (pp. 343-450).

Yeung, M. M., Yeo, B., Wolf, W. & Liu, B. (1995). Video Browsing using Clustering and Scene Transitions on Compressed Sequences. In Proceedings IS&T/SPIE '95 Multimedia Computing and Networking, San Jose. SPIE (2417), 399 - 413.

Zhang, H., Kankanhalli, A., & Smoliar, S. W. (1993). Automatic Partitioning of Full-Motion Video. Multimedia Systems, 1, 10 - 28.

We are still seeking references regarding ANSI guidelines for Multi-lingual Thesaurus and International radio and television typology.

5.1 User agent driven media selection and filtering

Maes, P. (1994b) “Modeling Adaptive Autonomous Agents,” Journal of Artificial Life, vol. 1, no. 1/2, pp. 135 - 162.

Parise, S., S. Kiesler, L. Sproull, & K. Waters (1996) “My Partner is a Real Dog: Cooperation with Social Agents,” In proceedings of Conference on Computer Supported Cooperative Work / CSCW 96, edited by M. S. Ackermann, pp. 399 - 408, Hyatt Regency Hotel, Cambridge, Mass.: ACM.


Rossetto, L., & O. Morton (1997) “Push!,” Wired, no. 3.03 UK, March 97, pp. 69 - 81.

5.2 Intelligent multimedia presentation

Andre, E. (1995). Ein planbasierter Ansatz zur Generierung multimedialer Präsentationen. Ph.D., Sankt Augustin: INFIX, Dr. Ekkerhard Hundt.

Andre, E., & Rist, T. (1994). Multimedia Presentations: The Support of Passive and Active Viewing. In AAAI Spring Symposium on Intelligent Multi-Media Multi-Modal Systems, (pp. 22 - 29). Stanford University: AAAI.

Maybury, M. T. (1991) “Planning Multimedia Explanations using Communicative Acts,” In proceedings of Ninth National Conference on Artificial Intelligence, AAAI-91, pp. 61 - 66, Anaheim, CA: AAAI/MIT Press.

Riesbeck, C. K., & Schank, R. C. (1989). Inside case-based reasoning. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

6.3 Universal Access

Skodras A.N. and Christopoulos C., "Down-sampling of compressed images in the DCT domain", European Signal Processing Conference (EUSIPCO), 1998.

J. R. Smith, R. Mohan, and C.-S. Li, “Transcoding Internet Content for Heterogeneous Client Devices”, IEEE Intern. Conf. Circuits and Systems, June, 1998.

J. R. Smith, R. Mohan, and C.-S. Li, “Content-based Transcoding of Images in the Internet”, IEEE Intern. Conf. Image Processing, Oct., 1998.

C.-S. Li, R. Mohan, and J. R. Smith, “Multimedia content description in the InfoPyramid”, IEEE Intern. Conf. Acoustics, Speech and Signal Processing, May, 1998.

Björk N. and Christopoulos C., "Transcoder Architectures for video coding", IEEE International Conference on Acoustic Speech and Signal Processing (ICASSP 98), Seattle, Washington, Vol. 5, pp. 2813-2816, May 12-15, 1998.

Björk N. and Christopoulos C., "Transcoder Architectures for video coding", IEEE Transactions on Consumer Electronics, Vol. 44, No. 1, pp. 88-98, February 1998.

S. Paek and J. R. Smith, “Detecting Image Purpose in World-Wide Web Documents”, SPIE/IS&T Photonics West, Document Recognition, January, 1998.

R. Mohan, J. R. Smith and C-S. Li. Multimedia Content Customization for Universal Access, Multimedia Storage and Archiving Systems, SPIE Vol 3527, Boston, November 1998.

R. Mohan, J. R. Smith and C-S. Li. Adapting Multimedia Internet Content for Universal Access, IEEE Transactions on Multimedia, Vol. 1, No. 1, March 1999.

R. Mohan, J. R. Smith and C-S. Li. Adapting Content to Client Resources in the Internet, IEEE Intl. Conf on Multimedia Comp. and Systems ICMCS99, Florence, June 1999.

R. Mohan, J.R. Smith and C-S. Li, Content Adaptation Framework: Bringing the Internet to Information Appliances, Globecom 99, Dec 1999.

6.5 Educational Applications

Tagg, Philip (1981). On the Specificity of Musical Communication. Guidelines for Non-Musicologists. SPGUMD 8115 (23 pp.)

Tagg, Philip (1984). Understanding 'Time Sense': concepts, sketches, consequences. Tvorspel: 21-43. [Forthcoming on-line, see Tagg 1987].

Tagg, Philip (1990). Music in Mass Media Studies. Reading Sounds for Example. PMR 1990: 103-114


Annex B: An example architecture for MPEG-7 Pull applications

A search engine could freely access any complete or partial description associated with any AV object in any set of data, perform a ranking, and retrieve the data for display by using the link information. An example architecture is illustrated in the figure below.

Figure: Example of a client-server architecture in an MPEG-7 based data search.
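To make the server-side step concrete: match the query against the stored (complete or partial) descriptions, rank them, and then resolve each description's link to its AV object for display on the client. A schematic Python sketch with hypothetical record fields and a toy similarity measure:

```python
def search(query_descriptor, descriptions, similarity, top_k=10):
    """Rank stored (partial) descriptions against a query, then resolve
    each description's media link for display on the client."""
    ranked = sorted(descriptions,
                    key=lambda d: similarity(query_descriptor, d["descriptor"]),
                    reverse=True)
    return [d["media_link"] for d in ranked[:top_k]]

descriptions = [
    {"descriptor": [0.9, 0.1], "media_link": "http://example.org/clip1.mpg"},
    {"descriptor": [0.2, 0.8], "media_link": "http://example.org/clip2.mpg"},
]
dot = lambda q, d: sum(x * y for x, y in zip(q, d))  # toy similarity measure
print(search([1.0, 0.0], descriptions, dot, top_k=1))  # -> clip1's link
```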
