
Multimedia Tools and Applications, 23, 257–279, 2000. © 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

A Scalable Toolkit for Designing Multimedia Authoring Environments

M. JOURDAN, C. ROISIN, L. TARDIF
Opera Project, INRIA Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot, France

Abstract. This paper introduces Kaomi, a scalable toolkit for designing authoring environments for multimedia documents. The underlying concept is to provide the designer of multimedia applications with a fast method to obtain an authoring system based on a set of synchronized views (the presentation view for displaying the document, the scenario view for showing the temporal organization of the document, etc.), each view being a support for editing actions. Kaomi is flexible enough to support a variety of declarative formats for multimedia documents. It is a scalable toolkit in that it provides facilities for extending and/or modifying the resulting authoring environment. In addition, cross-platform portability allows operation in the heterogeneous Internet environment. The use of Kaomi is mainly described through the design of two authoring environments: one for authoring a subset of the W3C Smil standard and the other for Madeus, a constraint-based multimedia language.

Keywords: multimedia document, multimedia toolkit, SMIL, multiview, authoring environment, multimedia editing

1. Introduction

The work presented in this paper rests on two observations: first, creating multimedia documents is still a complex task, and second, numerous languages and formats exist for describing such documents. This paper is a contribution to the emergence of new authoring tools for the end-user.

Multimedia documents combine in time and space different types of elements such as video, audio, still pictures, text and synthesized images. Compared to classical documents, multimedia documents are characterized by their inherent time-related dimension. Basic media objects, like video, have an intrinsic duration. Furthermore, media objects can be organized in time by the author, which adds to the document a time structure called the temporal scenario. Finally, a browsing structure can also be defined over the objects. Therefore, a document results in a logical entity that encompasses the properties of the objects as well as the time, space and hyperlink relations between objects. Such an entity can be rendered by a presentation engine using the computer output devices (screen and speaker).

Within the past decade, a number of research works (Cmifed [21], Firefly [2], HTSPN [19], Hyperprop [18], Isis [11], Madeus [8]) have introduced various ways of specifying temporal scenarios, each focusing on a particular understanding of time synchronization. Some standards have also been defined to cover temporal specification needs: HyTime [7], MHEG [15] and Smil [23] are the most significant examples. It is now time to provide users with the ability to create their own multimedia documents.

Today, authors of multimedia documents often have to be programmers, because that is the only way they can specify the complex synchronization of their documents (Lingo scripts in Director [12], for example). In order to increase the popularity of multimedia applications, computer-illiterate people must have direct access to multimedia document creation. That will also drastically reduce the production cost of multimedia titles.

Consequently, a key step in research work lies in the definition of proper user interfaces that would provide the end-user with effective authoring tools [9]. A satisfactory authoring environment will certainly not result from simply wrapping up an existing programming language: not only does the author have to deal with too many low-level specifications, but such authoring tools also impose slow development cycles due to the composition-test process (as with MhegDitor, which is based on a converter tool [5]). One solution would be to take a WYSIWYG approach, yet such a paradigm cannot be directly applied to multimedia authoring applications because of the time dimension of multimedia documents. Moreover, the intrinsic complexity of multimedia applications (multiple media objects having different behaviors, complex and varied synchronizations in time and space, event-based schedules) requires new authoring paradigms. Besides market tools mainly based on time-line views and programming languages, a number of studies have investigated new ways of authoring: editing with graphical interfaces in multiple views as provided by the GRINS authoring tool [3], editing through the use of templates as in MediaDesc [4] and the RealNetworks G2 editor [17], or ad-hoc authoring environments generated for each class of multimedia document [14].

As presented in [9], we think that a proper authoring environment for multimedia documents should be based on multiple views: the "main" view in which the document is played, and various other views conveying complementary information about the document: its structure, the existing temporal relations and so on. These views can be synchronized on object selection, and each one must be a support for editing actions. Finally, a key point is that the author can directly change the document in the presentation view, by stopping the execution of the document and then selecting the objects on which editing actions are performed (for instance, to insert a temporal or a spatial relation). This basic multi-view functionality eases the authoring task and approaches the WYSIWYG paradigm provided by editors of static documents.

Our claim is that these editing principles can be implemented independently of the kind of declarative language used to specify the documents. This is what we propose with Kaomi, a scalable toolkit for designing multimedia authoring environments. This toolkit is intended to help the designers of authoring environments, assisting them with services such as:

• the definition of a set of predefined views;
• facilities to extend this set of predefined views;
• synchronization between views on object selection, to help the end-user of the designed authoring tool navigate through the different views;
• editing capabilities of the document in each view, with data consistency handling.


Figure 1. Kaomi toolkit.

Using Kaomi reduces the effort needed to build authoring environments devoted to different contexts: the context can be the underlying model and language, the application area and/or the author's skills. In this paper, we mainly focus on the way to define authoring tools for different declarative multimedia languages.

Kaomi can be compared to multimedia toolkits such as the Sun Java Media Framework (JMF) [20] or the Berkeley Continuous Media Toolkit (CMT) [13], and to document-oriented toolkits like the Thot toolkit [16]. The JMF toolkit provides a set of Java classes that handle multimedia objects such as Mpeg video players or WAV audio players. This toolkit is located at a lower level than Kaomi; furthermore, Kaomi uses the JMF for the management of audio and video (see figure 1). The CMT of Berkeley provides a programming environment for the rapid development of continuous multimedia applications such as video-on-demand or interactive television. It is based on predefined objects that can perform media-specific operations (e.g., capture, store, play) and on a simple programming model for building applications with the Tcl/Tk language. This multimedia toolkit is more general than Kaomi, since a variety of multimedia applications can be built with it, whereas Kaomi is specifically dedicated to authoring environments for multimedia documents. The Thot toolkit provides services similar to Kaomi's for building document-oriented applications, but it does not take temporal information into account. It is based on XML document structuring and uses declarative languages for the specification of spatial properties and application design. The application can use, modify or extend standard Thot editing functions. It can also receive control information during the execution of any standard function by asking for an event notification. In Kaomi, we rely more on object-oriented facilities to allow the creation of new applications on top of the toolkit.

The rest of this article is devoted to the description of the Kaomi toolkit and of the authoring environments that can be built with it: in Section 2, we develop the main features of such resulting environments; Section 3 presents the Kaomi architecture and focuses on the core mechanisms of the toolkit; finally, Section 4 demonstrates the use of the toolkit in the design of two authoring environments: one for the W3C Smil standard language, and the other for Madeus, a constraint-based multimedia language.


2. Authoring environments built with Kaomi

Multimedia editing environments built with Kaomi are characterized by direct editing services through multiple synchronized views. More precisely, Kaomi provides:

• A set of predefined views: the textual view, the presentation view, the object view and the scenario view. Each one can be the support of editing actions, which are automatically propagated to the other views to guarantee data consistency;
• Some extension mechanisms for building new views, for instance to display the hypermedia structure of documents or to show information more specific to the authoring language;
• The synchronization between views when an object is selected by the author: the selected object is highlighted in the other views.

The Kaomi toolkit is built upon several existing software packages such as Xml4j, the Swing library and the JMF toolkit (figure 1), and runs on the Java Virtual Machine. The main access to the toolkit is based on object-oriented mechanisms such as inheritance, interface definition and plug-ins (described in Section 3.4). In this section, we describe the characteristics of the authoring environments built with Kaomi, first when it is used as a "black box", and then when its extension mechanisms are used to adapt the tool to a specific language.

2.1. “Black box” use of Kaomi

With the basic services provided by Kaomi, authoring tools are multi-document and multi-view. This means, for instance, that it is possible to play or edit several documents at the same time, or to copy part of one document into another. This capability is also useful for defining a hyperlink from one document to another by direct designation of the link destination.

Basic views. The different views through which a document can be shown can be considered as the result of different projections applied to the document, providing the author with different perceptions of it:

• The presentation view, which can be considered as the main view, shows the presentation of the document with basic time control functions such as play, pause and resume.
• The object view shows the set of objects that the document contains. Different kinds of filters can be applied to organize the display of this set. The hierarchical structure of the document (if it makes sense in the specification language) can be used as a guideline to display this set: each object then appears inside its own hierarchical level. The nature of the objects (video, audio, picture, text, etc.) and the alphabetical order can provide other kinds of display.
• The textual view shows the source format of the document.
• The scenario view shows a projection of a specific execution of the document onto the time dimension. However, this view does not only display a representation of that specific schedule: the hierarchical structure of the document and unplayed objects are also displayed. For example, the complete branches of a parallel Smil structure are displayed even if a Smil Endsync attribute on a Par node has stopped a parallel branch before every object in this branch has been played. In addition, visual marks are added to the view to represent the temporal information given by the author. For instance, if two objects were to start simultaneously because of a particular time synchronization (a Par node in Smil, or a Starts constraint in Madeus), the two objects would be linked in the view by a vertical line.

Multiple view services. In the authoring environments created with Kaomi, it is possible to open several views of the same document at the same time. This means, for instance, that it is possible to open two presentation views of the same document in order to see different time points of the presentation.

Thanks both to the definition of a view (the result of applying a specific filter to the document) and to the multi-view capability, each view can be synchronized with, or resynchronized from, the others.

The default synchronization mode between views is enabled. Views can be synchronized both on object selection and on time points. Synchronization on object selection means that when an object is selected in a view, it is highlighted in the other views, even if the display of those views must change to show that object. For instance, if an object is selected in the object view, the current time point displayed in the presentation view changes to become the first time point of the presentation in which that object appears. Synchronization between time points only makes sense between two presentation views, two scenario views, or one presentation view and one scenario view. It means the current time point displayed is the same in each synchronized view.

In addition, when visual marks are selected in the scenario view (for example, a vertical line corresponding to a Smil Par operator), the corresponding source lines are highlighted in the textual view, together with the corresponding nodes in the hierarchical view.

These different kinds of synchronization are very helpful to the author for understanding the current scenario (prior to modifying it) and for rapidly navigating through the execution. For example, when the author selects a time instant in the scenario view, the presentation view simultaneously jumps to that instant and plays the corresponding part of the document.

Editing services. The application provides the author with the usual functions attached to both documents and views: open/close/create/save a document, open/close a view, selection and editing of objects. More precisely, common editing operations are:

• Editing object properties (name, location, absolute spatial position, style attributes, etc.). When the author clicks on an object, an entry form may pop up and prompt for the object properties. The application designer has to provide some interface parameters to help Kaomi create the language-dependent forms. After each editing step, a method checking the consistency of the values set by the author is called. These methods are language-dependent and have to be provided by the designer.
• Inserting/deleting/copying/pasting objects. In order to insert an object or a group of objects inside the document, the author has to select the level in the hierarchy where it will be inserted. When an object is removed, each spatial and temporal relation in which the object is involved is also removed. The copy/paste behavior depends on the set of objects to which the function is applied. If the operand is a composite object (a node in the hierarchy of the document), all the objects and the relations between them are involved in the editing action. If the operand is a set of objects belonging to several composite objects, then the function only affects the objects themselves and not the relations between them.
• Inserting/deleting a spatial or temporal relation between two objects. A set of basic spatial relations is provided (aligned, centered, etc.). The author selects two objects and chooses the desired relation through icons; a consistency check is then performed. This check partially depends on the source language; for example, the document format might require the two objects to be at the same hierarchy level. Once the consistency check has passed, the spatial positions of the two objects are updated in the presentation view to apply the relation. The same idea holds for temporal relations, with the following set of basic ones: Meets, Parmax, Parmin and Parmaster, which respectively express that two objects are played sequentially or simultaneously with three kinds of ending conditions. The set of temporal relations can be extended by the designer thanks to the Kaomi extension mechanisms (see 3.4).
• Designating an object to be the anchor of a link and selecting its destination. Two kinds of links are provided: inter- and intra-document. The destination of an inter-document (resp. intra-document) link is another document (resp. a node of the hierarchy or a basic object). These links basically have "goto" semantics, without maintaining any contextual information about the starting point.

In addition to these common services, each view can provide specific services, such as direct spatial placement in the presentation view, textual editing (search, replace, etc.) in the textual view, or backward and forward operations through the navigation history in the presentation view.

2.2. The extended use of Kaomi

To allow the designer to tune the authoring environment produced by Kaomi to a specific application context (language, authors, etc.), the toolkit offers opportunities to extend and/or modify it:

• To add new views to the set of basic views, for instance a view showing the hypermedia structure of the document. If this functionality is implemented in compliance with the instructions of the Kaomi toolkit, the new view automatically gains the same properties as the existing ones: editing capabilities and synchronization on objects.
• To add a new function to a view, for instance allowing, in the presentation view, the definition of screen regions and the assignment of objects to those regions.
• To change the actions performed when a button is clicked in a view, for instance to change the scheduler when the play function is requested by the author. It is also possible to change the constraint solver used to maintain spatial consistency (by default, an implementation of the DeltaBlue solver is used).

• To add a new kind of media format.
• To add new temporal or spatial relations.

The next section explains how these extensions can be applied to the basic kernel of Kaomi.

3. The object-oriented architecture of Kaomi

This section has a twofold aim: first, to present the architecture of the Kaomi toolkit, and second, to explain why this architecture makes it easy to build authoring environments for multimedia documents.

Kaomi has been built in Java using an object-oriented architecture that allows the implementation of the services described in the previous section:

• Basic authoring services in a multi-document and multi-view environment.
• Extensibility at various application levels: at the language level, at the authoring level (view extension), at the presentation level (the formatters and the scheduler can be replaced) and at the media-object level (new media types can be added).

This section outlines the main features of this architecture, with a particular focus on the definition of the basic entities (view, document, graph), on the synchronization mechanisms and on the extension mechanisms.

3.1. Kaomi: General overview

The top-level structure of Kaomi is built with the following classes, each related to the others through aggregation relationships, as shown in figure 2:

• jKaomi manages the application context (such as user preferences and resource access).

Figure 2. High level classes of Kaomi.


• KaomiManager implements the multi-document function and the access to the services shared between documents (such as constraint solvers and schedulers).
• DocumentManager is responsible for managing one document and contains the reference document data structures, the parser, the views manager and the windows manager.
• ReferenceDocument contains the document's internal data structures (tree structure and graph). This class, described in Section 3.2, is called ReferenceDocument because it is a "reference" class for the document structures managed by the view classes.
• Parser contains a standard XML parser and calls external methods for the semantic actions attached to each language element. The result of the parsing method is an object of class ReferenceDocument.
• ViewManager manages the set of opened views of a document through their corresponding classes, namely TextualView, ExecutionView, ObjectView and ScenarioView. Each view class contains an ExtendedDocument class that is specific to each type of view and that is a copy of the ReferenceDocument structures extended with view-dependent attributes. The ViewManager also handles the synchronization on class attributes; this is the way both data consistency among views and view synchronization on object and time selection are performed (see 3.4.4).
• WindowsManager manages the set of physical windows required for the editing and the presentation of a document (there is also a specific window class for each view class).

3.2. The ReferenceDocument class

This class contains all the information useful for editing and presenting a document. It is basically a hierarchical structure corresponding to the logical organization of the document, where the leaves represent the basic objects and the non-terminal nodes are the composite objects (a scene, for example). A set of attributes is attached to each node: these attributes carry all the spatial, temporal and linking properties of the objects.

Thanks to the synchronization mechanism (see 3.4.4), whenever an attribute of the ReferenceDocument structure is modified, all copies of this attribute in the ExtendedDocument classes of the synchronized views are also updated.

In addition to this tree structure, Kaomi uses a graph structure that contains the temporal information required for the presentation of the document. This execution graph is the central class of Kaomi, because it is the kernel structure for both editing and presentation, and it allows us to implement the direct editing paradigm. A graph is a set of nodes and arcs. The execution graph of Kaomi is such that the nodes represent the begin and end time points of objects. Each arc of this graph is associated with a leaf (basic object) or a node (composite object) in the hierarchical structure of the document. An arc is also labeled with time information, such as the desired duration of the object associated with the arc, if this duration makes sense.

Finally, two time dependency tables are associated with each node of the execution graph:

• The first one, called the causality table, is used for the execution of the document (see Section 3.3). It delivers causality information between an incoming arc of the node (i.e., corresponding to the end point of the related object) and the other incoming arcs of that node, for instance when the end of object A causes the end of object B.


• The second one, called the source table, contains the information about the source specification that must be added to the graph structure in order to display the corresponding visual marks in the scenario view. Let us take an example illustrating the kind of information contained in this table. Assume the author expresses that A and B are played in sequence, i.e., in the graph, the end node of A is the same as the begin node of B. The source table associated with this node contains the information that the end of A implies the start of B, together with the corresponding line number in the textual view.
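The data structures described above can be sketched in Java as follows. This is a hedged illustration, not Kaomi's actual API: the class and field names (ExecNode, ExecArc, and so on) are assumptions chosen to mirror the text, where nodes are begin/end time points, arcs carry objects with an optional duration, and each node holds a causality table and a source table.

```java
import java.util.*;

// Hypothetical sketch of the execution graph: nodes are time points,
// arcs carry media objects. All names are illustrative.
class ExecNode {
    final List<ExecArc> incoming = new ArrayList<>();
    final List<ExecArc> outgoing = new ArrayList<>();
    // causality table: an incoming arc whose end fires -> other incoming arcs it terminates
    final Map<ExecArc, List<ExecArc>> causalityTable = new HashMap<>();
    // source table: origin of the synchronization, with its source line in the textual view
    final List<String> sourceTable = new ArrayList<>();
}

class ExecArc {
    final String mediaObject;  // leaf (basic object) or composite object name
    final Integer duration;    // desired duration in ms, or null if intrinsic
    ExecNode begin, end;
    ExecArc(String mediaObject, Integer duration) {
        this.mediaObject = mediaObject;
        this.duration = duration;
    }
}

public class ExecutionGraphDemo {
    public static void main(String[] args) {
        // A before B in sequence: the end node of A is the begin node of B
        ExecNode n0 = new ExecNode(), n1 = new ExecNode(), n2 = new ExecNode();
        ExecArc a = new ExecArc("A", 3000), b = new ExecArc("B", null);
        a.begin = n0; a.end = n1; b.begin = n1; b.end = n2;
        n0.outgoing.add(a); n1.incoming.add(a);
        n1.outgoing.add(b); n2.incoming.add(b);
        n1.sourceTable.add("end(A) implies start(B), textual view line 12");
        System.out.println(n1.incoming.size() + " in, " + n1.outgoing.size() + " out"); // prints "1 in, 1 out"
    }
}
```

The sequence example of the source table corresponds here to the shared node n1, which carries both the dependency and the pointer back to the source line.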

3.3. The Kaomi scheduler

The scheduler of Kaomi implements the following graph traversal. The scheduler starts playing each object associated with the arcs outgoing from the beginning node. If the object has a required duration, it is passed as a parameter to the Play method. The scheduler maintains a set of active nodes, which is initialized with the end nodes of the arcs outgoing from the beginning node. Whenever an object notifies its end, the scheduler applies the rules given by the causality table associated with the end node of that object, for instance interrupting other running objects. Once all rules have been applied, the scheduler tests whether all the incoming arcs of that node have finished. If this is the case, the scheduler starts the objects associated with all the outgoing arcs of the node and updates the set of active nodes.

The semantics attached to the traversal of an arc defined by a subgraph is the following: 1) the activation of the begin node of a composite object implies the activation of the first node of its subgraph; 2) the termination of the last node of a subgraph implies the termination of the corresponding composite object; and 3) the reverse also holds, i.e., the termination of a composite object means that every object inside the composite is ended.

Thanks to the causality table, which can be set by the application designer, the scheduler can play different executions from the same graph. For instance, the three kinds of parallel operators (Parmin, Parmax and Parmaster) give rise to the same graph with different causality tables attached to the end node (see Section 4.1).
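The traversal above can be sketched as a small Java class. This is a minimal, assumed reconstruction, not the real scheduler: objects here "end" immediately instead of being driven by JMF players, and the causality-table step is only marked by a comment, since its rules depend on the operator (e.g., under Parmin the first ending arc would interrupt its siblings).

```java
import java.util.*;

// Minimal sketch of the scheduler's graph traversal (names assumed).
// outgoing: node -> arcs (object names) leaving it; arcTarget: arc -> its end node;
// pendingIncoming: node -> number of incoming arcs not yet finished.
public class SchedulerSketch {
    final Map<String, List<String>> outgoing = new HashMap<>();
    final Map<String, String> arcTarget = new HashMap<>();
    final Map<String, Integer> pendingIncoming = new HashMap<>();
    final List<String> playOrder = new ArrayList<>();  // trace of Play calls

    void start(String beginNode) { fire(beginNode); }

    private void fire(String node) {
        for (String arc : outgoing.getOrDefault(node, List.of())) {
            playOrder.add(arc);   // "play" the object carried by this arc
            notifyEnd(arc);       // demo only: objects end immediately
        }
    }

    void notifyEnd(String arc) {
        String node = arcTarget.get(arc);
        // the causality table of `node` would be applied here, possibly
        // interrupting other running objects (Parmin/Parmax/Parmaster rules)
        int left = pendingIncoming.merge(node, -1, Integer::sum);
        if (left == 0) fire(node);  // all incoming arcs finished: start outgoing arcs
    }
}
```

For a sequence A then B (A's end node is B's begin node), starting the traversal plays A, A's end completes its node, and B is started in turn.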

3.4. Core points of the architecture

In this part, we show how the architectural choices of Kaomi allow its use and its extension for the definition of new authoring environments. These features are illustrated through examples related to view extension (figures 3 and 4).

Figure 3. Inheritance mechanism for creating new views.


Figure 4. Interface and inheritance for the Tree management service.


3.4.1. Inheritance. All the entities of the Kaomi architecture (document, view, graph, scheduler) are built following a classical inheritance scheme: a root class contains common methods and attributes, and its subclasses inherit from this root. For instance, the document classes of the views (ExtendedExecutionDocument, ExtendedScenarioDocument, etc.) inherit from the same Document class.

Similarly, the View class contains all the common attributes and methods shared by all views. Each view proposed in Kaomi is defined as a child of that class, such as the ScenarioView and ExecutionView classes. In this way, it is easy for the designer to extend the set of views of his authoring environment: by adding a new child to the View class, the designer gets in return the common behavior implemented for the views (such as the SpecificLanguageView class in figure 3, with the inheritance relation labeled (1)). Thanks to the same inheritance mechanism, it is also possible to extend an existing view (by adding new services), such as the ExecutionView′ class of figure 3, which inherits from the ExecutionView class (arrow (2)).
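The two extension routes of figure 3 can be sketched as follows. The class names follow the paper where possible, but the method bodies are illustrative assumptions, and ExecutionView′ is rendered as ExtendedExecutionView, since a prime is not a valid Java identifier.

```java
// Sketch of the inheritance scheme of figure 3 (method bodies are assumptions).
abstract class View {
    // common behavior shared by every view, inherited for free by new subclasses
    String select(String object) { return "highlight " + object; }
    abstract String render();
}

class ExecutionView extends View {
    String render() { return "presentation of the document"; }
}

class ScenarioView extends View {
    String render() { return "temporal scenario with visual marks"; }
}

// (1) a brand-new view: a new child of View gains the common behavior
class SpecificLanguageView extends View {
    String render() { return "language-specific display"; }
}

// (2) extending an existing view with new services (ExecutionView' in the paper)
class ExtendedExecutionView extends ExecutionView {
    String render() { return super.render() + " + region editing"; }
}

public class ViewHierarchyDemo {
    public static void main(String[] args) {
        View v = new ExtendedExecutionView();
        System.out.println(v.select("objectA"));  // inherited common behavior
        System.out.println(v.render());           // dynamic binding picks the subclass
    }
}
```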

3.4.2. Interfaces. The Java language provides designers with a means to put together a set of method definitions in a so-called interface; it can also be used to define common constants. The first well-known advantage of using this interface mechanism when building an application lies in the fact that the application designer can change an implementation without necessarily updating every part that depends on the one being modified, with a single condition: the new implementation must implement the same interface.

Kaomi makes use of interfaces to group the method definitions that make up a service, in order to allow the substitution of one implementation of that service by another. This is the case for both the Scheduler and the Parser classes: they can be replaced by new ones, the only requirement being that the new entities implement the set of defined interfaces.

The benefits of using interfaces are greatly increased when this mechanism is coupled with inheritance. Indeed, it provides powerful facilities to extend services and to dynamically access the desired method implementation when a service is called through its interface definition.

Let us take the example of figure 4, where some of the classes related to document structures and views are represented. As explained in Section 3.2, for each view a corresponding document structure is defined (namely ScenarioViewDocument and ObjectViewDocument in the figure). These classes inherit from the Document class. The arrows labeled (1) show the resulting dynamic link of the method calls performed through the ScenarioViewDocumentInterface and the ObjectViewDocumentInterface respectively.

In order to reuse the services provided by the ObjectView class, for instance to hierarchically display the specific document structure of the ScenarioViewDocument, the interface mechanism can be used as follows: first, the basic tree manipulation methods are grouped into an interface named Tree that is implemented by the Document class; second, the ObjectView can access these services either through the ObjectViewDocumentInterface (arrow (a)) or through the Tree interface (arrow (b)). The latter case allows a dynamic binding of the methods of the ScenarioViewDocument class when they are called by the ObjectView class in order to display this document structure (see arrow (2)).
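The Tree pattern of figure 4 can be sketched as follows. The names Tree, Document, ScenarioViewDocument and ObjectView come from the paper, but the method signatures (label, children, display) are assumptions: the point is only that ObjectView works against the Tree interface, so the concrete methods of whichever document class is passed in are bound at run time.

```java
import java.util.*;

// Sketch of the Tree interface pattern of figure 4 (signatures assumed).
interface Tree {
    String label();
    List<Tree> children();
}

abstract class Document implements Tree {
    public List<Tree> children() { return List.of(); }  // leaf by default
}

class ScenarioViewDocument extends Document {
    private final String name;
    private final List<Tree> kids;
    ScenarioViewDocument(String name, Tree... kids) {
        this.name = name;
        this.kids = List.of(kids);
    }
    public String label() { return name; }
    public List<Tree> children() { return kids; }
}

public class ObjectView {
    // accepts any Tree: label()/children() of the concrete class are bound dynamically
    public static String display(Tree t) {
        StringBuilder sb = new StringBuilder(t.label());
        for (Tree c : t.children())
            sb.append("\n  ").append(display(c).replace("\n", "\n  "));
        return sb.toString();
    }
    public static void main(String[] args) {
        Tree doc = new ScenarioViewDocument("scene",
                new ScenarioViewDocument("video1"),
                new ScenarioViewDocument("text1"));
        System.out.println(display(doc));
    }
}
```

Here ObjectView.display never names ScenarioViewDocument; any other view-specific document class implementing Tree could be displayed the same way.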


3.4.3. Plug-ins. An application which implements a plug-in mechanism provides the designer with a way to register an external "piece of code" to be dynamically called on specific data of the application. For instance, the well-known Netscape browser provides users with a way to dynamically register (without recompiling the application) a specific kind of Mpeg viewer to be called when an Mpeg video is present in an HTML document. The same extensions are possible with the Kaomi toolkit. This kind of plug-in is said to be external because it does not communicate with the application. More "linked" plug-ins can also be defined if the interface and dynamic linking mechanisms are used together. In such cases, the interface mechanism ensures that the plugged module contains the set of methods necessary to communicate.
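A "linked" plug-in of this kind can be sketched as follows. All names (MediaViewer, PluginRegistry) are illustrative assumptions, not Kaomi's API: the interface guarantees that the plugged module exposes the methods the host needs, and registration binds it at run time without recompiling the host.

```java
import java.util.*;

// Hedged sketch of a "linked" plug-in mechanism (names assumed).
interface MediaViewer {
    boolean canHandle(String mimeType);
    String open(String url);
}

public class PluginRegistry {
    private final List<MediaViewer> viewers = new ArrayList<>();

    // registration happens at run time, without recompiling the host
    public void register(MediaViewer v) { viewers.add(v); }

    public String open(String mimeType, String url) {
        for (MediaViewer v : viewers)
            if (v.canHandle(mimeType)) return v.open(url);
        return "no viewer registered for " + mimeType;
    }

    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        registry.register(new MediaViewer() {  // an Mpeg plug-in, registered dynamically
            public boolean canHandle(String m) { return m.equals("video/mpeg"); }
            public String open(String url) { return "playing " + url; }
        });
        System.out.println(registry.open("video/mpeg", "intro.mpg")); // prints "playing intro.mpg"
    }
}
```

In a full implementation, the registered class would typically be loaded by name via reflection, so new media formats can be plugged in without touching the host's source at all.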

3.4.4. Synchronization through class attributes. One core mechanism of the Kaomi toolkit has to do with the synchronization of class attributes (see 3.1). Each attribute of a class can be synchronized with the value of another attribute; the latter is called the master of the former, which is called the slave attribute. When the value of a master attribute is changed, each slave attribute synchronized with it is automatically updated with the new value thanks to a notification mechanism. This is the only way to change the value of a slave attribute. Finally, a slave attribute can break this synchronization scheme with its master attribute if needed.

This synchronization mechanism provides a consistent editing service in every view of a Kaomi authoring environment. The document attributes attached to a given view class (they belong to the corresponding ExtendedDocument class) are slave attributes of the ReferenceDocument attributes. Thus each editing operation performed through a view is first applied to the ReferenceDocument attributes, which notify their modification to their slave attributes in the other views. Synchronization between views on object selection is achieved in the same way.

3.5. Synthesis

The basic object-oriented features described in the previous sections are the ground on which we allow both extension and code redefinition of the toolkit. Perhaps the most original part of the Kaomi architecture is its synchronization support. We have chosen to provide this feature at a low level of the software: every attribute of any class can potentially be synchronized with the value of another one. This makes it easy to provide a generic management of view synchronization, which is of high importance during editing. However it is necessary to allow some tuning facilities in certain situations: for instance, the author may want to desynchronize two execution views of the same document in order to simultaneously display two different parts of the document. Synchronization can be done on selective criteria: for instance on time instant selection or on object selection. It is however worth noting that view desynchronization can introduce extra cognitive load and ambiguities at the interface level.

4. Use of Kaomi

We illustrate in this section the use of Kaomi for developing multimedia authoring tools. We first give some principles on how to use Kaomi and then we describe the design of two multimedia authoring systems: Smil-Editor [10], an authoring environment for the Smil language, and Madeus, a constraint-based authoring tool [8].

4.1. Principles

In order to use Kaomi, the designer first has to provide the Kaomi toolkit with a complete parser (syntactic parser and semantic actions) for the source language. In fact, Kaomi provides the designer with an XML parser (namely Xml4j, see figure 1) which checks the syntactic conformance of the document and generates a call to a specific semantic action each time an XML tag is found. This means that if the language is specified by an XML DTD, the application designer only has to write the methods associated with each semantic action. These semantic actions aim at building the internal structures on which Kaomi is based:

• The hierarchical structure of the document (see 3.2). As this structure aims at providing the author with the logical organization of the document, the methods used by the parser to produce this structure have to properly interpret the source document in order to change the standard XML structure if required. Some XML elements are transformed into attributes of the internal tree. This is typically the case for the spatial and temporal relation elements of the Madeus syntax, which become attributes of their parent node (a composite element) in the internal tree.
• The execution graph structure and the associated tables that give the correct interpretation of that graph to the scheduler and the scenario view (see 3.2 and 3.3).
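The following sketch suggests how such semantic actions might build the internal tree. Kaomi itself relies on Xml4j; here the parser events are simulated by direct calls, and the tag names and the rule that turns relation elements into attributes of their parent node are illustrative assumptions:

```java
// Hypothetical semantic actions: ordinary elements become tree nodes,
// while Temporal-Rel / Spatial-Rel elements become parent attributes.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    final List<String> attributes = new ArrayList<>();
    Node(String name) { this.name = name; }
}

class TreeBuilder {
    private final Deque<Node> stack = new ArrayDeque<>();
    Node root;

    private static boolean isRelation(String tag) {
        return tag.equals("Temporal-Rel") || tag.equals("Spatial-Rel");
    }

    // semantic action called by the parser on each opening tag
    void startElement(String tag) {
        if (!stack.isEmpty() && isRelation(tag)) {
            // relation elements become attributes of their parent node
            stack.peek().attributes.add(tag);
            return;
        }
        Node n = new Node(tag);
        if (stack.isEmpty()) root = n; else stack.peek().children.add(n);
        stack.push(n);
    }

    // semantic action called on each closing tag
    void endElement(String tag) {
        if (!isRelation(tag)) stack.pop();
    }
}
```

The transformation mirrors the first bullet above: the resulting tree reflects the logical organization shown to the author rather than the raw XML structure.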

The designer must also provide the code for the save operation, mainly to perform the reverse transformations between the Kaomi tree structure and the XML source structure. This code can be based on the services of Kaomi: a generic save method (a simple tree traversal that flattens the tree) and a tree manipulation toolkit (see figure 4). Semantic methods can be attached to the node and attribute traversal in order to create the desired target document.
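A minimal sketch of such a generic save method, assuming a hypothetical DocNode class (the traversal and serialization shown here are an illustration, not Kaomi's actual code):

```java
// Generic save: a depth-first traversal that flattens the internal tree
// back into an XML-like textual form. Attributes are omitted for brevity.
import java.util.ArrayList;
import java.util.List;

class DocNode {
    final String tag;
    final List<DocNode> children = new ArrayList<>();
    DocNode(String tag) { this.tag = tag; }

    // language-specific semantic methods could be attached here in order
    // to transform nodes on the fly while the tree is being flattened
    String save() {
        StringBuilder sb = new StringBuilder("<" + tag + ">");
        for (DocNode c : children) sb.append(c.save());
        return sb.append("</" + tag + ">").toString();
    }
}
```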

A set of configuration files must be filled in by the application designer to specify the desired behavior of the toolkit classes, such as form rendering, temporal and/or spatial relation display, etc.

4.2. Illustration through the design of a Smil editor

Smil (Synchronized Multimedia Integration Language) [23] defines a general document format that aims at integrating different types of independent media objects. The organization of media objects in the document is given in terms of time composition: both sequential and parallel operators are available, together with synchronization attributes that can be used to specify fine-grained synchronization between objects.

Smil format is defined as an XML DTD and hyperlinks follow XLink specifications. A Smil document is made of two parts (see the example given in figure 5): the Head part, which contains information at document level (basically the spatial organization in terms of Regions), and the Body part, which contains the document scenario. A scenario is a hierarchical structure of parallel or sequential schedules.

Figure 5. A source Smil example.

The current state of Smil-Editor, the Smil editor built using Kaomi, takes into account the following subset of Smil: spatial regions are only defined by absolute coordinates, and time synchronization can be expressed using seq and par nodes with begin, end, dur and endsync attributes (the last one is only meaningful for par nodes). Finally, media object elements such as video, audio, text and picture can have begin, end and dur attributes. The fill attribute associated with an object is not yet implemented; we have chosen to use the value remove by default.

In order to design this authoring tool for Smil documents, we have mainly defined a new SmilParser class that inherits from the Kaomi Parser class and a new SmilReferenceDocument that inherits from the Kaomi ReferenceDocument.

The tree structure built by the SmilParser is slightly different from the original XML structure because the Region elements of the Head part of the document are transformed into spatial attributes.

The methods written for the SmilParser aim at building the graph associated with a Smil document. Part (b) of figure 6 gives an example of the graph (without the two dependency tables) associated with the Smil document of figure 5, whose hierarchical structure is shown in part (a) of figure 6. An arc is associated with each node of the tree, plus the arc F' which is used to handle the "begin=3" attribute of the F object. An additional delay (delay3) has also been created. When an arc is associated with a composite node of the hierarchy (for instance the SEQ2 arc), this arc refers to the subgraph delineated by the associated dotted lines. An integer labels an arc whenever the object associated with the arc has a specified desired duration (for the audio A, for example).


Figure 6. Kaomi tree and graph for a Smil document.

In this example, there is only one non-empty causality table attached to a node of the graph: that node corresponds to the end of the PAR element, and the table expresses that the end of object SEQ2 forces the end of the other arcs of the PAR subgraph, namely the A and D objects. As already explained in Section 3.3, ending a composite object means that each object inside that composite is ended.

Let us describe how this subgraph is built. The hierarchical structure of figure 6(a) is recursively traversed from the root. Each time a leaf is reached (it is a basic object), a specific action is performed. This action returns the arc associated with the object and in some cases gives contextual information, such as when the object has a begin or an end attribute which refers to the begin or the end of another object (D in our example). On each hierarchical node, the sequence of actions to apply takes as parameters the resulting subgraphs of its children and contextual information about the time dependencies among them. These actions depend both on the nature of the node and on its attributes. For example, the actions performed for the PAR node take as parameters its endsync attribute (endsync="id(SEQ2)") and the fact that the begin of D must be equal to the end of A.

The main situations arising in the construction of the time graph are illustrated in figures 7, 8 and 9, which give the three steps of the graph construction for our example: the first step corresponds to the analysis of the SEQ2 node, the second one to the analysis of the PAR node and the last one is related to the SEQ1 node.

• SEQ2 defines a sequence schedule without any time attribute or contextual information: the end node of the subgraph resulting from the first operand must merge with the begin node of the subgraph resulting from the second operand. On this merged node, the source table must be extended with the information linking the end point of the first operand and the begin point of the second operand, together with the line number of that sequential operator in the source document (see figure 7).


Figure 7. Graph construction for SEQ2 node.

Figure 8. Graph construction for the PAR node.

Figure 9. Graph construction for the higher SEQ1 node.

• PAR is a parallel object which ends as soon as its SEQ2 operand has finished (because of the endsync attribute). This semantics is recorded in the causality table associated with the ending node of PAR. In addition, the recursive call of the graph generation method on its D child results in the information that the begin of D must be equal to the end of A. Therefore, the two arcs A and D must be put in sequence, with the appropriate corresponding information in the source table (see figure 8). Finally, the begin points (resp. end points) of the sequence A followed by D and of SEQ2 must be merged.
• SEQ1 is evaluated similarly to SEQ2 and results in the graph of figure 9.
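The node-merging step for a sequence can be sketched as follows. This is an illustrative reconstruction, not Kaomi's actual code; the TimeNode, Arc and GraphBuilder names and the string encoding of the source table are assumptions:

```java
// Sequence composition in the time graph: the end node of the first
// operand's subgraph is merged with the begin node of the second one's,
// and the source table on the merged node records the sequential operator.
import java.util.ArrayList;
import java.util.List;

class TimeNode { final List<String> sourceTable = new ArrayList<>(); }

class Arc {                          // an arc = one object of the document
    final String name; final TimeNode begin, end;
    Arc(String name, TimeNode b, TimeNode e) { this.name = name; begin = b; end = e; }
}

class GraphBuilder {
    // returns the composite arc covering a sequence of two operands; in a
    // full implementation, references to second.begin would be redirected
    // to the merged node
    static Arc seq(Arc first, Arc second, int sourceLine) {
        TimeNode merged = first.end;
        merged.sourceTable.add("seq at line " + sourceLine + ": end(" + first.name
                + ") = begin(" + second.name + ")");
        return new Arc("SEQ(" + first.name + "," + second.name + ")",
                       first.begin, second.end);
    }
}
```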

Figure 10 shows a screen dump of the Smil authoring environment. The author can play the current version of the document in the execution view. He can stop at any interesting time point to modify the scenario or the spatial placement. Authoring can be performed through the four basic views: the execution view, the object view, which displays the set of document objects using the hierarchy as a filter, the scenario view and the textual view. Attribute values can be set or modified using the associated form. Parallel and sequential operators can be placed by selecting the objects in the execution view or in the object view.


Figure 10. The Smil authoring environment built with Kaomi.

For that purpose a fictive root is automatically created in the object view in order to place every new object as a child before any further structuring. Synchronization between views can help in understanding the current scenario before performing a new editing operation. In our example, the scenario view properly displays the sequential placement of objects A and D (because of the attribute begin="id(A)(end)" on the D object) even though these objects are involved in a parallel construction, as shown by the corresponding hierarchical display of the object view.

4.3. Illustration with the design of the Madeus editor

Madeus is an authoring environment for multimedia documents which is the outcome of research work performed in the Opera project since 1994 [10]. It is based on a constraint language used to specify multimedia documents. We present here the version of this authoring environment that has been developed with the Kaomi toolkit.

The author of a Madeus document can describe the spatial and temporal organization by setting constraints between basic or composite objects. A composite object is a group of basic and/or other composite objects (a hierarchical definition). Constraints can express spatial and time synchronization such as: two videos must be vertically centered and must be presented during the same period of time. A spatial formatter and a temporal formatter compute the position of media objects in both the spatial and the time dimensions. Madeus uses classical spatial constraints such as align, center and shift, on both vertical and horizontal axes. Once a constraint is set, it holds during the whole presentation.


Time constraints used in Madeus are based on Allen's algebra [1]: Madeus uses the operators Equals, Starts, Before, etc. Basic objects of the scenario are associated with a range of possible durations, which can be defined by the author himself or automatically assigned by the system (for instance, the interval 5′′ to 120′′ for images).

Madeus uses a textual format for storing the declarative specification of the hierarchical, spatial, time and navigational organization of multimedia documents. A document has the same structure as a composite object: a list of basic and/or composite objects. Spatial, time and style attributes can be attached to basic objects (duration bounds of an object, font for a textual object, etc.). In addition, Spatial-Rel and Temporal-Rel attributes can be attached to composite objects: they contain the list of temporal and spatial relations specified between the components of the composite object. Another kind of attribute is the hypermedia link, represented in an HTML-like way where the destination anchors are designated using their URLs.

As the Madeus format is described by an XML DTD, we have used the XML parser of Kaomi. We have specified the set of semantic actions in order to build the appropriate hierarchical structure and the time graph with its two dependency tables.

The graph structure generation is mainly based on a set of equivalence rules between Allen's relations and time-point equalities. For instance, the relation A Starts B is translated into the equality between the start points of the two objects. Each object of the document is associated with an arc labeled with its range of possible durations. Then, each time a new relation is parsed, the rules specify which nodes must be fused and which delay must be inserted between two nodes. The source table associated with the nodes involved in the relation can also be produced thanks to these equivalence rules.
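Such equivalence rules can be sketched as a simple translation table. Only three of Allen's thirteen relations are shown, and the encoding of point equalities as strings is purely illustrative:

```java
// Equivalence rules between Allen relations and time-point constraints.
// "Before" requires a delay to be inserted between the two nodes.
import java.util.List;

class AllenRules {
    static List<String> translate(String a, String rel, String b) {
        switch (rel) {
            case "Starts":
                return List.of("start(" + a + ") = start(" + b + ")");
            case "Equals":
                return List.of("start(" + a + ") = start(" + b + ")",
                               "end(" + a + ") = end(" + b + ")");
            case "Before":
                return List.of("end(" + a + ") + delay = start(" + b + ")");
            default:
                throw new IllegalArgumentException("unsupported relation " + rel);
        }
    }
}
```

In the graph, an equality means that the two time points are fused into a single node, while the "Before" rule inserts a delay arc between them.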

No information needs to be inserted in the causality table, since the Madeus language is based on the idea that the duration of each object can be computed before the presentation of the document.

Figure 11 below shows the four open views of the same document as the example used for the Smil-Editor environment. The source syntax is obviously different (see the Relations part of the Textual view) and the hierarchical structure is not defined in terms of temporal composition but simply of object grouping.

The editing operations are mainly inherited from the Kaomi toolkit. The result is close to what is obtained in the Smil-Editor environment. Interesting new features are the incremental consistency checking and advanced scenario view manipulations:

• After each editing operation, the temporal formatter is called in order to verify that this operation is consistent with the previous set of constraints and to compute object durations so that they meet the constraints. If this formatting phase is successful, the resulting graph can be used by the scheduler; if an inconsistency has been detected, a diagnosis is returned to the author, who has to change his temporal specification.
• The Madeus scenario view is not only the visualization of the object temporal placements issued from the last execution, as is the case in Smil-Editor. It also provides the author with a way to grasp the set of consistent solutions: from the display of one consistent solution, other solutions can be reached by direct manipulation [9]. In other words, the author can select an object and move it along the temporal axis in order to dynamically see when the selected object can be played. With real-time performance, the scenario view computes the displacements of the other objects in order to react to the "moving object". Only consistent solutions can be reached, i.e., the author cannot move an object to a position that the object could never reach with the given set of constraints.

Figure 11. The Madeus authoring environment.

Thanks to the direct editing facilities given by Kaomi and this incremental consistency checking, Madeus provides a comfortable end-user authoring tool.
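Madeus performs this consistency check with the PC-2 algorithm. Purely as an illustration of what inconsistency detection involves, the sketch below checks a tiny simple temporal network with an all-pairs shortest-path pass, where a negative cycle signals that the duration constraints cannot all be satisfied; the class name and matrix encoding are assumptions:

```java
// Illustrative consistency check on a simple temporal network (STN):
// dist[i][j] is the maximum allowed value of (t_j - t_i), so the bound
// "5 <= t1 - t0 <= 10" is encoded as dist[0][1] = 10 and dist[1][0] = -5.
// A negative cycle after the shortest-path pass means inconsistency.
class StnChecker {
    static boolean consistent(int[][] dist) {
        int n = dist.length;
        int[][] d = new int[n][n];
        for (int i = 0; i < n; i++) d[i] = dist[i].clone();
        // Floyd-Warshall: tighten all pairwise bounds
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (d[i][k] + d[k][j] < d[i][j]) d[i][j] = d[i][k] + d[k][j];
        for (int i = 0; i < n; i++)
            if (d[i][i] < 0) return false;    // negative cycle: inconsistent
        return true;
    }
}
```

For example, requiring a duration of at least 5 and at most 3 yields the cycle 3 + (-5) = -2 < 0, which is the kind of diagnosis returned to the author.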

4.4. Evaluation

We have experimented with the Kaomi toolkit internally in our research project for the development of Madeus and Smil-Editor. The Madeus authoring environment has also been evaluated by two companies: the feedback has been very encouraging and has validated the basic choices of the authoring services provided by the toolkit (synchronized multiple views, direct editing in the execution view).

In the table below, we display the current state of the implementation (in terms of number of source lines) of the two authoring environments created with Kaomi that have been described in the previous sections. It is interesting to notice the ratio between the whole code of each authoring environment and the specific code it requires: only 5% of specific code has been written. That means that the architectural choices taken in the toolkit have allowed us to meet our objectives:

• A clear separation of language-dependent parts from "generic" services (media players, basic scheduling policy, multiple view management, etc.).
• Code reuse through ad-hoc object-oriented programming.
• Extensibility through simple mechanisms.

The toolkit currently has approximately 75,000 lines of code. It implements the four previously mentioned views (ScenarioView, ExecutionView, ObjectView, TextualView). Each view provides the set of basic authoring facilities described in Section 2.1.

The specific code written for the Madeus application is devoted to three main functions: parsing the Madeus source code, saving the document in the Madeus format and specific editing operations. These extensions manage the set of Madeus-specific temporal and spatial relations. For that purpose a constraint solver has been added in order to handle consistency checking and temporal formatting. This solver is based on the PC-2 algorithm [6].

Similarly, the configuration files defined for the Smil-Editor authoring tool contain the interpretation of the Smil-specific temporal relations (par, seq) and temporal attributes (begin, end). This application allows the direct authoring of Smil regions in the execution view by the direct placement of the objects in that view.

                 Kaomi toolkit           Specific modules
                 (nb. of source lines)   (nb. of source lines)    Total     Ratio

  Madeus              77383                   4250                81633     5.2%
  Smil-Editor         77383                   3224                80607     3.9%

5. Conclusion

The Kaomi toolkit outlined in this paper is characterized by its intensive use of object-oriented (Java) principles for supporting the definition of multimedia environments. More specifically, Kaomi contributes to the multimedia authoring field in the following ways:

• It provides genuine and effective end-user authoring tools through direct and multiview editing paradigms.
• It provides a set of shared authoring services (multi-document and multiview management, XML parser, spatial and temporal formatters, media managers).
• It allows ad-hoc services to be built, either as new services or as extensions of existing services in the toolkit.
• It offers a multimedia presentation engine that is (as far as possible) independent from the specification language. Thanks to the graph structure of Kaomi (arcs labeled with duration intervals and with a causality table), the scheduler can be used from predictive execution (using a time formatter to set durations, as for Madeus) to reactive execution (with the causality table, time dependencies can be conveyed between objects, as in the Smil-Editor tool).

It is worth noting that this toolkit aims at allowing the creation of authoring tools that really suit the authoring languages for which they have been built, thanks to the view extension facility and the automatic adaptation of attribute forms. It is not a centralized approach with a single tool into which different formats are imported. The latter method induces a great deal of semantic loss due to import/export functions and differences between the underlying models.

Kaomi can be used for creating conversion tools between multimedia formats, with the same semantic limitations as the import/export functions. However, Kaomi's internal tree and graph structures are able to provide useful support for such services: a number of conversion needs can be met through tree transformation methods such as those provided by XSL-based transformation mechanisms [22] for structured documents.

In this paper, we have described two environments created with Kaomi. They reflect two different approaches to multimedia specification. We are also currently developing a third one, designed for an event-based MHEG language, with the objective of assessing authoring needs for event-based specifications.

In the next steps of this study, intensive experiments will have to be carried out both at the designer level (for example, enhancing the authoring of spatial properties for Smil) and at the end-user level.

References

1. J.F. Allen, "Maintaining knowledge about temporal intervals," CACM, Vol. 26, No. 11, pp. 832–843, 1983.
2. M.C. Buchanan and P.T. Zellweger, "Automatic temporal layout mechanisms," in Proc. ACM Multimedia 93 Conference, Anaheim, August 1993, pp. 341–350.
3. D.C.A. Bulterman, L. Hardman, J. Jansen, K.S. Mullender, and L. Rutledge, "A GRaphical INterface for creating and playing Smil documents," in Proc. WWW-7, Computer Networks and ISDN Systems 30, Brisbane, Australia, April 1998, pp. 519–529.
4. A. Caloini, D. Taguch, K. Yanoo, and E. Tanaka, "Script-free scenario authoring in MediaDesc," in Proc. ACM Multimedia 98 Conference, Bristol, UK, September 1998, pp. 272–278.
5. CCETT, MhegDitor, http://www.ccett.fr/mheg/converter.htm, 1998.
6. R. Dechter, I. Meiri, and J. Pearl, "Temporal constraint networks," Artificial Intelligence, Vol. 49, pp. 61–95, 1991.
7. ISO/IEC JTC1/SC18/WG8 N1920, Information Technology: Hypermedia/Time-based Structuring Language (HyTime), 2nd ed., ISO/IEC, August 1997. http://www.ornl.gov/sgml/wg8/docs/n1920/html/n1920.html.
8. M. Jourdan, N. Layaida, C. Roisin, L. Sabry-Ismail, and L. Tardif, "Madeus, an authoring environment for interactive multimedia documents," in Proc. 6th ACM Multimedia'98, Bristol, 12–16 September 1998.
9. M. Jourdan, C. Roisin, and L. Tardif, "Multiviews interfaces for multimedia authoring environments," in Proc. 5th Conference on Multimedia Modelling, Lausanne, 12–15 October 1998.
10. M. Jourdan, C. Roisin, L. Tardif, and L. Villard, "Authoring SMIL documents by direct manipulations during presentation," World Wide Web, Balzer Science, Vol. 2, No. 4, December 1999.
11. M.Y. Kim and J. Song, "Multimedia documents with elastic time," in Proc. ACM 95 Multimedia Conference, San Francisco, November 1995, pp. 143–154.
12. Macromedia, Director 6, http://www.macromedia.com/, 1998.
13. K. Mayer-Patel and L. Rowe, "Design and performance of the Berkeley continuous media toolkit," in Proc. Multimedia Computing and Networking, M. Freeman, P. Jardetzky, and H.M. Vin (Eds.), San Jose (USA), 1997, SPIE 3020, pp. 194–206.
14. E. Megalou and T. Hadzilacos, "On conceptual modeling for interactive multimedia presentations," in Proc. Multimedia Modeling'95, Singapore, 1995.
15. T. Meyer-Boudnik and W. Effelsberg, "MHEG explained," IEEE Multimedia Magazine, Vol. 2, No. 1, pp. 26–38, 1995.
16. V. Quint and I. Vatton, THOT, A Structured Document Editor, INRIA, http://www.inrialpes.fr/opera/Thot.en.html, 1997.
17. Real Networks, http://www.real.com/solutions/rbn/index.html.
18. D.C.M. Saade, L.F.G. Soares, F.R. Costa, and G.L. Souza-Filho, "Graphical structured-editing of multimedia documents with temporal and spatial constraint," in Proc. 4th Conference on Multimedia Modelling, World Scientific Publishing, Singapore, 18–19 November 1997, pp. 279–295.
19. P. Senac, M. Diaz, A. Leger, and P. De Saqui-Sannes, "Modeling logical and temporal synchronization in hypermedia systems," IEEE Journal of Selected Areas on Communications, Vol. 14, No. 1, pp. 84–103, 1996.
20. SUN, Java Media Framework, http://java.sun.com/products/index.html.
21. G. Van Rossum, J. Jansen, K. Mullender, and D.C.A. Bulterman, "CMIFed: A presentation environment for portable hypermedia documents," in Proc. ACM 93 Multimedia Conference, California, USA, 1993.
22. W3C Note, Extensible Specification Language (XSL), http://www.w3.org/TR/NOTE-XSL.html, 27 August 1997.
23. W3C Recommendation, Synchronized Multimedia Integration Language (Smil) 1.0 Specification, http://www.w3.org/TR/REC-smil, 15 June 1998.

M. Jourdan received a PhD degree in computer science from the University of Grenoble in 1994. She has been working as a permanent researcher at INRIA (French Institute of Research in Control and Computer Science) since 1995. She is interested in multimedia document authoring and more precisely in the use of constraints in such environments. She participates in the design of the SMIL Boston W3C standard, mainly in the module about timing and synchronization.

C. Roisin received a PhD degree in computer science from the University of Grenoble in 1984. She has been working as an assistant professor at the University of Grenoble since 1997 and works at INRIA (French Institute of Research in Control and Computer Science). Her research interests are in the field of interactive structured and multimedia document processing, especially in document modeling, structure transformation, temporal synchronization and their application in the conception of authoring tools for the creation of multimedia documents.

L. Tardif is currently a PhD student in computer science. His research interests are in the field of interactive multimedia document authoring, especially multiview authoring systems and constraint solving. He is also designing the Kaomi toolkit for building multimedia authoring environments.