
Modeling Negotiation Dialogs

Jan Alexandersson, Ralf Engel, Michael Kipp, Stephan Koch, Uwe Küssner, Norbert Reithinger, and Manfred Stede

DFKI GmbH, Saarbrücken, Germany

Technische Universität Berlin, Germany

Abstract. For various purposes in the Verbmobil system it is necessary to build a full model of an unfolding dialog on a suitably abstract level of representation. The basis of this model is formed by representations of the individual utterances, whose content we capture by a combination of dialog act and propositional content. Our hierarchy of dialog acts was used to annotate 21 CD-ROMs from the Verbmobil corpus, and the experience gained with the framework influenced standardization efforts in the international scientific community. On the side of propositional content, particular attention was given to the representation of temporal expressions, due to the application domains of Verbmobil.

1 Introduction

The modules in Verbmobil that exploit contextual and dialog knowledge all use a common base of representations, which is introduced in this chapter. As the basic building block for describing the underlying intentions of utterances, we defined a hierarchical scheme of dialog acts. With this theoretical background we tackled the practical problem of devising a framework that on the one hand provides all the important facets needed in our modules, and on the other hand can be operationalized for various purposes in the Verbmobil system. Thus, in section ?? we demonstrate how we use a statistical classifier, trained on a corpus annotated with dialog acts, to automatically compute the acts on the output of the speech recognizers; in chapter ?? we introduce a framework of symbolic dialog act recognition as part of the context evaluation module.

Besides the dialog act, an utterance is characterized by its propositional content, for which we have developed representations on various levels of granularity, used for different purposes within the system. In section 3 below, we introduce a relatively coarse-grained scheme that serves as the interface between the context evaluation and the dialog module, and that is also used for generating dialog summaries. Chapter ?? explains how more fine-grained representations are used for various disambiguation tasks. Due to Verbmobil's application domain, the treatment of temporal expressions deserves particular attention when representing propositional content, and we outline our `temporal expression language' below in section 3.3.

2 Dialog Acts

Dialog acts, also called dialog moves or illocutionary acts, describe basic elements of human communication rather than words or sentences. They are used to mark important characteristics of utterances, indicate the role of an utterance in a specific dialog, and make the relationship between utterances more obvious. Since in general it is not possible to translate all aspects of a given utterance (Schmitz, 1997, pp. 10ff), we must consider the central requirements of a translation system. Thus, our representation must preserve the intended interpretation of the speaker with respect to the goal of the dialog. This information is exploited in Verbmobil for disambiguation purposes as a service to the semantics-based transfer module, as the basis for a dialog-act-based robust transfer, and to generate summaries.

Most of the dialog act schemes used in implemented natural language processing systems nowadays are task-oriented, and so is the one used in Verbmobil. This is due to the need to reduce the number of acts to a manageable size. Also, orthogonal categories like controlling the dialog and promoting the task, which might or might not be realized in one communicative action, are merged in our scheme. This eases handling and processing significantly.

The information content (the semantics) of task-oriented dialogs can basically be split into task- and domain-related information, and information that addresses the communication process. To guarantee generality and therefore more flexibility, both information levels should be kept separate in the choice of tag names.

2.1 The Dialog Act Scheme

Verbmobil's dialog act scheme (Alexandersson et al., 1998) was initially developed for scheduling dialogs and was revised twice, both to adapt it to other domains and to meet the requirements of processing and of annotating the Verbmobil corpus.

The scheme has the shape of a decision tree in order to ease comprehension and processing. It clarifies dependencies and relationships between the different acts (see Figure 1). The acts are grouped in three sets: one set describes control of the dialog, another management of the task, and the third its direct advancement. The 32 nodes below the group labels are acts that are actually used for annotation and processing in Verbmobil.

For the current scheme we wrote a detailed manual comprising instructions for taking the decisions at the tree's nodes, and concise definitions and examples/counterexamples for each dialog act in the three languages relevant for Verbmobil (Alexandersson et al., 1998). Figure 2 shows the main definition for ACCEPT. The manual also contains guidelines for the segmentation of turns, i.e., for the boundaries where a dialog act can be placed.

2.2 Annotation

[Figure 1: tree diagram; recoverable node labels: TOP, DIALOGUE_ACT, CONTROL_DIALOGUE, MANAGE_TASK, PROMOTE_TASK]

Figure 1. Hierarchical structure of all dialog acts for Verbmobil (phase 2). Actual tokens are printed boldface, others are abstract units.

For training and test purposes, we annotated 21 German, English, and Japanese CD-ROMs containing 1505 dialogs from the Verbmobil data collection (cf. chapter ??) with 76210 dialog acts, exploiting the Partitur Format provided by the data collection. We developed a tool (see figure 3) that provides a comfortable environment for annotation. The coders are students, most of them native speakers of the respective language, with or without linguistic background.

Training a new coder starts with reading the manual, followed by supervised coding of a number of training dialogs. Thereafter, the coder annotates a set of already annotated dialogs on his or her own, and the resulting coding is compared with the original annotation. Once a tolerable level of reliability is reached, normal coding begins. This training method, accompanied by regular meetings where critical cases are discussed, provides a continuous view of the overall agreement of the coders and allows us to identify classes where an annotator seems to misunderstand the definitions of the manual.

When labeling a dialog segment with a dialog act, we follow the tree from the top towards the leaves. At every branching node we answer a question whose answer decides which branch to follow. If an answer cannot be given, the traversal process stops. This means that dialog segments can also be labeled with acts that are not leaves of the hierarchy, i.e. with more abstract dialog acts.
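The traversal just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `label_segment` function, the keyword-based questions, and the placement of FEEDBACK_POSITIVE under PROMOTE_TASK are all hypothetical and are not taken from the annotation manual; only the stopping behavior mirrors the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Node:
    label: str
    # Returns the label of the child to follow, or None if no answer can be given.
    question: Optional[Callable[[str], Optional[str]]] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def label_segment(root: Node, segment: str) -> str:
    """Walk from the root towards the leaves; stop where no answer is available."""
    node = root
    while node.question is not None:
        answer = node.question(segment)
        if answer is None or answer not in node.children:
            break  # undecidable: keep the more abstract act reached so far
        node = node.children[answer]
    return node.label


# Hypothetical keyword heuristics standing in for the manual's questions.
POSITIVE = ("perfect", "fine", "good")


def ask_group(seg: str) -> Optional[str]:
    if "hello" in seg:
        return "CONTROL_DIALOGUE"
    if any(word in seg for word in POSITIVE):
        return "PROMOTE_TASK"
    return None


def ask_feedback(seg: str) -> Optional[str]:
    return "FEEDBACK_POSITIVE" if any(word in seg for word in POSITIVE) else None


def ask_accept(seg: str) -> Optional[str]:
    return "ACCEPT" if "that would be" in seg else None


# Assumed toy fragment of the hierarchy, not the real 32-act tree.
TREE = Node("DIALOGUE_ACT", ask_group, {
    "CONTROL_DIALOGUE": Node("CONTROL_DIALOGUE"),
    "PROMOTE_TASK": Node("PROMOTE_TASK", ask_feedback, {
        "FEEDBACK_POSITIVE": Node("FEEDBACK_POSITIVE", ask_accept,
                                  {"ACCEPT": Node("ACCEPT")}),
    }),
})
```

With this toy tree, `label_segment(TREE, "that would be perfect")` reaches the leaf ACCEPT, whereas `label_segment(TREE, "good")` stops at the inner node FEEDBACK_POSITIVE because the last question cannot be answered.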


ACCEPT

Upper level dialogue act: FEEDBACK_POSITIVE

Dialogue phase: NEGOTIATION

Related propositional content: can contain anaphoric or explicit reference to the accepted proposition, e.g. a date or duration, a location, a selection of transportation or accommodation, an action (especially a commitment)

Definition: With an utterance expressing an ACCEPT the speaker explicitly accepts a proposal. ACCEPT is a special case of FEEDBACK_POSITIVE. Note that only proposals can be accepted, such as those brought forward in the dialog acts SUGGEST, DEFER, OFFER, COMMIT, REQUEST_COMMIT.

(...)

English Example: cdrom13, r423c

ANV001: (...) would that be okay <;quest> <#> <#Klicken> <#> <#> <;seos> (REQUEST_COMMENT)

RMW002: <:<#> <#Klicken> that would be perfect <;period> <A> <;seos> (ACCEPT)

(...)

Figure 2. Partial definition of the dialog act ACCEPT

Figure 3. ANNOTAG annotation tool for dialog act coding


To be useful for training and test purposes in a speech processing system, our hand-annotated dialogs have to fulfill very high quality standards. To assess this, we carried out a number of reliability studies for the segmentation and annotation of the data. To measure the agreement between feature-attributed data sets, the κ coefficient is of outstanding importance (for details see Carletta (1996)). In the field of content analysis, a value of κ > 0.8 is considered good reliability for the correlation between two variables, while 0.67 < κ < 0.8 still allows tentative conclusions to be drawn. During training and data annotation, human coders repeatedly annotate a set of dialogs. Usually, the utterance labels coincide in more than 80 % of the cases. The κ value of over 0.8 shows that dialog acts can be coded quite reliably.
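To make the agreement measure concrete, the following sketch computes a κ value for two coders who labeled the same segments. It implements Cohen's variant, in which chance agreement is estimated from each coder's own label distribution; whether this or another variant was used in the reliability studies is an assumption, and the function name and the label sequences are purely illustrative.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Chance-corrected agreement between two coders on the same segments."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both codings must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of segments with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance of coinciding given each coder's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented toy codings of five segments.
coder_a = ["ACCEPT", "ACCEPT", "SUGGEST", "REJECT", "SUGGEST"]
coder_b = ["ACCEPT", "ACCEPT", "SUGGEST", "SUGGEST", "SUGGEST"]
kappa = cohen_kappa(coder_a, coder_b)
```

In this toy case the coders agree on 4 of 5 segments (0.8 observed agreement), chance agreement is 0.4, and κ is therefore (0.8 - 0.4) / (1 - 0.4), roughly 0.67. Raw agreement of 80 % alone does not guarantee κ above 0.8; the chance term depends on how skewed the label distribution is.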

2.3 Standardization Efforts

The approach to the definition and annotation of intentions in Verbmobil influenced and contributed significantly to standardization efforts in this area. We collaborated in various groups and projects to compare and disseminate our approach. Most notable is the Discourse Resource Initiative (DRI¹), where a community effort is underway to compare dialog acts and come up with a multi-dimensional scheme for annotation (Carletta et al., 1997; Core et al., 1999). The level of dialog acts was also treated in the EU-funded project MATE². We contributed in particular the comparison of various schemes and best-practice experience (Klein, 1999). Since Verbmobil's scheme and annotated corpus are among the largest and most stable world-wide, they are frequently used as an example of good practice.

3 Propositional Content

We have seen in the previous section how the illocutionary force of an utterance can be represented by a hierarchy of dialog acts. Apart from this, we need to represent its subject matter, i.e. its propositional content. There are two basic constraints underlying the modeling of propositional content. First, Verbmobil is task-oriented, and it is adequate to model only those portions of information that are relevant within the given scenario: appointment scheduling and travel planning. The main concepts in this domain are detailed representations of time and locations, and of what is needed to make a journey or trip. This includes means of transport, accommodation facilities, and also spare time activities.

Second, in this setting dialog partners are trying to reach an agreement, to finally schedule a meeting or a trip, and to specify the details. For translation, the crucial goal is to preserve the speaker's communicative intention. We therefore abstract here from stylistic properties of utterances: while a speaker has various possibilities to express her preferences or manifest her goals, from a goal-oriented perspective it makes no difference for the hearer which of the following utterances she chooses:

(1) a. and I would think , we get there by plane. (q001n ZDM006)³

b. let us take plane . (q020n RIO017)

c. I would rather take the plane . (m838arr1 016 QXQ)

d. I would like to fly to Hanover (m875bch1 010 QQY)

¹ Visit http://www.dfki.de/dri for a starting point.

² Visit http://mate.nis.sdu.dk

What all utterances in example (1) have in common is the suggestion to take the plane to travel somewhere. The last example adds information about the destination, but apart from this we should assign the same representation.

3.1 Ontology

In negotiation dialogs of appointment scheduling and travel planning, activities and plans are very important. This includes date and time specifications, as well as locations, institutions, companies, etc. In general, objects and situations can be seen as the basic categories in natural language systems. While temporal expressions and locations correspond to abstract or concrete objects, under situations we subsume events and activities. In addition, qualities are used to describe the features of an object or situation. These are again domain dependent; we model only the relevant aspects. Examples of qualities are prices of hotel rooms, travelling first or second class, etc. For illustration, a part of the ontology is shown in figure 5. In addition to the modeled hierarchy, the concepts are related to each other by specific roles. For example, the concept journey consists of a move to somewhere, a stay there, the return, as well as date and duration information. The concepts are all part of a hierarchical domain model, which is described in chapter ??.
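As a rough illustration of how such a concept hierarchy with roles can be queried, the sketch below stores parent links in a dictionary and tests subsumption by walking upwards. The concept names follow figure 5, but the concrete parent links, the MOVE concept, and the role names for JOURNEY are assumptions made for this example, since the text does not spell them out.

```python
from typing import Dict, List

# Assumed parent links for a fragment of the hierarchy (child -> parent).
PARENT: Dict[str, str] = {
    "OBJECT": "TOP",
    "SITUATION": "TOP",
    "QUALITY": "TOP",
    "ABSTRACT_OBJECT": "OBJECT",
    "CONCRETE_OBJECT": "OBJECT",
    "TIME": "ABSTRACT_OBJECT",
    "LOCATION": "CONCRETE_OBJECT",
    "GEO_LOCATION": "LOCATION",
    "CITY": "GEO_LOCATION",
    "EVENT": "SITUATION",
    "ACTION": "SITUATION",
    "JOURNEY": "ACTION",
    "MOVE": "ACTION",
    "MOVE_BY_PUBLIC_TRANSPORTATION": "MOVE",
    "MOVE_BY_PLANE": "MOVE_BY_PUBLIC_TRANSPORTATION",
    "MOVE_BY_RAIL": "MOVE_BY_PUBLIC_TRANSPORTATION",
}

# Hypothetical role names for the journey concept described in the text.
ROLES: Dict[str, List[str]] = {
    "JOURNEY": ["has_move", "has_stay", "has_return", "has_date", "has_duration"],
}


def ancestors(concept: str) -> List[str]:
    """All concepts above `concept`, from its parent up to TOP."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or lies above it in the hierarchy."""
    return general == specific or general in ancestors(specific)
```

Under these assumed links, `subsumes("SITUATION", "MOVE_BY_PLANE")` holds while `subsumes("OBJECT", "MOVE_BY_PLANE")` does not, which reflects the split between objects and situations described above.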

[Figure 4: graph diagram. An action a1 of type move_by_plane is linked by has_agent to two persons d1 and d2, and by a dotted has_destination relation to a city l1.]

Figure 4. Representation of the propositional content in example (1).

Considering example (1), the propositional content is represented as an action a of two objects (the dialog partners), and a is classified as a move by plane. This is shown in figure 4. In example (1d) we should add the destination location Hanover, as indicated with the dotted has_destination relation. There are some other relations, e.g. for the name of the city, that are left out for brevity.

³ The examples originate from the Verbmobil data corpus. See chapter ?? in this book.


[Figure 5: concept hierarchy diagram; recoverable node labels: TOP, OBJECT, ABSTRACT_OBJECT, CONCRETE_OBJECT, TIME, AGENTIVE, LOCATION, GEO_LOCATION, NONGEO_LOCATION, CITY, ROOM, SEAT, HOTEL, COMPANY, INSTITUTION, SITUATION, EVENT, ACTION, MEETING, JOURNEY, STAY, SHOW, MOVE, MOVE_BY_PUBLIC_TRANSPORTATION, MOVE_BY_PLANE, MOVE_BY_RAIL, QUALITY, ACTION_QUALITY, PRICE_CLASS, ROOM_QUALITY]

Figure 5. Part of the ontology for the representation of propositional content.

3.2 Topics

Apart from the content of a single utterance, our models have to account for larger portions of the dialog. Verbmobil dialogs focus in particular on four specific topics:

Scheduling: The dialog partners try to schedule a date for a cooperative action. This might be a meeting, a trip, the visit of a company, or the like. Start and end time and the location of the planned action will be included.

Traveling: In case of travel planning, negotiations concern the trip itself: when to leave and which means of transport to choose, reserving seats, etc.

Accommodation: Also for travel planning, accommodation is arranged, hotel rooms reserved, etc.

Entertainment: Last but not least, dialog partners may plan some activities for their spare time, like going to the cinema or a theater, or having dinner together.

In cooperative dialogs of the Verbmobil corpus, one of these major categories is in focus over a couple of utterances, usually until this topic is closed. This is important since utterances are often ambiguous when regarded in isolation. Consider for example the German utterance:

(2) also ich werde einen Platz gleich reservieren für Sie . (g014arr1 025 ABE 150010)

The German word Platz has several meanings. The most prominent one leads to the translation “well, I will reserve a seat for you”, thinking of a reservation in a train or plane. But in the context of arranging accommodation, Platz should be translated as a “(hotel) room.” For disambiguation purposes, we defined a more fine-grained set of topics that further specializes the four categories just introduced; see chapter ??.
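A minimal sketch of how a tracked topic can steer such a lexical choice is given below. The cue words, the lookup tables, and the rule that the topic simply carries over until a new cue appears are hypothetical simplifications for illustration; they are not the mechanism of the Verbmobil modules.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cue words that open a topic.
CUES: Dict[str, str] = {
    "hotel": "accommodation",
    "zimmer": "accommodation",
    "flug": "traveling",
    "zug": "traveling",
}

# Topic-dependent readings, plus the most prominent reading as default.
READINGS: Dict[Tuple[str, str], str] = {
    ("Platz", "traveling"): "seat",
    ("Platz", "accommodation"): "room",
}
DEFAULT_READING: Dict[str, str] = {"Platz": "seat"}


def track_topics(utterances: List[str]) -> List[Optional[str]]:
    """Assign each utterance the topic in focus; a topic persists until replaced."""
    topic: Optional[str] = None
    topics = []
    for utterance in utterances:
        for token in utterance.lower().split():
            if token in CUES:
                topic = CUES[token]
                break
        topics.append(topic)
    return topics


def translate(word: str, topic: Optional[str] = None) -> str:
    """Pick the topic-specific reading, falling back to the default reading."""
    return READINGS.get((word, topic), DEFAULT_READING.get(word, word))


dialog = [
    "wir brauchen noch ein hotel",
    "also ich werde einen platz gleich reservieren",
]
topics = track_topics(dialog)
```

Here the second utterance inherits the topic opened by the first, so `translate("Platz", topics[-1])` yields "room", while `translate("Platz")` without context falls back to "seat".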


3.3 Temporal Expression Language

In the application domains of appointment scheduling and travel planning, the majority of utterances contain temporal expressions, which are often central to the goal of the dialog and whose correct translation is therefore especially important to the overall success of interpreting the dialog. Appointment scheduling, in particular, typically involves a sequence of proposals and rejections, or a successive “zooming in” on some particular date. Representing and reasoning with temporal expressions has been addressed in earlier research, for instance in the COSMA system (Busemann et al., 1997), but not for the purposes of translation. Within Verbmobil, the context evaluation module (see chapter ??) monitors temporal expressions and tracks how dates and times are negotiated in the dialog. In designing representations that enable some temporal reasoning, three constraints have to be respected:

– Natural language offers many ways to refer to a particular date/time, and we have to reduce them all to a single canonical representation.

– As we are dealing with spontaneous speech, we cannot rely on nice and clean input, but on the contrary have to reckon with incomplete or erroneous information.

– Context evaluation processes input VITs (see chapter ??) from several languages, but the reasoning procedures are to be the same, so at some point we need interlingual representations for our purpose.

Under these circumstances, it is not feasible to map the VIT directly to a canonical representation that allows for the necessary computations: the gap between such a level and the VIT input is too wide. Therefore, we introduce an intermediate level of representation, the Verbmobil `Temporal Expression Language' (TEL). Its development target was thus a language whose terms are relatively easy to build from a VIT, and that can be further processed by deeper referential analysis. This second step maps TEL expressions to canonical representations of temporal information (intervals on the time line) with a clear semantics. In the following, we give a very brief sketch of the representation levels; aspects of the associated reasoning are described in chapter ??. A detailed explanation of the representation levels, including the full BNF for the TEL language, is provided in Stede et al. (1998); for a specification of the formal semantics, see Endriss (1998).

In order to be constructed from a VIT comfortably, TEL expressions are relatively close to natural language expressions; for example, the third Monday after Easter is represented as [after(3,dow:mon,holiday:easter)]. Basically, the idea is to abstract from surface-linguistic idiosyncrasies but to mirror the underlying constructions. The shape of a TEL expression is a list of TYPE:VALUE pairs; a phrase like on Tuesday at eleven o'clock is thus represented by this unordered list: [dow:tue, tod:11:00]. Based on an extensive corpus analysis, we identified five major classes of temporal expressions in English and German and defined representations for them.

– Simple: Clock times are normalized to the form hh:mm and thus abstract over many linguistic ways of expressing them. Time spans that can however be conceptualized as points (“pointlikes”) comprise part of day, day of week, part of year (seasons), week of year, week of month, day of month, month, year, and holidays. Example: ten past twelve on Monday = (tod:12:10, dow:monday)

– Modified: Qualifications such as early or late are mapped to corresponding terms with scope over a POINTLIKE. The modifier fuzzy is used for expressions like around Tuesday or about four o'clock.

– Spans: Time spans can be either DURATIONs referring to the length of an interval (Let's meet for two hours) and represented by unit and number, or open/closed intervals characterized by one or two points or pointlikes.

– Referenced: A time point can be characterized by a reference point and the distance from it. In the most frequent case, a day of week is used as a reference point. An English example is A week from Friday; in German it is also very common to say Freitag in drei Wochen (`Friday in three weeks').

– Counted: A time point or interval is identified by counting a unit or a day of week. There are several subclasses, one of which is an explicit reference span, in which weeks or days are counted: the last Saturday in March; the third week in May.

When processing a VIT, all the pieces of information pertaining to dates and times are combined into a single TEL expression, which then forms a part of the propositional content. For instance, given the VIT representing the utterance We could leave Munich on Sunday, maybe in the afternoon, say around three, the context evaluation module constructs the TEL term (dow:sun, pod:afternoon, around:tod:03:00) and attaches it as one attribute to the propositional content, as outlined above.
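The TYPE:VALUE shape of TEL terms lends itself to a simple data structure. The sketch below builds, prints, and parses such pairs; it is a toy illustration, not the TEL grammar of Stede et al. (1998). The class and function names, the small set of known types, and the treatment of a modifier as a prefix are assumptions derived only from the examples in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed subset of type tags, taken from the examples in the text.
KNOWN_TYPES = {"tod", "dow", "pod", "holiday"}


@dataclass(frozen=True)
class Pair:
    kind: str                       # e.g. "dow", "pod", "tod"
    value: str                      # e.g. "sun", "afternoon", "03:00"
    modifier: Optional[str] = None  # e.g. "around"

    def __str__(self) -> str:
        core = f"{self.kind}:{self.value}"
        return f"{self.modifier}:{core}" if self.modifier else core


def clock(hour: int, minute: int = 0) -> str:
    """Normalize a clock time to the form hh:mm."""
    return f"{hour:02d}:{minute:02d}"


def tel_term(pairs: Iterable[Pair]) -> str:
    """Render pairs in the parenthesized notation used in the example above."""
    return "(" + ", ".join(str(pair) for pair in pairs) + ")"


def parse_pair(text: str) -> Pair:
    """Parse 'tod:11:00' or 'around:tod:03:00' back into a Pair."""
    parts = text.strip().split(":")
    if parts[0] in KNOWN_TYPES:
        return Pair(parts[0], ":".join(parts[1:]))
    return Pair(parts[1], ":".join(parts[2:]), modifier=parts[0])


def same_expression(a: Iterable[Pair], b: Iterable[Pair]) -> bool:
    """TEL lists are unordered, so compare them as sets."""
    return frozenset(a) == frozenset(b)


leave_munich = [
    Pair("dow", "sun"),
    Pair("pod", "afternoon"),
    Pair("tod", clock(3), modifier="around"),
]
```

Rendering `leave_munich` with `tel_term` reproduces the term from the text, and `same_expression` treats [dow:tue, tod:11:00] and [tod:11:00, dow:tue] as equal because the list is unordered.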

For the purposes of shallow processing in Verbmobil, the TEL expression is sufficient as a representation of temporal information, and the dialog module (see chapter ??) uses it as such. In the system's deep processing line, on the other hand, the focus is on performing calculations as to the relative position of different temporal expressions. To enable such processes, TEL expressions need to be mapped to a time line, along which comparisons can be computed. Naturally, this involves contextual resolution processes, so that the referents of expressions like an hour later or the following week can be determined. For this step, we map the TEL terms to a second level of representation, which is amenable to semantic interpretation.

In order to deal with underspecification, we chose typed feature structures as the representation formalism, where any temporal expression is represented as an interval (following Allen (1984)). An interval description (ID) is a feature structure of type Id consisting of two date expressions, where date has a range of subtypes reflecting the presence or absence of the four attributes YEAR, MONTH, DAY, and TIME. Any combination of these corresponds to a distinct type, which allows for measuring the informativeness of a date expression. Types are organized in a hierarchy; as an example, the most specific type, tdmy date, is defined as follows:


tdmy date
    YEAR: year
    MONTH: month
    DAY: day
    TIME: time

The three types month, day, time are in turn composed structures. day has the two attributes dow (day of week) and dom (day of month), because it can be defined either way (am Montag versus am vierten). month is decomposed into moy and wom (week of month), and time into hour and minute.

When a TEL expression is built from a VIT, it is afterwards mapped to an ID on the basis of the previous context (in particular the dates and times already negotiated); IDs from subsequent utterances can then be compared in order to determine whether one expression specializes or generalizes an earlier one, or is a new proposal (see chapter ??).
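The comparison of underspecified date expressions can be sketched as follows. This is a simplified stand-in for the typed feature structures: attributes are atomic values rather than composed structures, a single date is compared instead of a full interval with two dates, and the subtype names other than tdmy date are formed by analogy. All names in the code are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

FIELDS = ("year", "month", "day", "time")


@dataclass(frozen=True)
class Date:
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None
    time: Optional[str] = None  # "hh:mm"

    def type_name(self) -> str:
        """Subtype name built from the attributes that are present."""
        letters = "".join(
            letter
            for letter, value in (("t", self.time), ("d", self.day),
                                  ("m", self.month), ("y", self.year))
            if value is not None
        )
        return f"{letters} date" if letters else "date"

    def informativeness(self) -> int:
        """Number of specified attributes."""
        return sum(getattr(self, name) is not None for name in FIELDS)


def specializes(new: Date, old: Date) -> bool:
    """True if `new` agrees with everything `old` specifies and adds information."""
    agrees = all(
        getattr(old, name) is None or getattr(old, name) == getattr(new, name)
        for name in FIELDS
    )
    return agrees and new.informativeness() > old.informativeness()


def classify(new: Date, old: Date) -> str:
    """Relate a new date expression to an earlier one in the dialog."""
    if new == old:
        return "same"
    if specializes(new, old):
        return "specialization"
    if specializes(old, new):
        return "generalization"
    return "new proposal"


earlier = Date(month=5, day=12)
refined = Date(month=5, day=12, time="15:00")
other = Date(month=5, day=13)
```

With these values, `classify(refined, earlier)` reports a specialization (the time of day is added to an agreed day), whereas `classify(other, earlier)` reports a new proposal because the day differs.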

4 Utterance Representation: Dialog Act plus Propositional Content

Synthesizing the previous sections, in our approach the meaning of an utterance consists of the dialogue act and its propositional content. This is in accordance with Levinson, who claims that

[...] the illocutionary force and the propositional content of utterances are detachable elements of meaning. (Levinson, 1983)

On the other hand, not every pair of speech act and propositional content corresponds to a meaningful utterance. Thus we assign to each dialog act a set of possible propositions, as shown for the dialog act ACCEPT in figure 2. Another element in our utterance representation is the topic, as explained above. Though it may be redundant for some utterances, it is often helpful for disambiguation, as it offers an abstract representation of the relevant context. Thus, the examples in (1) can in total be represented as shown in figure 6.
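The combined representation can be sketched as a small data structure. The role and concept names below are taken from figures 4 and 6, while the class names, the `ALLOWED_CONTENT` table, and its entries are hypothetical; the table merely illustrates the idea of assigning each dialog act a set of admissible propositions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entity:
    ident: str                      # e.g. "a1", "d1", "l1"
    concept: str                    # e.g. "move_by_plane", "person", "city"
    roles: Dict[str, List["Entity"]] = field(default_factory=dict)


@dataclass
class Utterance:
    dialogue_act: str
    topic: str
    has_action: Entity


# Hypothetical table of admissible propositional content per dialog act.
ALLOWED_CONTENT: Dict[str, Set[str]] = {
    "suggest": {"move_by_plane", "move_by_rail", "meeting"},
}


def is_meaningful(utterance: Utterance) -> bool:
    """Check the act/content pair against the table of admissible propositions."""
    allowed = ALLOWED_CONTENT.get(utterance.dialogue_act, set())
    return utterance.has_action.concept in allowed


def build_example(with_destination: bool = False) -> Utterance:
    """Representation of example (1); (1d) additionally names a destination."""
    d1, d2 = Entity("d1", "person"), Entity("d2", "person")
    a1 = Entity("a1", "move_by_plane", {"has_agent": [d1, d2]})
    if with_destination:
        a1.roles["has_destination"] = [Entity("l1", "city")]
    return Utterance("suggest", "travelling", a1)
```

Calling `build_example()` twice yields equal objects, mirroring the point that utterances (1a) to (1c) receive the same representation, while `build_example(with_destination=True)` adds the has_destination role needed for (1d).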

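[Figure 6: graph diagram. The utterance node utt has the dialogue_act suggest and the topic travelling, and is linked by has_action to the action a1 of type move_by_plane, which is linked by has_agent to the two persons d1 and d2.]

Figure 6. Aggregative representation of example (1).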


A sequence of such utterance representations, appropriately linked and extended with roles indicating the speaker of each utterance, serves to model the complete dialog. The representations form the basis of the interface to the dialog module and are used for producing a summary of the dialog; see ?? for details.

5 Conclusion

Dialog acts are the building blocks for describing the structure of dialogs in Verbmobil. During the project, we put significant effort into the definition of the scheme and into the annotation of the corpus. The work with the corpus also showed us that the scheme, albeit very useful in our domains, does not, of course, cover all aspects of natural dialogs. For example, we cannot adequately mark up utterances that announce acts explicitly, like “let me tell you when it fits me”, or that contain insults, like “you son of a b....”.

...

References

Alexandersson, J., Buschbeck-Wolf, B., Fujinami, T., Kipp, M., Koch, S., Maier, E., Reithinger, N., Schmitz, B., and Siegel, M. (1998). Dialogue Acts in VERBMOBIL-2 – Second Edition. Verbmobil-Report 226, DFKI Saarbrücken, Universität Stuttgart, Technische Universität Berlin, Universität des Saarlandes.

Allen, J. (1984). Toward a general theory of action and time. Artificial Intelligence 23:123–154.

Busemann, S., Declerck, T., Diagne, A., Dini, L., Klein, J., and Schmeier, S. (1997). Natural language dialogue service for appointment scheduling agents. In Proceedings of Fifth Conference of Applied Natural Language Processing.

Carletta, J., Dahlbäck, N., Reithinger, N., and Walker, M. A., eds. (1997). Standards for Dialogue Coding in Natural Language Processing. Seminar Report 167.

Carletta, J. (1996). Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics 22(2):249–254.

Core, M., Ishizaki, M., Moore, J., Nakatani, C., Reithinger, N., Traum, D., and Tutiya, S. (1999). The Report of The Third Workshop of the Discourse Resource Initiative. Technical Report 3 (CC-TR-99-1), Chiba University.

Endriss, U. (1998). Semantik Zeitlicher Ausdrücke in Terminvereinbarungsdialogen. Verbmobil-Report 227, Technische Universität Berlin.

Klein, M. (1999). Standardisation Efforts on the Level of Dialogue Act in the MATE Project. In Proceedings of the ACL Workshop “Towards Standards and Tools for Discourse Tagging”, 35–41.

Levinson, S. C. (1983). Pragmatics. Cambridge University Press.

Schmitz, B. (1997). Pragmatikbasiertes Maschinelles Dolmetschen. Dissertation, Technische Universität Berlin.

Stede, M., Haas, S., and Küssner, U. (1998). Understanding and tracking temporal descriptions in dialogue. In Schröder, B., Lenders, W., Hess, W., and Portele, T., eds., Computers, Linguistics, and Phonetics between Language and Speech, Proceedings of the 4th Conference on Natural Language Processing - KONVENS '98. Peter Lang, Frankfurt.
