
Research Article

A Two-Step Neural Dialog State Tracker for Task-Oriented Dialog Processing

A-Yeong Kim,1 Hyun-Je Song,2 and Seong-Bae Park3

1School of Computer Science and Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, 41566 Daegu, Republic of Korea
2Naver Search, Naver Corporation, 6 Buljeong-ro, Bundang-gu, Seongnam, 13561 Gyeonggi, Republic of Korea
3Department of Computer Science and Engineering, Kyung Hee University, 1732 Deogyeong-daero, Giheung-gu, Yongin, 17104 Gyeonggi, Republic of Korea

Correspondence should be addressed to Seong-Bae Park; sbpark71@khu.ac.kr

Received 30 June 2018; Accepted 24 September 2018; Published 18 October 2018

Academic Editor: Rodolfo Zunino

Copyright © 2018 A-Yeong Kim et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Dialog state tracking in a spoken dialog system is the task that tracks the flow of a dialog and accurately identifies what a user wants from the utterance. Since the success of a dialog is influenced by the ability of the system to catch the requirements of the user, accurate state tracking is important for spoken dialog systems. This paper proposes a two-step neural dialog state tracker which is composed of an informativeness classifier and a neural tracker. The informativeness classifier, which is implemented by a CNN, first filters out noninformative utterances in a dialog. Then the neural tracker estimates dialog states from the remaining informative utterances. The tracker adopts the attention mechanism and the hierarchical softmax for its performance and fast training. To prove the effectiveness of the proposed model, we perform experiments on dialog state tracking in human-human task-oriented dialogs with the standard DSTC4 data set. Our experimental results prove the effectiveness of the proposed model by showing that it outperforms neural trackers without the informativeness classifier, the attention mechanism, or the hierarchical softmax.

1 Introduction

Dialog systems for a task-oriented dialog facilitate the interactions between the systems and their users in a natural language to achieve the requirements of the users. Thus, task-oriented dialog systems carry out several tasks such as identifying the user intention from a user utterance, tracking a dialog flow, and planning a dialog strategy to fulfill the requirements. Furthermore, they generate a natural language response for the user. Since the ability of a task-oriented dialog system to catch user requirements and to recognize a dialog flow heavily influences its success, it is essential to track what has happened and to estimate the needs of a user.

The dialog state tracker in a task-oriented dialog system estimates the state of a conversation from a user utterance. Then the dialog system plans a dialog strategy using the state to reach a dialog goal that fulfills the user requirements. Usually the tracker has a slot-filling architecture, which is based on predefined semantic slots to translate user utterances into a semantic representation [1]. Thus, a dialog state is expressed as a set of slot-attribute pairs. As a result, the objective of the dialog state tracker is to estimate a true slot and its attributes from a user utterance.

The success of a task-oriented dialog depends greatly on how quickly and precisely the dialog states are caught from user utterances [2, 3]. There have been a number of studies that make a dialog state tracker more precise and quicker [4, 5]. However, the previous studies have two kinds of problems in common. The first problem is that the previous studies assume that every utterance delivers slot-attribute pairs. As a result, they try to estimate slot-attribute pairs for all user utterances. However, some utterances do not carry any information about the true slot and the true attributes of the slot. For instance, "Hi. My name is Arlyn." is an utterance for responding to a system utterance, and "Hello." is an opening speech, but neither carries any information about the dialog state. Therefore, such noninformative utterances should be filtered out before estimating a dialog state from an utterance.

Hindawi, Computational Intelligence and Neuroscience, Volume 2018, Article ID 5798684, 11 pages, https://doi.org/10.1155/2018/5798684

Another problem is that task-oriented dialogs cover a large state space. As a result, many slot-attribute pairs occur rarely in the training data even if they become a dialog state. Dialog state tracking can be formulated as a classification task in which the slot-attribute pairs are classes. However, classifiers based on traditional machine-learning methods, including neural networks with the softmax, do not work well in this task due to the great number of classes and the infrequent classes. To resolve this problem, the dialog state tracker should not only handle a huge number of classes but should also be able to predict rare classes.

This paper proposes a two-step neural tracker that solves these problems. The first step of the proposed tracker filters out the noninformative utterances. Filtering out noninformative utterances can be regarded as a binary classification which determines whether a given utterance has meaningful words for tracking dialog states or not. Since convolutional neural networks (CNNs) show good performance in sentence classification [6, 7], this paper adopts a CNN to filter out noninformative utterances. The second step of the proposed tracker determines actual dialog states by using an attention-based recurrent neural network (RNN) with the hierarchical softmax function. The reason why an RNN is adopted in the second step is that it is effective in classifying dynamic sequences with complex behaviors [8]. This RNN-based dialog state tracker employs the attention mechanism and the hierarchical softmax, since the attention mechanism shows good performance in various NLP tasks [9] and helps a model focus on valuable words in an utterance. In addition, the adoption of the hierarchical softmax [10, 11] reduces the computational complexity in training the proposed model. Dialogs are usually described with a large vocabulary, and every word in the vocabulary can be an attribute or a class in dialog state tracking. Since the hierarchical softmax can deal with a large number of classes efficiently, the proposed tracker can make good estimations of dialog states for infrequent classes as well as for frequent classes.

The rest of this paper is organized as follows. Section 2 discusses the previous studies on dialog state tracking. Section 3 proposes the two-step neural dialog state tracker, and Section 4 describes the details of the two-step neural dialog state tracker. An evaluation of the proposed model is given in Section 5. Finally, Section 6 concludes this paper.

2 Related Work

Conventional dialog systems use handcrafted heuristics to update proper dialog states from the output of a natural language understanding module [12, 13]. For instance, Zue et al. proposed handcrafted update rules to manage dialog states in a weather information system [14], and Larsson and Traum adopted handwritten rules to track information states, which represent the necessary information for tracking a dialog flow [15]. On the other hand, Sun et al. proposed a rule-based method to compute the confidence scores of the N-best candidates generated by their spoken language understanding module [16]. However, these rules are not derived automatically from real dialog data, so they require careful tuning and delicate design efforts. Due to such characteristics of rule-based methods, these methods often result in inaccurate determination of dialog states.

As a solution to handcrafted rules, statistical methods have been applied to dialog state tracking [17-19]. Bohus and Rudnicky used a logistic regression to provide an initial assessment of the reliability of the information obtained from a user [17]. To achieve high tracking performance, they adopted some features such as the prior likelihoods of dialog state slots. Thomson and Young represented dialog state tracking with a Bayesian network and proposed a belief update algorithm for resolving the tractability of inference in the Bayesian network [18]. Their experimental results prove that the model can not only deal with a very large number of slots but also achieve significant improvement over the systems with handcrafted rules. Ma et al. proposed a model with two trackers for a location-based dialog system [19]. One tracker is designed with a Bayesian network to track nonlandmark concepts such as a company or a street, while the other is a kernel-based tracker that deals with landmark concepts in a dialog corpus. However, these studies share a common problem: they have to enumerate all possible dialog states, which is computationally very expensive.

Due to increasing interest in dialog state tracking, the dialog state tracking challenges (DSTC) have been held for the last several years. DSTC provided a fully labeled data set and an evaluation framework for benchmark tasks. This data set has evolved from human-machine dialogs to human-human task-oriented dialogs, and from a single domain to multiple domains. For instance, a person requests bus information from a machine in DSTC1. Thus, the machine leads dialogs in DSTC1. On the other hand, a person requests tour information from a (human) tour guide in DSTC4. Since the DSTC data set is used in most recent studies [20, 21], it is now regarded as a standard data set for dialog state tracking.

One notable aspect of DSTC is that the number of studies which adopt a neural network for dialog state tracking is increasing explosively. Henderson et al. first applied a neural network to dialog state tracking [2]. Their study is meaningful in that it is the first attempt to use a neural network for dialog state tracking, but their method still relies heavily on handcrafted features. After that, they proposed another neural network-based model without any handcrafted features or explicit semantic representation [8]. This model extracts its feature representation automatically from the N-best list of speech recognition and machine acts. Perez and Liu used the memory network for easy handling of dialog history as well as identifying user requests [22]. They generated question-answer pairs manually to estimate what slot-attribute pairs are actually needed for a subdialog. However, the main problem of these studies is that they are based on the assumption that each component of a dialog system is developed independently. As a result, the problem of error propagation still remains.

The most recent neural approaches to dialog state tracking accept raw user utterances directly rather than analyzing the utterances through a natural language understanding module. For instance, Yoshino et al. proposed a neural tracker based on the long short-term memory (LSTM) [20]. The input to this LSTM is raw dialog utterances of a user. On the other hand, Shi et al. used a CNN to focus on tracking general information in a subdialog segment [21]. To deal with multitopic dialogs, their CNN is composed of general and topic-specific filters that share topic information with each other. They showed meaningful results in their experiments for multiple topics, but their model tends to miss the right timing when the tracker should generate its output for some specific slots.

3 Two-Step Neural Dialog State Tracker

Figure 1 shows the general structure of traditional task-oriented dialog systems. The systems usually consist of six submodules: ASR (automatic speech recognition), NLU (natural language understanding), DST (dialog state tracking), DM (dialog manager), NLG (natural language generation), and TTS (text-to-speech). Since they are pipelined, the errors of one module are easily propagated to subsequent modules. Thus, it is important to reduce the number of modules as well as to minimize the errors of each module. This paper proposes a neural dialog state tracker in an end-to-end style. That is, the tracker does not use any output from the NLU. Instead, it takes the result of the ASR as its input directly and returns a relevant dialog state as its output. Therefore, it actually substitutes the two components of NLU and DST in this figure.

Task-oriented dialogs usually proceed in four steps: opening, asking, offering, and ending. There can be many user utterances in each step that are not directly related to dialog states. Especially, the utterances in the opening and the ending steps deliver no information about dialog states, since they are usually greetings and closing remarks. However, the asking and offering steps can also have such utterances. For instance, let us consider an example dialog for tour booking in Figure 2. While the bold utterances deliver the type of a tour place and restriction information, the italic utterances have nothing to do with the dialog state. Note that such noninformative utterances can appear between informative utterances. These noninformative utterances result in performance degradation of a dialog state tracker if they are given as an input to the tracker. Thus, they should be filtered out before tracking dialog states.

To filter out noninformative utterances before determining actual dialog states, the proposed tracker has two steps, as shown in Figure 3. In the first step, the informativeness classifier determines whether a user utterance U is informative or not. If it decides that U is noninformative, then U is discarded. Otherwise, the dialog state of U is then classified by the actual dialog state tracker. Since both the informativeness classifier and the dialog state tracker are implemented with neural networks in this paper, each user utterance is represented as a sequence of word vectors by Word2Vec [23].
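The two-step flow can be sketched as a minimal NumPy pipeline. The embedding table, the keyword heuristic standing in for the CNN classifier, and the fixed slot-attribute output standing in for the RNN tracker are all toy assumptions for illustration, not the paper's trained models:

```python
import numpy as np

EMB_DIM = 4  # toy dimensionality; the paper uses 500-dimensional Word2Vec vectors

# Toy embedding table standing in for a trained Word2Vec model (assumption:
# a real system would load pretrained vectors keyed by word).
rng = np.random.default_rng(0)
VOCAB = {w: rng.standard_normal(EMB_DIM)
         for w in "hello hi i want a cheap hotel near the station".split()}

def embed(utterance):
    """Map an utterance to a sequence of word vectors (unknown words -> zeros)."""
    return np.array([VOCAB.get(w, np.zeros(EMB_DIM)) for w in utterance.lower().split()])

def is_informative(word_vectors):
    """Stand-in for the CNN informativeness classifier (step 1).
    Here a trivial length heuristic: very short utterances are treated as greetings."""
    return word_vectors.shape[0] > 2

def track_state(word_vectors):
    """Stand-in for the attention-based RNN tracker (step 2)."""
    return {"TYPE": "budget hotel"}  # placeholder slot-attribute pair

def two_step_tracker(utterance):
    vecs = embed(utterance)
    if not is_informative(vecs):   # step 1: filter noninformative utterances
        return None                # utterance discarded
    return track_state(vecs)       # step 2: estimate the dialog state

print(two_step_tracker("Hello"))                 # -> None (discarded)
print(two_step_tracker("I want a cheap hotel"))  # -> {'TYPE': 'budget hotel'}
```

The point of the sketch is the control flow: only utterances that survive the first classifier ever reach the state tracker.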

4 Implementation of Two-Step Tracker

4.1. Informativeness Classifier. Convolutional neural networks have been reported to be outstanding in many NLP tasks, including sentence classification [6, 24]. Thus, a CNN is adopted for the informativeness classifier in Figure 3. It accepts the numerical utterance vector as its input. Assume that the user utterance U consists of n words. That is, U = {x1, x2, ..., xn}, where xi is the i-th word in U. If xi is represented as a k-dimensional vector x_i by Word2Vec, then the utterance vector U_{1:n} becomes

Figure 1: The general structure of traditional task-oriented dialog systems (the DB, ASR, NLU, DST, DM, NLG, and TTS modules). The proposed dialog tracker corresponds to both NLU and DST of the traditional systems.

S: Hello.
T: Hi. My name is Arlyn.
S: What do you want to do in this tour?
T: For this particular tour, since we're gonna probably bring along our daughter with us, we're sort of thinking going to a theme park maybe.
S: A theme park. Okay. How old is your daughter?
T: She's four.
S: Oh, she's only four years old. Okay.
T: Yeah.
S: But some of the rides she may not be able to take. You can visit the theme parks, but they may have restrictions on the height or the age of children.

Figure 2: An example dialog about a tour booking with some dialog-state-independent utterances. S in this example is a system, and T is a tourist.

Figure 3: The overall structure of the proposed two-step neural tracker. A user utterance U is first passed to the informativeness classifier; if U is judged noninformative (No), it is discarded, and otherwise (Yes) the dialog state tracker outputs its dialog state.


U_{1:n} = x_1 ⊕ x_2 ⊕ ··· ⊕ x_n,  (1)

where ⊕ is the concatenation operator.

The informativeness classifier determines whether the utterance delivers some information about dialog states or not. Figure 4 depicts the structure of the informativeness classifier. The classifier is a CNN with a convolution layer, a max pooling layer, a regularization layer, and a softmax layer. The convolution layer makes multiple feature maps with multiple filters, where each filter f_w ∈ R^{w×k} is applied to a window of w words to produce a new feature for the next layer. That is, a new feature c_i^w is derived from a filter f_w and a window of words by

c_i^w = t(f_w · U_{i:i+w−1} + b),  (2)

where t is a nonlinear function such as the hyperbolic tangent, and b is a bias term. This filter is applied to every window of words in U to produce a feature map with a window size w:

c_w = [c_1^w, c_2^w, ..., c_{n−w+1}^w].  (3)

Multiple filters with various window sizes of words are used to obtain multiple feature maps.

The max pooling layer captures the most important feature of each feature map. When a max pooling operation is applied to a feature map c_w, the maximum value of the map, ĉ_w = max(c_w), is obtained as the feature corresponding to the particular filter f_w. This pooling scheme deals with variable-length inputs and produces fixed-length outputs for the subsequent layers. A dropout is used for the regularization of this CNN before the outputs of the max pooling layer are passed to a fully connected softmax layer whose output is the probability distribution over informativeness labels.
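Equations (2)-(3) plus max-over-time pooling can be sketched in NumPy. All sizes and weights below are toy assumptions (random, untrained); the real classifier uses k = 500 and learned filters, and dropout is a training-time regularizer so it is omitted at inference:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def conv_feature_map(U, f_w, b):
    """Equations (2)-(3): slide a filter f_w of shape (w, k) over every window
    of w consecutive word vectors in the utterance matrix U of shape (n, k);
    tanh plays the role of the nonlinearity t."""
    w, n = f_w.shape[0], U.shape[0]
    return np.array([np.tanh(np.sum(f_w * U[i:i + w]) + b) for i in range(n - w + 1)])

def informativeness_scores(U, filters, W_out, b_out):
    """Max-over-time pooling of each feature map (c_hat per filter), then a
    fully connected softmax layer over the two informativeness labels."""
    pooled = np.array([conv_feature_map(U, f, 0.0).max() for f in filters])
    return softmax(W_out @ pooled + b_out)

rng = np.random.default_rng(1)
n, k = 6, 5                                                  # toy sizes
U = rng.standard_normal((n, k))                              # a 6-word utterance
filters = [rng.standard_normal((w, k)) for w in (2, 3, 4)]   # filter widths as in Table 4
W_out, b_out = rng.standard_normal((2, 3)), np.zeros(2)

p = informativeness_scores(U, filters, W_out, b_out)
print(p)  # probability distribution over {informative, noninformative}
```

Because pooling takes a single maximum per feature map, the pooled vector has one entry per filter regardless of the utterance length n, which is exactly how the classifier handles variable-length inputs.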

4.2. Neural Dialog State Tracker. Figure 5 shows the structure of the proposed dialog state tracker. This tracker consists of an RNN-based encoder and an output decoder, where the decoder is composed of an attention layer, a linear layer, and a softmax layer. The gated recurrent unit (GRU) is used for the encoder, since it is widely used in recurrent neural networks to encode a sentence or an utterance. Each GRU cell takes an input vector x_i and calculates the hidden state h_i using a reset gate r_i, an update gate z_i, and weight matrices W and M for the two gates. The variables are actually computed by

h_i = (1 − z_i) h_{i−1} + z_i h̃_i,
z_i = σ(W_z x_i + M_z h_{i−1}),
r_i = σ(W_r x_i + M_r h_{i−1}),  (4)

where the hidden state h_i of a GRU cell at time i is a linear interpolation between the previous hidden state h_{i−1} and the candidate hidden state h̃_i. The update gate z_i decides how much the unit updates its content, and r_i is a reset gate that determines whether the GRU cell forgets the previously computed state or not. Then h̃_i, the candidate hidden state, is updated using the current input, the reset gate, and the previous hidden state. That is,

h̃_i = tanh(W x_i + M (r_i ⊙ h_{i−1})),  (5)

where ⊙ is an element-wise multiplication.

The slot and its attribute for a dialog state are estimated by the decoder. The input for the decoder is an encoded representation h_{1:n} of the utterance. Since it is important in estimating slots to focus on some words containing information about a dialog state, an attention mechanism is adopted in this architecture. Bahdanau et al. showed that a basic encoder-decoder architecture with an attention mechanism outperforms the architecture without attention at machine translation tasks [9]. This implies that the attention mechanism can roughly tell the decoder which words it has to pay attention to in translating a sentence. In dialog state tracking, the decoder is able to catch salient words from an input utterance by adopting an attention mechanism.
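The GRU recurrence of equations (4) and (5) can be sketched in NumPy. The sizes and random weight matrices below are toy assumptions, and bias terms are omitted exactly as in the equations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_i, h_prev, Wz, Mz, Wr, Mr, W, M):
    """One GRU step implementing equations (4)-(5)."""
    z = sigmoid(Wz @ x_i + Mz @ h_prev)            # update gate z_i
    r = sigmoid(Wr @ x_i + Mr @ h_prev)            # reset gate r_i
    h_tilde = np.tanh(W @ x_i + M * 0 + M @ (r * h_prev) * 0 + M @ (r * h_prev))  # see note below
    return (1.0 - z) * h_prev + z * h_tilde        # h_i, equation (4)

# Note: the candidate state line is simply tanh(W x_i + M (r ⊙ h_{i-1})).
def gru_step_clean(x_i, h_prev, Wz, Mz, Wr, Mr, W, M):
    z = sigmoid(Wz @ x_i + Mz @ h_prev)
    r = sigmoid(Wr @ x_i + Mr @ h_prev)
    h_tilde = np.tanh(W @ x_i + M @ (r * h_prev))  # equation (5)
    return (1.0 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(2)
k, d = 4, 3                                        # toy input/hidden sizes
Wz, Wr, W = (rng.standard_normal((d, k)) for _ in range(3))
Mz, Mr, M = (rng.standard_normal((d, d)) for _ in range(3))

# Encode a 5-word utterance by iterating the cell over its word vectors.
h = np.zeros(d)
for x in rng.standard_normal((5, k)):
    h = gru_step_clean(x, h, Wz, Mz, Wr, Mr, W, M)
print(h.shape)  # final hidden state h_n
```

Iterating the cell over the word vectors yields the hidden states h_1, ..., h_n that the attention layer consumes next.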

The attention mechanism used in the proposed model first calculates the normalized alignment scores that refer to how closely the words from x_i to x_j are related. The alignment score of each hidden layer output is calculated by

e_{ij} = a(h_i, h_j),  (6)

where a is an alignment model that is implemented by a feedforward network. Before computing the salience vector s_i, the alignment score is normalized by

α_{ij} = exp(e_{ij}) / Σ_{l=1}^{n} exp(e_{il}).  (7)

Then the salience vector s_i, which indicates the significance of the output of a hidden layer, is calculated as a weighted sum of the hidden layer outputs with the normalized alignment scores. That is,

s_i = Σ_{j=1}^{n} α_{ij} h_j.  (8)

After all salience vectors are obtained, they are averaged into a single vector as

s = (1/n) Σ_{i=1}^{n} s_i.  (9)
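Equations (6)-(9) can be sketched compactly in NumPy. For illustration the alignment model a(·,·) is assumed to be a plain dot product, whereas the paper implements it as a feedforward network:

```python
import numpy as np

def attention_summary(H, align):
    """Equations (6)-(9): alignment scores e_ij = a(h_i, h_j), softmax-normalized
    weights alpha_ij, salience vectors s_i, and their average s."""
    n = H.shape[0]
    E = np.array([[align(H[i], H[j]) for j in range(n)] for i in range(n)])  # eq. (6)
    A = np.exp(E) / np.exp(E).sum(axis=1, keepdims=True)                     # eq. (7)
    S = A @ H                                       # eq. (8): s_i = sum_j alpha_ij h_j
    return S.mean(axis=0)                           # eq. (9): average over all s_i

rng = np.random.default_rng(3)
H = rng.standard_normal((5, 3))     # hidden states h_1..h_n from the GRU encoder
dot = lambda hi, hj: hi @ hj        # toy alignment model (assumption; paper uses a feedforward net)
s = attention_summary(H, dot)
print(s.shape)  # a single summary vector with the hidden-state dimensionality
```

The resulting vector s is what the hierarchical softmax layer receives for state estimation.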

Then the dialog state is estimated from s. That is, s is input to the softmax layer with a hierarchical softmax. The hierarchical softmax reduces the computational complexity by representing all classes as a binary tree [11]. A leaf node of the tree represents a class, and an intermediate node contains the common representation of its child nodes. Thus, unlike the flat softmax in which every class is processed independently, the hierarchical softmax lets a class be complemented by other classes through the intermediate nodes [25]. As a result, the hierarchical softmax becomes able to estimate infrequent classes. In general, a dialog has a large vocabulary, and every word in the vocabulary becomes a class in dialog state tracking. Thus, due to the great number of classes and the infrequent classes in dialog state tracking, it is difficult for a tracker to estimate the dialog state. However, by adopting the hierarchical softmax function, the proposed tracker can estimate infrequent classes as well as frequent classes.

In the proposed model, the slots and their attributes are considered as words, and thus they are represented as a binary tree in which a word is a leaf node. Since there are V words, the tree has V − 1 inner nodes. For each leaf node, there exists only one path from the root to the leaf node, which is used to estimate the probability of the word. For example, Figure 6 is a hierarchical binary tree in which the leaf nodes represent words (slot-attribute pairs) and inner nodes represent probability mass. The highlighted nodes and edges make a path from the root to a leaf node w_1, and n(w, j) means the j-th node on the path from the root to a leaf node w. In the hierarchical softmax model, each of the V − 1 inner nodes has an output vector v′_{n(w,j)}. The probability of a word being the output word is

p(w = w_O) = Π_{j=1}^{L(w)−1} σ(I(t) · v′_{n(w,j)} · h),  (10)

where I(t) is an indicator function whose value is 1 if the variable t is true and −1 otherwise. In this equation, t checks if n(w, j + 1) = lch(n(w, j)), where lch(n) is the left child of node n and L(w) is the length of the path. Thus, I(t) examines whether the (j + 1)-th inner node on the path from the root to w is equal to the left child of the j-th inner node on the path or not.
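Equation (10) can be sketched on a toy complete binary tree over V = 4 classes, with V − 1 = 3 inner nodes carrying random output vectors (all assumed for illustration). A useful sanity check of the construction is that the leaf probabilities sum to 1, since σ(x) + σ(−x) = 1 at every inner node:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(4)
d = 3
inner_vectors = rng.standard_normal((3, d))   # v'_n for inner nodes 0 (root), 1, 2

# Root-to-leaf paths as (inner-node index, went_left) pairs for 4 toy classes
# (slot-attribute pairs) w1..w4 in a complete binary tree.
PATHS = {
    "w1": [(0, True),  (1, True)],
    "w2": [(0, True),  (1, False)],
    "w3": [(0, False), (2, True)],
    "w4": [(0, False), (2, False)],
}

def hier_softmax_prob(word, h):
    """Equation (10): multiply sigmoid decisions along the root-to-leaf path;
    I(t) = +1 when the path continues to the left child and -1 otherwise."""
    p = 1.0
    for node, went_left in PATHS[word]:
        sign = 1.0 if went_left else -1.0
        p *= sigmoid(sign * (inner_vectors[node] @ h))
    return p

h = rng.standard_normal(d)   # stands in for the utterance summary s from the attention layer
probs = {w: hier_softmax_prob(w, h) for w in PATHS}
print(sum(probs.values()))   # sums to 1.0 over all leaves
```

Each probability costs only L(w) ≈ log2(V) sigmoid evaluations instead of a normalization over all V classes, which is the complexity saving the paper relies on.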

5 Experiments

5.1. Data Set. To examine the effectiveness of the proposed model, we used the TourSG data set (http://www.colips.org/workshop/dstc4/data.html) that was used for DSTC4. Table 1 shows simple statistics on the data set. This data set has a total of 35 dialog sessions on Singapore tour information collected from Skype call logs. Three tour guides and 35 tourists have participated in these call logs. The 35 dialog sessions are composed of 31,304 utterances and 273,580 words, respectively. The training set has seven dialogs each from tour guides SG1 and SG2, the development set has three dialogs each from the same guides, and the test set has three dialogs from each of the tour guides. Thus, the dialogs from tour guide SG3 are used only for testing. As a result, this data set can be used to evaluate the robustness of dialog state trackers.

Every dialog in the data set is expressed as a series of utterances, and each utterance is an unrefined transcription containing some disfluencies such as "ah", "uh", or "um". A dialog can be divided again into subdialog segments that belong to one of nine topic categories: OPENING, CLOSING, FOOD, ITINERARY, ACCOMMODATION, ATTRACTION, SHOPPING, TRANSPORTATION, and OTHER. The example dialog in Figure 7 has two subdialogs, where a dashed line is the border between the subdialogs. As shown in the "Dialog State" column of Figure 7, not all utterances have a slot and attributes.

Figure 4: The structure of the informativeness classifier: an n × k representation of the sentence (x_1, ..., x_n), a convolution layer with multiple filter widths and feature maps, max-over-time pooling, and a fully connected layer with dropout and softmax output.

Figure 5: The structure of the proposed dialog state tracker with attention and hierarchical softmax: GRU cells over the inputs x_1, ..., x_n produce hidden states h_1, ..., h_n, which are combined through the attention weights α_1, ..., α_n into a context vector c that is fed to the hierarchical softmax.

The topics of OPENING, CLOSING, ITINERARY, and OTHER deliver no information about dialog states. Thus, they are all merged into a single topic OTHERS for the informativeness classification. As a result, for the informativeness classifier, the five topics of FOOD, ACCOMMODATION, ATTRACTION, SHOPPING, and TRANSPORTATION are considered as the informative class, while the topic OTHERS is regarded as the noninformative class. Table 2 summarizes the class distribution of the TourSG data set for the informativeness classifier. The training set has 9,974 informative utterances and 2,785 noninformative utterances. The development set has 4,139 informative and 673 noninformative utterances, while the test set has 6,528 and 1,320 utterances for the informative and noninformative classes, respectively.

Figure 6: An example of a hierarchical binary tree. The leaf nodes w_1, w_2, ..., w_n represent words (slot-attribute pairs), and the highlighted inner nodes n(w_1, 1), n(w_1, 2), and n(w_1, 3) form the path from the root to the leaf node w_1.

Table 1: Simple statistics on the TourSG data set.

Set         | No. of dialogs (SG1 / SG2 / SG3 / Total) | No. of segments (Acco / Attr / Food / Shop / Trsp / Other / Total) | No. of utterances
Training    | 7 / 7 / 0 / 14                           | 187 / 762 / 275 / 149 / 374 / 357 / 2104                           | 12759
Development | 3 / 3 / 0 / 6                            | 94 / 282 / 102 / 67 / 87 / 68 / 700                                | 4812
Test        | 3 / 3 / 3 / 9                            | 174 / 616 / 134 / 49 / 174 / 186 / 1333                            | 7848

Speaker: Utterance [Dialog state]

Tourist: Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere, take some pictures. [TYPE = budget hotel]
Tourist: Yes, I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day. [TYPE = hostel]
Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel, because you just back to sleep. [INFO = room size]
Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures. [INFO = room size]
Guide: Okay, um- [—]
Tourist: Hm. [—]
Guide: Let's try this one, okay? [—]
Tourist: Okay. [—]
Guide: It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person, only twenty dollars. If you take a room, it's two single beds at fifty nine dollars. [PLACE = the inncrowd backpackers hostel]
Tourist: Um. Wow, that's good. [PLACE = the inncrowd backpackers hostel]
Guide: Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only. [INFO = pricerange]
Tourist: Oh okay. That's- the price is reasonable actually. It's good. [INFO = pricerange]
Guide: Okay. I'm going to recommend, firstly you want to have a backpack type of hotel, right? [TYPE = hostel]

Figure 7: A sample dialog from the TourSG data set for the topic ACCOMMODATION.


The data set has two kinds of slots for dialog state tracking: a regular slot and an INFO slot. The regular slots are predefined to deliver the essential content of a subdialog for a specific topic. A topic contains three to seven regular slots on average. On the other hand, the INFO slot is for indicating the content mentioned implicitly in the subdialog. In the second subdialog of Figure 7, there are two slots, "PLACE" and "INFO", where "PLACE" is a regular slot and "INFO" is the INFO slot. Since the hotel name is described explicitly with the words "InnCrowd Backpackers Hostel", it becomes the attribute of the slot "PLACE". In addition, the dialog participants are discussing the price of the hotel. Thus, the attribute of the INFO slot becomes Pricerange, even though the words "price range" do not appear explicitly in the dialog. The possible slot types and the list of their candidate attributes vary according to the topic category, and they are described in an ontology provided by DSTC [26].

Table 3 shows the number of slots and their possible attributes for every topic category. ACCOMMODATION, ATTRACTION, FOOD, and SHOPPING contain common regular slots that refer to the names of places, the types of places, and the information of geographic areas. In detail, the topic ACCOMMODATION has the three common regular slots, and the topic ATTRACTION includes five regular slots: slots about the touristic activities and the visit time for an attraction, plus the three common regular slots. The topic FOOD has seven slots that stand for the types of cuisine, the names of dishes and drinks, the time for eating, and the three common regular slots. The topic SHOPPING has the three common regular slots and time information for shopping. The last topic, TRANSPORTATION, includes the types of transportation, the destinations and origins, the information on train lines and stations in Singapore, and the types of tickets. The regular slots have 570, 611, 1,411, 651, and 2,450 attributes in ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION, respectively. On average, one regular slot has roughly 190, 122, 202, 163, and 490 attributes for each topic category. On the other hand, the INFO slot is a single slot; this is because INFO is both a slot type and a slot name. It has 32, 31, 16, 16, and 15 attributes for the five topic categories.

5.2. Evaluation Metrics and Experimental Settings. The accuracy and the F1-score are used to evaluate both the informativeness classifier and the neural dialog state tracker, where they are defined as follows:

(i) Accuracy: the fraction of predictions in which the model output is the same as the gold label.
(ii) Precision/Recall/F1:
    (a) Precision: the fraction of the model outputs that are correctly predicted.
    (b) Recall: the fraction of the gold labels that are correctly predicted.
    (c) F1: the harmonic mean of precision and recall.

For the informativeness classifier, the accuracy checks how correctly a classifier determines whether a given utterance is informative or not, and the F1-score shows how precise the informativeness classifier is as well as how robust it is. In dialog state tracking, the predictions are slot-attribute pairs. Thus, the accuracy checks how equivalent the tracker's outputs are to the gold standard labels at the whole-frame level, while the precision and the recall check the partial correctness at the slot-attribute level.
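The slot-attribute-level metrics can be sketched on sets of (slot, attribute) pairs; the example pairs below are hypothetical, echoing the slots of Figure 7:

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall, and F1 over sets of (slot, attribute) pairs, as
    defined above: precision over the model outputs, recall over the gold
    labels, and F1 as their harmonic mean."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

predicted = {("TYPE", "hostel"), ("INFO", "pricerange")}
gold = {("TYPE", "hostel"), ("PLACE", "the inncrowd backpackers hostel")}
print(precision_recall_f1(predicted, gold))  # (0.5, 0.5, 0.5)
```

Frame-level accuracy, by contrast, would count this prediction as simply wrong, since the predicted set does not match the gold set exactly.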

The parameters of the informativeness classifier and the neural dialog tracker are described in Tables 4 and 5, respectively. Every word of the utterances, the input for both the informativeness classifier and the neural tracker, is embedded into a 500-dimensional vector. Three different sizes of filters are used for the informativeness classifier, and the number of filters for each filter size is 128. The dropout rate of the classifier is 0.5, the batch size is 50, and the number of training epochs is 1,000. For the neural tracker, the hidden layer size is 500, the dropout ratio is 0.1, the learning rate is 0.005, and the number of epochs is 100,000.

5.3. Experimental Results. Table 6 shows the classification performance of the informativeness classifier. The classifier achieves an accuracy of 93.16% and an F1-score of 93.17% on the test set, while it shows a relatively low precision of 90.89%. The reason for the relatively low precision is that there are some utterances about money in the noninformative class. For instance, the classifier predicts the utterance "Okay maybe a thousand dollars for a day" as informative due to the words "thousand dollars" and "a day", but its gold standard label is noninformative. Since the dialog to which the utterance belongs is an opening section before recommending tour information, the utterance is a noninformative utterance for tracking the dialog. This problem occurs because the classifier determines the informativeness of utterances without any contextual information about the dialog flow.

To show the feasibility of the proposed model the dialogstate tracking performance of the proposed model is com-pared with those of the participants in the DSTC4 challengee string match tracker determines the slot and the at-tribute by a simple exact string match between the slot andthe given utterance [26] e LSTM-based neural trackerproposed by Yoshino et al [20] is a pure LSTM without the

Table 2 e class distribution for the informativeness classifier

Set Informative NoninformativeTraining 9974 2785Development 4139 673Test 6528 1320

Computational Intelligence and Neuroscience 7

attention mechanism. Thus, it can be directly compared with the proposed model. Tables 7 and 8 show the performance of the trackers. The proposed model outperforms its competitors in both accuracy and F1-score. In particular, it achieves about three times higher accuracy than the LSTM, which is a neural network-based model and a similar approach to the proposed model. This result means that extracting the necessary information from an utterance is important for tracking dialog states. The main difference between the proposed model and the LSTM is the presence of the attention mechanism. Thus, it can be inferred that it is important in dialog state tracking to focus on salient words while processing a user utterance.

The performance of the proposed model per topic is investigated in Figure 8. The performance is measured for five topics related to dialog state tracking. The proposed model achieves the best performance for every topic, although its performance is relatively low for the topic ACCOMMODATION. The reason why the proposed model shows poor performance for ACCOMMODATION is that there are many utterances mentioning some locations in the dialogs of ACCOMMODATION, but

Table 6: The classification performance of the informativeness classifier.

Set          Accuracy  Precision  Recall  F1-score
Development  0.9484    0.9202     0.9820  0.9501
Test         0.9316    0.9089     0.9557  0.9317

Table 5: The parameters for the neural dialog state tracker and their values.

Parameter          Value
Embedding size     500
Hidden layer size  500
Dropout ratio      0.1
Learning rate      0.005
Number of epochs   100,000

Table 3: The number of slots (regular, INFO) and their attributes.

                 Regular slot                                              INFO slot
Topic category   No. of slots  No. of attributes  Avg. attributes/slot    No. of slots  No. of attributes
ACCOMMODATION    3             570                190                     1             23
ATTRACTION       5             611                122                     1             31
FOOD             7             1,411              202                     1             16
SHOPPING         4             651                163                     1             16
TRANSPORTATION   5             2,450              490                     1             15

Table 4: The parameters for the informativeness classifier and their values.

Parameter          Value
Embedding size     500
Filter sizes       2, 3, 4
Number of filters  128
Dropout ratio      0.5
Batch size         50
Number of epochs   1,000

Table 7: The comparison of the dialog state tracking accuracies between the proposed model and DSTC4 participants.

Model              Accuracy
String match [26]  0.0374
LSTM [20]          0.0270
Proposed model     0.0852

Table 8: The comparison of the dialog state tracking F1-scores between the proposed model and DSTC4 participants.

Model              Precision  Recall  F1-score
String match [26]  0.3589     0.1925  0.2506
LSTM [20]          0.3400     0.2010  0.2530
Proposed model     0.5173     0.3499  0.4174

[Figure 8: The F1-score of the proposed model per topic. The chart compares the string match, LSTM, and proposed models on the topics ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION.]


they are often related to other topics such as TRANSPORTATION and SHOPPING. For instance, for the utterance "ten minutes walk to the mrt train station," the proposed model predicts the slot "STATION" and the attribute "MRT train station." This prediction seems relevant since a train station is usually related to TRANSPORTATION. However, the dialog is about the places near the suggested accommodation. Thus, the gold standard slot for the utterance is "NEIGHBOURHOOD," and the gold standard attribute is "Kampong Glam," a location name.

Table 9 shows the dialog state tracking accuracy of the proposed model for every slot in each topic category. Note that each accuracy is computed at the utterance level, not the subdialog level. The regular slots which have distinguishable attributes, like "TIME" and "TYPE_OF_PLACE," show moderately high accuracy, while some regular slots such as "NEIGHBOURHOOD," "PLACE," or "DISH" achieve low accuracy. The reason why the proposed model shows poor performance for these regular slots is that their attributes are specific names, such as restaurant names, street names, and dish names, that are extremely infrequent in the data set. The proposed model also achieves meaningful performance for the INFO slot. The high performance for the INFO slot implies that the proposed model can catch the contents mentioned implicitly in an utterance.

To examine the effectiveness of the attention mechanism and the hierarchical softmax, three variations of the proposed model are compared. One variant (NT-Atten-NoHier) is the neural tracker with the attention mechanism and the general softmax. It is designed to inspect the effectiveness of the attention mechanism in dialog state tracking. Another (NT-NoAtten-Hier) is the neural tracker without the attention but with the hierarchical softmax. It is prepared to examine the hierarchical softmax. The third (NT-Atten-Hier) is the proposed model, the neural tracker with both the attention and the hierarchical softmax. The informativeness classifier is not used in comparing them.

Tables 10 and 11 prove the effectiveness of the attention mechanism and the hierarchical softmax. The proposed model (NT-Atten-Hier) outperforms both NT-Atten-NoHier and NT-NoAtten-Hier, which implies that the attention mechanism and the hierarchical softmax are both helpful in enhancing the neural tracker. One thing to note in these tables is that the performance improvement of the neural tracker is made mainly by the attention mechanism. NT-Atten-Hier achieves about 1.6 times higher accuracy than NT-NoAtten-Hier, but it shows a similar accuracy to NT-Atten-NoHier. Nevertheless, it is also true that the hierarchical softmax enhances the neural tracker. This is proved by the fact that the performance of NT-Atten-Hier is slightly higher than that of NT-Atten-NoHier.

Table 12 shows the effectiveness of the informativeness classifier. The F1-score of the pure neural tracker NT-Atten-Hier is 0.3977, but it goes up to 0.4174 when the informativeness classifier is applied in front of the neural tracker as in Figure 3. The improvement by the informativeness classifier is also observed in accuracy. This is because the adoption of the informativeness classifier

Table 9: The dialog state tracking accuracy of the proposed model for every slot in each topic category.

Topic            Slot            Accuracy
ACCOMMODATION    INFO            0.324
                 TYPE_OF_PLACE   0.483
                 PLACE           0.187
                 NEIGHBOURHOOD   0.088
ATTRACTION       INFO            0.211
                 TYPE_OF_PLACE   0.655
                 ACTIVITY        0.557
                 PLACE           0.046
                 TIME            0.633
                 NEIGHBOURHOOD   0.178
FOOD             INFO            0.371
                 CUISINE         0.471
                 TYPE_OF_PLACE   0.686
                 DRINK           0.622
                 PLACE           0.077
                 MEAL_TIME       0.739
                 DISH            0.058
                 NEIGHBOURHOOD   0.273
SHOPPING         INFO            0.413
                 TYPE_OF_PLACE   0.419
                 PLACE           0.048
                 NEIGHBOURHOOD   0.041
                 TIME            0.750
TRANSPORTATION   INFO            0.239
                 FROM            0.053
                 TO              0.056
                 STATION         0.147
                 LINE            0.161
                 TYPE            0.428

Table 10: The dialog state tracking accuracy of the proposed model's variants.

Model            Accuracy
NT-Atten-NoHier  0.0672
NT-NoAtten-Hier  0.0422
NT-Atten-Hier    0.0697

Table 11: The F1-score of the dialog state tracking of the proposed model's variants.

Model            Precision  Recall  F1-score
NT-Atten-NoHier  0.4403     0.3335  0.3795
NT-NoAtten-Hier  0.3316     0.3058  0.3181
NT-Atten-Hier    0.4662     0.3469  0.3977

Table 12: The effect of the informativeness classifier in the proposed model.

Model                               Accuracy  Precision  Recall  F1-score
Without informativeness classifier  0.0697    0.4662     0.3469  0.3977
With informativeness classifier     0.0852    0.5173     0.3499  0.4174


reduces the possibility of misjudgments caused by noninformative utterances.

To analyze the shortcomings of the neural trackers, we use the three types of slot-level errors defined in the DSTC challenge. The error types are as follows:

(i) Missing attribute: the actual dialog has attributes for a specific slot, but the tracker does not output any attribute for the slot.

(ii) Extraneous attribute: no attribute for a slot is contained in the actual dialog, but the tracker outputs some attributes for the slot.

(iii) False attribute: the attribute for a slot is specified in the actual dialog, but the tracker outputs a different attribute from the dialog.

According to the results of the DSTC4 challenge [26], the largest error type is the missing attribute.
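As a rough illustration, the three error types can be counted from a gold and a predicted dialog state, each represented here as a dict mapping a slot to a set of attributes. This is a simplified sketch; the official DSTC scorer differs in detail.

```python
def slot_error_types(gold, pred):
    """Count slot-level errors between a gold state and a tracker output,
    each given as {slot: set(attributes)}. Illustrative sketch only."""
    missing = extraneous = false = 0
    for slot in gold.keys() | pred.keys():
        g, p = gold.get(slot, set()), pred.get(slot, set())
        if g and not p:
            missing += 1      # gold has attributes, tracker outputs none
        elif p and not g:
            extraneous += 1   # tracker outputs attributes, gold has none
        elif g and p and g != p:
            false += 1        # both present but the attributes disagree
    return missing, extraneous, false

gold = {"TYPE_OF_PLACE": {"theme park"}, "TIME": {"afternoon"}}
pred = {"TYPE_OF_PLACE": {"museum"}, "NEIGHBOURHOOD": {"Sentosa"}}
print(slot_error_types(gold, pred))  # → (1, 1, 1)
```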

Figure 9 depicts the number of errors of each error type for the variants of the proposed model. In this figure, "Proposed Model" denotes NT-Atten-Hier with the informativeness classifier. The number of errors in missing attribute and extraneous attribute is much reduced in "Proposed Model," while the number of false attribute errors increases somewhat. Therefore, the informativeness classifier is proved to be helpful in reducing the errors in missing and extraneous attributes. Many errors in false attribute are made by the limited context of a given utterance. For instance, let us consider the utterance "uh another place you might want to check out is opposite to Sentosa island and it is called Vivocity." In this utterance, a user wants to know another shopping place. Since the proposed model focuses only on the utterance, it predicts the slot for the utterance as "NEIGHBOURHOOD" and its attribute as "Sentosa." These are wrong but plausible predictions since the utterance also contains locational information. That is, many errors by the proposed model are not critical in understanding the user intention.

6 Conclusion

This paper has proposed a task-oriented dialog state tracker that works in two steps. The proposed tracker is composed of an informativeness classifier and a neural dialog state tracker. It first filters out noninformative utterances in a dialog by using the informativeness classifier. The informativeness classifier is implemented with a CNN, which has shown good performance in sentence classification. In the next step, the neural tracker determines dialog states from the informative utterances with the attention mechanism and the hierarchical softmax. By adopting the attention mechanism and the hierarchical softmax, the proposed model can not only focus on salient words in a given utterance but also deal with a large vocabulary in dialogs efficiently. The experimental results on dialog state tracking show that the proposed model outperforms the neural trackers without the informativeness classifier, the attention mechanism, or the hierarchical softmax, as well as the

previous work, which implies the plausibility of adopting them in dialog state tracking.

There is still some room for improvement, although the proposed model achieves an acceptable result. One possible improvement for the current neural tracker is to preserve topic consistency when the model predicts slots. The proposed model considers only a given utterance when it predicts a slot. However, since a general dialog has a single topic, the topic of the utterances in the same dialog segment has to continue across slot predictions. Thus, in future work, we will consider the dialog history to catch the flow of a dialog and to preserve topic consistency when predicting slots.

Data Availability

We used dialog state data (the TourSG data set), which is provided by ETPL-A*STAR (the dialog state tracking challenge 4 organization). TourSG is a research dialog corpus in the touristic domain and is divided into training/development data and test data. The training and development data include manual transcriptions and annotations at both the utterance and subdialog levels for training a model and fine-tuning its parameters. The test set includes manual transcriptions for evaluating the model. Any researcher or company can access the data at the following site: https://www.etpl.sg/innovation-offerings/ready-to-sign-licenses/toursg-a-research-dialogue-corpus-in-the-touristic-domain. Researchers or their companies need to pay a license fee to get the TourSG data set after signing a license agreement with ETPL-A*STAR. The data set includes transcribed and annotated dialogs as well as ontology objects describing the annotations.

[Figure 9: The number of errors according to error types (false, missing, and extraneous attributes) for NT-Atten-NoHier, NT-NoAtten-Hier, NT-Atten-Hier, and the proposed model.]


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B04935678).

References

[1] Y.-N. Chen, W. Y. Wang, and A. I. Rudnicky, "Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 120–125, Olomouc, Czech Republic, December 2013.

[2] M. Henderson, B. Thomson, and S. Young, "Deep neural network approach for the dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 467–471, Metz, France, August 2013.

[3] M. Henderson, "Machine learning for dialog state tracking: a review," in Proceedings of the Machine Learning in Spoken Language Processing Workshop, Aizuwakamatsu, Japan, September 2015.

[4] A. Bordes, Y.-L. Boureau, and J. Weston, "Learning end-to-end goal-oriented dialog," in Proceedings of the 5th International Conference on Learning Representations, pp. 1–15, Toulon, France, April 2017.

[5] T. Zhao and M. Eskenazi, "Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning," in Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 1–10, Los Angeles, CA, USA, September 2016.

[6] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746–1751, Doha, Qatar, October 2014.

[7] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proceedings of Advances in Neural Information Processing Systems, pp. 649–657, Montreal, Canada, December 2015.

[8] M. Henderson, B. Thomson, and S. Young, "Word-based dialog state tracking with recurrent neural networks," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia, PA, USA, June 2014.

[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.

[10] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the Tenth Workshop on Artificial Intelligence and Statistics, pp. 246–252, Bridgetown, Barbados, January 2005.

[11] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of Advances in Neural Information Processing Systems, pp. 1081–1088, Vancouver, BC, Canada, December 2009.

[12] J. Williams, A. Raux, D. Ramachandran, and A. Black, "The dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 404–413, Metz, France, August 2013.

[13] R. Higashinaka, M. Nakano, and K. Aikawa, "Corpus-based discourse understanding in spoken dialogue systems," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pp. 240–247, Sapporo, Japan, July 2003.

[14] V. Zue, S. Seneff, J. R. Glass, et al., "JUPITER: a telephone-based conversational interface for weather information," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, pp. 85–96, 2000.

[15] S. Larsson and D. R. Traum, "Information state and dialogue management in the TRINDI dialogue move engine toolkit," Natural Language Engineering, vol. 6, no. 3-4, pp. 323–340, 2000.

[16] K. Sun, L. Chen, S. Zhu, and K. Yu, "A generalized rule based tracker for dialogue state tracking," in Proceedings of the Spoken Language Technology Workshop, pp. 330–335, South Lake Tahoe, NV, USA, December 2014.

[17] D. Bohus and A. Rudnicky, "A 'K hypotheses + other' belief updating model," in Proceedings of the AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, pp. 13–18, Boston, MA, USA, July 2006.

[18] B. Thomson and S. Young, "Bayesian update of dialogue state: a POMDP framework for spoken dialogue systems," Computer Speech & Language, vol. 24, no. 4, pp. 562–588, 2010.

[19] Y. Ma, A. Raux, D. Ramachandran, and R. Gupta, "Landmark-based location belief tracking in a spoken dialog system," in Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 169–178, Seoul, Korea, July 2012.

[20] K. Yoshino, T. Hiraoka, G. Neubig, and S. Nakamura, "Dialogue state tracking using long short term memory neural networks," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselka, Finland, January 2016.

[21] H. Shi, T. Ushio, M. Endo, K. Yamagami, and N. Horii, "Convolutional neural networks for multi-topic dialog state tracking," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselka, Finland, January 2016.

[22] J. Perez and F. Liu, "Dialog state tracking, a machine reading approach using memory network," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 305–314, Valencia, Spain, April 2017.

[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, NV, USA, December 2013.

[24] W.-T. Yih, X. He, and C. Meek, "Semantic parsing for single-relation question answering," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 643–648, Baltimore, MD, USA, June 2014.

[25] H. Peng, J. Li, Y. Song, and Y. Liu, "Incrementally learning the hierarchical softmax function for neural language models," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3267–3273, San Francisco, CA, USA, February 2017.

[26] S. Kim, L. F. D'Haro, R. E. Banchs, J. D. Williams, and M. Henderson, "The fourth dialog state tracking challenge," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselka, Finland, January 2016.



any information about the true slot and the true attributes of the slot. For instance, "Hi. My name is Arlyn" is an utterance responding to a system utterance, and "Hello" is an opening speech, but neither carries information about the dialog state. Therefore, such noninformative utterances should be filtered out before estimating a dialog state from an utterance.

Another problem is that task-oriented dialogs cover a large state space. As a result, many slot-attribute pairs occur rarely in the training data even if they become a dialog state. Dialog state tracking can be formulated as a classification task in which the slot-attribute pairs are classes. However, classifiers based on traditional machine-learning methods, including neural networks with the softmax, do not work well in this task due to the great number of classes and the infrequent classes. To resolve this problem, the dialog state tracker should not only handle a large number of classes but also be able to predict rare classes.

This paper proposes a two-step neural tracker that solves these problems. The first step of the proposed tracker filters out the noninformative utterances. Filtering out noninformative utterances can be regarded as a binary classification which determines whether a given utterance has meaningful words for tracking dialog states or not. Since convolutional neural networks (CNNs) show good performance in sentence classification [6, 7], this paper adopts a CNN to filter out noninformative utterances. The second step of the proposed tracker determines actual dialog states by using an attention-based recurrent neural network (RNN) with the hierarchical softmax function. The reason why an RNN is adopted in the second step is that it is effective in classifying dynamic sequences with complex behaviors [8]. This RNN-based dialog state tracker employs the attention mechanism and the hierarchical softmax, since the attention mechanism shows good performance in various NLP tasks [9] and helps a model focus on valuable words in an utterance. In addition, the adoption of the hierarchical softmax [10, 11] reduces the computational complexity of training the proposed model. Dialogs are usually described with a large vocabulary, and every word in the vocabulary can be an attribute, that is, a class in dialog state tracking. Since the hierarchical softmax can deal with a large number of classes efficiently, the proposed tracker can make good estimations of dialog states for infrequent classes as well as for frequent classes.
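To make the complexity argument concrete, a minimal two-level hierarchical softmax can be sketched with numpy: the class probability factors into a group probability and a within-group probability, so each normalization runs over a small set instead of the full class inventory. The grouping, weight shapes, and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def two_level_hierarchical_softmax(h, W_group, W_class, groups, target):
    """P(target | h) = P(group(target) | h) * P(target | group, h).
    groups maps a group id to the list of class ids it contains."""
    g = next(gid for gid, members in groups.items() if target in members)
    p_group = softmax(W_group @ h)[g]          # normalize over groups only
    members = groups[g]
    p_within = softmax(W_class[members] @ h)   # normalize within the group only
    return p_group * p_within[members.index(target)]

rng = np.random.default_rng(0)
h = rng.normal(size=8)                         # an encoder output
groups = {0: [0, 1, 2], 1: [3, 4]}             # 5 classes split into 2 groups
W_group = rng.normal(size=(2, 8))
W_class = rng.normal(size=(5, 8))
total = sum(two_level_hierarchical_softmax(h, W_group, W_class, groups, c)
            for c in range(5))
print(round(total, 6))  # probabilities over all 5 classes sum to 1
```

With a balanced tree of depth 2 over V classes, each probability costs O(√V) normalization work instead of O(V), which is the efficiency gain the paragraph refers to.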

The rest of this paper is organized as follows. Section 2 discusses the previous studies on dialog state tracking. Section 3 proposes the two-step neural dialog state tracker, and Section 4 describes its implementation details. An evaluation of the proposed model is given in Section 5. Finally, Section 6 concludes this paper.

2 Related Work

Conventional dialog systems use handcrafted heuristics to update dialog states from the output of a natural language understanding module [12, 13]. For instance, Zue et al. proposed handcrafted update rules to manage dialog states in a weather information system [14], and Larsson and

Traum adopted handwritten rules to track information states, which represent the necessary information for tracking a dialog flow [15]. On the other hand, Sun et al. proposed a rule-based method to compute the confidence scores of the N-best candidates generated by their spoken language understanding module [16]. However, these rules are not derived automatically from real dialog data, so they require careful tuning and delicate design efforts. Due to such characteristics, rule-based methods often result in inaccurate dialog state estimation.

As a solution to handcrafted rules, statistical methods have been applied to dialog state tracking [17–19]. Bohus and Rudnicky used a logistic regression to provide an initial assessment of the reliability of the information obtained from a user [17]. To achieve high tracking performance, they adopted features such as the prior likelihoods of dialog state slots. Thomson and Young represented dialog state tracking with a Bayesian network and proposed a belief update algorithm for resolving the intractability of inference in the Bayesian network [18]. Their experimental results prove that the model not only can deal with a very large number of slots but also achieves significant improvement over the systems with handcrafted rules. Ma et al. proposed a model with two trackers for a location-based dialog system [19]. One tracker is designed with a Bayesian network to track nonlandmark concepts such as a company or a street, while the other is a kernel-based tracker that deals with landmark concepts in a dialog corpus. However, these studies share a common problem: they have to enumerate all possible dialog states, which is computationally very expensive.

Due to increasing interest in dialog state tracking, the dialog state tracking challenges (DSTC) have been held for the last several years. DSTC provides a fully labeled data set and an evaluation framework for benchmark tasks. This data set has evolved from human-machine dialogs to human-human task-oriented dialogs, and from a single domain to multiple domains. For instance, a person requests bus information from a machine in DSTC1; thus, the machine leads the dialogs in DSTC1. On the other hand, a person requests tour information from a (human) tour guide in DSTC4. Since the DSTC data set is used in most recent studies [20, 21], it is now regarded as a standard data set for dialog state tracking.

One notable aspect of DSTC is that the number of studies which adopt a neural network for dialog state tracking is increasing explosively. Henderson et al. first applied a neural network to dialog state tracking [2]. Their study is meaningful in that it is the first attempt to use a neural network for dialog state tracking, but their method still relies heavily on handcrafted features. After that, they proposed another neural network-based model without any handcrafted features or explicit semantic representation [8]. This model extracts its feature representation automatically from the N-best list of speech recognition and machine acts. Perez and Liu used the memory network for easy handling of dialog history as well as identifying user requests [22]. They generated question-answer pairs manually to estimate what


slot-attribute pairs are actually needed for a subdialog. However, the main problem of these studies is that they are based on the assumption that each component of a dialog system is developed independently. As a result, the problem of error propagation still remains.

The most recent neural approaches to dialog state tracking accept raw user utterances directly rather than analyzing the utterances through a natural language understanding module. For instance, Yoshino et al. proposed a neural tracker based on the long short-term memory (LSTM) [20]. The input to this LSTM is the raw dialog utterances of a user. On the other hand, Shi et al. used a CNN to focus on tracking general information in a subdialog segment [21]. To deal with multitopic dialogs, their CNN is composed of general and topic-specific filters that share topic information with each other. They showed meaningful results in their experiments for multiple topics, but their model tends to miss the right timing when the tracker should generate its output for some specific slots.

3 Two-Step Neural Dialog State Tracker

Figure 1 shows the general structure of traditional task-oriented dialog systems. These systems usually consist of six submodules: ASR (automatic speech recognition), NLU (natural language understanding), DST (dialog state tracking), DM (dialog manager), NLG (natural language generation), and TTS (text-to-speech). Since they are pipelined, the errors of one module are easily propagated to subsequent modules. Thus, it is important to reduce the number of modules as well as to minimize the errors of each module. This paper proposes a neural dialog state tracker in an end-to-end style. That is, the tracker does not use any output from the NLU. Instead, it takes the result of the ASR as its input directly and returns a relevant dialog state as its output. Therefore, it actually substitutes the two components of NLU and DST in this figure.

Task-oriented dialogs usually proceed in four steps: opening, asking, offering, and ending. There can be many user utterances in each step that are not directly related to dialog states. Especially, the utterances in the opening and ending steps deliver no information about dialog states, since they are usually greetings and closing remarks. However, the asking and offering steps can also have such utterances. For instance, let us consider an example dialog for tour booking in Figure 2. While the bold utterances deliver the type of a tour place and restriction information, the italic utterances have nothing to do with the dialog state. Note that such noninformative utterances can appear between informative utterances. These noninformative utterances result in performance degradation of a dialog state tracker if they are given as input to the tracker. Thus, they should be filtered out before tracking dialog states.

To filter out noninformative utterances before determining actual dialog states, the proposed tracker works in two steps, as shown in Figure 3. In the first step, the informativeness classifier determines whether a user utterance U is informative or not. If it decides that U is noninformative, then U is discarded. Otherwise, the dialog state

of U is classified by the actual dialog state tracker. Since both the informativeness classifier and the dialog state tracker are implemented with neural networks in this paper, each user utterance is represented as a sequence of word vectors by Word2Vec [23].
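The two-step flow can be sketched as follows; `is_informative` and `estimate_state` are placeholder stand-ins for the CNN classifier and the RNN tracker described in Section 4, not the actual models.

```python
from typing import Callable, List, Optional, Tuple

# Hedged sketch of the two-step flow in Figure 3: a classifier first filters
# noninformative utterances, then a tracker labels the rest.
def track_dialog(
    utterances: List[str],
    is_informative: Callable[[str], bool],             # step 1: CNN classifier
    estimate_state: Callable[[str], Tuple[str, str]],  # step 2: RNN tracker
) -> List[Optional[Tuple[str, str]]]:
    states: List[Optional[Tuple[str, str]]] = []
    for u in utterances:
        if not is_informative(u):
            states.append(None)                # discard U ("No" branch)
        else:
            states.append(estimate_state(u))   # (slot, attribute)
    return states

# toy stand-ins for the two neural components
demo = track_dialog(
    ["Hello.", "We are thinking of going to a theme park."],
    is_informative=lambda u: "theme park" in u,
    estimate_state=lambda u: ("TYPE_OF_PLACE", "theme park"),
)
print(demo)  # → [None, ('TYPE_OF_PLACE', 'theme park')]
```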

4 Implementation of Two-Step Tracker

4.1. Informativeness Classifier. Convolutional neural networks have been reported to be outstanding in many NLP tasks including sentence classification [6, 24]. Thus, a CNN is adopted for the informativeness classifier in Figure 3. It accepts the numerical utterance vector as its input. Assume that the user utterance U consists of n words, that is, U = {x_1, x_2, ..., x_n}, where x_i is the i-th word in U. If x_i is represented as a k-dimensional vector x_i by Word2Vec, then the utterance vector U_{1:n} becomes

[Figure 1: The general structure of traditional task-oriented dialog systems (ASR, NLU, DST, DM, NLG, TTS, and a DB). The proposed dialog tracker corresponds to both NLU and DST of the traditional systems.]

S: Hello.
T: Hi. My name is Arlyn.
S: What do you want to do in this tour?
T: For this particular tour, since we're gonna probably bring along our daughter with us, we're sort of thinking going to a theme park maybe.
S: A theme park. Okay. How old is your daughter?
T: She's four.
S: Oh, she's only four years old. Okay.
T: Yeah.
S: But some of the rides she may not be able to take. You can visit the theme parks, but they may have restrictions on the height or the age of children.

Figure 2: An example dialog about a tour booking with some dialog-state-independent utterances. S in this example is the system, and T is a tourist.

[Figure 3: The overall structure of the proposed two-step neural tracker. An utterance U enters the informativeness classifier; if it is judged noninformative, U is discarded, and otherwise the dialog state tracker outputs its dialog state.]


U_{1:n} = x_1 ⊕ x_2 ⊕ ⋯ ⊕ x_n,  (1)

where ⊕ is the concatenation operator.

The informativeness classifier determines whether the

utterance delivers some information about dialog states or not. Figure 4 depicts the structure of the informativeness classifier. The classifier is a CNN with a convolution layer, a max pooling layer, a regularization layer, and a softmax layer. The convolution layer produces multiple feature maps with multiple filters, where each filter f_w ∈ R^{w×k} is applied to a window of w words to produce a new feature for the next layer. That is, a new feature c_i^w is derived from a filter f_w and a window of words by

c_i^w = t(f_w · U_{i:i+w−1} + b),  (2)

where t is a nonlinear function such as the hyperbolic tangent and b is a bias term. This filter is applied to every window of words in U to produce a feature map with window size w:

c^w = [c_1^w, c_2^w, ..., c_{n−w+1}^w].  (3)

Multiple filters with various window sizes are used to obtain multiple feature maps.

The max pooling layer captures the most important feature of each feature map. When a max pooling operation is applied to a feature map c^w, the maximum value of the map, ĉ^w = max(c^w), is obtained as the feature corresponding to the particular filter f_w. This pooling scheme deals with variable-length inputs and produces fixed-length outputs for the subsequent layers. A dropout is used for the regularization of this CNN before the outputs of the max pooling layer are passed to a fully connected softmax layer, whose output is the probability distribution over informativeness labels.
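Equations (2) and (3) together with the max pooling step can be sketched in numpy as follows. This is an illustrative toy with random filters, not the authors' implementation.

```python
import numpy as np

def cnn_features(U, filters, b=0.0):
    """Apply Equations (2)-(3) plus max pooling.
    U: (n, k) matrix of word vectors; filters: list of (w, k) filter matrices.
    Returns one pooled feature per filter."""
    n, k = U.shape
    pooled = []
    for f in filters:
        w = f.shape[0]
        # c_i^w = tanh(f_w . U_{i:i+w-1} + b) for every window (Eq. 2)
        c = [np.tanh(np.sum(f * U[i:i + w]) + b) for i in range(n - w + 1)]
        pooled.append(max(c))   # max pooling over the feature map c^w (Eq. 3)
    return np.array(pooled)

rng = np.random.default_rng(1)
U = rng.normal(size=(6, 4))     # a 6-word utterance with 4-dim embeddings
filters = [rng.normal(size=(2, 4)), rng.normal(size=(3, 4))]
feats = cnn_features(U, filters)
print(feats.shape)  # → (2,)
```

The pooled vector has one entry per filter regardless of the utterance length n, which is the fixed-length property the paragraph describes.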

4.2. Neural Dialog State Tracker. Figure 5 shows the structure of the proposed dialog state tracker. This tracker consists of an RNN-based encoder and an output decoder, where the decoder is composed of an attention layer, a linear layer, and a softmax layer. The gated recurrent unit (GRU) is used for the encoder since it is widely used in recurrent neural networks to encode a sentence or an utterance. Each GRU cell takes an input vector $x_i$ and calculates the hidden state $h_i$ using a reset gate $r_i$, an update gate $z_i$, and weight matrices $W$ and $M$ for the two gates. The variables are actually computed by

$h_i = (1 - z_i)\,h_{i-1} + z_i\,\tilde{h}_i,$
$z_i = \sigma(W_z x_i + M_z h_{i-1}),$
$r_i = \sigma(W_r x_i + M_r h_{i-1}),$  (4)

where the hidden state $h_i$ of a GRU cell at time $i$ is a linear interpolation between the previous hidden state $h_{i-1}$ and the candidate hidden state $\tilde{h}_i$. The update gate $z_i$ decides how much the unit updates its content, and $r_i$ is a reset gate that determines whether the GRU cell forgets the previously computed state or not. The candidate hidden state $\tilde{h}_i$ is then updated using the current input, the reset gate, and the previous hidden state. That is,

$\tilde{h}_i = \tanh(W x_i + M(r_i \odot h_{i-1})),$  (5)

where $\odot$ is an element-wise multiplication.

The slot and its attribute for a dialog state are estimated by the decoder. The input for the decoder is an encoded representation $h_{1:n}$ of the utterance. Since it is important in estimating slots to focus on the words containing information about a dialog state, an attention mechanism is adopted in this architecture. Bahdanau et al. showed that a basic encoder-decoder architecture with an attention mechanism outperforms the architecture without attention at machine translation tasks [9]. This implies that the attention mechanism can roughly tell the decoder to which words it has to pay attention in translating a sentence. In dialog state tracking, the decoder is able to catch salient words from an input utterance by adopting an attention mechanism.

The attention mechanism used in the proposed model first calculates the normalized alignment scores that refer to how closely the words from $x_i$ to $x_j$ are related. The alignment score of each hidden layer output is calculated by

$e_{ij} = a(h_i, h_j),$  (6)

where $a$ is an alignment model that is implemented by a feedforward network. Before computing the salience vector $s_i$, the alignment score is normalized by

$\alpha_{ij} = \exp(e_{ij}) \big/ \sum_{l=1}^{n} \exp(e_{il}).$  (7)

Then the salience vector $s_i$, which indicates the significance of the output of a hidden layer, is calculated as the sum of the hidden layer outputs weighted by the normalized alignment scores. That is,

$s_i = \sum_{j=1}^{n} \alpha_{ij} h_j.$  (8)

After all salience vectors are obtained, they are averaged into a single vector as

$s = \frac{1}{n} \sum_{i=1}^{n} s_i.$  (9)
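Equations (4) to (9) can be traced end to end with a small NumPy sketch. Everything here is a toy assumption: the sizes, the random weights, and the function names are invented for illustration, and the alignment model $a(\cdot)$, a feedforward network in the paper, is replaced by a plain dot product for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W, M, Wz, Mz, Wr, Mr):
    """One GRU cell update following Equations (4)-(5)."""
    z = sigmoid(Wz @ x + Mz @ h_prev)             # update gate z_i
    r = sigmoid(Wr @ x + Mr @ h_prev)             # reset gate r_i
    h_tilde = np.tanh(W @ x + M @ (r * h_prev))   # candidate state, Equation (5)
    return (1 - z) * h_prev + z * h_tilde         # Equation (4)

def encode_and_attend(xs, params, align):
    """Encode an utterance with a GRU, then build the averaged salience
    vector s of Equations (6)-(9) from the hidden states."""
    h = np.zeros(params["M"].shape[0])
    hs = []
    for x in xs:
        h = gru_step(x, h, params["W"], params["M"], params["Wz"],
                     params["Mz"], params["Wr"], params["Mr"])
        hs.append(h)
    hs = np.stack(hs)                                          # n x d hidden states
    e = np.array([[align(hi, hj) for hj in hs] for hi in hs])  # Equation (6)
    alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)   # Equation (7)
    s_i = alpha @ hs                                           # Equation (8)
    return s_i.mean(axis=0)                                    # Equation (9)

rng = np.random.default_rng(1)
d, k, n = 6, 4, 5                       # toy sizes; the paper uses 500-dimensional layers
params = {name: rng.normal(size=(d, k)) for name in ("W", "Wz", "Wr")}
params.update({name: rng.normal(size=(d, d)) for name in ("M", "Mz", "Mr")})
xs = [rng.normal(size=k) for _ in range(n)]
s = encode_and_attend(xs, params, align=lambda a, b: float(a @ b))
print(s.shape)                          # a single d-dimensional summary vector
```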

Then the dialog state is estimated from $s$. That is, $s$ is inputted to the softmax layer with a hierarchical softmax. The hierarchical softmax reduces the computational complexity by representing all classes as a binary tree [11]. A leaf node of the tree represents a class, and an intermediate node contains the common representation of its child nodes. Thus, unlike the flat softmax in which every class is processed independently, the hierarchical softmax makes a class be complemented by other classes through the intermediate nodes [25]. As a result, the hierarchical softmax becomes able to estimate infrequent classes. In general, a dialog has a large vocabulary, and every word in the vocabulary becomes a class in dialog state tracking. Thus, due to the great number of classes and the infrequent classes in dialog state tracking, it is difficult for a tracker to estimate the dialog state. However, by adopting the hierarchical softmax function, the proposed tracker can estimate infrequent classes as well as the frequent classes.

In the proposed model, the slots and their attributes are considered as words, and thus they are represented as a binary tree in which a word is a leaf node. Since there are $V$ words, the tree has $V-1$ inner nodes. For each leaf node, there exists only one path from the root to the leaf node, and this path is used to estimate the probability of the word. For example, Figure 6 is a hierarchical binary tree in which the leaf nodes represent words (slot-attribute pairs) and the inner nodes represent probability mass. The highlighted nodes and edges make a path from the root to a leaf node $w_1$, and $n(w, j)$ means the $j$-th node on the path from the root to a leaf node $w$. In the hierarchical softmax model, each of the $V-1$ inner nodes has an output vector $v'_{n(w,j)}$. The probability of a word being an output word is

$p(w = w_O) = \prod_{j=1}^{L(w)-1} \sigma\!\left(I(t) \cdot {v'_{n(w,j)}} \cdot h\right),$  (10)

where $I(t)$ is an indicator function whose value is 1 if the variable $t$ is true and $-1$ otherwise. In this equation, $t$ checks if $n(w, j+1) = \mathrm{lch}(n(w, j))$, where $\mathrm{lch}(n)$ is the left child of node $n$, and $L(w)$ is the length of the path. Thus, $I(t)$ examines whether the $(j+1)$-th inner node on the path from the root to $w$ is equal to the left child of the $j$-th inner node on the path or not.
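A minimal sketch of Equation (10), assuming each leaf's path is given as a list of (inner-node output vector, goes-left) pairs; building the tree itself is omitted, and all names and sizes are invented for the example. Since $\sigma(x) + \sigma(-x) = 1$, the two children of any inner node split their parent's probability mass exactly, which the last line checks.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_softmax_prob(path, h):
    """Probability of one leaf (slot-attribute pair) as in Equation (10):
    at each inner node n(w, j) with output vector v, the indicator
    contributes +1 when the path continues to the left child, -1 otherwise."""
    p = 1.0
    for v, goes_left in path:
        p *= sigmoid((1.0 if goes_left else -1.0) * np.dot(v, h))
    return p

rng = np.random.default_rng(2)
d = 8
h = rng.normal(size=d)                          # stands in for the salience vector s
inner = [rng.normal(size=d) for _ in range(3)]  # output vectors of three inner nodes

# Sibling leaves take complementary final turns, so their probabilities
# sum exactly to the probability mass of their shared parent node.
left_leaf = [(inner[0], True), (inner[1], False), (inner[2], True)]
right_leaf = [(inner[0], True), (inner[1], False), (inner[2], False)]
parent = [(inner[0], True), (inner[1], False)]
p_sum = hierarchical_softmax_prob(left_leaf, h) + hierarchical_softmax_prob(right_leaf, h)
print(abs(p_sum - hierarchical_softmax_prob(parent, h)) < 1e-9)  # → True
```

Only $L(w)-1$ sigmoids are evaluated per word instead of a $V$-way softmax, which is where the speedup over the flat softmax comes from.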

5. Experiments

5.1. Data Set. To examine the effectiveness of the proposed model, we used the TourSG data set (http://www.colips.org/workshop/dstc4/data.html) that was used for DSTC4. Table 1 shows simple statistics on the data set. This data set has 35 dialog sessions in total on Singapore tour information, collected from Skype call logs. Three tour guides and 35 tourists participated in these call logs. The 35 dialog sessions are composed of 31,304 utterances and 273,580 words. The training set has seven dialogs each from tour guides SG1 and SG2, the development set has three dialogs each from the same guides, and the test set has three dialogs from each of the tour guides. Thus, the dialogs from tour guide SG3 are used only for the test. As a result, this data set can be used to evaluate the robustness of dialog state trackers.

Every dialog in the data set is expressed as a series of utterances, and each utterance is an unrefined transcription containing some disfluencies such as "ah," "uh," or "um." A dialog can be divided again into subdialog segments that belong to one of nine topic categories: OPENING, CLOSING, FOOD, ITINERARY, ACCOMMODATION, ATTRACTION, SHOPPING, TRANSPORTATION, and OTHER. The example dialog in Figure 7 has two subdialogs, where a dashed line is the border between the subdialogs.

Figure 4: The structure of the informativeness classifier: an n × k representation of the sentence, a convolution layer with multiple filter widths and feature maps, max-over-time pooling, and a fully connected layer with dropout and softmax output.

Figure 5: The structure of the proposed dialog state tracker with attention and hierarchical softmax.


As shown in the "Dialog State" column of Figure 7, not all utterances have a slot and attributes.

The topics OPENING, CLOSING, ITINERARY, and OTHER deliver no information about dialog states. Thus, they are all merged into a single topic, OTHERS, for the informativeness classification. As a result, for the informativeness classifier, the five topics FOOD, ACCOMMODATION, ATTRACTION, SHOPPING, and TRANSPORTATION are considered as the informative class, while the topic OTHERS is regarded as the noninformative class. Table 2 summarizes the class distribution of the TourSG data set for the informativeness classifier. The training set has 9,974 informative utterances and 2,785 noninformative utterances. The development set has 4,139 informative and 673 noninformative utterances, while the test set has 6,528 and 1,320 utterances for the informative and noninformative classes, respectively.
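The topic merge described above amounts to a one-line rule; the function and constant names below are ours, not part of the data set tooling.

```python
# The five topics that carry dialog-state information in TourSG.
INFORMATIVE_TOPICS = {"FOOD", "ACCOMMODATION", "ATTRACTION", "SHOPPING", "TRANSPORTATION"}

def informativeness_label(topic):
    """Collapse the nine TourSG topic categories into the binary training
    label: OPENING, CLOSING, ITINERARY, and OTHER all fall into OTHERS,
    i.e., the noninformative class."""
    return "informative" if topic in INFORMATIVE_TOPICS else "noninformative"

print(informativeness_label("FOOD"), informativeness_label("OPENING"))  # informative noninformative
```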

Figure 6: An example of a hierarchical binary tree. The leaf nodes w1, ..., wn are words (slot-attribute pairs); n(w1, 1), n(w1, 2), and n(w1, 3) are the inner nodes on the highlighted path from the root to the leaf w1.

Table 1: Simple statistics on the TourSG data set.

Set         | No. of dialogs (SG1/SG2/SG3/Total) | No. of segments (Acco/Attr/Food/Shop/Trsp/Other/Total) | No. of utterances
Training    | 7/7/0/14                           | 187/762/275/149/374/357/2104                           | 12759
Development | 3/3/0/6                            | 94/282/102/67/87/68/700                                | 4812
Test        | 3/3/3/9                            | 174/616/134/49/174/186/1333                            | 7848

Tourist: Can you give me some uh- tell me some cheap rate hotels because I'm planning just to leave my bags there and go somewhere take some pictures. [TYPE = budget hotel]
Tourist: Yes I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and you know stay out the whole day. [TYPE = hostel]
Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep. [INFO = room size]
Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures. [INFO = room size]
Guide: Okay um- [-]
Tourist: Hm. [-]
Guide: Let's try this one okay. [-]
Tourist: Okay. [-]
Guide: It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room it's two single beds at fifty nine dollars. [PLACE = the inncrowd backpackers hostel]
Tourist: Um. Wow that's good. [PLACE = the inncrowd backpackers hostel]
Guide: Yah the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only. [INFO = pricerange]
Tourist: Oh okay. That's- the price is reasonable actually. It's good. [INFO = pricerange]
Guide: Okay. I'm going to recommend firstly you want to have a backpack type of hotel right. [TYPE = hostel]

Figure 7: A sample dialog from the TourSG data set for the topic ACCOMMODATION. Each utterance is paired with its dialog state annotation; a dash means no annotation.


The data set has two kinds of slots for dialog state tracking: a regular slot and an INFO slot. The regular slot is predefined to deliver the essential content of a subdialog for a specific topic. A topic contains three to seven regular slots on average. On the other hand, the INFO slot is for indicating the content mentioned implicitly in the subdialog. In the second subdialog of Figure 7, there are two slots, "PLACE" and "INFO," where "PLACE" is a regular slot and "INFO" is the INFO slot. Since the hotel name is described explicitly with the words "InnCrowd Backpackers Hostel," it becomes the attribute of the slot "PLACE." In addition, the dialog participants are discussing the price of the hotel. Thus, the attribute of the INFO slot becomes Pricerange, even though the words "price range" do not appear explicitly in the dialog. The possible slot types and the list of their candidate attributes vary according to a topic category, and they are described in an ontology provided by DSTC [26].

Table 3 shows the number of slots and their possible attributes for every topic category. ACCOMMODATION, ATTRACTION, FOOD, and SHOPPING contain common regular slots that refer to the names of places, the types of places, and the information of geographic areas. In detail, the topic ACCOMMODATION has the three common regular slots, and the topic ATTRACTION includes five regular slots: slots about the touristic activities and visit time for an attraction as well as the three common regular slots. The topic FOOD has seven slots that stand for the types of cuisine, the names of dishes and drinks, the time for eating, and the three common regular slots. The topic SHOPPING has the three common regular slots and time information for shopping. The last topic, TRANSPORTATION, includes the types of transportation, the destinations and origins, the information on train lines and stations in Singapore, and the types of tickets. The regular slots have 570, 611, 1411, 651, and 2450 attributes in ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION, respectively. On average, one regular slot has roughly 190, 122, 202, 163, and 490 attributes for each topic category. On the other hand, there is only one INFO slot. This is because INFO is both a slot type and a slot name. It has 23, 31, 16, 16, and 15 attributes for the five topic categories.

5.2. Evaluation Metrics and Experimental Settings. The accuracy and the F1-score are used to evaluate both the informativeness classifier and the neural dialog state tracker, where they are defined as follows:

(i) Accuracy: the fraction of predictions in which the model output is the same as the gold label.

(ii) Precision/Recall/F1:

(a) Precision: the fraction of the model outputs that are correctly predicted.

(b) Recall: the fraction of the gold labels that are correctly predicted.

(c) F1: the harmonic mean of precision and recall.

For the informativeness classifier, the accuracy checks how correctly a classifier determines whether a given utterance is informative or not, and the F1-score shows how precise the informativeness classifier is as well as how robust it is. In dialog state tracking, the predictions are slot-attribute pairs. Thus, the accuracy checks how equivalent the tracker's outputs are to the gold standard labels at the whole-frame level, while the precision and the recall check the partial correctness at the slot-attribute level.
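Under the definitions above, both evaluation levels can be sketched by treating each utterance's dialog state as a set of (slot, attribute) pairs. The helper name and the toy frames are illustrative and are not part of the official DSTC4 evaluation scripts.

```python
def frame_metrics(gold_frames, pred_frames):
    """Frame-level accuracy plus slot-attribute-level precision/recall/F1.
    Each frame is the set of (slot, attribute) pairs for one utterance."""
    # Accuracy: the whole frame must match exactly.
    accuracy = sum(g == p for g, p in zip(gold_frames, pred_frames)) / len(gold_frames)
    # Precision/recall: partial credit at the slot-attribute pair level.
    tp = sum(len(g & p) for g, p in zip(gold_frames, pred_frames))
    n_pred = sum(len(p) for p in pred_frames)
    n_gold = sum(len(g) for g in gold_frames)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

gold = [{("TYPE", "hostel")}, {("PLACE", "inncrowd"), ("INFO", "pricerange")}]
pred = [{("TYPE", "hostel")}, {("PLACE", "inncrowd"), ("INFO", "room size")}]
acc, p, r, f1 = frame_metrics(gold, pred)
print(acc, p, r, f1)  # frame accuracy 0.5; precision, recall, and F1 all 2/3
```

The gap between the two levels explains why the accuracies in Tables 7 and 10 are far lower than the F1-scores: an utterance counts as correct at the frame level only when every one of its slot-attribute pairs is right.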

The parameters of the informativeness classifier and the neural dialog tracker are described in Tables 4 and 5, respectively. Every word of the utterances, the input for both the informativeness classifier and the neural tracker, is embedded into a 500-dimensional vector. Three different sizes of filters are used for the informativeness classifier, and the number of filters for each filter size is 128. The dropout rate of the classifier is 0.5, the batch size is 50, and the number of training epochs is 1,000. For the neural tracker, the hidden layer size is 500, the dropout ratio is 0.1, the learning rate is 0.005, and the number of epochs is 100,000.

5.3. Experimental Results. Table 6 shows the classification performance of the informativeness classifier. The classifier achieves an accuracy of 93.16% and an F1-score of 93.17% on the test set, while it shows a relatively low precision of 90.89%. The reason for the relatively low precision is that there are some utterances about money in the noninformative class. For instance, the classifier predicts the utterance "Okay maybe a thousand dollars for a day" as informative due to the words "thousand dollars" and "a day," but its gold standard label is noninformative. Since the dialog to which the utterance belongs is an opening section before recommending tour information, the utterance is noninformative for tracking the dialog. This problem occurs because the classifier determines the informativeness of utterances without any contextual information about the dialog flow.

To show the feasibility of the proposed model, its dialog state tracking performance is compared with those of the participants in the DSTC4 challenge. The string match tracker determines the slot and the attribute by a simple exact string match between the slot and the given utterance [26]. The LSTM-based neural tracker proposed by Yoshino et al. [20] is a pure LSTM without the

Table 2: The class distribution for the informativeness classifier.

Set         | Informative | Noninformative
Training    | 9974        | 2785
Development | 4139        | 673
Test        | 6528        | 1320


attention mechanism. Thus, it can be directly compared with the proposed model. Tables 7 and 8 show the performance of the trackers. The proposed model outperforms its competitors in both accuracy and F1-score. Especially, it achieves more than three times higher accuracy than the LSTM, which is a neural network-based model and a similar approach to the proposed model. This result means that handling the necessary information from an utterance is important for tracking the dialog state. The main difference between the proposed model and the LSTM is the presence of the attention mechanism. Thus, it can be inferred that it is important in dialog state tracking to focus on some salient words while processing a user utterance.

The performance of the proposed model per topic is investigated in Figure 8. The performance is measured for the five topics related with dialog state tracking. The proposed model achieves the best performance on every topic, although its performance is relatively low on the topic ACCOMMODATION. The reason why the proposed model shows poor performance for ACCOMMODATION is that there are many utterances mentioning some locations in the dialogs of ACCOMMODATION, but

Table 6: The classification performance of the informativeness classifier.

Set         | Accuracy | Precision | Recall | F1-score
Development | 0.9484   | 0.9202    | 0.9820 | 0.9501
Test        | 0.9316   | 0.9089    | 0.9557 | 0.9317

Table 5: The parameters for the neural dialog state tracker and their values.

Parameter         | Value
Embedding size    | 500
Hidden layer size | 500
Dropout ratio     | 0.1
Learning rate     | 0.005
Number of epochs  | 100000

Table 3: The number of slots (regular, INFO) and their attributes.

Topic category | Regular: no. of slots | Regular: no. of attributes | Regular: avg. attributes per slot | INFO: no. of slots | INFO: no. of attributes
ACCOMMODATION  | 3 | 570  | 190 | 1 | 23
ATTRACTION     | 5 | 611  | 122 | 1 | 31
FOOD           | 7 | 1411 | 202 | 1 | 16
SHOPPING       | 4 | 651  | 163 | 1 | 16
TRANSPORTATION | 5 | 2450 | 490 | 1 | 15

Table 4: The parameters for the informativeness classifier and their values.

Parameter         | Value
Embedding size    | 500
Filter sizes      | 2, 3, 4
Number of filters | 128
Dropout ratio     | 0.5
Batch size        | 50
Number of epochs  | 1000

Table 7: The comparison of the dialog state tracking accuracies between the proposed model and DSTC4 participants.

Model             | Accuracy
String match [26] | 0.0374
LSTM [20]         | 0.0270
Proposed model    | 0.0852

Table 8: The comparison of the dialog state tracking F1-scores between the proposed model and DSTC4 participants.

Model             | Precision | Recall | F1-score
String match [26] | 0.3589    | 0.1925 | 0.2506
LSTM [20]         | 0.3400    | 0.2010 | 0.2530
Proposed model    | 0.5173    | 0.3499 | 0.4174

Figure 8: The F1-score of the proposed model per topic (ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION), compared with the string match tracker and the LSTM.


they are often related to other topics such as TRANSPORTATION and SHOPPING. For instance, for the utterance "ten minutes walk to the mrt train station," the proposed model predicts its slot as "STATION" and its attribute as "MRT train station." This prediction seems to be relevant, since a train station is usually related to TRANSPORTATION. However, the dialog is about the places near the suggested accommodation. Thus, the gold standard slot for the utterance is "NEIGHBOURHOOD," and the gold standard attribute is "Kampong Glam," a location name.

Table 9 shows the dialog state tracking accuracy of the proposed model for every slot in a topic category. Note that each accuracy is computed at the utterance level, not the subdialog level. The regular slots which have distinguishable attributes, like "TIME" and "TYPE_OF_PLACE," show moderately high accuracy, while some regular slots such as "NEIGHBOURHOOD," "PLACE," or "DISH" achieve low accuracy. The reason why the proposed model shows poor performance for these regular slots is that they have specific names for their attributes, such as restaurant names, street names, and dish names, which are extremely infrequent in the data set. The proposed model also achieves meaningful performance for the INFO slot. The high performance for the INFO slot implies that the proposed model can catch the contents mentioned implicitly in an utterance.

To examine the effectiveness of the attention mechanism and the hierarchical softmax, three variations of the proposed model are compared. One variant (NT-Atten-NoHier) is the neural tracker with the attention mechanism and the general softmax. It is designed to inspect the effectiveness of the attention mechanism in dialog state tracking. Another (NT-NoAtten-Hier) is the neural tracker without the attention but with the hierarchical softmax. It is prepared to examine the hierarchical softmax. The third (NT-Atten-Hier) is the proposed model, the neural tracker with both the attention and the hierarchical softmax. The informativeness classifier is not used in comparing them.

Tables 10 and 11 prove the effectiveness of the attention mechanism and the hierarchical softmax. The proposed model (NT-Atten-Hier) outperforms both NT-Atten-NoHier and NT-NoAtten-Hier, which implies that the attention mechanism and the hierarchical softmax are both helpful in enhancing the neural tracker. One thing to note in these tables is that the performance improvement of the neural tracker is made mainly by the attention mechanism. NT-Atten-Hier achieves 1.6 times higher accuracy than NT-NoAtten-Hier, but it shows a similar accuracy to NT-Atten-NoHier. Nevertheless, it is also true that the hierarchical softmax enhances the neural tracker. This is proved by the fact that the performance of NT-Atten-Hier is slightly higher than that of NT-Atten-NoHier.

Table 12 shows the effectiveness of the informativeness classifier. The F1-score of the pure neural tracker NT-Atten-Hier is 0.3977, but it goes up to 0.4174 when the informativeness classifier is applied in front of the neural tracker, as in Figure 3. The improvement by the informativeness classifier is also observed in accuracy. This is because the adoption of the informativeness classifier

Table 9: The dialog state tracking accuracy (%) of the proposed model for every slot in a topic category.

Topic          | Slot          | Accuracy (%)
ACCOMMODATION  | INFO          | 32.4
ACCOMMODATION  | TYPE_OF_PLACE | 48.3
ACCOMMODATION  | PLACE         | 18.7
ACCOMMODATION  | NEIGHBOURHOOD | 8.8
ATTRACTION     | INFO          | 21.1
ATTRACTION     | TYPE_OF_PLACE | 65.5
ATTRACTION     | ACTIVITY      | 55.7
ATTRACTION     | PLACE         | 4.6
ATTRACTION     | TIME          | 63.3
ATTRACTION     | NEIGHBOURHOOD | 17.8
FOOD           | INFO          | 37.1
FOOD           | CUISINE       | 47.1
FOOD           | TYPE_OF_PLACE | 68.6
FOOD           | DRINK         | 62.2
FOOD           | PLACE         | 7.7
FOOD           | MEAL_TIME     | 73.9
FOOD           | DISH          | 5.8
FOOD           | NEIGHBOURHOOD | 27.3
SHOPPING       | INFO          | 41.3
SHOPPING       | TYPE_OF_PLACE | 41.9
SHOPPING       | PLACE         | 4.8
SHOPPING       | NEIGHBOURHOOD | 4.1
SHOPPING       | TIME          | 75.0
TRANSPORTATION | INFO          | 23.9
TRANSPORTATION | FROM          | 5.3
TRANSPORTATION | TO            | 5.6
TRANSPORTATION | STATION       | 14.7
TRANSPORTATION | LINE          | 16.1
TRANSPORTATION | TYPE          | 42.8

Table 10: The dialog state tracking accuracy of the proposed model's variants.

Model           | Accuracy
NT-Atten-NoHier | 0.0672
NT-NoAtten-Hier | 0.0422
NT-Atten-Hier   | 0.0697

Table 11: The F1-score of the dialog state tracking of the proposed model's variants.

Model           | Precision | Recall | F1-score
NT-Atten-NoHier | 0.4403    | 0.3335 | 0.3795
NT-NoAtten-Hier | 0.3316    | 0.3058 | 0.3181
NT-Atten-Hier   | 0.4662    | 0.3469 | 0.3977

Table 12: The effect of the informativeness classifier in the proposed model.

Model                              | Accuracy | Precision | Recall | F1-score
Without informativeness classifier | 0.0697   | 0.4662    | 0.3469 | 0.3977
With informativeness classifier    | 0.0852   | 0.5173    | 0.3499 | 0.4174


reduces the possibility of misjudgments caused by noninformative utterances.

To analyze the shortcomings of the various neural trackers, we use three types of slot-level errors that were defined in the DSTC challenge. The error types are as follows:

(i) Missing attribute: the actual dialog has the attribute for a specific slot, but the tracker does not output any attribute for the slot.

(ii) Extraneous attribute: no attribute for a slot is contained in the actual dialog, but the tracker outputs some attributes for the slot.

(iii) False attribute: the attribute for a slot is specified in the actual dialog, but the tracker outputs a different attribute from the dialog.

According to the results of the DSTC4 challenge [26], the largest error type is the missing attribute.
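The three error types can be counted from gold and predicted slot-attribute mappings along these lines; the helper is an illustrative reconstruction, not the official DSTC scorer, and the toy frames echo the ACCOMMODATION example discussed earlier.

```python
def slot_error_types(gold, pred):
    """Count the three DSTC slot-level error types for one subdialog, given
    gold and predicted mappings from slot name to attribute value."""
    errors = {"missing": 0, "extraneous": 0, "false": 0}
    for slot in gold.keys() | pred.keys():
        if slot not in pred:
            errors["missing"] += 1      # gold has the slot, tracker is silent
        elif slot not in gold:
            errors["extraneous"] += 1   # tracker outputs a slot the dialog lacks
        elif gold[slot] != pred[slot]:
            errors["false"] += 1        # slot matched, but with the wrong attribute
    return errors

gold = {"NEIGHBOURHOOD": "kampong glam", "TYPE": "hostel"}
pred = {"STATION": "mrt train station", "TYPE": "hostel"}
print(slot_error_types(gold, pred))  # {'missing': 1, 'extraneous': 1, 'false': 0}
```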

Figure 9 depicts the number of errors of each error type made by the variants of the proposed model. In this figure, "Proposed model" means NT-Atten-Hier with the informativeness classifier. The number of errors in missing attribute and extraneous attribute is much reduced in "Proposed model," while sacrificing somewhat of false attribute. Therefore, the informativeness classifier proves to be actually helpful in reducing the errors in missing and extraneous attributes. Many errors in false attribute are made by the limited context of a given utterance. For instance, let us consider the utterance "uh another place you might want to check out is opposite to Sentosa island and it is called Vivocity." In this utterance, a user wants to know another shopping place. Since the proposed model focuses only on the utterance, it predicts the slot for the utterance as "NEIGHBOURHOOD" and its attribute as "Sentosa." This is a wrong but plausible prediction, since the utterance also contains locational information semantically. That is, many errors by the proposed model are not critical in understanding the user intention.

6. Conclusion

This paper has proposed a task-oriented dialog state tracker that works in two steps. The proposed tracker is composed of an informativeness classifier and a neural dialog state tracker. It first filters out noninformative utterances in a dialog by using the informativeness classifier. The informativeness classifier is implemented by a CNN, which has shown good performance in sentence classification. In the next step, the neural tracker determines dialog states from the informative utterances with an attention mechanism and a hierarchical softmax. By adopting the attention mechanism and the hierarchical softmax, the proposed model can not only focus on salient words in a given utterance but also deal efficiently with a large vocabulary in dialogs. The experimental results on dialog state tracking show that the proposed model outperforms the neural trackers without the informativeness classifier, the attention mechanism, or the hierarchical softmax, as well as the previous work, which implies the plausibility of adopting them in dialog state tracking.

There is still some room for improvement although the proposed model achieves acceptable results. One possible improvement for the current neural tracker is to preserve topic consistency when the model predicts slots. The proposed model considers only a given utterance when it predicts a slot. However, since a general dialog has a single topic, the topic of the utterances in the same dialog segment has to continue when a model predicts the slots of the utterances. Thus, in future work, we will consider a dialog history to catch the flow of a dialog and to preserve the topic consistency of the predicted slots.

Data Availability

We used dialog state data (the TourSG data set), which is provided by ETPL-A*STAR (the dialog state tracking challenge 4 organization). TourSG is a research dialog corpus in the touristic domain and is divided into train/development data and test data. The train and development data include manual transcriptions and annotations at both utterance and subdialog levels for training a model and fine-tuning its parameters. The test set includes manual transcriptions for evaluating the model. Any researcher or company can access the data at the following site: https://www.etpl.sg/innovation-offerings/ready-to-sign-licenses/toursg-a-research-dialogue-corpus-in-the-touristic-domain. Researchers or their companies need to pay a license fee to get the TourSG data set after signing a license agreement with ETPL-A*STAR. The data set includes transcribed and annotated dialogs as well as ontology objects describing the annotations.

Figure 9: The number of errors according to error types (false, missing, and extraneous) for NT-Atten-NoHier, NT-NoAtten-Hier, NT-Atten-Hier, and the proposed model.


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B04935678).

References

[1] Y.-N. Chen, W. Y. Wang, and A. I. Rudnicky, "Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 120-125, Olomouc, Czech Republic, December 2013.

[2] M. Henderson, B. Thomson, and S. Young, "Deep neural network approach for the dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 467-471, Metz, France, August 2013.

[3] M. Henderson, "Machine learning for dialog state tracking: a review," in Proceedings of the Machine Learning in Spoken Language Processing Workshop, Aizuwakamatsu, Japan, September 2015.

[4] A. Bordes, Y.-L. Boureau, and J. Weston, "Learning end-to-end goal-oriented dialog," in Proceedings of the 5th International Conference on Learning Representations, pp. 1-15, Toulon, France, April 2017.

[5] T. Zhao and M. Eskenazi, "Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning," in Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 1-10, Los Angeles, CA, USA, September 2016.

[6] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746-1751, Doha, Qatar, October 2014.

[7] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proceedings of Advances in Neural Information Processing Systems, pp. 649-657, Montreal, Canada, December 2015.

[8] M. Henderson, B. Thomson, and S. Young, "Word-based dialog state tracking with recurrent neural networks," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia, PA, USA, June 2014.

[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.

[10] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the Tenth Workshop on Artificial Intelligence and Statistics, pp. 246-252, Bridgetown, Barbados, January 2005.

[11] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of Advances in Neural Information Processing Systems, pp. 1081-1088, Vancouver, BC, Canada, December 2009.

[12] J. Williams, A. Raux, D. Ramachandran, and A. Black, "The dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 404-413, Metz, France, August 2013.

[13] R. Higashinaka, M. Nakano, and K. Aikawa, "Corpus-based discourse understanding in spoken dialogue systems," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pp. 240-247, Sapporo, Japan, July 2003.

[14] V. Zue, S. Seneff, J. R. Glass et al., "JUPITER: a telephone-based conversational interface for weather information," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, pp. 85-96, 2000.

[15] S. Larsson and D. R. Traum, "Information state and dialogue management in the TRINDI dialogue move engine toolkit," Natural Language Engineering, vol. 6, no. 3-4, pp. 323-340, 2000.

[16] K. Sun, L. Chen, S. Zhu, and K. Yu, "A generalized rule based tracker for dialogue state tracking," in Proceedings of the Spoken Language Technology Workshop, pp. 330-335, South Lake Tahoe, NV, USA, December 2014.

[17] D. Bohus and A. Rudnicky, "A 'K hypotheses + other' belief updating model," in Proceedings of the AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, pp. 13-18, Boston, MA, USA, July 2006.

[18] B. Thomson and S. Young, "Bayesian update of dialogue state: a POMDP framework for spoken dialogue systems," Computer Speech & Language, vol. 24, no. 4, pp. 562-588, 2010.

[19] Y. Ma, A. Raux, D. Ramachandran, and R. Gupta, "Landmark-based location belief tracking in a spoken dialog system," in Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 169-178, Seoul, Korea, July 2012.

[20] K. Yoshino, T. Hiraoka, G. Neubig, and S. Nakamura, "Dialogue state tracking using long short term memory neural networks," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1-8, Saariselka, Finland, January 2016.

[21] H. Shi, T. Ushio, M. Endo, K. Yamagami, and N. Horii, "Convolutional neural networks for multi-topic dialog state tracking," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1-8, Saariselka, Finland, January 2016.

[22] J. Perez and F. Liu, "Dialog state tracking, a machine reading approach using memory network," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 305-314, Valencia, Spain, April 2017.

[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of Advances in Neural Information Processing Systems, pp. 3111-3119, Lake Tahoe, NV, USA, December 2013.

[24] W.-T. Yih, X. He, and C. Meek, "Semantic parsing for single-relation question answering," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 643-648, Baltimore, MD, USA, June 2014.

[25] H. Peng, J. Li, Y. Song, and Y. Liu, "Incrementally learning the hierarchical softmax function for neural language models," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3267-3273, San Francisco, CA, USA, February 2017.

[26] S. Kim, L. F. D'Haro, R. E. Banchs, J. D. Williams, and M. Henderson, "The fourth dialog state tracking challenge," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1-8, Saariselka, Finland, January 2016.

Computational Intelligence and Neuroscience 11


slot-attribute pairs are actually needed for a subdialog. However, the main problem of these studies is that they are based on the assumption that each component of a dialog system is developed independently. As a result, the problem of error propagation still remains.

The most recent neural approach to dialog state tracking accepts raw user utterances directly rather than analyzing the utterances through a natural language understanding module. For instance, Yoshino et al. proposed a neural tracker based on the long short-term memory (LSTM) [20]. The input to this LSTM is the raw dialog utterances of a user. On the other hand, Shi et al. used a CNN to focus on tracking general information in a subdialog segment [21]. To deal with multitopic dialogs, their CNN is composed of general and topic-specific filters that share topic information with each other. They showed meaningful results in their experiments for multiple topics, but their model tends to miss the right timing when the tracker should generate its output for some specific slots.

3. Two-Step Neural Dialog State Tracker

Figure 1 shows the general structure of traditional task-oriented dialog systems. The systems usually consist of six submodules: ASR (automatic speech recognition), NLU (natural language understanding), DST (dialog state tracking), DM (dialog manager), NLG (natural language generation), and TTS (text-to-speech). Since they are pipelined, the errors of one module are easily propagated to subsequent modules. Thus, it is important to reduce the number of modules as well as to minimize the errors of each module. This paper proposes a neural dialog state tracker in an end-to-end style. That is, the tracker does not use any output from the NLU. Instead, it takes the result of the ASR as its input directly and returns a relevant dialog state as its output. Therefore, it actually substitutes the two components of NLU and DST in this figure.

Task-oriented dialogs usually proceed in four steps: opening, asking, offering, and ending. In each step there can be many user utterances that are not directly related to dialog states. Especially, the utterances in the opening and the ending steps deliver no information about dialog states since they are usually greetings and closing remarks. However, the asking and offering steps can also have such utterances. For instance, consider the example dialog for tour booking in Figure 2. While the bold utterances deliver the type of a tour place and restriction information, the italic utterances have nothing to do with the dialog state. Note that such noninformative utterances can appear between informative utterances. These noninformative utterances result in performance degradation of a dialog state tracker if they are given as input to the tracker. Thus, they should be filtered out before tracking dialog states.

To filter out noninformative utterances before determining actual dialog states, the proposed tracker operates in two steps, as shown in Figure 3. In the first step, the informativeness classifier determines whether a user utterance U is informative or not. If it decides that U is noninformative, then U is discarded. Otherwise, the dialog state of U is classified by the actual dialog state tracker. Since both the informativeness classifier and the dialog state tracker are implemented with neural networks in this paper, each user utterance is represented as a sequence of word vectors by Word2Vec [23].

4. Implementation of Two-Step Tracker

4.1. Informativeness Classifier. Convolutional neural networks have been reported to be outstanding in many NLP tasks including sentence classification [6, 24]. Thus, a CNN is adopted for the informativeness classifier in Figure 3. It accepts the numerical utterance vector as its input. Assume that the user utterance U consists of n words. That is, U = {x_1, x_2, ..., x_n}, where x_i is the i-th word in U. If x_i is represented as a k-dimensional vector x_i by Word2Vec, then the utterance vector U_{1:n} becomes

[Figure 1 depicts the pipeline of ASR, NLU, DST, DM, NLG, and TTS modules with a DB, where the proposed model spans the NLU and DST components.]

Figure 1: The general structure of traditional task-oriented dialog systems. The proposed dialog tracker corresponds to both the NLU and the DST of the traditional systems.

S: Hello.
T: Hi. My name is Arlyn.
S: What do you want to do in this tour?
T: For this particular tour, since we're gonna probably bring along our daughter with us, we're sort of thinking going to a theme park maybe.
S: A theme park. Okay. How old is your daughter?
T: She's four.
S: Oh, she's only four years old. Okay.
T: Yeah.
S: But some of the rides she may not be able to take. You can visit the theme parks, but they may have restrictions on the height or the age of children.

Figure 2: An example dialog about a tour booking with some dialog-state-independent utterances. S in this example is the system and T is a tourist.

[Figure 3 depicts utterance U entering the informativeness classifier; if it is judged informative (Yes), the dialog state tracker outputs the dialog state of U; otherwise (No), U is discarded.]

Figure 3: The overall structure of the proposed two-step neural tracker.


U_{1:n} = x_1 ⊕ x_2 ⊕ ⋯ ⊕ x_n,  (1)

where ⊕ is the concatenation operator.

The informativeness classifier determines whether the utterance delivers some information about dialog states or not. Figure 4 depicts the structure of the informativeness classifier. The classifier is a CNN with a convolution layer, a max pooling layer, a regularization layer, and a softmax layer. The convolution layer makes multiple feature maps with multiple filters, where each filter f_w ∈ R^{w×k} is applied to a window of w words to produce a new feature for the next layer. That is, a new feature c_i^w is derived from a filter f_w and a window of words by

c_i^w = t(f_w · U_{i:i+w−1} + b),  (2)

where t is a nonlinear function such as the hyperbolic tangent and b is a bias term. This filter is applied to every window of words in U to produce a feature map with a window size w:

c^w = [c_1^w, c_2^w, ..., c_{n−w+1}^w].  (3)

Multiple filters with various window sizes of words areused to obtain multiple feature maps

The max pooling layer captures the most important feature of each feature map. When a max pooling operation is applied to a feature map c^w, the maximum value of the map, ĉ_w = max(c^w), is obtained as the feature corresponding to the particular filter f_w. This pooling scheme deals with variable-length inputs and produces fixed-length outputs for the subsequent layers. A dropout is used for the regularization of this CNN before the outputs of the max pooling layer are passed to a fully connected softmax layer, whose output is the probability distribution over informativeness labels.
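To make equations (2) and (3) and the max-over-time pooling concrete, the following plain-Python sketch applies a single filter of window size w = 2 to a toy four-word utterance. The filter weights, the 2-dimensional embeddings, and the use of tanh for the nonlinearity t are illustrative assumptions, not the trained parameters of the classifier.

```python
import math

def conv_feature_map(U, f, b):
    """Eq. (2)-(3): slide a w x k filter f over the n x k utterance matrix U,
    producing one tanh feature per window of w consecutive word vectors."""
    w, k = len(f), len(f[0])
    c = []
    for i in range(len(U) - w + 1):
        s = sum(f[j][d] * U[i + j][d] for j in range(w) for d in range(k))
        c.append(math.tanh(s + b))  # c_i^w = t(f_w . U_{i:i+w-1} + b)
    return c

def max_over_time(c):
    """Max pooling: keep only the strongest feature of the map."""
    return max(c)

# Toy utterance of 4 words with 2-dimensional embeddings (hypothetical values).
U = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.4], [0.2, 0.2]]
f = [[1.0, 0.0], [0.0, 1.0]]          # hypothetical filter, window size w = 2
cmap = conv_feature_map(U, f, b=0.0)  # feature map of length n - w + 1 = 3
pooled = max_over_time(cmap)
```

Several such filters with different window sizes would each contribute one pooled feature; their concatenation feeds the dropout and softmax layers.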

4.2. Neural Dialog State Tracker. Figure 5 shows the structure of the proposed dialog state tracker. This tracker consists of an RNN-based encoder and an output decoder, where the decoder is composed of an attention layer, a linear layer, and a softmax layer. The gated recurrent unit (GRU) is used for the encoder since it is widely used in recurrent neural networks to encode a sentence or an utterance. Each GRU cell takes an input vector x_i and calculates the hidden state h_i using a reset gate r_i, an update gate z_i, and weight matrices W and M for the two gates. The variables are actually computed by

h_i = (1 − z_i) h_{i−1} + z_i h̃_i,
z_i = σ(W_z x_i + M_z h_{i−1}),
r_i = σ(W_r x_i + M_r h_{i−1}),  (4)

where the hidden state h_i of a GRU cell at time i is a linear interpolation between the previous hidden state h_{i−1} and the candidate hidden state h̃_i. The update gate z_i decides how much the unit updates its content, and r_i is a reset gate that determines whether the GRU cell forgets the previously computed state or not. Then h̃_i, the candidate hidden state, is updated using the current input, the reset gate, and the previous hidden state. That is,

h̃_i = tanh(W x_i + M (r_i ⊙ h_{i−1})),  (5)

where ⊙ is an element-wise multiplication.

The slot and its attribute for a dialog state are estimated by the decoder. The input for the decoder is an encoded representation h_{1:n} of the utterance. Since it is important in estimating slots to focus on some words containing information about a dialog state, an attention mechanism is adopted in this architecture. Bahdanau et al. showed that a basic encoder-decoder architecture with an attention mechanism outperforms the architecture without attention at machine translation tasks [9]. This implies that the attention mechanism can roughly tell the decoder which words it has to pay attention to in translating a sentence. In dialog state tracking, the decoder is able to catch salient words from an input utterance by adopting an attention mechanism.
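The GRU update of equations (4) and (5) can be sketched in scalar form for readability; the weight values below are arbitrary stand-ins for the trained matrices W, M, W_z, M_z, W_r, and M_r.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, Wz, Mz, Wr, Mr, W, M):
    """One GRU update in scalar form, mirroring equations (4) and (5)."""
    z = sigmoid(Wz * x + Mz * h_prev)             # update gate z_i
    r = sigmoid(Wr * x + Mr * h_prev)             # reset gate r_i
    h_cand = math.tanh(W * x + M * (r * h_prev))  # candidate state, eq. (5)
    return (1.0 - z) * h_prev + z * h_cand        # interpolation, eq. (4)

# Encode a toy 3-step input sequence with arbitrary stand-in weights.
h = 0.0
for x in [0.5, -0.3, 0.8]:
    h = gru_step(x, h, Wz=0.4, Mz=0.1, Wr=0.3, Mr=0.2, W=0.9, M=0.5)
```

In the actual tracker the weights are matrices and h_i is a 500-dimensional vector; the scalar form only shows how the gates interact.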

The attention mechanism used in the proposed model first calculates the normalized alignment scores that refer to how closely the words from x_i to x_j are related. The alignment score of each hidden layer output is calculated by

e_{ij} = a(h_i, h_j),  (6)

where a is an alignment model that is implemented by a feedforward network. Before computing the salience vector s_i, the alignment score is normalized by

α_{ij} = exp(e_{ij}) / Σ_{l=1}^{n} exp(e_{il}).  (7)

Then the salience vector s_i that indicates the significance of the output of a hidden layer is calculated as a weighted sum of the hidden layer outputs with the normalized alignment scores. That is,

s_i = Σ_{j=1}^{n} α_{ij} h_j.  (8)

After all salience vectors are obtained, they are averaged into a single vector as

s = (1/n) Σ_{i=1}^{n} s_i.  (9)
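A minimal sketch of equations (6)-(9) follows. For simplicity, a dot product stands in for the feedforward alignment model a(h_i, h_j), which is an assumption rather than the paper's trained scorer, and the encoder outputs are toy 2-dimensional vectors.

```python
import math

def attention_pool(h):
    """Equations (6)-(9): alignment scores, softmax normalization,
    per-position salience vectors, and their average."""
    n, k = len(h), len(h[0])
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    salience = []
    for i in range(n):
        e = [dot(h[i], h[j]) for j in range(n)]           # eq. (6)
        total = sum(math.exp(x) for x in e)
        alpha = [math.exp(x) / total for x in e]          # eq. (7)
        s_i = [sum(alpha[j] * h[j][d] for j in range(n))  # eq. (8)
               for d in range(k)]
        salience.append(s_i)
    # eq. (9): average the salience vectors into a single vector s
    return [sum(s[d] for s in salience) / n for d in range(k)]

h = [[0.2, 0.1], [0.9, -0.4], [0.3, 0.3]]  # toy encoder outputs h_1..h_3
s = attention_pool(h)
```

Each s_i is a convex combination of the encoder outputs, so the pooled vector s stays within the range of the hidden states while emphasizing the salient positions.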

Then the dialog state is estimated from s. That is, s is inputted to the softmax layer with a hierarchical softmax. The hierarchical softmax reduces the computation complexity by representing all classes as a binary tree [11]. A leaf node of the tree represents a class, and an intermediate node contains the common representation of its child nodes. Thus, unlike the flat softmax in which every class is processed independently, the hierarchical softmax makes a class be complemented by other classes through the intermediate nodes [25]. As a result, the hierarchical softmax becomes able to estimate infrequent classes. In general, a dialog has a large vocabulary, and every word in the vocabulary becomes a class in dialog state tracking. Thus, due to the great number of classes and the infrequent classes in dialog state tracking, it is difficult for a tracker to estimate the dialog state. However, by adopting the hierarchical softmax function, the proposed tracker can estimate infrequent classes as well as frequent classes.

In the proposed model, the slots and their attributes are considered as words, and thus they are represented as a binary tree in which a word is a leaf node. Since there are V words, the tree has V − 1 inner nodes. For each leaf node, there exists only one path from the root to the leaf node, and it is used to estimate the probability of the word. For example, Figure 6 is a hierarchical binary tree in which the leaf nodes represent words (slot-attribute pairs) and the inner nodes represent probability mass. The highlighted nodes and edges make a path from the root to a leaf node w_1, and n(w, j) means the j-th node on the path from the root to a leaf node w. In the hierarchical softmax model, each of the V − 1 inner nodes has an output vector v′_{n(w,j)}. The probability of a word being the output word is

p(w = w_O) = Π_{j=1}^{L(w)−1} σ(I(t) · v′_{n(w,j)} · h),  (10)

where I(t) is an indicator function whose value is 1 if the variable t is true and −1 otherwise. In this equation, t checks if n(w, j+1) = lch(n(w, j)), where lch(n) is the left child of node n and L(w) is the length of the path. Thus, I(t) examines whether the (j+1)-th inner node on the path from the root to w is equal to the left child of the j-th inner node on the path or not.
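Equation (10) can be illustrated over a toy two-level tree. The inner-node vectors and the hidden state h below are hypothetical values chosen for the sketch, which also shows that the path products form a proper distribution over the leaves.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hsoftmax_prob(path, h):
    """Equation (10): the probability of a leaf word is a product of sigmoids
    along its root-to-leaf path. `path` holds (inner-node vector, goes_left)
    pairs, so I(t) is +1 when the next node is the left child, -1 otherwise."""
    p = 1.0
    for v, goes_left in path:
        sign = 1.0 if goes_left else -1.0
        p *= sigmoid(sign * sum(a * b for a, b in zip(v, h)))
    return p

# Toy two-level tree over four leaves; node vectors and h are hypothetical.
h = [0.3, -0.1]
v_root, v_l, v_r = [0.5, 0.2], [-0.4, 0.9], [0.7, 0.1]
paths = {
    "w1": [(v_root, True),  (v_l, True)],
    "w2": [(v_root, True),  (v_l, False)],
    "w3": [(v_root, False), (v_r, True)],
    "w4": [(v_root, False), (v_r, False)],
}
probs = {w: hsoftmax_prob(p, h) for w, p in paths.items()}
```

Because σ(x) + σ(−x) = 1 at every inner node, the leaf probabilities sum to one, while computing any single leaf probability costs only O(log V) sigmoid evaluations instead of a V-way normalization.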

5. Experiments

5.1. Data Set. To examine the effectiveness of the proposed model, we used the TourSG data set (http://www.colips.org/workshop/dstc4/data.html) that was used for DSTC4. Table 1 shows simple statistics on the data set. This data set has 35 dialog sessions in total on Singapore tour information collected from Skype call logs. Three tour guides and 35 tourists have participated in these call logs. The 35 dialog sessions are composed of 31,304 utterances and 273,580 words, respectively. The training set has seven dialogs each from tour guides SG1 and SG2, the development set has three dialogs each from the same guides, and the test set has three dialogs each from all tour guides. Thus, the dialogs from tour guide SG3 are used only for test. As a result, this data set can be used to evaluate the robustness of dialog state trackers.

Every dialog in the data set is expressed as a series of utterances, and each utterance is an unrefined transcription containing some disfluencies such as "ah," "uh," or "um." A dialog can be divided again into subdialog segments that belong to one of nine topic categories: OPENING, CLOSING, FOOD, ITINERARY, ACCOMMODATION, ATTRACTION, SHOPPING, TRANSPORTATION, and OTHER. The example dialog in Figure 7 has two subdialogs, where a dashed line is the border between the subdialogs. As

[Figure 4 shows an n × k representation of the sentence (x_1, ..., x_n), a convolution layer with multiple filter widths and feature maps, max-over-time pooling, and a fully connected layer with dropout and softmax output.]

Figure 4: The structure of the informativeness classifier.

[Figure 5 shows the GRU encoder reading x_1, ..., x_n into hidden states h_1, ..., h_n, attention weights α_1, ..., α_n combining them into a context vector c, and a hierarchical softmax output; an inset shows a GRU cell with reset gate r, update gate z, and candidate state h̃.]

Figure 5: The structure of the proposed dialog state tracker with attention and hierarchical softmax.


shown in the "Dialog State" column of Figure 7, not all utterances have a slot and attributes.

The topics OPENING, CLOSING, ITINERARY, and OTHER deliver no information about dialog states. Thus, they are all merged into a single topic OTHERS for the informativeness classification. As a result, for the informativeness classifier, the five topics of FOOD, ACCOMMODATION, ATTRACTION, SHOPPING, and TRANSPORTATION are considered as an informative class, while the topic OTHERS is regarded as a noninformative class. Table 2 summarizes the class distribution of the TourSG data set for the informativeness classifier. The training set has 9,974 informative utterances and 2,785 noninformative utterances. The development set has 4,139 informative and 673 noninformative utterances, while the test set has 6,528 and 1,320 utterances for the informative and noninformative classes, respectively.

[Figure 6 shows a binary tree whose leaves are words w_1, w_2, ..., w_n; the highlighted path n(w_1, 1) → n(w_1, 2) → n(w_1, 3) leads from the root to leaf w_1.]

Figure 6: An example of a hierarchical binary tree.

Table 1: Simple statistics on the TourSG data set.

Set         | Dialogs (SG1/SG2/SG3/Total) | Segments (Acco/Attr/Food/Shop/Trsp/Other/Total) | Utterances
Training    | 7/7/0/14                    | 187/762/275/149/374/357/2104                    | 12759
Development | 3/3/0/6                     | 94/282/102/67/87/68/700                         | 4812
Test        | 3/3/3/9                     | 174/616/134/49/174/186/1333                     | 7848

Tourist: "Can you give me some uh- tell me some cheap rate hotels because I'm planning just to leave my bags there and go somewhere take some pictures." [TYPE = budget hotel]
Tourist: "Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and you know stay out the whole day." [TYPE = hostel]
Guide: "Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep." [INFO = room size]
Tourist: "Yes. Yes. As we just gonna put our things there and then go out to take some pictures." [INFO = room size]
Guide: "Okay um-" [no state]
Tourist: "Hm." [no state]
Guide: "Let's try this one okay." [no state]
Tourist: "Okay." [no state]
Guide: "It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room it's two single beds at fifty nine dollars." [PLACE = the inncrowd backpackers hostel]
Tourist: "Um. Wow that's good." [PLACE = the inncrowd backpackers hostel]
Guide: "Yah the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only." [INFO = pricerange]
Tourist: "Oh okay. That's- the price is reasonable actually. It's good." [INFO = pricerange]
Guide: "Okay. I'm going to recommend firstly you want to have a backpack type of hotel right." [TYPE = hostel]

Figure 7: A sample dialog from the TourSG data set for the topic ACCOMMODATION.


The data set has two kinds of slots for dialog state tracking: a regular slot and an INFO slot. The regular slot is predefined to deliver the essential content of a subdialog for a specific topic. A topic contains three to seven regular slots on average. On the other hand, the INFO slot is for indicating the content mentioned implicitly in the subdialog. In the second subdialog of Figure 7, there are two slots, "PLACE" and "INFO," where "PLACE" is a regular slot and "INFO" is the INFO slot. Since the hostel name is described explicitly with the words "InnCrowd Backpackers Hostel," it becomes the attribute of the slot "PLACE." In addition, the dialog participants are discussing the price of the hostel. Thus, the attribute of the INFO slot becomes Pricerange, even though the words "price range" do not appear explicitly in the dialog. The possible slot types and the list of their candidate attributes vary according to a topic category, and they are described in an ontology provided by DSTC [26].

Table 3 shows the number of slots and their possible attributes for every topic category. ACCOMMODATION, ATTRACTION, FOOD, and SHOPPING contain common regular slots that refer to the names of places, the types of places, and the information of geographic areas. In detail, the topic ACCOMMODATION has the three common regular slots, and the topic ATTRACTION includes five regular slots covering the touristic activities, the visit time for an attraction, and the three common regular slots. The topic FOOD has seven slots that stand for the types of cuisine, the names of dishes and drinks, the time for eating, and the three common regular slots. The topic SHOPPING has the three common regular slots and time information for shopping. The last topic, TRANSPORTATION, includes the types of transportation, the destinations and origins, the information on train lines and stations in Singapore, and the types of tickets. The regular slots have 570, 611, 1411, 651, and 2450 attributes in ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION, respectively. On average, one regular slot has roughly 190, 122, 202, 163, and 490 attributes for each topic category. On the other hand, the INFO slot has only one slot. This is because INFO is both a slot type and a slot name. It has 23, 31, 16, 16, and 15 attributes for the five topic categories.

5.2. Evaluation Metrics and Experimental Settings. The accuracy and the F1-score are used to evaluate both the informativeness classifier and the neural dialog state tracker, where they are defined as follows:

(i) Accuracy: the fraction of predictions in which the model output is the same as the gold label.

(ii) Precision/Recall/F1:

(a) Precision: the fraction of the model outputs that are correctly predicted.

(b) Recall: the fraction of the gold labels that are correctly predicted.

(c) F1: the harmonic mean of precision and recall.

For the informativeness classifier, the accuracy checks how correctly a classifier determines whether a given utterance is informative or not, and the F1-score shows how precise the informativeness classifier is as well as how robust it is. In dialog state tracking, the predictions are slot-attribute pairs. Thus, the accuracy checks how equivalent the tracker's outputs are to gold standard labels at the whole frame level, while the precision and the recall check the partial correctness at the slot-attribute level.
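The slot-attribute-level precision, recall, and F1 can be computed as in the following sketch, where gold and predicted states are treated as sets of (slot, attribute) pairs; the example pairs are illustrative, not taken from the data set.

```python
def slot_attribute_prf1(gold, pred):
    """Precision/recall/F1 over (slot, attribute) pairs, matching the
    slot-attribute-level scoring described above."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                       # correctly predicted pairs
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative gold and predicted dialog states for one utterance.
gold = [("TYPE", "hostel"), ("INFO", "pricerange")]
pred = [("TYPE", "hostel"), ("PLACE", "sentosa")]
p, r, f = slot_attribute_prf1(gold, pred)  # p = 0.5, r = 0.5, f = 0.5
```

Frame-level accuracy, by contrast, would count this prediction as wrong because the whole set of pairs does not match the gold frame exactly.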

The parameters of the informativeness classifier and the neural dialog tracker are described in Tables 4 and 5, respectively. Every word of the utterances, the input for both the informativeness classifier and the neural tracker, is embedded into a 500-dimension vector. Three different sizes of filters are used for the informativeness classifier, and the number of filters for each filter size is 128. The dropout rate of the classifier is 0.5, the batch size is 50, and the number of training epochs is 1,000. For the neural tracker, the hidden layer size is 500, the dropout ratio is 0.1, the learning rate is 0.005, and the number of epochs is 100,000.

5.3. Experimental Results. Table 6 shows the classification performance of the informativeness classifier. The classifier achieves an accuracy of 0.9316 and an F1-score of 0.9317 for the test set, while it shows a relatively low precision of 0.9089. The reason for the relatively low precision is that there are some utterances about money in the noninformative class. For instance, the classifier predicts the utterance "Okay maybe a thousand dollars for a day" as informative due to the words "thousand dollars" and "a day," but its gold standard label is noninformative. Since the dialog to which the utterance belongs is an opening section before recommending tour information, the utterance is noninformative for tracking the dialog. This problem occurs because the classifier determines the informativeness of utterances without any contextual information about the dialog flow.

To show the feasibility of the proposed model, its dialog state tracking performance is compared with those of the participants in the DSTC4 challenge. The string match tracker determines the slot and the attribute by a simple exact string match between the slot and the given utterance [26]. The LSTM-based neural tracker proposed by Yoshino et al. [20] is a pure LSTM without the

Table 2: The class distribution for the informativeness classifier.

Set         | Informative | Noninformative
Training    | 9974        | 2785
Development | 4139        | 673
Test        | 6528        | 1320


attention mechanism. Thus, it can be directly compared with the proposed model. Tables 7 and 8 show the performance of the trackers. The proposed model outperforms its competitors in both accuracy and F1-score. Especially, it achieves more than three times higher accuracy than the LSTM, which is a neural network-based model and a similar approach to the proposed model. This result means that handling the necessary information from an utterance is important for tracking the dialog state. The main difference between the proposed model and the LSTM is the presence of the attention mechanism. Thus, it can be inferred that it is important in dialog state tracking to focus on some salient words while processing a user utterance.

The performance of the proposed model per topic is investigated in Figure 8. The performance is measured for the five topics related to dialog state tracking. The proposed model achieves the best performance on every topic. Especially, ACCOMMODATION is the only topic on which the proposed model obtains relatively low performance. The reason why the proposed model shows poor performance for ACCOMMODATION is that there are many utterances mentioning some locations in the dialogs of ACCOMMODATION, but

Table 6: The classification performance of the informativeness classifier.

Set         | Accuracy | Precision | Recall | F1-score
Development | 0.9484   | 0.9202    | 0.9820 | 0.9501
Test        | 0.9316   | 0.9089    | 0.9557 | 0.9317

Table 5: The parameters for the neural dialog state tracker and their values.

Parameter         | Value
Embedding size    | 500
Hidden layer size | 500
Dropout ratio     | 0.1
Learning rate     | 0.005
Number of epochs  | 100000

Table 3: The number of slots (regular, INFO) and their attributes.

Topic category | Regular slots | Regular attributes | Avg. attributes per slot | INFO slots | INFO attributes
ACCOMMODATION  | 3 | 570  | 190 | 1 | 23
ATTRACTION     | 5 | 611  | 122 | 1 | 31
FOOD           | 7 | 1411 | 202 | 1 | 16
SHOPPING       | 4 | 651  | 163 | 1 | 16
TRANSPORTATION | 5 | 2450 | 490 | 1 | 15

Table 4: The parameters for the informativeness classifier and their values.

Parameter         | Value
Embedding size    | 500
Filter sizes      | 2, 3, 4
Number of filters | 128
Dropout ratio     | 0.5
Batch size        | 50
Number of epochs  | 1000

Table 7: The comparison of the dialog state tracking accuracies between the proposed model and DSTC4 participants.

Model             | Accuracy
String match [26] | 0.0374
LSTM [20]         | 0.0270
Proposed model    | 0.0852

Table 8: The comparison of the dialog state tracking F1-scores between the proposed model and DSTC4 participants.

Model             | Precision | Recall | F1-score
String match [26] | 0.3589    | 0.1925 | 0.2506
LSTM [20]         | 0.3400    | 0.2010 | 0.2530
Proposed model    | 0.5173    | 0.3499 | 0.4174

[Figure 8 plots the per-topic F1-scores (roughly 0.1 to 0.6) of the string match, LSTM, and proposed models over the topics ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION.]

Figure 8: The F1-score of the proposed model per topic.


they are often related to other topics such as TRANSPORTATION and SHOPPING. For instance, for the utterance "ten minutes walk to the mrt train station," the proposed model predicts the slot as "STATION" and the attribute as "MRT train station." This prediction seems to be relevant since a train station is usually related to TRANSPORTATION. However, the dialog is about the places near the suggested accommodation. Thus, the gold standard slot for the utterance is "NEIGHBORHOOD," and the gold standard attribute is "Kampong Glam," a location name.

Table 9 shows the dialog state tracking accuracy of the proposed model for every slot in a topic category. Note that each accuracy is computed at the utterance level, not the subdialog level. The regular slots that have distinguishable attributes, like "TIME" and "TYPE_OF_PLACE," show moderately high accuracy, while some regular slots such as "NEIGHBOURHOOD," "PLACE," or "DISH" achieve low accuracy. The reason why the proposed model shows poor performance for some regular slots is that they have specific names for their attributes, such as restaurant names, street names, and dish names, that are extremely infrequent in the data set. The proposed model also achieves meaningful performance for the INFO slot. The high performance for the INFO slot implies that the proposed model can catch the contents mentioned implicitly in an utterance.

To examine the effectiveness of the attention mechanism and the hierarchical softmax, three variations of the proposed model are compared. One variant (NT-Atten-NoHier) is the neural tracker with the attention mechanism and the general softmax. This is designed to inspect the effectiveness of the attention mechanism in dialog state tracking. Another (NT-NoAtten-Hier) is the neural tracker without the attention but with the hierarchical softmax. This is prepared to examine the hierarchical softmax. The third (NT-Atten-Hier) is the proposed model, the neural tracker with both the attention and the hierarchical softmax. The informativeness classifier is not used in comparing them.

Tables 10 and 11 prove the effectiveness of the attention mechanism and the hierarchical softmax. The proposed model (NT-Atten-Hier) outperforms both NT-Atten-NoHier and NT-NoAtten-Hier, which implies that the attention mechanism and the hierarchical softmax are both helpful in enhancing the neural tracker. One thing to note in these tables is that the performance improvement of the neural tracker is made mainly by the attention mechanism. NT-Atten-Hier achieves 1.6 times higher accuracy than NT-NoAtten-Hier, but it shows a similar accuracy to NT-Atten-NoHier. Nevertheless, it is also true that the hierarchical softmax enhances the neural tracker. This is proved by the fact that the performance of NT-Atten-Hier is slightly higher than that of NT-Atten-NoHier.

Table 12 shows the effectiveness of the informativeness classifier. The F1-score of the pure neural tracker NT-Atten-Hier is 0.3977, but it goes up to 0.4174 when the informativeness classifier is applied in front of the neural tracker as in Figure 3. The improvement by the informativeness classifier is also observed in accuracy. This is because the adoption of the informativeness classifier

Table 9: The dialog state tracking accuracy (%) of the proposed model for every slot in a topic category.

Topic          | Slot          | Accuracy
ACCOMMODATION  | INFO          | 32.4
ACCOMMODATION  | TYPE_OF_PLACE | 48.3
ACCOMMODATION  | PLACE         | 18.7
ACCOMMODATION  | NEIGHBOURHOOD | 8.8
ATTRACTION     | INFO          | 21.1
ATTRACTION     | TYPE_OF_PLACE | 65.5
ATTRACTION     | ACTIVITY      | 55.7
ATTRACTION     | PLACE         | 4.6
ATTRACTION     | TIME          | 63.3
ATTRACTION     | NEIGHBOURHOOD | 17.8
FOOD           | INFO          | 37.1
FOOD           | CUISINE       | 47.1
FOOD           | TYPE_OF_PLACE | 68.6
FOOD           | DRINK         | 62.2
FOOD           | PLACE         | 7.7
FOOD           | MEAL_TIME     | 73.9
FOOD           | DISH          | 5.8
FOOD           | NEIGHBOURHOOD | 27.3
SHOPPING       | INFO          | 41.3
SHOPPING       | TYPE_OF_PLACE | 41.9
SHOPPING       | PLACE         | 4.8
SHOPPING       | NEIGHBOURHOOD | 4.1
SHOPPING       | TIME          | 75.0
TRANSPORTATION | INFO          | 23.9
TRANSPORTATION | FROM          | 5.3
TRANSPORTATION | TO            | 5.6
TRANSPORTATION | STATION       | 14.7
TRANSPORTATION | LINE          | 16.1
TRANSPORTATION | TYPE          | 42.8

Table 10: The dialog state tracking accuracy of the proposed model's variants.

Model           | Accuracy
NT-Atten-NoHier | 0.0672
NT-NoAtten-Hier | 0.0422
NT-Atten-Hier   | 0.0697

Table 11: The F1-score of the dialog state tracking of the proposed model's variants.

Model           | Precision | Recall | F1-score
NT-Atten-NoHier | 0.4403    | 0.3335 | 0.3795
NT-NoAtten-Hier | 0.3316    | 0.3058 | 0.3181
NT-Atten-Hier   | 0.4662    | 0.3469 | 0.3977

Table 12: The effect of the informativeness classifier in the proposed model.

Model                              | Accuracy | Precision | Recall | F1-score
Without informativeness classifier | 0.0697   | 0.4662    | 0.3469 | 0.3977
With informativeness classifier    | 0.0852   | 0.5173    | 0.3499 | 0.4174


reduces the possibility of misjudgments by noninformativeutterances

To analyze the shortcomings of the various neural trackers, we use three types of slot-level errors that were defined in the DSTC challenge. The error types are as follows:

(i) Missing attribute: the actual dialog has the attributes for a specific slot, but the tracker does not output any attribute for the slot.

(ii) Extraneous attribute: no attribute for a slot is contained in the actual dialog, but the tracker outputs some attributes for the slot.

(iii) False attribute: the attribute for a slot is specified in the actual dialog, but the tracker outputs a different attribute from the dialog.
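The three error types above can be sketched as a simple comparison of gold and predicted slot-to-attribute mappings; the slot names and attributes in the example are illustrative, not drawn from the data set.

```python
def slot_error_types(gold, pred):
    """Split slot-level errors into the three DSTC categories above.
    `gold` and `pred` map slot names to their attribute sets."""
    missing = [s for s in gold if s not in pred]     # missing attribute
    extraneous = [s for s in pred if s not in gold]  # extraneous attribute
    false = [s for s in gold                         # false attribute
             if s in pred and gold[s] != pred[s]]
    return missing, extraneous, false

# Illustrative gold and predicted states (slot names/attributes are made up).
gold = {"PLACE": {"sentosa"}, "TYPE": {"hostel"}}
pred = {"TYPE": {"hotel"}, "NEIGHBOURHOOD": {"kampong glam"}}
m, e, f = slot_error_types(gold, pred)
# m = ["PLACE"], e = ["NEIGHBOURHOOD"], f = ["TYPE"]
```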

According to the results of the DSTC4 challenge [26], the largest error type is the missing attribute.

Figure 9 depicts the number of errors of each type made by the variants of the proposed model. In this figure, "Proposed Model" denotes NT-Atten-Hier with the informativeness classifier. The numbers of missing-attribute and extraneous-attribute errors are much reduced in "Proposed Model", at the cost of a slight increase in false attributes. Therefore, the informativeness classifier proves to be helpful in reducing the errors of the missing and extraneous types. Many errors in false attributes are caused by the limited context of a given utterance. For instance, consider the utterance "uh another place you might want to check out is opposite to Sentosa island and it is called Vivocity." In this utterance, a user wants to know another shopping place. Since the proposed model focuses only on the utterance itself, it predicts the slot for the utterance as "NEIGHBOURHOOD" and its attribute as "Sentosa". This is a wrong but plausible prediction, since the utterance also carries locational information. That is, many errors made by the proposed model are not critical to understanding the user intention.

6. Conclusion

This paper has proposed a task-oriented dialog state tracker that works in two steps. The proposed tracker is composed of an informativeness classifier and a neural dialog state tracker. It first filters out noninformative utterances in a dialog using the informativeness classifier. The informativeness classifier is implemented with a CNN, which has shown good performance in sentence classification. In the next step, the neural tracker determines dialog states from the informative utterances with an attention mechanism and a hierarchical softmax. By adopting the attention mechanism and the hierarchical softmax, the proposed model can not only focus on salient words in a given utterance but also deal efficiently with the large vocabulary of dialogs. The experimental results on dialog state tracking show that the proposed model outperforms the neural trackers without the informativeness classifier, the attention mechanism, or the hierarchical softmax, as well as the

previous work, which implies the plausibility of adopting them in dialog state tracking.

There is still some room for improvement, although the proposed model achieves an acceptable result. One possible improvement for the current neural tracker is to preserve topic consistency when the model predicts slots. The proposed model considers only a given utterance when it predicts a slot. However, since a general dialog has a single topic, the topic of the utterances in the same dialog segment should continue when a model predicts the slots of the utterances. Thus, in future work, we will consider the dialog history to catch the flow of a dialog and to preserve topic consistency in predicting slots.

Data Availability

We used the dialog state data (TourSG data set) provided by ETPL-A*STAR (the dialog state tracking challenge 4 organization). TourSG is a research dialog corpus in the touristic domain and is divided into train/development data and test data. The train and development data include manual transcriptions and annotations at both the utterance and subdialog levels for training a model and fine-tuning its parameters. The test set includes manual transcriptions for evaluating the model. Any researcher or company can access the data at the following site: https://www.etpl.sg/innovation-offerings/ready-to-sign-licenses/toursg-a-research-dialogue-corpus-in-the-touristic-domain. Researchers or their companies need to pay a license fee to get the TourSG data set after signing a license agreement with ETPL-A*STAR. The data set includes transcribed and annotated dialogs as well as ontology objects describing the annotations.

[Figure: grouped bar chart comparing the numbers of False, Missing, and Extraneous errors (y-axis from 0 to 6000) for NT-Atten-NoHier, NT-NoAtten-Hier, NT-Atten-Hier, and the proposed model.]

Figure 9. The number of errors according to error types.


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B04935678).

References

[1] Y.-N. Chen, W. Y. Wang, and A. I. Rudnicky, "Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 120-125, Olomouc, Czech Republic, December 2013.

[2] M. Henderson, B. Thomson, and S. Young, "Deep neural network approach for the dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 467-471, Metz, France, August 2013.

[3] M. Henderson, "Machine learning for dialog state tracking: a review," in Proceedings of the Machine Learning in Spoken Language Processing Workshop, Aizuwakamatsu, Japan, September 2015.

[4] A. Bordes, Y.-L. Boureau, and J. Weston, "Learning end-to-end goal-oriented dialog," in Proceedings of the 5th International Conference on Learning Representations, pp. 1-15, Toulon, France, April 2017.

[5] T. Zhao and M. Eskenazi, "Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning," in Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 1-10, Los Angeles, CA, USA, September 2016.

[6] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746-1751, Doha, Qatar, October 2014.

[7] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proceedings of Advances in Neural Information Processing Systems, pp. 649-657, Montreal, Canada, December 2015.

[8] M. Henderson, B. Thomson, and S. Young, "Word-based dialog state tracking with recurrent neural networks," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia, PA, USA, June 2014.

[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.

[10] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the Tenth Workshop on Artificial Intelligence and Statistics, pp. 246-252, Bridgetown, Barbados, January 2005.

[11] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of Advances in Neural Information Processing Systems, pp. 1081-1088, Vancouver, Canada, December 2009.

[12] J. Williams, A. Raux, D. Ramachandran, and A. Black, "The dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 404-413, Metz, France, August 2013.

[13] R. Higashinaka, M. Nakano, and K. Aikawa, "Corpus-based discourse understanding in spoken dialogue systems," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pp. 240-247, Sapporo, Japan, July 2003.

[14] V. Zue, S. Seneff, J. R. Glass et al., "JUPITER: a telephone-based conversational interface for weather information," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, pp. 85-96, 2000.

[15] S. Larsson and D. R. Traum, "Information state and dialogue management in the TRINDI dialogue move engine toolkit," Natural Language Engineering, vol. 6, no. 3-4, pp. 323-340, 2000.

[16] K. Sun, L. Chen, S. Zhu, and K. Yu, "A generalized rule based tracker for dialogue state tracking," in Proceedings of the Spoken Language Technology Workshop, pp. 330-335, South Lake Tahoe, NV, USA, December 2014.

[17] D. Bohus and A. Rudnicky, "A "K hypotheses + other" belief updating model," in Proceedings of the AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, pp. 13-18, Boston, MA, USA, July 2006.

[18] B. Thomson and S. Young, "Bayesian update of dialogue state: a POMDP framework for spoken dialogue systems," Computer Speech & Language, vol. 24, no. 4, pp. 562-588, 2010.

[19] Y. Ma, A. Raux, D. Ramachandran, and R. Gupta, "Landmark-based location belief tracking in a spoken dialog system," in Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 169-178, Seoul, Korea, July 2012.

[20] K. Yoshino, T. Hiraoka, G. Neubig, and S. Nakamura, "Dialogue state tracking using long short term memory neural networks," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1-8, Saariselka, Finland, January 2016.

[21] H. Shi, T. Ushio, M. Endo, K. Yamagami, and N. Horii, "Convolutional neural networks for multi-topic dialog state tracking," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1-8, Saariselka, Finland, January 2016.

[22] J. Perez and F. Liu, "Dialog state tracking, a machine reading approach using memory network," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 305-314, Valencia, Spain, April 2017.

[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of Advances in Neural Information Processing Systems, pp. 3111-3119, Lake Tahoe, NV, USA, December 2013.

[24] W.-T. Yih, X. He, and C. Meek, "Semantic parsing for single-relation question answering," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 643-648, Baltimore, MD, USA, June 2014.

[25] H. Peng, J. Li, Y. Song, and Y. Liu, "Incrementally learning the hierarchical softmax function for neural language models," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3267-3273, San Francisco, CA, USA, February 2017.

[26] S. Kim, L. F. D'Haro, R. E. Banchs, J. D. Williams, and M. Henderson, "The fourth dialog state tracking challenge," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1-8, Saariselka, Finland, January 2016.


$U_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n$, (1)

where $\oplus$ is the concatenation operator.

The informativeness classifier determines whether an utterance delivers information about dialog states or not. Figure 4 depicts the structure of the informativeness classifier. The classifier is a CNN with a convolution layer, a max pooling layer, a regularization layer, and a softmax layer. The convolution layer makes multiple feature maps with multiple filters, where each filter $f_w \in \mathbb{R}^{wk}$ is applied to a window of $w$ words to produce a new feature for the next layer. That is, a new feature $c_w^i$ is derived from a filter $f_w$ and a window of words by

$c_w^i = t\left(f_w \cdot U_{i:i+w-1} + b\right)$, (2)

where $t$ is a nonlinear function such as the hyperbolic tangent and $b$ is a bias term. This filter is applied to every window of words in $U$ to produce a feature map with a window size $w$:

$c_w = \left[c_w^1, c_w^2, \ldots, c_w^{n-w+1}\right]$. (3)

Multiple filters with various window sizes are used to obtain multiple feature maps.

The max pooling layer captures the most important feature from each feature map. When a max pooling operation is applied to a feature map $c_w$, the maximum value of the map, $\hat{c}_w = \max(c_w)$, is obtained as the feature corresponding to the particular filter $f_w$. This pooling scheme deals with variable-length inputs and produces fixed-length outputs for the subsequent layers. A dropout is used for the regularization of this CNN before the outputs of the max pooling layer are passed to a fully connected softmax layer, whose output is the probability distribution over informativeness labels.
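The convolution and max-over-time pooling of Equations (2) and (3) can be sketched in plain Python for a single filter (the embedding values and filter weights below are illustrative, not trained parameters):

```python
import math

def feature_map(U, f_w, b):
    """One filter f_w (a w x k matrix) slid over the utterance matrix U (n x k).

    Implements Equations (2)-(3): c_w[i] = tanh(f_w . U[i:i+w] + b),
    one feature per window of w consecutive word vectors.
    """
    n, w = len(U), len(f_w)
    k = len(U[0])
    return [math.tanh(sum(f_w[r][c] * U[i + r][c]
                          for r in range(w) for c in range(k)) + b)
            for i in range(n - w + 1)]

def max_over_time(c_w):
    """Max-over-time pooling: keep the strongest response of the filter."""
    return max(c_w)

# A toy utterance of 5 words embedded in 2 dimensions, and one width-2 filter.
U = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4], [-0.2, 0.1], [0.5, 0.0]]
f2 = [[1.0, 0.0], [0.0, 1.0]]
c2 = feature_map(U, f2, b=0.0)   # feature map of length n - w + 1 = 4
pooled = max_over_time(c2)       # the single feature kept for this filter
```

In the full classifier, one such pooled feature is produced per filter, and the concatenation of all pooled features feeds the dropout and softmax layers.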

4.2. Neural Dialog State Tracker. Figure 5 shows the structure of the proposed dialog state tracker. This tracker consists of an RNN-based encoder and an output decoder, where the decoder is composed of an attention layer, a linear layer, and a softmax layer. The gated recurrent unit (GRU) is used for the encoder, since it is widely used in recurrent neural networks to encode a sentence or an utterance. Each GRU cell takes an input vector $x_i$ and calculates the hidden state $h_i$ using a reset gate $r_i$, an update gate $z_i$, and weight matrices $W$ and $M$ for the two gates. The variables are computed by

$h_i = (1 - z_i)\,h_{i-1} + z_i\,\tilde{h}_i$,
$z_i = \sigma\left(W_z x_i + M_z h_{i-1}\right)$,
$r_i = \sigma\left(W_r x_i + M_r h_{i-1}\right)$, (4)

where the hidden state $h_i$ of a GRU cell at time $i$ is a linear interpolation between the previous hidden state $h_{i-1}$ and the candidate hidden state $\tilde{h}_i$. The update gate $z_i$ decides how much the unit updates its content, and $r_i$ is a reset gate that determines whether the GRU cell forgets the previously computed state or not. Then the candidate hidden state $\tilde{h}_i$ is updated using the current input, the reset gate, and the previous hidden state. That is,

$\tilde{h}_i = \tanh\left(W x_i + M\left(r_i \odot h_{i-1}\right)\right)$, (5)

where $\odot$ is an element-wise multiplication.

The slot and its attribute for a dialog state are estimated by the decoder. The input for the decoder is the encoded representation $h_{1:n}$ of the utterance. Since it is important in estimating slots to focus on the words containing information about a dialog state, an attention mechanism is adopted in this architecture. Bahdanau et al. showed that a basic encoder-decoder architecture with an attention mechanism outperforms the architecture without attention on machine translation tasks [9]. This implies that the attention mechanism can roughly tell the decoder which words it has to pay attention to in translating a sentence. Likewise, in dialog state tracking, the decoder is able to catch salient words from an input utterance by adopting an attention mechanism.

The attention mechanism used in the proposed model first calculates the normalized alignment scores that refer to how closely the words from $x_i$ to $x_j$ are related. The alignment score of each hidden layer output is calculated by

$e_{ij} = a\left(h_i, h_j\right)$, (6)

where $a$ is an alignment model that is implemented by a feedforward network. Before computing the salience vector $s_i$, the alignment score is normalized by

$\alpha_{ij} = \dfrac{\exp\left(e_{ij}\right)}{\sum_{l=1}^{n} \exp\left(e_{il}\right)}$. (7)

Then the salience vector $s_i$, which indicates the significance of the output of a hidden layer, is calculated as a weighted sum of the hidden layer outputs with the normalized alignment scores. That is,

$s_i = \sum_{j=1}^{n} \alpha_{ij} h_j$. (8)

After all salience vectors are obtained, they are averaged into a single vector as

$s = \dfrac{1}{n} \sum_{j=1}^{n} s_j$. (9)
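Equations (6)-(9) amount to a softmax over alignment scores followed by weighted averaging. A minimal sketch, with a dot product standing in for the feedforward alignment model $a(\cdot,\cdot)$:

```python
import math

def attention_summary(H, align):
    """Sketch of Equations (6)-(9) over encoder outputs H (a list of n vectors).

    align(h_i, h_j) plays the role of the feedforward alignment model a(., .).
    Returns the averaged salience vector s passed on to the softmax layer.
    """
    n, d = len(H), len(H[0])
    s_bar = [0.0] * d
    for i in range(n):
        e = [align(H[i], H[j]) for j in range(n)]          # Eq. (6)
        z = sum(math.exp(x) for x in e)
        alpha = [math.exp(x) / z for x in e]               # Eq. (7)
        s_i = [sum(alpha[j] * H[j][k] for j in range(n))   # Eq. (8)
               for k in range(d)]
        s_bar = [a + b / n for a, b in zip(s_bar, s_i)]    # Eq. (9)
    return s_bar

# Toy hidden states and a dot-product stand-in for the alignment network.
dot = lambda u, v: sum(a * b for a, b in zip(u, v))
H = [[1.0, 0.0], [0.0, 1.0]]
s = attention_summary(H, dot)   # each alpha row sums to 1, so sum(s) == 1
```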

Then the dialog state is estimated from $s$. That is, $s$ is fed to the softmax layer with a hierarchical softmax. The hierarchical softmax reduces the computational complexity by representing all classes as a binary tree [11]. A leaf node of the tree represents a class, and an intermediate node contains the common representation of its child nodes. Thus, unlike the flat softmax, in which every class is processed independently, the hierarchical softmax lets a class be complemented by other classes through the intermediate nodes [25]. As a result, the hierarchical softmax becomes able to estimate infrequent classes. In general, a dialog has a large vocabulary, and every word in the vocabulary becomes a class in dialog state tracking. Thus, due to the great


number of classes and infrequent classes in dialog state tracking, it is difficult for a tracker to estimate the dialog state. However, by adopting the hierarchical softmax function, the proposed tracker can estimate infrequent classes as well as frequent classes.

In the proposed model, the slots and their attributes are considered as words, and thus they are represented as a binary tree in which each word is a leaf node. Since there are $V$ words, the tree has $V-1$ inner nodes. For each leaf node, there exists only one path from the root to the leaf node, and this path is used to estimate the probability of the word. For example, Figure 6 shows a hierarchical binary tree in which the leaf nodes represent words (slot-attribute pairs) and the inner nodes represent probability mass. The highlighted nodes and edges make a path from the root to the leaf node $w_1$, and $n(w, j)$ means the $j$-th node on the path from the root to a leaf node $w$. In the hierarchical softmax model, each of the $V-1$ inner nodes has an output vector $v'_{n(w,j)}$. The probability of a word being the output word is

$p\left(w = w_O\right) = \prod_{j=1}^{L(w)-1} \sigma\left(I(t) \cdot v'^{\,\top}_{n(w,j)} h\right)$, (10)

where $I(t)$ is an indicator function whose value is 1 if the variable $t$ is true and $-1$ otherwise. In this equation, $t$ checks whether $n(w, j+1) = \mathrm{lch}(n(w, j))$, where $\mathrm{lch}(n)$ is the left child of node $n$ and $L(w)$ is the length of the path. Thus, $I(t)$ examines whether the $(j+1)$-th inner node on the path from the root to $w$ is the left child of the $j$-th inner node on the path or not.
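Equation (10) can be sketched as a product of sigmoid decisions along the root-to-leaf path. The tree layout and vectors below are toy values; in the actual model the leaves are slot-attribute pairs and $h$ is the vector $s$ from Equation (9):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hierarchical_softmax_prob(path, h):
    """Sketch of Equation (10) for one leaf (one slot-attribute pair).

    path: (v_inner, go_left) pairs along the root-to-leaf path, where v_inner
    is the inner node's output vector v'_{n(w,j)} and go_left encodes the
    indicator I(t): +1 if the next node on the path is the left child, -1 else.
    h: the hidden representation fed to the output layer.
    """
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    p = 1.0
    for v_inner, go_left in path:
        sign = 1.0 if go_left else -1.0
        p *= sigmoid(sign * dot(v_inner, h))
    return p

# With a single inner node, the two leaves' probabilities sum to 1,
# since sigmoid(x) + sigmoid(-x) = 1; this holds for any full binary tree.
h = [0.5, -0.2]
root = [0.3, 0.7]
p_left = hierarchical_softmax_prob([(root, True)], h)
p_right = hierarchical_softmax_prob([(root, False)], h)
```

Each prediction thus costs only $O(\log V)$ sigmoid evaluations instead of a normalization over all $V$ classes.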

5. Experiments

5.1. Data Set. To examine the effectiveness of the proposed model, we used the TourSG data set (http://www.colips.org/workshop/dstc4/data.html) that was used for DSTC4. Table 1 shows simple statistics on the data set. This data set has 35 dialog sessions on Singapore tour information collected from Skype call logs. Three tour guides and 35 tourists participated in these call logs. The 35 dialog sessions are composed of 31,304 utterances and 273,580 words. The training set has seven dialogs each from tour guides SG1 and SG2, the development set has three dialogs each from the same guides, and the test set has three dialogs from each of the three tour guides. Thus, the dialogs from tour guide SG3 are used only for testing. As a result, this data set can be used to evaluate the robustness of dialog state trackers.

Every dialog in the data set is expressed as a series of utterances, and each utterance is an unrefined transcription containing disfluencies such as "ah", "uh", or "um". A dialog can be divided into subdialog segments that belong to one of nine topic categories: OPENING, CLOSING, FOOD, ITINERARY, ACCOMMODATION, ATTRACTION, SHOPPING, TRANSPORTATION, and OTHER. The example dialog in Figure 7 has two subdialogs, where a dashed line is the border between the subdialogs. As

[Figure: CNN architecture showing an n x k representation of the sentence, a convolution layer with multiple filter widths and feature maps, max-over-time pooling, and a fully connected layer with dropout and softmax output.]

Figure 4. The structure of the informativeness classifier.

[Figure: a GRU encoder over inputs x_1, ..., x_n producing hidden states h_1, ..., h_n, attention weights alpha_1, ..., alpha_n, and a hierarchical softmax output layer; an inset shows a GRU cell with reset gate r and update gate z.]

Figure 5. The structure of the proposed dialog state tracker with attention and hierarchical softmax.


shown in the "Dialog State" column of Figure 7, not all utterances have a slot and attributes.

The topics OPENING, CLOSING, ITINERARY, and OTHER deliver no information about dialog states. Thus, they are all merged into a single topic, OTHERS, for the informativeness classification. As a result, for the informativeness classifier, the five topics FOOD, ACCOMMODATION, ATTRACTION, SHOPPING, and TRANSPORTATION are considered as the informative class, while the topic OTHERS is regarded as the noninformative class. Table 2 summarizes the class distribution of the TourSG data set for the informativeness classifier. The training set has 9,974 informative utterances and 2,785 noninformative utterances. The development set has 4,139 informative and 673 noninformative utterances, while the test set has 6,528 and 1,320 utterances for the informative and noninformative classes, respectively.

[Figure: a binary tree whose leaves are w_1, ..., w_n; the highlighted path from the root to leaf w_1 passes through the inner nodes n(w_1, 1), n(w_1, 2), and n(w_1, 3).]

Figure 6. An example of a hierarchical binary tree.

Table 1. Simple statistics on the TourSG data set.

Set           Dialogs (SG1/SG2/SG3/Total)   Segments (Acco/Attr/Food/Shop/Trsp/Other/Total)   Utterances
Training      7 / 7 / 0 / 14                187 / 762 / 275 / 149 / 374 / 357 / 2,104         12,759
Development   3 / 3 / 0 / 6                 94 / 282 / 102 / 67 / 87 / 68 / 700               4,812
Test          3 / 3 / 3 / 9                 174 / 616 / 134 / 49 / 174 / 186 / 1,333          7,848

Speaker: Utterance [Dialog state]

Tourist: Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere, take some pictures. [TYPE = budget hotel]
Guide: Okay. I'm going to recommend- firstly, you want to have a backpack type of hotel, right? [TYPE = hostel]
Tourist: Yes, I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day. [TYPE = hostel]
Guide: Okay. Let me get you, hm hm. So you don't mind if it's a bit, uh, not so roomy like hotel, because you just back to sleep. [INFO = room size]
Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures. [INFO = room size]
Guide: Okay, um-
Tourist: Hm.
Guide: Let's try this one, okay?
Tourist: Okay.
Guide: It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed, per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars. [PLACE = the inncrowd backpackers hostel]
Tourist: Um. Wow, that's good. [PLACE = the inncrowd backpackers hostel]
Guide: Yah, the prices are based on per person per bed or dorm. But this one is room, so it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only. [INFO = pricerange]
Tourist: Oh okay. That's- the price is reasonable actually. It's good. [INFO = pricerange]

Figure 7. A sample dialog from the TourSG data set for the topic ACCOMMODATION.


The data set has two kinds of slots for dialog state tracking: regular slots and the INFO slot. A regular slot is predefined to deliver the essential content of a subdialog for a specific topic. A topic contains three to seven regular slots on average. On the other hand, the INFO slot indicates content mentioned only implicitly in a subdialog. In the second subdialog of Figure 7, there are two slots, "PLACE" and "INFO", where "PLACE" is a regular slot and "INFO" is the INFO slot. Since the hotel name is described explicitly with the words "InnCrowd Backpackers Hostel", it becomes the attribute of the slot "PLACE". In addition, the dialog participants are discussing the price of the hotel. Thus, the attribute of the INFO slot becomes Pricerange, even though the words "price range" do not appear explicitly in the dialog. The possible slot types and the list of their candidate attributes vary according to the topic category, and they are described in an ontology provided by DSTC [26].

Table 3 shows the number of slots and their possible attributes for every topic category. ACCOMMODATION, ATTRACTION, FOOD, and SHOPPING contain common regular slots that refer to the names of places, the types of places, and the information on geographic areas. In detail, the topic ACCOMMODATION has the three common regular slots, and the topic ATTRACTION includes five regular slots covering touristic activities, visit time for attractions, and the three common regular slots. The topic FOOD has seven slots that stand for the types of cuisine, the names of dishes and drinks, the time for eating, and the three common regular slots. The topic SHOPPING has the three common regular slots and time information for shopping. The last topic, TRANSPORTATION, includes the types of transportation, the destinations and origins, the information on train lines and stations in Singapore, and the types of tickets. The regular slots have 570, 611, 1,411, 651, and 2,450 attributes in ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION, respectively. On average, one regular slot has roughly 190, 122, 202, 163, and 490 attributes for each topic category. On the other hand, there is only one INFO slot per topic; this is because INFO is both a slot type and a slot name. It has 32, 31, 16, 16, and 15 attributes for the five topic categories.

5.2. Evaluation Metrics and Experimental Settings. The accuracy and the F1-score are used to evaluate both the informativeness classifier and the neural dialog state tracker, where they are defined as follows:

(i) Accuracy: the fraction of predictions for which the model output is the same as the gold label.

(ii) Precision/Recall/F1:

(a) Precision: the fraction of the model outputs that are correctly predicted.

(b) Recall: the fraction of the gold labels that are correctly predicted.

(c) F1: the harmonic mean of precision and recall.
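Since the tracker's predictions are slot-attribute pairs, these metrics reduce to set operations over pairs. A minimal sketch (the helper name and the example pairs are ours):

```python
def prf1(gold_pairs, pred_pairs):
    """Precision, recall, and F1 over sets of (slot, attribute) tuples."""
    tp = len(gold_pairs & pred_pairs)   # pairs predicted correctly
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One of the two gold pairs is recovered; one of the two predictions is right.
gold = {("PLACE", "vivocity"), ("INFO", "pricerange")}
pred = {("PLACE", "vivocity"), ("TYPE", "hostel")}
p, r, f = prf1(gold, pred)
# p = 0.5, r = 0.5, f = 0.5
```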

For the informativeness classifier, the accuracy checks how correctly the classifier determines whether a given utterance is informative or not, and the F1-score shows how precise the informativeness classifier is as well as how robust it is. In dialog state tracking, the predictions are slot-attribute pairs. Thus, the accuracy checks how well the tracker's outputs match the gold standard labels at the whole-frame level, while the precision and the recall check the partial correctness at the slot-attribute level.

The parameters of the informativeness classifier and the neural dialog tracker are described in Tables 4 and 5, respectively. Every word of the utterances, the input for both the informativeness classifier and the neural tracker, is embedded into a 500-dimension vector. Three different sizes of filters are used for the informativeness classifier, and the number of filters for each filter size is 128. The dropout rate of the classifier is 0.5, the batch size is 50, and the number of training epochs is 1,000. For the neural tracker, the hidden layer size is 500, the dropout ratio is 0.1, the learning rate is 0.005, and the number of epochs is 100,000.

5.3. Experimental Results. Table 6 shows the classification performance of the informativeness classifier. The classifier achieves an accuracy of 0.9316 and an F1-score of 0.9317 on the test set, while it shows a relatively low precision of 0.9089. The reason for the relatively low precision is that there are some utterances about money in the noninformative class. For instance, the classifier predicts the utterance "Okay maybe a thousand dollars for a day" as informative due to the words "thousand dollars" and "a day", but its gold standard label is noninformative. Since the dialog to which the utterance belongs is an opening section before recommending tour information, the utterance is noninformative for tracking the dialog. This problem occurs because the classifier determines the informativeness of utterances without any contextual information about the dialog flow.

To show the feasibility of the proposed model, its dialog state tracking performance is compared with those of the participants in the DSTC4 challenge. The string match tracker determines the slot and the attribute by a simple exact string match between the slot and the given utterance [26]. The LSTM-based neural tracker proposed by Yoshino et al. [20] is a pure LSTM without the

Table 2. The class distribution for the informativeness classifier.

Set           Informative   Noninformative
Training      9,974         2,785
Development   4,139         673
Test          6,528         1,320


attention mechanism. Thus, it can be directly compared with the proposed model. Tables 7 and 8 show the performance of the trackers. The proposed model outperforms its competitors in both accuracy and F1-score. In particular, it achieves more than three times the accuracy of the LSTM, which is a neural network-based model and a similar approach to the proposed model. This result means that handling the necessary information in an utterance is important for tracking dialog states. The main difference between the proposed model and the LSTM is the presence of the attention mechanism. Thus, it can be inferred that it is important in dialog state tracking to focus on salient words while processing a user utterance.

The performance of the proposed model per topic is investigated in Figure 8. The performance is measured for the five topics related to dialog state tracking. The proposed model achieves the best performance on every topic, although its performance is relatively low on the topic ACCOMMODATION. The reason why the

proposed model shows poor performance for ACCOMMODATION is that there are many utterances mentioning locations in the dialogs of ACCOMMODATION, but

Table 6. The classification performance of the informativeness classifier.

Set           Accuracy   Precision   Recall   F1-score
Development   0.9484     0.9202      0.9820   0.9501
Test          0.9316     0.9089      0.9557   0.9317

Table 5. The parameters for the neural dialog state tracker and their values.

Parameter           Value
Embedding size      500
Hidden layer size   500
Dropout ratio       0.1
Learning rate       0.005
Number of epochs    100,000

Table 3. The number of slots (regular, INFO) and their attributes.

Topic category   Regular slots   Regular attributes   Avg. attributes per regular slot   INFO slots   INFO attributes
ACCOMMODATION    3               570                  190                                1            23
ATTRACTION       5               611                  122                                1            31
FOOD             7               1,411                202                                1            16
SHOPPING         4               651                  163                                1            16
TRANSPORTATION   5               2,450                490                                1            15

Table 4. The parameters for the informativeness classifier and their values.

Parameter           Value
Embedding size      500
Filter sizes        2, 3, 4
Number of filters   128
Dropout ratio       0.5
Batch size          50
Number of epochs    1,000

Table 7. The comparison of the dialog state tracking accuracies between the proposed model and DSTC4 participants.

Model               Accuracy
String match [26]   0.0374
LSTM [20]           0.0270
Proposed model      0.0852

Table 8. The comparison of the dialog state tracking F1-scores between the proposed model and DSTC4 participants.

Model               Precision   Recall   F1-score
String match [26]   0.3589      0.1925   0.2506
LSTM [20]           0.3400      0.2010   0.2530
Proposed model      0.5173      0.3499   0.4174

[Figure: bar chart of F1-score (0.1 to 0.6) per topic (ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, TRANSPORTATION) for the string match tracker, the LSTM, and the proposed model.]

Figure 8. The F1-score of the proposed model per topic.


they are often related to other topics such as TRANSPORTATION and SHOPPING. For instance, for the utterance "ten minutes walk to the mrt train station", the proposed model predicts the slot as "STATION" and the attribute as "MRT train station". This prediction seems relevant, since a train station is usually related to TRANSPORTATION. However, the dialog is about the places near the suggested accommodation. Thus, the gold standard slot for the utterance is "NEIGHBORHOOD" and the gold standard attribute is "Kampong Glam", a location name.

Table 9 reports the dialog state tracking accuracy of the proposed model for every slot in each topic category. Note that each accuracy is computed at the utterance level, not the subdialog level. The regular slots that have distinguishable attributes, like "TIME" and "TYPE_OF_PLACE", show moderately high accuracy, while some regular slots such as "NEIGHBOURHOOD", "PLACE", or "DISH" achieve low accuracy. The reason why the proposed model shows poor performance for these regular slots is that their attributes are specific names, such as restaurant names, street names, and dish names, that are extremely infrequent in the data set. The proposed model also achieves meaningful performance for the INFO slot. The high performance for the INFO slot implies that the proposed model can catch content mentioned only implicitly in an utterance.

To examine the effectiveness of the attention mechanism and the hierarchical softmax, three variations of the proposed model are compared. One variant (NT-Atten-NoHier) is the neural tracker with the attention mechanism and the general softmax; it is designed to inspect the effectiveness of the attention mechanism in dialog state tracking. Another (NT-NoAtten-Hier) is the neural tracker without the attention but with the hierarchical softmax; it is prepared to examine the hierarchical softmax. The third (NT-Atten-Hier) is the proposed model, the neural tracker with both the attention and the hierarchical softmax. The informativeness classifier is not used in comparing them.

Tables 10 and 11 prove the effectiveness of the attention mechanism and the hierarchical softmax. The proposed model (NT-Atten-Hier) outperforms both NT-Atten-NoHier and NT-NoAtten-Hier, which implies that the attention mechanism and the hierarchical softmax are both helpful in enhancing the neural tracker. One thing to note in these tables is that the performance improvement of the neural tracker is made mainly by the attention mechanism: NT-Atten-Hier achieves 1.6 times higher accuracy than NT-NoAtten-Hier, but it shows a similar accuracy to NT-Atten-NoHier. Nevertheless, it is also true that the hierarchical softmax enhances the neural tracker. This is proved by the fact that the performance of NT-Atten-Hier is slightly higher than that of NT-Atten-NoHier.

Table 12 shows the effectiveness of the informativeness classifier. The F1-score of the pure neural tracker NT-Atten-Hier is 0.3977, but it goes up to 0.4174 when the informativeness classifier is applied in front of the neural tracker as in Figure 3. The improvement by the informativeness classifier is also observed in accuracy. This is because the adoption of the informativeness classifier

Table 9: The dialog state tracking accuracy (%) of the proposed model for every slot in a topic category.

Topic            Slot           Accuracy (%)
ACCOMMODATION    INFO           32.4
                 TYPE_OF_PLACE  48.3
                 PLACE          18.7
                 NEIGHBOURHOOD   8.8
ATTRACTION       INFO           21.1
                 TYPE_OF_PLACE  65.5
                 ACTIVITY       55.7
                 PLACE           4.6
                 TIME           63.3
                 NEIGHBOURHOOD  17.8
FOOD             INFO           37.1
                 CUISINE        47.1
                 TYPE_OF_PLACE  68.6
                 DRINK          62.2
                 PLACE           7.7
                 MEAL_TIME      73.9
                 DISH            5.8
                 NEIGHBOURHOOD  27.3
SHOPPING         INFO           41.3
                 TYPE_OF_PLACE  41.9
                 PLACE           4.8
                 NEIGHBOURHOOD   4.1
                 TIME           75.0
TRANSPORTATION   INFO           23.9
                 FROM            5.3
                 TO              5.6
                 STATION        14.7
                 LINE           16.1
                 TYPE           42.8

Table 10: The dialog state tracking accuracy of the proposed model's variants.

Model            Accuracy
NT-Atten-NoHier  0.0672
NT-NoAtten-Hier  0.0422
NT-Atten-Hier    0.0697

Table 11: The F1-score of the dialog state tracking of the proposed model's variants.

Model            Precision  Recall  F1-score
NT-Atten-NoHier  0.4403     0.3335  0.3795
NT-NoAtten-Hier  0.3316     0.3058  0.3181
NT-Atten-Hier    0.4662     0.3469  0.3977

Table 12: The effect of the informativeness classifier in the proposed model.

Model                               Accuracy  Precision  Recall  F1-score
Without informativeness classifier  0.0697    0.4662     0.3469  0.3977
With informativeness classifier     0.0852    0.5173     0.3499  0.4174


reduces the possibility of misjudgments by noninformative utterances.
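The two-step operation described above (filter first, then track) can be sketched as follows. This is a minimal illustration, not the authors' implementation; `toy_clf` and `toy_tracker` are hypothetical stand-ins for the CNN informativeness classifier and the attention-based neural tracker.

```python
def track_dialog_state(utterance, is_informative, track):
    """Two-step tracking: filter noninformative utterances, then track.

    `is_informative` and `track` are hypothetical callables: the first
    returns True when the utterance carries state information, and the
    second returns a dict of slot -> attribute pairs. Noninformative
    utterances yield an empty state instead of a spurious prediction.
    """
    if not is_informative(utterance):
        return {}                 # skip: no state update for chit-chat
    return track(utterance)       # e.g. {"TYPE_OF_PLACE": "hostel"}

# Toy stand-ins for the CNN classifier and the neural tracker.
toy_clf = lambda u: "hotel" in u
toy_tracker = lambda u: {"TYPE_OF_PLACE": "budget hotel"}

print(track_dialog_state("uh hello there", toy_clf, toy_tracker))        # {}
print(track_dialog_state("some cheap rate hotels", toy_clf, toy_tracker))
# {'TYPE_OF_PLACE': 'budget hotel'}
```

The design point is that the tracker is never consulted for filtered utterances, which is exactly how the classifier reduces misjudgments on noninformative input.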

To analyze the shortcomings of the various neural trackers, we use three types of slot-level errors that were defined in the DSTC challenge. The error types are as follows:

(i) Missing attribute: the actual dialog has the attributes for a specific slot, but the tracker does not output any attribute for the slot.

(ii) Extraneous attribute: no attribute for a slot is contained in the actual dialog, but the tracker outputs some attributes for the slot.

(iii) False attribute: the attribute for a slot is specified in the actual dialog, but the tracker outputs a different attribute from the dialog.

According to the result of the DSTC4 challenge [26], the largest error type is the missing attribute.
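The three DSTC slot-level error types can be made concrete with a small checking routine. This is an illustrative sketch of the definitions above; representing dialog states as dicts of attribute sets is an assumption made for the example.

```python
def slot_error_type(gold, pred, slot):
    """Classify the DSTC slot-level error for one slot.

    `gold` and `pred` map slot names to sets of attributes; a missing
    key means the dialog (or the tracker) has no attribute for that slot.
    """
    in_gold, in_pred = slot in gold, slot in pred
    if in_gold and not in_pred:
        return "missing"      # dialog has attributes, tracker is silent
    if in_pred and not in_gold:
        return "extraneous"   # tracker outputs attributes not in the dialog
    if in_gold and in_pred and gold[slot] != pred[slot]:
        return "false"        # both present, but the attributes disagree
    return "correct"

gold = {"PLACE": {"the inncrowd backpackers hostel"}, "INFO": {"pricerange"}}
pred = {"PLACE": {"sentosa"}, "NEIGHBOURHOOD": {"kampong glam"}}
print(slot_error_type(gold, pred, "PLACE"))          # false
print(slot_error_type(gold, pred, "INFO"))           # missing
print(slot_error_type(gold, pred, "NEIGHBOURHOOD"))  # extraneous
```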

Figure 9 depicts the number of errors of each error type by the variants of the proposed model. In this figure, "Proposed Model" means NT-Atten-Hier with the informativeness classifier. The number of errors in missing attribute and extraneous attribute is much reduced in "Proposed Model", while sacrificing somewhat of false attribute. Therefore, the informativeness classifier is proved to be helpful in reducing the errors in missing and extraneous attributes. Many errors in false attribute are made by the limited context of a given utterance. For instance, consider the utterance "uh another place you might want to check out is opposite to Sentosa island and it is called Vivocity". In this utterance, the user wants to know another shopping place. Since the proposed model focuses only on the utterance, it predicts the slot for the utterance as "NEIGHBOURHOOD" and its attribute as "Sentosa". These are wrong but plausible predictions, since the utterance also contains locational information semantically. That is, many errors by the proposed model are not critical in understanding the user intention.

6. Conclusion

This paper has proposed a task-oriented dialog state tracker that operates in two steps. The proposed tracker is composed of an informativeness classifier and a neural dialog state tracker. It first filters noninformative utterances in a dialog by using the informativeness classifier. The informativeness classifier is implemented by a CNN, which has shown good performance in sentence classification. In the next step, the neural tracker determines dialog states from informative utterances with an attention mechanism and the hierarchical softmax. By adopting the attention mechanism and the hierarchical softmax, the proposed model can not only focus on salient words in a given utterance but also deal with a large vocabulary in dialogs efficiently. The experimental results on dialog state tracking show that the proposed model outperforms the neural trackers without the informativeness classifier, the attention mechanism, or the hierarchical softmax, as well as the previous work, which implies the plausibility of adopting them in dialog state tracking.

There is still some room for improvement although the proposed model achieves an acceptable result. One possible improvement for the current neural tracker is to preserve topic consistency when the model predicts slots. The proposed model considers only a given utterance when it predicts a slot. However, since a general dialog has a single topic, the topic of the utterances in the same dialog segment has to continue when a model predicts the slots of the utterances. Thus, in future work, we will consider the dialog history to catch the flow of a dialog and to preserve topic consistency in predicting slots.

Data Availability

We used the dialog state data (TourSG data set) provided by ETPL-A*STAR (the dialog state tracking challenge 4 organization). TourSG is a research dialog corpus in the touristic domain and is divided into train/development data and test data. The train and development data include manual transcriptions and annotations at both utterance and subdialog levels for training a model and fine-tuning its parameters. The test set includes manual transcriptions for evaluating the model. Any researcher or company can access the data at the following site: https://www.etpl.sg/innovation-offerings/ready-to-sign-licenses/toursg-a-research-dialogue-corpus-in-the-touristic-domain. Researchers or their companies need to pay a license fee to get the TourSG data set after signing a license agreement with ETPL-A*STAR. The data set includes transcribed and annotated dialogs as well as ontology objects describing the annotations.

Figure 9: The number of errors according to error types (false, missing, and extraneous) for NT-Atten-NoHier, NT-NoAtten-Hier, NT-Atten-Hier, and the proposed model.


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B04935678).

References

[1] Y.-N. Chen, W. Y. Wang, and A. I. Rudnicky, "Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 120–125, Olomouc, Czech Republic, December 2013.
[2] M. Henderson, B. Thomson, and S. Young, "Deep neural network approach for the dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 467–471, Metz, France, August 2013.
[3] M. Henderson, "Machine learning for dialog state tracking: a review," in Proceedings of the Machine Learning in Spoken Language Processing Workshop, Aizuwakamatsu, Japan, September 2015.
[4] A. Bordes, Y.-L. Boureau, and J. Weston, "Learning end-to-end goal-oriented dialog," in Proceedings of the 5th International Conference on Learning Representations, pp. 1–15, Toulon, France, April 2017.
[5] T. Zhao and M. Eskenazi, "Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning," in Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 1–10, Los Angeles, CA, USA, September 2016.
[6] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746–1751, Doha, Qatar, October 2014.
[7] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proceedings of Advances in Neural Information Processing Systems, pp. 649–657, Montreal, Canada, December 2015.
[8] M. Henderson, B. Thomson, and S. Young, "Word-based dialog state tracking with recurrent neural networks," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia, PA, USA, June 2014.
[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.
[10] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the Tenth Workshop on Artificial Intelligence and Statistics, pp. 246–252, Bridgetown, Barbados, January 2005.
[11] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of Advances in Neural Information Processing Systems, pp. 1081–1088, Vancouver, BC, Canada, December 2009.
[12] J. Williams, A. Raux, D. Ramachandran, and A. Black, "The dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 404–413, Metz, France, August 2013.
[13] R. Higashinaka, M. Nakano, and K. Aikawa, "Corpus-based discourse understanding in spoken dialogue systems," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pp. 240–247, Sapporo, Japan, July 2003.
[14] V. Zue, S. Seneff, J. R. Glass et al., "JUPITER: a telephone-based conversational interface for weather information," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, pp. 85–96, 2000.
[15] S. Larsson and D. R. Traum, "Information state and dialogue management in the TRINDI dialogue move engine toolkit," Natural Language Engineering, vol. 6, no. 3-4, pp. 323–340, 2000.
[16] K. Sun, L. Chen, S. Zhu, and K. Yu, "A generalized rule based tracker for dialogue state tracking," in Proceedings of the Spoken Language Technology Workshop, pp. 330–335, South Lake Tahoe, NV, USA, December 2014.
[17] D. Bohus and A. Rudnicky, "A 'K hypotheses + other' belief updating model," in Proceedings of the AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, pp. 13–18, Boston, MA, USA, July 2006.
[18] B. Thomson and S. Young, "Bayesian update of dialogue state: a POMDP framework for spoken dialogue systems," Computer Speech & Language, vol. 24, no. 4, pp. 562–588, 2010.
[19] Y. Ma, A. Raux, D. Ramachandran, and R. Gupta, "Landmark-based location belief tracking in a spoken dialog system," in Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 169–178, Seoul, Korea, July 2012.
[20] K. Yoshino, T. Hiraoka, G. Neubig, and S. Nakamura, "Dialogue state tracking using long short term memory neural networks," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselka, Finland, January 2016.
[21] H. Shi, T. Ushio, M. Endo, K. Yamagami, and N. Horii, "Convolutional neural networks for multi-topic dialog state tracking," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselka, Finland, January 2016.
[22] J. Perez and F. Liu, "Dialog state tracking, a machine reading approach using memory network," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 305–314, Valencia, Spain, April 2017.
[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, NV, USA, December 2013.
[24] W.-T. Yih, X. He, and C. Meek, "Semantic parsing for single-relation question answering," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 643–648, Baltimore, MD, USA, June 2014.
[25] H. Peng, J. Li, Y. Song, and Y. Liu, "Incrementally learning the hierarchical softmax function for neural language models," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3267–3273, San Francisco, CA, USA, February 2017.
[26] S. Kim, L. F. D'Haro, R. E. Banchs, J. D. Williams, and M. Henderson, "The fourth dialog state tracking challenge," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselka, Finland, January 2016.


Because of the large number of classes and the infrequent classes in dialog state tracking, it is difficult for a tracker to estimate the dialog state. However, by adopting the hierarchical softmax function, the proposed tracker can estimate infrequent classes as well as frequent classes.

In the proposed model, the slots and their attributes are considered as words, and thus they are represented as a binary tree in which each word is a leaf node. Since there are V words, the tree has V − 1 inner nodes. For each leaf node, there exists only one path from the root to the leaf node, which is used to estimate the probability of the word. For example, Figure 6 is a hierarchical binary tree in which the leaf nodes represent words (slot-attribute pairs) and the inner nodes represent probability mass. The highlighted nodes and edges make a path from the root to the leaf node w_1, and n(w, j) means the j-th node on the path from the root to a leaf node w. In the hierarchical softmax model, each of the V − 1 inner nodes has an output vector v'_{n(w,j)}. The probability of a word being the output word is

p(w = w_O) = \prod_{j=1}^{L(w)-1} \sigma\left( I(t) \cdot {v'}_{n(w,j)}^{\top} h \right), \qquad (10)

where I(t) is an indicator function whose value is 1 if the variable t is true and −1 otherwise. In this equation, t checks whether n(w, j + 1) = lch(n(w, j)), where lch(n) is the left child of node n and L(w) is the length of the path. Thus, I(t) examines whether the (j + 1)-th inner node on the path from the root to w is equal to the left child of the j-th inner node on the path or not.
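Equation (10) can be illustrated with a small numeric sketch. The tree layout, node ids, and vectors below are made-up toy values; the point is that the sigmoid decisions along a root-to-leaf path multiply into a proper probability, so the probabilities of all leaves sum to 1 for any hidden vector h.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_softmax_prob(path, inner_vecs, h):
    """p(w = w_O | h) for one leaf, following Equation (10).

    `path` lists (inner_node_id, is_left_child) pairs from the root to
    the leaf; `inner_vecs` holds the output vector v'_{n(w,j)} of each
    inner node. I(t) is +1 when the next node on the path is the left
    child and -1 otherwise, so the two branch probabilities at every
    inner node sum to 1.
    """
    p = 1.0
    for node_id, is_left in path:
        sign = 1.0 if is_left else -1.0
        p *= sigmoid(sign * np.dot(inner_vecs[node_id], h))
    return p

# Tiny tree with 4 leaves and 3 inner nodes (ids 0..2): the leaf
# probabilities must sum to 1 for any h and any inner vectors.
rng = np.random.default_rng(0)
inner_vecs = rng.normal(size=(3, 5))
h = rng.normal(size=5)
paths = [
    [(0, True), (1, True)],    # leaf w1
    [(0, True), (1, False)],   # leaf w2
    [(0, False), (2, True)],   # leaf w3
    [(0, False), (2, False)],  # leaf w4
]
total = sum(hierarchical_softmax_prob(p, inner_vecs, h) for p in paths)
print(round(total, 6))  # 1.0
```

Evaluating one word costs only L(w) − 1 sigmoids instead of a softmax over all V classes, which is why the hierarchical softmax speeds up training with the large slot-attribute vocabulary.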

5. Experiments

5.1. Data Set. To examine the effectiveness of the proposed model, we used the TourSG data set (http://www.colips.org/workshop/dstc4/data.html) that was used for DSTC4. Table 1 shows simple statistics on the data set. The data set has 35 dialog sessions in total on Singapore tour information, collected from Skype call logs. Three tour guides and 35 tourists participated in these call logs. The 35 dialog sessions are composed of 31,304 utterances and 273,580 words, respectively. The training set has seven dialogs each from tour guides SG1 and SG2, the development set has three dialogs each from the same guides, and the test set has three dialogs from each of the three tour guides. Thus, the dialogs from tour guide SG3 are used only for testing. As a result, this data set can be used to evaluate the robustness of dialog state trackers.

Every dialog in the data set is expressed as a series of utterances, and each utterance is an unrefined transcription containing some disfluencies such as "ah", "uh", or "um". A dialog can be divided again into subdialog segments that belong to one of nine topic categories: OPENING, CLOSING, FOOD, ITINERARY, ACCOMMODATION, ATTRACTION, SHOPPING, TRANSPORTATION, and OTHER. The example dialog in Figure 7 has two subdialogs, where a dashed line is the border between the subdialogs. As

Figure 4: The structure of the informativeness classifier: an n × k representation of a sentence, a convolution layer with multiple filter widths and feature maps, max-over-time pooling, and a fully connected layer with dropout and softmax output.
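The convolution and max-over-time pooling stages of Figure 4 can be sketched as follows. This is a simplified NumPy illustration with made-up dimensions, not the trained classifier; in the actual model, pooled features from filter widths 2, 3, and 4 are concatenated and passed to a fully connected softmax layer with dropout.

```python
import numpy as np

def cnn_features(sent, filters, width):
    """Convolution plus max-over-time pooling, as in Figure 4 (a sketch).

    `sent` is an (n, k) matrix of word embeddings; `filters` is an
    (f, width * k) matrix of f filters of the given width. Each filter
    slides over every window of `width` consecutive words, and max-
    over-time pooling keeps the strongest response per filter, giving a
    fixed-size feature vector regardless of sentence length.
    """
    n, k = sent.shape
    windows = np.stack([sent[i:i + width].ravel()
                        for i in range(n - width + 1)])  # (n-width+1, width*k)
    conv = np.tanh(windows @ filters.T)                  # feature maps
    return conv.max(axis=0)                              # (f,) pooled features

rng = np.random.default_rng(1)
sent = rng.normal(size=(7, 4))         # 7 words, 4-dim embeddings (toy sizes)
filters = rng.normal(size=(3, 2 * 4))  # 3 filters of width 2
feats = cnn_features(sent, filters, width=2)
print(feats.shape)  # (3,)
```

Because pooling collapses the time axis, a 7-word and a 12-word utterance both yield a 3-dimensional feature vector here, which is what lets the classifier handle variable-length utterances.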

Figure 5: The structure of the proposed dialog state tracker with attention and hierarchical softmax: GRU cells produce hidden states h_1, ..., h_n from inputs x_1, ..., x_n; attention weights α_1, ..., α_n form a context vector c; and the output layer is a hierarchical softmax.
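The attention stage of Figure 5 can be sketched as follows. The dot-product scoring function and the query vector used here are illustrative assumptions; the paper's exact attention parameterization is not reproduced.

```python
import numpy as np

def attention_context(H, q):
    """Attention over GRU hidden states, as sketched in Figure 5.

    `H` is an (n, d) matrix of hidden states h_1..h_n and `q` a (d,)
    query vector (a hypothetical stand-in for the learned scorer).
    Dot-product scores are normalized with a softmax into weights
    alpha, and the context c is their weighted sum, which lets the
    tracker focus on salient words of the utterance.
    """
    scores = H @ q
    scores = scores - scores.max()   # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    c = alpha @ H                    # (d,) context vector
    return alpha, c

H = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
alpha, c = attention_context(H, np.array([1.0, 1.0]))
print(alpha.argmax())  # 2: the highest-scoring state dominates the context
```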


shown in the "Dialog State" column of the figure, not all utterances have a slot and attributes.

The topics of OPENING, CLOSING, ITINERARY, and OTHER deliver no information about dialog states. Thus, they are all merged into a single topic, OTHERS, for the informativeness classification. As a result, for the informativeness classifier, the five topics of FOOD, ACCOMMODATION, ATTRACTION, SHOPPING, and TRANSPORTATION are considered as the informative class, while the topic OTHERS is regarded as the noninformative class. Table 2 summarizes the class distribution of the TourSG data set for the informativeness classifier. The training set has 9,974 informative utterances and 2,785 noninformative utterances. The development set has 4,139 informative and 673 noninformative utterances, while the test set has 6,528 and 1,320 utterances for the informative and noninformative classes, respectively.

Figure 6: An example of a hierarchical binary tree with leaf nodes w_1, w_2, ..., w_n and highlighted inner nodes n(w_1, 1), n(w_1, 2), and n(w_1, 3) on the path to w_1.

Table 1: Simple statistics on the TourSG data set.

             No. of dialogs           No. of segments                                  No. of
Set          SG1  SG2  SG3  Total     Acco  Attr  Food  Shop  Trsp  Other  Total      utterances
Training       7    7    0     14      187   762   275   149   374    357   2104         12759
Development    3    3    0      6       94   282   102    67    87     68    700          4812
Test           3    3    3      9      174   616   134    49   174    186   1333          7848

Figure 7: A sample dialog from the TourSG data set for the topic ACCOMMODATION.

Tourist: "Can you give me some uh- tell me some cheap rate hotels because I'm planning just to leave my bags there and go somewhere take some pictures." (Dialog state: TYPE = budget hotel)
Tourist: "Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and you know stay out the whole day." (Dialog state: TYPE = hostel)
Guide: "Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep." (Dialog state: INFO = room size)
Tourist: "Yes. Yes. As we just gonna put our things there and then go out to take some pictures." (Dialog state: INFO = room size)
Guide: "Okay um-" (no state)
Tourist: "Hm." (no state)
Guide: "Let's try this one okay." (no state)
Tourist: "Okay." (no state)
Guide: "It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room it's two single beds at fifty nine dollars." (Dialog state: PLACE = the inncrowd backpackers hostel)
Tourist: "Um. Wow that's good." (Dialog state: PLACE = the inncrowd backpackers hostel)
Guide: "Yah the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only." (Dialog state: INFO = pricerange)
Tourist: "Oh okay. That's- the price is reasonable actually. It's good." (Dialog state: INFO = pricerange)
Guide: "Okay. I'm going to recommend firstly you want to have a backpack type of hotel right." (Dialog state: TYPE = hostel)


The data set has two kinds of slots for dialog state tracking: a regular slot and an INFO slot. The regular slot is predefined to deliver the essential content of a subdialog for a specific topic. A topic contains three to seven regular slots on average. On the other hand, the INFO slot is for indicating the content mentioned implicitly in the subdialog. In the second subdialog of Figure 7, there are two slots, "PLACE" and "INFO", where "PLACE" is a regular slot and "INFO" is the INFO slot. Since the hotel name is described explicitly with the words "InnCrowd Backpackers Hostel", it becomes the attribute of the slot "PLACE". In addition, the dialog participants are discussing the price of the hotel. Thus, the attribute of the INFO slot becomes Pricerange, even though the words "price range" do not appear explicitly in the dialog. The possible slot types and the list of their candidate attributes vary according to the topic category, and they are described in an ontology provided by DSTC [26].

Table 3 shows the number of slots and their possible attributes for every topic category. ACCOMMODATION, ATTRACTION, FOOD, and SHOPPING contain common regular slots that refer to the names of places, the types of places, and the information of geographic areas. In detail, the topic ACCOMMODATION has the three common regular slots, and the topic ATTRACTION includes five regular slots covering touristic activities, visit time for attractions, and the three common regular slots. The topic FOOD has seven slots that stand for the types of cuisine, the names of dishes and drinks, the time for eating, and the three common regular slots. The topic SHOPPING has the three common regular slots and time information for shopping. The last topic, TRANSPORTATION, includes the types of transportation, the destinations and origins, the information on train lines and stations in Singapore, and the types of tickets. The regular slots have 570, 611, 1411, 651, and 2450 attributes in ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION, respectively. On average, one regular slot has roughly 190, 122, 202, 163, and 490 attributes in each topic category. On the other hand, there is only one INFO slot per topic; this is because INFO is both a slot type and a slot name. It has 23, 31, 16, 16, and 15 attributes for the five topic categories.
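The slot and attribute organization just described can be pictured as a small ontology fragment. The concrete values below are taken from the Figure 7 example; the nesting (topic, then regular slots and the INFO slot, then candidate attributes) is a sketch of the shape, not the actual DSTC4 ontology file format.

```python
# A sketch of how topic categories, regular slots, the INFO slot, and
# their candidate attributes relate (values from the Figure 7 example;
# the real DSTC4 ontology objects may be formatted differently).
ontology = {
    "ACCOMMODATION": {
        "regular_slots": {
            "PLACE": {"the inncrowd backpackers hostel"},
            "TYPE_OF_PLACE": {"budget hotel", "hostel"},
            "NEIGHBOURHOOD": {"kampong glam"},
        },
        "INFO": {"pricerange", "room size"},
    },
}

# A tracker's output for an utterance is a slot-attribute pair whose
# attribute must come from that slot's candidate list.
slot, attr = "TYPE_OF_PLACE", "hostel"
print(attr in ontology["ACCOMMODATION"]["regular_slots"][slot])  # True
```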

5.2. Evaluation Metrics and Experimental Settings. The accuracy and the F1-score are used to evaluate both the informativeness classifier and the neural dialog state tracker, where they are defined as follows:

(i) Accuracy: the fraction of predictions in which the model output is the same as the gold label.

(ii) Precision/Recall/F1:

(a) Precision: the fraction of the model outputs that are correctly predicted.

(b) Recall: the fraction of the gold labels that are correctly predicted.

(c) F1: the harmonic mean of precision and recall.

For the informativeness classifier, the accuracy checks how correctly the classifier determines whether a given utterance is informative or not, and the F1-score shows how precise the informativeness classifier is as well as how robust it is. In dialog state tracking, the predictions are slot-attribute pairs. Thus, the accuracy checks how equivalent the tracker's outputs are to the gold standard labels at the whole frame level, while the precision and the recall check the partial correctness at the slot-attribute level.
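Under these definitions, the slot-attribute level precision/recall/F1 and the frame-level accuracy can be computed as in the following sketch, where dialog states are represented as sets of (slot, attribute) pairs (an encoding assumed for illustration).

```python
def prf1(gold, pred):
    """Slot-attribute precision/recall/F1 plus frame-level accuracy.

    `gold` and `pred` are parallel lists of per-utterance sets of
    (slot, attribute) pairs. Precision/recall count matching pairs;
    accuracy requires the whole frame to match exactly.
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return precision, recall, f1, accuracy

gold = [{("PLACE", "hostel"), ("INFO", "pricerange")}, {("TIME", "night")}]
pred = [{("PLACE", "hostel")}, {("TIME", "day")}]
print(tuple(round(x, 3) for x in prf1(gold, pred)))  # (0.5, 0.333, 0.4, 0.0)
```

The example shows why frame accuracy is much lower than F1 in Tables 7 and 8: one correct pair out of two earns partial credit at the slot-attribute level but zero at the frame level.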

The parameters of the informativeness classifier and the neural dialog tracker are described in Tables 4 and 5, respectively. Every word of the utterances, the input for both the informativeness classifier and the neural tracker, is embedded into a 500-dimensional vector. Three different sizes of filters are used for the informativeness classifier, and the number of filters for each filter size is 128. The dropout rate of the classifier is 0.5, the batch size is 50, and the number of training epochs is 1,000. For the neural tracker, the hidden layer size is 500, the dropout ratio is 0.1, the learning rate is 0.005, and the number of epochs is 100,000.

5.3. Experimental Results. Table 6 shows the classification performance of the informativeness classifier. The classifier achieves an accuracy of 0.9316 and an F1-score of 0.9317 on the test set, while it shows a relatively low precision of 0.9089. The reason for the relatively low precision is that there are some utterances about money in the noninformative class. For instance, the classifier predicts the utterance "Okay maybe a thousand dollars for a day" as informative due to the words "thousand dollars" and "a day", but its gold standard label is noninformative. Since the dialog to which the utterance belongs is an opening section before recommending tour information, the utterance is noninformative for tracking the dialog. This problem occurs because the classifier determines the informativeness of utterances without any contextual information about the dialog flow.

To show the feasibility of the proposed model, its dialog state tracking performance is compared with those of the participants in the DSTC4 challenge. The string match tracker determines the slot and the attribute by a simple exact string match between the slot and the given utterance [26]. The LSTM-based neural tracker proposed by Yoshino et al. [20] is a pure LSTM without the

Table 2: The class distribution for the informativeness classifier.

Set          Informative  Noninformative
Training            9974            2785
Development         4139             673
Test                6528            1320


attention mechanism. Thus, it can be directly compared with the proposed model. Tables 7 and 8 show the performance of the trackers. The proposed model outperforms its competitors in both accuracy and F1-score. Especially, it achieves four times higher accuracy than the LSTM, which is a neural network-based model and a similar approach to the proposed model. This result means that handling the necessary information from an utterance is important for tracking the dialog state. The main difference between the proposed model and the LSTM is the presence of the attention mechanism. Thus, it can be inferred that it is important in dialog state tracking to focus on some salient words while processing a user utterance.

The performance of the proposed model per topic is investigated in Figure 8. The performance is measured for the five topics related with dialog state tracking. The proposed model achieves the best performance on every topic, although its own performance is lowest on the topic ACCOMMODATION. The reason why the proposed model shows relatively poor performance for ACCOMMODATION is that there are many utterances mentioning some locations in the dialogs of ACCOMMODATION, but they are often related with other topics such as TRANSPORTATION and SHOPPING.

Table 6: The classification performance of the informativeness classifier.

Set          Accuracy  Precision  Recall  F1-score
Development    0.9484     0.9202  0.9820    0.9501
Test           0.9316     0.9089  0.9557    0.9317

Table 5: The parameters for the neural dialog state tracker and their values.

Parameter          Value
Embedding size     500
Hidden layer size  500
Dropout ratio      0.1
Learning rate      0.005
Number of epochs   100000

Table 3: The number of slots (regular, INFO) and their attributes.

                  Regular slot                                        INFO slot
Topic category    No. of slots  No. of attributes  Avg. per slot      No. of slots  No. of attributes
ACCOMMODATION                3                570            190                 1                 23
ATTRACTION                   5                611            122                 1                 31
FOOD                         7               1411            202                 1                 16
SHOPPING                     4                651            163                 1                 16
TRANSPORTATION               5               2450            490                 1                 15

Table 4: The parameters for the informativeness classifier and their values.

Parameter          Value
Embedding size     500
Filter sizes       2, 3, 4
Number of filters  128
Dropout ratio      0.5
Batch size         50
Number of epochs   1000

Table 7 e comparison of the dialog state tracking accuraciesbetween the proposed model and DSTC4 participants

Model AccuracyString match [26] 00374LSTM [20] 00270Proposed model 00852

Table 8 e comparison of the dialog state tracking F1-scoresbetween the proposed model and DSTC4 participants

Model Precision Recall F1-scoreString match [26] 03589 01925 02506LSTM [20] 03400 02010 02530Proposed model 05173 03499 04174

06

05

04

03F1-s

core

02

01

ACCO

MM

OD

ATIO

N

ATTR

ACTI

ON

Topic

FOO

D

SHO

PPIN

G

TRAN

SPO

RTAT

ION

String matchLSTMProposed model

Figure 8 e F1-score of the proposed model per topic

8 Computational Intelligence and Neuroscience

they are often related with other topics such as TRANS-PORTATION and SHOPPING For instance for the utter-ance ldquoten minutes walk to the mrt train stationrdquo theproposed model predicts its slot as ldquoSTATIONrdquo and itsattribute as ldquoMRT train stationrdquo is prediction seems to berelevant since a train station is usually related to TRANS-PORTATION However the dialog is about the places nearthe suggested accommodation us the gold standard slotfor the utterance is ldquoNEIGHBORHOODrdquo and the goldstandard attribute is ldquoKampong Glamrdquo a location name

Table 9 implies the dialog state tracking accuracy of theproposed model for every slot in a topic category Note thateach accuracy is computed at the utterance level not thesubdialog level e regular slots which have distinguishableattributes like ldquoTIMErdquo and ldquoTYPE_OF_PLACErdquo showmoderately high accuracy while some regular slots such asldquoNEIGHBOURHOODrdquo ldquoPLACErdquo or ldquoDISHrdquo achieve lowaccuracy e reason why the proposed model shows poorperformance for some regular slots is that they have specificnames for their attributes such as restaurant name streetname and dish name that are extremely infrequent in dataset e proposed model also achieves meaningful perfor-mance for INFO slot e high performance for INFO slotimplies that the proposed model can catch the contentsmentioned implicitly in an utterance

To examine the effectiveness of the attention mechanismand the hierarchical softmax three variations of the pro-posed model are compared with the proposed model Onevariant (NT-Atten-NoHier) is the neural tracker with theattention mechanism and the general softmax is isdesigned to inspect the effectiveness of the attentionmechanism in dialog state tracking Another (NT-NoAtten-Hier) is the neural tracker without the attention but with thehierarchical softmax is is prepared to examine the hi-erarchical softmax e third (NT-Atten-Hier) is the pro-posed model the neural tracker with both the attention andthe hierarchical softmaxe informativeness classifier is notused in comparing them

Tables 10 and 11 prove the effectiveness of the attentionmechanism and the hierarchical softmax e proposedmodel (NT-Atten-Hier) outperforms both NT-Atten-NoHier and NT-NoAtten-Hier which implies that theattention mechanism and the hierarchical softmax are allhelpful in enhancing the neural tracker One thing to note inthese tables is that the performance improvement of theneural tracker is made mainly by the attention mechanismNT-Atten-Hier achieves 16 times higher accuracy than NT-NoAtten-Hier but it shows a similar accuracy with NT-Atten-NoHier Nevertheless it is also true that the hierar-chical softmax enhances the neural trackeris is proved bythe fact that the performance of NT-Atten-Hier is slightlyhigher than that of NT-Atten-NoHier

Table 12 shows the effectiveness of the informativeness classifier. The F1-score of the pure neural tracker NT-Atten-Hier is 0.3977, but it rises to 0.4174 when the informativeness classifier is applied in front of the neural tracker, as in Figure 3. The improvement by the informativeness classifier is also observed in accuracy. This is because the adoption of the informativeness classifier

Table 9: The dialog state tracking accuracy (%) of the proposed model for every slot in a topic category.

Topic            Slot             Accuracy (%)
ACCOMMODATION    INFO             32.4
                 TYPE_OF_PLACE    48.3
                 PLACE            18.7
                 NEIGHBOURHOOD     8.8
ATTRACTION       INFO             21.1
                 TYPE_OF_PLACE    65.5
                 ACTIVITY         55.7
                 PLACE             4.6
                 TIME             63.3
                 NEIGHBOURHOOD    17.8
FOOD             INFO             37.1
                 CUISINE          47.1
                 TYPE_OF_PLACE    68.6
                 DRINK            62.2
                 PLACE             7.7
                 MEAL_TIME        73.9
                 DISH              5.8
                 NEIGHBOURHOOD    27.3
SHOPPING         INFO             41.3
                 TYPE_OF_PLACE    41.9
                 PLACE             4.8
                 NEIGHBOURHOOD     4.1
                 TIME             75.0
TRANSPORTATION   INFO             23.9
                 FROM              5.3
                 TO                5.6
                 STATION          14.7
                 LINE             16.1
                 TYPE             42.8

Table 10: The dialog state tracking accuracy of the proposed model's variants.

Model            Accuracy
NT-Atten-NoHier  0.0672
NT-NoAtten-Hier  0.0422
NT-Atten-Hier    0.0697

Table 11: The F1-score of the dialog state tracking of the proposed model's variants.

Model            Precision  Recall  F1-score
NT-Atten-NoHier  0.4403     0.3335  0.3795
NT-NoAtten-Hier  0.3316     0.3058  0.3181
NT-Atten-Hier    0.4662     0.3469  0.3977

Table 12: The effect of the informativeness classifier in the proposed model.

Model                               Accuracy  Precision  Recall  F1-score
Without informativeness classifier  0.0697    0.4662     0.3469  0.3977
With informativeness classifier     0.0852    0.5173     0.3499  0.4174

Computational Intelligence and Neuroscience 9

reduces the possibility of misjudgments caused by noninformative utterances.
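The overall two-step flow can be sketched as a simple pipeline. Here classify_informative and track_state are hypothetical stand-ins for the trained CNN classifier and neural tracker, and the toy functions at the bottom are assumptions for illustration only:

```python
def two_step_tracker(utterances, classify_informative, track_state):
    """Run the two-step tracker: filter out noninformative utterances,
    then estimate a (slot, attribute) pair for each remaining one."""
    state = {}
    for utt in utterances:
        # Step 1: the informativeness classifier discards utterances that
        # carry no dialog-state information (OPENING, CLOSING, ...).
        if not classify_informative(utt):
            continue
        # Step 2: the neural tracker maps the informative utterance
        # to a slot-attribute pair and updates the dialog state.
        slot, attribute = track_state(utt)
        state[slot] = attribute
    return state

# Toy stand-ins for the two trained models (assumptions, not the paper's models).
is_informative = lambda u: "hotel" in u
toy_tracker = lambda u: ("TYPE", "budget hotel")

print(two_step_tracker(["Hello there", "a cheap rate hotel please"],
                       is_informative, toy_tracker))
# {'TYPE': 'budget hotel'}
```

The design point the pipeline makes explicit: any utterance rejected in step 1 can never cause a spurious state update in step 2.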

To analyze the shortcomings of the various neural trackers, we use three types of slot-level errors that were defined in the DSTC challenge. The error types are as follows:

(i) Missing attribute: the actual dialog has an attribute for a specific slot, but the tracker does not output any attribute for the slot.

(ii) Extraneous attribute: no attribute for a slot is contained in the actual dialog, but the tracker outputs some attribute for the slot.

(iii) False attribute: the attribute for a slot is specified in the actual dialog, but the tracker outputs an attribute different from that of the dialog.

According to the results of the DSTC4 challenge [26], the largest error type is the missing attribute.
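The three error types can be expressed as a simple decision rule over one slot's gold and predicted attributes; a sketch in which the function name is illustrative:

```python
def classify_slot_error(gold, predicted):
    """Categorize one slot-level prediction per the DSTC error types.
    gold / predicted: the attribute value for a slot, or None if absent."""
    if gold is not None and predicted is None:
        return "missing"      # dialog has the attribute, tracker stays silent
    if gold is None and predicted is not None:
        return "extraneous"   # tracker outputs an attribute the dialog lacks
    if gold is not None and gold != predicted:
        return "false"        # tracker outputs the wrong attribute
    return "correct"

print(classify_slot_error("hostel", None))            # missing
print(classify_slot_error(None, "hostel"))            # extraneous
print(classify_slot_error("hostel", "budget hotel"))  # false
```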

Figure 9 depicts the number of errors of each error type for the variants of the proposed model. In this figure, "Proposed Model" denotes NT-Atten-Hier with the informativeness classifier. The numbers of errors in missing attribute and extraneous attribute are much reduced in "Proposed Model", while sacrificing false attribute somewhat. Therefore, the informativeness classifier is proved to be actually helpful in reducing the errors in missing and extraneous attributes. Many errors in false attribute are caused by the limited context of a given utterance. For instance, let us consider the utterance "uh another place you might want to check out is opposite to Sentosa island and it is called Vivocity". In this utterance, a user wants to know another shopping place. Since the proposed model focuses only on the utterance, it predicts the slot for the utterance as "NEIGHBOURHOOD" and its attribute as "Sentosa". These are wrong but plausible predictions, since the utterance also contains locational information semantically. That is, many errors by the proposed model are not critical in understanding the user intention.

6. Conclusion

This paper has proposed a task-oriented dialog state tracker that works in two steps. The proposed tracker is composed of an informativeness classifier and a neural dialog state tracker. It first filters out noninformative utterances in a dialog by using the informativeness classifier. The informativeness classifier is implemented by a CNN, which has shown good performance in sentence classification. In the next step, the neural tracker determines dialog states from the informative utterances with the attention mechanism and the hierarchical softmax. By adopting the attention mechanism and the hierarchical softmax, the proposed model can not only focus on salient words in a given utterance but also deal with a large vocabulary in dialogs efficiently. The experimental results on dialog state tracking show that the proposed model outperforms the neural trackers without the informativeness classifier, the attention mechanism, or the hierarchical softmax, as well as the

previous work, which implies the plausibility of adopting them in dialog state tracking.

There is still some room for improvement although the proposed model achieves an acceptable result. One possible improvement for the current neural tracker is to preserve topic consistency when the model predicts slots. The proposed model considers only a given utterance when it predicts a slot. However, since a general dialog has a single topic, the topic of the utterances in the same dialog segment has to continue when a model predicts the slots of the utterances. Thus, in future work, we will consider the dialog history to catch the flow of a dialog and to preserve topic consistency in predicting slots.

Data Availability

We used dialog state data (the TourSG data set), which are provided by ETPL-A*STAR (the dialog state tracking challenge 4 organization). TourSG is a research dialog corpus in the touristic domain and is divided into train/development data and test data. The train and development data include manual transcriptions and annotations at both the utterance and subdialog levels for training a model and fine-tuning its parameters. The test set includes manual transcriptions for evaluating the model. Any researcher or company can access the data at the following site: https://www.etpl.sg/innovation-offerings/ready-to-sign-licenses/toursg-a-research-dialogue-corpus-in-the-touristic-domain. Researchers or their companies need to pay a license fee to get the TourSG data set after signing a license agreement with ETPL-A*STAR. The data set includes transcribed and annotated dialogs as well as ontology objects describing the annotations.

[Figure 9: The number of errors according to error types (false, missing, and extraneous attributes; y-axis 0-6000) for NT-Atten-NoHier, NT-NoAtten-Hier, NT-Atten-Hier, and the proposed model.]


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B04935678).

References

[1] Y.-N. Chen, W. Y. Wang, and A. I. Rudnicky, "Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 120-125, Olomouc, Czech Republic, December 2013.

[2] M. Henderson, B. Thomson, and S. Young, "Deep neural network approach for the dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 467-471, Metz, France, August 2013.

[3] M. Henderson, "Machine learning for dialog state tracking: a review," in Proceedings of the Machine Learning in Spoken Language Processing Workshop, Aizuwakamatsu, Japan, September 2015.

[4] A. Bordes, Y.-L. Boureau, and J. Weston, "Learning end-to-end goal-oriented dialog," in Proceedings of the 5th International Conference on Learning Representations, pp. 1-15, Toulon, France, April 2017.

[5] T. Zhao and M. Eskenazi, "Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning," in Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 1-10, Los Angeles, CA, USA, September 2016.

[6] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746-1751, Doha, Qatar, October 2014.

[7] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proceedings of Advances in Neural Information Processing Systems, pp. 649-657, Montreal, Canada, December 2015.

[8] M. Henderson, B. Thomson, and S. Young, "Word-based dialog state tracking with recurrent neural networks," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia, PA, USA, June 2014.

[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.

[10] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the Tenth Workshop on Artificial Intelligence and Statistics, pp. 246-252, Bridgetown, Barbados, January 2005.

[11] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of Advances in Neural Information Processing Systems, pp. 1081-1088, Vancouver, BC, Canada, December 2009.

[12] J. Williams, A. Raux, D. Ramachandran, and A. Black, "The dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 404-413, Metz, France, August 2013.

[13] R. Higashinaka, M. Nakano, and K. Aikawa, "Corpus-based discourse understanding in spoken dialogue systems," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pp. 240-247, Sapporo, Japan, July 2003.

[14] V. Zue, S. Seneff, J. R. Glass et al., "JUPITER: a telephone-based conversational interface for weather information," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, pp. 85-96, 2000.

[15] S. Larsson and D. R. Traum, "Information state and dialogue management in the TRINDI dialogue move engine toolkit," Natural Language Engineering, vol. 6, no. 3-4, pp. 323-340, 2000.

[16] K. Sun, L. Chen, S. Zhu, and K. Yu, "A generalized rule based tracker for dialogue state tracking," in Proceedings of the Spoken Language Technology Workshop, pp. 330-335, South Lake Tahoe, NV, USA, December 2014.

[17] D. Bohus and A. Rudnicky, "A 'K hypotheses + other' belief updating model," in Proceedings of the AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, pp. 13-18, Boston, MA, USA, July 2006.

[18] B. Thomson and S. Young, "Bayesian update of dialogue state: a POMDP framework for spoken dialogue systems," Computer Speech & Language, vol. 24, no. 4, pp. 562-588, 2010.

[19] Y. Ma, A. Raux, D. Ramachandran, and R. Gupta, "Landmark-based location belief tracking in a spoken dialog system," in Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 169-178, Seoul, Korea, July 2012.

[20] K. Yoshino, T. Hiraoka, G. Neubig, and S. Nakamura, "Dialogue state tracking using long short term memory neural networks," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1-8, Saariselka, Finland, January 2016.

[21] H. Shi, T. Ushio, M. Endo, K. Yamagami, and N. Horii, "Convolutional neural networks for multi-topic dialog state tracking," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1-8, Saariselka, Finland, January 2016.

[22] J. Perez and F. Liu, "Dialog state tracking, a machine reading approach using memory network," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 305-314, Valencia, Spain, April 2017.

[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of Advances in Neural Information Processing Systems, pp. 3111-3119, Lake Tahoe, NV, USA, December 2013.

[24] W.-T. Yih, X. He, and C. Meek, "Semantic parsing for single-relation question answering," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 643-648, Baltimore, MD, USA, June 2014.

[25] H. Peng, J. Li, Y. Song, and Y. Liu, "Incrementally learning the hierarchical softmax function for neural language models," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3267-3273, San Francisco, CA, USA, February 2017.

[26] S. Kim, L. F. D'Haro, R. E. Banchs, J. D. Williams, and M. Henderson, "The fourth dialog state tracking challenge," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1-8, Saariselka, Finland, January 2016.



shown in the "Dialog State" column of this table, not all utterances have a slot and attributes.

The topics OPENING, CLOSING, ITINERARY, and OTHER deliver no information about dialog states. Thus, they are all merged into a single topic, OTHERS, for the informativeness classification. As a result, for the informativeness classifier, the five topics FOOD, ACCOMMODATION, ATTRACTION, SHOPPING, and TRANSPORTATION are considered as the informative class, while the topic OTHERS is regarded as the noninformative class. Table 2 summarizes the class distribution of the TourSG data set for the informativeness classifier. The training set has 9,974 informative utterances and 2,785 noninformative utterances. The development set has 4,139 informative and 673 noninformative utterances, while the test set has 6,528 and 1,320 utterances for the informative and noninformative classes, respectively.
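The merge described above amounts to a binary relabeling of the TourSG topic annotations, which can be sketched as:

```python
# The five topics that carry dialog-state information.
INFORMATIVE_TOPICS = {"FOOD", "ACCOMMODATION", "ATTRACTION",
                      "SHOPPING", "TRANSPORTATION"}

def informativeness_label(topic):
    """Map a TourSG topic label to the binary class used to train the
    informativeness classifier; OPENING, CLOSING, ITINERARY, and OTHER
    all collapse into the noninformative class OTHERS."""
    return "informative" if topic in INFORMATIVE_TOPICS else "noninformative"

print(informativeness_label("FOOD"))     # informative
print(informativeness_label("CLOSING"))  # noninformative
```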

[Figure 6: An example of a hierarchical binary tree over the words w1, w2, ..., wn; the internal nodes n(w1, 1), n(w1, 2), and n(w1, 3) lie on the path from the root to w1.]
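The hierarchical binary tree of Figure 6 is what makes the hierarchical softmax efficient: the probability of a word is the product of sigmoid branch decisions along the path from the root to the word's leaf, so only O(log |V|) node scores are evaluated instead of |V|. A minimal sketch with a hand-built two-level tree, where the scores are illustrative numbers rather than trained parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hierarchical_softmax_prob(path):
    """path: list of (branch_score, go_left) pairs from root to a leaf.
    P(word) multiplies sigmoid(score) for a left branch and
    1 - sigmoid(score) for a right branch at each internal node."""
    p = 1.0
    for score, go_left in path:
        s = sigmoid(score)
        p *= s if go_left else 1.0 - s
    return p

# Two-level tree over four words: each word's path holds two decisions.
p_w1 = hierarchical_softmax_prob([(0.4, True), (-1.2, True)])
print(p_w1)  # a probability strictly between 0 and 1
```

Because every leaf's path multiplies complementary branch probabilities, the probabilities of all words in the tree sum to exactly one without ever normalizing over the whole vocabulary.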

Table 1: Simple statistics on the TourSG data set.

Set          No. of dialogs (SG1/SG2/SG3/Total)  No. of segments (Acco/Attr/Food/Shop/Trsp/Other/Total)  No. of utterances
Training     7 / 7 / 0 / 14                      187 / 762 / 275 / 149 / 374 / 357 / 2104                12759
Development  3 / 3 / 0 / 6                       94 / 282 / 102 / 67 / 87 / 68 / 700                     4812
Test         3 / 3 / 3 / 9                       174 / 616 / 134 / 49 / 174 / 186 / 1333                 7848

Tourist: "Can you give me some uh- tell me some cheap rate hotels because I'm planning just to leave my bags there and go somewhere, take some pictures." [TYPE = budget hotel]
Tourist: "Yes, I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day." [TYPE = hostel]
Guide: "Okay. Let me get you, hm hm. So you don't mind if it's a bit uh not so roomy like hotel, because you just back to sleep." [INFO = room size]
Tourist: "Yes. Yes. As we just gonna put our things there and then go out to take some pictures." [INFO = room size]
Guide: "Okay, um-" [-]
Tourist: "Hm." [-]
Guide: "Let's try this one, okay?" [-]
Tourist: "Okay." [-]
Guide: "It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed, per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars." [PLACE = the inncrowd backpackers hostel]
Tourist: "Um. Wow, that's good." [PLACE = the inncrowd backpackers hostel]
Guide: "Yah, the prices are based on per person, per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only." [INFO = pricerange]
Tourist: "Oh okay. That's- the price is reasonable actually. It's good." [INFO = pricerange]
Guide: "Okay, I'm going to recommend. Firstly, you want to have a backpack type of hotel, right?" [TYPE = hostel]

Figure 7: A sample dialog from the TourSG data set for the topic ACCOMMODATION.


The data set has two kinds of slots for dialog state tracking: a regular slot and an INFO slot. The regular slots are predefined to deliver the essential content of a subdialog for a specific topic. A topic contains three to seven regular slots on average. On the other hand, the INFO slot indicates the content mentioned implicitly in a subdialog. In the second subdialog of Figure 7, there are two slots, "PLACE" and "INFO", where "PLACE" is a regular slot and "INFO" is the INFO slot. Since the hotel name is described explicitly with the words "InnCrowd Backpackers Hostel", it becomes the attribute of the slot "PLACE". In addition, the dialog participants are discussing the price of the hotel. Thus, the attribute of the INFO slot becomes Pricerange, even though the words "price range" do not appear explicitly in the dialog. The possible slot types and the list of their candidate attributes vary according to the topic category, and they are described in an ontology provided by DSTC [26].

Table 3 shows the number of slots and their possible attributes for every topic category. ACCOMMODATION, ATTRACTION, FOOD, and SHOPPING contain common regular slots that refer to the names of places, the types of places, and the information on geographic areas. In detail, the topic ACCOMMODATION has the three common regular slots, and the topic ATTRACTION includes five regular slots: touristic activities, visit times for attractions, and the three common regular slots. The topic FOOD has seven slots that stand for the types of cuisine, the names of dishes and drinks, the time for eating, and the three common regular slots. The topic SHOPPING has the three common regular slots and time information for shopping. The last topic, TRANSPORTATION, includes the types of transportation, the destinations and origins, the information on train lines and stations in Singapore, and the types of tickets. The regular slots have 570, 611, 1411, 651, and 2450 attributes in ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION, respectively. On average, one regular slot has roughly 190, 122, 202, 163, and 490 attributes for each topic category. On the other hand, there is only one INFO slot per topic. This is because INFO is both a slot type and a slot name. It has 32, 31, 16, 16, and 15 attributes for the five topic categories.
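The ontology that pairs each topic's slots with their candidate attributes can be pictured as a nested mapping. The fragment below is a tiny illustrative excerpt invented for this sketch, not the actual DSTC4 ontology:

```python
# Illustrative fragment of a DSTC4-style ontology:
# topic category -> slot -> list of candidate attributes.
ontology = {
    "ACCOMMODATION": {
        "TYPE_OF_PLACE": ["hostel", "budget hotel", "resort"],
        "PLACE": ["the inncrowd backpackers hostel"],
        "INFO": ["pricerange", "room size"],
    },
    "TRANSPORTATION": {
        "TYPE": ["mrt", "bus", "taxi"],
        "STATION": ["bugis", "orchard"],
    },
}

def candidate_attributes(topic, slot):
    """Look up the candidate attribute list for a slot in a topic category;
    an unknown topic or slot yields an empty list."""
    return ontology.get(topic, {}).get(slot, [])

print(candidate_attributes("ACCOMMODATION", "INFO"))  # ['pricerange', 'room size']
```

A tracker with a hierarchical softmax output layer effectively chooses among exactly these per-topic candidate lists, which is why the attribute counts in Table 3 determine the size of the output vocabulary.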

5.2. Evaluation Metrics and Experimental Settings. The accuracy and the F1-score are used to evaluate both the informativeness classifier and the neural dialog state tracker, where they are defined as follows:

(i) Accuracy: the fraction of predictions in which the model output is the same as the gold label.

(ii) Precision/Recall/F1:

(a) Precision: the fraction of the model outputs that are correctly predicted.

(b) Recall: the fraction of the gold labels that are correctly predicted.

(c) F1: the harmonic mean of precision and recall.

For the informativeness classifier, the accuracy checks how correctly the classifier determines whether a given utterance is informative or not, and the F1-score shows how precise the informativeness classifier is as well as how robust it is. In dialog state tracking, the predictions are slot-attribute pairs. Thus, the accuracy checks how equivalent the tracker's outputs are to the gold standard labels at the whole-frame level, while the precision and the recall check the partial correctness at the slot-attribute level.
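Since the tracker's predictions are slot-attribute pairs, the metrics above reduce to set operations over frames. A minimal sketch, in which the function name and the toy frames are illustrative:

```python
def tracking_scores(gold_frames, pred_frames):
    """gold_frames / pred_frames: parallel lists of sets of
    (slot, attribute) pairs, one set per frame.
    Accuracy is exact whole-frame match; precision, recall, and F1
    count pair overlaps at the slot-attribute level."""
    exact = sum(g == p for g, p in zip(gold_frames, pred_frames))
    tp = sum(len(g & p) for g, p in zip(gold_frames, pred_frames))
    n_pred = sum(len(p) for p in pred_frames)
    n_gold = sum(len(g) for g in gold_frames)
    accuracy = exact / len(gold_frames)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

gold = [{("TYPE", "hostel"), ("INFO", "pricerange")}, {("PLACE", "vivocity")}]
pred = [{("TYPE", "hostel")}, {("PLACE", "vivocity")}]
print(tracking_scores(gold, pred))
```

This makes explicit why frame-level accuracy is so much lower than the pairwise F1-score in the tables: a frame counts as correct only when every slot-attribute pair matches at once.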

The parameters of the informativeness classifier and the neural dialog tracker are described in Tables 4 and 5, respectively. Every word of the utterances, the input for both the informativeness classifier and the neural tracker, is embedded into a 500-dimension vector. Three different sizes of filters are used for the informativeness classifier, and the number of filters for each filter size is 128. The dropout rate of the classifier is 0.5, the batch size is 50, and the number of training epochs is 10. For the neural tracker, the hidden layer size is 500, the dropout ratio is 0.1, the learning rate is 0.005, and the number of epochs is 100,000.
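The convolution-and-pooling step of the CNN classifier can be sketched in plain Python with toy dimensions; the real model uses 500-dimensional embeddings, 128 filters per size, and trained weights rather than the random ones below:

```python
import random

random.seed(0)
# Toy sizes for readability; the paper uses 500, 128, and (2, 3, 4).
EMBED_DIM, N_FILTERS, FILTER_SIZES = 8, 4, (2, 3, 4)

def conv_max_pool(embedded, bank, size):
    """Slide each filter over all word windows of the given size and
    keep the maximum response (max-over-time pooling)."""
    feats = []
    for f in bank:  # each filter holds size * EMBED_DIM weights
        responses = []
        for i in range(len(embedded) - size + 1):
            window = [v for word in embedded[i:i + size] for v in word]
            responses.append(sum(w * x for w, x in zip(f, window)))
        feats.append(max(responses))
    return feats

# A 6-word "utterance" of random embeddings and one random bank per size.
sentence = [[random.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(6)]
feature_vec = []
for size in FILTER_SIZES:
    bank = [[random.gauss(0, 0.1) for _ in range(size * EMBED_DIM)]
            for _ in range(N_FILTERS)]
    feature_vec.extend(conv_max_pool(sentence, bank, size))

# The pooled features feed a final softmax layer that decides
# informative vs. noninformative.
print(len(feature_vec))  # 12 = 4 filters x 3 filter sizes
```

Max-over-time pooling makes the feature vector length independent of utterance length, which is why the classifier handles utterances of any size with a fixed-size output layer.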

5.3. Experimental Results. Table 6 shows the classification performance of the informativeness classifier. The classifier achieves an accuracy of 0.9316 and an F1-score of 0.9317 on the test set, while it shows a relatively low precision of 0.9089. The reason for the relatively low precision is that there are some utterances about money in the noninformative class. For instance, the classifier predicts the utterance "Okay maybe a thousand dollars for a day" as informative due to the words "thousand dollars" and "a day", but its gold standard label is noninformative. Since the dialog to which the utterance belongs is an opening section before recommending tour information, the utterance is noninformative for tracking the dialog. This problem occurs because the classifier determines the informativeness of utterances without any contextual information about the dialog flow.

To show the feasibility of the proposed model, its dialog state tracking performance is compared with those of the participants in the DSTC4 challenge. The string match tracker determines the slot and the attribute by a simple exact string match between the slot and the given utterance [26]. The LSTM-based neural tracker proposed by Yoshino et al. [20] is a pure LSTM without the

Table 2: The class distribution for the informativeness classifier.

Set          Informative  Noninformative
Training     9974         2785
Development  4139         673
Test         6528         1320


attention mechanism. Thus, it can be directly compared with the proposed model. Tables 7 and 8 show the performance of the trackers. The proposed model outperforms its competitors in both accuracy and F1-score. In particular, it achieves more than three times higher accuracy than the LSTM, which is a neural network-based model and a similar approach to the proposed model. This result means that handling the necessary information in an utterance is important for tracking dialog states. The main difference between the proposed model and the LSTM is the presence of the attention mechanism. Thus, it can be inferred that it is important in dialog state tracking to focus on salient words while processing a user utterance.

The performance of the proposed model per topic is investigated in Figure 8. The performance is measured for the five topics related to dialog state tracking. The proposed model achieves the best performance on every topic, although its own performance is lowest on the topic ACCOMMODATION. The reason why the

proposed model shows poor performance for ACCOMMODATION is that there are many utterances mentioning locations in the dialogs of ACCOMMODATION, but

Table 6: The classification performance of the informativeness classifier.

Set          Accuracy  Precision  Recall  F1-score
Development  0.9484    0.9202     0.9820  0.9501
Test         0.9316    0.9089     0.9557  0.9317

Table 5: The parameters for the neural dialog state tracker and their values.

Parameter          Value
Embedding size     500
Hidden layer size  500
Dropout ratio      0.1
Learning rate      0.005
Number of epochs   100000

Table 3: The number of slots (regular, INFO) and their attributes.

                 Regular slot                                     INFO slot
Topic category   No. of slots  No. of attr.  Avg. attr. per slot  No. of slots  No. of attr.
ACCOMMODATION    3             570           190                  1             23
ATTRACTION       5             611           122                  1             31
FOOD             7             1411          202                  1             16
SHOPPING         4             651           163                  1             16
TRANSPORTATION   5             2450          490                  1             15

Table 4: The parameters for the informativeness classifier and their values.

Parameter          Value
Embedding size     500
Filter sizes       2, 3, 4
Number of filters  128
Dropout ratio      0.5
Batch size         50
Number of epochs   1000

Table 7: The comparison of the dialog state tracking accuracies between the proposed model and DSTC4 participants.

Model              Accuracy
String match [26]  0.0374
LSTM [20]          0.0270
Proposed model     0.0852

Table 8: The comparison of the dialog state tracking F1-scores between the proposed model and DSTC4 participants.

Model              Precision  Recall  F1-score
String match [26]  0.3589     0.1925  0.2506
LSTM [20]          0.3400     0.2010  0.2530
Proposed model     0.5173     0.3499  0.4174

[Figure 8: The F1-score of the proposed model per topic (ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, TRANSPORTATION; y-axis 0.1-0.6), compared with the string match tracker and the LSTM.]


they are often related to other topics such as TRANSPORTATION and SHOPPING. For instance, for the utterance "ten minutes walk to the mrt train station", the proposed model predicts the slot as "STATION" and the attribute as "MRT train station". This prediction seems relevant, since a train station is usually related to TRANSPORTATION. However, the dialog is about the places near the suggested accommodation. Thus, the gold standard slot for the utterance is "NEIGHBORHOOD", and the gold standard attribute is "Kampong Glam", a location name.


References

[1] Y-N Chen W Y Wang and A I Rudnicky ldquoUnsupervisedinduction and filling of semantic slots for spoken dialoguesystems using frame-semantic parsingrdquo in Proceedings ofIEEE Workshop on Automatic Speech Recognition and Un-derstanding pp 120ndash125 Olomouc Czech Republic De-cember 2013

[2] M Henderson B omson and S Young ldquoDeep neuralnetwork approach for the dialog state tracking challengerdquo inProceedings of 14th Annual Meeting of the Special InterestGroup on Discourse and Dialogue pp 467ndash471 Metz FranceAugust 2013

[3] M Henderson ldquoMachine learning for dialog state trackinga reviewrdquo in Proceedings of Machine Learning in SpokenLanguage Processing Workshop Aizuwakamatsu JapanSeptember 2015

[4] A Bordes Y-L Boureau and J Weston ldquoLearning end-to-end goal-oriented dialogrdquo in Proceedings of 5th InternationalConference on Learning Representations pp 1ndash15 ToulonFrance April 2017

[5] T Zhao and M Eskenazi ldquoTowards end-to-end learning fordialog state tracking and management using deep re-inforcement learningrdquo in Proceedings of 17th Annual Meetingof the Special Interest Group on Discourse and Dialoguepp 1ndash10 Los Angeles CA USA September 2016

[6] Y Kim ldquoConvolutional neural networks for sentence clas-sificationrdquo in Proceedings of 2014 Conference on EmpiricalMethods in Natural Language Processing pp 1746ndash1751Doha Qatar October 2014

[7] X Zhang J Zhao and Y LeCun ldquoCharacter-level con-volutional networks for text classificationrdquo in Proceedings ofAdvances in Neural Information Processing Systemspp 649ndash657 Montreal Canada December 2015

[8] M Henderson B omson and S Young ldquoWord-baseddialog state tracking with recurrent neural networksrdquo inProceedings of 15th Annual Meeting of the Special InterestGroup on Discourse and Dialogue Philadelphia PA USAJune 2014

[9] D Bahdanau K Cho and Y Bengio ldquoNeural machinetranslation by jointly learning to align and translaterdquo 2014httparxivorgabs14090473

[10] F Morin and Y Bengio ldquoHierarchical probabilistic neuralnetwork language modelrdquo in Proceedings of Tenth Workshopon Artificial Intelligence and Statistics pp 246ndash252 Bridge-town Barbados January 2005

[11] A Mnih and G Hinton ldquoA scalable hierarchical distributedlanguage modelrdquo in Proceedings of Advances in Neural In-formation Processing Systems pp 1081ndash1088 Vancouver BCUSA December 2009

[12] J Williams A Raux D Ramachandran and A Black ldquoedialog state tracking challengerdquo in Proceedings of 14th Annual

Meeting of the Special Interest Group on Discourse and Di-alogue pp 404ndash413 Metz France August 2013

[13] R Higashinaka M Nakano and K Aikawa ldquoCorpus-baseddiscourse understanding in spoken dialogue systemsrdquo inProceedings of 41st Annual Meeting on Association forComputational Linguistics pp 240ndash247 Sapporo Japan July2003

[14] V Zue S Seneff J R Glass et al ldquoJUPlTER a telephone-based conversational interface for weather informationrdquo IEEETransactions on Speech and Audio Processing vol 8 no 1pp 85ndash96 2000

[15] S Larsson and D R Traum ldquoInformation state and dialoguemanagement in the trindi dialogue move engine toolkitrdquoNatural Language Engineering vol 6 no 3-4 pp 323ndash3402000

[16] K Sun L Chen S Zhu and K Yu ldquoA generalized rule basedtracker for dialogue state trackingrdquo in Proceedings of SpokenLanguage Technology Workshop pp 330ndash335 South LakeTahoe NV USA December 2014

[17] D Bohus and A Rudnicky ldquoA ldquoK hypotheses + otherrdquo beliefupdating modelrdquo in Proceedings of AAAI Workshop on Sta-tistical and Empirical Approaches to Spoken Dialogue Systemspp 13ndash18 Boston MA USA July 2006

[18] Bomson and S Young ldquoBayesian update of dialogue statea POMDP framework for spoken dialogue systemsrdquo Com-puter Speech amp Language vol 24 no 4 pp 562ndash588 2010

[19] YMa A Raux D Ramachandran and R Gupta ldquoLandmark-based location belief tracking in a spoken dialog systemrdquo inProceedings of 13th Annual Meeting of the Special InterestGroup on Discourse and Dialogue pp 169ndash178 Seoul KoreaJuly 2012

[20] K Yoshino T Hiraoka G Neubig and S Nakamura ldquoDi-alogue state tracking using long short term memory neuralnetworksrdquo in Proceedings of Seventh International Workshopon Spoken Dialog Systems pp 1ndash8 Saariselka Finland Jan-uary 2016

[21] H Shi T Ushio M Endo K Yamagami and N HoriildquoConvolutional neural networks for multi-topic dialog statetrackingrdquo in Proceedings of Seventh International Workshopon Spoken Dialog Systems pp 1ndash8 Saariselka FinlandJanuary 2016

[22] J Perez and F Liu ldquoDialog state tracking a machine readingapproach using memory networkrdquo in Proceedings of 15thConference of the European Chapter of the Association forComputational Linguistics pp 305ndash314 Valencia SpainApril 2017

[23] T Mikolov I Sutskever K Chen G S Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo in Proceedings of Advances in Neural In-formation Processing Systems pp 3111ndash3119 Lake Tahoe NVUSA December 2013

[24] W-T Yih X He and C Meek ldquoSemantic parsing for single-relation question answeringrdquo in Proceedings of 52nd AnnualMeeting of the Association for Computational Linguisticspp 643ndash648 Baltimore MD USA June 2014

[25] H Peng J Li Y Song and Y Liu ldquoIncrementally learningthe hierarchical softmax function for neural languagemodelsrdquo in Proceedings of irty-First AAAI Conference onArtificial Intelligence pp 3267ndash3273 San Francisco CAUSA February 2017

[26] S Kim L F DrsquoHaro R E Banchs J D Williams andM Henderson ldquoe forth dialog state tracking challengerdquo inProceedings of Seventh International Workshop on Spoken DialogSystems pp 1ndash8 Saariselka Finland January 2016

Computational Intelligence and Neuroscience 11

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

The data set has two kinds of slots for dialog state tracking: a regular slot and an INFO slot. The regular slot is predefined to deliver the essential content of a subdialog for a specific topic. A topic contains three to seven regular slots on average. On the other hand, the INFO slot indicates the content mentioned implicitly in the subdialog. In the second subdialog of Figure 7, there are two slots, "PLACE" and "INFO", where "PLACE" is a regular slot and "INFO" is the INFO slot. Since the hotel name is described explicitly with the words "InnCrowd Backpackers Hotel", it becomes the attribute of the slot "PLACE". In addition, the dialog participants are discussing the price of the hotel. Thus, the attribute of the INFO slot becomes "Pricerange", even though the words "price range" do not appear explicitly in the dialog. The possible slot types and the list of their candidate attributes vary according to the topic category, and they are described in an ontology provided by DSTC [26].

Table 3 shows the number of slots and their possible attributes for every topic category. ACCOMMODATION, ATTRACTION, FOOD, and SHOPPING contain common regular slots that refer to the names of places, the types of places, and the information of geographic areas. In detail, the topic ACCOMMODATION has the three common regular slots, and the topic ATTRACTION includes five regular slots: slots for touristic activities and visit time for an attraction in addition to the three common regular slots. The topic FOOD has seven slots that stand for the types of cuisine, the names of dishes and drinks, and the time for eating, as well as the three common regular slots. The topic SHOPPING has the three common regular slots and time information for shopping. The last topic, TRANSPORTATION, includes the types of transportation, the destinations and origins, the information on train lines and stations in Singapore, and the types of tickets. The regular slots have 570, 611, 1411, 651, and 2450 attributes in ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION, respectively. On average, one regular slot has roughly 190, 122, 202, 163, and 490 attributes for each topic category. On the other hand, there is only one INFO slot per topic; this is because INFO is both a slot type and a slot name. It has 23, 31, 16, 16, and 15 attributes for the five topic categories.

5.2. Evaluation Metrics and Experimental Settings

The accuracy and the F1-score are used to evaluate both the informativeness classifier and the neural dialog state tracker, where they are defined as follows:

(i) Accuracy: the fraction of predictions in which the model output is the same as the gold label.

(ii) Precision/Recall/F1:

(a) Precision: the fraction of the model outputs that are correctly predicted.

(b) Recall: the fraction of the gold labels that are correctly predicted.

(c) F1: the harmonic mean of precision and recall.

For the informativeness classifier, the accuracy checks how correctly the classifier determines whether a given utterance is informative or not, and the F1-score shows how precise the classifier is as well as how robust it is. In dialog state tracking, the predictions are slot-attribute pairs. Thus, the accuracy checks how equivalent the tracker's outputs are to the gold standard labels at the whole-frame level, while the precision and the recall check partial correctness at the slot-attribute level.
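As a concrete illustration, the sketch below (hypothetical helper code, not part of the paper's implementation) computes these metrics over per-utterance frames, where each frame is the set of slot-attribute pairs predicted or annotated for one utterance:

```python
def evaluate(gold_frames, pred_frames):
    """Frame-level accuracy and slot-attribute-level precision/recall/F1.

    Each frame is a set of (slot, attribute) pairs for one utterance.
    """
    assert len(gold_frames) == len(pred_frames)
    # Accuracy: the whole predicted frame must equal the gold frame.
    correct = sum(g == p for g, p in zip(gold_frames, pred_frames))
    accuracy = correct / len(gold_frames)

    # Precision/recall are computed over individual slot-attribute pairs.
    tp = sum(len(g & p) for g, p in zip(gold_frames, pred_frames))
    n_pred = sum(len(p) for p in pred_frames)
    n_gold = sum(len(g) for g in gold_frames)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy frames: the second utterance is tracked perfectly, the first only partially.
gold = [{("PLACE", "InnCrowd Backpackers Hotel"), ("INFO", "Pricerange")},
        {("TYPE_OF_PLACE", "hotel")}]
pred = [{("PLACE", "InnCrowd Backpackers Hotel")},
        {("TYPE_OF_PLACE", "hotel")}]
acc, prec, rec, f1 = evaluate(gold, pred)
```

Only one of the two frames matches exactly, so the frame-level accuracy is 0.5 even though every emitted pair is correct (precision 1.0), which is exactly the distinction between the frame-level and pair-level views described above.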

The parameters of the informativeness classifier and the neural dialog tracker are described in Tables 4 and 5, respectively. Every word of the utterances, the input for both the informativeness classifier and the neural tracker, is embedded into a 500-dimension vector. Three different sizes of filters are used for the informativeness classifier, and the number of filters for each filter size is 128. The dropout rate of the classifier is 0.5, the batch size is 50, and the number of training epochs is 1,000. For the neural tracker, the hidden layer size is 500, the dropout ratio is 0.1, the learning rate is 0.005, and the number of epochs is 100,000.
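The informativeness classifier follows the CNN of Kim [6]: word embeddings are convolved with filters of several widths, and max-over-time pooling keeps each filter's strongest response. The NumPy sketch below uses the hyperparameters above, but the tiny vocabulary and the random weights are stand-ins for learned parameters, so it illustrates only the forward pass:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, FILTER_SIZES, N_FILTERS = 500, (2, 3, 4), 128

# Hypothetical vocabulary and randomly initialized weights (learned in practice).
vocab = {"okay": 0, "maybe": 1, "a": 2, "thousand": 3,
         "dollars": 4, "for": 5, "day": 6}
E = rng.standard_normal((len(vocab), EMB)) * 0.01            # embedding matrix
W = {h: rng.standard_normal((N_FILTERS, h * EMB)) * 0.01     # conv filters
     for h in FILTER_SIZES}
w_out = rng.standard_normal(N_FILTERS * len(FILTER_SIZES)) * 0.01

def informative_prob(tokens):
    x = E[[vocab[t] for t in tokens]]                        # (T, EMB) word vectors
    feats = []
    for h, Wh in W.items():
        # Convolve each filter over every window of h consecutive words,
        # then max-over-time pooling keeps the strongest response per filter.
        windows = np.stack([x[i:i + h].ravel()
                            for i in range(len(tokens) - h + 1)])
        feats.append(np.maximum(windows @ Wh.T, 0).max(axis=0))
    z = np.concatenate(feats) @ w_out
    return 1.0 / (1.0 + np.exp(-z))          # P(utterance is informative)

p = informative_prob("okay maybe a thousand dollars for a day".split())
```

With trained weights, an utterance like this one would be the kind of borderline case discussed in Section 5.3, since its money-related words pull the score toward the informative class.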

5.3. Experimental Results

Table 6 shows the classification performance of the informativeness classifier. The classifier achieves an accuracy of 93.16% and an F1-score of 93.17% on the test set, while it shows a relatively low precision of 90.89%. The reason for the relatively low precision is that there are some utterances about money in the noninformative class. For instance, the classifier predicts the utterance "Okay maybe a thousand dollars for a day" as informative due to the words "thousand dollars" and "a day," but its gold standard label is noninformative. Since the dialog to which the utterance belongs is an opening section before recommending tour information, the utterance is noninformative for tracking the dialog. This problem occurs because the classifier determines the informativeness of utterances without any contextual information about the dialog flow.

To show the feasibility of the proposed model, the dialog state tracking performance of the proposed model is compared with those of the participants in the DSTC4 challenge. The string match tracker determines the slot and the attribute by a simple exact string match between the slot and the given utterance [26]. The LSTM-based neural tracker proposed by Yoshino et al. [20] is a pure LSTM without the

Table 2: The class distribution for the informativeness classifier.

Set          Informative   Noninformative
Training     9974          2785
Development  4139          673
Test         6528          1320

Computational Intelligence and Neuroscience 7

attention mechanism. Thus, it can be directly compared with the proposed model. Tables 7 and 8 show the performance of the trackers. The proposed model outperforms its competitors in both accuracy and F1-score. Especially, it achieves more than three times the accuracy of the LSTM, which is a neural network-based model and a similar approach to the proposed model. This result means that handling the necessary information from an utterance is important for tracking the dialog state. The main difference between the proposed model and the LSTM is the presence of the attention mechanism. Thus, it can be inferred that focusing on some salient words while processing a user utterance is important in dialog state tracking.

The performance of the proposed model per topic is investigated in Figure 8. The performance is measured for the five topics related to dialog state tracking. The proposed model achieves the best performance on every topic, although its performance is lowest on the topic ACCOMMODATION. The reason why the proposed model shows relatively poor performance for ACCOMMODATION is that there are many utterances mentioning some locations in the dialogs of ACCOMMODATION, but

Table 6: The classification performance of the informativeness classifier.

Set          Accuracy  Precision  Recall  F1-score
Development  0.9484    0.9202     0.9820  0.9501
Test         0.9316    0.9089     0.9557  0.9317

Table 5: The parameters for the neural dialog state tracker and their values.

Parameter          Value
Embedding size     500
Hidden layer size  500
Dropout ratio      0.1
Learning rate      0.005
Number of epochs   100000

Table 3: The number of slots (regular, INFO) and their attributes.

Topic category    Regular slot                                              INFO slot
                  No. of slots  No. of attributes  Avg. attributes per slot  No. of slots  No. of attributes
ACCOMMODATION     3             570                190                       1             23
ATTRACTION        5             611                122                       1             31
FOOD              7             1411               202                       1             16
SHOPPING          4             651                163                       1             16
TRANSPORTATION    5             2450               490                       1             15

Table 4: The parameters for the informativeness classifier and their values.

Parameter          Value
Embedding size     500
Filter sizes       2, 3, 4
Number of filters  128
Dropout ratio      0.5
Batch size         50
Number of epochs   1000

Table 7: The comparison of the dialog state tracking accuracies between the proposed model and DSTC4 participants.

Model              Accuracy
String match [26]  0.0374
LSTM [20]          0.0270
Proposed model     0.0852

Table 8: The comparison of the dialog state tracking F1-scores between the proposed model and DSTC4 participants.

Model              Precision  Recall  F1-score
String match [26]  0.3589     0.1925  0.2506
LSTM [20]          0.3400     0.2010  0.2530
Proposed model     0.5173     0.3499  0.4174

Figure 8: The F1-score of the proposed model per topic (ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION), compared with the string match tracker and the LSTM tracker.


they are often related to other topics such as TRANSPORTATION and SHOPPING. For instance, for the utterance "ten minutes walk to the mrt train station," the proposed model predicts its slot as "STATION" and its attribute as "MRT train station." This prediction seems relevant, since a train station is usually related to TRANSPORTATION. However, the dialog is about the places near the suggested accommodation. Thus, the gold standard slot for the utterance is "NEIGHBORHOOD," and the gold standard attribute is "Kampong Glam," a location name.

Table 9 shows the dialog state tracking accuracy of the proposed model for every slot in a topic category. Note that each accuracy is computed at the utterance level, not the subdialog level. The regular slots that have distinguishable attributes, like "TIME" and "TYPE_OF_PLACE," show moderately high accuracy, while some regular slots such as "NEIGHBOURHOOD," "PLACE," or "DISH" achieve low accuracy. The reason why the proposed model shows poor performance for these regular slots is that their attributes are specific names, such as restaurant names, street names, and dish names, that are extremely infrequent in the data set. The proposed model also achieves meaningful performance for the INFO slot. The high performance for the INFO slot implies that the proposed model can catch the contents mentioned implicitly in an utterance.

To examine the effectiveness of the attention mechanism and the hierarchical softmax, three variants of the proposed model are compared. One variant (NT-Atten-NoHier) is the neural tracker with the attention mechanism and the general softmax. This is designed to inspect the effectiveness of the attention mechanism in dialog state tracking. Another (NT-NoAtten-Hier) is the neural tracker without the attention but with the hierarchical softmax. This is prepared to examine the hierarchical softmax. The third (NT-Atten-Hier) is the proposed model, the neural tracker with both the attention and the hierarchical softmax. The informativeness classifier is not used in comparing them.

Tables 10 and 11 prove the effectiveness of the attention mechanism and the hierarchical softmax. The proposed model (NT-Atten-Hier) outperforms both NT-Atten-NoHier and NT-NoAtten-Hier, which implies that the attention mechanism and the hierarchical softmax are both helpful in enhancing the neural tracker. One thing to note in these tables is that the performance improvement of the neural tracker is made mainly by the attention mechanism. NT-Atten-Hier achieves 1.6 times higher accuracy than NT-NoAtten-Hier, but it shows a similar accuracy to NT-Atten-NoHier. Nevertheless, it is also true that the hierarchical softmax enhances the neural tracker. This is proved by the fact that the performance of NT-Atten-Hier is slightly higher than that of NT-Atten-NoHier.
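One way to realize the hierarchical softmax for this task is a two-level factorization: first a distribution over slots, then a distribution over each slot's attributes, so each prediction normalizes over a slot's few attributes instead of the full attribute vocabulary (up to 2,450 attributes for TRANSPORTATION). The sketch below uses made-up scores purely to illustrate the factorization; it is not the paper's implementation:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a dict of scores."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

# Hypothetical unnormalized scores from the tracker.
slot_scores = {"PLACE": 2.1, "NEIGHBOURHOOD": 0.3, "INFO": -1.0}
attr_scores = {
    "PLACE": {"Vivocity": 1.5, "Sentosa": 0.2},
    "NEIGHBOURHOOD": {"Kampong Glam": 0.0, "Sentosa": 0.4},
    "INFO": {"Pricerange": 0.0},
}

# P(slot, attribute) = P(slot) * P(attribute | slot)
p_slot = softmax(slot_scores)
p_attr_given_slot = {s: softmax(a) for s, a in attr_scores.items()}
p_pair = {(s, a): p_slot[s] * p
          for s, pa in p_attr_given_slot.items() for a, p in pa.items()}

best = max(p_pair, key=p_pair.get)
```

Because each level normalizes independently, the joint probabilities over all slot-attribute pairs still sum to one, while training touches far fewer output units per example than a flat softmax over every attribute.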

Table 12 shows the effectiveness of the informativeness classifier. The F1-score of the pure neural tracker, NT-Atten-Hier, is 0.3977, but it goes up to 0.4174 when the informativeness classifier is applied in front of the neural tracker as in Figure 3. The improvement by the informativeness classifier is also observed in accuracy. This is because the adoption of the informativeness classifier

Table 9: The dialog state tracking accuracy (%) of the proposed model for every slot in a topic category.

Topic            Slot            Accuracy
ACCOMMODATION    INFO            32.4
                 TYPE_OF_PLACE   48.3
                 PLACE           18.7
                 NEIGHBOURHOOD    8.8
ATTRACTION       INFO            21.1
                 TYPE_OF_PLACE   65.5
                 ACTIVITY        55.7
                 PLACE            4.6
                 TIME            63.3
                 NEIGHBOURHOOD   17.8
FOOD             INFO            37.1
                 CUISINE         47.1
                 TYPE_OF_PLACE   68.6
                 DRINK           62.2
                 PLACE            7.7
                 MEAL_TIME       73.9
                 DISH             5.8
                 NEIGHBOURHOOD   27.3
SHOPPING         INFO            41.3
                 TYPE_OF_PLACE   41.9
                 PLACE            4.8
                 NEIGHBOURHOOD    4.1
                 TIME            75.0
TRANSPORTATION   INFO            23.9
                 FROM             5.3
                 TO               5.6
                 STATION         14.7
                 LINE            16.1
                 TYPE            42.8

Table 10: The dialog state tracking accuracy of the proposed model's variants.

Model            Accuracy
NT-Atten-NoHier  0.0672
NT-NoAtten-Hier  0.0422
NT-Atten-Hier    0.0697

Table 11: The F1-score of the dialog state tracking of the proposed model's variants.

Model            Precision  Recall  F1-score
NT-Atten-NoHier  0.4403     0.3335  0.3795
NT-NoAtten-Hier  0.3316     0.3058  0.3181
NT-Atten-Hier    0.4662     0.3469  0.3977

Table 12: The effect of the informativeness classifier in the proposed model.

Model                               Accuracy  Precision  Recall  F1-score
Without informativeness classifier  0.0697    0.4662     0.3469  0.3977
With informativeness classifier     0.0852    0.5173     0.3499  0.4174


reduces the possibility of misjudgments caused by noninformative utterances.

To analyze the shortcomings of the various neural trackers, we use the three types of slot-level errors that were defined in the DSTC challenge. The error types are as follows:

(i) Missing attribute: the actual dialog has attributes for a specific slot, but the tracker does not output any attribute for the slot.

(ii) Extraneous attribute: no attribute for a slot is contained in the actual dialog, but the tracker outputs some attributes for the slot.

(iii) False attribute: the attribute for a slot is specified in the actual dialog, but the tracker outputs a different attribute from the dialog.

According to the results of the DSTC4 challenge [26], the largest error type is the missing attribute.
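The three error types can be made concrete with a small counting routine (a hypothetical sketch, not the official DSTC4 scorer):

```python
def count_error_types(gold, pred):
    """Count missing, extraneous, and false attribute errors per slot.

    gold and pred map slot names to sets of attribute values.
    """
    missing = extraneous = false = 0
    for slot in set(gold) | set(pred):
        g, p = gold.get(slot, set()), pred.get(slot, set())
        if g and not p:
            missing += 1        # gold has the slot, tracker output nothing
        elif p and not g:
            extraneous += 1     # tracker output a slot absent from the dialog
        elif g and p and g != p:
            false += 1          # slot present in both, attributes disagree
    return missing, extraneous, false

# Toy example echoing the ACCOMMODATION case discussed above.
gold = {"NEIGHBOURHOOD": {"Kampong Glam"}, "TYPE_OF_PLACE": {"hotel"}}
pred = {"STATION": {"MRT train station"}, "TYPE_OF_PLACE": {"hostel"}}
errors = count_error_types(gold, pred)
```

Here the tracker misses NEIGHBOURHOOD, invents STATION, and mislabels TYPE_OF_PLACE, yielding one error of each type.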

Figure 9 depicts the number of errors of each error type made by the variants of the proposed model. In this figure, "Proposed Model" denotes NT-Atten-Hier with the informativeness classifier. The number of errors in missing attribute and extraneous attribute is much reduced in "Proposed Model," at the cost of a slight increase in false attribute errors. Therefore, the informativeness classifier is proved to be helpful in reducing the errors in missing and extraneous attributes. Many errors in false attribute are caused by the limited context of a given utterance. For instance, consider the utterance "uh another place you might want to check out is opposite to Sentosa island and it is called Vivocity." In this utterance, a user wants to know another shopping place. Since the proposed model focuses only on the utterance, it predicts the slot for the utterance as "NEIGHBOURHOOD" and its attribute as "Sentosa." This is a wrong but plausible prediction, since the utterance also contains locational information semantically. That is, many errors by the proposed model are not critical for understanding the user intention.

6. Conclusion

This paper has proposed a two-step task-oriented dialog state tracker. The proposed tracker is composed of an informativeness classifier and a neural dialog state tracker. It first filters out noninformative utterances in a dialog by using the informativeness classifier. The informativeness classifier is implemented by a CNN, which has shown good performance in sentence classification. In the next step, the neural tracker determines dialog states from the informative utterances with an attention mechanism and a hierarchical softmax. By adopting the attention mechanism and the hierarchical softmax, the proposed model can not only focus on salient words in a given utterance but also deal with a large vocabulary in dialog efficiently. The experimental results on dialog state tracking show that the proposed model outperforms the neural trackers without the informativeness classifier, the attention mechanism, or the hierarchical softmax, as well as the previous work, which implies the plausibility of adopting them in dialog state tracking.

There is still some room for improvement, although the proposed model achieves an acceptable result. One possible improvement for the current neural tracker is to preserve topic consistency when the model predicts slots. The proposed model considers only a given utterance when it predicts a slot. However, since a general dialog has a single topic, the topic of the utterances in the same dialog segment has to continue when a model predicts the slots of the utterances. Thus, in future work, we will consider a dialog history to catch the flow of a dialog and to preserve topic consistency in predicting slots.

Data Availability

We used dialog state data (the TourSG data set), which is provided by ETPL-A*STAR (the dialog state tracking challenge 4 organization). TourSG is a research dialog corpus in the touristic domain and is divided into train/development data and test data. The train and development data include manual transcriptions and annotations at both the utterance and subdialog levels for training a model and fine-tuning its parameters. The test set includes manual transcriptions for evaluating the model. Any researcher or company can access the data at the following site: https://www.etpl.sg/innovation-offerings/ready-to-sign-licenses/toursg-a-research-dialogue-corpus-in-the-touristic-domain. Researchers or their companies need to pay a license fee to get the TourSG data set after signing a license agreement with ETPL-A*STAR. The data set includes transcribed and annotated dialogs as well as ontology objects describing the annotations.

Figure 9: The number of errors according to error types (false, missing, and extraneous attributes) for NT-Atten-NoHier, NT-NoAtten-Hier, NT-Atten-Hier, and the proposed model.


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B04935678).

References

[1] Y.-N. Chen, W. Y. Wang, and A. I. Rudnicky, "Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 120–125, Olomouc, Czech Republic, December 2013.

[2] M. Henderson, B. Thomson, and S. Young, "Deep neural network approach for the dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 467–471, Metz, France, August 2013.

[3] M. Henderson, "Machine learning for dialog state tracking: a review," in Proceedings of the Machine Learning in Spoken Language Processing Workshop, Aizuwakamatsu, Japan, September 2015.

[4] A. Bordes, Y.-L. Boureau, and J. Weston, "Learning end-to-end goal-oriented dialog," in Proceedings of the 5th International Conference on Learning Representations, pp. 1–15, Toulon, France, April 2017.

[5] T. Zhao and M. Eskenazi, "Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning," in Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 1–10, Los Angeles, CA, USA, September 2016.

[6] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746–1751, Doha, Qatar, October 2014.

[7] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proceedings of Advances in Neural Information Processing Systems, pp. 649–657, Montreal, Canada, December 2015.

[8] M. Henderson, B. Thomson, and S. Young, "Word-based dialog state tracking with recurrent neural networks," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia, PA, USA, June 2014.

[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.

[10] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the Tenth Workshop on Artificial Intelligence and Statistics, pp. 246–252, Bridgetown, Barbados, January 2005.

[11] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of Advances in Neural Information Processing Systems, pp. 1081–1088, Vancouver, BC, Canada, December 2009.

[12] J. Williams, A. Raux, D. Ramachandran, and A. Black, "The dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 404–413, Metz, France, August 2013.

[13] R. Higashinaka, M. Nakano, and K. Aikawa, "Corpus-based discourse understanding in spoken dialogue systems," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, pp. 240–247, Sapporo, Japan, July 2003.

[14] V. Zue, S. Seneff, J. R. Glass et al., "JUPITER: a telephone-based conversational interface for weather information," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, pp. 85–96, 2000.

[15] S. Larsson and D. R. Traum, "Information state and dialogue management in the TRINDI dialogue move engine toolkit," Natural Language Engineering, vol. 6, no. 3-4, pp. 323–340, 2000.

[16] K. Sun, L. Chen, S. Zhu, and K. Yu, "A generalized rule based tracker for dialogue state tracking," in Proceedings of the Spoken Language Technology Workshop, pp. 330–335, South Lake Tahoe, NV, USA, December 2014.

[17] D. Bohus and A. Rudnicky, "A 'K hypotheses + other' belief updating model," in Proceedings of the AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, pp. 13–18, Boston, MA, USA, July 2006.

[18] B. Thomson and S. Young, "Bayesian update of dialogue state: a POMDP framework for spoken dialogue systems," Computer Speech & Language, vol. 24, no. 4, pp. 562–588, 2010.

[19] Y. Ma, A. Raux, D. Ramachandran, and R. Gupta, "Landmark-based location belief tracking in a spoken dialog system," in Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 169–178, Seoul, Korea, July 2012.

[20] K. Yoshino, T. Hiraoka, G. Neubig, and S. Nakamura, "Dialogue state tracking using long short term memory neural networks," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselka, Finland, January 2016.

[21] H. Shi, T. Ushio, M. Endo, K. Yamagami, and N. Horii, "Convolutional neural networks for multi-topic dialog state tracking," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselka, Finland, January 2016.

[22] J. Perez and F. Liu, "Dialog state tracking, a machine reading approach using memory network," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 305–314, Valencia, Spain, April 2017.

[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, NV, USA, December 2013.

[24] W.-T. Yih, X. He, and C. Meek, "Semantic parsing for single-relation question answering," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 643–648, Baltimore, MD, USA, June 2014.

[25] H. Peng, J. Li, Y. Song, and Y. Liu, "Incrementally learning the hierarchical softmax function for neural language models," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3267–3273, San Francisco, CA, USA, February 2017.

[26] S. Kim, L. F. D'Haro, R. E. Banchs, J. D. Williams, and M. Henderson, "The fourth dialog state tracking challenge," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselka, Finland, January 2016.

Computational Intelligence and Neuroscience 11


attention mechanism. Thus, it can be directly compared with the proposed model. Tables 7 and 8 show the performance of the trackers. The proposed model outperforms its competitors in both accuracy and F1-score. In particular, it achieves more than three times the accuracy of the LSTM, which is a neural network-based model and takes an approach similar to ours. This result means that extracting the necessary information from an utterance is important for tracking the dialog state. The main difference between the proposed model and the LSTM is the presence of the attention mechanism. Thus, it can be inferred that focusing on salient words while processing a user utterance is important in dialog state tracking.
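The role of the attention mechanism can be illustrated with a minimal, self-contained sketch. The dot-product scoring and the tiny two-dimensional vectors below are illustrative choices, not the paper's exact configuration:

```python
import math

def attention_pool(word_vectors, query):
    """Weight each word vector by its relevance to a query vector,
    then return the weighted sum (a soft focus on salient words)."""
    # Dot-product relevance scores, one per word.
    scores = [sum(w * q for w, q in zip(vec, query)) for vec in word_vectors]
    # Softmax turns scores into attention weights that sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Context vector: salient words dominate the utterance representation.
    dim = len(word_vectors[0])
    context = [sum(weights[i] * word_vectors[i][d] for i in range(len(word_vectors)))
               for d in range(dim)]
    return weights, context

# Toy utterance of three word vectors; the second aligns with the query.
words = [[0.1, 0.0], [1.0, 1.0], [0.0, 0.2]]
weights, context = attention_pool(words, query=[1.0, 1.0])
assert weights[1] == max(weights)  # the salient word gets the largest weight
```

The point of the mechanism, as the comparison with the plain LSTM suggests, is exactly this: the salient word dominates the pooled representation instead of being averaged away.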

The performance of the proposed model per topic is investigated in Figure 8. The performance is measured for the five topics related to dialog state tracking. The proposed model achieves the best performance on every topic; the only topic on which it obtains relatively low performance is ACCOMMODATION. The reason why the proposed model shows poor performance for ACCOMMODATION is that there are many utterances mentioning locations in the dialogs of ACCOMMODATION, but

Table 6: The classification performance of the informativeness classifier.

Set          Accuracy  Precision  Recall  F1-score
Development  0.9484    0.9202     0.9820  0.9501
Test         0.9316    0.9089     0.9557  0.9317

Table 5: The parameters for the neural dialog state tracker and their values.

Parameter          Value
Embedding size     500
Hidden layer size  500
Dropout ratio      0.1
Learning rate      0.005
Number of epochs   100,000

Table 3: The number of slots (regular, INFO) and their attributes.

Topic category   Regular slots  Regular attributes  Avg. attributes per slot  INFO slots  INFO attributes
ACCOMMODATION    3                570               190                       1           23
ATTRACTION       5                611               122                       1           31
FOOD             7               1411               202                       1           16
SHOPPING         4                651               163                       1           16
TRANSPORTATION   5               2450               490                       1           15

Table 4: The parameters for the informativeness classifier and their values.

Parameter          Value
Embedding size     500
Filter sizes       2, 3, 4
Number of filters  128
Dropout ratio      0.5
Batch size         50
Number of epochs   1000

Table 7: The comparison of dialog state tracking accuracies between the proposed model and DSTC4 participants.

Model              Accuracy
String match [26]  0.0374
LSTM [20]          0.0270
Proposed model     0.0852

Table 8: The comparison of dialog state tracking F1-scores between the proposed model and DSTC4 participants.

Model              Precision  Recall  F1-score
String match [26]  0.3589     0.1925  0.2506
LSTM [20]          0.3400     0.2010  0.2530
Proposed model     0.5173     0.3499  0.4174
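The F1-scores in Table 8 are the harmonic mean of precision and recall, which can be verified directly from the reported values:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values from Table 8.
assert round(f1(0.5173, 0.3499), 4) == 0.4174  # proposed model
assert round(f1(0.3589, 0.1925), 4) == 0.2506  # string match
```

The LSTM row recomputes to 0.2526 rather than the reported 0.2530; the small gap presumably comes from the table's precision and recall being rounded before printing.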

Figure 8: The F1-score of the proposed model per topic (String match, LSTM, and the proposed model across ACCOMMODATION, ATTRACTION, FOOD, SHOPPING, and TRANSPORTATION).


they are often related to other topics such as TRANSPORTATION and SHOPPING. For instance, for the utterance "ten minutes walk to the mrt train station," the proposed model predicts the slot "STATION" and the attribute "MRT train station." This prediction seems relevant, since a train station is usually related to TRANSPORTATION. However, the dialog is about the places near the suggested accommodation. Thus, the gold-standard slot for the utterance is "NEIGHBORHOOD" and the gold-standard attribute is "Kampong Glam," a location name.

Table 9 gives the dialog state tracking accuracy of the proposed model for every slot in each topic category. Note that each accuracy is computed at the utterance level, not the subdialog level. The regular slots that have distinguishable attributes, like "TIME" and "TYPE_OF_PLACE," show moderately high accuracy, while regular slots such as "NEIGHBOURHOOD," "PLACE," and "DISH" achieve low accuracy. The proposed model performs poorly on these regular slots because their attributes are specific names, such as restaurant names, street names, and dish names, that are extremely infrequent in the data set. The proposed model also achieves meaningful performance for the INFO slot. The high performance for the INFO slot implies that the proposed model can catch contents mentioned only implicitly in an utterance.

To examine the effectiveness of the attention mechanism and the hierarchical softmax, three variants of the proposed model are compared. One variant (NT-Atten-NoHier) is the neural tracker with the attention mechanism and the general softmax; it is designed to inspect the effectiveness of the attention mechanism in dialog state tracking. Another (NT-NoAtten-Hier) is the neural tracker without the attention but with the hierarchical softmax; it is prepared to examine the hierarchical softmax. The third (NT-Atten-Hier) is the proposed model, the neural tracker with both the attention and the hierarchical softmax. The informativeness classifier is not used in this comparison.

Tables 10 and 11 prove the effectiveness of the attention mechanism and the hierarchical softmax. The proposed model (NT-Atten-Hier) outperforms both NT-Atten-NoHier and NT-NoAtten-Hier, which implies that the attention mechanism and the hierarchical softmax are both helpful in enhancing the neural tracker. One thing to note in these tables is that the performance improvement of the neural tracker is made mainly by the attention mechanism: NT-Atten-Hier achieves about 1.6 times the accuracy of NT-NoAtten-Hier, but it shows an accuracy similar to that of NT-Atten-NoHier. Nevertheless, it is also true that the hierarchical softmax enhances the neural tracker, as proved by the fact that the performance of NT-Atten-Hier is slightly higher than that of NT-Atten-NoHier.
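A hierarchical softmax factorizes one large output distribution into a product of smaller ones, so each prediction normalizes a few small distributions instead of one over the whole attribute vocabulary. A schematic two-level version is sketched below; the grouping of items into two groups (think of attributes grouped by slot) is purely illustrative:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def two_level_softmax(group_logits, within_logits):
    """p(item) = p(group) * p(item | group): two small normalizations
    replace one normalization over all items."""
    p_group = softmax(group_logits)
    probs = {}
    for g, logits in within_logits.items():
        p_within = softmax(logits)
        for i, p in enumerate(p_within):
            probs[(g, i)] = p_group[g] * p
    return probs

# Two groups of three items each (e.g., attributes grouped by slot).
probs = two_level_softmax(
    group_logits=[0.5, -0.5],
    within_logits={0: [2.0, 0.1, 0.1], 1: [0.0, 0.0, 1.0]},
)
assert abs(sum(probs.values()) - 1.0) < 1e-9  # still a valid distribution
```

With a balanced tree over V outputs, this reduces the per-prediction cost from O(V) to O(log V), which is why it speeds up training on a large dialog vocabulary.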

Table 12 shows the effectiveness of the informativeness classifier. The F1-score of the pure neural tracker, NT-Atten-Hier, is 0.3977, but it goes up to 0.4174 when the informativeness classifier is applied in front of the neural tracker, as in Figure 3. The improvement by the informativeness classifier is also observed in accuracy. This is because the adoption of the informativeness classifier

Table 9: The dialog state tracking accuracy (%) of the proposed model for every slot in each topic category.

Topic            Slot            Accuracy (%)
ACCOMMODATION    INFO            32.4
                 TYPE_OF_PLACE   48.3
                 PLACE           18.7
                 NEIGHBOURHOOD    8.8
ATTRACTION       INFO            21.1
                 TYPE_OF_PLACE   65.5
                 ACTIVITY        55.7
                 PLACE            4.6
                 TIME            63.3
                 NEIGHBOURHOOD   17.8
FOOD             INFO            37.1
                 CUISINE         47.1
                 TYPE_OF_PLACE   68.6
                 DRINK           62.2
                 PLACE            7.7
                 MEAL_TIME       73.9
                 DISH             5.8
                 NEIGHBOURHOOD   27.3
SHOPPING         INFO            41.3
                 TYPE_OF_PLACE   41.9
                 PLACE            4.8
                 NEIGHBOURHOOD    4.1
                 TIME            75.0
TRANSPORTATION   INFO            23.9
                 FROM             5.3
                 TO               5.6
                 STATION         14.7
                 LINE            16.1
                 TYPE            42.8

Table 10: The dialog state tracking accuracy of the proposed model's variants.

Model            Accuracy
NT-Atten-NoHier  0.0672
NT-NoAtten-Hier  0.0422
NT-Atten-Hier    0.0697

Table 11: The dialog state tracking F1-score of the proposed model's variants.

Model            Precision  Recall  F1-score
NT-Atten-NoHier  0.4403     0.3335  0.3795
NT-NoAtten-Hier  0.3316     0.3058  0.3181
NT-Atten-Hier    0.4662     0.3469  0.3977

Table 12: The effect of the informativeness classifier in the proposed model.

Model                               Accuracy  Precision  Recall  F1-score
Without informativeness classifier  0.0697    0.4662     0.3469  0.3977
With informativeness classifier     0.0852    0.5173     0.3499  0.4174


reduces the possibility of misjudgments caused by noninformative utterances.
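The two-step design evaluated here can be sketched as a simple pipeline in which the tracker never sees filtered utterances. The stand-in classifier and tracker below are toy assumptions, not the trained CNN and neural tracker:

```python
def track_dialog(utterances, is_informative, predict_state):
    """Step 1: filter out noninformative utterances.
    Step 2: run the state tracker only on the survivors."""
    states = []
    for utt in utterances:
        if not is_informative(utt):        # CNN informativeness classifier
            continue                       # noninformative: no state update
        states.append(predict_state(utt))  # neural tracker with attention
    return states

# Toy stand-ins for the two trained models.
informative = lambda u: u != "uh-huh"
tracker = lambda u: {"PLACE": u.split()[-1]}
assert track_dialog(["uh-huh", "we stayed near Orchard"], informative, tracker) == \
    [{"PLACE": "Orchard"}]
```

The gain reported in Table 12 comes precisely from the `continue` branch: a noninformative utterance can no longer trigger a spurious state prediction.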

To analyze the shortcomings of the neural trackers, we use the three types of slot-level errors defined in the DSTC challenge:

(i) Missing attribute: the actual dialog has attributes for a specific slot, but the tracker does not output any attribute for the slot.

(ii) Extraneous attribute: no attribute for a slot is contained in the actual dialog, but the tracker outputs some attributes for the slot.

(iii) False attribute: the attribute for a slot is specified in the actual dialog, but the tracker outputs a different attribute.

According to the results of the DSTC4 challenge [26], the largest error type is the missing attribute.
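Assuming gold and predicted dialog states are represented as simple slot-to-attribute mappings (a simplification of the DSTC format, where a slot may carry a set of attributes), the three error types can be counted as:

```python
def slot_errors(gold, pred):
    """Count DSTC-style slot-level error types between gold and predicted states."""
    missing = sum(1 for s in gold if s not in pred)           # gold slot, no output
    extraneous = sum(1 for s in pred if s not in gold)        # output slot, no gold
    false = sum(1 for s in gold
                if s in pred and gold[s] != pred[s])          # wrong attribute value
    return missing, extraneous, false

gold = {"NEIGHBOURHOOD": "Kampong Glam", "TYPE_OF_PLACE": "hostel"}
pred = {"NEIGHBOURHOOD": "Sentosa", "TIME": "night"}
assert slot_errors(gold, pred) == (1, 1, 1)
```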

Figure 9 depicts the number of errors of each type made by the variants of the proposed model. In this figure, "Proposed model" denotes NT-Atten-Hier with the informativeness classifier. The numbers of missing-attribute and extraneous-attribute errors are much reduced in "Proposed model," at the cost of a small increase in false attributes. Therefore, the informativeness classifier is proved to be helpful in reducing the errors in missing and extraneous attributes. Many false-attribute errors are caused by the limited context of a given utterance. For instance, consider the utterance "uh another place you might want to check out is opposite to Sentosa island and it is called Vivocity." In this utterance, a user wants to know another shopping place. Since the proposed model focuses only on the utterance, it predicts the slot "NEIGHBOURHOOD" and the attribute "Sentosa." This is a wrong but plausible prediction, since the utterance also contains locational information. That is, many errors made by the proposed model are not critical for understanding the user's intention.

6. Conclusion

This paper has proposed a two-step task-oriented dialog state tracker composed of an informativeness classifier and a neural dialog state tracker. It first filters out noninformative utterances in a dialog using the informativeness classifier, which is implemented by a CNN, a model known to perform well in sentence classification. In the next step, the neural tracker determines dialog states from the informative utterances with an attention mechanism and a hierarchical softmax. By adopting the attention mechanism and the hierarchical softmax, the proposed model can not only focus on salient words in a given utterance but also deal with a large vocabulary in dialogs efficiently. The experimental results on dialog state tracking show that the proposed model outperforms the neural trackers without the informativeness classifier, the attention mechanism, or the hierarchical softmax, as well as the previous work, which implies the plausibility of adopting them in dialog state tracking.

There is still room for improvement, although the proposed model achieves an acceptable result. One possible improvement for the current neural tracker is to preserve topic consistency when predicting slots. The proposed model considers only a given utterance when it predicts a slot. However, since a dialog segment generally has a single topic, the topic should persist across the utterances of the same segment when a model predicts their slots. Thus, in future work we will consider the dialog history to catch the flow of a dialog and to preserve topic consistency when predicting slots.

Data Availability

We used the dialog state data (the TourSG data set) provided by ETPL-A*STAR (the Dialog State Tracking Challenge 4 organization). TourSG is a research dialog corpus in the touristic domain and is divided into train/development data and test data. The train and development data include manual transcriptions and annotations at both the utterance and subdialog levels for training a model and fine-tuning its parameters. The test set includes manual transcriptions for evaluating the model. Any researcher or company can access the data at https://www.etpl.sg/innovation-offerings/ready-to-sign-licenses/toursg-a-research-dialogue-corpus-in-the-touristic-domain. Researchers or their companies need to pay a license fee for the TourSG data set after signing a license agreement with ETPL-A*STAR. The data set includes transcribed and annotated dialogs as well as ontology objects describing the annotations.

Figure 9: The number of errors according to error type (false, missing, and extraneous attributes) for NT-Atten-NoHier, NT-NoAtten-Hier, NT-Atten-Hier, and the proposed model.


Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B04935678).

References

[1] Y.-N. Chen, W. Y. Wang, and A. I. Rudnicky, "Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 120–125, Olomouc, Czech Republic, December 2013.

[2] M. Henderson, B. Thomson, and S. Young, "Deep neural network approach for the dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 467–471, Metz, France, August 2013.

[3] M. Henderson, "Machine learning for dialog state tracking: a review," in Proceedings of the Machine Learning in Spoken Language Processing Workshop, Aizuwakamatsu, Japan, September 2015.

[4] A. Bordes, Y.-L. Boureau, and J. Weston, "Learning end-to-end goal-oriented dialog," in Proceedings of the 5th International Conference on Learning Representations, pp. 1–15, Toulon, France, April 2017.

[5] T. Zhao and M. Eskenazi, "Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning," in Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 1–10, Los Angeles, CA, USA, September 2016.

[6] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746–1751, Doha, Qatar, October 2014.

[7] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proceedings of Advances in Neural Information Processing Systems, pp. 649–657, Montreal, Canada, December 2015.

[8] M. Henderson, B. Thomson, and S. Young, "Word-based dialog state tracking with recurrent neural networks," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia, PA, USA, June 2014.

[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.

[10] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the Tenth Workshop on Artificial Intelligence and Statistics, pp. 246–252, Bridgetown, Barbados, January 2005.

[11] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of Advances in Neural Information Processing Systems, pp. 1081–1088, Vancouver, BC, Canada, December 2009.

[12] J. Williams, A. Raux, D. Ramachandran, and A. Black, "The dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 404–413, Metz, France, August 2013.

[13] R. Higashinaka, M. Nakano, and K. Aikawa, "Corpus-based discourse understanding in spoken dialogue systems," in Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 240–247, Sapporo, Japan, July 2003.

[14] V. Zue, S. Seneff, J. R. Glass et al., "JUPITER: a telephone-based conversational interface for weather information," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, pp. 85–96, 2000.

[15] S. Larsson and D. R. Traum, "Information state and dialogue management in the TRINDI dialogue move engine toolkit," Natural Language Engineering, vol. 6, no. 3-4, pp. 323–340, 2000.

[16] K. Sun, L. Chen, S. Zhu, and K. Yu, "A generalized rule based tracker for dialogue state tracking," in Proceedings of the IEEE Spoken Language Technology Workshop, pp. 330–335, South Lake Tahoe, NV, USA, December 2014.

[17] D. Bohus and A. Rudnicky, "A 'K hypotheses + other' belief updating model," in Proceedings of the AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, pp. 13–18, Boston, MA, USA, July 2006.

[18] B. Thomson and S. Young, "Bayesian update of dialogue state: a POMDP framework for spoken dialogue systems," Computer Speech & Language, vol. 24, no. 4, pp. 562–588, 2010.

[19] Y. Ma, A. Raux, D. Ramachandran, and R. Gupta, "Landmark-based location belief tracking in a spoken dialog system," in Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 169–178, Seoul, Korea, July 2012.

[20] K. Yoshino, T. Hiraoka, G. Neubig, and S. Nakamura, "Dialogue state tracking using long short term memory neural networks," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselkä, Finland, January 2016.

[21] H. Shi, T. Ushio, M. Endo, K. Yamagami, and N. Horii, "Convolutional neural networks for multi-topic dialog state tracking," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselkä, Finland, January 2016.

[22] J. Perez and F. Liu, "Dialog state tracking, a machine reading approach using memory network," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 305–314, Valencia, Spain, April 2017.

[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, NV, USA, December 2013.

[24] W.-T. Yih, X. He, and C. Meek, "Semantic parsing for single-relation question answering," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 643–648, Baltimore, MD, USA, June 2014.

[25] H. Peng, J. Li, Y. Song, and Y. Liu, "Incrementally learning the hierarchical softmax function for neural language models," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3267–3273, San Francisco, CA, USA, February 2017.

[26] S. Kim, L. F. D'Haro, R. E. Banchs, J. D. Williams, and M. Henderson, "The fourth dialog state tracking challenge," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselkä, Finland, January 2016.



[15] S Larsson and D R Traum ldquoInformation state and dialoguemanagement in the trindi dialogue move engine toolkitrdquoNatural Language Engineering vol 6 no 3-4 pp 323ndash3402000

[16] K Sun L Chen S Zhu and K Yu ldquoA generalized rule basedtracker for dialogue state trackingrdquo in Proceedings of SpokenLanguage Technology Workshop pp 330ndash335 South LakeTahoe NV USA December 2014

[17] D Bohus and A Rudnicky ldquoA ldquoK hypotheses + otherrdquo beliefupdating modelrdquo in Proceedings of AAAI Workshop on Sta-tistical and Empirical Approaches to Spoken Dialogue Systemspp 13ndash18 Boston MA USA July 2006

[18] Bomson and S Young ldquoBayesian update of dialogue statea POMDP framework for spoken dialogue systemsrdquo Com-puter Speech amp Language vol 24 no 4 pp 562ndash588 2010

[19] YMa A Raux D Ramachandran and R Gupta ldquoLandmark-based location belief tracking in a spoken dialog systemrdquo inProceedings of 13th Annual Meeting of the Special InterestGroup on Discourse and Dialogue pp 169ndash178 Seoul KoreaJuly 2012

[20] K Yoshino T Hiraoka G Neubig and S Nakamura ldquoDi-alogue state tracking using long short term memory neuralnetworksrdquo in Proceedings of Seventh International Workshopon Spoken Dialog Systems pp 1ndash8 Saariselka Finland Jan-uary 2016

[21] H Shi T Ushio M Endo K Yamagami and N HoriildquoConvolutional neural networks for multi-topic dialog statetrackingrdquo in Proceedings of Seventh International Workshopon Spoken Dialog Systems pp 1ndash8 Saariselka FinlandJanuary 2016

[22] J Perez and F Liu ldquoDialog state tracking a machine readingapproach using memory networkrdquo in Proceedings of 15thConference of the European Chapter of the Association forComputational Linguistics pp 305ndash314 Valencia SpainApril 2017

[23] T Mikolov I Sutskever K Chen G S Corrado and J DeanldquoDistributed representations of words and phrases and theircompositionalityrdquo in Proceedings of Advances in Neural In-formation Processing Systems pp 3111ndash3119 Lake Tahoe NVUSA December 2013

[24] W-T Yih X He and C Meek ldquoSemantic parsing for single-relation question answeringrdquo in Proceedings of 52nd AnnualMeeting of the Association for Computational Linguisticspp 643ndash648 Baltimore MD USA June 2014

[25] H Peng J Li Y Song and Y Liu ldquoIncrementally learningthe hierarchical softmax function for neural languagemodelsrdquo in Proceedings of irty-First AAAI Conference onArtificial Intelligence pp 3267ndash3273 San Francisco CAUSA February 2017

[26] S Kim L F DrsquoHaro R E Banchs J D Williams andM Henderson ldquoe forth dialog state tracking challengerdquo inProceedings of Seventh International Workshop on Spoken DialogSystems pp 1ndash8 Saariselka Finland January 2016

Computational Intelligence and Neuroscience 11

Computer Games Technology

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

Advances in

FuzzySystems

Hindawiwwwhindawicom

Volume 2018

International Journal of

ReconfigurableComputing

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

thinspArtificial Intelligence

Hindawiwwwhindawicom Volumethinsp2018

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawiwwwhindawicom Volume 2018

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Computational Intelligence and Neuroscience

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018

Human-ComputerInteraction

Advances in

Hindawiwwwhindawicom Volume 2018

Scientic Programming

Submit your manuscripts atwwwhindawicom

reduces the possibility of misjudgments caused by noninformative utterances.

To analyze the shortcomings of the various neural trackers, we use the three types of slot-level errors that were defined in the DSTC challenge. The error types are as follows:

(i) Missing attribute: the actual dialog has attributes for a specific slot, but the tracker does not output any attribute for the slot.

(ii) Extraneous attribute: no attribute for a slot is contained in the actual dialog, but the tracker outputs some attributes for the slot.

(iii) False attribute: the attribute for a slot is specified in the actual dialog, but the tracker outputs a different attribute from the dialog.

According to the results of the DSTC4 challenge [26], the largest error type is the missing attribute.
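As an illustration, the three slot-level error types can be counted by comparing a tracker's predicted slot-attribute frame against the gold frame. The frame representation below (a dict mapping each slot to a set of attributes) is our own assumption for this sketch, not the official DSTC4 scorer:

```python
# Hypothetical sketch: counting the three DSTC slot-level error types.
# A "frame" here is assumed to map slot name -> set of attribute values.

def count_error_types(gold: dict, pred: dict) -> dict:
    counts = {"missing": 0, "extraneous": 0, "false": 0}
    for slot in gold.keys() | pred.keys():
        g, p = gold.get(slot, set()), pred.get(slot, set())
        if g and not p:
            counts["missing"] += 1      # gold has attributes, tracker output none
        elif p and not g:
            counts["extraneous"] += 1   # tracker output attributes for an empty slot
        elif g and p and g != p:
            counts["false"] += 1        # tracker output a different attribute
    return counts

gold = {"INFO": {"Pricerange"}, "NEIGHBOURHOOD": {"Sentosa"}}
pred = {"INFO": {"Preference"}, "TYPE_OF_PLACE": {"Beach"}}
print(count_error_types(gold, pred))  # {'missing': 1, 'extraneous': 1, 'false': 1}
```

The slot and attribute names in the example are invented for illustration only.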

Figure 9 depicts the number of errors of each error type made by the variants of the proposed model. In this figure, "Proposed Model" denotes NT-Atten-Hier combined with the informativeness classifier. The numbers of missing-attribute and extraneous-attribute errors are much reduced in "Proposed Model," while the number of false-attribute errors increases slightly. Therefore, the informativeness classifier proves to be helpful in reducing the errors in missing and extraneous attributes. Many errors in the false attribute are caused by the limited context of a given utterance. For instance, consider the utterance "uh another place you might want to check out is opposite to Sentosa island and it is called Vivocity." In this utterance, the user wants to know about another shopping place. Since the proposed model focuses only on the utterance, it predicts the slot for the utterance as "NEIGHBOURHOOD" and its attribute as "Sentosa." This is a wrong but plausible prediction, since the utterance also contains locational information. That is, many errors made by the proposed model are not critical to understanding the user intention.

6. Conclusion

This paper has proposed a two-step task-oriented dialog state tracker. The proposed tracker is composed of an informativeness classifier and a neural dialog state tracker. It first filters out noninformative utterances in a dialog by using the informativeness classifier, which is implemented by a CNN that has shown good performance in sentence classification. In the next step, the neural tracker determines the dialog states from the informative utterances with an attention mechanism and a hierarchical softmax. By adopting the attention mechanism and the hierarchical softmax, the proposed model can not only focus on salient words in a given utterance but also deal efficiently with a large vocabulary in dialogs. The experimental results on dialog state tracking, run to examine the capability of the proposed model, show that it outperforms the neural trackers without the informativeness classifier, the attention mechanism, or the hierarchical softmax, as well as the previous work, which implies the plausibility of adopting them in dialog state tracking.
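The two mechanisms named above can be illustrated in a few lines. The following is a generic sketch of dot-product attention over word embeddings and of the cost argument for the hierarchical softmax, not the authors' implementation; the dimensions, query vector, and vocabulary size are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
words = rng.normal(size=(7, 16))   # embeddings of the 7 words in an utterance (assumed dims)
query = rng.normal(size=16)        # a learned attention query vector (assumed)

# Attention: softmax-normalized scores weight each word's contribution,
# so salient words dominate the utterance representation.
scores = words @ query
weights = np.exp(scores - scores.max())
weights /= weights.sum()
utterance_vec = weights @ words    # weighted sum over word embeddings

# Hierarchical softmax: a balanced binary tree over V output classes needs
# only about log2(V) binary decisions instead of one V-way softmax.
V = 20000
tree_depth = int(np.ceil(np.log2(V)))

print(weights.shape, utterance_vec.shape, tree_depth)
```

For V = 20000 the tree depth is 15, which is the source of the efficiency gain over a flat softmax when the dialog vocabulary is large.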

There is still some room for improvement, although the proposed model achieves an acceptable result. One possible improvement for the current neural tracker is to preserve topic consistency when the model predicts slots. The proposed model considers only a given utterance when it predicts a slot. However, since a general dialog has a single topic, the topic of the utterances in the same dialog segment should continue when a model predicts the slots of those utterances. Thus, in future work, we will consider the dialog history to catch the flow of a dialog and to preserve the topic consistency of slot prediction.

Data Availability

We used the dialog state data (TourSG data set) provided by ETPL-A*STAR (the dialog state tracking challenge 4 organization). TourSG is a research dialog corpus in the touristic domain and is divided into train/development data and test data. The train and development data include manual transcriptions and annotations at both the utterance and subdialog levels for training the model and fine-tuning its parameters. The test set includes manual transcriptions for evaluating the model. Any researcher or company can access the data at the following site: https://www.etpl.sg/innovation-offerings/ready-to-sign-licenses/toursg-a-research-dialogue-corpus-in-the-touristic-domain. Researchers or their companies need to pay a license fee to obtain the TourSG data set after signing a license agreement with ETPL-A*STAR. The data set includes transcribed and annotated dialogs as well as ontology objects describing the annotations.

Figure 9: The number of errors according to error type (False, Missing, and Extraneous) for NT-Atten-NoHier, NT-NoAtten-Hier, NT-Atten-Hier, and the proposed model (y-axis: number of errors, 0 to 6000).

10 Computational Intelligence and Neuroscience

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (no. NRF-2016R1D1A1B04935678).

References

[1] Y.-N. Chen, W. Y. Wang, and A. I. Rudnicky, "Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing," in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 120–125, Olomouc, Czech Republic, December 2013.

[2] M. Henderson, B. Thomson, and S. Young, "Deep neural network approach for the dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 467–471, Metz, France, August 2013.

[3] M. Henderson, "Machine learning for dialog state tracking: a review," in Proceedings of the Machine Learning in Spoken Language Processing Workshop, Aizu-Wakamatsu, Japan, September 2015.

[4] A. Bordes, Y.-L. Boureau, and J. Weston, "Learning end-to-end goal-oriented dialog," in Proceedings of the 5th International Conference on Learning Representations, pp. 1–15, Toulon, France, April 2017.

[5] T. Zhao and M. Eskenazi, "Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning," in Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 1–10, Los Angeles, CA, USA, September 2016.

[6] Y. Kim, "Convolutional neural networks for sentence classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1746–1751, Doha, Qatar, October 2014.

[7] X. Zhang, J. Zhao, and Y. LeCun, "Character-level convolutional networks for text classification," in Proceedings of Advances in Neural Information Processing Systems, pp. 649–657, Montreal, Canada, December 2015.

[8] M. Henderson, B. Thomson, and S. Young, "Word-based dialog state tracking with recurrent neural networks," in Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia, PA, USA, June 2014.

[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," 2014, http://arxiv.org/abs/1409.0473.

[10] F. Morin and Y. Bengio, "Hierarchical probabilistic neural network language model," in Proceedings of the Tenth Workshop on Artificial Intelligence and Statistics, pp. 246–252, Bridgetown, Barbados, January 2005.

[11] A. Mnih and G. Hinton, "A scalable hierarchical distributed language model," in Proceedings of Advances in Neural Information Processing Systems, pp. 1081–1088, Vancouver, BC, Canada, December 2009.

[12] J. Williams, A. Raux, D. Ramachandran, and A. Black, "The dialog state tracking challenge," in Proceedings of the 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 404–413, Metz, France, August 2013.

[13] R. Higashinaka, M. Nakano, and K. Aikawa, "Corpus-based discourse understanding in spoken dialogue systems," in Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp. 240–247, Sapporo, Japan, July 2003.

[14] V. Zue, S. Seneff, J. R. Glass et al., "JUPITER: a telephone-based conversational interface for weather information," IEEE Transactions on Speech and Audio Processing, vol. 8, no. 1, pp. 85–96, 2000.

[15] S. Larsson and D. R. Traum, "Information state and dialogue management in the TRINDI dialogue move engine toolkit," Natural Language Engineering, vol. 6, no. 3-4, pp. 323–340, 2000.

[16] K. Sun, L. Chen, S. Zhu, and K. Yu, "A generalized rule based tracker for dialogue state tracking," in Proceedings of the Spoken Language Technology Workshop, pp. 330–335, South Lake Tahoe, NV, USA, December 2014.

[17] D. Bohus and A. Rudnicky, "A 'K hypotheses + other' belief updating model," in Proceedings of the AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, pp. 13–18, Boston, MA, USA, July 2006.

[18] B. Thomson and S. Young, "Bayesian update of dialogue state: a POMDP framework for spoken dialogue systems," Computer Speech & Language, vol. 24, no. 4, pp. 562–588, 2010.

[19] Y. Ma, A. Raux, D. Ramachandran, and R. Gupta, "Landmark-based location belief tracking in a spoken dialog system," in Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 169–178, Seoul, Korea, July 2012.

[20] K. Yoshino, T. Hiraoka, G. Neubig, and S. Nakamura, "Dialogue state tracking using long short term memory neural networks," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselkä, Finland, January 2016.

[21] H. Shi, T. Ushio, M. Endo, K. Yamagami, and N. Horii, "Convolutional neural networks for multi-topic dialog state tracking," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselkä, Finland, January 2016.

[22] J. Perez and F. Liu, "Dialog state tracking, a machine reading approach using memory network," in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 305–314, Valencia, Spain, April 2017.

[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," in Proceedings of Advances in Neural Information Processing Systems, pp. 3111–3119, Lake Tahoe, NV, USA, December 2013.

[24] W.-T. Yih, X. He, and C. Meek, "Semantic parsing for single-relation question answering," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 643–648, Baltimore, MD, USA, June 2014.

[25] H. Peng, J. Li, Y. Song, and Y. Liu, "Incrementally learning the hierarchical softmax function for neural language models," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 3267–3273, San Francisco, CA, USA, February 2017.

[26] S. Kim, L. F. D'Haro, R. E. Banchs, J. D. Williams, and M. Henderson, "The fourth dialog state tracking challenge," in Proceedings of the Seventh International Workshop on Spoken Dialog Systems, pp. 1–8, Saariselkä, Finland, January 2016.
