IWSDS 2016
The Fourth Dialog State Tracking Challenge (DSTC4)
Seokhwan Kim (1), Luis Fernando D’Haro (1), Rafael E. Banchs (1), Jason D. Williams (2), Matthew Henderson (3)
(1) Institute for Infocomm Research, (2) Microsoft Research, (3) Google
Dialogue State Tracking
• A key subtask in dialogue management
• To estimate the user’s goal as a dialogue progresses
Utterance | Food | Area
S: Hello, How may I help you? / U: I need a Persian restaurant in the south part of town. | Persian | South
S: What kind of food would you like? / U: Persian. | Persian | South
S: I’m sorry but there is no restaurant serving persian food / U: How about Portuguese food? | Portuguese | South
S: Peking restaurant is a nice place in the south of town. / U: Is that Portuguese? | Portuguese | South
S: Nandos is a nice place in the south of town serving tasty Portuguese food. / U: Alright. Whats the phone number? | Portuguese | South
S: The phone number of nandos is 01223 327908 . / U: And the address? | Portuguese | South
S: Sure, nandos is on Cambridge Leisure Park Clifton Way. / U: Thank you good bye. | Portuguese | South
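The example above can be read as a state that is updated after every user turn. A minimal sketch, assuming a keyword-spotting tracker with two hypothetical value lists (FOOD_VALUES and AREA_VALUES are illustrative, not part of any challenge ontology):

```python
import re

# Hypothetical value lists for illustration; a real tracker would draw
# these from an ontology and use SLU output instead of keyword spotting.
FOOD_VALUES = ["persian", "portuguese"]
AREA_VALUES = ["south", "north", "east", "west", "centre"]


def update_state(state, user_utterance):
    """Return a copy of the state with any mentioned slot values overwritten."""
    words = set(re.findall(r"[a-z]+", user_utterance.lower()))
    new_state = dict(state)
    for value in FOOD_VALUES:
        if value in words:
            new_state["food"] = value.capitalize()
    for value in AREA_VALUES:
        if value in words:
            new_state["area"] = value.capitalize()
    return new_state


user_turns = [
    "I need a Persian restaurant in the south part of town.",
    "Persian.",
    "How about Portuguese food?",
    "Is that Portuguese?",
]

state = {}
history = []
for utterance in user_turns:
    state = update_state(state, utterance)
    history.append(state)
```

The state keeps earlier values (Area stays South) until the user changes the goal (Food moves from Persian to Portuguese), which mirrors the Food and Area columns of the example.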
Previous Dialog State Tracking Challenges
• DSTC1 (Williams et al., SIGDIAL 2013)
– Human-machine dialogues on bus timetable search
– Collected with the Let’s Go system (CMU)
– Focused on the evaluation metrics for state tracking
• DSTC2 (Henderson et al., SIGDIAL 2014)
– Human-machine dialogues on restaurant search
– Collected with Cambridge University’s system
– Introduced changing user goals in a single dialogue session
• DSTC3 (Henderson et al., IEEE SLT 2014)
– Human-machine dialogues on tourist information search
– Collected with Cambridge University’s system
– Addressed the problem of adaptation to a new domain from DSTC2
TourSG: Dataset for DSTC4
• Human-human dialogues
• Tourist information in Singapore
• Speakers
– Guide (3 actual tour guides from Singapore)
– Tourist (35 possible tourists from the Philippines)
• Characteristics
– Goal-oriented dialogues
– Mixed-initiative dialogues
– Knowledge-based dialogues
– Multi-topic dialogues
– Verbose dialogues
– Noisy dialogues
DSTC4: Timeline
Period | Task
Mar 2012 – Oct 2012 | Data collection and annotation
Sep 2014 – Dec 2014 | Internal discussions
7 Dec 2014 | Challenge planning meeting @ SLT 2014
Dec 2014 – Apr 2015 | Labelling additional annotations and building resources for evaluation
15 Apr 2015 – 16 Aug 2015 | Development phase of the main and pilot tasks of DSTC4
17 Aug 2015 – 31 Aug 2015 | Evaluation phase of the main task of DSTC4
14 Sep 2015 – 16 Sep 2015 | Evaluation phase of the pilot tasks of DSTC4
30 Sep 2015 | Paper submission deadline to IWSDS 2016
Main Task: Dialogue State Tracking
• Motivation
– Each subject could be expressed through a series of multiple turns
– Multiple topics are interlaced in a session
• Problem Definition
– Dialogue state tracking at the sub-dialogue (segment) level
– Focusing on the most common topic categories
• Annotations
– Segmentation
– Topic Category
– Frame Structure for major topic categories
• Itinerary, Accommodation, Attraction, Food, Transportation
Examples of Dialogue States
Tourist Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere take some pictures.
Guide Okay. I'm going to recommend firstly you want to have a backpack type of hotel, right?
Tourist Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day.
Guide Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep.
Tourist Yes. Yes. As we just gonna put our things there and then go out to take some pictures.
Guide Okay, um-
Tourist Hm.
Guide Let's try this one, okay?
Tourist Okay.
Guide It’s InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars.
Tourist Um. Wow, that's good.
Guide Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only.
Tourist Oh okay. That's- the price is reasonable actually. It's good.
TOPIC ACCOMMODATION
TYPE Hostel
PRICERANGE Cheap
TOPIC ACCOMMODATION
NAME InnCrowd Backpackers Hostel
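The two states above could be held in a small data structure, one per sub-dialogue segment. A minimal sketch, assuming each frame maps a slot name to a list of values (the SegmentState class and its field names are hypothetical, chosen for illustration):

```python
from dataclasses import dataclass, field


@dataclass
class SegmentState:
    """State of one sub-dialogue segment: a topic plus a slot-value frame."""
    topic: str
    frame: dict = field(default_factory=dict)  # slot name -> list of values

    def pairs(self):
        """Flatten the frame into a set of (slot, value) pairs."""
        return {(slot, value)
                for slot, values in self.frame.items()
                for value in values}


# The two accommodation segments from the example dialogue.
segments = [
    SegmentState("ACCOMMODATION", {"TYPE": ["Hostel"], "PRICERANGE": ["Cheap"]}),
    SegmentState("ACCOMMODATION", {"NAME": ["InnCrowd Backpackers Hostel"]}),
]
```

Keeping one frame per segment, rather than one per dialogue, lets the same topic appear several times in a session with different slot values.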
Examples of Dialogue States
Tourist So uh is it near the airport?
Guide Hm no. But you can get there easily by taking the trains from the airport. You just need to make a change in the train direction.
Tourist Hm okay. Because I have no idea at all about Singapore trains or transit. Uh how can I go to the train or to the transit from the airport? Is it just outside the airport?
Guide So when you reach the airport you go down to the basement.
Tourist Um. Okay.
Guide So you get your ticket, you pay your deposit. And I think at the airport they gave you a map. and- to give you an idea. So all this is free. And then you travel along the East line towards the West. Can you see Tanah Merah on the second stop?
Tourist Okay. Hm, Tanah Merah, yes.
Guide Okay. So that is where you change to go down to town to the West towards the West. And you go down to- I think the easiest way is to go to Outram Park.
Tourist Outram Park.
Guide Yah.
Tourist Alright.
Guide So when you get up there, you take the line towards Little India. So it's one, two, three stops and you are at Little India.
Tourist Hm, okay.
TOPIC TRANSPORTATION
FROM Changi Airport
TO InnCrowd Backpackers Hostel
BY MRT
Examples of Dialogue States
Guide So this is the place that you can go out and try street food. you can soak in the atmosphere. You would love taking your camera out because you can photograph the Indian garland makers, the fortune tellers. Uh it's full of life and culture. It's one of my favourite places.
Tourist Oh- Oh, great. Oh Yah. Is Little India is like a Indian community town or Indiantown?
Guide Yes. So there are Hindu temples there. You can photograph beautiful architecture and statues of the different Deities, the Hindu Deities.
Tourist Uh huh. Okay. Great. So other than Indiantowns, are there other uh nations town there or race town? What else?
Guide Okay. And then Chinatown you take the same line. Two, three stops down. So you'll get off at Chinatown, you are right in the heart of Chinatown. And in Chinatown we have uh also Buddhist temple and Terrace temple also great for photography.
Tourist Uh Yes. Yes, okay. Okay, great. So we have Little India, then Chinatown. Other than that two, there are other kinds of town, right? Like uh- is there a um something like Vietnamese town or just the two of these?
Guide Not Vietnamese but there is uh Kampong Glam which you have to go by bus because- well actually you could go by train because you are young and healthy you can walk.
Tourist Hm. Yah, I like walking.
TOPIC ATTRACTION
NAME Little India
TOPIC ATTRACTION
TYPE Ethnic enclave
TOPIC ATTRACTION
NAME Chinatown
TOPIC ATTRACTION
TYPE Ethnic enclave
TOPIC ATTRACTION
NAME Kampong Glam
Examples of Dialogue States
Tourist So what about other than street food, of course I have to eat my dinner. Wha~ where do you suggest me to eat my dinner? I also want to experience Singaporean delicacies or Singaporean dishes.
Guide Do you like hot food? Do you like curries?
Tourist Curries?
Guide Yah.
Tourist Indian curries? What about Singaporean restaurants? Like they, you know, they offer Singaporean delicacies or Singaporean dishes? Do you have a Singaporean dishes in Singapore?
Guide Uh, Singaporean food is mostly try at the uh food courts. This is one I am recommending to you. It's at old market. It's Maxwell Road Food Centre.
Tourist Um. Road Food Centre.
Guide So it is at place called Maxwell Road which is in Chinatown. So if you take the train to Chinatown from where you are and you'd- It's near. You just walk there.
Tourist Okay, nice.
TOPIC FOOD
CUISINE Singaporean
TOPIC FOOD
DISH Curry
TOPIC FOOD
CUISINE Singaporean
TOPIC FOOD
TYPE_OF_PLACE Hawker centre
NAME Maxwell Road Food Centre
Main Task: Evaluation
• Resources
– Data
• Training set: 14 dialogues with 12,759 utterances
• Development set: 6 dialogues with 4,812 utterances
• Test set: 9 dialogues with 7,848 utterances
– Ontology
– Evaluation scripts
– Baseline tracker
• Fuzzy string matching with the ontology entries
– CodaLab: Web-based Competition Platform
• Metrics
– Schedules
• Schedule 1: all turns are included
• Schedule 2: only the turns at the end of segments are included
– Metrics
• Frame Structure-level Accuracy
• Slot-level Precision/Recall/F-measure
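The baseline idea of fuzzy string matching against ontology entries can be sketched as follows. This is an illustration only, not the challenge's baseline code: it uses Python's difflib as a stand-in matcher, and the ONTOLOGY contents, the tokenize helper, and the 0.8 threshold are all assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical mini ontology; the real DSTC4 ontology is far larger.
ONTOLOGY = {
    "NAME": ["InnCrowd Backpackers Hostel", "Maxwell Road Food Centre"],
    "PRICERANGE": ["Cheap"],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fuzzy_match(utterance, ontology, threshold=0.8):
    """Return slot -> values whose text approximately occurs in the utterance."""
    words = tokenize(utterance)
    found = {}
    for slot, values in ontology.items():
        for value in values:
            target_words = tokenize(value)
            target = " ".join(target_words)
            n = len(target_words)
            # Compare the ontology entry with every n-word window of the utterance.
            windows = [" ".join(words[i:i + n])
                       for i in range(max(1, len(words) - n + 1))]
            best = max(SequenceMatcher(None, w, target).ratio() for w in windows)
            if best >= threshold:
                found.setdefault(slot, []).append(value)
    return found


misspelled = fuzzy_match("It's InnCrowd Backparkers Hostel in Singapore.", ONTOLOGY)
cheap = fuzzy_match("tell me some cheap rate hotels", ONTOLOGY)
```

Approximate matching tolerates transcription noise such as a misspelled hostel name, at the cost of possible false positives when the threshold is set too low.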
Team | Entry | Schedule 1 Accuracy | Schedule 1 Precision | Schedule 1 Recall | Schedule 1 F-measure | Schedule 2 Accuracy | Schedule 2 Precision | Schedule 2 Recall | Schedule 2 F-measure
Baseline 0 | 0 | 0.0374 | 0.3589 | 0.1925 | 0.2506 | 0.0488 | 0.3750 | 0.2519 | 0.3014
1 | 0 | 0.0456 | 0.3876 | 0.3344 | 0.3591 | 0.0584 | 0.4384 | 0.3377 | 0.3815
1 | 1 | 0.0374 | 0.4214 | 0.2762 | 0.3336 | 0.0584 | 0.4384 | 0.3377 | 0.3815
1 | 2 | 0.0372 | 0.4173 | 0.2767 | 0.3328 | 0.0575 | 0.4362 | 0.3377 | 0.3807
1 | 3 | 0.0371 | 0.4179 | 0.2804 | 0.3356 | 0.0584 | 0.4384 | 0.3426 | 0.3846
2 | 0 | 0.0487 | 0.4079 | 0.2626 | 0.3195 | 0.0671 | 0.4280 | 0.3257 | 0.3699
2 | 1 | 0.0467 | 0.4481 | 0.2655 | 0.3335 | 0.0671 | 0.4674 | 0.3275 | 0.3851
2 | 2 | 0.0478 | 0.4523 | 0.2623 | 0.3320 | 0.0706 | 0.4679 | 0.3226 | 0.3819
2 | 3 | 0.0489 | 0.4440 | 0.2703 | 0.3361 | 0.0697 | 0.4634 | 0.3335 | 0.3878
3 | 0 | 0.1212 | 0.5393 | 0.4980 | 0.5178 | 0.1500 | 0.5569 | 0.5808 | 0.5686
3 | 1 | 0.1210 | 0.5449 | 0.4964 | 0.5196 | 0.1500 | 0.5619 | 0.5787 | 0.5702
3 | 2 | 0.1092 | 0.5304 | 0.5031 | 0.5164 | 0.1316 | 0.5437 | 0.5875 | 0.5648
3 | 3 | 0.1183 | 0.5780 | 0.4904 | 0.5306 | 0.1473 | 0.5898 | 0.5678 | 0.5786
4 | 0 | 0.0887 | 0.5280 | 0.3595 | 0.4278 | 0.1072 | 0.5354 | 0.4273 | 0.4753
4 | 1 | 0.0910 | 0.5314 | 0.3122 | 0.3933 | 0.1055 | 0.5325 | 0.3623 | 0.4312
4 | 2 | 0.1009 | 0.5583 | 0.3698 | 0.4449 | 0.1264 | 0.5666 | 0.4455 | 0.4988
4 | 3 | 0.1002 | 0.5545 | 0.3760 | 0.4481 | 0.1212 | 0.5642 | 0.4540 | 0.5031
5 | 0 | 0.0309 | 0.2980 | 0.2559 | 0.2754 | 0.0392 | 0.3344 | 0.2547 | 0.2892
5 | 1 | 0.0268 | 0.3405 | 0.2014 | 0.2531 | 0.0401 | 0.3584 | 0.2632 | 0.3035
5 | 2 | 0.0309 | 0.3039 | 0.2659 | 0.2836 | 0.0392 | 0.3398 | 0.2639 | 0.2971
6 | 0 | 0.0421 | 0.4175 | 0.2142 | 0.2831 | 0.0541 | 0.4380 | 0.2656 | 0.3307
6 | 1 | 0.0478 | 0.5516 | 0.2180 | 0.3125 | 0.0654 | 0.5857 | 0.2702 | 0.3698
6 | 2 | 0.0486 | 0.5623 | 0.2314 | 0.3279 | 0.0645 | 0.5941 | 0.2850 | 0.3852
7 | 0 | 0.0286 | 0.2768 | 0.1826 | 0.2200 | 0.0323 | 0.3054 | 0.2410 | 0.2694
7 | 1 | 0.0044 | 0.0085 | 0.0629 | 0.0150 | 0.0061 | 0.0109 | 0.0840 | 0.0194
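The columns above follow the two schedules and two metric families of the main task. A minimal sketch of how such scores could be computed, assuming each turn carries a gold frame, a predicted frame, and a flag for the last turn of its segment; the dictionary keys and the pooling of slot-value pairs over turns are assumptions, not the official evaluation script:

```python
def slot_value_pairs(frame):
    """Flatten a slot -> list-of-values frame into (slot, value) pairs."""
    return {(slot, v) for slot, values in frame.items() for v in values}


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(turns, schedule=1):
    """Schedule 1 scores every turn; Schedule 2 only turns that end a segment."""
    if schedule == 2:
        turns = [t for t in turns if t["segment_end"]]
    exact = tp = n_pred = n_gold = 0
    for t in turns:
        gold = slot_value_pairs(t["gold"])
        pred = slot_value_pairs(t["pred"])
        exact += gold == pred          # frame-level: whole frame must match
        tp += len(gold & pred)         # slot-level: count matching pairs
        n_pred += len(pred)
        n_gold += len(gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return {
        "accuracy": exact / len(turns) if turns else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure(precision, recall),
    }


gold_a = {"TYPE": ["Hostel"], "PRICERANGE": ["Cheap"]}
gold_b = {"NAME": ["InnCrowd Backpackers Hostel"]}
turns = [
    {"gold": gold_a, "pred": {"PRICERANGE": ["Cheap"]}, "segment_end": False},
    {"gold": gold_a, "pred": dict(gold_a), "segment_end": True},
    {"gold": gold_b,
     "pred": {"NAME": ["InnCrowd Backpackers Hostel"], "TYPE": ["Hotel"]},
     "segment_end": True},
]
schedule1 = evaluate(turns, schedule=1)
schedule2 = evaluate(turns, schedule=2)
```

Frame-level accuracy is strict, since one wrong or missing slot value fails the whole frame, which is consistent with accuracy being much lower than F-measure in the results. As a sanity check, f_measure(0.3589, 0.1925) rounds to 0.2506, the baseline's Schedule 1 F-measure.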
Main Task: Results
Main Task: Error Distribution
Main Task: Ensemble Learning
Configuration | Schedule 1 Accuracy | Schedule 1 F-measure | Schedule 2 Accuracy | Schedule 2 F-measure
Single best entry | 0.1212 | 0.5306 | 0.1500 | 0.5786
Top 3 entries: union | 0.1111 - | 0.5147 - | 0.1325 - | 0.5619 -
Top 3 entries: intersection | 0.1241 + | 0.5344 + | 0.1561 + | 0.5861 +
Top 3 entries: majority voting | 0.1172 - | 0.5194 - | 0.1421 - | 0.5703
Top 5 entries: union | 0.0980 - | 0.5133 - | 0.1107 - | 0.5543 -
Top 5 entries: intersection | 0.1157 | 0.4370 - | 0.1369 | 0.5008 -
Top 5 entries: majority voting | 0.1183 - | 0.5210 - | 0.1439 | 0.5711
Top 10 entries: union | 0.0623 - | 0.4719 - | 0.0680 - | 0.5014 -
Top 10 entries: intersection | 0.0300 - | 0.1816 - | 0.0453 - | 0.2275 -
Top 10 entries: majority voting | 0.1268 + | 0.4741 - | 0.1456 | 0.5380 -
All entries: union | 0.0077 - | 0.1320 - | 0.0078 - | 0.1366 -
All entries: intersection | 0.0132 - | 0.0229 - | 0.0192 - | 0.0331 -
All entries: majority voting | 0.0646 - | 0.3535 - | 0.0898 - | 0.4135 -
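The three combination strategies named above can be sketched over tracker outputs. This assumes each entry's output for a turn is a set of (slot, value) pairs and that majority voting keeps pairs predicted by more than half of the entries; how the organizers actually combined entries is not stated on the slide.

```python
from collections import Counter


def union(outputs):
    """Keep every slot-value pair predicted by at least one entry."""
    return set().union(*outputs)


def intersection(outputs):
    """Keep only the pairs predicted by every entry."""
    return set.intersection(*outputs)


def majority_vote(outputs):
    """Keep the pairs predicted by more than half of the entries."""
    counts = Counter(pair for output in outputs for pair in output)
    return {pair for pair, count in counts.items() if count > len(outputs) / 2}


# Hypothetical outputs of three entries for the same turn.
outputs = [
    {("TYPE", "Hostel"), ("PRICERANGE", "Cheap")},
    {("TYPE", "Hostel")},
    {("TYPE", "Hostel"), ("PRICERANGE", "Cheap"),
     ("NAME", "InnCrowd Backpackers Hostel")},
]
combined_union = union(outputs)
combined_intersection = intersection(outputs)
combined_majority = majority_vote(outputs)
```

Union favours recall and intersection favours precision, so both degrade quickly as more and weaker entries are added, which matches the trend in the table.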
Pilot Tasks: Evaluation
• Tasks
– Spoken Language Understanding (SLU)
– Speech Act Prediction (SAP)
– Spoken Language Generation (SLG)
– End-to-end System (EES)
• Evaluation Metrics
– SLU and SAP
• Precision/Recall/F-measure
– SLG and EES
• BLEU
• AM-FM
Pilot Tasks: Evaluation
• Resources
– Data
• Training set: 14 dialogues with 12,759 utterances
• Development set: 6 dialogues with 4,812 utterances
• Test set: 6 dialogues with 5,615 utterances
– Ontology
– Evaluation Scripts
• Offline evaluation
• Web-based evaluation
• Web-based Evaluation
[Figure: web-based evaluation diagram. Labels: Participant (System, Web-server); Organizer (Evaluation Script, Web-client); JSON Messages exchanged between them]
Pilot Tasks: Results
• Participant
– SLU
• Team 3 (5 entries)
• Results
Speaker | Entry | Speech Act Precision | Speech Act Recall | Speech Act F-measure | Semantic Tag Precision | Semantic Tag Recall | Semantic Tag F-measure
Guide | 1 | 0.6287 | 0.5191 | 0.5687 | 0.5646 | 0.4886 | 0.5239
Guide | 2 | 0.6330 | 0.5227 | 0.5726 | 0.5646 | 0.4886 | 0.5239
Guide | 3 | 0.7451 | 0.6153 | 0.6740 | 0.5646 | 0.4886 | 0.5239
Guide | 4 | 0.6314 | 0.5214 | 0.5712 | 0.5646 | 0.4886 | 0.5239
Guide | 5 | 0.6762 | 0.5584 | 0.6117 | 0.5646 | 0.4886 | 0.5239
Tourist | 1 | 0.3583 | 0.2977 | 0.3252 | 0.5741 | 0.4764 | 0.5207
Tourist | 2 | 0.2931 | 0.2435 | 0.2660 | 0.5741 | 0.4764 | 0.5207
Tourist | 3 | 0.5627 | 0.4675 | 0.5107 | 0.5741 | 0.4764 | 0.5207
Tourist | 4 | 0.2939 | 0.2442 | 0.2668 | 0.5741 | 0.4764 | 0.5207
Tourist | 5 | 0.5736 | 0.4766 | 0.5206 | 0.5741 | 0.4764 | 0.5207
Conclusions
• DSTC4
– Main Task: Dialogue State Tracking
• Multi-topic, Mixed-initiative, Human-human conversations
• Tracking sub-dialogue segment-level state structures
• 24 entries from 7 participants
– Pilot Tasks
• SLU, SAP, SLG, EES
• Web-based evaluation
• 5 SLU entries from one participant
Thank You