NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing


Upload: seungyeop-han

Post on 10-May-2015


TRANSCRIPT

Page 1: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Seungyeop Han, U. of Washington; Matthai Philipose and Yun-Cheng Ju, Microsoft

Page 2: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Speech-Based UIs are Here

Ubicomp 2013

Today: Siri, …

Today: Hey Glass, …

Tomorrow: Hey Microwave, …

Page 3: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Keyphrases Don't Scale


Use Spoken Natural Language

App1: What time is it?
App2: Next bus to Seattle
App3: Tomorrow's weather
App26: When is the next meeting / "What time is the next meeting" …
…
App50: …

Keyphrase Hell

Page 4: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Spoken Natural Language (SNL) Today: First-Party Applications

"Hey, Siri. Do you love me?"


• Personal assistant model
• Large speech engine (20-600 GB)
• Experts mapping speech to a few domains

Speech Recognition

Language Processing

Text: "Hey Siri…" … "I'm not allowed, Seungyeop"

Page 5: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

NLify: Scaling Spoken NL Interfaces

1st-party app (e.g., Xbox, Siri): multiple PhDs, 10s of developers (~10 apps)

3rd-party app (e.g., Intuit, Spotify): 0 PhDs, 1-3 developers (~10,000 apps)

end-user macro (e.g., …): 0 PhDs, 0 developers (~10,000,000 apps)


Page 6: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Goal  

Make programming spoken natural language interfaces as easy and robust as programming graphical user interfaces.


Page 7: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Outline  

• Motivation / Goal
• System Design
• Demonstration
• Evaluation
• Conclusion


Page 8: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Challenges  

• Developers are not SNL experts

• Applications are developed independently

• Cloud-based SNL does not scale as UI
  – UI capability must not rely on connectivity
  – UI events must have minimal cost


Page 9: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Specifying  GUIs  

Intuitive definition of UI; handler linking to code

Page 10: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Specifying Spoken Keyphrase UIs

<CommandPrefix>Magic Memo</CommandPrefix>
<Command Name="newMemo">
  <ListenFor>Enter [a] [new] memo</ListenFor>
  <ListenFor>Make [a] [new] memo</ListenFor>
  <ListenFor>Start [a] [new] memo</ListenFor>
  <Feedback>Entering a new memo</Feedback>
  <Navigate Target="/Newmemo.xaml" />
</Command>
...

How  does  natural  language  differ  from  keyphrases?  


Page 11: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Difference 1: Local Variation

Base: When is the next meeting?

• Missing words: When is next meeting?
• Repeated words: When is the next… next meeting?
• Re-arranged words: When the next meeting is?
• New combinations of phrases: What time is the next meeting?


Page 12: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Difference 2: Paraphrases

show me the current time, what is the time, time, what is the current time, may i know the time, please give time, show me the time, show me the clock, tell me what time it is, what is time, current time, tell what time it is, list the time, what time,

what time it is now, show current time, what time please, show time, what is the time now, current time please, say the time, find the current time please, what time is it, what is current time, what time is it, tell me time, current, what's the time, tell current time,

what time is it now, what time is it currently, check time, the time now, tell me the current time, what's time, time now, tell me the time, can you please tell me what time it is, tell me current time, give me the time, time please, show me the time now


Page 13: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Specifying SNL Systems


Speech Recognition → Language Processing → whattime() ("what time is it?")

Lots of rules, little data:
• Encode local variation in grammar
• Encode domain knowledge on paraphrases in models, e.g., CRFs

Few rules, lots of data:
• Use statistical language models that require little anticipation of local noise
• Use data-driven models that require little domain knowledge

Page 14: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Exhaustive Paraphrasing by Automated Crowdsourcing


Examples from developers:

Handler: whattime()
Description: When you want to know the time
Examples: What time is it now; What's the time; Tell me the time

An automatically generated crowdsourcing task (directions, description, example) amplifies these into:

Handler: whattime()
Description: When you want to know the time
Examples: What time is it now; What's the time; Tell me the time; Current time; Find the current time please; Time now; Give me time; …

Page 15: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Compiling SNL Models

dev time:
  Seed Examples (.What is the date @d / .Tell me the date @d / …)
  → amplify via Internet crowdsourcing service
  → Amplified Examples (.What is the date @d / .Tell me the date @d / .What date is it @d / .Give me the date @d / .@d is what date / …)

install time:
  compile → Nearest-neighbor model + SLM (statistical models)

run time:
  "Tell me when it's @T=20 min …" → SAPI → TFIDF + NN → NLNotifyEvent e → nlwidget

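The TFIDF + NN stage in the pipeline matches a recognized utterance to its nearest intent by TF-IDF similarity over the amplified examples. A minimal sketch of that idea, not NLify's actual implementation; the intent names and example templates here are hypothetical stand-ins:

```python
from collections import Counter
from math import log, sqrt

# Hypothetical amplified examples per intent (stand-ins for the slide's templates).
TEMPLATES = {
    "FindTime": ["what time is it", "tell me the time", "current time"],
    "FindDate": ["what is the date", "tell me the date", "what date is it"],
}

def cosine(a, b):
    # cosine similarity between two sparse word-weight dicts
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class IntentMatcher:
    def __init__(self, templates):
        # flatten to (intent, tokens) pairs
        self.examples = [(intent, s.split())
                         for intent, sents in templates.items() for s in sents]
        self.n = len(self.examples)
        self.df = Counter()  # document frequency per word
        for _, toks in self.examples:
            self.df.update(set(toks))
        self.vecs = [self._vec(toks) for _, toks in self.examples]

    def _vec(self, toks):
        # smoothed TF-IDF weights so no weight is exactly zero
        tf = Counter(toks)
        return {t: c * log((1 + self.n) / (1 + self.df[t])) for t, c in tf.items()}

    def match(self, utterance):
        # nearest neighbor under cosine similarity over TF-IDF vectors
        q = self._vec(utterance.lower().split())
        best = max(range(self.n), key=lambda i: cosine(q, self.vecs[i]))
        return self.examples[best][0]
```

For example, `IntentMatcher(TEMPLATES).match("tell me what time it is")` resolves to `"FindTime"` even though that exact phrasing is not among the templates, which is the point of matching against amplified examples rather than exact keyphrases.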

Page 16: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

SNL Models for Multiple Apps

Application 1: .What is the date @d / .Tell me the date @d / .What date is it @d / .Give me the date @d / .@d is what date / …
Application 2: .How much is @com / .Get me quote for @com / .What's the price for @com / …
…
Application N

install time: compile → Nearest-neighbor model + SLM (statistical models)
run time: "Tell me when it's @T=20 min …" → SAPI → TFIDF + NN → NLNotifyEvent e → nlwidget

• Apps developed separately => "late assembly" of models
• Limited time for learning at install time => simple (e.g., NN) models
• Users no longer say anything but what they have installed => "natural language shortcut" mental model
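The slot-tagged templates (@d, @com, @T) can be parsed deterministically by turning each placeholder into a capture group. A simplified sketch of that bottom-up slot parse, under assumed template strings and intent names that are illustrative rather than NLify's actual API:

```python
import re

# Hypothetical slot-tagged templates in the spirit of the slide's "@d" / "@com"
# placeholders; intent names are illustrative.
TEMPLATES = [
    ("FindNextBus",    "when is the next @route to @dest"),
    ("FindStockPrice", "how much is @company stock"),
]

def compile_template(template):
    # turn each @slot into a named capture group; literal words must match exactly
    parts = []
    for tok in template.split():
        if tok.startswith("@"):
            parts.append("(?P<%s>.+?)" % tok[1:])
        else:
            parts.append(re.escape(tok))
    return re.compile(r"^" + r"\s+".join(parts) + r"$", re.IGNORECASE)

def parse(utterance, templates):
    """Deterministic bottom-up slot parse: first template that matches wins."""
    for intent, template in templates:
        m = compile_template(template).match(utterance.strip())
        if m:
            return intent, m.groupdict()
    return None, {}
```

Here `parse("when is the next 545 to Seattle", TEMPLATES)` yields the intent plus slot bindings `{"route": "545", "dest": "Seattle"}`; the all-or-nothing nature of this parse is exactly why the evaluation later pairs it with a statistical language model.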

Page 17: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Outline  

• Motivation / Goal
• System Design
• Demo: SNL interfaces in 4 easy steps
• Evaluation
• Conclusion


Page 18: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing


1.  Add  NLify  DLL  

Page 19: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

2.  Providing  Examples  


Page 20: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

3. Writing a Handler


Page 21: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

4.  Adding  a  GUI  Element  


Page 22: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing


Enjoy ☺

Page 23: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Outline  

• Motivation / Goal
• System Design
• Demonstration
• Evaluation
• Conclusion


Page 24: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Evaluation

• How good are SNL recognition rates?
• How does performance scale with commands?
• How do design decisions impact recognition?
• How practical is on-phone implementation?
• What is the developer experience?


Page 25: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Evaluation Dataset


Domain     Intent & Slots                   Example
Clock      FindTime()                       What time is it?
Clock      FindDate(day)                    What's the date today?
Calendar   CheckNextMtg()                   What's my next meeting?
Bus        FindNextBus(route, dest)         When is the next 20 to Seattle?
Finance    FindStockPrice(company)          How much is Microsoft stock?
Finance    CalculateTip(Money, NumPeople)   How much is the tip for $20 for three people?
Condition  FindWeather(day)                 How is the weather tomorrow?
Contacts   FindOfficeLocation(person)       Where is Janet Smith's office?
Contacts   FindGroup(person)                Which group does Matthai work in?
…

Across 27 different commands, collected 1612 paraphrases and 3505 audio samples.

Page 26: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Evaluation Dataset


• Seed: 5 paraphrases/intent, by the authors (training)
• Crowd: ~60 paraphrases/intent, amplified via crowdsourcing at $.03/paraphrase (training)
• Audio: 130 utterances/intent, by 20 subjects, asked "What would you say to the phone to do the described task" with an example (testing)

Page 27: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Overall Recognition Performance


• Absolute recognition rate is good (avg: 85%, std: 7%)
• Significant relative improvement from Seed alone (69%)

Page 28: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Performance Scales Well with Number of Commands


Page 29: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Design Decisions Impact Recognition Rates


• The more exhaustive the paraphrasing, the better
• Statistical model improves recognition rate by 16% vs. deterministic model

(Chart: recognition rate, 0-100%, vs. fraction of training set used, 20-100%)

Page 30: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Feasibility of Running on Mobiles

• NLify is competitive with a large-vocabulary model
• Memory usage is acceptable: maximum memory for 27 intents was 32 MB
• Power consumption is very close to the listening loop


Figure 5. Scaling with number of commands.

Figure 6. Incremental benefit from templates.

…surprising, since both the SLM and TF-IDF algorithms that identify intents compete across intents. Third, slot recognition does not vary monotonically with the number of competitors; in fact, the particular competitors seem to make a big difference, leading to high variance for each N. On closer examination we determined that the identity of the competitors matters: when certain challenging functions (e.g., 11, 12 and 19) are included, recognition rate for the subset plummets. Larger values of n will likely give a smoother average line. Overall, since slot recognition is performed deterministically bottom-up, it does not compete at the language-model level with other commands.

Impact of NLify Features

NLify uses two main techniques to generalize from the seeds provided by the developers to the variety of SNL. To capture broad variation, it supports template amplification as per the UHRS dataset. To tolerate small local noise (e.g., words dropped in the speech engine), it advocates a statistical approach even when the models are run locally on the phone (in contrast, e.g., to recent production systems [5]).

We saw earlier that using the Seed set instead of Seed + UHRS (where Seed has 5 templates per command and UHRS averages 60) lowers recognition from 85% to 69%. Thus UHRS-added templates contribute significantly. To evaluate the incremental value of templates, we measured recognition rates when f = 20, 40, 60 and 80% of all templates were used. We picked the templates arbitrarily for this experiment. The corresponding average recognition rates (across all functions) were 66, 75, 80 and 83%. Figure 6 shows the breakout per function. Three factors stand out: recognition rates improve noticeably between the 80 and 100% configurations, indicating that rates have likely not topped out; improvement is spread across many functions, indicating that more templates are broadly beneficial; and there is a big difference between the 20% and the 80% mark. The last point indicates that even had the developer added an additional dozen seeds, crowdsourcing would still have been beneficial.

(a) intent recognition (b) slot recognition

Figure 7. Benefit of statistical modeling.

Figure 8. Comparison to a large vocabulary model.

Given that templates may provide good coverage across paraphrases for a command, it is reasonable to ask whether a deterministic model that incorporates all these paraphrases would perform comparably to a statistical one. Given template amplification, is a statistical model really necessary? In the spirit of the Windows Phone 8 Voice Command [5], we created a deterministic grammar for each intent. For robustness to accidentally omitted words, we made the common words {is, me, the, it, please, this, to, you, for, now} optional in every sentence. We compared recognition performance of this deterministic system with the SLM, both trained on the Seed + UHRS data. Figure 7 shows the results for both intent and slot recognition. Two points are significant. First, statistical modeling does add a substantial boost for both intent (16% incremental) and slot recognition (19%). Second, even though slots are parsed deterministically, their recognition rates improve substantially with SLMs. This is because deterministic parsing is all-or-nothing: the most common failure mode by far is that the incoming sentence does not parse, affecting both slot and intent recognition rates.
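The optional-common-word baseline described above can be approximated in a few lines. A simplified sketch (not the actual Voice Command grammar engine): treating the listed common words as droppable on both sides and requiring an exact match on everything else reproduces the all-or-nothing behavior:

```python
# Simplified sketch of a deterministic grammar with optional common words.
# This approximates, not reproduces, the Voice Command-style grammar in the
# text: a parse succeeds only on an exact match after the common words below
# are stripped from both the template and the utterance.
OPTIONAL = {"is", "me", "the", "it", "please", "this", "to", "you", "for", "now"}

def strip_optional(words):
    # drop the words the grammar marks optional
    return [w for w in words if w not in OPTIONAL]

def matches(utterance, template):
    """All-or-nothing match: any other missing, extra, or reordered word fails."""
    return strip_optional(utterance.lower().split()) == \
           strip_optional(template.lower().split())
```

So `matches("what time is it now", "what time is it")` succeeds because only optional words differ, but any reordering or unanticipated paraphrase fails outright, which is the failure mode the statistical model avoids.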

The experiments thus far assumed that no query was garbage. In practice, users may speak out-of-grammar commands. NLify's parallel garbage model architecture is set up to catch these cases. Without the garbage model, the existing SLM would still reject commands that are egregiously out-of-grammar.

[Average] SLM: 85%, LV: 80%

Page 31: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Developer Study w/ 5 Devs

Developers were asked to add NLify to their existing programs.


Description                          Sample commands                  Original LOC   Time Taken
Control a night light                "turn off the light"             200            30 mins
Get sentiment on Twitter             "review this"                    2000           30 mins
Query, control location disclosure   "where is Alice?"                2800           40 mins
Query weather                        "weather tomorrow?"              3800           70 mins
Query bus service                    "when is next 545 to Seattle?"   8300           3 days

(+) How well did NLify's capabilities match your needs?
(-) Did the cost/benefit of NLify scale?
(-) How long do you think you can afford to wait for crowdsourcing?

Page 32: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

Conclusions  

It is feasible to build mobile SNL systems where:
• Developers are not SNL experts
• Applications are developed independently
• All UI processing happens on the phone

Fast, compact, automatically generated models enabled by exhaustive paraphrasing are the key.


Page 33: NLify: Lightweight Spoken Natural Language Interfaces via Exhaustive Paraphrasing

For Data and Code

Check Matthai's homepage: http://research.microsoft.com/en-us/people/matthaip/

Or e-mail the authors on/after October 1.
