
Improvised Theatre Alongside Artificial Intelligences

Kory W. Mathewson 1,2 and Piotr Mirowski 2
1 University of Alberta, Edmonton, Alberta, Canada
2 HumanMachine, London, UK
[email protected], [email protected]

Abstract

This study presents the first report of Artificial Improvisation, or improvisational theatre performed live, on-stage, alongside an artificial intelligence-based improvisational performer. The Artificial Improvisor is a form of artificial conversational agent, or chatbot, focused on open-domain dialogue and collaborative narrative generation. Using state-of-the-art machine learning techniques spanning natural language processing, speech recognition, reinforcement learning, and deep learning, these chatbots have become more lifelike and harder to discern from humans. Recent work on conversational agents has focused on goal-directed dialogue in closed domains such as appointment setting, bank information requests, question answering, and movie discussion. Natural human conversations are seldom limited in scope: they jump from topic to topic, are laced with metaphor and subtext, and, in face-to-face communication, are supplemented with non-verbal cues. Live improvised performance takes natural conversation one step further, with multiple actors performing in front of an audience. In improvisation, the topic of the conversation is often given by the audience several times during the performance. These suggestions inspire actors to perform novel, unique, and engaging scenes. During each scene, actors must make rapid-fire decisions to collaboratively generate coherent narratives. We have embarked on a journey to perform live improvised comedy alongside artificial intelligence systems. We introduce Pyggy and A.L.Ex. (Artificial Language Experiment), the first two Artificial Improvisors, each with a unique composition and embodiment. This work highlights research and development, including successes and failures along the way, celebrates the collaborations enabling progress, and presents directions for future work in the space of artificial improvisation.

Introduction

Improvisational theatre, or improv, is the spontaneous creation of unplanned theatrics, often performed live on-stage in front of an audience. Improv is a form of collaborative interactive storytelling, where two or more people work together to generate novel narratives. It is grounded in the connections between the performer(s) and the audience. Improv requires the performers to work as a team. The actors must rapidly adapt, empathize, and connect with each other to achieve natural, fluid collaboration. To truly excel at the art form, performers must think and react to audience reactions quickly, and work together to accept and amplify each other's offers, an act that can be seen as real-time dynamic problem solving (Magerko and others 2009). Improv demands that human performers handle novel subject matter from multiple perspectives, ensuring the audience is engaged while progressing narrative and story. Because of this incredible difficulty, improvisors must embrace failure and surrender to spontaneity (Johnstone 1979).

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Improvised theatre has been a platform for digital storytelling and video game research for more than 20 years (Perlin and Goldberg 1996; Hayes-Roth and Van Gent 1996). Past research has explored several knowledge-based methods for collaborative storytelling and digital improvisation (O'Neill et al. 2011; Si, Marsella, and Pynadath; Zhang et al. 2007; Magerko, Dohogne, and DeLeon 2011). Similar work explores how humans interact with systems which improvise music and dance (Hoffman and Weinberg 2011; Thomaz et al. 2016). Computer-aided interactive storytelling has also been considered for video games, with the aim of creating endless narrative possibilities in game universes for user engagement (Riedl and Stern 2006).

Robotic performances have been explored previously. In 2000, Tom Sgouros performed Judy, or What Is It Like to Be a Robot? In 2010, the realistic humanoid robot Geminoid F performed Sayonara, which was later turned into a movie. In 2014, Carnegie Mellon University's Personal Robotics Lab collaborated with their School of Drama to produce Sure Thing (Zeglin and others 2014). In these performances, the robots were precisely choreographed, deterministic, or piloted on stage. These shows required the audience to suspend disbelief and embrace the mirage of autonomy. These performances verge ever closer to the deep cliffs surrounding the uncanny valley: the idea that as the appearance of a humanlike robot approaches, but fails to attain, human likeness, a person's response abruptly shifts from empathy to revulsion (Mori, MacDorman, and Kageki 2012).

Our work serves as a bridge between artificial intelligence labs and improvisational theatre stages. We aim to build a bridge over the uncanny valley, toward a future where humans and autonomous agents converse naturally together. Our work is partially inspired by the narratives behind George Bernard Shaw's Pygmalion (Shaw 1913), Mary Shelley's Frankenstein (Shelley 1818), and Alan Jay Lerner's My Fair Lady (Cukor 1964). In these stories, creators attempt to design and build reflections of themselves, fabricating their respective ideal images of perfection.

We present our methods and details on the systems which power the first two Artificial Improvisors. We concisely report on findings, and discuss future work at the intersection between artificial intelligence and improvisational theatre.

Methods

Over the last year, we iterated from Version 1 (Pyggy), which uses classic machine learning and deterministic rules, to Version 2 (A.L.Ex.), which uses deep neural networks, advanced natural language processing, and a much larger training dataset. While improvisational theatre is a complex artform mixing dialog, movement, and stagecraft, and there exist many improvisational rules for the novice improvisor, in this study we focus on a single component: training the dialog system. An Artificial Improvisor dialog system is composed of three major building blocks: 1) speech recognition, 2) speech generation, and 3) a dialogue management system. The three modules comprise a simplified framework for extemporaneous dialog systems, inspired by the General Architecture of Spoken Dialogue Systems (Pietquin 2005). We detail these components for both Pyggy and A.L.Ex.
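For illustration, a minimal sketch of this three-module loop is shown below in Python; every function body is a hypothetical placeholder standing in for a concrete service or model, not the code behind Pyggy or A.L.Ex.

```python
# Minimal sketch of the three building blocks described above; each function
# body is a placeholder to be backed by an actual service or model.

def recognize_speech() -> str:
    """1) Speech recognition: capture audio and return the recognized text."""
    raise NotImplementedError

def manage_dialogue(user_line: str, history: list) -> str:
    """3) Dialogue management: map the recognized line (and history) to a reply."""
    raise NotImplementedError

def synthesize_speech(reply: str) -> None:
    """2) Speech generation: speak the reply aloud."""
    raise NotImplementedError

def improvise_scene(num_exchanges: int = 20) -> None:
    history: list = []
    for _ in range(num_exchanges):
        line = recognize_speech()
        reply = manage_dialogue(line, history)
        synthesize_speech(reply)
        history.extend([line, reply])
```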

Version 1: Chatbot-based Improvisor Pyggy

Pyggy, short for Pygmalion, was an early version of an Artificial Improvisor. Pyggy was built using speech recognition powered by the Google Cloud Speech API, which translated sound waves from the human voice to text through a network-dependent API. Pyggy used Apple Speech Synthesis to translate output text to sound. Dialogue management was handled with Pandorabots and Chatterbot.

Pandorabots handled hard-coded rules and deterministic responses. For example, when the human said: "Let's start improvising", the system would always respond: "Ok". Pandorabots also handled saving simple named entities. For example, if the human said: "My name is Lana", then the system could answer the simple recall question: "What is my name?" Chatterbot was introduced to handle open dialogue and add a sense of randomness to the system. Chatterbot was pre-trained on a set of dialogue, as described below, and then "learned" based on responses the human gave back to the system. For a given human improvisor statement, each of these systems would generate a response; the responses were concatenated and output to the user.
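A hedged sketch of this combination logic follows. The rule table, the name-memory handling, and the choice of method name are illustrative stand-ins rather than the Pandorabots/Chatterbot configuration used on stage (though ChatterBot instances do expose a get_response method).

```python
# Illustrative sketch of Pyggy-style response generation: a deterministic rule
# layer plus an open-domain chatbot, with the two outputs concatenated.
FIXED_RESPONSES = {"let's start improvising": "Ok"}
memory = {}

def rule_based_response(text):
    text = text.lower().strip()
    if text.startswith("my name is "):
        memory["name"] = text[len("my name is "):].strip(" .!?")
        return "Nice to meet you, {}.".format(memory["name"].title())
    if "what is my name" in text and "name" in memory:
        return "Your name is {}.".format(memory["name"].title())
    return FIXED_RESPONSES.get(text)

def combined_response(text, open_domain_bot):
    """open_domain_bot is any object with a get_response(text) method,
    e.g., a trained chatterbot.ChatBot instance."""
    parts = []
    rule_reply = rule_based_response(text)
    if rule_reply:
        parts.append(rule_reply)
    parts.append(str(open_domain_bot.get_response(text)))
    return " ".join(parts)  # responses from both systems are concatenated
```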

Pre-training of Pyggy was done through an interactivewebsite where individuals could directly interact in ba-sic chit-chat. Unfortunately, when the general public hadthe ability to interact with Pyggy many started to act ad-versarially and mischievously, training the system to sayrude and inappropriate things. Once the compiled train-ing set was cleaned and filtered, it was quite small. Addi-tional clean training data was appended from the CornellMovie Dialogue Corpus (Danescu-Niculescu-Mizil and Lee2011). The dataset is composed of 220579 conversationalexchanges from 617 movies and provided the system plentyof novel, interesting, and unique dialogue to pull from.
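For concreteness, a rough loader for such exchanges might look as follows, assuming the corpus's standard distribution layout (movie_lines.txt and movie_conversations.txt with fields separated by " +++$+++ "); this is an illustrative sketch, not the loader used for Pyggy.

```python
# Sketch: extract (utterance, reply) pairs from the Cornell Movie Dialogue Corpus,
# assuming the standard movie_lines.txt / movie_conversations.txt layout.
import ast

SEP = " +++$+++ "

def load_exchanges(lines_path, conversations_path):
    id_to_text = {}
    with open(lines_path, encoding="iso-8859-1") as f:
        for row in f:
            fields = row.rstrip("\n").split(SEP)
            if len(fields) == 5:
                id_to_text[fields[0]] = fields[4]      # line id -> utterance text
    exchanges = []
    with open(conversations_path, encoding="iso-8859-1") as f:
        for row in f:
            line_ids = ast.literal_eval(row.rstrip("\n").split(SEP)[-1])
            for a, b in zip(line_ids, line_ids[1:]):   # consecutive turns
                if a in id_to_text and b in id_to_text:
                    exchanges.append((id_to_text[a], id_to_text[b]))
    return exchanges
```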

Pyggy is embodied by a visualization, as seen in Fig. 1. The dynamic image-based visualization of Pyggy was accomplished with Magic Music Visualizer. The simple animation system shifted the mouth on an image of a face slightly when sound was being played by the speech generation system. This effect gave Pyggy an animated life, face, and physical embodiment on stage.

Version 2: Neural Network-based A.L.Ex. (Artificial Language Experiment)

There were limitations to the dialogue which Pyggy could produce, as it was restricted to the set of sentences present in the training data. The system was crude in this sense, recalling the most likely response to any input from the human. As well, Pyggy had no means by which to understand or track the topic of a scene. These limitations prompted us to explore a word-by-word generation approach.

Automatic Language Generation in Improvised Theatre

The very nature of improvised theatre relies on spontaneous generative conversational abilities. Improvised theatre training relies on teaching the actors games which force them to perform fast-paced word associations (e.g., "electric ... car ... company") or sentence completion (Johnstone 1979) without over-thinking any of their decisions. During these word generation games, spontaneity is encouraged and failure (e.g., a non-grammatical choice of word, an onomatopoeia instead of a word, or simply a made-up, garbled word suggestion) is tolerated and celebrated. By celebrating failure, improvisors actively reinforce spontaneity and liberate the creative process. Some of the games directly draw on the Surrealists' Cadavres Exquis idea of taking turns in collaborative art generation and require the players to build coherent narratives. Surprisingly, even the apparently most challenging exercises, such as singing and songwriting, as practiced in musical improvisation, still rely on the faculty of spontaneous text generation. In the latter, the performers follow the rhythm and tune of an accompanist while singing rhyming text. Many musical improv teachers and freestyle rap artists recommend not preparing rhymes in advance. Rather, they encourage starting lines without a predetermined idea of what rhyme can be found, and letting the rhymes arise organically in the mind of the improvisor.

While the word generation process is destined to be spontaneous, it is not intrinsically random. Improvisors use their cultural background, their literary and pop-culture knowledge, eloquence skills, and vernacular to generate sequences of words which seem most obvious to them. Each line is statistically likely to occur given the context of the improvisation.

Neural Language Model-based Text Generation

We decided to imitate the creative process of improvisation using a statistical language model that can generate text as a sequence of words. While building an open-domain conversational agent is AI-hard, a generative dialogue system that is conditioned on previous text and that mimics collaborative writing could give the audience an illusion of sensible dialogue (Higashinaka and others 2014). The need for generative dialogue and language models required shifting from the rule-based, deterministic learning systems of Pyggy to a deep neural network-based language model which generates sentences word by word. This new model, called A.L.Ex. (Artificial Language Experiment), was built using recurrent neural networks (RNNs) with long short-term memory (LSTM) (Mikolov and others 2010; Hochreiter and Schmidhuber 1997). Contrary to much similar work in text generation (Sutskever, Martens, and Hinton 2011; Graves 2013), we decided not to rely on character-based RNNs. This facilitates curating the vocabulary produced by the dialogue system, and thus immediately replacing or removing offensive words generated by the LSTM.

We experimented with multiple LSTM architectures with the goal of building a dialogue model that can handle the topics within an improvised scene over dozens of exchanges between the human and the AI. The first version consisted of 100,000 linear input word embeddings and a two-layer LSTM with 256 hidden units, followed by a softmax over 100,000 output words. The second version contained 4 layers of 512-dimensional LSTMs and an extra 64 inputs to the first LSTM, coming from a Latent Dirichlet Allocation topic model (Blei, Ng, and Jordan 2003), which enables the language model to integrate long-range dependencies in the generated text and to capture the general theme of the dialogue (Mirowski et al. 2010), following the implementation from (Mikolov and Zweig 2012). The third version relied on pre-trained word embeddings (GloVe, Global Vectors) (Pennington, Socher, and Manning 2014) as inputs, resulting in a larger vocabulary of 250,000 input words (the GloVe word embedding matrix was considered pre-trained and stayed fixed during training) and only 50,000 output words. The fourth version cloned the 4-layer LSTM into a query embedding module and a response generating module in a seq2seq architecture (Sutskever, Vinyals, and Le 2014; Kiros et al. 2015), with an attention model over the query embedding vectors (Shang, Lu, and Li 2015).
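The sketch below shows, in PyTorch rather than the authors' Lua and Torch code, the general shape of such a word-level LSTM language model with an optional topic vector concatenated to its inputs (as in the second version above); all dimensions, names, and the greedy decoding loop are illustrative assumptions.

```python
# Sketch of a word-level, topic-conditioned LSTM language model in PyTorch.
import torch
import torch.nn as nn

class TopicConditionedLSTMLM(nn.Module):
    def __init__(self, vocab_size=50000, embed_dim=512, hidden_dim=512,
                 num_layers=4, topic_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # At every time step the LSTM sees the word embedding concatenated with
        # a fixed-size topic vector (e.g., an LDA posterior over topics).
        self.lstm = nn.LSTM(embed_dim + topic_dim, hidden_dim,
                            num_layers=num_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, topic, state=None):
        # tokens: (batch, time) word indices; topic: (batch, topic_dim)
        emb = self.embed(tokens)
        topic_per_step = topic.unsqueeze(1).expand(-1, tokens.size(1), -1)
        out, state = self.lstm(torch.cat([emb, topic_per_step], dim=-1), state)
        return self.decoder(out), state

@torch.no_grad()
def generate_line(model, start_id, topic, max_len=30, banned_ids=()):
    """Greedy word-by-word generation; banned_ids curates the output vocabulary."""
    tokens = torch.tensor([[start_id]])
    state, line = None, []
    for _ in range(max_len):
        logits, state = model(tokens, topic, state)
        step_logits = logits[:, -1]
        if banned_ids:
            step_logits[:, list(banned_ids)] = float("-inf")
        next_id = int(step_logits.argmax(dim=-1))
        line.append(next_id)
        tokens = torch.tensor([[next_id]])
    return line
```

Masking the output vocabulary (the banned_ids argument) reflects the word-level design choice discussed above: offensive words can simply be removed from the softmax at generation time.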

Dataset

The language model of A.L.Ex. was trained on transcribed subtitles from 102,916 movies from OpenSubtitles.org, spanning 1902 to early 2016. This user-contributed subtitles dataset for dialogue model training contains multiple languages and versions for each movie (Vinyals and Le 2015). The data were available as XML files, with precise timestamps for each line of dialogue. We kept one English subtitle version per movie. As we noticed that subtitles tend to be split over time and that each change of interlocutor is marked by a dash sign, we processed the XML files to adjoin lines of dialogue separated by 1 sec, starting with lowercased words and without an initial dash, into single lines of dialogue. Further processing involved correcting common spelling mistakes to account for the often erroneous subtitle input (e.g., substitutions of "l" by "I" or vice versa, extra spaces between an apostrophe and the contracted word, or repetitions of letters), using a painstakingly hand-crafted set of about a thousand regular expressions, as well as removal of information such as "subtitles by ...". The resulting files were lowercased. After text clean-up, we calculated that the top 50,000 words accounted for about 99.4% of the total words appearing in the corpus. The text contained about 880 million tokens (including dashes).

The choice of a movie dialogue corpus, derived from movie scripts, is fitting: improv comedy actors often draw on previous experience, personal culture, and practice in their spontaneous creative process (Martin, Harrison, and Riedl 2016). Future work will explore a variety of text-based data sources including plays, short stories, transcripts of improvised performances, and symbolic plot points (Cook 1928).
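A rough sketch of the adjoining and clean-up steps described above is given below; the two regular expressions shown are illustrative stand-ins for the full hand-crafted set, and the function names are not the authors' pipeline.

```python
# Sketch: adjoin consecutive subtitle blocks separated by at most ~1 second when
# the continuation starts lowercased and without a new-speaker dash, then apply
# simple regex clean-up and lowercasing.
import re

def merge_subtitle_blocks(blocks):
    """blocks: list of (start_seconds, end_seconds, text) in temporal order."""
    lines = []
    for start, end, text in blocks:
        text = text.strip()
        continues_previous = (
            lines
            and start - lines[-1][1] <= 1.0   # small temporal gap
            and not text.startswith("-")      # no change-of-interlocutor dash
            and text[:1].islower()            # continuation starts lowercased
        )
        if continues_previous:
            prev_start, _, prev_text = lines[-1]
            lines[-1] = (prev_start, end, prev_text + " " + text)
        else:
            lines.append((start, end, text))
    return [t for _, _, t in lines]

def clean_line(line):
    line = re.sub(r"\bl\b", "I", line)   # standalone 'l' mistaken for 'I'
    line = re.sub(r"\s+'", "'", line)    # stray space before an apostrophe
    line = re.sub(r"\s{2,}", " ", line)  # collapse repeated whitespace
    return line.lower().strip()
```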

System Architecture

A.L.Ex. was designed to avoid the multiplicity of connected services which formed the architecture of Pyggy; it aimed to be an offline, standalone Artificial Improvisor. While, similarly to Pyggy, speech recognition and generation are still performed by ready-made tools, respectively Apple Enhanced Dictation and Apple Speech Synthesis, these tools run offline, on the same computer. Moreover, the entire text-based dialogue system (coded in Lua and Torch) was encapsulated into a single program which makes system calls to speech recognition and text-to-speech, and is controlled through a graphical user interface which visualizes results (both the recognized and the generated sentences in the dialogue). The core system is then extended with additional modules; it also runs a fault-resilient server which accepts incoming HTTP GET requests from client applications. These applications include software controlling a humanoid robot with pre-programmed motions that are activated when A.L.Ex. speaks (see Figure 2). Applications have been written for controlling both the EZ-Robot JD Humanoid and the Aldebaran Nao.
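A minimal sketch of such a server, using only Python's standard library, is shown below; the /latest route, the port, and the JSON payload are assumptions for illustration, not the actual protocol spoken by the robot-control clients.

```python
# Sketch: the dialogue process serves its latest generated line over HTTP GET,
# so that client applications (e.g., robot controllers) can poll it.
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

latest_line = {"text": ""}
lock = threading.Lock()

class ImprovHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/latest":
            with lock:
                body = json.dumps(latest_line).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

def publish_line(text):
    """Called by the dialogue system whenever a new response is generated."""
    with lock:
        latest_line["text"] = text

if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), ImprovHandler).serve_forever()
```

A robot-control client could then poll /latest with periodic GET requests and trigger its pre-programmed motions whenever the text changes.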

Results

There are challenges associated with testing, and quantitatively evaluating, open-domain dialogue systems (Glass 1999; Higashinaka and others 2014). An obvious and reasonable first measure for qualitative assessment would be similar to that of a human improvisor; that is, the audience-perceived performance level of an Artificial Improvisor during an improvisational performance. Thus, each of these systems has been tested live in front of audiences of between 5 and 100 people, for a total of 25 performances of 7-60 minutes each, between 8 April 2016 and 1 June 2017. As is common in improvisation, show structure and order remained largely consistent, while content varied based on audience suggestion.¹ Through audience feedback, the system has been iteratively improved: enhancements to the neural network-based dialogue system, the addition of performance props (audio and video user interface, robotic avatar), novel improv comedy games involving the AI and the humans (with optional audience participation), and scripted narrative elements performed by the humans.

While human audience evaluation is realistic and comparable with human improvisor evaluation, there are limitations. First, while an audience is evaluating the performance of the AI in a scene, they may be biased by the fact that an artificial intelligence is performing these lines. Due to this bias, it will be worthwhile to compare human- and AI-based responses and scenes offline, where the rater is blind to the source of each side of the dialog. As well, the performance of these systems can be limited at several points in the dialog. Speech recognition is not solved, and environmental noise (e.g., a cheering or laughing audience) can influence the quality of input audio for speech-to-text. Finally, because of the high variability of live theatrical performance, the performance spaces (e.g., dark pub basements or bright conference venues), the narrative elements, the technical difficulties encountered during performance, and iterative experimentation, it is difficult for us to provide a quantitative assessment of the AI-Improvisor in this study.

¹ https://youtube.com/watch?v=FjxBdMRAZlE

Even with these limitations, we believe it is important to report on the successes and failures thus far with qualitative examples, and to leave subjective assessment to the reader. Important future work will include large-scale qualitative assessment, both from trained improvisors judging qualities of improvisation and from the general public.

Version 1: Pyggy

Figure 1: Human performing on-stage with Pyggy. The avatar of the AI is projected behind the human performer.

The first version of AI-based improv, Pyggy, performed in the show Artificial Improvisation live and in front of an audience on April 8, 2016, as part of an invited performance during Rapid Fire Theatre's 2016 Bonfire Festival. While technically demanding, it was generally well received by the audience (Figure 1). Here is a short excerpt:

HUMAN: We're having a picnic! Would you like cheese and bread?
PYGGY: Seen one human, you've seen them all.
HUMAN: Promise me, right now, we're here listening to each other.
PYGGY: For fun? Where are all your queer party friends?
HUMAN: Why don't you have any feelings for me? That's why I had to invite you on a picnic. I wanted to get some time to talk to each other.
PYGGY: I miss your vivid sense of humour.

Figure 2: Human performing onstage with A.L.Ex., personified as a robot. The user interface, showing speech recognition results and machine-generated responses, is projected behind the performers. Image credits anonymised.

Version 2: A.L.Ex. (Artificial Language Experiment)

Independently developed, A.L.Ex. performed for the first time in front of a live audience on July 24, 2016, during an improv drop-in in London. It has since performed 24 times, on two continents (Figure 2). We provide below a short transcript from an early performance with A.L.Ex. As can be noticed, the dialogue system somewhat manages to keep track of the general theme of the conversation (dramatic, with words related to navigation and combat).

HUMAN: Captain, the ship is under attack. The frigate is arriving...
A.L.EX: You see, I don't feel the same way. I just don't want to stay in the position to get burnt.
HUMAN: We're all going to get burned by a cannon fire...
A.L.EX: While you're in there, tell me. When I get dressed, I'll go up there and see what happens.
HUMAN: Captain, you need to get on the deck, the crew is expecting you to speak, to take leadership. We are in the middle of a war and this is an invasion of foreign ships. Captain, show us your leadership.
A.L.EX: We won't be able to navigate without them as we are, serving for us to act like a life of violence and to help us die from the most powerful life of our time.

The subsequent experiments detailed in this section were all performed using iteratively improved versions of A.L.Ex.

Human-AI Collaboration Through Games

With the base system in place, we have explored several alternative formats, often called structures or games in improvisation (Riedl and Stern 2006; Magerko and others 2009). We explored a variety of formats because the context within which the AI-Improvisor performs is critical to the audience's perception of quality. In improvisation, a major tenet is to make one's fellow improvisors shine. We created a show built on structures where both A.L.Ex. and the human improvisors could shine.

Figure 3: Two humans performing on-stage with A.L.Ex. One of the humans is connected remotely, adding to the complexity of the show setup.

Justification Game

The most extreme case of enhancing the stature of a human improvisor arises from games where the actor is confronted with ridiculously difficult challenges that he or she successfully overcomes (Johnstone 1979). One such game is called pick-a-line or lines from a hat, and consists of the player intermittently and randomly picking a line of dialogue (typically unrelated to the current improvisation), reading it aloud, and seamlessly integrating it into the scene. The humour generally arises from the improvisor's skill in justifying that line of dialogue, or from the line being coincidentally appropriate. We found that, because of the limitations of speech recognition and of the dialogue system in A.L.Ex., many of the human-AI interactions ended up following the paradigm of justification games.

Multiple-choice Human-mediated Dialogue

A multiple-choice game was the first format that we explored outside of the basic structure of two improvisors engaging in a dialog in a scenic setting. In this format, the AI visually presented several candidate responses on a screen, but did not say any of them. Instead, an audience volunteer would select their preferred response and read it aloud. In this way, we were able to directly engage an audience member in the performance (this demolition of the fourth wall is common in improvisation). When the audience is invited to directly interact with the AI on stage, an additional tension is introduced in the room: how will an untrained human react if the AI offers multiple interesting candidates, and what if there are no interesting candidates generated? We observed that these games forced the audience member to split attention between the screen and the human improvisor, and that they suffered from low energy and audience engagement.
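A compact sketch of this format follows, where sample_response stands in for whatever generation routine the dialogue system exposes; it is a hypothetical wrapper, not an actual A.L.Ex. API.

```python
# Sketch of the multiple-choice format: show k sampled candidate responses and
# let an audience volunteer pick the one that is read aloud.
def propose_candidates(sample_response, prompt, k=3):
    """sample_response(prompt) -> str is a hypothetical generation wrapper."""
    candidates = [sample_response(prompt) for _ in range(k)]
    for i, candidate in enumerate(candidates, start=1):
        print(f"{i}) {candidate}")
    choice = int(input(f"Audience pick (1-{k}): "))
    return candidates[choice - 1]
```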

Multiple-person Games

Additionally, we explored dynamics where the AI played a single character in a scene with multiple humans. First, we introduced multiple humans in the same physical space. In this situation, A.L.Ex. plays alongside two human performers. We noticed that there is often a tendency for the two humans to form a 'side', acting together 'against' the AI. Much more interesting scene dynamics emerged when we challenged one of the human performers to align with A.L.Ex.'s character in the scene. Extending from this work, we then tried including the second human through a remote connection (Google Hangout, see Fig. 3). A.L.Ex. was then able to interact with both the physical human and the remote human. Network latency continues to prove challenging, and we are still exploring means by which to overcome these challenges. We then instantiated multiple versions of A.L.Ex. in scenes. In this way, we could balance the two humans on stage with two robotic improvisors. This presented opportunities for interesting connections and relationships, as well as challenges, as the timing of the two AI-based improvisors can be noticeably different.

Comparison with ELIZA

Finally, we built an audience interaction game in homage to one of the earliest chatbot systems, ELIZA, by Joseph Weizenbaum (Weizenbaum 1966). In this format, an audience member is invited to the stage to discuss an ailment with an AI therapist, namely A.L.Ex. in ELIZA mode. Interestingly, while ELIZA is powered by relatively simple deterministic response rules applied to decompositions of the human's input statement, this is an audience favorite and often well received during shows. This observation deserves special attention, as the holistic performance of an AI-improvisor should be evaluated not only on how well it is received, but also on the novelty and uniqueness of the scenes it performs.
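To make the contrast concrete, ELIZA-style behaviour can be approximated with a handful of decomposition and reassembly rules; the rules below are an illustrative reconstruction in the spirit of Weizenbaum's program, not the rule set used in A.L.Ex.'s ELIZA mode.

```python
# Tiny ELIZA-style responder: match a decomposition pattern, then reassemble the
# captured fragment into a templated therapist-like question.
import re
import random

RULES = [
    (re.compile(r"\bi need (.*)", re.I),
     ["Why do you need {0}?", "Would it really help you to get {0}?"]),
    (re.compile(r"\bi am (.*)", re.I),
     ["How long have you been {0}?", "Why do you think you are {0}?"]),
    (re.compile(r"\bmy (.*)", re.I),
     ["Tell me more about your {0}.", "Why do you say your {0}?"]),
]

def eliza_reply(text):
    for pattern, templates in RULES:
        match = pattern.search(text)
        if match:
            return random.choice(templates).format(match.group(1).rstrip(".!?"))
    return "Please, go on."
```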

Many games were selected to allow for clear, distinct trade-offs between multiple improvisors within consistent settings. Often our systems fail through misunderstood speech-to-text input or robot/human interruption, due to a lack of social cueing and perception. By embracing and learning from these failures, we will continue to innovate and experiment to better understand and showcase the strengths of A.L.Ex.

Discussion

Future work will incorporate recent advances in deep reinforcement learning for dialogue generation (Ranzato et al. 2015). Through the design of reward functions, more interesting dialogue may be encouraged. Three useful conversational properties recently shown to improve the long-term success of dialogue training are informativity, coherence, and ease of answering (Li and others 2016). Additional reward schemes may improve, or tune, the trained deep neural network-based dialogue managers. Recent work has shown that reinforcement learning can be used to tune music generation architectures (Jaques and others 2016). Rewarding linguistic features (e.g., humor, novelty, alliteration) may prove useful in dialog generation (Hollis, Westbury, and Lefsrud 2016).
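As a purely illustrative sketch (the individual scoring functions are placeholders, not implementations from the cited work), such conversational properties could be combined into a single scalar reward for sequence-level training:

```python
# Sketch: combine the three conversational reward terms mentioned above into one
# scalar reward; each scoring function maps (response, history) -> float.
def dialogue_reward(response, history, informativity, coherence,
                    ease_of_answering, weights=(1.0, 1.0, 1.0)):
    terms = (informativity(response, history),
             coherence(response, history),
             ease_of_answering(response, history))
    return sum(w * t for w, t in zip(weights, terms))
```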

This study focused on building a dialog system for improvisational performance. Improv theatre is a young practice, but there exist several books of rules for novice improvisational training which could be useful for future studies (Napier 2004). Future iterations of these systems could include common improvisational rules, such as 'status contrast' and the 'Yes, and...' theory of accept and expand, as well as comedic rules such as the 'rule-of-three' (Johnstone 1979).

Adversarial methods for natural language are another avenue of recent exploration (Li et al. 2017; Rajeswar et al. 2017). While the results are interesting and informative, these works are still limited in the objective functions and evaluation criteria used, often relying on log-likelihood, BLEU (Papineni et al. 2002), or ROUGE (Lin 2004) scores. Additional evaluation metrics must be devised to score these open-domain dialog systems (Liu et al. 2016; Lipton, Berkowitz, and Elkan 2015; Ranzato et al. 2015).

Natural human conversations are seldom limited in scope: they jump from topic to topic and are laced with metaphor and subtext. Artificial Improvisors of the future should make use of recent advances in artificial memory and attention models. As well, humans often make use of non-verbal cues during dialog. By incorporating this additional information, human(s) could both consciously and subconsciously inform the learning system (Mathewson and Pilarski 2016). Additionally, if the Artificial Improvisor is modeled as a goal-seeking agent, then shared agency could be quantified, and communicative capacity could be learned and optimized during the performance (Pilarski and Mathewson 2015).

While the system is trained to perform dialog, it is not trained to tell a cohesive story with a narrative arc. The addition of memory network advancements may improve callbacks; additional engineering and training will be necessary to collaboratively build a narrative arc. In 1928, William Cook published a book on algorithmic plot development which may serve this purpose; implementations and connections have yet to be explored (Cook 1928).

Thought must be given to the interface through which humans and artificial performers interact (Yee-King and d'Inverno 2016; McCormack and d'Inverno 2012; 2016). The embodiment of the Artificial Improvisor has been investigated with Pyggy and A.L.Ex. using on-screen visualizations and robotics. Stage presence is critical to ensure that an Artificial Improvisation show is enjoyable and engaging. Improvisational performances are not strictly conversational and often demand physicality from performers. Collaboration between scientists and creatives will lead to innovative interactions and immersive art. With the growing popularity of interactive mixed-reality experiences, as well as recent advances in natural language processing, speech, and music generation, there are exciting avenues of future investigation (Oord and others 2016; Arik and others 2017).

Improvisational theatre is a domain where experimentation is encouraged, where interaction is paramount, and where failure flourishes. It allows artificial intelligence agents to be effectively tested, and audience reaction can provide a subjective measure of improvement and cognizance. While this may feel similar to the Turing Test, an early attempt to separate mind from machine through a game of imitation, deception, and fraud, we believe it is much more (Turing 1950). Success will be measured by audience preference to engage in shows incorporating Artificial Improvisation, and by the human desire to participate.

Games like chess and Go are complex, but computational solutions can be approximated. Improvisational theatre demands creativity, rapid artistic generation, and natural language processing. Improvisation is not a zero-sum game, especially as these systems learn to converse in open-domain settings (Glass 1999; Higashinaka and others 2014).

Future work will continue to explore the evaluation of performance in such an open domain. Performances with Artificial Improvisors continue to spur questions and insights from other performers and audiences alike. A formal qualitative evaluation, in the lab, with expert improvisors interacting with the system, is planned to explore system quality with additional rigour. We look forward to the distant goal of the human observer, as a fly on the wall, watching AI-improvisors on-stage in front of a full audience of AI-observers.

References

Arik, S. O., et al. 2017. Deep Voice: Real-time neural text-to-speech. ArXiv e-prints.

Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent Dirichlet allocation. JMLR 3(Jan):993–1022.

Cook, W. W. 1928. Plotto: A New Method of Plot Suggestion for Writers of Creative Fiction. Ellis.

Cukor, G. 1964. My Fair Lady.

Danescu-Niculescu-Mizil, C., and Lee, L. 2011. Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. In Proc Wkshp on CMCL, ACL.

Glass, J. 1999. Challenges for spoken dialogue systems. In Proc 1999 IEEE ASRU Wkshp.

Graves, A. 2013. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850.

Hayes-Roth, B., and Van Gent, R. 1996. Improvisational puppets, actors, and avatars. In Proc Computer Game Developers Conf.

Higashinaka, R., et al. 2014. Towards an open-domain conversational system fully based on natural language processing. In COLING, 928–939.

Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.

Hoffman, G., and Weinberg, G. 2011. Interactive improvisation with a robotic marimba player. In Musical Robots and Interactive Multimodal Systems. Springer. 233–251.

Hollis, G.; Westbury, C.; and Lefsrud, L. 2016. Extrapolating human judgments from skip-gram vector representations of word meaning. Q J Exp Psychol 1–17.

Jaques, N., et al. 2016. Tuning recurrent neural networks with reinforcement learning. CoRR abs/1611.02796.

Johnstone, K. 1979. Impro: Improvisation and the Theatre. Faber and Faber Ltd.

Kiros, R.; Zhu, Y.; Salakhutdinov, R. R.; Zemel, R.; Urtasun, R.; Torralba, A.; and Fidler, S. 2015. Skip-thought vectors. In Advances in NIPS, 3294–3302.

Li, J., et al. 2016. Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541.

Li, J.; Monroe, W.; Shi, T.; Ritter, A.; and Jurafsky, D. 2017. Adversarial learning for neural dialogue generation. arXiv preprint arXiv:1701.06547.

Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Proc ACL Wkshp, Vol 8.

Lipton, Z. C.; Berkowitz, J.; and Elkan, C. 2015. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019.

Liu, C.-W.; Lowe, R.; Serban, I. V.; Noseworthy, M.; Charlin, L.; and Pineau, J. 2016. How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. arXiv preprint arXiv:1603.08023.

Magerko, B., et al. 2009. An empirical study of cognition and theatrical improvisation. In Creativity and Cognition.

Magerko, B.; Dohogne, P.; and DeLeon, C. 2011. Employing fuzzy concept for digital improvisational theatre.

Martin, L. J.; Harrison, B.; and Riedl, M. O. 2016. Improvisational computational storytelling in open worlds. In Interactive Storytelling: 9th Intl Conf on Interactive Digital Storytelling, Los Angeles, USA, November 15–18, 2016, Proc 9, 73–84. Springer.

Mathewson, K. W., and Pilarski, P. M. 2016. Simultaneous control and human feedback in the training of a robotic agent with actor-critic reinforcement learning. In IJCAI Interactive Machine Learning Wkshp.

McCormack, J., and d'Inverno, M. 2012. Computers and creativity: The road ahead. In Computers and Creativity. Springer. 421–424.

McCormack, J., and d'Inverno, M. 2016. Designing improvisational interfaces.

Mikolov, T., et al. 2010. Recurrent neural network based language model. In Interspeech, volume 2, 3.

Mikolov, T., and Zweig, G. 2012. Context dependent recurrent neural network language model. SLT 12:234–239.

Mirowski, P.; Chopra, S.; Balakrishnan, S.; and Bangalore, S. 2010. Feature-rich continuous language models for speech recognition. In Spoken Language Technology Wkshp, 2010 IEEE, 241–246. IEEE.

Mori, M.; MacDorman, K. F.; and Kageki, N. 2012. The uncanny valley [from the field]. IEEE Robotics & Automation Magazine 19(2):98–100.

Napier, M. 2004. Improvise: Scene from the Inside Out. Heinemann Drama.

O'Neill, B.; Piplica, A.; Fuller, D.; and Magerko, B. 2011. A knowledge-based framework for the collaborative improvisation of scene introductions. In ICIDS, 85–96. Springer.

Oord, A. v. d., et al. 2016. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.

Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: A method for automatic evaluation of machine translation. In Proc 40th Meeting of the Association for Computational Linguistics, 311–318.

Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In EMNLP, volume 14, 1532–1543.

Perlin, K., and Goldberg, A. 1996. Improv: A system for scripting interactive actors in virtual worlds. In Proc Conf on Computer Graphics and Interactive Techniques, 205–216. ACM.

Pietquin, O. 2005. A Framework for Unsupervised Learning of Dialogue Strategies. Presses univ. de Louvain.

Pilarski, P. M.; Sutton, R. S.; and Mathewson, K. W. 2015. Prosthetic devices as goal-seeking agents. In 2nd Wkshp on Present and Future of Non-Invasive Peripheral-Nervous-System Machine Interfaces, Singapore.

Rajeswar, S.; Subramanian, S.; Dutil, F.; Pal, C.; and Courville, A. 2017. Adversarial generation of natural language. arXiv preprint arXiv:1705.10929.

Ranzato, M.; Chopra, S.; Auli, M.; and Zaremba, W. 2015. Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732.

Riedl, M. O., and Stern, A. 2006. Believable agents and intelligent story adaptation for interactive storytelling. In Intl Conf on Technologies for Interactive Digital Storytelling and Entertainment, 1–12. Springer.

Shang, L.; Lu, Z.; and Li, H. 2015. Neural responding machine for short-text conversation. arXiv preprint arXiv:1503.02364.

Shaw, G. B. 1913. Pygmalion. Simon and Schuster.

Shelley, M. W. 1818. Frankenstein. WW Norton.

Si, M.; Marsella, S. C.; and Pynadath, D. V. Thespian: An architecture for interactive pedagogical drama.

Sutskever, I.; Martens, J.; and Hinton, G. E. 2011. Generating text with recurrent neural networks. In Proc 28th ICML, 1017–1024.

Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to sequence learning with neural networks. In NIPS, 3104–3112.

Thomaz, A.; Hoffman, G.; Cakmak, M.; et al. 2016. Computational human-robot interaction. Foundations and Trends in Robotics 4(2-3):105–223.

Turing, A. M. 1950. Computing machinery and intelligence. Mind 59(236):433–460.

Vinyals, O., and Le, Q. 2015. A neural conversational model. arXiv preprint arXiv:1506.05869.

Weizenbaum, J. 1966. ELIZA: A computer program for the study of natural language communication between man and machine. Communications of the ACM.

Yee-King, M., and d'Inverno, M. 2016. Experience driven design of creative systems. In Proc 7th Computational Creativity Conf. Université Pierre et Marie Curie.

Zeglin, G., et al. 2014. HERB's Sure Thing: A rapid drama system for rehearsing and performing live robot theater. In 2014 IEEE Intl Wkshp on Advanced Robotics and its Social Impacts, 129–136.

Zhang, L.; Gillies, M.; Barnden, J. A.; Hendley, R. J.; Lee, M. G.; and Wallington, A. M. 2007. Affect detection and an automated improvisational AI actor in e-drama. In Artificial Intelligence for Human Computing. Springer. 339–358.