

Strategies for Structuring Story Generation

Angela Fan
FAIR, Paris
LORIA, Nancy
[email protected]

Mike Lewis
FAIR, Seattle
[email protected]

Yann Dauphin
Google AI*
[email protected]

Abstract

Writers generally rely on plans or sketches to write long stories, but most current language models generate word by word from left to right. We explore coarse-to-fine models for creating narrative texts of several hundred words, and introduce new models which decompose stories by abstracting over actions and entities. The model first generates the predicate-argument structure of the text, where different mentions of the same entity are marked with placeholder tokens. It then generates a surface realization of the predicate-argument structure, and finally replaces the entity placeholders with context-sensitive names and references. Human judges prefer the stories from our models to a wide range of previous approaches to hierarchical text generation. Extensive analysis shows that our methods can help improve the diversity and coherence of events and entities in generated stories.

1 Introduction

Stories exhibit structure at multiple levels. While existing language models can generate stories with good local coherence, they struggle to coalesce individual phrases into coherent plots or even maintain consistency of the characters throughout the story. One reason for this failure is that classical language models generate the whole story at the word level, which makes it difficult to capture the high-level interactions between the plot points.

To address this, we investigate novel decompositions of the story generation process that break down the problem into a series of easier coarse-to-fine generation problems. These decompositions can offer three advantages:

• They allow more abstract representations to be generated first, in which challenging long-range dependencies may be more apparent.

*Work done while at Facebook AI Research

Figure 1: Proposed Model. Conditioned upon the prompt, we generate sequences of predicates and arguments. Then, a story is generated with placeholder entities such as ent0. Finally, we replace the placeholders with specific references.

• They allow specialized modelling techniques for the different stages, which exploit the structure of the specific sub-problem.

• They are applicable to any textual dataset and require no manual labelling.

Several hierarchical models for story generation have recently been proposed (Xu et al., 2018; Yao et al., 2019), but it is not well understood which properties characterize a good decomposition. We therefore implement and evaluate several representative approaches based on keyword extraction, sentence compression, and summarization.

We build on this understanding to devise the proposed decomposition (Figure 1). Our approach breaks down the generation process into three steps: modelling the action sequence, the narrative, and then the entities (such as story characters). To model action sequences, we first generate the predicate-argument structure of the story by generating a sequence of verbs and arguments. This representation is more structured than free text, making it easier for the model to learn dependencies across events. To model entities, we initially generate a version of the story where different mentions of the same entity are replaced with placeholder tokens. Finally, we re-write these tokens into different references for the entity, based on both its previous mentions and the global story context.

arXiv:1902.01109v1 [cs.CL] 4 Feb 2019

The models are trained on a large dataset of 300k stories, and we evaluate quality both in terms of human judgments and using automatic metrics. We find that our novel approach leads to significantly better story generation. Specifically, we show that generating the action sequence first makes the model less prone to generating generic events, leading to a much greater diversity of verbs. We also find that by using sub-word modelling for the entities, our model can produce novel names for locations and characters that are appropriate given the story context.

2 Models

The crucial challenge of long story generation lies in maintaining coherence across a large number of generated sentences, in terms of both the logical flow of the story and the characters and entities. While there has been much recent progress in left-to-right text generation, particularly using self-attentive architectures (Dai et al., 2018; Liu et al., 2018), we find that models still struggle to maintain coherence to produce interesting stories on par with human writing. We therefore introduce strategies to decompose neural story generation into coarse-to-fine steps to make modelling high-level dependencies easier.

2.1 Tractable Decompositions

In general, we can decompose the generation process by converting a story x into a more abstract representation z. The negative log likelihood of the decomposed problem is given by

L = − log Σ_z p(x|z) p(z). (1)

We can generate from this model by first sampling from p(z) and then sampling from p(x|z). However, the marginalization over z is in general intractable, except in special cases where every x can only be generated by a single z (for example, if the transformation removed all occurrences of certain tokens). Instead, we minimize a variational upper bound of the loss by constructing a deterministic posterior q(z|x) = 1_{z=z*}, where z* can be given by running a semantic role labeller or coreference resolution system on x. Put together, we optimize the following loss:

z* = argmax_z p(z|x) (2)

L ≤ − log p(x|z*) − log p(z*) (3)

This approach allows the models p(z*) and p(x|z*) to be trained tractably and separately.

2.2 Model Architectures

We build upon the convolutional sequence-to-sequence architecture (Gehring et al., 2017). Deep convolutional networks are used as the encoder and decoder, connected with an attention module (Bahdanau et al., 2015) that performs a weighted sum of the encoder output. The decoder uses a gated multi-head self-attention mechanism (Vaswani et al., 2017; Fan et al., 2018) to allow the model to refer to previously generated words and to improve its ability to model long-range context.

2.3 Modelling Action Sequences

To decompose a story into a structured form that emphasizes logical sequences of actions, we use Semantic Role Labeling (SRL). SRL identifies predicates and arguments in sentences, and assigns each argument a semantic role. This representation abstracts over different ways of expressing the same semantic content. For example, John ate the cake and the cake that John ate would receive identical semantic representations.

Conditioned upon the prompt, we generate an SRL decomposition of the story by concatenating the predicates and arguments identified by a pre-trained model (He et al., 2017; Tan et al., 2018)1 and separating sentences with delimiter tokens. We place the predicate verb first, followed by its arguments in canonical order. To focus on the main narrative, we retain only core arguments.
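As a concrete illustration, the linearization described above (verb first, core arguments in canonical order, sentences separated by a delimiter token) might be sketched as follows. The frame format and the delimiter string are assumptions for illustration, not the paper's exact implementation:

```python
def linearize_srl(frames, sent_delim="</s>"):
    """Flatten SRL output into an action-plan token sequence.

    `frames` is one list per sentence, each a list of
    (verb, [core_arguments]) tuples -- an assumed format.
    Each frame contributes its verb followed by its arguments,
    and sentences are separated by `sent_delim`.
    """
    tokens = []
    for sentence_frames in frames:
        for verb, args in sentence_frames:
            tokens.append(verb)       # predicate verb comes first
            tokens.extend(args)       # then core arguments in order
        tokens.append(sent_delim)     # sentence boundary marker
    return " ".join(tokens)
```

For example, `linearize_srl([[("ate", ["John", "the cake"])]])` yields `"ate John the cake </s>"`, matching the predicate-first ordering described above.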

Verb Attention Mechanism  SRL parses are more structured than free text, allowing scope for more structured models. To encourage the model to consider sequences of verbs, we designate one of the heads of the decoder's multi-head self-attention to be a verb-attention head (see Figure 2). By masking the self-attention appropriately, this verb-attention head can only attend to previously generated verbs. When the text does not yet have a verb, the model attends to a padding token. We show that focusing on verbs with a specific attention head generates a more diverse array of verbs and reduces repetition in generation.

1For predicate identification, we use https://github.com/luheng/deep_srl; for SRL given predicates, we use https://github.com/XMUNLP/Tagger

Figure 2: Verb-Attention. To improve the model's ability to condition upon past verbs, one head of the decoder's self-attention mechanism is specialized to only attend to previously generated verbs.
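A minimal sketch of the verb-head masking idea, assuming a boolean per-position verb flag and a padding token at a known index (both illustrative conventions rather than the paper's exact implementation):

```python
def verb_attention_mask(is_verb, pad_idx=0):
    """Binary self-attention mask for the verb-attention head.

    Position i may attend only to earlier positions that hold verbs.
    If no verb has been generated yet, it attends to the padding
    position instead (assumed to sit at index `pad_idx`).
    Returns an n x n list-of-lists mask of 0s and 1s.
    """
    n = len(is_verb)
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        prior_verbs = [j for j in range(i) if is_verb[j]]
        if prior_verbs:
            for j in prior_verbs:
                mask[i][j] = 1        # attend to every earlier verb
        else:
            mask[i][pad_idx] = 1      # fall back to the padding token
    return mask
```

In a real decoder this mask would be applied (as an additive -inf mask, for instance) to the scores of the single designated head, leaving the other heads unrestricted.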

2.4 Modelling Entities

The challenge of modelling characters throughout a story is twofold: first, entities such as character names are rare tokens, which makes them hard for neural language models to model. Human stories often feature novel character or location names. Second, maintaining the consistency of a specific set of characters is difficult, as the same entity may be referenced by many different strings throughout a story; for example, Bilbo Baggins, he, and the hobbit may all refer to the same entity. It is challenging for existing language models to track which words refer to which entity purely from a language modelling objective.

We address both problems by first generating a form of the story in which different mentions of the same entity are replaced by a placeholder token (e.g. ent0), similar to Hermann et al. (2015). We then use a sub-word seq2seq model trained to replace each mention with a reference, based on its context. The sub-word model is better equipped to model rare words, and the placeholder tokens make maintaining consistency easier.

2.4.1 Generating Entity-Anonymized Stories

We explore two approaches to identifying and clustering entities:

• NER Entity Anonymization: We use a named entity recognition (NER) model2 to identify all people, organizations, and locations. We replace these spans with placeholder tokens (e.g. ent0). If any two entity mentions have an identical string, we replace them with the same placeholder. For example, all mentions of Bilbo Baggins will be abstracted to the same entity token, but Bilbo would be a separate abstract entity.

• Coreference-based Entity Anonymization: The above approach cannot detect different mentions of an entity that use different strings. Instead, we use the coreference resolution model from Lee et al. (2018)3 to identify clusters of mentions. All spans in the same cluster are then replaced with the same entity placeholder string. Coreference models do not detect singleton mentions, so we also replace non-coreferent named entities with unique placeholders.

2Specifically, spaCy: https://spacy.io/api/entityrecognizer, en_core_web_lg
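Given mention clusters from either method, the placeholder substitution itself can be sketched as below. The input format, mention clusters as lists of (start, end) character spans, is an assumption; real NER or coreference output would be converted into this shape first:

```python
def anonymize_entities(text, clusters):
    """Replace entity mentions with placeholder tokens ent0, ent1, ...

    `clusters` is a list of clusters; each cluster is a list of
    (start, end) character spans referring to the same entity.
    All spans in cluster i are replaced with the token "ent{i}".
    """
    replacements = []
    for idx, mentions in enumerate(clusters):
        for start, end in mentions:
            replacements.append((start, end, f"ent{idx}"))
    # Apply replacements right-to-left so earlier offsets stay valid.
    for start, end, token in sorted(replacements, reverse=True):
        text = text[:start] + token + text[end:]
    return text
```

For instance, with clusters covering both mentions of "Bilbo" and one of "Gandalf", `"Bilbo met Gandalf. Bilbo smiled."` becomes `"ent0 met ent1. ent0 smiled."`.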

2.4.2 Generating Entity References in a Story

We train models to replace placeholder entity mentions with the correct surface form, for both NER-based and coreference-based entity-anonymized stories. Both of our models use a seq2seq architecture that generates an entity reference based on its placeholder and the story. To better model the specific challenges of entity generation, we also make use of a pointer mechanism and sub-word modelling.

Pointer Mechanism  Generating multiple consistent mentions of rare entity names is challenging. To make it easier for the model to re-use previous names for an entity, we augment the standard seq2seq decoder with a pointer-copy mechanism (Vinyals et al., 2015). To generate an entity reference, the decoder can either generate a new entity string or choose to copy an already generated entity reference, which encourages the model to use consistent naming for the entities.

To train the pointer mechanism, the final hidden state of the model h is used as input to a classifier p_copy(h) = σ(w_copy · h), where w_copy is a fixed-dimension parameter vector. When the classifier predicts copy, the previously decoded entity token with the maximum attention value is copied. One head of the decoder's multi-head self-attention mechanism is used as the pointer attention head, to allow the heads to specialize.

3https://github.com/kentonl/e2e-coref

Figure 3: Input for coreferent entity reference generation. The model has a representation of the entity context in bag-of-words form, all previously predicted values for the same anonymized entity token, and the full text of the story. The green circle represents the entity mention the model is attempting to fill.
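The copy gate p_copy(h) = σ(w_copy · h) reduces to a dot product followed by a sigmoid. A sketch using plain lists in place of tensors, purely for illustration:

```python
import math

def copy_probability(h, w_copy):
    """Pointer-mechanism copy gate: p_copy = sigmoid(w_copy . h).

    `h` is the decoder's final hidden state and `w_copy` a learned
    parameter vector of the same dimension. When the resulting
    probability exceeds 0.5, the decoder would copy the previously
    decoded entity token with the highest attention weight instead
    of generating a fresh token.
    """
    score = sum(hi * wi for hi, wi in zip(h, w_copy))
    return 1.0 / (1.0 + math.exp(-score))
```

A zero weight vector gives p_copy = 0.5 (an uninformative gate); training pushes the score positive when copying a previous mention is the right move.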

Sub-word Modelling  Entities are often rare or novel words, so word-based vocabularies can be inadequate. We compare entity generation using word-based, byte-pair encoding (BPE) (Sennrich et al., 2015), and character-level models.

NER-based Entity Reference Generation  Here, each placeholder string should map onto one (possibly multi-word) surface form: e.g. all occurrences of the placeholder ent0 should map to only a single string, such as Bilbo Baggins. We train a simple model that maps a combination of a placeholder token and a story (with anonymized entities) to the surface form of the placeholder. While the placeholder can appear multiple times, we only make one prediction for each placeholder, as all of its occurrences correspond to the same string.

Coreference-based Entity Reference Generation  Generating entities based on coreference clusters is more challenging than for our NER entity clusters, because different mentions of the same entity may use different surface forms. We generate a separate reference for each mention by adding the following inputs to the above model:

• A bag-of-words context window around the specific entity mention, which allows local context to determine whether an entity should be a name, a pronoun, or a nominal reference.

• Previously generated references for the same entity placeholder. For example, if the model is filling in the third instance of ent0, it receives the previous two generations for ent0 (Bilbo, him). Providing the previous references allows the model to maintain greater consistency between generations.
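Putting these inputs together, a hypothetical helper for assembling one mention's model input might look like the sketch below; the field names, token-level spans, and window size are illustrative assumptions:

```python
def build_mention_input(story_tokens, mention_idx, previous_refs, window=5):
    """Assemble the inputs for filling one placeholder mention.

    `story_tokens`: the anonymized story as a token list.
    `mention_idx`: index of the placeholder token being filled.
    `previous_refs`: references already predicted for this placeholder.
    Returns the three inputs described above: a bag-of-words context
    window, the previous references, and the full story.
    """
    lo = max(0, mention_idx - window)
    hi = min(len(story_tokens), mention_idx + window + 1)
    # Bag of words: local context without the placeholder itself.
    bag = sorted(set(story_tokens[lo:hi]) - {story_tokens[mention_idx]})
    return {
        "context_bow": bag,
        "previous_refs": list(previous_refs),
        "story": " ".join(story_tokens),
    }
```

A real system would encode each field separately (as in Figure 3) rather than pass raw strings, but the decomposition of the input is the same.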

3 Experimental Setup

3.1 Data

We use the WRITINGPROMPTS dataset4 of 300k story premises paired with long stories. Stories are 734 words on average, making generation significantly longer than in related work on storyline generation. In this work, we focus on the prompt-to-story aspect of this task. We follow the previous preprocessing of limiting stories to 1000 words and fixing the vocabulary size to 19,025 for prompts and 104,960 for stories.

3.2 Baselines

We compare our results to the Fusion model from Fan et al. (2018), which generates the full story directly from the prompt. We also implement various decomposition strategies as baselines:

• Summarization: We propose a new baseline that generates a summary conditioned upon the prompt and then a story conditioned upon the summary. Story summaries are obtained with a multi-sentence summarization model (Wu et al., 2019) trained on full-text CNN-DailyMail5 and applied to the stories.

• Keyword Extraction: We generate a series of keywords conditioned upon the prompt and then a story conditioned upon the keywords, based on Yao et al. (2019). Following Yao et al., we extract keywords with the RAKE algorithm (Rose et al., 2010)6. Yao et al. extract one word per sentence, but we find that extracting n = 10 keyword phrases per story works well, as our stories are much longer.

• Sentence Compression: Inspired by Xu et al. (2018), we generate a story with compressed sentences conditioned upon the prompt and then a story conditioned upon the compressed, shorter story. We use the same deletion-based compression data as Xu et al., from Filippova and Altun (2013)7. We train a seq2seq model to compress all non-dialogue story sentences. The compressed sentences are concatenated to form the compressed story.

4Dataset download  5Dataset download  6https://pypi.org/project/rake-nltk/

Figure 4: Human evaluations of different decomposed models for story generation. We find that using SRL action plans and coreference resolution to build entity clusters generates stories that are preferred by human judges.

Decomposition               Stage 1 −log p(z*)   Stage 2 −log p(x|z*)
Summary                     4.20                 5.09
Keyword                     6.92                 4.23
Compression                 5.05                 3.64
SRL Action Plan             2.72                 3.95
NER Entity Anonymization    3.32                 4.75
Coreference Anonymization   3.15                 4.55

Table 1: Negative log likelihood of generating stories using different decompositions (lower is better). Stage 1 is the generation of the intermediate representation z*, and Stage 2 is the generation of the story x conditioned upon z*. Entity generation uses a word-based vocabulary to be consistent with the other models.

3.3 Training

We implement models using fairseq-py (Gehring et al., 2017)8 in PyTorch and train Fan et al. (2018)'s convolutional architecture. We tune all hyperparameters on validation data.

3.4 Generation

We suppress the generation of unknown tokens. We require stories to be at least 150 words and cut off a story at the nearest sentence boundary for stories longer than 250 words (to ease human evaluation). We generate stories with temperature 0.8 and random top-k sampling, where next words are sampled from the top k = 10 candidates rather than from the entire vocabulary distribution.

7Dataset download  8https://github.com/pytorch/fairseq/
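The sampling procedure from Section 3.4 (temperature 0.8, top k = 10) can be sketched as below; this is a generic top-k implementation, not the paper's actual code:

```python
import math
import random

def sample_top_k(logits, k=10, temperature=0.8, rng=random):
    """Sample a token index with temperature and top-k truncation.

    Keeps only the k highest-scoring tokens, rescales their logits
    by the temperature, renormalizes with a softmax, and samples
    from the truncated distribution.
    """
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax over the truncated set.
    m = max(scaled)
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]
    return rng.choices(top, weights=probs, k=1)[0]
```

Lower temperatures sharpen the truncated distribution toward the most likely token, while top-k alone already prevents very unlikely words from ever being sampled.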

4 Experiments

4.1 Comparing Decomposition Strategies

Automated Evaluation  We compare the relative difficulty of modelling through each decomposition strategy by measuring the log loss of the different stages in Table 1. We observe that generating the SRL structure has a lower negative log-likelihood, and so is much easier than generating summaries, keywords, or compressed sentences: a benefit of its more structured form. We find keyword generation especially difficult, as the identified keywords are often the more salient, rare words appearing in the story, which are challenging for neural seq2seq models. This suggests that rare words should appear mostly at the last levels of the decomposition. Finally, we compare models with entity-anonymized stories as an intermediate representation, using either NER-based or coreference-based entity anonymization. Entity references are then filled in using a word-based model9. The entity fill is the more difficult stage.

Human Evaluation  To compare overall story quality across the decomposition strategies, we conduct a human evaluation. Judges marked which story they preferred from 2 choices. 100 stories are evaluated for each model by 3 judges.

Figure 5 shows that human evaluators prefer our novel decompositions over the carefully tuned Fusion model from Fan et al. (2018) by about 60% in a blind comparison. We see additive gains from modelling actions and entities.

In a second study, evaluators compared the baselines against stories generated by our strongest model, which uses SRL-based action plans and coreference-based entity anonymization. In all cases, our full decomposition is preferred.

9To make likelihoods comparable across models.


Figure 5: Our decomposition can generate more coherent stories than previous work.

Qualitatively, we find that many poor generations result from mistakes in early stages. Subsequent models were not exposed to such errors during training, so they are not able to recover.

4.2 Effect of SRL Decomposition

Human-written stories feature a wide variety of events, while neural models are plagued by generic generations and repetition.

Table 2 quantifies model performance on two metrics that assess action diversity: (1) the number of unique verbs generated, averaged across all stories, and (2) the percentage of diverse verbs, measured as the percentage of all verbs generated in the test set that are not among the top 5 most frequent verbs. A higher percentage indicates more diverse events.10
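These two diversity metrics can be computed as in the following sketch (the exact verb-identification step, done with spaCy in the paper, is assumed to have happened already):

```python
from collections import Counter

def verb_diversity(story_verbs, top_n=5):
    """Compute the two action-diversity metrics described above.

    `story_verbs`: one list of verb tokens per story.
    Returns (mean unique verbs per story, percentage of verb tokens
    that fall outside the `top_n` most frequent verbs overall).
    """
    unique_per_story = sum(len(set(v)) for v in story_verbs) / len(story_verbs)
    counts = Counter(v for verbs in story_verbs for v in verbs)
    top = {v for v, _ in counts.most_common(top_n)}
    total = sum(counts.values())
    diverse = sum(c for v, c in counts.items() if v not in top)
    return unique_per_story, 100.0 * diverse / total
```

For instance, a corpus dominated by a handful of verbs like "said" and "went" scores low on the second metric even if each story uses a few distinct verbs.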

Our decomposition using the SRL predicate-argument structure improves the model's ability to generate diverse verbs. Adding verb attention leads to further improvement. Qualitatively, the model can often outline clear action sequences, as shown in Figure 6. However, all models remain far from matching the diversity of human stories.

4.3 Comparing Entity Reference Models

We explored a variety of different ways to generate the full text of abstracted entities, using different amounts of context and different granularities of sub-word generation. To compare these models, we calculated their accuracy at predicting the correct reference in Table 3. Each model evaluates n = 10, 50, 100 different entities in the test set: 1 real and n - 1 randomly sampled distractors. Models must give the true mention the highest likelihood. We analyze accuracy on the first mention of an entity, an assessment of novelty, and on subsequent references, which measures coherence.

10We identify verbs using spaCy: https://spacy.io/

Figure 6: Example generated action plan. It shows a plausible sequence of actions for a character.

Model              # Unique Verbs   % Diverse Verbs
Human Stories      34.0             76.5
Fusion             10.3             61.1
Summary            12.4             60.6
Keyword             9.1             58.2
Compression        10.3             54.3
SRL                14.4             62.5
+ verb-attention   15.9             64.9

Table 2: Action generation. Generating the SRL structure improves verb diversity and reduces repetition.

Effect of Sub-word Modelling  Table 3 shows that using a character-level vocabulary for entity generation significantly outperforms BPE and word-based models, because of the diversity of entity names. This result highlights a key advantage of multi-stage modelling: it allows the use of specialized modelling techniques for each sub-task.

Effect of Additional Context  Entity references should be contextual. Firstly, names must be appropriate for the story setting: Bilbo Baggins is more appropriate for a fantasy novel than for one set in the present day. Subsequent references to the character may be briefer, depending on context; for example, he is more likely to be referred to as he or Bilbo than by his full name in the next sentence.

We compare three models' ability to name entities based on context (using coreference anonymization): a model that does not receive the story, a model that uses only leftward context (as in Clark et al. (2018)), and a model that uses the full story. We show in Table 3 that having access to the full story provides the best performance. Having no access to the story significantly decreases ranking accuracy, even though the model still receives the local context window of the entity as input. The left-story-context model performs significantly better, but looking at the complete story provides additional gains. We note that full-story context can only be provided in a multi-stage generation approach.

                     First Mentions                 Subsequent Mentions
Model                Rank 10  Rank 50  Rank 100     Rank 10  Rank 50  Rank 100
Word-Based           42.3     25.4     17.2         48.1     38.4     28.8
BPE                  48.1     20.3     25.5         52.5     50.7     48.8
Character-level      64.2     51.0     35.6         66.1     55.0     51.2
No story             50.33    40.0     26.7         54.7     51.3     30.4
Left story context   59.1     49.6     33.3         62.9     53.2     49.4
Full story           64.2     51.0     35.6         66.1     55.0     51.2

Table 3: Accuracy at choosing the correct reference string for a mention, discriminating against 10, 50, and 100 random distractors. We break out results for the first mention of an entity (requiring novelty to produce an appropriate name in the context) and subsequent references (typically pronouns, nominal references, or shorter forms of names). We compare the effect of sub-word modelling and of providing longer contexts.

Model                             # Unique Entities
Human Stories                     2.99
Fusion                            0.47
Summary                           0.67
Keyword                           0.81
Compression                       0.21
SRL + NER Entity Anonymization    2.16
SRL + Coreference Anonymization   1.59

Table 4: Diversity of entity names. Baseline models generate few unique entities per story. Our decompositions generate more, but still fewer than human stories. Using coreference resolution to build entity clusters reduces diversity here, partly due to re-using existing names more and partly due to greater use of pronouns.

Model                             # Coref Chains   Unique Names per Chain
Human Stories                     4.77             3.41
Fusion                            2.89             2.42
Summary                           3.37             2.08
Keyword                           2.34             1.65
Compression                       2.84             2.09
SRL + NER Entity Anonymization    4.09             2.49
SRL + Coreference Anonymization   4.27             3.15

Table 5: Analysis of non-singleton coreference clusters. Baseline models generate very few distinct coreference chains and repetitive mentions within clusters. Our models generate larger and more diverse clusters.

Qualitative Examples  Figure 7 shows examples of entity naming in three stories of different genres. The models adapt to the context, for example generating The princess and The Queen when the context includes monarchy.

Figure 7: Generating entity references for different genres, using entity-anonymized human-written stories. Models use the story context to fill in relevant entities.

4.4 Effect of Entity Anonymization

To understand the effectiveness of the entity generation models, we examine their performance by analyzing generation diversity.

Diversity of Entity Names  Human-written stories often contain many diverse, novel names for people and places. However, these tokens are rare and consequently difficult for standard neural models to generate. Table 4 shows that the fusion model and the baseline decomposition strategies generate very few unique entities in each story. Generated entities are often generic names such as John.

In contrast, our proposed decompositions generate substantially more unique entities. We found that using coreference resolution for entity anonymization led to fewer unique entity names than generating the names independently. This result can be explained by the coreference-based model re-using previous names more frequently, and also by its greater use of pronouns.

Coherence of Entity Clusters  Well-structured stories refer back to previously mentioned characters and events in a consistent manner. To evaluate whether the generated stories have these characteristics, we examine their coreference properties in Table 5. We quantify the average number of coreference clusters and the diversity of entities within each cluster (e.g. the cluster Bilbo, he, the hobbit is more diverse than the cluster he, he, he).

Our full model produces more non-singleton coreference chains, suggesting greater coherence, and also gives different mentions of the same entity more diverse names. However, both numbers are still lower than for human-written stories, indicating potential for future work.
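The two cluster statistics (non-singleton chains per story, and unique names per chain) can be computed as in this sketch, assuming mention strings have already been grouped by cluster:

```python
def cluster_stats(stories_clusters):
    """Table-5-style coreference statistics.

    `stories_clusters`: for each story, a list of clusters, each
    cluster a list of its mention strings.
    Returns (average non-singleton chains per story,
             average unique mention strings per non-singleton chain).
    """
    per_story_chains = []
    for clusters in stories_clusters:
        # Singleton clusters are excluded, as in the analysis above.
        per_story_chains.append([c for c in clusters if len(c) > 1])
    n_chains = sum(len(c) for c in per_story_chains) / len(stories_clusters)
    all_chains = [c for story in per_story_chains for c in story]
    uniq = sum(len(set(c)) for c in all_chains) / len(all_chains)
    return n_chains, uniq
```

On this measure, the chain ["Bilbo", "he", "the hobbit"] contributes 3 unique names, while ["he", "he", "he"] contributes only 1.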

Qualitative Example  Figure 8 displays a sentence constructed to require the generation of an entity as the final word. The fusion model does not perform any implicit coreference to associate the allergy with his dog. In contrast, the coreference entity fill produces a high-quality completion.

5 Related Work

5.1 Story Generation with Planning

Story generation by first composing a plan has been explored using many different techniques. Traditional approaches organized sequences of character actions using hand-crafted models (Riedl and Young, 2010; Porteous and Cavazza, 2009). Recent work has extended this to modelling story events (Martin et al., 2017; Mostafazadeh et al., 2016) and plot graphs (Li et al., 2013), or to generating by conditioning upon sequences of images (Huang et al., 2016) or descriptions (Jain et al., 2017).

We build on previous work that decomposes generation into separate steps. For example, Xu et al. (2018) learn a story-skeleton extraction model and a generative model conditioned upon the skeleton, using reinforcement learning to train the two together. Zhou et al. (2018) train a storyline extraction model for news article generation, but require additional supervision from manually annotated storylines. Yao et al. (2019) use the RAKE algorithm (Rose et al., 2010) to extract storylines, and condition upon the storyline to write the full story, using dynamic and static schemas that govern whether the storyline is allowed to change during the writing process.

Figure 8: Constructed sentence where the last word refers to an entity. The coreference model is able to track the entities, whereas the fusion model relies heavily on local context to generate the next words.

5.2 Entity Language Models

An outstanding challenge in text generation is modelling and tracking entities through a document. Centering (Grosz et al., 1995) gives a theoretical account of how referring expressions for entities are chosen in discourse context. Named entity recognition has been incorporated into language models since at least Gotoh et al. (1999), and has been shown to improve domain adaptation (Liu and Liu, 2007). Language models have been extended to model entities based on additional information, such as entity type (Parvez et al., 2018). Recent work has incorporated learned representations of entities and other unknown words (Kobayashi et al., 2017), as well as explicitly modelling entities by dynamically updating these representations (Ji et al., 2017). Dynamic updates to entity representations are used in other story generation models (Clark et al., 2018).

6 Conclusion

We proposed an effective method for writing short stories by separating the generation of actions and entities. We show through human evaluation and automated metrics that our novel decomposition significantly improves story quality.

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proc. of ICLR.

Elizabeth Clark, Yangfeng Ji, and Noah A. Smith. 2018. Neural text generation in stories using entity representations as context. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2250–2260.

Zihang Dai, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. 2018. Transformer-XL: Language modeling with longer-term dependency.

Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical neural story generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.

Katja Filippova and Yasemin Altun. 2013. Overcoming the lack of parallel data in sentence compression. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N. Dauphin. 2017. Convolutional sequence to sequence learning. In Proc. of ICML.

Yoshihiko Gotoh, Steve Renals, and Gethin Williams. 1999. Named entity tagged language models. In Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 513–516. IEEE.

Barbara J. Grosz, Scott Weinstein, and Aravind K. Joshi. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2):203–225.

Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep semantic role labeling: What works and what's next. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015. Teaching machines to read and comprehend. In Advances in Neural Information Processing Systems, pages 1693–1701.

Ting-Hao Kenneth Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, et al. 2016. Visual storytelling. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1233–1239.

Parag Jain, Priyanka Agrawal, Abhijit Mishra, Mohak Sukhwani, Anirban Laha, and Karthik Sankaranarayanan. 2017. Story generation from sequence of independent short descriptions. arXiv preprint arXiv:1707.05501.

Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, and Noah A. Smith. 2017. Dynamic entity representations in neural language models. arXiv preprint arXiv:1708.00781.

Sosuke Kobayashi, Naoaki Okazaki, and Kentaro Inui. 2017. A neural language model for dynamically representing the meanings of unknown words and entities in a discourse. arXiv preprint arXiv:1709.01679.

Kenton Lee, Luheng He, and Luke Zettlemoyer. 2018. Higher-order coreference resolution with coarse-to-fine inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 687–692.

Boyang Li, Stephen Lee-Urban, George Johnston, and Mark Riedl. 2013. Story generation with crowdsourced plot graphs.

Feifan Liu and Yang Liu. 2007. Unsupervised language model adaptation incorporating named entity information. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 672–679.

Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. 2018. Generating Wikipedia by summarizing long sequences. arXiv preprint arXiv:1801.10198.

Lara J. Martin, Prithviraj Ammanabrolu, William Hancock, Shruti Singh, Brent Harrison, and Mark O. Riedl. 2017. Event representations for automated story generation with deep neural nets. arXiv preprint arXiv:1706.01331.

Nasrin Mostafazadeh, Alyson Grealish, Nathanael Chambers, James Allen, and Lucy Vanderwende. 2016. CaTeRS: Causal and temporal relation scheme for semantic annotation of event structures. In Proceedings of the Fourth Workshop on Events, pages 51–61.

Md Rizwan Parvez, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2018. Building language models for text with named entities. arXiv preprint arXiv:1805.04836.

Julie Porteous and Marc Cavazza. 2009. Controlling narrative generation with planning trajectories: the role of constraints. In Joint International Conference on Interactive Digital Storytelling, pages 234–245. Springer.

Mark O. Riedl and Robert Michael Young. 2010. Narrative planning: Balancing plot and character. Journal of Artificial Intelligence Research, 39:217–268.

Stuart Rose, Dave Engel, Nick Cramer, and Wendy Cowley. 2010. Automatic keyword extraction from individual documents. Text Mining: Applications and Theory.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2015. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.

Zhixing Tan, Mingxuan Wang, Jun Xie, Yidong Chen, and Xiaodong Shi. 2018. Deep semantic role labeling with self-attention. In AAAI Conference on Artificial Intelligence.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proc. of NIPS.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems, pages 2692–2700.

Felix Wu, Angela Fan, Alexei Baevski, Yann Dauphin, and Michael Auli. 2019. Pay less attention with lightweight and dynamic convolutions. In International Conference on Learning Representations.

Jingjing Xu, Yi Zhang, Qi Zeng, Xuancheng Ren, Xiaoyan Cai, and Xu Sun. 2018. A skeleton-based model for promoting coherence among sentences in narrative story generation. arXiv preprint arXiv:1808.06945.

Lili Yao, Nanyun Peng, Ralph Weischedel, Kevin Knight, Dongyan Zhao, and Rui Yan. 2019. Plan-and-write: Towards better automatic storytelling. In Association for the Advancement of Artificial Intelligence.

Deyu Zhou, Linsen Guo, and Yulan He. 2018. Neural storyline extraction model for storyline generation from news articles. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1727–1736.