ehud reiter, computing science, university of aberdeen1 cs4025: content determination and document...

39
Reiter, Computing Science, University of Aberdeen CS4025: Content Determination and Document Planning

Upload: clinton-spencer

Post on 18-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 1

CS4025: Content Determination and Document Planning

Page 2: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 2

Braun’s SaferDriver

Aberdeen MSc by Research App which gives drivers feedback about

speeding, etc» Gathers data from GPS, acceleration» Produces map as well as text» Evaluated

https://aclweb.org/anthology/W/W15/W15-4726.pdf

Page 3: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 3

Document Planning

First stage of NLG Two tasks

» Decide on content (our focus)» Decide on rhetorical structure

Can be interleaved

Page 4: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 4

Document Planning

Problem: Usually the output text can only communicate a small portion of the input data» Which bits should be communicated?» Should the text also communicate

summaries, perspectives, etc» How should information be ordered and

structured?

Page 5: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 5

ScubaText: Imitate Corpus

One approach to document planning is to analyse corpus texts (after aligning them to data), and manually infer content and structure rules.

Scubatext example

Page 6: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 6

Example Input

Page 7: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 7

Input SegmentsdiveNo segNo itime ivalue ftime fvalue

1460 1 0 1.3 60 6.3

1460 2 60 6.3 140 32.2

1460 3 140 32.2 480 0

1460 4 480 0 760 0

1460 5 760 0 920 38.9

1460 6 920 38.9 1300 41.6

1460 7 1300 41.6 1500 15.5

1460 8 1500 15.5 1860 27.2

1460 9 1860 27.2 2160 9.2

1460 10 2160 9.2 2600 2.7

Page 8: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 8

Corresponding Corpus Text

Your first ascent was a bit rapid; you ascended from 33m to the surface in 5 minutes, it would have been better if you had taken more time to make this ascent. You also did not stop at 5m, we recommend that anyone diving beneath 12m should stop for 3 minutes at 5m. Your second ascent was fine.

Page 9: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 9

Align corpus text with data

Input: 14603 140 32.2 480 0Output: (representation of)Your first ascent was a bit rapid; you ascended from

33m to the surface in 5 minutes, it would have been better if you had taken more time to make this ascent

Input: 1460 10 21609.2 26002.7Output: (representation of)Your second ascent was fine

Page 10: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 10

Possible content rules

Describe segments that end (near) 0» And that don’t start at 0» Also segment at end of dive

Give additional info about such segments whose slope is too high» Explain risk» Say what should have happened

Page 11: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 11

Possible ordering rules

Break up dive into sections, where each section starts and ends at surface

For each section, start with most important safety issue (or say dive was fine if no safety issue)

Then add less important safety issues

Page 12: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 12

More examples

We’ve just looked at one example here! Need to repeat process for at least 20-

30 examples, which cover spread of possible cases (including special cases)

Merge rules and deal with conflicts» Often causes by different corpus authors

writing differently; may give priority to one particular author, and imitate his style

Page 13: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 13

Content Determination

The most important aspect of NLG!» If we get content right, users may not be too

fussed if language isn’t perfect» If we get content wrong, users will be

unhappy even if language is perfect Also the most domain-dependent aspect

» Based on domain, user, tasks more than general knowledge about language

Page 14: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 14

Text Structure

Output of DP also shows text structure» Tree, leaves represent sentences, phrases

– For now, assume leaves are strings– Look at other representations next week

» Internal nodes group leaves, lower nodes– Can mark sentences, paragraphs, etc breaks– Can express rhetorical relations between nodes

Page 15: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 15

Example tree

Your first ascent was too rapid, you ascended from 33m to the surface in 5 minutes. However, your second ascent was fine.

your first ascent was too rapid

you ascended from 33m to the surface in 5 minutes

N2: elaboration

N1: contrast (paragraph)

your second ascent was fine

Page 16: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 16

Why tree?

Tree shows where constituents can be merged (children of same parent)» Your first ascent was a bit rapid. You ascended from 33m to

the surface in 5 minutes. However, your second ascent was fine. (OK)

» Your first ascent was a bit rapid, you ascended from 33m to the surface in 5 minutes. However, your second ascent was fine. (OK)

» Your first ascent was a bit rapid. You ascended from 33m to the surface in 5 minutes, however, your second ascent was fine. (NOT OK)

Page 17: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 17

Rhetorical Rel

Shows how messages relate to each other RR can be expressed via cue phrases.

» Best cue phrase for RR depends on context– Your first ascent was a bit rapid. However, your second

ascent was fine.– Your first ascent was a bit rapid, but your second ascent

was fine.

» Also readers like cue phrases to be varied, not same one used again and again

– Eg, don’t overuse “for example”

» Hence better to specify abstract RR in text spec

Page 18: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 18

Common Rhetorical Rels

CONCESSION (although, despite) CONTRAST (but, however) ELABORATION (usually no cue) EXAMPLE (for example, for instance) REASON (because, since) SEQUENCE (and, also) Research community does not agree

» Many different sets of rhetorical rels proposed

Page 19: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 19

How Choose Content

Theoretical approach: deep reasoning based on deep knowledge of user, task, context, etc

Pragmatic approach: write schemas which try to imitate human-written texts in a corpus

Statistical approach: Use learning tech to learn content rules from corpus

Page 20: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 20

Theoretical Approach

Deduce what the user needs to know, and communicate this

Based on in-depth knowledge» User (knowledge, task, etc)» Context, domain, world

Use AI reasoning engine» AI Planner, plan recognition system

Page 21: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 21

Example

User (on rig) asks for weather report» User’s task is unloading supply boats» supply boat SeaHorse is approaching rig

Inference: user wants to unload SeaHorse when it arrives» Tell him what affects choice of unloading

procedure (eg, wind dir, wave height)– Use knowledge of rig, SeaHorse, cargo, …

Page 22: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 22

Theoretical Approach

Not feasible in practice (at least in my experience)» Lack knowledge about user

– Maybe wants to land supply helicopter

» Lack knowledge of context– Eg, which supply boats are approaching

» Very hard to maintain knowledge base– New users, boats, tasks, regulations…

Page 23: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 23

Pragmatic Approach: Schema

Templates, recipes, programs for text content

Typically based on imitating patterns seen in human-written texts (corpus)» Revised based on user feedback

Specify structure as well as content

Page 24: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 24

Schema Implementation

Usually just written as code in Java or other standard programming languages» Some special languages, but publicly

available ones not very useful

Page 25: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 25

Pseudocode example

Schema ScubaSchema

for each ascent A in data set

if ascent is too fast

add unsafeAscentSchema(A)

else

add safeAscentSchema(A)

set rhetorical relation

Page 26: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 26

Pseudocode example

Schema unsafeAscentSchema(Ascent A)

create DocumentElement

add sentence “Your ascent was too fast”add sentence “You ascended from

[A.startValue()] to the surface in [A.duration()] minutes”

return

Page 27: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 27

Creating Schemas

Creating schemas is an art, no solid methodology (yet)

Williams and Reiter (2005)» Create top-down, based on corpus texts» First identify high-level structure of corpus doc

– Eg, each ascent described in temporal sequence– Build schemas based on this

» Then create low-level schemas (rules) – Eg, for describing a single ascent

Page 28: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 28

Williams and Reiter

Problems» Corpus texts likely to be inconsistent

– Especially if several authors wrote texts

» Some cases not covered in the corpus– Unusual cases, boundary cases

Developer needs to use intuition for such cases» check with experts, users!

Page 29: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 29

Williams and Reiter

Evaluation/Testing» Essential!» Developer-based:

– Compare to corpora– Check boundary cases for reasonableness

» Expert-based: Ask experts to revise (post-edit) texts

» User-based: Ask users for comments

Page 30: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 30

Statistical content det

Statistical/learning techniques» Parse corpus, align with source data, use

machine learning algorithms to learn content selection rules/schemas/cases

– Barzilay and Lapata, 2005– Kondadidi et al 2013

Worth considering if large corpora available

Page 31: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 31

Reuters approach

http://www.aclweb.org/anthology/P/P13/P13-1138.pdf Training: Parse corpus of old newswire articles

» Turn sentences, phrases into templates» Cluster templates into “conceptual unit”» Train model which chooses template for 1st, 2nd, 3rd sentence

– Features such as “prior conceptual unit”

Generation» Eliminate templates/conceptual units which dont fit the input

data» Use model to choose templates for 1st, 2nd, etc sentences» Fill in holes in template from data

Page 32: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 32

Example

Corpus: Mr Jones has a Bachelors in Finance from Harvard

Template: [person] has [degree] in [subject] from [university]

Conceptual unit: Person/degree/subject/university» Also covers “[person] was awarded a [degree] in [subject]

from [university]”

Page 33: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 33

Example

Data includes tuple <Miss Smith, MSc, Computing, Aberdeen>

Ranking module chooses previous template as top candidate for sentence 2

Include sentence “Miss Smith has an MSc in Computing from Aberdeen”

Page 34: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 34

Advanced Topic: Computing Rhetorical Relation

In theory, can compute the rhetorical relation between nodes» There are formal definitions of rhet rel

– See p103 of Reiter and Dale

» Reason about which definition best fits to current context

Not easy to do in practice» Alternative is imitate corpus

Page 35: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 35

Advanced Topic: Perspectives

Text can communicate perspectives, “spin”, as well as raw data. Eg,» Smoking is killing you» If you keep on smoking, your health may

keep on getting worse» If you stop smoking, your health is likely to

improve» If you stop smoking, you’ll feel better

Page 36: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 36

Perspectives

How to choose between these? Depends on personality of reader

» Some people react better to positive messages, others to negative messages

» Some react better to short direct messages, others want these weakened (“may”, “is likely to”)

» Hard to predict…

Page 37: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 37

Advanced: User-Adaptation

Texts should depend on» User’s personality (previous)» User’s domain knowledge (how much do

we need to explain)» User’s vocabulary (can we use technical

terms in the text)» User’s task (what does he need to know)

Hard to get this information…

Page 38: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 38

Advanced: Continuity Babytalk: users complained about

» TcPO2 suddenly decreased to 8.1. TcPO2 suddenly decreased to 9.3.

intermediate rise of TcPO2 not mentioned because slow and less medically important

But users have “continuity” assumption that behaviour remains as last mentioned» Need to add material if continuity violated» TcPO2 suddenly decreased to 8.1. After rising back to 11.2,

TcPO2 suddenly decreased to 9.3.

Narrative issue (Reiter et al, 2008)?

Page 39: Ehud Reiter, Computing Science, University of Aberdeen1 CS4025: Content Determination and Document Planning

Ehud Reiter, Computing Science, University of Aberdeen 39

Conclusion

Content determination is the first and most important aspect of NLG» What information should we communicate?

Mostly based on imitating what is observed in human-written texts» Using schemas, written in Java

Also decide on structure» Tree structure, rhetorical relations