shallow language generation tg/2 and temsis-gen stephan busemann dfki gmbh stuhlsatzenhausweg 3...

18
Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken [email protected] http://www.dfki.de/~busemann

Upload: melissa-tate

Post on 27-Dec-2015

219 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Shallow Language GenerationTG/2 and TEMSIS-Gen

Stephan Busemann

DFKI GmbH

Stuhlsatzenhausweg 3

D-66123 Saarbrücken

[email protected]

http://www.dfki.de/~busemann

Page 2: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

DFKI‘s TG/2 System Offers a Flexible Framework for NLG

• TG/2 is a transparent production system

• TG/2 interprets a separately defined set of condition-action rules

• TG/2 maps pieces of input onto surface strings

TG/2 keeps grammars largely independent from input representations

Test Predicates on properties of the input

Access Paths yielding a part of

the Input

Input Grammar Rules

(COOP threshold-passing)DECL -> PPTIME THTYPE EXCEEDS

Page 3: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

The TGL Syntax Uses Test and Access Path Descriptions

• Category: Context-free skeleton of derivations

• Test: Conjunction of Boolean predicates on the the input structure

• Template: Sequence of actions, each operating on a substructure of the current input (accessor functions)– Rule (:RULE), failure causes the rule to fail as well– Optional rule (:OPTRULE), ignored unless input available– Access path descriptions, determining input substructures – Functions (:FUN) into strings, failure causes rule to fail as well – ASCII Strings, directly to output

• Constraints: feature equations expressing agreement relations

• Side effects, e.g. updating a discourse memory

Page 4: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

TG/2 Grammars Integrate Canned Texts, Templates and Context-free Rules

My category is DECL.IF the slot COOP is 'threshold-passing AND the slot LAW-NAME is specified THEN apply PPtime from slot TIME apply THTYPE from CURRENT-INPUT utter "(" apply LAW from slot LAW-NAME utter ") " apply EXCEEDS from slot EXCEEDS utter "."WHERE THTYPE AND EXCEEDS agree in GENDER

My category is THTYPE.IF there is no slot THRESHOLD-TYPE specified THEN utter "la valeur limite autoris&e2e "WHERE THTYPE has value 'fem for GENDER

En été 1999la valeur limite autorisée(selon le decret ...)a été dépassée une fois.

[ Busemann 1996 ]

Page 5: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

The TGL Rules are Written as Condition-Action Pairs

(defproduction wertueberschreitung "WU01" (:PRECOND (:CAT DECL :TEST ((pred-eq 'wertueberschreitung) (not (threshold-value-p)))) :ACTIONS (:TEMPLATE (:OPTRULE PPtime (get-param 'time)) (:RULE THTYPE (self)) (:OPTRULE POLL (get-param 'pollutant)) "(" (:RULE LAW (get-param 'law-name)) ") " (:RULE EXCEEDS (get-param 'exceeds)) "." :CONSTRAINTS (:GENDER (THTYPE EXCEEDS) :EQ))))

(defproduction thtype "THT3" (:PRECOND (:CAT THTYPE :TEST ((not (threshold-type-p)))) :ACTIONS (:TEMPLATE "la valeur limite autoris&e2e " :CONSTRAINTS (:GENDER THTYPE :VAL 'fem))))

Page 6: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

TGL Constraints are Percolated Across the Derivation Tree• Constraints are PATR-II style, percolation through unification ( )• Every local tree is licensed by a grammar rule

• A feature can be assigned a value (:=)

• Two features can be constrained to have identical values (=)

(X1.GENDER = X2.GENDER)

(X0.GENDER = X2.Gender)

inflect(dépassé)

X0

X1 X2

X2 X1

X0

“la valeur limite “

(X0.GENDER := fem)

Page 7: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

The Interpreter is Based on the Context-Free Backbone of TGL-Grammars

• Matching– Identify all rules with the current category– For each of them perform its tests on the input structure

(“IF” part)– Add those passing the tests to the conflict set

• Conflict resolution– Select an element of the conflict set (possibly by some

preference mechanism)• Firing

– Evaluate the rule‘s constraints (if available, “WHERE” part)– For each element of the “THEN” part, read the new

category and determine the new input structure by evaluating the associated access path descriptions

THREE-STEP EVALUATION CYCLE

Page 8: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

eGram Supports Grammar Development

Page 9: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

TEMSIS is Developing a Transnational Environmental Management Support and Information System

• Project goals– distributed information and communi-

cation platform for transnational co-operation

– comfortable bilingual access to regional environmental data

– information kiosks accessible via WWW

• Cooperation between administrations– French / German urban agglomeration

Moselle-Est / Stadtverband Saarbrücken

– Environmental information standardised and shared

TAP Sector C9 (Environment) No. 2945

Duration 01/96-06/98

Effort 172,5 PM

Page 10: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

Air Quality Reports Are Generated From Environmental Data at the TEMSIS-Server

• Parameter selection by the user– Language (German, French)– Pollutant and measurement station– Relevant period of time

• Stage 1: text schema construction– Querying the database– Composition of report structure– Elision of contextual redundancies

• Stage 2: verbalization by TG/2– Selection of sentence patterns– Wording, phrasing, grammar

• HTML postprocessing

Stage 1

Stage 2

Text Generation Server

[Busemann/Horacek 1998, Horacek/Busemann 1998]

Parameter Specs

HTML/Java Code

TEMSISDatabase

Page 11: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

The System Covers 384 Report Structures

Parameters selected from the TEMSIS navigator menus:– French language text about a German situation– Ozone data, exceeding thresholds according to decree– Measurements at Völklingen-City in summer 1997 (confirmation)

Vous avez choisi la station de mesure de Völklingen-City afin de consulter la pollution atmosphérique relevée en été 1997.

A la station de mesure de Völklingen-City, la valeur d'information pour l'ozone pour une exposition de 60 minutes (180 µg/m³ selon le decret allemand (Bundesimmissionsschutzverordnung)) a été dépassée une fois.

La valeur d'interdiction du trafic (240 µg/m³) a aussi été dépassée une fois.

En été 1996 la valeur d'information (180 µg/m³) n'a pas été dépassée.

EXAMPLE

Page 12: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

Text Organization Is Schema-based

• Confirm important selected parameters

• Number the values exceeding the lowest threshold

• Number the values exceeding the next threshold

• Compare with values of preceding year

• Repeat the core statement („summary“)

SAMPLE SCHEMA FOR SUMMER OBSERVATION, THRESHOLD PASSING

A schema is instantiated on the basis of the input parameters and the retrieved data

Page 13: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

Instantiating a Schema Leads to a Report Structure

• Achieves text coherence by

– removing redundant information

– inserting particles („also“)

– simple techniques of aggregating information

• Yields canned texts or intermediate content representations

• Intermediate representations are independent of particular languages

TEXT ORGANISATION

Shallow generation can do without explicit knowledge representation and text planning

Page 14: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

Text Coherence is Achieved by Simple Means• Removing redundancy

– If two subsequent slots in a schema contain identical pieces of information, which are not focussed, then delete the second instance

• Particle insertion

– If two subsequent slots in a schema contain identical pieces of information, which are focussed, then mark the second instance as „repeating“

• Aggregation

– If a time series contains a sequence of identical values, describe a time interval (from ... to ...) and mention the value only once

The means depend on the phenomena encountered in the corpus

Page 15: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

Intermediate Representations Specify the Contents Only Partly

[(COOP threshold-passing) (TIME [(PRED season) (NAME [(SEASON summer) (YEAR 1997)])]) (POLLUTANT o3) (SITE "Völklingen-City") (DURATION [(MINUTE 60)]) (SOURCE [(LAW-NAME bimsch) (THRESHOLD-TYPE info-value)]) (EXCEEDS [(STATUS yes) (TIMES 1)])]

En été 1997, à la station de mesure de Völklingen-City, la valeur d'information pour l'ozone pour une exposition de 60 minutes (180 µg/m³ selon le decret allemand (Bundesimmissionsschutzverordnung)) a été dépassée une fois.

Stage 1: Text Organization

Stage 2: Verbalization

Page 16: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

Shallow Generation is Well Suited for Small Applications

• What are „small“ applications?

– Limited linguistic requirements

– Simple text sort, e.g. technical documents

– Useful in early stages

• The TEMSIS generation task represents a class of applications

– Text organization by few variable schemata

– Sufficient quality of text by simple techniques to establish coherence

– Realization of intermediate representation without accessing the context

– Ca 100/120 TGL rules, ca 400 linguistically different reports

With larger applications, the drawbacks of shallow approaches become obvious

Page 17: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

Shallow Generation Has Pros and Cons

Possible advantages Possible drawbacks

• Low development effort

• Reusable interpreter and subgrammars

• Very fast processing

• Easy introduction of additional languages

• Easy extension with alternative formulations (through a preference mechanism in TG/2)

• Knowledge representation depends on application

• Implicit dependencies

• Scalability is inherently lower than with in-depth generators

• Maintaining transparency of grammars can become a cost factor

ASSESSMENT

Page 18: Shallow Language Generation TG/2 and TEMSIS-Gen Stephan Busemann DFKI GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken busemann@dfki.de busemann

Source: Stephan Busemann Language Technology I, WS 2005/2006

Literature

• Stephan Busemann, ``Ten Years After - An Update on TG/2 (and friends)", in: Proc. 10th European Workshop on Natural Language Generation, Aberdeen, UK, 2005

• Stephan Busemann, ``eGram - a Grammar Development Environment and Its Usage for Language Generation'', in: Proc. Fourth International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal, 2004

• Stephan Busemann, ``Language Generation for Cross-Lingual Document Summarisation'', in Sheng, Huanye (ed.), International Workshop on Innovative Language Technology and Chinese Information Processing (ILT\&CIP-2001), Science Press, Chinese Academy of Sciences, Beijing, China, May 2002.

• Stephan Busemann and Helmut Horacek, ``A Flexible Shallow Approach to Text Generation'', in: Eduard Hovy (ed.): Proceedings of the Nineth International Natural Language Generation Workshop (INLG '98), Niagara-on-the-Lake, Canada, August 1998, 238-247. Also at the Computation and Language Archive.

• Stephan Busemann, ``Best-First Surface Realization'', in: Donia Scott (ed.): Proceedings of the Eighth International Natural Language Generation Workshop (INLG '96), Herstmonceux, Sussex, 1996, 101-110. Also at the Computation and Language Archive.

All papers on: http://www.dfki.de/lt/publications.php