DELiCOLING 2002 - W8: NLP & XML - Sept. 1st, 2002
Cascading XSL filters for content selection in multilingual document generation. G. Barrutieta, J. Abaitua & J. Díaz
Cascading XSL filters for content selection in multilingual
document generation
G. Barrutieta, J. Abaitua & J. Díaz
(DELi)COLING 2002
W8: NLP & XML
Sept. 1st, 2002
Introduction – System overview
[Diagram: COURSE GENERATOR. Labels: Inputs; Course material (multilingual parallel corpus); User aspects; xml-dtd; Generation engine; html-xml-dtd-xsl-javascript; Document generation; Document view; Web browser.]
Select content and format in an “intelligent” way.
Introduction – Corpus
• Multilingual parallel corpus or master document
– Gross-grained RST to represent the gross-grained discourse structure.
– XML-DTD to represent digitally the gross-grained RST.
    • Text > Data > In between tags
    • Discourse structure > Metadata > XML tags
– Gross-grained RST provides the framework for an isomorphic multilingual corpus.
Introduction: Multilingual parallel corpus with gross-grained RST in XML
EN
<RST>
  <RST-S>
    <PREPARATION>
      <S> What is knowledge management? </S>
    </PREPARATION>
  </RST-S>
  <RST-N>
    <S> Knowledge, in a business context, is the organizational memory, which people know collectively and individually </S>
    <S> Management is the judicious use of means to accomplish an end </S>
    <S> Knowledge management is the combination of those concepts, KM = knowledge + management </S>
  </RST-N>
</RST>
ES
<RST>
  <RST-S>
    <PREPARATION>
      <S> ¿Qué es gestión del conocimiento? </S>
    </PREPARATION>
  </RST-S>
  <RST-N>
    <S> Conocimiento, en el contexto de los negocios, es la memoria de la organización, lo que la gente sabe colectiva e individualmente </S>
    <S> Gestión es el uso juicioso de recursos para alcanzar un fin </S>
    <S> Gestión del conocimiento es la combinación de esos dos conceptos, GC = gestión + conocimiento </S>
  </RST-N>
</RST>
EU
<RST>
  <RST-S>
    <PREPARATION>
      <S> Zer da ezagutzaren kudeaketa? </S>
    </PREPARATION>
  </RST-S>
  <RST-N>
    <S> Kudeaketa, negozioetan, erakundearen memoria da, jendeak bakarka eta taldeka dakiena </S>
    <S> Kudeaketak erabideen erabilera zuzena du helburu </S>
    <S> Ezagutzaren kudeaketa bi kontzeptu hauen nahasketa da, EK = ezagutza + kudeaketa </S>
  </RST-N>
</RST>
Introduction – User Aspects
Specific user aspects (discrete values)
• Subject: Language processors
• Moment in time: Before the course / Period 1 / Period 2 / … / After the course (review)
• Languages: EN / ES / EU
General user aspects (discrete values)
• Level of expertise: Null / Basic / Medium / High
• Reason to read: To get an idea / To get deep into it
• Background: Not related to the subject / Related to the subject
• Opinion or motivation: Against / Without an opinion or motivation / In favour
• Time available: A little bit of time / Quite some time / Enough time
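The user aspects above amount to a small user model with a closed set of values per aspect. As a rough sketch only, such a model could be written down and checked as follows. The general-aspect keys reuse the variable names that appear in the filtering rules; the keys `subject` and `language` and the function `validateProfile` are hypothetical, and "moment in time" is left out because the slide elides part of its value list.

```javascript
// Allowed discrete values per user aspect, copied from the slide.
// `subject` and `language` are hypothetical key names; the other keys
// are the variable names used in the filtering rules.
const USER_ASPECTS = {
  subject: ["Language processors"],
  language: ["EN", "ES", "EU"],
  level_expertise: ["null", "basic", "medium", "high"],
  reason_to_read: ["to get an idea", "to get deep into it"],
  job_studies: ["not related subject", "related subject"],
  opinion_motivation: ["against", "without an opinion or motivation", "in favour"],
  time_available: ["a little bit of time", "quite some time", "enough time"],
};

// Return the list of problems found in a profile; an empty list means it is valid.
function validateProfile(profile) {
  const problems = [];
  for (const [aspect, allowed] of Object.entries(USER_ASPECTS)) {
    if (!(aspect in profile)) {
      problems.push(`missing aspect: ${aspect}`);
    } else if (!allowed.includes(profile[aspect])) {
      problems.push(`invalid value for ${aspect}: ${profile[aspect]}`);
    }
  }
  return problems;
}
```

Because every aspect takes one of a few discrete values, each filter only has to branch on a handful of cases.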
CSA – Parallel selection
CSA – Horizontal filtering
CSA – Vertical filtering
CSA – Vertical filtering: Level of expertise
If level_expertise = “null” or level_expertise = “basic”
Then no relation-satellite is discarded;
If level_expertise = “medium” or level_expertise = “high”
Then discard example, exercise, background and preparation relation-satellites;
Rationale for the rule: Any user with a null or basic level of expertise on the selected subject will need all the information available to understand the text. Alternatively, a user with a medium or high level of expertise will not require examples, exercises, background, preparation and similar relation-satellites.
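The rule above maps one user-aspect value to a list of relation-satellites to discard. A minimal sketch of that mapping, assuming the relation names are handled as plain lower-case strings (the function name `discardedByExpertise` is hypothetical):

```javascript
// Relation-satellites discarded for each level of expertise,
// following the rule on the slide.
function discardedByExpertise(levelExpertise) {
  switch (levelExpertise) {
    case "null":
    case "basic":
      return []; // nothing is discarded for novice readers
    case "medium":
    case "high":
      return ["example", "exercise", "background", "preparation"];
    default:
      // Reject values outside the discrete set instead of guessing.
      throw new Error(`unknown level_expertise: ${levelExpertise}`);
  }
}
```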
CSA – Vertical filtering: Reason to read
If reason_to_read = “to get an idea”
Then discard exercise and elaboration (all the types of elaboration: textual elaboration, link elaboration and image elaboration) relation-satellites;
If reason_to_read = “to get deep into it”
Then no relation-satellite is discarded;
Rationale: Any user wishing to broaden their knowledge of the selected subject will need additional information. Conversely, a user who just wants to get an idea does not need exercise, elaboration or similar relation-satellites, which often require a more active role on the part of the user.
CSA – Vertical filtering: Professional background
If job_studies = “not related subject”
Then no relation-satellite is discarded;
If job_studies = “related subject”
Then discard background and preparation relation-satellites;
Rationale: Any user whose professional background is not related to the subject will need all the additional supporting text to understand its meaning. Conversely, if the user's background is related to the selected subject, we may assume that background, preparation and similar relation-satellites will be unnecessary.
CSA – Vertical filtering: Opinion or motivation
If opinion_motivation = “against” or opinion_motivation = “without an opinion or motivation”
Then no relation-satellite is discarded;
If opinion_motivation = “in favour”
Then discard motivate, antithesis, concession and justify relation-satellites;
Rationale: A motivated or favourable user does not require additional motivation. The motivate, antithesis, concession, justify and similar relation-satellites are therefore discarded, since their role is to change the user's opinion in favour of the course material.
CSA – Vertical filtering: Time available
If time_available = “a little bit of time”
Then discard all the relation-satellites;
If time_available = “quite some time”
Then discard exercise relation-satellites;
If time_available = “enough time”
Then no relation-satellite is discarded;
Rationale: Time availability is a crucial user aspect. If the user is in a rush or has little time, the system has to provide only the most elementary information; in that case only nuclei are generated. If the user has a bit more time, but not much, exercises are not offered, since they are usually quite time-consuming and require the active participation of the user. Finally, if the user has plenty of time, all the additional information is delivered.
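To make the "only nuclei" case concrete, the time-available rule can be sketched as a pruning pass over a small in-memory tree shaped like the corpus example (RST, RST-S, RST-N). This is an illustration under assumptions, not the system's XSL filter: the node layout, the helper names, the idea that a satellite's relation is the tag of its first child element, and the `EXERCISE` tag name are all hypothetical.

```javascript
// A node is { tag, children }, where children holds nodes or plain strings.
function node(tag, ...children) {
  return { tag, children };
}

// Relation of a satellite: the tag of its first element child (assumed layout,
// modelled on <RST-S><PREPARATION>...</PREPARATION></RST-S>).
function relationOf(satellite) {
  const first = satellite.children.find((c) => typeof c !== "string");
  return first ? first.tag : null;
}

// Decide whether the time-available rule discards this satellite.
function discardsSatellite(timeAvailable, satellite) {
  switch (timeAvailable) {
    case "a little bit of time":
      return true; // every satellite goes, only nuclei remain
    case "quite some time":
      return relationOf(satellite) === "EXERCISE"; // hypothetical tag name
    case "enough time":
      return false;
    default:
      throw new Error(`unknown time_available: ${timeAvailable}`);
  }
}

// Return a pruned copy of the tree; the input tree is not modified.
function filterByTime(tree, timeAvailable) {
  if (typeof tree === "string") return tree;
  const kept = tree.children.filter(
    (c) => typeof c === "string" || c.tag !== "RST-S" || !discardsSatellite(timeAvailable, c)
  );
  return node(tree.tag, ...kept.map((c) => filterByTime(c, timeAvailable)));
}

const doc = node(
  "RST",
  node("RST-S", node("PREPARATION", node("S", "What is knowledge management?"))),
  node("RST-N", node("S", "Management is the judicious use of means to accomplish an end"))
);
const nucleiOnly = filterByTime(doc, "a little bit of time");
```

Here `nucleiOnly` keeps the RST-N branch and drops the PREPARATION satellite, while `doc` itself is unchanged.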
CSA – Vertical filtering: Comments
• The order of application of the filters is irrelevant: each filter acts upon certain parts of the text independently.
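The order-independence claim can be checked on a simplified model in which each filter only removes relation names from a list. The sketch below encodes the five rules as a lookup table and applies them in every possible order. The list `ALL_RELATIONS` contains only the relations named on the slides (the full inventory is an assumption), the three elaboration types are collapsed into one `elaboration` entry, and all identifiers are hypothetical.

```javascript
// Relations named on the slides; assumed here to be the whole inventory.
const ALL_RELATIONS = [
  "example", "exercise", "background", "preparation", "elaboration",
  "motivate", "antithesis", "concession", "justify",
];

// What each user-aspect value discards, transcribed from the five rules.
const DISCARDS = {
  level_expertise: {
    "null": [], "basic": [],
    "medium": ["example", "exercise", "background", "preparation"],
    "high": ["example", "exercise", "background", "preparation"],
  },
  reason_to_read: { "to get an idea": ["exercise", "elaboration"], "to get deep into it": [] },
  job_studies: { "not related subject": [], "related subject": ["background", "preparation"] },
  opinion_motivation: {
    "against": [], "without an opinion or motivation": [],
    "in favour": ["motivate", "antithesis", "concession", "justify"],
  },
  time_available: { "a little bit of time": ALL_RELATIONS, "quite some time": ["exercise"], "enough time": [] },
};

// One filter: drop from `kept` whatever this aspect's rule discards.
function applyFilter(kept, aspect, value) {
  const discard = DISCARDS[aspect][value];
  if (discard === undefined) throw new Error(`unknown value for ${aspect}: ${value}`);
  return kept.filter((relation) => !discard.includes(relation));
}

// Cascade the filters in the given order of aspects.
function cascade(profile, order) {
  return order.reduce((kept, aspect) => applyFilter(kept, aspect, profile[aspect]), ALL_RELATIONS);
}

// All orderings of a list (plain recursive permutation generator).
function permutations(items) {
  if (items.length <= 1) return [items];
  return items.flatMap((item, i) =>
    permutations([...items.slice(0, i), ...items.slice(i + 1)]).map((rest) => [item, ...rest])
  );
}

const profile = {
  level_expertise: "high",
  reason_to_read: "to get an idea",
  job_studies: "not related subject",
  opinion_motivation: "against",
  time_available: "enough time",
};
const results = permutations(Object.keys(DISCARDS)).map((order) => cascade(profile, order).join(","));
const orderIndependent = results.every((r) => r === results[0]);
```

In this model each filter is a pure removal, so the surviving set is the full list minus the union of the discard lists, which does not depend on the order in which they are applied.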
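Implementation
JavaScript implementation
// Load the output of the previous filter as the input document
objData.loadXML(sResult);
// Load the stylesheet of the next filter
objStyle.load(sXSL1);
// Apply the filter; the result feeds the next filter in the cascade
sResult = objData.transformNode(objStyle);
XSL implementation
<!-- Keep BACKGROUND satellites: copy the element and process its children -->
<xsl:template match="BACKGROUND">
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>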
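The three JavaScript calls above form one step of the cascade: the output of a transformation becomes the input of the next. A sketch of the surrounding loop, written so that it runs without an XSLT engine, might look as follows. The `transform` callback is a hypothetical stand-in for the loadXML / load / transformNode step, and `dropElement` is a toy substitute for a real XSL filter; neither is part of the original system.

```javascript
// Run the document through every stylesheet in turn.
// `transform(xml, stylesheet)` is a hypothetical callback standing in for the
// loadXML / load / transformNode step shown on the slide.
function runCascade(xml, stylesheets, transform) {
  let result = xml;
  for (const stylesheet of stylesheets) {
    // The output of one filter is the input of the next.
    result = transform(result, stylesheet);
  }
  return result;
}

// Toy transform for demonstration only: deletes the elements named by the
// "stylesheet" argument. A real filter would be an XSL file; this regex
// handles only flat, non-nested elements without attributes.
function dropElement(xml, tagName) {
  const pattern = new RegExp(`<${tagName}>[\\s\\S]*?</${tagName}>`, "g");
  return xml.replace(pattern, "");
}

// EXAMPLE is an assumed tag name; BACKGROUND appears in the slide's template.
const input = "<RST><BACKGROUND>b</BACKGROUND><EXAMPLE>e</EXAMPLE><RST-N>n</RST-N></RST>";
const output = runCascade(input, ["BACKGROUND", "EXAMPLE"], dropElement);
```

Passing the transform in as a parameter keeps the loop independent of the XSLT engine, so the same cascade logic can be exercised in a test and then wired to the browser objects.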
Experimentation
• The main objective of the experimentation is to validate both the hypotheses expressed in the filtering rules and the actual filtering mechanism of the CSA, by letting people judge the generated documents.
Demo
Conclusions
• Increase the size of the corpus:
– As long as this is done following the same DTD and RST model, the algorithm will not have to change at all.
• Augment the user model:
– A new user aspect requires only a new filter.
– New values for an existing user aspect require a change in the corresponding filter.
• Therefore, none of these modifications increases the complexity of the system or is difficult to implement.
• Questions
• Comments
• Further information
• Suggestions
Thank you for your attention.
This research work has been partly supported by the Basque Government.