Modeling Textual Definition Contents With BFO 2.0 and Creating Related Linguistic/NLP Resources Selja Seppälä October 29, 2014 NCBO Webinar Series


Modeling Textual Definition Contents With BFO 2.0 and Creating Related Linguistic/NLP Resources

Selja Seppälä
October 29, 2014

NCBO Webinar Series

Objectives

Developing the specifications for widely usable software to help in definition writing

• Creating language- and domain-independent definition models (templates)
• Using the BFO categories and relations
• Creating BFO-related linguistic/NLP resources for large-scale corpus analyses of definitions

The general idea

leukocyte (CL_0000738)
An achromatic cell of the myeloid or lymphoid lineages capable of ameboid movement, found in blood or other tissue.

gen   is_a OBJECT
      An achromatic cell
spe   other: develops_from MATERIAL ENTITY
      of the myeloid or lymphoid lineages
spe   bearer_of DISPOSITION
      capable of ameboid movement,
spe   located_in MATERIAL ENTITY
      found in blood or other tissue.

OBJECT

object
  s has_part object
  s has_part object aggregate

material entity
  s bearer_of disposition
  s bearer_of quality
  s contains process
  s contains process boundary
  s has_history process
  a has_part immaterial entity
  a has_part material entity
  a located_in independent continuant
  s material_basis_of disposition
  a occupies three-dimensional spatial region
  a part_of immaterial entity
  a part_of material entity
  s participates_in process

Outline

• Background on definitions
• Introducing the problem
• Proposal: modeling definition contents with BFO
• Methodology
• Creating BFO-related linguistic/NLP resources
• Conclusion

Background on definitions

Object of study (1)

Object of study (2)

Object of study (3)

Object of study (4)

Functions of a definition

• Conveying meaning of domain-specific terms
• Specifying terms’ intended meaning
  ➔ Disambiguation function
• Problems when
  – Information is not relevant, e.g. non-defining
  – No definition at all
  ➔ How to understand/use the term?

Definition of ‘definition’

• A single sentence
• Composed of pieces of information, e.g. found in experts’ texts
• Which are relevant to the experts’ activity
• About the types of things to which the defined term refers
• Has classical Aristotelian structure
  – One genus
  – One or more differentiae (specifiers)

Example from the Uber Anatomy Ontology (UBERON)

ligament
Dense regular connective tissue connecting two or more adjacent skeletal elements or supporting an organ.

genus (type of thing):
  Dense regular connective tissue
differentia (distinguishing characteristic):
  connecting two or more adjacent skeletal elements or supporting an organ.

Example from the Cell Type Ontology (CL)

leukocyte
An achromatic cell of the myeloid or lymphoid lineages capable of ameboid movement, found in blood or other tissue.

genus: type of thing
differentiae: distinguishing characteristics

Definition writing rules, issues, and needs

• Subject to a few formal rules (punctuation, etc.)

• Only vague principles for selecting the right pieces of information

• Time-consuming, costly, and prone to all kinds of inconsistencies

➔ Create computer software to help authors
  – Improve quality of dictionaries & ontologies
  – Comply with the publisher’s formal rules (Seppälä 2006)
  – Include relevant defining contents


Introducing the problem


Example: compiling a corpus

5.3.2 Net booms

Netting can be used as a boom to collect viscous oils at sea. Vessels deploy the netting, manoeuvring it to surround and concentrate the oil for recovery. In inshore waters, it may be possible to use nets as a moored boom. Net booming is based on the principle that air and water will pass through the net but not viscous oil. Therefore, there is little or no downward current to carry oil underneath the net and the net is less prone to being blown flat. Because of the lower resistance of nets to water movement, it should also be possible in theory to deploy net booms in faster currents than is possible with conventional booms. Net booms have not yet been fully tested in an actual oil spill but results from field trials have been encouraging. A net boom consists of a long strip of netting, supported at frequent and regular intervals by poles with floats and weights attached to keep the netting upright (Figure 20).

(Source: http://www.dft.gov.uk/mca/chapter_5.pdf)

Example: extracting information

Example: identifying potentially defining information

Example: selecting relevant information for defining the term

Example: composing the definition

A content selection problem

• How to select the relevant kind of information?
  ➔ Selection principles

• How to create rules to help authors select the relevant information?
  ➔ Content modeling

Proposal: modeling definition contents with BFO


Content selection principles

• Extensional dimension
  – Ontological factors
    • Category of the referent (object, process, quality, etc.)
    • Type of things or particular (particle accelerator vs. the LHC)
• Contextual dimension
  – Conceptual system of the domain
  – Background knowledge of the user
• Communicative dimension
  – User needs
  – Communication situation (linguistic expression: quadruped vs. four legged)

A generic solution to content selection

• Extensional dimension
  – Ontological factors
    • Category of the referent (object, process, quality, etc.)
    • Type of things or particular (particle accelerator vs. the LHC)
• Contextual dimension
  – Conceptual system of the domain
  – Background knowledge of the user
• Communicative dimension
  – User needs
  – Communication situation (linguistic expression: quadruped vs. four legged)

Solution to content representation

➔ Represent contents with models based on categories and relations (Sager 1990, Sager & L’Homme 1994)

leukocyte
An achromatic cell of the myeloid or lymphoid lineages capable of ameboid movement, found in blood or other tissue.

is_a OBJECT
      An achromatic cell
develops_from MATERIAL ENTITY
      of the myeloid or lymphoid lineages
capable_of PROCESS
      capable of ameboid movement,
located_in MATERIAL ENTITY
      found in blood or other tissue.

Questions

• What kinds of entities are there?
• What are their characteristics?

➔ Use a realist upper-level ontology: the Basic Formal Ontology (BFO)

Proposal

• Create definition models based on the BFO categories and their characteristics

• See to what extent these models predict the contents of definitions

➔ Ontological analysis framework

Methodology


Example: modeling definitions of objects

ligament (UBERON_0000211)

gen   is_a OBJECT
      Dense regular connective tissue
spe   bearer_of FUNCTION
      connecting two or more adjacent skeletal elements or supporting an organ.

leukocyte (CL_0000738)

gen   is_a OBJECT
      An achromatic cell
spe   other: develops_from MATERIAL ENTITY
      of the myeloid or lymphoid lineages
spe   bearer_of DISPOSITION
      capable of ameboid movement,
spe   located_in MATERIAL ENTITY
      found in blood or other tissue.

OBJECT

object
  s has_part object
  s has_part object aggregate

material entity
  s bearer_of disposition
  s bearer_of quality
  s contains process
  s contains process boundary
  s has_history process
  a has_part immaterial entity
  a has_part material entity
  a located_in independent continuant
  s material_basis_of disposition
  a occupies three-dimensional spatial region
  a part_of immaterial entity
  a part_of material entity
  s participates_in process

Creating definition models

• Use BFO specifications, OWL file & FOL axioms

• To analyze BFO categories as relational configurations (RCs)

‘category + relation + category’

E.g.:
some object has_part object
some object participates_in process

material entities in BFO 2.0

material entity
  s bearer_of disposition
  s bearer_of quality
  s contains process
  s contains process boundary
  s has_history process
  a has_part immaterial entity
  a has_part material entity
  a located_in independent continuant
  s material_basis_of disposition
  a occupies three-dimensional spatial region
  a part_of immaterial entity
  a part_of material entity
  s participates_in process

object (is_a material entity)
  s has_part object
  s has_part object aggregate

object aggregate (is_a material entity)
  a has_member_part object

fiat object part (is_a material entity)
  a part_of object

The model for object

Creating relational models with proper and inherited RCs

OBJECT

Relations characterizing the category object:
  s has_part object
  s has_part object aggregate

Relations inherited from the category material entity:
  s bearer_of disposition
  s bearer_of quality
  s contains process
  s contains process boundary
  s has_history process
  a has_part immaterial entity
  a has_part material entity
  a located_in independent continuant
  s material_basis_of disposition
  a occupies three-dimensional spatial region
  a part_of immaterial entity
  a part_of material entity
  s participates_in process
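The way a model combines proper and inherited RCs can be sketched in Python. This is only an illustration under stated assumptions, not the author's implementation: the names `RC`, `PROPER_RCS`, `PARENT` and `model_for` are hypothetical, the markers "s" and "a" are copied verbatim from the slides, and only the configurations listed for object and material entity are encoded.

```python
from typing import Dict, List, NamedTuple, Optional


class RC(NamedTuple):
    """A relational configuration: modality marker, relation, target category."""
    modality: str  # "s" or "a", copied verbatim from the slides
    relation: str
    target: str


# Proper RCs of each category, transcribed from the slides (hypothetical encoding).
PROPER_RCS: Dict[str, List[RC]] = {
    "object": [
        RC("s", "has_part", "object"),
        RC("s", "has_part", "object aggregate"),
    ],
    "material entity": [
        RC("s", "bearer_of", "disposition"),
        RC("s", "bearer_of", "quality"),
        RC("s", "contains", "process"),
        RC("s", "contains", "process boundary"),
        RC("s", "has_history", "process"),
        RC("a", "has_part", "immaterial entity"),
        RC("a", "has_part", "material entity"),
        RC("a", "located_in", "independent continuant"),
        RC("s", "material_basis_of", "disposition"),
        RC("a", "occupies", "three-dimensional spatial region"),
        RC("a", "part_of", "immaterial entity"),
        RC("a", "part_of", "material entity"),
        RC("s", "participates_in", "process"),
    ],
}

# is_a links between categories (only the one needed for this example).
PARENT: Dict[str, str] = {"object": "material entity"}


def model_for(category: str) -> List[RC]:
    """Return the proper RCs of a category, then those inherited along is_a."""
    rcs: List[RC] = []
    current: Optional[str] = category
    while current is not None:
        rcs.extend(PROPER_RCS.get(current, []))
        current = PARENT.get(current)
    return rcs


object_model = model_for("object")
```

Here `object_model` lists the two RCs proper to object first, followed by the thirteen inherited from material entity, which mirrors the two blocks of the OBJECT model.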

Testing the models

Applying the models to multilingual & multi-domain corpora of textual definitions
• Segmenting: GEN + SPE
• Annotating: with relational models
• Statistical analyses to see
  – Which relations are more relevant for defining
  – If new ones can be added to the models

➔ Relevant models from generic ones

Annotation example (1)

<FICHE langue="en">
  <NI>EN_AB_11</NI>
  <CM>
    <DOMAINE>protein biosynthesis in eukaryotic cells</DOMAINE>
    <SS-DOM1>cell</SS-DOM1>
    <SS-DOM2>protein</SS-DOM2>
  </CM>
  <VE>protein</VE>
  <DF>
    <GEN REFrelGEN="is_a OBJECT" relationVE="GENRE_PROCH">A molecule</GEN>
    <SPE REFrelSPE="has_part OBJECT AGGREGATE">consisting of different amino acids linked to each other in a specific sequence by peptide bonds.</SPE>
  </DF>
</FICHE>
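An annotated record of this kind can be read back with standard XML tooling. The sketch below uses Python's `xml.etree.ElementTree` on an abridged copy of the record (the `<CM>` block is left out); the function name `extract_annotations` is hypothetical, and the code assumes that each segment's annotation sits in an attribute named `REFrel` plus the tag name, as in the example.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the annotated record shown on the slide.
RECORD = """<FICHE langue="en">
  <NI>EN_AB_11</NI>
  <VE>protein</VE>
  <DF>
    <GEN REFrelGEN="is_a OBJECT" relationVE="GENRE_PROCH">A molecule</GEN>
    <SPE REFrelSPE="has_part OBJECT AGGREGATE">consisting of different amino acids linked to each other in a specific sequence by peptide bonds.</SPE>
  </DF>
</FICHE>"""


def extract_annotations(xml_text):
    """Return (term, [(segment_type, relational_configuration, text), ...])."""
    root = ET.fromstring(xml_text)
    term = root.findtext("VE")
    segments = []
    for element in root.find("DF"):
        # GEN carries REFrelGEN, SPE carries REFrelSPE.
        rc = element.get("REFrel" + element.tag)
        segments.append((element.tag, rc, (element.text or "").strip()))
    return term, segments


term, segments = extract_annotations(RECORD)
for tag, rc, text in segments:
    print(f"{term}\t{tag}\t{rc}\t{text}")
```

Tuples of this shape are a convenient input for counting how often each relational configuration occurs per category.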

Annotation example (2)

...
<VE>protein</VE>
<DF>
  <GEN REFrelGEN="is_a OBJECT" relationVE="GENRE_PROCH">A molecule</GEN>
  <SPE REFrelSPE="has_part OBJECT AGGREGATE">consisting of different amino acids linked to each other in a specific sequence by peptide bonds.</SPE>
</DF>
...

Results grid for OBJECT (BFO 1.0)

object                         normal  categ.  other  total    %
has_part object                    15       1            16   26
participates_in process             3       6             9
bearer_of quality                  19       3            22   60
bearer_of realizable entity        31       1            32
located_at temporal region          0       0             0
located_in site                     2       2             4
other                                             14     14   14
total                              70      13     14     97  100

(The % values appear to be given per group of rows.)
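The three percentage values in the grid (26, 60, 14) are consistent with totals summed over groups of rows rather than single rows. The short Python check below makes that reading explicit; the grouping in `GROUPS` is an assumption inferred from the arithmetic, not something the slide states, and the names are hypothetical.

```python
# Totals per relation, transcribed from the results grid.
TOTALS = {
    "has_part object": 16,
    "participates_in process": 9,
    "bearer_of quality": 22,
    "bearer_of realizable entity": 32,
    "located_at temporal region": 0,
    "located_in site": 4,
    "other": 14,
}

# Assumed grouping of rows behind the three percentage values.
GROUPS = {
    "has_part + participates_in": [
        "has_part object",
        "participates_in process",
    ],
    "bearer_of + located_at/in": [
        "bearer_of quality",
        "bearer_of realizable entity",
        "located_at temporal region",
        "located_in site",
    ],
    "other": ["other"],
}

grand_total = sum(TOTALS.values())


def group_percentages():
    """Return the rounded share of each row group in the grand total."""
    return {
        name: round(100 * sum(TOTALS[row] for row in rows) / grand_total)
        for name, rows in GROUPS.items()
    }


percentages = group_percentages()
```

Under this grouping the rounded shares come out at 26, 60 and 14, matching the grid.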

Creating BFO-related linguistic/NLP resources

Creating input resources

• Corpora of existing definitions
  – Multi-domain & multilingual
  – Checked for adequacy of form (spelling, punctuation, etc.)
  – May be syntactically parsed
• BFO-Definition-Models
  – XML format
  – Encoded: full/short name for relations, synonyms, sources, purl, comments, modality (all, some, no)
• Programs for automating
  – Genus segmentation & tagging
  – Genus annotation with BFO category
  ➔ More systematic & increased reliability
  ➔ Pre-annotation of large-scale corpora

Genus tagging: segmenting

A. Create processing resources
  1. Morpho-syntactic rules to get SPE preceding the GEN
  2. List of terms from dictionary or ontology
  3. Morpho-syntactic rules to find end of GEN

B. For each definition
  1. Apply rules in A.1 to tag beginning of definitions with a premodifying <SPE> tag
  2. Search for terms at the beginning of definition or after the first <SPE> tag using list in A.2
  3. If term found, tag with <GEN> tag
  4. Else, apply rules in A.3 to tag end of GEN that does not contain a term
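The segmenting procedure can be sketched in Python as follows. Everything concrete here is a toy assumption made for illustration: the two regular expressions stand in for the morpho-syntactic rules of A.1 and A.3, `TERMS` stands in for the term list of A.2, and `segment` is a hypothetical function name. The real rules and lists are not given in the slides.

```python
import re

# Hypothetical stand-ins for the processing resources of step A.
# A.1: toy rule for a premodifying SPE (optional article + adjective in -ic).
PREMODIFIER_RULE = re.compile(
    r"^(?P<lead>(?:an?|the)\s+)?(?P<spe>\w+ic)\s+", re.IGNORECASE
)
# A.2: toy term list (would come from a dictionary or an ontology).
TERMS = ["cell", "molecule"]
# A.3: toy rule for the end of GEN (next word ends in -ing, or is of/that/which).
END_OF_GEN_RULE = re.compile(r"\s+(?=(?:\w+ing|of|that|which)\b)", re.IGNORECASE)


def segment(definition):
    """Tag one definition with <SPE> and <GEN>, following steps B.1 to B.4."""
    parts = []
    rest = definition.strip()

    # B.1: tag a premodifying SPE at the beginning, if the rule fires.
    match = PREMODIFIER_RULE.match(rest)
    if match:
        lead = match.group("lead") or ""
        parts.append(f"{lead}<SPE>{match.group('spe')}</SPE> ")
        rest = rest[match.end():]

    # B.2: look for a known term at the current position (longest first).
    term = next(
        (t for t in sorted(TERMS, key=len, reverse=True)
         if re.match(re.escape(t) + r"\b", rest, re.IGNORECASE)),
        None,
    )
    if term is not None:
        gen_end = len(term)                      # B.3: the term is the GEN
    else:
        boundary = END_OF_GEN_RULE.search(rest)  # B.4: fall back on A.3 rules
        gen_end = boundary.start() if boundary else len(rest)

    gen, spe = rest[:gen_end], rest[gen_end:].strip()
    parts.append(f"<GEN>{gen}</GEN>")
    if spe:
        parts.append(f"<SPE>{spe}</SPE>")
    return "".join(parts)


ligament = segment(
    "Dense regular connective tissue connecting two or more adjacent "
    "skeletal elements or supporting an organ."
)
leukocyte = segment(
    "An achromatic cell of the myeloid or lymphoid lineages capable of "
    "ameboid movement, found in blood or other tissue."
)
```

With these toy resources the ligament definition goes through the fallback of B.4 (no listed term at the start, GEN ends before "connecting"), while the leukocyte definition gets a premodifying SPE for "achromatic" and a GEN found through the term list.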

Segmenting examples

ligament
<GEN REFrelGEN="???" relationVE="GENRE_PROCH">Dense regular connective tissue</GEN><SPE>connecting two or more adjacent skeletal elements or supporting an organ.

morpho-syntactic rule:
</GEN><SPE>.*ing

Genus tagging: annotating

• No existing mappings of linguistic expressions to BFO categories
• Tests to create such resources using
  A. Terms in BFO-compliant ontologies (e.g., mid-level)
  B. WordNet-KYOTO-DOLCE-Lite-BFO mappings
• Processing
  – Get single/complex terms
  – Get headword of complex terms
  – Create synsets of single & complex terms linked to headwords
    A. Map to BFO categories via terms from BFO-compliant ontologies
    B. Map single terms/headwords to WN synsets ➔ search synsets’ ids in KYOTO ➔ get DOLCE categories ➔ map to BFO categories
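The two mapping routes can be illustrated with a small Python sketch. All lookup tables below are toy placeholders invented for the example (including the synset id and the category labels); they do not reproduce entries from WordNet, KYOTO, DOLCE-Lite or any BFO-compliant ontology. The headword heuristic (last token of the term) is likewise an assumption that only suits head-final English noun phrases.

```python
# Toy lookup tables; every entry is an illustrative placeholder.
ONTOLOGY_TERMS = {"tissue": "object"}            # route A: term -> BFO category
WORDNET_SYNSETS = {"cell": "synset-cell-n"}      # route B: term -> synset id
KYOTO_TO_DOLCE = {"synset-cell-n": "physical-object"}
DOLCE_TO_BFO = {"physical-object": "object"}


def headword(term):
    """Assume a head-final noun phrase: the headword is the last token."""
    return term.split()[-1]


def bfo_category(term):
    """Return a BFO category for a genus term, or None if no route succeeds."""
    term = term.lower()
    candidates = (term, headword(term))

    # Route A: terms from BFO-compliant ontologies.
    for candidate in candidates:
        if candidate in ONTOLOGY_TERMS:
            return ONTOLOGY_TERMS[candidate]

    # Route B: WN synset -> KYOTO -> DOLCE category -> BFO category.
    for candidate in candidates:
        synset = WORDNET_SYNSETS.get(candidate)
        dolce = KYOTO_TO_DOLCE.get(synset)
        bfo = DOLCE_TO_BFO.get(dolce)
        if bfo is not None:
            return bfo
    return None


ligament_genus = bfo_category("Dense regular connective tissue")
leukocyte_genus = bfo_category("achromatic cell")
unknown_genus = bfo_category("movement")
```

Trying the full term before its headword keeps a more specific mapping from being overridden by a generic one.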

Conclusion

Output resources

• Bilingual multi-domain training corpora (comparable & parallel corpora)

• From corpora
  – Examples for BFO categories
  – Lexicons mapped to BFO categories & relations
• Annotation manual
• Annotation and validation interfaces

Future work

• Conduct large-scale corpus studies
  ➔ Annotation campaign
  ➔ Larger training corpora for machine learning

• Possible use of the produced linguistic resources to automatically generate textual definitions from logical ones, and vice versa

Thank you!

seljaseppala.wordpress.com

References

• Juan Sager. 1990. A practical course in terminology processing. John Benjamins, Amsterdam, Philadelphia.
• Juan Sager and Marie-Claude L’Homme. 1994. A model for the definition of concepts: Rules for analytical definitions in terminological databases. Terminology, 1(2):351–373.
• Selja Seppälä. 2006. Semi-automatic checking of terminographic definitions. In International Workshop on Terminology design: quality criteria and evaluation methods (TermEval), LREC 2006, 22–27, Genoa, Italy.
• Selja Seppälä. 2010. Automatiser la rédaction de définitions terminographiques : questions et traitements. In RECITAL 2010, Montréal, Canada, 19–22 juillet.
• Selja Seppälä. 2012. Contraintes sur la sélection des informations dans les définitions terminographiques: vers des modèles relationnels génériques pertinents. PhD thesis, Département de traitement informatique multilingue (TIM), Faculté de traduction et d’interprétation, Université de Genève.

Outcomes

• Pilot study suggests that BFO-based models are predictive of definitions’ contents

• When not predictive, BFO categories and relations can be used as a fixed metalanguage

• Methodology may reveal ‘typical features’ for defining each category ➔ weighted differentia

• Creation of linguistic resources for a better integration of BFO into NLP processing

Advantages

• (Semi-)formal structuring of textual definitions
  ➔ Homogeneity
  ➔ Internal structure linked to the ontology
• No complex BFO labels displayed (only in metadata)
  ➔ Increased readability & user-friendliness
• Allows more complex querying via the definition (through metadata AND text)
• Domain-adaptable through corpus analysis
• Methodology applicable to
  – More specific categories from ontologies extending BFO
  – Other upper-level ontologies