Semantic Problems of Thesaurus Mapping

Martin Doerr, Information Systems Group, Institute of Computer Science, Foundation for Research and Technology - Hellas
Lund, April 10-12, 2002 (ICS-FORTH, April 10, 2002)


Page 1: Semantic Problems of Thesaurus Mapping

Semantic Problems of Thesaurus Mapping

Martin Doerr

Information Systems Group
Institute of Computer Science
Foundation for Research and Technology - Hellas

Lund, April 10-12, 2002

ICS-FORTH, April 10, 2002

Page 2: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Thesaurus Interoperability

Objectives: Global access to heterogeneous information sources

Contextual problems of information sources:
• Different providers
• Different objectives
• Overlapping topics/themes

Where do we need thesauri?
• Enhancing full text retrieval, query formulation aids
• Querying structured data & metadata with controlled vocabularies
• Classification systems for information organization

Page 3: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: The Problem

• I ask for “cactus” - you know “cholla”...
• I “chaffinch” - you “Fringilla coelebs”
• I “dolls, Hopi” - you “kachina”
• I “Champs Elysees” - you “France”
• I “Greece, Acropolis” - you “restaurant Acropolis”
• I “Architecture (studies)” - you “Architecture (buildings)”

Thesauri differ:
• in language: natural, scientific or by convention
• in subject: coverage, completeness and detail
• in version: state of development

Page 4: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: “Thesaurus Transition”

[Figure: thesaurus transition in distributed retrieval. Diagram labels: User’s Authorities; Target Authorities; CMS; Collections; old version; specialized; foreign language; Local Term; Agreed-on Term; Distributed Retrieval; “??”.]

Page 5: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Why do we need mapping?

Thesaurus mapping is central for:
• Thesaurus merging
• Thesaurus correlation / interlinking
• Thesaurus federation

Mapping can be concept-based:
• Terms are identified with the set of objects they correctly classify
• Broader terms are regarded as classifying supersets
• Correct mapping is defined through equivalent query results
• Depends on term use rather than comprehension of a term
• Mapping logic should conform with the query paradigm (Z39.50?)
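
The concept-based view above can be sketched in a few lines of Python. This is only an illustration and not part of the original slides: the object ids, the two toy thesauri and the helper names `extension`, `is_broader` and `is_correct_mapping` are all assumptions made for the example.

```python
# Hypothetical toy data: each term is identified with the set of object
# ids it correctly classifies (its "extension").
SOURCE_THESAURUS = {
    "cactus": {1, 2, 3},
    "cholla": {2, 3},
}
TARGET_THESAURUS = {
    "Cactaceae": {1, 2, 3},
    "Cylindropuntia": {2, 3},
}


def extension(thesaurus, term):
    """Return the set of objects a query for `term` would retrieve."""
    return thesaurus[term]


def is_broader(thesaurus, broad, narrow):
    """A broader term classifies a superset of the narrower term's objects."""
    return extension(thesaurus, broad) >= extension(thesaurus, narrow)


def is_correct_mapping(source_term, target_term):
    """A mapping is correct when both terms give equivalent query results."""
    return (extension(SOURCE_THESAURUS, source_term)
            == extension(TARGET_THESAURUS, target_term))
```

Under these assumptions the mapping depends only on how terms are used to classify objects, not on how the words themselves are understood.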

Page 6: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Two approaches – Three communities

Automatic mapping:
• Based on parallel indices / similar documents
• Statistical & neural network methods
• Cheap and with optimal coverage
• Missing intellectual insight
• Cannot distinguish whether terms express different aspects or whether terms are used for different aspects. (May confuse mapping of concepts with concept co-occurrence in the document sample)
• Limited precision
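
The co-occurrence pitfall mentioned above can be made concrete with a small sketch. It assumes a plain Jaccard overlap on a parallel index as the statistical method, which is a stand-in chosen for illustration and not a method named in the slides; the index contents and the names `jaccard` and `best_match` are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two document sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical parallel index: the same documents (ids 1..5) indexed
# once with thesaurus A and once with thesaurus B.
INDEX_A = {"swords": {1, 2, 3, 4}}
INDEX_B = {"fencing": {1, 2, 3, 5}, "sabres": {4}}


def best_match(term):
    """Pick the thesaurus-B term whose documents overlap most with `term`."""
    docs = INDEX_A[term]
    return max(INDEX_B, key=lambda candidate: jaccard(docs, INDEX_B[candidate]))
```

In this toy sample `best_match("swords")` selects "fencing", an activity that merely co-occurs with swords in the documents, rather than a term for a kind of sword. The statistic alone cannot tell the two cases apart.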

Page 7: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Two approaches – Three communities

Intellectual mapping:
• Manual, based on expert knowledge about terms
• Can be supported by Description Logics (“Ontologies”)
• Expensive, but with high precision
• Insight into structure and long-term stability

Proposition: The intellectual structures are complex. Investigating them helps to improve intellectual mapping and to refine statistical mapping methods.

Page 8: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Translation and Mapping

[Figure: translation and mapping between an English and a French thesaurus. Diagram labels: English Thesaurus; French Thesaurus; English Vocabulary; French Vocabulary; interthesaurus relations for query expansion (thesaurus transition); linguistic translation as “lead-in” (twice); Interlingua for agreements; AND; +/-.]

Page 9: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Logics of Mapping for Z39.50

Interthesaurus relations (ISO 5964):

• partial equivalence. Must become:
  - broader equivalence (is subset of)
  - narrower equivalence (is superset of)

• exact equivalence (same set as)

• inexact equivalence (overlaps with): good for full text retrieval (FTR) only

• single to multiple equivalence. Must become: exact equivalence to a BOOLEAN combination of target terms: “AND” (intersection), “OR” (union), “NOT” (complement)
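
The set readings of the interthesaurus relations listed on this slide can be sketched as a small classifier. This is an illustrative assumption, not an implementation of ISO 5964 or Z39.50: the function name `interthesaurus_relation` and the label "no equivalence" for disjoint sets are introduced here for the example.

```python
def interthesaurus_relation(source_ext, target_ext):
    """Classify how a source term relates to a target term, given the sets
    of objects (extensions) each term correctly classifies."""
    if source_ext == target_ext:
        return "exact equivalence"      # same set as
    if source_ext < target_ext:
        return "broader equivalence"    # source is a proper subset of target
    if source_ext > target_ext:
        return "narrower equivalence"   # source is a proper superset of target
    if source_ext & target_ext:
        return "inexact equivalence"    # the sets merely overlap
    return "no equivalence"             # disjoint sets (label assumed here)
```

For example, `interthesaurus_relation({2, 3}, {1, 2, 3})` yields "broader equivalence", because every object classified by the source term is also classified by the target term.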

Page 10: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Boolean AND-Combinations

[Figure: term A linked by exact equivalence to the Boolean compound “B AND C”; terms B and C shown with BT links.]

The Boolean compound:
• Uses instances of both B and C
• Combines properties of B and C
• Is NT of B, C and BT of their common narrower terms.
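
A Boolean compound such as “B AND C” can be evaluated over term extensions with ordinary set operations. The sketch below assumes a nested-tuple notation for expressions and a helper named `evaluate`; both are hypothetical, as are the example terms and object ids.

```python
def evaluate(expr, extensions, universe):
    """Evaluate a Boolean combination of target terms to a set of objects.

    `expr` is either a term name or a tuple such as ("AND", "B", "C"),
    ("OR", "B", "C") or ("NOT", "B"); tuples may be nested.
    """
    if isinstance(expr, str):
        return extensions[expr]
    op, *args = expr
    sets = [evaluate(arg, extensions, universe) for arg in args]
    if op == "AND":
        return set.intersection(*sets)   # intersection
    if op == "OR":
        return set.union(*sets)          # union
    if op == "NOT":
        (operand,) = sets                # NOT takes exactly one operand
        return universe - operand        # complement
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical example: "wooden swords" as the compound of two target terms.
UNIVERSE = {1, 2, 3, 4, 5}
EXTENSIONS = {"wooden objects": {1, 2, 3}, "swords": {2, 3, 4}}
WOODEN_SWORDS = evaluate(("AND", "wooden objects", "swords"), EXTENSIONS, UNIVERSE)
```

The result is a subset of both operand extensions, which matches the statement that the compound is a narrower term of both B and C.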

Page 11: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Issues of Mapping Logics for Z39.50

How to use Boolean expressions inversely:
• Calculation of inferences
• Boolean combinations to a post-coordinated thesaurus: how to index the existence of an incoming link?

Mappings must be complete:
• Should guarantee recall over non-equivalent terms: preservation of precision or recall should be selectable
• Should avoid redundancies; need consistency control!
• Should avoid combinatorial explosion: need cascading Thes A => Thes B => Thes C
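
Cascading can be read as composing mapping tables, so that a mapping from Thes A to Thes C is derived from A => B and B => C instead of being authored separately for every pair of thesauri. The sketch below is a simplification under stated assumptions: each mapping is a plain dictionary from a source term to a set of target terms (read as an OR-combination), relation types are ignored, and the function name `cascade` and all terms are hypothetical.

```python
def cascade(map_ab, map_bc):
    """Compose A=>B and B=>C term mappings into a derived A=>C mapping."""
    map_ac = {}
    for a_term, b_terms in map_ab.items():
        c_terms = set()
        for b_term in b_terms:
            # Terms of B without a mapping into C contribute nothing.
            c_terms |= map_bc.get(b_term, set())
        map_ac[a_term] = c_terms
    return map_ac


# Hypothetical mapping tables.
MAP_AB = {"chaffinch": {"finches"}, "cholla": {"cacti", "succulents"}}
MAP_BC = {"finches": {"Fringillidae"}, "cacti": {"Cactaceae"}}
MAP_AC = cascade(MAP_AB, MAP_BC)
```

With N thesauri, a chain of this kind needs N - 1 mapping tables rather than one per pair. Note that approximate (broader or narrower) links may lose recall or precision at every step of the cascade, which this sketch does not track.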

Page 12: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Approximation by Inclusion

[Figure: terms A, B and C with BT links; labels: broader equivalence, narrower equivalences.]
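
Approximation by inclusion lets the user choose between preserving recall and preserving precision when no exact equivalent exists. The following sketch assumes a mapping table that records broader and narrower equivalents separately; the data, the target term names and the functions `translate` and `retrieve` are hypothetical.

```python
# Hypothetical extensions (sets of object ids) for one source term and
# three target terms; none of this data comes from the slides.
SOURCE_EXT = {"A": {1, 2, 3, 4}}
TARGET_EXT = {"broad": {1, 2, 3, 4, 5, 6}, "B": {1, 2}, "C": {3}}

# Approximate mappings for source term "A", by inclusion.
MAPPING = {"A": {"broader": ["broad"], "narrower": ["B", "C"]}}


def translate(term, prefer):
    """Return target terms for `term`, preserving recall or precision."""
    entry = MAPPING[term]
    if prefer == "recall":
        return list(entry["broader"])    # superset: nothing relevant is lost
    if prefer == "precision":
        return list(entry["narrower"])   # subsets: nothing irrelevant is added
    raise ValueError("prefer must be 'recall' or 'precision'")


def retrieve(target_terms):
    """Union of the objects classified by the given target terms."""
    result = set()
    for term in target_terms:
        result |= TARGET_EXT[term]
    return result
```

In this toy data the recall-preserving translation retrieves every object of A plus two extra ones, while the precision-preserving translation retrieves only objects of A but misses object 4.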

Page 13: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Obstacles to Thesaurus Transition

Unclear coverage & incompatible organisation:
• Special vocabularies often contain general terms, contract upper levels. No global abstraction levels.
• Missing or contradictory NT/BT relations.
• “Loose” NT semantics (like part-whole, see-also etc.).
• Arbitrariness of monohierarchies. E.g. a hierarchy of colorants, like “red organic dye”: organize it
  - by composition, production method or origin?
  - by color?
  - by physical property or function?

Page 14: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Obstacles to Thesaurus Transition

Term semantics:
• Post-coordination should make use of Description Logics (DL):
  - Combinations from disjoint facets: “factories + grinding”.
  - Unclear rules for allowed combinations.
  - How to attach and index synonyms in a post-coordinated hierarchy.
• Use-induced incompatibility:
  - E.g. subject/object: “bridge” - “bridge construction”.
• “Complementary polysemy” (Pustejovsky):
  - Context-induced shifts of meaning: door, architecture etc.
  - … cause context-related differences in hierarchy.

Page 15: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Complementary Polysemy and Minor Facets

“Minor facets” provide explicit context criteria. E.g. MDA archeological thesaurus:
• armour by construction: scale armour
• armour by form: cuirass
• armour by function: parade armour

Are these criteria idiosyncratic?

How do they relate to each other?

How do they relate to compound term formation?

Page 16: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Minor Facets in the AAT

The “object” facet (1998 edition) contains:
• About 1640 facet indicators
• About 600 with explicit criteria (“by form” etc.)
• Using 150 (!) criteria

Preliminary frequency analysis of criteria:
• Form: 35%, function: 30%, placement: 15%, construction: 15%, social context: 5%…

Hypothesis:
• Minor facet criteria can be systematically generalized
• Minor facet criteria are different kinds of NT relations

Page 17: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Narrower Terms for three Facets

[Figure: narrower terms across three facets. Term labels: objects; sword-like objects; weapons; cutting and thrusting weapons; swords; foils (swords); fencing swords; wooden swords. Criteria labels: sword-like; fighting and hunting; cutting and thrusting; fencing; wooden. Link types: term specialization; criteria assignment.]

Page 18: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Explicit facet criteria for objects

[Figure: three facets (hierarchy of object forms; hierarchy of construction features; hierarchy of functions and social roles) shown as descriptive aspects / description elements alongside a hierarchy of compound terms with embedded characteristic terms.]

Page 19: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: Summary of Semantic Problems

We could identify four semantic problems (statistical methods are not sensitive to semantic problems):

• Logics of query term expansion between compatible hierarchies

• Theory of concept formation by compound terms, linguistic and semantic. “KR” should collaborate with experienced thesaurus editors.

• Understanding of context-dependency of term hierarchies: understanding of the role of complementary polysemy, differences between subject and object classification.

• Meaning of terms versus meaning of a term as used for a document

Page 20: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: What To Do

Research: Deeper understanding.

• Investigation of polyhierarchies, polysemy and BT/NT semantics.

• Theory of concept formation by compound terms, from linguistics and logic.

• Use of ontologies as “top-level thesauri”, to provide:
  - highest levels (like physical objects, actors, events).
  - “roles” for concept formation (e.g. “using”, “made for”, “made in”).
  - transition between single terms and terms in multiple fields (e.g. type: “sword”, material: “wood” versus “wooden sword”).
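
The transition between a single compound term and the same information spread over several fields can be sketched as a two-way lookup. The table, the field names `type` and `material`, and the helpers `to_fields` and `to_compound` are assumptions for illustration; a real system would derive such decompositions from role definitions rather than a hand-written table.

```python
# Hypothetical decomposition table: compound term -> fields.
COMPOUND_TERMS = {
    "wooden sword": {"type": "sword", "material": "wood"},
}


def to_fields(term):
    """Split a compound term into fields; single terms fill only `type`."""
    return dict(COMPOUND_TERMS.get(term, {"type": term}))


def to_compound(fields):
    """Find the compound term whose decomposition equals `fields`, if any."""
    for term, decomposition in COMPOUND_TERMS.items():
        if decomposition == fields:
            return term
    return None
```

A record indexed with type “sword” and material “wood” and a record indexed with the single term “wooden sword” can then be matched against each other in either direction.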

Page 21: Semantic Problems of Thesaurus Mapping

Thesaurus Mapping: What To Do

Protocols: enabling dynamic thesaurus transition
• Metadata for description of the logic of a thesaurus: BT/NT semantics, organization principles, lead-ins
• Recall/precision control in thesaurus transition
• DL-based post-coordination rules. Explicit use of “Roles”.

Practice: Analysis of semantic heterogeneity
• Comparing thesauri wrt logic of construction and intended use.
• Understanding semantics of automatic mappings, integration of intellectual and automatic methods.