Knowledge Organization Research in the last two decades: 1988-2008 Fidelia Ibekwe-SanJuan Eric SanJuan


Post on 14-Apr-2017


Page 1: Knowledge Organization Research in the last two decades: 1988

Knowledge Organization Research in the last two decades: 1988-2008

Fidelia Ibekwe-SanJuan Eric SanJuan

Page 2: Knowledge Organization Research in the last two decades: 1988

Outline

• Previous work
• Goal
• Data collection
• Analysis methodology
• Results
• Discussion

Page 3: Knowledge Organization Research in the last two decades: 1988

Previous works

• Trend surveys in KO:
– McIlwaine & Williamson (1999)
– McIlwaine (2003)
– Hjorland & Albrechtsen (1999)
– Lopez-Huertas (2008)
– Saumure & Shiri (2008)
– Smiraglia (2009)

Page 4: Knowledge Organization Research in the last two decades: 1988

Previous works

• Personal readings*: journals & ISKO proceedings
• Query: was a query constructed and submitted to a database in order to retrieve records?
• Publications: reading / perusing of full texts?
• Records: bibliographic records (titles & abstracts)

Author | Period | Data source | Query | Size
McIlwaine & Williamson (1999) | 1988-1998 | personal readings* | - | -
McIlwaine (2003) | 1998-2003 | personal readings* | - | -
Lopez-Huertas (2008) | 1989-1998? | WoS (SSCI) + LISA + personal readings | knowledge organization + others | 575 publications
Saumure & Shiri (2008) | 1966-2006 | LISTA | information organization, knowledge organization | 219 records

Page 5: Knowledge Organization Research in the last two decades: 1988

Previous works

• Major findings:
– 1988-2003: McIlwaine & Williamson (1999); McIlwaine (2003)
  • Classification schemes (UDC, DDC, LCSH, ...)
  • Bias in classification (gender, culture)
  • Interoperability of KO vocabularies
  • Rise of Internet technology and search engines, and their impact on KO
  • Resource discovery
  • Emerging trends in expert systems (NLP, ontologies, automatic indexing, ...)
  • Terminology management problems
  • Thesauri design
  • Information visualisation in an online context

Page 6: Knowledge Organization Research in the last two decades: 1988

Previous works

• Major findings:
– 1989-1998?: Lopez-Huertas (2008)
  • Mainstream research topics in KO are reformulations of old problems (classification, thesauri)
  • Recasting them in the web era gives them a new life!
  • Especially since KO is more & more entwined with sister fields
  • 2 major driving forces of research in KO:
    – demand for quality & interoperability in a multilingual, multicultural world
    – managing emergent knowledge in KOS in the semantic web era
  • Both are reformulations of the multidimensionality of knowledge
  • Both necessitate an inter- and multi-disciplinary effort
  • etc...

Page 7: Knowledge Organization Research in the last two decades: 1988

Previous works

• Major findings:
– 1966-2006 (40 yrs!), pre- & post-web era: Saumure & Shiri (2008)
  • Organizing corporate or business information
  • Machine-assisted knowledge organization
  • Information professionals
  • Interoperability
  • Cataloging and classification
  • Classifying the web
  • Digital preservation and digital libraries
  • Metadata applications and uses
  • Cognition
  • Education
  • Indexing and abstracting
  • Thesauri initiatives

Page 8: Knowledge Organization Research in the last two decades: 1988

Previous works

• Major findings:
– Saumure & Shiri (2008): 1966-2006 (40 yrs!), pre- & post-web era
  • Trends between the pre-web era (before 1993, the release date of Mosaic, the first widely used browser) and the post-web era
  • KO research focused throughout on mainstream topics: cataloguing, classification
  • Pre-web era: more focused on indexing and cataloguing
  • Post-web era: metadata generation & harvesting, interoperability, thus a more technological thrust

Page 9: Knowledge Organization Research in the last two decades: 1988

Previous works

• Summary
– Despite methodological differences in data collection and analysis methods, important overlaps in findings
– Mainstream research is still driving KO (classification research, cataloguing, thesauri, bias, ...)
– Reformulations in the web era (interoperability, metadata creation & harvesting, assisted indexing & retrieval, terminology issues, ...)

Page 10: Knowledge Organization Research in the last two decades: 1988

Goal

• Trends survey of research on KO issues over past 2 decades (1988-2008), 21 yrs.

• What can we get from automatic data analysis methods?

• Can they provide any useful insight?

Page 11: Knowledge Organization Research in the last two decades: 1988

Goal

• Epistemology:
– Empiricism (how): methodology, observation of evidence from data
– Pragmatism (why): is it useful and for whom?

• Some connection with bibliometrics, but the focus is on mapping contents, not on mapping authors

• Methodological difference with mainstream data analysis techniques: symbolic (linguistic & terminological) vs bag-of-words approach

Page 12: Knowledge Organization Research in the last two decades: 1988

Data collection (1)

• Issue:
– ISKO proceedings: not indexed in a machine-processable format (database)
– No problem for peer-reviewed journals...

• But ambiguity of the KO concept!
– At the end of the day... a manual selection of KO & LIS-related journals
– Records downloaded from Web of Science (WoS)

Page 13: Knowledge Organization Research in the last two decades: 1988

Data collection (2)

• List of 31 selected journals at http://fidelia1.free.fr/isko2010/data/list-journals.pdf

• 931 records, of which 838 came from KO & its ancestor (International Classification)

• 45 000 words in titles & abstracts
• Research trends will mostly portray publications from the KO journal.
• Not the entire realm of publications on KO, but we had to be content with that...

Page 14: Knowledge Organization Research in the last two decades: 1988

Sample record from ISI-WoS

PT J
AU RADA, R
   ROSSIMORI, A
   PATON, R
   RECTOR, A
   MAGLIANI, F
   ROBBE, PD
TI THE GALEN DREAM
SO INTERNATIONAL CLASSIFICATION
AB Outlines the origin, needs and principles of GALEN, the Generalized Architecture for Languages, Encyclopedias, and Nomenclatures as applicable to Medicine. Short-term and long-term plans of GALEN have been elaborated to cope with possible developments. "Milestones" are given indicating what should be reached when and how much funding will be required for each milestone. In two "vision" pictures the situation before and after the introduction of GALEN is shown and the responsibilities at 4 different levels are listed.
SN 0340-0050
PY 1992
VL 19
IS 4
BP 188
EP 191
UT ISI:A1992KH33900002
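A record in this field-tagged layout can be read with a few lines of code. The sketch below is only an illustration, not the authors' tooling: it assumes one two-character tag per field, indented continuation lines, and one author per line under AU. The function names `parse_wos_record` and `period_of` are hypothetical, and the sample is an abridged copy of the record above.

```python
SAMPLE = """\
PT J
AU RADA, R
   ROSSIMORI, A
   PATON, R
   RECTOR, A
   MAGLIANI, F
   ROBBE, PD
TI THE GALEN DREAM
SO INTERNATIONAL CLASSIFICATION
AB Outlines the origin, needs and principles of GALEN, the Generalized
   Architecture for Languages, Encyclopedias, and Nomenclatures as applicable to Medicine.
PY 1992
UT ISI:A1992KH33900002
"""


def parse_wos_record(text):
    """Parse one field-tagged record into a dict {tag: [values]}."""
    record = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue
        head = line[:2]
        # A new field starts with a two-character uppercase tag (assumed layout).
        if head.isalnum() and head.isupper() and line[2:3] in (" ", ""):
            tag = head
            record.setdefault(tag, []).append(line[3:].strip())
        elif tag is not None:
            value = line.strip()
            if tag == "AU":
                # Each continuation line under AU is a separate author.
                record[tag].append(value)
            else:
                # Other fields: continuation lines extend the current value.
                record[tag][-1] += " " + value
    return record


def period_of(record):
    """Assign a record to one of the two study periods (corpus assumed to span 1988-2008)."""
    year = int(record["PY"][0])
    return "1988-1997" if year <= 1997 else "1998-2008"


record = parse_wos_record(SAMPLE)
print(record["TI"][0], len(record["AU"]), period_of(record))
```

Only the TI and AB fields would feed the term extraction; PY is what allows the corpus to be split into the two periods.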

Page 15: Knowledge Organization Research in the last two decades: 1988

Analysis methodology (1)

Empirical observations of how terminology depicts knowledge artefacts (titles & abstracts)

– Terminology engineering
– Descriptive text data analysis (automatically propose a partition of the data): hierarchical agglomerative clustering
– Mapping & visualisation: multidimensional view of domain structure (symbolic & numerical information)

• TermWatch system (SanJuan & Ibekwe-SanJuan 2006)

Page 16: Knowledge Organization Research in the last two decades: 1988

Analysis methodology (2)

- Corpus split in 2 periods
  * 1988-1997
  * 1998-2008

- Terminology modeling
  * Automatic extraction of terms
  * Term variant search

- Clustering by semantic relations
- Linking clusters by co-occurrence
- Mapping & visualization

Page 17: Knowledge Organization Research in the last two decades: 1988

Analysis methodology (3)

- Terminology modeling
  * Automatic extraction of terms
    * surface morpho-syntactic properties of terms
    * rule implementation
    * extraction of likely candidates
    * filtering: statistical measures or manual
    * Problem: statistical measures only work well on massive data
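As a rough illustration of rule-based extraction from surface morpho-syntactic properties, the sketch below keeps runs of adjectives and nouns that end in a noun. It is not the TermWatch rule set: the pattern, the tag names (ADJ, NOUN, ...) and the function `extract_candidate_terms` are assumptions, and the input is tagged by hand rather than by a real part-of-speech tagger.

```python
def extract_candidate_terms(tagged_tokens, min_len=2):
    """Return multi-word candidates matching (ADJ|NOUN)* NOUN."""
    terms, current = [], []
    # A sentinel token closes the last run of adjectives/nouns.
    for word, tag in list(tagged_tokens) + [("", "END")]:
        if tag in ("ADJ", "NOUN"):
            current.append((word, tag))
            continue
        # The run ended: drop trailing adjectives so the candidate ends in a noun.
        while current and current[-1][1] != "NOUN":
            current.pop()
        if len(current) >= min_len:
            terms.append(" ".join(w for w, _ in current))
        current = []
    return terms


# Hand-tagged example sentence (the tags are assumptions made for the illustration).
sentence = [
    ("The", "DET"), ("universal", "ADJ"), ("classification", "NOUN"),
    ("scheme", "NOUN"), ("supports", "VERB"), ("automatic", "ADJ"),
    ("indexing", "NOUN"), ("of", "PREP"), ("online", "ADJ"),
    ("documents", "NOUN"),
]
candidates = extract_candidate_terms(sentence)
print(candidates)
```

Candidates produced this way would still need the filtering step, which on a corpus of this size is more likely to be manual than statistical.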

Page 18: Knowledge Organization Research in the last two decades: 1988

Analysis methodology (4)

- Terminology modeling
  * Term variant search
    * surface morpho-syntactic operations between terms
    * spelling variants (WordNet)
    * synonyms (USE/UF) (WordNet)
    * likely BT/NT candidates: syntactic information
    * likely RT candidates: lexico-syntactic information
    * some errors and noise
    * but automation always involves a trade-off!

Page 19: Knowledge Organization Research in the last two decades: 1988

Analysis methodology (5)

• Some term variants acquired:
– Paradigmatic organization (BT/NT): classification scheme
  • universal classification scheme
  • generic classification scheme
  • knowledge classification scheme
– USE/UF: Library of Congress / LC
– RT: knowledge organisation scheme / knowledge organization tool

• The system does not tag these relations as such
• They are assumed to be implied by the variations
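The variations listed above can be detected from word order alone. The sketch below is a simplification under stated assumptions, not the TermWatch implementation: it treats the addition of modifiers to the left of a term as a likely NT, a change of head word with identical modifiers as a likely RT, and folds "-isation" into "-ization" as a crude stand-in for spelling-variant lookup. The labels are returned only to make the illustration readable; the `relate` and `normalize` helpers are hypothetical.

```python
def normalize(term):
    """Lowercase and fold British/American -isation/-ization spellings (crude assumption)."""
    return tuple(w.lower().replace("isation", "ization") for w in term.split())


def relate(a, b):
    """Guess the relation of term b to term a from surface word order."""
    ta, tb = normalize(a), normalize(b)
    if ta == tb:
        return "spelling variant"
    # b = modifiers + a: b is a likely narrower term of a.
    if len(tb) > len(ta) and tb[-len(ta):] == ta:
        return "NT candidate"
    # Same modifiers, different head word: likely related terms.
    if len(ta) == len(tb) and len(ta) > 1 and ta[:-1] == tb[:-1]:
        return "RT candidate"
    return None


print(relate("classification scheme", "universal classification scheme"))
print(relate("knowledge organisation scheme", "knowledge organization tool"))
```

Acronym pairs such as Library of Congress / LC cannot be found this way and would need a separate lookup.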

Page 20: Knowledge Organization Research in the last two decades: 1988

Analysis methodology (6)

• Assumptions behind terminology modeling
  • Consensus from studies on terminology/lexicography: new terms (denominations of concepts) are mostly created from existing terms
  • Terms are rarely created ex nihilo
  • Surface linguistic operations reveal semantic (conceptual?) relations between domain concepts
  • Studying these operations and visualising how they relate terms reveals the conceptual structure of a domain

Page 21: Knowledge Organization Research in the last two decades: 1988

Analysis methodology (7)

• Clustering
  • 3-tier process:
    1st: group terms by close semantic relations
    2nd: hierarchical clustering by lesser semantic relations (many iterations)
    3rd: link cluster labels by co-occurrence of labels or of their variants

• Visualisation
  • Thematic maps (Pajek)
  • Navigation interface (browser)
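The three-tier clustering process can be sketched on toy data as follows. This is an illustration under strong assumptions, not the algorithm used in the study: tier 1 is reduced to connected components over "close" relations, tier 2 to a single merge of components joined by a "weaker" relation (the slide mentions many iterations of hierarchical clustering), and tier 3 counts documents in which terms of two clusters co-occur. All names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def components(items, edges):
    """Group items into connected components with a small union-find."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in items:
        groups[find(x)].add(x)
    return [frozenset(g) for g in groups.values()]


def three_tier(terms, close_edges, weak_edges, documents):
    # Tier 1: group terms linked by close semantic relations.
    comps = components(terms, close_edges)
    owner = {t: c for c in comps for t in c}
    # Tier 2 (simplified): merge components joined by a weaker relation.
    comp_edges = [(owner[a], owner[b]) for a, b in weak_edges]
    clusters = [frozenset().union(*g) for g in components(comps, comp_edges)]
    # Tier 3: link clusters whose terms co-occur in the same document.
    cluster_of = {t: c for c in clusters for t in c}
    links = defaultdict(int)
    for doc_terms in documents:
        present = {cluster_of[t] for t in doc_terms if t in cluster_of}
        for c1, c2 in combinations(sorted(present, key=sorted), 2):
            links[(c1, c2)] += 1
    return clusters, dict(links)


terms = ["classification scheme", "universal classification scheme",
         "knowledge organization scheme", "knowledge organization tool",
         "thesaurus"]
close = [("classification scheme", "universal classification scheme"),
         ("knowledge organization scheme", "knowledge organization tool")]
weak = [("classification scheme", "knowledge organization scheme")]
docs = [["universal classification scheme", "thesaurus"],
        ["knowledge organization tool", "thesaurus"],
        ["classification scheme"]]

clusters, links = three_tier(terms, close, weak, docs)
print(len(clusters), list(links.values()))
```

The resulting clusters and weighted links are the kind of node and edge data a network tool such as Pajek would take as input for a thematic map.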

Page 22: Knowledge Organization Research in the last two decades: 1988

Results (1)

Page 23: Knowledge Organization Research in the last two decades: 1988

Results (2)

Main topics for period 1 (1988-1997)
– Global structure: typical « core-periphery » layout
– Knowledge is the structuring pole
– Classification
– Subjects gravitating around the Knowledge pole:
  • analysis
  • online vocabulary control standardization
  • bibliographic information system
  • indexing (automatic & manual)
  • thesaurus construction and usage
  • information documentation system
  • translation

Page 24: Knowledge Organization Research in the last two decades: 1988

Results (3)

In the last decade (1998-2008):
• Research network is much more intertwined
  – No single center but several « core » issues connected to one another
  – Major topics are intertwined: KO issues ↔ classification ↔ information theoretic ↔ indexing language ↔ user evaluation

• Newer topics: web issues, metadata, knowledge discovery, computer algorithm, ...

Page 25: Knowledge Organization Research in the last two decades: 1988

Results (4)

1998-2008, equal divide between:
• theoretical research
  – information science, concept, classification theory, epistemological foundation, ...
• user-oriented studies
  – user librarian, user-defined descriptor, user evaluation
• mainstream KO issues
  – classification, thesaurus, KO, term selection
• technology-oriented handling of KO issues
  – knowledge, system, transfer, knowledge representation, knowledge engineering, knowledge discovery, information processing, computer algorithm, ...
  – web, web designer, web document
  – information retrieval, terminology structuring, metadata, metadata quality

Page 26: Knowledge Organization Research in the last two decades: 1988

Discussion

• Evaluation of clusters: an information-theoretic problem. No solution.
• No gold standard
• Goal of the method: precisely to propose a partition of the data
• Is it the best one?
• Reliance on external criteria: human (expert) evaluation
• So a response from the community is needed!

Page 27: Knowledge Organization Research in the last two decades: 1988

• Thank you for listening