February 26, 2003, NCICB Jamboree

Enhancing Quality of Retrieval Through Concept Edit History -- EVS Update

Frank Hartel, Sherri De Coronado, Gilberto Fragoso, Iris Guo, Kim Ong


Outline

• Terminology development -- concept creation, modification, split, merge, retirement

• Edit history usage

• TDE Ontylog editor extension

• Next steps

• Summary


Elementary Edit Actions In Terminology Development

(Create, Modify, Split, Merge, Retire)

[Figure: Evolution of versions/baseline over time. Versions 1 through 4, each annotated with the edit actions Create, Split, Retire, Merge, and Modify.]


Scientific Reasons for Concept Splits

• The ras oncogene was discovered through sequence homology (hybridization) to the v-onc gene of the Harvey strain of murine sarcoma virus.

• Subsequently, it was discovered that there were multiple related ras genes, Ha-ras and Ki-ras. Later, a new ras gene, N-ras, was found.


Scientific Reasons for Concept Merges

• BCL1 gene discovered in the vicinity of a t(11;14) translocation, involved in the malignant transformation of B cells.

• PRAD1 gene found in parathyroid adenomas bearing chromosomal abnormalities.

• CCND1 codes for one of a set of proteins, cyclins, that regulate cell cycle progression.

• All three names were later recognized as referring to the same gene, so the corresponding concepts are merged.


Concept Based Retrieval

[Figure: retrieval diagram. Labels: User; Concepts used for retrieval (C1, C2); Search Engine; Relevant documents; Document indexing terms D1<C1, C2>, D2<C1, C3, C4>.]
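The retrieval step in the diagram can be sketched in a few lines of Python. This is only an illustration, not the EVS search engine: the document identifiers and indexing terms come from the slide, while the function names and the any/all matching option are assumptions made for the example.

```python
from collections import defaultdict

# Indexing terms per document, as shown on the slide:
# D1<C1, C2> and D2<C1, C3, C4>.
DOCUMENT_INDEX = {
    "D1": {"C1", "C2"},
    "D2": {"C1", "C3", "C4"},
}


def build_inverted_index(document_index):
    """Map each concept code to the set of documents indexed with it."""
    inverted = defaultdict(set)
    for doc_id, concepts in document_index.items():
        for concept in concepts:
            inverted[concept].add(doc_id)
    return inverted


def retrieve(query_concepts, inverted, match_all=False):
    """Return documents indexed with any (or, optionally, all) query concepts."""
    hits = [inverted.get(concept, set()) for concept in query_concepts]
    if not hits:
        return set()
    return set.intersection(*hits) if match_all else set.union(*hits)


INVERTED = build_inverted_index(DOCUMENT_INDEX)
```

With the query concepts C1 and C2 from the slide, `retrieve(["C1", "C2"], INVERTED)` returns both documents, and passing `match_all=True` narrows the result to D1. A concept code that no document was indexed with returns nothing, which is exactly the situation that arises when a concept has been split, merged, or retired since indexing.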


Edit History Usage

• Documents are often indexed using different versions of a terminology.

• Re-indexing documents to keep pace with changes made to the terminology is impractical and can be very costly.

• Edit history can greatly enhance precision and recall.

[Figure: edit history usage diagram. Labels: Thesaurus version (Version 1 through Version 4); edit actions new, retire, split, merge, modify; Edit History; Concepts used for retrieval; Search Engine; pre-indexed documents; R1, R2, R3, R4.]
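One way to picture how edit history helps is query expansion: the search engine follows merge and split records backwards so that a current concept also matches documents indexed under its earlier forms. The Python sketch below is an assumption-laden illustration. The record layout, the document names, and the direction of the merges (BCL1 and PRAD1 folded into CCND1) are hypothetical and are not taken from the TDE history database.

```python
from collections import defaultdict

# Hypothetical history records: (action, old_concept, new_concept).
# For a merge, old_concept was retired into new_concept;
# for a split, new_concept was carved out of old_concept.
EDIT_HISTORY = [
    ("merge", "BCL1", "CCND1"),
    ("merge", "PRAD1", "CCND1"),
    ("split", "ras", "Ha-ras"),
    ("split", "ras", "Ki-ras"),
]

# Hypothetical documents, indexed under different thesaurus versions.
PRE_INDEXED = {
    "doc_a": {"BCL1"},
    "doc_b": {"PRAD1"},
    "doc_c": {"CCND1"},
    "doc_d": {"ras"},
}


def predecessors(history):
    """Map each concept to the older concepts it directly descends from."""
    preds = defaultdict(set)
    for _action, old, new in history:
        preds[new].add(old)
    return preds


def expand(concept, history):
    """Return the concept plus every earlier concept it descends from."""
    preds = predecessors(history)
    seen, stack = {concept}, [concept]
    while stack:
        for older in preds.get(stack.pop(), ()):
            if older not in seen:
                seen.add(older)
                stack.append(older)
    return seen


def search(concept, index, history=()):
    """Find documents indexed with the concept or any of its predecessors."""
    terms = expand(concept, history)
    return {doc for doc, concepts in index.items() if concepts & terms}
```

Without history, `search("CCND1", PRE_INDEXED)` finds only `doc_c`. With `EDIT_HISTORY` supplied it also finds `doc_a` and `doc_b`, which were indexed before the merge. Expanding a split concept such as Ha-ras back to ras improves recall in the same way, but it can cost precision, because a document indexed under the broader ras concept may concern a different ras gene.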


Edit History Storage



Terminology Development Environment

• Previously, only three types of edit action were logged: add, modify, and delete.

• Concepts created through split actions are confounded with newly created concepts.

• Concepts merged into other concepts are indistinguishable from retired concepts.

• Failure to explicitly track merge and split edit actions may result in a low recall* rate in information retrieval.

* Recall is the number of relevant documents retrieved as a fraction of all relevant documents.
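The recall definition in the footnote is simple enough to state as code. The sketch below is illustrative only; the document identifiers in the example are invented to show how a merge that was logged as a plain delete can lower recall.

```python
def recall(retrieved, relevant):
    """Fraction of all relevant documents that were actually retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when no documents are relevant")
    return len(set(retrieved) & relevant) / len(relevant)


# Hypothetical example: three documents are relevant to the merged concept,
# but two of them were indexed under concepts that were merged away.
RELEVANT = {"d1", "d2", "d3"}
RETRIEVED_WITHOUT_HISTORY = {"d3"}           # merge not tracked
RETRIEVED_WITH_HISTORY = {"d1", "d2", "d3"}  # merge tracked and followed
```

Here `recall(RETRIEVED_WITHOUT_HISTORY, RELEVANT)` is 1/3, while `recall(RETRIEVED_WITH_HISTORY, RELEVANT)` is 1.0.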


Approach Taken to Extend TDE

• Create reusable concept edit tree Java bean

• Develop user interface for processing split, merge, and retirement edit actions

• Log edit events in TDE history database with clarity and precision
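What logging "with clarity and precision" might mean can be sketched as a record type that names all five edit actions and refuses a split or merge that does not say which other concept is involved. This is a hypothetical Python illustration, not the TDE history database schema: the field names, the `reference` convention, and the editor identifier are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class EditAction(Enum):
    CREATE = "create"
    MODIFY = "modify"
    SPLIT = "split"
    MERGE = "merge"
    RETIRE = "retire"


# Actions that are only meaningful with a second concept to point at.
NEEDS_REFERENCE = {EditAction.SPLIT, EditAction.MERGE}


@dataclass(frozen=True)
class EditEvent:
    action: EditAction
    concept: str                     # concept the action was applied to
    edit_date: date
    editor: str
    # Assumed convention: for a split, the concept that was created;
    # for a merge, the concept that survives.
    reference: Optional[str] = None

    def __post_init__(self):
        if self.action in NEEDS_REFERENCE and self.reference is None:
            raise ValueError(f"{self.action.value} event needs a reference concept")


def log_event(history, event):
    """Append an event to an in-memory history list and return it."""
    history.append(event)
    return event
```

Because the reference concept is mandatory for split and merge events, a split can no longer be mistaken for a plain create, and a merge can no longer be mistaken for a plain retire.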


Extend Ontylog Editor With Plug-Ins

Use Concept Edit Tree widget to build plug-ins


TDE Extension - Split Panel

Edit action is explicitly logged in the TDE History database as a split event.

A concept is created as a result of a split.

Roles and properties may be transferred from one concept to another using drag & drop.
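The data effect of a split can be approximated in Python as follows. The sketch stands in for the drag & drop panel with explicit arguments; the `Concept` class, the tuple-based history, and the example property names are hypothetical and do not reflect the Ontylog data model.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    code: str
    properties: set = field(default_factory=set)  # (name, value) pairs
    roles: set = field(default_factory=set)       # (role_name, target_code) pairs


def split(original, new_code, properties_to_move, roles_to_move, history):
    """Create a new concept from `original` and log one explicit split event."""
    properties_to_move = set(properties_to_move)
    roles_to_move = set(roles_to_move)
    if not properties_to_move <= original.properties:
        raise ValueError("can only move properties the original concept has")
    if not roles_to_move <= original.roles:
        raise ValueError("can only move roles the original concept has")
    new_concept = Concept(new_code, properties_to_move, roles_to_move)
    original.properties -= properties_to_move
    original.roles -= roles_to_move
    # Recording both codes is what distinguishes a split from a plain create.
    history.append(("split", original.code, new_code))
    return new_concept
```

The history entry keeps the link between the original concept and the concept created from it, which is the information a later retrieval step needs.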


TDE Extension - Merge Panel

Edit action is explicitly logged in the TDE History database as a merge event.

Concept to stay Concept to retire

Non-redundant roles and properties are transferred from the retiring concept to the resultant merged concept.
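The transfer of non-redundant roles and properties amounts to a set difference. The Python sketch below illustrates that idea under stated assumptions: the `Concept` class, the tuple-based history, and the example values are invented for the illustration and are not the TDE implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    code: str
    properties: set = field(default_factory=set)  # (name, value) pairs
    roles: set = field(default_factory=set)       # (role_name, target_code) pairs
    retired: bool = False


def merge(staying, retiring, history):
    """Fold `retiring` into `staying` and log one explicit merge event."""
    # Only items the staying concept does not already have are transferred.
    moved_properties = retiring.properties - staying.properties
    moved_roles = retiring.roles - staying.roles
    staying.properties |= moved_properties
    staying.roles |= moved_roles
    retiring.retired = True
    # Recording the surviving concept is what distinguishes a merge from a retire.
    history.append(("merge", retiring.code, staying.code))
    return moved_properties, moved_roles
```

The function returns what was actually moved, so a caller could show the modeler which roles and properties were redundant and therefore left behind.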


TDE Extension - Preretirement

Concept to retire

• Sub-concepts are re-treed.

• Role relationships targeted (i.e., pointing) to the retiring concept are either removed or re-targeted.

Concept can be retired only if all preconditions are met.
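The two preconditions can be expressed as a check that lists whatever still blocks a retirement. This Python sketch assumes a simple representation (a parent map and a list of role triples) chosen for the example; the actual TDE checks may differ, and the concept and role names are hypothetical.

```python
def retirement_blockers(code, parents, roles):
    """List what still prevents `code` from being retired.

    parents: dict mapping each concept to the set of its parent concepts
    roles:   iterable of (source, role_name, target) triples
    """
    blockers = []
    for child, child_parents in sorted(parents.items()):
        if code in child_parents:
            blockers.append(f"sub-concept {child} must be re-treed")
    for source, role_name, target in roles:
        if target == code:
            blockers.append(
                f"role {role_name} from {source} must be removed or re-targeted"
            )
    return blockers


def can_retire(code, parents, roles):
    """A concept can be retired only when nothing blocks it."""
    return not retirement_blockers(code, parents, roles)
```

For example, with `parents = {"Ha-ras": {"ras"}}` and `roles = [("p21", "encoded_by", "ras")]`, retiring ras is blocked twice. Once Ha-ras is re-treed under another parent and the role is removed or re-targeted, `can_retire` returns `True`.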


TDE Extension - Retire Panel

Edit action is explicitly logged in the TDE History database as a retire event.

A non-editable tree shows concept definition information pertinent to the retiring concept.


Next Steps

• Consolidate edit history logged by individual modelers in the terminology development environment (TDE) into concept history data useful to Distributed Terminology System (DTS) users

• Extend caBIO and DTS Server capability to facilitate high quality information retrieval

[Figure: architecture diagram. Labels: End User Applications; caBIO.jar; DTS History API; DTS Extension; DTS Server; XMLRPC Client; XMLRPC Server; Edit history database; EVS; Repositories of Indexed Document; External Databases; Concepts used for retrieval; (to be developed).]


Summary

• Tracking explicit edit actions in TDE is absolutely essential to terminology- and concept-based information retrieval.

• We have successfully extended the TDE Ontylog editor to explicitly track split, merge, and retirement edit events.

• Concept history data and supporting APIs will soon become available to DTS users and developers through caBIO.

caBIO (Cancer Bioinformatics Infrastructure Objects)


EVS Team

Frank Hartel
Sherri De Coronado
Gilberto Fragoso
Margaret Haber
Larry Wright
Jim Oberthaler
Northrop Grumman, Inc.
Kevric Corporation
Aspen Inc.
Apelon, Inc.
Kim Ong
Iris Guo
Bob Dione


Contact

Dr. Francis W. Hartel
Center for Bioinformatics
National Cancer Institute
6116 Executive Blvd.
Rockville, MD 20892-8335
Phone: (301) 435-3869
Fax: (301) 480-4222
Email: [email protected]