Page 1: Semantics-Empowered Text Exploration for Knowledge Discovery Delroy Cameron, Pablo N. Mendes, Amit P. Sheth Knowledge Enabled Information and Services
Semantics-Empowered Text Exploration for Knowledge Discovery

Delroy Cameron, Pablo N. Mendes, Amit P. Sheth
Knowledge Enabled Information and Services Science Center (Kno.e.sis)
Department of Computer Science and Engineering
Wright State University
Dayton, OH

Victor Chan
Division of Biosciences and Performance
Human Effectiveness Directorate
Air Force Research Lab (AFRL)
Wright-Patterson Air Force Base
Dayton, OH

48th ACM Southeast Conference. ACMSE 2010. Oxford, Mississippi. April 15-17, 2010.

OUTLINE

• Background
• Paradigm Shift
• Demo
• Architecture
• Experimental Results
• Future Work
• Conclusion

BACKGROUND

IR Systems - Interaction Paradigm
• Manually seek information
• Hyperlinked Documents
• Document-Centric Model

Basis - Interaction Paradigm
• Keyword Search
• Document Browsing

BACKGROUND

Interaction Sequence
1. Assemble Keywords and Search
2. Document Selection
3. Document Inspection
4. Aggregation/Organization

Information Need: What is the role of Magnesium in relation to Migraine?

Keyword Search: Magnesium migraine

LIMITATIONS

Query Reformulations
• Impatient users
• Recognition over Recall

Constrained navigation
• Hyperlink dependent - a priori

Fuzzy User Interests
• Haiti Earthquake – Recovery, Relief, Political Climate, Crime

Ineffective for Exploratory Search
• Search-and-Sift

Query: Father of the Web
Answer: Sir Tim Berners-Lee

Amit P. Sheth, Cartic Ramakrishnan: Relationship Web: Blazing Semantic Trails between Web Resources. IEEE Internet Computing 11(4): 77-81 (2007)

MOTIVATION

Users are information seekers
Information is embedded in documents
• A priori hyperlink dependent

Semantic Web Standards
• Entity Identification (Semantic Annotations)
• Relationship and Triple Identification
• Explore documents/information via relationships

PARADIGM SHIFT

Search Hit > Annotated Hit
• Bag of annotated words/phrases
• Annotated phrase is known entity
• Entity is Subject/Object of Triple

Navigation driven by relationships
• Entity[Document] -> Relationship -> Entity[Document]

Contextual Navigation (relationships as context)
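A minimal sketch of relationship-driven navigation, assuming triples are held in a small in-memory index. The `Triple` and `TripleStore` names, and the placeholder predicates `relation_a` and `relation_b`, are illustrative only and do not come from the slides; the prototype's actual storage is not described here.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Tiny in-memory index of triples, keyed by the entities they mention."""

    def __init__(self, triples):
        self._by_entity = defaultdict(list)
        for t in triples:
            self._by_entity[t.subject].append(t)
            self._by_entity[t.obj].append(t)

    def neighbors(self, entity):
        """Return (predicate, other_entity) pairs one hop away from `entity`."""
        hops = []
        for t in self._by_entity.get(entity, []):
            other = t.obj if t.subject == entity else t.subject
            hops.append((t.predicate, other))
        return hops


# Placeholder triples: the predicates are invented, not taken from UMLS or HPCO.
EXAMPLE_TRIPLES = [
    Triple("Magnesium", "relation_a", "Migraine"),
    Triple("Migraine", "relation_b", "Headache"),
]

store = TripleStore(EXAMPLE_TRIPLES)
for predicate, entity in store.neighbors("Migraine"):
    print(predicate, "->", entity)
```

Each annotated entity in a document becomes an entry point: the relationship chosen by the user selects the next entity, and the documents mentioning that entity become the next hits.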

CONTRIBUTIONS

1. Novel Information Exploration Paradigm
   • Data-Centric Model

2. Demonstrate use of background knowledge
   • Named Entities, Relationships

3. Prototype Implementation
   • Semantic annotations for navigation

4. Aggregation Utilities
   • Saving, bookmarking, publishing, etc.

DEMO

ARCHITECTURE

Figure 1: System Components and Architecture

Components labelled in the figure:
• Semantic Browser
• Spotter Module: Trie-based Spotter for Named Entity Identification, used ultimately for document annotation
• Controlled Vocabulary: 992,281 DBpedia terms; 15,742 HPCO terms; 5,232 UMLS terms
• Background Knowledge: HPCO Ontology, UMLS
• Document Corpus: Medline (19 million abstracts). Articles saved using Lucene. Indexed as of Aug. 2009
• Document Corpus: Yahoo (indexed documents accessed as a Web Service using Yahoo Search BOSS)
• Linked Open Data
• Annotated entities provide anchors that serve as entry points to navigation
• Semantic Trail Log: sequential record of each triple navigated by a user
• Save, Publish, Organize: utilities provided for promoting, bookmarking, and saving search results
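The Semantic Trail Log is described as a sequential record of each triple navigated by a user. One way to picture it is an append-only list, as in the sketch below. The class and method names are hypothetical and the entity and relationship names are placeholders; the prototype's actual log format is not given in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class SemanticTrailLog:
    """Append-only, ordered record of the triples a user has navigated."""

    steps: List[Triple] = field(default_factory=list)

    def record(self, subject: str, predicate: str, obj: str) -> None:
        """Append one navigated triple to the end of the trail."""
        self.steps.append((subject, predicate, obj))

    def entities_visited(self) -> List[str]:
        """Entities in first-visit order, without repeats."""
        seen: List[str] = []
        for subject, _, obj in self.steps:
            for entity in (subject, obj):
                if entity not in seen:
                    seen.append(entity)
        return seen


log = SemanticTrailLog()
log.record("entity_a", "relation_1", "entity_b")  # placeholder names
log.record("entity_b", "relation_2", "entity_c")
print(log.entities_visited())
```

Keeping the steps in order is what makes the log a trail rather than a set of bookmarks: the sequence can be saved, replayed, or analysed later.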

IMPLEMENTATION

Spotter Module

<abstract> Dietary restriction with hypomagnesia is normally associated with diminished urinary excretion. </abstract>

Phrase in context: Dietary restriction with hypomagnesia
Spotted term: magnesium

UMLS Controlled Vocabulary
Fields shown: Entity Label, PubMed ID, EntityID

Entity Label            EntityID
Magnesium Deficiency    C0024473
Magnesium               C0024467

This process is called Spotting and uses a Trie data structure.
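A minimal sketch of trie-based spotting, assuming a token-level trie and longest-match lookup. The slides only state that a Trie is used, so the tokenization, the matching policy, and the `TrieNode` and `TrieSpotter` names are assumptions. The two vocabulary entries are the ones shown on the slide; the example sentence is invented.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # token -> TrieNode
        self.entity_id = None  # set when a vocabulary label ends at this node


class TrieSpotter:
    def __init__(self, vocabulary):
        """`vocabulary` maps an entity label to its identifier."""
        self.root = TrieNode()
        for label, entity_id in vocabulary.items():
            node = self.root
            for token in label.lower().split():
                node = node.children.setdefault(token, TrieNode())
            node.entity_id = entity_id

    def spot(self, text):
        """Return (start, end, entity_id) token spans, end exclusive.

        At each position the longest vocabulary label is preferred.
        """
        tokens = [t.strip(".,;:()").lower() for t in text.split()]
        spots, i = [], 0
        while i < len(tokens):
            node, match_end, match_id = self.root, None, None
            j = i
            while j < len(tokens) and tokens[j] in node.children:
                node = node.children[tokens[j]]
                j += 1
                if node.entity_id is not None:
                    match_end, match_id = j, node.entity_id
            if match_id is not None:
                spots.append((i, match_end, match_id))
                i = match_end  # continue after the matched label
            else:
                i += 1
        return spots


VOCABULARY = {"Magnesium Deficiency": "C0024473", "Magnesium": "C0024467"}
spotter = TrieSpotter(VOCABULARY)
print(spotter.spot("Notes on magnesium deficiency and magnesium intake."))
```

Because this sketch matches whole tokens, it would not find "magnesium" inside a longer word such as "hypomagnesia"; handling such variants would need synonyms in the vocabulary or character-level matching.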

ARCHITECTURE

Document Corpus
• Medline Lucene Index - 19 million abstracts, Aug 2009
• REST Endpoint: http://knoesis1.wright.edu/IndexWrapper
• XML Response (or JSON)
• Keyword queries, Document IDs
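The slide names the endpoint, the response formats, and the two kinds of lookup, but not the request syntax. The sketch below only builds a request URL and makes no network call. The query parameter names `q`, `id`, and `format` are hypothetical and would need to be checked against the actual service.

```python
from urllib.parse import urlencode

BASE_URL = "http://knoesis1.wright.edu/IndexWrapper"  # endpoint named on the slide


def build_request_url(keywords=None, doc_id=None, response_format="xml"):
    """Build a GET URL for a keyword query or a document-ID lookup.

    The parameter names `q`, `id` and `format` are hypothetical.
    """
    if (keywords is None) == (doc_id is None):
        raise ValueError("pass exactly one of keywords or doc_id")
    params = {"format": response_format}
    if keywords is not None:
        params["q"] = keywords
    else:
        params["id"] = doc_id
    return BASE_URL + "?" + urlencode(params)


keyword_url = build_request_url(keywords="magnesium migraine")
print(keyword_url)
```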

Background Knowledge
• UMLS (Unified Medical Language System): 5,232 entities and 16,540 triples
• HPCO (Human Performance & Cognition Ontology): 15,742 entities and 22,298 triples

EVALUATION

• Rank Feature on [1-5] scale
• Normalized Relative Aggregated Scores

                                 Search User Interfaces
Evaluation Metrics               Semantic Browser (Medline + UMLS)   PubMed   Yahoo
Interface Design                 0.93                                0.88     1.00
Useful Features                  1.00                                0.67     0.65
Motivation to Explore            1.00                                0.58     0.65
Information Novelty              1.00                                0.76     0.79
Effectiveness of Task outcome    1.00                                0.65     0.80
Required Cognitive Load          1.00                                0.60     0.64
Overall Satisfaction             1.00                                0.62     0.78
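Every row of the table has one interface at 1.00, which is consistent with dividing each interface's aggregated rating by the best aggregated rating for that metric. The slides do not give the exact formula, so the sketch below is one plausible reading: it assumes aggregation by summing the 1-5 ratings and the same number of raters per interface. The ratings and interface names are made up for illustration and are not the study's data.

```python
def normalized_relative_scores(ratings):
    """Normalize aggregated ratings for one metric against the best interface.

    `ratings` maps an interface name to a list of 1-5 ratings.
    Assumes every interface was rated by the same number of users.
    """
    totals = {name: sum(values) for name, values in ratings.items()}
    best = max(totals.values())
    return {name: round(total / best, 2) for name, total in totals.items()}


# Hypothetical ratings for a single metric, for illustration only.
example_ratings = {
    "interface_a": [5, 4, 5],
    "interface_b": [3, 4, 3],
    "interface_c": [4, 4, 4],
}
scores = normalized_relative_scores(example_ratings)
print(scores)
```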

CONCLUSION

Novel Information Exploration Paradigm

Semantic Browser supports Contextual Navigation

Identify Named Entities and Relationships

Provide Semantic Annotations

Utilities for Aggregation

Semantic Trails to Knowledge Discovery

FUTURE WORK

• Formal Model for Paradigm Shift
• Improved Spotter: Additional Vocabularies, Context, Rule Based
• Relationship Ranking
• Document Re-ranking
• Trail Logs Analysis

ACKNOWLEDGEMENTS

People
• Cartic Ramakrishnan
• Bilal Gonen, Aditya Dhoke
• Wesley Workman, Rodrigo Gama, Guilherme de Napoli

Air Force Research Lab
• Human Effectiveness Directorate
• Wright-Patterson Air Force Base

National Science Foundation Award
• SemDis: Discovering Complex Relationships in the Semantic Web
• No. 071441 to Wright State University
• No. IIS-0325464 to University of Georgia

QUESTIONS

TERMINOLOGY

• Semantic Web – an extension of the current web in which data is expressed in a common vocabulary such that the data becomes machine processable.

• Ontology – a specification of concepts and the relationships between them.

• Triple – a ternary relation containing an entity pair and a relationship that expresses the link between them, i.e. subject-predicate-object.

• Entity/Concept – an instance of a thing.

• URI – a unique identifier for any resource/entity/thing on the web.

• LOD (Linked Open Data) – a semantic web initiative to provide a repository of semantically connected datasets.