Semantics-Empowered Text Exploration for Knowledge Discovery
Delroy Cameron, Pablo N. Mendes, Amit P. Sheth
Knowledge Enabled Information and Services Science Center (Kno.e.sis)
Department of Computer Science and Engineering
Wright State University
Dayton, OH
Victor Chan
Division of Biosciences and Performance
Human Effectiveness Directorate
Air Force Research Lab (AFRL)
Wright-Patterson Air Force Base
Dayton, OH
48th ACM Southeast Conference. ACMSE 2010. Oxford, Mississippi. April 15-17, 2010.
OUTLINE
• Background
• Paradigm Shift
• Demo
• Architecture
• Experimental Results
• Future Work
• Conclusion
BACKGROUND
• IR Systems - Interaction Paradigm
  – Manually seek information
  – Hyperlinked Documents
  – Document-Centric Model
• Basis - Interaction Paradigm
  – Keyword Search
  – Document Browsing
BACKGROUND
Interaction Sequence
1. Assemble Keywords and Search
2. Document Selection
3. Document Inspection
4. Aggregation/Organization
Information Need: What is the role of Magnesium in relation to Migraine?
Search query: Magnesium migraine
LIMITATIONS
• Query Reformulations
  – Impatient users
  – Recognition over Recall
• Constrained navigation
  – Hyperlink dependent (a priori)
• Fuzzy User Interests
  – Haiti Earthquake: Recovery, Relief, Political Climate, Crime
• Ineffective for Exploratory Search
  – Search-and-Sift
Query: Father of the Web
Answer: Sir Tim Berners-Lee
Amit P. Sheth, Cartic Ramakrishnan: Relationship Web: Blazing Semantic Trails between Web Resources. IEEE Internet Computing 11(4): 77-81 (2007)
MOTIVATION
• Users are information seekers
• Information is embedded in documents
• Users are a priori hyperlink dependent
• Semantic Web Standards
  – Entity Identification (Semantic Annotations)
  – Relationship and Triple Identification
  – Explore documents/information via relationships
PARADIGM SHIFT
• Search Hit > Annotated Hit
  – Bag of annotated words/phrases
  – Annotated phrase is known entity
  – Entity is Subject/Object of Triple
• Navigation driven by relationships
  – Entity [Document] → Relationship → Entity [Document]
• Contextual Navigation (relationships as context)
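Relationship-driven navigation can be illustrated with a small sketch. The triples, the predicate names, and the helpers `build_index` and `next_hops` below are hypothetical and chosen only for illustration; they are not taken from UMLS, HPCO, or the Semantic Browser implementation.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples, for illustration only.
TRIPLES = [
    ("Magnesium", "associated_with", "Migraine"),
    ("Magnesium", "is_a", "Mineral"),
    ("Migraine", "is_a", "Headache Disorder"),
]

def build_index(triples):
    """Map each subject entity to its outgoing (predicate, object) pairs."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index

def next_hops(index, entity):
    """Return the relationships a user could follow from an annotated entity."""
    return index.get(entity, [])

index = build_index(TRIPLES)
hops = next_hops(index, "Magnesium")
for predicate, obj in hops:
    print(f"Magnesium -[{predicate}]-> {obj}")
```

Each annotated entity in a document acts as the subject of zero or more triples, so the relationships listed for it serve as the navigation context instead of the document's hyperlinks.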
CONTRIBUTIONS
1. Novel Information Exploration Paradigm
  – Data-Centric Model
2. Demonstrate use of background knowledge
  – Named Entities, Relationships
3. Prototype Implementation
  – Semantic annotations for navigation
4. Aggregation Utilities
  – Saving, bookmarking, publishing, etc.
DEMO
ARCHITECTURE
Figure 1: System Components and Architecture
• Spotter Module
  – Trie-based Spotter for Named Entity Identification, used ultimately for document annotation
• Controlled Vocabulary
  – 992,281 DBpedia terms
  – 15,742 HPCO terms
  – 5,232 UMLS terms
• Document Corpus
  – Medline (19 million abstracts): articles saved using Lucene, indexed as of Aug. 2009
  – Yahoo: indexed documents accessed as a Web Service using Yahoo Search BOSS
• Background Knowledge
  – HPCO Ontology
  – UMLS
• Linked Open Data
• Semantic Browser
  – Annotated entities provide anchors that serve as entry points to navigation
• Save, Publish, Organize
  – Utilities provided for promoting, bookmarking, and saving search results
• Semantic Trail Log
  – Sequential record of each triple navigated by a user
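The Semantic Trail Log, described in the architecture as a sequential record of each triple navigated by a user, could be sketched as follows. The `TrailLog` class, its methods, and the example triples are assumptions made for illustration and do not reflect the prototype's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

@dataclass
class TrailLog:
    """Hypothetical trail log: keeps navigated triples in the order visited."""
    steps: List[Triple] = field(default_factory=list)

    def record(self, subject: str, predicate: str, obj: str) -> None:
        """Append one navigated triple to the end of the trail."""
        self.steps.append((subject, predicate, obj))

    def as_path(self) -> str:
        """Render the trail as a readable path.

        Assumes each step starts at the entity where the previous step ended.
        """
        if not self.steps:
            return ""
        parts = [self.steps[0][0]]
        for _, predicate, obj in self.steps:
            parts.append(f"-[{predicate}]->")
            parts.append(obj)
        return " ".join(parts)

log = TrailLog()
log.record("Magnesium", "associated_with", "Migraine")   # illustrative triple
log.record("Migraine", "is_a", "Headache Disorder")      # illustrative triple
print(log.as_path())
```

Keeping the steps in an ordered list preserves the sequence of navigation, which is what a later trail analysis would need.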
IMPLEMENTATION
Spotter Module
Input: <abstract>Dietary restriction with hypomagnesia is normally associated with diminished urinary excretion.</abstract>
Spotted term: magnesium
UMLS Controlled Vocabulary
Entity Label            Entity ID
Magnesium Deficiency    C0024473
Magnesium               C0024467
Annotation fields: Entity Label, PubMed ID, EntityID
This process is called Spotting and uses a Trie data structure.
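A minimal sketch of trie-based spotting, assuming a token-level trie and longest-match lookup. The slides do not say whether the Spotter works on tokens or characters, so that choice, along with the names `TrieNode`, `build_trie`, and `spot`, is an assumption. The two vocabulary entries are the UMLS labels and IDs shown on the slide.

```python
class TrieNode:
    """One node per token; entity_id is set where a vocabulary label ends."""
    def __init__(self):
        self.children = {}
        self.entity_id = None

def build_trie(vocabulary):
    """Insert each label into the trie, one lowercased token per level."""
    root = TrieNode()
    for label, entity_id in vocabulary.items():
        node = root
        for token in label.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.entity_id = entity_id
    return root

def spot(text, root):
    """Return (matched phrase, entity ID) pairs, preferring the longest match."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    spots = []
    i = 0
    while i < len(tokens):
        node, j, best = root, i, None
        # Walk the trie as far as the tokens allow, remembering the last full label.
        while j < len(tokens) and tokens[j] in node.children:
            node = node.children[tokens[j]]
            j += 1
            if node.entity_id is not None:
                best = (" ".join(tokens[i:j]), node.entity_id, j)
        if best:
            spots.append(best[:2])
            i = best[2]  # resume after the matched phrase
        else:
            i += 1
    return spots

VOCAB = {"Magnesium": "C0024467", "Magnesium Deficiency": "C0024473"}
root = build_trie(VOCAB)
spots = spot("Magnesium deficiency is linked to low dietary magnesium.", root)
print(spots)
```

Because labels that share a prefix share a path in the trie, "Magnesium Deficiency" is found in the same walk that finds "Magnesium", and the longer label wins when both match.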
ARCHITECTURE
• Document Corpus
  – Medline Lucene Index: 19 million abstracts, Aug 2009
  – REST Endpoint: http://knoesis1.wright.edu/IndexWrapper
  – XML Response (or JSON)
  – Keyword queries, Document IDs
• Background Knowledge
  – UMLS (Unified Medical Language System): 5,232 entities and 16,540 triples
  – HPCO (Human Performance & Cognition Ontology): 15,742 entities and 22,298 triples
EVALUATION
• Rank Feature on [1-5] scale
• Normalized Relative Aggregated Scores
Scores by Search User Interface:
Evaluation Metric                Semantic Browser (Medline + UMLS)   PubMed   Yahoo
Interface Design                 0.93                                0.88     1.00
Useful Features                  1.00                                0.67     0.65
Motivation to Explore            1.00                                0.58     0.65
Information Novelty              1.00                                0.76     0.79
Effectiveness of Task outcome    1.00                                0.65     0.80
Required Cognitive Load          1.00                                0.60     0.64
Overall Satisfaction             1.00                                0.62     0.78
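The slides do not spell out how the scores were normalized. One reading consistent with the table, where the best interface for each metric scores 1.00, is to divide each interface's aggregated rating by the highest aggregated rating for that metric. The sketch below assumes that reading; the function `normalize_relative` and the raw ratings are hypothetical and are not the study's data.

```python
def normalize_relative(aggregated):
    """Divide each interface's aggregated score by the best score for the metric."""
    best = max(aggregated.values())
    return {name: round(score / best, 2) for name, score in aggregated.items()}

# Hypothetical aggregated 1-5 ratings for a single metric (not the study's raw data).
raw = {"Semantic Browser": 4.5, "PubMed": 3.0, "Yahoo": 3.6}
scores = normalize_relative(raw)
print(scores)
```

Under this assumption the scores are relative within a row only, so values in different rows of the table are not directly comparable as absolute ratings.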
CONCLUSION
Novel Information Exploration Paradigm
Semantic Browser supports Contextual Navigation
Identify Named Entities and Relationships
Provide Semantic Annotations
Utilities for Aggregation
Semantic Trails to Knowledge Discovery
FUTURE WORK
• Formal Model for Paradigm Shift
• Improved Spotter
  – Additional Vocabularies, Context, Rule Based
• Relationship Ranking
• Document Re-ranking
• Trail Logs Analysis
ACKNOWLEDGEMENTS
• People: Cartic Ramakrishnan, Bilal Gonen, Aditya Dhoke, Wesley Workman, Rodrigo Gama, Guilherme de Napoli
• Air Force Research Lab, Human Effectiveness Directorate, Wright-Patterson Air Force Base
• National Science Foundation Award "SemDis: Discovering Complex Relationships in the Semantic Web": No. 071441 to Wright State University, No. IIS-0325464 to University of Georgia
QUESTIONS
TERMINOLOGY
• Semantic Web – an extension of the current web in which data is expressed in a common vocabulary such that the data becomes machine processable.
• Ontology – a specification of concepts and relationships between them.
• Triple – a ternary relation containing an entity pair and a relationship that expresses the link between them, i.e. subject-predicate-object.
• Entity/Concept – an instance of a thing.
• URI – a unique identifier for any resource/entity/thing on the web.
• LOD (Linked Open Data) – a semantic web initiative to provide a repository of semantically connected datasets.