A Semantic Web-Based Approach for Personalizing News

Flavius Frasincar
frasincar@ese.eur.nl

Erasmus University Rotterdam

* Joint work with Kim Schouten, Philip Ruijgrok, Jethro Borsje, Leonard Levering, and Frederik Hogenboom

Contents

• Motivation

• Hermes Framework:
  1. News Classification
  2. Knowledge Base Updating
  3. News Querying
  4. Results Presentation

• Hermes News Portal:
  – An example

• Evaluation

• Conclusions

• Future Work

Motivation

• Large quantity of news on the Web:
  – Difficult to find the items of interest

• News messages have a strong impact on stock prices

• Limited annotation of RSS feeds:
  – Broad categories (business, cars, entertainment, etc.)

• Google Finance shows direct news that pertains to a certain portfolio:
  – Indirect news (e.g., about competitors of Google such as Microsoft) is not presented
  – Not possible to ask time-related queries about news

Motivation

• Need for an intelligent system to personalize news

• The world is changing:
  – It is important to keep an up-to-date representation of the world in the system

• News has a dual function:
  – Find the information of interest
  – Update our previous knowledge of the state of the world

• Feedback loop:
  – The extracted information helps to refine your domain of interest in the next iteration

Hermes Framework

• Input:
  – News items from RSS feeds
  – Domain ontology linked to a semantic lexicon (e.g., WordNet)
  – User query

• Output:
  – News items as answers to the user query

• Four phases:

  1. News Classification
     • Relate news items to ontology concepts

  2. Knowledge Base Updating
     • Update the knowledge base with news information

  3. News Querying
     • Allow the user to express the concepts of interest and the temporal constraints

  4. Results Presentation
     • Present the news items that match the user’s query

Hermes Architecture

[Figure: Hermes architecture diagram showing the components News Classification, Knowledge Base Updating, News Querying (Query Formulation, Query Execution), and Results Presentation, together with the data elements News Items, Classified News Items, Query, Filtered News Items, Ontology, and Semantic Lexicon]

1. News Classification

• A concept is defined in the ontology (as a class or an individual)

• Multiple lexical representations for the same concept:
  – Ontology synonyms (e.g., New York → “New York”, “Big Apple”)
  – Semantic lexicon synonyms (e.g., buy → “acquire”)

• Concepts without subclasses or instances:
  – Semantic lexicon hyponyms (e.g., company → dot-com)

• Look up ontology concepts in news items

• A longer match supersedes a shorter match (“European Central Bank” supersedes “European”)
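
The longest-match rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Hermes implementation: the `LEXICON` dictionary, the concept names in it, and the whitespace tokenization are all hypothetical, and punctuation handling is left out.

```python
# Hypothetical lexicon: lexical representation -> ontology concept.
LEXICON = {
    "european": "kb:European",
    "european central bank": "kb:EuropeanCentralBank",
    "new york": "kb:NewYork",
    "big apple": "kb:NewYork",
}

def find_concepts(text, lexicon=LEXICON):
    """Return (phrase, concept) hits, preferring the longest match at each position."""
    tokens = text.lower().split()
    max_len = max(len(key.split()) for key in lexicon)
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so that it supersedes shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                hits.append((phrase, lexicon[phrase]))
                i += n  # skip the tokens consumed by this match
                break
        else:
            i += 1  # no concept starts at this token
    return hits

example_hits = find_concepts("The European Central Bank met in the Big Apple")
```

Because the three-token phrase is tried before the single token, "European" alone is never reported when it is part of "European Central Bank".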

1. News Classification

1.1 Tokenization (words, punctuation signs)

1.2 Sentence splitting (sentences)

1.3 Part-of-speech tagging (e.g., noun, verb, adj., etc.)

1.4 Morphological analysis (e.g., lemma “read” for “reading” as a verb)

1.5 Word sense disambiguation (e.g., Structural Semantic Interconnection (SSI) based on word context)

1.6 Adding “hits” between news items and the domain ontology

2. Knowledge Base Updating

• Knowledge base updates are based on recognized events

• Events have associated rules with:
  – Alternative patterns for event detection
  – A sequence of actions for the knowledge base update

• Before the knowledge base is updated, the discovered events need to be validated by the user

• E.g., an event is kb:newCEO which represents the appointment of a new CEO

2. Knowledge Base Updating

2.1 Event Rules Patterns Construction

– Make use of lexico-semantic patterns

– Lexico-semantic patterns are based on triples (Subject, Predicate, Object), where:

  • [type] stands for the knowledge base instances of the enclosed type
    E.g., [kb:Company] represents all company instances (with all their associated lexical representations):
    – “IBM”, “International Business Machines”, etc.
    – “EBay”, “E-bay”, “Ebay”, etc.
    – Etc.

  • Elements that are not enclosed in brackets represent knowledge base instances

  • $name represents a variable

2. Knowledge Base Updating

2.1 Event Rules Patterns Construction

– Two types of patterns:

• SP patterns:

E.g., $c:[kb:Company] kb:GoesBankrupt matches “WorldCom goes bankrupt”, “WorldCom filed for Chapter 11”, etc.

• SPO patterns:

E.g., $p:[kb:Person] kb:BecomesCEO $c:[kb:Company] matches “Steve Ballmer appointed CEO of Microsoft”, “Steve Ballmer becomes new Chief Executive Officer of Microsoft”, etc.
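
To make the two pattern types concrete, here is a minimal Python sketch of matching an SP or SPO pattern against a sentence whose concepts have already been identified. It is an assumption-laden illustration rather than the Hermes rule engine: the `Hit` structure, the `match_spo` function, the `kb:Event` type, and the requirement that the matched hits be consecutive are all hypothetical simplifications.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    text: str      # lexical representation found in the news item
    concept: str   # ontology concept, e.g. "kb:Microsoft"
    type: str      # class of the concept, e.g. "kb:Company"

def match_spo(hits, subj_type, predicate, obj_type=None):
    """Match an SP pattern (obj_type is None) or an SPO pattern on consecutive hits.

    Returns the variable bindings as a dict, or None when nothing matches.
    """
    width = 2 if obj_type is None else 3
    for i in range(len(hits) - width + 1):
        window = hits[i:i + width]
        if window[0].type != subj_type or window[1].concept != predicate:
            continue
        if obj_type is None:
            return {"subject": window[0].concept}
        if window[2].type == obj_type:
            return {"subject": window[0].concept, "object": window[2].concept}
    return None

# "Steve Ballmer appointed CEO of Microsoft", after concept identification.
ceo_hits = [
    Hit("Steve Ballmer", "kb:SteveBallmer", "kb:Person"),
    Hit("appointed CEO", "kb:BecomesCEO", "kb:Event"),
    Hit("Microsoft", "kb:Microsoft", "kb:Company"),
]
ceo_bindings = match_spo(ceo_hits, "kb:Person", "kb:BecomesCEO", "kb:Company")

# "WorldCom goes bankrupt" as an SP pattern.
bankrupt_hits = [
    Hit("WorldCom", "kb:WorldCom", "kb:Company"),
    Hit("goes bankrupt", "kb:GoesBankrupt", "kb:Event"),
]
bankrupt_bindings = match_spo(bankrupt_hits, "kb:Company", "kb:GoesBankrupt")
```

The returned bindings play the role of the variables $p and $c in the patterns above.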

2. Knowledge Base Updating

2.2 Event Rules Patterns Execution

– Extract information from text
– Assign ontology concepts to the variables

2.3 Event Validation

– The knowledge extraction process is not flawless
– User validates the extracted knowledge

2.4 Event Rules Actions Construction

– Two types of actions:

• Insert triples
  E.g., INSERT $c kb:hasCEO $p
• Delete triples
  E.g., DELETE $c kb:hasCEO $p

2. Knowledge Base Updating

2.4 Event Rules Actions Construction (Cont’d)

– Per event, a sequence of actions is defined
– The order of the actions is important

E.g., for the event kb:newCEO two actions are defined:

  DELETE $c kb:hasCEO $pp
  INSERT $c kb:hasCEO $p

– Unbound variables stand for anything and are not allowed in INSERT actions (e.g., $pp in the example is unbound, so it may appear only in the DELETE action)

2.5 Event Rules Actions Execution

– Execute the actions associated with events in the order in which the events are found in the news

– Per event, execute the associated actions in the given order
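
The action semantics described above can be sketched as follows, assuming a knowledge base held as a plain Python set of (subject, predicate, object) tuples. The function `run_actions`, the tuple encoding of actions, and the sample triples are hypothetical; HNP itself relies on SPARQL/Update.

```python
def run_actions(kb, actions, bindings):
    """Apply ("DELETE" | "INSERT", s, p, o) actions, in order, to a set of triples.

    Terms starting with "$" are variables. An unbound variable acts as a
    wildcard in DELETE and is rejected in INSERT.
    """
    def resolve(term):
        if term.startswith("$"):
            return bindings.get(term)  # None means "unbound"
        return term

    for op, *terms in actions:
        s, p, o = (resolve(t) for t in terms)
        if op == "DELETE":
            # Remove every triple that agrees with all bound positions.
            kb -= {t for t in kb
                   if all(want is None or want == got
                          for want, got in zip((s, p, o), t))}
        elif op == "INSERT":
            if None in (s, p, o):
                raise ValueError("unbound variable in INSERT")
            kb.add((s, p, o))
        else:
            raise ValueError(f"unknown action {op!r}")
    return kb

# Illustrative knowledge base and the two actions of the kb:newCEO event.
kb = {("kb:Microsoft", "kb:hasCEO", "kb:BillGates")}
new_ceo_actions = [
    ("DELETE", "$c", "kb:hasCEO", "$pp"),
    ("INSERT", "$c", "kb:hasCEO", "$p"),
]
run_actions(kb, new_ceo_actions, {"$c": "kb:Microsoft", "$p": "kb:SteveBallmer"})
```

The sketch also shows why the order matters: if the INSERT ran first, the wildcard DELETE would remove the newly inserted triple as well.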

3. News Querying

3.1 Query Formulation

– Present the domain knowledge as a directed labeled multigraph, with the additional constraint that arcs between the same two nodes are not allowed to share the same label (called the conceptual graph)

– User selects the concepts of interest in the conceptual graph (e.g., Google)

– User is able to add to the selection the concepts that are related to the concepts of interest through specified relations (e.g., kb:hasCompetitors: kb:Microsoft, kb:eBay, and kb:Yahoo)

– The selected concepts are presented in a separate graph (called search graph)
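
As a rough sketch of the selection step above, the search graph can be derived from the conceptual graph by following the relations the user picked. The dictionary layout, the `build_search_graph` function, and the `kb:hasProduct` / `kb:GoogleMaps` entries are hypothetical; only the competitor example comes from the slide.

```python
# Hypothetical conceptual graph: (source concept, relation label) -> target concepts.
CONCEPTUAL_GRAPH = {
    ("kb:Google", "kb:hasCompetitors"): ["kb:Microsoft", "kb:eBay", "kb:Yahoo"],
    ("kb:Google", "kb:hasProduct"): ["kb:GoogleMaps"],
}

def build_search_graph(selected, relations, graph=CONCEPTUAL_GRAPH):
    """Return the selected concepts plus those reachable through the chosen relations."""
    search = list(selected)
    for concept in selected:
        for relation in relations:
            for target in graph.get((concept, relation), []):
                if target not in search:
                    search.append(target)
    return search

search_graph = build_search_graph(["kb:Google"], ["kb:hasCompetitors"])
```

Only relations named by the user are followed, so `kb:GoogleMaps` stays out of the search graph in this example.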

3. News Querying

3.1 Query Formulation (Cont’d)

– News items are time-stamped

– User is able to specify that only news in a certain time interval should be retrieved

– Time constraints:
  • Last hour
  • Last day
  • Last year
  • [2007-03-01T00:00:00.000+00:01, 2007-05-31T00:00:00.000+00:01]

– [Future: order constraints (e.g., order by time)]
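
One way to read the time constraints above is as a mapping from a named window or an explicit pair of timestamps to a concrete interval. The Python sketch below assumes this reading; `NAMED_WINDOWS`, `to_interval`, and `in_interval` are hypothetical names, and a year is approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone

# Assumed lengths for the named constraints; a year is approximated as 365 days.
NAMED_WINDOWS = {
    "last hour": timedelta(hours=1),
    "last day": timedelta(days=1),
    "last year": timedelta(days=365),
}

def to_interval(constraint, now=None):
    """Map a named constraint or an explicit (start, end) pair to a time interval."""
    if isinstance(constraint, tuple):
        start, end = constraint
        if start > end:
            raise ValueError("interval start must not be after its end")
        return start, end
    now = now or datetime.now(timezone.utc)
    return now - NAMED_WINDOWS[constraint], now

def in_interval(timestamp, interval):
    """True when the time stamp of a news item falls inside the interval."""
    start, end = interval
    return start <= timestamp <= end

reference = datetime(2007, 5, 31, 12, 0, tzinfo=timezone.utc)
last_hour = to_interval("last hour", now=reference)
```

Passing `now` explicitly keeps the function deterministic, which is convenient for testing.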

3. News Querying

3.2 Query Execution

– Generate the query in a semantic query language:
  • Map concepts of interest to query restrictions (current: disjunctive queries)
  • Map temporal constraints to query restrictions

– Execute the semantic query
  • The order of the relevant news items is not important here
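
The mapping from concepts and temporal constraints to query restrictions might look like the following Python sketch, which assembles a SPARQL string. It is an assumption about how such a generator could work, not the HNP code: `build_query` is a hypothetical function, the property names follow the example query in these slides, and the equality-based FILTER and the `xsd:dateTime` typing are choices made for this sketch.

```python
def build_query(concepts, start=None, end=None):
    """Build a SPARQL query string: concepts are OR-ed, time bounds are AND-ed."""
    if not concepts:
        raise ValueError("at least one concept of interest is required")
    # Disjunctive restriction: the news item relates to any of the concepts.
    concept_filter = " || ".join(f"?concept = {c}" for c in concepts)
    lines = [
        "PREFIX hermes: <http://hermes-news.org/news.owl#>",
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>",
        "SELECT ?title",
        "WHERE {",
        "  ?news hermes:title ?title .",
        "  ?news hermes:time ?date .",
        "  ?news hermes:relatedTo ?concept .",
        f"  FILTER ( {concept_filter} )",
    ]
    # Temporal restriction: only emitted when a bound is given.
    bounds = []
    if start:
        bounds.append(f'?date > "{start}"^^xsd:dateTime')
    if end:
        bounds.append(f'?date < "{end}"^^xsd:dateTime')
    if bounds:
        lines.append(f"  FILTER ( {' && '.join(bounds)} )")
    lines.append("}")
    return "\n".join(lines)

query = build_query(["hermes:Google", "hermes:Yahoo"], start="2009-02-01T00:00:00Z")
```

Keeping the concept restriction and the time restriction in separate FILTER clauses mirrors the two mapping steps listed above.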

4. Results Presentation

4.1 News Sorting

• Return news items that match a query

• Sort the news items based on their relevance degree to the query

• The relevance degree is determined empirically:
  – Based on a weighted sum of the number of hits in the title (higher weight) and in the body (lower weight) of the news item

• News items that have the same relevance degree are sorted in descending timestamp order
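
The sorting rule above can be sketched in Python as follows. The weights 2.0 and 1.0 are placeholders, since the slides only state that title hits weigh more than body hits, and the dictionary layout of a news item is hypothetical.

```python
from datetime import datetime

# Hypothetical weights: the slides only state that title hits weigh more than body hits.
TITLE_WEIGHT = 2.0
BODY_WEIGHT = 1.0

def relevance(item):
    """Weighted sum of the number of hits in the title and in the body."""
    return TITLE_WEIGHT * item["title_hits"] + BODY_WEIGHT * item["body_hits"]

def sort_news(items):
    """Most relevant first; ties are broken by the most recent time stamp."""
    return sorted(items, key=lambda it: (relevance(it), it["timestamp"]), reverse=True)

news = [
    {"id": "A", "title_hits": 1, "body_hits": 0, "timestamp": datetime(2006, 10, 9)},
    {"id": "B", "title_hits": 0, "body_hits": 2, "timestamp": datetime(2006, 10, 10)},
    {"id": "C", "title_hits": 2, "body_hits": 1, "timestamp": datetime(2006, 10, 1)},
]
ranked_ids = [item["id"] for item in sort_news(news)]
```

Items A and B have the same relevance degree under these weights, so the newer item B is listed before A.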

4. Results Presentation

4.2 News Presentation

• Present the concepts involved in the query

• For each news item show a summary:
  – Title
  – Source
  – Date
  – The first few lines of the news item ([Future: snippet])

• Emphasize the hits (found concepts from the ontology) in the retrieved news items

• Show the icons of the most important query concepts found in a news item:
  – Based on a weighted sum of the number of hits of a concept in the title (higher weight) and in the body (lower weight) of the news item
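
Choosing which concept icons to show is a per-concept variant of the same weighted sum. A minimal Python sketch, assuming placeholder weights and a hypothetical cut-off `k` for the number of icons:

```python
from collections import Counter

def top_concepts(title_hits, body_hits, query_concepts, k=3,
                 title_weight=2.0, body_weight=1.0):
    """Rank the query concepts found in one news item by weighted hit count."""
    scores = Counter()
    for concept in title_hits:
        scores[concept] += title_weight
    for concept in body_hits:
        scores[concept] += body_weight
    # Keep only concepts that are part of the query, highest score first.
    ranked = [c for c, _ in scores.most_common() if c in query_concepts]
    return ranked[:k]

icons = top_concepts(
    title_hits=["kb:Google"],
    body_hits=["kb:YouTube", "kb:YouTube", "kb:Google", "kb:Buy"],
    query_concepts={"kb:Google", "kb:YouTube"},
    k=2,
)
```

Concepts that were found in the item but are not part of the query, such as `kb:Buy` here, do not get an icon.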

Hermes News Portal

• Hermes News Portal (HNP) is an implementation of the Hermes framework

• Implementation language: Java

• Ontology representation language: OWL (e.g., cardinality restrictions, inverses, etc.)

• Semantic lexicon: WordNet

• Graph visualization: Prefuse (OWL2Prefuse)

• Update language: SPARQL/Update

• Query language: SPARQL extended with custom time functions (e.g., currentDate(), currentTime(), etc.)

• Natural language processing: GATE

An Example

• Query:

Which are the news items about Google or one of its competitors from the past six months?

1. News Classification – Import News

1. News Classification – Conceptual Graph

1. News Classification – News Item

“SAN FRANCISCO (Reuters) - Web search leader Google Inc. on Monday said it agreed to acquire top video entertainment site YouTube Inc. for $1.65 billion in stock, putting a lofty new value on consumer-generated media sites.” [October 9th, 2006 at 20:15:33 CET]

• Three concepts are found in the news item:
  – kb:Google
  – kb:Buy
  – kb:YouTube

• kb:Relation class instances store the hits between the news item and the found concepts (Semantic Web best practice recommendation for modeling N-ary relationships)
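
The N-ary modeling idea can be illustrated with a small Python sketch: instead of linking a news item directly to a concept, each hit becomes an object of its own that can carry further attributes. The `Relation` fields, the `news:1` identifier, and the `in_title` attribute are hypothetical and only stand in for whatever the kb:Relation class actually stores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    """One hit: links a news item to a concept found in it (an N-ary relation)."""
    news_item: str   # identifier of the news item
    concept: str     # ontology concept that was found
    text: str        # lexical representation that matched
    in_title: bool   # hypothetical extra attribute: where the hit occurred

relations = [
    Relation("news:1", "kb:Google", "Google Inc.", False),
    Relation("news:1", "kb:Buy", "acquire", False),
    Relation("news:1", "kb:YouTube", "YouTube Inc.", False),
]

def concepts_of(news_item, all_relations):
    """Collect the concepts found in one news item."""
    return [r.concept for r in all_relations if r.news_item == news_item]

found = concepts_of("news:1", relations)
```

A plain binary property between news item and concept could not hold the matched text or the hit location, which is the reason for the intermediate object.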

2. Knowledge Base Updating – Rule Editor (Define Event Rule Patterns)

2. Knowledge Base Updating – Rule Editor (Select Concepts)

2. Knowledge Base Updating – Rule Editor (Define Relations)

2. Knowledge Base Updating – Rule Editor (Event Validation)

2. Knowledge Base Updating – Rule Editor (Define Event Rule Actions)

3. News Querying – Search Graph

[Figure: search graph with a legend distinguishing individuals, classes, selected concepts, concepts related to the selected node, and concepts from keyword search]

3. News Querying – Search Graph

3. News Querying – SPARQL

• SPARQL query:

PREFIX hermes: <http://hermes-news.org/news.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?title
WHERE {
  ?news hermes:title ?title .
  ?news hermes:time ?date .
  ?news hermes:relation ?relation .
  ?news hermes:relatedTo ?concept .
  # The concept must be Google or one of its competitors
  FILTER ( ?concept = hermes:Google ||
           ?concept = hermes:Microsoft ||
           ?concept = hermes:Ebay ||
           ?concept = hermes:Yahoo )
  # Restrict the results to the requested time interval
  FILTER ( ?date > "2009-02-01T00:00:00.000+00:01"^^xsd:dateTime &&
           ?date < "2009-07-31T00:00:00.000+00:01"^^xsd:dateTime )
}

3. News Querying – tSPARQL

• Custom time functions:

  Function name                                         Output type
  currentDate()                                         xsd:date
  currentTime()                                         xsd:time
  now()                                                 xsd:dateTime
  dateTime-add(xsd:dateTime A, xsd:duration B)          xsd:dateTime
  dateTime-substract(xsd:dateTime A, xsd:duration B)    xsd:dateTime
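
To illustrate what these functions compute, here is a Python sketch of their likely behavior. It is not the tSPARQL implementation: the Python function names are stand-ins for the functions in the table, only date-only durations such as P0Y6M are supported, and clamping the day to the length of the target month is an assumption borrowed from the usual XPath treatment of year-month durations.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

def now():
    """Counterpart of now(): the current date and time."""
    return datetime.now(timezone.utc)

def current_date():
    """Counterpart of currentDate()."""
    return now().date()

def current_time():
    """Counterpart of currentTime()."""
    return now().time()

# Only the date part of an xsd:duration (years, months, days) is handled here.
_DURATION = re.compile(r"P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")

def datetime_subtract(moment, duration):
    """Counterpart of dateTime-substract for durations such as 'P0Y6M'."""
    match = _DURATION.match(duration)
    if not match:
        raise ValueError(f"unsupported duration: {duration!r}")
    years, months, days = (int(g or 0) for g in match.groups())
    # Step back whole months first, then clamp the day to the target month's length.
    index = moment.year * 12 + (moment.month - 1) - (years * 12 + months)
    year, month = divmod(index, 12)
    month += 1
    last_day = calendar.monthrange(year, month)[1]
    shifted = moment.replace(year=year, month=month, day=min(moment.day, last_day))
    return shifted - timedelta(days=days)

six_months_before = datetime_subtract(datetime(2009, 7, 31, tzinfo=timezone.utc), "P0Y6M")
```

Month arithmetic is done on a running month index because `timedelta` has no notion of calendar months.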

3. News Querying – tSPARQL

• tSPARQL query:

PREFIX hermes: <http://hermes-news.org/news.owl#>
SELECT ?title
WHERE {
  ?news hermes:title ?title .
  ?news hermes:time ?date .
  ?news hermes:relation ?relation .
  ?news hermes:relatedTo ?concept .
  # The concept must be Google or one of its competitors
  FILTER ( ?concept = hermes:Google ||
           ?concept = hermes:Microsoft ||
           ?concept = hermes:Ebay ||
           ?concept = hermes:Yahoo )
  # Restrict the results to the past six months
  FILTER ( ?date > hermes:dateTime-substract(hermes:now(), P0Y6M) &&
           ?date < hermes:now() )
}

4. Results Presentation

Evaluation

• Test set: 200 news items from Yahoo! business and technology news feed

• Precision for concept identification: 86%
• Recall for concept identification: 81%

• Precision for event identification: 62%
• Recall for event identification: 53%

• Sub-second processing time per news item
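
The slides report precision and recall separately. If a single summary figure is wanted, the F1 score (the harmonic mean of the two) can be derived from them; note that F1 is not reported by the authors and is computed here only as an illustration from the percentages above.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

concept_f1 = f1(0.86, 0.81)  # concept identification figures from the slide
event_f1 = f1(0.62, 0.53)    # event identification figures from the slide
print(f"concepts: {concept_f1:.2f}, events: {event_f1:.2f}")
```

Because the harmonic mean is pulled toward the lower of the two values, the lower recall for events weighs on the event score.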

Evaluation

• Test users: 9 students following a Semantic Web course

• Usability: build one query (the one from the presentation) both using HNP and in SPARQL

• Quantitative evaluation:
  – Measure the time it takes to build the query with each of the two approaches
  – Faster to build the news query using HNP

• Qualitative evaluation:
  – Questionnaire
  – Easier to build the news query using HNP
  – HNP pros: graphical user interface, predefined time functionality, results explanation by highlighting the found concepts
  – HNP cons: the layout changes from conceptual graph to search graph, results are not ordered by time

Conclusions

• Hermes Framework: presents news items that match the user’s interests

• Hermes Framework phases:
  – News Classification
  – Knowledge Base Updating
  – News Querying
  – Results Presentation

• Hermes News Portal (HNP): an implementation of the Hermes framework

• HNP is based on:
  – WordNet semantic lexicon, OWL ontology, (extended) SPARQL queries, Prefuse visualization, GATE natural language processing

Future Work

• Limited query expressivity:
  – Add conjunction to queries (e.g., retrieve all news items that mention both Google and Yahoo!)
  – Add negation to queries (e.g., retrieve all news items that do not mention Google)
  – Add patterns to queries (e.g., retrieve all news items that refer to Google acquiring another company)

• Add snippets and temporal ordering to query results presentation

• Evaluate the tool outside the university lab

• Evaluate the tool for another domain (e.g., politics instead of finance)