(linked data development and exploitation track) "generating the semantic snapshot of newscasts...

GENERATING NEWSCASTS SEMANTIC SNAPSHOTS USING ENTITY EXPANSION

JOSÉ LUIS REDONDO GARCIA (@peputo / [email protected])
GIUSEPPE RIZZO (@giusepperizzo / [email protected])
LILIA PÉREZ ROMERO ([email protected])
MICHIEL HILDEBRAND (@McHildebrand / [email protected])
RAPHAËL TRONCY (@rtroncy / [email protected])

Upload: icwe2015

Post on 14-Aug-2015



NEWS CONSUMPTION SEMANTIC SNAPSHOT (NSS)

News item ("Snowden asks Russia for asylum") → Named Entity Expansion → News Semantic Snapshot (NSS)

15th International Conference on Web Engineering (ICWE), June 24, 2015

NEWS ENTITY EXPANSION

Web-based, unsupervised, sequential pipeline that produces the NSS.

Variations per stage: document collection (20), document annotation (1), entity filtering (4), ranking (4)

EVALUATION: NEWS ENTITIES GOLD STANDARD

Involving: experts in the news domain + users

Dimensions:
-  (1) Video subtitles
-  (2) Image in the video
-  (3) Text in the video image
-  (4) Suggestions of an expert
-  (5) Related articles

Play with the data and help us to extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation

DOCUMENT COLLECTION (20 variations)

Using Google Custom Search Engine (CSE) [1]

[1] https://cse.google.com/cse/all


Web sites to be crawled:
-  Google:
-  L1: A set of 10 international English-speaking newspapers
-  L2: A set of 3 international newspapers used in the gold standard (GS)

Temporal Window:
-  1W:
-  2W:

Annotation filtering:

DOCUMENT ANNOTATION

NER extractors in NERD (*)

(*) Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web, Rizzo et al. (2014)

ENTITY FILTERING (4 variations)

Filtering dimensions:
-  F1: NERD type: Person, Organization, Location
-  F2: Confidence score: > Threshold
-  F3: Capitalization: of "country", "president", "Obama", "asylum", only "Obama" is capitalized
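The three filtering dimensions can be read as simple predicates over extracted entities. The following is a minimal sketch, not the authors' implementation: the `Entity` record, the function names, and the default threshold of 0.5 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one extracted entity (not the NERD data model)."""
    label: str         # surface form as found in the document
    nerd_type: str     # coarse type, e.g. "Person", "Organization", "Location"
    confidence: float  # extractor confidence, assumed to lie in [0, 1]


ALLOWED_TYPES = {"Person", "Organization", "Location"}


def f1_type(entity):
    """F1: keep only entities typed as Person, Organization or Location."""
    return entity.nerd_type in ALLOWED_TYPES


def f2_confidence(entity, threshold=0.5):
    """F2: keep entities whose confidence exceeds a threshold (0.5 is an assumed value)."""
    return entity.confidence > threshold


def f3_capitalized(entity):
    """F3: keep entities whose label starts with an uppercase letter."""
    return entity.label[:1].isupper()


def apply_filters(entities, filters):
    """Return the entities that pass every filter in `filters`."""
    return [e for e in entities if all(f(e) for f in filters)]


candidates = [
    Entity("country", "Thing", 0.9),
    Entity("president", "Thing", 0.8),
    Entity("Obama", "Person", 0.7),
    Entity("asylum", "Thing", 0.6),
]
kept = apply_filters(candidates, [f3_capitalized])
```

With the slide's example tokens, the capitalization filter alone keeps only "Obama". Passing several predicates in `filters` combines dimensions, although how the four evaluated variations combine them is not stated on the slide.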

RANKING STRATEGIES (1)

Increase representativeness → leverage entity frequency

Ranking functions: (Freq), (Gaussian)
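A frequency-based ranking can be sketched as counting how often each entity label occurs across the collected documents. This is an illustrative assumption: the slide does not say whether (Freq) counts occurrences or documents, and the (Gaussian) variant is not defined here, so it is not shown. The function name and input shape are hypothetical.

```python
from collections import Counter


def rank_by_frequency(annotated_docs):
    """Rank entity labels by total number of occurrences across documents.

    `annotated_docs` is assumed to be a list of lists, one list of entity
    labels per collected document. Returns (label, count) pairs, most
    frequent first; ties keep first-seen order.
    """
    counts = Counter()
    for doc_entities in annotated_docs:
        counts.update(doc_entities)
    return counts.most_common()


docs = [
    ["Edward Snowden", "Russia"],
    ["Edward Snowden", "Moscow"],
    ["Russia", "Edward Snowden"],
]
ranking = rank_by_frequency(docs)
```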

RANKING STRATEGIES (2) (4 variations)

POPULARITY
-  Based on Google Trends
-  w = 2 months
-  µ + 2*σ (2.5%)

EXPERT RULES
-  Rules: [ Sel(e) , ]

Example:
-  [ Location, = 0.48 ]
-  [ Person, = 0.74 ]
-  [ Organization, = 0.95 ]
-  [ < 2 , = 0.0 ]
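The popularity criterion (µ + 2*σ over a window w) can be illustrated as a peak test on an interest time series. This is a sketch under stated assumptions: the series is assumed to have been fetched elsewhere (no Google Trends call is shown), the value under test is assumed to be excluded from the baseline window, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def is_popularity_peak(window_values, value, k=2.0):
    """Return True if `value` exceeds mean + k * standard deviation of the window.

    `window_values` is assumed to hold the interest scores observed over the
    window w (two months on the slide); `value` is the score at the date of
    the news item. k = 2 matches the slide's threshold.
    """
    mu = mean(window_values)
    sigma = pstdev(window_values)
    return value > mu + k * sigma


baseline = [10, 12, 11, 9, 10, 11, 12, 10]
peak = is_popularity_peak(baseline, 40)
no_peak = is_popularity_peak(baseline, 12)
```

For roughly normal data, values above µ + 2σ fall in the upper tail of about 2.5%, which matches the figure given on the slide.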

EVALUATION: MEASURES

Mean P/R at N:
-  Most popular
-  Easy to interpret

Mean Average Precision at N (MAP):
-  Considers ranking
-  Relevant documents at the top positions

Mean Normalized Discounted Cumulative Gain at N (MNDCG):
-  Different levels of document relevance
-  The lower a highly relevant document is ranked, the less useful it is for the user

N = 10
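The measures above can be computed per news item and then averaged over all items. The sketch below uses common textbook definitions, which may differ in detail from the ones used in the evaluation: average precision is normalized by min(number of relevant entities, N), and NDCG uses a log2 discount with graded gains. All function names are hypothetical.

```python
import math


def precision_at_n(ranked, relevant, n=10):
    """Fraction of the top-n ranked entities that are relevant."""
    return sum(1 for e in ranked[:n] if e in relevant) / n


def recall_at_n(ranked, relevant, n=10):
    """Fraction of the relevant entities that appear in the top n."""
    if not relevant:
        return 0.0
    return sum(1 for e in ranked[:n] if e in relevant) / len(relevant)


def average_precision_at_n(ranked, relevant, n=10):
    """Average of the precision values at each rank where a relevant entity appears."""
    hits, total = 0, 0.0
    for rank, entity in enumerate(ranked[:n], start=1):
        if entity in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), n)  # assumed normalization
    return total / denom if denom else 0.0


def ndcg_at_n(ranked, gains, n=10):
    """NDCG with graded relevance; `gains` maps entity -> relevance level."""
    def dcg(values):
        return sum(g / math.log2(i + 1) for i, g in enumerate(values, start=1))

    actual = dcg([gains.get(e, 0.0) for e in ranked[:n]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:n])
    return actual / ideal if ideal else 0.0


def mean_over_items(scores):
    """Mean of a per-news-item measure (gives mean P/R, MAP or MNDCG)."""
    return sum(scores) / len(scores) if scores else 0.0
```

Because NDCG discounts gains by rank, placing a highly relevant entity lower in the list reduces the score, which is the behaviour the slide describes.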

RESULTS (1)

Baselines:

BS1: Former Entity Expansion Implementation (*)
•  Google
•  No temporal window
•  No_Schema.org
•  No_Filter

BS2: TFIDF-based function

(*) Describing and Contextualizing Events in TV News Show, Redondo et al. (2014)

RESULTS (2)

20 x 4 x 4 = 320 runs

Best configuration:
•  Collection: Google + 2W + Schema.org
•  Filtering: F3
•  Ranking: Freq + POP + EXP
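The run count is the Cartesian product of the variations at each stage. The sketch below only illustrates that arithmetic: the labels are placeholders, since the slides do not enumerate the individual collection, filtering and ranking variations.

```python
from itertools import product

# Placeholder labels; the actual variations are not listed on the slides.
collection_variants = [f"collection_{i}" for i in range(1, 21)]  # 20 variations
filtering_variants = [f"filtering_{i}" for i in range(1, 5)]     # 4 variations
ranking_variants = [f"ranking_{i}" for i in range(1, 5)]         # 4 variations

# Every (collection, filtering, ranking) combination is one evaluation run.
runs = list(product(collection_variants, filtering_variants, ranking_variants))
```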

CONCLUSIONS & FUTURE WORK

-  News Entity Expansion → Generate the News Semantic Snapshot
-  Best score: 0.666 in MNDCG at 10, better than BS1/2
•  Collection: CSE (Google + 2W + Schema.org)
•  Filtering: F3
•  Ranking: Freq + POP + EXP

What's next:
-  Extend the Ground Truth
-  Supervised approach
-  Better exploit semantic connections between entities in KB
-  Is MNDCG@10 an ideal indicator for assessing NSS quality?


http://www.slideshare.net/joseluisredondo/newssemantic