Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web of Data

Benjamin Adrian, Jörn Hees, Ivan Herman, Michael Sintek, Andreas Dengel

Upload: benjamin-adrian

Post on 15-May-2015

DESCRIPTION

This presentation is about Epiphany, a system that automatically generates RDFa annotated versions of web pages based on information from Linked Data models.

TRANSCRIPT

Page 1: Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web of Data

Benjamin Adrian, <[email protected]>

Epiphany

Adaptable RDFa Generation Linking the Web of Documents to the Web of Data

Benjamin Adrian, Jörn Hees, Ivan Herman, Michael Sintek, Andreas Dengel

Page 2: Outline

I. Web of Documents vs. Web of Data

II. RDFa, the glue combining both worlds

III. Use Linked Data for RDFa generation

    i. Extract RDF from web pages

    ii. Visualize RDFa via Linked Data

IV. Evaluation and Comparison with Open Calais

Page 3: Web of Documents

World Wide Web

Features

● Distributed textual content

● Addressed by URLs

● Layout in HTML, CSS

● Connected with hyperlinks

● Access via HTTP

Made for human readers!

Page 4: Web of Data

Linked Open Data

Features

● Distributed data sets

● Addressed by URIs

● Format is RDF

● Connected with RDF links

● Access via HTTP

Made for machine readers!

Page 5: A bridge from document to data

RDFa

Page 6: A bridge from document to data

RDFa

Benefits

1. RDFa is easy to generate by CMSs (e.g., Drupal) and other dynamic content providers.

2. It is easy to annotate well-structured data with RDFa.

Open Problems

1. How to annotate unstructured plain-text content?

2. How to annotate the same documents differently based on different databases?

Page 7: Epiphany

How to consume Linked Data for RDFa generation

Epiphany RDFa Generation

Input: HTML, Linked Data Model

Output: HTML+RDFa

1. Generate RDFa-annotated versions of web pages on the fly.

2. Generate different RDFa annotations for different Linked Data Models.

3. Create interactive boxes filled with additional information about annotated resources.

Page 8: Epiphany - Example

How to consume Linked Data for RDFa generation

Tim Burton is a movie maker.

<span about="http://dbpedia.org/resource/Tim_Burton" property="foaf:name">Tim Burton</span> is a movie maker.

Page 9: Epiphany

How to extract RDF from web pages

Input: HTML

Preprocessing

Ontology-based Information Extraction (Extraction Pipeline)

RDFa Generator

Output: HTML+RDFa

Cache

Linked Data Model: RDF graph, RDF graph store

Page 10: Ontology-based Information Extraction

RDF-based Information Extraction

Ontology-based Information Extraction Pipeline

1. Text Normalization: "Ben is member of RDFa WG."

2. Text Segmentation: "Ben", "is", "member", "of", "RDFa WG", "."

3. Symbol Recognition: [] foaf:name "Ben" . [] foaf:name "RDFa WG" .

4. Instance Recognition: <#me> foaf:name "Ben" . <#RDFaWG> foaf:name "RDFa WG" .

5. Contextual Fact Recognition: <#me> foaf:member <#RDFaWG> .

6. RDF Generation: <#me> foaf:name "Ben" . <#RDFaWG> foaf:name "RDFa WG" . <#me> foaf:member <#RDFaWG> .
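
The six pipeline stages on this slide can be illustrated with a toy sketch. This is not Epiphany's implementation: the label table, the phrase table, and all function names below are assumptions made for illustration, and a real system would look labels up in the Linked Data model rather than in hard-coded dictionaries.

```python
import re

# Hypothetical mini Linked Data model: literal label -> instance URI.
LABEL_TO_INSTANCE = {"Ben": "<#me>", "RDFa WG": "<#RDFaWG>"}
# Hypothetical mapping from a connecting phrase to an RDF property.
PHRASE_TO_PROPERTY = {"is member of": "foaf:member"}

def normalize(text):
    """Text normalization: collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text).strip()

def segment(text, labels):
    """Text segmentation: known multi-word labels stay one token."""
    known = sorted(labels, key=len, reverse=True)
    pattern = (r"\b(?:" + "|".join(re.escape(k) for k in known)
               + r")\b|\w+|[^\w\s]")
    return re.findall(pattern, text)

def recognize_symbols(tokens, labels):
    """Symbol recognition: positions of tokens that equal a known label."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok in labels]

def recognize_instances(symbols, labels):
    """Instance recognition: resolve each symbol to an instance URI."""
    return [(i, labels[tok], tok) for i, tok in symbols]

def recognize_facts(tokens, instances, phrases):
    """Contextual fact recognition: inspect the words between two
    neighbouring instances and map them to a property if known."""
    facts = []
    for (i, subj, _), (j, obj, _) in zip(instances, instances[1:]):
        between = " ".join(tokens[i + 1:j])
        if between in phrases:
            facts.append((subj, phrases[between], obj))
    return facts

def generate_rdf(instances, facts):
    """RDF generation: emit Turtle-like statements as strings."""
    triples = [f'{uri} foaf:name "{label}" .' for _, uri, label in instances]
    triples += [f"{s} {p} {o} ." for s, p, o in facts]
    return triples

def extract(text):
    tokens = segment(normalize(text), LABEL_TO_INSTANCE)
    symbols = recognize_symbols(tokens, LABEL_TO_INSTANCE)
    instances = recognize_instances(symbols, LABEL_TO_INSTANCE)
    facts = recognize_facts(tokens, instances, PHRASE_TO_PROPERTY)
    return generate_rdf(instances, facts)

for triple in extract("Ben is member of RDFa WG."):
    print(triple)
```

Each function corresponds to one stage, so the intermediate values (tokens, symbols, instances, facts) line up with the example rows on the slide.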

Page 11: RDFa Generation

How to create semantic annotations

Request HTML

Tidy to XHTML

DOM node traversal

List of RDF triples with literal object values:

am:Burton foaf:name "Tim Burton" .

am:august rdfs:label "August" .

am:autor foaf:name "Autor" .

am:film foaf:name "Film" .

Page 12: RDFa Generation

How to create semantic annotations

Request HTML

Tidy to XHTML

DOM node traversal

For each text node: create an html:SPAN element with RDFa attributes around matches

List of RDF triples with literal object values:

am:Burton foaf:name "Tim Burton" .

am:august rdfs:label "August" .

am:autor foaf:name "Autor" .

am:film foaf:name "Film" .

Page 13: RDFa Generation

How to create semantic annotations

Request HTML

Tidy to XHTML

DOM node traversal

For each text node: create an html:SPAN element with RDFa attributes around matches

Add a link to the RDF graph to the header

Add a js:onclick listener to elements with RDFa:about attributes

List of RDF triples with literal object values:

am:Burton foaf:name "Tim Burton" .

am:august rdfs:label "August" .

am:autor foaf:name "Autor" .

am:film foaf:name "Film" .
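
The text-node step can be sketched with Python's standard xml.dom.minidom. This is a minimal illustration under stated assumptions, not the authors' code: the LABELS table and the function names are hypothetical, the input is assumed to be well-formed XHTML already, and only the first match of each label per text node is wrapped.

```python
from xml.dom import minidom

# Assumed index built from triples with literal object values:
# literal -> (subject URI, property). The URI comes from the
# Tim Burton example earlier in the deck.
LABELS = {
    "Tim Burton": ("http://dbpedia.org/resource/Tim_Burton", "foaf:name"),
}

def text_nodes(node):
    """Return all text nodes below `node` as a list, so the DOM can
    be modified safely while the list is iterated."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            found.append(child)
        else:
            found.extend(text_nodes(child))
    return found

def annotate(xhtml, labels=LABELS):
    """Wrap the first match of each label in a text node in a <span>
    that carries RDFa `about` and `property` attributes."""
    doc = minidom.parseString(xhtml)
    for node in text_nodes(doc.documentElement):
        for label, (subject, prop) in labels.items():
            pos = node.data.find(label)
            if pos < 0:
                continue
            match = node.splitText(pos)      # node keeps the text before the match
            match.splitText(len(label))      # match now holds exactly the label
            span = doc.createElement("span")
            span.setAttribute("about", subject)
            span.setAttribute("property", prop)
            match.parentNode.replaceChild(span, match)
            span.appendChild(match)
    return doc.documentElement.toxml()

print(annotate("<p>Tim Burton is a movie maker.</p>"))
```

Splitting the text node instead of editing the serialized markup keeps the surrounding document structure untouched, which is the reason for traversing the DOM after tidying to XHTML.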

Page 14: RDFa Visualization

Use RDF to generate Epiphanies

Participants: Browser, Epiphany, Linked Data

onclick event on <SPAN/> elements

AJAX call to Information Provider

GET /resource/Tim_Burton HTTP/1.1

Host: dbpedia.org

Accept: RDF

Render RDF data in HTML as a lightbox
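
The request on this slide is an HTTP GET with content negotiation. A small sketch of how such a request could be built with Python's urllib follows. The slide writes the Accept value schematically as "RDF"; the concrete media type application/rdf+xml used here is an assumption, as is the helper name build_rdf_request. The request is only constructed, not sent.

```python
from urllib.request import Request

def build_rdf_request(resource_uri):
    """Build (but do not send) a GET request that asks the server
    for an RDF/XML representation of the resource."""
    return Request(
        resource_uri,
        headers={"Accept": "application/rdf+xml"},  # assumed media type
        method="GET",
    )

req = build_rdf_request("http://dbpedia.org/resource/Tim_Burton")
print(req.get_method(), req.selector)
print("Host:", req.host)
print("Accept:", req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the call; the returned RDF would then be rendered into the box shown to the user.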

Page 15: Evaluation

Compared Epiphany and Open Calais

Goal: Epiphany is at least as good as Open Calais

Epiphany | Open Calais

Adaptable to any Linked Data Model | Fixed on its own data set

Preserves the original vocabulary | Proprietary vocabulary

Rated with confidence [0, 1] | Sometimes rated with a score value

RDFa annotated version of web page | Only RDF or list of RDFa in <SPAN/>s

Performs instance disambiguation | No disambiguation

But:

Page 16: Evaluation

Compared Epiphany and Open Calais

Linked Data model: 12,462 pages + RDF graphs by BBC Music Artists

http://www.bbc.co.uk/music/artists/0383dadf-2a4e-4d10-a46a-e9e041da8eb3

http://www.bbc.co.uk/music/artists/0383dadf-2a4e-4d10-a46a-e9e041da8eb3.rdf

</music/artists/0383dadf-2a4e-4d10-a46a-e9e041da8eb3#artist> rdf:type mo:MusicGroup ;

rdf:type mo:MusicArtist ;

foaf:name "Queen" .

…

Page 17: Evaluation – BBC Corpus

Compared Epiphany and Open Calais

Open Calais | BBC / Epiphany

oc:Person | mo:SoloMusicArtist

oc:MusicGroup | mo:MusicGroup

oc:match, oc:name | foaf:name

To compare the results generated by Open Calais and Epiphany, we had to align Open Calais' results to the BBC's vocabulary.
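
Such an alignment can be pictured as a term-by-term rewrite. The sketch below assumes the pairing suggested by the layout of the slide (oc:Person to mo:SoloMusicArtist, oc:MusicGroup to mo:MusicGroup, oc:match and oc:name to foaf:name); the slides do not spell out the exact rules, so the mapping table, the align function, and the sample triples are all hypothetical.

```python
# Assumed alignment from Open Calais terms to the BBC vocabulary.
OC_TO_BBC = {
    "oc:Person": "mo:SoloMusicArtist",
    "oc:MusicGroup": "mo:MusicGroup",
    "oc:match": "foaf:name",
    "oc:name": "foaf:name",
}

def align(triples, mapping=OC_TO_BBC):
    """Rewrite predicates and objects that appear in the mapping;
    every other term is left unchanged."""
    return [(s, mapping.get(p, p), mapping.get(o, o)) for s, p, o in triples]

# Made-up Open Calais style output for illustration only.
open_calais_result = [
    ("_:e1", "rdf:type", "oc:Person"),
    ("_:e1", "oc:name", '"Brian May"'),
    ("_:e2", "rdf:type", "oc:MusicGroup"),
    ("_:e2", "oc:name", '"Queen"'),
]

for triple in align(open_calais_result):
    print(triple)
```

After the rewrite, both systems' results use the same classes and properties, so they can be compared entry by entry.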

Page 18: Evaluation

Solo Music Artists

mo:SoloMusicArtist with known foaf:name values?

e.g., [] foaf:name "Brian May" ; a mo:SoloMusicArtist .

Page 19: Evaluation

Music Groups

mo:MusicGroup with known foaf:name values?

e.g., [] foaf:name "Queen" ; a mo:MusicGroup .

Page 20: Discussion

Disambiguation Problems

Ambiguous Music Group Names | Occurrence in corpus

Off | 3,991

Free | 5,715

Contact | 12,461

Fin | 12,461

Food | 12,461

Sport | 12,461
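
Occurrence counts like the ones in this table can be obtained by counting, for each name, the pages that contain it. The sketch below is an illustration on made-up page texts, not the evaluation script behind the slide; the function name and the whole-word matching rule are assumptions.

```python
import re
from collections import Counter

def occurrences(names, pages):
    """Count, for each name, the number of pages that contain it
    as a whole word (case-sensitive)."""
    counts = Counter()
    for name in names:
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        counts[name] = sum(1 for page in pages if pattern.search(page))
    return counts

# Made-up page texts for illustration only.
pages = [
    "Contact us. Queen live in concert.",
    "Contact | Food | Sport",
    "Free download. Contact",
]
names = ["Queen", "Contact", "Free"]

counts = occurrences(names, pages)
for name in names:
    print(name, counts[name], f"{counts[name] / len(pages):.0%}")
```

A name that occurs on almost every page, such as a navigation label, is far more likely to be ordinary page text than a reference to a music group of that name, which is what makes these names hard to disambiguate.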

Page 21: Summary

Final Overview

Applied to several domains

• DBpedia

• SKOS-topic maps

• Personal Information Models (PIMO)

• BBC Music Artists (Music Ontology)

• Amazon (GoodRelations)

At a glance

Epiphany is an RDFa generator that enriches web pages with information from Linked Data models.

Demo Version

http://projects.dfki.uni-kl.de/epiphany/

Page 22: Future Work

Next Steps

Evaluate Epiphany in other domains

Increase precision by adding context analyses

Deploy Epiphany for DBpedia content on a scalable server

Add provenance information

Integrate existing RDFa widgets and visualisations

Page 23: Thank you for Listening!

Contributors

Michael Sintek, DFKI

Benjamin Adrian, DFKI

Jörn Hees, University of Kaiserslautern

Ivan Herman, W3C

Andreas Dengel, DFKI