Epiphany: Adaptable RDFa Generation Linking the Web of Documents to the Web of Data
DESCRIPTION
This presentation is about Epiphany, a system that automatically generates RDFa-annotated versions of web pages based on information from Linked Data models.
TRANSCRIPT
Benjamin Adrian, <[email protected]>
Epiphany
Adaptable RDFa Generation Linking the Web of Documents
to the Web of Data
Benjamin Adrian, Jörn Hees, Ivan Herman, Michael Sintek,
Andreas Dengel
Outline
I. Web of Documents vs. Web of Data
II. RDFa, the glue combining both worlds
III. Use Linked Data for RDFa generation
    I. Extract RDF from web pages
    II. Visualize RDFa via Linked Data
IV. Evaluation and Comparison with Open Calais
Web of Documents
Features
● Distributed textual content
● Addressed by URLs
● Layout in HTML, CSS
● Connected with hyperlinks
● Access via HTTP
Made for human readers!
World Wide Web
Web of Data
Features
● Distributed data sets
● Addressed by URIs
● Format is RDF
● Connected with RDF Links
● Access via HTTP
Made for machine readers!
Linked Open Data
A bridge from document to data
RDFa
Benefits
1. RDFa is easy to generate with a CMS (e.g., Drupal) and other dynamic content providers.
2. It is easy to annotate well-structured data with RDFa.
Open Problems
1. How to annotate unstructured plain-text content?
2. How to annotate the same documents differently based on different databases?
Epiphany
[Diagram: HTML → Epiphany RDFa Generation → HTML+RDFa, using a Linked Data Model]
1. Generate RDFa-annotated versions of web pages on-the-fly.
2. Generate different RDFa annotations for different Linked Data Models.
3. Create interactive boxes filled with additional information about annotated resources.
How to consume Linked Data for RDFa generation
Epiphany - Example
Tim Burton is a movie maker.
<span about="http://dbpedia.org/resource/Tim_Burton"
property="foaf:name">Tim Burton</span> is a movie maker.
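To make the link between the annotated text and the Web of Data concrete, the following minimal sketch reads the triple back out of the annotated sentence. It is not an RDFa processor: the class name `SpanTripleParser` is hypothetical, and the sketch assumes flat elements that carry both an `about` and a `property` attribute, with no nesting.

```python
from html.parser import HTMLParser


class SpanTripleParser(HTMLParser):
    """Collects (subject, predicate, object) from elements that carry
    both `about` and `property` attributes (a tiny subset of RDFa)."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._open = None   # (subject, predicate) of the element being read
        self._text = []     # text collected inside that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs and "property" in attrs:
            self._open = (attrs["about"], attrs["property"])
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumption: annotated elements are not nested.
        if self._open is not None:
            subject, predicate = self._open
            self.triples.append((subject, predicate, "".join(self._text)))
            self._open = None


html = ('<span about="http://dbpedia.org/resource/Tim_Burton" '
        'property="foaf:name">Tim Burton</span> is a movie maker.')
parser = SpanTripleParser()
parser.feed(html)
parser.close()
print(parser.triples)
```

The human-readable sentence is unchanged, while a machine reader obtains the statement that the DBpedia resource has the foaf:name "Tim Burton".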
Epiphany
[Architecture diagram. Input: HTML. Output: HTML+RDFa. Components: Ontology-based Information Extraction (Preprocessing, Extraction Pipeline), RDFa Generator, Cache, Linked Data Model, RDF graph, RDF graph store.]
How to extract RDF from web pages
Ontology-based Information Extraction
Ontology-based Information Extraction Pipeline
1. Text Normalization: “Ben is member of RDFa WG.”
2. Text Segmentation: “Ben”, “is”, “member”, “of”, “RDFa WG”, “.”
3. Symbol Recognition: [] foaf:name "Ben" . [] foaf:name "RDFa WG" .
4. Instance Recognition: <#me> foaf:name "Ben" . <#RDFaWG> foaf:name "RDFa WG" .
5. Contextual Fact Recognition: <#me> foaf:member <#RDFaWG> .
6. RDF Generation: <#me> foaf:name "Ben" . <#RDFaWG> foaf:name "RDFa WG" . <#me> foaf:member <#RDFaWG> .
RDF-based Information
Extraction
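The pipeline stages can be illustrated with a toy sketch. This is not Epiphany's implementation: the `MODEL` dictionary stands in for a Linked Data model, all function names are hypothetical, matching is plain string lookup, and contextual fact recognition is left out because the slide does not say how it works.

```python
import re

# Toy stand-in for a Linked Data model: foaf:name literal -> instance URI.
MODEL = {"Ben": "<#me>", "RDFa WG": "<#RDFaWG>"}


def normalize(text):
    """Text normalization: collapse runs of whitespace."""
    return " ".join(text.split())


def segment(text, labels):
    """Text segmentation: known multi-word labels stay one token."""
    known = [r"\b" + re.escape(label) + r"\b"
             for label in sorted(labels, key=len, reverse=True)]
    pattern = "|".join(known + [r"\w+", r"[^\w\s]"])
    return re.findall(pattern, text)


def recognize_symbols(tokens, model):
    """Symbol recognition: tokens that equal a literal in the model."""
    return [token for token in tokens if token in model]


def recognize_instances(symbols, model):
    """Instance recognition: resolve each symbol to an instance URI."""
    return {symbol: model[symbol] for symbol in symbols}


def generate_rdf(instances):
    """RDF generation: one foaf:name triple per recognized instance."""
    return [f'{uri} foaf:name "{label}" .' for label, uri in instances.items()]


tokens = segment(normalize("Ben  is member of RDFa WG."), MODEL)
symbols = recognize_symbols(tokens, MODEL)
triples = generate_rdf(recognize_instances(symbols, MODEL))
print(tokens)
print(triples)
```

Swapping `MODEL` for a different label-to-URI mapping changes which tokens are recognized, which is the sense in which the generation is adaptable to the model in use.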
RDFa Generation
Input: list of RDF triples with literal object values
am:Burton foaf:name "Tim Burton" .
am:august rdfs:label "August" .
am:autor foaf:name "Autor" .
am:film foaf:name "Film" .
Steps
1. Request HTML
2. Tidy to XHTML
3. DOM node traversal: for each text node, create html:SPAN elements with RDFa attributes around matches
4. Add link to RDF graph to header
5. Add js:onclick listener to elements with RDFa:about attributes
How to create semantic annotations
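The DOM traversal step can be sketched as follows. This is a minimal illustration under stated assumptions, not Epiphany's code: the input is taken to be already tidied, well-formed XHTML, the function names are hypothetical, only the first matching literal per text node is wrapped, and the header link and onclick listener are omitted.

```python
from xml.dom import minidom


def wrap_first_match(text_node, triples):
    """Wrap the first literal found in this text node in an RDFa span."""
    doc, parent = text_node.ownerDocument, text_node.parentNode
    for subject, prop, literal in triples:
        start = text_node.data.find(literal)
        if start < 0:
            continue
        before = text_node.data[:start]
        after = text_node.data[start + len(literal):]
        span = doc.createElement("span")
        span.setAttribute("about", subject)
        span.setAttribute("property", prop)
        span.appendChild(doc.createTextNode(literal))
        if before:
            parent.insertBefore(doc.createTextNode(before), text_node)
        parent.insertBefore(span, text_node)
        text_node.data = after  # the original node keeps the trailing text
        return


def annotate(node, triples):
    """DOM traversal: visit every text node below `node`."""
    for child in list(node.childNodes):  # snapshot, since siblings are inserted
        if child.nodeType == child.TEXT_NODE:
            wrap_first_match(child, triples)
        else:
            annotate(child, triples)


triples = [("http://dbpedia.org/resource/Tim_Burton", "foaf:name", "Tim Burton")]
doc = minidom.parseString("<p>Tim Burton is a movie maker.</p>")
annotate(doc.documentElement, triples)
annotated = doc.documentElement.toxml()
print(annotated)
```

Iterating over a snapshot of the child list keeps the newly inserted span from being visited again, so its text is not wrapped a second time.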
RDFa Visualization
Participants: Browser, Epiphany, Linked Data
1. onclick event on <SPAN/> elements
2. AJAX call to information provider:
   GET /resource/Tim_Burton HTTP/1.1
   Host: dbpedia.org
   Accept: RDF
3. Render RDF data in HTML as a lightbox
Use RDF to generate Epiphanies
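The request to the information provider relies on HTTP content negotiation. The slide describes an AJAX call from the browser; the sketch below only builds the equivalent request in Python without sending it. The helper name `build_rdf_request` is hypothetical, and the media type `application/rdf+xml` is an assumption, since the slide says only "RDF".

```python
from urllib.request import Request


def build_rdf_request(resource_uri, accept="application/rdf+xml"):
    """Build (but do not send) a GET request asking for an RDF representation."""
    return Request(resource_uri, headers={"Accept": accept}, method="GET")


req = build_rdf_request("http://dbpedia.org/resource/Tim_Burton")
print(req.get_method(), req.selector)
print("Host:", req.host)
print("Accept:", req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without network access.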
Evaluation
Compared Epiphany and Open Calais
Goal: Epiphany is at least as good as Open Calais
Epiphany | Open Calais
Adaptable to any Linked Data Model | Fixed on its own data set
Preserves the original vocabulary | Proprietary vocabulary
Rated with confidence [0, 1] | Sometimes rated with a score value
RDFa-annotated version of web page | Only RDF or list of RDFa in <SPAN/>s
Performs instance disambiguation No disambiguation
But:
Evaluation
Linked Data model: 12,462 pages + RDF graphs from BBC Music Artists
http://www.bbc.co.uk/music/artists/0383dadf-2a4e-4d10-a46a-e9e041da8eb3
http://www.bbc.co.uk/music/artists/0383dadf-2a4e-4d10-a46a-e9e041da8eb3.rdf
</music/artists/0383dadf-2a4e-4d10-a46a-e9e041da8eb3#artist>
    rdf:type mo:MusicGroup ;
    rdf:type mo:MusicArtist ;
    foaf:name "Queen" .
…
Evaluation – BBC Corpus
Open Calais:
oc:Person, oc:MusicGroup, oc:match, oc:name
BBC / Epiphany:
mo:SoloMusicArtist, mo:MusicGroup, foaf:name
To compare the results generated by Open Calais and Epiphany, we had to align Open Calais’ results to the BBC’s vocabulary.
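Such an alignment can be pictured as a term-by-term rewrite of triples. The sketch below is illustrative only: the vocabulary terms come from the slide, but the specific pairings (for example oc:Person to mo:SoloMusicArtist) are assumptions, the handling of oc:match is not covered, and the names `CLASS_MAP`, `PROPERTY_MAP`, and `align` are hypothetical.

```python
# Assumed pairings; only the terms themselves appear on the slide.
CLASS_MAP = {
    "oc:Person": "mo:SoloMusicArtist",
    "oc:MusicGroup": "mo:MusicGroup",
}
PROPERTY_MAP = {"oc:name": "foaf:name"}


def align(triple):
    """Rewrite one (subject, predicate, object) triple into the target vocabulary."""
    subject, predicate, obj = triple
    if predicate == "rdf:type":
        obj = CLASS_MAP.get(obj, obj)          # map classes in type statements
    predicate = PROPERTY_MAP.get(predicate, predicate)  # map properties
    return (subject, predicate, obj)


open_calais_triples = [
    ("_:e1", "rdf:type", "oc:Person"),
    ("_:e1", "oc:name", "Brian May"),
]
aligned = [align(t) for t in open_calais_triples]
print(aligned)
```

Terms without an entry in either mapping pass through unchanged, so unmapped results stay visible instead of being dropped silently.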
Evaluation
mo:SoloMusicArtist with known foaf:name values?
e.g., [] foaf:name "Brian May" ; a mo:SoloMusicArtist .
Solo Music Artists
Evaluation
mo:MusicGroup with known foaf:name values?
e.g., [] foaf:name "Queen" ; a mo:MusicGroup .
Music Groups
Discussion
Ambiguous music group name | Occurrences in corpus
Off | 3,991
Free | 5,715
Contact | 12,461
Fin | 12,461
Food | 12,461
Sport | 12,461
Disambiguation Problems
Summary
Applied to several domains
• DBpedia
• SKOS-topic maps
• Personal Information Models (PIMO)
• BBC Music Artists (Music Ontology)
• Amazon (Good Relations)
At a glance
Epiphany is an RDFa Generator that enriches Web Pages with Information from Linked Data Models.
Demo Version
http://projects.dfki.uni-kl.de/epiphany/
Final Overview
Future Work
Evaluate Epiphany in other domains
Increase Precision by adding context analyses
Deploy Epiphany for DBpedia content on a scalable server
Add provenance information
Integrate existing RDFa widgets and visualisations
Next Steps
Thank you for Listening!
Michael Sintek, DFKI
Benjamin Adrian, DFKI
Jörn Hees, University of Kaiserslautern
Ivan Herman, W3C
Andreas Dengel, DFKI
Contributors