PUNDIT: SEMANTICALLY STRUCTURED ANNOTATIONS FOR WEB CONTENTS AND DIGITAL LIBRARIES
Marco Grassi(1), Christian Morbidoni(2), Michele Nucci(3), Simone Fonda(4), Giovanni Ledda(5)
(1,2,3,5) DII - Department of Information Engineering, Polytechnic University of Le Marche, Ancona, Italy
(4) NET7 srl
SDA 2012 - Semantic Digital Archives
Semedia (Semantic Web and Multimedia) http://semedia.dii.univpm.it www.netseven.it/
This work is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0)
Pundit: Semantically Structured Annotations for Web Contents... [email protected] 2012
THE WEB SCENARIO
• Annotating web content has become a common task
  • Comments and tags are widely supported by mainstream applications
  • Many tools to bookmark, highlight, and comment on web page fragments
  • Some tools support collaborative annotations
• Web content annotations are beneficial:
  • More engaging and productive user experience
  • Exploit social engagement to improve resource ranking and classification
DL SCENARIO
• Crowdsourcing experiments for enriching DLs, curating contents or uploading digital material of interest for the DL (BBC WW2 People's War, ...)
• Digital Libraries (DL) are no longer simple "expositions" of digital objects but provide users with more interaction
[Diagram: Digital Library interaction models. Expert model: experts create contents, users consume contents. User interaction / social engagement: users also tag, link, and comment. Crowdsourcing: users also add content and add annotations.]
WHAT’S MISSING? ...
• Most existing annotation tools are limited to simple textual tags and comments.
  • Limited by the ambiguity of natural language
  • Their semantics are not machine-interpretable
This limits the efficiency of resource classification and retrieval, and the possibility of reusing these annotations in other contexts.
Orange?
SEMANTICALLY STRUCTURED ANNOTATIONS
• Semantically structured annotations make smart use of such added knowledge:
• Unambiguously express semantics to be processed by software agents:
  • annotations can be harvested periodically and published back
  • used by recommender systems or search engines
  • ...
• Enhance Digital Libraries capabilities
  • improving browsing
  • enabling automatic content classification
  • ...
• Reuse such collaborative knowledge in different contexts and by different applications
SEMANTICALLY STRUCTURED ANNOTATIONS
Users should be able to create knowledge graphs where web content fragments, concepts and entities are meaningfully connected.
SEMANTICALLY STRUCTURED ANNOTATIONS
• Rely on controlled vocabularies and ontologies
  • share the same terminology and "talk about the same things"
  • annotations can be meaningfully mashed up
• Link to the emerging Web of Data
  • software can automatically get additional, useful semantic data (e.g. date and place of birth, pictures, citations, multi-language data)
Augmenting the information of the annotation and of the original content supports smarter application behaviors!
Ex.: we have discovered that the two images contain American film actors showing the emotion of anger!
• Pundit is a novel semantic annotation tool:
  • developed by: Semedia (Semantic Web and Multimedia), http://semedia.dii.univpm.it, with the collaboration of NET7
  • funded by: Semlib EU Project, http://semedia.dii.univpm.it
  • supported and further developed in: DM2E EU Project, http://dm2e.edu/ and AGORA EU Project, http://project-agora.eu/
SEMLIB PROJECT
• R&D project supported by EU FP7 Theme: Research for SMEs (no. FP7-SME -2010-01- 262301 - SEMLIB)
• 24 months (commenced in January 2011, currently at month 19)
Semlib Project: Semantic Web Tools for DL
http://www.semlibproject.eu/
www.netseven.it/ www.knowledgehives.com/ www.liberologico.com/ www.in-two.com
www.semedia.dii.univpm.it/ www.deri.ie/
ANNOTATION MODEL
• Based on the Open Annotation Collaboration (OAC) ontology (currently working to provide full compliance with OA)
[Diagram labels: Contextual Information; Annotation Content; Semantically Structured Content; Named Graph]
• SPARQL support to query slices of knowledge
NAMED GRAPHS AS BODIES
[Figure: RDF graph of two example annotations whose bodies are named graphs. Labels recovered from the slide:]
Annotation 1: ex:ANNOTATION-ID-1, a oac:Annotation
• rdfs:label: Annotation 1
• rdfs:comment: An example annotation showing the annotation model
• dcterms:creator: ex:MarcoGrassi
• dcterms:created: 2011-01-27 10:30:56
• oac:hasTarget: http://example.com/mypage.htm#textFragment, http://example.com/img1.jpeg
• oac:hasBody: ex:ANNOTATION-GRAPH-ID-1
  • nodes: http://example.com/mypage.htm#textFragment (rdfs:label "Fragment: Durante gli Alighieri..."), http://example.com/img1.jpeg, http://example.com/1.htm, semlib:DanteAlighieri, semlib:Renassance
  • properties: semlib:mentionsAuthor, semlib:mentionsPeriod, semlib:depicts
Annotation 2: ex:ANNOTATION-ID-2, a oac:Annotation
• rdfs:label: Annotation 2
• rdfs:comment: Another annotation whose content can be merged with the former one
• dcterms:creator: ex:MarcoGrassi
• dcterms:created: 2011-09-27 11:43:12
• oac:hasTarget: http://example.com/mypage.htm#textFragment2
• oac:hasBody: ex:ANNOTATION-GRAPH-ID-2
  • nodes: http://example.com/mypage.htm#textFragment2 (rdfs:label "Fragment: Dante Alighieri life has been.."), http://example.com/mypage.htm#textFragment, semlib:Renassance, semlib:Politics
  • properties: semlib:talksAbout, semlib:mentionPeriod, semlib:hasSimilarContent
Composite graph: the statements of both body graphs merged into a single graph.
...named graphs keep the statements belonging to different annotations separate...
...but allow them to be aggregated into "composite" graphs and queried using standard SPARQL
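The separation and aggregation described on this slide can be illustrated without a triple store. The following is a minimal sketch in plain Python, not Pundit's implementation: each named graph is a set of triples keyed by its graph URI, a composite graph is their union, and a pattern match stands in for a SPARQL query. The edge directions and the normalized property name semlib:mentionsPeriod are assumed readings of the diagram, and the helper names composite and match are hypothetical.

```python
from collections import defaultdict

# Quad store sketch: graph name -> set of (subject, property, object) triples.
store = defaultdict(set)

FRAG1 = "http://example.com/mypage.htm#textFragment"
FRAG2 = "http://example.com/mypage.htm#textFragment2"
IMG = "http://example.com/img1.jpeg"

# Edge directions are an assumed reading of the slide's diagram.
store["ex:ANNOTATION-GRAPH-ID-1"] |= {
    (FRAG1, "semlib:mentionsAuthor", "semlib:DanteAlighieri"),
    (FRAG1, "semlib:mentionsPeriod", "semlib:Renassance"),
    (IMG, "semlib:depicts", "semlib:DanteAlighieri"),
}
store["ex:ANNOTATION-GRAPH-ID-2"] |= {
    (FRAG2, "semlib:talksAbout", "semlib:Politics"),
    # The slide spells this property "mentionPeriod"; normalized here (assumption).
    (FRAG2, "semlib:mentionsPeriod", "semlib:Renassance"),
    (FRAG2, "semlib:hasSimilarContent", FRAG1),
}


def composite(graphs, graph_names):
    """Union the triples of the selected named graphs into one graph."""
    merged = set()
    for name in graph_names:
        merged |= graphs[name]
    return merged


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Each annotation body stays separate, yet both can be queried together.
both = composite(store, ["ex:ANNOTATION-GRAPH-ID-1", "ex:ANNOTATION-GRAPH-ID-2"])
renaissance_fragments = [
    t[0] for t in match(both, p="semlib:mentionsPeriod", o="semlib:Renassance")
]
```

Querying a single graph returns only that annotation's statements, while the composite answers questions that span annotations, such as which fragments mention the same period.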
NOTEBOOKS
• Annotations are collected in notebooks
[Diagram: a notebook identified by NotebookURI, with dcterms:creator, dcterms:created (2011-01-27 10:30:56), rdfs:label (My Example Notebook) and rdfs:comment (An Example Notebook used to show the model)]
• Users can organize their annotations
• Aggregate annotations to be retrieved and queried
• Different UNIX-style read/write privileges (from private to completely public)*
• Activate/deactivate a notebook to filter the public annotations, showing only those of interest
• Identified by a (dereferenceable) URI
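The interplay between privileges and activation can be sketched as a filter. This is a minimal illustration under assumptions, not the Pundit data model: the Notebook fields, the single public flag standing in for the read privilege, and the function visible_annotations are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Notebook:
    uri: str                      # the notebook's identifying URI
    label: str
    creator: str
    public: bool = False          # hypothetical flag standing in for the read privilege
    annotations: List[str] = field(default_factory=list)


def visible_annotations(notebooks, user, active_uris):
    """Collect annotations from notebooks that are readable by `user` and active."""
    visible = []
    for notebook in notebooks:
        readable = notebook.public or notebook.creator == user
        if readable and notebook.uri in active_uris:
            visible.extend(notebook.annotations)
    return visible


mine = Notebook("http://example.org/notebooks/1", "My Example Notebook", "marco",
                annotations=["ex:ANNOTATION-ID-1"])
shared = Notebook("http://example.org/notebooks/2", "Public notes", "anna",
                  public=True, annotations=["ex:ANNOTATION-ID-2"])
private = Notebook("http://example.org/notebooks/3", "Private notes", "anna",
                   annotations=["ex:ANNOTATION-ID-3"])

all_notebooks = [mine, shared, private]
shown = visible_annotations(all_notebooks, user="marco",
                            active_uris={mine.uri, shared.uri, private.uri})
only_mine = visible_annotations(all_notebooks, user="marco", active_uris={mine.uri})
```

In this sketch a private notebook owned by someone else never contributes annotations, and deactivating a public notebook hides its annotations without changing its privileges.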
NOTEBOOKS
• Notebooks allow annotation sharing
[Diagram: a NotebookURI is shared with a single user, with communities, or publicly (e.g. on a wiki)]
• Sharing a notebook is as easy as sharing its URL on the web (similarly to popular file sharing platforms)
NOTEBOOK MANAGEMENT
• Create new notebooks
• Set the current notebook (where the annotations are written)
• Set notebook private or public
• Activate/deactivate owned notebooks or public notebooks to filter annotations of interest
• Share notebook by URI
USER AUTHENTICATION
• Authentication is based on OpenID:
  • No need to store users' credentials
  • Already implemented by mainstream companies (Google, Yahoo, ...)
  • Avoids multiple user registrations (waste of time, another password)
  • A single identity can be used across different Pundit-enabled Digital Libraries
  • Adding an OpenID provider is easy and transparent to the Pundit server.
PUNDIT ARCHITECTURE
[SERVER]
• Open Source RESTful Web Service (Java Jersey framework)
• Cross-origin requests
  • CORS (Cross-Origin Resource Sharing)
  • JSONP
• Sesame triple store
  • SPARQL and inference
  • Different SAILs are provided to implement different storages (BigOWLIM, MySQL, PostgreSQL, Virtuoso ...)
• MySQL for user data
• RESTful API to edit and consume annotations
[CLIENT]
• Set of JavaScript modules (Dojo framework)
  • Easily extendable
  • Highly customizable
DIFFERENT ANNOTABLE CONTENTS
• Pundit allows the annotation of different types of content at different levels of granularity
• Text fragments
• Images
• Image fragments (under development)
• Videos and video fragments (experimented in Semtube)
• Semantic annotation of YouTube videos (alpha state) based on Pundit JavaScript libraries and annotation server
http://semedia.dii.univpm.it/semtube
DIFFERENT TYPES OF ANNOTATIONS
Annotation with different levels of expressivity and structure
• Textual comments (Comment/Tag Panel)
• Semantic Tags (Comment/Tag Panel)
  • Automatically extracted from textual comments (DBpedia Spotlight)
  • Popular Linked Data services (DBpedia, Freebase, WordNet, ...)
  • Define your own source of named entities (SPARQL endpoint, HTTP API)
• Semantic Relations (Triple Composer)
  • Subject-Property-Object statements
  • Drag&Drop and suggestions
  • Connect different resources (user selection, linked data entities, ...) with semantically defined properties
CUSTOM VOCABULARIES
• Pundit allows the use of custom vocabularies/taxonomies (and relations):
  • Create a JSONP file (manually or automatically from an ontology)
  • Put it online
  • Add its URL to the configuration to import and use it
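The first step, producing a JSONP file, amounts to wrapping a JSON document in a callback invocation so that it can be loaded across domains with a script tag. The sketch below shows only that generic wrapping. The vocabulary layout and the callback name loadVocabulary are hypothetical, since the slides do not give the schema Pundit expects.

```python
import json


def to_jsonp(vocabulary, callback="loadVocabulary"):
    """Wrap a JSON-serializable vocabulary in a JSONP callback invocation."""
    return f"{callback}({json.dumps(vocabulary, indent=2)});"


# Hypothetical vocabulary layout; the real schema is not given in the slides.
vocabulary = {
    "name": "example-taxonomy",
    "items": [
        {"label": "Renaissance", "uri": "http://example.org/vocab/Renaissance"},
        {"label": "Politics", "uri": "http://example.org/vocab/Politics"},
    ],
}

jsonp_text = to_jsonp(vocabulary)
```

The resulting text would be saved to a file, published at a public URL, and that URL added to the configuration, as the steps above describe.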
CROSS PAGE / DOMAIN ANNOTATIONS
• A special bookmarklet allows launching Pundit on any web page to perform annotations
• Selected resources (text fragments, images, ...) on different pages and domains can be added to "My Items" to be stored on the server and reused on different pages
[Screenshot labels: Add to My Items; Use in another page; Create cross page semantic relations (e.g. "cites")]
NAMED CONTENT
• DLs change over time
  • Presentation can be restyled and content can be re-organized
  • Same content in different pages
  • Some parts of the page should not be annotated (menu, ...)
• Specific markup can be added to the pages to allow Pundit to:
  • identify atomic pieces of content (by means of URIs)
  • attach the annotations to such contents
  • avoid the annotation of accessory page components
<div class="pundit-content" about="http://example.org/contents/123">
  <!-- HTML goes here. -->
  <p>This is a named content and contains both text and a picture</p>
  <img src="http://example.org/pictires/pictire123.png" />
  <p><em>Caption:</em> this is a caption.</p>
</div>
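Because named content is identified by its about URI rather than by the page URL, a consumer can locate it in any page that embeds it. The following is a minimal sketch using Python's standard html.parser module, not Pundit's own client code: it collects the about URI of every element carrying the pundit-content class. The class name NamedContentFinder and the sample page are illustrative.

```python
from html.parser import HTMLParser


class NamedContentFinder(HTMLParser):
    """Collect the `about` URI of every element with class "pundit-content"."""

    def __init__(self):
        super().__init__()
        self.uris = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if "pundit-content" in classes and attributes.get("about"):
            self.uris.append(attributes["about"])


# Sample page: a menu that is not named content, plus the block from the slide.
page = """
<div class="menu"><a href="/home">Home</a></div>
<div class="pundit-content" about="http://example.org/contents/123">
  <p>This is a named content and contains both text and a picture</p>
  <img src="http://example.org/pictires/pictire123.png" />
  <p><em>Caption:</em> this is a caption.</p>
</div>
"""

finder = NamedContentFinder()
finder.feed(page)
named_uris = finder.uris
```

Two pages embedding the same block yield the same URI, which is what lets annotations follow the content when the page layout changes.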
NAMED CONTENT
The same content in different pages shows the same annotations!
CONSUMING THE ANNOTATIONS
• The Pundit server provides RESTful APIs to consume annotations.
• (Public) annotations can be consumed by third-party applications.
• Currently conceiving and developing apps to display and reuse Pundit annotations
ASK THE PUND
• A social web app that consumes people's annotations and lets groups of people organize them into a shared collection, telling a meaningful story with it.
http://ask.thepund.it/
EDGEMAPS VISUALIZATION
• An Edgemaps graph populated with Pundit annotations
http://thepund.it/edgemaps_demo/demo.html
TIMELINE ANNOTATION
http://ask.thepund.it/#/timeline/31951d93
MORE...
• Find out and suggest more: http://thepund.it/okfest.php
...and don't forget to leave some feedback :-) !!!
DEMO TIME!
http://thepund.it
THANK YOU!
Semlib EU Project: http://www.semlibproject.eu/
DM2E EU Project: http://dm2e.edu/
AGORA EU Project: http://project-agora.eu/