semantic tagging for the xwiki platform with zemanta and dbpedia


Upload: elena-oana-tabaranu

Post on 06-May-2015


Semantic Tagging for the XWiki Platform with Zemanta and DBpedia

Elena-Oana Tăbăranu and Anna-Maria Metzak

Faculty of Computer Science, “Alexandru I. Cuza” University of Iași

{elena.tabaranu,anna.metzak}@info.uaic.ro

Abstract. Tags are a very efficient method of describing information with metadata. Adding semantic information to the keywords allows computers to comprehend what the pages are saying and use that knowledge to offer better service to humans when interacting with them. The tagging extension for the XWiki Platform links the user-defined keywords with semantic information from the DBpedia knowledge base.

Key words: XWiki, Zemanta, DBpedia, knowledge base, Semantic Web, tagging, Common Tag

1 Introduction

A tag is a relevant keyword or term associated with specific content. Labeling by keywords has long been used in scientific publications. Tags made a comeback when web users and developers realized that they are a very efficient method of describing information with metadata.

The goal of this project is to extend a conventional open-source Web application with semantic information. The Semantic Tagging XWiki component enriches the tagging mechanism of the XWiki Platform using the content recommendation tool Zemanta¹ and the DBpedia² knowledge base. The XWiki semantic tagging mechanism lets users get suggestions when adding new tags and links each new tag to concepts extracted from the world’s biggest knowledge base, Wikipedia.

2 The XWiki Platform

XWiki is an open-source platform for developing collaborative web applications using the wiki paradigm. XWiki products are based on the XWiki Platform, which provides common services and UI to them. XWiki is a second-generation wiki: it provides all the basic content management and administration features of common wikis, plus enhanced features and capabilities. With XWiki, you can build simple applications, extend the platform with custom plugins/components, or even build complex Web applications.

¹ Zemanta is a tool that brings relevant content from around the web as the user is typing. Its API lets applications retrieve related images, articles, hyperlinks and tags.

² DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.

Some of the features offered by the XWiki Platform are:

– Edit pages using wiki syntax to format text, create tables, create links, display images, etc. Alternatively, use a powerful WYSIWYG editor to edit the content of documents.
– Create, edit, show, print, delete, copy, move and rename documents.
– Export wiki pages to PDF, RTF, XML or HTML.
– Attach as many files as you want to any page. These files can then be referenced and used in page contents.
– Control who can view, edit or delete documents in a flexible manner. Apply rights to a document, a space or an entire wiki.
– Use XWiki’s programming API directly in your pages (Velocity or Groovy) to perform advanced formatting, layout or other scripted tasks.
– Create applications by grouping several pages together. Import and export applications to/from your wiki.

Examples of applications that non-developers can create quickly and in an organic manner using XWiki:

– A blogging application.
– An RSS feed aggregator.
– Mashups, for example combining Google Maps with Delicious, Flickr, Google Base, Google Calendar, etc.
– Collaborative authoring of documents in real time.
– Form-based applications to enter collections of items.
– A poll/survey application.

2.1 The XWiki Platform Core

XWiki Core is a single historic JAR that is being split into several distinct modules. It currently implements the following features:

– Model: all the classes representing the wiki model, i.e. the following notions: Document, Space, Wiki, Classes/Objects, Attachments and more.
– XWiki Syntax 1.0 Rendering: the old service for rendering XWiki Syntax 1.0, kept for backward compatibility so that existing users can keep using XWiki Syntax 1.0. All other syntaxes are handled by the new Rendering Module.
– Localization: handles translations in various languages. A new Localization module under development will replace this old module.
– Notification: handles event registration and distribution. For example, code can subscribe to receive an event when a new document is created.
– Exports (PDF, RTF, XAR). In the future this will be done by implementing specific Renderers in the new Rendering Module.
– Security: authentication and authorization handling.
– User Management.

Fig. 1. The XWiki Platform Architecture.

2.2 The XWiki Platform Plugins

The plugins created and maintained by the XWiki development team are either in their own JAR or still located in the XWiki Core JAR. Besides these, other plugins have been contributed by the community and can be installed. The full list of available plugins is available on the Code Zone³.

2.3 The XWiki Platform Modules

A module offers services in a given domain. Modules are the equivalent of plugins, but use the new XWiki component-based architecture.

XWiki’s architecture is based on component-oriented development. Rather than depending on any existing component manager, XWiki defines a set of simple component interfaces that can then be bound to any existing component manager. XWiki currently implements its own lightweight component manager.

³ Contributions from the XWiki community can be accessed at http://code.xwiki.org/xwiki/bin/view/Main/.
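The idea of binding role interfaces to implementations can be illustrated with a deliberately tiny registry. This is a hypothetical sketch, not XWiki’s component manager: the names `TinyComponentManager`, `register`, `lookup`, `Greeter` and `EnglishGreeter` are invented here. Components are keyed by role interface plus a hint string, and each one is created lazily and then reused as a singleton.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical, minimal component manager: maps a (role interface, hint)
 * pair to a factory and caches the created instance. Not the XWiki API.
 */
class TinyComponentManager {
    private final Map<String, Supplier<?>> factories = new HashMap<>();
    private final Map<String, Object> singletons = new HashMap<>();

    private static String key(Class<?> role, String hint) {
        return role.getName() + "/" + hint;
    }

    /** Binds an implementation factory to a role interface under a hint. */
    <T> void register(Class<T> role, String hint, Supplier<? extends T> factory) {
        factories.put(key(role, hint), factory);
    }

    /** Returns the lazily created singleton registered for role and hint. */
    <T> T lookup(Class<T> role, String hint) {
        String k = key(role, hint);
        Object instance = singletons.get(k);
        if (instance == null) {
            Supplier<?> factory = factories.get(k);
            if (factory == null) {
                throw new IllegalArgumentException("No component registered for " + k);
            }
            instance = factory.get();
            singletons.put(k, instance);
        }
        return role.cast(instance);
    }
}

/** Example role interface (hypothetical). */
interface Greeter {
    String greet(String name);
}

/** Example implementation of the role (hypothetical). */
class EnglishGreeter implements Greeter {
    public String greet(String name) {
        return "Hello, " + name;
    }
}
```

Client code depends only on the role interface; which implementation it receives is decided by what was registered, which is the decoupling the component-oriented design aims for.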


2.4 The XWiki Platform Applications

The applications created and maintained by the XWiki development team are: Panels, Administration, Blog, Application Manager, Wiki Manager, Scheduler, Statistics, Watch List, Office Importer, WebDAV, Tags and Search. In addition, other applications have been contributed by the community and can be installed. The full list of available applications is available on the Code Zone.

2.5 Extending The XWiki Platform

The XWiki Platform can be extended by:

– Writing scripts in wiki pages
– Writing applications (sets of wiki pages)
– Writing plugins in Java
– Writing modules (sets of components) in Java
– Writing new skins or extending existing ones
– Extending existing service APIs when they provide extension points

Fig. 2. Extending the XWiki Platform.

3 Bringing Semantic Tagging to the XWiki Platform with Zemanta and DBpedia

Semantic Tagging is a proposal to extend XWiki’s default tagging mechanism using the Zemanta content recommendation tool and the DBpedia knowledge base:

– tag documents with user-defined tags (the default tagging behavior in XWiki);
– use Zemanta to recommend tags for the wiki page content;
– add concept information for each tag using DBpedia.

The mockups below were produced using Balsamiq Mockups and show the user interface changes for the XWiki Platform when adding and displaying a semantic tag.

3.1 Add a semantic tag

When adding a tag for the content of a wiki page, the user has two options in the “Add Tag” form: the “Suggested tags” tab or the “Wiki tags” tab.

When hovering over a suggested tag, a popup with semantic details is displayed: the tag description and the URI of the DBpedia resource page. Besides the “Suggested tags” tab, the user can use the “Wiki tags” tab to display the tag cloud of the entire wiki. In addition, the default autocomplete feature helps the user find tags already used in the wiki instance.

After a tag is added to the Tags section of a wiki page, it is deactivated in the suggested list. Deactivated tags are shown in grey.

Fig. 3. Mockup for tagging a wiki page in XWiki.


Fig. 4. Tagging a wiki page in XWiki.

Fig. 5. Autocomplete feature for tagging a wiki page in XWiki.
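The deactivation rule for suggested tags can be sketched as a small filter that marks each suggestion active or inactive depending on the tags already on the page. This is a hypothetical illustration: the `Suggestion` record and `SuggestionFilter.markUsed` are invented names, and the case-insensitive comparison is an assumption rather than documented XWiki behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** A suggestion as shown in the "Suggested tags" tab (hypothetical type). */
record Suggestion(String name, boolean active) {}

class SuggestionFilter {
    /**
     * Marks a suggested tag inactive (rendered grey in the mockup) when the
     * page already carries a tag with the same name, ignoring case.
     * The input order of the suggestions is preserved.
     */
    static List<Suggestion> markUsed(List<String> suggested, Set<String> pageTags) {
        Set<String> used = pageTags.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        List<Suggestion> result = new ArrayList<>();
        for (String name : suggested) {
            boolean alreadyTagged = used.contains(name.toLowerCase(Locale.ROOT));
            result.add(new Suggestion(name, !alreadyTagged));
        }
        return result;
    }
}
```

Keeping inactive suggestions in the list, instead of removing them, matches the mockup, where used tags stay visible but greyed out.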


3.2 Display semantic information for a tag

A semantic tag preserves the default XWiki behavior in view mode (add icon, remove icon and a link to the list of documents tagged with it), but also has semantic information attached.

Fig. 6. Mockup for displaying a wiki page in XWiki.

Fig. 7. Semantic information for a wiki tag.

3.3 Instruments used for suggestions

Digitalization of content started by putting the written word into ASCII form. HTML and the web eventually enabled linking and interleaving with other types of media such as images, sound and video. Flash and JavaScript further enabled interactive widgets such as map views. Lately, web content has been moving toward explicitly exposing relations between pieces of data. The general intention of explicitly exposing relations is to allow computers to comprehend what pages are saying and use that knowledge to offer better service to humans when interacting with them.


While authoring text comes naturally to educated human beings, there are many reasons why creating fully featured web content is still a cumbersome experience. These reasons fall into two main categories. The first is efficiently finding the right content to include or link to, which usually takes a lot of time. The second is efficiently telling the computer the relationships between our content and external content and data, which usually requires skills and knowledge buried deep in specifications and standards.

Zemanta is a service that tries to resolve both issues by providing a semi-automatic content enrichment process that makes content more appealing to humans while placing it in correct relation to other content in a way computers can understand.

Fig. 8. Authoring process with Zemanta.

The Zemanta API allows application developers to automatically query the Zemanta engine for contextual information about the text that the user enters. Technically, the API accepts any text through a POST request and, after analyzing that text, returns suggestions.
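A minimal sketch of such a POST request, built with the JDK’s `java.net.http` client instead of the zemapi library the component actually uses. The endpoint URL and the parameter names (`method=zemanta.suggest`, `api_key`, `text`, `format`) are assumptions based on the historical public Zemanta REST API, which is no longer available; the request is only built here, never sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ZemantaRequests {
    // Assumed historical endpoint; the Zemanta service has been shut down.
    static final URI ENDPOINT = URI.create("http://api.zemanta.com/services/rest/0.0/");

    /** Encodes the (assumed) parameters as an x-www-form-urlencoded body. */
    static String formBody(String apiKey, String text) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("method", "zemanta.suggest");            // assumed method name
        params.put("api_key", apiKey);
        params.put("text", text);
        params.put("format", "xml");                        // assumed format value
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds (but does not send) the POST request carrying the page text. */
    static HttpRequest suggestRequest(String apiKey, String text) {
        return HttpRequest.newBuilder(ENDPOINT)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(apiKey, text)))
                .build();
    }
}
```

Sending the text in a form-encoded POST body rather than in the URL avoids URL length limits for long wiki pages.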

While some other services only try to find the most overrepresented rare words or proper names in the text, Zemanta goes deeper when processing content. Zemanta offers both tags based on words and phrases found inside the author’s text and tags for topics that could represent the content as a whole without being explicitly mentioned. It goes even further and tries to find very concrete items and concepts that are related to what is being said but are only connected through a third piece of information. Therefore, authors can expect topics, names and concepts as tags.


The structure of Zemanta’s RDF/XML response was inspired by the Linking Open Data initiative, by other APIs offering semantic responses and, most importantly, by ideas championed by the W3C.

The XWiki Semantic Tagging component uses the Zemanta API to suggest possible keywords for a specific text. The component identifies itself with an API key, a string that uniquely identifies the specific application instance using the Zemanta web service. There are also limits on the number of requests per day and per second: default developer accounts allow 1000 posts per day and 1 post per second.

3.4 Instruments used for semantic information

DBpedia extracts factual information from Wikipedia pages, allowing users to find answers to questions where the information is spread across many different Wikipedia articles. DBpedia is served on the Web under the terms of the GNU Free Documentation License. To fulfil the requirements of different client applications, it can be accessed through four mechanisms: Linked Data, a SPARQL endpoint, RDF dumps and an index lookup.

Linked Data is a method of publishing RDF data on the Web that relies on HTTP URIs as resource identifiers and on the HTTP protocol to retrieve resource descriptions. DBpedia resource identifiers (such as http://dbpedia.org/resource/Andy_Warhol) are set up to return RDF descriptions when accessed by Semantic Web agents and a simple HTML view of the same information to traditional Web browsers. HTTP content negotiation is used to deliver the appropriate format.
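Content negotiation comes down to the `Accept` header of the request. This sketch, using the JDK’s `java.net.http` API, builds (without sending) a request for the Andy Warhol resource; the class name `DbpediaLinkedData` is hypothetical, and the only protocol detail assumed is the standard RDF/XML media type `application/rdf+xml`.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DbpediaLinkedData {
    static final String RDF_XML = "application/rdf+xml";
    static final String HTML = "text/html";

    /**
     * Builds (but does not send) a GET request for a DBpedia resource URI.
     * The Accept header drives HTTP content negotiation: RDF_XML asks for
     * the machine-readable description, HTML for the browser view.
     */
    static HttpRequest request(String resourceUri, String mediaType) {
        return HttpRequest.newBuilder(URI.create(resourceUri))
                .header("Accept", mediaType)
                .GET()
                .build();
    }
}
```

To fetch the description, the request could be sent with a client built via `HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build()`. Following redirects is a precaution, since Linked Data servers commonly answer a resource URI with a redirect to a separate data or page document.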

A SPARQL endpoint is available for querying the DBpedia knowledge base. Client applications can send queries over the SPARQL protocol to the endpoint at http://dbpedia.org/sparql. In addition to standard SPARQL, the endpoint supports several extensions of the query language that have proved useful for developing client applications, such as full-text search over selected RDF predicates and aggregate functions, notably COUNT(). To protect the service from overload, limits on query complexity and result size are in place.
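Over the SPARQL protocol, a query can travel URL-encoded in the `query` parameter of a GET request. The sketch below builds such a URI for the endpoint named above, with an example query that uses the COUNT() aggregate; the class `DbpediaSparql` and the sample query are illustrative assumptions, not part of the component.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DbpediaSparql {
    static final String ENDPOINT = "http://dbpedia.org/sparql";

    /**
     * Builds a SPARQL-protocol GET URI: the query is URL-encoded into the
     * "query" parameter. URLEncoder emits '+' only for spaces (a literal
     * '+' becomes %2B), so replacing '+' with %20 is safe.
     */
    static URI queryUri(String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return URI.create(ENDPOINT + "?query=" + encoded);
    }

    /** Example query: counts the instances of a class with COUNT(). */
    static String countInstancesQuery(String classUri) {
        return "SELECT (COUNT(?s) AS ?n) WHERE { ?s a <" + classUri + "> }";
    }
}
```

A client would normally also choose the result format through the `Accept` header, for example `application/sparql-results+json` or `application/sparql-results+xml`, the media types defined by the W3C SPARQL result format specifications.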

The DBpedia knowledge base is sliced by triple predicate into several parts, and N-Triples serializations of these parts are available for download on the DBpedia website. In addition to the knowledge base that is served as Linked Data and via the SPARQL endpoint, the download page also offers infobox datasets that have been extracted from Wikipedia editions in 29 languages other than English.
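Each line of an N-Triples file holds one statement (`<subject> <predicate> object .`), so slicing by predicate amounts to grouping lines by their second term. The sketch below parses simple lines and counts statements per predicate. It is a simplified, hypothetical parser (the names `Triple` and `NTriples` are invented): it assumes IRI subjects and predicates and does not handle blank-node subjects or escaped characters inside IRIs.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed statement; the object is kept in its raw lexical form. */
record Triple(String subject, String predicate, String object) {}

class NTriples {
    // <subject> <predicate> object . (subject and predicate must be IRIs here)
    private static final Pattern LINE =
            Pattern.compile("^<([^>]*)>\\s+<([^>]*)>\\s+(.+?)\\s*\\.\\s*$");

    /** Parses one line; blank lines, comments and unsupported forms yield empty. */
    static Optional<Triple> parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return Optional.empty();
        }
        Matcher m = LINE.matcher(trimmed);
        return m.matches()
                ? Optional.of(new Triple(m.group(1), m.group(2), m.group(3)))
                : Optional.empty();
    }

    /** Counts statements per predicate IRI, i.e. the size of each slice. */
    static Map<String, Integer> countByPredicate(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            parse(line).ifPresent(t -> counts.merge(t.predicate(), 1, Integer::sum));
        }
        return counts;
    }
}
```

A production loader would use a complete RDF parser; the regular expression is only meant to show the one-statement-per-line structure of the format.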

To make it easy for Linked Data publishers to find DBpedia resource URIs to link to, a lookup service proposes DBpedia URIs for a given label. The Web service is based on a Lucene index providing a weighted label lookup, which combines string similarity with a relevance ranking in order to find the most likely matches for a given term. DBpedia Lookup is available as a Web service at http://lookup.dbpedia.org/api/search.asmx.
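Querying the lookup service for a tag label is essentially URL construction. In this sketch the base URL comes from the text, but the operation name `KeywordSearch` and the parameters `QueryClass`, `QueryString` and `MaxHits` are assumptions about the legacy ASMX interface (the service has since been reworked), and `DbpediaLookup` is a hypothetical class name.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DbpediaLookup {
    // Base URL from the text; the operation and parameter names are assumed.
    static final String KEYWORD_SEARCH =
            "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch";

    /**
     * Builds a keyword-search URI for a tag label. An empty queryClass is
     * assumed to mean "no class restriction"; maxHits caps the result count.
     */
    static URI keywordSearch(String label, String queryClass, int maxHits) {
        return URI.create(KEYWORD_SEARCH
                + "?QueryClass=" + encode(queryClass)
                + "&QueryString=" + encode(label)
                + "&MaxHits=" + maxHits);
    }

    /** URL-encodes a parameter value, using %20 for spaces. */
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

The resulting URI can then be fetched with any HTTP client with a plain GET.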

The XWiki Semantic Tagging component links information from the DBpedia index (a short description of the tag, the URI of the resource page and the label) to the user-defined tags in the wiki. This extends the default tagging mechanism of the XWiki platform, which does not link user-defined tags to a concept.
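To make the linking step concrete, the sketch below extracts a label, a resource URI and a description from a lookup response with the JDK’s DOM parser. That choice fits the `SAXException` and `ParserConfigurationException` declared by the component interface, but the XML element names (`Result`, `Label`, `URI`, `Description`) are an assumption about the legacy Lookup response format, and `TagDetail` and `LookupResponseParser` are hypothetical names, not the component’s actual classes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Label, resource URI and short description attached to a tag. */
record TagDetail(String label, String uri, String description) {}

class LookupResponseParser {
    /** Extracts Label, URI and Description from each (assumed) Result element. */
    static List<TagDetail> parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPEs to avoid XML external entity attacks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<TagDetail> details = new ArrayList<>();
        NodeList results = root.getElementsByTagName("Result");
        for (int i = 0; i < results.getLength(); i++) {
            Element r = (Element) results.item(i);
            details.add(new TagDetail(childText(r, "Label"), childText(r, "URI"),
                    childText(r, "Description")));
        }
        return details;
    }

    /**
     * Text of the first direct child element with the given name, or "".
     * Only direct children are read because nested class entries are
     * assumed to contain their own Label and URI elements.
     */
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }
}
```

The first result would typically be the one attached to a tag, since the lookup service ranks matches by relevance.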

3.5 Common Tags

The Semantic Tagging component uses the Common Tag RDFa vocabulary to bring semantic markup to the default XWiki tagging mechanism.

Fig. 9. Example of semantic markup using RDFa for a wiki tag.
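Markup of this kind can be generated with a few string operations. The sketch follows the pattern of the Common Tag vocabulary (namespace `http://commontag.org/ns#`): the page is `ctag:tagged` with a `ctag:Tag` whose `ctag:label` is the keyword and which `ctag:means` a DBpedia resource. The exact element layout and the class name `CommonTagMarkup` are assumptions, not the component’s `documentTags.vm` template.

```java
/**
 * Renders one tag as XHTML+RDFa with the Common Tag vocabulary.
 * The div/span layout is an illustrative choice, not the component's template.
 */
class CommonTagMarkup {
    static String render(String label, String dbpediaUri) {
        return "<div xmlns:ctag=\"http://commontag.org/ns#\" rel=\"ctag:tagged\">"
                + "<span typeof=\"ctag:Tag\""
                + " property=\"ctag:label\" content=\"" + escape(label) + "\""
                + " rel=\"ctag:means\" resource=\"" + escape(dbpediaUri) + "\">"
                + escape(label)
                + "</span></div>";
    }

    /** Minimal XML escaping for text and double-quoted attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")   // must come first
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

For these attributes to be interpreted in XHTML 1.x pages, the page can declare the XHTML+RDFa 1.0 DOCTYPE, `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">`, which is the kind of `htmlheader.vm` change listed in the implementation details.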

3.6 Implementation details

Extensions for the XWiki Platform that implement the semantic tagging mechanism:

– an XWiki application (SemTags.Tooltip) for the tag tooltip: contains a JavaScript skin extension and a StyleSheet skin extension;
– an XWiki application (SemTags.CreateTagForm) for the new semantic tagging form: Velocity code to add either a tag suggested by Zemanta and linked with information from DBpedia, or a tag already used in the wiki;
– an XWiki component for the backend tag mechanism: connects to the Zemanta API and queries the DBpedia index;
– resource modifications: JavaScript code to support the new tagging functionality;
– template modifications: updating htmlheader.vm with the DOCTYPE of the XHTML wiki pages to support the new RDFa vocabulary, and updating documentTags.vm with the new display for a keyword.

The XWiki code lifecycle is based on Maven, hence a Maven archetype was used to help create a simple component module that respects the XWiki architecture and component-specific requirements. Since the XWiki platform is written in Java, a Java library was used to query the Zemanta engine, and the API was added as a Maven dependency of the XWiki component.


Maven dependency for the Zemanta API.

<dependency>
  <groupId>com.zemanta.api</groupId>
  <artifactId>zemapi</artifactId>
  <version>1.0</version>
</dependency>

The HTTPClient library was used to query the DBpedia lookup web service, and a dependency was also added to the component pom.xml.

Maven dependency for the HTTPClient library.

<dependency>
  <groupId>commons-httpclient</groupId>
  <artifactId>commons-httpclient</artifactId>
  <version>3.1</version>
</dependency>

Content of the component declaration file components.txt.

org.xwiki.semtag.component.internal.DefaultSemanticTagger
org.xwiki.semtag.component.internal.vcinitializer.SemanticTaggerVelocityContextInitializer

The @ComponentRole annotation is used to declare the interface of the component.

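import java.rmi.RemoteException;
import java.util.ArrayList;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xwiki.component.annotation.ComponentRole;

@ComponentRole
public interface SemanticTagger
{
    public ArrayList<SemanticTag> getSuggestions(String text);

    public void updateFirstSemanticDetail(SemanticTag tag)
        throws SAXException, ParserConfigurationException, RemoteException;

    public SemanticTag updateSemanticDetails(String tagName)
        throws ParserConfigurationException, SAXException;
}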

The @Component annotation declares a VelocityContextInitializer implementation that exposes the SemanticTagger component to scripting languages like Velocity under the key "tagger".

import org.apache.velocity.VelocityContext;
import org.xwiki.component.annotation.Component;
import org.xwiki.component.annotation.Requirement;
import org.xwiki.velocity.VelocityContextInitializer;

@Component("tagger")
public class SemanticTaggerVelocityContextInitializer
    implements VelocityContextInitializer
{
    /** The key to add to the velocity context */
    public static final String VELOCITY_CONTEXT_KEY = "tagger";

    /** Injected by the component manager. */
    @Requirement
    private SemanticTagger semanticTagger;

    /**
     * Add the component instance to the velocity context
     * received as parameter.
     */
    public void initialize(VelocityContext context)
    {
        context.put(VELOCITY_CONTEXT_KEY, semanticTagger);
    }
}

Using the component API from Velocity to display the tag name, description and link to the DBpedia URI.

#set($suggestedList = $tagger.getSuggestions("$request.text"))
#foreach($suggestedTag in $suggestedList)
  ## #set is used only to call the void method updateFirstSemanticDetail
  #set($ok = $tagger.updateFirstSemanticDetail($suggestedTag))
  #set($details = $suggestedTag.getSemanticDetails())
  <li>
    <a class="suggested-tag" href="#">$suggestedTag.name</a>
    ## Render the tooltip only when at least one semantic detail exists
    #if($details && !$details.isEmpty())
    <span class="suggested-tag-info" style="display: none">
      $details.get(0).getDescription()<br/>
      <a href="$details.get(0).getUri()">Visit</a>
      <div id="more-at">Powered by
        <a href="http://www.dbpedia.org"><img src="$dbpediaImg" alt="DBpedia"/></a>
      </div>
    </span>
    #end
  </li>
#end

4 Conclusions

A tag is a relevant keyword or term associated with specific content, and tags provide a very efficient method of describing information with metadata. The tagging extension for the XWiki platform provides semantic details that DBpedia extracts from Wikipedia, the world’s biggest knowledge base, improving content understanding for both users and computers.
