www.sti-innsbruck.at © Copyright 2013 STI INNSBRUCK www.sti-innsbruck.at

“How to put an annotation in HTML?”

Ioannis Stavrakantonakis

Outline

• Research question

• ITS 2.0

• NIF

• What about Microdata?

• Demo

• References

Research question

We want to annotate Springfield with a URI to make sure that the computer understands we mean the Springfield in Massachusetts.

HTML:

<p>It is well known, that Springfield has mild summers and short, but hard winters.</p>

HTML with annotation (something like this):

<p>It is well known, that

<span about="http://sws.geonames.org/4951788/">Springfield</span>

has mild summers and short, but hard winters.</p>

We don't want to add whole triples, but just annotate the HTML and say "this element refers to the following URI".

From: Denny Vrandečić
Sent: Wednesday, April 24, 2013 1:59 PM
To: semantic-web at W3C
Subject: How to put an annotation in HTML?
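
The annotation step asked for in the question can be sketched in a few lines of Python. This is only an illustration, not a recommended tool: the helper name annotate_mention is hypothetical, the attribute name is a parameter (the mail uses "about" as a placeholder), and the plain string search assumes the mention does not also occur inside a tag or attribute value.

```python
from html import escape


def annotate_mention(text: str, mention: str, uri: str, attr: str = "about") -> str:
    """Wrap the first occurrence of `mention` in a <span> that carries `uri`.

    Naive sketch: works on the raw string, so it assumes `mention`
    appears in the text content and not inside markup.
    """
    start = text.find(mention)
    if start == -1:
        return text  # mention not present, leave the input untouched
    end = start + len(mention)
    span = f'<span {attr}="{escape(uri, quote=True)}">{mention}</span>'
    return text[:start] + span + text[end:]


sentence = ("<p>It is well known, that Springfield has mild summers "
            "and short, but hard winters.</p>")
annotated = annotate_mention(sentence, "Springfield",
                             "http://sws.geonames.org/4951788/")
print(annotated)
```

Passing attr="its-ta-ident-ref" instead of the default would produce the ITS 2.0 form discussed in the following slides.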

ITS 2.0

• Internationalization Tag Set (ITS) [2]

– enhances the foundation to integrate automated processing of human language into core Web technologies;

– focuses on HTML and XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF) as well as the Natural Language Processing Interchange Format (NIF);

– is a technology to add metadata to Web content for the benefit of localization, language technologies, and internationalization (see [5] on localization (l10n) versus internationalization (i18n)).

ITS 2.0

• Potential Users of ITS [2]:

– Schema developers starting a schema from the ground up (ITS offers proposals for attribute and element names to include in the new schema)

– Schema developers working with an existing schema (they should check whether their schema supports the markup proposed in the specification and, where appropriate, add that markup to their schema)

– Vendors of content-related tools (e.g. tools for authoring, translation, etc.)

– Content producers (may be used by them to mark up specific bits of content)

– Machine Translation Systems

– Text Analytics (automatically generated metadata for improving localization, data integration or knowledge management workflows)

– Localization Workflow Managers

ITS 2.0

The Text Analysis use case:

• This data category is used to annotate content with lexical or conceptual information for the purpose of contextual disambiguation.

• Three pieces of annotation:

– Confidence: The confidence of the agent (that produced the annotation) in its own computation – XSD double data type (e.g. 0.63)

– Entity type: The type of entity, or concept class of the text analysis target – IRI (e.g. http://nerd.eurecom.fr/ontology#Location [8])

– Entity identifier: A unique identifier for the text analysis target – IRI or String (e.g. http://dbpedia.org/page/Innsbruck or the identifier for “Capital” from WordNet [9])
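
The three pieces of annotation can be modelled as a small record and serialized to HTML attributes. The sketch below is an assumption-laden illustration: the class TextAnalysis and the helper annotate are hypothetical names, and the attribute names its-ta-confidence, its-ta-class-ref and its-ta-ident-ref as well as the 0..1 confidence range follow a reading of the ITS 2.0 draft [2] that should be checked against the specification. The draft also appears to require annotator information (its-annotators-ref) whenever a confidence is given; that is left out here.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class TextAnalysis:
    """Hypothetical container for the three Text Analysis values."""
    ident_ref: Optional[str] = None     # entity identifier (IRI)
    class_ref: Optional[str] = None     # entity type (IRI)
    confidence: Optional[float] = None  # agent's confidence, assumed in [0, 1]

    def to_attributes(self) -> str:
        """Serialize the values that are set as HTML attributes."""
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        pairs = [
            ("its-ta-confidence",
             None if self.confidence is None else repr(self.confidence)),
            ("its-ta-class-ref", self.class_ref),
            ("its-ta-ident-ref", self.ident_ref),
        ]
        return " ".join(f'{name}="{escape(value, quote=True)}"'
                        for name, value in pairs if value is not None)


def annotate(text: str, ta: TextAnalysis) -> str:
    """Wrap `text` in a <span> carrying the Text Analysis attributes."""
    return f"<span {ta.to_attributes()}>{escape(text)}</span>"


innsbruck = TextAnalysis(
    ident_ref="http://dbpedia.org/page/Innsbruck",
    class_ref="http://nerd.eurecom.fr/ontology#Location",
    confidence=0.63,
)
markup = annotate("Innsbruck", innsbruck)
print(markup)
```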

ITS 2.0

Rendered HTML: Welcome to Innsbruck in Austria!

HTML with ITS metadata:

<html xmlns="http://www.w3.org/1999/xhtml"><body>

<h2 translate="yes">Welcome to <span its-ta-ident-ref="http://dbpedia.org/page/Innsbruck" its-within-text="yes" translate="no">Innsbruck</span> in <b translate="no" its-within-text="yes">Austria</b>!</h2>

</body></html>
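
A consumer of such a page can recover the annotated mentions with the standard-library HTML parser. The following is a minimal sketch: the class name IdentRefCollector is hypothetical, only the its-ta-ident-ref attribute is read, and the short list of void elements is an assumption that covers simple pages only.

```python
from html.parser import HTMLParser


class IdentRefCollector(HTMLParser):
    """Collect (text, IRI) pairs for elements carrying its-ta-ident-ref."""

    VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements without end tag

    def __init__(self):
        super().__init__()
        self.results = []  # list of (text, iri)
        self._stack = []   # one [iri_or_None, text_parts] entry per open element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        self._stack.append([dict(attrs).get("its-ta-ident-ref"), []])

    def handle_endtag(self, tag):
        if tag in self.VOID or not self._stack:
            return
        iri, parts = self._stack.pop()
        text = "".join(parts)
        if self._stack:
            self._stack[-1][1].append(text)  # text also belongs to the parent
        if iri is not None:
            self.results.append((text.strip(), iri))

    def handle_data(self, data):
        if self._stack:
            self._stack[-1][1].append(data)


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<h2 translate="yes">Welcome to <span '
    'its-ta-ident-ref="http://dbpedia.org/page/Innsbruck" '
    'its-within-text="yes" translate="no">Innsbruck</span> in '
    '<b translate="no" its-within-text="yes">Austria</b>!</h2>'
    '</body></html>'
)
collector = IdentRefCollector()
collector.feed(page)
collector.close()
print(collector.results)
```

Only "Innsbruck" is reported, because "Austria" is marked as non-translatable but carries no entity identifier.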

ITS 2.0

• Conversion to NIF [2]:

– XML or HTML documents that contain ITS metadata are converted to an RDF representation based on NIF.

– The conversion algorithm to generate NIF consists of seven steps. The output of the algorithm uses the ITS RDF ontology [7].

– The conversion to NIF is a possible basis for a natural language processing (NLP) application that creates, for example, named entity annotations.

– The reverse procedure, integrating the RDF annotations back into the original input document, is given in [6] (NIF2ITS).
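
To give an idea of what the conversion produces, the sketch below builds a small Turtle document for one sentence and one annotated mention. It is not the seven-step algorithm from [2]: it starts from plain text plus (mention, IRI) pairs, the function name to_nif_turtle and the base URI http://example.com/doc.html are hypothetical, and the NIF and ITS RDF terms used (nif:Context, nif:RFC5147String, nif:isString, nif:beginIndex, nif:endIndex, nif:referenceContext, nif:anchorOf, itsrdf:taIdentRef) are written from memory of [3] and [7] and should be verified there. JSON string escaping is used as an approximation of Turtle literal escaping.

```python
import json

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"


def to_nif_turtle(base: str, text: str, mentions: list[tuple[str, str]]) -> str:
    """Return a NIF-style Turtle string for `text` and its annotated mentions.

    `mentions` holds (surface form, entity IRI) pairs; only the first
    occurrence of each surface form is annotated.
    """
    context = f"<{base}#char=0,{len(text)}>"
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"{context} a nif:Context, nif:RFC5147String ;",
        f"    nif:isString {json.dumps(text)} ;",
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .',
    ]
    for surface, iri in mentions:
        begin = text.find(surface)
        if begin == -1:
            continue  # mention not found in the text, skip it
        end = begin + len(surface)
        lines += [
            "",
            f"<{base}#char={begin},{end}> a nif:RFC5147String ;",
            f"    nif:referenceContext {context} ;",
            f"    nif:anchorOf {json.dumps(surface)} ;",
            f'    nif:beginIndex "{begin}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
            f"    itsrdf:taIdentRef <{iri}> .",
        ]
    return "\n".join(lines) + "\n"


turtle = to_nif_turtle(
    "http://example.com/doc.html",  # hypothetical document URI
    "Welcome to Innsbruck in Austria!",
    [("Innsbruck", "http://dbpedia.org/page/Innsbruck")],
)
print(turtle)
```

The character offsets in the fragment identifiers are what tie each RDF annotation back to a span of the original text, which is also what the back-conversion relies on.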

NLP Interchange Format (NIF)

• NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.

• NIF will soon be a normative part of ITS 2.0.

• NIF and its community project NLP2RDF serve as an umbrella project liaising with other communities of practice, especially:

– LOD2 FP7 EU project

– MultilingualWeb-LT Working Group

– Best Practices for Multilingual Linked Open Data Community Group

– Ontology-Lexica Community Group

– Named Entity Recognition and Disambiguation (NERD)

– Ontologies of Linguistic Annotation (OLiA)

• University of Leipzig

How is it different to Microdata annotations?

What is the latitude and longitude of the <span ?=?>Empire State Building</span>?

ITS 2.0 + DBpedia resource:

<span its-ta-ident-ref="http://live.dbpedia.org/page/Empire_State_Building">Empire State Building</span>

Microdata + schema.org:

<div itemscope itemtype="http://schema.org/Place">

What is the latitude and longitude of the

<span itemprop="name">Empire State Building</span>?

</div>

How is it different to Microdata annotations?

What is the latitude and longitude of the <span ?=?>Empire State Building</span>?

Semantics of ITS 2.0 annotations:

Specify entity identifiers (IRIs) for the presented information item.

Semantics of Microdata annotations:

Specify the type of information that is presented.
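
The difference shows up directly when both snippets are read by a program. The sketch below lists the relevant attributes found in each variant; the class AttributeLister and the function list_annotations are hypothetical names, and only the three attributes named in WANTED are considered.

```python
from html.parser import HTMLParser


class AttributeLister(HTMLParser):
    """Record (tag, attribute, value) for Microdata and ITS attributes."""

    WANTED = ("itemtype", "itemprop", "its-ta-ident-ref")

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.WANTED:
                self.found.append((tag, name, value))


def list_annotations(html: str):
    """Return the annotation attributes present in `html`, in document order."""
    parser = AttributeLister()
    parser.feed(html)
    parser.close()
    return parser.found


microdata = (
    '<div itemscope itemtype="http://schema.org/Place">'
    'What is the latitude and longitude of the '
    '<span itemprop="name">Empire State Building</span>?</div>'
)
its = (
    'What is the latitude and longitude of the <span '
    'its-ta-ident-ref="http://live.dbpedia.org/page/Empire_State_Building">'
    'Empire State Building</span>?'
)

print(list_annotations(microdata))  # type of thing and property name
print(list_annotations(its))        # identifier of one specific entity
```

The Microdata variant tells a consumer that some Place with a given name is mentioned, while the ITS 2.0 variant points at one particular resource.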

Hands-on / Demo

• HTML with ITS metadata

• Transformation of HTML with ITS metadata to NIF

Notes:

• Based on the XSLT files shared by the W3C Working Group member Felix Sasaki (@fsasaki) [4]

• The Java internal XSLTC processor fails to compile the XSLTs. Use Saxon 9 HE.

References

[1] W3C semantic web list thread: http://lists.w3.org/Archives/Public/semantic-web/2013Apr/0218.html

[2] ITS 2.0 W3C working draft: http://www.w3.org/TR/its20/

[3] NIF Core Ontology: http://persistence.uni-leipzig.org/nlp2rdf/

[4] Felix Sasaki ITS 2.0 extractor (github): https://github.com/fsasaki/its20-extractor

[5] W3C, Localization vs. Internationalization: http://www.w3.org/International/questions/qa-i18n

[6] W3C, Conversion NIF2ITS: http://www.w3.org/TR/its20/#nif-backconversion

[7] W3C, ITS 2.0 / RDF Ontology: http://www.w3.org/2005/11/its/rdf-content/its-rdf.html

[8] Named Entity Recognition and Disambiguation (NERD): http://nerd.eurecom.fr/ontology

[9] WordNet Search 3.1: http://wordnetweb.princeton.edu/perl/webwn
