Introduction and Theoretical Foundations of New Media: Metadata and Ontologies

Slides from the Introduction and Theoretical Foundations of New Media course of the Interactive Media and Knowledge Environments master's program (Tallinn University).


Introduction and Theoretical Foundations of New Media

Metadata and Ontologies


David Lamas, TLU, 2011






The semantic web

The internet of things




So, why is metadata relevant?

Or… why should we care about metadata?




As a concept, it is not new

Metadata has long been used for managing document collections, such as the ones kept by libraries

But the term itself was only coined in 1968

By Philip Bagley, a pioneer of computerized document





Literally, a set of data that describes and gives information about other data, metadata in our context is:

Machine readable


For the purposes of resource…




Access control



Long term preservation




Or in other words, metadata allows for the description of the…

Definition

Structure; and


of selected resources with all contents in context to ease the further use of the resource




Or… Machine Readable Catalogue

Is still the main metadata standard in the library world

although it is not a full cataloguing scheme being




Universal Decimal Classification

A multilingual classification scheme for all fields of knowledge

Available at…

Anglo-American Cataloguing Rules

For use in the construction of catalogues

Available at…

Resource Description and Access

Available at…



Z39.50, SRW and SRU

Z39.50

A client–server protocol for searching and retrieving information, widely used in library environments

Search/Retrieve Web Service (SRW)

An intended standard web-based text-searching interface

Search/Retrieval via URL (SRU)

A standard XML-focused search protocol for Internet search queries, which uses the Contextual Query Language (CQL)
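To make SRU a little more concrete: a search is essentially an HTTP GET whose query string carries a CQL expression. The sketch below only builds such a URL. The endpoint is a made-up placeholder, the `dc.title` index is an assumption about what a given server supports, and the parameter names follow SRU 1.2 as commonly documented, so a real server's own documentation should be checked first.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint: replace with the SRU base URL of a real catalogue.
SRU_BASE_URL = "https://sru.example.org/catalogue"


def build_sru_url(cql_query, max_records=10, base_url=SRU_BASE_URL):
    """Return a searchRetrieve URL that carries the given CQL query."""
    params = {
        "operation": "searchRetrieve",  # the SRU search operation
        "version": "1.2",               # assumed protocol version
        "query": cql_query,             # the CQL expression itself
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_sru_url('dc.title = "metadata"')
    print(url)
    # Round trip: the CQL text survives URL encoding unchanged.
    print(parse_qs(urlparse(url).query)["query"][0])
```

The response to such a request would be an XML document, which is not handled here.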




This should not bother you other than to note that…

Metadata tends to get more complicated the longer you think about it



As for the web…

It was recognized early on that finding what you need was going to start getting difficult

We’re talking about the mid-nineties, when the web’s size was referred to in terms of tens of thousands

Users, mainly information science specialists, began trying to catalogue it by hand

Do you remember Yahoo’s earlier versions?



As for the web…

The first search engines appeared and authors began to realize that the metadata they embedded into web pages might be important



<title>A web page</title>

<meta name="keywords" content="some, key, words" />

<meta name="description" content="a summary" />
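As an illustration of how an early indexer might have read this embedded metadata, here is a minimal sketch using only Python's standard `html.parser`. The class and function names are made up for this example, and no real search engine is claimed to work this way.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and the name/content pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            # Keep only <meta> tags that carry both a name and a content.
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_metadata(html):
    """Return (title, {meta name: content}) for an HTML string."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.meta


PAGE = """
<html><head>
<title>A web page</title>
<meta name="keywords" content="some, key, words" />
<meta name="description" content="a summary" />
</head><body></body></html>
"""

if __name__ == "__main__":
    title, meta = extract_metadata(PAGE)
    print(title)
    print(meta)
```

Note that nothing here checks whether the keywords actually describe the page, which is exactly the weakness that made this kind of metadata easy to abuse.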





As for the web…

Then came Google

And metadata lost some relevance, as Google’s PageRank algorithm takes note of links between pages but places less emphasis on embedded metadata to avoid…


<meta name="description" content="a summary" />


<title>put your title here</title>



Dublin Core

Despite the initial drawbacks, work continued on embedded metadata, and the Dublin Core was and still is one of the main players with its 15 elements…

Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights

…embedded into web pages or encoded using XML

The initial intention was to improve indexing by search engines

But whereas its promoters forgot about metaspam and metacrap, the search engines didn’t

And so, the main search engines still ignore embedded metadata
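For reference, embedding the 15 elements into a web page usually means emitting one `<meta>` tag per element. The sketch below follows the commonly used `DC.` prefix convention together with a `schema.DC` link to the element set namespace. The helper name and the sample record are assumptions made for this example.

```python
from html import escape

# The 15 Dublin Core elements, in lower case as used in the "DC." prefix form.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dublin_core_meta(record):
    """Render a dict of Dublin Core values as HTML <link>/<meta> tags."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError("not Dublin Core elements: %s" % sorted(unknown))
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />']
    for element in DC_ELEMENTS:  # keep a stable, predictable order
        if element in record:
            value = escape(str(record[element]), quote=True)
            lines.append('<meta name="DC.%s" content="%s" />' % (element, value))
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample record describing this very slide deck.
    print(dublin_core_meta({
        "title": "Metadata and Ontologies",
        "creator": "David Lamas",
        "date": "2011",
        "language": "en",
    }))
```

All elements are optional and repeatable in Dublin Core. This sketch handles only the optional part, with one value per element.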



Dublin Core




Remarkably, there has been fairly widespread adoption of metadata principles, especially in policy terms, namely in government

(look into viewer.aspx for an interesting example)

And in:



Cultural heritage

Environmental agencies, and…

Libraries, of course




This resulted in the…

Growth of metadata cataloguing rules

(although every community has its own rules)

Growth in use of additional elements for particular communities

(and again, every community’s additions are different)

Adoption of application profiles to document the distinct cataloguing rules and additions

Institution of the Dublin Core Metadata Initiative as

an organization engaged in the development of interoperable metadata standards that support a broad range of purposes and business models




But the Dublin Core isn’t alone, far from it

Many other standards were and are being developed, such as these, just to name two:

RDF (Resource Description Framework)

LOM (Learning Object Metadata)



Resource Description Framework

The Resource Description Framework (RDF) was developed by the W3C and is the envisioned standard for the semantic web

Its goal is to allow software to automatically navigate and reason about web content, thus enabling…

A web of (linked) data
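To give a feel for what this data looks like, RDF statements are triples of subject, predicate and object. The sketch below keeps a few triples in plain Python and writes them in the N-Triples syntax. The document URI is a made-up `example.org` address, the `Literal` wrapper is a helper invented for this example, and a real project would more likely use an RDF library.

```python
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical identifier for this slide deck.
DOC = "http://example.org/slides/metadata-and-ontologies"


class Literal(str):
    """Marks a plain text value, as opposed to an IRI naming a resource."""


TRIPLES = [
    (DOC, DC + "title", Literal("Metadata and Ontologies")),
    (DOC, DC + "creator", Literal("David Lamas")),
    (DOC, DC + "date", Literal("2011")),
]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, Literal):
            text = obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            rendered = '"%s"' % text
        else:
            rendered = "<%s>" % obj
        lines.append("<%s> <%s> %s ." % (subject, predicate, rendered))
    return "\n".join(lines)


def objects(triples, subject, predicate):
    """Answer the question: what does `subject` have for `predicate`?"""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(objects(TRIPLES, DOC, DC + "creator"))
```

Because subjects and predicates are web addresses, triples published by different parties can be merged into one graph, which is what the "web of (linked) data" refers to.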



Learning Object Metadata

Learning Object Metadata is a data model

Usually encoded in XML, it is used to describe learning objects and similar digital resources used to support learning.



Learning Object Metadata




As said in the beginning…

Metadata tends to get more complicated the longer we think about it

The lack of coherence and cohesion in current metadata efforts, both within standards and within communities, is a good example

And that is why we will next look into ontologies

So… do we care about metadata?

Why are we interested?




I guess the answer is yes, we care.

And yes, we are interested, because metadata is everywhere

Sometimes it is explicitly available,

Other times it is hidden or not so readily available, but anyway…

It would be foolish not to make use of it




Further, there is increasing pressure to expose metadata on the web for others to mash up, and this is especially true today in settings such as…

Education;

Research; and


And finally, metadata becomes paramount in scenarios where

content is data; or

the required information cannot easily be derived from content




One way of dealing with the lack of coherence and cohesion in current metadata efforts, both within standards and within communities, is to evolve to an ontology-based metadata approach

But what does this mean?



An ontology is a logical theory which gives an explicit partial account of a conceptualization

An intensional semantic structure which encodes the implicit rules constraining the structure of a piece of reality

In this light, the aim of an ontology is to define which primitives, provided with their associated semantics, are necessary for knowledge representation in a given context

Thomas R. Gruber (1993). Toward principles for the design of ontologies used for knowledge sharing. Originally in N. Guarino and R. Poli (Eds.), International Workshop on Formal Ontology, Padova, Italy. Revised August 1993. Published in International Journal of Human-Computer Studies, Volume 43, Issue 5-6, Nov./Dec. 1995, pages 907-928, special issue on the role of formal ontology in the information technology.




Ontologies are usually characterized by their…

Coverage

The extent to which the primitives mobilized by the perceived usage scenarios are covered by the ontology


The extent to which ontological primitives are precisely identified


The extent to which primitives are precisely and formally defined


The extent to which primitives are described in a formal language




And ontologies are not… taxonomies

But a taxonomy might be perceived as a specific case of an ontology

A taxonomy is a particular classification arranged in a hierarchical structure

Typically it is organized by supertype/subtype relationships, also called generalization/specialization relationships
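The supertype/subtype idea can be sketched in a few lines: each term points to its single supertype, and generalization is just a walk up the hierarchy. The terms below are an invented toy taxonomy, not taken from any standard.

```python
# Each term maps to its direct supertype (illustrative terms only).
SUPERTYPE = {
    "novel": "book",
    "textbook": "book",
    "book": "document",
    "web page": "document",
    "document": "resource",
}


def ancestors(term, supertype=SUPERTYPE):
    """Return the chain of supertypes of `term`, nearest first."""
    chain = []
    seen = {term}
    while term in supertype:
        term = supertype[term]
        if term in seen:
            raise ValueError("cycle in taxonomy at %r" % term)
        seen.add(term)
        chain.append(term)
    return chain


def is_a(term, candidate, supertype=SUPERTYPE):
    """True when `term` equals `candidate` or specializes it."""
    return term == candidate or candidate in ancestors(term, supertype)


if __name__ == "__main__":
    print(ancestors("novel"))
    print(is_a("novel", "document"))
    print(is_a("web page", "book"))
```

The only relationship a taxonomy like this can express is "is a kind of". An ontology may add other relationships and constraints, which is why the taxonomy is the special case.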



Why ontologies?

In short, we interpret, machines don’t

As such, an effort must be undertaken in order to support adequate usage of digital resources

So, what’s missing?

Among others…

The possibility to share a common understanding of the structure of information within a specific domain

The possibility to reuse domain knowledge

The possibility to make domain assumptions explicit

The possibility to analyze domain knowledge



Ontologies and the web

It is estimated that by 2010…

70% of public web pages will have some level of metadata, but only

20% will use more extensive semantic web approaches such as ontology-based metadata

But why should we care?




Ontologies and the web

An emerging ontological approach is OWL or…

Web Ontology Language

A vocabulary extension of the Resource Description Framework, which adds more vocabulary for describing characteristics of properties and classes or relations between classes



Web Ontology Language

OWL enables ontology-based information sharing and manipulation together with RDF and XML

In reverse order…

XML allows users to add arbitrary structure to their documents but says nothing about what such structures mean

RDF enables expression of meaning over XML (and other) structures

Using subject, verb and object triples

OWL enables machines to comprehend semantic documents and data
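What "comprehend" can mean in practice is that software derives statements nobody wrote down. The toy sketch below applies two property characteristics that OWL lets you declare, `owl:SymmetricProperty` and `owl:TransitiveProperty`, to a handful of triples. It is not an OWL reasoner, and the `ex:` names are invented for this example.

```python
RDF_TYPE = "rdf:type"
SYMMETRIC = "owl:SymmetricProperty"
TRANSITIVE = "owl:TransitiveProperty"


def infer(triples):
    """Add the triples implied by symmetric and transitive properties."""
    facts = set(triples)
    while True:
        symmetric = {s for s, p, o in facts if p == RDF_TYPE and o == SYMMETRIC}
        transitive = {s for s, p, o in facts if p == RDF_TYPE and o == TRANSITIVE}
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))  # a p b  implies  b p a
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))  # a p b and b p c  implies  a p c
        if new <= facts:  # nothing new was derived: fixed point reached
            return facts
        facts |= new


STATED = [
    ("ex:relatedTo", RDF_TYPE, SYMMETRIC),
    ("ex:partOf", RDF_TYPE, TRANSITIVE),
    ("ex:metadata", "ex:relatedTo", "ex:ontology"),
    ("ex:slide", "ex:partOf", "ex:lecture"),
    ("ex:lecture", "ex:partOf", "ex:course"),
]

if __name__ == "__main__":
    for triple in sorted(infer(STATED) - set(STATED)):
        print(triple)
```

XML alone could hold the same five statements, and RDF gives them a uniform triple shape. The OWL vocabulary is what licenses the extra conclusions.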



Web Ontology Language




This said, and while addressing some of the weaknesses of current metadata efforts, present-day ontologies still largely depend on explicit human intervention to be useful

And that is why we will next look into folksonomies



Folksonomies are mainly a bottom-up social classification system

A way to organize and share contents by tagging resources

Synonyms are…

Ethno-classification; and

Collaborative tagging




Folksonomies are created by users and have…

No structure

No fixed vocabulary

No explicit relationships between terms, and

No authority




Folksonomies also are…

Distributed, and

Collaboratively built and maintained

You can tag items owned by others

You can get instant feedback

All items for the same tag

All tags for the same item

You can adapt your tags to the group norm

But you are never forced
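The instant feedback described above (all items for a tag, all tags for an item) needs very little machinery. Here is a minimal in-memory sketch. The class name, its methods and the sample users and items are invented for this example.

```python
from collections import Counter, defaultdict


class Folksonomy:
    """Free-form tagging: any user may attach any tag to any item."""

    def __init__(self):
        self._assignments = set()          # (user, item, tag) triples
        self._by_tag = defaultdict(set)    # tag -> items
        self._by_item = defaultdict(set)   # item -> tags
        self._usage = Counter()            # tag -> number of assignments

    def tag(self, user, item, tag):
        """Record one tagging act; repeating the same act has no effect."""
        key = (user, item, tag)
        if key in self._assignments:
            return
        self._assignments.add(key)
        self._by_tag[tag].add(item)
        self._by_item[item].add(tag)
        self._usage[tag] += 1

    def items_for(self, tag):
        """All items for the same tag."""
        return sorted(self._by_tag.get(tag, ()))

    def tags_for(self, item):
        """All tags for the same item."""
        return sorted(self._by_item.get(item, ()))

    def popular_tags(self, n=3):
        """The group norm a user may choose to follow, or not."""
        return self._usage.most_common(n)


if __name__ == "__main__":
    f = Folksonomy()
    f.tag("ann", "photo1", "tallinn")
    f.tag("bob", "photo1", "Tallinn")   # no fixed vocabulary: a different tag
    f.tag("bob", "photo2", "tallinn")
    print(f.items_for("tallinn"))
    print(f.tags_for("photo1"))
    print(f.popular_tags(1))
```

Tags are stored exactly as typed, so "tallinn" and "Tallinn" stay separate: there is no structure, vocabulary or authority to reconcile them.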




Some of their apparent benefits are…

Being cheap and easy to build and use

Being able to adapt very quickly to changes and users’ needs

They scale well

Foster serendipity

Semantic browsing instead of searching

Lower the cooperation barriers




But they have limits such as…

Semantic ambiguity

Polysemy, synonymy, cardinality and the use of acronyms

Syntax free

Spaces and multiple words are used without rules


Different languages can be used for the same tag

Being eventually shortsighted

Fail to depict the general overview

Lack of (or minimal) structure

No explicit relationships between otherwise related tags



Folksonomies and ontologies



Folksonomies

Large corpus

Informal categories

Unstable entities

Unclear edges


Naïve cataloguers

No authority

Uncoordinated users

Amateur users

Critical mass needed



Ontologies

Small corpus

Formal categories

Stable entities

Restricted entities

Clear edges


Expert cataloguers

Authoritative sources of judgment

Coordinated users

Expert users



Folksonomies and ontologies

How do we choose?

Folksonomies are useful when all that is needed is the ability to link items to topics

Ontologies are useful when what is needed is to formally define meaning

But… do we need to choose?

Not really, at least that is what current research is exploring



Folksonomies and ontologies

Research directions include

The combination of the folksonomy and ontology approaches into a hybrid system where the most consensual constructs would last while others would be forgotten or redefined

An approach that combines the ease and adaptability of a folksonomy with the formality and semantic richness of an ontology

Quantitative tag analysis and qualitative use analysis in current online social networking services

To understand if tag usage converges or not

To understand how a folksonomy is formed

To… any ideas?


Semantic web


Semantic Web

The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help

One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well-defined meanings (in at least some terms) for its columns, the structure of the data is not evident to a robot browsing the web

Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine processable form.


Internet of things


The internet of things

The internet of things might be described as a self-configuring wireless network of sensors whose purpose would be to interconnect all things

And the concept is attributed to the former Auto-ID Center, founded in 1999 and based at the time at MIT

An alternative view focuses instead on making all things addressable by the existing naming protocols

In the current vision, objects themselves do not interact, but they may now be referred to by other agents, such as centralized servers acting for their human users



Metadata and Ontologies recap




The semantic web

The internet of things