

Metadata – The What and Why

Alexander König The Language Archive, MPI for Psycholinguistics

[email protected]

CLARIN NL: CMDI Tutorial 13th September 2012

www.clarin.eu The Shape of Metadata

Metadata is “transcendental”
Data about data: the ‘who, what, where and when’ of a document.
Structured data about data
In the computer age: machine-readable data about data
Metadata is data describing a (set of) (digital) resource(s)

www.clarin.eu Why Metadata?

Why should you create metadata?


You do not want your data to float around in cyberspace and get lost!


(Re)finding resources:
• Using free-text “key words” (Google-like search)
• Special search engines that exploit the structure of metadata
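The difference between the two kinds of search can be sketched in a few lines of Python. This is only an illustration: the records, field names, and both function names are invented for the example and do not come from any particular search engine or metadata model.

```python
# Invented sample records; the field names are assumptions for illustration.
records = [
    {"title": "Field recordings", "creator": "A. Researcher", "language": "English"},
    {"title": "An English grammar", "creator": "B. Linguist", "language": "Dutch"},
]


def free_text_search(items, query):
    """Google-like search: match the query anywhere in any field value."""
    q = query.lower()
    return [r for r in items if any(q in str(v).lower() for v in r.values())]


def structured_search(items, **criteria):
    """Structure-aware search: every named field must equal the given value."""
    return [r for r in items if all(r.get(k) == v for k, v in criteria.items())]


free_hits = free_text_search(records, "english")
structured_hits = structured_search(records, language="English")
```

Here the free-text query also returns the Dutch-language record, because the word happens to occur in its title, whereas the structured query returns only the record whose language field is “English”.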


• Those who don’t take care over their metadata are doorless.
• Content is expensive to create.
• Just what good is your content if the people who need to read it can’t find it?

www.clarin.eu The Shape of Metadata

Metadata doesn't need to be in any particular form per se, but...
...if you want computers to understand it and be more useful to humans, it should be structured
=> in a standardized fashion, using a metadata model


There are a lot of different metadata models:
• Dublin Core (DC)
• OLAC
• IMDI
• CMDI
• ...


Nowadays metadata is more widely used than you might think. You probably already deal with metadata on a daily basis without even thinking about it.


For Music (ID3)


For Photos (EXIF)


Some common metadata models in the linguistic domain are:
• Dublin Core (DC) / OLAC
• IMDI
• CMDI


Dublin Core (DC) Metadata Set

Content       Intellectual Property   Instance
Title         Creator                 Date
Subject       Publisher               Type
Description   Contributor             Format
Language      Rights                  Identifier
Relation
Coverage
Source


DC example
[Content]
DC.Title = “American Gods”
DC.Language = “English”
[IP]
DC.Creator = “Neil Gaiman”
[Instance]
DC.Format = “Hardcover”
DC.Date = “2001-06-19”
DC.Identifier = “0-380-97365-0”
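To show what “machine readable” could mean for this example, here is a minimal Python sketch that writes the same six values as XML elements in the Dublin Core element-set namespace. The values are taken from the slide; the wrapper element `record` and the function name `to_dc_xml` are assumptions made for illustration, not part of Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Field values copied from the slide's example.
record = {
    "title": "American Gods",
    "language": "English",
    "creator": "Neil Gaiman",
    "format": "Hardcover",
    "date": "2001-06-19",
    "identifier": "0-380-97365-0",
}


def to_dc_xml(fields):
    """Serialize a flat dict of DC fields as namespaced XML elements."""
    root = ET.Element("record")  # hypothetical wrapper element, not defined by DC
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_dc_xml(record)
```

Because every value sits in a named element, a program can read back, say, the creator without guessing where it appears in the text.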


OLAC extends DC with fields specific to linguistics


IMDI is a more complex model designed with multimodal resources in mind


• IMDI contains structural information (corpus files for building a tree)
• Resources are attached to the metadata (session file / bundle)
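The two points above can be pictured as a small tree: corpus nodes that contain other corpus nodes, and session nodes that carry the actual resources. The Python sketch below is only a rough model of that idea; the class names, field names, and sample file names are invented and do not reflect the actual IMDI schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """A session (bundle): metadata with resources attached to it."""
    name: str
    resources: List[str] = field(default_factory=list)


@dataclass
class Corpus:
    """A corpus node: structural information only, used to build the tree."""
    name: str
    subcorpora: List["Corpus"] = field(default_factory=list)
    sessions: List[Session] = field(default_factory=list)


def list_resources(corpus, path=()):
    """Walk the tree and return one path per attached resource."""
    here = path + (corpus.name,)
    found = []
    for session in corpus.sessions:
        for resource in session.resources:
            found.append("/".join(here + (session.name, resource)))
    for sub in corpus.subcorpora:
        found.extend(list_resources(sub, here))
    return found


# Invented sample tree for illustration.
root = Corpus(
    "ExampleCorpus",
    subcorpora=[
        Corpus(
            "RegionA",
            sessions=[Session("session01", ["audio.wav", "transcript.txt"])],
        )
    ],
)
paths = list_resources(root)
```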
