Multilingual & Semantic Interoperability in Cultural Heritage Information Systems

Vivien Petras

Berlin School of Library and Information Science

12 March 2013 W3C Multilingual Web Workshop

The Europeana Use Case



Contents

•  Europeana: Multilingual Collections & Users
•  Multilingual Interoperability
•  Semantic Enrichment
•  Preview: New Enrichment Plans
•  Playing with Europeana Data

Image: http://www.europeana.eu/portal/record/08535/D53FE7B7621E65A5E01E16E3D72785C68F2E2059.html

Europeana


•  15.2 million images
•  10 million texts
•  450,000 sound files
•  170,000 video files

•  > 2,200 institutions
•  > 30 countries

Europeana Multilingual Collections

German 18%

Multilingual 12%

French 11%

Dutch 10%

Swedish 9%

Spanish 8%

English 7%

Norwegian 6%

Polish 6%

Italian 6%

Finnish 3%

Danish 2%

Hungarian 1%

Slovenian 1%

→ Most Europeana objects are language-independent (e.g. images), but the metadata is multilingual.


Multilingual Europeana Users

•  Native language browser: 69%
•  Native language Google (entry point): 91%
•  Native language objects: 43% (SV 77%, DE 71%)

→ Native language use increases as soon as native language content increases.

Gäde, Maria (forthcoming). “User Behavior through the Language Glass” – Language-specific Behavior in Multilingual Digital Libraries.

Image: http://www.europeana.eu/resolve/record/9200105/AF5C65B3CC6A71CC0E4FF6FE5AAEB4CDAA1873C9

Multilingual Interface in 31 Languages

•  users seem to assume that search is affected


Query Result Filtering by Language

•  language of record vs. language of content
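The distinction between language of record and language of content can be sketched as follows. This is only an illustration: the `Record` class, its field names, and the sample records are hypothetical and do not reflect the actual Europeana schema or filter implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical fields, chosen for illustration only.
    title: str
    metadata_language: str           # language the descriptive record is written in
    content_language: Optional[str]  # language of the object itself; None for
                                     # language-independent objects such as images


def filter_by_language(records: List[Record], lang: str, by: str = "metadata") -> List[Record]:
    """Keep records whose metadata (or content) language equals `lang`."""
    if by == "metadata":
        return [r for r in records if r.metadata_language == lang]
    if by == "content":
        return [r for r in records if r.content_language == lang]
    raise ValueError("by must be 'metadata' or 'content'")


# Made-up sample data.
RECORDS = [
    Record("City view (photograph)", metadata_language="de", content_language=None),
    Record("Volume of poems", metadata_language="de", content_language="de"),
    Record("Letter", metadata_language="nl", content_language="fr"),
]

german_records = filter_by_language(RECORDS, "de", by="metadata")
german_content = filter_by_language(RECORDS, "de", by="content")
```

With this sample data, filtering on the record language returns the photograph as a German result although the object itself has no language, while filtering on the content language drops it.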


Document Translation

•  general MT – not domain-specific


Query Translation – Planned for 2013

•  How many languages?
•  How much user interaction?


Semantic Enrichment

•  Enrichment types and target vocabularies: concept (GEMET Thesaurus), agent (DBpedia), period (Semium time ontology), place (GeoNames)

Poisonous India…


Enrichment Challenges

•  Metadata quality & sparsity

•  Vocabulary ambiguity

–  domain: (German) Druck = print | pressure (GEMET)

–  language: Strom (German) = electrical power | strom (Czech) = tree

–  context: Córdoba = Spain | Argentina

Olensky, M., Stiller, J., Dröge, E. (2012). Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. In: Proc. of MTSR 2012: Metadata and Semantics Research Conference, Nov. 2012, Cádiz, Spain.
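The vocabulary ambiguity cases listed above can be illustrated with a small sketch. It assumes a toy vocabulary built from the slide's own examples; the data structure and the function names are hypothetical and are not the Europeana enrichment pipeline.

```python
# Toy vocabulary built from the examples on the slide (illustrative only).
VOCABULARY = [
    {"concept": "pressure", "label": "Druck", "lang": "de"},
    {"concept": "print", "label": "Druck", "lang": "de"},
    {"concept": "electrical power", "label": "Strom", "lang": "de"},
    {"concept": "tree", "label": "strom", "lang": "cs"},
]


def match_naive(term):
    """Case-insensitive label match that ignores the metadata language."""
    return sorted({e["concept"] for e in VOCABULARY
                   if e["label"].lower() == term.lower()})


def match_language_aware(term, lang):
    """Same match, restricted to labels in the language of the metadata."""
    return sorted({e["concept"] for e in VOCABULARY
                   if e["label"].lower() == term.lower() and e["lang"] == lang})


cross_language = match_naive("Strom")               # mixes German and Czech senses
czech_only = match_language_aware("strom", "cs")    # language ambiguity resolved
still_ambiguous = match_language_aware("Druck", "de")  # domain ambiguity remains
```

Restricting the match to the metadata language removes the German/Czech confusion, but "Druck" still maps to two concepts. Resolving that kind of ambiguity needs domain or context information, which this sketch does not attempt.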

Image: http://www.europeana.eu/portal/record/03919/FCD38BDE7A03579F24BEDA5D157943B75BB36F11.html

Preview: New Enrichment Plans


→ transition to linked data-based Europeana Data Model (EDM)

•  links to contextual vocabularies from providers
•  enrich during ingestion

Playing with Europeana Data

•  CHiC: Cultural Heritage in CLEF
→ Europeana data (XML) & queries / 13 languages
→ ad-hoc retrieval / semantic enrichment tasks
→ Submission deadline: 14 April 2013
→ http://www.culturalheritageevaluation.org

•  Europeana Linked Open Data
→ RDF file dumps in EDM (Europeana Data Model)
→ SPARQL endpoint
→ CC0 open license
→ http://data.europeana.eu/
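As a starting point for experimenting with a SPARQL endpoint, the sketch below builds a query for language-tagged titles and encodes it as a GET request URL following the standard SPARQL protocol (`query` parameter). The endpoint address is a placeholder, since the slides do not give the endpoint path, and the use of `dc:title` with language tags is an assumption about how the data is modelled.

```python
from urllib.parse import urlencode

# Placeholder address: the slides do not state the actual endpoint path.
ENDPOINT = "http://example.org/sparql"


def build_title_query(lang, limit=10):
    """Build a SPARQL query for titles tagged with the given language code.

    Assumes titles are stored as language-tagged dc:title literals.
    """
    if not lang.isalpha():
        raise ValueError("language code must be alphabetic, e.g. 'de'")
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT ?item ?title WHERE {\n"
        "  ?item dc:title ?title .\n"
        f'  FILTER (lang(?title) = "{lang}")\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = build_title_query("de", limit=5)
url = build_request_url(ENDPOINT, query)
```

The resulting URL can be fetched with any HTTP client once `ENDPOINT` is replaced by the real endpoint address; the sketch deliberately performs no network request.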

•  Contact: [email protected]

Image: http://www.europeana.eu/resolve/record/03486/DF559A7721E55BAE5BF5095FB9AA55406C0269C4