
Upload: europeana

Post on 11-May-2015


Page 1: WP3 Further specification of Functionality and Interoperability - Gradmann

WP3 Further specification of Functionality and Interoperability
Work Group 3.2 Semantic and Multilingual Aspects

Page 2: WP3 Further specification of Functionality and Interoperability - Gradmann

Issues for Work Group WG3.2: Some Principles

• Europeana surrogates need rich semantic context in (at least):
  • Place, Time, Persons, Abstract Concepts

• The graphs linking surrogates and semantic nodes need to be typed

• We will use linked data wherever possible instead of creating our own semantic nodes

• Source data and their context will be in all European languages (and potentially more!)

• Europeana users will wish to use all European languages (and potentially more!)
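The principle of typed graphs linking surrogates to semantic nodes can be illustrated with a minimal sketch. It assumes links are stored as plain (surrogate, relation, node) triples; the surrogate identifiers, relation names and example.org URIs are hypothetical and do not come from any Europeana specification.

```python
from collections import namedtuple

# A typed link: which surrogate, what kind of relation, which semantic node.
Link = namedtuple("Link", ["surrogate", "relation", "node"])

# Hypothetical data: ids, relation names and URIs are placeholders.
LINKS = [
    Link("surrogate:1", "about", "http://example.org/concept/navigation"),
    Link("surrogate:1", "created_in", "http://example.org/place/lisbon"),
    Link("surrogate:2", "about", "http://example.org/place/lisbon"),
]

def nodes_for(links, surrogate, relation):
    """Semantic nodes reachable from a surrogate via one relation type."""
    return [l.node for l in links
            if l.surrogate == surrogate and l.relation == relation]

def surrogates_for(links, node, relation=None):
    """Surrogates linked to a node, optionally restricted to one relation type."""
    return [l.surrogate for l in links
            if l.node == node and (relation is None or l.relation == relation)]
```

Because every link carries its type, a search for objects "about" Lisbon can be kept apart from a search for objects created there, which an untyped link could not express.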

Page 3: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2: Semantic Contextualisation and Multilingual Issues

Page 4: WP3 Further specification of Functionality and Interoperability - Gradmann

Issues for Work Group WG3.2: Semantic Contextualisation (1)

• What kind of functionality based on semantic technology do we actually want to enable (have a look at the thoughtlab and develop ideas from there)? Do we want to enable logical inferencing, for instance?

• What source data do we actually have (subject headings, classifications, thesauri) and how well are objects contextualised in source data?

• What kinds of semantic elements will we be able to produce from these via SKOSification or other automated procedures?

• Which linked data resources will we be using?
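As a rough illustration of what "SKOSification" of source vocabularies might involve, the sketch below turns flat thesaurus rows into SKOS-like concept records with per-language preferred labels and broader links. The row layout, the `skosify` function and the example.org base URI are assumptions made for illustration; a real conversion would emit actual SKOS RDF.

```python
BASE = "http://example.org/concept/"  # placeholder namespace, not a real one

def skosify(rows, base=BASE):
    """Build SKOS-like records from (concept_id, lang, label, broader_id) rows."""
    concepts = {}
    for concept_id, lang, label, broader_id in rows:
        record = concepts.setdefault(
            base + concept_id, {"prefLabel": {}, "broader": set()}
        )
        record["prefLabel"][lang] = label          # one preferred label per language
        if broader_id:
            record["broader"].add(base + broader_id)
    return concepts

# Hypothetical thesaurus extract.
ROWS = [
    ("c1", "en", "Painting", None),
    ("c1", "de", "Malerei", None),
    ("c2", "en", "Fresco", "c1"),
]
CONCEPTS = skosify(ROWS)
```

Each concept ends up with one URI, which is what allows surrogates from different providers to point at the same semantic node.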

Page 5: WP3 Further specification of Functionality and Interoperability - Gradmann

Issues for Work Group WG3.2: Semantic Contextualisation (2)

• To what extent will we be able to automatically contextualise surrogates in linking them to semantic nodes?

• What types of links between surrogates and nodes do we distinguish?

• What may providers expect to get back from us?

• What technology do we need for all this?
  • RDF? SKOS?? OWL???

• What input does Europeana.Connect (EuCo) WP1 expect from us and when?

• What do we expect back from EuCo WP1 and when?

• Any related projects? Results we can reuse??

Page 6: WP3 Further specification of Functionality and Interoperability - Gradmann

Issues for Work Group WG3.2: Multilingual Issues

• What is a realistic scope for multilingual functionality:
  • Query translation?
  • Result set translation??
  • More???

• Which languages will Europeana 1.0 support?

• What input does EuCo WP2 expect from us and when?

• What do we expect back from EuCo WP2 and when do we expect this?

• Any related projects? Results we can reuse??

Page 7: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2: Semantic and multilingual aspects

• Marco Berni
• Tobias Blanke
• Giuliana de Francesco
• Milena Dobreva
• Martin Doerr
• Zeki Mustafa Dogan
• Nicola Ferro
• Stefan Gradmann
• Antoine Isaac
• Walter Koch
• Stefanos Kollias
• Allison Kupietzky
• Dan Matei
• Hans Nederbragt
• Vivien Petras
• Anne Schiller
• Douglas Tudhope
• Vassilis Tzouvaras
• Dov Wiener

• Issues:
  • intended functionality
  • quality and semantic contextualization of object data
  • subject headings, thesauri, classification data available
  • which technologies to use
  • realistic scope for multilingual operations
  • related projects in area of multilinguality

• Office:
  • Sjoerd Siebinga
  • Go Sugimoto

Page 8: WP3 Further specification of Functionality and Interoperability - Gradmann

Today (02 April)

• Contextualisation of existing source data

• Contextual data available

• Functional Scope

• Linked data at our disposal

Page 9: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 02 April - 1

• Contextual data available
  • List of 84 different vocabularies
  • Some prominent ones such as LCSH, some of them in VIAF
  • Semantic areas: subjects, names, persons, material
  • Various delivery formats

Page 10: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 02 April - 2

• Contextualisation of existing source data
  • Geographic names used 50% -> 90%
  • Coordinates 6% -> 8%
  • Time
  • Subjects
  • Persons
  • Organisations

Page 11: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 02 April - 3

• Questions / suggestions:
  • Which resources are cross-domain?
  • Which ESE element is to be used?
  • Who will do the cleaning of metadata?
  • Why not store the metadata received as objects in their own right?
  • Minerva list of thesauri to be considered
  • Distinguish subject terms and classification of objects
  • Restrict structured operations to high level thesauri and do the rest based on lexical associations and the like
  • Ask providers to make their internal authorities available rather than trying to do map

Page 12: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 02 April - 4

• Functional Scope (1)
  • The surrogate model as presented in D2.5 doesn’t distinguish different types of relationships such as ‘about’ and ‘was present at’.
    • The point is valid for data organisation and for searching
  • Is a better model realistic for 1.0?
  • Can relation types be derived from the original attributes’ semantics?
  • Contextualisation pertaining to surrogate vs. context data pertaining to originating context
  • Granularity: complex objects
  • We need examples! -> Don Undeen: The Semantic Web in Practice
  • Separation of digital object, conceptual object (FRBRize the model)
  • Annotation: part of surrogate? When are these objects in their own right?

Page 13: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 02 April - 5

• Functional Scope (2)
  • Provenance! Diachronic dimensions should be better represented
  • Geographic data: DigMap (input from Milena)
  • Target audience is critical! User modelling!!
  • Reasoning: indirectly connected things
  • Related terms + related (functional) context
  • Flexibility of modelling is a requirement
  • -> Inferencing, some kind of reasoning, is needed, even if only for machine processing
  • Cost of processing time may be a critical issue in the design!
  • How to generalise properties to a small set of super-properties?
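The idea of generalising properties to a small set of super-properties can be sketched as follows. This is a minimal illustration, similar in spirit to sub-property reasoning in RDF Schema, and not a proposal for the actual Europeana model; the property names and the `SUPER` hierarchy are invented for the example, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical hierarchy: each specific property points to its super-property.
SUPER = {
    "depicts": "about",
    "about": "related_to",
    "was_present_at": "related_event",
    "related_event": "related_to",
}

def generalisations(prop):
    """Return the property followed by all of its super-properties."""
    chain = [prop]
    while chain[-1] in SUPER:        # terminates because SUPER has no cycles
        chain.append(SUPER[chain[-1]])
    return chain

def find(triples, super_prop):
    """Triples whose property equals or specialises the given super-property."""
    return [t for t in triples if super_prop in generalisations(t[1])]

# Hypothetical typed links between surrogates (s*) and semantic nodes (n*).
TRIPLES = [
    ("s1", "depicts", "n1"),
    ("s2", "about", "n2"),
    ("s3", "was_present_at", "n3"),
]
```

A query on ‘about’ then also retrieves the more specific ‘depicts’ link, while a query on the top-level ‘related_to’ retrieves everything. This is the kind of lightweight inferencing that can be precomputed for machine processing if query-time cost is a concern.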

Page 14: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April - 5

• Functional Scope (3)
  • Access by super-properties based on appropriate generalisations, follow data paths
  • Rosetta stone metaphor: Rosetta navigation
  • Domain specific ontologies mapped (or pruned) to more generic Europeana ontologies as part of OurEuropeana
  • Higher level terms (Europeana) + more granular terminology (user)
  • Generalisation, query expansion
  • Characterisation of collections (do we want these?) – or rather fonds (in archival speak), contextual groupings
    • Distinguish curatorial environments (with metadata pertaining to these) and virtual ‘collections’
  • Tree structure in archives: can we represent these in the surrogate structure, or do we model this in semantic contextualisation?

Page 15: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April - 5

• Functional Scope (4)
  • (Collections contd.): provider vs. user generated groupings
  • All ‘collections’ can be reduced to conceptual context (including ‘events’)
  • Questions – answers? Or just surrogate retrieval?? And if we provide answers: multilingually??

• Multilingual issues
  • Linguistic info pertaining to each attribute is a basic requirement – possible?
  • Query expansion + translation as scope + query formulation aids
  • Surrogate model doesn’t account for language, also regarding diachronic aspects
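The requirement that linguistic information be attached to each attribute could look roughly like the sketch below, in which every attribute value carries a language tag. The `Value` class, the attribute names and the sample record are assumptions for illustration; "und" is the standard language code for an undetermined language.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Value:
    text: str
    lang: str  # language tag such as "en" or "nl"; "und" when unknown

# Hypothetical surrogate with language-tagged attribute values.
SURROGATE = {
    "title": [Value("The Night Watch", "en"), Value("De Nachtwacht", "nl")],
    "creator": [Value("Rembrandt", "und")],
}

def values_in(surrogate, attribute, lang, fallback="und"):
    """Values of an attribute in the requested language, else untagged ones."""
    values = surrogate.get(attribute, [])
    exact = [v.text for v in values if v.lang == lang]
    return exact or [v.text for v in values if v.lang == fallback]
```

With tags on every value, display and query translation can select the right variant per attribute instead of guessing the language of a whole record.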

Page 16: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April - 5

• [Multilingual issues]
  • Architecture: language manager indicates query translation focus, but multilingual approach should be much more transversal
  • Check against lexica at ingest stage and normalise / enrich
  • Use of an interlingua of controlled terms – but consider out of vocabulary terms!
  • Use CACAO results: make recommendations rather than try to impose …
  • Resources in different languages (FRBRizing)
  • Use payloading in search context
  • Who will provide named entity resources, and which standards will we use in this respect?
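Query expansion and translation through an interlingua of controlled terms, with a fallback for out-of-vocabulary terms, might be sketched as follows. The tiny `LEXICON`, the concept identifiers and the `expand_query` function are illustrative assumptions and do not describe any existing Europeana or CACAO component.

```python
# Hypothetical lexicon: (language, surface term) -> controlled concept id.
LEXICON = {
    ("en", "painting"): "c:painting",
    ("de", "malerei"): "c:painting",
    ("fr", "peinture"): "c:painting",
    ("en", "fresco"): "c:fresco",
}

def expand_query(term, lang, target_langs):
    """Translate a query term via its concept; keep unknown terms unchanged."""
    concept = LEXICON.get((lang, term.lower()))
    if concept is None:
        # Out-of-vocabulary term: pass it through untranslated.
        return {lang: [term]}
    expansions = {}
    for (term_lang, surface), term_concept in LEXICON.items():
        if term_concept == concept and term_lang in target_langs:
            expansions.setdefault(term_lang, []).append(surface)
    return expansions
```

For example, `expand_query("Malerei", "de", {"en", "fr"})` yields the English and French terms mapped to the same concept, whereas an unknown term is returned as typed so that the search can still run on it.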

Page 17: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April - 5

• [Multilingual issues]
  • Distinguish properties that are important for multilingual operations from those that are not
  • WordNet use in ThoughtLab with English as pivotal language providing quick wins
  • Freely available resources are rare! UNESCO thesaurus available in some languages: CACAO list, TrebleCLEF, Placenames (European resource)
  • IMPACT uses lexica, some of which may be freely available -> Max Kaiser!
  • Political issues: who are the semantic/linguistic resource providers? Political authorities??
  • Last FP7 call (DL) …

Page 18: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April - 5

• [Multilingual issues]
  • CEN INNN
  • Talk to CLARIN for multilingual services
  • Contact FlareNET project
  • Eurovoc mapping involving Gemnet and others (Doug)
  • Aligning all these resources may be a non-trivial issue
  • Organise a seminar joining all projects
  • Whitepaper on multilingual issues as a starting point (Milena, Martin, CACAO,
  • CERL has produced a thesaurus
  • Subject terms and concepts are harder than place names and the like
  • Problem of differing standards

Page 19: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April - 5

• [Multilingual issues]
  • Whitepaper on multilingual services provided to Europeana as a starting point (Milena, Martin, CACAO, Vivien, Sjoerd, Nicola) until June using the ROSE wiki
  • -> Seminar adjacent to the September meeting
  • Technology watch

Page 20: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April - 6

• Linked data at our disposal (quite restricted)
  • Link at ingestion and updating time rather than dynamically in query context (-> use a Europeana cache for pointers -> surrogate model?)
  • DBPedia (pivotal resource for multilingual operations!)
  • Language repository
  • Geonames
  • LCSH
  • Rameau (use MACS and CrissCross to provide mappings)
  • VIAF
  • ETB
  • But: Metadata provided will contain links to other resources, and typically not URIs
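Linking at ingestion time with a local cache of pointers, rather than resolving links dynamically at query time, could be sketched as below. The `lookup` function is a stand-in for a call to an external linked data resource; it, the `GAZETTEER` table and the example.org URI are hypothetical.

```python
def make_linker(lookup):
    """Wrap a lookup function with a cache so each label is resolved once."""
    cache = {}

    def link(label):
        key = label.strip().lower()       # normalise before resolving
        if key not in cache:
            cache[key] = lookup(key)      # None is cached too: known miss
        return cache[key]

    return link

def ingest(records, link):
    """Return copies of the records enriched with a pointer to a place node."""
    return [dict(r, place_uri=link(r["place"])) for r in records]

# Stand-in for an external resource; records every call it receives.
GAZETTEER = {"florence": "http://example.org/place/florence"}
CALLS = []

def lookup(key):
    CALLS.append(key)
    return GAZETTEER.get(key)

ENRICHED = ingest(
    [
        {"id": "s1", "place": "Florence"},
        {"id": "s2", "place": " florence "},
        {"id": "s3", "place": "Atlantis"},
    ],
    make_linker(lookup),
)
```

The external resource is consulted once per distinct label, and the stored pointer travels with the surrogate, so queries never depend on the availability of the remote service. Unresolved labels are kept with an empty pointer, which matches the observation that provider metadata typically contains strings rather than URIs.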

Page 21: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April - 7

• Typing relations ...! Including language tags again

Page 22: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April – Conclusion (1)

• Semantics: Rosetta Stone metaphor with two types of functionality

  • Context of surrogates
  • Contextual groupings
  • Open: typing relations

Page 23: WP3 Further specification of Functionality and Interoperability - Gradmann

WG3.2 03 April – Conclusion (2)

• Multilingual Issues
  • Linguistic info pertaining to each attribute is a basic requirement – possible?
  • Surrogate model doesn’t account for language, also regarding diachronic aspects
  • Scope: Query expansion + translation + query formulation aids
  • Whitepaper on multilingual services provided to Europeana as a starting point (Milena, Martin, CACAO, Vivien, Sjoerd, Nicola) until June using the ROSE wiki
  • -> Seminar bringing together all initiatives and projects adjacent to the September meeting