Platform Interoperability Guidelines

March 16, 2017

Deliverable Code: D5.5

Version: 1.0 – Final
Dissemination level: Public

First version of the guidelines for infrastructure interoperability structured into sets that target the stakeholder groups (providers of content and software resources)

H2020-EINFRA-2014-2015 / H2020-EINFRA-2014-2
Topic: EINFRA-1-2014 Managing, preserving and computing with big research data
Research & Innovation action
Grant Agreement 654021

Platform Interoperability Guidelines

∙ ∙ ∙

Public Page 1 of 30

Document Description

D5.5 – Platform Interoperability Guidelines
WP5 – Interoperability Framework
WP participating organizations: ARC, USFD, UNIMAN, AK, UoG, GRNET
Contractual Delivery Date: 9/2016
Actual Delivery Date: 3/2017
Nature: Report
Version: 1.0 Final
Public Deliverable

Preparation slip

From: Penny Labropoulou (ARC), Dimitris Galanis (ARC), Angus Roberts (USFD), Matt Shardlow (UNIMAN), Giulia Dore (UoG), Thomas Margoni (UoG), Byron Georgantopoulos (GRNET), Panagiotis Zervas (AK), Pythagoras Karampiperis (AK), Richard Eckart de Castilho (UKP-TUDA); 21/02/2017
Edited by: Penny Labropoulou (ARC); 16/03/2017
Reviewed by: Vangelis Floros (GRNET), Christian O'Reilly (EPFL), Mappet Walker (FRONTIERS), Lucas Anastasiou (OU); 07/03/2017
Approved by: Androniki Pavlidou (ARC); 16/03/2017
For delivery: Mike Hatzopoulos (ARC); 21/03/2017

Document change record

Issue; Item; Reason for change; Author (Organization)
V0.1; Draft version; Initial version sent for comments; Penny Labropoulou (ARC)
V0.2; Draft version; Version sent to internal reviewers; Penny Labropoulou (ARC)
V0.3; Draft version; Version from internal reviewers; Vangelis Floros (GRNET), Christian O'Reilly (EPFL), Mappet Walker (FRONTIERS)
V0.4; Draft version; Version sent to internal reviewers (second round); Penny Labropoulou (ARC)
V0.5; Draft version; Versions from internal reviewers; Vangelis Floros (GRNET), Christian O'Reilly (EPFL), Mappet Walker (FRONTIERS), Lucas Anastasiou (OU)
V0.9; Pre-final version; Version incorporating the internal reviewers' comments, pending final approval; Penny Labropoulou (ARC)
V1.0; Final version; Final version, incorporating all comments; Penny Labropoulou (ARC)

Table of Contents

1. Introduction
2. The OpenMinTeD platform
3. Target audience
4. Background and methodology of work
5. The OMTD-SHARE metadata schema
6. Structure of the guidelines
Appendix A - References
Appendix B - Acknowledgements & Contributors
Appendix C - Guidelines

Table of Figures

Figure 1. Overview of the OMTD-SHARE data model

Disclaimer

This document contains a description of the OpenMinTeD project findings, work and products. Certain parts of it might be under partner Intellectual Property Right (IPR) rules, so please contact the consortium head for approval prior to using its content.

In case you believe that this document harms in any way IPR held by you as a person or as a representative of an entity, please do notify us immediately.

The authors of this document have taken every available measure to ensure that its content is accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document accept any responsibility for consequences that might arise from using its content.

This publication has been produced with the assistance of the European Union. The content of this publication is the sole responsibility of the OpenMinTeD consortium and can in no way be taken to reflect the views of the European Union.

The European Union is established in accordance with the Treaty on European Union (Maastricht). There are currently 28 Member States of the Union. It is based on the European Communities and the Member States' cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice and the Court of Auditors. (http://europa.eu.int/)

OpenMinTeD is a project funded by the European Union (Grant Agreement No 654021).

Acronyms

API: Application Programming Interface
LR: Language Resource
NLP: Natural Language Processing
ML: Machine Learning
OA: Open Access
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting
OKFN: Open Knowledge Foundation
OMTD: OpenMinTeD
OWL: Web Ontology Language
PDF: Portable Document Format
RDF: Resource Description Framework
REST: Representational State Transfer
RI: Research Infrastructure
SKOS: Simple Knowledge Organization System
SOAP: Simple Object Access Protocol
TDM: Text and Data Mining
VM: Virtual Machine
WP: Workpackage
XML: Extensible Markup Language
XSD: XML Schema Definition

Glossary

annotation (text/corpus annotation): A note by way of explanation or comment added to a text or diagram [OED, https://en.oxforddictionaries.com/definition/annotation]. In OpenMinTeD, the term refers mainly to text or corpus annotation, which is the practice of adding interpretative linguistic information grounded in a knowledge resource to a text or corpus respectively. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which lexical units in a text belong; these tags come from a predefined set (e.g. Noun, Verb, Preposition, etc.). Semantic labeling with terms and concepts from an ontology is another common example of annotation. Relationships such as syntactic dependencies or semantic relations that link entities of the text are also annotations.

annotation resource: Any resource that can be used for annotating a text, including part-of-speech tagsets, annotation schemes, domain-specific ontologies, etc.

annotation scheme: A set of elements and values designed to annotate data. An annotation scheme usually aims to represent a specific level of information, such as morphological features of words, syntactic dependency relations between phrases, discourse level information, etc. It can consist of a flat structure of elements and values (e.g. part-of-speech tags) or it can be more complex with interrelated elements (e.g. specific morphological features to be used for each part-of-speech).

application: Any software program (or group of programs seen as a whole) intended for the end-user and addressing one or multiple related user needs.

component (software component): An algorithm wrapped in a standard way so that it can be integrated as a reusable tool or service within a particular component-oriented framework such as UIMA, GATE, etc.

corpus: A structured collection of pieces of data (textual, audio, video, multimodal/multimedia, etc.) typically of considerable size and selected according to criteria external to these data (e.g. size, type of language, type of producers or expected audience, etc.) to represent as comprehensively as possible the object of study.

data model: A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities. [Wikipedia, https://en.wikipedia.org/wiki/Data_model]

distribution: Any form by which a resource can be shared; it can be a downloadable PDF or a plain text file, a form of a corpus accessible only through a web interface, or the source code of a piece of software, etc.

document: A piece of written, printed, or electronic matter that is primarily intended for reading.

interoperability: Interoperability describes the extent to which systems and devices can work together, exchange data, and interpret that shared data. For two systems to be interoperable, they must be able to exchange data and subsequently present that data such that it can be understood by a user. [Research Data Alliance, http://smw-rda.esc.rzg.mpg.de/index.php/Interoperability]

licence: A permission, or written evidence of a permission, that confers on the licensee the right to do something that would otherwise be prevented by the law.

licence compatibility/interoperability: The condition or state in which two or more licences can co-exist or be combined without conflicting with each other. In OpenMinTeD, licence compatibility and licence interoperability are used as synonyms.

knowledge resource: A resource (data and/or tool) containing, producing or representing knowledge; knowledge is specific information that is relevant for the linguistic and conceptual interpretation of data. For OpenMinTeD purposes, this information is exploited or produced by TDM modules and tools, or exchanged between them.

language description: The resource describes a language or some aspect(s) of a language via a systematic documentation of linguistic structures. [Open Language Archives Community, http://www.language-archives.org/REC/type.html#language_description] Examples include sketch grammar, computational grammar, etc.

language resource: Language Resources (LRs) encompass (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, used to assist and augment language processing applications, but also, in a broader sense, in language and language-mediated research studies and applications, and (b) tools/technologies/services used for their processing.

lexical/conceptual resource: A resource organised on the basis of lexical or conceptual entries (lexical items, terms, concepts, etc.) with their supplementary information (e.g. grammatical, semantic, statistical information, etc.). In OpenMinTeD, they can be used for annotation purposes.

machine learning (ML) model: The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process. [http://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html]

metadata: Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. [National Information Standards Organization, Understanding metadata, http://www.niso.org/publications/press/UnderstandingMetadata.pdf]

open access (OA): The free and online availability of literature, which allows users to read, download, copy, distribute, print, search, or link to the full text, crawl articles for indexing, pass them as data to software, or use them for any other useful purpose. This availability is granted without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself, and those related to giving authors control over the integrity of their work and the right to be properly acknowledged and cited [Budapest OA Initiative 2002; Bethesda Statement on OA Publishing 2003; Berlin Declaration on OA Knowledge in Science and Humanities 2003]

OpenMinTeD infrastructure: An infrastructure refers to the basic structures and facilities required for the operation of a system. The OpenMinTeD infrastructure consists of different layers of resources: content resources that can be mined, ancillary knowledge resources, tools and web services. Any resource that can be registered in the OpenMinTeD registry is part of the underlying infrastructure.

OpenMinTeD platform: The OpenMinTeD platform brings together all the services that facilitate the interoperability aspects of the underlying infrastructure (e.g. registration, search and browsing, creation of workflows, processing, annotation, etc.) and, thus, becomes an infrastructural service of the wider research ecosystem.

publication: A book, article, etc., that has been made available to the public either via a formal publication service or over the internet and is stored at an archive or repository. For OpenMinTeD purposes, this mainly covers scholarly publications.

resource: Something that you can use to help you to achieve something, especially in your work or study. [MacMillan dictionary, http://www.macmillandictionary.com/dictionary/british/resource_1]

rights statement: Formal or official statement asserting the copyright status and/or the licensing conditions for a given resource. It can be issued by an authoritative body (e.g. http://rightsstatements.org/). For OpenMinTeD purposes, it can be deemed similar to a "licence category", grouping licences that share similar features.

Text and Data Mining: Text and Data Mining (TDM) was initially defined as “the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings” (Hearst, 1999), in other words, “an exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known” (Hearst, 1999). [FutureTDM, http://www.futuretdm.eu/news/tdm-definition/]

service / web service: A piece of software accessible through remote invocation, typically using REST-style APIs or the SOAP protocol.

tool: A piece of (standalone) software, typically for a very limited technical purpose, such as a particular implementation of a part-of-speech tagger (e.g. TreeTagger), a tree parsing program (e.g. mstparser), etc. Preferred terms in OpenMinTeD include 'component' and 'workflow'.

workflow: A series of software components assembled together in order to perform a specific task.

Publishable Summary

The current deliverable brings together the guidelines that interested stakeholders must follow in order to be compatible with OpenMinTeD interoperability specifications.

The guidelines aim to present in a user-friendly way the specifications set for enabling interoperability between content and software resources, especially in the framework of the OpenMinTeD platform. They are, therefore, based on input from

● D5.2 and its updated version D5.3 - Interoperability Requirements Reports (in progress), which include the interoperability specifications set for OpenMinTeD,

● D6.1 - Platform Architectural Specification that describes the architecture and functions of the OpenMinTeD platform, and

● the data model adopted by OpenMinTeD for describing resources involved in TDM and implemented in the OMTD-SHARE metadata schema.

The deliverable presents the work and methodology according to which the guidelines have been created, while the actual guidelines are annexed to this report and published online at https://guidelines.openminted.eu.

Four guidelines have been created, targeting respectively the providers of publications, corpora, ancillary knowledge resources and TDM software resources. The specifications determine technical (e.g. data representation formats, transfer protocols), legal and documentation (metadata) issues. Two levels of compliance are foreseen, corresponding to mandatory and recommended specifications, allowing for a gradual adoption by stakeholder groups.

Public review will be solicited from the stakeholder groups and their comments, together with additional requirements from the ongoing work on the project, will be taken into account for the next version of the guidelines (D5.6).

1. Introduction

OpenMinTeD enables the creation of an infrastructure that fosters and facilitates the use of text and data mining technologies in the scientific publications world and beyond, by both application domain users (i.e., scientists, technicians, etc.) and text mining experts. OpenMinTeD builds upon existing tools and text mining platforms. It aims at rendering them discoverable through the OpenMinTeD registry, and interoperable through the interoperability layer, also based on existing and emerging standards and best practices.

The current deliverable puts together the guidelines that interested parties must follow in order to be compatible with OpenMinTeD interoperability specifications. To better serve the needs of the target stakeholder groups and the peculiarities of each resource type, separate guidelines are available per resource type and provider group. Thus, the deliverable is structured as follows:

● a short presentation of the OpenMinTeD platform and the objectives it serves,
● a short presentation of the audience targeted by the guidelines,
● the background and methodology of the work,
● a synopsis of the OMTD-SHARE metadata schema, which is used for the documentation of all resources in OpenMinTeD, and the data model it supports.

The guidelines themselves are presented in Appendix C, while an online version is available at: https://guidelines.openminted.eu. Given that the project is still in progress, there will be two new releases in the next twelve months, taking into account stakeholders' feedback and additional specifications coming from the project; backwards compatibility of the new versions will be a priority and, where needed, conversion tools to the new version will be made available.

2. The OpenMinTeD platform

TDM involves a wide range of resource types:

● the content resources to be mined (scholarly publications in the OpenMinTeD project),
● the text mining software and
● ancillary knowledge resources used for the operation of the software (e.g. annotation schemas, linguistic tagsets, lexical or ontological resources used for annotating the resources to be mined, machine learning models, annotated textual corpora).

The OpenMinTeD platform [1] integrates these resources and supports their interaction through appropriate services:

● a Registry service for storing, browsing, downloading, searching and managing the various resources, which will be registered in OpenMinTeD by using a set of specifications/protocols (e.g. OAI-PMH [https://www.openarchives.org/pmh/], Maven [https://maven.apache.org/]) and documented with high-quality metadata;

● the Workflow Editor service of the platform to guide users (via an appropriate User Interface) in creating interoperable workflows of TDM components, which will be executed by the Workflow Execution service in a cloud infrastructure (or on a local machine);

● the Annotation Editor service to allow users to annotate the publications (texts) in order to create datasets that can be used in workflows, e.g. for evaluation purposes.
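As a rough illustration of one of the protocols named above, the following sketch builds an OAI-PMH harvesting request. The verb ListRecords and the arguments metadataPrefix, set and from are defined by the OAI-PMH specification; the base URL and the function name build_list_records_url are hypothetical and are not taken from the OpenMinTeD platform.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai"


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix selects the metadata format (oai_dc is the Dublin Core
    format that every OAI-PMH repository must support); set_spec and
    from_date optionally restrict the harvest.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Request all records added or changed since 1 January 2017.
print(build_list_records_url(BASE_URL, from_date="2017-01-01"))
```

A harvester would fetch this URL, parse the XML response and repeat the request with the returned resumption token until the list is complete; that part is omitted here.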

The OpenMinTeD platform was designed and is being implemented as a facilitator of TDM in an ecosystem of e-infrastructures and repositories, collecting, transforming and making available resources only as needed for TDM purposes. In other words, it is not one more registry of content and services, and it does not seek to collect and provide information about resources that might be of interest to TDM stakeholders. Resources are uploaded and stored only as required for processing. Thus, for instance, knowledge resources can be registered at the OpenMinTeD registry and continue to reside at locations outside the platform, only to be accessed at the time of processing. Publications, on the other hand, are harvested and stored locally at OpenMinTeD storage facilities to meet processing requirements and improve processing time.

Resources are to be registered into OpenMinTeD only if they can be accessed and deployed in the context of a TDM processing operation.

For this reason, it is imperative that

● the resource itself can be accessed in a single-step process and in a transparent way through the OpenMinTeD mechanisms;

● the resource is properly described with the metadata schema adopted by OpenMinTeD, i.e. the OMTD-SHARE schema (see section 5), at least at the minimal level, ensuring that it can be discovered through the search-or-browse interface by the platform users, and that it can be instantiated when required by the software components at the time of execution of a workflow;

● the resource is in a form that can be exploited as is in the OpenMinTeD context (or can be easily transformed into one of the OpenMinTeD acceptable forms through one of the conversion tools included in the platform);

● the resource adheres to the specifications set by OpenMinTeD (at least at the minimal level) that seek to achieve interoperability among all resources, as described in the guidelines.

[1] For a full description of the platform, see D6.1 - Platform Architectural Specification.
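The four registration conditions listed above can be read as a checklist. The sketch below is only an illustration of that reading: the class CandidateResource, its field names and the function unmet_conditions are hypothetical and do not correspond to any OpenMinTeD API.

```python
from dataclasses import dataclass


@dataclass
class CandidateResource:
    """Hypothetical summary of a resource proposed for registration."""
    name: str
    accessible_in_one_step: bool  # reachable directly through the platform mechanisms
    has_minimal_metadata: bool    # described with OMTD-SHARE at least at the minimal level
    usable_or_convertible: bool   # usable as is, or convertible with a platform tool
    meets_minimal_specs: bool     # adheres to the minimal interoperability specifications


def unmet_conditions(resource):
    """Return the labels of the conditions the resource does not satisfy."""
    checks = {
        "single-step access": resource.accessible_in_one_step,
        "minimal metadata": resource.has_minimal_metadata,
        "usable or convertible form": resource.usable_or_convertible,
        "minimal specifications": resource.meets_minimal_specs,
    }
    return [label for label, satisfied in checks.items() if not satisfied]


# Example: a lexicon that still lacks a minimal metadata description.
lexicon = CandidateResource("toy lexicon", True, False, True, True)
print(unmet_conditions(lexicon))
```

A resource would be eligible for registration only when the returned list is empty.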

The resources will be registered into OpenMinTeD by trustworthy sources, i.e. registered individuals. Bilateral agreements with repositories, infrastructures and other registries containing useful resources will also be made to facilitate this process.

In addition, new resources created using the OpenMinTeD toolbox and resources (i.e. corpora built by users by selecting scholarly publications, workflows created by TDM developers with components registered in OpenMinTeD, and outputs of running TDM tools and services in the platform) are also registered, stored and made available to the end-users through OpenMinTeD [2] and must follow the above principles. The descriptions of new resources are produced semi-automatically, based on information from the resources used in their composition, and can be edited and enriched by users.

Providers of resources interact with the OpenMinTeD Registry service through a specially designed interface, guiding them through the process of registration (uploading resources and their descriptions). All users can browse resources through the catalogue, select a specific resource and view its detailed description; moreover, resources are fed internally through the system into the Workflow and Annotation services, where they are presented to expert users for further operations.

[2] There is an ongoing discussion on the archiving and distribution of the output resources; more information will be made available when decisions are reached on this issue.

3. Target audience

OpenMinTeD targets the following groups:

● End users as consumers of the e-Infrastructure, who are further divided into:
  o Domain-specific researchers and research communities (e.g. research labs around the world): Users who are not knowledgeable about TDM and who want to find end-to-end applications (e.g. web services) that fulfill their needs off the shelf.
  o Application developers / Research e-Infrastructures data scientists: People who understand the basic usage of NLP and TDM services, but not the (algorithmic) details. They are aware of the research community needs, limitations and goals. They know how to connect and configure components, and which content they must use to get the required results. They need to develop end-to-end applications.
  o e-Infrastructure operators: Users agnostic to the internal specifics of TDM, but who need to integrate and operate TDM services into daily workflows which serve their constituency; the group includes, for instance, researchers of an RI, of a national e-Infrastructure or of a research institution.
● Contributors of content and software resources:
  o For content to be mined (scholarly publications), a potentially wide group of stakeholders can be envisaged; in the current phase, the focus is on publishers and repository managers (research libraries).
  o For TDM software resources, two subgroups are identified:
    ▪ A well-established community of language technology experts, who use specific technologies and frameworks (e.g. UIMA, GATE) to develop and enhance their software, which can be used for TDM purposes. Examples of software include Named Entity Recognizers and Term Extractors that incorporate grammatical taggers and parsers.
    ▪ Non-NLP expert developers, who create TDM modules based on off-the-shelf libraries and tools (e.g. Python NLTK [3], Tidytext [4], Scikit-learn [5], Gensim [6], OKFN's relevant initiative ContentMine [7]). They are not familiar with NLP frameworks and terminology but are eager to publish their TDM software.
  o For ancillary resources, contributions are expected from two main sources:
    ▪ The TDM software developers (see above), who usually bundle the required resources in their software but also make them available as separate entities; this includes, for instance, ML models that come together with the software that uses them but may also be distributed separately and, thus, re-used with other software.
    ▪ Language resources developers (e.g. terminologists, lexicographers, NLP experts producing annotation resources) and members of the various domain communities that already use resources such as ontologies, terminological lexica, thesauri etc. in their work. For this phase, the focus is on the communities targeted by the OpenMinTeD use cases, i.e. research analytics, life sciences, agriculture & biodiversity, social sciences.

[3] https://pypi.python.org/pypi/nltk
[4] https://cran.r-project.org/web/packages/tidytext/index.html
[5] http://scikit-learn.org/stable/
[6] https://radimrehurek.com/gensim/
[7] http://contentmine.org/

The guidelines, at the present stage, target only the second group, i.e. contributors of resources. They supply instructions and advice on the registration and uploading process, as well as on the proper packaging and documentation of the resources required for importing resources into the OpenMinTeD platform. They also provide recommendations on technical features and properties that contribute to interoperability.

It should be noted, though, that the needs, expertise, habits and expectations of the first group have also influenced the descriptive schema of the resources as well as the functionalities and services supported by the platform. In addition, to further assist the end-users, the creation of guidelines targeting them, with examples and suggested pathways on the use of the OpenMinTeD platform, will be investigated during the second phase of the project.

4. Background and methodology of work

The guidelines provide instructions on how to prepare, package and add new resources using the Registry interface. Their production has been based mainly on the interoperability specifications (WP5), taking into account the overall OpenMinTeD architecture and the platform implementation (WP6) as well as the user requirements (cf. Appendix B for acknowledgements).

The four working groups participating in WP5 have set a number of abstract requirements, which are described in D5.2 - Interoperability Standards and Specifications Report. The requirements specify ways of assessing and improving interoperability between content resources and software components involved in TDM operations. The next step in this endeavour has been the formulation of concrete requirements (which will be included in the updated version of this deliverable, i.e. D5.3 - Interoperability Standards and Specifications Report (2nd edition)) recommending specific implementation strategies, techniques and features that ensure interoperability as envisaged in OpenMinTeD. These requirements have fed and will continue to feed the Guidelines, given that this work is still ongoing and that updated versions will be released during the subsequent phases of the project.

An important instrument designed to support interoperability in OpenMinTeD is the OMTD-SHARE metadata schema, which is used for the description of the resources (see next section). The Guidelines include separate sections on the use of the metadata schema for each resource type, focusing in the first phase on the minimal level, which includes mandatory and strongly recommended elements. In the next release of the Guidelines, we will also include full documentation with examples for all resource types, FAQs and tips/advice. Given the size and complexity of the schema, we have decided to adopt this stepwise process in order to have a first testbed regarding the user-friendliness of the guidelines, and then build upon them following recommendations from the stakeholders.
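To illustrate the distinction between mandatory and recommended elements, the following sketch checks a metadata record against two sets of element names. The element names are placeholders chosen for illustration and are not actual OMTD-SHARE elements; the function compliance_report is likewise hypothetical.

```python
# Placeholder element names; the real elements are defined in the OMTD-SHARE schema.
MANDATORY = {"name", "description", "licence"}
RECOMMENDED = {"version", "contact"}


def compliance_report(record):
    """Report which mandatory and recommended elements a record lacks."""
    # An element counts as present only if it has a non-empty value.
    present = {key for key, value in record.items() if value}
    missing_mandatory = sorted(MANDATORY - present)
    missing_recommended = sorted(RECOMMENDED - present)
    return {
        "mandatory_complete": not missing_mandatory,
        "missing_mandatory": missing_mandatory,
        "missing_recommended": missing_recommended,
    }


# Example: the licence element is present but empty, so it counts as missing.
record = {"name": "Example corpus", "description": "A toy record", "licence": ""}
print(compliance_report(record))
```

Reporting the two groups separately mirrors the gradual adoption the guidelines aim for: a provider can first fill in the mandatory elements and complete the recommended ones later.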

Additional input for the Guidelines will come from discussions on policies regarding the registration of providers and resources in the platform. Key issues include:

● the interaction with other infrastructures, data and software repositories, in order to manually or automatically harvest all or selected resources from them,

● the involvement of organizations vs. individuals in the process of registering and uploading resources,

● the criteria for accepting resources,

● the criteria for assigning user privileges.

The structure and the content of the Guidelines reflect these decisions by addressing issues related to specific types of providers in specific sections.

The current version is limited to the resource providers listed hereafter, but next releases will broaden this scope to cover additional stakeholders (e.g. the non-NLP expert software developers). More specifically:

● for content resources (scholarly publications), we expect to get input through big aggregators, i.e. OpenAIRE and CORE, who are aggregating open access content from various sources, such as repositories, publishers, journals etc. To further support the task of data collection, a connector is implemented in OpenMinTeD targeting specifically content from traditional publishers of open access publications.

● for software resources, we expect input mainly from the Consortium partners, collected through software repositories (e.g. Maven Central), but also through MetaShare that hosts resources intended for Language Technology development. In both cases, these belong mainly to the expert language technology oriented communities of developers;

● for ancillary resources, such as lexica, ontologies, ML models etc., we expect input from (a) the TDM software developers, who are wrapping ancillary resources (especially typesystems, models and tagsets) with their software modules; in the first phase, again we are focusing on the Consortium partners; (b) developers of language resources, who are describing and storing their resources in repositories intended for that purpose, such as MetaShare, and/or in discipline repositories (especially as regards terminologies and ontologies); the main focus will be on the disciplines targeted by the WP9 use cases.

5. The OMTD-SHARE metadata schema

The OMTD-SHARE metadata schema [8] is the recommended schema for the description of the resources. It has been designed in order to support interoperability between the various resources used in TDM processes. This interoperability is achieved by homogenising descriptions of TDM resources from the different scientific communities using a common core vocabulary, which is linked to pre-existing domain-specific vocabularies. Standards and best practices of the source communities are integrated whenever possible. The main principles and strategies employed in the design of the OMTD-SHARE schema consist of the following:

● cover needs of resource discoverability and TDM processing
● cover documentation needs of all resource types involved in TDM
● be flexible enough to support varying degrees of documentation completeness
● organize the schema elements and accommodate common vs. particular features of resources
● reuse what is available vs. create new elements and values
● normalize user input vs. allow for free user input
● document processing procedure and outputs.

[8] The full OMTD-SHARE schema is documented at: https://openminted.github.io/releases/omtd-share/.

It has largely been based on the META-SHARE metadata schema⁹ [Gavrilidou et al. 2012], which caters for the description of language resources, encompassing both data (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) and technologies (tools/services) used for their processing. The OMTD-SHARE schema is more restricted in the sense that it focuses on text resources only, while it also extends the basic schema in order to include TDM-specific concepts, and enhances the description of processing procedures and workflows.

As in META-SHARE, the schema documents the full lifecycle of a resource, including at least a minimal documentation of its satellite entities (see Figure 1), especially their interrelations. The OMTD-SHARE data model thus comprises the following entities:

● the resources, further classified into:
  ○ corpora, i.e. datasets of text documents, mainly scholarly publications in OMTD-SHARE
  ○ lexical/conceptual resources, including lexica, ontologies, term lists, gazetteers, etc., but also tagsets and annotation schemas, which are used for annotating corpora
  ○ language descriptions, which mainly refer to computational grammars
  ○ machine learning and statistical models¹⁰
  ○ software components, pieces of software, tools offered as locally executable code or as web services, wrapped in a workflow or as standalone end-to-end applications, and, finally,
  ○ publications, which constitute a peculiar resource type, as they are viewed in OpenMinTeD only in a collective form, as a "corpus",
● the satellite entities, such as actors, be it persons or organizations that have created the resources, or the projects using or funding them.

⁹ http://metashare.ilsp.gr/knowledgebase/homePage
¹⁰ Models could be considered a subtype of language descriptions, but we decided to keep them distinct because they have many properties that differentiate them from grammars; keeping them apart was also considered better for their discoverability.

Figure 1. Overview of the OMTD-SHARE data model

The schema is composed of metadata elements that are used to describe properties and relationships. Some of these elements, especially those that pertain to administrative features, are common to all types of resources (e.g. identification, contact, licensing information, etc.), while others, mainly technical features about the contents and format of resources, differ across types. As noted above, publications differ from other resource types: their recommended metadata elements mainly describe the criteria used for their selection in the corpus building process.


One of the characteristic features of the META-SHARE family of schemas¹¹ is the adoption of the component-based mechanism (Component MetaData Infrastructure, CMDI), according to which semantically coherent elements are grouped together to form components¹² [Broeder et al., 2008]. For instance, the licensing module includes elements such as the name and URL of a licence, attribution text, copyright holders, etc. For the sake of simplicity, the container elements used for this grouping will not be presented in the guidelines unless required.

The OMTD-SHARE schema classifies elements into three levels of optionality:

● mandatory: elements that are necessary for intended purposes, i.e. for discovering resources and for triggering operations between content and software components

● recommended: elements that can help the current or future use of the resource, or useful information that providers have not yet standardized

● optional: all remaining information related to the lifecycle of a resource.
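The three optionality levels lend themselves to a simple completeness check on a metadata record. The following is a minimal sketch in Python, not part of the OMTD-SHARE tooling: the element names are taken from the guidelines' table of contents, but the level assigned to each element here, and the function names, are illustrative assumptions.

```python
from enum import Enum


class Optionality(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


# Hypothetical excerpt: the levels below are assumptions made for
# illustration, not the authoritative OMTD-SHARE classification.
ELEMENT_LEVELS = {
    "identifier": Optionality.MANDATORY,
    "title": Optionality.MANDATORY,
    "licence": Optionality.MANDATORY,
    "keyword": Optionality.RECOMMENDED,
    "publicationDate": Optionality.OPTIONAL,
}


def check_record(record):
    """Return (missing mandatory, missing recommended) element names, sorted."""
    def missing(level):
        return sorted(
            name
            for name, element_level in ELEMENT_LEVELS.items()
            if element_level is level and not record.get(name)
        )
    return missing(Optionality.MANDATORY), missing(Optionality.RECOMMENDED)


def is_valid(record):
    """A record is acceptable when no mandatory element is missing or empty."""
    return not check_record(record)[0]
```

Under these assumptions, a missing recommended element would only produce a warning, while a missing mandatory element would block registration.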

The XML Schema Definition (XSD) that formally describes the schema has been made publicly available¹³. An important difference from META-SHARE lies in how the different resource types are organised: while META-SHARE describes all resource types in one common XSD, in OMTD-SHARE the resource types are described in a more modular way, as separate sets of XSDs.

Work is ongoing to also produce an RDF/OWL version, which will be documented in the next release of the guidelines.

¹¹ Based on the META-SHARE schema, four more adaptations are now available: ELRC-SHARE, clarin:el, and OMTD-SHARE. The META-SHARE schema has also been implemented as an RDF/OWL ontology with the collaboration of the ld4lt W3C group.
¹² To avoid confusion with the term "component" also used for software components, we will from now on refer to this concept as "modules".
¹³ The current version of the XSDs is available at: https://github.com/openminted/omtd-share_metadata_schema and the documentation of v1.0.0 at: https://openminted.github.io/releases/omtd-share/.

6. Structure of the guidelines

The current release includes four guidelines (cf. Appendix C), which correspond to the three major distinctions of resources involved in TDM processes:

● content resources to be mined, i.e. scholarly publications,
● ancillary (knowledge) resources used for the operation of the software (e.g. annotation schemas, linguistic tagsets, lexical or ontological resources used for annotating the resources to be mined, machine learning models),
● TDM(-related) software,

and one more for

● corpora, as they can be used either as an ancillary resource or as a resource to be mined.

Each set of guidelines contains the following information:

● a brief introduction, specifying the resources expected, potential sources, minimal requirements for the contributions
● packaging and registering instructions for the OpenMinTeD registry
● technical and metadata requirements that empower interoperability
● for each resource type, an overview of the OMTD-SHARE metadata schema (minimal level) with definitions, explanations, recommended usage and mappings to other widespread metadata schemas
● further instructions per type of contributors or resource type/subtype where required.


Appendix A - References

Broeder, D., T. Declerck, E. Hinrichs, S. Piperidis, L. Romary, N. Calzolari and P. Wittenburg (2008) "Foundation of a Component-based Flexible Registry for Language Resources and Technology", Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008). Available at: http://www.lrec-conf.org/proceedings/lrec2008/

Gavrilidou, M., P. Labropoulou, E. Desipri, S. Piperidis, H. Papageorgiou, M. Monachini, F. Frontini, T. Declerck, G. Francopoulo, V. Arranz and V. Mapelli (2012) "The META-SHARE Metadata Schema for the Description of Language Resources", LREC 2012, Istanbul, Turkey. Available at: http://www.lrec-conf.org/proceedings/lrec2012/pdf/998_Paper.pdf


Appendix B – Acknowledgements & Contributors

The guidelines have been the product of work carried out mainly in the OpenMinTeD WP5 Interoperability Framework. The following internal and external experts have exchanged ideas and participated in discussions that have formulated the interoperability requirements, which these guidelines purport to describe:

Internal experts

• Sophia Ananiadou (University of Manchester, UK)
• Lucas Anastasiou (Open University, UK)
• Sophie Aubin (INRA, France)
• Mouhamadou Ba (INRA, France)
• Kalina Bontcheva (University of Sheffield, UK)
• Robert Bossy (INRA, France)
• Jacob Carter (University of Manchester, UK)
• Louise Deléger (INRA, France)
• Giulia Dore (University of Glasgow, UK)
• Richard Eckart de Castilho (TU Darmstadt, Germany)
• Fred Fenter (Frontiers Media SA, Switzerland)
• Dimitris Galanis (Athena RC, Greece)
• Maria Gavriilidou (Athena RC, Greece)
• Patricia Geretto (INRA, France)
• Mark Greenwood (University of Sheffield, UK)
• Lucie Guibault (University of Amsterdam, Netherlands)
• Masoud Kiaeeha (TU Darmstadt, Germany)
• Petr Knoth (Open University, UK)
• Penny Labropoulou (Athena RC, Greece)
• Antonis Lempesis (Athena RC, Greece)
• Miguel Madrid (CNIO)
• Natalia Manola (Athena RC, Greece)
• Thomas Margoni (University of Glasgow, UK)
• John McNaught (University of Manchester, UK)
• Claire Nedellec (INRA, France)
• Wim Peters (University of Sheffield, UK)
• Stelios Piperidis (Athena RC, Greece)
• Prokopis Prokopidis (Athena RC, Greece)
• Piotr Przybyla (University of Manchester, UK)
• Angus Roberts (University of Sheffield, UK)
• Matt Shardlow (University of Manchester, UK)
• Mappet Walker (Frontiers Media SA, Switzerland)

External experts

• Giulia Ajmone Marsan (The Organisation for Economic Co-operation and Development)
• Enrique Alonso (Consejo de Estado)
• Geoffrey Bilder (CrossRef)
• Lukasz Bolikowski (University of Warsaw, Poland)
• Maurizio Borghi (Bournemouth University, UK)
• Steve Cassidy (Macquarie University Sydney, Australia)
• Christopher Cieri (LDC, USA)
• Christian Chiarcos (Goethe-Universität Frankfurt am Main, Germany)
• Liam Earney (JISC, UK)
• Kristofer Erickson (CREATe)
• Dominique Estival (Western Sydney University, Australia)
• Gwen Franck (Creative Commons, EIFL)
• Thilo Götz (IBM)
• Nancy Ide (Vassar College, USA)
• Pawel Kamocki (Institut für Deutsche Sprache, Germany)
• Andreas Kempf (Deutsche Zentralbibliothek für Wirtschaftswissenschaften, Germany)
• Jin-Dong Kim (Database Center for Life Science, Research Organization of Information and Systems)
• John McCrae (National University of Ireland, Galway, Ireland)
• Federico Morando (Nexa Center for Internet & Society, Italy)
• Eric Nyberg (Carnegie Mellon University, USA)
• Mark Perry (University of New England, Australia)
• Diane Peters (Creative Commons HQ)
• Rafal Rak (UberResearch, UK)
• Jochen Schirrwagen (Universität Bielefeld, Germany)
• Ineke Schuurman (CCL, University of Leuven)
• Peter Suber (Berkman Klein Centre, Harvard University)
• Keith Suderman (Vassar College, LAPPS)
• Prodromos Tsiavos (The Media Institute)
• Paul Uhlir (National Academy of Sciences)
• Maarten van Gompel (Radboud University Nijmegen)
• Marc Verhagen (Brandeis University, LAPPS)
• Piek Vossen (VU University Amsterdam, Netherlands)
• Menzo Windhouwer (MPI for Psycholinguistics, Netherlands)
• Maarten Zeinstra (Kennisland)


Appendix C - Guidelines

Table of Contents

1.1 OpenMinTeD guidelines
1.2 Acknowledgements & Contributors
1.3 Guidelines for providers of publications
    1.3.1 Introduction
    1.3.2 Instructions for publication repositories, libraries, journals, publishers, etc.
        1.3.2.1 How to register your resources
        1.3.2.2 How to make your resources interoperable
        1.3.2.3 How to document your resources
    1.3.3 Instructions for aggregators
        1.3.3.1 How to register your resources
        1.3.3.2 How to document your resources
    1.3.4 Further requirements for annotated publications
    1.3.5 Recommended schema for publications
        1.3.5.1 to 1.3.5.32: documentType, publicationType, identifier, title, licence, rightsStmtName, rightsStmtURL, nonStandardLicenceName, nonStandardLicenceTermsURL, version of licence, distributionMedium, downloadURL, documentLanguage, fullText, abstract, author, publisher, journal, mimeType, characterEncoding, publicationDate, subject, keyword, collectedFrom repositoryName or repositoryIdentifier, sourceMetadataLink, originalDataProviderType, originalDataProviderRepository, originalDataProviderJournal, originalDataProviderPublisher, relationType, relatedResource1, relatedResource2
    1.3.6 Metadata schema for annotated publications
        1.3.6.1 to 1.3.6.11: annotationLevel, annotationStandoff, mimeType, documentationURL, dataFormatSpecific, characterEncoding, typesystem, tagset, annotationMode, isAnnotatedBy, annotationDate
1.4 Guidelines for providers of corpora
    1.4.1 Introduction
    1.4.2 Instructions for providers of corpora
        1.4.2.1 How to register your resources
        1.4.2.2 How to make your resources interoperable
        1.4.2.3 How to document your resources
        1.4.2.4 Further requirements for annotated corpora
        1.4.2.5 Recommended schema for corpora
            1.4.2.5.1 to 1.4.2.5.36: resourceName, resourceType, description, identifier, version, licence, rightsStmtName, rightsStmtURL, version of licence, nonStandardLicenceName, nonStandardLicenceTermsURL, distributionMedium, downloadURL, contactEmail, landingPage, contactPerson (identifier or personName), contactGroup (identifier or organizationName), mustBeCitedWith, resourceCreator, creationDate, corpusType, mediaType, lingualityType, multilingualityType, language, sizePerLanguage, size, mimeType, characterEncoding, domain, subject, keyword, userQuery, relationType, relatedResource1, relatedResource2
        1.4.2.6 Metadata schema for annotated corpora
1.5 Guidelines for providers of knowledge resources
    1.5.1 Introduction
    1.5.2 Instructions for providers of ancillary knowledge resources
        1.5.2.1 How to register your knowledge resources
        1.5.2.2 How to make your knowledge resources interoperable
        1.5.2.3 How to document your knowledge resources
        1.5.2.4 Recommended schema for lexical/conceptual resources, incl. annotation resources
            1.5.2.4.1 to 1.5.2.4.32: resourceType, resourceName, description, identifier, licence, rightsStmtName, rightsStmtURL, nonStandardLicenceName, nonStandardLicenceTermsURL, version of licence, distributionMedium, downloadURL, accessURL, contactEmail, landingPage, contactPerson (identifier or personName), contactGroup (identifier or organizationName), mustBeCitedWith, lexicalConceptualResourceType, encodingLevel, linguisticInformation, conformanceToStandardsBestPractices, lingualityType, language, metalanguage, size, domain, characterEncoding, mimeType, relationType, relatedResource1, relatedResource2
        1.5.2.5 Recommended schema for models
            1.5.2.5.1 to 1.5.2.5.33: resourceType, resourceName, identifier, description, version, licence, rightsStmtName, rightsStmtURL, nonStandardLicenceName, nonStandardLicenceTermsURL, version of licence, distributionMedium, downloadURL, contactEmail, landingPage, contactPerson (identifier or personName), contactGroup (identifier or organizationName), mustBeCitedWith, resourceCreator (person or organization, described with identifier or name), variantName, tagset, typesystem, algorithm, trainingCorpusDetails, mediaType, lingualityType, language, size, mimeType, characterEncoding, relationType, relatedResource1, relatedResource2
1.6 Guidelines for providers of software resources
    1.6.1 Introduction
    1.6.2 Instructions for providers of software components
        1.6.2.1 How to register your components
        1.6.2.2 How to make your components interoperable
        1.6.2.3 How to document your components
        1.6.2.4 Guide for deploying UIMA components in the Argo platform
    1.6.3 Recommended ancillary knowledge resources
    1.6.4 Recommended schema for software resources
        1.6.4.1 to 1.6.4.39: resourceType, resourceName, description, identifier, version, componentType, licence, rightsStmtName, rightsStmtURL, nonStandardLicenceTermsURL, version of licence, componentDistributionMedium, accessURL, downloadURL, contactEmail, landingPage, contactPerson (identifier or personName), contactGroup (identifier or organizationName), mailingListInfo, onlineHelpURL, issueTracker, mustBeCitedWith, resourceCreator (person or organization, described with identifier or name), mediaType inside inputContentResourceInfo or outputResourceInfo, resourceType inside inputContentResourceInfo or outputResourceInfo, language inside inputContentResourceInfo or outputResourceInfo, characterEncoding inside inputContentResourceInfo or outputResourceInfo, mimeType inside inputContentResourceInfo or outputResourceInfo, dataFormatSpecific inside inputContentResourceInfo or outputResourceInfo, typesystem inside inputContentResourceInfo or outputResourceInfo, tagset inside inputContentResourceInfo or outputResourceInfo, annotationLevel inside inputContentResourceInfo or outputResourceInfo, typesystem inside componentDependencies, tagset inside componentDependencies, annotationResource inside componentDependencies, framework, relationType, relatedResource1, relatedResource2
1.7 The OMTD-SHARE metadata schema
1.8 Glossary

OpenMinTeD guidelines

Welcome to the OpenMinTeD Guidelines!

OpenMinTeD enables the creation of an infrastructure that fosters and facilitates the use of Text and Data Mining (TDM) technologies in the scientific publications world, builds on existing TDM tools and platforms, and renders them discoverable and interoperable through appropriate registries and a standards-based interoperability layer, respectively.

This is where you'll find information on

● how to make your resources interoperable with other resources for TDM purposes
● how to register your resources at the OpenMinTeD platform (https://services.openminted.eu/)
● how to contribute to the guidelines.

TDM involves a wide range of resource types:

● the content resources to be mined, i.e. scholarly publications in the current phase,
● the TDM software and
● ancillary knowledge resources used for the operation of the software (e.g. annotation schemes, linguistic tagsets, lexical or ontological resources used for annotating the resources to be mined, machine learning models, annotated textual corpora).

Four guidelines are released targeting providers of these resources:

● Guidelines for providers of publications
● Guidelines for providers of corpora
● Guidelines for providers of software resources
● Guidelines for providers of knowledge resources

The OpenMinTeD platform serves as a facilitator of TDM in an ecosystem of e-infrastructures and repositories, collecting, transforming and making available resources only as needed for TDM purposes. In other words, it is not one more registry of content and services, and it doesn't seek to collect and provide information about resources that might be of interest to TDM stakeholders.

Important notice

Resources are to be registered into OpenMinTeD only if they can be accessed and deployed in the context of a TDM processing operation.

Each set of guidelines contains the following information:

● a brief introduction, specifying the resources expected, potential sources, minimal requirements for the contributions
● packaging and registering instructions for the OpenMinTeD registry
● technical and metadata requirements that empower interoperability
● for each resource type, an overview of the OMTD-SHARE metadata schema (minimal level) with definitions, explanations, recommended usage and mappings to other popular metadata schemas
● further instructions per type of contributors or resource type/subtype where required.


Guidelines for providers of publications

● Introduction
● Instructions for publication repositories, libraries, publishers etc.
● Instructions for aggregators
● Further requirements for annotated publications
● Recommended schema for publications

Introduction

OpenMinTeD facilitates the use of TDM technologies in the scientific publications world, ranging from generic scholarly communication to literature related to specific disciplines. Scholarly publications come from a wide range of stakeholders, e.g. institutional and discipline repositories, academic journals, scientific publishers, etc. For the first phase, the focus is on literature repositories and publishers, as regards sources, and on Open Access content, as regards access conditions.

Important notice

It should be noted that only publications that provide the full text or, at least, an abstract are candidates for inclusion in OpenMinTeD.

OpenMinTeD relies on existing infrastructures and standards/best practices for its operation. Thus, to access scholarly publications, it relies on the two main aggregators of such content, OpenAIRE and CORE. Providers of scholarly publications are asked to contribute their resources by depositing them at one of these stakeholders, following their respective guidelines and procedures. In addition, OpenAIRE and CORE are developing a content connector that allows harvesting of open access publications through the APIs of publishers that allow this.

Scholarly publications are imported into OpenMinTeD for TDM processing via the creation of corpora upon queries submitted by the end-users. Researchers come to OpenMinTeD not to read publications, but to build a corpus by selecting publications from various sources based on specific criteria, e.g. "a corpus of English articles in the biomedicine area", in order to run TDM services on them.

OpenMinTeD has elaborated several architectural options for integrating existing content providers (such as, but not limited to, OpenAIRE and CORE) and chose an approach whereby content is managed in those external services but is accessible in the OpenMinTeD platform through a federated search strategy. Content is made available to the OpenMinTeD platform through a simple API, defining simple operations to search and retrieve content.

As one of the first steps of building a corpus of scholarly publications, end-users are expected to issue a query in the OpenMinTeD registry: they are presented with a faceted view of the OpenMinTeD contents (i.e. of all registered content providers) and, by selecting from a range of criteria, gradually build a query. Results from all registered content providers are presented to the end-user and, after refinement and careful elicitation of the final query, the associated content is transferred to OpenMinTeD's registry and becomes available for the subsequent steps of a TDM workflow. A lazy deposit/caching strategy has been employed to avoid redundant queries (in simple terms, a record is fetched only the first time it is requested and remains persistent locally for further requests). Extra care is taken to ensure reproducibility of the created corpus by storing an exact version of the content used in it.
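The lazy deposit/caching strategy described above can be illustrated with a minimal sketch in Python. This is not the OpenMinTeD implementation: the class name, the `fetch_remote` callable and the in-memory dictionary (standing in for persistent local storage) are all assumptions made for illustration.

```python
class LazyContentCache:
    """Fetch a record from a remote provider only the first time it is
    requested; later requests are served from the local store."""

    def __init__(self, fetch_remote):
        # fetch_remote: hypothetical callable mapping a record id to bytes
        self._fetch_remote = fetch_remote
        # In-memory stand-in for the persistent local storage
        self._store = {}
        # Counter kept only to make the behaviour observable
        self.remote_calls = 0

    def get(self, record_id):
        if record_id not in self._store:
            self.remote_calls += 1
            self._store[record_id] = self._fetch_remote(record_id)
        return self._store[record_id]
```

Because the stored copy is never refreshed in this sketch, a corpus built on top of it keeps referring to the exact version of the content that was first retrieved, which is in line with the reproducibility concern mentioned above.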

Thus, a corpus included in the OpenMinTeD Registry essentially consists of a list of publications. Each publication is identified (equivalent to a primary key) by the hash value of its content (full text PDF) and is accompanied by a set of metadata files (in the OMTD-SHARE schema) that describe the resource. In most cases, this set consists of just one item, but multiple metadata files may describe the same resource (for example, different metadata files from CORE or OpenAIRE, updates in metadata fields, richer metadata from a content provider, etc.).
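A minimal sketch of this identification scheme in Python follows. The guidelines only speak of a "hash value", so the use of SHA-256 is an assumption, as are the class and method names.

```python
import hashlib


def content_key(content: bytes) -> str:
    # SHA-256 is an assumption; the guidelines do not name the hash function.
    return hashlib.sha256(content).hexdigest()


class Corpus:
    """A list of publications keyed by content hash, each with one or
    more metadata records describing it."""

    def __init__(self):
        self._metadata = {}  # content hash -> list of metadata records

    def add(self, content: bytes, metadata: dict) -> str:
        key = content_key(content)
        # The same content arriving with different metadata (e.g. from two
        # aggregators) extends the metadata set instead of duplicating it.
        self._metadata.setdefault(key, []).append(metadata)
        return key

    def publications(self):
        return list(self._metadata)

    def metadata_for(self, key: str):
        return list(self._metadata.get(key, []))
```

With this layout, identical full texts delivered by two providers collapse into one publication entry that carries both metadata records.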

The following sections present a list of instructions, requirements and recommendations that publications must meet to interact with TDM resources.

Instructions for publication repositories, libraries, journals, publishers, etc.

● How to register your resources
● How to make your resources interoperable
● How to document your resources

How to register your resources

If you wish to register publications that can be harvested for TDM purposes through OpenMinTeD, you can do so

● by registering through OpenAIRE, following procedures and guidelines at: https://www.openaire.eu/validator/welcome.action

OR

● by registering through CORE, following procedures at: https://core.ac.uk/join

For each publication to be valid for import into OpenMinTeD, a metadata record conformant with the OMTD-SHARE minimal schema and a file with the contents must be delivered.

How to make your resources interoperable

To be fully compatible with OpenMinTeD, you must:

● provide a file with the actual contents of each publication in any format you desire (e.g. PDF, HTML, etc.).

In addition, if you wish your material to be easily processable and interoperable with TDM tools and services, you should adopt the following recommendations:

● The preferred formats for delivering textual material are plain text, XML, PDF (not proprietary and certainly not of scanned images), which can be read by one of the existing readers.
● If appropriate for your material, use one of the more specific data formats that are already supported by readers and converters included in the OpenMinTeD registry (cf. dataFormatSpecific).
● The preferred character encoding is UTF-8.

Please note that not all of the above requirements are absolute: if your material is not compliant with them, it may still be processable, but their adoption makes it better equipped for TDM and NLP processing.
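As an illustration of the character encoding recommendation, the following Python sketch checks whether a file's bytes are valid UTF-8 and, when the original encoding is known, converts them. The function names are hypothetical, and the source encoding must be supplied by the provider, since it cannot be detected reliably from the bytes alone.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")
```

A provider could run such a check over a collection before delivery and convert only the files that fail it.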


How to document your resources

To be fully compatible with OpenMinTeD, you must

● provide a metadata record for each publication with at least bibliographic information about it, preferably following the OpenAIRE guidelines
● ensure that the publications are distributed under Open Access conditions
● include in the metadata record of each publication a link to the licence document that describes the terms and conditions under which it is provided, and attach the licence document together with the publication
● if you already have a PID for your publication (preferably DOI), make sure it is included in the metadata record (cf. identifier for more information on identifier schemes).

The following recommendations will help interaction with your resources, but they are not mandatory.

● Further adoption of standards such as the JATS article tag suite or TEI P5 guidelines for annotating the inner structure of publications is recommended.
● Use standard classification vocabularies (such as MeSH, DDC, LCSH, etc.) for adding classification tags to your material and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing, etc.).
● Wherever the metadata records link to other resources or entities (e.g. persons, projects, etc.), please try to do this through unique and persistent identifiers of authority lists and sources, to the extent possible, also documenting the authority and/or scheme each identifier adheres to.

Instructions for aggregators

For the first phase of the project, OpenAIRE and CORE will bring content resources into OpenMinTeD through user queries. For next versions, interested content providers will be able to contribute directly to OpenMinTeD if they implement the following:

● Map the metadata of their contents to the OMTD-SHARE schema
● Provide search capabilities on the metadata
● Provide the actual content (e.g. full text in the case of publications)

More specific instructions are found in the next section.

● How to register your resources
● How to document your resources

How to register your resources

Interested content providers must implement a Java interface, called ContentConnector, which can be found at https://github.com/openminted/content-connector-api. The implementation is then included in the code of the Content Service of the OpenMinTeD platform. This interface specifies three methods:

● search, which accepts a Query object describing a query and returns a page of metadata. This method is used for browsing the metadata of the provider and supports keyword search, advanced search in a number of fields and also faceted search. The result of the method is (a) a page (of user specified size) of metadata, (b) the statistics of the results (total number of hits, etc.), and (c) the facets (if requested).

● fetchMetadata, which accepts a Query but, unlike the previous method, returns all the metadata of the result, without any statistics or facets. The result is a stream containing a single XML element (called "publications"), which in turn contains all the metadata of the content. This method is called when a corpus is being built.

● downloadFullText, which, given a publication identifier (as contained in the metadata), returns a stream containing the actual content. This method is again used when the platform is building a corpus.

Additional technical information is provided in the Java code of the interface.
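To make the division of labour between the three methods concrete, here is an illustrative Python mirror of the interface with a toy in-memory provider. It is a sketch only: the authoritative definition is the Java interface in the repository above, and the Python method names, the dictionary-based query, the result layout and the `publication` child element are all assumptions. Facets and advanced field search are left out.

```python
import io
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class ContentConnector(ABC):
    """Python mirror of the three operations; signatures are hypothetical."""

    @abstractmethod
    def search(self, query): ...

    @abstractmethod
    def fetch_metadata(self, query): ...

    @abstractmethod
    def download_full_text(self, publication_id): ...


class InMemoryConnector(ContentConnector):
    def __init__(self, records):
        # records: {identifier: {"title": str, "content": bytes}}
        self._records = records

    def _matches(self, query):
        keyword = query.get("keyword", "").lower()
        return [(pid, rec) for pid, rec in sorted(self._records.items())
                if keyword in rec["title"].lower()]

    def search(self, query):
        # Returns one page of metadata plus basic statistics.
        hits = self._matches(query)
        start, size = query.get("from", 0), query.get("size", 10)
        page = [{"identifier": pid, "title": rec["title"]}
                for pid, rec in hits[start:start + size]]
        return {"total_hits": len(hits), "publications": page}

    def fetch_metadata(self, query):
        # Returns a stream holding a single "publications" element with
        # the metadata of every hit, without statistics or facets.
        root = ET.Element("publications")
        for pid, rec in self._matches(query):
            pub = ET.SubElement(root, "publication")
            ET.SubElement(pub, "identifier").text = pid
            ET.SubElement(pub, "title").text = rec["title"]
        return io.BytesIO(ET.tostring(root, encoding="utf-8"))

    def download_full_text(self, publication_id):
        # Returns a stream with the actual content of one publication.
        return io.BytesIO(self._records[publication_id]["content"])
```

In this reading, `search` serves interactive browsing, while `fetch_metadata` and `download_full_text` are the bulk operations used once the final query is fixed and the corpus is being built.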


How to document your resources

In the case of publications, the required metadata records come at two levels:

- one for the whole query-generated corpus of publications, in compliance with the OMTD-SHARE schema for corpora, which is automatically constructed on the basis of the user filters and manually enriched by the user;
- one per publication, with a minimal set of metadata elements in compliance with the OMTD-SHARE schema for publications, automatically converted from the current schemas of the providers.

It should be noted that the original resource providers (e.g. publication repositories, publishers etc.) that offer publications via OpenAIRE and CORE do not have to change their current schemas. Mappings and conversions between the OpenAIRE [1] and CORE metadata and the OMTD-SHARE schema are made by the providers themselves in the framework of OpenMinTeD [2].

All metadata records for publications must be delivered in XML format.

1. The OpenAIRE schema and guidelines are currently under revision; collaboration with the relevant actors has been established to take into account the new features and, where desired, influence the changes so as to support TDM processes in accordance with the interoperability requirements.

2. Mappings with other metadata schemas, including OpenAIRE and CORE, are included in the presentation of the recommended metadata schema.


Further requirements for annotated publications

Scholarly publications will normally be imported into the OpenMinTeD platform in an unprocessed format and will be annotated by the operation of TDM software also registered in the platform.

However, certain providers may decide to run the TDM or annotation software at their own premises and upload the results of the processing directly into OpenMinTeD (e.g. annotating the publications with structural markup, extracting acknowledgements or citations sections etc.).

In these cases, the annotated output is considered a new resource and, therefore, should be registered as a separate resource from the raw publication, in a folder called "annotated files", with its own metadata record, following the instructions for annotated publications.

It should be noted that publications annotated by means of the OpenMinTeD platform will be automatically assigned the appropriate values for these elements.


Recommended schema for publications

This section includes the overview of the recommended OMTD-SHARE schema for publications, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements. Only elements related to the description of the resource are presented here; additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are handled internally by the OpenMinTeD platform.

For annotated publications, see the section "Metadata schema for annotated publications".

OMTD-SHARE element                                      Usage
documentType                                            M
publicationType                                         M
identifier                                              M
title                                                   M
licence or rightsStmtName & rightsStmtURL               M (one of the two must be provided)
nonStandardLicenceName                                  R when applicable
nonStandardLicenceTermsURL                              M when applicable
version of licence                                      M
distributionMedium                                      M
downloadURL                                             M when applicable
documentLanguage                                        M
fullText                                                R
abstract                                                R
author                                                  R
publisher                                               R
journal                                                 R
mimeType                                                R
characterEncoding                                       R
publicationDate                                         R
subject                                                 R
keyword                                                 R
collectedFrom repositoryName or repositoryIdentifier    R
sourceMetadataLink                                      R
originalDataProviderType                                R
originalDataProviderRepository                          R when applicable
originalDataProviderJournal                             R when applicable
originalDataProviderPublisher                           R when applicable
relationType                                            R
relatedResource1                                        M when applicable
relatedResource2                                        M when applicable
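As an illustration of how the mandatory rows of the table above could be checked before submission, here is a minimal sketch in Python. It assumes, purely for the example, that a metadata record has already been flattened into a dictionary keyed by element name; the version of licence and the "when applicable" rows are left out for brevity.

```python
# Unconditionally mandatory elements, taken from the overview table.
MANDATORY = [
    "documentType",
    "publicationType",
    "identifier",
    "title",
    "distributionMedium",
    "documentLanguage",
]


def missing_mandatory(record):
    """Return the names of mandatory elements that are absent or empty.

    `record` is a hypothetical flat dict of element name -> value.
    """
    missing = [name for name in MANDATORY if not record.get(name)]
    has_licence = bool(record.get("licence"))
    has_rights = bool(record.get("rightsStmtName")) and bool(record.get("rightsStmtURL"))
    # One of the two alternatives must be provided.
    if not (has_licence or has_rights):
        missing.append("licence or rightsStmtName & rightsStmtURL")
    return missing
```

An empty list means that, under the assumptions above, none of the checked elements is missing.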


documentType

Usage

Mandatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:documentType: bibliographicRecordOnly, abstract, fullText

Definition/Explanations

Specifies whether the metadata record provides access to the full text, the abstract or serves only as a bibliographic record (i.e. includes only metadata)

Recommended usage

Please select one of the values provided to indicate whether the metadata record includes the full text (either as a link or as a free text field inside the record), the abstract (again, as a link or as a free text description in a metadata element) or none at all. If the record includes both the abstract and the full text, the preferred option is to select "fullText".

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type
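The selection rule in the recommended usage above can be written down as a small decision function. This is only a sketch; the function name and its two boolean parameters are assumptions, while the returned values are the three vocabulary values listed for documentType.

```python
def choose_document_type(has_full_text, has_abstract):
    """Pick the documentType value; full text takes precedence over abstract."""
    if has_full_text:
        return "fullText"
    if has_abstract:
        return "abstract"
    return "bibliographicRecordOnly"
```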


publicationType

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:publicationType: article, bachelorThesis, masterThesis, doctoralThesis, book, bookPart, review, conferenceObject, lecture, workingPaper, prePrint, report, annotation, contributionToPeriodical, patent, inProceedings, booklet, manual, techReport, inCollection, unpublished, other

Definition/Explanations

Specifies the type of the publication (e.g. whether it's a journal article, oral paper or poster in the proceedings of a conference etc.)

Recommended usage

Please select one of the values from the list (compatible with the CASRAI research/scholarly output types (http://dictionary.casrai.org/Output_Types)); if none of the values fits, please use "other"

Relation to other metadata schemas

OpenAIRE current version: computed from instanceType
OpenAIRE v4.0: dc:type
CORE: article.types
DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; recommended usage for publications is to use "text" for datacite:resourceTypeGeneral and one of the CASRAI values for datacite:resourceType (e.g. text/ConferenceObject)


identifier

Usage

Mandatory

Type

free text

Attributes

ms-omtd:publicationIdentifierSchemeName or ms-omtd:schemeURI

Definition/Explanations

Reference to a DOI (recommended) or any kind of identifier used for the publication

Recommended usage

Provide a unique identifier already assigned by an authoritative source; the preferred identifier for publications is DOI; you can use either the attribute "publicationIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, ISBN etc.) or, if the scheme is not listed among them, the "other" value, or the attribute "schemeURI" to provide a link to the URI that documents the scheme it adheres to.

Relation to other metadata schemas

OpenAIRE current version: doi/pmc/etc. identifiers
OpenAIRE v4.0: dc:identifier
CORE: article.id & article.identifiers
DCMI: skos:closeMatch dct:identifier
DataCite 4.0: datacite:contributor with skos:broadMatch datacite:identifier (identifierType can only be DOI) contributorType="ContactPerson", contributorName (familyName & givenName) or nameIdentifier and nameIdentifierScheme and schemeURI


title

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang and ms-omtd:titleType

Definition/Explanations

The title of the publication

Recommended usage

Please provide the title as in the original metadata record; the "lang" attribute can be used to specify the language of the title, and the "titleType" attribute (after DataCite) to differentiate between main title, alternative or translated title and subtitle.

Relation to other metadata schemas

OpenAIRE current version: title
OpenAIRE v4.0: dc:title
CORE: article.title
DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title


licence

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms

Definition/Explanations

The licence of use for the resource

Recommended usage

You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:

- use the element "licence" and select one of the recommended licences; please note that the list contains licences intended for data resources & components mixed together; for components the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family;
- if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";
- you can also use the "rightsStatementName" and the "rightsStatementURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please note that this is an option used mainly to facilitate end-users in accessing your resource, while you are strongly advised to properly license your resource.

For publications harvested from OpenAIRE and CORE, please provide the original licence value if it was included in the original metadata record; in any case, the "rightsStmtName" element must additionally be used for all publications.

Relation to other metadata schemas

OpenAIRE current version: bestlicense provides info for NonStandardLicenceTerms and RightsStatementInfo
OpenAIRE v4.0: dc:rights & file/dc:accessRights
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights
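The conditions for the licensing elements can be expressed as a simple consistency check. The sketch below assumes a hypothetical flat dictionary representation of the record; it covers only the "either licence or rights statement" condition and the expectation that non-standard licences come with a name and a terms URL, and it is not the validation performed by the platform itself.

```python
# Licence values that call for the nonStandardLicence* elements.
NON_STANDARD_VALUES = {"nonStandardLicenceTerms", "proprietary"}


def licence_problems(record):
    """Return human-readable problems found in the licensing elements."""
    problems = []
    licence = record.get("licence")
    if not licence and not record.get("rightsStmtName"):
        problems.append("either licence or rightsStmtName must be filled in")
    if licence in NON_STANDARD_VALUES:
        for name in ("nonStandardLicenceName", "nonStandardLicenceTermsURL"):
            if not record.get(name):
                problems.append(name + " is expected with licence=" + licence)
    return problems
```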


rightsStmtName

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.

The current list of predefined values comes from OpenAIRE, but it's under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

OpenAIRE current version: conversion from bestlicence classname
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights


rightsStmtURL

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

OpenAIRE current version: http://api.openaire.eu/vocabularies/dnet:access_modes
DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI


nonStandardLicenceName

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

free text

Definition/Explanations

The name with which a licence is known; to be used for licences not included in the pre-defined list of recommended licences

Recommended usage

Please provide the name of the licence if it's already known or supply one that can uniquely identify it.

Relation to other metadata schemas

OpenAIRE current version: bestlicense
DCMI: skos:closeMatch dct:title (for dct:licenseDocument)


nonStandardLicenceTermsURL

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list or describing the terms of use for a language resource or terms of service for web services

Recommended usage

Please provide the link to the full text document of the licence. Please note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

OpenAIRE current version: bestlicense classid
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI


version of licence

Usage

Mandatory

Type

free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC-licences and "2.0" for the META-SHARE-NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)


distributionMedium

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other

Definition/Explanations

Specifies the medium (channel) used for delivery or providing access to the resource

Recommended usage

Please use one of the provided values to indicate the medium of distribution. For publications harvested from OpenAIRE and CORE, the default value is "downloadable" if the documentType is "abstract" or "fullText". Please note that, if the publication is distributed in different mediums under different terms of use or licences, you can repeat the whole set of elements ("distributionInfo") to describe them.

Relation to other metadata schemas

OpenAIRE v4.0: distributionInfo are related to webresource or url
DCMI: skos:closeMatch dct:medium


downloadURL

Usage

Recommended under conditions

Conditions for usage

if distributionMedium = downloadable

Definition/Explanations

Any URL where the resource can be downloaded from

Recommended usage

Please use for publications whose actual content is not already uploaded in OpenMinTeD; in this case, please ensure that the URL link leads to the actual content of the publication and not to a landing page. For publications harvested from OpenAIRE & CORE, the full content must be uploaded in OpenMinTeD according to the approved guidelines for the user-built corpora of publications.

Relation to other metadata schemas

OpenAIRE current version: url
CORE: article.fulltextURLs


documentLanguage

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:documentLanguage (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language the document is written in, according to the IETF BCP 47 guidelines

Recommended usage

Please enter the language and, if needed, the region, script and variant identifier that best fits the language of the document (e.g. en-US) according to the IETF BCP 47 guidelines

Relation to other metadata schemas

OpenAIRE current version: language (but to be mapped from ISO 639-2 3-letter codes to us)
OpenAIRE v4.0: dc:language
CORE: article.language
DCMI: skos:closeMatch dct:language
DataCite 4.0: skos:closeMatch datacite:Language
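To illustrate how the four identifiers combine into a single value, the following sketch joins them in the order used by BCP 47 (language, script, region, variant) and applies the customary casing. The function and parameter names are assumptions, and the sketch does not check the subtags against the IANA language subtag registry.

```python
def make_language_tag(language_id, script_id=None, region_id=None, variant_id=None):
    """Join subtags in the BCP 47 order: language-script-region-variant."""
    parts = [language_id.lower()]
    if script_id:
        parts.append(script_id.title())   # e.g. "Latn"
    if region_id:
        parts.append(region_id.upper())   # e.g. "US"
    if variant_id:
        parts.append(variant_id.lower())
    return "-".join(parts)
```

For example, make_language_tag("en", region_id="us") yields the "en-US" value mentioned in the recommended usage.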


fullText

Usage

Recommended

Type

free text

Attributes

xs:lang

Definition/Explanations

The full text of the publication in simple text format

Recommended usage

You can use this metadata element to include the full text of the publication in simple text format instead of uploading it as a separate file.

Relation to other metadata schemas

OpenAIRE v4.0: file/objectType
CORE: article.fulltext


abstract

Usage

Recommended

Type

free text

Attributes

xs:lang

Definition/Explanations

The abstract of the document in plain text format

Recommended usage

You can use this metadata element to include the abstract of the publication in simple text format; the element can be repeated for the different language versions using the "lang" attribute to specify the language.

Relation to other metadata schemas

OpenAIRE current version: dc:description
OpenAIRE v4.0: dc:description
CORE: article.description
DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with value "abstract" for datacite:descriptionType


author

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) that has/have authored the publication

Recommended usage

The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please select also the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name", at least in English; if you want to add names in other languages, you can use the “lang” attribute. The element can also be repeated to encode multiple persons.

Relation to other metadata schemas

OpenAIRE current version: rels/rel
OpenAIRE v4.0: datacite:creator
CORE: article.authors
DCMI: skos:closeMatch dct:creator
DataCite 4.0: skos:closeMatch datacite:Creator with datacite:creatorName (familyName & givenName) or datacite:nameIdentifier & datacite:nameIdentifierScheme & datacite:schemeURI
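The preference order described above (identifier first, name as a fallback) can be sketched as follows. The dictionary layout and the scheme value "ORCID" are assumptions made for illustration; please consult the schema for the actual attribute values.

```python
def author_entry(surname, first_name, orcid=None):
    """Describe an author by ORCID when known, otherwise by name.

    The returned dict is a hypothetical intermediate form, not the XML
    serialisation of the OMTD-SHARE schema.
    """
    if orcid:
        return {"identifier": orcid, "personIdentifierSchemeName": "ORCID"}
    # Fallback: name in the "Surname, First name" format, tagged as English.
    return {"name": f"{surname}, {first_name}", "lang": "en"}
```

Repeating the element for several authors then amounts to building one such entry per person.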


publisher

Usage

Recommended

Type

person or organization, both encoded with identifier or multilingual free text

Attributes

for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) or organization(s) that has/have published the publication

Recommended usage

The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please select also the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name", at least in English; if you want to add names in other languages, you can use the “lang” attribute.

The recommended way for referring to an organization is by giving its identifier (e.g. ISNI); if you provide the identifier, please select also the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. The element can also be repeated to encode multiple persons/organizations.

Relation to other metadata schemas

OpenAIRE current version: publisher
OpenAIRE v4.0: dc:publisher
CORE: article.publisher
DCMI: skos:exactMatch dct:publisher
DataCite 4.0: skos:exactMatch dct:Publisher


journal

Usage

Mandatory if applicable

Conditions for usage

If the article comes from a journal

Type

identifier or multilingual free text

Attributes

ms-omtd:journalIdentifierSchemeName (for identifiers) or xs:lang (for title)

Definition/Explanations

Groups information on the journal where the publication has appeared

Recommended usage

The recommended way for referring to a journal is by giving its identifier (e.g. ISSN, DOI); if you provide the identifier, please select also the relevant value from the list of values in the attribute "journalIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the journal, you may provide the title at least in English; if you want to add titles in other languages, you can use the “lang” attribute.

Relation to other metadata schemas

OpenAIRE current version: journal
CORE: article.journals
DCMI: skos:exactMatch dct:title (for journals)


mimeType

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values (the most popular ones for text files) from the IANA mimetype controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media MIME type recommended values (http://www.iana.org/assignments/media-types/media-types.xhtml)

Relation to other metadata schemas

OpenAIRE v4.0: format & file/mimetype
DCMI: skos:closeMatch dct:format
DataCite 4.0: skos:closeMatch datacite:Format
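When the mime-type is not recorded in the source metadata, it can often be guessed from the file name. The sketch below uses the mimetypes module of the Python standard library; falling back to the "other" vocabulary value for unknown extensions is an assumption of this example, not a rule of the schema.

```python
import mimetypes


def guess_mime_type(filename):
    """Guess the mime-type from the file extension, or return "other"."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime or "other"
```

A guessed value should still be compared with the pre-defined list before it is written into the record.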


characterEncoding

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8 to ensure interoperability between content and components.
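Since UTF-8 is the preferred encoding, it can be useful to check content before upload and to convert it when the source encoding is known. A minimal sketch follows; the function names are assumptions, and the conversion presumes that the source encoding has been declared correctly.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")
```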


publicationDate

Usage

Recommended

Type

date pattern (year or year and month or full date)

Definition/Explanations

The publication date or, for an unpublished work, the date it was written

Recommended usage

If possible, provide at least the year of publication (or creation)

Relation to other metadata schemas

OpenAIRE current version: dateofacceptance
OpenAIRE v4.0: datacite:date with dateType: accepted
CORE: Article.datePublished
DCMI: skos:closeMatch dct:created
DataCite 4.0: skos:closeMatch datacite:CreationDate
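The three accepted granularities (year, year and month, full date) can be checked with a short validator. The sketch assumes an ISO-style serialisation (YYYY, YYYY-MM or YYYY-MM-DD); the guidelines only state the granularities, so the exact lexical form used here is an assumption.

```python
import re
from datetime import date

# Year, optionally followed by month, optionally followed by day.
_DATE_PATTERN = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_publication_date(value):
    """Accept YYYY, YYYY-MM or YYYY-MM-DD values that denote a real date."""
    match = _DATE_PATTERN.fullmatch(value)
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing parts default to 1 so that the calendar check still works.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```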


subject

Usage

Recommended

Type

free text

Attributes

ms:classificationSchemeName and ms:schemeURI

Definition/Explanations

Subject or topic of the document

Recommended usage

It is recommended that the subjects are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified; if you do, please use the classificationSchemeName to indicate the source; if this is not included in the list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding the subject values is the identifier of the subject in the scheme; further instructions on the standardization of the format will be provided.

Relation to other metadata schemas

OpenAIRE current version: subject with schemeid & schemename (after mapping to our values)
OpenAIRE v4.0: dc:subject
CORE: article.subjects & article.topics
DCMI: skos:narrowMatch dct:subject
DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI


keyword

Usage

Recommended

Type

free text

Definition/Explanations

Words used for indexing the document

Recommended usage

A free text element used for encoding keywords for the classification of the publication, only in English; please encode one word/phrase each time and repeat the element for multiple keywords.

Relation to other metadata schemas

OpenAIRE current version: subject with classid equal to keyword
DCMI: skos:narrowMatch dct:subject
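Source records often carry all keywords in one delimited string, whereas the element expects one word or phrase per occurrence. The following sketch splits such a string; the separator characters and the case-insensitive removal of duplicates are assumptions of the example.

```python
def split_keywords(raw, separators=";,"):
    """Split a delimited keyword string into one entry per keyword."""
    # Normalise every separator to the first one, then split once.
    for sep in separators[1:]:
        raw = raw.replace(sep, separators[0])
    seen, result = set(), []
    for part in raw.split(separators[0]):
        keyword = part.strip()
        if keyword and keyword.lower() not in seen:
            seen.add(keyword.lower())
            result.append(keyword)
    return result
```

Each item of the returned list would then be encoded as a separate keyword element.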


collectedFrom repositoryName or repositoryIdentifier

Usage

Recommended

Type

identifier (repositoryIdentifier) or multilingual free text (repositoryName)

Attributes

ms-omtd:repositoryIdentifierSchemeName (for identifiers) or xs:lang (for title)

Definition/Explanations

Refers to the entity (repository, aggregator etc.) from which the metadata record has been harvested into OMTD

Recommended usage

The recommended way for referring to a repository is by giving its identifier (e.g. openDOAR); if you provide the identifier, please select also the relevant value from the list of values in the attribute "repositoryIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the repository, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute.

Relation to other metadata schemas

OpenAIRE v4.0: dc:source


sourceMetadataLink

Usage

Recommended

Type

URL pattern

Definition/Explanations

A link to the original metadata record, in cases of harvesting

Recommended usage

This element can be encoded automatically by OMTD in cases of harvesting.

Relation to other metadata schemas

CORE: article.id
DCMI: skos:narrowMatch dct:source


originalDataProviderType

Usage

Recommended

Type

closed controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:originalDataProviderType: repository, journal, publisher

Definition/Explanations

Refers to the type of the original data provider (repository/journal/publisher), in case the metadata record carries information taken from previous repositories/journals/publishers (e.g. in case the OMTD record's source is an aggregator)

Recommended usage

Please select one of the predefined values as appropriate. For records harvested from OpenAIRE and CORE, this is the element that describes the original data provider (i.e. the repo/journal/publisher) from which they themselves have harvested the record.

Relation to other metadata schemas

OpenAIRE current version: has to be computed from the identifier of collectedFrom in OpenAIRE


originalDataProviderRepository

Usage

Recommended under conditions

Conditions for usage

if originalDataProviderType = repository

Type

identifier or multilingual free text

Attributes

ms-omtd:repositoryIdentifierSchemeName (for identifiers) or xs:lang (for title)

Definition/Explanations

Refers to the entity (repository, aggregator etc.) from which the metadata record has been harvested

Recommended usage

The recommended way for referring to a repository is by giving its identifier (e.g. from OpenDOAR, re3data etc.); if you provide the identifier, please select also the relevant value from the list of values in the attribute "repositoryIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the repository, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute.

Relation to other metadata schemas

OpenAIRE current version: collectedFrom
CORE: article.repositories
DCMI: skos:narrowMatch dct:source


originalDataProviderJournal

Usage

Recommended under conditions

Conditions for usage

if originalDataProviderType = journal

Type

identifier or multilingual free text

Attributes

ms-omtd:journalIdentifierSchemeName (for identifiers) or xs:lang (for title)

Definition/Explanations

Refers to the journal from which the metadata record has been harvested

Recommended usage

The recommended way for referring to a journal is by giving its identifier (e.g. ISSN, DOI); if you provide the identifier, please select also the relevant value from the list of values in the attribute "journalIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the journal, you may provide the title at least in English; if you want to add titles in other languages, you can use the “lang” attribute.

Relation to other metadata schemas

OpenAIRE current version: collectedFrom
CORE: article.journals
DCMI: skos:narrowMatch dct:source


originalDataProviderPublisher

Usage

Recommended under conditions

Conditions for usage

if originalDataProviderType = publisher

Type

organization encoded with identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Refers to the publisher from which the metadata record has been harvested

Recommended usage

The recommended way for referring to an organization is by giving its identifier (e.g. ISNI); if you provide the identifier, please select also the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute.

Relation to other metadata schemas

OpenAIRE current version: collectedFrom
DCMI: skos:narrowMatch dct:source


relationType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities (e.g. two resources that comprise one new resource together, a corpus and the s/w component that has been used for its creation or a corpus and the publication that describes it)

Recommended usage

For publications, the recommended relations are isVersionOf and isSimilarTo, but any relationType can be used as appropriate.

Relation to other metadata schemas

DCMI: skos:narrowMatch hasVersion
DataCite 4.0: skos:closeMatch datacite:relationType


relatedResource1

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the source resource related to the target resource (relatedResource2) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.


relatedResource2

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the target resource related to the source resource (relatedResource1) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
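The three relation elements form a triple: source resource, relation, target resource. The sketch below checks the "mandatory when applicable" condition for the two resource elements; the function name and the plain-string arguments are assumptions, and the identifiers in the usage note are placeholders.

```python
def relation_problems(relation_type, related_resource_1, related_resource_2):
    """Return the relatedResource elements that are missing for a relation.

    When relationType is empty, neither resource element is required.
    """
    if not relation_type:
        return []
    pairs = (
        ("relatedResource1", related_resource_1),
        ("relatedResource2", related_resource_2),
    )
    return [name for name, value in pairs if not value]
```

For instance, a revised publication could be linked to its earlier version with relation_problems("isVersionOf", "doi:10.1234/example.v2", "doi:10.1234/example.v1"), which returns an empty list because both ends of the relation are given.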


Metadata schema for annotated publications

Annotated publications are documented as separate resources with a link to the raw publication and their own set of metadata elements providing information on the annotation process, tool etc.

OMTD-SHARE element        Usage
publicationIdentifier     M
annotationLevel           M
annotationStandoff        R
mimeType                  R
dataFormatSpecific        R
documentationURL          R
characterEncoding         R
typesystem                R
tagset                    R
annotationMode            R
isAnnotatedBy             R
annotationDate            R


annotationLevel

Usage

Mandatory when applicable

Conditions for usage

for all annotated resources

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:annotationLevel: alignment, discourseAnnotation, discourseAnnotation-argumentation, discourseAnnotation-audienceReactions, discourseAnnotation-coreference, discourseAnnotation-dialogueActs, discourseAnnotation-discourseRelations, lemmatization, morphosyntacticAnnotation-bPosTagging, morphosyntacticAnnotation-posTagging, segmentation, semanticAnnotation, semanticAnnotation-certaintyLevel, semanticAnnotation-emotions, semanticAnnotation-events, semanticAnnotation-namedEntities, semanticAnnotation-polarity, semanticAnnotation-questionTopicalTarget, semanticAnnotation-readabilty, semanticAnnotation-semanticClasses, semanticAnnotation-semanticRelations, semanticAnnotation-semanticRoles, semanticAnnotation-speechActs, semanticAnnotation-subjectivity, semanticAnnotation-temporalExpressions, semanticAnnotation-textualEntailment, semanticAnnotation-wordSenses, syntacticAnnotation-semanticFrames, speechAnnotation, speechAnnotation-orthographicTranscription, speechAnnotation-paralanguageAnnotation, speechAnnotation-phoneticTranscription, speechAnnotation-prosodicAnnotation, speechAnnotation-soundEvents, speechAnnotation-soundToTextAlignment, speechAnnotation-speakerIdentification, speechAnnotation-speakerTurns, stemming, structuralAnnotation, structuralAnnotation-documentDivisions, structuralAnnotation-sentences, structuralAnnotation-clauses, structuralAnnotation-phrases, structuralAnnotation-words, syntacticAnnotation-subcategorizationFrames, syntacticAnnotation-dependencyTrees, syntacticAnnotation-constituencyTrees, syntacticAnnotation-chunks, syntacticosemanticAnnotation-links, translation, transliteration, modalityAnnotation-bodyMovements, modalityAnnotation-facialExpressions, modalityAnnotation-gazeEyeMovements, modalityAnnotation-handArmGestures, modalityAnnotation-handManipulationOfObjects, modalityAnnotation-headMovements, modalityAnnotation-lipMovements, other

Definition/Explanations

The annotation level of the annotated resource, or what an s/w component consumes or produces as output

annotationLevel

annotationStandoff

Usage

Recommended

Type

boolean

Definition/Explanations

Indicates whether the annotation is created inline or in a stand-off fashion.

For interoperability reasons, the recommended format is the stand-off mode.

mimeType

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values (the most popular ones for text files) from the IANA mimetype controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please, select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media MIME type recommended values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files of various formats.

documentationURL

Usage

Recommended

Type

url pattern

Definition/Explanations

Link to the documentation for the specific data format (explanations and examples)

dataFormatSpecific

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:dataFormatSpecific: aclAnthology, aimedCorpus, alvisEnrichedDocument, bioNLP, bioNLP; format-variant=ST2013a1_a2, bnc, cadixeJSON, conll2000, conll2002, conll2006, conll2007, conll2009, conll2012, dataSift, factoredTagLem, gate, genia, graf, html5Microdata, i2b2, imsCwb, jdbc, keaCorpus, lll, negraExport, pml, ptb; format-variant=chunked, ptb; format-variant=combined, relp, tiger, tupp-dz, twitter, uimaBinaryCas, uimaCASDump, web1t, xces; format-variant=ilsp

Definition/Explanations

The supplementary level of data format

characterEncoding

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please, select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8 to ensure interoperability between content and components. The element can be repeated for corpora that include files of various character encodings.
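Since UTF-8 is the preferred encoding, it can be useful to check a corpus folder before registration. The following is a minimal sketch, assuming the corpus files sit in a local directory; the function name is hypothetical and is not part of any OpenMinTeD tooling.

```python
from pathlib import Path


def non_utf8_files(corpus_dir):
    """Return the files under corpus_dir whose bytes do not decode as UTF-8."""
    offenders = []
    for path in sorted(Path(corpus_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            # A strict decode raises as soon as an invalid byte sequence is met.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            offenders.append(path)
    return offenders
```

Files reported by such a check could either be converted to UTF-8 or described with an additional characterEncoding value, since the element is repeatable.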

typesystem

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. url reference) to the typesystem used in the annotation of the resource or used by the component

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe typesystems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.

tagset

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. url reference) to the tagset used in the annotation of the resource or used by the component

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe typesystems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.

annotationMode

Usage

Recommended

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:annotationMode: manual, automatic, mixed, interactive

Definition/Explanations

Indicates whether the resource is annotated manually or by automatic processes

isAnnotatedBy

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. url reference) to the component used for the annotation of the resource

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe typesystems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.

annotationDate

Usage

Recommended

Type

date or range of dates

Definition/Explanations

The dates (either date or range of dates) in which the annotation process has taken place

Guidelines for providers of corpora

- Introduction
- Instructions for providers of corpora

Introduction

OpenMinTeD facilitates the use of TDM technologies in the scientific publications world, ranging from generic scholarly communication to literature related to specific disciplines.

Corpora in the OpenMinTeD framework refer mainly to the collections of publications that will be used as mining source in the TDM process. In fact, the OpenMinTeD platform includes a mechanism for automatically generating corpora based on user criteria selected from a faceted view of all publications provided by the OpenMinTeD partners; more details are included in the Guidelines for publications.

Corpora may also come from repositories of language resources, such as META-SHARE and CLARIN, or discipline-specific repositories, in which case they do not have to be composed of scholarly publications. Examples include reference corpora (i.e. corpora deemed representative of general language or a sublanguage usage), news corpora, collections of domain-specific texts, such as manuals, etc. as well as annotated corpora, such as treebanks, morphologically tagged golden corpora etc. These corpora are not targeted as source of mining but can be used for training components (e.g. train a language model) or for evaluating their performance or for ancillary purposes.

To be valid for registration into OpenMinTeD, all corpora must be accompanied with a metadata record conformant with the OMTD-SHARE schema, and a file with the contents must be made readily accessible during the processing operation.

The following sections present a list of instructions, requirements and recommendations that corpora must meet to interact with TDM resources.

Instructions for providers of corpora

- How to register your resources
- How to make your resources interoperable
- How to document your resources
- Further requirements for annotated corpora
- Recommended schema for corpora

How to register your resources

Corpora can be registered by authorised users.

If you wish to register a corpus, you must:

- provide a metadata record compliant with the OMTD-SHARE schema for corpora, at least at the minimal level, which you can upload to the Registry as an XML file and/or edit with the OpenMinTeD metadata editor
- provide a zipped file with the contents of the corpus or a link to a URL where the corpus is directly accessible (i.e. not a landing page); where possible, the zipped file should follow the folder structure recommended for OpenMinTeD publications, i.e. separate folders for contents, metadata records and licence documents.

If the corpus is stored at the repository of a network or infrastructure that allows harvesting (normally upon agreements made with OpenMinTeD), you can also provide the relevant identifier and this will be uploaded with the appropriate description. Where possible (and this will be appropriately indicated), the metadata description will be automatically converted to the OMTD-SHARE schema and presented to the user for further editing.
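The zipped file with separate folders can be produced with a few lines of scripting. The sketch below is only an illustration: the folder names ("contents", "metadata", "licences") and the function name are assumptions, since the guidelines ask for separate folders but do not fix their names here.

```python
import zipfile
from pathlib import Path

# Hypothetical folder names: one top-level folder per kind of file.
FOLDERS = {"contents": "contents", "metadata": "metadata", "licences": "licences"}


def package_corpus(zip_path, contents, metadata, licences):
    """Write the given files into a zip, one top-level folder per kind."""
    groups = {"contents": contents, "metadata": metadata, "licences": licences}
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for kind, files in groups.items():
            for file in map(Path, files):
                # Store each file under its folder, keeping only the file name.
                archive.write(file, arcname=f"{FOLDERS[kind]}/{file.name}")
    return zip_path
```

Files with the same name in the same group would collide in this sketch; a real packaging script would need to keep sub-paths or rename them.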

How to make your resources interoperable

In order to ensure that your corpora can be mined in the OpenMinTeD platform, you must follow the same requirements that are set for scholarly publications. You must therefore

- provide direct access to the contents of each corpus
- describe each corpus with a metadata record compatible with the OMTD-SHARE minimal schema.

In addition, the following recommendations contribute to interoperability and make your corpora easier to process:

- The preferred formats for delivering textual material are plain text, XML and PDF (not proprietary and certainly not of scanned images), which can be read by one of the existing readers.
- If appropriate for your material, use one of the more specific data formats that are already covered by readers and converters (cf. dataFormatSpecific).
- The preferred character encoding is UTF-8.

If you do not abide by them, it might still be possible to process your corpora via the OpenMinTeD platform, but this cannot be guaranteed and interoperability with other resources will suffer.

So, if you intend to create a new corpus, it is important that you take into account, from the early steps of its design, the requirements, standards, best practices and recommendations promoted by OpenMinTeD and other cooperating infrastructures.

Please, note that there are no general requirements yet for corpora to be used for ancillary purposes (e.g. for training a tool), as these are dependent on the requirements of the software that will use them and on the purpose of use.

How to document your resources

To be fully compliant with OpenMinTeD, you must

- ensure that the corpus is distributed under Open Access conditions
- include in the metadata record a link to the licence document that describes the terms and conditions under which it is provided, and attach the licence document together with the resource
- if you already have a PID for your publication (preferably DOI), make sure it is included in the metadata record (cf. identifier for more information on identifier schemes).

Further recommendations will contribute to the interoperability of your resources:

- Further adoption of standards such as the JATS article tag suite or TEI P5 guidelines for annotating the inner structure of texts is recommended.
- Please, ensure that you version all your resources and label the versions in an unambiguous way, preferably following the Semantic Versioning recommendations.
- It is important that you provide the appropriate documentation for your resource (e.g. publications about the design and construction of the corpus etc.), which you should also version along with the corpus and add as reference to your metadata record.
- Recommend one of the publications about your resource as the one to be cited for scholarly attribution and add this information in the metadata record.
- Make sure that you fill in the metadata record all the elements required for citing your resource, i.e. the creator of the resource, a title, the resource type and an identifier, and optionally, the publication date, the version and the publisher or distributor.
- Use standard classification vocabularies (e.g. MeSH, DDC, LCSH etc.) for adding classification tags to your material and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing etc.).
- In all cases where linking to other resources or entities (e.g. persons, projects etc.) is added in the metadata records, please try to do this through unique and persistent identifiers of authority lists and sources, to the extent possible, documenting also the authority and/or scheme it adheres to.

For citation, OpenMinTeD endorses the Joint Declaration of Data Citation Principles, as well as the more specialised RDA recommendations for data citation of evolving data and DataCite guidelines.
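The citation recommendation above can be checked mechanically before a record is submitted. This is a rough sketch, assuming the metadata record has been loaded into a flat dictionary keyed by element name; that layout and the function name are illustrative assumptions, not the OMTD-SHARE XML structure.

```python
# The four elements named above as required for citing a resource,
# mapped to element names from the minimal schema for corpora.
REQUIRED_FOR_CITATION = ("resourceCreator", "resourceName", "resourceType", "identifier")


def missing_citation_elements(record):
    """Return the required citation elements that are absent or empty."""
    return [name for name in REQUIRED_FOR_CITATION if not record.get(name)]
```

An empty result means the record carries at least the creator, title, resource type and identifier; the optional elements (publication date, version, publisher or distributor) are not checked here.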

Further requirements for annotated/processed corpora

Corpora can be registered in the OpenMinTeD platform

- in an unprocessed format and annotated by the operation of TDM software also registered in the platform and/or
- in an already processed format; in this case, they must be included as a separate resource with its own metadata record including a specific set of metadata elements (the same as for annotated publications).

It should be noted that corpora annotated by means of the OpenMinTeD platform will be automatically assigned the appropriate values for these elements.

Recommended schema for corpora

Overview

This section includes a synopsis of the minimal schema for corpora, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements. Additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are not presented here, as they are to be handled by the OMTD platform.

For annotated corpora, see here.

OMTD-SHARE element                                                            Usage
resourceType                                                                  M
resourceName                                                                  M
description                                                                   M
identifier                                                                    M
version                                                                       M
licence or rightsStmtName & rightsStmtURL (one of the two must be provided)   M
nonStandardLicenceName                                                        R when applicable
nonStandardLicenceTermsURL                                                    M when applicable
version of licence                                                            M
distributionMedium                                                            M
downloadURL                                                                   M when applicable
contactEmail or landingPage (one of the two must be provided)                 M
contactPerson (identifier or personName)                                      R
contactGroup (identifier or organizationName)                                 R
mustBeCitedWith                                                               R
resourceCreator                                                               R
creationDate                                                                  R (M for query-built corpora)
corpusType                                                                    M
mediaType                                                                     M
lingualityType                                                                M
multilingualityType                                                           M when applicable
language                                                                      M
sizePerLanguage                                                               M
size                                                                          M
mimeType                                                                      R
characterEncoding                                                             R
domain                                                                        R
subject                                                                       R
keyword                                                                       R
userQuery                                                                     M when applicable
relationType                                                                  R
relatedResource1                                                              R
relatedResource2                                                              R
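Several entries in the synopsis are mandatory only under conditions, which are spelled out in the per-element descriptions. The sketch below shows one way these conditions could be checked; it assumes a flat dictionary keyed by element name (an illustrative layout, not the actual XML), and it reads "rightsStmtName & rightsStmtURL" as requiring both values together.

```python
def conditional_errors(record):
    """Check the 'mandatory under conditions' rules on a flat dict record."""
    errors = []
    # Reading of the table row: the rights statement counts only with name and URL.
    has_rights_stmt = record.get("rightsStmtName") and record.get("rightsStmtURL")
    if not (record.get("licence") or has_rights_stmt):
        errors.append("licence or rightsStmtName & rightsStmtURL must be provided")
    if not (record.get("contactEmail") or record.get("landingPage")):
        errors.append("contactEmail or landingPage must be provided")
    if record.get("distributionMedium") == "downloadable" and not record.get("downloadURL"):
        errors.append("downloadURL is required when distributionMedium is downloadable")
    if record.get("lingualityType") in ("bilingual", "multilingual") and not record.get(
        "multilingualityType"
    ):
        errors.append("multilingualityType is required for bilingual or multilingual corpora")
    return errors
```

The unconditional M elements are deliberately left out; they only need a presence check.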

resourceName

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

The full name by which the resource is known

Recommended usage

Please, provide a short but descriptive and unique name for the resource, e.g. “British National Corpus” instead of just “corpus of English”. Provide the name in English; if you want to add names in other languages, you can use the “lang” attribute. For corpora created through the OMTD corpus building process, please use an indicative name with the sources and the dates (e.g. "Subcorpus of OpenAIRE with biochemistry articles created on 4/10/2016").

Relation to other metadata schemas

DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title

resourceType

Usage

Mandatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component

Definition/Explanations

Specifies the type of the resource being described or the type of the resource that a tool or service takes as input or produces as output

Recommended usage

For corpora, the fixed value "corpus" must be added automatically

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; recommended usage for text corpora is to use "dataset" but the values "collection" and "text" can also be used

description

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

Provides the description of the resource in prose

Recommended usage

Give a brief yet informative description of the corpus contents, mentioning at least language(s), subject(s)/domain(s) and, if possible, size and provenance. Please, provide the text in English; if you want to add texts in other languages, you can add them using the “lang” attribute to specify the language.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with value "abstract" for datacite:descriptionType

identifier

Usage

Mandatory

Type

free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI

Definition/Explanations

Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource

Recommended usage

Provide a unique identifier already assigned by an authoritative source; you can use either

- the attribute "resourceIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.) or,
- if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to.

If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD. For corpora created through the OMTD corpus building process, the identifier must be assigned automatically.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)

version

Usage

Recommended

Type

free text

Definition/Explanations

Any string, usually a number, that identifies the version of a resource

Recommended usage

Please, keep this only for versions of the same resource (e.g. corrected, enlarged etc.) and not for variants or for versions with additional or different information. The recommended practice for versioning is to follow the semantic versioning guidelines (http://semver.org/).

Relation to other metadata schemas

DCMI: skos:exactMatch dct:hasVersion
DataCite 4.0: skos:exactMatch datacite:Version
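Because the element is free text, a quick check can tell whether a version label follows the semantic versioning pattern. The sketch below uses a simplified regular expression for MAJOR.MINOR.PATCH with optional pre-release and build parts; it is an illustration, not the official grammar from semver.org, and the function name is hypothetical.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH, no leading zeros in the numeric parts,
# optionally followed by "-prerelease" and/or "+build" identifiers.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
)


def is_semantic_version(value):
    """Return True if the whole string looks like a semantic version."""
    return SEMVER.fullmatch(value) is not None
```

Labels such as "1.0" or "v2" fail this check; they remain valid values for the element, which accepts any string.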

licence

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms

Definition/Explanations

The licence of use for the resource

Recommended usage

You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:

- use the element "licence" and select one of the recommended licences; please, note that the list contains licences intended for data resources & components mixed together; for components the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family;
- if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";
- you can also use the "rightsStatementName" and the "rightsStatementURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please, note that this is an option used mainly to facilitate end-users in accessing your resource, while you are strongly advised to properly license your resource.

For corpora created through the OMTD corpus building process, the licence values can be automatically aggregated from the licence values of the metadata records included in them; in any case, the "rightsStmtName" can also be computed automatically.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights

rightsStmtName

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a url.

The current list of predefined values comes from OpenAIRE, but it's under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights

rightsStmtURL

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI

version of licence

Usage

Mandatory

Type

free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC-licences and "2.0" for the META-SHARE-NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)

nonStandardLicenceName

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

free text

Definition/Explanations

The name with which a licence is known; to be used for licences not included in the pre-defined list of recommended licences

Recommended usage

Please, provide the name of the licence if it's already known or supply one that can uniquely identify it.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:title (for dct:licenseDocument)

nonStandardLicenceTermsURL

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a url containing the text of a licence not included in the predefined list or describing the terms of use for a language resource or terms of service for web services

Recommended usage

Please, provide the link to the full text document of the licence. Please note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI

distributionMedium

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other

Definition/Explanations

Specifies the medium (channel) used for delivery or providing access to the resource

Recommended usage

Please, use one of the provided values to indicate the medium of distribution. For corpora created through the OMTD corpus building process, the default value is "downloadable". Please, note that if the publication is distributed in different mediums under different terms of use or licences, you can repeat the whole set of elements ("distributionInfo") to describe them.

downloadURL

Usage

Mandatory under conditions

Conditions for usage

if distributionMedium = downloadable

Definition/Explanations

Any url where the resource can be downloaded from

Recommended usage

Please, use for corpora whose actual content is not uploaded in OpenMinTeD; in this case, please ensure that the URL link leads to the actual content of the corpus and not to a landing page. For corpora created through the OMTD corpus building process, the full content is already uploaded in OpenMinTeD, and therefore the downloadURL is automatically inserted (public url link from which the corpus can be downloaded).

contactEmail

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

email pattern

Definition/Explanations

A general email address that can be used as contact point for a resource (e.g. resource@example.com)

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address at "contactEmail", or
- provide at "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements. For corpora created through the OMTD corpus building process, a contactEmail is automatically inserted, filled in with the email address of the user that has built it.

landingPage

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

URL pattern

Definition/Explanations

A URL used as the landing page of a resource providing general information; for instance, it may present a description of the resource, its creators and possibly include links to the URL where it can be accessed from

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address at "contactEmail", or
- provide at "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements. For corpora created through the OMTD corpus building process, a landingPage will also be automatically created with information on the user query and the contents of the results.

contactPerson (identifier or personName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) that is/are responsible for providing further information regarding the resource

Recommended usage

The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please select also the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname" at least in English; if you want to add names in other languages, you can use the “lang” attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.

Relation to other metadata schemas

DataCite 4.0: contributor with datacite:contributorType = "ContactPerson", datacite:contributorName (familyName & givenName) or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI
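When an ORCID is given as the person identifier, its check character can be verified before the record is submitted. The sketch below implements the ISO 7064 MOD 11-2 check used by ORCID iDs and builds the preferred "Surname, Firstname" form; both function names are hypothetical helpers, not part of the OMTD-SHARE tooling.

```python
import re

ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def is_valid_orcid(orcid):
    """Validate the shape and the MOD 11-2 check character of an ORCID iD."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def person_name(surname, first_name):
    """Format a name in the preferred "Surname, Firstname" layout."""
    return f"{surname.strip()}, {first_name.strip()}"
```

The check only confirms that the identifier is well formed; it does not confirm that the iD is registered or belongs to the named person.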

contactGroup (identifier or organizationName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the group(s) that is/are responsible for providing further information regarding the resource

Recommended usage

The recommended way for referring to a group (currently modelled as an organization) is by giving their identifier (e.g. ISNI, fundref); if you provide the identifier, please select also the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.

mustBeCitedWith

Usage

Recommended

Type

free text or identifier

Definition/Explanations

Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)

Recommended usage

The preferred option to refer to a publication is by providing its unique identifier already assigned by an authoritative source; the preferred identifier for publications is DOI; you can use either

- the attribute "publicationIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, ISBN etc.) or,
- if the scheme is not listed among them, the "other" value together with the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to.

If you don't know the publication identifier, you can provide the full bibliographic record in free text format. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.

resourceCreator

Usage

Recommended

Type

person or organization, both encoded with identifier or multilingual free text

Attributes

for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) or organization(s) that has/have created the resource

Recommended usage

The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please select also the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname" at least in English; if you want to add names in other languages, you can use the “lang” attribute.

The recommended way for referring to an organization is by giving their identifier (e.g. ISNI); if you provide the identifier, please select also the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute.

The element can also be repeated to encode multiple persons/organizations. For corpora created through the OMTD corpus building process, the resource creator is considered to be the person that has put together the corpus through the user query.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:creator
DataCite 4.0: skos:closeMatch datacite:Creator with datacite:creatorName (familyName & givenName) or datacite:nameIdentifier & datacite:nameIdentifierScheme & datacite:schemeURI

creationDate

Usage

Recommended

Type

date pattern or date range

Definition/Explanations

The date of the creation of the resource, expressed as a range between starting and end date or exact date

Recommended usage

Please, indicate at least year of creation, or time interval. For corpora created through the OMTD corpus building process, the creationDate is automatically inserted.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:created
DataCite 4.0: skos:exactMatch datacite:CreationDate

corpusType

Usage

Mandatory

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:corpusType: raw, annotated, annotations

Definition/Explanations

The subtype of the corpus in terms of processing (i.e. whether it is raw/unprocessed, annotated or composed only of annotations with links to the original raw corpus)

Recommended usage

Please, select the appropriate value. For corpora created through the corpus building process of OMTD, the value is automatically set to "raw"

Relation to other metadata schemas

DCMI: skos:narrowMatch dc:type

106

mediaType

Usage

Mandatory

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:mediaType: text, audio, video, image

Definition/Explanations

Specifies the media type of the resource and basically corresponds to the physical medium of the content representation. Each media type is described through a distinctive set of features. A resource may consist of parts attributed to different types of media. A component may take as input/output more than one media type.

Recommended usage

OpenMinTeD only handles text resources, so the default value is set to "text".

lingualityType

Usage

Mandatory

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:lingualityType: monolingual, bilingual, multilingual

Definition/Explanations

Indicates whether the resource contains one, two or more languages

Recommended usage

Please select one of the values. For corpora created through the OMTD corpus building process, the value can be automatically computed.

multilingualityType

Usage

Mandatory under conditions

Conditions for usage

if lingualityType = bilingual or multilingual

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:multilingualityType: parallel, comparable, multilingualSingleText, originalTranslationsInSameText, other

Definition/Explanations

Indicates whether the corpus is parallel, comparable or mixed

Recommended usage

Please select one of the values.
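The conditional rule for multilingualityType can be sketched as a small check. This is only an illustration: OMTD-SHARE records are not Python dictionaries, so the `record` dict and the function name `check_linguality` are assumptions made for the example; the value lists are copied from the controlled vocabularies above.

```python
# Values copied from the ms:lingualityType and ms:multilingualityType vocabularies.
LINGUALITY_TYPES = {"monolingual", "bilingual", "multilingual"}
MULTILINGUALITY_TYPES = {
    "parallel",
    "comparable",
    "multilingualSingleText",
    "originalTranslationsInSameText",
    "other",
}


def check_linguality(record):
    """Return a list of problems for a dict-shaped record (hypothetical layout)."""
    problems = []
    linguality = record.get("lingualityType")
    if linguality not in LINGUALITY_TYPES:
        problems.append("lingualityType is mandatory and must use a listed value")
    # multilingualityType becomes mandatory only for bilingual/multilingual corpora.
    if linguality in {"bilingual", "multilingual"}:
        if record.get("multilingualityType") not in MULTILINGUALITY_TYPES:
            problems.append("multilingualityType is mandatory for this lingualityType")
    return problems
```

A monolingual record passes without multilingualityType, while a bilingual record without it is reported.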

language

Usage

Mandatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) of the corpus according to the IETF BCP 47 guidelines.

For corpora created through the OMTD corpus building process, the value can be computed automatically.

The element can be repeated to encode multiple languages.

Recommended usage

Please enter the language and, if needed, the region, script and variant identifier that best fits the language of the document (e.g. en-US), according to the IETF BCP 47 guidelines.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:language
DataCite 4.0: skos:closeMatch datacite:Language
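To illustrate how a BCP 47 tag such as en-US breaks down into the four parts named above, here is a minimal sketch. It covers only a simplified subset of the BCP 47 grammar (language, optional script, optional region, optional variants) and ignores extensions, private-use and grandfathered tags; the function name `split_language_tag` and the returned dictionary layout are assumptions for the example, not part of OMTD-SHARE.

```python
import re

# Simplified subset of BCP 47: language[-script][-region][-variant...]
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)$"
)


def split_language_tag(tag):
    """Split a tag into its parts, or return None if it is outside the subset."""
    match = _TAG.match(tag)
    if match is None:
        return None
    script = match.group("script")
    region = match.group("region")
    return {
        "languageId": match.group("language").lower(),
        "scriptId": script.title() if script else None,
        "regionId": region.upper() if region else None,
        "variantIds": [v for v in match.group("variants").split("-") if v],
    }
```

For production use, a maintained BCP 47 library and the IANA subtag registry would be a safer basis than this pattern.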

sizePerLanguage

Usage

Recommended

Type

size pattern (size and sizeUnit)

Definition/Explanations

Provides information on the size per language subset

Recommended usage

You may indicate the size of the subsets of the corpus per language; to do that, fill in the appropriate number (without spaces) and select the appropriate sizeUnit (e.g. 20000 words). For corpora created through the OMTD corpus building process, this can be automatically computed, for instance, for files/publications.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:extent
DataCite 4.0: skos:closeMatch datacite:size

size

Usage

Mandatory

Type

size pattern (size and sizeUnit)

Definition/Explanations

Provides information on the size of the resource or of resource parts.

Recommended usage

You may indicate the size of the entire corpus (or corpus parts) by filling in the appropriate number and selecting the appropriate sizeUnit (e.g. 20000 words). The preferred sizeUnit is words or sentences. If nothing else is known, please indicate at least files. For corpora created through the OMTD corpus building process, this can be automatically computed, for instance, for files/publications.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:extent
DataCite 4.0: skos:closeMatch datacite:size
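The size pattern (a number written without spaces plus a sizeUnit) can be checked with a few lines. This is a sketch under assumptions: the helper name `make_size` and the dictionary it returns are invented for illustration, and the list of valid sizeUnit values is not reproduced here.

```python
def make_size(size, size_unit):
    """Build a size entry, rejecting numbers written with spaces or separators."""
    text = str(size)
    # "20000" is accepted; "20 000" or "20,000" is rejected.
    if not (text.isascii() and text.isdigit()):
        raise ValueError("size must be a plain non-negative integer, e.g. 20000")
    if not size_unit:
        raise ValueError("sizeUnit is required, e.g. words, sentences or files")
    return {"size": int(text), "sizeUnit": size_unit}
```

For instance, `make_size("20000", "words")` is accepted, while `make_size("20 000", "words")` raises a `ValueError`.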

mimeType

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values, the most popular ones for text files, from the IANA mimetype controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, PREFERABLY FROM THE IANA MEDIA MIMETYPE RECOMMENDED VALUES (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files of various formats.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:format
DataCite 4.0: skos:closeMatch datacite:Format

characterEncoding

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8, to ensure interoperability between content and components. The element can be repeated for corpora that include files of various character encodings.

domain

Usage

Recommended

Type

free text

Attributes

ms:classificationSchemeName and ms:schemeURI

Definition/Explanations

Domain of the corpus

Recommended usage

It is recommended that domain values are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified; if you do, please use the classificationSchemeName to indicate the source; if this is not included in the list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding the domain values is the identifier of the domain in the scheme; further instructions on the standardization of the format will be provided.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:subject
DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI

subject

Usage

Recommended

Type

free text

Attributes

ms:classificationSchemeName and ms:schemeURI

Definition/Explanations

Subject or topic of the corpus

Recommended usage

It is recommended that the subjects are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/), and that the source is identified; if you do, please use the classificationSchemeName to indicate the source; if this is not included in the list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding the subject values is the identifier of the subject in the scheme; further instructions on the standardization of the format will be provided.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:subject
DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI

keyword

Usage

Recommended

Type

free text

Definition/Explanations

Words used for indexing the corpus

Recommended usage

A free text element used for encoding keywords for the classification of the publication, only in English; please encode one word/phrase each time and repeat the element for multiple keywords.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:subject

userQuery

Usage

Mandatory when applicable

Type

free text

Definition/Explanations

The query text that has created the corpus of scholarly publications

Recommended usage

To be filled in automatically during the OMTD corpus building process

relationType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it)

Recommended usage

For corpora, the recommended relations are isVersionOf and isSimilarTo, but any relationType can be used as appropriate.

Relation to other metadata schemas

DataCite 4.0: skos:closeMatch datacite:relationType

relatedResource1

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. url reference) to the source resource related to the target resource (relatedResource2) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.

relatedResource2

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. url reference) to the target resource related to the source resource (relatedResource1) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.

Metadata schema for annotated corpora

Annotated corpora are documented as separate resources

- including only the annotated data, with a link to the raw corpus and its own set of metadata elements providing information on the annotation process, tool etc.
- or as a set of raw and annotated files together, with a metadata record that includes all the appropriate elements for raw corpora (cf. above) with the additional set of metadata elements for annotations, i.e. all the following elements except for "resourceIdentifier".

OMTD-SHAREelement Usage

resourceIdentifier M

annotationLevel M

annotationStandoff R

mimeType R

dataFormatSpecific R

documentationURL R

characterEncoding R

typesystem R

tagset R

annotationMode R

isAnnotatedBy R

annotationDate R


Guidelines for providers of ancillary knowledge resources

Introduction
Instructions for providers of ancillary knowledge resources

Introduction

Many TDM tools and services make use of ancillary knowledge resources. By knowledge resources, we mean information from some domain or area of human endeavor (e.g. linguistics, agriculture, or the social sciences), represented in a form that can be used to solve problems computationally in that domain or area [1]. Creation of such knowledge resources is widespread both in linguistics and in many domains where informatics is applied. These knowledge resources typically include controlled vocabularies, terminologies, lexica, ontologies, and so on.

As OpenMinTeD is about applying TDM to end-user domains, the resources used in those domains are of primary importance. Similarly, as text is important to OpenMinTeD tools and services, linguistic resources (e.g. resources that describe parts of speech) are also important.

OpenMinTeD tools and services may make use of these resources in order to process text. For example, a service may make use of a dictionary of archaeological terms when processing object descriptions. Or, a service may make use of parts of speech to find the adjectives in a document, and use this information to help determine the sentiment of the document.

In order to make it easier to share the results of TDM, and in order to allow TDM tools and services to work together, OpenMinTeD makes a number of recommendations about how knowledge resources are represented. Knowledge resources that do not follow these recommendations can of course be used; however, interoperability will be reduced.

The OpenMinTeD recommendations on knowledge resources are based on the Linked Data paradigm. By "Linked Data", we mean data that is created and made available with the use of semantic web technologies and formats (e.g. RDF, OWL, SPARQL) and, most importantly, that is interrelated with other data.

[1] Poole, David and Alan Mackworth (2010) Artificial Intelligence, Cambridge University Press

Instructions for providers of ancillary knowledge resources

How to register your knowledge resources
How to make your knowledge resources interoperable
How to document your knowledge resources
Recommended schema for lexical/conceptual resources, incl. annotation resources
Recommended schema for models

How to register your knowledge resources

Ancillary knowledge resources can be registered by authorised users as decided in the OpenMinTeD Policies.

If you wish to register such a resource, the following requirements apply, depending on the mode of registration:

- if the resource is being provided for upload to the OpenMinTeD registry, please package it as a zip file preserving the recommended folder structure
- if the resource is available as part of a Maven artifact, please provide the appropriate Maven coordinates
- if the resource is offered with a SPARQL endpoint or at a URL, please type in the relevant link.

In all cases, you must also

- provide a metadata record compliant with the OMTD-SHARE schema.

Where possible, e.g. in the case of providing a Maven artifact, metadata may be, at least partially, converted from the existing descriptors. In all cases, you will be notified of the availability of converted metadata at the time of uploading.

How to make your knowledge resources interoperable

In addition, if you want to be fully compliant with the OpenMinTeD interoperability requirements, please ensure that

- you provide the resource in a standard format, preferably XML or JSON-based syntax, or any other RDF serialisation format (e.g. Turtle or N3)
- all elements in the knowledge resource are identified with a URI; for Linked Data resources, the following identifiers should be used:
  - JSON-LD: the @id keyword
  - RDF/XML: the attributes xml:base, rdf:ID and rdf:about
  - XML: the xml:id attribute
- you register knowledge resources independently of any component that uses them, e.g. in a separate Maven artifact.

In the case that you provide the resource

- in another format, given that adherence to Linked Data standards is not imposed
- packaged in Maven artifacts with the components that use it, at the expense, however, of reusability

you still qualify for partial compliance.
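For JSON-LD resources, the requirement that every element carries a URI can be checked mechanically. The sketch below walks a parsed JSON-LD document and reports node objects without an @id keyword. It is a simplification made for illustration: the function name `nodes_without_id` and the example.org identifiers are hypothetical, only expanded or compacted documents that use the literal keywords @id, @value, @graph and @context are handled, and keyword aliases defined in a context are not resolved.

```python
import json


def nodes_without_id(node, path="$"):
    """Yield the paths of JSON-LD node objects that have no "@id" keyword."""
    if isinstance(node, dict):
        # Value objects and graph containers are not nodes that need an identifier.
        if not {"@id", "@value", "@graph"} & node.keys():
            yield path
        for key, value in node.items():
            if key != "@context":
                yield from nodes_without_id(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from nodes_without_id(item, f"{path}[{index}]")


# Hypothetical two-term resource; the second term lacks an identifier.
document = json.loads("""
{
  "@context": {"skos": "http://www.w3.org/2004/02/skos/core#"},
  "@graph": [
    {"@id": "http://example.org/term/1", "skos:prefLabel": "amphora"},
    {"skos:prefLabel": "kylix"}
  ]
}
""")
missing = list(nodes_without_id(document))
```

Here `missing` contains the single path `$.@graph[1]`, pointing at the term that still needs a URI.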

How to document your knowledge resources

To be fully compatible with OpenMinTeD, you must

- ensure that the resource is distributed under Open Access conditions
- include in the metadata record a link to the licence document that describes the terms and conditions under which it is provided, and attach the licence document together with the resource
- if you already have a PID for your resource (e.g. a URI or a HANDLE), make sure it is included in the metadata record (cf. identifier for more information)
- provide linkage between your resource and other resources (domain-specific or generic resources); for links between knowledge resources in the Linked Data paradigm, mapping should be expressed through RDF statements, using relations from SKOS, together with the following OWL and RDF object properties: owl:sameAs, owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, rdfs:subPropertyOf
- version all your resources and label the versions in an unambiguous way, preferably following the Semantic Versioning recommendations.
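The linkage requirement can be illustrated with a short sketch that writes mapping statements in N-Triples syntax. The OWL and RDFS properties are the ones listed above; the guideline does not say which SKOS relations are meant, so the use of the SKOS mapping properties here is an assumption, as are the function name `mapping_statement` and the example.org URIs.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Properties named in the guideline, plus the SKOS mapping properties (assumed).
ALLOWED_PROPERTIES = {
    OWL + "sameAs",
    OWL + "equivalentClass",
    OWL + "equivalentProperty",
    RDFS + "subClassOf",
    RDFS + "subPropertyOf",
    SKOS + "exactMatch",
    SKOS + "closeMatch",
    SKOS + "broadMatch",
    SKOS + "narrowMatch",
    SKOS + "relatedMatch",
}


def mapping_statement(subject, predicate, obj):
    """Return one N-Triples line linking two URI-identified elements."""
    if predicate not in ALLOWED_PROPERTIES:
        raise ValueError(f"unexpected mapping property: {predicate}")
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical link between a term in your resource and a term in another one.
line = mapping_statement(
    "http://example.org/lexicon/amphora",
    SKOS + "exactMatch",
    "http://example.org/thesaurus/c_123",
)
```

In practice an RDF library would normally handle serialisation and escaping; the string formatting above only works for plain URIs.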

The following recommendations contribute to interoperability but are not yet enforced:

- It is important that you provide the appropriate documentation for your resource (e.g. publications about the design and construction of the corpus etc.), which you should also version along with the knowledge resource and add as reference to your metadata record.
- Recommend one of the publications about your resource as the one to be cited for scholarly attribution and add this information in the metadata record.
- Make sure that you fill in the metadata record all the elements required for citing your resource [1], i.e. the creator of the resource, a title, the resource type and an identifier, and optionally, the publication date, the version and the publisher or distributor of the resource.
- Use standard classification vocabularies (e.g. MeSH, DDC, LCSH etc.) for adding classification tags to your material and specify the vocabulary you use in the metadata record; provide at least one broad category for your material (e.g. life sciences, computing etc.).
- In all cases where linking to other resources or entities (e.g. persons, projects etc.) is added in the metadata records, please try to do this through unique and persistent identifiers of authority lists and sources, to the extent possible, documenting also the authority and/or scheme it adheres to.

The following sections include a synopsis of the minimal schemas for ancillary knowledge resources, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements per resource type, given that knowledge resources may take one of the following resource types:

- lexical/conceptual resource: reserved not only for lexica, ontologies, term lists, glossaries etc. but also for any resource that can be used for annotation purposes, i.e. linguistic tagsets, typesystems etc.
- language description: reserved mainly for computational grammars
- model: for machine learning and statistical models.

It should also be noted that additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are not presented here, as they are to be handled by the OMTD platform.

[1] For citation, OpenMinTeD endorses the Joint Declaration of Data Citation Principles, as well as the more specialised RDA recommendations for data citation of evolving data and DataCite guidelines.

Recommended schema for lexical/conceptual resources, incl. annotation resources

OMTD-SHAREelement Usage

resourceType M

resourceName M

description M

identifier M

version M

distributionMedium M

licence or rightsStmtName & rightsStmtURL (one of the two must be provided) M

version of licence M

distributionMedium M

downloadURL M when applicable

contactEmail or landingPage (one of the two must be provided) M

contactPerson (identifier or personName) R

contactGroup (identifier or organizationName) R

mustBeCitedWith R

lexicalConceptualResourceType M

encodingLevel R

linguisticInformation R

conformanceToStandardsBestPractices R

lingualityType M

language M

metalanguage R

size & sizeUnit M

mimeType R

characterEncoding R

domain R

relationType R

relatedResource1 R

relatedResource2 R


resourceType

Usage

Mandatory

Type

Closed controlled vocabulary

Attributes

Controlled vocabulary reference and/or values

ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component

Definition/Explanations

Specifies the type of the resource being described or the type of the resource that a component takes as input or produces as output

Recommended usage

For lexical/conceptual resources, the fixed value "lexicalConceptualResource" must be added automatically

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; recommended usage for lexical/conceptual resources is to use "dataset" but the values "collection" and "text" can also be used

resourceName

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

The full name by which the resource is known

Recommended usage

Please provide a short but descriptive and unique name for the resource, e.g. “Greek PAROLE lexicon” instead of just “a monolingual lexicon of Greek”. Provide the name in English; if you want to add names in other languages, you can use the “lang” attribute.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title

description

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

Provides the description of the resource in prose

Recommended usage

Give a brief yet informative description of the resource contents, mentioning at least language(s), subject(s)/domain(s) and, if possible, size and provenance. Please provide the text in English; if you want to add texts in other languages, you can add them using the “lang” attribute to specify the language.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:abstract
DataCite 4.0: skos:exactMatch datacite:description with value "abstract" for datacite:descriptionType

identifier

Usage

Mandatory

Type

free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI

Definition/Explanations

Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource

Recommended usage

Provide a unique identifier already assigned by an authoritative source; you can use either

- the attribute "resourceIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.) or,
- if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to.

If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)

licence

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms

Definition/Explanations

The licence of use for the resource

Recommended usage

You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:

- use the element "licence" and select one of the recommended licences; please note that the list contains licences intended for data resources & components mixed together; for components the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family;
- if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";
- you can also use the "rightsStatementName" and the "rightsStatementURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please note that this is an option used mainly to facilitate end-users in accessing your resource, while you are strongly advised to properly license your resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights

rightsStmtName

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a url.

The current list of predefined values comes from OpenAIRE, but it's under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights

rightsStmtURL

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI

nonStandardLicenceName

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

free text

Definition/Explanations

The name with which a licence is known; to be used for licences not included in the pre-defined list of recommended licences

Recommended usage

Please provide the name of the licence if it's already known or supply one that can uniquely identify it.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:title (for dct:licenseDocument)

nonStandardLicenceTermsURL

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a url containing the text of a licence not included in the predefined list or describing the terms of use for a language resource or terms of service for web services

Recommended usage

Please provide the link to the full text document of the licence. Please note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI

version of licence

Usage

Mandatory

Type

free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC-licences and "2.0" for the META-SHARE-NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)

distributionMedium

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other

Definition/Explanations

Specifies the medium (channel) used for delivery or providing access to the resource

Recommended usage

Please use one of the provided values to indicate the medium of distribution. For interoperability reasons, the recommended way of providing annotation resources (e.g. tagsets, ontologies etc.) is to distribute them in a downloadable form or in a way that can be easily accessed by the s/w

downloadURL

Usage

Mandatory under conditions

Conditions for usage

if distributionMedium = downloadable

Definition/Explanations

Any url where the resource can be downloaded from

Recommended usage

Please use for resources whose actual content is not uploaded in OpenMinTeD; in this case, please ensure that the URL link leads to the resource itself and not to a landing page.

accessURL

Usage

Mandatory under conditions

Conditions for usage

if distributionMedium = webExecutable or accessibleThroughInterface

Definition/Explanations

A landing page, feed, SPARQL endpoint etc. that gives access to the resource or where the web service/workflow is executed

Recommended usage

Please use for resources that are "accessibleThroughInterface" or "webExecutable"
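The conditions on downloadURL and accessURL both depend on the value of distributionMedium, which can be expressed as a small lookup table. This is an illustrative sketch only: the dict-shaped `record` and the names `REQUIRED_URL_BY_MEDIUM` and `missing_url_element` are assumptions, not part of the OMTD-SHARE schema or platform.

```python
# Which URL element each distributionMedium value makes mandatory.
REQUIRED_URL_BY_MEDIUM = {
    "downloadable": "downloadURL",
    "webExecutable": "accessURL",
    "accessibleThroughInterface": "accessURL",
}


def missing_url_element(record):
    """Return the name of the URL element that is required but absent, else None."""
    required = REQUIRED_URL_BY_MEDIUM.get(record.get("distributionMedium"))
    if required and not record.get(required):
        return required
    return None
```

Media such as CD-ROM or paperCopy have no entry in the table, so no URL element is demanded for them.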

contactEmail

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

email pattern

Definition/Explanations

A general email address that can be used as contact point for a resource (e.g. resource@example.com)

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address at the "contactEmail" address, or
- provide at "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements.

landingPage

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

URL pattern

Definition/Explanations

A URL used as the landing page of a resource providing general information; for instance, it may present a description of the resource, its creators and possibly include links to the URL where it can be accessed from

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address at the "contactEmail" address, or
- provide at "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements.

contactPerson (identifier or personName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) that is/are responsible for providing further information regarding the resource

Recommended usage

The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please select also the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname" at least in English; if you want to add names in other languages, you can use the “lang” attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.

Relation to other metadata schemas

DataCite 4.0: contributor with datacite:contributorType="ContactPerson", datacite:contributorName (familyName & givenName) or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI
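
Since ORCID is the preferred person identifier, it can help to catch mistyped values before they are entered. The sketch below checks the final character of an ORCID iD with the ISO 7064 MOD 11-2 algorithm that ORCID uses for its checksum; the function names are illustrative, and the check says nothing about whether the iD is actually registered.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape 0000-0000-0000-000X and the final check character."""
    compact = orcid.replace("-", "")
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15].upper()
```

For example, `is_valid_orcid("0000-0002-1825-0097")` holds, while changing the last digit to 8 makes the check fail.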

contactGroup (identifier or organizationName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the group(s) that is/are responsible for providing further information regarding the resource

Recommended usage

The recommended way for referring to a group (currently modelled as an organization) is by giving their identifier (e.g. ISNI, fundref); if you provide the identifier, please select also the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.

mustBeCitedWith

Usage

Recommended

Type

identifier or free text

Definition/Explanations

Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)

Recommended usage

The preferred option to refer to a publication is by providing its unique identifier already assigned by an authoritative source; the preferred identifier for publications is DOI; you can use either the attribute "publicationIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, ISBN etc.) or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If you don't know the publication identifier, you can provide the full bibliographic record as free text. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.

lexicalConceptualResourceType

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:lexicalConceptualResourceType: wordList, computationalLexicon, ontology, wordnet, thesaurus, framenet, terminologicalResource, machineReadableDictionary, lexicon, typesystem, tagset, mappingOfResources, other

Definition/Explanations

Specifies the type of lexical/conceptual resources

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type

encodingLevel

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:encodingLevel: phonetics, phonology, semantics, morphology, syntax, pragmatics, other

Definition/Explanations

Information on the contents of the lexicalConceptualResource as regards the linguistic level of analysis

linguisticInformation

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:linguisticInformation: accentuation, lemma, lemma-MultiWordUnits, lemma-Variants, lemma-Abbreviations, lemma-Compounds, lemma-CliticForms, partOfSpeech, morpho-Features, morpho-Case, morpho-Gender, morpho-Number, morpho-Degree, morpho-IrregularForms, morpho-Mood, morpho-Tense, morpho-Person, morpho-Aspect, morpho-Voice, morpho-Auxiliary, morpho-Inflection, morpho-Reflexivity, syntax-SubcatFrame, semantics-Traits, semantics-SemanticClass, semantics-CrossReferences, semantics-Relations, semantics-Relations-Hyponyms, semantics-Relations-Hyperonyms, semantics-Relations-Synonyms, semantics-Relations-Antonyms, semantics-Relations-Troponyms, semantics-Relations-Meronyms, usage-Frequency, usage-Register, usage-Collocations, usage-Examples, usage-Notes, definition/gloss, translationEquivalent, phonetics-Transcription, semantics-Domain, semantics-EventType, semantics-SemanticRoles, statisticalProperties, morpho-Derivation, semantics-QualiaStructure, syntacticoSemanticLinks, other

Definition/Explanations

A more detailed account of the linguistic information contained in the lexicalConceptualResource

Relation to other metadata schemas

DataCite 4.0: creator with creatorName or nameIdentifier & nameIdentifierScheme & schemeURI; N.B. creatorName familyName & givenName in v4

conformanceToStandardsBestPractices

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:conformanceToStandardsBestPractices: AgroVoc, ALVIS, ARGO, BML, CES, DKPro_Core, EAGLES, EDAMontology, ELSST, EML, EMMA, GATE, GESIS, GMX, GrAF, HamNoSys, HASSET, InkML, ILSP_NLP, ISO12620, ISO16642, ISO1987, ISO26162, ISO30042, ISO704, JATS, LAF, LAPPS, Lemon, LMF, MAF, MLIF, MOSES, MULTEXT, MUMIN, multimodalInteractionFramework, OAXAL, OLIA, OWL, PANACEA, pennTreeBank, pragueTreebank, RDF, SemAF, SemAF_DA, SemAF_NE, SemAF_SRL, SemAF_DS, SKOS, SRX, SynAF, TBX, TMX, TEI, TEI_P3, TEI_P4, TEI_P5, TimeML, XCES, XLIFF, UD, WordNet, other

Definition/Explanations

Specifies the standards or the best practices to which the tagset used for the annotation conforms

lingualityType

Usage

Mandatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms:lingualityType: monolingual, bilingual, multilingual

Definition/Explanations

Indicates whether the resource contains one, two or more languages

Recommended usage

Please, select one of the values. Please, note that the element concerns the language of the resource itself and not the language used for its description; for instance, a lexicon of English with definitions both in English and French is considered monolingual.

language

Usage

Mandatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) of the resource according to IETF BCP 47 guidelines.

Recommended usage

Please, enter the language and, if needed, the region, script and variant identifier that best fits the language of the contents of the resource (e.g. en-US) according to the IETF BCP 47 guidelines; not to be confused with "metalanguage", which is used for the language used to describe the contents of the resource. For instance, a lexicon of English with definitions in English and French must be encoded with "language" "English" and 2 "metalanguage" values for "English" and "French". The element can be repeated for multiple languages.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:language; DataCite 4.0: skos:closeMatch datacite:Language

metalanguage

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) used to describe the contents of the resource (the "metalanguage") according to IETF BCP 47 guidelines.

Recommended usage

Please, enter the language and, if needed, the region, script and variant identifier that best fits the language used to describe the resource (e.g. en-US) according to the IETF BCP 47 guidelines; not to be confused with "language" which is used for the language of the contents of the resource. For instance, a lexicon of English with definitions in English and French must be encoded with "language" "English" and 2 "metalanguage" values for "English" and "French". The element can be repeated for multiple languages.
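
The interplay of "language", "metalanguage" and "lingualityType" in the lexicon example can be illustrated with a small sketch. It assumes a record is a plain Python dictionary holding BCP 47 tags; the helper `linguality_type` is hypothetical, and it simply counts distinct tags, so it would treat "en" and "en-US" as two languages.

```python
def linguality_type(languages: list) -> str:
    """Derive lingualityType from the number of distinct content languages."""
    count = len(set(languages))
    if count == 0:
        raise ValueError("at least one language is required")
    if count == 1:
        return "monolingual"
    return "bilingual" if count == 2 else "multilingual"


# A lexicon of English with definitions in English and French.
lexicon = {
    "language": ["en"],            # language of the contents
    "metalanguage": ["en", "fr"],  # languages used to describe the contents
}
lexicon["lingualityType"] = linguality_type(lexicon["language"])
```

Only the "language" values feed the linguality, so the lexicon stays monolingual even though two metalanguages are recorded.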

size

Usage

Mandatory

Type

size pattern (size and sizeUnit)

Definition/Explanations

Provides information on the size of the resource or of resource parts.

Recommended usage

You may indicate the size of the lexical/conceptual resource by filling in the appropriate number and selecting the appropriate sizeUnit (e.g. 20000 words).

Relation to other metadata schemas

DCMI: skos:closeMatch dct:extent; DataCite 4.0: skos:closeMatch datacite:size
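
As an illustration of the size pattern, the sketch below splits a human-readable size such as "20000 words" into its two parts. The regular expression and the function name `parse_size` are assumptions made for this example; the actual list of allowed sizeUnit values is defined by the schema and is not checked here.

```python
import re

# A number, whitespace, then a unit token; a deliberately loose illustrative pattern.
SIZE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+([A-Za-z][A-Za-z0-9-]*)\s*$")


def parse_size(text: str) -> dict:
    """Split a string such as '20000 words' into size and sizeUnit."""
    match = SIZE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a '<number> <unit>' size: {text!r}")
    return {"size": match.group(1), "sizeUnit": match.group(2)}
```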

domain

Usage

Recommended

Type

free text

Attributes

ms:classificationSchemeName and ms:schemeURI

Definition/Explanations

Domain of the lexical/conceptual resource

Recommended usage

It is recommended that domain values are taken from an authoritative source, such as DDC (Dewey Decimal Classification, http://www.oclc.org/dewey/) or UDC (Universal Decimal Classification, http://www.udcc.org/) and that the source is identified; if you do, please use the classificationSchemeName to indicate the source; if this is not included in the list of values, please use "schemeURI" with a link to a URL with more information on the scheme. The recommended way of adding the domain values is the identifier of the domain in the scheme; further instructions on the standardization of the format will be provided.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:subject; DataCite 4.0: skos:exactMatch datacite:Subject with datacite:subjectScheme, datacite:schemeURI and datacite:valueURI

characterEncoding

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please, select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8 to ensure interoperability between content and components.
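
Because UTF-8 is the preferred encoding, a provider may want to verify or convert files before submission. This is a minimal sketch using only Python's built-in codecs; the function names are illustrative, and the source encoding passed to `to_utf8` is assumed to be known to the provider.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")
```

Note that a clean decode does not prove the file was authored as UTF-8 (pure ASCII content passes trivially), so the declared characterEncoding value should still come from the provider.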

mimeType

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values (the most popular ones for text files) from the IANA mimetype controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please, select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media type recommended values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files of various formats.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:format; DataCite 4.0: skos:closeMatch datacite:Format

relationType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities (e.g. two resources that comprise one new resource together, a corpus and the s/w component that has been used for its creation or a corpus and the publication that describes it)

Recommended usage

For lexical/conceptual resources, the recommended relations are isVersionOf and requiresSoftware, but any relationType can be used as appropriate.

Relation to other metadata schemas

DataCite 4.0: skos:closeMatch datacite:relationType

relatedResource1

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. url reference) to the source resource related to the target resource (relatedResource2) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.

relatedResource2

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. url reference) to the target resource related to the source resource (relatedResource1) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
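
The three relation elements form a simple triple: a source resource, a relation type and a target resource. The sketch below models that triple and enforces the "mandatory when applicable" condition; the class name `Relation`, its field names and the sample identifiers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    relation_type: str      # a relationType value such as "isVersionOf"
    related_resource1: str  # source resource (identifier or name)
    related_resource2: str  # target resource (identifier or name)

    def __post_init__(self):
        # Both ends become mandatory once relationType is filled in.
        if self.relation_type and not (self.related_resource1 and self.related_resource2):
            raise ValueError("relatedResource1 and relatedResource2 are required")


# Read as: "lexicon-v2 isVersionOf lexicon-v1" (made-up identifiers).
example = Relation("isVersionOf", "lexicon-v2", "lexicon-v1")
```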

Recommended schema for models

OMTD-SHARE element Usage

resourceType M

resourceName M

description M

identifier M

version M

distributionMedium M

licence or rightsStmtName & rightsStmtURL (one of the two must be provided) M

version of licence M

distributionMedium M

downloadURL M when applicable

contactEmail or landingPage (one of the two must be provided) M

contactPerson (identifier or personName) R

contactGroup (identifier or organizationName) R

mustBeCitedWith R

resourceCreator (person or organization, described with identifier or name) R

variantName M

tagset R

typesystem R

algorithm R

trainingCorpusDetails R

mediaType M

lingualityType M

multilingualityType M when applicable

language M

size M

relationType = isCompatibleWith (external relation between models and components that can use them) R
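
The mandatory (M) rows of this overview can be turned into a completeness check. The following is a sketch under stated assumptions: a model description is a plain Python dictionary keyed by element name, the key `versionOfLicence` is a made-up spelling for the "version of licence" row, the licence alternative is simplified to "licence or rightsStmtName", and the "M when applicable" rows are left out.

```python
# Unconditionally mandatory elements, taken from the rows marked "M".
MANDATORY_FOR_MODELS = [
    "resourceType", "resourceName", "description", "identifier", "version",
    "distributionMedium", "versionOfLicence", "variantName", "mediaType",
    "lingualityType", "language", "size",
]

# Groups where one of the alternatives must be provided.
ONE_OF = [("licence", "rightsStmtName"), ("contactEmail", "landingPage")]


def missing_elements(record: dict) -> list:
    """List mandatory elements (or alternative groups) absent from the record."""
    missing = [name for name in MANDATORY_FOR_MODELS if not record.get(name)]
    for group in ONE_OF:
        if not any(record.get(name) for name in group):
            missing.append(" or ".join(group))
    return missing
```

An empty list means every unconditional requirement in the overview is met; recommended (R) elements are deliberately ignored.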

resourceType

Usage

Mandatory

Type

Closed controlled vocabulary

Attributes

Controlled vocabulary reference and/or values

ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component

Definition/Explanations

Specifies the type of the resource being described or the type of the resource that a tool or service takes as input or produces as output

Recommended usage

For models, the fixed value "model" must be added automatically

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type; DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType; recommended usage for models is to use "model" but the value "dataset" can also be used

resourceName

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

The full name by which the resource is known

Recommended usage

Please, provide a short but descriptive and unique name for the resource, e.g. “OpenNLP POS tagger model for English” instead of just “model for English POS tags”. Provide the name in English; if you want to add names in other languages, you can use the “lang” attribute.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:title; DataCite 4.0: skos:exactMatch datacite:title

identifier

Usage

Mandatory

Type

free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI

Definition/Explanations

Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource

Recommended usage

Provide a unique identifier already assigned by an authoritative source; you can use either the attribute "resourceIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.) or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:identifier; DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)
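
The choice between a listed scheme and "other" plus "schemeURI" can be sketched as follows. The set `KNOWN_SCHEMES` contains only the three values named in the text (the real vocabulary is longer), and the function name, dictionary layout and sample values are assumptions made for illustration.

```python
# Only the scheme names mentioned in the text; the real list is longer.
KNOWN_SCHEMES = {"DOI", "HDL", "ISLRN"}


def describe_identifier(value: str, scheme: str, scheme_uri: str = "") -> dict:
    """Build an identifier entry, falling back to 'other' + schemeURI."""
    if scheme in KNOWN_SCHEMES:
        return {"identifier": value, "resourceIdentifierSchemeName": scheme}
    if not scheme_uri:
        raise ValueError("an unlisted scheme needs a schemeURI that documents it")
    return {
        "identifier": value,
        "resourceIdentifierSchemeName": "other",
        "schemeURI": scheme_uri,
    }


# Made-up identifier values, for illustration only.
doi_entry = describe_identifier("10.1234/example", "DOI")
local_entry = describe_identifier("mdl-42", "LabCatalogue", "https://example.org/ids")
```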

description

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

Provides the description of the resource in prose

Recommended usage

Give a brief yet informative description of the model, e.g. the language(s) it applies to, the corpus it has been trained on, theoretical approaches used etc. Please, provide the text in English; if you want to add texts in other languages, you can add them using the “lang” attribute to specify the language.

Relation to other metadata schemas

DCMI: skos:exactMatch dct:abstract; DataCite 4.0: skos:exactMatch datacite:description with value "abstract" for datacite:descriptionType

version

Usage

Recommended

Type

free text

Definition/Explanations

Any string, usually a number, that identifies the version of a resource

Recommended usage

Please, keep this only for versions of the same resource (e.g. corrected, enlarged etc.) and not for variants or for versions with additional or different information. The recommended practice for versioning is to follow the semantic versioning guidelines (http://semver.org/).

Relation to other metadata schemas

DCMI: skos:exactMatch dct:hasVersion; DataCite 4.0: skos:exactMatch datacite:Version
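
A version string can be checked against the MAJOR.MINOR.PATCH shape that semantic versioning prescribes. The pattern below is a simplified sketch, not the full grammar published at semver.org: it rejects leading zeros in the three numeric parts but is looser than the specification about pre-release and build suffixes.

```python
import re

# Simplified: MAJOR.MINOR.PATCH, optional "-prerelease" and "+build" suffixes.
SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$"
)


def is_semver(version: str) -> bool:
    """Return True if the string has a semantic-versioning shape."""
    return SEMVER.match(version) is not None
```

Since the element is free text, such a check can only serve as a warning to the provider and not as a hard validation rule.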

licence

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms

Definition/Explanations

The licence of use for the resource

Recommended usage

You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:

use the element "licence" and select one of the recommended licences; please, note that the list contains licences intended for data resources & components mixed together; for components the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family; if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements: "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText"

you can also use the "rightsStatementName" and the "rightsStatementURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please, note that this is an option used mainly to facilitate end-users in accessing your resource, while you are strongly advised to properly license your resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license; DataCite 4.0: skos:closeMatch datacite:rights
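
The licensing conditions can be expressed as a small check. This sketch assumes a record held as a plain Python dictionary; the function name `licence_problems` is hypothetical, and treating the non-standard licence details as required is a deliberately strict reading of guidance that the text phrases as advice.

```python
def licence_problems(record: dict) -> list:
    """Return human-readable problems with the licensing elements of a record."""
    problems = []
    licence = record.get("licence")
    if not licence and not record.get("rightsStmtName"):
        problems.append("either licence or rightsStmtName must be filled in")
    if licence in ("nonStandardLicenceTerms", "proprietary"):
        details = (
            "nonStandardLicenceName",
            "nonStandardLicenceTermsURL",
            "nonStandardLicenceTermsText",
        )
        # Strict reading: ask for at least one element describing the licence.
        if not any(record.get(name) for name in details):
            problems.append("describe the licence with the nonStandardLicence* elements")
    return problems
```

A record with `{"licence": "CC-BY"}` passes without remarks, while an empty record or a bare "proprietary" value each yields one problem.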

rightsStmtName

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:rightsStmtName: openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a url.

The current list of predefined values comes from OpenAIRE, but it's under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights; DataCite 4.0: skos:closeMatch datacite:rights

rightsStmtURL

Usage

Mandatory under conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights; DataCite 4.0: skos:closeMatch datacite:rightsURI

nonStandardLicenceName

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

free text

Definition/Explanations

The name with which a licence is known; to be used for licences not included in the pre-defined list of recommended licences

Recommended usage

Please, provide the name of the licence if it's already known or supply one that can uniquely identify it.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:title (for dct:licenseDocument)

nonStandardLicenceTermsURL

Usage

Mandatory under conditions

Conditions for usage

to be used with ms:licence other (i.e. for licences not included in the list of recommended ones)

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a url containing the text of a licence not included in the predefined list or describing the terms of use for a language resource or terms of service for web services

Recommended usage

Please, provide the link to the full text document of the licence. Please note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:license; DataCite 4.0: skos:closeMatch datacite:rightsURI

version of licence

Usage

Mandatory

Type

free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC-licences and "2.0" for the META-SHARE-NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)

distributionMedium

Usage

Mandatory

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:distributionMedium: webExecutable, paperCopy, hardDisk, bluRay, DVD-R, CD-ROM, downloadable, accessibleThroughInterface, other

Definition/Explanations

Specifies the medium (channel) used for delivery or providing access to the resource

Recommended usage

Please, use one of the provided values to indicate the medium of distribution. For models, the expected value is "downloadable". Please, note that if the model is distributed in different mediums and/or under different terms of use or licences, you can repeat the whole set of elements to describe them.

downloadURL

Usage

Mandatory upon conditions

Conditions for usage

if distributionMedium = downloadable

Type

url pattern

Definition/Explanations

Any url where the resource can be downloaded from

Recommended usage

Please, indicate where the model can be downloaded; this element is of particular importance if you have not uploaded the resource in the repository
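
The condition tying downloadURL to distributionMedium can be sketched as below, using Python's standard `urllib.parse`. The function name, the dictionary representation of a record and the accepted URL schemes (http, https, ftp) are assumptions for this example; the schema's own URL pattern may differ.

```python
from urllib.parse import urlparse


def download_url_problem(record: dict):
    """Return a message if downloadURL is required but missing or malformed."""
    if record.get("distributionMedium") != "downloadable":
        return None  # the condition does not apply
    parsed = urlparse(record.get("downloadURL") or "")
    if parsed.scheme in ("http", "https", "ftp") and parsed.netloc:
        return None
    return "downloadURL is required when distributionMedium is 'downloadable'"


# A made-up location, for illustration only.
model = {
    "distributionMedium": "downloadable",
    "downloadURL": "https://example.org/models/pos-en.bin",
}
```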

contactEmail

Usage

Mandatory under conditions

Conditions for usage

An email or a landing page must be provided

Type

email pattern

Definition/Explanations

A general email address that can be used as contact point for a resource (e.g. resource@example.com)

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

give a general email address in the "contactEmail" element, or provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself). You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.

landingPage

Usage

Mandatory under conditions

Conditions for usage

An email or a landing page must be provided

Type

URL pattern

Definition/Explanations

A URL used as the landing page of a resource providing general information; for instance, it may present a description of the resource, its creators and possibly include links to the URL where it can be accessed from

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

give a general email address in the "contactEmail" element, or provide in "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself). You can also indicate the person(s) or group(s) responsible for communication in the "contactPerson" and "contactGroup" elements.

contactPerson (identifier or personName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) that is/are responsible for providing further information regarding the resource

Recommended usage

The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please select also the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname" at least in English; if you want to add names in other languages, you can use the “lang” attribute. The element can also be repeated to encode multiple persons. If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.

Relation to other metadata schemas

DataCite 4.0: contributor with datacite:contributorType="ContactPerson", datacite:contributorName (familyName & givenName) or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI

contactGroup (identifier or organizationName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the group(s) that is/are responsible for providing further information regarding the resource

Recommended usage

The recommended way for referring to a group (currently modelled as an organization) is by giving their identifier (e.g. ISNI, fundref); if you provide the identifier, please select also the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the group (organization), you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.

mustBeCitedWith

Usage

Recommended

Type

free text or identifier

Definition/Explanations

Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)

Recommended usage

The preferred option to refer to a publication is by providing its unique identifier already assigned by an authoritative source; the preferred identifier for publications is DOI; you can use either the attribute "publicationIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, ISBN etc.) or, if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to. If you don't know the publication identifier, you can provide the full bibliographic record as free text. N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.

resourceCreator (person or organization, described with identifier or name)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) or organization(s) that has/have created the resource

Recommended usage

The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please select also the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, Firstname" at least in English; if you want to add names in other languages, you can use the “lang” attribute. The recommended way for referring to an organization is by giving their identifier (e.g. ISNI); if you provide the identifier, please select also the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. The element can also be repeated to encode multiple persons/organizations. For corpora created through the OMTD corpus building process, the resource creator is considered to be the person that has put together the corpus through the user query.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:creator; DataCite 4.0: skos:closeMatch datacite:Creator with datacite:creatorName (familyName & givenName) or datacite:nameIdentifier & datacite:nameIdentifierScheme & datacite:schemeURI

variantName

Usage

Mandatory

Type

free text

Definition/Explanations

variant name used for the model

tagset

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. url reference) to the tagset used in the annotation of the resource or used by the component

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute. For interoperability reasons, it is recommended to describe typesystems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.

Recommendedschemaformodels

193

typesystem

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the type system used in the annotation of the resource or used by the component

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.

algorithm

Usage

Recommended

Type

free text

Definition/Explanations

Training algorithm used for the model (e.g. maximum entropy, SVM etc.)

Recommended usage

Please provide only the name of the algorithm, not a detailed description of it.

trainingCorpusDetails

Usage

Recommended

Type

free text

Definition/Explanations

Detailed description of the training corpus (e.g. size, number of features etc.)

mediaType

Usage

Mandatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms:mediaType: text, audio, video, image

Definition/Explanations

Specifies the media type of the resource and basically corresponds to the physical medium of the content representation. Each media type is described through a distinctive set of features. A resource may consist of parts attributed to different types of media. A component may take more than one media type as input/output.

Recommended usage

OpenMinTeD only handles text resources, so the default value is set to "text".

lingualityType

Usage

Mandatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms:lingualityType: monolingual, bilingual, multilingual

Definition/Explanations

Indicates whether the resource contains one, two or more languages

Recommended usage

Please select one of the values.

language

Usage

Mandatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) for which the model has been trained, expressed according to the IETF BCP 47 guidelines.

Recommended usage

Please enter the language and, if needed, the region, script and variant identifier that best fits the language that the model can be used for (e.g. en-US), according to the IETF BCP 47 guidelines. The element can be repeated to encode multiple languages.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:language

DataCite 4.0: skos:closeMatch datacite:Language
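As an illustration of the language element, the following is a minimal sketch of a syntactic check for tags such as en-US. It is not part of the OMTD-SHARE schema or tooling: the function name and the pattern are assumptions, and the pattern deliberately covers only the four subtags named above (language, script, region, variant), not the full BCP 47 grammar.

```python
import re

# Simplified pattern: language[-script][-region][-variant...]
# Assumption: only the subtags mentioned in the guideline are handled;
# extensions, private-use and grandfathered tags are out of scope.
_LANGUAGE_TAG = re.compile(
    r"^[A-Za-z]{2,3}"                                # language, e.g. "en"
    r"(-[A-Za-z]{4})?"                               # script, e.g. "Hans"
    r"(-(?:[A-Za-z]{2}|[0-9]{3}))?"                  # region, e.g. "US" or "419"
    r"(-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*"  # variants, e.g. "rozaj"
    r"$"
)


def looks_like_language_tag(tag: str) -> bool:
    """Return True if the tag matches the simplified pattern above."""
    return _LANGUAGE_TAG.match(tag) is not None
```

A check of this kind only verifies the shape of the value; whether the subtags are actually registered would still have to be verified against the IANA language subtag registry.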

size

Usage

Mandatory

Type

size pattern (size and sizeUnit)

Definition/Explanations

Provides information on the size of the resource or of resource parts.

Recommended usage

You may indicate the size of the entire model by filling in the appropriate number and selecting the appropriate sizeUnit (e.g. 20000 words). The preferred sizeUnit is words or sentences. If nothing else is known, please indicate at least the number of files.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:extent

DataCite 4.0: skos:closeMatch datacite:size

mimeType

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values, the most popular ones for text files, from the IANA mime type controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component accepts, in conformance with the values of the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, PREFERABLY FROM THE IANA MEDIA MIME TYPE RECOMMENDED VALUES (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for corpora that include files of various formats.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:format

DataCite 4.0: skos:closeMatch datacite:Format

characterEncoding

Usage

Recommended

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or accepted by the component

Recommended usage

Please select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8, to ensure interoperability between content and components.

relationType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it)

Recommended usage

For models, the recommended relation is isCompatibleWith, holding with software components, but any relationType can be used as appropriate.

Relation to other metadata schemas

DataCite 4.0: skos:closeMatch datacite:relationType

relatedResource1

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the source resource related to the target resource (relatedResource2) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.

relatedResource2

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the target resource related to the source resource (relatedResource1) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.

Guidelines for providers of software resources

Introduction
Instructions for providers of software components
Recommended ancillary knowledge resources
Recommended metadata schema for software resources

Introduction

OpenMinTeD targets scholarly researchers who are agnostic to software details and peculiarities, as well as TDM developers. It therefore allows the registration of:

applications, which can be used as-is to perform TDM operations on content resources, and

software components, i.e. pieces of software that can, by means of the OpenMinTeD Workflow Editor, be put together and tuned with various ancillary resources in order to create workflows that will be delivered to the end-users and/or further integrated into other workflows.

All of these will be made available to researchers in a way that does not require any kind of expertise from them, either as locally downloadable and executable tools or as web services.

The OpenMinTeD platform, at the current stage, supports the integration of software components wrapped for the GATE or UIMA/uimaFIT frameworks.

To be fully compatible with OpenMinTeD, you must provide:

a metadata record compliant with the OMTD-SHARE schema, at least at the minimal level (which you can upload to the Registry as an XML file and/or edit with the OpenMinTeD metadata editor), and

the software in an executable form, by uploading it in a compressed file or providing a link to a URL location from which it can be directly accessed (i.e. not a landing page).

Instructions for providers of software components

How to register your components
How to make your components interoperable
How to document your components
Guide for deploying UIMA components in the Argo platform

How to register your components

The recommended way of providing software components is through the Maven Central repository, according to the following instructions.

Please put together in a single folder (in the form required by the technologies/frameworks used):

all files that implement the component (e.g. Java classes etc.),

the licence text(s), preferably named "LICENCE.TXT" in order to be unambiguously recognised; in the case of multiple licences, they should all be aggregated in the same file,

a readme notice that describes the contents of the folder as well as any important notice for the compilation and execution of the component,

all descriptors (UIMA/uimaFIT, GATE CREOLE [1], OMTD-SHARE etc.) available for the component according to the implementation framework,

a Maven POM XML file.

Pack them as a JAR using the respective Maven plugin.

Upload them to the Maven repository according to the Maven guidelines.

Finally, submit the Maven coordinates in the OMTD registry; in this case, the metadata record will be partially converted from the Maven POM file and, potentially, from elements included in the metadata descriptors supported by OpenMinTeD (UIMA/uimaFIT, CREOLE), and you can then enrich it using the OpenMinTeD editor.

[1] Details of GATE descriptors can be found at https://gate.ac.uk/userguide/sec:creole-model:config, although they do not currently contain much (if any) of the information needed to complete the OMTD-SHARE metadata descriptor. Note that this is changing to include more OpenMinTeD-like information, much of which will be specified in a Maven POM rather than as CREOLE metadata. This is currently not documented, as it relates to the next version of GATE, which is still under active development.
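Before packaging, the folder contents listed above can be checked with a few lines of code. The following is only a sketch under stated assumptions: the function name is hypothetical, the check is case-insensitive, and any file whose name starts with "readme" is accepted as the readme notice (the guidelines do not prescribe a file name for it).

```python
def missing_items(file_names: list[str]) -> list[str]:
    """Return the expected items that are absent from a component folder.

    Assumption: only the licence file, readme notice and POM are checked;
    implementation files and descriptors vary too much per framework.
    """
    names = {name.lower() for name in file_names}
    missing = []
    if "licence.txt" not in names:
        missing.append("LICENCE.TXT")
    if not any(name.startswith("readme") for name in names):
        missing.append("readme notice")
    if "pom.xml" not in names:
        missing.append("pom.xml")
    return missing
```

For a folder on disk, the function could be called as `missing_items(os.listdir(folder))` after importing `os`.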

How to make your components interoperable

In addition, if you want to be fully compliant with the OpenMinTeD interoperability requirements, please ensure that you adopt the following rules; if you fail to abide by them, it might still be possible to operate your software resources via the OpenMinTeD platform, but this cannot be guaranteed and interoperability with other resources will suffer.

Please keep ancillary knowledge resources, e.g. models, annotation resources, etc., separate from the component itself; document and upload these also in the OpenMinTeD Registry, following the procedure described in Guidelines for providers of ancillary knowledge resources. If you want to refer to these resources from the software metadata record, please use the resource identifier for the linking.

To ensure that provided software components can be scaled as required for different workloads, it is recommended that they are implemented in a stateless fashion, i.e. without the need to maintain information about one or more documents and the need to share this information with other instances of the same component. For example, a component that counts all tokens in a corpus cannot be trivially scaled.

In addition to plain UIMA/uimaFIT and GATE-CREOLE descriptors, OpenMinTeD also supports Argo descriptors; further instructions for deploying UIMA components in Argo are found in the Guide for deploying UIMA components in the Argo platform.
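The token-counting case can be made scalable by keeping the per-document step stateless and moving the aggregation outside the component. The sketch below illustrates the idea only; it is not OpenMinTeD, UIMA or GATE code, the function names are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
def count_tokens(document: str) -> int:
    """Stateless step: the result depends only on the document passed in,
    so any number of instances can process documents independently."""
    # Assumption: whitespace tokenization, purely for illustration.
    return len(document.split())


def merge_counts(per_document_counts: list[int]) -> int:
    """Aggregation step, run once after all documents have been processed."""
    return sum(per_document_counts)
```

Because `count_tokens` keeps no information between calls, instances do not need to share anything with each other; only the small per-document results are combined at the end.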

How to document your components

To be fully compatible with OpenMinTeD, you must:

ensure that the software is distributed under a perpetual, world-wide, no-charge, royalty-free copyright/patent licence that permits unrestricted use and allows unlimited redistribution;

include in the metadata record a link to the licence document(s) with the terms and conditions under which it is provided, and attach the licence document(s) together with the resource;

if you already have a PID for your resource (e.g. a URI or a HANDLE), make sure it is included in the metadata record (cf. identifier for more information);

ensure that you version all your software resources and label the versions in an unambiguous way, preferably following the Semantic Versioning recommendations;

ensure that you provide with your software resource appropriate machine-readable metadata embedded in the source code (where possible) and according to the relevant framework (e.g. uimaFIT Java annotations etc.); make sure that the metadata descriptors are properly identified in an unambiguous way that makes them easy to distinguish and extract;

for Java-based components, ensure that you use the Java fully qualified class naming conventions for naming your components; together with the Maven practices for registering packaging and version, this contributes to unique identifiers of the components;

describe all the execution requirements for the proper operation of the software, i.e. required software libraries, ancillary resources, annotation schema dependencies, etc.;

describe the input and output requirements for your software, at least as regards the type of resource, the language (if required), data format and character encoding, and annotation types of the input/output resource;

declare in the metadata whether the software is downloadable or can only be accessed as a web service;

ensure that you describe appropriately the functionalities of the software, both through the OMTD-SHARE component type vocabulary and in a free text description, supplying more information for the user.

Further recommendations that contribute to interoperability include the following:

It is important that you provide the appropriate documentation for your resource (e.g. manuals, help files etc.), which you should also version along with the software and add as a reference to your metadata record.

Recommend one of the publications about your resource as the one to be cited for scholarly attribution and add this information in the metadata record.

Make sure that you fill in the metadata record all the elements required for citing your resource [1], i.e. the creator of the resource, a title, the resource type and an identifier, and optionally, the publication date, the version and the publisher or distributor.

In all cases where you link to other resources or entities (e.g. persons, projects etc.), please try to do this through unique and persistent identifiers of authority lists and sources, to the extent possible, documenting also the authority and/or scheme they adhere to.

[1] For citation, OpenMinTeD endorses the Joint Declaration of Data Citation Principles, as well as the more specialised RDA recommendations for data citation of evolving data and the DataCite guidelines.
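For the versioning recommendation, a version label can be checked against the MAJOR.MINOR.PATCH shape of Semantic Versioning before it is written into the metadata record. This is a sketch, not an official validator: the function name is hypothetical and the pattern is a simplification (for instance, it does not reject leading zeros in numeric pre-release identifiers).

```python
import re

# Simplified Semantic Versioning pattern (an assumption, not the full grammar).
_SEMVER = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"  # MAJOR.MINOR.PATCH
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"      # optional pre-release
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?$"    # optional build metadata
)


def is_semantic_version(version: str) -> bool:
    """Return True if the label has the MAJOR.MINOR.PATCH[-pre][+build] shape."""
    return _SEMVER.match(version) is not None
```

Note that a two-part label such as 1.0 does not have this shape; under Semantic Versioning it would be written as 1.0.0.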

Guide for deploying UIMA components in the Argo platform

Argo is able to use standard Java UIMA components; however, they must first be packaged as UIMA PEAR (Processing Engine ARchive) files before they can be deployed within the Argo platform.

It is strongly recommended to use Maven, a build automation tool, to manage UIMA component projects; a Maven pom.xml template (see further below) is available. The highlighted values within the pom.xml template are those expected to be configured by component developers.

The very minimum files required to produce a working UIMA component are:

1. A standard UIMA XML descriptor (located under the desc folder at the root of the project).

2. A Java class containing the implementation of the component (located under src/main/java).

3. A Maven pom.xml (adapted from the template).

Figure 1 shows the recommended layout of a very simple component project managed by Maven, using the example placeholder values found in the Maven pom.xml template. The UIMA XML descriptor should be named using the Maven artifactId value (e.g. uima-component) and reside under the desc directory, within a nested set of directories representing the Maven groupId value (e.g. xyz.company.uima).

Figure 1: Basic layout of a Maven-based UIMA component project

It is recommended to use the Maven artifactId and groupId to produce the UIMA Component ID (e.g. the groupId xyz.company.uima and artifactId uima-component should result in a Component ID of xyz.company.uima.uima-component). The default configuration of the PEAR Packaging Maven plugin, within the pom.xml template, automates this procedure. A Component ID is intended to be unique and is not intended to be visible to Argo end-users.
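The naming rules above can be expressed in a few lines. The following sketch is illustrative only (the function names are hypothetical and nothing here is part of Argo or the PEAR plugin); it derives the Component ID and the expected descriptor location from the Maven coordinates used as placeholders in the template.

```python
from pathlib import PurePosixPath


def component_id(group_id: str, artifact_id: str) -> str:
    """Component ID = groupId and artifactId joined by a dot."""
    return f"{group_id}.{artifact_id}"


def descriptor_path(group_id: str, artifact_id: str) -> PurePosixPath:
    """Descriptor location: desc/<groupId as nested directories>/<artifactId>.xml"""
    return PurePosixPath("desc", *group_id.split("."), f"{artifact_id}.xml")
```

With the placeholder values, `descriptor_path("xyz.company.uima", "uima-component")` yields the same relative path that the template passes to the plugin as mainComponentDesc.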

Any Java dependencies of a UIMA component are expected to be included within the component's PEAR file. The pom.xml template is configured to automatically package the Maven dependencies when building a PEAR file. However, to achieve Argo compatibility, it is important to exclude the uimaj-core artifact and any artifacts representing UIMA Type Systems. In the pom.xml template this is achieved by supplying the excludeArtifactIds configuration parameter of the Maven Dependency plugin with a comma-delimited list of the affected artifactIds. Argo expects UIMA Type Systems to be installed separately and packaged as PEAR files, as for UIMA components.

A component may also contain an Argo XML descriptor file, although this is entirely optional. It is intended to provide additional metadata for a component. An Argo XML descriptor must:

reside in the same directory as the component's UIMA XML descriptor;

have the same filename as the UIMA XML descriptor, but with a .argo.xml suffix.

Figure 2 shows the location and name of an Argo XML descriptor file for a component with the ID of xyz.company.uima.uima-component, while Figure 3 shows the general format of the descriptor file itself.

Figure 2: Example file structure of a component containing an Argo XML descriptor file

<argoDescriptor>
  <tags>
    <tag>{string}</tag>
    ...
  </tags>
  <minimumMemoryInMbs>{integer}</minimumMemoryInMbs>
  <interactive>[true/false]</interactive>
  <configurationParametersMetaData>
    <configurationParameterMetaData>...</configurationParameterMetaData>
    ...
  </configurationParametersMetaData>
</argoDescriptor>

Figure 3: Structure of an Argo XML descriptor

Within an Argo descriptor file, all of the sub-elements directly under the argoDescriptor element are optional.

The tags element can contain multiple tag elements, each containing a string value. These tag values are intended to be used within Argo's component search facility, to assist end-users in finding relevant components.

The minimumMemoryInMbs element holds an integer value, setting the default value for the minimum amount of memory (in megabytes) required by this component when it is run in a distributed workflow. This is important for determining the allocation of components to machines.

The interactive element contains a boolean value. This value is set to true when a component contains a custom web user interface, which requires interaction with the annotation model during a workflow execution. The only existing Argo component with this value set to true is the Manual Annotation Editor.
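The three top-level elements described above can also be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree module; the function name and its parameters are hypothetical, and the output contains only the tags, minimumMemoryInMbs and interactive elements (no configurationParametersMetaData).

```python
import xml.etree.ElementTree as ET


def build_argo_descriptor(tags, minimum_memory_mb, interactive=False):
    """Return a minimal Argo descriptor as an XML string."""
    root = ET.Element("argoDescriptor")
    tags_element = ET.SubElement(root, "tags")
    for tag in tags:
        ET.SubElement(tags_element, "tag").text = tag
    ET.SubElement(root, "minimumMemoryInMbs").text = str(minimum_memory_mb)
    # Booleans are written in lower case, as in the descriptor format.
    ET.SubElement(root, "interactive").text = "true" if interactive else "false"
    return ET.tostring(root, encoding="unicode")
```

For example, `build_argo_descriptor(["categoryA", "finance"], 256)` produces a descriptor with two tag elements, a minimum memory of 256 MB and interactive set to false.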

The configurationParametersMetaData element can contain multiple configurationParameterMetaData elements, each one providing additional information about a component configuration parameter found within the matching UIMA XML descriptor. A configurationParameterMetaData element must contain a name subelement (which has the same name as the configuration parameter it is referencing in the UIMA descriptor) and a uiType subelement (which is used by Argo to provide the most appropriate UI widget to the end-user). Valid values for uiType are time, date, datetime, enum, password, type, document and text.
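The two requirements on a configurationParameterMetaData element (a name and a valid uiType) lend themselves to a simple automated check. The following sketch is not an Argo tool: the function name is hypothetical, and it checks nothing beyond what the paragraph above states.

```python
import xml.etree.ElementTree as ET

# The uiType values listed in the guideline.
VALID_UI_TYPES = {"time", "date", "datetime", "enum",
                  "password", "type", "document", "text"}


def check_parameter_metadata(xml_text: str) -> list[str]:
    """Return a list of problems found in one configurationParameterMetaData element."""
    element = ET.fromstring(xml_text)
    problems = []
    if not element.findtext("name"):
        problems.append("missing name")
    ui_type = element.findtext("uiType")
    if ui_type is None:
        problems.append("missing uiType")
    elif ui_type not in VALID_UI_TYPES:
        problems.append(f"unknown uiType: {ui_type}")
    return problems
```

An empty list means both subelements are present and the uiType is one of the eight listed values; whether the name matches a parameter in the UIMA descriptor would need a separate check.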

Figure 4 shows how configurationParameterMetaData elements are configured if their uiType value is either time, date or datetime. The corresponding UIMA configuration parameter must be of type string. Argo needs to know how to format the time chosen by the end-user using a calendar UI widget, so this has to be specified in the format subelement, as demonstrated in Figure 4.

<configurationParameterMetaData>
  <name>timeParam</name>
  <uiType>time</uiType>
  <uiConfiguration>
    <format>HH:mm:ss</format>
  </uiConfiguration>
</configurationParameterMetaData>

Figure 4: A date, time or datetime configuration parameter

For configuration parameters that have a fixed set of values, a uiType value of enum is required. These fixed values should be listed as a set of value elements, nested within a values element, as shown in Figure 5.

<configurationParameterMetaData>
  <name>enumParam</name>
  <uiType>enum</uiType>
  <values>
    <value>red</value>
    <value>green</value>
    <value>blue</value>
  </values>
</configurationParameterMetaData>

Figure 5: An enum configuration parameter

Configuration parameters containing sensitive information, such as passwords, should use a uiType value of password. This hides the value of the parameter from the user and, once entered, the value is not transmitted back to the Argo UI, for additional security. It is also possible to specify the minimum and/or maximum number of characters that this value can hold, using min and max elements within the valueConstraints element. See Figure 6 for an example.

<configurationParameterMetaData>
  <name>passwordParam</name>
  <uiType>password</uiType>
  <valueConstraints>
    <min>5</min>
    <max>10</max>
  </valueConstraints>
</configurationParameterMetaData>

Figure 6: A password configuration parameter

To make it easier for a user to select UIMA type(s) within the Argo UI, any configuration parameters representing types should have a uiType value of type. This will result in a searchable list of all types known to Argo being displayed to the end-user when they are configuring the component, from which the required types can be selected. See Figure 7 for an example.

<configurationParameterMetaData>
  <name>typeParam</name>
  <uiType>type</uiType>
</configurationParameterMetaData>

Figure 7: A type configuration parameter

Configuration parameters which refer to local files and/or directories should have the uiType value of document. This will allow an end-user to select files from the Argo File Store using a file selector dialog. Figure 8 shows an example configuration; a table listing the UI configuration parameters available for the file dialog can be found in Figure 9.

<configurationParameterMetaData>
  <name>documentParam</name>
  <uiType>document</uiType>
  <uiConfiguration>
    <selectFile>true</selectFile>
    <selectFolder>false</selectFolder>
    <selectFilesRecursively>false</selectFilesRecursively>
    <hideFiles>false</hideFiles>
    <windowCaption>Save file as...</windowCaption>
  </uiConfiguration>
</configurationParameterMetaData>

Figure 8: A document configuration parameter

Element                 Type     Description
selectFile              Boolean  Allows a user to select a file in the dialog
selectFolder            Boolean  Allows a user to select a folder in the dialog
selectFilesRecursively  Boolean  Recursively selects all of the files and/or folders under the selected folders
hideFiles               Boolean  Only show directories in the dialog
windowCaption           String   A caption to display in the file browser window

Figure 9: uiConfiguration elements

Configuration parameters that are likely to hold a large amount of text should use a uiType value of text. This will result in a larger text box being made available to the end-user. The size of the text area is configured using characterWidth and visibleLines elements, nested within the uiConfiguration element, as shown in Figure 10.

<configurationParameterMetaData>
  <name>textAreaParam</name>
  <uiType>text</uiType>
  <uiConfiguration>
    <characterWidth>30</characterWidth>
    <visibleLines>5</visibleLines>
  </uiConfiguration>
</configurationParameterMetaData>

Figure 10: A text area configuration parameter

An example of a UIMA XML descriptor, along with its corresponding Argo XML descriptor, can be found further below.

Maven pom.xml template for Argo components

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>xyz.company.uima</groupId>
  <artifactId>uima-component</artifactId>
  <version>1.0</version>
  <build>
    <resources>
      <resource>
        <directory>desc</directory>
      </resource>
      <resource>
        <directory>src/main/resources</directory>
      </resource>
    </resources>
    <plugins>
      <!-- Copies runtime dependencies into the PEAR lib folder, excluding uimaj-core and type systems -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <version>2.4</version>
        <executions>
          <execution>
            <id>copy-dependencies</id>
            <phase>prepare-package</phase>
            <goals>
              <goal>copy-dependencies</goal>
            </goals>
            <configuration>
              <stripVersion>true</stripVersion>
              <outputDirectory>${project.build.directory}/pearPackaging/lib</outputDirectory>
              <overWriteReleases>true</overWriteReleases>
              <overWriteSnapshots>true</overWriteSnapshots>
              <includeScope>runtime</includeScope>
              <excludeArtifactIds>U_compareTypeSystem,uimaj-core</excludeArtifactIds>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <!-- Builds the PEAR file; the component ID is groupId.artifactId -->
      <plugin>
        <groupId>org.apache.uima</groupId>
        <artifactId>PearPackagingMavenPlugin</artifactId>
        <version>2.4.0</version>
        <extensions>true</extensions>
        <executions>
          <execution>
            <phase>package</phase>
            <configuration>
              <mainComponentDesc>desc/xyz/company/uima/uima-component.xml</mainComponentDesc>
              <componentId>${project.groupId}.${project.artifactId}</componentId>
            </configuration>
            <goals>
              <goal>package</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <!-- Installs the generated PEAR file into the local Maven repository -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-install-plugin</artifactId>
        <version>2.3.1</version>
        <executions>
          <execution>
            <phase>install</phase>
            <configuration>
              <packaging>pear</packaging>
              <groupId>${project.groupId}</groupId>
              <artifactId>${project.artifactId}</artifactId>
              <version>${project.version}</version>
              <file>${project.build.directory}/${project.groupId}.${project.artifactId}.pear</file>
            </configuration>
            <goals>
              <goal>install-file</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.eclipse.m2e</groupId>
          <artifactId>lifecycle-mapping</artifactId>
          <version>1.0.0</version>
          <configuration>
            <lifecycleMappingMetadata>
              <pluginExecutions>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-dependency-plugin</artifactId>
                    <versionRange>[1.0.0,)</versionRange>
                    <goals>
                      <goal>copy-dependencies</goal>
                    </goals>
                  </pluginExecutionFilter>
                  <action>
                    <execute>
                      <runOnIncremental>false</runOnIncremental>
                    </execute>
                  </action>
                </pluginExecution>
              </pluginExecutions>
            </lifecycleMappingMetadata>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
  <dependencies>
    <dependency>
      <groupId>org.apache.uima</groupId>
      <artifactId>uimaj-core</artifactId>
      <version>2.7.0</version>
    </dependency>
    <dependency>
      <groupId>org.u_compare</groupId>
      <artifactId>U_compareTypeSystem</artifactId>
      <version>1.1</version>
    </dependency>
  </dependencies>
</project>

Argo XML Descriptor example

<argoDescriptor>
  <tags>
    <tag>categoryA</tag>
    <tag>finance</tag>
  </tags>
  <minimumMemoryInMbs>256</minimumMemoryInMbs>
  <interactive>false</interactive>
  <configurationParametersMetaData>
    <configurationParameterMetaData>
      <name>timeParam</name>
      <uiType>time</uiType>
      <uiConfiguration>
        <format>HH:mm:ss</format>
      </uiConfiguration>
    </configurationParameterMetaData>
    <configurationParameterMetaData>
      <name>dateParam</name>
      <uiType>date</uiType>
      <uiConfiguration>
        <format>yyyy/MM/dd</format>
      </uiConfiguration>
    </configurationParameterMetaData>
    <configurationParameterMetaData>
      <name>dateTimeParam</name>
      <uiType>datetime</uiType>
      <uiConfiguration>
        <format>yyyy/MM/dd HH:mm:ss</format>
      </uiConfiguration>
    </configurationParameterMetaData>
    <configurationParameterMetaData>
      <name>enumParam</name>
      <uiType>enum</uiType>
      <values>
        <value>red</value>
        <value>green</value>
        <value>blue</value>
      </values>
    </configurationParameterMetaData>
    <configurationParameterMetaData>
      <name>passwordParam</name>
      <uiType>password</uiType>
      <uiConfiguration>
      </uiConfiguration>
      <valueConstraints>
        <min>5</min>
        <max>10</max>
      </valueConstraints>
    </configurationParameterMetaData>
    <configurationParameterMetaData>
      <name>typeParam</name>
      <uiType>type</uiType>
      <uiConfiguration>
      </uiConfiguration>
    </configurationParameterMetaData>
    <configurationParameterMetaData>
      <name>documentParam</name>
      <uiType>document</uiType>
      <uiConfiguration>
        <selectFile>true</selectFile>
        <selectFolder>false</selectFolder>
        <selectFilesRecursively>false</selectFilesRecursively>
        <hideFiles>false</hideFiles>
        <windowCaption>Save file as...</windowCaption>
      </uiConfiguration>
    </configurationParameterMetaData>
    <configurationParameterMetaData>
      <name>textAreaParam</name>
      <uiType>text</uiType>
      <uiConfiguration>
        <characterWidth>30</characterWidth>
        <visibleLines>5</visibleLines>
      </uiConfiguration>
    </configurationParameterMetaData>
  </configurationParametersMetaData>
</argoDescriptor>

UIMA Analysis Engine XML Descriptor referenced by the Argo XML Descriptor

<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>true</primitive>
  <annotatorImplementationName>xyz.company.uima.UimaComponent</annotatorImplementationName>
  <analysisEngineMetaData>
    <name>UIMAComponent</name>
    <description/>
    <version>1.0</version>
    <vendor/>
    <!-- Each parameter name below matches a name element in the Argo XML descriptor -->
    <configurationParameters>
      <configurationParameter>
        <name>timeParam</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>dateParam</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>dateTimeParam</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>enumParam</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>passwordParam</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>typeParam</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>documentParam</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
      <configurationParameter>
        <name>textAreaParam</name>
        <type>String</type>
        <multiValued>false</multiValued>
        <mandatory>false</mandatory>
      </configurationParameter>
    </configurationParameters>
    <configurationParameterSettings/>
    <typeSystemDescription/>
    <typePriorities/>
    <fsIndexCollection/>
    <capabilities>
      <capability>
        <inputs/>
        <outputs/>
        <languagesSupported/>
      </capability>
    </capabilities>
    <operationalProperties>
      <modifiesCas>true</modifiesCas>
      <multipleDeploymentAllowed>true</multipleDeploymentAllowed>
      <outputsNewCASes>false</outputsNewCASes>
    </operationalProperties>
  </analysisEngineMetaData>
  <resourceManagerConfiguration/>
</analysisEngineDescription>

Recommended ancillary knowledge resources

In order to further encourage interoperability, OpenMinTeD makes specific recommendations about particular knowledge resources that TDM tools and services should use. These recommendations are in the areas of linguistics and of the initial domains of use targeted by OpenMinTeD. The current recommendations should not be seen as a final and static set. They will evolve with experience, and as OpenMinTeD is used for TDM of new domains. Users are therefore encouraged to use the existing recommendations, but to make use of others where these are not suitable.

TDM tools and services should use resources from the following initial list where possible. Where this is not possible, knowledge resource authors are encouraged to provide linkages between their own resource and those given here, or to any other widely used or standard Linked Data knowledge resource. This list of recommended resources should be seen as a first version, and will be extended.

Social sciences resources
TheSoz

Agriculture and agronomy resources
Agrovoc
Ontologies from AgroPortal

Life sciences resources
OboInOwl
MeSH (available in LOD)
BioC
NeuroLex
BioLexicon

Linguistic resources
LAPPS (vocabulary of core linguistic objects)
Universal Dependencies (part of speech tags, features for morphology and syntactic dependencies)
OLIA (reference model and annotation models for morphology, morphosyntax, dependencies)
Penn Treebank (part of speech tags and features of morphology)
ISOcat/CCR [1] (linguistic and metadata terminology)
GOLD (linguistic ontology)

Type systems
used by the software components integrated in the OpenMinTeD platform (GATE, DKPRO, ALVIS, ARGO and ILSP)

[1] ISOcat has recently moved to the CLARIN Concept Registry (CCR) and is currently under curation.

Recommended schema for software resources

This section includes a synopsis of the recommended schema for software resources, i.e. the subset of M(andatory) and strongly R(ecommended) metadata elements, only as regards elements related to the resource itself. Additional elements required for the management of the metadata record (e.g. metadataCreationDate, metadataCreator etc.) are not presented here, as they are to be handled by the OMTD platform.

OMTD-SHARE element    Usage

resourceType    M

resourceName    M

description    M

identifier    M

version    M

componentDistributionMedium    M

componentType    M

licence or rightsStmtName & rightsStmtURL (one of the two must be provided)    M

version of licence    M

contactEmail or landingPage (one of the two must be provided)    M

contactPerson (identifier or personName)    R

contactGroup (identifier or organizationName)    R

mailingList (mailingListName, subscribe, unsubscribe, post, archive, otherArchive)    R

issueTracker    R

onlineHelpURL    R

mustBeCitedWith    R

downloadURL or accessURL (one of the two should be provided)    M when applicable

resourceCreator (person or organization, described with identifier or name)    R

mediaType inside inputContentResourceInfo or outputResourceInfo (i.e. mediaType of input and output resource)    M when applicable

resourceType inside inputContentResourceInfo or outputResourceInfo    R when applicable

language inside inputContentResourceInfo or outputResourceInfo    R when applicable

characterEncoding inside inputContentResourceInfo or outputResourceInfo    R when applicable

mimeType inside inputContentResourceInfo or outputResourceInfo    R when applicable

dataFormatSpecific inside inputContentResourceInfo or outputResourceInfo    R when applicable

typesystem inside inputContentResourceInfo or outputResourceInfo    R when applicable

tagset inside inputContentResourceInfo or outputResourceInfo    R when applicable

annotationLevel inside inputContentResourceInfo or outputResourceInfo    R when applicable

typesystem    R

tagset    R

annotationResource    R

framework    R

for parameters: parameterName, description, parameterType, mandatory, multiValue    M when applicable

relationType = isCompatibleWith (external relation; link to models, annotation resources etc. that can be used with the component)    R
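The mandatory rows of the synopsis above lend themselves to a mechanical check. The following Python sketch shows one way this could look; it assumes a metadata record has already been flattened into a plain dictionary keyed by element name, which is a simplification made for illustration (OMTD-SHARE records are not defined that way here). The helper name and all example values are hypothetical.

```python
# Unconditionally mandatory elements, taken from the synopsis table.
MANDATORY = [
    "resourceType",
    "resourceName",
    "description",
    "identifier",
    "version",
    "componentDistributionMedium",
    "componentType",
]


def missing_elements(record):
    """Return the names of missing mandatory elements (empty list = OK)."""
    problems = [name for name in MANDATORY if not record.get(name)]

    # "licence or rightsStmtName & rightsStmtURL (one of the two must be provided)"
    has_rights_stmt = record.get("rightsStmtName") and record.get("rightsStmtURL")
    if not (record.get("licence") or has_rights_stmt):
        problems.append("licence or rightsStmtName & rightsStmtURL")

    # "contactEmail or landingPage (one of the two must be provided)"
    if not (record.get("contactEmail") or record.get("landingPage")):
        problems.append("contactEmail or landingPage")

    return problems


# Hypothetical record; every value below is invented for the example.
EXAMPLE = {
    "resourceType": "component",
    "resourceName": "Example tagger",
    "description": "Part-of-speech tagger used only as an illustration.",
    "identifier": "org.example:example-tagger:1.0.0",
    "version": "1.0.0",
    "componentDistributionMedium": "executableCode",
    "componentType": "tagger",
    "licence": "ApacheLicence_2.0",
    "contactEmail": "resource@example.com",
}

print(missing_elements(EXAMPLE))
```

Elements marked "M when applicable" are left out of the sketch on purpose, because their applicability depends on what the component actually consumes and produces.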

resourceType

Usage

Mandatory

Type

Closed controlled vocabulary

Attributes

Controlled vocabulary reference and/or values

ms:resourceType: corpus, lexicalConceptualResource, languageDescription, model, component

Definition/Explanations

Specifies the type of the resource being described or the type of the resource that a tool or service takes as input or produces as output

Recommended usage

For components, the fixed value "component" must be added automatically

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type
DataCite 4.0: skos:closeMatch datacite:resourceTypeGeneral & datacite:resourceType

Recommended usage is to use one of the values "software", "service" or "workflow" for datacite:resourceTypeGeneral

resourceName

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

The full name by which the resource is known

Recommended usage

Please provide a short but descriptive and unique name for the resource, e.g. “OpenNLP tagger” instead of just “tagger of English”. Provide the name in English; if you want to add names in other languages, you can use the “lang” attribute. N.B. This element is intended for a human-readable/human-understandable name for the resource.

Relation to other metadata schemas

Maven POM 4.0.0: name
GATE: name
UIMA/UIMA-fit: name
DCMI: skos:exactMatch dct:title
DataCite 4.0: skos:exactMatch datacite:title

description

Usage

Mandatory

Type

Multilingual free text

Attributes

xs:lang

Definition/Explanations

Provides the description of the resource in prose

Recommended usage

Give a brief yet informative description of the functionalities of the component, the language(s) it works on, input requirements etc. Please provide the text in English; if you want to add texts in other languages, you can add them using the “lang” attribute to specify the language.

Relation to other metadata schemas

Maven POM 4.0.0: description
GATE: comment
UIMA/UIMA-fit: description
DCMI: skos:exactMatch dct:description
DataCite 4.0: Description & descriptionType with value "abstract"

identifier

Usage

Mandatory

Type

free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI

Definition/Explanations

Reference to a PID, DOI or any kind of identifier used by the resource provider for the resource

Recommended usage

Provide a unique identifier already assigned by an authoritative source; you can use either

- the attribute "resourceIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, HDL, ISLRN etc.), or,
- if the scheme is not listed among them, select the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to.

If the resource doesn't have a unique identifier, an identifier will be assigned by OpenMinTeD.

For components harvested from Maven, the Maven id can be used with a reference to the Maven scheme (https://maven.apache.org/pom.html#Maven_Coordinates). This is combined with the Java fully qualified class naming conventions to give the following coordinates: groupId:artifactId:version:(packaging):(classifier)#class

Relation to other metadata schemas

Maven POM 4.0.0: groupId & artifactId & version & packaging & classifier, with resourceIdentifierSchemeURI="https://maven.apache.org/pom.html#Maven_Coordinates"
GATE: class
UIMA/UIMA-fit: class
DCMI: skos:narrowMatch dct:identifier
DataCite 4.0: skos:broadMatch datacite:identifier (identifierType can only be DOI)
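The Maven-based coordinate pattern groupId:artifactId:version:(packaging):(classifier)#class can be sketched in Python as follows. The guidelines do not say how an omitted optional part is written, so this sketch assumes it is kept as an empty segment so that positions stay unambiguous; that convention, the helper names and the example coordinates are all assumptions made for illustration.

```python
FIELDS = ["groupId", "artifactId", "version", "packaging", "classifier"]


def build_identifier(group_id, artifact_id, version, class_name,
                     packaging="", classifier=""):
    """Join Maven coordinates and a Java class name into one identifier.

    Assumption: omitted optional parts are kept as empty segments.
    """
    coords = ":".join([group_id, artifact_id, version, packaging, classifier])
    return f"{coords}#{class_name}"


def parse_identifier(identifier):
    """Split an identifier produced by build_identifier into its parts."""
    coords, _, class_name = identifier.partition("#")
    parts = coords.split(":")
    if len(parts) != len(FIELDS) or not class_name:
        raise ValueError(f"not a coordinate identifier: {identifier!r}")
    result = dict(zip(FIELDS, parts))
    result["class"] = class_name
    return result


# Hypothetical component coordinates.
IDENTIFIER = build_identifier(
    "org.example", "example-tagger", "1.0.0",
    "org.example.Tagger", packaging="jar",
)
print(IDENTIFIER)
print(parse_identifier(IDENTIFIER))
```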

version

Usage

Recommended

Definition/Explanations

Any string, usually a number, that identifies the version of a resource

Recommended usage

For components, the recommended practice is to follow semantic versioning (http://semver.org/). N.B. "version" should not be confused with the relation that links together a specific resource with its various enriched or modified versions (e.g. annotated version, subset etc.).

Relation to other metadata schemas

Maven POM 4.0.0: version
DCMI: skos:exactMatch dct:hasVersion
DataCite 4.0: skos:exactMatch datacite:Version
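A component provider who wants to check a version string against the semantic versioning pattern before submitting metadata could use something like the Python sketch below. It is a simplified check written for illustration: it accepts MAJOR.MINOR.PATCH with optional pre-release and build parts, but does not enforce every rule of the specification (for example, it does not reject leading zeros in numeric pre-release identifiers). The function name is hypothetical.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH[-prerelease][+build]
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
    r"(?:\+[0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*)?"
)


def looks_like_semver(version):
    """Return True if the whole string matches the simplified pattern."""
    return SEMVER.fullmatch(version) is not None


for candidate in ["1.0.0", "1.2.3-beta.1+build.5", "1.0", "v1.2.3"]:
    print(candidate, looks_like_semver(candidate))
```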

componentType

Usage

Mandatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:componentType: access, reader, writer, supportComponent, visualizer, debugger, validator, viewer, corpusViewer, lexiconViewer, editor, mlTrainer, mlPredictor, featureExtractor, dataSplitter, dataMerger, converter, evaluator, flowController, scriptBasedAnalyzer, matcher, gazetteerBasedComponent, crowdSourcingComponent, dataCollector, crawler, processingComponent, annotator, segmenter, stemmer, lemmatizer, tagger, chunker, parser, coreferenceAnnotator, namedEntityRecognizer, semanticsAnnotator, srlAnnotator, readabilityAnnotator, aligner, generator, summarizer, simplifier, naturalLanguageGenerator, prePostProcessor, spellingChecker, grammarChecker, normalizer, filters, extractor, topicExtractor, documentClassifier, languageIdentifier, sentimentAnalyzer, keywordsExtractor, terminologyExtractor, contradictionDetector, emotionRecognizer, eventExtractor, persuasiveExpressionMiner, informationExtractor, lexiconExtractorFromCorpora, lexiconExtractorFromLexica, wordSenseDisambiguator, qualitativeAnalyser

Definition/Explanations

Specifies the type of the component in terms of the function/task it performs

Recommended usage

Please select one of the predefined values. It should be noted that the values are hierarchically organised, so it's recommended to select the most specific value applicable (e.g. "visualizer" rather than the broader "supportComponent"). The current list of values is intended for use mainly by simple components rather than workflows or full applications. The list will be further enriched with values that also target end-users.

Relation to other metadata schemas

DCMI: skos:narrowMatch dct:type

licence

Usage

Mandatory upon conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:licence: CC-BY, CC-BY-NC, CC-BY-NC-ND, CC-BY-NC-SA, CC-BY-ND, CC-BY-SA, CC-ZERO, PDDL, ODC-BY, ODbL, MS-NoReD, MS-NoReD-FF, MS-NoReD-ND, MS-NoReD-ND-FF, MS-NC-NoReD, MS-NC-NoReD-FF, MS-NC-NoReD-ND, MS-NC-NoReD-ND-FF, ELRA_END_USER, ELRA_EVALUATION, ELRA_VAR, CLARIN_PUB, CLARIN_ACA, CLARIN_ACA-NC, CLARIN_RES, AGPL, ApacheLicence_2.0, BSD_4-clause, BSD_3-clause, FreeBSD, GFDL, GPL, LGPL, MIT, Princeton_Wordnet, proprietary, underNegotiation, nonStandardLicenceTerms

Definition/Explanations

The licence of use for the resource

Recommended usage

You can provide information on the rights of accessing and using a resource in one of the following ways, in order of preference:

- use the element "licence" and select one of the recommended licences; please note that the list contains licences intended for data resources & components mixed together; for components the recommended licences are the Open Source licences; for data resources, please use a standard licence such as one of the CC family;
- if the licence you use is not included in the list, you can use the "nonStandardLicenceTerms" or the "proprietary" values and give further information on your licence in the elements: "nonStandardLicenceName", "nonStandardLicenceTermsURL" and "nonStandardLicenceTermsText";
- you can also use the "rightsStatementName" and the "rightsStatementURL" (with a link to a URL with more explanations on its usage) if the resource is provided with a general statement of use and not an official licence document; please note that this is an option used mainly to facilitate end-users in accessing your resource, while you are strongly advised to properly license your resource.

Relation to other metadata schemas

Maven POM 4.0.0: license/name
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rights

rightsStmtName

Usage

Mandatory upon conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

openAccess, closedAccess, embargoedAccess, restrictedAccess

Definition/Explanations

The name of an official statement indicative of licensing terms for the use of a resource (e.g. open access, free to read etc.); its semantics should be clear, preferably formally expressed and stored at a URL.

The current list of predefined values comes from OpenAIRE, but it's under revision.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rights

rightsStmtURL

Usage

Mandatory upon conditions

Conditions for usage

either licence or rightsStmt must be filled in

Type

URL pattern

Definition/Explanations

Link to the URL with the text that formally explains the licensing conditions imposed by the rights statement.

Recommended usage

The "rightsStmtName" and "rightsStmtURL" elements can be used in addition to the "licence" value in order to help users understand the licensing terms of a resource.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:accessRights
DataCite 4.0: skos:closeMatch datacite:rightsURI

nonStandardLicenceTermsURL

Usage

Mandatory upon conditions

Conditions for usage

when one of the values "nonStandardLicenceTerms" or "proprietary" is selected for "licence"

Type

URL pattern

Definition/Explanations

Used to provide a hyperlink to a URL containing the text of a licence not included in the predefined list or describing the terms of use for a language resource or terms of service for web services

Recommended usage

Please provide the link to the full text document of the licence. Please note that this is the preferred option over inserting the licence text in the element "nonStandardLicenceTermsText", as it provides a permanent location for the licence that is accessible to all.

Relation to other metadata schemas

Maven POM 4.0.0: license/url
DCMI: skos:closeMatch dct:license
DataCite 4.0: skos:closeMatch datacite:rightsURI
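The condition attached to nonStandardLicenceTermsURL can be expressed as a small check. The Python sketch below assumes, purely for illustration, that a record is available as a dictionary keyed by element name; the function name and the example URL are hypothetical.

```python
# "licence" values that make nonStandardLicenceTermsURL mandatory.
NEEDS_TERMS_URL = {"nonStandardLicenceTerms", "proprietary"}


def terms_url_missing(record):
    """Return True when the licence value requires a terms URL but none is given."""
    return (record.get("licence") in NEEDS_TERMS_URL
            and not record.get("nonStandardLicenceTermsURL"))


print(terms_url_missing({"licence": "proprietary"}))
print(terms_url_missing({
    "licence": "proprietary",
    "nonStandardLicenceTermsURL": "https://example.org/licence.txt",
}))
```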

version of licence

Usage

Mandatory

Type

free text

Definition/Explanations

The version of the licence

Recommended usage

You are advised to indicate the version of the licence of your resource; the latest version is the preferred option, e.g. "4.0" for all CC-licences and "2.0" for the META-SHARE-NoReD ones.

Relation to other metadata schemas

DCMI: skos:closeMatch dct:hasVersion (for dct:licenseDocument)

componentDistributionMedium

Usage

Mandatory

Type

closed controlled vocabulary

Controlled vocabulary reference and/or values

ms-omtd:componentDistributionMedium: webService, sourceCode, executableCode, sourceAndExecutableCode

Definition/Explanations

The medium/form of the distribution (e.g. downloadable resource, accessible through interface, source code etc.)

accessURL

Usage

Recommended under conditions

Type

URL pattern

Definition/Explanations

A landing page, feed, SPARQL endpoint etc. that gives access to the resource or where the web service/workflow is executed

Recommended usage

Please use for components that are executable as web services

downloadURL

Usage

Recommended under conditions

Type

URL pattern

Definition/Explanations

Any URL where the resource can be downloaded from

Recommended usage

Please use if the component is distributed as source and/or executable code, and has to be downloaded in order to be executed; this element is of particular importance if you have not uploaded the resource in the repository

Relation to other metadata schemas

Maven POM 4.0.0: can be done through ID

contactEmail

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

email pattern

Definition/Explanations

A general email address that can be used as contact point for a resource (e.g. resource@example.com)

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address in the "contactEmail" element, or
- provide at "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements.

landingPage

Usage

Mandatory under conditions

Conditions for usage

An email or a landingPage must be provided

Type

URL pattern

Definition/Explanations

A URL used as the landing page of a resource providing general information; for instance, it may present a description of the resource, its creators and possibly include links to the URL where it can be accessed from

Recommended usage

You can indicate a contact point where users can solicit further information in one of the following ways, in order of preference:

- give a general email address in the "contactEmail" element, or
- provide at "landingPage" the link to a web page that documents the resource (e.g. a page with documentation, examples and links to the resource itself).

You can also indicate the person(s) or group(s) that are responsible for communication in the "contactPerson" and "contactGroup" elements.

Relation to other metadata schemas

Maven POM 4.0.0: url

contactPerson (identifier or personName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) that is/are responsible for providing further information regarding the resource

Recommended usage

The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme.

If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name", at least in English; if you want to add names in other languages, you can use the “lang” attribute. The element can also be repeated to encode multiple persons.

If you decide to add a contactPerson instead of a general contactEmail, please ensure that the data (including the email) of this person are also uploaded in OpenMinTeD.

Relation to other metadata schemas

DataCite 4.0: contributor with datacite:contributorType="ContactPerson" (datacite:contributorName (familyName & givenName) or datacite:nameIdentifier and datacite:nameIdentifierScheme and datacite:schemeURI)

contactGroup (identifier or organizationName)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the group(s) that is/are responsible for providing further information regarding the resource

Recommended usage

The recommended way for referring to a group (currently modelled as an organization) is by giving their identifier (e.g. ISNI, fundref); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme.

If you don't know the identifier of the group (organization), you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute.

If you decide to add a contactGroup instead of another contact option, please ensure that the data (including the communication data) of this group (organization) are also uploaded in OpenMinTeD.

Relation to other metadata schemas

Maven POM 4.0.0: developers

mailingListInfo

Usage

Recommended

Type

set of metadata elements

Definition/Explanations

Set of metadata elements (name, subscribe, unsubscribe, post, archive, otherArchive) required for documenting a mailing list

Recommended usage

Mailing lists are important for tracking information useful for developers and/or users; the whole set of elements in the mailingList group can be repeated for recording multiple mailing lists.

Relation to other metadata schemas

Maven POM 4.0.0: Mailing list

onlineHelpURL

Usage

Recommended

Type

URL pattern

Definition/Explanations

A URL intended for end-users providing useful information regarding the component usage/application, e.g. execution tips, FAQs, help forums etc.

Relation to other metadata schemas

GATE: helpurl

issueTracker

Usage

Recommended

Type

URL pattern

Definition/Explanations

The URL where issues, bugs, and feature requests should be submitted; this information is important for s/w developers

Relation to other metadata schemas

Maven POM 4.0.0: issuemanagement/url

mustBeCitedWith

Usage

Recommended

Type

free text or identifier

Definition/Explanations

Publication to be used for citation purposes as requested by resource providers (usually a scientific article that describes the resource)

Recommended usage

The preferred option to refer to a publication is by providing its unique identifier already assigned by an authoritative source; the preferred identifier for publications is DOI; you can use either

- the attribute "publicationIdentifierSchemeName" to specify the scheme, by selecting one of the pre-defined values (e.g. DOI, ISBN etc.), or,
- if the scheme is not listed among them, use the "other" value and use the attribute "schemeURI" to provide a link to the URL that documents the scheme it adheres to.

If you don't know the publication identifier, you can provide the full bibliographic record as free text.

N.B. The citation publication should not be confused with the attribution data, which is a legal obligation; citation through publications is a common practice in research.

resourceCreator (person or organization, described with identifier or name)

Usage

Recommended

Type

identifier or multilingual free text

Attributes

for person: ms:personIdentifierSchemeName (for identifiers) or xs:lang (for name); for organization: ms:organizationIdentifierSchemeName (for identifiers) or xs:lang (for name)

Definition/Explanations

Groups information on the person(s) or organization(s) that has/have created the resource

Recommended usage

The recommended way for referring to a person is by giving their identifier, preferably the ORCID; if you provide the identifier, please also select the relevant value from the list of values in the attribute "personIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the person, you may provide the name, preferably in the format "Surname, First name", at least in English; if you want to add names in other languages, you can use the “lang” attribute.

The recommended way for referring to an organization is by giving their identifier (e.g. ISNI); if you provide the identifier, please also select the relevant value from the list of values in the attribute "organizationIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the organization, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute.

The element can also be repeated to encode multiple persons/organizations.

For corpora created through the OMTD corpus building process, the resource creator is considered to be the person that has put together the corpus through the user query.

Relation to other metadata schemas

Maven POM 4.0.0: developers
DCMI: skos:closeMatch dct:creator
DataCite 4.0: creator with creatorName or nameIdentifier & nameIdentifierScheme & schemeURI; N.B. creatorName familyName & givenName in v4

mediaType inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

Closed controlled vocabulary

Controlled vocabulary reference and/or values

ms:mediaType: text, audio, video, image

Definition/Explanations

Specifies the media type of the resource that the component processes and/or produces.

Recommended usage

OpenMinTeD only handles text resources, so "text" is the only value allowed.

resourceType inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

controlled vocabulary

Controlled vocabulary reference and/or values

ms:resourceType: corpus, document, userInputText, lexicalConceptualResource, languageDescription

Definition/Explanations

The type of the resource that the component takes as input or produces as output

Recommended usage

Please use especially for readers and writers in order to specify the resource type they can process or produce; e.g. for readers, whether they take as input a document (single file) or collection of files (corpus).

Relation to other metadata schemas

GATE: parameters
UIMA/UIMA-fit: Parameters input/output types

language inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:language (a combination of languageId, scriptId, regionId and variantId according to the IETF BCP 47 guidelines)

Definition/Explanations

The language(s) of the text that the component supports (takes as input and/or produces), expressed according to IETF BCP 47 guidelines. The element can be repeated to encode multiple languages.

Recommended usage

Please enter the language and, if needed, the region, script and variant identifier that best fits the language of the document (e.g. en-US) that the component supports (takes as input and/or produces), expressed according to the IETF BCP 47 guidelines. The element can be repeated for components that support multiple languages.

Relation to other metadata schemas

UIMA/UIMA-fit: @LanguageCapability
DataCite 4.0: language (but this is the language of the resource and not of input/output)
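To illustrate how a language value combines languageId, scriptId, regionId and variantId, the Python sketch below splits a tag such as en-US into those parts. It is deliberately simplified and is an assumption-laden illustration, not a full BCP 47 parser: it handles only language[-script][-region][-variant] and ignores extended language subtags, multiple variants, extensions, private-use and grandfathered tags. The function name is hypothetical.

```python
import re

# Simplified shape: language[-script][-region][-variant]
LANGTAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|\d{3}))?"
    r"(?:-(?P<variant>[A-Za-z0-9]{5,8}|\d[A-Za-z0-9]{3}))?"
)


def split_language_tag(tag):
    """Return the subtags that are present in a simple language tag."""
    match = LANGTAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    return {name: value for name, value in match.groupdict().items()
            if value is not None}


for tag in ["el", "en-US", "sr-Latn-RS", "de-CH-1901"]:
    print(tag, split_language_tag(tag))
```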

characterEncoding inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:characterEncoding: a long list of popular character encodings

Definition/Explanations

The name of the character encoding used in the resource or supported by the component

Recommended usage

Please select one of the pre-defined values; it should be noted, however, that for OpenMinTeD the preferred character encoding is UTF-8 to ensure interoperability between content and components. The element can be repeated for components that support various character encodings.

Relation to other metadata schemas

GATE: Parameters/encoding

mimeType inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

ms:mimetype (a subset of values (the most popular ones for text files) from the IANA mimetype controlled vocabulary): text/plain, application/vnd.xmi+xml, text/xml, application/x-tmx+xml, application/x-xces+xml, application/tei+xml, application/rdf+xml, application/xhtml+xml, application/emma+xml, application/pls+xml, application/postscript, application/voicexml+xml, text/sgml, text/html, application/x-tex, application/rtf, application/json+ld, application/x-latex, text/csv, text/tab-separated-values, application/pdf, application/x-msaccess, audio/mp4, audio/mpeg, audio/wav, image/bmp, image/gif, image/jpeg, image/png, image/svg+xml, image/tiff, video/jpeg, video/mp4, video/mpeg, video/x-flv, video/x-msvideo, video/x-ms-wmv, application/msword, application/vnd.ms-excel, audio/mpeg3, text/turtle, other, audio/PCMA, audio/flac, audio/speex, audio/vorbis, video/mp2t

Definition/Explanations

The mime-type of the resource (a formalized specifier for the format) or a mime-type that the component supports, in conformance with the values of the IANA (Internet Assigned Numbers Authority)

Recommended usage

Please select one of the pre-defined values (which are the most popular ones for text files) or add a value, preferably from the IANA media MIME type recommended values (http://www.iana.org/assignments/media-types/media-types.xhtml). The element can be repeated for components that support multiple mime types.

Relation to other metadata schemas

UIMA/UIMA-fit: @MimeTypeCapability

dataFormatSpecific inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

open controlled vocabulary

Controlled vocabulary reference and/or values

aclAnthology, aimedCorpus, alvisEnrichedDocument, bioNLP, bioNLP;format-variant=ST2013a1_a2, bnc, cadixeJSON, conll2000, conll2002, conll2006, conll2007, conll2009, conll2012, dataSift, factoredTagLem, gate, genia, graf, html5Microdata, i2b2, imsCwb, jdbc, keaCorpus, lll, negraExport, pml, ptb;format-variant=chunked, ptb;format-variant=combined, relptiger, tupp-dz, twitter, uimaBinaryCas, uimaCASDump, web1t, xces;format-variant=ilsp

Definition/Explanations

The supplementary level of data format

Recommended usage

Please use to further specify the format of the resource supported by the component (as input or output). For interoperability reasons, it is important to standardise this element as far as possible; this is why a list of values including the formats currently supported by components in the OMTD registry is provided. Where possible, it is also recommended to use the "documentationURL" element with information and examples about the specific data format.

Relation to other metadata schemas

UIMA/UIMA-fit: @MimeTypeCapability

typesystem inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

when the s/w component takes as input (or provides as output) a resource that uses a specific typesystem

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the typesystem used in the annotation of the resource or used by the component

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please also select the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme.

If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the “lang” attribute.

For interoperability reasons, it is recommended to describe typesystems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.

tagset inside inputContentResourceInfo or outputResourceInfo

Usage

Mandatory when applicable

Conditions for usage

if the input content resource (i.e. the resource to be mined) or the output resource (the results of the processing) is to be described, this element is obligatory

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the tagset used in the annotation of the resource or used by the component

Recommended usage

annotationLevelinsideinputContentResourceInfooroutputResourceInfo

Usage

Mandatorywhenapplicable

Conditionsforusage

iftheinputcontentresource(i.e.theresourcetobemined)ortheoutputresource(theresultsoftheprocessing)istobedescribed,thiselementisobligatory

Type

opencontrolledvocabulary

Controlledvocabularyreferenceand/orvalues

ms:annotationLevel: alignment, discourseAnnotation, discourseAnnotation-argumentation, discourseAnnotation-audienceReactions, discourseAnnotation-coreference, discourseAnnotation-dialogueActs, discourseAnnotation-discourseRelations, lemmatization, morphosyntacticAnnotation-bPosTagging, morphosyntacticAnnotation-posTagging, segmentation, semanticAnnotation, semanticAnnotation-certaintyLevel, semanticAnnotation-emotions, semanticAnnotation-events, semanticAnnotation-namedEntities, semanticAnnotation-polarity, semanticAnnotation-questionTopicalTarget, semanticAnnotation-readabilty, semanticAnnotation-semanticClasses, semanticAnnotation-semanticRelations, semanticAnnotation-semanticRoles, semanticAnnotation-speechActs, semanticAnnotation-subjectivity, semanticAnnotation-temporalExpressions, semanticAnnotation-textualEntailment, semanticAnnotation-wordSenses, syntacticAnnotation-semanticFrames, speechAnnotation, speechAnnotation-orthographicTranscription, speechAnnotation-paralanguageAnnotation, speechAnnotation-phoneticTranscription, speechAnnotation-prosodicAnnotation, speechAnnotation-soundEvents, speechAnnotation-soundToTextAlignment, speechAnnotation-speakerIdentification, speechAnnotation-speakerTurns, stemming, structuralAnnotation, structuralAnnotation-documentDivisions, structuralAnnotation-sentences, structuralAnnotation-clauses, structuralAnnotation-phrases, structuralAnnotation-words, syntacticAnnotation-subcategorizationFrames, syntacticAnnotation-dependencyTrees, syntacticAnnotation-constituencyTrees, syntacticAnnotation-chunks, syntacticosemanticAnnotation-links, translation, transliteration, modalityAnnotation-bodyMovements, modalityAnnotation-facialExpressions, modalityAnnotation-gazeEyeMovements, modalityAnnotation-handArmGestures, modalityAnnotation-handManipulationOfObjects, modalityAnnotation-headMovements, modalityAnnotation-lipMovements, other

Definition/Explanations

The annotation level of the annotated resource, or the annotation level that a s/w component consumes as input or produces as output

Relation to other metadata schemas

UIMA/UIMA-fit: @TypeCapability
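As a rough illustration of how a registry client might treat an open controlled vocabulary, the sketch below checks an annotationLevel value against a small subset of the values listed above. It assumes that "open" means unlisted values are tolerated and merely flagged; the function name and the "known"/"custom" labels are hypothetical and not part of the OMTD-SHARE schema.

```python
# A small subset of the ms:annotationLevel values listed in this section.
KNOWN_ANNOTATION_LEVELS = {
    "lemmatization",
    "morphosyntacticAnnotation-posTagging",
    "semanticAnnotation-namedEntities",
    "structuralAnnotation-sentences",
    "syntacticAnnotation-dependencyTrees",
    "other",
}


def classify_annotation_level(value: str) -> str:
    """Return 'known' for a listed value and 'custom' for any other value.

    Assumption: the vocabulary is open, so an unlisted value is flagged
    rather than rejected. Only an empty value is treated as an error.
    """
    value = value.strip()
    if not value:
        raise ValueError("annotationLevel must not be empty")
    return "known" if value in KNOWN_ANNOTATION_LEVELS else "custom"
```

A caller could, for instance, accept "custom" values but prompt the provider to check whether one of the listed values fits better.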


typesystem inside componentDependencies

Usage

Mandatory when applicable

Conditions for usage

when the s/w component uses a specific type system for its operation

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the type system used in the annotation of the resource or used by the component

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.
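The recommended usage above amounts to a small decision rule, which can be sketched as follows. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, names are modelled as a mapping from language code to text, and "en" is assumed to be the code for English. It is not the validation logic of the OpenMinTeD registry.

```python
from typing import Optional


def check_resource_reference(
    identifier: Optional[str] = None,
    scheme_name: Optional[str] = None,
    scheme_uri: Optional[str] = None,
    names: Optional[dict] = None,
) -> list:
    """Return a list of problems with a resource reference (empty if none).

    Mirrors the recommended usage: prefer an identifier with its scheme
    name; for scheme "other", also give a schemeURI; otherwise fall back
    to a name that is available at least in English.
    """
    problems = []
    names = names or {}
    if identifier:
        if not scheme_name:
            problems.append("identifier given without resourceIdentifierSchemeName")
        elif scheme_name == "other" and not scheme_uri:
            problems.append('scheme "other" should come with a schemeURI')
    elif not names:
        problems.append("neither an identifier nor a name was provided")
    elif "en" not in names:
        problems.append("a name should be given at least in English")
    return problems
```

The same rule applies wherever this guide repeats the recommendation (tagset, annotationResource, relatedResource1, relatedResource2).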


tagset inside componentDependencies

Usage

Mandatory when applicable

Conditions for usage

when the s/w component uses a specific tagset for its operation

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the tagset used in the annotation of the resource or used by the component

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.


annotationResource inside componentDependencies

Usage

Mandatory when applicable

Conditions for usage

when the s/w component uses a specific annotation resource for its operation

Type

identifier or multilingual free text

Attributes

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A resource (e.g. ontology, terminological resource) used for annotating a document, corpus, sentence etc.

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe type systems, tagsets, annotation resources etc. in the OpenMinTeD registry and refer to them through the identifier.


framework

Usage

Recommended

Controlled vocabulary reference and/or values

UIMA, GATE, AlvisNLP, other

Definition/Explanations

The framework used for developing and deploying the component


relationType

Usage

Recommended

Type

Open controlled vocabulary

Controlled vocabulary reference and/or values

ms:relationType: isPartOf, isPartWith, hasPart, hasOutcome, isCombinedWith, requiresLR, requiresSoftware, isexactMatch, isSimilarTo, isContinuationOf, isVersionOf, replaces, isReplacedWith, isCreatedBy, isElicitedBy, isRecordedBy, isEditedBy, isAnalysedBy, isEvaluatedBy, isQueriedBy, isAccessedBy, isArchivedBy, isDisplayedBy, isCompatibleWith

Definition/Explanations

Specifies the type of relation holding between two entities (e.g. two resources that together comprise one new resource, a corpus and the s/w component that has been used for its creation, or a corpus and the publication that describes it)

Recommended usage

For components, the recommended relation is isCompatibleWith, holding between the component and models, but any relationType can be used as appropriate.

Relation to other metadata schemas

DataCite 4.0: skos:closeMatch datacite:relationType


relatedResource1

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the source resource related to the target resource (relatedResource2) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.


relatedResource2

Usage

Mandatory when applicable

Conditions for usage

when relationType is filled in

Type

ms:resourceIdentifierSchemeName or ms:schemeURI (for identifiers) and xs:lang (for names)

Definition/Explanations

A name or an identifier (e.g. URL reference) to the target resource related to the source resource (relatedResource1) through a relation described in relationType

Recommended usage

The recommended way for referring to a resource is by giving its identifier; if you provide the identifier, please select also the relevant value from the list of values in the attribute "resourceIdentifierSchemeName"; if none is appropriate, please select "other" and use the "schemeURI" attribute to provide a link to a URL with more information about the identifier scheme. If you don't know the identifier of the resource, you may provide the name at least in English; if you want to add names in other languages, you can use the "lang" attribute. For interoperability reasons, it is recommended to describe all related resources in the OpenMinTeD registry and refer to them through the identifier.
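Taken together, relationType, relatedResource1 and relatedResource2 describe a directed relation from a source resource to a target resource. The sketch below models that triple and the "mandatory when relationType is filled in" condition. The class, field and function names are hypothetical, and resources are reduced to plain strings for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relation:
    """A relation between two resources, as described in this guide."""

    relation_type: Optional[str] = None       # e.g. "isCompatibleWith"
    related_resource_1: Optional[str] = None  # source resource (identifier or name)
    related_resource_2: Optional[str] = None  # target resource (identifier or name)


def missing_relation_fields(rel: Relation) -> list:
    """List the related-resource elements that are required but absent.

    Both related resources become mandatory only once relationType is set;
    without a relationType nothing is required.
    """
    if not rel.relation_type:
        return []
    missing = []
    if not rel.related_resource_1:
        missing.append("relatedResource1")
    if not rel.related_resource_2:
        missing.append("relatedResource2")
    return missing
```

For example, a relation with relationType "isCompatibleWith" and only a source resource would be reported as missing relatedResource2.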


The OMTD-SHARE metadata schema

The OMTD-SHARE metadata schema¹ is the recommended schema for the description of the resources. It has been conceived and designed in order to serve as a facilitator, providing the interoperability bridge between the various resource types involved in TDM processes, and as an intermediary with the target audience, including TDM developers and end-users.

Its design takes into consideration the fact that both resources and users come from different scientific communities and tries to achieve interoperability through a common core vocabulary for the description of resources and their properties, establishing links to the vocabularies already used by the various sources for this purpose. Standards and best practices of the source communities are taken aboard to the best extent possible. The main principles and strategies employed in the design of the OMTD-SHARE schema consist of the following:

- cover needs of resource discoverability and TDM processing
- cover documentation needs of all resource types involved in TDM
- be flexible enough to support varying degrees of documentation completeness
- organize the schema elements and accommodate common vs. particular features of resources
- reuse what is available vs. create and recommend new elements and values
- standardize/normalize user input vs. allow for free user input
- document processing procedure and outputs.


It has largely been based on the META-SHARE metadata schema [Gavrilidou et al. 2012], which caters for the description of language resources, encompassing both data (textual, multimodal/multimedia and lexical data, grammars, language models etc.) and technologies (tools/services) used for their processing. The OMTD-SHARE schema is more restricted in the sense that it focuses on text resources only, while it also extends the basic schema in order to include TDM-specific concepts and to describe processing procedures and workflows in an enhanced way.

As in META-SHARE, the schema sets out to document the full lifecycle of a resource, which also includes at least a minimal documentation of the satellite entities that participate in it, especially the relations that hold between them. The OMTD-SHARE data model thus comprises the following entities:

- the resources, further classified into:
  - corpora, i.e. datasets of text documents, mainly scholarly publications in OMTD-SHARE
  - lexical/conceptual resources, including lexica, ontologies, term lists, gazetteers etc., but also tagsets and annotation schemas, which are used for annotating corpora
  - language descriptions, which mainly refer to computational grammars
  - machine learning and statistical models²
  - software components, pieces of software, tools offered as locally executable codes or as web services, wrapped in a workflow or as standalone end-to-end applications
  - and, finally, publications, which constitute a peculiar resource type, as they are viewed in OpenMinTeD only in a collective form, as a "corpus",
- but also satellite entities, such as the actors, be it persons or organizations that have created the resources, or the projects that have funded them or where they are used.

Obviously, lexical/conceptual resources, language descriptions and models are ancillary resources used for the TDM operation. Corpora are an in-between case: they may be corpora used for the TDM operation, such as training or evaluation corpora, and thus play a supportive role, or they can be composed of scholarly publications, in which case they are approached as a proper content resource to be mined.

The schema is composed of metadata elements that are used to describe properties and relations between all these entities. Some of these elements, especially those that pertain to administrative features (e.g. identification, contact, licensing information etc.), are common to all types of resources, while other elements, mainly those representing technical features about the contents and format of resources, differ across types. As noted above, publications differ from other resource types: the metadata elements recommended for their description mainly derive from the need to serve as selection criteria in the corpus building process.


One of the characteristic features of the SHARE family of schemas³ is the adoption of the component-based mechanism (Component MetaData Infrastructure, CMDI), according to which semantically coherent elements are grouped together to form components⁴ [Broeder et al., 2008]. For instance, the licensing module includes elements such as the name and URL of a licence, attribution text, copyright holders, etc. For the sake of simplification, the container elements used for this grouping will not be presented in the guidelines unless required.

The OMTD-SHARE schema classifies elements into 3 levels of optionality:

- mandatory: elements that are necessary for intended purposes, i.e. for discovering resources and for triggering operations between content and s/w components
- recommended: elements that can help the current or future use of the resource, or useful information that providers have not yet standardized
- optional: all remaining information related to the lifecycle of a resource.
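One way to picture the three optionality levels is as severities in a completeness check: a missing mandatory element is an error, a missing recommended element is a warning, and a missing optional element is ignored. The sketch below illustrates this reading only. The mapping passed to the function is supplied by the caller, and the element names in the usage example ("resourceName", "notes") are hypothetical placeholders rather than a statement of which OMTD-SHARE elements carry which level.

```python
from enum import Enum


class Optionality(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


def review_record(record: dict, levels: dict) -> dict:
    """Report empty or absent elements of a metadata record by severity.

    `record` maps element names to values; `levels` maps element names to
    an Optionality. Missing optional elements are not reported.
    """
    report = {"errors": [], "warnings": []}
    for element, level in levels.items():
        if record.get(element):
            continue  # element is present and non-empty
        if level is Optionality.MANDATORY:
            report["errors"].append(element)
        elif level is Optionality.RECOMMENDED:
            report["warnings"].append(element)
    return report
```

For example, with "resourceName" assumed mandatory and "framework" recommended, a record containing only a framework value yields one error and no warnings.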

The schema is currently implemented as an XSD⁵. An important difference from META-SHARE lies in the organisation vis-a-vis the different resource types covered: while META-SHARE describes all resource types in one common XSD, in OMTD-SHARE the resource types are described in a more modular way, as separate sets of XSDs.

Work is also ongoing on producing an RDF/OWL version, which will be documented in the next release of the guidelines.

¹ The full OMTD-SHARE schema is documented at: https://openminted.github.io/releases/omtd-share/.

² Models could be considered as a subtype of language descriptions, but we decided to keep them distinct because they have a lot of properties that differentiate them from grammars; at this point it was also considered better to keep them apart as it would enhance their discoverability.

³ Based on the META-SHARE schema, four more adaptations are now available: ELRC-SHARE, clarin:el, and OMTD-SHARE. The META-SHARE schema has also been implemented as an RDF/OWL ontology with the collaboration of the ld4lt W3C group.

⁴ To avoid confusion with the term "component" also used for software components, we will from now on refer to this concept as "modules".

⁵ The current version of the XSDs is available at: https://github.com/openminted/omtd-share_metadata_schema and the documentation of v1.0.0 at: https://openminted.github.io/releases/omtd-share/1.0.0/


Glossary

annotation (text/corpus annotation)

A note by way of explanation or comment added to a text or diagram [Oxford English Dictionary, https://en.oxforddictionaries.com/definition/annotation]. In OpenMinTeD, the term refers mainly to text or corpus annotation, which is the practice of adding interpretative linguistic information grounded in a knowledge resource to a text or corpus respectively. For example, one common type of annotation is the addition of tags, or labels, indicating the word class to which lexical units in a text belong; these tags come from a predefined set (e.g. Noun, Verb, Preposition, etc.). Semantic labeling with terms and concepts from an ontology is another common example of annotation. Relationships such as syntactic dependencies or semantic relations that link entities of the text are also annotations.

annotation resource

Any resource that can be used for annotating a text, including part-of-speech tagsets, annotation schemes, domain-specific ontologies, etc.

annotation scheme

A set of elements and values designed to annotate data. An annotation scheme usually aims to represent a specific level of information, such as morphological features of words, syntactic dependency relations between phrases, discourse level information, etc. It can consist of a flat structure of elements and values (e.g. part-of-speech tags) or it can be more complex with interrelated elements (e.g. specific morphological features to be used for each part-of-speech).

application

Any software program (or group of programs seen as a whole) intended for the end-user and addressing one or multiple related user needs.

component (software component)

An algorithm wrapped in a standard way so that it can be integrated as a reusable tool or service within a particular component-oriented framework such as UIMA, GATE, etc.

corpus

A structured collection of pieces of data (textual, audio, video, multimodal/multimedia, etc.) typically of considerable size and selected according to criteria external to these data (e.g. size, type of language, type of producers or expected audience, etc.) to represent as comprehensively as possible the object of study.

data model

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world entities. [Wikipedia, https://en.wikipedia.org/wiki/Data_model]

distribution

Any form by which a resource can be shared; it can be a downloadable PDF or a plain text file, a form of a corpus accessible only through a web interface, or the source code of a software, etc.

document

A piece of written, printed, or electronic matter that is primarily intended for reading.

interoperability

Interoperability describes the extent to which systems and devices can work together, exchange data, and interpret that shared data. For two systems to be interoperable, they must be able to exchange data and subsequently present that data such that it can be understood by a user. [Research Data Alliance, http://smw-rda.esc.rzg.mpg.de/index.php/Interoperability]

licence

A permission or a written evidence of a permission that confers the licensee the right to do something that otherwise would be prevented by the law.

licence compatibility/interoperability

The condition or state in which two or more licences can co-exist or be combined without conflicting with each other. In OpenMinTeD, licence compatibility and licence interoperability are used as synonyms.

knowledge resource

A resource (data and/or tool) containing, producing or representing knowledge; knowledge is specific information that is relevant for the linguistic and conceptual interpretation of data. For OpenMinTeD purposes, this information is exploited or produced by TDM modules and tools, or exchanged between them.

language description

The resource describes a language or some aspect(s) of a language via a systematic documentation of linguistic structures. [Open Language Archives Community, http://www.language-archives.org/REC/type.html#language_description] Examples include sketch grammar, computational grammar, etc.

language resource

Language Resources (LRs) encompass (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, used to assist and augment language processing applications, but also, in a broader sense, in language and language-mediated research studies and applications, and (b) tools/technologies/services used for their processing.

lexical/conceptual resource

A resource organised on the basis of lexical or conceptual entries (lexical items, terms, concepts, etc.) with their supplementary information (e.g. grammatical, semantic, statistical information, etc.). In OpenMinTeD, they can be used for annotation purposes.


machine learning (ML) model

The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process. [http://docs.aws.amazon.com/machine-learning/latest/dg/training-ml-models.html]

metadata

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information. [National Information Standards Organization, Understanding metadata, http://www.niso.org/publications/press/UnderstandingMetadata.pdf]

open access (OA)

The free and online availability of literature, which allows to read, download, copy, distribute, print, search, or link to the full text, crawl articles for indexing, pass them as data to software, or use them for any other useful purpose. An availability that is granted without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself, and those related to giving authors control over the integrity of their work and the right to be properly acknowledged and cited [Budapest OA Initiative 2002; Bethesda Statement on OA Publishing 2003; Berlin Declaration on OA Knowledge in Science and Humanities 2003]

OpenMinTeD infrastructure

An infrastructure refers to the basic structures and facilities required for the operation of a system. The OpenMinTeD infrastructure consists of different layers of resources: content resources that can be mined, ancillary knowledge resources, tools and web services. Any resource that can be registered in the OpenMinTeD registry is part of the underlying infrastructure.

OpenMinTeD platform

The OpenMinTeD platform brings together all the services that facilitate the interoperability aspects of the underlying infrastructure (e.g. registration, search and browsing, creation of workflows, processing, annotation, etc.) and, thus, becomes an infrastructural service of the wider research ecosystem.

publication

A book, article, etc., that has been made available to the public either via a formal publication service or over the internet and is stored at an archive or repository. For OpenMinTeD purposes, this mainly covers scholarly publications.

resource

Something that you can use to help you to achieve something, especially in your work or study. [MacMillan dictionary, http://www.macmillandictionary.com/dictionary/british/resource_1]

rights statement

Formal or official statement asserting the copyright status and/or the licensing conditions for a given resource. It can be issued by an authoritative body (e.g. http://rightsstatements.org/). For OpenMinTeD purposes, it can be deemed similar to a "licence category", grouping licences that share similar features.

Text and Data Mining

Text and Data Mining (TDM) was initially defined as “the discovery by computer of new, previously unknown information, by automatically extracting and relating information from different (…) resources, to reveal otherwise hidden meanings” (Hearst, 1999), in other words, “an exploratory data analysis that leads to the discovery of heretofore unknown information, or to answers for questions for which the answer is not currently known” (Hearst, 1999). [FutureTDM, http://www.futuretdm.eu/news/tdm-definition/]

service/web service

Piece of software accessible through remote invocation typically using some REST-style APIs or SOAP protocols.


tool

Piece of (standalone) software typically for a very limited technical purpose, such as a particular implementation of a part-of-speech tagger (e.g. TreeTagger), a tree parsing program (e.g. mstparser), etc. Preferred terms in OpenMinTeD include 'component' and 'workflow'.

workflow

A series of software components assembled together in order to perform a specific task.
