The CLARIN-NL & CLARIN-VL Web Services Project

Marc Kemps-Snijders, Meertens Institute

Marc.kemps.snijders@meertens.knaw.nl

Ineke Schuurman, K.U. Leuven

Freudenstadt 2010-11-16

CLARIN Mission

Mission: create an infrastructure that makes language resources (annotated recordings, texts, lexica, ontologies) and technology (speech recognizers, lemmatizers, parsers, summarizers, information extractors) available and readily usable to scholars of all disciplines, in particular the Humanities and Social Sciences.

Web service workshop Freudenstadt

2010-11-16

www.clarin.eu

HOW ARE WE GOING TO DO THAT?

Strategy

• Building an infrastructure via CLARIN centers
• Demonstration projects
• Technical and financial support
• Include as many institutions as possible in the Netherlands and Flanders

Participants CLARIN-NL

• Utrecht Institute for Linguistics OTS
• Landelijke Onderzoeksschool Taalkunde
• Max Planck Institute for Psycholinguistics
• Meertens Instituut (KNAW)
• Huygens Instituut (KNAW)
• Data Archiving and Networked Services (KNAW)
• Fryske Akademy (KNAW)
• Digitale Bibliotheek voor de Nederlandse Letteren
• Instituut voor Nederlandse Lexicologie
• Centre for Language and Speech Technology
• Centre for Language Studies
• Amsterdam Center for Language and Communication
• Center for Language and Cognition
• Centre for Linguistics
• Tilburg Centre for Creative Computing
• Human Media Interaction Group
• Katholiek Documentatie Centrum
• Koninklijke Bibliotheek
• Veteraneninstituut
• Taal en Communicatie, Vrije Universiteit
• Instituut voor Beeld en Geluid
• Nederlands Instituut voor Oorlogsdocumentatie
• Aletta (Instituut voor Vrouwengeschiedenis)

Participants CLARIN-VL

• Centrum voor Computerlinguïstiek (CCL), K.U.Leuven
• Interdisciplinary research on Technology, Education & Communication (itec), K.U.Leuven (Kortrijk)
• Center for Computational Linguistics and Psycholinguistics (CLiPS), Universiteit Antwerpen
• Elektronica en Informatiesystemen (ELIS-DSSP), Universiteit Gent
• ESAT-PSI Spraak, K.U.Leuven
• Laboratory for Digital Speech and Audio Processing ETRO-DSSP, Vrije Universiteit Brussel
• Language Intelligence and Information Retrieval (HMDB-LIIR), K.U.Leuven
• Language and Translation Technology Team (LT³), Hogeschool Gent

CLARIN centers – Netherlands

• Max Planck Institute
• Meertens Institute
• INL
• DANS
• Huygens Institute

CLARIN centers provide (CLARIN-compatible) data and software that can be used by all researchers of participating institutions.

CLARIN centers – Flanders (in preparation)

• University of Antwerp
• Catholic University of Leuven

CLARIN centers provide (CLARIN-compatible) data and software that can be used by all researchers of participating institutions.

CLARIN-NL Call 1: Data Curation & Demonstrator Projects

• Involve a targeted user and address the user's research questions
• Open call for small subprojects (0.5 yr / max. 60k Euro each)
• 12 projects running, some already finished
• Will make available:
  – a range of curated resources
  – a range of showcases of CLARIN functionality
  – evidence-based requirements and desiderata for the CLARIN infrastructure and for supported standards and best practices

2009 projects (name: description)

• AAM-LR: Automatic Annotation of Multi-modal Language Resources
• Adelheid: A Distributed Lemmatizer for Historical Dutch
• ADEPT: Assaying Differences via Edit-Distance of Pronunciation Transcriptions
• DUELME-LMF: Converting DUELME into LMF format
• INTER-VIEWs: Curation of Interview Data
• MIMORE: Microcomparative Morphosyntax Research Tool
• SignLinC: Linking lexical databases and annotated corpora of signed languages
• TICCLops: Text-Induced Corpus Clean-up online processing system
• TDS Curator: A web-services architecture to curate the Typological Database System
• TQE: Transcription Quality Evaluation
• WFT-GTB: Integrating the Wurdboek fan 'e Fryske Taal into the Geïntegreerde Taalbank

CLARIN-NL Call 2: Data Curation & Demonstrator Projects

• New in 2010:
  – Second open call for data curation & demonstrator projects
  – Directly assigned projects
• Selection to be based (inter alia) on the results of a user survey
• Budget available: 400k Euro
  – Initial selection before the end of 2010

CLARIN-NL Infrastructure projects

• Infrastructure Implementation Project (IIP, 3 yrs)
  – infrastructure services: an open archiving service, registries, federation of centers; set up a schema registry; profile matching; ISOcat maintenance; add the relation registry RELCAT
  – coordinate and give guidance for work on web services and on wrapper and service bus specification and implementation; select workflow tools and experiment with them

CLARIN-NL Infrastructure projects

• Metadata project (MD, 0.5 yr)
  – Testing CMDI against existing national data
  – Creating an initial set of required metadata components
  – Results (see http://www.clarin.eu/cmdi): Component Registry, CMDI XML Toolkit, documentation for users and developers, a metadata tutorial (held)
• Search & Develop (S&D, 3 yrs)
  – centralized metadata search
  – distributed content search: text-based and structured search
  – national extension of the European Demonstrator project

Introduction Nijmegen, 2010-07-01

CLARIN-VL and CLARIN-NL

• TTNWW (3 yrs)
  – Cooperation between the Netherlands and Flanders
  – Implement user-friendly workflow services for:
    – Text: enriching text corpora with annotations, for literature researchers (Huygens) and archeologists (Sagalassos)
    – Speech: indexing and search of (a limited set of) audio and video data, for social historians (Aletta, KDC, KADOC, M2P)

TTNWW web services

TEXT
• Text preprocessing – Tilburg University
• Lexical Unit tagging – Tilburg University
• Shallow parsing – University of Antwerp/Tilburg University
• NER and coreference – University College Ghent
• Semantic role annotations – Utrecht University
• Spatiotemporal analysis – Catholic University of Leuven

SPEECH
• Speech recognizers – Catholic University of Leuven/University of Twente
• Segmentation, transcription, alignment – Ghent University/Catholic University of Leuven/University of Twente/Radboud University

TTNWW workflow (simplified)

[Figure: TTNWW workflow with a TEXT branch and a SPEECH branch]
TEXT: input: machine-readable texts; steps: text preprocessing, lexical unit tagging, full parsing / shallow parsing, alignment, NER and coreference, spatiotemporal analysis; output: analyzed text.
SPEECH: input: audio (archives, museums, interviews, …); ASR (automatic speech recognition) with ASR resources (lexicon, acoustic model, language model); resource adaptation; web input: context-relevant texts, previous recordings, …; output: transcription with recognition errors and incomplete punctuation.
Note: the path strongly depends on the quality of the transcriptions.

Questions

• How to make these services available?
• How to construct workflows from these services?
• How to accommodate different data formats?
  – Text (D-COI/SONAR is the de facto standard in NL/VL)
  – Speech + text
• How to accommodate other CLARIN requirements?
  – Metadata for web services
  – Metadata for workflow specifications
  – Metadata and provenance data generation
  – Authorization and authentication

Making services available: CMD service specification

• In CLARIN each service must be described using CMDI
• Web service metadata is harvested into the central metadata repository using the standard OAI-PMH protocol
• The service description covers:
  – the service type: REST, SOAP or XML-RPC
  – a reference to the WSDL or WADL file
  – a reference to the format specification, e.g. a schema file
  – an alternative format name, e.g. TCF
  – a reference to the format element, e.g. a schema element
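To make the harvesting step concrete, here is a minimal Python sketch of the two things an OAI-PMH harvester repeats: building a `ListRecords` request URL and reading the `resumptionToken` that pages through the result list. The metadata prefix `cmdi` and any endpoint URL are assumptions for illustration; a real repository advertises its prefixes through the `ListMetadataFormats` verb.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(endpoint: str, metadata_prefix: str = "cmdi",
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    The default prefix "cmdi" is an assumption. A follow-up request carries
    only the resumptionToken: OAI-PMH treats it as an exclusive argument,
    so metadataPrefix is not repeated.
    """
    if resumption_token is None:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{endpoint}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken of a ListRecords response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    # An absent or empty resumptionToken element marks the end of the list.
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    print(list_records_url("https://oai.example.org/provider"))
```

A harvester loops: request the first page, store the records, then request again with `next_token(...)` until it returns `None`.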

CMD Service specification

• Each service metadata document must contain a Service component, and may be further enhanced with descriptive information such as organization information
• The Service component describes the operations, their input and output parameters, and the characteristics of the resources used in each operation
• Two types of parameters are distinguished:
  – ConfigurationParameter: describes configuration settings
  – ResourceParameter: describes resource characteristics, expressed in the TechnicalMetadata component
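The split between configuration and resource parameters can be pictured as a small data model. The sketch below is a hypothetical Python model (the class and field names are illustrative and not taken from the CMDI Service component schema), filled in with the RACAI Language Identifier operation used as the example later in this talk; the resource requirements for `text` (text/plain, UTF-8) are the ones the talk lists for that parameter.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class ParameterKind(Enum):
    CONFIGURATION = "ConfigurationParameter"  # describes configuration settings
    RESOURCE = "ResourceParameter"            # describes resource characteristics


@dataclass
class Parameter:
    name: str
    kind: ParameterKind
    # TechnicalMetadata fields; only meaningful for resource parameters.
    technical_metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Operation:
    name: str
    inputs: List[Parameter] = field(default_factory=list)

    def inputs_of(self, kind: ParameterKind) -> List[Parameter]:
        """Input parameters of one kind, in declaration order."""
        return [p for p in self.inputs if p.kind is kind]


@dataclass
class Service:
    name: str
    operations: List[Operation] = field(default_factory=list)


# The RACAI Language Identifier, described with this hypothetical model.
LANGID = Service("LangIdWebService", [
    Operation("IdentifyLanguage", inputs=[
        Parameter("text", ParameterKind.RESOURCE,
                  {"MimeType": "text/plain", "CharacterEncoding": "UTF-8"}),
        Parameter("modern_languages", ParameterKind.CONFIGURATION),
        Parameter("rare_languages", ParameterKind.CONFIGURATION),
    ]),
])
```

Keeping TechnicalMetadata only on resource parameters mirrors the rule above: configuration settings need no format description, resources do.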

CMD Service specification

• TechnicalMetadata contains information on:
  – general technical characteristics of a resource, e.g. MIME type, character encoding, …; this part is extensible to describe different resource types
  – content-related information (ContentEncoding): a structure indication, e.g. a schema reference for XML, and structural element presence, i.e. which structural elements are actually present in the resource (AnnotationLevel)
• TechnicalMetadata is expected to be present in resource metadata to enable profile matching
• AnnotationLevel contains additional information relevant to the structural element
  – e.g. PartOfSpeech may contain additional information on the tagset and the tagset language

How to create service CMDI documents

• Generate a skeleton CMDI document from WSDL or WADL
  – Example: Language Identifier service from RACAI, using WSDL
• Edit the CMDI metadata document

CMDI and WSDL. Example: Language Identifier service from RACAI using WSDL

<?xml version="1.0" encoding="UTF-8"?>
<wsdl:definitions xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tm="http://microsoft.com/wsdl/mime/textMatching/"
    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/"
    xmlns:tns="http://tempuri.org/"
    xmlns:s="http://www.w3.org/2001/XMLSchema"
    xmlns:soap12="http://schemas.xmlsoap.org/wsdl/soap12/"
    xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
    targetNamespace="http://tempuri.org/"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
  <wsdl:types>
    <s:schema elementFormDefault="qualified" targetNamespace="http://tempuri.org/">
      ………………………………………………………………
    </s:schema>
  </wsdl:types>
  <wsdl:message name="IdentifyLanguageSoapIn">
    <wsdl:part name="parameters" element="tns:IdentifyLanguage"/>
  </wsdl:message>
  <wsdl:message name="IdentifyLanguageSoapOut">
    <wsdl:part name="parameters" element="tns:IdentifyLanguageResponse"/>
  </wsdl:message>
  <wsdl:portType name="LangIdWebServiceSoap">
    <wsdl:operation name="IdentifyLanguage">
      <wsdl:input message="tns:IdentifyLanguageSoapIn"/>
      <wsdl:output message="tns:IdentifyLanguageSoapOut"/>
    </wsdl:operation>
  </wsdl:portType>
  <wsdl:binding name="LangIdWebServiceSoap" type="tns:LangIdWebServiceSoap">
    <soap:binding transport="http://schemas.xmlsoap.org/soap/http"/>
    <wsdl:operation name="IdentifyLanguage">
      <soap:operation soapAction="http://tempuri.org/IdentifyLanguage" style="document"/>
      <wsdl:input>
        <soap:body use="literal"/>
      </wsdl:input>
      <wsdl:output>
        <soap:body use="literal"/>
      </wsdl:output>
    </wsdl:operation>
  </wsdl:binding>
  <wsdl:service name="LangIdWebService">
    <wsdl:port name="LangIdWebServiceSoap" binding="tns:LangIdWebServiceSoap">
      <soap:address location="http://www.racai.ro/WebServices/LangId.asmx"/>
    </wsdl:port>
  </wsdl:service>
</wsdl:definitions>

Annotated parts of the WSDL: service operation, input message, output message, service name, service location.
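Generating a skeleton from WSDL mostly means reading a handful of names and attributes. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming a SOAP 1.1 binding like the RACAI example; the helper `extract_service_info` is hypothetical, and the embedded WSDL is a trimmed copy of the slide's document (types, messages and portType omitted).

```python
import xml.etree.ElementTree as ET

NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

# Trimmed copy of the RACAI WSDL (types, messages and portType omitted).
SAMPLE_WSDL = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tns="http://tempuri.org/" targetNamespace="http://tempuri.org/">
  <wsdl:binding name="LangIdWebServiceSoap" type="tns:LangIdWebServiceSoap">
    <soap:binding transport="http://schemas.xmlsoap.org/soap/http"/>
    <wsdl:operation name="IdentifyLanguage">
      <soap:operation soapAction="http://tempuri.org/IdentifyLanguage" style="document"/>
    </wsdl:operation>
  </wsdl:binding>
  <wsdl:service name="LangIdWebService">
    <wsdl:port name="LangIdWebServiceSoap" binding="tns:LangIdWebServiceSoap">
      <soap:address location="http://www.racai.ro/WebServices/LangId.asmx"/>
    </wsdl:port>
  </wsdl:service>
</wsdl:definitions>"""


def extract_service_info(wsdl_text: str) -> dict:
    """Hypothetical helper: collect service name, location and SOAP 1.1 operations."""
    root = ET.fromstring(wsdl_text)
    service = root.find("wsdl:service", NS)
    # First port that has a SOAP 1.1 address (soap12 ports use another namespace).
    address = service.find("wsdl:port/soap:address", NS)
    operations = []
    for op in root.findall("wsdl:binding/wsdl:operation", NS):
        soap_op = op.find("soap:operation", NS)
        if soap_op is None:
            continue  # operation of a non-SOAP-1.1 binding, skipped in this sketch
        operations.append({"name": op.get("name"), "action": soap_op.get("soapAction")})
    return {
        "name": service.get("name"),
        "location": address.get("location"),
        "operations": operations,
    }
```

The returned dictionary holds exactly the values that fill `Service/Name`, `ResourceRef`, `Operation/Name` and `Operation/Action` in the CMDI skeleton.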

WSDL and message schema. Example: Language Identifier service from RACAI using WSDL

<wsdl:definitions xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:tm="http://microsoft.com/wsdl/mime/textMatching/"
    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/"
    xmlns:tns="http://tempuri.org/"
    xmlns:s="http://www.w3.org/2001/XMLSchema"
    xmlns:soap12="http://schemas.xmlsoap.org/wsdl/soap12/"
    xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
    targetNamespace="http://tempuri.org/"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
  <wsdl:types>
    <s:schema elementFormDefault="qualified" targetNamespace="http://tempuri.org/">
      <s:element name="IdentifyLanguage">
        <s:complexType>
          <s:sequence>
            <s:element minOccurs="0" maxOccurs="1" name="text" type="s:string"/>
            <s:element minOccurs="1" maxOccurs="1" name="modern_languages" type="s:boolean"/>
            <s:element minOccurs="1" maxOccurs="1" name="rare_languages" type="s:boolean"/>
          </s:sequence>
        </s:complexType>
      </s:element>
      <s:element name="IdentifyLanguageResponse">
        <s:complexType>
          <s:sequence>
            <s:element minOccurs="0" maxOccurs="1" name="IdentifyLanguageResult" type="tns:LangIDResult"/>
          </s:sequence>
        </s:complexType>
      </s:element>
      <s:complexType name="LangIDResult">
        <s:sequence>
          <s:element minOccurs="0" maxOccurs="1" name="Language" type="s:string"/>
          <s:element minOccurs="1" maxOccurs="1" name="Confidence" type="s:double"/>
        </s:sequence>
      </s:complexType>
    </s:schema>
  </wsdl:types>

Input message (IdentifyLanguage): text (Resource), modern_languages (Configuration), rare_languages (Configuration)
Output message (IdentifyLanguageResponse): Language, Confidence

Example input/output message. Example: Language Identifier service from RACAI using WSDL

Input message:

<IdentifyLanguage>
  <text>The text for which the language is to be identified goes here...</text>
  <modern_languages>true</modern_languages>
  <rare_languages>false</rare_languages>
</IdentifyLanguage>

Here text is a Resource parameter; modern_languages and rare_languages are Configuration parameters.

Output message (message elements Language and Confidence):

<LangIDResult>
  <Language>english</Language>
  <Confidence>94.8</Confidence>
</LangIDResult>
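On the wire, the input message above travels inside a SOAP 1.1 envelope. A sketch of building that request with the Python standard library, assuming document/literal style with qualified elements in the `http://tempuri.org/` namespace, as the WSDL declares. The helper names are hypothetical; the sketch only prepares the HTTP request and does not send it.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
TNS = "http://tempuri.org/"                              # targetNamespace from the WSDL
SERVICE_URL = "http://www.racai.ro/WebServices/LangId.asmx"
SOAP_ACTION = "http://tempuri.org/IdentifyLanguage"

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("tns", TNS)


def build_identify_language_request(text: str, modern_languages: bool = True,
                                    rare_languages: bool = False) -> bytes:
    """Hypothetical helper: serialize an IdentifyLanguage request envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # elementFormDefault="qualified": child elements are in the target namespace too.
    request = ET.SubElement(body, f"{{{TNS}}}IdentifyLanguage")
    ET.SubElement(request, f"{{{TNS}}}text").text = text
    # The lexical form of xs:boolean is "true"/"false".
    ET.SubElement(request, f"{{{TNS}}}modern_languages").text = str(modern_languages).lower()
    ET.SubElement(request, f"{{{TNS}}}rare_languages").text = str(rare_languages).lower()
    return ET.tostring(envelope, encoding="utf-8")


def make_http_request(payload: bytes) -> urllib.request.Request:
    """Prepare (not send) the POST; urllib.request.urlopen(req) would send it."""
    return urllib.request.Request(
        SERVICE_URL,
        data=payload,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # SOAP 1.1 carries the operation's soapAction, quoted, in this header.
            "SOAPAction": f'"{SOAP_ACTION}"',
        },
        method="POST",
    )
```

Whether the RACAI endpoint still answers at that address is not assumed here; the point is the envelope shape that the WSDL implies.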

CMDI skeleton generated from WSDL

<?xml version="1.0" encoding="UTF-8"?>
<CMD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="example-md-schema.xsd">
  <Header/>
  <Resources>
    <JournalFileProxyList/>
    <ResourceProxyList>
      <ResourceProxy>
        <ResourceType>WSDL service</ResourceType>
        <ResourceRef>http://www.racai.ro/webservices/LangId.asmx</ResourceRef>
      </ResourceProxy>
    </ResourceProxyList>
  </Resources>
  <Components>
    <Service>
      <Type>SOAP</Type>
      <Name>LangIdWebService</Name>
      <URL>http://www.racai.ro/webservices/LangId.asmx?WSDL</URL>
      <Operation>
        <Name>IdentifyLanguage</Name>
        <Action>http://tempuri.org/IdentifyLanguage</Action>
        <Input>
          <Parameter><Name>IdentifyLanguage.text</Name></Parameter>
          <Parameter><Name>IdentifyLanguage.modern_languages</Name></Parameter>
          <Parameter><Name>IdentifyLanguage.rare_languages</Name></Parameter>
        </Input>
        <Output>
          <Parameter><Name>IdentifyLanguageResponse.Language</Name></Parameter>
          <Parameter><Name>IdentifyLanguageResponse.Confidence</Name></Parameter>
        </Output>
      </Operation>
    </Service>
  </Components>
</CMD>

Annotations: ResourceRef holds the service location (this should be a PID); URL holds the WSDL location; Operation/Name holds the operation name; Input and Output list the input and output 'parameters'.
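A skeleton like the one above can be emitted mechanically once the WSDL has been read. A hypothetical Python sketch that produces the same element structure (the function `cmdi_skeleton` and its arguments are illustrative; the schema location `example-md-schema.xsd` is copied from the slide). Parameter names follow the slide's `Element.child` naming convention.

```python
import xml.etree.ElementTree as ET
from typing import List

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)


def _text(parent: ET.Element, tag: str, value: str) -> ET.Element:
    child = ET.SubElement(parent, tag)
    child.text = value
    return child


def cmdi_skeleton(service_name: str, service_location: str, wsdl_url: str,
                  operation: str, action: str,
                  inputs: List[str], outputs: List[str]) -> ET.Element:
    """Hypothetical helper: build a bare CMDI service description."""
    cmd = ET.Element("CMD", {f"{{{XSI}}}noNamespaceSchemaLocation": "example-md-schema.xsd"})
    ET.SubElement(cmd, "Header")
    resources = ET.SubElement(cmd, "Resources")
    ET.SubElement(resources, "JournalFileProxyList")
    proxy = ET.SubElement(ET.SubElement(resources, "ResourceProxyList"), "ResourceProxy")
    _text(proxy, "ResourceType", "WSDL service")
    _text(proxy, "ResourceRef", service_location)  # ideally a PID rather than a URL
    service = ET.SubElement(ET.SubElement(cmd, "Components"), "Service")
    _text(service, "Type", "SOAP")
    _text(service, "Name", service_name)
    _text(service, "URL", wsdl_url)
    op = ET.SubElement(service, "Operation")
    _text(op, "Name", operation)
    _text(op, "Action", action)
    # Parameters are left bare; their characteristics are added during editing.
    for section, names in (("Input", inputs), ("Output", outputs)):
        container = ET.SubElement(op, section)
        for name in names:
            _text(ET.SubElement(container, "Parameter"), "Name", name)
    return cmd


if __name__ == "__main__":
    doc = cmdi_skeleton(
        "LangIdWebService",
        "http://www.racai.ro/webservices/LangId.asmx",
        "http://www.racai.ro/webservices/LangId.asmx?WSDL",
        "IdentifyLanguage",
        "http://tempuri.org/IdentifyLanguage",
        ["IdentifyLanguage.text", "IdentifyLanguage.modern_languages",
         "IdentifyLanguage.rare_languages"],
        ["IdentifyLanguageResponse.Language", "IdentifyLanguageResponse.Confidence"],
    )
    print(ET.tostring(doc, encoding="unicode"))
```

The skeleton deliberately says nothing about MIME type or encoding; as the next slide notes, that information is not in the WSDL and has to be added afterwards.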

Parameter requirements: Resource

Input message element text (Resource):

<text>The text for which the language is to be identified goes here...</text>

• <text> requirements:
  – text/plain
  – UTF-8 encoded
• These requirements are not represented in the WSDL
• These characteristics are important for profile matching
• Current state of discussion:
  – specify them using CMDI, or
  – specify them using an external type system

CMD for web services (refresh)

How to specify TechnicalMetadata?

Describing an XML resource with POS token element characteristics:

<TechnicalMetadata>
  <MimeType>text/xml</MimeType>
  <CharacterEncoding>UTF-8</CharacterEncoding>
  <ContentEncoding>
    <URL>hdl:tcf_reference</URL>
    <ResourceFormat>Text Corpus Format</ResourceFormat>
    <AnnotationLevel>
      <Identifier>TextCorpus.POSTags.tag.token</Identifier>
      <TagSet>STTS</TagSet>
      <Language>de</Language>
    </AnnotationLevel>
  </ContentEncoding>
</TechnicalMetadata>

Describing a plain text resource:

<TechnicalMetadata>
  <MimeType>text/plain</MimeType>
  <CharacterEncoding>UTF-8</CharacterEncoding>
  <ContentEncoding>
    <URL>hdl:plain_text_reference</URL>
    <ResourceFormat>Plain Text</ResourceFormat>
  </ContentEncoding>
</TechnicalMetadata>
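A minimal sketch of reading such a TechnicalMetadata description into a plain dictionary with Python's `xml.etree.ElementTree`. The function name and the dictionary keys (in particular `FormatURL` and `AnnotationLevels`) are hypothetical choices for illustration, not part of CMDI.

```python
import xml.etree.ElementTree as ET

TCF_EXAMPLE = """<TechnicalMetadata>
  <MimeType>text/xml</MimeType>
  <CharacterEncoding>UTF-8</CharacterEncoding>
  <ContentEncoding>
    <URL>hdl:tcf_reference</URL>
    <ResourceFormat>Text Corpus Format</ResourceFormat>
    <AnnotationLevel>
      <Identifier>TextCorpus.POSTags.tag.token</Identifier>
      <TagSet>STTS</TagSet>
      <Language>de</Language>
    </AnnotationLevel>
  </ContentEncoding>
</TechnicalMetadata>"""


def parse_technical_metadata(xml_text: str) -> dict:
    """Hypothetical helper: flatten TechnicalMetadata into a dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "MimeType": root.findtext("MimeType"),
        "CharacterEncoding": root.findtext("CharacterEncoding"),
        "FormatURL": root.findtext("ContentEncoding/URL"),
        "ResourceFormat": root.findtext("ContentEncoding/ResourceFormat"),
        # One dict per AnnotationLevel, keyed by child element name.
        "AnnotationLevels": [
            {child.tag: (child.text or "").strip() for child in level}
            for level in root.findall("ContentEncoding/AnnotationLevel")
        ],
    }
```

For the plain text description the same function returns an empty `AnnotationLevels` list, since that resource declares no structural elements.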

Where to get the right component?

Component registered under hdl:tcf_reference:

<CMD_Component name="Text Corpus Format">
  <CMD_Element name="MimeType" ValueScheme="string"/>
  <CMD_Element name="CharacterEncoding" ValueScheme="string"/>
</CMD_Component>

Component registered under hdl:plain_text_reference:

<CMD_Component name="Plain text">
  <CMD_Element name="MimeType" ValueScheme="string"/>
  <CMD_Element name="CharacterEncoding" ValueScheme="string"/>
</CMD_Component>

Component for POS annotation in TCF:

<CMD_Component name="TCF-POS">
  <CMD_Component name="TagSet">
    <CMD_Element name="Name" ValueScheme="string"/>
    <CMD_Element name="Language" ValueScheme="string" ConceptLink="http://www.isocat.org/datcat/DC-1766"/>
  </CMD_Component>
</CMD_Component>

• Components are associated with nodes in an ontology of resources (see e.g. the myGrid service ontology), e.g.:
  – TextCorpus.text …
  – TextCorpus.POStags.tag …
  – TextCorpus.sem_lex_rels.antonymy.orthform
• Example values: Text Corpus Format: MimeType = text/xml, CharacterEncoding = ?; Plain text: MimeType = text/plain, CharacterEncoding = ?; TCF-POS: TagSet = STTS, Language = de
• A smart web service metadata editor picks up the components and guides the metadata creation process

How to put this to use

• Profile matching
  – Each resource must specify a TechnicalMetadata component
  – A match is made against the TechnicalMetadata of the service input parameters
• Metadata and provenance data generation
  – The resulting resource of each service invocation must have metadata and provenance data
  – Options: a service wrapper or an Enterprise Service Bus

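Profile matching can be sketched as a comparison of two TechnicalMetadata descriptions, assuming both have already been read into dictionaries keyed by element name. The Python below is one possible rule set, not the CLARIN one: MIME type, character encoding and resource format must agree where the service states them (compared case-insensitively, since MIME types and charset names are case-insensitive), and every annotation level the service requires must appear in the resource with the same field values. Function names and the candidate entries in the usage note are hypothetical.

```python
from typing import Dict, List

SIMPLE_KEYS = ("MimeType", "CharacterEncoding", "ResourceFormat")


def matches(resource: dict, required: dict) -> bool:
    """True if a resource's TechnicalMetadata satisfies an input parameter's requirements."""
    for key in SIMPLE_KEYS:
        wanted = required.get(key)
        if wanted is None:
            continue  # the service states no requirement for this field
        have = resource.get(key)
        if have is None or have.lower() != wanted.lower():
            return False
    # Every required annotation level must be present, agreeing on each field it specifies.
    levels = resource.get("AnnotationLevels", [])
    for req_level in required.get("AnnotationLevels", []):
        if not any(all(level.get(k) == v for k, v in req_level.items()) for level in levels):
            return False
    return True


def compatible_inputs(resource: dict, candidates: Dict[str, dict]) -> List[str]:
    """Names of the candidate service input parameters the resource can be fed to."""
    return [name for name, required in candidates.items() if matches(resource, required)]
```

For example, a plain text resource (text/plain, UTF-8) matches the Language Identifier's `text` input, while a TCF resource with STTS POS tags only matches a (hypothetical) service that asks for text/xml with that annotation level.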

Clarin Service Bus Principle

[Diagram: Client, CSB (CLARIN Service Bus) and Web service; the CSB reads and writes CMDI and provenance data]

The CSB:
• resolves the service identifier
• resolves parameter resource identifiers
• retains incoming metadata documents (CMDI)
• creates metadata documents:
  – inline: reuse the metadata of the incoming resource
  – standoff: create a new metadata document
• creates provenance data:
  – service
  – operation
  – input parameter values
  – output parameter values

The client calls the service and receives the result.
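The provenance record listed above (service, operation, input and output parameter values) can be sketched as follows. This is a hypothetical Python helper that produces an `Entry` element of the shape shown in the provenance data example later in this talk. Unlike that example, it writes the date in ISO 8601 and keeps the service parameters as a flat list of elements rather than nesting them under the request element.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Dict, Optional


def provenance_entry(service_pid: str, action: str, parameters: Dict[str, str],
                     response_xml: str, when: Optional[datetime] = None) -> ET.Element:
    """Hypothetical helper: build one provenance <Entry> for a service invocation."""
    when = when or datetime.now(timezone.utc)
    # ISO 8601 timestamp (the slide example uses Java's Date.toString format instead).
    entry = ET.Element("Entry", date=when.isoformat())
    inp = ET.SubElement(entry, "Input")
    ET.SubElement(inp, "To").text = service_pid   # PID of the service metadata
    ET.SubElement(inp, "Action").text = action    # the operation (SOAPAction)
    params = ET.SubElement(inp, "serviceParameters")
    for name, value in parameters.items():
        # Resource parameters are recorded by PID, not by content.
        ET.SubElement(params, name).text = value
    out = ET.SubElement(entry, "Output")
    out.append(ET.fromstring(response_xml))       # the service response, verbatim
    return entry
```

Collecting entries under a common `Entries` root element (`log = ET.Element("Entries")`, then `log.append(entry)` per call) gives a log of the same outline as the slide.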

Clarin Service Bus design (component diagram)

Implemented using Apache ServiceMix 4

Clarin Service Bus Interface

• The CSB publishes a generic web service interface with one method, Invoke, which invokes a service on behalf of the client
  – Input: service identifier, operation, service parameters
  – Output: service response
• An (injected) WSDL for each service is published instead of the original WSDL or WADL
  – Service identifier: PID of the service metadata document
  – Operation: SOAPAction from the WSDL document
  – Service parameters: the original parameters; the resource is injected here using the resource metadata PID
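The core of `Invoke` is identifier resolution plus resource injection. A toy Python sketch, assuming in-memory dictionaries in place of the handle system and the real service endpoints; the `ServiceBus` class is hypothetical and is not the ServiceMix-based implementation. Note that the provenance record keeps the original PIDs rather than the injected content, as in the provenance data example.

```python
from typing import Callable, Dict, List

# A service endpoint takes (operation, parameters) and returns response fields.
Endpoint = Callable[[str, Dict[str, str]], Dict[str, str]]


class ServiceBus:
    """Toy stand-in for the CSB's generic Invoke method (hypothetical, in-memory)."""

    def __init__(self, services: Dict[str, Endpoint], resources: Dict[str, str]):
        self.services = services    # service metadata PID -> endpoint
        self.resources = resources  # resource metadata PID -> resource content
        self.provenance: List[dict] = []

    def invoke(self, service_id: str, operation: str,
               parameters: Dict[str, str]) -> Dict[str, str]:
        # Resolve the service identifier; an unknown PID raises KeyError.
        endpoint = self.services[service_id]
        # Inject resources: a parameter whose value is a known resource PID
        # is replaced by the resource itself; other values pass through.
        injected = {name: self.resources.get(value, value)
                    for name, value in parameters.items()}
        response = endpoint(operation, injected)
        # Record provenance with the original PIDs, not the injected content.
        self.provenance.append({
            "service": service_id,
            "operation": operation,
            "input": dict(parameters),
            "output": dict(response),
        })
        return response
```

The client never handles the resource content itself: it passes PIDs, and the bus resolves them just before the call.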

Provenance data example

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Entries>
  <Entry date="Thu Nov 11 09:14:01 CET 2010">
    <Input>
      <To>hdl:serviceMetadata</To>
      <Action>http://tempuri.org/IdentifyLanguage</Action>
      <serviceParameters ……..>
        <q0:IdentifyLanguage>
          <q0:text>hdl:resourceMetadata</q0:text>
          <q0:modern_languages>true</q0:modern_languages>
          <q0:rare_languages>false</q0:rare_languages>
        </q0:IdentifyLanguage>
      </serviceParameters>
    </Input>
    <Output>
      <IdentifyLanguageResponse xmlns="http://tempuri.org/">
        <IdentifyLanguageResult>
          <Language>hdl:1289463241309</Language>
          <LanguageEn>English</LanguageEn>
          <LanguageNative>English</LanguageNative>
          <Confidence>75.237260220415891</Confidence>
        </IdentifyLanguageResult>
      </IdentifyLanguageResponse>
    </Output>
  </Entry>
</Entries>

Further actions in this area

• Authentication is still an issue
  – Talks with GEANT: at the beginning of 2011 a solution based on an STS may become available
  – Use brokered authentication with a security token issued by a Security Token Service (STS). The STS is trusted by both the client and the web service to provide interoperable security tokens.
  [Diagram: Client, CSB, Web service, GEANT AAI and STS (Security Token Service), with token transformation between SAML, OAuth and SLCS]
• Workspaces are still being discussed
  – Requires authentication
  – In the Netherlands: work together with projects such as BigGrid and CATCHPlus

SOA and workflow

[Diagram: resource → tool → resource+ → tool → resource++ → etc., with manual selection, adaptation, etc.]

TTNWW Workflow engine selection

• Selection criteria:
  – Data flow rather than control flow
  – Possibility for nested workflows: created workflows act as building blocks for other end users
  – Support for both REST and SOAP
  – Capable of handling heterogeneous data formats: text (D-COI/SONAR) and speech data (audio + transcription)

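To illustrate the first two selection criteria, data flow rather than control flow and nested workflows, here is a toy engine in Python: a step fires as soon as all of its inputs are available, regardless of the order in which steps are listed, and a whole workflow is callable, so it can serve as a single step inside a larger workflow. This is not how Taverna is implemented; the `Step` and `Workflow` classes are hypothetical, and the step functions in the usage note are stand-ins for real services.

```python
from typing import Any, Callable, Dict, List


class Step:
    """One processing step: `fn` is applied to the values named in `inputs`."""

    def __init__(self, name: str, inputs: List[str], fn: Callable[..., Any]):
        self.name, self.inputs, self.fn = name, inputs, fn


class Workflow:
    """Data-flow workflow: a step fires as soon as all its inputs exist.

    A Workflow is callable, so it can be used as the `fn` of a Step in
    another workflow (nesting).
    """

    def __init__(self, inputs: List[str], output: str, steps: List[Step]):
        self.inputs, self.output, self.steps = inputs, output, steps

    def __call__(self, *args: Any) -> Any:
        values: Dict[str, Any] = dict(zip(self.inputs, args))
        pending = list(self.steps)
        while pending:
            ready = [s for s in pending if all(i in values for i in s.inputs)]
            if not ready:  # a cycle or a missing input: nothing can ever fire
                names = ", ".join(s.name for s in pending)
                raise ValueError(f"steps can never run: {names}")
            for step in ready:
                values[step.name] = step.fn(*(values[i] for i in step.inputs))
                pending.remove(step)
        return values[self.output]
```

For instance, a preprocessing workflow `Workflow(["raw"], "tokens", [Step("clean", ["raw"], str.strip), Step("tokens", ["clean"], str.split)])` can be dropped into a larger text workflow as one step, which is the "building block" idea from the selection criteria.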

Taverna workbench workflow design

Taverna workbench workflow execution and progress

Execution scenarios

• Local execution: access (1a) and download (1) the workflow specification, then execute it locally (2)
• Remote execution: access the workflow specification (1b), then execute it remotely (2)

User interface (Initial design)

CMDI for workflow specifications

• Predefined workflows are considered special resources (see e.g. myExperiment.org)
  – with TechnicalMetadata specifying the workflow language etc.
  – with input and output parameter characteristics
• Workflow engine (as a service)
  – defines which workflows it is capable of processing
  – optionally defines extra parameters for processing

Tying it all together

• Metadata for web services
• Metadata for workflow specifications
• Metadata services
  – CLARIN metadata repository (harvesting using OAI-PMH), becoming available towards the end of 2010
• Workflow system
• Metadata and provenance data generation
  – provided that AAI and workspaces are resolved

Thank you for your attention

CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n° 212230.