Post on 25-Dec-2015

New EMBOSS Web Service

Shaun McGlinchey (shaun@ebi.ac.uk)

Outline

The presentation will discuss the challenges encountered in

exposing the EMBOSS suite of command line sequence analysis

tools as a ‘stateful’ SOAP based web service. An overview of the

proposed framework for client-side requests, server-side job

submission and results delivery will then be given.

What is EMBOSS?

EMBOSS is "The European Molecular Biology Open Software

Suite".

What can I use EMBOSS for?

EMBOSS consists of approximately 300 command-line applications covering areas such as:

Sequence alignment

Rapid database searching with sequence patterns

Protein motif identification, including domain analysis

Phylogenetic analysis

Presentation tools for publication

What is JAX-WS?

In the words of Sun: "Java API for XML Web Services (JAX-WS) is the centerpiece of a newly rearchitected API stack for Web services, the so-called 'integrated stack' that includes JAX-WS 2.0, JAXB 2.0, and SAAJ 1.3."

Essentially a SOAP toolkit for Java

JAX-WS is the renamed successor to JAX-RPC

It brings clear improvements in data binding capabilities through its tight integration with JAXB (Java Architecture for XML Binding)

Current State of (old) EBI EMBOSS Web Service

The current server-side implementation is Perl-based. Sample clients are available for .NET, SOAP::Lite and Java (Axis).

Currently accepts free text as data input – weak typing – poor validation

capability

Supports both Synchronous and Asynchronous job submission.

Asynchronous requests are allocated a job id

Migrating to a Java-based JAX-WS server side implementation enables

us to have more control over the generated artifacts, increased data

validation capabilities and to rapidly improve on the functionality

provided.

EMBOSS Data Types

There are 52 datatypes (at the last count) used within the EMBOSS suite of applications. These fall under five headings:

1. Simple – Array, Boolean, Integer, String …

2. Input – Codon, Features, Sequence, Seqall …

3. Selection Lists – List, Selection …

4. Output – Align, Report, Seqout …

5. Graphics – Graph, Xygraph

EMBOSS Qualifiers

EMBOSS command line program

Accepts application name + qualifiers (each of which is a

datatype):

water -asequence tsw:hba_human -bsequence tsw:hbb_human

(water sequence seqall)

-asequence is of datatype Sequence, -bsequence of Seqall

Qualifiers have associated qualifiers, which can also be passed on the command line to enable advanced configuration of the application call, e.g. -sbegin, -send, -sformat
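The "application name + typed qualifiers" idea can be sketched in Java as follows. This is only an illustration: the class `EmbossInvocation` and its methods are hypothetical names, not part of EMBOSS or of the web service described here, and the datatype is carried as a plain string.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of one EMBOSS call: an application name plus typed qualifiers. */
class EmbossInvocation {

    /** A qualifier value tagged with its EMBOSS datatype name (e.g. "sequence", "seqall"). */
    static final class Qualifier {
        final String datatype;
        final String value;

        Qualifier(String datatype, String value) {
            this.datatype = datatype;
            this.value = value;
        }
    }

    private final String application;
    // LinkedHashMap keeps the qualifiers in the order they were added.
    private final Map<String, Qualifier> qualifiers = new LinkedHashMap<>();

    EmbossInvocation(String application) {
        this.application = application;
    }

    /** Adds a qualifier and returns this object so calls can be chained. */
    EmbossInvocation with(String name, String datatype, String value) {
        qualifiers.put(name, new Qualifier(datatype, value));
        return this;
    }

    /** Renders the call as an argument vector: [application, -name, value, ...]. */
    List<String> toArgv() {
        List<String> argv = new ArrayList<>();
        argv.add(application);
        for (Map.Entry<String, Qualifier> entry : qualifiers.entrySet()) {
            argv.add("-" + entry.getKey());
            argv.add(entry.getValue().value);
        }
        return argv;
    }

    public static void main(String[] args) {
        EmbossInvocation water = new EmbossInvocation("water")
                .with("asequence", "sequence", "tsw:hba_human")
                .with("bsequence", "seqall", "tsw:hbb_human")
                .with("auto", "boolean", "true");
        System.out.println(String.join(" ", water.toArgv()));
    }
}
```

Keeping the datatype next to each value is what would later allow per-type validation, which free-text input cannot offer.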

General, Additional & Advanced Qualifiers

General are common to all EMBOSS applications

-auto true - Turn off prompts (boolean datatype)

-stdout true - Write standard output (boolean)

Web Service Development

In accordance with the Technology Recommendation we have chosen a Top-Down approach to WS Development, not Bottom-Up.

Top-Down Approach to WS Development

Express data types in schema

Write WSDL (include schema)

Generate Artifacts (JavaBeans – data objects, server side stubs, implementation class)

Package (WAR file)

Deploy WAR file to server

Sample EMBOSS Application Schema (Head)

<?xml version="1.0" encoding="UTF-8"?>

<definitions targetNamespace="emboss"

xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"

xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<types>

<xsd:schema xmlns="http://www.w3.org/2001/XMLSchema"

targetNamespace="http://www.ebi.ac.uk/ws/emboss/water/">

<?xml version="1.0" encoding="UTF-8"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"

xmlns:tns="http://www.ebi.ac.uk/ws/emboss/applications/water/"

xmlns:jxb="http://java.sun.com/xml/ns/jaxb" jxb:version="1.0">

Application Schema – Custom Bindings (cont’d)

<xsd:annotation>

<xsd:appinfo>

<jxb:schemaBindings>

<jxb:package name="uk.ac.ebi.ws.emboss.applications.water">

</jxb:package>

</jxb:schemaBindings>

</xsd:appinfo>

</xsd:annotation>

Express Application Parameters

<xsd:element name="asequence" type="tns:asequence"/>

<xsd:complexType name="asequence">

<xsd:sequence>

<xsd:element name="asequence" type="xsd:string" nillable="false"/>

<xsd:element name="asequenceQualifiers" type="tns:asequenceQualifiers" nillable="true"/>

</xsd:sequence>

</xsd:complexType>

Express asequenceQualifiers

<xsd:element name="asequenceQualifiers" type="tns:asequenceQualifiers"/>

<xsd:complexType name="asequenceQualifiers">

<xsd:sequence>

<xsd:element name="sbegin" type="xsd:integer"/>

<xsd:element name="send" type="xsd:integer"/>

<xsd:element name="usa" type="xsd:string"/>

……

</xsd:sequence>

</xsd:complexType>

Encapsulate all data types inside an application element

<xsd:element name="water" type="tns:water"/>

<xsd:complexType name="water">

<xsd:sequence>

<xsd:element name="asequence" type="tns:asequence"/>

<xsd:element name="bsequence" type="tns:bsequence"/>

<xsd:element name="datafile" type="xsd:string"/>

</xsd:sequence>

</xsd:complexType>

Using JAXB Generated Java Beans at the client side

Java Bean objects are generated for the client using the JAX-WS ‘wsimport’ tool, which compiles the WSDL + schema

Generated objects are populated using setters (client-side), e.g.

Sequence asequence = new Sequence();

asequence.setUsa("tsw:hba_human");

asequenceQual.setSprotein(true);

asequenceQual.setSbegin(0);
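To make the setter pattern concrete, here is a minimal sketch with hand-written stand-ins for the generated beans. The class names `SequenceBean` and `SequenceQualifiersBean` are hypothetical, the JAXB annotations a real generated class would carry are left out, and the field types are simplified.

```java
/** Demo of populating bean objects through setters, as a client would. */
class BeanPopulationDemo {

    /** Builds the input for tsw:hba_human using the values shown on the slide. */
    static SequenceBean hbaHuman() {
        SequenceQualifiersBean qualifiers = new SequenceQualifiersBean();
        qualifiers.setSprotein(true);
        qualifiers.setSbegin(0);

        SequenceBean asequence = new SequenceBean();
        asequence.setUsa("tsw:hba_human");
        asequence.setQualifiers(qualifiers);
        return asequence;
    }

    public static void main(String[] args) {
        SequenceBean asequence = hbaHuman();
        System.out.println(asequence.getUsa() + " sprotein="
                + asequence.getQualifiers().getSprotein());
    }
}

/** Illustrative stand-in for a generated sequence bean (hypothetical name). */
class SequenceBean {
    private String usa;
    private SequenceQualifiersBean qualifiers;

    public String getUsa() { return usa; }
    public void setUsa(String usa) { this.usa = usa; }

    public SequenceQualifiersBean getQualifiers() { return qualifiers; }
    public void setQualifiers(SequenceQualifiersBean qualifiers) { this.qualifiers = qualifiers; }
}

/** Illustrative stand-in for a generated qualifiers bean (hypothetical name). */
class SequenceQualifiersBean {
    private Integer sbegin;
    private Integer send;
    private Boolean sprotein;

    public Integer getSbegin() { return sbegin; }
    public void setSbegin(Integer sbegin) { this.sbegin = sbegin; }

    public Integer getSend() { return send; }
    public void setSend(Integer send) { this.send = send; }

    public Boolean getSprotein() { return sprotein; }
    public void setSprotein(Boolean sprotein) { this.sprotein = sprotein; }
}
```

Because every parameter has a typed setter, a client that passes the wrong kind of value fails at compile time rather than at job submission.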

EMBOSS Applications (300)

Manually creating the schema for each application is not scalable

Maven is a software project management & build tool.

We have written an EMBOSS ACD parser plugin for our Maven WS software build:

Implemented as a Java class

Takes EMBOSS application definitions (ACD) as input

Outputs XML Schema and WSDL representing each EMBOSS application

These schema are passed to a JAXB compiler which generates

our Java Bean objects
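The ACD-to-schema step might look roughly like the sketch below. It is not the plugin described above: it assumes a heavily simplified view of ACD in which each qualifier is declared as `datatype: name [ ... ]`, and the class `AcdToSchemaSketch`, its methods and the type mapping are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified ACD reader that emits xsd:element declarations. */
class AcdToSchemaSketch {

    // Matches "datatype: name [" at the start of a definition.
    private static final Pattern DEFINITION =
            Pattern.compile("(?m)^\\s*(\\w+):\\s*(\\w+)\\s*\\[");

    /** Returns qualifier name -> EMBOSS datatype in file order, skipping structural keywords. */
    static Map<String, String> parseQualifiers(String acd) {
        Map<String, String> qualifiers = new LinkedHashMap<>();
        Matcher matcher = DEFINITION.matcher(acd);
        while (matcher.find()) {
            String datatype = matcher.group(1);
            if (datatype.equals("application") || datatype.equals("section")) {
                continue; // not a qualifier definition
            }
            qualifiers.put(matcher.group(2), datatype);
        }
        return qualifiers;
    }

    /** Assumed mapping: simple datatypes go to XSD built-ins, others to a tns: complex type. */
    static String xsdTypeFor(String name, String datatype) {
        switch (datatype) {
            case "integer": return "xsd:integer";
            case "boolean": return "xsd:boolean";
            case "string":  return "xsd:string";
            default:        return "tns:" + name;
        }
    }

    /** Emits one xsd:element line per qualifier found in the ACD text. */
    static String toSchemaElements(String acd) {
        StringBuilder schema = new StringBuilder();
        for (Map.Entry<String, String> entry : parseQualifiers(acd).entrySet()) {
            schema.append("<xsd:element name=\"").append(entry.getKey())
                  .append("\" type=\"").append(xsdTypeFor(entry.getKey(), entry.getValue()))
                  .append("\"/>\n");
        }
        return schema.toString();
    }

    public static void main(String[] args) {
        // Abbreviated, illustrative ACD text; a real definition has many more attributes.
        String acd = "application: water [\n  documentation: \"Local alignment\"\n]\n"
                + "sequence: asequence [\n  parameter: \"Y\"\n]\n"
                + "seqall: bsequence [\n  parameter: \"Y\"\n]\n"
                + "boolean: brief [\n  default: \"Y\"\n]\n";
        System.out.print(toSchemaElements(acd));
    }
}
```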

Advantages of WS EMBOSS Software Build

Advantages of this approach:

We can auto-generate XML schema, Application WSDLs

Generate Java Objects for use on Client-Side

We can easily integrate new EMBOSS applications as a WS

by running the ACD file through our software build

Generated Artifacts

Why go to these lengths?

Because of the sheer number of EMBOSS apps, it is necessary to provide a clear means of representing the invocation of separate applications and the passing of parameters appropriate to that app.

******* CLIENT SIDE CODE **********

RunEmbossRequest run = new RunEmbossRequest();

EmbossParams water = new EmbossParams();

water.setAsequence(asequence);

water.setBsequence(bsequence);

Emboss emboss = new Emboss();

emboss.setApplication(EmbossApplication.WATER);

emboss.setApplicationParams(water);

run.setEmbossParams(emboss);

WSEmbossService service = new WSEmbossService();

WSEmboss wsemboss = service.getWSEmboss();

RunEmbossResponse response = wsemboss.run(run);

Server-side – Reverse Process

On the server side, values are obtained from the deserialised objects using the Java getter methods, e.g.

******* SERVER-SIDE CODE **********

Emboss emboss = input.getEmbossParams();

EmbossApplication embossApp = emboss.getApplication();

String appname = embossApp.value();

EmbossParams water = emboss.getApplicationParams();

Sequence asequence = water.getAsequence();

Seqall bsequence = water.getBsequence();

This solution does not scale well

How do we get from a Web Service payload to a valid command line?

We are looking at the possibility of developing a generic mechanism to transform the SOAP envelope (our WS inputs – Water params etc) using XSL (Extensible Stylesheet Language) into a form that can be used to invoke the EMBOSS binary (application)
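As a rough illustration of the XSL idea, the sketch below uses the JDK's built-in XSLT support to turn a payload into a list of command-line arguments. The payload shape (an application element with one child per qualifier), the stylesheet and the class name `PayloadToArgvSketch` are all assumptions; a real SOAP envelope would carry namespaces and nested qualifier structures that this stylesheet does not handle.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical generic transform: any <app><qualifier>value</qualifier>...</app> to argv. */
class PayloadToArgvSketch {

    // Emits one token per line: the root element name, then "-child" and its text per child.
    private static final String STYLESHEET =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/*'>"
          +   "<xsl:value-of select='local-name()'/>"
          +   "<xsl:for-each select='*'>"
          +     "<xsl:text>&#10;-</xsl:text><xsl:value-of select='local-name()'/>"
          +     "<xsl:text>&#10;</xsl:text><xsl:value-of select='.'/>"
          +   "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Applies the stylesheet and splits the text output into separate arguments. */
    static List<String> toArgv(String payloadXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(payloadXml)),
                    new StreamResult(out));
            return Arrays.asList(out.toString().split("\n"));
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not transform payload", e);
        }
    }

    public static void main(String[] args) {
        String payload = "<water><asequence>tsw:hba_human</asequence>"
                + "<bsequence>tsw:hbb_human</bsequence></water>";
        System.out.println(toArgv(payload));
    }
}
```

The stylesheet never mentions `water` or any qualifier by name, so one stylesheet could in principle serve every application. Producing a list of arguments instead of a single string also avoids re-parsing by a shell.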

Understanding our Job Submission Requirements

Building a valid & secure command line (approx 300 EMBOSS

applications)

Issuing the command line (300 applications)

Retrieving results from the EMBOSS application
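One way to approach the "valid & secure" requirement is sketched below: the application name is checked against an allow-list, every qualifier name and value is checked against a conservative pattern, and the command is handed to `ProcessBuilder` as separate arguments so that no shell ever interprets it. The class `SafeCommandLine`, the two-entry allow-list and the patterns are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical validator that builds an EMBOSS argument vector without using a shell. */
class SafeCommandLine {

    // Illustrative allow-list; a real service would list every supported application.
    private static final Set<String> ALLOWED_APPS =
            new HashSet<>(Arrays.asList("water", "needle"));

    private static final Pattern QUALIFIER_NAME = Pattern.compile("[a-z][a-z0-9]*");

    // Conservative value pattern: must not start with '-', and no spaces or shell metacharacters.
    private static final Pattern VALUE = Pattern.compile("[A-Za-z0-9_][A-Za-z0-9_:.\\-]*");

    /** Validates the inputs and returns [application, -name, value, ...]. */
    static List<String> build(String application, Map<String, String> qualifiers) {
        if (!ALLOWED_APPS.contains(application)) {
            throw new IllegalArgumentException("Unknown application: " + application);
        }
        List<String> argv = new ArrayList<>();
        argv.add(application);
        for (Map.Entry<String, String> entry : qualifiers.entrySet()) {
            if (!QUALIFIER_NAME.matcher(entry.getKey()).matches()) {
                throw new IllegalArgumentException("Bad qualifier name: " + entry.getKey());
            }
            if (!VALUE.matcher(entry.getValue()).matches()) {
                throw new IllegalArgumentException("Bad value for -" + entry.getKey());
            }
            argv.add("-" + entry.getKey());
            argv.add(entry.getValue());
        }
        return argv;
    }

    /** Each list element becomes exactly one process argument; no shell is involved. */
    static ProcessBuilder prepare(List<String> argv) {
        return new ProcessBuilder(argv).redirectErrorStream(true);
    }

    public static void main(String[] args) {
        Map<String, String> qualifiers = new LinkedHashMap<>();
        qualifiers.put("asequence", "tsw:hba_human");
        qualifiers.put("bsequence", "tsw:hbb_human");
        qualifiers.put("auto", "true");
        // The process is only prepared here, not started, since EMBOSS may not be installed.
        System.out.println(prepare(build("water", qualifiers)).command());
    }
}
```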

Our WS Job Submission should fulfil the EMBRACE Technology recommendations by:

Being a ‘Stateful Web Service’

Implementing both synchronous and asynchronous functionality

Synchronous – submit a job (locked in to that application until it returns a result)

Asynchronous (not synchronised) – submit a job but retain a free hand (not locked in) – we can poll the service with a jobid to obtain job status and results

Operations to support requirement of ‘Stateful’ WS

RunJob: e.g. runJob(water); – all parameters for the job are encapsulated in the water object. The operation returns a jobid.

CancelJob: e.g. cancelJob(“water12”);

This can be used to cancel the job execution

GetStatus: e.g. getStatus(“water12”);

(Waiting, Scheduled, Running, Done, Cancelled, Aborted)

GetResult: e.g. getResult(“water12”);

Retrieve the result of a job, given an identifier
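The four operations can be sketched as an in-memory job service. This is a toy under stated assumptions: the class `JobServiceSketch`, the job-id scheme and the single-threaded executor are hypothetical, the job body is a `Callable` standing in for the EMBOSS run, and nothing is persisted. The `waitFor` helper shows how a synchronous call could be layered on top of the asynchronous operations by polling.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory version of runJob / cancelJob / getStatus / getResult. */
class JobServiceSketch {
    enum Status { WAITING, SCHEDULED, RUNNING, DONE, CANCELLED, ABORTED }

    private static final class Job {
        private Status status = Status.WAITING;
        private String result;
        private Future<?> future;

        synchronized Status status() { return status; }
        synchronized String result() { return result; }
        synchronized Future<?> future() { return future; }
        synchronized void setFuture(Future<?> future) { this.future = future; }

        /** Changes state unless the job is already finished; returns whether it changed. */
        synchronized boolean moveTo(Status next, String output) {
            if (status == Status.DONE || status == Status.CANCELLED || status == Status.ABORTED) {
                return false;
            }
            status = next;
            result = output;
            return true;
        }
    }

    private final Map<String, Job> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    /** Asynchronous submission: returns a job id immediately. */
    String runJob(String application, Callable<String> work) {
        String jobId = application + counter.incrementAndGet();
        Job job = new Job();
        jobs.put(jobId, job);
        job.moveTo(Status.SCHEDULED, null);
        job.setFuture(pool.submit(() -> {
            if (!job.moveTo(Status.RUNNING, null)) {
                return; // cancelled while still queued
            }
            try {
                job.moveTo(Status.DONE, work.call());
            } catch (Exception e) {
                job.moveTo(Status.ABORTED, null);
            }
        }));
        return jobId;
    }

    /** Marks the job cancelled and interrupts it if running; false if it had already finished. */
    boolean cancelJob(String jobId) {
        Job job = find(jobId);
        boolean cancelled = job.moveTo(Status.CANCELLED, null);
        Future<?> future = job.future();
        if (cancelled && future != null) {
            future.cancel(true);
        }
        return cancelled;
    }

    Status getStatus(String jobId) { return find(jobId).status(); }

    /** Returns the job output, or null if the job has not completed successfully. */
    String getResult(String jobId) { return find(jobId).result(); }

    /** Synchronous convenience: polls until the job finishes or the timeout expires. */
    Status waitFor(String jobId, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        Status status = getStatus(jobId);
        while ((status == Status.WAITING || status == Status.SCHEDULED || status == Status.RUNNING)
                && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            status = getStatus(jobId);
        }
        return status;
    }

    void shutdown() { pool.shutdownNow(); }

    private Job find(String jobId) {
        Job job = jobs.get(jobId);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job id: " + jobId);
        }
        return job;
    }

    public static void main(String[] args) {
        JobServiceSketch service = new JobServiceSketch();
        String jobId = service.runJob("water", () -> "alignment output");
        System.out.println(jobId + " " + service.waitFor(jobId, 5000));
        System.out.println(service.getResult(jobId));
        service.shutdown();
    }
}
```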

Do we have to reinvent the wheel? – Enter OMII

We propose borrowing established technology as one possible

solution to our requirements

Recently (this week) I met with the Software Group Leader at OMII (Open Middleware Infrastructure Institute), based at the University of Southampton – www.omii.ac.uk

OMII is an established GRID middleware service provider – very

keen to have real users (developers using their products)

OMII design GRID related software products

What can they offer us?

We are interested in their GridSAM product

GridSAM consists of several subsystems that support:

Pluggable job persistence (if your job fails, it will be retried)

Job Queuing, Launching

Job Monitoring

Pending, staging in, active, executed, staging out, job

completed

GridSAM cont’d

File Staging (stage in input files, stage out output files)

All this functionality is available through an API – JobManager

Interface

Providing us with rich job submission functionality at little cost

Typically this functionality will be invoked from within the embedding

Application – web service – using the API

How do I pass my job content to the GridSAM Server?

Jobs are launched by passing a JSDL (Job Submission

Description Language) document to the GridSAM server from a

GridSAM client using the JobManager API

All of this can exist underneath your web service layer

Opportunity for a shared EMBRACE server perhaps!

Sample JSDL

<?xml version="1.0" encoding="UTF-8"?>

<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">

<JobDescription>

<Application>

<POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">

<Executable>/bin/echo</Executable>

</POSIXApplication>

</Application>

</JobDescription>

</JobDefinition>

Very good! – What about the EMBOSS WS

As mentioned, we propose to transform the EMBOSS WS payloads (SOAP messages) at runtime into a valid JSDL document to be submitted to GridSAM
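A minimal sketch of the last step, building a JSDL document from an executable and its arguments with the JDK's DOM API, is given below. It covers only the elements shown in the sample JSDL plus the POSIX `Argument` element; the class `JsdlBuilderSketch` and the executable path `/usr/bin/water` are hypothetical, and submitting the document through the GridSAM JobManager API is not shown.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical builder for a minimal JSDL job description. */
class JsdlBuilderSketch {
    static final String JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl";
    static final String POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix";

    /** Builds JobDefinition/JobDescription/Application/POSIXApplication for one command. */
    static Document build(String executable, List<String> arguments) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element definition = doc.createElementNS(JSDL_NS, "JobDefinition");
            doc.appendChild(definition);
            Element description = child(doc, definition, JSDL_NS, "JobDescription");
            Element application = child(doc, description, JSDL_NS, "Application");
            Element posix = child(doc, application, POSIX_NS, "POSIXApplication");

            child(doc, posix, POSIX_NS, "Executable").setTextContent(executable);
            for (String argument : arguments) {
                // One Argument element per command-line token; DOM escapes the text for us.
                child(doc, posix, POSIX_NS, "Argument").setTextContent(argument);
            }
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create XML document", e);
        }
    }

    /** Serialises the document to a string for display or submission. */
    static String toXml(Document doc) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialise JSDL", e);
        }
    }

    private static Element child(Document doc, Element parent, String ns, String name) {
        Element element = doc.createElementNS(ns, name);
        parent.appendChild(element);
        return element;
    }

    public static void main(String[] args) {
        Document jsdl = build("/usr/bin/water", Arrays.asList(
                "-asequence", "tsw:hba_human", "-bsequence", "tsw:hbb_human", "-auto", "true"));
        System.out.println(toXml(jsdl));
    }
}
```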

GridSAM looks promising!

We will use the EMBOSS WS as a test bed

If successful we may make a recommendation to WP3

Thank you for listening!
