Outline
The presentation will discuss the challenges encountered in
exposing the EMBOSS suite of command-line sequence analysis
tools as a ‘stateful’ SOAP-based web service. An overview of the
proposed framework for client-side requests, server-side job
submission and results delivery will then be given.
What is EMBOSS?
EMBOSS is "The European Molecular Biology Open Software
Suite".
What can I use EMBOSS for?
EMBOSS consists of approximately 300 command-line applications covering areas such as:
Sequence alignment
Rapid database searching with sequence patterns
Protein motif identification, including domain analysis
Phylogenetic analysis
Presentation tools for publication
What is JAX-WS?
In the words of Sun: "Java API for XML Web Services (JAX-WS)
is the centerpiece of a newly rearchitected API stack
for Web services, the so-called 'integrated stack' that includes
JAX-WS 2.0, JAXB 2.0, and SAAJ 1.3."
Essentially a SOAP toolkit for Java
It was renamed from JAX-RPC (JAX-WS 2.0 is the successor to JAX-RPC 1.1)
It brings clear improvements in data binding capabilities through
its tight integration with JAXB (Java Architecture for XML Binding)
Current State of (old) EBI EMBOSS Web Service
The current server-side implementation is Perl-based. Sample clients
are available for .NET, SOAP::Lite (Perl) and Java (Axis).
It currently accepts free text as data input: weak typing and poor
validation capability
Supports both Synchronous and Asynchronous job submission.
Asynchronous requests are allocated a job id
Migrating to a Java-based JAX-WS server-side implementation gives
us more control over the generated artifacts and better data
validation, and lets us rapidly improve the functionality
provided.
EMBOSS Data Types
There are 52 datatypes (at the last count) used within the
EMBOSS suite of applications. These fall under five headings:
1. Simple – Array, Boolean, Integer, String …
2. Input – Codon, Features, Sequence, Seqall …
3. Selection Lists – List, Selection …
4. Output – Align, Report, Seqout …
5. Graphics – Graph, Xygraph
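To make the five headings concrete, here is one way a build tool could classify datatype names by heading. This is a minimal Java illustration, not part of EMBOSS: the class name `EmbossDatatypes`, the enum, and the case-insensitive lookup are assumptions, and only the example datatypes listed above are registered.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class EmbossDatatypes {
    /** The five headings under which EMBOSS datatypes fall. */
    public enum Category { SIMPLE, INPUT, SELECTION_LIST, OUTPUT, GRAPHICS }

    private static final Map<String, Category> CATEGORIES = new HashMap<>();

    private static void register(Category category, String... datatypes) {
        for (String datatype : datatypes) {
            CATEGORIES.put(datatype.toLowerCase(Locale.ROOT), category);
        }
    }

    static {
        // Only the example datatypes named on the slide; the full suite has 52.
        register(Category.SIMPLE, "Array", "Boolean", "Integer", "String");
        register(Category.INPUT, "Codon", "Features", "Sequence", "Seqall");
        register(Category.SELECTION_LIST, "List", "Selection");
        register(Category.OUTPUT, "Align", "Report", "Seqout");
        register(Category.GRAPHICS, "Graph", "Xygraph");
    }

    /** Looks up a datatype name case-insensitively; unknown names are rejected. */
    public static Category categoryOf(String datatype) {
        Category category = CATEGORIES.get(datatype.toLowerCase(Locale.ROOT));
        if (category == null) {
            throw new IllegalArgumentException("Unknown EMBOSS datatype: " + datatype);
        }
        return category;
    }

    public static void main(String[] args) {
        System.out.println(categoryOf("Seqall"));  // INPUT
        System.out.println(categoryOf("xygraph")); // GRAPHICS
    }
}
```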
EMBOSS Qualifiers
EMBOSS command line program
Accepts an application name plus qualifiers (each of which has a
datatype):
water -asequence tsw:hba_human -bsequence tsw:hbb_human
-asequence is of datatype Sequence, -bsequence of Seqall
Qualifiers can have associated qualifiers, which can also be
passed on the command line to enable advanced configuration
of the application call, e.g. -sbegin, -send, -sformat
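The command line above is the application name followed by `-qualifier value` pairs. Below is a minimal Java sketch of assembling such an argument vector. The class `EmbossCommandLine`, its qualifier-name whitelist, and the use of a `LinkedHashMap` to preserve qualifier order are illustrative assumptions, not EMBOSS behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class EmbossCommandLine {
    // Hypothetical whitelist for qualifier names: lower-case letters, digits, underscore.
    private static final Pattern QUALIFIER_NAME = Pattern.compile("[a-z][a-z0-9_]*");

    /**
     * Builds an argument vector: the application name followed by
     * "-qualifier value" pairs, in the map's iteration order.
     */
    public static List<String> build(String application, Map<String, String> qualifiers) {
        List<String> argv = new ArrayList<>();
        argv.add(application);
        for (Map.Entry<String, String> q : qualifiers.entrySet()) {
            if (!QUALIFIER_NAME.matcher(q.getKey()).matches()) {
                throw new IllegalArgumentException("Bad qualifier name: " + q.getKey());
            }
            argv.add("-" + q.getKey());
            argv.add(q.getValue());
        }
        return argv;
    }

    public static void main(String[] args) {
        Map<String, String> qualifiers = new LinkedHashMap<>();
        qualifiers.put("asequence", "tsw:hba_human");
        qualifiers.put("bsequence", "tsw:hbb_human");
        List<String> argv = build("water", qualifiers);
        // prints: water -asequence tsw:hba_human -bsequence tsw:hbb_human
        System.out.println(String.join(" ", argv));
        // new ProcessBuilder(argv).start() would run it without a shell (requires EMBOSS installed).
    }
}
```

Keeping the arguments as a list, rather than one shell string, means each value reaches the program as a single argument. This matters for the "valid & secure command line" requirement.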
General, Additional & Advanced Qualifiers
General qualifiers are common to all EMBOSS applications, e.g.:
-auto true: turn off prompts (Boolean datatype)
-stdout true: write to standard output (Boolean datatype)
Web Service Development
In accordance with the EMBRACE Technology Recommendations, we have
chosen a top-down approach to WS development rather than a
bottom-up one.
Top-Down Approach to WS Development
Express data types in schema
Write WSDL (include schema)
Generate artifacts (JavaBeans as data objects, server-side
stubs, implementation class)
Package (WAR file)
Deploy WAR file to server
Sample EMBOSS Application Schema (Head)
<?xml version="1.0" encoding="UTF-8"?>
<definitions targetNamespace="emboss"
xmlns="http://schemas.xmlsoap.org/wsdl/"
xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<types>
<xsd:schema xmlns="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.ebi.ac.uk/ws/emboss/water/">
Application Schema – Custom Bindings
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.ebi.ac.uk/ws/emboss/applications/water/"
xmlns:tns="http://www.ebi.ac.uk/ws/emboss/applications/water/"
xmlns:jxb="http://java.sun.com/xml/ns/jaxb" jxb:version="1.0">
Application Schema – Custom Bindings (cont’d)
<xsd:annotation>
<xsd:appinfo>
<jxb:schemaBindings>
<jxb:package name="uk.ac.ebi.ws.emboss.applications.water">
</jxb:package>
</jxb:schemaBindings>
</xsd:appinfo>
</xsd:annotation>
Express Application Parameters
<xsd:element name="asequence" type="tns:asequence"/>
<xsd:complexType name="asequence">
<xsd:sequence>
<xsd:element name="asequence" type="xsd:string" nillable="false"/>
<xsd:element name="asequenceQualifiers" type="tns:asequenceQualifiers"
nillable="true"/>
</xsd:sequence>
</xsd:complexType>
Express asequenceQualifiers
<xsd:element name="asequenceQualifiers" type="tns:asequenceQualifiers"/>
<xsd:complexType name="asequenceQualifiers">
<xsd:sequence>
<xsd:element name="sbegin" type="xsd:integer"/>
<xsd:element name="send" type="xsd:integer"/>
<xsd:element name="usa" type="xsd:string"/>
……
</xsd:sequence>
</xsd:complexType>
Encapsulate all data types inside an application element
<xsd:element name="water" type="tns:water"/>
<xsd:complexType name="water">
<xsd:sequence>
<xsd:element name="asequence" type="tns:asequence"/>
<xsd:element name="bsequence" type="tns:bsequence"/>
<xsd:element name="datafile" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
Using JAXB Generated Java Beans at the client side
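Java Bean objects for the client are generated with the JAX-WS
‘wsimport’ tool, which compiles the WSDL + schema
Generated objects are populated on the client side using the
setters, e.g.:
Sequence asequence = new Sequence();
asequence.setUsa("tsw:hba_human");
// class name follows JAXB's default naming for the asequenceQualifiers type
AsequenceQualifiers asequenceQual = new AsequenceQualifiers();
asequenceQual.setSprotein(true);
asequenceQual.setSbegin(BigInteger.valueOf(0)); // xsd:integer binds to java.math.BigInteger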
EMBOSS Applications (300)
Manually creating the schema for each of the ~300 applications is not scalable
Maven is a software project management & build tool.
We have written an EMBOSS ACD parser plugin (a Java class) for our
Maven WS software build. It:
Takes EMBOSS application definitions (ACD files) as input
Outputs an XML Schema and WSDL representing each EMBOSS
application
These schemas are passed to a JAXB compiler, which generates
our Java Bean objects
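The ACD-to-schema step can be sketched as follows. This is not our plugin's code but a hypothetical, regex-based Java illustration. It recognises ACD definitions of the form `datatype: name [`, treats `application` as the wrapper type and skips `section` blocks. It also assumes each ACD datatype maps to a schema type of the same name (`tns:sequence`, `tns:seqall`). A real parser must also handle attributes, associated qualifiers and the full ACD syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AcdToSchema {
    // An ACD definition starts a line as "datatype: name [".
    // Attribute lines such as parameter: "Y" have no '[' and are not matched.
    private static final Pattern DEFINITION =
            Pattern.compile("^\\s*(\\w+)\\s*:\\s*(\\w+)\\s*\\[", Pattern.MULTILINE);

    /** Emits a complexType named after the application with one element per ACD parameter. */
    public static String toComplexType(String acd) {
        String application = null;
        List<String> elements = new ArrayList<>();
        Matcher m = DEFINITION.matcher(acd);
        while (m.find()) {
            String datatype = m.group(1);
            String name = m.group(2);
            if (datatype.equals("application")) {
                application = name;
            } else if (!datatype.equals("section")) {
                // Assumption: each ACD datatype maps to a schema type of the same name.
                elements.add("    <xsd:element name=\"" + name + "\" type=\"tns:" + datatype + "\"/>");
            }
        }
        if (application == null) {
            throw new IllegalArgumentException("ACD text has no application definition");
        }
        StringBuilder xsd = new StringBuilder();
        xsd.append("<xsd:complexType name=\"").append(application).append("\">\n");
        xsd.append("  <xsd:sequence>\n");
        for (String element : elements) {
            xsd.append(element).append('\n');
        }
        xsd.append("  </xsd:sequence>\n");
        xsd.append("</xsd:complexType>\n");
        return xsd.toString();
    }

    public static void main(String[] args) {
        // Simplified, hypothetical fragment in the style of an ACD file.
        String acd = "application: water [\n"
                + "  documentation: \"Smith-Waterman local alignment of sequences\"\n"
                + "]\n"
                + "section: input [\n"
                + "  information: \"Input section\"\n"
                + "]\n"
                + "  sequence: asequence [\n"
                + "    parameter: \"Y\"\n"
                + "  ]\n"
                + "  seqall: bsequence [\n"
                + "    parameter: \"Y\"\n"
                + "  ]\n"
                + "endsection: input\n";
        System.out.print(toComplexType(acd));
    }
}
```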
Advantages of WS EMBOSS Software Build
The advantages of this approach are that:
We can auto-generate XML schemas and application WSDLs
We can generate Java objects for use on the client side
We can easily expose new EMBOSS applications as web services
by running their ACD files through our software build
Generated Artifacts
Why go to these lengths?
Because of the sheer number of EMBOSS applications, it is necessary to
provide a clear means of representing the invocation of each separate
application and the passing of the parameters appropriate to it.
******* CLIENT SIDE CODE **********
// asequence (Sequence) and bsequence (Seqall) are populated with the generated setters
RunEmbossRequest run = new RunEmbossRequest();
EmbossParams water = new EmbossParams();
water.setAsequence(asequence);
water.setBsequence(bsequence);
Emboss emboss = new Emboss();
emboss.setApplication(EmbossApplication.WATER);
emboss.setApplicationParams(water);
run.setEmbossParams(emboss);
WSEmbossService service = new WSEmbossService();
WSEmboss wsemboss = service.getWSEmboss();
RunEmbossResponse response = wsemboss.run(run);
Server-side – Reverse Process
At the server side, values are extracted from the deserialised request
objects using the Java getter methods, e.g.:
******* SERVER-SIDE CODE **********
// input is the RunEmbossRequest received by the run operation
Emboss emboss = input.getEmbossParams();
EmbossApplication embossApp = emboss.getApplication();
String appname = embossApp.value();
EmbossParams water = emboss.getApplicationParams();
Sequence asequence = water.getAsequence();
Seqall bsequence = water.getBsequence();
Writing getter code like this for each of the ~300 applications does not scale well
How do we get from a Web Service payload to a valid command line?
We are looking at the possibility of developing a generic
mechanism that uses XSL (Extensible Stylesheet Language) to transform
the SOAP envelope (our WS inputs, e.g. the water parameters) into a
form that can be used to invoke the EMBOSS binary (application)
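Here is a sketch of the XSL idea. The Java code below applies a small, generic stylesheet with the standard `javax.xml.transform` API. The payload shape (a flat `<water>` element with one child per qualifier, no namespaces) and the class name `PayloadToCommandLine` are simplifying assumptions. A real SOAP body would carry namespaces and nested qualifier elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PayloadToCommandLine {
    // Generic stylesheet: the root element name becomes the application name and
    // each child element becomes "-name value". Assumes a flat, namespace-free payload.
    private static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/*\">"
            + "<xsl:value-of select=\"local-name()\"/>"
            + "<xsl:for-each select=\"*\">"
            + "<xsl:text> -</xsl:text><xsl:value-of select=\"local-name()\"/>"
            + "<xsl:text> </xsl:text><xsl:value-of select=\"normalize-space(.)\"/>"
            + "</xsl:for-each>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Transforms a payload document into a command-line string using the stylesheet above. */
    public static String toCommandLine(String payloadXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(payloadXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String payload = "<water>"
                + "<asequence>tsw:hba_human</asequence>"
                + "<bsequence>tsw:hbb_human</bsequence>"
                + "</water>";
        // prints: water -asequence tsw:hba_human -bsequence tsw:hbb_human
        System.out.println(toCommandLine(payload));
    }
}
```

The single output string is for illustration only. To keep the command line secure, the stylesheet could instead emit one argument per line or element, to be passed to the process individually rather than through a shell.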
Understanding our Job Submission Requirements
Building a valid & secure command line (approx 300 EMBOSS
applications)
Issuing the command line (300 applications)
Retrieving results from the EMBOSS application
Our WS job submission should fulfill the EMBRACE Technology
Recommendations of:
Being a ‘stateful’ web service
Implementing both synchronous and asynchronous functionality
Synchronous: submit a job and stay locked in to that application until it returns a result
Asynchronous: submit a job but keep a free hand (not locked in); we can poll the service
with a job id to obtain the job status and results
Operations to support the requirement of a ‘stateful’ WS
RunJob: e.g. runJob(water); all parameters for the job are
encapsulated in the water object. The operation returns a job id.
CancelJob: e.g. cancelJob("water12");
cancels the execution of the job
GetStatus: e.g. getStatus("water12");
returns the job status (Waiting, Scheduled, Running, Done, Cancelled, Aborted)
GetResult: e.g. getResult("water12");
retrieves the result of the job with the given identifier
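The four operations can be sketched as a small in-memory Java service built on `java.util.concurrent`. Everything here is an assumption for illustration: the class `JobService`, job ids formed from the application name plus a counter (e.g. `water1`), and a `Callable` standing in for the actual EMBOSS invocation. It shows the asynchronous pattern (submit, receive a job id, poll). A synchronous call could simply wait for the job to finish before returning.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class JobService {
    /** Status values listed on the slide. */
    public enum Status { WAITING, SCHEDULED, RUNNING, DONE, CANCELLED, ABORTED }

    private static final class Job {
        volatile Status status = Status.WAITING; // consulted only while the job is unfinished
        volatile Future<String> future;
    }

    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    private final Map<String, Job> jobs = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    /** Queues the work and immediately returns a job id such as "water1". */
    public String runJob(String application, Callable<String> work) {
        String jobId = application + counter.incrementAndGet();
        Job job = new Job();
        jobs.put(jobId, job);
        job.status = Status.SCHEDULED; // assumption: queued jobs report SCHEDULED
        job.future = executor.submit(() -> {
            job.status = Status.RUNNING;
            return work.call();
        });
        return jobId;
    }

    /** Attempts to cancel the job, interrupting it if it is already running. */
    public boolean cancelJob(String jobId) {
        return job(jobId).future.cancel(true);
    }

    public Status getStatus(String jobId) {
        Job job = job(jobId);
        Future<String> f = job.future;
        if (!f.isDone()) {
            return job.status; // WAITING, SCHEDULED or RUNNING
        }
        if (f.isCancelled()) {
            return Status.CANCELLED;
        }
        try {
            f.get(); // does not block: the future is already done
            return Status.DONE;
        } catch (ExecutionException e) {
            return Status.ABORTED; // the work threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return job.status;
        }
    }

    /** Non-blocking: returns the result, or null if the job has not completed successfully. */
    public String getResult(String jobId) throws InterruptedException, ExecutionException {
        Job job = job(jobId);
        return getStatus(jobId) == Status.DONE ? job.future.get() : null;
    }

    public void shutdown() {
        executor.shutdownNow();
    }

    private Job job(String jobId) {
        Job job = jobs.get(jobId);
        if (job == null) {
            throw new IllegalArgumentException("Unknown job id: " + jobId);
        }
        return job;
    }

    public static void main(String[] args) throws Exception {
        JobService service = new JobService();
        // Asynchronous use: submit, then poll with the job id until the job finishes.
        String jobId = service.runJob("water", () -> "alignment output");
        Status status;
        do {
            Thread.sleep(10);
            status = service.getStatus(jobId);
        } while (status != Status.DONE && status != Status.CANCELLED && status != Status.ABORTED);
        System.out.println(jobId + " " + status + ": " + service.getResult(jobId));
        service.shutdown();
    }
}
```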
Do we have to reinvent the wheel? – Enter OMII
We propose borrowing established technology as one possible
solution to our requirements
Recently (this week) I met with the Software Group Leader at OMII,
the Open Middleware Infrastructure Institute based at the University of
Southampton (www.omii.ac.uk)
OMII is an established GRID middleware service provider and is very
keen to have real users (developers using its products)
OMII designs GRID-related software products
What can they offer us?
We are interested in their GridSAM product
GridSAM consists of several subsystems that support:
Pluggable job persistence (if your job fails, it will be retried)
Job Queuing, Launching
Job Monitoring
Pending, staging in, active, executed, staging out, job
completed
GridSAM cont’d
File Staging (stage in input files, stage out output files)
All this functionality is available through an API, the JobManager
interface
This provides us with rich job submission functionality at little cost
Typically this functionality will be invoked through the API from within
the embedding application, i.e. our web service
How do I pass my job content to the GridSAM server?
Jobs are launched by passing a JSDL (Job Submission
Description Language) document to the GridSAM server from a
GridSAM client using the JobManager API
All of this can exist underneath your web service layer
Opportunity for a shared EMBRACE server perhaps!
Sample JSDL
<?xml version="1.0" encoding="UTF-8"?>
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
<JobDescription>
<Application>
<POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
<Executable>/bin/echo</Executable>
</POSIXApplication>
</Application>
</JobDescription>
</JobDefinition>
Very good! – What about the EMBOSS WS
As mentioned, we propose to transform the EMBOSS WS
payloads (SOAP messages) at runtime into a valid JSDL document
to be submitted to GridSAM
GridSAM looks promising!
We will use the EMBOSS WS as a test bed
If successful, we may make a recommendation to WP3
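The proposed payload-to-JSDL transformation ultimately has to produce a document like the sample JSDL. Below is a minimal Java sketch of emitting one for an EMBOSS command. `JsdlBuilder` and the install path are hypothetical. The sketch uses the JSDL POSIX `Argument` element so that each qualifier and value is passed separately. The XSL route or a DOM API could generate the same document.

```java
import java.util.Arrays;
import java.util.List;

public class JsdlBuilder {
    private static final String JSDL_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl";
    private static final String POSIX_NS = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix";

    /** Escapes XML special characters so argument values cannot break the document. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Builds a minimal JSDL JobDefinition running one executable with the given arguments. */
    public static String build(String executable, List<String> arguments) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<JobDefinition xmlns=\"").append(JSDL_NS).append("\">\n");
        sb.append("  <JobDescription>\n");
        sb.append("    <Application>\n");
        sb.append("      <POSIXApplication xmlns=\"").append(POSIX_NS).append("\">\n");
        sb.append("        <Executable>").append(escape(executable)).append("</Executable>\n");
        for (String argument : arguments) {
            // One <Argument> per argv entry, so no shell word-splitting is involved.
            sb.append("        <Argument>").append(escape(argument)).append("</Argument>\n");
        }
        sb.append("      </POSIXApplication>\n");
        sb.append("    </Application>\n");
        sb.append("  </JobDescription>\n");
        sb.append("</JobDefinition>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // The executable path is a hypothetical install location.
        List<String> argv = Arrays.asList("-asequence", "tsw:hba_human", "-bsequence", "tsw:hbb_human");
        System.out.print(build("/usr/local/emboss/bin/water", argv));
    }
}
```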
Thank you for listening!