Biocatalogue, FileQuirks, MyExperiment

Summary from my fellowship in Manchester (e-Science in Manchester), Jerzy Orłowski

Upload: jerzy

Posted on 02-Jul-2015


DESCRIPTION

Presentation from an IIMCB seminar: a summary of my fellowship with MyGrid, Manchester.

TRANSCRIPT

Page 1: Biocatalogue, FileQuirks, MyExperiment

Summary from my fellowship in Manchester

e-Science in Manchester

Jerzy Orłowski


What I will talk about

Part one: Biocatalogue SearchByData

Searching for services that will analyze or process your data file

Other ideas that came up along the way

Part two: Other things I've done in the meantime

Part three: How do they do it in MyGrid?

New methodology that we could adopt

How I can help you, and how you can help me


Part one: Biocatalogue SearchByData


Biocatalogue

The BioCatalogue is a catalogue of Life Science Web Services

A web service is a network application with a programmatic interface

BioCatalogue relies on community annotation by:

Service providers

Users

Curator

Technology: Ruby on Rails


Browsing services


My contribution

Search ByData

Ability to find services not by tags, providers, etc., but by example input files

Algorithm based on FileQuirks, developed at GeneSilico

The user provides a real input file, which is matched against the example inputs of all services using regular expressions

The services most likely to analyze or process the user's file are returned
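A minimal sketch of the matching idea, assuming FileQuirks-style recognition can be approximated by one regular expression per data type. The three patterns, the `detect_types` and `search_by_data` names, and the service names used in testing are hypothetical illustrations, not the actual FileQuirks rules or BioCatalogue code. Services are ranked by how many detected data types their example input shares with the user's file.

```python
import re

# Hypothetical FileQuirks-style signatures: one regular expression per data type.
TYPE_PATTERNS = {
    # Single-record FASTA: a header line, then sequence letters only.
    "fasta": re.compile(r"\A>[^\n]*\n[A-Za-z*\-\n]+\Z"),
    # PDB coordinates: at least one ATOM/HETATM record.
    "pdb": re.compile(r"^(ATOM  |HETATM)", re.MULTILINE),
    # RNA secondary structure in dot-bracket notation.
    "dot_bracket": re.compile(r"\A[().]+\s*\Z"),
}


def detect_types(text):
    """Return the set of data types whose pattern matches the text."""
    return {name for name, pattern in TYPE_PATTERNS.items() if pattern.search(text)}


def search_by_data(user_file, example_inputs):
    """Return services whose example input shares a data type with user_file.

    example_inputs maps a service name to one example input (a string).
    Services sharing more detected types are ranked first.
    """
    user_types = detect_types(user_file)
    scores = {}
    for service, example in example_inputs.items():
        shared = user_types & detect_types(example)
        if shared:
            scores[service] = len(shared)
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores, key=lambda s: (-scores[s], s))
```

In this sketch a service with no example input can never be returned, however suitable it is.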


Other ideas – getting example input files

The main limitation of Search By Data is the lack of example inputs for services

For 1169 services with more than 3000 operations, there are no more than 500 example inputs

Most of the inputs are numbers or IDs

Idea – get more inputs:

From people executing the services: Taverna Provenance

Soap Servlet

By having bots execute services with sample data


Soap Servlet

Automatic generation of web interfaces for web services:

For users: lets them quickly test or execute a service

For us: lets us collect example inputs for services

Currently – alpha version
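A rough sketch of how such a form could be generated automatically, assuming a WSDL 1.1 document whose `message` elements list the input `part`s of an operation. The function names, the `/invoke` action URL and the in-memory store are hypothetical; the real Soap Servlet may work quite differently. The second function covers the "for us" side: every submitted input is kept as a candidate example input.

```python
import xml.etree.ElementTree as ET
from html import escape

# WSDL 1.1 namespace; message/part elements describe an operation's inputs.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}


def form_for_message(wsdl_text, message_name, action="/invoke"):
    """Build an HTML form with one text field per <part> of the named message.

    `action` is a hypothetical endpoint that would forward the values to the service.
    """
    root = ET.fromstring(wsdl_text)
    for message in root.findall("wsdl:message", WSDL_NS):
        if message.get("name") != message_name:
            continue
        fields = []
        for part in message.findall("wsdl:part", WSDL_NS):
            name = escape(part.get("name", ""), quote=True)
            fields.append(f'<label>{name} <input type="text" name="{name}"></label>')
        return (
            f'<form method="post" action="{escape(action, quote=True)}">'
            + "".join(fields)
            + '<input type="submit" value="Run"></form>'
        )
    raise KeyError(f"no message named {message_name!r}")


def record_example(store, operation, submitted):
    """Keep each submitted input so it can later serve as an example input."""
    store.setdefault(operation, []).append(dict(submitted))
```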


Soap Servlet interface for Afold



Part two: Other projects I've done in the meantime


GeneSilico web services

Turning some of our programs into SOAP services

ProteinSilico

ModeRNA

Parts of MetaServer

Parts of MetaRNA

See: https://wiki.genesilico.pl/GenesilicoSOAPServices

Well documented on BioCatalogue, and used to test Search By Data


FileQuirks

FileQuirks – web server for recognition of biological data types

New user interface

More data types

Help pages

Summary submitted to NAR (awaiting decision)

http://filequirks.genesilico.pl


FileQuirks Help Pages

I decided to use Joomla CMS

Help pages have a standard format

Joomla makes them easy to write and update

The GeneSilico home page is built with Joomla, so migrating or merging would be easy, and a graphic template already exists

It is easy to add help pages for other services

The software and server list on http://www.genesilico.pl/index.php/servers.html is outdated


GeneSilico web services

A web service is a network tool with a programmatic API: “program as a service”

Pros:

Compatibility between languages (XML is the protocol)

Code reuse: no need to install programs

Easy linking with other tools

Automatic user interface generation

Cons:

You have to maintain the server

Harder to make it private
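To illustrate why XML makes SOAP language-neutral, here is a sketch that builds a SOAP 1.1 request envelope with Python's standard library and reads the parameters back. The service namespace, operation and parameter names are hypothetical; a real client would normally generate this from the service's WSDL and send it over HTTP, which is omitted here.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_ENV)


def build_request(service_ns, operation, params):
    """Serialize a call of `operation` with `params` as a SOAP 1.1 envelope string."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text, service_ns, operation):
    """Read the parameters of `operation` back out of a SOAP envelope."""
    root = ET.fromstring(xml_text)
    call = root.find(f"{{{SOAP_ENV}}}Body/{{{service_ns}}}{operation}")
    if call is None:
        raise ValueError(f"no {operation} element in the SOAP body")
    # Strip the "{namespace}" prefix from each child tag to get the parameter name.
    return {child.tag.rpartition("}")[2]: child.text for child in call}
```

Because the request is plain XML, a client written in Java, Ruby or any other language can produce or consume the same message.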


Example 1

MetaMQAP

Kudlaty's Chimera MetaMQAP plugin uses MetaMQAP (he wrote his own interface)

Toolkit uses MetaMQAP

I have also written scripts for using MetaMQAP

Conclusions: MetaMQAP needs to be installed and maintained on many different systems by different people

Making a SOAP server will save people time


Example 2

Methods for RNA secondary structure prediction

They are used by RNA MetaServer

Tomek Puton uses them for CompaRNA

I used them for testing Search By Data

Conclusions: a SOAP interface for the fast methods already exists

It just needs updating and incorporating into other tools


GeneSilico web services

Instructions on:

https://wiki.genesilico.pl/GenesilicoSOAPServices


How do they do it in MyGrid?

Some methodology we might adopt or just be aware of


Working system

They do not do science themselves; they make tools for scientists

They also do research on how new technologies are adopted in science

Every project is a collaboration with other groups

There is always more than one person working on a project

More than 25% of time is spent on meetings

Code developers are not scientists, but employees


Working system: two “uncommon” positions

Project manager: Not a scientist

Not a developer

Responsible for: keeping up with the release schedule

grant schedule

cooperation between projects

Service curator: Not a developer

Responsible for keeping in touch with the user community

Organizing meetings with focus groups, jamborees, etc.


Working system

No seminars

Instead, a weekly meeting reporting progress on all projects

Many project-specific meetings and teleconferences


Sharing policy

Code and ideas are shared from the very beginning of the project

A scientific finding can be published only once, but tools can keep getting better

Selling your ideas enables cooperation and compatible tools, which means better grants

Publishing your code (git, svn) gets you more users, which helps with publications and grants


Development

Languages: Java and Ruby on Rails

All code is under version control

Massive branching and merging

Dependency management systems (Maven)

All services are hosted

Collaborations (EMBL-EBI)

Corporate hosting

Clouds (Amazon EC2)

Building a user community


Summary – what we could discuss

Programmatic interfaces (middleware): I can make SOAP interfaces for you, deploy them and publish them

I would require you to use such interfaces in your future code

What else I can offer: a CMS for public help pages for programs and web servers

What I'd like to ask


Acknowledgments

MyGrid

Carole Goble

Charlotte Hooson-Sykes

Jiten Bhagat, Franck Tanoh, Shoaib Sufi, Peter Li and others

University of Southampton

David De Roure

GeneSilico

Janusz Bujnicki

Iga Korneta

Piotr Iwaniuk, Jakub Jopek, Bartosz Bedyński, Artur