impact/mygrid hackathon - introduction to taverna

40
An Introduction to Designing, Executing and Sharing Workflows with Taverna Katy Wolstencroft myGrid University of Manchester IMPACT/Taverna Hackathon 2011

Upload: impact-centre-of-competence

Post on 12-Nov-2014

2.607 views

Category:

Education


0 download

DESCRIPTION

Katy Wolstencroft gives an introduction to Taverna at the IMPACT/myGrid Taverna Hackathon, 14th November 2011

TRANSCRIPT

Page 1: IMPACT/myGrid Hackathon - Introduction to Taverna

An Introduction to Designing, Executing and Sharing Workflows

with Taverna

Katy Wolstencroft

myGrid

University of Manchester

IMPACT/Taverna Hackathon 2011

Page 2: IMPACT/myGrid Hackathon - Introduction to Taverna

Exercise 1: Exploring the Workbench

Taverna can be downloaded from http://www.taverna.org.uk/

Go to the page and find the latest (2.3) Download the correct version for your operating system Follow the instructions in the Taverna installer

The following page shows a screenshot of Taverna and the different panels that make up the workbench

Page 3: IMPACT/myGrid Hackathon - Introduction to Taverna

Taverna Workbench

Workflow DiagramServices Panel

Workflow Explorer

Page 4: IMPACT/myGrid Hackathon - Introduction to Taverna

1. Workflow Diagram

The visual representation of workflow Shows inputs/outputs, services and control flows Allows editing of the workflow by dragging and dropping

and connecting services together Enables saving of workflow diagrams for publishing and

sharing

Page 5: IMPACT/myGrid Hackathon - Introduction to Taverna

1. Workflow Explorer

The Workflow Explorer shows the detailed view of your workflow. It shows default values and descriptions for service inputs and outputs and it shows where remote services are located. It also shows configuration details, such as iteration and looping (we will come back to these things later).

Workflow validation details can also be found here. Before a workflow is run, Taverna checks to see if it is connected correctly and if its services are available.

Page 6: IMPACT/myGrid Hackathon - Introduction to Taverna

1. Available Services Panel

Lists services available by default in Taverna Local java services WSDL Web Service – secure and public RESTful Services R Processor services (for statistical analyses) Beanshell scripts Xpath scripts

Allows the user to add new services or workflows from the web or from file systems – there are loads more available!

Page 7: IMPACT/myGrid Hackathon - Introduction to Taverna

In the Services panel, type ‘image’ into the search box. Select ‘Get Image from URL’ This is a local service, but web services work the same way Many historical documents are stored as images on the

web. This is a simple, but useful service to help gather data

Exercise 2: Building a Simple Workflow

Drag this service across to the workflow diagram panel

Page 8: IMPACT/myGrid Hackathon - Introduction to Taverna

Exercise 2: Building a Simple Workflow

In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port”

Type a name (e.g. URL) for this input in the pop-up window and click “ok”

Do the same to create a new workflow output. Call this output “image”

Page 9: IMPACT/myGrid Hackathon - Introduction to Taverna

Exercise 2: Building a Simple Workflow

You now have 3 boxes in the diagram and we need to connect them up into a workflow

First, we need to find out how many inputs and outputs the ‘get image from URL’ service has

At the top of the workflow diagram, select the ‘show ports’ icon

Show Ports

Page 10: IMPACT/myGrid Hackathon - Introduction to Taverna

Exercise 2: Building a Simple Workflow

Click on the workflow input box and drag the linking arrow across to the URL input of the ‘get_image_from_URL’ service.

Link the image output of ‘get_image_from_url’ to the workflow output port

Page 11: IMPACT/myGrid Hackathon - Introduction to Taverna

Exercise 2: Building a Simple Workflow

You have now built your first workflow! It should look something like this.

In many cases, you have to supply input data for EVERY service input port. In this case, however, the ‘base’ input is optional, so we will leave it.

Save the workflow by going to file -> save workflow

Page 12: IMPACT/myGrid Hackathon - Introduction to Taverna

Exercise 2: Building a Simple Workflow

Run the workflow by selecting “file -> run workflow”, or by clicking on the play button at the top of the workbench

Page 13: IMPACT/myGrid Hackathon - Introduction to Taverna

Exercise 2: Building a Simple Workflow

An input window will appear. As you can see, we have not yet added a description of the workflow or of the input

Click on ‘New Value’ in the input window and add the urlhttp://www.archives.gov/exhibits/featured_documents/magna_carta/images/magna_carta.jpg where it says “some input data goes here”

Page 14: IMPACT/myGrid Hackathon - Introduction to Taverna

Exercise 2: Building a Simple Workflow

Click “run workflow” In the bottom left of the results window, click on the

results. You will now see an image from the specified web page

Workflow results can be saved here if required by clicking on ‘save all values’

Page 15: IMPACT/myGrid Hackathon - Introduction to Taverna

2: Adding a Workflow Description

Right-click on a blank part of the workflow diagram and select “show details”

In the workflow explorer panel, the details page will open up. Add some details about the workflow (e.g. who is the author, what does the workflow do).

You can also add examples and descriptions for the workflow inputs by selecting them in the explorer panel and selecting “details”

Adding this metadata makes the workflow much more reusable

Save the workflow by going to “File -> save workflow”

Page 16: IMPACT/myGrid Hackathon - Introduction to Taverna

New services can be gathered from anywhere on the web We will find a new service and add it to the workbench IMPACT and SACPE have a whole suite of services. We

will add one (you will be using it later on today) Go to https://fue.onb.ac.at/synapse. Here you will find a list

of IMPACT services Click on IMPACTTesseractV3Proxy and copy the link you

are directed to. This is the WSDL address and is what Taverna needs to

run the service

Exercise 3: Adding New Services

Page 17: IMPACT/myGrid Hackathon - Introduction to Taverna

3. Adding New Services

Go to the services panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service

Select ‘WSDL service…’ A window will pop-up asking for a web address

Page 18: IMPACT/myGrid Hackathon - Introduction to Taverna

3. Adding New Services

Enter the service address you just copied

Scroll down the Services list, you will see your new service there

Page 19: IMPACT/myGrid Hackathon - Introduction to Taverna

Exercise 4: Sharing and Reusing Workflows

Go to http://www.myexperiment.org myExperiment is a social networking site for sharing

workflows and workflow expertise and experiences Browse around the site and see what it contains Find everything that has been tagged with ‘text mining’,

for example Look at the text mining workflows. You will see some

that are specific to biology, some that are generally applicable, and some that are specific to other scientific disciplines

Page 20: IMPACT/myGrid Hackathon - Introduction to Taverna

4. Sharing and Reusing workflows

IMPACT have many workflows on myExperiment, but they are not public. You must join an IMPACT group before you can see them and use them.

Create yourself an account and join the group called ‘IMPACT-myGrid-Hackathon’ (NOTE: you need to join this group to access content for future exercises)

Explore the shared items in this group. These are examples of the types of tasks IMPACT workflows can perform

Page 21: IMPACT/myGrid Hackathon - Introduction to Taverna

5. Using Workflows from myExperiment

You can download and run the workflows from the myExperiment website, or you can use myExperiment directly from Taverna

To use workflows from the website, you can either download them, or copy the workflow file location into the ‘open workflows from the web’ option in Taverna’s file menu.

Page 22: IMPACT/myGrid Hackathon - Introduction to Taverna

5. Using Workflows from myExperiment

Go back to Taverna and click on the myExperiment icon at the top of the workbench

Go to ‘my stuff’ and log in (using the same credentials as the web page)

Find the IMPACT-myGrid-Hackathon group by using the ‘search’ option.

Look at the shared items and find the workflow called ‘Text to List’

Click on ‘open’ and this workflow will be automatically imported into your Taverna design window

Page 23: IMPACT/myGrid Hackathon - Introduction to Taverna

5. Validate your Workflow

Taverna checks to see that everything is connected properly and that all the required services are available

Go to the workflow explorer and click on ‘validation report’

See if Taverna has found any problems with the workflow. Errors will be displayed in red, warnings in yellow. Workflows with warnings often still run.

If there are problems, follow the instructions to resolve them by clicking on the ‘Solution’ tab

If not, run the workflow

Page 24: IMPACT/myGrid Hackathon - Introduction to Taverna

5. Using Workflows from myExperiment

Use the default input suggested to run the workflow. The workflow will collect and list some example data stored at the given URL

It returns a list of image files We can now combine this workflow with the one we

made earlier to return the actual images.

In Taverna, you can add workflows as if they were any other kind of service – these are called ‘Nested Workflows’

Page 25: IMPACT/myGrid Hackathon - Introduction to Taverna

6. Reusing and connecting Workflows

From the current workflow design window, go to

‘Insert -> Nested workflow Import the workflow you

made earlier, by selecting ‘import from file’

You can see a small version of the workflows, so you can check you are importing the correct workflow

Page 26: IMPACT/myGrid Hackathon - Introduction to Taverna

6. Reusing and connecting Workflows

We now need to connect the two workflows together Connect the Text2List service to the input of the nested

workflow by dragging an arrow across. Make a new workflow output port (by right-clicking and

adding workflow output port) Connect the output of the nested workflow to the new

workflow output port

Page 27: IMPACT/myGrid Hackathon - Introduction to Taverna

6. Reusing and connecting Workflows

Your new workflow should look something like this

Save and run the workflow This time, as it runs, you will

see Taverna automatically iterates over the list of data produced by Text2List

NOTE: some of the iterations will fail. See if you can tell which

Look at one of the resulting images

Page 28: IMPACT/myGrid Hackathon - Introduction to Taverna

7. Looking at Intermediate Results

You can track intermediate workflow values through the results view. This is very useful for working out where unexpected results came from.

On the diagram, click the Text2List service and look at its inputs and outputs in the results.

You can save the workflow in myExperiment if you wish, but make sure you give credit to the nested workflow author and make sure you ONLY share it with the IMPACT-myGrid-Hackathon group

Page 29: IMPACT/myGrid Hackathon - Introduction to Taverna

Controlling data flow in WorkflowsAdvanced Exercises

Page 30: IMPACT/myGrid Hackathon - Introduction to Taverna

As you have already seen, Taverna can automatically iterate over sets of data.

When 2 sets of iterated data are combined, however, Taverna needs extra information about how they should be combined. You can have:

A cross product – combining every item from list 1 with every item from list 2 - all against all

A dot product – only combining item 1 from list 1 with item 1 from list 2, and so on – line against line

8. Iteration

Page 31: IMPACT/myGrid Hackathon - Introduction to Taverna

Find and load the workflow ‘Demonstration of configurable iteration’ from myExperiment

Read the workflow metadata to find out what the workflow does (by looking at the ‘Details’)

Select the ‘ColourAnimals’ service and select the ‘Details’ in the workflow explorer and ‘configure list handling’

Click on ‘dot product’ in the pop-up window. This allows you to switch to cross product

8. Iteration

Page 32: IMPACT/myGrid Hackathon - Introduction to Taverna

Run the workflow twice – once with ‘dot product’ and once with ‘cross product’.

Save the first results so you can compare them – what is the difference? What does it mean to specify dot or cross product?

8. Iteration

Page 33: IMPACT/myGrid Hackathon - Introduction to Taverna

9. Retries: Making your Workflow Robust

Web services can sometimes fail due to network connectivity

If you are iterating over lots of data items, you can guard against these temporary interruptions by adding retries to your workflow

Upload the ‘Retry-Example’ workflow from the IMPACT-myGrid-Hackathon group. This workflow is designed to fail sometimes.

Run the workflow as it is and count the number of failed iterations

Page 34: IMPACT/myGrid Hackathon - Introduction to Taverna

9. Retries: Making your Workflow Robust

Now, select the ‘sometimes_fails’ service and select the ‘details’ tab in the workflow explorer panel

Click on ‘advanced’ and ‘configure’ for retries In the pop-up box, change it so that it retries each

service iteration 2 times Run the workflow again – how many failures do you get

this time? Change the workflow to retry 5 times – does it work

every time now?

Page 35: IMPACT/myGrid Hackathon - Introduction to Taverna

10. Looping

From myExperiment, download and open the workflow “dummy_example_of_looping”

This workflow is asynchronous. This means that when you submit data (by running the workflows), it will return a jobID and place your job in a queue. This is very useful if your job will take a long time!

The ‘CheckStatus’ service will query your job ID to find out if it is complete

Page 36: IMPACT/myGrid Hackathon - Introduction to Taverna

10. Looping

The default behaviour in a workflow is to call each service only once for each item of data – so what if your job has not finished when ‘Status’ workflow asks?

Run the workflow Almost every time, the workflow will ‘fail’ (in this case,

that means it will return 0) because the results have not been returned before the workflow reaches the ‘getResults’ service

Page 37: IMPACT/myGrid Hackathon - Introduction to Taverna

10. Looping

This is where looping is useful. Taverna can keep running the ‘status’ service until it reports that the job is done.

Select the ‘CheckStatus’ service and click on the ‘details’ tab in the workflow explorer

Select ‘advanced’ and click on ‘add looping’ Use the drop-down boxes in the looping window to set

‘state’ ‘is_not_equal_to’ RUNNING

Page 38: IMPACT/myGrid Hackathon - Introduction to Taverna

10. Looping

Save the workflow and run it again This time, the workflow will run until the ‘CheckStatus’

service reports that it is either COMPLETE, or it has an ERROR.

You will see results for ‘GetResults’, but you will still get an error for ‘GetResults2’. This is because there is one more configuration to change – we also need ‘Control Links’

Page 39: IMPACT/myGrid Hackathon - Introduction to Taverna

A control link specifies that there is a dependency of one service on another even though there is no data flowing between them.

A control link is a line with a white circle at the end that connects two services (see the link between ‘CheckStatus’ and ‘getResults’

11. Control Links

Page 40: IMPACT/myGrid Hackathon - Introduction to Taverna

11. Control Links

We will add control link to getResults2 Right-click on getResult2 and select ‘Run after’ from the

drop down menu. Set it to ‘Run after’ -> ‘CheckStatus’ Save and run the workflow Now you will see both results returned