JIP pipeline system introduction

Post on 13-Jun-2015

Category:

Technology

DESCRIPTION

This talk covers some of the basic aspects of the JIP pipeline system (http://pyjip.readthedocs.org) and its command line interface. JIP is a system to manage jobs on a cluster and to simplify the process of building computational pipelines. JIP can interact with Slurm, SGE, PBS/Torque, or LSF clusters and comes with a small local scheduler to run without any remote grid engine.

TRANSCRIPT

JIP PIPELINE SYSTEM: ACCESSIBLE HIGH THROUGHPUT COMPUTING

WHY? SERIOUSLY

• Job Management

• Implementation

• Batch job handling

• Reusable and…

• … documented tools

LOCATIONS

PLEASE TAKE A LOOK

• Documentation http://pyjip.rtfd.org

• Source Code https://github.com/thasso/pyjip

• Examples https://github.com/thasso/pyjip/tree/master/examples

CLI OR API

• Commands to run and submit jobs

• List and query jobs

• Manipulate jobs (delete, archive, cancel, edit,…)

• Cleanup jobs and list profiles and tools

• Start your own server

    Commands
    ========
    run       Locally run a jip script
    submit    submit a jip script to a remote cluster
    bash      Run or submit a bash command

    List and query jobs
    ===================
    jobs      list and update jobs from the job database

    Manipulate jobs
    ===============
    delete    delete the selected jobs
    archive   archive the selected jobs
    cancel    cancel selected and running jobs
    hold      put selected jobs on hold
    restart   restart selected jobs
    logs      show log files of jobs
    edit      edit job commands for a given job
    show      show job options and command for jobs

    Miscellaneous
    =============
    tools     list all tools available through the search paths
    profiles  list all available profiles
    clean     remove job logs
    check     check job status
    server    start the jip grid server

HELLO WORLD

Let's get started

    #!/usr/bin/env jip
    # Prints hello world

    echo "Hello world"

    #!/usr/bin/env jip
    # Prints hello world using perl

    #%begin command perl
    print "Hello world\n";
    #%end

    #!/usr/bin/env jip

    #%begin command python
    print "Hello world"
    #%end

    @pytool()
    def hello_world():
        """Prints hello world"""
        print "Hello python"

#%begin command [perl|RScript|…]

• command block to run scripts

• specify an interpreter (default bash)

• use templates to access options and variables

#%end

OPTIONS AND DOCUMENTATION

• Options are specified in your documentation

• Specify Inputs, Outputs, and other Options

• Options are available as ${variables}

    #!/usr/bin/env jip
    #
    # BWA/Samtools pileup
    #
    # Usage:
    #     pileup.jip -i <input> -r <reference> -o <output>
    #
    # Inputs:
    #     -i, --input <input>          The input file
    #     -r, --reference <reference>  The genomic reference
    #
    # Outputs:
    #     -o, --output <output>        The .bcf output file
    #
    # Options:
    #     --fast                       Enable fast mode
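To illustrate how options can be read straight from the documentation, here is a minimal sketch that parses the Inputs, Outputs, and Options sections of the pileup header. It is not JIP's parser: `parse_options`, the regular expressions, and the returned dictionary layout are assumptions made for this example only.

```python
import re

DOC = """\
BWA/Samtools pileup

Usage:
    pileup.jip -i <input> -r <reference> -o <output>

Inputs:
    -i, --input <input>          The input file
    -r, --reference <reference>  The genomic reference

Outputs:
    -o, --output <output>        The .bcf output file

Options:
    --fast                       Enable fast mode
"""

# Section headers that introduce option definitions.
SECTION = re.compile(r"^(Inputs|Outputs|Options):\s*$")
# "-i, --input <input>   help text" or "--fast   help text".
OPTION = re.compile(
    r"^\s+(?:(-\w),\s*)?(--[\w-]+)(?:\s+<(\w+)>)?\s{2,}(.*)$"
)


def parse_options(doc):
    """Return one dict per option documented in doc (hypothetical helper)."""
    options = []
    section = None
    for line in doc.splitlines():
        header = SECTION.match(line)
        if header:
            section = header.group(1).lower()
            continue
        found = OPTION.match(line)
        if section and found:
            short, long_name, value, text = found.groups()
            options.append({
                "name": long_name.lstrip("-"),
                "short": short,
                "kind": section,           # inputs, outputs, or options
                "is_flag": value is None,  # no <value> means a boolean flag
                "help": text.strip(),
            })
    return options


for option in parse_options(DOC):
    print(option["kind"], option["name"], option["is_flag"])
```

Each parsed name is what a template would then refer to as ${input}, ${reference}, and so on.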

TEMPLATES AND VARIABLES

• Access variables and options ${variable}

• Apply filters:

• arg: ${bool|arg} ${file|arg(">")}

• pre / suf: ${input|suf(".txt")}

• name, ext, and abs: ${input|name|ext}
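The filter chains above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not JIP's template engine: `render`, `FILTERS`, and the filter semantics (name as basename, ext as "strip the last extension", suf and pre as string concatenation, re as a regex substitution) are my reading of the slide examples, and the parser does not handle a `|` or `}` inside filter arguments.

```python
import os
import re

# Assumed filter semantics; the real filters may differ in edge cases.
FILTERS = {
    "name": lambda value: os.path.basename(value),
    "ext": lambda value: os.path.splitext(value)[0],
    "suf": lambda value, suffix: value + suffix,
    "pre": lambda value, prefix: prefix + value,
    "re": lambda value, pattern, repl: re.sub(pattern, repl, value),
}

EXPR = re.compile(r"\$\{([^}]+)\}")           # ${variable|filter|...}
CALL = re.compile(r"^(\w+)(?:\((.*)\))?$")    # filter or filter(args)
ARG = re.compile(r'"((?:[^"\\]|\\.)*)"')      # double-quoted arguments


def render(template, values):
    """Expand ${...} expressions in template using values (hypothetical)."""
    def expand(match):
        name, *filters = match.group(1).split("|")
        value = str(values[name.strip()])
        for spec in filters:
            call = CALL.match(spec.strip())
            if call is None or call.group(1) not in FILTERS:
                raise ValueError("unsupported filter: " + spec)
            args = ARG.findall(call.group(2) or "")
            value = FILTERS[call.group(1)](value, *args)
        return value
    return EXPR.sub(expand, template)


values = {"input": "/data/sample.map.gz"}
print(render("${input|name|ext}.bw", values))
print(render(r'${input|name|re("\.map(.gz)?", ".bg")}', values))
```

Filters apply left to right, so `${input|name|ext}` first drops the directory and then the last extension.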

SINGLE TOOLS

• Inputs, Outputs, Options

• Phases:

• init — initialise the tool and its options

• setup — perform setup using option (values)

• validate — check input files and options

• execute — execute through interpreter
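The four phases can be pictured as methods that run in a fixed order. The sketch below is illustrative only: `WordCountTool` and `run_phases` are hypothetical names, not JIP classes, and the "interpreter" here is the current Python executable rather than bash.

```python
import os
import subprocess
import sys


class WordCountTool:
    """Hypothetical tool that counts the words in its input file."""

    # The "command block": a script that is handed to an interpreter.
    SCRIPT = "import sys; print(len(open(sys.argv[1]).read().split()))"

    def init(self):
        # init: declare the options the tool understands.
        self.options = {"input": None}

    def setup(self, **values):
        # setup: apply the option values supplied by the user.
        for name, value in values.items():
            if name not in self.options:
                raise KeyError("unknown option: " + name)
            self.options[name] = value

    def validate(self):
        # validate: fail early if the input file is missing.
        path = self.options["input"]
        if path is None or not os.path.isfile(path):
            raise ValueError("input file not found: %r" % (path,))

    def execute(self):
        # execute: run the command block through the interpreter.
        result = subprocess.run(
            [sys.executable, "-c", self.SCRIPT, self.options["input"]],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()


def run_phases(tool, **values):
    """Run init, setup, validate, and execute in that order."""
    tool.init()
    tool.setup(**values)
    tool.validate()
    return tool.execute()
```

Because validation runs before execution, a missing input is reported before any command is started.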

EXECUTION

• Check all inputs (dependency aware)

• Update the DB and run the command block

• On success: update the DB

• On failure: remove the output and update the DB
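The execution steps can be sketched as a small wrapper around a command. This is an assumption-laden illustration, not JIP's implementation: `run_job` is a hypothetical function, a plain dict stands in for the job database, and the state names are invented for the example.

```python
import os
import subprocess
import sys


def run_job(job_id, command, inputs, outputs, db):
    """Run command and record the job state in db (a plain dict here)."""
    # Check all inputs before doing any work.
    missing = [path for path in inputs if not os.path.exists(path)]
    if missing:
        db[job_id] = "failed"
        return False

    # Update the DB, then run the command block.
    db[job_id] = "running"
    result = subprocess.run(command)

    if result.returncode == 0:
        # Success: update the DB.
        db[job_id] = "done"
        return True

    # Failure: remove partial output, then update the DB.
    for path in outputs:
        if os.path.exists(path):
            os.remove(path)
    db[job_id] = "failed"
    return False
```

Removing the output on failure means a later run never mistakes a truncated file for a finished result.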

GEM TO BED

    #!/usr/bin/env jip
    # Delegates to gem-2-bed to create BED graphs from .map files
    #
    # Usage:
    #     gem2bed -i <input> -I <index>
    #
    # Inputs:
    #     -i, --input <input>  The .map input file (can be compressed)
    #     -I, --index <index>  The .gem index

    #%begin init
    add_output('graph', '${input|name|re("\.map(.gz)?", ".bg")}')
    add_output('sizes', '${input|name|re("\.map(.gz)?", ".sizes")}')
    #%end

    zcat -f ${input} | \
        ${__file__|parent}/gem-2-bed blocks-coverage -I ${index} \
        -o ${graph|ext} -T $JIP_THREADS

The script has three parts: documentation, initialisation, and execution.

BED2BIGWIG

    #!/usr/bin/env jip
    # Delegates to bedGraphToBigWig to convert BED graphs to bigWig files
    #
    # Usage:
    #     bed2wig -g <graph> -s <sizes> [-o <output>]
    #
    # Inputs:
    #     -g, --graph <graph>    The graph file generated with gem-2-bed
    #     -s, --sizes <sizes>    The sizes file generated with gem-2-bed
    #
    # Outputs:
    #     -o, --output <output>  The output file name
    #                            [default: ${graph|ext}.bw]

    #%begin init
    add_output('output', '${graph|name|ext}.bw')
    #%end

    #%begin setup
    profile.threads = 1
    #%end

    ${__file__|parent}/bedGraphToBigWig ${graph} ${sizes} ${output}

PIPELINES

• Inputs, Outputs, Options

• Phases

• init, setup, validate

• create pipeline

GEM2BIGWIG

    #!/usr/bin/env jip
    # Creates a bed graph from a .map file and converts it to wig
    #
    # Usage:
    #     gem2wig -i <input> -I <index>
    #
    # Inputs:
    #     -i, --input <input>  The .map input file (can be compressed)
    #     -I, --index <index>  The .gem index

    #%begin pipeline
    bed = job(temp=True).run('gem2bed', input=input, index=index)
    run('bed2wig', graph=bed.graph, sizes=bed.sizes)
    #%end

The script has two parts: documentation and pipeline.

    #%begin pipeline
    bed = job(temp=True).run('gem2bed', input=input, index=index)
    #%end

    #%begin pipeline
    bed = job(temp=True).run('gem2bed', input=input, index=index)
    run('bed2wig', graph=bed.graph, sizes=bed.sizes)
    #%end
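Passing bed.graph and bed.sizes into bed2wig is what links the two jobs: the second job cannot start before the first has produced its outputs. The sketch below shows one way such links can be turned into an execution order. It is not how JIP resolves dependencies internally; `Job` and `execution_order` are hypothetical, and the example passes whole jobs rather than individual outputs to keep it short.

```python
from collections import deque


class Job:
    """Hypothetical job node; any Job passed as an option is a dependency."""

    def __init__(self, name, **options):
        self.name = name
        upstream = [v for v in options.values() if isinstance(v, Job)]
        # Drop duplicates while keeping the order of first appearance.
        self.dependencies = list(dict.fromkeys(upstream))


def execution_order(jobs):
    """Return job names so that every job follows its dependencies.

    Assumes every dependency is itself contained in jobs.
    """
    pending = {job: len(job.dependencies) for job in jobs}
    children = {job: [] for job in jobs}
    for job in jobs:
        for parent in job.dependencies:
            children[parent].append(job)

    ready = deque(job for job in jobs if pending[job] == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job.name)
        for child in children[job]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(jobs):
        raise ValueError("dependency cycle detected")
    return order


bed = Job("gem2bed", input="reads.map.gz", index="genome.gem")
wig = Job("bed2wig", graph=bed, sizes=bed)
print(execution_order([wig, bed]))
```

The order in which the jobs are listed does not matter; only the connections between them do.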

DEMO

MULTIPLEXING AND STREAMS

BASH

    # wc -w reports on stderr so that its output does not feed into wc -l
    echo "Hello World" | tee producer_out.txt | tee >(wc -w >&2) | wc -l

JIP

    bash('echo "Hello World"', output='producer_out.txt') \
        | (bash('wc -l') + bash('wc -w'))

JIP (with named jobs)

    producer = bash('echo "Hello World"', output='producer_out.txt')
    word_count = bash("wc -w", input=producer)
    line_count = bash("wc -l", input=producer)
    producer | (word_count + line_count)
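The idea behind the examples above is that one producer stream is written to a file and, at the same time, copied to several consumers. A minimal pure-Python sketch of that pattern follows; it does not use JIP, and `multiplex`, `WordCounter`, and `LineCounter` are hypothetical names chosen for this example.

```python
class WordCounter:
    """Counts whitespace-separated words, like wc -w.

    Assumes every chunk is a complete line, so no word spans two chunks.
    """

    def __init__(self):
        self.count = 0

    def feed(self, chunk):
        self.count += len(chunk.split())

    def result(self):
        return self.count


class LineCounter:
    """Counts newline characters, like wc -l."""

    def __init__(self):
        self.count = 0

    def feed(self, chunk):
        self.count += chunk.count("\n")

    def result(self):
        return self.count


def multiplex(chunks, output_path, consumers):
    """Write each chunk to output_path and pass it to every consumer."""
    with open(output_path, "w") as out:
        for chunk in chunks:
            out.write(chunk)
            for consumer in consumers:
                consumer.feed(chunk)
    return [consumer.result() for consumer in consumers]
```

The producer is read only once, which is the point of multiplexing: the file and both counters see the same data without the command running three times.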

Common Questions

SUBMIT SINGLE COMMANDS

• The jip bash command wraps single executions

• You can run or submit

• Dry runs and multiplexing are supported

DEMO

SUBMIT FOR MULTIPLE FILES

• Fan-Out operations work for all tools

• Define a single input option

• Specify multiple values

• Works also for the jip bash command
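A fan-out can be thought of as expanding one command template once per input value. The sketch below shows only that expansion; it is not JIP code, and `fan_out` is a hypothetical helper built on Python's standard string.Template.

```python
from string import Template


def fan_out(command, option, values):
    """Return one rendered command per value of a single-input option."""
    template = Template(command)
    return [template.substitute({option: value}) for value in values]


for rendered in fan_out("wc -l ${input}", "input", ["a.txt", "b.txt"]):
    print(rendered)
```

Each rendered command would then become its own job, so the jobs can run independently of one another.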

DEMO

WHAT WAS THE COMMAND

• jip show shows job properties and the command

• jip edit loads the job command in an editor

DEMO

RESTARTING AND MOVING

• jip restart resubmits jobs after failure

• jip restart can also move jobs and pipelines to other queues/partitions

DEMO

CUSTOMISE LOG FILES

• The job profile covers stdout and stderr log files

• jip logs finds and shows log files for jobs

DEMO

QUESTIONS?

Thank You
