bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus
DESCRIPTION
Reproducibility is a fundamental goal of good experimental science. Despite the increasing availability and deployment of analytic frameworks such as Galaxy, readily reproducible bioinformatic analysis remains difficult to achieve. Mature, complex workflows often require small tweaks to accommodate the idiosyncrasies of new datasets, but integrating the required new capabilities into the framework is prohibitively complex and expensive. As a result, when problems are encountered in an existing pipeline, data may be temporarily diverted for manual processing outside the framework. These manual steps typically involve relatively trivial, transient, undocumented and poorly curated programs or scripts. This "dark script matter" rarely reaches the local version control or archiving systems where production code is maintained, which threatens the goal of reproducible analysis. The Galaxy Tool Factory is a Galaxy tool that allows scripts (R, Perl, Python, Bash...) to be run directly and repeatably through the normal Galaxy interface. The Tool Factory optionally generates all the boilerplate code needed for a new Galaxy tool that permanently wraps the script for reuse. Newly generated tools can be uploaded to a local or remote Galaxy Tool Shed. Tools can be installed in a running Galaxy server from any Tool Shed through the administrative interface for subsequent use in workflows and analyses. The conversion of a trivial script into a working, shareable Galaxy tool will be demonstrated during the presentation.
TRANSCRIPT
![Page 1: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/1.jpg)
Bioinformatic Alchemy 101
Transmuting dark script matter into reusable tools
Ross Lazarus
BakerIDI
![Page 2: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/2.jpg)
Context: bioinformatic analyses
Big data; complex analyses
Repeatable, automated pipelines
Reproducibility real goal
Reproducibility is hard
![Page 3: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/3.jpg)
Frameworks
Eg VGL
Local SOPs for biologists
Tools, canned workflows
Minimise opportunities for error
Maximise reproducibility
![Page 4: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/4.jpg)
In real life
90/10 rule
Need to tweak SOPs
Trivial 'disposable' scripts
Not documented or curated
Not reliably available to re-run
“Dark script matter”
![Page 5: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/5.jpg)
Dark Script Matter
Outside usual VCS/pipelines
Manual =/= reproducible
Necessary evil?
Platform extensions complex
Eg Galaxy – hours of work
![Page 6: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/6.jpg)
Plan
Context: Reproducible analyses
Frameworks vs Dark Scripts
Alchemy: script to Galaxy tool
Demonstration
Summary
Conclusions
![Page 7: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/7.jpg)
Galaxy Tool Factory
An installable Galaxy tool
Runs scripts: Python, R, Perl, sh
Generates new Galaxy tools
Tool code wraps the script
Minutes – not hours
![Page 8: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/8.jpg)
Galaxy Tool Shed
Separate server
Stores/serves Galaxy tools
Admin can install to Galaxy
Mercurial VCS archives
Explicit tool versioning
Sharing and reproducibility
![Page 9: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/9.jpg)
Demo 1: Install the Tool Factory
![Page 10: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/10.jpg)
Demo 2: Create a new tool
![Page 11: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/11.jpg)
Prepare script
Python; R; Perl; Sh
Parse command-line params: 1 = input, 2 = output
Typically workflow transformations
Arbitrary complexity
Simple example
Write transpose of a tabular file
![Page 12: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/12.jpg)
Prepare/upload test data
SMALL sample input
Becomes functional test case
h1 h2 h3 h4
r11 r12 r13 r14
r21 r22 r23 r24
r31 r32 r33 r34
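As a rough sketch of preparing such test data, the snippet below writes the sample table to a tab-separated file together with the output a transpose tool would be expected to produce. The tab separator, the file names and the helper names are assumptions made for illustration; they do not come from the slides.

```python
import csv
import os
import tempfile

# Sample table from the slide (tab separation is an assumption).
SAMPLE_ROWS = [
    ["h1", "h2", "h3", "h4"],
    ["r11", "r12", "r13", "r14"],
    ["r21", "r22", "r23", "r24"],
    ["r31", "r32", "r33", "r34"],
]

# Expected result of a transpose: each input column becomes an output row.
EXPECTED_ROWS = [list(column) for column in zip(*SAMPLE_ROWS)]


def write_tabular(path, rows):
    """Write rows to path as a tab-separated file."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


def read_tabular(path):
    """Read a tab-separated file back into a list of rows."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


work_dir = tempfile.mkdtemp()
sample_path = os.path.join(work_dir, "sample_input.tsv")       # hypothetical name
expected_path = os.path.join(work_dir, "expected_output.tsv")  # hypothetical name
write_tabular(sample_path, SAMPLE_ROWS)
write_tabular(expected_path, EXPECTED_ROWS)

print(read_tabular(expected_path)[0])  # ['h1', 'r11', 'r21', 'r31']
```

Keeping the sample this small makes the expected output easy to check by eye before it is relied on as a test case.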
![Page 13: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/13.jpg)
# R: transpose a tabular input file and write the result as
# a tabular output file
ourargs = commandArgs(TRUE)
inf = ourargs[1]   # command-line parameter 1: input path
outf = ourargs[2]  # command-line parameter 2: output path
inp = read.table(inf, header=FALSE, row.names=NULL, sep='\t')
outp = t(inp)
write.table(outp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
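Since the Tool Factory also accepts Python scripts, the same transformation might look roughly like the sketch below. It follows the same convention (parameter 1 is the input path, parameter 2 the output path); the function name `transpose_file` and the script name in the usage message are illustrative only, and edge cases such as ragged rows are not handled the way R handles them.

```python
import sys


def transpose_file(in_path, out_path):
    """Read a tab-separated file and write its transpose, tab-separated."""
    with open(in_path) as handle:
        # Skip completely empty lines; split every other line on tabs.
        rows = [line.rstrip("\r\n").split("\t") for line in handle
                if line.rstrip("\r\n")]
    with open(out_path, "w") as handle:
        # zip(*rows) yields the columns of the input; rows of unequal length
        # would be truncated to the shortest one.
        for column in zip(*rows):
            handle.write("\t".join(column) + "\n")


if __name__ == "__main__":
    # Parameter 1 is the input path, parameter 2 is the output path.
    if len(sys.argv) == 3:
        transpose_file(sys.argv[1], sys.argv[2])
    else:
        print("usage: transpose.py INPUT OUTPUT", file=sys.stderr)
```

Fields are written unquoted, which mirrors `quote=FALSE` in the R version.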
![Page 14: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/14.jpg)
Demo part 1
As an admin, test run the code
Can't make a new tool until it works!
Admin-only real-time scripting in Galaxy.
Overrides ALL other security.
Generated tools run with normal security.
![Page 15: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/15.jpg)
Use Redo button; Generate
When working right
Use Redo to save retyping
Select Generate option
Provide tool ID, help text
Execute
Expect a toolfactory.gz in history
Copy link (floppy disk icon)
![Page 16: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/16.jpg)
What's in the toolshed.gz?
A gzip'd Mercurial tool repository (!)
Auto-generated tool XML file
Auto-generated tool Python wrapper
Functional test case - the sample data
Familiar Galaxy tool for all users
Executes your script over their data
Interoperably inside Galaxy
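To give a feel for what "executes your script over their data" could involve, here is a minimal, hypothetical wrapper sketch. It is not the code the Tool Factory generates. It only illustrates the idea of saving the wrapped script to a file and invoking an interpreter with the input and output paths as parameters 1 and 2. The names `run_wrapped_script` and `DEMO_SCRIPT` are invented for this example.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical wrapped script: copies its input to its output in upper case.
DEMO_SCRIPT = """\
import sys
with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
    dst.write(src.read().upper())
"""


def run_wrapped_script(interpreter, script_text, in_path, out_path):
    """Run script_text with interpreter, passing in_path and out_path as
    command-line parameters 1 and 2. Returns the process exit code."""
    # Save the script to a temporary file so the interpreter can execute it.
    with tempfile.NamedTemporaryFile("w", suffix=".script", delete=False) as handle:
        handle.write(script_text)
        script_path = handle.name
    try:
        completed = subprocess.run([interpreter, script_path, in_path, out_path])
        return completed.returncode
    finally:
        # Remove the temporary script whether or not the run succeeded.
        os.remove(script_path)
```

For a Python script, a call might look like `run_wrapped_script(sys.executable, DEMO_SCRIPT, "in.txt", "out.txt")`; for R or Perl the interpreter argument would be the path to that interpreter instead.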
![Page 17: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/17.jpg)
Upload TS gzip to new repository
Upload to any tool shed
Create new repo; sensible name!
Choose Upload files to new repo
Paste URL (floppy disk save icon)
New tool ready to install
![Page 18: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/18.jpg)
Install and Test New Tool
Back to Galaxy admin interface
Browse local tool shed
Choose new tool
Install to local Galaxy
Try it out
Run functional test
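The idea behind a functional test can be sketched as follows: run the tool on the small sample input and compare what it produces with a stored expected output. This is an illustration in that spirit, not Galaxy's own test machinery; `functional_test` and the stand-in `transpose_tool` are hypothetical names.

```python
import filecmp
import os
import tempfile


def functional_test(tool, sample_input, expected_output):
    """Run tool(sample_input, produced_path) and return True only if the
    produced file is byte-for-byte identical to expected_output."""
    produced = os.path.join(tempfile.mkdtemp(), "produced.tsv")
    tool(sample_input, produced)
    return filecmp.cmp(produced, expected_output, shallow=False)


def transpose_tool(in_path, out_path):
    """Stand-in tool for the example: transpose a tab-separated file."""
    with open(in_path) as handle:
        rows = [line.rstrip("\n").split("\t") for line in handle]
    with open(out_path, "w") as handle:
        for column in zip(*rows):
            handle.write("\t".join(column) + "\n")
```

A byte-for-byte comparison is the strictest choice; tools whose output contains timestamps or other run-dependent text would need a more tolerant comparison.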
![Page 19: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/19.jpg)
Summary
GTF = script to tool in minutes
Integrated with Galaxy and TS
Simple workflow components
If needed, generate simple tool
Then add parameters manually
![Page 20: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/20.jpg)
Tool Factory Operation Guide
Inputs: script (Python, R, Perl, sh); sample input for functional test (upload/paste)
Galaxy Tool Factory: open tool form; paste script; test run; check outputs; rerun/fix; generate TS gzip; copy download link for pasting
Tool Shed: create new repository; upload files (paste TS gzip link and upload)
Galaxy admin page: install new tool from Tool Shed; test; functional test
![Page 21: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/21.jpg)
GALAXY
http://usegalaxy.org
![Page 22: Bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus](https://reader034.vdocuments.net/reader034/viewer/2022052618/554e8a39b4c90573338b49be/html5/thumbnails/22.jpg)
Generate a new Galaxy tool
Galaxy Tool Factory
From a Python, R, Perl or Bash script
# transpose a tabular input file and write as a tabular output file
ourargs = commandArgs(TRUE)
inf = ourargs[1]
outf = ourargs[2]
inp = read.table(inf, header=FALSE, row.names=NULL, sep='\t')
outp = t(inp)
write.table(outp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
Using a Galaxy tool
Via a Tool Shed