Post on 21-Dec-2015
Introduction to Makeflow
Li Yu
University of Notre Dame

Overview
Distributed systems are hard to use!
An abstraction is a regular structure that can be efficiently scaled up to large problem sizes.
Today – Makeflow and Work Queue:
◦ Makeflow is a workflow engine for executing large, complex workflows on clusters, grids, and clouds.
◦ Work Queue is a master/worker framework.
◦ Together they are compact, portable, data-oriented, good at lots of small jobs, and use a familiar syntax.

General Workflow
[Figure: a workflow graph in which function nodes (F1 to F5) consume and produce data nodes (D1, D2, …, D18), ending in the Final Output.]

Makeflow
Makeflow is a workflow engine for executing large, complex workflows on clusters, grids, and clouds.
It can express any arbitrary Directed Acyclic Graph (DAG).
It is good at lots of small jobs.
Data is treated as a first-class citizen.
It has a syntax similar to traditional UNIX Make.
It is fault-tolerant.

Application – Data Mining
Betweenness Centrality
◦ Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.
◦ Application: social network analysis.
◦ Complexity: O(n^3), where n is the number of vertices.
[Figure: example graph with the vertex of highest betweenness labeled.]
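To make the definition concrete, here is a minimal Python sketch of betweenness for an unweighted graph. It uses Brandes' algorithm, which is an assumption on our part: the slides do not say which algorithm the real jobs run, and Brandes' method is cheaper than the O(n^3) bound quoted above. The input format (a dict mapping each vertex to its neighbor list, with every vertex present as a key) is also assumed for illustration.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted graph given as {vertex: [neighbors]}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        frontier = deque([s])
        while frontier:
            v = frontier.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    frontier.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Path graph a - b - c - d (undirected, so each edge is listed both ways).
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = betweenness(graph)
print(scores)
```

Each pass of the outer loop depends only on its own source vertex, so the sources can be divided among independent jobs whose partial scores are summed afterwards. Ordered pairs are counted here, so an undirected graph yields scores twice as large as the unordered convention.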

The Workflow
[Figure: the input table (Vertex, Neighbors) feeds N parallel "algr" jobs, which produce Output1, Output2, …, OutputN (each a Vertex, Credits table); an "Add" step combines them into the Final Output.]
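A workflow of this shape has too many rules to write by hand, so the Makeflow script is usually generated. The Python sketch below shows one possible generator for a fan-out/fan-in workflow. Every name in it (algr, add, graph.txt, output1, final_output, and the command-line arguments) is a hypothetical placeholder, since the slides do not show the real script.

```python
def fan_out_rules(num_jobs, program="algr", graph="graph.txt", adder="add"):
    """Return Makeflow text: one rule per partial output plus one combining rule.

    All program and file names are hypothetical placeholders.
    """
    outputs = [f"output{i}" for i in range(1, num_jobs + 1)]
    lines = []
    for i, out in enumerate(outputs, start=1):
        # Target: the partial output. Sources: the input graph and the program.
        lines.append(f"{out}: {graph} {program}")
        lines.append(f"\t./{program} {graph} {i} > {out}")
        lines.append("")
    joined = " ".join(outputs)
    # The final rule depends on every partial output.
    lines.append(f"final_output: {joined} {adder}")
    lines.append(f"\t./{adder} {joined} > final_output")
    return "\n".join(lines) + "\n"

text = fan_out_rules(3)
print(text)
```

With tens of thousands of outputs, a single combining command line would likely exceed shell limits, so a real generator would probably combine in several stages.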

Size of the Problem
About 5.5 million vertices
About 20 million edges
Each job computes 50 vertices (110K jobs)

Input Data Format (Raw: 250MB, Gzipped: 93MB)
Vertex      Neighbors
V1          V2, V5…
V2          V10, V13
……          ……
V5500000    V1000, …

Output Data Format (Raw: 30MB, Gzipped: 13MB)
Vertex      Credits
V1          23
V2          2355
…           …
V5.5M       46923
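The job count follows directly from the numbers above: 5.5 million vertices at 50 vertices per job gives 110,000 jobs. A short Python sketch of the partitioning, assuming vertices are simply numbered 1 to 5,500,000 and assigned to jobs in contiguous ranges (the slides do not say how the real partition was made):

```python
def chunk_ranges(num_vertices, per_job):
    """Yield (first, last) 1-based vertex index ranges, one range per job."""
    for first in range(1, num_vertices + 1, per_job):
        yield first, min(first + per_job - 1, num_vertices)

jobs = list(chunk_ranges(5_500_000, 50))
print(len(jobs))          # 110000
print(jobs[0], jobs[-1])  # (1, 50) (5499951, 5500000)
```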

The Result
Resources used: 300 Condor CPU cores, 250 SGE CPU cores
Runtime: 2000 CPU days -> 4 days
500X speedup!
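The speedup figure is the ratio of the two runtimes. The Python sketch below also derives a parallel efficiency, under the assumption (not stated on the slide) that all 550 cores were available for the whole run:

```python
cpu_days = 2000          # total sequential work reported on the slide
wall_days = 4            # elapsed time of the distributed run
cores = 300 + 250        # Condor cores plus SGE cores

speedup = cpu_days / wall_days
efficiency = speedup / cores   # assumes every core was in use throughout

print(f"speedup: {speedup:.0f}x on {cores} cores, efficiency {efficiency:.0%}")
# speedup: 500x on 550 cores, efficiency 91%
```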

Application - Biocompute
Sequence Search and Alignment by Hashing Algorithm (SSAHA)
Short Read Mapping Package (SHRiMP)
Genome Alignment:
CGGAAATAATTATTAAGCAA
   |||||||||
GTCAAATAATTACTGGATCG
Single nucleotide polymorphism (SNP) discovery

The Workflow
[Figure: a Split step divides the Query into chunks of reads; each chunk is aligned against the Reference by its own Align job, producing Matches1, Matches2, …, MatchesN; a Combine step merges them into All Matches.]
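The Split step in this workflow can be sketched in a few lines of Python. The sketch assumes the query is in FASTA-style text, where each read starts with a ">" header line; the slides do not state the file format, and the function name and chunk size are illustrative only.

```python
def split_reads(lines, reads_per_chunk):
    """Group FASTA-style records (a '>' header plus sequence lines) into chunks."""
    chunks, current, count = [], [], 0
    for line in lines:
        if line.startswith(">"):
            # Start a new chunk once the current one holds enough reads.
            if count == reads_per_chunk:
                chunks.append(current)
                current, count = [], 0
            count += 1
        current.append(line)
    if current:
        chunks.append(current)
    return chunks

query = [">r1", "CGGAAATAAT", ">r2", "TATTAAGCAA", ">r3", "GTCAAATAAT"]
chunks = split_reads(query, 2)
print(len(chunks))
```

Each chunk would then be written to its own file and become the source of one Align rule, with the Reference listed as a second source.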

Sizes of some real workloads
Anopheles gambiae: 273 million bases
◦ 2.5 million reads consisting of 1.5 billion bases were aligned using SSAHA
Sorghum bicolor: 738.5 million bases
◦ 11.5 million sequences consisting of 11 billion bases were aligned using SSAHA
7 million query reads of Oryza rufipogon were aligned to the genome of Oryza sativa using SHRiMP

Performance

Makeflow Example
part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 >out1

out2: part2 mysim.exe
	./mysim.exe part2 >out2

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result
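The rules above define a DAG: a rule can run as soon as all of its sources exist. The Python sketch below groups the example's rules into rounds of jobs that could run at the same time. It is a simplified model for illustration, not Makeflow's actual scheduler.

```python
RULES = [
    (["part1", "part2", "part3"], ["input.data", "split.py"], "./split.py input.data"),
    (["out1"], ["part1", "mysim.exe"], "./mysim.exe part1 >out1"),
    (["out2"], ["part2", "mysim.exe"], "./mysim.exe part2 >out2"),
    (["out3"], ["part3", "mysim.exe"], "./mysim.exe part3 >out3"),
    (["result"], ["out1", "out2", "out3", "join.py"], "./join.py out1 out2 out3 > result"),
]

def waves(rules, existing):
    """Group rules into rounds; every rule in a round has all its sources ready."""
    available = set(existing)
    pending = list(rules)
    rounds = []
    while pending:
        ready = [r for r in pending if all(s in available for s in r[1])]
        if not ready:
            raise ValueError("some sources can never be produced")
        rounds.append([command for _, _, command in ready])
        for targets, _, _ in ready:
            available.update(targets)
        pending = [r for r in pending if r not in ready]
    return rounds

schedule = waves(RULES, ["input.data", "split.py", "mysim.exe", "join.py"])
for number, commands in enumerate(schedule, start=1):
    print(number, commands)
```

The three mysim.exe jobs land in the same round, which is the parallelism a batch system can exploit.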

Makeflow Syntax

A Makeflow script consists of a set of rules. Each rule specifies:
◦ a set of target files to create;
◦ a set of source files needed to create them;
◦ a command that generates the target files from the source files.

out1: part1 mysim.exe
	./mysim.exe part1 >out1

Here out1 is the target file, part1 and mysim.exe are the source files, and "./mysim.exe part1 >out1" is the command.
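The three parts of a rule can be pulled apart mechanically. This Python sketch handles only the simplest case, a two-line rule with one header and one command; it ignores variables, comments, and the LOCAL keyword, and it is not how Makeflow itself parses scripts.

```python
def parse_rule(text):
    """Split a two-line rule into (targets, sources, command)."""
    header, command = text.strip("\n").split("\n", 1)
    targets, sources = header.split(":", 1)
    return targets.split(), sources.split(), command.strip()

rule = "out1: part1 mysim.exe\n\t./mysim.exe part1 >out1\n"
targets, sources, command = parse_rule(rule)
print(targets, sources, command)
```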

No Phony Rules

A correct rule:
out1: part1 mysim.exe
	./mysim.exe part1 >out1
An incorrect rule (the files the command reads are not listed as sources):
out1:
	./mysim.exe part1 >out1
Another incorrect rule (no file named clean is ever created):
clean:
	rm -rf *.o
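One way to see why a rule like clean: is rejected is to check, after a command has run, whether every declared target exists as a file. The Python sketch below does that check in a temporary directory. It illustrates the idea only and is an assumption about the reasoning, not a description of Makeflow's own consistency checks.

```python
import os
import tempfile

def missing_targets(targets, directory):
    """Return the declared targets that do not exist as files in `directory`."""
    return [t for t in targets if not os.path.exists(os.path.join(directory, t))]

with tempfile.TemporaryDirectory() as workdir:
    # Pretend "./mysim.exe part1 >out1" ran and wrote its output file.
    open(os.path.join(workdir, "out1"), "w").close()
    ok = missing_targets(["out1"], workdir)       # a real file was produced
    phony = missing_targets(["clean"], workdir)   # "rm -rf *.o" creates no file

print(ok, phony)
```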

part1 part2 part3: input.data split.py
	./split.py input.data

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result

A Real Example – Image Processing
Internet
1. Download
2. Convert
3. Combine into Movie

Image Processing - Makeflow Script
# This is an example of Makeflow.
CURL=/usr/bin/curl
CONVERT=/afs/nd.edu/user37/ccl/software/external/imagemagick/bin/convert
URL=http://www.cse.nd.edu/~ccl/images/a.jpg

a.montage.gif: a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg
	LOCAL $CONVERT -delay 10 -loop 0 a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg a.270.jpg a.180.jpg a.90.jpg a.montage.gif

a.90.jpg: a.jpg
	$CONVERT -swirl 90 a.jpg a.90.jpg

a.180.jpg: a.jpg
	$CONVERT -swirl 180 a.jpg a.180.jpg

a.270.jpg: a.jpg
	$CONVERT -swirl 270 a.jpg a.270.jpg

a.360.jpg: a.jpg
	$CONVERT -swirl 360 a.jpg a.360.jpg

a.jpg:
	LOCAL $CURL -o a.jpg $URL

Comments start with ‘#’.

Image Processing - Makeflow Script
# This is an example of Makeflow.
CURL=/usr/bin/curl
CONVERT=/afs/nd.edu/user37/ccl/software/external/imagemagick/bin/convert
URL=http://www.cse.nd.edu/~ccl/images/capitol.jpg

a.montage.gif: a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg
	LOCAL $CONVERT -delay 10 -loop 0 a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg a.270.jpg a.180.jpg a.90.jpg a.montage.gif

a.90.jpg: a.jpg
	$CONVERT -swirl 90 a.jpg a.90.jpg

a.180.jpg: a.jpg
	$CONVERT -swirl 180 a.jpg a.180.jpg

a.270.jpg: a.jpg
	$CONVERT -swirl 270 a.jpg a.270.jpg

a.360.jpg: a.jpg
	$CONVERT -swirl 360 a.jpg a.360.jpg

a.jpg:
	LOCAL $CURL -o a.jpg $URL

$CONVERT stands for: /afs/nd.edu/user37/ccl/software/external/imagemagick/bin/convert
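Variable references such as $CONVERT and $URL are replaced by the values assigned at the top of the script. A rough Python model of that substitution, using a regular expression, is shown below. It is only an illustration of the idea: Makeflow's real expansion rules may differ, and leaving unknown names untouched is a choice made for this sketch.

```python
import re

def expand(command, variables):
    """Replace $NAME with its value; unknown names are left untouched."""
    return re.sub(r"\$(\w+)", lambda m: variables.get(m.group(1), m.group(0)), command)

variables = {
    "CURL": "/usr/bin/curl",
    "URL": "http://www.cse.nd.edu/~ccl/images/a.jpg",
}
expanded = expand("$CURL -o a.jpg $URL", variables)
print(expanded)
# /usr/bin/curl -o a.jpg http://www.cse.nd.edu/~ccl/images/a.jpg
```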

Image Processing - Makeflow Script
# This is an example of Makeflow.
CURL=/usr/bin/curl
CONVERT=/afs/nd.edu/user37/ccl/software/external/imagemagick/bin/convert
URL=http://www.cse.nd.edu/~ccl/images/capitol.jpg

a.montage.gif: a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg
	LOCAL $CONVERT -delay 10 -loop 0 a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg a.270.jpg a.180.jpg a.90.jpg a.montage.gif

a.90.jpg: a.jpg
	$CONVERT -swirl 90 a.jpg a.90.jpg

a.180.jpg: a.jpg
	$CONVERT -swirl 180 a.jpg a.180.jpg

a.270.jpg: a.jpg
	$CONVERT -swirl 270 a.jpg a.270.jpg

a.360.jpg: a.jpg
	$CONVERT -swirl 360 a.jpg a.360.jpg

a.jpg:
	LOCAL $CURL -o a.jpg $URL

LOCAL forces this job to run on the controlling machine.

Get the example.makeflow script
% mkdir /tmp/makeflow
% cd /tmp/makeflow
% cp ~lyu2/Public/example.makeflow .
% cat example.makeflow
# This is an example of Makeflow.
CURL=/usr/bin/curl
CONVERT=/usr/bin/convert
URL=http://www.cse.nd.edu/~ccl/images/a.jpg

a.montage.gif: a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg
	LOCAL $CONVERT -delay 10 -loop 0 a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg a.270.jpg a.180.jpg a.90.jpg a.montage.gif
…………

Set up the cctools environment (in csh)

Set the PATH to use cctools:
% setenv PATH ~ccl/software/cctools/bin:$PATH
If the PATH is set correctly:
% makeflow -h
Use: makeflow [options] <dagfile>
Where options are:
 -c Clean up: remove logfile and all targets.
…………
If the PATH is NOT set correctly:
% makeflow -h
makeflow: Command not found.

Run the Makeflow Script
Just use the local machine:
% makeflow example.makeflow
Output:
makeflow: checking for duplicate targets...
makeflow: DAG created.
makeflow: checking rules for consistency...
makeflow: Width of DAG: 4
………………………………
makeflow: nothing left to do.
Now we can check whether the target file, a.montage.gif, was successfully created:
% display a.montage.gif

Re-run a Makeflow Script

If you run it a second time, nothing happens, because all of the target files have already been created:
% makeflow example.makeflow
makeflow: nothing left to do
Use the -c option to clean everything up before trying it again:
% makeflow -c example.makeflow

Run the Makeflow Script with a Distributed System
Use a distributed system with the ‘-T’ option:
◦ ‘-T condor’: uses the Condor batch system
% makeflow -T condor example.makeflow
◦ Take advantage of the Condor MatchMaker
BATCH_OPTIONS=Requirements=(Memory>1024)\n Arch= x86_64
◦ ‘-T sge’: uses the Sun Grid Engine
% makeflow -T sge example.makeflow
◦ ‘-T wq’: uses the Work Queue framework
% makeflow -T wq example.makeflow

Makeflow with Work Queue
Start workers on local machines, clusters, via campus grid, etc.
[Figure: Makeflow reads the DAG and drives many workers. For each task it sends "put App" and "put Input" to a worker, then "work “App < Input > Output”", then "get Output"; the worker execs App on Input to produce Output.]
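The put/work/get exchange in the figure is the classic master/worker pattern. The Python sketch below imitates it inside a single process with threads and queues. It is not the Work Queue API: real workers are separate processes on other machines, and the names worker and run_master are made up for this illustration.

```python
import queue
import threading

def worker(tasks, results):
    """Pull (name, func, data) tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            break
        name, func, data = task
        results.put((name, func(data)))   # "work", then hand the output back

def run_master(task_list, num_workers=3):
    """Dispatch tasks to worker threads and collect their outputs by name."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for task in task_list:        # plays the role of "put App" and "put Input"
        tasks.put(task)
    for _ in threads:             # one sentinel per worker so each one stops
        tasks.put(None)
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():    # plays the role of "get Output"
        name, output = results.get()
        collected[name] = output
    return collected

outputs = run_master([(f"out{i}", str.upper, f"part{i}") for i in range(1, 4)])
print(outputs)
```

Because workers pull tasks whenever they are free, faster workers naturally take on more tasks, which suits workloads made of many small jobs.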
Ways of starting workers
Start one worker on your local machine:
work_queue_worker hostname port
Start some Condor workers:
condor_submit_workers hostname port
Start some SGE workers:
sge_submit_workers hostname port

Make Your Own Cloud
[Figure: "makeflow -T wq example.makeflow" driving workers on Condor, SGE, and a cloud. Capacity labels in the figure: 1100 cores; unlimited; 4000 cores (but you can only have 250).]

Set up the Condor environment

Set the PATH to use condor:
% setenv PATH ~condor/software/bin:$PATH
If the PATH is set correctly:
% condor_q
-- Submitter: cclsubmit00.cse.nd.edu : <129.74.152.171:9087> : cclsubmit00.cse.nd.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
If the PATH is NOT set correctly:
% condor_q
condor_q: Command not found.

Re-run the makeflow with Work Queue
Go to the experiment directory and clean things up:
% cd /tmp/makeflow
% makeflow -c example.makeflow
Run the example with Work Queue:
% condor_submit_workers `hostname` 9123 5
% makeflow -T wq example.makeflow

Google “Makeflow”

Contact us

Li Yu: [email protected]
Peter Bui: [email protected]
Prof. Douglas Thain: [email protected]
Cooperative Computing Lab: http://www.cse.nd.edu/~ccl