Condor Project
Computer Sciences Department
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor
Case Studies of Using Condor for Scientists
Barcelona, 2006
Agenda
• Extended user's tutorial
• Advanced Uses of Condor
  - Java programs
  - DAGMan
  - Stork
  - MW
  - Grid Computing
• Case studies, and a discussion of your application's needs
BLAST
Background
• Each species has a genetic encoding within its cells
• Humans are made of approximately 10^14 cells
• The nucleus of each human cell contains 46 chromosomes
• Each chromosome contains between 231 and 2958 genes
• Each chromosome is made of somewhere between 25 million and 237 million (approximately) base pairs
Base Pairs (Simplified)
• Each base pair is one of 4 nucleotides
• Each nucleotide is represented by one letter:
A C G T
The Science Issue
Scientists ask many questions and pose computationally difficult issues:
• map a species' genome - build a huge database of information
• understand evolution at a genetic level - answer homology and related questions
• identify mutations and genes - to develop diagnoses and medical treatments
BLAST
• Basic Local Alignment Search Tool
• A really good pattern matching program
• An answer to the science questions often requires queries such as:
  - Does the following nucleotide sequence (~1000 pairs), or something close, appear in the database (several billions of pairs)?
  - To what certainty is there a match?
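To make the "or something close" query concrete, here is a minimal sketch of local alignment scoring. It uses the textbook Smith-Waterman dynamic program, not BLAST's own (much faster) heuristics, and the function name and the scoring values for match, mismatch, and gap are illustrative assumptions rather than BLAST defaults.

```python
def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two nucleotide strings.

    Textbook Smith-Waterman; the scoring values are assumptions
    chosen for illustration, not BLAST's parameters.
    """
    prev = [0] * (len(subject) + 1)   # previous row of the score table
    best = 0
    for q in query:
        curr = [0]                    # column 0 is always 0
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)  # 0 restarts the alignment
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # A short query found exactly inside a longer subject string.
    print(local_alignment_score("ACG", "TTACGTT"))
```

A higher score means a closer match. Turning a score into a statement about certainty needs a statistical model, which this sketch does not attempt.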
The Biological Magnetic Resonance Data Bank
Department of Biochemistry at University of Wisconsin-Madison
Part of the Center for Eukaryotic Structural Genomics (CESG)
Working on three dimensional protein structure
The BMRB and BLAST
The BMRB (with the help of the Condor Team) has a weekly set of automated BLAST runs
These BLAST runs compare progress on the BMRB set of working proteins to the Protein Data Bank
Serial versus Parallel
• Too slow: The BMRB working set could be input as a single BLAST program execution
  - Load the Protein Data Bank database
  - Serially query the database with each protein in the working set
• Faster: Divide the working set into pieces that allow parallel executions of BLAST
Weekly BMRB Runs
1. Obtain and install the BLAST executable and Protein Data Bank database
2. Decide on the best way to split the BMRB working set of proteins to minimize the parallel execution time
3. Make a custom DAG for this split
4. Produce a report on the BMRB run
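Step 2 can be sketched as a load-balancing problem. The sketch below is one possible heuristic (greedy, longest first), not necessarily what the BMRB runs use. It assumes that BLAST run time grows with sequence length, and the function name and protein names are hypothetical.

```python
import heapq


def split_working_set(lengths, pieces):
    """Split proteins into `pieces` groups with roughly equal total length.

    lengths: dict mapping protein name -> sequence length. Length is an
    assumed stand-in for BLAST run time. Greedy rule: take proteins
    longest first and give each to the currently lightest group.
    """
    heap = [(0, i) for i in range(pieces)]  # (total length, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(pieces)]
    by_length = sorted(lengths.items(), key=lambda kv: kv[1], reverse=True)
    for name, length in by_length:
        load, i = heapq.heappop(heap)       # lightest group so far
        groups[i].append(name)
        heapq.heappush(heap, (load + length, i))
    return groups


if __name__ == "__main__":
    # Hypothetical working set of five proteins, split for two machines.
    working_set = {"p1": 900, "p2": 500, "p3": 400, "p4": 300, "p5": 100}
    print(split_working_set(working_set, 2))
```

Each group then becomes the input of one BLAST node in the DAG.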
The Custom DAG
[Figure: DAG drawn with B nodes (B B B . . .), E nodes (E E . . .), and a node labeled C]
B is BLAST
E is Extract results
An Economics Application
• Computations are done at points on a coordinate plane
• Initial values are known along the axes
• Computation of one point at a time is too slow (serial execution)
• Each point is dependent on 2 neighboring points
  - (x,y) can be computed knowing (x-1,y) and (x,y-1)
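The dependency rule means that all points with the same value of x + y can be computed at the same time. A minimal sketch of that grouping follows; the function name is hypothetical, and the grid is assumed to run from 1 to n in both directions.

```python
def wavefronts(n):
    """Group the points (x, y), 1 <= x, y <= n, by the value of x + y.

    Every point in one group depends only on points in the previous
    group (or on the known axis values), so a whole group can run in
    parallel.
    """
    levels = []
    for total in range(2, 2 * n + 1):
        level = [(x, total - x) for x in range(1, n + 1)
                 if 1 <= total - x <= n]
        levels.append(level)
    return levels


if __name__ == "__main__":
    for level in wavefronts(3):
        print(level)
```

For a 6 x 6 grid this gives 11 parallel steps in place of 36 serial ones.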
The Coordinate Plane
[Figure, repeated over six slides: a grid with both axes labeled 1 through 6; legend: known, result, inputs ready]
The DAG
1-1
1-2 2-1
1-3 2-2 3-1
1-4 2-3 3-2 4-1
etc.
Use DAGMan
Write a program to generate the DAG input file
The submit description file (and the executable) is the same for each node in the DAG
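A minimal sketch of such a generator program follows. It reuses the node names (x-y), the submit file name gonkulate.submit, and the file naming (file1-2 and so on) shown in the DAG input file. Two things are assumptions: the function name, and the naming of the known axis values as file0-y and filex-0.

```python
def dag_lines(n, submit_file="gonkulate.submit"):
    """Yield DAGMan input lines for an n x n grid of nodes named "x-y".

    Nodes are emitted in order of x + y, so every parent is declared
    before the Parent/Child line that uses it.
    """
    for total in range(2, 2 * n + 1):
        for x in range(1, n + 1):
            y = total - x
            if not 1 <= y <= n:
                continue
            node = f"{x}-{y}"
            yield f"Job {node} {submit_file}"
            parents = []
            if x > 1:
                parents.append(f"{x - 1}-{y}")
            if y > 1:
                parents.append(f"{x}-{y - 1}")
            if parents:
                yield f"Parent {' '.join(parents)} Child {node}"
            # Assumption: axis values live in files named file0-y and filex-0.
            yield f'Vars {node} left="file{x - 1}-{y}"'
            yield f'Vars {node} below="file{x}-{y - 1}"'
            yield f'Vars {node} result="file{x}-{y}"'


if __name__ == "__main__":
    # Hypothetical output file name; submit it with condor_submit_dag.
    with open("gonkulate.dag", "w") as dag_file:
        dag_file.write("\n".join(dag_lines(6)) + "\n")
```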
DAG Input File
Job 1-1 gonkulate.submit
Job 1-2 gonkulate.submit
Parent 1-1 Child 1-2
Job 2-1 gonkulate.submit
Parent 1-1 Child 2-1
Job 1-3 gonkulate.submit
Parent 1-2 Child 1-3
Job 2-2 gonkulate.submit
Parent 1-2 2-1 Child 2-2
Vars 2-2 left="file1-2"
Vars 2-2 below="file2-1"
Vars 2-2 result="file2-2"
. . .
DAG input file, continued
Job 3-4 gonkulate.submit
Parent 2-4 3-3 Child 3-4
Vars 3-4 left="file2-4"
Vars 3-4 below="file3-3"
Vars 3-4 result="file3-4"
. . .
Submit Description File
In gonkulate.submit:
universe = vanilla
executable = gonkulate
output = $(result)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = $(left), $(below)
log = gonkulate.log
notification = Never
queue
Nug30
Description of Nug30
• nug30 (a Quadratic Assignment Problem instance of size 30) had been the "holy grail" of computational QAP research since 1968
• In 2000, Anstreicher, Brixius, Goux, & Linderoth set out to solve this problem
• Using a mathematically sophisticated and well-engineered algorithm, they still estimated that solving the problem would require 11 CPU years
Nugent’s Problem
There is a set of N locations and a set of N facilities, and each facility must be assigned a location. To measure the cost of each possible assignment, the flow between each pair of facilities is multiplied by the distance between the pair's assigned locations, and then a sum is taken over all of the pairs.
For Nug30, N = 30
QAP Definition*
The formal definition of the quadratic assignment problem is:
Given two sets, P ("facilities") and L ("locations"), of equal size, together with a weight function $w : P \times P \to \mathbb{R}$ and a distance function $d : L \times L \to \mathbb{R}$, find the bijection $f : P \to L$ (assignment) such that the cost function

$$\sum_{a, b \in P} w(a, b) \cdot d(f(a), f(b))$$

is minimized.
Usually the weight and distance functions are viewed as square real-valued matrices.
* Wikipedia
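The cost function in the definition can be evaluated directly when the weights and distances are stored as matrices. The sketch below does so and finds the best assignment by trying every permutation, which is only feasible for very small N. The function names and the 3 x 3 example data are made up for illustration.

```python
from itertools import permutations


def qap_cost(w, d, f):
    """Sum of w[a][b] * d[f[a]][f[b]] over all ordered pairs (a, b).

    w: weight (flow) matrix between facilities
    d: distance matrix between locations
    f: f[a] is the location assigned to facility a
    """
    n = len(f)
    return sum(w[a][b] * d[f[a]][f[b]] for a in range(n) for b in range(n))


def brute_force_qap(w, d):
    """Return the cheapest assignment by checking all N! permutations."""
    n = len(w)
    return min(permutations(range(n)), key=lambda f: qap_cost(w, d, f))


if __name__ == "__main__":
    # Made-up example data: 3 facilities, 3 locations.
    w = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
    d = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
    best = brute_force_qap(w, d)
    print(best, qap_cost(w, d, best))
```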
Scope of the Problem
This QAP problem is difficult due to the excessively large number of possible facility assignments.
The number of possible assignments is factorial in the number of facilities.
N! = N x (N-1) x (N-2) x . . . x 2
30! is approximately 2.6 x 10^32
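The size of 30! is easy to check, and a rough estimate shows why enumerating every assignment is out of the question. The rate of one billion assignments per second is an assumption chosen only for scale, and the function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(n, per_second=1e9):
    """Years needed to visit all n! assignments at an assumed fixed rate."""
    return math.factorial(n) / per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(math.factorial(30))               # exact value of 30!
    print(f"{years_to_enumerate(30):.1e}")  # years at the assumed rate
```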
The Simplified Approach
• Method of choice is branch and bound
• The complete tree has 30! leaf nodes
• Branching grows the tree
• Bounding results in pruning the tree
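Branching and bounding can be sketched on a tiny QAP instance. This is not the Nug30 algorithm. The bound used here is deliberately weak: the cost of the facilities placed so far, which is a valid lower bound only under the assumption that all weights and distances are non-negative. The function name and example data are made up.

```python
def qap_branch_and_bound(w, d):
    """Depth-first branch and bound for small QAP instances.

    Assumes non-negative w and d, so the cost of a partial assignment
    never exceeds the cost of any completion of it.
    Returns (best cost, best assignment).
    """
    n = len(w)
    best = {"cost": float("inf"), "assignment": None}

    def extend(assignment, used, cost):
        if cost >= best["cost"]:
            return                      # bounding: prune this subtree
        a = len(assignment)             # next facility to place
        if a == n:
            best["cost"], best["assignment"] = cost, tuple(assignment)
            return
        for loc in range(n):            # branching: try each free location
            if loc in used:
                continue
            added = w[a][a] * d[loc][loc] + sum(
                w[a][b] * d[loc][assignment[b]]
                + w[b][a] * d[assignment[b]][loc]
                for b in range(a)
            )
            assignment.append(loc)
            used.add(loc)
            extend(assignment, used, cost + added)
            assignment.pop()
            used.remove(loc)

    extend([], set(), 0)
    return best["cost"], best["assignment"]


if __name__ == "__main__":
    # Made-up example data: 3 facilities, 3 locations.
    w = [[0, 5, 2], [5, 0, 3], [2, 3, 0]]
    d = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
    print(qap_branch_and_bound(w, d))
```

A tighter bound prunes more of the tree, which is why the choice of bound matters so much at N = 30.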
The Nug30 Solution
• Used a new algorithm called quadratic programming bound, developed by Anstreicher and Brixius
• Sequential execution would have taken 7 years, so parallelization of the algorithm was important
• Used MW
Nug30 Computational Grid
Number  Arch/OS        Location
   414  Intel/Linux    Argonne
    96  SGI/Irix       Argonne
  1024  SGI/Irix       NCSA
    16  Intel/Linux    NCSA
    45  SGI/Irix       NCSA
   246  Intel/Linux    Wisconsin
   146  Intel/Solaris  Wisconsin
   133  Sun/Solaris    Wisconsin
   190  Intel/Linux    Georgia Tech
    94  Intel/Solaris  Georgia Tech
    54  Intel/Linux    Italy (INFN)
    25  Intel/Linux    New Mexico
    12  Sun/Solaris    Northwestern
     5  Intel/Linux    Columbia U.
    10  Sun/Solaris    Columbia U.
  2510  CPUs total
Used tricks to make it look like one Condor pool
• Flocking
• Glidein
Workers Over Time
Nug30 solved
Wall Clock Time      6 days 22:04:31 hours
Avg # Machines       653
CPU Time             11 years
Parallel Efficiency  93%
The Football Pool Problem
Win By Gambling
Each week, 6 games are played
The outcome of each game is
1. win
2. lose
3. tie
Bet, and win $$$
• Get 5 of the 6 games correctly predicted, and you win
• What is the minimum number of predictions you must make to guarantee winning?
Known Values
number of games  minimum predictions
3                5
4                9
5                27
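The 4-game entry can be checked by hand or by a short program. The sketch below assumes that a ticket wins when at most one game is predicted wrongly, and it encodes the three outcomes as 0, 1, 2. The nine tickets are built with a standard construction (a ternary Hamming code); the function and variable names are hypothetical.

```python
from itertools import product


def guarantees_win(tickets, games):
    """True if every possible outcome is within one wrong game of a ticket."""
    def wrong(ticket, outcome):
        return sum(t != o for t, o in zip(ticket, outcome))

    return all(
        any(wrong(ticket, outcome) <= 1 for ticket in tickets)
        for outcome in product(range(3), repeat=games)
    )


# Nine tickets for 4 games: (a, b, a+b, a+2b) with arithmetic mod 3.
tickets_4 = [(a, b, (a + b) % 3, (a + 2 * b) % 3)
             for a in range(3) for b in range(3)]

if __name__ == "__main__":
    print(len(tickets_4), guarantees_win(tickets_4, 4))
```

The check confirms that these nine tickets are enough; it does not by itself prove that eight cannot work.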
Problem Description
• A covering code
• An NP-hard problem
• Many years of research and effort for 6 games leads to
  65 < minimum number of predictions < 73
• An integer programming problem
• Best solver is the commercial application CPLEX
Why the Problem is Difficult
Number of tickets possible: 6! x 3^6
The tree that represents the problem (and solutions) has many isomorphic branches. This makes it difficult to prune the tree.
New techniques have been developed, which narrow the interval that contains the solution
The latest and greatest does many smaller problems using MW
Solution! Not yet. . .
• The first effort (many CPU years' worth of time) had a very small error in input
• Second effort is still in progress
• All this to improve the lower bound from 65 to 70, thereby reducing the range for the solution