lqcd workflow discussion
DESCRIPTION
LQCD Workflow Discussion. Fermilab, 18 Dec 2006 Mike Wilde [email protected]. Virtual Data Origins: The Grid Physics Network. Enhance scientific productivity through… Discovery, application and management of data and processes at all scales - PowerPoint PPT PresentationTRANSCRIPT
Virtual Data Origins:The Grid Physics Network
Enhance scientific productivity through…• Discovery, application and management of
data and processes at all scales• Using a worldwide data grid as a scientific
workstation
The key to this approach is Virtual Data – creating and managing datasets through workflow “recipes” and provenance recording.
Virtual Data WorkflowAbstracts Grid Details
Complex Workflows Run on Grid
Galaxy Image Montage Workflow~1200 node workflow, 7 levels
Mosaic of M42 created onthe Teragrid using PegasusCourtesy NASA and NVO= Data
Transfer= Compute Job
Neuroscience Example• Create a library of typed procedures for popular fMRI
tools, that are pre-installed across Grid sites• Process structured data collections with simple
procedures – automated iteration• Establish a data cataloging scheme with metadata
annotation and search• Provide easy-to-use workflow execution portals –
command line environments that make clusters and Grids appear like workstations
• Work in an environment that is a fully-tracked “clean room” with accurate provenance and powerful search
Example Science Driver:Functional MRI Analysis
3a.h
align_warp/1
3a.i
3a.s.h
softmean/9
3a.s.i
3a.w
reslice/2
4a.h
align_warp/3
4a.i
4a.s.h 4a.s.i
4a.w
reslice/4
5a.h
align_warp/5
5a.i
5a.s.h 5a.s.i
5a.w
reslice/6
6a.h
align_warp/7
6a.i
6a.s.h 6a.s.i
6a.w
reslice/8
ref.h ref.i
atlas.h atlas.i
slicer/10 slicer/12 slicer/14
atlas_x.jpg
atlas_x.ppm
convert/11
atlas_y.jpg
atlas_y.ppm
convert/13
atlas_z.jpg
atlas_z.ppm
convert/15
Workflow courtesy James Dobson, Dartmouth Brain Imaging Center
fMRI Dataset processing
FOREACH BOLDSEQDV reorient (# Process Blood O2 Level Dependent Sequence input = [ @{in: "$BOLDSEQ.img"},
@{in: "$BOLDSEQ.hdr"} ], output = [@{out: "$CWD/FUNCTIONAL/r$BOLDSEQ.img"} @{out: "$CWD/FUNCTIONAL/r$BOLDSEQ.hdr"}], direction = "y", );END
DV softmean ( input = [ FOREACH BOLDSEQ @{in:"$CWD/FUNCTIONAL/har$BOLDSEQ.img"} END ], mean = [ @{out:"$CWD/FUNCTIONAL/mean"} ]);
Spatial normalization of functional runreorientRun
reorientRun
reslice_warpRun
random_select
alignlinearRun
resliceRun
softmean
alignlinear
combinewarp
strictmean
gsmoothRun
binarize
reorient/01
reorient/02
reslice_warp/22
alignlinear/03 alignlinear/07alignlinear/11
reorient/05
reorient/06
reslice_warp/23
reorient/09
reorient/10
reslice_warp/24
reorient/25
reorient/51
reslice_warp/26
reorient/27
reorient/52
reslice_warp/28
reorient/29
reorient/53
reslice_warp/30
reorient/31
reorient/54
reslice_warp/32
reorient/33
reorient/55
reslice_warp/34
reorient/35
reorient/56
reslice_warp/36
reorient/37
reorient/57
reslice_warp/38
reslice/04 reslice/08reslice/12
gsmooth/41
strictmean/39
gsmooth/42gsmooth/43gsmooth/44 gsmooth/45 gsmooth/46 gsmooth/47 gsmooth/48 gsmooth/49 gsmooth/50
softmean/13
alignlinear/17
combinewarp/21
binarize/40
reorient
reorient
alignlinear
reslice
softmean
alignlinear
combine_warp
reslice_warp
strictmean
binarize
gsmooth
Dataset-level workflow Expanded (10 volume) workflow
VDL2 Workflow – Data types and “atomic” procedures
type file {} type fileNames{ file f[]; }
// Procedure to run "R" statistical package
(file t) bricRInvoke (file script, int iteration, file dataAll, file dataPerm){ app { bricRInvoke @filename(script) iteration @filename(dataAll) @filename(dataPerm); }}
// Procedure to run AFNI Clustering tool
(file t, file v) bricCluster (file clusterScript, int iteration, file randBrain, file brainFile, file specFile) { app { bricPerlCluster @filename(clusterScript) iteration @filename(randBrain) @filename(brainFile) @filename(specFile); }}
// Procedure to merge results based on statistical likelihoods
(file t) bricCentralize ( file fn[]) { app { bricCentralize @filenames(fn); } }
VDL2 Workflow – Dataset iteration procedures
// Procedure to iterate over the data collection
(fileNames randCluster, fileNames dsetReturn) brain_cluster () {
int j[]=[1:2000];
file dataAll <fixed_mapper; file="obs.imit.all">; file dataPerm <fixed_mapper; file="perm.matrix.11">; file brainFile <fixed_mapper; file="colin_lh_mesh140_std.pial.asc">; file specFile <fixed_mapper; file="colin_lh_mesh140_std.spec">; file randScript <fixed_mapper; file="script.obs.imit.tibi">; file clusterScript <fixed_mapper; file="surfclust.tibi">;
fileNames randBrain <simple_mapper; prefix="rand.brain.set">;
foreach int i in j { randBrain.f[i] = bricRInvoke(randScript,i,dataAll,dataPerm); file rBrain=randBrain.f[i]; (randCluster.f[i], dsetReturn.f[i]) = bricCluster(clusterScript,i,rBrain, brainFile,specFile); } }
VDL2 Workflow – Main Workflow Program
// Declare datasets
fileNames randBrain <simple_mapper; prefix="rand.brain.set">; fileNames randCluster <simple_mapper; prefix="Tmean.4mm.perm", suffix="_ClstTable_r4.1_a2.0.1D">; fileNames dsetReturn <simple_mapper; prefix="Tmean.4mm.perm", suffix="_Clustered_r4.1_a2.0.niml.dset">; file bricResult <fixed_mapper; file="OUT.TXT">;
// Main program – launches the whole workflow
(randCluster, dsetReturn) = brain_cluster();
bricResult= bricCentralize (randCluster.f);
fMRI Virtual Data QueriesWhich transformations can process a “subject image”?• Q: xsearchvdc -q tr_meta dataType
subject_image input• A: fMRIDC.AIR::align_warp
List anonymized subject-images for young subjects: • Q: xsearchvdc -q lfn_meta dataType subject_image privacy anonymized subjectType young• A: 3472-4_anonymized.img
Show files that were derived from patient image 3472-3:• Q: xsearchvdc -q lfn_tree 3472-3_anonymized.img• A: 3472-3_anonymized.img
3472-3_anonymized.sliced.hdr atlas.hdr atlas.img … atlas_z.jpg 3472-3_anonymized.sliced.img
US-ATLASData Challenge 2
Sep 10
Mid July
CP
U-d
ay
Event generation using Virtual Data
LIGO Inspiral Search Application
• Describe…
Inspiral workflow application is the work of Duncan Brown, Caltech,
Scott Koranda, UW Milwaukee, and the LSC Inspiral group
Virtual Data ApplicationsApplication Jobs / workflow Levels Status
ATLAS
HEP Event Simulation
500K 1 In Use
LIGO
Inspiral/Pulsar
~700 2-5 Inspiral In Use
NVO/NASA
Montage/Morphology
1000s 7 Both In Use
GADU/
BLAST Genomics
40K 1 In Use
fMRI DBIC
AIRSN Image Proc
100s 12 In Devel
QuarkNet
CosmicRay science
<10 3-6 In Use
SDSS
Coadd image proc
40K 2 In Devel
SDSS
Galaxy cluster search
500K 8 CS Research
GTOMO
Image proc
1000s 1 In Devel
SCEC
Earthquake sim
- - In Devel
Dimensions of Provenance Data
Virtual DataCatalog
Virtual datarelationships
Metadataannotations
Derivationlineage
Virtual Data Catalog Schema
dvIDhoststart
durationexitcode
stats
Invocation
nmspacename
version
Call
passes passes
executescalls
binds references
describesuses
includes
nmspacename
version
Procedure
argnametype
direction
FormalArg
argnamevalue
ActualArg
wfidfromDV
toDV
Workflow
nmspacename
Dataset
objectpred
type/valuserdate
Annotation
1
1
1
1
1
1
*
*
*
*
*
1
11
1
1
1
1 describes
Acknowledgements…many thanks to the entire OSG Collaboration and our application science partners in ATLAS, CMS, LIGO, SDSS, UChicago Human Neuroscience Laboratory, SCEC, and Argonne’s Computational Biology and Climate Science Groups of the Mathematics and Computer Science Division.
Workflow research and development:• ISI/USC: Ewa Deelman, Carl Kesselman, Gaurang Mehta, Gurmeet
Singh, Mei-Hui Su, Karan Vahi, Jens Voeckler• U of Chicago: Ben Clifford, Ian Foster, Mihael Hategan, Tibi Stef-
Praun, Gregor von Laszewsk, Mike Wilde, Yong Zhao• www.griphyn.org/vds
GriPhyN and iVDGL are supported by the National Science Foundation
Many of the research efforts involved in this work are supported by the US Department of Energy, office of Science.