Virtual Data Tools: Status Update
ATLAS Grid Software Meeting
BNL, 6 May 2002
Mike Wilde
Argonne National Laboratory
An update on work by Jens Voeckler, Yong Zhao, Gaurang Mehta, and many others.
2
The Virtual Data Model
Data suppliers publish data to the Grid
Users request raw or derived data from the Grid, without needing to know
– Where the data is located
– Whether the data is stored or computed on demand
Users and applications can easily determine
– What it will cost to obtain the data
– The quality of derived data
The Virtual Data Grid serves requests efficiently, subject to global and local policy constraints
3
CMS Pipeline in VDL
Pipeline stages: pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis
begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path
  file i template_file
  file i num_events
  stdout cmkin_param_file
end
begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var
  stdin cmkin_param_file
  stdout cmkin_log
  file o ntpl_file
end
begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file
  file i fz_file_path
  file i hbook_file_path
  file i num_trigs
  stdout cmsim_param_file
end
begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false
  condor getenv=true
  stdin cmsim_param_file
  stdout cmsim_log
  file o fz_file
  file o hbook_file
end
begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true
  pre orca_hits
  file i fz_file
  file i detinput
  file i condor_writeHits_log
  file i oo_fd_boot
  file i datasetname
  stdout writeHits_log
  file o hits_db
end
begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db
  file i oo_fd_boot
  file i carf_input_dataset_name
  file i carf_output_dataset_name
  file i carf_input_owner
  file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log
  file o digis_db
end
4
Virtual Data for Real Science: A Prototype Virtual Data Catalog
Production DAG of Simulated CMS Data:
[Figure: Simulate Physics → Simulate CMS Detector Response → Copy flat file to OODBMS → Simulate Digitization of Electronic Signals]
Architecture of the System:
[Figure: the Virtual Data Language is processed by the VDL Interpreter (VDLI), backed by the Virtual Data Catalog (PostgreSQL) and local file storage; job submission sites (ANL, SC, …) run a Condor-G agent, Globus client, and GridFTP server, and communicate over GSI with job execution sites at U of Chicago, U of Wisconsin, and U of Florida, each providing a GridFTP client, Globus GRAM, and a Condor pool on the Grid testbed]
5
Cluster-finding Data Pipeline
[Figure: cluster-finding DAG built from catalog, cluster, core, brg, field, and tsObj files, linked through numbered pipeline stages]
6
Virtual Data Tools
Virtual Data API
– A Java class hierarchy to represent transformations and derivations
Virtual Data Language
– Textual for illustrative examples
– XML for machine-to-machine interfaces
Virtual Data Database
– Makes the objects of a virtual data definition persistent
Virtual Data Service
– Provides an OGSA interface to persistent objects
7
Languages
VDLt – textual version
– Mainly for documentation for now
– May eventually implement a translator
– Can dump data structures in this representation
VDLx – XML version
– App-to-VDC interchange
– Useful for bulk data entry and catalog import/export
aDAGx – XML version of the abstract DAG
cDAG – actual DAGMan DAG
8
Components and Interfaces
Java API (see the sketch below)
– Manage catalog objects (tr, dv, args, …)
– Create / Locate / Update / Delete
– Same API at client and within server
– Can embed Java classes in an app for now
Virtual Data Catalog Server
– Web (eventually OGSA)
– SOAP interface mirrors Java API operations
XML processor
Database – managed by VDCS
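To make the Create/Locate operations concrete, here is a minimal, self-contained Java sketch of the ideas behind the catalog API: transformations as templates, derivations as bindings of logical files to formal arguments, and a simple lookup. The classes below are illustrative only and are not the actual VDC Java API.

import java.util.*;

class Transformation {
    final String name;
    final List<String> formalArgs;                 // e.g. ["output a2", "input a1"]
    Transformation(String name, List<String> formalArgs) { this.name = name; this.formalArgs = formalArgs; }
}

class Derivation {
    final String transformation;
    final Map<String, String> bindings;            // formal argument -> logical file name
    Derivation(String transformation, Map<String, String> bindings) { this.transformation = transformation; this.bindings = bindings; }
}

class InMemoryCatalog {
    final Map<String, Transformation> transformations = new HashMap<>();
    final List<Derivation> derivations = new ArrayList<>();

    void create(Transformation t) { transformations.put(t.name, t); }      // Create
    void create(Derivation d)     { derivations.add(d); }

    // Locate: derivations that reference a given logical file
    // (a real catalog would also check whether the argument is an output).
    List<Derivation> findByFile(String lfn) {
        List<Derivation> out = new ArrayList<>();
        for (Derivation d : derivations)
            if (d.bindings.containsValue(lfn)) out.add(d);
        return out;
    }
}

public class CatalogSketch {
    public static void main(String[] args) {
        InMemoryCatalog vdc = new InMemoryCatalog();
        vdc.create(new Transformation("t1", Arrays.asList("output a2", "input a1")));
        vdc.create(new Derivation("t1", Map.of(
                "a2", "run1.exp15.T1932.summary",
                "a1", "run1.exp15.T1932.raw")));
        System.out.println(vdc.findByFile("run1.exp15.T1932.summary").size());  // prints 1
    }
}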
9
System Architecture
[Figure: Client App, Client API, Virtual Data Catalog Service, Virtual Data Catalog Objects, Virtual Data Catalog Database]
10
Initial Release Architecture
[Figure: Client App, Client API, Virtual Data Catalog Objects, Virtual Data Catalog Database]
11
Application interfaces
Invoke the Java client API (to make OGSA calls)
Invoke the Java server API (for now, embed VDC processing directly in the app)
Make OGSA calls directly
Formulate XML (VDLx) to load the catalog or request derivations
12
Example VDL-Text
TR t1( output a2, input a1, none env="100000", none pa="500" )
{
  app = "/usr/bin/app3";
  argument parg = "-p "${none:pa};
  argument farg = "-f "${input:a1};
  argument xarg = "-x -y ";
  argument stdout = ${output:a2};
  profile env.MAXMEM = ${none:env};
}
The TR statement defines a transformation template; its ${class:name} references are bound to actual files and values by a derivation (DV), as on the next slide.
13
Example Derivation
DV t1 (
a2=@{output:run1.exp15.T1932.summary},
a1=@{input:run1.exp15.T1932.raw},
env="20000", pa="600“
);
14
Derivations with dependencies
TR trans1( output a2, input a1 )
{
  app = "/usr/bin/app1";
  argument stdin = ${input:a1};
  argument stdout = ${output:a2};
}
TR trans2( output a2, input a1 )
{
  app = "/usr/bin/app2";
  argument stdin = ${input:a1};
  argument stdout = ${output:a2};
}
DV trans1( a2=@{output:file2}, a1=@{input:file1} );
DV trans2( a2=@{output:file3}, a1=@{output:file2} );
The second derivation reads file2, which the first produces, so the two are chained: file1 → trans1 → file2 → trans2 → file3.
15
Expressing Dependencies
[Figure: generate produces f.a; two findrange derivations each read f.a and produce f.b and f.c; analyze reads f.b and f.c and produces f.d]
16
Define the transformations
TR generate( output a )
{
  app = "generator.exe";
  argument stdout = ${output:a};
}
TR findrange( output b, input a, none p="0.0" )
{
  app = "ranger.exe";
  argument arg = "-i "${:p};
  argument stdin = ${output:a};
  argument stdout = ${output:b};
}
TR default.analyze( input a[], output c )
{
  pfnHint vanilla = "analyze.exe";
  argument files = ${:a};
  argument stdout = ${output:c};
}
17
Derivations forming a DAG
DV generate( a=@{output:f.a} );
DV findrange( b=@{output:f.b}, a=@{input:f.a}, p="0.5" );
DV findrange( b=@{output:f.c}, a=@{input:f.a}, p="1.0" );
DV analyze( a=[ @{input:f.b}, @{input:f.c} ], c=@{output:f.d} );
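The dependency edges in this example are never written explicitly; they fall out of the file bindings, since a derivation that consumes a file depends on the derivation that produces it. The following is a minimal, self-contained Java sketch of that inference (not the actual planner code); the derivation names and file lists are copied from the example above.

import java.util.*;

public class DagSketch {
    record Dv(String name, List<String> inputs, List<String> outputs) {}

    public static void main(String[] args) {
        List<Dv> dvs = List.of(
            new Dv("generate",   List.of(),             List.of("f.a")),
            new Dv("findrange1", List.of("f.a"),        List.of("f.b")),
            new Dv("findrange2", List.of("f.a"),        List.of("f.c")),
            new Dv("analyze",    List.of("f.b", "f.c"), List.of("f.d")));

        // Map each file to the derivation that produces it.
        Map<String, String> producer = new HashMap<>();
        for (Dv d : dvs)
            for (String out : d.outputs()) producer.put(out, d.name());

        // Emit one edge per consumed file that some other derivation produces.
        for (Dv d : dvs)
            for (String in : d.inputs())
                if (producer.containsKey(in))
                    System.out.println(producer.get(in) + " -> " + d.name());
        // Prints: generate -> findrange1, generate -> findrange2,
        //         findrange1 -> analyze, findrange2 -> analyze
    }
}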
18
Virtual Data Class Diagram
Diagram by Jens Voeckler
19
Virtual Data Catalog Structure
20
Virtual Data Language - XML
21
VDL Searches
Locate the derivations that can produce a specific LFN
General queries for catalog maintenance
Locate transformations that can produce a specific file type (what does a type mean in this context?)
22
Virtual Data Issues
Param file support
Param structures
Sequences
Virtual datasets
23
Execution Environment Profile
Condor / DAGMan / GRAM / WP1
Concept of an EE driver
– Allows plug-in of DAG-generating code for DAGMan, Condor, GRAM, WP1 JM/RB
Execution profile levels: Global, User/Group, Transformation, Derivation, Invocation
24
First Release – June 2002
Java catalog classes
XML import/export
Textual VDL formatting
DAX – (abstract) DAG in XML
Simple planner for a constrained Grid
– Will generate Condor DAGs
25
Next Releases - Features
RLS integration
Compound transformations
Database persistency
OGSA service
Other needed clients: C, TCL, ?
Expanded execution profiles / planners
– Support for the WP1 scheduler / broker
– Support for generic RSL-based schedulers
26
Longer-term Feature Preview
Instance tracking
Virtual files and virtual transformations
Multi-modal data
Structured namespaces
Grid-wide distributed catalog service
Metadata database integration
Knowledge-base integration
29
SDSS Extension: Dynamic Dependencies
Data is organized into spatial cells
Scope of search is not known until run time
In this case – nearest 9 or 25 cells to a centroid
Need a dynamic, algorithmic spec for the range of cells to process – a nested loop that generates the actual file names to examine (see the sketch below)
In complex cases, might be a sequence of such centroid-based sequences
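As a rough illustration of such a dynamic specification, the Java sketch below enumerates the block of cells around a centroid and emits the logical file names to examine. The cell numbering and the "field.<row>.<col>" naming scheme are invented here purely for illustration and are not the SDSS conventions.

import java.util.*;

public class CellRange {
    // Enumerate the (2*halfWidth+1)^2 cells centered on the centroid's cell.
    static List<String> cellFiles(int centroidRow, int centroidCol, int halfWidth) {
        List<String> files = new ArrayList<>();
        for (int r = centroidRow - halfWidth; r <= centroidRow + halfWidth; r++)
            for (int c = centroidCol - halfWidth; c <= centroidCol + halfWidth; c++)
                files.add(String.format("field.%d.%d", r, c));
        return files;
    }

    public static void main(String[] args) {
        // halfWidth = 1 gives the nearest 9 cells; halfWidth = 2 gives the nearest 25.
        System.out.println(cellFiles(42, 17, 1));
    }
}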
30
LIGO Example
Consider 3 (fictitious) channels: c, p, t
Operations are extract and concatenate:
  ex –i a –s t0 –e tb >ta
  ex –i e –s te –e t1 >te
  cat ta b c d te | filter exch p <a –s t0 –e t1 filter –v p,t
Examine whether derived metadata handles this concept
[Figure: frame files a, b, c, d, e spanning the time range t0–t1, with boundaries ta, tb, tc, td, te, tf]
31
Distributed Virtual Data Service
Will parallel the service architecture of the RLS
…but probably can't use a soft-state approach – needs consistency; can accept latency
Need a global name space for collaboration-wide information and knowledge sharing
May use distributed database technology under the covers
Will leverage a distributed, structured namespace
Preliminary – not yet designed
32
Distributed Virtual Data Service
[Figure: distributed virtual data service — VDC instances at Tier 1 centers, regional centers, and local sites, accessed by apps]
33
End of presentation
34
Supplementary Material
35
Knowledge Management Architecture
Knowledge-based requests are formulated in terms of science data
– E.g., give me this transform of channels c, p, & t over time range t0–t1
Finder finds the data files
– Translates the range "t0–t1" into a set of files (a minimal sketch follows the figure below)
Coder creates an execution plan and defines derivations from known transformations
– Can deal with missing files (e.g., file c in the LIGO example)
K-B request is formulated in terms of virtual datasets; Coder translates into logical files; Planner translates into physical files
[Figure: knowledge-based request → Finder (Metadata Catalog) → Coder (Virtual Data Catalog) → Planner → aDAG]
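As a rough illustration of the Finder step, the Java sketch below maps a requested time range onto the set of fixed-length data files that overlap it. The 16-second segment length and the "chan-<start>.dat" naming scheme are invented for illustration only.

import java.util.*;

public class Finder {
    static final long SEGMENT_SECONDS = 16;   // assumed fixed segment length

    // Translate a time range [t0, t1) into the files that overlap it.
    static List<String> filesFor(long t0, long t1) {
        List<String> files = new ArrayList<>();
        long start = (t0 / SEGMENT_SECONDS) * SEGMENT_SECONDS;   // align to a segment boundary
        for (long t = start; t < t1; t += SEGMENT_SECONDS)
            files.add("chan-" + t + ".dat");
        return files;
    }

    public static void main(String[] args) {
        System.out.println(filesFor(1000, 1040));   // files covering [1000, 1040)
    }
}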
36
User View of the Virtual Data Grid
[Figure: CMS production demo spanning a Caltech workstation (master Condor job running at Caltech), the Wisconsin Condor pool (secondary Condor job), an NCSA Linux cluster, and NCSA UniTree, a GridFTP-enabled FTP server]
2) Launch secondary job on WI pool; input files via Globus GASS
3) 100 Monte Carlo jobs on Wisconsin Condor pool
4) 100 data files transferred via GridFTP, ~1 GB each
5) Secondary reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master
Scott Koranda, Miron Livny, others
37
Production Pipeline – GriPhyN-CMS Demo
SC2001 Demo Version (1 run = 500 events):
Stage:   pythia      cmsim     writeHits   writeDigis
Output:  truth.ntpl  hits.fz   hits.DB     digis.DB
Data:    0.5 MB      175 MB    275 MB      105 MB
CPU:     2 min       8 hours   5 min       45 min
38
GriPhyN Virtual Data: Tracking Complex Dependencies
Dependency graph is (read "x < y" as "x depends on y"):
– Files: 8 < (1,3,4,5,7), 7 < 6, (3,4,5,6) < 2
– Programs: 8 < psearch, 7 < summarize, (3,4,5) < reformat, 6 < conv, (1,2) < simulate
[Figure: simulate –t 10 produces file1 and file2; reformat –f fz produces files 3, 4, 5; conv –I esd –o aod produces file6; summarize –t 10 produces file7; psearch –t 10 produces file8, the requested file]
39
Re-creating Virtual Data
To recreate file 8: Step 1
– simulate > file1, file2
40
Re-creating Virtual Data
To re-create file 8: Step 2
– files 3, 4, 5, 6 derived from file 2
– reformat > file3, file4, file5
– conv > file 6
41
Re-creating Virtual Data
To re-create file 8: Step 3
– File 7 depends on file 6
– Summarize > file 7
42
Re-creating Virtual Data
To re-create file 8: final step
– File 8 depends on files 1, 3, 4, 5, 7
– psearch < file1, file3, file4, file5, file 7 > file 8
43
SDSS Galaxy Cluster Finding
44
Cluster-finding Grid
Work of: Yong Zhao, James Annis, & others
45
Cluster-finding pipeline execution
46
Virtual Data in CMS
Virtual Data Long Term Vision of CMS: CMS Note 2001/047, GRIPHYN 2001-16
47
CMS Data Analysis
[Figure: per-event data flow for Events 1, 2, 3 — raw data (simulated or real) and calibration data feed a reconstruction algorithm; the reconstructed data (produced by physics analysis jobs) feeds jet finders 1 and 2 and tags 1 and 2; object sizes on the figure range from ~100–200 bytes (tags) through ~5–7 KB to ~50–300 KB; the legend distinguishes uploaded data, virtual data, and algorithms]
Dominant use of Virtual Data in the Future
48
Topics – Planner
Does the planner have a queue? What do the presence and absence of a queue imply?
How is responsibility partitioned between the planner and the executor (cluster scheduler)?
How does the planner estimate times if it has only partial responsibility for when/where things run?
How does a cluster scheduler assign CPUs – dedicated or shared? See Miron's email on NeST for more questions
Use of an execution profiler in the planner architecture?
– Characterize the resource requirements of an app over time
– Parameterize the resource requirements of an app w.r.t. its (salient) parameters
49
Planner Context
Map of grid resources
Status of grid resources
– State (up/down)
– Load
– Dedication (commitment of a resource to a VO or group based on policy)
Policy
Request queue (with lookahead, or process sequentially?)
50
CAS and SAS
Site Authorization Service
– How does a physical site control the policy by which its resources get used?
– How do a SAS and a CAS interact?
– Can a resource interpret restricted proxies from multiple CASes? (Yes, but not from arbitrary CASes)
– Consider MPI and MPICH-G jobs – how would the latter be handled?
– Consider: if P2 schedules a whole DAG up front, it causes the schedule to use outdated information
51
Planner Architecture
[Figure: planner architecture — VO-A with CAS-A and Planner 1; VO-C with CAS-C; sites S1A … SnA, S2A, and S1C, each with compute nodes, a shared SE (storage element), an LRC (local replica catalog), and a SAS (e.g., SAS-1 at Site 1); planners consult the Virtual Data Service (vdb) and the Replica Location Service (RIS)]
52
Policy
Focuses on security and configuration (controlled resource sharing/allocation)
Allocation example:
– "cms should get 90% of the resources at Caltech"
– Issues of fair-share scheduling
How to factor in time quanta: CPU-hours; GB-days
Relationship to accounting
53
Policy and the Planner
Planner considers:
– Policy (fairly static, from CAS/SAS)
– Grid status
– Job (user/group) resource consumption history
– Job profiles (resources over time) from Prophesy
[Figure: the planner draws on policy, grid status, accounting records, job usage info, and job profile records; Prophesy (predictor) builds the profile records from job profiling data]
54
GriPhyN/PPDG Data Grid Architecture
[Figure: layered architecture — the Application hands an abstract DAG to the Planner, which hands a concrete DAG to the Executor, which drives Compute and Storage Resources; supporting services include Catalog Services, Info Services, Policy/Security, Monitoring, Replica Management, and a Reliable Transfer Service; technologies annotated on the figure include DAGMan, Kangaroo, GRAM, GridFTP, SRM, GSI, CAS, MDS, MCAT, GriPhyN catalogs, GDMP, and Globus]
55
(evolving) View of Data Grid Stack
[Figure: data grid stack — Data Transport (GridFTP), Storage Element, Local Replica Catalog (flat or hierarchical), Reliable File Transfer, Replica Location Service, Publish-Subscribe Service (GDMP), Storage Element Manager, Reliable Replication]
56
Executor Example: Condor DAGMan
Directed Acyclic Graph Manager
Specify the dependencies between Condor jobs using a DAG data structure (see the illustrative DAG file below)
Manage dependencies automatically
– e.g., "Don't run job B until job A has completed successfully."
Each job is a "node" in the DAG
Any number of parent or child nodes
No loops
[Figure: diamond DAG — Job A is the parent of Job B and Job C, which are both parents of Job D]
Slide courtesy Miron Livny, U. Wisconsin
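For reference, a diamond DAG like the one above is described to DAGMan in a plain-text DAG file. The sketch below is illustrative; the .submit file names are placeholders for ordinary Condor submit descriptions.

# diamond.dag -- illustrative DAGMan input file
JOB A a.submit
JOB B b.submit
JOB C c.submit
JOB D d.submit
PARENT A CHILD B C
PARENT B C CHILD D

It would be run with condor_submit_dag diamond.dag, which submits DAGMan itself as a Condor job that then manages the four node jobs.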
57
Executor Example: Condor DAGMan (Cont.)
DAGMan acts as a "meta-scheduler"
– Holds and submits jobs to the Condor queue at the appropriate times, based on DAG dependencies
If a job fails, DAGMan continues until it can no longer make progress, then creates a "rescue" file with the current state of the DAG
– When the failed job is ready to be re-run, the rescue file is used to restore the prior state of the DAG
[Figure: DAGMan holding jobs A–D and feeding them into the Condor job queue]
Slide courtesy Miron Livny, U. Wisconsin
58
DAG Usage
Abstract DAG
– Represents user requests
– Simplest case: a request for one or more data products
– Complex case: request execution of a chained set of applications
– No file or execution locations need be present
Concrete DAG
– Specifies any application invocations needed to derive data
– Specifies locations of all invocations (to the site level)
– Includes explicit job steps to move data
59
Strawman Architecture
[Figure: strawman pipeline — VDLx → VDC → Planner 1 → aDAG → Planner 2 → cDAG (concrete DAGMan DAG)]
60
The GriPhyN Charter
“A virtual data grid enables the definition and delivery of a potentially unlimited virtual space of data products derived from other data. In this virtual space, requests can be satisfied via direct retrieval of materialized products and/or computation, with local and global resource management, policy, and security constraints determining the strategy used.”
61
GriPhyN-LIGO SC2001 Demo
Desired result: single channel time series
[Figure: demo architecture — an HTTP front end with a CGI interface and XML output, a MyProxy server, a planner with monitoring, a replica catalog, a transformation catalog, and replica selection, and an executor (Condor-G/DAGMan) running a G-DAG; GridFTP and GRAM/LDAS provide access to LDAS at UWM and at Caltech, compute resources on the SC floor and at UWM, GridCVS, logs, and frame data; the legend marks components as in integration, prototype exclusive, in design, or Globus components]
62
GriPhyN CMS SC2001 Demo
[Figure: a full event database of ~100,000 large objects and a full event database of ~40,000 large objects, with a "tag" database of ~140,000 small objects; requests are served over parallel tuned GSI FTP]
Bandwidth-Greedy Grid-enabled Object Collection Analysis for Particle Physics
http://pcbunn.cacr.caltech.edu/Tier2/Tier2_Overall_JJB.htm
63
Virtual Data in Action
[Figure: data requests flowing across local sites, network caches & regional centers, and major archive facilities]
Data request may:
– Access local data
– Compute locally
– Compute remotely
– Access remote data
Scheduling & execution subject to local & global policies