Planning on the Grid
With slides contributed by
Ewa Deelman and Yolanda Gil
USC INFORMATION SCIENCES INSTITUTE
Thinking about applications of planning
You’ve seen Planning as X,
X ∈ {SAT, CSP, ILP, …}
Now: Y as Planning
Y ∈ {Grid/Web services composition, …}
Problem-solving on Grids
Users pool access to distributed resources (computers, instruments, data, …)
Applications are often composed of separate components run at several locations
Grid middleware tools support job scheduling and resource discovery, e.g. the Globus Toolkit
The Computational Grid
Emerging computational and networking infrastructure
bring together compute resources, data storage systems, instruments, human resources
Enable entirely new approaches to applications and problem solving
remote resources the rule, not the exception
can solve ever bigger problems
Wide-area distributed computing
national and international
Facilitate collaborative environments
Sharing of data which can be expensive to produce (experimentation/simulation)
Example: LIGO Experiment (Laser Interferometer Gravitational-Wave Observatory)
Aims to detect gravitational waves predicted by the theory of relativity.
Can be used to detect
binary pulsars
mergers of black holes
“starquakes” in neutron stars
Two installations: in Louisiana (Livingston) and Washington State
Other projects: Virgo (Italy), GEO (Germany), Tama (Japan)
Instruments are designed to measure the effect of gravitational waves on test masses suspended in vacuum.
Data collected during experiments is a collection of time series (multi-channel)
Analysis is performed in time and Fourier domains
LIGO’s Pulsar Search (Laser Interferometer Gravitational-wave Observatory)
[Figure: pulsar search data flow. Labels: Interferometer; raw channels; archive; Store; Long time frames; Short time frames; Extract channel; Single Frame; Short Fourier Transform (30 minutes); transpose; Extract frequency range; Construct image; Time-frequency Image (axes: Hz, Time); Find Candidate; event DB]
Motivation: Using Today’s Grid
Users have high-level requirements, naturally stated in terms of the application domain
Ex: Obtain frequency spectrum for signal S in instrument I and timeframe T
Users have to turn these requirements into executable job workflows in detailed scripts
Users must figure out which code generates desired products, which files contain it, physical location of the files, hosts that support execution given code requirements, availability of hosts, access policies, etc.
Users must query Grid middleware: metadata catalog, replica locator, resource descriptor and monitoring, etc.
Users must oversee execution
Problems with today’s Grid
Usability: users must be proficient in grid computing
Complexity: many interrelated choices and dead ends
Solution cost: any-cost solutions are already hard
Global cost: optimization necessary when there is contention
Reliability of execution: job resubmission upon failure
Planning for workflow generation and maintenance
Outline:
Formalization as a planning problem
Integration with the grid middleware
Case study: planning for workflows in LIGO
The grid as a test bed for planning and scheduling research
[Figure: Application Development and Execution Process. Levels: Application Domain (FFT); Abstract Workflow (FFT filea); Concrete Workflow (transfer filea from host1://home/filea to host2://home/file1; /usr/local/bin/fft /home/file1); Execution Environment (host1, host2, Data). Steps: Application Component Selection; Abstract Workflow Generation; Transformation Instance Selection; Resource Selection; Data Replica Selection; Data Transfer; Concrete Workflow Generation. Failure Recovery Method: Retry; Pick different Resources; Specify a Different Workflow]
Desiderata for workflow generator
Allow users to refer to data requirements by descriptions, not file names
Intuitive, requires far less input
Seek high quality workflows according to variable metric
Model variety of constraints declaratively
Data dependencies, resource constraints, user access rights, …
Planning for workflow generation
Application components as operators
Desired data as goals
World state includes available hosts, existing data products, network bandwidths, …
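The mapping above (components as operators, desired data as goals, existing data as world state) can be illustrated with a minimal backward-chaining sketch. It is not the planner used in the talk: the `Operator` class, the `plan` function and the three component names are hypothetical, effects are add-only, and hosts and bandwidths are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """One application component (hypothetical, add-only model)."""
    name: str
    preconds: frozenset  # data products that must already exist
    effects: frozenset   # data products the component creates


def plan(goals, state, operators):
    """Work backwards from each desired data product to existing data.

    Returns operator names in execution order, or None if no plan exists.
    """
    have = set(state)
    steps = []

    def achieve(fact, pending):
        if fact in have:
            return True
        if fact in pending:  # cyclic dependency, give up on this branch
            return False
        for op in operators:
            if fact not in op.effects:
                continue
            saved_have, saved_len = set(have), len(steps)
            if all(achieve(p, pending | {fact}) for p in sorted(op.preconds)):
                have.update(op.effects)
                steps.append(op.name)
                return True
            # Undo partial work before trying the next operator.
            have.clear()
            have.update(saved_have)
            del steps[saved_len:]
        return False

    ok = all(achieve(g, frozenset()) for g in sorted(goals))
    return steps if ok else None


OPS = [
    Operator("extract-channel", frozenset({"raw"}), frozenset({"channel"})),
    Operator("short-fourier-transform", frozenset({"channel"}), frozenset({"sft"})),
    Operator("pulsar-search", frozenset({"sft"}), frozenset({"pulsar-result"})),
]
print(plan({"pulsar-result"}, {"raw"}, OPS))
```

If the "sft" product already exists in the world state, the same call returns only the final step, which mirrors the reuse of existing data products.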
Existing tools for building workflows: abstract workflow generation
Chimera: input-output transforms for files, in ‘Virtual Data Language’:
DV third1->pulsar(
  a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
  b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd"},
  t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q",
  fcenter="50.5", fband="0.004", instrument="H2",
  ra="3.123643", de="+2.56234",
  fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");
Planning operator
(operator pulsar-search
  (preconds
    ((<start-time> 7143800)
     (<channel> LSC-AS-Q)
     (<fcenter> 0.5)
     (<right-ascension> 50)
     (<sample-rate> 20)
     …)
    (and (created "H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd")))
  (effects ()
    ((add (created "H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.ilwd")))))
Operator with metadata parameters
(operator pulsar-search
  (preconds
    ((<start-time> Number)
     (<channel> Channel)
     (<fcenter> Number)
     (<right-ascension> Number)
     (<sample-rate> Number)
     (<file> File-Handle)
     ;; These two are parameters for the frequency-extract.
     (<f0> (and Number (get-low-freq-from-center-and-band <fcenter> <fband>)))
     (<fN> (and Number (get-high-freq-from-center-and-band <fcenter> <fband>)))
     …)
    (and (forall ((<sub-sft-file-group>
                   (and File-Group-Handle
                        (gen-sub-sft-range-for-pulsar-search
                          <f0> <fN> <start-time> <end-time> <sub-sft-file-group>))))
           (and (sub-sft-group <start-time> <end-time> <channel> <instrument> <format>
                               <f0> <fN> <sample-rate> <sub-sft-file-group>)
                (at <sub-sft-file-group> <host>)))))
  (effects ()
    ((add (created <file>))
     (add (pulsar <start-time> <end-time> <channel> <instrument> <format>
                  <fcenter> <fband> <fderv1> <fderv2> <fderv3> <fderv4> <fderv5>
                  <right-ascension> <declination> <sample-rate> <file>)))))
Operator with host identified
(operator pulsar-search
  (preconds
    ((<host> (or Condor-pool Mpi))
     (<start-time> Number)
     (<channel> Channel)
     (<fcenter> Number)
     (<right-ascension> Number)
     (<sample-rate> Number)
     (<file> File-Handle)
     ;; These two are parameters for the frequency-extract.
     (<f0> (and Number (get-low-freq-from-center-and-band <fcenter> <fband>)))
     (<fN> (and Number (get-high-freq-from-center-and-band <fcenter> <fband>)))
     (<run-time> (and Number
                      (estimate-pulsar-search-run-time
                        <start-time> <end-time> <sample-rate> <f0> <fN> <host> <run-time>)))
     …)
    (and (available pulsar-search <host>)
         (forall ((<sub-sft-file-group>
                   (and File-Group-Handle
                        (gen-sub-sft-range-for-pulsar-search
                          <f0> <fN> <start-time> <end-time> <sub-sft-file-group>))))
           (and (sub-sft-group <start-time> <end-time> <channel> <instrument> <format>
                               <f0> <fN> <sample-rate> <sub-sft-file-group>)
                (at <sub-sft-file-group> <host>)))))
  (effects ()
    ((add (created <file>))
     (add (at <file> <host>))
     (add (pulsar <start-time> <end-time> <channel> <instrument> <format>
                  <fcenter> <fband> <fderv1> <fderv2> <fderv3> <fderv4> <fderv5>
                  <right-ascension> <declination> <sample-rate> <file>)))))
Planning for workflow generation
Application components as operators
Parameters include host: plan is a concrete workflow
Desired data (in descriptive form) as goals
World state includes available hosts, existing data products, network bandwidths, …
Operator descriptions
Represent applying a given component at a particular location with fixed parameters, inputs and outputs.
Preconditions combine
Data dependencies – derive input requirements from outputs
Task constraints – e.g. component must be run on an MPI machine
Plan quality
Objective function may include
Performance – expected runtime, variance
Reliability – probability of failure, expected number of retries
Computational cost – use of ‘expensive’ resources, conformance to policies
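One way such an objective function could be combined is sketched below. This is an illustration only: the `Step` fields, the weights and the assumption of independent, identically distributed failures (so that the expected number of attempts is 1 / (1 - p_fail)) are not from the slides, and variance and policy conformance are left out.

```python
from dataclasses import dataclass


@dataclass
class Step:
    runtime: float  # expected runtime of one attempt, in seconds (assumed unit)
    p_fail: float   # probability that a single attempt fails, 0 <= p_fail < 1
    cost: float     # charge for one attempt on the chosen resource


def expected_attempts(p_fail):
    """Mean of a geometric distribution: attempts until the first success."""
    return 1.0 / (1.0 - p_fail)


def plan_score(steps, w_time=1.0, w_cost=1.0):
    """Weighted expected runtime plus expected cost of a sequential plan (lower is better)."""
    total = 0.0
    for s in steps:
        attempts = expected_attempts(s.p_fail)
        total += attempts * (w_time * s.runtime + w_cost * s.cost)
    return total


print(plan_score([Step(runtime=10, p_fail=0.5, cost=1), Step(runtime=100, p_fail=0.0, cost=0)]))
```

The weights let the same function express different trade-offs, for example a user who cares only about runtime can set `w_cost=0`.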
Using local heuristics and global metrics
Need local heuristics since search space is intractable
e.g. prefer host for program with high-bandwidth connection to where the output is required
Need to test a global metric (e.g. overall runtime) since local heuristics can lead to globally poor solution
Create as many plans as possible, return best
Search control to eliminate redundant solutions
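A toy example of why the global metric matters, under assumptions that are not in the slides: two tasks run in sequence, two hypothetical hosts, and made-up runtimes and transfer times. The local heuristic picks the fastest host per task; the global check enumerates every assignment, which is only feasible because the example is tiny.

```python
from itertools import product


def total_runtime(assignment, run_time, transfer_time):
    """Global metric: task runtimes plus transfers between consecutive tasks."""
    total = sum(run_time[task][host] for task, host in enumerate(assignment))
    total += sum(transfer_time[a][b] for a, b in zip(assignment, assignment[1:]))
    return total


def greedy_plan(run_time, hosts):
    """Local heuristic: every task takes its individually fastest host."""
    return tuple(min(hosts, key=lambda h: times[h]) for times in run_time)


def best_plan(run_time, transfer_time, hosts):
    """Create all candidate plans and return the best under the global metric."""
    candidates = product(hosts, repeat=len(run_time))
    return min(candidates, key=lambda a: total_runtime(a, run_time, transfer_time))


HOSTS = ("h1", "h2")
RUN_TIME = [{"h1": 10, "h2": 12}, {"h1": 12, "h2": 10}]  # one dict per task
TRANSFER = {"h1": {"h1": 0, "h2": 20}, "h2": {"h1": 20, "h2": 0}}

local = greedy_plan(RUN_TIME, HOSTS)
best = best_plan(RUN_TIME, TRANSFER, HOSTS)
print(local, total_runtime(local, RUN_TIME, TRANSFER))
print(best, total_runtime(best, RUN_TIME, TRANSFER))
```

Here the greedy choice splits the tasks across hosts and pays for a transfer, while keeping both tasks on one host gives a shorter overall runtime.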
Example search heuristics
(control-rule only-transfer-from-loc-with-greatest-bandwidth
(if (and (current-ops (transfer-file))
(current-goal (at <file> <dest>))
(true-in-state (at <file> <loc1>))
(true-in-state (at <file> <loc2>))
(higher-bandwidth <loc1> <loc2> <dest>)))
(then reject bindings ((<from-loc> . <loc2>))))
(control-rule prefer-mpi-to-condor-for-pulsar-search
(if (and (current-ops (pulsar-search))
(type-of <mpi> Mpi)
(type-of <condor> Condor-pool)))
(then prefer bindings ((<host> . <mpi>)) ((<host> . <condor>))))
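Read procedurally, the two control rules above amount to a filter and an ordering. The sketch below restates them in Python for readers unfamiliar with the rule syntax; the function names and the dictionary-based bandwidth and host-type tables are assumptions made for illustration, not part of the planner.

```python
def choose_source(file_locations, dest, bandwidth):
    """Reject every replica location except the one with the greatest bandwidth to dest."""
    return max(file_locations, key=lambda loc: bandwidth[(loc, dest)])


def order_hosts(hosts, host_type):
    """Try MPI hosts before Condor pools; sorted() is stable, so other order is kept."""
    return sorted(hosts, key=lambda h: 0 if host_type[h] == "Mpi" else 1)


BANDWIDTH = {("loc1", "dest"): 100, ("loc2", "dest"): 10}  # hypothetical values
HOST_TYPE = {"pool-a": "Condor-pool", "cluster-b": "Mpi"}

print(choose_source(["loc1", "loc2"], "dest", BANDWIDTH))
print(order_hosts(["pool-a", "cluster-b"], HOST_TYPE))
```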
Planning for workflow generation and maintenance
Outline:
Formalization as a planning problem
Integration with the grid middleware
The grid as a test bed for planning and scheduling research
[Figure: system architecture. Components: Request Manager; Current State Generator; AI-based Planner; Submission and Monitoring System; workflow executor (DAGman); Grid. Information and Models: Metadata Catalog Service; Globus Replica Location Service; Globus Monitoring and Discovery Service; Resource Models. Flow labels: High-level specs of desired results and intermediate data products; Dynamic information; Concrete Workflow; Raw data from detector. Stages: Workflow Planning; Execution]
Generating the planning problem
Currently, static file representation for available hosts, bandwidths
Query grid services prior to planning to find which relevant files exist
Future versions will make dynamic queries
Goal is translated from user request, plan is translated into DAG format suitable for grid scheduler.
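The plan-to-DAG step can be sketched as follows, assuming each plan step lists the files it reads and writes. The `(name, inputs, outputs)` tuple layout and both function names are hypothetical, and the actual file format expected by the grid scheduler is not reproduced here. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def plan_to_dag(steps):
    """Map each step to the set of steps that produce one of its input files.

    steps: list of (name, inputs, outputs) tuples (hypothetical layout).
    """
    producer = {}
    for name, _inputs, outputs in steps:
        for f in outputs:
            producer[f] = name
    dag = {}
    for name, inputs, _outputs in steps:
        dag[name] = {producer[f] for f in inputs if f in producer}
    return dag


def submission_order(dag):
    """One valid order in which a scheduler could release the jobs."""
    return list(TopologicalSorter(dag).static_order())


STEPS = [
    ("fft", ["host2://home/file1"], ["spectrum"]),
    ("transfer", ["host1://home/filea"], ["host2://home/file1"]),
]
dag = plan_to_dag(STEPS)
print(dag)
print(submission_order(dag))
```

Inputs with no producing step, such as the file that already exists on host1, create no edge, so the job that reads them becomes a root of the DAG.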
LIGO’s Pulsar Search at SC’02
Used LIGO’s data collected during the first scientific run of the instrument
Targeted a set of 1000 locations: known pulsar or random locations
Results of the analysis published to the LIGO Scientific Collaboration
Performed using LDAS and compute and storage resources at Caltech, University of Southern California, University of Wisconsin Milwaukee.
Summary: benefits of planning
Automating workflow composition
Just being addressed in Grid middleware
Reasoning with explicit descriptions of data
More intuitive for users
Far fewer inputs required than at file level
Better workflows by searching many plans
Planning for workflow generation and maintenance
Outline:
Existing Grid tools for workflow generation
Formalization as a planning problem
Integration with the grid middleware
The grid as a test bed for planning and scheduling research
Many areas of planning research relevant for grid
Planning for a dynamic environment: plan monitoring and repair, planning under uncertainty
Scheduling: resource reasoning, temporal reasoning
Plan quality: learning, acquiring preferences, local search planning
Planning for information gathering: integrating access to grid services with workflow creation
Domain modeling: handling multiple ontologies, acquiring metadata descriptions, acquiring operators
Fault-tolerant planning for a dynamic environment
Grid resources become unavailable, queue length & network bandwidth change
Exploring plan repair strategies, balance of work done off-line and on-line
Modeling failures, keeping statistics for creating plans more likely to succeed, conditional plans, …
Fault-tolerant straw men
1. Current version: build fully detailed plan offline, resource allocation is fixed
Ignores world dynamics
2. Build abstract plan (without specifying hosts) offline, use a matchmaker online
Matchmaker makes local decisions only
Global reasoning is needed for resource allocation
[Figure: task graph from Start to Finish with tasks A (3), B (1), C (5)]
Approaches for fault-tolerant planning in dynamic domains
RAX (Jonsson et al.): general framework. As implemented:
offline: builds complete plan
online: adjusts temporal intervals
Combining planning and scheduling
offline: build several abstract plans
online: reason about critical path to instantiate each plan
MDP/POMDP approaches
Open area…
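As an illustration of the online step "reason about critical path", the sketch below computes the longest-duration chain through a task DAG. It is a generic longest-path calculation, not a description of any of the systems named above; the function name, the task names and the dependency structure in the example are hypothetical. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def critical_path(durations, parents):
    """Return (length, path) of the longest-duration chain through a task DAG.

    durations: {task: duration}; parents: {task: set of tasks it depends on}.
    """
    finish = {}       # earliest finish time of each task
    best_parent = {}  # the predecessor that determines that finish time
    for task in TopologicalSorter(parents).static_order():
        prev = max(parents.get(task, ()), key=lambda p: finish[p], default=None)
        best_parent[task] = prev
        finish[task] = durations[task] + (finish[prev] if prev is not None else 0)
    end = max(finish, key=finish.get)
    length = finish[end]
    path = []
    while end is not None:
        path.append(end)
        end = best_parent[end]
    return length, path[::-1]


# Hypothetical structure: B and C both depend on A.
DURATIONS = {"A": 3, "B": 1, "C": 5}
PARENTS = {"A": set(), "B": {"A"}, "C": {"A"}}
print(critical_path(DURATIONS, PARENTS))
```

Tasks on the critical path are the ones where a slow or failed resource delays the whole workflow, so they are natural candidates for the best hosts.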
Challenge: understanding when different approaches are more important
Hypotheses:
Uneven task distribution, in terms of computational and data expense and resource constraints, will indicate global planning
Time-dependency, e.g. need to re-plan during execution, will indicate local planning
Interesting project: use experiments in synthetic and real domains to test hypotheses and uncover new insights
Empirical tests with synthetic LIGO problems
Example: Problem requires 100 files on one machine. Vary the number that exist.
[Figure: "distribution - 1 machine". Run-time (axis ticks 300 to 800) against no of files; series: min, max, p-max, g-max, avg]
Domain modeling
Monolithic planner
Knowledge from several sources must be used
Current system:
[Figure: current system. Labels: Info from Grid services (RLS, MCS etc); State info (files, resources); Concrete tasks; Grid task schedulers. KBs combined in one location: task requirements; existing data in files; available resources; resource policies; Resource queues; Network bandwidth; User policies. Other labels: Comp. selector; Resource selector; Exec. monitor]
Where does knowledge used by our planners come from?
[Figure: an operator, (Operator … (preconditions …) (effects …)), annotated with its knowledge sources: data dependencies (VDL*); task resource requirements; user policies & preferences; resource policies]
Each knowledge component is used for other purposes beyond planning
Automatically generated operators for several application domains
[Figure: an operator, (Operator … (preconditions …) (effects …)), generated from data dependencies (VDL*), task resource requirements, and policies]
Investigating patterns of data descriptions for more efficient planning
Application domains: Digital sky survey, LIGO, GEO, Galaxy morphology, Tomography
Question: if operators are gathered from distributed services, can we still guarantee soundness and completeness?
Under what kinds of conditions?
Representing appropriate information units with metadata
E.g. Have 60,000 files, want to allocate 60 tasks each dealing with 1,000 files.
Previously, application components specified in terms of specific files:
DV run59000->extractSFTData(
  input=[@{input:"nSFT.59000"},…,@{input:"nSFT.59999"}],
  output=[@{output:"eSFT.59000"},…,@{output:"eSFT.59999"}],
  t1="714384000", t2="714384063", freq="1008", band="4", instrument="H2");
… 59 similar clauses…
DV final->computeFStatistic(
  input=[@{input:"eSFT.00000"},…,@{input:"eSFT.59999"}],…);
[Figure labels: 1000 files; 60000 files]
Metadata representation
Replace with two clauses, two input predicates
A predicate now represents a range of files
Simpler to model, greater generality, more efficient for reasoner
(operator run-extractSFTData-range
  (preconds
    ((<begin-file> Number)
     (<number-of-files> (and Number (> <number-of-files> 0)))
     (<local-begin-file> (and Number
                              (gen-smaller-number <number-of-files> 1000 <begin-file>))))
    (and (range "eSFT" <begin-file> 2 1 <local-begin-file>)
         (range "nSFT" <local-begin-file> 2 1 999)))
  (effects ()
    ((add (range "eSFT" <begin-file> 2 <number-of-files>)))))
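The saving from range predicates can be seen in a small sketch: instead of naming 60,000 files, each task is described by a (begin, number-of-files) pair. The function name and the pair encoding are assumptions for illustration and do not reproduce the argument layout of the `range` predicate above.

```python
def split_into_ranges(first, count, chunk):
    """Return (begin, number_of_files) pairs that cover `count` files starting at `first`."""
    end = first + count
    return [(b, min(chunk, end - b)) for b in range(first, end, chunk)]


tasks = split_into_ranges(0, 60000, 1000)
print(len(tasks), tasks[0], tasks[-1])
```

Sixty short pairs replace sixty clauses that each list a thousand file names, which is what makes the description cheaper for both the modeler and the reasoner.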
Requires library operators for ranges
E.g. if a range of files exists, then so does any subrange
Questions: what are the required operators? Similar to spatial calculus RCC-8?
(operator subranges-exist
  (preconds
    ((<begin-file> Number)
     (<type> Object)
     (<number-of-files> (and Number (> <number-of-files> 0)))
     (<enclosing-begin>
      (and Number (gen-known-enclosing-begins <type> <begin-file> 2 1 <number-of-files>)))
     (<enclosing-number-of-files>
      (and Number (gen-known-enclosing-number-of-files
                    <type> <enclosing-begin> 2 1 <number-of-files> <begin-file>))))
    (created-range <type> <enclosing-begin> 2 1 <enclosing-number-of-files>))
  (effects ()
    ((add (created-range <type> <begin-file> 2 1 <number-of-files>)))))
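The rule "if a range of files exists, then so does any subrange" reduces to an interval containment test. A minimal sketch, assuming ranges are stored as hypothetical (begin, number-of-files) pairs and that a subrange must fit inside a single known range:

```python
def subrange_exists(known_ranges, begin, count):
    """True if files begin .. begin+count-1 all lie inside one known range."""
    return any(
        kb <= begin and begin + count <= kb + kc
        for kb, kc in known_ranges
    )


KNOWN = [(0, 1000), (5000, 2000)]
print(subrange_exists(KNOWN, 5500, 500))
print(subrange_exists(KNOWN, 900, 200))
```

A request that straddles two adjacent known ranges is rejected by this test, which hints at why further library operators (for example, for merging ranges) may be needed.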
Conclusions
Implemented system takes data description requests from LIGO users, composes workflow and executes on the Grid
Planning and scheduling technologies can make a large contribution to Grid infrastructure
Many interesting challenges for planning and scheduling research from Grid applications
http://www.isi.edu/ikcap/cognitive-grids
http://www.isi.edu/~deelman/pegasus.htm
Koehler and Srivastava
Different approaches to specifying workflows by hand
WSDL service specification (no workflow specified)
<definitions targetNamespace="http://..." xmlns="http://schemas.xmlsoap.org/wsdl/">
  <message name="OrderEvent"></message>
  <message name="TripRequest"></message>
  <message name="FlightRequest"></message>
  <message name="HotelRequest"></message>
  <message name="BookingFailure"></message>
  <portType name="pt1">
    <operation name="CToCI"><input message="TripRequest"/></operation>
  </portType>
  <portType name="pt2">
    <operation name="CIToHS"><output message="HotelRequest"/></operation>
  </portType>
  <portType name="pt3">
    <operation name="CIToFS"><output message="FlightRequest"/></operation>
  </portType>
  ...
  <portType name="pt9">
    <operation name="RIToFS"><output message="BookingFailure"/></operation>
  </portType>
</definitions>
BPEL4WS
<sequence>
  <receive partner="Customer" portType="pt1" operation="CToCI" container="OrderEvent"></receive>
  <flow>
    <invoke partner="HotelService" portType="pt2" operation="CIToHS" inputContainer="HotelRequest"></invoke>
    <invoke partner="FlightService" portType="pt3" operation="CIToFS" inputContainer="FlightRequest"></invoke>
  </flow>
Golog
Back-up slides
What is Needed
We need alternative foundations that offer
expressive representations
flexible reasoners
Many Artificial Intelligence (AI) techniques are relevant:
Planning to achieve given requirements
Searching through problem spaces of related choices
Using and combining heuristics
Expressive knowledge representation languages
Reasoners that can incorporate rules, definitions, axioms, etc.
Schedulers and resource allocation techniques
Existing tools for building workflows: abstract workflow generation
Chimera: input-output transforms at level of actual files, in ‘Virtual Data Language’:
DV first1->createSFT(
  b=@{output:"H2_SFT_LSC-AS-Q_714384000_64.gwf"},
  t1="714384000", t2="714384063", format="frame",
  channel="H2:LSC-AS-Q", instrument="H2");
DV first2->createSFT(
  b=@{output:"H2_SFT_LSC-AS-Q_714384064_64.gwf"},
  t1="714384064", t2="714384127", format="frame",
  channel="H2:LSC-AS-Q", instrument="H2");
DV third1->pulsar(
  a=@{input:"H2_sSFT_LSC-AS-Q_714384000_256_50_1.ilwd"},
  b=@{output:"H2_pulsar_LSC-AS-Q_714384000_256_50.5_0.004_3.123643_+2.56234.ilwd"},
  t1="714384000", t2="714384255", format="ilwd", channel="LSC-AS-Q",
  fcenter="50.5", fband="0.004", instrument="H2",
  ra="3.123643", de="+2.56234",
  fderv1="0.0", fderv2="0.0", fderv3="0.0", fderv4="0.0", fderv5="0.0");
Existing tools 2: concrete planner
Assigns specific hosts and data locations for tasks
Makes random selection of resources and data
Provided a feasible solution
Reused existing data products
[Figure: abstract workflow with INPUT F.a and OUTPUT F.d (steps Extract, Resample, Decimate, Concat; intermediate files F.b1, F.b2, F.c1, F.c2) mapped to a concrete workflow. Labels: Gridftp host://f.a ….lumpy.isi.edu/nfs/temp/f.a; lumpy.isi.edu://usr/local/bin/extract; Jet.caltech.edu://home/malcom/resample -I /home/malcolm/F.b1; Concat; Register /F.d at home/malcolm/f2; Data Transfer Nodes; Registration Nodes; Replica Catalog]
Sample Pulsar Search Results to Date
SC 2002 run:
Over 58 pulsar searches
Total of 330 tasks, 469 data transfers, 330 output files produced
The total runtime was 11:24:35.
To date:
185 pulsar searches
Total of 975 tasks, 1365 data transfers, 975 output files
Total runtime 96:49:47