GRID superscalar: a programming model for the Grid
Raül Sirvent Pardell
Advisor: Rosa M. Badia Sala
Doctoral Thesis
Computer Architecture Department
Technical University of Catalonia
Outline
1. Introduction
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work
Outline
1. Introduction
   1.1 Motivation
   1.2 Related work
   1.3 Thesis objectives and contributions
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work
1.1 Motivation
The Grid architecture layers
Applications
Grid Middleware (Job management, Data transfer, Security, Information, QoS, ...)
Distributed Resources
1.1 Motivation
What middleware should I use?
1.1 Motivation
Programming tools: are they easy?
Grid AWARE vs. Grid UNAWARE
1.1 Motivation
Can I run my programs in parallel?
Explicit parallelism vs. implicit parallelism
Explicit parallelism: fork / join (drawing the workflow by hand also means explicit parallelism)
Implicit parallelism:
for(i=0; i < MSIZE; i++)
  for(j=0; j < MSIZE; j++)
    for(k=0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));
1.1 Motivation
The Grid: a massive, dynamic and heterogeneous environment prone to failures
– Study different techniques to detect and overcome failures
  • Checkpoint
  • Retries
  • Replication
1.2 Related work
| System | Grid unaware | Implicit parallelism | Language |
|---|---|---|---|
| Triana | No | No | Graphical |
| Satin | Yes | No | Java |
| ProActive | Partial | Partial | Java |
| Pegasus | Yes | Partial | VDL |
| Swift | Yes | Partial | SwiftScript |
1.3 Thesis objectives and contributions
Objective: create a programming model for the Grid
– Grid unaware
– Implicit parallelism
– Sequential programming
– Allows the use of well-known imperative languages
– Speed up applications
– Include fault detection and recovery
1.3 Thesis objectives and contributions
Contribution: GRID superscalar
– Programming interface
– Runtime environment
– Fault tolerance features
Outline
1. Introduction
2. Programming interface
   2.1 Design
   2.2 User interface
   2.3 Programming comparison
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work
2.1 Design
Interface objectives
– Grid unaware
– Implicit parallelism
– Sequential programming
– Allows the use of well-known imperative languages
2.1 Design
Target applications
– Algorithms which may be easily split into tasks
  • Branch and bound computations, divide and conquer algorithms, recursive algorithms, …
– Coarse grained tasks
– Independent tasks
  • Scientific workflows, optimization algorithms, parameter sweep
– Main parameters: FILES
  • External simulators, finite element solvers, BLAST, GAMESS
2.1 Design
Application’s architecture: a master-worker paradigm
– Master-worker parallel paradigm fits with our objectives
– Main program: the master
– Functions: workers
  • Function = Generic representation of a task
– Glue to transform a sequential application into a master-worker application: stubs and skeletons (as in RMI, RPC, …)
  • Stub: call to the runtime interface
  • Skeleton: binary which calls the user function
2.1 Design
Local scenario

app.c
for(i=0; i < MSIZE; i++)
  for(j=0; j < MSIZE; j++)
    for(k=0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

app-functions.c
void matmul(char *f1, char *f2, char *f3)
{
  getBlocks(f1, f2, f3, A, B, C);
  for (i = 0; i < A->rows; i++) {
    for (j = 0; j < B->cols; j++) {
      for (k = 0; k < A->cols; k++) {
        C->data[i][j] += A->data[i][k] * B->data[k][j];
      }
    }
  }
  putBlocks(f1, f2, f3, A, B, C);
}
2.1 Design
Master-Worker paradigm
[Figure: app.c runs as the master; through the middleware it reaches six workers, each running app-functions.c]
2.1 Design
Intermediate language concept: assembler code
  C, C++, … → Assembler → Processor execution
In GRIDSs
  C, C++, … → Workflow → Grid execution
The Execute generic interface
– Instruction set is defined by the user
– Single entry point to the runtime
– Allows easy building of programming language bindings (Java, Perl, Shell Script)
  • Easier technology adoption
2.2 User interface
Steps to program an application
– Task definition
  • Identify those functions/programs in the application that are going to be executed in the computational Grid
  • All parameters must be passed in the header (remote execution)
– Interface Definition Language (IDL)
  • For every task defined, identify which parameters are input/output files and which are input/output scalars
– Programming API: master and worker
  • Write the main program and the tasks using the GRIDSs API
2.2 User interface
Interface Definition Language (IDL) file
– CORBA-IDL like interface:
  • in/out/inout files
  • in/out/inout scalar values
– The functions listed in this file will be executed in the Grid

interface MATMUL {
  void matmul(in File f1, in File f2, inout File f3);
};
2.2 User interface
Programming API: master and worker
Master side (app.c)
– GS_On
– GS_Off
– GS_FOpen/GS_FClose
– GS_Open/GS_Close
– GS_Barrier
– GS_Speculative_End
Worker side (app-functions.c)
– GS_System
– gs_result
– GS_Throw
2.2 User interface
Task’s constraints and cost specification
– Constraints: specify the needs of a task (CPU, memory, architecture, software, …)
  • Build an expression in a constraint function (evaluated for every machine)
  • Example: other.Mem == 1024
– Cost: estimated execution time of a task (in seconds)
  • Useful for scheduling
  • Calculate it in a cost function
  • GS_GFlops / GS_Filesize may be used
  • An external estimator can also be called
  • Example: cost = operations / GS_GFlops();
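The cost expression can be sketched as a small C function. This is only an illustration: the helper `estimate_cost_seconds` and its parameters are hypothetical, and the machine rating that a real cost function would obtain from GS_GFlops() is passed in as a plain argument so that the sketch compiles on its own. It assumes the work is measured in billions of floating-point operations.

```c
#include <assert.h>

/* Hypothetical helper, not part of the GRIDSs API.
 * Estimated run time in seconds of a task that needs `giga_operations`
 * billions of floating-point operations on a machine rated at `gflops`.
 * In a GRIDSs cost function the rating would come from GS_GFlops(). */
double estimate_cost_seconds(double giga_operations, double gflops)
{
    assert(gflops > 0.0);   /* a machine must have a positive rating */
    return giga_operations / gflops;
}
```

A scheduler can compare the value returned for each candidate machine: the same task is estimated to run faster where the rating is higher.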
2.3 Programming comparison
Globus vs GRIDSs
Globus: Grid-aware, explicit parallelism

int main()
{
  rsl = "&(executable=/home/user/sim)(arguments=input1.txt output1.txt) (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input1.txt /home/user/input1.txt))(file_stage_out=/home/user/output1.txt gsiftp://bscgrid01.bsc.es/path/output1.txt)(file_clean_up=/home/user/input1.txt /home/user/output1.txt)";
  globus_gram_client_job_request("bscgrid02.bsc.es", rsl, NULL, NULL);

  rsl = "&(executable=/home/user/sim)(arguments=input2.txt output2.txt) (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input2.txt /home/user/input2.txt))(file_stage_out=/home/user/output2.txt gsiftp://bscgrid01.bsc.es/path/output2.txt)(file_clean_up=/home/user/input2.txt /home/user/output2.txt)";
  globus_gram_client_job_request("bscgrid03.bsc.es", rsl, NULL, NULL);

  rsl = "&(executable=/home/user/sim)(arguments=input3.txt output3.txt) (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input3.txt /home/user/input3.txt))(file_stage_out=/home/user/output3.txt gsiftp://bscgrid01.bsc.es/path/output3.txt)(file_clean_up=/home/user/input3.txt /home/user/output3.txt)";
  globus_gram_client_job_request("bscgrid04.bsc.es", rsl, NULL, NULL);
}
2.3 Programming comparison
Globus vs GRIDSs
void sim(File input, File output)
{
  char command[500];
  /* Build the command line for the external simulator */
  sprintf(command, "/home/user/sim %s %s", input, output);
  gs_result = GS_System(command);
}

int main()
{
  GS_On();
  sim("/path/input1.txt", "/path/output1.txt");
  sim("/path/input2.txt", "/path/output2.txt");
  sim("/path/input3.txt", "/path/output3.txt");
  GS_Off(0);
}
2.3 Programming comparison
DAGMan vs GRIDSs
DAGMan: explicit parallelism, no if/while clauses

JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D

[Figure: workflow graph in which A precedes B and C, and both B and C precede D]

GRIDSs:

int main()
{
  GS_On();
  task_A(f1, f2, f3);
  task_B(f2, f4);
  task_C(f3, f5);
  task_D(f4, f5, f6);
  GS_Off(0);
}
2.3 Programming comparison
Ninf-G vs GRIDSs
Ninf-G: Grid-aware, explicit parallelism

int main()
{
  grpc_initialize("config_file");
  grpc_object_handle_init_np("A", &A_h, "class");
  grpc_object_handle_init_np("B", &B_h, "class");
  for(i = 0; i < 25; i++) {
    grpc_invoke_async_np(A_h, "foo", &sid, f_in[2*i], f_out[2*i]);
    grpc_invoke_async_np(B_h, "foo", &sid, f_in[2*i+1], f_out[2*i+1]);
    grpc_wait_all();
  }
  grpc_object_handle_destruct_np(&A_h);
  grpc_object_handle_destruct_np(&B_h);
  grpc_finalize();
}

GRIDSs:

int main()
{
  GS_On();
  for(i = 0; i < 50; i++)
    foo(f_in[i], f_out[i]);
  GS_Off(0);
}
2.3 Programming comparison
VDL vs GRIDSs
VDL: no if/while clauses

DV trans1( a2=@{output:tmp.0}, a1=@{input:filein.0} );
DV trans2( a2=@{output:fileout.0}, a1=@{input:tmp.0} );

DV trans1( a2=@{output:tmp.1}, a1=@{input:filein.1} );
DV trans2( a2=@{output:fileout.1}, a1=@{input:tmp.1} );

...

DV trans1( a2=@{output:tmp.999}, a1=@{input:filein.999} );
DV trans2( a2=@{output:fileout.999}, a1=@{input:tmp.999} );

GRIDSs:

int main()
{
  char tmp[32], filein[32], fileout[32];
  GS_On();
  for(i = 0; i < 1000; i++) {
    /* Build the file names for iteration i */
    sprintf(tmp, "tmp.%d", i);
    sprintf(filein, "filein.%d", i);
    sprintf(fileout, "fileout.%d", i);
    trans1(tmp, filein);
    trans2(fileout, tmp);
  }
  GS_Off(0);
}
Outline
1. Introduction
2. Programming interface
3. Runtime
   3.1 Scientific contributions
   3.2 Developments
   3.3 Evaluation tests
4. Fault tolerance at the programming model level
5. Conclusions and future work
3.1 Scientific contributions
Runtime objectives
– Extract implicit parallelism in sequential applications
– Speed up execution using the Grid
Main requirement: Grid middleware
– Job management
– Data transfer
– Security
3.1 Scientific contributions
Apply computer architecture knowledge to the Grid (superscalar processor)
[Figure: a superscalar processor, whose operations take nanoseconds, side by side with the Grid, whose tasks take seconds, minutes or hours]
3.1 Scientific contributions
Data dependence analysis: allow parallelism
– Read after Write: task1(..., f1) followed by task2(f1, ...)
– Write after Read: task1(f1, ...) followed by task2(..., f1)
– Write after Write: task1(..., f1) followed by task2(..., f1)
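The three cases can be sketched in C as a classification over the direction (in, out, inout) with which two consecutive tasks use the same file. This is a minimal illustration rather than the runtime's actual code: the type and function names are hypothetical, and reporting the read-after-write case first when several apply is a choice made for this sketch.

```c
#include <assert.h>

/* Direction of a file parameter, as declared in the IDL file. */
typedef enum { DIR_IN, DIR_OUT, DIR_INOUT } Direction;

/* Kind of dependence of a later task on an earlier one. */
typedef enum { DEP_NONE, DEP_RAW, DEP_WAR, DEP_WAW } Dependence;

static int reads(Direction d)  { return d == DIR_IN  || d == DIR_INOUT; }
static int writes(Direction d) { return d == DIR_OUT || d == DIR_INOUT; }

/* Classify the dependence between two tasks that use the same file,
 * `earlier` coming first in the sequential program. Read after write
 * is checked first because it is the only true data dependence. */
Dependence classify(Direction earlier, Direction later)
{
    if (writes(earlier) && reads(later))  return DEP_RAW;
    if (writes(earlier) && writes(later)) return DEP_WAW;
    if (reads(earlier)  && writes(later)) return DEP_WAR;
    return DEP_NONE;   /* two readers can run in parallel */
}
```

Two tasks that only read the file yield DEP_NONE, which is what lets them run in parallel.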
3.1 Scientific contributions
for(i=0; i < MSIZE; i++)
  for(j=0; j < MSIZE; j++)
    for(k=0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

i = 0, j = 0
  k = 0: matmul(A(0,0), B(0,0), C(0,0))
  k = 1: matmul(A(0,1), B(1,0), C(0,0))
  k = 2: matmul(A(0,2), B(2,0), C(0,0))
i = 0, j = 1
  k = 0: matmul(A(0,0), B(0,1), C(0,1))
  k = 1: matmul(A(0,1), B(1,1), C(0,1))
  k = 2: matmul(A(0,2), B(2,1), C(0,1))
...
3.1 Scientific contributions
for(i=0; i < MSIZE; i++)
  for(j=0; j < MSIZE; j++)
    for(k=0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

i = 0, j = 0
  k = 0: matmul(A(0,0), B(0,0), C(0,0))
  k = 1: matmul(A(0,1), B(1,0), C(0,0))
  k = 2: matmul(A(0,2), B(2,0), C(0,0))
i = 0, j = 1
  k = 0: matmul(A(0,0), B(0,1), C(0,1))
  k = 1: matmul(A(0,1), B(1,1), C(0,1))
  k = 2: matmul(A(0,2), B(2,1), C(0,1))
i = 0, j = 2
  ...
i = 1, j = 0
i = 1, j = 1
i = 1, j = 2
...
3.1 Scientific contributions
File renaming: increase parallelism
– Read after Write (unavoidable): task1(..., f1) followed by task2(f1, ...)
– Write after Read (avoidable): task1(f1, ...) followed by task2(..., f1), which becomes task2(..., f1_NEW)
– Write after Write (avoidable): task1(..., f1) followed by task2(..., f1), which becomes task2(..., f1_NEW)
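A minimal C sketch of the renaming idea follows. It assumes a simple scheme in which every new writer of a file receives a fresh versioned name, so it never has to wait for earlier readers or writers of the previous version. The `RenamedFile` type, the function names and the `_vN` suffix are all hypothetical (the slide only shows `f1_NEW`), and this is not the runtime's actual implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 64

/* Hypothetical bookkeeping for one logical file. */
typedef struct {
    char base[MAX_NAME];   /* name used in the sequential program */
    int  version;          /* 0 means the original file */
} RenamedFile;

void renamed_init(RenamedFile *f, const char *name)
{
    snprintf(f->base, sizeof f->base, "%s", name);
    f->version = 0;
}

/* Name that a reader of the latest version must use. */
void renamed_current(const RenamedFile *f, char *out, size_t n)
{
    if (f->version == 0)
        snprintf(out, n, "%s", f->base);
    else
        snprintf(out, n, "%s_v%d", f->base, f->version);
}

/* A new writer gets a fresh name, which removes the write-after-read
 * and write-after-write dependences on the previous version. */
void renamed_new_write(RenamedFile *f, char *out, size_t n)
{
    f->version++;
    renamed_current(f, out, n);
}
```

Read-after-write dependences are untouched by this scheme: a reader still has to wait for the writer of the version it reads.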
3.2 Developments
Basic functionality
– Job submission (middleware usage)
  • Select sources for input files
  • Submit, monitor or cancel jobs
  • Results collection
– API implementation
  • GS_On: read configuration file and environment
  • GS_Off: wait for tasks, cleanup remote data, undo renaming
  • GS_(F)Open: create a local task
  • GS_(F)Close: notify end of local task
  • GS_Barrier: wait for all running tasks to finish
  • GS_System: translate path
  • GS_Speculative_End: barrier until a throw. If a throw occurs, discard the tasks from the throw to GS_Speculative_End
  • GS_Throw: use gs_result to notify it
3.2 Developments
Task scheduling: Directed Acyclic Graph
[Figure: the task graph is scheduled onto the Grid through the middleware]
3.2 Developments
Task scheduling: resource brokering
– A resource broker is needed (but not an objective)
– Grid configuration file
  • Information about hosts (hostname, limit of jobs, queue, working directory, quota, …)
  • Initial set of machines (can be changed dynamically)

<?xml version="1.0" encoding="UTF-8"?>
<project isSimple="yes" masterBandwidth="100000" masterBuildScript="" masterInstallDir="/home/rsirvent/matmul-master" masterName="bscgrid01.bsc.es" masterSourceDir="/datos/GRID-S/GT4/doc/examples/matmul" name="matmul" workerBuildScript="" workerSourceDir="/datos/GRID-S/GT4/doc/examples/matmul">
...
<workers>
<worker Arch="x86" GFlops="5.985" LimitOfJobs="2" Mem="1024" NCPUs="2" NetKbps="100000" OpSys="Linux" Queue="none" Quota="0" deploymentStatus="deployed" installDir="/home/rsirvent/matmul-worker" name="bscgrid01.bsc.es">
3.2 Developments
Task scheduling: resource brokering
– Scheduling policy
  • Estimation of total execution time of a single task: $t = \mathit{FileTransferTime} + \mathit{ExecutionTime}$
  • FileTransferTime: time to transfer needed files to a resource (calculated with the hosts information and the location of files)
    – Select fastest source for a file
  • ExecutionTime: estimation of the task’s run time in a resource, given by an interface function (it can be calculated, or estimated by an external entity)
    – Select fastest resource for execution
  • Smallest estimation is selected
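The policy of choosing the smallest estimate can be sketched as follows. The `Estimate` struct and `select_resource` function are hypothetical names used only for illustration, and the sketch assumes both times have already been estimated, in seconds, for every candidate resource.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-resource estimate for one task, in seconds. */
typedef struct {
    double file_transfer_time;  /* time to bring the needed files */
    double execution_time;      /* estimated run time on the resource */
} Estimate;

/* Returns the index of the candidate with the smallest
 * file_transfer_time + execution_time, or -1 if there are none.
 * Ties are resolved in favor of the first candidate. */
int select_resource(const Estimate *candidates, size_t n)
{
    int best = -1;
    double best_total = 0.0;
    for (size_t i = 0; i < n; i++) {
        double total = candidates[i].file_transfer_time
                     + candidates[i].execution_time;
        if (best < 0 || total < best_total) {
            best = (int)i;
            best_total = total;
        }
    }
    return best;
}
```

With estimates of (30, 100), (5, 110) and (0, 140) seconds, the second resource is selected: it is not the fastest machine, but its lower transfer time gives the smallest total.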
3.2 Developments
Task scheduling: resource brokering
– Match task constraints and machine capabilities
– Implemented using the ClassAd library
  • Machine: offers capabilities (from Grid configuration file: memory, architecture, …)
  • Task: demands capabilities
– Filter candidate machines for a particular task
  • Example: a task demanding Software = BLAST matches a machine offering SoftwareList = BLAST, GAMESS, but not a machine offering SoftwareList = GAMESS
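The filtering step in the BLAST example can be sketched in plain C. The runtime relies on the ClassAd library for this matching; the sketch below does not use ClassAd at all and only imitates the software check with a hypothetical helper, `offers_software`, over a comma-separated list.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the ClassAd match: returns 1 if `wanted`
 * is one of the items of the comma-separated `list`, 0 otherwise.
 * Spaces around the commas are ignored. */
int offers_software(const char *list, const char *wanted)
{
    size_t wlen = strlen(wanted);
    const char *p = list;

    assert(wlen > 0);
    while (*p != '\0') {
        /* Skip separators before the next item. */
        while (*p == ' ' || *p == ',')
            p++;
        size_t len = strcspn(p, ",");
        /* Ignore trailing spaces of the item. */
        size_t trimmed = len;
        while (trimmed > 0 && p[trimmed - 1] == ' ')
            trimmed--;
        if (trimmed == wlen && strncmp(p, wanted, wlen) == 0)
            return 1;
        p += len;
    }
    return 0;
}
```

A machine is kept as a candidate for the task only when the function returns 1 for the software the task demands.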
3.2 Developments
Task scheduling: File locality
$t = \mathit{FileTransferTime} + \mathit{ExecutionTime}$
[Figure: tasks exchanging the files f1, f2 and f3 through the middleware]
3.2 Developments
Other file locality exploitation mechanisms
– Shared input disks
  • NFS or replicated data
– Shared working directories
  • NFS
– Erasing unused versions of files (decrease disk usage)
– Disk quota control (locality increases disk usage and quota may be lower than expected)
3.3 Evaluation
– NAS Grid Benchmarks: representative benchmark, includes different types of workflows which emulate a wide range of Grid applications
– Simple optimization example: representative of optimization algorithms, workflow with two-level synchronization
– New product and process development: production application, workflow with parallel chains of computation
– Potential energy hypersurface for acetone: massively parallel, long running application
– Protein comparison: production application, big computational challenge, massively parallel, high number of tasks
– fastDNAml: well-known application in the context of MPI for Grids, workflow with synchronization steps
3.3 Evaluation
NAS Grid Benchmarks
[Figure: workflows of the four NAS Grid Benchmarks, ED (Embarrassingly Distributed), HC (Helical Chain), VP (Visualization Pipeline) and MB (Mixed Bag), each built from SP, BT, MG, FT, LU and MF tasks between a Launch node and a Report node]
3.3 Evaluation
– Run with classes S, W, A (2 machines x 4 CPUs)
– VP benchmark must be analyzed in detail (does not scale up to 3 CPUs)
[Figure: speedup (0.00 to 3.00) against the limit of tasks (1 to 4) on Kadesh8, for MB.S, MB.W, VP.S, VP.W and VP.A]
3.3 Evaluation
Performance analysis
– GRID superscalar runtime instrumented
– Paraver tracefiles from the client side
– The lifecycle of all tasks has been studied in detail
Overhead of GRAM Job Manager polling interval
[Figure: Globus overhead (VP.W), time in seconds (0 to 45) per task number (1 to 15), split into Request to Active, Active to Done and Task duration]
3.3 Evaluation
VP.S task assignment
– 14.7% of the transfers when exploiting locality
– VP is parallel, but its last part is sequentially executed
[Figure: assignment of the BT, MF, MG and FT tasks to the machines Kadesh8 and Khafre, with the remote file transfers between them]
3.3 Evaluation
Conclusion: workflow and granularity are important to achieve speed up
3.3 Evaluation
Two-dimensional potential energy hypersurface for acetone as a function of two angles
3.3 Evaluation
– Number of executed tasks: 1120
– Each task between 45 and 65 minutes
– Speed up: 26.88 (32 CPUs), 49.17 (64 CPUs)
– Long running test, heterogeneous and transatlantic Grid
[Figure: the Grid used in the test, with sites of 22, 14 and 28 CPUs]
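Dividing each reported speed up by the number of CPUs gives the parallel efficiency. This is a derived figure that does not appear in the slides; it is worked out here only to put the two speed ups on a common scale:

```latex
E_{32} = \frac{26.88}{32} = 0.84,
\qquad
E_{64} = \frac{49.17}{64} \approx 0.77
```

Under this reading, doubling the number of CPUs lowers the efficiency by about seven percentage points.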
3.3 Evaluation
15 million protein sequences have been compared using BLAST and GRID superscalar
[Figure: 15 million proteins from genomes compared against 15 million proteins]
3.3 Evaluation
– 100,000 tasks in 4,000 CPUs (= 1,000 exclusive nodes)
– “Grid” of 1,000 machines with very low latency between them
  • Stress test for the runtime
– Frees the user from working with the queuing system
– Saves the queuing system from handling a huge set of independent tasks
GRID superscalar: programming interface and runtime
Publications
Raül Sirvent, Josep M. Pérez, Rosa M. Badia, Jesús Labarta, "Automatic Grid workflow based on imperative programming languages", Concurrency and Computation: Practice and Experience, John Wiley & Sons, vol. 18, no. 10, pp. 1169-1186, 2006.
Rosa M. Badia, Raul Sirvent, Jesus Labarta, Josep M. Perez, "Programming the GRID: An Imperative Language-based Approach", Engineering The Grid: Status and Perspective, Section 4, Chapter 12, American Scientific Publishers, January 2006.
Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela and Rogeli Grima, "Programming Grid Applications with GRID Superscalar", Journal of Grid Computing, Volume 1, Issue 2, 2003.
GRID superscalar: programming interface and runtime
Work related to standards
R.M. Badia, D. Du, E. Huedo, A. Kokossis, I. M. Llorente, R. S. Montero, M. de Palol, R. Sirvent, and C. Vázquez, "Integration of GRID superscalar and GridWay Metascheduler with the DRMAA OGF Standard", Euro-Par, 2008.
Raül Sirvent, Andre Merzky, Rosa M. Badia, Thilo Kielmann, "GRID superscalar and SAGA: forming a high-level and platform-independent Grid programming environment", CoreGRID Integration Workshop. Integrated Research in Grid Computing, Pisa (Italy), 2005.
Outline
1. Introduction
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
   4.1 Checkpointing
   4.2 Retry mechanisms
   4.3 Task replication
5. Conclusions and future work
4.1 Checkpointing
Inter-task checkpointing
Recovers sequential consistency in the out-of-order execution of tasks
– Single version of every file is saved
– No need to save any data structures in the runtime
Drawback: some completed tasks may be lost
– Application-level checkpoint can avoid this
[Figure: a sequence of tasks numbered 0 to 6]
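One way to picture the drawback is the sketch below. It assumes, as an interpretation of the slide rather than a description of the actual runtime, that tasks are numbered in sequential program order and that a restart resumes from the first task that has not finished, so any task that finished out of order beyond that point must be executed again. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* finished[i] is nonzero when task i (in sequential program order)
 * has completed. Returns the number of leading finished tasks, which
 * is also the index of the first task to run again after a restart. */
size_t checkpoint_frontier(const int *finished, size_t n)
{
    size_t i = 0;
    while (i < n && finished[i])
        i++;
    return i;
}

/* Counts the tasks that completed out of order beyond the frontier:
 * under the assumption above, their work is lost on restart. */
size_t lost_tasks(const int *finished, size_t n)
{
    size_t lost = 0;
    for (size_t i = checkpoint_frontier(finished, n); i < n; i++)
        if (finished[i])
            lost++;
    return lost;
}
```

For seven tasks where tasks 0, 1, 2, 4 and 6 have finished, the restart point is task 3 and the work of tasks 4 and 6 is lost.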
4.1 Checkpointing
Conclusions
– Low complexity in order to checkpoint a task
  • ~1% overhead introduced
– Can deal with both application level errors and Grid level errors
  • Most important when an unrecoverable error appears
– Transparent for end users
4.2 Retry mechanisms
Automatic drop of machines
[Figure: machines reached through the middleware, some of them dropped]
4.2 Retry mechanisms
Soft and hard timeouts for tasks
[Figure: timelines of tasks submitted through the middleware, showing the soft timeout, the hard timeout, and the Success and Failure outcomes]
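The slide does not spell out what happens at each timeout, so the sketch below encodes one plausible reading as an explicit assumption: before the soft timeout the runtime simply waits, once the soft timeout has passed it checks whether the task is still alive, and once the hard timeout has passed it gives the task up and resubmits it. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical actions the runtime could take for a running task. */
typedef enum { TASK_WAIT, TASK_CHECK, TASK_RESUBMIT } TimeoutAction;

/* Assumed policy (not confirmed by the slides): wait before the soft
 * timeout, check the task after it, resubmit after the hard timeout.
 * All times are in seconds since the task was submitted. */
TimeoutAction timeout_action(double elapsed, double soft, double hard)
{
    assert(soft <= hard);
    if (elapsed >= hard)
        return TASK_RESUBMIT;
    if (elapsed >= soft)
        return TASK_CHECK;
    return TASK_WAIT;
}
```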
4.2 Retry mechanisms
Retry of operations
[Figure: requests to the middleware and system calls, each ending in Success or Failure, with failed operations being retried]
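A bounded retry loop is one simple way to implement this idea. The sketch below is generic C, not the runtime's code: the `Operation` callback type, `retry_operation` and the demo operation `fails_n_times` are hypothetical, and the limit on attempts is an assumption, since the slides do not say how many retries are made.

```c
#include <assert.h>

/* An operation that may fail: returns 0 on success, nonzero on failure. */
typedef int (*Operation)(void *ctx);

/* Runs `op` until it succeeds or `max_attempts` attempts have failed.
 * Returns the number of attempts used on success, or -1 on failure. */
int retry_operation(Operation op, void *ctx, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (op(ctx) == 0)
            return attempt;
    }
    return -1;
}

/* Demo operation: fails while the counter pointed to by ctx is
 * positive, decrementing it on each failure, then succeeds. */
int fails_n_times(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

An operation that fails twice succeeds on the third attempt, while one that keeps failing is reported as failed once the attempts are used up, at which point the caller can fall back to another mechanism.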
4.2 Retry mechanisms
Conclusions
– Keep running despite failures
– Dynamic: when and where to resubmit
– Detects performance degradations
– No overhead when no failures are detected
– Transparent for end users
4.3 Task replication
Replicate running tasks depending on successors
[Figure: task graph with tasks 0 to 7, in which some of the running tasks 0, 1 and 2 are replicated]
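The selection of tasks to replicate can be sketched with a simple threshold on the number of successors. The threshold rule, the function name and the array layout are assumptions made for this illustration; the slides state only that replication depends on successors.

```c
#include <assert.h>
#include <stddef.h>

/* For each task i, running[i] is nonzero while it is executing and
 * successors[i] is the number of tasks that depend on it in the
 * workflow. Sets replicate[i] to 1 for running tasks with at least
 * `threshold` successors (assumed rule) and returns how many were
 * selected. */
size_t choose_replicas(const int *running, const int *successors,
                       size_t n, int threshold, int *replicate)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        replicate[i] = running[i] && successors[i] >= threshold;
        if (replicate[i])
            count++;
    }
    return count;
}
```

The rationale is that a failure in a task with many successors stalls a larger part of the workflow, so spending extra resources on a replica of that task is more likely to pay off.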
4.3 Task replication
Replicate running tasks to speed up the execution
[Figure: task graph with tasks 0 to 7, in which running tasks are replicated on other machines]
4.3 Task replication
Conclusions
– Dynamic replication: application level knowledge is used (the workflow)
– Replication can deal with failures, hiding the retry overhead
– Replication can speed up applications in heterogeneous Grids
– Transparent for end users
– Drawback: increased usage of resources
4. Fault tolerance features
Publications
Vasilis Dialinos, Rosa M. Badia, Raül Sirvent, Josep M. Pérez and Jesús Labarta, "Implementing Phylogenetic Inference with GRID superscalar", Cluster Computing and Grid 2005 (CCGRID 2005), Cardiff, UK, 2005.
Raül Sirvent, Rosa M. Badia and Jesús Labarta, "Graph-based task replication for workflow applications", Submitted, HPCC 2009.
Outline
1. Introduction
2. Programming interface
3. Runtime
4. Fault tolerance at the programming model level
5. Conclusions and future work
5. Conclusions and future work
Grid-unaware programming model
Transparent features for users, exploiting parallelism and failure treatment
Used in REAL systems and REAL applications
Some future research is already ONGOING (StarSs)
5. Conclusions and future work
Future work
– Grid of supercomputers (Red Española de Supercomputación)
– Higher scale tests (hundreds? thousands?)
– More complex brokering
  • Resource discovery/monitoring
  • New scheduling policies based on the workflow
  • Automatic prediction of execution times
– New policies for task replication
– New architectures for StarSs