Grid Application Toolkits
Abstracting the Grid for Application Programmers

Hartmut Kaiser [email protected]
http://www.cct.lsu.edu/~gallen/Teaching
9/6/2007


Outline

• Why do we need Grids?
• What Grid Services do we have today?
  – Globus Toolkit
• Why special application oriented toolkits?
• SAGA: A Simple API for Grid Applications

POTD: How to get tricked by ‘C’

• Non-terminated comment
  – a = b; /* this is a bug
    c = d; /* c = d will never happen */
  – Solution: use C++ style comments: // …
• Accidental assignment/Accidental Booleans
  – /* a always equals 2 and b will be always executed */
    if (a = 2) b;
  – Solution: if (2 == a) b;
  – /* this "boolean" is always true! */
    if (0 < a < 5) c;
  – /* this is compiled as (a = !b), rather than (a != b) or (a == !b) */
    if (a =! b) c;
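
The pitfalls above can be checked with a few small functions. This is a minimal sketch; the function names are made up for illustration, and the explicit casts only spell out what the compiler does implicitly in the slide's versions.

```cpp
// How the compiler parses `0 < a < 5`: left to right.
// (0 < a) yields a bool, which converts to 0 or 1, and both are less than 5.
inline bool chained_range_check(int a) {
    return static_cast<int>(0 < a) < 5;   // same value as `0 < a < 5`
}

// What was most likely meant: two comparisons joined with &&.
inline bool intended_range_check(int a) {
    return 0 < a && a < 5;
}

// `if (a = 2)` assigns first, then tests the assigned value (2, i.e. true).
inline bool assignment_as_condition(int& a) {
    return static_cast<bool>(a = 2);      // a is 2 afterwards
}

// `a =! b` is tokenized as `a = !b`: b is negated and the result assigned to a.
inline int assign_not(int& a, int b) {
    a =! b;
    return a;
}
```

Most compilers can warn about these constructs when warnings are enabled, which is a cheaper safeguard than relying on style rules alone.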

Outline

• Why do we need Grids?
• What Grid Services do we have today?
  – Globus Toolkit
• Why special application oriented toolkits?
• SAGA: A Simple API for Grid Applications

Concrete Example

• I have a source file Main.F on machine A, an input file on machine B. Main.F is written using MPI, it will need around 4GB of core memory to run, it will take several hours to complete, and will produce a large output file.
• What functionality do we need?
• What resource is usable to run this?

Issues

• How to select a machine to run it on?
• How to provide an executable which can run on that machine?
• How to move the input file?
• How to start the executable?
• How to monitor the job? When does it start? When does it finish?
• How to move the output file back?
• What about security?
• How do we know if it didn’t work and how it failed?

How to Select a Machine

• What properties of a machine are we interested in?
  – What resources does my executable require?
    • 4 GB memory, “several hours of compute time”
    • Enough disk space for the output
  – What kind of environment do I need on the machine?
    • OS limitations?
    • MPI? (Which version?), Fortran?
  – What resources am I authorized to run on?
  – How quickly will it run?
  – How much will it cost/what is my allocation there?
  – How to find all this information? What should the user provide?
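
The selection questions above can be read as a matching problem: filter machines by hard requirements, then rank what is left. The following is only a sketch under that assumption. The `job_requirements` and `machine` structs, their fields, and the "cheapest wins" ranking are all hypothetical and do not come from any grid toolkit.

```cpp
#include <string>
#include <vector>

// Hypothetical description of what the job needs.
struct job_requirements {
    double memory_gb;
    double wall_hours;
    double disk_gb;
    bool   needs_mpi;
};

// Hypothetical description of what a machine offers.
struct machine {
    std::string name;
    double memory_gb;
    double max_wall_hours;
    double free_disk_gb;
    bool   has_mpi;
    bool   authorized;      // am I allowed to run here?
    double cost_per_hour;   // charge against my allocation
};

// Hard constraints: every one of them must hold.
inline bool satisfies(const machine& m, const job_requirements& r) {
    return m.authorized
        && m.memory_gb      >= r.memory_gb
        && m.max_wall_hours >= r.wall_hours
        && m.free_disk_gb   >= r.disk_gb
        && (!r.needs_mpi || m.has_mpi);
}

// Return the cheapest machine that satisfies all requirements,
// or nullptr if none does.
inline const machine* select_machine(const std::vector<machine>& pool,
                                     const job_requirements& r) {
    const machine* best = nullptr;
    for (const machine& m : pool) {
        if (!satisfies(m, r)) continue;
        if (best == nullptr || m.cost_per_hour < best->cost_per_hour) best = &m;
    }
    return best;
}
```

In practice the hard part is the last question on the slide: the machine descriptions have to come from an information service, and the requirements from the user.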

More Complicated

• What if the program might need to read in data kept on machine C while it is running?
• What about distributing across processors on different machines?
• What if I have a lot of interconnected programs?
• How do I find the output file afterwards?
• What if it doesn’t work?

Questions

• What kind of functionality do we need?
• What tools exist to do this?
• What features of distributed computing do they need to be designed for?
• What design issues to watch for?

Abstract Requirements

• Single sign-on
• Job submission, monitoring and management
  – submit a job to a resource on the grid
  – monitor the progress of a submitted job
  – retrieve results
  – cancel job
• File transfer
  – move files from A to B, securely, reliably and efficiently
• Logical file management
  – Store a file on different machines, using different local names, access it using a single logical name
• Resource discovery
  – locate resources or services with particular characteristics
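
One way to read the job-related requirements is as an abstract interface that hides which middleware does the work. The sketch below is purely illustrative: `job_service`, `job_state`, and `fake_job_service` are invented names, and the in-memory implementation exists only to exercise the interface.

```cpp
#include <map>
#include <string>

enum class job_state { pending, running, done, cancelled };

// Hypothetical middleware-neutral interface for the job requirements.
class job_service {
public:
    virtual ~job_service() {}
    // Submit an executable to a named resource; returns a job id.
    virtual int submit(const std::string& executable,
                       const std::string& resource) = 0;
    // Monitor the progress of a submitted job.
    virtual job_state status(int job_id) const = 0;
    // Cancel a submitted job.
    virtual void cancel(int job_id) = 0;
};

// In-memory stand-in: it records jobs but never runs anything.
class fake_job_service : public job_service {
public:
    int submit(const std::string&, const std::string&) override {
        const int id = next_id_++;
        jobs_[id] = job_state::pending;
        return id;
    }
    job_state status(int job_id) const override {
        return jobs_.at(job_id);   // throws std::out_of_range for unknown ids
    }
    void cancel(int job_id) override {
        jobs_.at(job_id) = job_state::cancelled;
    }
private:
    int next_id_ = 1;
    std::map<int, job_state> jobs_;
};
```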


Basic Grid Model

Outline

• Why do we need Grids?
• What Grid Services do we have today?
  – Globus Toolkit
• Why special application oriented toolkits?
• SAGA: A Simple API for Grid Applications

What do I have to choose from?

• Globus Toolkit
  – version 2 is widely deployed; nearest thing to a de facto standard
  – horizontally integrated bag of tools
  – suits grid application developers better than end users
  – Brand new V4 based on web services
• UNICORE
  – less widely deployed; few UK deployments
  – vertically integrated
  – suits end users better than application developers
• Condor
  – high throughput computing
  – great for cycle harvesting
• Web Services?
  – GT4 or roll your own using Web Services tools
• Others
  – yes, there are others, a lot!

Generation Game

[Figure: grid technology generations plotted over time (x-axis: Time; y-axis: Increased functionality, standardization). Adapted from Ian Foster, GGF7 Plenary.]

• Labels on the curve: Custom solutions; Globus Toolkit, Condor, Unicore; De facto standards: GridFTP, GSI; X.509, LDAP, FTP, …; Open Grid Services Architecture, Web services; App-specific Services
• First group of annotations: Computationally intensive; File access/transfer; Bag of various heterogeneous protocols & toolkits; Monolithic design; Recognised internet, ignored Web; Academic teams
• Second group of annotations: Data and knowledge intensive; Open services-based architecture; Builds on Web services; GGF + OASIS + W3C; Multiple implementations; Global Grid Forum; Industry participation

UNICORE

• Packaged Software with GUI
• Open source
  – http://unicore.sourceforge.net/
• Designed for firewalls
• Strict security model
  – explicit delegation
• Abstract Job Object (AJO)
  – built-in workflow management
• Resource Broker
  – can submit to Globus grids
• Has notion of software resource
• Few APIs
  – extend through plug-ins
  – starting to expose service interfaces
• Serves the user

http://www.unicore.org/

Condor: High-throughput computing

Condor converts collections of workstations and clusters into a distributed high-throughput computing facility

• Emphasis on policy management and reliability
• High-throughput scheduler
• Supports job checkpoint and migration
  – single processor jobs only
• Remote system calls

Condor-G lets Condor users add Globus-enabled resources to their private view of a Condor pool ("flock")

• "glide-in"

http://www.cs.wisc.edu/condor/

Globus Toolkit V2

• GT2 “Implements Grid protocols for security, information discovery, resource management, data management, communication, fault detection and portability”
• Bag of tools rather than a uniform programming model, aims to provide distinct services with well defined APIs
• Assumes suitable software deployed on resources to provide basic fabric functionality (although some tools to help this are provided)

Globus Toolkit version 2

• "Single sign-on" through Grid Security Infrastructure (GSI)
• Remote execution of jobs
  – GRAM, job-managers, Resource Specification Language (RSL)
• Grid-FTP
  – Efficient, reliable file transfer; third-party file transfers
• MDS (Metacomputing Directory Service)
  – Resource discovery (GRIS and GIIS)
• Co-allocation (DUROC)
  – Limited by support from scheduling infrastructure
• Other GSI-enabled utilities
  – gsi-ssh, grid-cvs, etc.
• Low-level APIs and command-line interfaces
• Commodity Grid Kits (CoG-kits), Java, Perl, Python
• Widespread deployment, lots of projects

[Figure: layered diagram with the labels Applications, Diverse global services, Core services, Local OS]

Web Services

• A Web service is a software system designed to support interoperable machine-to-machine interaction over a network.
• It has an interface that is described in a machine-processable format such as WSDL.
• Other systems interact with the Web service in a manner prescribed by its interface using messages (usually enclosed in a SOAP envelope).
• These messages are typically conveyed using HTTP, and are normally comprised of XML.
• Software applications written in various programming languages and running on various platforms can use web services to exchange data over networks.

Web Services

• This interoperability (e.g., between Java and Python, or Windows and Linux applications) is due to the use of open standards.
• OASIS and the W3C are the primary committees responsible for the architecture and standardization of web services.
• Specifications for additional features under development.
• Basically: Web service = TRANSPORT (HTTP) + MESSAGING (SOAP) + DESCRIPTION (WSDL) + DISCOVERY (UDDI) + MESSAGE (XML)

Service Oriented Architecture

• Components are defined by service interfaces (e.g. Web Services)
• Characterized by:
  – Abstract logical view of programs, databases etc
  – Services defined by exchanged messages (not by properties of the agents themselves)
  – Internal structure of agent is not relevant (can accommodate legacy systems)
  – Services defined by machine processable meta data (documented semantics)
  – Small number of operations
  – Services oriented towards network usage
  – Platform neutral (e.g. messages in XML)

Open Grid Services Architecture

• Resulted from attempt to standardize GT protocols, influenced by uptake of web services and SoA ideas:
  – Modularize components for different grid functions
  – Uniform treatment of network entities (service orientation)
  – Standard IDLs aligned with Web services
  – Develop within standards body (Global Grid Forum)

Open Grid Services Architecture

• Grid Service
  – A web service which is extended to include transient and stateful services
• OGSI specification
  – Open Grid Services Infrastructure
  – Defines interfaces, behaviours and conventions for grid services
  – Now replaced by range of web service definitions
• OGSA defines services and interfaces required in a working grid environment
  – GGF working groups are identifying required functions and then making OGSI compliant interfaces
• Multiple implementations
  – GT3: reference implementation of OGSI and basic OGSA services
  – GT4: pure web services

Globus Toolkit V4 (GT4)

• Released April 2005, today V4.2.x
• Service oriented architecture
• Web services to describe and invoke most components
• GT4 web service containers for deploying and managing GT4 services (Java, C, Python)
• Most interfaces still need to be standardized
• Still exposes GT2 interfaces

Outline

• Why do we need Grids?
• What Grid Services do we have today?
  – Globus Toolkit
• Why special application oriented toolkits?
• SAGA: A Simple API for Grid Applications

Realities of Grids

• Different hardware
  – Chipset
  – Architecture (desktops, beowulf clusters, SMP, etc.)
• Different software
  – Operating system, compilers
  – Libraries, software stack
  – Middleware service packages and versions
• Different administrative policies
  – Access policies, quotas, upgrade policies
• Highly dynamic at all levels
  – without notifying you as the user

Levels of dynamicity

• Everything is static
  – Fixed set of dependencies
  – Everything is known beforehand and guaranteed to be there
  – Example: desktop systems
    • Applications usually are not prepared for failures of the environment (the desktop)
• Polite environments
  – Within a single VM, you have a number of services written by a number of different people that may be coming and going independently of each other
  – Java OSGi: one universal middleware, lots of services
• Impolite (distributed) environments
  – Networks get involved
  – Things may disappear without saying good-bye, no reliability
  – Network failures, temporary outages, and you don't have full information about the cause
    • Difficult to handle, requires lots of effort to be done right
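
One common way to cope with temporary outages in an impolite environment is to retry a failed operation with a growing delay. The helper below is a minimal sketch of that idea, not something prescribed by the lecture; the name `retry_with_backoff` and its parameters are made up for illustration.

```cpp
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Call `op` until it succeeds or `max_attempts` calls have failed.
// The pause between attempts starts at `initial_delay` and doubles each time.
// When the last attempt fails, its exception is passed on to the caller.
template <typename T>
T retry_with_backoff(const std::function<T()>& op,
                     int max_attempts,
                     std::chrono::milliseconds initial_delay) {
    if (max_attempts < 1) {
        throw std::invalid_argument("max_attempts must be at least 1");
    }
    std::chrono::milliseconds delay = initial_delay;
    for (int attempt = 1; ; ++attempt) {
        try {
            return op();
        } catch (const std::exception&) {
            if (attempt >= max_attempts) throw;   // give up, report last failure
        }
        std::this_thread::sleep_for(delay);
        delay *= 2;
    }
}
```

Retrying only helps with transient failures. Since the cause is often unknown, as the slide notes, a real application also needs a bound on total time and a way to report the final error.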

100$ Question

• What does this mean for application programmers?
  – Problems: (multifold and often cited - amongst others)
    • Interfaces are NOT simple (see next slides …)
    • Different and evolving interfaces to the ’Grid’
    • Environment changes in many ways
  – Globus, Unicore, Condor, my_service, your_service, … The same functionality has different interfaces all over the place.
  – Restricting yourself to Globus does not help either: version changes every couple of months (2.4.x, 3.2.y, 4.a.b)

100$ Question

The application that runs today will fail tomorrow!

Right now, it is basically impossible for a programmer to focus on the science, not on IT (i.e. Grid) problems.

Copy a File: Globus

    int RemoteFile::GetFile (char const* source_URL, char const* target) {
        globus_url_t                         source_url;
        globus_io_handle_t                   dest_io_handle;
        globus_ftp_client_operationattr_t    source_ftp_attr;
        globus_result_t                      result;
        globus_gass_transfer_requestattr_t   source_gass_attr;
        globus_gass_copy_attr_t              source_gass_copy_attr;
        globus_gass_copy_handle_t            gass_copy_handle;
        globus_gass_copy_handleattr_t        gass_copy_handleattr;
        globus_ftp_client_handleattr_t       ftp_handleattr;
        globus_io_attr_t                     io_attr;
        int                                  output_file = -1;

        if ( globus_url_parse (source_URL, &source_url) != GLOBUS_SUCCESS ) {
            printf ("can not parse source_URL \"%s\"\n", source_URL);
            return (-1);
        }

        if ( source_url.scheme_type != GLOBUS_URL_SCHEME_GSIFTP &&
             source_url.scheme_type != GLOBUS_URL_SCHEME_FTP    &&
             source_url.scheme_type != GLOBUS_URL_SCHEME_HTTP   &&
             source_url.scheme_type != GLOBUS_URL_SCHEME_HTTPS  ) {
            printf ("can not copy from %s - wrong prot\n", source_URL);
            return (-1);
        }

        globus_gass_copy_handleattr_init  (&gass_copy_handleattr);
        globus_gass_copy_attr_init        (&source_gass_copy_attr);
        globus_ftp_client_handleattr_init (&ftp_handleattr);
        globus_io_fileattr_init           (&io_attr);

        globus_gass_copy_attr_set_io (&source_gass_copy_attr,
                                      &io_attr);
        globus_gass_copy_handleattr_set_ftp_attr
                                     (&gass_copy_handleattr,
                                      &ftp_handleattr);
        globus_gass_copy_handle_init (&gass_copy_handle,
                                      &gass_copy_handleattr);

        if (source_url.scheme_type == GLOBUS_URL_SCHEME_GSIFTP ||
            source_url.scheme_type == GLOBUS_URL_SCHEME_FTP    ) {
            globus_ftp_client_operationattr_init (&source_ftp_attr);
            globus_gass_copy_attr_set_ftp (&source_gass_copy_attr,
                                           &source_ftp_attr);
        }
        else {
            globus_gass_transfer_requestattr_init (&source_gass_attr,
                                                   source_url.scheme);
            globus_gass_copy_attr_set_gass(&source_gass_copy_attr,
                                           &source_gass_attr);
        }

        output_file = globus_libc_open ((char*) target,
                                        O_WRONLY | O_TRUNC | O_CREAT,
                                        S_IRUSR  | S_IWUSR | S_IRGRP |
                                        S_IWGRP);
        if ( output_file == -1 ) {
            printf ("could not open the file \"%s\"\n", target);
            return (-1);
        }

        /* convert stdout to be a globus_io_handle */
        if ( globus_io_file_posix_convert (output_file, 0,
                                           &dest_io_handle)
             != GLOBUS_SUCCESS) {
            printf ("Error converting the file handle\n");
            return (-1);
        }

        result = globus_gass_copy_register_url_to_handle (
                     &gass_copy_handle, (char*)source_URL,
                     &source_gass_copy_attr, &dest_io_handle,
                     my_callback, NULL);

        if ( result != GLOBUS_SUCCESS ) {
            printf ("error: %s\n", globus_object_printable_to_string
                                       (globus_error_get (result)));
            return (-1);
        }

        globus_url_destroy (&source_url);
        return (0);
    }

Copy a File: CoG/RFT

    package org.globus.ogsa.gui;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.net.URL;
    import java.util.Date;
    import java.util.Vector;
    import javax.xml.rpc.Stub;
    import org.apache.axis.message.MessageElement;
    import org.apache.axis.utils.XMLUtils;
    import org.globus.*;
    import org.gridforum.ogsi.*;
    import org.gridforum.ogsi.holders.TerminationTimeTypeHolder;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class RFTClient {
      public static void copy (String source_url, String target_url) {
        try {
          File requestFile = new File (source_url);
          BufferedReader reader = null;
          try {
            reader = new BufferedReader (new FileReader (requestFile));
          } catch (java.io.FileNotFoundException fnfe) { }

          Vector requestData = new Vector ();
          requestData.add (target_url);

          TransferType[] transfers1 = new TransferType[transferCount];
          RFTOptionsType multirftOptions = new RFTOptionsType ();

          multirftOptions.setBinary (Boolean.valueOf (
              (String)requestData.elementAt (0)).booleanValue ());
          multirftOptions.setBlockSize (Integer.valueOf (
              (String)requestData.elementAt (1)).intValue ());
          multirftOptions.setTcpBufferSize (Integer.valueOf (
              (String)requestData.elementAt (2)).intValue ());
          multirftOptions.setNotpt (Boolean.valueOf (
              (String)requestData.elementAt (3)).booleanValue ());
          multirftOptions.setParallelStreams (Integer.valueOf (
              (String)requestData.elementAt (4)).intValue ());
          multirftOptions.setDcau(Boolean.valueOf(
              (String)requestData.elementAt (5)).booleanValue ());

          int i = 7;
          for (int j = 0; j < transfers1.length; j++)
          {
            transfers1[j] = new TransferType ();
            transfers1[j].setTransferId (j);
            transfers1[j].setSourceUrl ((String)requestData.elementAt (i++));
            transfers1[j].setDestinationUrl ((String)requestData.elementAt (i++));
            transfers1[j].setRftOptions (multirftOptions);
          }

          TransferRequestType transferRequest = new TransferRequestType ();
          transferRequest.setTransferArray (transfers1);

          int concurrency = Integer.valueOf
              ((String)requestData.elementAt(6)).intValue();

          if (concurrency > transfers1.length)
          {
            System.out.println ("Concurrency should be less than the number " +
                                "of transfers in the request");
            System.exit (0);
          }

          transferRequest.setConcurrency (concurrency);

          TransferRequestElement requestElement = new TransferRequestElement ();
          requestElement.setTransferRequest (transferRequest);

          ExtensibilityType extension = new ExtensibilityType ();
          extension = AnyHelper.getExtensibility (requestElement);

          OGSIServiceGridLocator factoryService = new OGSIServiceGridLocator ();
          Factory factory = factoryService.getFactoryPort (new URL (source_url));
          GridServiceFactory gridFactory = new GridServiceFactory (factory);

          LocatorType locator = gridFactory.createService (extension);
          System.out.println ("Created an instance of Multi-RFT");

          MultiFileRFTDefinitionServiceGridLocator loc
              = new MultiFileRFTDefinitionServiceGridLocator();
          RFTPortType rftPort = loc.getMultiFileRFTDefinitionPort (locator);
          ((Stub)rftPort)._setProperty (Constants.AUTHORIZATION,
                                        NoAuthorization.getInstance());
          ((Stub)rftPort)._setProperty (GSIConstants.GSI_MODE,
                                        GSIConstants.GSI_MODE_FULL_DELEG);
          ((Stub)rftPort)._setProperty (Constants.GSI_SEC_CONV,
                                        Constants.SIGNATURE);
          ((Stub)rftPort)._setProperty (Constants.GRIM_POLICY_HANDLER,
                                        new IgnoreProxyPolicyHandler ());

          int requestid = rftPort.start ();
          System.out.println ("Request id: " + requestid);
        }
        catch (Exception e)
        {
          System.err.println (MessageUtils.toString (e));
        }
      }
    }

Copy a File: a Dream?

    #include <saga.hpp>

    bool get_file(std::string source_url, std::string target_url)
    {
        try {
            // copy the file
            saga::file file(source_url);
            file.copy (target_url);
        }
        catch (saga::exception const &e) {
            std::cerr << "Some error: " << e.what() << std::endl;
            return false;
        }
        return true;
    }

• Wouldn’t it be nice to have so little code only?

Copy a File: SAGA/C++

    #include <saga.hpp>

    bool get_file(std::string source_url, std::string target_url)
    {
        try {
            // copy the file
            saga::file file(source_url);
            file.copy (target_url);
        }
        catch (saga::exception const &e) {
            std::cerr << "Some error: " << e.what() << std::endl;
            return false;
        }
        return true;
    }

Outline

• Why do we need Grids?
• What Grid Services do we have today?
  – Globus Toolkit
• Why special application oriented toolkits?
• SAGA: A Simple API for Grid Applications


Basic Grid Model

Simple API for Grid Applications

• OGF standardization effort
  – Intends to simplify the development of grid-enabled applications by standardizing syntax and semantics of a Grid API
  – Even for scientists with no background in computer science, or grid computing
  – Interesting, challenging problem due to dynamic environment of today’s Grids
  – 80/20 rule: maximal abstraction for minimal programming effort for the user
• Simple and Standard are the keywords

Simplicity!

The key objective for application programmers

Simple APIs should:
• Be easy to use:
  – Simple, finite, consistent API which allows error tracing
• Be invariant: make upgrades really, really simple
  – Well defined API which rarely changes.
  – Implementation which allows dynamic exchange of key elements and provides runtime abstractions.
• Avoid refactoring/recoding/recompilation
  – Same application runs today and tomorrow; here and there; on Globus and Unicore; on Globus 2.2.4 and Globus 4.2; on Linux and on Mac; local and on the Grid
• Focus on well-known programming paradigms (e.g., for a file: provide a file API – without services to services to files…)
  – Files are best example: expect open, close, read, write, seek, copy, move, delete. Do not introduce fancy things like the need to ask a service discovery service to tell me the location of a service which is able to tell me the location of my file…

SAGA API Scope

• Remote Files
• Logical Files
• Streams
• Job Management
• Monitoring and Events
• Information Management and Exchange
• Utility classes (error handling, security, preferences...)

Example: File Access

• Synchronous file access

    saga::file f ("gridftp://remote.host.net/data/data.bin");
    char buf[100];
    if (f.get_size() >= 223)
    {
        f.seek (123, saga::file::current);
        f.read (buf, sizeof(buf));
    }

• Read 100 bytes from a given remote file starting at byte position 123
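
For comparison, the same seek-then-read pattern on a local file needs only the C++ standard library. The sketch below is a local analogue, not SAGA code; the helper name `read_range` is made up for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read up to `count` bytes starting at byte `offset` of a local file.
// Returns fewer bytes (possibly none) if the file cannot be opened
// or is too short.
inline std::vector<char> read_range(const std::string& path,
                                    std::size_t offset,
                                    std::size_t count) {
    std::vector<char> buf;
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return buf;

    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    if (!in) return buf;

    buf.resize(count);
    in.read(buf.data(), static_cast<std::streamsize>(count));
    buf.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was read
    return buf;
}
```

The point of the SAGA example is that the remote case reads almost the same as this local one, with only the URL changing.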

Example: Replicate a logical file

• Replicate a logical file

    try {
        saga::logicalfile lf("lfn://somehost/some_name");
        std::vector<saga::file> files = lf.get_files();
        files[0].copy(dest_url);
    }
    catch (saga::exception const &e) {
        std::cerr << "Some error: " << e.what() << std::endl;
        return e.result();
    }

• Access physical file based on abstract logical name
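
The idea behind a logical file is a catalog that maps one logical name to several physical locations. The class below is a minimal in-memory sketch of that mapping. `replica_catalog` and its methods are hypothetical and say nothing about how SAGA or any replica service stores this information.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory replica catalog: logical name -> physical URLs.
class replica_catalog {
public:
    // Register one more physical copy of a logical file.
    void add_replica(const std::string& logical_name,
                     const std::string& physical_url) {
        replicas_[logical_name].push_back(physical_url);
    }

    // All known physical copies, in registration order.
    // Returns an empty list for an unknown logical name.
    std::vector<std::string> list_replicas(const std::string& logical_name) const {
        auto it = replicas_.find(logical_name);
        if (it == replicas_.end()) return std::vector<std::string>();
        return it->second;
    }

private:
    std::map<std::string, std::vector<std::string>> replicas_;
};
```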

Existing SAGA implementations

• C++ Implementation (LSU/VU)
  – Implements full API
  – plugin architecture (any middleware)
• Java Implementations:
  – DEISA/EPCC:
    • Partial API (Job and File package)
    • Binds to UNICORE only
  – NAREGI/NEC
    • Partial API (Job package)
    • Binds to NAREGI only
  – OMII
    • Implements full API
    • Binds to OMII services only


Architecture Overview

Overall Architecture

• Application binds to SAGA engine and packages at link time
• Adaptors are loaded at runtime by the engine depending on call context
• Adaptors get selected based on CPI group and preferences
• Packages get connected to adaptors on call by call basis

Overall Architecture

• Modular structure
  – SAGA engine is usable with different API implementations
  – Applications can load different package sets on demand
  – Extensible: user specific packages are possible

Extensibility

• Horizontal Extensibility – API Packages
  – Current packages:
    • file management, job management, remote procedure calls, replica management, data streaming, etc.
• Vertical Extensibility – Middleware Bindings
  – Different adaptors for different middleware
  – Set of ‘local’ adaptors
• Extensibility for Optimization and Features
  – Bulk optimization, modular design

Example: Bulk optimization

    std::vector<std::string> files = …initialize with list of url’s…;
    saga::task_container tc;

    // create file copy tasks
    for (std::size_t i = 0; i < files.size(); ++i)
    {
        saga::file f (files[i]);
        tc.add (f.copy<saga::Task>(...destination url...));
    }

    // run all tasks, then wait for all
    saga::run_wait(tc);   // bulk optimization is applied here.

Asynchronous API

• SAGA API is asynchronous by design
  – We decided to implement every API function in 3 different ways:
    • Synchronous
    • Asynchronous
    • Task based (same as asynchronous, but not started)
  – But on CPI level we expose only 2:
    • Synchronous
    • Task based


Asynchronous API

• Packages
  – Implement the corresponding binding
  – Emulate missing CPI functions
    • An adaptor may implement only one kind of CPI function, synchronous or asynchronous
    • An adaptor may base its asynchronous function solely on its own synchronous implementation by letting the engine wrap it in a task
  – Run task if asynchronous operation is required
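
The emulation step can be sketched as follows: a task object stores the adaptor's synchronous function and runs it only when asked. This is an illustration under simplifying assumptions, not the engine's implementation. The `task` template, the `state` enum, `sync_get_url` and `get_url_task` are hypothetical names, and everything runs on the calling thread, so the `Running` state is only ever visible from inside the wrapped function.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

enum class state { New, Running, Finished, Failed };

// Hypothetical task wrapper: holds a synchronous function and runs it on demand.
template <typename R>
class task {
public:
    explicit task(std::function<R()> sync_impl) : impl_(std::move(sync_impl)) {}

    void run() {
        assert(state_ == state::New);  // a task is started at most once
        state_ = state::Running;
        try {
            result_ = impl_();
            state_ = state::Finished;
        } catch (...) {
            state_ = state::Failed;
        }
    }

    state get_state() const { return state_; }

    const R& get_result() const {
        if (state_ != state::Finished) throw std::logic_error("no result available");
        return result_;
    }

private:
    std::function<R()> impl_;
    R result_{};
    state state_ = state::New;
};

// Stand-in for an adaptor's synchronous CPI function (hypothetical).
std::string sync_get_url() { return "any://host.net/data"; }

// Task-based flavor derived from the synchronous one: nothing runs yet.
task<std::string> get_url_task() { return task<std::string>(sync_get_url); }
```

An asynchronous flavor would call `run()` on another thread right after construction; that part is left out to keep the sketch free of threading details.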


Asynchronous API

• API directory::get_url()
  1. string s = dir.get_url();
     • Returns result directly
  2. task t = dir.get_url<saga::Sync>(s);
     • Returns finished task
     • Returns result indirectly
  3. task t = dir.get_url<saga::Async>(s);
     • Returns running task
     • Returns result indirectly
  4. task t = dir.get_url<saga::Task>(s);
     • Returns pending task
     • Returns result indirectly
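
One way to get several flavors of the same function from a template argument is tag dispatch. The sketch below is an assumption about how such an interface could be built, not the SAGA headers: the `Sync` and `Task` tags, the `task` struct and the `directory` class are simplified stand-ins, the result is stored inside the task rather than in an output argument, and the `Async` flavor is omitted because it needs a thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical mode tags, mirroring the template arguments on the slide.
struct Sync {};
struct Task {};

enum class state { New, Finished };

// Minimal stand-in for a task handle (not the real saga::task).
struct task {
    state st;
    std::string result;  // only meaningful once st == state::Finished

    void run() {
        assert(st == state::New);  // only a pending task can be started
        result = "any://host.net/data";
        st = state::Finished;
    }
};

class directory {
public:
    // Flavor 1: plain synchronous call, result returned directly.
    std::string get_url() const { return "any://host.net/data"; }

    // Flavors 2 and 4: the tag type selects the overload at compile time.
    template <typename Mode>
    task get_url() const { return make(Mode{}); }

private:
    task make(Sync) const { return task{state::Finished, get_url()}; }
    task make(Task) const { return task{state::New, ""}; }
};
```

Here `dir.get_url<Sync>()` hands back a task that is already finished, while `dir.get_url<Task>()` hands back a pending one whose result appears only after `run()`.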


Example: Task model

using namespace saga;

file f ("any://host.net//data/src.dat");

// normal sync version of the copy method
f.copy ("any://host.net//data/dest1.dat");

// the three task versions of the same method
task t1 = f.copy<task::Sync> ("any://host.net//data/dest2.dat");
task t2 = f.copy<task::ASync> ("any://host.net//data/dest3.dat");
task t3 = f.copy<task::Task> ("any://host.net//data/dest4.dat");

// task states of the returned saga::task
// t1 is in ’Finished’ or ’Failed’ state
// t2 is in ’Running’ state
// t3 is in ’New’ state

t3.run ();

t2.wait ();
t3.wait ();


Example: Combined Operations

• Asynchronous combined operations

saga::task task = f.copy <saga::task::Async> ("b");
saga::job job = js.run_job ("remote.host.net", "/bin/date");
saga::task_container tc;

tc.add (task);
tc.add (job);
tc.wait (); // wait for task and job to complete

• Copy file, run job and wait for both
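
A container can wait on a file copy and a job alike if both expose a common interface. The following is a minimal sketch of that design, with hypothetical types (`waitable`, `copy_task`, `remote_job`, and a simplified `task_container`); the `wait` bodies only mark the item as complete instead of talking to any middleware.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical common interface for everything a container can wait on.
struct waitable {
    virtual ~waitable() = default;
    virtual void wait() = 0;
    virtual bool done() const = 0;
};

// Stand-in for an asynchronous file copy.
struct copy_task : waitable {
    bool finished = false;
    void wait() override { finished = true; }  // pretend the copy completed
    bool done() const override { return finished; }
};

// Stand-in for a job running on a remote host.
struct remote_job : waitable {
    bool finished = false;
    void wait() override { finished = true; }  // pretend the job completed
    bool done() const override { return finished; }
};

// Simplified container: waits for every item, whatever its concrete type.
class task_container {
public:
    void add(std::shared_ptr<waitable> w) { items_.push_back(std::move(w)); }

    void wait() {
        for (auto& w : items_) {
            w->wait();
            assert(w->done());
        }
    }

private:
    std::vector<std::shared_ptr<waitable>> items_;
};
```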


Remote File Management

• Allows you to create, copy, move, and delete remote files and directories; additionally: open, close, read, write, seek, etc.

• File system hierarchy navigation: is_file, is_dir, is_link, change_dir, …


Logical File Management

• Handle several instances of the same file located on different machines using a single (logical) name
• Replicate to a new instance
• Associate metadata
• Search for files using metadata criteria
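
The data model behind these bullets can be sketched as a catalog that maps one logical name to several physical locations plus key/value metadata. This is an in-memory illustration only; `replica_catalog` and its member functions are hypothetical and are not the SAGA logical file API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory replica catalog.
class replica_catalog {
public:
    // Register one more physical copy under a logical name.
    void add_location(const std::string& logical, const std::string& url) {
        assert(!url.empty());
        entries_[logical].locations.push_back(url);
    }

    // Attach a metadata key/value pair to a logical name.
    void set_attribute(const std::string& logical, const std::string& key,
                       const std::string& value) {
        entries_[logical].meta[key] = value;
    }

    // All known physical copies of a logical file (empty if unknown).
    std::vector<std::string> list_locations(const std::string& logical) const {
        auto it = entries_.find(logical);
        if (it == entries_.end()) return std::vector<std::string>();
        return it->second.locations;
    }

    // Logical names whose metadata contains key == value, in name order.
    std::vector<std::string> find(const std::string& key,
                                  const std::string& value) const {
        std::vector<std::string> hits;
        for (const auto& e : entries_) {
            auto m = e.second.meta.find(key);
            if (m != e.second.meta.end() && m->second == value)
                hits.push_back(e.first);
        }
        return hits;
    }

private:
    struct entry {
        std::vector<std::string> locations;
        std::map<std::string, std::string> meta;
    };
    std::map<std::string, entry> entries_;
};
```

Replicating to a new instance then amounts to copying the data and calling `add_location` with the new URL.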


Job Management

• Submit a job to a remote resource
  – Specify additional parameters such as: requirements, file staging, arguments
• Monitor this job’s state
  – Running, Done, Cancelled, Failed
• Receive events if job state changes
• Migrate, clone or cancel a job
• Trigger checkpointing
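
State monitoring with change events can be sketched as a small state machine that notifies registered callbacks. The `job` class below is hypothetical and purely local; the initial `New` state is an assumption borrowed from the task model, and `finish` stands in for whatever the middleware reports when the job ends.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

enum class job_state { New, Running, Done, Cancelled, Failed };

// Hypothetical job handle with state-change notifications.
class job {
public:
    using callback = std::function<void(job_state)>;

    // Register a function that is called on every state change.
    void on_state_change(callback cb) { callbacks_.push_back(std::move(cb)); }

    job_state get_state() const { return state_; }

    void run() {
        assert(state_ == job_state::New);
        set_state(job_state::Running);
    }

    // Completion is only possible while the job is running.
    void finish() {
        if (state_ == job_state::Running) set_state(job_state::Done);
    }

    // A job can be cancelled before or while it runs, not afterwards.
    void cancel() {
        if (state_ == job_state::New || state_ == job_state::Running)
            set_state(job_state::Cancelled);
    }

private:
    void set_state(job_state s) {
        state_ = s;
        for (auto& cb : callbacks_) cb(s);
    }

    job_state state_ = job_state::New;
    std::vector<callback> callbacks_;
};
```

A caller that registers a callback, runs the job and then cancels it sees exactly two events, `Running` followed by `Cancelled`; a later `finish()` is ignored.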


Information Exchange

• Central (hierarchical) information storage (advert)
• Useful to exchange information between several jobs
• Store any type of information
• Associate metadata
• Search for stored information using metadata criteria
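
The hierarchical part of such a store can be sketched with path-like keys in an ordered map, where listing a subtree is a prefix scan. This is a local stand-in under stated assumptions: `advert_directory` and its member functions are hypothetical names, values are plain strings, and metadata search is left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory advert store keyed by absolute paths.
class advert_directory {
public:
    void store(const std::string& path, const std::string& value) {
        assert(!path.empty() && path[0] == '/');  // paths are absolute
        entries_[path] = value;
    }

    // Returns true and fills 'value' if the path exists.
    bool retrieve(const std::string& path, std::string& value) const {
        auto it = entries_.find(path);
        if (it == entries_.end()) return false;
        value = it->second;
        return true;
    }

    // All stored paths that start with 'prefix', e.g. "/jobs/".
    // Keys sharing a prefix are contiguous in an ordered map, so the scan
    // starts at lower_bound(prefix) and stops at the first non-matching key.
    std::vector<std::string> list(const std::string& prefix) const {
        std::vector<std::string> out;
        for (auto it = entries_.lower_bound(prefix);
             it != entries_.end() &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it)
            out.push_back(it->first);
        return out;
    }

private:
    std::map<std::string, std::string> entries_;
};
```

Two jobs could exchange a status this way: one calls `store("/jobs/1/status", ...)` and the other polls `retrieve` on the same path.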


Monitoring

• Monitor almost anything
• Monitor continuous events
• Monitor “event-like” events
• Allows SAGA application to be monitored
• …and more


Conclusions

• Grids are environments based on ‘impolite’ services

• Grid programming abstractions are needed for successful application writing

• SAGA is a high-level abstraction of Grids using well-known programming paradigms


Suggested Reading

• The Physiology of the Grid
  – http://www.globus.org/research/papers/ogsa.pdf
• Computational Methods in Science and Technology: SAGA - A Simple API for Grid Applications
  – http://saga.cct.lsu.edu/publications/SAGA_paper-a_simple_api_for_grid_applications_sc05.pdf
• Reading list assignments and projects will be discussed next Tuesday