web services load leveler enabling autonomic meta-scheduling in grid environments objective enable...

1
Web Services Load Leveler Enabling Autonomic Meta-Scheduling in Grid Environments Objective Enable autonomic meta-scheduling over different organizations Strategic Importance Enhance usability: common job control language to different resource domains Drive interoperability of schedulers: proprietary and open-source Provide integrated scheduling views for enterprise and grid customers Technology Benefits Meet various user service objectives: policy driven (e.g. capability based, response time based) Maximize resource availability to users with transparency of locations Optimize utilization of resources across domains I: Objectives Connection API • Establish and terminate connections between domain meta- schedulers. • Negotiate roles and connection parameters using the interface •Provider roles: provide resources for job execution; is responsible of sending out resource information •Consumer roles: use resources provided by providers; route job request to providers. • Send heart beats: exchanged to guarantee the healthy state of the connection. Resource exchange API • Exchange the scheduling capability and capacity of the domain controlled by the meta- scheduler •Exchanged information can be a complete or incremental set of data IV: System architecture Job Mgmt API Connection API Resource exchange API Job Management Resource Management Connection Management FIU Peer-to-peer II: P2P Meta scheduling TDWB IBM-USA TDWB IBM- India IBM Fork BSCgrid BSC SGE FIU-GCB Fork LAGrid FIU Meta- Scheduler Meta- Scheduler Meta- Scheduler Peer-to-peer Peer-to-peer III: Autonomic Meta scheduling Yanbin Liu 1 , S. Masoud Sadjadi 2 , Liana Fong 1 , Ivan Rodero 3 , David Villegas 2 , Selim Kalayci 2 , Norman Bobroff 1 , Juan Carlos Martinez 2 1: IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 2: Florida International University, Miami, FL 33199 3: Barcelona Supercomputing Center, Barcelona, Spain C P: Job flow is from C to P, resource info flow is from P to C Job Management Resource Management Connection Management IBM JSDL Resource Repository Conn Records Job Records LL Proxy User Client JSDL GCB Cluster LAGrid Cluster SGE Fork Some key aspects of the Metascheduler Protocol: • Heterogeneous sites; inner structure of domains doesn’t effect the functionality of the protocol. • Site autonomy; each metascheduler is responsible from its own site, and offers as much information as it wants to other sites. • Peer-to-peer; no centralized body, no single-point of failure. BSC BSCGrid Cluster Fork CEPBA Resources LoadLeveler JSDL GT4 Container eNANOS Resource Broker LAGrid RP WS Client LAGrid Plugin Apache Axis2 Server WS Interface Command-line Java API eNANOS Client LL/ Fork CEPBA WS Client Site scheduling manager Global Scheduling manager Resource manager Globus Globus Globus Globus Scheduler Job Management Resource Management Connection Management Gridway Sched Policy Meta-TDWB 1. User Client takes the job request from the local User. This request is forwarded to Global Scheduling Manager (GSM). 2. GSM queries the Resource Manager (RM) for resources. RM stores information about local and remote resources. 3. If available resources are found on local site, job request is forwarded to Site Scheduling Manager (SSM). 4. SSM leverages Gridway functionality to submit the job to the Grid Middleware (Globus). 5. If there are not available resources locally, job request is sent to a remote site through WS Client 6. Alternatively, job requests from other peers can be received from the WS layer. 1. The eNANOS Client forwards the user requests to the eNANOS Broker. 2. The remote request from the P2P infrastructure are managed by regular WS (Axis2) acting as a wrapper to a GT4 service that implements the LAGrid APIs and protocols. Connections and other data is stored in Resource Properties. 3. Jobs and resources (aggregated data) obtained from local and remote sites are used in the eNANOS Resource Broker scheduling. Jobs are executed under the local domain through Globus services, or are forwarded to other meta-scheduler. 4. eNANOS provide its resources data, forwards jobs and performs other operations (such as sending heart beats) through a WS Client. 1. Users send job requests in JSDL format to Meta- TDWB using either TDWB web console or command line. 2. The connection with other meta-schedulers is created and recorded. 3. The resource information from other meta-schedulers, local schedulers such as LoadLeveler and TDWB, and local resources are recorded. 4. Combined with chosen scheduling policy, scheduler matches job requests to resources. 5. Based on the allocated resources, job requests are either forwarded to other schedulers or Command-ine Web Console TDWB Resources Tivoli Agent • Use of a common protocol for communication • Initial negotiation phase • Meta scheduler roles • Heartbeat rate • Type of authentication • Quality of Service • Job execution • Based on user policies • Resource matching • Physical resources • Scheduler capabilities (FCFS, priority- based)

Upload: shona-jenkins

Post on 01-Jan-2016

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Web Services Load Leveler Enabling Autonomic Meta-Scheduling in Grid Environments Objective Enable autonomic meta-scheduling over different organizations

Web Services

Load Leveler

Enabling Autonomic Meta-Scheduling in Grid Environments

Objective• Enable autonomic meta-scheduling over different

organizations

Strategic Importance• Enhance usability: common job control language to different

resource domains• Drive interoperability of schedulers: proprietary and open-

source• Provide integrated scheduling views for enterprise and grid

customers

Technology Benefits• Meet various user service objectives: policy driven (e.g.

capability based, response time based)• Maximize resource availability to users with transparency of

locations• Optimize utilization of resources across domains

I: Objectives

Connection API• Establish and terminate connections

between domain meta-schedulers. • Negotiate roles and connection parameters

using the interface• Provider roles: provide resources for job

execution; is responsible of sending out resource information

• Consumer roles: use resources provided by providers; route job request to providers.

• Send heart beats: exchanged to guarantee the healthy state of the connection.

Resource exchange API• Exchange the scheduling capability and

capacity of the domain controlled by the meta-scheduler• Exchanged information can be a

complete or incremental set of data

Job management API• Submit, re-route and monitor job executions

across schedulers

IV: System architecture

Job Mgmt API

Connection API

Resource exchange API

Job Management

ResourceManagement

ConnectionManagement

FIU

Peer-to-peer

II: P2P Meta scheduling

TDWB

IBM-USA

TDWB

IBM-India

IBM

Fork

BSCgrid

BSC

SGE

FIU-GCB

Fork

LAGrid

FIU

Meta-Scheduler

Meta-Scheduler

Meta-Scheduler

Peer-to-peer

Peer-to-peer

III: Autonomic Meta scheduling

Yanbin Liu1, S. Masoud Sadjadi2, Liana Fong1, Ivan Rodero3, David Villegas2, Selim Kalayci2, Norman Bobroff1, Juan Carlos Martinez2

1: IBM T. J. Watson Research Center, Yorktown Heights, NY 105982: Florida International University, Miami, FL 33199

3: Barcelona Supercomputing Center, Barcelona, Spain

C P: Job flow is from C to P, resource info flow is from P to C

Job Management

ResourceManagement

ConnectionManagement

IBM

JSDL

Resource Repository

Conn Records

Job Records

LL Proxy

UserClient

JSDL

GCBCluster

LAGridCluster

SGE Fork

Some key aspects of the Metascheduler Protocol:

• Heterogeneous sites; inner structure of domains doesn’t effect the functionality of the protocol.

• Site autonomy; each metascheduler is responsible from its own site, and offers as much information as it wants to other sites.

• Peer-to-peer; no centralized body, no single-point of failure.

BSC

BSCGridCluster

Fork

CEPBAResources

LoadLeveler

JSDL

GT4 Container

eNANOSResource

Broker

LAGrid RP

WS ClientLAGrid

Plugin

ApacheAxis2 Server WS Interface

Command-line

Java API

eNANOSClient

LL/Fork

CEPBA

WS Client

Site scheduling manager

GlobalSchedulingmanager

Resourcemanager

Globus Globus

Globus

Globus

Scheduler

Job Management

ResourceManagement

ConnectionManagement

Gridway

Sched Policy

Meta-TDWB

1. User Client takes the job request from the local User. This request is forwarded to Global Scheduling Manager (GSM).2. GSM queries the Resource Manager (RM) for resources. RM stores information about local and remote resources. 3. If available resources are found on local site, job request is forwarded to Site Scheduling Manager (SSM). 4. SSM leverages Gridway functionality to submit the job to the Grid Middleware (Globus).5. If there are not available resources locally, job request is sent to a remote site through WS Client6. Alternatively, job requests from other peers can be received from the WS layer.

1. User Client takes the job request from the local User. This request is forwarded to Global Scheduling Manager (GSM).2. GSM queries the Resource Manager (RM) for resources. RM stores information about local and remote resources. 3. If available resources are found on local site, job request is forwarded to Site Scheduling Manager (SSM). 4. SSM leverages Gridway functionality to submit the job to the Grid Middleware (Globus).5. If there are not available resources locally, job request is sent to a remote site through WS Client6. Alternatively, job requests from other peers can be received from the WS layer.

1. The eNANOS Client forwards the user requests to the eNANOS Broker.2. The remote request from the P2P infrastructure are managed by regular WS (Axis2) acting as a wrapper to a GT4 service that implements the LAGrid APIs and protocols. Connections and other data is stored in Resource Properties.3. Jobs and resources (aggregated data) obtained from local and remote sites are used in the eNANOS Resource Broker scheduling. Jobs are executed under the local domain through Globus services, or are forwarded to other meta-scheduler.4. eNANOS provide its resources data, forwards jobs and performs other operations (such as sending heart beats) through a WS Client.

1. The eNANOS Client forwards the user requests to the eNANOS Broker.2. The remote request from the P2P infrastructure are managed by regular WS (Axis2) acting as a wrapper to a GT4 service that implements the LAGrid APIs and protocols. Connections and other data is stored in Resource Properties.3. Jobs and resources (aggregated data) obtained from local and remote sites are used in the eNANOS Resource Broker scheduling. Jobs are executed under the local domain through Globus services, or are forwarded to other meta-scheduler.4. eNANOS provide its resources data, forwards jobs and performs other operations (such as sending heart beats) through a WS Client.

1. Users send job requests in JSDL format to Meta-TDWB using either TDWB web console or command line.

2. The connection with other meta-schedulers is created and recorded.

3. The resource information from other meta-schedulers, local schedulers such as LoadLeveler and TDWB, and local resources are recorded.

4. Combined with chosen scheduling policy, scheduler matches job requests to resources.

5. Based on the allocated resources, job requests are either forwarded to other schedulers or dispatched to the resources directly.

1. Users send job requests in JSDL format to Meta-TDWB using either TDWB web console or command line.

2. The connection with other meta-schedulers is created and recorded.

3. The resource information from other meta-schedulers, local schedulers such as LoadLeveler and TDWB, and local resources are recorded.

4. Combined with chosen scheduling policy, scheduler matches job requests to resources.

5. Based on the allocated resources, job requests are either forwarded to other schedulers or dispatched to the resources directly.

Command-ine

Web Console

TDWBResources

TivoliAgent

• Use of a common protocol for communication

• Initial negotiation phase

• Meta scheduler roles

• Heartbeat rate

• Type of authentication

• Quality of Service

• Job execution

• Based on user policies

• Resource matching

• Physical resources

• Scheduler capabilities (FCFS, priority-based)