Faucets: Scheduling on Clusters and Across the Grid

LACSI 2003

Presenter: Sameer Kumar
Team: Sanjay Kalé, Sameer Kumar, Sindhura Bandhakavi, Justin Meyer
Parallel Programming Laboratory, Department of Computer Science
University of Illinois at Urbana-Champaign
http://charm.cs.uiuc.edu/


Outline

High-level description
  - Motivation
  - Faucets, cluster bartering
  - Adaptive jobs, adaptive queuing system (AQS)
  - Demo

Usage and installation
  1. How to write an adaptive program
  2. Installing and using the AQS
  3. Adding your cluster to an existing Faucets server
  4. Installing a Faucets server

Motivation

1. Demand for high-end compute power, but it is:
   - Dispersed: which machine would give me back my results quickest?
   - Hard to use:
     - Use ssh to log in, ftp files, decide on a queue, create a script, submit
     - Because of the hassle, users just submit the same script to the same machine even if a better alternative exists
     - Monitor a running job
2. Low operational efficiency of existing computing systems

Solution 1: Faucets

Motivation #1: dispersed, hard to use

- Central source of compute power
  - Users
  - Providers of compute resources
  - User account not needed on every resource
- Match users and providers
  - Market economy?
  - Cluster bartering
  - QoS requirements, contracts and bidding systems
- GUI or web-based interface
  - Submission
  - Monitoring

[Figure: Faucets diagram. Labels: Job Submission, File Upload, Job Specs, Bids, Job Id, Job Monitor, Faucets, and three clusters.]

http://charm.cs.uiuc.edu/research/faucets

Parallel systems need to maximize their efficiency!

Motivation #2: Inefficient Utilization

[Figure: a 16-processor system. Job A needs 10 processors and is allocated. Job B needs 8 processors, which conflicts with A's allocation, so B is queued.]

Current job schedulers can have low system utilization!
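The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rigid_schedule` is a hypothetical helper, not part of Faucets or any real scheduler, and it simply places fixed-size requests in arrival order whenever they fit.

```python
# Hypothetical helper for illustration only; not part of Faucets or the AQS.
def rigid_schedule(total_pes, requests):
    """Place fixed-size requests in arrival order; queue any that do not fit."""
    free = total_pes
    running, queued = [], []
    for name, pes in requests:
        if pes <= free:
            running.append(name)
            free -= pes
        else:
            queued.append(name)
    # Fraction of processors doing work once every request has been considered.
    utilization = (total_pes - free) / total_pes
    return running, queued, utilization


# The slide's scenario: 16 processors, Job A needs 10, Job B needs 8.
running, queued, utilization = rigid_schedule(16, [("A", 10), ("B", 8)])
print(running, queued, f"{utilization:.1%}")  # ['A'] ['B'] 62.5%
```

With non-adaptive jobs, 6 of the 16 processors sit idle while Job B waits in the queue.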

Solution: Adaptive Jobs

- Jobs that can shrink or expand the number of processors they are running on at runtime
- Improve system utilization and response time
- Properties
  - Min_pe, related to the memory requirements of the job
  - Max_pe, related to speedup
- Scheduler can take advantage of this adaptivity

Two Adaptive Jobs

[Figure: a 16-processor system running two adaptive jobs. Job A: Min_pe = 1, Max_pe = 10. Job B: Min_pe = 8, Max_pe = 16. Animation labels: "Allocate A!", "A Expands!", "Shrink A", "Allocate B!", "B Finishes".]

Adaptive Job Scheduler

- Maximize system utilization and minimize response time
- Scheduling decisions
  - Shrink existing jobs when a new job arrives
  - Expand jobs to use all processors when a job finishes
- Processor map sent to the job
  - Bit vector specifying which processors a job is allowed to use
  - Example: 00011100 (use processors 3, 4 and 5)
- Handles regular (non-adaptive) jobs
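A minimal sketch of how such a processor map could be read and written, in Python. The indexing convention is an assumption: bits are counted from 0 starting at the left, which is one reading consistent with the slide's example (00011100 means processors 3, 4 and 5). The function names are hypothetical and are not the AQS or Charm++ API.

```python
# Hypothetical helpers; the real scheduler's wire format is not shown in the slides.
def processors_from_map(bit_vector):
    """Return the processor indices whose bit is '1' (index 0 is the leftmost bit)."""
    return [i for i, bit in enumerate(bit_vector) if bit == "1"]


def map_from_processors(processors, total_pes):
    """Build the bit-vector string for a set of allowed processor indices."""
    allowed = set(processors)
    return "".join("1" if i in allowed else "0" for i in range(total_pes))


print(processors_from_map("00011100"))    # [3, 4, 5]
print(map_from_processors([3, 4, 5], 8))  # 00011100
```

Shrinking a job then amounts to sending it a map with fewer bits set, and expanding it to sending a map with more.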


System Overview

[Figure: system overview. Components shown: GUI client (or web browser), Faucets server, cluster daemon, and clusters, each with an adaptive queuing system and its PEs.]

GUI Client

[Figure: the system overview diagram with the client highlighted. Client options shown: GUI client, web browser, or command-line client.]

Secure Communication

- SSL communication
- Certificate for Faucets Server
  - Public key distributed on web page, in code
- One certificate for each cluster daemon (CD)
- Future: Globus

GUI Client

- One JAR file
- Runs on Win32 platform
- Faucets Server certificate included in code
- GUI client gets CD certificates from the central server (CS)

[Figure: Perf Monitor]
Adaptive Jobs

[Figure: the system overview diagram, with a local scheduler shown in each cluster.]

Adaptive Job Framework

- Applications written in AMPI or Charm++
- Scheduler controls the processor map for each job
- Processor map is used by the job's load balancer

[Figure: the scheduler passes the processor map (Proc. Map) to the adaptive application. Layers shown: AMPI, Charm++, load balancer, Converse.]

Charm++

- Charm++: object-based virtualization
- Program written as a large number of objects which can migrate
- Number of objects typically much larger than the number of processors
- Load balancer can remap objects
  - Measurement-based load balancing

Adaptive Charm++ Programs

- A Charm++ program is adaptive automatically if an adaptive load-balancing strategy is used
  - Currently CommLB and RandcentLB are adaptive
  - Compile with +balancer CommLB

MPI Jobs

- How do we make MPI jobs adaptive? AMPI
- AMPI maps the MPI processes to user-level threads which can migrate
- Each thread is embedded in a Charm++ object, thus allowing load balancing and shrink-expand

Writing Adaptive AMPI Programs

- Build AMPI with an adaptive load-balancing strategy
- Call MPI_MIGRATE() at regular intervals in each MPI process; otherwise the job will not respond to changes in the processor map
- Use specific load balancers

Shrink Expand Overhead

Performance for MD program with 10MB migrated data per processor on NCSA Platinum

Processors    Shrink Time (s)    Expand Time (s)
8 / 16        0.56               0.49
16 / 32       0.59               0.46
32 / 64       0.66               0.54
64 / 128      0.61               0.50

Adaptive Queuing System

[Figure: the system overview diagram, with the adaptive queuing system in each cluster highlighted.]

AQS Features

- Multithreaded
- Reliable and robust
- Tested on Linux clusters at UIUC
- Supports most features of standard queuing systems
- Has the ability to manage adaptive jobs, currently implemented in Charm++ and MPI
- For more details check out http://charm.cs.uiuc.edu/research/faucets/faucets.html

Components

- Database
- Job scheduler
- Compute cluster

Installing Database

- Download the latest version of MySQL: http://www.mysql.com/
- Install, then:

    mysql> create database <dbname>;
    mysql> use <dbname>;
    mysql> create table jobInfo (id mediumint primary key NOT NULL DEFAULT '0' auto_increment, …..)
    mysql> grant all on *.* to <user> identified by <passwd>;

Installing Scheduler

- Build the scheduler and the client:

    cd charm/net-linux/pgms/scheduler; make scheduler; make client;

- Edit the Makefile and put in the correct path to MySQL
- Running the scheduler as root:

    su
    chown root scheduler; chmod +s scheduler
    ./startScheduler

Installing Scheduler, contd.

- Edit the startScheduler file:
  - Edit Database to match the <dbname> used earlier
  - Edit PORT to point to the port of the scheduler
  - Edit DATABASE_HOST, DATABASE_USER and DATABASE_PASSWD to point to the database host, user and password
  - NODELIST points to the nodelist for the scheduler

Configuring the Cluster

- User must have access to the cluster only through the queuing system
- Each node runs an rsh daemon
- Access to rsh through a restrictive group
  - Job switches to the rsh group before running the job
  - Only the head node can rsh to the other nodes
  - rsh disabled on the compute nodes
- All connections through Unix sockets

Using the AQS Locally

- frun: runs a job interactively
- fsub: submits a batch job
- fkill: kills the job
- fjobs: lists the running and queued jobs

Scheduling

- Events: when scheduling takes place
  - Job arrival
  - Job completion
  - Job requests a change in its number of processors
  - Job suspension
- Scheduling strategy
  - A pluggable component that decides which jobs to schedule

Scheduling Strategy Studied

- Similar to equipartitioning [N Islam et al]
- On job arrival and job completion:
  - All running jobs and the new one are allocated their minimum number of processors
  - Leftover processors are shared equally, subject to each job's maximum processor usage
  - If it is not possible to allocate the new job its minimum number of processors, it is queued
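The three rules above can be sketched as a small Python function. This is an illustrative reading of the slide, not the AQS implementation: the function name, the `(name, min_pe, max_pe)` tuple layout, and the tie-breaking rule (any odd leftover processor goes to the job that arrived earlier) are all assumptions.

```python
# Illustrative sketch of the equipartition-like strategy; not the AQS source code.
def equipartition(total_pes, jobs):
    """jobs: list of (name, min_pe, max_pe), running jobs first, new job last.

    Returns (allocation, queued): processors per admitted job, and queued job names.
    """
    alloc, queued = {}, []
    limits = {name: max_pe for name, _, max_pe in jobs}
    free = total_pes

    # Rule 1 and rule 3: give each job its minimum, or queue it if that is impossible.
    for name, min_pe, _ in jobs:
        if min_pe <= free:
            alloc[name] = min_pe
            free -= min_pe
        else:
            queued.append(name)

    # Rule 2: hand out leftover processors one at a time, round-robin,
    # never exceeding a job's maximum.
    while free > 0:
        growable = [name for name in alloc if alloc[name] < limits[name]]
        if not growable:
            break
        for name in growable:
            if free == 0:
                break
            alloc[name] += 1
            free -= 1
    return alloc, queued


# Job A alone on 16 processors expands to its maximum of 10.
print(equipartition(16, [("A", 1, 10)]))                # ({'A': 10}, [])
# When Job B (min 8, max 16) arrives, A shrinks to make room.
print(equipartition(16, [("A", 1, 10), ("B", 8, 16)]))  # ({'A': 5, 'B': 11}, [])
```

Under these assumptions both jobs run and all 16 processors are used; the exact split in a real scheduler may differ.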

Scheduler Performance

λ = arrival rate, MRT = mean response time, Utilization = processor utilization, Load factor (lf) = execution time × λ

Simulation results on 64 processors with mean job execution time of 64.5 sec

                   Traditional Jobs              Adaptive Jobs
1/λ (s)   lf       Utilization (%)   MRT (s)     Utilization (%)   MRT (s)
60        1.08     76                488         92                164
64.5      1.0      71                396         88                143
100       0.65     46                233         60                96
200       0.32     23                185         31                76
500       0.13     9                 165         13                68
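The load factor column follows directly from the definition lf = execution time × λ. A short Python sketch of that arithmetic, using the mean execution time of 64.5 s stated on the slide and the 1/λ values listed in the results; the function and constant names are illustrative only.

```python
MEAN_EXECUTION_TIME = 64.5  # seconds, as stated for the simulation


def load_factor(mean_execution_time, mean_interarrival_time):
    """lf = execution time * λ, where λ = 1 / (mean time between job arrivals)."""
    arrival_rate = 1.0 / mean_interarrival_time
    return mean_execution_time * arrival_rate


# 1/λ values from the simulation results, in seconds.
for interarrival in (60, 64.5, 100, 200, 500):
    lf = load_factor(MEAN_EXECUTION_TIME, interarrival)
    print(f"1/λ = {interarrival:>5} s  ->  lf = {lf:.3f}")
```

When jobs arrive exactly as often as one mean execution time (1/λ = 64.5 s), the load factor is 1.0; longer gaps between arrivals give proportionally smaller load factors.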

Experimental Results

Experiments on a Linux cluster on 64 processors and mean job execution time of 60 sec

                   Traditional Jobs              Adaptive Jobs
1/λ (s)   lf       Utilization (%)   MRT (s)     Utilization (%)   MRT (s)
60        1.0      74                303         99                211
100       0.6      49                116         68                76
200       0.3      23                108         29                70
500       0.12     9                 109         17                89

Adding a Cluster to Faucets

[Figure: the system overview diagram: GUI client (or web browser), Faucets server, cluster daemon, and clusters with a local scheduler and PEs.]

Adding a New Cluster

- Prerequisites
  - Install Charm++
  - Install Adaptive Queuing System
- Then
  - Download the Faucets software: http://charm.cs.uiuc.edu/
  - Compile the cluster daemon (CD):

      cd faucets/cd; make

  - Run the cluster daemon (CD):

      cd ..
      java cd.ClusterDaemon <central server> <central server port> -p <ClusterDaemon port> <working dir>

Installing a Faucets Server

[Figure: the system overview diagram: GUI client (or web browser), Faucets server, cluster daemon, and clusters with a local scheduler and PEs.]

Installing a Faucets Server

- Install MySQL
  - Create tables
  - Grant permissions
- Download JDBC driver: http://mmmysql.sourceforge.net/
- Install CS
  - Download the Faucets code and unpack
  - cd faucets/cs; make
  - Edit faucets/cs/db.properties
  - cd faucets
  - java -cp .:/path/to/mm.mysql-2.0.8-bin.jar TheServer

Installing Appspector

- Installation is a little involved
- Each application needs a display module written in Java
- Contact us if you want to install

Summary and Future Work

- Showed you how to use and install the Charm++/AMPI adaptive job system
- Download at http://charm.cs.uiuc.edu/research/faucets
- Future
  - Extend the system to other parallel machines
  - Eliminate residual processes
  - Integrate the scheduler with Globus
  - More comprehensive QoS contracts being developed
  - Sophisticated bidding schemes for the Faucets framework