
Post on 15-Jan-2016


We make the net affordable.
We make the net available.
We make the net reliable.
We make the net work.

Version: 03/2005

Jongjun Son

Sun Microsystems, Korea

Sun's Infrastructure Solution for Grid Engine

http://sun.com/grid


Agenda
• Sun's Grid Strategy
• Sun's Grid Software
• N1 Grid Engine 6 Technical Overview


Sun's Grid Strategy


Sun's Grid Computing Approach

● A flexible and scalable architecture
  – Pools computing resources to solve important problems
  – Collects unused capacity for better utilization
  – Architecture for seamless addition of resources
  – Up to hundreds or thousands of processors and systems
● Multi-platform, multi-OS
● Distributed resource management (DRM)
● Distributed system and software management

A well-designed Grid Computing infrastructure is accessed, used, and managed as a single, unified resource.


Supported Platforms

Download and try it out free at http://gridengine.sunsource.net

(SGEEE 5.3 = Sun Grid Engine, Enterprise Edition 5.3; N1GE 6 = N1 Grid Engine 6)

Operating System                                   SGEEE 5.3   N1GE 6
Solaris (SPARC, 32-bit) 2.6                        √           -
Solaris (SPARC) 7, 8, 9, 32-bit                    √           √
Solaris (SPARC) 10, 64- or 32-bit                  -           √
Solaris (SPARC) 7, 8, 9, 10, 64-bit                √           √
Solaris x86 8, 9                                   √           √
Solaris x86 10                                     -           √
Linux x86, kernel 2.4, glibc >= 2.2                √           √
Linux AMD64 (Opteron), kernel 2.4, glibc >= 2.3    √           √
Linux kernel 2.6                                   -           √
HP-UX 10.20, 11                                    -           √
IBM AIX 4.3.x, 5.0, 5.1                            -           √
SGI IRIX 6.2 - 6.5                                 -           √
Apple Mac OS X                                     -           √
Windows 2000, XP, 2003                             -           √


Compute Elements

● Access systems
  – Thin clients, workstations
● Compute nodes
  – Linux and Solaris Operating Systems
  – Compact 1U and 2U servers
  – Blade servers
  – Larger symmetric multiprocessing (SMP) systems
  – Sun Fire Superclusters
● Pre-configured Grid Computing rack systems

Sun's End-to-end Product Line

[Figure: Sun Fire ComputeGrid rack system]


Sun Fire Compute Grid

Engineered, Tested, Integrated, Supported

● Up to 32 Sun Fire V20z or up to 10 V40z servers
● Sun Control Station
● Sun N1 Grid Engine software
● Up to 2 × 24-port Gigabit Ethernet switches
● 48-port terminal server
● Keyboard/video/mouse shelf unit
● Sun Rack 1000-38

[Figure: rack with a 48-port terminal server, Cisco 3750 Gigabit Ethernet switches, an extendable keyboard/video/mouse shelf, Sun Fire V20z compute nodes, and LAN and Gigabit Ethernet connections]


Sun's Grid Software


Software Elements

[Figure: software stack spanning Cluster Grid, Enterprise Grid, and Global Grid infrastructures, from small to large Grid Computing solutions]

● Functions: service discovery, authentication/authorization, data management, data access, policy management, resource management, system management
● Sun software: Sun QFS/SamFS, Solaris CacheFS, N1 Grid Engine, Solaris Resource Manager, Sun Management Center, Sun Control Station
● Industry standards and partner technologies: OGSA, Globus Toolkit, Avaki


N1 Grid Engine 6

• Policy management
  – Owners negotiate usage
  – 4 different, customizable policy schemes
  – Exceptions for specific needs
• Benefits
  – Equitable, enforceable sharing between groups
  – Alignment of resources with business goals

Distributed Resource Manager, Job scheduling


Sun Cluster Grid Manager

● Sun Control Station software
  – System health and performance monitoring
  – Pull, push, and automatic provisioning
  – Deploy both Linux and Solaris x86 images
● Integrated grid management module
  – Manages Sun Grid Engine or Sun Grid Engine, Enterprise Edition
● Aggregated management
  – Address hundreds of systems individually or in groups
  – Combined system, software, and grid management

Unified Remote System and Grid Management


Sun Cluster Grid Manager


Grid Engine Portal


A Complete Solution: Proven and Repeatable Reference Architectures

● Servers and workstations
● Control network (Gigabit Ethernet)
● Data network (Gigabit Ethernet)
● Sun StorEdge storage solutions (direct-attached, NAS, HA-NFS, HPTC SAN)
● Sun ONE Grid Engine
● Sun Compute Grid rack systems
● Sun Cluster Grid Manager


Grid Scalability from Local to Global: Cluster, Enterprise, and Global Grids

[Figure: Cluster Grids within Enterprise Grids, linked over the Internet into a Global Grid]


N1 Grid Engine 6 Technical Overview

Agenda
● N1 Grid Engine Overview
● Architecture
● Resources, data access
● Application Integration
● N1GE 6 New Features
● Accounting & Reporting

N1 Grid Engine Overview: Resource Management

[Figure: a BLAST job script (## BLAST # blastall -p blastn -i /nfs/data) submitted to Grid Engine]

Selection of jobs
● Simple policies: FIFO, equal share, rank
● Sophisticated policies: sharing, urgency, priority, deadline, resource-based, etc.

Selection of resources
● System characteristics: CPU, memory, OS, patches, etc.
● Status of systems: available memory, load, free disk space, etc.
● Status of other resources: licenses, shared storage, other software, etc.
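The resource-selection step can be pictured as a filter followed by a ranking. The following is a minimal Python sketch, not Grid Engine's scheduler: the `Host` fields, the arch strings, and the function `select_hosts` are hypothetical, and the ranking rule (least load first, then most free memory) is borrowed from the load-formula example on the Resources slide.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    arch: str         # static characteristic (illustrative arch string)
    load_avg: float   # dynamic status reported by the host
    mem_free_mb: int  # dynamic status reported by the host


def select_hosts(hosts, arch, min_mem_mb):
    """Return the names of eligible hosts, best candidate first.

    Sketch only: real Grid Engine matching covers many more resource types.
    """
    # 1) Filter on static characteristics and current status.
    eligible = [h for h in hosts if h.arch == arch and h.mem_free_mb >= min_mem_mb]
    # 2) Rank: least load first; among equal loads, most free memory first.
    eligible.sort(key=lambda h: (h.load_avg, -h.mem_free_mb))
    return [h.name for h in eligible]


hosts = [
    Host("h1", "lx24-x86", 0.5, 2048),
    Host("h2", "lx24-x86", 0.5, 4096),
    Host("h3", "lx24-x86", 0.1, 512),
    Host("h4", "sol-sparc64", 0.0, 8192),
]
print(select_hosts(hosts, "lx24-x86", 1024))  # ['h2', 'h1']
```

Here h3 is dropped for lack of memory and h4 for the wrong architecture; h1 and h2 tie on load, so the one with more free memory wins.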

N1 Grid Engine Overview: Resource Control

Control of jobs
● Suspend, resume, kill, migrate, restart
● Customizable action methods
● Manual or automated via policies

Control of resources
● Regulate load from Grid jobs based upon resource value thresholds
● Control access via permissions, time/date, job type
● Allocate systems to jobs based on total resource consumption (e.g., memory, CPUs, disk)
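Regulating load through thresholds amounts to a periodic check per host. This is a minimal Python sketch under stated assumptions: the function `threshold_step` is hypothetical, it suspends the newest jobs first and resumes one job per check once the load falls back, and it only loosely mirrors the suspend-threshold settings of a Grid Engine queue.

```python
def threshold_step(load_avg, threshold, running, suspended, nsuspend=1):
    """One periodic check of a host's suspend threshold (illustrative policy).

    running, suspended: lists of job IDs, oldest first.
    Returns the new (running, suspended) lists; the inputs are not modified.
    """
    running, suspended = list(running), list(suspended)
    if load_avg > threshold:
        # Over the threshold: suspend up to `nsuspend` of the newest running jobs.
        for _ in range(min(nsuspend, len(running))):
            suspended.append(running.pop())
    elif suspended:
        # Back under the threshold: resume the most recently suspended job.
        running.append(suspended.pop())
    return running, suspended


# Threshold 1.5 as in "suspend jobs if load_avg > 1.5".
running, suspended = threshold_step(2.0, 1.5, [101, 102, 103], [], nsuspend=2)
print(running, suspended)  # [101] [103, 102]
running, suspended = threshold_step(1.0, 1.5, running, suspended)
print(running, suspended)  # [101, 102] [103]
```

Resuming one job per check, rather than all at once, keeps the host from bouncing straight back over the threshold.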

N1 Grid Engine Overview: Resource Accounting

Accounting of jobs
● Current resource consumption always monitored
● Total detailed consumption recorded at end of job
● Includes record of user, department, project, etc.

Accounting of resources
● Current usage of resources on hosts always monitored
● Information recorded over time: resource utilization of hosts and the grid; grid configuration changes
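Because each finished-job record carries the user, department, and project, chargeback reduces to grouping and summing. This is a minimal Python sketch; the record layout (`user`, `project`, `cpu_s`) and the function `usage_by` are hypothetical and do not reproduce Grid Engine's accounting file format.

```python
from collections import defaultdict


def usage_by(records, key):
    """Sum CPU seconds and count jobs per value of `key` (e.g. "user", "project")."""
    totals = defaultdict(lambda: {"cpu_s": 0.0, "jobs": 0})
    for rec in records:
        bucket = totals[rec[key]]
        bucket["cpu_s"] += rec["cpu_s"]
        bucket["jobs"] += 1
    return dict(totals)


# Hypothetical end-of-job records.
records = [
    {"user": "alice", "project": "bio", "cpu_s": 120.0},
    {"user": "bob", "project": "bio", "cpu_s": 30.0},
    {"user": "alice", "project": "chem", "cpu_s": 50.0},
]
print(usage_by(records, "project"))
# {'bio': {'cpu_s': 150.0, 'jobs': 2}, 'chem': {'cpu_s': 50.0, 'jobs': 1}}
```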

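Grid Engine 6 Architecture

[Figure: SGE daemons communicating over TCP/IP across three tiers]

● Access tier: submit host, admin host
● Management tier: master host running qmaster and the scheduler daemon (schedd); shadow host (optional)
● Compute tier: execution hosts running execd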


Resources: The Heart of Grid Engine Management

Per host
● load_avg, mem_free, OS/patch level
Global
● Floating licenses, shared storage

Built-in and custom resources
● Static resources: strings, numbers, booleans
● Countable resources: e.g., licenses, MB of memory/disk
● Measured resources: value provided through a load sensor

Resources are used for
● Job resource requests: job A needs 1 license and 1 GB
● Load/suspend thresholds: suspend jobs if load_avg > 1.5
● Load formulas: send jobs to the hosts with the least load; of those, choose the hosts with the most free memory
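Measured resources come from a load sensor: a long-running program that the execution daemon polls. The following Python sketch assumes the load-sensor protocol as we understand it. The daemon writes one line per poll. The sensor exits on `quit`. Otherwise it answers with `begin`, one `host:name:value` line per value, and `end`. The complex name `scratch_free` and the functions below are hypothetical. The complex must also be defined in Grid Engine's complex configuration, and the reported host name must match the name Grid Engine uses for the host.

```python
import shutil
import socket
import sys


def sensor_loop(stdin, stdout, measure, host=None):
    """Answer each poll from the execution daemon until it sends "quit".

    measure: callable returning {complex_name: value} for this host.
    """
    host = host or socket.gethostname()
    # readline-based iteration so each poll is answered as soon as it arrives
    for line in iter(stdin.readline, ""):
        if line.strip() == "quit":
            break
        stdout.write("begin\n")
        for name, value in measure().items():
            stdout.write(f"{host}:{name}:{value}\n")
        stdout.write("end\n")
        stdout.flush()


def scratch_free():
    # Hypothetical complex "scratch_free": free space under /tmp, in MB.
    return {"scratch_free": shutil.disk_usage("/tmp").free // (1024 * 1024)}


def main():
    # Entry point for a script installed as the host's load sensor.
    sensor_loop(sys.stdin, sys.stdout, scratch_free)
```

A wrapper script would simply call `main()`; keeping the loop separate from the measurement makes it easy to report several values per poll.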

Parallel and Checkpointing Environments

● Environment: a set of hosts that is used to support parallel or checkpointing applications
● Applications must inherently support parallel/checkpointing execution

[Figure: hosts H1 to H7 grouped into two environments, Env A and Env B]

Data Access

● Application binaries and job data on the execution hosts are configured independently
● Options: NFS sharing, file staging, Data Grid

Application Integration Methods

[Figure: integration methods invoked over a job's life cycle (START, SUSPEND, MIGRATE, DELETE, EXIT)]

● General methods: queue/host prolog, starter method, suspend method, resume method, terminate method, queue/host epilog
● Parallel methods: parallel start, parallel stop
● Checkpointing methods: checkpoint command (run at specified intervals), migration command, clean command, requeue job
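The general and parallel methods wrap the job in a fixed order. This is a minimal Python sketch, assuming the order prolog, parallel start, job (launched by the starter method if one is configured), parallel stop, epilog. That ordering reflects our understanding of Grid Engine rather than anything stated on the slide. The function `run_job` and the hook names are hypothetical stand-ins for the configured scripts.

```python
def run_job(job, hooks):
    """Run `job` wrapped by whichever integration hooks are configured.

    job and hooks[...] are hypothetical callables returning an exit status.
    Returns (job_status, trace); job_status is None if a setup hook failed.
    """
    trace = []

    def run_hook(name, *args):
        if name not in hooks:
            return 0  # unconfigured hooks count as success
        trace.append(name)
        return hooks[name](*args)

    status = None
    # Setup hooks: a non-zero status keeps the job from starting.
    if run_hook("prolog") == 0 and run_hook("pe_start") == 0:
        if "starter" in hooks:
            # The starter method receives the job and launches it itself.
            status = run_hook("starter", job)
        else:
            trace.append("job")
            status = job()
    # Teardown hooks always run in this sketch.
    run_hook("pe_stop")
    run_hook("epilog")
    return status, trace


ok = lambda *args: 0
print(run_job(lambda: 3, {"prolog": ok, "pe_start": ok, "pe_stop": ok, "epilog": ok}))
# (3, ['prolog', 'pe_start', 'job', 'pe_stop', 'epilog'])
```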

Integrating Applications with Grid Engine

1) Unmodified/legacy application binaries: integrate using a wrapper script
2) Interactive applications: use pluggable remote mechanisms, e.g., ssh, rsh, telnet

(1 and 2 are the two most common approaches.)

3) Grid-ready applications: modify code to use DRM APIs (the API was recently standardized as DRMAA)
4) Java applications: JGrid package for low-level coupling (object/method distribution), currently provided separately
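For approach 1, a wrapper script prepares the environment, runs the unmodified binary, and hands its exit status back to Grid Engine. This is a minimal Python sketch. `NSLOTS` and `TMPDIR` are environment variables Grid Engine sets for a job. The `LEGACY_THREADS` variable and the function `run_legacy` are hypothetical, standing in for whatever setting a particular legacy program reads.

```python
import os
import subprocess
import sys


def run_legacy(argv, env=None):
    """Run an unmodified binary inside a Grid Engine job; return its exit status."""
    env = dict(os.environ if env is None else env)
    # Grid Engine exports these; fall back to sane values outside a job.
    slots = env.get("NSLOTS", "1")
    workdir = env.get("TMPDIR", os.getcwd())
    # Hypothetical: tell the legacy program how many threads it may use.
    env["LEGACY_THREADS"] = slots
    result = subprocess.run(argv, cwd=workdir, env=env)
    return result.returncode


def main():
    # Usage inside a job script: wrapper.py /path/to/legacy_binary args...
    sys.exit(run_legacy(sys.argv[1:]))
```

Propagating the binary's exit status matters because Grid Engine records it in the accounting data for the job.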


N1GE 6 New Features: Architecture

● Berkeley DB spooling
● Multi-threaded master daemon
● New communication system
● Scalability goals for N1GE 6 with a single master:
  – Up to 10,000 unique hosts
  – Up to 500,000 unique jobs (an array job counts as a single job)


N1GE 6 Supported Platforms

Same operating systems as in the Supported Platforms table above; Windows 2000, XP, and 2003 support was targeted for the end of CY2004.

N1GE 6 New Features: Scheduler Functionality

– Advanced planning capabilities
  ● Resource reservation with backfilling
  ● Can reserve any resource, e.g., memory, CPU, license
– More sophisticated scheduling algorithms
  ● Management policies matched with business priorities:
    – Priority, urgency, share tree, category, deadline, etc.


Job Resource Reservation

[Figure: Jobs 1 to 6, ordered by priority, with their resource requirements (CPU, 1 GB memory, license); bar length represents how long each requirement lasts]


Simple, priority-based scheduling

[Figure: timeline of Jobs 1 to 6 on Host 1 and Host 2 (CPU, memory) and on global licenses; resources are wasted and one job goes last]


Scheduling with Resource Reservation

[Figure: the same jobs over time on Host 1, Host 2, and global licenses, scheduled with resource reservation]


Resource Reservation with Backfilling

[Figure: the same jobs scheduled with resource reservation and backfilling; lower-priority Job 6 fills gaps left in the schedule]
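Reservation and backfilling can be reproduced on a single shared resource. This Python sketch uses conservative backfilling. Jobs are placed in priority order. Each job is reserved the earliest slot where its resource amount is free for its whole duration, so a lower-priority job may jump into a gap only if it delays no higher-priority reservation. It deliberately simplifies Grid Engine, whose scheduler weighs several resources at once. The function `schedule` and the job tuples are hypothetical.

```python
def schedule(jobs, capacity, backfill=True):
    """Assign start times to jobs drawing on one shared resource.

    jobs: (name, amount, duration) tuples in priority order.
    Each job is reserved at the earliest time its amount is free for its whole
    duration, so it never delays a higher-priority job already placed.
    With backfill=False, no job may start before a higher-priority job.
    """
    horizon = sum(duration for _, _, duration in jobs)  # enough room for any order
    usage = [0] * horizon
    starts, earliest = {}, 0
    for name, amount, duration in jobs:
        if amount > capacity:
            raise ValueError(f"{name} needs more than the total capacity")
        t = earliest
        while any(usage[u] + amount > capacity for u in range(t, t + duration)):
            t += 1
        for u in range(t, t + duration):
            usage[u] += amount  # book the reservation
        starts[name] = t
        if not backfill:
            earliest = t
    return starts


# 4 CPUs; B needs all four, so it is reserved at t=4 once A finishes.
jobs = [("A", 2, 4), ("B", 4, 2), ("C", 2, 2)]
print(schedule(jobs, capacity=4))                  # {'A': 0, 'B': 4, 'C': 0}
print(schedule(jobs, capacity=4, backfill=False))  # {'A': 0, 'B': 4, 'C': 6}
```

With backfilling, C runs beside A in the gap before B's reservation; without it, C waits until B is done, and the two idle CPUs are wasted.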

Resource Management Policies: Resource allocation based upon business priorities

● Policy basis includes: cumulative utilization, category priority, time-based priority, resource value, etc.
● Powerful, flexible, tunable, easy to configure

[Figure: all jobs divided into high-, normal-, and low-priority jobs. High priority: Dept A 70, Dept B 30 (Dept A has more rights to high-priority jobs); normal priority: Dept A 50, Dept B 50; low priority: Dept A 50, Dept B 50; Group X receives a temporary boost]
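One way to read the departmental percentages is as a hierarchical share split: each node's tickets are divided among its children in proportion to their configured shares. This Python sketch covers only that static split. It ignores past usage, which a real share tree also accounts for, and it does not model the temporary boost. The function `split_tickets` and the tree layout are hypothetical.

```python
def split_tickets(tree, tickets):
    """Distribute tickets down a share tree in proportion to configured shares.

    tree: {name: (shares, subtree)}; a leaf has an empty subtree.
    Returns {path: tickets} for every leaf. Past usage is ignored here.
    """
    total = sum(shares for shares, _ in tree.values())
    result = {}
    for name, (shares, subtree) in tree.items():
        portion = tickets * shares / total
        if subtree:
            # Recurse: the child's portion is split among its own children.
            for path, t in split_tickets(subtree, portion).items():
                result[f"{name}/{path}"] = t
        else:
            result[name] = portion
    return result


# The 70/30 split for high-priority jobs, with 1000 tickets to hand out.
print(split_tickets({"DeptA": (70, {}), "DeptB": (30, {})}, 1000))
# {'DeptA': 700.0, 'DeptB': 300.0}
```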


Policies for Job Prioritization

Priority determines which pending jobs get dispatched. Job priority is calculated from three sub-policies, each normalized to a value N between 0.0 and 1.0:

$$\mathrm{prio} = W_{\mathrm{urg}} N_{\mathrm{urg}} + W_{\mathrm{tix}} N_{\mathrm{tix}} + W_{\mathrm{psx}} N_{\mathrm{psx}}$$

where N_urg = normalized urgency, N_tix = normalized tickets, N_psx = normalized POSIX priority, and the W terms are weighting factors.
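The weighted sum can be evaluated directly. This Python sketch assumes min-max normalization across the pending jobs, which is our simplification; Grid Engine's own normalization of each sub-policy may differ in detail. The weights, job values, and the functions `normalize` and `rank_pending` are hypothetical.

```python
def normalize(values):
    """Min-max normalize to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_pending(jobs, w_urg, w_tix, w_psx):
    """Return (name, prio) pairs, highest priority first.

    prio = w_urg * N_urg + w_tix * N_tix + w_psx * N_psx
    """
    n_urg = normalize([j["urgency"] for j in jobs])
    n_tix = normalize([j["tickets"] for j in jobs])
    n_psx = normalize([j["posix"] for j in jobs])
    scored = [
        (j["name"], w_urg * u + w_tix * t + w_psx * p)
        for j, u, t, p in zip(jobs, n_urg, n_tix, n_psx)
    ]
    return sorted(scored, key=lambda s: s[1], reverse=True)


pending = [
    {"name": "j1", "urgency": 1000, "tickets": 0, "posix": 0},
    {"name": "j2", "urgency": 2000, "tickets": 100, "posix": 0},
    {"name": "j3", "urgency": 1000, "tickets": 50, "posix": 100},
]
print([name for name, _ in rank_pending(pending, 0.1, 0.01, 1.0)])  # ['j3', 'j2', 'j1']
```

With these illustrative weights, the POSIX priority dominates, so j3 outranks j2 even though j2 has the highest urgency and the most tickets.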


Cluster Queue

[Figure: hosts A, B, C, D, ...; in 5.x a separate queue is defined on each host, while in 6.x a single cluster queue spans all of them]


N1GE 6 New Features: Analysis / Monitoring / Accounting

● Value-add module for analysis, monitoring, accounting reports, etc.
  – Fine-grained resource recording
  – Stored in an RDBMS with a well-defined schema
  – Built-in capability for reporting, chargeback, etc.
  – Web-based console tool for generating reports, queries, etc.


Why a Second, Separate DB?

● Different access considerations
  – Standardized access (SQL, ODBC, JDBC)
  – More powerful database structure
● Independent of core system data
  – Historical data
  – Derived data (sums, averages, ...)
  – Queries won't affect system performance
  – Lower requirements on availability


Architecture

● Reporting-Writer: Java application
● Loosely coupled to the SGE system via a qmaster-generated reporting file
● Stores raw data and pre-processed data in an SQL database via JDBC

[Figure: Qmaster writes the reporting file; the Reporting-Writer reads the raw data, builds derived values, and stores both in the Reporting-DB]
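The Reporting-Writer's two duties are storing raw records and building derived values. Both can be sketched in a few lines of Python, with the standard `sqlite3` module standing in for the JDBC-connected database. The record format `timestamp:host:load_avg`, the table names, and the function `load_reporting_lines` are hypothetical. They are much simpler than Grid Engine's actual reporting file and ARCo schema.

```python
import sqlite3


def load_reporting_lines(conn, lines):
    """Store raw samples, then rebuild a derived per-host average table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_load (ts INTEGER, host TEXT, load_avg REAL)"
    )
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        ts, host, load_avg = line.strip().split(":")
        rows.append((int(ts), host, float(load_avg)))
    conn.executemany("INSERT INTO raw_load VALUES (?, ?, ?)", rows)
    # Derived values are precomputed so reports do not rescan the raw data.
    conn.execute("DROP TABLE IF EXISTS avg_load")
    conn.execute(
        "CREATE TABLE avg_load AS "
        "SELECT host, AVG(load_avg) AS avg_load, COUNT(*) AS samples "
        "FROM raw_load GROUP BY host"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_reporting_lines(conn, ["1000:node1:0.5", "1060:node1:1.5", "1000:node2:2.0"])
for row in conn.execute("SELECT host, avg_load, samples FROM avg_load ORDER BY host"):
    print(row)
# ('node1', 1.0, 2)
# ('node2', 2.0, 1)
```

Keeping this in a separate database means heavy report queries never touch qmaster's own spooling data.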


Stored Data

● Job-related information: times, user, project, exit status, ...
● Host- and queue-related information: load information, consumables, ...
● Share tree: configured shares, actual shares, ...
● Precomputed, derived values: sums, averages per host, queue, user, project, ...


ARCo: Accounting and Reporting Console

● Web-based tool for displaying data in the reporting DB
● Based on Sun Web Console
● Ability to create simple and advanced (SQL-based) queries
● Generates tables and graphs, exportable as CSV or PDF
● Also supports command-line report generation


Selecting a query


Query Results


Defining new query


jongjun.son@sun.com
