We make the net affordable.
We make the net available.
We make the net reliable.
We make the net work.
Version: 03/2005
Jongjun Son
Sun Microsystems, Korea
Sun's Infrastructure Solution for Grid Engine
http://sun.com/grid
Agenda
• Sun's Grid Strategy
• Sun's Grid Software
• N1 Grid Engine 6 Technical Overview

Sun's Grid Strategy
Sun's Grid Computing Approach
● A flexible and scalable architecture
  – Pools computing resources to solve important problems
  – Collects unused capacity for better utilization
  – Architecture for seamless addition of resources
  – Up to hundreds or thousands of processors and systems
● Multi-platform, multi-OS
● Distributed resource management (DRM)
● Distributed system and software management
A well-designed Grid Computing infrastructure is accessed, used, and managed as a single, unified resource.
Supported Platforms
Download and try it out free at http://gridengine.sunsource.net

N1 Grid Engine operating systems | SGEEE 5.3 | N1GE 6
Solaris (Sparc32) 2.6 | √ | -
Solaris (Sparc) 7, 8, 9, 32-bit | √ | √
Solaris (Sparc) 10, 64- or 32-bit | - | √
Solaris (Sparc) 7, 8, 9, 10, 64-bit | √ | √
Solaris x86 8, 9 | √ | √
Solaris x86 10 | - | √
Linux x86, kernel 2.4, glibc >= 2.2 | √ | √
Linux AMD64 (Opteron), kernel 2.4, glibc >= 2.3 | √ | √
Linux kernel 2.6 support | - | √
HP HP-UX 10.20, 11 | - | √
IBM AIX 4.3.x, 5.0, 5.1 | - | √
SGI IRIX 6.2 - 6.5 | - | √
Apple Mac OS X | - | √
Windows 2000, XP, 2003 | - | √
Compute Elements
Sun's End-to-end Product Line
● Access systems
  – Thin clients, workstations
● Compute nodes
  – Linux and Solaris Operating Systems
  – Compact 1U and 2U servers
  – Blade servers
  – Larger symmetric multiprocessing (SMP) systems
  – Sun Fire Superclusters
● Pre-configured Grid Computing rack systems
[Figure: Sun Fire Compute Grid rack system]
Sun Fire Compute Grid
Engineered, Tested, Integrated, Supported
● Up to 32 Sun Fire V20z, or up to 10 V40z
● Sun Control Station
● Sun N1 Grid Engine Software
● Up to 2 × 24-port Gigabit Ethernet switches
● 48-port terminal server
● Keyboard/Video/Mouse shelf unit
● Sun Rack 1000-38
[Figure: rack diagram labeled with the 48-port terminal server, Cisco 3750 Gigabit Ethernet switches, extendable Keyboard/Video/Mouse shelf, Gigabit Ethernet LAN, and Sun Fire V20z compute nodes]
Sun's Grid Software

Software Elements
Small to Large Grid Computing Solutions
● Grid scopes: Cluster Grid Infrastructure, Enterprise Grid Infrastructure, Global Grid Infrastructure
● Functions: Service Discovery, Authentication/Authorization, Data Management, Policy Management, Resource Management, System Management, Data Access
● Sun products: Sun QFS/SamFS, Solaris CacheFS, N1 Grid Engine, Solaris Resource Manager, Sun Management Center, Sun Control Station
● Industry standards and partner technologies: OGSA, Globus Toolkit, Avaki
N1 Grid Engine 6
Distributed Resource Manager, Job Scheduling
• Policy management
  – Owners negotiate usage
  – 4 different, customizable policy schemes
  – Exceptions for specific needs
• Benefits
  – Equitable, enforceable sharing between groups
  – Alignment of resources with business goals
Sun Cluster Grid Manager
Unified Remote System and Grid Management
● Sun Control Station software
  – System health and performance monitoring
  – Pull, push, and automatic provisioning
  – Deploy both Linux and Solaris x86 images
● Integrated grid management module
  – Manages Sun Grid Engine or Sun Grid Engine, Enterprise Edition
● Aggregated management
  – Address hundreds of systems individually or in groups
  – Combined system, software, and grid management
Sun Cluster Grid Manager
Grid Engine Portal

A Complete Solution
Proven and Repeatable Reference Architectures
● Servers
● Workstations
● Control Network (Gigabit Ethernet)
● Data Network (Gigabit Ethernet)
● Sun StorEdge storage solutions (Direct-attached, NAS, HA-NFS, HPTC SAN)
● Sun ONE Grid Engine
● Sun Compute Grid rack systems
● Sun Cluster Grid Manager
Grid Scalability from Local to Global
Cluster, Enterprise, and Global Grids
[Figure: Cluster Grids, Enterprise Grids, and a Global Grid connected over the Internet]

N1 Grid Engine 6 Technical Overview
Agenda
● N1 Grid Engine Overview
● Architecture
● Resource, data access
● Application Integration
● N1GE6 New features
● Accounting & Reporting
N1 Grid Engine Overview: Resource Management
[Figure: a BLAST job (# blastall -p blastn -i /nfs/data) submitted to Grid Engine]
Selection of jobs
● Simple policies: FIFO, equal share, rank
● Sophisticated policies: sharing, urgency, priority, deadline, resource-based, etc.
Selection of resources
● System characteristics: CPU, memory, OS, patches, etc.
● Status of systems: available memory, load, free disk space, etc.
● Status of other resources: licenses, shared storage, other software, etc.
N1 Grid Engine Overview: Resource Control
[Figure: a BLAST job (# blastall -p blastn -i /nfs/data) submitted to Grid Engine]
Control of jobs
● Suspend, Resume, Kill, Migrate, Restart
● Customizable action methods
● Manual or automated via policies
Control of resources
● Regulate load from Grid jobs based upon resource value thresholds
● Control access via permissions, time/date, job type
● Allocate systems to jobs based on total resource consumption (e.g., memory, CPUs, disk, etc.)
N1 Grid Engine Overview: Resource Accounting
[Figure: a BLAST job (# blastall -p blastn -i /nfs/data) submitted to Grid Engine]
Accounting of jobs
● Current resource consumption always monitored
● Total detailed consumption recorded at end of job
● Includes record of user, department, project, etc.
Accounting of resources
● Current usage of resources on hosts always monitored
● Information recorded over time: resource utilization of hosts and grid; grid configuration changes
Grid Engine 6 Architecture
● Access Tier: Submit Host, Admin Host
● Management Tier: Master Host (qmaster, schedd), Shadow Host (optional)
● Compute Tier: Exec Hosts (execd)
● SGE daemons communicate over TCP/IP
Resources
THE HEART OF GRID ENGINE MANAGEMENT
Per host
● load_avg
● mem_free
● OS/patch-level
Global
● Floating licenses
● Shared storage
Resources used for
● Job resource requests: job A needs 1 license and 1 GB
● Load/suspend thresholds: suspend jobs if load_avg > 1.5
● Load formulas: send jobs to hosts with least load; out of those, choose hosts with most free memory
Built-in and custom resources
● Static resources: strings, numbers, booleans
● Countable resources: e.g., licenses, MB of memory/disk
● Measured resources: value provided through a Load Sensor
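The load formula and suspend threshold described on this slide can be sketched in a few lines of Python. This is only an illustration, not Grid Engine's implementation: the `Host` class, `pick_host`, and `should_suspend` are hypothetical names, and only the 1.5 threshold and the "least load, then most free memory" rule come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    load_avg: float    # reported load average (per-host resource)
    mem_free_mb: int   # free memory in MB (per-host resource)


SUSPEND_LOAD_THRESHOLD = 1.5  # from the slide: suspend jobs if load_avg > 1.5


def should_suspend(host):
    """Suspend threshold: true when the host's load exceeds the limit."""
    return host.load_avg > SUSPEND_LOAD_THRESHOLD


def pick_host(hosts, mem_needed_mb):
    """Load formula: least-loaded host that satisfies the memory request;
    among equally loaded hosts, prefer the one with the most free memory."""
    candidates = [h for h in hosts if h.mem_free_mb >= mem_needed_mb]
    if not candidates:
        return None
    return min(candidates, key=lambda h: (h.load_avg, -h.mem_free_mb))


hosts = [
    Host("h1", load_avg=0.2, mem_free_mb=2048),
    Host("h2", load_avg=0.2, mem_free_mb=4096),
    Host("h3", load_avg=1.8, mem_free_mb=8192),
]
chosen = pick_host(hosts, mem_needed_mb=1024)
print(chosen.name)                                        # h2
print([h.name for h in hosts if should_suspend(h)])       # ['h3']
```

Here h1 and h2 tie on load, so the free-memory tie-break selects h2, while h3 is the only host over the suspend threshold.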
Parallel and Checkpointing Environments
● Environment: a set of hosts that is used to support parallel or checkpointing applications
● Applications must inherently support parallel/checkpointing execution
[Figure: hosts H1 to H7 grouped into environments Env A and Env B]
Data Access
● Exec hosts need access to application binaries and job data
● Access to each is CONFIGURED INDEPENDENTLY
● Options: NFS sharing, file staging, Data Grid
Application Integration Methods
General methods
● queue/host prolog
● starter method
● suspend method
● resume method
● terminate method
● queue/host epilog
Parallel methods
● parallel start
● parallel stop
Checkpointing methods
● checkpoint command (run at specified intervals)
● migration command
● clean command
● requeue job
[Figure: job lifecycle from START to END, showing where each method runs on the SUSPEND, MIGRATE, DELETE, and EXIT events]
Integrating applications with Grid Engine
1) Unmodified/legacy application binaries: integrate using a wrapper script
2) Interactive applications: use pluggable remote mechanisms, e.g., ssh, rsh, telnet
(two most common approaches)
3) Grid-ready applications: modify code to use DRM APIs (API recently standardized)
4) Java applications: JGrid package for low-level coupling (object/method distribution); currently provided separately
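As an illustration of approach 1, a wrapper around an unmodified binary might look like the sketch below. It assumes the batch system exports `JOB_ID` and `TMPDIR` into the job's environment; `run_legacy` is a hypothetical helper, and a Python one-liner stands in for the legacy application.

```python
import os
import subprocess
import sys
import tempfile


def run_legacy(argv, env=None):
    """Run an unmodified binary in the job's scratch directory; return its exit code."""
    env = dict(os.environ if env is None else env)
    # JOB_ID and TMPDIR are assumed to be provided by the batch system;
    # fall back to placeholders so the sketch also runs outside a grid.
    job_id = env.get("JOB_ID", "interactive")
    scratch = env.get("TMPDIR", "")
    if not os.path.isdir(scratch):
        scratch = tempfile.gettempdir()
    print(f"job {job_id}: running {argv[0]} in {scratch}")
    result = subprocess.run(argv, cwd=scratch, env=env)
    return result.returncode


# Stand-in for the legacy application: a Python one-liner that exits with 0.
code = run_legacy([sys.executable, "-c", "print('legacy app ran')"])
print("exit code:", code)
```

The wrapper passes the exit code back unchanged, so the batch system can still tell a failed run from a successful one.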
N1GE 6 New Features: Architecture
● Berkeley DB spooling
● Multi-threaded Master Daemon
● New communication system
● Scalability goals: N1GE 6 per 1 master
  – Up to 10,000 unique hosts
  – Up to 500,000 unique jobs*
* Array jobs counted as a single job
N1GE 6 Supported Platforms

Operating systems | SGEEE 5.3 | N1GE 6
Solaris (Sparc32) 2.6 | √ | -
Solaris (Sparc) 7, 8, 9, 32-bit | √ | √
Solaris (Sparc) 10, 64- or 32-bit | - | √
Solaris (Sparc) 7, 8, 9, 10, 64-bit | √ | √
Solaris x86 8, 9 | √ | √
Solaris x86 10 | - | √
Linux x86, kernel 2.4, glibc >= 2.2 | √ | √
Linux AMD64 (Opteron), kernel 2.4, glibc >= 2.3 | √ | √
Linux kernel 2.6 support | - | √
HP HP-UX 10.20, 11 | - | √
IBM AIX 4.3.x, 5.0, 5.1 | - | √
SGI IRIX 6.2 - 6.5 | - | √
Apple Mac OS X | - | √
Windows 2000, XP, 2003 | - | √ (end of CY2004)
N1GE 6 New Features: Scheduler Functionality
– Advanced planning capabilities
  ● Resource Reservation with Backfilling
  ● Can reserve any resource, e.g., memory, CPU, license
– More sophisticated scheduling algorithms
  ● Management policies matched with business priorities: priority, urgency, share tree, category, deadline, etc.
Job Resource Reservation
[Figure: Jobs 1 to 6, with resource requirements, ordered by priority. Each job requests some combination of 1 CPU, 1 GB memory, and 1 license; bar length represents duration of requirement.]
Simple, priority-based scheduling
[Figure: timeline of CPU and memory on Host 1 and Host 2 plus the global license, showing Jobs 1 to 6 placed without reservation. Annotations: "Wasted resources" and "Goes last!"]
Scheduling with Resource Reservation
[Figure: timeline of Host 1, Host 2, and the global license, showing Jobs 1 to 6 placed with resource reservation]
Resource Reservation with backfilling
[Figure: timeline of Host 1, Host 2, and the global license, showing Jobs 1 to 6 placed with resource reservation and backfilling]
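The difference between simple priority-based scheduling and reservation with backfilling can be sketched with a toy model. This is not the N1GE 6 scheduler: it assumes a single resource (CPUs), known job durations in whole time steps, and jobs given as hypothetical `(name, cpus, duration)` tuples in priority order, each needing no more than `capacity` CPUs.

```python
def earliest_start(usage, cpus, duration, capacity):
    """First time step where `cpus` are free for `duration` consecutive steps."""
    t = 0
    while True:
        if all(usage.get(t + d, 0) + cpus <= capacity for d in range(duration)):
            return t
        t += 1


def occupy(usage, start, cpus, duration):
    """Mark `cpus` as used from `start` for `duration` steps."""
    for d in range(duration):
        usage[start + d] = usage.get(start + d, 0) + cpus


def schedule_with_reservation(jobs, capacity):
    """Place jobs in priority order; every placement acts as a reservation.
    A lower-priority job may only backfill gaps that delay no earlier job."""
    usage, starts = {}, {}
    for name, cpus, duration in jobs:
        start = earliest_start(usage, cpus, duration, capacity)
        occupy(usage, start, cpus, duration)
        starts[name] = start
    return starts


def schedule_greedy(jobs, capacity):
    """No reservations: at each time step, start every pending job that fits now."""
    usage, starts, pending, t = {}, {}, list(jobs), 0
    while pending:
        for job in list(pending):
            name, cpus, duration = job
            if usage.get(t, 0) + cpus <= capacity:
                occupy(usage, t, cpus, duration)
                starts[name] = t
                pending.remove(job)
        t += 1
    return starts


# Priority order: job1 highest, job4 lowest. job2 needs both CPUs.
jobs = [("job1", 1, 4), ("job2", 2, 2), ("job3", 1, 3), ("job4", 1, 2)]
print(schedule_greedy(jobs, capacity=2))
# {'job1': 0, 'job3': 0, 'job4': 3, 'job2': 5}
print(schedule_with_reservation(jobs, capacity=2))
# {'job1': 0, 'job2': 4, 'job3': 0, 'job4': 6}
```

Without reservations the two-CPU job2 keeps being overtaken by smaller jobs and starts last, at step 5. With a reservation it starts at step 4 as soon as job1 finishes, job3 still backfills the idle CPU at step 0, and only the lowest-priority job4 is pushed back.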
Resource Management Policies
Resource allocation based upon business priorities
● Policy basis includes: cumulative utilization, category priority, time-based priority, resource value, etc.
● Powerful, flexible, tunable, easy to configure
[Figure: share tree. All jobs split into High, Normal, and Low Priority. High Priority: Dept A 70, Dept B 30 (Dept A has more rights to high-priority jobs). Normal Priority: Dept A 50, Dept B 50. Low Priority: Dept A 50, Dept B 50. Group X: temporary boost.]
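To make the share figures concrete, the sketch below turns the per-department shares from the slide into fractional entitlements within each priority class. The `fractions` helper and the dictionary layout are illustrative assumptions, and the Group X temporary boost is left out because the slide gives no number for it.

```python
def fractions(shares):
    """Convert raw share counts into fractions that sum to 1."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


# Shares as shown on the slide, one split per priority class.
policy = {
    "high": {"Dept A": 70, "Dept B": 30},
    "normal": {"Dept A": 50, "Dept B": 50},
    "low": {"Dept A": 50, "Dept B": 50},
}

entitlements = {level: fractions(shares) for level, shares in policy.items()}
for level, split in entitlements.items():
    print(level, split)
```

Under these numbers Dept A is entitled to 70% of the high-priority capacity, while normal and low priority are split evenly.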
Policies for Job Prioritization
Priority determines which pending jobs get dispatched.
Job priority is calculated from three sub-policies, each normalized to 0.0 < N < 1.0:
$$\mathrm{prio} = W_{urg} N_{urg} + W_{tix} N_{tix} + W_{psx} N_{psx}$$
● $N_{urg}$ = normalized Urgency
● $N_{tix}$ = normalized Tickets
● $N_{psx}$ = normalized Posix
● $W$ = weighting factors
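The weighted sum can be tried out directly. The weights and the normalized values below are illustrative only, not taken from the slides; `job_priority` is a hypothetical helper that evaluates the formula as written.

```python
def job_priority(n_urg, n_tix, n_psx, w_urg, w_tix, w_psx):
    """prio = W_urg * N_urg + W_tix * N_tix + W_psx * N_psx"""
    return w_urg * n_urg + w_tix * n_tix + w_psx * n_psx


# Illustrative weighting factors (assumed values).
WEIGHTS = {"w_urg": 0.1, "w_tix": 0.01, "w_psx": 1.0}

# Pending jobs with already-normalized urgency, ticket, and POSIX values.
pending = {
    "A": {"n_urg": 0.9, "n_tix": 0.5, "n_psx": 0.5},
    "B": {"n_urg": 0.1, "n_tix": 0.9, "n_psx": 0.6},
    "C": {"n_urg": 0.5, "n_tix": 0.5, "n_psx": 0.2},
}

prio = {name: job_priority(**values, **WEIGHTS) for name, values in pending.items()}
dispatch_order = sorted(prio, key=prio.get, reverse=True)
print(dispatch_order)  # ['B', 'A', 'C']
```

With these weights the POSIX term dominates, so job B (0.619) is dispatched before job A (0.595) even though A is far more urgent. Raising `w_urg` would reverse that order.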
Cluster Queue
[Figure: a 5.x Queue compared with a 6.x Cluster Queue across hosts A, B, C, D, ...]
N1GE 6 New Features: Analysis / Monitoring / Accounting
● Value-add module for doing analysis, monitoring, accounting reports, etc.
  – Fine-grained resource recording
  – Stored in RDBMS in well-defined schema
  – Provides built-in capability for reporting, chargeback, etc.
  – Web-based console tool provided for generating reports, queries, etc.
Why a second, separate DB?
● Different access considerations
  – Standardized access (SQL, ODBC, JDBC)
  – More powerful database structure
● Independent of core system data
  – Historical data
  – Derived data (sums, averages ...)
  – Queries won't affect system performance
  – Lower requirements on availability
Architecture
● Reporting-Writer: Java application
● Loosely coupled to the SGE system via qmaster-generated reporting file
● Stores raw data and pre-processed data to SQL DB via JDBC
[Figure: Qmaster writes raw data to the Reporting File; the Reporting-Writer reads it, builds derived values, and stores the data in the Reporting-DB]
Stored Data
● Job related information: times, user, project, exit status ...
● Host and queue related information: load information, consumables ...
● Sharetree: configured shares, actual shares ...
● Precomputed, derived values: sums, averages per host, queue, user, project ...
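Derived values such as per-user sums and averages are ordinary SQL aggregates over the raw job records. The sketch below uses an in-memory SQLite database and a made-up `job_usage` table purely for illustration. It does not reproduce the actual reporting schema, and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row of raw data per finished job.
conn.execute("CREATE TABLE job_usage (owner TEXT, project TEXT, cpu_seconds REAL)")
conn.executemany(
    "INSERT INTO job_usage VALUES (?, ?, ?)",
    [
        ("alice", "blast", 120.0),
        ("alice", "blast", 80.0),
        ("bob", "render", 300.0),
    ],
)

# Derived values: job count, total and average CPU time per user.
rows = conn.execute(
    "SELECT owner, COUNT(*), SUM(cpu_seconds), AVG(cpu_seconds) "
    "FROM job_usage GROUP BY owner ORDER BY owner"
).fetchall()
for row in rows:
    print(row)
# ('alice', 2, 200.0, 100.0)
# ('bob', 1, 300.0, 300.0)
conn.close()
```

Precomputing such aggregates in a separate database keeps reporting queries away from the live scheduler data.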
ARCo: Accounting and Reporting Console
● Web-based tool for displaying data in reporting DB
● Based on Sun Web Console
● Ability to create simple and advanced (SQL-based) queries
● Generates tables, graphs, exportable as CSV, PDF
● Also, command-line report generation
Selecting a query
Query Results
Defining new query