resource management system : condor · computer science, university of warwick 1 resource...
Computer Science, University of Warwick
Resource Management System: Condor
Condor was designed to make use of the spare capacity in a network of PCs and workstations.
Dual modes:
• Non-dedicated mode (cycle stealing).
• Dedicated mode that can manage clusters.
Condor uses resource ClassAds to advertise resources.
Job requirements are specified in the job submission script.
The Condor matchmaker matches the job requirements against the resource requirements.
The Condor software is available for free on the web.
Hundreds of Condor installations are used worldwide in both academia and industry.
Universes:
Condor has the notion of an “execution universe”.
Each universe is an environment that supports a particular set of functionalities.
Condor has a number of these:
• Standard - a more advanced environment that supports checkpointing and process migration; use condor_compile to relink the executable.
• Vanilla - very basic execution functionality designed to support plain serial applications.
• Parallel - designed to run parallel jobs (e.g. MPI) on dedicated clusters.
ClassAds:
ClassAds allow Condor to match job resource requests with the most appropriate resource for running the job.
Users (jobs) have constraints - Job ClassAds:
• “must run on Linux… have 1 GB of memory available… 50 GB of disk space”
Owners (machines) wish to advertise their resources but also have constraints - Machine ClassAds:
• “this cluster consists of 32 Nodes… 2GHz Opterons… reject jobs from Ligang”
Condor matchmaker reviews the Job ClassAds and the Machine ClassAds, then identifies a match that satisfies the requirements of both.
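The two-sided nature of matchmaking can be sketched in a few lines of Python. This is only an illustration: real ClassAds are written in Condor's own expression language, and the attribute names, the user "alice" and the machines below are hypothetical. The point it shows is that a match needs the job's constraints to accept the machine and the machine's constraints to accept the job.

```python
# Illustrative only: real ClassAds are Condor expressions, not Python callables.
# All attribute names and values below are hypothetical.

def matches(job, machine):
    """A match requires each side's constraint to accept the other side."""
    return job["requirements"](machine) and machine["requirements"](job)

job_ad = {
    "owner": "alice",
    # "must run on Linux, have 1 GB of memory, 50 GB of disk space"
    "requirements": lambda m: m["opsys"] == "LINUX"
    and m["memory_mb"] >= 1024
    and m["disk_gb"] >= 50,
}

machine_ads = [
    # This owner rejects jobs from one particular user.
    {"name": "node1", "opsys": "LINUX", "memory_mb": 2048, "disk_gb": 80,
     "requirements": lambda j: j["owner"] != "ligang"},
    # This machine accepts any job but has too little memory for job_ad.
    {"name": "node2", "opsys": "LINUX", "memory_mb": 512, "disk_gb": 80,
     "requirements": lambda j: True},
]

matched = [m["name"] for m in machine_ads if matches(job_ad, m)]
print(matched)
```

Only node1 satisfies both sides here; the same job submitted by the rejected user would match nothing.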
Submitting (from Condor manual):
Write a condor job submission script to inform the system about the job:
• includes the executable, universe, input, output and error files to use, command-line arguments, requirements.
Can describe many jobs at once, each with different parameters.
# Simple condor_submit input file
Universe = vanilla
Executable = my_job
Queue
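One way to describe many jobs at once is to repeat the per-job commands before each Queue statement. The Python sketch below generates such a description; the helper name submit_description, the argument values and the run_N.stdout file names are hypothetical and chosen only for illustration.

```python
def submit_description(executable, argument_sets, universe="vanilla"):
    """Build a submit description that queues one job per argument string."""
    lines = [f"Universe = {universe}", f"Executable = {executable}"]
    for i, args in enumerate(argument_sets):
        # Each job gets its own arguments and output file, then is queued.
        lines += [f"Arguments = {args}", f"Output = run_{i}.stdout", "Queue"]
    return "\n".join(lines) + "\n"

text = submit_description("my_job", ["-X 10", "-X 20", "-X 30"])
print(text)
```

The generated text could be saved to a file and passed to condor_submit; each Queue statement submits one job with the settings in effect at that point.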
Condor Submission Process:
After you have submitted the job submission script to Condor, it generates a ClassAd that describes your job(s).
It is then up to Condor to identify the most appropriate match between the generated job ClassAd and the Machine ClassAds.
# Example condor_submit input file
Universe = vanilla
Executable = /home/dps/condor/dan-job.condor
Input = myjob.stdin
Output = myjob.stdout
Error = myjob.stderr
Arguments = -X 10 -Y 20 -Z 30
InitialDir = /home/condor/exp-1.0
Requirements = Memory >= 1024 && Disk > 50000
Rank = (KFLOPS*10000) + Memory
Queue
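The Requirements and Rank lines above work together: Requirements filters out unsuitable machines, and Rank orders the ones that remain, with a higher value preferred. A rough Python sketch of that selection, using three made-up machines (the names and numbers are assumptions, not real pool data):

```python
# Hypothetical machine attributes; only the attribute names come from the
# submit file above.
machines = [
    {"Name": "a", "Memory": 2048, "Disk": 80000, "KFLOPS": 500},
    {"Name": "b", "Memory": 4096, "Disk": 90000, "KFLOPS": 400},
    {"Name": "c", "Memory": 512,  "Disk": 90000, "KFLOPS": 900},
]

def requirements(m):
    # Requirements = Memory >= 1024 && Disk > 50000
    return m["Memory"] >= 1024 and m["Disk"] > 50000

def rank(m):
    # Rank = (KFLOPS*10000) + Memory
    return m["KFLOPS"] * 10000 + m["Memory"]

candidates = [m for m in machines if requirements(m)]
best = max(candidates, key=rank)   # higher rank is preferred
print(best["Name"], rank(best))
```

Machine c is the fastest but fails the memory requirement, so it is never ranked; of the remaining two, a wins because the KFLOPS term dominates the expression.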
Condor daemons
condor_master
• Responsible for keeping the rest of the Condor daemons running on each machine in a pool.
• The master spawns the other daemons.
• Runs on every machine in your Condor pool.
condor_startd
• Any machine that wants to execute jobs needs to have this daemon running.
• Advertises a machine ClassAd.
• Responsible for enforcing the policy under which jobs are started, suspended, resumed, vacated, or killed.
condor_starter
• Spawned by condor_startd.
• Sets up the execution environment, creates a process to run the user job, and monitors the running job.
• Upon job completion, sends status information back to the submitting machine and exits.
condor_schedd
• Any machine that allows users to submit jobs needs to have this daemon running.
• Users submit jobs to the condor_schedd, where they are stored in the job queue.
• condor_submit, condor_q, and condor_rm connect to the condor_schedd to view and manipulate the job queue.
condor_shadow
• Created by condor_schedd.
• Runs on the machine where a job was submitted.
• Any system call performed on the remote execute machine is sent over the network to this daemon; the shadow performs the system call (such as file I/O) on the submit machine and sends the result back over the network to the remote job.
condor_collector
• Collects all the information about the status of a Condor pool.
• All other daemons periodically send updates to the collector.
• These updates contain all the information about the state of the daemons, the resources they represent, or the resource requests of submitted jobs.
• condor_status connects to this daemon for information.
condor_negotiator
• Responsible for all the matchmaking within the Condor system.
• Responsible for enforcing user priorities in the system.
Most jobs are not independent:
Dependencies exist between jobs.
Second stage cannot start until first stage has completed.
Condor uses DAGMan - Directed Acyclic Graph Manager
DAGMan allows you to specify dependencies between your Condor jobs, so it can manage them automatically for you.
DAGs are the data structures used by DAGMan to represent these dependencies.
Each job is a “node” in the DAG.
Each node can have any number of “parent” or “children” nodes – as long as there are no loops.
(example from Condor tutorial).
A DAG is defined by another text file, listing each of the nodes and their dependencies:
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
Each node will run the Condor job specified by an accompanying Condor submit file.
[Figure: diamond-shaped DAG with Job A at the top, Job B and Job C in the middle, and Job D at the bottom]
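The order in which the diamond DAG can run follows directly from its Parent/Child lines. The sketch below is not DAGMan itself; it uses Python's standard graphlib module (Python 3.9 or later) to work out which nodes become ready at each stage, pretending every job finishes as soon as it is released.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each node to the set of parents it must wait for (from diamond.dag).
parents = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

sorter = TopologicalSorter(parents)
sorter.prepare()  # raises graphlib.CycleError if the graph contains a loop

stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # nodes whose parents have all finished
    stages.append(ready)
    sorter.done(*ready)                 # pretend these jobs completed

print(stages)
```

B and C appear in the same stage because neither depends on the other, so they may run concurrently once A has completed; D waits for both.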
High Performance Computing Course Notes 2007-2008
Cluster Technologies
Computing Platforms Evolution: Breaking Administrative Barriers
[Figure: performance plotted against administrative barriers. Platforms: Desktop (Single Processor?), SMPs or Supercomputers, Local Cluster, Enterprise Cluster/Grid, Global Cluster/Grid, Inter Planet Cluster/Grid (??). Administrative barriers: Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe.]
Design Space of Competing Computer Architecture
Original Food Chain Picture
1984 Computer Food Chain
Mainframe
Vector Supercomputer
Mini Computer
Workstation
PC
1994 Computer Food Chain
[Figure labels: Mainframe, Vector Supercomputer, MPP, Workstation, PC, Mini Computer. Annotations on the figure: "hitting wall soon", "future is bleak", "Obsolete".]
Computer Food Chain (Now and Future)
Computational Power Improvement
[Figure: C.P.I. plotted against No. of Processors (1, 2, ...), with one curve for Multiprocessor and one for Uniprocessor]
Human Physical Growth Analogy: Computational Power Improvement
[Figure: Growth plotted against Age (5 to 45 and beyond), with regions labelled Vertical and Horizontal]
Why Clusters now?
Clustering gained a new wave of interest when three technologies converged:
1. Very high performance microprocessors
• workstation performance = yesterday's supercomputers
2. High speed communication
3. Standard tools for parallel/distributed programming
Cluster Components...1 Nodes
Multiple High Performance Components:
• PCs
• Workstations
• SMPs (CLUMPS)
• Distributed HPC Systems
They can be based on different architectures and run different OSes.
But usually, the nodes in a cluster are homogeneous - they have the same architecture and performance and are installed with the same OS.
Cluster Components...2 OS
State of the art OS:
Linux (Beowulf)
Microsoft NT (Illinois HPVM)
SUN Solaris (Berkeley NOW)
IBM AIX (IBM SP2)
HP UX (Illinois - PANDA)
Mach (Microkernel based OS) (CMU)
Cluster Components...3 High Performance Networks
Ethernet (10Mbps), Fast Ethernet (100Mbps), Gigabit Ethernet (1Gbps)
SCI
Myrinet
Infiniband
FDDI
Cluster Components...4 Network Interfaces
Network Interface Card
Myrinet NIC
Ethernet NIC
…
Cluster Components...5 Communication Software
Traditional OS-supported facilities (heavyweight due to protocol processing):
Sockets (TCP/IP), Pipes, etc.
Lightweight protocols (user level):
Active Messages (Berkeley)
Fast Messages (Illinois)
U-net (Cornell)
XTP (Virginia)
Systems can be built on top of the above protocols
Cluster Components...6 Cluster Middleware
Resides between the OS and applications and supports:
Single System Image (SSI)
System Availability (SA)
SSI makes the collection appear as a single machine (globalised view of system resources), e.g. ssh cluster.myinstitute.edu
SA - checkpointing and process migration.
Cluster Components...7 Programming environments
Threads (PCs, SMPs, NOW..)
POSIX Threads
Java Threads
MPI
Linux, NT, on many Supercomputers
PVM
DSMs
Cluster Components...8 Development Tools
Compilers
C/C++/Java
Parallel programming with C++ (MIT Press book)
Debuggers
Performance Analysis Tools
Visualization Tools
Cluster Components...9 Applications
Sequential
Parallel / Distributed
Grand Challenge applications
• Weather Forecasting
• Computational Fluid Dynamics
• Molecular Biology Modeling
• Engineering Analysis (CAD/CAM)
• ...
Web servers, data mining
Cluster Architectures
Cluster system architecture
Clusters Classification..1
Based on Focus (in Market)
High Performance (HP) Clusters
• Grand Challenge applications
High Availability (HA) Clusters
• Mission Critical applications
Clusters Classification..2
Based on Workstation/PC Ownership
Dedicated Clusters
Non-dedicated clusters
Clusters Classification..3
Based on Node Architecture
Clusters of PCs (CoPs)
Clusters of Workstations (COWs)
Clusters of SMPs (CLUMPs)
Clusters Classification..4
Based on Node OS Type
Linux Clusters (Beowulf)
Solaris Clusters (Berkeley NOW)
NT Clusters (HPVM)
AIX Clusters (IBM SP2)
SCO/Compaq Clusters (Unixware)
HP clusters, ………………..
Clusters Classification..5
Based on node components architecture & configuration (Processor Arch, Node Type: PC/Workstation.. & OS: Linux/NT..):
Homogeneous Clusters
• All nodes will have similar configuration
Heterogeneous Clusters
• Nodes based on different processors and running different OSes.
Clusters Classification..6 Levels of Clustering
Group Clusters (#nodes: 2-99) (a set of dedicated/non-dedicated computers - mainly connected by SAN like Myrinet)
Departmental Clusters (#nodes: 99-999)
Organizational Clusters (#nodes: many 100s)
Internet-wide Clusters = Global Clusters (#nodes: 1000s to many millions)
What is Single System Image (SSI)?
A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource.
SSI makes the cluster appear like a single machine to the user, to applications, and to the network.
A cluster without an SSI is not a cluster.
A typical Cluster Computing Environment
[Figure: stack diagram with layers labelled Application, PVM / MPI / SSH, ??? and Hardware/OS]
The missing link is provided by cluster middleware/underware
[Figure: stack diagram with layers labelled Application, PVM / MPI / SSH, Middleware and Hardware/OS]
Single Entry Point
ssh cluster.my_institute.edu
ssh node1.cluster.institute.edu
Benefits of Single System Image
• Usage of system resources transparently
• Transparent process migration and load balancing across nodes
• Improved reliability and higher availability
• Improved system response time and performance
• Simplified system management
• Reduction in the risk of operator errors
• Users need not be aware of the underlying system architecture to use these machines effectively