Ian Lumb Bright Evangelist
Bright Cluster Manager: A Comprehensive, Integrated Management Solution for Parallel Universes Today and Tomorrow
In My Parallel Universe …
In my parallel universe, parallel computing at extreme scale is easy!
• Scientists focus on science, engineers on engineering
• No problem is out of computational reach
• Coding has been deprecated!
  – Problems are stated in the natural language of the discipline
    » Implementation suggestions/guidelines are optional
  – 'Heuristic algorithms' take care of the implementation specifics (i.e., the coding)
• Resources are plentiful!
  – Physical constraints (e.g., power, cooling & space) have been eliminated
  – Generic processors to specialized coprocessors are readily available
  – Resource management is completely transparent
Parallel Computing via Bright Cluster Manager
Provisions, monitors and manages all neo-heterogeneous resources
• Systems, storage, interconnects, etc.
Management, parallelized
• Adaptive provisioning in real time
• Topologically based monitoring
• Fault tolerance via high availability
• One GUI for multiple clusters and clouds
Development simplified
• Tools and libraries available
• Workloads managed
Architecture
[Figure: Bright Cluster architecture diagram. Labels: CMDaemon; SOAP+SSL; Cluster Management GUI; Cluster Management Shell; Web-Based User Portal; Third-Party Applications; head node; node001, node002, node003.]
[Figure: software stack diagram. Labels: Cluster Management GUI, Cluster Management Shell, User Portal; SSL / SOAP / X509 / IPtables; Cluster Management Daemon; SLES / RHEL / CentOS / SL.]
Bright Cluster Manager: Elements
[Figure: elements diagram. Labels:
• Operating systems: SLES / RHEL / CentOS / SL
• ScaleMP vSMP
• Cluster management: Provisioning, Monitoring, Automation, Health Checks, Management
• Workload managers: SLURM, Torque/Maui, Torque/MOAB, PBS Pro, Grid Engine, LSF
• Development: Compilers, Libraries, Debuggers, Profilers
• Hardware: CPU, MIC, Memory, PDU, IPMI/iLO, Interconnect, Ethernet, Disk]
Management Interface
Graphical User Interface (GUI)
• Offers administrator full cluster control
• Standalone desktop application
• Manages multiple clusters simultaneously
• Runs natively on Linux, Windows and MacOS
Cluster Management Shell (CMSH)
• All GUI functionality also available through Cluster Management Shell
• Interactive and scriptable in batch mode
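The "scriptable in batch mode" point can be illustrated with a small sketch. It only builds the argument list for a non-interactive CMSH call and does not run anything. It assumes that `cmsh` accepts a `-c` option taking semicolon-separated commands; both that option and the `device status node001` command are assumptions to be checked against the Bright administrator manual.

```python
def cmsh_batch_argv(*commands):
    """Build an argv list for a one-shot, non-interactive cmsh call.

    Assumption: cmsh takes '-c' followed by a single string of
    semicolon-separated commands. Nothing is executed here.
    """
    if not commands:
        raise ValueError("at least one cmsh command is required")
    return ["cmsh", "-c", "; ".join(commands)]


# Illustrative command only; the exact cmsh syntax is an assumption.
argv = cmsh_batch_argv("device status node001")
```

The resulting list could be handed to `subprocess.run` on a head node where `cmsh` is installed.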
Intel Xeon Phi Integration
Everything needed to enable Xeon Phi on a cluster is packaged as easy-to-install Bright packages:
• Xeon Phi driver
• Xeon Phi runtime
• Xeon Phi SDK
• Xeon Phi OFED
• Xeon Phi flash utilities
Environment modules ensure that user environment is set up perfectly (PATH, LD_LIBRARY_PATH, ...)
Xeon Phi driver recompiled automatically against running kernel at boot-time
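To make the environment-module point concrete, here is a minimal sketch of what loading a module amounts to: putting a package's directories first in PATH and LD_LIBRARY_PATH. It is not how Environment Modules is implemented, and the install prefix `/opt/example/mic` is a hypothetical placeholder, not a real Bright path.

```python
import os


def prepend_path(env, var, directory):
    """Return a copy of env with directory placed first in the path-like variable var."""
    updated = dict(env)
    # Drop empty entries and any earlier copy of the directory to avoid duplicates.
    parts = [p for p in updated.get(var, "").split(os.pathsep) if p and p != directory]
    updated[var] = os.pathsep.join([directory] + parts)
    return updated


# Hypothetical install prefix, used only for illustration.
MIC_PREFIX = "/opt/example/mic"


def load_mic_module(env):
    """Mimic a module load: expose the package's bin and lib directories."""
    env = prepend_path(env, "PATH", MIC_PREFIX + "/bin")
    env = prepend_path(env, "LD_LIBRARY_PATH", MIC_PREFIX + "/lib")
    return env
```

Returning a new dictionary instead of mutating `os.environ` keeps the sketch free of side effects. A real modulefile would express the same idea declaratively.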
Intel Xeon Phi Integration
Set-up wizard takes care of initial Xeon Phi configuration (e.g. creating bridge interfaces, assigning IP addresses)
Xeon Phi appears as a first-class device type in cluster management infrastructure
Xeon Phi can be configured, controlled and monitored through CMSH and CMGUI
Xeon Phi is automatically added to the workload management system as a consumable resource
Compute jobs may request Xeon Phi resource in job script
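As an illustration of requesting a Xeon Phi in a job script, the sketch below renders a small Slurm batch script. Slurm's `--gres=name:count` syntax is real, but the resource name `mic` is an assumption: the actual name depends on how the site's workload manager is configured. The `./offload_app` command is a placeholder.

```python
def render_job_script(command, phi_count=1, gres_name="mic"):
    """Render a Slurm batch script that asks for phi_count Xeon Phi cards.

    gres_name is an assumed generic-resource name; check the cluster's
    Slurm configuration for the name actually in use.
    """
    if phi_count < 1:
        raise ValueError("phi_count must be at least 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --gres={gres_name}:{phi_count}",  # consumable resource request
        command,
    ]
    return "\n".join(lines) + "\n"


# Placeholder application name, for illustration only.
script = render_job_script("./offload_app", phi_count=2)
```

Other workload managers express the same request with their own resource syntax.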
Architecture: Monitoring
[Figure: Bright Cluster monitoring architecture diagram. Labels: CMDaemon; metrics; data; raw data; consolidated data; BMC; Cluster Management GUI; Cluster Management Shell; Web-Based User Portal; Third-Party Applications; head node; node001, node002, node003.]
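The monitoring diagram distinguishes raw data from consolidated data without saying how consolidation works. One common approach, assumed here purely for illustration, is to average raw samples over fixed-width time intervals:

```python
from collections import defaultdict
from statistics import mean


def consolidate(samples, interval):
    """Average (timestamp, value) samples into fixed-width time buckets.

    Returns a list of (bucket_start, mean_value) sorted by bucket start.
    Averaging is an assumed policy; min, max or last-value would also work.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of its bucket.
        buckets[timestamp - timestamp % interval].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]
```

Consolidating older samples into coarser buckets trades resolution for storage, which is why both data sets are typically kept side by side.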
Cluster Health Management
Goal: provide problem-free environment for running jobs
Regular health checks
• Actions that return PASS, FAIL or UNKNOWN
• Can be associated with a settable severity and a message
• Can launch an action based on any response value
Pre-job health checks
16 Xeon Phi health checks included by default
Jobs will only be scheduled to nodes where Xeon Phi is working properly (as determined by health checks)
Intel Cluster Checker included to verify that cluster is set up properly
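The health-check model above can be sketched in a few lines: a check returns PASS, FAIL or UNKNOWN, carries a severity and a message, and may trigger an action for any response value. All class and function names below are hypothetical and do not reflect the Bright API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class Result(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    UNKNOWN = "UNKNOWN"


@dataclass
class HealthCheck:
    name: str
    probe: Callable[[str], Result]          # takes a node name, returns a Result
    severity: int = 10                      # settable severity
    message: str = ""                       # settable message
    actions: Dict[Result, Callable[[str], None]] = field(default_factory=dict)

    def run(self, node: str) -> Result:
        try:
            result = self.probe(node)
        except Exception:
            # A probe that cannot complete is reported as UNKNOWN, not FAIL.
            result = Result.UNKNOWN
        action = self.actions.get(result)
        if action is not None:
            action(node)                    # launch the action tied to this response
        return result


def schedulable_nodes(nodes, checks):
    """Pre-job filter: keep nodes on which every check passes.

    Checks for a node stop at the first one that does not pass.
    """
    return [n for n in nodes if all(c.run(n) is Result.PASS for c in checks)]
```

In use, a FAIL action might drain the node or notify an administrator, while the pre-job filter keeps the scheduler from placing work there.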
Intel Xeon Phi Workload Management
Three ways to run Xeon Phi jobs:
• Offload (i.e. Xeon Phi is used as coprocessor from host)
• Native (i.e. job executes entirely on Xeon Phi)
• Symmetric (i.e. communicating processes on both host and Xeon Phi)
Offload: Xeon Phi represented as consumable resource in workload management system
Native: Ported Slurm to Xeon Phi
Symmetric: work in progress, will require some changes to workload managers
Additional work in progress: make sure Xeon Phi is not used in multiple modes simultaneously
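The last point, keeping a Xeon Phi from being used in several modes at once, can be pictured as a mode lock on the card. The sketch below is an illustration only, not Bright's or any workload manager's implementation. It assumes that several jobs may share a card as long as they all use the same mode.

```python
class PhiCard:
    """A Xeon Phi card that admits jobs in only one mode at a time (hypothetical model)."""

    MODES = ("offload", "native", "symmetric")

    def __init__(self):
        self.mode = None      # mode of the jobs currently holding the card
        self.jobs = set()     # identifiers of jobs currently using the card

    def acquire(self, job_id, mode):
        """Try to admit a job; return False if the card is busy in another mode."""
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        if self.jobs and self.mode != mode:
            return False
        self.mode = mode
        self.jobs.add(job_id)
        return True

    def release(self, job_id):
        """Remove a job; the mode lock clears once the card is idle."""
        self.jobs.discard(job_id)
        if not self.jobs:
            self.mode = None
```

A stricter policy of one job per card would drop the job set and make `acquire` fail whenever the card is held at all.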
Cherry Creek
Bright Cluster Manager makes it easy to install, manage and use clusters with Intel Xeon
Phi coprocessors.
Questions?