ERLANGEN REGIONAL COMPUTING CENTER
J. Eitzinger, T. Röhl, W. Hesse, A. Jeutter, E. Focht
15.12.2015
FEPA: Project status and further steps
Motivation
§ Cluster administrators employ monitoring to
§ Detect errors or faulty operation
§ Observe total system utilization
§ Application developers use (mostly GUI) tools to do performance profiling
Primary Target: Provide a monitoring infrastructure that allows for continuous, system-wide application performance and energy profiling based on hardware performance counter measurements.
FEPA: A flexible framework for energy and performance analysis of highly parallel applications in the computing center
Objectives
§ Detect applications with pathological performance behavior
§ Help to identify applications with large optimization potential
§ Give users feedback about application performance
§ Ease access to hardware performance counter data
STATUS
RRZE (Thomas Röhl)
§ Support for new architectures: Intel Silvermont, Intel Broadwell and Broadwell-EP, Intel Skylake
§ Improved overflow detection (including RAPL)
§ Improved documentation with many new examples (Cilk+, C++11 threads)
§ More performance groups and validated metrics for many architectures
§ Improvements in likwid-bench and likwid-mpirun
§ New access layer to support platform-independent code (x86, Power, ARM)
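The platform-independent access layer can be pictured as a thin architecture-neutral interface behind which x86, Power, and ARM backends hide their register-level details. A minimal Python sketch of the idea; all class, method, and counter names here are hypothetical illustrations, not LIKWID's actual API:

```python
from abc import ABC, abstractmethod

class CounterAccess(ABC):
    """Architecture-neutral access to hardware performance counters.
    Backends hide the platform details (MSRs on x86, PMU registers
    on Power/ARM)."""

    @abstractmethod
    def read(self, cpu: int, counter: str) -> int: ...

    @abstractmethod
    def write(self, cpu: int, counter: str, value: int) -> None: ...

class X86MsrAccess(CounterAccess):
    """Toy x86 backend: keeps counter state in a dict instead of
    touching /dev/cpu/*/msr, purely for illustration."""
    def __init__(self):
        self._regs = {}
    def read(self, cpu, counter):
        return self._regs.get((cpu, counter), 0)
    def write(self, cpu, counter, value):
        self._regs[(cpu, counter)] = value

def make_access(arch: str) -> CounterAccess:
    # Power/ARM backends would plug into this table
    backends = {"x86": X86MsrAccess}
    return backends[arch]()

acc = make_access("x86")
acc.write(0, "FIXC0", 12345)
print(acc.read(0, "FIXC0"))  # -> 12345
```

Code that programs and reads counters only sees `CounterAccess`, so porting to a new ISA means adding one backend rather than touching every caller.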
NEC (Andreas Jeutter)
AggMon
[Architecture diagram: per-node collectors feed per-group aggregators, which write into NoSQL DB stores with sharding and replication; a controller and the resource scheduler instantiate a program tagger and per-job aggregation at job start and kill them when the job stops.]
§ Componentized
§ Fully distributed
§ Separate processes: truly parallel
§ Implemented in Python
§ Connected through ZeroMQ
AggMon: Collector
[Collector pipeline diagram: a modified gmond pushes metrics via ZMQ PUSH into the collector's ZMQ PULL queue; a tagger stage matches and publishes messages via ZMQ PUSH; RPC calls add/remove tags and subscribe/unsubscribe consumers.]
Messages: JSON-serialized dicts/maps
Tagger: adds a key-value pair to a message, based on a match condition
Subscribe: based on a match condition (key-value, key-value regex)
Throughput: O(50k) msg/s in, O(10k) msg/s out
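The tagger and subscription matching described above can be sketched in plain Python. The message shape and helper names are assumptions for illustration; AggMon's actual code differs:

```python
import json
import re

def matches(msg, condition):
    """A match condition maps keys to expected values or compiled
    regexes; every entry must match the message."""
    for key, want in condition.items():
        have = msg.get(key)
        if have is None:
            return False
        if isinstance(want, re.Pattern):
            if not want.search(str(have)):
                return False
        elif have != want:
            return False
    return True

def tag(msg, condition, key, value):
    """Tagger: add a key-value pair when the match condition holds."""
    if matches(msg, condition):
        msg[key] = value
    return msg

# Messages travel as JSON-serialized dicts
raw = json.dumps({"host": "node042", "metric": "flops", "value": 3.2e9})
msg = json.loads(raw)
tag(msg, {"host": re.compile(r"^node\d+$")}, "group", "rack1")
print(msg["group"])  # -> rack1
```

A subscriber would register such a condition and receive only the messages for which `matches` returns True.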
AggMon: Data Store
§ TokuMX: MongoDB compatible
§ Collections can be sharded
§ Spread documents over different mongod instances
§ Entry point: any mongos instance
§ Replication (for example master-slave) is possible
[Data store diagram: one mongod per group master, arranged per rack (rack1, rack2, rack3, ...); mongos routers and a configsvr coordinate sharding with shard key { group: rack1, ... }; throughput O(10k) msg/s.]
LRZ (Wolfram Hesse, Carla Guillen)
Successful completion of C. Guillen's doctorate:
§ Validation of the performance patterns used
§ Statistical evaluation of the performance patterns
§ Documentation of the PerSyst monitoring system
Knowledge-based Performance Monitoring for Large Scale HPC Architectures; dissertation, C. Guillen Carias; 2015; http://mediatum.ub.tum.de?id=1237547
LRZ: PerSyst Status
§ PerSyst monitoring is in production at SuperMUC Phase I + II
§ Definition and implementation of the performance patterns for Phase 1 (Westmere-EX, SandyBridge-EP) and Phase 2 (Haswell-EP)
§ Used and verified by:
§ The LRZ application support group and IBM staff
› Notification of users when obvious bottlenecks are present, plus suggestions for optimizations
› Screening of applications for extreme scaling and benchmarks
§ SuperMUC users
› Positive feedback regarding usefulness
§ Implementation of the PerSyst web frontend at RRZE
ONGOING WORK
Integrate the complete stack at RRZE
Validate performance patterns from profiling data
Current Questions
§ How to deal with an established monitoring infrastructure (Ganglia)?
§ Easy: use existing monitoring infrastructures
§ Target: replace existing software with the FEPA stack
§ Concerns about the large overhead of continuous HPM profiling
§ Overhead could be lower with a better interface to HPM (ISA, OS)
§ Missing knowledge about overheads in general
§ Picking the right building blocks:
§ Backend daemon: diamond (https://github.com/python-diamond/Diamond)
§ Communication protocol: ZeroMQ (http://zeromq.org)
§ Storage: TokuMX (NoSQL)
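One way to address the missing knowledge about overheads is to time the sampling path itself. A minimal sketch, where a dummy function stands in for a real hardware counter read and all names are made up for illustration:

```python
import time

def read_counter():
    """Stand-in for a hardware counter read; a real backend would
    read an MSR or a perf_event file descriptor here."""
    return time.perf_counter_ns()

def sampling_overhead(samples=10000):
    """Median cost of one counter read, in nanoseconds."""
    costs = []
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        read_counter()
        costs.append(time.perf_counter_ns() - t0)
    costs.sort()
    return costs[len(costs) // 2]

print(f"median read cost: {sampling_overhead()} ns")
```

Comparing this figure against the sampling interval gives a rough bound on the profiling overhead per core, which is the number the discussion above says is currently missing.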
Integration of FEPA components
§ Target system: 80-node Nehalem cluster in normal production use
Objectives:
§ Sort out issues between components
§ Validate and benchmark the solution:
§ diamond
§ mongoDB/TokuMX
§ Liferay-framework-based PerSyst frontend
§ Experiment on application profiling data:
§ Required granularity for phase detection
§ Performance pattern validation on a set of known codes
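The question of the required granularity for phase detection can be explored with a simple change-point heuristic over a metric time series: flag a phase boundary wherever the rolling mean shifts by more than a threshold. Window size and threshold below are illustrative choices, not FEPA parameters:

```python
def phase_boundaries(series, window=5, threshold=0.5):
    """Indices where the mean of the next `window` samples differs
    from the mean of the previous `window` by more than `threshold`."""
    bounds = []
    for i in range(window, len(series) - window):
        prev = sum(series[i - window:i]) / window
        nxt = sum(series[i:i + window]) / window
        if abs(nxt - prev) > threshold:
            bounds.append(i)
    return bounds

# Synthetic trace: a compute phase (high IPC) followed by an I/O phase
trace = [2.0] * 20 + [0.3] * 20
print(phase_boundaries(trace))  # boundaries cluster around index 20
```

The coarser the sampling, the fewer points fall inside each phase, so this kind of experiment directly shows the minimum granularity at which short phases remain detectable.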
Conclusion and Outlook
§ Layers are ready to be integrated into the complete stack
§ Convergence on the choice of external building blocks
§ LRZ PerSyst system in production use
Next:
§ Continue integrating the stack to make FEPA ready to be deployed at associated HPC centers
§ Validate FEPA on a set of known benchmarks (Mantevo, NPB, SPEC)
ERLANGEN REGIONAL COMPUTING CENTER
Thank You.
Leibniz-Rechenzentrum
NEC Deutschland GmbH
Regionales Rechenzentrum Erlangen