a comparison of library tracking methods in high performance computing computer system cluster and...
TRANSCRIPT
A Comparison of Library Tracking Methods in High Performance
Computing
Computer System Cluster and Networking Summer Institute 2013Poster Seminar
William Rosenberger (New Mexico Tech), Dennis Trujillo (New Mexico State University)
Chris DeJager (Michigan Technological University)
LA-UR-13-25963
Need for a Library Tracking Utility
• HPC requires the support of many software packages, and multiple versions must be supported at the same timeo Issues
License availability Additional licensing cost Admin support
• No existing methods of tracking user software behavior
• Current user tracking methods don't typically illustrate true use
• Eliminate less utilized software and compiler options
5/3
0/0
9
6/2
/09
6/5
/09
6/8
/09
6/1
1/0
9
6/1
4/0
9
6/1
7/0
9
6/2
0/0
9
6/2
3/0
9
6/2
6/0
9
6/2
9/0
9
7/2
/09
7/5
/09
7/8
/09
7/1
1/0
9
7/1
4/0
9
0
100
200
300
400
500
600
Simulated New Software Version Acceptance
openmpi-1.5.4openmpi-1.6.5U
se
rs
5/3
0/0
9
6/2
/09
6/5
/09
6/8
/09
6/1
1/0
9
6/1
4/0
9
6/1
7/0
9
6/2
0/0
9
6/2
3/0
9
6/2
6/0
9
6/2
9/0
9
7/2
/09
7/5
/09
7/8
/09
7/1
1/0
9
7/1
4/0
9
0
100
200
300
400
500
600
Simulated New Software Version Re-jection
gcc-4.4.4
gcc-4.4.7Us
ers
Potential Tracking Solutions
Automatic Library Tracking Database• Developed at National Institute of Computational
Science for use on Cray systems• Presented at Cray User Group 2010• Wraps the linker 'ld' and job launcher 'aprun'• Stores to MySQL databaseLinux Auditing Utility (auditd)
• System daemon• Tracks specified files for access and or writes• Stores to log files
Linux Auditing Utility (auditd)
• Provides log information on targeted files or directorieso Object access and useo Stores to log fileso Provides commands for searching log data
• Originally meant for security
• Root access required to add files to be tracked
• Log file is readable by a non-root linux group
ALTD OperationMySQL Database
Linkline TableTable Data:• Linkline_id: Reference number to this
record• Unique list of libraries used in a
compilation
Link Tag TableTable Data:• Tag_id: Reference number to this record• Reference number to the Linkline entry.• Username, date linked, build machine
Jobs TableTable Data:• Reference number to the Link Tag table• Run machine, date run, username
Wrapper ScriptsLinker Wrapper
• Gets list of libraries used• Creates Linkline entry• Creates Link Tag entry
Job Launch Wrapper• Pulls reference number to the Link Tag
table from the ALTD assembly header• Creates Jobs table entry
ALTD Alterations
• Support added for wrapping mpirun and mvapich2 job launchers, while still allowing aprun
• Detects version of prefered MPI wrapped compiler and MPI executable and wraps accordingly
• Created script to query the database and collect the number of times a library has been compiled and run
• Made ALTD entries generic across servers
ALTD
Pros:• Data is well organized into
structured database• Theoretical constant low
overhead• Tracks all libraries
automatically
Cons:• Requires a SQL database• Tracks libraries used at
compile time, not necessarily libraries used at run time
auditdPros: • Standard Linux daemon• Well tested by the Linux
community• Logs to a flat file
Cons:
• Dynamically linked libraries cannot be reliably tracked, the .so files are cached in memory and not read at every run
• Statically linked libraries can’t be track at runtime as they are in the executable
• A single make can generate numerous log entries
System Overhead Testing
• Interested in increase in compile and run time of different solutions over a control time
• Linpack was run while varying the problem size and the number of processors working on the problem
• A simple "do nothing" MPI program was also tested to investigate launch time more easily
4 9 16 25 36 49 64Number of Processors
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
Additional Runtime for a Simple MPI Program
ALTD
auditd
Tim
e (s
)
2*2 3*3 4*4 5*5 6*6 7*7 8*8
0102030405060708090
100
500
1500
2500
3500
4500
5500
6500
7500
Linpack Runtime (Control)
Processor Grid (P*Q)
Ru
nti
me
(s
)
Pro
ble
m S
ize
(N
)
2*2 3*3 4*4 5*5 6*6 7*7 8*8
0102030405060708090
100
500
1500
2500
3500
4500
5500
6500
7500
Linpack Runtime with ALTD
Processor Grid (P*Q)
Ru
nti
me
(s
)
Pro
ble
m S
ize
(N
)
2*2 3*3 4*4 5*5 6*6 7*7 8*8
0102030405060708090
100
500
1500
2500
3500
4500
5500
6500
7500
Linpack Runtime with auditd
Processor Grid (P*Q)
Ru
nti
me
(s
)
Pro
ble
m S
ize
(N
)
2*2 3*3 4*4 5*5 6*6 7*7 8*8-0.2
0
0.2
0.4
0.6
0.8
1
5001500 2500 3500 4500 5500 6500 7500
Additional Runtime for Linpack withALTD
Processor Grid (P*Q)
Ru
nti
me
Dif
fere
nc
e (
s)
Pro
ble
m S
ize
(N
)
Average: 0.121 (s)
2*2 3*3 4*4 5*5 6*6 7*7 8*8-0.2
0
0.2
0.4
0.6
0.8
1
5001500 2500 3500 4500 5500 6500 7500
Additional Runtime for Linpack with auditd
Processor Grid (P*Q)
Ru
nti
me
Dif
fere
nc
e (
s)
Pro
ble
m S
ize
(N
)
Average: 0.00304 (s)
Simple MPI Program Linpack0
0.05
0.1
0.15
0.2
0.25
Additional Compilation Time Required
ALTD
auditd
Tim
e (s
)
Summary
• ALTDo The Automatic Library Tracking Database has the
potential to track libraries well in HPCo Performance overhead is minimal and seems to be
constant, regardless of job sizeo Additions to the software have provided for simplicity
in utilization and data evaluation.
• auditdo Outperforms ALTD in some cases but falls behind
compiling larger programs o auditd is not sufficient to track all types of libraries
during compilation and runtime
We would like to thank the following individuals for their support throughout the workshop:
Georgia PediciniDavid GunterDane GardnerMatt BroomfieldCarol HogsettJosephine OlivasCarol ConnorGary Grider
Thanks!