Porting VisIt to BG/P. Brad Whitlock, October 14, 2009. www.vacet.org


Post on 20-Dec-2015


TRANSCRIPT

  • Slide 1
  • Porting VisIt to BG/P. Brad Whitlock, October 14, 2009. www.vacet.org
  • Slide 2
  • Overview
      ◦ Objectives
      ◦ Building 3rd party libraries
      ◦ Building VisIt
      ◦ Running VisIt on BG/P
      ◦ Improvements
      ◦ Impact
      ◦ Future work
  • Slide 3
  • Objectives
      ◦ Port VisIt to IBM's BlueGene/P platform so VisIt can run on LLNL's Dawn and eventually Sequoia
      ◦ Dawn is a 500 teraflop, 36,864-node, 147,456-CPU IBM BG/P system
      ◦ Four 850 MHz PowerPC cores per node, 4 GB memory per node
      ◦ Compute nodes run the CNK OS
      ◦ Cross-compile code for CNK
      ◦ Identify weaknesses in VisIt that prevent it from scaling to tens or hundreds of thousands of processors
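The machine figures on this slide can be cross-checked with a few lines of arithmetic. This is only a sanity check of the quoted numbers; the variable names are illustrative, and the per-core figure is the nominal share of node memory, not the memory actually usable by an application.

```python
# Figures quoted on the Objectives slide.
NODES = 36_864
CORES_PER_NODE = 4
MEM_PER_NODE_GB = 4

# Total core count should match the "147,456 cpu" figure on the slide.
total_cores = NODES * CORES_PER_NODE

# Nominal memory share per core if a node's memory is split evenly.
mem_per_core_gb = MEM_PER_NODE_GB / CORES_PER_NODE

print(total_cores)      # 147456
print(mem_per_core_gb)  # 1.0
```

The nominal share is 1 GB per core; slide 6 reports that only about 700 MB per processor was available in practice.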
  • Slide 4
  • Building 3rd party libraries
      ◦ Built all libraries on login nodes for the regular Linux PowerPC version of VisIt
      ◦ Ran into runtime problems using the xlC compiler, so reverted to g++ for the time being
      ◦ Cross-compiled all libraries for CNK
      ◦ No support for this platform in VisIt's 3rd party libraries, so special builds were required
      ◦ Mesa built unmangled and without X11
      ◦ VTK tricky to build
      ◦ No OpenGL, so VTK built with Mesa as its OpenGL
      ◦ No X11, so created a custom render window
      ◦ Used a CMake toolchain file
  • Slide 5
  • Building VisIt
      ◦ No X11, so graphical components can't be built for CNK (don't build gui)
      ◦ Added a new --enable-engine-only build mode to VisIt's build system that builds only the compute engine and its plugins
      ◦ VisIt always used to require mangled Mesa
      ◦ This support had to become conditional on VTK having mangled Mesa support
  • Slide 6
  • Running VisIt on Dawn
      ◦ Dawn uses mpirun to start VisIt on compute nodes
      ◦ Minor differences required environment variables to be exported via the mpirun command, which could be handled via a host profile in VisIt
      ◦ VisIt ran at 1k, 2k, 4k, 8k, and 16k nodes
      ◦ VisIt ran with 1 and 4 trillion zone datasets (June 2009)
      ◦ Encountered scaling problems early
      ◦ Launch time was slow because each processor was reading the plugin directory to obtain plugin information
      ◦ VisIt commands were sent from rank 0 to other ranks 1 KB at a time until the whole message was sent
      ◦ The non-spinning bcast substitute used for sending commands relied on point-to-point communication that performed poorly at scale
      ◦ Certain metadata consumed too much memory (each processor has only ~700 MB)
      ◦ The synchronization step for SR (scalable rendering) mode used slow point-to-point communication
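To see why point-to-point fan-out from rank 0 hurts at these node counts, here is a rough cost model in Python. It is a sketch under stated assumptions, not a description of VisIt's code or of the MPI library on Dawn: it assumes 4 MPI ranks per node (the slides give 4 cores per node but not the rank layout), it counts only sequential communication steps, and it uses a binomial tree as one common shape for a scalable broadcast. The function names are hypothetical.

```python
RANKS_PER_NODE = 4  # assumption: one MPI rank per core


def linear_fanout_steps(num_ranks: int) -> int:
    """Sequential sends rank 0 issues when it contacts every other rank itself."""
    return max(num_ranks - 1, 0)


def binomial_tree_rounds(num_ranks: int) -> int:
    """Rounds needed when every rank that already holds the data forwards it
    to one new rank per round, i.e. ceil(log2(num_ranks)) in integer math."""
    return (num_ranks - 1).bit_length() if num_ranks > 1 else 0


# Node counts listed on the slide (1k through 16k nodes).
for nodes in (1024, 2048, 4096, 8192, 16384):
    ranks = nodes * RANKS_PER_NODE
    print(nodes, ranks, linear_fanout_steps(ranks), binomial_tree_rounds(ranks))
```

Under these assumptions the 16k-node run has 65,536 ranks, so a linear fan-out takes 65,535 sequential sends where a tree takes 16 rounds. The real gap depends on message size, network topology, and the collective algorithm the MPI library actually selects.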
  • Slide 7
  • Improvements
      ◦ Broadcast plugin information from rank 0 to other ranks, improving plugin loading time 9x
      ◦ Broadcast VisIt commands from rank 0 in a single chunk instead of 1 KB at a time
      ◦ Use the standard bcast in the engine main loop instead of the poorly performing non-spinning substitute, which was geared towards shared nodes
      ◦ Switched to an alternate metadata representation to free up most available memory for calculations
      ◦ Mark Miller replaced the SR mode synchronization step with a much faster version that reduced the time from 20 minutes to 2 seconds
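The effect of sending a command in one chunk instead of 1 KB pieces can be illustrated by counting collective calls. The sketch below is a counting model only. The 100 KB command size is a hypothetical example, and the "length first, then payload" scheme is an assumption about how a single-chunk broadcast might be arranged, not something the slides specify.

```python
CHUNK_BYTES = 1024  # the 1 KB chunk size mentioned on the slides


def chunked_broadcasts(message_bytes: int, chunk_bytes: int = CHUNK_BYTES) -> int:
    """Broadcast calls needed when the message goes out one fixed-size chunk at a time."""
    return -(-message_bytes // chunk_bytes)  # ceiling division


def single_chunk_broadcasts(message_bytes: int) -> int:
    """Broadcast calls when rank 0 first announces the length, then sends the
    whole payload at once (assumed scheme)."""
    return 2 if message_bytes > 0 else 0


command_bytes = 100 * 1024  # hypothetical 100 KB command
print(chunked_broadcasts(command_bytes))       # 100
print(single_chunk_broadcasts(command_bytes))  # 2
```

Each broadcast call carries a fixed latency cost that is paid across every rank, so fewer, larger calls tend to matter more as the rank count grows.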
  • Slide 8
  • Impact
      ◦ So far this project's impact has been small for customers
      ◦ They do not yet run on Dawn
      ◦ They might not notice small improvements at today's everyday processor counts (4k)
      ◦ Optimizations added by this work prevent bottlenecks in the compute engine, improving scalability
  • Slide 9
  • Future work
      ◦ Resolve load problems with the xlC compiler so we can use the best optimizations, including BG/P's dual FPUs
      ◦ Improve the 3rd party library build process for BG/P by adding support in the build_visit script
      ◦ Continue profiling plots and improving performance
      ◦ Reduce memory usage where possible
      ◦ Investigate I/O patterns and attempt optimizations