
10/20/15 1

High Performance Computing: Lessons Learned in Academia and Industry

William J. Brouwer Senior Software Geophysicist

Schlumberger

So Cal. Simulations in Science


Outline
● Career Overview
● HPC
  ● Definitions, challenges
● Selected Projects
  – GPU Applications
  – Plot2txt
● Technology
  – Embedded
  – Networks
    ● Mesh/Ad-hoc
  – Cloud
    ● AWS
● Tools
● Summary


Career Overview

● Began life as an experimentalist
  ● BSc Honors (UQ / Australia)
    – Optical diagnostics / shock tunnels / scramjet
  ● PhD (W&M / Virginia)
    – Solid-state nuclear magnetic resonance / condensed matter
● Moving to computation
  ● Postdoctoral (Penn State)
    – NMR, again
    – Machine learning / databases / ab initio computations


Career Overview
● A shadowgraph of a shock wave forming around a cylinder, produced in a shock tunnel using a custom-built Cranz-Schardin camera (Honors year)


Career Overview
● As a grad student & postdoc, performed multiple-quantum magic-angle spinning (MQMAS) NMR experiments and wrote software for lineshape simulation & parameter extraction


Career Overview
● As a postdoc, started working on software to mine data from images … more on this later


Career Overview
● Life after postdoc
● Generally engaged in solving problems with HPC:
  – Independent oil + gas shops/consultancies
  – HPC unit at Penn State
  – Schlumberger
    ● Microseismic monitoring
      – Forward modeling, e.g., finite differences
      – Network/IO
      – Parallel & embedded computing
      – GPU (visualization & computation)
      – Native & managed code optimization
      – Algorithms
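The forward-modeling item mentions finite differences. The sketch below is a minimal illustration under simplifying assumptions (1D, constant velocity c, a unit impulse source, fixed ends), not the production modeling code; the function and parameter names are hypothetical. It implements the standard explicit second-order update with Courant number C = c·dt/dx, which is stable for C ≤ 1:

```python
def fd_wave_1d(nx, nt, c, dx, dt, src_index):
    """Propagate a unit impulse with the 1D constant-velocity wave equation.

    Explicit scheme, second-order central differences in space and time:
        u[i]^(n+1) = 2u[i]^n - u[i]^(n-1) + C^2 (u[i+1]^n - 2u[i]^n + u[i-1]^n)
    with Courant number C = c*dt/dx. Both ends are held at zero.
    """
    courant = c * dt / dx
    if courant > 1.0:
        raise ValueError(f"unstable: Courant number {courant:.3f} > 1")
    c2 = courant * courant

    def laplacian(u, i):
        return u[i + 1] - 2.0 * u[i] + u[i - 1]

    # Initial displacement: unit impulse at the source grid point
    u_curr = [0.0] * nx
    u_curr[src_index] = 1.0
    # Zero initial velocity: Taylor-expand backwards in time to build level n-1
    u_prev = [0.0] * nx
    for i in range(1, nx - 1):
        u_prev[i] = u_curr[i] + 0.5 * c2 * laplacian(u_curr, i)

    for _ in range(nt):
        u_next = [0.0] * nx  # boundary values stay fixed at zero
        for i in range(1, nx - 1):
            u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * laplacian(u_curr, i)
        u_prev, u_curr = u_curr, u_next
    return u_curr


u = fd_wave_1d(nx=101, nt=10, c=1.0, dx=1.0, dt=1.0, src_index=50)
print(u[40], u[60])  # prints 0.5 0.5
```

At C = 1 the 1D scheme is free of numerical dispersion: the impulse splits into two half-amplitude pulses, each moving one cell per step. Real seismic forward modeling works in 2D/3D with heterogeneous velocity models, which is where the compute cost, and the need for HPC, comes from.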


Career Overview
● Microseismic monitoring → continuously record > 1k channels of low-amplitude/low-SNR data and locate seismic events in time and space
● An inverse problem that requires many aspects of high performance computing
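To make the inverse problem concrete, here is a deliberately simplified sketch: a 2D homogeneous medium with constant velocity, noise-free picked arrival times, and a brute-force grid search over trial source positions. These are assumptions for illustration, not a description of the production microseismic workflow; `locate_event` and its arguments are hypothetical names.

```python
import math


def locate_event(receivers, arrivals, velocity, grid):
    """Grid-search location of a seismic event in a homogeneous medium.

    receivers: list of (x, y) positions in metres
    arrivals:  observed arrival times in seconds, one per receiver
    velocity:  constant propagation speed in m/s
    grid:      iterable of candidate (x, y) source positions
    Returns (misfit, (x, y), origin_time) for the best candidate.
    """
    best = None
    for src in grid:
        travel = [math.dist(src, r) / velocity for r in receivers]
        # For a fixed trial location, the least-squares origin time is the
        # mean of (observed arrival - predicted travel time).
        t0 = sum(t - tt for t, tt in zip(arrivals, travel)) / len(arrivals)
        misfit = sum((t - t0 - tt) ** 2 for t, tt in zip(arrivals, travel))
        if best is None or misfit < best[0]:
            best = (misfit, src, t0)
    return best


# Synthetic check: four corner receivers, event at (300, 700) m, origin time 0.5 s
receivers = [(0, 0), (1000, 0), (0, 1000), (1000, 1000)]
true_src, v = (300, 700), 3000.0
arrivals = [0.5 + math.dist(true_src, r) / v for r in receivers]
grid = [(x, y) for x in range(0, 1001, 50) for y in range(0, 1001, 50)]
misfit, src, t0 = locate_event(receivers, arrivals, v, grid)
```

Each trial location is evaluated independently, so the search parallelizes trivially across cores or GPU threads; realistic versions replace straight-ray travel times with layered or 3D velocity models and fit noisy data rather than clean picks.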


HPC: definitions
● Traditionally refers to parallel computation, but a more general definition is hardware + software combinations that are performant and scalable.
● HPC requires attention to one or more of the following:
  ● Application software itself and libraries
  ● I/O, e.g., network, disk
  ● Memory models, e.g., distributed, chosen to suit the problem
  ● Data storage, compression, databases, etc.
  ● Kernel fundamentals, e.g., limiting the impact of system calls
  ● Processor type, e.g., ARM, x86, GPU, etc.
  ● Hardware configuration, e.g., x86 nodes with InfiniBand (IB) interconnect, distributed ARM, etc.
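As an illustration of the kernel-fundamentals item about system calls, the Python sketch below (helper names are hypothetical) writes the same records either with one `os.write()` call per record or by coalescing them in a user-space buffer first. On POSIX systems each `os.write()` is one `write(2)` system call, so batching cuts the number of kernel crossings.

```python
import os
import tempfile


def _write_all(fd, data):
    """Write every byte of data, retrying on short writes.

    Returns the number of os.write() calls made (one write(2) syscall each).
    """
    calls = 0
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]
        calls += 1
    return calls


def write_unbatched(fd, records):
    """Naive approach: at least one system call per record."""
    return sum(_write_all(fd, rec) for rec in records)


def write_batched(fd, records, batch_bytes=64 * 1024):
    """Coalesce records in a user-space buffer; write once per full buffer."""
    calls = 0
    buf = bytearray()
    for rec in records:
        buf += rec
        if len(buf) >= batch_bytes:
            calls += _write_all(fd, bytes(buf))
            buf.clear()
    if buf:
        calls += _write_all(fd, bytes(buf))  # flush the tail
    return calls


records = [b"%06d\n" % i for i in range(10_000)]
with tempfile.TemporaryFile() as f1, tempfile.TemporaryFile() as f2:
    n_unbatched = write_unbatched(f1.fileno(), records)
    n_batched = write_batched(f2.fileno(), records)
    f1.seek(0)
    f2.seek(0)
    same_bytes = f1.read() == f2.read() == b"".join(records)
print(n_unbatched, n_batched, same_bytes)
```

The same idea underlies buffered I/O libraries generally; the batch size is a tuning knob that trades memory for fewer calls.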


Metabolic Networks
● (project with A. Khodayari) Optimal models for the metabolic networks of microbial organisms important in the pharma and energy industries
● Ensemble Modeling (EM) is used to construct the chemical kinetics of microbial organisms → metabolic reactions are decomposed into elementary mechanisms, which are ODE systems $f(k_i, y_j) = dy_j/dt$
● The overall approach maximizes the correlation between model predictions and experimental measurements, performed at steady state → solve f(k,y) = 0


Metabolic Networks

● Devised a hybrid CPU/GPU solver:
  ● [CPU] Parse equations f(k,y)
  ● [CPU] Differentiate f(k,y) to create the analytic Jacobian J(k,y)
  ● [CPU] Populate data structures representing f(k,y) and J(k,y), copy to GPU
  ● [GPU] Iterate (Newton-Raphson):
    ● Numerically evaluate f(k,y) and J(k,y) by parallel reduction
    ● Solve J(k,y) · delta = -f(k,y) for delta using GMRES
    ● Update y += delta and repeat until ||f(k,y)|| < tol
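The Newton-Raphson loop can be sketched on the CPU in a few lines. This is a minimal illustration under stated assumptions: a hypothetical two-species kinetic system with made-up rate constants, a hand-derived Jacobian standing in for the automatic differentiation step, and dense Gaussian elimination swapped in for GMRES (the original solves the linear step with GMRES on the GPU). Each step solves J(y)·δ = −f(y) and updates y += δ:

```python
import math


def solve_dense(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def newton_raphson(f, jac, y, tol=1e-12, max_iter=50):
    """Find y with f(y) = 0: solve J(y) . delta = -f(y), then y += delta."""
    for _ in range(max_iter):
        fy = f(y)
        if math.sqrt(sum(v * v for v in fy)) < tol:
            return y
        delta = solve_dense(jac(y), [-v for v in fy])
        y = [yi + di for yi, di in zip(y, delta)]
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical kinetics with rate constants k = (k1, k2, k3):
#   dy1/dt = k1 - k2*y1*y2,   dy2/dt = k2*y1*y2 - k3*y2
K1, K2, K3 = 1.0, 2.0, 0.5


def f(y):
    y1, y2 = y
    return [K1 - K2 * y1 * y2, K2 * y1 * y2 - K3 * y2]


def jac(y):
    """Analytic Jacobian d f_i / d y_j of the system above."""
    y1, y2 = y
    return [[-K2 * y2, -K2 * y1], [K2 * y2, K2 * y1 - K3]]


steady = newton_raphson(f, jac, [1.0, 1.0])
# Analytic steady state: y1 = k3/k2 = 0.25, y2 = k1/k3 = 2.0
```

For larger sparse systems, an iterative Krylov solver such as GMRES (as in the original) avoids dense factorization and maps better onto the GPU; the outer Newton structure stays the same.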


Metabolic Networks


● Solution uses various libraries including Boost, Thrust, CUSP and CUDA

● The matrices are sparse and poorly conditioned, but the solution works well for O(10^2) equations

● Extending this to O(10^3) equations is challenging, largely due to numerical/convergence issues


Molecular Dynamics + Sim Anneal


● (project with P.-Y. Taunay) Solve for MD potentials by fitting experimental structure factor data

● The optimization surface (see figure) is highly non-convex → use simulated annealing, with each GPU performing an independent MD run
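The annealing strategy can be sketched as follows. In the original work each cost evaluation is an MD run compared against the measured structure factor; here a cheap, highly non-convex 1D function (Rastrigin) stands in for that, and the proposal step, temperature schedule and names are illustrative assumptions:

```python
import math
import random


def simulated_annealing(cost, x0, step, t0=10.0, cooling=0.995, n_iter=5000, seed=0):
    """Minimize a scalar cost(x) by simulated annealing.

    Random-walk proposals, Metropolis acceptance and geometric cooling;
    returns the best (x, cost) pair visited.
    """
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best_x, best_f = x, fx
    temp = t0
    for _ in range(n_iter):
        cand = x + rng.uniform(-step, step)
        fc = cost(cand)
        # Downhill moves are always taken; uphill moves with prob. exp(-dF/T),
        # which lets the walker escape local minima while T is still high.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling
    return best_x, best_f


def rastrigin_1d(x):
    """Highly non-convex stand-in cost; global minimum 0 at x = 0."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


best_x, best_f = simulated_annealing(rastrigin_1d, x0=4.3, step=0.5)
```

Independent chains with different seeds (or different starting potentials) can run concurrently, one per GPU, keeping the best result.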


LU Decomposition


● (project with S. Ganesh Jaya) Fractional quantum Hall effect → fundamental physics with implications for quantum computation and materials science

● O(N!) determinants need to be evaluated in constructing the wavefunction, a process repeated many times in a Monte Carlo calculation

● Small, dense matrices of side <= 512

● Created a GPU implementation that exploits the SIMD architecture and parallel reduction

● Example: for N = 11, the computation time for 1024 Monte Carlo iterations using 8 GPU devices (with MPI) is ~246 s, down from ~31,488 s on a single CPU (a ~128× speedup)
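A determinant comes almost for free from an LU factorization: with partial pivoting, det(A) is the product of U's diagonal, negated once per row swap. The pure-Python sketch below shows the arithmetic for a batch of small dense matrices; it is not the GPU implementation described above, and the function names are hypothetical:

```python
def lu_determinant(a):
    """Determinant via LU factorization with partial pivoting.

    det(A) = (-1)^(number of row swaps) * product of U's diagonal.
    """
    n = len(a)
    m = [row[:] for row in a]  # factor a copy in place
    det = 1.0
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        if m[piv][k] == 0.0:
            return 0.0  # singular matrix
        if piv != k:
            m[k], m[piv] = m[piv], m[k]
            det = -det  # each row swap flips the sign
        det *= m[k][k]
        for r in range(k + 1, n):
            m[r][k] /= m[k][k]  # multiplier l_rk, stored where the zero would go
            for c in range(k + 1, n):
                m[r][c] -= m[r][k] * m[k][c]
    return det


def batched_determinants(matrices):
    """Each matrix is independent of the others, which is what lets a GPU
    process many small factorizations concurrently; here it is a plain loop."""
    return [lu_determinant(a) for a in matrices]


dets = batched_determinants([
    [[1.0, 2.0], [3.0, 4.0]],                             # det = -2
    [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]],  # det = 24
    [[1.0, 2.0], [2.0, 4.0]],                             # singular, det = 0
])
```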


QR Decomposition


● Another matrix decomposition integral to solving systems of equations, but also relevant to the SVD

● The SVD can be computed using the QR algorithm, which is in turn built on repeated QR decomposition

● Devised a unique approach for large batches of small dense matrices using Givens rotations; the largely independent operations map well to the GPU

● Results of the LU & QR work: "Efficient Batch LU & QR Decomposition on GPU", Brouwer/Taunay, in "Numerical Computations with GPUs", Kindratenko (Springer)
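A sketch of QR by Givens rotations, plus the unshifted QR algorithm built on it, in pure Python. This is the textbook sequential formulation, not the batched GPU approach from the paper; the names are hypothetical. For the SVD, the same kind of iteration is typically applied after a bidiagonal reduction (or, less stably, to AᵀA, whose eigenvalues are the squared singular values):

```python
import math


def givens_qr(a):
    """QR factorization of an m x n matrix (m >= n) using Givens rotations.

    Returns (q, r) with a = q r, q orthogonal (m x m), r upper triangular.
    Each rotation touches only two rows, which is why batches of small
    matrices suit fine-grained parallel hardware.
    """
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):  # zero column j from the bottom up
            x, y = r[i - 1][j], r[i][j]
            if y == 0.0:
                continue
            h = math.hypot(x, y)
            c, s = x / h, y / h
            for k in range(n):  # rotate rows i-1 and i of r: r[i][j] -> 0
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
            for k in range(m):  # accumulate q <- q G^T on columns i-1 and i
                left, right = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * left + s * right
                q[k][i] = -s * left + c * right
    return q, r


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def qr_eigenvalues(a, iters=200):
    """Unshifted QR algorithm: A_{k+1} = R_k Q_k is similar to A_k and, for a
    symmetric A with eigenvalues of distinct magnitude, tends to diagonal form."""
    for _ in range(iters):
        q, r = givens_qr(a)
        a = matmul(r, q)
    return [a[i][i] for i in range(len(a))]


a_tall = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q, r = givens_qr(a_tall)
eigs = qr_eigenvalues([[2.0, 1.0], [1.0, 2.0]])  # exact eigenvalues: 3 and 1
```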


Quantum Espresso


● Density Functional Theory (DFT) has enjoyed huge growth in popularity owing to computational and numerical advancements; it is used widely in materials science

● Quantum Espresso (QE) is an open-source DFT package that has recently added GPU acceleration, largely through BLAS and FFT routines

● When building QE with MAGMA (UT/ORNL) or phiGEMM, one introduces heterogeneous CPU/GPU linear algebra routines

● Quantum Espresso and similar codes that require a Self-Consistent Field (SCF) step use matrix diagonalization, e.g., the Lanczos method (closely related to the power method)

● Lanczos consists largely of matrix-vector operations, which are very amenable to GPUs; this was tested using cuBLAS & MKL in a heterogeneous solution
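The structure of the Lanczos iteration makes the GPU argument clear: apart from cheap vector updates, every step costs one matrix-vector product. The sketch below assumes a real symmetric matrix supplied only through a hypothetical `matvec` callback (on a GPU that callback would be a BLAS gemv or a sparse equivalent); it is illustrative, not QE's implementation:

```python
import math
import random


def lanczos(matvec, n, k, seed=0):
    """Run k steps of Lanczos on a symmetric operator given only matvec(v) = A v.

    Returns (alphas, betas, basis): the diagonal and off-diagonal of the k x k
    tridiagonal matrix T = V^T A V, and the orthonormal Lanczos vectors V.
    Full re-orthogonalization is used for robustness; production codes often
    keep only the three-term recurrence.
    """
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    basis = [[x / norm for x in v]]
    alphas, betas = [], []
    for j in range(k):
        w = matvec(basis[j])  # the dominant cost: one matrix-vector product
        alphas.append(sum(a * b for a, b in zip(w, basis[j])))
        if j == k - 1:
            break
        # Remove components along all previous Lanczos vectors
        for u in basis:
            proj = sum(a * b for a, b in zip(w, u))
            w = [a - proj * b for a, b in zip(w, u)]
        beta = math.sqrt(sum(x * x for x in w))
        if beta < 1e-12:
            break  # Krylov subspace is invariant; T is complete
        betas.append(beta)
        basis.append([x / beta for x in w])
    return alphas, betas, basis


A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]


def matvec(v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


alphas, betas, basis = lanczos(matvec, n=4, k=4)
```

The eigenvalues of the small tridiagonal T (Ritz values) approximate the extreme eigenvalues of A after far fewer than n steps, which is what makes the method attractive for large problems.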


Lanczos Diagonalization


● CUDA 5.5 on Kepler gave good communication results for 1-2 GPUs (CUDA-aware Open MPI 1.7.3, MPI send/recv), but scaling dropped off rapidly beyond that

● These I/O challenges may be resolved by the release of NVLink

[Figure: bandwidth (GB/s) vs. increasing message size (MB), within a single application]

● Results of 4 tests
  ● RHEL 6, Intel x86_64, Nvidia driver 331.38
  ● Communication between K20 & K40


NVLINK


plot2txt


● We live in an increasingly data-driven world, and much of that data is represented as figures and other objects in documents

● However, data represented in traditional document images is 'lost' for the purposes of search and further reuse, since the underlying storage is binary

● Plot2txt (P2T) automatically converts technical figures back into useful textual data representations

● P2T uses a combination of new and traditional image analysis algorithms, heuristics and unsupervised machine learning

● Can even work with objects such as chemical structures and molecules


plot2txt

Input: a PDF document collection, split into pages, which are handed to p2t instances and processed

Output: spectra in CSV, molecules in BMP images

[Diagram: pdf → pages → parallel p2t instances]
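The page-level parallelism in the diagram can be sketched with Python's standard `concurrent.futures`. `process_page` is a hypothetical placeholder for a p2t instance and the document format is assumed; only the split-and-farm-out structure follows the slide:

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pages(documents):
    """Yield (doc_id, page_no, page_data) work items; the input format is hypothetical."""
    for doc_id, pages in documents.items():
        for page_no, page_data in enumerate(pages, start=1):
            yield doc_id, page_no, page_data


def process_page(item):
    """Hypothetical stand-in for one p2t instance analysing a single page."""
    doc_id, page_no, page_data = item
    return {"doc": doc_id, "page": page_no, "size": len(page_data)}


def run_pipeline(documents, workers=4):
    # Pages are independent work items, so they can be farmed out to a pool;
    # Executor.map yields results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_page, split_into_pages(documents)))


results = run_pipeline({"a.pdf": [b"x", b"yy"], "b.pdf": [b"zzz"]})
```

Threads keep the sketch simple; CPU-heavy image analysis would more likely use `ProcessPoolExecutor` or separate hosts, with the same map-over-pages structure.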


plot2txt


● Example performance (work performed for the Royal Society of Chemistry, 2013):
  ● Input: 74 documents / 3444 pages
  ● Output: p2t algorithms extracted content from 1069 page instances
    – 578 molecules
      ● ~10% false positives, e.g., classifying the Bruker logo as a chemical object
      ● ~20% false negatives, e.g., missing some symbols from a structure
    – 1151 spectra
      ● > 80% of peaks extracted to within 1-2 decimal places (ppm)


ARM/embedded


● Computing using ARM/embedded devices is very much in vogue:
  ● 'Internet of Things'
  ● Lower power consumption/cost
● The Nvidia Jetson TK1 provides an excellent resource for test and development
  ● Kepler GPU with 192 cores, quad-core ARM Cortex-A15 CPU (2.3 GHz), 16 GB eMMC storage, 2 GB RAM, Ubuntu Linux
  ● Performance is excellent, e.g., 6+ GFLOP/s for DGEMM, 40+ GFLOP/s for SGEMM
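The DGEMM/SGEMM figures are rates computed from a conventional operation count: an n × n matrix multiply performs n³ multiply-adds, counted as 2n³ floating-point operations. The sketch shows that accounting; the pure-Python timing helper (a hypothetical name) is only a harness and will report rates orders of magnitude below a tuned BLAS:

```python
import random
import time


def gemm_flops(n):
    """Conventional flop count for C = A @ B with n x n matrices: 2n^3."""
    return 2 * n ** 3


def gflops(n, seconds):
    """Achieved rate in GFLOP/s for an n x n GEMM that took `seconds`."""
    return gemm_flops(n) / seconds / 1e9


def time_naive_gemm(n, seed=0):
    """Time a pure-Python matrix multiply and report its GFLOP/s."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    b_cols = list(zip(*b))  # columns of b, so each entry is a dot product of two sequences
    start = time.perf_counter()
    c = [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]
    elapsed = time.perf_counter() - start
    return c, gflops(n, elapsed)


# The accounting: a 1000 x 1000 GEMM finishing in 2 s runs at 1 GFLOP/s
print(gflops(1000, 2.0))  # prints 1.0
```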


ARM/embedded


● Comparison of eig and matrix multiplication:
  ● For small matrices (the ARM CPU has a small cache), performance is comparable to an older-generation Intel CPU (Westmere), even without GPU acceleration


Network


● The network is significant for a variety of reasons, including:
  ● MPI communication/distributed computing
  ● Networked disk I/O
  ● Telemetry
● A variety of protocols are used for the latter, including the ubiquitous Transmission Control Protocol (TCP)
● Unlike UDP, TCP has the advantage of data integrity and in-order delivery; disadvantages include congestion control mechanisms that throttle traffic when the network becomes congested
● Currently working on strategies at the application layer for avoiding congestion control & ensuring high bandwidth under adverse conditions
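The slides do not say which application-layer strategies are under investigation. One well-known approach, assumed here purely for illustration, is to stripe data across several parallel TCP connections so that a loss-induced slowdown on one flow does not stall the whole transfer. The sketch shows only the framing and reassembly logic; `socket.socketpair()` stands in for real TCP connections, and the frame format and function names are hypothetical:

```python
import socket
import struct

# Frame header: sequence number (uint32) + payload length (uint16), network byte order
HEADER = struct.Struct("!IH")


def stripe_send(payload, socks, chunk=1024):
    """Split payload into numbered chunks and deal them round-robin across sockets."""
    for seq, offset in enumerate(range(0, len(payload), chunk)):
        part = payload[offset:offset + chunk]
        socks[seq % len(socks)].sendall(HEADER.pack(seq, len(part)) + part)
    for s in socks:
        s.shutdown(socket.SHUT_WR)  # tell the receiver this stream is finished


def _recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closes first."""
    buf = b""
    while len(buf) < n:
        data = sock.recv(n - len(buf))
        if not data:
            return None
        buf += data
    return buf


def stripe_recv(socks):
    """Drain every socket until EOF, then reassemble chunks by sequence number."""
    chunks = {}
    for s in socks:
        while (hdr := _recv_exact(s, HEADER.size)) is not None:
            seq, length = HEADER.unpack(hdr)
            chunks[seq] = _recv_exact(s, length)
    return b"".join(chunks[i] for i in range(len(chunks)))


# Local demo: socketpair() stands in for three real TCP connections
pairs = [socket.socketpair() for _ in range(3)]
payload = bytes(range(256)) * 40  # 10 KiB test payload
stripe_send(payload, [a for a, _ in pairs])
received = stripe_recv([b for _, b in pairs])
for a, b in pairs:
    a.close()
    b.close()
```

Because the socketpairs buffer this small payload entirely, the demo can send first and read afterwards; a real receiver would service all connections concurrently (threads or `selectors`).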


Network


● Wireshark & tcpdump are powerful tools for capturing and studying network traffic


Network


● Telemetry often begins in adverse and/or sparsely networked environments, where mesh or ad-hoc networks are the only solution

● Such networks are created between wireless devices, with each node acting as a router; a good open-source example is Batman/OpenMESH: http://www.open-mesh.org/projects/open-mesh/wiki

● Wireless routing protocol implementations generally operate on layer 3, exchanging routing information via UDP packets and modifying the kernel routing table

● In contrast, batman-adv (a Linux kernel module operating on layer 2) captures and forwards all traffic until it reaches the destination, emulating a virtual network switch of all participating nodes

● All nodes therefore appear link-local, unaware of the network's topology and unaffected by network changes, which provides resilience under change
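To illustrate how nodes can route without knowing the topology, here is a heavily simplified model of B.A.T.M.A.N.-style originator messages (OGMs): each originator floods an announcement, and every node remembers only which neighbour delivered it first. Real B.A.T.M.A.N. uses periodic OGMs with sequence numbers and a link-quality (TQ) metric rather than first arrival; the function names and graph format are hypothetical:

```python
from collections import deque


def flood_ogms(links):
    """links: dict node -> set of neighbour nodes (symmetric links assumed).

    Returns next_hop[node][originator]: the neighbour a node forwards to when
    sending towards that originator, learned only from OGM arrivals.
    """
    next_hop = {node: {} for node in links}
    for origin in links:
        # The originator broadcasts an OGM; each node rebroadcasts the first
        # copy it hears and remembers which neighbour delivered it.
        seen = {origin}
        queue = deque([origin])
        while queue:
            sender = queue.popleft()
            for nbr in sorted(links[sender]):
                if nbr not in seen:
                    seen.add(nbr)
                    next_hop[nbr][origin] = sender
                    queue.append(nbr)
    return next_hop


def route(next_hop, src, dst):
    """Follow per-node next hops; no node ever consults the full topology."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop[path[-1]][dst])
    return path


mesh = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B", "D"}, "D": {"B", "C"}}
before = route(flood_ogms(mesh), "A", "D")
# Link B-D disappears; the next flood yields new next hops automatically
mesh_after = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
after = route(flood_ogms(mesh_after), "A", "D")
```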




Cloud/AWS


● Cloud isn't really profound: it's just 'someone else's computer'
● However, it promises 'instant scalability' in a world where demand for compute cycles and storage is increasing rapidly
● Vendors like AWS have enormous footprints, which for all intents and purposes look like an infinitely large compute/storage sink
● AWS offers a rich stack and many ways to leverage cloud compute and storage
● Several technologies are worth investigating:
  ● Simple Storage Service (S3)
  ● Elastic Compute Cloud (EC2)
  ● Lambda functions
  ● DynamoDB


Cloud/AWS


● Lambda functions act as backend services for client applications

● Event-driven, e.g., tied to the Simple Storage Service (S3)

● Uploading content to an S3 bucket triggers a Lambda function, which may or may not work on the file content(s)

● Very responsive (~1-100 ms after data is received), 'infinitely' scalable

● Execution time was limited to 60 s at the time, but Lambda is certainly useful for short-running tasks, or for long-running tasks that can be solved using a divide-and-conquer strategy
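A minimal Python handler for the S3-triggered case might look like the sketch below. The record layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) follows the S3 event notification format; the handler name, bucket, key and the "processing" step are hypothetical, and no AWS SDK calls are made:

```python
import urllib.parse


def handler(event, context):
    """Entry point for S3 'object created' notifications (sketch).

    The event carries one record per object; bucket name and object key are
    read from record["s3"]. Keys arrive URL-encoded, e.g. spaces as '+'.
    """
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        # Hypothetical work step: a real function might fetch the object with
        # the AWS SDK here and process (a slice of) its contents.
        processed.append({"bucket": bucket, "key": key, "size": size})
    return {"processed": processed}


# Local test with a trimmed-down, hypothetical S3 event (no AWS access needed)
sample_event = {"Records": [{"s3": {"bucket": {"name": "example-bucket"},
                                   "object": {"key": "shots/shot+01.dat", "size": 2048}}}]}
result = handler(sample_event, context=None)
```

For the divide-and-conquer pattern, a large job can be split into many objects (or byte ranges) so that each upload triggers its own short invocation within the execution-time limit.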


Cloud/AWS


● There are many entry points to AWS, with bindings to common languages. The web console is the best place to start, e.g., for examining a DynamoDB (NoSQL) table


Tools
● It is important to profile application codes in order to find and target bottlenecks for optimization:
  ● Valgrind (memory), Helgrind (threads), Cachegrind (cache)
  ● VTune
    – Native & managed code, Linux and Windows
● Projects large and small benefit from management tools, e.g., Trac, JIRA (Atlassian), TFS (Microsoft)
● Use Google Test for unit tests in order to support Test Driven Development (TDD)
● Know and use libraries; don't reinvent the wheel, e.g.:
  ● Intel MKL / AMD ACML / OpenBLAS / cuBLAS for linear algebra
  ● Boost
  ● FFTW
  ● Open MPI


Summary


● In academia, researchers understandably wish to focus on their particular domain; there is a need for teaching and promoting HPC
● There is often a shortage of people to do the work/provisioning needed to meet user demand
● Cloud offers a solution, although code and workflows need to adapt
● XSEDE (formerly TeraGrid) has been offering HPC resources for some time and is an excellent model for distributing compute cycles
  ● Start-up cycles (limited service units) are available almost immediately
  ● Larger allocations by proposal/review
  ● 'Campus champions' also have cycles they can give away to users


Summary


● In the oil + gas industry, much investment is made in:
  ● networks, ensuring high data throughput, and
  ● compute infrastructure,
  in order to provide answers to large-scale problems quickly
● Obviously, workflows and solutions are driven heavily by business demands
● Simulations generally involve forward modeling, e.g., wave motion in complex environments, ultimately to help solve inverse problems
● The development environment is challenging, fast-paced and constantly evolving
● Fred Brooks says it best →


Summary


“...the technological base on which one builds is always advancing... implementation of real products demands phasing and quantizing... The challenge and the mission are to find real solutions to real problems on actual schedules with available resources.

This then is programming, both a tar pit in which many efforts have floundered and a creative activity with joys and woes all its own”

Fred Brooks in 'The Mythical Man-Month'


Acknowledgements


● {J. Nielsen & team, N. Thompson, P. Primiero}, SLB

● Mark Berger, Nvidia

● Tony Williams, EPA

● Pierre-Yves Taunay, Princeton

● {Ryan Eagen/Cowen group, Ali Khodayari/Maranas group, Sreejith Jaya Ganesh, Jim Kubicki, Dan Haworth, Adri Van Duin} PSU