accelerating sanjeevini: a drug discovery …on-demand.gputechconf.com/gtc/2018/presentation/s...•...

ACCELERATING SANJEEVINI: A DRUG DISCOVERY SOFTWARE SUITE

Abhilash Jayaraj, IIT Delhi; Bharatkumar Sharma, Nvidia; Shashank Shekhar, IIT Delhi; Nagavijayalakshmi, Nvidia



AGENDA

• Quick introduction to the computer-aided drug discovery software Sanjeevini

• Challenges

    • Code documentation is still being improved

    • Code is maintained by non-computer-science researchers

    • Code was designed for distributed programming

• Constraints

    • Code modification should be minimal (ease of maintenance)

    • The current cluster has a mix of CPUs and GPUs, so the code should run on both (portability)

• Learnings: what to expect and what not to


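COMPUTER AIDED DRUG DISCOVERY

Introduction

[Figure: drug discovery pipeline. Stages: Target Discovery, Lead Generation, Lead Optimization, Preclinical Development, Phase I, II & III Clinical Trials, FDA Review & Approval, Drug to the Market. Total: 14 yrs, $1.4 billion. Durations shown along the pipeline: 2.5 yrs, 3.0 yrs, 1.0 yrs, 6.0 yrs, 1.5 yrs. Percentages shown along the pipeline: 4%, 15%, 10%, 68%, 3%.]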


SANJEEVINI FOR COMPUTER AIDED DRUG DESIGN

Overview

[Figure: Sanjeevini workflow diagram. Boxes in the diagram:]

• Protein-ligand complex / protein / DNA sequence

• NRDBSM / million molecule library / natural products database

• Self-drawn ligand molecule

• Generate canonical A/B DNA or MD averaged structure of B DNA

• Check Lipinski compliance

• Generate rapid binding energy estimates by RASPD protocol

• Predict all possible binding sites and store top ten sites

• Optimize geometry / assign TPACM4 / derive quantum mechanical charges

• Assign force field parameters

• Dock and score

• Perform molecular dynamics simulations and post facto free energy component analyses (optional)


SANJEEVINI

GPU acceleration

▪ OpenACC acceleration of ParDOCK module

▪ All atom energy based Monte Carlo docking for protein-ligand complexes


PERFORMANCE OPTIMIZATION

Strategy

[Cycle diagram: Analyze, Parallelize, Optimize]



SANJEEVINI: PARDOCK

Hotspots

Flat profile:

% time   cumulative s   self s   calls         self s/call   total s/call   name
 69.78         557.90   557.90   1188000              0.00           0.00   PDB::EnergyCalculator()
 12.92         661.19   103.29   8                   12.91          20.26   PDB::clashCombination()
  7.35         719.96    58.77   26051422500          0.00           0.00   getRadius1()
  5.49         763.85    43.89   885075               0.00           0.00   PDB::energyAtom()



SANJEEVINI: PARDOCK

CPU code: EnergyCalculator

double PDB::EnergyCalculator(float **&EnergyGrid, const vector<points> &vDrugGrid,
                             points coords[], const unsigned &totalDockAtoms, … ){
    // for every docked atom, find the nearest drug-grid point
    for( int atomcount = 0; atomcount < totalDockAtoms; atomcount++ ){
        for( int counter = 0; counter < vDrugGrid.size(); counter++ ){
            // compute 'distance' between coords[atomcount] and vDrugGrid[counter]
            // minDis = minimum of 'distance', minCounter = counter corresponding to minDis
        }
        // add the grid energy stored for this atom at the nearest grid point
        ene += EnergyGrid[minCounter][atomcount];
    }
    return ene;
}


OpenACC

Simple | Powerful | Portable

Fueling the Next Wave of Scientific Discoveries in HPC

main() {
    <serial code>
    #pragma acc kernels   // automatically runs on GPU
    {
        <parallel code>
    }
}

University of Illinois, PowerGrid (MRI Reconstruction): 70x Speed-Up, 2 Days of Effort

RIKEN Japan, NICAM (Climate Modeling): 7-8x Speed-Up, 5% of Code Modified

http://www.cray.com/sites/default/files/resources/OpenACC_213462.12_OpenACC_Cosmo_CS_FNL.pdf

http://www.hpcwire.com/off-the-wire/first-round-of-2015-hackathons-gets-underway

http://on-demand.gputechconf.com/gtc/2015/presentation/S5297-Hisashi-Yashiro.pdf

http://www.openacc.org/content/experiences-porting-molecular-dynamics-code-gpus-cray-xk7


OPENACC DIRECTIVES

#pragma acc data copyin(x,y) copyout(z)      // Manage Data Movement
{
    ...
    #pragma acc parallel                     // Initiate Parallel Execution
    {
        #pragma acc loop gang vector         // Optimize Loop Mappings
        for (i = 0; i < n; ++i) {
            z[i] = x[i] + y[i];
            ...
        }
    }
    ...
}

Performance portable

Interoperable

Single source

Incremental


SANJEEVINI: PARDOCK

OpenACC parallelization: EnergyCalculator (1)

double PDB::EnergyCalculator(float **&EnergyGrid, const vector<points> &vDrugGrid,
                             points coords[], const unsigned &totalDockAtoms, … ){
    // clause arguments are elided on the slide
    #pragma acc parallel loop reduction(+:ene) private(minDis,minCounter) present() copyin() firstprivate()
    for( int atomcount = 0; atomcount < totalDockAtoms; atomcount++ ){
        // first pass: minimum distance for this atom
        #pragma acc loop reduction(min:minDis)
        for( int counter = 0; counter < vDrugGrid.size(); counter++ ){
            // compute 'distance' between coords[atomcount] and vDrugGrid[counter]
            minDis = (minDis > distance) ? distance : minDis;
        }


SANJEEVINI: PARDOCK

OpenACC parallelization: EnergyCalculator (2)

        // second pass: smallest index whose distance equals minDis
        #pragma acc loop reduction(min:minCounter)
        for( int counter = 0; counter < vDrugGrid.size(); counter++ ){
            // compute 'distance' between coords[atomcount] and vDrugGrid[counter]
            if ( distance == minDis ){
                minCounter = (minCounter > counter) ? counter : minCounter;
            }
        }
        ene += EnergyGrid[minCounter][atomcount];
    }
    return ene;
}


SANJEEVINI: PARDOCK

OpenACC parallelization: EnergyCalculator (3)

const points *vDrugGridData = vDrugGrid.data();
// compute 'distance' between coords[atomcount] and vDrugGridData[counter]

▪ Use ‘raw data pointer’ to access vectors


SANJEEVINI: PARDOCK

OpenACC parallelization: EnergyCalculator (4)

unsigned totDockAtoms = totalDockAtoms;   // local copy of the reference parameter
float **eneGrid = EnergyGrid;             // local copy of the reference-to-pointer parameter

#pragma acc parallel loop reduction(+:ene) … copyin(coords[0:totDockAtoms]) present(eneGrid)
    ene += eneGrid[minCounter][atomcount];

▪ Use ‘raw data pointer’ to access vectors

▪ Avoid using C++ references in OpenACC pragmas


Compiler output when the reference parameters are used directly in the pragma:

PDB::EnergyCalculator(float **&, const std::vector<points, std::allocator<points>> &, const std::vector<points, std::allocator<points>> &, points *, const unsigned int &, energy &, int):
     22, Generating present(vDrugGridData[:])
         Generating copyin(coords[:totalDockAtoms->])
         Generating present(EnergyGrid[:][:][:])

Result: runtime memory access violation


OPENACC: 3 LEVELS OF PARALLELISM

• Vector threads work in lockstep (SIMD/SIMT parallelism)

• Workers compute a vector

• Gangs have 1 or more workers and share resources (such as cache, the streaming multiprocessor, etc.)

• Multiple gangs work independently of each other

[Figure: two gangs, each containing workers, each worker operating on a vector]


SANJEEVINI: PARDOCK

OpenACC compiler output: EnergyCalculator

PDB::EnergyCalculator(float **&, const std::vector<points, std::allocator<points>> &, const std::vector<points, std::allocator<points>> &, points *, const unsigned int &, energy &, int):
     22, Generating present(vDrugGridData[:],eneGrid[:][:])
         Generating copyin(coords[:totDockAtoms])
     22, Accelerator kernel generated
         Generating Tesla code
         22, Generating reduction(+:ene)
         24, #pragma acc loop gang /* blockIdx.x */
         31, #pragma acc loop vector(256) /* threadIdx.x */
             Generating reduction(min:minDis)
         45, #pragma acc loop vector(256) /* threadIdx.x */
             Generating reduction(min:minIdx)
     31, Loop is parallelizable
     45, Loop is parallelizable


MANAGE DATA HIGHER IN THE PROGRAM

Currently data is moved at the beginning and end of each function, in case the data is needed on the CPU

We know that the data is only needed on the CPU after convergence

We should tell the compiler when data movement is really needed, in order to improve performance


STRUCTURED DATA REGIONS

The data directive defines a region of code in which GPU arrays remain on the GPU and are shared among all kernels in that region.

#pragma acc data
{                                // start of the data region
    #pragma acc parallel loop
    ...
    #pragma acc parallel loop
    ...
}                                // end of the data region

Data Region: arrays used within the data region will remain on the GPU until the end of the data region.


UNSTRUCTURED DATA DIRECTIVES

Used to define data regions when scoping doesn’t allow the use of normal data regions (e.g. the constructor/destructor of a class).

enter data: defines the start of an unstructured data lifetime

• clauses: copyin(list), create(list)

exit data: defines the end of an unstructured data lifetime

• clauses: copyout(list), delete(list), finalize

#pragma acc enter data copyin(a)
...
#pragma acc exit data delete(a)


SANJEEVINI: PARDOCK

OpenACC parallelization: EnergyAtom (3)

Creation and copy of jagged arrays

int n = vProteinList.size();
int **vProteinListData = new int*[n];
// create the array of row pointers on the device
#pragma acc enter data create(vProteinListData[0:n][0:1])
for( int count = 0; count < n; count++ ){
    int numPro = vProteinList[count].size();
    vProteinListData[count] = vProteinList[count].data();
    // copy row 'count' (numPro elements) to the device
    #pragma acc enter data copyin(vProteinListData[count:1][0:numPro])
}

▪ Use ‘raw data pointer’ to access vectors

▪ How will you access ‘vector of vector (jagged arrays)’ ?


SANJEEVINI: PARDOCK

OpenACC parallelization: EnergyAtom (4)

Deletion of jagged arrays

for( int count = 0; count < n; count++ ){
    int numPro = vProteinList[count].size();
    // release row 'count' on the device
    #pragma acc exit data delete(vProteinListData[count:1][0:numPro])
    vProteinListData[count] = NULL;
}
// release the array of row pointers on the device
#pragma acc exit data delete(vProteinListData[0:n][0:1])

▪ Use ‘raw data pointer’ to access vectors

▪ How will you access ‘vector of vector (jagged arrays)’ ?


SANJEEVINI: PARDOCK

OpenACC compiler output: EnergyAtom (1)

PDB::energyAtom(const std::vector<PDB, std::allocator<PDB>> &, PDB, points, const std::vector<Box, std::allocator<Box>>&, const std::vector<int, std::allocator<int>>&, const std::vector<std::vector<int, std::allocator<int>>, std::allocator<std::vector<int, std::allocator<int>>>>&, int **):
     79, Generating enter data copyin(boxListData[:boxListNumElements],rec,coord)
     85, Generating present(coord,boxListData[:],rec,vProteinListData[:][:],vProData[:])
         Accelerator kernel generated
         Generating Tesla code
         85, Generating reduction(+:electro,vandw,ehyd)
         87, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
    129, Generating exit data delete(boxListData[:boxListNumElements],rec,coord)


SANJEEVINI: PARDOCK

OpenACC compiler output: EnergyAtom (2)

main:
    266, Generating enter data copyin(vProData[:vProNumElements])
         Generating enter data create(vProteinListData[:vProteinListNumElements][:1])
    275, Generating enter data copyin(vProteinListData[proList][:numElements])
    321, Generating exit data delete(vProteinListData[proList][:numElements])
    322, Generating exit data delete(vProteinListData[:vProteinListNumElements][:1],vProData[:vProNumElements])


CUDA UNIFIED MEMORY

Simplified Developer Effort

[Figure: without Unified Memory (separate System Memory and GPU Memory) vs. with Unified Memory (a single Unified Memory)]

Sometimes referred to as “managed memory.”

New “Pascal” GPUs handle Unified Memory in hardware.



SANJEEVINI: PARDOCK

Performance: CPU and GPU (1)

▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core

▪ 256 GB RAM

▪ Tesla P100 GPU

▪ CentOS 7.2

▪ CUDA Toolkit 8.0.61

▪ MPS enabled for GPU access

CPU+GPU is 5.8x/3.3x faster than CPU at 8 MPI procs, ROTATE=1000/100

16 MPI procs on a single GPU -> GPU is the bottleneck!


SANJEEVINI: PARDOCK

Performance: CPU and GPU (2)

▪ Average ‘time to predict’ over 160 datasets

▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core

▪ 256 GB RAM

▪ Tesla P100 GPU

▪ CentOS 7.2

▪ CUDA Toolkit 8.0.61

▪ MPS enabled for GPU access

CPU+GPU is 5.3x/3.2x faster than CPU at 8 MPI procs, ROTATE=1000/100


TESLA V100

The Fastest and Most Productive GPU for AI and HPC

Volta Architecture: Most Productive GPU

Tensor Core: 125 Programmable TFLOPS Deep Learning

Improved SIMT Model: New Algorithms

Volta MPS: Inference Utilization

Improved NVLink & HBM2: Efficient Bandwidth


MULTI PROCESS SERVICE (MPS) FOR MPI APPLICATIONS


GPU ACCELERATION OF LEGACY MPI APPS

Typical legacy application

MPI parallel

Single or few threads per MPI rank (e.g. OpenMP)

Running with multiple MPI ranks per node

GPU acceleration in phases

Proof of concept prototype, …

Great speedup at kernel level

Application performance misses expectations

4/2/2018


MULTI PROCESS SERVICE (MPS)

For Legacy MPI Applications

[Figure: bars for N=1, N=2, N=4 and N=8, shown for "Multicore CPU only" and for "GPU-accelerated". Legend: GPU parallelizable part, CPU parallel part, Serial part.]

With Hyper-Q/MPS: available on Tesla/Quadro with CC 3.5+ (e.g. K20, K40, K80, M40,…)


PROCESSES SHARING GPU WITHOUT MPS

No Overlap

[Figure: Process A and Process B, each with its own context (Context A, Context B), sharing one GPU]


PROCESSES SHARING GPU WITHOUT MPS

Context Switch Overhead

[Figure: time-sliced use of the GPU, with a context switch between time slices]


PROCESSES SHARING GPU WITH MPS

Maximum Overlap

[Figure: Process A and Process B (Context A, Context B) go through the MPS Process; kernels from Process A and kernels from Process B run on the GPU]


PROCESSES SHARING GPU WITH MPS

No Context Switch Overhead


SANJEEVINI: PARDOCK

Pascal vs Volta

▪ Average ‘time to predict’ over 160 datasets, ROTATE=1000

▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core

▪ 256 GB RAM

▪ Tesla P100/V100 GPU

▪ CentOS 7.2

▪ CUDA Toolkit 8.0.61/9.0.176

▪ MPS enabled for GPU access

Volta is 2.1x faster than Pascal


SANJEEVINI: PARDOCK

OpenACC parallelization

▪ Use ‘raw data pointer’ to access vectors

▪ Avoid using C++ references in OpenACC pragmas

▪ Standard classes called from an OpenACC region may result in compilation/linking errors. Use math.h instead of cmath ☺

▪ Unified memory has improved over time, but explicit data clauses are sometimes still needed to minimize data copies

▪ Volta works very well for programs that need MPS functionality


ONGOING WORK


SANJEEVINI: PARDOCK

Pascal vs Volta

▪ Average ‘time to predict’ over 160 datasets, ROTATE=1000

▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core

▪ 256 GB RAM

▪ Tesla P100/V100 GPU

▪ CentOS 7.2

▪ CUDA Toolkit 8.0.61/9.0.176

▪ MPS enabled for GPU access

Volta is 2.1x faster than Pascal due to hardware accelerated MPS


SANJEEVINI: PARDOCK

Multi-GPU scalability (2)

▪ ‘1qbt’ dataset, ROTATE=1000, 8 MPI procs

▪ PSG Cluster node, Haswell E5-2698 v3 @ 2.3 GHz, dual socket, 16 core

▪ 256 GB RAM

▪ Tesla P100 GPU

▪ CentOS 7.2

▪ CUDA Toolkit 8.0.61

▪ MPS enabled for GPU access

▪ Higher concurrency possible with more devices -> lower GPU time

▪ Lower latency with more devices/MPS servers -> lower CPU time


SANJEEVINI: PARDOCK

Multi-GPU scalability (3)

▪ ‘5cna’ dataset, ROTATE=100, 8 MPI procs, Tesla P100 GPUs, MPS


SANJEEVINI: PARDOCK

Pascal vs Volta (2)

▪ ‘1a4w’ dataset, ROTATE=100, 8 MPI procs, Tesla P100/V100 GPUs, MPS


REFERENCES: PARDOCK

• Gupta, A., et al. "ParDOCK: An all atom energy based Monte Carlo docking protocol for protein-ligand complexes." Protein and peptide letters 14.7 (2007): 632-646.

• Nishikawa, Joy L., et al. "Inhibiting fungal multidrug resistance by disrupting an activator–Mediator interaction." Nature 530.7591 (2016): 485.

• Singh, Tanya, D. Biswas, and Bhyravabhotla Jayaram. "AADS-An automated active site identification, docking, and scoring protocol for protein targets based on physicochemical descriptors." Journal of chemical information and modeling 51.10 (2011): 2515-2527.

• Singh, Tanya, Olayiwola Adedotun Adekoya, and B. Jayaram. "Understanding the binding of inhibitors of matrix metalloproteinases by molecular docking, quantum mechanical calculations, molecular dynamics simulations, and a MMGBSA/MMBappl study." Molecular BioSystems 11.4 (2015): 1041-1051.

• Jayaram, Bhyravabhotla, et al. "Sanjeevini: a freely accessible web-server for target directed lead molecule discovery." BMC bioinformatics. Vol. 13. No. 17. BioMed Central, 2012.


SANJEEVINI: PARDOCK

Steps involved