GPU-Accelerated Big Data Pipelines for Desktop, HPC and Cloud

Dr. Melissa Smith (1), Ben Shealy (1), Josh Burns (2), Dr. Alex Feltus (3), Dr. Stephen Ficklin (2)

(1) Department of Electrical and Computer Engineering, Clemson University
(2) Department of Horticulture, Washington State University
(3) Department of Genetics and Biochemistry, Clemson University


Post on 19-Jul-2020



Overview

- KINC

- KINC Nextflow Pipeline

- Running KINC Pipeline on Kubernetes

- Demo

- Challenges / Opportunities


The Gene Co-Expression Network (GCN)


Knowledge Independent Network Construction (KINC)

Gene expression matrix

Gene co-expression network

Similarity matrix

Pairwise scatter plot
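The flow named on this slide (expression matrix, pairwise comparison, similarity matrix, network) can be sketched in a few lines of Python. This is a simplified illustration and not KINC's implementation: it scores each gene pair with plain Pearson correlation over all samples and skips the per-pair Gaussian mixture model clustering described in the paper cited later in the deck. The gene names, expression values, and the 0.9 threshold are made up for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def build_network(expression, threshold=0.9):
    """Return edges (gene_a, gene_b, r) whose |r| meets the threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical expression matrix: gene -> expression level per sample.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [1.0, 3.0, 1.0, 3.0],  # only weakly related to the others
}

edges = build_network(expression)
for gene_a, gene_b, r in edges:
    print(f"{gene_a} -- {gene_b} (r = {r:+.2f})")
```

Every gene pair is scored independently of the others, which is what makes the similarity step a natural fit for GPU acceleration.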


Human Brain Tissue-Specific Network


Human Kidney Tumor-Specific Network


KINC GPU Implementation

Shealy, Burns, et al., “GPU Implementation of Pairwise Gaussian Mixture Models for Multi-Modal Gene Co-Expression Networks,” IEEE Access


KINC Pipeline


Pipeline Portability with Nextflow

# Local
nextflow run systemsgenetics/KINC-nf -with-docker

# HPC
nextflow run systemsgenetics/KINC-nf -profile pbs -with-singularity

# Kubernetes
nextflow kuberun systemsgenetics/KINC-nf -v <pvc-name>
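The three invocations differ only in a few arguments, which a small launcher can capture. The sketch below is a hypothetical helper, not part of KINC-nf; it uses only the subcommands and flags shown on this slide, and the claim name "my-pvc" is a placeholder for a real persistent volume claim.

```python
import shlex

PIPELINE = "systemsgenetics/KINC-nf"  # pipeline name as shown on the slide


def nextflow_command(target, pvc_name=None):
    """Build the Nextflow argument list for one of the three targets."""
    if target == "local":
        return ["nextflow", "run", PIPELINE, "-with-docker"]
    if target == "hpc":
        return ["nextflow", "run", PIPELINE, "-profile", "pbs", "-with-singularity"]
    if target == "kubernetes":
        if not pvc_name:
            raise ValueError("kubernetes target needs a persistent volume claim name")
        return ["nextflow", "kuberun", PIPELINE, "-v", pvc_name]
    raise ValueError(f"unknown target: {target!r}")


for target in ("local", "hpc"):
    print(shlex.join(nextflow_command(target)))
# "my-pvc" is a placeholder claim name, not one from the talk.
print(shlex.join(nextflow_command("kubernetes", pvc_name="my-pvc")))
```

The pipeline name never changes between targets; only the execution environment and container runtime do.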


Running Genomics Pipelines on Kubernetes


Integrating Nextflow / Kubernetes / GPUs

nextflow.config

process {
    cpus = 4
    memory = 8.GB
    accelerator = 1
}

--->

pod.yaml

resources:
  limits:
    cpu: 4
    memory: "8Gi"
    nvidia.com/gpu: 1
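The correspondence between the two snippets can be written out as a small translation function. This is only a sketch of the mapping shown on the slide, not Nextflow's actual code; the function name and the dictionary keys (including `memory_gb`) are hypothetical.

```python
def to_pod_limits(process):
    """Translate Nextflow-style resource settings into Kubernetes pod limits."""
    limits = {}
    if "cpus" in process:
        limits["cpu"] = process["cpus"]
    if "memory_gb" in process:
        # The slide maps 8.GB in the config to "8Gi" in the pod spec.
        limits["memory"] = f"{process['memory_gb']}Gi"
    if process.get("accelerator", 0) > 0:
        # GPUs are requested through the NVIDIA extended resource name.
        limits["nvidia.com/gpu"] = process["accelerator"]
    return {"resources": {"limits": limits}}


pod_resources = to_pod_limits({"cpus": 4, "memory_gb": 8, "accelerator": 1})
print(pod_resources)
```

A process that requests no accelerator gets no GPU entry at all, so CPU-only steps stay schedulable on nodes without GPUs.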


Demo

KINC / Nextflow / Kubernetes / NVIDIA GPUs


Challenges and Opportunities

- Using Kubernetes as an HPC platform
  - Transient failures (network timeouts, node-specific issues)
  - Scheduling policies (walltime)
  - Access controls (users)
  - “SLURM-enetes” / “KubLURM”
- Scale up
- Increase usability for domain scientists
  - Friendly user interface
  - Assist user with machine learning
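One common way to cope with the transient failures listed above is to retry a task with an increasing delay. The sketch below is a generic illustration of that idea and is not taken from KINC, Nextflow, or Kubernetes; `TransientError`, `run_with_retries`, and the simulated task are all hypothetical names.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a network timeout)."""


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); on TransientError wait and retry, doubling the delay each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated task that fails twice before succeeding.
calls = {"count": 0}


def flaky_task():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated network timeout")
    return "done"


delays = []  # record the delays instead of actually sleeping
result = run_with_retries(flaky_task, sleep=delays.append)
print(result, calls["count"], delays)
```

Only errors marked as transient are retried; any other exception propagates immediately, so genuine bugs are not masked by the retry loop.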


Thank you! Questions?

Research funded by NSF Grant #1659300


Nextflow API / Workflows


Nextflow API / Workflow Instance


Grafana / Cluster Usage


Grafana / GPU Usage
