Built for Big Data: Accelerated Deep Learning with POWER

Ruchir Puri, IBM Fellow

© 2016 International Business Machines Corporation



POWER8 Core: Backbone of Big Data Computing Systems

• Enhanced micro-architecture
• Increased execution bandwidth
• SMT8
• Transactional memory
• Vector/scalar unit
• High-performance integer and FP vector processor
• Optimized for data-rich applications

[Figure: POWER8 core diagram with labeled units VSU, FXU, IFU, DFU, ISU, LSU, and PC]


Combined I/O Bandwidth = 7.6 Tb/s

[Figure: POWER8 processors attached to memory buffers over DMI links, with PCI links plus on-node SMP and node-to-node links between processors. Callout: Big Bandwidth for Big Data]

Putting it all together: the memory links, the on- and off-node SMP links, and PCIe add up to 7.6 Tb/s of chip I/O bandwidth.
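For a sense of scale, the 7.6 Tb/s figure can be converted to bytes per second. This is a minimal sketch, assuming the slide uses decimal terabits (10^12 bits) and decimal gigabytes; the function name is illustrative only.

```python
def terabits_to_gigabytes_per_s(tbps: float) -> float:
    """Convert terabits per second to gigabytes per second (decimal units)."""
    bits_per_s = tbps * 1e12      # assumption: 1 Tb = 10**12 bits
    bytes_per_s = bits_per_s / 8  # 8 bits per byte
    return bytes_per_s / 1e9      # 1 GB = 10**9 bytes


chip_io_gb_per_s = terabits_to_gigabytes_per_s(7.6)
print(f"{chip_io_gb_per_s:.0f} GB/s")
```

Under those unit assumptions the combined chip I/O works out to roughly 950 GB/s.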


General-Purpose CPU Design

• Many competing requirements
  – Branchy, control-flow-dominated code
  – Code with unpredictable data access patterns
  – Operating system code
  – Multiple separate applications
  – Multiple virtual machines running concurrently
• Results in low efficiency for any one metric
  – Flops / area
  – Integer ops / area
  – …

[Figure: POWER8 core diagram (VSU, FXU, IFU, DFU, ISU, LSU, PC) annotated with general-purpose features: out-of-order execution, register renaming, branch prediction and prefetch, robust virtual memory support. Block labels: decode, I$, RF, int, D$, SIMD]


Power GPU Acceleration

• Support for 18 or more GPUs
• Tested with up to 24 devices
• Exploit IBM design for big data
• Large address space enables rich accelerator configurations
• 1 TB of address space per PCI host interface
• Standard LE Linux NVIDIA drivers
• CUDA 7.5
• Available now


Power Machine and Deep Learning Software Stack

Frameworks
• CAFFE (Berkeley)
• Torch (NYU, Facebook)
• Theano (Montreal)
• TensorFlow (Google)
• CNTK (Microsoft)
• DL4J (Spark)

DLSL

Library layer
• BLAS, LAPACK
• cuBLAS, cuDNN
• Nervana Auviz lib
• Xilinx blks


Power GPU Acceleration

• CUDA programming environment supported under LE Linux
  – GPU as compute accelerator
  – Offload the dominant compute-intensive portions of an application to the GPU
• Advances in GPU performance and programmability
  – UVA: Unified Virtual Addressing
  – UM: Unified Memory
• Ongoing collaboration to co-optimize systems
  – Next-generation hardware enhancements


Programming Heterogeneous Systems

OpenCL? SystemC? VHDL? C++? Java? CUDA?


Portability and Optimization in Heterogeneous Systems

[Figure: layered stack. Layers: Application (three shown); Cognitive Middleware; Library Layer; an enablement layer with CPU enablement, GPU enablement, FPGA interface & configuration, and Accelerator X Enablement; Accelerator X]
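The layering on this slide can be pictured as a library layer that exposes one call to the application and routes it to whichever device backend has been enabled. The sketch below is illustrative only: `LibraryLayer`, `register`, `matmul`, and the device names are hypothetical, not part of any IBM software, and only a pure-Python CPU backend is provided.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]
Backend = Callable[[Matrix, Matrix], Matrix]


def matmul_reference(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix multiply, used here as the CPU backend."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


class LibraryLayer:
    """Hypothetical library layer: one API, pluggable device backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, device: str, fn: Backend) -> None:
        """Enable a device by registering its implementation."""
        self._backends[device] = fn

    def matmul(self, a: Matrix, b: Matrix,
               prefer: Sequence[str] = ("gpu", "fpga", "cpu")) -> Matrix:
        """Run on the first enabled device in preference order."""
        for device in prefer:
            if device in self._backends:
                return self._backends[device](a, b)
        raise RuntimeError("no backend enabled")


lib = LibraryLayer()
lib.register("cpu", matmul_reference)

# Application code calls the library layer and never names a device.
result = lib.matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
print(result)
```

Registering an additional backend under `"gpu"` or `"fpga"` would change where the call runs without touching the application code, which is the portability argument the slide is making.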


First Power Machine and Deep Learning Distro Available

DL Framework   Status
CAFFE          Ported. Included in first release of Power MLDL Distro.
Torch          Ported. Included in first release of Power MLDL Distro.
Theano         Ported. Included in first release of Power MLDL Distro.
DIGITS-3       Ported. Included in first release of Power MLDL Distro.
TensorFlow     Ported.
DL4J           Ported.
CNTK           Ported.

Accelerated Deep Learning on Power

Designed for Compute-Intensive Cognitive Workloads

Personalized Medicine, Adverse Drug Reaction Prediction w/ ML

Stages: Ingest, Learn, Predict

Known Interactions of type 1 to …

Drug1    Drug2
Aspirin  Probenecid
Aspirin  Azilsartan

Drug1    Drug2
Aspirin  Gliclazide
Aspirin  Dicoumarol

Chemical Similarity 1 to …

Drug1       Drug2     Sim
Salsalate   Aspirin   .7
Dicoumarol  Warfarin  .6

Drug1       Drug2     Sim
Salsalate   Aspirin   .9
Dicoumarol  Warfarin  .76

Candidate Interactions of type i: Features

Drug1      Drug2       Best Sim1*Sim1  Best SimN*SimN
Salsalate  Gliclazide  .9*1            .7*1
Salsalate  Warfarin    .9*.76          .7*.6

Machine Learning Model

Interactions of type 1: Prediction

Drug1      Drug2       Prediction
Salsalate  Gliclazide  0.85
Salsalate  Warfarin    0.7

Interactions of type M: Prediction

Drug1      Drug2       Prediction
Salsalate  Gliclazide  0.53
Salsalate  Warfarin    0.32

30X improvement in learning performance

100s of TBs of data: 50 million patients, 2000 drugs, 2000 features
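The Salsalate example on this slide can be read as follows: a candidate pair is scored against each known interacting pair by multiplying the similarity of the first drugs by the similarity of the second drugs, and the best product is kept as the feature. The sketch below is one reading of the slide, not the author's implementation. The function names, scoring a drug as 1.0 against itself, and defaulting missing similarities to 0.0 are all assumptions.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Similarity = Dict[FrozenSet[str], float]


def sim(table: Similarity, a: str, b: str) -> float:
    """Symmetric lookup; identical drugs score 1.0, unknown pairs 0.0 (assumptions)."""
    if a == b:
        return 1.0
    return table.get(frozenset((a, b)), 0.0)


def best_sim_feature(candidate: Tuple[str, str],
                     known: Iterable[Tuple[str, str]],
                     table: Similarity) -> float:
    """Best product sim(c1, k1) * sim(c2, k2) over the known interacting pairs."""
    c1, c2 = candidate
    return max((sim(table, c1, k1) * sim(table, c2, k2) for k1, k2 in known),
               default=0.0)


# Values copied from the slide's example tables.
known_interactions = [("Aspirin", "Gliclazide"), ("Aspirin", "Dicoumarol")]
similarity_a = {frozenset(("Salsalate", "Aspirin")): 0.9,
                frozenset(("Dicoumarol", "Warfarin")): 0.76}
similarity_b = {frozenset(("Salsalate", "Aspirin")): 0.7,
                frozenset(("Dicoumarol", "Warfarin")): 0.6}

for pair in [("Salsalate", "Gliclazide"), ("Salsalate", "Warfarin")]:
    features = [best_sim_feature(pair, known_interactions, table)
                for table in (similarity_a, similarity_b)]
    print(pair, [round(f, 3) for f in features])
```

For Salsalate and Warfarin, the best match is the known pair Aspirin and Dicoumarol, which gives the .9*.76 and .7*.6 products shown in the feature table. One such feature per similarity measure then feeds the machine learning model.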


Personalized Medicine – Adverse Drug Reaction Workload

Personalization will result in a massive increase in computational complexity.
Real-time prediction is required for operational needs (< 1 minute in emergency situations).

• Computational pattern:
  – Sparse cube to dense cube, with patient as an additional dimension
• Training:
  – Number of patients above 50 million
  – Number of features around 1800
  – Additional samples for training: O(#patients)
  – Number of cross-validation stages and #models per stage increases dramatically
  – 100X increase in training complexity with ~100 TBs of data
• Prediction:
  – Input: model (#features) and dataset (#patients in the hospital)
  – 1800 features and 500,000 patients
  – Real time
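To make the sizes on this slide concrete, the sketch below does back-of-the-envelope arithmetic for dense patient-by-feature matrices and for the throughput implied by the one-minute limit. Several things here are assumptions not stated on the slide: 4-byte (float32) values, fully dense storage, decimal gigabytes, and that all 500,000 patients must be scored within a 60-second window. The function names are illustrative.

```python
BYTES_PER_VALUE = 4  # assumption: float32 storage


def dense_size_gb(rows: int, cols: int,
                  bytes_per_value: int = BYTES_PER_VALUE) -> float:
    """Size in decimal gigabytes of a dense rows x cols matrix."""
    return rows * cols * bytes_per_value / 1e9


def required_rate(items: int, deadline_s: float) -> float:
    """Items per second needed to finish `items` within `deadline_s` seconds."""
    return items / deadline_s


# Figures from the slide: ~1800 features, 50 million training patients,
# 500,000 patients at prediction time, under one minute for emergencies.
train_gb = dense_size_gb(50_000_000, 1800)
predict_gb = dense_size_gb(500_000, 1800)
patients_per_s = required_rate(500_000, 60.0)

print(f"training matrix:   {train_gb:.1f} GB")
print(f"prediction matrix: {predict_gb:.1f} GB")
print(f"required rate:     {patients_per_s:.0f} patients/s")
```

Under these assumptions a single dense patient-by-feature matrix is about 360 GB for training and 3.6 GB for prediction, and the one-minute limit implies scoring roughly 8,300 patients per second. The sketch does not attempt to reproduce the ~100 TB data volume quoted on the slide.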


Thank You!