
Accelerative Technology Lab 2014

Big Data Analytics Using Accelerator for HPC

KK Yong ([email protected])

R&D Activities carried out at NVIDIA-MIMOS Joint Lab (First in South East Asia)


Outline

• About MIMOS – ATL

• MIMOS Platform & Application

• Challenges of GPU Libraries adaption in Big Data Analytics

• MiAccLib Architecture and Framework

• MIMOS R&D GPU Cluster


About MIMOS

• Malaysia’s National R&D Center

• 10 core research areas:

– Advanced Analysis & Modelling

– Advanced Computing (*), which includes the Accelerative Technology Lab

– Information Security

– Intelligent Informatics

– Knowledge Technology

– Microenergy

– Microelectronics

– Nanoelectronics

– Psychometrics

– Wireless Communications

(*) Advanced Computing

Spearheads R&D activities in acceleration on large-scale computing, chiefly Cloud Computing; from SaaS and IaaS to Services Delivery Platform.


About Accelerative Technology Lab

• To facilitate adaptation of many-core/parallel/GPU techniques in scientific, financial, big data processing areas

• To enhance GPU related R&D activities in Malaysia

• To serve as a one-stop center to promote, share & teach GPU technologies/solutions to customers and those interested in GPGPU, and to do joint collaborations on GPU topics

Accelerative Technology Lab application areas:

• Finance

• Text / string analytics

• Crypto

• Video Analytics

• Database Acceleration (Galactica)

• Oil and Gas

Open Platform

• Traditional Multicores (Main Processor)

• General Purpose GPU, 512-2880 GPU Cores (Co-Processor)

• Many Integrated CPU Core, 60 Cores (Co-Processor)

• On-board Memory

• Additional/External Memory (SSD/HSM)

• DB

• Parallel programming

• CUDA platform

• 40Gb/s Infiniband ConnectX

GPU/Multi Core Driven Applications for MIMOS

Parallel Data Processing

En/Decryption (scrambling+)

+ SSL Accelerator

Large DataSets (SOCSO)

Streamed Data (ISP & AVMedic)

Fraud Management System (SOCSO)

PDRM (HRMIS)

Business/ Enterprise Data MiAccLib 2.0

Patient Data Analytics


Challenges of GPU Libraries adaption in Big Data Analytics

• Most of our library users are non-scientific in nature

• GPGPU is seen as an “Acceleration Co-Processor”

• Hide the algorithmic complexity with simple API parameters

• Structured & Unstructured Data in xxx Gigabytes

Facebook status updates: 700 per second

Twitter tweets: 600 per second

Buzz posts: 55 per second

Google: 34,000 searches per second

Yahoo: 3,200 searches per second

Bing: 927 searches per second

Future growth is staggering…

Overall Architecture

Generic web service

SOAP interface

Specific web-service 1

Specific web-service 2

Specific web-service 3

Specific Application

Generic Application

Specific Application

Specific Application

Application specific

Specific API exposed through web service interface (contains specific data preparation stage)

Application specific Algorithms

Functional Algorithms

Various Hardware

Mi-AccLib 2.x APIs (DLL)

Application Programming Interface

VAR

- Historical

- Generic

GPU Multi-Core CPU

- Text/String Processing

- Text/String Analytics

Queries Acceleration

& Query Parser/Optimiser

String Library

- Searching

- Sorting

- Matching

- Scrambler

- …

Numeric Library

- Financial

- Matrix

- Scientific

- Statistics

- …

IMDB Library

- Retrieval

- Transfer

- Indexing

- Analytics

- …

SQL/ SPARQL Library

- Unified Indexer

- Query Operator

- Multi-Format Data Manager

- Resource Manager

- …

Library API (dll & .so)


MIMOS Middle Framework

Many-Core Compute Engine

MIDDLEWARE LAYER

Analytics Component

Processing Component

Orchestration Component

PRESENTATION LAYER

Storage

Analytics Component

Processing Component

Mi-AccLib Libraries (Specific)

Finance Libraries

Video Analytics (Specific)

Text/String Libraries

Statistical/ Predictive Analytics

ETL Tool (Mi-Morphe)

Parallel Queries Processing

(OLAP Accelerator)

Mi-AccLib Libraries

(Algorithms & Generic)

Batch / Real Time

Hadoop & Storm

Libraries with Nodes

Traditional SQL Engine

Machine Learning

GIS Based Analytics

Crypto Libraries

Orchestration & Scheduling Engine


Use Case: SOCSO’s Data Cleansing

• Data Detection and Rectification

• Consists of 7,711 cleansing rules

• 8 key steps:

– Extraction

– Loading

– Transformation

– Exception detection

– Potential bad data extraction

– Rectification discussion

– Correction

– Correction verification

Using


Data Cleansing Challenges

• 31 systems from heterogeneous environments:

– Environment: UNISYS, AS400, Windows, Linux

– Data source: DMS1100, DB2, Informix, MS SQL, MySQL, Excel, MS Access, Foxpro and flat files

• Big Data with Big Computation:

– 319 source data sets

– Involves ~1 billion records, e.g.:

• 15 million employees with 15 million monthly contributions

• 880,000 employers with 65 million monthly contributions

• Match against reference JPN data with 15 million records

[Diagram: Accelerated Duplicate Detection on the Source Database; Accelerated Data Validation with Reference Data, searching the Source Database against the Reference Database. Two data sets are labelled ~15 million records.]

Accelerated Data Detection: Exact match | Edit distance | Numeric distance | Date distance


Snapshot

Data duplication in one column:

– 370120105041 / 370120105041

Data duplication in multiple columns:

– Identification number: 460721025197 / 460731025197

– Full name: Othman Md Amin / Othman B Md Amin
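The snapshot pairs above can be scored with the distance measures named on the previous slide (edit distance, numeric distance, date distance). The following is a minimal CPU-only sketch in Python, not the Mi-AccLib implementation; the function names, the record layout and the thresholds in `is_probable_duplicate` are assumptions made for illustration.

```python
from datetime import date


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def numeric_distance(x: float, y: float) -> float:
    """Absolute difference between two numeric fields."""
    return abs(x - y)


def date_distance(d1: date, d2: date) -> int:
    """Absolute difference between two dates, in days."""
    return abs((d1 - d2).days)


def is_probable_duplicate(rec_a, rec_b, max_id_edits=1, max_name_edits=2):
    """Hypothetical multi-column rule: both fields must be within a threshold."""
    return (edit_distance(rec_a["id"], rec_b["id"]) <= max_id_edits
            and edit_distance(rec_a["name"], rec_b["name"]) <= max_name_edits)


# The multi-column pair from the snapshot slide.
rec_a = {"id": "460721025197", "name": "Othman Md Amin"}
rec_b = {"id": "460731025197", "name": "Othman B Md Amin"}

print(edit_distance(rec_a["id"], rec_b["id"]))      # one substituted digit
print(edit_distance(rec_a["name"], rec_b["name"]))  # two inserted characters
print(is_probable_duplicate(rec_a, rec_b))
```

Comparing every record against every other record is quadratic in the number of records, which is why the slides treat this as a candidate for GPU acceleration.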


Data Cleansing Performance

[Chart: processing time in minutes (0 to 1200) against number of records (10k, 65k, 1 million, 14 million) for CPU (1-core), CPU (8-core) and GPU optimised (448 cores). Annotations: two bars marked “More than 24 hours”; GPU < 0.1 min, GPU < 0.5 min, GPU < 3 min, GPU = 45 min.]

High-Speed Name Search Performance

[Chart: search time in seconds (0 to 0.2) for Exact Match, Edit Distance and Wild Card searches on 10 million records and 40 million records. All searches < 0.2 s.]

Search examples (query and matches):

– Exact Match: Mohamad matches Mohamad

– Edit Distance: Mohamad matches Muhammad

– Wild Card: Moh matches Mohamad, Lee Ang Moh, El-Mohan
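The three search modes can be illustrated with a short sequential sketch in Python. This is not the accelerated Mi-AccLib code; the `search` function, its `mode` argument, the threshold of two edits and the interpretation of the wild card as a case-insensitive substring match are all assumptions chosen to reproduce the examples on the slide.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def search(names, query, mode, max_edits=2):
    """Return the names matching `query` under the chosen (hypothetical) mode."""
    q = query.lower()
    if mode == "exact":
        return [n for n in names if n.lower() == q]
    if mode == "edit":
        # Fuzzy match: accept names within `max_edits` edits of the query.
        return [n for n in names if edit_distance(n.lower(), q) <= max_edits]
    if mode == "wildcard":
        # Assumed wild card semantics: the query may appear anywhere in the name.
        return [n for n in names if q in n.lower()]
    raise ValueError(f"unknown mode: {mode}")


names = ["Mohamad", "Muhammad", "Lee Ang Moh", "El-Mohan", "Othman Md Amin"]

print(search(names, "Mohamad", "exact"))
print(search(names, "Mohamad", "edit"))
print(search(names, "Moh", "wildcard"))
```

Each name is tested independently of the others, so the list comprehension maps naturally onto one GPU thread per record.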

[Diagram: Mi-AccLib accelerated & parallelized algorithms applied to 10+ million records of transaction data.]

Perkeso Data × JPN Data: 350 trillion combinations/rule

× 7,711 rules: 2.7 quintillion operations

Parallel Queries Accelerator

• Orchestration of Heterogeneous Hardware Components

– Multi-Core CPUs with Many-Core GPGPU

• Emerging GPU accelerated queries processing engine for massively data parallel computation

– Analytical algorithms

• Easy to access parallel engine

– SQL style accessing

– Standard Database Connector

Data Warehouse

Business Intelligence Tool

Presentation Layer

NVIDIA Tesla GPU Technology

March 24-27, 2014 – San Jose Galactica - Accelerated Queries Processing Presented at NVIDIA GTC 2014


Galactica Performance

TPC-H Dataset (1GB, 10GB & 50GB) with three sets of queries and PostgreSQL

32GB Dataset, Distributed Processing with 7 Nodes (Hadoop) & PostgreSQL

N/A – Failed in query execution


MiBIS – Data Visualization

MIMOS Mi-BIS is a platform that creates a convenient environment for customised report creation and business analytics. With Mi-BIS, organisations can easily create and manage reports, perform in-depth analysis which includes data exploration, ad hoc query analysis and visualisation of multi-dimensional data, to assist their decision-making process.

Features:

• Dashboard Management

• KPI Management

• Location Intelligence

• Parallel query processing accelerators

• Big Data Processing Engine

Scrambling (Database Encrypt & Decrypt)

* ECB = electronic code book

[Charts: Encryption and Decryption time in milliseconds (0 to 1800) against message size (32MB, 64MB, 128MB, 256MB). Series: CPU, AES-128 (Quadro 4000), AES-128 (K20), AES-256 (K20). Annotated speedups: > 7x and > 6x.]

Video Analytics Implementation in GPU

Intelligent Surveillance Platform

Video Analytics

• Intrusion Detection

• Loitering Detection

• Slip & Fall Detection

• Unattended Object Detection

• Object Removal Detection

Video Analytics Implementation in GPU

*40++ cameras implementation

~25% utilization*

Region of Interest during intrusion

* Differs based on server configuration & video complexity

[Diagram labels: Camera 1, Camera 2 … Camera N, each with MJPEG Decoding; IP Network; Surveillance Server (CPU); Video Analytics Processing; Client PC; ALERT !!!]

GPU VA Library

• Background Subtraction: AccBackgroundSubtractionFrameDiff, AccCompMotion, AccUpdateBackground, AccCompShadow, AccRGB2HSV

• Morphing Process: AccMorphFilterVariable

• CCL: AccConnectComponentLabel

• Region Analyzer: AccExtractPropertiesCentroid, AccExtractPropertiesSize, AccExtractPropertiesBB, AccExtractPropertiesHWRatio, AccExtractPropertiesOrientation, AccExtractPropertiesHProject, AccExtractPropertiesSkew, AccRegionLabelUpdate, AccCompOverlap, AccPropUpdate, AccCombineBlob

• Filters: AccFlickerFilter, AccRegionFilter

• Detection: AccVAParallelIntrusionDetection

Parallelization of the VA algorithms

• Previous data dependency

• Efficient memory management

• Algorithm decomposition

[Diagram labels: Video Analytics Processing; CPU + GPU; CPU]
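The stages listed for the GPU VA Library (background subtraction, connected component labelling, region analysis, filtering, detection) can be sketched sequentially in Python. This is an illustrative CPU version on tiny grayscale frames, not the library's code: every function name, the threshold, the minimum blob size and the region-of-interest format are assumptions, and the morphological filter stage is omitted for brevity.

```python
from collections import deque


def frame_diff_mask(frame, background, threshold):
    """Background subtraction by frame differencing: 1 marks a moving pixel."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(fr, br)]
            for fr, br in zip(frame, background)]


def label_components(mask):
    """4-connected component labelling via BFS; returns one pixel list per blob."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, blob = deque([(y, x)]), []
                while queue:
                    cy, cx = queue.popleft()
                    blob.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                blobs.append(blob)
    return blobs


def bounding_box(blob):
    """Region property: (top, left, bottom, right), inclusive."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    return min(ys), min(xs), max(ys), max(xs)


def overlaps(box, roi):
    """True if two inclusive (top, left, bottom, right) boxes intersect."""
    t, l, b, r = box
    rt, rl, rb, rr = roi
    return not (b < rt or t > rb or r < rl or l > rr)


def detect_intrusion(frame, background, roi, threshold=30, min_size=2):
    """Return bounding boxes of blobs that are large enough and touch the ROI."""
    mask = frame_diff_mask(frame, background, threshold)
    boxes = [bounding_box(b) for b in label_components(mask) if len(b) >= min_size]
    return [box for box in boxes if overlaps(box, roi)]


background = [[0] * 6 for _ in range(5)]
frame = [row[:] for row in background]
for y, x in ((1, 1), (1, 2), (2, 1), (2, 2)):
    frame[y][x] = 200          # a 2x2 moving object
frame[4][5] = 200              # single-pixel noise, removed by the size filter

print(detect_intrusion(frame, background, roi=(0, 0, 2, 2)))
```

The per-pixel differencing is independent for every pixel, which makes it the easiest stage to parallelise; labelling and region analysis carry dependencies between pixels, which is consistent with the slide's emphasis on algorithm decomposition.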


GPU VA System Results

Tasks                     CPU + GPU *   CPU Utilization
Network Stream In         CPU           10%
Decompression             CPU           5%
Video Analytics           GPU           35%
Streaming Out & Display   CPU           50%

* Data taken on system server CPU - Dual 8 cores

[Chart: VA Processing Time CPU vs GPU. Time (ms), 0 to 6, for CPU Dual 6 Core and GPU Kepler K20C. Annotated speedup: 3.6x.]

[Charts: CPU with SINGLE GPU K20C and CPU with DUAL GPU K20C. Number of cameras (0 to 100) and utilization (0% to 100%) of CPU, GPU and GPU MEMORY at 20, 15 and 10 fps. Data labels: 41, 41, 35 and 70, 80, 90.]

* Reference to 10fps

MIMOS ATL: R&D GPU Cluster

Enabling High Performance Computing for MiAccLib

29.3 Teraflops (SP) / 13.9 Teraflops (DP)


MIMOS GPU Cluster Features

• Altair PBS system (v12.0)

– PBS Scheduling

– PBS Display Manager

– PBS Compute Manager

– PBS Analytic

• Point To Point Mellanox Infiniband

• NVIDIA GPU Direct

• MVAPICH-GDR v2.0: MPI Over Infiniband

• CUDA 6.5

• Operating System: CentOS 6.4

• NVIDIA Tesla & Intel Xeon Phi

Experiment on Infiniband Bandwidth

Ohio:

– CPU Model: Intel E5-2680

– Socket/Cores/GHz: 2x10 @ 2.8GHz

– Memory: 64GB

– IB Card: Mellanox Connect-IB

– IB Switch: Mellanox FDR IB Switch

– OS: RHEL 6.5

– CUDA: CUDA 6.5

– GPU: NVIDIA Tesla K40c

MIMOS:

– CPU Model: Intel E5-2640

– Socket/Cores/GHz: 2x6 @ 2.50GHz

– Memory: 32GB

– IB Card: Mellanox ConnectX-3

– IB Switch: Point to Point

– OS: CentOS 6.5

– CUDA: CUDA 6.5

– GPU: NVIDIA Tesla K20c

Updated on: 25/9/2014, MIMOS

MPI Benchmark Test: MVAPICH2-GDR 2.0

[Charts: Ohio Result and MIMOS’s Result. Vertical axis 0 to 7000; horizontal axis 1 to 4M.]

GPU Direct RDMA Analysis - ATL

Updated on: 25/9/2014, MIMOS

[Diagram: Server 1 and Server 2, each with a CPU and a GPU K20c with GDDR5 memory.]

[Chart: MVAPICH2-GDR 2.0 ATL Benchmark. Bandwidth (MB/sec), 0 to 7000, against Message Size (Bytes), 1 to 4M. Series: Device to Device, Host to Host, Host to Device, Device to Host.]

Work on 2D Wave Equation Demo

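Algo 1: Wave Equation 2D

$$\frac{\partial^2 u}{\partial t^2} = c^2 \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right)$$

Algo 2: Finite Difference with Wave Equation 2D

$$u_{j,i}^{n+1} = c^2 \frac{\Delta t^2}{\Delta x^2} \left( u_{j+1,i}^{n} + u_{j-1,i}^{n} + u_{j,i+1}^{n} + u_{j,i-1}^{n} \right) + 2 \left( 1 - 2 c^2 \frac{\Delta t^2}{\Delta x^2} \right) u_{j,i}^{n} - u_{j,i}^{n-1}$$

Here $n$ is the time step index, $(j, i)$ is the grid point, and $\Delta t$ and $\Delta x$ are the time step and grid spacing (written as ∂t and ∂x on the slide), with the same spacing in both directions.

Ported the two-dimensional wave equation to the ATL GPU Cluster with a demonstration application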
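The finite difference update of Algo 2 can be written as a short sequential sketch in Python. This is not the ported GPU code; the function name `wave_step`, the variable `r2` (standing for c²Δt²/Δx²) and the fixed zero boundary are assumptions made for illustration.

```python
def wave_step(u_prev, u_curr, r2):
    """Advance the 2D wave field by one time step.

    u_prev and u_curr hold the field at steps n-1 and n as lists of rows.
    r2 = c^2 * dt^2 / dx^2. Boundary cells are held at zero (assumed here).
    """
    h, w = len(u_curr), len(u_curr[0])
    u_next = [[0.0] * w for _ in range(h)]
    for j in range(1, h - 1):
        for i in range(1, w - 1):
            neighbours = (u_curr[j + 1][i] + u_curr[j - 1][i]
                          + u_curr[j][i + 1] + u_curr[j][i - 1])
            u_next[j][i] = (r2 * neighbours
                            + 2.0 * (1.0 - 2.0 * r2) * u_curr[j][i]
                            - u_prev[j][i])
    return u_next


# A single pulse in the centre of a 5x5 grid, initially at rest.
u0 = [[0.0] * 5 for _ in range(5)]
u0[2][2] = 1.0
u1 = wave_step(u0, u0, r2=0.25)

print(u1[2][2])  # the pulse centre
print(u1[1][2])  # one of its four neighbours
```

Each grid point at step n+1 depends only on values from steps n and n-1, so all points can be updated in parallel, one GPU thread per point. For this explicit scheme to remain stable, `r2` should not exceed 0.5.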


FDTD with RDMA (Work In-Progress)

Objective: Multiple GPUs compute a larger simulation area of wave propagation

Challenge: Communication between halos is needed

Advantage: RDMA is used to reduce memory transfer latency by transferring the halo directly to the GPUs in other nodes

GPU RDMA

The halo needs to be exchanged at each step as the wave propagates into a different GPU. More partitioning means more sharing is needed

2048 pixels: the 1st 1024 pixels are computed in GPU1, the 2nd 1024 pixels in GPU2

Waves cross into a different GPU's frame. The red line is the boundary of the image between the GPUs
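The halo exchange can be illustrated with a small Python sketch that splits a grid into two halves, steps each half separately and copies one boundary column in each direction after every step. This is a sequential stand-in, not the in-progress RDMA code: the two lists play the role of GPU1 and GPU2, the list copies in `exchange_halos` stand in for the network transfer, and all function names are assumptions.

```python
def wave_step(u_prev, u_curr, r2):
    """One finite difference step of the 2D wave equation, zero boundary."""
    h, w = len(u_curr), len(u_curr[0])
    u_next = [[0.0] * w for _ in range(h)]
    for j in range(1, h - 1):
        for i in range(1, w - 1):
            neighbours = (u_curr[j + 1][i] + u_curr[j - 1][i]
                          + u_curr[j][i + 1] + u_curr[j][i - 1])
            u_next[j][i] = (r2 * neighbours
                            + 2.0 * (1.0 - 2.0 * r2) * u_curr[j][i]
                            - u_prev[j][i])
    return u_next


def split(u):
    """Split columns in two halves, each padded with one halo column."""
    half = len(u[0]) // 2
    left = [row[:half + 1] for row in u]    # owned columns + halo on the right
    right = [row[half - 1:] for row in u]   # halo on the left + owned columns
    return left, right


def exchange_halos(left, right):
    """Copy each side's edge column into the other side's halo column."""
    for row_l, row_r in zip(left, right):
        row_l[-1] = row_r[1]   # left halo <- first column owned by right
        row_r[0] = row_l[-2]   # right halo <- last column owned by left


def join(left, right):
    """Drop the halo columns and glue the owned columns back together."""
    return [row_l[:-1] + row_r[1:] for row_l, row_r in zip(left, right)]


def simulate_single(u0, steps, r2):
    u_prev, u_curr = [row[:] for row in u0], [row[:] for row in u0]
    for _ in range(steps):
        u_prev, u_curr = u_curr, wave_step(u_prev, u_curr, r2)
    return u_curr


def simulate_split(u0, steps, r2):
    l_prev, r_prev = split(u0)
    l_curr, r_curr = split(u0)
    for _ in range(steps):
        l_next = wave_step(l_prev, l_curr, r2)
        r_next = wave_step(r_prev, r_curr, r2)
        exchange_halos(l_next, r_next)   # needed after every step
        l_prev, l_curr = l_curr, l_next
        r_prev, r_curr = r_curr, r_next
    return join(l_curr, r_curr)


# A pulse next to the partition boundary of an 8x8 grid.
u0 = [[0.0] * 8 for _ in range(8)]
u0[3][3] = 1.0
print(simulate_split(u0, 5, 0.25) == simulate_single(u0, 5, 0.25))
```

Only one column per neighbour is exchanged per step, so the communication volume grows with the length of the partition boundary rather than with the area of the grid.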

In-progress works:


High Performance Parallel Data Warehouse

Big Data Support

Query performance: expect 10x or more improvement in query performance compared to RDBMS

Scale Out: incremental by adding hardware

Shared Nothing

Support RDBMS SQL query

Support familiar BI tools through RDBMS SQL query

In-Memory Modified HDFS with RDMA

Data Warehouse and Data Model Plugin to BI tools

Data Loading, Data Warehouse, Analyze & Visualize

High Speed Network Connection (Infiniband)

Visit us http://gpu.mimos.my

Upcoming Events:

• CUDA Programming Challenge 2014 (30th September 2014)

• GPU Annual Workshop (10th October 2014, Technology Park Malaysia, MIMOS)