
IDEAS 2011

Extend Core UDF Framework for GPU-Enabled Analytical Query Evaluation

Qiming Chen, Ren Wu, Meichun Hsu, Bin Zhang

HP Labs

Palo Alto, CA, USA

Problems

• Motivated by pushing analytics down to the DB layer for fast data access and reduced data movement
−which requires integrating analytic computation into the query pipeline using UDFs

• Existing UDFs cannot act as a block operator with chunk-wise input, and are therefore
−unable to deal with application semantics definable on a set of incoming tuples (e.g. representing an object)
−unable to leverage external computation engines (e.g. GPU) for efficient batch processing

Why Block UDFs Are Needed

• From a semantic point of view, many applications are definable on a set of tuples
−Minimal Spanning Tree (MST) computation is defined on a tuple-set representing a graph and returns a tuple-set representing the MST

• From a performance point of view, data processed by an external engine should be handled in batches rather than copied back and forth on a per-tuple basis

[Figure labels: Graph relation, a graph tuple-set, UDF, an MST tuple-set, MST relation; GPU, computation node, SAS server]

Solution: Set-In Set-Out (SISO) UDF

• Introduce a new kind of UDF called Set-In Set-Out (SISO) as a block operator that processes the input tuples chunk by chunk from the query processing pipeline
−pools a chunk of input tuples
−dispatches them to GPUs or an analytic engine for batch computation
−materializes the computation results and then streams them out tuple by tuple to the query processing pipeline

[Figure labels: SISO UDF, pipelined input, pooling, GPU, materialize result, pipelined output]
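The three steps above (pool, dispatch, stream out) can be sketched as a Python generator. This is only an illustration of the idea, not the authors' implementation: the names `siso` and `center_chunk` are hypothetical, the batch function stands in for a GPU or analytic engine, and flushing a trailing partial chunk is an assumption the slides do not address.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[float, ...]


def siso(rows: Iterable[Row], chunk_size: int,
         batch_fn: Callable[[List[Row]], Sequence[Row]]) -> Iterator[Row]:
    """Pool rows into chunks, run batch_fn once per chunk, stream results out."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))  # pool a chunk of input tuples
        if not chunk:
            return
        results = batch_fn(chunk)             # one batch call (stand-in for a GPU)
        yield from results                    # stream out tuple by tuple


def center_chunk(chunk: List[Row]) -> List[Row]:
    """Hypothetical batch computation: shift each point by the chunk mean."""
    n = len(chunk)
    mx = sum(x for x, _ in chunk) / n
    my = sum(y for _, y in chunk) / n
    return [(x - mx, y - my) for x, y in chunk]


points = [(float(i), float(2 * i)) for i in range(5)]
out = list(siso(points, 2, center_chunk))
```

The downstream consumer still sees one tuple at a time, while `batch_fn` is invoked once per chunk rather than once per tuple.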

SISO Example: select vectorize(x,y,10) from point_table

• Build phase: on t1, …, t9, do “ETL” but return NULL (act like a scalar function)
• Compute phase: on t10, act like a table function; FIRST CALL: run the computation on the 10 tuples
• Streamout phase: normal CALLs return 1 result tuple per call, so the output is tuple-by-tuple pipelined again
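The per-call behaviour in this example can be emulated with a small callable object. What `vectorize` actually computes is not stated on the slide, so the computation below (distance of each pooled point to the chunk centroid) is a made-up stand-in, and the class name `ChunkedUDF` is hypothetical.

```python
from math import hypot
from typing import List, Optional, Tuple

Result = Tuple[float, float, float]


class ChunkedUDF:
    """Returns None while pooling and a result set on the Nth call."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.pool: List[Tuple[float, float]] = []

    def __call__(self, x: float, y: float) -> Optional[List[Result]]:
        self.pool.append((x, y))            # build phase: pool the tuple
        if len(self.pool) < self.n:
            return None                     # tuples 1..N-1: NULL result
        # compute phase: needs the whole chunk (here, its centroid)
        cx = sum(px for px, _ in self.pool) / self.n
        cy = sum(py for _, py in self.pool) / self.n
        results = [(px, py, hypot(px - cx, py - cy)) for px, py in self.pool]
        self.pool = []                      # start a fresh chunk
        return results                      # streamout phase: emitted one by one


udf = ChunkedUDF(10)
returns = [udf(float(i), 0.0) for i in range(10)]
```

The first nine calls return `None`; only the tenth call returns the ten result rows.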

Comparison with Scalar, Table UDF

• Scalar UDF
−1 tuple in, 1 value/tuple out (tuple as composite value)
−Access to per-function state and per-tuple state

• Table UDF
−1 tuple in, N tuples out
−Access to per-tuple (input) state and per-return state

• SISO
−N tuples in, M values/tuples out
−Access to 4 levels of state: per-function, per-chunk, per-tuple (input), per-return
−Runs chunk by chunk; each chunk contains N tuples; returns nothing for tuples 1 to N-1 and returns a result-set for the Nth tuple

Comparison with UDA

• Agg operator or UDA
−No general form of set output (except group-by)
−No chunk-wise semantics

• SISO
−Flexible forms of set output
−Chunk-wise semantics

Comparison with RVF

• RVF
−Input relation initially as static data
−Input relation is loaded entirely rather than by chunks

• SISO
−Input tuple-set chunk by chunk along query processing
−Input tuple-set as dynamic data

Extending Query Engine to Support SISO UDF

• Support SISO as a block operator along the tuple-by-tuple query processing pipeline
−With hybrid behavior in processing a chunk of N tuples
  • For input tuples 1,…,N-1: like a scalar function, 1 call per input tuple, returning nothing
  • For tuple N: like a table function, multiple calls corresponding to that input tuple, returning a set

• Need to extend UDF accessible states

• Need to extend the invocation pattern

UDF Memory Context

• A UDF is called multiple times in query processing
−In the FIRST_CALL a buffer can be initiated
−Then each NORMAL_CALL references and updates the buffer, which keeps state across the multiple calls
−After the FINAL_CALL, the buffer is discarded

• The multi-call context differs for scalar and table UDFs
−For a scalar UDF, 1 call per input
−For a table UDF, N calls per input

• Therefore their memory contexts are different
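The buffer lifecycle across FIRST_CALL, NORMAL_CALL and FINAL_CALL can be mimicked with a Python generator. This is a loose analogy rather than any particular engine's API: the function `table_udf_calls`, its `ctx` dictionary and the squaring logic are all hypothetical.

```python
from typing import Dict, Iterator, List


def table_udf_calls(n: int, log: List[str]) -> Iterator[int]:
    """Emulate a table UDF invoked repeatedly for one input value n."""
    # FIRST_CALL: initiate the buffer that carries state across calls
    ctx: Dict[str, int] = {"next": 1, "limit": n}
    log.append("FIRST_CALL: buffer created")
    # NORMAL_CALLs: each call reads and updates the buffer, returning one row
    while ctx["next"] <= ctx["limit"]:
        value = ctx["next"] ** 2
        ctx["next"] += 1
        log.append(f"NORMAL_CALL: return {value}")
        yield value
    # FINAL_CALL: the buffer is discarded
    ctx.clear()
    log.append("FINAL_CALL: buffer discarded")


log: List[str] = []
rows = list(table_udf_calls(3, log))
```

For the single input `3` the function is entered once, returns three rows over three normal calls, and then releases its buffer, which is the "N calls per input" pattern of a table UDF.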

Extend UDF Accessible States

• SISO UDF: per-function state, per-chunk state, per-tuple state, per-return state
• Scalar UDF: per-function state, per-tuple state
• Table UDF: per-tuple state, per-return state

Extend Call Skeleton

SISO UDF
Global First Call
  Per-chunk First Call
    Per-tuple single Call (no return)
    Per-tuple single Call (no return)
    Per-tuple single Call (no return)
    :
    Last-tuple First Call
      Normal Call (1 return)
      Normal Call (1 return)
      :
    Last-tuple Last Call
  Per-chunk Last Call
  :
  Per-chunk First Call
  Per-chunk Final Call
  Per-chunk First Call
  Per-chunk Final Call
  :

Scalar UDF
Global First Call
  Per-tuple Normal Call (1 return)
  Per-tuple Normal Call (1 return)
  :
Final call optional (system specific)

Table UDF
Per-tuple First Call
  Normal Call (1 return)
  Normal Call (1 return)
  :
Per-tuple Final Call
:
Per-tuple First Call
  Normal Call (1 return)
  Normal Call (1 return)
  :
Per-tuple Final Call

SISO Call Skeleton Explained

SISO UDF
Global First Call
  Per-chunk First Call
    Per-tuple single Call (no return)
    Per-tuple single Call (no return)
    Per-tuple single Call (no return)
    :
    Per-tuple First Call
      Normal Call (1 return)
      Normal Call (1 return)
      :
    Per-tuple Final Call
  Per-chunk Final Call
  :
  Per-chunk First Call
  Per-chunk Final Call
  Per-chunk First Call
  Per-chunk Final Call
  :

Annotations on the skeleton:
• Set up function call global context for chunk-wise invocation (extend from fun-call node
• Set up chunk-based buffer for pooling data
• Pool tuples (vectorizing), return null
• Advance chunk oriented tuple index, return null
• Pool last tuple in the chunk, make batch analytic computation
• Return materialized results one tuple at a time
• Rewind chunk oriented tuple index; Cleanup buffer
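To make the call sequence concrete, the sketch below replays the SISO skeleton for a stream of tuples and records each call in a trace. It is a simplified model under stated assumptions: the function `siso_calls` and the trace labels are hypothetical, the comments give one plausible reading of what each call does, and the input length is assumed to be a multiple of the chunk size.

```python
from typing import Callable, Iterable, Iterator, List


def siso_calls(rows: Iterable[tuple], n: int,
               batch_fn: Callable[[List[tuple]], List[tuple]],
               trace: List[str]) -> Iterator[tuple]:
    """Replay the SISO call skeleton, logging every call type to trace."""
    trace.append("global first call")              # set up per-function context
    buffer: List[tuple] = []
    for row in rows:
        if not buffer:
            trace.append("per-chunk first call")   # set up the chunk buffer
        buffer.append(row)                         # pool the tuple
        if len(buffer) < n:
            trace.append("per-tuple single call (no return)")
            continue
        trace.append("per-tuple first call")       # last tuple: batch compute
        results = batch_fn(buffer)
        for r in results:
            trace.append("normal call (1 return)") # one result tuple per call
            yield r
        trace.append("per-tuple final call")
        trace.append("per-chunk final call")       # clean up the chunk buffer
        buffer = []
    if buffer:
        raise ValueError("this sketch assumes len(rows) is a multiple of n")


trace: List[str] = []
out = list(siso_calls([(1,), (2,), (3,), (4,)], 2,
                      lambda chunk: [(sum(r[0] for r in chunk),)], trace))
```

With two chunks of two tuples and one result per chunk, the trace holds one global first call followed by the same six-call pattern twice.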

Integrate Query Processing with GPU Computation using SISO UDF

• Using General Purpose GPU (GPGPU) to accelerate analytic query processing allows us to leverage SQL’s analysis power and GPU’s computation power

• However, their operational patterns are different
−GPU computation is a kind of batch processing with data parallelism
−Query processing is tuple-by-tuple pipelined

• We solve this problem by using SISO UDFs in queries
−To handle batch GPU computation in the query dataflow pipeline

Experiment on Accelerating K-Means Clustering of Very Large Data Sets

• K-Means clustering is an iterative process; in each iteration
−each point is assigned to the nearest cluster center as a member of that cluster
−then for each center, its coordinates are recalculated as the “mean” of the coordinates of its member points

• The process is repeated until convergence is achieved.

[Figure: flowchart with steps Init. Centers, Assign Center, Calc Centers, Convergence Check, Done]
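The iteration described above can be written out in a few lines of plain Python. This is a generic textbook K-Means on 2-D points, not the code used in the experiment; the function names, the exact-equality convergence test and the choice to leave an empty cluster's center unchanged are assumptions made for the sketch.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for every point (squared Euclidean distance)."""
    return [min(range(len(centers)),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points]


def recompute(points: List[Point], labels: List[int],
              centers: List[Point]) -> List[Point]:
    """New center = mean of member points; empty clusters keep their center."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            new_centers.append(old)
            continue
        new_centers.append((sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members)))
    return new_centers


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100):
    for _ in range(max_iter):
        labels = assign(points, centers)              # assign center
        new_centers = recompute(points, labels, centers)  # calc centers
        if new_centers == centers:                    # convergence check
            break
        centers = new_centers
    return centers, assign(points, centers)


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
final_centers, final_labels = kmeans(pts, [(0.0, 0.0), (10.0, 0.0)])
```

The assignment step is the part that touches every point in every iteration, which is why it is the natural candidate for batch execution.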

Single Iteration of K-Means by SQL and SISO UDF

SELECT (p).cid, AVG((p).x) AS cx, AVG((p).y) AS cy
FROM (
    SELECT assign_center_siso(x, y, 'SELECT * FROM Centers', N) AS p
    FROM Points
) r
GROUP BY (p).cid;

[Figure: dataflow of one iteration. Points (xp,yp) are fed chunk-wise and Centers (cid,xc,yc) are loaded initially into the SISO UDF assign_center(), which outputs cid,xp,yp; AVG with GROUP BY then produces cid,xc,yc.]
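As a rough Python analogue of this query, the sketch below loads the centers once, assigns points chunk by chunk, and then averages per cluster id as the outer GROUP BY does. It borrows the name `assign_center_siso` from the query, but its signature and body are guesses for illustration only; `avg_group_by_cid` is a hypothetical helper and the inner loop stands in for the batch kernel.

```python
from collections import defaultdict
from itertools import islice


def assign_center_siso(points, centers, n):
    """Yield (cid, x, y) rows; centers load once, points arrive in chunks of n."""
    centers = list(centers)                 # loaded initially, reused per chunk
    it = iter(points)
    while True:
        chunk = list(islice(it, n))         # chunk-wise input
        if not chunk:
            return
        for x, y in chunk:                  # stand-in for the batch (GPU) kernel
            cid = min(centers,
                      key=lambda c: (x - c[1]) ** 2 + (y - c[2]) ** 2)[0]
            yield (cid, x, y)


def avg_group_by_cid(rows):
    """Equivalent of SELECT cid, AVG(x), AVG(y) ... GROUP BY cid."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cid, x, y in rows:
        acc = sums[cid]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {cid: (sx / k, sy / k) for cid, (sx, sy, k) in sums.items()}


centers = [(1, 0.0, 0.0), (2, 10.0, 0.0)]   # rows of (cid, xc, yc)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0), (9.0, 1.0)]
new_centers = avg_group_by_cid(assign_center_siso(points, centers, 2))
```

Running this once corresponds to one K-Means iteration; feeding `new_centers` back in as the next `centers` would give the following one.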

Experiment Results Comparison

• We compare the performance of
−scalar UDF-wrapped, CPU-based implementation
−SISO UDF-wrapped, CPU-based implementation
−SISO-wrapped, GPU-accelerated implementation

Overall end-to-end query performance: Scalar UDF/CPU vs. SISO/CPU vs. SISO/GPUs

Query                                          10M Points (seconds)   100M Points (seconds)
Q1: generalized scalar UDF (tuple by tuple)    155.45                 1845.66
Q2: SISO UDF in 1M chunk computed by CPU       145.02                 1541.41
Q3: SISO UDF in 1M chunk computed by GPGPU     27.41                  345.01
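The speedups implied by these timings follow from simple division. The snippet below copies the reported numbers and computes each query's speedup relative to Q1; the dictionary keys are shorthand labels introduced here.

```python
# Timings in seconds, copied from the results table: (10M points, 100M points).
timings = {
    "Q1 scalar UDF / CPU": (155.45, 1845.66),
    "Q2 SISO / CPU": (145.02, 1541.41),
    "Q3 SISO / GPGPU": (27.41, 345.01),
}

baseline = timings["Q1 scalar UDF / CPU"]

# Speedup of each query over the scalar UDF baseline, per data size.
speedup = {
    name: tuple(b / t for b, t in zip(baseline, times))
    for name, times in timings.items()
}

for name, (s10, s100) in speedup.items():
    print(f"{name}: {s10:.2f}x at 10M, {s100:.2f}x at 100M")
```

Relative to the scalar UDF, chunking alone on the CPU gives about 1.07x at 10M points and 1.20x at 100M, while the GPU-backed SISO UDF reaches about 5.67x and 5.35x.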

Scalar UDF vs. SISO UDF

• The number of clusters is set to 1000
• The number of data points ranges from 1M to 100M
• The chunk size is fixed to 1M
−Beyond 1M (1000K), the performance gain gradually diminishes with further increase in chunk size

Conclusions

• In-DB analytics has been extensively investigated, but has not yet become a scalable approach

• An important reason lies in the lack of block UDFs to deal with the application semantics definable on a set of tuples, and to leverage external computation units such as GPUs for efficient batch processing

• To solve this problem, we developed SISO as a new kind of UDF

• Integrating SISO with parallel DB is under further investigation