

SAP HANA – Real Time Computing

Chris Hallenbeck, HANA Solution Management

Christof Bornhoevd, HANA Architecture

Richard Pledereder, HANA Product Engineering

May 22nd, 2013

© 2013 SAP AG. All rights reserved. 2 Public

SAP Software Portfolio

Applications Mobile Analytics Cloud Database & Tech.

Powered by SAP HANA

a completely re-imagined, modern platform for real-time business


Outline

Overview of HANA

Leveraging New Generation Commodity Hardware

HANA Architecture

Summary


Modern Enterprise Business Applications

The Need for Efficient and Flexible Data Management

[Figure: Execute, Measure, Understand, Optimize; External Sources]

Combine different information access approaches: search, analysis, and exploration

No clear separation between transactional and analytical parts of the application

Leverage data of different degrees of structure and quality, from well-structured to irregularly structured to unstructured text data

Flexibly combine internal and external data based on the business decisions to be made, not on the set of available integrated data

Are based on “real-time” current data and historical data

Need to support different form factors and deployment models: on-premise, on-demand and on-device


SAP HANA: The Challenge

Broad: Big data, many data types

Deep: Complex & interactive questions on granular data

High Speed: Fast response-time, interactivity

Simple: No data preparation, no pre-aggregates, no tuning

Real-time: Recent data, preferably real-time

[Figure: two groups joined by OR (Broad, Deep, High Speed / Simple, Real-time), and all five together]


SAP HANA: The Challenge

Unify Transaction Processing and Analytics

Single System

Same Data Instance

Run Analytics in Real-Time

Run Analytics and Transactions at the “speed of thought”


SAP HANA: Single System for diverse Business Needs


SAP HANA: A New In-Memory Data Platform

One Foundation for

OLTP + OLAP | Structured + Unstructured Data

Legacy + New Applications

Distribution | Single Lifecycle Management


Hardware Advances: Moore's Law - DRAM Pricing

1980: Memory $10,000/MB

2000: Memory $1/MB

2013: Memory $0.004/MB
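To put these prices in perspective, the sketch below computes what 1 TB of DRAM would cost at each quoted per-MB price. It is only an illustration: the function and constant names are made up here, and it assumes a binary terabyte (1 TB = 1,048,576 MB).

```python
# Cost of a fixed amount of DRAM at the per-MB prices quoted on the slide.
PRICE_PER_MB = {1980: 10_000.0, 2000: 1.0, 2013: 0.004}  # USD per MB
MB_PER_TB = 1024 * 1024  # assumption: binary terabyte


def cost_of_memory(megabytes: int, price_per_mb: float) -> float:
    """Return the cost in USD of `megabytes` of DRAM."""
    return megabytes * price_per_mb


for year, price in PRICE_PER_MB.items():
    cost = cost_of_memory(MB_PER_TB, price)
    print(f"{year}: 1 TB of DRAM costs about ${cost:,.0f}")
```

At the 1980 price a terabyte would have cost roughly ten billion dollars; at the 2013 price it is a few thousand, which is what makes holding entire datasets in main memory practical.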

[Figure: memory cost and speed over time]


Hardware Advances: Moore's Law - CPUs

2002: 1 core; 32 bits; 4 MB

2007: 2 cores; 2 CPUs per server; external controllers

2010: 8 cores, 16 threads per CPU; 4 CPUs per server; on-chip memory control; quick interconnect; VM and vector support; 64 bits; 256 GB - 1 TB

2013: more cores, bigger caches; 16 ... 64 CPUs per server; greater on-chip integration (PCIe, network, ...); data-direct I/O; tens of TBs

Images: Intel, Danilo Rizzuti / FreeDigitalPhotos.net


Software Advances: Build for In-Memory Computing

Reduce Memory Access Stalls

Parallelism: Take advantage of tens, hundreds of cores

Data Locality: On-chip cache awareness

In-Memory Computing: It is all data-structures (not just tables)


In-Memory Computing

Yes, DRAM is 100,000 times faster than disk, but DRAM access is still 6-200 times slower than on-chip caches

[Figure: memory hierarchy of a CPU with two cores, with access latencies]

L1 Cache (per core): 0.5 ns
L2 Cache (per core): 7.0 ns
L3 Cache: 15.0 ns
Main Memory: 100 ns
Disk: SSD 150K ns, HD 10M ns
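The "100,000 times" and "6-200 times" figures follow directly from the latencies on the slide. The sketch below recomputes them. It assumes that the 0.5, 7.0 and 15.0 ns values belong to the L1, L2 and L3 caches respectively, which the extracted slide text suggests but does not state outright; the names used are illustrative.

```python
# Access latencies in nanoseconds, as listed on the slide.
# Assumption: 0.5 / 7.0 / 15.0 ns map to L1 / L2 / L3 in that order.
LATENCY_NS = {
    "L1 cache": 0.5,
    "L2 cache": 7.0,
    "L3 cache": 15.0,
    "main memory": 100.0,
    "SSD": 150_000.0,
    "HD": 10_000_000.0,
}


def slowdown(slower: str, faster: str) -> float:
    """Return how many times slower `slower` is than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


print(slowdown("HD", "main memory"))        # disk vs. DRAM
print(slowdown("main memory", "L1 cache"))  # DRAM vs. fastest cache
print(slowdown("main memory", "L3 cache"))  # DRAM vs. slowest cache
```

The last two ratios bracket the quoted range: a cache miss that goes all the way to DRAM costs between roughly 7 and 200 cache accesses, which is why data locality matters even when everything is already in memory.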


In-Memory Computing – Data Structures

Order  Country  Product  Sales
456    France   corn     1000
457    Italy    wheat    900
458    Italy    corn     600
459    Spain    rice     800

Typical Database:
456 France corn 1000 | 457 Italy wheat 900 | 458 Italy corn 600 | 459 Spain rice 800

SAP HANA: column order:
456 457 458 459 | France Italy Italy Spain | corn wheat corn rice | 1000 900 600 800

SELECT Country, SUM(sales) FROM SalesOrders
WHERE Product = 'corn'
GROUP BY Country
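The difference between the two layouts can be sketched in a few lines of Python. This is a toy model using the four rows from the slide, not HANA's storage format, and the function name `sales_by_country` is invented for the example. It shows that the query only has to read three of the four columns when data is stored in column order.

```python
from collections import defaultdict

# Row order: each record is stored contiguously.
rows = [
    (456, "France", "corn", 1000),
    (457, "Italy", "wheat", 900),
    (458, "Italy", "corn", 600),
    (459, "Spain", "rice", 800),
]

# Column order: each attribute is stored contiguously.
columns = {
    "Order": [r[0] for r in rows],
    "Country": [r[1] for r in rows],
    "Product": [r[2] for r in rows],
    "Sales": [r[3] for r in rows],
}


def sales_by_country(cols, product):
    """SELECT Country, SUM(Sales) WHERE Product = product GROUP BY Country."""
    # Scan only the Product column to find the matching row positions.
    positions = [i for i, p in enumerate(cols["Product"]) if p == product]
    totals = defaultdict(int)
    # Touch Country and Sales only at those positions; Order is never read.
    for i in positions:
        totals[cols["Country"][i]] += cols["Sales"][i]
    return dict(totals)


print(sales_by_country(columns, "corn"))
```

With a row layout the same query would have to step over every field of every record, including the unused Order values.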


SAP HANA: Dictionary Compression

Column "Name" (uncompressed): Jones, Miller, Millman, Zsuwalski, Baker, Miller, John, Miller, Johnson, Jones

Column "Name" (dictionary compressed):

Dictionary (sorted; value ID implicitly given by sequence in which values are stored)

Value ID  Value
0         Baker
1         John
2         Johnson
3         Jones
4         Miller
5         Millman
...
N         Zsuwalski

Value-ID sequence: one element for each row in column; the value IDs point into the dictionary
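A minimal sketch of dictionary encoding, using the names from the slide, might look as follows. It is a generic illustration rather than HANA's implementation, and the function names are assumptions. Because this toy column holds only seven distinct names, Zsuwalski receives ID 6 here instead of the slide's N.

```python
from bisect import bisect_left


def dictionary_encode(values):
    """Return (dictionary, value_ids) for a column of values."""
    dictionary = sorted(set(values))  # sorted distinct values
    position = {v: i for i, v in enumerate(dictionary)}
    value_ids = [position[v] for v in values]  # one ID per row
    return dictionary, value_ids


def dictionary_decode(dictionary, value_ids):
    """Rebuild the uncompressed column from dictionary and value IDs."""
    return [dictionary[i] for i in value_ids]


def rows_equal(dictionary, value_ids, value):
    """Find matching rows by comparing integer IDs instead of strings."""
    i = bisect_left(dictionary, value)  # binary search in the sorted dictionary
    if i == len(dictionary) or dictionary[i] != value:
        return []  # value does not occur in the column
    return [row for row, vid in enumerate(value_ids) if vid == i]


names = ["Jones", "Miller", "Millman", "Zsuwalski", "Baker",
         "Miller", "John", "Miller", "Johnson", "Jones"]
dictionary, value_ids = dictionary_encode(names)
print(dictionary)
print(value_ids)
print(rows_equal(dictionary, value_ids, "Miller"))
```

The `rows_equal` helper illustrates why scans can act on compressed data: the string is looked up once in the dictionary, and the scan itself compares small integers.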


SAP HANA: Multi-Core Parallelization

[Figure: columns Col A, Col B and Col C of integer values, distributed across Core 1 to Core 4; Col C is labeled as processed by Core 3 and Core 4]
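The idea of giving each core its own column, and splitting a large column across several cores, can be sketched with Python's standard thread pool. This is only a model of the task decomposition: the column values are made up, the names and the `split_threshold` parameter are hypothetical, and CPython threads will not give a real speed-up for this kind of work because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def split(values, parts):
    """Split a list into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def parallel_column_sums(columns, workers=4, split_threshold=6):
    """Sum every column; columns longer than the threshold are split in two."""
    tasks = []  # (column name, chunk) pairs, one per unit of work
    for name, values in columns.items():
        parts = 2 if len(values) > split_threshold else 1
        for chunk in split(values, parts):
            tasks.append((name, chunk))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda t: (t[0], sum(t[1])), tasks))
    # Combine the partial sums of split columns.
    totals = {name: 0 for name in columns}
    for name, subtotal in partial:
        totals[name] += subtotal
    return totals


columns = {
    "A": [3, 1, 4, 1],
    "B": [5, 9, 2, 6],
    "C": [5, 3, 5, 8, 9, 7, 9, 3],  # longer column, split into two chunks
}
print(parallel_column_sums(columns))
```

Columns A and B each become one task, while column C becomes two, giving four independent units of work for four workers.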


Single Instruction Multiple Data (SIMD)

• Scalar processing

− traditional mode

− one instruction produces one result

• SIMD processing

− with Intel® SSE(2,3,4)

− one instruction produces multiple results

[Figure: Scalar OP: SOURCE operands X and Y produce DEST X op Y. SSE/2/3 OP: SOURCE operands X4 ... X1 and Y4 ... Y1 (bits 0 to 127) produce DEST X4 op Y4, X3 op Y3, X2 op Y2, X1 op Y1]


Single Instruction Multiple Data (SIMD)

128-bit wide with Intel® SSE(2,3,4)

2 64-bit integer ops/cycle

4 32-bit integer ops/cycle

8 16-bit integer ops/cycle

16 8-bit integer ops/cycle

256-bit with AVX

[Figure: SSE operation in clock cycle 1: SOURCE operands X4 ... X1 and Y4 ... Y1 (bits 0 to 127) produce DEST X4 op Y4, X3 op Y3, X2 op Y2, X1 op Y1 with one SSE2 OP]

Vector-processing unit built into standard processors
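The ops-per-cycle figures above are simply the register width divided by the element width. The following pure-Python model makes that arithmetic explicit and mimics a lane-wise add. It does not execute real SIMD instructions; the function names are illustrative, and unsigned wrap-around on overflow is an assumption of the model.

```python
def lanes(register_bits: int, lane_bits: int) -> int:
    """Number of integers of `lane_bits` that fit in one vector register."""
    return register_bits // lane_bits


def simd_add(xs, ys, lane_bits=32):
    """Model one vector add: every lane is added, wrapping on overflow."""
    if len(xs) != len(ys):
        raise ValueError("operands must have the same number of lanes")
    mask = (1 << lane_bits) - 1
    return [(x + y) & mask for x, y in zip(xs, ys)]


for width in (64, 32, 16, 8):
    print(f"128-bit register, {width}-bit lanes: {lanes(128, width)} per instruction")

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))
```

In the scalar case the same four additions would take four separate instructions; with a 256-bit register the lane counts double again.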


Parallelization at All Levels

Multiple user sessions

Concurrent operations within a query (… T1.A … T2.B…)

Data partitioning on one or more hosts

Horizontal segmentation, concurrent aggregation

Multi-threading at Intel processor core level

Vector Processing

[Figure: host 1, host 2, host 3]
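Data partitioning with concurrent aggregation can be sketched as follows: rows are hash-partitioned across hosts, each host computes a partial aggregate over its own partition, and the partial results are merged. This is a generic sketch, not HANA's partitioning scheme. The function names are invented, CRC32 is an arbitrary choice of hash, and the per-host step runs sequentially here although each call is independent and could run on its own host.

```python
import zlib
from collections import Counter


def host_for(key: str, hosts: int) -> int:
    """Deterministic hash partitioning (CRC32 is an arbitrary choice here)."""
    return zlib.crc32(key.encode("utf-8")) % hosts


def partition(rows, hosts):
    """Horizontal segmentation: assign every row to exactly one host."""
    parts = [[] for _ in range(hosts)]
    for country, sales in rows:
        parts[host_for(country, hosts)].append((country, sales))
    return parts


def local_aggregate(rows):
    """Partial SUM(sales) GROUP BY country over one host's partition."""
    totals = Counter()
    for country, sales in rows:
        totals[country] += sales
    return totals


def global_aggregate(rows, hosts=3):
    """Merge the per-host partial aggregates into the final result."""
    result = Counter()
    for part in partition(rows, hosts):
        result.update(local_aggregate(part))  # Counter.update adds the counts
    return dict(result)


rows = [("France", 1000), ("Italy", 900), ("Italy", 600), ("Spain", 800)]
print(global_aggregate(rows))
```

Because rows with the same key always land on the same host, each group is complete within one partition; the merge step would still be correct under a different partitioning, since it adds partial sums.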


SAP HANA: True In-Memory Computing

Use Vector Based Processing (SIMD)

Leverage Data Locality

Act on Compressed Data

Cache Line (64 Byte) Aligned Data

Hyper-Threading

3.2B Integer Scans / Second / Core

12.5M Aggregates / Second / Core

1.5M Inserts / Second


SAP HANA deployment options

Single Server

• 2 CPU 128GB to 8 CPU 1TB

• Single HANA deployments for data marts or accelerators

Scale Out Cluster

• 2 to n servers per cluster

• Each server is either 4 CPU/512GB or 8 CPU/1TB

• Largest certified configuration: 56 servers

• Largest tested configuration: 250 servers

• Support for high availability and disaster tolerance

Cloud Deployment

• HANA instances can be deployed to AWS

• Free developer license

• 99 cents per hour for productive use (+ EUR 2.50 for the AWS machine)
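As a rough sizing aid, the aggregate main memory of a scale-out cluster is the server count times the per-server RAM. The slide does not say which server size was used in the certified and tested configurations, so the sketch below shows both; it assumes 1 TB = 1024 GB, and the names are illustrative.

```python
# Per-server RAM for the two scale-out node sizes quoted on the slide, in GB.
NODE_RAM_GB = {"4 CPU": 512, "8 CPU": 1024}  # assumption: 1 TB = 1024 GB


def cluster_ram_tb(servers: int, ram_gb_per_server: int) -> float:
    """Aggregate RAM of a cluster in TB."""
    return servers * ram_gb_per_server / 1024


for label, servers in (("largest certified", 56), ("largest tested", 250)):
    for node, ram in NODE_RAM_GB.items():
        total = cluster_ram_tb(servers, ram)
        print(f"{label}: {servers} x {node} servers = {total:g} TB RAM")
```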


SAP HANA Technology & Features

Combined in one DBMS Platform

Common DBMS features

SQL

ACID: isolation (MVCC), logging and recovery

Stored procedures

Hybrid DBMS

Column store, row store, graph store

In-memory and disk based

Insert only (temporal) and updatable tables

High Performance

Efficient compression techniques

Massive parallelization over CPU cores and nodes

Data structures optimized for main memory

Data aging concept

Reduced TCO: OLTP, OLAP, search in one system


SAP HANA Database

Multi-Engine for Different Application Needs

[Figure: architecture diagram with the following components]

Business Applications

Connection and Session Management

SQL, SQL Script, MDX, FOX (Planning), WIPE (Graphs)

Optimizer and Plan Generator

Calculation Engine

Execution Engine

In-Memory Processing Engines: Relational Engine (Row and Column Store), Graph Engine, Text Engine

Authorization Manager, Metadata Manager, Transaction Manager

Persistency Layer: Page Management, Logging and Recovery


Diverse Set of Operators

[Figure: Engine Layer containing Rel. Ops, OLAP Ops, L Runtime, Text Ops, Graph Ops]

• Engine Layer

– a collection of different physical operators with some local optimization logic to adapt the fragment of the global plan to the specifics of the actual physical operator

– implements a standard data transfer protocol as well as highly specialized communication protocols within operator groups

– all operators exploit a common interface to the unified table layer

• Set of plan operators

– Relational operators: classic relational query graph processing

– OLAP operators: optimized for star-join scenarios with fact and dimension tables

– L runtime: runtime for the internal language to execute L code

– Text operators: set of text search analysis operators (e.g. entity resolution capabilities)

– Graph operators: provide support for graph-based algorithms


SAP HANA Column Store

High data compression

Efficient compression methods (dictionary, run length, cluster, prefix, etc.)

Compression works well with columns and can speed up operations on columns (~ factor 10)

Because of compression, changes are written into a less compressed delta storage

The delta needs to be merged into the columns from time to time or when a certain size is exceeded

Delta merge can be done in the background

Trade-off between compression ratio and delta merge runtime

Updates go into delta data storage and are periodically merged into main data storage

High write performance is not affected by compression

Data is written to delta storage with less compression, which is optimized for write access. This is merged into the main area of the column store later on.
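Of the compression methods listed, run-length encoding is the simplest to show. The sketch below encodes a sequence of value IDs as (value, run length) pairs and counts occurrences directly on the compressed form, which hints at why compression can speed up column operations. It is a textbook version with made-up data and illustrative names, not HANA's on-disk or in-memory format.

```python
def rle_encode(values):
    """Run-length encode a sequence into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([v, 1])  # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def rle_count(runs, value):
    """Count occurrences on the compressed form, without decoding."""
    return sum(length for v, length in runs if v == value)


value_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0]
runs = rle_encode(value_ids)
print(runs)
print(rle_count(runs, 0))
```

Here ten values shrink to four runs, and the count touches four pairs instead of ten elements. The benefit grows with longer runs, which is one reason sorted or low-cardinality columns compress well.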


SAP HANA: Column Store

[Figure: DML goes to the delta store (write-optimized representation); a merge moves data into the main store (read-optimized representation); REDO log and data area]
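The interplay of delta and main store can be modeled with a small class: writes append to an uncompressed delta, reads see main plus delta, and a merge rebuilds the dictionary-encoded main store. This is a deliberately simplified sketch under stated assumptions. The class and method names are hypothetical, and it leaves out the REDO log, concurrency control and background execution of the merge.

```python
from bisect import bisect_left


class ColumnSketch:
    """Toy column: dictionary-encoded main store plus append-only delta."""

    def __init__(self):
        self.dictionary = []  # sorted distinct values (main store)
        self.main_ids = []    # value IDs into the dictionary (main store)
        self.delta = []       # uncompressed recent writes (delta store)

    def insert(self, value):
        # Writes are cheap appends; the compressed main store is untouched.
        self.delta.append(value)

    def scan(self):
        # Reads must see both the main store and the delta store.
        return [self.dictionary[i] for i in self.main_ids] + list(self.delta)

    def merge(self):
        # Rebuild dictionary and value IDs from main + delta, then clear delta.
        values = self.scan()
        self.dictionary = sorted(set(values))
        self.main_ids = [bisect_left(self.dictionary, v) for v in values]
        self.delta = []


col = ColumnSketch()
for name in ["Miller", "Baker", "Miller"]:
    col.insert(name)
col.merge()          # main store now holds three encoded rows
col.insert("Jones")  # lands in the delta store only
print(col.dictionary, col.main_ids, col.delta)
```

The merge has to re-encode existing value IDs whenever a new value changes the sorted dictionary, which illustrates the trade-off between compression ratio and merge runtime mentioned on the previous slide.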


SAP HANA Database

Optimized Communication with the Application Layer

[Figure: architecture diagram with the following components]

Browser-Based Clients, Other HTTP Clients

Lightweight Application Server, Application Container (three Applications), Core Services, Protocol Handler, OData Service

SQL, SQL Script, MDX, FOX (Planning), WIPE (Graphs)

Connection and Session Management

Request Processing and Execution Control

In-Memory Processing Engines: Relational Engine, Graph Engine, Text Engine

Persistency Layer


SAP HANA Database

Optimized Communication with the Application Layer

HANA integrates a lightweight application server component into the database cluster infrastructure to allow efficient communication and data exchange between database layer and application layer

The lightweight application server component in HANA provides core application services like:

Application service runtime engine

Application lifecycle management (versioning/transport) via content repository

Programming model

Standardized data exchange (OData: JSON / ATOM / XML)

Session and connection management

Outbound connectivity (HTTP, SMTP, etc.)


A New Data Management Platform for Modern Business Applications

Much more than a Relational Database

OLTP + OLAP | Structured + Unstructured Data

Legacy + New Applications

Distribution | Single Lifecycle Management
