© 2017 IBM Corporation
Data Warehouse appliances: IBM PureData for Analytics
Fabio Bresciani, Cloud & Cognitive, IBM Italia
May 2017
1911 Computing-Tabulating-Recording (CTR)
1924 International Business Machines
1927 Italy
1935 Training courses for women
1944 First machine to handle long calculations automatically
1956 Creation of the data storage industry
1961 The Selectric Typewriter
1962 First computer-driven airline reservation system
1969 Magnetic strips on credit cards
1969 IBM technology guided the Apollo mission to the moon
1971 Floppy disk
1973 UPC bar codes
1981 The IBM PC
1986 IBM scientists won the Nobel Prize
1997 Supercomputer defeated the best chess player
1997 IBM "eBusiness"
2011 IBM Watson
What does IBM do?
Systems
Analytics
Healthcare
Consulting Services
Research
IBM Technical and Infrastructure Services
Cloud
Internet of Things
Commerce
Security
IBM Research
$5.7B in R&D (6% of revenue)
13 research centers on 6 continents, including the Zurich lab led by the Italian Alessandro Curioni
5 Nobel Prizes
Patent leader for 24 consecutive years: 8,088 U.S. patents in 2016
8,500 master inventors in 43 countries
Concentrated in strategic areas: Cloud Computing, Analytics, Security, Cognitive Computing, Healthcare

IBM Research centers: Almaden, Austin, New York, São Paulo/Rio de Janeiro, Dublin, Zurich, Haifa, Nairobi, Johannesburg, Delhi/Bengaluru, Beijing/Shanghai, Tokyo, Melbourne
Traditional Data Warehouses are just too complex
They do NOT meet the demands of advanced analytics on big data:
Too complex an infrastructure
Too complicated to deploy
Too much tuning required
Too long to get answers
Too inefficient at analytics
Too many people needed to maintain
Too costly to operate
Big Data Floods Traditional Database Systems
Let's Simplify This Mess
And Bring Analytics In To The Warehouse
Legacy RDBMS
Create Table - Logical Model
CREATE TABLE CRRADMIN.OT_ORDER_EVENTS
(
TRADE_DATE DATE NOT NULL,
ORIGIN_SYS_CD VARCHAR2(32 BYTE) NOT NULL,
ORIGIN_SYS_EVENT_SEQ VARCHAR2(32 BYTE) NOT NULL,
EVENT_ID NUMBER(9) NOT NULL,
EVENT_CLASS_CD VARCHAR2(32 BYTE) NOT NULL,
EVENT_DATETIME DATE NOT NULL,
ORIGIN_SYS_REF VARCHAR2(32 BYTE) NOT NULL,
ORIGIN_SYS_PARENT_REF VARCHAR2(32 BYTE),
ORIGIN_SYS_ORDER_REF VARCHAR2(32 BYTE),
ORIGIN_SYS_RELATED_REF VARCHAR2(32 BYTE),
ORIGIN_SYS_GROUP_REF VARCHAR2(32 BYTE),
ORIGIN_SYS_DATETIME DATE NOT NULL,
TRADE_ID NUMBER(9),
BASKET_ID NUMBER(9),
ORDER_ID NUMBER(9),
BASKET_NAME VARCHAR2(32 BYTE),
SQC_SQN VARCHAR2(20 BYTE),
EXECFAC_ID NUMBER(9),
CUSTOMER_REF VARCHAR2(255 BYTE),
INSTRUMENT_ID NUMBER(9),
SYMBOL VARCHAR2(64 BYTE),
…
);
Allocate Space
TABLESPACE "OTR_DATA" LOCAL
(PARTITION BY RANGE (TRADE_DATE) (
PARTITION P20070102 VALUES LESS THAN (20070102)
PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 262144 NEXT 262144 MINEXTENTS 1
MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT, PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 262144 NEXT 262144 MINEXTENTS 1
MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT,
PARTITION P20070104 VALUES LESS THAN (20070104)
PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 262144 NEXT 262144 MINEXTENTS 1
MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT,
PARTITION P20070105 VALUES LESS THAN (20070105)
PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 262144 NEXT 262144 MINEXTENTS 1
MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT,
PARTITION P20070106 VALUES LESS THAN (20070106)
PCTFREE 10 INITRANS 2 MAXTRANS 255
STORAGE(INITIAL 262144 NEXT 262144 MINEXTENTS 1
MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT,
Create Indexes
CREATE INDEX OTOE_EVENT_ID
ON
CRRADMIN.OT_ORDER_EVENTS(EVENT_ID)
TABLESPACE OTR_IDX
NOLOGGING
PCTFREE 10
INITRANS 2
MAXTRANS 255
STORAGE(BUFFER_POOL DEFAULT)
NOPARALLEL
NOCOMPRESS
/
CREATE INDEX OTOE_TRADE_ID
ON
CRRADMIN.OT_ORDER_EVENTS(TRADE_ID)
TABLESPACE OTR_IDX
NOLOGGING
PCTFREE 10
INITRANS 2
MAXTRANS 255
Netezza DDL
Create Table - Logical Model
CREATE TABLE CRRADMIN.OT_ORDER_EVENTS
(
TRADE_DATE DATE NOT NULL,
ORIGIN_SYS_CD VARCHAR (32) NOT NULL,
ORIGIN_SYS_EVENT_SEQ VARCHAR (32) NOT NULL,
EVENT_ID INTEGER NOT NULL,
EVENT_CLASS_CD VARCHAR (32) NOT NULL,
EVENT_DATETIME TIMESTAMP NOT NULL,
ORIGIN_SYS_REF VARCHAR (32) NOT NULL,
ORIGIN_SYS_PARENT_REF VARCHAR (32),
ORIGIN_SYS_ORDER_REF VARCHAR (32),
ORIGIN_SYS_RELATED_REF VARCHAR (32),
ORIGIN_SYS_GROUP_REF VARCHAR (32),
ORIGIN_SYS_DATETIME TIMESTAMP NOT NULL,
TRADE_ID INTEGER ,
BASKET_ID INTEGER ,
ORDER_ID INTEGER ,
BASKET_NAME VARCHAR (32),
SQC_SQN VARCHAR (20),
EXECFAC_ID INTEGER ,
CUSTOMER_REF VARCHAR (255),
INSTRUMENT_ID INTEGER ,
SYMBOL VARCHAR (64),
…
)
DISTRIBUTE ON (ORIGIN_SYS_REF);
• Logical model only
• No indexes
• No physical tuning/admin
• Distribute data by column(s) or round robin
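The two distribution choices above can be sketched as follows. This is an illustration of the concept, not Netezza internals; the slice count and row values are made up.

```python
# Illustrative sketch: how a hash distribution key vs. round robin
# assign rows to data slices (S-Blades). Hypothetical slice count.
from itertools import count

NUM_SLICES = 4  # made-up number of data slices

def hash_slice(key):
    """DISTRIBUTE ON (column): the same key value always lands on the same slice."""
    return hash(key) % NUM_SLICES

rr = count()
def round_robin_slice():
    """DISTRIBUTE ON RANDOM: rows are spread evenly regardless of content."""
    return next(rr) % NUM_SLICES

rows = [("SYS_A", 1), ("SYS_B", 2), ("SYS_A", 3)]  # (ORIGIN_SYS_REF, ...)
by_hash = [hash_slice(origin) for origin, _ in rows]
assert by_hash[0] == by_hash[2]  # same ORIGIN_SYS_REF -> same slice
```

Hash distribution enables co-located joins on the key; round robin guarantees even spread when no good key exists.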
IBM PureData System for Analytics: The Simple Data Warehouse Appliance for Serious Analytics
What makes it different?
Speed - 10-100x faster than traditional custom systems
Simplicity - minimal administration and tuning
Scalability - petabyte+ scale user data capacity
Smart - high performance, advanced analytics
Purpose-built analytics appliance
Integrated database, server and storage
Standard interfaces
Low total cost of ownership
Massively Parallel Processing Architecture: "Divide and conquer"
MPP, the "shared nothing" concept, divides the work into smaller tasks:
• A big task is sliced vertically into a series of smaller tasks
• The smaller tasks run independently
• The work is automatically balanced among the tasks to minimize the time to complete
• Each task is assigned the same amount of physical resources
• Communication between tasks occurs only at the beginning and at the end of the task
Benefits:
A large task completes in a short elapsed time
Maximizes use of resources
Points of attention:
Complexity of administration and management
Communication bottlenecks
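The divide-and-conquer flow above can be sketched in a few lines of Python. This is an illustration of the pattern, not appliance code; the worker count and data are made up.

```python
# Minimal "divide and conquer" sketch of the MPP idea: a big aggregation
# is sliced into independent partial tasks, each handled by its own worker,
# and the workers communicate only at the start (task handout) and at the
# end (merging the small partial results).
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    return sum(chunk)  # each "SPU" works only on its own data slice

data = list(range(1_000))
slices = [data[i::4] for i in range(4)]  # divide evenly across 4 workers

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(partial_sum, slices))

total = sum(partials)  # conquer: merge only the tiny partial results
assert total == sum(data)
```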
Data Warehouse Workload: Fewer requests, lots of data manipulation
[Figure: a transactional system used for BI; requests pass through a general-purpose CPU to general-purpose storage]
Data Warehouse Workload: Transaction systems are inefficient for data shuffling
[Figure: in a transactional system used for BI, the full result set must flow back from general-purpose storage through the CPU]
IBM PureData System: Data Warehouse Blades Designed for Tera-scale Business Intelligence
[Figure: Asymmetric Massively Parallel Processing; the request runs against intelligent storage + CPU, and only the results flow back]
IBM PureData System: Data Warehouse Blades for Highly Efficient Data Movement
[Figure: Asymmetric Massively Parallel Processing; intelligent storage + CPU leaves only ~1% of the network traffic and ~2% of the CPU requirements]
Asymmetric Massively Parallel Processing™
[Figure: the Netezza appliance architecture. Clients (source systems, a high-performance loader, 3rd-party apps, a DBA CLI, and ETL servers on Solaris, Linux, HP-UX, AIX, Windows, and Tru64) connect over ODBC 3.x, JDBC Type 4, OLE-DB, and SQL/92 to an SMP host running the DBOS front end (SQL compiler, query plan, optimizer, admin, high-speed loader/unloader). A network fabric links the host to the S-Blades (processor plus streaming DB logic, numbered 1 to 920), which together form the massively parallel intelligent storage layer and a high-performance database engine for streaming joins, aggregations, and sorts.]
Asymmetric Massively Parallel Processing™ (cont.)
[Figure: the same appliance architecture; here an incoming SQL query is compiled into snippets that are distributed across the S-Blades for parallel execution]
S-Blade Data Stream Processing
[Figure: a compressed slice of table MTHLY_RX_TERR_DATA streams from disk through the FPGA core, which uncompresses the data, applies the restrictions and visibility checks (WHERE MONTH = '20091201' AND MARKET = 509123 AND SPECIALTY = 'GASTRO'), and projects only the needed columns; the CPU core then handles the complex processing: joins, aggregations (sum(NRX)), etc.]

select DISTRICT,
       PRODUCTGRP,
       sum(NRX)
from MTHLY_RX_TERR_DATA
where MONTH = '20091201'
  and MARKET = 509123
  and SPECIALTY = 'GASTRO'
group by DISTRICT, PRODUCTGRP;
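The FPGA-then-CPU flow can be mimicked with a generator pipeline: each stage streams rows to the next, and only rows surviving the restriction and projection reach the aggregation stage. This is an analogy in Python, not S-Blade firmware; the two sample rows are invented.

```python
# Illustrative pipeline mirroring the S-Blade flow:
# uncompress -> restrict (WHERE) -> project -> aggregate (GROUP BY + sum).
import zlib, json
from collections import defaultdict

raw = [{"DISTRICT": "N", "PRODUCTGRP": "A", "NRX": 10,
        "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "GASTRO"},
       {"DISTRICT": "N", "PRODUCTGRP": "A", "NRX": 5,
        "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "CARDIO"}]
disk_block = zlib.compress(json.dumps(raw).encode())   # "compressed slice"

def uncompress(block):                                 # FPGA: decompress stream
    yield from json.loads(zlib.decompress(block))

def restrict(rows):                                    # FPGA: WHERE clause
    for r in rows:
        if (r["MONTH"] == "20091201" and r["MARKET"] == 509123
                and r["SPECIALTY"] == "GASTRO"):
            yield r

def project(rows):                                     # FPGA: keep needed columns
    for r in rows:
        yield (r["DISTRICT"], r["PRODUCTGRP"], r["NRX"])

def aggregate(rows):                                   # CPU: GROUP BY + sum(NRX)
    sums = defaultdict(int)
    for district, grp, nrx in rows:
        sums[(district, grp)] += nrx
    return dict(sums)

result = aggregate(project(restrict(uncompress(disk_block))))
assert result == {("N", "A"): 10}   # only the GASTRO row survives
```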
Asymmetric Massively Parallel Processing™ (cont.)
[Figure: the same appliance architecture; partial results from the S-Blades' execution engines are consolidated on the SMP host and returned to the client over ODBC 3.x, JDBC Type 4, OLE-DB, or SQL/92]
Inside the IBM PureData System for Analytics N3001
Optimized hardware + software: hardware-accelerated AMPP, purpose-built for high-performance analytics; requires no tuning
Snippet Blades™: hardware-based query acceleration with FPGAs; blistering-fast results; complex analytics executed as the data streams from disk
Disk enclosures: user data, mirror, and swap partitions; high-speed data streaming
SMP hosts: SQL compiler, query plan, optimizer, admin
Disk Mirroring and Failover
All user data and temp space mirrored
Disk failures transparent to queries and transactions
Failed drives automatically regenerated
Bad sectors automatically rewritten or relocated
[Figure: each disk carries primary, mirror, and temp partitions]
S-Blade™ Failover and Query Continuity
• Drives automatically reassigned to remaining S-Blades within a chassis
• Read-only queries (that have not returned data yet) automatically restarted
• Transactions and loads interrupted
• Loads automatically restarted from last successful checkpoint
[Figure: the drives of a failed S-Blade are reassigned across the remaining S-Blades]
ZoneMap™ – PureData's Anti-Index: Automatic Query Acceleration
[Figure: zone maps over the base table's data blocks (Col 1: Date, Col 2: Zip); only 18 out of 48 extents are read]
• Indexes are additional structures on disk, derived from the base table, used to accelerate locating information
• ZoneMaps are a method within the storage system that, without additional structures on disk, indicates where data does NOT reside
• The NPS system:
> Automatically stores the minimum and maximum values of all integer columns in each file extent
> Uses the ZoneMap information to determine whether a given extent should be read
Both indices and ZoneMaps are techniques to avoid full table scans, but Netezza's ZoneMap approach is:
• Automatic; and
• Does not require a separate structure to create, tune, and maintain
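The pruning logic can be sketched with a toy zone map. This is an assumption-laden illustration (four 100-value extents, one integer column), not NPS internals: per-extent min/max values let the scanner skip extents that cannot possibly contain matching rows.

```python
# Toy zone-map sketch: keep (min, max) per extent; skip extents whose
# range cannot overlap the query predicate. No separate index structure.
extents = [list(range(lo, lo + 100)) for lo in (0, 100, 200, 300)]
zone_map = [(min(e), max(e)) for e in extents]   # maintained automatically

def scan(predicate_lo, predicate_hi):
    read, hits = 0, []
    for (lo, hi), extent in zip(zone_map, extents):
        if hi < predicate_lo or lo > predicate_hi:
            continue                 # zone map proves no match: no I/O
        read += 1
        hits += [v for v in extent if predicate_lo <= v <= predicate_hi]
    return read, hits

read, hits = scan(150, 180)
assert read == 1                     # only 1 of 4 extents touched
assert hits == list(range(150, 181))
```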
Distributions and Performance
[Figure: response time (CPU, disk I/O, network) for each of 7 SPU nodes in the AMPP array]
Response time is affected by the completion time of all of the SPUs in the AMPP array.
A distribution method that distributes data evenly across all SPUs is the single most important factor influencing overall performance!
Hash Distributions and Data Skew
[Figure: response time per SPU node; distributing on Gender = M or F places all table records on only 2 of the 7 SPUs]
Select a distribution key with unique values and high cardinality.
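Data skew is easy to demonstrate. The sketch below (invented row counts and SPU count, plain Python `hash` standing in for the appliance's hash function) shows a low-cardinality gender key piling everything onto at most 2 SPUs, while a high-cardinality customer id spreads rows across all of them.

```python
# Illustrative data skew: low- vs. high-cardinality distribution keys.
from collections import Counter

NUM_SPUS = 7
rows = [(cust, "M" if cust % 2 else "F") for cust in range(10_000)]

by_gender = Counter(hash(g) % NUM_SPUS for _, g in rows)   # 2 busy, 5 idle
by_cust = Counter(hash(c) % NUM_SPUS for c, _ in rows)     # all SPUs busy

assert len(by_gender) <= 2
assert len(by_cust) == NUM_SPUS
```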
Hash Distributions and Processing Skew
Using a DATE column as the distribution key may distribute rows evenly across all S-Blades. However, most analysis (queries) is performed on a date range, so massively parallel processing won't be achieved when all of the records for a given date range are located on a single S-Blade or a few S-Blades.
[Figure: response time per SPU node when each node holds one month (Jan through Jul); a one-month query hits a single node]
CREATE TABLE customer
( c_custkey integer,
c_name character varying(25),
c_address character varying(40),
c_nationkey integer,
c_phone character(15),
c_acctbal numeric(15,2),
c_mktsegment character(10),
c_comment character varying(117)
) DISTRIBUTE ON ( c_custkey );
CREATE TABLE orders
( o_orderkey integer,
o_custkey integer,
o_orderstatus character(1),
o_totalprice numeric(15,2),
o_orderdate date,
o_orderpriority character(15),
o_clerk character(15),
o_shippriority integer,
o_comment character varying(79)
) DISTRIBUTE ON ( o_custkey );
Commonly JOINed Tables:
Use the Same Distribution Key
For tables that are commonly joined, distribute both tables on the column used in the JOIN condition!
Impact of Distribution Key on Table Join Performance
Identical distribution keys:

CREATE TABLE ORDERS (ORDER_NO, CUST_NO, …) DISTRIBUTE ON (CUST_NO)
CREATE TABLE CUSTOMERS (CUST_NO, …) DISTRIBUTE ON (CUST_NO)
SELECT … FROM ORDERS O, CUSTOMERS C WHERE O.CUST_NO = C.CUST_NO

[Figure: ORDERS and CUSTOMERS rows with the same CUST_NO hash to the same S-Blade, so each S-Blade joins its own slices locally]
No data movement is required.
Impact of Distribution Key on Table Join (cont.)
Different distribution keys:

CREATE TABLE ORDERS (ORDER_NO, CUST_NO, …) DISTRIBUTE ON (ORDER_NO)
CREATE TABLE CUSTOMERS (CUST_NO, …) DISTRIBUTE ON (CUST_NO)
SELECT … FROM ORDERS O, CUSTOMERS C WHERE O.CUST_NO = C.CUST_NO

[Figure: matching ORDERS and CUSTOMERS rows land on different S-Blades, so rows must be shipped across the network before join processing]
Data movement is required.
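Why identical keys co-locate the join can be verified in a few lines. This sketch uses invented table sizes and plain Python `hash` as a stand-in for the appliance's hash function: when both tables hash the same CUST_NO with the same function, matching rows are guaranteed to sit on the same S-Blade.

```python
# Co-located vs. redistributed join, illustrated.
NUM_BLADES = 3
blade = lambda key: hash(key) % NUM_BLADES   # same hash for both tables

customers = [(cust,) for cust in range(100)]              # (CUST_NO,)
orders = [(order, order % 100) for order in range(500)]   # (ORDER_NO, CUST_NO)

cust_blade = {c: blade(c) for (c,) in customers}

# Both tables distributed on CUST_NO: matching rows share a blade.
shipped_colocated = sum(1 for _, cust_no in orders
                        if blade(cust_no) != cust_blade[cust_no])
assert shipped_colocated == 0    # zero data movement

# ORDERS distributed on ORDER_NO instead: rows must be shipped to join.
shipped_redistributed = sum(1 for order_no, cust_no in orders
                            if blade(order_no) != cust_blade[cust_no])
assert shipped_redistributed > 0
```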
Workload Management
Workload Management (WLM) provides optional functionality to manage resources and prioritize usage across a diverse multi-user environment, to meet the needs of mixed user workloads.
Guaranteed Resource Allocation (GRA)
Mechanism to allocate NPS resources among groups of users in a multi-user environment
Prioritized Query Execution (PQE)
Finer control over resource allocation by extending the notion of query priorities from scheduling to execution
Short Query Bias (SQB)
Ensures users with short queries receive faster, biased query response times under heavy system workloads
Workload Limits
You can use the JOB MAXIMUM attribute of the group definition to control the number of actively running jobs submitted by that group
[Figure: user requests (departmental users, admin tasks, power users) flow through request queues governed by minimum resource guarantees]
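The GRA idea can be sketched as proportional sharing. The group names and percentages below are made up, not an NPS configuration; the point is that each group has a guaranteed minimum, and shares left idle are redistributed in proportion to the guarantees of the active groups.

```python
# Illustrative Guaranteed Resource Allocation: proportional redistribution
# of unused shares among the currently active groups.
def effective_shares(guarantees, active):
    """guarantees: {group: fraction summing to 1}; active: set of busy groups."""
    busy_total = sum(guarantees[g] for g in active)
    return {g: guarantees[g] / busy_total for g in active}

guarantees = {"admin": 0.10, "departmental": 0.40, "power_users": 0.50}

# "admin" is idle, so its 10% is split pro rata between the other two.
shares = effective_shares(guarantees, {"departmental", "power_users"})
assert abs(sum(shares.values()) - 1.0) < 1e-9
assert shares["power_users"] > guarantees["power_users"]  # gets the slack
```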
Appliances are easy to monitor
Traditional storage is not ready for the digital transformation; object storage solves the problems of scale, management, and cost.

Traditional storage (block & file):
• Block storage: fixed-size blocks in a rigid arrangement, ideal for enterprise databases.
• File storage: sharing files in hierarchically nested folders, ideal for active documents.

Object storage:
• Storage for unstructured data (photos, videos, audio, …) and big data.
• An object is data with metadata.
• Basis for cloud storage; spans geographies.
• High scalability (seamless, multi-dimensional scaling).
• Ease of use.
• Lower cost of operations.
Writing Data to IBM Cloud Object Storage
Let's store a video!
1. Objects are sent to the Accesser via the S3-compatible API or the OpenStack Swift-compatible API.
2. Each object is segmented into 4 MB segments; e.g. a 1 GB object will be segmented into 250 segments.
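The segment count on the slide is straightforward arithmetic (using decimal units, as the slide does: 1 GB = 1000 MB). A quick check:

```python
# Segmentation arithmetic behind the slide: 1 GB object / 4 MB segments.
import math

SEGMENT_MB = 4
object_mb = 1000                 # 1 GB, decimal units as on the slide
segments = math.ceil(object_mb / SEGMENT_MB)
assert segments == 250
```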
Writing Data to IBM Cloud Object Storage (cont.)
3. Each segment is encrypted and then sliced (in this example, into 7 slices).
4. Erasure coding is used to transform the data into a customizable number of slices (here, expanding the 7 slices to 12).
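Erasure coding can be illustrated with its simplest member, a single XOR parity slice, which lets any one lost slice be rebuilt. The real dispersal here is a 12/7 scheme (a Reed-Solomon-style code tolerating any 5 lost slices); the toy below only shows the principle, with made-up equal-length slices.

```python
# Toy erasure code: k data slices + 1 XOR parity slice can survive the
# loss of any single slice. The production 12/7 IDA generalizes this.
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(slices):
    """n = k + 1 expansion: append one XOR parity slice."""
    return slices + [reduce(xor_bytes, slices)]

def recover(coded, lost_index):
    """Rebuild any single lost slice by XOR-ing all the others."""
    others = [s for i, s in enumerate(coded) if i != lost_index]
    return reduce(xor_bytes, others)

data = [b"4MB-slice-1", b"4MB-slice-2", b"4MB-slice-3"]  # equal-length toys
coded = encode(data)
assert recover(coded, 1) == data[1]

# Properties of the real 12/7 Information Dispersal Algorithm:
n, k = 12, 7
assert n - k == 5               # any 5 slices may be lost
assert round(n / k, 1) == 1.7   # ~1.7x raw-to-usable expansion
```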
Writing Data to IBM Cloud Object Storage (cont.)
5. Each slice is written to a separate storage node. In this example, the storage nodes are geographically dispersed across 3 sites.
Reading Data from IBM Cloud Object Storage
With this 12/7 Information Dispersal Algorithm, a read can still be executed with any five storage nodes unavailable.
Reading Data from IBM Cloud Object Storage (cont.)
Even an entire site outage (plus one additional storage node outage) can be tolerated.
IBM Cloud Object Storage: Efficiency
How to build a highly reliable storage system for 1 petabyte of usable data?

                    RAID 6 + Replication     IBM Cloud Object Storage
Usable Storage      1 PB                     1 PB
Raw Storage         3.6 PB                   1.7 PB
6 TB Disks          600                      288
Racks Required      3.6x                     1.7x
Floor Space         3.6x                     1.7x
Ops Staffing        3 FTE                    0.5 FTE
Extra Software      Replication/backup       None

(The RAID 6 + replication figure counts the original at 1.2 PB raw, an onsite mirror at 1.2 PB raw, and a remote copy at 1.2 PB raw.)
70%+ TCO savings.
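The raw-storage figures above follow directly from the two schemes' overheads. A quick check of the arithmetic (decimal units; the slide's 288-disk figure presumably includes spares or vendor rounding, since the bare calculation gives ~286):

```python
# Raw-storage arithmetic behind the comparison table.
usable_pb = 1.0
replicated_raw = 3 * 1.2 * usable_pb   # original + onsite mirror + remote copy
dispersed_raw = (12 / 7) * usable_pb   # 12/7 Information Dispersal Algorithm

assert round(replicated_raw, 1) == 3.6
assert round(dispersed_raw, 1) == 1.7

disks_6tb = lambda raw_pb: raw_pb * 1000 / 6   # decimal TB per PB
assert round(disks_6tb(replicated_raw)) == 600
assert round(disks_6tb(dispersed_raw)) == 286  # slide rounds up to 288
```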