zhiding.cnftps.zdnet.com.cn/files/3/23991.pdf · 2013-09-13 ·...
TRANSCRIPT
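The Big Data Era: Real-World Data Volumes Are Growing Explosively
4.6 billion mobile phone users
1.3 billion RFID tags in 2005; 30 billion RFID tags by 2010
2 billion Internet users by 2011
Twitter processes 7 terabytes of data every day
Facebook processes 10 terabytes every day
World Data Center for Climate: 220 terabytes of web data, 9 petabytes of other data
Capital market data volumes grew 1,750%, 2003-06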
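Next-Generation Data Warehouse (diagram)
Three usage patterns, numbered 1-3 on the slide: (1) pre-processing hub, (2) queryable archive, (3) exploratory analysis.
Components shown: information integration; the data warehouse; Streams for real-time processing; BigInsights as the landing zone for all data, combined with unstructured information; Data Explorer for finding and viewing data; Streams for offline analysis with microsecond latency; Cognos BI; SPSS Modeler.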
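IBM PureData System
Tackling big data challenges: fast and simple!
• System for Transactions: a database cluster service optimized for transaction throughput and scalability, for applications such as e-commerce…
• System for Analytics (powered by Netezza technology): a data warehouse service optimized for high-speed analytics at petabyte scale and for simplicity, for applications such as customer analytics…
• System for Operational Analytics: an operational data warehouse service optimized to balance high-performance analytics with real-time operational throughput, for applications such as real-time fraud detection…
• System for Hadoop: a Hadoop data service optimized for big data analytics and online archiving, for applications such as exploratory analysis and archive queries…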
* Capacity = user data space × compression = effective user data space
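The capacity footnote reads as a product: the effective user data space is the physical user data space multiplied by the compression achieved. A minimal sketch of that arithmetic follows; the 4x ratio in the usage lines is a hypothetical assumption chosen for illustration, not a figure stated on these slides, and real ratios depend on the data being loaded.

```python
def effective_capacity_tb(user_space_tb: float, compression_ratio: float) -> float:
    """Effective user data space = user data space x compression ratio."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return user_space_tb * compression_ratio


def user_space_needed_tb(effective_tb: float, compression_ratio: float) -> float:
    """Inverse: physical user data space behind a quoted effective capacity."""
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return effective_tb / compression_ratio


# Hypothetical 4x ratio, for illustration only; not a figure from the slides.
ASSUMED_RATIO = 4.0
print(user_space_needed_tb(192, ASSUMED_RATIO))
print(effective_capacity_tb(48, ASSUMED_RATIO))
```

Under that assumed ratio, a quoted 192 TB would correspond to 48 TB of physical user data space; a different ratio changes the answer proportionally.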
Reliable scalability, broad applicability

PDA N1001 configuration (scales from 1/4 rack to 10 racks)
• 8 disk enclosures
• 96 × 1 TB SAS drives (4 hot spares)
• RAID 1 mirroring
• 14 Netezza S-Blades™:
  • 2 Intel quad-core 2+ GHz CPUs
  • 4 dual-engine 125 MHz FPGAs
  • 24 GB DDR2 RAM
  • Linux 64-bit kernel
• 2 hosts (1 active, 1 standby):
  • 2 quad-core Intel 2.4 GHz CPUs
  • 7 × 146 GB SAS drives
  • Red Hat Linux 5 64-bit
• User data capacity: 128 TB
• Data scan rate: 145 TB/hr
• Load rate (per system): 2+ TB/hr
• Power requirement: 7.6 kW
• Cooling requirement: 7.8 kW
PDA N2001 configuration (scales from ½ rack to 4 racks)
• 2 hosts (active-passive):
  • 2 six-core Intel 3.46 GHz CPUs
  • 7 × 300 GB SAS drives
  • Red Hat Linux 6 64-bit
• 7 PureData for Analytics S-Blades™:
  • 2 eight-core Intel 2+ GHz CPUs
  • 2 eight-engine Xilinx Virtex-6 FPGAs
  • 128 GB RAM + 8 GB slice buffer
  • Linux 64-bit kernel
• 12 disk enclosures: 288 × 600 GB SAS2 drives (240 for user data, 14 for S-Blades, 34 spare)
• RAID 1 mirroring
• User data capacity: 192 TB*
• Data scan rate: 450 TB/hr*
• Load rate (per system): 5+ TB/hr
• Power requirement: 7.5 kW
• Cooling requirement: 7.9 kW
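Dividing each configuration's user data capacity by its data scan rate gives a rough full-scan time. The sketch below copies both figures from the N1001 and N2001 specifications; it assumes the quoted rate is sustained for the whole scan and that capacity and scan rate are measured in the same units (both N2001 figures carry the same footnote mark), which the slides do not spell out.

```python
# Capacity and scan-rate figures copied from the N1001 and N2001 specifications.
SPECS = {
    "N1001": {"capacity_tb": 128, "scan_tb_per_hr": 145},
    "N2001": {"capacity_tb": 192, "scan_tb_per_hr": 450},
}


def full_scan_minutes(capacity_tb: float, scan_tb_per_hr: float) -> float:
    """Minutes to scan all user data, assuming the quoted rate is sustained throughout."""
    if scan_tb_per_hr <= 0:
        raise ValueError("scan rate must be positive")
    return capacity_tb / scan_tb_per_hr * 60


for model, spec in SPECS.items():
    minutes = full_scan_minutes(spec["capacity_tb"], spec["scan_tb_per_hr"])
    print(f"{model}: full scan of {spec['capacity_tb']} TB in about {minutes:.1f} minutes")
```

This prints about 53.0 minutes for N1001 and 25.6 minutes for N2001: a larger capacity scanned in roughly half the time.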
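PDA AMPP™ architecture: Netezza Performance Server® TwinFin appliance (diagram)
• Front end: a Linux SMP host on an Ethernet front end, hosting the SQL compiler, query planning, optimization, the execution engine, administration, and high-speed load/unload.
• Back end: an S-Blade MPP array (S-Blades numbered 1, 2, 3 … 1000+). Each snippet processing core pairs a processor with streaming DB logic, forming a high-performance database engine for streaming joins, aggregations, sorts, etc.
• Connectivity: source systems, ETL servers, the high-speed loader, clients, third-party applications, and the DBA CLI connect via SQL, ODBC, and JDBC Type 4, from Solaris, Linux, HP-UX, AIX, Windows, and Tru64.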
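Netezza data warehouse appliance: PDA data stream processing
Example query, run against a compressed slice of table MTHLY_RX_TERR_DATA:
SELECT DISTRICT, PRODUCTGRP, SUM(NRX)
FROM MTHLY_RX_TERR_DATA
WHERE MONTH = '20091201'
AND MARKET = 509123
AND SPECIALTY = 'GASTRO'
GROUP BY DISTRICT, PRODUCTGRP;
• FPGA core: decompress; project (columns: SELECT DISTRICT, PRODUCTGRP, SUM(NRX)); restrict (rows: WHERE MONTH = '20091201' AND MARKET = 509123 AND SPECIALTY = 'GASTRO')
• CPU core: aggregate, SUM(NRX); ∑ joins, summaries, etc.
• Query results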
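The per-slice pipeline for the example query can be sketched as chained Python generators. This is a minimal model under stated assumptions: zlib over JSON stands in for the appliance's own compressed format, rows are plain dicts, and the functions `compress_slice`, `decompress`, `restrict`, `project`, `aggregate`, and `run_snippet` are hypothetical names. The slide lists decompress, project, restrict; the sketch restricts before projecting only so that MONTH, MARKET, and SPECIALTY are still present when the WHERE clause is evaluated.

```python
import json
import zlib
from collections import defaultdict
from typing import Iterable, Iterator


def compress_slice(rows: list[dict]) -> bytes:
    # zlib over JSON is a stand-in for the appliance's compressed storage (assumption).
    return zlib.compress(json.dumps(rows).encode("utf-8"))


def decompress(blob: bytes) -> Iterator[dict]:
    # FPGA stage: turn the compressed slice back into full rows.
    yield from json.loads(zlib.decompress(blob).decode("utf-8"))


def restrict(rows: Iterable[dict]) -> Iterator[dict]:
    # FPGA stage: WHERE MONTH='20091201' AND MARKET=509123 AND SPECIALTY='GASTRO'
    for r in rows:
        if r["MONTH"] == "20091201" and r["MARKET"] == 509123 and r["SPECIALTY"] == "GASTRO":
            yield r


def project(rows: Iterable[dict]) -> Iterator[tuple]:
    # FPGA stage: keep only the columns the SELECT list needs.
    for r in rows:
        yield r["DISTRICT"], r["PRODUCTGRP"], r["NRX"]


def aggregate(tuples: Iterable[tuple]) -> dict:
    # CPU stage: SUM(NRX) ... GROUP BY DISTRICT, PRODUCTGRP
    totals: dict = defaultdict(int)
    for district, product_grp, nrx in tuples:
        totals[(district, product_grp)] += nrx
    return dict(totals)


def run_snippet(blob: bytes) -> dict:
    # Chained generators: rows filtered out early never reach the aggregation step.
    return aggregate(project(restrict(decompress(blob))))


sample = [
    {"DISTRICT": "D1", "PRODUCTGRP": "P1", "NRX": 5,
     "MONTH": "20091201", "MARKET": 509123, "SPECIALTY": "GASTRO"},
    {"DISTRICT": "D1", "PRODUCTGRP": "P1", "NRX": 9,
     "MONTH": "20091101", "MARKET": 509123, "SPECIALTY": "GASTRO"},
]
print(run_snippet(compress_slice(sample)))
```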
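PDA purpose-built data warehouse appliance: a revolutionary breakthrough in performance
MPP "intelligent storage": data processing units combined with storage
SMP host (2-4 CPUs)
Query requests
Network traffic: only 1% of that in existing systems
CPU: only 2% of that in existing systems
Data is processed as a stream before it enters memory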
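Why processing next to storage cuts traffic to the host can be shown with a toy scatter-gather model: each data slice filters and pre-aggregates its own rows and ships only partial sums, and the SMP host merges those small results. This is a sketch under assumptions: round-robin placement, a single SUM query, and the hypothetical helper names below. The 1% and 2% figures are the slide's claims and are not reproduced by this toy.

```python
from collections import defaultdict
from typing import Callable


def split_into_slices(rows: list[dict], n_slices: int) -> list[list[dict]]:
    # Round-robin placement across data slices (assumed, for illustration).
    slices: list[list[dict]] = [[] for _ in range(n_slices)]
    for i, row in enumerate(rows):
        slices[i % n_slices].append(row)
    return slices


def snippet(slice_rows: list[dict], keep: Callable[[dict], bool],
            key: str, measure: str) -> dict:
    # Runs next to the storage: filter and pre-aggregate, return only partial sums.
    partial: dict = defaultdict(int)
    for row in slice_rows:
        if keep(row):
            partial[row[key]] += row[measure]
    return dict(partial)


def host_merge(partials: list[dict]) -> dict:
    # SMP host: combine the small partial results into the final answer.
    total: dict = defaultdict(int)
    for partial in partials:
        for k, v in partial.items():
            total[k] += v
    return dict(total)


def run_query(rows, n_slices, keep, key, measure):
    partials = [snippet(s, keep, key, measure) for s in split_into_slices(rows, n_slices)]
    entries_sent_to_host = sum(len(p) for p in partials)
    return host_merge(partials), entries_sent_to_host


rows = [{"region": f"R{i % 3}", "amount": i, "year": 2013 if i % 2 else 2012}
        for i in range(1000)]
result, sent = run_query(rows, 8, lambda r: r["year"] == 2013, "region", "amount")
print(f"{sent} partial entries reached the host instead of {len(rows)} rows")
```

The host receives at most one entry per group per slice, so the volume it handles grows with the number of slices and groups rather than with the table size.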
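The compression engine at work
During data load:
• Burst rows into column streams
• Compile each stream independently
• Apply field instructions to the block headers
• Compressed storage retains all the structural properties of row-wise uncompressed storage
During data scan/query:
• The FPGA executes the field instructions at wire speed, decoding the data to recover full-sized values
• Values are reassembled into full-sized, uncompressed rows and passed on to the remaining FAST engines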
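The load-and-scan cycle of the compression engine can be sketched in a few functions: burst rows into column streams, encode each stream independently, record the chosen encoding per column in a block header, then decode every stream and zip the columns back into rows. The slides do not describe Netezza's actual encodings, so delta encoding (for integer columns) and dictionary encoding (for everything else) are stand-ins chosen for illustration; all function names are hypothetical. Each stream keeps one entry per row, which is how this sketch mirrors "retains row-wise structure": row i is rebuilt by position.

```python
from typing import Any


def burst(rows: list[tuple]) -> list[list[Any]]:
    # "Burst rows into column streams": one list per column.
    return [list(column) for column in zip(*rows)]


def compile_stream(values: list[Any]) -> dict:
    # Each stream is encoded independently; the chosen scheme is its "field instruction".
    if all(type(v) is int for v in values):
        deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
        return {"instr": "delta", "data": deltas}
    dictionary = list(dict.fromkeys(values))  # distinct values in first-seen order
    index = {v: i for i, v in enumerate(dictionary)}
    return {"instr": "dict", "dict": dictionary, "data": [index[v] for v in values]}


def execute_instruction(stream: dict) -> list[Any]:
    # Decode one stream back to its full-sized values.
    if stream["instr"] == "delta":
        values, running = [], 0
        for d in stream["data"]:
            running += d
            values.append(running)
        return values
    return [stream["dict"][i] for i in stream["data"]]


def compress_block(rows: list[tuple]) -> dict:
    if not rows:
        raise ValueError("this sketch expects a non-empty block")
    streams = [compile_stream(col) for col in burst(rows)]
    # The block header records the per-column instruction.
    return {"header": [s["instr"] for s in streams], "streams": streams}


def scan_block(block: dict) -> list[tuple]:
    # Decode every stream, then reassemble full-sized rows by position.
    columns = [execute_instruction(s) for s in block["streams"]]
    return list(zip(*columns))


block = compress_block([(200205, "USD", 10), (200206, "USD", 12), (200207, "CNY", 9)])
print(block["header"], scan_block(block))
```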
Outstanding data warehouse performance
Generated via drag-and-drop in BO …
• Fact table of 600 million rows
• 13 joins
• 75 GROUP BY columns
• Traditional data warehouse: run time measured in hours
• PDA: just 3 minutes
SELECT
sum(ADMIN.MBR_GIFT_HIST.GIFT_AMT),
count(ADMIN.MBR_GIFT_HIST.GIFT_DT),
count(distinct ADMIN.MBR_GIFT_HIST.MBR_ID),
ADMIN.MBR_NM_ADDR.ADDR_TYP,
ADMIN.MBR_NM_ADDR.ADDR_LINE_1,
ADMIN.MBR_NM_ADDR.ADDR_LINE_2,
ADMIN.MBR_NM_ADDR.CITY,
ADMIN.MBR_NM_ADDR.STATE_CD,
ADMIN.MBR_NM_ADDR.ZIP_CD_BASE,
ADMIN.MBR_NM_ADDR.ZIP_CD_SUFX,
ADMIN.MBR_PRFL.RCNCY_CD,
ADMIN.MBR_PRFL.FREQ_CD,
Member_Recency_CD.RCNCY_CD_DESC,
ADMIN.MBR_PRFL.AMT_CD,
ADMIN.MBR_PRFL.RCNCY_CD || ADMIN.MBR_PRFL.FREQ_CD,
ADMIN.MBR_PRFL.RCNCY_CD || ADMIN.MBR_PRFL.FREQ_CD ||
ADMIN.MBR_PRFL.AMT_CD,
Member_Frequecncy_CD.FREQ_CD_DESC,
Member_Amount_CD.AMT_CD_DESC,
Member_Recency_CD.RCNCY_CD_DESC || ' ' ||
Member_Frequecncy_CD.FREQ_CD_DESC,
Member_Recency_CD.RCNCY_CD_DESC || ' ' ||
Member_Frequecncy_CD.FREQ_CD_DESC || ' ' ||
Member_Amount_CD.AMT_CD_DESC,
ADMIN.MBR_BY_GIFT.FRST_GIFT_AMT,
ADMIN.MBR_BY_GIFT.FRST_GIFT_DT,
ADMIN.MBR_BY_GIFT.LAST_GIFT_AMT,
ADMIN.MBR_BY_GIFT.LAST_GIFT_DT,
ADMIN.MBR_BY_GIFT.HGST_GIFT_AMT,
ADMIN.MBR_BY_GIFT.HGST_GIFT_DT,
ADMIN.MBR_BY_GIFT.HGST_GIFT_LAST_24_MTH,
ADMIN.MBR_BY_GIFT.FSCL_YTD_AVG,
ADMIN.MBR_BY_GIFT.PREV_FSCL_YR_AVG,
ADMIN.MBR_BY_GIFT.LFTM_AVG,
ADMIN.MBR_GIFT_HIST.MAIL_KEY_CD,
ADMIN.MBR_GIFT_HIST.CPGN_TYP,
Gift_Campaign_Type.CPGN_TYP_DESC,
ADMIN.MBR_GIFT_HIST.DONOR_CLASS_CD,
Gift_Donor_Class.DONOR_CLASS_CD_DESC,
ADMIN.MBR_GIFT_HIST.CPGN_AUDNC_CD,
Gift_Cpgn_Audience.CPGN_AUDNC_CD_DESC,
ADMIN.MBR_GIFT_HIST.CPGN_YR,
ADMIN.MBR_GIFT_HIST.PRFL_CD,
ADMIN.MBR_GIFT_HIST.CPGN_NUM,
ADMIN.MBR_GIFT_HIST.PKG_CD,
Gift_Profile_CD.PRFL_CD_DESC,
ADMIN.MBR_GIFT_HIST.RCNCY_CD,
ADMIN.MBR_GIFT_HIST.FREQ_CD,
Gift_Recency_CD.RCNCY_CD_DESC,
ADMIN.MBR_GIFT_HIST.AMT_CD,
Gift_Frequency_CD.FREQ_CD_DESC,
ADMIN.MBR_GIFT_HIST.RCNCY_CD || ADMIN.MBR_GIFT_HIST.FREQ_CD,
Gift_Amount_CD.AMT_CD_DESC,
ADMIN.MBR_GIFT_HIST.RCNCY_CD || ADMIN.MBR_GIFT_HIST.FREQ_CD ||
ADMIN.MBR_GIFT_HIST.AMT_CD,
ADMIN.MBR_GIFT_HIST.LOT_CD,
ADMIN.MBR_GIFT_HIST.CARE_GVNG_CD,
ADMIN.MBR_GIFT_HIST.SRC_CD,
Gift_Caregiver.CARE_GVNG_CD_DESC,
ADMIN.MBR_GIFT_HIST.RSP_CD,
Gift_Source_CD.SRC_CD,
ADMIN.MBR_GIFT_HIST.PREM_TYP,
Gift_Response_Code.RSP_CD_DESC,
Gift_Premium_Type.PREM_TYP_DESC,
ADMIN.MBR_GIFT_HIST.MBR_ID,
ADMIN.MBR_GIFT_HIST.GIFT_DT,
ADMIN.MBR_GIFT_HIST.GIFT_AMT,
ADMIN.MBR_GIFT_HIST.AFFL_CD,
ADMIN.MBR_GIFT_HIST.UPDT_NUM,
ADMIN.MBR_GIFT_HIST.LAST_UPDT_DT,
ADMIN.MBR_NM_ADDR.SALU_LINE_1,
ADMIN.MBR_NM_ADDR.SALU_LINE_2,
ADMIN.MBR_NM_ADDR.SALU_LINE_3,
ADMIN.MBR_PRFL.UPDT_NUM,
ADMIN.MBR_PRFL.LAST_UPDT_DT,
ADMIN.MBR_NM_ADDR.INSIDE_SALU_NM,
ADMIN.MBR_PRFL.MBR_ID,
ADMIN.MBR_PRFL.ACCT_TYP,
ADMIN.MBR_PRFL.ACCT_CAT_CD,
ADMIN.MBR_PRFL.AFFL_CD,
ADMIN.MBR_PRFL.CHAP_IND,
ADMIN.MBR_PRFL.RSP_CD,
ADMIN.MBR_PRFL.NEW_MBR_FLG,
ADMIN.MBR_PRFL.SEED_FLG,
ADMIN.MBR_PRFL.SLCITN_CD
FROM
ADMIN.MBR_NM_ADDR,
ADMIN.MBR_PRFL LEFT OUTER JOIN ADMIN.REF_RCNCY_CD Member_Recency_CD ON
Member_Recency_CD.RCNCY_CD=ADMIN.MBR_PRFL.RCNCY_CD LEFT OUTER JOIN
ADMIN.REF_FREQ_CD Member_Frequecncy_CD ON
Member_Frequecncy_CD.FREQ_CD=ADMIN.MBR_PRFL.FREQ_CD LEFT OUTER JOIN
ADMIN.REF_AMT_CD Member_Amount_CD ON Member_Amount_CD.AMT_CD=ADMIN.MBR_PRFL.AMT_CD,
ADMIN.MBR_BY_GIFT,
ADMIN.MBR_GIFT_HIST LEFT OUTER JOIN ADMIN.REF_CPGN_TYP Gift_Campaign_Type ON
ADMIN.MBR_GIFT_HIST.CPGN_TYP=Gift_Campaign_Type.CPGN_TYP LEFT OUTER JOIN
ADMIN.REF_DONOR_CLASS_CD Gift_Donor_Class ON
ADMIN.MBR_GIFT_HIST.DONOR_CLASS_CD=Gift_Donor_Class.DONOR_CLASS_CD LEFT OUTER JOIN
ADMIN.REF_CPGN_AUDNC_CD Gift_Cpgn_Audience ON
ADMIN.MBR_GIFT_HIST.CPGN_AUDNC_CD=Gift_Cpgn_Audience.CPGN_AUDNC_CD LEFT OUTER JOIN
ADMIN.REF_PRFL_CD Gift_Profile_CD ON
Gift_Profile_CD.PRFL_CD=ADMIN.MBR_GIFT_HIST.PRFL_CD LEFT OUTER JOIN
ADMIN.REF_RCNCY_CD Gift_Recency_CD ON
Gift_Recency_CD.RCNCY_CD=ADMIN.MBR_GIFT_HIST.RCNCY_CD LEFT OUTER JOIN
ADMIN.REF_FREQ_CD Gift_Frequency_CD ON
ADMIN.MBR_GIFT_HIST.FREQ_CD=Gift_Frequency_CD.FREQ_CD LEFT OUTER JOIN
ADMIN.REF_AMT_CD Gift_Amount_CD ON
ADMIN.MBR_GIFT_HIST.AMT_CD=Gift_Amount_CD.AMT_CD LEFT OUTER JOIN
ADMIN.REF_RSP_CD Gift_Response_Code ON
Gift_Response_Code.RSP_CD=ADMIN.MBR_GIFT_HIST.RSP_CD LEFT OUTER JOIN
ADMIN.REF_SRC_CD Gift_Source_CD ON
Gift_Source_CD.SRC_CD=ADMIN.MBR_GIFT_HIST.SRC_CD LEFT OUTER JOIN
ADMIN.REF_PREM_TYP Gift_Premium_Type ON
Gift_Premium_Type.PREM_TYP=ADMIN.MBR_GIFT_HIST.PREM_TYP LEFT OUTER JOIN
ADMIN.REF_CARE_GVNG_CD Gift_Caregiver ON
ADMIN.MBR_GIFT_HIST.CARE_GVNG_CD=Gift_Caregiver.CARE_GVNG_CD
WHERE
( ADMIN.MBR_NM_ADDR.MBR_ID=ADMIN.MBR_PRFL.MBR_ID )
AND ( ADMIN.MBR_BY_GIFT.MBR_ID=ADMIN.MBR_PRFL.MBR_ID )
AND ( ADMIN.MBR_PRFL.MBR_ID=ADMIN.MBR_GIFT_HIST.MBR_ID )
AND (
ADMIN.MBR_PRFL.MBR_ID = '00331415'
)
GROUP BY
ADMIN.MBR_NM_ADDR.ADDR_TYP,
ADMIN.MBR_NM_ADDR.ADDR_LINE_1,
ADMIN.MBR_NM_ADDR.ADDR_LINE_2,
ADMIN.MBR_NM_ADDR.CITY,
ADMIN.MBR_NM_ADDR.STATE_CD,
ADMIN.MBR_NM_ADDR.ZIP_CD_BASE,
ADMIN.MBR_NM_ADDR.ZIP_CD_SUFX,
ADMIN.MBR_PRFL.RCNCY_CD,
ADMIN.MBR_PRFL.FREQ_CD,
Member_Recency_CD.RCNCY_CD_DESC,
ADMIN.MBR_PRFL.AMT_CD,
ADMIN.MBR_PRFL.RCNCY_CD || ADMIN.MBR_PRFL.FREQ_CD,
ADMIN.MBR_PRFL.RCNCY_CD || ADMIN.MBR_PRFL.FREQ_CD || ADMIN.MBR_PRFL.AMT_CD,
Member_Frequecncy_CD.FREQ_CD_DESC,
Member_Amount_CD.AMT_CD_DESC,
Member_Recency_CD.RCNCY_CD_DESC || ' ' || Member_Frequecncy_CD.FREQ_CD_DESC,
Member_Recency_CD.RCNCY_CD_DESC || ' ' || Member_Frequecncy_CD.FREQ_CD_DESC || ' ' || Member_Amount_CD.AMT_CD_DESC,
ADMIN.MBR_BY_GIFT.FRST_GIFT_AMT,
ADMIN.MBR_BY_GIFT.FRST_GIFT_DT,
ADMIN.MBR_BY_GIFT.LAST_GIFT_AMT,
ADMIN.MBR_BY_GIFT.LAST_GIFT_DT,
ADMIN.MBR_BY_GIFT.HGST_GIFT_AMT,
ADMIN.MBR_BY_GIFT.HGST_GIFT_DT,
ADMIN.MBR_BY_GIFT.HGST_GIFT_LAST_24_MTH,
ADMIN.MBR_BY_GIFT.FSCL_YTD_AVG,
ADMIN.MBR_BY_GIFT.PREV_FSCL_YR_AVG,
ADMIN.MBR_BY_GIFT.LFTM_AVG,
ADMIN.MBR_GIFT_HIST.MAIL_KEY_CD,
ADMIN.MBR_GIFT_HIST.CPGN_TYP,
Gift_Campaign_Type.CPGN_TYP_DESC,
ADMIN.MBR_GIFT_HIST.DONOR_CLASS_CD,
Gift_Donor_Class.DONOR_CLASS_CD_DESC,
ADMIN.MBR_GIFT_HIST.CPGN_AUDNC_CD,
Gift_Cpgn_Audience.CPGN_AUDNC_CD_DESC,
ADMIN.MBR_GIFT_HIST.CPGN_YR,
ADMIN.MBR_GIFT_HIST.PRFL_CD,
ADMIN.MBR_GIFT_HIST.CPGN_NUM,
ADMIN.MBR_GIFT_HIST.PKG_CD,
Gift_Profile_CD.PRFL_CD_DESC,
ADMIN.MBR_GIFT_HIST.RCNCY_CD,
ADMIN.MBR_GIFT_HIST.FREQ_CD,
Gift_Recency_CD.RCNCY_CD_DESC,
ADMIN.MBR_GIFT_HIST.AMT_CD,
Gift_Frequency_CD.FREQ_CD_DESC,
ADMIN.MBR_GIFT_HIST.RCNCY_CD ||
ADMIN.MBR_GIFT_HIST.FREQ_CD,
Gift_Amount_CD.AMT_CD_DESC,
ADMIN.MBR_GIFT_HIST.RCNCY_CD ||
ADMIN.MBR_GIFT_HIST.FREQ_CD || ADMIN.MBR_GIFT_HIST.AMT_CD,
ADMIN.MBR_GIFT_HIST.LOT_CD,
ADMIN.MBR_GIFT_HIST.CARE_GVNG_CD,
ADMIN.MBR_GIFT_HIST.SRC_CD,
Gift_Caregiver.CARE_GVNG_CD_DESC,
ADMIN.MBR_GIFT_HIST.RSP_CD,
Gift_Source_CD.SRC_CD,
ADMIN.MBR_GIFT_HIST.PREM_TYP,
Gift_Response_Code.RSP_CD_DESC,
Gift_Premium_Type.PREM_TYP_DESC,
ADMIN.MBR_GIFT_HIST.MBR_ID,
ADMIN.MBR_GIFT_HIST.GIFT_DT,
ADMIN.MBR_GIFT_HIST.GIFT_AMT,
ADMIN.MBR_GIFT_HIST.AFFL_CD,
ADMIN.MBR_GIFT_HIST.UPDT_NUM,
ADMIN.MBR_GIFT_HIST.LAST_UPDT_DT,
ADMIN.MBR_NM_ADDR.SALU_LINE_1,
ADMIN.MBR_NM_ADDR.SALU_LINE_2,
ADMIN.MBR_NM_ADDR.SALU_LINE_3,
ADMIN.MBR_PRFL.UPDT_NUM,
ADMIN.MBR_PRFL.LAST_UPDT_DT,
ADMIN.MBR_NM_ADDR.INSIDE_SALU_NM,
ADMIN.MBR_PRFL.MBR_ID,
ADMIN.MBR_PRFL.ACCT_TYP,
ADMIN.MBR_PRFL.ACCT_CAT_CD,
ADMIN.MBR_PRFL.AFFL_CD,
ADMIN.MBR_PRFL.CHAP_IND,
ADMIN.MBR_PRFL.RSP_CD,
ADMIN.MBR_PRFL.NEW_MBR_FLG,
ADMIN.MBR_PRFL.SEED_FLG,
ADMIN.MBR_PRFL.SLCITN_CD;
Traditional complexity ... PDA simplicity
ORACLE
CREATE TABLE "MRDWDDM"."RDWF_DDM_ROOMS_SOLD" ("ID_PROPERTY" NUMBER(5,
0) NOT NULL ENABLE, "ID_DATE_STAY" NUMBER(5, 0) NOT NULL ENABLE,
"CD_ROOM_POOL" CHAR(4) NOT NULL ENABLE, "CD_RATE_PGM" CHAR(4) NOT
NULL ENABLE, "CD_RATE_TYPE" CHAR(1) NOT NULL ENABLE,
"CD_MARKET_SEGMENT" CHAR(2) NOT NULL ENABLE, "ID_CONFO_NUM_ORIG"
NUMBER(9, 0) NOT NULL ENABLE, "ID_CONFO_NUM_CUR" NUMBER(9, 0) NOT
NULL ENABLE, "ID_DATE_CREATE" NUMBER(5, 0) NOT NULL ENABLE,
"ID_DATE_ARRIVAL" NUMBER(5, 0) NOT NULL ENABLE, "ID_DATE_DEPART"
NUMBER(5, 0) NOT NULL ENABLE, "QY_ROOMS" NUMBER(5, 0) NOT NULL
ENABLE, "CU_REV_PROJ_NET_LOCAL" NUMBER(21, 3) NOT NULL ENABLE,
"CU_REV_PROJ_NET_USD" NUMBER(21, 3) NOT NULL ENABLE,
"QY_DAYS_STAY_CUR" NUMBER(3, 0) NOT NULL ENABLE, "CD_BOOK_SOURCE"
CHAR(1) NOT NULL ENABLE) PCTFREE 5 PCTUSED 95 INITRANS 4 MAXTRANS 255
STORAGE( FREELISTS 6) TABLESPACE "DDM_ROOMS_SOLD_DATA" NOLOGGING
PARTITION BY RANGE ("ID_PROPERTY" ) (PARTITION "PART1" VALUES LESS
THAN (600) PCTFREE 5 PCTUSED 95 INITRANS 4 MAXTRANS 255
STORAGE(INITIAL 16777216 FREELISTS 6 FREELIST GROUPS 1) TABLESPACE
"DDM_ROOMS_SOLD_DATA" NOLOGGING NOCOMPRESS, PARTITION "PART2" VALUES
LESS THAN (1200) PCTFREE 5 PCTUSED 95 INITRANS 4 MAXTRANS 255
STORAGE(INITIAL 16777216 FREELISTS 6 FREELIST GROUPS 1) TABLESPACE
"DDM_ROOMS_SOLD_DATA" NOLOGGING NOCOMPRESS, PARTITION "PART3" VALUES
LESS THAN (1800) PCTFREE 5 PCTUSED 95 INITRANS 4 MAXTRANS 255
STORAGE(INITIAL 16777216 FREELISTS 6 FREELIST GROUPS 1) TABLESPACE
"DDM_ROOMS_SOLD_DATA" NOLOGGING NOCOMPRESS, PARTITION "PART4" VALUES
LESS THAN (2400) PCTFREE 5 PCTUSED 95 INITRANS 4 MAXTRANS 255
STORAGE(INITIAL 16777216 FREELISTS 6 FREELIST GROUPS 1) TABLESPACE
"DDM_ROOMS_SOLD_DATA" NOLOGGING NOCOMPRESS, PARTITION "PART5" VALUES
LESS THAN (3000) PCTFREE 5 PCTUSED 95 INITRANS 4 MAXTRANS 255
STORAGE(INITIAL 16777216 FREELISTS 6 FREELIST GROUPS 1) TABLESPACE
"DDM_ROOMS_SOLD_DATA" NOLOGGING NOCOMPRESS, PARTITION "PART6" VALUES
LESS THAN (MAXVALUE) PCTFREE 5 PCTUSED 95 INITRANS 4 MAXTRANS 255
STORAGE(INITIAL 16777216 FREELISTS 6 FREELIST GROUPS 1) TABLESPACE
"DDM_ROOMS_SOLD_DATA" NOLOGGING NOCOMPRESS ) ;
ORACLE bitmap index
CREATE BITMAP INDEX "CRDBO"."SNAPSHOT_MONTH_IDX13" ON
"SNAPSHOT_OPPTY_MONTH_HIST" ("SNAPSHOT_YEAR" ) PCTFREE 10 INITRANS 2
MAXTRANS 255 STORAGE(INITIAL 4194304 NEXT 4194304 MINEXTENTS 2 MAXEXTENTS
2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL
DEFAULT) TABLESPACE "SFA_DATAMART_INDEX" NOLOGGING ;
ORACLE table clusters
CREATE CLUSTER "MRDW"."CT_INTRMDRY_CAL" ("ID_YEAR_CAL" NUMBER(4, 0),
"ID_MONTH_CAL" NUMBER(2, 0), "ID_PROPERTY" NUMBER(5, 0)) SIZE 16384
PCTFREE 10 PCTUSED 90 INITRANS 3 MAXTRANS 255 STORAGE(INITIAL
83886080 NEXT 41943040 MINEXTENTS 1 MAXEXTENTS 1017 PCTINCREASE 0
FREELISTS 4 FREELIST GROUPS 1 BUFFER_POOL RECYCLE) TABLESPACE
"TSS_FACT" ;
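• No indexes
• No physical tuning/admin
• Random striped data distribution, or hash distribution on column values
Advantages: less time and effort spent on routine DBA work, and more resources devoted to work of greater business value
• Flexible application development
• "On-demand" data marts
• More BI applications brought online
• In-depth data mining/analysis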
Netezza
CREATE TABLE MRDWDDM.RDWF_DDM_ROOMS_SOLD (
ID_PROPERTY numeric(5, 0) NOT NULL ,
ID_DATE_STAY integer NOT NULL ,
CD_ROOM_POOL CHAR(4) NOT NULL ,
CD_RATE_PGM CHAR(4) NOT NULL ,
CD_RATE_TYPE CHAR(1) NOT NULL ,
CD_MARKET_SEGMENT CHAR(2) NOT NULL ,
ID_CONFO_NUM_ORIG integer NOT NULL ,
ID_CONFO_NUM_CUR integer NOT NULL ,
ID_DATE_CREATE integer NOT NULL ,
ID_DATE_ARRIVAL integer NOT NULL ,
ID_DATE_DEPART integer NOT NULL ,
QY_ROOMS integer NOT NULL ,
CU_REV_PROJ_NET_LOCAL numeric(21, 3) NOT NULL ,
CU_REV_PROJ_NET_USD numeric(21, 3) NOT NULL ,
QY_DAYS_STAY_CUR smallint NOT NULL ,
CD_BOOK_SOURCE CHAR(1) NOT NULL)
distribute on random;
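From complex to concise
• Oracle DDL was migrated to Netezza within 1 day.
• The process removed more than 95% of the content of the existing DDL files, including tablespace, partition, and index declarations.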
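Most of the migration work in the paired Oracle/Netezza examples is dropping physical clauses (tablespaces, partitions, storage parameters) and mapping Oracle NUMBER types to integer or numeric types. The sketch below shows one possible mapping rule and a DDL builder; it is a hypothetical helper, not the tool used in the migration described above. It assumes integer NUMBER(p) columns map by precision (up to 4 digits to smallint, up to 9 to integer, up to 18 to bigint), which matches most but not all choices in the slides' examples (for instance, the slide maps NUMBER(2) to integer where this rule gives smallint).

```python
import re
from typing import Optional

# Matches NUMBER(p) and NUMBER(p, s); other types pass through unchanged.
_NUMBER = re.compile(r"^NUMBER\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\)$", re.IGNORECASE)


def map_oracle_type(oracle_type: str) -> str:
    m = _NUMBER.match(oracle_type.strip())
    if not m:
        return oracle_type.strip()  # e.g. CHAR(3) is kept as-is
    precision = int(m.group(1))
    scale = int(m.group(2) or 0)
    if scale > 0:
        return f"numeric({precision},{scale})"
    if precision <= 4:
        return "smallint"   # 9999 fits in a 16-bit integer
    if precision <= 9:
        return "integer"    # 999,999,999 fits in a 32-bit integer
    if precision <= 18:
        return "bigint"     # 18 digits fit in a 64-bit integer
    return f"numeric({precision},0)"


def to_netezza_ddl(table: str, columns: list[tuple[str, str, bool]],
                   distribute_on: Optional[list[str]] = None) -> str:
    # columns: (name, oracle_type, nullable); no tablespace/partition/storage clauses.
    lines = [f"  {name} {map_oracle_type(t)} {'NULL' if nullable else 'NOT NULL'}"
             for name, t, nullable in columns]
    dist = f"({', '.join(distribute_on)})" if distribute_on else "random"
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + f"\n)\ndistribute on {dist};"


print(to_netezza_ddl("FCT_MKV_CUST_BAL",
                     [("PER_CCYYMM", "NUMBER(6)", False),
                      ("SK_CIF", "NUMBER(15)", False),
                      ("BAL_AM", "NUMBER(15,2)", False),
                      ("SRCE_SYS_CD", "NUMBER(4)", True)],
                     ["sk_cif"]))
```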
ORACLE: 1821 lines of SQL
CREATE TABLE CMVFACT.FCT_MKV_CUST_BAL
(
PER_CCYYMM NUMBER(6) NOT NULL,
SK_CIF NUMBER(15) NOT NULL,
SK_PR_L5 NUMBER(8) NOT NULL,
BAL_CD NUMBER(5) NOT NULL,
BUS_SEG_L3_ID NUMBER(4) NOT NULL,
SEG2_CD NUMBER(2) NOT NULL,
ISO_CRNCY_CD CHAR(3) NOT NULL,
CO_CST_CTR_ID CHAR(10) NOT NULL,
CAL_MON_DAY_NR NUMBER(2) NOT NULL,
CAL_MON_BUS_DAY_NR NUMBER(2) NOT NULL,
BAL_AM NUMBER(15,2) NOT NULL,
BAL_ORIG_CRNCY_AM NUMBER(18,2) NOT NULL,
SRCE_SYS_CD NUMBER(4) NULL
)
TABLESPACE CMVFACT
NOLOGGING
PCTFREE 0
PCTUSED 90
INITRANS 1
MAXTRANS 255
STORAGE(BUFFER_POOL DEFAULT)
NOPARALLEL
NOCACHE
PARTITION BY RANGE(PER_CCYYMM)
(PARTITION P200205 VALUES LESS THAN (200206)
STORAGE(FREELISTS 1
FREELIST GROUPS 1),
PARTITION P200206 VALUES LESS THAN (200207)
STORAGE(FREELISTS 1
FREELIST GROUPS 1),
PARTITION P200207 VALUES LESS THAN (200208)
STORAGE(FREELISTS 1
FREELIST GROUPS 1),
PARTITION P200208 VALUES LESS THAN (200209)
STORAGE(FREELISTS 1
FREELIST GROUPS 1),
Netezza: 17 lines of SQL
CREATE TABLE FCT_MKV_CUST_BAL
(
PER_CCYYMM integer NOT NULL,
SK_CIF bigint NOT NULL,
SK_PR_L5 integer NOT NULL,
BAL_CD integer NOT NULL,
BUS_SEG_L3_ID integer NOT NULL,
SEG2_CD integer NOT NULL,
ISO_CRNCY_CD CHAR(3) NOT NULL,
CO_CST_CTR_ID CHAR(10) NOT NULL,
CAL_MON_DAY_NR integer NOT NULL,
CAL_MON_BUS_DAY_NR integer NOT NULL,
BAL_AM numeric(15,2) NOT NULL,
BAL_ORIG_CRNCY_AM numeric(18,2) NOT NULL,
SRCE_SYS_CD integer NULL
)
distribute on (sk_cif);
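The two distribution options in these DDL examples, `distribute on random` and `distribute on (sk_cif)`, can be modeled as assigning rows to data slices either at random or by a hash of a column value. The sketch below is illustrative only: crc32 stands in for the appliance's hash function, which the slides do not describe, and the helper names are hypothetical. The design point it shows is general MPP practice rather than a claim from the slides: hashing puts equal keys on the same slice, so per-key joins and aggregations can run locally, while random distribution spreads rows evenly when no good key exists.

```python
import random
import zlib


def hash_slice(key: object, n_slices: int) -> int:
    # crc32 is a deterministic stand-in; the appliance's real hash is not described here.
    return zlib.crc32(str(key).encode("utf-8")) % n_slices


def distribute_on_column(rows: list[dict], column: str, n_slices: int) -> list[list[dict]]:
    # Like "distribute on (sk_cif)": equal keys always land on the same slice.
    slices: list[list[dict]] = [[] for _ in range(n_slices)]
    for row in rows:
        slices[hash_slice(row[column], n_slices)].append(row)
    return slices


def distribute_on_random(rows: list[dict], n_slices: int, seed: int = 0) -> list[list[dict]]:
    # Like "distribute on random": placement ignores row content.
    rng = random.Random(seed)
    slices: list[list[dict]] = [[] for _ in range(n_slices)]
    for row in rows:
        slices[rng.randrange(n_slices)].append(row)
    return slices


rows = [{"sk_cif": i % 50, "bal_am": i} for i in range(1000)]
print([len(s) for s in distribute_on_column(rows, "sk_cif", 8)])
print([len(s) for s in distribute_on_random(rows, 8)])
```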
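Everything is that simple…
No dbspace/tablespace capacity planning or configuration
No redo/physical/logical log planning or configuration
No table page/block planning or configuration
No table extent planning or configuration
No temp space allocation or monitoring
No RAID level selection per dbspace
No logical volume creation for files
No integration of recommended OS kernel settings
No maintenance of recommended OS patch levels
No storage management
No indexes and no tuning
No software installation
Limited technical staff can become true data administrators rather than database administrators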
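Pre-optimized configuration, simple deployment
• Resource consolidation and on-demand services
• Pooled server, storage, and network resources
• Unified management and monitoring
• Scale on demand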
Easy backup: for easy data offload from the enterprise data warehouse (diagram)
The diagram shows the PureData System for Analytics (NPS host, NPS tables) and the PureData System for Hadoop (HDFS, Hive), linked by EasyArchive import/export ("Import from NPS", "Export to NPS") with Jaql parallel processing.
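Advantages of IBM PDA
IBM PDA customer value
Cost
– Low, transparent initial cost
– Simple installation, no additional professional services required
– Standard maintenance, including hardware/software support and software upgrades
– PDA Migrator for easy porting of Oracle applications
– Fast deployment, fast business results
– Easy-to-understand, predictable costs
– Minimal "extra" services, making it easier to budget on PDA
Intelligence
– A wide range of analytics pushed down to the compute nodes
– Fast insight from information
– Simple access to, and advanced analytics on, large data volumes
Simplicity
– Integrated hardware/software solution suited to high-performance data warehouse computing
– No tuning required
– More resources and time spent on delivering business value, rather than on tuning to reach acceptable performance
Scalability
– Proven petabyte-scale scalability
– No scalability bottlenecks
– No limits on business and data growth
Speed
– An appliance optimized specifically for data warehouse computing and advanced analytics
– The fastest data warehouse performance
– Simple to operate
Architecture
– A true AMPP architecture, with FPGA technology inside each MPP node for hardware-accelerated processing
– By minimizing resource contention and bottlenecks, the best-suited architecture for data warehousing and advanced analytics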
Thank you