Post on 20-Jan-2016

Building a Terabyte Data Warehouse, Using Linux and RAC

George Lumpkin

Director Product Management

Oracle Corporation

Session id: 40177

Do More with Less

• More performance
• More scalability
• More users
• Less capital cost
• Less administration cost

RAC for Scalability, Availability, and Flexibility

Linux and RAC for DW: Scalability

Linux ‘Starter’ Cluster:
– Two nodes
– One shared database

As the business grows, so does your environment:
– Three nodes
– One database

… and again:
– Four nodes
– One database

[Diagram: cluster nodes sharing one Data Warehouse DB]

Linux and RAC for DW: Availability

When one node fails, the load is rebalanced and 3/4 of the cluster continues the work.

[Diagram: cluster with one failed node, sharing one Data Warehouse DB]

Linux and RAC for DW: Flexibility

The cluster can share the entire workload across all nodes …

[Diagram: Query and ETL work running side by side on the nodes of one Data Warehouse DB]

… or do workload partitioning.

[Diagram: Query and ETL work assigned to separate nodes of one Data Warehouse DB]

Workload Management and Provisioning made easy

• Christmas: “Data Season” for Retail
• January: “Analysis Season”

[Diagram: Query and ETL work reassigned across the nodes of one Data Warehouse DB from season to season]

RAC and Parallel Execution

• Very large queries utilize all resources on the cluster

[Diagram: one large query spanning all nodes]

• Many large-scale DWs have many concurrent jobs
– Multiple “small-to-medium” size queries
– Degree of parallelism < CPUs-per-node

• With Oracle, such queries automatically run on a single node, eliminating their traffic over the interconnect

[Diagram: queries Q1 to Q12 spread across the nodes, each running on a single node]
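The placement rule on this slide can be illustrated with a toy model. This is a minimal sketch only: the function `place_queries` and its least-loaded node choice are hypothetical and do not describe Oracle's actual scheduling logic. It only shows the idea that a query whose degree of parallelism (DOP) is below the CPUs-per-node count stays on one node, while a larger query spans the cluster.

```python
def place_queries(dops, nodes, cpus_per_node):
    """Toy placement: keep small queries on one node, spread large ones.

    Hypothetical illustration, not Oracle's algorithm.
    dops          -- degree of parallelism requested by each query
    nodes         -- number of nodes in the cluster
    cpus_per_node -- CPUs available on each node
    """
    load = [0] * nodes   # parallel servers assigned to each node so far
    placement = []       # (dop, list of node indexes) per query

    for dop in dops:
        if dop < cpus_per_node:
            # Small query: run entirely on the least-loaded node,
            # so it generates no interconnect traffic.
            node = min(range(nodes), key=lambda n: load[n])
            load[node] += dop
            placement.append((dop, [node]))
        else:
            # Large query: spread its parallel servers over all nodes.
            share, extra = divmod(dop, nodes)
            for n in range(nodes):
                load[n] += share + (1 if n < extra else 0)
            placement.append((dop, list(range(nodes))))

    return placement, load


# Twelve small queries (DOP 2) on a 4-node cluster with 4 CPUs per node.
placement, load = place_queries([2] * 12, nodes=4, cpus_per_node=4)
print(load)  # [6, 6, 6, 6]
```

In this sketch every small query lands on exactly one node, which is the property the slide attributes to multi-user workloads.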

Recipe for a RAC Linux DW

• Processors
• I/O
• Interconnect

Recipe for a RAC Linux DW: Processors

• Data warehouse workload determines the total number of CPUs
– Same sizing considerations as a non-clustered DW

• How many processors per node?
– Enough CPUs so that a single node can handle most database operations
– Often, 4 CPUs is a good balance

Recipe for a RAC Linux DW: I/O

• I/O is typically the primary determinant of data warehouse performance
– Storage configurations for a data warehouse should always be chosen based on I/O bandwidth, not storage capacity

• Rule of thumb: at least 100 MB/sec of I/O bandwidth per gigahertz of processing power

• Every component of the I/O system should provide enough bandwidth: disks, I/O channels, I/O adapters

Recipe for a RAC Linux DW: I/O

• CPU power and I/O bandwidth should be balanced within a server
– Example: each node has 4 x 2 GHz processors, so by the 100 MB/sec-per-GHz rule of thumb each node can utilize at least 800 MB/sec
– Each node should have enough slots to accommodate the necessary I/O throughput
– If one host bus adapter drives 150 MB/sec, then 6 HBAs should accommodate the needed I/O bandwidth
– Note that at least one slot is required for the interconnect
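The sizing arithmetic in this example can be written out as a short calculation. This is a sketch under the slide's own numbers; the function names `required_io_bandwidth` and `hbas_needed` are made up for illustration.

```python
import math

# Rule of thumb from the slides: 100 MB/sec of I/O bandwidth per GHz.
MB_PER_SEC_PER_GHZ = 100


def required_io_bandwidth(cpus, ghz_per_cpu, mb_per_ghz=MB_PER_SEC_PER_GHZ):
    """I/O bandwidth (MB/sec) one node should be able to drive."""
    return cpus * ghz_per_cpu * mb_per_ghz


def hbas_needed(bandwidth_mb_s, hba_mb_s):
    """Number of host bus adapters needed, rounded up to whole adapters."""
    return math.ceil(bandwidth_mb_s / hba_mb_s)


node_bandwidth = required_io_bandwidth(cpus=4, ghz_per_cpu=2)  # 800 MB/sec
hbas = hbas_needed(node_bandwidth, hba_mb_s=150)               # 6 HBAs
slots = hbas + 1   # plus at least one slot for the interconnect

print(node_bandwidth, hbas, slots)  # 800 6 7
```

Rounding up matters here: 800 / 150 is about 5.3, so five adapters would fall short of the target bandwidth.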

Recipe for a RAC Linux DW: Interconnect

• Gigabit Ethernet is generally sufficient for data-warehouse workloads
– Oracle minimizes interconnect traffic for multi-user workloads

• Workloads requiring inter-node parallel query will utilize more interconnect bandwidth
– 10Gb Ethernet, Fibre Channel, InfiniBand

‘Typical’ Cluster configuration

• 4 nodes, each with 4 x 2 GHz CPUs and 5 PCI slots
• 1 Gigabit Ethernet
• Two 16-port switches
• 16 storage arrays, each with 10-20 disks
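As a rough check, the rule of thumb of 100 MB/sec per GHz can be applied to this configuration to see what each storage component would have to deliver. The sketch below assumes the load spreads evenly over arrays and disks, which the slides do not state; the function name `per_component_bandwidth` is hypothetical.

```python
def per_component_bandwidth(nodes, cpus_per_node, ghz_per_cpu,
                            arrays, disks_per_array, mb_per_ghz=100):
    """Return (cluster, per-array, per-disk) bandwidth targets in MB/sec.

    Assumes an even spread of I/O across arrays and disks (an assumption
    made for this illustration only).
    """
    cluster = nodes * cpus_per_node * ghz_per_cpu * mb_per_ghz
    per_array = cluster / arrays
    per_disk = per_array / disks_per_array
    return cluster, per_array, per_disk


# 4 nodes x 4 CPUs x 2 GHz, 16 arrays, at the two ends of the 10-20 disk range.
print(per_component_bandwidth(4, 4, 2, arrays=16, disks_per_array=10))
print(per_component_bandwidth(4, 4, 2, arrays=16, disks_per_array=20))
```

Under these assumptions the cluster targets 3200 MB/sec in total, or 200 MB/sec per array.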

Oracle Linux/RAC DW Customers

RAC/Linux DW Customers

Euronext
– Database size: 1.5 TB
– Hardware: 2 x HP DL580 (4 CPUs)
– Storage: HP MSA 1000
– Interconnect: 1 Gb Ethernet
– OS: Red Hat

AOK Berlin
– Database size: 780 GB
– Hardware: 2 x HP DL580 (4 CPUs)
– Storage: EMC Symmetrix
– Interconnect: 2 x 1 Gb Ethernet
– OS: SuSE

Vanderbilt University
– Database size: 50 GB
– Hardware: 3 x HP DL580 (4 CPUs)
– Storage: EMC Symmetrix
– Interconnect: 1 Gb Ethernet
– OS: Red Hat

National Bank AG
– Database size: 75 GB
– Hardware: 3 x IBM Express5800 (2 CPUs)
– Interconnect: 100 Mb Ethernet
– OS: SuSE

Ellis Island Foundation
– Database size: 60 GB
– Hardware: 2 x HP DL580 (4 CPUs)
– Storage: NetApp
– Interconnect: 1 Gb Ethernet
– OS: Red Hat

Linux-RAC and the Grid

An increasingly common customer theme these days is “provisioning”

Customers want more value out of their hardware expenditures – they want to take advantage of unused capacity

Oracle’s architecture is unique in being able to truly support flexible provisioning of processing power across multiple databases

Oracle will be widely deployed in large commercial computing “grids” in the future

Evolution of Business Intelligence with Oracle

ETL processing, Query & Reporting, Data Mining and Scoring, Cube Creation and OLAP Analysis

Order Entry, Shipments, Procurement, Inventory, …

Real Application Clusters

Resource Provisioning

December: Order Processing Heavy – Analytics Light

ETL processing, Query & Reporting, Data Mining, …

Order Entry, Shipments, Procurement, Inventory, …

Order Entry, Shipments, Procurement, Inventory, …

ETL processing, Query & Reporting, Data Mining and Scoring, Cube Creation and OLAP Analysis

Resource Provisioning

January: Order Processing Light – Heavy Analytics

Oracle RAC Brings Flexible Processing Power to Databases on the Grid

Next Steps … Data Warehousing DB Sessions

Monday and Tuesday

• 11:00 AM, #40153, Room 304: Oracle Warehouse Builder: New Oracle Database 10g Release
• 3:30 PM, #40176, Room 303: Security and the Data Warehouse
• 4:00 PM, #40166, Room 130: Oracle Database 10g SQL Model Clause
• 8:30 AM, #40125, Room 130: Oracle Database 10g: A Spatial VLDB Case Study
• 3:30 PM, #40177, Room 303: Building a Terabyte Data Warehouse, Using Linux and RAC
• 5:00 PM, #40043, Room 104: Data Pump in Oracle Database 10g: Foundation for Ultrahigh-Speed Data Movement

For More Info On Oracle BI/DW Go To http://otn.oracle.com/products/bi/db/dbbi.html

Next Steps … Data Warehousing DB Sessions

Thursday

• 8:30 AM, #40179, Room 304: Oracle Database 10g Data Warehouse Backup and Recovery
• 11:00 AM, #36782, Room 304: Experiences with Real-Time Data Warehousing using Oracle 10g
• 1:00 PM, #40150, Room 102: Turbocharge your Database, Using the Oracle Database 10g SQLAccess Advisor

Business Intelligence and Data Warehousing Demos, All Four Days in the Oracle Demo Campground

• Oracle Database 10g
• Oracle OLAP
• Oracle Data Mining
• Oracle Warehouse Builder
• Oracle Application Server 10g

For More Info On Oracle BI/DW Go To http://otn.oracle.com/products/bi/db/dbbi.html

Reminder – please complete the OracleWorld online session survey

Thank you.
