DESIGN PATTERNS AND DEVELOPMENT CONSIDERATIONS IN A DATA WAREHOUSE

Dan Stober

Intermountain Healthcare

Wednesday, September 23, 2015

Dan Stober

• Intermountain Healthcare
  • Enterprise Data Architect
  • Data Architect
• Working in Oracle databases since 2001
• Frequent presenter at UTOUG events
  • IOUG, OAUG, and Oracle Open World
• Occasional invited lecturer at universities ;-)
  • California State University Fresno

Session Abstract

Which works better, a B-Tree index or a bitmap index? What's the difference anyway? Should I partition my table? What's the best way to perform an upsert? Should I compress my table? In this session, we'll take a look at these questions and more. We've only got an hour, so buckle up for a wild ride!

Agenda

• Datatypes and Storage

• Table Storage

• Indexing

• Compression

• Table Partitioning

What's In Your Wallet?

Highlighter

DATATYPES

Datatypes

• Major groups of datatypes
  • NUMBER
    • Includes INTEGER, which is just NUMBER (38,0)
  • DATE
    • Includes TIMESTAMP, INTERVAL
  • VARCHAR
    • CHAR
    • Includes XMLType
    • DATES and NUMBERS can be stored in VARCHAR fields
  • LOB
    • Includes CLOB, BLOB
    • Stored "out of line"
  • LONG – Hey! Doesn't this make five types?!
  • ROWID

Why Are Datatypes Important?

• Appropriate storage
  • NUMBER takes less space than storing VARCHAR representation of same value

• Datatypes serve as enforcement mechanism to prevent invalid values

INSERT INTO scott.emp ( empno, ename, sal )
VALUES ( 6006, 'WILLIAMS', '750K' );

ORA-01722: invalid number

• Math and Functions

SELECT SYSDATE - hiredate
  FROM scott.emp;

How Much Space Does It Take?

VARCHAR

SELECT VSIZE ( ename ) vsize, ename
  FROM scott.emp
 WHERE deptno = 10;

     VSIZE ENAME
---------- ----------
         4 KING
         5 CLARK
         6 MILLER

3 rows selected.

One byte per character (with a single-byte character set)

NUMBER

SELECT VSIZE ( sal ) vsize, sal
  FROM scott.emp
 WHERE deptno = 10;

     VSIZE        SAL
---------- ----------
         2       1300
         3       2450
         2       5000

3 rows selected.

Larger numbers do not necessarily take more space. Storage is based on scientific notation (significant digits).

DATE: seven bytes

Actual space used will vary. Dependent upon:

• PCTFREE

• Compression

• Efficiency of storage (deletes leaving holes, etc)

Storage Space for NUMBER Values

• NUMBER, from 1 to 22 bytes
  • One byte for every two significant digits PLUS ONE BYTE
  • One more for negative numbers

Which of these numbers requires more space?

 99999999999999999999999999999999999999
100000000000000000000000000000000000000

WITH dta AS
( SELECT TO_NUMBER ( RPAD ( '9', 38, '9' )) num1
       , TO_NUMBER ( RPAD ( '9', 38, '9' )) + 1 num2
    FROM DUAL
)
SELECT num1, num2
     , VSIZE ( num1 )
     , VSIZE ( num2 )
  FROM dta;

      NUM1       NUM2 VSIZE(NUM1) VSIZE(NUM2)
---------- ---------- ----------- -----------
1.0000E+38 1.0000E+38          20           2

1 row selected.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1856720300346322149
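The sizing rule on this slide (one byte per two significant digits, plus one byte, plus one more for negatives) can be checked outside the database. The Python sketch below is a simplified model, not Oracle's implementation: the function name `number_vsize` is hypothetical, it handles integers only, and it assumes digits are stored as base-100 pairs with trailing zero pairs dropped.

```python
def number_vsize(value: int) -> int:
    """Estimate VSIZE for an integer under the slide's rule (simplified model)."""
    if value == 0:
        return 1  # assumption: zero needs only the single leading byte
    magnitude = abs(value)
    # Trailing pairs of zeros carry no significant digits, so drop them.
    while magnitude % 100 == 0:
        magnitude //= 100
    # Count the remaining base-100 digits (two decimal digits per byte).
    digit_bytes = 0
    while magnitude > 0:
        magnitude //= 100
        digit_bytes += 1
    size = 1 + digit_bytes  # "PLUS ONE BYTE" from the slide
    if value < 0:
        size += 1  # "One more for negative numbers"
    return size


thirty_eight_nines = int("9" * 38)
sizes = {
    "1300": number_vsize(1300),
    "2450": number_vsize(2450),
    "5000": number_vsize(5000),
    "38 nines": number_vsize(thirty_eight_nines),
    "38 nines + 1": number_vsize(thirty_eight_nines + 1),
}
```

Under these assumptions the model reproduces the VSIZE values shown on the slides: 2, 3, and 2 bytes for the three salaries, 20 bytes for the 38-digit number, and 2 bytes for the number one larger.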

What Does Precision Mean?

• NUMBER can hold 38 digits of precision

• But Oracle will accept 1.234 × 10^125.

• Why?

• This is only four digits of precision

NUMBER [ (p [, s]) ]

Number having precision p and scale s. The precision p can range from 1 to 38. The scale s can range from -84 to 127. Both precision and scale are in decimal digits. A NUMBER value requires from 1 to 22 bytes.

Oracle Database SQL Language Reference, 11g Release 2 (E10592-04)

CHAR vs VARCHAR

CREATE TABLE a_table_char
( rec_id   NUMBER
, creature CHAR(100)
);

CREATE TABLE a_table_varchar
( rec_id   NUMBER
, creature VARCHAR2(100)
);

  REC_ID CREATURE
       1 Zamp
       2 Wasket
       3 Noothgrush
       4 Yottle
       5 Nureau
       6 Yeps
       7 Wocket
       8 Zamp
. . . 7 creatures repeat . . .
10000000 Noothgrush

SELECT LENGTH ( creature )
     , COUNT(*)
  FROM a_table_varchar
 GROUP BY LENGTH ( creature );

LENGTH(CREATURE)   COUNT(*)
---------------- ----------
               4    2857143
               6    5714285
              10    1428572

3 rows selected.

SELECT LENGTH ( creature )
     , COUNT(*)
  FROM a_table_char
 GROUP BY LENGTH ( creature );

LENGTH(CREATURE)   COUNT(*)
---------------- ----------
             100   10000000

1 row selected.

SELECT segment_name
     , bytes/power ( 1024,2) mb
  FROM dba_segments
 WHERE segment_name IN ('A_TABLE_VARCHAR'
                      , 'A_TABLE_CHAR');

SEGMENT_NAME                 MB
-------------------- ----------
A_TABLE_CHAR               1088
A_TABLE_VARCHAR             173

2 rows selected.

Why not make everything VARCHAR2(4000)?

• Tools that display query results based on the size of the field

• Allocating space in a fetch array

• Consider the field size as a constraint

ORA-12899: Value too large for column

"I have a good idea! Let's avoid ORA-12899 and just make everything 4000 characters wide!"

https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1145132537055

Storing Dates: DATE VS NUMBER VS VARCHAR

DATE                NUMBER     VARCHAR
JANUARY 1, 1900     19000101   19000101
JANUARY 2, 1900     19000102   19000102
. . .
DECEMBER 31, 2099   20991231   20991231

SELECT segment_name, bytes
     , bytes / POWER ( 1024,2) mb
     , blocks
  FROM dba_segments
 WHERE segment_name IN ( 'TBL_DATES','TBL_NUM_DATES','TBL_STR_DATES');

SEGMENT_NAME          BYTES         MB     BLOCKS
---------------- ---------- ---------- ----------
TBL_DATES            983040      .9375         60
TBL_NUM_DATES        851968      .8125         52
TBL_STR_DATES       2097152          2        128

3 rows selected.

• Sure, Numbers consumed slightly less storage
• But…
  • Data Integrity
    • Numbers and Varchar will not prevent entry of invalid dates (20150931)
  • Cannot perform math
  • Date functions not available

IMPLICIT CONVERSIONS

• Oracle cannot directly compare values of datatypes from different groups
• NUMBER to VARCHAR Precedence
  • Oracle always converts VARCHAR values to NUMBER
  • Or at least it tries to!

NUMBER literal compared to VARCHAR literal:

SELECT * FROM DUAL WHERE 1 = '1';

What Oracle executes:

SELECT * FROM DUAL WHERE 1 = TO_NUMBER('1');

NUMBER field compared to VARCHAR value:

SELECT * FROM scott.emp WHERE empno = '7843';

What Oracle executes:

SELECT * FROM scott.emp WHERE empno = TO_NUMBER( '7843');

VARCHAR field compared to NUMBER value:

SELECT * FROM dan.addresses WHERE zip_code = 84096;

What Oracle executes:

SELECT * FROM dan.addresses WHERE TO_NUMBER ( zip_code ) = 84096;

ORA-01722: invalid number (raised when any zip_code value cannot be converted)
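To see why the direction of the implicit conversion matters, here is a small Python sketch of the VARCHAR-to-NUMBER rule. It is only an analogy: `to_number` is a hypothetical stand-in for TO_NUMBER (Python's `float` accepts a few strings Oracle would reject), and the sample ZIP codes are made up.

```python
def to_number(text: str) -> float:
    """Stand-in for TO_NUMBER: raises ValueError where Oracle would raise ORA-01722."""
    return float(text)


# Hypothetical contents of a VARCHAR zip_code column; one value is not numeric.
zip_codes = ["84096", "84020", "K1A 0B1"]

# WHERE zip_code = '84096': text compared to text, so nothing is converted.
matches_as_text = [z for z in zip_codes if z == "84096"]


def matches_as_number(values, target):
    """WHERE zip_code = 84096: every stored value is converted before comparing."""
    return [v for v in values if to_number(v) == target]


try:
    matches_as_number(zip_codes, 84096)
    conversion_failed = False
except ValueError:
    # One bad row is enough to fail the whole query.
    conversion_failed = True
```

Comparing the column to a quoted literal avoids the conversion altogether, which is why the text comparison succeeds on the same data.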

What's In Your Wallet?

A ticket to a sporting event

STORAGE

USER_SEGMENTS/DBA_SEGMENTS

• Data Dictionary objects that provide information about storage

• SEGMENT can be:
  • A TABLE, a PARTITION, a SUBPARTITION, or an INDEX

• Size is given in BLOCKS and in BYTES

• Sometimes, the actual space consumed by the data is considerably smaller than what is indicated in DBA_SEGMENTS

Conversions

BYTES/1024 = KB

BYTES/1024^2 = MB

BYTES/1024^3 = GB
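As a quick check of these conversions, the Python sketch below applies them to byte counts that appear in query output elsewhere in this talk. The helper names are hypothetical.

```python
def bytes_to_kb(num_bytes: int) -> float:
    return num_bytes / 1024


def bytes_to_mb(num_bytes: int) -> float:
    return num_bytes / 1024 ** 2


def bytes_to_gb(num_bytes: int) -> float:
    return num_bytes / 1024 ** 3


# BYTES values reported for the three date-storage tables
tbl_dates_mb = bytes_to_mb(983040)
tbl_num_dates_mb = bytes_to_mb(851968)
tbl_str_dates_mb = bytes_to_mb(2097152)
```

These give .9375, .8125, and 2 MB, matching the MB column that `bytes / POWER ( 1024,2)` produces in the slides.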

SELECT segment_name
     , bytes/power ( 1024,2) mb
  FROM dba_segments
 WHERE segment_name = 'SIZE_DEMO_NUMS';

SEGMENT_NAME                 MB
-------------------- ----------
SIZE_DEMO_NUMS              112

1 row selected.

Sessions

Creating session:

INSERT INTO size_demo_nums
SELECT level
  FROM DUAL
CONNECT BY LEVEL <= POWER ( 10,7);

10000000 rows created.

SELECT COUNT(*) FROM size_demo_nums;

  COUNT(*)
----------
  10000000

1 row selected.

Second session (before the creating session commits):

SELECT COUNT(*) FROM size_demo_nums;

  COUNT(*)
----------
         0

1 row selected.

Creating session:

COMMIT;

Commit complete.

Second session (after the commit):

SELECT COUNT(*) FROM size_demo_nums;

  COUNT(*)
----------
  10000000

1 row selected.

When is a Record Written?

The same query is run after each step:

SELECT segment_name
     , bytes/power ( 1024,2) mb
  FROM dba_segments
 WHERE segment_name = 'SIZE_DEMO_NUMS';

Step 1:

CREATE TABLE size_demo_nums ( num_fld NUMBER );

Table created.

Query result: no rows selected.

Step 2:

INSERT INTO size_demo_nums
SELECT level
  FROM DUAL
CONNECT BY LEVEL <= POWER ( 10,7);

10000000 rows created.

Query result:

SEGMENT_NAME                 MB
-------------------- ----------
SIZE_DEMO_NUMS              112

1 row selected.

Step 3:

COMMIT;

Commit complete.

Query result:

SEGMENT_NAME                 MB
-------------------- ----------
SIZE_DEMO_NUMS              112

1 row selected.

Timing

[Diagram: INSERT RECORD writes to the database table and to the rollback segments]

Before COMMIT...

• Queries in the same session simply read the database table
• Queries in other sessions must read the database table AND reapply the ROLLBACK segments

After COMMIT...

• All queries simply read the database table

Heap Tables

heap n. A group of things placed, thrown, or lying one on another; pile. SOURCE: dictionary.com

• Records are not stored in any particular order
• Records are stored wherever space is available
  • Not necessarily together

THIS IS A GOOD THING!
• More efficient storage
  • Empty space does not have to be maintained for future inserts and updates
• Faster writes
  • Uses "first available" space

ALTERNATIVES:
• Table Partitioning
• Direct Path Loads
• Index Organized Tables

Oracle Storage

• All Oracle data is stored in BLOCKS (or DATABLOCKS)

• The BLOCK is the smallest unit of Oracle storage
  • To access a single record, Oracle must read the entire block which contains the record

• BLOCK → EXTENT → SEGMENT → TABLESPACE (smallest to largest)
  • A table is stored in a single tablespace (unless partitioned)
  • A tablespace can contain multiple tables (usually does)
  • Indexes can be in a separate tablespace from the table

• Block space for deleted records is reclaimed only when new records are inserted

[Diagram: a block, beginning with the block overhead]

Inserts are made into available space in block, wherever it can be found

PCTFREE

• Attribute of a TABLE that tells Oracle how much space to reserve for future updates

• If table is never updated, there's no need to reserve any free space

Block contents (records packed end to end):

AAAWWAAY|7369|SMITH|CLERK|7902|17-DEC-1980|800||20
AAAWWAAZ|7499|ALLEN|SALESMAN|7698|20-FEB-1981|1600|300|30
AAAWWBBAZ|7521|WARD|SALESMAN|7698|22-FEB-1981|1250|500|30
AAAWWAZA|7566|JONES|MANAGER|7839|02-APR-1981|2975||20
AAAWWAZB|7654|MARTI…

For updates, when not enough PCTFREE has been allocated…

If the updates require less space than the original values, record will be updated in place:

UPDATE scott.emp SET ename = 'COX' WHERE empno = 7499

Prior Value: ALLEN (5 chars)
New Value: COX (3 chars)

AAAWWAAZ|7499|COX|SALESMAN|7698|20-FEB-1981|1600|300|30

If the updates require more space, then the record will have to be written in a new location:

UPDATE scott.emp SET ename = 'THOMPSON' WHERE empno = 7369

Prior Value: SMITH (5 chars)
New Value: THOMPSON (8 chars)

ABACABAAW|7369|THOMPSON|CLERK|7902|17-DEC-1980|800||20

AAAWWAAY->ABACABAAW

PCTFREE

• If the data will never be updated, then setting PCTFREE to 0 is the most efficient.

• However, if there will be updates, then it makes sense to leave 10% or 20% for subsequent updates
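The trade-off can be illustrated with a toy block model in Python. This is a sketch under loud assumptions: the `Block` class, its 100-byte size, and the row lengths are all hypothetical, and a migrated row is simply removed from the block, whereas a real block would keep a pointer to the row's new location.

```python
class Block:
    """Toy model of a data block; sizes are in bytes and purely illustrative."""

    def __init__(self, size: int, pctfree: int) -> None:
        self.size = size
        # Inserts may fill the block only up to (100 - PCTFREE) percent.
        self.insert_limit = size * (100 - pctfree) // 100
        self.used = 0

    def insert(self, row_len: int) -> bool:
        """Add a row if it fits under the insert limit; report success."""
        if self.used + row_len > self.insert_limit:
            return False
        self.used += row_len
        return True

    def update(self, old_len: int, new_len: int) -> str:
        """Resize a row in place if the block has room, otherwise migrate it."""
        growth = new_len - old_len
        if self.used + growth <= self.size:
            self.used += growth
            return "in place"
        self.used -= old_len  # simplification: the row leaves this block entirely
        return "migrated"


# PCTFREE 0: the block is packed completely full by inserts.
packed = Block(size=100, pctfree=0)
packed.insert(50)
packed.insert(50)
shrink_result = packed.update(old_len=50, new_len=47)  # shorter value
grow_result = packed.update(old_len=47, new_len=55)    # longer value

# PCTFREE 10: inserts stop at 90 bytes, leaving room for rows to grow.
roomy = Block(size=100, pctfree=10)
roomy.insert(50)
roomy.insert(40)
roomy_result = roomy.update(old_len=50, new_len=58)
```

In this model the packed block handles the shrinking update in place but has to migrate the growing row, while the block created with PCTFREE 10 absorbs the same kind of growth without moving anything.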

High Water Mark

• When records are deleted from a table
  • Oracle continues to reserve the space
• TRUNCATE releases space

DELETE FROM size_demo_nums;

10000000 rows deleted.

COMMIT;

Commit complete.

SELECT COUNT(*) FROM size_demo_nums;

  COUNT(*)
----------
         0

1 row selected.

SELECT segment_name
     , bytes/power ( 1024,2) mb
  FROM dba_segments
 WHERE segment_name = 'SIZE_DEMO_NUMS';

SEGMENT_NAME                 MB
-------------------- ----------
SIZE_DEMO_NUMS              112

1 row selected.

HIGH WATER MARK

This is for efficiency: Oracle will hold this space to be reused for more records for this table

Resetting the High Water Mark

TRUNCATE TABLE size_demo_nums;

Table truncated.

SELECT segment_name
     , bytes/power ( 1024,2) mb
     , bytes/1024 kb
  FROM dba_segments
 WHERE segment_name = 'SIZE_DEMO_NUMS';

SEGMENT_NAME                 MB         KB
-------------------- ---------- ----------
SIZE_DEMO_NUMS            .0625         64

1 row selected.
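The difference between DELETE and TRUNCATE can be modeled in a few lines of Python. This is a toy sketch, not Oracle behavior in detail: the `Segment` class is hypothetical, it tracks the high water mark in rows rather than blocks, and it resets the mark to zero on truncate, whereas the real segment above keeps a small initial allocation (64 KB).

```python
class Segment:
    """Toy table segment that tracks a row count and a high water mark."""

    def __init__(self) -> None:
        self.rows = 0
        self.high_water_mark = 0

    def insert(self, count: int) -> None:
        self.rows += count
        # The mark only ever rises as the segment fills.
        self.high_water_mark = max(self.high_water_mark, self.rows)

    def delete_all(self) -> None:
        # DELETE removes the rows but leaves the reserved space alone.
        self.rows = 0

    def truncate(self) -> None:
        # TRUNCATE removes the rows and resets the mark.
        self.rows = 0
        self.high_water_mark = 0


demo = Segment()
demo.insert(10_000_000)
demo.delete_all()
mark_after_delete = demo.high_water_mark

demo.truncate()
mark_after_truncate = demo.high_water_mark
```

After the delete the table is empty but the mark still sits at ten million rows, which mirrors the unchanged 112 MB segment; only the truncate brings it back down.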

Comparing TRUNCATE vs DELETE

Advantages

QUICK – No Rollback Segments

Resets High Water Mark

Disadvantages

No Ability to ROLLBACK

This is DDL (Implicit COMMIT)

All or Nothing

How can you lower the high water mark when deleting half of the records?

What's In Your Wallet?

A Printed SQL query

INDEXING

Indexes

• Ordered list of values in a table column
  • Designed to help Oracle find records more quickly

• Indexes are maintained in real-time – Synchronized
  • If a record is updated, the index is updated, too

• Using an INDEX is a two-step process
  1. Look up the value in the index
  2. Go to the record indicated

Index Lookup vs “Full Store” Scan

Will Oracle Always Use the Index?

• Steve Catmull: The Index Tipping Point

Yeats

LIKE 'T%'

There are too many! It would be faster just to scan the entire table

Index Selectivity

The measure of the effectiveness of an index is its SELECTIVITY

Defined as Number of Distinct Values / number of rows in the table

An index over a column of unique values would have a selectivity of 1: "Highly selective"

SELECT owner
     , index_name
     , distinct_keys / num_rows * 100 AS s
     , num_rows
     , distinct_keys
     , leaf_blocks
     , blevel
     , avg_leaf_blocks_per_key
  FROM dba_indexes
 WHERE num_rows > 0
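The selectivity definition is easy to try on a small column of values. The Python sketch below uses a hypothetical helper, `selectivity`, with the 14 DEPTNO and EMPNO values from the scott.emp sample table.

```python
def selectivity(column_values) -> float:
    """Number of distinct values divided by number of rows."""
    return len(set(column_values)) / len(column_values)


# DEPTNO and EMPNO for the 14 rows of scott.emp
deptno = [20, 30, 30, 20, 30, 30, 10, 20, 10, 30, 20, 30, 20, 10]
empno = [7369, 7499, 7521, 7566, 7654, 7698, 7782,
         7788, 7839, 7844, 7876, 7900, 7902, 7934]

deptno_selectivity = selectivity(deptno)  # 3 distinct values over 14 rows
empno_selectivity = selectivity(empno)    # every value is unique
```

A unique column such as EMPNO reaches the maximum of 1, while DEPTNO, with three distinct values in fourteen rows, comes out near 0.21.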

B-Tree index

• "Normal" index in Oracle
• Balanced: Records are divided in half *
  • Then again…
  • … And again
• NULL Values are not indexed
• B-Tree indexes can be very large: Often larger than the table itself
• They can be updated very efficiently

* Balanced at time of creation; B-Tree indexes can become unbalanced over time

Unbalanced Tree: Depending on the field, over time, all of the new records can be created on one half of the tree, leading to an unbalanced tree.

B-Tree Indices

[Diagram: B-Tree with root 51618; second level 24800 and 84084; third level 17957, 36651, 70140, 89631]

Oracle steps through index, half of all values are below and half are above

Because of this "median" logic, B-tree indices are ineffective for fields with only a few distinct values

[Diagram: leaf values 91001, 91002, 91003, 91004, 91005, 91006]
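The halving logic described on this slide can be sketched as a binary search over the seven key values from the diagram. This is a simplification for illustration: the function `lookup_steps` is hypothetical, and a real B-Tree block holds many keys rather than one.

```python
def lookup_steps(sorted_keys, target):
    """Count the comparisons a halving search needs; None if the key is absent."""
    low, high = 0, len(sorted_keys) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_keys[mid] == target:
            return steps
        if sorted_keys[mid] < target:
            low = mid + 1   # discard the lower half
        else:
            high = mid - 1  # discard the upper half
    return None


# Key values from the diagram, in sorted order
keys = [17957, 24800, 36651, 51618, 70140, 84084, 89631]

root_steps = lookup_steps(keys, 51618)
leaf_steps = lookup_steps(keys, 89631)
```

The root value is found on the first comparison and a bottom-level value on the third, so each step discards half of the remaining keys. When a column has only a few distinct values, each of those values still covers a large share of the rows, which is the weakness the slide points out.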

Bitmap Indices

Advantages and Uses:

• Recommended for columns with few unique values

• Indexes NULL values

• Best used in concert with other bitmap indices

Disadvantages:

• Not good in transactional systems
  • Or any database with frequent DML

• One update causes entire index entry to be rebuilt

• Concurrency issues (locking)

How does a bitmap index work?

• One entry for each unique value
  • If there are only two distinct values in the field, then there are only two entries in the index

• Examples:

• Sex: M or F

• Flag fields

• Then, a string of ones and zeros for each record in the table!

A Bitmap Indexes Distinct Values

Unique Values   Bitmap           Occurs in records...
10              00000010100001   7, 9, 14
20              10010001001010   1, 4, 8, 11, 13
30              01101100010100   2, 3, 5, 6, 10, 12

[Diagram: index entries 10, 10, 10, 20, 20, 20, 20, 20, ...]

Bitmap Indices in a Query

Bitmap Index on MGR

7566     00000001000010
7698     01101000010100
7782     00000000000001
7788     00000000001000
7839     00010110000000
7902     10000000000000
{null}   00000000100000

Notice also: A BITMAP index indexes NULL values, too

SELECT * FROM scott.emp WHERE deptno = 30 AND mgr = 7839

Mgr 7839   00010110000000
Dept 30    01101100010100
           ||||||||||||||
AND        00000100000000

Mgr 7839   00010110000000
Dept 30    01101100010100
           ||||||||||||||
OR         01111110010100

• Power of bitmap Indices comes from combining results from multiple bitmap indices
  • Oracle will not use two B-Tree indices from the same table, but it will use multiple bitmaps
• This can make bitmaps highly selective
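Combining bitmaps is ordinary bitwise arithmetic, which a short Python sketch can reproduce with the two bitmaps from this slide. The helper names are hypothetical; the leftmost bit is treated as record 1.

```python
WIDTH = 14  # one bit per row of scott.emp


def to_bits(bitmap: str) -> int:
    """Parse a string of ones and zeros into an integer."""
    return int(bitmap, 2)


def to_bitmap(bits: int) -> str:
    """Format an integer back into a fixed-width string of ones and zeros."""
    return format(bits, f"0{WIDTH}b")


def matching_records(bits: int):
    """List the 1-based record positions whose bit is set."""
    return [pos + 1 for pos, bit in enumerate(to_bitmap(bits)) if bit == "1"]


mgr_7839 = to_bits("00010110000000")
dept_30 = to_bits("01101100010100")

both = mgr_7839 & dept_30    # deptno = 30 AND mgr = 7839
either = mgr_7839 | dept_30  # deptno = 30 OR mgr = 7839

and_bitmap = to_bitmap(both)
or_bitmap = to_bitmap(either)
and_records = matching_records(both)
```

The AND leaves a single bit set, at record 6, so only one row has to be fetched from the table.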

What's the downside of Bitmap Indices?

• Does it lock the table?
  • Well, no
• But....
  • An UPDATE will lock every record in the index for both old and new values

EMPNO ENAME  DEPTNO
7369  SMITH  20
7499  ALLEN  30
7521  WARD   30
7566  JONES  20
7654  MARTIN 30
7698  BLAKE  30
7782  CLARK  10
7788  SCOTT  20
7839  KING   10
7844  TURNER 30
7876  ADAMS  20
7900  JAMES  30
7902  FORD   20
7934  MILLER 10

Bitmap index on DEPTNO before the update:

10   00000010100001
20   10010001001010
30   01101100010100

UPDATE emp SET deptno = 20 WHERE empno = 7934

7934 MILLER: 10 becomes 20

After the update:

10   00000010100000
20   10010001001011

Index bitmaps for both values must be rebuilt

With UPDATE, the affected index entries must be rebuilt

For INSERT and DELETE, the entire index must be rebuilt

Index Comparison

B-TREE

• Entries point to ROWIDs
• Only one B-Tree index per table will be used for any query
• NULL values are not indexed
• Better performance for transactional systems
• Can be much larger than the table
• Not efficient for fields with few distinct values
• Can become unbalanced over time

BITMAP

• Series of 0 and 1
• Pivoted
• Cannot be used as a Primary Key
• Multiple bitmap indices can be used in one query
• Works well with multiple WHERE clause conditions
• NULL values are indexed
• Poor performance on updates and deletes
• Generally small – compress well
• Often quicker to DROP the index and CREATE again than to update and insert

Index Issues

• Indices are kept in synch with table changes
  • Oracle maintains the index at the same time records are inserted, updated or deleted

• Reading an index is a two-step process
  • For a large number of records, index may not necessarily be the fastest

What's In Your Wallet?

A traffic ticket

What's In Your Wallet?

A Checkbook

COMPRESSION

Compression

• Why? What is Compression?

• How to compress

• Compression Examples

• MOVE: Compressing Uncompressed Data

Why Compress?

• Compression saves storage (disk space)
  • More records fit into each block

• Faster queries
  • Fewer blocks must be read

Compression

• Oracle compression algorithms
  • BASIC
  • ADVANCED ROW (OLTP)
  • INDEX
  • ( ARCHIVE – Exadata )

• Compression can be applied to
  • Tables
  • Partitions
  • Tablespace

• How?
  • Table must have COMPRESS attribute
    • CREATE TABLE or ALTER TABLE
  • Direct path insert
    1. INSERT /*+ append */ hint, or…
    2. CREATE TABLE AS SELECT

EXTRA LICENSE COST

Simplified Schematic for Compression

• Oracle compression is really de-duplication of data
  • If the same value appears multiple times in the block, store it only once
• Token Map stores repeated values in block header
• Repeated values are replaced by the tokens
• Tokens can represent repeated combinations of tokens, too

Uncompressed block:

CUST_ID|CUST_NM|CUST_CITY|CUST_ST
101    |HERBERT|ALPINE   |UT
102    |HOLMES |DRAPER   |UT
103    |HOLMES |PRESTON  |ID
104    |HECKER |DRAPER   |UT

With repeated values replaced by tokens:

V1:UT|V2:DRAPER|V3:HOLMES
CUST_ID|CUST_NM|CUST_CITY|CUST_ST
101    |HERBERT|ALPINE   |V1
102    |V3     |V2       |V1
103    |V3     |PRESTON  |ID
104    |HECKER |V2       |V1

With a token for a repeated combination of tokens (V4 stands for V2|V1):

V1:UT|V2:DRAPER|V3:HOLMES|V4:V2|V1
CUST_ID|CUST_NM|CUST_CITY|CUST_ST
101    |HERBERT|ALPINE   |V1
102    |V3     |V4
103    |V3     |PRESTON  |ID
104    |HECKER |V4
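The first level of this de-duplication can be sketched in Python using the four customer rows from the schematic. This is an illustration only: `compress_block` is a hypothetical function, it builds single-value tokens and not the combined tokens (such as V4), and the token ordering (most frequent first, then alphabetical) is an assumption chosen to match the slide's labels.

```python
from collections import Counter

# The four rows from the schematic: (cust_id, cust_nm, cust_city, cust_st)
rows = [
    (101, "HERBERT", "ALPINE", "UT"),
    (102, "HOLMES", "DRAPER", "UT"),
    (103, "HOLMES", "PRESTON", "ID"),
    (104, "HECKER", "DRAPER", "UT"),
]


def compress_block(block_rows):
    """Replace every value that repeats within the block with a short token."""
    counts = Counter(value for row in block_rows for value in row[1:])
    # Only values that occur more than once are worth a token.
    repeated = sorted(
        (value for value, count in counts.items() if count > 1),
        key=lambda value: (-counts[value], value),
    )
    token_map = {value: f"V{i}" for i, value in enumerate(repeated, start=1)}
    compressed = [
        (row[0],) + tuple(token_map.get(value, value) for value in row[1:])
        for row in block_rows
    ]
    return token_map, compressed


token_map, compressed_rows = compress_block(rows)
```

Values that occur only once, such as PRESTON and ID, are left as they are, since a token would save nothing.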

Comparing Data Dictionary Entries

CREATE TABLE a_heap_table
( rec_id   NUMBER
, creature VARCHAR2(100)
);

CREATE TABLE a_compressed_table
( rec_id   NUMBER
, creature VARCHAR2(100)
) COMPRESS;

SELECT table_name, compression, compress_for, pct_free
  FROM all_tables
 WHERE table_name = ANY ('A_HEAP_TABLE','A_COMPRESSED_TABLE');

TABLE_NAME                     COMPRESSION COMPRESS_FOR   PCT_FREE
------------------------------ ----------- ------------ ----------
A_HEAP_TABLE                   DISABLED                         10
A_COMPRESSED_TABLE             ENABLED     BASIC                 0

2 rows selected.

This is the default

By default, Oracle assumes PCTFREE will be 0 when table is compressed

Ready to Insert 10,000,000 Records

  REC_ID CREATURE
       1 Zamp
       2 Wasket
       3 Noothgrush
       4 Yottle
       5 Nureau
       6 Yeps
       7 Wocket
       8 Zamp
Values repeat every seven records
 9999998 Zamp
 9999999 Wasket
10000000 Noothgrush

WITH nums AS
( SELECT level rec_id
    FROM DUAL
  CONNECT BY level <= power ( 10,7)
)
, creatures AS
( SELECT 1 AS id, 'Wocket' AS creature FROM DUAL UNION ALL
  SELECT 2 AS id, 'Zamp' AS creature FROM DUAL UNION ALL
  SELECT 3 AS id, 'Wasket' AS creature FROM DUAL UNION ALL
  SELECT 4 AS id, 'Noothgrush' AS creature FROM DUAL UNION ALL
  SELECT 5 AS id, 'Yottle' AS creature FROM DUAL UNION ALL
  SELECT 6 AS id, 'Nureau' AS creature FROM DUAL UNION ALL
  SELECT 7 AS id, 'Yeps' AS creature FROM DUAL
)
SELECT rec_id, creature
  FROM nums
  JOIN creatures
    ON id = MOD ( rec_id, 7) + 1

Table Compression Comparison

Two elements must be present for compression:

1) Table must have COMPRESS attribute (CREATE TABLE or ALTER TABLE)
2) Direct path insert ("append" hint)

Inserting 10 million records:

CREATE TABLE a_heap_table
( rec_id   NUMBER
, creature VARCHAR2(100)
);

INSERT INTO a_heap_table . . .

CREATE TABLE a_compressed_table
( rec_id   NUMBER
, creature VARCHAR2(100)
) COMPRESS;

INSERT /*+ append */ INTO a_compressed_table . . .

SELECT segment_name
     , bytes/power ( 1024,2) mb
  FROM dba_segments
 WHERE segment_name = ANY ( 'A_HEAP_TABLE','A_COMPRESSED_TABLE') ;

SEGMENT_NAME                 MB
-------------------- ----------
A_HEAP_TABLE                172
A_COMPRESSED_TABLE          128

2 rows selected

In this example, table storage was compressed by 26%

Updates on a Compressed Table

UPDATE a_compressed_table
   SET creature = 'Zizzer-Zazzer-Zuzz'
 WHERE MOD ( rec_id, 4 ) = 3;

COMMIT;

Before Update   After Update
128 mb          208 mb (63% more)
8192 blocks     13,312 blocks (63% more)

The newly updated values are not compressed.

SELECT segment_name
--,bytes
     , bytes/power ( 1024,2) mb
  FROM dba_segments
 WHERE segment_name = 'A_COMPRESSED_TABLE';

SEGMENT_NAME                   MB
---------------------- ----------
A_COMPRESSED_TABLE            208

1 row selected.

How to Regain Compression after Updates

Option 1:

1. INSERT INTO temp_copy...
2. TRUNCATE TABLE original
3. INSERT /*+ append */ INTO compressed_table...

OR…

Option 2:

ALTER TABLE original MOVE;

CAVEAT: In either of these cases, you must have available space to create a copy

ALTER TABLE a_compressed_table MOVE;

SELECT segment_name
--,bytes
     , bytes/power ( 1024,2) mb
  FROM dba_segments
 WHERE segment_name = 'A_COMPRESSED_TABLE';

SEGMENT_NAME                   MB
---------------------- ----------
A_COMPRESSED_TABLE            120

1 row selected.

Creating a New Table with Same data

CREATE TABLE a_compressed_table_new
( rec_id   NUMBER
, creature VARCHAR2(100)
) COMPRESS;

INSERT /*+ append */ INTO a_compressed_table_new
SELECT *
  FROM a_compressed_table ;

COMMIT;

SELECT segment_name
     , bytes/power ( 1024,2) mb
  FROM dba_segments
 WHERE segment_name = ANY ( 'A_COMPRESSED_TABLE','A_COMPRESSED_TABLE_NEW');

SEGMENT_NAME                   MB
---------------------- ----------
A_COMPRESSED_TABLE            208
A_COMPRESSED_TABLE_NEW        120

2 rows selected.

TABLE PARTITIONING

Table Partitioning

• Divides a large table into smaller, more manageable units

• Choose a field to be used as the partition field

• Partition by: RANGE, LIST, HASH

• Oracle supports sub-partitioning, too
  • You can have partitions of your partitions!

Do not confuse table partitioning with PARTITION BY from analytic SQL. Same word; vastly different concepts!

Why Partition?

• Two basic reasons to partition a table:
  • Query Performance
  • Data Loading

• Improve query performance
  • Partition pruning

• Data Loads
  • Partition swaps for loading

• Additional benefits
  • Can aid table management
    • Backups
    • Recovery
    • Compression -> Reorganization
  • Table partitions may have different attributes
    • Compression, TABLESPACE

Partitioning Choices: RANGE vs LIST

Source table:

sales_dt     store_no   sales_amt
2/12/2014    10         183.95
6/16/2014    20         450.00
3/09/2014    20         195.99
11/04/2013   10         25.00
12/22/2014   20         1992.16
5/01/2015    20         106.50
10/20/2013   10         35.33

PARTITION BY RANGE ( sales_dt )

VALUES LESS THAN DATE '2014-01-01'

sales_dt     store_no   sales_amt
11/04/2013   10         25.00
10/20/2013   10         35.33

VALUES LESS THAN DATE '2015-01-01'

sales_dt     store_no   sales_amt
2/12/2014    10         183.95
6/16/2014    20         450.00
3/09/2014    20         195.99
12/22/2014   20         1992.16

VALUES LESS THAN MAXVALUE

sales_dt     store_no   sales_amt
5/01/2015    20         106.50

PARTITION BY LIST ( store_no )

VALUES ( 20 )

sales_dt     store_no   sales_amt
6/16/2014    20         450.00
3/09/2014    20         195.99
12/22/2014   20         1992.16
5/01/2015    20         106.50

VALUES ( 10, 30 )

sales_dt     store_no   sales_amt
2/12/2014    10         183.95
11/04/2013   10         25.00
10/20/2013   10         35.33

VALUES ( DEFAULT )
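How a row is routed to a partition under each scheme can be sketched in Python. The partition names below (`p_lt_2014`, `p_20`, and so on) are hypothetical labels, since the slide gives only the VALUES clauses; the bounds and store lists are taken from the slide.

```python
from bisect import bisect_right
from datetime import date

# RANGE: upper bounds from the VALUES LESS THAN clauses; the last partition is MAXVALUE.
RANGE_BOUNDS = [date(2014, 1, 1), date(2015, 1, 1)]
RANGE_NAMES = ["p_lt_2014", "p_lt_2015", "p_max"]  # hypothetical names

# LIST: explicit values per partition, plus a DEFAULT partition for everything else.
LIST_MAP = {20: "p_20", 10: "p_10_30", 30: "p_10_30"}  # hypothetical names
LIST_DEFAULT = "p_default"


def range_partition(sales_dt: date) -> str:
    """Pick the first partition whose bound is strictly greater than the date."""
    return RANGE_NAMES[bisect_right(RANGE_BOUNDS, sales_dt)]


def list_partition(store_no: int) -> str:
    """Pick the partition that lists the store, or the DEFAULT partition."""
    return LIST_MAP.get(store_no, LIST_DEFAULT)


late_2013 = range_partition(date(2013, 11, 4))
new_year_2014 = range_partition(date(2014, 1, 1))
may_2015 = range_partition(date(2015, 5, 1))
```

Because the bound is "less than", a row dated exactly 2014-01-01 lands in the second partition rather than the first.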

Partition Pruning

Two examples

Partition Pruning

• In queries, allows the optimizer to narrow down to one or a few partitions

• When filtering criterion matches partitioning scheme

SALES: Partitioned on STORE_NO

Partition S20:

sales_dt     store_no   sales_amt
6/16/2012    20         450.00
3/09/2012    20         195.99
12/22/2012   20         1992.16
5/01/2013    20         106.50

Partition for store 10:

sales_dt     store_no   sales_amt
2/12/2013    10         183.95
11/04/2011   10         25.00
10/20/2011   10         35.33

Use Partitioned field in WHERE clause:

SELECT SUM ( sales_amt )
  FROM sales
 WHERE store_no = 10
   AND sales_dt >= DATE '2013-01-01'

Use PARTITION keyword in FROM clause:

SELECT SUM ( sales_amt )
  FROM sales PARTITION ( s20 )
 WHERE sales_dt >= DATE '2013-01-01'
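The effect of pruning can be shown with a Python sketch over the SALES rows from this slide. The layout is an assumption for illustration: the partition name `s10` is hypothetical (the slide names only S20), and `sum_sales` is a made-up helper that reports how many rows it had to scan.

```python
from datetime import date

# One list of (sales_dt, store_no, sales_amt) rows per partition
partitions = {
    "s20": [
        (date(2012, 6, 16), 20, 450.00),
        (date(2012, 3, 9), 20, 195.99),
        (date(2012, 12, 22), 20, 1992.16),
        (date(2013, 5, 1), 20, 106.50),
    ],
    "s10": [  # hypothetical partition name
        (date(2013, 2, 12), 10, 183.95),
        (date(2011, 11, 4), 10, 25.00),
        (date(2011, 10, 20), 10, 35.33),
    ],
}
STORE_TO_PARTITION = {10: "s10", 20: "s20"}


def sum_sales(store_no: int, since: date):
    """Sum sales for one store, scanning only the partition that holds it."""
    scanned = partitions[STORE_TO_PARTITION[store_no]]
    total = sum(amount for sales_dt, _, amount in scanned if sales_dt >= since)
    return total, len(scanned)


total_amount, rows_scanned = sum_sales(10, date(2013, 1, 1))
all_rows = sum(len(rows) for rows in partitions.values())
```

Filtering on the partitioning column means only three of the seven rows are read; the date condition is then applied inside that one partition.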

Partition Swap

[Diagram: BIG_TABLE with monthly partitions JUN, JUL, AUG, SEP, OCT; LOAD_TABLE shown beside the OCT partition]

• Load October refresh data into LOAD_TABLE
• LOAD_TABLE must have the same column structure as BIG_TABLE

ALTER TABLE big_table EXCHANGE PARTITION oct WITH TABLE load_table;

A very large table…

DAN.GL_AUDIT_EOM_VRSN2_BASE

Whole table: 257,458,739 rows, 14.6 gb, 957,504 blocks

Contains data since January 2010 (68 months)

Partition data by month (divide by 68): 3,786,158 rows, 220 mb, 14,081 blocks

• The explain plans on the next slides are based on this table

2010: 12 partitions
2011: 12 partitions
2012: 12 partitions
2013: 12 partitions
2014: 12 partitions
2015: 8 partitions

Specifying Partition in the Query

EXPLAIN PLAN FOR
SELECT *
  FROM gl_audit_eom_vrsn2_base PARTITION (gl_audit_eom_2015_04);

--------------------------------------------------------------------------------
| Id  | Operation            | Name                    | Rows  | Pstart| Pstop |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                         | 3801K |       |       |
|   1 |  PX COORDINATOR      |                         |       |       |       |
|   2 |   PX SEND QC (RANDOM)| :TQ10000                | 3801K |       |       |
|   3 |    PX BLOCK ITERATOR |                         | 3801K |    65 |    65 |
|*  4 |     TABLE ACCESS FULL| GL_AUDIT_EOM_VRSN2_BASE | 3801K |    65 |    65 |
--------------------------------------------------------------------------------

Partition Field in WHERE Clause

EXPLAIN PLAN FOR
SELECT *
  FROM gl_audit_eom_vrsn2_base
 WHERE gl_post_dt BETWEEN DATE '2015-04-01' AND DATE '2015-04-30'

--------------------------------------------------------------------------------
| Id  | Operation            | Name                    | Rows  | Pstart| Pstop |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                         | 3801K |       |       |
|   1 |  PX COORDINATOR      |                         |       |       |       |
|   2 |   PX SEND QC (RANDOM)| :TQ10000                | 3801K |       |       |
|   3 |    PX BLOCK ITERATOR |                         | 3801K |    65 |    65 |
|*  4 |     TABLE ACCESS FULL| GL_AUDIT_EOM_VRSN2_BASE | 3801K |    65 |    65 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("GL_POST_DT"<=TO_DATE(' 2015-04-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

"Partition Pruning"

With a function on the Partitioned Field

EXPLAIN PLAN FOR
SELECT *
  FROM gl_audit_eom_vrsn2_base
 WHERE TRUNC( gl_post_dt ) BETWEEN DATE '2015-04-01' AND DATE '2015-04-30'

----------------------------------------------------------------------------------
| Id  | Operation              | Name                    | Rows  | Pstart| Pstop |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                         | 20636 |       |       |
|   1 |  PX COORDINATOR        |                         |       |       |       |
|   2 |   PX SEND QC (RANDOM)  | :TQ10000                | 4973K |       |       |
|   3 |    VIEW                | VW_TE_2                 | 4973K |       |       |
|   4 |     UNION-ALL          |                         |       |       |       |
|   5 |      PX BLOCK ITERATOR |                         |  524K |KEY(OR)|KEY(OR)|
|*  6 |       TABLE ACCESS FULL| GL_AUDIT_EOM_VRSN2_BASE |  524K |KEY(OR)|KEY(OR)|
|   7 |      PX BLOCK ITERATOR |                         | 4449K |KEY(OR)|KEY(OR)|
|*  8 |       TABLE ACCESS FULL| GL_AUDIT_EOM_VRSN2_BASE | 4449K |KEY(OR)|KEY(OR)|
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

Cardinality estimate is WAY off

Problems with Denormalizing Partitioned Field

GL_POST_DT   GL_POST_YR   GL_POST_MNTH
06/17/2015   2015         6
08/10/2015   2015         8
10/08/2014   2014         10
08/13/2015   2015         8

GL_POST_DT is the partitioned field.

• Users request denormalized fields to
  • Ease WHERE clauses
  • Facilitate grouping

Good query; utilizing the partitioned field:

WHERE gl_post_dt BETWEEN DATE '2015-04-01' AND DATE '2015-04-30'

Bad query; Oracle doesn't know that the fields are correlated with the partitioned field:

WHERE gl_post_yr = 2015 AND gl_post_mnth = 4

Querying on Denormalized Partition Field

EXPLAIN PLAN FOR
SELECT *
  FROM gl_audit_eom_vrsn2_base
 WHERE gl_post_yr = 2015
   AND gl_post_mnth = 4

--------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                    | Rows  | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |                         | 20636 |       |       |
|   1 |  PX COORDINATOR                        |                         |       |       |       |
|   2 |   PX SEND QC (RANDOM)                  | :TQ10000                | 6649K |       |       |
|   3 |    VIEW                                | VW_TE_2                 | 6649K |       |       |
|   4 |     UNION-ALL                          |                         |       |       |       |
|   5 |      PX PARTITION RANGE OR             |                         | 2199K |KEY(OR)|KEY(OR)|
|*  6 |       TABLE ACCESS BY LOCAL INDEX ROWID| GL_AUDIT_EOM_VRSN2_BASE | 2199K |KEY(OR)|KEY(OR)|
|   7 |        BITMAP CONVERSION TO ROWIDS     |                         |       |       |       |
|   8 |         BITMAP AND                     |                         |       |       |       |
|*  9 |          BITMAP INDEX SINGLE VALUE     | GLAM_V2$POST_MNTH       |       |KEY(OR)|KEY(OR)|
|* 10 |          BITMAP INDEX SINGLE VALUE     | GLAM_V2$POST_YR         |       |KEY(OR)|KEY(OR)|
|  11 |      PX BLOCK ITERATOR                 |                         | 4449K |KEY(OR)|KEY(OR)|
|* 12 |       TABLE ACCESS FULL                | GL_AUDIT_EOM_VRSN2_BASE | 4449K |KEY(OR)|KEY(OR)|
--------------------------------------------------------------------------------------------------

Partitions: Record Count vs Size

[Chart: partition sizes]

196 mb   12,544 blocks
194 mb   12,416 blocks
225 mb   14,400 blocks
804 mb   51,456 blocks   ? ?

What's In Your Wallet?

Something printed in a foreign language

What's In Your Wallet?

A Bus pass ( or TRAX ticket stub or UTA transfer )

Recap

• Datatypes and Storage

• Table Storage

• Indexing

• Compression

• Table Partitioning

THANK-YOU!

Dan Stober

Please fill out session surveys

Questions?

Comments?

Corrections?

Complaints?

[email protected]