DESIGN PATTERNS AND DEVELOPMENT CONSIDERATIONS IN A DATA WAREHOUSE
Dan Stober
Intermountain Healthcare
Wednesday, September 23, 2015
Dan Stober
• Intermountain Healthcare • Enterprise Data Architect
• Data Architect
• Working in Oracle databases since 2001
• Frequent presenter at UTOUG events. • IOUG, OAUG, and Oracle Open World
• Occasional invited lecturer at universities ;-)
• California State University Fresno
Session Abstract
Which works better, a B-Tree index or a bitmap index?
What's the difference anyway? Should I partition my
table? What's the best way to perform an upsert?
Should I compress my table? In this session, we'll take a
look at these questions and more. We've only got an
hour, so buckle up for a wild ride!
Datatypes
• Major groups of datatypes
  • NUMBER
    • Includes INTEGER, which is just NUMBER (38,0)
  • DATE
    • Includes TIMESTAMP, INTERVAL
  • VARCHAR
    • CHAR
    • Includes XMLType
    • DATES and NUMBERS can be stored in VARCHAR fields
  • LOB
    • Includes CLOB, BLOB
    • Stored “out of line”
  • LONG – Hey! Doesn't this make five types?!
  • ROWID
Why Are Datatypes Important?
• Appropriate storage • NUMBER takes less space than storing VARCHAR representation
of same value
• Datatypes serve as enforcement mechanism to prevent invalid values
• Math and Functions
INSERT INTO scott.emp ( empno, ename, sal )
VALUES ( 6006, 'WILLIAMS', '750K' );
ORA-01722: invalid number

SELECT SYSDATE - hiredate
FROM scott.emp;
How Much Space Does It Take?
VARCHAR
SELECT VSIZE ( ename ) vsize, ename
FROM scott.emp
WHERE deptno = 10;

     VSIZE ENAME
---------- ----------
         4 KING
         5 CLARK
         6 MILLER
3 rows selected.

One byte per character
(with a single-byte character set)

NUMBER
SELECT VSIZE ( sal ) vsize, sal
FROM scott.emp
WHERE deptno = 10;

     VSIZE        SAL
---------- ----------
         2       1300
         3       2450
         2       5000
3 rows selected.
Larger numbers do not necessarily take more space.
Based on scientific notation (significant digits)
DATE : seven bytes
Actual space used will vary. Dependent upon:
• PCTFREE
• Compression
• Efficiency of storage (deletes leaving holes, etc)
Storage Space for NUMBER Values
• NUMBER, from 1 to 22 bytes • One byte for every two significant digits PLUS ONE BYTE
• One more for negative numbers
Which of these numbers requires more space?
99999999999999999999999999999999999999
100000000000000000000000000000000000000

WITH dta AS
( SELECT TO_NUMBER ( RPAD ( '9', 38, '9' )) num1
       , TO_NUMBER ( RPAD ( '9', 38, '9' )) + 1 num2
  FROM DUAL
)
SELECT num1, num2
     , VSIZE ( num1 )
     , VSIZE ( num2)
FROM dta;

      NUM1       NUM2 VSIZE(NUM1) VSIZE(NUM2)
---------- ---------- ----------- -----------
1.0000E+38 1.0000E+38          20           2
1 row selected.

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1856720300346322149
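The sizing rule can be spot-checked with VSIZE on literals. This is a minimal sketch against DUAL; the values are chosen here for illustration. The first two sizes come from the SAL output shown earlier, while the comments on the last two columns only state what the rule predicts and were not part of the original session.

```sql
-- VSIZE returns the number of bytes in the internal representation of a value.
SELECT VSIZE ( 1300 )   AS v_1300      -- 2 bytes in the SAL output above
     , VSIZE ( 2450 )   AS v_2450      -- 3 bytes in the SAL output above
     , VSIZE ( -2450 )  AS v_neg_2450  -- rule predicts one byte more than 2450
     , VSIZE ( 123456 ) AS v_123456    -- six significant digits: rule predicts 3 + 1 bytes
FROM DUAL;
```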
What Does Precision Mean?
• NUMBER can hold 38 digits of precision
• But Oracle will accept 1.234 × 10¹²⁵.
• Why?
• This is only four digits of precision
NUMBER [ (p [, s]) ]
Number having precision p and scale s. The precision p can
range from 1 to 38. The scale s can range from -84 to 127. Both precision and
scale are in decimal digits. A NUMBER value requires from 1 to 22 bytes.
Oracle Database SQL Language Reference
11g Release 2 (E10592-04)
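To make precision and scale concrete, here is a small sketch. The table name precision_demo and its columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical demo table (not part of the original session).
CREATE TABLE precision_demo
( unconstrained NUMBER
, constrained   NUMBER(5,2)  -- at most 5 significant digits, 2 of them after the decimal point
);

-- Only four significant digits, so the large exponent is not a problem.
INSERT INTO precision_demo ( unconstrained ) VALUES ( 1.234E125 );

-- Extra decimal places are rounded to the declared scale: stored as 123.46.
INSERT INTO precision_demo ( constrained ) VALUES ( 123.456 );

-- Four digits before the decimal point do not fit NUMBER(5,2). This fails with
-- ORA-01438: value larger than specified precision allowed for this column
INSERT INTO precision_demo ( constrained ) VALUES ( 1234.5 );
```

In other words, precision limits how many significant digits are kept, not how large the value can be.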
CHAR vs VARCHAR
CREATE TABLE a_table_char
( rec_id   NUMBER
, creature CHAR(100)
);

CREATE TABLE a_table_varchar
( rec_id   NUMBER
, creature VARCHAR2(100)
);

SELECT LENGTH ( creature )
     , COUNT(*)
FROM a_table_varchar
GROUP BY LENGTH ( creature );

LENGTH(CREATURE)   COUNT(*)
---------------- ----------
               4    2857143
               6    5714285
              10    1428572
3 rows selected.

SELECT LENGTH ( creature )
     , COUNT(*)
FROM a_table_char
GROUP BY LENGTH ( creature );

LENGTH(CREATURE)   COUNT(*)
---------------- ----------
             100   10000000
1 row selected.
REC_ID CREATURE
1 Zamp
2 Wasket
3 Noothgrush
4 Yottle
5 Nureau
6 Yeps
7 Wocket
8 Zamp
. . . 7 creatures repeat . . .
10000000 Noothgrush
SELECT segment_name
     , bytes/power ( 1024,2) mb
FROM dba_segments
WHERE segment_name IN ('A_TABLE_VARCHAR'
                      , 'A_TABLE_CHAR');

SEGMENT_NAME                 MB
-------------------- ----------
A_TABLE_CHAR               1088
A_TABLE_VARCHAR             173
2 rows selected.
Why not make everything VARCHAR2(4000)?
• Tools that display query results based on the size of the field
• Allocating space in a fetch array
• Consider the field size as a constraint
ORA-12899: Value too large for column
https://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:1145132537055
I have a good idea! Let's avoid ORA-12899 and just make everything 4000 characters wide!
Storing Dates: DATE VS NUMBER VS VARCHAR
DATE               NUMBER    VARCHAR
JANUARY 1, 1900    19000101  19000101
JANUARY 2, 1900    19000102  19000102
. . .
DECEMBER 31, 2099  20991231  20991231

SELECT segment_name, bytes
     , bytes / POWER ( 1024,2) mb
     , blocks
FROM dba_segments
WHERE segment_name IN ( 'TBL_DATES','TBL_NUM_DATES','TBL_STR_DATES');

SEGMENT_NAME          BYTES         MB     BLOCKS
---------------- ---------- ---------- ----------
TBL_DATES            983040      .9375         60
TBL_NUM_DATES        851968      .8125         52
TBL_STR_DATES       2097152          2        128
3 rows selected.
• Sure, Numbers consumed slightly less storage
• But…
• Data Integrity • Numbers and Varchar will not
prevent entry of invalid dates (20150931)
• Cannot perform math
• Date functions not available
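The three points above can be seen with a few literals. A minimal sketch against DUAL; nothing here depends on the demo tables.

```sql
-- A real DATE cannot hold September 31. The conversion fails with
-- ORA-01839: date not valid for month specified
SELECT TO_DATE ( '20150931', 'YYYYMMDD' ) FROM DUAL;

-- Date math and date functions work directly on DATE values.
SELECT DATE '2015-10-01' - DATE '2015-09-01' AS days_between        -- 30
     , ADD_MONTHS ( DATE '2015-09-23', 3 )   AS three_months_later
     , LAST_DAY ( DATE '2015-09-23' )        AS end_of_month
FROM DUAL;
```

The number 20150931 and the string '20150931' would both be accepted without complaint by a NUMBER or VARCHAR column.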
IMPLICIT CONVERSIONS
• Oracle cannot directly compare values from different datatype groups; it converts one side first
• NUMBER to VARCHAR precedence
  • Oracle always converts VARCHAR values to NUMBER
  • Or at least it tries to!

SELECT * FROM DUAL WHERE 1 = '1';
What Oracle executes:
SELECT * FROM DUAL WHERE 1 = TO_NUMBER('1');

NUMBER field compared to VARCHAR value
SELECT * FROM scott.emp WHERE empno = '7843';
What Oracle executes:
SELECT * FROM scott.emp WHERE empno = TO_NUMBER( '7843');

VARCHAR field compared to NUMBER value
SELECT * FROM dan.addresses WHERE zip_code = 84096;
What Oracle executes:
SELECT * FROM dan.addresses WHERE TO_NUMBER ( zip_code ) = 84096;
ORA-01722: invalid number (raised as soon as a non-numeric zip_code is converted)
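One way to avoid the implicit conversion is to write the literal in the column's own datatype. A short sketch using the same tables as the slide:

```sql
-- zip_code is a VARCHAR column, so compare it to a string literal.
-- No TO_NUMBER is wrapped around the column, so non-numeric values
-- in zip_code cannot raise ORA-01722.
SELECT *
FROM dan.addresses
WHERE zip_code = '84096';

-- empno is a NUMBER column, so compare it to a numeric literal.
SELECT *
FROM scott.emp
WHERE empno = 7843;
```

Keeping functions off the column side of the comparison also leaves an ordinary index on that column usable.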
USER_SEGMENTS/DBA_SEGMENTS
• Data Dictionary objects that provide information about storage
• SEGMENT can be: • A TABLE, a PARTITION, a SUBPARTITION, or an INDEX
• Size is given in BLOCKS and in BYTES
• Sometimes, the actual space consumed by the data is considerably smaller than what is indicated in DBA_SEGMENTS
Conversions
BYTES/1024 = KB
BYTES/1024² = MB
BYTES/1024³ = GB

SELECT segment_name
     , bytes/power ( 1024,2) mb
FROM dba_segments
WHERE segment_name = 'SIZE_DEMO_NUMS';

SEGMENT_NAME                 MB
-------------------- ----------
SIZE_DEMO_NUMS              112
1 row selected.
Sessions
Creating session
INSERT INTO size_demo_nums
SELECT level FROM DUAL CONNECT BY LEVEL <= POWER ( 10,7);
10000000 rows created.

SELECT COUNT(*) FROM size_demo_nums;
  COUNT(*)
----------
  10000000
1 row selected.

COMMIT;
Commit complete.

Second session (before the COMMIT)
SELECT COUNT(*) FROM size_demo_nums;
  COUNT(*)
----------
         0
1 row selected.

Second session (after the COMMIT)
SELECT COUNT(*) FROM size_demo_nums;
  COUNT(*)
----------
  10000000
1 row selected.
When is a Record Written?
CREATE TABLE size_demo_nums ( num_fld NUMBER );
Table created.

SELECT segment_name
     , bytes/power ( 1024,2) mb
FROM dba_segments
WHERE segment_name = 'SIZE_DEMO_NUMS';
no rows selected.

INSERT INTO size_demo_nums
SELECT level FROM DUAL CONNECT BY LEVEL <= POWER ( 10,7);
10000000 rows created.

(same query)
SEGMENT_NAME                 MB
-------------------- ----------
SIZE_DEMO_NUMS              112
1 row selected.

COMMIT;
Commit complete.

(same query)
SEGMENT_NAME                 MB
-------------------- ----------
SIZE_DEMO_NUMS              112
1 row selected.
Timing
INSERT RECORD -> Database table + Rollback Segments

Before COMMIT...
• Queries in the same session simply read the database table
• Queries in other sessions must read the database table AND reapply the ROLLBACK segments

After COMMIT...
• All queries simply read the database table
Heap Tables
THIS IS A GOOD THING! • More efficient storage
• Empty space does not have to be maintained for future inserts and updates
• Faster writes • Uses “first available” space
ALTERNATIVES:
• Table Partitioning
• Direct Path Loads
• Index Organized Tables
heap n. A group of things placed, thrown,
or lying one on another; pile. SOURCE: dictionary.com
• Records are not stored in any particular order
• Records are stored wherever space is available
• Not necessarily together
Oracle Storage
• All Oracle data is stored in BLOCKS (or DATABLOCKS)
• The BLOCK is the smallest unit of Oracle storage
  • To access a single record, Oracle must read the entire block which contains the record
• BLOCK > EXTENT > SEGMENT > TABLESPACE
  • A table is stored in a single tablespace (unless partitioned)
  • A tablespace can contain multiple tables (usually does)
  • Indexes can be in a separate tablespace from the table
• Block space for deleted records is reclaimed only when new records are inserted
Block
overhead
Inserts are made
into available space in block,
wherever it can be found
PCTFREE
• Attribute of a TABLE that tells Oracle how much space to reserve for future updates
• If table is never updated, there's no need to reserve any free space
AAAWWAAY|7369|SMITH|CLERK|7902|17-DEC-1980|800||20
AAAWWAAZ|7499|ALLEN|SALESMAN|7698|20-FEB-1981|1600|300|30 AAAWWB
BAZ|7521|WARD|SALESMAN|7698|22-FEB-1981|1250|500|30 AAAWWAZA|
7566|JONES|MANAGER|7839|02-APR-1981|2975||20 AAAWWAZB|7654|MARTI
For updates, when not enough PCTFREE has been allocated…

If the updates require less space than the original values, the record will be updated in place
UPDATE scott.emp SET ename = 'COX' WHERE empno = 7499
Prior Value: ALLEN (5 chars)
New Value: COX (3 chars)
AAAWWAAZ|7499|COX|SALESMAN|7698|20-FEB-1981|1600|300|30

If the updates require more space, then the record will have to be written in a new location
UPDATE scott.emp SET ename = 'THOMPSON' WHERE empno = 7369
Prior Value: SMITH (5 chars)
New Value: THOMPSON (8 chars)
ABACABAAW|7369|THOMPSON|CLERK|7902|17-DEC-1980|800||20
AAAWWAAY->ABACABAAW
PCTFREE
• If the data will never be updated, then setting PCTFREE to 0 is the most efficient.
• However, if there will be updates, then it makes sense to leave 10% or 20% for subsequent updates
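As a sketch of how these two cases might be declared, here are two tables with different PCTFREE settings. The table and column names are hypothetical; only the PCTFREE clause and the USER_TABLES.PCT_FREE column are the point.

```sql
-- Hypothetical load-once table that is never updated: reserve nothing.
CREATE TABLE sales_history
( sales_dt  DATE
, sales_amt NUMBER
) PCTFREE 0;

-- Hypothetical table whose rows grow after insert: reserve 20% of each block.
CREATE TABLE customer_notes
( cust_id  NUMBER
, note_txt VARCHAR2(4000)
) PCTFREE 20;

-- Check the settings in the data dictionary.
SELECT table_name, pct_free
FROM user_tables
WHERE table_name IN ( 'SALES_HISTORY', 'CUSTOMER_NOTES' );
```

When PCTFREE is omitted, the table gets the default of 10.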
High Water Mark
• When records are deleted from a table
• Oracle continues to reserve the space
• TRUNCATE releases space
HIGH WATER MARK
DELETE FROM size_demo_nums;
10000000 rows deleted.

COMMIT;
Commit complete.

SELECT COUNT(*) FROM size_demo_nums;
  COUNT(*)
----------
         0
1 row selected.

SELECT segment_name
     , bytes/power ( 1024,2) mb
FROM dba_segments
WHERE segment_name = 'SIZE_DEMO_NUMS';

SEGMENT_NAME                 MB
-------------------- ----------
SIZE_DEMO_NUMS              112
1 row selected.

This is for efficiency: Oracle will hold this space to be reused for more records for this table
Resetting the High Water Mark
TRUNCATE TABLE size_demo_nums;
Table truncated.

SELECT segment_name
     , bytes/power ( 1024,2) mb
     , bytes/1024 kb
FROM dba_segments
WHERE segment_name = 'SIZE_DEMO_NUMS';

SEGMENT_NAME                 MB         KB
-------------------- ---------- ----------
SIZE_DEMO_NUMS            .0625         64
1 row selected.
Comparing TRUNCATE vs DELETE
Advantages
QUICK – No Rollback Segments
Resets High Water Mark
Disadvantages
No ability to ROLLBACK
This is DDL (Implicit COMMIT)
All or Nothing
How can you lower the high water mark when deleting half of the records?
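The slide leaves this as a question. Two commonly used options are sketched below as alternatives; they are not taken from the original talk. The shrink option assumes the table sits in a tablespace with automatic segment space management.

```sql
-- Option 1: shrink the segment in place. Row movement must be enabled first.
ALTER TABLE size_demo_nums ENABLE ROW MOVEMENT;
ALTER TABLE size_demo_nums SHRINK SPACE;

-- Option 2: rebuild the segment. This needs room for a second copy of the
-- remaining data, and indexes on the table are left UNUSABLE until rebuilt.
ALTER TABLE size_demo_nums MOVE;

-- Either way, check the result.
SELECT segment_name
     , bytes/power ( 1024,2) mb
FROM dba_segments
WHERE segment_name = 'SIZE_DEMO_NUMS';
```

Unlike TRUNCATE, both options keep the rows that were not deleted.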
Indexes
• Ordered list of values in a table column
  • Designed to help Oracle find records more quickly
• Indexes are maintained in real time – synchronized
  • If a record is updated, the index is updated, too
• Using an INDEX is a two-step process
1. Look up the value in the index
2. Go to the record indicated
Will Oracle Always Use the Index?
• Steve Catmull: The Index Tipping Point
Yeats
LIKE 'T%'
There are too many!
It would be faster just to scan the entire table
Index Selectivity
SELECT owner
     , index_name
     , distinct_keys / num_rows * 100 AS s
     , num_rows
     , distinct_keys
     , leaf_blocks
     , blevel
     , avg_leaf_blocks_per_key
FROM dba_indexes
WHERE num_rows > 0

The measure of the effectiveness of an index is its SELECTIVITY
Defined as number of distinct values / number of rows in the table
An index over a column of unique values would have a selectivity of 1.
“Highly selective”
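The same measure can be computed from the data before any index exists. A minimal sketch against scott.emp, using the definition above (a fraction between 0 and 1, not the percentage returned by the dictionary query):

```sql
-- Selectivity of a low-cardinality column: few distinct values per row.
SELECT COUNT ( DISTINCT deptno )             AS distinct_values
     , COUNT (*)                             AS total_rows
     , COUNT ( DISTINCT deptno ) / COUNT (*) AS selectivity
FROM scott.emp;

-- Selectivity of a unique column: every value is distinct, so the ratio is 1.
SELECT COUNT ( DISTINCT empno ) / COUNT (*) AS selectivity
FROM scott.emp;
```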
B-Tree index
*Balanced at time of creation; B-Tree indexes can become unbalanced over time
• “Normal” index in Oracle
• Balanced: Records are divided in half *
• Then again…
• … And again
• NULL Values are not indexed
• B-Tree indexes can be very large: Often
larger than the table itself
• They can be updated very efficiently
Unbalanced Tree:
Depending on the field, over time, all of the new
records can be created on one half of the tree, leading
to an unbalanced tree.
B-Tree Indices
[Diagram: B-Tree with root 51618; second level 24800 and 84084; third level 17957, 36651, 70140, 89631; a run of sequential values 91001 through 91006]
Oracle steps through index, half of all values are below and half are above
Because of this “median” logic, B-tree indices are ineffective for fields with only a few distinct values
Bitmap Indices
Advantages and Uses:
• Recommended for columns with few unique values
• Indexes NULL values
• Best used in concert with other bitmap indices
Disadvantages:
• Not good in transactional systems • Or any database with frequent DML
• One update causes entire index entry to be rebuilt
• Concurrency issues (locking)
How does a bitmap index work?
• One entry for each unique value • If there are only two distinct values in the field, then there are
only two entries in the index
• Examples:
• Sex: M or F
• Flag fields
• Then, a string of ones and zeros for each record in the table!
A Bitmap Indexes Distinct Values
10 00000010100001
20 10010001001010
30 01101100010100
Unique
Values
Occurs in records...
10 7, 9, 14
20 1, 4, 8, 11, 13
30 2, 3, 5, 6, 10, 12
10
10
10
20
20
20
20
20 Index
entries
Bitmap Indices in a Query
Bitmap Index on MGR
7566 00000001000010
7698 01101000010100
7782 00000000000001
7788 00000000001000
7839 00010110000000
7902 10000000000000
{null} 00000000100000
SELECT * FROM scott.emp WHERE deptno = 30 AND mgr = 7839

Mgr 7839  00010110000000
Dept 30   01101100010100
          ||||||||||||||
AND       00000100000000

Mgr 7839  00010110000000
Dept 30   01101100010100
          ||||||||||||||
OR        01111110010100
Notice also:
A BITMAP index
indexes NULL
values, too
• Power of bitmap Indices comes from combining results from multiple bitmap indices • Oracle will not use two
B-Tree indices from the same table, but it will use multiple bitmaps
• This can make bitmaps highly selective
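A sketch of the setup behind the AND example. The index names are hypothetical, and whether the optimizer actually combines the bitmaps depends on statistics and table size. The last query only reproduces the slide's bit arithmetic with BIN_TO_NUM and BITAND.

```sql
-- Hypothetical index names on the slide's two columns.
CREATE BITMAP INDEX emp_deptno_bix ON scott.emp ( deptno );
CREATE BITMAP INDEX emp_mgr_bix    ON scott.emp ( mgr );

-- With both indexes in place, the optimizer may AND the two bitmaps.
SELECT * FROM scott.emp WHERE deptno = 30 AND mgr = 7839;

-- The AND from the slide. The result, 256, is binary 00000100000000:
-- only the sixth record satisfies both conditions.
SELECT BITAND ( BIN_TO_NUM ( 0,0,0,1,0,1,1,0,0,0,0,0,0,0 )    -- mgr 7839
              , BIN_TO_NUM ( 0,1,1,0,1,1,0,0,0,1,0,1,0,0 ) )  -- deptno 30
       AS matching_records
FROM DUAL;
```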
What's the downside of Bitmap Indices?
• Does it lock the table?
  • Well, no
• But....
  • An UPDATE will lock every record in the index for both old and new values
7369 SMITH 20
7499 ALLEN 30
7521 WARD 30
7566 JONES 20
7654 MARTIN 30
7698 BLAKE 30
7782 CLARK 10
7788 SCOTT 20
7839 KING 10
7844 TURNER 30
7876 ADAMS 20
7900 JAMES 30
7902 FORD 20
7934 MILLER 10
10 00000010100001
20 10010001001010
30 01101100010100
UPDATE emp SET deptno = 20 WHERE empno = 7934
7934 MILLER 10
10 00000010100000
20
10 00000010100000
20 10010001001011
Index bitmaps for both values must be rebuilt
With UPDATE, the entire affected rows must be rebuilt
For INSERT and DELETE, the entire index must be rebuilt
Index Comparison
B-TREE
• Entries point to ROWIDs
• Only one B-Tree index per table will be used for any query
• NULL values are not indexed
• Better performance for transactional systems
• Can be much larger than the table
• Not efficient for fields with few distinct values
• Can become unbalanced over time

BITMAP
• Series of 0 and 1
• Pivoted
• Cannot be used as a Primary Key
• Multiple bitmap indices can be used in one query
• Works well with multiple WHERE clause conditions
• NULL values are indexed
• Poor performance on updates and deletes
• Generally small – compress well
• Often quicker to DROP the index and CREATE again than to update and insert
Index Issues
• Indices are kept in synch with table changes • Oracle maintains the index at the same time records are inserted, updated
or deleted
• Reading an index is a two-step process • For a large number of records, index may not necessarily be the fastest
Compression
• Why? What is Compression?
• How to compress
• Compression Examples
• MOVE: Compressing Uncompressed Data
Why Compress?
• Compression saves storage (disk space) • More records fit into each block
• Faster queries • Fewer blocks must be read
Compression
• Oracle compression algorithms
  • BASIC
  • ADVANCED ROW (OLTP)
  • INDEX
  • ( ARCHIVE – Exadata )
• Compression can be applied to
  • Tables
  • Partitions
  • Tablespace
• How?
  • Table must have COMPRESS attribute
    • CREATE TABLE or ALTER TABLE
  • Direct path insert
    1. INSERT /*+ append */ hint, or…
    2. CREATE TABLE AS SELECT
EXTRA LICENSE COST
Simplified Schematic for Compression
• Oracle compression is really de-duplication of data
  • If the same value appears multiple times in the block, store it only once
• Token Map stores repeated values in block header
• Repeated values are replaced by the tokens
• Tokens can represent repeated combinations of tokens, too

CUST_ID|CUST_NM|CUST_CITY|CUST_ST
101    |HERBERT|ALPINE   |UT
102    |HOLMES |DRAPER   |UT
103    |HOLMES |PRESTON  |ID
104    |HECKER |DRAPER   |UT

V1:UT|V2:DRAPER|V3:HOLMES
CUST_ID|CUST_NM|CUST_CITY|CUST_ST
101    |HERBERT|ALPINE   |V1
102    |V3     |V2       |V1
103    |V3     |PRESTON  |ID
104    |HECKER |V2       |V1

V1:UT|V2:DRAPER|V3:HOLMES|V4:V2||V1
CUST_ID|CUST_NM|CUST_CITY|CUST_ST
101    |HERBERT|ALPINE   |V1
102    |V3     |V4
103    |V3     |PRESTON  |ID
104    |HECKER |V4
Comparing Data Dictionary Entries
CREATE TABLE a_heap_table
( rec_id   NUMBER
, creature VARCHAR2(100)
) ;

CREATE TABLE a_compressed_table
( rec_id   NUMBER
, creature VARCHAR2(100)
) COMPRESS;

SELECT table_name, compression, compress_for, pct_free
FROM all_tables
WHERE table_name = ANY ('A_HEAP_TABLE','A_COMPRESSED_TABLE');

TABLE_NAME                     COMPRESSION COMPRESS_FOR   PCT_FREE
------------------------------ ----------- ------------ ----------
A_HEAP_TABLE                   DISABLED                         10
A_COMPRESSED_TABLE             ENABLED     BASIC                 0
2 rows selected.

This is the default
By default, Oracle assumes PCTFREE will be 0 when table is compressed
Ready to Insert 10,000,000 Records
  REC_ID CREATURE
       1 Zamp
       2 Wasket
       3 Noothgrush
       4 Yottle
       5 Nureau
       6 Yeps
       7 Wocket
       8 Zamp
Values repeat every seven records
 9999998 Zamp
 9999999 Wasket
10000000 Noothgrush
WITH nums AS
( SELECT level rec_id
  FROM DUAL
  CONNECT BY level <= power ( 10,7)
)
, creatures AS
( SELECT 1 AS id, 'Wocket' AS creature FROM DUAL UNION ALL
  SELECT 2 AS id, 'Zamp' AS creature FROM DUAL UNION ALL
  SELECT 3 AS id, 'Wasket' AS creature FROM DUAL UNION ALL
  SELECT 4 AS id, 'Noothgrush' AS creature FROM DUAL UNION ALL
  SELECT 5 AS id, 'Yottle' AS creature FROM DUAL UNION ALL
  SELECT 6 AS id, 'Nureau' AS creature FROM DUAL UNION ALL
  SELECT 7 AS id, 'Yeps' AS creature FROM DUAL
)
SELECT rec_id, creature
FROM nums
JOIN creatures
  ON id = MOD ( rec_id, 7) + 1
Table Compression Comparison
Inserting 10 million records

CREATE TABLE a_heap_table
( rec_id   NUMBER
, creature VARCHAR2(100)
);
INSERT INTO a_heap_table . . .

CREATE TABLE a_compressed_table
( rec_id   NUMBER
, creature VARCHAR2(100)
) COMPRESS;
INSERT /*+ append */ INTO a_compressed_table . . .

Two elements must be present for compression:
1) Table must have COMPRESS attribute (CREATE TABLE or ALTER TABLE)
2) Direct path insert (“append” hint)

SELECT segment_name
     , bytes/power ( 1024,2) mb
FROM dba_segments
WHERE segment_name = ANY ( 'A_HEAP_TABLE','A_COMPRESSED_TABLE') ;

SEGMENT_NAME                 MB
-------------------- ----------
A_HEAP_TABLE                172
A_COMPRESSED_TABLE          128
2 rows selected

In this example, table storage was compressed by 26%
Updates on a Compressed Table
        Before Update   After Update
Size    128 mb          208 mb (63% more)
Blocks  8192 blocks     13,312 blocks (63% more)

UPDATE a_compressed_table
SET creature = 'Zizzer-Zazzer-Zuzz'
WHERE MOD ( rec_id, 4 )= 3;
COMMIT;

The newly updated values are not compressed.

SELECT segment_name --,bytes
     , bytes/power ( 1024,2) mb
FROM dba_segments
WHERE segment_name = 'A_COMPRESSED_TABLE';

SEGMENT_NAME                   MB
---------------------- ----------
A_COMPRESSED_TABLE            208
1 row selected.
How to Regain Compression after Updates
1. INSERT INTO temp_copy...
2. TRUNCATE TABLE original
3. INSERT /*+ append */ INTO compressed_table...

OR…

ALTER TABLE original MOVE;

CAVEAT: In either of these cases, you must have available space to create a copy

ALTER TABLE a_compressed_table MOVE;

SELECT segment_name --,bytes
     , bytes/power ( 1024,2) mb
FROM dba_segments
WHERE segment_name = 'A_COMPRESSED_TABLE';

SEGMENT_NAME                   MB
---------------------- ----------
A_COMPRESSED_TABLE            120
1 row selected.
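The agenda item "MOVE: Compressing Uncompressed Data" can be sketched the same way for a table that was created without COMPRESS. The index name a_heap_table_ix is hypothetical; it stands in for whatever indexes exist on the table.

```sql
-- Rebuild the uncompressed table as a compressed segment in one statement.
ALTER TABLE a_heap_table MOVE COMPRESS;

-- MOVE changes every ROWID, so existing indexes on the table become UNUSABLE.
-- a_heap_table_ix is a hypothetical index name.
ALTER INDEX a_heap_table_ix REBUILD;

-- Find any indexes that still need a rebuild.
SELECT index_name, status
FROM user_indexes
WHERE table_name = 'A_HEAP_TABLE';
```

The same caveat applies: the move needs enough free space for a second copy of the data.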
Creating a New Table with Same data
CREATE TABLE a_compressed_table_new
( rec_id   NUMBER
, creature VARCHAR2(100)
) COMPRESS;

INSERT /*+ append */ INTO a_compressed_table_new
SELECT *
FROM a_compressed_table ;
COMMIT;

SELECT segment_name
     , bytes/power ( 1024,2) mb
FROM dba_segments
WHERE segment_name = ANY ( 'A_COMPRESSED_TABLE','A_COMPRESSED_TABLE_NEW');

SEGMENT_NAME                   MB
---------------------- ----------
A_COMPRESSED_TABLE            208
A_COMPRESSED_TABLE_NEW        120
2 rows selected.
Table Partitioning
• Divides a large table into smaller, more manageable units
• Choose a field to be used as the partition field
• Partition by: RANGE, LIST, HASH
• Oracle supports sub-partitioning, too • You can have partitions of your partitions!
Do not confuse
table
partitioning with
PARTITION BY
from analytic
SQL
Same word;
vastly different
concepts!
Why Partition?
• Two basic reasons to partition a table:
  • Query Performance
  • Data Loading
• Improve query performance
  • Partition pruning
• Data Loads
  • Partition swaps for loading
• Additional benefits
  • Can aid table management
    • Backups
    • Recovery
    • Compression -> Reorganization
• Table partitions may have different attributes
  • Compression, TABLESPACE
Partitioning Choices: RANGE vs LIST
sales_dt    store_no  sales_amt
2/12/2014   10        183.95
6/16/2014   20        450.00
3/09/2014   20        195.99
11/04/2013  10        25.00
12/22/2014  20        1992.16
5/01/2015   20        106.50
10/20/2013  10        35.33

PARTITION BY RANGE ( sales_dt )

VALUES LESS THAN DATE '2014-01-01'
sales_dt    store_no  sales_amt
11/04/2013  10        25.00
10/20/2013  10        35.33

VALUES LESS THAN DATE '2015-01-01'
sales_dt    store_no  sales_amt
2/12/2014   10        183.95
6/16/2014   20        450.00
3/09/2014   20        195.99
12/22/2014  20        1992.16

VALUES LESS THAN MAXVALUE
sales_dt    store_no  sales_amt
5/01/2015   20        106.50

PARTITION BY LIST ( store_no )

VALUES ( 20 )
sales_dt    store_no  sales_amt
6/16/2014   20        450.00
3/09/2014   20        195.99
12/22/2014  20        1992.16
5/01/2015   20        106.50

VALUES ( 10, 30 )
sales_dt    store_no  sales_amt
2/12/2014   10        183.95
11/04/2013  10        25.00
10/20/2013  10        35.33

VALUES ( DEFAULT )
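The two schemes might be declared as follows. This is a sketch: the table names sales_by_range and sales_by_list and all partition names except S20 (which appears in a later query) are assumptions, and the column datatypes are guessed from the sample rows.

```sql
-- RANGE: each partition holds the rows below its upper bound.
CREATE TABLE sales_by_range
( sales_dt  DATE
, store_no  NUMBER
, sales_amt NUMBER(10,2)
)
PARTITION BY RANGE ( sales_dt )
( PARTITION p2013 VALUES LESS THAN ( DATE '2014-01-01' )
, PARTITION p2014 VALUES LESS THAN ( DATE '2015-01-01' )
, PARTITION pmax  VALUES LESS THAN ( MAXVALUE )
);

-- LIST: each partition holds an explicit set of values.
CREATE TABLE sales_by_list
( sales_dt  DATE
, store_no  NUMBER
, sales_amt NUMBER(10,2)
)
PARTITION BY LIST ( store_no )
( PARTITION s20     VALUES ( 20 )
, PARTITION s10_30  VALUES ( 10, 30 )
, PARTITION s_other VALUES ( DEFAULT )
);
```

The MAXVALUE and DEFAULT partitions catch rows that fit no other partition, so inserts never fail for lack of a matching partition.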
Partition Pruning
Two examples
Partition Pruning
• In queries, allows the optimizer to narrow down to one or a few partitions
• When filtering criterion matches partitioning scheme
SALES
Partitioned on STORE_NO

sales_dt    store_no  sales_amt
6/16/2012   20        450.00
3/09/2012   20        195.99
12/22/2012  20        1992.16
5/01/2013   20        106.50

sales_dt    store_no  sales_amt
2/12/2013   10        183.95
11/04/2011  10        25.00
10/20/2011  10        35.33

Use Partitioned field in WHERE clause
SELECT SUM ( sales_amt )
FROM sales
WHERE store_no = 10
  AND sales_dt >= DATE '2013-01-01'

Use PARTITION keyword in FROM clause
SELECT SUM ( sales_amt )
FROM sales PARTITION ( S20 )
WHERE sales_dt >= DATE '2013-01-01'
Partition Swap
BIG_TABLE partitions: OCT, SEP, AUG, JUL, JUN
LOAD_TABLE: Load October refresh data into this table
LOAD_TABLE must have the same column structure as BIG_TABLE

ALTER TABLE big_table
EXCHANGE PARTITION oct WITH TABLE load_table;
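The whole swap can be sketched in three steps. The source october_source is a hypothetical placeholder for wherever the refreshed data comes from, and building LOAD_TABLE with CREATE TABLE AS SELECT is just one way to get a matching column structure.

```sql
-- 1. Empty staging table with the same columns as BIG_TABLE.
CREATE TABLE load_table AS
SELECT * FROM big_table WHERE 1 = 0;

-- 2. Load the October refresh (october_source is a hypothetical source).
INSERT /*+ append */ INTO load_table
SELECT * FROM october_source;
COMMIT;

-- 3. Swap the loaded segment with the OCT partition.
ALTER TABLE big_table
EXCHANGE PARTITION oct WITH TABLE load_table;
```

After the exchange, LOAD_TABLE holds whatever was previously in the OCT partition, which makes it easy to swap back if the load turns out to be wrong.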
A very large table…
DAN.GL_AUDIT_EOM_VRSN2_BASE
257,458,739 ROWS   14.6 gb   957,504 blocks
Contains data since January 2010 (68 months)
Partition data by month (Divide by 68 )
3,786,158 ROWS   220 mb   14,081 blocks
2010: 12 partitions
2011: 12 partitions
2012: 12 partitions
2013: 12 partitions
2014: 12 partitions
2015: 8 partitions
• The explain plans on the next slides are based on this table
Specifying Partition in the Query
EXPLAIN PLAN FOR
SELECT *
FROM gl_audit_eom_vrsn2_base PARTITION (gl_audit_eom_2015_04);

--------------------------------------------------------------------------------
| Id | Operation           | Name                    | Rows  | Pstart| Pstop |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |                         | 3801K |       |       |
|  1 | PX COORDINATOR      |                         |       |       |       |
|  2 | PX SEND QC (RANDOM) | :TQ10000                | 3801K |       |       |
|  3 | PX BLOCK ITERATOR   |                         | 3801K |    65 |    65 |
|* 4 | TABLE ACCESS FULL   | GL_AUDIT_EOM_VRSN2_BASE | 3801K |    65 |    65 |
--------------------------------------------------------------------------------
Partition Field in WHERE Clause
EXPLAIN PLAN FOR
SELECT *
FROM gl_audit_eom_vrsn2_base
WHERE gl_post_dt BETWEEN DATE '2015-04-01' AND DATE '2015-04-30'

--------------------------------------------------------------------------------
| Id | Operation           | Name                    | Rows  | Pstart| Pstop |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |                         | 3801K |       |       |
|  1 | PX COORDINATOR      |                         |       |       |       |
|  2 | PX SEND QC (RANDOM) | :TQ10000                | 3801K |       |       |
|  3 | PX BLOCK ITERATOR   |                         | 3801K |    65 |    65 |
|* 4 | TABLE ACCESS FULL   | GL_AUDIT_EOM_VRSN2_BASE | 3801K |    65 |    65 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("GL_POST_DT"<=TO_DATE(' 2015-04-30 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
“Partition Pruning”
With a function on the Partitioned Field
EXPLAIN PLAN FOR
SELECT *
FROM gl_audit_eom_vrsn2_base
WHERE TRUNC( gl_post_dt ) BETWEEN DATE '2015-04-01' AND DATE '2015-04-30'

----------------------------------------------------------------------------------
| Id | Operation           | Name                    | Rows  | Pstart| Pstop |
----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT    |                         | 20636 |       |       |
|  1 | PX COORDINATOR      |                         |       |       |       |
|  2 | PX SEND QC (RANDOM) | :TQ10000                | 4973K |       |       |
|  3 | VIEW                | VW_TE_2                 | 4973K |       |       |
|  4 | UNION-ALL           |                         |       |       |       |
|  5 | PX BLOCK ITERATOR   |                         |  524K |KEY(OR)|KEY(OR)|
|* 6 | TABLE ACCESS FULL   | GL_AUDIT_EOM_VRSN2_BASE |  524K |KEY(OR)|KEY(OR)|
|  7 | PX BLOCK ITERATOR   |                         | 4449K |KEY(OR)|KEY(OR)|
|* 8 | TABLE ACCESS FULL   | GL_AUDIT_EOM_VRSN2_BASE | 4449K |KEY(OR)|KEY(OR)|
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
--------------------------------------------------
Cardinality estimate is WAY off
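If the reason for TRUNC is that GL_POST_DT may carry a time of day, one alternative is a half-open range on the bare column. This is a sketch only; the resulting plan was not part of the original talk and should be checked on the real table.

```sql
-- No function on the partitioned field, so the predicate can be compared
-- directly with the partition bounds. The upper bound is exclusive, which
-- also covers any time of day on April 30.
EXPLAIN PLAN FOR
SELECT *
FROM gl_audit_eom_vrsn2_base
WHERE gl_post_dt >= DATE '2015-04-01'
  AND gl_post_dt <  DATE '2015-05-01';

-- Display the plan and look at the Pstart and Pstop columns.
SELECT * FROM TABLE ( DBMS_XPLAN.DISPLAY );
```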
Problems with Denormalizing Partitioned Field
GL_POST_DT GL_POST_YR GL_POST_MNTH
06/17/2015 2015 6
08/10/2015 2015 8
10/08/2014 2014 10
08/13/2015 2015 8
Partitioned Field: GL_POST_DT

• Users request denormalized fields to
  • Ease WHERE clauses
  • Facilitate grouping

Good query; utilizing the partitioned field
WHERE gl_post_dt BETWEEN DATE '2015-04-01' AND DATE '2015-04-30'

Bad query; Oracle doesn't know that the fields are correlated with the partitioned field
WHERE gl_post_yr = 2015 AND gl_post_mnth = 4
Querying on Denormalized Partition Field
EXPLAIN PLAN FOR
SELECT *
FROM gl_audit_eom_vrsn2_base
WHERE gl_post_yr = 2015
  AND gl_post_mnth = 4

--------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                    | Rows  | Pstart| Pstop |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                         | 20636 |       |       |
|   1 | PX COORDINATOR                     |                         |       |       |       |
|   2 | PX SEND QC (RANDOM)                | :TQ10000                | 6649K |       |       |
|   3 | VIEW                               | VW_TE_2                 | 6649K |       |       |
|   4 | UNION-ALL                          |                         |       |       |       |
|   5 | PX PARTITION RANGE OR              |                         | 2199K |KEY(OR)|KEY(OR)|
|*  6 | TABLE ACCESS BY LOCAL INDEX ROWID  | GL_AUDIT_EOM_VRSN2_BASE | 2199K |KEY(OR)|KEY(OR)|
|   7 | BITMAP CONVERSION TO ROWIDS        |                         |       |       |       |
|   8 | BITMAP AND                         |                         |       |       |       |
|*  9 | BITMAP INDEX SINGLE VALUE          | GLAM_V2$POST_MNTH       |       |KEY(OR)|KEY(OR)|
|* 10 | BITMAP INDEX SINGLE VALUE          | GLAM_V2$POST_YR         |       |KEY(OR)|KEY(OR)|
|  11 | PX BLOCK ITERATOR                  |                         | 4449K |KEY(OR)|KEY(OR)|
|* 12 | TABLE ACCESS FULL                  | GL_AUDIT_EOM_VRSN2_BASE | 4449K |KEY(OR)|KEY(OR)|
--------------------------------------------------------------------------------------------------
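Grouping by year and month does not require filtering on the denormalized fields. A sketch of one alternative, with a date range chosen here for illustration: filter on the partitioned field and derive the grouping values from it.

```sql
-- Filter on gl_post_dt so the predicate lines up with the monthly partitions,
-- then derive year and month for grouping.
SELECT EXTRACT ( YEAR  FROM gl_post_dt ) AS gl_post_yr
     , EXTRACT ( MONTH FROM gl_post_dt ) AS gl_post_mnth
     , COUNT (*)                         AS row_cnt
FROM gl_audit_eom_vrsn2_base
WHERE gl_post_dt >= DATE '2015-01-01'
  AND gl_post_dt <  DATE '2015-05-01'
GROUP BY EXTRACT ( YEAR  FROM gl_post_dt )
       , EXTRACT ( MONTH FROM gl_post_dt )
ORDER BY 1, 2;
```

Users still get year and month columns in the result, while the WHERE clause stays on the field Oracle knows how to prune.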
Partitions: Record Count vs Size
196 mb 12,544 blocks
194 mb 12,416 blocks
225 mb 14,400 blocks
804 mb 51,456 blocks ? ?
THANK-YOU!
Dan Stober
Please fill out session surveys
Questions?
Comments?
Corrections?
Complaints?