praveen srivatsa director| astrhasoft consulting blogs.asthrasoft.com/praveens | [email protected]
TRANSCRIPT
Designing VLDBs
Praveen SrivatsaDirector| AstrhaSoft Consultingblogs.asthrasoft.com/praveens | [email protected]
Table and Index Partitioning
Designed for …VLDB w/very large tables (100’s GB) Large machines with 8, 16, 32, and more ‘real’ CPUs Replacing partitioned views where the partitions are in a single database Improving large data set manageability • Loading • Efficient dropping of whole partitions
How do you do Partitioning?
• Before a table is partitioned two things have to be created, a partition Function and Schema
• Partition FunctionEach row of an index/table is assigned to a partition (numbered 1, 2, 3, ...) by using a “Partition Function”• SQL Server 2005 supports Range partitioning only on a single
column • User defines the key column, the number of partitions, and the
partition boundary points • Partition Scheme
Each partition for a partition function is mapped to a physical storage location (Filegroup) through a “Partition Scheme”
Rules For Partitions
Restrictions are derived from the principles of no data movement and no scans All target indexes must exist in the source All indexes must be aligned (use the same partition function as was used on the table)Corresponding target index is in the same filegroup as the source There must be a constraint on the source to ensure all its data “fits” into the target partition
Example Code
CREATE PARTITION FUNCTION testtablepart_fn (int) AS RANGE LEFT FOR VALUES (-1,10, 20, 30)
CREATE PARTITION SCHEME testtablepart_ps AS PARTITION testtablepart_fn TO([Filegroup4], [Filegroup1], [Filegroup1], [Filegroup3], [Filegroup2])The order of filegroups being displayed is significant
CREATE TABLE Employees (EmpId int, EmpName varchar(50)) ON testtablepart_ps(EmpId)
Partitioning Key restrictions
• Must be based on columns in the table • Restrictions similar to index key limitations
Only 1 column is permitted No Text/NText/Image in partition key No varchar(max) No timestamp Only “native types” – no user-defined types Column values must be deterministic Computed columns must be persisted (a new feature in 2005)Maximum 1000 partitions per table in SQL Server 2005 • (practical limitation, not tested beyond 1000)
All partitions of a single table or index must be in a single database• Partitioned views can be used in conjunction with table partitioning to span
databases and servers
Partitioning Benefits• Allow easy management of very large
tables and indexes (data scale-up)For example Fast Insert or Delete of large quantities of data (per-partition) Index defragmentation or rebuild on one partition using ALTER INDEX … PARTITION (<num>)
• In some cases may experience some performance improvements for some operators, e.g. hash join would be more efficient
take advantage of collocation joins of large tables
Partitioning Benefits• Without interfering with access to the rest of a
table:Add 100's of millions of rows to a table that already has billions of rowsDelete 100's of millions of rows from that table
• Solution … use partitions with ranges• Very fast (1 second), just metadata changes
Switch data in and out • Table and partition of another table • Two partitions of different tables
Bulk data loading (into separate table - then switch into main table)Index maintenance per partition
• Two key best practices to improve manageability Index Alignment (partitioned like the table)Storage Alignment (table partitions in the same filegroup\file)
Partitioning Benefits• When you have to BCP 100 million rows into a table that
already has 600 million row, need to reindex Table has indexes => load is slow; about 10+ times slower than loading into a heap and creating indexes afterwardsReindexing made table unavailable Using a partitioned table, • can load into the table, and index it • while everyone still using the rest of the partitions • When done loading and indexing, just switch it in
• Delete is slow – deleting rows is orders of magnitude slower than truncating a multi-GB table
But you don’t want to truncate the table, just delete 20 or 30% of it With partitioning, just switch out the unneeded partition
• Cost of running a utility usually grows linearly with the table size
DBCC CHECKTABLE, use it on a partition at a time
Table Partitioning and Switching
• Bring a ‘partition’ of new data in or take a ‘partition’ of old data out of a partitioned table very fast
A table has 12 billion rows divided into 12 monthsYou want to archive off the 13th month and then bring in the new monthYou have partitioned by month
• Data is ‘moved’ (not really) between tables by SWITCHing the pointers to physical data locations
• DATA IS NOT MOVED or even scanned, so size doesn’t matter
It’s a metadata operationTakes a second
SWITCHing Partition In
Q1Q1 Q2Q2 Q3Q3 Q4 = Q4 = EmptyEmpty
Table1
New Q4New Q4
Table2
ALTER TABLE Table2 SWITCH TO Table1 PARTITION 4ALTER TABLE Table2 SWITCH TO Table1 PARTITION 4
Example Switch CodeALTER TABLE table_name1 SWITCH [PARTITION <partition_number1>] TO table_name2 [PARTITION <partition_number2>]
table_name1 is called “source” and table_name2 “target” of the SWITCH
Target table (or partition_number2) is empty
SWITCH Partition• Performance – allows building new and
removing old partition fast • Availability – allows adding new and
removing old partition with minimal downtime Note: Schema modification lock is acquired for the duration of the ALTER TABLE … SWITCH
• Per-partition manageability – enables taking single partition out and run utility on it
CREATE INDEX, ALTER INDEX … REBUILDAnd other operations that are not intended to work on individual partition numbers, DBCC CHECKTABLE
• Efficiently supports Sliding Window scenario
Indexes can also be Partitioned
Syntax is the same as for tables “ON PartitionScheme(col)”
They may be partitioned differently from the base table However, index alignment is the best practice
If an index uses a similar partition function and same partitioning key as the base table, then the index is “aligned” Similar means the same number of partitions and same boundary points One-to-one correspondence between data in table and index partition • All index entries in one partition map to data in a single partition of
the base table
Aligned Index One-to-one partition correspondence
Table Partitions
Index Partitions
Can I Change a Partition?• Yes, but do you want to? Not if there are already 100’s of
millions of rows • Plan partitioning very carefully and avoid a huge I/O
problem later• But if you have to add or remove a partition:
SPLIT – adding a partition • Taking 1 partition and splitting it into 2 partitions• ALTER PARTITION FUNCTION pfname() SPLIT RANGE
(new_boundary_value)MERGE – removing a partition • Joining 2 partitions into 1• ALTER PARTITION FUNCTION pfname() MERGE RANGE
(old_boundary_value)• All affected tables are locked during the operation by
EXCLUSIVE lock • Logging is in effect similarly as in INSERT INTO … SELECT
FROM• No data moves in or out of the table; only rows of tables
and indexes are moved from one partition to anotherThis is the I/O killer, so plan ahead of time to avoid doing this
Adding and Removing Partition
F TP
F K TP
11 443322
11 554422 33
MERGEMERGE SPLITSPLIT
Partition NumbersPartition Numbers
Partitioning Best Practices• Plan Plan Plan ahead and save yourself from pain• Ensure minimal data movement for
SPLIT MERGE
• Have many partition functions so any alteration of one function will have limited impact
• Use storage and index alignment• Empty partition should be the one that is “SPLIT”• Watch how filegroups are assigned to partitions• Spread the filegroups over many drives – let SQL
Server handle the I/O distribution (e.g. each partition on a separate disk may not be the best choice)
Feedback / QnA
Your Feedback is Important!Please take a few moments to fill out our
online feedback form
For detailed feedback, use the form at http://www.connectwithlife.co.in/vtd/helpdesk.aspx
Or email us at [email protected]
Use the Question Manager on LiveMeeting to ask your questions now!
© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.