Greenplum® Database 4.1 Administrator Guide
P/N: 300-012-428
Rev: A03
The Data Computing Division of EMC

Copyright 2011 EMC Corporation. All rights reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.

All other trademarks used herein are the property of their respective owners.
Greenplum Database Administrator Guide 4.1 - Contents

Preface
  About This Guide
  Document Conventions
    Text Conventions
    Command Syntax Conventions
  Getting Support
    Product information
    Technical support
Section I: Introduction to Greenplum
Chapter 1: About the Greenplum Architecture
  About the Greenplum Master
  About the Greenplum Segments
  About the Greenplum Interconnect
  About Redundancy and Failover in Greenplum Database
    About Segment Mirroring
    About Master Mirroring
    About Interconnect Redundancy
  About Parallel Data Loading
  About Management and Monitoring
Chapter 2: About Distributed Databases
  Understanding How Data is Stored
  Understanding Greenplum Distribution Policies
Chapter 3: Summary of Greenplum Features
  Greenplum SQL Standard Conformance
    Core SQL Conformance
    SQL 1992 Conformance
    SQL 1999 Conformance
    SQL 2003 Conformance
    SQL 2008 Conformance
  Greenplum and PostgreSQL Compatibility
Chapter 4: About Greenplum Query Processing
  Understanding Query Planning and Dispatch
  Understanding Greenplum Query Plans
  Understanding Parallel Query Execution
Section II: Access Control and Security
Chapter 5: Managing Roles and Privileges
  Security Best Practices for Roles and Privileges
  Creating New Roles (Users)
    Altering Role Attributes
  Creating Groups (Role Membership)
  Managing Object Privileges
  Simulating Row and Column Level Access Control
  Encrypting Data
Chapter 6: Configuring Client Authentication
  Allowing Connections to Greenplum Database
    Editing the pg_hba.conf File
  Limiting Concurrent Connections
  Encrypting Client/Server Connections
Chapter 7: Accessing the Database
  Establishing a Database Session
  Supported Client Applications
    Greenplum Database Client Applications
    pgAdmin III for Greenplum Database
    Database Application Interfaces
    Third-Party Client Tools
  Troubleshooting Connection Problems
Chapter 8: Managing Workload and Resources
  Overview of Greenplum Workload Management
    How Resource Queues Work in Greenplum Database
    Steps to Enable Workload Management
  Configuring Workload Management
  Creating Resource Queues
    Creating Queues with an Active Query Limit
    Creating Queues with Memory Limits
    Creating Queues with Query Planner Cost Limits
    Setting Priority Levels
  Assigning Roles (Users) to a Resource Queue
    Removing a Role from a Resource Queue
  Modifying Resource Queues
    Altering a Resource Queue
    Dropping a Resource Queue
  Checking Resource Queue Status
    Viewing Queued Statements and Resource Queue Status
    Viewing Resource Queue Statistics
    Viewing the Roles Assigned to a Resource Queue
    Viewing the Waiting Queries for a Resource Queue
    Clearing a Waiting Statement From a Resource Queue
    Viewing the Priority of Active Statements
    Resetting the Priority of an Active Statement
Section III: Database Administration
Chapter 9: Defining Database Objects
  Creating and Managing Databases
    About Template Databases
    Creating a Database
    Viewing the List of Databases
    Altering a Database
    Dropping a Database
  Creating and Managing Tablespaces
    Creating a Filespace
    Creating a Tablespace
    Using a Tablespace to Store Database Objects
    Viewing Existing Tablespaces and Filespaces
    Dropping Tablespaces and Filespaces
  Creating and Managing Schemas
    The Default Public Schema
    Creating a Schema
    Schema Search Paths
    Dropping a Schema
    System Schemas
  Creating and Managing Tables
    Creating a Table
    Altering a Table
    Dropping a Table
  Partitioning Large Tables
    Understanding Table Partitioning in Greenplum Database
    Deciding on a Table Partitioning Strategy
    Creating Partitioned Tables
    Loading Partitioned Tables
    Verifying Your Partition Strategy
    Viewing Your Partition Design
    Maintaining Partitioned Tables
  Creating and Using Sequences
    Creating a Sequence
    Using a Sequence
    Altering a Sequence
    Dropping a Sequence
  Using Indexes in Greenplum Database
    Index Types
    Creating an Index
    Examining Index Usage
    Managing Indexes
    Dropping an Index
  Creating and Managing Views
    Creating Views
    Dropping Views
Chapter 10: Managing Data
  About Concurrency Control in Greenplum Database
  Inserting New Rows
  Updating Existing Rows
  Deleting Rows
    Truncating a Table
  Working With Transactions
    Transaction Isolation Levels
  Vacuuming the Database
    Configuring the Free Space Map
Chapter 11: Querying Data
  Defining Queries
    SQL Lexicon
    SQL Value Expressions
  Using Functions and Operators
    Using Functions in Greenplum Database
    User-Defined Functions
    Built-in Functions and Operators
  Query Profiling
    Reading EXPLAIN Output
    Reading EXPLAIN ANALYZE Output
    What to Look for in a Query Plan
Chapter 12: Loading and Unloading Data
  Greenplum Database Loading Tools Overview
    About External Tables
    About gpload
    About COPY
  Loading Data into Greenplum Database
    Accessing File-Based External Tables
  Defining External Tables - Examples
    Using the Greenplum Parallel File Server (gpfdist)
    Using Hadoop Distributed File System (HDFS) Tables
    Creating and Using Web External Tables
    Loading Data Using an External Table
    Handling Load Errors
  Loading Data from Greenplum Database
    Loading Data with gpload
    Loading Data with the gphdfs Protocol
    Loading Data with COPY
    Data Loading Performance Tips
  Unloading Data from Greenplum Database
    Defining a File-Based Writable External Table
    Defining a Command-Based Writable External Web Table
    Unloading Data Using a Writable External Table
    Unloading Data Using COPY
    Readable External Tables and Query Planner Statistics
  Formatting Data Files
    Formatting Rows
    Formatting Columns
    Representing NULL Values
    Escaping
    Character Encoding
Section IV: System Administration
Chapter 13: Starting and Stopping Greenplum
  Overview
  Starting Greenplum Database
    Restarting Greenplum Database
    Uploading Configuration File Changes Only
    Starting the Master in Maintenance Mode
  Stopping Greenplum Database
Chapter 14: Configuring Your Greenplum System
  About Greenplum Master and Local Parameters
  Setting Configuration Parameters
    Setting a Local Configuration Parameter
    Setting a Master Configuration Parameter
  Viewing Settings of Server Configuration Parameters
  Configuration Parameter Categories
    Connection and Authentication Parameters
    System Resource Consumption Parameters
    Query Tuning Parameters
    Error Reporting and Logging Parameters
    System Monitoring Parameters
    Runtime Statistics Collection Parameters
    Automatic Statistics Collection Parameters
    Client Connection Default Parameters
    Lock Management Parameters
    Workload Management Parameters
    External Table Parameters
    Append-Only Table Parameters
    Database and Tablespace/Filespace Parameters
    Past PostgreSQL Version Compatibility Parameters
    Greenplum Array Configuration Parameters
Chapter 15: Enabling High Availability Features
  Overview of High Availability in Greenplum Database
    Overview of Segment Mirroring
    Overview of Master Mirroring
    Overview of Fault Detection and Recovery
  Enabling Mirroring in Greenplum Database
    Enabling Segment Mirroring
    Enabling Master Mirroring
  Knowing When a Segment is Down
    Enabling Alerts and Notifications
    Checking for Failed Segments
    Checking the Log Files
  Recovering a Failed Segment
    Recovering From Segment Failures
  Recovering a Failed Master
    Restoring Master Mirroring After a Recovery
Chapter 16: Backing Up and Restoring Databases
  Overview of Backup and Restore Operations
    About Parallel Backups
    About Non-Parallel Backups
    About Parallel Restores
    About Non-Parallel Restores
  Backing Up a Database
    Backing Up a Database with gp_dump
    Automating Parallel Backups with gpcrondump
  Restoring From Parallel Backup Files
    Restoring a Database with gp_restore
    Restoring a Database Using gpdbrestore
    Restoring to a Different Greenplum System Configuration
Chapter 17: Expanding a Greenplum System
  Planning Greenplum System Expansion
    System Expansion Overview
    System Expansion Checklist
    Planning New Hardware Platforms
    Planning Initialization of New Segments
    Planning Table Redistribution
  Preparing and Adding Nodes
    Adding New Nodes to the Trusted Host Environment
    Verifying OS Settings
    Validating Disk I/O and Memory Bandwidth
    Integrating New Hardware into the System
  Initializing New Segments
    Creating an Input File for System Expansion
    Running gpexpand to Initialize New Segments
    Rolling Back a Failed Expansion Setup
  Redistributing Tables
    Ranking Tables for Redistribution
    Redistributing Tables Using gpexpand
    Monitoring Table Redistribution
  Removing the Expansion Schema
Chapter 18: Monitoring a Greenplum System
  Monitoring Database Activity and Performance
  Monitoring System State
    Enabling System Alerts and Notifications
    Checking System State
    Checking Disk Space Usage
    Checking for Data Distribution Skew
    Viewing Metadata Information about Database Objects
  Viewing the Database Server Log Files
    Log File Format
    Searching the Greenplum Database Server Log Files
  Using gp_toolkit
Chapter 19: Routine System Maintenance Tasks
  Routine Vacuum and Analyze
    Transaction ID Management
    System Catalog Maintenance
    Vacuum and Analyze for Query Optimization
  Routine Reindexing
  Managing Greenplum Database Log Files
    Database Server Log Files
    Management Utility Log Files
Section V: Performance Tuning
Chapter 20: Defining Database Performance
  Understanding the Performance Factors
    System Resources
    Workload
    Throughput
    Contention
    Optimization
  Determining Acceptable Performance
    Baseline Hardware Performance
    Performance Benchmarks
Chapter 21: Common Causes of Performance Issues
  Identifying Hardware and Segment Failures
  Managing Workload
  Avoiding Contention
  Maintaining Database Statistics
    Identifying Statistics Problems in Query Plans
    Tuning Statistics Collection
  Optimizing Data Distribution
  Optimizing Your Database Design
    Greenplum Database Maximum Limits
Chapter 22: Investigating a Performance Problem
  Checking System State
  Checking Database Activity
    Checking for Active Sessions (Workload)
    Checking for Locks (Contention)
    Checking Query Status and System Utilization
  Troubleshooting Problem Queries
  Investigating Error Messages
    Gathering Information for Greenplum Support
Section VI: Extending Greenplum Database
Chapter 23: Using Greenplum MapReduce
  About Greenplum MapReduce
    The Basics of MapReduce
    How Greenplum MapReduce Works
  Programming Greenplum MapReduce
    Defining Inputs
    Defining Map Functions
    Defining Reduce Functions
    Defining Outputs
    Defining Tasks
    Putting Together a Complete MapReduce Specification
  Submitting MapReduce Jobs for Execution
  Troubleshooting Problems with MapReduce Jobs
    Language Does Not Exist
    Generic Python Iterator Error
    Function Defined Using Wrong MODE
Section VII: References
Appendix A: SQL Command Reference
  SQL Syntax Summary
  ABORT
  ALTER AGGREGATE
  ALTER CONVERSION
  ALTER DATABASE
  ALTER DOMAIN
  ALTER EXTERNAL TABLE
  ALTER FILESPACE
  ALTER FUNCTION
  ALTER GROUP
  ALTER INDEX
  ALTER LANGUAGE
  ALTER OPERATOR
  ALTER OPERATOR CLASS
  ALTER RESOURCE QUEUE
  ALTER ROLE
  ALTER SCHEMA
  ALTER SEQUENCE
  ALTER TABLE
  ALTER TABLESPACE
  ALTER TRIGGER
  ALTER TYPE
  ALTER USER
  ANALYZE
  BEGIN
  CHECKPOINT
  CLOSE
  CLUSTER
  COMMENT
  COMMIT
  COPY
  CREATE AGGREGATE
  CREATE CAST
  CREATE CONVERSION
  CREATE DATABASE
  CREATE DOMAIN
  CREATE EXTERNAL TABLE
  CREATE FUNCTION
  CREATE GROUP
  CREATE INDEX
  CREATE LANGUAGE
  CREATE OPERATOR
  CREATE OPERATOR CLASS
  CREATE RESOURCE QUEUE
  CREATE ROLE
  CREATE RULE
  CREATE SCHEMA
  CREATE SEQUENCE
  CREATE TABLE
  CREATE TABLE AS
  CREATE TABLESPACE
  CREATE TRIGGER
  CREATE TYPE
  CREATE USER
  CREATE VIEW
  DEALLOCATE
  DECLARE
  DELETE
  DROP AGGREGATE
  DROP CAST
  DROP CONVERSION
  DROP DATABASE
  DROP DOMAIN
  DROP EXTERNAL TABLE
  DROP FILESPACE
  DROP FUNCTION
  DROP GROUP
  DROP INDEX
  DROP LANGUAGE
  DROP OPERATOR
  DROP OPERATOR CLASS
  DROP OWNED
  DROP RESOURCE QUEUE
  DROP ROLE
  DROP RULE
  DROP SCHEMA
  DROP SEQUENCE
  DROP TABLE
  DROP TABLESPACE
  DROP TRIGGER
  DROP TYPE
  DROP USER
  DROP VIEW
  END
  EXECUTE
  EXPLAIN
  FETCH
  GRANT
  INSERT
  LOAD
  LOCK
  MOVE
  PREPARE
  REASSIGN OWNED
  REINDEX
  RELEASE SAVEPOINT
  RESET
  REVOKE
  ROLLBACK
  ROLLBACK TO SAVEPOINT
  SAVEPOINT
  SELECT
  SELECT INTO
  SET
  SET ROLE
  SET SESSION AUTHORIZATION
  SET TRANSACTION
  SHOW
  START TRANSACTION
  TRUNCATE
  UPDATE
  VACUUM
  VALUES
Appendix B: Management Utility Reference
  Backend Server Programs
  Management Utility Summary
  gp_dump
  gp_restore
  gpaddmirrors
  gpactivatestandby
  gpbitmapreindex
  gpcheck
  gpcheckperf
  gpconfig
  gpcrondump
  gpdbrestore
  gpdeletesystem
  gpdetective
  gpexpand
  gpfdist
  gpfilespace
  gpinitstandby
  gpinitsystem
  gpload
  gplogfilter
  gpmapreduce
  gpmigrator
  gpmigrator_mirror
  gpperfmon_install
  gprecoverseg
  gpscp
  gpseginstall
  gpsetupsanfailover
  gpsnmpd
  gpssh
  gpssh-exkeys
  gpstart
  gpstate
  gpstop
  gpsys1
Appendix C: Client Utility Reference
  Client Utility Summary
  clusterdb
  createdb
  createlang
  createuser
  dropdb
  gp_db_interfaces
  droplang
  dropuser
  ecpg
  pg_config
  pg_dump
  pg_dumpall
  pg_restore
  psql
  reindexdb
  vacuumdb
Appendix D: Server Configuration Parameters
  add_missing_from
  application_name
  array_nulls
  authentication_timeout
  backslash_quote
  block_size
  bonjour_name
  check_function_bodies
  client_encoding
  client_min_messages
  cpu_index_tuple_cost
  cpu_operator_cost
  cpu_tuple_cost
  cursor_tuple_fraction
  custom_variable_classes
  DateStyle
  db_user_namespace
  deadlock_timeout
  debug_assertions
  debug_pretty_print
  debug_print_parse
  debug_print_plan
  debug_print_prelim_plan
  debug_print_rewritten
  debug_print_slice_table
  default_statistics_target
  default_tablespace
  default_transaction_isolation
  default_transaction_read_only
  dynamic_library_path
  effective_cache_size
  enable_bitmapscan
  enable_groupagg
  enable_hashagg
  enable_hashjoin
  enable_indexscan
  enable_mergejoin
  enable_nestloop
  enable_seqscan
  enable_sort
  enable_tidscan
  escape_string_warning
  explain_pretty_print
  extra_float_digits
  from_collapse_limit
  gp_adjust_selectivity_for_outerjoins
  gp_analyze_relative_error
  gp_autostats_mode
  gp_autostats_on_change_threshold
  gp_cached_segworkers_threshold
  gp_command_count
  gp_connectemc_mode
  gp_connections_per_thread
  gp_content
  gp_dbid
  gp_debug_linger
  gp_email_from
  gp_email_smtp_password
  gp_email_smtp_server
  gp_email_smtp_userid
  gp_email_to
  gp_enable_adaptive_nestloop
  gp_enable_agg_distinct
  gp_enable_agg_distinct_pruning
  gp_enable_direct_dispatch
  gp_enable_fallback_plan
  gp_enable_fast_sri
  gp_enable_gpperfmon
  gp_enable_groupext_distinct_gather
  gp_enable_groupext_distinct_pruning
  gp_enable_multiphase_agg
  gp_enable_predicate_propagation
  gp_enable_preunique
  gp_enable_sequential_window_plans
  gp_enable_sort_distinct
  gp_enable_sort_limit
  gp_external_enable_exec
  gp_external_grant_privileges
  gp_external_max_segs
  gp_fts_probe_interval
  gp_fts_probe_threadcount
  gp_fts_probe_timeout
  gp_gpperfmon_send_interval
  gp_hashjoin_tuples_per_bucket
  gp_interconnect_hash_multiplier
  gp_interconnect_queue_depth
  gp_interconnect_setup_timeout
  gp_interconnect_type
  gp_log_format
  gp_max_csv_line_length
  gp_max_databases
  gp_max_filespaces
  gp_max_local_distributed_cache
  gp_max_packet_size
  gp_max_tablespaces
  gp_motion_cost_per_row
  gp_num_contents_in_cluster
  gp_reject_percent_threshold
  gp_reraise_signal
  gp_resqueue_memory_policy
  gp_resqueue_priority
  gp_resqueue_priority_cpucores_per_segment
  gp_resqueue_priority_sweeper_interval
  gp_role
  gp_safefswritesize
  gp_segment_connect_timeout
  gp_segments_for_planner
  gp_session_id
  gp_set_proc_affinity
  gp_set_read_only
  gp_snmp_community
  gp_snmp_monitor_address
  gp_snmp_use_inform_or_trap
  gp_statistics_pullup_from_child_partition
  gp_statistics_use_fkeys
  gp_vmem_idle_resource_timeout
  gp_vmem_protect_limit
  gp_vmem_protect_segworker_cache_limit
  gp_workfile_checksumming
  gp_workfile_compress_algorithm
  gpperfmon_port
  integer_datetimes
  IntervalStyle
  join_collapse_limit
  krb_caseins_users
  krb_server_keyfile
  krb_srvname
  lc_collate
  lc_ctype
  lc_messages
  lc_monetary
  lc_numeric
  lc_time
  listen_addresses
  local_preload_libraries
  log_autostats
  log_connections
  log_disconnections
  log_dispatch_stats
  log_duration
  log_error_verbosity
  log_executor_stats
  log_hostname
  log_min_duration_statement
  log_min_error_statement
  log_min_messages
  log_parser_stats
  log_planner_stats
  log_rotation_age
  log_rotation_size
  log_statement
  log_statement_stats
  log_timezone
  log_truncate_on_rotation
  maintenance_work_mem
  max_appendonly_tables
  max_connections
  max_files_per_process
  max_fsm_pages
  max_fsm_relations
  max_function_args
  max_identifier_length
  max_index_keys
  max_locks_per_transaction
  max_prepared_transactions
  max_resource_portals_per_transaction
  max_resource_queues
  max_stack_depth
  max_statement_mem
  max_work_mem
  password_encryption
  pljava_classpath
  pljava_statement_cache_size
  pljava_release_lingering_savepoints
  pljava_vmoptions
  port
  random_page_cost
  regex_flavor
  resource_cleanup_gangs_on_wait
  resource_select_only
  search_path
  seq_page_cost
  server_encoding
  server_version
  server_version_num
  shared_buffers
  shared_preload_libraries
  ssl
  ssl_ciphers
  standard_conforming_strings
  statement_mem
  statement_timeout
  stats_queue_level
  superuser_reserved_connections
  tcp_keepalives_count
  tcp_keepalives_idle
  tcp_keepalives_interval
  temp_buffers
  TimeZone
  timezone_abbreviations
  track_activities
  track_counts
  transaction_isolation
  transaction_read_only
  transform_null_equals
  unix_socket_directory
  unix_socket_group
  unix_socket_permissions
  update_process_title
  vacuum_cost_delay
  vacuum_cost_limit
  vacuum_cost_page_dirty
  vacuum_cost_page_hit
  vacuum_cost_page_miss
  vacuum_freeze_min_age
  work_mem
Appendix E: Greenplum MapReduce Specification
  • 18.
Greenplum MapReduce Document Format
Greenplum MapReduce Document Schema
Example Greenplum MapReduce Document
  MapReduce Flow Diagram
Appendix F: Greenplum Environment Variables
Required Environment Variables
Optional Environment Variables
Appendix G: Greenplum Database Data Types
Appendix H: System Catalog Reference
gp_configuration_history
gp_distributed_log
gp_distributed_xacts
gp_distribution_policy
gp_fastsequence
gp_fault_strategy
gp_global_sequence
gpexpand.status
gpexpand.status_detail
gp_id
gp_interfaces
gp_master_mirroring
gp_persistent_database_node
gp_persistent_filespace_node
gp_persistent_relation_node
gp_persistent_tablespace_node
gp_relation_node
gp_san_configuration
gp_segment_configuration
gp_pgdatabase
gp_transaction_log
gp_version_at_initdb
gpexpand.expansion_progress
pg_aggregate
pg_am
pg_amop
pg_amproc
pg_appendonly
pg_attrdef
pg_attribute
pg_auth_members
pg_authid
pg_autovacuum
pg_cast
pg_class
pg_constraint
pg_conversion
pg_database
  • 19.
pg_depend
pg_description
pg_exttable
pg_filespace
pg_filespace_entry
pg_index
pg_inherits
pg_language
pg_largeobject
pg_listener
pg_locks
pg_namespace
pg_opclass
pg_operator
pg_partition
pg_partition_columns
pg_partition_rule
pg_partition_templates
pg_partitions
pg_pltemplate
pg_proc
pg_resqueue
pg_resourcetype
pg_resqueue_attributes
pg_resqueue_status - Deprecated
pg_resqueuecapability
pg_rewrite
pg_roles
pg_shdepend
pg_shdescription
pg_stat_activity
pg_stat_operations
pg_stat_partition_operations
pg_stat_resqueues
pg_stat_last_operation
pg_stat_last_shoperation
pg_statistic
pg_tablespace
pg_trigger
pg_type
pg_window
Appendix I: The gp_toolkit Administrative Schema
Checking for Tables that Need Routine Maintenance
  gp_bloat_diag
  gp_stats_missing
Checking for Locks
  gp_locks_on_relation
  gp_locks_on_resqueue
  • 20.
Viewing Greenplum Database Server Log Files
  gp_log_command_timings
  gp_log_database
  gp_log_master_concise
  gp_log_system
Checking Server Configuration Files
  gp_param_setting('parameter_name')
  gp_param_settings_seg_value_diffs
Checking for Failed Segments
  gp_pgdatabase_invalid
Checking Resource Queue Activity and Status
  gp_resq_activity
  gp_resq_activity_by_queue
  gp_resq_priority_statement
  gp_resq_role
  gp_resqueue_status
Viewing Users and Groups (Roles)
  gp_roles_assigned
Checking Database Object Sizes and Disk Space
  gp_size_of_all_table_indexes
  gp_size_of_database
  gp_size_of_index
  gp_size_of_partition_and_indexes_disk
  gp_size_of_schema_disk
  gp_size_of_table_and_indexes_disk
  gp_size_of_table_and_indexes_licensing
  gp_size_of_table_disk
  gp_size_of_table_uncompressed
  gp_disk_free
Checking for Uneven Data Distribution
  gp_skew_coefficients
  gp_skew_idle_fractions
Appendix J: Oracle Compatibility Functions
Installing Oracle Compatibility Functions
Oracle and Greenplum Implementation Differences
Available Oracle Compatibility Functions
decode
nvl
Appendix K: Character Set Support
Setting the Character Set
Character Set Conversion Between Server and Client
Appendix L: SQL 2008 Optional Feature Compliance
Glossary
Index
  • 21. Greenplum Database Administrator Guide 4.1 Preface
Preface
This guide provides information for system administrators and database superusers responsible for administering a Greenplum Database system.
About This Guide
Document Conventions
Getting Support

About This Guide
This guide provides information and instructions for configuring, maintaining, and using a Greenplum Database system. It is intended for system and database administrators responsible for managing a Greenplum Database system.
This guide assumes knowledge of Linux/UNIX system administration, database management systems, database administration, and Structured Query Language (SQL).
Because Greenplum Database is based on PostgreSQL 8.2.15, this guide assumes some familiarity with PostgreSQL. Links and cross-references to PostgreSQL documentation are provided throughout this guide for features that are similar to those in Greenplum Database.
This guide contains the following main sections:
Section I, Introduction to Greenplum, explains the distributed architecture and parallel processing concepts of Greenplum Database.
Section II, Access Control and Security, explains how clients connect to a Greenplum Database system, and how to configure access control and workload management.
Section III, Database Administration, explains how to perform basic database administration tasks such as defining database objects, loading data, writing queries, and managing data.
Section IV, System Administration, explains Greenplum Database system administration tasks such as configuring the server, monitoring system activity, enabling high availability, backing up and restoring databases, and other routine tasks.
Section V, Performance Tuning, provides guidance on identifying and troubleshooting the most common causes of performance issues in Greenplum Database.
Section VI, Extending Greenplum Database, describes how to extend the functionality of Greenplum Database by developing your own functions and programs.
Section VII, References, contains reference documentation for SQL commands, command-line utilities, client programs, system catalogs, and configuration parameters.
  • 22.
Document Conventions
The following conventions are used throughout the Greenplum Database documentation to help you identify certain types of information.
Text Conventions
Command Syntax Conventions

Text Conventions
Table 0.1 Text Conventions
bold
  Usage: Button, menu, tab, page, and field names in GUI applications
  Example: Click Cancel to exit the page without saving your changes.
italics
  Usage: New terms where they are defined; database objects, such as schema, table, or column names
  Examples: The master instance is the postgres process that accepts client connections. Catalog information for Greenplum Database resides in the pg_catalog schema.
monospace
  Usage: File names and path names; programs and executables; command names and syntax; parameter names
  Examples: Edit the postgresql.conf file. Use gpstart to start Greenplum Database.
monospace italics
  Usage: Variable information within file paths and file names; variable information within command syntax
  Examples: /home/gpadmin/config_file, COPY tablename FROM 'filename'
monospace bold
  Usage: Used to call attention to a particular part of a command, parameter, or code snippet.
  Example: Change the host name, port, and database name in the JDBC connection URL: jdbc:postgresql://host:5432/mydb
UPPERCASE
  Usage: Environment variables; SQL commands; keyboard keys
  Examples: Make sure that the Java /bin directory is in your $PATH. SELECT * FROM my_table; Press CTRL+C to escape.
  • 23.
Command Syntax Conventions
Table 0.2 Command Syntax Conventions
{ }
  Usage: Within command syntax, curly braces group related command options. Do not type the curly braces.
  Example: FROM { 'filename' | STDIN }
[ ]
  Usage: Within command syntax, square brackets denote optional arguments. Do not type the brackets.
  Example: TRUNCATE [ TABLE ] name
...
  Usage: Within command syntax, an ellipsis denotes repetition of a command, variable, or option. Do not type the ellipsis.
  Example: DROP TABLE name [, ...]
|
  Usage: Within command syntax, the pipe symbol denotes an OR relationship. Do not type the pipe symbol.
  Example: VACUUM [ FULL | FREEZE ]
$ system_command
# root_system_command
=> gpdb_command
=# su_gpdb_command
  Usage: Denotes a command prompt; do not type the prompt symbol. $ and # denote terminal command prompts. => and =# denote Greenplum Database interactive program command prompts (psql or gpssh, for example).
  Examples:
    $ createdb mydatabase
    # chown gpadmin -R /datadir
    => SELECT * FROM mytable;
    =# SELECT * FROM pg_database;

Getting Support
EMC support, product, and licensing information can be obtained as follows.
Product information
For documentation, release notes, software updates, or for information about EMC products, licensing, and service, go to the EMC Powerlink website (registration required) at: http://Powerlink.EMC.com
  • 24.
Technical support
For technical support, go to Powerlink and choose Support. On the Support page, you will see several options, including one for making a service request. Note that to open a service request, you must have a valid support agreement. Please contact your EMC sales representative for details about obtaining a valid support agreement or with questions about your account.
  • 25. Section I: Introduction to Greenplum
Greenplum Database is a massively parallel processing (MPP) database server based on PostgreSQL open-source technology. MPP (also known as a shared-nothing architecture) refers to systems with two or more processors that cooperate to carry out an operation, each processor with its own memory, operating system, and disks. Greenplum leverages this high-performance system architecture to distribute the load of multi-terabyte data warehouses, and can use all of a system's resources in parallel to process a query.
Greenplum Database is essentially several PostgreSQL database instances acting together as one cohesive database management system. It is based on PostgreSQL 8.2.15, and in most cases is very similar to PostgreSQL with regard to SQL support, features, configuration options, and end-user functionality. Database users interact with Greenplum Database as they would with a regular PostgreSQL DBMS.
The internals of PostgreSQL have been modified or supplemented to support the parallel structure of Greenplum Database. For example, the system catalog, query planner, optimizer, query executor, and transaction manager components have been modified and enhanced to execute queries in parallel across all of the PostgreSQL database instances at once. The Greenplum interconnect (the networking layer) enables communication between the distinct PostgreSQL instances and allows the system to behave as one logical database.
Greenplum Database also includes features designed to optimize PostgreSQL for business intelligence (BI) workloads. For example, Greenplum has added parallel data loading (external tables), resource management, query optimizations, and storage enhancements that are not found in regular PostgreSQL. Many features and optimizations developed by Greenplum do make their way back into the PostgreSQL community.
For example, table partitioning is a feature developed by Greenplum which is now in standard PostgreSQL.
To learn more about Greenplum Database, refer to the following topics:
About the Greenplum Architecture
About Distributed Databases
About Greenplum Query Processing
Summary of Greenplum Features
  • 26. Greenplum Database Administrator Guide 4.1 Chapter 1: About the Greenplum Architecture
1. About the Greenplum Architecture
Greenplum Database is able to handle the storage and processing of large amounts of data by distributing the load across several servers or hosts. A database in Greenplum is actually an array of individual PostgreSQL databases, all working together to present a single database image. The master is the entry point to the Greenplum Database system. It is the database instance where clients connect and submit SQL statements. The master coordinates the work with the other database instances in the system, the segments, which handle data processing and storage.
Figure 1.1 High-Level Greenplum Database Architecture
This section describes all of the components that make up a Greenplum Database system, and how they work together:
About the Greenplum Master
About the Greenplum Segments
About the Greenplum Interconnect
About Redundancy and Failover in Greenplum Database
About Parallel Data Loading
About Management and Monitoring
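The division of labor between the master and the segments can be illustrated with a small, purely conceptual Python simulation. This is a sketch under stated assumptions, not Greenplum code: the hash function (zlib.crc32), the segment count of four, and the names segment_for, distribute, segment_partial_sum, and master_total are all hypothetical, and threads stand in for separate segment hosts. It shows rows being spread across segments by a key, each segment summing only the rows it owns, and the master combining the partial results into the same answer a single database would return.

```python
"""Conceptual model of master/segment query processing (illustrative only).

Assumptions: rows are placed on segments by hashing a key with zlib.crc32,
each segment aggregates only its own rows, and the master combines the
partial results. Threads stand in for separate segment hosts.
"""
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def segment_for(key: str, num_segments: int) -> int:
    # Deterministic hash, so the same key always lands on the same segment.
    return crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments):
    """Split (key, amount) rows into one private list per segment."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def segment_partial_sum(segment_rows):
    """Work done by one segment: it sees only the rows it stores."""
    return sum(amount for _, amount in segment_rows)


def master_total(rows, num_segments=4):
    """Master: dispatch the work to every segment concurrently, then combine."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(segment_partial_sum, segments))
    return sum(partials), partials


if __name__ == "__main__":
    sales = [("alice", 10), ("bob", 5), ("carol", 7), ("alice", 3), ("dave", 1)]
    total, partials = master_total(sales)
    print("partial sums per segment:", partials)
    print("total:", total)  # 26, the same result as a single-node SUM
```

Because every row for a given key hashes to the same segment, the segments never need to share storage; only the small partial results travel back to the master, which is the essence of the shared-nothing design described above.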
  • 27. Greenplum Database Administrator Guide 4.1 Chapter 1: About the Greenplum