Teradata Query Optimization Guidelines

Upload: puneetswarnkar

Post on 03-Mar-2018


  • 7/26/2019 Teradata Query Optimization Guidelines


    Introduction

    Optimization is the technique of selecting the least expensive (fastest) plan for a query to fetch its results. The optimizer considers the possible query plans for a given input query and attempts to determine which of those plans will be the most efficient.

    Teradata performance tuning is the practice of improving this process so that a query runs faster with minimal use of CPU resources.

    The typical goal of SQL optimization is to get the result (data set) with fewer computing resources consumed and/or a shorter response time.

    Query Optimization Process

    The following list gives the logical sequence of the processes undertaken by the Optimizer as it optimizes a DML request. The processes listed here do not include the influence of parameterized value peeking to determine whether the Optimizer should generate a specific plan or a generic plan for a given request.

    The input to the Optimizer is the Query Rewrite ResTree. The Optimizer then produces the optimized white tree, which it passes to an Optimizer subcomponent called the Generator.

    The Optimizer engages in the following process stages.

    1. Receives the Query Rewrite ResTree as input.
    2. Processes correlated subqueries by converting them to unnested SELECTs or simple joins.
    3. Processes non-correlated subqueries by materializing the subquery and placing its value in the USING row for the query regardless of whether the subquery is on the LHS or the RHS of the operator in the predicate.
    4. Searches for a relevant join or hash index.
    5. Materializes subqueries to spool files.
    6. Analyzes the materialized subqueries for optimization possibilities.
        a. Separates conditions from one another.
        b. Pushes down predicates.
        c. Generates connection information.
        d. Locates any complex joins.
        e. Discovers aggregations and opportunities for partial group by optimizations.
    7. Generates size and content estimates of spool files required for further processing.
    8. Generates an optimal single-table access path.
    9. Simplifies and optimizes any complex joins identified in stage 6d.
    10. Maps join columns from a join (spool) relation to the list of field IDs from the input base tables to prepare the relation for join planning.


    11. Generates information about local connections. A connecting condition is one that connects an outer query and a subquery. A direct connection exists between two tables if either of the following conditions is found.
        a. ANDed bind term: miscellaneous terms such as inequalities, ANDs, and ORs; cross, outer, or minus join term that satisfies the dependent information between the two tables.
        b. A spool file of an uncorrelated subquery EXIST predicate that connects with any outer table.

    12. Generates information about indexes that might be used in join planning, including the primary indexes for the relevant tables and pointers to the table descriptors of any other useful indexes.
    13. Performs row and column partition elimination for partitioned tables.
    14. Uses a recursive greedy 1-table lookahead algorithm to generate the best join plan.
    15. If the join plan identified in step 14 does not meet the heuristics-based criteria for an adequate join plan, generates another best join plan using an n-table lookahead algorithm.
    16. Selects the better join plan of the two plans generated in steps 14 and 15.
    19. Passes the optimized white tree to the Generator.

    The Generator then generates plastic steps for the chosen plan.
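
    Stage 2 of the list above (converting correlated subqueries to unnested SELECTs or simple joins) can be pictured with a small example. The sketch below assumes two hypothetical tables, employee and department; the table and column names are illustrative and do not come from the source. It shows a correlated subquery next to a join form that returns the same rows, on the stated assumption that dept_id is unique in department. Which rewrite the Optimizer actually applies to a given request can be checked with EXPLAIN.

```sql
-- Hypothetical tables, for illustration only:
--   employee(emp_id, emp_name, dept_id)
--   department(dept_id, loc)   -- dept_id assumed unique

-- Correlated subquery: the inner query references e.dept_id
-- from the outer query.
SELECT e.emp_id, e.emp_name
FROM employee e
WHERE EXISTS (
    SELECT 1
    FROM department d
    WHERE d.dept_id = e.dept_id
      AND d.loc = 'USA'
);

-- Unnested form: a simple join that returns the same rows
-- as long as department.dept_id is unique.
SELECT e.emp_id, e.emp_name
FROM employee e
JOIN department d
  ON d.dept_id = e.dept_id
WHERE d.loc = 'USA';
```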

    Methodologies

    Optimization is one of the most talked-about techniques for Teradata today. Because of the huge amount of data in a Teradata database, it is very important to get optimized performance out of it; otherwise queries will perform poorly and the benefit of parallelism will be lost.

    To select the least expensive plan for a query to fetch its results, the following techniques or practices can be followed:

    (1) STATISTICS

    Collecting statistics is one of the most fundamental steps in Teradata query optimization. Statistics collection is essential for the optimal performance of the Teradata query optimizer. The query optimizer relies on statistics to help it determine the best way to access data. Statistics also help the optimizer ascertain how many rows exist in the tables being queried and predict how many rows will qualify for given conditions.


    Lack of statistics, or outdated statistics, might result in the optimizer choosing a less-than-optimal method for accessing data tables.

    Also, statistics help Teradata determine the spool file size needed to contain the resulting data. Accurate statistics could make the difference between a successful query and a query that runs out of spool space.

    Syntax:

    To check whether statistics are defined for the table:
    HELP STATS table_name;

    To collect or refresh the statistics:
    COLLECT STATS ON table_name [INDEX/COLUMN] (col_name, col_name, …);
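
    As a concrete illustration of the two statements above, the sketch below uses a hypothetical table sales_db.orders with columns customer_id and order_date. These names are assumptions made for the example and are not part of the source.

```sql
-- Hypothetical table and column names, for illustration only.

-- Show which statistics are already defined on the table.
HELP STATISTICS sales_db.orders;

-- Collect statistics on a single column.
COLLECT STATISTICS ON sales_db.orders COLUMN (customer_id);

-- Collect multi-column statistics.
COLLECT STATISTICS ON sales_db.orders COLUMN (customer_id, order_date);

-- Refresh every statistic that is already defined on the table.
COLLECT STATISTICS ON sales_db.orders;
```

    The last form names no columns, so it is a convenient way to refresh existing statistics after a load without repeating each definition.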

    DIAGNOSTIC STATEMENT

    DIAGNOSTIC HELPSTATS ON FOR SESSION;

    The above statement can be used to determine the statistics that might be required to improve the performance of the SQL. After issuing it, run EXPLAIN on the query; the statistics suggestions appear with the EXPLAIN plan.
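
    A typical session might look like the sketch below. The query and the table sales_db.orders are hypothetical and only serve to show the order of the statements.

```sql
-- Turn on statistics recommendations for the current session.
DIAGNOSTIC HELPSTATS ON FOR SESSION;

-- EXPLAIN does not run the query; it returns the plan text,
-- followed by the recommended statistics.
-- The table and columns below are hypothetical.
EXPLAIN
SELECT o.customer_id, COUNT(*)
FROM sales_db.orders o
WHERE o.order_date >= DATE '2018-01-01'
GROUP BY o.customer_id;

-- Turn the recommendations off again when done.
DIAGNOSTIC HELPSTATS NOT ON FOR SESSION;
```

    The recommendations are suggestions, not requirements, so it is usually worth collecting only those that relate to the join and filter columns of the query being tuned and then checking the plan again.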

    Stats will qualify for one of the confidence levels below:

    1) No Confidence - no statistics defined for a table.
    2) Low Confidence - stats are difficult to use precisely.
    3) High Confidence - Optimizer is sure of results based on the stats available.

    Statistics need to be collected for:

    1. All non-unique indexes.
    2. UPI of small tables (tables with fewer than x rows per AMP, depending on the available number of AMPs).
    3. All indexes of a join index.


    Always collect statistics at the column level even when collecting on an index. This is because indexes can be dropped at any time, so they are often dropped and recreated.
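
    The difference between the two forms is only the keyword. In the sketch below, the table sales_db.orders and its secondary index on customer_id are hypothetical.

```sql
-- Hypothetical table with a secondary index on (customer_id).

-- Index-level form: the statistics are requested through the index.
COLLECT STATISTICS ON sales_db.orders INDEX (customer_id);

-- Column-level form on the same column: the form recommended above,
-- because it does not refer to the index definition.
COLLECT STATISTICS ON sales_db.orders COLUMN (customer_id);
```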

    When to collect statistics:

    After the following:

    1. Fast loads
    2. Multi loads
    3. Non-utility (TPump/BTEQ/ODBC/JDBC)

    Collect statistics after a significant percentage of data values have changed.


    SELECT M.EMP_ID, M.EMP_NAME, KN.DEPT_DESC, COUNT(*)
    FROM EMPLOYEE M, DEPARTMENT KN
    WHERE M.DEPT_ID = KN.DEPT_ID
    AND KN.LOC = 'USA'
    AND M.DEPT_ID IN (…


    While joining two tables, make sure that both join columns use the same character set. Otherwise an implicit conversion of one to the other takes place, resulting in poor performance.
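
    One way to keep join columns aligned is to declare the character set explicitly when the tables are created. The sketch below uses two hypothetical tables, customer and invoice; the names, types, and the choice of LATIN are assumptions for illustration.

```sql
-- Hypothetical tables; both join columns are declared CHARACTER SET LATIN.
CREATE TABLE customer (
    cust_code VARCHAR(20)  CHARACTER SET LATIN NOT NULL,
    cust_name VARCHAR(100) CHARACTER SET UNICODE
) PRIMARY INDEX (cust_code);

CREATE TABLE invoice (
    invoice_id INTEGER NOT NULL,
    cust_code  VARCHAR(20) CHARACTER SET LATIN NOT NULL
) PRIMARY INDEX (invoice_id);

-- The join columns share a character set, so no implicit
-- conversion is needed to compare them.
SELECT i.invoice_id, c.cust_name
FROM invoice i
JOIN customer c
  ON c.cust_code = i.cust_code;
```

    For existing tables, SHOW TABLE prints the column definitions, including the character set, which makes a mismatch easy to spot before writing the join.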

    (7) DATE COMPARISON

    When comparing date values over a particular range, the query may result in a product join.

    This can be avoided by using SYS_CALENDAR.CALENDAR, the calendar view in Teradata's built-in SYS_CALENDAR database.

    Example:

    Insert into table_a select t2.a1, t2.a2, t2.a3, t2.a4 from

    table_2 t2 join table_3 t3 on t2.a1 = t3.a1 and t2.a5_dt … t3.a


    t2.a1, t2.a2, t2.a3, t2.a4 from

    table_2 t2 join table_3 t3 on t2.a1 = t3.a1 join table_4 t