03 join strategy

Teradata V13 Join Strategy LEVEL – LEARNER

Upload: anjali-kalra

Post on 24-Sep-2015


About the Author

Created By: Sujata Datta (173236)
Credential Information: 5 years of Teradata hands-on experience in the WellPoint account; NR 011 (Basic SQL) and NR 013 (Teradata SQL Assistant)
Version and Date: TD/PPT/2012-09-10/1.0
Cognizant Certified Official Curriculum

Icons Used

Questions, Demonstration, Hands on Exercise, Coding Standards, A Welcome Break, Tools, Reference, Test Your Understanding, Contacts

Introduction: Teradata supports a number of join strategies. The optimizer chooses the join strategy for a query at compile time by estimating the relative cost of each possible strategy and picking the cheapest. Most Teradata joins operate on two tables at a time, so the optimizer builds a query plan out of successive two-table joins until the result relation has been built.

In this chapter you will learn the various kinds of join strategies.

Join Strategy - Overview

This chapter will give a brief idea of the following:

-- Join types
-- What is meant by a join
-- Different types of join strategies, such as Hash Join, Merge Join, Nested Join, Exclusion Join, and Product Join
-- Usage of the join index and hash index
-- How Teradata chooses the best join strategy

Join Strategy - Objective

The Teradata Optimizer interprets the join type a user writes and then decides on the best path, or join strategy, to complete the query.

Teradata allows up to 64 tables to be joined in a single query. Some of the common join types are:
1. Inner
2. Outer (Left, Right, Full)
3. Self
4. Cross
5. Cartesian

When a user specifies a join type, Teradata uses one of the join strategies below to perform the join:
1. Merge Join
2. Nested Join
3. Hash Join
4. Product Join

Join Strategy Vs. Join Types

Each AMP holds a portion of a table.

Teradata uses the Primary Index to distribute the rows among the AMPs.

Each AMP keeps its tables separate from one another, much as someone might keep clothes in separate dresser drawers.

For a JOIN to take place the two rows being joined must find a way to get to the same AMP.

If the rows to be joined are not on the same AMP, Teradata will either redistribute the data or duplicate the data in spool to make that happen.

Each AMP sorts its tables by Row ID.
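The distribution rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the number of AMPs, the sample rows, and the use of `zlib.crc32` as a stand-in for Teradata's row-hash function are all invented for the example. It shows that two tables sharing the same Primary Index value place matching rows on the same AMP.

```python
import zlib

NUM_AMPS = 4  # hypothetical number of AMPs for the illustration


def amp_for(value, num_amps=NUM_AMPS):
    """Pick an AMP for a Primary Index value (crc32 is a stand-in hash)."""
    return zlib.crc32(str(value).encode()) % num_amps


def distribute(rows, pi_column, num_amps=NUM_AMPS):
    """Place each row on the AMP chosen by hashing its Primary Index column."""
    amps = {amp: [] for amp in range(num_amps)}
    for row in rows:
        amps[amp_for(row[pi_column], num_amps)].append(row)
    return amps


def locate(amps, emp):
    """Return the AMP number that holds the row with this EMP value."""
    for amp, rows in amps.items():
        if any(r["EMP"] == emp for r in rows):
            return amp
    return None


# Made-up sample rows; EMP is the Primary Index of both tables.
employee1 = [{"EMP": n, "Name": f"emp{n}"} for n in range(1, 9)]
employee2 = [{"EMP": n, "Salary": 1000 * n} for n in range(1, 9)]

e1_amps = distribute(employee1, "EMP")
e2_amps = distribute(employee2, "EMP")

# For every EMP value, both tables keep the row on the same AMP.
same_amp = all(locate(e1_amps, n) == locate(e2_amps, n) for n in range(1, 9))
print(same_amp)
```

Because the AMP depends only on the hashed value, rows that share a Primary Index value are already co-located and need no movement before a join.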

Teradata Join Concept and key things

The first merge join strategy uses the Primary Index of both tables in the join equality. The key is that the Primary Index columns of both tables are used in the WHERE or ON clause of the join.

SELECT E1.EMP, E2.DEPT, E1.Name, E2.Salary
FROM EMPLOYEE1 E1
INNER JOIN EMPLOYEE2 E2
ON E1.EMP = E2.EMP;

EMP is the Primary Index for both tables. This first merge join strategy is extremely efficient because both columns in the ON clause are the Primary Indexes of their respective tables. When this occurs, no data has to be moved into spool, and the join can be performed AMP-locally.

Merge Join Strategy - 1

The inner join above returns all rows that have a match in both tables.

Teradata can perform this join very quickly, because the matching rows already sit on the same AMP in the same sort order.
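The matching step of a merge join can be sketched as a single pass over two sorted inputs. The function below is a generic sort-merge join written for illustration, not Teradata's implementation; it sorts on the join value itself, whereas Teradata orders rows by row hash. The sample rows are made up.

```python
def merge_join(left, right, key):
    """Join two lists of dicts that are already sorted by `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left row has no partner yet, advance left
        elif lk > rk:
            j += 1          # right row has no partner yet, advance right
        else:
            # Find the run of right rows that share this key value.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against the whole run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    result.append({**left[i], **r})
                i += 1
            j = j_end
    return result


# Made-up rows, both lists sorted by EMP.
employee1 = [{"EMP": 1, "Name": "a"}, {"EMP": 2, "Name": "b"}, {"EMP": 4, "Name": "c"}]
employee2 = [
    {"EMP": 2, "Salary": 10},
    {"EMP": 3, "Salary": 20},
    {"EMP": 4, "Salary": 30},
    {"EMP": 4, "Salary": 40},
]

joined = merge_join(employee1, employee2, "EMP")
print([row["EMP"] for row in joined])
```

Each input is read once from start to finish, which is why having both tables pre-sorted on the AMP makes this strategy cheap.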

The second merge join strategy joins a Primary Index column (DEPT) of one table (DEPARTMENT) to a non-Primary Index column of another table (EMPLOYEE).

Merge Join Strategy - 2

SELECT EMP, E.Dept, Dept_Name
FROM EMPLOYEE E
INNER JOIN DEPARTMENT D
ON E.DEPT = D.DEPT;

The Department table, whose Primary Index column appears in the equality condition, stays in place on its AMPs.

The next step is to move the rows of the Employee table into spool. Teradata hashes the join column (DEPT) of each Employee row to locate the target AMP, then moves the row into spool on the AMP where the matching Department rows reside.

Merge Join Strategy - 3

Strategy 3 applies when neither table is joined on its Primary Index. In this case Teradata redistributes both tables into spool and sorts them by hash code: it hashes the non-Primary Index join columns, moves the rows to the spool of the AMP that the hash selects, and sorts them there by hash. Once this is done, the rows that belong together sit in spool on the same AMP.

In the example, the Primary Index of the Department table is DEPT and the Primary Index of the Manager table is LOC.

SELECT LOC, Dept_Name, Budget
FROM DEPARTMENT
INNER JOIN MANAGER
ON MgrEmp = MgrNo;

Rows from both tables need to be rehashed and redistributed into spool, because neither column in the ON clause is the Primary Index of its table. Both tables are therefore redistributed on the columns named in the ON clause.

The next step in this process is to redistribute the rows and locate them on the matching AMPs.

When this is completed, the rows from both tables will be located in two different spools. Lastly, the rows in each spool will be joined together to bring back the matching rows.
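The redistribution of both tables can be sketched as follows. Everything concrete here is an assumption made for illustration: the AMP count, the sample rows, which table owns each column, and `zlib.crc32` as a stand-in for the row hash. After both tables are rehashed on their join columns, each AMP only has to compare the rows in its own two spools (done here with a simple per-AMP match rather than a real merge).

```python
import zlib

NUM_AMPS = 3  # hypothetical number of AMPs


def amp_for(value):
    """Pick an AMP from the hash of a join-column value (stand-in hash)."""
    return zlib.crc32(str(value).encode()) % NUM_AMPS


def redistribute(rows, join_column):
    """Copy rows into per-AMP spool, chosen and sorted by the join column."""
    spool = {amp: [] for amp in range(NUM_AMPS)}
    for row in rows:
        spool[amp_for(row[join_column])].append(row)
    for amp_rows in spool.values():
        amp_rows.sort(key=lambda r: r[join_column])
    return spool


# Made-up rows: DEPT and LOC are the Primary Indexes, not the join columns.
department = [
    {"DEPT": 10, "Dept_Name": "Sales", "MgrEmp": 101},
    {"DEPT": 20, "Dept_Name": "HR", "MgrEmp": 102},
    {"DEPT": 30, "Dept_Name": "IT", "MgrEmp": 103},
]
manager = [
    {"LOC": "NY", "MgrNo": 101, "Budget": 500},
    {"LOC": "LA", "MgrNo": 103, "Budget": 700},
    {"LOC": "SF", "MgrNo": 104, "Budget": 900},
]

dept_spool = redistribute(department, "MgrEmp")
mgr_spool = redistribute(manager, "MgrNo")

# Each AMP joins only the rows in its own two spools.
joined = []
for amp in range(NUM_AMPS):
    for d in dept_spool[amp]:
        for m in mgr_spool[amp]:
            if d["MgrEmp"] == m["MgrNo"]:
                joined.append((m["LOC"], d["Dept_Name"], m["Budget"]))

print(sorted(joined))
```

Every row of both tables is copied once before any matching starts, which is where the extra cost of this strategy comes from.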

This type of join strategy is extremely inefficient.

It consumes considerable resources and time, because both tables must be rehashed, redistributed, and sorted before the join can be assembled.

Merge Join Strategy - 4

This strategy is used for a small-table/big-table join.

If one of the tables being joined is small, then Teradata may choose a strategy that will duplicate the smaller table across all the AMPs.

The key point is that Teradata can choose to duplicate the small table across all the AMPs whether or not the join column is that table's Primary Index.

SELECT EMP, Salary, Dept_Name
FROM EMPLOYEE E
INNER JOIN DEPARTMENT D
ON E.DEPT = D.DEPT;

In the inner join above, the two tables involved are the Employee table and the Department table. The DEPT column is the join equality and is the Primary Index column of the Department table. The Employee table has the EMP column as its Primary Index. Because the Department table is small, it is a good candidate for this join strategy.

Teradata duplicates the entire Department table into spool on each AMP. Once this is completed, each AMP joins its base Employee rows with the Department rows.

Instead of redistributing the larger Employee table, whose Primary Index column is not part of the equality (ON) condition, Teradata duplicates the smaller table across all the AMPs (big-table/small-table join).

This merge join strategy consumes few resources, because the large table never has to move.

A Hash Join can take place only if one or both of the tables on each AMP fit completely inside the AMP's memory.

SELECT Emp, DeptName, MgrEmp
FROM Employee_Table
INNER JOIN Department_Table
ON Emp = MgrEmp;

Hash Join Strategy

In the example, the join condition is based on the Emp and MgrEmp columns, whose names differ. The smaller of the two tables (Department) is sorted by row hash and duplicated on each AMP. Teradata then uses the join column of the larger table to look for a match among the duplicated rows of the smaller table. The larger table does not have to be redistributed or sorted, and the match runs in memory rather than on disk, which improves performance.
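The build-and-probe idea behind a hash join can be sketched as below. This is a generic in-memory hash join written for illustration, with made-up rows; it is not Teradata's internal algorithm. The small table is loaded into a dictionary keyed on its join column, and each row of the large table is then looked up once.

```python
from collections import defaultdict


def hash_join(small, large, small_key, large_key):
    """Build a hash table on the small input, then probe it with the large one."""
    table = defaultdict(list)
    for row in small:                      # build phase: small table in memory
        table[row[small_key]].append(row)
    result = []
    for row in large:                      # probe phase: one lookup per large row
        for match in table.get(row[large_key], []):
            result.append({**row, **match})
    return result


# Made-up rows: Department is the small table, Employee the large one.
department_table = [
    {"DeptName": "Sales", "MgrEmp": 2},
    {"DeptName": "IT", "MgrEmp": 4},
]
employee_table = [{"Emp": n} for n in range(1, 6)]

joined = hash_join(department_table, employee_table, "MgrEmp", "Emp")
print([(row["Emp"], row["DeptName"]) for row in joined])
```

The large table is scanned in whatever order it is stored, so no sort of the large table is needed; the price is that the small table must fit in memory.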

Hash Join Strategy - Example

The nested join uses a unique index (either a Unique Primary Index or a Unique Secondary Index) on one of the tables in the join to retrieve a single row. It then matches that row to one or more rows in the other table.

In the example below, the join equality (ON) condition is based on the DEPT column. DEPT is the Primary Index column of the Department table and a Secondary Index column of the Employee table.

SELECT Emp, Salary, DeptName
FROM Employee_Table e
INNER JOIN Department_Table d
ON e.dept = d.dept
WHERE d.dept = 10;

Nested Join Strategy

Because the WHERE condition (d.dept = 10) matches only one row in the Department table, the Teradata Optimizer chooses to move that Department row into spool and duplicate it across all the AMPs.
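The two lookups in a nested join can be sketched with two dictionaries standing in for the indexes. The data and the index structures are invented for the example: one dictionary plays the role of the unique Primary Index on Department, the other the Secondary Index on Employee.

```python
# Stand-in for the unique Primary Index on Department: dept -> exactly one row.
department_by_dept = {
    10: {"dept": 10, "DeptName": "Sales"},
    20: {"dept": 20, "DeptName": "IT"},
}

# Stand-in for the Secondary Index on Employee: dept -> list of employee rows.
employee_by_dept = {
    10: [{"Emp": 1, "Salary": 5000}, {"Emp": 3, "Salary": 4000}],
    20: [{"Emp": 2, "Salary": 6000}],
}


def nested_join(dept_value):
    """Fetch one Department row by unique index, then its Employee rows by index."""
    dept_row = department_by_dept.get(dept_value)
    if dept_row is None:
        return []
    return [
        (e["Emp"], e["Salary"], dept_row["DeptName"])
        for e in employee_by_dept.get(dept_value, [])
    ]


print(nested_join(10))
```

Neither table is scanned in full: the work is one unique-index lookup plus one index lookup for the matching rows.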

Nested Join Strategy - Example

SELECT EMP, Dept, Name
FROM EMPLOYEE E
WHERE E.dept = '10'
AND EMP NOT IN (SELECT MgrEmp FROM DEPARTMENT WHERE MgrEmp IS NOT NULL);

The query above excludes rows during the join instead of finding matching rows between the joined tables. Exclusion joins find rows that do not have a matching row in the other table. Queries with the NOT IN operator are the kind of queries that result in exclusion joins. In this case, the query finds all the employees in department 10 who are not managers.

Exclusion Join Strategy

These joins always involve a Full Table Scan, because Teradata must compare every record to find the rows to exclude.

This type of join can be resource intensive if the two tables in this comparison are large.

NULLs are treated as unknown values. If the NOT IN list contains a NULL, the comparison evaluates to unknown for every row that has no match, so the query returns no rows.

There are two ways to correct this:
1. Define the NOT IN columns as NOT NULL in the CREATE TABLE statement.
2. Add a WHERE column IS NOT NULL condition to the subquery, as in the example above.
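The effect of a NULL inside a NOT IN list can be reproduced with any SQL engine that follows standard three-valued logic. The sketch below uses Python's built-in `sqlite3` module rather than Teradata, with made-up tables and integer department numbers, so it only illustrates the NULL behaviour and not Teradata's join plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp INTEGER, dept INTEGER, name TEXT);
    CREATE TABLE department (dept INTEGER, mgremp INTEGER);
    INSERT INTO employee VALUES (1, 10, 'Ann'), (2, 10, 'Bob'), (3, 10, 'Cy');
    -- Department 20 has no manager yet, so its mgremp is NULL.
    INSERT INTO department VALUES (10, 1), (20, NULL);
    """
)

# Without the IS NOT NULL filter, the NULL in the subquery hides every row.
unfiltered = conn.execute(
    "SELECT emp FROM employee WHERE dept = 10 "
    "AND emp NOT IN (SELECT mgremp FROM department)"
).fetchall()

# With the filter, the non-managers of department 10 come back.
filtered = conn.execute(
    "SELECT emp FROM employee WHERE dept = 10 "
    "AND emp NOT IN (SELECT mgremp FROM department WHERE mgremp IS NOT NULL) "
    "ORDER BY emp"
).fetchall()

print(unfiltered, filtered)
conn.close()
```

The first query returns nothing even though employees 2 and 3 are clearly not managers, which is the trap the IS NOT NULL condition avoids.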

Product joins compare every row of one table to every row of another table. The result of this join is the number of rows in table one multiplied by the number of rows in table two. In most cases product joins are major mistakes, because all rows in both tables are compared.

SELECT EMP, E.Dept, Name, DeptName
FROM EMPLOYEE E, DEPARTMENT D
WHERE EMP LIKE '_b%';

To avoid a product join, base the join on an equality condition. The WHERE clause above reads EMP LIKE '_b%', which supplies no common-domain condition between the two tables (for example, e.dept = d.dept). Another cause of a product join is failing to use an alias after establishing it. Finally, check your join syntax to make sure the WHERE clause is not missing.
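The cost of a product join can be made concrete by counting comparisons. The sketch below uses made-up rows and a Python predicate that imitates LIKE '_b%' (any one character followed by 'b'); it is an illustration of the row arithmetic, not of Teradata's execution.

```python
from itertools import product


def product_join(left, right, predicate):
    """Compare every left row with every right row and keep the pairs that pass."""
    comparisons = 0
    result = []
    for l, r in product(left, right):
        comparisons += 1
        if predicate(l, r):
            result.append((l, r))
    return result, comparisons


# Made-up rows: 4 employees and 3 departments.
employee = [
    {"EMP": "ab1", "Dept": 10, "Name": "Ann"},
    {"EMP": "cb2", "Dept": 20, "Name": "Bob"},
    {"EMP": "xy3", "Dept": 10, "Name": "Cy"},
    {"EMP": "zz4", "Dept": 30, "Name": "Di"},
]
department = [
    {"DeptNo": 10, "DeptName": "Sales"},
    {"DeptNo": 20, "DeptName": "IT"},
    {"DeptNo": 30, "DeptName": "HR"},
]


def like_underscore_b(l, r):
    """Imitates EMP LIKE '_b%': filters employees, never looks at the department."""
    return len(l["EMP"]) >= 2 and l["EMP"][1] == "b"


def same_dept(l, r):
    """Adds the common-domain condition that was missing."""
    return like_underscore_b(l, r) and l["Dept"] == r["DeptNo"]


rows, comparisons = product_join(employee, department, like_underscore_b)
fixed_rows, _ = product_join(employee, department, same_dept)
print(comparisons, len(rows), len(fixed_rows))
```

With only the LIKE filter, each matching employee is paired with every department; adding the equality on the department column reduces the answer to one row per matching employee. A real optimizer would also use that equality to pick a cheaper strategy than comparing all pairs.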

Product Join

A Cartesian product join is a kind of product join without even a WHERE clause. This kind of join can use up all the assigned spool space and is a huge performance bottleneck.

To avoid a Cartesian product join, base the join on an equality condition, or at least include a WHERE condition.

Cartesian Product Join

A join index joins two tables and holds the result set in Teradata permanent space. At join time, the Parsing Engine (PE) decides whether it is faster to build the result set from the base tables or from the join index. A join index can be defined on one or several tables. A query cannot access the join index result set directly; only the PE can use it.

Main fundamentals of a Join Index are:

1. A Join Index is not a pointer to data; it actually stores data in PERM space.
2. Users never query it directly; the PE decides which result set is more suitable for efficient processing of the data.
3. It is updated when the base tables are changed.
4. It cannot be loaded with FastLoad or MultiLoad.

Join Index Concept

No more than 64 columns can be defined per joined table in a join index.
No more than 128 columns can be defined in a compressed join index definition.
There is no limit on how many columns can be defined in an uncompressed join index, other than system restrictions on the amount of SQL text required to define them.

Although the Optimizer substitutes only one multitable join index per referenced table in a query, it also considers additional single-table join indexes for inclusion in the join plan after the optimal multitable join index has been substituted and evaluated for the plan.

Columns with BLOB and CLOB data types cannot be included in a join index definition.

Join Index Restrictions

Multi-table join index:

CREATE JOIN INDEX EMP_DEPT AS
SELECT EMP_NO, EMP_NAME, EMP_DEPT, EMP_SAL
FROM EMPLOYEE EMP
INNER JOIN DEPARTMENT DEPT
ON EMP.DEPT_NO = DEPT.DEPT_NO
UNIQUE PRIMARY INDEX (EMP_NO);

With the above join index created, the result set of the join is held in PERM space. When the tables are actually joined during execution, the PE can refer to this result set for fast processing of the data.

Join Index Types and Examples

Single-table join index:

CREATE JOIN INDEX EMP_SNAP AS
SELECT EMP_NO, EMP_NAME, EMP_DEPT, EMP_SAL
FROM EMPLOYEE EMP
PRIMARY INDEX (EMP_DEPT);

This join index duplicates a single table, but with a different primary index, as given in the index definition above. When a user queries the base table, the PE decides, based on the query issued, whether it is faster to obtain the data from the join index result set or directly from the base table.

Join Index Types and Examples Contd.

Aggregate join index:

CREATE JOIN INDEX AGG_Table AS
SELECT EMP_NO, SUM(EMP_SAL)
FROM EMPLOYEE EMP
GROUP BY 1;

This join index tracks an aggregate, such as the average, sum, or count of a column in a table. It helps the PE get aggregated data faster during query processing.
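The idea of a precomputed aggregate that stays in step with its base table can be sketched with a small class. The class name and methods are hypothetical and exist only for this illustration; Teradata maintains the real index automatically when the base table changes.

```python
from collections import defaultdict


class AggregateIndexSketch:
    """Keeps SUM(EMP_SAL) per EMP_NO up to date as base rows are inserted."""

    def __init__(self):
        self.base_rows = []                  # stand-in for the base table
        self.sum_by_emp = defaultdict(int)   # stand-in for the aggregate index

    def insert(self, emp_no, emp_sal):
        """Insert into the base table and maintain the aggregate in the same step."""
        self.base_rows.append((emp_no, emp_sal))
        self.sum_by_emp[emp_no] += emp_sal

    def total_from_index(self, emp_no):
        """Answer from the precomputed aggregate: one lookup."""
        return self.sum_by_emp.get(emp_no, 0)

    def total_from_base(self, emp_no):
        """Answer from the base table: scans every row."""
        return sum(sal for emp, sal in self.base_rows if emp == emp_no)


idx = AggregateIndexSketch()
idx.insert(1, 100)
idx.insert(2, 50)
idx.insert(1, 200)
print(idx.total_from_index(1), idx.total_from_base(1))
```

Both paths give the same total; the index path avoids rescanning the base rows at query time, at the cost of extra work on every insert.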

A Hash Index is used for the same purpose as a single-table join index: it generates a result set and stores it in PERM space for the PE's use.

A Hash Index creates a partial or full replication of the base table, with a primary index on a foreign key column, to facilitate joins of very large tables by hashing them to the same AMP.

Hash Index cannot work on aggregates like single table join index.

A Hash Index can be defined on one table only.
The result set generated by a Hash Index cannot be accessed directly by a query.
The rows of a Hash Index are sometimes a little shorter than single-table join index rows, which gives the Hash Index a small storage advantage.

Hash Index Purpose

Excluding the Primary Index, at most 32 indexes can be defined on a table. These 32 indexes can be any combination of hash, secondary, and join indexes.

Columns having BLOB and CLOB data types are not allowed in a hash index definition.

A hash index cannot have a partitioned primary index.

Hash Index Limitation

This index is built for the table 'emp1', which is defined as follows:

CREATE SET TABLE emp1 (
  employee_number INTEGER,
  manager_employee_number INTEGER,
  department_number INTEGER,
  job_code INTEGER,
  last_name CHAR(20) NOT NULL,
  first_name VARCHAR(30) NOT NULL,
  salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX (employee_number);

Example 1:

CREATE HASH INDEX hash_1 (employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY HASH (employee_number);

Hash Index Examples

Each hash index row contains the employee number and the department number. Specifying the employee number is unnecessary, since it is the primary index of the base table and is therefore included automatically. The BY clause indicates that the rows of this index are distributed by the employee_number hash value. The ORDER BY clause indicates that the index rows are ordered on each AMP in sequence by the employee_number hash value.

Example 2:

The same hash index definition could have been abbreviated as follows:

CREATE HASH INDEX hash_1 (employee_number, department_number) ON emp1;

The BY clause defaults to the primary index of the base table. The ORDER BY clause defaults to the order of the base table rows.

Hash Index Examples Contd.

Questions

Welcome Break

1. Which join strategy works for an equality join condition?
2. Which join strategy works for an inequality join condition?
3. Which join strategy works for a filter condition?
4. What is the maximum number of columns that can be defined per table in a join index?
5. What should be done to avoid a product join?

Test Your Understanding

Teradata Forum
Release 13.10

Source

Disclaimer: Parts of the content of this course are based on the materials available from the Web sites and books listed above. The materials that can be accessed from linked sites are not maintained by Cognizant Academy and we are not responsible for the contents thereof. All trademarks, service marks, and trade names in this course are the marks of the respective owner(s).

Teradata 13.0

You have successfully completed - Join Strategy