presentation 1 (28 apr) new

Upload: tprithiru

Posted on 08-Apr-2018

Page 1: Presentation 1 (28 Apr) New

8/6/2019 Presentation 1 (28 Apr) New

http://slidepdf.com/reader/full/presentation-1-28-apr-new 1/49

Agenda

- Teradata Architecture
- Teradata Utilities (Client Tools)
  - BTEQ
  - FastLoad
  - FastExport
  - MultiLoad
  - TPump

BTEQ

- BTEQ is available on every Teradata system ever built, because Basic Teradata Query (BTEQ) was the original way to submit SQL to Teradata and get back an answer set in a desired format.
- BTEQ is also an excellent tool for importing and exporting data.
- A BTEQ session provides a quick and easy way to access a Teradata RDBMS. In a BTEQ session, you can:
  - enter Teradata SQL statements to view, add, modify, and delete data
  - enter BTEQ commands
  - enter operating system commands
- BTEQ can be used in two modes:
  - interactive mode: start a BTEQ session and submit commands to the database as needed
  - batch mode: prepare scripts or macros, then submit them to BTEQ for processing

The BTEQ Command Set

The BTEQ command set can be categorized as:

- Session control: begin and end BTEQ sessions, and control session characteristics.
- File control: specify input and output formats, and identify information sources and destinations.
- Sequence control: control the sequence in which other BTEQ commands and Teradata SQL statements are executed within scripts and macros.
- Format control: control the format of screen and printer output.

Session Control Commands

Use the following BTEQ commands to begin, control, and end sessions.

LOGON - start a BTEQ session.
SESSIONS - specify the number of sessions to use with the next LOGON command.
LOGOFF - end the current sessions without exiting BTEQ.
EXIT or QUIT - end the current sessions and exit BTEQ.
ABORT - abort any active requests and transactions without exiting BTEQ.
SHOW CONTROLS - display the current configuration of the BTEQ control command options.
SHOW VERSIONS - display the BTEQ version number, module revision numbers, and linking date.

SESSION TRANSACTION - specify whether transaction boundaries are determined by Teradata SQL semantics or ANSI semantics.
COMPILE - create or replace a Teradata stored procedure.

File Control Commands

OS - execute an MS-DOS or UNIX command from within the BTEQ environment.
TSO - execute an MVS TSO command from within the BTEQ environment.
RUN - execute Teradata SQL requests and BTEQ commands from a specified run file.
REPEAT - submit the next request a specified number of times.
FORMAT - enable or inhibit the page-oriented format command options.

EXPORT - open a file with a specific format to transfer information from the Teradata RDBMS.
INDICDATA / RECORDMODE - specify the response mode (Field mode, Indicator mode, or Record mode) for data selected from the Teradata RDBMS.

Sequence Control Commands

HANG - pause BTEQ processing for a specified period of time.
ERRORLEVEL - assign severity levels to errors.
IF...THEN - test the condition stated in the IF clause.
LABEL - identify the point at which BTEQ resumes processing, as specified in a previous GOTO command.
MAXERROR - designate a maximum error severity level beyond which BTEQ terminates job processing.
REMARK - place a specified string on the standard output stream.

Format Control Commands

RETLIMIT - specify the maximum number of rows and/or columns displayed or written in response to an SQL request.
RETCANCEL - cancel a request when the value specified by the RETLIMIT command's ROWS option is exceeded.
UNDERLINE - display a row of dash characters whenever the value of a specified column changes.
SKIPLINE - insert a blank line in a report whenever the value of a specified column changes.
IMPORT - open a file with a specific format to transfer information to the Teradata RDBMS.
EXPORT - open a file with a specific format to transfer information from the Teradata RDBMS.
FOOTING - specify a footer to appear at the bottom of every page of a report.
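The control-break idea behind SKIPLINE and UNDERLINE, together with the row cut-off of RETLIMIT, can be sketched in Python. This is a toy model of the output these commands produce, not BTEQ's implementation. The function and parameter names are hypothetical, and the sketch assumes the rows already arrive sorted on the break column (for example, through ORDER BY).

```python
def format_report(rows, break_col, separator="", ret_limit=None):
    """Render rows as report lines with a control break on column break_col.

    separator=""     -> a blank line at each break (like SKIPLINE)
    separator="----" -> a line of dashes at each break (like UNDERLINE)
    ret_limit        -> stop after this many rows (like RETLIMIT ROWS)
    """
    lines = []
    previous = None
    for count, row in enumerate(rows):
        if ret_limit is not None and count >= ret_limit:
            break  # RETLIMIT-style cut-off: remaining rows are not shown
        value = row[break_col]
        if count > 0 and value != previous:
            lines.append(separator)  # the break column changed
        lines.append(" ".join(str(field) for field in row))
        previous = value
    return lines


# Hypothetical sample: (dept_no, name) rows already ordered by dept_no.
rows = [(10, "Ann"), (10, "Bob"), (20, "Cy")]
for line in format_report(rows, break_col=0):
    print(line)
```

The separator is emitted before the first row of each new group, which is why the first row never gets one.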

Starting and Exiting BTEQ

Logging On in Interactive Mode

1. Enter the LOGON command as follows:
.LOGON tdpid/userid
2. Enter your RDBMS password when prompted:
Password: ____________

Logging On in Batch Mode

Submit the LOGON command in an input file, including the password, as follows:

.LOGON tdpid/userid, password

Logging Off the Teradata RDBMS / Exiting BTEQ

To log off without exiting BTEQ, enter the LOGOFF command at the BTEQ command prompt:

.LOGOFF

To exit BTEQ, enter either the EXIT or the QUIT command:

.EXIT or .QUIT

Request Types

There are two types of Teradata SQL requests. A single-statement request is a single Teradata SQL statement sent as a request. A multistatement request is two or more statements sent as one request.

Single-Statement Example

BTEQ submits the following statements to the Teradata RDBMS as three single-statement requests:

SELECT * FROM Employee;
DELETE FROM Employee WHERE Name = 'Ramesh' AND Empno = 10014;
SELECT Name FROM Employee;

Multistatement Example

To submit the same three statements as a multistatement request, enter:

SELECT * FROM Employee
; DELETE FROM Employee WHERE Name = 'Ramesh' AND Empno = 10014
; SELECT Name FROM Employee;

BTEQ does not submit any of the statements to the Teradata RDBMS until it encounters a semicolon as the last nonblank character of a line. At that point, BTEQ sends all of the statements to the Teradata RDBMS for processing as one single request.
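The batching rule above can be sketched in Python. This is a simplified model, not BTEQ's parser. `split_requests` is a hypothetical helper that closes a request whenever a line's last nonblank character is a semicolon, then splits the buffered text into statements. It splits on every semicolon, so semicolons inside string literals are not handled, and it ignores BTEQ dot-commands.

```python
def split_requests(lines):
    """Group input lines into requests, each a list of SQL statements.

    A request is closed when a line's last nonblank character is ';'.
    Simplification: the buffered text is split on every ';'.
    """
    requests, pending = [], []
    for line in lines:
        pending.append(line.strip())
        if line.rstrip().endswith(";"):
            text = " ".join(pending)
            statements = [s.strip() for s in text.split(";") if s.strip()]
            requests.append(statements)
            pending = []
    return requests


single = [
    "SELECT * FROM Employee;",
    "DELETE FROM Employee WHERE Name = 'Ramesh' AND Empno = 10014;",
    "SELECT Name FROM Employee;",
]
multi = [
    "SELECT * FROM Employee",
    "; DELETE FROM Employee WHERE Name = 'Ramesh' AND Empno = 10014",
    "; SELECT Name FROM Employee;",
]
print(len(split_requests(single)), "requests")  # prints: 3 requests
print(len(split_requests(multi)), "request")    # prints: 1 request
```

Placing the semicolons at the start of the following lines is what keeps the first two lines from closing a request on their own.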

Handling Errors

BTEQ error handling involves these elements:

- Teradata RDBMS error codes
- BTEQ return codes
- Error severity levels
- Maximum error level
- Stored procedure compilation errors
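The Python sketch below models a batch job to make severity levels and the maximum error level concrete. It rests on several assumptions. Each failed request gets a severity, looked up in a map that plays the role of ERRORLEVEL, with an assumed default of 8. The job stops once a severity exceeds the MAXERROR threshold. The final return code is the highest severity seen. The error codes and names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class JobResult:
    executed: list = field(default_factory=list)
    return_code: int = 0      # highest severity seen (assumed behaviour)
    terminated: bool = False  # True if a severity exceeded max_error


def run_script(requests, severity_map, max_error, default_severity=8):
    """requests: (statement, error_code) pairs, error_code 0 meaning success.
    severity_map: error code -> severity, playing the role of ERRORLEVEL.
    max_error: the MAXERROR threshold; a higher severity stops the job."""
    result = JobResult()
    for statement, code in requests:
        result.executed.append(statement)
        if code == 0:
            continue
        severity = severity_map.get(code, default_severity)
        result.return_code = max(result.return_code, severity)
        if severity > max_error:
            result.terminated = True
            break
    return result


# Hypothetical error codes 1001 and 2002; 1001 is downgraded to severity 0.
outcome = run_script(
    [("DROP TABLE t;", 1001), ("CREATE TABLE t (c INT);", 0),
     ("INSERT INTO missing VALUES (1);", 2002), ("SELECT * FROM t;", 0)],
    severity_map={1001: 0, 2002: 8},
    max_error=4,
)
print(outcome.terminated, outcome.return_code)  # prints: True 8
```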

Transactions in Teradata (BTEQ) Mode

Teradata jobs often run multiple queries within the same transaction. Use the BT (BEGIN TRANSACTION) and ET (END TRANSACTION) keywords to bundle several queries into one transaction.

For example:

BT;
UPDATE empTable SET lastname = 'kumar' WHERE firstname = 'raj';
SELECT * FROM empTable WHERE firstname = 'raj';
ET;

Make sure your syntax is correct when using BT and ET, because an error in any statement rolls back the entire transaction, which can mean a massive rollback.
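The following minimal Python model illustrates the all-or-nothing behaviour of a BT ... ET transaction. Statements run against a private copy of a dict that stands in for `empTable`, and any exception discards the copy. The data and function names are illustrative only.

```python
import copy


def run_bt_et(table, statements):
    """Teradata-mode explicit transaction (BT ... ET), modelled as all-or-nothing.

    table: dict standing in for a table (hypothetical).
    statements: callables that modify the table in place and raise on error.
    Returns (resulting table, committed flag).
    """
    working = copy.deepcopy(table)  # changes happen on a private copy
    try:
        for statement in statements:
            statement(working)
    except Exception:
        return table, False  # any failure: the whole transaction is rolled back
    return working, True     # ET reached: every change becomes permanent


def update_lastname(t):
    # like: UPDATE empTable SET lastname = 'kumar' WHERE firstname = 'raj'
    t["raj"] = "kumar"


def select_missing_table(t):
    raise LookupError("table does not exist")  # a failing statement


emp = {"raj": "old_lastname"}  # hypothetical starting data: firstname -> lastname
print(run_bt_et(emp, [update_lastname, select_missing_table]))
# prints: ({'raj': 'old_lastname'}, False)
print(run_bt_et(emp, [update_lastname]))
# prints: ({'raj': 'kumar'}, True)
```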

Transactions in ANSI Mode

To change to ANSI mode, enter '.SET SESSION TRANSACTION ANSI', and be sure to do it before you log on in BTEQ. Most queries written for ANSI mode also work in Teradata mode, and vice versa, but transaction control differs: ANSI mode uses COMMIT rather than BT/ET.

All transactions must be committed explicitly by the user with the COMMIT statement. Also, in ANSI mode, any DDL statement (CREATE, DROP, ALTER, DATABASE) must be followed immediately by COMMIT.

ANSI mode is useful because when you bundle several queries into one transaction and one of them fails, the others are not rolled back to their original state.

For example:

.set session transaction ansi
.logon Tclass/sql01
password: *****
*** Logon successfully completed.
UPDATE empTable SET lastname = 'kumar' WHERE firstname = 'raj';
SELECT * FROM empTable WHERE firstname = 'raj';
SELECT * FROM empXXX;
COMMIT;
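The statement-level rollback described for ANSI mode can be modelled in the same spirit. In this hypothetical `AnsiSession`, a failing statement is undone on its own while earlier changes stay pending until `commit()`. Following the example, the sketch assumes that `empXXX` does not exist. It does not model any failure that would abort the whole transaction.

```python
import copy


class AnsiSession:
    """Toy model of ANSI-mode behaviour (not Teradata code): a failing
    statement is undone on its own, earlier changes stay pending, and
    nothing is permanent until commit()."""

    def __init__(self, table):
        self.committed = copy.deepcopy(table)  # state made permanent so far
        self.pending = copy.deepcopy(table)    # this transaction's working state

    def execute(self, statement):
        before = copy.deepcopy(self.pending)
        try:
            statement(self.pending)
            return True
        except Exception:
            self.pending = before  # roll back only the failed statement
            return False

    def commit(self):
        self.committed = copy.deepcopy(self.pending)


def missing_table(t):
    raise LookupError("empXXX does not exist")


session = AnsiSession({"raj": "old_lastname"})           # hypothetical data
print(session.execute(lambda t: t.update(raj="kumar")))  # prints: True
print(session.execute(missing_table))                    # prints: False
session.commit()
print(session.committed)  # prints: {'raj': 'kumar'}
```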

FastLoad

- FastLoad is used to load large amounts of data into Teradata tables.
- Only one table can be loaded per job.
- The target table must be empty and have no secondary indexes.
- Full restart capability.
- It doesn't load duplicate records, even if the target table is a MULTISET table.

Two Phases of FastLoad

Phase 1: FastLoad uses one SQL session to define AMP steps. The PE sends a block to each AMP, which stores blocks of unsorted data records. AMPs hash each record and redistribute it to the AMP responsible for that hash value.

Phase 2: Each AMP sorts the rows of the target table by row hash, puts the rows into blocks, and writes the blocks to disk. Fallback rows are then generated if required.
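A toy Python model illustrates the two phases. `row_hash` is a stand-in built on MD5, not Teradata's hashing algorithm, and AMPs are just dictionary buckets. This model drops exact duplicates during the phase 2 sort, which is its simplification of the rule that FastLoad never loads duplicate rows.

```python
import hashlib
from collections import defaultdict


def row_hash(value):
    """Stand-in for Teradata's row hash: a stable 32-bit hash of the primary
    index value. This is NOT Teradata's actual hashing algorithm."""
    digest = hashlib.md5(repr(value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def fastload_model(records, pi_col, num_amps):
    # Phase 1 (acquisition): hash each record on its primary index column and
    # redistribute it to the AMP that owns that hash; rows stay unsorted.
    amps = defaultdict(list)
    for rec in records:
        amps[row_hash(rec[pi_col]) % num_amps].append(rec)

    # Phase 2 (application): each AMP sorts its rows by row hash before
    # writing them out. In this model, exact duplicate rows are dropped here.
    loaded, duplicates = {}, 0
    for amp, rows in amps.items():
        unique = set(rows)
        duplicates += len(rows) - len(unique)
        loaded[amp] = sorted(unique, key=lambda r: (row_hash(r[pi_col]), r))
    return loaded, duplicates


# Hypothetical DEPT rows (dept_no, dept_name); one row appears twice.
rows = [(1, "HR"), (2, "IT"), (1, "HR"), (3, "OPS")]
loaded, dropped = fastload_model(rows, pi_col=0, num_amps=4)
print(dropped)  # prints: 1
```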

Error Tables

Error Table 1
Contains one row for each row that failed to load because of constraint violations or translation errors.

Error Table 2
Captures errors related to duplicate values for unique primary indexes (UPIs). FastLoad loads only one occurrence of the value and stores the duplicate occurrence in the second error table. However, if the entire row is duplicated, FastLoad counts it but does not store the row.
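The routing rules for the two error tables can be sketched as follows. `is_valid` is a hypothetical stand-in for constraint and translation checks. "First occurrence" here means first in input order, which a real parallel load does not guarantee.

```python
def route_rows(rows, upi_col, is_valid):
    """Split input rows according to the error-table rules above.

    Returns (loaded rows, error table 1, error table 2, full duplicates counted).
    """
    loaded, err1, err2 = {}, [], []
    full_duplicates = 0
    for row in rows:
        if not is_valid(row):
            err1.append(row)            # constraint or translation error
            continue
        key = row[upi_col]
        if key not in loaded:
            loaded[key] = row           # first occurrence of this UPI value
        elif loaded[key] == row:
            full_duplicates += 1        # whole row repeated: counted, not stored
        else:
            err2.append(row)            # same UPI value, different row
    return list(loaded.values()), err1, err2, full_duplicates


rows = [(10, "HR"), (20, "IT"), (10, "HR"), (10, "Sales"), ("x", "Bad")]
print(route_rows(rows, upi_col=0, is_valid=lambda r: isinstance(r[0], int)))
# prints: ([(10, 'HR'), (20, 'IT')], [('x', 'Bad')], [(10, 'Sales')], 1)
```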

FastLoad Commands

FastLoad Commands (continued)

FastLoad Script

/* Number of Sessions */
SESSIONS 4;
/* Maximum Number of Errors allowed to occur */
ERRLIMIT 50;

.logon 127.0.0.1/DBC,DBC;

/* Dropping Error Tables */
DROP TABLE TRAINING.DEPTERR1;
DROP TABLE TRAINING.DEPTERR2;

/* Emptying the target table, which must be empty before loading */
DELETE FROM TRAINING.DEPT;

/*DEFINE FILE =C:\DEPT.TXT;SHOW;*/

BEGIN LOADING TRAINING.DEPT ERRORFILES TRAINING.DEPTERR1, TRAINING.DEPTERR2;

/* Specifying the Type of File */
SET RECORD VARTEXT " ";

/* Defining the Columns in Flat File Format */
DEFINE DEPT_NO (VARCHAR(20)),
       DEPT_NAME (VARCHAR(50)),
/* Loading from Input File: FILE= continues the DEFINE command, hence the trailing comma above */
FILE = C:\Desktop\Sample_fload.txt;

/* Inserting Rows into the Table */
INSERT INTO TRAINING.DEPT
VALUES (:DEPT_NO, :DEPT_NAME);

.END LOADING;
.LOGOFF;

To run the script:

$ fastload < emp101.fl

Limitations

- If an AMP goes down, FastLoad cannot be restarted until the AMP is back online.
- Concatenation of input data files is not allowed.
- NO SECONDARY INDEXES ARE ALLOWED ON THE TARGET TABLE: the target table may have a primary index but no secondary indexes. If the table has a secondary index, FastLoad will not load it.
- NO REFERENTIAL INTEGRITY IS ALLOWED: FastLoad cannot load data into tables that are defined with referential integrity (RI), because enforcing it would require too much system checking against the referenced table.
- DUPLICATE ROWS ARE NOT SUPPORTED: MULTISET tables are tables that allow duplicate rows, that is, rows whose values are identical in every column. When FastLoad finds duplicate rows, it discards them. FastLoad can load data into a MULTISET table, but it will not load duplicate rows into it.

FastExport Steps

FastExport Commands

FastExport Commands (continued)

Sample FastExport Script

.logtable RestartLog;          /* Define restart log */
.run file logon;
.begin export sessions 12;     /* Specify sessions */
.export outfile dataout;       /* Destination file */
select * from Customer;
.end export;                   /* Send request */
.logoff;                       /* Terminate sessions */

Invoking FastExport

Invoking FastExport (continued)

MultiLoad

(Figure: components of a MultiLoad job: the .ml script, the .dat input files, the EMP101 data table, the error tables (ET table and UV table), the work table (WK table), the restart log table (LG table), and the .log file.)

MultiLoad has many features that make it appealing for maintaining large tables:

- Support for up to five tables per script.
- Tables may contain pre-existing data, but cannot have unique secondary indexes or referential integrity.
- It can be used for fast, high-volume maintenance on multiple tables and views.
- Each MultiLoad import task can perform multiple INSERTs, UPDATEs, DELETEs, and UPSERTs (UPDATE if the row exists, else INSERT) on up to five different tables or views.
- Each MultiLoad delete task can remove large numbers of rows from a single table.
- Full restart capability using a restart log table.
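UPSERT semantics and the five-table limit can be sketched in Python, with each table modelled as a dict keyed by primary index value. The function names and sample data are hypothetical. The sketch shows the logic only, not MultiLoad's implementation.

```python
def upsert(table, key, values):
    """UPDATE the row if its key already exists, else INSERT it."""
    if key in table:
        table[key].update(values)
        return "update"
    table[key] = dict(values)
    return "insert"


def run_import_task(tables, operations, max_tables=5):
    """Apply (table_name, key, values) upserts; reject tasks that would touch
    more than max_tables distinct target tables (five, per the slide)."""
    targets = {name for name, _, _ in operations}
    if len(targets) > max_tables:
        raise ValueError(f"{len(targets)} target tables; limit is {max_tables}")
    return [upsert(tables.setdefault(name, {}), key, values)
            for name, key, values in operations]


tables = {"customers": {1: {"CUST_NAME": "Acme"}}}  # hypothetical data
actions = run_import_task(tables, [
    ("customers", 1, {"CUST_NAME": "Acme Ltd"}),  # key exists -> update
    ("customers", 2, {"CUST_NAME": "Beta"}),      # new key    -> insert
])
print(actions)  # prints: ['update', 'insert']
```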

MultiLoad

(Figure: MultiLoad sends update, delete, and insert operations from the host to Tables A through E on the server.)

MultiLoad Tasks

MultiLoad allows INSERT, UPDATE, DELETE, and UPSERT operations against up to five target tables per task. The two distinct tasks are:

IMPORT task: intermixes a number of different SQL/DML statements and applies them to up to five different tables, depending on the APPLY conditions.

DELETE task: executes a single DELETE statement on a single table.
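The way an IMPORT task routes input records to DML statements through APPLY conditions can be modelled as below. Each `(label, predicate)` pair stands in for a DML label with its APPLY condition. The sketch assumes that every matching condition is applied, and the record layout is hypothetical.

```python
def apply_dml(records, apply_rules):
    """Route each input record to every DML label whose condition it meets.

    apply_rules: list of (label, predicate) pairs. A record that meets no
    condition is simply not applied.
    """
    routed = {label: [] for label, _ in apply_rules}
    for record in records:
        for label, predicate in apply_rules:
            if predicate(record):
                routed[label].append(record)
    return routed


# Hypothetical input records carrying a one-letter action code.
records = [{"id": 1, "op": "I"}, {"id": 2, "op": "D"}, {"id": 3, "op": "I"}]
routed = apply_dml(records, [
    ("insert_dml", lambda r: r["op"] == "I"),
    ("delete_dml", lambda r: r["op"] == "D"),
])
print([r["id"] for r in routed["insert_dml"]])  # prints: [1, 3]
```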

5 Phases in MultiLoad

5 Phases in MultiLoad (continued)

Tables in MultiLoad

MLoad uses two error tables (ET and UV), one work table, and one log table.

1. ET table (data errors)
   a. Also called the acquisition phase error table.
   b. Stores data errors found during the acquisition phase of a MultiLoad import task.
2. UV table (UPI violations)
   a. Also called the application phase error table.
   b. Stores data errors found during the application phase of a MultiLoad import or delete task.
3. Work table (WT)
   a. MLoad loads the selected records into the work table.
4. Log table
   a. Maintains records of all checkpoints related to the load job. Specifying a log table is mandatory in an MLoad job.
   b. This table is used if the job aborts or has to be restarted for any reason.
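The following toy Python model shows which error table catches which failure. Conversion errors are caught while building the work table (acquisition phase, ET table). Duplicate unique-primary-index values are caught while applying work-table rows to the target (application phase, UV table). Names and data are hypothetical.

```python
def mload_two_phase(raw_records, target):
    """Toy model of where MultiLoad-style errors land (not Teradata code).

    target: dict keyed by a unique primary index value.
    Returns (ET table rows, UV table rows).
    """
    et_table, uv_table, work_table = [], [], []

    # Acquisition phase: convert raw text fields; data errors go to ET.
    for raw_id, name in raw_records:
        try:
            work_table.append((int(raw_id), name))
        except ValueError:
            et_table.append((raw_id, name))

    # Application phase: apply work-table rows; UPI violations go to UV.
    for key, name in work_table:
        if key in target:
            uv_table.append((key, name))
        else:
            target[key] = name
    return et_table, uv_table


target = {1: "Acme"}  # hypothetical existing row
et, uv = mload_two_phase([("2", "Beta"), ("x2", "Bad"), ("1", "Dup")], target)
print(et, uv, target)
# prints: [('x2', 'Bad')] [(1, 'Dup')] {1: 'Acme', 2: 'Beta'}
```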

MultiLoad Commands

MultiLoad Commands (continued)

Sample MultiLoad Script

.LOGTABLE dwlogtable;
.LOGON tdp1/etltoolsinfo,dwpwd1;
.begin import mload tables customers;
/* With FORMAT VARTEXT, every layout field must be VARCHAR; 11 characters fit any signed 32-bit INTEGER as text */
.layout custlayout;
.field ID 1 VARCHAR(11);
.field CUST_ID * VARCHAR(6);
.field CUST_NAME * VARCHAR(30);
.field CUST_GROUP * VARCHAR(30);
.field CUST_SEGMENT * VARCHAR(10);
.field CUST_COUNTRY_ID * VARCHAR(3);
.dml label custdml;
insert into customers.*;
.import infile /dw/input/Dwh_cust_extract.txt
    format VARtext ';'
    layout custlayout
    apply custdml;
.end mload;
.logoff;

To run the script:

$ mload < load_cust_extract.mload

Invoking MultiLoad

Invoking MultiLoad (continued)

Restarting a MultiLoad Job

A MultiLoad job can be interrupted by errors encountered during the job:

1. MultiLoad checks the restart log table and automatically resumes the load process from the last successful checkpoint before the failure occurred.
2. If Teradata experiences a reset while MultiLoad is running, the host program restarts MultiLoad after Teradata is back up and running, without user interaction.
3. If a host mainframe or network client fails during a MultiLoad, or the job is aborted, you can simply resubmit the script without changing anything. MultiLoad finds where it stopped and starts again from that point.
4. If MLoad failed in the acquisition phase, just rerun the job.
5. If MLoad failed in the application phase, you can either restart it by resubmitting the job, in which case it resumes from the checkpoint of the last block written to disk, or, if you don't want to restart it, drop the two error tables, the work table, and the log table, and release the MLoad lock on the target table.
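The checkpoint-and-resume idea can be sketched with a dict standing in for the restart log table. For simplicity, the sketch assumes a checkpoint after every block, whereas real jobs checkpoint at configurable intervals. All names here are hypothetical.

```python
def run_with_restart(blocks, apply_block, restart_log):
    """Resume a job from its last checkpoint.

    restart_log: dict standing in for the restart log table; it records the
    index of the last block known to be applied. apply_block may raise to
    simulate a failure; resubmitting the same call picks up where it stopped.
    """
    start = restart_log.get("last_checkpoint", -1) + 1
    for i in range(start, len(blocks)):
        apply_block(blocks[i])
        restart_log["last_checkpoint"] = i  # checkpoint after every block


# Usage: the first run fails on block "b3"; the resubmitted run resumes there.
applied, restart_log, failing = [], {}, {"b3"}


def apply_block(block):
    if block in failing:
        raise RuntimeError(f"simulated failure on {block}")
    applied.append(block)


blocks = ["b1", "b2", "b3", "b4"]
try:
    run_with_restart(blocks, apply_block, restart_log)
except RuntimeError:
    pass
print(applied, restart_log)  # prints: ['b1', 'b2'] {'last_checkpoint': 1}
failing.clear()              # the cause of the failure has been fixed
run_with_restart(blocks, apply_block, restart_log)
print(applied)               # prints: ['b1', 'b2', 'b3', 'b4']
```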

Releasing a MultiLoad Job

MultiLoad creates two error tables and one work table. When MultiLoad fails in the acquisition phase, unlock the main table with:

RELEASE MLOAD <table name>;

When MultiLoad fails in the application phase, unlock the main table with:

RELEASE MLOAD <table name> IN APPLY;

Limitations:
1. The RELEASE MLOAD command is used to release the locks and roll back the job. But if you have been loading millions of rows, the rollback may take a lot of time. For this reason, most customers would rather just go ahead and restart.
2. Be very cautious when using the RELEASE command: it can leave your table half updated.

MultiLoad Limitations

- The MultiLoad utility doesn't support the SELECT statement.
- Concatenation of multiple input data files is not allowed.
- MultiLoad doesn't support arithmetic functions (e.g., ABS, LOG) in an MLoad script.
- MultiLoad doesn't support exponentiation or aggregate operators (e.g., AVG, SUM) in an MLoad script.
- MultiLoad doesn't support USIs (unique secondary indexes), referential integrity, join indexes, hash indexes, or triggers.
- Import tasks require use of the primary index: UPDATE and DELETE statements must specify primary index values.

TPump

- Allows near real-time updates from transactional systems into the warehouse.
- Performs INSERT, UPDATE, and DELETE operations, or a combination, on more than 60 tables at a time from the same source.
- Alternative to MultiLoad for low-volume batch maintenance of large databases.
- Allows target tables to:
  - have secondary indexes and referential integrity constraints
  - be MULTISET or SET
  - be populated or empty

TPump vs MultiLoad

- MultiLoad performance improves as the volume of changes increases.
- TPump does better on relatively low volumes of changes.
- TPump uses macros to modify tables rather than actual DML commands.
- MultiLoad uses the DML statements.
- TPump uses row-hash locking to allow concurrent read and write access to target tables.
- MultiLoad locks tables for write access (Phase 2) until it completes.
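The locking difference can be illustrated with a minimal lock manager that compares only lock granularity. It treats every lock as exclusive, which is a simplification because it does not model concurrent readers. The session, table, and row-hash values are hypothetical.

```python
class LockManager:
    """Compare lock granularity (simplification: every lock is exclusive).

    mode="table":   one lock per table, as with MultiLoad's table-level lock.
    mode="rowhash": one lock per (table, row hash), as with TPump.
    """

    def __init__(self, mode):
        if mode not in ("table", "rowhash"):
            raise ValueError(mode)
        self.mode = mode
        self.holders = {}  # lock key -> session holding it

    def _key(self, table, row_hash):
        return table if self.mode == "table" else (table, row_hash)

    def acquire(self, session, table, row_hash):
        """Return True if session gets (or already holds) the lock."""
        holder = self.holders.setdefault(self._key(table, row_hash), session)
        return holder == session


table_locks = LockManager("table")
rowhash_locks = LockManager("rowhash")
for locks in (table_locks, rowhash_locks):
    locks.acquire("loader", "emp", row_hash=101)
print(table_locks.acquire("other", "emp", row_hash=202))    # prints: False
print(rowhash_locks.acquire("other", "emp", row_hash=202))  # prints: True
```

With table-level locking, any second session is blocked. With row-hash locking, only a session that touches the same row hash is blocked.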

Choosing the Best Utility
