Table of Contents
Introduction
About the tool
  Scope
  Components
  Report Structure
  Limitations
How to Use
  Installation
  Configuration and Execution
    Oracle
    SQL Server
Important Points to Remember
Appendix
  A. Step by step process flow for installation
    Oracle
    SQL Server
  B. Naming Convention
  C. Procedures and their function
  D. Sample Review Reports
Informatica Code Review Tool
Introduction
Manual review of Informatica PowerCenter code takes a considerable amount of time and
effort. Even then, it leaves room for human error and has an impact on project deadlines.
The purpose of this tool is to reduce the time taken for review and to increase the code
review quality and effectiveness while automating much of the process.
About the tool
The Informatica Code Review Tool reviews Informatica code against a set of widely accepted
guidelines. The tool is database dependent and is therefore specific to the repository
database. The current version can review code in Informatica PowerCenter
repositories hosted on Oracle or SQL Server databases. The tool consists of two sets of code,
one for each type of database. Each set consists of a PowerCenter workflow, a template
entity list (to be used as the source file for the workflow), one installation batch file and a
set of database procedures.
The PowerCenter mapping takes in a list of entities to be reviewed and calls the wrapper
review procedure for each entity. These procedures generate the review reports in a path
defined by the user. The entities in the entity list can be folders, workflows or mappings.
The tool reviews only the latest checked-in version of the code. The review procedures are
based on Informatica's Metadata Exchange (MX) views and hence are not tied to a specific
PowerCenter version.
Scope
The tool reviews the code against the following standards:
- All sub-entities within a PowerCenter entity should have a description. This includes
  transformations, mappings, tasks and workflows.
- Two similar transformations shouldn't be present one after the other in a mapping.
  This applies to Expression, Transaction Control, Update Strategy, Aggregator, Filter,
  Router and Sorter transformations.
- The ports within an expression and an aggregator should be in the order:
  Input-Input/Output-Variable-Output.
- All the filters present in the mapping should have a valid filter condition and not
  the default value (TRUE).
- There shouldn't be a joiner after two homogeneous source qualifiers on the same
  database. Nor should there be a filter right after a source qualifier that reads
  from a database.
- The Override Tracing of a session should be 'Normal' or 'Terse'.
- The data type, the precision and the scale of the ports should be kept consistent
  across the mapping.
- Aggregators and joiners should have sorted input. This helps improve the
  performance of the workflow.
- Stop on Errors for a session should be set to 1, so that no errors are skipped.
- The transformations, mappings, command tasks, sessions and workflows should be
  named as per the accepted standards. E.g. the name of a mapping should start with
  'm_'. The complete list of naming standards checked by the tool is given in
  Appendix B.
- All the output and input-output ports in an expression and an aggregator should be
  linked to at least one port downstream.
- All the input and variable ports in an expression and in an aggregator should be used
  in the expression of at least one port.
- All the unconnected lookups should be called at least once in the mapping.
- All the transformations in a mapping should have a tracing level of 'Normal' or 'Terse'
  to avoid creating large session logs in the production environment.
- The session and workflow log directory should be parameterized.
- Additional concurrent pipelines for lookup cache creation should not be disabled.
- The tool also lists all the source qualifiers that have an SQL override.
- The tool checks that every link between tasks in a workflow contains a condition.
In addition, a summary report is generated. The details contained in the summary
report are described in the SUMMARY_REPORT procedure description in Appendix C. A sample
summary report is attached in Appendix D.
Components
The deployment package contains two sets of code: one set to review Informatica PowerCenter
repositories on Oracle and the other for repositories that use SQL Server as the database.
The two sets are almost identical but have a few minor differences.
Each set consists of the following entities:
- One Informatica PowerCenter workflow XML, which should be imported using the
  Repository Manager. It contains a mapping that processes the entity list and a
  workflow associated with the mapping.
- A batch file which creates the review procedures in the database.
- SQL procedures that review the entities.
- An entity list template. This template is specific to the database being used for the
  PowerCenter repository.
The batch file is used to create the procedures in the respective databases. The installation
process is explained in detail in Appendix A.
Report Structure
The tool generates three types of text reports: a summary report for each folder, and one
report for each workflow and each mapping within the folder. The reports follow the naming
convention:
- SummaryReport_<Folder name>.txt: summary report for folders
- <Folder name>_<Workflow name>.txt: for workflows
- <Folder name>_<Mapping name>.txt: for mappings
Folder name is the name of the folder in which the workflow/mapping is present.
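The naming convention above can be expressed as a small helper, for example to predict which files a review run will produce. This is only an illustrative Python sketch; the function name and its arguments are hypothetical and are not part of the tool, which builds the names inside its database procedures.

```python
def report_file_name(folder: str, entity_type: str, entity_name: str = "") -> str:
    """Return the expected report file name for one reviewed entity.

    Hypothetical helper that mirrors the documented naming convention.
    """
    kind = entity_type.strip().lower()
    if kind == "folder":
        return f"SummaryReport_{folder}.txt"
    if kind in ("workflow", "mapping"):
        return f"{folder}_{entity_name}.txt"
    raise ValueError(f"unsupported entity type: {entity_type!r}")


if __name__ == "__main__":
    print(report_file_name("Folder3", "folder"))
    print(report_file_name("Folder1", "mapping", "m_load_customers"))
```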
If a folder is reviewed, the tool creates one report for every workflow and one report for
every mapping within those workflows. The mappings that are not called in any of the
workflows are not reviewed.
The review comments in the report are listed under appropriate headings. Sample review
reports are attached in Appendix D.
Limitations
1. The tool can handle Informatica PowerCenter installations on Oracle and SQL Server
databases only. Installations on any other database can't be reviewed.
2. The mappings and the sessions that are not used in any of the workflows are not
reviewed in case of a folder level review. The mappings however can be reviewed
independently.
3. A session can't be reviewed independently.
4. The tool reviews only the latest checked-in version. If an entity is checked out, then
the last checked-in version is reviewed.
5. The code appends its output to the review report. So, the user should make sure
that there are no existing reports in the path where the reports are to be generated.
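Because of limitation 5, it can help to check the report directory before starting a run. The following Python sketch is one possible pre-check; the function name is hypothetical, and it assumes the reports are the only .txt files kept in that directory.

```python
from pathlib import Path


def existing_reports(report_dir: str) -> list[str]:
    """Return the names of .txt files already present in the report directory."""
    return sorted(p.name for p in Path(report_dir).glob("*.txt"))


if __name__ == "__main__":
    # C:\ReviewReport is the example path used in Appendix A.
    leftovers = existing_reports(r"C:\ReviewReport")
    if leftovers:
        print("Move or delete these files before running the review:")
        for name in leftovers:
            print("  " + name)
```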
How to Use
Installation
The installation package consists of a zipped file. Steps to install the code packages for both
DB types are described in detail in Appendix A.
Configuration and Execution
Once the package has been installed, the user has to edit some of the session properties to
suit their environment.
The source and target files used in the mapping are flat files, so the user needs to either
change their paths as required or make sure that the respective default paths are available.
The Stored Procedure in the mapping needs a database connection whose user has, as its
default schema, the schema that stores the MX views of the Informatica PowerCenter
repository. Create such a user, or use an existing one, and use this connection for the
Stored Procedure in the mapping.
Configuration of the Entity File
The entity file lists present in each of the folders act as the source for the Powercenter
workflows. The entity file lists are different for the two different databases. The template
for each type is present in the respective folders. They are described in detail below:
Oracle
The entity file is a comma-separated values (CSV) file with three columns:
Folder_Name, Entity_Name, Entity_Type.
A sample entity file would look like:
Folder_Name,Entity_Name,Entity_Type
Folder1,Mapping1,mapping
Folder2,workflow1,workflow
Folder3,Folder3,folder
The first line should always contain the name of the columns.
This input will review 1 workflow, 1 mapping and 1 folder. The entries in the file are not
case sensitive.
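An entity file in this layout can be sanity-checked before the workflow is run. The Python sketch below is an assumption-laden illustration, not part of the tool: the function name is hypothetical, and it only checks the header, the column count and the three documented entity types.

```python
import csv
import io

VALID_TYPES = {"folder", "workflow", "mapping"}
EXPECTED_HEADER = ["folder_name", "entity_name", "entity_type"]


def read_oracle_entity_file(text: str) -> list[tuple[str, str, str]]:
    """Parse entity-file text and return (folder, entity, type) rows."""
    reader = csv.reader(io.StringIO(text))
    header = [h.strip().lower() for h in next(reader)]
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        folder, entity, kind = (cell.strip() for cell in row)
        if kind.lower() not in VALID_TYPES:
            raise ValueError(f"line {line_no}: unknown entity type {kind!r}")
        rows.append((folder, entity, kind.lower()))
    return rows


if __name__ == "__main__":
    sample = (
        "Folder_Name,Entity_Name,Entity_Type\n"
        "Folder1,Mapping1,mapping\n"
        "Folder2,workflow1,workflow\n"
        "Folder3,Folder3,folder\n"
    )
    for row in read_oracle_entity_file(sample):
        print(row)
```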
SQL Server
The entity file is a comma-separated values (CSV) file with four columns:
Entity_Name, Entity_Type, Folder_Name, Path.
A sample entity file would look like:
Entity_Name,Entity_Type,Folder_Name,Path
Mapping1,mapping,Folder1,C:\Review
workflow1,workflow,Folder2,C:\Review
Folder3,folder,Folder3,C:\Review
The path in the source file is the directory where the review reports are generated and
hence should be accessible to PowerCenter as well as SQL Server.
The first line should always contain the column names. The entries in the file are not case
sensitive.
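Generating the four-column file programmatically avoids hand-typed delimiter mistakes. The following Python sketch shows one way to do that; the function name is hypothetical, and it assumes every entity in the file uses the same report path.

```python
import csv
import io


def build_sqlserver_entity_file(entities, report_path: str) -> str:
    """Build entity-file text for the SQL Server code set.

    entities: iterable of (entity_name, entity_type, folder_name) tuples.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Entity_Name", "Entity_Type", "Folder_Name", "Path"])
    for name, kind, folder in entities:
        writer.writerow([name, kind, folder, report_path])
    return buf.getvalue()


if __name__ == "__main__":
    text = build_sqlserver_entity_file(
        [
            ("Mapping1", "mapping", "Folder1"),
            ("workflow1", "workflow", "Folder2"),
            ("Folder3", "folder", "Folder3"),
        ],
        r"C:\Review",
    )
    print(text, end="")
```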
Important Points to Remember
- The source file (entity list) can contain three types of entities: folder, workflow
  and mapping. If the entity type is folder, the tool reviews all the workflows within the
  folder and all the mappings associated with those workflows. If the entity type is
  workflow, the tool reviews the workflow as well as all the sessions and mappings
  associated with it. If the entity type is mapping, it reviews only the listed mapping.
- For Oracle installations, the review reports are generated at the path supplied
  by the user while executing the Install_Oracle.bat batch file. For SQL Server
  installations, the review reports for each entity are generated at the path specified
  for that entity in the source file.
- The procedures are based upon Informatica MX views, which contain information
  about only the checked-in entities. So, the user should make sure that the entities
  that are to be reviewed are checked in before the Informatica job is kicked off.
- For Oracle code, the review directory should be accessible to the Oracle server.
- The review directory should be present beforehand for both the code sets.
- The entity list used as source in the Informatica job can only handle three entity
  types: Workflow, Mapping and Folder.
- Informatica Repository Manager should be used to import the workflow XML from the
  installer package.
- The user for the stored procedure should have "select" access on the MX views.
- The source file directory and target file directory should be present before the
  execution of the workflow.
- The entries in the entity file are not case-sensitive and hence can be entered in any
  case.
- The output file can have two messages:
  a. "<Entity type> is incorrect", which means that either the entity type or the
  entity name is incorrect
  b. "Processed", which means successful execution of the Wrapper_Review
  procedure.
- When a folder is reviewed, only those mappings are reviewed which are called in one
  of the checked-in workflows.
- For Oracle code, the Oracle username used during installation should have the "create
  directory" privilege.
- The user provided while installing the SQL Server version should have select access
  on the Informatica MX views.
Appendix
A. Step by step process flow for installation
As described earlier, the package contains two sets of code, but the user needs to install
only the set that applies to their environment. If the Informatica PowerCenter
repository is on an Oracle database, then only the Oracle set needs to be installed.
The installation method for both sets is described below.
Oracle
Follow these steps to install the code review tool for Oracle:
1. Extract the zip package. The package will extract into a folder called Code_Review.
The Code_Review folder contains two folders: Oracle and SQLServer.
2. The 'Oracle' folder contains a batch file (Install_Oracle.bat), an SQL file (Install.sql)
and an Informatica PowerCenter workflow (wf_s_m_QualityReview.XML).
3. For installation, open the command prompt. Change directory to the folder where
Install_Oracle.bat is present.
4. Execute the Install_Oracle.bat file from the command prompt. For Oracle installation,
the batch file expects 4 parameters, in sequence: DB username, DB password, DB
connect string/TNS name, and the server path where the review reports are to be generated.
The path should be accessible to Oracle as well. For example, if the username that has
select access on the MX views is Usr, the password is Usr_pwd, the TNS name is Ora11 and
the server path where reports are to be generated is C:\ReviewReport, then the command
would look like:
Install_Oracle.bat Usr Usr_pwd Ora11 C:\ReviewReport
5. The batch file executes two SQL files: DIR_REPORT.sql (generated by
Install_Oracle.bat) and Install.sql. Both these files should execute successfully.
6. Import the workflow XML wf_s_m_QualityReview.XML using Informatica Repository
Manager.
7. Change the source path, target path and the connection string for the stored
procedure in the workflow as per the requirements.
8. The installation is done and the tool is ready for use.
SQL Server
Follow these steps to install the code review tool for SQL Server:
1. Extract the zip package. The package will extract into a folder called Code_Review.
The Code_Review folder contains two folders: Oracle and SQLServer.
2. The 'SQLServer' folder contains a batch file (Install_SQL.bat) and an Informatica
PowerCenter workflow (wf_s_m_QualityReview.XML).
3. For installation, open the command prompt. Change directory to the folder where
Install_SQL.bat is present.
4. Execute the Install_SQL.bat file from the command prompt. For SQL Server
installation, the batch file expects 4 parameters, in sequence: DB schema, connect
string, login ID and password. The DB schema to be provided here should have
select access on MX views. For example, if the DB schema is InfaDev, the connect string is
sqlserver.test.com\SQLSRVR, the login ID is InfaUsr and the password is InfaUsrPwd, then the
command would look like:
Install_SQL.bat InfaDev sqlserver.test.com\SQLSRVR InfaUsr InfaUsrPwd
5. The batch file executes all the stored procedure SQL files. All these files should
execute successfully.
6. Import the workflow XML wf_s_m_QualityReview.XML using Informatica Repository
Manager.
7. Change the source path, target path and the connection string for the stored
procedure in the workflow as per the requirements.
8. The installation is done and the tool is ready for use.
B. Naming Convention
The Informatica code is evaluated against the following naming convention:
Entity Type            Name should start with
Workflow               wf_
Session                s_
Command Task           cmd_
Mapping                m_
Source Qualifier       sq_
Transaction Control    tct_
Stored Procedure       sp_
Update Strategy        upd_
Expression             exp_
Joiner                 jnr_
Aggregator             agg_
Lookup                 lkp_
Filter                 fil_
Router                 rtr_
Mapplet                mplt_
Sequence Generator     seq_
Sorter                 srt_
Rank                   rnk_
SQL                    sql_
Union                  un_
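The prefix rules above amount to a lookup from entity type to required prefix. The Python sketch below illustrates the check; it is not the tool's implementation (which runs as database procedures), the function name is hypothetical, and comparing names case-insensitively is an assumption.

```python
NAME_PREFIXES = {
    "Workflow": "wf_",
    "Session": "s_",
    "Command Task": "cmd_",
    "Mapping": "m_",
    "Source Qualifier": "sq_",
    "Transaction Control": "tct_",
    "Stored Procedure": "sp_",
    "Update Strategy": "upd_",
    "Expression": "exp_",
    "Joiner": "jnr_",
    "Aggregator": "agg_",
    "Lookup": "lkp_",
    "Filter": "fil_",
    "Router": "rtr_",
    "Mapplet": "mplt_",
    "Sequence Generator": "seq_",
    "Sorter": "srt_",
    "Rank": "rnk_",
    "SQL": "sql_",
    "Union": "un_",
}


def follows_naming_convention(entity_type: str, name: str) -> bool:
    """Return True if the name starts with the prefix required for its type."""
    prefix = NAME_PREFIXES[entity_type]
    # Assumption: the comparison ignores case.
    return name.lower().startswith(prefix)


if __name__ == "__main__":
    print(follows_naming_convention("Mapping", "m_load_customers"))
    print(follows_naming_convention("Session", "load_customers"))
```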
C. Procedures and their function
There are two sets of procedures for code review: one set for Informatica installations
on Oracle and the other for Informatica installations that use SQL Server as the
database. The procedures have the same names and functionality in both sets.
All the procedures are listed below with a short description of each:
PRC_WRAPPER_REVIEW: The procedure accepts the Entity_Type, Entity_Name and
Folder_name as input and calls one of the three procedures (PRC_FOLDER_REVIEW,
PRC_WORKFLOW_REVIEW or PRC_MAPPING_REVIEW) based upon the entity type. As the
name suggests, if the entity type is a folder, this calls PRC_FOLDER_REVIEW procedure and
so on.
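The dispatch performed by the wrapper can be pictured with the following Python sketch. It only illustrates the control flow described above: the real procedure is a database stored procedure, the function and handler names here are hypothetical, and the two return strings are the messages documented under Important Points to Remember.

```python
def wrapper_review(entity_type, entity_name, folder_name, handlers):
    """Route one entity to the review handler that matches its type.

    handlers maps a lower-case entity type to a callable taking
    (folder_name, entity_name).
    """
    handler = handlers.get(entity_type.strip().lower())
    if handler is None:
        return f"{entity_type} is incorrect"
    handler(folder_name, entity_name)
    return "Processed"


if __name__ == "__main__":
    # Stand-ins for PRC_FOLDER_REVIEW, PRC_WORKFLOW_REVIEW and PRC_MAPPING_REVIEW.
    handlers = {
        "folder": lambda folder, name: print("folder review:", folder),
        "workflow": lambda folder, name: print("workflow review:", folder, name),
        "mapping": lambda folder, name: print("mapping review:", folder, name),
    }
    print(wrapper_review("Mapping", "Mapping1", "Folder1", handlers))
    print(wrapper_review("session", "s_load", "Folder1", handlers))
```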
PRC_FOLDER_REVIEW: This procedure is called by the PRC_WRAPPER_REVIEW procedure
when the entity type is 'Folder'. This procedure lists all the checked-in workflows in the
folder and then calls the PRC_WORKFLOW_REVIEW procedure for each of these workflows.
SUMMARY_REPORT: This procedure is used to generate a summary report for the folder.
This summary report is only generated for folder reviews. This procedure calls the following
procedures: PRC_SUMREP_AGG_LKP_DEF_MEM, PRC_SUMREP_COMPLEX_MAPS,
PRC_SUMREP_MAP_TASK_DESC, PRC_SUMREP_PORT_TYPE_PREC_MIS,
PRC_SUMREP_PORTS_DEF_VALS, PRC_SUMREP_SESS_CONCUR_LKP,
PRC_SUMREP_SESS_LOG_DIR, PRC_SUMREP_SESS_OVERRIDE,
PRC_SUMREP_SQ_OVERRIDE, PRC_SUMREP_TRANS_DESC, PRC_SUMREP_WFLOW_DESC,
PRC_SUMREP_WFLOW_LOG.
The following checks are reported in the summary report:
a. Aggregators and lookups whose memory values are left at the default, Auto
b. The number of complex mappings in a folder
c. The mappings, sessions and workflows having no description
d. The ports with mismatching data type and precision from one transformation to
another
e. The ports with default or no default values
f. Sessions with no additional concurrent lookup threads
g. Sessions and workflows with hardcoded log directories
h. Source qualifiers with SQL overrides
i. Sessions with override tracing other than Terse or Normal
j. Transformations with no description
k. Workflows and sessions with no description
The summary report details the results in terms of percentages in most of the scenarios and
rates the various aspects of the code in the folder. This procedure is called from the
PRC_WRAPPER_REVIEW when the entity type to be reviewed is a folder.
PRC_WORKFLOW_REVIEW: This procedure is called by the PRC_FOLDER_REVIEW procedure
as indicated in the previous section. It calls the individual procedures that check
the quality standards pertaining to the sessions within the workflow and also
performs the quality checks at the workflow level. This procedure also lists all the mappings
called by the sessions contained within the workflow and calls the PRC_MAPPING_REVIEW
procedure for quality review of each of these mappings.
PRC_MAPPING_REVIEW: This procedure is called by the PRC_WORKFLOW_REVIEW
procedure for each of the mappings associated with the workflow. This procedure performs
the mapping level checks and calls procedures that review the mapping with respect to
individual quality standards.
PRC_WF_COMMENTS: This is a workflow level procedure which checks whether all the tasks
contained within a workflow have a description. This procedure is called by the procedure
PRC_WORKFLOW_REVIEW.
PRC_STOP_ON_ERRS: This is a workflow level procedure which checks whether the 'Stop
on Errors' property of all the sessions within the workflow is set to 1, as required by
the standard listed under Scope. This procedure is called by the procedure
PRC_WORKFLOW_REVIEW.
PRC_OVERRIDE_TRACING: This is a workflow level procedure which checks whether the
Override Tracing for all the sessions within the workflow is set to 'Normal' or 'Terse'.
This procedure is called by the procedure PRC_WORKFLOW_REVIEW.
PRC_WF_LINK_COND: This is a workflow level procedure which checks whether every link
present in the workflow has a condition attached to it. This procedure is called
by the procedure PRC_WORKFLOW_REVIEW.
PRC_SESSION_LOG_DIRECTORY: This is a workflow level procedure. It checks for
hardcoded session and workflow log directories. This procedure is called by the procedure
PRC_WORKFLOW_REVIEW.
PRC_SESS_CONCUR_LKP: This is a workflow level procedure. It checks for concurrent
lookup builds for all the sessions in the workflow. This procedure is called by the procedure
PRC_WORKFLOW_REVIEW.
PRC_TRANS_COMMENTS: This is a mapping level procedure which checks for description
for all the transformations in the mapping. This is called by the procedure
PRC_MAPPING_REVIEW.
PRC_SORTED_INP_AGGJNR: This is a mapping level procedure which checks if all the
aggregators and joiners in the mapping have sorted input property checked. This is called
by the procedure PRC_MAPPING_REVIEW.
PRC_FILTER_COND: This is a mapping level procedure which checks whether all the filters
within the mapping have valid filter conditions and are not left at the default value
(TRUE), in which case they have no real functionality and act only as a pass-through.
This is called by the procedure PRC_MAPPING_REVIEW.
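The rule for filters can be illustrated in a few lines of Python. This is a sketch only: the function name is hypothetical, it uses the default value TRUE named in the Scope section, and treating an empty condition as a pass-through as well is an assumption.

```python
def is_passthrough_filter(condition) -> bool:
    """Return True if a filter condition lets every row through unchanged."""
    text = (condition or "").strip().upper()
    # TRUE is the default condition; an empty condition is assumed to behave the same.
    return text in ("", "TRUE")


if __name__ == "__main__":
    for cond in ("TRUE", "  true ", "ORDER_AMT > 0", None):
        print(repr(cond), is_passthrough_filter(cond))
```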
PRC_JNR_FIL_POST_SOURCE: This is a mapping level procedure which raises a warning
if a filter or a joiner is used right after a source qualifier that reads from a database.
In such cases the function of the filter or the joiner could have been pushed to the source
qualifier itself, without bringing any excess data into the mapping. This is called by the
procedure PRC_MAPPING_REVIEW.
PRC_UNLINKED_OUTPUT_PORTS: This is a mapping level procedure which checks if all
the output and input-output ports in the expression and aggregator are linked to one or
more ports downstream. This helps reduce the redundant code within the mapping. This is
called by the procedure PRC_MAPPING_REVIEW.
PRC_CONSECUTIVE_TRANS: This is a mapping level procedure which checks for presence
of two consecutive transformations of same type, the functionality of which could have been
handled by just one. The transformations which the procedure would report are Expression,
Transaction Control, Update Strategy, Aggregator, Filter, Router and Sorter. This is called
by the procedure PRC_MAPPING_REVIEW.
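The idea behind this check can be sketched in Python over a simplified model of a mapping. The data model is an assumption made for illustration (a dictionary of transformation types and a list of upstream/downstream links); the actual procedure works on the MX views, and the function name is hypothetical.

```python
MERGEABLE_TYPES = {
    "Expression", "Transaction Control", "Update Strategy",
    "Aggregator", "Filter", "Router", "Sorter",
}


def consecutive_same_type(trans_types, links):
    """Return (upstream, downstream) pairs of directly linked transformations
    that share one of the reported types.

    trans_types: dict of transformation name -> transformation type.
    links: iterable of (upstream_name, downstream_name) pairs.
    """
    findings = set()
    for upstream, downstream in links:
        kind = trans_types[upstream]
        if kind in MERGEABLE_TYPES and kind == trans_types[downstream]:
            findings.add((upstream, downstream))
    return sorted(findings)


if __name__ == "__main__":
    types = {"exp_clean": "Expression", "exp_derive": "Expression", "fil_valid": "Filter"}
    links = [("exp_clean", "exp_derive"), ("exp_derive", "fil_valid")]
    print(consecutive_same_type(types, links))
```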
PRC_UNUSED_UNCONN_LKP: This is a mapping level procedure which checks whether all
the unconnected lookups within the mapping are called in at least one of the expressions in
the mapping. This is called by the procedure PRC_MAPPING_REVIEW.
PRC_UNUSED_INP_VAR_PORTS: This is a mapping level procedure which checks for
unused input and variable ports, i.e. ports that are not used in the expression of any
port, within the transformations of the mapping. This is called by
the procedure PRC_MAPPING_REVIEW.
PRC_TRANSFORMATION_NAMING_CONV: This is a mapping level procedure which
checks for the conformity of transformation names with the naming standards as detailed in
Appendix B. This is called by the procedure PRC_MAPPING_REVIEW.
PRC_EXP_PORT_ORDER: This is a mapping level procedure which checks the order of
the ports within expression and aggregator transformations. The ports should follow the
order Input-Input/Output-Variable-Output. This is called by the procedure
PRC_MAPPING_REVIEW.
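The port order rule is easy to state as code: assign each port type a rank and require the ranks to be non-decreasing. The Python sketch below illustrates this; the function name and the port type labels are assumptions, not the tool's implementation.

```python
PORT_RANK = {"INPUT": 0, "INPUT/OUTPUT": 1, "VARIABLE": 2, "OUTPUT": 3}


def ports_in_order(port_types) -> bool:
    """Return True if the port types appear in the order
    Input, Input/Output, Variable, Output."""
    ranks = [PORT_RANK[p.strip().upper()] for p in port_types]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


if __name__ == "__main__":
    print(ports_in_order(["Input", "Input/Output", "Variable", "Output"]))
    print(ports_in_order(["Input", "Output", "Variable"]))
```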
PRC_PORT_PROP: This is a mapping level procedure which checks if a port's data type,
precision or scale changes from one transformation to another and reports if there is a
change. This is called by the procedure PRC_MAPPING_REVIEW.
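A comparison of this kind can be sketched in Python as follows. The Port record and the list of linked port pairs are a simplified model assumed for illustration, and the names are hypothetical; the actual procedure reads this information from the MX views.

```python
from typing import NamedTuple


class Port(NamedTuple):
    transformation: str
    name: str
    datatype: str
    precision: int
    scale: int


def port_mismatches(links):
    """Report linked ports whose data type, precision or scale differ.

    links: iterable of (source_port, target_port) pairs of Port records.
    """
    issues = []
    for src, dst in links:
        changed = [
            field for field in ("datatype", "precision", "scale")
            if getattr(src, field) != getattr(dst, field)
        ]
        if changed:
            issues.append((src.transformation, src.name,
                           dst.transformation, dst.name, changed))
    return issues


if __name__ == "__main__":
    src = Port("sq_orders", "AMOUNT", "decimal", 10, 2)
    dst = Port("exp_calc", "AMOUNT", "decimal", 12, 2)
    print(port_mismatches([(src, dst)]))
```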
PRC_TRANS_TRACING_LEVEL: This is a mapping level procedure which checks the
Tracing Level of all the transformations within a mapping. The procedure lists the
transformations within the mapping that have a tracing level greater than 'Normal'. This is
called by the procedure PRC_MAPPING_REVIEW.
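"Greater than Normal" implies an ordering of tracing levels. The Python sketch below assumes the usual PowerCenter ordering (Terse, Normal, Verbose Initialization, Verbose Data); the function name and the input dictionary are hypothetical and serve only to illustrate the comparison.

```python
# Assumed ordering of tracing levels, from least to most verbose.
TRACING_RANK = {
    "terse": 0,
    "normal": 1,
    "verbose initialization": 2,
    "verbose data": 3,
}


def too_verbose(tracing_by_transformation):
    """Return the names of transformations whose tracing level exceeds Normal."""
    limit = TRACING_RANK["normal"]
    return sorted(
        name
        for name, level in tracing_by_transformation.items()
        if TRACING_RANK[level.strip().lower()] > limit
    )


if __name__ == "__main__":
    levels = {"exp_calc": "Normal", "lkp_customer": "Verbose Data", "fil_valid": "Terse"}
    print(too_verbose(levels))
```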
PRC_LKP_AGG_DEF_MEM: This is a mapping level procedure which checks the
memory value settings for aggregators and lookups within a mapping. This procedure lists
all the aggregators and lookups which have the default 'Auto' value. This is called by the
procedure PRC_MAPPING_REVIEW.
PRC_SQ_OVERRIDE: This is a mapping level procedure which checks for source qualifier
overrides and lists all the source qualifiers with overrides in the generated report. This
procedure is called by PRC_MAPPING_REVIEW.
D. Sample Review Reports
Sample summary report for folder:
Sample workflow review report:
Sample mapping review report: